Harpoon’s Data Layers

From an architectural standpoint, Harpoon is heavily inspired by the Lambda Architecture described by Nathan Marz in his book Big Data. While developing Big Data projects, we kept finding a recurring pattern that is, most of the time, very close to the Lambda Architecture.

The various components of Hadoop, plus other Hadoop-based technologies, can be logically organised in layers, as suggested by the Lambda Architecture.


Batch Layer

Contains all the technologies for analysing massive amounts of data with a batch-oriented approach. Here the king is the MapReduce platform provided by Hadoop.

Search Layer

It is not actually part of the Lambda Architecture, but in Harpoon this is the layer responsible for data indexing, and it provides the fundamental search-based data discovery feature.

Serving Layer

It is meant to provide quick data access (near-real-time queries) to multiple concurrent users and applications.

The idea behind this logical organisation is that most organisations use Hadoop to inexpensively collect huge amounts of data, reduce the data sets with MapReduce, and publish the results to a low-latency query platform for consumption by a broader audience.

Harpoon REST APIs

Harpoon follows a classic client/server organisation. The user interface is an HTML5/JavaScript rich client running in the browser, which talks to the server through a set of REST APIs (JSON over HTTP). All product functionality is therefore exposed through REST APIs as well as through the user interface.

Harpoon’s APIs can be logically grouped as follows:


APIs for importing data, either from any RDBMS or from files of any structure, into the Harpoon workspace.

Index & Search

APIs for data indexing and searching. Collectively, they realise the search-based data discovery.

Batch Analyse

APIs for defining complex jobs to be run on top of Hadoop’s MapReduce infrastructure.


APIs for publishing data into one or more serving layers.



Through this rich set of APIs it is possible to automate the typical workflow of data ingestion, data set reduction and publishing of results to the serving layer. Once the data has been published to a serving layer such as Impala, it is easily accessible by any third-party tool, such as Excel or traditional BI tools.

Users can manipulate the data directly from their browser, and a system integrator can easily integrate Harpoon’s APIs with a process engine to fully automate the data collection, analysis and publishing process.
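As a rough illustration of how such an automated workflow might be driven from a script, the sketch below builds the three HTTP requests for ingestion, data set reduction and publishing. The server address, the endpoint paths (`/import`, `/jobs`, `/publish`) and the payload fields are all hypothetical, since the actual Harpoon API is not documented here. The requests are only constructed, never sent.

```python
import json
from urllib.request import Request

# Hypothetical server address; the real Harpoon endpoints are not documented here.
BASE_URL = "https://harpoon.example.com/api"


def build_request(path, payload):
    """Build (but do not send) a JSON-over-HTTP POST request for one workflow step."""
    return Request(
        url=f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_workflow(database, table, source, serving_layer):
    """Return the requests for ingest, reduce and publish, in execution order."""
    summary = f"{table}_summary"  # hypothetical name for the reduced data set
    return [
        # 1. Import the raw data into the workspace.
        build_request("/import", {"database": database, "table": table, "source": source}),
        # 2. Run a batch job that reduces the imported table.
        build_request("/jobs", {"database": database, "input": table, "output": summary}),
        # 3. Publish the reduced table to a serving layer.
        build_request("/publish", {"database": database, "table": summary, "target": serving_layer}),
    ]
```

A process engine could then send each request in turn, for example with `urllib.request.urlopen`, and stop at the first failure.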

Harpoon provides a strict and controlled way to lay out the data on top of HDFS. All the data is also listed and registered in the Harpoon metadata repository. In this way the user is forced to organise the data following a well-specified pattern: all the data is organised in terms of databases and tables. However, everything is stored using the standard Hadoop APIs, allowing Hadoop-native applications to access the data inside the Harpoon workspace.

Besides being a Hadoop-based BI tool, Harpoon, through its powerful REST APIs, dramatically reduces the time needed to integrate Hadoop into the enterprise, following the modern approach of the Lambda Architecture.

Harpoon Security

Eligotech Harpoon builds on Hadoop authentication and authorization, offering a flexible way of controlling and managing access to data and operations. It is also fully integrated with LDAP and Active Directory, giving organisations and users an easy way of managing their credentials.

All traffic between the end user’s browser and the Eligotech Harpoon application server is secured with HTTPS.
At the data level, Eligotech Harpoon offers a fine-grained security mechanism and a very simple way of sharing data across multiple users. For example, the owner of a table can decide to make it visible to only one specific user or group.

This is the internal security model of Harpoon. Depending on the platform Harpoon is targeting, those rights are then converted into the corresponding rights on the target platform:


HDFS

A database is stored as a directory inside HDFS and its tables as subdirectories. The Harpoon rights are converted into the corresponding rights at the HDFS level.
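The directory layout described above can be sketched as a pair of path helpers. The workspace root used here is a hypothetical value chosen for illustration; the actual location Harpoon uses is not stated.

```python
import posixpath

# Hypothetical workspace root; the real location is not given in the text.
WORKSPACE_ROOT = "/harpoon/workspace"


def database_dir(database):
    """A database maps to one directory under the workspace root."""
    return posixpath.join(WORKSPACE_ROOT, database)


def table_dir(database, table):
    """A table maps to a subdirectory of its database directory."""
    return posixpath.join(database_dir(database), table)
```

Because the mapping is plain HDFS paths, a Hadoop-native application only needs the database and table names to locate the data.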

When the underlying HDFS supports the access control list (ACL) feature, Harpoon maps its fine-grained access control to HDFS ACLs: giving a single user access to a table is mapped to the corresponding ACL at the HDFS level. If Harpoon works under secure impersonation, the actual user owns all the data he or she creates.
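To make the ACL mapping concrete, here is a minimal sketch that turns a table-level right into the argument list for the standard `hdfs dfs -setfacl` command. The right names (`read`, `write`) and their translation to permission bits are assumptions made for illustration, not Harpoon's documented behaviour. The command is only built, not executed.

```python
# Assumed mapping from a Harpoon-style right to HDFS directory permissions.
# Reading a table directory needs r-x so that its files can be listed.
PERMISSIONS = {"read": "r-x", "write": "rwx"}


def acl_spec(kind, name, right):
    """Build an HDFS ACL entry such as 'user:alice:r-x'."""
    if kind not in ("user", "group"):
        raise ValueError(f"unsupported ACL entry kind: {kind}")
    return f"{kind}:{name}:{PERMISSIONS[right]}"


def setfacl_command(kind, name, right, table_path):
    """Return the argv list that grants the right on a table directory, recursively."""
    return ["hdfs", "dfs", "-setfacl", "-R", "-m", acl_spec(kind, name, right), table_path]
```

For example, `setfacl_command("user", "alice", "read", "/some/table/dir")` yields a command that could be passed to `subprocess.run` on a node with the Hadoop client installed.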

A nice consequence is that the data is always consistent with the standard HDFS security model. Anyone who tries to access the data outside of Harpoon is stopped by HDFS itself. The data is always safe, regardless of whether it is accessed through Harpoon or not.


Impala

Harpoon provides a mechanism for publishing tables to Impala. Publishing a Harpoon table to Impala means automatically creating an external table inside the Impala metadata repository.

So, besides generating the DDL statements that create the external table, Harpoon also generates the SQL statements that make the table accessible only to the people who have read access to the corresponding Harpoon table.

In this case, Harpoon converts the rights at the HDFS level into the corresponding rights at the Sentry level.
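The kind of statements involved might look like the sketch below, which generates an Impala `CREATE EXTERNAL TABLE` statement and the Sentry-style role grants for read access. The Parquet file format, the role naming scheme and the function names are assumptions made for this example; the statements Harpoon actually generates are not shown in the text.

```python
def external_table_ddl(database, table, columns, location):
    """Build a CREATE EXTERNAL TABLE statement pointing at an existing HDFS directory."""
    column_list = ", ".join(f"`{name}` {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE `{database}`.`{table}` ({column_list}) "
        f"STORED AS PARQUET LOCATION '{location}'"  # file format is an assumption
    )


def read_grants(database, table, groups):
    """Build statements that give SELECT on the table to the listed groups only."""
    role = f"{database}_{table}_readers"  # hypothetical role naming scheme
    statements = [f"CREATE ROLE {role}"]
    statements += [f"GRANT ROLE {role} TO GROUP {group}" for group in groups]
    statements.append(f"GRANT SELECT ON TABLE `{database}`.`{table}` TO ROLE {role}")
    return statements
```

Sentry assigns privileges to roles and roles to groups, so in this sketch a right held by a single user would have to be expressed through a group that contains that user.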

Try it!

Want to try it?
Contact us!

Our team of experts in Big Data technologies and BI will help you identify your Big Data problem and assist you in setting up a proof of concept in your data centre or in the cloud.
