03 Nov / Security in Eligotech Harpoon
Security in Hadoop is a complex matter; it involves all the moving parts that make up the Hadoop ecosystem, like HDFS, Impala, etc.
Eligotech Harpoon’s philosophy is to avoid any proprietary mechanisms that would introduce vendor lock-in. For that reason, securing the data in the Eligotech Harpoon platform leverages all the mechanisms that the different Hadoop components provide for protecting data.
The data in Eligotech Harpoon is organized in directories and files stored on HDFS and, when tools like Hive/Impala are used, in databases and tables as well. The resources we want to protect from unauthorized access are the files that actually contain the data.
Without considerable integration effort, it’s not easy to provide a common security mechanism that works seamlessly across all the different Hadoop components. Let us explain with an example:
If you decide to always store your data as files on HDFS, then you can easily use the mechanisms HDFS provides for protecting the data: Kerberos, LDAP, HTTPS, TLS, etc. Kerberos strong authentication, combined with the POSIX-like permissions associated with the files, is an effective way of protecting the data from being accessed by unauthorized users.
Then, if you decide to access this same data through Impala, you will discover that the file-based protection is not compatible with Impala’s security mechanisms. Impala does not support secure impersonation, which, without delving into too many details, means that the Impala user (the user the Impala daemons run as) needs read access to all the data. So anyone getting access to Impala could read all the data, regardless of its actual ownership. Fortunately, Impala does provide strong mechanisms for protecting data, but it uses another approach: instead of protecting data at the file system level, Impala relies on Apache Sentry, which provides access control based on the database and table abstractions (GRANT and REVOKE).
One of the functionalities Eligotech Harpoon offers is propagating security rules from one platform to another: securing an Eligotech Harpoon table on HDFS triggers the proper actions on Sentry to protect that table when it’s accessed through Impala.
Eligotech Harpoon Security Model
Eligotech Harpoon HDFS Resources
As said before, Eligotech Harpoon organizes data in a structure of databases and tables, which are ultimately directories and files on HDFS.
A database or a table has an owner, usually the user who created it, and permissions to read or write that resource. Those permissions can also be assigned to specific users and/or groups of users. This allows a fine-grained security mechanism and a very simple way of sharing data across multiple users.
This is the internal security model of Eligotech Harpoon. Depending on the system Eligotech Harpoon is targeting, those rights are converted into the proper rights on the target system:
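To make the owner/user/group model concrete, here is a minimal Python sketch. It is not Eligotech Harpoon's actual implementation: the `Resource` class, its method names, and the rule that the owner implicitly holds every permission are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Hypothetical model of a database or table with an owner and grants."""
    name: str
    owner: str
    user_grants: dict = field(default_factory=dict)   # user -> {"read", "write"}
    group_grants: dict = field(default_factory=dict)  # group -> {"read", "write"}

    def grant_user(self, user, *perms):
        self.user_grants.setdefault(user, set()).update(perms)

    def grant_group(self, group, *perms):
        self.group_grants.setdefault(group, set()).update(perms)

    def allows(self, user, groups, perm):
        """Return True if `user`, a member of `groups`, holds `perm`."""
        if user == self.owner:  # assumption: the owner can always read and write
            return True
        if perm in self.user_grants.get(user, set()):
            return True
        return any(perm in self.group_grants.get(g, set()) for g in groups)


# Example: alice owns the table, bob may read it, analysts may read and write.
sales = Resource(name="sales.orders", owner="alice")
sales.grant_user("bob", "read")
sales.grant_group("analysts", "read", "write")
```

Sharing a table with a colleague then amounts to a single `grant_user` or `grant_group` call, which is the kind of simplicity the model aims for.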
- HDFS. An Eligotech Harpoon database is stored as a directory inside HDFS and its tables as subdirectories. The Eligotech Harpoon rights are converted into the proper rights at the HDFS level. When the underlying HDFS has the access control list feature enabled, Eligotech Harpoon maps the fine-grained access control to the proper HDFS ACLs; for example, giving table access to a single user is mapped to a corresponding ACL entry at the HDFS level. If Eligotech Harpoon works under secure impersonation, the actual user owns all the data they create.
A pleasant consequence is that the data is always consistent with the standard HDFS security model. If anyone tries to access the data outside of Eligotech Harpoon, they will be denied access by HDFS itself. The data is always safe, regardless of whether access happens through Eligotech Harpoon or not.
- Impala/Apache Sentry. Eligotech Harpoon provides a mechanism for publishing tables to Impala. Publishing an Eligotech Harpoon table to Impala means automatically creating an external table inside the Impala metadata repository. So, besides generating the proper DDL SQL statements for the external table creation, Eligotech Harpoon also generates the proper SQL statements for making that table accessible only to the people who have read access to the corresponding Eligotech Harpoon table. In other words, Eligotech Harpoon converts the security rules at the HDFS level into corresponding rules at the Sentry level.
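The two conversions described above can be sketched in Python as follows. This is an illustrative guess at the shape of the translation, not the code Eligotech Harpoon runs: the function names, the role naming scheme, and the example path are hypothetical. The ACL entries use the `type:name:perms` form accepted by `hdfs dfs -setfacl`, and the SQL follows the Sentry model in which privileges are granted to roles and roles to groups.

```python
def hdfs_acl_entries(user_perms, group_perms):
    """Build ACL entries in the spec format used by `hdfs dfs -setfacl`."""
    # A table is a directory: reading needs r-x (list + read), writing needs rwx.
    bits = {"read": "r-x", "write": "rwx"}
    entries = [f"user:{u}:{bits[p]}" for u, p in sorted(user_perms.items())]
    entries += [f"group:{g}:{bits[p]}" for g, p in sorted(group_perms.items())]
    return entries


def setfacl_command(path, entries):
    """Render the shell command that would apply the entries recursively."""
    return f"hdfs dfs -setfacl -R -m {','.join(entries)} {path}"


def sentry_read_grants(database, table, reader_groups):
    """Render GRANT statements giving the reader groups SELECT on the table."""
    role = f"harpoon_read_{database}_{table}"  # hypothetical role naming scheme
    statements = [
        f"CREATE ROLE {role}",
        f"GRANT SELECT ON TABLE {database}.{table} TO ROLE {role}",
    ]
    statements += [f"GRANT ROLE {role} TO GROUP {g}" for g in sorted(reader_groups)]
    return statements


# Example: bob reads the table directly, the analysts group may also write.
entries = hdfs_acl_entries({"bob": "read"}, {"analysts": "write"})
print(setfacl_command("/harpoon/sales/orders", entries))  # hypothetical path
for statement in sentry_read_grants("sales", "orders", ["analysts"]):
    print(statement + ";")
```

Because Sentry attaches roles to groups rather than to individual users, the sketch only publishes group-level readers; how a per-user HDFS ACL would be represented on the Sentry side is left open here.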
Eligotech Harpoon Resources Permissions
Eligotech Harpoon not only manages resources stored on HDFS, but also provides sophisticated mechanisms for data manipulation and discovery. All the operations a user can perform on the data are translated into entities called jobs. A job in Eligotech Harpoon can be a SQL query run on Impala, a complex Map/Reduce computation, a Spark job, etc. Eligotech Harpoon’s job manager treats all these kinds of jobs uniformly, providing the user with a unified interface for tracking and rescheduling jobs.
A job in Eligotech Harpoon also has strong security concerns. First, a job has an owner and can be applied only to data that user has read access to. Second, the job can be re-run or viewed only by authorized users.
Eligotech Harpoon therefore treats a job as a resource, just as it does databases and tables. The job has an owner and read/write rights, and the owner can give read/write rights to specific users/groups.
Eligotech Harpoon Operations Security
Not only does Eligotech Harpoon protect resources (databases, tables and jobs) against unauthorized access, but it can also restrict which operations users are able to initiate.
Eligotech Harpoon implements a role-based access control (RBAC) mechanism for controlling which operations users can or cannot access. For each possible operation there is a corresponding permission entry, e.g. import data, chart creation, Impala publishing, run user-defined plugin, etc.
You then add these entries to a specific role. A role identifies which operations a user can access within Eligotech Harpoon. Roles, in turn, can be associated either with groups of users or with single users.
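The role mechanism can be illustrated with a small Python sketch. The role names, permission identifiers, and lookup tables below are invented for the example and do not reflect Eligotech Harpoon's real configuration format.

```python
# Hypothetical permission entries grouped into roles.
ROLES = {
    "analyst": {"import_data", "chart_creation"},
    "publisher": {"impala_publishing"},
}

# Roles can be attached to single users or to groups of users.
USER_ROLES = {"bob": {"analyst"}}
GROUP_ROLES = {"data-eng": {"publisher"}}


def allowed_operations(user, groups):
    """Collect every operation granted through the user's own and group roles."""
    roles = set(USER_ROLES.get(user, set()))
    for group in groups:
        roles |= GROUP_ROLES.get(group, set())
    operations = set()
    for role in roles:
        operations |= ROLES.get(role, set())
    return operations


def can_perform(user, groups, operation):
    """Return True if the user may initiate the given operation."""
    return operation in allowed_operations(user, groups)
```

With these tables, `can_perform("bob", ["data-eng"], "impala_publishing")` is true because the permission arrives through the group's role, while a user with no roles cannot initiate any operation.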
How Eligotech Harpoon adapts to the different security levels of Hadoop
Eligotech Harpoon can accommodate different Hadoop security levels:
- Basic Hadoop authentication plus default group resolution
This is the default situation with simple Hadoop setups. In this case, all the data is owned by the Eligotech Harpoon user (the user running the Eligotech Harpoon back-end) and access control is managed entirely by Eligotech Harpoon, including the management of users and groups.
- Kerberos-based Hadoop authentication plus default group resolution
Similar to the previous setup, but in this case the Eligotech Harpoon user is strongly authenticated by Kerberos. All user and group management is handled internally by Eligotech Harpoon, and the Eligotech Harpoon user is the only owner of the resources on HDFS.
- Secure Impersonation
Another option is to configure Eligotech Harpoon to impersonate the user that logs in. All the resources will then be created with the owner corresponding to the user that logged into Eligotech Harpoon. In this case, it’s still possible to use the internal users/groups repository; however, it must be kept aligned with the operating system users and groups.
In fact, by default HDFS gets the groups associated with a user by querying the operating system. So, if the users and groups listed in Eligotech Harpoon are not present at the operating system level, secure impersonation and the permission checks won’t work properly.
To handle this, Eligotech Harpoon provides full integration with LDAP and Active Directory. Sharing a common directory tree between Hadoop and Eligotech Harpoon is a simple and effective way of sharing users and groups.
- Fully secured setup
Eligotech Harpoon can provide all its security features when Hadoop is configured in a secured way (Kerberos) and when it uses LDAP/AD for group resolution. These are the benefits:
- By combining Kerberos and secure impersonation, Eligotech Harpoon is trusted by Hadoop and all data manipulations are done on behalf of the actual users. Any available auditing system (e.g. Cloudera Navigator) can therefore also keep track of the users’ activities.
- Configuring HDFS for LDAP/AD-based group resolution and Impala for LDAP/AD user authentication allows Eligotech Harpoon to map HDFS permissions to the proper Impala grants. This provides full protection for the data, regardless of whether it is accessed directly as files or as tables through Impala.
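For the secure impersonation setup, where the users and groups known to Eligotech Harpoon must match what HDFS resolves from the operating system, a consistency check could look like the Python sketch below. The function names and the sample data are hypothetical; `os_groups_via_id` assumes a Unix-like node where the `id -Gn` command is available and is shown only as one possible lookup.

```python
import subprocess


def os_groups_via_id(user):
    """Ask the operating system for a user's groups (empty if the user is unknown)."""
    result = subprocess.run(["id", "-Gn", user], capture_output=True, text=True)
    return result.stdout.split() if result.returncode == 0 else []


def missing_os_groups(harpoon_groups, os_groups_for):
    """Return {user: groups listed in Harpoon that the OS lookup does not report}."""
    gaps = {}
    for user, groups in harpoon_groups.items():
        missing = set(groups) - set(os_groups_for(user))
        if missing:
            gaps[user] = missing
    return gaps


# Example with a stand-in lookup instead of the real operating system.
fake_os = {"bob": ["analysts"], "carol": []}
harpoon = {"bob": ["analysts"], "carol": ["analysts"]}
print(missing_os_groups(harpoon, lambda user: fake_os.get(user, [])))
```

Passing the lookup as a parameter keeps the check independent of where groups actually come from, so the same function would work whether groups are resolved from the local system or from a shared LDAP/AD directory.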
As security is still a hot topic in the Hadoop world and many vendors are trying to come up with solutions, Eligotech is closely following the discussions around this topic to keep Eligotech Harpoon aligned with new developments.