Hortonworks. We do Hadoop. HDP with Advanced Security: Comprehensive Security for Enterprise Hadoop
Agenda Our approach across security pillars Component Deep Dive Questions
Security needs are changing: YARN unlocks the data lake. Multi-tenant: multiple applications for data access. A changing and complex compliance environment. ETL of non-sensitive data can yield sensitive data. Fall 2013: largely silo'd deployments with single-workload clusters. Summer 2014: 65% of clusters host multiple workloads. Five areas of security focus: Administration (central management & consistent security), Authentication (authenticate users and systems), Authorization (provision access to data), Audit (maintain a record of data access), Data Protection (protect data at rest and in motion).
Security in Hadoop with HDP + Argus (XA Secure): centralized security administration. Authentication (who am I / prove it): Kerberos in native Apache Hadoop; HTTP/REST API secured with Apache Knox Gateway. Authorization (restrict access to explicit data): HDFS permissions, HDFS ACLs, Hive ATZ-NG. Audit (understand who did what): audit logs in HDFS & MR. Data Protection (encrypt data at rest & in motion): wire encryption in Hadoop; open source initiatives; partner solutions. Argus in HDP 2.1: works as-is with current authentication methods; fine-grained access control (RBAC) for HDFS, Hive and HBase; centralized audit reporting; policy and access history; future integration.
Map to Nevada Energy requirements (requirement: HDP security component). End-user security / LDAP integration: Kerberos, Argus (XA). Group-level access: Argus (XA). Multiple levels of access; multiple environments. Developer security: access control for creating tables; limits on creating schemas and creating folders.
HDP w/ Advanced Security: security features. Authentication: Kerberos support ✔; perimeter security for services and REST APIs. Authorization: fine-grained access control for HDFS, HBase and Hive; role-based access control; column-level permissions; support for create, drop, index, lock, user. Auditing: resource access auditing; extensive auditing; policy auditing.
HDP w/ Advanced Security: security features (continued). Data Protection: wire encryption ✔; volume encryption and file/column encryption via partners. Reporting: global view of policies and audit data; manage user/group mapping; global policy manager with Web UI; delegated administration.
Authentication w/ Kerberos
Kerberos Primer. The client talks to the KDC via the Kerberos library; client-to-KDC traffic (orange line) is separate from client-to-HDFS traffic (green line), and HDFS services do not talk to the KDC in these steps.
1. kinit: log in and get a Ticket Granting Ticket (TGT) from the KDC.
2. Client stores the TGT in its Kerberos ticket cache.
3. Client uses the TGT to get a NameNode Service Ticket (NN-ST) from the KDC.
4. Client stores the NN-ST in its ticket cache.
5. Client reads/writes a file by presenting the NN-ST and file name to the NameNode; the NameNode returns block locations, block IDs and Block Access Tokens if access is permitted.
6. Client reads/writes blocks on DataNodes by presenting the Block Access Token and block ID.
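From the command line, this login flow looks roughly like the following. It is a sketch only: the realm and principal are illustrative, and it assumes a live KDC and a Kerberized HDFS.

```shell
# Step 1: authenticate to the KDC and obtain a TGT
kinit smith@EXAMPLE.COM

# Steps 2/4: inspect the ticket cache (the TGT, plus service tickets once acquired)
klist

# Steps 3-6 happen transparently inside the Hadoop client: it requests the
# NameNode service ticket, then talks to the NameNode and DataNodes
hdfs dfs -ls /user/smith

# Discard all cached tickets when done
kdestroy
```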
Kerberos Summary. Provides strong authentication: establishes identity for users, services and hosts; prevents impersonation of unauthorized accounts; supports a token delegation model; works with existing directory services. Strong authentication is the basis for authorization. Strong authentication = the password is never sent over the wire.
Hadoop Authentication. Users authenticate with services: CLI & API via Kerberos kinit or keytab; Web UIs via Kerberos SPNEGO or a custom plugin (e.g. SSO). Services authenticate with each other via prepopulated Kerberos keytabs (e.g. DN->NN, NM->RM). Services propagate the authenticated user identity as an authenticated trusted proxy (e.g. Oozie->RM as Kerberos service + user doas, Knox->WebHCat). Job tasks present the delegated user's identity/access via delegation tokens (e.g. job task -> NN, job task -> JT/RM). Strong authentication is the basis for authorization.
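The trusted-proxy pattern (e.g. Oozie submitting work on behalf of an end user) is enabled through Hadoop's proxyuser settings in core-site.xml. A minimal sketch; the host and group values are placeholder assumptions:

```xml
<!-- Allow the oozie service principal to impersonate end users -->
<property>
  <name>hadoop.proxyuser.oozie.hosts</name>
  <value>oozie-host.example.com</value>
</property>
<property>
  <name>hadoop.proxyuser.oozie.groups</name>
  <value>etl-users</value>
</property>
```

Restricting both the hosts a proxy may connect from and the groups it may impersonate keeps a compromised service account from acting as arbitrary users.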
User Management. Most customers use LDAP for user info: LDAP keeps user information consistent across the cluster and is an easy way to manage users & groups. The standard user-to-group mapping comes from the OS on the NameNode. Kerberos provides authentication; PAM can automatically log users into Kerberos.
Kerberos + Active Directory. Use existing directory tools to manage users; use Kerberos tools to manage host and service principals. A cross-realm trust links the AD/LDAP user store (users: smith@EXAMPLE.COM) to the cluster KDC (hosts: host1@HADOOP.EXAMPLE.COM; services: hdfs/host1@HADOOP.EXAMPLE.COM), so clients authenticate against AD but reach services in the Hadoop cluster.
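A cross-realm trust of this shape is typically expressed in each node's krb5.conf. A sketch using the realm names from the slide; the KDC hostnames are assumptions, and the trust additionally requires a shared krbtgt/HADOOP.EXAMPLE.COM@EXAMPLE.COM principal created in both realms:

```ini
[realms]
  HADOOP.EXAMPLE.COM = {
    kdc = kdc.hadoop.example.com
  }
  EXAMPLE.COM = {
    kdc = ad.example.com
  }

[domain_realm]
  .hadoop.example.com = HADOOP.EXAMPLE.COM

[capaths]
  # One-way trust: AD users (EXAMPLE.COM) reach cluster services directly
  EXAMPLE.COM = {
    HADOOP.EXAMPLE.COM = .
  }
```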
Groups. Define groups for each required role. Hadoop has a pluggable group-mapping interface; the user-to-group mapping is not stored within Hadoop. It defaults to the OS information on the master node, typically driven from LDAP on Linux. Existing plugins: ShellBasedUnixGroupsMapping (/bin/id), JniBasedUnixGroupsMapping (system call), LdapGroupsMapping (LDAP call), CompositeGroupsMapping (combines Unix & LDAP group mapping). Strong authentication and role-based groups provide protections that enable shared clusters.
(Diagram: when a client accesses HDFS, the NameNode's group-mapping plugin looks up the user's groups in the AD/LDAP user store before granting read/write access.)
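Selecting the LDAP plugin is a core-site.xml change. A minimal sketch; the server URL, bind user and search base are placeholder assumptions:

```xml
<property>
  <name>hadoop.security.group.mapping</name>
  <value>org.apache.hadoop.security.LdapGroupsMapping</value>
</property>
<property>
  <name>hadoop.security.group.mapping.ldap.url</name>
  <value>ldap://ad.example.com:389</value>
</property>
<property>
  <name>hadoop.security.group.mapping.ldap.bind.user</name>
  <value>cn=hadoop-bind,dc=example,dc=com</value>
</property>
<property>
  <name>hadoop.security.group.mapping.ldap.base</name>
  <value>dc=example,dc=com</value>
</property>
```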
Kerberos FAQ.
Where do I install the KDC? On a master-type node.
User provisioning? Hook up to corporate AD/LDAP to leverage existing user provisioning.
Growing a cluster? Provision new services and nodes in the MIT KDC and copy keytabs to the new nodes.
Is Kerberos a SPOF? Kerberos supports HA, and with delegation tokens the KDC load is reduced.
Knox Gateway Overview: Perimeter REST API Security
What does perimeter security really mean? A firewall is still required at the perimeter (today). The Knox Gateway controls all Hadoop REST API access through the firewall: the firewall only allows connections through specific ports from the Knox host, while the Hadoop cluster itself is mostly unaffected.
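From a REST client's perspective, all calls target the single Knox endpoint rather than individual service hosts. A sketch of a WebHDFS listing via Knox; the hostname, port, topology name ("default") and user are illustrative assumptions:

```shell
# All traffic goes to the one Knox endpoint over SSL; Knox authenticates
# the user (e.g. against LDAP) and proxies the call to WebHDFS inside
curl -ik -u smith \
  "https://knox.example.com:8443/gateway/default/webhdfs/v1/tmp?op=LISTSTATUS"
```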
Why Knox? Enhanced security: protect network details; partial SSL for non-SSL services; WebApp vulnerability filter. Centralized control: central REST API auditing; service-level authorization; alternative to an SSH "edge node". Simplified access: Kerberos encapsulation; extends API reach; single access point; multi-cluster support; single SSL certificate. Enterprise integration: LDAP integration; Active Directory integration; SSO integration; Apache Shiro extensibility; custom extensibility.
Current Hadoop Client Model. FileSystem and MapReduce Java APIs; HDFS, Pig, Hive and Oozie clients (that wrap the Java APIs). Typical use of these APIs is via an "edge node" that is "inside" the cluster: users SSH to the edge node and execute API commands from a shell.
Hadoop REST APIs: useful for connecting to Hadoop from outside the cluster.
WebHDFS: HDFS user operations including reading files, writing to files, making directories, changing permissions and renaming.
WebHCat: job control for MapReduce, Pig and Hive jobs, and HCatalog DDL commands.
Hive: Hive REST API operations; JDBC/ODBC over HTTP.
HBase: HBase REST API operations.
Oozie: job submission and management, and Oozie administration.
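All WebHDFS calls share one URL pattern, which makes them easy to script. A small sketch that builds request URLs for the operations above; the host, port and user are illustrative assumptions:

```python
from urllib.parse import urlencode

def webhdfs_url(host, port, path, op, user, **params):
    """Build a WebHDFS REST URL: http://<host>:<port>/webhdfs/v1<path>?op=<OP>&..."""
    query = urlencode({"op": op, "user.name": user, **params})
    return f"http://{host}:{port}/webhdfs/v1{path}?{query}"

# Read a file (the NameNode answers with a redirect to a DataNode)
print(webhdfs_url("nn.example.com", 50070, "/tmp/data.txt", "OPEN", "smith"))

# Make a directory with explicit permissions
print(webhdfs_url("nn.example.com", 50070, "/tmp/newdir", "MKDIRS", "smith", permission="755"))
```

With Knox in front of the cluster, the same query parameters apply; only the scheme, host and path prefix change to the gateway's.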
Hadoop REST API Security: Drill-Down. REST clients connect over HTTP through a load balancer to Knox Gateway instances in the DMZ; Knox authenticates users against the enterprise LDAP/AD identity provider and forwards requests through the inner firewall to the Hadoop clusters (masters: NN, RM, HBase, Oozie, WebHCat, HS2; slaves: DN, NM). Hadoop CLIs on an edge node still use RPC directly. Note: the arrows to the Hadoop clusters are a simplification; in reality there is one arrow per port open between Knox and each Hadoop service it supports (WebHDFS, WebHCat, HiveServer2, HBase, Oozie), with more in the future.
Authorization and Auditing
Authorization and Audit. Fine-grained access control: HDFS (folder, file); Hive (database, table, column); HBase (table, column family, column). Audit: extensive user access auditing in HDFS, Hive and HBase, recording IP address, resource type/resource, timestamp, and whether access was granted or denied. Flexibility in defining policies; control access into the system.
Central Security Administration HDP Advanced Security Delivers a ‘single pane of glass’ for the security administrator Centralizes administration of security policy Ensures consistent coverage across the entire Hadoop stack
Setup Authorization Policies: file-level access control with flexible policy definition; control permissions.
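Alongside centrally managed policies, HDFS itself supports file-level ACLs (listed earlier among HDP's authorization features). A CLI sketch with illustrative paths and principals; it assumes ACLs are enabled on the NameNode (dfs.namenode.acls.enabled=true):

```shell
# Grant read/execute on a folder to the 'analysts' group beyond the base permissions
hdfs dfs -setfacl -m group:analysts:r-x /data/sales

# Give one extra user read access to a single file
hdfs dfs -setfacl -m user:smith:r-- /data/sales/q2.csv

# Inspect the resulting ACL
hdfs dfs -getfacl /data/sales
```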
Monitor through Auditing
Authorization and Auditing w/ XA. The XA Administration Portal (used by enterprise users and legacy tools, backed by an RDBMS) fronts the XA Policy Server and XA Audit Server, which connect through an integration API to plugins in the Hadoop components: HDFS XA Plugin*, HBase XA Plugin, Knox XA Plugin*, Storm XA Plugin, Hive Server2 XA Plugin*, Falcon XA Plugin (* = future integration), all on YARN, the data operating system.
Simplified Workflow - Hive.
1. Admin sets policies for Hive databases/tables/columns in the XA Policy Manager.
2. IT users access Hive via the beeline command-line tool; user applications access Hive data using JDBC/ODBC against HiveServer2.
3. Hive authorizes the request with the XA Agent.
4. HiveServer2 provides data access to the users.
5. Audit logs are pushed to the audit database.
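Step 2 from the client side: connecting with beeline over JDBC. The host, port and Kerberos principal below are illustrative assumptions:

```shell
# JDBC to HiveServer2; with Kerberos, the HS2 service principal is part of the URL
beeline -u "jdbc:hive2://hs2.example.com:10000/default;principal=hive/_HOST@EXAMPLE.COM"

# Once connected, queries run under the caller's identity and are authorized
# table-by-table and column-by-column against the configured policies, e.g.:
#   SELECT customer_id FROM sales LIMIT 10;
```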
Data Protection. HDP allows you to apply data protection policy at three different layers across the Hadoop stack. Storage (encrypt data while it is at rest): partners, OS-level encryption, custom code. Transmission (encrypt data as it moves): supported in HDP 2.1. Upon access (apply restrictions when data is accessed): partners, open source initiatives.
Points of Communication between clients and the Hadoop cluster, and between nodes: 1. WebHDFS; 2. DataTransferProtocol (data transfer); 3. RPC; 4. JDBC/ODBC and M/R shuffle.
Data Transmission Protection in HDP 2.1. WebHDFS (read/write access to HDFS): authenticated using a SPNEGO (Kerberos for HTTP) filter; optionally enable HTTPS for SSL-based wire encryption. RPC (communications between NNs, DNs, etc. and clients): SASL-based wire encryption. DataTransferProtocol: DTP encryption with SASL. JDBC/ODBC: SASL-based encryption also available. Shuffle: mapper to reducer over HTTP(S) with SSL.
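These protections map to a handful of Hadoop configuration properties. A sketch of the relevant settings; the values shown are one hardened choice, not the only valid one:

```xml
<!-- core-site.xml: protect Hadoop RPC with SASL ("privacy" = auth + integrity + encryption) -->
<property>
  <name>hadoop.rpc.protection</name>
  <value>privacy</value>
</property>

<!-- hdfs-site.xml: encrypt the DataNode block-transfer protocol (DTP) -->
<property>
  <name>dfs.encrypt.data.transfer</name>
  <value>true</value>
</property>

<!-- hdfs-site.xml: serve WebHDFS and the web UIs over HTTPS only -->
<property>
  <name>dfs.http.policy</name>
  <value>HTTPS_ONLY</value>
</property>

<!-- mapred-site.xml: run the MapReduce shuffle over SSL -->
<property>
  <name>mapreduce.shuffle.ssl.enabled</name>
  <value>true</value>
</property>
```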
Data Storage Protection. Options: encrypt at the physical file system level (e.g. dm-crypt); encrypt via a custom HDFS "compression" codec; encrypt at the application level, including via a partner security service/device. (Diagram: an ETL app sends plaintext "ABC" through the partner security service, which encrypts it before it lands in HDFS and decrypts it on read, so HDFS stores only ciphertext.)
Current Open Source Initiatives. HDFS encryption: transparent encryption of data at rest in HDFS via encryption zones; being worked on in the community; depends on a Key Management Server (with a Key Provider API) and a key shell for command-line key operations. Hive column-level encryption. HBase column-level encryption: transparent column encryption, needs more testing/validation.
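In the releases where this community work landed, creating an encryption zone looks roughly like the following. The key and zone names are illustrative, and it assumes a running Key Management Server:

```shell
# Create an encryption key in the Key Management Server
hadoop key create sales-key

# Create an empty directory and make it an encryption zone tied to that key
hdfs dfs -mkdir /secure/sales
hdfs crypto -createZone -keyName sales-key -path /secure/sales

# Files written under the zone are transparently encrypted on write and
# decrypted on read for authorized users
hdfs crypto -listZones
```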
Resources
Security Page
Hortonworks Security Investment Plans: HDP + XA, comprehensive security for enterprise Hadoop. Goals: meet all security requirements across authentication, authorization, audit & data protection for all HDP components, all IN Hadoop, with consistent integration into other security & identity management systems for compliance with IT policies. Previous phases (delivered): Kerberos authentication; HDFS, Hive & HBase authorization; wire encryption for data in motion; Knox for perimeter security; basic audit in HDFS & MR; SQL-style Hive authorization; ACLs for HDFS. XA Secure phase (delivered): centralized security administration for HDFS, Hive & HBase; centralized audit reporting; delegated policy administration; one location for administering security policies and audit reporting for the entire platform. Future phases: encryption in HDFS, Hive & HBase; centralized security administration of the entire Hadoop platform; centralized auditing of the entire platform; expanded authentication & SSO integration choices; tag-based global policies (e.g. a policy for PII).