Configuring a secure, multitenant cluster for the enterprise James Kinley // Principal Solutions Architect.

2 © 2014 Cloudera, Inc. All rights reserved.
About me
- James Kinley
- Principal Solutions Architect, EMEA
- Hadoop user since 2010
- Clouderan since 2012
- Background in UK defence industry and cyber security
- github.com/jrkinley
- uk.linkedin.com/in/jameskinley

Introduction: Data Hub Objectives
- Sharing data: better insight
- Sharing compute: better utilisation and performance
- Consolidated operations: reduced cost and complexity

Multitenancy in Hadoop refers to a set of features that enable multiple groups within the same organisation to share a common set of cluster resources without negatively impacting service levels, violating security constraints, or even revealing the existence of each other, all via policy rather than physical separation.

Multitenant Cluster Architecture
- Security & Governance
  - HDFS Information Architecture (IA)
  - Authentication
  - Authorisation
  - Auditing
  - Quota management
- Resource Isolation & Management
  - Static partitioning
  - Dynamic partitioning
  - Impala admission control

Security & Governance
- HDFS Information Architecture: file and directory structure
- Authentication: proves users are who they say they are [Kerberos, Identity Management (LDAP)]
- Authorisation: determines what users can see and do [HDFS Permissions, RBAC (Apache Sentry), Encryption]
- Auditing: determines who did what, and when [Cloudera Navigator]

Security & Governance
HDFS Information Architecture (IA)
drwxr-x---+  tadmin   tgroup  /users/{tenantId}
drwxr-x---   tadmin   tgroup  /users/{tenantId}/archive
drwxrwx---+  tadmin   hive    /users/{tenantId}/warehouse
drwxrwx---+  tadmin   hive    /users/{tenantId}/warehouse/{db}/{table}/{partition}
drwxr-x---+  tadmin   tgroup  /users/{tenantId}/landing
drwxrwx---   tadmin   tgroup  /users/{tenantId}/processing
drwxr-x---   {tuser}  tgroup  /users/{tenantId}/processing/{jobId}
drwxr-x---   {tuser}  tgroup  /users/{tenantId}/processing/{jobId}/input
drwxr-x---   {tuser}  tgroup  /users/{tenantId}/processing/{jobId}/output
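The layout above could be provisioned per tenant with ordinary `hdfs dfs` commands. The following is a minimal sketch, not the speaker's tooling: it only builds the command lines (`-mkdir -p`, `-chown`, `-chmod`) for the fixed, non-templated directories, and the function names `mode_to_octal` and `layout_commands` and the tenant id `acme` are assumptions made for illustration.

```python
from typing import List, Tuple

# Entries copied from the slide: (ls-style mode, owner, group, path template).
# Only the directories without {db}/{jobId} placeholders are included here.
LAYOUT: List[Tuple[str, str, str, str]] = [
    ("drwxr-x---+", "tadmin", "tgroup", "/users/{tenantId}"),
    ("drwxr-x---", "tadmin", "tgroup", "/users/{tenantId}/archive"),
    ("drwxrwx---+", "tadmin", "hive", "/users/{tenantId}/warehouse"),
    ("drwxr-x---+", "tadmin", "tgroup", "/users/{tenantId}/landing"),
    ("drwxrwx---", "tadmin", "tgroup", "/users/{tenantId}/processing"),
]


def mode_to_octal(mode: str) -> str:
    """Convert an ls-style mode such as 'drwxr-x---+' to octal digits such as '750'."""
    bits = mode[1:10]  # skip the file-type character; a trailing '+' (ACL marker) is ignored
    digits = []
    for i in range(0, 9, 3):
        r, w, x = bits[i:i + 3]
        digits.append(str((4 if r == "r" else 0) + (2 if w == "w" else 0) + (1 if x == "x" else 0)))
    return "".join(digits)


def layout_commands(tenant_id: str) -> List[List[str]]:
    """Return the hdfs commands (as argument lists) that would create the tenant layout."""
    commands = []
    for mode, owner, group, template in LAYOUT:
        path = template.replace("{tenantId}", tenant_id)
        commands.append(["hdfs", "dfs", "-mkdir", "-p", path])
        commands.append(["hdfs", "dfs", "-chown", f"{owner}:{group}", path])
        commands.append(["hdfs", "dfs", "-chmod", mode_to_octal(mode), path])
    return commands
```

Each returned list can be passed to `subprocess.run` by an HDFS superuser; the extended ACL entries signalled by the `+` marker have to be added separately with `-setfacl`.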

Security & Governance
Authentication: Kerberos & LDAP
[HDFS directory layout repeated from the HDFS Information Architecture (IA) slide]

Security & Governance
Authorisation: HDFS permissions
[HDFS directory layout repeated from the HDFS Information Architecture (IA) slide]

Security & Governance
Authorisation: HDFS extended ACLs
[HDFS directory layout repeated from the HDFS Information Architecture (IA) slide]
Give “tingest” user permission over the landing directory:
$ hdfs dfs -setfacl -m user:tingest:rwx /users/{tenantId}/landing
Give “hive” group permission over the landing directory:
$ hdfs dfs -setfacl -m group:hive:rwx /users/{tenantId}/landing
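To see what the two ACL entries change, it helps to model how an access check walks the entries. The sketch below is a simplified model of the POSIX-style evaluation order described in the HDFS permissions guide (owner, named user, owning group and named groups, then other), not HDFS code. The `Acl` class, `check_access` function, and the users `alice`, `bob`, and `carol` are hypothetical, and the mask is assumed to be `rwx` after the `-setfacl -m` calls.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Acl:
    owner: str
    group: str
    owner_perms: str  # e.g. "rwx"
    group_perms: str  # e.g. "r-x"
    other_perms: str  # e.g. "---"
    named_users: Dict[str, str] = field(default_factory=dict)
    named_groups: Dict[str, str] = field(default_factory=dict)
    mask: str = "rwx"  # assumption: mask left fully open


def _masked(perms: str, mask: str) -> str:
    """Keep only the permission letters that the mask also allows."""
    return "".join(ch for ch in perms if ch != "-" and ch in mask)


def _allows(perms: str, wanted: str) -> bool:
    return all(ch in perms for ch in wanted)


def check_access(acl: Acl, user: str, groups: Set[str], wanted: str) -> bool:
    """Return True if `user` (member of `groups`) is granted all letters in `wanted`."""
    if user == acl.owner:
        return _allows(acl.owner_perms, wanted)
    if user in acl.named_users:
        return _allows(_masked(acl.named_users[user], acl.mask), wanted)
    matched_group = False
    if acl.group in groups:
        matched_group = True
        if _allows(_masked(acl.group_perms, acl.mask), wanted):
            return True
    for name, perms in acl.named_groups.items():
        if name in groups:
            matched_group = True
            if _allows(_masked(perms, acl.mask), wanted):
                return True
    if matched_group:
        return False  # a group matched but none granted access: 'other' is not consulted
    return _allows(acl.other_perms, wanted)


# The landing directory from the slide, with the two entries added by -setfacl.
landing = Acl(
    owner="tadmin", group="tgroup",
    owner_perms="rwx", group_perms="r-x", other_perms="---",
    named_users={"tingest": "rwx"},
    named_groups={"hive": "rwx"},
)
```

In this model `tingest` and members of `hive` can write to the landing directory, ordinary `tgroup` members can only read and list it, and everyone else is denied.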

Security & Governance
Authorisation: Apache Sentry (incubating)
- Fine-grained, role-based access control (RBAC)
- Users can see only the data and metadata for which they have been granted privileges
- Currently works with Apache Hive, Cloudera Impala, and Cloudera Search
- File or Service (GRANT/REVOKE) based policy providers
- Role-based privilege model
  - {user} > {groups} > {roles} > object > privilege
  - object = {server, database, table, URI}
  - privilege = {select, insert, all}
- Supports grant permission delegation for multitenant clusters
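The role-based privilege chain above ({user} > {groups} > {roles} > object > privilege) can be sketched as a simple lookup. This is an illustrative model only, not Sentry's implementation or API: the users, groups, roles, and the `is_allowed` function are all hypothetical, and the rule that a privilege on a database also covers its tables is a simplifying assumption about the object hierarchy.

```python
from typing import Dict, List, Set, Tuple

ObjectPath = Tuple[str, ...]           # e.g. ("server1", "sales", "orders")
Privilege = Tuple[ObjectPath, str]     # (object, action) with action in {select, insert, all}

# Hypothetical tenant data, for illustration only.
USER_GROUPS: Dict[str, Set[str]] = {
    "alice": {"analysts"},
    "bob": {"tenant_admins"},
}
GROUP_ROLES: Dict[str, Set[str]] = {
    "analysts": {"sales_reader"},
    "tenant_admins": {"sales_admin"},
}
ROLE_PRIVILEGES: Dict[str, List[Privilege]] = {
    "sales_reader": [(("server1", "sales"), "select")],
    "sales_admin": [(("server1", "sales"), "all")],
}


def is_allowed(user: str, obj: ObjectPath, action: str) -> bool:
    """Walk user -> groups -> roles -> privileges and look for a matching grant."""
    for group in USER_GROUPS.get(user, set()):
        for role in GROUP_ROLES.get(group, set()):
            for granted_obj, granted_action in ROLE_PRIVILEGES.get(role, []):
                # Assumption: a grant on a parent object covers everything beneath it.
                covers = obj[:len(granted_obj)] == granted_obj
                if covers and granted_action in (action, "all"):
                    return True
    return False
```

Note that privileges attach to roles and roles to groups, never directly to users, which is why moving a user between groups is enough to change what they can see.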

Security & Governance
Authorisation: Apache Sentry (incubating)
[HDFS directory layout repeated from the HDFS Information Architecture (IA) slide]
Delegate grant and revoke privilege to tenant’s admin role:
> GRANT ALL ON DATABASE {db} TO ROLE {tadmin} WITH GRANT OPTION;

Security & Governance
Authorisation: Encryption
- Network encryption (HDFS and MR)
- At-rest encryption for HDFS
  - Cloudera Navigator Encrypt & KeyTrustee (Gazzang)
  - Project Rhino (Cloudera + Intel)
    - HDFS-level encryption (HDFS HADOOP-10150)
    - Encryption zones (HDFS-6386)
    - Hardware-accelerated (HADOOP-10693)

Security & Governance
Authorisation: HDFS encryption zone
[HDFS directory layout repeated from the HDFS Information Architecture (IA) slide]

Security & Governance
Governance: HDFS disk quota management
- Restrict tenants' use of storage
- Prevents misuse of the shared filesystem
- HDFS supports two quota mechanisms
  - Disk space quotas
  - Name quotas
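Both quota types are set per directory with `hdfs dfsadmin`. The sketch below builds the `-setQuota` (name quota) and `-setSpaceQuota` (disk space quota) command lines for a tenant directory. Because a space quota is charged for every replica, the helper multiplies the logical allowance by the replication factor. The function names, the 10 TB and one-million-name figures, and the replication factor of 3 are illustrative assumptions.

```python
from typing import List


def space_quota_bytes(logical_tb: float, replication: int = 3) -> int:
    """Raw bytes to allow so that `logical_tb` terabytes of data fit at the given replication."""
    return int(logical_tb * 1024 ** 4 * replication)


def quota_commands(path: str, name_quota: int, space_quota: int) -> List[List[str]]:
    """Return the dfsadmin commands (as argument lists) that set both quotas on `path`."""
    return [
        ["hdfs", "dfsadmin", "-setQuota", str(name_quota), path],
        ["hdfs", "dfsadmin", "-setSpaceQuota", str(space_quota), path],
    ]


# Hypothetical tenant "acme": 10 TB of logical data and at most one million names.
acme_quota_commands = quota_commands(
    "/users/acme",
    name_quota=1_000_000,
    space_quota=space_quota_bytes(10),
)
```

Current usage against both quotas can then be inspected with `hdfs dfs -count -q /users/acme`.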

Security & Governance
Governance: HDFS disk quota management
[HDFS directory layout repeated from the HDFS Information Architecture (IA) slide]

Resource Isolation & Management
Dividing up finite cluster resource to ensure predictable behaviour
Goals:
- Guarantee service levels for critical workflows
- Support fair allocation of resources between different groups of users
- Prevent users from depriving other users of access to the cluster

Resource Isolation & Management
Static partitioning
- Static service pools
- Statically partition resource for HBase, HDFS, Impala, Search, and YARN
- Enforced by Linux cgroups

Resource Isolation & Management
Dynamic partitioning
- Dynamic resource pools
- Dynamically apportion resource [statically] allocated to Impala and YARN
- Named pool of resource + scheduling policy
- Resource allocation based on weight
- User to pool placement policy
- ACLs
- SLOs (use of pre-emption)
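"Resource allocation based on weight" can be made concrete with a little arithmetic: when every pool has pending work, each pool's share is proportional to its weight. The sketch below shows only that steady-state case. A real scheduler also accounts for demand, minimum and maximum limits, and pre-emption, and the pool names and the `weighted_shares` function are hypothetical.

```python
from typing import Dict


def weighted_shares(total_mb: int, weights: Dict[str, float]) -> Dict[str, int]:
    """Split `total_mb` between pools in proportion to their weights (all pools busy)."""
    total_weight = sum(weights.values())
    return {pool: int(total_mb * weight / total_weight) for pool, weight in weights.items()}


# Hypothetical pools: 'prod' carries three times the weight of 'adhoc'.
example_shares = weighted_shares(1000, {"adhoc": 1, "prod": 3})
```

When a pool is idle, its unused share is available to the others, which is what separates dynamic pools from static partitioning.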

Resource Isolation & Management
Impala admission control
- Limits concurrent queries and memory usage
- Additional queries are queued
- Configured per pool
  - max_requests
  - mem_limit
  - max_queued
- Avoids resource oversubscription (OOM) during heavy usage
- Pool placement policy mechanism same as YARN RM
- Use with static partitioning (independently from YARN)
- Or integrate with YARN for resource management via Llama
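The interaction of the three per-pool settings can be illustrated with a toy model: a query is admitted if the pool is under both its concurrency and memory limits, queued if not, and rejected once the queue is full. This is a sketch of the idea rather than Impala's actual algorithm. The `Pool` class and its methods are hypothetical, and only the field names echo the `max_requests`, `mem_limit`, and `max_queued` settings from the slide.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List


@dataclass
class Pool:
    max_requests: int   # maximum concurrently running queries
    mem_limit_mb: int   # maximum total memory of running queries
    max_queued: int     # maximum number of waiting queries
    running: List[int] = field(default_factory=list)    # memory estimates of running queries
    queued: Deque[int] = field(default_factory=deque)   # memory estimates of waiting queries

    def _fits(self, mem_mb: int) -> bool:
        under_count = len(self.running) < self.max_requests
        under_memory = sum(self.running) + mem_mb <= self.mem_limit_mb
        return under_count and under_memory

    def submit(self, mem_mb: int) -> str:
        """Return 'admitted', 'queued', or 'rejected' for a query needing `mem_mb`."""
        if not self.queued and self._fits(mem_mb):
            self.running.append(mem_mb)
            return "admitted"
        if len(self.queued) < self.max_queued:
            self.queued.append(mem_mb)
            return "queued"
        return "rejected"

    def finish(self, mem_mb: int) -> None:
        """Release a finished query and admit waiting queries in FIFO order."""
        self.running.remove(mem_mb)
        while self.queued and self._fits(self.queued[0]):
            self.running.append(self.queued.popleft())
```

Queueing instead of over-admitting is what keeps the pool from oversubscribing memory during heavy usage.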

Resource Isolation & Management
Classification
- User to pool placement rules
- Based on user, group, or specified tag:
  - MR: mapreduce.job.queuename
  - Impala: REQUEST_POOL
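Placement rules are typically evaluated in order until one yields an existing pool. The sketch below models that with four rules: the pool named in the request (for example via `mapreduce.job.queuename` or `REQUEST_POOL`), the user's primary group, the user name, and a catch-all default. The rule names, their order, and the `place` function are assumptions for illustration, and only declared pools are ever matched.

```python
from typing import Optional, Sequence, Set


def place(
    user: str,
    primary_group: str,
    requested_pool: Optional[str],
    declared_pools: Set[str],
    rules: Sequence[str] = ("specified", "primary_group", "user", "default"),
) -> Optional[str]:
    """Return the first declared pool selected by the ordered rules, or None if none match."""
    candidates = {
        "specified": requested_pool,     # pool named explicitly in the job or query
        "primary_group": primary_group,  # pool named after the user's primary group
        "user": user,                    # pool named after the user
        "default": "default",            # catch-all pool, if it has been declared
    }
    for rule in rules:
        pool = candidates[rule]
        if pool is not None and pool in declared_pools:
            return pool
    return None
```

Returning `None` corresponds to rejecting the submission, which is what happens in this model when the default pool has not been declared.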

Resource Isolation & Management
Queues
- YARN
  - Max running apps
  - Max memory
  - Max vcores
- Impala admission control
  - Max running queries
  - Max memory
  - Max queue size

Resource Isolation & Management
Dynamic resource pools
- Scheduling policy
  - Dominant Resource Fairness (DRF)
  - Fair Scheduler (FAIR)
  - First-in, First-out (FIFO)
- Recommendations:
  - Disable undeclared pools
  - Enable the default pool
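Of the three scheduling policies, DRF is the least self-explanatory. A pool's dominant share is its largest fractional use of any single resource, and the next task goes to whichever pool currently has the smallest dominant share. The sketch below is a simplified progressive-filling loop, not the YARN scheduler: if the lowest-share pool's task no longer fits, it tries the next pool before giving up. All names and the cluster size are illustrative, and every demand is assumed to be strictly positive in at least one resource.

```python
from typing import Dict

Resources = Dict[str, float]


def dominant_share(usage: Resources, capacity: Resources) -> float:
    """Largest fraction of any one resource that this pool currently holds."""
    return max(usage[r] / capacity[r] for r in capacity)


def drf_allocate(capacity: Resources, demands: Dict[str, Resources]) -> Dict[str, int]:
    """Hand out tasks one at a time, always favouring the smallest dominant share."""
    used = {r: 0.0 for r in capacity}
    usage = {pool: {r: 0.0 for r in capacity} for pool in demands}
    tasks = {pool: 0 for pool in demands}
    while True:
        # Order pools by dominant share, breaking ties by name for determinism.
        ordered = sorted(demands, key=lambda p: (dominant_share(usage[p], capacity), p))
        for pool in ordered:
            need = demands[pool]
            if all(used[r] + need[r] <= capacity[r] for r in capacity):
                for r in capacity:
                    used[r] += need[r]
                    usage[pool][r] += need[r]
                tasks[pool] += 1
                break
        else:
            return tasks  # no pool's next task fits: the cluster is full


# Hypothetical cluster: 9 vcores and 18 GB of memory shared by two pools.
example_tasks = drf_allocate(
    {"vcores": 9, "mem_gb": 18},
    {"A": {"vcores": 1, "mem_gb": 4}, "B": {"vcores": 3, "mem_gb": 1}},
)
```

In this example pool A is memory-bound and pool B is CPU-bound, and both end up holding two thirds of their dominant resource, which is the sense in which the allocation is fair.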

Thank you.