How to Protect Big Data in a Containerized Environment


How to Protect Big Data in a Containerized Environment. Thomas Phelan, Chief Architect, BlueData (@tapbluedata)

Outline: Securing a Big Data Environment; Data Protection; Transparent Data Encryption; Transparent Data Encryption in a Containerized Environment; Takeaways.

In the Beginning … Hadoop was used to process public web data. No compelling need for security; no user or service authentication; no data security.

Then Hadoop Became Popular: security is important.

Layers of Security in Hadoop: Access; Authentication; Authorization; Data Protection; Auditing; Policy (protect from human error).

Hadoop Security: Data Protection. Reference: https://www.cloudera.com/documentation/enterprise/5-6-x/topics/sg_edh_overview.html

Focus on Data Security: Confidentiality, Integrity, Availability. Confidentiality is lost when data is accessed by someone not authorized to do so. Integrity is lost when data is modified in unexpected ways. Availability is lost when data is erased or becomes inaccessible. Reference: https://www.us-cert.gov/sites/default/files/publications/infosecuritybasics.pdf

Hadoop Distributed File System (HDFS) Data Security Features: Access Control; Data Encryption; Data Replication.

Access Control: Simple or Kerberos. Simple: identity is determined by the host operating system. Kerberos: identity is determined by Kerberos credentials; one realm for both compute and storage; required for HDFS Transparent Data Encryption.

Data Encryption: transforming data.

Data Replication: 3-way replication can survive any 2 failures. Erasure Coding can survive more than 2 failures, depending on the parity bit configuration.
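
The trade-off on this slide can be put into numbers. The sketch below is an illustration only: the `Scheme` class and its field names are made up for this example, and the Reed-Solomon layout with 6 data cells and 3 parity cells is assumed here as one common parity configuration, not something the slides specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scheme:
    """Hypothetical model of a redundancy scheme (not an HDFS API)."""
    name: str
    data_units: int       # units that carry the original data
    redundant_units: int  # extra replicas or parity units

    @property
    def failures_tolerated(self) -> int:
        # Any `redundant_units` units can be lost without losing data.
        return self.redundant_units

    @property
    def storage_overhead(self) -> float:
        # Extra storage consumed, relative to the size of the data itself.
        return self.redundant_units / self.data_units


# 3-way replication: one original block plus two extra copies.
three_way = Scheme("3-way replication", data_units=1, redundant_units=2)

# Assumed parity configuration: Reed-Solomon with 6 data and 3 parity cells.
rs_6_3 = Scheme("Erasure coding RS(6,3)", data_units=6, redundant_units=3)

for scheme in (three_way, rs_6_3):
    print(
        f"{scheme.name}: survives {scheme.failures_tolerated} failures, "
        f"{scheme.storage_overhead:.0%} extra storage"
    )
```

Under these assumptions, erasure coding tolerates one more failure than 3-way replication while using a quarter of the extra storage.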

HDFS with End-to-End Encryption. Confidentiality: Data Access. Integrity: Data Access + Data Encryption. Availability: Data Access + Data Replication.

Data Encryption: How to transform the data? [Figure: cleartext 101011100010010001011100010100011101010101010100011101010101110 is transformed into ciphertext, drawn as a row of X characters.]

Data Encryption – At Rest: data is encrypted while on persistent media (disk).

Data Encryption – In Transit: data is encrypted while traveling over the network.

The Whole Process Ciphertext

HDFS Transparent Data Encryption (TDE). End-to-end encryption: data is encrypted/decrypted at the client; data is protected at rest and in transit. Transparent: no application-level code changes required.

HDFS TDE – Design Goals: Only an authorized client/user can access cleartext. HDFS never stores cleartext or unencrypted data encryption keys.

HDFS TDE – Terminology: Encryption Zone. A directory whose file contents will be encrypted upon write and decrypted upon read. An EZKEY is generated for each zone.

HDFS TDE – Terminology. EZKEY: encryption zone key. DEK: data encryption key. EDEK: encrypted data encryption key.

HDFS TDE - Data Encryption: The same key is used to encrypt and decrypt data. The size of the ciphertext is exactly the same as the size of the original cleartext. EZKEY + DEK => EDEK. EDEK + EZKEY => DEK.
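
The two properties on this slide (a symmetric key, and ciphertext the same size as the cleartext) plus the EZKEY/DEK/EDEK relationship can be sketched in a few lines. This is a toy illustration, not the HDFS implementation: `keystream_xor` is a hypothetical stand-in built from SHA-256 so the example runs with only the standard library, whereas HDFS TDE uses AES in CTR mode. Do not use this construction to protect real data.

```python
import hashlib
import secrets


def keystream_xor(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Toy counter-mode stream cipher (stand-in for AES-CTR, illustration only).

    XORing with a keystream means the same call both encrypts and decrypts,
    and the output is always exactly as long as the input.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        block = hashlib.sha256(key + iv + counter).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)


# Hypothetical key material; in HDFS TDE these are created by the KMS.
ezkey = secrets.token_bytes(16)    # encryption zone key
dek = secrets.token_bytes(16)      # data encryption key for one file
key_iv = secrets.token_bytes(16)

# EZKEY + DEK => EDEK
edek = keystream_xor(ezkey, key_iv, dek)
# EDEK + EZKEY => DEK
recovered_dek = keystream_xor(ezkey, key_iv, edek)

# The DEK encrypts the file contents; the same key decrypts them.
cleartext = b"row1,alice,42\nrow2,bob,17\n"
file_iv = secrets.token_bytes(16)
ciphertext = keystream_xor(dek, file_iv, cleartext)
decrypted = keystream_xor(recovered_dek, file_iv, ciphertext)

print(len(cleartext), len(ciphertext))  # the two lengths are equal
print(decrypted == cleartext)
```

Only the EDEK needs to be stored next to the file; without the EZKEY it reveals nothing about the DEK.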

HDFS TDE - Services: HDFS NameNode (NN); Kerberos Key Distribution Center (KDC); Hadoop Key Management Server (KMS); Key Trustee Server.

HDFS TDE – Security Concepts: Division of Labor. The KMS creates the EZKEY and DEK. The KMS encrypts/decrypts the DEK/EDEK using the EZKEY. The HDFS NN communicates with the KMS to create EZKEYs and EDEKs to store in the extended attributes in the encryption zone. The HDFS client communicates with the KMS to get the DEK using the EZKEY and EDEK.

HDFS TDE – Security Concepts: The name of the EZKEY is stored in the HDFS extended attributes of the directory associated with the encryption zone. The EDEK is stored in the HDFS extended attributes of the file in the encryption zone. Commands: $ hadoop key …; $ hdfs crypto …

HDFS Examples. Simplified for the sake of clarity: Kerberos actions not shown; NameNode EDEK cache not shown.

HDFS – Create Encryption Zone. [Diagram] Step 3: Create EZKEY. Labels: /encrypted_dir xattr: EZKEYNAME; EZKEYNAME = KEY.

HDFS – Create Encrypted File. [Diagram] Steps: 1. Create file. 2. Create EDEK. 3. Create EDEK. 4. Store EDEK. 5. Return success. Labels: /encrypted_dir/file, encrypted data; /encrypted_dir/file xattr: EDEK.

HDFS TDE – File Write Work Flow. [Diagram] Steps: 3. Request DEK from EDEK & EZKEYNAME. 4. Decrypt DEK from EDEK. 5. Return DEK. Labels: /encrypted_dir/file xattr: EDEK; read unencrypted data; /encrypted_dir/file, write encrypted data.

HDFS TDE – File Read Work Flow. [Diagram] Steps: 3. Request DEK from EDEK & EZKEYNAME. 4. Decrypt DEK from EDEK. 5. Return DEK. Labels: /encrypted_dir/file xattr: EDEK; /encrypted_dir/file, read encrypted data; write unencrypted data.
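
The create-zone, create-file, write, and read flows from the preceding slides can be walked through end to end in a small simulation. Everything here is a hypothetical sketch: the `KMS`, `NameNode`, and `Client` classes and their method names are invented for illustration and are not Hadoop APIs, `toy_cipher` is a SHA-256 stand-in for AES-CTR, a plain dict plays the role of DataNode storage, and, as on the slides, Kerberos and the NameNode EDEK cache are left out.

```python
import hashlib
import secrets


def toy_cipher(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Toy symmetric stream cipher (stand-in for AES-CTR, illustration only)."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        block = hashlib.sha256(key + iv + counter).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], block))
    return bytes(out)


class KMS:
    """Holds EZKEYs; they never leave this object."""

    def __init__(self):
        self._ezkeys = {}

    def create_ezkey(self, name):
        self._ezkeys[name] = secrets.token_bytes(16)

    def create_edek(self, ezkey_name):
        # Create a fresh DEK and hand back only its encrypted form.
        dek, iv = secrets.token_bytes(16), secrets.token_bytes(16)
        return iv, toy_cipher(self._ezkeys[ezkey_name], iv, dek)

    def decrypt_edek(self, ezkey_name, iv, edek):
        return toy_cipher(self._ezkeys[ezkey_name], iv, edek)


class NameNode:
    """Stores only key names and EDEKs in (simulated) extended attributes."""

    def __init__(self, kms):
        self.kms = kms
        self.dir_xattrs = {}   # zone directory -> EZKEY name
        self.file_xattrs = {}  # file path -> (EZKEY name, iv, EDEK)

    def create_zone(self, directory, ezkey_name):
        self.dir_xattrs[directory] = ezkey_name

    def create_file(self, path):
        ezkey_name = self.dir_xattrs[path.rsplit("/", 1)[0]]
        iv, edek = self.kms.create_edek(ezkey_name)
        self.file_xattrs[path] = (ezkey_name, iv, edek)


class Client:
    """Encrypts and decrypts locally; storage only ever sees ciphertext."""

    def __init__(self, namenode, kms, storage):
        self.nn, self.kms, self.storage = namenode, kms, storage

    def _dek_for(self, path):
        ezkey_name, iv, edek = self.nn.file_xattrs[path]
        return self.kms.decrypt_edek(ezkey_name, iv, edek), iv

    def write(self, path, cleartext):
        self.nn.create_file(path)
        dek, iv = self._dek_for(path)
        self.storage[path] = toy_cipher(dek, iv, cleartext)

    def read(self, path):
        dek, iv = self._dek_for(path)
        return toy_cipher(dek, iv, self.storage[path])


kms = KMS()
kms.create_ezkey("KEY")
namenode = NameNode(kms)
namenode.create_zone("/encrypted_dir", "KEY")

storage = {}  # stands in for DataNode block storage
client = Client(namenode, kms, storage)
data = b"sensitive record"
client.write("/encrypted_dir/file", data)

print(storage["/encrypted_dir/file"] != data)      # stored bytes are ciphertext
print(client.read("/encrypted_dir/file") == data)  # client recovers cleartext
```

In this sketch neither the NameNode nor the storage dict ever holds the cleartext or an unencrypted DEK, which mirrors the design goals stated earlier in the deck.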

Bring in the Containers (i.e. Docker). Issues with containers are the same for any virtualization platform: multiple compute clusters; multiple HDFS file systems; multiple Kerberos realms; cross-realm trust configuration.

Containers as Virtual Machines. Note – this is not about using containers to run Big Data tasks: [Figure]

Containers as Virtual Machines. This is about running Hadoop / Big Data clusters in containers: [Figure: cluster]

Containers as Virtual Machines. A true containerized Big Data environment: [Figure]

KDC Cross-Realm Trust: Different KDC realms for corporate, data, and compute must interact correctly in order for the Big Data cluster to function. Realms: CORP.ENTERPRISE.COM (end users); COMPUTE.ENTERPRISE.COM (Hadoop/Spark service principals); DATALAKE.ENTERPRISE.COM (HDFS service principals).

KDC Cross-Realm Trust: Different KDC realms for corporate, data, and compute. One-way trust: the compute realm trusts the corporate realm; the data realm trusts the corporate realm; the data realm trusts the compute realm.
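
The one-way trust relationships listed on this slide can be written down as a small lookup table. This is a simplified model, not Kerberos itself: the `TRUSTS` table and the `can_authenticate` helper are hypothetical names, only direct trust is modelled (no transitive paths), and the `hdfs@DATALAKE.ENTERPRISE.COM` principal is made up for the negative case. "Realm A trusts realm B" is taken to mean that principals from B may obtain tickets for services in A.

```python
# Service realm -> set of realms whose principals it accepts (one-way trust).
TRUSTS = {
    "COMPUTE.ENTERPRISE.COM": {"CORP.ENTERPRISE.COM"},
    "DATALAKE.ENTERPRISE.COM": {"CORP.ENTERPRISE.COM", "COMPUTE.ENTERPRISE.COM"},
}


def realm_of(principal: str) -> str:
    """Return the realm part of a principal such as user@REALM."""
    return principal.rsplit("@", 1)[1]


def can_authenticate(principal: str, service_realm: str) -> bool:
    """True if the principal's realm is the service realm or is trusted by it."""
    client_realm = realm_of(principal)
    if client_realm == service_realm:
        return True
    return client_realm in TRUSTS.get(service_realm, set())


# End users from the corporate realm can reach both compute and data services.
print(can_authenticate("user@CORP.ENTERPRISE.COM", "COMPUTE.ENTERPRISE.COM"))
print(can_authenticate("user@CORP.ENTERPRISE.COM", "DATALAKE.ENTERPRISE.COM"))

# Compute service principals can reach HDFS in the data realm.
print(can_authenticate("rm@COMPUTE.ENTERPRISE.COM", "DATALAKE.ENTERPRISE.COM"))

# The trust is one-way: data-realm principals are not accepted elsewhere.
print(can_authenticate("hdfs@DATALAKE.ENTERPRISE.COM", "COMPUTE.ENTERPRISE.COM"))
```

The asymmetry is the point of the design: the data realm accepts callers from the other two realms, but nothing in the data realm is accepted by the corporate or compute realms.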

KDC Cross-Realm Trust. [Diagram labels: CORP.ENTERPRISE.COM Realm; user@CORP.ENTERPRISE.COM; COMPUTE.ENTERPRISE.COM Realm; DATALAKE.ENTERPRISE.COM Realm; KDC: COMPUTE.ENTERPRISE.COM; KDC: DATALAKE.ENTERPRISE.COM; Hadoop Cluster; Hadoop Key Management Service; HDFS: hdfs://remotedata/; rm@COMPUTE.ENTERPRISE.COM]

Key Management Service: Must be enterprise quality. Key Trustee Server: Java KeyStore KMS; Cloudera Navigator Key Trustee Server.

Containers as Virtual Machines. A true containerized Big Data environment: [Diagram, labels repeated three times: DataLake; CORP.ENTERPRISE.COM, End Users; COMPUTE.ENTERPRISE.COM, Hadoop/Spark Service Principals; DATALAKE.ENTERPRISE.COM, HDFS Service Principals]

Key Takeaways: Hadoop has many security layers. HDFS Transparent Data Encryption (TDE) is best of breed. Security is hard (complex). Virtualization / containerization only makes it potentially harder. Compute and storage separation with virtualization / containerization can make it even harder still.

Key Takeaways: Be careful with a build vs. buy decision for containerized Big Data. Recommendation: buy one already built. There are turnkey solutions (e.g. BlueData EPIC). Reference: www.bluedata.com/blog/2017/08/hadoop-spark-docker-ten-things-to-know

@tapbluedata; www.bluedata.com; BlueData Booth #1508 in Strata Expo Hall