How to Protect Big Data in a Containerized Environment

Presentation transcript:

1 How to Protect Big Data in a Containerized Environment
Thomas Phelan Chief Architect, BlueData @tapbluedata

2 Outline
Securing a Big Data Environment
Data Protection
Transparent Data Encryption
Transparent Data Encryption in a Containerized Environment
Takeaways

3 In the Beginning …
Hadoop was used to process public web data
No compelling need for security
No user or service authentication
No data security

4 Then Hadoop Became Popular
Security is important.

5 Layers of Security in Hadoop
Access
Authentication
Authorization
Data Protection
Auditing
Policy (protect from human error)

6 Hadoop Security: Data Protection

7 Focus on Data Security
Confidentiality: lost when data is accessed by someone not authorized to do so
Integrity: lost when data is modified in unexpected ways
Availability: lost when data is erased or becomes inaccessible

8 Hadoop Distributed File System (HDFS)
Data Security Features:
Access Control
Data Encryption
Data Replication

9 Access Control
Simple: identity determined by the host operating system
Kerberos: identity determined by Kerberos credentials; one realm for both compute and storage; required for HDFS Transparent Data Encryption

10 Data Encryption Transforming data

11 Data Replication
3-way replication: can survive any 2 failures
Erasure coding: can survive more than 2 failures, depending on the parity configuration

12 HDFS with End-to-End Encryption
Confidentiality: Data Access
Integrity: Data Access + Data Encryption
Availability: Data Access + Data Replication

13 Data Encryption
How to transform the data? Cleartext → Ciphertext

14 Data Encryption – At Rest
Data is encrypted while on persistent media (disk)

15 Data Encryption – In Transit
Data is encrypted while traveling over the network

16 The Whole Process

17 HDFS Transparent Data Encryption (TDE)
End-to-end encryption: data is encrypted/decrypted at the client; data is protected at rest and in transit
Transparent: no application-level code changes required

18 HDFS TDE – Design Goals
Only an authorized client/user can access cleartext
HDFS never stores cleartext or unencrypted data encryption keys

19 HDFS TDE – Terminology
Encryption Zone: a directory whose file contents are encrypted upon write and decrypted upon read
An EZKEY is generated for each zone

20 HDFS TDE – Terminology
EZKEY – encryption zone key
DEK – data encryption key
EDEK – encrypted data encryption key

21 HDFS TDE - Data Encryption
The same key is used to encrypt and decrypt data
The ciphertext is exactly the same size as the original cleartext
EZKEY + DEK => EDEK
EDEK + EZKEY => DEK
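The two key relationships above are envelope encryption: the DEK encrypts the data, and the EZKEY wraps the DEK into an EDEK. Here is a minimal stdlib-only sketch; the XOR-with-a-SHA-256-keystream cipher is an illustrative stand-in (HDFS TDE actually uses AES), chosen because, like AES-CTR, it keeps ciphertext exactly the same size as cleartext:

```python
import hashlib
import os

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with a SHA-256-derived keystream.
    Stand-in for AES-CTR; do NOT use for real cryptography."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(x ^ y for x, y in zip(data, stream))

ezkey = os.urandom(32)            # encryption zone key (held by the KMS)
dek = os.urandom(32)              # data encryption key
edek = keystream_xor(ezkey, dek)  # EZKEY + DEK  => EDEK (safe to store in HDFS)
assert keystream_xor(ezkey, edek) == dek  # EDEK + EZKEY => DEK

cleartext = b"sensitive records"
ciphertext = keystream_xor(dek, cleartext)
assert len(ciphertext) == len(cleartext)  # same size, as the slide notes
assert keystream_xor(dek, ciphertext) == cleartext  # same key decrypts
```

Because only the EDEK is ever persisted, a stolen disk yields neither the data nor the key that protects it.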

22 HDFS TDE - Services
HDFS NameNode (NN)
Kerberos Key Distribution Center (KDC)
Hadoop Key Management Server (KMS)
Key Trustee Server

23 HDFS TDE – Security Concepts
Division of Labor:
The KMS creates the EZKEY & DEK
The KMS encrypts/decrypts the DEK/EDEK using the EZKEY
The HDFS NN communicates with the KMS to create EZKEYs & EDEKs, which are stored in extended attributes in the encryption zone
The HDFS client communicates with the KMS to get the DEK using the EZKEY and EDEK

24 HDFS TDE – Security Concepts
The name of the EZKEY is stored in the HDFS extended attributes of the directory associated with the encryption zone
The EDEK is stored in the HDFS extended attributes of the file in the encryption zone
$ hadoop key …
$ hdfs crypto …

25 HDFS Examples
Simplified for the sake of clarity:
Kerberos actions not shown
NameNode EDEK cache not shown

26 HDFS – Create Encryption Zone
[Diagram: the NameNode asks the KMS to create an EZKEY; the EZKEY name is stored as an extended attribute (xattr) on /encrypted_dir]

27 HDFS – Create Encrypted File
[Diagram: 1. the client creates /encrypted_dir/file; 2–3. the NameNode asks the KMS to create an EDEK; 4. the EDEK is stored in the file's xattr; 5. success is returned to the client]

28 HDFS TDE – File Write Work Flow
[Diagram: the client reads unencrypted data, then requests the DEK from the KMS using the EDEK and EZKEY name; the KMS decrypts the EDEK and returns the DEK; the client writes encrypted data to /encrypted_dir/file]

29 HDFS TDE – File Read Work Flow
[Diagram: the client requests the DEK from the KMS using the EDEK and EZKEY name stored in the file's xattr; the KMS decrypts the EDEK and returns the DEK; the client reads encrypted data from /encrypted_dir/file and produces unencrypted data]
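The zone-creation, write, and read flows above can be condensed into one runnable sketch. All names here are illustrative, the XOR keystream cipher stands in for AES, and (as on the slides) Kerberos and the NameNode's EDEK cache are omitted. The point the sketch preserves is the division of labor: the KMS alone holds EZKEYs, the NameNode stores only key names and EDEKs, and encryption/decryption happens at the client:

```python
import hashlib
import os

def xor_ks(key: bytes, data: bytes) -> bytes:
    """Toy stand-in for AES-CTR: XOR with a SHA-256-derived keystream."""
    stream = b""
    ctr = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + bytes([ctr])).digest()
        ctr += 1
    return bytes(a ^ b for a, b in zip(data, stream))

class KMS:
    """Holds EZKEYs; hands out EDEKs and unwraps them back to DEKs."""
    def __init__(self):
        self.ezkeys = {}
    def create_ezkey(self, name):
        self.ezkeys[name] = os.urandom(32)
    def create_edek(self, ezkey_name):
        dek = os.urandom(32)
        return xor_ks(self.ezkeys[ezkey_name], dek)  # only the EDEK leaves the KMS
    def decrypt_edek(self, ezkey_name, edek):
        return xor_ks(self.ezkeys[ezkey_name], edek)

class NameNode:
    """Stores only key NAMES and EDEKs (as xattrs), never cleartext keys."""
    def __init__(self, kms):
        self.kms = kms
        self.zones = {}    # zone dir  -> EZKEY name
        self.xattrs = {}   # file path -> (EZKEY name, EDEK)
        self.blocks = {}   # file path -> ciphertext
    def create_zone(self, path, ezkey_name):
        self.kms.create_ezkey(ezkey_name)          # NN asks KMS for an EZKEY
        self.zones[path] = ezkey_name
    def create_file(self, path):
        ezkey_name = self.zones[path.rsplit("/", 1)[0]]
        edek = self.kms.create_edek(ezkey_name)    # NN asks KMS for an EDEK
        self.xattrs[path] = (ezkey_name, edek)     # EDEK stored in the xattr
        return ezkey_name, edek

def write_file(nn, kms, path, cleartext):
    ezkey_name, edek = nn.create_file(path)
    dek = kms.decrypt_edek(ezkey_name, edek)       # client asks KMS for the DEK
    nn.blocks[path] = xor_ks(dek, cleartext)       # only ciphertext reaches HDFS

def read_file(nn, kms, path):
    ezkey_name, edek = nn.xattrs[path]
    dek = kms.decrypt_edek(ezkey_name, edek)
    return xor_ks(dek, nn.blocks[path])            # decryption happens at the client

kms = KMS()
nn = NameNode(kms)
nn.create_zone("/encrypted_dir", "zone1key")
write_file(nn, kms, "/encrypted_dir/file", b"top secret")
assert nn.blocks["/encrypted_dir/file"] != b"top secret"   # HDFS holds ciphertext
assert read_file(nn, kms, "/encrypted_dir/file") == b"top secret"
```

Notice that an attacker who compromises only the NameNode (or the disks) obtains EDEKs and ciphertext but never a usable key, which is exactly the TDE design goal stated on slide 18.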

30 Bring in the Containers (i.e. Docker)
Issues with containers are the same for any virtualization platform:
Multiple compute clusters
Multiple HDFS file systems
Multiple Kerberos realms
Cross-realm trust configuration

31 Containers as Virtual Machines
Note – this is not about using containers to run Big Data tasks:

32 Containers as Virtual Machines
This is about running Hadoop / Big Data clusters in containers

33 Containers as Virtual Machines
A true containerized Big Data environment:

34 KDC Cross-Realm Trust
Different KDC realms for corporate, data, and compute
Must interact correctly in order for the Big Data cluster to function
[Diagram: CORP.ENTERPRISE.COM – End Users; COMPUTE.ENTERPRISE.COM – Hadoop/Spark Service Principals; DATALAKE.ENTERPRISE.COM – HDFS Service Principals]

35 KDC Cross-Realm Trust
Different KDC realms for corporate, data, and compute
One-way trusts:
The compute realm trusts the corporate realm
The data realm trusts the corporate realm
The data realm trusts the compute realm
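A configuration sketch of how the trusts above might look in krb5.conf. The KDC hostnames are assumptions for illustration; the realm names are the slide's examples. Each one-way trust is realized by a cross-realm krbtgt principal (e.g. the data realm trusting the compute realm requires krbtgt/DATALAKE.ENTERPRISE.COM@COMPUTE.ENTERPRISE.COM to exist with the same key in both KDCs):

```ini
[realms]
    CORP.ENTERPRISE.COM     = { kdc = kdc.corp.enterprise.com }
    COMPUTE.ENTERPRISE.COM  = { kdc = kdc.compute.enterprise.com }
    DATALAKE.ENTERPRISE.COM = { kdc = kdc.datalake.enterprise.com }

[capaths]
    # Corporate end users reach both resource realms directly ("." = one hop).
    CORP.ENTERPRISE.COM = {
        COMPUTE.ENTERPRISE.COM  = .
        DATALAKE.ENTERPRISE.COM = .
    }
    # Compute-realm service principals reach the data lake directly,
    # so containerized Hadoop/Spark services can access remote HDFS.
    COMPUTE.ENTERPRISE.COM = {
        DATALAKE.ENTERPRISE.COM = .
    }
```

Because the trusts are one-way, a compromised compute cluster cannot mint credentials in the corporate realm, which limits the blast radius of a breached container.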

36 KDC Cross-Realm Trust
[Diagram: CORP.ENTERPRISE.COM realm; COMPUTE.ENTERPRISE.COM realm with its own KDC, the Hadoop cluster, and the Hadoop Key Management Service; DATALAKE.ENTERPRISE.COM realm with its own KDC and HDFS at hdfs://remotedata/]

37 Key Management Service
Must be enterprise quality
Examples: Java KeyStore KMS, Cloudera Navigator Key Trustee Server

38 Containers as Virtual Machines
A true containerized Big Data environment:
[Diagram: multiple containerized compute clusters (COMPUTE.ENTERPRISE.COM – Hadoop/Spark Service Principals) sharing DataLakes (DATALAKE.ENTERPRISE.COM – HDFS Service Principals), with End Users in CORP.ENTERPRISE.COM]

39 Key Takeaways
Hadoop has many security layers
HDFS Transparent Data Encryption (TDE) is best of breed
Security is hard (complex); virtualization/containerization only makes it potentially harder
Separating compute and storage with virtualization/containerization can make it harder still

40 Key Takeaways
Be careful with a build vs. buy decision for containerized Big Data
Recommendation: buy one already built; there are turnkey solutions (e.g. BlueData EPIC)

41 @tapbluedata BlueData Booth #1508 in Strata Expo Hall

