Published by Pierce Gibson. Modified over 6 years ago.
1
How to Protect Big Data in a Containerized Environment
Thomas Phelan Chief Architect, BlueData @tapbluedata
2
Outline
Securing a Big Data Environment
Data Protection
Transparent Data Encryption
Transparent Data Encryption in a Containerized Environment
Takeaways
3
In the Beginning …
Hadoop was used to process public web data
No compelling need for security
No user or service authentication
No data security
4
Then Hadoop Became Popular
Security is important.
5
Layers of Security in Hadoop
Access
Authentication
Authorization
Data Protection
Auditing
Policy (protect from human error)
6
Hadoop Security: Data Protection
Reference:
7
Focus on Data Security
Confidentiality: lost when data is accessed by someone not authorized to do so
Integrity: lost when data is modified in unexpected ways
Availability: lost when data is erased or becomes inaccessible
Reference:
8
Hadoop Distributed File System (HDFS)
Data Security Features
Access Control
Data Encryption
Data Replication
9
Access Control
Simple
Identity determined by host operating system
Kerberos
Identity determined by Kerberos credentials
One realm for both compute and storage
Required for HDFS Transparent Data Encryption
10
Data Encryption Transforming data
11
Data Replication
3-way replication
Can survive any 2 failures
Erasure Coding
Can survive more than 2 failures depending on parity bit configuration
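The trade-off between the two schemes can be put into numbers. The sketch below is only an illustration: the function names are hypothetical, and the 6 data + 3 parity layout is an assumed example configuration, not one named in the slides.

```python
from typing import Tuple


def replication_profile(copies: int) -> Tuple[int, float]:
    """Return (failures tolerated, extra storage as a multiple of the data size)."""
    # With N full copies, any N - 1 of them can be lost.
    return copies - 1, float(copies - 1)


def erasure_coding_profile(data_units: int, parity_units: int) -> Tuple[int, float]:
    """Return (failures tolerated, extra storage as a multiple of the data size)."""
    # A stripe with k data units and m parity units survives the loss of any m units.
    return parity_units, parity_units / data_units


three_way = replication_profile(3)              # 3-way replication
assumed_layout = erasure_coding_profile(6, 3)   # assumed example: 6 data + 3 parity

print("3-way replication:", three_way)
print("6+3 erasure coding:", assumed_layout)
```

Under these assumptions, the erasure-coded layout tolerates one more failure than 3-way replication while storing far less redundant data.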
12
HDFS with End-to-End Encryption
Confidentiality: Data Access
Integrity: Data Access + Data Encryption
Availability: Data Access + Data Replication
13
Data Encryption
How to transform the data?
[Diagram: Cleartext transformed into Ciphertext]
14
Data Encryption – At Rest
Data is encrypted while on persistent media (disk)
15
Data Encryption – In Transit
Data is encrypted while traveling over the network
16
The Whole Process Ciphertext
17
HDFS Transparent Data Encryption (TDE)
End-to-end encryption
Data is encrypted/decrypted at the client
Data is protected at rest and in transit
Transparent
No application level code changes required
18
HDFS TDE – Design Goals:
Only an authorized client/user can access cleartext
HDFS never stores cleartext or unencrypted data encryption keys
19
HDFS TDE – Terminology
Encryption Zone
A directory whose file contents will be encrypted upon write and decrypted upon read
An EZKEY is generated for each zone
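Because a zone is a directory, deciding whether a file is encrypted comes down to finding the nearest enclosing zone directory. A minimal sketch of that lookup follows; the `find_zone` helper and the plain dictionary of zones are assumptions made for illustration, not HDFS internals.

```python
import posixpath
from typing import Dict, Optional


def find_zone(zones: Dict[str, str], path: str) -> Optional[str]:
    """Return the EZKEY name of the zone containing path, or None if there is none."""
    current = posixpath.normpath(path)
    while True:
        if current in zones:
            return zones[current]
        parent = posixpath.dirname(current)
        if parent == current:  # reached the top without finding a zone
            return None
        current = parent


# Hypothetical zone table: zone directory -> name of its EZKEY.
zones = {"/encrypted_dir": "EZKEYNAME"}

print(find_zone(zones, "/encrypted_dir/file"))
print(find_zone(zones, "/tmp/file"))
```

Matching is done on whole path components, so a sibling directory that merely shares a name prefix with a zone is not treated as part of it.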
20
HDFS TDE – Terminology
EZKEY – encryption zone key
DEK – data encryption key
EDEK – encrypted data encryption key
21
HDFS TDE - Data Encryption
The same key is used to encrypt and decrypt data
The size of the ciphertext is exactly the same as the size of the original cleartext
EZKEY + DEK => EDEK
EDEK + EZKEY => DEK
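These properties can be checked with a few lines of code. The sketch below is an assumption-laden illustration: `keystream_xor` is a toy counter-style cipher built from SHA-256 that stands in for a real symmetric cipher. It is not what HDFS uses and must not be used to protect real data.

```python
import hashlib
import os


def keystream_xor(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR data with a SHA-256-derived keystream (illustration only)."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + iv + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


ezkey = os.urandom(16)   # encryption zone key
dek = os.urandom(16)     # data encryption key
iv = os.urandom(16)
cleartext = b"some file contents stored in an encryption zone"

# The same key (the DEK) both encrypts and decrypts the data.
ciphertext = keystream_xor(dek, iv, cleartext)
recovered = keystream_xor(dek, iv, ciphertext)

# EZKEY + DEK => EDEK, and EDEK + EZKEY => DEK.
edek = keystream_xor(ezkey, iv, dek)
unwrapped_dek = keystream_xor(ezkey, iv, edek)

print(len(cleartext), len(ciphertext))
```

A stream-style cipher was chosen for the stand-in because it adds no padding, which is what keeps the ciphertext the same size as the cleartext.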
22
HDFS TDE - Services
HDFS NameNode (NN)
Kerberos Key Distribution Center (KDC)
Hadoop Key Management Server (KMS)
Key Trustee Server
23
HDFS TDE – Security Concepts
Division of Labor
KMS creates the EZKEY & DEK
KMS encrypts/decrypts the DEK/EDEK using the EZKEY
HDFS NN communicates with the KMS to create EZKEYs & EDEKs to store in the extended attributes in the encryption zone
HDFS client communicates with the KMS to get the DEK using the EZKEY name and EDEK
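The KMS side of this division of labor can be modeled in a few lines. Everything in the sketch is hypothetical: `ToyKMS`, its method names, and the per-key user list are illustrative assumptions, and `keystream_xor` is a toy stand-in cipher rather than the real one.

```python
import hashlib
import os
from typing import Dict, Set, Tuple


def keystream_xor(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR data with a SHA-256-derived keystream (illustration only)."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + iv + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


class ToyKMS:
    """Keeps EZKEYs private; callers only handle key names, EDEKs and DEKs."""

    def __init__(self) -> None:
        self._ezkeys: Dict[str, bytes] = {}
        self._may_decrypt: Dict[str, Set[str]] = {}

    def create_key(self, name: str, allowed_users: Set[str]) -> None:
        self._ezkeys[name] = os.urandom(16)
        self._may_decrypt[name] = set(allowed_users)

    def generate_edek(self, name: str) -> Tuple[bytes, bytes]:
        """NameNode side: create a fresh DEK and return only its encrypted form."""
        dek, iv = os.urandom(16), os.urandom(16)
        return iv, keystream_xor(self._ezkeys[name], iv, dek)

    def decrypt_edek(self, user: str, name: str, iv: bytes, edek: bytes) -> bytes:
        """Client side: unwrap the EDEK, but only for an authorized user."""
        if user not in self._may_decrypt[name]:
            raise PermissionError(f"{user} may not use key {name}")
        return keystream_xor(self._ezkeys[name], iv, edek)


kms = ToyKMS()
kms.create_key("EZKEYNAME", {"alice"})
iv, edek = kms.generate_edek("EZKEYNAME")                 # what the NN would store
dek = kms.decrypt_edek("alice", "EZKEYNAME", iv, edek)    # what the client would use
```

In this model the NameNode never sees a DEK and nobody outside the KMS ever sees an EZKEY.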
24
HDFS TDE – Security Concepts
The name of the EZKEY is stored in the HDFS extended attributes of the directory associated with the encryption zone
The EDEK is stored in the HDFS extended attributes of the file in the encryption zone
$ hadoop key …
$ hdfs crypto …
25
HDFS Examples
Simplified for the sake of clarity:
Kerberos actions not shown
NameNode EDEK cache not shown
26
HDFS – Create Encryption Zone
Diagram labels:
3. Create EZKEY
/encrypted_dir xattr: EZKEYNAME
EZKEYNAME = KEY
27
HDFS – Create Encrypted File
Diagram labels:
1. Create file
2. Create EDEK
3. Create EDEK
4. Store EDEK
5. Return Success
/encrypted_dir/file xattr: EDEK
/encrypted_dir/file encrypted data
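The zone-creation and file-creation steps on these two slides can be walked through with a small model. This is a sketch under stated assumptions: `ToyKMS` and `ToyNameNode` are hypothetical classes, extended attributes are modeled as plain dictionaries, and `keystream_xor` is a toy stand-in cipher.

```python
import hashlib
import os
from typing import Dict, Tuple


def keystream_xor(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR data with a SHA-256-derived keystream (illustration only)."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + iv + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


class ToyKMS:
    def __init__(self) -> None:
        self._ezkeys: Dict[str, bytes] = {}

    def create_key(self, name: str) -> None:
        self._ezkeys[name] = os.urandom(16)          # Create EZKEY

    def generate_edek(self, name: str) -> Tuple[bytes, bytes]:
        dek, iv = os.urandom(16), os.urandom(16)     # Create EDEK
        return iv, keystream_xor(self._ezkeys[name], iv, dek)


class ToyNameNode:
    def __init__(self, kms: ToyKMS) -> None:
        self.kms = kms
        self.xattrs: Dict[str, Dict[str, object]] = {}

    def create_zone(self, path: str, key_name: str) -> None:
        self.kms.create_key(key_name)
        self.xattrs[path] = {"EZKEYNAME": key_name}  # only the key's name is stored

    def create_file(self, zone: str, name: str) -> str:
        key_name = str(self.xattrs[zone]["EZKEYNAME"])
        iv, edek = self.kms.generate_edek(key_name)
        path = f"{zone}/{name}"
        self.xattrs[path] = {"EDEK": edek, "IV": iv}  # Store EDEK
        return path                                   # Return Success


kms = ToyKMS()
nn = ToyNameNode(kms)
nn.create_zone("/encrypted_dir", "EZKEYNAME")
path = nn.create_file("/encrypted_dir", "file")
print(path, sorted(nn.xattrs[path]))
```

Note that the NameNode ends up holding only a key name and an EDEK, which matches the design goal that HDFS never stores unencrypted data encryption keys.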
28
HDFS TDE – File Write Work Flow
Diagram labels:
/encrypted_dir/file xattr: EDEK
3. Request DEK from EDEK & EZKEYNAME
4. Decrypt DEK from EDEK
Return DEK
read unencrypted data
/encrypted_dir/file write encrypted data
29
HDFS TDE – File Read Work Flow
Diagram labels:
/encrypted_dir/file xattr: EDEK
3. Request DEK from EDEK & EZKEYNAME
4. Decrypt DEK from EDEK
Return DEK
/encrypted_dir/file read encrypted data
write unencrypted data
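The write and read work flows are mirror images of each other, which a round trip makes easy to see. As before, this is only a sketch: `ToyKMS`, `client_write`, `client_read` and the dictionaries standing in for xattrs and DataNode storage are hypothetical, and `keystream_xor` is a toy cipher, not the real one.

```python
import hashlib
import os
from typing import Dict, Tuple


def keystream_xor(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR data with a SHA-256-derived keystream (illustration only)."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + iv + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


class ToyKMS:
    def __init__(self) -> None:
        self._ezkeys: Dict[str, bytes] = {}

    def create_key(self, name: str) -> None:
        self._ezkeys[name] = os.urandom(16)

    def generate_edek(self, name: str) -> Tuple[bytes, bytes]:
        dek, iv = os.urandom(16), os.urandom(16)
        return iv, keystream_xor(self._ezkeys[name], iv, dek)

    def decrypt_edek(self, name: str, iv: bytes, edek: bytes) -> bytes:
        return keystream_xor(self._ezkeys[name], iv, edek)


kms = ToyKMS()
kms.create_key("EZKEYNAME")
zone_xattrs = {"EZKEYNAME": "EZKEYNAME"}          # on /encrypted_dir
iv, edek = kms.generate_edek(zone_xattrs["EZKEYNAME"])
file_xattrs = {"EDEK": edek, "IV": iv}            # on /encrypted_dir/file
stored_blocks: Dict[str, bytes] = {}              # what HDFS actually persists


def fetch_dek() -> bytes:
    # Request DEK from EDEK & EZKEYNAME; the KMS decrypts and returns it.
    return kms.decrypt_edek(zone_xattrs["EZKEYNAME"], file_xattrs["IV"], file_xattrs["EDEK"])


def client_write(path: str, cleartext: bytes) -> None:
    stored_blocks[path] = keystream_xor(fetch_dek(), file_xattrs["IV"], cleartext)


def client_read(path: str) -> bytes:
    return keystream_xor(fetch_dek(), file_xattrs["IV"], stored_blocks[path])


message = b"unencrypted data from the application"
client_write("/encrypted_dir/file", message)
print(client_read("/encrypted_dir/file") == message)
```

Encryption and decryption both happen in the client functions, so the storage side only ever holds ciphertext.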
30
Bring in the Containers (i.e. Docker)
Issues with containers are the same for any virtualization platform
Multiple compute clusters
Multiple HDFS file systems
Multiple Kerberos realms
Cross-realm trust configuration
31
Containers as Virtual Machines
Note – this is not about using containers to run Big Data tasks:
32
Containers as Virtual Machines
This is about running Hadoop / Big Data clusters in containers: cluster
33
Containers as Virtual Machines
A true containerized Big Data environment:
34
KDC Cross-Realm Trust
Different KDC realms for corporate, data, and compute
Must interact correctly in order for the Big Data cluster to function
CORP.ENTERPRISE.COM: End Users
COMPUTE.ENTERPRISE.COM: Hadoop/Spark Service Principals
DATALAKE.ENTERPRISE.COM: HDFS Service Principals
35
KDC Cross-Realm Trust
Different KDC realms for corporate, data, and compute
One-way trust
Compute realm trusts the corporate realm
Data realm trusts the corporate realm
Data realm trusts the compute realm
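The one-way trust relationships above determine which principals each realm's services will accept. The sketch below models only direct trust; the `accepts` helper and the example principal names are hypothetical, and no real Kerberos exchange is performed.

```python
from typing import Dict, Set

# Realm -> set of realms it trusts (one-way, as listed on the slide).
TRUSTS: Dict[str, Set[str]] = {
    "COMPUTE.ENTERPRISE.COM": {"CORP.ENTERPRISE.COM"},
    "DATALAKE.ENTERPRISE.COM": {"CORP.ENTERPRISE.COM", "COMPUTE.ENTERPRISE.COM"},
}


def realm_of(principal: str) -> str:
    """Extract the realm from a principal written as name@REALM."""
    return principal.rsplit("@", 1)[1]


def accepts(service_realm: str, principal: str) -> bool:
    """True if a service in service_realm accepts a principal from its own or a trusted realm."""
    client_realm = realm_of(principal)
    return client_realm == service_realm or client_realm in TRUSTS.get(service_realm, set())


# Hypothetical principals, one per realm.
end_user = "alice@CORP.ENTERPRISE.COM"
spark_service = "spark/node1@COMPUTE.ENTERPRISE.COM"
hdfs_service = "hdfs/namenode@DATALAKE.ENTERPRISE.COM"

print(accepts("DATALAKE.ENTERPRISE.COM", end_user))       # corporate user reaching HDFS
print(accepts("DATALAKE.ENTERPRISE.COM", spark_service))  # compute service reaching HDFS
print(accepts("CORP.ENTERPRISE.COM", hdfs_service))       # trust does not run backwards
```

Because the trust is one-way, principals from the data realm gain nothing in the compute or corporate realms.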
36
KDC Cross-Realm Trust
Diagram labels:
CORP.ENTERPRISE.COM Realm
COMPUTE.ENTERPRISE.COM Realm
DATALAKE.ENTERPRISE.COM Realm
KDC: COMPUTE.ENTERPRISE.COM
KDC: DATALAKE.ENTERPRISE.COM
Hadoop Cluster
Hadoop Key Management Service
HDFS: hdfs://remotedata/
37
Key Management Service
Must be enterprise quality
Key Trustee Server
Java KeyStore KMS
Cloudera Navigator Key Trustee Server
38
Containers as Virtual Machines
A true containerized Big Data environment:
Diagram labels:
DataLake
CORP.ENTERPRISE.COM: End Users
COMPUTE.ENTERPRISE.COM: Hadoop/Spark Service Principals
DATALAKE.ENTERPRISE.COM: HDFS Service Principals
39
Key Takeaways
Hadoop has many security layers
HDFS Transparent Data Encryption (TDE) is best of breed
Security is hard (complex)
Virtualization / containerization only makes it potentially harder
Compute and storage separation with virtualization / containerization can make it even harder still
40
Key Takeaways
Be careful with a build vs. buy decision for containerized Big Data
Recommendation: buy one already built
There are turnkey solutions (e.g. BlueData EPIC)
Reference:
41
@tapbluedata BlueData Booth #1508 in Strata Expo Hall