Published by Audra Dalton. Modified over 9 years ago.

LH*RE: A Scalable Distributed Data Structure with Recoverable Encryption Keys
(Work in Progress, Jan 09) (Provisional Patent Appl.)
Sushil Jajodia (George Mason U.), Witold Litwin (U. Paris Dauphine), Thomas Schwarz (Santa Clara U.)

Overview
A new data structure
A Scalable Distributed Data Structure
– LH* family
Client-side encryption
– Using one or many symmetric encryption keys
– Protects the privacy of client data stored on unknown servers
    Hence only moderately trusted by the client

Overview
Recoverable encryption keys
– Safely backed up in the file
– Recoverable on behalf of the client
– Recoverable without the client, on behalf of some Authority
Revocable keys
– Idem
Scalable file parameters
– Preserving the assurance

Overview
Applications on:
– SDDS
– P2P
– Clouds
– Grids
Enterprise data
Medical data
Social networks

Overview
Basic threat model
– Client site is safe
– LH* coordinator site is safe
– Data hosting organization as a whole is safe (trusted)
– Network is safe while a key is backed up or recovered
– No malicious intruder
To decrypt some records, an intruder then needs to:
– Break an encryption key, or
– Break into at least k servers
    k is a client-defined parameter

Overview
The servers to break into for a specific record can be anywhere in the file
– At locations unknown to the intruder
– Changing with splits
– The intruder may need to break into all the servers
The effort of breaking into some specific k servers may
– Still not suffice to disclose any record
    Most often
– Suffice only for a few records
    When the client uses many encryption keys

Overview
LH*RE data record manipulation costs no more messaging than in an LH* file
Key recovery cost is about that of an LH* scan
– Possibly 2M messages for M servers, in one or several rounds
Storage overhead due to encryption is negligible
In practice, an LH*RE file should be safe

Overview
LH*RE could be useful for:
– Organizations with multiple clients & servers
    Typical case today
– Clients of remote storage services
    P2P, Grid, Cloud … computing
    Amazon, Google, MS, IBM…
Distributed systems need client-side encryption and key recoverability
– Both not yet well handled in practice

Generic LH*
Scalable distributed hash data structure
Data are stored in buckets on server sites numbered 0, 1, 2…
Applications are at client sites
– A peer site may be both client & server
Data are in records with primary keys
A record can be inserted, updated, deleted, searched or scanned
The address m of the record with primary key C is LH(C)

Generic LH*
Overflowing inserts generate splits that move data into new buckets (on new sites)
– Splits are ordered: 0, 0, 1, 0, 1, 2, 3, 0, 1, …, 2^j - 1, 0, …
LH(C) changes dynamically
A client may not know the actual file state
It uses only its private image of the file state for addressing
Addressing errors may result

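The addressing rule behind this slide can be sketched as follows. This is a minimal illustration of the classic linear-hashing address computation, assuming the file state is a pair (level, split pointer) and that h_i(C) = C mod (n0 * 2^i); the function and parameter names are hypothetical and not taken from the slides.

```python
def lh_address(key: int, level: int, split_ptr: int, n0: int = 1) -> int:
    """Linear-hashing address of `key` for a file state (level, split_ptr).

    h_i(C) = C mod (n0 * 2**i). Buckets below the split pointer have
    already split in the current round, so they are addressed with h_{i+1}.
    """
    addr = key % (n0 * 2 ** level)
    if addr < split_ptr:
        addr = key % (n0 * 2 ** (level + 1))
    return addr


# Actual file state: 14 buckets (0..13), i.e. level 3 with split pointer 6.
actual = lh_address(13, level=3, split_ptr=6)
# Outdated client image: the client still believes the file has 8 buckets.
guess = lh_address(13, level=3, split_ptr=0)
print(actual, guess)
```

With the outdated image the client sends key 13 to bucket 5 although the record lives in bucket 13. This is the kind of addressing error that the servers then resolve by forwarding.
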
Generic LH*
Any addressing error is resolved by the servers in at most two forwarding messages
– Only one for LH*RS P2P
Every forwarding adjusts the client image
Addressing errors do not repeat
Altogether, LH* is the fastest SDDS (P2P, Grid, Cloud...) addressing scheme

LH*RE
The coordinator may have additional capabilities
– Certifying the address of every client
– Maintaining PKI over the file
    If the network is not safe
    For client identity checking
– …
Records are LH* records with an additional client identity field I
Key-based addressing is as for LH*

LH*RE
The file starts with at least K buckets
– K is a file parameter
– Basically, K is a power of 2
Data in every record are encrypted by the client
– Through some good symmetric-key encryption method
    Much faster than known public-key schemes
Primary keys and I are not encrypted

Encryption/Decryption
The client uses a cached table T(t) with N encryption keys E_i
Some hash h(C) chooses t for R(C)
– E.g., t = h(C) = C mod N
The client encrypts/decrypts the non-key data field D in R(C) into the field D', using the key E_t selected by t
– Using strong encryption
    AES
    PGP
    …

Encryption/Decryption
The client forms the encrypted record R'(C) = (C, I, t, D')
– I is a provable client identity
– Or any info that a future requestor must provide to access R'

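The key selection and record layout of the last two slides can be sketched as below. This is an illustration only: `toy_cipher` is a hypothetical stand-in for a real cipher such as AES (it is not secure), and the names `EncryptedRecord`, `encrypt_record` and `decrypt_record` are assumptions, not part of the scheme as presented.

```python
import hashlib
import secrets
from typing import NamedTuple


class EncryptedRecord(NamedTuple):
    C: int        # primary key, stored in clear
    I: str        # client identity, stored in clear
    t: int        # index of the encryption key in T
    D_enc: bytes  # encrypted non-key field D'


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Stand-in for AES: XOR with a SHA-256 counter keystream. NOT secure."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def encrypt_record(T: list[bytes], C: int, I: str, D: bytes) -> EncryptedRecord:
    t = C % len(T)                          # t = h(C) = C mod N
    return EncryptedRecord(C, I, t, toy_cipher(T[t], D))


def decrypt_record(T: list[bytes], rec: EncryptedRecord) -> bytes:
    return toy_cipher(T[rec.t], rec.D_enc)  # the XOR stream is its own inverse


T = [secrets.token_bytes(16) for _ in range(4)]   # client key table, N = 4
rec = encrypt_record(T, C=10, I="client-1", D=b"blood type: 0+")
```

Because C, I and t stay in clear, the servers can address and forward the record exactly as in LH*, while only the holder of T can read D.
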
Encryption/Decryption
The client manipulates the encrypted record R'(C) basically as for LH*
– Key-based search, insert, delete and update
However, the scan operation over the non-key field no longer works
– One cannot search by content
– That is the basic purpose of LH*RE

Encryption Key Encoding
The client encodes each encryption key E
– Using secret sharing with k ≤ K shares
    k - 1 shares are different white noises N_1 … N_{k-1}
– There is a new set of shares for every encryption key
    Higher assurance than if all keys used the same set of noises
    Such an approach remains a possibility nevertheless
– Not addressed in what follows, unless stated otherwise

Encryption Key Encoding
The k-th share value is E' = N_1 ⊕ … ⊕ N_{k-1} ⊕ E
– ⊕ denotes XOR
Each share becomes a share record
S_j = (C_j, t, I, N_j) for j = 1, …, k - 1
S_k = (C_k, t, I, E')

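The XOR-based secret sharing above can be sketched in a few lines. This is a minimal illustration, assuming keys and noises are byte strings of equal length; `encode_key` and `decode_key` are hypothetical names.

```python
import secrets
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode_key(E: bytes, k: int) -> list[bytes]:
    """Return k shares: k-1 random noises N_1..N_{k-1}, then E'."""
    noises = [secrets.token_bytes(len(E)) for _ in range(k - 1)]
    E_prime = reduce(xor_bytes, noises, E)   # E' = N_1 ^ ... ^ N_{k-1} ^ E
    return noises + [E_prime]


def decode_key(shares: list[bytes]) -> bytes:
    """The XOR of all k shares gives back E."""
    return reduce(xor_bytes, shares)


E = secrets.token_bytes(16)
shares = encode_key(E, k=4)
```

Any k - 1 of the shares are jointly just random noise, which is why an intruder needs all k of them to decode E.
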
Encryption Key Encoding
The client chooses each key C_j by some hash LH_K defined as follows:
– LH_K hashes N_j or E' onto the initial buckets 0, 1, …, K - 1
– For any j > 1 and any l < j: LH_K(C_j) ≠ LH_K(C_l)
    Here C_l is a previously generated key for the E being encoded
– Every C_j is unique in the file
    General constraint on an LH* file
– Could be relaxed

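One way to satisfy the constraint LH_K(C_j) ≠ LH_K(C_l) is sketched below. It assumes LH_K(C) = C mod K and draws candidate keys at random with rejection, rather than deriving them from the share contents as the slide suggests; `choose_share_keys` and its parameters are hypothetical. The final loop checks that, with h_i(C) = C mod (K * 2^i), keys with distinct residues mod K also map to distinct buckets in every later file state.

```python
import random


def choose_share_keys(k: int, K: int, used: set[int], rng: random.Random) -> list[int]:
    """Pick k primary keys whose residues mod K are pairwise distinct."""
    if k > K:
        raise ValueError("need k <= K")
    keys: list[int] = []
    residues: set[int] = set()
    while len(keys) < k:
        c = rng.getrandbits(32)
        if c in used or c % K in residues:
            continue                      # collision: draw again
        keys.append(c)
        residues.add(c % K)
        used.add(c)                       # every C_j is unique in the file
    return keys


def lh_address(key: int, level: int, split_ptr: int, K: int) -> int:
    """LH address in a file that started with K buckets."""
    addr = key % (K * 2 ** level)
    return key % (K * 2 ** (level + 1)) if addr < split_ptr else addr


keys = choose_share_keys(k=4, K=8, used=set(), rng=random.Random(1))
# The shares stay on distinct buckets for every later file state.
for level in range(5):
    for split_ptr in range(8 * 2 ** level):
        assert len({lh_address(c, level, split_ptr, 8) for c in keys}) == 4
```

The check succeeds for any choice of keys because every LH address is congruent to its key modulo K, so two keys in the same bucket would share a residue.
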
Encryption Key Encoding
The client sends each S_j for storage
– As usual if the network is safe
– Using any reasonable protocol for safe transmission otherwise
    SSL…
    Otherwise, a snooper could collect all the shares and decode an encryption key
Forwarding does not need this procedure
Neither does data record manipulation

Encryption Key Encoding
Main property
– All share records of E that the client sends out for storage end up at different servers
    Even if they are forwarded
Regardless of future splits and merges, they always remain at different servers
– Despite the migrations during the splits
Proof: details omitted here
Basis: in LH*, no split can migrate records from different buckets into the same bucket

Encryption Key Encoding
Example
– File extends over servers (buckets) 0, 1, 2, …, 12, 13
– Shares of some key end up in servers 0, 3, 6, 11
– Coming splits may only move these shares to servers distant by 2^3, 2^4, 2^5, … respectively
    6 → 14, 22, …
    0 → 16, 32, …
    3 → 19, 35, …
    11 → 27, …

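The migration targets in this example can be reproduced with a short sketch. It assumes a file grown from a single bucket, so that a split of bucket m at level j sends records to bucket m + 2^j; `bucket_level` and `future_homes` are hypothetical helper names.

```python
def bucket_level(m: int, M: int) -> int:
    """Level j of bucket m in an LH* file of M buckets grown from one bucket."""
    i = M.bit_length() - 1        # file level: 2**i <= M < 2**(i+1)
    n = M - 2 ** i                # split pointer
    return i + 1 if (m < n or m >= 2 ** i) else i


def future_homes(m: int, M: int, count: int = 3) -> list[int]:
    """Buckets that records now in bucket m may move to at its next splits.

    A split of bucket m at level j sends records to bucket m + 2**j.
    """
    j = bucket_level(m, M)
    return [m + 2 ** (j + s) for s in range(count)]


for m in (0, 3, 6, 11):
    print(m, "->", future_homes(m, M=14))
```

For the 14-bucket file of the slide, bucket 6 has not yet split in the current round, which is why its first target (14) is only 2^3 away while buckets 0, 3 and 11 first move by 2^4.
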
Encryption Key Recovery
Concerns all the encryption keys of some client I'
The requestor can be the client itself
– Having lost T for any reason
The requestor can be a trusted authority A
– In case of disappearance of I'
    Dismissal of an employee
    Death or incapacity of a patient
    …
A then requests the recovery on behalf of a new client I''

Encryption Key Recovery
The requestor basically does not know k and N
It then requests an LH-like scan with deterministic termination
– Searching for any share record where I = I' and t ≤ N', for some N'
The choice of N' is arbitrary
– Basically, it should be large enough to be > N
– Alternatively, the client may use it to prevent flooding by the incoming replies

Encryption Key Recovery
If the requestor knows N and k, probabilistic termination suffices
– Recovery may be cheaper
In practice, probabilistic termination should usually suffice, with high probability
– Why?

Encryption Key Recovery
The requestor could be fake
– E.g., a monkey in the middle
Each server receiving the request S therefore verifies the identity of the requestor
– E.g., by checking the IP address of the client with the coordinator
    Unless it caches the legal addresses
    Or they are an integral part of the I fields
– Or it verifies the signature through PKI
– …

Encryption Key Recovery
Direct requests from the servers to the coordinator generate 2N messages
– Heavy load for the coordinator
An alternative is to aggregate the requests at the servers
Sending fewer of those to the coordinator
Even only a single one
As described next

Encryption Key Recovery
Every server having a child waits for the request from it
Every child requests the confirmation from its father
Except for server 0, every server requests the confirmation from its father
– By the structure of LH*, all these requests end up at server 0
– Server 0 forwards the request to the coordinator

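The father/child structure used for this aggregation can be sketched as follows. The slides do not define it explicitly; the sketch assumes it is the LH* split tree of a file grown from bucket 0, where the father of bucket m is the bucket whose split created m. The function names are hypothetical.

```python
def father(m: int) -> int:
    """Bucket whose split created bucket m (file grown from bucket 0)."""
    if m == 0:
        raise ValueError("server 0 has no father")
    return m - 2 ** (m.bit_length() - 1)


def path_to_root(m: int) -> list[int]:
    """Servers a confirmation request passes through on its way to server 0."""
    path = [m]
    while path[-1] != 0:
        path.append(father(path[-1]))
    return path


def children(m: int, M: int) -> list[int]:
    """Children of server m in a file of M servers."""
    return [c for c in range(1, M) if father(c) == m]


M = 14
print(path_to_root(13))   # [13, 5, 1, 0]
print(children(0, M))     # [1, 2, 4, 8]
```

Under this assumption each of the M - 1 non-root servers sends one request to its father, and only server 0 contacts the coordinator.
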
Encryption Key Recovery
The coordinator gets a single message
– Regardless of N
Its reply propagates downward similarly
Notice that the scheme works assuming no malicious action at any server
– As we do unless we state otherwise
Otherwise, e.g., server 0 could send a fake OK
Big trouble could follow

Encryption Key Recovery
Once the server gets the OK, it starts the actual bucket scan
It sends all the records found to I' or I''
– If the network is not safe, it uses SSL or the like
    A snooper could collect the shares otherwise
If it finds no record, it sends an Ack of having received S

Encryption Key Recovery
The client
– Matches the records with the same t
– Recovers the t-th key
    By XORing (⊕) all the shares with the same t
    Deterministic termination guarantees that there are k such shares
– Sets N = t_max, where t_max is the maximal t received

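The client-side matching step can be sketched as below: group the received share records by t, then XOR the payloads of each group. The record layout (C, t, I, payload) follows the share records defined earlier; `recover_keys` and the sample data are hypothetical.

```python
from collections import defaultdict
from functools import reduce


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def recover_keys(share_records):
    """share_records: iterable of (C, t, I, payload) tuples returned by the scan."""
    by_t = defaultdict(list)
    for _C, t, _I, payload in share_records:
        by_t[t].append(payload)
    return {t: reduce(xor, payloads) for t, payloads in by_t.items()}


# Hypothetical test data: two keys, k = 3 shares each, fresh noises per key.
keys = {0: b"\x10\x20", 1: b"\xaa\xbb"}
noises = {0: [b"\x01\x02", b"\x0f\x0e"], 1: [b"\x33\x44", b"\x55\x66"]}
records = []
for t, E in keys.items():
    E_prime = reduce(xor, noises[t], E)
    for j, payload in enumerate(noises[t] + [E_prime]):
        records.append((100 * t + j, t, "I'", payload))
records.reverse()                      # replies may arrive in any order
T = recover_keys(records)
```

The sketch silently assumes all k shares of every key have arrived, which is what deterministic termination is meant to guarantee.
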
Encryption Key Revocation
Revocation consists of changing the encryption key for every data record of a client
May happen when
– The client's T fell into the wrong hands
– The client's right to use the data abruptly expired
    Termination of employment
    …

Encryption Scalability
More encryption keys for a larger file
– To offset assurance deterioration
    Here: the number of keys that remain undisclosed if a key gets disclosed
It suffices to append new keys to T and extend the hash function
Existing encryption is not affected

Encoding Scalability
More shares per key for a larger file
– To offset assurance deterioration
To set k = k + 1, it suffices, for every encryption key E_t, to:
– Create a new noise share N_k
– Read any one share record S_j of the t-th key
– Set N_j := N_j ⊕ N_k
– Store the updated S_j
– Create and store the new share record S_k = (C_k, t, I, N_k)

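The re-encoding step above can be sketched on the share payloads alone. This is an illustration under the assumption that shares are equal-length byte strings; `add_share` is a hypothetical name, and storing the new share record at a server that holds no other share of the same key is left out.

```python
import secrets
from functools import reduce


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def add_share(shares: list[bytes], j: int = 0) -> list[bytes]:
    """Return k+1 shares that still XOR to the same key E."""
    new_noise = secrets.token_bytes(len(shares[0]))   # N_k
    updated = list(shares)
    updated[j] = xor(updated[j], new_noise)           # N_j := N_j xor N_k
    updated.append(new_noise)                         # payload of the new S_k
    return updated


old = [b"\x01", b"\x02", b"\x07"]      # XORs to b"\x04"
new = add_share(old)
```

Only one existing share record is rewritten per key, and the key E itself never has to be reconstructed during the update.
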
Encoding Scalability
The process may be carried out by scanning successive buckets 0, 1, …
– Requesting from new buckets only the share records whose t has not been dealt with yet
– Until the entire T is re-encoded

Performance: Messaging Cost
Same as for LH* for data record manipulation
Plus kN + messages to back up T
Basically, about 4N messages for the key recovery scan
– In about log N rounds
Can be (much) fewer messages with probabilistic termination or client address caching at the servers

Processing Cost
Processing overhead concerns
– Mainly, the (symmetric) encryption/decryption
    Depends on the encryption scheme used
– From time to time, especially initially
    Key generation & encoding
– Sporadically
    Key recovery
    Key revocation
This analysis is an open issue at present

Storage Overhead
Should be O(kN) on the servers
Encryption keys, and thus share records, should usually be small compared to data records
Same for the other LH*RE-specific fields within each data record
Storage overhead on the servers should usually be negligible
Client storage for T should be O(N)
– Easily OK for even millions of encryption keys in a typical RAM

Encryption Strength
Attack 1: any single-server intrusion
– By an intruder or the administrator
    Accidentally or willingly
Impossible to decode any encryption key
One has to break the encryption keys of the data records of interest
– Practically impossible for good encryption
– The difficulty compounds when the client uses multiple encryption keys
LH*RE data on a server are safe in this sense

Encryption Strength
Attack 2: multiple-server intrusion to decrypt a specific data record
To decode the key E of any data record of interest, the intruder has to break into at least k servers
– Those holding the shares of E
Otherwise, brute force is the only way
If M > k, breaking into k or more servers does not guarantee success with a specific record
– See the example later in this talk

Encryption Strength
The shares searched for may be anywhere in the file
No share has any info about the location of the other shares
The intruder may need to break into every server
If M = k, breaking into k servers suffices for success
Hence it is safer to start the file with K > k

Encryption Strength
Attack 3: intrusion into any k or more servers to decrypt any data records
The decoding of some encryption keys, hence the disclosure of some data, is possible
– But not certain
The likelihood and consequences depend on the file state and parameters
Assurance analysis may be the tool to find out more

Encryption Assurance
Assuming it is impossible to break the encryption keys by brute force, what if an intruder breaks into l servers?
Assurance analysis measures
– Confidence that no disclosure happens
– Extent of disclosure otherwise

Encryption Assurance
Basic measures
– Probability a that no record gets disclosed
– Expected fraction d of the file that gets disclosed
– Expected fraction that remains undisclosed
– Number of records that are disclosed or remain undisclosed

Encryption Assurance
If l < k, then a = 1
If l ≥ k, then a depends on the number of servers M, on N and on the bucket size b at each server
– Basically, the larger N or M and the smaller b, the higher the assurance
In-depth analysis remains to be done

Example
k = 4, 1 encryption key, 16 servers
Assurance a against intrusion into k servers?
Usual randomness
– Servers are equally likely to be intruded
a = 1 – (4/16 * 3/15 * 2/14 * 1/13) = 1 – 1/1820 ≈ 0.9995
Expected disclosure: d = ¼ of the file
Remains undisclosed: 1 – d = ¾ of the file

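The value of a in this example can be checked with exact arithmetic. The sketch below also covers l > k through a hypergeometric count, which is a generalization assumed here and not stated on the slide; `assurance_single_key` is a hypothetical name.

```python
from fractions import Fraction
from math import comb


def assurance_single_key(M: int, k: int, l: int) -> Fraction:
    """P(no disclosure) when l of M servers, chosen uniformly, are broken into
    and the k shares of the single key sit on k distinct servers."""
    if l < k:
        return Fraction(1)
    # The intruder succeeds iff the l broken servers include all k share servers.
    return 1 - Fraction(comb(M - k, l - k), comb(M, l))


a = assurance_single_key(M=16, k=4, l=4)
print(a)   # 1819/1820
```

For l = k = 4 and M = 16 the count reduces to 1/C(16, 4) = 1/1820, the same value as the product 4/16 * 3/15 * 2/14 * 1/13 on the slide.
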
Example
Use of 2 encryption keys
a(1) ≈ 1 – 2/1820 ≈ 0.999
a(2) = 1 – (2/1820)^2 ≈ 0.999999
a = 1 – 2/1820 – (2/1820)^2 ≈ 0.999
Expected disclosure: d ≈ 1/8 of the file
Now what about using 10 keys?
a ≈ 0.99
d ≈ 1/40
And what about 100 keys?
And what if the file becomes bigger?
– E.g., M = 128

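The questions on this slide can be explored numerically with a rough model. The sketch below is not the authors' formula: it assumes the intruder breaks into exactly k random servers and that each key's k shares sit on an independently, uniformly chosen set of k servers. The disclosure function only reproduces the pattern d = (l / M) / n that fits the slide's values 1/4, 1/8 and 1/40. Both function names are hypothetical.

```python
from math import comb


def assurance(n_keys: int, M: int = 16, k: int = 4) -> float:
    """P(no key disclosed) after intrusion into k random servers, assuming each
    key's k shares sit on an independently, uniformly chosen set of k servers."""
    p_one = 1 / comb(M, k)            # chance the intruder hits one given key
    return (1 - p_one) ** n_keys


def disclosure(n_keys: int, M: int = 16, l: int = 4) -> float:
    """Fraction of the file readable with one disclosed key from l broken
    servers, following the pattern d = (l / M) / n of the slide's numbers."""
    return (l / M) / n_keys


for n in (1, 2, 10, 100):
    print(n, round(assurance(n), 4), disclosure(n))
```

Under these assumptions, more keys lower a slowly while shrinking d in proportion, and a larger M raises a sharply because the number of possible k-server sets grows.
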
Conclusion
A new data structure
Lets the file be scalable and distributed
Lets data records be client-side encrypted
Lets encryption keys be recoverable and revocable
Negligible messaging, processing and storage overhead
Future work should focus on experiments & assurance analysis

Future work
Experiments
Assurance analysis
Applications
Variants
– Server caches client addresses
– Probabilistic termination for key recovery
– …
Larger threat model
– Malicious intruder
    Destroying or corrupting the shares

Thank you for your attention