Relational-Based Encryption for Efficient Data Sharing on Encrypted Cloud Relational Databases
Introduction
Encrypting sensitive data items in a database is necessary, especially in a cloud database. [Figure: a company stores its table (Item_ID, Cost, Wholesale_price) encrypted at the cloud database service provider (SP) and later gets the data back. The SP sees only ciphertext and does not have the key.]
Introduction
CryptDB, TrustedDB and Cipherbase are three recent encrypted cloud relational database systems that support querying. They encrypt data with conventional encryption schemes, without considering the relational structure of the data. For instance, all three systems use AES, which provides semantic security, as the underlying encryption scheme. (CryptDB also uses other encryption schemes at the same time, e.g., OPES and the Paillier cryptosystem, but these are either less secure or slower than AES.)
Inflexibility of conventional encryption schemes (e.g., AES) in data sharing
[Figure: Alice's table (Item_ID, Cost, Wholesale_price) is stored encrypted at the SP; the wholesale prices of some selected products are to be shared with Bob.] Alice: "Bob is my business partner. I want to let him know the wholesale prices of some of my selected products." Alice's options:
1. Send the decryption key to Bob. Bob is then able to see other data that are not intended to be shared (infeasible).
2. Send decrypted data to Bob. This incurs high processing and communication costs for Alice (doable but expensive; this is the baseline).
Applications of data sharing
1. Alice is a company user of the SP. Alice hires Bob, a data analytics expert, to perform analysis, so she has to share some of her data with Bob.
2. Alice and Bob are business partners. They share some data to gain advantages, e.g., more market information.
Data sharing problem
Adversary model: the SP and Bob are both compromised by the same attacker, i.e., the attacker can observe everything seen by the SP and Bob and can control the actions of both.
Sharing goals:
– Alice defines a subset $D_S$ of her data to be shared with Bob.
– Functional: Bob observes the plain values of $D_S$.
– Security: Bob and the attacker cannot observe plain values of data outside $D_S$; the SP cannot observe any data.
– Performance: low processing and communication costs for Alice.
Overview of our schemes
Proposed encryption framework: relational-based encryption (RBE), which supports data sharing.
– Basic scheme: hash-based construction
– Constructions with a precomputed index:
– Static index
– Trapdoor-based index for arbitrary sharing
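Relational-based encryption (RBE)
Idea: use an individual key to encrypt each data item. For a table with columns A, B, C and tuples $(a_1, b_1, c_1)$ and $(a_2, b_2, c_2)$, the value key table holds one key per cell ($k_{11}, k_{12}, k_{13}, k_{21}, k_{22}, k_{23}$), and plain values combined with the value key table give the encrypted values $a'_1, b'_1, c'_1, a'_2, b'_2, c'_2$. To share $b_1$, Alice gives $k_{12}$ to Bob. Bob can decrypt only $b'_1$; the other values are safe because Bob does not have their value keys.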
How to maintain the value key table?
Assumptions:
– We assume there is only one table in the database; the extension to multiple tables is straightforward.
– Model: the table has $n$ tuples and $m$ columns. Each tuple and each column has an ID; $T$ is the set of tuple IDs and $C$ is the set of column IDs.
– A tuple/column ID is not the name of the tuple/column. An ID is not supposed to change once generated; updating a tuple/column ID requires re-encrypting the entire tuple/column.
How to maintain the value key table?
Each data item has an individual value key, e.g., $k_{11} = \mathrm{ValueKeyGen}(t_1, c_1, K)$ for tuple ID $t_1$ and column ID $c_1$. $K$ is the master key held by Alice, so only Alice can generate value keys. ValueKeyGen is a one-way function, i.e., one cannot recover its input from its output (e.g., encryption or a one-way hash).
Solution framework
1. $K = \mathrm{KeyGen}()$. Alice randomly generates the tuple and column IDs and stores them at the SP.
2. $k_{ij} = \mathrm{ValueKeyGen}(t_i, c_j, K)$
3. $v'_{ij} = \mathrm{Enc}(v_{ij}, k_{ij})$
4. $v_{ij} = \mathrm{Dec}(v'_{ij}, k_{ij})$
5. $D_S = \mathrm{ShareProtocol}()$: a protocol among Alice, Bob and the SP that lets Bob observe $D_S$.
Hash-based construction
Assume the data $D_S$ to be shared with Bob is of relational form, i.e., $D_S$ is also a table, with a set of tuples $T_S \subseteq T$ and a set of columns $C_S \subseteq C$. We will show how to remove this assumption later, in the last scheme we propose.
Hash-based construction
1. $K = \mathrm{KeyGen}()$: generate a random bitstring $K_B$.
2. $k_{ij} = \mathrm{ValueKeyGen}(t_i, c_j, K) = h(h(t_i \oplus K_B) \oplus h(c_j \oplus K_B))$
3. $v'_{ij} = \mathrm{Enc}(v_{ij}, k_{ij}) = v_{ij} \oplus k_{ij}$
4. $v_{ij} = \mathrm{Dec}(v'_{ij}, k_{ij}) = v'_{ij} \oplus k_{ij}$
Note that $k_{ij}$ is used to encrypt only the value of tuple $t_i$ at column $c_j$; this is known as a one-time pad.
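The four steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes SHA-256 for $h$, 32-byte tuple/column IDs and $K_B$, and values no longer than the 32-byte value key (so the one-time pad condition holds). The helper names are hypothetical.

```python
import hashlib
import secrets

ID_LEN = 32  # assumption: tuple/column IDs and K_B are 32-byte strings


def h(data: bytes) -> bytes:
    """SHA-256 as a stand-in for the random-oracle hash h."""
    return hashlib.sha256(data).digest()


def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR; the result has the length of the shorter input."""
    return bytes(x ^ y for x, y in zip(a, b))


def key_gen() -> bytes:
    """KeyGen: a random bitstring K_B."""
    return secrets.token_bytes(ID_LEN)


def value_key_gen(t_i: bytes, c_j: bytes, k_b: bytes) -> bytes:
    """k_ij = h(h(t_i xor K_B) xor h(c_j xor K_B))."""
    return h(xor(h(xor(t_i, k_b)), h(xor(c_j, k_b))))


def enc(v: bytes, k: bytes) -> bytes:
    """One-time pad v' = v xor k; the key must be at least as long as v."""
    if len(v) > len(k):
        raise ValueError("value is longer than the 32-byte value key")
    return xor(v, k)


dec = enc  # XOR is its own inverse: v = v' xor k

K_B = key_gen()
t1, c1 = secrets.token_bytes(ID_LEN), secrets.token_bytes(ID_LEN)
k11 = value_key_gen(t1, c1, K_B)
v11 = enc(b"wholesale=42", k11)
assert dec(v11, k11) == b"wholesale=42"
```

Because each $(t_i, c_j)$ pair yields a distinct key, no key is reused across cells, which is what justifies calling the XOR step a one-time pad.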
Sharing protocol
Alice:
– For every $t_i \in T_S$, compute $t'_i = h(t_i \oplus K_B)$ and send $t'_i$ to Bob.
– For every $c_j \in C_S$, compute $c'_j = h(c_j \oplus K_B)$ and send $c'_j$ to Bob.
Bob:
– Compute the value key $k_{ij} = h(t'_i \oplus c'_j)$.
– Use $k_{ij}$ to decrypt the data.
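A sketch of the protocol on a hypothetical 2×2 table, assuming SHA-256 for $h$ and random 32-byte IDs. Alice sends only the hashed IDs $t'_i$ and $c'_j$, never $K_B$, and Bob rebuilds exactly the value keys of the shared cells. The function names `alice_share` and `bob_decrypt` are illustrative.

```python
import hashlib
import secrets


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


# Alice's setup: hypothetical table (A, B) with two tuples
K_B = secrets.token_bytes(32)
T = [secrets.token_bytes(32) for _ in range(2)]  # tuple IDs t_1, t_2
C = [secrets.token_bytes(32) for _ in range(2)]  # column IDs c_A, c_B
plain = [[b"a1", b"b1"], [b"a2", b"b2"]]


def value_key(t: bytes, c: bytes) -> bytes:
    """Alice's k_ij = h(h(t_i xor K_B) xor h(c_j xor K_B))."""
    return h(xor(h(xor(t, K_B)), h(xor(c, K_B))))


# What the SP stores: v'_ij = v_ij xor k_ij
stored = [[xor(plain[i][j], value_key(T[i], C[j])) for j in range(2)] for i in range(2)]


def alice_share(t_s, c_s):
    """Alice sends t'_i = h(t_i xor K_B) and c'_j = h(c_j xor K_B), never K_B."""
    return [h(xor(t, K_B)) for t in t_s], [h(xor(c, K_B)) for c in c_s]


def bob_decrypt(t_prime: bytes, c_prime: bytes, cell: bytes) -> bytes:
    """Bob derives k_ij = h(t'_i xor c'_j) and computes v_ij = v'_ij xor k_ij."""
    return xor(cell, h(xor(t_prime, c_prime)))


# Share D_S = {t_1} x {c_B}, i.e. only the value b1
t_primes, c_primes = alice_share([T[0]], [C[1]])
print(bob_decrypt(t_primes[0], c_primes[0], stored[0][1]))  # b'b1'
```

Alice's work is one hash per shared tuple and one per shared column, matching the $m + n$ sharing cost in the cost analysis.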
Background on the construction
Assumption: random oracle model, in which $h$ is a secure hash function:
– One-way: one cannot derive the input from the output.
– Random: the output is "random", i.e., an adversary cannot distinguish it from a truly random number.
– From $h(a \oplus b)$, an attacker cannot learn $a$, $b$ or $a \oplus b$.
The random oracle model is used to prove the security of many other schemes used in practice, e.g., RSA encryption (with OAEP) and RSA signatures.
Background on the one-time pad
Each key is used only once, and the key must be no shorter than the message. The one-time pad achieves perfect secrecy: it is secure even against adversaries with unbounded computational power. Encryption and decryption are a simple function: $v' = v \oplus k$ (and $v = v' \oplus k$).
Security
The adversary (with the SP's and Bob's views combined) sees:
– the encrypted data $v'_{11}, v'_{12}, v'_{21}, v'_{22}$;
– $t'_i = h(t_i \oplus K_B)$ and $c'_j = h(c_j \oplus K_B)$;
– the tuple IDs $t_i$ and column IDs $c_j$;
– the shared data [Figure: of the plain data and value key tables, only a few entries, e.g., $k_{12}$ and $v_{22}$, are known; the rest are unknown].
From an encrypted value alone, the adversary cannot derive any information about the plain value or its value key (one-time pad). Knowing both $t_i$ and $t'_i$, the adversary still cannot derive $K_B$ (random oracle model).
Optimization
Since $k_{ij} = h(h(t_i \oplus K_B) \oplus h(c_j \oplus K_B))$, to encrypt or decrypt the entire table:
– Compute $t'_i = h(t_i \oplus K_B)$ for all tuples ($n$ hashes); it is the same for all values of the same tuple.
– Compute $c'_j = h(c_j \oplus K_B)$ for all columns ($m$ hashes); it is the same for all values of the same column.
– Compute $k_{ij} = h(t'_i \oplus c'_j)$ for every value ($mn$ hashes).
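To make the $n + m + mn$ count concrete, the sketch below encrypts a small hypothetical $3 \times 4$ table and counts calls to $h$ (SHA-256 is assumed for $h$; the counter is only for illustration). Caching $t'_i$ and $c'_j$ means each is hashed once, instead of recomputing both inner hashes for every cell.

```python
import hashlib
import secrets

hash_calls = 0  # number of times h has been evaluated


def h(data: bytes) -> bytes:
    """SHA-256 stand-in for h that also counts its invocations."""
    global hash_calls
    hash_calls += 1
    return hashlib.sha256(data).digest()


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt_table(table, tuple_ids, col_ids, k_b):
    """Encrypt (or decrypt) an n x m table using n + m + mn evaluations of h."""
    t_prime = [h(xor(t, k_b)) for t in tuple_ids]  # n hashes, shared by a whole tuple
    c_prime = [h(xor(c, k_b)) for c in col_ids]    # m hashes, shared by a whole column
    return [[xor(v, h(xor(tp, cp))) for v, cp in zip(row, c_prime)]  # mn hashes
            for row, tp in zip(table, t_prime)]


n, m = 3, 4
K_B = secrets.token_bytes(32)
T = [secrets.token_bytes(32) for _ in range(n)]
C = [secrets.token_bytes(32) for _ in range(m)]
table = [[f"v{i}{j}".encode() for j in range(m)] for i in range(n)]
encrypted = encrypt_table(table, T, C, K_B)
print(hash_calls)  # 19 = n + m + mn; recomputing both inner hashes per cell would take 3mn = 36
```

Since encryption is XOR, calling `encrypt_table` again on the ciphertext with the same IDs and $K_B$ decrypts it.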
Cost analysis

| | RBE-HB | AES |
|---|---|---|
| KeyGen | $O(1)$ | $O(1)$ |
| Encryption/decryption | $mn + m + n$ hashes* | $mn$ encryptions/decryptions* |
| Sharing: Alice's computation | $m + n$ hashes | $mn$ decryptions |
| Sharing: Alice's communication | $2m + 2n$ (one copy between SP and Alice; one between Alice and Bob) | $2mn$ values (one copy between SP and Alice; one between Alice and Bob) |
| Bob's decryption cost | $mn$ | Nil |

*SHA-256 and AES have similar performance, about 100k operations per second. Any encryption function can serve as our hash function.
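As a back-of-envelope illustration of Alice's sharing computation, assume the whole table is shared, with $n = 100{,}000$ tuples and $m = 20$ columns, and take the footnote's rate of about $10^5$ operations per second for both SHA-256 and AES:

$$
\text{RBE-HB: } m + n = 100{,}020 \text{ hashes} \approx 1\ \text{s}, \qquad
\text{AES: } mn = 2 \times 10^{6} \text{ decryptions} \approx 20\ \text{s}.
$$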
Need for indexing support
Problem of the basic construction:
– The number of tuples, $n$, is usually large.
– Data sharing therefore still has a high cost for Alice.
The problem cannot be resolved without an index:
– Alice needs a way to define the sharing space $D_S$.
– There are $2^n$ possible combinations of tuples.
– Denoting one combination therefore takes at least $\lg(2^n) = n$ bits on average.
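To give the counting argument a scale, take the roughly 10 million tuples considered in the experiment plan. Without an index, merely naming an arbitrary tuple subset costs on the order of a megabyte, before any keys are sent:

$$
\lg\left(2^{n}\right) = n \text{ bits}, \qquad n = 10^{7} \;\Rightarrow\; 10^{7} \text{ bits} = 1.25 \times 10^{6} \text{ bytes}.
$$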
Limiting sharing options by a hierarchy
Assume there is a known hierarchy such that the data being shared can mostly be described by it, e.g., share all sales orders (tuples) of chocolate products. Otherwise, just stick to the basic scheme. A B+-tree or any other structure can also be used; e.g., if sharing is mostly related to time, we can use a B+-tree ordered by time. [Figure: example hierarchy with root All, a child 2013, and below it Biscuits, Chocolate, Candy, ….]
Limiting sharing options by a hierarchy
Alice can choose several nodes in the hierarchy (tree):
– All tuples under the chosen nodes are shared with Bob.
– Each leaf node of the hierarchy is a tuple.
Assume the set $N$ of nodes selected by Alice is small.
Our idea:
– Alice computes an index $\Delta$ and sends $\Delta$ to the SP.
– In sharing, Alice "shares" the selected nodes with Bob; Bob can then communicate with the SP and observe all their descendant nodes, but no other nodes.
– Since $|N| \ll |T_S|$, Alice's sharing cost can be significantly reduced.
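Index structure
[Figure: a binary tree over tuples $t_1, \dots, t_8$. Each node $n_{ij}$, covering $t_i, \dots, t_j$, stores its children encrypted under its node key $k_{n_{ij}}$, plus its own node key encrypted under the master key:
– leaf-level node $n_{12}$: $E(t'_1, k_{n_{12}})$, $E(t'_2, k_{n_{12}})$, $E(k_{n_{12}}, K)$;
– node $n_{14}$: $E(k_{n_{12}}, k_{n_{14}})$, $E(k_{n_{34}}, k_{n_{14}})$, $E(k_{n_{14}}, K)$;
– root $n_{18}$: $E(k_{n_{14}}, k_{n_{18}})$, $E(k_{n_{58}}, k_{n_{18}})$, $E(k_{n_{18}}, K)$.]
Legend: $K$ is the master key owned by Alice, and $K_B$ is part of $K$; $E$ is an encryption function, e.g., AES, with $E(x, k)$ encrypting $x$ under key $k$; $k_{n_{ij}}$ is a node key; $t'_i = h(t_i \oplus K_B)$.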
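The sketch below builds this index bottom-up for eight tuples. Because the Python standard library has no AES, $E$ is replaced by a toy stream cipher (a random nonce followed by the plaintext XORed with SHA-256(key ‖ nonce)). It is a labeled stand-in for illustration only, not a vetted cipher. The dict-based node layout and the names `build_index`, `entries`, `wrapped` and `children` are hypothetical, and the tuple count is assumed to be a power of two.

```python
import hashlib
import secrets


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def E(plaintext: bytes, key: bytes) -> bytes:
    """Toy stand-in for AES: nonce || plaintext xor SHA-256(key || nonce), for <= 32-byte inputs."""
    nonce = secrets.token_bytes(16)
    return nonce + xor(plaintext, h(key + nonce))


def D(ciphertext: bytes, key: bytes) -> bytes:
    """Inverse of E: strip the nonce and XOR the same keystream back out."""
    nonce, body = ciphertext[:16], ciphertext[16:]
    return xor(body, h(key + nonce))


def build_index(t_primes, K):
    """Build the binary index bottom-up; node keys appear only inside E(...)."""
    level = []  # (node, node_key) pairs for the current level
    for i in range(0, len(t_primes), 2):
        k = secrets.token_bytes(32)
        node = {"entries": [E(tp, k) for tp in t_primes[i:i + 2]],
                "wrapped": E(k, K), "children": None}
        level.append((node, k))
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), 2):
            k = secrets.token_bytes(32)
            pairs = level[i:i + 2]
            node = {"entries": [E(child_key, k) for _, child_key in pairs],
                    "wrapped": E(k, K), "children": [child for child, _ in pairs]}
            parents.append((node, k))
        level = parents
    return level[0][0]  # the root; this whole structure is what the SP stores


K = secrets.token_bytes(32)    # Alice's master key used to wrap node keys
K_B = secrets.token_bytes(32)  # K_B, the other part of Alice's master key
tuple_ids = [secrets.token_bytes(32) for _ in range(8)]
t_primes = [h(xor(t, K_B)) for t in tuple_ids]
root = build_index(t_primes, K)  # n18 -> (n14, n58) -> (n12, n34, n56, n78)

# Alice can unwrap any node key with K and walk down: n18 -> n14 -> n12 -> t'_1
k_n18 = D(root["wrapped"], K)
k_n14 = D(root["entries"][0], k_n18)
k_n12 = D(root["children"][0]["entries"][0], k_n14)
t1_prime = D(root["children"][0]["children"][0]["entries"][0], k_n12)
assert t1_prime == t_primes[0]
```

Each node key is reachable two ways: through its parent's entry, or through its own $E(k, K)$ wrapper. Only the second route requires $K$, which is what lets Alice jump straight to any node.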
Index maintenance
It is too cumbersome to discuss everything here. Key issues:
– Inserting an entry: re-encrypt the entry using the new parent's node key.
– Deleting an entry: trivial.
– Ensuring Bob can still see the originally shared data after any update: version control by the SP.
– Preventing Bob from seeing more data when an unshared entry is added to a shared node: the parent node regenerates its node key and re-encrypts all its entries.
Sharing using the index (Alice, SP, Bob)
1. Alice maintains an index at the SP.
2. Alice issues a query to the SP, e.g., she wishes to share the sales records of January. The SP finds the shared nodes in the index; all tuples covered by these nodes are to be shared with Bob.
3. Alice retrieves the shared node information.
4. Alice processes the shared node information and sends it to Bob.
5. Bob communicates with the SP and decrypts all shared data.
Sharing
[Figure: the index over $t_1, \dots, t_8$, with shared tuples $t_1, \dots, t_4$.] Alice just needs to share node $n_{14}$, which covers all the shared tuples.
Sharing
Alice retrieves $E(k_{n_{14}}, K)$ from the SP, decrypts it with $K$, and sends $k_{n_{14}}$ to Bob. With $k_{n_{14}}$, Bob can decrypt node $n_{14}$ and all its descendant nodes.
Sharing
Bob obtains $t'_1 = h(t_1 \oplus K_B)$, $t'_2 = h(t_2 \oplus K_B)$, $t'_3 = h(t_3 \oplus K_B)$ and $t'_4 = h(t_4 \oplus K_B)$. At the same time, Alice sends the same column information as in the hash-based construction. Data decryption is the same as in the hash-based construction.
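The sketch below shows Bob's side of index-based sharing: given only $k_{n_{14}}$, he decrypts the subtree rooted at $n_{14}$ and recovers $t'_1, \dots, t'_4$. It repeats the toy setup of a hash-based stand-in for AES (not a vetted cipher) and the hypothetical dict nodes so that it runs on its own. `bob_collect` is an illustrative name.

```python
import hashlib
import secrets


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def E(plaintext: bytes, key: bytes) -> bytes:
    """Toy stand-in for AES (not a vetted cipher)."""
    nonce = secrets.token_bytes(16)
    return nonce + xor(plaintext, h(key + nonce))


def D(ciphertext: bytes, key: bytes) -> bytes:
    nonce, body = ciphertext[:16], ciphertext[16:]
    return xor(body, h(key + nonce))


def build_index(t_primes, K):
    """Binary index as (node, key) pairs built bottom-up; returns the root node."""
    level = []
    for i in range(0, len(t_primes), 2):
        k = secrets.token_bytes(32)
        level.append(({"entries": [E(tp, k) for tp in t_primes[i:i + 2]],
                       "wrapped": E(k, K), "children": None}, k))
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), 2):
            k, pairs = secrets.token_bytes(32), level[i:i + 2]
            parents.append(({"entries": [E(ck, k) for _, ck in pairs], "wrapped": E(k, K),
                             "children": [c for c, _ in pairs]}, k))
        level = parents
    return level[0][0]


def bob_collect(node, node_key):
    """Recover every t'_i below `node` using only its node key (Bob's view)."""
    plain = [D(entry, node_key) for entry in node["entries"]]
    if node["children"] is None:
        return plain  # leaf-level node: entries decrypt to t'_i values
    result = []
    for child, child_key in zip(node["children"], plain):
        result.extend(bob_collect(child, child_key))
    return result


K, K_B = secrets.token_bytes(32), secrets.token_bytes(32)
t_primes = [h(xor(secrets.token_bytes(32), K_B)) for _ in range(8)]
root = build_index(t_primes, K)

# Alice: fetch E(k_n14, K) from the SP, unwrap it with K, send k_n14 to Bob
n14 = root["children"][0]
k_n14 = D(n14["wrapped"], K)

# Bob: k_n14 alone yields t'_1..t'_4, exactly the shared tuples
assert bob_collect(n14, k_n14) == t_primes[:4]
```

Alice's cost here is one unwrap and one key sent per shared node, independent of how many tuples lie beneath it.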
Security
$K$ is never given out, so entries of the form $E(x, K)$ are useless to the attacker. A node key $k_{n_{ij}}$ is known to Bob (and hence to the attacker) if and only if the node lies in a shared subtree. Otherwise, $k_{n_{ij}}$ cannot be derived by Bob or the attacker.
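Cost

| | RBE-Index | RBE-HB | AES |
|---|---|---|---|
| KeyGen | $O(1)$ | $O(1)$ | $O(1)$ |
| Encryption/decryption | $O(mn)$ | $O(mn)$ | $O(mn)$ |
| Index construction (all tuples at once) | $O(n)$ | – | – |
| Sharing: Alice's computation | $O(x + m)$ | $O(m + n)$ | $O(mn)$ |
| Sharing: Alice's communication | $O(x + m)$ | $O(m + n)$ | $O(mn)$ |
| Bob's decryption cost | $O(mn)$ | $O(mn)$ | – |

$x$: number of shared nodes.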
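Sharing of data in arbitrary form
Arbitrarily shaped shared data can be described as the union of multiple sub-tables. For example, sharing $a_1$, $a_2$ and $b_1$ from the table (A, B, C) with tuples $(a_1, b_1, c_1)$ and $(a_2, b_2, c_2)$ is the union of the sub-table $\{t_1, t_2\} \times \{A\}$ and the sub-table $\{t_1\} \times \{B\}$. In the basic schemes, Alice would send $h(t_1 \oplus K_B)$, $h(t_2 \oplus K_B)$, $h(c_A \oplus K_B)$ and $h(c_B \oplus K_B)$. This reveals more information than intended, because Bob can also derive the value key of $b_2$.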
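The leak can be reproduced with the hash-based construction (SHA-256 assumed for $h$, random 32-byte IDs, hypothetical variable names). Once Alice reveals $t'_2$ for column A and $c'_B$ for $b_1$, nothing stops Bob from pairing them.

```python
import hashlib
import secrets


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


K_B = secrets.token_bytes(32)
t1, t2, c_A, c_B = (secrets.token_bytes(32) for _ in range(4))
plain = {(t1, c_A): b"a1", (t1, c_B): b"b1", (t2, c_A): b"a2", (t2, c_B): b"b2"}


def key(t: bytes, c: bytes) -> bytes:
    """Basic-scheme value key k_ij = h(h(t_i xor K_B) xor h(c_j xor K_B))."""
    return h(xor(h(xor(t, K_B)), h(xor(c, K_B))))


stored = {cell: xor(v, key(*cell)) for cell, v in plain.items()}  # held by the SP

# To share {a1, a2, b1}, Alice must reveal t'_1, t'_2, c'_A and c'_B
t1p, t2p = h(xor(t1, K_B)), h(xor(t2, K_B))
cAp, cBp = h(xor(c_A, K_B)), h(xor(c_B, K_B))

# Bob combines t'_2 with c'_B, a pair Alice never meant to share
leaked = xor(stored[(t2, c_B)], h(xor(t2p, cBp)))
print(leaked)  # b'b2'
```

The value key is a function of a tuple part and a column part that Bob receives separately, so any revealed tuple can be combined with any revealed column. This cross-product closure is why the trapdoor-based index binds each hint to a specific column and subtree.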
Naïve idea
Use multiple indexes, one per column, whose leaf-level entries hold value keys instead of tuple IDs, e.g., node $n_{12}$ stores $E(k_{11}, k_{n_{12}})$, $E(k_{21}, k_{n_{12}})$ and $E(k_{n_{12}}, K)$, with the upper levels as before. This causes significant space overhead and index maintenance overhead.
Trapdoor-based index
A single index serves all columns. The index acts like a function (trapdoor) and is RSA-based. For example, to let Bob get the value keys of column B of the table (A, B, C) with tuples $t_1$ and $t_2$, Alice gives him some hint $h_x$. From it, Bob derives $k_{12}$ and $k_{22}$, but not the value keys of other columns or tuples.
RBE-Trapdoor-based index (RBE-TBI)
1. $K = \mathrm{KeyGen}()$:
– Generate two big primes $p$ and $q$.
– Set $n = pq$ (overloading the symbol $n$ a bit).
– Recall $\Phi(n) = (p-1)(q-1)$. The SP and Bob do not know $\Phi(n)$.
– As in RSA, generate $e$ relatively prime to $\Phi(n)$ and set $d = e^{-1} \bmod \Phi(n)$; then $x^{ed} \bmod n = x$ for any $x$.
– The attacker does not know $\Phi(n)$, so it cannot compute such inverses.
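A toy check of the RSA facts used by KeyGen, with deliberately tiny primes so the identity $x^{ed} \bmod n = x$ can be verified for every residue. Real parameters would use large random primes, and the choice $e = 65537$ is an assumption, not something the scheme prescribes. `pow(e, -1, phi)` (Python 3.8+) computes the modular inverse, which is exactly the step that needs $\Phi(n)$.

```python
from math import gcd

# Toy parameters only: real deployments need large random primes
p, q = 61, 53
n = p * q                # 3233
phi = (p - 1) * (q - 1)  # 3120, known only to Alice
e = 65537
assert gcd(e, phi) == 1  # e must be invertible modulo phi(n)
d = pow(e, -1, phi)      # d = e^{-1} mod phi(n); requires knowing phi

# x^{ed} mod n == x holds for every residue x (n is a product of distinct primes)
assert all(pow(x, e * d, n) == x for x in range(n))
print(n, phi, d)  # 3233 3120 2753
```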
RBE-Trapdoor-based index
2. $k_{ij} = \mathrm{ValueKeyGen}(t_i, c_j, K) = c_j^{t_i} \bmod n$
3. $v'_{ij} = \mathrm{Enc}(v_{ij}, k_{ij}) = v_{ij} \oplus k_{ij}$ (same as the basic scheme)
4. $v_{ij} = \mathrm{Dec}(v'_{ij}, k_{ij}) = v'_{ij} \oplus k_{ij}$ (same as the basic scheme)
Note: tuple IDs $t_i$ and column IDs $c_j$ are stored encrypted at the SP.
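A sketch of steps 2 to 4. The value key $c_j^{t_i} \bmod n$ is encoded as a fixed-length byte string so it can serve as the XOR pad. For a self-contained demo, the modulus is built from the Mersenne primes $2^{89}-1$ and $2^{107}-1$. That choice is an assumption made purely for illustration; such public, structured primes are insecure. The random integer IDs are likewise hypothetical.

```python
import secrets

# Toy modulus from two known Mersenne primes: fine for a demo, insecure in practice
p, q = 2**89 - 1, 2**107 - 1
n = p * q
KEY_LEN = (n.bit_length() + 7) // 8  # bytes needed to encode a value key


def value_key_gen(t_i: int, c_j: int) -> bytes:
    """k_ij = c_j^{t_i} mod n, encoded as a fixed-length big-endian string."""
    return pow(c_j, t_i, n).to_bytes(KEY_LEN, "big")


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def enc(v: bytes, k: bytes) -> bytes:
    """v' = v xor k, as in the basic scheme; v must fit in the key."""
    if len(v) > len(k):
        raise ValueError("value is longer than the value key")
    return xor(v, k)


dec = enc  # v = v' xor k

t1 = secrets.randbelow(2**64)      # hypothetical tuple ID
c1 = secrets.randbelow(n - 2) + 2  # hypothetical column ID in [2, n)
k11 = value_key_gen(t1, c1)
assert dec(enc(b"cost=10", k11), k11) == b"cost=10"
```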
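Index structure
Alice generates a random $r_x$ for each node $x$:
– $r_x$ is not known to the SP or Bob.
– $r_x$ is coprime to $\Phi(n)$.
[Figure: a root $r_1$ with leaf-level children $r_2$ (tuple $t_1$) and $r_3$ (tuple $t_2$). The root stores $r_1 r_2^{-1} \bmod \Phi(n)$, $r_1 r_3^{-1} \bmod \Phi(n)$ and $E(r_1)$; leaf $r_2$ stores $r_2 t_1 \bmod \Phi(n)$ and $E(r_2)$; leaf $r_3$ stores $r_3 t_2 \bmod \Phi(n)$ and $E(r_3)$.]
Along the path from the root to $t_1$, the stored exponents combine to $r_1 r_2^{-1} \cdot r_2 t_1 \equiv r_1 t_1 \pmod{\Phi(n)}$. The "decryption key" of an entry such as $r_1 r_3^{-1}$ is its inverse $r_1^{-1} r_3$, which cannot be computed without $\Phi(n)$.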
Index maintenance
A bit more complex than in the basic scheme:
– Inserting/moving an entry: simple.
– Deleting an entry: trivial.
– Ensuring Bob can still see the originally shared data after any update: version control by the SP.
– Preventing Bob from seeing more data when an unshared entry is added to a shared node: go up to the root of the shared subtree. That node regenerates its node key and regenerates all its entries; descendant nodes are not affected.
Sharing
To share $t_1$ and $t_2$ on column $c_1$, Alice gives Bob the hint $c_1^{r_1^{-1}} \bmod n$, where $r_1^{-1}$ is taken modulo $\Phi(n)$. Following the index entries from the root, the exponent becomes $r_1^{-1} \cdot r_1 t_1 \equiv t_1 \pmod{\Phi(n)}$, so Bob obtains $k_{11} = c_1^{t_1} \bmod n$ (and likewise $k_{21}$ through $r_3$). The attacker knows neither $c_1$ nor $r_1$.
Summary: with this hint, the attacker cannot derive the value keys of other tuples or columns.
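The exponent chain Bob follows from the hint down to tuple $t_1$ can be written out as below. It assumes $\gcd(c_1, n) = 1$, so that by Euler's theorem exponents may be reduced modulo $\Phi(n)$, and it takes all inverses modulo $\Phi(n)$:

$$
\Bigl(\bigl(c_1^{\,r_1^{-1}}\bigr)^{r_1 r_2^{-1}}\Bigr)^{r_2 t_1}
\equiv c_1^{\,r_1^{-1} \cdot r_1 r_2^{-1} \cdot r_2 t_1}
\equiv c_1^{\,t_1} = k_{11} \pmod{n}
$$

The same chain through $r_1 r_3^{-1}$ and $r_3 t_2$ gives $k_{21} = c_1^{t_2} \bmod n$. Bob only ever raises values to exponents stored in the index, so he never needs $\Phi(n)$.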
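Sharing summary
To share $a_1$, $a_2$ and $b_1$ from the table (A, B, C) with tuples $(a_1, b_1, c_1)$ and $(a_2, b_2, c_2)$, Alice's work is:
– Generate $c_A^{r_1^{-1}} \bmod n$ (column A for the subtree rooted at $r_1$).
– Generate $c_B^{r_2^{-1}} \bmod n$ (column B for leaf $r_2$ only).
[Figure: the index with root $r_1$ storing $r_1 r_2^{-1} \bmod \Phi(n)$, $r_1 r_3^{-1} \bmod \Phi(n)$ and $E(r_1)$, and leaves $r_2$ and $r_3$ storing $r_2 t_1 \bmod \Phi(n)$, $E(r_2)$ and $r_3 t_2 \bmod \Phi(n)$, $E(r_3)$.]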
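A runnable sketch of this summary. The index has a root $r_1$ and leaves $r_2$ ($t_1$) and $r_3$ ($t_2$); the two hints let Bob recover $k_{11}$, $k_{21}$ and $k_{12}$ without $\Phi(n)$, while applying the column-B hint at leaf $r_3$ does not yield $k_{22}$. As before, the Mersenne-prime modulus and the random integer IDs are illustrative assumptions. The helper `rand_coprime` is hypothetical.

```python
import secrets
from math import gcd

# Toy RSA modulus (insecure, demo only); phi is Alice's secret
p, q = 2**89 - 1, 2**107 - 1
n, phi = p * q, (p - 1) * (q - 1)


def rand_coprime(modulus: int) -> int:
    """Random integer in [2, modulus) that is invertible modulo `modulus`."""
    while True:
        x = secrets.randbelow(modulus - 2) + 2
        if gcd(x, modulus) == 1:
            return x


# Hypothetical IDs: tuples t1, t2 and columns c_A, c_B (column IDs coprime to n)
t1, t2 = secrets.randbelow(2**64) + 1, secrets.randbelow(2**64) + 1
c_A, c_B = rand_coprime(n), rand_coprime(n)

# Index entries stored at the SP: root r1 with leaves r2 (tuple t1) and r3 (tuple t2)
r1, r2, r3 = rand_coprime(phi), rand_coprime(phi), rand_coprime(phi)
root_to_r2 = r1 * pow(r2, -1, phi) % phi
root_to_r3 = r1 * pow(r3, -1, phi) % phi
leaf_r2 = r2 * t1 % phi
leaf_r3 = r3 * t2 % phi

# Alice's hints: column A for the whole subtree at r1, column B at leaf r2 only
hint_A = pow(c_A, pow(r1, -1, phi), n)
hint_B = pow(c_B, pow(r2, -1, phi), n)

# Bob (no phi): raise each hint to the exponents stored along the path
k_11 = pow(pow(hint_A, root_to_r2, n), leaf_r2, n)  # c_A^{t1}
k_21 = pow(pow(hint_A, root_to_r3, n), leaf_r3, n)  # c_A^{t2}
k_12 = pow(hint_B, leaf_r2, n)                      # c_B^{t1}

assert (k_11, k_21, k_12) == (pow(c_A, t1, n), pow(c_A, t2, n), pow(c_B, t1, n))
# Using hint_B at the other leaf does not produce k_22 = c_B^{t2}
assert pow(hint_B, leaf_r3, n) != pow(c_B, t2, n)
```

Each hint is tied to one column and one subtree root, which is what prevents the cross-combination leak of the basic scheme. It also explains why Alice's sharing cost grows as $O(xm)$: one hint per (shared node, column) pair.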
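Cost

| | RBE-TBI | RBE-Index | RBE-HB | AES |
|---|---|---|---|---|
| KeyGen | $O(1)$ | $O(1)$ | $O(1)$ | $O(1)$ |
| Encryption/decryption | $O(mn)$ | $O(mn)$ | $O(mn)$ | $O(mn)$ |
| Index construction (all tuples at once) | $O(n)$ | $O(n)$ | – | – |
| Sharing: Alice's computation | $O(xm)$ | $O(x + m)$ | $O(m + n)$ | $O(mn)$ |
| Sharing: Alice's communication | $O(xm)$ | $O(x + m)$ | $O(m + n)$ | $O(mn)$ |
| Bob's decryption cost | $O(mn)$ | $O(mn)$ | $O(mn)$ | – |

$x$: number of shared nodes.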
Integration with existing encrypted cloud relational databases
CryptDB:
– It already uses a family of encryption schemes, so there may be multiple copies of the same data, each encrypted with a different scheme.
– Our method can simply be added as another encryption scheme to provide a data sharing service to users.
TrustedDB, Cipherbase:
– They use trusted hardware, which can take the role of Alice.
– Query computation is independent of the underlying encryption method.
– Replacing AES with our scheme reduces the load on the trusted hardware, which has much less computing power than an ordinary computer.
Experiment plan (schemes: RBE-TBI, RBE-Index, RBE-HB, AES)
– Encryption/decryption speed test: high cost for RBE-TBI (RSA); low cost for the others.
– Sharing test following the index: vary the sharing size and measure Alice's computation cost, communication cost, and the SP's/Bob's cost, with an index on X and queries a < X < b.
– Sharing test not following the index: vary the sharing size and measure the same costs, with queries a < X < b and c < Y < d (may be dropped if performance is too poor).
– Index maintenance (B+-tree for the test): RBE-TBI should be efficient enough, with moderate overhead; RBE-Index should be efficient enough, with low overhead; not applicable to RBE-HB and AES.
– Dataset: TPC-? data generator, probably 10M tuples; share about 1% to 50%; X may be a meaningful column, e.g., SalesDate.
Expected experiment results
– Encryption/decryption is efficient enough, comparable to AES for the basic cases.
– Sharing cost to Alice is significantly reduced compared to AES. Even when the shared data does not follow the index structure well, it is hopefully no worse than AES.
– The index supports efficient updates.
Implementation progress
Encryption/decryption speed test: RBE-TBI not started; RBE-Index started; RBE-HB optimizing.
Backup
Sample timing output (#Rows: 100k; #Columns: 20; random data; all data shared):
– Run 1: Key generation: (not shown); Data generation: (not shown); Data encryption: (not shown); Trial data decryption: (not shown); Hint generation: 39.59 ms; Peer viewer decryption: (not shown).
– Run 2: Key generation: (not shown); Data generation: (not shown); Data encryption: (not shown); Trial data decryption: (not shown); Hint generation: (not shown); Peer viewer decryption: 29.67 ms.