How to Keyword-Search Securely in Cloud Storage Service
Kaoru Kurosawa, Ibaraki University, Japan
ICISC 2014, Dec. 3-5, Chung-Ang University, Korea
Cloud storage services are now available:
  Service                  Provider
  Amazon S3 / Cloud Drive  Amazon
  Google Drive             Google
  OneDrive                 Microsoft
  iCloud                   Apple
  Dropbox
and many more.
We know that documents should be stored encrypted. But then we cannot even do a keyword search.
A Searchable Symmetric Encryption (SSE) scheme solves this problem. It consists of a store phase and a search phase.
In the store phase, the client stores the encrypted files (or documents) and the encrypted index on the server. Client → Server: E(D_1), …, E(D_N), E(Index)
In the search phase, the client sends an encrypted keyword to the server. Client → Server: E(keyword)
The server returns the encrypted files which contain the keyword. Server → Client: E(D_3), E(D_6), E(D_10)
So the client can retrieve the encrypted files which contain a specific keyword while keeping the keyword secret.
SSE has been studied by
– D. Song, D. Wagner and A. Perrig (2000)
– Eu-Jin Goh (2003)
– Golle, Staddon and Waters (2004)
– Y. Chang and M. Mitzenmacher (2005)
– Curtmola, Garay, Kamara and Ostrovsky (2006)
– Peishun Wang, Huaxiong Wang and Josef Pieprzyk (2008)
– Kamara, Papamanthou and Roeder (2012)
– Cash, Jarecki, Jutla, Krawczyk, Rosu and Steiner (2013)
– Cash and Tessaro (2014)
In this talk:
– UC-Secure Searchable Symmetric Encryption
– How to Update Documents Verifiably in Searchable Symmetric Encryption
– Garbled Searchable Symmetric Encryption
First: UC-Secure Searchable Symmetric Encryption, Kaoru Kurosawa and Yasuhiro Ohtaki (FC 2012)
Passive attack: a malicious server tries to break privacy, that is, she tries to learn the keyword and the documents.
Active attack: a malicious server tries to break reliability, that is, she tries to forge or delete some files, or to replace E(D_3) with another file E(D_100).
Curtmola, Garay, Kamara and Ostrovsky (2006) gave a rigorous definition of security against passive attacks (privacy). They also presented a scheme which satisfies their definition.
At FC 2012, we studied reliability and UC security, and proved that Privacy + Reliability = UC security.
  Privacy      Curtmola et al.
  Reliability  Our paper
  UC security  Our paper
Curtmola et al. showed an SSE scheme as follows. Consider the following index.
  Keyword     Documents
  Austin      D_3, D_6, D_10
  Boston      D_8, D_10
  Washington  D_1, D_4, D_8
The client first constructs E(Index) as follows. He chooses a pseudorandom permutation π; E(Index) is an array addressed by the values of π.
He next computes π(Austin, 1), π(Austin, 2) and π(Austin, 3), and writes the indexes 3, 6 and 10 at these addresses of E(Index).
He does the same for each keyword, for example at the addresses π(Boston, 1) and π(Boston, 2).
In the store phase, the client stores this E(Index) and the ciphertext of each file on the server. Client → Server: E(Index), E(D_1), …, E(D_N)
In the search phase, the client sends the trapdoor information t(Austin) = (π(Austin, 1), π(Austin, 2), π(Austin, 3)) to the server.
The server looks up these addresses in E(Index), finds the corresponding indexes 3, 6 and 10,
and returns E(D_3), E(D_6), E(D_10) to the client.
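The store and search phases just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: HMAC-SHA256 stands in for the pseudorandom permutation π, the function names (prf, build_index, trapdoor, server_search) are made up for the sketch, and the client is assumed to know how many addresses to send for a keyword.

```python
import hmac
import hashlib

def prf(key: bytes, *parts) -> bytes:
    """Stand-in for pi(keyword, i): HMAC-SHA256 (an assumption of this sketch)."""
    msg = "|".join(str(p) for p in parts).encode()
    return hmac.new(key, msg, hashlib.sha256).digest()

def build_index(key: bytes, index: dict) -> dict:
    """E(Index): the address pi(keyword, i) holds the i-th document index."""
    enc_index = {}
    for keyword, doc_ids in index.items():
        for i, doc_id in enumerate(doc_ids, start=1):
            enc_index[prf(key, keyword, i)] = doc_id
    return enc_index

def trapdoor(key: bytes, keyword: str, count: int) -> list:
    """t(keyword) = (pi(keyword, 1), ..., pi(keyword, count))."""
    return [prf(key, keyword, i) for i in range(1, count + 1)]

def server_search(enc_index: dict, t: list) -> list:
    """The server only looks up addresses; it never sees the keyword."""
    return [enc_index[addr] for addr in t if addr in enc_index]

key = b"client-secret-key"  # hypothetical client key
index = {"Austin": [3, 6, 10], "Boston": [8, 10], "Washington": [1, 4, 8]}
enc_index = build_index(key, index)
result = server_search(enc_index, trapdoor(key, "Austin", 3))
```

Here result lists the document indexes whose ciphertexts the server would return for Austin.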
This scheme is secure against passive attacks, but it is not secure against active attacks. We will show how to make this scheme verifiable.
A naive approach is to add a MAC to each E(D_i). For the query π(Austin, 1), π(Austin, 2), π(Austin, 3), the server returns the files together with their MACs: (E(D_3), MAC(E(D_3))), (E(D_6), MAC(E(D_6))), (E(D_10), MAC(E(D_10))).
But a malicious server can replace some pair with another (file, MAC) pair, for example (E(D_100), MAC(E(D_100))).
The client cannot detect this cheating, because (E(D_100), MAC(E(D_100))) is also a valid (file, MAC) pair.
In our verifiable scheme, we include π(Austin, 1) in the input of the MAC. So for the query π(Austin, 1), the server returns E(D_3) and Tag_3 = MAC(π(Austin, 1), E(D_3)).
This method works because the MAC authenticates the whole communication: the query π(Austin, 1) together with the answer E(D_3).
In the store phase, the client writes such MAC values in E(Index):
  Address        Content
  π(Austin, 1)   3,  Tag_3 = MAC(π(Austin, 1), E(D_3))
  π(Austin, 2)   6,  Tag_6 = MAC(π(Austin, 2), E(D_6))
  π(Austin, 3)   10, Tag_10 = MAC(π(Austin, 3), E(D_10))
For a query π(Austin, 1), the server looks up this address in E(Index) and returns E(D_3) and Tag_3.
The client checks the validity of Tag_3 = MAC(π(Austin, 1), E(D_3)) for the query π(Austin, 1) and the returned E(D_3).
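The difference between the naive MAC and the address-bound MAC can be illustrated with a small Python sketch. It assumes HMAC-SHA256 as the MAC; the byte strings standing for addresses and ciphertexts are placeholders, and the address "pi(other,1)" is hypothetical.

```python
import hmac
import hashlib

def mac(key: bytes, *parts: bytes) -> bytes:
    """HMAC-SHA256 over length-prefixed parts (stand-in for the MAC in the talk)."""
    h = hmac.new(key, digestmod=hashlib.sha256)
    for p in parts:
        h.update(len(p).to_bytes(4, "big") + p)
    return h.digest()

def client_verify(key: bytes, address: bytes, ciphertext: bytes, tag: bytes) -> bool:
    """Accept only if the tag binds this ciphertext to the queried address."""
    return hmac.compare_digest(mac(key, address, ciphertext), tag)

key = b"mac-key"                     # hypothetical client MAC key
addr = b"pi(Austin,1)"               # placeholder for the address pi(Austin, 1)
c3, c100 = b"E(D3)", b"E(D100)"      # placeholders for ciphertexts

tag3 = mac(key, addr, c3)            # stored in E(Index) at pi(Austin, 1)
tag100 = mac(key, b"pi(other,1)", c100)  # honest tag stored at another address

honest_ok = client_verify(key, addr, c3, tag3)
swap_ok = client_verify(key, addr, c100, tag100)   # server swaps in E(D100)

# Naive scheme: the MAC covers only the ciphertext, so the swap verifies.
naive_tag100 = mac(key, c100)
naive_swap_ok = hmac.compare_digest(mac(key, c100), naive_tag100)
```

In this sketch the honest answer verifies, the swapped pair is rejected by the address-bound check, and the same swap passes the naive check.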
We next consider the definition of security. Security against active attacks consists of privacy and reliability. We define privacy similarly to Curtmola et al., as follows.
Minimum leakage: in the store phase, from E(D_1), …, E(D_N) and E(Index), the server learns |D_1|, …, |D_N| and |{keywords}|.
In the search phase, for t(keyword) the server returns C(keyword) = (E(D_3), E(D_6), E(D_10)) and Tag. This means that the server knows the corresponding indexes {3, 6, 10}.
We call this information (|D_1|, …, |D_N|, |{keywords}| and the corresponding indexes {3, 6, 10}) the minimum leakage.
The privacy definition requires that the server should not be able to learn any more information. To formulate this, we consider a real game and a simulation game.
In the real game, the distinguisher sends D = {D_1, …, D_N}, W = {set of keywords} and Index to the challenger. The challenger returns C = {E(D_1), …, E(D_N)} and I = E(Index).
In the search phase, the distinguisher sends a keyword and the challenger returns t(keyword).
This is repeated for further keywords.
Finally, the distinguisher outputs a bit b = 0 or 1.
In the simulation game, the distinguisher sends D = {D_1, …, D_N}, W = {set of keywords} and Index to the challenger. The challenger gives only the minimum leakage |D_1|, …, |D_N| and |{keywords}| to the simulator, who somehow computes the ciphertexts C = {E(D_1), …, E(D_N)} and I = E(Index).
In the search phase, the distinguisher sends a keyword. The simulator is given only the minimum leakage {3, 6, 10} and somehow computes t(keyword).
This is repeated for further keywords.
Finally, the distinguisher outputs a bit b = 0 or 1.
We say that privacy is satisfied if there exists a simulator such that the real game ≈ the simulation game.
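One common way to make the symbol ≈ precise, stated here as an assumption because the slide does not spell it out, is computational indistinguishability: for every polynomial-time distinguisher, the probabilities of outputting b = 1 in the two games differ only negligibly. The security parameter λ and the notation negl are not from the talk.

```latex
\left| \Pr[\, b = 1 \text{ in the real game} \,] - \Pr[\, b = 1 \text{ in the simulation game} \,] \right| \le \mathrm{negl}(\lambda)
```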
This definition of privacy was given by Curtmola et al., but it looks artificial. Who is the distinguisher?
The server? No. The client? No.
This question will be resolved when we consider UC security. From the viewpoint of UC security, this is a very natural definition of privacy. We will come back to this point later.
Next: reliability. The client sends t(keyword). The honest server returns C(keyword) = {E(D_3), E(D_6), E(D_10)} and Tag.
We say that reliability is satisfied if no server can forge a response (C(keyword)*, Tag*) with C(keyword)* ≠ C(keyword) that the client accepts.
By the way, even if a protocol Σ is secure in the stand-alone setting, it may not be secure if Σ is executed concurrently (for example by Client 1 and Client 2 with the same server), or if Σ is a part of a larger protocol.
Universal Composability (UC) is a framework which guarantees that a protocol Σ is secure even if it is executed concurrently and even if it is a part of a larger protocol.
The notion of UC was introduced by Canetti. He proved that UC security is maintained under general protocol composition.
We formulated the UC security of verifiable SSE schemes. To do so, we defined the ideal functionality F_vSSE as follows.
In the ideal world, the environment Z sends D = {D_1, …, D_N}, W = {set of keywords} and Index to the dummy client.
The dummy client relays them to F_vSSE.
F_vSSE keeps them
and sends the minimum leakage |D_1|, …, |D_N| and |{keywords}| to the UC adversary S.
In the search phase, Z sends a keyword to the dummy client.
The dummy client relays it to F_vSSE.
F_vSSE sends the minimum leakage {3, 6, 10} to S.
The UC adversary S returns Accept or Reject to F_vSSE.
If S returns Reject, F_vSSE sends Reject to the dummy client, and the dummy client relays it to Z.
If S returns Accept, F_vSSE sends {D_3, D_6, D_10} to the dummy client, and the dummy client relays them to Z.
This is an ideal world because: (Correctness) the dummy client receives {D_3, D_6, D_10} correctly, or outputs Reject; (Security) the UC adversary S learns only the minimum leakage.
Further, S can corrupt the dummy server.
Also, Z can interact with S freely.
Z finally outputs 0 or 1.
In the real world, the environment Z sends D = {set of documents}, W = {set of keywords} and Index to the client.
Then the client and the server run the store phase.
In the search phase, Z sends a keyword to the client.
The client and the server run the search phase.
Then the client sends D_3, D_6, D_10 to Z.
An adversary A can corrupt the server.
Further, Z can interact with A freely.
Z finally outputs 0 or 1.
We say that a verifiable SSE scheme is UC-secure if for any adversary A, there exists a UC adversary S such that the real world ≈ the ideal world.
Equivalence (our theorem): a verifiable SSE scheme is UC-secure if and only if it satisfies privacy and reliability. Here we consider non-adaptive adversaries.
Proof. In the real world, Z sends D, W and Index to the client.
The client sends the ciphertexts E(D_1), …, E(D_10), E(Index) to the server.
Suppose that the adversary A corrupts the server
and sends these ciphertexts to Z.
Recall the real game of privacy: the distinguisher sends D, W, Index, and the challenger returns C = {E(D_1), …, E(D_N)} and I = E(Index).
In the UC framework, regard Z as the distinguisher and the client as the challenger.
Then the real world is equivalent to the real game of privacy.
In the ideal world, the dummy client relays D, W, Index to F_vSSE, and S receives only |D_1|, …, |D_N| and |{keywords}|.
Nevertheless, S must be able to send E(D_1), …, E(D_10), E(Index) to Z.
Recall the simulation game of privacy: the simulator is given only the minimum leakage |D_1|, …, |D_N| and |{keywords}|, and somehow computes C = {E(D_1), …, E(D_N)} and I = E(Index).
In the UC framework, regard Z as the distinguisher, the dummy client with F_vSSE as the challenger, and S as the simulator.
Then the ideal world is equivalent to the simulation game of privacy.
The proof of the equivalence proceeds in this way. At first glance, the definition of privacy looked artificial. But as we have now seen, it is very natural from the viewpoint of UC.
Lesson: SSE is a good example for understanding the notion of UC security.
Theorem: our scheme satisfies privacy and reliability if E is CPA-secure and MAC is unforgeable.
Corollary: our scheme is UC-secure.
Next: How to Update Documents Verifiably in Searchable Symmetric Encryption, Kaoru Kurosawa and Yasuhiro Ohtaki (CANS 2013)
Kamara, Papamanthou and Roeder (2012) showed a dynamic SSE scheme in which the client can add, delete and modify documents. However, their scheme is not verifiable.
Our contribution (O = yes, X = no):
                             Verifiable  Dynamic
  Curtmola et al.            X           X
  Our FC 2012 scheme         O           X
  Kamara et al.              X           O
  Our scheme of CANS 2013    O           O
First we show a more efficient SSE scheme than that of Curtmola et al., and a more efficient verifiable SSE scheme than our FC 2012 scheme.
Consider this example.
              D1  D2  D3  D4  D5
  Austin      1   0   1   0   1
  Boston      0   1   0   1   0
  Washington  1   1   1   0   0
In our SSE scheme, the client computes the following table, where PRF means pseudorandom function. The columns correspond to E(D1), E(D2), E(D3), E(D4), E(D5).
  PRF(Austin)      (10101)
  PRF(Boston)      (01010)
  PRF(Washington)  (11100)
He then adds PRF'(keyword) to each row:
  PRF(Austin)      (10101) + PRF'(Austin)
  PRF(Boston)      (01010) + PRF'(Boston)
  PRF(Washington)  (11100) + PRF'(Washington)
The client stores this table on the server.
In the search phase, the client sends PRF(Austin) and PRF'(Austin).
The server decrypts (10101) from the row labeled PRF(Austin)
and returns E(D_1), E(D_3) and E(D_5).
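The bit-vector table can be sketched in Python as follows. The sketch makes two assumptions that the slides do not state: the "+" is read as bitwise XOR, and both PRF and PRF' are instantiated with HMAC-SHA256 under separate keys. All function names and keys are illustrative only.

```python
import hmac
import hashlib

def prf_label(k1: bytes, keyword: str) -> bytes:
    """PRF(keyword): the row label the server sees."""
    return hmac.new(k1, keyword.encode(), hashlib.sha256).digest()

def prf_mask(k2: bytes, keyword: str, nbits: int) -> int:
    """PRF'(keyword): an nbits-bit mask (top bits of HMAC-SHA256, nbits <= 256)."""
    digest = hmac.new(k2, keyword.encode(), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") >> (256 - nbits)

def store(k1: bytes, k2: bytes, index: dict, nbits: int) -> dict:
    """Table row: PRF(w) -> bit vector of w masked with PRF'(w)."""
    return {prf_label(k1, w): int(bits, 2) ^ prf_mask(k2, w, nbits)
            for w, bits in index.items()}

def server_search(table: dict, label: bytes, mask: int, nbits: int) -> list:
    """Unmask the row and list the matching file numbers (1-based)."""
    bits = format(table[label] ^ mask, f"0{nbits}b")
    return [i + 1 for i, b in enumerate(bits) if b == "1"]

index = {"Austin": "10101", "Boston": "01010", "Washington": "11100"}
k1, k2 = b"key-for-PRF", b"key-for-PRF-prime"   # hypothetical keys
table = store(k1, k2, index, 5)
hits = server_search(table, prf_label(k1, "Austin"), prf_mask(k2, "Austin", 5), 5)
```

With these inputs, hits holds the numbers of the files whose ciphertexts the server returns for Austin.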
In our verifiable SSE scheme, the client stores this table together with
  Tag_A = MAC(PRF(Austin), E(D_1), E(D_3), E(D_5))
  Tag_B = MAC(PRF(Boston), E(D_2), E(D_4))
  Tag_W = MAC(PRF(Washington), E(D_1), E(D_2), E(D_3))
In the search phase, the client sends PRF(Austin) and PRF'(Austin), and the server returns E(D_1), E(D_3), E(D_5) and Tag_A.
The client accepts if Tag_A = MAC(PRF(Austin), E(D_1), E(D_3), E(D_5)).
Theorem: the above verifiable SSE scheme satisfies privacy and reliability if E is CPA-secure, PRF and PRF' are pseudorandom functions, and MAC is unforgeable.
Now suppose that the client wants to modify D_1 to D_1'. D_1 contains Austin and Washington.
Therefore, in the update phase, the client must update E(D_1), Tag_A and Tag_W.
We want to do this more efficiently. In the proposed scheme, we break (PRF(Austin), E(D_1), E(D_3), E(D_5)) down into (PRF(Austin), 1, 3, 5) and (1, E(D_1)), …, (5, E(D_5)).
The client authenticates each piece separately: (PRF(Austin), 1, 3, 5), and (1, E(D_1)), …, (5, E(D_5)).
The last problem is how to timestamp (1, E(D_1)), …, (5, E(D_5)). Remember that the client wants to update files.
We can solve this problem by using any authentication scheme which has the timestamp functionality, such as
– Merkle hash tree
– Authenticated skip list
– RSA accumulator (in this talk)
Let x_i = H(i, E(D_i)) for i = 1, …, 5. For simplicity, suppose that x_1, …, x_5 are primes. Then the client computes and keeps A = g^{x_1 x_2 x_3 x_4 x_5} mod N, where N = pq.
In the search phase, the client sends PRF(Austin) and PRF'(Austin). The server returns (1, E(D_1)), (3, E(D_3)), (5, E(D_5)), Tag1 = MAC(PRF(Austin), 1, 3, 5) and y = g^{x_2 x_4} mod N. The client verifies that Tag1 = MAC(PRF(Austin), 1, 3, 5) and that A = y^{x_1 x_3 x_5} mod N (= g^{x_1 ⋯ x_5} mod N).
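The accumulator check A = y^{x_1 x_3 x_5} mod N can be tried out with toy numbers. The sketch below is only an illustration: the modulus is far too small to be secure, the primes standing for x_1, …, x_5 are hard-coded instead of being computed by a hash H, and the helper names are invented for the example.

```python
from math import prod

# Toy parameters (assumption for illustration; real moduli are much larger).
p, q = 1009, 1013
N = p * q
g = 2

# Stand-ins for x_i = H(i, E(D_i)); the talk assumes that they are primes.
x = {1: 101, 2: 103, 3: 107, 4: 109, 5: 113}

# The client computes and keeps A = g^(x_1 x_2 x_3 x_4 x_5) mod N.
A = pow(g, prod(x.values()), N)

def server_witness(returned: set) -> int:
    """y = g^(product of x_i over the files NOT returned) mod N."""
    return pow(g, prod(v for i, v in x.items() if i not in returned), N)

def client_check(acc: int, y: int, returned_x: list) -> bool:
    """Accept if y^(product of the returned x_i) mod N equals the accumulator."""
    return pow(y, prod(returned_x), N) == acc

y = server_witness({1, 3, 5})                      # y = g^(x_2 x_4) mod N
ok = client_check(A, y, [x[1], x[3], x[5]])        # honest answer
forged = client_check(A, y, [x[1], x[3], 127])     # x_5 swapped for another value
```

The honest answer passes the check, while the answer with a substituted value fails it for these toy parameters.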
In the update phase, to modify D_1 to D_1', the client sends only (1, E(D_1')) to the server.
He then updates A to A' = g^{x_1' x_2 x_3 x_4 x_5} mod N, where x_1' = H(1, E(D_1')) and N = pq.
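The slides do not say how the client computes A'. One possible way, shown here purely as an assumption, is for a client who knows p and q to raise A to x_1' · x_1^{-1} mod φ(N), which requires gcd(x_1, φ(N)) = 1. The toy numbers below are illustrative and the result is compared against a direct recomputation.

```python
from math import prod

# Toy parameters (assumption for illustration; not secure sizes).
p, q = 1009, 1013
N, phi = p * q, (p - 1) * (q - 1)
g = 2
x = {1: 101, 2: 103, 3: 107, 4: 109, 5: 113}   # stand-ins for H(i, E(D_i))
A = pow(g, prod(x.values()), N)

x1_new = 127   # stand-in for x_1' = H(1, E(D_1'))

# Shortcut using the factorisation of N (assumed known to the client only):
# A' = A^(x_1' * x_1^{-1} mod phi(N)) mod N.
exponent = (x1_new * pow(x[1], -1, phi)) % phi
A_shortcut = pow(A, exponent, N)

# Direct recomputation A' = g^(x_1' x_2 x_3 x_4 x_5) mod N for comparison.
x_updated = dict(x)
x_updated[1] = x1_new
A_direct = pow(g, prod(x_updated.values()), N)
```

Both computations give the same A', so in this sketch the client would not need to touch x_2, …, x_5 when D_1 changes.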
To delete D_1, modify D_1 to D_1' = delete.
How to add files: please see the paper.
We defined the UC security of verifiable dynamic SSE schemes.
We then proved that the proposed scheme is UC-secure against non-adaptive adversaries under the strong RSA assumption if
– E is CPA-secure
– PRF and PRF' are pseudorandom functions
– H is a collision-resistant hash function
Finally: Garbled Searchable Symmetric Encryption, Kaoru Kurosawa (FC 2014)
So far, I have talked about single-keyword search SSE schemes.
Next, I will talk about multiple-keyword search SSE schemes.
Golle, Staddon and Waters (2004) showed a multiple-keyword SSE scheme which has keyword fields. A client can specify at most one keyword in each keyword field.
       From       To         Subject
  D1   Keyword 1  Keyword 2  Keyword 4
  D2   Keyword 2  Keyword 1  Keyword 5
  D3   Keyword 3  Keyword 2  Keyword 6
In such a scheme, however, it is hard to retrieve the files which contain both Alice and Bob somewhere in the keyword fields.
       From       To         Subject
  D1   Alice      Bob        Keyword 4
  D2   Bob        Keyword 5  Alice
  D3   Keyword 3  Keyword 2  Keyword 6
Wang et al. (2008) showed a keyword-field-free SSE scheme, but it works only for AND search.
Cash et al. (CRYPTO 2013) showed a keyword-field-free SSE scheme which can support any search formula (in the random oracle model).
However, the search formula is revealed to the server and the search phase requires 2 rounds.
At FC 2014, I showed an SSE scheme such that even the search formula is kept secret. Also, it can support any search formula and the search phase requires only 1 round.
               Search formula  Search phase  Search formula secrecy
  Wang et al.  Only AND        1 round       No
  Cash et al.  Any             2 rounds      No
  Proposed     Any             1 round       Yes
The proposed SSE scheme is based on Yao's garbled circuit.
Yao (1982) constructed a secure two-party protocol by using a garbled circuit (GC) and an oblivious transfer (OT): Alice with input x and Bob with input y compute f(x, y).
Since then, garbled circuits have found many applications: multi-party secure protocols, one-time programs, KDM security, verifiable computation, homomorphic computations and others.
The proposed scheme is the first application of garbled circuits to SSE.
A garbled circuit of f is an encoding garble(f) such that one can compute f(X) from garble(f) and label(X) without learning anything about f and X.
However, if garble(f) or label(X) is reused, then some information on (f, X) is leaked.
Recently, Goldwasser et al. constructed a scheme such that garble(f) can be reused. I constructed a scheme such that label(X) can be reused, and applied it to multiple-keyword SSE.
High-level overview of the proposed scheme. Consider this example with files D_1, D_2 and keywords w_1, w_2, w_3.
        w_1  w_2  w_3
  D_1   1    1    1
  D_2   1    0    0
Let X_1 = (111) be the row of D_1 and X_2 = (100) be the row of D_2.
The client computes label(X_1) for D_1 and label(X_2) for D_2.
The client also computes PRF(w_1), PRF(w_2), PRF(w_3), E(D_1) and E(D_2), which gives this table:
           PRF(w_1)  PRF(w_2)  PRF(w_3)
  E(D_1)   label(X_1)
  E(D_2)   label(X_2)
He sends this table to the server.
In the 1st search phase, suppose that the client wants to search on f(w_1, w_2, w_3) = w_1 ⋀ w_2 ⋀ w_3. He computes the garbled circuits of f: Γ_1 for D_1 and Γ_2 for D_2.
The client sends PRF(w_1), …, PRF(w_3), Γ_1, Γ_2 and counter = 1.
The server has the table above.
The server computes f(X_1) = 1 from counter = 1, label(X_1) and the garbled circuit Γ_1.
Similarly, she computes f(X_2) = 0 from counter = 1, label(X_2) and the garbled circuit Γ_2.
Since f(X_1) = 1 and f(X_2) = 0, the server returns E(D_1).
In the 2nd search phase, suppose that the client wants to search on g(w_1, w_2, w_3) = w_1 ⋁ w_2 ⋁ w_3. He computes the garbled circuits of g: Δ_1 for D_1 and Δ_2 for D_2.
The client sends PRF(w_1), …, PRF(w_3), Δ_1, Δ_2 and counter = 2.
The server computes g(X_1) = g(X_2) = 1, and returns E(D_1), E(D_2).
Note that label(X_1) is reused for Γ_1 and Δ_1: f(X_1) = 1 and g(X_1) = 1,
and label(X_2) is reused for Γ_2 and Δ_2: f(X_2) = 0 and g(X_2) = 1.
More details. Bellare et al. (2012) defined garbling schemes and input-circuit privacy. Kurosawa (2014) extended them to extended garbling schemes and label reusable privacy.
The difference is that counter is included in the extended GC generation algorithm (eGC.gen) and in the extended GC evaluation algorithm (eGC.eval).
[Figure: a Boolean circuit f with XOR, AND and OR gates.]
[Figure: the topological circuit f^-.]
The Label.gen algorithm chooses 2 random strings (v_i^0, v_i^1) for each wire i such that their lsbs are different: lsb(v_i^0) ≠ lsb(v_i^1). In the example, wires 1 to 4 have (v_1^0, v_1^1), (v_2^0, v_2^1), (v_3^0, v_3^1), (v_4^0, v_4^1).
label(0000) is the vector (v_1^0, v_2^0, v_3^0, v_4^0).
label(1111) is the vector (v_1^1, v_2^1, v_3^1, v_4^1).
The eGC.gen algorithm takes counter, a Boolean circuit f and all the strings (v_i^0, v_i^1),
and outputs a garbled circuit Γ.
The eGC.eval algorithm takes counter, the topological circuit f^-, the garbled circuit Γ and a label, for example label(0011) = (v_1^0, v_2^0, v_3^1, v_4^1),
and outputs f(0, 0, 1, 1).
Label reusable privacy (informal): even if label(x_1, …, x_n) = (v_1^{x_1}, …, v_n^{x_n}) is reused for multiple garbled circuits Γ_1, Γ_2, …, no information on (x_1, …, x_n) and (f_1, f_2, …) is leaked, where Γ_i is a garbled circuit of f_i.
Our construction of the extended garbling scheme which satisfies label reusable privacy is the same as the usual construction of a garbling scheme, except that counter is included in the input of the hash function H.
For simplicity, consider f(x_1, x_2). Each input wire has two labels: (v_1^0, v_1^1) and (v_2^0, v_2^1).
The eGC.gen algorithm computes
  y_00 = H(counter, v_1^0, v_2^0) ⊕ f(0,0)
  y_01 = H(counter, v_1^0, v_2^1) ⊕ f(0,1)
  y_10 = H(counter, v_1^1, v_2^0) ⊕ f(1,0)
  y_11 = H(counter, v_1^1, v_2^1) ⊕ f(1,1)
Note that the part H(counter, ·, ·) works as a one-time pad.
Roughly speaking, the garbled circuit Γ is a random permutation of (y_00, …, y_11).
More precisely, construct this table:
  lsb(v_1^0)  lsb(v_2^0)  y_00
  lsb(v_1^0)  lsb(v_2^1)  y_01
  lsb(v_1^1)  lsb(v_2^0)  y_10
  lsb(v_1^1)  lsb(v_2^1)  y_11
If lsb(v_1^0) = 0, then the 1st column is (0, 0, 1, 1).
If lsb(v_2^0) = 1, then the 2nd column is (1, 0, 1, 0), so the table is
  0  1  y_00
  0  0  y_01
  1  1  y_10
  1  0  y_11
Then permute the rows in such a way that (00), (01), (10), (11) appear in the first two columns in this order:
  0  0  y_01
  0  1  y_00
  1  0  y_11
  1  1  y_10
The garbled circuit Γ is these 4 bits (y_01, y_00, y_11, y_10).
The eGC.eval algorithm takes counter, label(11) = (v_1^1, v_2^1) and the garbled circuit Γ.
Since lsb(v_1^1) = 1 and lsb(v_2^1) = 0, look at the 3rd row of Γ, which is y_11 = H(counter, v_1^1, v_2^1) ⊕ f(1,1).
Then we can compute f(1,1) from the given inputs: f(1,1) = y_11 ⊕ H(counter, v_1^1, v_2^1).
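The two-input construction above can be sketched in Python. This is a minimal illustration under assumptions: the random oracle H is replaced by the low bit of SHA-256, labels are 128-bit random strings, and the names label_gen, egc_gen and egc_eval are chosen for the sketch rather than taken from the paper. The same labels are reused for an AND circuit with counter = 1 and an OR circuit with counter = 2.

```python
import hashlib
import secrets

def H(counter: int, v1: bytes, v2: bytes) -> int:
    """One-bit hash standing in for the random oracle H (assumption: SHA-256, low bit)."""
    data = counter.to_bytes(8, "big") + v1 + v2
    return hashlib.sha256(data).digest()[-1] & 1

def lsb(v: bytes) -> int:
    return v[-1] & 1

def label_gen() -> tuple:
    """Return (v^0, v^1): random 128-bit strings whose lsbs differ."""
    v0 = secrets.token_bytes(16)
    v1 = bytearray(secrets.token_bytes(16))
    v1[-1] = (v1[-1] & 0xFE) | (1 - lsb(v0))
    return v0, bytes(v1)

def egc_gen(counter: int, f, wire1: tuple, wire2: tuple) -> list:
    """Row (lsb(v_1^a), lsb(v_2^b)) of Gamma holds H(counter, v_1^a, v_2^b) XOR f(a, b)."""
    gamma = [0] * 4
    for a in (0, 1):
        for b in (0, 1):
            row = 2 * lsb(wire1[a]) + lsb(wire2[b])
            gamma[row] = H(counter, wire1[a], wire2[b]) ^ f(a, b)
    return gamma

def egc_eval(counter: int, gamma: list, v1: bytes, v2: bytes) -> int:
    """Pick the row by the lsbs of the given labels and remove the one-time pad."""
    row = 2 * lsb(v1) + lsb(v2)
    return gamma[row] ^ H(counter, v1, v2)

wire1, wire2 = label_gen(), label_gen()
gamma = egc_gen(1, lambda a, b: a & b, wire1, wire2)   # 1st query: AND, counter = 1
delta = egc_gen(2, lambda a, b: a | b, wire1, wire2)   # 2nd query: OR, counter = 2

label_10 = (wire1[1], wire2[0])        # label(10), reused for both circuits
f_out = egc_eval(1, gamma, *label_10)  # AND(1, 0)
g_out = egc_eval(2, delta, *label_10)  # OR(1, 0)
```

Because the counter enters H, the pads used for gamma and delta are independent hash outputs even though the labels are the same.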
Theorem: the above construction satisfies label reusable privacy in the random oracle model.
How to apply the extended garbling scheme to multiple-keyword SSE. Consider this example.
        w_1       w_2       w_3
  D_1   e_11 = 1  e_12 = 1  e_13 = 1
  D_2   e_21 = 1  e_22 = 0  e_23 = 0
The client computes v_11^0 = AES_k(1, 1, 0) and v_11^1 = AES_k(1, 1, 1).
Since e_11 = 1, let v_11 = v_11^1 = AES_k(1, 1, 1).
In this way, the client computes each entry of this table.
        w_1              w_2              w_3
  D_1   v_11 = v_11^1    v_12 = v_12^1    v_13 = v_13^1
  D_2   v_21 = v_21^1    v_22 = v_22^0    v_23 = v_23^0
Let label(X_1) = (v_11, v_12, v_13) for D_1 and label(X_2) = (v_21, v_22, v_23) for D_2.
Namely, for D_1, the client generates the strings (v_11^0, v_11^1), (v_12^0, v_12^1), (v_13^0, v_13^1) by using AES,
and chooses each element of label(X_1) = (v_11, v_12, v_13) from them according to X_1 = (111).
Similarly, for D_2, the client generates the strings (v_21^0, v_21^1), (v_22^0, v_22^1), (v_23^0, v_23^1) by using AES,
and chooses each element of label(X_2) = (v_21, v_22, v_23) from them according to X_2 = (100).
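The label derivation can be sketched as follows. Since the Python standard library has no AES, the sketch substitutes HMAC-SHA256 truncated to 128 bits for AES_k(i, j, b); this substitution, the key and the function names are assumptions. The sketch also does not enforce the condition lsb(v^0) ≠ lsb(v^1), because the slides do not say how the derived strings meet it.

```python
import hmac
import hashlib

def derive(k: bytes, i: int, j: int, b: int) -> bytes:
    """Stand-in for AES_k(i, j, b): HMAC-SHA256 truncated to 128 bits (assumption)."""
    return hmac.new(k, bytes([i, j, b]), hashlib.sha256).digest()[:16]

def label(k: bytes, i: int, bits: tuple) -> list:
    """label(X_i) = (v_i1, ..., v_in) with v_ij = v_ij^(e_ij)."""
    return [derive(k, i, j, e) for j, e in enumerate(bits, start=1)]

k = b"client AES key (placeholder)"
E = {1: (1, 1, 1), 2: (1, 0, 0)}     # e_ij from the example table
labels = {i: label(k, i, bits) for i, bits in E.items()}

# In the search phase the client can re-derive both strings of every wire
# from k alone, so it does not need to remember the labels.
pairs_d1 = [(derive(k, 1, j, 0), derive(k, 1, j, 1)) for j in (1, 2, 3)]
```

Since every string is a deterministic function of k and (i, j, b), the client only has to keep the key, which matches the later remark that he remembers nothing other than the secret keys.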
The client further computes PRF(w_1), PRF(w_2), PRF(w_3), E(D_1) and E(D_2), which gives this table:
           PRF(w_1)  PRF(w_2)  PRF(w_3)
  E(D_1)   label(X_1)
  E(D_2)   label(X_2)
He sends this table to the server.
After the store phase, the client keeps only the secret keys of AES, E, PRF and PRF'. He remembers nothing other than these.
In the search phase, suppose that the client searches on f(w_1, w_2, w_3) = w_1 ⋀ w_2 ⋀ w_3.
For D_1, the client regenerates the strings (v_11^0, v_11^1), …, (v_13^0, v_13^1) by using AES in the same way as in the store phase.
Then the client runs eGC.gen on input counter, f(w_1, w_2, w_3) = w_1 ⋀ w_2 ⋀ w_3 and (v_11^0, v_11^1), …, (v_13^0, v_13^1), and computes the garbled circuit Γ_1.
For D_2, the client computes the garbled circuit Γ_2 similarly.
The client sends PRF(w_1), …, PRF(w_3), Γ_1, Γ_2, the topological circuit f^- and counter.
The server has the table stored in the store phase.
The server runs eGC.eval on input counter, the topological circuit f^-, the garbled circuit Γ_1 and label(X_1), and computes z_1 = f(X_1).
The server returns E(D_1) if z_1 = 1. It does the same for D_2.
Theorem: in the proposed scheme, if the underlying extended garbling scheme satisfies label reusable privacy, then only the following information is leaked to the server (other than the minimum leakage):
– the topological circuit f^-
– (π(j_1), …, π(j_c)), where π is a random permutation and {w_{j_1}, …, w_{j_c}} are the queried keywords
In the scheme of Cash et al. (2013), if "Japan AND Crypto" is searched, the following information is leaked to the server:
– the search formula = AND
– the search result of Japan or that of Crypto
– and some more information (see Sec. 5.3 of their paper)
Communication overhead of the proposed scheme. Let m = # of files, c = # of search keywords, s = # of gates of f. In the search phase, the communication overhead is |counter| + (c + 4m(s-1)) × 128 + 4m bits.
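The formula is easy to evaluate for concrete parameters. The helper below is only a convenience sketch; the function name and the sample values (a 64-bit counter, 1000 files, 3 keywords, 2 gates) are assumptions chosen for illustration, not figures from the talk.

```python
def search_overhead_bits(counter_bits: int, m: int, c: int, s: int) -> int:
    """|counter| + (c + 4m(s-1)) * 128 + 4m bits, as on the slide.

    m = number of files, c = number of search keywords, s = number of gates of f.
    """
    return counter_bits + (c + 4 * m * (s - 1)) * 128 + 4 * m

# Hypothetical sample: 64-bit counter, 1000 files, 3 keywords, 2 gates.
example = search_overhead_bits(counter_bits=64, m=1000, c=3, s=2)
print(example, "bits, about", round(example / 8 / 1024, 1), "KiB")
```

The dominant term is 4m(s-1) × 128, so in this formula the overhead grows linearly in both the number of files and the number of gates.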
If # of search keywords is 2 The communication overhead is |counter| × ( # of files ) bits
Computer simulation. We used the following environment: a 2.4 GHz CPU and 32 GB of RAM, OS = CentOS 6.5, C++ with the NTL library. The total # of keywords is 20. We generated the index randomly.
The running time of the client in the search phase
The running time of the server in the search phase
Comparison of the proposed SSE scheme with previous schemes:
                             Search formula  Search phase  Search formula secrecy
  Wang et al. (2008)         Only AND        1 round       ---
  Cash et al. (CRYPTO 2013)  Any             2 rounds      Leaked
  Kurosawa (FC 2014)         Any             1 round       Secret
Summary:
– UC-Secure Searchable Symmetric Encryption
– How to Update Documents Verifiably in Searchable Symmetric Encryption
– Garbled Searchable Symmetric Encryption
Thank you !