Keyword search on encrypted data
Keyword search problem
- Linux utility: grep
- Information retrieval
  - Basic operation
  - Advanced operations: relevance analysis and ranking
- Search engines: a highly complicated problem
New settings
- Search data in the cloud
- Filter encrypted s
- Privacy-preserving log retrieval
Basic techniques
- Symmetric encryption
- Public key encryption
- Simple keyword matching
- A little bit of relevance evaluation
Secure keyword search with symmetric encryption
- Paper: Song 2000
- Seed is random, different for each Wi
- Key idea: Li and Ri are self-verifiable
- Advantage of XOR
How to set K?
Setting of ki
- ki = Fk'(Wi), where k' is secret
- To search for W, the user publishes W and k = Fk'(W)
- The server XORs Ci with W, splits the result into <Li, Ri>, and checks whether Ri == Fk(Li)
- The check reveals nothing if Ci is not the ciphertext for W
- Li is random and different for each Wi, so the server cannot find any information from Li
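The scheme above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the construction from the paper: HMAC-SHA256 stands in for the PRF F, each word is hashed to a fixed 32-byte block, and the names prf, pad_word, word_key, encrypt_word and server_match are hypothetical.

```python
import hashlib
import hmac
import secrets

HALF = 16  # bytes in each half of a 32-byte word block (illustrative size)


def prf(key: bytes, msg: bytes) -> bytes:
    """Stand-in for the PRF F: HMAC-SHA256 truncated to HALF bytes (assumption)."""
    return hmac.new(key, msg, hashlib.sha256).digest()[:HALF]


def pad_word(word: str) -> bytes:
    """Map a word to a fixed 32-byte block W (assumption: hash the word)."""
    return hashlib.sha256(word.encode()).digest()


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def word_key(k_prime: bytes, w: bytes) -> bytes:
    """ki = Fk'(Wi): a per-word key derived from the secret k'."""
    return prf(k_prime, w)


def encrypt_word(k_prime: bytes, word: str) -> bytes:
    """Ci = Wi XOR <Li, Ri>, with Li a fresh random seed and Ri = Fki(Li)."""
    w = pad_word(word)
    k_i = word_key(k_prime, w)
    left = secrets.token_bytes(HALF)
    right = prf(k_i, left)
    return xor(w, left + right)


def server_match(cipher: bytes, w: bytes, k: bytes) -> bool:
    """Server side: XOR out W, then check that the pair <Li, Ri> verifies under k."""
    t = xor(cipher, w)
    left, right = t[:HALF], t[HALF:]
    return hmac.compare_digest(right, prf(k, left))


k_prime = secrets.token_bytes(32)
c = encrypt_word(k_prime, "cloud")
w = pad_word("cloud")
print(server_match(c, w, word_key(k_prime, w)))
```

Because Li is drawn fresh for every position, two encryptions of the same word give different ciphertexts, yet both pass the check once the user releases W and k.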
Hidden search
- In the previous schemes, W is revealed to the server
- Weakness: each search has to release the k for W, so it is easy to collect information
- Solution: encrypt Wi with a private key, then XOR with
- Still weaknesses
  - The encryption of Wi has to be deterministic
  - The access pattern is leaked
  - Each search is a linear scan over the whole document collection
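A minimal sketch of the hidden-search idea, under loud assumptions: a keyed HMAC stands in for the deterministic encryption of Wi (so it is not invertible here), HMAC-SHA256 stands in for the PRF, and det_encrypt, trapdoor and scan are hypothetical names. It shows that the server only sees the encrypted word, that the trapdoor is the same every time a word is queried, and that the search is a linear scan.

```python
import hashlib
import hmac
import secrets

HALF = 16  # bytes in each half of a 32-byte block (illustrative size)


def prf(key: bytes, msg: bytes) -> bytes:
    """Stand-in PRF: HMAC-SHA256 truncated to HALF bytes (assumption)."""
    return hmac.new(key, msg, hashlib.sha256).digest()[:HALF]


def det_encrypt(k2: bytes, word: str) -> bytes:
    """Stand-in for deterministic encryption of Wi under a private key (assumption)."""
    return hmac.new(k2, word.encode(), hashlib.sha256).digest()


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt_word(k1: bytes, k2: bytes, word: str) -> bytes:
    x = det_encrypt(k2, word)            # hidden form of the word
    k_i = prf(k1, x)                     # per-word key derived from the hidden word
    left = secrets.token_bytes(HALF)     # fresh random seed
    return xor(x, left + prf(k_i, left))


def trapdoor(k1: bytes, k2: bytes, word: str):
    """What the user releases: the encrypted word and its key, never W itself."""
    x = det_encrypt(k2, word)
    return x, prf(k1, x)


def scan(docs: dict, x: bytes, k: bytes) -> list:
    """Linear scan: the server tests every ciphertext block of every document."""
    hits = []
    for doc_id, blocks in docs.items():
        for c in blocks:
            t = xor(c, x)
            if hmac.compare_digest(t[HALF:], prf(k, t[:HALF])):
                hits.append(doc_id)
                break
    return hits


k1, k2 = secrets.token_bytes(32), secrets.token_bytes(32)
docs = {
    "d1": [encrypt_word(k1, k2, w) for w in ("alpha", "beta")],
    "d2": [encrypt_word(k1, k2, w) for w in ("gamma",)],
}
print(scan(docs, *trapdoor(k1, k2, "beta")))
```

The returned list of matching document ids is exactly the access pattern the server learns, and repeating a query repeats the same trapdoor.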
Typical method for speedy keyword-based search
- Use an “inverted index”
  - word -> doc1:pos, doc2:pos, …
  - or simply word -> doc1, doc2, …
- However, an inverted index reveals the word frequency
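As a small illustration of the simpler form (word -> doc1, doc2, …), the sketch below builds a plaintext inverted index. The function name build_inverted_index and the sample documents are made up for this example; the lengths of the posting lists are exactly the word-frequency information that leaks.

```python
from collections import defaultdict


def build_inverted_index(docs: dict) -> dict:
    """Map each word to the list of ids of the documents that contain it."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        # set() so a document is listed once per word, however often it occurs
        for word in sorted(set(text.lower().split())):
            index[word].append(doc_id)
    return dict(index)


docs = {
    "doc1": "cloud search cloud",
    "doc2": "encrypted search",
}
index = build_inverted_index(docs)
print(index["search"])                            # ['doc1', 'doc2']
print({w: len(ids) for w, ids in index.items()})  # posting-list lengths = frequencies
```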
Recent developments
- Reza Curtmola et al., 2006, “Searchable symmetric encryption: improved definitions and efficient constructions”
- Addresses this problem with a construction proven secure under an indistinguishability-based definition against chosen-keyword attacks
- Allows an inverted index
- Hides word frequency
Setup
- D: the set of documents {D1, …, Dn}
- max: the maximum number of distinct words in a document
- Li: the list of document IDs that contain the keyword wi, plus some dummy entries to reach max
- A: array that contains all elements in Li (max * |D|)
- T: table that contains the )
- Symmetric encryption function: encrypts words and document ids
  - The entry id(Dj) for wi is encoded as enc(wi||j) to make the entries indistinguishable
- Pseudo-random function f
- Two pseudo-random permutation functions
  - one for mapping a word to a table entry
  - one for mapping the index of the next node of Li to the index of array A
Building the index table T
- The key used to encrypt the node N i,
- Unused entries are set to random values of the same size as the existing entries
Generating Li
- With K i,0, we can decrypt all nodes in the list
- For the remaining max - |D(wi)| dummy nodes, store a doc id that already appears in the first |D(wi)| entries
- This can be done with the help of a look-up table I
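The padding step can be illustrated on plaintext ids. This is a simplified sketch: it leaves out the encryption of the nodes and the look-up table I, and pad_posting_list is a hypothetical helper. Dummy entries reuse ids already in the list, so the client can drop them again by de-duplicating.

```python
def pad_posting_list(doc_ids: list, max_len: int) -> list:
    """Pad a non-empty list of document ids to max_len with ids it already contains."""
    if not doc_ids:
        raise ValueError("need at least one real document id to repeat")
    padded = list(doc_ids)
    i = 0
    while len(padded) < max_len:
        padded.append(doc_ids[i % len(doc_ids)])  # dummy entry: a repeated real id
        i += 1
    return padded


padded = pad_posting_list(["d1", "d2"], 5)
print(padded)               # ['d1', 'd2', 'd1', 'd2', 'd1']
print(sorted(set(padded)))  # ['d1', 'd2']
```

Every padded list has the same length, so a frequent word and a rare word look alike by length once the entries are encrypted.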
Search
- Generate the trapdoor
- Search
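One way to picture the two steps is the sketch below, which is an assumption-laden simplification rather than the construction from the paper: HMAC-SHA256 plays both the role of the permutation that maps a word to a table entry and the role of the PRF f that masks the entry, and the table stores an opaque 32-byte value per word in place of the list head. The names build_table, trapdoor and search are hypothetical.

```python
import hashlib
import hmac
import secrets


def prf(key: bytes, word: str) -> bytes:
    """Stand-in for both the word-to-entry permutation and the PRF f (assumption)."""
    return hmac.new(key, word.encode(), hashlib.sha256).digest()


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def build_table(k_pos: bytes, k_mask: bytes, heads: dict) -> dict:
    """T[position(w)] = head(w) XOR f(w); heads maps each word to a 32-byte value."""
    return {prf(k_pos, w): xor(v, prf(k_mask, w)) for w, v in heads.items()}


def trapdoor(k_pos: bytes, k_mask: bytes, word: str):
    """Client side: the table position for the word and the mask to remove."""
    return prf(k_pos, word), prf(k_mask, word)


def search(table: dict, td):
    """Server side: look up the entry and unmask it; None if the word is absent."""
    pos, mask = td
    entry = table.get(pos)
    return None if entry is None else xor(entry, mask)


k_pos, k_mask = secrets.token_bytes(32), secrets.token_bytes(32)
heads = {"cloud": secrets.token_bytes(32), "search": secrets.token_bytes(32)}
table = build_table(k_pos, k_mask, heads)
print(search(table, trapdoor(k_pos, k_mask, "cloud")) == heads["cloud"])
```

Without the trapdoor, the server holds only pseudo-random positions and masked values, so it cannot tell which entry belongs to which word.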
Property
- Each keyword search returns the same number of encrypted document ids, so the attacker cannot distinguish word frequency
Search public-key encrypted data
- The users who encrypt the data (with the public key) can be different from the owner of the private key
Cyclic group
- For example, if G = {g^0, g^1, g^2, g^3, g^4, g^5} mod p is a group, then g^6 = g^0 and G is cyclic
- The order of G is its number of elements (6 in this example)
- g is the generator
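A concrete instance makes the example easier to check. The values g = 3 and p = 7 are chosen here for illustration and are not from the slides; generated_elements is a hypothetical helper that lists the powers of g modulo p until they wrap around to 1.

```python
def generated_elements(g: int, p: int) -> list:
    """List g^0, g^1, ... modulo p, stopping when the powers return to 1."""
    elems, x = [], 1
    for _ in range(p):  # bounded loop, in case g never cycles back to 1
        elems.append(x)
        x = (x * g) % p
        if x == 1:
            break
    return elems


print(generated_elements(3, 7))  # [1, 3, 2, 6, 4, 5]
print(pow(3, 6, 7))              # 1, i.e. g^6 = g^0
```

The six powers of 3 cover every nonzero residue modulo 7, so 3 generates a cyclic group with six elements.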
Bilinear-map construction
- Two groups G1, G2 of prime order p
- A bilinear map: G1 x G1 -> G2
- Properties: