1 Secure Indexes. Author: Eu-Jin Goh. Presented by Yi Cheng Lin.
2 Outline: Introduction; Contribution; Index Scheme; Background; Construction; Choosing Suitable Bloom Filter Parameter
3 Outline: Pseudo-Random Functions; IND-CKA; Z-IDX is an IND-CKA index; Conclusion; Comment
4 Introduction: Keyword indexes let us search in constant time for documents containing specified keywords. Unfortunately, standard index constructions, such as those using hash tables, are unsuitable for indexing encrypted documents.
5 Introduction: In this paper, the author formally defines a secure index that allows a querier with a “trapdoor” for a word x to test in O(1) time whether the index contains x. The index reveals no information about its contents without valid trapdoors.
6 Contribution: The first contribution of this paper is defining a secure index and formulating a security model for indexes known as semantic security against adaptive chosen keyword attack (IND-CKA). Illustration: adversary A holds an index for a document D that contains n words and knows m of them; A cannot learn any of the n − m unknown words.
7 Contribution: The second contribution is an efficient IND-CKA secure index construction called Z-IDX, which is built using pseudo-random functions and Bloom filters. The Z-IDX scheme is efficient.
8 Contribution: 2654 plaintext files (Debian Linux). An index for the average document is roughly kilobytes in size. The largest document in this collection is kilobytes long and its index is kilobytes large. The smallest document is 9 bytes long and its index is 115 bytes large. Indexes can be searched in one second on an 866 MHz Pentium 3 machine. 27.4 megabytes
9 Index Scheme: Keygen($s$): given a security parameter $s$, outputs the master private key $K_{priv}$. Trapdoor($K_{priv}$, $w$): given the master key $K_{priv}$ and a word $w$, outputs the trapdoor $T_w$ for $w$.
10 Index Scheme: BuildIndex($D$, $K_{priv}$): given a document $D$ and the master key $K_{priv}$, outputs the index $I_D$. SearchIndex($T_w$, $I_D$): given the trapdoor $T_w$ for a word $w$ and the index $I_D$ for document $D$, outputs 1 if $w \in D$ and 0 otherwise.
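11 Index Scheme: Alice runs Keygen($s$) to obtain $K_{priv}$ and BuildIndex($D_1$, $K_{priv}$) to obtain $I_{D_1}$, then sends ($I_{D_1}$, $E(D_1)$) to the server to store. The server keeps a table of (index, encrypted data) pairs: ($I_{D_1}$, $E(D_1)$), ($I_{D_2}$, $E(D_2)$), …
12 Index Scheme: Alice, holding $K_{priv}$ from Keygen($s$), computes Trapdoor($K_{priv}$, $w$) to obtain $T_w$ and sends $T_w$ to the server. The server runs SearchIndex($T_w$, $I_{D_i}$) on each stored index and returns the encrypted documents whose index outputs 1 (here $E(D_1)$, …); documents whose index outputs 0 are not returned.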
13 Background: Pseudo-random functions: a pseudo-random function is computationally indistinguishable from a random function. Given pairs $(x_1, f(x_1, k)), \ldots, (x_m, f(x_m, k))$, an adversary cannot predict $f(x_{m+1}, k)$ for any $x_{m+1}$.
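A keyed hash such as HMAC-SHA256 is a common practical stand-in for a pseudo-random function. The sketch below assumes that choice; the slides do not name a concrete $f$, and the function name `prf` and the sample inputs are illustrative only.

```python
import hashlib
import hmac
import os


def prf(x: bytes, k: bytes) -> bytes:
    """Evaluate f(x, k), using HMAC-SHA256 as an assumed PRF stand-in."""
    return hmac.new(k, x, hashlib.sha256).digest()


key = os.urandom(32)          # secret key k, sampled uniformly at random
out1 = prf(b"keyword", key)   # f("keyword", k)
out2 = prf(b"keyword", key)   # same input and key: same output
out3 = prf(b"keywore", key)   # different input: unrelated-looking output
```

Without the key, the outputs are meant to look like independent random strings, which is the property the index construction relies on.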
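14 Background: Bloom filter: represents a set $S = \{s_1, \ldots, s_n\}$ of $n$ elements by an array of $m$ bits. All array bits are initially set to 0. The filter uses $r$ independent hash functions $h_1, \ldots, h_r$, where $h_i : \{0,1\}^* \to [1, m]$ for $i \in [1, r]$. For each element $s \in S$, the bits at positions $h_1(s), \ldots, h_r(s)$ are set to 1.
15 To determine whether an element $a$ belongs to the set $S$, check the bits at positions $h_1(a), h_2(a), \ldots, h_r(a)$. If all bits are 1's, then $a \in S$ (false positives are possible); else $a \notin S$.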
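The Bloom filter described above can be sketched in a few lines. This is a minimal illustration, assuming the $r$ hash functions are derived from SHA-256 by prefixing the index $i$; that derivation, the class name, and the sizes are not from the slides. Positions run from 0 to $m-1$ here rather than 1 to $m$.

```python
import hashlib


class BloomFilter:
    """m-bit array with r hash functions derived from SHA-256 (an assumption)."""

    def __init__(self, m: int, r: int) -> None:
        self.m = m
        self.r = r
        self.bits = [0] * m  # all array bits are initially 0

    def _positions(self, item: bytes) -> list:
        # h_i(item): hash the index i together with the item, reduce mod m.
        return [
            int.from_bytes(hashlib.sha256(bytes([i]) + item).digest(), "big") % self.m
            for i in range(self.r)
        ]

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: bytes) -> bool:
        # True means "probably in S"; False means "definitely not in S".
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(m=1024, r=7)
for word in [b"secure", b"index", b"bloom"]:
    bf.add(word)
```

A membership test such as `b"secure" in bf` never misses an inserted element; it can only err by reporting an element that was never inserted.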
16 Construction: Keygen($s$): given a security parameter $s$, choose a pseudo-random function $f : \{0,1\}^n \times \{0,1\}^s \to \{0,1\}^s$ and the master key $K_{priv} = (k_1, \ldots, k_r) \xleftarrow{R} \{0,1\}^{sr}$. Trapdoor($K_{priv}$, $w$): given the master key $K_{priv} = (k_1, \ldots, k_r) \in \{0,1\}^{sr}$ and a word $w$, output the trapdoor for $w$ as $T_w = (f(w, k_1), \ldots, f(w, k_r)) \in \{0,1\}^{sr}$.
17 Construction: BuildIndex($D$, $K_{priv}$): Input: a document $D$ consisting of an identifier $D_{id} \in \{0,1\}^n$ and a list of words $(w_0, \ldots, w_t) \in \{0,1\}^{nt}$, and $K_{priv} = (k_1, \ldots, k_r) \in \{0,1\}^{sr}$. For each word $w_i$: compute the trapdoor $x_1 = f(w_i, k_1), \ldots, x_r = f(w_i, k_r)$; compute the codeword $y_1 = f(D_{id}, x_1), \ldots, y_r = f(D_{id}, x_r)$; insert the codeword into the Bloom filter BF for $D_{id}$. Output: $I_{D_{id}} = (D_{id}, BF)$.
18 Construction: SearchIndex($T_w$, $I_{D_{id}}$): Input: a trapdoor $T_w = (x_1, \ldots, x_r) \in \{0,1\}^{sr}$ and the index $I_{D_{id}} = (D_{id}, BF)$ for document $D_{id}$. Compute the codeword $y_1 = f(D_{id}, x_1), \ldots, y_r = f(D_{id}, x_r)$. Test whether BF contains 1's in all $r$ locations denoted by $y_1, \ldots, y_r$. If so, output 1; otherwise, output 0.
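The four algorithms on the construction slides can be put together as a rough sketch. It assumes HMAC-SHA256 as the pseudo-random function $f$ and reduces each codeword value modulo $m$ to obtain a bit position; both choices, and all function names and sizes, are illustrative assumptions. It follows only the steps shown on the slides.

```python
import hashlib
import hmac
import os


def f(x: bytes, k: bytes) -> bytes:
    """Assumed PRF stand-in: HMAC-SHA256 of x under key k."""
    return hmac.new(k, x, hashlib.sha256).digest()


def keygen(r: int, s_bytes: int = 32) -> list:
    """K_priv = (k_1, ..., k_r), each key sampled uniformly at random."""
    return [os.urandom(s_bytes) for _ in range(r)]


def trapdoor(k_priv: list, w: bytes) -> list:
    """T_w = (f(w, k_1), ..., f(w, k_r))."""
    return [f(w, k) for k in k_priv]


def codeword_positions(doc_id: bytes, t_w: list, m: int) -> list:
    """y_j = f(D_id, x_j), reduced mod m to get a bit position (an assumption)."""
    return [int.from_bytes(f(doc_id, x), "big") % m for x in t_w]


def build_index(doc_id: bytes, words: list, k_priv: list, m: int) -> tuple:
    """I_Did = (D_id, BF): insert the codeword of every word into the filter."""
    bf = [0] * m
    for w in set(words):
        for pos in codeword_positions(doc_id, trapdoor(k_priv, w), m):
            bf[pos] = 1
    return doc_id, bf


def search_index(t_w: list, index: tuple) -> int:
    """Output 1 if all r codeword positions are set, otherwise 0."""
    doc_id, bf = index
    return int(all(bf[pos] for pos in codeword_positions(doc_id, t_w, len(bf))))


# Alice builds the indexes; the server later searches them with a trapdoor only.
k_priv = keygen(r=7)
i_d1 = build_index(b"D1", [b"secure", b"index"], k_priv, m=512)
i_d2 = build_index(b"D2", [b"bloom", b"filter"], k_priv, m=512)
t_w = trapdoor(k_priv, b"secure")
hits = [search_index(t_w, idx) for idx in (i_d1, i_d2)]
```

Because the codeword depends on $D_{id}$, the same word sets different bit positions in different documents' filters, so the server cannot correlate indexes without a trapdoor.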
19 Choosing Suitable Bloom Filter Parameter: With hash functions $h_1, \ldots, h_r$, insert $n$ distinct elements into an array of size $m$. The probability that bit $i$ in the array is 0 is $(1 - 1/m)^{rn} \approx e^{-rn/m}$. The probability of a false positive is $(1 - (1 - 1/m)^{rn})^r \approx (1 - e^{-rn/m})^r$.
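As a quick numeric check of how close the exponential approximation is, the sketch below evaluates both forms of the false positive probability. The values of $m$, $n$, and $r$ are hypothetical and chosen only for illustration.

```python
import math


def fp_exact(m: int, n: int, r: int) -> float:
    """(1 - (1 - 1/m)^(rn))^r: false positive rate before approximating."""
    return (1 - (1 - 1 / m) ** (r * n)) ** r


def fp_approx(m: int, n: int, r: int) -> float:
    """(1 - e^(-rn/m))^r: the exponential approximation."""
    return (1 - math.exp(-r * n / m)) ** r


m, n, r = 10_000, 1_000, 7   # hypothetical parameters
exact = fp_exact(m, n, r)
approx = fp_approx(m, n, r)
```

For these parameters the two values agree to several decimal places, which is why the simpler exponential form is used when choosing $m$.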
20 Choosing Suitable Bloom Filter Parameter: Set the false positive rate to $fp = (1/2)^r = (1 - e^{-rn/m})^r$. Then $1/2 = 1 - e^{-rn/m}$, so $1/2 = e^{-rn/m}$, $\ln(1/2) = -rn/m$, $\ln 2 = r(n/m)$, and therefore $m = rn / \ln 2$.
21 Choosing Suitable Bloom Filter Parameter fp = 0.01 r = 7 fp = r = 10 n = n = Choose suitable m
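The parameter choice can be sketched as a small helper: pick the smallest $r$ with $(1/2)^r \le fp$, then set $m = rn / \ln 2$ rounded up. The function names and the value of $n$ below are hypothetical; only $fp = 0.01$ and the resulting $r = 7$ come from the slide.

```python
import math


def choose_parameters(fp: float, n: int) -> tuple:
    """Return (r, m) with (1/2)^r <= fp and m = rn / ln 2, rounded up."""
    r = math.ceil(math.log2(1 / fp))
    m = math.ceil(r * n / math.log(2))
    return r, m


def false_positive_rate(m: int, n: int, r: int) -> float:
    """(1 - e^(-rn/m))^r, the approximation used in the derivation."""
    return (1 - math.exp(-r * n / m)) ** r


n = 1000                           # hypothetical number of distinct words
r, m = choose_parameters(0.01, n)  # fp = 0.01 gives r = 7, as on the slide
achieved = false_positive_rate(m, n, r)
```

With this choice roughly half the bits of the filter are set, and the achieved rate stays just below the target.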
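22 Pseudo-Random Functions: $f : \{0,1\}^n \times \{0,1\}^s \to \{0,1\}^m$ is a $(t, \epsilon, q)$-pseudo-random function if, for any $t$-time oracle algorithm $A$ that makes at most $q$ adaptive queries, $|\Pr[A^{f(\cdot,k)} = 0 \mid k \xleftarrow{R} \{0,1\}^s] - \Pr[A^g = 0 \mid g \xleftarrow{R} \{F : \{0,1\}^n \to \{0,1\}^m\}]| < \epsilon$.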
23 IND-CKA: Setup: Challenger C creates a set $S$ of $q$ words and gives $S$ to adversary A. A chooses a number of subsets from $S$; this collection of subsets is called $S^*$ and is returned to C. C builds an index for each subset in $S^*$ and sends the indexes to A. Queries: A may query C on a word $x$ and receives the trapdoor $T_x$ for $x$.
24 IND-CKA: Challenge: A picks a non-empty subset $V_0 \in S^*$ and generates another non-empty subset $V_1$ from $S$ such that $|V_0 - V_1| \neq 0$, $|V_1 - V_0| \neq 0$, and the total length of words in $V_0$ is equal to that in $V_1$. Next, A gives $V_0$ and $V_1$ to C, who chooses $b \xleftarrow{R} \{0,1\}$, invokes BuildIndex($V_b$, $K_{priv}$) to obtain the index $I_{V_b}$ for $V_b$, and returns $I_{V_b}$ to A.
25 IND-CKA: Response: A eventually outputs a bit $b'$, representing its guess for $b$. The advantage of A in winning this game is defined as $\mathrm{Adv}_A = |\Pr[b = b'] - 1/2|$. We say that an adversary A $(t, \epsilon, q)$-breaks an index if $\mathrm{Adv}_A$ is at least $\epsilon$ after A takes at most $t$ time and makes $q$ trapdoor queries to the challenger. We say that I is a $(t, \epsilon, q)$-IND-CKA secure index if no adversary can $(t, \epsilon, q)$-break it, that is, $\mathrm{Adv}_A = |\Pr[b = b'] - 1/2| < \epsilon$.
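To make the advantage $|\Pr[b = b'] - 1/2|$ concrete, the sketch below plays a simplified version of the game against a deliberately insecure toy index that stores its words in the clear. It covers only the Challenge and Response phases (no Setup phase, no trapdoor queries), and every name in it is hypothetical.

```python
import random


def insecure_build_index(words):
    # Toy index that stores the words in the clear (deliberately insecure).
    return frozenset(words)


def play_round(build_index, adversary, v0, v1, rng):
    """One challenge round: C picks b, builds I_{V_b}, A returns a guess b'."""
    b = rng.randrange(2)
    index = build_index(v0 if b == 0 else v1)
    return adversary(index, v0, v1) == b


def estimate_advantage(build_index, adversary, v0, v1, rounds=1000, seed=0):
    """Empirical |Pr[b = b'] - 1/2| over independent rounds."""
    rng = random.Random(seed)
    wins = sum(play_round(build_index, adversary, v0, v1, rng) for _ in range(rounds))
    return abs(wins / rounds - 0.5)


def membership_adversary(index, v0, v1):
    # Picks a word that is in V_0 but not in V_1 and checks the index for it.
    probe = next(iter(v0 - v1))
    return 0 if probe in index else 1


v0 = frozenset({"alpha", "gamma"})   # |V_0 - V_1| != 0
v1 = frozenset({"bravo", "gamma"})   # |V_1 - V_0| != 0, same total word length
adv = estimate_advantage(insecure_build_index, membership_adversary, v0, v1)
```

Against the plaintext index the adversary guesses $b$ correctly in every round, so its advantage is the maximum possible; an IND-CKA secure index must keep this quantity below $\epsilon$.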
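26 Z-IDX is an IND-CKA index: Theorem 3.2. If $f$ is a $(t, \epsilon, q)$-pseudo-random function, then Z-IDX is a $(t, \epsilon, q/2)$-IND-CKA index. We prove the contrapositive (¬q → ¬p): if Z-IDX is not a $(t, \epsilon, q/2)$-IND-CKA index, then $f$ is not a $(t, \epsilon, q)$-pseudo-random function.
27 Z-IDX is an IND-CKA index: Proof: Suppose Z-IDX is not a $(t, \epsilon, q/2)$-IND-CKA index, i.e. there is an algorithm A that $(t, \epsilon, q/2)$-breaks Z-IDX. We build an algorithm B that uses A to determine whether $f$ is a pseudo-random function or a random function. B has oracle access to the unknown function $f$, which takes as input $x \in \{0,1\}^n$ and returns $f(x) \in \{0,1\}^s$.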
28 Z-IDX is an IND-CKA index: Setup: algorithm B creates a set $S$ of $q/2$ words and gives $S$ to algorithm A. A chooses a number of subsets from $S$; this collection of subsets is called $S^*$ and is returned to B. B builds an index for each subset in $S^*$ and sends the indexes to A. Queries: A may query B on a word $x$ and receives the trapdoor $T_x$ for $x$.
29 Z-IDX is an IND-CKA index: Response: A eventually outputs a bit $b'$, representing its guess for $b$. If $b' = b$, then B outputs 0, indicating that it guesses that $f$ is a pseudo-random function. Otherwise, B outputs 1. B takes at most $t$ time because A takes at most $t$ time. Furthermore, B makes at most $q$ queries to $f$ because there are only $q/2$ strings in $S$ and A makes at most $q/2$ queries.
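30 Z-IDX is an IND-CKA index: Claim 1: When $f$ is a pseudo-random function, B simulates the challenger for A, so $|\Pr[B^f = 0] - 1/2| = \mathrm{Adv}_A \geq \epsilon$. Claim 2: When $f$ is a random function, $\Pr[B^f = 0] = 1/2$.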
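31 Z-IDX is an IND-CKA index: By Claim 1 and Claim 2, B distinguishes a pseudo-random $f$ from a random function with advantage at least $\epsilon$. But if $f$ is a $(t, \epsilon, q)$-pseudo-random function, no algorithm running in time $t$ with at most $q$ queries can do so, which is a contradiction. This proves Theorem 3.2: if $f$ is a $(t, \epsilon, q)$-pseudo-random function, then Z-IDX is a $(t, \epsilon, q/2)$-IND-CKA index.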
32 Conclusion: Z-IDX is efficient for searching indexes. Index size and document size are independent. Properties: “hidden queries”, “controlled searching”, and “query isolation”.
33 Comment: A Bloom filter is a probabilistic data structure. It needs more space (index size ≈ document size).