Cryptography Lecture 12 Stefan Dziembowski
Plan 1.Introduction to multiparty cryptographic protocols. 2.Private Information Retrieval
Traditional scenario Alice and Bob are attacked by an adversary.
Multiparty protocols A group of players wants to perform some task together even though they do not trust each other
This is a vast area Examples: voting coin-tossing auctions.... Today’s lecture: Private Information Retrieval
AOL search data scandal (2006) # : clothes for age single men best retirement city jarrett arnold jack t. arnold jaylene and jarrett arnold gwinnett county yellow pages rescue of older dogs movies for dogs sinus infection Thelma Arnold 62-year-old widow Lilburn, Georgia
Observation The owners of databases know a lot about the users! This poses a risk to users’ privacy. E.g. consider database with stock prices… Can we do something about it? Yes, we can: trust them that they will protect our secrecy, or use cryptography! problematic problematic!
How can crypto help? Note: this problem has nothing to do with secure communication! user U database D
Our settings user U database D A new primitive: Private Information Retrieval (PIR) secure link
Plan 1.Definition of PIR 2.An ideal PIR doesn’t exist 3.Construction of a computational PIR 4.Open problems Literature: B. Chor, E. Kushilevitz, O. Goldreich and M. Sudan, Private Information Retrieval, Journal of ACM, 1998 E. Kushilevitz and R. Ostrovsky Replication Is NOT Needed: SINGLE Database, Computationally-Private Information Retrieval, FOCS 1997
Question How to protect privacy of queries? user U database D wants to retrieve some data from D shouldn’t learn what U retrieved
Let’s make things simple! B1B1 B2B2 …BwBw index i = 1,…,w the user should learn B i BiBi ? each B i є {0,1} database B: (he may also learn other B i ’s)
Trivial solution B1B1 B2B2 …BwBw The database simply sends everything to the user!
Non-triviality The previous solution has a drawback: the communication complexity is huge! Therefore we introduce the following requirement: “Non-triviality”: the number of bits communicated between U and D has to be smaller than w.
input: Private Information Retrieval (PIR) B1B1 B2B2 …BwBw input: index i = 1,…,w at the end the user learns B i the database does not learn i the total communication is < w Note: secrecy of the database is not required correctness secrecy (of the user) non-triviality This property needs to be defined more formally! polynomial time randomized interactive algorithms
How to define secrecy of the user [1/2]? iB Def. T(i,B) – transcript of the conversation. query Q(i)reply A(Q(i),B)
multi-round case: it is impossible to distinguish between T(i,B) and T(j,B) even if the adversary is malicious How to define secrecy of the user [2/2]? Secrecy of the user: for every i,j є {0,1} single-round case: it is impossible to distinguish between Q(i) and Q(j) ? What does it mean? For now say: the distribution of Q(i) and Q(j) is the same
PIR doesn’t exists [1/4] We now show that correctness, non-triviality and secrecy cannot be satisfied simultaneously. Def: A transcript T is possible for (i,B) if P(T(i,B) = T) > 0 Take some T’, and look where it is possible: T’ indices i databases B
PIR doesn’t exists [2/4] Observation: secrecy → if T’ is possible for some B and i then it is possible for B and all the other i’s T’ indices i databases B T’
PIR doesn’t exists [3/4] non-triviality → length(transcript) < length(database) ↓ # transcripts < #databases ↓ there has to exist T’ that is possible for two databases B 0 and B 1 T’ databases B ← B 0 ← B1 ← B1 indices i
PIR doesn’t exists [4/4] B 0 and B 1 differ on at least one index i’ So, if i’ is the input of the user then correctness → contradiction T’ databases B ← B 0 ← B1 ← B1 i’ ↓ indices i
So PIR doesn’t exist! How to bypass the impossibility result? Two ideas: –limit the computing power of a cheating database –use a larger number of “independent” databases
Computationally-secure PIR secrecy: For every i,j є {0,1} it is impossible to distinguish efficiently between T(i,B) and T(j,B) ? computational-secrecy: Formally: for every polynomial-time probabilistic algorithm A the value: | P(A(T(i,B)) = 0) – P(A(T(j,B))=0) | should be negligible.
Hardness assumptions? [KO97] – construct PIR based on the Quadratic Residuosity Assumption We describe it on the next slides.
Favourite cryptographers’ group p,q – large random primes (|p|=|q|=2 1024, say) RSA group: Z n *, where n=pq
Chinese remainder theorem Chinese remainder theorem (CRT): For n = pq (where p and q are prime) a function λ: Z* n → Z* p × Z* q defined as λ(i) := (i mod p, i mod q) is an isomorphism.
Example Z15Z15 Z15*Z15* λ(i) := (i mod p, i mod q) Z5*Z5* Z3*Z3*
Def. x is quadratic residue modulo m if there exists a є Z m * such that x = a 2 mod m QR(m) := the set of all quadratic residues modulo m. QNR(m) := Z m * \ QR(n) Observation: every quadratic residue modulo 13 has exactly 2 square roots, and hence |QR(13)| = |Z 13 *| / 2. Z 13 *: QR(13): a a2:a2: Quadratic Residues
A Lemma about QRs modulo prime p Lemma: For every prime p we have QR(p) = (p-1)/2 Remark: Let g be a generator of Z p *. Then QR(p) = {g 0,g 2,g 4,g 6,...,g p-3 }. QNR(p) = {g 1,g 3,g 5,g 7,...,g p-2 }.
QRs modulo pq Observation: every quadratic residue modulo 15 has exactly 4 square roots, and hence |QR(15)| = |Z 15 *| / 4. Z 15 *: QR(15): 14 a a2a2
A Lemma about QRs modulo pq Fact: For n=pq we have |QR(n)| = |Z n *| / 4. Proof: x є QR(n) iff x = a 2 mod n, for some a iff (by CRT) x = a 2 mod p and x = a 2 mod q iff x mod p є QR(p) and x mod q є QR(q) QR(p) QR(q) Zn*:Zn*: mod q mod p QR(n)
Z 15 : Z15*Z15* QRs modulo pq – an example QR(3) QR(5) mod mod mod mod mod mod 3
Homomorphism of QR(pq) Q(n,a) = Homomorphism: for all a,b є Z n * Q(n,ab) = Q(n,a) xor Q(n,b) (exercise) 1 if a є QR(n) 0 otherwise
Algorithmic questions about QR Suppose n=pq Is it easy to test membership in QR(n)? Fact: if one knows p and q – yes! What if one doesn’t know p and q?
Quadratic Residuosity Assumption (QRA) n=pq, where p and q are large primes QR(p) QR(q) Zn*:Zn*: Z n + : all a є Z n *: such that a mod p є QR(p) iff a mod q є QR(q) Quadratic Residuosity Assumption (QRA): For a random a є Z n + it is computationally hard to determine if a є QR(n). Formally: for every polynomial-time probabilistic algorithm G the value: |P(G(a) = Q(a)) – 0.5| (where a is random) is negligible. QNR(q) QNR(p) ? a є Z n + ↓ Note: Z n + is a group! QR(n)
We are ready to construct PIR! Our PIR will work in the group Z n +, where n=pq. What’s so good about this group?: testing membership in Z n + is easy, testing membership in QR(n) is hard for random elements on Z n +, unless one knows p and q. homomorphism of Q!
First (wrong) idea i QR X 1 QR X 2... QR X i-1 NQR X i QR X i+1... QR X w-1 QR X w B1B1 B2B2...B i-1 BiBi B i+1...B w-1 BwBw for every j = 1,...,w the database sets Y j = X j 2 if B j = 0 X j otherwise { QR Y 1 QR Y 2... QR Y i-1 YiYi QR Y i+1... QR Y w-1 QR Y w i↓i↓ Y i is a QR iff B i =0 Set M = Y 1 · Y 2 ·... · Y w M M is a QR iff B i =0 the user checks if M is a QR
Problems! PIR from the previous slide: correctness √ security? To learn i the database would need to distinguish NQR from QR. √ QR X 1 QR X 2... QR X i-1 NQR X i QR X i+1... QR X w-1 QR X w non-triviality? doesn’t hold! communication: user → database: |B| · |Z* n | database → user: |Z* n | Call it: (|B|, 1) - PIR
How to fix it? Idea: Given: construct Suppose that |B| = v 2 and present B as a v×v-matrix: B13B14B15B16 B9B10B11B12B5B6B7B8 B1B2B3B4 consider each row as a separate database
An improved idea B13B14B15B16 B9B10B11B12 B5B6B7B8 B1B2B3B4 v v Let j be the column where B i is. In every “row” the user asks for the jth element So, instead of sending v queries the user can send one! Observe: in this way the user learns all the elements in the jth column! BiBi j↓j↓ execute v (v,1) - PIRs in parallel Looks even worse: communication: user → database: v 2 · |Z* n | database → user: v · |Z* n | The method
Putting things together B1B1...B j-1 BjBj B j+1...BvBv BiBi B vv i QR X 1... QR X j-1 NQR X j QR X j+1... QR X v X1X1...X j-1 XjXj X j+1...XvXv X1X1 X j-1 XjXj X j+1...XvXv Y1Y1 Y j-1 YjYj Y j+1...YvYv Y vv M1M1... MvMv multiply elements in each row kth row M1M1 MkMk MvMv B j =0 iff M k is QR for every j = 1,...,v set Y j = X j 2 if B j = 0 X j otherwise { jth column here the same row is copied v times: only this counts
So we are done! PIR from the previous slide: correctness √ non-triviality: communication complexity = 2√|B| · |Z n | √ security? The to learn i the database would need to distinguish NQR from QR. Formally: from any adversary that breaks our scheme we can construct an algorithm that breaks QRA simulates:
Improvements user U database D (X 1,…,X v ) (M 1,…,M v ) the user is interested just in one M i. Idea: apply PIR recursively!
Complexity of PIRs – overview of the results Communication: “recursive” PIR of [KO97]: for every c: O(|B| c ) [Cachin, Micali, Stadler, 1999]: poly-logarithmic in |B| [Lipmaa, 2005]: O(log 2 |B|) For practical analysis see: [Sion, Carbunar] On the Computational Practicality of Private Information Retrieval. their conclusion: It is the time-complexity that matters. In real-life: it is still more practical to transmit the entire database.
Extensions Symmetric PIR (also protect privacy of the database). [Gertner, Ishai, Kushilevitz, Malkin. 1998] Searching by key-words [Chor, Gilboa, Naor, 1997] Public-key encryption with key-word search [Boneh, Di Crescenzo, Ostrovsky, Persiano]
Open problems: Improve efficiency. Construct new extensions. What was the key property that we used? homomorphism of QR Holy grail: fully-homomorphic encryption
Fully-homomorphic encryption Observe that we constructed a 1-bit probabilistic public-key encryption scheme: Enc(X) = random QRif X = 0 random NQRif X = 1 { Which has the following homomorphic with respect to xor: Enc(X xor Y) = Enc(X) Enc(Y) It would be really useful to have an encryption scheme homomorphic with respect to: conjunction and negation simultaneously.