Private Information Retrieval
Yuval Ishai
Computer Science Department, Technion

Talk Overview
Intro to PIR
–Motivation and problem definition
–Toy examples
–State of the art
Relation with other primitives
–Locally Decodable Codes
–(Oblivious Transfer, CRHF)
Constructions
Open problems
Private Information Retrieval (PIR) [CGKS95] Goal: allow a user to access a database while hiding what she is after. Motivation: patent databases, web searches, etc. Paradox(?): imagine buying in a store without the seller knowing what you buy. Note: Encrypting requests is useful against third parties; not against server holding the data.
Modeling
Database: n-bit string x
User: wishes to
–retrieve x_i and
–keep i private
[figure: Server holds x; User obtains x_i; Server is left with "?" about i]
Some “solutions”
1. User downloads entire database.
Drawback: n communication bits (vs. log n + 1 w/o privacy).
Main research goal: minimize communication complexity.
2. User masks i with additional random indices.
Drawback: gives a lot of information about i.
3. Enable anonymous access to database.
Addresses a different concern: hides identity of user, not the fact that x_i is retrieved.
Fact: PIR as described so far requires Ω(n) communication bits.
Two Approaches
Information-Theoretic PIR [CGKS95,Amb97,...]
–Replicate database among k servers.
–Unconditional privacy against t servers. Default: t=1
Computational PIR [KO97,CMS99,...]
–Computational privacy, based on cryptographic assumptions.

Model for I.T. PIR
[figure: servers S_1, S_2, …, S_k each hold a copy of the database X; User holds i and obtains x_i; each server is left with "?" about i]
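Information-Theoretic PIR for Dummies
View the database as an n^{1/2} × n^{1/2} matrix X; e_i is the unit vector selecting the column of X that contains x_i.
–User picks random q_1, q_2 with q_1 + q_2 = e_i, sends q_1 to S_1 and q_2 to S_2
–S_1 answers a_1 = X·q_1; S_2 answers a_2 = X·q_2
–User computes a_1 + a_2 = X·e_i
⇒ 2-server PIR with O(n^{1/2}) communication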
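A minimal sketch of the toy 2-server protocol above, assuming arithmetic over GF(2), a row-major n^{1/2} × n^{1/2} layout of the database, and a database length that is a perfect square. The helper names (`make_matrix`, `query`, `answer`, `retrieve`) are illustrative and do not come from the slides.

```python
import secrets
from math import isqrt

def make_matrix(x):
    """Arrange the n-bit database as an s x s matrix, s = sqrt(n) (row-major)."""
    s = isqrt(len(x))
    assert s * s == len(x), "sketch assumes n is a perfect square"
    return [x[r * s:(r + 1) * s] for r in range(s)]

def query(col, s):
    """Split the unit vector e_col into two random shares with q1 + q2 = e_col over GF(2)."""
    q1 = [secrets.randbelow(2) for _ in range(s)]
    q2 = q1.copy()
    q2[col] ^= 1
    return q1, q2

def answer(X, q):
    """Server side: a = X.q over GF(2), one bit per row of X."""
    return [sum(a & b for a, b in zip(row, q)) % 2 for row in X]

def retrieve(x, i):
    """User side: recover x[i] from the two answers."""
    X = make_matrix(x)
    s = len(X)
    row, col = divmod(i, s)
    q1, q2 = query(col, s)                      # q1 goes to S1, q2 goes to S2
    a1, a2 = answer(X, q1), answer(X, q2)
    column = [u ^ v for u, v in zip(a1, a2)]    # a1 + a2 = X.e_col
    return column[row]
```

Each server sees a single uniformly random vector, and both the queries and the answers are n^{1/2} bits long.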
Computational PIR for Dummies
Tool: homomorphic encryption: E(a), E(b) → E(a+b)
Protocol (X = n^{1/2} × n^{1/2} matrix):
–User sends E(e_i), e.g. E(0) E(0) E(1) E(0) (= c_1 c_2 c_3 c_4)
–Server replies with E(X·e_i): one ciphertext per row of X, combining the c_j selected by that row (e.g. c_2c_3, c_1c_2c_3, c_1c_2, c_4)
–User recovers ith column of X
⇒ PIR with ~O(n^{1/2}) communication
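A rough sketch of the single-server toy protocol above. The slide only asks for some additively homomorphic encryption; this sketch assumes a textbook Paillier scheme with tiny fixed primes, chosen purely for illustration and offering no real security. All names (`enc`, `dec`, `server_reply`, `retrieve`) are hypothetical, and the database length is assumed to be a perfect square.

```python
import secrets
from math import gcd, isqrt

# Toy Paillier parameters (assumption: two small Mersenne primes, NOT secure).
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // gcd(P - 1, Q - 1)   # lcm(P-1, Q-1)
MU = pow(LAM, -1, N)                           # modular inverse (Python 3.8+)

def enc(m):
    """E(m) = (1+N)^m * r^N mod N^2; multiplying ciphertexts adds plaintexts."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if gcd(r, N) == 1:
            break
    return (1 + m * N) * pow(r, N, N2) % N2

def dec(c):
    u = pow(c, LAM, N2)
    return (u - 1) // N * MU % N

def server_reply(X, cts):
    """For each row, multiply the ciphertexts c_j with X[row][j] = 1: E(row . e_i)."""
    reply = []
    for row in X:
        acc = 1                     # 1 is a (non-randomized) encryption of 0
        for bit, c in zip(row, cts):
            if bit:
                acc = acc * c % N2
        reply.append(acc)
    return reply

def retrieve(x, i):
    s = isqrt(len(x))
    X = [x[r * s:(r + 1) * s] for r in range(s)]
    row, col = divmod(i, s)
    cts = [enc(1 if c == col else 0) for c in range(s)]   # E(e_col)
    column = [dec(c) for c in server_reply(X, cts)]       # column col of X
    return column[row]
```

The user sends n^{1/2} ciphertexts and receives n^{1/2} ciphertexts, which matches the ~O(n^{1/2}) count up to the ciphertext size.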
Bounds for Computational PIR
          servers  comm.       assumption
[CG97]    2        O(n^ε)      one-way function
[KO97]    1        O(n^ε)      QRA / homomorphic encryption
[CMS99]   1        polylog(n)  Φ-hiding … DCRA [Lipmaa]
[KO00]    1        n-o(n)      trapdoor permutation
Bounds for I.T. PIR
Upper bounds:
–O(log n / loglog n) servers, polylog(n) [BF90,BFKR91,CGKS95]
–2 servers, O(n^{1/3}); k servers, O(n^{1/k}) [CGKS95]
–k servers, O(n^{1/(2k-1)}) [Amb97,Ito99,IK99,BI01,WY05]
 t-private, O(n^{t/(2k-1)}) [BI01,WY05]
–k servers, O(n^{c·loglog k/(k log k)}) [BIKR02]
Lower bounds:
–log n + 1 (no privacy)
–2 servers, ~5 log n; k servers, c_k log n [Man98,WdW04]
–Better for restricted 2-server protocols [CGKS95,GKST02,BFG02,KdW03,WdW04]
[figure: CGKS, AMB, IK, BI, BIKR and WY placed on two axes, efficient vs. inefficient and “clean” vs. “dirty”]

Why Information-Theoretic PIR?
Cons:
–Requires multiple servers
–Privacy against limited collusions
–Worse asymptotic complexity (with const. k): O(n^c) vs. polylog(n)
Pros:
–Neat question
–Unconditional privacy
–Better “real-life” efficiency
–Allows very short queries or very short answers (+apps [DIO98,BIM99])
–Closely related to a natural coding problem [KT00]
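Locally Decodable Codes [KT00]
[figure: x is encoded into a codeword y of length m; position i of x is decoded from y]
Requirements:
–High fault-tolerance
–Local decoding
Recover from δm faults … with ½+ε probability
Question: how large should m(n) be in a k-query LDC?
–k=2: 2^{Θ(n)}
–k=3: at most 2^{O(n^{0.5})}, at least roughly n^2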
From I.T. PIR to LDC
k-server PIR with α-bit queries and β-bit answers ⇒ k-query LDC of length 2^α over Σ={0,1}^β
–y[q]=Answer(x,q)
–Converse relation also holds.
–Binary LDC ⇔ PIR with one answer bit per server
Best known LDC are obtained from PIR protocols.
–const. q: m=exp(n^{c·loglog q/(q log q)})
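As a small illustration of y[q]=Answer(x,q), the sketch below assumes the simplest linear 2-server PIR, in which a query is an n-bit vector q and the answer is the single bit ⟨x,q⟩ over GF(2). That protocol is an assumption made here for brevity (it is not the matrix-based toy protocol of the earlier slide), and the resulting code is the Hadamard code. The names `encode` and `decode_bit` are illustrative.

```python
import secrets

def encode(x):
    """Codeword y of length 2^n: y[q] = Answer(x, q) = <x, q> mod 2 for every n-bit query q."""
    n = len(x)
    return [sum(x[j] & (q >> j) & 1 for j in range(n)) % 2 for q in range(2 ** n)]

def decode_bit(y, n, i):
    """2-query local decoder: read y at the two PIR queries for index i and add the answers."""
    q1 = secrets.randbelow(2 ** n)   # uniformly random query for "server 1"
    q2 = q1 ^ (1 << i)               # q1 + e_i for "server 2"
    return y[q1] ^ y[q2]
```

Each of the two probed positions is uniformly distributed on its own, which is the PIR privacy property reused as the decoder's smoothness: a small fraction of corrupted positions is hit only with small probability.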
A Question about MPC
–[Ben-Or, Goldwasser, Wigderson, 1988; Chaum, Crépeau, Damgård, 1988] Information-theoretic MPC is feasible! k ≥ 3 players can compute any function f of their inputs with total work = poly(circuit-size).
–Can this be done using a constant number of rounds? … or with work = poly(formula-size) and constant rounds [BB89,…]
–[Beaver, Micali, Rogaway, 1990; B., Feigenbaum, Kilian, R., 1990] Can k computationally unbounded players compute an arbitrary f with communication = poly(input-length)?
–Open question

Question Reformulated
Is the communication complexity of MPC strongly correlated with the computational complexity of the function being computed?
[figure: efficiently computable functions drawn inside the set of all functions; legend: communication-efficient MPC vs. no communication-efficient MPC]
Connecting MPC and LDC [KT00] The three problems are “essentially equivalent” –up to considerable deterioration of parameters [IK04]
cPIR and the Crypto World
[figure: diagram relating cPIR to CRHF*, OT, Secure Computation, NI UH Commitment, KA, Homomorphic Encryption, Trapdoor Permutation*]
From PIR to OT [DMO00]
Rabin’s OT
–Sender holds a secret b.
–Following interaction with Receiver:
 w/prob ½, Receiver outputs b
 w/prob ½, Receiver outputs ? and cannot learn b
 Sender cannot tell which is the case
–Enough to consider “honest-but-curious” parties [GMW87]
[figure: two cases of the protocol. In both, Receiver retrieves x_i from Sender's string x via PIR. Case 1: Sender sends i, b⊕x_i and Receiver outputs b. Case 2: Sender sends i', b⊕x_i' and Receiver outputs ?]
From PIR to OT [DMO00]
Analysis
–Privacy of PIR ⇒ Sender can't distinguish between two cases
–Sublinear comm. ⇒ in Case 2, Receiver cannot learn b
 Privacy amplified using XOR lemma (cf. [Haitner])
Interesting corollary:
–Homomorphic symmetric encryption ⇒ PKE
From PIR to CRHF [IKO05]
Def. {h_q} is a CRHF if:
–h_q shrinks its input
–Given q ← Gen(1^n), hard to find distinct x,x' s.t. h_q(x)=h_q(x')
CRHF from 1-round PIR:
–i ∈_R [n]; q ← Query(i)
–h_q(x)=Answer(x,q)
Analysis:
–Sublinear communication ⇒ h_q shrinks input
–Collision resistance: h_q(x)=h_q(x') ⇒ x_i=x'_i ⇒ i ∉ diff(x,x')
 Gives noticeable advantage in guessing i from q
Interesting corollary:
–HE ⇒ CRHF
PIR as a Building Block Private storage [OS98] Sublinear-communication secure computation –1-out-of-n Oblivious Transfer (SPIR) [GIKM98,NP99,…] –Keyword search [CGN99,FIPR05] –Statistical queries [CIKRRW01] –Approximate distance [FIMNSW01, IW04] –Communication-preserving secure function evaluation [NN01]
Time Complexity of PIR
Focus so far: communication complexity
Obstacle: time complexity
–Server/s must spend at least linear time.
Workarounds:
–Preprocessing [BIM00]
–Amortization [BIM00, IKOS04]
[figure: amortization settings: single user vs. multiple users, adaptive vs. non-adaptive, with some cells marked “?”]
Protocols
High level structure of all known protocols
–User maps i into a point z ∈ F^m
–User secret-shares z between servers using some t-threshold LSSS over F
–Server j responds with a linear function of x determined by its share of z.
Two types of protocols:
–Polynomial-based [BF90,BFKR91,CGKS95,…,WY05]
 LSSS = Shamir
 Scale well with k,t
–Replication-based [IK99,BI01,BIKR02]
 LSSS = CNF
 Do not scale well with k,t: involve (k choose t) replication overhead
 However, dominate over polynomial-based up to (k choose t) factor [CDI05]
 Best known protocols for constant k
Polynomial-Based Protocols
Step 1: Arithmetization
–Fix a degree parameter d (will be determined by k)
 Goal: Communication = O(n^{1/d})
–User maps i ∈ [n] into a weight-d vector z of length m=O(n^{1/d}).
 1 ↦ 11100…0
 2 ↦ 11010…0
 n ↦ 00…0111
–Servers view x as a degree-d m-variate polynomial
 P(Z_1,…,Z_m) = x_1 Z_1Z_2Z_3 + x_2 Z_1Z_2Z_4 + … + x_n Z_{m-2}Z_{m-1}Z_m
–Privately retrieving i-th bit of x ⇔ privately evaluating P on z.
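The arithmetization step can be sketched as follows, assuming indices are assigned to d-subsets of {0,…,m-1} in lexicographic order (which reproduces the 11100…0, 11010…0 pattern above) and that m is the smallest value with C(m,d) ≥ n. Function names are illustrative; indices are 0-based in the code.

```python
from itertools import combinations
from math import comb, prod

def index_sets(n, d):
    """Pick the smallest m with C(m, d) >= n and give index i the i-th d-subset of range(m)."""
    m = d
    while comb(m, d) < n:
        m += 1
    return m, [s for _, s in zip(range(n), combinations(range(m), d))]

def encode_index(i, sets, m):
    """Weight-d characteristic vector z of the subset assigned to index i."""
    z = [0] * m
    for l in sets[i]:
        z[l] = 1
    return z

def eval_P(x, sets, z):
    """P(z) = sum_i x_i * prod_{l in S_i} z_l: one degree-d monomial per database bit."""
    return sum(x_i * prod(z[l] for l in s) for x_i, s in zip(x, sets))
```

Because z has weight exactly d, the only monomial that does not vanish on it is the one for index i, so P(z) = x_i.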
Basic Protocol: t=1
Goal: user learns P(z) without revealing z.
Step 2: Secret Sharing of z
–t=1: Pick random “direction” δ ∈ F^m
–z_j = z+jδ goes to S_j
[figure: the points z, z+δ, z+2δ, z+3δ, z+4δ on a line in F^m]
Step 3: S_j responds with P(z_j)
–User can extrapolate P(z) from P(z_1),…,P(z_k) if k>d.
 Define deg-d univariate poly Q(W)=P(z+Wδ)
 Q(0)=P(z) can be extrapolated from d+1 distinct values Q(j)=P(z_j)
Query length m=O(n^{1/d}), answer length 1
Using k servers, O(n^{1/(k-1)}) communication
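A compact sketch of the t=1 protocol, assuming the prime field F = GF(101) (any prime larger than k would do), k = d+1 servers, evaluation points 1,…,k, and Lagrange interpolation at 0. The helper names and the field choice are assumptions for illustration.

```python
import secrets
from itertools import combinations
from math import comb, prod

P_MOD = 101  # assumed prime field size, must exceed the number of servers

def index_sets(n, d):
    m = d
    while comb(m, d) < n:
        m += 1
    return m, [s for _, s in zip(range(n), combinations(range(m), d))]

def eval_P(x, sets, z):
    """Server side: evaluate the database polynomial P at the received point."""
    return sum(xi * prod(z[l] for l in s) for xi, s in zip(x, sets)) % P_MOD

def lagrange_at_zero(points):
    """Extrapolate Q(0) from distinct points (w_j, Q(w_j)) over GF(P_MOD)."""
    total = 0
    for j, (wj, yj) in enumerate(points):
        num = den = 1
        for l, (wl, _) in enumerate(points):
            if l != j:
                num = num * (-wl) % P_MOD
                den = den * (wj - wl) % P_MOD
        total = (total + yj * num * pow(den, -1, P_MOD)) % P_MOD
    return total

def retrieve(x, i, d):
    k = d + 1                                   # need k > d servers
    m, sets = index_sets(len(x), d)
    z = [1 if l in sets[i] else 0 for l in range(m)]
    delta = [secrets.randbelow(P_MOD) for _ in range(m)]   # random direction
    answers = []
    for j in range(1, k + 1):
        z_j = [(z[l] + j * delta[l]) % P_MOD for l in range(m)]  # share sent to S_j
        answers.append(eval_P(x, sets, z_j))                      # S_j replies P(z_j) = Q(j)
    return lagrange_at_zero(list(zip(range(1, k + 1), answers)))
```

For example, `retrieve(x, i, 2)` uses 3 servers with queries of O(n^{1/2}) field elements and one field element per answer.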
Basic Protocol: General t
Goal: user learns P(z) without revealing z.
Step 2: Secret Sharing of z
–General t: z_j = z+jδ_1+j^2δ_2+…+j^tδ_t, with random δ_1,…,δ_t ∈ F^m
Step 3: S_j responds with P(z_j)
–User can extrapolate P(z) from P(z_1),…,P(z_k) if k>dt.
 Define deg-dt univariate poly Q(W)=P(z+Wδ_1+W^2δ_2+…+W^tδ_t)
 P(z)=Q(0) can be extrapolated from dt+1 distinct values Q(j)
O(n^{1/d}) communication using k=dt+1 servers.
Improved Variant [WY05]
Goal: user learns P(z) without revealing z.
Step 2: Secret Sharing of z
–General t: z_j = z+jδ_1+j^2δ_2+…+j^tδ_t
Step 3: S_j responds with P(z_j) along with all m partial derivatives of P evaluated at z_j
–User can extrapolate P(z) if k>dt/2.
 Define deg-dt univariate poly Q(W)=P(z+Wδ_1+W^2δ_2+…+W^tδ_t)
 P(z)=Q(0) can be extrapolated from 2k>dt distinct values Q(j),Q'(j)
Complexity: O(m) communication both ways
Same communication using half as many servers!
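A sketch of the derivative trick for t=1, assuming F = GF(101), degree d = 2k-1, and a plain linear solve for the coefficients of Q in place of an explicit Hermite interpolation formula. The user obtains Q'(j) from the returned gradient by the chain rule, Q'(j) = ∇P(z_j)·δ. All names and the field choice are illustrative and this is not claimed to be the exact procedure of [WY05].

```python
import secrets
from itertools import combinations
from math import comb, prod

P_MOD = 101  # assumed prime field

def index_sets(n, d):
    m = d
    while comb(m, d) < n:
        m += 1
    return m, [s for _, s in zip(range(n), combinations(range(m), d))]

def eval_P(x, sets, z):
    return sum(xi * prod(z[l] for l in s) for xi, s in zip(x, sets)) % P_MOD

def grad_P(x, sets, z, m):
    """All m partial derivatives of P at z (P is multilinear in each monomial)."""
    g = [0] * m
    for xi, s in zip(x, sets):
        if xi:
            for l in s:
                g[l] = (g[l] + prod(z[h] for h in s if h != l)) % P_MOD
    return g

def solve_mod(A, b):
    """Gauss-Jordan elimination over GF(P_MOD); A is assumed square and nonsingular."""
    n = len(A)
    M = [row[:] + [bv] for row, bv in zip(A, b)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c])
        M[c], M[piv] = M[piv], M[c]
        inv = pow(M[c][c], -1, P_MOD)
        M[c] = [v * inv % P_MOD for v in M[c]]
        for r in range(n):
            if r != c and M[r][c]:
                f = M[r][c]
                M[r] = [(vr - f * vc) % P_MOD for vr, vc in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]

def retrieve(x, i, k):
    d = 2 * k - 1                               # deg Q = d < 2k
    m, sets = index_sets(len(x), d)
    z = [1 if l in sets[i] else 0 for l in range(m)]
    delta = [secrets.randbelow(P_MOD) for _ in range(m)]
    rows, rhs = [], []
    for j in range(1, k + 1):
        z_j = [(z[l] + j * delta[l]) % P_MOD for l in range(m)]
        value = eval_P(x, sets, z_j)            # S_j sends Q(j) = P(z_j) ...
        grad = grad_P(x, sets, z_j, m)          # ... and the gradient of P at z_j
        deriv = sum(g * dl for g, dl in zip(grad, delta)) % P_MOD   # Q'(j)
        rows.append([pow(j, e, P_MOD) for e in range(d + 1)])
        rhs.append(value)
        rows.append([e * pow(j, e - 1, P_MOD) % P_MOD if e else 0 for e in range(d + 1)])
        rhs.append(deriv)
    return solve_mod(rows, rhs)[0]              # constant coefficient = Q(0) = P(z)
```

With k=2 this gives d=3, so queries and answers both have O(n^{1/3}) field elements using only two servers.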
Breaking the O(n^{1/(2k-1)}) Barrier [BIKR02]
[figure: panels for k = 2, k = 3, k = 4, k = 5, k = 6]
Arithmetization
As before, except that now F=GF(2)
–Fix a degree parameter d (will be determined by k)
 Goal: Communication = O(n^{1/d})
–User maps i ∈ [n] into a weight-d vector z of length m=O(n^{1/d}).
 1 ↦ 11100…0
 2 ↦ 11010…0
 n ↦ 00…0111
–Servers view x as a degree-d m-variate polynomial
 P(Z_1,…,Z_m) = x_1 Z_1Z_2Z_3 + x_2 Z_1Z_2Z_4 + … + x_n Z_{m-2}Z_{m-1}Z_m
–Privately retrieving i-th bit of x ⇔ privately evaluating P on z.

Effect of Degree Reduction
degree d, m variables, size n → degree d/c, m variables, size O(n^{1/c})
Degree Reduction Using Partial Information [BKL95,BI01]
[figure: servers S_1, S_2, …, S_k each hold P; User holds z and learns P(z); z is hidden from servers]
[figure: servers S_1, S_2, …, S_k each hold Q; User holds y and learns Q(y); each entry of y is known to all but one server]

[figure: k=3, d=6; the monomials of Q are split among S_1, S_2, S_3]
Q(y)=Q_1(y)+Q_2(y)+Q_3(y), deg Q_j ≤ ⌊d/k⌋ = 2

Back to PIR
[figure: the P, z setting (z is hidden from servers; n comm. bits) next to the Q, y setting (each entry of y is known to all but one server; O(n^{1/k}) comm. bits)]
Let z=y_1+…+y_k, where the y_j are otherwise random
Q(Y_1,…,Y_k)=P(Y_1+…+Y_k)
Initial Protocol
–User picks random y_1,…,y_k s.t. y_1+…+y_k=z, and sends to S_j all y's except y_j. [O(m) = O(n^{1/d})]
–Servers define an mk-variate degree-d polynomial Q(Y_1,…,Y_k)=P(Y_1+…+Y_k).
–Each S_j computes a degree-⌊d/k⌋ poly. Q_j, such that Q(y)=Q_1(y)+…+Q_k(y).
–S_j sends a description of Q_j to User. [O(n^{1/k})]
–User computes Q_1(y)+…+Q_k(y)=x_i.
A Closer Look
Each monomial M has some S_j missing at most ⌊d/k⌋ of its variables, so deg Q_j ≤ ⌊d/k⌋.
Useful parameters:
–d=k-1: query length O(n^{1/(k-1)}), ⌊d/k⌋=0, answer length 1 (best previous binary PIR)
–d=2k-1: query length O(n^{1/(2k-1)}), ⌊d/k⌋=1, answer length O(n^{1/(2k-1)}) (best previous PIR)
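The d=k-1 case can be sketched end to end. The code below assumes F=GF(2), additive sharing of z into k shares, and one particular assignment rule: each expanded monomial goes to the lowest-numbered server that misses none of its variables. That rule is an assumption; any fixed rule would work. Since every monomial touches at most k-1 shares, such a server always exists and each answer is a single bit. Names are illustrative and servers are numbered from 0.

```python
import secrets
from itertools import combinations, product
from math import comb

def index_sets(n, d):
    m = d
    while comb(m, d) < n:
        m += 1
    return m, [s for _, s in zip(range(n), combinations(range(m), d))]

def share(z, k):
    """Random y_0..y_{k-1} over GF(2) with y_0 + ... + y_{k-1} = z."""
    ys = [[secrets.randbelow(2) for _ in z] for _ in range(k - 1)]
    last = list(z)
    for y in ys:
        last = [a ^ b for a, b in zip(last, y)]
    return ys + [last]

def server_answer(s, x, sets, view, k):
    """Server s knows view[j] = y_j for all j != s and sums the monomials assigned to it."""
    bit = 0
    for xi, subset in zip(x, sets):
        if not xi:
            continue
        # Expanding prod_h (sum_j Y[j][l_h]) gives one monomial per choice of share owners.
        for owners in product(range(k), repeat=k - 1):
            if min(set(range(k)) - set(owners)) != s:
                continue            # this monomial belongs to another server
            term = 1
            for j, l in zip(owners, subset):
                term &= view[j][l]
            bit ^= term
    return bit

def retrieve(x, i, k):
    d = k - 1
    m, sets = index_sets(len(x), d)
    z = [1 if l in sets[i] else 0 for l in range(m)]
    ys = share(z, k)
    out = 0
    for s in range(k):
        view = {j: ys[j] for j in range(k) if j != s}   # S_s gets all shares except y_s
        out ^= server_answer(s, x, sets, view, k)        # one answer bit per server
    return out
```

The server work here is exponential in d because the expansion is enumerated naively; the sketch is meant to show correctness and the one-bit answers, not efficiency.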
Boosting the Integer Truncation Effect
Idea: apply multiple “partial” degree reduction steps.
Generalized degree reduction: Assign each monomial to the k' servers V which jointly miss the least number of variables.
Q(y) = Σ_{|V|=k'} Q_V(y)
[figure: k=3, d=6, k'=2; each monomial of Q is labeled with a pair of servers such as S_1S_2, S_2S_3, S_1S_3]
Reduction:
            replication  degree  size
            k            d       n
reduction   k'           d'      n'=O(n^{d'/d})
reduction   k''          d''     n''=O(n'^{d''/d'})
…           …            …       …
Single-step reduction for comparison: replication 1, degree ⌊d/k⌋, size n^{⌊d/k⌋/d}

The missing operator:
            replication  degree  #vars             size
            k            d       m                 n
conversion  k            d'      m'=O(m^{d/d'})    n
Additional cost: re-distribute new point y'
Example: k=3
            replication  degree  #vars        size
            3            7       O(n^{1/7})   n
reduction   2            4       O(n^{1/7})   O(n^{4/7})
conversion  2            3       O(n^{4/21})  O(n^{4/7})
reduction   1            1       O(n^{4/21})  O(n^{4/21})
Queries (#vars) and Answers (size): O(n^{4/21}) communication
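The exponent bookkeeping in this example can be replayed mechanically. The sketch below tracks (replication, degree, exponent of #vars, exponent of size) and applies the two update rules from the preceding slide: reduction scales the size exponent by d'/d, and conversion scales the #vars exponent by d/d'. The function names and the tuple layout are assumptions made for illustration.

```python
from fractions import Fraction as F

def reduction(state, k_new, d_new):
    """Degree reduction: size n -> n^(d_new/d); number of variables unchanged."""
    k, d, vars_exp, size_exp = state
    return (k_new, d_new, vars_exp, size_exp * F(d_new, d))

def conversion(state, d_new):
    """Polynomial conversion: #vars m -> m^(d/d_new); size and replication unchanged."""
    k, d, vars_exp, size_exp = state
    return (k, d_new, vars_exp * F(d, d_new), size_exp)

start = (3, 7, F(1, 7), F(1))      # k=3, d=7, m = O(n^(1/7)), size n
s1 = reduction(start, 2, 4)        # (2, 4, 1/7, 4/7)
s2 = conversion(s1, 3)             # (2, 3, 4/21, 4/7)
s3 = reduction(s2, 1, 1)           # (1, 1, 4/21, 4/21)
```

The final state has both exponents equal to 4/21, matching the O(n^{4/21}) query and answer lengths in the table.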
In Search of the Missing Operator
[figure: conversion maps (P, y) with parameters d, m to (P', y') with parameters d', m', such that P(y)=P'(y')]
Must have m' = Ω(m^{d/d'}).
Question: For which d'<d can we get m'=O(m^{d/d'})?
–Possible when d'|d.
–Open otherwise.
–Positive result ⇒ Better PIR
Simplest open case: d=3, d'=2, m'=O(m^{3/2})

A Workaround
Can't solve the polynomial conversion problem in general.
… but easy to solve given the promise weight(y)=const.
Stronger degree reduction:
Main technical lemma: good parameters for strong degree reduction.
An Abstract Framework
L = linear space of polynomials in Y_{j,h}, j ∈ [k], h ∈ [d], spanned by the k^d monomials of the form Y_{j_1,1}Y_{j_2,2}…Y_{j_d,d}
Let Z_h = Σ_j Y_{j,h}.
Block = poly in L expressible as a product of Y's and Z's.
Variables (k=3, d=5):
Y_{1,1} Y_{1,2} Y_{1,3} Y_{1,4} Y_{1,5}
Y_{2,1} Y_{2,2} Y_{2,3} Y_{2,4} Y_{2,5}
Y_{3,1} Y_{3,2} Y_{3,3} Y_{3,4} Y_{3,5}
Examples:
–monomial Y_{3,1}Y_{3,2}Y_{1,3}Y_{2,4}Y_{1,5}, written (3,3,1,2,1)
–block Z_1Z_2Y_{1,3}Z_4Y_{1,5}, written (*,*,1,*,1)
For each block b, define: V(b) = set of servers j [k] not occurring in b (b) = # of *’s in b. Ex. k=3, b = Z 1 Z 2 Y 3,1 Z 4 Y 5,1 (*,*,1,*,1) V(b) = {2,3} (b) = 3 Want: V(b) large, (b) small. Cost Measures for Blocks
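The two cost measures are easy to compute on the tuple notation for blocks. In the sketch below a block is a tuple whose entries are either a server index in 1..k or the string "*" (standing for a Z factor). The function names `servers_missing`, `num_stars` and `covered_monomials` are assumptions for illustration, not notation from the talk.

```python
from itertools import product

def servers_missing(block, k):
    """V(b): the servers in 1..k that do not occur in the block."""
    return set(range(1, k + 1)) - {e for e in block if e != "*"}

def num_stars(block):
    """Number of *'s in the block, i.e. the number of Z factors."""
    return sum(1 for e in block if e == "*")

def covered_monomials(block, k):
    """Monomials obtained by expanding every Z_h = sum_j Y_{j,h} in the block."""
    choices = [range(1, k + 1) if e == "*" else [e] for e in block]
    return set(product(*choices))

b = ("*", "*", 1, "*", 1)          # the example block for k=3
V = servers_missing(b, 3)          # {2, 3}
stars = num_stars(b)               # 3
```

A block with s stars covers k^s monomials, so the example block covers 27 of the 3^5 monomials, including (3,3,1,2,1).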
A block set B is said to be spanning if it spans Z 1 Z 2 …Z d. For any spanning B, can write: where Q b is a deg- (b) polynomial known to all servers in V(b). Retrieve each Q b (z) from servers in V(b) using 2 d PIRs For every k, d=d(k) and spanning set B=B(k), Back to PIR
Spanning vs. Covering
Necessary condition for spanning: Blocks in B cover all k^d monomials in Z_1Z_2…Z_d.
Observation: The following conditions are sufficient.
(I) Above covering condition
(II) Closure under intersection: Any nonempty intersection of b_1,b_2 ∈ B is in B.
In fact, (II) may be relaxed to:
(II') Any intersection of b_1,b_2 ∈ B is spanned by B.
Naive strategy: C = “optimal” covering block set (e. g., all b with |V(b)| k’, (b) d’), B = closure of C under intersections. Problem: Intersection can make things worse. (1,1,1,1,2,*,*,*,*) (1,1,1,1,*,3,*,*,*) (1,1,1,1,2,3,*,*,*) Finding Good Spanning Sets
B = intersections of blocks b such that |V(b)|=2, (b)=4, e.g., (1,1,1,*,*,*,*). Case 1:(1,1,1,*,*,*,*) Case 2:(1,1,1,*,*,*,*) (*,*,1,1,1,*,*) (*,*,*,2,2,2,*) (1,1,1,1,1,*,*) (1,1,1,2,2,2,*) Example: k =3, d =7
Let,k’<k be parameters and d ( +1)k-( -1)k’+( -2). Then the following B is spanning: –All b such that (b) 1 and |V(b)|>0 –All b such that: (1) |V(b)| k’ (2) Each server index in b occurs there more than times (3) (b) |V(b)| The General Case
Open Problems
Better upper bounds
–Known: O(n^{c·loglog k/(k log k)})
–What is the true limit of our technique?
Generalize best upper bound to t>1
Tight bounds for polynomial conversion
Lower bounds
–Known: c log n
–Simplest cases: k=2; k=3, single answer bit per server