Batch Codes and Their Applications
Y. Ishai, E. Kushilevitz, R. Ostrovsky, A. Sahai
Preliminary version in STOC 2004.

Talk Outline
Batch codes
Amortized PIR
–via hashing
–via batch codes
Constructing batch codes
Concluding remarks

A Load-Balancing Scenario
[Figure: a data string x to be distributed among several devices]

What’s wrong with a random partition?
Good on average for “oblivious” queries. However:
–Can’t balance adversarial queries
–Can’t balance few random queries
–Can’t relieve “hot spots” in multi-user setting

Example
3 devices, 50% storage overhead. By how much can the maximal load be reduced?
–Replicating bits is no good: ∃ device s.t. 1/6 of the bits can only be found at this device.
–Factor 2 load reduction is possible: store L, R, and L⊕R on the three devices, where L and R are the two halves of the data.
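
The factor 2 reduction can be checked with a small sketch. It assumes the data is a list of bits of even length, split into a left half L and a right half R. The function names `encode` and `decode_pair` are illustrative only and do not come from the talk.

```python
def encode(x):
    """Split x into halves L and R and add the parity bucket L xor R."""
    half = len(x) // 2
    L, R = x[:half], x[half:]
    return [L, R, [a ^ b for a, b in zip(L, R)]]


def decode_pair(buckets, i, j):
    """Recover (x[i], x[j]) while reading at most one bit from each bucket."""
    half = len(buckets[0])
    probes = [0, 0, 0]  # number of bits read from each of the 3 buckets

    def read(bucket, pos):
        probes[bucket] += 1
        return buckets[bucket][pos]

    bi, pi = divmod(i, half)  # (bucket holding x[i], position inside it)
    bj, pj = divmod(j, half)
    if bi != bj:
        # The two bits live in different halves: read both directly.
        out = (read(bi, pi), read(bj, pj))
    else:
        # Same half: read the first bit directly and rebuild the second
        # from the other half and the parity bucket.
        other = 1 - bj
        out = (read(bi, pi), read(other, pj) ^ read(2, pj))
    assert max(probes) <= 1
    return out
```

The same decoder also handles i = j, so this small code already tolerates a repeated query.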

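Batch Codes
(n,N,m,k) batch code: [Figure: x (n bits) is encoded into m buckets y_1, y_2, …, y_m of total length N; any set { i_1,…,i_k } of k bits of x can be decoded by reading at most one bit from each bucket.]
Notes
–Rate = n / N
–By default, insist on minimal load per bucket ⇒ m≥k.
–Load measured by # of probes.
Generalizations
–Allow t probes per bucket
–Larger alphabet Σ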

Multiset Batch Codes
(n,N,m,k) multiset batch code: [Figure: x (n bits) is encoded into buckets y_1, y_2, …, y_m of total length N; the k queried indices now form a multiset.]
Motivation
–Models multiple users (with off-line coordination)
–Useful as a building block for standard batch codes
Nontrivial even for multisets of the form […]

Examples
Trivial codes
–Replication: N=kn, m=k. Optimal m, bad rate.
–One bit per bucket: N=m=n. Optimal rate, bad m.
(L,R,L⊕R) code: rate=2/3, m=3, k=2 (multiset).
Goal: simultaneously obtain
–High rate (close to 1)
–Small m (close to k)

Private Information Retrieval (PIR)
Goal: allow user to query database while hiding the identity of the data-items she is after.
Motivation: patent databases, web searches, ...
Paradox(?): imagine buying in a store without the seller knowing what you buy.
Note: Encrypting requests is useful against third parties; not against server holding the data.

Modeling
Database: n-bit string x
User: wishes to
–retrieve x_i and
–keep i private

[Figure: User retrieves x_i from Server; Server is left with “?” about i.]

Some “Solutions”
1. User downloads entire database.
Drawback: n communication bits (vs. log n+1 w/o privacy).
Main research goal: minimize communication complexity.
2. User masks i with additional random indices.
Drawback: gives a lot of information about i.
3. Enable anonymous access to database.
Note: addresses the different security concern of hiding user’s identity, not the fact that x_i is retrieved.
Fact: PIR as described so far requires Ω(n) communication bits.

Two Approaches
Computational PIR [KO97, CMS99, ...]
–Computational privacy
–Based on cryptographic assumptions
Information-Theoretic PIR [CGKS95, Amb97, ...]
–Replicate database among s servers
–Unconditional privacy against t servers
–Default: t=1

Communication Upper Bounds
Computational PIR
–O(n^ε), polylog(n), O(κ log n), O(κ+log n) [KO97,CMS99,…]
Information-theoretic PIR
–2 servers, O(n^{1/3}) [CGKS95]
–s servers, O(n^{1/c(s)}) where c(s)=Ω(s log s / log log s) [CGKS95,Amb97,BIKR02]
–O(log n/log log n) servers, polylog(n)

Time Complexity of PIR
Given low-communication protocols, efficiency bottleneck shifts to servers’ time complexity.
–Protocols require (at least) linear time per query.
–This is an inherent limitation!
Possible workarounds:
–Preprocessing
–Amortize cost over multiple queries

Previous Results [BIM00]
PIR with preprocessing
–s-server protocols with O(n^ε) communication and O(n^{1/s+ε}) work per query, requiring poly(n) storage.
–Disadvantages: only work for multi-server PIR; storage typically huge.
Amortized PIR
–Slight savings possible using fast matrix multiplication
–Require a large batch of queries and high communication
–Apply also to queries originating from different users.
This work:
–Assume a batch of k queries originate from a single user.
–Allow preprocessing (not always needed).
–Nearly optimal amortization

Model
[Figure: User retrieves x_{i_1}, x_{i_2}, …, x_{i_k} from Server/s.]

Amortized PIR via Hashing
Let P be a PIR protocol.
Hashing-based amortized PIR:
–User picks h ∈_R H, defining a random partition of x into k buckets of size ≈ n/k, and sends h to Server/s.
–Except for 2^{-σ} failure probability, at most t=O(σ log k) queries fall in each bucket.
–P is applied t times for each bucket.
Complexity:
–Time ≈ kt·T(n/k) ≈ t·T(n)
–Communication ≈ kt·C(n/k)
–Asymptotically optimal up to “polylog factors”

So what’s wrong?
Not much… Still:
–Not perfect: introduces either error or privacy loss
–Useless for small k: the t=O(σ log k) overhead dominates
–Cannot hash “once and for all”: ∀h ∃ bad k-tuple of queries
Sounds familiar?
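
A small experiment illustrates the last point. This is a sketch under assumptions: a fully random bucket assignment stands in for the hash family H, and the helper names (`random_partition`, `max_load`, `bad_tuple`) are hypothetical. For any fixed partition, a k-tuple of queries that all collide in one bucket is easy to find.

```python
import random
from collections import Counter, defaultdict


def random_partition(n, k, seed=0):
    """Assign each of the n indices to one of k buckets (stand-in for h in H)."""
    rng = random.Random(seed)
    return [rng.randrange(k) for _ in range(n)]


def max_load(h, queries):
    """Largest number of queried indices that fall into a single bucket."""
    return max(Counter(h[i] for i in queries).values())


def bad_tuple(h, k):
    """Return k distinct indices taken from the fullest bucket of h.

    The fullest bucket holds at least n/k indices, so this yields a full
    k-tuple whenever n >= k*k.
    """
    members = defaultdict(list)
    for index, bucket in enumerate(h):
        members[bucket].append(index)
    fullest = max(members.values(), key=len)
    return fullest[:k]
```

With n = 1000 and k = 10, random queries typically give a small maximal load, while `bad_tuple` always forces load k on the chosen partition.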

Amortized PIR via Batch Codes
Idea: use batch-encoding instead of hashing.
Protocol:
–Preprocessing: Server/s encode x as y=(y_1,y_2,…,y_m).
–Based on i_1,…,i_k, User computes the index of the bit it needs from each bucket.
–P is applied once for each bucket.
Complexity
–Time ≈ Σ_{1≤j≤m} T(N_j) ≈ T(N), where N_j is the length of bucket y_j
–Communication ≈ Σ_{1≤j≤m} C(N_j) ≈ m·C(n)
Trivial batch codes imply trivial protocols.
(L,R,L⊕R) code: 2 queries, 1.5× time, 3× communication
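
The figures quoted for the (L,R,L⊕R) code can be traced through the two complexity sums. The worked calculation below is a sketch that assumes the server time T is linear in the database length and that the communication C is nondecreasing; neither assumption is stated on the slide. Each of the three buckets has length N_j = n/2.

```latex
\begin{align*}
\text{Time} &\approx \sum_{j=1}^{3} T(N_j) = 3\,T(n/2) = 1.5\,T(n)
  && \text{(assuming $T$ is linear)}\\
\text{Communication} &\approx \sum_{j=1}^{3} C(N_j) = 3\,C(n/2) \le 3\,C(n)
  && \text{(assuming $C$ is nondecreasing)}
\end{align*}
```

For comparison, answering the two queries independently would cost 2·T(n) time and 2·C(n) communication, so in this small example the saving is in server time only.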

Constructing Batch Codes

Overview
Recall notion: [Figure: x (n bits) encoded into buckets y_1, y_2, …, y_m of total length N; queries i_1,…,i_k.]
Main qualitative questions:
1. Can we get arbitrarily high constant rate (n/N=1-ε) while keeping m feasible in terms of k (say m=poly(k))?
2. Can we insist on nearly optimal m (say m=Õ(k)) and still get close to a constant rate?
Several incomparable constructions answer both questions affirmatively.

Batch Codes from Unbalanced Expanders
[Figure: bipartite graph with n vertices on the left (data bits) and m vertices on the right (buckets).]
By Hall’s theorem, the graph represents an (n,N=|E|,m,k) batch code iff every set S containing at most k vertices on the left has at least |S| neighbors on the right.
Fully captures replication-based batch codes.
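
Decoding a replication-based code amounts to finding a matching from the queried bits into the buckets. The sketch below uses the standard augmenting-path matching algorithm, which is a choice made here and not something named in the talk. `neighbors[i]` lists the buckets that hold a copy of bit i, and the function name `assign_buckets` is illustrative.

```python
def assign_buckets(neighbors, queries):
    """Match each queried bit to a distinct bucket holding a copy of it.

    neighbors: dict mapping bit index -> list of bucket indices.
    queries:   list of distinct bit indices.
    Returns {bit: bucket} or None if no such assignment exists
    (by Hall's theorem, some subset of the queries has too few neighbors).
    """
    owner = {}  # bucket -> queried bit currently assigned to it

    def augment(q, seen):
        # Try to place q, possibly moving previously placed bits elsewhere.
        for b in neighbors[q]:
            if b in seen:
                continue
            seen.add(b)
            if b not in owner or augment(owner[b], seen):
                owner[b] = q
                return True
        return False

    for q in queries:
        if not augment(q, set()):
            return None
    return {q: b for b, q in owner.items()}
```

For example, with `neighbors = {0: [0, 1], 1: [0, 2], 2: [1, 2], 3: [0, 1]}` every set of at most 3 bits has enough neighboring buckets, so the graph gives a batch code with n=4, N=8, m=3, k=3.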

Parameters
Non-explicit: N=dn, m=O(k·(nk)^{1/(d-1)})
–d=3: rate=1/3, m=O(k^{3/2} n^{1/2}).
–d=log n: rate=1/log n, m=O(k) ⇒ Settles Q2
Explicit (using [TUZ01],[CRVW02])
–Nontrivial, but quite far from optimal
Limitations:
–Rate < ½ (unless m=Ω(n))
–For const. rate, m must also depend on n.
–Cannot handle multisets.
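
The two special cases follow from the general non-explicit bound by substitution. The short derivation below is a sketch that reads the bound as m = O(k·(nk)^{1/(d-1)}), takes logarithms to base 2, and uses k ≤ n.

```latex
\begin{align*}
d = 3:&\quad k\,(nk)^{1/(d-1)} = k\,(nk)^{1/2} = k^{3/2}\,n^{1/2},\\
d = \log n:&\quad (nk)^{1/(d-1)} \le n^{2/(\log n - 1)}
   = 2^{\,2\log n/(\log n - 1)} = O(1),
   \quad\text{so } m = O(k).
\end{align*}
```

In both cases the rate is n/N = 1/d, since each bit is replicated d times.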

The Subcube Code
Generalize (L,R,L⊕R) example in two ways
–Trade better rate for larger m: (Y_1,Y_2,…,Y_s,Y_1⊕…⊕Y_s), still k=2
–Handle larger k via composition

Geometric Interpretation
[Figure: the data blocks A, B, C, D arranged in a 2×2 square; the buckets are A, B, C, D, A⊕B, C⊕D, A⊕C, B⊕D, A⊕B⊕C⊕D.]
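
The nine buckets in this picture can be reproduced by applying the (L, R, L⊕R) map to its own output. The sketch below covers the encoder only and assumes that composition means exactly this recursion; `subcube_encode` and its `levels` parameter are illustrative names. After `levels` rounds there are 3^levels buckets of total length (3/2)^levels · n.

```python
def xor(u, v):
    """Bitwise xor of two equal-length bit lists."""
    return [a ^ b for a, b in zip(u, v)]


def subcube_encode(x, levels):
    """Apply the (L, R, L xor R) map recursively `levels` times.

    len(x) must be divisible by 2**levels. Returns the list of buckets.
    """
    if levels == 0:
        return [list(x)]
    half = len(x) // 2
    left, right = x[:half], x[half:]
    buckets = []
    for part in (left, right, xor(left, right)):
        buckets.extend(subcube_encode(part, levels - 1))
    return buckets
```

Calling `subcube_encode([A, B, C, D], 2)` on four single-bit blocks returns the buckets in the order A, B, A⊕B, C, D, C⊕D, A⊕C, B⊕D, A⊕B⊕C⊕D.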

Parameters
N ≈ k^{log(1+1/s)}·n, m ≈ k^{log(s+1)}
–s=O(log k) gives an arbitrary constant rate with m=k^{O(log log k)}. ⇒ “almost” resolves Q1
Advantages:
–Arbitrary constant rate
–Handles multisets
–Very easy decoding
Asymptotically dominated by subsequent construction.

The Gadget Lemma
[Figure: primitive multiset batch code]
From now on, we can choose a “convenient” n and get same rate and m(k) for arbitrarily larger n.

Batch Codes vs. Smooth Codes
Def. A code C: Σ^n → Σ^m is q-smooth if there exists a (randomized) decoder D such that
–D(i) decodes x_i by probing q symbols of C(x).
–Each symbol of C(x) is probed w/prob ≤ q/m.
Smooth codes are closely related to locally decodable codes [KT00].
Two-way relation with batch codes:
–q-smooth code ⇒ primitive multiset batch code with k=m/q^2 (ideally would like k=m/q).
–Primitive multiset batch code ⇒ (expected) q-smooth for q=m/k
Batch codes and smooth codes are very different objects:
–Relation breaks when relaxing “multiset” or “primitive”
–Gap between m/q and m/q^2 is very significant for high rate case
Best known smooth codes with rate>1/2 require q>n^{1/2}
–These codes are provably useless as batch codes.

Batch Codes from RM Codes
(s,d) Reed-Muller code over F
–Message viewed as s-variate polynomial p over F of total degree (at most) d.
–Encoded by the sequence of its evaluations on all points in F^s
–Case |F|>d is useful due to a “smooth decoding” feature: p(z) can be extrapolated from the values of p on any d+1 points on a line passing through z.
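
The smooth decoding feature can be sketched with Lagrange interpolation along a line. The sketch makes several assumptions that are not in the talk: F is a prime field of size q, the codeword is a dictionary from points of F^s to values, and the d+1 probes are taken at the parameters t = 1, …, d+1, which needs q ≥ d+2. Function names are illustrative.

```python
def lagrange_at_zero(ts, vals, q):
    """Value at t=0 of the polynomial of degree < len(ts) through (ts[i], vals[i]), mod prime q."""
    total = 0
    for i, (ti, vi) in enumerate(zip(ts, vals)):
        num, den = 1, 1
        for j, tj in enumerate(ts):
            if j != i:
                num = num * (-tj) % q
                den = den * (ti - tj) % q
        # pow(den, -1, q) is the modular inverse (Python 3.8+).
        total = (total + vi * num * pow(den, -1, q)) % q
    return total


def decode_on_line(codeword, z, direction, d, q):
    """Recover p(z) from d+1 codeword symbols on the line z + t*direction, t != 0."""
    ts = list(range(1, d + 2))
    vals = [
        codeword[tuple((zc + t * dc) % q for zc, dc in zip(z, direction))]
        for t in ts
    ]
    # Restricted to the line, p is a univariate polynomial of degree <= d in t,
    # so d+1 values determine it, and its value at t=0 is p(z).
    return lagrange_at_zero(ts, vals, q)
```

Because the probed points avoid z itself, different directions give different ways to recover the same symbol, which is what a batch decoder can exploit.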

s=2, d ≈ (2n)^{1/2}
[Figure: x_1, x_2, …, x_n]
Two approaches for handling conflicts:
1. Replicate each point t times
2. Use redundancy to “delete” intersections
Slightly increases field size, but still allows constant rate.

Parameters
Rate = (1/s!-ε), m=k^{1+1/(s-1)+o(1)}
–Multiset codes with constant rate (< ½)
Rate = Ω(1/k^ε), m=Õ(k)
⇒ resolves Q2 for multiset codes as well
Main remaining challenge: resolve Q1

The Subset Code
Choose s,d such that n ≤ (s choose d)
Each data bit i∈[n] is associated with a set T ∈ ([s] choose d), i.e. a d-subset of [s]
Each bucket j∈[m] is associated with a set S ∈ ([s] choose ≤d), i.e. a subset of [s] of size at most d
Primitive code: y_S = ⊕_{T⊇S} x_T

Batch Decoding the Subset Code
Lemma: For each T’⊆T, x_T can be decoded from all y_S such that S∩T=T’.
–Let L_{T,T’} denote the set of such S.
–Note: {L_{T,T’} : T’⊆T } defines a partition of the set of buckets ([s] choose ≤d).
[Figure: x_T and the buckets y_S with S∩T=T’, drawn as the pattern **0110****]
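
The lemma can be checked exhaustively for small parameters. The sketch below assumes the garbled symbols on these slides read as follows: data bits are indexed by the d-subsets T of {0,…,s-1}, buckets by the subsets S of size at most d, y_S is the xor of x_T over all T containing S, and x_T is recovered as the xor of y_S over all S with S∩T=T’. The function names are illustrative.

```python
from itertools import combinations


def subsets_up_to(s, d):
    """All subsets of range(s) of size at most d, as frozensets."""
    return [frozenset(c) for r in range(d + 1) for c in combinations(range(s), r)]


def encode(x, s, d):
    """x maps each d-subset T (a frozenset) to a bit; returns {S: y_S}."""
    y = {}
    for S in subsets_up_to(s, d):
        bit = 0
        for T, x_T in x.items():
            if S <= T:  # S is a subset of T
                bit ^= x_T
        y[S] = bit
    return y


def decode(y, T, T_prime):
    """Xor of y_S over all buckets S whose intersection with T is exactly T_prime."""
    out = 0
    for S, y_S in y.items():
        if S & T == T_prime:
            out ^= y_S
    return out
```

The check works because every other d-subset T'' is counted by an even number of the buckets used, while T itself is counted exactly once.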

Batch Decoding the Subset Code (contd.)
Goal: Given T_1,…,T_k, find subsets T’_1,…,T’_k such that the sets L_{T_i,T’_i} are pairwise disjoint.
–Easy if all T_i are distinct or if all T_i are the same.
Attempt 1: T’_i is a random subset of T_i
–Problem: if T_i,T_j are disjoint, L_{T_i,T’_i} and L_{T_j,T’_j} intersect w.h.p.
Attempt 2: greedily assign to T_i the largest T’_i such that L_{T_i,T’_i} does not intersect any previous L_{T_j,T’_j}
–Problem: adjacent sets may “block” each other.
Solution: pick random T’_i with bias towards large sets.
[Figure: x_1, x_2, x_3]

Parameters
Allows arbitrary constant rate with m=poly(k) ⇒ Settles Q1
Both the subcube code and the subset code can be viewed as sub-codes of the binary RM code.
–The full binary RM code cannot be batch decoded when the rate>1/2.

Concluding Remarks: Batch Codes
A common relaxation of very different combinatorial objects
–Expanders
–Locally-decodable codes
Problem makes sense even for small values of m,k.
–For multiset codes with m=3,k=2, rate 2/3 is optimal.
–Open for m ≥ k+2.
Useful building block for “distributed data structures”.

Concluding Remarks: PIR
Single-user amortization is useful in practice only if PIR is significantly more efficient than download.
–Certainly true for multi-server PIR
–Most likely true also for single-server PIR
Killer app for lattice-based cryptosystems?
[Table: single user vs. multiple users, adaptive vs. non-adaptive queries; open cases marked “?”]