Sublinear-Time Error-Correction and Error-Detection Luca Trevisan U.C. Berkeley luca@eecs.berkeley.edu
Contents: a survey of results on error-correcting codes with sub-linear time checking and decoding procedures. Most of the results were not proved by the speaker. Some of the results have not yet been proved by anybody.
Error-correction
Error-detection
Minimum Distance
Ideally: constant information rate, linear minimum distance, very efficient decoding. Sipser-Spielman: linear-time deterministic procedure.
Sub-linear time decoding? Must be probabilistic. Must have some probability of incorrect decoding. Even so, is it possible?
Reasons to be interested: sub-linear time decoding is useful for worst-case to average-case reductions and in information-theoretic Private Information Retrieval. Sub-linear time checking arises in PCP. Useful in practice?
Hadamard Code
“Constant time” decoding
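A minimal sketch of the standard two-query local decoder for the Hadamard code, assuming the usual definition in which position a of the codeword holds the parity <x, a> mod 2. The names `hadamard_encode` and `decode_bit` are illustrative and do not come from the talk.

```python
import random

def hadamard_encode(x):
    """Encode an n-bit message x (list of 0/1) as all 2^n parities <x, a> mod 2."""
    n = len(x)
    # Pack x into an integer so that <x, a> is the parity of popcount(x_int & a).
    x_int = sum(bit << i for i, bit in enumerate(x))
    return [bin(x_int & a).count("1") % 2 for a in range(2 ** n)]

def decode_bit(word, n, i, rng=random):
    """Guess x_i from a possibly corrupted word using exactly two queries."""
    r = rng.randrange(2 ** n)              # uniformly random position
    # On an uncorrupted codeword, <x, r> xor <x, r xor e_i> = x_i by linearity.
    return word[r] ^ word[r ^ (1 << i)]

# Usage: decode every bit of a short message from its uncorrupted encoding.
message = [1, 0, 1, 1]
codeword = hadamard_encode(message)
recovered = [decode_bit(codeword, len(message), i) for i in range(len(message))]
```

The number of queries is independent of n, which is the sense in which the decoding is "constant time"; the price is the encoding length 2^n.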
Analysis
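The usual analysis of the two-query Hadamard decoder can be written as a union bound. This is a sketch under stated assumptions: y is the received word, it differs from C(x) in at most a δ fraction of positions, and the decoder reads positions r and r ⊕ e_i for a uniformly random r. The symbols y, δ and r are notation introduced here.

```latex
\Pr_r\bigl[\text{decoder errs on } x_i\bigr]
  \;\le\; \Pr_r\bigl[y_r \ne C(x)_r\bigr]
  \;+\; \Pr_r\bigl[y_{r \oplus e_i} \ne C(x)_{r \oplus e_i}\bigr]
  \;\le\; 2\delta
```

Each query on its own is uniformly distributed over the positions, so each lands on a corrupted position with probability at most δ. The decoder is therefore correct with probability at least 1 - 2δ, which exceeds 1/2 whenever δ < 1/4.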
A Lower Bound: if the code is linear, the alphabet is small, and the decoding procedure uses two queries, then exponential encoding length is necessary (Goldreich-Trevisan, Samorodnitsky).
More trade-offs For k queries and binary alphabet: More complicated formulas for bigger alphabet
Construction without polynomials
Negative result 1: suppose C:{0,1}^n -> {0,1}^m is a code with a decoding procedure that reads only k bits of the corrupted encoding. Pick a random x, compute C(x), project C(x) onto m^{(k-1)/k} coordinates, and prove that the projection still contains Ω(n) bits of information about x. Then it must be that m = Ω(n^{k/(k-1)}) (Katz-Trevisan).
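The last step of this argument is a counting step, spelled out here as a sketch. A projection onto m^{(k-1)/k} binary coordinates is a string of that many bits, so it carries at most that many bits of information about x. The hidden constants depend on k and on the error parameters, which the slide does not state.

```latex
m^{(k-1)/k} \;\ge\; \Omega(n)
\quad\Longrightarrow\quad
m \;\ge\; \Omega\!\left(n^{k/(k-1)}\right)
\qquad\text{e.g. } k=2:\; m = \Omega(n^{2}), \qquad k=3:\; m = \Omega(n^{3/2})
```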
Negative Result 2: suppose C:{0,1}^n -> {0,1}^m is a linear code with a decoding procedure that reads only 2 bits of the corrupted encoding. Then there are vectors a_1, …, a_m in {0,1}^n such that for each i = 1, …, n there are Ω(m) disjoint pairs j1, j2 such that a_j1 xor a_j2 = e_i. Then it must be that m = exp(Ω(n)) (Goldreich-Trevisan, Samorodnitsky).
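The Hadamard code gives a concrete instance of this combinatorial structure. In the sketch below, which assumes the vectors a_j are all 2^n points of {0,1}^n written as integers, each i has m/2 disjoint pairs, matching the Ω(m) requirement with m = 2^n. The helper `disjoint_pairs` is a hypothetical name, and indices are 0-based rather than 1-based as on the slide.

```python
def disjoint_pairs(n, i):
    """Pairs (j1, j2) of Hadamard positions with a_j1 xor a_j2 = e_i.

    Position j stands for the vector a_j whose bits are the binary digits of j.
    Taking only j with bit i clear means every position appears in one pair.
    """
    e_i = 1 << i
    return [(j, j ^ e_i) for j in range(2 ** n) if not j & e_i]

n = 4
for i in range(n):
    pairs = disjoint_pairs(n, i)
    used = [j for pair in pairs for j in pair]
    assert all(j1 ^ j2 == 1 << i for j1, j2 in pairs)   # each pair sums to e_i
    assert len(set(used)) == len(used) == 2 ** n        # pairs are disjoint
```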
Checking polynomial codes: consider encoding with multivariate low-degree polynomials. Given p, pick a random z, run the decoding procedure for p(z), and compare the result with the actual value of p(z). This is the “simple” case of the low-degree test: the rejection probability is proportional to the distance from the code (Rubinfeld-Sudan).
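One way to make "decode p(z), then compare" concrete is the line-restriction decoder commonly used for polynomial codes. The slide does not spell the decoder out, so treat this as an assumption. The sketch works over a prime field, takes the function as an oracle `f`, and assumes total degree at most `degree` with `degree + 1` smaller than the field size. All names and the field size 13 are illustrative, and `pow(den, -1, p)` needs Python 3.8 or later.

```python
import random

P = 13  # prime field size (hypothetical choice for the example)

def interpolate_at_zero(ts, values, p=P):
    """Lagrange-interpolate through (ts[j], values[j]) and evaluate at t = 0."""
    total = 0
    for j, (tj, vj) in enumerate(zip(ts, values)):
        num, den = 1, 1
        for k, tk in enumerate(ts):
            if k != j:
                num = num * (-tk) % p
                den = den * (tj - tk) % p
        total = (total + vj * num * pow(den, -1, p)) % p
    return total

def line_test(f, num_vars, degree, p=P, rng=random):
    """Decode f(z) from degree+1 other points on a random line, then compare."""
    z = [rng.randrange(p) for _ in range(num_vars)]
    h = [rng.randrange(p) for _ in range(num_vars)]   # random direction
    ts = list(range(1, degree + 2))                   # nonzero, distinct mod p
    values = [f([(zi + t * hi) % p for zi, hi in zip(z, h)]) for t in ts]
    return interpolate_at_zero(ts, values, p) == f(z) % p

# Usage: a polynomial of total degree 3 in two variables always passes.
def cubic(v):
    return (3 * v[0] ** 2 * v[1] + v[1] + 5) % P

accepted = line_test(cubic, num_vars=2, degree=3)
```

The restriction of a degree-d polynomial to a line is a univariate polynomial of degree at most d in t, so d+1 points on the line determine its value at t = 0, which is p(z).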
Bivariate Low Degree Test: a degree-d bivariate polynomial p: F x F -> F is represented as 2|F| elements of F^d (the univariate polynomial q_a(y) = p(a,y) for each a, and the univariate polynomial r_b(x) = p(x,b) for each b). Test: pick random a and b, read q_a and r_b, and check that q_a(b) = r_b(a).
Analysis: if |F| is a constant factor bigger than d, then the rejection probability is proportional to the distance from the code (Arora-Safra, ALMSS, Polishchuk-Spielman).
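The row/column representation and the consistency check can be sketched as follows. This assumes a prime field of size 11, degree at most d in each variable, and polynomials stored as coefficient lists with d+1 entries. The field size and the names `honest_tables` and `bivariate_test` are hypothetical.

```python
import random

FIELD = 11  # prime field size (hypothetical)

def poly_eval(coeffs, t, mod=FIELD):
    """Evaluate sum coeffs[k] * t^k mod `mod` by Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * t + c) % mod
    return acc

def honest_tables(c, mod=FIELD):
    """From coefficients c[i][j] of x^i y^j, build q_a(y) = p(a, y) and r_b(x) = p(x, b)."""
    d = len(c) - 1  # c is assumed to be a (d+1) x (d+1) coefficient matrix
    rows = {a: [sum(c[i][j] * pow(a, i, mod) for i in range(d + 1)) % mod
                for j in range(d + 1)] for a in range(mod)}
    cols = {b: [sum(c[i][j] * pow(b, j, mod) for j in range(d + 1)) % mod
                for i in range(d + 1)] for b in range(mod)}
    return rows, cols

def bivariate_test(rows, cols, mod=FIELD, rng=random):
    """Pick random a, b; accept iff q_a(b) == r_b(a)."""
    a, b = rng.randrange(mod), rng.randrange(mod)
    return poly_eval(rows[a], b, mod) == poly_eval(cols[b], a, mod)

# Usage: tables built from a genuine polynomial are always accepted.
coeffs = [[1, 2, 0], [0, 3, 4], [5, 0, 6]]
rows, cols = honest_tables(coeffs)
ok = bivariate_test(rows, cols)
```

The test reads only two symbols of the encoding, but each symbol is a whole univariate polynomial, so the alphabet is large.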
Efficiency of Decoding vs Checking
Tensor Product Codes: suppose we have a linear code C with codewords in {0,1}^m. Define a new code C’ with codewords in {0,1}^(mxm): a “matrix” is a codeword of C’ if each row and each column is a codeword of C. If C has many codewords and large minimum distance, the same is true of C’.
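A small worked instance of the construction, as a sketch: the base code is assumed to be the length-3 even-weight code, which has dimension 2 and minimum distance 2. That choice is for illustration only and does not come from the talk. Its tensor product has 2^4 = 16 codewords and minimum distance 4.

```python
from itertools import product

G = [[1, 0, 1],
     [0, 1, 1]]  # generator matrix of the length-3 even-weight code (example choice)

def encode(msg, G=G):
    """Codeword of C: the row vector msg times G over GF(2)."""
    return tuple(sum(b * g for b, g in zip(msg, col)) % 2 for col in zip(*G))

# All codewords of the base code C.
C = {encode(msg) for msg in product([0, 1], repeat=len(G))}

def tensor_encode(X, G=G):
    """Codeword of C': encode each row of the k x k matrix X, then each column."""
    rows = [encode(r, G) for r in X]              # k rows of length m
    cols = [encode(c, G) for c in zip(*rows)]     # m columns of length m
    return [list(r) for r in zip(*cols)]          # back to row-major m x m

def in_tensor_code(M, C=C):
    """Membership check: every row and every column must be a codeword of C."""
    return all(tuple(r) in C for r in M) and all(tuple(c) in C for c in zip(*M))

# Usage: encode one 2 x 2 message matrix and check membership.
codeword = tensor_encode([[1, 0], [0, 1]])
is_member = in_tensor_code(codeword)
```

In general, if C has dimension k and minimum distance d, then C’ has dimension k^2 and minimum distance d^2.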
Generalization of the Bivariate Low Degree Test: suppose C has K codewords. Define a code C’’ over the alphabet [K] with codewords of length 2m; C’’ has as many codewords as C’. For each codeword y of C’, the corresponding codeword of C’’ contains the value of each row and each column of y. Test: pick a random “row” and a random “column” and check that they agree at their intersection. Analysis?
Negative Results? No lower bound is known for locally checkable codes. Is it possible to get encoding length n^(1+o(1)) and checking with O(1) queries over the alphabet {0,1}? Is it possible to get encoding length O(n) with O(1) queries and a small alphabet?
Applications? Better locally decodable codes have applications to PIR. A general/simple analysis of checkable proofs could have applications to PCP (linear-length PCP, simple proof of the PCP theorem). Applications to the practice of fault-tolerant data storage/transmission?
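The encoding and the two-query test can be sketched as below. For readability, each symbol is stored as the codeword of C itself (a tuple of bits) rather than as an index in [K]; that representation and the function names are assumptions made here. The sketch shows only the mechanics of the test and makes no claim about its soundness.

```python
import random

def to_rows_and_columns(M):
    """C'' encoding of an m x m matrix: its m rows followed by its m columns."""
    return [tuple(r) for r in M] + [tuple(c) for c in zip(*M)]

def row_column_test(word, rng=random):
    """Read one row symbol and one column symbol; accept iff they agree where they cross."""
    m = len(word) // 2
    i, j = rng.randrange(m), rng.randrange(m)
    row, col = word[i], word[m + j]
    return row[j] == col[i]

# Usage: a matrix whose rows and columns all have even weight, encoded and tested.
M = [[1, 0, 1],
     [0, 0, 0],
     [1, 0, 1]]
word = to_rows_and_columns(M)
accepted = row_column_test(word)
```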