A Property Testing Double-Feature of Short Talks Oded Goldreich Weizmann Institute of Science Talk at Technion, June 2013.


Oded Goldreich Weizmann Institute of Science On the communication complexity methodology for proving lower bounds on the query complexity of property testing

Before Blais, Brody, and Matulef (2011)
(In order to derive a lower bound on testing the property $\Pi$, reduce a two-party communication problem $\Psi$ to $\Pi$.)
[Figure: Communication Complexity (parties A and B holding x and y) alongside Property Testing (tester T querying an object z).]
The models seem incompatible: (1) no natural partition in PT, (2) no distance in CC.

The Methodology of Blais, Brody, and Matulef
In order to derive a lower bound on testing the property $\Pi$, reduce a two-party communication problem $\Psi$ to $\Pi$. That is, present a mapping $F$ of pairs of inputs $(x,y) \in \{0,1\}^{n+n}$ for the CC-problem $\Psi$ to $\ell(n)$-bit long inputs for testing $\Pi$ such that $(x,y) \in \Psi$ implies $F(x,y) \in \Pi$ and $(x,y) \notin \Psi$ implies that $F(x,y)$ is far from $\Pi$.
Let $f_i(x,y)$ be the $i$-th bit of $F(x,y)$, and suppose that $B$ is an upper bound on the (deterministic) communication complexity of each $f_i$ and that $C$ is a lower bound on the randomized communication complexity of $\Psi$. Then, testing $\Pi$ requires at least $C/B$ queries.
In [BBM], $\ell(n)=n$ and each $f_i$ is a function of $x_i$ and $y_i$ only. This restriction complicates the use of the methodology.
[Figure: $(x,y) \mapsto F(x,y)$; inputs with $\Psi=1$ are mapped into $\Pi$, inputs with $\Psi=0$ are mapped to objects far from $\Pi$.]

Soundness of the Methodology
THM: Let $F:\{0,1\}^{n+n} \to \{0,1\}^{\ell(n)}$ be such that $(x,y) \in \Psi$ implies $F(x,y) \in \Pi$ and $(x,y) \notin \Psi$ implies that $F(x,y)$ is far from $\Pi$. Let $f_i(x,y)$ be the $i$-th bit of $F(x,y)$. Then, $\mathrm{RCC}(\Psi) \le \max_i\{\mathrm{DCC}(f_i)\} \cdot \mathrm{PT}(\Pi)$. (Extends to CC promise problems.)
Notation: RCC = randomized CC (with error, say, 1/3), using shared randomness. DCC = deterministic CC (or randomized with error $1/6n$). PT = query complexity of testing (w.r.t. distance as in "far").
Proof: Each of the two parties invokes a local copy of the tester using the shared randomness. Each query (i.e., $i$) made by the tester is answered by invoking the corresponding CC protocol (for $f_i$). Note that the two local executions are kept identical. The error probability of this protocol equals that of the tester. ■
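
The emulation in the proof can be sketched in code. The following is only an illustration under assumptions that are not in the slides: the map is the toy choice F(x,y) = x XOR y (so each f_i costs B = 2 bits, one from each party), and `toy_tester` is a hypothetical tester that accepts iff every queried bit is 0. The point is the accounting: the communication used is at most B times the number of queries.

```python
import random


def toy_tester(oracle, n, num_queries, rng):
    """Hypothetical tester: accept (1) iff every queried bit of the object is 0."""
    for _ in range(num_queries):
        if oracle(rng.randrange(n)) != 0:
            return 0
    return 1


def emulate_tester(x, y, num_queries, shared_seed):
    """Run the tester on F(x, y) = x XOR y, charging communication per query.

    Both parties would run this same code with the same shared_seed, so their
    local copies of the tester stay identical; a single copy is simulated here.
    """
    n = len(x)
    cost_per_query = 2  # B: Alice sends x[i], Bob sends y[i]
    stats = {"queries": 0, "bits": 0}

    def oracle(i):
        stats["queries"] += 1
        stats["bits"] += cost_per_query
        return x[i] ^ y[i]  # f_i(x, y), the i-th bit of F(x, y)

    verdict = toy_tester(oracle, n, num_queries, random.Random(shared_seed))
    return verdict, stats["queries"], stats["bits"]
```

For example, `emulate_tester([1, 0, 1, 1], [1, 0, 1, 1], 5, 7)` makes 5 queries and charges 10 bits, whereas on complementary inputs the toy tester rejects after its first query and only 2 bits are charged.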

Applying the Methodology
THM: Let $F:\{0,1\}^{n+n} \to \{0,1\}^{\ell(n)}$ be such that $(x,y) \in \Psi$ implies $F(x,y) \in \Pi$ and $(x,y) \notin \Psi$ implies that $F(x,y)$ is far from $\Pi$. Let $f_i(x,y)$ be the $i$-th bit of $F(x,y)$. Then, $\mathrm{RCC}(\Psi) \le \max_i\{\mathrm{DCC}(f_i)\} \cdot \mathrm{PT}(\Pi)$; i.e., $\mathrm{PT}(\Pi) \ge \mathrm{RCC}(\Psi)/\max_i\{\mathrm{DCC}(f_i)\}$.
THM: Let $C:\{0,1\}^n \to \{0,1\}^{\ell(n)}$ be a linear code of constant relative distance, and $k:\mathbb{N} \to \mathbb{N}$. Then, the query complexity of testing the set $\{C(x) : x \in \{0,1\}^n \wedge \mathrm{wt}(x)=k\}$ is $\Omega(k)$.
PF: Reduce from $k$-DISJ$_n$ (disjointness for $k/2$-subsets), using $F(x,y)=C(x+y)=C(x)+C(y)$. Note that each bit in $F(x,y)$ has DCC = 2 (by exchanging the corresponding bits of $C(x)$ and $C(y)$).
COR: Testing $k$-linearity has query complexity $\Omega(k)$. [C = Hadamard]
Note: Typically, the $i$-th bit of $F(x,y)$ depends on a linear number of bits in $x$ and in $y$. An alternative proof that uses the original BBM formulation needs to maneuver around this difficulty.
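
The two facts the reduction relies on can be checked on small inputs. The sketch below assumes C is the Hadamard code (as in the corollary) and represents subsets by their indicator vectors; the helper names are illustrative. It checks that C(x+y) = C(x)+C(y), so each bit of F(x,y) is the XOR of one bit held by each party, and that for two k/2-subsets the sum x+y has weight k exactly when the subsets are disjoint.

```python
from itertools import product


def hadamard(x):
    """Hadamard encoding: one bit <a, x> mod 2 for every a in {0,1}^n."""
    return [sum(a_i & x_i for a_i, x_i in zip(a, x)) % 2
            for a in product((0, 1), repeat=len(x))]


def xor(u, v):
    """Bitwise sum over GF(2)."""
    return [a ^ b for a, b in zip(u, v)]


def weight(x):
    """Hamming weight of a 0/1 vector."""
    return sum(x)


# Two 2-subsets of a 6-element universe (so k = 4).
x = [1, 1, 0, 0, 0, 0]
y_disjoint = [0, 0, 1, 1, 0, 0]
y_intersecting = [0, 1, 1, 0, 0, 0]

# Linearity: F(x, y) = C(x + y) = C(x) + C(y).
linear = hadamard(xor(x, y_disjoint)) == xor(hadamard(x), hadamard(y_disjoint))

# wt(x + y) = k - 2|x ∩ y|, which equals k iff the subsets are disjoint.
weights = (weight(xor(x, y_disjoint)), weight(xor(x, y_intersecting)))
```

Since wt(x+y) = k - 2|x ∩ y|, intersecting pairs are mapped to codewords of messages of weight less than k, and the constant relative distance of the code is what places those codewords far from the tested set.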

Applying the Restricted Methodology
THM: Let $F:\{0,1\}^{n+n} \to \{0,1\}^{\ell(n)}$ be such that $(x,y) \in \Psi$ implies $F(x,y) \in \Pi$ and $(x,y) \notin \Psi$ implies that $F(x,y)$ is far from $\Pi$. Let $f_i(x,y)$ be the $i$-th bit of $F(x,y)$. Then, $\mathrm{PT}(\Pi) \ge \mathrm{RCC}(\Psi)/\max_i\{\mathrm{DCC}(f_i)\}$.
Restriction: $f_i(x,y)=\mathrm{fnc}(i,x_i,y_i)$.
THM: Let $C:\{0,1\}^n \to \{0,1\}^{\ell(n)}$ be a linear code of constant relative distance, and $k:\mathbb{N} \to \mathbb{N}$. Then, the query complexity of testing the set $\{C(x) : x \in \{0,1\}^n \wedge \mathrm{wt}(x)=k\}$ is $\Omega(k)$.
An alternative proof via the restricted methodology introduces an auxiliary CC problem ("C-encoded k-DISJ") $\Psi'$ that consists of pairs $(C(x),C(y))$ s.t. $(x,y) \in k$-DISJ$_n$, reduces (in the CC world) $k$-DISJ to $\Psi'$, and then applies the restricted method to $\Psi'$.
The general methodology frees the prover/user from this type of acrobatics. Interestingly, this is only a matter of convenience; that is, it does not add power (i.e., "anything provable via general is essentially provable by restricted").

Emulating the Restricted Methodology
THM: Let $F:\{0,1\}^{n+n} \to \{0,1\}^{\ell(n)}$ be such that $(x,y) \in \Psi$ implies $F(x,y) \in \Pi$ and $(x,y) \notin \Psi$ implies that $F(x,y)$ is far from $\Pi$. Let $f_i(x,y)$ be the $i$-th bit of $F(x,y)$. Then, $\mathrm{PT}(\Pi) \ge \mathrm{RCC}(\Psi)/\max_i\{\mathrm{DCC}(f_i)\}$.
Restriction: $f_i(x,y)=\mathrm{fnc}(i,x_i,y_i)$.
THM (imprecise sketch): Suppose that $\Psi$, $\Pi$ and $F$ satisfy the conditions of the general methodology with $B=\max_i\{\mathrm{DCC}(f_i)\}$. Then, there exist $\Psi'$, $\Pi'$ and $F'$ that satisfy the conditions of the restricted methodology while $\mathrm{RCC}(\Psi') \ge \mathrm{RCC}(\Psi)$ and $\mathrm{PT}(\Pi)=\Omega(\mathrm{PT}(\Pi')/B)$.
Still, the general methodology frees the prover/user from this type of acrobatics.

Oded Goldreich Weizmann Institute of Science On Multiple Input Problems in Property Testing

Three types of multiple input problems
For any fixed property $\Pi$ and proximity parameter $\epsilon$:
Direct m-Sum Problem (DS): Given a sequence of m inputs, output a sequence of m outputs that each satisfy the testing requirements; that is, for every i, if the i-th input is in $\Pi$ then the i-th output is 1 w.p. ≥ 2/3, whereas if the i-th input is $\epsilon$-far from $\Pi$ then the i-th output is 0 w.p. ≥ 2/3.
Direct m-Product Problem (DP): Given a sequence of m inputs, output 1 w.p. ≥ 2/3 if all inputs are in $\Pi$, and 0 w.p. ≥ 2/3 if some input is $\epsilon$-far from $\Pi$.
m-Concatenation Problem (CP): Given a sequence of m inputs, output 1 w.p. ≥ 2/3 if all inputs are in $\Pi$, and 0 w.p. ≥ 2/3 if the average distance of the inputs from $\Pi$ is at least $\epsilon$.
The results at a glance: For DS and DP the query complexity is m times the query complexity of $\Pi$; for CP it is about the same as for $\Pi$.

The main results
m-DS: Given a sequence of m inputs, output a sequence of m outputs such that, for every i, if the i-th input is in $\Pi$ then the i-th output is 1 w.p. ≥ 2/3, whereas if the i-th input is $\epsilon$-far from $\Pi$ then the i-th output is 0 w.p. ≥ 2/3.
m-DP: Given a sequence of m inputs, output 1 w.p. ≥ 2/3 if all inputs are in $\Pi$, and 0 w.p. ≥ 2/3 if some input is $\epsilon$-far from $\Pi$.
m-CP: Given a sequence of m inputs, output 1 w.p. ≥ 2/3 if all inputs are in $\Pi$, and 0 w.p. ≥ 2/3 if the average distance of the inputs from $\Pi$ is at least $\epsilon$.
For any $\Pi$ and $\epsilon$, w.r.t. error probability at most 1/3:
THM 1: $m\text{-DS}_\epsilon(\Pi) = \Theta(m \cdot \mathrm{PT}_\epsilon(\Pi))$.
THM 2: $m\text{-DP}_\epsilon(\Pi) = \Theta(m \cdot \mathrm{PT}_\epsilon(\Pi))$.
THM 3: Typically (*), $m\text{-CP}_\epsilon(\Pi) = \tilde{O}(\mathrm{PT}_\epsilon(\Pi))$.
*) "Typically" = if $\mathrm{PT}_\epsilon(\Pi)$ increases at least linearly with $1/\epsilon$.

Comments re the proof of THM1
THM 1: $m\text{-DS}_\epsilon(\Pi) = \Theta(m \cdot \mathrm{PT}_\epsilon(\Pi))$.
($m\text{-DS}_\epsilon$ = given a sequence of m inputs, output a sequence of m outputs such that, for every i, if the i-th input is in $\Pi$ then the i-th output is 1 w.p. ≥ 2/3, whereas if the i-th input is $\epsilon$-far from $\Pi$ then the i-th output is 0 w.p. ≥ 2/3.)
Re the lower bound: In the model of query complexity, it is easy to decouple the execution of the multiple-instance procedure into a sequence of single-instance executions, and the only issue at hand is the possibly uneven and adaptive allocation of resources among the executions. We need to consider the allocation of resources w.r.t. some distribution on instances; which one? The one provided by the MiniMax Principle!
The real content of the MMP is not that the worst-case performance of each randomized algorithm is bounded by the average-case performance (of all deterministic algorithms) w.r.t. some fixed input distribution, but rather that this bound is tight!

Comments re the proof of THM2
THM 2: $m\text{-DP}_\epsilon(\Pi) = \Theta(m \cdot \mathrm{PT}_\epsilon(\Pi))$.
($m\text{-DP}_\epsilon$ = given a sequence of m inputs, output 1 w.p. ≥ 2/3 if all inputs are in $\Pi$, and 0 w.p. ≥ 2/3 if some input is $\epsilon$-far from $\Pi$.)
Re the lower bound: Via an adaptation of the proof of THM1.
Re the upper bound: A straightforward reduction of DP to DS will require error reduction (and so we would lose a $\Theta(\log m)$ factor).
LEM: m-DP can be reduced to $O(j)$ instances of $2^{-(j-1)}m$-DS, for $j=1,\dots,\log m$.
Idea: Proceed in iterations, initializing I (the set of "far" suspects) to [m]. In iteration j, run DS on the instances with index in I, with error parameter $\exp(-j)$, and reset I to be the set of indices with output 0. If $|I| > m/2^j$, then halt with output 0. If I is empty, halt with output 1.
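
The iterative procedure behind the lemma can be sketched as follows. This is a minimal sketch under stated assumptions, not the construction from the paper: `tester` is a hypothetical single-instance tester returning 1 (accept) or 0 (reject), error parameter exp(-j) is approximated by a majority vote over 2j+1 runs, and the loop simply continues until one of the two halting conditions fires (which must happen once 2^j > m).

```python
import random


def ds_with_error_reduction(tester, instances, indices, j, rng):
    """Direct-sum step: majority vote over O(j) runs of the tester per instance."""
    reps = 2 * j + 1  # assumed stand-in for "error parameter exp(-j)"
    outputs = {}
    for i in indices:
        accepts = sum(tester(instances[i], rng) for _ in range(reps))
        outputs[i] = 1 if 2 * accepts > reps else 0
    return outputs


def direct_product(tester, instances, rng=None):
    """Output 1 if all instances look fine, 0 if some instance looks far."""
    if rng is None:
        rng = random.Random(0)
    m = len(instances)
    suspects = list(range(m))  # I, the set of "far" suspects, starts as [m]
    j = 0
    while True:
        j += 1
        outputs = ds_with_error_reduction(tester, instances, suspects, j, rng)
        suspects = [i for i in suspects if outputs[i] == 0]
        if len(suspects) > m / 2 ** j:  # too many suspects survive: reject
            return 0
        if not suspects:  # no suspects left: accept
            return 1
```

With an error-free toy tester such as `lambda instance, rng: instance` (instances encoded as 1 for "in the property" and 0 for "far"), a single far instance survives every iteration and triggers rejection once m/2^j drops below 1, while a sequence of good instances empties the suspect set in the first iteration.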

Illustration for the proof of LEM
LEM: m-DP can be reduced to $O(j)$ instances of $2^{-(j-1)}m$-DS, for $j=1,\dots,\log m$.
Idea: Proceed in iterations, initializing I (the set of "far" suspects) to [m]. In iteration j, run DS on the instances with index in I, with error parameter $\exp(-j)$, and reset I to be the set of indices with output 0. If $|I| > m/2^j$, then halt with output 0. If I is empty, halt with output 1.
[Figure: two runs of the procedure. Case: all inputs in $\Pi$. Case: there exists an input far from $\Pi$.]

Comments re the proof of THM3
THM 3: Typically (*), $m\text{-CP}_\epsilon(\Pi) = \tilde{O}(\mathrm{PT}_\epsilon(\Pi))$.
($m\text{-CP}_\epsilon$ = given a sequence of m inputs, output 1 w.p. ≥ 2/3 if all inputs are in $\Pi$, and 0 w.p. ≥ 2/3 if the average distance of the inputs from $\Pi$ is at least $\epsilon$.)
*) "Typically" = if $\mathrm{PT}_\epsilon(\Pi)$ increases at least linearly with $1/\epsilon$.
Re the upper bound: A straightforward algorithm would sample $O(1/\epsilon)$ instances and run the $\epsilon$-tester for $\Pi$ on each of them. Complexity $O(\mathrm{PT}_\epsilon/\epsilon)$.
One can do better using Levin's economical work investment strategy. Suppose $E_s[q(s)] > \epsilon$, for $q:[N] \to [0,1]$. (Invested work is proportional to $1/q(s)$, unknown a priori.) Let $\ell = \log(2/\epsilon)$. Then, there exists $j \in [\ell]$ such that $\mathrm{Prob}_s[q(s) > 2^{-j}] > 2^j\epsilon/4\ell$.
For $j=1,\dots,\ell$, take a sample of $O(\ell/2^j\epsilon)$ instances and invoke a $2^{-j}$-tester on each.
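
The sampling schedule of the work investment strategy can be sketched as below. Several details are assumptions made for illustration only: `tester(instance, delta)` is a hypothetical proximity-parameterized tester returning 1 or 0, the constant `c` in the sample size is arbitrary, and the procedure rejects as soon as any invocation rejects (which is sound for a one-sided error tester, or after suitable error reduction).

```python
import math
import random


def concatenation_test(instances, tester, eps, rng=None, c=4):
    """Accept (1) unless some sampled instance is rejected at some level j."""
    if rng is None:
        rng = random.Random(0)
    levels = math.ceil(math.log2(2 / eps))  # l = log(2/eps)
    for j in range(1, levels + 1):
        # Few samples with a cheap (coarse) tester for small j,
        # more samples with a costlier (fine) tester for large j.
        sample_size = math.ceil(c * levels / (2 ** j * eps))
        for _ in range(sample_size):
            instance = rng.choice(instances)
            if tester(instance, 2.0 ** -j) == 0:
                return 0
    return 1
```

In a toy model where each instance is represented by its distance from the property and the tester is `lambda d, delta: 0 if d > delta else 1`, a sequence of instances at distance 0.5 is rejected at level j=2 for eps=0.25, and a sequence of instances at distance 0 is accepted. Under the "typically" condition, the level-j tester is cheaper than the eps-tester by roughly the factor that the level-j sample is larger, which is where the savings over the straightforward algorithm come from.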

Additional results and comments
Non-adaptive and/or one-sided error testers: The only deviation from the general case is for the one-sided error version of DP: its complexity is $\Theta(m \cdot \mathrm{PT}(\Pi)+\mathrm{PT}^{\mathrm{ose}}(\Pi))$. (OSE is the adaptive version.)
(m-DP = given a sequence of m inputs, output 1 w.p. ≥ 2/3 if all inputs are in $\Pi$, and 0 w.p. ≥ 2/3 if some input is $\epsilon$-far from $\Pi$.)
Re the upper bound: We adapt the procedure presented in the proof of the efficient reduction of DP to DS (cf. Lemma for THM2). Recall that this procedure proceeds in iterations, halting with output 1 if I (the set of "far" suspects) becomes empty and outputting 0 if I is ever too big. We modify the procedure such that in the latter case it selects a random i in I, invokes the one-sided error tester on the i-th instance, and decides accordingly. In contrast, in the invocations of the reduction procedure, we use the two-sided error tester.

End
The slides of this talk are available at
The "CC Methodology" paper is available at
The "Multiple Input" paper is available at

Property Testing: an illustration
[Figure: an image labeled "Gothic cathedral?"]

Property Testing: informal definition
A relaxation of a decision problem: For a fixed property P and any object O, determine whether O has property P or is far from having property P (i.e., O is far from any other object having P).
Focus: sub-linear time algorithms, performing the task by inspecting the object at few locations.
Objects viewed as functions. Inspecting = querying the function/oracle.

Property Testing: the standard (one-sided error) def'n
A property $P = \bigcup_n P_n$, where $P_n$ is a set of functions with domain $D_n$.
The tester gets explicit input $n$ and $\epsilon$, and oracle access to a function $f$ with domain $D_n$.
If $f \in P_n$ then $\mathrm{Prob}[T^f(n,\epsilon) \text{ accepts}] = 1$ (or > 2/3).
If $f$ is $\epsilon$-far from $P_n$ then $\mathrm{Prob}[T^f(n,\epsilon) \text{ rejects}] > 2/3$. (Distance is defined as fraction of disagreements.)
Focus: query complexity, $q(n,\epsilon) \ll |D_n|$. Special focus: $q(n,\epsilon)=q(\epsilon)$, independent of $n$.
Terminology: $\epsilon$ is called the proximity parameter.
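
As a concrete instance of this definition, here is a sketch of a one-sided error tester for a deliberately simple property that does not appear in the slides: the all-zero function on the domain {0,...,n-1}. A function that is eps-far from all-zero takes a nonzero value on at least an eps fraction of the domain, so ceil(2/eps) uniform queries miss every such point with probability at most (1-eps)^(2/eps) ≤ e^(-2) < 1/3.

```python
import math
import random


def all_zero_tester(f, n, eps, rng=None):
    """Accept (True) iff all sampled points of f evaluate to 0.

    Always accepts the all-zero function (one-sided error); rejects any
    function that is eps-far from all-zero with probability > 2/3.
    """
    if rng is None:
        rng = random.Random(0)
    num_queries = math.ceil(2 / eps)  # q(n, eps) = q(eps), independent of n
    for _ in range(num_queries):
        if f(rng.randrange(n)) != 0:
            return False
    return True
```

The query complexity here depends only on eps, which is the "special focus" case mentioned above: for example, with eps = 0.1 the tester makes 20 queries whether n is a thousand or a billion.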
