Tight Bound for the Gap Hamming Distance Problem
Oded Regev, Tel Aviv University

Presentation transcript:

Tight Bound for the Gap Hamming Distance Problem
Oded Regev, Tel Aviv University
Based on a joint paper with Amit Chakrabarti, Dartmouth College

Gap Hamming Distance (GHD)
- Alice is given x ∈ {0,1}^n and Bob is given y ∈ {0,1}^n.
- They are promised that either Δ(x,y) > n/2 + √n or Δ(x,y) < n/2 - √n, where Δ(x,y) is the Hamming distance between x and y.
- Their goal is to decide which is the case using the minimum amount of communication.
- They are allowed to use randomization.
- Important applications in the data stream model [FlajoletMartin85, AlonMatiasSzegedy99], e.g., approximating the number of distinct elements.
- Equivalent to the Gap Inner Product problem.
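A minimal Python sketch of the problem statement, for concreteness. The function names `hamming`, `ghd` and `inner_product_pm1` are illustrative and not from the talk. The slide does not spell out the equivalence with Gap Inner Product; the sketch assumes the standard mapping of a bit b to (-1)^b, under which the inner product equals n - 2Δ(x,y), so a gap in Hamming distance is a gap in inner product.

```python
import math

def hamming(x, y):
    """Hamming distance between two equal-length 0/1 sequences."""
    return sum(a != b for a, b in zip(x, y))

def ghd(x, y):
    """1 if Δ(x,y) > n/2 + √n, 0 if Δ(x,y) < n/2 - √n, None if the promise fails."""
    n = len(x)
    d = hamming(x, y)
    if d > n / 2 + math.sqrt(n):
        return 1
    if d < n / 2 - math.sqrt(n):
        return 0
    return None  # input violates the promise, any answer is acceptable

def inner_product_pm1(x, y):
    """Inner product after mapping each bit b to (-1)**b (assumed encoding)."""
    return sum((1 - 2 * a) * (1 - 2 * b) for a, b in zip(x, y))

# With n = 16 the thresholds are 8 + 4 = 12 and 8 - 4 = 4.
x = [0] * 16
y = [1] * 13 + [0] * 3
print(ghd(x, y), hamming(x, y), inner_product_pm1(x, y))
```

The identity checked in the last line is inner product = n - 2Δ(x,y), which is why the two gap problems can be treated interchangeably.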

Gap Hamming Distance (GHD)
Known upper bound:
- Naïve protocol: n
Known lower bounds:
- Version without a gap: Ω(n)
- Easy lower bound of Ω(√n)
- Lower bound of Ω(n) in the deterministic model [Woodruff07]
- One-round Ω(n) [IndykWoodruff03, JayramKumarSivakumar07]
- Constant-round Ω(n) [BrodyChakrabarti09], improved in [BrodyChakrabartiRegevVidickdeWolf09]
- Nothing better known in the general case!

Our Main Result
We completely resolve the question: R(GHD) = Ω(n), matching the naïve upper bound of n.

The Smooth Rectangle Bound

The Rectangle Bound
- Assume there is a randomized protocol that solves GHD with error < 0.1 and communication n/1000.
- Define two distributions:
  - μ_0: uniform over x,y ∈ {0,1}^n with Δ(x,y) = n/2 - √n
  - μ_1: uniform over x,y ∈ {0,1}^n with Δ(x,y) = n/2 + √n
- By the easy direction of Yao’s lemma, we obtain a deterministic protocol with communication n/1000 that on μ_0 outputs 0 w.p. > 0.9 and on μ_1 outputs 1 w.p. > 0.9.

The Rectangle Bound
- This deterministic protocol defines a partition of the 2^n × 2^n communication matrix into 2^{n/1000} rectangles, each labeled with 0 or 1.
- [Figure: the partition under μ_0 and under μ_1. The rectangles labeled 0 have total mass > 0.9 under μ_0 and < 0.1 under μ_1; the rectangles labeled 1 have total mass < 0.1 under μ_0 and > 0.9 under μ_1.]
- In order to reach the desired contradiction, one proves: for all rectangles R with μ_0(R) ≥ 2^{-n/100}, μ_1(R) ≥ ½ μ_0(R).

Problem!
- Consider R = { (x,y) | x and y start with 10√n ones }.
- Then μ_0(R) = 2^{-Ω(√n)} but μ_1(R) < μ_0(R)!
- The trouble: big unbalanced rectangles exist…
- But apparently they cannot form a partition?
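The imbalance of this rectangle can be checked exactly for a concrete n. The sketch below is not from the talk; it relies on the observation that a uniform pair at Hamming distance d can be sampled as a uniform x together with an independent uniform set of d flipped positions, so the mass of R is 2^{-t} times the probability that the flipped set avoids the first t coordinates. The name `rectangle_mass` and the choice n = 10000 are illustrative assumptions.

```python
from fractions import Fraction
from math import comb, isqrt

def rectangle_mass(n, t, d):
    """Exact mass of R = {(x, y): x and y both start with t ones}
    under the uniform distribution on pairs at Hamming distance d."""
    # x starts with t ones with probability 2**-t; y then also starts with
    # t ones iff none of the d differing positions lies among the first t.
    return Fraction(1, 2**t) * Fraction(comb(n - t, d), comb(n, d))

n = 10_000
s = isqrt(n)                            # √n = 100
t = 10 * s                              # prefix length 10√n
mu0 = rectangle_mass(n, t, n // 2 - s)  # distance n/2 - √n
mu1 = rectangle_mass(n, t, n // 2 + s)  # distance n/2 + √n
ratio = mu1 / mu0
print(float(ratio))                     # far below the 1/2 the rectangle bound would need
```

Pairs at the larger distance are less likely to agree on a long prefix, which is why μ_1(R) falls so far below μ_0(R) here.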

Smooth Rectangle Bound
- To resolve this problem, we use a new lower bound technique introduced in [Klauck10, JainKlauck10].
- Define three distributions:
  - μ_0: uniform over x,y ∈ {0,1}^n with Δ(x,y) = n/2 - √n
  - μ_1: uniform over x,y ∈ {0,1}^n with Δ(x,y) = n/2 + √n
  - μ_2: uniform over x,y ∈ {0,1}^n with Δ(x,y) = n/2 + 3√n
- Our main technical inequality: for all rectangles R with μ_1(R) ≥ 2^{-n/100},
  (μ_0(R) + μ_2(R))/2 ≥ 0.9 μ_1(R).

Smooth Rectangle Bound
- For all rectangles R with μ_1(R) ≥ 2^{-n/100}, (μ_0(R) + μ_2(R))/2 ≥ 0.9 μ_1(R).
- [Figure: the rectangle partition under μ_0, μ_1 and μ_2, annotated with the masses > 0.9, < 0.1, < 0.1, > 0.9 and > 1.5.]
- Contradiction!!
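The transcript keeps only the numbers from the figure, so the following is a hedged reconstruction of how they appear to combine, not a quotation from the talk. Let L denote the union of the 1-labeled rectangles R with μ_1(R) ≥ 2^{-n/100} (the symbol L is introduced here for illustration). The at most 2^{n/1000} smaller rectangles carry negligible μ_1-mass, and summing the main inequality over the rectangles in L gives:

```latex
\begin{align*}
\mu_1(L) &> 0.9 - 2^{n/1000}\cdot 2^{-n/100} = 0.9 - 2^{-9n/1000}, \\
\mu_0(L) + \mu_2(L) &\ge 1.8\,\mu_1(L) > 1.5 \quad \text{(for large enough } n\text{)}, \\
\mu_0(L) + \mu_2(L) &< 0.1 + 1 = 1.1 .
\end{align*}
```

The last line uses only that the 1-labeled rectangles have μ_0-mass below 0.1 and that μ_2 is a probability distribution; it is incompatible with the second line, which is the contradiction.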

The Main Technical Theorem

Theorem: For any sets A,B ⊆ {0,1}^n of measure ≥ 2^{-n/100}, the distribution of Δ(x,y) - n/2, where x ∈ A and y ∈ B, is ‘at least as spread out’ as N(0, 0.49√n).
Example: Take A = {all strings starting with n/2 zeros and ending with a string of Hamming weight n/4}, and similarly for B with the roles of the two halves swapped. Then their measure is about 2^{-n/2}, but Δ(x,y) is always n/2.
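The example can be verified by brute force for a small n. This sketch assumes, as the figure residue suggests, that B is the mirror image of A (zeros in the second half, weight n/4 in the first half); the name `example_sets` is illustrative.

```python
from itertools import product

def hamming(x, y):
    """Hamming distance between two equal-length 0/1 tuples."""
    return sum(a != b for a, b in zip(x, y))

def example_sets(n):
    """A: n/2 zeros followed by a half of weight n/4.
    B: a half of weight n/4 followed by n/2 zeros (assumed mirror image)."""
    h = n // 2
    halves = [w for w in product((0, 1), repeat=h) if sum(w) == n // 4]
    zeros = (0,) * h
    A = [zeros + w for w in halves]
    B = [w + zeros for w in halves]
    return A, B

n = 8
A, B = example_sets(n)
# Every pair differs exactly on the n/4 ones of x and the n/4 ones of y.
distances = {hamming(x, y) for x in A for y in B}
print(distances)
```

The set of distances collapses to a single value, so no spreading occurs: sets of measure around 2^{-n/2} are too small for the theorem's conclusion to hold.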

The Main Technical Theorem: Gaussian Version
We actually derive the main theorem as a corollary of the analogous statement for Gaussian space (which is much nicer to work with!):
Theorem: For any sets A,B ⊆ ℝ^n of measure ≥ 2^{-n/100}, the distribution of ⟨x,y⟩/√n, where x ∈ A and y ∈ B, is ‘at least as spread out’ as N(0,1).

A Stronger Theorem
Our main theorem follows from the following stronger result:
Theorem: Let B ⊆ ℝ^n be any set of measure ≥ 2^{-n/100}. Then the projection of B on all but a 2^{-n/50} fraction of directions is distributed like the sum of N(0,1) and an independent r.v. (i.e., a mixture of normals with variance 1).

Lemma 1 – Hypercube Version
Lemma 1’: Let B ⊆ {0,1}^n be of size ≥ 2^{0.99n} and let b = (b_1,…,b_n) be uniformly distributed in B. Then for 90% of indices k ∈ {1,…,n}, b_k is close to uniform (even when conditioned on b_1,…,b_{k-1}).
Proof: By the chain rule for entropy, 0.99n ≤ H(b) = H(b_1) + H(b_2 | b_1) + … + H(b_n | b_1,…,b_{n-1}). Since the entropy of a bit is never bigger than 1, most summands are very close to 1.
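The chain-rule argument can be made concrete on a small set. The sketch below computes each conditional entropy H(b_k | b_1,…,b_{k-1}) for b uniform over a list of distinct bit-tuples; the function name `conditional_entropies` and the toy set are assumptions made for illustration. If the summands total at least 0.99n and each is at most 1, then the deficits 1 - H_k total at most 0.01n, so at most 0.1n indices can have entropy below 0.9 (taking 0.9 as one possible reading of "close to uniform").

```python
from collections import Counter
from itertools import product
from math import log2

def conditional_entropies(B):
    """H(b_k | b_1..b_{k-1}) for each k, with b uniform over the distinct tuples in B."""
    n = len(B[0])
    total = len(B)
    result = []
    for k in range(n):
        prefix = Counter(b[:k] for b in B)      # how many strings share each k-bit prefix
        longer = Counter(b[:k + 1] for b in B)  # same for prefixes extended by bit k+1
        # Sum over (prefix, bit) of p(prefix, bit) * log2(1 / p(bit | prefix)).
        h = -sum(c / total * log2(c / prefix[p[:k]]) for p, c in longer.items())
        result.append(h)
    return result

# Toy set: all 4-bit strings except 1111.
B = [b for b in product((0, 1), repeat=4) if b != (1, 1, 1, 1)]
hs = conditional_entropies(B)
print(hs, sum(hs), log2(len(B)))
```

The summands add up to log2|B|, which is the chain rule the proof relies on.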

Lemma 1
Lemma 1: For any set B ⊆ ℝ^n of Gaussian measure ≥ 2^{-n/100} and any orthonormal basis x_1,…,x_n, it holds that for 90% of indices k ∈ {1,…,n}, ⟨B,x_k⟩ is close to N(0,1) (even when conditioned on ⟨B,x_1⟩,…,⟨B,x_{k-1}⟩).

Lemma 2
Lemma 2 [Raz’99]: Any set A’ ⊆ S^{n-1} of directions of measure at least 2^{-n/50} contains a set of 1/10-orthogonal vectors x_1,…,x_{n/2} (i.e., the projection of each x_i on the span of x_1,…,x_{i-1} is of length at most 1/10).
Proof: Based on the isoperimetric inequality.
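The definition of 1/10-orthogonality in the lemma can be checked mechanically. The sketch below is only an illustration of the definition, not of Raz's proof; it assumes the inputs are unit vectors and uses Gram-Schmidt to maintain an orthonormal basis of the span of the earlier vectors. The names `dot` and `is_eps_orthogonal` are hypothetical.

```python
from math import sqrt

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def is_eps_orthogonal(vectors, eps=0.1):
    """True if the projection of each vector onto the span of the
    preceding ones has length at most eps (unit vectors assumed)."""
    basis = []  # orthonormal basis of the span of the vectors seen so far
    for v in vectors:
        coeffs = [dot(v, e) for e in basis]
        if sqrt(sum(c * c for c in coeffs)) > eps:
            return False
        # Subtract the projection and normalise to extend the basis.
        residual = [a - sum(c * e[i] for c, e in zip(coeffs, basis))
                    for i, a in enumerate(v)]
        norm = sqrt(dot(residual, residual))
        if norm > 1e-12:
            basis.append([a / norm for a in residual])
    return True

nearly = [(1.0, 0.0, 0.0), (0.05, sqrt(1 - 0.05**2), 0.0), (0.0, 0.0, 1.0)]
print(is_eps_orthogonal(nearly))
```

Exactly orthogonal vectors pass with projection length 0, while the second vector in `nearly` has projection length 0.05 onto the first, which is within the 1/10 tolerance.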

Completing the Proof
Theorem: Let B ⊆ ℝ^n be any set of measure ≥ 2^{-n/100}. Then the projection of B on all but a 2^{-n/50} fraction of directions is distributed like the sum of N(0,1) and an independent r.v.
Proof:
- Let A’ be the set of ‘bad’ directions and assume by contradiction that its measure is ≥ 2^{-n/50}.
- Let x_1,…,x_{n/2} ∈ A’ be the vectors given by Lemma 2.
- If they were orthogonal, then by Lemma 1 there is a k (in fact, most k) s.t. ⟨B,x_k⟩ is close to N(0,1), in contradiction.
- Since they are only 1/10-orthogonal, we obtain that ⟨B,x_k⟩ is distributed like the sum of N(0,1) and an independent r.v., in contradiction.

Open Questions
- Our main technical theorem can be seen as a (weak) symmetric analogue of a result by [Borell’85] (which was used in the proof of the Majority is Stablest Theorem [Mossel O’Donnell Oleszkiewicz’05]).
- Can one prove a tight inequality as done by Borell? Symmetrization techniques do not seem to help...
- Other applications of the technique?