Linear sketching with parities

Linear sketching with parities
Grigory Yaroslavtsev with Sampath Kannan (Penn) and Elchanan Mossel (MIT)

Linear sketching with parities
Input 𝒙∈ 0,1 𝑛 Parity = Linear function over 𝔾 𝐹 2 : ⊕ 𝑖∈𝑆 𝑥 𝑖 E.g. 𝑥 4 ⊕ 𝑥 2 ⊕ 𝑥 42 Deterministic linear sketch: set of 𝒌 parities: ℓ 𝒙 = ⊕ 𝑖 1 ∈ 𝑆 1 𝑥 𝑖 1 ; ⊕ 𝑖 2 ∈ 𝑆 2 𝑥 𝑖 2 ;…; ⊕ 𝑖 𝒌 ∈ 𝑆 𝒌 𝑥 𝑖 𝒌 Randomized linear sketch: distribution over 𝒌 parities (random 𝑆 1 , 𝑆 2 , …, 𝑆 𝒌 ):

Linear sketching over 𝔾 𝐹 2
Given 𝒇 𝒙 : 0,1 𝑛 →{0,1} Question: Can one recover 𝒇(𝒙) from a small (𝒌≪𝑛) linear sketch over 𝔾 𝐹 2 ? Allow randomized computation (99% success) Probability over choice of random sets Sets are known at recovery time Recovery is deterministic (also consider randomized)

Motivation: Distributed Computing
Distributed computation among 𝑴 machines: 𝒙=( 𝒙 𝟏 , 𝒙 𝟐 , …, 𝒙 𝑴 ) (more generally 𝒙= ⊕ 𝑖=1 𝑴 𝒙 𝒊 ) 𝑴 machines can compute sketches locally: ℓ 𝒙 𝟏 , …, ℓ( 𝒙 𝑴 ) Send them to the coordinator who computes: ℓ 𝑖 𝒙 = ℓ 𝑖 𝒙 𝟏 ⊕⋯⊕ ℓ 𝑖 ( 𝒙 𝑴 ) (coordinate-wise XORs) Coordinator computes 𝒇 𝒙 with 𝒌𝑴 communication 1 𝒙 𝒙 𝟏 𝒙 𝟐

Motivation: Streaming
𝒙 generated through a sequence of updates Updates 𝑖 1 ,…, 𝑖 𝑚 : update 𝑖 𝑡 flips bit at position 𝑖 𝑡 𝒙 𝟎 Updates: (1, 3, 8, 3) 𝒙 𝟏 1 𝒙 𝟐 1 𝒙 𝟑 1 𝒙 1 ℓ 𝒙 allows to recover 𝒇(𝒙) with 𝒌 bits of space

Deterministic vs. Randomized
Fact: 𝒇 has a deterministic sketch if and only if 𝒇=𝒈( ⊕ 𝑖 1 ∈ 𝑆 1 𝑥 𝑖 1 ; ⊕ 𝑖 2 ∈ 𝑆 2 𝑥 𝑖 2 ;…; ⊕ 𝑖 𝑘 ∈ 𝑆 𝑘 𝑥 𝑖 𝑘 ) Equivalent to “𝒇 has Fourier dimension 𝒌" Randomization can help: OR: 𝒇 𝒙 = 𝑥 1 ∨⋯∨ 𝑥 𝑛 Has “Fourier dimension” =𝑛 Pick 𝒕= log 1/𝜹 random sets 𝑆 1 ,…, 𝑆 𝒕 If there is 𝑗 such that ⊕ 𝑖∈ 𝑆 𝑗 𝑥 𝑖 =1 output 1, otherwise output 0 Error probability 𝜹

Fourier Analysis 𝒇 𝑥 1 , …, 𝑥 𝑛 : 0,1 𝑛 →{0,1} Notation switch:
𝒇 𝑥 1 , …, 𝑥 𝑛 : 0,1 𝑛 →{0,1} Notation switch: 0→1 1→−1 𝒇 ′ : −1,1 𝑛 →{−1,1} Functions as vectors form a vector space: 𝒇: −1,1 𝑛 →{−1,1}⇔𝒇∈ {−1,1} 2 𝑛 Inner product on functions = “correlation”: 𝒇,𝒈 = 2 −𝑛 𝑥∈ −1,1 𝑛 𝒇 𝑥 𝒈(𝑥) = 𝔼 𝑥∼ −1,1 𝑛 𝒇 𝑥 𝒈 𝑥 𝒇 2 = 𝑓,𝑓 = 𝔼 𝑥∼ −1,1 𝑛 𝒇 2 𝑥 =1 (for Boolean only)

“Main Characters” are Parities
For 𝑺⊆[𝑛] let character 𝝌 𝑺 (𝑥)= 𝑖∈𝑺 𝑥 𝑖 Fact: Every function 𝒇: −1,1 𝑛 →{−1,1} uniquely represented as multilinear polynomial 𝒇 𝑥 1 , …, 𝑥 𝑛 = 𝑺⊆[𝑛] 𝒇 𝑺 𝝌 𝑺 (𝑥) 𝒇 𝑺 a.k.a. Fourier coefficient of 𝒇 on 𝑺 𝒇 𝑺 ≡ 𝒇, 𝝌 𝑺 = 𝔼 𝑥∼ −1,1 𝑛 𝒇 𝑥 𝝌 𝑺 𝑥 𝑺 𝒇 𝑺 2 =1 (Parseval)

Fourier Dimension Fourier sets 𝑆 ≡ vectors in 𝔾 𝐹 2 𝑛
“𝒇 has Fourier dimension 𝒌“ = a 𝒌-dimensional subspace in Fourier domain has all weight 𝑺⊆ 𝐴 𝒌 𝒇 𝑺 2 =1 𝒇 𝑥 1 , …, 𝑥 𝑛 = 𝑺⊆[𝑛] 𝒇 𝑺 𝝌 𝑺 (𝒙) = 𝑺⊆ 𝐴 𝒌 𝒇 𝑺 𝝌 𝑺 (𝒙) Pick a basis 𝑺 𝟏 , …, 𝑺 𝒌 in 𝐴 𝒌 : Sketch: 𝝌 𝑺 𝟏 (𝒙), …, 𝝌 𝑺 𝒌 (𝒙) For every 𝑺∈ 𝐴 𝒌 there exists 𝒁⊆ 𝒌 : 𝑺= ⊕ 𝒊∈𝒁 𝑺 𝒊 𝝌 𝑺 𝒙 = ⊕ 𝒊∈𝒁 𝝌 𝑺 𝒊 (𝒙)

Deterministic Sketching and Noise
Suppose “noise” has a bounded norm 𝒇 = 𝒌-dim. + noise 𝐿 0 -noise in the Fourier domain (via [Sanyal’15]) 𝒇 = 𝒌-dim. + “Fourier 𝐿 0 -noise” 𝑛𝑜𝑖𝑠𝑒 = # non-zero Fourier coefficients of noise (aka “Fourier sparsity”) Linear sketch size: 𝒌+𝑂( 𝑛𝑜𝑖𝑠𝑒 /2 ) Our work: can’t be improved even with randomness and even for uniform 𝑥, e.g for ``addressing function’’.

How Randomization Handles Noise
𝐿 0 -noise in the original domain (hashing a la OR) 𝒇= 𝒌-dim. + “ 𝐿 0 -noise” Linear sketch size: 𝒌 + O(log 𝑛𝑜𝑖𝑠𝑒 0 ) Optimal (but only existentially, i.e. ∃𝒇:…) 𝐿 1 -noise in the Fourier domain (via [Grolmusz’97]) 𝒇 = 𝒌-dim. + “Fourier 𝐿 1 -noise” Linear sketch size: 𝒌+𝑂( 𝑛𝑜𝑖𝑠𝑒 ) Example = 𝒌-dim. + small decision tree / DNF / etc.

Randomized Sketching: Hardness
𝒌 -dimensional affine extractors require 𝒌: 𝒇 is an affine-extractor for dim. 𝒌 if any restriction on a 𝒌-dim. affine subspace has values 0/1 w/prob. ≥0.1 each Example (inner product): 𝒇 𝒙 = ⊕ 𝑖=1 𝑛/2 𝑥 2𝑖−1 𝑥 2𝑖 Not 𝜸-concentrated on 𝒌 -dim. Fourier subspaces For ∀ 𝒌 -dim. Fourier subspace 𝐴 : 𝑆∉𝐴 𝒇 𝑺 2 ≥ 1−𝜸 Any 𝒌 -dim. linear sketch makes error 1− 𝜸 2 Converse doesn’t hold, i.e. concentration is not enough

Randomized Sketching: Hardness
Not 𝜸-concentrated on 𝑜(𝑛)-dim. Fourier subspaces: Almost all symmetric functions, i.e. 𝒇 𝒙 =𝒉( 𝑖 𝑥 𝑖 ) If not Fourier-close to constant or 𝑖=1 𝑛 𝑥 𝑖 E.g. Majority (not an extractor even for O( 𝑛 )) Tribes (balanced DNF) Recursive majority: 𝑀𝑎 𝑗 ∘𝑘 =𝑀𝑎 𝑗 3 ∘𝑀𝑎 𝑗 3 …∘𝑀𝑎 𝑗 3

Approximate Fourier Dimension
Not 𝜸-concentrated on 𝒌 -dim. Fourier subspaces ∀ 𝒌 -dim. Fourier subspace 𝐴: 𝑆∉𝐴 𝒇 𝑺 2 ≥ 1−𝜸 Any 𝒌 -dim. linear sketch makes error ½(1− 𝜸 ) Definition (Approximate Fourier Dimension) dim 𝜸 𝒇 = smallest 𝒅 such that 𝒇 is 𝜸-concentrated on some Fourier subspace of dimension 𝒅 𝒇 ( 𝑆 1 ) 𝒇 ( 𝑆 2 ) 𝒇 ( 𝑆 3 ) 𝒇 ( 𝑆 2 + 𝑆 3 ) 𝒇 ( 𝑆 1 + 𝑆 3 ) 𝒇 ( 𝑆 1 +𝑆 2 + 𝑆 3 ) 𝑆∈𝐴 𝒇 𝑺 2 ≥ 𝜸

Sketching over Uniform Distribution + Approximate Fourier Dimension
Sketching error over uniform distribution of 𝒙. dim 𝝐 𝒇 -dimensional sketch gives error 𝟏−𝝐: Fix dim 𝝐 𝒇 -dimensional 𝐴: 𝑆∈𝐴 𝒇 𝑺 2 ≥ 𝝐 Output: 𝒈 𝑥 =sign 𝑺∈𝐴 𝒇 𝑺 𝝌 𝑺 𝑥 : Pr 𝑥∼𝑈 −1,1 𝑛 𝒈 𝑥 =𝒇 𝑥 ≥𝝐⇒ error 𝟏−𝝐 We show a basic refinement ⇒ error 𝟏−𝝐 𝟐 Pick 𝜽 from a carefully chosen distribution Output: 𝒈 𝜽 𝑥 =sign 𝑺∈𝐴 𝒇 𝑺 𝝌 𝑺 𝑥 −𝜽 -1 +1 1

Sketching over Uniform Distribution
𝕯 𝜹 1,𝑈 (𝒇) = bit-complexity of best compression scheme allowing to compute 𝒇 with err. 𝜹 over uniform 𝒙 Thm: If 𝝐 𝟐 > 𝝐 𝟏 >0, dim 𝝐 𝟏 (𝒇) = dim 𝝐 𝟐 (𝒇) = 𝒅−1 then: 𝕯 𝜹 1,𝑈 (𝒇)≥𝒅, where 𝜹=( 𝝐 𝟐 − 𝝐 𝟏 )/4. Corollary: If 𝒇 (∅)<𝐶 for 𝐶<1 then there exists 𝒅: 𝕯 Θ 1 𝑛 1,𝑈 𝒇 ≥𝒅. Optimal up to error as 𝒅-dim. linear sketch has error 1− 𝝐 𝟐 𝟐 Tight for the Majority function

Application: Random Streams
𝒙∈ 0,1 𝑛 generated via a stream of updates Each update randomly flips a random coordinate Goal: maintain 𝒇 𝒙 during the stream (error 𝝐) Question: how much space necessary? Answer: 𝕯 𝜖 1,𝑈 and best algorithm is linear sketch After first O(𝑛 log⁡𝑛) updates input 𝒙 is uniform Big open question: Is the same true if 𝒙 is not uniform? True for VERY LONG ( Ω 𝑛 ) streams (via [LNW’14]) How about short ones?

1-way Communication Complexity of XOR-functions
Shared randomness Alice: 𝒙 ∈ 0,1 𝑛 Bob: 𝒚 ∈ 0,1 𝑛 𝑀(𝒙) 𝒇(𝒙⊕𝒚) Examples: 𝒇(z) = 𝑂 𝑅 𝑖=1 𝑛 ( 𝑧 𝑖 ) ⇒ (not) Equality 𝒇(z) = ( 𝑧 0 > d) ⇒ Hamming Distance > d 𝑅 𝜖 1 𝒇 = min.|M| so that Bob’s error prob. 𝜖

Communication Complexity of XOR-functions
Well-studied (often for 2-way communication): [Montanaro,Osborne], ArXiv’09 [Shi, Zhang], QIC’09, [Tsang, Wong, Xie, Zhang], FOCS’13 [O’Donnell, Wright, Zhao,Sun,Tan], CCC’14 [Hatami, Hosseini, Lovett], FOCS’16 Connections to log-rank conjecture [Lovett’14]: Even special case for XOR-functions still open

Deterministic 1-way Communication Complexity of XOR-functions
Alice: 𝒙 ∈ 0,1 𝑛 Bob: 𝒚 ∈ 0,1 𝑛 𝑀(𝒙) 𝒇(𝒙⊕𝒚) 𝐷 1 𝒇 = min.|M| so that Bob is always correct [Montanaro-Osborne’09]: 𝐷 1 𝒇 = 𝐷 𝑙𝑖𝑛 𝒇 𝐷 𝑙𝑖𝑛 𝒇 = deterministic lin. sketch complexity of 𝒇 𝐷 1 𝒇 = 𝐷 𝑙𝑖𝑛 𝒇 = “Fourier dimension of 𝒇”

1-way Communication Complexity of XOR-functions
Shared randomness Alice: 𝒙 ∈ 0,1 𝑛 Bob: 𝒚 ∈ 0,1 𝑛 𝑀(𝒙) 𝒇(𝒙⊕𝒚) 𝑅 𝜖 1 𝒇 = min.|M| so that Bob’s error prob. 𝜖 𝑅 𝜖 𝑙𝑖𝑛 𝒇 = rand. lin. sketch complexity (error 𝜖 ) 𝑅 𝜖 1 𝒇 ≤ 𝑅 𝜖 𝑙𝑖𝑛 𝒇 Question: 𝑅 𝜖 1 𝒇 ≈ 𝑅 𝜖 𝑙𝑖𝑛 𝒇 ? (true for symmetric)

Distributional 1-way Communication under Uniform Distribution
Alice: 𝒙 ∼𝑈( 0,1 𝑛 ) Bob: 𝒚 ∼𝑈( 0,1 𝑛 ) 𝑀(𝒙) 𝒇(𝒙⊕𝒚) 𝕯 𝜖 1,𝑈 𝒇 = min.|M| so that Bob’s error prob. 𝜖 is over the uniform distribution over (𝒙,𝒚) Enough to consider deterministic messages only Motivation: streaming/distributed with random input 𝑅 𝜖 1 𝒇 = sup 𝐷 𝕯 𝜖 1,𝐷 𝒇

𝕯 𝜖 1,𝑈 and Approximate Fourier Dimension
Thm: If 𝝐 𝟐 > 𝝐 𝟏 >0, dim 𝝐 𝟏 (𝒇) = dim 𝝐 𝟐 (𝒇) = 𝒅−1 then: 𝕯 𝜹 1,𝑈 (𝒇)≥𝒅, where 𝜹=( 𝝐 𝟐 − 𝝐 𝟏 )/4. 𝒚∈ 𝟎,𝟏 𝒏 𝒇(𝒙⊕𝒚) = 𝒇 𝒙 (𝒚) 𝒇 𝒙 𝟏 𝒇 𝒙 𝟐 00 01 10 11 𝒇 𝒙 𝟑 𝑀(𝒙)= 𝒙∈ 𝟎,𝟏 𝒏

If 𝑴 𝒙 =𝒅−1 average “rectangle” size = 2 𝒏−𝒅+𝟏 A subspace 𝐴 distinguishes 𝒙 𝟏 and 𝒙 𝟐 if: ∃𝑺∈𝐴 : 𝜒 𝑺 𝒙 𝟏 ≠ 𝜒 𝑺 𝒙 𝟐 Fix a 𝒅-dim. subspace 𝐴 𝒅 : typical 𝒙 𝟏 and 𝒙 𝟐 in a typical “rectangle” are distinguished by 𝐴 𝒅 Lem: If a 𝒅-dim. subspace 𝐴 𝒅 distinguishes 𝒙 𝟏 and 𝒙 𝟐 + 1) 𝒇 is 𝝐 𝟐 -concentrated on 𝐴 𝒅 2) 𝒇 not 𝝐 𝟏 -concentrated on any (𝒅−1)-dim. subspace Pr 𝑧∼𝑈( −1,1 𝑛 ) 𝒇 𝒙 𝟏 𝑧 ≠ 𝒇 𝒙 𝟐 𝑧 ≥ 𝝐 𝟐 − 𝝐 𝟏

Thm: If 𝝐 𝟐 > 𝝐 𝟏 >0, dim 𝝐 𝟏 (𝒇) = dim 𝝐 𝟐 (𝒇) = 𝒅−1 then: 𝕯 𝜹 1,𝑈 (𝒇)≥𝒅, Where 𝜹=( 𝝐 𝟐 − 𝝐 𝟏 )/4. Pr 𝑧∼𝑈( −1,1 𝑛 ) 𝒇 𝒙 𝟏 𝑧 ≠ 𝒇 𝒙 𝟐 𝑧 ≥ 𝝐 𝟐 − 𝝐 𝟏 Error for fixed 𝒚 = min( Pr 𝑥∈𝑅 [ 𝒇 𝒙 𝒚 =0], Pr 𝑥∈𝑅 [ 𝒇 𝒙 𝒚 =1]) Average error for (𝒙,𝒚)∈𝑅 = Ω( 𝝐 𝟐 − 𝝐 𝟏 ) 𝒚 𝒈 𝒙 𝟏 1 𝑅=“typical rectangle” 𝒈 𝒙 𝟐

Thanks! Questions? Other stuff:
Sketching Linear Threshold Functions: 𝑂 𝜃 𝑚 log 𝜃 𝑚 Resolves a commuication conjecture of [MO’09] Blog post:

Example: Majority 𝕯 𝑂(1/ 𝑛 ) 1,𝑈 (𝑴𝒂 𝒋 𝒏 )≥𝒏 Majority function:
𝑴𝒂 𝒋 𝒏 𝑧 1 ,…, 𝑧 𝑛 ≡ 𝑖=1 𝑛 𝑧 𝑖 ≥𝑛/2 𝑴𝒂 𝒋 𝒏 𝑺 only depends on 𝑺 𝑴𝒂 𝒋 𝒏 𝑺 =0 if |𝑺| is odd 𝑊 𝒌 𝑴𝒂 𝒋 𝒏 = 𝑺: 𝑺 =𝒌 𝑴𝒂 𝒋 𝒏 𝑺 =𝛼 𝒌 − ±𝑂 1 𝒌 (𝑛−1)-dimensional subspace with most weight: 𝑨 𝒏−𝟏 =𝑠𝑝𝑎𝑛( 1 , 2 ,…,{𝑛−1}) 𝑺∈ 𝑨 𝒏−𝟏 𝑴𝒂 𝒋 𝒏 𝑺 =1− 𝛾 𝑛 ±𝑂 𝑛 −3/2 Set 𝝐 𝟐 =1−𝑂( 𝑛 −3/2 ), 𝝐 𝟏 =1− 𝛾 𝑛 +𝑂 𝑛 −3/2 𝕯 𝑂(1/ 𝑛 ) 1,𝑈 (𝑴𝒂 𝒋 𝒏 )≥𝒏

Linear sketching with parities

Similar presentations

Presentation on theme: "Linear sketching with parities"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Linear sketching with parities

Similar presentations

Presentation on theme: "Linear sketching with parities"— Presentation transcript:

Similar presentations

About project

Feedback