Advances in Linear Sketching over Finite Fields


Advances in Linear Sketching over Finite Fields
Grigory Yaroslavtsev (Indiana University, Bloomington), http://grigory.us
CCC'18, with Sampath Kannan (U. Pennsylvania), Elchanan Mossel (MIT) and Swagato Sanyal (NUS)

𝔽_2-Sketching
- Input: x ∈ {0,1}^n
- Parity = linear function over GF(2): ⊕_{i∈S} x_i
- Deterministic linear sketch: a set of k parities: ℓ(x) = (⊕_{i_1∈S_1} x_{i_1}; ⊕_{i_2∈S_2} x_{i_2}; …; ⊕_{i_k∈S_k} x_{i_k})
- E.g. (x_4 ⊕ x_2 ⊕ x_42; x_239 ⊕ x_30; x_566; …)
- Randomized linear sketch: a distribution over k parities (random S_1, S_2, …, S_k)
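
Below is a minimal Python sketch of this definition (the helper names sample_sketch and apply_sketch are mine, not from the talk): it samples k parity sets and evaluates them on an input. The sets are drawn uniformly at random here, although a randomized sketch may use an arbitrary distribution over (S_1, …, S_k).

```python
import random

def sample_sketch(n, k, seed=0):
    """Sample k parity sets S_1, ..., S_k, each a uniformly random subset of {0, ..., n-1}."""
    rng = random.Random(seed)
    return [frozenset(i for i in range(n) if rng.random() < 0.5) for _ in range(k)]

def apply_sketch(sets, x):
    """Evaluate the k parities on x in {0,1}^n: coordinate j is XOR_{i in S_j} x_i."""
    return tuple(sum(x[i] for i in S) % 2 for S in sets)

sets = sample_sketch(n=10, k=3)
x = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
print(apply_sketch(sets, x))  # a 3-bit sketch of the 10-bit input
```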

Linear sketching over 𝔽_2
- Given f(x): {0,1}^n → {0,1}
- Question: can one recover f(x) from a small (k ≪ n) linear sketch over 𝔽_2?
- Allow randomized computation (99% success):
  - probability is over the choice of the random sets
  - the sets are known at recovery time
  - recovery is deterministic (w.l.o.g.)

Puzzle: Open Problem 78 on Sublinear.info
- Shared randomness; Alice holds x ∈ {0,1}^n, Bob holds y ∈ {0,1}^n
- Alice sends a single message M(x); Bob must output f^+ = f(x ⊕ y)
- Conjecture: the (almost) shortest message is a randomized 𝔽_2-sketch
- https://sublinear.info/index.php?title=Open_Problems:78

Multi-player version
- Shared randomness; player i holds x_i ∈ {0,1}^n, for i = 1, …, N
- Player i sends M_i(x_i, M_{i−1}); the last player outputs f^+ = f(x_1 ⊕ … ⊕ x_N)
- Thm [Hosseini, Lovett, Y.'18; ECCC TR18-169]: for N ≥ 10n, a protocol where each M_i is at most c bits ⇒ ∃ a randomized 𝔽_2-sketch of size O(c)

Multi-player version over ℤ_p
- Shared randomness; player i holds x_i ∈ ℤ_p^n, for i = 1, …, N
- Player i sends M_i(x_i, M_{i−1}); the last player outputs f^+ = f(x_1 + … + x_N)
- Thm [Hosseini, Lovett, Y.'18; ECCC TR18-169]: for N ≥ 10n log p, a protocol where each M_i is at most c bits ⇒ ∃ a randomized ℤ_p-sketch of dimension O(c)
- *Holds even for M_i(x_i, M_1, M_2, …, M_{i−1}) instead of M_i(x_i, M_{i−1})

Motivation: Distributed Computing
- Distributed computation among M machines: x = (x_1, x_2, …, x_M) (more generally x = ⊕_{i=1}^M x_i)
- The M machines compute sketches locally: ℓ(x_1), …, ℓ(x_M)
- They send them to the coordinator, who computes ℓ_i(x) = ℓ_i(x_1) ⊕ ⋯ ⊕ ℓ_i(x_M) (coordinate-wise XORs)
- The coordinator computes f(x) with kM communication
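
The reason this works is linearity over 𝔽_2: XOR-ing the machines' local sketch vectors coordinate-wise yields exactly the sketch of x_1 ⊕ … ⊕ x_M. A small self-contained check of this identity (illustrative code, not from the talk):

```python
import random

rng = random.Random(0)
n, k = 10, 3
sets = [frozenset(i for i in range(n) if rng.random() < 0.5) for _ in range(k)]

def sketch(x):
    """k parity bits of x in {0,1}^n, one per sampled set."""
    return tuple(sum(x[i] for i in S) % 2 for S in sets)

x1 = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
x2 = [0, 1, 1, 0, 0, 1, 1, 0, 1, 0]

# The coordinator XORs the locally computed sketches coordinate-wise ...
combined = tuple(a ^ b for a, b in zip(sketch(x1), sketch(x2)))
# ... which equals the sketch of x1 xor x2, by linearity of parities over F_2.
assert combined == sketch([a ^ b for a, b in zip(x1, x2)])
```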

Motivation: Streaming
- x is generated through a sequence of updates
- Updates i_1, …, i_m: update i_t flips the bit at position i_t
- Example: starting from x = 0^n, the updates (1, 3, 8, 3) leave bits 1 and 8 set (bit 3 is flipped twice)
- ℓ(x) allows recovering f(x) with k bits of space
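
As an illustration (helper names are mine), the k parity bits can be maintained under such updates in O(k) time per update: flipping coordinate i toggles the j-th sketch bit exactly when i ∈ S_j.

```python
import random

rng = random.Random(0)
n, k = 10, 3
sets = [frozenset(i for i in range(n) if rng.random() < 0.5) for _ in range(k)]

def apply_sketch(x):
    return tuple(sum(x[i] for i in S) % 2 for S in sets)

sketch = (0,) * k          # sketch of the all-zeros vector
x = [0] * n                # maintained only to check correctness
for i in (1, 3, 8, 3):     # the update sequence from the slide; bit 3 flips twice
    x[i] ^= 1
    sketch = tuple(b ^ (1 if i in S else 0) for b, S in zip(sketch, sets))

assert sketch == apply_sketch(x)   # the streamed sketch matches a fresh sketch of x
```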

Frequently Asked Questions
- Q: Why 𝔽_2 updates instead of ±1? A: Knowing the sign often doesn't help.
- Q: How to store the random sets? A: Derandomize using Nisan's PRG, at the cost of an extra O(log n) factor in space.
- Q: Some applications? A: Essentially all dynamic graph streaming algorithms can be based on L_0-sampling, and L_0-sampling can be done optimally using 𝔽_2-sketching [Kapralov et al. FOCS'17].
- Q: Why not allow computing f approximately? A: Stay tuned.

Deterministic vs. Randomized
- Fact: f has a deterministic sketch if and only if f = g(⊕_{i_1∈S_1} x_{i_1}; ⊕_{i_2∈S_2} x_{i_2}; …; ⊕_{i_k∈S_k} x_{i_k})
- Equivalent to "f has Fourier dimension k"
- Randomization can help. OR: f(x) = x_1 ∨ ⋯ ∨ x_n has "Fourier dimension" = n, yet:
  - pick t = log(1/δ) random sets S_1, …, S_t
  - if there is a j such that ⊕_{i∈S_j} x_i = 1, output 1; otherwise output 0
  - error probability δ
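
A minimal sketch of this randomized OR construction (function names are mine). If OR(x) = 0 then every parity of x is 0 and the output is always correct; if OR(x) = 1 then each uniformly random parity of x equals 1 with probability 1/2, so all t parities are 0 with probability 2^{-t} = δ.

```python
import math
import random

def or_sketch(n, delta, seed=0):
    """t = ceil(log2(1/delta)) uniformly random parity sets."""
    t = math.ceil(math.log2(1 / delta))
    rng = random.Random(seed)
    return [frozenset(i for i in range(n) if rng.random() < 0.5) for _ in range(t)]

def recover_or(sets, x):
    """Output 1 iff some sampled parity of x equals 1."""
    return int(any(sum(x[i] for i in S) % 2 for S in sets))

sets = or_sketch(n=100, delta=0.01)         # 7 random parities suffice for 1% error
print(recover_or(sets, [0] * 100))          # always 0 on the all-zeros input
print(recover_or(sets, [1] + [0] * 99))     # 1 with probability >= 0.99
```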

Fourier Analysis
- f(x_1, …, x_n): {0,1}^n → {0,1}
- Notation switch: 0 → 1, 1 → −1, giving f′: {−1,1}^n → {−1,1}
- Functions as vectors form a vector space: f: {−1,1}^n → {−1,1} ⇔ f ∈ {−1,1}^{2^n}
- Inner product on functions = "correlation": ⟨f, g⟩ = 2^{−n} Σ_{x∈{−1,1}^n} f(x)g(x) = 𝔼_{x∼{−1,1}^n}[f(x)g(x)]
- ‖f‖_2 = √⟨f, f⟩ = (𝔼_{x∼{−1,1}^n}[f(x)^2])^{1/2} = 1 (for Boolean functions only)

"Main Characters" are Parities
- For S ⊆ [n] let the character be χ_S(x) = ∏_{i∈S} x_i
- Fact: every function f: {−1,1}^n → {−1,1} is uniquely represented as a multilinear polynomial f(x_1, …, x_n) = Σ_{S⊆[n]} f̂(S) χ_S(x)
- f̂(S), a.k.a. the Fourier coefficient of f on S: f̂(S) ≡ ⟨f, χ_S⟩ = 𝔼_{x∼{−1,1}^n}[f(x) χ_S(x)]
- Σ_S f̂(S)^2 = 1 (Parseval)
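
For intuition, the Fourier coefficients of a small Boolean function can be computed by brute force directly from the definition f̂(S) = 𝔼_x[f(x) χ_S(x)]; the snippet below (my own illustration, not from the talk) also checks Parseval.

```python
import math
from itertools import product

def fourier_coefficients(f, n):
    """All 2^n coefficients f_hat(S) = E_{x in {-1,1}^n}[f(x) * chi_S(x)]; S is a 0/1 indicator tuple."""
    points = list(product([-1, 1], repeat=n))
    coeffs = {}
    for S in product([0, 1], repeat=n):
        coeffs[S] = sum(f(x) * math.prod(x[i] for i in range(n) if S[i]) for x in points) / len(points)
    return coeffs

# Example: the parity x_1 * x_2 has all of its Fourier weight on S = {1, 2}.
coeffs = fourier_coefficients(lambda x: x[0] * x[1], n=2)
assert coeffs[(1, 1)] == 1.0
assert abs(sum(c * c for c in coeffs.values()) - 1.0) < 1e-9   # Parseval
```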

Fourier Dimension
- Fourier sets S ≡ vectors in 𝔽_2^n
- "f has Fourier dimension k" = some k-dimensional subspace A_k in the Fourier domain carries all the weight: Σ_{S∈A_k} f̂(S)^2 = 1
- f(x_1, …, x_n) = Σ_{S⊆[n]} f̂(S) χ_S(x) = Σ_{S∈A_k} f̂(S) χ_S(x)
- Pick a basis S_1, …, S_k of A_k; sketch: χ_{S_1}(x), …, χ_{S_k}(x)
- For every S ∈ A_k there exists Z ⊆ [k] with S = ⊕_{i∈Z} S_i, and then χ_S(x) = ∏_{i∈Z} χ_{S_i}(x)
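
The last bullet is why k parities suffice: characters multiply, χ_{S⊕T}(x) = χ_S(x) · χ_T(x) (equivalently, the underlying parities XOR), so the basis values χ_{S_1}(x), …, χ_{S_k}(x) determine χ_S(x) for every S in the span. A tiny brute-force check of this identity:

```python
from itertools import product

def chi(S, x):
    """Character chi_S(x) = product of x_i over i in S, for x in {-1,1}^n."""
    out = 1
    for i in S:
        out *= x[i]
    return out

S, T = {0, 2}, {2, 3}
S_plus_T = S ^ T                    # symmetric difference = sum of the indicator vectors over F_2
for x in product([-1, 1], repeat=4):
    assert chi(S_plus_T, x) == chi(S, x) * chi(T, x)
```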

Deterministic Sketching and Noise
- Suppose the "noise" has bounded norm: f = g ⊕ h = (k-dimensional) ⊕ ("noise")
- Sparse Fourier noise (via [Sanyal'15]): f = (k-dimensional) ⊕ ("Fourier L_0-noise"), where ‖noise‖_0 = # of non-zero Fourier coefficients of the noise (a.k.a. "Fourier sparsity")
- Linear sketch size: k + O(‖noise‖_0^{1/2})
- Our work: this can't be improved, even with randomness and even for uniform x, e.g. for the "addressing function"

How Randomization Handles Noise
- L_0-noise in the original domain (via hashing, a la OR): f = (k-dim.) ⊕ ("L_0-noise"); 𝔽_2-sketch size: k + O(log ‖noise‖_0)
  - optimal, but only existentially, i.e. ∃ f: …
- L_1-noise in the Fourier domain (via [Bruck, Smolensky'92; Grolmusz'97]): f = (k-dim.) ⊕ ("Fourier L_1-noise"); 𝔽_2-sketch size: k + O(‖noise‖_1^2)
- Example: (k-dim.) ⊕ (small decision tree / DNF / etc.)

Randomized Sketching: Hardness
- Suppose f is not γ-concentrated on k-dim. Fourier subspaces: for every k-dim. Fourier subspace A, Σ_{S∉A} f̂(S)^2 ≥ 1 − γ
- Thm. Any k-dim. 𝔽_2-sketch then makes error at least (1 − √γ)/2
- The converse doesn't hold, i.e. concentration is not enough

Randomized Sketching: Hardness
- Not γ-concentrated on o(n)-dim. Fourier subspaces: almost all symmetric functions, i.e. f(x) = h(Σ_i x_i), as long as f is not Fourier-close to a constant or to the parity ⊕_{i=1}^n x_i
- E.g. Majority (not an extractor even for O(√n))
- Tribes (balanced DNF)
- Recursive majority: Maj^{∘k} = Maj_3 ∘ Maj_3 ∘ … ∘ Maj_3

Approximate Fourier Dimension
- Not γ-concentrated on k-dim. Fourier subspaces: ∀ k-dim. Fourier subspace A, Σ_{S∉A} f̂(S)^2 ≥ 1 − γ; any k-dim. linear sketch then makes error ½(1 − √γ)
- Definition (Approximate Fourier Dimension): dim_γ(f) = the smallest d such that f is γ-concentrated on some Fourier subspace A of dimension d, i.e. Σ_{S∈A} f̂(S)^2 ≥ γ
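
For tiny n the definition can be evaluated by brute force: enumerate candidate subspaces of each dimension d, spanned by sets with non-zero coefficients (which is without loss of generality), and return the smallest d whose best subspace carries Fourier weight at least γ. A sketch of that search, with hypothetical function names:

```python
import math
from itertools import combinations, product

def fourier(f, n):
    points = list(product([-1, 1], repeat=n))
    return {S: sum(f(x) * math.prod(x[i] for i in range(n) if S[i]) for x in points) / len(points)
            for S in product([0, 1], repeat=n)}

def span(vectors, n):
    """All F_2-linear combinations of the given 0/1 indicator tuples."""
    out = {(0,) * n}
    for v in vectors:
        out |= {tuple(a ^ b for a, b in zip(u, v)) for u in out}
    return out

def approx_fourier_dim(f, n, gamma):
    """Smallest d such that some d-dimensional Fourier subspace carries weight >= gamma."""
    coeffs = fourier(f, n)
    nonzero = [S for S in coeffs if any(S) and abs(coeffs[S]) > 1e-12]
    for d in range(n + 1):
        for basis in combinations(nonzero, d):
            if sum(coeffs[S] ** 2 for S in span(basis, n)) >= gamma:
                return d
    return n

# A single parity has approximate Fourier dimension 1 for every gamma <= 1.
assert approx_fourier_dim(lambda x: x[0] * x[1], n=3, gamma=1.0) == 1
```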

Sketching over Uniform Distribution + Approximate Fourier Dimension
- Sketching error is measured over the uniform distribution of x
- A dim_ε(f)-dimensional sketch gives error 1 − ε:
  - fix a dim_ε(f)-dimensional A with Σ_{S∈A} f̂(S)^2 ≥ ε
  - output g(x) = sign(Σ_{S∈A} f̂(S) χ_S(x)); then Pr_{x∼U({−1,1}^n)}[g(x) = f(x)] ≥ ε ⇒ error 1 − ε
- We show a basic refinement ⇒ error (1 − ε)/2:
  - pick a threshold θ from a carefully chosen distribution
  - output g_θ(x) = sign(Σ_{S∈A} f̂(S) χ_S(x) − θ)
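
A brute-force illustration of the basic (pre-refinement) predictor on a tiny example of my own: take f = Maj_3 and the 2-dimensional subspace A spanned by the singletons {1} and {2}, which carries Fourier weight ε = 1/2; the sign of the truncated expansion agrees with f on 3/4 ≥ ε of the uniform inputs.

```python
from itertools import product

def maj3(x):
    return 1 if sum(x) > 0 else -1

points = list(product([-1, 1], repeat=3))

# Truncated Fourier expansion over A = span({1}, {2}): Maj3_hat({1}) = Maj3_hat({2}) = 1/2.
def g(x):
    h = 0.5 * x[0] + 0.5 * x[1]
    return 1 if h >= 0 else -1        # break ties towards +1

weight_on_A = 0.5                      # epsilon: Fourier weight of Maj_3 inside A
agreement = sum(g(x) == maj3(x) for x in points) / len(points)
print(agreement, ">=", weight_on_A)    # 0.75 >= 0.5
```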

1-way Communication Complexity of XOR-functions
- Shared randomness; Alice holds x ∈ {0,1}^n, Bob holds y ∈ {0,1}^n
- Alice sends M(x); Bob must output f^+ = f(x ⊕ y)
- Examples:
  - f(z) = OR_{i=1}^n(z_i) ⇒ f^+: (not) Equality
  - f(z) = (‖z‖_0 > d) ⇒ f^+: Hamming Distance > d
- R^1_ε(f^+) = min |M| so that Bob's error probability is ε
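
For concreteness, lifting f to the XOR-function f^+(x, y) = f(x ⊕ y) looks as follows; with f = OR the lifted function is 0 exactly when x = y, i.e. it decides (in)equality. (Illustrative code, not from the talk.)

```python
def lift_xor(f, x, y):
    """f^+(x, y) = f(x xor y) for bit vectors x, y."""
    return f([xi ^ yi for xi, yi in zip(x, y)])

f_or = lambda z: int(any(z))

assert lift_xor(f_or, [0, 1, 1, 0], [0, 1, 1, 0]) == 0   # equal inputs -> 0
assert lift_xor(f_or, [0, 1, 1, 0], [0, 1, 0, 0]) == 1   # unequal inputs -> 1
```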

Communication Complexity of XOR-functions
- Well-studied (often for 2-way communication): [Montanaro, Osborne], arXiv'09; [Shi, Zhang], QIC'09; [Tsang, Wong, Xie, Zhang], FOCS'13; [O'Donnell, Wright, Zhao, Sun, Tan], CCC'14; [Hatami, Hosseini, Lovett], FOCS'16
- Connections to the log-rank conjecture [Lovett'14]: even the special case for XOR-functions is still open

Deterministic 1-way Communication Complexity of XOR-functions
- Alice holds x ∈ {0,1}^n, Bob holds y ∈ {0,1}^n; Alice sends M(x), Bob outputs f^+ = f(x ⊕ y)
- D^1(f^+) = min |M| so that Bob is always correct
- D^{lin}(f) = deterministic 𝔽_2-sketch complexity of f
- [Montanaro, Osborne'09]: D^1(f^+) = D^{lin}(f) = Fourier dimension of f

1-way Communication Complexity of XOR-functions
- Shared randomness; Alice holds x ∈ {0,1}^n, Bob holds y ∈ {0,1}^n; Alice sends M(x), Bob outputs f(x ⊕ y)
- R^1_ε(f^+) = min |M| so that Bob's error probability is ε
- R^{lin}_ε(f) = randomized 𝔽_2-sketch complexity (error ε)
- R^1_ε(f^+) ≤ R^{lin}_ε(f)
- Conjecture: R^1_ε(f^+) ≈ R^{lin}_ε(f)?

R^1_ε(f^+) ≈ R^{lin}_ε(f)?
- As we show, this holds for:
  - Majority, Tribes, recursive majority, the addressing function
  - (almost all) symmetric functions
  - degree-d 𝔽_2-polynomials: R^{lin}_{5ε}(f) = O(d · R^1_ε(f^+))
- The analogous question for 2-way communication is wide open: [HHL'16] Q^{⊕-dt}_ε(f) = poly(R_ε(f^+))?

Distributional 1-way Communication under Uniform Distribution
- Alice holds x ∼ U({0,1}^n), Bob holds y ∼ U({0,1}^n); Alice sends M(x), Bob outputs f(x ⊕ y)
- R^1_ε(f) = sup_D 𝕯^{1,D}_ε(f)
- 𝕯^{1,U}_ε(f) = min |M| so that Bob's error probability ε is over the uniform distribution of (x, y)
- Enough to consider deterministic messages only
- Motivation: streaming/distributed computation with random input

Communication for Uniform Distribution
- Thm: If dim_ε(f) = d − 1 then 𝕯^{1,U}_{(1−√ε)/6}(f^+) ≥ d/6
- Optimal up to constant factors in dimension and error: a d-dim. linear sketch has error (1 − ε)/2

Application: Random Streams
- x ∈ {0,1}^n is generated via a stream of updates; each update flips a random coordinate
- Goal: maintain f(x) during the stream (error probability ε)
- Question: how much space is necessary?
- Answer: 𝕯^{1,U}_ε, and the best algorithm is a linear sketch
  - after the first O(n log n) updates the input x is uniform
- Big open question: is the same true if x is not uniform?
  - true for VERY LONG streams, of length 2^{2^{2^{Ω(n)}}} (via [LNW'14])
  - true for streams of length O(n^2) (via [HLY'18])

Approximate 𝔽_2-Sketching [Y.'17]
- f(x_1, …, x_n): {0,1}^n → ℝ, normalized so that ‖f‖_2 = 1
- Question: can one compute f′ with 𝔼[(f(x) − f′(x))^2] ≤ ε from a small (k ≪ n) linear sketch over 𝔽_2?

Approximate 𝔽_2-Sketching [Y.'17]
- Interesting facts:
  - all results under the uniform distribution generalize directly to approximate sketching
  - L_1-sampling has optimal dependence on the parameters: O(‖f̂‖_1^2 / ε)
  - open: is L_1-sampling optimal for Boolean functions?

Approximate 𝔽_2-Sketching of Valuation Functions [Y., Zhou'18]
- Additive (Σ_{i=1}^n w_i x_i): Θ(min(‖w‖_1^2/ε, n)) (optimal via weighted Gap Hamming)
- Budget-additive (min(b, Σ_{i=1}^n w_i x_i)): Θ(min(‖w‖_1^2/ε, n))
- Coverage: optimal Θ(1/ε) (via L_1-sampling)
- Matroid rank: various results depending on the rank r
- α-Lipschitz submodular functions: Ω(n) communication lower bound for α = Ω(1/n); uses a large family of matroids from [Balcan, Harvey'10]

Thanks! Questions?
Other stuff [Y., Zhou'18]:
- Linear Threshold Functions: Θ(θ_m log θ_m); resolves a communication conjecture of [MO'09]
- Simple neural nets: LTF(ORs), LTF(LTFs)
Blog post: http://grigory.us/blog/the-binary-sketchman

Sketching over Uniform Distribution
- Thm: If dim_ε(f) = d − 1 then 𝕯^{1,U}_{(1−√ε)/6}(f^+) ≥ d/6; optimal up to error, as a d-dim. linear sketch has error (1 − ε)/2
- Weaker: if ε_2 > ε_1 and dim_{ε_1}(f) = dim_{ε_2}(f) = d − 1, then 𝕯^{1,U}_δ(f) ≥ d, where δ = (ε_2 − ε_1)/4
- Corollary: if f̂(∅) < C for some C < 1, then there exists d such that 𝕯^{1,U}_{Θ(1/√n)}(f) ≥ d
- Tight for the Majority function, etc.

𝕯^{1,U}_ε and Approximate Fourier Dimension
- Thm: If ε_2 > ε_1 > 0 and dim_{ε_1}(f) = dim_{ε_2}(f) = d − 1, then 𝕯^{1,U}_δ(f) ≥ d, where δ = (ε_2 − ε_1)/4
- (Proof picture: the message M(x) ∈ {00, 01, 10, 11, …} partitions the inputs x ∈ {0,1}^n into "rectangles"; for each y ∈ {0,1}^n Bob must predict f(x ⊕ y) = f_x(y), and the rows f_{x_1}, f_{x_2}, f_{x_3} within one rectangle may disagree.)

𝕯^{1,U}_ε and Approximate Fourier Dimension
- If |M(x)| = d − 1, the average "rectangle" has size 2^{n−d+1}
- A subspace A distinguishes x_1 and x_2 if ∃ S ∈ A: χ_S(x_1) ≠ χ_S(x_2)
- Lemma 1: fix a d-dim. subspace A_d; typical x_1 and x_2 in a typical "rectangle" are distinguished by A_d
- Lemma 2: if a d-dim. subspace A_d distinguishes x_1 and x_2, and
  1) f is ε_2-concentrated on A_d, and
  2) f is not ε_1-concentrated on any (d − 1)-dim. subspace,
  then Pr_{z∼U({−1,1}^n)}[f_{x_1}(z) ≠ f_{x_2}(z)] ≥ ε_2 − ε_1

𝕯^{1,U}_ε and Approximate Fourier Dimension
- Thm: If ε_2 > ε_1 > 0 and dim_{ε_1}(f) = dim_{ε_2}(f) = d − 1, then 𝕯^{1,U}_δ(f) ≥ d, where δ = (ε_2 − ε_1)/4
- Pr_{z∼U({−1,1}^n)}[f_{x_1}(z) ≠ f_{x_2}(z)] ≥ ε_2 − ε_1
- Error for a fixed y = min(Pr_{x∈R}[f_x(y) = 0], Pr_{x∈R}[f_x(y) = 1]), where R is a "typical rectangle"
- Average error over (x, y) ∈ R = Ω(ε_2 − ε_1)

Example: Majority
- Claim: 𝕯^{1,U}_{O(1/√n)}(Maj_n) ≥ n
- Majority function: Maj_n(z_1, …, z_n) ≡ [Σ_{i=1}^n z_i ≥ n/2]
- The Fourier coefficient M̂aj_n(S) depends only on |S|, and M̂aj_n(S) = 0 when |S| is even
- W^k[Maj_n] = Σ_{S: |S|=k} M̂aj_n(S)^2 = α k^{−3/2} (1 ± O(1/k))
- The (n−1)-dimensional subspace with the most weight: A_{n−1} = span({1}, {2}, …, {n−1})
- Σ_{S∈A_{n−1}} M̂aj_n(S)^2 = 1 − γ/√n ± O(n^{−3/2})
- Set ε_2 = 1 − O(n^{−3/2}) and ε_1 = 1 − γ/√n + O(n^{−3/2}) ⇒ 𝕯^{1,U}_{O(1/√n)}(Maj_n) ≥ n
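
The weight of Maj_n inside A_{n−1}, i.e. on sets avoiding coordinate n, can be checked by brute force for small odd n (an illustration with my own helper names); the printed values, 0.5 for n = 3 and 0.625 for n = 5, grow towards 1 with the missing weight shrinking like Θ(1/√n), consistent with the calculation above.

```python
import math
from itertools import product

def maj(x):
    return 1 if sum(x) > 0 else -1

def weight_inside(n):
    """Fourier weight of Maj_n on sets S contained in {1, ..., n-1} (coordinate n excluded)."""
    points = list(product([-1, 1], repeat=n))
    total = 0.0
    for S in product([0, 1], repeat=n - 1):        # subsets of the first n-1 coordinates
        S_full = S + (0,)
        coeff = sum(maj(x) * math.prod(x[i] for i in range(n) if S_full[i]) for x in points) / len(points)
        total += coeff * coeff
    return total

for n in (3, 5, 7):
    print(n, weight_inside(n))   # 0.5 for n=3, 0.625 for n=5; the rest of the weight is outside A_{n-1}
```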