1
Streaming Symmetric Norms via Measure Concentration
Robert Krauthgamer, Weizmann Institute of Science
Joint work with: Jaroslaw Blasiok, Vladimir Braverman, Stephen R. Chestnut, and Lin F. Yang
Weizmann, January 2018
2
Data-Stream Model
Today's massive data sets require ever more efficient algorithms.
Streaming algorithms (aka the data-stream model):
- Input is a stream of tokens
- Typical model: one pass, low memory (called storage/space), randomized
3
Frequency-Vector Model
Maintain a vector $v \in \mathbb{R}^n$ (initially zero)
Each input token is an "update": token $(j, a)$ represents the update $v_j \leftarrow v_j + a$
Goal: at the "end" of the stream, estimate $f(v)$
Typically $f : \mathbb{R}^n \to \mathbb{R}$ is a norm, like the $\ell_p$-norm for $0 \le p \le \infty$
Standard assumptions:
- Integral updates, usually just $a \in \{-1, +1\}$
- Stream length is $m \le \mathrm{poly}(n)$, and every $|v_j| \le \mathrm{poly}(n)$
Why is it called "frequency"? It models the following important scenario:
- The stream is a sequence of items from a large domain $[n]$ (e.g., IP addresses)
- Coordinate $v_j$ counts the frequency of item $j$
Known as the turnstile model: insertions and deletions are allowed, even a deficit (negative counts)
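To fix ideas, here is a minimal (non-streaming) sketch of the frequency-vector / turnstile model in Python; the function names are illustrative, and the vector is stored explicitly, which a real streaming algorithm would of course avoid.

```python
import numpy as np

def run_stream(n, updates, f):
    """Maintain v in R^n under turnstile updates (j, a) and return f(v).

    This naive version stores v explicitly; a streaming algorithm would
    replace v with a small sketch and return only an approximation of f(v).
    """
    v = np.zeros(n)
    for j, a in updates:          # token (j, a) means v[j] += a
        v[j] += a
    return f(v)

# Example: items from domain [n] with insertions (+1) and deletions (-1)
updates = [(3, +1), (7, +1), (3, +1), (7, -1)]
print(run_stream(n=10, updates=updates, f=lambda v: np.linalg.norm(v, 1)))  # l_1 norm = 2
```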
4
Which Functions/Norms?
Fantastic progress on $\ell_p$-norms:
Goal: $(1+\epsilon)$-approximation of $f(v) = \left(\sum_j |v_j|^p\right)^{1/p}$
Essentially tight storage bounds for every $p$!
- $0 \le p \le 2$: logarithmic storage $O(\log n)$ [Alon-Matias-Szegedy'96, Indyk'00, ...]
- $2 < p \le \infty$: polynomial storage $\tilde{O}(n^{1-2/p})$ [Indyk-Woodruff'05, Bar-Yossef-Jayram-Kumar-Sivakumar'04, ...]
Has led to many other results:
- Other functions, like entropy, cascaded norms, additive functions
- Other problems, like $\ell_p$-heavy hitters and $\ell_p$-sampling
- Powerful techniques, like linear sketching, embeddings, heavy hitters, precision sampling, even algorithms for sparse recovery
- Communication complexity of indexing, set disjointness, gap Hamming
Too many references to do justice here!
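As one concrete illustration of the logarithmic-storage regime, here is a rough AMS / Tug-of-War style $\ell_2$ sketch (not this talk's algorithm; the parameters and the explicit sign matrix are simplifications for readability):

```python
import numpy as np

def ams_l2_sketch(n, updates, k=64, seed=0):
    """Rough AMS-style estimator for ||v||_2 under turnstile updates.

    Keeps k counters z[i] = <s_i, v> with random sign vectors s_i in {-1,+1}^n,
    so E[z[i]^2] = ||v||_2^2; averaging (or a median of means) gives an
    approximation.  A genuinely low-space implementation would derive the signs
    from 4-wise independent hashing instead of storing them.
    """
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 1.0], size=(k, n))  # illustrative; not truly low-space
    z = np.zeros(k)
    for j, a in updates:                          # linear sketch: update each counter
        z += a * signs[:, j]
    return np.sqrt(np.mean(z ** 2))

updates = [(0, 3), (5, -4)]                       # v = (3, 0, ..., -4, ...); ||v||_2 = 5
print(ams_l2_sketch(n=10, updates=updates))       # ≈ 5
```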
5
Outstanding Questions
Q1: Which norms can be computed in the streaming model?
- Key examples: Earthmover distance, matrix norms (on $\mathbb{R}^{n \times n}$)
Q1': Which distances can be computed in the sketching model?
- Sketch = summary of the input (e.g., the memory image of a streaming algorithm at an "intermediate time")
Q2: Is there a universal sketching scheme for "all" norms?
[See list of open problems #30 and #5]
6
Broader Characterization of Norms?
We need a generic framework or tools!
Embedding = low-distortion mapping of a "new" norm into a "known" one
- Embeddings are "universal" for sketching norms [Andoni-K.-Razenshteyn'15]
- But they are difficult to construct (and should be linear and explicit)
- And the "loss" is in approximation quality (not in storage)
Effective techniques for "additive" functions $f(v) = \sum_j \varphi(v_j)$:
- Heavy hitters = coordinates with "large" contribution to the norm
- Hierarchical subsampling = virtual stream on a subset of the coordinates
We characterize all symmetric norms
7
Symmetric Norms
Definition: A norm $\|\cdot\| : \mathbb{R}^n \to \mathbb{R}$ is called symmetric if
- $\|x\|$ is invariant under coordinate permutations
- $\|x\|$ is invariant under sign flips
This implies monotonicity: $\forall i,\ |x_i| \le |y_i| \Rightarrow \|x\| \le \|y\|$
Examples:
- $\ell_p$-norms
- top-$k$ norm, defined as $\Phi_k(x) = |x_{(1)}| + \dots + |x_{(k)}|$ (the $k$ largest magnitudes)
- $k$-support norm
Non-examples:
- cascaded norms
- matrix operator norms
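A quick numerical check of the definition, using the top-$k$ norm as an example (illustrative code, not part of the talk):

```python
import numpy as np

def top_k_norm(x, k):
    """Top-k norm: sum of the k largest coordinates in absolute value."""
    return np.sort(np.abs(x))[::-1][:k].sum()

x = np.array([3.0, -1.0, 4.0, 1.0, -5.0])
perm = np.random.permutation(len(x))
signs = np.random.choice([-1, 1], size=len(x))

# Symmetric: invariant under permuting coordinates and flipping signs
print(top_k_norm(x, 2), top_k_norm(signs * x[perm], 2))  # both 9.0 (= 5 + 4)
```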
8
Modulus of Concentration
$b = \sup\{\|x\| : x \in S^{n-1}\}$
$M = \mathrm{median}\{\|x\| : x \in S^{n-1}\}$ (over a uniformly random $x$ on the unit sphere)
$\mathrm{mc} := b / M$
Theorem [from Levy's Lemma]: If $f : S^{n-1} \to \mathbb{R}$ is $b$-Lipschitz with median $M$, then
- $\Pr_{x \in S^{n-1}}\left[f(x) \notin (M \pm 2b/\sqrt{n})\right] \le 1/3$
- $\forall \epsilon > 0,\ \Pr_{x \in S^{n-1}}\left[f(x) \notin (1 \pm \epsilon)M\right] \le \sqrt{\pi/2}\, e^{-(M/b)^2 \epsilon^2 n / 2}$
Theorem [Dvoretzky, Milman]: For every norm $\|\cdot\|$ on $\mathbb{R}^n$ and $\epsilon > 0$ there is a linear subspace $S$ of dimension $\ge c_\epsilon\, n / \mathrm{mc}^2$, equipped with a Euclidean norm $\|\cdot\|_2'$, such that
$\forall x \in S,\ \|x\|_2' \le \|x\| \le (1+\epsilon)\|x\|_2'$
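The quantities $b$, $M$, and mc can also be estimated numerically; the following Monte Carlo sketch (illustrative, with the analytic $b$ supplied by hand) reproduces the behavior shown on the next slide:

```python
import numpy as np

def modulus_of_concentration(norm, n, b, samples=500, seed=0):
    """Estimate mc = b / M, where M is the median of norm(x) over uniform x on S^{n-1}.

    b (the maximum of the norm over the sphere) is passed in analytically.
    """
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((samples, n))
    x = g / np.linalg.norm(g, axis=1, keepdims=True)      # uniform points on the unit sphere
    M = np.median([norm(row) for row in x])
    return b / M

n = 2_000
# l_1: b = sqrt(n), M ≈ sqrt(2n/pi)  =>  mc = Theta(1)
print(modulus_of_concentration(lambda v: np.abs(v).sum(), n, b=np.sqrt(n)))
# l_inf: b = 1, M ≈ sqrt(2 ln(n) / n)  =>  mc = Theta(sqrt(n / log n))
print(modulus_of_concentration(lambda v: np.abs(v).max(), n, b=1.0))
```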
9
Examples
(Annotations on the slide: for $\ell_1$ a random vector nearly hits the maximum of the norm; for $\ell_p$, $p > 2$, it nearly hits the minimum.)
- $\ell_1$: $b = \sqrt{n}$, $M \approx \sqrt{n}$, so $\mathrm{mc}_{\ell_1} \approx 1$; streaming complexity $\tilde{\Theta}(1)$
- $\ell_2$: $b = 1$, $M = 1$, so $\mathrm{mc}_{\ell_2} = 1$; streaming complexity $\tilde{\Theta}(1)$
- $\ell_p$, $p > 2$: $b = 1$, $M \approx n^{1/p - 1/2}$, so $\mathrm{mc}_{\ell_p} \approx n^{1/2 - 1/p}$; streaming complexity $\Theta(n^{1-2/p})$
10
Wild Guess
Is $\Theta(\mathrm{mc}^2)$ the optimal space complexity for stream-computing every symmetric norm?
Reminder:
- For $\ell_1$: $b = \sqrt{n}$, $M = \Theta(\sqrt{n})$ $\Rightarrow$ $\mathrm{mc} = O(1)$
- For $\ell_\infty$: $b = 1$, $M = \Theta(\sqrt{\log n / n})$ $\Rightarrow$ $\mathrm{mc} = O(\sqrt{n / \log n})$
No! Define $\|\cdot\| = \max\left\{\|\cdot\|_\infty,\ \tfrac{1}{\sqrt{n}}\|\cdot\|_1\right\}$
- $b = 1$, $M = \Theta(1)$ $\Rightarrow$ $\mathrm{mc} = O(1)$
- But it contains a copy of $\ell_\infty$ of dimension $\sqrt{n}$, thus the space is $\Omega(\sqrt{n})$.
Lesson: subspaces can be an obstacle!
11
Main Result
Maximum modulus of concentration:
$\mathrm{mmc} := \max_{k \le n} \dfrac{b^{(k)}}{M^{(k)}}$, where
$b^{(k)} = \sup\{\|(x, 0, \dots, 0)\| : x \in S^{k-1}\}$ and $M^{(k)} = \mathrm{median}\{\|(x, 0, \dots, 0)\| : x \in S^{k-1}\}$
Theorem: Let $\|\cdot\|$ be a symmetric norm on $\mathbb{R}^n$. Then
- There is a 1-pass algorithm that $(1+\epsilon)$-approximates the norm using $\mathrm{mmc}^2 \cdot \mathrm{poly}(\tfrac{1}{\epsilon}\log n)$ bits of storage.
- Moreover, every such algorithm requires $\Omega(\mathrm{mmc}^2)$ bits of storage.
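A hedged numerical sketch of mmc (Monte Carlo over a geometric grid of dimensions $k$; the helper names and the analytic $b^{(k)}$ for the top-$k$ norm are illustrative assumptions, not part of the talk):

```python
import numpy as np

def mmc_estimate(norm, n, b_of_k, ks=None, samples=300, seed=0):
    """Estimate mmc = max_{k<=n} b^(k) / M^(k) for a symmetric norm on R^n.

    norm acts on length-n vectors; a k-dimensional unit vector is padded with
    zeros.  b_of_k(k) supplies the analytic maximum b^(k); M^(k) is estimated
    by Monte Carlo as the median of the norm over random points of S^{k-1}.
    """
    rng = np.random.default_rng(seed)
    if ks is None:
        ks = [2 ** i for i in range(1, int(np.log2(n)) + 1)]  # geometric grid of k's
    best = 0.0
    for k in ks:
        g = rng.standard_normal((samples, k))
        x = g / np.linalg.norm(g, axis=1, keepdims=True)
        padded = np.zeros((samples, n))
        padded[:, :k] = x
        M_k = np.median([norm(row) for row in padded])
        best = max(best, b_of_k(k) / M_k)
    return best

n, k_top = 1024, 16
top_k = lambda v: np.sort(np.abs(v))[::-1][:k_top].sum()
# For the top-k norm, b^(j) = sqrt(min(j, k)); the estimate should be
# roughly sqrt(n/k), up to logarithmic factors.
print(mmc_estimate(top_k, n, b_of_k=lambda j: np.sqrt(min(j, k_top))))
```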
12
Two Examples
Top-$k$ norm: $\Phi_k(x) = |x_{(1)}| + \dots + |x_{(k)}|$, with $\mathrm{mmc} = \Theta(\sqrt{n/k})$
- Like in sparse recovery or PCA (picking the signal / top eigenvalues)
$k$-support norm: unit ball $B_k = \mathrm{conv}\{x \in \mathbb{R}^n : |\mathrm{supp}(x)| \le k,\ \|x\|_2 \le 1\}$, with $\mathrm{mmc} = O(\log n)$
- Has been used in machine learning
13
Algorithmic Outline
Inspired by [Indyk-Woodruff'05]
Analysis:
0. Assume wlog that the coordinates are nonnegative and sorted
1. Round the coordinates to powers of $1+\epsilon$, called levels
2. Forget some levels (of low contribution)
3. Estimate the size of the remaining levels
Here, the "storage" will be governed by mmc
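To make the outline concrete, here is an offline (non-streaming) sketch of steps 1-3 for a symmetric norm; the real algorithm performs step 3 with small storage via heavy hitters, and the function names and thresholds below are illustrative:

```python
import numpy as np
from collections import Counter

def rounded_levels(v, eps):
    """Step 1: round |v_j| down to powers of (1+eps); return {level i: count b_i}."""
    w = np.abs(v)
    w = w[w > 0]
    levels = np.floor(np.log(w) / np.log(1 + eps)).astype(int)
    return Counter(levels)

def approximate_norm(v, norm, eps=0.1, beta=0.001):
    """Offline version of steps 0-3 for a symmetric norm (illustration only)."""
    n = len(v)
    counts = rounded_levels(v, eps)

    def vec_from(level_counts):
        # A symmetric norm depends only on the multiset of magnitudes, so any placement works.
        vals = sorted((x for i, c in level_counts.items() for x in [(1 + eps) ** i] * c),
                      reverse=True)
        out = np.zeros(n)
        out[:len(vals)] = vals
        return out

    V = vec_from(counts)                                   # the rounded vector V
    kept = {i: c for i, c in counts.items()
            if norm(vec_from({i: c})) >= beta * norm(V)}   # step 2: keep beta-contributing levels
    return norm(vec_from(kept))                            # step 3: norm of the surviving levels

v = np.random.default_rng(0).standard_normal(1000)
print(approximate_norm(v, lambda x: np.linalg.norm(x, 1)), np.linalg.norm(v, 1))
```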
14
1. Round the Coordinates
Let $V \in \mathbb{R}^n$ be the vector $v$ after rounding
By monotonicity, $\|v\|$ is changed by a factor of at most $1+\epsilon$
15
2. Forget Some Levels
Let $V_i \in \mathbb{R}^n$ have the level-$i$ coordinates of $V$ (i.e., zero out all other coordinates)
Definition: Level $i$ is $\beta$-contributing if $\|V_i\| \ge \beta \|V\|$
Let $V' \in \mathbb{R}^n$ have all $\beta$-contributing levels of $V$ (i.e., zero out every non-contributing level)
Lemma: $\|V'\| \ge (1 - \beta \log_{1+\epsilon} n)\, \|V\|$
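The lemma follows from the triangle inequality, since there are at most $\log_{1+\epsilon} n$ levels (a one-line derivation, filling in the step):
$$\|V\| \;\le\; \|V'\| + \sum_{\text{non-contributing } i} \|V_i\| \;\le\; \|V'\| + \beta \log_{1+\epsilon}(n)\, \|V\|.$$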
16
2½. Analysis of Medians
Let $\mathbf{1}^{(n')}$ be the vector with $n'$ ones (padded with zeros).
Lemma 1 (Flat Median): For all $1 \le n' \le n$, $\tfrac{1}{\sqrt{n'}}\|\mathbf{1}^{(n')}\| \simeq M^{(n')}$
- Proof uses Levy's Lemma (measure concentration on the sphere)
Lemma 2 (Median Monotonicity): For all $1 \le n' \le n'' \le n$, $\tfrac{1}{\sqrt{n'}}\|\mathbf{1}^{(n')}\| \lesssim \mathrm{mmc} \cdot \tfrac{1}{\sqrt{n''}}\|\mathbf{1}^{(n'')}\|$
- Proof: $\tfrac{1}{\sqrt{n'}}\|\mathbf{1}^{(n')}\| \le b^{(n'')} \le \mathrm{mmc} \cdot M^{(n'')} \simeq \mathrm{mmc} \cdot \tfrac{1}{\sqrt{n''}}\|\mathbf{1}^{(n'')}\|$
17
3. Estimate Contributing Levels
Let $b_i$ denote the cardinality of level $i$
Lemma 3 (Important Levels): If level $i$ is $\beta$-contributing then
- $b_i (1+\epsilon)^{2i} \gtrsim \dfrac{\beta^2}{\mathrm{mmc}^2} \sum_{j<i} b_j (1+\epsilon)^{2j}$ (compares $V_i$ vs. $V_{<i}$)
- $b_i \gtrsim \dfrac{\beta^2}{\mathrm{mmc}^2} \sum_{j>i} b_j$ (compares sizes of levels)
- Proof relies on our analysis of medians
Estimate $b_i$ of all important levels by the approach of [IW05] ($\ell_2$ heavy hitters via CountSketch)
Lemma 4: Given estimates $(1-\epsilon) b_i \le \hat{b}_i \le b_i$, the corresponding $\hat{V} \in \mathbb{R}^n$ satisfies $(1-\epsilon)\|V_i\| \le \|\hat{V}_i\| \le \|V_i\|$
- Proof relies on the norm being symmetric
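For reference, a minimal CountSketch (the $\ell_2$ heavy hitters primitive mentioned above); the table dimensions and the explicitly tabulated hash functions are simplifications, not the paper's exact parameters:

```python
import numpy as np

class CountSketch:
    """Minimal CountSketch: estimates coordinates of v (l_2 heavy hitters) from a small table."""

    def __init__(self, rows=5, cols=256, n=10**4, seed=0):
        rng = np.random.default_rng(seed)
        self.table = np.zeros((rows, cols))
        # A real implementation would use pairwise-independent hash functions;
        # for illustration we tabulate random buckets/signs for the whole domain.
        self.bucket = rng.integers(0, cols, size=(rows, n))
        self.sign = rng.choice([-1, 1], size=(rows, n))

    def update(self, j, a):                       # stream token (j, a): v[j] += a
        for r in range(self.table.shape[0]):
            self.table[r, self.bucket[r, j]] += self.sign[r, j] * a

    def estimate(self, j):                        # median over rows of signed bucket values
        return np.median([self.sign[r, j] * self.table[r, self.bucket[r, j]]
                          for r in range(self.table.shape[0])])

cs = CountSketch(n=10**4)
for j, a in [(7, 100), (7, 50), (42, -3)] + [(i, 1) for i in range(100, 1100)]:
    cs.update(j, a)
print(cs.estimate(7))   # ≈ 150: a heavy coordinate stands out above the noise
```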
18
The Lower Bound
Communication complexity of Set Disjointness: $t$ players, player $i$ holds a set $S_i \subset [n]$
[Figure: the sets $S_1, S_2, S_3, \dots, S_t$ drawn as 0/1 columns; Case 1: unique intersection]
19
The Lower Bound
Communication complexity of Set Disjointness: $t$ players
[Figure: the same sets $S_1, \dots, S_t$; Case 2: no intersection]
20
The Lower Bound
Communication complexity of Set Disjointness: $t$ players
Case 1: unique intersection; Case 2: no intersection
One-way communication; player $t$ outputs the decision
Every randomized protocol must communicate a total of $\Omega(n/t)$ bits [Chakrabarti-Khot-Sun'03, Gronemeier'09]
21
Streaming Algorithm yields a Protocol
[Figure: each player runs $A$ on her part of the stream and forwards the memory content $M(A)$]
The protocol:
- Map each $S_i$ to a stream of updates $f(S_i)$
- Player 1 runs the streaming algorithm $A$ on $f(S_1)$ and passes the memory content $M(A)$ to the next player
- Each player $i$ continues the execution of $A$ on her $f(S_i)$, etc.
- The last player outputs a result based on the output of $A$
If the players succeed whp, then the total communication is $\Omega(n/t)$
- Thus, at least one message has size $\Omega(n/t^2)$
- Thus, $A$ requires storage $\Omega(n/t^2)$
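A toy simulation (purely illustrative; the class and function names are not from the paper) of how a streaming algorithm with a serializable memory image yields a one-way protocol, with the largest forwarded message standing in for the algorithm's storage:

```python
import pickle
import numpy as np

class ExactL1:
    """Toy 'streaming algorithm': stores v exactly, so its memory image is large."""
    def __init__(self, n):
        self.v = np.zeros(n)
    def update(self, j, a):
        self.v[j] += a
    def output(self):
        return np.abs(self.v).sum()

def one_way_protocol(streams, algo):
    """Player 1 runs A on its stream, then the memory image M(A) is handed
    player-to-player; the last player announces A's output.  Returns
    (answer, largest message size), the latter being a rough proxy for storage
    (shared randomness would not be counted in a real protocol)."""
    max_message = 0
    for i, stream in enumerate(streams):
        for token in stream:
            algo.update(*token)
        if i < len(streams) - 1:                 # "send" the memory content M(A)
            blob = pickle.dumps(algo)
            max_message = max(max_message, len(blob))
            algo = pickle.loads(blob)            # next player resumes from M(A)
    return algo.output(), max_message

streams = [[(0, 1), (3, 1)], [(3, -1)], [(7, 2)]]   # three players' update streams
print(one_way_protocol(streams, ExactL1(n=10)))     # (3.0, message size in bytes)
```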
22
The Reduction
Given a symmetric norm $\|\cdot\|$:
- Fix a "bad" $v \in S^{n-1}$ that attains the maximum, $\|v\| = b$
- Set the number of players $t = \dfrac{\sqrt{n}}{\mathrm{mmc}} \approx \dfrac{\sqrt{n}\, M}{b}$
- The players have $n^2$ shared values $Z_{i,j} \sim N(0,1)$ for $i, j \in [n]$ (for intuition, think $Z_{i,j} \in \{\pm 1\}$)
- The players implicitly agree on $n$ vectors:
  $V_1 = (v_1 Z_{11},\ v_2 Z_{12},\ \dots,\ v_n Z_{1n}) \in \mathbb{R}^n$
  $V_2 = (v_2 Z_{21},\ v_3 Z_{22},\ \dots,\ v_1 Z_{2n}) \in \mathbb{R}^n$
  $\dots$
  $V_n = (v_n Z_{n1},\ v_1 Z_{n2},\ \dots,\ v_{n-1} Z_{nn}) \in \mathbb{R}^n$
- Each player $i$ adds to the stream all vectors $V_j$ with $j \in S_i$
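A small sketch of how the players could build this hard instance from shared randomness (illustrative; in Case 1 the intersecting index is streamed $t$ times, which is what boosts the norm in the analysis on the next slide):

```python
import numpy as np

def hard_instance(v, sets, seed=0):
    """Build the reduction's vectors: V_i has the coordinates of v cyclically
    shifted and multiplied by shared Gaussians Z[i, j]; player i streams every
    V_j with j in S_i.  Returns the final summed vector (what A effectively sees)."""
    n = len(v)
    rng = np.random.default_rng(seed)            # the players' shared randomness
    Z = rng.standard_normal((n, n))
    V = np.array([np.roll(v, -i) * Z[i] for i in range(n)])   # V_i = (v_{i+1} Z_{i,1}, ...)
    total = np.zeros(n)
    for S_i in sets:                             # each player adds its vectors to the stream
        for j in S_i:
            total += V[j]
    return total

v = np.ones(16) / 4.0                            # a unit vector (a "bad" v would attain ||v|| = b)
sets_disjoint = [{0, 1}, {2, 3}, {4, 5}]         # Case 2: no intersection
sets_unique   = [{0, 9}, {2, 9}, {4, 9}]         # Case 1: unique intersection at k = 9
print(np.abs(hard_instance(v, sets_disjoint)).sum(), np.abs(hard_instance(v, sets_unique)).sum())
```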
23
Analysis
If there is no intersection, the final (total) vector is
$U = \sum_{j \in S_1 \cup S_2 \cup \dots \cup S_t} V_j$, essentially a random vector!
If there is a unique intersection, say at $k \in [n]$, the final vector is
$W = \sum_{j \in (S_1 \cup S_2 \cup \dots \cup S_t) \setminus \{k\}} V_j + t\, V_k$
Lemma: with constant probability,
- $\|U\| \le 40 \sqrt{n}\, M$ (because each entry has magnitude $O(1)$)
- $\|W\| \ge 60 \sqrt{n}\, M$ (because $\|t V_k\| \gtrsim t\, b = \sqrt{n}\, M$)
24
Concluding Remarks
Extensions:
- Tight tradeoff between storage (space) and accuracy (approximation)
Further directions:
- Simpler algorithm?
- Arbitrary norms? Matrix norms? Several papers recently...
- Reductions between problems?
Thank You!