The Data Stream Space Complexity of Cascaded Norms T.S. Jayram David Woodruff IBM Almaden.

Similar presentations
Estimating Distinct Elements, Optimally

Rectangle-Efficient Aggregation in Spatial Data Streams Srikanta Tirthapura David Woodruff Iowa State IBM Almaden.
Fast Moment Estimation in Data Streams in Optimal Space Daniel Kane, Jelani Nelson, Ely Porat, David Woodruff Harvard MIT Bar-Ilan IBM.
Optimal Approximations of the Frequency Moments of Data Streams Piotr Indyk David Woodruff.
1+eps-Approximate Sparse Recovery Eric Price MIT David Woodruff IBM Almaden.
Tight Bounds for Distributed Functional Monitoring David Woodruff IBM Almaden Qin Zhang Aarhus University MADALGO Based on a paper in STOC, 2012.
Tight Bounds for Distributed Functional Monitoring David Woodruff IBM Almaden Qin Zhang Aarhus University MADALGO.
Optimal Space Lower Bounds for All Frequency Moments David Woodruff MIT
Lower Bounds on Streaming Algorithms for Approximating the Length of the Longest Increasing Subsequence. Anna Gal (UT Austin), Parikshit Gopalan (U. Washington).
Estimating the Sortedness of a Data Stream. Parikshit Gopalan (UT Austin), T. S. Jayram (IBM Almaden), Robert Krauthgamer (IBM Almaden), Ravi Kumar (Yahoo! Research).
Numerical Linear Algebra in the Streaming Model Ken Clarkson - IBM David Woodruff - IBM.
Optimal Space Lower Bounds for all Frequency Moments David Woodruff Based on SODA 04 paper.
The Average Case Complexity of Counting Distinct Elements David Woodruff IBM Almaden.
An Optimal Algorithm for the Distinct Elements Problem
Optimal Bounds for Johnson-Lindenstrauss Transforms and Streaming Problems with Sub-Constant Error. T.S. Jayram, David Woodruff, IBM Almaden.
Numerical Linear Algebra in the Streaming Model
Sublinear-time Algorithms for Machine Learning. Ken Clarkson (IBM Almaden), Elad Hazan (Technion), David Woodruff (IBM Almaden).
Xiaoming Sun Tsinghua University David Woodruff MIT
Tight Lower Bounds for the Distinct Elements Problem David Woodruff MIT Joint work with Piotr Indyk.
Subspace Embeddings for the L1 norm with Applications Christian Sohler David Woodruff TU Dortmund IBM Almaden.
Data Stream Algorithms Frequency Moments
Why Simple Hash Functions Work : Exploiting the Entropy in a Data Stream Michael Mitzenmacher Salil Vadhan And improvements with Kai-Min Chung.
Efficient Algorithms via Precision Sampling Robert Krauthgamer (Weizmann Institute) joint work with: Alexandr Andoni (Microsoft Research) Krzysztof Onak.
Distributional Property Estimation Past, Present, and Future Gregory Valiant (Joint work w. Paul Valiant)
Vladimir(Vova) Braverman UCLA Joint work with Rafail Ostrovsky.
ABSTRACT We consider the problem of computing information theoretic functions such as entropy on a data stream, using sublinear space. Our first result.
Sketching for M-Estimators: A Unified Approach to Robust Regression
Turnstile Streaming Algorithms Might as Well Be Linear Sketches Yi Li Huy L. Nguyen David Woodruff.
1 Algorithms for Large Data Sets Ziv Bar-Yossef Lecture 12 June 18, 2006
Sketching for M-Estimators: A Unified Approach to Robust Regression Kenneth Clarkson David Woodruff IBM Almaden.
Sketching and Embedding are Equivalent for Norms Alexandr Andoni (Simons Inst. / Columbia) Robert Krauthgamer (Weizmann Inst.) Ilya Razenshteyn (MIT, now.
1 Algorithms for Large Data Sets Ziv Bar-Yossef Lecture 13 June 22, 2005
Estimating Entropy for Data Streams Khanh Do Ba, Dartmouth College Advisor: S. Muthu Muthukrishnan.
Tight Bounds for Graph Problems in Insertion Streams Xiaoming Sun and David P. Woodruff Chinese Academy of Sciences and IBM Research-Almaden.
Finding Frequent Items in Data Streams [Charikar-Chen-Farach-Colton] Paper report By MH, 2004/12/17.
Information Complexity Lower Bounds for Data Streams David Woodruff IBM Almaden.
Streaming Algorithms Piotr Indyk MIT. Data Streams A data stream is a sequence of data that is too large to be stored in available memory Examples: –Network.
Constructing Optimal Wavelet Synopses Dimitris Sacharidis Timos Sellis
Information Theory for Data Streams David P. Woodruff IBM Almaden.
Data Stream Algorithms Ke Yi Hong Kong University of Science and Technology.
Sublinear Algorithms via Precision Sampling Alexandr Andoni (Microsoft Research) joint work with: Robert Krauthgamer (Weizmann Inst.) Krzysztof Onak (CMU)
1 Embedding and Similarity Search for Point Sets under Translation Minkyoung Cho and David M. Mount University of Maryland SoCG 2008.
Facility Location in Dynamic Geometric Data Streams Christiane Lammersen Christian Sohler.
Embedding and Sketching Sketching for streaming Alexandr Andoni (MSR)
Massive Data Sets and Information Theory Ziv Bar-Yossef Department of Electrical Engineering Technion.
Data Stream Algorithms Lower Bounds Graham Cormode
Lower bounds on data stream computations Seminar in Communication Complexity By Michael Umansky Instructor: Ronitt Rubinfeld.
The Message Passing Communication Model David Woodruff IBM Almaden.
Clustering Data Streams A presentation by George Toderici.
Beating CountSketch for Heavy Hitters in Insertion Streams Vladimir Braverman (JHU) Stephen R. Chestnut (ETH) Nikita Ivkin (JHU) David P. Woodruff (IBM)
Approximation Algorithms based on linear programming.
Continuous Monitoring of Distributed Data Streams over a Time-based Sliding Window MADALGO – Center for Massive Data Algorithmics, a Center of the Danish.
New Algorithms for Heavy Hitters in Data Streams David Woodruff IBM Almaden Joint works with Arnab Bhattacharyya, Vladimir Braverman, Stephen R. Chestnut,
An Optimal Algorithm for Finding Heavy Hitters
Algorithms for Big Data: Streaming and Sublinear Time Algorithms
Information Complexity Lower Bounds
Stochastic Streams: Sample Complexity vs. Space Complexity
New Characterizations in Turnstile Streams with Applications
Streaming & sampling.
Approximate Matchings in Dynamic Graph Streams
Sublinear Algorithmic Tools 2
Sketching and Embedding are Equivalent for Norms
Turnstile Streaming Algorithms Might as Well Be Linear Sketches
Overview Massive data sets Streaming algorithms Regression
The Communication Complexity of Distributed Set-Joins
Range-Efficient Computation of F0 over Massive Data Streams
Streaming Symmetric Norms via Measure Concentration
Lecture 6: Counting triangles Dynamic graphs & sampling
Joint work with Morteza Monemizadeh
Presentation transcript:

The Data Stream Space Complexity of Cascaded Norms T.S. Jayram David Woodruff IBM Almaden

Data Streams: Algorithms access data in a sequential fashion, using one pass and small space. They need to be randomized and approximate [FM, MP, AMS].

Frequency Moments and Norms: The stream defines updates to a set of items 1, 2, …, d, where f_i = weight of item i (positive-only vs. turnstile model). k-th frequency moment: F_k = Σ_i |f_i|^k. p-th norm: L_p = ‖f‖_p = (Σ_i |f_i|^p)^{1/p}. Maximum frequency: p = ∞. Distinct elements: p = 0. Heavy hitters. Assume the length of the stream and the magnitude of the updates are ≤ poly(d).
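These quantities are easy to compute exactly offline, which makes the definitions concrete; a minimal sketch (illustrative only: the helper names `frequency_vector`, `F`, and `L` are mine, and real streaming algorithms only approximate these values in sublinear space):

```python
from collections import Counter

def frequency_vector(stream, d):
    """Frequencies f_1..f_d from a positive-only stream of item IDs in 1..d."""
    c = Counter(stream)
    return [c.get(i, 0) for i in range(1, d + 1)]

def F(f, k):
    """k-th frequency moment F_k = sum_i |f_i|^k (F_0 counts distinct items)."""
    if k == 0:
        return sum(1 for x in f if x != 0)
    return sum(abs(x) ** k for x in f)

def L(f, p):
    """p-th norm L_p = (sum_i |f_i|^p)^(1/p)."""
    return F(f, p) ** (1.0 / p)

f = frequency_vector([1, 1, 2, 3, 3, 3], d=4)   # f = [2, 1, 3, 0]
assert F(f, 0) == 3                  # distinct elements
assert F(f, 1) == 6                  # stream length
assert F(f, 2) == 14                 # 4 + 1 + 9
assert abs(L(f, 2) - 14 ** 0.5) < 1e-12
assert max(f) == 3                   # L_infinity = maximum frequency
```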

Classical Results: Approximating L_p and F_p is the same problem. For 0 ≤ p ≤ 2, F_p is approximable in Õ(1) space (AMS, FM, Indyk, …). For p > 2, F_p is approximable in Õ(d^{1-2/p}) space (IW), and this is best possible (BJKS, CKS).

Cascaded Aggregates: The stream defines updates to pairs of items in {1, 2, …, n} × {1, 2, …, d}, where f_ij = weight of item (i, j). Given two aggregates P and Q, the cascaded aggregate P∘Q applies Q to each row of the n × d matrix and then P to the resulting vector of n values.

Motivation: Multigraph streams for analyzing IP traffic [Cormode-Muthukrishnan]; this corresponds to P∘F_0 for different P's, where F_0 returns the number of destinations accessed by each source. [CM] also introduced the more general problem of estimating P∘Q. Computing complex join estimates. Product metrics [Andoni-Indyk-Krauthgamer]: stock volatility, computational geometry, operator norms.

The Picture: Estimating L_k∘L_p. (Figure: the space complexity as a function of k and p, with regions labeled Θ(1), d^{1-2/p}, n^{1-1/k}, n^{1-2/k} d^{1-2/p} (our upper bound), "[Ganguly] (without deletions)", "follows from techniques of [ADIW]", and an open region marked "?".) We give a 1-pass Õ(n^{1-2/k} d^{1-2/p}) space algorithm when k ≥ p, and a matching lower bound based on multiparty disjointness. We give the Ω(n^{1-1/k}) bound for L_k∘L_0 and L_k∘L_1; previously, Õ(n^{1/2}) was known for L_2∘L_0 without deletions [CM], and Õ(n^{1-1/k}) for L_k∘L_p for any p in {0,1} in the turnstile model [MW].

Our Problem: F_k∘F_p. For an n × d matrix M, F_k∘F_p(M) = Σ_i (Σ_j |f_ij|^p)^k = Σ_i F_p(Row i)^k.
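The cascaded moment is simple to evaluate offline, which is a useful reference point for the streaming algorithm; a minimal sketch (the names `Fp` and `cascaded` are mine, not from the paper):

```python
def Fp(row, p):
    # p-th moment of one row: sum_j |f_ij|^p (p = 0 counts nonzero entries)
    return sum(1 for x in row if x != 0) if p == 0 else sum(abs(x) ** p for x in row)

def cascaded(M, k, p):
    # F_k o F_p (M) = sum over rows i of F_p(row i)^k
    return sum(Fp(row, p) ** k for row in M)

M = [[1, 2, 0],
     [0, 3, 0]]
assert cascaded(M, 1, 1) == 6             # total weight of the matrix
assert cascaded(M, 2, 1) == 3**2 + 3**2   # row weights are 3 and 3
assert cascaded(M, 2, 0) == 2**2 + 1**2   # rows have 2 and 1 nonzeros
```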

High Level Ideas: F_k∘F_p. 1. We want the F_k-value of the vector (F_p(Row 1), …, F_p(Row n)). 2. We try to sample a row i with probability ∝ F_p(Row i). 3. Spend an extra pass to compute F_p(Row i). 4. Could then output F_p(M) · F_p(Row i)^{k-1} (can be seen as a generalization of [AMS]). How do we do the sampling efficiently?
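Steps 2-4 describe an unbiased estimator in the style of [AMS]: if row i is sampled with probability F_p(Row i)/F_p(M) and the output is F_p(M) · F_p(Row i)^{k-1}, the expectation is exactly Σ_i F_p(Row i)^k = F_k∘F_p(M). A small sketch verifying this expectation exactly (illustrative only; the streaming algorithm cannot compute these quantities exactly, and the function names are mine):

```python
def row_moment(row, p):
    # F_p of a single row: sum_j |f_ij|^p
    return sum(abs(x) ** p for x in row)

def expected_estimate(M, k, p):
    """Exact expectation of the estimator: sample row i with probability
    F_p(row i)/F_p(M), then output F_p(M) * F_p(row i)^(k-1)."""
    FpM = sum(row_moment(row, p) for row in M)
    return sum((row_moment(row, p) / FpM) * FpM * row_moment(row, p) ** (k - 1)
               for row in M)

M = [[1, 2], [3, 0], [1, 1]]
k, p = 3, 2
truth = sum(row_moment(row, p) ** k for row in M)   # F_k o F_p (M)
assert abs(expected_estimate(M, k, p) - truth) < 1e-9
```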

Review – Estimating F_p [IW]. Level sets: S_t = {items i with (1+ε)^t ≤ |f_i| < (1+ε)^{t+1}}. Level t is good if |S_t|(1+ε)^{2t} ≥ F_2/B. Items from such level sets are also called good.
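The level-set definitions can be checked with a small offline sketch (a toy, exact version under the definition above; the streaming ε-Histogram only approximates these sizes, and `B` trades space against how many levels count as good):

```python
import math

def level_sets(f, eps):
    """Group nonzero items i by level t with (1+eps)^t <= |f_i| < (1+eps)^(t+1)."""
    S = {}
    for i, x in enumerate(f):
        if x != 0:
            t = int(math.floor(math.log(abs(x), 1 + eps)))
            S.setdefault(t, []).append(i)
    return S

def good_levels(f, eps, B):
    """A level t is good if |S_t| * (1+eps)^(2t) >= F_2 / B."""
    F2 = sum(x * x for x in f)
    return {t for t, items in level_sets(f, eps).items()
            if len(items) * (1 + eps) ** (2 * t) >= F2 / B}

f = [10, 10, 1]                              # two heavy items, one light item
assert level_sets(f, 1.0) == {3: [0, 1], 0: [2]}
assert good_levels(f, 1.0, B=4) == {3}       # only the heavy level carries >= F_2/B mass
```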

ε-Histogram [IW]: Finds approximate sizes s_t of the level sets: for all S_t, s_t ≤ (1+ε)|S_t|, and for good S_t, s_t ≥ (1-ε)|S_t|. Also provides Õ(1) random samples from each good S_t. Space: Õ(B).

Sampling Rows According to F_p Value: Treat the n × d matrix M as a vector and run ε-Histogram on M for a suitable B, obtaining a (1 ± ε)-approximation s_t to |S_t| for each good t. Then F_k∘F_p(M') ≥ (1-ε) F_k∘F_p(M), where M' is M restricted to the good items (Hölder's inequality). To sample: choose a good t with probability s_t(1+ε)^{pt}/F_p'(M), where F_p'(M) = Σ_{good t} s_t(1+ε)^{pt}; choose a random sample (i, j) from S_t; let row i be the current sample. Then Pr[row i] = Σ_t [s_t(1+ε)^{pt}/F_p'(M)] · [|S_t ∩ row i|/|S_t|] ≈ F_p(row i)/F_p(M). Problems: 1. The high-level algorithm requires many samples (up to n^{1-1/k}) from the S_t, but [IW] gives only Õ(1), and we can't afford to repeat in low space. 2. The algorithm may misclassify a pair (i, j) into S_t when it is in S_{t-1}.

High Level Ideas: F_k∘F_p. 1. We want the F_k-value of the vector (F_p(Row 1), …, F_p(Row n)). 2. We try to sample a row i with probability ∝ F_p(Row i). 3. Spend an extra pass to compute F_p(Row i). 4. Could then output F_p(M) · F_p(Row i)^{k-1} (can be seen as a generalization of [AMS]). How do we avoid an extra pass?

Avoiding an Extra Pass: Now we can sample a row i with probability ∝ F_p(Row i). We design a new F_k-algorithm to run on (F_p(Row 1), …, F_p(Row n)) which only receives IDs i with probability ∝ F_p(Row i). For each j ∈ [log n], the algorithm: 1. chooses a random subset of n/2^j rows; 2. samples a row i from this set with Pr[Row i] ∝ F_p(Row i). We show that Õ(n^{1-1/k}) oracle samples are enough to estimate F_k up to 1 ± ε.

New Lower Bounds: Alice holds an n × d matrix A, and Bob holds an n × d matrix B. NO instance: for all rows i, Δ(A_i, B_i) ≤ 1. YES instance: there is a unique row j for which Δ(A_j, B_j) = d, and for all i ≠ j, Δ(A_i, B_i) ≤ 1. We show that distinguishing these cases requires Ω(n/d) randomized communication. This implies that estimating L_k(L_0) or L_k(L_1) needs Ω(n^{1-1/k}) space.
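For concreteness, a small generator of such YES/NO instances (a sketch under my own conventions: the name `make_instance`, the 0/1 alphabet, and the random placement of the flips are assumptions; Δ denotes Hamming distance):

```python
import random

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def make_instance(n, d, yes, seed=0):
    """Alice's and Bob's n x d 0/1 matrices.
    NO: every pair of corresponding rows differs in at most 1 position.
    YES: one special row differs in all d positions, the rest in at most 1."""
    rng = random.Random(seed)
    A = [[rng.randint(0, 1) for _ in range(d)] for _ in range(n)]
    B = [row[:] for row in A]
    for i in range(n):
        if rng.random() < 0.5:           # flip at most one position per row
            B[i][rng.randrange(d)] ^= 1
    if yes:
        special = rng.randrange(n)
        B[special] = [1 - x for x in A[special]]   # distance exactly d
    return A, B

A, B = make_instance(8, 5, yes=True, seed=1)
dists = sorted(hamming(a, b) for a, b in zip(A, B))
assert dists[-1] == 5 and all(x <= 1 for x in dists[:-1])
A, B = make_instance(8, 5, yes=False, seed=1)
assert all(hamming(a, b) <= 1 for a, b in zip(A, B))
```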

Information Complexity Paradigm [CSWY, BJKS]: The information cost IC is the amount of information the transcript reveals about the inputs. For any function f, CC(f) ≥ IC(f). Using their direct sum theorem, it suffices to show an Ω(1/d) information cost for a protocol deciding whether Δ(x, y) = d or Δ(x, y) ≤ 1. Caveat: the distribution is only on instances where Δ(x, y) ≤ 1.

Working with Hellinger Distance: Given the probability distribution vector π(x,y) over transcripts of an input (x, y), let ψ(x,y)_τ = (π(x,y)_τ)^{1/2} for all τ. The information cost can be lower bounded by Σ_{Δ(u,v)=1} ‖ψ(u,u) - ψ(u,v)‖^2. Unlike previous work, we exploit the geometry of the squared Euclidean norm (useful in later work [AJP]). Short diagonals property: Σ_{Δ(u,v)=1} ‖ψ(u,u) - ψ(u,v)‖^2 ≥ (1/d) Σ_{Δ(u,v)=d} ‖ψ(u,u) - ψ(u,v)‖^2, using the fact that for a quadrilateral with sides a, b, c, d and diagonals e, f: a^2 + b^2 + c^2 + d^2 ≥ e^2 + f^2.
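The short diagonals inequality can be sanity-checked numerically: for any four points p1, p2, p3, p4 in Euclidean space, the sum of squared side lengths exceeds the sum of squared diagonals by exactly ‖p1 - p2 + p3 - p4‖^2, which is nonnegative. A small sketch (function names are mine):

```python
def sq(u, v):
    # squared Euclidean distance
    return sum((a - b) ** 2 for a, b in zip(u, v))

def short_diagonals_gap(p1, p2, p3, p4):
    """Sum of squared sides minus sum of squared diagonals;
    equals ||p1 - p2 + p3 - p4||^2, hence always >= 0."""
    sides = sq(p1, p2) + sq(p2, p3) + sq(p3, p4) + sq(p4, p1)
    diags = sq(p1, p3) + sq(p2, p4)
    return sides - diags

# unit square: 4 sides of length 1, diagonals of squared length 2 each
assert short_diagonals_gap([0, 0], [1, 0], [1, 1], [0, 1]) == 0
# collinear points 0, 1, 2, 3: sides 1+1+1+9 = 12, diagonals 4+4 = 8
assert short_diagonals_gap([0], [1], [2], [3]) == 4
```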

Open Problems: L_k∘L_p estimation for k < p. Other cascaded aggregates, e.g. entropy. Cascaded aggregates with 3 or more stages.