Open Problems in Streaming

Presentation transcript:

Open Problems in Streaming Sampath Kannan University of Pennsylvania

Outline: Questions about the model(s). Relationship between streaming model(s) and other models. Algorithm design questions.

Current Model(s). Input stream(s): one or more; the read head can only move to the right. Data order: the stream is a sequence of items permuted adversarially. Variant: the problem concerns a particular order in the stream. Workspace: bounded; polylogarithmic? Approximate solutions; randomization; single or multiple passes.

Model too pessimistic? Adversarial input ordering makes life difficult. Order-specific problems may not be general enough. What are the alternatives?

Average-case analysis? Assume that the data in a stream is chosen by an adversary but its ordering is random. Open Problem: Is there a problem where this really helps? Conjecture: No! Simulate random order by sampling from prefix of stream...

Prover-assisted streaming. The stream creator/prover is allowed to annotate the stream (create a separate stream of annotations). A downstream verifier can use the annotation to compute... but must not be fooled by a dishonest prover. Element distinctness/set disjointness can be verified with O(n) annotation [J. Zhang].
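
One way to picture annotation-assisted verification is the sketch below. It is an illustration under stated assumptions, not necessarily the protocol cited above: the prover appends the stream in sorted order as the annotation, and the verifier checks that the annotation is strictly increasing and that a random polynomial fingerprint of the annotation matches that of the stream. The function name `verify_distinct`, the prime `P`, and the restriction to integer items in [0, P) are all hypothetical choices made for the example.

```python
import random

P = (1 << 61) - 1  # a Mersenne prime used as the fingerprint field (assumed choice)

def verify_distinct(stream, annotation, r=None):
    """Accept iff `annotation` is the stream rewritten in strictly increasing order.

    The verifier keeps only a counter, the previous annotation item and two
    fingerprints. A false accept happens with probability at most n / P.
    Assumes items are integers in the range [0, P).
    """
    if r is None:
        r = random.randrange(P)  # secret evaluation point of the fingerprint
    fp_stream, n = 1, 0
    for x in stream:
        fp_stream = fp_stream * ((r - x) % P) % P  # product of (r - x) over the stream
        n += 1
    fp_annot, m, prev = 1, 0, None
    for y in annotation:
        if prev is not None and y <= prev:
            return False  # annotation is not strictly increasing
        prev = y
        fp_annot = fp_annot * ((r - y) % P) % P
        m += 1
    # Equal multisets always give equal fingerprints; unequal ones rarely do.
    return n == m and fp_stream == fp_annot
```

If the stream contains a repeated item, every strictly increasing annotation differs from it as a multiset, so a dishonest prover is caught except with small probability over the choice of `r`.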

For what problems does prover-assistance help? Is there an example where annotation of length o(n) makes streaming algorithm possible?

Passes, space, approx. factor What are the tradeoffs? Munro and Paterson show trade-off between space and number of passes for exact computation of median. Clustering: current trade-off for arbitrary metric spaces: Better tradeoffs?

Distributed Streams [Gibbons & Tirthapura] study streaming complexity where there are multiple streams with one observer per stream. Each observer computes a “sketch” from which a function on the union of the streams is computed. If there is a canonical interleaving of streams what functions of this canonical interleaving can we compute?
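
The idea of per-observer sketches that a referee combines can be illustrated with a k-minimum-values sketch for counting distinct items in the union of the streams. This is a minimal sketch under assumptions and not the construction from the cited work: the helper names `h`, `sketch`, `merge` and `estimate`, the use of SHA-256 as the shared hash, and the parameter `k` are all chosen for the example.

```python
import hashlib

def h(item):
    """Hash an item to a float in [0, 1); every observer must use the same function."""
    digest = hashlib.sha256(repr(item).encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def sketch(stream, k):
    """One pass over a stream, keeping only the k smallest distinct hash values."""
    kept = set()
    for item in stream:
        kept.add(h(item))
        if len(kept) > k:
            kept.remove(max(kept))  # drop the largest so memory stays at k values
    return sorted(kept)

def merge(sketches, k):
    """Referee: combine per-stream sketches into a sketch of the union."""
    return sorted(set().union(*sketches))[:k]

def estimate(sk, k):
    """Estimate the number of distinct items from a (merged) sketch."""
    if len(sk) < k:
        return len(sk)  # fewer than k distinct hashes were seen, so the count is exact
    return (k - 1) / sk[-1]  # standard k-minimum-values estimator
```

The useful property is that merging the observers' sketches gives exactly the sketch that a single observer would have computed on the concatenated streams, so no observer needs to see another observer's data.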

Relation to other models Worst-case one-round communication complexity: Function f. Inputs partitioned between Alice and Bob in worst possible way. Number of bits Alice needs to communicate for Bob to compute f is WOCC(f). Conj: PASS(f) = WOCC(f).

Issues: Non-uniformity in the communication complexity model: even for the worst-case partition, Alice and Bob could exploit the fact that they know the partition. Does this allow WOCC(f) < PASS(f)? Can show: WOCC(f) ≤ PASS(f); there exists f with WOCC(f) = 2 and PASS(f) = 3 [Ishay, K, Strauss].
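
The easy direction, that a one-pass algorithm yields a one-round protocol of the same cost, can be sketched as follows. This is an illustration under assumptions: the streaming algorithm is modelled as an initial state plus a `step` function, Alice's items are treated as a prefix of the stream, and the toy algorithm (tracking the range of the values) and all names are hypothetical.

```python
def run(step, state, items):
    """Feed items one at a time to a one-pass algorithm and return its final state."""
    for x in items:
        state = step(state, x)
    return state

def one_way_protocol(init, step, output, alice_items, bob_items):
    """Simulate a one-pass algorithm as a one-round protocol.

    Alice runs the algorithm on her items and sends its memory contents;
    Bob resumes from that state. The message is as large as the workspace.
    """
    message = run(step, init, alice_items)  # the only communication, Alice to Bob
    return output(run(step, message, bob_items))

# Toy one-pass algorithm: the state is (max so far, min so far).
INIT = (float("-inf"), float("inf"))

def step(state, x):
    return (max(state[0], x), min(state[1], x))

def output(state):
    return state[0] - state[1]
```

For example, `one_way_protocol(INIT, step, output, [5, 1, 9], [4, 12, 3])` gives the same answer as a single pass over the six items, while Alice communicates only the two numbers in her state.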

Algorithmic questions. Stream items: points from a metric space. Compute the diameter. (For d-dimensional Euclidean space can approximate in space.) (Feigenbaum, K, Zhang). Better space by dimension reduction?
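
A simple way to see how a diameter can be approximated in small space is to project the points onto a fixed set of directions and remember only the extremes. The sketch below does this for the plane (d = 2); it is not claimed to be the cited algorithm, and the class name `DiameterSketch` and the parameter `k` are hypothetical. With k evenly spaced directions the estimate satisfies D cos(π/(2k)) ≤ estimate ≤ D, where D is the true diameter.

```python
import math

class DiameterSketch:
    """One-pass diameter estimate for points in the plane, using O(k) numbers."""

    def __init__(self, k):
        # k directions evenly spaced over a half-turn; every line direction
        # is within an angle of pi / (2k) of one of them.
        self.dirs = [(math.cos(j * math.pi / k), math.sin(j * math.pi / k))
                     for j in range(k)]
        self.lo = [math.inf] * k    # smallest projection seen per direction
        self.hi = [-math.inf] * k   # largest projection seen per direction

    def update(self, point):
        x, y = point
        for j, (c, s) in enumerate(self.dirs):
            t = x * c + y * s       # projection of the point onto direction j
            if t < self.lo[j]:
                self.lo[j] = t
            if t > self.hi[j]:
                self.hi[j] = t

    def estimate(self):
        """Largest projected width; assumes at least one point was seen."""
        return max(h - l for h, l in zip(self.hi, self.lo))
```

The upper bound holds because a projected width is never larger than the distance between the two points that realise it; the lower bound holds because the true diameter segment makes an angle of at most π/(2k) with some stored direction.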

Clustering in specific metric spaces. Can we get a constant-factor approximation with polylog space for d-dimensional Euclidean space? Other specific metric spaces? Computational geometry: if we are promised that the true convex hull has few sides, can we find an approximate convex hull for a stream of points?

Data Mining. The stream is the interleaved output of one or more Markov Chains. Many questions. Given one or more Markov Chains and a stream of states, find the maximum probability that the stream was generated by the given chains. Can be solved in space equal to the product of the numbers of states of the chains. (Batu, Guha, K)
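
The product-of-states space bound can be pictured with a dynamic program whose table is indexed by the last state seen in each chain. The sketch below handles two chains and is an illustration under assumptions rather than the cited method: both chains use the state labels 0..s-1, each stream item is assigned to exactly one chain, the probability of the interleaving pattern itself is ignored, and the function name and argument layout are hypothetical.

```python
def max_interleaving_prob(stream, init_a, trans_a, init_b, trans_b):
    """Maximum, over assignments of items to chains A and B, of the product
    of the initial and transition probabilities along both chains.

    The table is keyed by (last state of A, last state of B), so it never
    holds more than (s + 1) * (s + 1) entries for s states per chain.
    """
    table = {(None, None): 1.0}  # None means the chain has not emitted yet
    for x in stream:
        new = {}
        for (a, b), p in table.items():
            # Option 1: chain A emits x.
            pa = p * (init_a[x] if a is None else trans_a[a][x])
            if pa > new.get((x, b), 0.0):
                new[(x, b)] = pa
            # Option 2: chain B emits x.
            pb = p * (init_b[x] if b is None else trans_b[b][x])
            if pb > new.get((a, x), 0.0):
                new[(a, x)] = pb
        table = new  # zero-probability entries are never stored
    return max(table.values(), default=0.0)
```

Only the table is carried from one stream item to the next, so the workspace depends on the number of states and not on the length of the stream.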

Other questions: Markov Chain(s) not given: Find most likely. Assume good mix in stream. Markov Chain(s) “hidden”... stream is not a sequence of states but a sequence of edge labels. “Background” Markov Chain given. Find motifs --- other Markov Chains that explain “signals” in the stream.

Open Problem Are there other open problems?