Presentation is loading. Please wait.

Presentation is loading. Please wait.

Open Problems in Streaming

Similar presentations


Presentation on theme: "Open Problems in Streaming"— Presentation transcript:

1 Open Problems in Streaming
Sampath Kannan University of Pennsylvania

2 Outline Questions about the model(s)
Relationship between streaming model(s) and other models Algorithm design questions

3 Current Model(s) Input stream(s): one or more; read head can only move to the right. Data order: stream is a sequence of items permuted adversarially. Var: Problem is about particular order in the stream. Workspace: bounded; polylogarithmic? Approximate solutions; randomization; single or multi pass.

4 Model too pessimistic? Adversarial input ordering makes life
difficult. Order-specific problems may not be general enough. What are the alternatives?

5 Average-case analysis?
Assume that the data in a stream is chosen by an adversary but its ordering is random. Open Problem: Is there a problem where this really helps? Conjecture: No! Simulate random order by sampling from prefix of stream...

6 Prover-assisted streaming
Stream creator/prover allowed to annotate stream (create separate stream of anntn). Downstream verifier can use annotation to compute... but must not be fooled by a dishonest prover. Element distinctness/set disjointness can be verified with O(n) annotation [J. Zhang].

7 For what problems does prover-assistance
help? Is there an example where annotation of length o(n) makes streaming algorithm possible?

8 Passes, space, approx. factor
What are the tradeoffs? Munro and Paterson show trade-off between space and number of passes for exact computation of median. Clustering: current trade-off for arbitrary metric spaces: Better tradeoffs?

9 Distributed Streams [Gibbons & Tirthapura] study streaming
complexity where there are multiple streams with one observer per stream. Each observer computes a “sketch” from which a function on the union of the streams is computed. If there is a canonical interleaving of streams what functions of this canonical interleaving can we compute?

10 Relation to other models
Worst-case one-round communication complexity: Function f. Inputs partitioned between Alice and Bob in worst possible way. Number of bits Alice needs to communicate for Bob to compute f is WOCC(f). Conj: PASS(f) = WOCC(f).

11 Issues: Non-uniformity in communication
complexity model: Even for the worst-case partition, Alice and Bob could exploit the fact that they know the partition. Does this allow WOCC(f) < PASS (f)? Can show: WOCC(f) < = PASS(f) there exists f : WOCC(f) = 2 PASS(f) = 3 [Ishay,K Strauss]

12 Algorithmic questions
Stream items: points from metric space Compute diameter (For d-dimensional Euclidean space can approximate in space.) (Feigenbaum, K, Zhang). Better space by dimension reduction?

13 Clustering in specific metric spaces
Can we get constant factor approximation with polylog space for Euclidean d-dimensional space? Other specific metric spaces? Computational geometry If promised real convex hull has few sides can we find approximate convex hull for a stream of points?

14 Data Mining Stream is interleaved output of one or
more Markov Chains. Many questions. Given one or more Markov Chains and a stream of states find max. probability that stream generated by given chains. Can be solved in space = product of number of states of chains. (Batu, Guha, K)

15 Other questions: Markov Chain(s) not given: Find most likely. Assume good mix in stream. Markov Chain(s) “hidden”... stream is not a sequence of states but a sequence of edge labels. “Background” Markov Chain given. Find motifs --- other Markov Chains that explain “signals” in the stream.

16 Open Problem Are there other open problems?


Download ppt "Open Problems in Streaming"

Similar presentations


Ads by Google