Published by Neil Horton. Modified over 9 years ago.
Algorithms for data streams
Foundations of Data Science 2014, Indian Institute of Science
Navin Goyal
Introduction
Data streams: very large input data arriving sequentially, too large to fit in memory.
Examples:
– networks (traffic passing through a router)
– databases (transaction logs)
– scientific data (satellites, sensors, LHC, …)
– financial data
What can we compute about the data in such situations?
Today’s lecture: start with an illustrative example problem, then cover some generalities about the streaming model and streaming problems.
Example: Counting
Counting
Performance of Morris counter
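The body of this slide is not preserved in the text, so the following is only a minimal sketch of the textbook Morris counter, assuming the usual formulation: store a small exponent x, increment it with probability 2^(-x), and report 2^x - 1. The class name `MorrisCounter` and its methods are illustrative names, not taken from the lecture.

```python
import random


class MorrisCounter:
    """Approximate counter that stores only a small exponent x.

    Assumed (textbook) formulation: the estimate 2**x - 1 is an
    unbiased estimate of the number of increments seen so far.
    """

    def __init__(self, rng=None):
        self.x = 0
        # A caller-supplied random.Random makes runs reproducible.
        self.rng = rng if rng is not None else random.Random()

    def increment(self):
        # Increase the exponent with probability 2**(-x).
        if self.rng.random() < 2.0 ** (-self.x):
            self.x += 1

    def estimate(self):
        return 2 ** self.x - 1


if __name__ == "__main__":
    counter = MorrisCounter(random.Random(0))
    for _ in range(10_000):
        counter.increment()
    print("stored exponent:", counter.x, "estimate:", counter.estimate())
```

A single counter is unbiased but has large variance, which is what the boosting slides that follow address.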
Boosting the success probability I
Performance of Morris counter
Boosting the success probability II
Boosting success probability II
Test your understanding: Why don’t we just use the median all the time for boosting the probability of success instead of the mean?
Recap
Questions to ponder
Streaming data: models and problems
Models for streaming data
Restrictions on the algorithm
Some streaming problems: frequency moments
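The slide's formulas are not preserved in the text. As a reference point, the standard definition of the k-th frequency moment is given below; the symbols f_i (frequency of item i) and n (size of the item universe) are assumed notation, not necessarily the lecture's.

```latex
% f_i = number of occurrences of item i in the stream, items drawn from {1, ..., n}
F_k = \sum_{i=1}^{n} f_i^{k}, \qquad k \ge 0
% F_0: number of distinct items (taking 0^0 = 0)
% F_1: total length of the stream
% F_2: sum of squared frequencies
```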
A general template for many streaming algorithms
– Come up with a basic random estimator for the quantity of interest (usually the non-trivial part).
– Give an efficient algorithm to compute the estimator (may need hashing or some other way of reducing the randomness requirements).
– Improve the probability of success by some trick, such as the median-of-means estimator.
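The third step of the template can be sketched generically. The code below is a minimal illustration, assuming the basic estimator is supplied as a zero-argument function that returns one independent sample per call. The helper names `median_of_means` and `morris_estimate` are hypothetical, and the Morris-style estimator is the standard textbook one, used here only as an example of a basic estimator.

```python
import random
import statistics


def morris_estimate(stream_length, rng):
    """One sample of a basic random estimator (assumed Morris-style counter)."""
    x = 0
    for _ in range(stream_length):
        # Increase the exponent with probability 2**(-x).
        if rng.random() < 2.0 ** (-x):
            x += 1
    return 2 ** x - 1


def median_of_means(basic_estimator, groups, per_group):
    """Average per_group samples in each group, then take the median of the group means.

    Averaging reduces the variance of each group mean; the median then
    drives down the probability that the final answer is far off.
    """
    means = []
    for _ in range(groups):
        samples = [basic_estimator() for _ in range(per_group)]
        means.append(sum(samples) / len(samples))
    return statistics.median(means)


if __name__ == "__main__":
    rng = random.Random(0)
    estimate = median_of_means(lambda: morris_estimate(200, rng), groups=9, per_group=50)
    print("true count: 200, median-of-means estimate:", estimate)
```

In a real streaming setting the independent copies would run in parallel over a single pass rather than replaying the stream, so the memory cost is groups × per_group copies of the basic estimator.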
Plan for next few lectures