Sameer Agarwal, Aurojit Panda, Barzan Mozafari, Samuel Madden, Ion Stoica.


Objective
- Offer bounded response times and bounded error on queries
- Use pre-prepared samples
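To make "bounded error" concrete, here is a minimal sketch of how an error bar can be attached to an average computed from a sample, and how many sampled rows a target error would need. It assumes a simple random sample and a normal approximation (z = 1.96 for roughly 95% confidence). The function names `estimate_mean` and `rows_needed` are hypothetical and are not taken from the presentation.

```python
import math
import statistics


def estimate_mean(sample, z=1.96):
    """Return (mean, half_width) of an approximate confidence interval.

    Assumes `sample` is a simple random sample with at least two rows.
    """
    n = len(sample)
    mean = statistics.fmean(sample)
    stderr = statistics.stdev(sample) / math.sqrt(n)
    return mean, z * stderr


def rows_needed(pilot_sample, max_error, z=1.96):
    """Estimate how many sampled rows bring the half-width down to max_error.

    Uses the pilot sample's standard deviation as a stand-in for the
    population's, which is itself an approximation.
    """
    sd = statistics.stdev(pilot_sample)
    return math.ceil((z * sd / max_error) ** 2)


if __name__ == "__main__":
    pilot = [1, 2, 3, 4, 5]
    mean, half_width = estimate_mean(pilot)
    print(f"mean = {mean:.2f} +/- {half_width:.2f}")
    print("rows needed for +/- 0.5:", rows_needed(pilot, 0.5))
```

The same relationship runs the other way for a response-time bound: a fixed time budget limits how many rows can be scanned, which in turn fixes the error that can be promised.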

And it works

Obvious questions
- How to accurately represent the data with samples?
  - Data is generally not uniform
  - Queries care about small fractions of the data, e.g. count the number of Republicans in San Francisco
- How to tolerate unseen queries?
  - Caching based on queries that have already occurred does not work well for interactive exploration
- How to tolerate changing data?
  - New data is continuously being added; how is that handled?

Stratified samples (deal with non-uniform data)
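A minimal sketch of the idea, assuming the common "cap each group" form of stratified sampling: every distinct value of the chosen column keeps at most `cap` rows, and each kept row carries a weight so that counts can be scaled back up. The names `stratified_sample` and `estimate_count` are hypothetical, and this is an illustration rather than the authors' implementation.

```python
import random
from collections import defaultdict


def stratified_sample(rows, key, cap, seed=0):
    """Keep at most `cap` rows per distinct key value.

    Returns a list of (row, weight) pairs, where weight is the number of
    original rows that each sampled row stands for.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)

    sample = []
    for members in groups.values():
        # Rare groups are kept whole; large groups are down-sampled.
        chosen = members if len(members) <= cap else rng.sample(members, cap)
        weight = len(members) / len(chosen)
        sample.extend((row, weight) for row in chosen)
    return sample


def estimate_count(sample, predicate):
    """Estimate COUNT(*) WHERE predicate by summing the matching weights."""
    return sum(weight for row, weight in sample if predicate(row))


if __name__ == "__main__":
    rows = [{"city": "NY"}] * 1000 + [{"city": "SF"}] * 5
    sample = stratified_sample(rows, key=lambda r: r["city"], cap=100)
    print(len(sample), "rows kept out of", len(rows))
    print("SF count estimate:", estimate_count(sample, lambda r: r["city"] == "SF"))
```

A uniform sample of the same size would, on average, contain fewer than one of the five SF rows, which is the non-uniformity problem the slide refers to.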

Optimization
- Can’t build stratified samples for everything; the set of possible samples grows too fast
- Optimize based on:
  - How poorly uniform samples would perform (data skew)
  - How likely the samples are to be used, based on query templates
  - Storage costs for the samples
- By working at the granularity of columns used in WHERE / GROUP BY instead of whole queries, you increase tolerance of unseen queries
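The three criteria above can be combined in many ways. The sketch below uses a simple greedy heuristic (benefit per unit of storage) purely as a stand-in; it is not claimed to be the optimizer the authors use. The field names `skew`, `probability`, and `storage`, and the function `choose_sample_columns`, are assumptions made for illustration.

```python
def choose_sample_columns(candidates, budget):
    """Greedily pick column sets to build stratified samples on.

    Each candidate is a dict with:
      columns     - tuple of column names used in WHERE / GROUP BY
      skew        - how badly a uniform sample would do (higher is worse)
      probability - how likely a query template using these columns is
      storage     - cost of storing the stratified sample
    """
    def score(candidate):
        # Benefit per unit of storage: skewed, frequently used, cheap wins.
        return candidate["skew"] * candidate["probability"] / candidate["storage"]

    chosen, used = [], 0
    for candidate in sorted(candidates, key=score, reverse=True):
        if used + candidate["storage"] <= budget:
            chosen.append(candidate["columns"])
            used += candidate["storage"]
    return chosen


if __name__ == "__main__":
    candidates = [
        {"columns": ("city",), "skew": 0.9, "probability": 0.5, "storage": 10},
        {"columns": ("os",), "skew": 0.2, "probability": 0.3, "storage": 10},
        {"columns": ("city", "os"), "skew": 0.95, "probability": 0.2, "storage": 40},
    ]
    print(choose_sample_columns(candidates, budget=25))
```

Because the candidates are column sets rather than whole queries, a new query that filters or groups on `city` can reuse the sample even if that exact query has never been seen.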

Changing data
- How to maintain guarantees with fast-changing data? Sampling is offline
- Consider a database of request latencies for a large system
  - For locating errors, values abnormally larger than average are interesting
  - They are poorly represented by uniform samples
  - The interesting data might be the most recent data (samples take 5-30 min to generate when run)
- Could we run cheap queries at insert time to deduce whether the inserted data changes the distribution?
- Can we merge existing stratified samples and new data with predictable error?
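One possible direction for the last question, offered only as a sketch and not as something the presentation proposes: keep each stratum as a fixed-size reservoir (Algorithm R), so that every row seen so far in that stratum has the same chance of being in the sample and the scaling weight stays known after each insert. The class name `StratumReservoir` is hypothetical.

```python
import random


class StratumReservoir:
    """Fixed-size uniform sample of one stratum, updated on each insert."""

    def __init__(self, cap, seed=0):
        self.cap = cap      # maximum number of rows kept for this stratum
        self.seen = 0       # total rows inserted into this stratum so far
        self.rows = []
        self._rng = random.Random(seed)

    def insert(self, row):
        self.seen += 1
        if len(self.rows) < self.cap:
            self.rows.append(row)
            return
        # Algorithm R: the new row replaces a random slot with
        # probability cap / seen, keeping the sample uniform.
        j = self._rng.randrange(self.seen)
        if j < self.cap:
            self.rows[j] = row

    @property
    def weight(self):
        """Number of inserted rows each kept row stands for."""
        return self.seen / len(self.rows) if self.rows else 0.0


if __name__ == "__main__":
    reservoir = StratumReservoir(cap=10)
    for latency_ms in range(1000):
        reservoir.insert(latency_ms)
    print(len(reservoir.rows), "rows kept, weight", reservoir.weight)
```

This keeps the per-stratum cap fixed; it does not by itself answer whether the choice of strata should change when new data shifts the distribution.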

Questions
- Where else can stratified samples help?
- Is this applicable to workloads on online data?
- Can the stratified samples be maintained online?
- Can we use a similar technique to obtain results from a degraded cluster where some of the data is unreachable?
