SWiM 2003: Benchmark Brainstorming. Dave Maier, Mike Stonebraker, and All of You! With thanks to Jim Gray for suggestions.


1 SWiM 2003: Benchmark Brainstorming
Dave Maier, Mike Stonebraker, and All of You!
With thanks to Jim Gray for suggestions

2 Benchmark Properties
Streamish
Credible
Scalable
Realistic Input
Approximable
Expressively Challenging
Portable
Runnable

3 Streamish
Source-driven data delivery
Rapid arrival
Infeasible to store all? (or low value to save?)
“Live” output (output during input)

4 Credible
Motivated by a likely application
Measures useful work
Simple to understand
One approach: find an existing application that is done with custom coding, abstract from it

5 Scalable
Stream rate & output volume
# of streams
Size of stream elements?
Number of queries
Memory requirements
Stored data

6 Realistic Input
Streams vary
  – bursts
  – stalls
  – diurnal cycles
Stream sources come and go
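One way to picture an input with these properties is a seeded generator whose per-second event count follows a daily cycle, with occasional bursts and stalls layered on top. The sketch below is only an illustration: the function names, the sinusoidal shape, the base rate, and the burst and stall probabilities are all assumptions, not values from the talk, and it does not model stream sources coming and going.

```python
import math
import random

SECONDS_PER_DAY = 86_400

def arrival_rate(t_seconds, base_rate=100.0):
    """Assumed diurnal shape: rate swings between 0.5x and 1.5x base_rate over a day."""
    day_fraction = (t_seconds % SECONDS_PER_DAY) / SECONDS_PER_DAY
    return base_rate * (1.0 + 0.5 * math.sin(2 * math.pi * day_fraction))

def generate_counts(duration_s, seed=0, burst_prob=0.01, stall_prob=0.01,
                    burst_factor=10.0):
    """Yield (second, event_count) pairs; the same seed gives the same stream."""
    rng = random.Random(seed)  # private RNG so runs are repeatable
    for t in range(duration_s):
        rate = arrival_rate(t)
        r = rng.random()
        if r < stall_prob:
            rate = 0.0                      # stall: nothing arrives this second
        elif r < stall_prob + burst_prob:
            rate *= burst_factor            # burst: rate jumps for this second
        if rate == 0.0:
            count = 0
        else:
            # Normal approximation to a Poisson count, clamped at zero.
            count = max(0, int(rng.gauss(rate, math.sqrt(rate))))
        yield t, count
```

Seeding the generator is what makes a run repeatable, while the probabilities give controlled variability.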

7 Approximable
Best stream rate vs. best answer at a given rate vs. most queries at a given rate
Need metric for answer quality
  – latency
  – precision
  – correctness
  – completeness

8 Expressively Challenging?
Range of query types
  – full stream
  – windowed
  – historic
Range of stream semantics
  – signal
  – snapshots
  – cyclic
  – deltas

9 Portable
Representation neutral: can be done with tuples, XML, messages
Can be implemented on a wide variety of platforms: RDBMS, stream database, web-service engine

10 Runnable
Can be run in a reasonable time
  – hard to test space management
  – limit on variations and cases
Can generate streams in a repeatable manner, controlled variability
Can build harness for testing quality metrics
  – comparison to ideal
  – capture timings
  – hard to cheat

11 NEXMark Stream Benchmark
Niagara Extension of XMark
XMark: XML Query Benchmark
Models an on-line auction site
  Person(id, name, email, ccard, city, state)
  Auction(id, itemname, desc, initbid, reserve, expires, seller, category)
  Bid(auction, bidder, price, dt-time)
Plus static category data
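The three stream schemas can be written down as plain record types. The field names below come from the slide; the field types, the renaming of `dt-time` to `dt_time` (to make it a valid identifier), and the reading of `seller`, `bidder`, and `auction` as references to `id` values are assumptions made for this sketch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Person:
    id: int
    name: str
    email: str
    ccard: str
    city: str
    state: str

@dataclass(frozen=True)
class Auction:
    id: int
    itemname: str
    desc: str
    initbid: float
    reserve: float
    expires: float   # assumed: expiry time in seconds since the epoch
    seller: int      # assumed: refers to Person.id
    category: int    # assumed: key into the static category data

@dataclass(frozen=True)
class Bid:
    auction: int     # assumed: refers to Auction.id
    bidder: int      # assumed: refers to Person.id
    price: float
    dt_time: float   # "dt-time" on the slide; assumed seconds since the epoch

# Example record with made-up values.
example_bid = Bid(auction=7, bidder=42, price=19.5, dt_time=1_054_000_000.0)
```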

12 Auction Monitoring System
[Diagram labels: Category Data; Person, Auction, and Bid streams; Auction Monitoring System; Streamed Results]

13 Queries
Full-stream and windowed
  – single-stream
  – stream and stored
  – multi-stream
Query 5 (Hot items): Item with the most bids in past hour, each minute.
  SELECT Rstream(auction)
  FROM (SELECT B1.auction, count(*) AS num
        FROM Bid [RANGE 60 MINUTE SLIDE 1 MINUTE] B1
        GROUP BY B1.auction)
  WHERE num >= ALL (SELECT count(*)
                    FROM Bid [RANGE 60 MINUTE SLIDE 1 MINUTE] B2
                    GROUP BY B2.auction)
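To make the intent of Query 5 concrete, here is a small batch-style sketch that recomputes the answer from scratch at every slide boundary. It is not how a stream engine would evaluate the query, and the details are assumptions: bids are `(auction_id, timestamp_seconds)` pairs sorted by time, timestamps start near zero, and the window at report time `t` covers `(t - range_s, t]`. As with `>= ALL` in the query, ties all appear in the output.

```python
from collections import Counter

def hot_items(bids, range_s=3600, slide_s=60):
    """Return [(report_time, [auction ids with the most bids in the window])].

    bids: iterable of (auction_id, timestamp_s), sorted by timestamp_s.
    Reports are produced every slide_s seconds over the last range_s seconds.
    """
    bids = list(bids)
    if not bids:
        return []
    last_ts = bids[-1][1]
    results = []
    t = slide_s
    while True:
        # Count bids per auction inside the window (t - range_s, t].
        counts = Counter(a for a, ts in bids if t - range_s < ts <= t)
        if counts:
            top = max(counts.values())
            results.append((t, sorted(a for a, n in counts.items() if n == top)))
        if t >= last_ts:
            break
        t += slide_s
    return results

example = hot_items([(1, 10), (1, 20), (2, 30), (2, 70), (2, 80)])
# example == [(60, [1]), (120, [2])]
```

Output of this kind could also serve as the "ideal" answer that a harness compares a system's streamed results against.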

14 Metrics
Quality-Latency Product
  – Penalties for wrong, missing, extra tuples times average latency
  – Can weight importance
Output Matching
  – Difference from ideal
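The slide does not give a formula for the quality-latency product, so the following is one literal reading rather than the benchmark's definition: count wrong, missing, and extra tuples against an ideal answer, weight each count, and multiply the total penalty by the average latency. Keying tuples by an identifier (so that "wrong" means same key with a different value), the function name, and the default weights are all assumptions.

```python
from statistics import mean

def quality_latency_product(produced, ideal, latencies,
                            w_wrong=1.0, w_missing=1.0, w_extra=1.0):
    """Weighted tuple penalty times average latency (lower is better).

    produced, ideal: dicts mapping a tuple key to its value.
    latencies: per-output latencies, in seconds.
    """
    wrong = sum(1 for k in produced if k in ideal and produced[k] != ideal[k])
    missing = sum(1 for k in ideal if k not in produced)
    extra = sum(1 for k in produced if k not in ideal)
    penalty = w_wrong * wrong + w_missing * missing + w_extra * extra
    avg_latency = mean(latencies) if latencies else 0.0
    return penalty * avg_latency

score = quality_latency_product(
    produced={1: "a", 2: "x", 4: "d"},   # key 2 is wrong, key 4 is extra
    ideal={1: "a", 2: "b", 3: "c"},      # key 3 is missing
    latencies=[1.0, 3.0],
)
# score == 6.0  (penalty 3 times average latency 2.0)
```

Under this reading a perfectly correct answer scores zero whatever its latency, so a practical variant would likely need a baseline penalty term; that choice is left open here.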

15 Scaling
Number of Bid streams
Rate on Person, Auction streams
Stored data size
Test duration (?)

16 Application: TV Remote Controls
Massive clickstream (thanks to D. Schrader, NCR)
  – 140 million households with TV
  – 3½ hours of viewing per day
  – 19 clicks per hour
You do the math …
Obvious data mining uses, but also presents operational opportunities
  – Guarantee a given number of “distinct viewings” of a commercial
  – Need to correlate with schedule info (network, local station, cable co.)
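Doing the math with the three figures above gives the aggregate click volume. The sketch assumes every household contributes the stated viewing hours and click rate, and the per-second figure is an average over a full 24-hour day, so peak-hour rates would be higher.

```python
households = 140_000_000      # households with TV
hours_per_day = 3.5           # viewing hours per household per day
clicks_per_hour = 19          # clicks per viewing hour

clicks_per_day = households * hours_per_day * clicks_per_hour
clicks_per_second = clicks_per_day / 86_400   # averaged over 24 hours

print(f"{clicks_per_day:,.0f} clicks per day")        # 9,310,000,000 clicks per day
print(f"{clicks_per_second:,.0f} clicks per second")  # 107,755 clicks per second
```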

