SWiM 2003 Benchmark Brainstorming
Dave Maier, Mike Stonebraker, and All of You!
With thanks to Jim Gray for suggestions.



Benchmark Properties
- Streamish
- Credible
- Scalable
- Realistic Input
- Approximable
- Expressively Challenging
- Portable
- Runnable

Streamish
- Source-driven data delivery
- Rapid arrival
- Infeasible to store all? (or low value to save?)
- “Live” output (output during input)

Credible
- Motivated by a likely application
- Measures useful work
- Simple to understand
- One approach: find an existing application that is done with custom coding, and abstract from it

Scalable
- Stream rate & output volume
- # of streams
- Size of stream elements?
- Number of queries
- Memory requirements
- Stored data

Realistic Input
- Streams vary
  - bursts
  - stalls
  - diurnal cycles
- Stream sources come and go

Approximable
- Best stream rate vs. best answer at a given rate vs. most queries at a given rate
- Need metric for answer quality
  - latency
  - precision
  - correctness
  - completeness

Expressively Challenging?
- Range of query types
  - full stream
  - windowed
  - historic
- Range of stream semantics
  - signal
  - snapshots
  - cyclic
  - deltas

Portable
- Representation neutral: can be done with tuples, XML, messages
- Can be implemented on a wide variety of platforms: RDBMS, stream database, web-service engine

Runnable
- Can be run in a reasonable time
  - hard to test space management
  - limit on variations and cases
- Can generate streams in a repeatable manner, controlled variability
- Can build harness for testing quality metrics
  - comparison to ideal
  - capture timings
  - hard to cheat
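One way to picture "repeatable manner, controlled variability" is a seeded generator whose bursts, stalls, and diurnal swing are all parameters. The sketch below is only an illustration: the function name `synthetic_arrivals`, its parameters, and the sine-shaped daily cycle are assumptions, not part of the slides.

```python
import math
import random
from typing import Iterator, Tuple


def synthetic_arrivals(seed: int, duration_s: int, base_rate: float,
                       burst_prob: float = 0.05, burst_factor: float = 5.0,
                       stall_prob: float = 0.02,
                       day_s: int = 86_400) -> Iterator[Tuple[int, int]]:
    """Yield (second, element_count) pairs for one hypothetical stream."""
    rng = random.Random(seed)  # private generator: same seed gives same stream
    for t in range(duration_s):
        # Diurnal cycle (assumed sine shape): rate swings from 0.5x to 1.5x of base.
        rate = base_rate * (1.0 + 0.5 * math.sin(2 * math.pi * t / day_s))
        u = rng.random()
        if u < stall_prob:
            rate = 0.0             # stall: nothing arrives this second
        elif u < stall_prob + burst_prob:
            rate *= burst_factor   # burst: rate is multiplied for this second
        # Round the fractional rate up or down at random to get an integer count.
        whole = int(rate)
        count = whole + (1 if rng.random() < rate - whole else 0)
        yield t, count


if __name__ == "__main__":
    run_a = list(synthetic_arrivals(seed=42, duration_s=10, base_rate=10.0))
    run_b = list(synthetic_arrivals(seed=42, duration_s=10, base_rate=10.0))
    print(run_a == run_b)  # identical seeds reproduce the same arrival pattern
```

Changing only the seed varies the pattern while keeping the statistical shape fixed, which is one possible reading of "controlled variability".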

NEXMark Stream Benchmark
- Niagara Extension of XMark
- XMark: XML Query Benchmark
- Models an on-line auction site
  - Person(id, name, , ccard, city, state)
  - Auction(id, itemname, desc, initbid, reserve, expires, seller, category)
  - Bid(auction, bidder, price, dt-time)
- Plus static category data
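The three stream schemas could be written down as plain record types, for instance as below. The field names follow the slide; the Python types are guesses, `dt-time` is spelled `dt_time` because hyphens are not valid in identifiers, and the Person field that is blank in the transcript is left out rather than guessed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    id: int
    name: str
    # One field between name and ccard is blank in the transcript; omitted here.
    ccard: str
    city: str
    state: str


@dataclass(frozen=True)
class Auction:
    id: int
    itemname: str
    desc: str
    initbid: float   # assumed numeric
    reserve: float   # assumed numeric
    expires: int     # assumed timestamp
    seller: int      # assumed to reference Person.id
    category: int    # assumed to reference the static category data


@dataclass(frozen=True)
class Bid:
    auction: int     # assumed to reference Auction.id
    bidder: int      # assumed to reference Person.id
    price: float
    dt_time: int     # "dt-time" on the slide; assumed timestamp
```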

Auction Monitoring System
[Figure: diagram with labels Category Data, Bid, Auction, Person, Bid, Auction Monitoring System, Streamed Results]

Queries
- Full-stream and windowed
  - single-stream
  - stream and stored
  - multi-stream
- Query 5 (Hot items): Item with the most bids in past hour, each minute.

    SELECT Rstream(auction)
    FROM (SELECT B1.auction, count(*) AS num
          FROM Bid [RANGE 60 MINUTE SLIDE 1 MINUTE] B1
          GROUP BY B1.auction)
    WHERE num >= ALL (SELECT count(*)
                      FROM Bid [RANGE 60 MINUTE SLIDE 1 MINUTE] B2
                      GROUP BY B2.auction)
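To make the intent of Query 5 concrete, here is a small batch sketch in Python that recomputes the answer at each slide step over a finite list of bids. It is not how a stream engine would evaluate the query, and the details are assumptions: bids are `(minute, auction_id)` pairs, the window at report time `r` covers minutes `[r - 60, r)`, and ties return every auction with the maximum count (mirroring `>= ALL`).

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def hot_items(bids: Iterable[Tuple[int, int]],
              range_min: int = 60,
              slide_min: int = 1) -> Dict[int, List[int]]:
    """Map each report minute to the auction(s) with the most bids in the window."""
    ordered = sorted(bids)
    if not ordered:
        return {}
    results: Dict[int, List[int]] = {}
    last_minute = ordered[-1][0]
    report = slide_min
    while report <= last_minute + slide_min:
        # Count bids whose minute falls in the assumed window [report - range, report).
        counts = Counter(auction for minute, auction in ordered
                         if report - range_min <= minute < report)
        if counts:
            top = max(counts.values())
            results[report] = sorted(a for a, c in counts.items() if c == top)
        report += slide_min
    return results


if __name__ == "__main__":
    sample = [(0, 1), (0, 2), (1, 2), (61, 1)]
    answer = hot_items(sample)
    print(answer[1], answer[2], answer[62])
```

Rescanning all bids per step keeps the sketch short; an incremental version would add arriving bids and expire old ones from a running count instead.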

Metrics
- Quality-Latency Product
  - Penalties for wrong, missing, extra tuples times average latency
  - Can weight importance
- Output Matching
  - Difference from ideal
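The slide gives no formula for the quality-latency product, so the following is one possible reading, not the benchmark's definition. It assumes outputs are compared as multisets against an ideal answer, that a "wrong" tuple shows up as one missing plus one extra tuple, and that the weights `w_missing` and `w_extra` are hypothetical knobs for "can weight importance". Lower scores are better.

```python
from collections import Counter
from typing import Hashable, Iterable, Sequence


def quality_latency_product(ideal: Iterable[Hashable],
                            actual: Iterable[Hashable],
                            latencies: Sequence[float],
                            w_missing: float = 1.0,
                            w_extra: float = 1.0) -> float:
    """Weighted count of missing and extra tuples, times average latency."""
    ideal_counts, actual_counts = Counter(ideal), Counter(actual)
    # Counter subtraction keeps only positive differences (multiset difference).
    missing = sum((ideal_counts - actual_counts).values())
    extra = sum((actual_counts - ideal_counts).values())
    penalty = w_missing * missing + w_extra * extra
    avg_latency = sum(latencies) / len(latencies) if latencies else 0.0
    return penalty * avg_latency


if __name__ == "__main__":
    ideal = [("a", 1), ("b", 2), ("c", 3)]
    actual = [("a", 1), ("b", 9)]
    print(quality_latency_product(ideal, actual, latencies=[0.5, 1.5]))
```

Under this reading a perfect answer scores zero regardless of latency, so a real harness would probably need a floor on the penalty or a separate latency report.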

Scaling
- Number of Bid streams
- Rate on Person, Auction streams
- Stored data size
- Test duration (?)

Application: TV Remote Controls
- Massive clickstream (thx to D. Schrader, NCR)
  - 140 Million households w/ TV
  - 3½ hours of viewing per day
  - 19 clicks per hour
- You do the math …
- Obvious data mining uses, but also presents operational opportunities
  - Guarantee a given number of “distinct viewings” of a commercial
  - need to correlate with schedule info (network, local station, cable co.)
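Doing the math with the three figures from the slide, and assuming every household watches the stated hours and that clicks are spread evenly over the day (both simplifications), gives a rough average load:

```python
# Figures taken from the slide.
households = 140_000_000
viewing_hours_per_day = 3.5
clicks_per_hour = 19

# Average clicks per day across all households.
clicks_per_day = households * viewing_hours_per_day * clicks_per_hour

# Spread evenly over 24 hours (an assumption; real viewing peaks in the evening).
clicks_per_second = clicks_per_day / 86_400

print(f"{clicks_per_day:,.0f} clicks/day")            # 9,310,000,000 clicks/day
print(f"{clicks_per_second:,.0f} clicks/second avg")  # 107,755 clicks/second avg
```

That is roughly 9.3 billion clicks per day, or on the order of 100,000 per second on average, with peak rates presumably well above that.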