A new model and architecture for data stream management
Why on earth would one need it? Data Stream Management
The Problem: Tokyo Traffic Control
Stream Processing for Traffic Control: 24-hour real-time control of traffic intersections and traffic signals. Input: cameras, helicopters, police, citizen reports, vehicle detectors, onboard vehicle sensors; traffic jams, accidents & closed streets. Output: central monitors, 300 traffic information boards, digital speed signs, route signs. Affectors: adjusted traffic signal lights (7,000), communications with officers on site.
TTC: Center Display Board
TTC: Information Board
Example Domains Smart Energy Grid Management Network Traffic Management System Monitoring Road Traffic Monitoring Military Logistics Online Auctions Habitat Monitoring Immersive Environments
Stream Processing Engines HADP vs DAHP Events & Triggers Continuous Queries Real-time processing Transient data Lossy information
Overview Aurora
The Topic Aurora The prototype DBMS / SPE / DSMS UI The query language The project The authors
The Authors. M.I.T., Department of EECS and Laboratory of Computer Science: Michael Stonebraker. Brandeis University, Department of Computer Science: Daniel J. Abadi, Mitch Cherniack. Brown University, Department of Computer Science: Don Carney, Uğur Çetintemel, Christian Convey, Sangdon Lee, Nesime Tatbul, Stan Zdonik.
Talk Overview Stream Processing Engines SQuAl Runtime Related work
SQuAl (Stream Query Algebra) Aurora
SQuAl Overview Connection Points Models Continuous Query View Ad-hoc Query Operators Order-agnostic Order-sensitive
SQuAl Operators Order-agnostic Filter Map Union Order-sensitive BSort Aggregate Join Resample Quirks!
Union (Unordered)
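A minimal Python sketch of what an order-agnostic Union box might look like, assuming it merges two or more streams that share a schema and makes no promise about output order. The name `union` and the round-robin interleaving are illustrative choices, not Aurora's implementation.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel marking an exhausted input stream


def union(*streams):
    """Merge several streams into one; output order is not guaranteed.

    This sketch interleaves the inputs round-robin, which is only one of
    many valid orders for an order-agnostic operator.
    """
    for group in zip_longest(*streams, fillvalue=_MISSING):
        for tup in group:
            if tup is not _MISSING:
                yield tup


cameras = [{"src": "camera", "road": "A"}, {"src": "camera", "road": "B"}]
reports = [{"src": "citizen", "road": "C"}]
merged = list(union(cameras, reports))
```

Because no order is promised, a consumer should only rely on the merged stream containing every input tuple exactly once.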
BSort (Ordered)
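The idea behind BSort can be sketched in Python as a bounded-buffer approximate sort. This assumes a buffer of `slack + 1` tuples from which the smallest is evicted whenever the buffer fills; the function name, the `slack` parameter and the final flush (needed only because the demo stream is finite) are assumptions for illustration.

```python
import heapq
from itertools import count


def bsort(stream, key, slack):
    """Approximately sort a stream using a buffer of slack + 1 tuples."""
    buf = []
    seq = count()  # tie-breaker so the tuples themselves are never compared
    for tup in stream:
        heapq.heappush(buf, (key(tup), next(seq), tup))
        if len(buf) > slack:  # buffer is full: emit the current minimum
            yield heapq.heappop(buf)[2]
    while buf:  # flush the remainder once the finite demo input ends
        yield heapq.heappop(buf)[2]


nearly_sorted = list(bsort([1, 3, 2, 5, 4], key=lambda x: x, slack=1))
too_disordered = list(bsort([3, 2, 1], key=lambda x: x, slack=1))
```

With `slack=1` the first input comes out fully sorted, while `[3, 2, 1]` comes out as `[2, 1, 3]`: the sort is only as good as the buffer is large, which is the trade-off a stream engine accepts to avoid blocking forever.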
SQuAl: Example
Runtime Aurora
Query Optimization Dynamic Continuous Query Optimization Inserting projections Combining boxes Reordering boxes Ad-hoc query optimization
Real-time Scheduling Timestamped Tuples Train scheduling Interbox nonlinearities Intrabox nonlinearities Superboxes Introspection Static Run-time
Handling overload QoS specifications Response times Tuple drops Values produced Load Shedding Not Implemented at the time
Related work Aurora
Related work STREAM Stanford University, Telegraph UC Berkeley, ? SASE UC Berkeley / UMass Amherst, ? Cayuga Cornell University, ? PIPES University of Marburg, ? NiagaraCQ University of Wisconsin-Madison,
Aurora’s Evolution Timespan Project Aurora (and Aurora*) Medusa Borealis (Medusa + Aurora*) 2003-present StreamBase (Commercialized)
Complex Event Processing Today Oracle Oracle CEP Microsoft MS SQL Server StreamInsight Open Source OpenPDC Aleri Coral8 TruViso StreamBase Aurora’s Grandchild IBM SPADE Active Middleware Technology
Summary SPEs address different problems e.g. dynamic realtime monitoring Data Active, Human Passive Realtime, transient, even lossy data Aurora evolved into StreamBase SQuAl evolved into StreamSQL Many production-quality alternatives
Filter (Unordered)
Map (Unordered)
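Filter and Map can be sketched as Python generators. The sketch assumes Filter routes each tuple to the output of the first predicate it satisfies, with one extra output for tuples that match none, and that Map computes new fields from each tuple like a generalized projection. The names `filter_box` and `map_box` and the dict-based tuples are hypothetical.

```python
def filter_box(stream, *predicates):
    """Yield (output_index, tuple) pairs.

    A tuple goes to the output of the first predicate it satisfies;
    index len(predicates) collects tuples that satisfy none.
    """
    for tup in stream:
        for i, pred in enumerate(predicates):
            if pred(tup):
                yield i, tup
                break
        else:
            yield len(predicates), tup


def map_box(stream, **assignments):
    """Build one output tuple per input tuple from named field functions."""
    for tup in stream:
        yield {name: fn(tup) for name, fn in assignments.items()}


readings = [{"road": "A", "kmh": 20}, {"road": "B", "kmh": 80}]
routed = list(filter_box(readings, lambda t: t["kmh"] < 30))
slow = [t for i, t in routed if i == 0]
labelled = list(map_box(slow, road=lambda t: t["road"], jam=lambda t: True))
```

Neither operator looks at more than one tuple at a time, which is why both work regardless of arrival order.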
Aggregate (Ordered)
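A rough Python sketch of a windowed aggregate over an ordered stream. It simplifies by counting tuples rather than using values of an ordering attribute, and it assumes `advance <= size`; the parameter names `size` and `advance` are illustrative and not taken from the slides.

```python
from collections import deque


def aggregate(stream, fn, size, advance):
    """Apply fn to a sliding window of `size` tuples, moving `advance` at a time."""
    if not 1 <= advance <= size:
        raise ValueError("this sketch requires 1 <= advance <= size")
    window = deque(maxlen=size)  # the oldest tuple falls out automatically
    since_emit = 0
    for tup in stream:
        window.append(tup)
        since_emit += 1
        if len(window) == size and since_emit >= advance:
            yield fn(list(window))
            since_emit = 0


sliding = list(aggregate([1, 2, 3, 4, 5], sum, size=3, advance=1))
hopping = list(aggregate([1, 2, 3, 4, 5], sum, size=3, advance=2))
```

The operator is order-sensitive because the contents of each window depend on arrival order; feeding it an unsorted stream changes every result.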
Join (Ordered)
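A deliberately naive Python sketch of a windowed join. It assumes the operator pairs tuples from two streams whose ordering attributes lie within a given distance and which satisfy a predicate. It works on finite lists with a nested loop, not incrementally as a real engine would, and all names are hypothetical.

```python
def join(left, right, key, size, predicate=lambda l, r: True):
    """Pair tuples whose keys differ by at most `size` and satisfy `predicate`.

    `left` and `right` must be finite, re-iterable sequences in this sketch.
    """
    for l in left:
        for r in right:
            if abs(key(l) - key(r)) <= size and predicate(l, r):
                yield (l, r)


detectors = [{"t": 1, "road": "A"}, {"t": 5, "road": "B"}]
reports = [{"t": 2, "road": "A"}, {"t": 9, "road": "B"}]
pairs = list(
    join(
        detectors,
        reports,
        key=lambda tup: tup["t"],
        size=2,
        predicate=lambda l, r: l["road"] == r["road"],
    )
)
```

The distance bound is what keeps a join over unbounded streams feasible: only tuples close together in the ordering ever need to be held at once.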
Resample (Ordered) Based on RRDTool’s philosophy? Paper: Simple interpolation Use The Force, Read The Source: Average Count Sum Max Min LastVal
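A small Python sketch of resampling: for each timestamp of one stream it computes a value from the tuples of a second stream that fall within a window around it. The helper functions mirror three of the names listed on the slide (Average, Max, LastVal), but their definitions here, along with `resample` and its parameters, are assumptions rather than Aurora's source.

```python
def average(values):
    return sum(values) / len(values)


def last_val(values):
    return values[-1]


def resample(left_times, right, size, fn):
    """For each time t in left_times, apply fn to nearby right-stream values.

    `right` is a finite list of (time, value) pairs ordered by time; times
    with no right-stream tuple within `size` produce no output.
    """
    for t in left_times:
        window = [value for (ts, value) in right if abs(ts - t) <= size]
        if window:
            yield (t, fn(window))


sensor = [(0, 10.0), (2, 20.0), (4, 40.0)]
averaged = list(resample([1, 3], sensor, size=1, fn=average))
latest = list(resample([1, 3], sensor, size=1, fn=last_val))
peak = list(resample([1, 3], sensor, size=1, fn=max))
```

Averaging the two neighbours of each sample point gives the same result as linear interpolation here only because the sample points sit exactly halfway between readings; in general the choice of function changes the output.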