MPDS 2003 San Diego
Reducing Execution Overhead in a Data Stream Manager
Don Carney, Brown University
Uğur Çetintemel, Brown University
Mitch Cherniack, Brandeis University
Alex Rasin, Brown University
Michael Stonebraker, MIT
Stan Zdonik, Brown University
Aurora from the Sky
[Figure: Aurora overview; labels: Queries, and three applications each marked App QoS]
Runtime Operation: Basic Architecture
[Figure: runtime components Router, Scheduler, Box Processors, QoS Monitor, Catalog, Storage Manager, Buffer, and Persistent Store; queues q1, q2, …, qi, …, qn; inputs and outputs]
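To make the data flow between these components concrete, here is a minimal single-threaded sketch under stated assumptions. The class name `MiniRuntime`, the "longest queue first" choice, and the rule that the Router either places a tuple on the next box's queue or emits it as output are illustrative, not Aurora's actual interfaces. Threads, the QoS Monitor, the Catalog, and persistence are omitted.

```python
from collections import deque


class MiniRuntime:
    """Hypothetical single-threaded sketch of the runtime loop (not Aurora's API)."""

    def __init__(self, boxes, successors):
        self.boxes = boxes              # box name -> per-tuple function
        self.successors = successors    # box name -> next box name, or None for an application output
        self.queues = {name: deque() for name in boxes}  # q1 ... qn, held by the storage layer
        self.outputs = []               # tuples delivered to applications

    def route(self, target, tup):
        # Router (assumed behavior): enqueue for the next box, or emit as output.
        if target is None:
            self.outputs.append(tup)
        else:
            self.queues[target].append(tup)

    def step(self):
        # Scheduler (assumed policy): run the box with the longest non-empty queue.
        ready = [name for name, q in self.queues.items() if q]
        if not ready:
            return False
        name = max(ready, key=lambda n: len(self.queues[n]))
        tup = self.queues[name].popleft()
        # Box processor: apply the box, then hand the result back to the router.
        self.route(self.successors[name], self.boxes[name](tup))
        return True
```

Feeding tuples 1, 2, 3 into box A of a two-box pipeline A -> B (add 1, then multiply by 10) via `route("A", t)` and calling `step()` until it returns `False` delivers the outputs 20, 30, 40.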
Execution Model
- Traditional thread-driven execution
  - Thread per query or operator
  - Resource management done by the OS
  - Easy to program
  - Scalability problems
- State-based execution
  - Single scheduler thread maintains an execution queue
  - Small number of worker threads execute execution-queue entries
  - Enables application-specific allocation of resources
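State-based execution can be sketched with Python's standard `queue` and `threading` modules. This is a hypothetical illustration, not Aurora's code: `run_state_based`, the per-box `pending` lists, and the "largest backlog first" policy are assumptions made for the example. One scheduler thread reads explicit state and fills the execution queue; a fixed pool of worker threads drains it.

```python
import queue
import threading


def run_state_based(boxes, pending, num_workers=2):
    """Hypothetical sketch of state-based execution.

    boxes:   dict box_name -> function applied to one tuple
    pending: dict box_name -> list of input tuples waiting for that box
    Returns dict box_name -> list of output tuples.
    """
    exec_queue = queue.Queue()          # execution queue filled by the scheduler
    results = {name: [] for name in boxes}
    lock = threading.Lock()             # protects `results`
    STOP = None                         # sentinel telling one worker to exit

    def worker():
        while True:
            entry = exec_queue.get()
            if entry is STOP:
                break
            name, tuples = entry
            out = [boxes[name](t) for t in tuples]
            with lock:
                results[name].extend(out)

    def scheduler():
        # The single scheduler thread decides what runs next from explicit
        # state (here: backlog size) instead of leaving it to the OS.
        order = sorted(pending, key=lambda n: len(pending[n]), reverse=True)
        for name in order:
            exec_queue.put((name, list(pending[name])))
        for _ in range(num_workers):    # one sentinel per worker, after all work
            exec_queue.put(STOP)

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()
    sched = threading.Thread(target=scheduler)
    sched.start()
    sched.join()
    for w in workers:
        w.join()
    return results
```

The sketch starts `num_workers + 1` threads no matter how many boxes or queries are registered, which is the scalability contrast with a thread per query or operator.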
State-Based vs. Thread-Based
Scheduling
- Two-level scheduling
  - Inter-query scheduling (which query?)
  - Intra-query scheduling (operation order?)
- Batching
  - Tuple trains
    - Fewer box executions -> fewer scheduling decisions
    - Also, better memory utilization
  - Superbox scheduling
    - Multiple boxes per decision -> fewer scheduling decisions
    - Memory utilization: allocate for the entire superbox at once
- State monitoring (# tuples, latencies, etc.)
  - Incremental and approximate
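Two-level scheduling with batching can be sketched as follows, under assumptions: each query is a linear pipeline of per-tuple functions, the inter-query choice uses a hypothetical numeric `priority` field, and the intra-query order is simply upstream to downstream. One decision runs all of a query's boxes (treating the query as a single superbox) on a train of up to `train_size` tuples, so larger trains mean fewer scheduling decisions. The names `Query`, `schedule_once`, and `run_all` are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Query:
    name: str
    priority: float                              # hypothetical inter-query priority
    boxes: list                                  # [(box_name, fn), ...] in pipeline order
    inputs: list = field(default_factory=list)   # tuples waiting at the first box


def schedule_once(queries, train_size):
    """One scheduling decision.

    Level 1 (inter-query): pick the runnable query with the highest priority.
    Level 2 (intra-query): push a train of up to `train_size` tuples through
    that query's boxes from upstream to downstream.
    Returns (query_name, outputs), or None when no tuples are waiting.
    """
    runnable = [q for q in queries if q.inputs]
    if not runnable:
        return None
    q = max(runnable, key=lambda x: x.priority)              # inter-query decision
    train, q.inputs = q.inputs[:train_size], q.inputs[train_size:]
    for _box_name, fn in q.boxes:                              # intra-query order
        train = [fn(t) for t in train]                         # one box execution per train
    return q.name, train


def run_all(queries, train_size):
    """Drain all queries; return (number of decisions, outputs per query)."""
    decisions, emitted = 0, {}
    while (step := schedule_once(queries, train_size)) is not None:
        decisions += 1
        name, out = step
        emitted.setdefault(name, []).extend(out)
    return decisions, emitted
```

With one query holding four tuples and another holding two, `train_size=2` needs 3 decisions and `train_size=1` needs 6, while the outputs are identical.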
Runtime Operation
Scheduling: Minimizing Per-Tuple Processing Overhead
Train scheduling for a two-box query A -> B with input tuples x, y, z:
- Without trains: A(x), A(y), A(z), B(A(x)), B(A(y)), B(A(z)), each a separate scheduler action
- Box trains (A and B run as superbox AB): B(A(x)), B(A(y)), B(A(z))
- Tuple trains: A(z, y, x), then B(A(z), A(y), A(x))
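The three strategies can be compared by counting scheduler actions in a toy model. This sketch assumes a linear pipeline of per-tuple functions and counts one action per scheduler decision; the function `run` and the strategy names are hypothetical, and the counts describe this model, not measured Aurora overheads.

```python
from typing import Callable, List, Tuple

Box = Tuple[str, Callable[[int], int]]


def run(boxes: List[Box], tuples: List[int], strategy: str):
    """Push `tuples` through the pipeline `boxes` (A -> B -> ...).

    Returns (scheduler_actions, outputs).
    """
    actions = 0
    if strategy == "per_tuple":
        # A(x), A(y), A(z), B(A(x)), ...: one action per box per tuple.
        values = list(tuples)
        for _, fn in boxes:
            for i, v in enumerate(values):
                values[i] = fn(v)
                actions += 1
    elif strategy == "box_train":
        # B(A(x)), B(A(y)), B(A(z)): one action runs every box on one tuple.
        values = []
        for v in tuples:
            for _, fn in boxes:
                v = fn(v)
            values.append(v)
            actions += 1
    elif strategy == "tuple_train":
        # A(z, y, x), then B(...): one action runs one box on the whole train.
        values = list(tuples)
        for _, fn in boxes:
            values = [fn(v) for v in values]
            actions += 1
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return actions, values
```

For boxes A (add 1) and B (double) and tuples 1, 2, 3, all three strategies return [4, 6, 8], using 6, 3, and 2 scheduler actions respectively. In general the counts are boxes × tuples, tuples, and boxes.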
Tuple Trains and Superboxes
Overheads
Other Issues
- Priority assignment
- Box execution order
- QoS
Stay Tuned!
- SIGMOD demo
- VLDB ’03 paper: “Operator Scheduling in a Data Stream Environment”
A Little Closer
[Figure: two applications, each marked App QoS]
Aurora from the Sky
[Figure: Aurora overview; labels: Query (twice), and two applications each marked App QoS]