1 Dynamically Adaptive Distributed System for Processing CompleX Continuous Queries
Bin Liu, Yali Zhu, Mariana Jbantova, Brad Momberger, and Elke A. Rundensteiner
VLDB’05, August 31st, 2005
Presented by Yali Zhu
Department of Computer Science, Worcester Polytechnic Institute, U.S.A.
2 Uncertainties in Stream Query Processing
[Figure labels: Register Continuous Queries, Stream Query Engine, Streaming Data, Streaming Result]
- Real-time and accurate responses are required.
- Streaming data may have time-varying rates and high volumes.
- High workload of queries.
- Memory and CPU resource limitations.
- Available resources for executing each operator may vary over time.
- Distribution and run-time adaptations are required.
3 DAX (D-CAPE) System Architecture
[Architecture diagram. Component labels: End User, Application Server, Stream Generator, Streaming Data, Network, Distribution Manager, Runtime Monitor, Global Adaptation Controller, Global Plan Migrator, Query Plan Manager, Connection Manager, Repository, CAPE (Continuous Query Processing Engine), Data Receiver, Data Distributor, Query Processor, Local Statistics Gatherer, Local Adaptation Controller, Local Plan Migrator, Repository]
4 Distributed Adaptation Techniques
- Workload Relocation
  - Operator-level
  - Partition-level
- Query Plan Reshaping
- Data Spilling
5 Initial Distribution
[Diagram labels: Distribution Manager, machines, stream sources, Application]
Distribution table:
Operator     Processor
Operator 1   QP 1
Operator 2   QP 1
Operator 3   QP 2
Operator 4   QP 2
Operator 5   QP 1
Operator 6   QP 1
Operator 7   QP 2
Operator 8   QP 2
6 Workload Relocation – Operator-level
[Diagram labels: Distribution Manager, machines, stream sources, Application]
Updated distribution table entry:
Operator     Processor
Operator 7   QP 1   (QP 2 in the initial distribution)
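The operator-level relocation on slides 5 and 6 can be pictured as a single update to the distribution table. The sketch below is illustrative only: the class `DistributionTable` and its methods are hypothetical names, not part of DAX or CAPE, and it models nothing beyond the table (no state transfer, no reconnection of streams).

```python
from dataclasses import dataclass, field


@dataclass
class DistributionTable:
    """Hypothetical operator -> query processor mapping (not the DAX API)."""
    assignment: dict = field(default_factory=dict)

    def load(self):
        """Count how many operators each query processor currently runs."""
        counts = {}
        for processor in self.assignment.values():
            counts[processor] = counts.get(processor, 0) + 1
        return counts

    def relocate(self, operator, target):
        """Reassign one operator and return the processor it came from."""
        if operator not in self.assignment:
            raise KeyError(f"unknown operator: {operator}")
        source = self.assignment[operator]
        self.assignment[operator] = target
        return source


# Initial distribution as listed on slide 5.
table = DistributionTable({
    "Operator 1": "QP 1", "Operator 2": "QP 1",
    "Operator 3": "QP 2", "Operator 4": "QP 2",
    "Operator 5": "QP 1", "Operator 6": "QP 1",
    "Operator 7": "QP 2", "Operator 8": "QP 2",
})

# Operator-level relocation as shown on slide 6.
previous = table.relocate("Operator 7", "QP 1")
```

In a real system the relocation would also have to move the operator's state and redirect its input and output streams; the table update is only the bookkeeping part.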
7 Workload Relocation – Partition-level
[Diagram labels: input streams A, B, C; Split A, Split B, Split C; machines m1, m2]
Problem of operator-level adaptation:
- Operators have large states.
- Moving them across machines can be expensive.
Solution: partition-level adaptation
- Partition state-intensive operators [Gra90, SH03, LR05].
- Distribute the partitioned plan across multiple machines.
Open question: How to partition and relocate multi-way joins at run time?
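A minimal sketch of the partition-level idea, under strong assumptions: all input streams are joined on one shared key, partitions are chosen by hashing that key, and relocation moves the buffered tuples of a single partition. The names `partition_of` and `PartitionedJoinState` are hypothetical, and the slide's open question (how to do this for general multi-way joins at run time) is not addressed here.

```python
import zlib
from collections import defaultdict


def partition_of(join_key, num_partitions):
    """Deterministically map a join key to a partition id."""
    return zlib.crc32(str(join_key).encode("utf-8")) % num_partitions


class PartitionedJoinState:
    """Hypothetical partitioned state of a join spread over several machines."""

    def __init__(self, num_partitions, machines):
        self.num_partitions = num_partitions
        # Assumed initial placement: round-robin over the machines.
        self.owner = {p: machines[p % len(machines)]
                      for p in range(num_partitions)}
        # state[machine][partition][stream] -> list of buffered tuples
        self.state = {m: defaultdict(lambda: defaultdict(list))
                      for m in machines}

    def split(self, stream, join_key, payload):
        """Route one tuple to the machine that owns its partition."""
        p = partition_of(join_key, self.num_partitions)
        machine = self.owner[p]
        self.state[machine][p][stream].append((join_key, payload))
        return machine

    def relocate(self, partition, target):
        """Move one partition's state to target; return tuples moved."""
        if target not in self.state:
            raise KeyError(f"unknown machine: {target}")
        source = self.owner[partition]
        if source == target:
            return 0
        moved = self.state[source].pop(partition, None)
        self.owner[partition] = target
        if moved is None:
            return 0
        self.state[target][partition] = moved
        return sum(len(tuples) for tuples in moved.values())
```

Because every stream uses the same partitioning function, tuples from A, B, and C with equal keys always land in the same partition, so moving one partition moves a self-contained slice of the join state rather than the whole operator.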
8 Dynamic Plan Reshaping and Migration
[Diagram labels: Distribution Manager; machines M1 and M2; operators op1, op2, op3, op4, shown before and after reshaping]
Migration protocol: 11-way handshaking
- How does the protocol guarantee correct query results?
- How to integrate with across-machine workload relocation?
9 State Spill
[Diagram labels: input streams A, B, C; Secondary Storage]
- Push part of the operator state onto disk.
- Quick relief of the memory overflow problem.
- How to keep high run-time query throughput?
- How to integrate with across-machine workload relocation?
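The spill idea on this slide can be sketched as follows. This is an illustration under assumptions, not the DAX implementation: state is counted in tuples rather than bytes, the victim is always the largest in-memory partition (a policy chosen here for simplicity), and `SpillableState` is a hypothetical name. It uses only the Python standard library.

```python
import os
import pickle
import tempfile


class SpillableState:
    """Hypothetical operator state that spills partitions to disk."""

    def __init__(self, max_tuples_in_memory, spill_dir):
        if max_tuples_in_memory < 0:
            raise ValueError("max_tuples_in_memory must be >= 0")
        self.max_tuples = max_tuples_in_memory
        self.spill_dir = spill_dir
        self.memory = {}   # partition id -> list of in-memory tuples
        self.spilled = {}  # partition id -> list of spill file paths

    def in_memory_count(self):
        return sum(len(tuples) for tuples in self.memory.values())

    def insert(self, partition, tup):
        """Buffer a tuple, spilling partitions while over the memory budget."""
        self.memory.setdefault(partition, []).append(tup)
        while self.in_memory_count() > self.max_tuples:
            self._spill_largest()

    def _spill_largest(self):
        # Assumed policy: evict the partition holding the most tuples.
        victim = max(self.memory, key=lambda p: len(self.memory[p]))
        tuples = self.memory.pop(victim)
        fd, path = tempfile.mkstemp(prefix=f"part{victim}_",
                                    suffix=".spill", dir=self.spill_dir)
        with os.fdopen(fd, "wb") as f:
            pickle.dump(tuples, f)
        self.spilled.setdefault(victim, []).append(path)

    def read_spilled(self, partition):
        """Load every spilled tuple of a partition back from disk."""
        result = []
        for path in self.spilled.get(partition, []):
            with open(path, "rb") as f:
                result.extend(pickle.load(f))
        return result
```

Spilling frees memory immediately, but results that depend on spilled tuples are delayed until those tuples are read back, which is why the slide asks how to keep run-time throughput high.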
10 Summary
Key words:
- Distributed system
- Continuous queries (multi-way joins)
- Various unique run-time adaptation techniques
Demo sessions: Wednesday 2-3:30, Friday 9-10