Slide 1: DCAPE: Distributed and Self-Tuned Continuous Query Processing

Tim Sutherland, Bin Liu, Mariana Jbantova, and Elke A. Rundensteiner
Department of Computer Science, Worcester Polytechnic Institute
100 Institute Road, Worcester, MA 01609
Tel: 1-508-831-5857, Fax: 1-508-831-5776
{tims, binliu, jbantova, rundenst}@cs.wpi.edu
CIKM'05 Poster
http://davis.wpi.edu/dsrg/CAPE/index.html
Slide 2: Uncertainties in Stream Query Processing

[Diagram: end users register continuous queries with a distributed stream query engine, which consumes streaming data and delivers streaming results back as answers.]

- Streaming data may have time-varying rates and high volumes.
- Real-time and accurate responses are required.
- The query workload is high, while memory and CPU resources are limited.
- The resources available for executing each operator may vary over time.
- Distribution and adaptation are therefore required.
Slide 3: Adaptation in DCAPE: Distributed Stream Processing in a Nutshell

Adaptation techniques:
- Spilling data to disk
- Relocating work to other machines
- Reoptimizing and migrating the query plan

Granularity of adaptation:
- Operator-level distribution and adaptation
- Partition-level distribution and adaptation

Integrated methodologies (see the sketch after this list):
- Consider the trade-off between spilling and redistributing
- Consider the trade-off between migrating and redistributing
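The spill-versus-redistribute trade-off can be pictured as a cost comparison under memory pressure. A minimal Python sketch, where the choose_adaptation function, the 0.8 threshold, and the cost terms are all illustrative assumptions (the poster does not give the actual decision rule):

```python
def choose_adaptation(mem_used, mem_capacity, relocate_cost, spill_cost):
    """Illustrative spill-vs-relocate decision (assumed, not DCAPE's rule).

    relocate_cost: estimated one-time network cost of moving operator
                   state to another machine.
    spill_cost:    estimated later disk I/O cost of writing state out
                   and re-reading it for cleanup.
    """
    if mem_used < 0.8 * mem_capacity:
        return "no action"  # enough memory headroom, keep running
    # Under pressure, pick whichever adaptation is cheaper overall.
    return "relocate" if relocate_cost < spill_cost else "spill"
```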
Slide 4: CAPE System Architecture [LZ+05, TLJ+05]

[Architecture diagram: streaming data flows from stream servers over the network into CAPE (the Continuous Query Processing Engine) nodes, each comprising a Data Receiver, Data Distributor, Query Processor, Local Statistics Gatherer, Local Adaptation Controller, and Local Plan Migrator. A Distribution Manager hosts the global components: Runtime Monitor, Global Adaptation Controller, Global Plan Migrator, Query Plan Manager, Connection Manager, and Repository. Results stream to end users through application servers.]
Slide 5: Initial Distribution Policies

[Diagram: a query plan spread over machines M1, M2, M3 under a random distribution, a balanced distribution, and a network-aware distribution.]

Balanced distribution:
- Goal: equalize the workload per machine.
- Algorithm: iteratively take each query operator and place it on the query processor that currently holds the fewest operators.

Network-aware distribution:
- Goal: minimize network connectivity between machines.
- Algorithm: take each query plan and create sub-plans in which neighbouring operators are grouped together on the same machine.
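The two non-random policies can be rendered as the following Python sketch. This is illustrative only, not the actual CAPE code; the function names and the even-chunking heuristic for the network-aware cut are assumptions:

```python
def balanced_distribution(operators, machines):
    """Balanced: place each operator on the machine currently
    holding the fewest operators."""
    assignment = {m: [] for m in machines}
    for op in operators:
        least_loaded = min(machines, key=lambda m: len(assignment[m]))
        assignment[least_loaded].append(op)
    return assignment

def network_aware_distribution(query_plan, machines):
    """Network-aware: cut the pipeline into contiguous sub-plans so
    neighbouring operators share a machine, minimizing cross-machine
    stream connections."""
    assignment = {m: [] for m in machines}
    chunk = max(1, len(query_plan) // len(machines))
    for i, m in enumerate(machines):
        start = i * chunk
        end = (i + 1) * chunk if i < len(machines) - 1 else len(query_plan)
        assignment[m].extend(query_plan[start:end])
    return assignment

# Example: eight pipelined operators over two machines.
plan = [f"Op{i}" for i in range(1, 9)]
print(network_aware_distribution(plan, ["M1", "M2"]))
# {'M1': ['Op1', 'Op2', 'Op3', 'Op4'], 'M2': ['Op5', 'Op6', 'Op7', 'Op8']}
```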
Slide 6: Initial Distribution of a Query Plan Across a Cluster of Machines

The Distribution Manager performs two steps (sketched below):
- Step 1: Create the distribution table using an initial distribution algorithm.
- Step 2: Send the distribution information to the processing machines (nodes).

Distribution table:

  Operator     Machine
  Operator 1   M1
  Operator 2   M1
  Operator 3   M2
  Operator 4   M2
  Operator 5   M1
  Operator 6   M1
  Operator 7   M2
  Operator 8   M2

[Diagram: the stream source feeds operators 1-8; operators 1, 2, 5, 6 run on M1 and operators 3, 4, 7, 8 on M2, with results flowing to the application.]
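A sketch of these two steps, reusing the balanced_distribution function above and assuming a hypothetical send(machine, message) transport; the class shape is illustrative, not the real Distribution Manager interface:

```python
class DistributionManager:
    """Illustrative two-step start-up of the Distribution Manager."""

    def __init__(self, machines, policy):
        self.machines = machines  # e.g. ["M1", "M2"]
        self.policy = policy      # e.g. balanced_distribution
        self.table = {}           # operator -> machine

    def create_table(self, operators):
        # Step 1: run the initial distribution algorithm.
        assignment = self.policy(operators, self.machines)
        self.table = {op: m for m, ops in assignment.items() for op in ops}
        return self.table

    def deploy(self, send):
        # Step 2: tell each node which operators it should activate.
        for m in self.machines:
            ops = [op for op, mach in self.table.items() if mach == m]
            send(m, {"activate": ops})
```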
Slide 7: Run-Time Plan Redistribution

- Operators are redistributed based on a redistribution policy.
- The cost of a machine is the percentage of its memory filled with tuples.
- Redistribution policies in CAPE: Balance and Degradation.

Statistics table (machine capacity: 4500 tuples):

  Machine  Tuples
  M1       2000
  M2       4100

Cost table (current):

  Machine  Cost  Operator costs
  M1       .44   Op 1: .15, Op 2: .15, Op 5: .15, Op 6: .15
  M2       .91   Op 3: .3, Op 4: .2, Op 7: .3, Op 8: .2

Cost table (desired, after moving Op 7 from M2 to M1):

  Machine  Cost  Operator costs
  M1       .71   Op 1: .25, Op 2: .25, Op 5: .25, Op 6: .25
  M2       .64   Op 3: .4, Op 4: .3, Op 8: .3
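A Balance-style move can be sketched on these numbers. The function names and the single-move, half-the-gap heuristic are assumptions; the poster does not give the exact algorithm:

```python
CAPACITY = 4500  # tuples per machine, from the statistics table above

def cost(tuples_on_machine):
    # A machine's cost = fraction of its memory filled with tuples.
    return tuples_on_machine / CAPACITY

def balance_step(machine_tuples, operator_tuples, table):
    """One move of a Balance-style policy: shift the operator whose
    state best fills half the load gap from the most- to the
    least-loaded machine."""
    hot = max(machine_tuples, key=machine_tuples.get)
    cold = min(machine_tuples, key=machine_tuples.get)
    half_gap = (machine_tuples[hot] - machine_tuples[cold]) / 2
    candidates = [op for op, m in table.items() if m == hot]
    best = min(candidates, key=lambda op: abs(operator_tuples[op] - half_gap))
    table[best] = cold
    machine_tuples[hot] -= operator_tuples[best]
    machine_tuples[cold] += operator_tuples[best]
    return best

# On the slide's numbers: M1 = 2000 and M2 = 4100 tuples, so the gap is
# 2100. Op 7 holds .3 of M2's tuples (~1230), closest to half the gap
# (tied with Op 3); moving it yields cost(3230) ~= .71 for M1 and
# cost(2870) ~= .64 for M2 -- the desired cost table above.
```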
Slide 8: Redistribution Protocol Across Machines

The protocol guarantees that redistribution is seamless:
- No tuples are lost.
- No duplicates are produced.
- No incorrect results are produced.
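The poster states these guarantees but not the protocol steps. A generic pause/drain/move/resume sketch that would deliver them, with every method name assumed and no claim that this is DCAPE's actual protocol:

```python
def move_operator(op, src, dst):
    """Generic operator-move protocol sketch (all methods hypothetical)."""
    src.pause_input(op)              # stop feeding new tuples to op
    src.drain(op)                    # finish in-flight tuples: no tuple lost
    state = src.extract_state(op)    # hand state off exactly once: no duplicates
    dst.install_operator(op, state)  # same state, same logic: correct results
    dst.reroute_streams(op)          # upstream/downstream now point at dst
    dst.resume(op)                   # processing continues seamlessly
```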
Slide 9: Experimental Results: Distribution and Redistribution Algorithms

[Chart: query plan performance for a query plan of 40 operators under the different distribution and redistribution algorithms.]

Observations:
- The initial distribution is important for query plan performance.
- Redistribution improves query plan performance at run time.
Slide 10: From Operator- to Partition-Level Adaptation

Problem with operator-level adaptation:
- Operators have large states.
- Moving them across machines can be expensive.

Solution: partition-level adaptation (sketched below):
- Partition state-intensive operators [Gra90, SH03, LR05].
- Distribute the partitioned plan across multiple machines.

[Diagram: inputs A, B, C each pass through a Split operator that routes their partitions to machines m1-m4; each machine runs its own Join over its partitions, and a Union merges the per-machine outputs.]
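A sketch of the Split/Join/Union arrangement under an assumed hash partitioning of the join attribute (value-based partitioning in the spirit of the cited work; all names here are illustrative, not the actual CAPE operators):

```python
NUM_MACHINES = 4

def split(record, join_key):
    """Route a record to the machine owning its partition. Matching
    records from inputs A, B, and C hash to the same machine, so each
    machine can run a complete local Join over just its partitions."""
    return hash(record[join_key]) % NUM_MACHINES

def union(per_machine_outputs):
    """Merge the join outputs produced on each machine."""
    for output in per_machine_outputs:
        yield from output

# Because state is partitioned, a single partition (its tuples plus its
# share of operator state) can later be spilled or relocated on its
# own -- far cheaper than moving an entire state-intensive operator.
```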
Slide 11: CAPE Publications and Reports

[RDZ04] E. A. Rundensteiner, L. Ding, Y. Zhu, T. Sutherland and B. Pielech, "CAPE: A Constraint-Aware Adaptive Stream Processing Engine". Invited book chapter, http://www.cs.uno.edu/~nauman/streamBook/, July 2004.
[ZRH04] Y. Zhu, E. A. Rundensteiner and G. T. Heineman, "Dynamic Plan Migration for Continuous Queries over Data Streams". SIGMOD 2004, pages 431-442.
[DMR+04] L. Ding, N. Mehta, E. A. Rundensteiner and G. T. Heineman, "Joining Punctuated Streams". EDBT 2004, pages 587-604.
[DR04] L. Ding and E. A. Rundensteiner, "Evaluating Window Joins over Punctuated Streams". CIKM 2004, to appear.
[DRH03] L. Ding, E. A. Rundensteiner and G. T. Heineman, "MJoin: A Metadata-Aware Stream Join Operator". DEBS 2003.
[RDSZBM04] E. A. Rundensteiner, L. Ding, T. Sutherland, Y. Zhu, B. Pielech and N. Mehta, "CAPE: Continuous Query Engine with Heterogeneous-Grained Adaptivity". Demonstration paper, VLDB 2004.
[SR04] T. Sutherland and E. A. Rundensteiner, "D-CAPE: A Self-Tuning Continuous Query Plan Distribution Architecture". Tech report WPI-CS-TR-04-18, 2004.
[SPR04] T. Sutherland, B. Pielech, Y. Zhu, L. Ding and E. A. Rundensteiner, "Adaptive Multi-Objective Scheduling Selection Framework for Continuous Query Processing". IDEAS 2005.
[SLJR05] T. Sutherland, B. Liu, M. Jbantova and E. A. Rundensteiner, "D-CAPE: Distributed and Self-Tuned Continuous Query Processing". CIKM 2005, Bremen, Germany, Nov. 2005.
[LR05] B. Liu and E. A. Rundensteiner, "Revisiting Pipelined Parallelism in Multi-Join Query Processing". VLDB 2005.
[B05] B. Liu and E. A. Rundensteiner, "Partition-based Adaptation Strategies Integrating Spill and Relocation". Tech report WPI-CS-TR-05, 2005 (in submission).

CAPE Project: http://davis.wpi.edu/dsrg/CAPE/index.html