
Dynamic Plan Migration for Continuous Query over Data Streams Yali Zhu, Elke Rundensteiner and George Heineman Database System Research Group Worcester.


1 Dynamic Plan Migration for Continuous Query over Data Streams
Yali Zhu, Elke Rundensteiner and George Heineman
Database System Research Group, Worcester Polytechnic Institute, Massachusetts, USA
*Research partly supported by the RDC grant 2003-04 on "On-line Stream Monitoring Systems: Untethered Healthcare, Intrusion Detection, and Beyond."

2 Motivation (SIGMOD 2004)
Continuous query over streams
- Statistics unknown before start
- Statistics changing during execution
  - Stream rates, arrival pattern, distribution, etc.
Need for dynamic adaptation
- Plan re-optimization
  - Change the shape of the query plan tree

3 Run-time Plan Re-Optimization
Step 1 - Decide when to optimize
- Statistics monitoring
Step 2 - Generate new query plan
- Query optimization
Step 3 - Replace current plan by new plan
- Plan migration

4 Naïve Plan Migration Strategy
Migration steps
- Pause execution of old plan
- Drain out all tuples inside old plan
- Replace old plan by new plan
- Resume execution of new plan
[Figure: old plan and new plan, join trees over input streams A, B and C with join operators AB and BC]
Problem: Works for stateless operators only

5 Stateful Operator in CQ
Why stateful
- Need non-blocking operators in CQ
- Operator needs to output partial results
- State data structure keeps received tuples
[Figure: join operator AB with State A (ax) and State B (b1 to b5); joined outputs include ax b2 and ax b3]
Key Observation: The purge of tuples in states relies on the processing of new tuples.
Example: Symmetric NL join w/ window constraints
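The key observation can be illustrated with a minimal Python sketch of a symmetric nested-loop join under a time-based window. The class name, the `process` method, and the choice to purge only the opposite state when a tuple arrives are assumptions made for illustration, not CAPE's actual implementation. Timestamps are assumed to be non-decreasing.

```python
from collections import deque


class SymmetricWindowJoin:
    """Hypothetical symmetric nested-loop join with a time-based window."""

    def __init__(self, window, predicate):
        self.window = window
        self.predicate = predicate
        # Each state holds (timestamp, value) pairs, oldest first.
        self.state = {"A": deque(), "B": deque()}

    def process(self, side, ts, value):
        """Handle one arriving tuple: purge, probe, then insert."""
        other = "B" if side == "A" else "A"
        opposite = self.state[other]
        # Purge: expired tuples leave the opposite state only now,
        # i.e. purging is driven by the arrival of new tuples.
        while opposite and ts - opposite[0][0] > self.window:
            opposite.popleft()
        # Probe: join the new tuple against what remains.
        results = []
        for _, v in opposite:
            a, b = (value, v) if side == "A" else (v, value)
            if self.predicate(a, b):
                results.append((a, b))
        # Insert: remember the new tuple for future probes.
        self.state[side].append((ts, value))
        return results


join = SymmetricWindowJoin(window=10, predicate=lambda a, b: a == b)
join.process("A", 0, 1)          # State A now holds the tuple from time 0
print(join.process("B", 5, 1))   # within the window, so it joins
print(join.process("B", 20, 1))  # the old A tuple is purged before probing
```

In this sketch the tuple in State A disappears only because a later B tuple arrives. If the plan is paused and no new tuples are processed, nothing triggers the purge, so the state never empties on its own.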

6 Naïve Migration Strategy Revisited
Steps
(1) Pause execution of old plan
(2) Drain out all tuples inside old plan
(3) Replace old plan by new plan
(4) Resume execution of new plan
[Figure: plans over A, B and C with join operators AB and BC, annotated with (2) all tuples drained, (3) old replaced by new, (4) processing resumed]
Problem: Deadlock waiting

7 Problem Definition
Dynamic Plan Migration
- Input (two migration boxes)
  - One contains old plan
  - One contains new plan
  - Have same input and output queues
- Result
  - Old box is replaced by new box
Valid Migration
- No missing tuples
- No duplicates
Key points:
- Involved plans contain stateful operators
- Need to migrate yet still retain useful states and discard useless states.

8 State of the Art
"Efficient mid-query re-optimization of sub-optimal query execution plans"
- [Kabra, DeWitt 1998]
- Only migrates unprocessed portion
Query plan competing model
- [Ioannidis, Ng, et al. 1992] [Graefe, Cole 1994]
- Generate several candidate query plans before start
- Execute all, choose one after a while

9 Outline
Problem Motivation and Definition
Dynamic Migration Strategies
- Moving State Strategy
- Parallel Track Strategy
Experimental Results

10 Moving State Strategy
Basic idea
- Share common states between two migration boxes
Key steps
- State Matching
  - Match states based on IDs.
- State Moving
  - Create new pointers for matched states in new box
- What's left?
  - Unmatched states in new box
[Figure: Old Box with joins AB (states S_A, S_B), BC (states S_AB, S_C) and CD (states S_ABC, S_D); New Box with joins BC (states S_B, S_C), CD (states S_BC, S_D) and AB (states S_A, S_BCD); both boxes read input queues Q_A, Q_B, Q_C, Q_D and write output queue Q_ABCD]
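The matching and moving steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: state IDs are modelled as strings naming the streams a state covers (for example "BC"), states are plain lists, and the function name `match_states` and the sort used to order unmatched states are hypothetical.

```python
def match_states(old_states, new_state_ids):
    """Split the new box's states into moved (shared) and unmatched ones."""
    moved = {}
    unmatched = []
    for state_id in new_state_ids:
        if state_id in old_states:
            # State moving: the new box points at the existing object, no copy.
            moved[state_id] = old_states[state_id]
        else:
            unmatched.append(state_id)
    # Assumed ordering: states covering fewer streams are handled first.
    unmatched.sort(key=len)
    return moved, unmatched


# Old box ((A join B) join C) join D; state contents are placeholder lists.
old_states = {sid: [] for sid in ["A", "B", "AB", "C", "ABC", "D"]}
# New box A join ((B join C) join D).
new_state_ids = ["A", "BCD", "BC", "D", "B", "C"]

moved, unmatched = match_states(old_states, new_state_ids)
print(sorted(moved))  # states shared by pointer
print(unmatched)      # states left over in the new box
```

With the two boxes from the slide, the four input states match and are shared, while S_BC and S_BCD remain unmatched.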

11 Unmatched States
State Recomputing
- Recursively recompute unmatched S_BC and S_BCD from bottom up
Why always possible?
- Old and new boxes have same input queues
- The states associated with input queues always match
Why necessary?
[Figure: New Box with joins BC (states S_B, S_C), CD (states S_BC, S_D) and AB (states S_A, S_BCD); input queues Q_A, Q_B, Q_C, Q_D and output queue Q_ABCD]

12 Terms on Tuples
New/Old tuples
- Old: tuples already in the old box when migration starts
- New: tuples that do not exist in the old box when migration starts
Sub-tuples
- Tuple ABCD is the result of joining tuples A, B, C and D
- Tuples A, B, C and D are sub-tuples of tuple ABCD
- Tuple ABCD has 2^4 = 16 possible combinations of old/new sub-tuples
[Figure: plan with joins AB, BC and CD, states S_A, S_B, S_AB, S_C, S_ABC, S_D, input queues Q_A, Q_B, Q_C, Q_D and output queue Q_ABCD]

13 Why Recompute Unmatched States
To get the complete results of ABCD, we need all 16 old/new combinations
If S_BC is not recomputed, we will miss results with both B and C as OLD
[Figure: New Box with joins BC, CD and AB, states S_A, S_B, S_C, S_D, S_BC, S_BCD and input queues Q_A, Q_B, Q_C, Q_D; example A, B, CD tuples marked as old tuple or new tuple]
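The counting argument can be checked with a short Python enumeration. This is only a sketch of the combinatorics stated on the slide; the variable names are illustrative.

```python
from itertools import product

STREAMS = ("A", "B", "C", "D")

# Every way to label the four sub-tuples of an ABCD result as old or new.
combos = [dict(zip(STREAMS, labels))
          for labels in product(("old", "new"), repeat=len(STREAMS))]

# Cases that depend on S_BC already holding old B joined with old C.
missed = [c for c in combos if c["B"] == "old" and c["C"] == "old"]

print(len(combos))
print(len(missed))
```

Under this labelling, the cases with both B and C old form a quarter of the 16 combinations, and they would be lost if S_BC started empty.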

14 Cost Estimation of MS Migration
Cost of MS consists of
- Cost of state matching
  - ID comparison (negligible)
- Cost of state moving
  - Create pointers (negligible)
- Cost of state recomputing
  - Majority of cost
Affecting parameters
- Operator selectivities
- # of tuples in states
  - Estimated as (input rate x window size)
See paper for detailed cost models
One cost model conclusion: Cost of MS has a polynomial relation to window size
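A rough Python illustration of why the recomputation cost grows polynomially with the window size is given below. It is not the paper's cost model, which the slide defers to the paper. It assumes that S_BC is rebuilt by a nested-loop join of S_B and S_C, so the number of comparisons is the product of the two state sizes; the function names and rates are hypothetical.

```python
def estimate_state_size(input_rate, window):
    """Tuples held in an input state: input rate x window size."""
    return input_rate * window


def nl_recompute_comparisons(rate_b, rate_c, window):
    """Assumed cost of rebuilding S_BC with a nested-loop join of S_B and S_C."""
    return estimate_state_size(rate_b, window) * estimate_state_size(rate_c, window)


small = nl_recompute_comparisons(rate_b=50, rate_c=50, window=10)
large = nl_recompute_comparisons(rate_b=50, rate_c=50, window=20)
print(small, large, large / small)
```

Under these assumptions, doubling the window quadruples the comparisons for this one state, which is consistent with a polynomial dependence on window size.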

15 MS Migration Pros and Cons
Pros
- Fast when # of tuples in states is small
  - Low input rates, low selectivity or small window
Cons
- Output silence during entire migration stage
Can the query produce output even during migration?
- Motivation for Parallel Track Strategy

16 Parallel Track Strategy
Basic idea
- Execute both plans in parallel and gradually "push" old tuples out of old box by purging
Key steps
- Connect boxes
- Execute in parallel
  - Until old box "expired" (no old tuple or sub-tuple)
- Disconnect old box
- Start executing new box only
[Figure: Old Box with joins AB, BC, CD and states S_A, S_B, S_AB, S_C, S_ABC, S_D; New Box with joins BC, CD, AB and states S_B, S_C, S_BC, S_D, S_A, S_BCD; both boxes read input queues Q_A, Q_B, Q_C, Q_D and write output queue Q_ABCD]

17 Potential Duplicates
Tuple ABCD
- 2^4 = 16 possible old/new sub-tuple combinations
- The same case must not be generated by both boxes
  - Otherwise we may have duplicates
In new box
- All states start empty
- Only generates ABCD as (new, new, new, new)
In old box
- May generate all 16 cases
- Would duplicate the case of (new, new, new, new)
Duplicate Prevention
- At root op in old box: if both to-be-joined tuples have all-new sub-tuples, don't join.
- Other ops in old box: proceed as normal
[Figure: Old Box with joins AB, BC, CD, states S_A, S_B, S_AB, S_C, S_ABC, S_D, input queues Q_A, Q_B, Q_C, Q_D and output queue Q_ABCD]
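The duplicate-prevention rule can be checked with a small Python model. It is a sketch under simplifying assumptions: each sub-tuple is represented only by a flag (True for new, False for old), the old box is treated as able to produce every case that its root operator agrees to join, and the function names are hypothetical.

```python
from itertools import product


def old_box_root_joins(left_is_new, right_is_new):
    """Root operator rule in the old box: skip the all-new/all-new case."""
    return not (all(left_is_new) and all(right_is_new))


def new_box_emits(is_new):
    """New box states start empty, so its results contain only new sub-tuples."""
    return all(is_new)


emitted_once = 0
for a, b, c, d in product((False, True), repeat=4):  # True means "new"
    # Old box root joins an ABC tuple (left) with a D tuple (right).
    from_old = old_box_root_joins((a, b, c), (d,))
    from_new = new_box_emits((a, b, c, d))
    if from_old != from_new:  # exactly one box produces this case
        emitted_once += 1

print(emitted_once)
```

In this model every one of the 16 combinations is produced by exactly one box, so the merged output has neither missing tuples nor duplicates.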

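18 Estimation of PT Migration
Estimation formula: T_PT ≈ 2W
[Figure: timeline from T_M-start to T_M-end covering a 1st W and a 2nd W, with old and new tuples marked in each window; Old Box with joins AB, BC, CD, states S_A, S_B, S_AB, S_C, S_ABC, S_D, input queues Q_A, Q_B, Q_C, Q_D and window W]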

19 PT Migration Duration
Given enough system computing resources
- New tuples are processed right away
- PT migration duration ≈ 2W
If not enough system resources
- New tuples accumulate in queues
- PT migration duration > 2W

20 Cost Estimation of PT Migration
Cost of PT = cost of processing 2W tuples in old box + cost of processing 2W tuples in new box
Parameters:
- Input rates, window size, selectivity
  - Similar to MS strategy

21 PT Migration Pros and Cons
Pros
- Keeps producing results even during migration
  - No results during MS migration
Cons
- Migration duration is at least 2W
  - MS may be faster depending on # of tuples in states

22 Outline
Problem Definition and Motivation
Dynamic Migration Strategies
- Moving State Strategy
- Parallel Track Strategy
Experimental Results

23 Experimental Setup
Embed in the CAPE system
- CAPE = Continuous Adaptive Processing Engine
- A streaming query engine developed at DSRG, WPI
  - VLDB'04 demo
- Layers of adaptations
  - Punctuation exploring
  - Adaptive scheduling
  - Query migration
  - Dynamic distribution
Input streams
- By stream generator of CAPE
- Poisson arrival pattern
Experiments on migration duration
- Vary window size

24 Migration Duration vs. Window Size

25 Conclusions
Identify problem of migration for stateful operators
First solutions for continuous query migration
- Moving state strategy
- Parallel track strategy
Embed both strategies into stream system
Cost model and experimental evaluation
- Cost model confirmed by experiments
- Identify performance trade-off of the two strategies

26 Thank You
For more information, check the CAPE website: http://davis.wpi.edu/~dsrg/CAPE/

