Static Optimization of Conjunctive Queries with Sliding Windows over Infinite Streams Presented by: Andy Mason and Sheng Zhong Ahmed M.Ayad and Jeffrey.

1 Static Optimization of Conjunctive Queries with Sliding Windows over Infinite Streams. Presented by: Andy Mason and Sheng Zhong. Paper by Ahmed M. Ayad and Jeffrey F. Naughton, Database Group, University of Wisconsin. Material is partially referenced from SIGMOD 2004 [1].

2 Overview: Introduction; Semantics of Sliding Window Continuous Queries; Cost Model; Load Shedding; Optimization Framework; Experiments

3 Introduction. The intent of the paper: find an execution plan that minimizes resource usage when resources are sufficient, and find an execution plan that sheds tuples when resources are insufficient. Given a continuous query in a steady state, each execution plan is similar to a queuing network system: arriving tuples are clients and query operators are servers. An execution plan is feasible if the system is stable. If the plan is infeasible, load shedding is needed.

4 Feasible and Infeasible Query Plan. Feasible: 0.5 + 0.25 < 1. Infeasible: 1 + 0.25 > 1, so load shedding is needed.
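The feasibility test on this slide can be sketched in a few lines. This is a minimal illustration, assuming each number on the slide is the load of one operator (arrival rate times per-tuple processing time) and that a plan is feasible when the loads sum to less than 1. The function names and the example rates and costs are hypothetical, chosen only to reproduce 0.5 + 0.25 and 1 + 0.25.

```python
def utilization(operators):
    """Total load of a plan.

    `operators` is a list of (arrival_rate, cost_per_tuple) pairs; each pair
    contributes arrival_rate * cost_per_tuple (an assumed load model).
    """
    return sum(rate * cost for rate, cost in operators)


def is_feasible(operators):
    """A plan is treated as feasible (stable) when its total load is below 1."""
    return utilization(operators) < 1.0


# Hypothetical operators that reproduce the numbers on the slide.
feasible_plan = [(50.0, 0.01), (25.0, 0.01)]     # loads 0.5 and 0.25
infeasible_plan = [(100.0, 0.01), (25.0, 0.01)]  # loads 1.0 and 0.25
```

Under these assumptions `is_feasible(feasible_plan)` holds and `is_feasible(infeasible_plan)` does not, which is the case where load shedding is needed.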

5 Assumptions. Time stamps are unique (no ties). Tuples arrive in the stream in monotonically increasing time-stamp order (no out-of-order arrival). No relational tables are involved in the query. Discussion: why make these assumptions? Static optimization: the rates of the input streams are slowly changing. There is enough memory to hold the buffering requirements of any query plan.

6 Semantics. Definitions: data stream, time-based window, tuple-based window. Selection: a filter takes a stream as input and outputs a stream. Join: a symmetric operator that takes two input streams. The cost model.

7 Variables

8 Rate and Window Calculations: (1) select output rate; (2) active window size; (3) output rate of a window join; (4) active size of a window join; (5) output rate of an n-ary join of n streams; (6) active window size of an n-ary join
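The formulas for these six quantities did not survive in the transcript. As a rough illustration only, the sketch below uses a commonly used model for the first three items; it is an assumption, not necessarily the paper's exact definitions. A filter scales the rate by its selectivity, a time-based window holds rate times window length tuples, and each tuple arriving on one side of a join probes the window of the other side. All function and parameter names are hypothetical.

```python
def select_output_rate(rate, selectivity):
    """Output rate of a filter: the fraction of arriving tuples that pass."""
    return selectivity * rate


def time_window_size(rate, window_seconds):
    """Expected number of tuples held by a time-based window."""
    return rate * window_seconds


def join_output_rate(rate_left, window_left, rate_right, window_right, join_selectivity):
    """Output rate of a binary window join (assumed model).

    Every arriving left tuple is matched against the right window and every
    arriving right tuple against the left window; window sizes are in tuples.
    """
    probes_per_second = rate_left * window_right + rate_right * window_left
    return join_selectivity * probes_per_second
```

For tuple-based windows such as `[ROWS 10]`, the window size in tuples is simply the declared row count, so it can be passed to `join_output_rate` directly.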

9 Cost Model. SELECT A.a, B.b, C.c FROM A [ROWS 10], B [ROWS 10], C [ROWS 10] WHERE A.a = B.a AND B.b = C.b. A concrete example of applying the cost model.

10 Cost Model Plans

11 Outcome after Load Shedding

12 Load Shedding. A form of approximation that reduces load by dropping tuples from the incoming streams. Methods of load shedding: random dropping of tuples (presented in this paper), achieved by inserting random drop boxes at several points in the query plan; and semantic dropping of tuples. Goal: maximize the output rate of the approximated query. Problems addressed: the optimal placement of drop boxes in an execution plan and the optimal setting of their sampling rates; the choice of plan to shed load from.

13 Selection-Only Queries. Initial condition: a query consisting of n consecutive filters, and an execution plan for it that orders the filters in ascending order by a designated number. n+1 possible combinations. Observation: tuples only need to be dropped directly from the streaming source, before they are processed by any of the filters. Conclusion: the plan with the lowest cost yields the highest rate.
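The observation and conclusion above can be illustrated with a small sketch. It assumes that the "designated number" is the classic filter rank cost / (1 - selectivity), that load is the source rate times the expected per-tuple cost of the filter chain, and that a single drop box at the source keeps just enough tuples to bring the load down to 1. These modelling choices and all names are assumptions for illustration.

```python
from math import prod  # available in Python 3.8 and later


def order_filters(filters):
    """Sort (selectivity, cost) filters by the assumed rank cost / (1 - selectivity)."""
    def rank(f):
        selectivity, cost = f
        return cost / (1.0 - selectivity) if selectivity < 1.0 else float("inf")
    return sorted(filters, key=rank)


def cost_per_source_tuple(filters):
    """Expected processing time spent on one source tuple by the filter chain."""
    total, surviving = 0.0, 1.0
    for selectivity, cost in filters:
        total += surviving * cost   # only surviving tuples reach this filter
        surviving *= selectivity
    return total


def shed_at_source(rate, filters):
    """Return (keep_fraction, output_rate) with one drop box at the source."""
    ordered = order_filters(filters)
    load = rate * cost_per_source_tuple(ordered)
    keep = 1.0 if load <= 1.0 else 1.0 / load
    output_rate = keep * rate * prod(s for s, _ in ordered)
    return keep, output_rate
```

With a source rate of 100 tuples per second and filters `[(0.2, 0.02), (0.5, 0.01)]`, the ordered chain costs 0.02 seconds per source tuple, so the load is 2 and half of the tuples are kept. Because the product of the selectivities does not depend on the order, the cheapest order needs the least shedding, which matches the conclusion on the slide.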

14 Join Queries. Only tuple-based windows are considered. Shedding load from a specific plan. Choice of plan for load shedding.

15 Shedding Load from a Specific Plan. Where do we put the drop boxes? For a query plan joining n streams with binary joins, a drop box can be put before each of the two inputs to the n - 1 join operators, plus a box right after the last join is performed, giving 2n - 1 possible locations. Observation: it is sufficient to drop tuples from the input sources before they are processed by any join operator.

16 Choice of Load Shedding Plan. Intuition for selection queries: pick the plan with the lowest resource utilization. Join queries: the plan with the lowest resource utilization? This intuition does not always work. Why?

17 Load Shedding Plan Example. Plans shed load in the order of their average utilization. The switch-over occurs at ~4.5 milliseconds (plan b = best).

18 Observations from Example. The plan with the lowest utilization is not always the best choice for shedding load. When the join cost is ~14 milliseconds, the throughput of the best plan is more than twice the throughput of the lowest-utilization plan. The lowest-utilization plan could be the worst choice. Conclusion: load shedding must be integrated in the optimization process.

19 Optimization Framework. Two areas: throughput of the plan and utilization cost of the plan. Feasible queries. Goal: minimize the cost of the plan, where throughput is fixed at its maximum value for all feasible queries. Infeasible queries. Goal: maximize the throughput of the plan, where cost is fixed at its maximum value for all p. Assumption: the search space of alternative plans is always equipped with drop boxes, so all plans in the search space will be feasible and the problem can be treated as unconstrained.

20 Optimization Goal. Maximize R(p) = plan throughput / plan cost. Simplest optimization algorithm: generate the set of all plans of the query; for each plan in the set, compute the cost of the plan, insert drop boxes if cost > 1, and compute R; return the plan that maximizes R(p).
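The exhaustive algorithm on this slide can be sketched generically. The version below is an illustration under stated assumptions: the caller supplies the plan enumeration plus three hypothetical callbacks (`cost_of`, `throughput_of`, `shed`), since the transcript does not preserve the cost model or the load shedding algorithm itself.

```python
def exhaustive_optimize(plans, cost_of, throughput_of, shed):
    """Return (best_plan, best_ratio), maximizing R(p) = throughput / cost.

    plans         -- iterable of candidate plans (any representation)
    cost_of       -- callable giving the utilization cost of a plan
    throughput_of -- callable giving the output rate of a plan
    shed          -- callable returning the plan with drop boxes inserted
    """
    best_plan, best_ratio = None, float("-inf")
    for plan in plans:
        if cost_of(plan) > 1.0:      # infeasible: insert drop boxes first
            plan = shed(plan)
        ratio = throughput_of(plan) / cost_of(plan)
        if ratio > best_ratio:
            best_plan, best_ratio = plan, ratio
    return best_plan, best_ratio


# Toy usage: plans are (name, throughput, cost) tuples, and shedding is a
# placeholder that scales throughput and cost down together until cost is 1.
# Real join plans do not scale this simply; this is only a stand-in.
def toy_shed(plan):
    name, throughput, cost = plan
    return (name, throughput / cost, 1.0)


toy_plans = [("a", 40.0, 0.8), ("b", 90.0, 1.5)]
best, ratio = exhaustive_optimize(
    toy_plans,
    cost_of=lambda p: p[2],
    throughput_of=lambda p: p[1],
    shed=toy_shed,
)
```

In the toy data, plan a is feasible with R = 40 / 0.8 = 50, while plan b is shed down to a throughput of 60 at cost 1, so plan b is returned. A single objective therefore ranks feasible and shed plans together.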

21 Heuristic Optimizer. Based on the original System R optimizer. Builds the plan bottom-up by storing the best plans for successively larger subsets of the input streams. Computing the best plan for any subset: test whether this subplan is feasible; if it is infeasible, tune the values of the drop boxes placed at its input streams using the load shedding algorithm.

22 Computing the Best Subset Plan. Test whether this subplan is feasible. If it is infeasible, tune the values of the drop boxes placed at its input streams using the load shedding algorithm. Store the subplan. At any stage, if a drop box is placed in front of a stream that already has one from a previous round, the two are combined into one drop box whose selectivity is the product of the original two.
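The drop box combination rule can be shown with a short sketch. It assumes a drop box is represented only by its selectivity (the fraction of tuples it keeps) and that the optimizer tracks one drop box per input stream in a dictionary. The function name and the representation are hypothetical.

```python
def add_drop_box(drop_boxes, stream, selectivity):
    """Return a new mapping with a drop box of `selectivity` placed on `stream`.

    `drop_boxes` maps a stream name to the selectivity of its current drop box.
    If the stream already has a drop box from a previous round, the two are
    merged into one whose selectivity is the product of the original two.
    """
    if not 0.0 <= selectivity <= 1.0:
        raise ValueError("selectivity must lie in [0, 1]")
    merged = dict(drop_boxes)  # leave the caller's mapping untouched
    merged[stream] = merged.get(stream, 1.0) * selectivity
    return merged


boxes = add_drop_box({}, "A", 0.5)     # first round: keep half of stream A
boxes = add_drop_box(boxes, "A", 0.4)  # later round: merged into one box of 0.2
boxes = add_drop_box(boxes, "B", 0.9)  # stream B gets its own drop box
```

Two independent random drops in sequence keep a tuple with probability equal to the product of the two keep fractions, which is why a single combined box is equivalent.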

23 Experiment Setup. 1000 random continuous queries. Each query represents a join of five input streaming sources: A, B, C, D, E. Window sizes and join selectivities were fixed. Rates were randomly picked from 10 to 1000 tuples/sec.

24 Need for Reoptimization

25 Average Gain in Throughput over Using the Lowest Utilization Plan. At very low resources, the gain is very significant (almost eightfold at the 1% mark).

26 Average and Maximum Gain

27 Heuristic Optimizer Except at very low resources, the performance of the heuristic optimizer is quite impressive

28 Summary. Presented a framework for static optimization of sliding window conjunctive queries over infinite streams: Cost Model; Load Shedding (load shedding must be integrated in the optimization process!); Optimization Framework; Experimental Results.

29 References [1] http://web.cs.wpi.edu/~cs525/f06s-EAR/cs525-homepage_files/LITERATURE/SIGMOD04-opt-shed-wisconsin.pdf [2] http://se.uwaterloo.ca/~tozsu/courses/cs856/F05/Presentations/Week8/Stream_Maryam.pdf

