Published by James Nichols. Modified over 9 years ago.
1
Eddies: Continuously Adaptive Query Processing
Ross Rosemark
2
Goals of Presentation
What are traditional query optimizers?
What are the problems with these optimizers?
What is an eddy?
How does an eddy work?
How does an eddy self-tune query plans?
3
Traditional Query Optimizer
The query optimizer collects metadata (statistics) about the distribution of the data.
Based on this metadata, the query optimizer determines the most efficient (lowest-cost) query plan.
The query plan is then executed.
This works well for snapshot queries (short-lived queries).
4
Problems
The problem with this approach is that data has become streaming (the Internet, sensor nodes, etc.).
The query plan chosen by the query optimizer can eventually become inefficient if the statistics it was based on change: the cost of operators, the selectivities of operators, and the rate at which tuples arrive from the inputs can all change.
It is also difficult to simply re-optimize a query. Determining when to re-optimize is a difficult research issue that has not been addressed. You must determine not only when to re-optimize, but also that the re-optimized plan still produces the same result.
5
Example
Assume a query asking for the age of employees with salary > 100000 is issued into the sensor network.
Initially the predicate salary > 100000 is very selective, so it eliminates a lot of tuples.
As older employees in the database start to be read, the predicate salary > 100000 becomes less selective.
6
Some things an eddy must consider
An eddy should minimize synchronization barriers and increase the moments of symmetry.
7
Synchronization Barriers
A synchronization barrier occurs where one operation limits the speed of another operation.
Extreme example: assume a merge join on two duplicate-free inputs, slowlo and fasthi.
Assume that during processing the next tuple is always consumed from the relation whose next value is lowest.
Assume that slowlo is a slowly delivered external relation with many low values in its join column.
Assume that fasthi is high bandwidth (i.e. it delivers tuples fast) and has only high values in its join column.
In this example fasthi is delayed while slowlo delivers its tuples. This is known as a synchronization barrier.
It is desirable to minimize the number of synchronization barriers.
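The slowlo/fasthi example can be sketched as a small simulation. This is only an illustration under stated assumptions: the function name `merge_emit_times`, the arrival times, and the values are all made up for the sketch, and the merge is reduced to interleaving two sorted inputs rather than performing a real join.

```python
def merge_emit_times(slowlo, fasthi):
    """Merge two value-sorted inputs of (arrival_time, value) pairs.

    Returns (emit_time, value, source) triples. A tuple cannot be emitted
    before it arrives, nor before every smaller-valued tuple is emitted.
    """
    out = []
    i = j = 0
    clock = 0.0
    while i < len(slowlo) or j < len(fasthi):
        # Always consume from the input whose next value is lowest.
        take_slow = j >= len(fasthi) or (
            i < len(slowlo) and slowlo[i][1] <= fasthi[j][1]
        )
        if take_slow:
            arrival, value = slowlo[i]
            source = "slowlo"
            i += 1
        else:
            arrival, value = fasthi[j]
            source = "fasthi"
            j += 1
        # The merge stalls until the chosen tuple has actually arrived.
        clock = max(clock, arrival)
        out.append((clock, value, source))
    return out


# Hypothetical inputs: slowlo delivers one low value per second,
# fasthi delivers high values almost immediately.
slowlo = [(1.0, 1), (2.0, 2), (3.0, 3)]
fasthi = [(0.01, 100), (0.02, 101), (0.03, 102)]

emitted = merge_emit_times(slowlo, fasthi)
first_fasthi_time = next(t for t, _, src in emitted if src == "fasthi")
```

Although every fasthi tuple has arrived within 0.03 time units, none can be emitted until the last slowlo tuple has been consumed, which is the barrier the slide describes.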
8
Moments of Symmetry
You can only re-optimize at a moment of symmetry.
A moment of symmetry is a point in the execution at which the optimizer can change the query plan without affecting the way the plan's predicates are evaluated. It typically arises in joins.
Example: assume a nested-loop join with inner relation R and outer relation S. Here you can only re-optimize the join when a scan of R has completed.
It is possible to re-optimize in the middle of scanning S, but the join algorithm would then have to be changed.
9
Eddy
An eddy is designed to dynamically re-optimize queries.
The authors implemented eddies in River. River is a shared-nothing parallel query processing framework that dynamically adapts to fluctuations in performance and workload.
An eddy is implemented as a module in a river containing an arbitrary number of input relations, a number of participating unary and binary modules, and a single output relation.
An eddy also maintains a fixed-size buffer of tuples that need to be processed.
Each operator takes tuples from the eddy, processes them, and delivers the results back to the eddy.
10
Eddy (Cont)
A tuple in an eddy is associated with a tuple descriptor, which contains a vector of Ready bits and a vector of Done bits.
The eddy ships the tuple only to operators whose Ready bit is turned on.
After an operator has processed the tuple, that operator's Done bit is set.
If all Done bits are set, the tuple is sent to the eddy's output; otherwise it is sent to another operator.
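The Ready/Done bookkeeping can be sketched as follows. This is a minimal illustration, not the paper's implementation: the names `TupleDescriptor` and `route_tuple` are hypothetical, operators are modeled as simple selection predicates, and the routing policy is a placeholder that picks the first eligible operator.

```python
from dataclasses import dataclass


@dataclass
class TupleDescriptor:
    values: dict
    ready: list  # ready[i] is True if operator i may process this tuple
    done: list   # done[i] is True once operator i has processed it


def route_tuple(values, operators):
    """Pass one tuple through all operators; return it, or None if dropped."""
    n = len(operators)
    # Assumption: for pure selections every operator is ready from the start.
    t = TupleDescriptor(values, [True] * n, [False] * n)
    while not all(t.done):
        candidates = [i for i in range(n) if t.ready[i] and not t.done[i]]
        if not candidates:
            raise RuntimeError("no eligible operator for this tuple")
        op = candidates[0]  # placeholder policy; a real eddy chooses adaptively
        if not operators[op](t.values):
            return None  # the predicate rejected the tuple
        t.done[op] = True
    return t.values  # all Done bits set: send to the eddy's output


# Hypothetical predicates, loosely based on the employee example.
operators = [
    lambda v: v["salary"] > 100000,
    lambda v: v["age"] > 30,
]
kept = route_tuple({"age": 45, "salary": 150000}, operators)
dropped = route_tuple({"age": 25, "salary": 150000}, operators)
```

The routing schemes on the following slides differ only in how the operator is picked from the eligible candidates.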
11
Eddy (Cont)
So how does an eddy route tuples to its different operators?
The paper looks at several different schemes.
12
Naïve Scheme
The eddy's buffer is implemented as a priority queue. Recall that the buffer stores the tuples that still need to be processed by the eddy.
When a tuple enters the buffer its priority is set to low.
After it has been processed by an operator and returned to the buffer, its priority is set to high.
This ensures that tuples do not get stuck in the eddy, i.e. it prevents starvation.
This scheme dynamically adjusts to the work required by operators: operators that are slower (e.g. take 4 seconds per tuple vs. 1 second) receive fewer tuples.
Note that each operator has a fixed-size queue. Once the queue is full, no more tuples can be inserted into it.
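A minimal sketch of the two mechanisms in the naïve scheme, assuming a binary heap for the eddy's buffer and a bounded deque for each operator's input queue. The class `EddyBuffer`, the helper `offer`, and the capacity of 2 are illustrative choices, not taken from the paper.

```python
import heapq
import itertools
from collections import deque

# heapq is a min-heap, so the smaller number is served first.
HIGH, LOW = 0, 1


class EddyBuffer:
    """Priority buffer: returned tuples are served before new ones."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker keeps FIFO order

    def add_new(self, tup):
        heapq.heappush(self._heap, (LOW, next(self._seq), tup))

    def add_returned(self, tup):
        heapq.heappush(self._heap, (HIGH, next(self._seq), tup))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


def offer(queue, item, capacity):
    """Try to enqueue on an operator's fixed-size queue; False if it is full."""
    if len(queue) >= capacity:
        return False
    queue.append(item)
    return True


buffer = EddyBuffer()
buffer.add_new("a")
buffer.add_new("b")
buffer.add_returned("c")  # came back from an operator, so it jumps ahead
order = [buffer.pop() for _ in range(3)]

slow_queue = deque()
accepted = [offer(slow_queue, t, capacity=2) for t in ("t1", "t2", "t3")]
```

A slow operator drains its queue slowly, so `offer` fails more often for it and the eddy ends up sending those tuples elsewhere first. That back-pressure is what lets the scheme adapt to operator cost.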
13
Lottery Scheme
Each time a tuple is routed to an operator, the operator is credited with one ticket.
When the operator returns a tuple to the eddy, one ticket is debited from the eddy's running count for that operator.
The count therefore tracks how efficiently an operator drains tuples from the system.
When a tuple is to be routed to an operator, the eddy holds a lottery. Only operators that have their Ready bit set can participate.
An operator's chance of "winning the lottery" and receiving the tuple corresponds to its ticket count.
This scheme dynamically adjusts to the selectivity of operators.
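The ticket accounting and the weighted draw can be sketched like this. The class name `LotteryRouter` is hypothetical, and two details are assumptions of this sketch rather than facts from the slides: each operator gets a weight of tickets + 1 so that an operator with zero tickets can still win, and counts are floored at zero.

```python
import random


class LotteryRouter:
    def __init__(self, n_ops, rng=None):
        self.tickets = [0] * n_ops
        self.rng = rng or random.Random()

    def on_route(self, op):
        # Credit one ticket when a tuple is sent to the operator.
        self.tickets[op] += 1

    def on_return(self, op):
        # Debit one ticket when the operator hands a tuple back.
        self.tickets[op] = max(0, self.tickets[op] - 1)

    def choose(self, ready_ops):
        # Assumption: +1 keeps zero-ticket operators in the draw.
        weights = [self.tickets[op] + 1 for op in ready_ops]
        return self.rng.choices(ready_ops, weights=weights, k=1)[0]


router = LotteryRouter(2, rng=random.Random(0))
for _ in range(10):
    router.on_route(0)
    router.on_route(1)
router.on_return(0)       # operator 0 is selective: it returns 1 of 10 tuples
for _ in range(10):
    router.on_return(1)   # operator 1 returns every tuple it receives

wins_for_op0 = sum(router.choose([0, 1]) == 0 for _ in range(1000))
```

Operator 0 consumed ten tuples and returned one, so it keeps nine tickets and wins most draws. Sending tuples to the more selective operator first is the behavior the slide describes.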
14
Window Scheme
The problem with the lottery scheme is that it uses too much past information. Consider an operator that gained a lot of tickets initially but then became slow.
In this scheme the lottery is modified so that it only looks at tickets gained by an operator within a fixed window.
It keeps track of two types of tickets: banked tickets, which are used when running the lottery, and escrow tickets, which are used to measure efficiency during the window.
At the end of a window: Banked Tickets = Escrow Tickets, then Escrow Tickets = 0.
This ensures that operators re-prove themselves each window.
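The banked/escrow rollover can be sketched as below. The class name `WindowedTickets` is hypothetical, and the sketch makes two assumptions the slide does not state: the window is measured as a count of routed tuples, and both credits and debits are applied to the escrow account.

```python
class WindowedTickets:
    def __init__(self, n_ops, window_size):
        self.banked = [0] * n_ops   # used when running the lottery
        self.escrow = [0] * n_ops   # measures efficiency in this window
        self.window_size = window_size
        self._routed = 0

    def on_route(self, op):
        self.escrow[op] += 1
        self._routed += 1
        if self._routed >= self.window_size:
            self._roll_window()

    def on_return(self, op):
        self.escrow[op] = max(0, self.escrow[op] - 1)

    def _roll_window(self):
        # Banked Tickets = Escrow Tickets, then Escrow Tickets = 0.
        self.banked = list(self.escrow)
        self.escrow = [0] * len(self.escrow)
        self._routed = 0


w = WindowedTickets(2, window_size=4)

# Window 1: operator 0 absorbs most tuples.
for _ in range(3):
    w.on_route(0)
w.on_return(0)
w.on_route(1)             # fourth routed tuple closes the window
first_window = list(w.banked)

# Window 2: only operator 1 earns tickets.
for _ in range(4):
    w.on_route(1)
second_window = list(w.banked)
```

After the second window operator 0 holds no banked tickets at all, even though it did well earlier, so old performance cannot dominate the lottery indefinitely.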
15
Comparison
Shows that, due to "fluid dynamics" (i.e. the varying rates of operators), the naïve approach naturally adjusts to the cost of operators.
Shows that the lottery scheme also adjusts to the workload.
16
Comparison
The naïve eddy does not adjust to selectivity; it performs between the best and worst cases.
The lottery scheme does adjust to selectivity.
17
Joins
Shows that the lottery scheme performs nearly optimally.
The naïve scheme performs between the best and worst cases.
18
Joins (Cont)
All joins are ripple joins.
The selectivity of the join predicate is varied.
In all cases the eddy with the lottery scheme is close to optimal.
19
Window Scheme
The two cases shown correspond to different query plans.
Shows that the eddy falls between the best and worst cases.
20
Adaptability
The cost of the operator is toggled 3 times during the experiment.
Notice that the eddy changes how many tuples are processed first by that operator.
21
Problems
The eddy does not work well when there is initially a long delay at an operator.
The eddy gives all tuples to that operator because the operator is not returning tuples that satisfy the join predicate's criteria, and will not until after the delay.
Notice that the eddy eventually figures this problem out; it just may take some time.