Presentation is loading. Please wait.

Presentation is loading. Please wait.

Selectivity-Based Partitioning Alkis Polyzotis UC Santa Cruz.

Similar presentations


Presentation on theme: "Selectivity-Based Partitioning Alkis Polyzotis UC Santa Cruz."— Presentation transcript:

1 Selectivity-Based Partitioning Alkis Polyzotis UC Santa Cruz

2 Parser Optimizer Execution Engine R 1 R 2 R 3 R 4 ( (R 2 R 3 ) R 1 ) R 4 Query Optimization Integral component of declarative query processing Key problem: join ordering Most important (and most complex!) module of a DBMS

3 “Monolithic” Query Optimization Output: a single join order based on join selectivities between tables Plan: (P E) D

4 Partition-Based Query Optimization Output: multiple join orders based on selectivities between fragments of tables Plan: ( (P D 2 ) E )  ( (E D 1 ) P )

5 Selectivity-Based Partitioning Divide-and-Union paradigm Optimization problem and analysis Partitioning algorithm Experimental results

6 Roadmap Preliminaries Problem Definition Partitioning Algorithm Optimal Splits Iterative Partitioning Experimental Results Conclusions

7 Data and Query Model Chain-join queries Example: R 1 R 2 R 3 R 4 Relations may have optional selections Relation  Frequency matrix Left-deep evaluation plans Example: R 3 R 2 R 4 R 1 R3R3 R2R2 R4R4 R1R1

8 Problem Definition Given: query Q, maximum partition count N Goal: find partitioning of Q in n  N partitions that minimizes query cost On-the-fly partitioning vs. Off-line partitioning Difficult optimization problem! Determine the pivot relation Determine the number of partitions Compute a partitioning of the pivot Determine the orderings of partitioned plans R 1 R 2 R 3 R 4 R 1 R 21 R 4 R 3 R 3 R 22 R 1 R 4

9 Query Cost Function One possibility: optimizer’s cost model Accurate cost estimation Solution depends on low-level system details Difficult to gain intuitions Our approach: query cost = number of intermediate results Simple function that admits analysis Sound connections to realistic cost models (Cluet and Moerkotte, ICDT’95) Cost(R 3 R 2 R 4 R 1 ) = |R 3 R 2 | + |R 3 R 2 R 4 |

10 Roadmap Preliminaries Problem Definition Partitioning Algorithm Optimal Splits Iterative Partitioning Experimental Results Conclusions

11 Partitioning Algorithm - Overview State space: partitioned join orders Partitioning algorithm: Explore a set of states Compute optimal partitioning for each state Return global optimum Our approach: order joins then partition Another possibility: partition then order joins

12 Distributing Tuples Goal: Distribute tuples to minimize cost Optimal distribution depends on: Frequency matrices of other relations Position (m,l)

13 Optimal Split Theorem Distribute each value (m,l) independently Place (m,l) in partition that minimizes g(L,T,m,l)

14 Partitioning Algorithm - Overview State space: partitioned join orders Partitioning algorithm: Explore a set of states Compute optimal partitioning for each state Return global optimum

15 Search Algorithm Exhaustive search is impractical [ Pivot, Leading orders, Trailing orders ] Search heuristics: Tighter search space: [ Pivot, Optimal Leading orders ] Iterative Partitioning Guided search by using lower bounds on cost of partitions

16 Encoding of State Space State: [ Pivot, Optimal leading orders ] Transition: insert relation in a leading order

17 R 5 R 1 R 3 R 4 R 5 Iterative Partitioning Key idea: (Partition, Optimize)+ Compute optimal split for leading/trailing orders Optimize trailing orders for the current split Theorem: query cost can only decrease Idea extended to more detailed cost models R1R1 R 3 R 4 R2R2 R 21 R 22 R 3 R 5 R 4 R 1 R 5 R 21 R 22 LeadingTrailing

18 Search Algorithm Initial states: single-relation leading orders Search process: Compute partitions with IP Open more states with transition function Transitions are guided by lower bound on cost function Same lower bound can also prune states Stopping criteria: Search space is exhausted Time budget is exhausted

19 System Integration Parser Optimizer Execution Engine Parser Optimizer Execution Engine Partitioner MonolithicPartition-based

20 Roadmap Preliminaries Problem Definition Partitioning Algorithm Optimal Splits Iterative Partitioning Experimental Results Conclusions

21 Effect of Skew Synthetic Data

22 Execution Time Synthetic Data (Skew=1.5)

23 Varying Time Budget Synthetic Data (Skew=1.5)

24 Results on Real-Life Data SwissProt

25 Conclusions Monolithic optimization  Missed opportunities Selectivity-Based Partitioning Divide & Union approach Multiple join orders per query Join selectivity between relation fragments Partitioning Algorithm Iterative Partitioning Experimental Results Significant reduction of intermediate results

26 Future Work Extension to multiple pivots Partition-then-order optimization Efficient execution of partitioned plans Off-line workload-aware partitioning

27 Thank you!

28

29 Partitioning Model General case: Multi-relation partitioning Our approach: Single-relation partitioning R 1 R 2 R 3 R 4 R 1 R 21 R 4 R 3 R 31 R 22 R 1 R 4 R 1 R 22 R 32 R 4


Download ppt "Selectivity-Based Partitioning Alkis Polyzotis UC Santa Cruz."

Similar presentations


Ads by Google