Presentation is loading. Please wait.

Presentation is loading. Please wait.

Speeding Up Warehouse Physical Design Using A Randomized Algorithm Minsoo Lee Joachim Hammer Dept. of Computer & Information Science & Engineering University.

Similar presentations


Presentation on theme: "Speeding Up Warehouse Physical Design Using A Randomized Algorithm Minsoo Lee Joachim Hammer Dept. of Computer & Information Science & Engineering University."— Presentation transcript:

1 Speeding Up Warehouse Physical Design Using A Randomized Algorithm Minsoo Lee Joachim Hammer Dept. of Computer & Information Science & Engineering University of Florida

2 View Selection Problem What’s a Data Warehouse? –stores info. collected from multiple, heterogeneous info. sources to support complex querying and analysis Materialized Views in a DW –pre-computed portions of frequently asked queries –maintenance : incremental, periodic refresh View Selection Problem –decide which views to materialize in DW –considers query response time, maintenance cost(?), and storage cost

3 Overview of Our Problem Maintenance-cost View Selection Problem [GM99] –decide which views to materialize in DW –minimize query response time, given an upper bound on maintenance cost (storage space is not considered) DW Configuration based on OR view graphs –Any view can be computed from any of its related views

4 Problems with existing Solutions Existing Solutions to the View Selection Problem –Heuristics-based Search Greedy Heuristics [Gup97,GM99] A* [Rou82, LQA97, GM99] –Exhaustive Search [TS97] Problems –Does not scale up well for more than 20 views –Time complexity is polynomial –DW evolution requires efficient re-computation of a configuration

5 Outline of Our Approach Use Randomized Algorithms –Randomized algorithms provide good solution within a small amount of time (time/quality tradeoff) –Specifically, use Genetic Algorithms (GA) Advantages of Our Approach –Near linear scaleup with a solution within 90% of optimal –Support DW evolution with fast reconfiguration

6 Genetic Algorithms Loop until termination condition = true t=t+1 Select P(t) from P(t-1) Recombine P(t) Evaluate P(t)

7 GA : Representation of Solution Genome –Candidate Solution of the problem to be solved –Represented as a String –Ordering problems : Alphanumeric String Selection problems : Binary String Binary String Representation –ex) v1 v2 v3 v4 v5 0 1 0 0 1  views v2 and v5 are selected

8 GA : Initialization of Population Initial Population in our experiments –Pool of randomly generated bit strings –population size is 300 Future experiments –generate more favorable initial population –use external knowledge of problem

9 GA : Selection, Crossover, Mutation, Termination Selection –Select superior genomes among previous population –Roulette Wheel Method [Mic94] Crossover –applied to two genomes by exchanging information Mutation –applied to a single genome : ex) flip a bit in the genome Termination –termination condition : 400 generations

10 GA : Evaluation Process Fitness Function –measures how good a genome is as a solution –high : close to optimal, low : further from optimal Use Penalty Function in Fitness Function –similar solution to 0/1 knapsack solution[Mic94] –Evaluate query benefit. If maintenance limit is exceeded, apply penalty.

11 GA : Evaluation Process Penalty Functions –Logarithmic (LG) –Linear (LN) –Exponential (EX) Penalty Application Methods –Subtract (S) –Divide (D) –Subtract&Divide (SD)

12 Evaluation of the Algorithm Environment –Pentium II 450 MHz PC, Windows NT 4.0 OR-view graphs –number of base tables : 10 tables –number of views : 5-20 views –edge density of graph : 15%, 30%, 50%, 75% –parameters for node (view) & parameters for edge RC : 100 - 10000 for base tables QC : 10 - 80% of RC of QF : 0.1- 0.9 source view UF : 0.1 - 0.9 MC : 10 - 150% of QC

13 Results : Quality of Solutions

14 Results : Execution Time

15 Prototype Development Used version 2.4.3 of Galib from MIT Microsoft Visual C++ Encoded our own Fitness Function –strategy for penalty is controlled by a control variable Encoded OR-view graph cost evaluation functions –total query cost, total maintenance cost OR-view graph costs –Node: Read Cost, Query Frequency, Update Frequency –Edge: Query Cost, Maintenance Cost

16 Conclusion Use of Genetic Algorithm for Maintenance-cost View Selection Problem –yields a solution within 10% of optimal solution –linear scale up for execution time w.r.t number of views –EX-D and EX-SD strategy produce best results –Suitable for use in DW evolution Future work –experiments with better initial population –various crossover and mutation operators, termination condition –AND-OR views, indexes –parallel version of GA


Download ppt "Speeding Up Warehouse Physical Design Using A Randomized Algorithm Minsoo Lee Joachim Hammer Dept. of Computer & Information Science & Engineering University."

Similar presentations


Ads by Google