PREFER: A System for the Efficient Execution of Multi-parametric Ranked Queries Vagelis Hristidis University of California, San Diego Nick Koudas AT&T.


1 PREFER: A System for the Efficient Execution of Multi-parametric Ranked Queries Vagelis Hristidis University of California, San Diego Nick Koudas AT&T Research Yannis Papakonstantinou University of California, San Diego

2 Example

3 ORDER BY 0.01·Mileage + 0.6·Year + 0.03·Price

4 Example: ORDER BY 0.01·Mileage + 0.6·Year + 0.03·Price

5 Example: ORDER BY 0.01·Mileage + 0.6·Year + 0.03·Price. Problem: retrieve the WHOLE relation.

6 Example: ORDER BY 0.01·Mileage + 0.6·Year + 0.03·Price. Problem: retrieve the WHOLE relation. PREFER retrieves only part of the relation.

7 Applications. Such preference queries are used in Web sites like: www.Zagat.com (restaurants), www.personallogic.com (online retailer).

8 Definitions - Problem statement. A preference query orders the tuples of a relation according to a function of the attribute values, e.g., 0.01·Mileage + 0.6·Year + 0.03·Price. The goal is to produce the top-K answers of a preference query while retrieving the minimum number of tuples.
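As a baseline for the problem statement above, the following sketch (an illustration, not part of PREFER) answers a top-K preference query the naive way: it scores every tuple and keeps the K best, so it always reads the whole relation. The car records, their attribute scales, the names `score` and `naive_top_k`, and the convention that higher scores rank first are all assumptions made for the example.

```python
import heapq
from typing import Dict, List

# Hypothetical sample relation; the values are made up for illustration.
CARS: List[Dict[str, float]] = [
    {"id": 1, "Mileage": 10, "Year": 5, "Price": 20},
    {"id": 2, "Mileage": 30, "Year": 8, "Price": 10},
    {"id": 3, "Mileage": 50, "Year": 2, "Price": 40},
]

# Weights of the example preference query from the slides.
QUERY: Dict[str, float] = {"Mileage": 0.01, "Year": 0.6, "Price": 0.03}


def score(t: Dict[str, float], weights: Dict[str, float]) -> float:
    """Linear preference function: weighted sum of the attribute values."""
    return sum(w * t[attr] for attr, w in weights.items())


def naive_top_k(relation: List[Dict[str, float]],
                weights: Dict[str, float], k: int) -> List[Dict[str, float]]:
    """Score every tuple (a full scan) and return the k highest-scoring ones."""
    return heapq.nlargest(k, relation, key=lambda t: score(t, weights))
```

Calling `naive_top_k(CARS, QUERY, 2)` touches all three records even though only two are returned; avoiding that full scan is the point of the ranked views introduced next.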

9 Our Approach. PREFER materializes a number of ranked views of the relation and uses them to answer preference queries efficiently.

10 Our Approach. Figure (axes: Price, Year): ranked view 0.08*Price + 0.2*Year; ranked view 0.075*Price + 0.8*Year.

11 Our Approach. Figure (axes: Price, Year): ranked view 0.08*Price + 0.2*Year; ranked view 0.075*Price + 0.8*Year; preference query 0.07*Price + 0.35*Year.

12 PREFER Architecture. Preprocessing stage: Views Creation, with inputs Relation, Space constraints, and Discretization of ranked views’ vectors. Question: Which ranked views should we materialize?

13 PREFER Architecture. Preprocessing stage: Views Creation (inputs: Relation, Space constraints, Discretization of ranked views’ vectors) produces the materialized views and the index of materialized views. Runtime process: Query, then View Selection, which yields a ranked view id, then the Query Pipelining Algorithm over the materialized views, then output results. Questions: Which ranked views should we materialize? Which ranked view should we use to answer a specific preference query? How do we use a ranked view to answer a preference query?

14 PREFER Architecture (same diagram as slide 13).

15 Figure: ranked view (columns Car ID ... Doors, f_q), ordered by 0.02*Mileage + 0.4*Year + 0.04*Price; result, ordered by 0.01*Mileage + 0.6*Year + 0.03*Price. Labels: t_1, last tuple, watermark = 14.26.

16 Calculating the Watermark
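The formula from this slide is not preserved in the transcript. Judging from how the watermark is used in the PipelineResults steps, it acts as a threshold on the view's ordering function f_v: any tuple scoring below it in the view cannot outrank t_1 in the query. One definition consistent with that use, offered here as an assumption rather than as the slide's own formula (dom(R) denotes the attribute domain of the relation):

```latex
% Watermark of tuple t_1 for view function f_v and query function f_q
T_{v,q}(t_1) \;=\; \max \bigl\{\, T \;:\; \forall t \in \mathrm{dom}(R),\;
    f_v(t) < T \;\Rightarrow\; f_q(t) < f_q(t_1) \,\bigr\}

% Equivalent form: the smallest view score among tuples that could tie or beat t_1
T_{v,q}(t_1) \;=\; \min \bigl\{\, f_v(t) \;:\; t \in \mathrm{dom}(R),\;
    f_q(t) \ge f_q(t_1) \,\bigr\}
```

Under this reading, every tuple with f_q(t) >= f_q(t_1) is guaranteed to sit in the prefix of the view whose f_v values reach the watermark, so only that prefix needs to be read.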

17 How to use a ranked view to answer a preference query (cont’d). PipelineResults Algorithm. Ranked view, ordered by f_v = 0.02*Mileage + 0.4*Year + 0.04*Price. Result, ordered by f_q = 0.01*Mileage + 0.6*Year + 0.03*Price. Steps: 1. Calculate the watermark for t_1, which is 14.26.

18 How to use a ranked view to answer a preference query (cont’d). PipelineResults Algorithm. Steps: 1. Calculate the watermark for t_1, which is 14.26. 2. Find the prefix of the view with f_v greater than the watermark value and sort its tuples by f_q.

19 How to use a ranked view to answer a preference query (cont’d). PipelineResults Algorithm. Steps: 1. Calculate the watermark for t_1, which is 14.26. 2. Find the prefix of the view with f_v greater than the watermark value and sort its tuples by f_q.

20 How to use a ranked view to answer a preference query (cont’d). PipelineResults Algorithm. Steps: 1. Calculate the watermark for t_1, which is 14.26. 2. Find the prefix of the view with f_v greater than the watermark value and sort its tuples by f_q. 3. Output tuples up to t_1. Result so far (Car ID): 2, 1.

21 How to use a ranked view to answer a preference query (cont’d). PipelineResults Algorithm. Steps: 1. Calculate the watermark for t_1, which is 14.26. 2. Find the prefix of the view with f_v greater than the watermark value and sort its tuples by f_q. 3. Output tuples up to t_1. 4. Repeat, using the first unprocessed tuple as t_1. Result so far (Car ID): 2, 1.

22 How to use a ranked view to answer a preference query (cont’d). PipelineResults Algorithm. Steps: 1. Calculate the watermark for t_1, which is 13.1. 2. Find the prefix of the view with f_v greater than the watermark value and sort its tuples by f_q. 3. Output tuples up to t_1. 4. Repeat, using the first unprocessed tuple as t_1. Result so far (Car ID): 2, 1.

23 How to use a ranked view to answer a preference query (cont’d). PipelineResults Algorithm. Steps: 1. Calculate the watermark for t_1, which is 13.1. 2. Find the prefix of the view with f_v greater than the watermark value and sort its tuples by f_q. 3. Output tuples up to t_1. 4. Repeat, using the first unprocessed tuple as t_1. Result so far (Car ID): 2, 1, 3.

24 How to use a ranked view to answer a preference query (cont’d). PipelineResults Algorithm. Steps: 1. Calculate the watermark for t_1, which is 8.3. 2. Find the prefix of the view with f_v greater than the watermark value and sort its tuples by f_q. 3. Output tuples up to t_1. 4. Repeat, using the first unprocessed tuple as t_1. Result so far (Car ID): 2, 1, 3.

25 How to use a ranked view to answer a preference query (cont’d). PipelineResults Algorithm. Steps: 1. Calculate the watermark for t_1, which is 8.3. 2. Find the prefix of the view with f_v greater than the watermark value and sort its tuples by f_q. 3. Output tuples up to t_1. 4. Repeat, using the first unprocessed tuple as t_1. Result so far (Car ID): 2, 1, 3, 5, 4.
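The four PipelineResults steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: attribute values are assumed normalized to [0, 1], weights are assumed non-negative, higher scores rank first, and the watermark is taken to be the smallest view score f_v over all points of the domain whose query score f_q is at least f_q(t_1). "First unprocessed" is read as the first fetched-but-not-yet-output tuple in view order. The function names and the tolerance `EPS` are hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple

Row = Tuple[float, ...]
EPS = 1e-9  # tolerance for floating-point comparisons (assumption)


def dot(w: Sequence[float], t: Sequence[float]) -> float:
    """Linear preference function: weighted sum of attribute values."""
    return sum(wi * ti for wi, ti in zip(w, t))


def watermark(v: Sequence[float], q: Sequence[float], t1: Row) -> float:
    """Smallest f_v over [0,1]^n subject to f_q >= f_q(t1) (assumed definition).

    With one linear constraint over a box this is a fractional-knapsack
    problem: buy query score from the attributes with the lowest v_i/q_i.
    """
    target = dot(q, t1)
    cost = 0.0
    usable = [i for i in range(len(q)) if q[i] > 0]
    for i in sorted(usable, key=lambda i: v[i] / q[i]):
        if target <= 0:
            break
        x = min(1.0, target / q[i])  # attribute value chosen in [0, 1]
        cost += v[i] * x
        target -= q[i] * x
    return cost


def pipeline_results(view: List[Row], v: Sequence[float],
                     q: Sequence[float]) -> Iterator[Row]:
    """Yield tuples in descending f_q order; `view` is sorted by descending f_v."""
    pending: List[Row] = []  # fetched but not yet output, in view order
    pos = 0                  # next unread position in the view
    while pending or pos < len(view):
        t1 = pending[0] if pending else view[pos]
        # Step 1: watermark for t1.
        wm = watermark(v, q, t1)
        # Step 2: read the view prefix whose f_v reaches the watermark.
        while pos < len(view) and dot(v, view[pos]) >= wm - EPS:
            pending.append(view[pos])
            pos += 1
        # Step 3: output, in f_q order, every tuple ranked at or above t1.
        cutoff = dot(q, t1)
        ready = [t for t in pending if dot(q, t) >= cutoff]
        ready.sort(key=lambda t: dot(q, t), reverse=True)
        yield from ready
        # Step 4: repeat with the tuples that are still unprocessed.
        pending = [t for t in pending if dot(q, t) < cutoff]
```

Because the function is a generator, a top-K query can stop early, for example with `itertools.islice(pipeline_results(view, v, q), k)`, and only the view prefix needed for those K results is read. The sketch fetches tuples whose f_v is greater than or equal to the watermark (with a small tolerance), which is slightly more conservative than the "greater than" wording on the slides.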

26 PREFER Architecture (same diagram as slide 13).

27 Define coverage. Figure (axes: Price, Year): ranked view V_1 = 0.8*Price + 0.2*Year; preference query q_1 = 0.7*Price + 0.35*Year. V_1 covers q_1: at most k tuples are retrieved from V_1 in order to output the first result of q_1.

28 Which ranked view should we use to answer a specific preference query? Figure (axes: Price, Year): ranked view 0.8*Price + 0.2*Year; ranked view 0.75*Price + 0.8*Year.

29 Which ranked view should we use to answer a specific preference query? Figure (axes: Price, Year): ranked view 0.8*Price + 0.2*Year; ranked view 0.75*Price + 0.8*Year.

30 Which ranked view should we use to answer a specific preference query? Figure (axes: Price, Year): ranked view V_1 = 0.8*Price + 0.2*Year; ranked view 0.75*Price + 0.8*Year; preference query q_1 = 0.7*Price + 0.35*Year. V_1 covers q_1.

31 PREFER Architecture (same diagram as slide 13).

32 Which ranked views should we materialize? ViewSelection Algorithm:
    while (not all preference vectors in [0,1]^n covered)
        Randomly pick v ∈ [0,1]^n and add it to the list of views L
    VIEWS ← ∅
    for i = 1 to C do
        select v ∈ L that covers the maximum number of uncovered vectors in [0,1]^n
        VIEWS ← VIEWS ∪ {v}

33 Which ranked views should we materialize? (cont’d) ViewSelection Algorithm (as on slide 32).

34 Which ranked views should we materialize? (cont’d) ViewSelection Algorithm (as on slide 32), illustrated with C = 3.
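The ViewSelection pseudocode can be sketched in Python as below. This is an illustrative reading, not the PREFER code: the preference vectors are taken from a grid over [0,1]^n (matching the "discretization" mentioned in the architecture slides), and the coverage test is passed in as a `covers(view, query)` callable. The real coverage test depends on the watermark and the bound on retrieved tuples; the distance-based `toy_covers` below is only a hypothetical stand-in so the sketch can run. All function names are assumptions.

```python
import itertools
import random
from typing import Callable, List, Set, Tuple

Vector = Tuple[float, ...]


def grid_vectors(n: int, step: float) -> List[Vector]:
    """All preference vectors of [0,1]^n on a grid with the given step."""
    ticks = [round(i * step, 10) for i in range(int(round(1 / step)) + 1)]
    return list(itertools.product(ticks, repeat=n))


def toy_covers(view: Vector, query: Vector, radius: float = 0.3) -> bool:
    """Hypothetical stand-in for coverage: view and query weights are close."""
    return max(abs(a - b) for a, b in zip(view, query)) <= radius


def view_selection(queries: List[Vector],
                   covers: Callable[[Vector, Vector], bool],
                   max_views: int,
                   rng: random.Random) -> Tuple[List[Vector], Set[Vector]]:
    """Return (chosen views, query vectors left uncovered)."""
    n = len(queries[0])
    # Phase 1: add random candidate views until every query vector is covered.
    candidates: List[Vector] = []
    uncovered = set(queries)
    while uncovered:
        v = tuple(rng.random() for _ in range(n))
        candidates.append(v)
        uncovered = {q for q in uncovered if not covers(v, q)}
    # Phase 2: greedily keep up to max_views (C on the slide) candidates,
    # each time taking the one that covers the most still-uncovered vectors.
    uncovered = set(queries)
    views: List[Vector] = []
    for _ in range(max_views):
        if not candidates or not uncovered:
            break
        best = max(candidates,
                   key=lambda c: sum(covers(c, q) for q in uncovered))
        views.append(best)
        candidates.remove(best)
        uncovered = {q for q in uncovered if not covers(best, q)}
    return views, uncovered
```

For example, `view_selection(grid_vectors(2, 0.25), toy_covers, 3, random.Random(0))` returns three views plus whatever grid vectors they leave uncovered. Passing the random generator in explicitly keeps runs reproducible, which is a design choice of this sketch rather than something the slides specify.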

35 Constraint on # of views. Finding the maximum coverage with the minimum number of materialized views is NP-hard. The greedy heuristic is an approximation for maximum coverage.
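For context, the slide does not quantify the approximation. A standard result for the generic maximum coverage problem (stated here as background, not as a claim made in the presentation) is that choosing C sets greedily covers at least the following fraction of what the best possible choice of C sets covers:

```latex
% Greedy guarantee for maximum coverage with a budget of C sets
\bigl|\mathrm{cov}(\mathrm{GREEDY}_C)\bigr|
  \;\ge\; \Bigl(1 - \bigl(1 - \tfrac{1}{C}\bigr)^{C}\Bigr)\,
          \bigl|\mathrm{cov}(\mathrm{OPT}_C)\bigr|
  \;\ge\; \Bigl(1 - \tfrac{1}{e}\Bigr)\,
          \bigl|\mathrm{cov}(\mathrm{OPT}_C)\bigr|
```

Here OPT_C is the best choice of C views from the same candidate list L, so the bound says nothing about views that were never sampled as candidates.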

36 Related Work. Preference query framework [AW00]. Top-k queries: joins (Fagin [F99, F96, F01], equijoins of ordered data); selections, which reduce top-k selection to a range query (histograms to estimate the cutoff [Chaudhuri & Gravano 99], probabilistic model [Donjerkovic & Ramakrishnan 99]); partitioning [Carey & Kossmann 97, 98].

37 Related Work. The Onion Technique (SIGMOD 2000). Main observation: the points of interest lie on the convex hull of the tuple space. Drawbacks of Onion: does not scale; computing the convex hull is very computationally intensive; not efficient if the domain of an attribute has a small cardinality; not efficient for more than the top-1 result.

38 Experiments. Measured parameters: # attributes; size of relation; # views; constraint on max # tuples retrieved.

39 Parameters of Experiments: synthetic datasets; 3 to 5 attributes; 10,000 to 500,000 tuples; random & correlated data; discretization of 0.1 or 0.05.

40 Experiments (cont’d) Dual PII CPU, 512MB RAM, 4 attr, 50,000 tuples, 34 Views

41 Experiments (cont’d) 4 attr, constraint = 500 tuples, discretization = 0.1

42 Experiments (cont’d) 500,000 tuples, constraint = 500 tuples, discretization = 0.05...0.1

43 Experiments (cont’d) 4 attr, discretization = 0.1

44 Experiments (cont’d) 4 attr, discretization = 0.1

45 Experiments (cont’d) 50,000 tuples, 3 attr, discretization = 0.05

46 More Resources: www.db.ucsd.edu/PREFER. PREFER demo. PREFER Application: construct materialized views; issue preference queries. Runs on MS Windows, on top of Oracle DBMS.

47 Conclusions. Methodology to efficiently answer top-K linearly weighted queries. Algorithm that uses a ranked view to answer a preference query. Ranked materialized views were used. Experimental evaluation.

