Published by Paola Curtis. Modified over 10 years ago.
2
Introduction
Views
Related Work
Preliminaries
Problems Discussed
Algorithm LPTA
View Selection Problem
Experimental Results
3
INTRODUCTION
Answering Top-k Queries
Active research topic.
Goal: quickly retrieve a number (k) of the highest-ranking tuples in the presence of monotone ranking functions defined on the attributes of the underlying relations.
Algorithms: the Threshold Algorithm (TA) by Fagin et al., and independently by Güntzer et al. and Nepal et al.
4
INTRODUCTION
Materialized Views
A materialized view is a database table that contains the results of a previously asked query. It is actually constructed and stored.
Problem Discussed
Find efficient methods of answering a query using a set of previously defined materialized views over the database.
Why Views?
Relevance to a variety of data management problems.
Promise of increased performance: views are materialized (incurring a space overhead) in the hope of gaining performance for some queries.
5
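INTRODUCTION
Views do not specify any selection conditions on the attributes they aim to rank.
Example (top-k):

Relation R
tid  X1  X2  X3
1    82159
2    53  19  83
3    2912
4    80  22  90
5    28  8   87
6    12  55  82
7    16  99  42
8    18  42  67
9    42123
10   23  21  88
(For tids 1, 3 and 9 the boundaries between X1, X2 and X3 were lost when the slide was flattened; the digits are kept as extracted.)

View1 (V1), top-5 query, f1 = 2x1 + 5x2
tid  Score
7    527
6    299
4    270
8    246
2    201

View2 (V2), top-3 query, f2 = x2 + 2x3
tid  Score
6    219
4    202
10   197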
6
INTRODUCTION
Given a top-2 query defined by the function f3 = 3x1 + 10x2 + 5x3, we can apply a standard top-k algorithm (e.g., TA) to the data in R and obtain the answer to the query.
Using views instead?
Feasibility: can the views guarantee an answer?
Speed: using R directly vs. using the views.
7
RELATED WORK
Multimedia context: uses ordered lists.
Threshold Algorithm (TA): requires the scoring function to be monotonic, i.e., for tuples t and u, if t[i] ≤ u[i] for all 1 ≤ i ≤ m, then Score_Q(t) ≤ Score_Q(u).
TA requires that each attribute has an index mechanism that makes all tids accessible in sorted order.
A single random access is required to resolve all attributes of a tid.
The paper focuses on additive scoring functions (which are monotonic), where Score_Q(t) = w_1 t[1] + w_2 t[2] + … + w_m t[m].
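The TA description above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: in-memory dictionaries stand in for the per-attribute sorted indexes and for random access, the name `threshold_algorithm` is made up for this sketch, and the data are only those rows of the deck's example relation R whose attribute values can be cross-checked against the view scores (tids 2, 4, 5, 6, 7, 8, 10).

```python
import heapq

# Assumption: a subset of the example relation R (rows whose values are unambiguous).
ROWS = {
    2: (53, 19, 83), 4: (80, 22, 90), 5: (28, 8, 87), 6: (12, 55, 82),
    7: (16, 99, 42), 8: (18, 42, 67), 10: (23, 21, 88),
}

def threshold_algorithm(rows, weights, k):
    """Top-k under an additive scoring function, TA style.

    Returns (top_k, depth): top_k is a list of (tid, score) pairs in
    decreasing score order, depth is the number of lock-step rounds used.
    """
    m = len(weights)
    # One list of tids per attribute, in decreasing attribute order (sorted access).
    lists = [sorted(rows, key=lambda t: rows[t][i], reverse=True) for i in range(m)]

    def score(values):
        return sum(w * x for w, x in zip(weights, values))

    seen = {}
    for depth in range(len(rows)):
        last_values = []
        for i in range(m):
            tid = lists[i][depth]
            last_values.append(rows[tid][i])
            seen[tid] = score(rows[tid])      # random access resolves the full tuple
        threshold = score(last_values)        # best score any unseen tuple can have
        top = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
        if len(top) == k and top[-1][1] >= threshold:
            return top, depth + 1
    return heapq.nlargest(k, seen.items(), key=lambda kv: kv[1]), len(rows)

ta_top2, ta_depth = threshold_algorithm(ROWS, (3, 10, 5), k=2)
```

With the weights of f3 = 3x1 + 10x2 + 5x3, the sketch stops as soon as the k-th best seen score is at least the threshold computed from the last values read in each list.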
8
RELATED WORK
Variants:
TA-Sorted: lists are always accessed sequentially and no random accesses are performed.
PREFER [Hristidis et al.]: stores multiple copies of R and answers a new query using only the one copy that is closest to that query.
9
PRELIMINARIES
Consider a relation R with m numeric attributes (X1, X2, …, Xm).
Dom_i = [lb_i, ub_i] is the domain of the i-th attribute.
A tuple t is viewed as a numeric vector t = (t[1], t[2], …, t[m]).
Top-k ranking queries in SQL-like syntax:
SELECT TOP [k] FROM R WHERE Range_Q ORDER BY Score_Q
A query is expressed as a triple Q = (Score_Q, k, Range_Q).
Score_Q: function that assigns a numeric score to any tuple t.
Range_Q: Boolean function that defines a selection condition on the tuples of R.
The semantics require the system to retrieve the k tuples with the top scores among those satisfying the selection condition.
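The triple Q = (Score_Q, k, Range_Q) maps naturally onto a small data structure. The following is a brute-force sketch of the query semantics only (score every qualifying tuple, keep the k best); the names `TopKQuery`, `answer` and `accept_all` are hypothetical, and the rows are an assumed subset of the deck's example relation.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

def accept_all(t: Sequence[float]) -> bool:
    """Range condition '*': every tuple qualifies."""
    return True

@dataclass
class TopKQuery:
    score: Callable[[Sequence[float]], float]            # Score_Q
    k: int                                                # number of tuples wanted
    in_range: Callable[[Sequence[float]], bool] = accept_all  # Range_Q

def answer(query, rows):
    """Return the k (tid, score) pairs with the highest scores among qualifying tuples."""
    scored = [(tid, query.score(t)) for tid, t in rows.items() if query.in_range(t)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[: query.k]

# Assumption: rows of the example relation whose values are unambiguous.
ROWS = {
    2: (53, 19, 83), 4: (80, 22, 90), 5: (28, 8, 87), 6: (12, 55, 82),
    7: (16, 99, 42), 8: (18, 42, 67), 10: (23, 21, 88),
}

def f3(t):
    return 3 * t[0] + 10 * t[1] + 5 * t[2]

unrestricted = answer(TopKQuery(f3, 2), ROWS)
restricted = answer(TopKQuery(f3, 2, lambda t: t[0] >= 20), ROWS)  # illustrative range: X1 >= 20
```

The range condition X1 ≥ 20 is made up for illustration; it shows that the selection is applied before ranking.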
10
PRELIMINARIES
Materialized ranking view (V): the materialized result of a previously executed top-k query Q, with the tuples ordered according to the scoring function Score_Q.
For a query Q' = (Score_Q', k', Range_Q'), the corresponding materialized ranking view V' is a set of k' pairs (tid, Score_Q'(tid)), ordered by decreasing value of Score_Q'(tid).
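As a small illustration of this definition, the sketch below materializes a ranking view as a list of (tid, score) pairs in decreasing score order. `materialize_view` is a hypothetical helper name, and the rows are an assumed subset of the deck's example relation; the weight vectors correspond to f1 = 2x1 + 5x2 and f2 = x2 + 2x3 from the introduction.

```python
# Assumption: rows of the example relation whose values are unambiguous.
ROWS = {
    2: (53, 19, 83), 4: (80, 22, 90), 5: (28, 8, 87), 6: (12, 55, 82),
    7: (16, 99, 42), 8: (18, 42, 67), 10: (23, 21, 88),
}

def materialize_view(rows, weights, k):
    """Return the top-k (tid, score) pairs under an additive scoring function."""
    scored = [(tid, sum(w * x for w, x in zip(weights, t))) for tid, t in rows.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)   # decreasing score
    return scored[:k]

V1 = materialize_view(ROWS, (2, 5, 0), k=5)   # f1 = 2x1 + 5x2, top-5
V2 = materialize_view(ROWS, (0, 1, 2), k=3)   # f2 = x2 + 2x3, top-3
```

Note that a view stores only tids and scores, not attribute values, which is why answering a different query from it needs random access to R.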
11
PROBLEMS
Problem 1: TOP-k QUERY ANSWERING USING VIEWS
Given a set U of views and a query Q, obtain an answer to Q by combining all the information conveyed by the views in U.
Solution: an algorithm named LPTA.
Problem 2: VIEW SELECTION
Given a collection of views V = {V_1, V_2, …, V_r} that includes the base views (thus r ≥ m) and a query Q, determine the most efficient subset U ⊆ V on which to execute Q. This subset U is provided as input to LPTA.
The selected views should be able to answer the query and, if possible, answer it faster than running TA on the base views.
12
ALGORITHM LPTA
An adaptation of TA, in the sense that it answers top-k queries using multiple ranking views.
Requires the scoring functions of the query and of the views to be linear and additive.
Uses sorted access on each view's (tid, score) pairs.
Views and queries are of the form V' = (Score_V', n, *) and Q = (Score_Q, k, *), respectively.
Covered next: pseudo code, example, general approach.
13
ALGORITHM LPTA
Pseudo code
1. Initialize the top-k buffer to empty.
2. Retrieve the tids from the views V1 and V2 in lock-step fashion, in order of decreasing score.
3. Retrieve the corresponding tuple by random access on R.
4. Compute its score according to f3 and update the top-k buffer so that it contains the largest scores.
5. Check the stopping condition.
6. Once the stopping condition is satisfied, the results are in the top-k buffer.
14
ALGORITHM LPTA
Stopping condition:
After the d-th iteration, let the tuples read from V1 and V2 be (tid_1^d, s_1^d) and (tid_2^d, s_2^d), and let the minimum score in the top-k buffer be top-k_min.
At this point the unseen tuples have to satisfy the following inequalities (the domain of each attribute of R is [1, 100]):
0 ≤ x1, x2, x3 ≤ 100
2x1 + 5x2 ≤ s_1^d
x2 + 2x3 ≤ s_2^d
These inequalities define a convex region in 3-d space.
unseen_max is the solution of the linear program that maximizes f3 = 3x1 + 10x2 + 5x3 over this region.
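For this particular pair of views the linear program can be checked by hand, because the query function happens to be a weighted sum of the two view functions. The derivation below is a sanity check for the example only, not a general method; the values s_1^d = 299 and s_2^d = 202 are the second entries of V1 and V2 in the running example.

```latex
% f3 as a non-negative combination of the view functions f1 and f2:
3x_1 + 10x_2 + 5x_3 \;=\; 1.5\,(2x_1 + 5x_2) \;+\; 2.5\,(x_2 + 2x_3)

% Hence every point that satisfies both view constraints obeys
f_3(x) \;\le\; 1.5\, s_1^d + 2.5\, s_2^d

% With s_1^d = 299 and s_2^d = 202:
1.5 \cdot 299 + 2.5 \cdot 202 \;=\; 448.5 + 505 \;=\; 953.5

% The bound is attained inside the box 0 \le x_i \le 100, e.g. at
(x_1, x_2, x_3) = (100,\ 19.8,\ 91.1):\quad
2 \cdot 100 + 5 \cdot 19.8 = 299,\qquad 19.8 + 2 \cdot 91.1 = 202
```

Because a feasible point makes both view constraints tight, the upper bound is exact, so unseen_max = 953.5 for these two scores.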
15
ALGORITHM LPTA
Example (top-k query answering using views), iteration 1
Views: V1 (top-5 query, f1 = 2x1 + 5x2) and V2 (top-3 query, f2 = x2 + 2x3). Query = (f3, k, *) with f3 = 3x1 + 10x2 + 5x3 and k = 2.
Sorted access: (7, 527) is read from V1 and (6, 219) from V2.
Random access on R: tuple 7 = (16, 99, 42) and tuple 6 = (12, 55, 82).
Scores under f3: {(tid_i^d, s_i^d)} = {(7, 1248), (6, 996)}.
Top-2 buffer: (7, 1248), (6, 996).
The linear programming solution with s_1^d = 527 and s_2^d = 219 gives unseen_max = 1338 (the slide prints 1388, but 1.5·527 + 2.5·219 = 1338). Since unseen_max > top-k_min = 996, the algorithm continues.
16
ALGORITHM LPTA
Example (top-k query answering using views), iteration 2
Views: V1 (top-5 query, f1 = 2x1 + 5x2) and V2 (top-3 query, f2 = x2 + 2x3). Query = (f3, k, *) with f3 = 3x1 + 10x2 + 5x3 and k = 2.
Top-2 buffer before the iteration: (7, 1248), (6, 996).
Sorted access: (6, 299) is read from V1 and (4, 202) from V2.
Random access on R: tuple 6 = (12, 55, 82) and tuple 4 = (80, 22, 90).
Scores under f3: {(tid_i^d, s_i^d)} = {(6, 996), (4, 910)}.
The linear programming solution with s_1^d = 299 and s_2^d = 202 gives unseen_max = 953.5 ≤ top-k_min, so the stopping condition holds.
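The two iterations above can be replayed with a short script. This is a sketch under stated assumptions, not the authors' implementation: the function names `solve_square`, `unseen_max` and `lpta` are made up, the linear program is solved by brute-force vertex enumeration (adequate only for a handful of variables and views, and swapped in here to avoid depending on an LP library), the attribute bounds [0, 100] follow the slide, and the rows are an assumed subset of the example relation.

```python
from itertools import combinations

def solve_square(a, b):
    """Solve a square linear system by Gauss-Jordan elimination; None if singular."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return None
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def unseen_max(query_w, view_ws, last_scores, lo=0.0, hi=100.0):
    """Maximize query_w . x subject to view_w . x <= last score and lo <= x_i <= hi."""
    n = len(query_w)
    cons = [(list(w), s) for w, s in zip(view_ws, last_scores)]
    for i in range(n):
        unit = [1.0 if j == i else 0.0 for j in range(n)]
        cons.append((unit, hi))                      # x_i <= hi
        cons.append(([-u for u in unit], -lo))       # x_i >= lo
    best = None
    for combo in combinations(cons, n):              # candidate vertices of the polytope
        x = solve_square([c[0] for c in combo], [c[1] for c in combo])
        if x is None:
            continue
        if all(sum(a * v for a, v in zip(row, x)) <= rhs + 1e-9 for row, rhs in cons):
            value = sum(w * v for w, v in zip(query_w, x))
            best = value if best is None else max(best, value)
    return best

def lpta(rows, views, query_w, k):
    """views: list of (weights, [(tid, score), ...]) in decreasing score order."""
    seen = {}
    depth = min(len(entries) for _, entries in views)
    for d in range(depth):
        last = []
        for _, entries in views:                     # lock-step sorted access
            tid, s = entries[d]
            last.append(s)
            seen[tid] = sum(w * x for w, x in zip(query_w, rows[tid]))  # random access on R
        top = sorted(seen.items(), key=lambda kv: kv[1], reverse=True)[:k]
        bound = unseen_max(query_w, [w for w, _ in views], last)
        if len(top) == k and bound <= top[-1][1]:    # unseen_max <= top-k_min
            return top, d + 1
    return None, depth                               # views exhausted without a guarantee

# Assumption: rows of the example relation whose values are unambiguous.
ROWS = {
    2: (53, 19, 83), 4: (80, 22, 90), 5: (28, 8, 87), 6: (12, 55, 82),
    7: (16, 99, 42), 8: (18, 42, 67), 10: (23, 21, 88),
}
VIEWS = [
    ((2, 5, 0), [(7, 527), (6, 299), (4, 270), (8, 246), (2, 201)]),  # V1
    ((0, 1, 2), [(6, 219), (4, 202), (10, 197)]),                     # V2
]
result, iterations = lpta(ROWS, VIEWS, (3, 10, 5), k=2)
```

On this data the sketch returns tids 7 and 6 after two lock-step iterations. A production version would use a real LP solver and an incremental top-k buffer rather than re-sorting all seen tuples.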
17
ALGORITHM LPTA
[Figure: LPTA on a two-attribute relation R(X1, X2) for a top-1 query Q. V1 and V2 are lists of (tid, score) pairs, (tid_1^1, s_1^1) … (tid_1^5, s_1^5) and (tid_2^1, s_2^1) … (tid_2^5, s_2^5). The plot shows the unit square O=(0,0), P=(1,0), R=(1,1), T=(0,1) with axes X1 and X2, the directions of V1, V2 and Q, the first entries tid_1^1 and tid_2^1, and the stopping condition.]
18
ALGORITHM LPTA
General approach
View1 (V1): f_V1 = 2x1 + 5x2. View2 (V2): f_V2 = x2 + 2x3. Query Q: f_Q = 3x1 + 10x2 + 5x3.
At iteration d, the pairs (tid_1^d, s_1^d) and (tid_2^d, s_2^d) are read from V1 and V2.
Constraints on the unseen tuples:
0 ≤ x1, x2, x3 ≤ 100
2x1 + 5x2 ≤ s_1^d
x2 + 2x3 ≤ s_2^d
Stop when unseen_max ≤ top-k_min.
19
ALGORITHM LPTA
[Figure: the same top-1 example on R(X1, X2) after the second sorted access. The entries tid_1^2 and tid_2^2 are read from V1 and V2; the plot shows the unit square O=(0,0), P=(1,0), R=(1,1), T=(0,1) with tid_1^1, tid_2^1, the directions of V1, V2 and Q, and the stopping condition.]
20
TA VS LPTA
LPTA essentially becomes TA when the set of views U is equal to the set of base views.
In terms of execution cost, both perform sequential as well as random accesses.
Execution efficiency: I/O operations play the dominant role; they overshadow the cost of CPU operations such as updating the top-k buffer and testing the stopping condition.
The two kinds of access are highly correlated: every sequential access incurs a random access.
Determining factor: if d is the number of lock-step iterations and r is the number of views, the running cost is O(dr).
21
VIEW SELECTION
Given a collection of views V = {V_1, V_2, …, V_r} that includes the base views, determine the most efficient subset U ⊆ V on which to execute the query Q.
Conceptual discussion:
View selection in two dimensions
View selection in higher dimensions
22
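VIEW SELECTION 2D
[Figure: view selection in two dimensions. The plot shows the unit square O=(0,0), P=(1,0), R=(1,1), T=(0,1) with axes X and Y, the view vectors V1 and V2, the query vector Q, the points A, A1, A'1, A'2, B, B2, B'1, B'2 and M, and the min top-k tuple.]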
23
VIEW SELECTION HD
For a set of views V = {V_1, V_2, …, V_r} over an m-dimensional dataset and a query Q, the optimal execution of LPTA requires the use of a subset of the views U ⊆ V such that |U| ≤ m.
24
COST ESTIMATION
Compute histograms representing the distribution of scores along each view in U.
Estimate top-k_min from H_Q by determining the bucket that contains the k-th highest tuple.
Walk down these histograms until the stopping condition is reached, checking the stopping condition by linear programming.
When unseen_max < top-k_min, perform a logarithmic search within the last bucket.
Number of sorted accesses: ((d-1)n/b + n')r'.
Running time of the algorithm: O((d-1) + log n').
25
SELECT VIEWS
Initialize MinCost = ∞ and U = { }.
In each round, set MinCurCost = ∞ and consider every view V ∈ V − U: compare the cost estimate for V with MinCurCost; if EstimateCost < MinCurCost, set MinV = V and MinCurCost = EstimateCost of V.
After all candidate views have been considered, if MinCurCost < MinCost, MinV is added to U and MinCost becomes MinCurCost.
This is repeated for each of the m attributes considered.
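The greedy loop described above can be sketched as follows. The cost estimator is the part the deck delegates to histogram-based estimation; here it is replaced by a hypothetical lookup table `TOY_COSTS` whose numbers are invented purely to exercise the control flow, and `select_views_greedy` is an illustrative name.

```python
INF = float("inf")

def select_views_greedy(views, m, estimate_cost):
    """Greedily grow U, one view per round, for at most m rounds.

    estimate_cost(list_of_views) is assumed to return the estimated cost
    of running LPTA on that set of views.
    """
    selected = []
    min_cost = INF
    for _ in range(m):
        min_cur_cost, min_v = INF, None
        for v in views:
            if v in selected:
                continue
            cost = estimate_cost(selected + [v])
            if cost < min_cur_cost:
                min_cur_cost, min_v = cost, v
        if min_v is None or min_cur_cost >= min_cost:
            break                      # no candidate improves on the current set
        selected.append(min_v)
        min_cost = min_cur_cost
    return selected, min_cost

# Hypothetical cost estimates (made-up numbers, not measurements).
TOY_COSTS = {
    frozenset({"V1"}): 40, frozenset({"V2"}): 55, frozenset({"X1"}): 90,
    frozenset({"X2"}): 60, frozenset({"X3"}): 80,
    frozenset({"V1", "V2"}): 12, frozenset({"V1", "X1"}): 38,
    frozenset({"V1", "X2"}): 30, frozenset({"V1", "X3"}): 25,
    frozenset({"V1", "V2", "X1"}): 15, frozenset({"V1", "V2", "X2"}): 14,
    frozenset({"V1", "V2", "X3"}): 16,
}

def toy_estimate(subset):
    return TOY_COSTS.get(frozenset(subset), INF)

chosen, chosen_cost = select_views_greedy(["V1", "V2", "X1", "X2", "X3"], 3, toy_estimate)
```

With these toy numbers the loop picks V1, then V2, and stops in the third round because no third view lowers the estimated cost.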
26
SELECT VIEWS
Select Views(Q, V), exhaustive: estimates the cost of all possible (r choose p) subsets of V and selects the one with minimum cost.
Simple greedy heuristic: iterates over the set of views and, at each step, selects the view that reduces the total cost by the greatest amount.
27
SELECT VIEWS
Select Views Spherical(Q, V): has to solve the linear program just once and is very effective for highly restrictive data sets.
Select Views By Angle: sorts the view vectors by increasing angle with the query vector and returns the top-m views.
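The angle-based heuristic is simple enough to sketch directly. The example below is illustrative only: `select_views_by_angle` is a made-up name, each view is represented by its weight vector, and the candidate set assumes the two example views plus the three base views (one per attribute).

```python
import math

def angle(u, v):
    """Angle in radians between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))  # clamp rounding noise
    return math.acos(cosine)

def select_views_by_angle(query_w, view_vectors, m):
    """Return the names of the m views whose vectors are closest in angle to the query."""
    ranked = sorted(view_vectors, key=lambda name: angle(view_vectors[name], query_w))
    return ranked[:m]

# Assumed candidates: example views V1, V2 and the base views X1, X2, X3.
CANDIDATES = {
    "V1": (2, 5, 0),   # f1 = 2x1 + 5x2
    "V2": (0, 1, 2),   # f2 = x2 + 2x3
    "X1": (1, 0, 0),
    "X2": (0, 1, 0),
    "X3": (0, 0, 1),
}
by_angle = select_views_by_angle((3, 10, 5), CANDIDATES, m=3)
```

For the query vector of f3 = 3x1 + 10x2 + 5x3, the smallest angles belong to V1, the base view on X2, and V2, in that order.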
28
MORE GENERAL QUERIES & VIEWS
Views that only materialize their top-k tuples: truncate the histograms.
Accommodating range conditions: select the views that cover the range conditions, and truncate each attribute's histogram.
29
EXPERIMENTAL RESULTS
[Figures: real data, performance comparison of PREFER, LPTA and TA, in 2-d and 3-d.]
30
REFERENCES
Gautam Das, Dimitrios Gunopulos, Nick Koudas. Answering Top-k Queries Using Views.
aitrc.kaist.ac.kr/~vldb06/slides/R13-1.ppt