Seung-won Hwang, Kevin Chen-Chuan Chang


1 Seung-won Hwang, Kevin Chen-Chuan Chang
Optimizing Access Cost for Top-k Queries over Web Sources: A Unified Cost-based Approach
Seung-won Hwang, Kevin Chen-Chuan Chang

2 Problem: Web “Middleware” Top-k Query Processing
To evaluate each predicate pi, source Si provides:
Sorted access, e.g., returning the restaurant with the next-highest rating.
Random access for each object uj, e.g., returning the rating for a specific restaurant uj.
[Figure: a middleware top-k algorithm with F = min(p1, p2) queries S1: dineme.com (p1: rating) and S2: superpages.com (p2: close), and returns the top k results v1: F[v1], ..., vk: F[vk].]
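The two access types can be illustrated with a minimal sketch. The `Source` class, its method names, and the sample scores below are hypothetical stand-ins for a Web source, not an interface from the presentation; scores are assumed to lie in [0, 1].

```python
from typing import Dict, Tuple


class Source:
    """Hypothetical in-memory stand-in for a Web source scoring one predicate."""

    def __init__(self, scores: Dict[str, float]) -> None:
        self._scores = dict(scores)
        # Objects ordered by descending score; ties broken by object id.
        self._ranked = sorted(self._scores.items(), key=lambda kv: (-kv[1], kv[0]))
        self._cursor = 0

    def sorted_access(self) -> Tuple[str, float]:
        """Return the object with the next-highest score, and that score."""
        if self._cursor >= len(self._ranked):
            raise IndexError("source exhausted")
        item = self._ranked[self._cursor]
        self._cursor += 1
        return item

    def random_access(self, obj: str) -> float:
        """Return the score of one specific object."""
        return self._scores[obj]


# Made-up scores for three restaurants on two predicates.
rating = Source({"a": 0.9, "b": 0.6, "c": 0.8})
close = Source({"a": 0.5, "b": 0.7, "c": 0.4})

# One sorted access on "rating", then one random access on "close",
# combined with F = min as in the slide's example query.
first_obj, first_rating = rating.sorted_access()
overall = min(first_rating, close.random_access(first_obj))
```

Here a single sorted access finds the best-rated restaurant, and a single random access completes its score under F = min.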

3 Goal: Minimizing Access Cost
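Various cost scenarios, where si and ri are the sorted-access and random-access costs for predicate pi:
Scenario 1: dineme.com (p1: rating): s1 = 32 ms, r1 = 700 ms; superpages.com (p2: close): s2 = 344 ms, r2 = 1400 ms.
Scenario 2: hotels.com (p1: rating, p2: close, p3: cheap): s1, s2, s3 = 44 ms; r1, r2, r3 = 0 ms.
Cost model: aggregate cost of all predicate accesses.
Goal: minimizing the access cost.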
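A minimal sketch of the aggregate cost model, assuming the total is the sum over predicates of (number of sorted accesses times sorted-access cost) plus (number of random accesses times random-access cost). The unit costs are the slide's figures, read as one scenario with two sources and one scenario where random access is free; the access counts and all names are hypothetical.

```python
from typing import Dict, Tuple

# predicate -> (sorted-access cost, random-access cost), in milliseconds.
# Grouping of the slide's numbers into two scenarios is an assumption.
TWO_SOURCE_SCENARIO: Dict[str, Tuple[int, int]] = {"p1": (32, 700), "p2": (344, 1400)}
FREE_RANDOM_SCENARIO: Dict[str, Tuple[int, int]] = {"p1": (44, 0), "p2": (44, 0), "p3": (44, 0)}


def access_cost(
    unit_costs: Dict[str, Tuple[int, int]],
    counts: Dict[str, Tuple[int, int]],
) -> int:
    """Aggregate cost of all predicate accesses.

    counts maps predicate -> (number of sorted accesses, number of random accesses).
    """
    total = 0
    for predicate, (n_sorted, n_random) in counts.items():
        sorted_cost, random_cost = unit_costs[predicate]
        total += n_sorted * sorted_cost + n_random * random_cost
    return total


# Hypothetical access counts for some algorithm run.
counts = {"p1": (10, 2), "p2": (5, 1)}
two_source_total = access_cost(TWO_SOURCE_SCENARIO, counts)  # 4840 ms
free_random_total = access_cost(FREE_RANDOM_SCENARIO, counts)  # 660 ms
```

The same access counts cost very different amounts under the two scenarios, which is why a fixed algorithm cannot be cheapest everywhere.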

4 Beyond State-of-the-art: How to be General and Adaptive?
Current state of the art: fixed algorithms, each for a specific scenario.
Existing algorithms by sorted-access cost s and random-access cost r:
s = 1 (cheap): r = 1 (cheap): FA, TA, QuickCombine; r = h (expensive): CA, SR-Combine; r = ∞ (impossible): NRA, StreamCombine.
s = h (expensive): r = 1 (cheap): FA, TA, QuickCombine; r = h (expensive): none listed; r = ∞ (impossible): NRA, StreamCombine.
s = ∞ (impossible): r = 1 (cheap): TAz, MPro, Upper; r = h (expensive): TAz, MPro, Upper; r = ∞ (impossible): none listed.
The space is not complete in two senses: there are missing algorithms in this space, and there are unmodeled scenarios in this space.

5 Solution: A Cost-based Approach
Cost-based optimization: finding the optimal algorithm, with minimum cost, from a space of algorithms.
General across a wide range of scenarios: one "algorithm" for all.
Adaptive to the specific scenario at run time.
Truly optimal (in principle).

6 Challenges: Enabling Cost-based Optimization
Challenge #1: Defining the algorithm space.
Analogy: SQL queries are composed of logical operators that are scheduled into a query plan.
What are such "logical tasks" for top-k queries, as the building blocks of an algorithm space?
Challenge #2: Searching for Mopt.
Analogy: SQL queries are optimized with systematic heuristics (e.g., left-deep joins) and search schemes (e.g., dynamic programming).
What are efficient search schemes for top-k queries?

7 Challenge #1: Defining Algorithm Space
Basis: a view of logical tasks. For every object ui, any algorithm must satisfy logical task wi:
If ui is in the top-k: wi must compute the exact score.
Otherwise: wi must indicate (by some partial scores) that the score will be less than the lowest top-k score.
How to define an algorithm space? How to identify unsatisfied tasks?
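The task condition can be sketched as a simple check. This is an illustration of the condition as stated on the slide, not the authors' procedure for identifying unsatisfied tasks at run time. It assumes scores in [0, 1], a monotone scoring function, and that the lowest top-k score is already known; all names are hypothetical.

```python
from typing import Callable, List, Optional, Sequence

ScoreFn = Callable[[List[float]], float]


def upper_bound(
    partial: Sequence[Optional[float]], score_fn: ScoreFn, ceiling: float = 1.0
) -> float:
    """Best score the object could still reach.

    Unknown predicate scores (None) are replaced by the ceiling; this is a
    valid bound only if score_fn is monotone in every predicate.
    """
    return score_fn([ceiling if value is None else value for value in partial])


def task_satisfied(
    partial: Sequence[Optional[float]], score_fn: ScoreFn, lowest_topk_score: float
) -> bool:
    """True if the object's logical task needs no further accesses."""
    if all(value is not None for value in partial):
        return True  # exact score is known
    # Partial scores already show the object falls below the top-k.
    return upper_bound(partial, score_fn) < lowest_topk_score


# With F = min and a lowest top-k score of 0.6:
needs_work = not task_satisfied([0.9, None], min, 0.6)  # could still reach 0.9
pruned = task_satisfied([0.5, None], min, 0.6)          # at most 0.5
complete = task_satisfied([0.7, 0.8], min, 0.6)         # fully evaluated
```

Only the first object would require another access; an algorithm that schedules accesses solely for such objects does no wasted work.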

8 Challenge #2: Searching for Mopt
Space reduction by "systematic" heuristics:
S-then-R: for each predicate pi, perform sorted accesses first, to depth di, before any random accesses.
Global schedule: follow the same schedule H for the random accesses of every object.
Cost estimation:
Sampling (getting "statistics"): sample a representative subset from the DB.
Simulation (getting overall costs): simulate query plans on the sample to estimate their costs.
Dynamic search over different query plans: hill-climbing and query-driven strategies.
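The hill-climbing search over sorted-access depths can be sketched as follows. This is a generic steepest-descent hill climber, not the authors' implementation; `estimate_cost` stands for the sample-based simulation and is replaced here by a made-up toy cost function so the example is self-contained.

```python
from typing import Callable, Tuple

Depths = Tuple[int, ...]


def hill_climb(
    estimate_cost: Callable[[Depths], float],
    start: Depths,
    max_depth: int,
    step: int = 1,
) -> Tuple[Depths, float]:
    """Move to the cheapest neighboring depth vector until none is cheaper."""
    current, current_cost = start, estimate_cost(start)
    while True:
        best, best_cost = current, current_cost
        for i in range(len(current)):
            for delta in (-step, step):
                depth = current[i] + delta
                if 0 <= depth <= max_depth:
                    candidate = current[:i] + (depth,) + current[i + 1:]
                    cost = estimate_cost(candidate)
                    if cost < best_cost:
                        best, best_cost = candidate, cost
        if best == current:
            return current, current_cost  # local minimum reached
        current, current_cost = best, best_cost


def toy_cost(depths: Depths) -> float:
    """Hypothetical stand-in for simulating a plan on a sample."""
    d1, d2 = depths
    return (d1 - 6) ** 2 + 2 * (d2 - 3) ** 2 + 10


best_depths, best_cost = hill_climb(toy_cost, start=(0, 0), max_depth=10)
```

On this toy cost the search settles at depths (6, 3). With a real simulated cost, hill climbing finds only a local minimum, so the starting point matters.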

9 Contribution: Unification and Contrast
Unification: for a symmetric function, e.g., avg(p1, p2), the NC framework behaves similarly to TA.
Contrast: for an asymmetric function, e.g., min(p1, p2), NC adapts with different behaviors and outperforms TA.
[Figure: two cost plots over depth into p1 and depth into p2, with points labeled N and T.]
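For reference, the TA baseline named above can be sketched as below. This is a simplified version of the standard Threshold Algorithm, not the NC framework, and it assumes every object appears in every list and that the scoring function is monotone. The function name, the access counters, and the sample data are hypothetical.

```python
import heapq
from typing import Callable, Dict, List, Tuple


def threshold_algorithm(
    lists: List[Dict[str, float]],
    score_fn: Callable[[List[float]], float],
    k: int,
) -> Tuple[List[Tuple[str, float]], int, int]:
    """Return (top-k objects with scores, sorted accesses, random accesses)."""
    ranked = [sorted(l.items(), key=lambda kv: (-kv[1], kv[0])) for l in lists]
    seen: Dict[str, float] = {}
    n_sorted = n_random = 0
    for depth in range(len(ranked[0])):
        last_seen = []
        for i, ranking in enumerate(ranked):
            obj, score = ranking[depth]  # sorted access on predicate i
            n_sorted += 1
            last_seen.append(score)
            if obj not in seen:
                values = []
                for j, other in enumerate(lists):
                    if j == i:
                        values.append(score)
                    else:
                        values.append(other[obj])  # random access on predicate j
                        n_random += 1
                seen[obj] = score_fn(values)
        top = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
        # No unseen object can score above F(last scores seen in each list).
        threshold = score_fn(last_seen)
        if len(top) == k and top[-1][1] >= threshold:
            break
    return top, n_sorted, n_random


rating = {"a": 0.9, "b": 0.6, "c": 0.8}
close = {"a": 0.5, "b": 0.7, "c": 0.4}
top, n_sorted, n_random = threshold_algorithm([rating, close], min, k=1)
```

On this made-up data TA returns object b after 4 sorted and 3 random accesses. It descends all lists at equal depth regardless of access costs, which is the fixed behavior the cost-based approach is contrasted with.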

10 Contribution: Generality and Adaptivity
Over 1000 random configurations:
For unstudied scenarios (74%), NC generalizes with significantly better performance.
For existing scenarios (26%), NC adapts to behave similarly to the specialized algorithms.
[Figure: results for existing scenarios and unstudied scenarios.]

11 Conclusion: Summary
For a general and adaptive optimization of top-k queries, we developed:
Key insight: abstracting a top-k query as a task scheduling problem.
Algorithm space for top-k queries: defining an algorithm space that considers only algorithms that schedule unsatisfied tasks.
Dynamic search schemes: identifying efficient search schemes for top-k queries.

12 Thank You! For more information:
The AIM Project:

