Seung-won Hwang, Kevin Chen-Chuan Chang


Optimizing Access Cost for Top-k Queries over Web Sources: A Unified Cost-based Approach

Problem: Web “Middleware” Top-k Query Processing
To evaluate each predicate pi, source Si provides:
- sorted access, e.g., returning the restaurant with the next highest rating;
- random access for each object uj, e.g., returning the rating of a specific restaurant uj.
[Figure: a middleware top-k algorithm with scoring function F = min(p1, p2) accesses S1 (dineme.com, p1: rating) and S2 (superpages.com, p2: close) and returns the top k objects v1, ..., vk with their scores F[v1], ..., F[vk].]
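A minimal sketch of the two access types, assuming each source can be modeled as an in-memory map from object ids to scores in [0, 1]. The class `WebSource`, its methods, the restaurant ids, and the scores are all hypothetical and stand in for real Web sources such as dineme.com and superpages.com.

```python
from dataclasses import dataclass, field


@dataclass
class WebSource:
    """Hypothetical in-memory stand-in for a Web source S_i that scores one predicate p_i."""
    name: str
    scores: dict  # object id -> score of p_i, assumed to lie in [0, 1]
    _cursor: int = field(default=0, init=False)

    def __post_init__(self):
        # Sorted-access order: objects by descending score.
        self._ranked = sorted(self.scores, key=self.scores.get, reverse=True)

    def sorted_access(self):
        """Return the next (object, score) pair in descending score order, or None when exhausted."""
        if self._cursor >= len(self._ranked):
            return None
        u = self._ranked[self._cursor]
        self._cursor += 1
        return u, self.scores[u]

    def random_access(self, u):
        """Return the score of p_i for one specific object u."""
        return self.scores[u]


# Illustrative data only: p1 = rating, p2 = close, combined with F = min(p1, p2).
rating = WebSource("dineme.com", {"r1": 0.9, "r2": 0.6, "r3": 0.8})
close = WebSource("superpages.com", {"r1": 0.3, "r2": 0.7, "r3": 0.9})
u, p1 = rating.sorted_access()           # highest-rated restaurant first
score = min(p1, close.random_access(u))  # probe p2 for that same restaurant
```

Here the top-rated restaurant `r1` is discovered by sorted access on p1, and its p2 score is then fetched by random access, giving F = min(0.9, 0.3) = 0.3.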

Goal: Minimizing Access Cost
Cost scenarios vary across sources. Cost model: the aggregate cost of all predicate accesses. Goal: minimizing this access cost.
[Figure: two example scenarios. (1) dineme.com (p1: rating) with s1 = 32 ms and r1 = 700 ms, and superpages.com (p2: close) with s2 = 344 ms and r2 = 1400 ms. (2) hotels.com serving p1: rating, p2: close, and p3: cheap, with s1, s2, s3 = 44 ms and r1, r2, r3 = 0 ms.]
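The aggregate cost model can be written as a sum over predicates of (number of sorted accesses × si) + (number of random accesses × ri). The sketch below assumes si and ri are per-access costs in milliseconds, taken from the dineme.com / superpages.com scenario; the access counts are hypothetical and chosen only to show how differently two patterns can cost.

```python
def access_cost(sorted_counts, random_counts, s, r):
    """Aggregate access cost: sum over predicates i of (#sorted_i * s_i + #random_i * r_i)."""
    return sum(ns * si + nr * ri
               for ns, nr, si, ri in zip(sorted_counts, random_counts, s, r))


# Unit costs (ms) for the dineme.com (p1) / superpages.com (p2) scenario.
s = [32, 344]
r = [700, 1400]

# Two hypothetical access patterns, each with 20 accesses in total.
probe_heavy = access_cost([10, 0], [0, 10], s, r)  # sorted on p1, random on p2
sorted_only = access_cost([10, 10], [0, 0], s, r)  # sorted on both predicates
```

With the same number of accesses, the probe-heavy pattern costs 14320 ms while the sorted-only pattern costs 3760 ms. In the hotels.com scenario, where every ri is 0 ms, the ranking of the two patterns would flip, which is why no single fixed access pattern suits every scenario.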

Beyond the State of the Art: How to Be General and Adaptive?
Current state of the art: fixed algorithms, each designed for a specific scenario. Sorted-access cost s and random-access cost r range over 1 (cheap), h (expensive), and ∞ (impossible):
- s = 1 (cheap): FA, TA, QuickCombine (r = 1); CA, SR-Combine (r = h); NRA, StreamCombine (r = ∞)
- s = h (expensive): FA, TA, QuickCombine; NRA, StreamCombine; TAz, MPro, Upper
- s = ∞ (impossible): TAz, MPro, Upper
The space is not complete in two senses: there are missing algorithms in this space, and there are unmodeled scenarios in this space.

Solution: A Cost-based Approach
Cost-based optimization: find the optimal algorithm, the one with minimum cost, from an algorithm space.
- General across a wide range of scenarios: one “algorithm” for all.
- Adaptive to the specific scenario at run time.
- Truly optimal (in principle).

Challenges: Enabling Cost-based Optimization
Challenge #1: Defining the algorithm space.
Analogy: SQL queries are composed of logical operators, which are scheduled into a query plan.
What are the corresponding “logical tasks” for top-k queries, serving as the building blocks of the algorithm space?
Challenge #2: Searching for M_opt.
Analogy: SQL queries are optimized with systematic heuristics (e.g., left-deep joins) and search schemes (e.g., dynamic programming).
What are efficient search schemes for top-k queries?

Challenge #1: Defining the Algorithm Space
Basis: a view of logical tasks. For every object ui, any algorithm must satisfy a logical task wi:
- if ui is in the top k, wi must compute its exact score;
- otherwise, wi must show (through some partial scores) that its score is below the lowest top-k score.
Open questions: How do we define an algorithm space? How do we identify unsatisfied tasks?
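One way to spot tasks that are provably still open is sketched below, assuming a monotone scoring function F, a fully known candidate set (no unseen objects), and a per-predicate ceiling on any score not yet read. The helper names `upper_bound` and `unsatisfied_tasks` are hypothetical, not the paper's code. The reasoning: if an incomplete object still ranks in the top k by upper bound, its exact score is unknown and, at the same time, its upper bound cannot be below the lowest top-k score, so its task wi is unsatisfied either way.

```python
def upper_bound(known, ceilings, F=min):
    """Best possible score of an object: unknown predicates are filled with their ceilings."""
    return F([known.get(i, c) for i, c in enumerate(ceilings)])


def unsatisfied_tasks(objects, ceilings, k, F=min):
    """Return objects whose task w_i is provably not yet satisfied.

    objects:  object id -> {predicate index: known score}
    ceilings: per predicate, the highest score any unread value could still have
    Assumes the candidate set is complete and F is monotone; ties are ignored.
    """
    m = len(ceilings)
    ranked = sorted(objects, key=lambda u: upper_bound(objects[u], ceilings, F),
                    reverse=True)
    # An incomplete object in the top k by upper bound can neither be
    # confirmed as a top-k answer nor ruled out, so its task is still open.
    return [u for u in ranked[:k] if len(objects[u]) < m]


# Illustrative state: "b" is missing p2, so its bound uses the p2 ceiling.
state = {"a": {0: 0.9, 1: 0.8}, "b": {0: 0.7}, "c": {0: 0.5, 1: 0.95}}
```

With k = 2, the upper bounds are 0.8 for a, 0.7 for b, and 0.5 for c, so `unsatisfied_tasks(state, [1.0, 1.0], 2)` reports only b; with k = 1 the single leading object a is already complete, so nothing is reported.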

Challenge #2: Searching for M_opt
Space reduction by “systematic” heuristics:
- S-then-R: for each predicate pi, perform sorted accesses first, down to depth di, before any random access.
- Global schedule: follow the same schedule H for the random accesses of every object.
Cost estimation:
- Sampling (getting “statistics”): sample a representative subset of the database.
- Simulation (getting overall costs): simulate query plans on the sample to estimate their costs.
Dynamic search over different query plans: hill-climbing and query-driven strategies.
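A rough sketch of how an S-then-R plan (depths d_i plus a global schedule H) could be simulated on a sample and then tuned by hill-climbing. Assumptions: scores lie in [0, 1], object ids are known to the middleware so random access can reach any object, random accesses always go to the highest-bounded incomplete object in the current top k, and the search moves depths by a fixed step for every permutation H. The functions `simulate_plan` and `hill_climb` are hypothetical illustrations, not the authors' NC implementation; the query-driven strategy is not shown.

```python
import itertools


def simulate_plan(sample, k, depths, order, s_cost, r_cost, F=min):
    """Simulate one S-then-R plan on a sample and return (access cost, top-k ids).

    sample: object id -> tuple of predicate scores in [0, 1]
    depths: sorted-access depth d_i for each predicate p_i
    order:  global random-access schedule H, a permutation of predicate indices
    """
    m = len(s_cost)
    known = {u: {} for u in sample}
    ceilings = [1.0] * m  # upper bound on any score not yet read
    cost = 0
    # S-then-R: all sorted accesses first, down to depth d_i on each predicate.
    for i in range(m):
        ranked = sorted(sample, key=lambda u: sample[u][i], reverse=True)
        for u in ranked[:depths[i]]:
            known[u][i] = sample[u][i]
            ceilings[i] = sample[u][i]
            cost += s_cost[i]

    def bound(u):
        return F([known[u].get(i, ceilings[i]) for i in range(m)])

    # Random accesses: probe the highest-bounded incomplete object in the
    # current top k, on its next unknown predicate according to H.
    while True:
        top = sorted(sample, key=bound, reverse=True)[:k]
        pending = [u for u in top if len(known[u]) < m]
        if not pending:
            return cost, top
        u = pending[0]
        j = next(i for i in order if i not in known[u])
        known[u][j] = sample[u][j]
        cost += r_cost[j]


def hill_climb(sample, k, s_cost, r_cost, step=1, F=min):
    """Hill-climb over depths (d_1..d_m) for every schedule H; return (cost, depths, H)."""
    m, n = len(s_cost), len(sample)
    best = None
    for order in itertools.permutations(range(m)):
        depths = [0] * m
        cost, _ = simulate_plan(sample, k, depths, order, s_cost, r_cost, F)
        improved = True
        while improved:
            improved = False
            for i, delta in itertools.product(range(m), (step, -step)):
                cand = list(depths)
                cand[i] += delta
                if not 0 <= cand[i] <= n:
                    continue
                c, _ = simulate_plan(sample, k, cand, order, s_cost, r_cost, F)
                if c < cost:  # move to any cheaper neighbouring plan
                    depths, cost, improved = cand, c, True
        if best is None or cost < best[0]:
            best = (cost, depths, order)
    return best
```

In practice the sample might be drawn with `random.sample` from the full database, and the depths found on the sample would still need to be scaled to the full data; that step is left out here.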

Contribution: Unification and Contrast
Unification: for a symmetric function, e.g., avg(p1, p2), the NC framework behaves similarly to TA.
Contrast: for an asymmetric function, e.g., min(p1, p2), NC adapts with different behaviors and outperforms TA.
[Figure: two panels plotting cost against depth into p1 and depth into p2, with points N (NC) and T (TA).]
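For reference, a compact sketch of the standard Threshold Algorithm (TA) of Fagin, Lotem, and Naor, the baseline compared against here. It assumes every list contains all objects and uses an in-memory score table as the random-access oracle; the function name and data layout are hypothetical.

```python
import heapq


def threshold_algorithm(lists, scores, k, F):
    """Threshold Algorithm (TA) for a monotone scoring function F.

    lists:  per predicate, all object ids in descending order of that predicate's score
    scores: object id -> tuple of predicate scores, used as the random-access oracle
    Returns (top-k object ids, sorted-access depth reached in every list).
    """
    n = len(lists[0])
    seen = {}  # object id -> exact overall score
    depth = 0
    while depth < n:
        last = []
        for i, ranked in enumerate(lists):  # one sorted access per list, in parallel
            u = ranked[depth]
            last.append(scores[u][i])
            if u not in seen:
                seen[u] = F(scores[u])  # random accesses for the other predicates
        depth += 1
        top = heapq.nlargest(k, seen.values())
        # Stop once k seen objects score at least the threshold F(last seen values).
        if len(top) == k and top[-1] >= F(last):
            break
    best = heapq.nlargest(k, seen, key=seen.get)
    return best, depth
```

Because TA reads all lists in lockstep, its sorted-access depths into p1 and p2 are always equal, whereas a cost-based plan is free to choose unequal depths for each predicate.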

Contribution: Generality and Adaptivity
Over 1000 random configurations:
- for unstudied scenarios (74%), NC generalizes with significantly better performance;
- for existing scenarios (26%), NC adapts to behaviors similar to those of the specific algorithms.
[Figure: results for existing scenarios and for unstudied scenarios.]

Conclusion: Summary
For general and adaptive optimization of top-k queries, we developed:
- Key insight: abstracting a top-k query as a task scheduling problem.
- Algorithm space for top-k queries: an algorithm space that considers only algorithms scheduling unsatisfied tasks.
- Dynamic search schemes: efficient search schemes for top-k queries.

Thank You! For more information: The AIM Project: http://aim.cs.uiuc.edu