Query Reranking As A Service

Slides:



Advertisements
Similar presentations
Group Recommendation: Semantics and Efficiency
Advertisements

Web Information Retrieval
 Introduction  Views  Related Work  Preliminaries  Problems Discussed  Algorithm LPTA  View Selection Problem  Experimental Results.
Best-Effort Top-k Query Processing Under Budgetary Constraints
CS 540 Database Management Systems
PREFER: A System for the Efficient Execution of Multi-parametric Ranked Queries Vagelis Hristidis University of California, San Diego Nick Koudas AT&T.
The State of the Art in Distributed Query Processing by Donald Kossmann Presented by Chris Gianfrancesco.
Querying for Information Integration: How to go from an Imprecise Intent to a Precise Query? Aditya Telang Sharma Chakravarthy, Chengkai Li.
Search Engine – Metasearch Engine Comparison By Ali Can Akdemir.
6/15/20151 Top-k algorithms Finding k objects that have the highest overall grades.
Mobile Web Search Personalization Kapil Goenka. Outline Introduction & Background Methodology Evaluation Future Work Conclusion.
Trip Planning Queries F. Li, D. Cheng, M. Hadjieleftheriou, G. Kollios, S.-H. Teng Boston University.
SMS-Based web Search for Low- end Mobile Devices Jay Chen New York University Lakshmi Subramanian New York University
Niranjan Balasubramanian Aruna Balasubramanian Arun Venkataramani University of Massachusetts Amherst Energy Consumption in Mobile Phones: A Measurement.
Minimal Probing: Supporting Expensive Predicates for Top-k Queries Kevin C. Chang Seung-won Hwang Univ. of Illinois at Urbana-Champaign.
Databases & Data Warehouses Chapter 3 Database Processing.
PRESENTED BY- HARSH SINGH A Random Walk Approach to Sampling Hidden Databases By Arjun Dasgupta, Dr. Gautam Das and Heikki Mannila.
HOW SEARCH ENGINE WORKS. Aasim Bashir.. What is a Search Engine? Search engine: It is a website dedicated to search other websites and there contents.
Basic Web Applications 2. Search Engine Why we need search ensigns? Why we need search ensigns? –because there are hundreds of millions of pages available.
Mehdi Kargar Aijun An York University, Toronto, Canada Keyword Search in Graphs: Finding r-cliques.
Probabilistic Ranking of Database Query Results Surajit Chaudhuri, Microsoft Research Gautam Das, Microsoft Research Vagelis Hristidis, Florida International.
Case Study ProsperaSoft’s global sourcing model gives the maximum benefit to customers in terms of cost savings, improved quality, access to highly talented.
1 Evaluating top-k Queries over Web-Accessible Databases Paper By: Amelie Marian, Nicolas Bruno, Luis Gravano Presented By Bhushan Chaudhari University.
University of Minnesota Campus Event Finder Department of Computer Science and Engineering, University of Minnesota Presented by Murat Demiray & Mustafa.
1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.
« Pruning Policies for Two-Tiered Inverted Index with Correctness Guarantee » Proceedings of the 30th annual international ACM SIGIR, Amsterdam 2007) A.
Searching for Extremes Among Distributed Data Sources with Optimal Probing Zhenyu (Victor) Liu Computer Science Department, UCLA.
Towards Robust Indexing for Ranked Queries Dong Xin, Chen Chen, Jiawei Han Department of Computer Science University of Illinois at Urbana-Champaign VLDB.
Privacy Preservation of Aggregates in Hidden Databases: Why and How? Arjun Dasgupta, Nan Zhang, Gautam Das, Surajit Chaudhuri Presented by PENG Yu.
The Ohio State University Efficient and Effective Sampling Methods for Aggregation Queries on the Hidden Web Fan Wang Gagan Agrawal Presented By: Venu.
May 30, 2016Department of Computer Sciences, UT Austin1 Using Bloom Filters to Refine Web Search Results Navendu Jain Mike Dahlin University of Texas at.
2007. Software Engineering Laboratory, School of Computer Science S E Web-Harvest Web-Harvest: Open Source Web Data Extraction tool 이재정 Software Engineering.
The Business Model of Google MBAA 609 R. Nakatsu.
Mehdi Kargar Aijun An York University, Toronto, Canada Keyword Search in Graphs: Finding r-cliques.
PEERSPECTIVE.MPI-SWS.ORG ALAN MISLOVE KRISHNA P. GUMMADI PETER DRUSCHEL BY RAGHURAM KRISHNAMACHARI Exploiting Social Networks for Internet Search.
Search Engines.
WebFOCUS Magnify: Search Based Applications Dr. Rado Kotorov Technical Director of Strategic Product Management.
Combining Fuzzy Information: An Overview Ronald Fagin.
Answering Top-k Queries Using Views Gautam Das (Univ. of Texas), Dimitrios Gunopulos (Univ. of California Riverside), Nick Koudas (Univ. of Toronto), Dimitris.
Presented by: Sandeep Chittal Minimum-Effort Driven Dynamic Faceted Search in Structured Databases Authors: Senjuti Basu Roy, Haidong Wang, Gautam Das,
B+ Trees: An IO-Aware Index Structure Lecture 13.
February 4, Location Based M-Services Soon there will be more on-line personal mobile devices than on-line stationary PCs. Location based mobile-services.
Location Privacy Protection for Location-based Services CS587x Lecture Department of Computer Science Iowa State University.
Top-k Query Processing Optimal aggregation algorithms for middleware Ronald Fagin, Amnon Lotem, and Moni Naor + Sushruth P. + Arjun Dasgupta.
03/02/20061 Evaluating Top-k Queries Over Web-Accessible Databases Amelie Marian Nicolas Bruno Luis Gravano Presented By: Archana and Muhammed.
Instance Discovery and Schema Matching With Applications to Biological Deep Web Data Integration Tantan Liu, Fan Wang, Gagan Agrawal {liut, wangfa,
Surajit Chaudhuri, Microsoft Research Gautam Das, Microsoft Research Vagelis Hristidis, Florida International University Gerhard Weikum, MPI Informatik.
SEMINAR ON INTERNET SEARCHING PRESENTED BY:- AVIPSA PUROHIT REGD NO GUIDED BY:- Lect. ANANYA MISHRA.
Computer Science and Engineering Jianye Yang 1, Ying Zhang 2, Wenjie Zhang 1, Xuemin Lin 1 Influence based Cost Optimization on User Preference 1 The University.
Search Engine Optimization
Search Engine Optimization
Search Engines and Search techniques
CPS216: Data-intensive Computing Systems
Abolfazl Asudeh Azade Nazi Nan Zhang Gautam DaS
Indices.
Indexing ? Why ? Need to locate the actual records on disk without having to read the entire table into memory.
Discovering the Skyline of Web Databases
Chapter 11: Indexing and Hashing
Rank Aggregation.
Efficient Evaluation of k-NN Queries Using Spatial Mashups
Popular Ranking Algorithms
Exploratory search: New name for an old hat?
Data Mining Chapter 6 Search Engines
A Restaurant Recommendation System Based on Range and Skyline Queries
PBKM: A Secure Knowledge Management Framework
Overview of Query Evaluation
Chapter 11: Indexing and Hashing
The Skyline Query in Databases Which Objects are the Most Important?
Prefer: A System for the Efficient Execution
Donghui Zhang, Tian Xia Northeastern University
Presentation transcript:

Query Reranking As A Service Abolfazl Asudeh Nan Zhang Gautam DaS University of Texas at Arlington George Washington University © 2016 VLDB Endowment 21508097/16/03

Ranked Retrieval Model

Practical Challenges How to design a small number of “good” ranking functions when different users often have diverse and sometimes contradicting preferences e.g., “transit time” in airfare search, “mileage per year” in used car search Similarly, users with disabilities or special needs often need special ranking functions Some database owners lack the expertise, resources, or even motivation to properly study the requirements of their users

Motivation User Backend DB Top-k web Interface Reranking Service filtering+ ranking Reranking Service User

Applications Meta-Ranking Service remember user preferences across multiple web databases (e.g., multiple car dealers) Applications dedicated for users with disabilities, special needs, etc.

Critical Requirements Accuracy: The output query answer must precisely follow the user-specified ranking function Versatility: Support a wide variety of user-specific ranking function (so long as it is monotonic). Allow pass-through of selection conditions. No knowledge of system ranking function. Efficiency: the number of queries issued to the web database must be minimized. Because almost all such databases enforce certain query-rate limit. e.g., Google Flights allows only 50 free queries per user per day.

Solution Novelty: Baselines Crawling the entire database Problem: efficiency ”Page down” [TZD13a, TZD13b] to retrieve more than k tuples Problem: no guarantee of accuracy unless retrieving all n tuples

Database Model Hidden (web) Database Limited query interface Ranked Retrieval Model (Top-k) based on its-own ranking function(s)

Problem Statement Query Reranking (Get-Next) Problem: Consider: a hidden web database D with a top-k interface with an arbitrary, unknown, system ranking function. Given: a user query q a user-specied monotonic ranking function S the top-h (h ≥ 0 can be greater than, equal to, or smaller than k) tuples satisfying q according to S Find the No. (h + 1) tuple for q while minimizing the number of queries issued D.

1D-RERANK

1D Re-Ranking Problem Given: Find: Why? An attribute th, the h-ranked tuple according to the attribute Find: th+1, the (h+1)-ranked tuple according to the attribute Why? 1D  MD using TA or Fagin’s algorithm Rank by “transit time”

1D-BASELINE q2 q1 q0 NULL th t1 t0 th+1 Problem: the query cost depend on the correlation between the system ranking function and the attribute! Negative Result: no algorithm can outperform 1D-BASELINE in the worst-case scenario.

1D-BINARY q2 q3 q1 q0 NULL NULL th t1 t0 th+1 query cost:

until search region is not dense yet 1D-RERANK Key Observation: dense regions  repeated incurrence of high query cost for re-ranking Key Idea: Crawl dense regions to benefit future re-ranking requests (On-The-Fly Indexing) follow 1D-BINARY until search region is not dense yet Oracle ask from oracle if region is dense

MD-RERANK

Applying Sorted-List Algorithms (e.g. TA) on 1D-RERANK The Get-next operation designed by 1D-RERANK can get used to develop a sorted-access top-k algorithm (e.g. TA) on top of it and enable HD Get-next. It has a major efficiency problem, mainly because of not leveraging the full power provided by client- server databases.

MD-BASELINE y apply 1D-RERANK l(y)  q0: select *  t0 q1: where x < t0[x] and y < l(y)  null q2: where x ≥ t0[x] and x < l(x) and y < t0[y]  t1 q3: where x ≥ t0[x] and x < b(x) and y < t0[y]  null q4: where x ≥ b(x) and x < I(x) and y < b(y) null q5: where x < t1[x] and y < t1[y]  null t0 t1 b(y) x b(x) l(x)

MD-BINARY y apply 1D-RERANK l(y)  q0: select *  t0 q1: where x < t0[x] and y < l(y)  null q2: where x ≥ t0[x] and x < l(x) and y < t0[y]  t1 q3: where x ≥ t0[x] and x < v0[x] and y < v0[y]  null q4: where x ≥ t0[x] and x < b(x) and y < t0[y]  null q5: where x ≥ b(x) and x < I(x) and y < b(y) null q6: where x < t1[x] and y < t1[y]  null t0 t1 V0 b(y) x b(x) l(x)

MD-RERANK High-level idea: follow MD-BINARY until a remaining search space (1) is covered by a region in the index; or (2) has volume smaller than the threshold. On-The-Fly Indexing: proactively record as an index densely located tuples once we encounter them, so that we do not need to incur a high query cost every time a query q triggers visits to the same dense region.

Experiments setup Simulating the Top-k web interface on top of an offline dataset. US Department of Transportation (DOT): 457,013 tuples and over 28 attributes. Considered two system ranking functions: SR1(positively correlated ): 0.3 AIR-TIME + TAXI-IN (SR1) – Default function SR2 (negatively correlated): -0.1 DISTANCE - DEP-DELAY (SR2) Default k = 10. Online Experiments Blue Nile (BN) diamonds: largest online retailer of diamonds; contained 117,641 tuples (diamonds) over 12 attributes. Yahoo! Autos (YA): offers a popular search service for used cars; contained 13,169 cars within 30 mile of New York city over 8 attributes.

Offline Experiment Results 1D, Impact of n (SR1) 1D, Impact of n (SR2) 1D, Impact of System-k

Offline Experiment Results MD, Impact of n (SR1) MD, Impact of n (SR2) MD, Impact of System-k

Online Experiment Results BN, 1D: Top-k query cost YA, 1D: Top-k query cost

Online Experiment Results BN, MD: Top-k query cost YA, MD: Top-k query cost

Thanks!