Probabilistic Ranking of Database Query Results

Gautam Das, Surajit Chaudhuri, Vagelis Hristidis, Gerhard Weikum
Presented by: Z.M. Joseph, Spring 2006, CSE, UT Arlington

Introduction
Addresses the Many-Answers problem: a query that is not very selective has many matching tuples, so the results need some ranking. Since every result matches all of the specified attributes, the ranking must look into the unspecified attributes.

Challenge
How do you rank tuples based on unspecified attributes? Correlation information is difficult to obtain and expensive to manage.

Approach
Build on Probabilistic Information Retrieval (PIR) and combine two scores:
Global score: captures the global importance of unspecified attribute values.
Conditional score: captures the strength of correlation between unspecified and specified attribute values.
Preprocessing happens at an intermediate knowledge representation layer.

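Recall from PIR
For a tuple t, PIR ranks by the odds of relevance:
$$\mathrm{Score}(t) = \frac{p(R \mid t)}{p(\bar{R} \mid t)} \propto \frac{p(t \mid R)}{p(t \mid \bar{R})} \approx \frac{p(t \mid R)}{p(t \mid D)}$$
where t is split into X, the set of specified attribute values, and Y, the list of unspecified attribute values; R is the ideal set of result tuples; and D is the single database table, which approximates the complement \bar{R}.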

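Structured Data
Because t = (X, Y), the score factors as:
$$\mathrm{Score}(t) \propto \frac{p(X, Y \mid R)}{p(X, Y \mid D)} = \frac{p(Y \mid R)}{p(Y \mid D)} \cdot \frac{p(X \mid Y, R)}{p(X \mid Y, D)}$$
The first factor automatically increases the score of unspecified attribute values that occur more often in the ideal tuple set R than in D.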
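A minimal Python sketch of the per-value ratio p(y | R) / p(y | D), estimated by relative frequency. It assumes, for illustration only, that R is available as an explicit list of tuples and is a subset of D; the helper names `relative_frequency` and `value_ratio` and the toy housing data are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, str]  # one tuple: attribute name -> value


def relative_frequency(rows: List[Row], attr: str, value: str) -> float:
    """Fraction of rows whose `attr` equals `value` (0.0 for an empty list)."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r.get(attr) == value) / len(rows)


def value_ratio(ideal: List[Row], table: List[Row], attr: str, value: str) -> float:
    """Estimate p(y | R) / p(y | D) for a single unspecified value y = (attr, value)."""
    p_d = relative_frequency(table, attr, value)
    if p_d == 0.0:
        # R is assumed to be a subset of D, so a value absent from D is absent from R.
        return 0.0
    return relative_frequency(ideal, attr, value) / p_d


# Toy table D (already restricted to homes in Seattle) and a hypothetical ideal set R.
D = [
    {"city": "Seattle", "view": "waterfront"},
    {"city": "Seattle", "view": "none"},
    {"city": "Seattle", "view": "none"},
    {"city": "Seattle", "view": "none"},
]
R = [D[0], D[1]]

print(value_ratio(R, D, "view", "waterfront"))  # 2.0: over-represented in R
print(value_ratio(R, D, "view", "none"))        # about 0.67: under-represented in R
```

A value twice as common in R as in D gets a ratio of 2.0, so tuples carrying it are pushed up the ranking.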

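Limited Independence Assumptions
Dependencies and correlations can be captured from structured data, but modeling all of them is expensive. An efficient compromise is to assume that the values within X are independent of each other, and likewise the values within Y, while still modeling the dependencies between X and Y. This allows the derivation of:
$$\mathrm{Score}(t) \propto \prod_{y \in Y} \frac{p(y \mid R)}{p(y \mid D)} \cdot \prod_{x \in X} \prod_{y \in Y} \frac{p(x \mid y, R)}{p(x \mid y, D)}$$
This assumption may not always be correct!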
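The product form can be sketched as below, again assuming, hypothetically, that R is given explicitly and that probabilities are plain relative frequencies with no smoothing. `limited_independence_score`, `p_value`, `p_given`, and `safe_ratio` are illustrative names, not from the source; a ratio with a zero denominator is treated as 0.0.

```python
from math import prod
from typing import Dict, List

Row = Dict[str, str]


def p_value(rows: List[Row], attr: str, value: str) -> float:
    """p(attr = value), estimated by relative frequency over `rows`."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r.get(attr) == value) / len(rows)


def p_given(rows: List[Row], x_attr: str, x_val: str, y_attr: str, y_val: str) -> float:
    """p(x | y): among rows having y, the fraction that also have x."""
    with_y = [r for r in rows if r.get(y_attr) == y_val]
    return p_value(with_y, x_attr, x_val)


def safe_ratio(num: float, den: float) -> float:
    """num / den, or 0.0 when the denominator is zero."""
    return num / den if den > 0 else 0.0


def limited_independence_score(t: Row, specified: Dict[str, str],
                               ideal: List[Row], table: List[Row]) -> float:
    """Product of per-value ratios, assuming values within X (and within Y) are independent."""
    unspecified = [a for a in t if a not in specified]  # attributes forming Y
    global_part = prod(
        safe_ratio(p_value(ideal, a, t[a]), p_value(table, a, t[a]))
        for a in unspecified
    )
    conditional_part = prod(
        safe_ratio(p_given(ideal, xa, xv, ya, t[ya]), p_given(table, xa, xv, ya, t[ya]))
        for xa, xv in specified.items()
        for ya in unspecified
    )
    return global_part * conditional_part


D = [
    {"city": "Seattle", "view": "waterfront"},
    {"city": "Seattle", "view": "none"},
    {"city": "Seattle", "view": "none"},
    {"city": "Dallas", "view": "none"},
]
R = [D[0], D[1]]           # hypothetical ideal set for the query city = Seattle
X = {"city": "Seattle"}    # specified attribute values

for t in D[:3]:            # the tuples that satisfy X
    print(t["view"], limited_independence_score(t, X, R, D))
```

On this toy data the waterfront home scores 2.0 and each home without a view scores about 1.0.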

Workload-Based R Estimation
To use these techniques, the ideal result set R must be known, and in practice it is not. Instead, use statistics gathered from the workload W, viewed as a set of tuples: one per past query, holding that query's specified attribute values. p(y | R) can then be replaced with p(y | X, W): properties of R are obtained by examining past workload queries that also specified X.
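A sketch of how workload statistics might be counted, assuming the workload is a hypothetical list of past queries, each a dictionary of its specified attribute values. `p_workload` estimates p(y | W) and `p_workload_given` estimates p(y | x, W) for a single specified value x; both are illustrative helpers, not an API from the source.

```python
from typing import Dict, List

Query = Dict[str, str]  # one workload entry: specified attribute -> value


def p_workload(workload: List[Query], attr: str, value: str) -> float:
    """p(y | W): fraction of workload queries that specify attr = value."""
    if not workload:
        return 0.0
    return sum(1 for q in workload if q.get(attr) == value) / len(workload)


def p_workload_given(workload: List[Query], y_attr: str, y_val: str,
                     x_attr: str, x_val: str) -> float:
    """p(y | x, W): among past queries that specified x, the fraction that also specified y."""
    with_x = [q for q in workload if q.get(x_attr) == x_val]
    return p_workload(with_x, y_attr, y_val)


# Hypothetical log of past queries; attributes a query left unspecified are simply absent.
W = [
    {"city": "Seattle", "view": "waterfront"},
    {"city": "Seattle"},
    {"city": "Dallas", "view": "waterfront"},
    {"view": "waterfront"},
]

print(p_workload(W, "view", "waterfront"))                           # 0.75
print(p_workload_given(W, "view", "waterfront", "city", "Seattle"))  # 0.5
```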

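Workload-Based R Estimation (continued)
The ranking function is therefore:
$$\mathrm{Score}(t) \propto \prod_{y \in Y} \frac{p(y \mid W)}{p(y \mid D)} \cdot \prod_{x \in X} \prod_{y \in Y} \frac{p(x \mid y, W)}{p(x \mid y, D)}$$
It no longer contains R, and all of its quantities are "atomic" and can be precomputed. The first product is the global score and the second is the conditional score. Association rules can be used to compute the pairwise (conditional) quantities. These values are stored in the intermediate knowledge representation layer.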
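The ranking function can be sketched with precomputed single and pairwise counts standing in for the intermediate knowledge representation layer. `AtomicStats`, `score`, and the toy table and workload are hypothetical; the pairwise counts are gathered by brute force here, not with association-rule mining.

```python
from collections import Counter
from typing import Dict, List, Tuple

Row = Dict[str, str]
Item = Tuple[str, str]  # (attribute, value)


class AtomicStats:
    """Hypothetical stand-in for the intermediate layer: precomputed single and
    pairwise (attribute, value) counts over a collection of rows."""

    def __init__(self, rows: List[Row]):
        self.n = len(rows)
        self.single: Counter = Counter()
        self.pair: Counter = Counter()
        for r in rows:
            items = list(r.items())
            self.single.update(items)
            for a in items:            # count ordered pairs of distinct values
                for b in items:
                    if a != b:
                        self.pair[(a, b)] += 1

    def p(self, y: Item) -> float:
        """p(y) = count(y) / number of rows."""
        return self.single[y] / self.n if self.n else 0.0

    def p_given(self, x: Item, y: Item) -> float:
        """p(x | y) = count(x and y) / count(y)."""
        cy = self.single[y]
        return self.pair[(x, y)] / cy if cy else 0.0


def safe_ratio(num: float, den: float) -> float:
    return num / den if den > 0 else 0.0


def score(t: Row, specified: Dict[str, str], w: AtomicStats, d: AtomicStats) -> float:
    """Global score times conditional score, using workload (w) and database (d) stats."""
    xs = list(specified.items())
    ys = [(a, v) for a, v in t.items() if a not in specified]
    global_score = 1.0
    for y in ys:
        global_score *= safe_ratio(w.p(y), d.p(y))
    conditional_score = 1.0
    for x in xs:
        for y in ys:
            conditional_score *= safe_ratio(w.p_given(x, y), d.p_given(x, y))
    return global_score * conditional_score


D = [
    {"city": "Seattle", "view": "waterfront"},
    {"city": "Seattle", "view": "none"},
    {"city": "Seattle", "view": "none"},
    {"city": "Dallas", "view": "none"},
]
W = [
    {"city": "Seattle", "view": "waterfront"},
    {"city": "Seattle"},
    {"city": "Dallas", "view": "waterfront"},
    {"view": "waterfront"},
    {"view": "none"},
    {"city": "Seattle", "view": "none"},
]
w_stats, d_stats = AtomicStats(W), AtomicStats(D)
X = {"city": "Seattle"}
print(score(D[0], X, w_stats, d_stats))  # waterfront home
print(score(D[1], X, w_stats, d_stats))  # home without a view
```

Here the waterfront home scores about 0.67 and the homes without a view about 0.33. Any zero count zeroes out the whole product, so a practical version would likely smooth the estimates and sum logarithms instead of multiplying; neither is done here.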

Implementation
Atomic Probabilities Module: stores the atomic quantities in the intermediate knowledge representation layer.
Index Module: uses these inputs and association rules to create the global and conditional scores.
Scan Algorithm: selects the tuples that satisfy the query condition, then ranks them by score.
List Merge Algorithm: an alternative to scanning.
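A sketch of the Scan Algorithm step, assuming scores come from a caller-supplied function (for instance, a score like the ranking function on the previous slides). `scan_top_k` and the hard-coded `view_score` table are hypothetical; `heapq.nlargest` keeps only the k best tuples instead of sorting every match.

```python
import heapq
from typing import Callable, Dict, List

Row = Dict[str, str]


def scan_top_k(table: List[Row], specified: Dict[str, str],
               score: Callable[[Row], float], k: int) -> List[Row]:
    """Scan: keep tuples that match every specified attribute, then return the k highest-scoring."""
    matches = (t for t in table if all(t.get(a) == v for a, v in specified.items()))
    return heapq.nlargest(k, matches, key=score)


D = [
    {"city": "Seattle", "view": "waterfront"},
    {"city": "Seattle", "view": "none"},
    {"city": "Seattle", "view": "none"},
    {"city": "Dallas", "view": "none"},
]
# Hypothetical precomputed scores, keyed by the unspecified 'view' value.
view_score = {"waterfront": 0.67, "none": 0.33}

top = scan_top_k(D, {"city": "Seattle"}, lambda t: view_score[t["view"]], k=2)
for t in top:
    print(t)
```

The filter runs over the whole table, which is what makes this the scan-based variant.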

Conclusion
Ranks answers to the Many-Answers problem by factoring in unspecified attributes. The ranking is automated and makes use of workload statistics and correlations, yet it can still be adjusted by users and/or domain experts and can incorporate user feedback as well.