A Unified Approach to Ranking in Probabilistic Databases Jian Li, Barna Saha, Amol Deshpande University of Maryland, College Park, USA VLDB 2009 2010.

Slides:



Advertisements
Similar presentations
Davide Mottin, Senjuti Basu Roy, Alice Marascu, Yannis Velegrakis, Themis Palpanas, Gautam Das A Probabilistic Optimization Framework for the Empty-Answer.
Advertisements

Efficient Processing of Top- k Queries in Uncertain Databases Ke Yi, AT&T Labs Feifei Li, Boston University Divesh Srivastava, AT&T Labs George Kollios,
指導教授:陳良弼 老師 報告者:鄧雅文  Introduction  Related Work  Problem Formulation  Future Work.
13/04/20151 SPARK: Top- k Keyword Query in Relational Database Wei Wang University of New South Wales Australia.
Jianxin Li, Chengfei Liu, Rui Zhou Swinburne University of Technology, Australia Wei Wang University of New South Wales, Australia Top-k Keyword Search.
Probabilistic Skyline Operator over Sliding Windows Wenjie Zhang University of New South Wales & NICTA, Australia Joint work: Xuemin Lin, Ying Zhang, Wei.
Efficient IR-Style Keyword Search over Relational Databases Vagelis Hristidis University of California, San Diego Luis Gravano Columbia University Yannis.
 Introduction  Views  Related Work  Preliminaries  Problems Discussed  Algorithm LPTA  View Selection Problem  Experimental Results.
Supporting top-k join queries in relational databases Ihab F. Ilyas, Walid G. Aref, Ahmed K. Elmagarmid Presented by Rebecca M. Atchley Thursday, April.
Representing and Querying Correlated Tuples in Probabilistic Databases
Cleaning Uncertain Data with Quality Guarantees Reynold Cheng, Jinchuan Chen, Xike Xie 2008 VLDB Presented by SHAO Yufeng.
Efficient Processing of Top- k Queries in Uncertain Databases Ke Yi, AT&T Labs Feifei Li, Boston University Divesh Srivastava, AT&T Labs George Kollios,
Effective Keyword Based Selection of Relational Databases Bei Yu, Guoliang Li, Karen Sollins, Anthony K.H Tung.
Jiang Chen Columbia University Ke Yi HKUST. Motivation  Uncertain data naturally arises in many applications: sensor data, fuzzy data integration, data.
Online Filtering, Smoothing & Probabilistic Modeling of Streaming Data In short, Applying probabilistic models to Streams Bhargav Kanagal & Amol Deshpande.
Queries with Difference on Probabilistic Databases Sanjeev Khanna Sudeepa Roy Val Tannen University of Pennsylvania 1.
PAPER BY : CHRISTOPHER R’E NILESH DALVI DAN SUCIU International Conference on Data Engineering (ICDE), 2007 PRESENTED BY : JITENDRA GUPTA.
School of Computer Science and Engineering Finding Top k Most Influential Spatial Facilities over Uncertain Objects Liming Zhan Ying Zhang Wenjie Zhang.
Trust and Profit Sensitive Ranking for Web Databases and On-line Advertisements Raju Balakrishnan (Arizona State University)
Efficient Query Evaluation on Probabilistic Databases
Thomas Bernecker, Tobias Emrich, Hans-Peter Kriegel,
Cleaning Uncertain Data for Top-k Queries Luyi Mo, Reynold Cheng, Xiang Li, David Cheung, Xuan Yang The University of Hong Kong {lymo, ckcheng, xli, dcheung,
1 Efficient Subgraph Search over Large Uncertain Graphs Ye Yuan 1, Guoren Wang 1, Haixun Wang 2, Lei Chen 3 1. Northeastern University, China 2. Microsoft.
Ming Hua, Jian Pei Simon Fraser UniversityPresented By: Mahashweta Das Wenjie Zhang, Xuemin LinUniversity of Texas at Arlington The University of New South.
Probabilistic Threshold Range Aggregate Query Processing over Uncertain Data Wenjie Zhang University of New South Wales & NICTA, Australia Joint work:
A Generic Framework for Handling Uncertain Data with Local Correlations Xiang Lian and Lei Chen Department of Computer Science and Engineering The Hong.
Sensitivity Analysis & Explanations for Robust Query Evaluation in Probabilistic Databases Bhargav Kanagal, Jian Li & Amol Deshpande.
Data Integration Aggregate Query Answering under Uncertain Schema Mappings Avigdor Gal, Maria Vanina Martinez, Gerardo I. Simari, VS Subrahmanian Presented.
On Computing Compression Trees for Data Collection in Wireless Sensor Networks Jian Li, Amol Deshpande and Samir Khuller Department of Computer Science,
Efficient Processing of Top-k Spatial Keyword Queries João B. Rocha-Junior, Orestis Gkorgkas, Simon Jonassen, and Kjetil Nørvåg 1 SSTD 2011.
1 Ranked Queries over sources with Boolean Query Interfaces without Ranking Support Vagelis Hristidis, Florida International University Yuheng Hu, Arizona.
Exploiting Correlated Attributes in Acquisitional Query Processing Amol Deshpande University of Maryland Joint work with Carlos Sam
Top-k Queries on Uncertain Data: On score Distribution and Typical Answers Presented by Qian Wan, HKUST Based on [1][2]
Model-Driven Data Acquisition in Sensor Networks - Amol Deshpande et al., VLDB ‘04 Jisu Oh March 20, 2006 CS 580S Paper Presentation.
Nearest Neighbor Retrieval Using Distance-Based Hashing Michalis Potamias and Panagiotis Papapetrou supervised by Prof George Kollios A method is proposed.
Ranking by Odds Ratio A Probability Model Approach let be a Boolean random variable: document d is relevant to query q otherwise Consider document d as.
Quality-driven Integration of Heterogeneous Information System by Felix Naumann, et al. (VLDB1999) 17 Feb 2006 Presented by Heasoo Hwang.
Probabilistic Databases Amol Deshpande, University of Maryland.
On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases Presented by Xi Zhang Feburary 8 th, 2008.
Link Recommendation In P2P Social Networks Yusuf Aytaş, Hakan Ferhatosmanoğlu, Özgür Ulusoy Bilkent University, Ankara, Turkey.
Da Yan and Wilfred Ng The Hong Kong University of Science and Technology.
MPI Informatik 1/17 Oberseminar AG5 Result merging in a Peer-to-Peer Web Search Engine Supervisors: Speaker : Sergey Chernov Prof. Gerhard Weikum Christian.
Mehdi Kargar Aijun An York University, Toronto, Canada Keyword Search in Graphs: Finding r-cliques.
Ranking Queries on Uncertain Data: A Probabilistic Threshold Approach Wenjie Zhang, Xuemin Lin The University of New South Wales & NICTA Ming Hua,
Towards Robust Indexing for Ranked Queries Dong Xin, Chen Chen, Jiawei Han Department of Computer Science University of Illinois at Urbana-Champaign VLDB.
Top-k Similarity Join over Multi- valued Objects Wenjie Zhang Jing Xu, Xin Liang, Ying Zhang, Xuemin Lin The University of New South Wales, Australia.
Christopher Re and Dan Suciu University of Washington Efficient Evaluation of HAVING Queries on a Probabilistic Database.
Keyword Searching and Browsing in Databases using BANKS Seoyoung Ahn Mar 3, 2005 The University of Texas at Arlington.
A COURSE ON PROBABILISTIC DATABASES Dan Suciu University of Washington June, 2014Probabilistic Databases - Dan Suciu 1.
Mehdi Kargar Aijun An York University, Toronto, Canada Keyword Search in Graphs: Finding r-cliques.
Efficient Instant-Fuzzy Search with Proximity Ranking Authors: Inci Centidil, Jamshid Esmaelnezhad, Taewoo Kim, and Chen Li IDCE Conference 2014 Presented.
Enhancing Cluster Labeling Using Wikipedia David Carmel, Haggai Roitman, Naama Zwerdling IBM Research Lab (SIGIR’09) Date: 11/09/2009 Speaker: Cho, Chin.
LANGUAGE MODELS FOR RELEVANCE FEEDBACK Lee Won Hee.
Efficient Processing of Top-k Spatial Preference Queries
Date : 2013/03/18 Author : Jeffrey Pound, Alexander K. Hudek, Ihab F. Ilyas, Grant Weddell Source : CIKM’12 Speaker : Er-Gang Liu Advisor : Prof. Jia-Ling.
Query Processing over Incomplete Autonomous Databases Presented By Garrett Wolf, Hemal Khatri, Bhaumik Chokshi, Jianchun Fan, Yi Chen, Subbarao Kambhampati.
Answering Top-k Queries Using Views Gautam Das (Univ. of Texas), Dimitrios Gunopulos (Univ. of California Riverside), Nick Koudas (Univ. of Toronto), Dimitris.
Indexing Correlated Probabilistic Databases Bhargav Kanagal, Amol Deshpande University of Maryland, College Park, USA SIGMOD Presented.
Survey Jaehui Park Copyright  2008 by CEBT Introduction  Members Jung-Yeon Yang, Jaehui Park, Sungchan Park, Jongheum Yeon  We are interested.
03/02/20061 Evaluating Top-k Queries Over Web-Accessible Databases Amelie Marian Nicolas Bruno Luis Gravano Presented By: Archana and Muhammed.
Scrubbing Query Results from Probabilistic Databases Jianwen Chen, Ling Feng, Wenwei Xue.
A Unified Framework for Efficiently Processing Ranking Related Queries
Probabilistic Data Management
Probabilistic Data Management
Lecture 16: Probabilistic Databases
Probabilistic Data Management
Probabilistic Data Management
Probabilistic Databases
Efficient Processing of Top-k Spatial Preference Queries
Probabilistic Ranking of Database Query Results
Presentation transcript:

A Unified Approach to Ranking in Probabilistic Databases Jian Li, Barna Saha, Amol Deshpande University of Maryland, College Park, USA VLDB Presented by JongHeum Yeon

Probabilistic Databases  Motivation Increasing amounts of uncertain data Information retrieval, data integration and cleaning, text analytics, social network analysis  Probabilistic Databases Annotate tuples with existence probabilistics, and/or attribute values with probability distribution Interpretation according to the “possible worlds semantics” 2Copyright© 2010 by CEBTCenter for E-Business Technology

Possible World Semantics IDScoreProb t1t t2t t3t Copyright© 2010 by CEBTCenter for E-Business Technology IDScore t1t1 200 t2t2 150 t3t3 100 IDScore t1t1 200 t2t2 150 IDScore t2t2 150 t3t3 100 pw1 pw2 pw3 w.p A probabilistic table (assume tuple-independence) w.p w.p

Possible World Semantics IDScoreProb t1t t2t t3t Copyright© 2010 by CEBTCenter for E-Business Technology IDScore t1t1 200 t2t2 150 t3t3 100 IDScore t1t1 200 t2t2 150 IDScore t2t2 150 t3t3 100 pw1 pw2 pw3 w.p A probabilistic table (assume tuple-independence) w.p w.p Score values are used to rank the tuples in every pw The top-1 answer for each possible world

Probabilistic Databases (cont’d) 5Copyright© 2010 by CEBTCenter for E-Business Technology

Top-k Queries  Multi-criteria optimization problem [score=100, Pr(t 1 )=0.5] vs. [score=50, pr(t 2 )=1.0]  Many Prior Proposals Return k tuples t with the highest score(t)Pr(t) [exp. score] Return the most probable top k-answer [U-top-k] – [Solimanet al. ’07] At rank i, return tuple with max. prob. Of being at rank i [U-rank-k] – [Solimanet al. ’07] Return k tuples t with the largest Pr(r(t)≤k) values [PT-k/GT-k] – [Huaet al. ’08] [Zhang et al. ’08] Return k tuples t with smallest expected rank:∑ pw Pr(pw)r pw (t) – [Cormodeet al. ’09] 6Copyright© 2010 by CEBTCenter for E-Business Technology

Top-k Queries (cont’d)  Which one should we use?  Comparing different ranking functions Normalized Kendall Distance between two top-k answers: – Penalizes #reversals and #mismatches – Lies in [0,1], 0: Same answers; 1: Disjoint answers 7Copyright© 2010 by CEBTCenter for E-Business Technology

Unified Approach  Define two parameterized ranking functions: PRF w ; PRF e that can simulate or approximate a variety of ranking functions PRF e much more efficient to evaluate (than PRF w ) 8Copyright© 2010 by CEBTCenter for E-Business Technology

Outline  Parameterized Ranking Functions  Computing PRF Independent Tuples Computing PRF e (α) x-Tuples Summery of Other Results  Approximating Ranking Functions  Experiments 9Copyright© 2010 by CEBTCenter for E-Business Technology

Parameterized Ranking Function  Weight Function  Parameterized Ranking Function (PRF)  Return k tuples with the highest values. 10Copyright© 2010 by CEBTCenter for E-Business Technology

Parameterized Ranking Function (cont’d)  ω(t,i)= 1 : Rank the tuples by probabilities  ω(t,i)=score(t): E-Score  PRF e (α): ω(i)= α i where α can be a real or a complex number  PRF ω (h): Generalizes PT/GT-k and U-Rank 11Copyright© 2010 by CEBTCenter for E-Business Technology

Computing PRF: Independent Tuples 12Copyright© 2010 by CEBTCenter for E-Business Technology

Computing PRF: Independent Tuples (cont’d) 13Copyright© 2010 by CEBTCenter for E-Business Technology

Computing PRF: Independent Tuples (cont’d) 14Copyright© 2010 by CEBTCenter for E-Business Technology

Computing PRF e (α): Independent Tuples     15Copyright© 2010 by CEBTCenter for E-Business Technology

Summary of Other Results  PRF computation for probabilistic and/xor trees Generalizes x-tuplesby allowing both mutual exclusivity and coexistence PRF computation : O(n3) PRF e computation : O(nlogn+nd) (d = height of the tree)  PRF computation on graphical models A polynomial time algorithm when the junction tree has bounded treewidth A nontrivial dynamic program combined with the generating function method. 16Copyright© 2010 by CEBTCenter for E-Business Technology

Approximating Ranking Functions 17Copyright© 2010 by CEBTCenter for E-Business Technology

Experiments: Approximation Ability 18Copyright© 2010 by CEBTCenter for E-Business Technology

Experiments: Execution Time 19Copyright© 2010 by CEBTCenter for E-Business Technology

Conclusion  Proposed a unifying framework for ranking over probabilistic databases through: Parameterized ranking functions Incorporation of user feedback  Designed highly efficient algorithms for computing PRF and PRF e  Developed novel approximation techniques for approximating PRF w with PRF e 20Copyright© 2010 by CEBTCenter for E-Business Technology