Lecture 16: Probabilistic Databases

Presentation transcript:

Lecture 16: Probabilistic Databases. Slides by Gerome Miklau, based on a tutorial by Dan Suciu.

Today’s Agenda: Motivation; Probabilistic Data Semantics; Representation Systems; Complexity.

Section 1: Motivation

Motivating Applications: text extraction & record linkage; inconsistent data; ranking query answers.

Text extraction

Record Linkage

Inconsistent Data. Goal: consistent query answers from inconsistent databases. Applications: integration of autonomous data sources; un-enforced integrity constraints; temporary inconsistencies.

Repair semantics

Alternative probabilistic approach

Ranking query answers. The database is deterministic. Query answers are uncertain: query terms are loosened due to the user’s lack of understanding of the data or schema. The query returns a ranked list of tuples; the user is interested in the top-k.

Summary: motivating applications

Section 2: Probabilistic Data Semantics

Possible worlds semantics

The definition

Example

Tuples as Events

Tuple correlation

Example

Query semantics

Query semantics

Example: Query Semantics

Query semantics

Section 3: Representation Systems

Representation systems

Representation systems

Tuple independent probabilistic database

Tuple Prob. -> Possible Worlds
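
As a minimal sketch of this step, assume the standard tuple-independent model: each tuple is present independently with its own marginal probability, so the probability of a possible world is the product of p for the tuples it contains and (1 - p) for the tuples it omits. The function name `possible_worlds` and the sample table are illustrative assumptions, not taken from the slides.

```python
from itertools import product


def possible_worlds(tuple_probs):
    """Enumerate the possible worlds of a tuple-independent table.

    tuple_probs maps each tuple to its marginal probability.
    Yields (world, probability) pairs, where world is the frozenset
    of tuples present in that world.
    """
    tuples = list(tuple_probs)
    for present in product([True, False], repeat=len(tuples)):
        world = frozenset(t for t, keep in zip(tuples, present) if keep)
        prob = 1.0
        for t, keep in zip(tuples, present):
            # Independence: multiply p if the tuple is in, (1 - p) if out.
            prob *= tuple_probs[t] if keep else 1.0 - tuple_probs[t]
        yield world, prob


# Hypothetical two-tuple table (assumed data for illustration only).
table = {("a", 1): 0.5, ("b", 2): 0.25}
worlds = dict(possible_worlds(table))
```

A table with n tuples has 2^n possible worlds, so this enumeration is only practical for very small examples; it is meant to make the semantics concrete, not to serve as an evaluation strategy.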

Tuple Prob. -> Query evaluation

Tuple-independent distributions

Intensional database

Intensional DB => Possible Worlds

Possible Worlds => Intensional DB

Closure under operators

Summary of Intensional Databases

Section 4: Complexity

Probability of boolean expressions
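
One way to make this concrete, as a sketch only: assuming the Boolean variables are independent events with known probabilities, the probability of an expression is the total weight of the truth assignments that satisfy it. The helper `expr_probability`, the variable names, and the sample expression are hypothetical and chosen for illustration.

```python
from itertools import product


def expr_probability(expr, var_probs):
    """Probability that expr is true, by brute-force enumeration.

    expr is a callable taking a dict {variable name: bool}.
    var_probs maps each variable name to P(variable is true);
    the variables are assumed to be independent.
    """
    names = list(var_probs)
    total = 0.0
    for values in product([False, True], repeat=len(names)):
        assignment = dict(zip(names, values))
        if expr(assignment):
            weight = 1.0
            for name, value in assignment.items():
                weight *= var_probs[name] if value else 1.0 - var_probs[name]
            total += weight
    return total


# Assumed example: (x AND y) OR z with every variable at probability 0.5.
probs = {"x": 0.5, "y": 0.5, "z": 0.5}
p = expr_probability(lambda a: (a["x"] and a["y"]) or a["z"], probs)
```

The loop visits all 2^n assignments, so its running time grows exponentially with the number of variables; this brute-force cost is what motivates asking how hard the problem is in general.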

Example

Complexity of Boolean Expression Probability

Query complexity

Intensional query evaluation

Extensional query evaluation
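
A hedged sketch of the extensional idea, in which probabilities are computed alongside the relational operators rather than from a lineage expression. It assumes the inputs of every operator are independent events, which holds only for suitable query plans; the function names `independent_project` and `independent_join` and the tables `R` and `S` are illustrative assumptions.

```python
from collections import defaultdict


def independent_project(rows, key):
    """Duplicate-eliminating projection over independent tuples.

    rows maps tuple -> probability. An output value is present if at
    least one contributing tuple is present: 1 - prod(1 - p_i).
    """
    miss = defaultdict(lambda: 1.0)
    for row, p in rows.items():
        miss[key(row)] *= 1.0 - p
    return {k: 1.0 - m for k, m in miss.items()}


def independent_join(left, right):
    """Join two tables on a shared key, assuming the two sides are
    independent: the joined probability is the product."""
    return {k: left[k] * right[k] for k in left.keys() & right.keys()}


# Assumed data for the Boolean query q :- R(x), S(x, y).
R = {"a": 0.5}
S = {("a", "b1"): 0.5, ("a", "b2"): 0.5}

# Project S onto its first attribute, then join with R.
s_on_a = independent_project(S, key=lambda row: row[0])
answer = independent_join(R, s_on_a)
```

Here the projection is applied before the join so that each operator sees independent inputs. Joining first and projecting afterwards would combine two rows that both depend on the same R tuple, and the independence assumption would then give a wrong probability.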

Query complexity

Summary on query complexity