A Course on Probabilistic Databases

Slides:



Advertisements
Similar presentations
Uncertainty in Data Integration Ai Jing
Advertisements

1 Datalog: Logic Instead of Algebra. 2 Datalog: Logic instead of Algebra Each relational-algebra operator can be mimicked by one or several Database Logic.
A COURSE ON PROBABILISTIC DATABASES Dan Suciu University of Washington June, 2014Probabilistic Databases - Dan Suciu 1.
Materialized Views in Probabilistic Databases for Information Exchange and Query Optimization Christopher Re and Dan Suciu University of Washington 1.
Lecture 11: Datalog Tuesday, February 6, Outline Datalog syntax Examples Semantics: –Minimal model –Least fixpoint –They are equivalent Naive evaluation.
CPSC 504: Data Management Discussion on Chandra&Merlin 1977 Laks V.S. Lakshmanan Dept. of CS UBC.
CSE544 Database Statistics Tuesday, February 15 th, 2011 Dan Suciu , Winter
Representing and Querying Correlated Tuples in Probabilistic Databases
Queries with Difference on Probabilistic Databases Sanjeev Khanna Sudeepa Roy Val Tannen University of Pennsylvania 1.
PAPER BY : CHRISTOPHER R’E NILESH DALVI DAN SUCIU International Conference on Data Engineering (ICDE), 2007 PRESENTED BY : JITENDRA GUPTA.
Top-K Query Evaluation on Probabilistic Data Christopher Ré, Nilesh Dalvi and Dan Suciu University of Washington.
A COURSE ON PROBABILISTIC DATABASES June, 2014Probabilistic Databases - Dan Suciu 1.
1 Relational Algebra & Calculus. 2 Relational Query Languages  Query languages: Allow manipulation and retrieval of data from a database.  Relational.
Efficient Query Evaluation on Probabilistic Databases
A COURSE ON PROBABILISTIC DATABASES Dan Suciu University of Washington June, 2014Probabilistic Databases - Dan Suciu 1.
ANHAI DOAN ALON HALEVY ZACHARY IVES Chapter 13: Incorporating Uncertainty into Data Integration PRINCIPLES OF DATA INTEGRATION.
1 Probabilistic/Uncertain Data Management -- III Slides based on the Suciu/Dalvi SIGMOD’05 tutorial 1.Dalvi, Suciu. “Efficient query evaluation on probabilistic.
1 8. Safe Query Languages Safe program – its semantics can be at least partially computed on any valid database input. Safety is tied to program verification,
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Relational Algebra Chapter 4, Part A.
1 Provenance Semirings T.J. Green, G. Karvounarakis, V. Tannen University of Pennsylvania Principles of Provenance (PrOPr) Philadelphia, PA June 26, 2007.
MystiQ The HusQies* *Nilesh Dalvi, Brian Harris, Chris Re, Dan Suciu University of Washington.
1 Relational Algebra and Calculus Yanlei Diao UMass Amherst Feb 1, 2007 Slides Courtesy of R. Ramakrishnan and J. Gehrke.
1 Probabilistic/Uncertain Data Management -- IV 1.Dalvi, Suciu. “Efficient query evaluation on probabilistic databases”, VLDB’ Sen, Deshpande. “Representing.
Quiz 4: Mean: 7.0/8.0 (= 88%) Median: 7.5/8.0 (= 94%)
Ranking Queries on Uncertain Data: A Probabilistic Threshold Approach Wenjie Zhang, Xuemin Lin The University of New South Wales & NICTA Ming Hua,
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Relational Calculus Chapter 4, Section 4.3.
1 CS 430 Database Theory Winter 2005 Lecture 6: Relational Calculus.
General Database Statistics Using Maximum Entropy Raghav Kaushik 1, Christopher Ré 2, and Dan Suciu 3 1 Microsoft Research 2 University of Wisconsin--Madison.
Christopher Re and Dan Suciu University of Washington Efficient Evaluation of HAVING Queries on a Probabilistic Database.
FALL 2004CENG 351 File Structures and Data Management1 Relational Model Chapter 3.
A COURSE ON PROBABILISTIC DATABASES Dan Suciu University of Washington June, 2014Probabilistic Databases - Dan Suciu 1.
Relational Calculus Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems September 17, 2007 Some slide content courtesy.
A COURSE ON PROBABILISTIC DATABASES Dan Suciu University of Washington June, 2014Probabilistic Databases - Dan Suciu 1.
1 Relational Algebra & Calculus Chapter 4, Part A (Relational Algebra)
1 Relational Algebra and Calculas Chapter 4, Part A.
Chapter 5 Notes. P. 189: Sets, Bags, and Lists To understand the distinction between sets, bags, and lists, remember that a set has unordered elements,
IST 210 The Relational Language Todd S. Bacastow January 2004.
SqlExam1Review.ppt EXAM - 1. SQL stands for -- Structured Query Language Putting a manual database on a computer ensures? Data is more current Data is.
Lecture 7: Foundations of Query Languages Tuesday, January 23, 2001.
A Semantic Caching Method Based on Linear Constraints Yoshiharu Ishikawa and Hiroyuki Kitagawa University of Tsukuba
A Unified Approach to Ranking in Probabilistic Databases Jian Li, Barna Saha, Amol Deshpande University of Maryland, College Park, USA VLDB
Database Management Systems, R. Ramakrishnan1 Relational Calculus Chapter 4, Part B.
Parallel Evaluation of Conjunctive Queries Paraschos Koutris and Dan Suciu University of Washington PODS 2011, Athens.
A COURSE ON PROBABILISTIC DATABASES Dan Suciu University of Washington June, 2014Probabilistic Databases - Dan Suciu 1.
Efficient Query Evaluation on Probabilistic Databases Nilesh Dalvi Dan Suciu Modified by Veeranjaneyulu Sadhanala.
SQL: Interactive Queries (2) Prof. Weining Zhang Cs.utsa.edu.
Chapter 3 The Relational Model. Why Study the Relational Model? Most widely used model. Vendors: IBM, Informix, Microsoft, Oracle, Sybase, etc. “Legacy.
Extensions of Datalog Wednesday, February 13, 2001.
Lecture 9: Query Complexity Tuesday, January 30, 2001.
1 Finite Model Theory Lecture 9 Logics and Complexity Classes (cont’d)
1 Relational Algebra. 2 Relational Query Languages  Query languages: Allow manipulation and retrieval of data from a database.  Relational model supports.
Learning Bayesian Networks for Complex Relational Data
Relational Calculus Chapter 4, Section 4.3.
Relational Algebra & Calculus
Chapter 2: Database System Concepts and Architecture - Outline
Relational Calculus Chapter 4, Part B
Relational Algebra Chapter 4, Part A
Queries with Difference on Probabilistic Databases
CS405G: Introduction to Database Systems
Database Applications (15-415) Relational Calculus Lecture 6, September 6, 2016 Mohammad Hammoud.
Lecture 16: Probabilistic Databases
Week #2 – 4/6 September 2002 Prof. Marie desJardins
Overview of Query Evaluation
Probabilistic Databases
Datalog Inspired by the impedance mismatch in relational databases.
Relational Algebra & Calculus
Relational Calculus Chapter 4, Part B 7/1/2019.
Probabilistic Databases with MarkoViews
Representations & Reasoning Systems (RRS) (2.2)
Relational Calculus Chapter 4, Part B
Presentation transcript:

A Course on Probabilistic Databases June, 2014 Probabilistic Databases - Dan Suciu A Course on Probabilistic Databases Dan Suciu University of Washington

Outline Motivating Applications The Probabilistic Data Model Chapter 2 June, 2014 Probabilistic Databases - Dan Suciu Outline Part 1 Motivating Applications The Probabilistic Data Model Chapter 2 Extensional Query Plans Chapter 4.2 The Complexity of Query Evaluation Chapter 3 Extensional Evaluation Chapter 4.1 Intensional Evaluation Chapter 5 Conclusions Part 2 Part 3 Part 4

Probabilistic Data Model: Outline June, 2014 Probabilistic Databases - Dan Suciu Probabilistic Data Model: Outline Review: Relational Data Model Incomplete Databases Probabilistic Databases (PDB) Query semantics Block Independent Disjoint

Review: Relational Data Model June, 2014 Probabilistic Databases - Dan Suciu Review: Relational Data Model Owner Location Data: stored in relations (= tables) Name Object Joe Book302 Laptop77 Jim Fred GgleGlass Object Time Loc Laptop77 5:07 Hall 9:05 Office Book302 8:18

Review: Relational Data Model June, 2014 Probabilistic Databases - Dan Suciu Review: Relational Data Model Owner Location Data: stored in relations (= tables) Name Object Joe Book302 Laptop77 Jim Fred GgleGlass Object Time Loc Laptop77 5:07 Hall 9:05 Office Book302 8:18 Queries: SQL, Find all owners of objects in the Office -- SQL: e.g. postgres SELECT DISTINCT Owner.name FROM Owner, Location WHERE Owner.object = Location.object and Location.loc = ‘Office’

Review: Relational Data Model June, 2014 Probabilistic Databases - Dan Suciu Review: Relational Data Model Owner Location Data: stored in relations (= tables) Name Object Joe Book302 Laptop77 Jim Fred GgleGlass Object Time Loc Laptop77 5:07 Hall 9:05 Office Book302 8:18 Queries: SQL, Unions of Conjunctive Queries Find all owners of objects in the Office -- SQL: e.g. postgres SELECT DISTINCT Owner.name FROM Owner, Location WHERE Owner.object = Location.object and Location.loc = ‘Office’ Q(z) = Owner(z,x), Location(x,t,y),y=‘Office’ Note that x,t, are existentially quantified: Q(z) = ∃x ∃t (Owner(z,x), Location(x,t,’Office’))

Review: Relational Data Model June, 2014 Probabilistic Databases - Dan Suciu Review: Relational Data Model Owner Location Data: stored in relations (= tables) Name Object Joe Book302 Laptop77 Jim Fred GgleGlass Object Time Loc Laptop77 5:07 Hall 9:05 Office Book302 8:18 Queries: SQL, Unions of Conjunctive Queries Find all owners of objects in the Office -- SQL: e.g. postgres SELECT DISTINCT Owner.name FROM Owner, Location WHERE Owner.object = Location.object and Location.loc = ‘Office’ Q(z) = Owner(z,x), Location(x,t,y),y=‘Office’ Note that x,t, are existentially quantified: Q(z) = ∃x ∃t (Owner(z,x), Location(x,t,’Office’)) Name Joe Jim Query answer: Q=

Review: Relational Data Model June, 2014 Probabilistic Databases - Dan Suciu Review: Relational Data Model Owner Location Data: stored in relations (= tables) Name Object Joe Book302 Laptop77 Jim Fred GgleGlass Object Time Loc Laptop77 5:07 Hall 9:05 Office Book302 8:18 Queries: SQL, Unions of Conjunctive Queries Find all owners of objects in the Office -- SQL: e.g. postgres SELECT DISTINCT Owner.name FROM Owner, Location WHERE Owner.object = Location.object and Location.loc = ‘Office’ Q(z) = Owner(z,x), Location(x,t,y),y=‘Office’ Note that x,t, are existentially quantified: Q(z) = ∃x ∃t (Owner(z,x), Location(x,t,’Office’)) Name Joe Jim Query answer: Q=

Review: Complexity of Query Evaluation Query Q, database D Data complexity: fix Q, complexity = f(D) Query complexity: fix D, complexity = f(Q) Combined complexity: complexity = f(D,Q) This course Moshe Vardi Data complexity is the right metric for query language, and this is what I will discuss today. This is what allows database systems to scale to large data sets. ALL query languages that exists today in systems have PTIME data complexity: SQL, datalog, datalog with negation, xquery, sparql, etc etc. So this is the lens through which we will study query evaluation on probabilistic databases. Data complexity is unique to database research

June, 2014 Probabilistic Databases - Dan Suciu Incomplete Database Definition An Incomplete Database is a finite set of database instances W = (W1, W2, …, Wn) Each Wi is called a possible world

June, 2014 Probabilistic Databases - Dan Suciu Incomplete Database Definition An Incomplete Database is a finite set of database instances W = (W1, W2, …, Wn) Each Wi is called a possible world W1 W2 W3 W4 Owner Name Object Joe Book302 Laptop77 Jim Fred GgleGlass Location Object Time Loc Laptop77 5:07 Hall 9:05 Office Book302 8:18

June, 2014 Probabilistic Databases - Dan Suciu Incomplete Database Definition An Incomplete Database is a finite set of database instances W = (W1, W2, …, Wn) Each Wi is called a possible world W1 W2 W3 W4 Owner Owner Name Object Joe Book302 Laptop77 Jim Fred GgleGlass Name Object Joe Book302 Jim Laptop77 Fred GgleGlass Location Location Object Time Loc Laptop77 5:07 Hall 9:05 Office Book302 8:18 Object Time Loc Book302 8:18 Office

June, 2014 Probabilistic Databases - Dan Suciu Incomplete Database Definition An Incomplete Database is a finite set of database instances W = (W1, W2, …, Wn) Each Wi is called a possible world W1 W2 W3 W4 Owner Owner Owner Owner Name Object Joe Book302 Laptop77 Jim Fred GgleGlass Name Object Joe Book302 Jim Laptop77 Fred GgleGlass Name Object Jim Laptop77 Name Object Joe Book302 Jim Laptop77 Fred GgleGlass Location Location Location Location Object Time Loc Laptop77 5:07 Hall 9:05 Office Book302 8:18 Object Time Loc Book302 8:18 Office Object Time Loc Laptop77 5:07 Hall 9:05 Office Object Time Loc Laptop77 5:07 Hall 9:05 Office Book302 8:18

Incomplete Database: Query Semantics June, 2014 Probabilistic Databases - Dan Suciu Incomplete Database: Query Semantics Definition Given query Q, incomplete database W: An answer t is certain, if ∀Wi, t ∈Q(Wi) An answer t is possible if ∃Wi, t ∈Q(Wi)

Incomplete Database: Query Semantics June, 2014 Probabilistic Databases - Dan Suciu Incomplete Database: Query Semantics Definition Given query Q, incomplete database W: An answer t is certain, if ∀Wi, t ∈Q(Wi) An answer t is possible if ∃Wi, t ∈Q(Wi) Q(z) = Owner(z,x), Location(x,t,’Office’) W1 W2 W3 W4 Owner Owner Owner Owner Name Object Joe Book302 Laptop77 Jim Fred GgleGlass Name Object Joe Book302 Jim Laptop77 Fred GgleGlass Name Object Joe Laptop77 Name Object Joe Book302 Jim Laptop77 Fred GgleGlass Location Location Location Location Object Time Loc Laptop77 5:07 Hall 9:05 Office Book302 8:18 Object Time Loc Book302 8:18 Office Object Time Loc Laptop77 5:07 Hall 9:05 Office Object Time Loc Laptop77 5:07 Hall 9:05 Office Book302 8:18

Incomplete Database: Query Semantics June, 2014 Probabilistic Databases - Dan Suciu Incomplete Database: Query Semantics Definition Given query Q, incomplete database W: An answer t is certain, if ∀Wi, t ∈Q(Wi) An answer t is possible if ∃Wi, t ∈Q(Wi) Q(z) = Owner(z,x), Location(x,t,’Office’) W1 W2 W3 W4 Q= Owner Owner Owner Owner Name Object Joe Book302 Laptop77 Jim Fred GgleGlass Name Object Joe Book302 Jim Laptop77 Fred GgleGlass Name Object Joe Laptop77 Name Object Joe Book302 Jim Laptop77 Fred GgleGlass Location Location Location Location Object Time Loc Laptop77 5:07 Hall 9:05 Office Book302 8:18 Object Time Loc Book302 8:18 Office Object Time Loc Laptop77 5:07 Hall 9:05 Office Object Time Loc Laptop77 5:07 Hall 9:05 Office Book302 8:18 Joe Jim Joe Joe Joe Jim

Incomplete Database: Query Semantics June, 2014 Probabilistic Databases - Dan Suciu Incomplete Database: Query Semantics Definition Given query Q, incomplete database W: An answer t is certain, if ∀Wi, t ∈Q(Wi) An answer t is possible if ∃Wi, t ∈Q(Wi) Q(z) = Owner(z,x), Location(x,t,’Office’) Certain answers to Q: Joe Possible answers to Q: Joe, Jim W1 W2 W3 W4 Q= Owner Owner Owner Owner Name Object Joe Book302 Laptop77 Jim Fred GgleGlass Name Object Joe Book302 Jim Laptop77 Fred GgleGlass Name Object Joe Laptop77 Name Object Joe Book302 Jim Laptop77 Fred GgleGlass Location Location Location Location Object Time Loc Laptop77 5:07 Hall 9:05 Office Book302 8:18 Object Time Loc Book302 8:18 Office Object Time Loc Laptop77 5:07 Hall 9:05 Office Object Time Loc Laptop77 5:07 Hall 9:05 Office Book302 8:18 Joe Jim Joe Joe Joe Jim

Probabilistic Database June, 2014 Probabilistic Databases - Dan Suciu Probabilistic Database Definition A Probabilistic Database is (W, P), where W is an incomplete database, and P: W  [0,1] a probability distribution: Σi=1,n P(Wi) = 1

Probabilistic Database June, 2014 Probabilistic Databases - Dan Suciu Probabilistic Database Definition A Probabilistic Database is (W, P), where W is an incomplete database, and P: W  [0,1] a probability distribution: Σi=1,n P(Wi) = 1 W1 W2 W3 W4 0.3 0.4 0.2 0.1 Owner Owner Owner Owner Name Object Joe Book302 Laptop77 Jim Fred GgleGlass Name Object Joe Book302 Jim Laptop77 Fred GgleGlass Name Object Jim Laptop77 Name Object Joe Book302 Jim Laptop77 Fred GgleGlass Location Location Location Location Object Time Loc Laptop77 5:07 Hall 9:05 Office Book302 8:18 Object Time Loc Book302 8:18 Office Object Time Loc Laptop77 5:07 Hall 9:05 Office Object Time Loc Laptop77 5:07 Hall 9:05 Office Book302 8:18

Probabilistic Database: Query Semantics June, 2014 Probabilistic Databases - Dan Suciu Probabilistic Database: Query Semantics Definition Given query Q, probabilistic database (W,P): The marginal probability of an answer t is: P(t) = Σ { P(Wi) | Wi ∈ W, t ∈Q(Wi) }

Probabilistic Database: Query Semantics June, 2014 Probabilistic Databases - Dan Suciu Probabilistic Database: Query Semantics Definition Given query Q, probabilistic database (W,P): The marginal probability of an answer t is: P(t) = Σ { P(Wi) | Wi ∈ W, t ∈Q(Wi) } Q(z) = Owner(z,x), Location(x,t,’Office’) W1 W2 W3 W4 0.3 0.4 0.2 0.1 Owner Owner Owner Owner Name Object Joe Book302 Laptop77 Jim Fred GgleGlass Name Object Joe Book302 Jim Laptop77 Fred GgleGlass Name Object Jim Laptop77 Name Object Joe Book302 Jim Laptop77 Fred GgleGlass Location Location Location Location Object Time Loc Laptop77 5:07 Hall 9:05 Office Book302 8:18 Object Time Loc Book302 8:18 Office Object Time Loc Laptop77 5:07 Hall 9:05 Office Object Time Loc Laptop77 5:07 Hall 9:05 Office Book302 8:18

Probabilistic Database: Query Semantics June, 2014 Probabilistic Databases - Dan Suciu Probabilistic Database: Query Semantics Definition Given query Q, probabilistic database (W,P): The marginal probability of an answer t is: P(t) = Σ { P(Wi) | Wi ∈ W, t ∈Q(Wi) } Q(z) = Owner(z,x), Location(x,t,’Office’) W1 W2 W3 W4 0.3 0.4 0.2 0.1 Q= Owner Owner Owner Owner Name Object Joe Book302 Laptop77 Jim Fred GgleGlass Name Object Joe Book302 Jim Laptop77 Fred GgleGlass Name Object Joe Laptop77 Name Object Joe Book302 Jim Laptop77 Fred GgleGlass Location Location Location Location Object Time Loc Laptop77 5:07 Hall 9:05 Office Book302 8:18 Object Time Loc Book302 8:18 Office Object Time Loc Laptop77 5:07 Hall 9:05 Office Object Time Loc Laptop77 5:07 Hall 9:05 Office Book302 8:18 Joe Jim Joe Joe Joe Jim

Probabilistic Database: Query Semantics June, 2014 Probabilistic Databases - Dan Suciu Probabilistic Database: Query Semantics Definition Given query Q, probabilistic database (W,P): The marginal probability of an answer t is: P(t) = Σ { P(Wi) | Wi ∈ W, t ∈Q(Wi) } Q(z) = Owner(z,x), Location(x,t,’Office’) P(Joe) = 1.0 P(Jim) = 0.4 W1 W2 W3 W4 0.3 0.4 0.2 0.1 Q= Owner Owner Owner Owner Name Object Joe Book302 Laptop77 Jim Fred GgleGlass Name Object Joe Book302 Jim Laptop77 Fred GgleGlass Name Object Joe Laptop77 Name Object Joe Book302 Jim Laptop77 Fred GgleGlass Location Location Location Location Object Time Loc Laptop77 5:07 Hall 9:05 Office Book302 8:18 Object Time Loc Book302 8:18 Office Object Time Loc Laptop77 5:07 Hall 9:05 Office Object Time Loc Laptop77 5:07 Hall 9:05 Office Book302 8:18 Joe Jim Joe Joe Joe Jim

June, 2014 Probabilistic Databases - Dan Suciu Discussion Intuition: a probabilistic database says that the database can be in one of possible states, each with a probability Possible query answers: a set of answers annotated with probabilities: (t1, p1), (t2, p2), (t3, p3), … Usually: p1 ≥ p2 ≥ p3 ≥ … Problem: the number of possible world in a probabilistic database is astronomically large. To represent it, we impose some restrictions

Independent, Disjoint Tuples June, 2014 Probabilistic Databases - Dan Suciu Independent, Disjoint Tuples Definition Given a probabilistic database (W, P). Two tuples t1, t2 are called: Independent, if: P(t1 t2) = P(t1) P(t2) Disjoint (or exclusive), if: P(t1t2) = 0

Independent, Disjoint Tuples June, 2014 Probabilistic Databases - Dan Suciu Independent, Disjoint Tuples Definition Given a probabilistic database (W, P). Two tuples t1, t2 are called: Independent, if: P(t1 t2) = P(t1) P(t2) Disjoint (or exclusive), if: P(t1t2) = 0 Definition A probabilistic database is called Block-Independent-Disjoint (BID), if its tuples are grouped into blocks such that: Tuples from the same block are disjoint Tuples from different blocks are independent

Example: BID Table W={ } Object Time Loc P Laptop77 9:07 Rm444 p1 Hall June, 2014 Probabilistic Databases - Dan Suciu Example: BID Table BID Table Object Time Loc P Laptop77 9:07 Rm444 p1 Hall p2 Book302 9:18 Office p3 p4 Lift p5 Disjoint Inde- pen- dent Disjoint W={ Object Time Loc Laptop77 9:07 Rm444 Book302 9:18 Office Object Time Loc Laptop77 9:07 Rm444 Book302 9:18 } Possible worlds Object Time Loc Laptop77 9:07 Rm444 Book302 9:18 Lift Object Time Loc Laptop77 9:07 Hall Book302 9:18 Office Object Time Loc Laptop77 9:07 Hall Book302 9:18 Rm444 Object Time Loc Laptop77 9:07 Hall Book302 9:18 Lift p1p3 Object Time Loc Laptop77 9:07 Rm444 p1p4 Object Time Loc Laptop77 9:07 Hall Object Time Loc Book302 9:18 Office Object Time Loc Book302 9:18 Rm444 Object Time Loc Book302 9:18 Lift p1(1- p3-p4-p5) Object Time Loc

The Query Evaluation Problem June, 2014 Probabilistic Databases - Dan Suciu The Query Evaluation Problem Given: a BID database D, a query Q, and output tuple t Compute: P(t) Note: D has, say, 1000000 tuples, while the number of possible worlds is 21000000 Challenge: compute P(t) efficiently, in the size of D Data complexity: the complexity of P depends dramatically on Q

An Example P(Q) = 1- {1- } * p1*[ ] 1-(1-q1)*(1-q2) {1- } p2*[ ] June, 2014 Probabilistic Databases - Dan Suciu An Example Boolean query SELECT DISTINCT ‘true’ FROM R, S WHERE R.x = S.x Q() = R(x), S(x,y) P(Q) = 1- {1- } * p1*[ ] 1-(1-q1)*(1-q2) {1- } p2*[ ] 1-(1-q3)*(1-q4)*(1-q5) One can compute P(Q) in PTIME in the size of the database D S x y P a1 b1 q1 b2 q2 a2 b3 q3 b4 q4 b5 q5 R x P a1 p1 a2 p2 a3 p3

Summary of the Probabilistic Data Model June, 2014 Probabilistic Databases - Dan Suciu Summary of the Probabilistic Data Model Possible Worlds Semantics: very powerful but difficult to represent Block-Independent-Disjoint databases have efficient representation: D is stored in a traditional database Independent Databases: even simpler Main challenge: evaluate Q efficiently in the size of D This tutorial