Working Models for Uncertain Data. Anish Das Sarma, Omar Benjelloun, Alon Halevy, Jennifer Widom. Stanford InfoLab.

Presentation transcript:

1 Working Models for Uncertain Data Anish Das Sarma, Omar Benjelloun, Alon Halevy, Jennifer Widom Stanford InfoLab

2 Introduction
Trio Project: We’re building a new kind of DBMS in which:
1. Data
2. Uncertainty
3. Lineage
are all first-class interrelated concepts
Motivating applications
– Scientific and sensor databases
– Data cleaning and integration
– Information extraction
– And others…

3 Introduction (contd.)
Started by investigating the uncertainty component
Most of this talk: uncertainty
Toward the end: uncertainty + lineage

4 Models for Uncertainty
20+ years of work (mostly theoretical)
Our goal: intuitive and simple (i.e., a usable system) yet expressive enough
There appears to be a fundamental trade-off between expressiveness and intuitiveness

5 Space of Uncertainty Models
C-Tables [IL84]: + Expressive, − Nonintuitive
“Model-1”: + Simple, intuitive, − Incomplete, not even closed
What else is in this space?

6 Next in the Talk Model-1 Completeness and closure

7 Model-1
1. Or-sets
2. Maybe-tuples (denoted “?”)
ICDE-Attendees
Person  Day
Alice   Monday
Bob     {Monday,Tuesday}  ?

8 Formal Semantics
Definition: An uncertain database represents a set of possible (certain) databases
– a.k.a. “possible worlds”, “possible instances”
Uncertain relation:
Person  Day
Alice   Monday
Bob     {Monday,Tuesday}  ?
Three possible instances:
Instance 1
Person  Day
Alice   Monday
Bob     Monday
Instance 2
Person  Day
Alice   Monday
Bob     Tuesday
Instance 3
Person  Day
Alice   Monday
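
The possible-instances semantics above can be made concrete with a small enumeration. The following is a minimal sketch, not code from the Trio system: the encoding of an uncertain tuple as a pair (attributes, maybe-flag), with Python sets standing in for or-sets, and the function name possible_instances are assumptions made for illustration.

```python
from itertools import product

def possible_instances(relation):
    """Enumerate the possible (certain) instances of a Model-1 relation.

    Assumed encoding: each uncertain tuple is (attrs, maybe), where attrs is a
    tuple whose entries are plain values or Python sets (or-sets), and maybe
    is True when the tuple carries the '?' annotation.
    """
    per_tuple_choices = []
    for attrs, maybe in relation:
        # An or-set contributes each of its members; a plain value only itself.
        alternatives = [sorted(a) if isinstance(a, (set, frozenset)) else [a]
                        for a in attrs]
        choices = [tuple(combo) for combo in product(*alternatives)]
        if maybe:
            choices.append(None)  # a maybe-tuple may be absent
        per_tuple_choices.append(choices)
    # Every uncertain tuple makes its choice independently of the others.
    return {frozenset(t for t in picks if t is not None)
            for picks in product(*per_tuple_choices)}

icde_attendees = [
    (("Alice", "Monday"), False),
    (("Bob", {"Monday", "Tuesday"}), True),
]
worlds = possible_instances(icde_attendees)
print(len(worlds))  # 3
```

Instances are modeled as sets of tuples, so duplicate tuples within one instance collapse; that set semantics is part of the sketch's assumptions.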

9 Completeness and Closure
Completeness: A model M is complete if every finite set of possible instances can be represented in M
Closure: A model M is closed under an operation Op if the result of Op on any database represented in M can itself be represented in M

10 Incompleteness of Model-1
Instance 1
person  day
Alice   Monday
Bob     Tuesday
Instance 2
person  day
Alice   Monday
Instance 3
person  day
Bob     Tuesday
Attempted representation:
person  day
Alice   Monday   ?
Bob     Tuesday  ?
generates a 4th instance: the empty relation
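
The incompleteness example can be checked mechanically. The sketch below is an illustration under assumed encodings (uncertain tuples as (person_options, day_options, maybe), instances as sets of tuples); the bounded brute-force search is only a sanity check over relations with at most three uncertain tuples, not a proof.

```python
from itertools import combinations_with_replacement, product

def possible_instances(relation):
    """Each uncertain tuple is (person_options, day_options, maybe)."""
    per_tuple = []
    for persons, days, maybe in relation:
        choices = list(product(persons, days))
        if maybe:
            choices.append(None)  # the tuple may be absent
        per_tuple.append(choices)
    return {frozenset(t for t in picks if t is not None)
            for picks in product(*per_tuple)}

a_mon, b_tue = ("Alice", "Monday"), ("Bob", "Tuesday")
target = {frozenset({a_mon, b_tue}), frozenset({a_mon}), frozenset({b_tue})}

# The natural attempt from the slide: both tuples marked '?'.
attempt = [(("Alice",), ("Monday",), True), (("Bob",), ("Tuesday",), True)]
attempt_worlds = possible_instances(attempt)
extra = attempt_worlds - target  # the unwanted empty relation

# Bounded search: every Model-1 relation with up to 3 uncertain tuples
# over the values that occur in the target instances.
option_sets_person = [("Alice",), ("Bob",), ("Alice", "Bob")]
option_sets_day = [("Monday",), ("Tuesday",), ("Monday", "Tuesday")]
forms = [(p, d, m) for p in option_sets_person
         for d in option_sets_day for m in (False, True)]
representable = any(
    possible_instances(rel) == target
    for n in range(4)
    for rel in combinations_with_replacement(forms, n)
)
print(extra == {frozenset()}, representable)
```

The search finds no Model-1 relation for the three target instances: a tuple without "?" would have to cover both (Alice, Monday) and (Bob, Tuesday) through or-sets and would then also produce (Alice, Tuesday), while marking every tuple "?" always admits the empty relation.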

11 Closure
Easy and natural (re)definition for any standard database operation Op
Diagram: D → (possible instances) → I1, I2, …, In → (Op on each instance) → J1, J2, …, Jm → (rep. of instances, the up-arrow) → D′; D → (Op′, direct implementation) → D′
Closure: up-arrow always exists
Completeness ⇒ Closure
Note: Closure does not imply Completeness

12 Non-closure of Model-1
person  day
Alice   {Monday,Tuesday}
⋈
day      activity
Monday   Reception
Tuesday  Banquet
Result has two possible instances:
Instance 1
person  day     activity
Alice   Monday  Reception
Instance 2
person  day      activity
Alice   Tuesday  Banquet
Not representable with or-sets and ?
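
A short sketch of the join example, under the assumption that an operation on uncertain data is defined by applying it to each possible instance. The function name join and the tuple encodings are illustrative only. The last step shows why or-sets do not help: placing or-sets in the day and activity columns of a single result tuple yields four instances instead of two.

```python
from itertools import product

def join(attendance, schedule):
    """Natural join of (person, day) with (day, activity) on day."""
    return frozenset((p, d, a)
                     for (p, d) in attendance
                     for (d2, a) in schedule
                     if d == d2)

# Possible instances of the relation holding Alice {Monday,Tuesday}.
attendance_worlds = [
    frozenset({("Alice", "Monday")}),
    frozenset({("Alice", "Tuesday")}),
]
schedule = frozenset({("Monday", "Reception"), ("Tuesday", "Banquet")})

# Apply the join to each possible instance.
result_worlds = {join(world, schedule) for world in attendance_worlds}

# Or-set attempt: one tuple (Alice, {Monday,Tuesday}, {Reception,Banquet}).
orset_attempt = {
    frozenset({("Alice", d, a)})
    for d, a in product(["Monday", "Tuesday"], ["Reception", "Banquet"])
}
print(len(result_worlds), len(orset_attempt))  # 2 4
```

The or-set attempt admits (Alice, Monday, Banquet) and (Alice, Tuesday, Reception), which occur in neither of the two true result instances.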

13 Completeness vs. Closure
Diagram: outer set = all sets-of-instances; inner set = representable sets-of-instances; arrows Op1 and Op2
Completeness: inner = outer
Closure: arrow stays in inner

14 Rest of Talk More models Their properties and relationships “Back to the future”

15 C-Tables [Imielinski, Lipski 84]
Tuples + Variables + Conditions
C-Tables are complete (and hence, closed)
But free variables make them nonintuitive for the casual user
Example table with columns Person, Day, Tuple-Condition
Tuples: (Alice, Monday), (Bob, Monday), (Bob, Tuesday)
Tuple-conditions shown: (X=0); (X≠0) AND (Y=1)

16 Space of Uncertainty Models
C-Tables [IL84]: + Expressive, − Nonintuitive
Model-1: + Simple, intuitive, − Incomplete, not even closed
What else is in this space?
Approach: What’s missing in Model-1?

17 Revisit Join Example
person  day
Alice   {Monday,Tuesday}
⋈
day      activity
Monday   Reception
Tuesday  Banquet
Result has two possible instances:
Instance 1
person  day     activity
Alice   Monday  Reception
Instance 2
person  day      activity
Alice   Tuesday  Banquet
Not representable with or-sets and ?

18 Need Exclusive-OR
Two possible instances:
Instance 1
person  day     activity
Alice   Monday  Reception
Instance 2
person  day      activity
Alice   Tuesday  Banquet
Representable with XOR on tuples:
    person  day      activity
t1  Alice   Monday   Reception
t2  Alice   Tuesday  Banquet
Constraint over tuples: t1 XOR t2

19 Another Example
person  day
Alice   {Monday,Tuesday}
⋈
day      activity
Tuesday  Banquet
Tuesday  Boat Trip
Result has two possible instances:
Instance 1
person  day      activity
Alice   Tuesday  Banquet
Alice   Tuesday  Boat Trip
Instance 2
person  day  activity
(empty)
Again not representable

20 Need Iff
Two possible instances:
Instance 1
person  day      activity
Alice   Tuesday  Banquet
Alice   Tuesday  Boat Trip
Instance 2
person  day  activity
(empty)
Representable with IFF on tuples:
    person  day      activity
t1  Alice   Tuesday  Banquet
t2  Alice   Tuesday  Boat Trip
Constraint over tuples: t1 ⇔ t2
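
Both constraint examples (XOR and IFF on tuples) fit one pattern: each tuple is either present or absent, and only the presence patterns that satisfy the constraint count as instances. A minimal sketch, assuming constraints are given as Python predicates over a tuple of presence flags; the name constrained_instances is hypothetical.

```python
from itertools import product

def constrained_instances(tuples, constraint):
    """Possible instances of a relation whose tuples t1..tn may each be
    present or absent, restricted to presence patterns where
    constraint(flags) holds."""
    worlds = set()
    for flags in product([False, True], repeat=len(tuples)):
        if constraint(flags):
            worlds.add(frozenset(t for t, keep in zip(tuples, flags) if keep))
    return worlds

# t1 XOR t2: exactly one of the two result tuples is present.
xor_worlds = constrained_instances(
    [("Alice", "Monday", "Reception"), ("Alice", "Tuesday", "Banquet")],
    lambda flags: flags[0] != flags[1],
)

# t1 IFF t2: both result tuples are present, or neither is.
iff_worlds = constrained_instances(
    [("Alice", "Tuesday", "Banquet"), ("Alice", "Tuesday", "Boat Trip")],
    lambda flags: flags[0] == flags[1],
)
print(len(xor_worlds), len(iff_worlds))  # 2 2
```

Passing an arbitrary predicate corresponds to allowing full propositional logic over tuples; restricting the predicate to particular forms (such as only XOR and IFF) gives weaker members of the same family.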

21 Model-2 (Family)
Constructs:
– Or-sets
– Maybe annotation, ‘?’
– Boolean constraints over tuples
Constraints ⇒ Completeness?
– Full propositional logic: YES
– XOR and IFF: NO
– General 2-clauses: NO
How about “tuple-sets”?

22 Model-3
Tuple-sets
(person,day)
(Alice,Monday)
(Bob,Monday) (Bob,Tuesday)
Complete? NO: IFF still not expressible

23 Hierarchy of Models
Legend: R = relations; A = or-sets; ? = maybe-tuples; 2 = 2-clauses; prop = full propositional logic; sets = tuple-sets
R with ? and A: Model-1
R with prop and A: Complete
R with sets: Model-3

24 Closure May Be Good Enough
Completeness may not be necessary
– Original data representable in model
– Only restricted operations performed
Which models are closed under which operations?

25 Closure Table

26 Model Transition Diagram Not shown: (1)Self-loops (2)Subsumed arrows

27 Not Covered from Paper
Membership Problems
Given tuple t and uncertain relation R, is t in any instance of R? (and 3 other problems)
Diagram: R → (possible instances) → I1, I2, …, In; is t in them?
Approximation
How best can we approximate an M1 relation in M2?
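
For Model-1 specifically, the membership question "is t in any instance of R?" does not require enumerating instances, because every uncertain tuple chooses its values independently. The sketch below relies on that observation; the encoding (every attribute given as a set of options, a singleton set for a certain value) and the name may_contain are assumptions for illustration, and the paper's own treatment of the membership problems is not reproduced here.

```python
def may_contain(relation, t):
    """Return True if tuple t appears in at least one possible instance of a
    Model-1 relation.

    Assumed encoding: relation is a list of (attrs, maybe), where attrs is a
    tuple of sets, one set of options per attribute.
    """
    for attrs, _maybe in relation:
        # t is possible via this uncertain tuple if every attribute of t is
        # among the options of the corresponding or-set. The '?' flag does
        # not matter: a maybe-tuple is still present in some instance.
        if len(attrs) == len(t) and all(v in opts for v, opts in zip(t, attrs)):
            return True
    return False

icde_attendees = [
    (({"Alice"}, {"Monday"}), False),
    (({"Bob"}, {"Monday", "Tuesday"}), True),
]
print(may_contain(icde_attendees, ("Bob", "Tuesday")))    # True
print(may_contain(icde_attendees, ("Alice", "Tuesday")))  # False
```

This shortcut is specific to models without constraints across tuples; once Boolean constraints over tuples are allowed, a tuple that looks possible in isolation may be excluded by the constraints.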

28 Back to the Future
Time:
Trio Project Unleashed, CIDR, Jan '05
ICDE Submission, June '05
Now, April '06
Studying Theory + Modeling …
ULDBs, Query Processing, Prototype Implementation

29 Lineage to the Rescue lineage uncertainty

30 ULDBs: Uncertainty-Lineage Databases
(person,day)
(Alice,Monday) (Alice,Tuesday)
⋈
(day,activity)
(Monday,Reception)
(Tuesday,Banquet)
Result:
(person,day,activity)
(Alice,Monday,Reception)  ?
(Alice,Tuesday,Banquet)   ?
Possible instances:
Instance 1
person  day     activity
Alice   Monday  Reception
Instance 2
person  day      activity
Alice   Tuesday  Banquet
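
The role of lineage in this example can be sketched as follows. This is a simplified illustration, not Trio's actual representation: it assumes the two (person,day) entries are alternatives of a single base tuple, and that each result tuple records, as its lineage, the index of the base alternative it was derived from. A result tuple then exists in exactly those instances where its lineage alternative was chosen.

```python
# Alternatives of the single uncertain base tuple (exactly one is chosen).
base_alternatives = [("Alice", "Monday"), ("Alice", "Tuesday")]
schedule = [("Monday", "Reception"), ("Tuesday", "Banquet")]

# Join once over all alternatives, remembering for each result tuple which
# base alternative it came from (its lineage).
derived = [
    ((person, day, activity), source)
    for source, (person, day) in enumerate(base_alternatives)
    for (sched_day, activity) in schedule
    if day == sched_day
]

# One possible instance per choice of base alternative: a derived tuple is
# present exactly when its lineage points at the chosen alternative.
worlds = {
    frozenset(t for t, source in derived if source == chosen)
    for chosen in range(len(base_alternatives))
}
print(len(worlds))  # 2
```

Without the lineage links, the two "?" result tuples would be independent and would also admit the empty instance and the instance containing both tuples; the links restrict the result to the two correct instances.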

31 Marrying Lineage and Uncertainty [see new papers]
Model-3 (tuple-sets) + Lineage = Completeness
Relational operations performed naturally
Easy extension to confidences (probabilities)
– with efficient query processing

32 The Trio Project
Data Model
– ULDBs
Query Language
– TriQL: Simple extension to SQL
– Ability to query confidences and lineage
System (version 1)
– On top of conventional DBMS

33 Related Work (Uncertainty, brief)
Modeling
– C-tables [IL84], Probabilistic Databases [CP87], using Nested Relations [F90]
– And lots, lots more
Systems
– ProbView [LLRS97], MYSTIQ [BDM+05], ORION [CSP05], Trio [BDHW05]

34 Trio Current and Future Topics
Data Model
– Continuous uncertainty, incomplete relations, correlations
Query Processing
– Updates, top-K, confidence computations
System
– Storage, indexes, statistics, query optimization, …

35 Thank You Search “stanford trio”