Working Models for Uncertain Data
Anish Das Sarma, Omar Benjelloun, Alon Halevy, Jennifer Widom
Stanford InfoLab

Slide 2: Introduction
Trio Project: we're building a new kind of DBMS in which
1. data,
2. uncertainty, and
3. lineage
are all first-class, interrelated concepts.
Motivating applications:
– Scientific and sensor databases
– Data cleaning and integration
– Information extraction
– And others…

Slide 3: Introduction (contd.)
We started by investigating the uncertainty component.
– Most of this talk: uncertainty
– Toward the end: uncertainty + lineage

Slide 4: Models for Uncertainty
– 20+ years of work (mostly theoretical)
– Our goal: a model that is intuitive and simple (i.e., usable in a real system), yet expressive enough
– There appears to be a fundamental trade-off between expressiveness and intuitiveness

Slide 5: Space of Uncertainty Models
– C-Tables [IL84]: (+) expressive; (−) nonintuitive
– "Model-1": (+) simple, intuitive; (−) incomplete, not even closed
What else is in this space?

Slide 6: Next in the Talk
– Model-1
– Completeness and closure

Slide 7: Model-1
Constructs:
1. Or-sets
2. Maybe-tuples (denoted "?")
Example relation ICDE-Attendees:
  Person | Day
  Alice  | Monday
  Bob    | {Monday, Tuesday}   ?

Slide 8: Formal Semantics
Definition: an uncertain database represents a set of possible (certain) databases, also called "possible worlds" or "possible instances".
Example: ICDE-Attendees, with tuples (Alice, Monday) and (Bob, {Monday, Tuesday}) ?, has three possible instances:
– Instance 1: (Alice, Monday), (Bob, Monday)
– Instance 2: (Alice, Monday), (Bob, Tuesday)
– Instance 3: (Alice, Monday)
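The possible-instance semantics can be sketched in a few lines of Python. This is a minimal sketch under our own assumptions, not Trio's representation: each Model-1 row is written as one or-set per attribute (a Python set, with plain values as one-element sets) followed by a maybe flag, and `possible_instances` is a hypothetical helper name.

```python
from itertools import product

# Hypothetical encoding of a Model-1 relation: each row holds one or-set
# (a Python set of allowed values) per attribute, followed by a flag that
# is True when the row is a maybe-tuple ("?").
ICDE_ATTENDEES = [
    ({"Alice"}, {"Monday"}, False),          # certain tuple
    ({"Bob"}, {"Monday", "Tuesday"}, True),  # or-set on day, marked "?"
]

def possible_instances(relation):
    """Return every possible instance, each as a frozenset of plain tuples."""
    per_row_options = []
    for row in relation:
        orsets, maybe = row[:-1], row[-1]
        # Any combination of values drawn from the or-sets is a candidate tuple.
        options = list(product(*(sorted(s) for s in orsets)))
        if maybe:
            options.append(None)  # None stands for "this row is absent"
        per_row_options.append(options)
    instances = set()
    for choice in product(*per_row_options):
        instances.add(frozenset(t for t in choice if t is not None))
    return instances

for inst in sorted(possible_instances(ICDE_ATTENDEES), key=sorted):
    print(sorted(inst))
```

Running it prints the three instances listed on the slide. Using frozensets gives set semantics, so two rows that happen to produce the same tuple collapse into one.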

Slide 9: Completeness and Closure
– Completeness: a model M is complete if every finite set of possible instances can be represented in M.
– Closure: a model M is closed under an operation Op if, for every uncertain database D represented in M, the result of applying Op to D can also be represented in M.

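Slide 10: Incompleteness of Model-1
Target: this set of three instances over (person, day):
– Instance 1: (Alice, Monday), (Bob, Tuesday)
– Instance 2: (Alice, Monday)
– Instance 3: (Bob, Tuesday)
The natural attempt, marking both (Alice, Monday) and (Bob, Tuesday) with "?", also generates a 4th instance: the empty relation.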
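A quick Python check shows why the attempted encoding fails: maybe-tuples are chosen independently, so marking both tuples "?" also admits the instance where both are absent. This only illustrates the one encoding shown on the slide; it is not a proof that no Model-1 relation works. All names are hypothetical.

```python
from itertools import product

# Target: the three instances from the slide.
TARGET = {
    frozenset({("Alice", "Monday"), ("Bob", "Tuesday")}),  # Instance 1
    frozenset({("Alice", "Monday")}),                      # Instance 2
    frozenset({("Bob", "Tuesday")}),                       # Instance 3
}

# Attempted Model-1 encoding: both tuples marked "?" (no or-sets needed).
MAYBE_TUPLES = [("Alice", "Monday"), ("Bob", "Tuesday")]

def instances_of_maybe_tuples(tuples):
    """Each maybe-tuple is independently present or absent."""
    result = set()
    for flags in product([True, False], repeat=len(tuples)):
        result.add(frozenset(t for t, keep in zip(tuples, flags) if keep))
    return result

generated = instances_of_maybe_tuples(MAYBE_TUPLES)
extra = generated - TARGET
print(len(generated), extra)  # 4 {frozenset()}
```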

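Slide 11: Closure
Any standard database operation Op has an easy and natural (re)definition over uncertain databases.
[Diagram: an uncertain database D represents possible instances I1, I2, …, In; applying Op to each instance yields J1, J2, …, Jm; a direct implementation Op′ maps D to D′, which should represent J1, J2, …, Jm.]
– Closure: the up-arrow (a D′ in the model representing J1, …, Jm) always exists.
– Completeness ⇒ closure.
– Note: closure ⇏ completeness.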
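The "Op on each instance" path of the diagram is easy to state in code. A minimal Python sketch, with a selection on day standing in for Op; `apply_per_instance` and `select_monday` are hypothetical names. Closure then asks whether some D′ in the same model represents exactly the resulting set of instances.

```python
# Instances I1..In of the ICDE-Attendees example from slide 8.
INSTANCES = {
    frozenset({("Alice", "Monday"), ("Bob", "Monday")}),
    frozenset({("Alice", "Monday"), ("Bob", "Tuesday")}),
    frozenset({("Alice", "Monday")}),
}

def apply_per_instance(instances, op):
    """Possible-instance semantics of Op: run op on each Ii, collect the Jj.

    Identical results collapse, so m can be smaller than n.
    """
    return {frozenset(op(inst)) for inst in instances}

def select_monday(instance):
    """Ordinary selection on a certain relation: keep tuples with day = Monday."""
    return {t for t in instance if t[1] == "Monday"}

results = apply_per_instance(INSTANCES, select_monday)
for j in sorted(results, key=sorted):
    print(sorted(j))
```

Here three input instances yield only two results, because the instances containing (Bob, Tuesday) and only Alice both reduce to {(Alice, Monday)}.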

Slide 12: Non-closure of Model-1
Join the Model-1 relation (person, day) = (Alice, {Monday, Tuesday}) with the certain relation (day, activity) = (Monday, Reception), (Tuesday, Banquet).
The result of ⋈ has two possible instances:
– Instance 1: (Alice, Monday, Reception)
– Instance 2: (Alice, Tuesday, Banquet)
Not representable with or-sets and "?".
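The non-closure argument can be checked by brute force: compute the join on each possible instance, then compare against one natural Model-1 encoding of the result. A Python sketch with hypothetical names; checking a single encoding illustrates the problem rather than proving that no encoding exists.

```python
from itertools import product

# Possible instances of the Model-1 relation (Alice, {Monday, Tuesday}).
ATTENDANCE_INSTANCES = [{("Alice", "Monday")}, {("Alice", "Tuesday")}]
# The certain (day, activity) relation.
EVENTS = {("Monday", "Reception"), ("Tuesday", "Banquet")}

def natural_join(people, events):
    """Join (person, day) with (day, activity) on day."""
    return {(p, d, a) for (p, d) in people for (d2, a) in events if d == d2}

# Op applied to each possible instance.
join_results = {frozenset(natural_join(inst, EVENTS)) for inst in ATTENDANCE_INSTANCES}

# One natural Model-1 attempt: a single tuple with independent or-sets
# {Monday, Tuesday} on day and {Reception, Banquet} on activity.
attempt = {
    frozenset({t})
    for t in product(["Alice"], ["Monday", "Tuesday"], ["Reception", "Banquet"])
}

print(len(join_results), len(attempt))  # 2 4
print(sorted(sorted(i) for i in attempt - join_results))
```

The or-sets on day and activity are chosen independently, so the encoding loses the correlation between them and admits spurious combinations such as (Alice, Monday, Banquet).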

Slide 13: Completeness vs. Closure
[Diagram: "representable sets-of-instances" drawn inside "all sets-of-instances", with operations Op1 and Op2 drawn as arrows.]
– Completeness: inner = outer
– Closure: every arrow stays in the inner region

Slide 14: Rest of Talk
– More models
– Their properties and relationships
– "Back to the future"

Slide 15: C-Tables [Imielinski, Lipski 84]
– Tuples + variables + conditions
– C-tables are complete (and hence closed)
– But free variables make them nonintuitive for the casual user
Example:
  Person | Day     | Tuple-Condition
  Alice  | Monday  |
  Bob    | Monday  | (X=0)
  Bob    | Tuesday | (X≠0) AND (Y=1)
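A c-table's possible instances come from trying every valuation of its variables and keeping the tuples whose condition holds. A Python sketch under two assumptions of ours: conditions are encoded as Python lambdas, and the variables range over the small hypothetical domain {0, 1, 2}. C-tables allow larger domains, but for these conditions any value other than 0 or 1 behaves like 2, so this domain already reaches every outcome.

```python
from itertools import product

# Hypothetical finite domain for X and Y (see note above).
DOMAIN = [0, 1, 2]

# Each row: (tuple, condition); a condition maps a valuation to True/False.
C_TABLE = [
    (("Alice", "Monday"), lambda v: True),                         # no condition
    (("Bob", "Monday"), lambda v: v["X"] == 0),                    # (X=0)
    (("Bob", "Tuesday"), lambda v: v["X"] != 0 and v["Y"] == 1),   # (X≠0) AND (Y=1)
]

def c_table_instances(c_table, variables, domain):
    """One candidate instance per valuation; duplicates collapse in the set."""
    result = set()
    for values in product(domain, repeat=len(variables)):
        valuation = dict(zip(variables, values))
        result.add(frozenset(t for t, cond in c_table if cond(valuation)))
    return result

worlds = c_table_instances(C_TABLE, ["X", "Y"], DOMAIN)
print(len(worlds))  # 3
```

The three instances are exactly those of the ICDE-Attendees example on slide 8, which shows the same set of instances written as a c-table instead of with or-sets and "?".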

Slide 16: Space of Uncertainty Models
– C-Tables [IL84]: (+) expressive; (−) nonintuitive
– "Model-1": (+) simple, intuitive; (−) incomplete, not even closed
What else is in this space?
Approach: what's missing in Model-1?

Slide 17: Revisit Join Example
Join the Model-1 relation (person, day) = (Alice, {Monday, Tuesday}) with the certain relation (day, activity) = (Monday, Reception), (Tuesday, Banquet).
The result of ⋈ has two possible instances:
– Instance 1: (Alice, Monday, Reception)
– Instance 2: (Alice, Tuesday, Banquet)
Not representable with or-sets and "?".

Slide 18: Need Exclusive-OR
The two possible instances
– Instance 1: (Alice, Monday, Reception)
– Instance 2: (Alice, Tuesday, Banquet)
are representable with XOR on tuples:
  person | day     | activity
  Alice  | Monday  | Reception   (t1)
  Alice  | Tuesday | Banquet     (t2)
Constraint over tuples: t1 XOR t2
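A Boolean constraint over tuples can be read as a filter on presence patterns: enumerate which tuples are present and keep the patterns that satisfy the constraint. A minimal Python sketch of that reading for t1 XOR t2; the helper name `instances_under` and the lambda encoding are our own assumptions.

```python
from itertools import product

t1 = ("Alice", "Monday", "Reception")
t2 = ("Alice", "Tuesday", "Banquet")

def instances_under(tuples, constraint):
    """Keep each presence pattern of `tuples` that satisfies `constraint`."""
    result = set()
    for present in product([True, False], repeat=len(tuples)):
        if constraint(*present):
            result.add(frozenset(t for t, p in zip(tuples, present) if p))
    return result

xor_instances = instances_under([t1, t2], lambda a, b: a != b)  # t1 XOR t2
for inst in sorted(xor_instances, key=sorted):
    print(sorted(inst))
```

Exactly the two join instances survive: the pattern with both tuples and the empty pattern both violate XOR.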

Slide 19: Another Example
Join (person, day) = (Alice, {Monday, Tuesday}) with (day, activity) = (Tuesday, Banquet), (Tuesday, Boat Trip).
The result of ⋈ has two possible instances:
– Instance 1: (Alice, Tuesday, Banquet), (Alice, Tuesday, Boat Trip)
– Instance 2: empty
Again not representable.

Slide 20: Need Iff
The two possible instances (Instance 1: both tuples below; Instance 2: empty) are representable with IFF on tuples:
  person | day     | activity
  Alice  | Tuesday | Banquet     (t1)
  Alice  | Tuesday | Boat Trip   (t2)
Constraint over tuples: t1 ⇔ t2
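The IFF constraint can be checked like any Boolean constraint over tuples: enumerate which tuples are present and keep the patterns that satisfy it. A Python sketch; `instances_under` is a hypothetical helper and the constraint is encoded as a lambda by assumption.

```python
from itertools import product

t1 = ("Alice", "Tuesday", "Banquet")
t2 = ("Alice", "Tuesday", "Boat Trip")

def instances_under(tuples, constraint):
    """Keep each presence pattern of `tuples` that satisfies `constraint`."""
    result = set()
    for present in product([True, False], repeat=len(tuples)):
        if constraint(*present):
            result.add(frozenset(t for t, p in zip(tuples, present) if p))
    return result

iff_instances = instances_under([t1, t2], lambda a, b: a == b)  # t1 IFF t2
print(iff_instances == {frozenset({t1, t2}), frozenset()})  # True
```

Unlike two independent "?" tuples, IFF rules out the two patterns where only one of the tuples is present.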

Slide 21: Model-2 (Family)
Constructs:
– Or-sets
– Maybe annotation, "?"
– Boolean constraints over tuples
Which constraints give completeness?
– Full propositional logic: YES
– XOR and IFF: NO
– General 2-clauses: NO
How about "tuple-sets"?
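Why full propositional logic suffices: for any finite set of instances, take one conjunction per instance (its tuples present, every other tuple absent) and OR the conjunctions together. A Python sketch of this standard construction, applied to the three-instance set from slide 10 that the attempted Model-1 encoding could not capture; the function names are hypothetical.

```python
from itertools import product

# The three instances from slide 10.
TARGET = {
    frozenset({("Alice", "Monday"), ("Bob", "Tuesday")}),
    frozenset({("Alice", "Monday")}),
    frozenset({("Bob", "Tuesday")}),
}

def dnf_for(instances):
    """One conjunction per instance: its tuples present, every other tuple absent."""
    tuples = sorted(set().union(*instances))
    clauses = [{t: (t in inst) for t in tuples} for inst in instances]
    return tuples, clauses

def instances_of(tuples, clauses):
    """Possible instances: presence patterns satisfying at least one conjunction."""
    result = set()
    for present in product([True, False], repeat=len(tuples)):
        assignment = dict(zip(tuples, present))
        if any(all(assignment[t] == want for t, want in c.items()) for c in clauses):
            result.add(frozenset(t for t in tuples if assignment[t]))
    return result

tuples, clauses = dnf_for(TARGET)
print(instances_of(tuples, clauses) == TARGET)  # True
```

The resulting formula can grow with the number of instances, which is one way to see the trade-off between expressiveness and intuitiveness raised earlier in the talk.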

Slide 22: Model-3
Tuple-sets, e.g. over (person, day): (Alice, Monday), (Bob, Monday), (Bob, Tuesday)
Complete? NO: IFF is still not expressible.

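Slide 23: Hierarchy of Models
Notation: R = relations; A = or-sets; ? = maybe-tuples; 2 = 2-clauses; prop = full propositional logic; sets = tuple-sets.
[Diagram: hierarchy of models; labeled nodes include R with A and ? (Model-1), R with A and prop (complete), and R with sets (Model-3).]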

Slide 24: Closure May Be Good Enough
Completeness may not be necessary if:
– the original data is representable in the model, and
– only restricted operations are performed.
Which models are closed under which operations?

Slide 25: Closure Table
[Table: closure of each model under each operation]

Slide 26: Model Transition Diagram
[Diagram: transitions between models]
Not shown: (1) self-loops, (2) subsumed arrows

Slide 27: Not Covered from Paper
– Membership problems: given a tuple t and an uncertain relation R, is t in any instance of R? (and 3 other problems)
– Approximation: how best can we approximate an M1 relation in M2?

Slide 28: Back to the Future
Timeline: Trio Project Unleashed (CIDR, Jan ’05); ICDE submission (June ’05); now (April ’06).
Along the time axis: studying theory + modeling …, then ULDBs, query processing, prototype implementation.

Slide 29: Lineage to the Rescue
[Diagram: lineage and uncertainty]

Slide 30: ULDBs: Uncertainty-Lineage Databases
Base relations:
– (person, day): alternatives (Alice, Monday), (Alice, Tuesday)
– (day, activity): (Monday, Reception), (Tuesday, Banquet)
Result of ⋈ over (person, day, activity), with lineage to the base tuples:
– (Alice, Monday, Reception) ?
– (Alice, Tuesday, Banquet) ?
Possible instances of the result:
– Instance 1: (Alice, Monday, Reception)
– Instance 2: (Alice, Tuesday, Banquet)
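A small Python sketch of how lineage restores the correlation that "?" alone loses. These are our assumptions, not Trio's implementation: exactly one alternative of the base (person, day) tuple-set is chosen, each base item gets a hypothetical identifier, and a derived tuple is present in an instance exactly when every item in its lineage is present.

```python
# Base data. Exactly one of a1, a2 holds (assumed tuple-set semantics);
# the (day, activity) tuples e1, e2 are certain.
ATTENDS = {"a1": ("Alice", "Monday"), "a2": ("Alice", "Tuesday")}
EVENTS = {"e1": ("Monday", "Reception"), "e2": ("Tuesday", "Banquet")}

def join_with_lineage(attends, events):
    """Join on day; each result tuple carries the ids it was derived from."""
    result = []
    for aid, (person, day) in attends.items():
        for eid, (eday, activity) in events.items():
            if day == eday:
                result.append(((person, day, activity), {aid, eid}))
    return result

def instances(attends, events, derived):
    """A derived tuple appears iff every id in its lineage is present."""
    worlds = set()
    for chosen in attends:  # pick exactly one alternative of the tuple-set
        present = {chosen} | set(events)
        worlds.add(frozenset(t for t, lineage in derived if lineage <= present))
    return worlds

RESULT = join_with_lineage(ATTENDS, EVENTS)
for w in sorted(instances(ATTENDS, EVENTS, RESULT), key=sorted):
    print(sorted(w))
```

Treating the two "?" result tuples independently would give four instances; lineage ties each one to its base alternative, so only the two correct instances remain.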

Slide 31: Marrying Lineage and Uncertainty [see new papers]
– Model-3 (tuple-sets) + lineage = completeness
– Relational operations performed naturally
– Easy extension to confidences (probabilities), with efficient query processing

Slide 32: The Trio Project
– Data model: ULDBs
– Query language: TriQL, a simple extension to SQL, with the ability to query confidences and lineage
– System (version 1): on top of a conventional DBMS

Slide 33: Related Work (Uncertainty, brief)
– Modeling: C-tables [IL84], probabilistic databases [CP87], models using nested relations [F90], and many more
– Systems: ProbView [LLRS97], MYSTIQ [BDM+05], ORION [CSP05], Trio [BDHW05]

Slide 34: Trio Current and Future Topics
– Data model: continuous uncertainty, incomplete relations, correlations
– Query processing: updates, top-K, confidence computations
– System: storage, indexes, statistics, query optimization, …

Slide 35: Thank You
Search "stanford trio"

