Published by Xander Burgett. Modified over 10 years ago.
1
LIVE: A lineage-supported, versioned DBMS
Anish Das Sarma, Martin Theobald, Jennifer Widom
2
Agenda
27.03.2015, LIVE - A lineage-supported, versioned DBMS

ULDB Data Model and the Trio System
    Uncertainty & Lineage
LIVE Data Model (LDM)
    Uncertainty, Lineage & Versioning
Data Modifications
    Insert/Delete Tuples, Update Values, Update Confidences
Query Evaluation
    Valid-At vs. Snapshot Queries, Interval Computations, Confidence Computations, Complexity
Experiments/Conclusions
3
ULDB Data Model

Different types of uncertainty:
1. Tuple Alternatives
2. ‘?’ (Maybe) Annotations
3. Confidences

Implementation of the ULDB data model: the Trio System
    TriQL query language
    TrioExplorer browser frontend, trioplus client, API
    Enhanced PostgreSQL backend (SPI)
    Search for “Stanford Trio”
4
ULDBs – Alternatives

1. Alternatives: uncertainty about attribute values
2. ‘?’ (Maybe) Annotations
3. Confidences

Saw (witness, color, car)
    Amy      red, Honda ∥ red, Toyota ∥ orange, Mazda

Three possible worlds
5
ULDBs – Maybe Annotations

1. Alternatives
2. ‘?’ (Maybe): uncertainty about tuple presence
3. Confidences

Saw (witness, color, car)
    Amy      red, Honda ∥ red, Toyota ∥ orange, Mazda
    Betty    blue, Acura                                  ?

Six possible worlds
6
ULDBs – Confidences

1. Alternatives
2. ‘?’ (Maybe) Annotations
3. Confidences: weighted uncertainty

Saw (witness, color, car)
    Amy      red, Honda 0.5 ∥ red, Toyota 0.3 ∥ orange, Mazda 0.2
    Betty    blue, Acura 0.6                                          ?

Six possible worlds, each with a probability
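The six weighted worlds on this slide can be enumerated mechanically. The sketch below is illustrative only and is not Trio code: it assumes that the two tuples are independent and that a ‘?’ tuple is absent with the probability mass its alternatives leave over (0.4 for Betty). The names `choices` and `possible_worlds` are hypothetical.

```python
from itertools import product

# Each tuple is a list of (alternative, confidence) pairs, taken from the slide.
saw = [
    [(("Amy", "red", "Honda"), 0.5),
     (("Amy", "red", "Toyota"), 0.3),
     (("Amy", "orange", "Mazda"), 0.2)],
    [(("Betty", "blue", "Acura"), 0.6)],   # '?' tuple: confidences sum to less than 1
]

def choices(alternatives):
    """All ways to instantiate one tuple: any alternative, or absence (None)
    with the leftover probability mass. Assumption, not stated on the slide."""
    options = list(alternatives)
    leftover = 1.0 - sum(conf for _, conf in options)
    if leftover > 1e-9:
        options.append((None, leftover))
    return options

def possible_worlds(relation):
    """Enumerate (world, probability) pairs, assuming independent tuples."""
    worlds = []
    for combo in product(*(choices(t) for t in relation)):
        rows = [value for value, _ in combo if value is not None]
        prob = 1.0
        for _, conf in combo:
            prob *= conf
        worlds.append((rows, prob))
    return worlds

worlds = possible_worlds(saw)
for rows, prob in worlds:
    print(round(prob, 2), rows)
```

Three alternatives for Amy times two options for Betty (present or absent) give the six worlds, and their probabilities sum to 1.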
7
ULDBs – Closure

Saw (witness, car)
    Cathy    Mazda ∥ Honda

Drives (person, car)
    Jimmy, Toyota ∥ Jimmy, Mazda
    Billy, Honda ∥ Frank, Honda
    Hank, Honda

Suspects = π_person (Saw ⋈ Drives)

Suspects
    Jimmy            ?
    Billy ∥ Frank    ?
    Hank             ?

Does not (and CANNOT) correctly capture the possible worlds in the result!
8
ULDBs – Lineage

ID   Saw (witness, car)
11   Cathy    Honda ∥ Mazda

ID   Drives (person, car)
21   Jimmy, Toyota ∥ Jimmy, Mazda
22   Billy, Honda ∥ Frank, Honda
23   Hank, Honda

Suspects = π_person (Saw ⋈ Drives)

ID   Suspects
31   Jimmy            ?
32   Billy ∥ Frank    ?
33   Hank             ?

λ(31)   = (11,2) ∧ (21,2)
λ(32,1) = (11,1) ∧ (22,1);  λ(32,2) = (11,1) ∧ (22,2)
λ(33)   = (11,1) ∧ 23
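The lineage on this slide can be reproduced by recording, for every joined pair of alternatives, which (tuple ID, alternative number) pair produced it. This is a minimal sketch with a hypothetical function name and data layout, not the Trio implementation; it writes (23,1) where the slide abbreviates the single-alternative tuple as 23.

```python
# Tuple ID -> list of alternatives; alternatives are numbered from 1 in the lineage.
saw = {11: [("Cathy", "Honda"), ("Cathy", "Mazda")]}
drives = {
    21: [("Jimmy", "Toyota"), ("Jimmy", "Mazda")],
    22: [("Billy", "Honda"), ("Frank", "Honda")],
    23: [("Hank", "Honda")],
}

def suspects_with_lineage(saw, drives):
    """Compute pi_person(Saw join Drives) on car, keeping for each person the
    list of conjunctions ((saw_id, alt), (drives_id, alt)) that derive it."""
    lineage = {}
    for saw_id, saw_alts in saw.items():
        for i, (_, seen_car) in enumerate(saw_alts, start=1):
            for drv_id, drv_alts in drives.items():
                for j, (person, driven_car) in enumerate(drv_alts, start=1):
                    if seen_car == driven_car:
                        lineage.setdefault(person, []).append(((saw_id, i), (drv_id, j)))
    return lineage

lineage = suspects_with_lineage(saw, drives)
for person, conjunctions in lineage.items():
    print(person, conjunctions)
```

Because Billy, Frank, and Hank all depend on alternative (11,1) while Jimmy depends on (11,2), the lineage records that Jimmy can never appear in the same world as the other three, which the ‘?’ annotations alone cannot express.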
9
ULDBs – Summary

Uncertainty-Lineage Databases (ULDBs)
1. Alternatives
2. ‘?’ (Maybe) Annotations
3. Confidences
4. Lineage

ULDBs are closed and complete
10
Lineage & Confidences

Lineage alone suffices to compute the confidence of a result tuple.
    #P-complete for general Boolean formulas
    Approximation algorithms: Luby-Karp, etc.

ID   Saw(witness, car)
11   (Mary, Honda)  : 0.8
12   (Susan, Honda) : 0.9
13   (Betty, Honda) : 0.5

Select distinct car from Saw;

ID   SuspectCars(car)
21   Honda : ?

λ(21) = 11 ∨ 12 ∨ 13
P(21) = 1 − (1 − 0.8) × (1 − 0.9) × (1 − 0.5) = 0.99
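The arithmetic for P(21) follows the usual rule for a disjunction of independent events. The sketch below covers only that special case, assuming the three base tuples are independent; the function name is hypothetical. General Boolean lineage is #P-complete, as the slide notes, and is not handled here.

```python
from math import prod

def confidence_of_disjunction(confidences):
    """P(t1 or t2 or ...) for independent base tuples:
    one minus the probability that none of them is present."""
    return 1.0 - prod(1.0 - p for p in confidences)

saw_confidence = {11: 0.8, 12: 0.9, 13: 0.5}

# lambda(21) = 11 OR 12 OR 13
p21 = confidence_of_disjunction(saw_confidence[t] for t in (11, 12, 13))
print(round(p21, 2))
```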
11
Versioning (LDM Data Model)

Version intervals for tuples
    Contiguous version numbers, starting at 0
    Database has a current version v_D
    Tuples have validity intervals [s, e]

ID   Photo(Number, Name) at version 2
11   (1, Amy)     [0, 1] : 1.0
12   (1, Bob)     [0, ∞] : 0.6
13   (2, Carl)    [0, 1] : 0.3
14   (3, Dale)    [1, 1] : 0.1

Valid-At Queries: Select * from Photo valid-at 2;
ID   Photo(Number, Name) at version 2
12   (1, Bob)     [0, ∞] : 0.6

Snapshot Queries: View Photo at 2;
ID   Photo@2(Number, Name)
12   (1, Bob)     : 0.6

Possible Worlds: LDM databases encode lists of sets of possible worlds.
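The two query forms differ only in whether the version interval is kept in the result. A small in-memory sketch follows, assuming that an open-ended interval can be modelled with `math.inf`; the row layout and function names are hypothetical and do not reflect how LIVE stores tuples.

```python
import math

INF = math.inf  # stand-in for an open-ended interval

# (id, values, (start, end), confidence), as on the slide
photo = [
    (11, (1, "Amy"),  (0, 1),   1.0),
    (12, (1, "Bob"),  (0, INF), 0.6),
    (13, (2, "Carl"), (0, 1),   0.3),
    (14, (3, "Dale"), (1, 1),   0.1),
]

def valid_at(relation, v):
    """Tuples whose interval contains version v; intervals are kept."""
    return [row for row in relation if row[2][0] <= v <= row[2][1]]

def snapshot(relation, v):
    """Tuples whose interval contains version v; intervals are dropped."""
    return [(tid, values, conf)
            for tid, values, (start, end), conf in relation
            if start <= v <= end]

print(valid_at(photo, 2))
print(snapshot(photo, 2))
```

At version 2 only tuple 12 survives in both results, matching the two result tables on the slide.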
12
Data Modifications – Insert

Insert Tuple: Insert t with version interval [v_D + 1, ∞]
commit; Increase v_D

ID   People(Name, State, Job) at version 0
21   (Bob, NY, Analyst)      [0, ∞] : 1.0
22   (Carl, IL, Teacher)     [0, ∞] : 1.0
23   (David, PA, Manager)    [0, ∞] : 0.6

Added at version 1:
24   (Frank, CA, Eng.)       [1, ∞] : 0.3

Added at version 2:
25   (David, PA, CEO)        [2, ∞] : 0.3
13
Data Modifications – Delete

Insert Tuple: Insert t with version interval [v_D + 1, ∞]
Delete Tuple: Set end(t) to v_D
commit; Increase v_D

ID   People(Name, State, Job) at version 2
21   (Bob, NY, Analyst)      [0, ∞] : 1.0
22   (Carl, IL, Teacher)     [0, ∞] : 1.0
23   (David, PA, Manager)    [0, ∞] : 0.6
24   (Frank, CA, Eng.)       [1, ∞] : 0.3
25   (David, PA, CEO)        [2, ∞] : 0.3

Changed at version 3 (tuple 22 deleted):
22   (Carl, IL, Teacher)     [0, 2] : 1.0
14
Data Modifications – Update

Insert Tuple: Insert t with version interval [v_D + 1, ∞]
Delete Tuple: Set end(t) to v_D
Update Value: Set end(t) to v_D; insert t’ with version interval [v_D + 1, ∞]
commit; Increase v_D

ID   People(Name, State, Job) at version 3
21   (Bob, NY, Analyst)      [0, ∞] : 1.0
22   (Carl, IL, Teacher)     [0, 2] : 1.0
23   (David, PA, Manager)    [0, ∞] : 0.6
24   (Frank, CA, Eng.)       [1, ∞] : 0.3
25   (David, PA, CEO)        [2, ∞] : 0.3

Changed at version 4 (value of tuple 21 updated):
21   (Bob, NY, Analyst)      [0, 3] : 1.0
21   (Bob, CA, Student)      [4, ∞] : 0.3
15
Data Modifications – Update

Insert Tuple: Insert t with version interval [v_D + 1, ∞]
Delete Tuple: Set end(t) to v_D
Update Value: Set end(t) to v_D; insert t’ with version interval [v_D + 1, ∞]
Update Probability: Set end(t) to v_D; insert t’ = t with probability p’ and version interval [v_D + 1, ∞]
commit; Increase v_D

ID   People(Name, State, Job) at version 4
21   (Bob, NY, Analyst)      [0, 3] : 1.0
22   (Carl, IL, Teacher)     [0, 2] : 1.0
23   (David, PA, Manager)    [0, ∞] : 0.6
24   (Frank, CA, Eng.)       [1, ∞] : 0.3
25   (David, PA, CEO)        [2, ∞] : 0.3
21   (Bob, CA, Student)      [4, ∞] : 0.3

Changed at version 5 (probability of tuple 21 updated):
21   (Bob, CA, Student)      [4, 4] : 0.3
21   (Bob, CA, Student)      [5, ∞] : 0.7
16
Data Modifications – Summary

Insert Tuple: Insert t with version interval [v_D + 1, ∞]
Delete Tuple: Set end(t) to v_D
Update Value: Set end(t) to v_D; insert t’ with version interval [v_D + 1, ∞]
Update Probability: Set end(t) to v_D; insert t’ = t with probability p’ and version interval [v_D + 1, ∞]
Possible worlds: Updates may create duplicate worlds, which are merged (at any version v).

ID   People(Name, State, Job) at version 4
21   (Bob, NY, Analyst)      [0, 3] : 1.0
22   (Carl, IL, Teacher)     [0, 2] : 1.0
23   (David, PA, Manager)    [0, ∞] : 0.6
24   (Frank, CA, Eng.)       [1, ∞] : 0.3
25   (David, PA, CEO)        [2, ∞] : 0.3
26   (Bob, CA, Student)      [4, ∞] : 0.3

Changed at version 5:
21   (Bob, CA, Student)      [4, 4] : 0.3
21   (Bob, CA, Student)      [5, ∞] : 0.7
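The four modification rules can be replayed on the People example. The class below is a minimal sketch of those rules only: `VersionedTable` and its method names are hypothetical, one commit per modification is assumed (as on the slides), and the updated tuple keeps ID 21 as on the two Update slides. It does not model lineage or the merging of duplicate worlds.

```python
import math

INF = math.inf  # stand-in for an open-ended interval

class VersionedTable:
    def __init__(self, initial_rows):
        self.version = 0  # current database version v_D
        self.rows = [dict(id=i, values=v, start=0, end=INF, conf=c)
                     for i, v, c in initial_rows]

    def _current(self, tid):
        """The open-ended (still valid) row for this tuple ID."""
        for row in self.rows:
            if row["id"] == tid and row["end"] == INF:
                return row
        raise KeyError(tid)

    def insert(self, tid, values, conf):
        self.rows.append(dict(id=tid, values=values,
                              start=self.version + 1, end=INF, conf=conf))

    def delete(self, tid):
        self._current(tid)["end"] = self.version

    def update_value(self, tid, values, conf):
        self.delete(tid)                 # close the old row at v_D
        self.insert(tid, values, conf)   # open the new row at v_D + 1

    def update_confidence(self, tid, conf):
        self.update_value(tid, self._current(tid)["values"], conf)

    def commit(self):
        self.version += 1

people = VersionedTable([
    (21, ("Bob", "NY", "Analyst"), 1.0),
    (22, ("Carl", "IL", "Teacher"), 1.0),
    (23, ("David", "PA", "Manager"), 0.6),
])
people.insert(24, ("Frank", "CA", "Eng."), 0.3); people.commit()           # version 1
people.insert(25, ("David", "PA", "CEO"), 0.3); people.commit()            # version 2
people.delete(22); people.commit()                                         # version 3
people.update_value(21, ("Bob", "CA", "Student"), 0.3); people.commit()    # version 4
people.update_confidence(21, 0.7); people.commit()                         # version 5

for row in people.rows:
    print(row["id"], row["values"], [row["start"], row["end"]], row["conf"])
```

No row is ever physically removed: every modification either closes an interval or appends a row, which is what keeps older versions queryable.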
17
Query Evaluation

1) Data Computation (regular SQL, including lineage)
2) Interval Computation (stored procedure)

[Diagram: the database D is an encoding of possible worlds at each version: D_1, D_2, …, D_n1 @(0); D_1, D_2, …, D_n2 @(1); …; D_1, D_2, …, D_nv @(v_D). Applying Q on each world yields Q(D_1), Q(D_2), …, Q(D_n). The implementation of Q (operational semantics) maps D to D + Result, which is an encoding of those possible worlds.]
18
Lineage, Confidences & Versions

Lineage alone suffices to compute the confidence of any result tuple.
Lineage alone suffices to compute the version interval of any result tuple.
19
Version Interval Computation

Positive Lineage (disjunctions & conjunctions)
    In the lineage formula λ(t):
    Replace every tuple t’ by its version interval
    Replace every ∨ with ∪ and every ∧ with ∩

ID   Saw(witness, car) at version 3
11   (Mary, Honda)     [1, ∞] : 0.8
12   (Susan, Honda)    [2, ∞] : 0.9
13   (Betty, Honda)    [3, ∞] : 0.5

Select distinct car from Saw;

λ(21) = 11 ∨ 12 ∨ 13
P(21) = 1 − (1 − 0.8) × (1 − 0.9) × (1 − 0.5) = 0.99

ID   SuspectCars(car) at version 3
21   (Honda)    [1, ∞] : 0.99
20
Version & Confidence Computation

Positive Lineage (disjunctions & conjunctions)
    In the lineage formula λ(t):
    Replace every tuple t’ by its version interval
    Replace every ∨ with ∪ and every ∧ with ∩

ID   Saw(witness, car) at version 3
11   (Mary, Honda)     [1, ∞] : 0.8
12   (Susan, Honda)    [2, ∞] : 0.9
13   (Betty, Honda)    [3, ∞] : 0.5

Select distinct car from Saw;
ID   SuspectCars(car) at version 3
21   (Honda)    [1, ∞] : 0.99

Select distinct car from Saw valid-at 2;
λ(21) = 11 ∨ 12
P(21) = 1 − (1 − 0.8) × (1 − 0.9) = 0.98
ID   SuspectCars(car) at version 2
21   (Honda)    [1, ∞] : 0.98
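For positive lineage, the interval rule amounts to evaluating the formula over sets of versions instead of truth values. The sketch below assumes a hypothetical encoding of formulas as nested tuples such as `("or", 11, 12, 13)`, and represents a version set as a list of disjoint closed intervals so that a union of non-overlapping intervals stays well defined; the slides only show single-interval results.

```python
import math

INF = math.inf  # stand-in for an open-ended interval

def normalize(intervals):
    """Merge overlapping or adjacent closed integer intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def intersect(a, b):
    """Pairwise intersection of two lists of intervals."""
    out = []
    for s1, e1 in a:
        for s2, e2 in b:
            s, e = max(s1, s2), min(e1, e2)
            if s <= e:
                out.append((s, e))
    return normalize(out)

def versions(formula, base):
    """Version set of a positive lineage formula: OR -> union, AND -> intersection."""
    if not isinstance(formula, tuple):       # a base tuple ID
        return normalize([base[formula]])
    op, *args = formula
    parts = [versions(arg, base) for arg in args]
    result = parts[0]
    for part in parts[1:]:
        result = normalize(result + part) if op == "or" else intersect(result, part)
    return result

saw = {11: (1, INF), 12: (2, INF), 13: (3, INF)}
print(versions(("or", 11, 12, 13), saw))                          # SuspectCars tuple 21
print(versions(("and", "r", "s"), {"r": (0, 10), "s": (5, 15)}))  # join of r and s
```

The first call reproduces the interval [1, ∞] derived for tuple 21; the second reproduces the join case r = [0,10], s = [5,15], t = [5,10] that appears on the query-plan slide.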
21
Interval Computations & Query Plans

Can decouple interval computation from data computation
Or: push interval computation into query plans, but only when there is no negation.

Query with negation:
    Select R.A from R EXCEPT ( Select R.A from R EXCEPT Select S.A from S );
    [Plan diagram labels: inputs r=(a) [0,10], r=(a) [0,10], s=(a) [5,15]; two EXCEPT (–) operators; u=(a) [0,10]; t=(a) [0,10]]

Query without negation:
    Select R.A from R,S Where R.A=S.A;
    [Plan diagram labels: inputs r=(a) [0,10], s=(a) [5,15]; result t=(a) [5,10]]
22
Complexity Results

Positive Lineage (disjunctions & conjunctions)
    Version interval computation: PTIME (linear)
    Confidence computation: #P-complete

Arbitrary Lineage (including negation)
    Version interval computation:
        PTIME (linear) if all confidences are known
        NP-hard if confidences are not known (need to check for idempotence of negated tuples)
    Confidence computation: #P-complete
23
Experiments – Setup

Probabilistic & versioned TPC-H setting
    Queries over the Lineitem and Orders tables with join selectivity varying from 0.1% to 1% (6,000-60,000 tuples for Lineitem and 1,500-15,000 tuples for Orders)
    Update 0.1% to 1% of the input data
    Assign probabilities to tuples uniformly at random within [0,1]

Additional indexes for versioning
    Two B+-trees: on (start, end) and on the end points of intervals
    Rewrite valid-at & snapshot queries using WHERE (start ≤ v ≤ end) predicates
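The rewriting of valid-at and snapshot queries into interval predicates can be tried on a small table. This sketch uses Python's built-in `sqlite3` as a stand-in for the PostgreSQL backend and reuses the People rows from the modification slides rather than TPC-H data. The column names `v_start` and `v_end`, the sentinel `INF`, and the index names are assumptions made for illustration, and ordinary SQLite indexes stand in for the two B+-trees.

```python
import sqlite3

INF = 2**31 - 1  # assumed sentinel for an open-ended interval

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE people (
    id INTEGER, name TEXT, state TEXT, job TEXT,
    v_start INTEGER, v_end INTEGER, conf REAL)""")
conn.executemany("INSERT INTO people VALUES (?, ?, ?, ?, ?, ?, ?)", [
    (21, "Bob", "NY", "Analyst", 0, 3, 1.0),
    (22, "Carl", "IL", "Teacher", 0, 2, 1.0),
    (23, "David", "PA", "Manager", 0, INF, 0.6),
    (24, "Frank", "CA", "Eng.", 1, INF, 0.3),
    (25, "David", "PA", "CEO", 2, INF, 0.3),
    (21, "Bob", "CA", "Student", 4, 4, 0.3),
    (21, "Bob", "CA", "Student", 5, INF, 0.7),
])
# Two indexes, mirroring the setup: one on (start, end), one on the end point.
conn.execute("CREATE INDEX people_start_end ON people (v_start, v_end)")
conn.execute("CREATE INDEX people_end ON people (v_end)")

def valid_at(conn, v):
    """valid-at v: keep the interval columns in the result."""
    return conn.execute(
        "SELECT id, name, state, job, v_start, v_end, conf FROM people "
        "WHERE v_start <= ? AND ? <= v_end ORDER BY id", (v, v)).fetchall()

def snapshot(conn, v):
    """Snapshot at v: same predicate, interval columns dropped."""
    return conn.execute(
        "SELECT id, name, state, job, conf FROM people "
        "WHERE v_start <= ? AND ? <= v_end ORDER BY id", (v, v)).fetchall()

for row in valid_at(conn, 4):
    print(row)
```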
24
Experiments – Results (I)

[Chart: Join query. Overhead of versioned system vs. non-versioned system (versions not computed)]
[Chart: Join query. Overhead of computing versions (versioned system) (%)]
25
Experiments – Results (II)

[Chart: Join query. Progressive data updates (overwrite multiple times)]
[Chart: Join query. Valid-at queries vs. full version computation]
26
Experiments – Results (III)

[Chart: Overhead of version computation, different query types (1% data modified)]
27
Conclusions

LDMs are closed and complete
Generalizes to full ULDB data model (including value alternatives & maybe (?) annotations)
Can employ lineage also for update propagations
Supports all of INSERT/DELETE/UPDATE with INTERSECT/UNION/EXCEPT set operations

[Diagram labels: Lineage, Uncertainty, Versioning, DBMS]