
ULDBs: Databases with Uncertainty and Lineage O. Benjelloun, A. Das Sarma, A. Halevy, J. Widom.


1 ULDBs: Databases with Uncertainty and Lineage O. Benjelloun, A. Das Sarma, A. Halevy, J. Widom

2 Running Example: Crime-Solving
Saw(witness, car) // may be uncertain
Drives(person, car) // may be uncertain
Suspects(person) = π_person(Saw ⋈ Drives)

3 Model for Uncertainty

4 1. X-Tuples
   – more expressive than or-attributes
2. ‘?’ (Maybe) Annotations

5 Our Model for Uncertainty
1. X-Tuples: uncertainty about value
2. ‘?’ (Maybe) Annotations
Saw (witness, car)
(Amy, Honda) ∥ (Amy, Toyota) ∥ (Amy, Mazda)
= the or-attribute form:
witness   car
Amy       { Honda, Toyota, Mazda }
Three possible instances

6 Our Model for Uncertainty
1. X-Tuples: uncertainty about value
2. ‘?’ (Maybe) Annotations
Saw (witness, car)
(Amy, Honda) ∥ (Sally, Toyota) ∥ (Amy, Mazda)
Three possible instances
Not expressible using or-attributes

7 Our Model for Uncertainty
1. X-Tuples
2. ‘?’ (Maybe): uncertainty about presence
Saw (witness, car)
(Amy, Honda) ∥ (Amy, Toyota) ∥ (Amy, Mazda)
(Betty, Acura) ?
Six possible instances
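The count of six can be checked by brute force. The following is a minimal sketch, assuming a relation is represented as a list of (alternatives, is_maybe) pairs; the function name `possible_instances` and this representation are illustrative choices, not part of the slides. It applies only to relations without lineage, where x-tuples are independent.

```python
from itertools import product

def possible_instances(relation):
    """Enumerate the possible instances of a relation without lineage.

    `relation` is a list of (alternatives, is_maybe) pairs, one per x-tuple
    (a hypothetical representation chosen for this sketch).
    """
    per_xtuple_choices = []
    for alternatives, is_maybe in relation:
        choices = list(alternatives)
        if is_maybe:
            choices.append(None)  # None stands for "x-tuple absent"
        per_xtuple_choices.append(choices)
    # Without lineage the x-tuples are independent, so take the cross product.
    return [
        frozenset(t for t in combo if t is not None)
        for combo in product(*per_xtuple_choices)
    ]

saw = [
    ([("Amy", "Honda"), ("Amy", "Toyota"), ("Amy", "Mazda")], False),
    ([("Betty", "Acura")], True),  # the '?' tuple
]
instances = possible_instances(saw)
print(len(instances))  # 3 alternatives x (present or absent) = 6
```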

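8 Our Model is Not Closed
ID   Saw (witness, car)
     (Cathy, Honda) ∥ (Cathy, Mazda)
ID   Drives (person, car)
     (Jimmy, Toyota) ∥ (Jimmy, Mazda)
     (Billy, Honda) ∥ (Frank, Honda)
     (Hank, Honda)
Suspects = π_person(Saw ⋈ Drives)
Suspects
     Jimmy ?
     Billy ∥ Frank ?
     Hank ?
CANNOT correctly capture possible instances in the result
(for example, the independent ‘?’ annotations allow Jimmy and Hank to appear together, but Jimmy requires that Cathy saw a Mazda and Hank requires that she saw a Honda)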

9 Lineage

10 Lineage to the Rescue
Lineage
– Captures “where data came from”
– In Trio: a function λ from alternatives to other alternatives (or external sources)
Model, with lineage, is complete
– proof omitted

11 Example with Lineage
ID   Saw (witness, car)
11   (Cathy, Honda) ∥ (Cathy, Mazda)
ID   Drives (person, car)
21   (Jimmy, Toyota) ∥ (Jimmy, Mazda)
22   (Billy, Honda) ∥ (Frank, Honda)
23   (Hank, Honda)
Suspects = π_person(Saw ⋈ Drives)
ID   Suspects
31   Jimmy ?
32   Billy ∥ Frank ?
33   Hank ?
λ(31) = (11,2), (21,2)
λ(32,1) = (11,1), (22,1); λ(32,2) = (11,1), (22,2)
λ(33) = (11,1), 23
Correctly captures possible instances in the result

12 Example: What is the result of joining these tables?
ID   Saw(Witness, Car)
21   (Amy, Mazda) ∥ (Amy, Toyota)
23   (Betty, Honda) ?
ID   Drives(Person, Car)
31   (Jimmy, Mazda)
32   (Jimmy, Toyota)
33   (Billy, Mazda)
34   (Billy, Honda)

13 What is a legal instance of a ULDB?
Each tuple t in a ULDB is associated by λ with a set of pairs (i,j) such that the j-th alternative of the i-th tuple was used to derive t
ID   Suspects
31   Jimmy ?
32   Billy ∥ Frank ?
33   Hank ?
λ(31) = (11,2), (21,2)
λ(32,1) = (11,1), (22,1); λ(32,2) = (11,1), (22,2)
λ(33) = (11,1), 23

14 What is a legal instance of a ULDB?
Let S be the set of all symbols (i.e., pairs (i,j)) in the database
An instance of D is derived by picking a set S’ ⊆ S such that
– if (i,j) ∈ S’ then for every j’ ≠ j, (i,j’) ∉ S’
– ∀ (i,j) ∈ S’, λ(i,j) ⊆ S’
– if, for some x-tuple t_i, there does not exist an (i,j) ∈ S’, then t_i is a maybe-tuple and for all (i,j’) ∈ t_i, either λ(i,j’) = ∅ or λ(i,j’) ⊄ S’
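The three conditions can be tested exhaustively on a small database. Below is a hedged sketch, not the Trio implementation: the function `legal_selections` and the dictionary layout are assumptions made for illustration. The data is the Saw/Drives/Suspects example with lineage, reading λ(31) and λ(33) as λ(31,1) and λ(33,1), and the bare "23" as (23,1).

```python
from itertools import product

def legal_selections(xtuples, lineage):
    """Return every set S' of chosen alternatives satisfying the three conditions.

    xtuples: {id: (list_of_alternative_values, is_maybe)}
    lineage: {(id, alt_index): set of (id, alt_index)}; a missing key means empty lineage
    """
    ids = sorted(xtuples)
    # Condition 1: at most one alternative per x-tuple (None = none chosen).
    options = [[None] + list(range(1, len(xtuples[i][0]) + 1)) for i in ids]
    selections = []
    for combo in product(*options):
        chosen = {(i, j) for i, j in zip(ids, combo) if j is not None}
        # Condition 2: the lineage of every chosen alternative is also chosen.
        if any(not lineage.get(s, set()) <= chosen for s in chosen):
            continue
        # Condition 3: an x-tuple may be absent only if it is a maybe-tuple and
        # none of its alternatives has non-empty lineage contained in S'.
        valid = True
        for i, j in zip(ids, combo):
            if j is not None:
                continue
            alternatives, is_maybe = xtuples[i]
            lineages = [lineage.get((i, k), set())
                        for k in range(1, len(alternatives) + 1)]
            if not is_maybe or any(lam and lam <= chosen for lam in lineages):
                valid = False
                break
        if valid:
            selections.append(chosen)
    return selections

xtuples = {
    11: ([("Cathy", "Honda"), ("Cathy", "Mazda")], False),
    21: ([("Jimmy", "Toyota"), ("Jimmy", "Mazda")], False),
    22: ([("Billy", "Honda"), ("Frank", "Honda")], False),
    23: ([("Hank", "Honda")], False),
    31: (["Jimmy"], True),
    32: (["Billy", "Frank"], True),
    33: (["Hank"], True),
}
lineage = {
    (31, 1): {(11, 2), (21, 2)},
    (32, 1): {(11, 1), (22, 1)},
    (32, 2): {(11, 1), (22, 2)},
    (33, 1): {(11, 1), (23, 1)},
}
selections = legal_selections(xtuples, lineage)
# Project each legal selection onto the Suspects relation (ids 31-33).
suspect_sets = {
    frozenset(xtuples[i][0][j - 1] for i, j in s if i in (31, 32, 33))
    for s in selections
}
```

In this sketch the base x-tuples admit 2 × 2 × 2 × 1 choices, and each choice fixes the derived tuples, so the enumeration is exponential and is meant only for checking small examples by hand.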

15 Example: What are all legal instances of the following ULDB?
ID   Saw(Witness, Car)
21   (Amy, Mazda) ∥ (Amy, Toyota)
23   (Betty, Honda) ?
ID   Drives(Person, Car)
31   (Jimmy, Mazda)
32   (Jimmy, Toyota)
33   (Billy, Mazda)
34   (Billy, Honda)
ID   Accuses(Witness, Person)
41   (Amy, Jimmy)    λ(41,1) = {(21,1), (31,1)}
42   (Amy, Jimmy)    λ(42,1) = {(21,2), (32,1)}
43   (Amy, Billy)    λ(43,1) = {(21,1), (33,1)}
44   (Betty, Billy)  λ(44,1) = {(23,1), (34,1)}

16 Well-Behaved Lineage
In principle, λ may be any function
– λ* is the transitive closure of λ
However, it is useful to restrict λ to be well behaved:
– Acyclic: ∀ (i,j), (i,j) ∉ λ*(i,j)
– Deterministic: ∀ (i,j), (i,j’), if j ≠ j’ then either λ(i,j) ≠ λ(i,j’) or λ(i,j) = ∅
– Uniform: ∀ (i,j), (i,j’), B(i,j) = B(i,j’) where B(i,j) = {k | ∃ l, (k,l) ∈ λ(i,j)}
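The three restrictions translate directly into checks over a lineage map. This is an illustrative sketch under assumed representations: `alt_counts` maps each x-tuple id to its number of alternatives, `lineage` maps (id, alternative) pairs to sets of such pairs, and all function names and the small test data are hypothetical.

```python
def lineage_closure(lineage, s):
    """Alternatives reachable from s by following lineage one or more steps (λ*)."""
    seen, stack = set(), list(lineage.get(s, ()))
    while stack:
        current = stack.pop()
        if current not in seen:  # the visited set keeps cyclic lineage from looping
            seen.add(current)
            stack.extend(lineage.get(current, ()))
    return seen

def is_acyclic(alt_counts, lineage):
    # No alternative may appear in its own transitive lineage.
    return all((i, j) not in lineage_closure(lineage, (i, j))
               for i, n in alt_counts.items() for j in range(1, n + 1))

def is_deterministic(alt_counts, lineage):
    # Two alternatives of one x-tuple may share lineage only if it is empty.
    for i, n in alt_counts.items():
        nonempty = [frozenset(lineage[(i, j)])
                    for j in range(1, n + 1) if lineage.get((i, j))]
        if len(nonempty) != len(set(nonempty)):
            return False
    return True

def is_uniform(alt_counts, lineage):
    # B(i,j): the x-tuple ids mentioned in the lineage of alternative (i,j).
    for i, n in alt_counts.items():
        sources = {frozenset(k for k, _ in lineage.get((i, j), ()))
                   for j in range(1, n + 1)}
        if len(sources) > 1:
            return False
    return True

def is_well_behaved(alt_counts, lineage):
    return (is_acyclic(alt_counts, lineage)
            and is_deterministic(alt_counts, lineage)
            and is_uniform(alt_counts, lineage))

# Hypothetical x-tuples: ids 1 and 2 have two alternatives each, id 3 has one.
alt_counts = {1: 2, 2: 2, 3: 1}
good = {(2, 1): {(1, 1)}, (2, 2): {(1, 2)}}
cyclic = {(1, 1): {(2, 1)}, (2, 1): {(1, 1)}}
duplicated = {(2, 1): {(1, 1)}, (2, 2): {(1, 1)}}
mixed_sources = {(2, 1): {(1, 1)}, (2, 2): {(3, 1)}}
```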

17 Example: Is this ULDB Well-Behaved?
ID   A
11   apple
12   pear
ID   B
21   red
22   green
λ(11,1) = {(21,1)}
λ(21,1) = {(11,1)}

18 Example: Is this ULDB Well-Behaved?
ID   A
11   apple
12   pear
ID   B
21   red ∥ green
22   green
λ(21,1) = {(11,1)}
λ(21,2) = {(11,1)}

19 Example: Is this ULDB Well-Behaved?
ID   A
11   apple ∥ peach
12   pear ∥ grape
ID   B
21   red ∥ pink
22   green ∥ purple
λ(21,1) = {(11,1)}
λ(21,2) = {(11,2)}
λ(22,1) = {(12,1)}
λ(21,2) = {(11,2)}

20 Querying

21 Querying
How do we query a ULDB?
What tuples are in the answer?
How is the lineage of the answer defined?
– for join?
– projection?
– minus?
Only consider projection, multiset selection, join, multiset union
– why?

22 Query Evaluation Algorithm
Given: ULDB D and query Q
Step 1: Create D’, an ordinary database derived by taking all alternatives of all tuples
ID      Saw(Witness, Car)
21      (Amy, Mazda) ∥ (Amy, Toyota)
23      (Betty, Honda)
becomes
ID      Saw(Witness, Car)
21, 1   (Amy, Mazda)
21, 2   (Amy, Toyota)
23, 1   (Betty, Honda)

23 Query Evaluation Algorithm
Step 2: Evaluate the query normally (here, a join ⋈)
ID      Saw(Witness, Car)
21, 1   (Amy, Mazda)
21, 2   (Amy, Toyota)
23, 1   (Betty, Honda)
ID      Accuses(Witness, Person)
41      (Amy, Jimmy)
42      (Amy, Jimmy)
43      (Amy, Billy)
44      (Betty, Billy)

24 Query Evaluation Algorithm
Step 3: Group tuples in the result by the tuple identifiers (the i value) corresponding to their lineage by the evaluation
Step 4: For each group of tuple identifiers
– create a maybe-tuple t_l with all tuples in the group as alternatives
– set lineage as derived by the evaluation
Note: all tuples created are maybe-tuples!!
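Steps 1 through 4 can be sketched for a single join followed by a projection. This is a simplified illustration, not Trio's evaluator: the functions `flatten` and `join_uldb`, the dictionary representation, and the `first_id` parameter are assumptions, and the grouping key is taken to be the pair of source x-tuple ids. The data is the Saw/Drives example, with result ids starting at 41 as in the Accuses table.

```python
from collections import defaultdict

def flatten(relation):
    """Step 1: one ordinary row per alternative, tagged with its symbol (i, j)."""
    return [((i, j), value)
            for i, alternatives in sorted(relation.items())
            for j, value in enumerate(alternatives, start=1)]

def join_uldb(left, right, match, combine, first_id):
    """Steps 2-4 for one join; every result x-tuple is a maybe-tuple."""
    groups = defaultdict(list)
    # Step 2: evaluate the join over the flattened rows, remembering sources.
    for l_sym, l_val in flatten(left):
        for r_sym, r_val in flatten(right):
            if match(l_val, r_val):
                # Step 3: group by the x-tuple ids (the i values) in the lineage.
                key = (l_sym[0], r_sym[0])
                groups[key].append((combine(l_val, r_val), {l_sym, r_sym}))
    # Step 4: each group becomes one x-tuple whose alternatives keep their lineage.
    result, lineage = {}, {}
    for offset, key in enumerate(sorted(groups)):
        new_id = first_id + offset
        result[new_id] = [value for value, _ in groups[key]]
        for j, (_, sources) in enumerate(groups[key], start=1):
            lineage[(new_id, j)] = sources
    return result, lineage

saw = {21: [("Amy", "Mazda"), ("Amy", "Toyota")], 23: [("Betty", "Honda")]}
drives = {31: [("Jimmy", "Mazda")], 32: [("Jimmy", "Toyota")],
          33: [("Billy", "Mazda")], 34: [("Billy", "Honda")]}

accuses, accuses_lineage = join_uldb(
    saw, drives,
    match=lambda s, d: s[1] == d[1],      # join on car
    combine=lambda s, d: (s[0], d[0]),    # keep (witness, person)
    first_id=41,
)
```

Note that the ‘?’ on a base tuple plays no role during evaluation; it only matters later, when possible instances are derived from the result and its lineage.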

25 Examples
Complete example from previous slides
Compute the result of the query:
– (R(A,B) ⋈ S(B,C)) ∪ T(D,E)
ID   R(A,B)
11   (1,2) ∥ (1,3)
12   (4,1) ∥ (5,1)
ID   S(B,C)
11   (2,4) ∥ (2,5)
12   (1,3) ∥ (2,3)
ID   T(D,E)
11   (7,8)
12   (9,10) ∥ (9,11)

26 Minimality

27 Minimality
ULDBs may contain superfluous information
Two types of minimality:
– data minimality: ‘?’ may be unneeded, entire tuple may be unneeded
– lineage minimality

28 Data Minimality: Example 1
ID   Saw(Witness, Car)
21   (Amy, Mazda) ∥ (Amy, Toyota)
23   (Betty, Honda) ?
ID   Drives(Person, Car)
31   (Jimmy, Mazda)
32   (Jimmy, Toyota)
33   (Billy, Mazda)
34   (Billy, Honda)
ID   Suspects
31   Jimmy ?
32   Billy ∥ Frank ?
33   Hank ?
λ(31) = (11,2), (21,2)
λ(32,1) = (11,1), (22,1); λ(32,2) = (11,1), (22,2)
λ(33) = (11,1), 23
Which ‘?’ is not needed?

29 Data Minimality: Example 2
What is unneeded in the result of the following query:
– (Saw ⋈ Car1) ⋈_witness (Saw ⋈ Car2)
ID   Saw(Witness, Car)
1    (Amy, Mazda) ∥ (Amy, Toyota)
ID   Car1(Car)
2    Mazda
ID   Car2(Car)
3    Toyota

30 Data-Minimality: Formally
An alternative (i,j) is extraneous if removing it from the relation does not change the set of possible instances
A ‘?’ on a tuple is extraneous if removing it does not change the set of possible instances

31 Checking for Data-Minimality
Theorem: Let D be a well-behaved ULDB. An alternative (k,l) is extraneous if and only if there exist (i,j), (i,j’) ∈ λ(k,l) with j ≠ j’
– Proof?

32 Checking for Data-Minimality
Let h(t) be the set of base tuples of t
– tuples that are used to derive an alternative in t, and which have empty lineage
Let m(t) be the number of alternatives of t that are not extraneous
Theorem: Let D be a well-behaved ULDB. A ‘?’ on an x-tuple t ∈ D is extraneous if and only if:
– none of the tuples in h(t) have a ‘?’
– m(t) = ∏_{t’ ∈ h(t)} m(t’)

33 Test Yourself
Go back to slides 28-29 and prove what is extraneous, using the characterizations from slides 31-32

34 Tuple Membership Problems

35 Tuple Membership, Tuple Certainty
Recall that:
– The tuple membership problem is to determine if a tuple is a member of some instance of the ULDB
– The tuple certainty problem is to determine if a tuple is a member of every instance of the ULDB
How would you answer tuple membership? Tuple certainty?
What is the complexity of these problems?

