ULDBs: Databases with Uncertainty and Lineage O. Benjelloun, A. Das Sarma, A. Halevy, J. Widom
Running Example: Crime-Solving
Saw(witness, car)      // may be uncertain
Drives(person, car)    // may be uncertain
Suspects(person) = π_person (Saw ⋈ Drives)
Model for Uncertainty
1. X-Tuples
   – more expressive than or-attributes
2. ‘?’ (Maybe) Annotations
Our Model for Uncertainty
1. X-Tuples: uncertainty about value
2. ‘?’ (Maybe) Annotations

Saw (witness, car)
(Amy, Honda) ∥ (Amy, Toyota) ∥ (Amy, Mazda)

=

witness   car
Amy       { Honda, Toyota, Mazda }

Three possible instances
Our Model for Uncertainty
1. X-Tuples: uncertainty about value
2. ‘?’ (Maybe) Annotations

Saw (witness, car)
(Amy, Honda) ∥ (Sally, Toyota) ∥ (Amy, Mazda)

Three possible instances
Not expressible using or-attributes
Our Model for Uncertainty
1. X-Tuples
2. ‘?’ (Maybe): uncertainty about presence

Saw (witness, car)
(Amy, Honda) ∥ (Amy, Toyota) ∥ (Amy, Mazda)
(Betty, Acura)   ?

Six possible instances
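The count of six can be checked by brute force. The following is a minimal sketch, not Trio's implementation: it assumes an x-tuple is represented as a list of alternatives plus a flag for the ‘?’, and the name `possible_instances` is hypothetical.

```python
from itertools import product

# An x-tuple is modelled as (alternatives, maybe): a list of alternative
# tuples plus a flag that is True when the x-tuple carries a '?'.
# This representation is an assumption made for illustration.
saw = [
    ([("Amy", "Honda"), ("Amy", "Toyota"), ("Amy", "Mazda")], False),
    ([("Betty", "Acura")], True),
]

def possible_instances(relation):
    """Return every possible instance of a relation of x-tuples (no lineage)."""
    per_xtuple = []
    for alternatives, maybe in relation:
        # Exactly one alternative is picked from each x-tuple ...
        choices = [(alt,) for alt in alternatives]
        if maybe:
            # ... unless it carries a '?', in which case it may be absent.
            choices.append(())
        per_xtuple.append(choices)
    # Choices for different x-tuples are independent of each other.
    return [
        [t for choice in combo for t in choice]
        for combo in product(*per_xtuple)
    ]

instances = possible_instances(saw)
print(len(instances))  # 6
```

Three alternatives for Amy's x-tuple times two choices (present or absent) for Betty's maybe-tuple gives the six instances.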
Our Model is Not Closed

Saw (witness, car)
(Cathy, Honda) ∥ (Cathy, Mazda)

Drives (person, car)
(Jimmy, Toyota) ∥ (Jimmy, Mazda)
(Billy, Honda) ∥ (Frank, Honda)
(Hank, Honda)

Suspects = π_person (Saw ⋈ Drives)

Suspects
Jimmy            ?
Billy ∥ Frank    ?
Hank             ?

Does not (and CANNOT) correctly capture possible instances in the result
Lineage
Lineage to the Rescue
Lineage
– Captures “where data came from”
– In Trio: a function λ from alternatives to other alternatives (or external sources)
The model, with lineage, is complete
– proof omitted
Example with Lineage

ID   Saw (witness, car)
11   (Cathy, Honda) ∥ (Cathy, Mazda)

ID   Drives (person, car)
21   (Jimmy, Toyota) ∥ (Jimmy, Mazda)
22   (Billy, Honda) ∥ (Frank, Honda)
23   (Hank, Honda)

Suspects = π_person (Saw ⋈ Drives)

ID   Suspects
31   Jimmy            ?
32   Billy ∥ Frank    ?
33   Hank             ?

λ(31) = (11,2), (21,2)
λ(32,1) = (11,1), (22,1);  λ(32,2) = (11,1), (22,2)
λ(33) = (11,1), 23

Correctly captures possible instances in the result
Example: What is the result of joining these tables?

ID   Saw (Witness, Car)
21   (Amy, Mazda) ∥ (Amy, Toyota)
23   (Betty, Honda)   ?

ID   Drives (Person, Car)
31   (Jimmy, Mazda)
32   (Jimmy, Toyota)
33   (Billy, Mazda)
34   (Billy, Honda)
What is a legal instance of a ULDB?
Each tuple t in a ULDB is associated by λ with a set of pairs (i,j) such that the j-th alternative of the i-th tuple was used to derive t

ID   Suspects
31   Jimmy            ?
32   Billy ∥ Frank    ?
33   Hank             ?

λ(31) = (11,2), (21,2)
λ(32,1) = (11,1), (22,1);  λ(32,2) = (11,1), (22,2)
λ(33) = (11,1), 23
What is a legal instance of a ULDB?
Let S be the set of all symbols (i.e., pairs (i,j)) in the database
An instance of D is derived by picking a set S′ ⊆ S such that
– if (i,j) ∈ S′ then for every j′ ≠ j, (i,j′) ∉ S′
– ∀ (i,j) ∈ S′, λ(i,j) ⊆ S′
– if, for some x-tuple t_i, there does not exist an (i,j) ∈ S′, then t_i is a maybe-tuple and for all (i,j′) ∈ t_i, either λ(i,j′) = ∅ or λ(i,j′) ⊄ S′
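The three conditions translate almost directly into a membership test. The sketch below is illustrative only: the function name `is_legal_instance` and the dictionary layout are assumptions, and single-alternative x-tuples from the lineage example are written with index 1 (the slides write λ(31) and 23 without an index). It does not check that the chosen symbols actually exist in the database.

```python
# x-tuples of the Saw / Drives / Suspects lineage example:
# id -> (number of alternatives, has '?')
xtuples = {
    11: (2, False),
    21: (2, False), 22: (2, False), 23: (1, False),
    31: (1, True), 32: (2, True), 33: (1, True),
}
# Lineage λ; base alternatives have empty lineage and are simply omitted.
lineage = {
    (31, 1): {(11, 2), (21, 2)},
    (32, 1): {(11, 1), (22, 1)},
    (32, 2): {(11, 1), (22, 2)},
    (33, 1): {(11, 1), (23, 1)},
}

def is_legal_instance(chosen, xtuples, lineage):
    """Check the three conditions for a set of chosen symbols (i, j)."""
    ids = [i for i, _ in chosen]
    # Condition 1: at most one alternative per x-tuple.
    if len(ids) != len(set(ids)):
        return False
    # Condition 2: the lineage of every chosen alternative is chosen too.
    for sym in chosen:
        if not lineage.get(sym, set()) <= chosen:
            return False
    # Condition 3: an absent x-tuple must be a maybe-tuple, and none of its
    # alternatives may have non-empty lineage that is fully present.
    present = set(ids)
    for i, (n_alts, maybe) in xtuples.items():
        if i in present:
            continue
        if not maybe:
            return False
        for j in range(1, n_alts + 1):
            lam = lineage.get((i, j), set())
            if lam and lam <= chosen:
                return False
    return True

saw_honda = {(11, 1), (21, 1), (22, 1), (23, 1), (32, 1), (33, 1)}
print(is_legal_instance(saw_honda, xtuples, lineage))              # True
print(is_legal_instance(saw_honda - {(33, 1)}, xtuples, lineage))  # False
```

The second call fails because Hank's lineage (11,1), (23,1) is fully present, so Hank cannot be dropped from the instance.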
Example: What are all legal instances of the following ULDB?

ID   Saw (Witness, Car)
21   (Amy, Mazda) ∥ (Amy, Toyota)
23   (Betty, Honda)   ?

ID   Drives (Person, Car)
31   (Jimmy, Mazda)
32   (Jimmy, Toyota)
33   (Billy, Mazda)
34   (Billy, Honda)

ID   Accuses (Witness, Person)
41   (Amy, Jimmy)
42   (Amy, Jimmy)
43   (Amy, Billy)
44   (Betty, Billy)

λ(41,1) = {(21,1), (31,1)}
λ(42,1) = {(21,2), (32,1)}
λ(43,1) = {(21,1), (33,1)}
λ(44,1) = {(23,1), (34,1)}
Well-Behaved Lineage
In principle, λ may be any function
– λ* is the transitive closure of λ
However, it is useful to restrict λ to be well behaved:
– Acyclic: ∀ (i,j), (i,j) ∉ λ*(i,j)
– Deterministic: ∀ (i,j), (i,j′), if j ≠ j′ then either λ(i,j) ≠ λ(i,j′) or λ(i,j) = ∅
– Uniform: ∀ (i,j), (i,j′), B(i,j) = B(i,j′) where B(i,j) = {k | ∃ l, (k,l) ∈ λ(i,j)}
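Each of the three properties can be tested mechanically. The following sketch assumes symbols are (i, j) pairs and λ is a dictionary from symbols to sets of symbols; the names `transitive_lineage` and `is_well_behaved` are hypothetical. It is run on the Saw / Drives / Suspects lineage example, again writing single-alternative x-tuples with index 1.

```python
def transitive_lineage(sym, lineage):
    """Return λ*(sym): every symbol reachable from sym by following λ."""
    seen, stack = set(), list(lineage.get(sym, ()))
    while stack:
        s = stack.pop()
        if s not in seen:
            seen.add(s)
            stack.extend(lineage.get(s, ()))
    return seen

def is_well_behaved(symbols, lineage):
    """Report which of the three properties hold for the given lineage."""
    def lam(s):
        return set(lineage.get(s, ()))

    def base(s):
        # B(i,j): identifiers of the x-tuples that λ(i,j) refers to.
        return {k for k, _ in lam(s)}

    # Ordered pairs of distinct alternatives of the same x-tuple.
    siblings = [(a, b) for a in symbols for b in symbols
                if a[0] == b[0] and a[1] != b[1]]
    return {
        "acyclic": all(s not in transitive_lineage(s, lineage) for s in symbols),
        "deterministic": all(lam(a) != lam(b) or not lam(a) for a, b in siblings),
        "uniform": all(base(a) == base(b) for a, b in siblings),
    }

lineage = {
    (31, 1): {(11, 2), (21, 2)},
    (32, 1): {(11, 1), (22, 1)},
    (32, 2): {(11, 1), (22, 2)},
    (33, 1): {(11, 1), (23, 1)},
}
symbols = {(11, 1), (11, 2), (21, 1), (21, 2), (22, 1), (22, 2), (23, 1),
           (31, 1), (32, 1), (32, 2), (33, 1)}
report = is_well_behaved(symbols, lineage)
print(report)  # {'acyclic': True, 'deterministic': True, 'uniform': True}
```

Returning one flag per property, rather than a single boolean, makes it easier to see which restriction a given lineage violates.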
Example: Is this ULDB Well-Behaved?

ID   A
11   apple
12   pear

ID   B
21   red
22   green

λ(11,1) = {(21,1)}
λ(21,1) = {(11,1)}
Example: Is this ULDB Well-Behaved?

ID   A
11   apple
12   pear

ID   B
21   red ∥ green
22   green

λ(21,1) = {(11,1)}
λ(21,2) = {(11,1)}
Example: Is this ULDB Well-Behaved?

ID   A
11   apple ∥ peach
12   pear ∥ grape

ID   B
21   red ∥ pink
22   green ∥ purple

λ(21,1) = {(11,1)}
λ(21,2) = {(11,2)}
λ(22,1) = {(12,1)}
λ(21,2) = {(11,2)}
Querying
Querying
How do we query a ULDB?
What tuples are in the answer?
How is the lineage of the answer defined?
– for join?
– projection?
– minus?
Only consider projection, multiset selection, join, multiset union
– why?
Query Evaluation Algorithm
Given a ULDB D and a query Q
Step 1: Create D′, an ordinary database derived by taking all alternatives of all tuples

ID   Saw (Witness, Car)
21   (Amy, Mazda) ∥ (Amy, Toyota)
23   (Betty, Honda)

ID      Saw (Witness, Car)
21, 1   (Amy, Mazda)
21, 2   (Amy, Toyota)
23, 1   (Betty, Honda)
Query Evaluation Algorithm
Step 2: Evaluate the query normally

ID      Saw (Witness, Car)
21, 1   (Amy, Mazda)
21, 2   (Amy, Toyota)
23, 1   (Betty, Honda)

Result of the join (⋈):

ID   Accuses (Witness, Person)
41   (Amy, Jimmy)
42   (Amy, Jimmy)
43   (Amy, Billy)
44   (Betty, Billy)
Query Evaluation Algorithm
Step 3: Group the tuples in the result by the tuple identifiers (the i values) in the lineage assigned to them by the evaluation
Step 4: For each group of tuple identifiers
– create a maybe-tuple t_l with all tuples in the group as alternatives
– set its lineage as derived by the evaluation
Note: all tuples created are maybe-tuples!!
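The four steps can be sketched end to end for the Saw / Drives join. This is a minimal illustration under assumptions, not Trio's evaluator: x-tuples are stored as `{id: [alternatives]}`, the join is hard-coded on the car attribute and projects onto (witness, person), and the helper names `flatten`, `join_on_car`, and `regroup` are hypothetical.

```python
from collections import defaultdict

# x-tuples as {id: [alternatives]}; values follow the Saw / Drives example.
saw = {21: [("Amy", "Mazda"), ("Amy", "Toyota")], 23: [("Betty", "Honda")]}
drives = {31: [("Jimmy", "Mazda")], 32: [("Jimmy", "Toyota")],
          33: [("Billy", "Mazda")], 34: [("Billy", "Honda")]}

def flatten(relation):
    """Step 1: one ordinary row per alternative, keyed by its symbol (i, j)."""
    return [((i, j), alt)
            for i, alts in relation.items()
            for j, alt in enumerate(alts, start=1)]

def join_on_car(left, right):
    """Step 2: ordinary join on car, remembering the symbols behind each row."""
    return [((witness, person), (lsym, rsym))
            for lsym, (witness, lcar) in left
            for rsym, (person, rcar) in right
            if lcar == rcar]

def regroup(rows):
    """Steps 3-4: group rows by the x-tuple ids in their lineage.

    Each group becomes one maybe x-tuple whose alternatives keep the
    lineage recorded during evaluation.
    """
    groups = defaultdict(list)
    for value, lin in rows:
        key = tuple(i for i, _ in lin)
        groups[key].append((value, set(lin)))
    return [{"alternatives": alts, "maybe": True} for alts in groups.values()]

accuses = regroup(join_on_car(flatten(saw), flatten(drives)))
for xt in accuses:
    print(xt)
```

On this input every group holds a single alternative, so the result has four maybe-tuples, one per pair of joining x-tuples.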
Examples
Complete the example from the previous slides
Compute the result of the query:
– (R(A,B) ⋈ S(B,C)) ∪ T(D,E)

ID   R (A, B)
11   (1,2) ∥ (1,3)
12   (4,1) ∥ (5,1)

ID   S (B, C)
11   (2,4) ∥ (2,5)
12   (1,3) ∥ (2,3)

ID   T (D, E)
11   (7,8)
12   (9,10) ∥ (9,11)
Minimality
Minimality
ULDBs may contain superfluous information
Two types of minimality:
– data minimality: a ? may be unneeded, an entire tuple may be unneeded
– lineage minimality
Data Minimality: Example 1

ID   Saw (Witness, Car)
21   (Amy, Mazda) ∥ (Amy, Toyota)
23   (Betty, Honda)   ?

ID   Drives (Person, Car)
31   (Jimmy, Mazda)
32   (Jimmy, Toyota)
33   (Billy, Mazda)
34   (Billy, Honda)

ID   Suspects
31   Jimmy            ?
32   Billy ∥ Frank    ?
33   Hank             ?

λ(31) = (11,2), (21,2)
λ(32,1) = (11,1), (22,1);  λ(32,2) = (11,1), (22,2)
λ(33) = (11,1), 23

Which ? is not needed?
Data Minimality: Example 2
What is unneeded in the result of the following query:
– (Saw ⋈ Car1) ⋈_witness (Saw ⋈ Car2)

ID   Saw (Witness, Car)
1    (Amy, Mazda) ∥ (Amy, Toyota)

ID   Car1 (Car)
2    Mazda

ID   Car2 (Car)
3    Toyota
Data-Minimality: Formally
An alternative (i,j) is extraneous if removing it from the relation does not change the set of possible instances
A ? on a tuple is extraneous if removing it does not change the set of possible instances
Checking for Data-Minimality
Theorem: Let D be a well-behaved ULDB. An alternative (k,l) is extraneous if and only if there exist (i,j), (i,j′) ∈ λ(k,l) with j ≠ j′
– Proof?
Checking for Data-Minimality
Let h(t) be the set of base tuples of t
– tuples with empty lineage that are used to derive an alternative in t
Let m(t) be the number of alternatives of t that are not extraneous
Theorem: Let D be a well-behaved ULDB. A ? on an x-tuple t ∈ D is extraneous if and only if:
– none of the tuples in h(t) have a ?
– m(t) = ∏_{t′ ∈ h(t)} m(t′)
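The two characterizations can be turned into small checks. The sketch below rests on several assumptions that should be read as such: the second condition of the ‘?’ theorem is taken to be a product over h(t), h(t) is computed through the transitive closure of λ, base x-tuples (empty h(t)) are simply reported as not extraneous, and all function names are hypothetical.

```python
from math import prod

def closure(sym, lineage):
    """λ*(sym): all symbols reachable from sym through lineage."""
    seen, stack = set(), list(lineage.get(sym, ()))
    while stack:
        s = stack.pop()
        if s not in seen:
            seen.add(s)
            stack.extend(lineage.get(s, ()))
    return seen

def alternative_is_extraneous(sym, lineage):
    """True if λ(sym) holds two different alternatives of one x-tuple."""
    ids = [i for i, _ in lineage.get(sym, ())]
    return len(ids) != len(set(ids))

def maybe_is_extraneous(t, xtuples, lineage):
    """xtuples: {id: (number of alternatives, has '?')}."""
    def m(x):
        # Number of alternatives of x that are not extraneous.
        return sum(not alternative_is_extraneous((x, j), lineage)
                   for j in range(1, xtuples[x][0] + 1))

    syms = [(t, j) for j in range(1, xtuples[t][0] + 1)]
    # h(t): x-tuples with empty lineage used to derive an alternative of t.
    base = {i for s in syms for (i, j) in closure(s, lineage)
            if not lineage.get((i, j))}
    if not base:
        return False  # assumption: the test is only applied to derived x-tuples
    no_maybe_base = not any(xtuples[b][1] for b in base)
    return no_maybe_base and m(t) == prod(m(b) for b in base)

# Hypothetical data: x-tuple 2 copies each alternative of base x-tuple 1.
xtuples = {1: (2, False), 2: (2, True)}
lineage = {(2, 1): {(1, 1)}, (2, 2): {(1, 2)}}
print(maybe_is_extraneous(2, xtuples, lineage))                        # True
print(alternative_is_extraneous((3, 1), {(3, 1): {(1, 1), (1, 2)}}))   # True
```

In the hypothetical data, x-tuple 2 has one alternative for every alternative of its only base x-tuple, so it is present in every instance and its ‘?’ carries no information.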
Test Yourself
Go back to the earlier slides and prove what is extraneous, using the characterizations
Tuple Membership Problems
Tuple Membership, Tuple Certainty
Recall that:
– The tuple membership problem is to determine if a tuple is a member of some instance of the ULDB
– The tuple certainty problem is to determine if a tuple is a member of every instance of the ULDB
How would you answer tuple membership? Tuple certainty?
What is the complexity of these problems?