ULDBs: Databases with Uncertainty and Lineage O. Benjelloun, A. Das Sarma, A. Halevy, J. Widom.

Running Example: Crime-Solving
Saw(witness, car) // may be uncertain
Drives(person, car) // may be uncertain
Suspects(person) = π_person (Saw ⋈ Drives)

Model for Uncertainty

1. X-Tuples
– more expressive than or-attributes
2. ‘?’ (Maybe) Annotations

Our Model for Uncertainty
1. X-Tuples: uncertainty about value
2. ‘?’ (Maybe) Annotations
Saw (witness, car)
  (Amy, Honda) ∥ (Amy, Toyota) ∥ (Amy, Mazda)
Equivalent or-attribute form:
  witness | car
  Amy     | { Honda, Toyota, Mazda }
Three possible instances

Our Model for Uncertainty
1. X-Tuples: uncertainty about value
2. ‘?’ (Maybe) Annotations
Saw (witness, car)
  (Amy, Honda) ∥ (Sally, Toyota) ∥ (Amy, Mazda)
Three possible instances
Not expressible using or-attributes

Our Model for Uncertainty
1. X-Tuples
2. ‘?’ (Maybe): uncertainty about presence
Saw (witness, car)
  (Amy, Honda) ∥ (Amy, Toyota) ∥ (Amy, Mazda)
  (Betty, Acura) ?
Six possible instances
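
The count of six can be checked by brute force. The sketch below is only an illustration, not part of the slides: it assumes an x-tuple is encoded as a pair (alternatives, is_maybe), and the function name possible_instances is made up for this example.

```python
from itertools import product

def possible_instances(xtuples):
    """Enumerate possible instances of a relation made of x-tuples.

    xtuples: list of (alternatives, is_maybe) pairs (an assumed encoding).
    Returns one frozenset of ordinary tuples per possible instance.
    """
    per_xtuple = []
    for alternatives, is_maybe in xtuples:
        options = [(alt,) for alt in alternatives]  # exactly one alternative is chosen
        if is_maybe:
            options.append(())                      # a '?' x-tuple may also be absent
        per_xtuple.append(options)
    return [frozenset(t for pick in combo for t in pick)
            for combo in product(*per_xtuple)]

# The Saw relation from the slide: three alternatives for Amy, a maybe-tuple for Betty.
saw = [
    ([("Amy", "Honda"), ("Amy", "Toyota"), ("Amy", "Mazda")], False),
    ([("Betty", "Acura")], True),
]
instances = possible_instances(saw)
print(len(instances))  # 3 choices for Amy times 2 choices for Betty
```

Each x-tuple contributes one factor to the count: its number of alternatives, plus one if it carries a ‘?’.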

Our Model is Not Closed
Saw (witness, car)
  (Cathy, Honda) ∥ (Cathy, Mazda)
Drives (person, car)
  (Jimmy, Toyota) ∥ (Jimmy, Mazda)
  (Billy, Honda) ∥ (Frank, Honda)
  (Hank, Honda)
Suspects = π_person (Saw ⋈ Drives)
  Jimmy ?
  Billy ∥ Frank ?
  Hank ?
X-tuples and ‘?’ alone CANNOT correctly capture the possible instances in the result

Lineage

Lineage to the Rescue
Lineage
– Captures “where data came from”
– In Trio: a function λ from alternatives to other alternatives (or external sources)
Model, with lineage, is complete
– proof omitted

Example with Lineage
ID | Saw (witness, car)
11 | (Cathy, Honda) ∥ (Cathy, Mazda)
ID | Drives (person, car)
21 | (Jimmy, Toyota) ∥ (Jimmy, Mazda)
22 | (Billy, Honda) ∥ (Frank, Honda)
23 | (Hank, Honda)
Suspects = π_person (Saw ⋈ Drives)
ID | Suspects
31 | Jimmy ?
32 | Billy ∥ Frank ?
33 | Hank ?
λ(31) = (11,2), (21,2)
λ(32,1) = (11,1), (22,1); λ(32,2) = (11,1), (22,2)
λ(33) = (11,1), 23
Correctly captures possible instances in the result

Example: What is the result of joining these tables?
ID | Saw(Witness, Car)
21 | (Amy, Mazda) ∥ (Amy, Toyota)
23 | (Betty, Honda) ?
ID | Drives(Person, Car)
31 | (Jimmy, Mazda)
32 | (Jimmy, Toyota)
33 | (Billy, Mazda)
34 | (Billy, Honda)

What is a legal instance of a ULDB?
Each alternative of a tuple t in a ULDB is associated by λ with a set of pairs (i,j) such that the j-th alternative of the i-th tuple was used to derive it
ID | Suspects
31 | Jimmy ?
32 | Billy ∥ Frank ?
33 | Hank ?
λ(31) = (11,2), (21,2)
λ(32,1) = (11,1), (22,1); λ(32,2) = (11,1), (22,2)
λ(33) = (11,1), 23

What is a legal instance of a ULDB?
Let S be the set of all symbols (i.e., pairs (i,j)) in the database
An instance of the ULDB D is derived by picking a set S’ ⊆ S such that
– if (i,j) ∈ S’ then for every j’ ≠ j, (i,j’) ∉ S’
– ∀ (i,j) ∈ S’, λ(i,j) ⊆ S’
– if, for some x-tuple t_i, there does not exist an (i,j) ∈ S’, then t_i is a maybe-tuple and for all (i,j’) ∈ t_i, either λ(i,j’) = ∅ or λ(i,j’) ⊄ S’
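
The three conditions on S’ can be turned into a brute-force enumeration. The following is a sketch under stated assumptions rather than Trio's implementation: x-tuples are encoded as {id: (number of alternatives, is_maybe)}, lineage as a dictionary from symbols to sets of symbols, and the single-alternative entries λ(31), λ(33) and the bare reference 23 from the Suspects example are written as (31,1), (33,1) and (23,1). The name legal_instances is hypothetical.

```python
from itertools import product

def legal_instances(xtuples, lineage):
    """Enumerate all symbol sets S' that satisfy the three conditions.

    xtuples: {tuple_id: (num_alternatives, is_maybe)}   (assumed encoding)
    lineage: {(i, j): set of symbols}; a missing key means empty lineage
    """
    ids = sorted(xtuples)
    # Condition 1 holds by construction: each x-tuple picks one alternative or none.
    options = [list(range(1, xtuples[i][0] + 1)) + [None] for i in ids]
    found = []
    for combo in product(*options):
        chosen = {(i, j) for i, j in zip(ids, combo) if j is not None}
        ok = True
        for i, j in zip(ids, combo):
            n, is_maybe = xtuples[i]
            if j is not None:
                # Condition 2: the lineage of a chosen symbol must itself be chosen.
                ok = ok and lineage.get((i, j), set()) <= chosen
            else:
                # Condition 3: only a maybe-tuple may be absent, and none of its
                # alternatives may have a non-empty lineage fully inside S'.
                ok = ok and is_maybe and not any(
                    lineage.get((i, k)) and lineage[(i, k)] <= chosen
                    for k in range(1, n + 1))
        if ok:
            found.append(chosen)
    return found

# Saw (11), Drives (21-23) and Suspects (31-33) from the lineage example.
xtuples = {11: (2, False), 21: (2, False), 22: (2, False), 23: (1, False),
           31: (1, True), 32: (2, True), 33: (1, True)}
lineage = {(31, 1): {(11, 2), (21, 2)},
           (32, 1): {(11, 1), (22, 1)},
           (32, 2): {(11, 1), (22, 2)},
           (33, 1): {(11, 1), (23, 1)}}
instances = legal_instances(xtuples, lineage)
print(len(instances))
```

In this example every derived tuple is forced in or out by the base choices, so the number of legal instances equals the number of ways to choose alternatives for Saw and Drives.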

Example: What are all legal instances of the following ULDB?
ID | Saw(Witness, Car)
21 | (Amy, Mazda) ∥ (Amy, Toyota)
23 | (Betty, Honda) ?
ID | Drives(Person, Car)
31 | (Jimmy, Mazda)
32 | (Jimmy, Toyota)
33 | (Billy, Mazda)
34 | (Billy, Honda)
ID | Accuses(Witness, Person)
41 | (Amy, Jimmy) ?
42 | (Amy, Jimmy) ?
43 | (Amy, Billy) ?
44 | (Betty, Billy) ?
λ(41,1) = {(21,1),(31,1)}
λ(42,1) = {(21,2),(32,1)}
λ(43,1) = {(21,1),(33,1)}
λ(44,1) = {(23,1),(34,1)}

Well-Behaved Lineage
In principle, λ may be any function
– λ* is the transitive closure of λ
However, it is useful to restrict λ to be well-behaved:
– Acyclic: ∀ (i,j), (i,j) ∉ λ*(i,j)
– Deterministic: ∀ (i,j), (i,j’), if j ≠ j’ then either λ(i,j) ≠ λ(i,j’) or λ(i,j) = ∅
– Uniform: ∀ (i,j), (i,j’), B(i,j) = B(i,j’) where B(i,j) = {k | ∃ l, (k,l) ∈ λ(i,j)}
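
The three properties can be checked mechanically. The sketch below is illustrative only: it assumes λ is given as a dictionary from symbols (i,j) to sets of symbols, with missing keys meaning empty lineage, and the names lineage_closure and violations are invented for this example. The two test inputs encode the apple/pear and red/green examples from the slides that follow.

```python
def lineage_closure(symbol, lineage):
    """Return λ*(symbol): every symbol reachable through one or more λ steps."""
    seen, stack = set(), list(lineage.get(symbol, ()))
    while stack:
        s = stack.pop()
        if s not in seen:
            seen.add(s)
            stack.extend(lineage.get(s, ()))
    return seen

def violations(num_alternatives, lineage):
    """Return the names of the well-behavedness properties that fail.

    num_alternatives: {tuple_id: number of alternatives}   (assumed encoding)
    lineage: {(i, j): set of symbols}
    """
    failed = set()
    for i, n in num_alternatives.items():
        symbols = [(i, j) for j in range(1, n + 1)]
        lams = [frozenset(lineage.get(s, ())) for s in symbols]
        # Acyclic: no symbol appears in its own transitive lineage.
        if any(s in lineage_closure(s, lineage) for s in symbols):
            failed.add("acyclic")
        # Deterministic: non-empty lineages of distinct alternatives differ.
        non_empty = [lam for lam in lams if lam]
        if len(non_empty) != len(set(non_empty)):
            failed.add("deterministic")
        # Uniform: all alternatives draw on the same set of source tuple ids.
        if len({frozenset(k for k, _ in lam) for lam in lams}) > 1:
            failed.add("uniform")
    return failed

# A(11 apple, 12 pear), B(21 red, 22 green) with mutually dependent lineage.
cyclic = violations({11: 1, 12: 1, 21: 1, 22: 1},
                    {(11, 1): {(21, 1)}, (21, 1): {(11, 1)}})
# B(21 red ∥ green) where both alternatives share the same lineage.
same_lineage = violations({11: 1, 12: 1, 21: 2, 22: 1},
                          {(21, 1): {(11, 1)}, (21, 2): {(11, 1)}})
print(cyclic, same_lineage)
```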

Example: Is this ULDB Well-Behaved?
ID | A
11 | apple
12 | pear
ID | B
21 | red
22 | green
λ(11,1) = {(21,1)}
λ(21,1) = {(11,1)}

Example: Is this ULDB Well-Behaved?
ID | A
11 | apple
12 | pear
ID | B
21 | red ∥ green
22 | green
λ(21,1) = {(11,1)}
λ(21,2) = {(11,1)}

Example: Is this ULDB Well-Behaved?
ID | A
11 | apple ∥ peach
12 | pear ∥ grape
ID | B
21 | red ∥ pink
22 | green ∥ purple
λ(21,1) = {(11,1)}
λ(21,2) = {(11,2)}
λ(22,1) = {(12,1)}
λ(21,2) = {(11,2)}

Querying

Querying
How do we query a ULDB?
What tuples are in the answer?
How is the lineage of the answer defined?
– for join?
– projection?
– minus?
Only consider projection, multiset selection, join, multiset union
– why?

Query Evaluation Algorithm
Given: ULDB D and query Q
Step 1: Create D’, an ordinary database derived by taking all alternatives of all tuples
ULDB D:
ID | Saw(Witness, Car)
21 | (Amy, Mazda) ∥ (Amy, Toyota)
23 | (Betty, Honda)
Ordinary database D’:
ID    | Saw(Witness, Car)
21, 1 | (Amy, Mazda)
21, 2 | (Amy, Toyota)
23, 1 | (Betty, Honda)

Query Evaluation Algorithm
Step 2: Evaluate the query normally
ID    | Saw(Witness, Car)
21, 1 | (Amy, Mazda)
21, 2 | (Amy, Toyota)
23, 1 | (Betty, Honda)
ID | Accuses(Witness, Person)
41 | (Amy, Jimmy)
42 | (Amy, Jimmy)
43 | (Amy, Billy)
44 | (Betty, Billy)
Query operator: ⋈

Query Evaluation Algorithm
Step 3: Group the tuples in the result by the tuple identifiers (the i values) in the lineage that the evaluation assigns to them
Step 4: For each group of tuple identifiers
– create a maybe-tuple t_l with all tuples in the group as alternatives
– set its lineage as derived by the evaluation
Note: all tuples created are maybe-tuples!
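
Steps 1 to 4 can be sketched for one concrete query, the join of Saw and Drives on car projected to (witness, person). This is a minimal illustration under stated assumptions, not Trio's evaluator: relations are encoded as {tuple_id: list of alternatives}, the helper names flatten and join_on_car are made up, and numbering the result tuples from 41 is an arbitrary choice that mirrors the Accuses example.

```python
from collections import defaultdict

def flatten(relation):
    """Step 1: one ordinary row per alternative, tagged with its symbol (i, j)."""
    return [((i, j), alt)
            for i, alts in relation.items()
            for j, alt in enumerate(alts, start=1)]

def join_on_car(saw, drives, first_id):
    """Steps 2-4 for the (witness, person) projection of Saw ⋈ Drives."""
    groups = defaultdict(list)
    for s_sym, (witness, s_car) in flatten(saw):
        for d_sym, (person, d_car) in flatten(drives):
            if s_car == d_car:                  # Step 2: ordinary join on car
                key = (s_sym[0], d_sym[0])      # Step 3: group by source tuple ids
                groups[key].append(((witness, person), {s_sym, d_sym}))
    result, lineage = {}, {}
    for new_id, key in enumerate(sorted(groups), start=first_id):
        # Step 4: each group becomes one maybe-tuple; its rows are the alternatives.
        result[new_id] = [row for row, _ in groups[key]]
        for j, (_, lam) in enumerate(groups[key], start=1):
            lineage[(new_id, j)] = lam
    return result, lineage

saw = {21: [("Amy", "Mazda"), ("Amy", "Toyota")], 23: [("Betty", "Honda")]}
drives = {31: [("Jimmy", "Mazda")], 32: [("Jimmy", "Toyota")],
          33: [("Billy", "Mazda")], 34: [("Billy", "Honda")]}
accuses, lam = join_on_car(saw, drives, first_id=41)
for tid, alts in accuses.items():
    print(tid, alts, [sorted(lam[(tid, j)]) for j in range(1, len(alts) + 1)])
```

Note that the ‘?’ annotations of the inputs play no role in the evaluation itself; they only matter later, when the possible instances of the result are derived through lineage.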

Examples
Complete the example from the previous slides
Compute the result of the query:
– (R(A,B) ⋈ S(B,C)) ∪ T(D,E)
ID | R(A,B)
11 | (1,2) ∥ (1,3)
12 | (4,1) ∥ (5,1)
ID | S(B,C)
11 | (2,4) ∥ (2,5)
12 | (1,3) ∥ (2,3)
ID | T(D,E)
11 | (7,8)
12 | (9,10) ∥ (9,11)

Minimality

Minimality
ULDBs may contain superfluous information
Two types of minimality:
– data minimality: a ‘?’ may be unneeded, an entire tuple may be unneeded
– lineage minimality

Data Minimality: Example 1
ID | Saw(Witness, Car)
21 | (Amy, Mazda) ∥ (Amy, Toyota)
23 | (Betty, Honda) ?
ID | Drives(Person, Car)
31 | (Jimmy, Mazda)
32 | (Jimmy, Toyota)
33 | (Billy, Mazda)
34 | (Billy, Honda)
ID | Suspects
31 | Jimmy ?
32 | Billy ∥ Frank ?
33 | Hank ?
λ(31) = (11,2), (21,2)
λ(32,1) = (11,1), (22,1); λ(32,2) = (11,1), (22,2)
λ(33) = (11,1), 23
Which ? is not needed?

Data Minimality: Example 2
What is unneeded in the result of the following query:
– (Saw ⋈ Car1) ⋈_witness (Saw ⋈ Car2)
ID | Saw(Witness, Car)
1  | (Amy, Mazda) ∥ (Amy, Toyota)
ID | Car1(Car)
2  | Mazda
ID | Car2(Car)
3  | Toyota

Data-Minimality: Formally An alternative (i,j) is extraneous if removing it from the relation does not change the set of possible instances A ? on a tuple is extraneous if removing it does not change the set of possible instances

Checking for Data-Minimality
Theorem: Let D be a well-behaved ULDB. An alternative (k,l) is extraneous if and only if there exist (i,j), (i,j’) ∈ λ*(k,l) with j ≠ j’
– Proof?
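
Reading the condition as "λ*(k,l) contains two different alternatives of the same x-tuple", the test is a reachability check. The sketch below applies it to the query of Data Minimality Example 2; the result identifiers 4, 5 and 6 and the function names are hypothetical, chosen only for illustration.

```python
def lineage_closure(symbol, lineage):
    """Return λ*(symbol): every symbol reachable through one or more λ steps."""
    seen, stack = set(), list(lineage.get(symbol, ()))
    while stack:
        s = stack.pop()
        if s not in seen:
            seen.add(s)
            stack.extend(lineage.get(s, ()))
    return seen

def is_extraneous(symbol, lineage):
    """True if λ*(symbol) holds two different alternatives of one x-tuple."""
    first_seen = {}
    for i, j in lineage_closure(symbol, lineage):
        if first_seen.setdefault(i, j) != j:
            return True
    return False

# Saw is tuple 1 with alternatives (Amy, Mazda) ∥ (Amy, Toyota);
# Car1 is tuple 2 (Mazda) and Car2 is tuple 3 (Toyota).
lineage = {
    (4, 1): {(1, 1), (2, 1)},   # Saw ⋈ Car1 (hypothetical id 4)
    (5, 1): {(1, 2), (3, 1)},   # Saw ⋈ Car2 (hypothetical id 5)
    (6, 1): {(4, 1), (5, 1)},   # join of the two results on witness (hypothetical id 6)
}
print(is_extraneous((4, 1), lineage), is_extraneous((6, 1), lineage))
```

The final alternative depends on both (1,1) and (1,2), which can never be chosen together, so it appears in no possible instance.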

Checking for Data-Minimality
Let h(t) be the set of base tuples of t
– tuples that are used to derive an alternative in t and that have empty lineage
Let m(t) be the number of alternatives of t that are not extraneous
Theorem: Let D be a well-behaved ULDB. A ? on an x-tuple t ∈ D is extraneous if and only if:
– none of the tuples in h(t) have a ?
– m(t) = ∏_{t’ ∈ h(t)} m(t’)

Test Yourself
Go back to the earlier slides and prove what is extraneous, using these characterizations

Tuple Membership Problems

Tuple Membership, Tuple Certainty
Recall that:
– The tuple membership problem is to determine if a tuple is a member of some instance of the ULDB
– The tuple certainty problem is to determine if a tuple is a member of every instance of the ULDB
How would you answer tuple membership? Tuple certainty?
What is the complexity of these problems?
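
As a baseline for thinking about these questions, both problems can be answered by enumerating every possible instance. The sketch below does this for a single relation without lineage, using the assumed encoding of an x-tuple as (alternatives, is_maybe); the function names are hypothetical. It takes time exponential in the number of x-tuples, so it is a reference point for the complexity question, not an efficient algorithm.

```python
from itertools import product

def possible_instances(xtuples):
    """All possible instances of a relation given as (alternatives, is_maybe) pairs."""
    per_xtuple = []
    for alternatives, is_maybe in xtuples:
        options = [(alt,) for alt in alternatives]
        if is_maybe:
            options.append(())          # a maybe-tuple can be absent
        per_xtuple.append(options)
    return [frozenset(t for pick in combo for t in pick)
            for combo in product(*per_xtuple)]

def is_member(t, xtuples):
    """Tuple membership: t appears in at least one possible instance."""
    return any(t in inst for inst in possible_instances(xtuples))

def is_certain(t, xtuples):
    """Tuple certainty: t appears in every possible instance."""
    return all(t in inst for inst in possible_instances(xtuples))

# The Drives relation from the running example.
drives = [
    ([("Jimmy", "Toyota"), ("Jimmy", "Mazda")], False),
    ([("Billy", "Honda"), ("Frank", "Honda")], False),
    ([("Hank", "Honda")], False),
]
print(is_member(("Billy", "Honda"), drives), is_certain(("Billy", "Honda"), drives))
print(is_member(("Hank", "Honda"), drives), is_certain(("Hank", "Honda"), drives))
```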