Data Exchange & Composition of Schema Mappings Phokion G. Kolaitis IBM Almaden Research Center.

Presentation transcript:

Data Exchange & Composition of Schema Mappings Phokion G. Kolaitis IBM Almaden Research Center

2 Joint work with
- Ron Fagin, IBM Almaden Research Center
- Renée Miller, University of Toronto
- Lucian Popa, IBM Almaden Research Center
- Wang-Chiew Tan, UC Santa Cruz
Studied foundational aspects of schema mappings:
- data exchange based on schema mappings
- composition of schema mappings

3 Schema Mappings & Data Exchange
Schema mappings are logic-based specifications that describe the relationship between a “source” schema $S_1$ and a “target” schema $S_2$.
The Data Exchange Problem associated with such a schema mapping $M_{12}$ is as follows:
- Input: a source instance $I_1$
- Output: a target instance $I_2$ such that $\langle I_1, I_2 \rangle$ satisfies the specifications of the schema mapping
[Figure: schema $S_1$ mapped to schema $S_2$ by $M_{12}$, with instances $I_1$ and $I_2$ and a query $q$]

4 Main issues in data exchange
For a given source instance, there may be more than one target instance satisfying the specifications of the schema mapping. Thus:
- When more than one solution exists, which solutions are “better” than others?
- How do we compute a “best” solution?
- Can the certain answers of target queries be obtained by evaluating them on a “best” solution?

5 Schema Mapping Specification Language
- The relationship between the source and the target is given by a set $\Sigma_{12}$ of source-to-target tuple-generating dependencies (s-t tgds) $\varphi(x) \to \exists y\, \psi(x, y)$, where
  - $\varphi(x)$ is a conjunction of atoms over the source and
  - $\psi(x, y)$ is a conjunction of atoms over the target.
- Among the most general assertions used in data integration
- Generalize LAV (local-as-views) and GAV (global-as-views) specifications in data integration
- Equivalent to GLAV (local-and-global-as-views) specifications

6 Universal Solutions in Data Exchange
We introduced the notion of universal solutions as the “best” solutions in data exchange.
- By definition, they have homomorphisms to all other solutions (thus, they are the most general solutions).
Main Results (FKMP in ICDT 2003)
- Universal solutions are unique up to homomorphic equivalence; they represent the entire solution space.
- The chase procedure produces a universal solution in polynomial time.
- The certain answers of target conjunctive queries can be obtained by evaluating them on an arbitrary universal solution.
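The chase mentioned above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a hypothetical encoding in which an instance is a dict from relation names to sets of tuples, a tgd is a `(body, head)` pair of atom lists, every atom argument is a variable, and labeled nulls are strings of the form `_N0`, `_N1`, and so on. The relations `P` and `Q` are made up for the example.

```python
from itertools import count

def matches(atoms, instance, binding=None):
    """Yield every variable binding that maps all atoms into the instance."""
    binding = binding or {}
    if not atoms:
        yield dict(binding)
        return
    (rel, args), rest = atoms[0], atoms[1:]
    for row in instance.get(rel, ()):
        new = dict(binding)
        # Extend the binding; reject the row if it clashes with earlier choices.
        if len(row) == len(args) and all(new.setdefault(a, v) == v for a, v in zip(args, row)):
            yield from matches(rest, instance, new)

def chase(source, tgds):
    """Naive chase for s-t tgds: one fresh labeled null per existential variable per trigger."""
    target, fresh = {}, count()
    for body, head in tgds:
        for trigger in matches(body, source):
            b = dict(trigger)
            for rel, args in head:
                for a in args:
                    if a not in b:  # existential variable: invent a labeled null
                        b[a] = f"_N{next(fresh)}"
                target.setdefault(rel, set()).add(tuple(b[a] for a in args))
    return target

# Hypothetical tgd: P(x, y) -> exists z (Q(x, z) and Q(z, y))
tgd = ([("P", ("x", "y"))], [("Q", ("x", "z")), ("Q", ("z", "y"))])
solution = chase({"P": {("a", "b")}}, [tgd])
print(solution)
```

Each trigger is processed once and adds a bounded number of facts, which is the intuition behind the polynomial-time bound for a fixed set of s-t tgds.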

7 Composing Schema Mappings
Given $M_{12} = (S_1, S_2, \Sigma_{12})$ and $M_{23} = (S_2, S_3, \Sigma_{23})$, derive a schema mapping $M_{13} = (S_1, S_3, \Sigma_{13})$ that is “equivalent” to the successive application of $M_{12}$ and $M_{23}$.
What is the semantics of composition of schema mappings? What does “equivalent” mean in this context?
[Figure: schema $S_1$ mapped to schema $S_2$ by $\Sigma_{12}$, schema $S_2$ mapped to schema $S_3$ by $\Sigma_{23}$, and schema $S_1$ mapped directly to schema $S_3$ by $\Sigma_{13}$]

8 Earlier Work
Metadata Model Management (Bernstein in CIDR 2003)
- Composition is one of the fundamental operators.
- However, no semantics is given.
Composing Mappings among Data Sources (Madhavan & Halevy in VLDB 2003)
- First to propose a semantics for composition.
- However, their definition is in terms of maintaining the same certain answers relative to a class of queries.
- Their notion of composition depends on the class of queries and may not be unique up to logical equivalence.

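9 Semantics of Composition
Definition (FKPT in PODS 2004): A schema mapping $M_{13}$ is a composition of $M_{12}$ and $M_{23}$ if for every instance $I_1$ of $S_1$ and every instance $I_3$ of $S_3$, $\langle I_1, I_3 \rangle \models \Sigma_{13}$ if and only if there exists $I_2$ such that $\langle I_1, I_2 \rangle \models \Sigma_{12}$ and $\langle I_2, I_3 \rangle \models \Sigma_{23}$.
In other words, $\mathrm{Inst}(M_{13}) = \mathrm{Inst}(M_{12}) \circ \mathrm{Inst}(M_{23})$, where $\mathrm{Inst}(M) = \{ \langle I, J \rangle \mid \langle I, J \rangle \models \Sigma_{ST} \}$ for $M = (S, T, \Sigma_{ST})$.
Thus, $M_{13}$ defines the composition of the binary relations of the instances associated with $M_{12}$ and $M_{23}$.
[Figure: schema $S_1$ mapped to schema $S_2$ by $\Sigma_{12}$, schema $S_2$ mapped to schema $S_3$ by $\Sigma_{23}$, and schema $S_1$ mapped directly to schema $S_3$ by $\Sigma_{13}$]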
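The definition treats each schema mapping as a binary relation between instances and composes those relations in the usual set-theoretic way. A toy Python sketch of that operation follows. Here instances are stood in for by opaque string labels, which is purely an assumption for illustration, since the real Inst sets are infinite in general.

```python
def compose_relations(r12, r23):
    """Relational composition: (a, c) is kept iff some b has (a, b) in r12 and (b, c) in r23."""
    return {(a, c) for (a, b) in r12 for (b2, c) in r23 if b == b2}

# Hypothetical finite fragments of Inst(M12) and Inst(M23); the labels are made up.
inst12 = {("I1", "I2"), ("I1", "I2b"), ("I1c", "I2c")}
inst23 = {("I2", "I3"), ("I2b", "I3")}
inst13 = compose_relations(inst12, inst23)
print(inst13)
```

The pair ("I1", "I3") survives because at least one intermediate instance links the two, which mirrors the existential quantifier over $I_2$ in the definition.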

10 The Composition of Schema Mappings
Fact: If $M = (S_1, S_3, \Sigma)$ and $M' = (S_1, S_3, \Sigma')$ are both compositions of $M_{12}$ and $M_{23}$, then $\Sigma$ and $\Sigma'$ are logically equivalent.
For this reason:
- We say that $M$ (or $M'$) is the composition of $M_{12}$ and $M_{23}$.
- We write $M_{12} \circ M_{23}$ to denote it.
Definition: The composition query of $M_{12}$ and $M_{23}$ is the set $\mathrm{Inst}(M_{12}) \circ \mathrm{Inst}(M_{23})$.

11 Issues in Composition of Schema Mappings
The semantics of composition was the first main issue. Some other key issues:
- Is the language of finite sets of s-t tgds closed under composition? That is, if $M_{12}$ and $M_{23}$ are specified by finite sets of s-t tgds, is $M_{12} \circ M_{23}$ also specified by a finite set of s-t tgds?
- If not, what is the “right” language for composing schema mappings?
- What is the complexity of the associated composition query?

12 Composition: Expressibility & Complexity
| $\Sigma_{12}$ | $\Sigma_{23}$ | $\Sigma_{13}$ | Composition query |
|---|---|---|---|
| finite set of full s-t tgds $\varphi(x) \to \psi(x)$ | finite set of s-t tgds $\varphi(x) \to \exists y\, \psi(x, y)$ | finite set of s-t tgds $\varphi(x) \to \exists y\, \psi(x, y)$ | in PTIME |
| finite set of s-t tgds $\varphi(x) \to \exists y\, \psi(x, y)$ | finite set of (full) s-t tgds $\varphi(x) \to \psi(x)$ | may not be definable: by any set of s-t tgds; in FO-logic; in Datalog | in NP; can be NP-complete |

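13 Enrollments Example
$\Sigma_{12}$:
- $\forall n \forall c\, (\mathrm{Takes}(n, c) \to \exists s\, \mathrm{Students}(n, s))$
- $\forall n \forall c\, (\mathrm{Takes}(n, c) \to \mathrm{Takes}_1(n, c))$
$\Sigma_{23}$:
- $\forall n \forall s \forall c\, (\mathrm{Students}(n, s) \wedge \mathrm{Takes}_1(n, c) \to \mathrm{Enrollments}(s, c))$
Implied by the composition:
- $\forall n \forall c_1 \forall c_2\, (\mathrm{Takes}(n, c_1) \wedge \mathrm{Takes}(n, c_2) \to \exists s\, (\mathrm{Enrollments}(s, c_1) \wedge \mathrm{Enrollments}(s, c_2)))$
But what if Alice takes 3 courses?
Example instances:
- $I_1$: Takes = {(Alice, Math), (Alice, Art)}
- $I_2$: Takes$_1$ = {(Alice, Math), (Alice, Art)}, Students = {(Alice, 1234)}
- $I_3$: Enrollments = {(1234, Math), (1234, Art)}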
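The example instances can be checked mechanically against the dependencies. The following Python sketch assumes a hypothetical encoding (instances as dicts of tuple sets, a tgd as a `(body, head)` pair of atoms whose arguments are all variables) and tests satisfaction by brute force. Head variables that are not bound by the body are treated as existential.

```python
def matches(atoms, instance, binding=None):
    """Yield every extension of `binding` that maps all atoms into the instance."""
    binding = binding or {}
    if not atoms:
        yield dict(binding)
        return
    (rel, args), rest = atoms[0], atoms[1:]
    for row in instance.get(rel, ()):
        new = dict(binding)
        if len(row) == len(args) and all(new.setdefault(a, v) == v for a, v in zip(args, row)):
            yield from matches(rest, instance, new)

def satisfies(src, tgt, tgd):
    """<src, tgt> satisfies the tgd iff every body match in src extends to a head match in tgt."""
    body, head = tgd
    return all(any(True for _ in matches(head, tgt, b)) for b in matches(body, src))

I1 = {"Takes": {("Alice", "Math"), ("Alice", "Art")}}
I2 = {"Takes1": {("Alice", "Math"), ("Alice", "Art")}, "Students": {("Alice", "1234")}}
I3 = {"Enrollments": {("1234", "Math"), ("1234", "Art")}}

sigma12 = [([("Takes", ("n", "c"))], [("Students", ("n", "s"))]),
           ([("Takes", ("n", "c"))], [("Takes1", ("n", "c"))])]
sigma23 = [([("Students", ("n", "s")), ("Takes1", ("n", "c"))], [("Enrollments", ("s", "c"))])]
implied = ([("Takes", ("n", "c1")), ("Takes", ("n", "c2"))],
           [("Enrollments", ("s", "c1")), ("Enrollments", ("s", "c2"))])

ok12 = all(satisfies(I1, I2, t) for t in sigma12)
ok23 = all(satisfies(I2, I3, t) for t in sigma23)
ok13 = satisfies(I1, I3, implied)

# A made-up target where Alice's courses sit under two different student ids.
bad_I3 = {"Enrollments": {("1234", "Math"), ("5678", "Art")}}
bad13 = satisfies(I1, bad_I3, implied)
print(ok12, ok23, ok13, bad13)
```

The last check fails because no single student id covers both courses, which is the constraint the two-course dependency expresses.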

14 Enrollments Example - continued
There are infinitely many s-t tgds that are implied by the composition.
$M_{12} \circ M_{23} = (S_1, S_3, \Sigma_{13})$, where
$\Sigma_{13} = \{ \ldots,\ \forall n \forall c_1 \ldots \forall c_k\, (\mathrm{Takes}(n, c_1) \wedge \ldots \wedge \mathrm{Takes}(n, c_k) \to \exists s\, (\mathrm{Enrollments}(s, c_1) \wedge \ldots \wedge \mathrm{Enrollments}(s, c_k))),\ \ldots \}$
We show that $\Sigma_{13}$ is not equivalent to any finite set of s-t tgds.

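15 Employee Example
$\Sigma_{12}$:
- $\forall e\, (\mathrm{Emp}(e) \to \exists m\, \mathrm{Mgr1}(e, m))$
$\Sigma_{23}$:
- $\forall e \forall m\, (\mathrm{Mgr1}(e, m) \to \mathrm{Mgr}(e, m))$
- $\forall e\, (\mathrm{Mgr1}(e, e) \to \mathrm{SelfMgr}(e))$
Theorem: The composition $M_{12} \circ M_{23}$
- is not definable by any finite set of s-t tgds;
- is not FO-definable;
- is not definable in Datalog.
[Figure: relation schemas Emp(e), Mgr1(e, m), Mgr(e, m), SelfMgr(e)]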

16 Second-Order Tgds
Definition: Let S be a source schema and T a target schema. A second-order tuple-generating dependency (SO tgd) is a formula of the form
$\exists f_1 \ldots \exists f_m\, ((\forall x_1 (\varphi_1 \to \psi_1)) \wedge \ldots \wedge (\forall x_n (\varphi_n \to \psi_n)))$, where
- each $f_i$ is a function symbol;
- each $\varphi_i$ is a conjunction of atoms from S and equalities of terms;
- each $\psi_i$ is a conjunction of atoms from T.
Theorem: The composition of two finite sets of s-t tgds is always definable by an SO-tgd.

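17 Employee Example - revisited
$\Sigma_{12}$:
- $\forall e\, (\mathrm{Emp}(e) \to \exists m\, \mathrm{Mgr1}(e, m))$
$\Sigma_{23}$:
- $\forall e \forall m\, (\mathrm{Mgr1}(e, m) \to \mathrm{Mgr}(e, m))$
- $\forall e\, (\mathrm{Mgr1}(e, e) \to \mathrm{SelfMgr}(e))$
Fact: The composition is definable by the SO-tgd $\Sigma_{13}$:
- $\exists f\, (\forall e\, (\mathrm{Emp}(e) \to \mathrm{Mgr}(e, f(e))) \wedge \forall e\, (\mathrm{Emp}(e) \wedge (e = f(e)) \to \mathrm{SelfMgr}(e)))$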
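To see what the existentially quantified function $f$ buys, one can test the SO-tgd on small instances by enumerating candidate functions. The Python sketch below is a brute-force check specialised to this one example, not a general algorithm. It relies on the observation that only the values of $f$ on Emp matter and that each must occur in the target. The employee names are made up.

```python
from itertools import product

def satisfies_so_tgd(emp, mgr, selfmgr):
    """Brute-force test of: exists f. (Emp(e) -> Mgr(e, f(e))) and (Emp(e) and e = f(e) -> SelfMgr(e))."""
    emps = sorted(emp)
    # f(e) has to appear in the target, so the target's values are the only candidates.
    domain = sorted({v for pair in mgr for v in pair} | set(selfmgr))
    for values in product(domain, repeat=len(emps)):
        f = dict(zip(emps, values))
        first = all((e, f[e]) in mgr for e in emps)
        second = all(e in selfmgr for e in emps if f[e] == e)
        if first and second:
            return True
    return False

# ann's only manager is ann, and SelfMgr is empty: every choice of f violates the second conjunct.
forced_self = satisfies_so_tgd({"ann"}, {("ann", "ann")}, set())
# With a second manager available, f(ann) = bob avoids the SelfMgr obligation.
has_choice = satisfies_so_tgd({"ann"}, {("ann", "ann"), ("ann", "bob")}, set())
print(forced_self, has_choice)
```

The enumeration is exponential in the number of employees, so this is only meant for hand-sized instances.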

18 Composing SO-Tgds and Data Exchange
Theorem:
- The composition of two SO-tgds is definable by an SO-tgd.
- There is a polynomial-time algorithm for composing SO-tgds.
- The chase procedure can be extended to schema mappings specified by SO-tgds, so that it produces universal solutions in polynomial time.
- For schema mappings specified by SO-tgds, the certain answers of target conjunctive queries are polynomial-time computable.

19 Synopsis of Schema Mapping Composition
s-t tgds are not closed under composition.
SO-tgds form a well-behaved fragment of second-order logic.
- SO-tgds are closed under composition; thus, they are a “good” language for composing schema mappings.
- SO-tgds are “chasable”:
  - polynomial-time data exchange with universal solutions;
  - polynomial-time computation of certain answers of target conjunctive queries.
SO-tgds form the basis of the schema-mapping language used in the Criollo metadata management system.

20 "The notion of composition of maps leads to the most natural account of fundamental notions of mathematics, from multiplication, addition, and exponentiation, through the basic notions of logic." Conceptual Mathematics by F. W. Lawvere and S. H. Schanuel

21 Reduction from 3-Colorability
$\Sigma_{12}$:
- $\forall x \forall y\, (E(x, y) \to \exists u \exists v\, (C(x, u) \wedge C(y, v)))$
- $\forall x \forall y\, (E(x, y) \to F(x, y))$
$\Sigma_{23}$:
- $\forall x \forall y \forall u \forall v\, (C(x, u) \wedge C(y, v) \wedge F(x, y) \to D(u, v))$
Let $I_3$ = { (r,g), (g,r), (b,r), (r,b), (g,b), (b,g) }.
Given G = (V, E), let $I_1$ be the instance over $S_1$ consisting of the edge relation E of G.
G is 3-colorable iff $\langle I_1, I_3 \rangle \in \mathrm{Inst}(M_{12}) \circ \mathrm{Inst}(M_{23})$.
[Dawar98] showed that 3-colorability is not expressible in $L^{\omega}_{\infty\omega}$.
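The reduction can be exercised on small graphs with a brute-force Python sketch that searches for a witnessing intermediate instance $I_2$. It restricts the search to instances where C assigns exactly one color to each vertex that lies on an edge and where F = E. That restriction is an assumption of the sketch, justified by the fact that dropping extra tuples from C or F keeps $\Sigma_{12}$ satisfied and cannot break $\Sigma_{23}$.

```python
from itertools import product

# I3 interprets D as "the two colors differ".
I3_D = {(u, v) for u in "rgb" for v in "rgb" if u != v}

def in_composition(edges):
    """Decide whether <I1, I3> is in Inst(M12) o Inst(M23) by searching for a witness I2."""
    nodes = sorted({x for edge in edges for x in edge})
    for colors in product("rgb", repeat=len(nodes)):
        C = dict(zip(nodes, colors))  # candidate I2: C = {(x, C[x])}, F = E
        # Sigma23 requires D(u, v) for every edge (x, y) with C(x, u) and C(y, v).
        if all((C[x], C[y]) in I3_D for x, y in edges):
            return True
    return False

triangle = {(1, 2), (2, 3), (1, 3)}
k4 = {(a, b) for a in range(1, 5) for b in range(1, 5) if a < b}
triangle_ok = in_composition(triangle)
k4_ok = in_composition(k4)
print(triangle_ok, k4_ok)
```

The search is exponential in the number of vertices, in line with the NP upper bound and NP-completeness stated for the composition query.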

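22 Algorithm Compose($M_{12}$, $M_{23}$)
Input: Two schema mappings $M_{12}$ and $M_{23}$
Output: A schema mapping $M_{13} = M_{12} \circ M_{23}$
Step 1: Split up tgds in $\Sigma_{12}$ and $\Sigma_{23}$
- $C_{12}$: $\mathrm{Emp}(e) \to \mathrm{Mgr1}(e, f(e))$
- $C_{23}$: $\mathrm{Mgr1}(e, m) \to \mathrm{Mgr}(e, m)$ and $\mathrm{Mgr1}(e, e) \to \mathrm{SelfMgr}(e)$
Step 2: Compose $C_{12}$ with $C_{23}$
- $\chi_1$: $\mathrm{Emp}(e_0) \wedge (e = e_0) \wedge (m = f(e_0)) \to \mathrm{Mgr}(e, m)$
- $\chi_2$: $\mathrm{Emp}(e_0) \wedge (e = e_0) \wedge (e = f(e_0)) \to \mathrm{SelfMgr}(e)$
Step 3: Construct $M_{13}$
- Return $M_{13} = (S_1, S_3, \Sigma_{13})$, where $\Sigma_{13} = \exists f\, (\forall e_0 \forall e \forall m\, \chi_1 \wedge \forall e_0 \forall e\, \chi_2)$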
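Step 1 replaces each existential variable by a function term over the universally quantified variables, which is how Emp(e) → ∃m Mgr1(e, m) becomes Emp(e) → Mgr1(e, f(e)). A small Python sketch of just that rewriting step follows. It uses a hypothetical encoding of atoms as `(relation, argument tuple)` pairs, renders terms as plain strings, and numbers the function symbols `f1`, `f2`, and so on, rather than using the slide's single `f`. It does not implement Steps 2 and 3.

```python
def skolemize(body, head, prefix="f"):
    """Replace each existential head variable by a function term over the body variables."""
    body_vars = []
    for _, args in body:
        for a in args:
            if a not in body_vars:
                body_vars.append(a)
    terms = {}  # existential variable -> function term, rendered as a string
    new_head = []
    for rel, args in head:
        new_args = []
        for a in args:
            if a not in body_vars and a not in terms:
                terms[a] = f"{prefix}{len(terms) + 1}({', '.join(body_vars)})"
            new_args.append(a if a in body_vars else terms[a])
        new_head.append((rel, tuple(new_args)))
    return body, new_head

# Emp(e) -> exists m. Mgr1(e, m)
c12 = skolemize([("Emp", ("e",))], [("Mgr1", ("e", "m"))])
print(c12)
```

Full tgds such as Mgr1(e, m) → Mgr(e, m) pass through unchanged because they have no existential variables.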

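23 Data Exchange with SO tgds Example
Let $M = (S, T, \Sigma_{ST})$, where $\Sigma_{ST}$ is:
$\exists f\, (\forall x \forall y\, (R(x, y) \to U(x, y, f(x))) \wedge \forall x \forall x' \forall y \forall y'\, (R(x, y) \wedge R(x', y') \wedge (f(x) = f(x')) \to T(y, y')))$
- R = {(a, b), (a, c), (d, e)}
- U = {(a, b, f(a)), (a, c, f(a)), (d, e, f(d))}
- T = {(b, b), (b, c), (c, b), (c, c), (e, e)}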

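24 Data Exchange with SO tgds Example
Let $M = (S, T, \Sigma_{ST})$, where $\Sigma_{ST}$ is:
$\exists f\, (\forall x \forall y\, (R(x, y) \to U(x, y, f(x))) \wedge \forall x \forall x' \forall y \forall y'\, (R(x, y) \wedge R(x', y') \wedge (f(x) = f(x')) \to T(y, y')))$
- R = {(a, b), (a, c), (d, e)}
- U = {(a, b, $N_0$), (a, c, $N_0$), (d, e, $N_1$)}
- T = {(b, b), (b, c), (c, b), (c, c), (e, e)}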
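The target instance on this slide can be reproduced with a short Python sketch specialised to this one SO tgd. It assumes that each distinct term f(x) is interpreted as its own labeled null, so f(x) = f(x') holds exactly when x = x'. The naming scheme N0, N1, ... is chosen here to match the slide and follows the sorted order of R.

```python
def chase_so_tgd(R):
    """Chase R with: exists f. (R(x,y) -> U(x,y,f(x))) and (R(x,y) and R(x',y') and f(x)=f(x') -> T(y,y'))."""
    null_of = {}

    def f(x):
        # One labeled null per distinct argument, created on first use.
        if x not in null_of:
            null_of[x] = f"N{len(null_of)}"
        return null_of[x]

    U = {(x, y, f(x)) for x, y in sorted(R)}
    T = {(y, y2) for x, y in R for x2, y2 in R if f(x) == f(x2)}
    return U, T

R = {("a", "b"), ("a", "c"), ("d", "e")}
U, T = chase_so_tgd(R)
print(sorted(U))
print(sorted(T))
```

Both relations are built by at most a quadratic scan of R, consistent with the polynomial-time claim for chasing SO-tgds.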