Data Exchange & Composition of Schema Mappings
Phokion G. Kolaitis
IBM Almaden Research Center
2 Joint work with
  Ron Fagin, IBM Almaden Research Center
  Renee Miller, University of Toronto
  Lucian Popa, IBM Almaden Research Center
  Wang-Chiew Tan, UC Santa Cruz
Studied foundational aspects of schema mappings:
  data exchange based on schema mappings
  composition of schema mappings
3 Schema Mappings & Data Exchange
[Figure: source schema $S_1$ and target schema $S_2$ connected by the schema mapping $\mathcal{M}_{12}$; source instance $I_1$, target instance $I_2$, and a target query $q$]
Schema mappings are logic-based specifications that describe the relationship between a “source” schema $S_1$ and a “target” schema $S_2$.
The Data Exchange Problem associated with such a schema mapping $\mathcal{M}_{12}$ is as follows:
  Input: Source instance $I_1$
  Output: Target instance $I_2$ such that $\langle I_1, I_2 \rangle$ satisfies the specifications of the schema mapping
4 Main issues in data exchange
For a given source instance, there may be more than one target instance satisfying the specifications of the schema mapping. Thus:
  When more than one solution exists, which solutions are “better” than others?
  How do we compute a “best” solution?
  Can the certain answers of target queries be obtained by evaluating them on a “best” solution?
5 Schema Mapping Specification Language
The relationship between the source and the target is given by a set $\Sigma_{12}$ of source-to-target tuple generating dependencies (s-t tgds)
  $\varphi(\mathbf{x}) \rightarrow \exists \mathbf{y}\, \psi(\mathbf{x}, \mathbf{y})$,
where $\varphi(\mathbf{x})$ is a conjunction of atoms over the source and $\psi(\mathbf{x}, \mathbf{y})$ is a conjunction of atoms over the target.
S-t tgds:
  are among the most general assertions used in data integration
  generalize LAV (local-as-views) and GAV (global-as-views) specifications in data integration
  are equivalent to GLAV (local-and-global-as-views) specifications
6 Universal Solutions in Data Exchange
We introduced the notion of universal solutions as the “best” solutions in data exchange.
By definition, they have homomorphisms to all other solutions (thus, they are the most general solutions).
Main Results (FKMP in ICDT 2003):
  Universal solutions are unique up to homomorphic equivalence; they represent the entire solution space.
  The chase procedure produces a universal solution in polynomial time.
  The certain answers of target conjunctive queries can be obtained by evaluation on an arbitrary universal solution.
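7 Composing Schema Mappings
Given $\mathcal{M}_{12} = (S_1, S_2, \Sigma_{12})$ and $\mathcal{M}_{23} = (S_2, S_3, \Sigma_{23})$, derive a schema mapping $\mathcal{M}_{13} = (S_1, S_3, \Sigma_{13})$ that is “equivalent” to the successive application of $\mathcal{M}_{12}$ and $\mathcal{M}_{23}$.
What is the semantics of composition of schema mappings?
What does “equivalent” mean in this context?
[Figure: schemas $S_1$, $S_2$, $S_3$ connected by $\mathcal{M}_{12}$ and $\mathcal{M}_{23}$, with $\mathcal{M}_{13}$ going directly from $S_1$ to $S_3$]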
8 Earlier Work
Metadata Model Management (Bernstein in CIDR 2003)
  Composition is one of the fundamental operators.
  However, no semantics is given.
Composing Mappings among Data Sources (Madhavan & Halevy in VLDB 2003)
  First to propose a semantics for composition.
  However, their definition is in terms of maintaining the same certain answers relative to a class of queries.
  Their notion of composition depends on the class of queries and may not be unique up to logical equivalence.
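9 Semantics of Composition
Definition (FKPT in PODS 2004): A schema mapping $\mathcal{M}_{13}$ is a composition of $\mathcal{M}_{12}$ and $\mathcal{M}_{23}$ if for every instance $I_1$ of $S_1$ and every instance $I_3$ of $S_3$,
  $\langle I_1, I_3 \rangle \models \Sigma_{13}$ if and only if there exists $I_2$ such that $\langle I_1, I_2 \rangle \models \Sigma_{12}$ and $\langle I_2, I_3 \rangle \models \Sigma_{23}$.
In other words, $\mathrm{Inst}(\mathcal{M}_{13}) = \mathrm{Inst}(\mathcal{M}_{12}) \circ \mathrm{Inst}(\mathcal{M}_{23})$, where for $\mathcal{M} = (S, T, \Sigma_{ST})$,
  $\mathrm{Inst}(\mathcal{M}) = \{ \langle I, J \rangle \mid \langle I, J \rangle \models \Sigma_{ST} \}$.
Thus, $\mathcal{M}_{13}$ defines the composition of the binary relations of the instances associated with $\mathcal{M}_{12}$ and $\mathcal{M}_{23}$.
[Figure: schemas $S_1$, $S_2$, $S_3$ connected by $\mathcal{M}_{12}$ and $\mathcal{M}_{23}$, with $\mathcal{M}_{13}$ going directly from $S_1$ to $S_3$]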
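10 The Composition of Schema Mappings
Fact: If $\mathcal{M} = (S_1, S_3, \Sigma)$ and $\mathcal{M}' = (S_1, S_3, \Sigma')$ are both compositions of $\mathcal{M}_{12}$ and $\mathcal{M}_{23}$, then $\Sigma$ and $\Sigma'$ are logically equivalent.
For this reason:
  We say that $\mathcal{M}$ (or $\mathcal{M}'$) is the composition of $\mathcal{M}_{12}$ and $\mathcal{M}_{23}$.
  We write $\mathcal{M}_{12} \circ \mathcal{M}_{23}$ to denote it.
Definition: The composition query of $\mathcal{M}_{12}$ and $\mathcal{M}_{23}$ is the set $\mathrm{Inst}(\mathcal{M}_{12}) \circ \mathrm{Inst}(\mathcal{M}_{23})$.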
11 Issues in Composition of Schema Mappings
The semantics of composition was the first main issue.
Some other key issues:
  Is the language of finite sets of s-t tgds closed under composition? That is, if $\mathcal{M}_{12}$ and $\mathcal{M}_{23}$ are specified by finite sets of s-t tgds, is $\mathcal{M}_{12} \circ \mathcal{M}_{23}$ also specified by a finite set of s-t tgds?
  If not, what is the “right” language for composing schema mappings?
  What is the complexity of the associated composition query?
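12 Composition: Expressibility & Complexity
| $\Sigma_{12}$ | $\Sigma_{23}$ | $\Sigma_{13}$ | Composition query |
| --- | --- | --- | --- |
| finite set of full s-t tgds $\varphi(\mathbf{x}) \rightarrow \psi(\mathbf{x})$ | finite set of s-t tgds $\varphi(\mathbf{x}) \rightarrow \exists \mathbf{y}\, \psi(\mathbf{x}, \mathbf{y})$ | finite set of s-t tgds $\varphi(\mathbf{x}) \rightarrow \exists \mathbf{y}\, \psi(\mathbf{x}, \mathbf{y})$ | in PTIME |
| finite set of s-t tgds $\varphi(\mathbf{x}) \rightarrow \exists \mathbf{y}\, \psi(\mathbf{x}, \mathbf{y})$ | finite set of (full) s-t tgds $\varphi(\mathbf{x}) \rightarrow \psi(\mathbf{x})$ | may not be definable by any set of s-t tgds, in FO-logic, or in Datalog | in NP; can be NP-complete |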
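13 Enrollments Example
$\Sigma_{12}$:
  $\forall n\, \forall c\, (\mathrm{Takes}(n,c) \rightarrow \exists s\, \mathrm{Students}(n,s))$
  $\forall n\, \forall c\, (\mathrm{Takes}(n,c) \rightarrow \mathrm{Takes}_1(n,c))$
$\Sigma_{23}$:
  $\forall n\, \forall s\, \forall c\, (\mathrm{Students}(n,s) \wedge \mathrm{Takes}_1(n,c) \rightarrow \mathrm{Enrollments}(s,c))$
Implied by the composition:
  $\forall n\, \forall c_1\, \forall c_2\, (\mathrm{Takes}(n,c_1) \wedge \mathrm{Takes}(n,c_2) \rightarrow \exists s\, (\mathrm{Enrollments}(s,c_1) \wedge \mathrm{Enrollments}(s,c_2)))$
But what if Alice takes 3 courses?
Example instances:
  $I_1$: Takes = {(Alice, Math), (Alice, Art)}
  $I_2$: Takes$_1$ = {(Alice, Math), (Alice, Art)}, Students = {(Alice, 1234)}
  $I_3$: Enrollments = {(1234, Math), (1234, Art)}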
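The two-step exchange on this slide can be sketched in Python as a chase of the source instance through the first mapping and then the second. This is only an illustrative sketch: the function name `exchange_enrollments` and the null labels `N0`, `N1`, ... are assumptions made here, and where the slide shows the concrete student id 1234, the chase invents a labeled null instead.

```python
from itertools import count

def exchange_enrollments(takes):
    """Chase I1 (Takes) through Sigma_12, then Sigma_23; return (I2, I3)."""
    fresh = (f"N{i}" for i in count())   # supply of labeled nulls (hypothetical names)
    student_id = {}                      # null chosen for each student name
    students, takes1 = set(), set()
    for name, course in sorted(takes):
        # Takes(n,c) -> exists s. Students(n,s):
        # fire only if no Students(n, _) tuple exists yet (standard chase).
        if name not in student_id:
            student_id[name] = next(fresh)
            students.add((name, student_id[name]))
        # Takes(n,c) -> Takes1(n,c): copy the tuple.
        takes1.add((name, course))
    # Students(n,s) and Takes1(n,c) -> Enrollments(s,c): join on the name.
    enrollments = {(s, c) for (n, s) in students for (m, c) in takes1 if n == m}
    return {"Students": students, "Takes1": takes1}, {"Enrollments": enrollments}

i2, i3 = exchange_enrollments({("Alice", "Math"), ("Alice", "Art")})
print(sorted(i3["Enrollments"]))
```

Because a single null stands for Alice's id, all of her courses end up enrolled under the same id, however many courses she takes. That is the constraint a finite set of s-t tgds over the source and final target cannot express for every number of courses at once.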
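14 Enrollments Example - continued
There are infinitely many s-t tgds that are implied by the composition.
$\mathcal{M}_{12} \circ \mathcal{M}_{23} = (S_1, S_3, \Sigma_{13})$, where
  $\Sigma_{13} = \{\, \ldots,\ \forall n\, \forall c_1 \ldots \forall c_k\, (\mathrm{Takes}(n,c_1) \wedge \ldots \wedge \mathrm{Takes}(n,c_k) \rightarrow \exists s\, (\mathrm{Enrollments}(s,c_1) \wedge \ldots \wedge \mathrm{Enrollments}(s,c_k))),\ \ldots \,\}$
We show that $\Sigma_{13}$ is not equivalent to any finite set of s-t tgds.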
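15 Employee Example
$\Sigma_{12}$:
  $\forall e\, (\mathrm{Emp}(e) \rightarrow \exists m\, \mathrm{Mgr1}(e,m))$
$\Sigma_{23}$:
  $\forall e\, \forall m\, (\mathrm{Mgr1}(e,m) \rightarrow \mathrm{Mgr}(e,m))$
  $\forall e\, (\mathrm{Mgr1}(e,e) \rightarrow \mathrm{SelfMgr}(e))$
Theorem: The composition $\mathcal{M}_{12} \circ \mathcal{M}_{23}$
  is not definable by any finite set of s-t tgds;
  is not FO-definable;
  is not definable in Datalog.
[Figure: relations Emp(e) in $S_1$; Mgr1(e, m) in $S_2$; Mgr(e, m) and SelfMgr(e) in $S_3$]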
16 Second-Order Tgds
Definition: Let S be a source schema and T a target schema. A second-order tuple-generating dependency (SO tgd) is a formula of the form
  $\exists f_1 \ldots \exists f_m\, \big( (\forall \mathbf{x}_1 (\varphi_1 \rightarrow \psi_1)) \wedge \ldots \wedge (\forall \mathbf{x}_n (\varphi_n \rightarrow \psi_n)) \big)$,
where
  each $f_i$ is a function symbol;
  each $\varphi_i$ is a conjunction of atoms from S and equalities of terms;
  each $\psi_i$ is a conjunction of atoms from T.
Theorem: The composition of two finite sets of s-t tgds is always definable by an SO-tgd.
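17 Employee Example - revisited
$\Sigma_{12}$:
  $\forall e\, (\mathrm{Emp}(e) \rightarrow \exists m\, \mathrm{Mgr1}(e,m))$
$\Sigma_{23}$:
  $\forall e\, \forall m\, (\mathrm{Mgr1}(e,m) \rightarrow \mathrm{Mgr}(e,m))$
  $\forall e\, (\mathrm{Mgr1}(e,e) \rightarrow \mathrm{SelfMgr}(e))$
Fact: The composition is definable by the SO-tgd
  $\Sigma_{13}$: $\exists f\, \big( \forall e\, (\mathrm{Emp}(e) \rightarrow \mathrm{Mgr}(e, f(e))) \wedge \forall e\, (\mathrm{Emp}(e) \wedge (e = f(e)) \rightarrow \mathrm{SelfMgr}(e)) \big)$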
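As a sanity check on this fact, the following Python sketch compares two brute-force tests on small instances: one searches for a function f witnessing the SO-tgd, the other searches for an intermediate relation Mgr1 as required by the definition of composition. The function names and the tiny two-element domain are assumptions made for illustration; agreement on small instances illustrates the claim but does not prove it.

```python
from itertools import combinations, product

def satisfies_so_tgd(emp, mgr, self_mgr):
    """Search for values f(e), e in Emp, that witness the SO-tgd."""
    emps = sorted(emp)
    # f(e) must make Mgr(e, f(e)) true, so only managers of e are candidates.
    choices = [[m for (e2, m) in sorted(mgr) if e2 == e] for e in emps]
    for values in product(*choices):
        f = dict(zip(emps, values))
        # Second conjunct: whenever e = f(e), SelfMgr(e) must hold.
        if all(f[e] != e or e in self_mgr for e in emps):
            return True
    return False

def in_composition(emp, mgr, self_mgr):
    """Search for an intermediate Mgr1 satisfying Sigma_12 and Sigma_23."""
    pairs = sorted(mgr)  # Mgr1(e,m) -> Mgr(e,m) forces Mgr1 to be a subset of Mgr
    for size in range(len(pairs) + 1):
        for mgr1 in combinations(pairs, size):
            covers = all(any(a == e for (a, _) in mgr1) for e in emp)
            selfs_ok = all(a in self_mgr for (a, b) in mgr1 if a == b)
            if covers and selfs_ok:
                return True
    return False

# An employee whose only manager is herself, but who is not in SelfMgr:
print(satisfies_so_tgd({"a"}, {("a", "a")}, set()),
      in_composition({"a"}, {("a", "a")}, set()))
```

Both tests reject the instance in the usage line, and both accept it once either ("a", "b") is added to Mgr or "a" is added to SelfMgr.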
18 Composing SO-Tgds and Data Exchange
Theorem:
  The composition of two SO-tgds is definable by an SO-tgd.
  There is a polynomial-time algorithm for composing SO-tgds.
  The chase procedure can be extended to schema mappings specified by SO-tgds, so that it produces universal solutions in polynomial time.
  For schema mappings specified by SO-tgds, the certain answers of target conjunctive queries are polynomial-time computable.
19 Synopsis of Schema Mapping Composition
s-t tgds are not closed under composition.
SO-tgds form a well-behaved fragment of second-order logic.
  SO-tgds are closed under composition; thus, they are a “good” language for composing schema mappings.
  SO-tgds are “chasable”:
    Polynomial-time data exchange with universal solutions
    Polynomial-time computation of certain answers of target conjunctive queries
SO-tgds form the basis of the schema-mapping language used in the Criollo metadata management system.
20 "The notion of composition of maps leads to the most natural account of fundamental notions of mathematics, from multiplication, addition, and exponentiation, through the basic notions of logic."
Conceptual Mathematics by F.W. Lawvere and S.H. Schanuel
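21 Reduction from 3-Colorability
$\Sigma_{12}$:
  $\forall x\, \forall y\, (E(x,y) \rightarrow \exists u\, \exists v\, (C(x,u) \wedge C(y,v)))$
  $\forall x\, \forall y\, (E(x,y) \rightarrow F(x,y))$
$\Sigma_{23}$:
  $\forall x\, \forall y\, \forall u\, \forall v\, (C(x,u) \wedge C(y,v) \wedge F(x,y) \rightarrow D(u,v))$
Let $I_3 = \{ (r,g), (g,r), (b,r), (r,b), (g,b), (b,g) \}$.
Given $G = (V, E)$, let $I_1$ be the instance over $S_1$ consisting of the edge relation E of G.
G is 3-colorable iff $\langle I_1, I_3 \rangle \in \mathrm{Inst}(\mathcal{M}_{12}) \circ \mathrm{Inst}(\mathcal{M}_{23})$.
[Dawar98] showed that 3-colorability is not expressible in $L^{\omega}_{\infty\omega}$.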
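The reduction can be tried out on small graphs with a brute-force Python sketch that searches for an intermediate instance $I_2 = (C, F)$. Two simplifications are assumed here and are not stated on the slide: F is taken to be exactly E, and each vertex on an edge gets exactly one C-tuple with a value from {r, g, b}. Larger choices of F or C would only add constraints under $\Sigma_{23}$, so nothing is lost. The function name `in_composition` is illustrative.

```python
from itertools import product

COLORS = ("r", "g", "b")
# The fixed target instance I3: all pairs of distinct colors.
D = {(u, v) for u in COLORS for v in COLORS if u != v}

def in_composition(edges):
    """Is there an I2 = (C, F) with (I1, I2) |= Sigma_12 and (I2, I3) |= Sigma_23?"""
    vertices = sorted({x for edge in edges for x in edge})
    f_rel = set(edges)                        # smallest F allowed by E(x,y) -> F(x,y)
    for colors in product(COLORS, repeat=len(vertices)):
        c_rel = set(zip(vertices, colors))    # one C-tuple per vertex on an edge
        # Sigma_23: C(x,u) and C(y,v) and F(x,y) must imply D(u,v).
        ok = all((u, v) in D
                 for (x, u) in c_rel
                 for (y, v) in c_rel
                 if (x, y) in f_rel)
        if ok:
            return True
    return False

triangle = {("1", "2"), ("2", "3"), ("1", "3")}
k4 = {(str(i), str(j)) for i in range(1, 5) for j in range(i + 1, 5)}
print(in_composition(triangle), in_composition(k4))
```

The triangle is 3-colorable and is accepted; the complete graph on four vertices is not 3-colorable and is rejected. The search is exponential in the number of vertices, consistent with the composition query being NP-complete in this case.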
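22 Algorithm Compose($\mathcal{M}_{12}$, $\mathcal{M}_{23}$)
Input: Two schema mappings $\mathcal{M}_{12}$ and $\mathcal{M}_{23}$
Output: A schema mapping $\mathcal{M}_{13} = \mathcal{M}_{12} \circ \mathcal{M}_{23}$
Step 1: Split up tgds in $\Sigma_{12}$ and $\Sigma_{23}$
  $C_{12}$ = { $\mathrm{Emp}(e) \rightarrow \mathrm{Mgr1}(e, f(e))$ }
  $C_{23}$ = { $\mathrm{Mgr1}(e,m) \rightarrow \mathrm{Mgr}(e,m)$, $\mathrm{Mgr1}(e,e) \rightarrow \mathrm{SelfMgr}(e)$ }
Step 2: Compose $C_{12}$ with $C_{23}$
  $\chi_1$: $\mathrm{Emp}(e_0) \wedge (e = e_0) \wedge (m = f(e_0)) \rightarrow \mathrm{Mgr}(e,m)$
  $\chi_2$: $\mathrm{Emp}(e_0) \wedge (e = e_0) \wedge (e = f(e_0)) \rightarrow \mathrm{SelfMgr}(e)$
Step 3: Construct $\mathcal{M}_{13}$
  Return $\mathcal{M}_{13} = (S_1, S_3, \Sigma_{13})$, where $\Sigma_{13} = \exists f\, ( \forall e_0\, \forall e\, \forall m\, \chi_1 \wedge \forall e_0\, \forall e\, \chi_2 )$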
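23 Data Exchange with SO tgds
Example: Let $\mathcal{M} = (S, T, \Sigma_{ST})$, where $\Sigma_{ST}$ is:
  $\exists f\, \big( \forall x\, \forall y\, (R(x,y) \rightarrow U(x,y,f(x))) \wedge \forall x\, \forall x'\, \forall y\, \forall y'\, (R(x,y) \wedge R(x',y') \wedge (f(x) = f(x')) \rightarrow T(y,y')) \big)$
Source instance:
  R = {(a, b), (a, c), (d, e)}
Chase result, with f kept as a function term:
  U = {(a, b, f(a)), (a, c, f(a)), (d, e, f(d))}
  T = {(b, b), (b, c), (c, b), (c, c), (e, e)}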
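24 Data Exchange with SO tgds
Example: Let $\mathcal{M} = (S, T, \Sigma_{ST})$, where $\Sigma_{ST}$ is:
  $\exists f\, \big( \forall x\, \forall y\, (R(x,y) \rightarrow U(x,y,f(x))) \wedge \forall x\, \forall x'\, \forall y\, \forall y'\, (R(x,y) \wedge R(x',y') \wedge (f(x) = f(x')) \rightarrow T(y,y')) \big)$
Source instance:
  R = {(a, b), (a, c), (d, e)}
Chase result, with the function terms replaced by nulls $N_0$ and $N_1$:
  U = {(a, b, $N_0$), (a, c, $N_0$), (d, e, $N_1$)}
  T = {(b, b), (b, c), (c, b), (c, c), (e, e)}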
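The two stages of this example can be reproduced with a short Python sketch specialized to this one SO tgd. It is a minimal illustration under stated assumptions, not a general chase: the function name `chase_so_tgd` is hypothetical, the term f(x) is encoded as the tuple `("f", x)`, and two terms are treated as equal only when they are syntactically identical.

```python
from itertools import count

def chase_so_tgd(r):
    """Chase R with the SO tgd of this example; return (U, T)."""
    # Stage 1: keep f(x) as an uninterpreted term, encoded as ("f", x).
    u_terms = {(x, y, ("f", x)) for (x, y) in r}
    # f(x) = f(x') is taken to hold only for syntactically identical terms.
    t = {(y, y2) for (x, y) in r for (x2, y2) in r if ("f", x) == ("f", x2)}
    # Stage 2: replace each distinct term by a fresh labeled null.
    fresh = (f"N{i}" for i in count())
    null_of = {}
    u = set()
    for (x, y, term) in sorted(u_terms):
        if term not in null_of:
            null_of[term] = next(fresh)
        u.add((x, y, null_of[term]))
    return u, t

u, t = chase_so_tgd({("a", "b"), ("a", "c"), ("d", "e")})
print(sorted(u))
print(sorted(t))
```

On the source instance of the slide this yields the same U and T relations shown above, with `N0` standing for f(a) and `N1` for f(d).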