1 Towards the Preservation of Keys in XML Data Transformation for Integration. Md. Sumon Shahriar and Jixue Liu, Data and Web Engineering Lab, Computer and Information Science, University of South Australia

2 Outline of the Presentation • Motivation for XML data transformation with XML keys • How to define XML keys • How to transform XML keys • Whether transformed XML keys are valid and preserved [Key Preservation] • If an XML key is not preserved, how to capture the XML key as an XML functional dependency (XFD) [Key Transition]

3 Data Transformations for Integration: Relational → Relational; Relational → XML; XML → Relational; XML → XML

4 Data Transformations for Integration with Constraints: Relational → Relational; Relational → XML; XML → Relational; XML → XML. Constraint (keys, functional dependencies, etc.) preservation (a.k.a. propagation) is well studied in some of these settings. Others are little investigated: work there is mostly on structural transformations of schema and data, ignoring constraints. Reason: the document-centric rather than data-centric approach to XML.

5 Motivating Example 1 The source DTD D_a (nested) is transformed into the target DTD D_b (flat-like) by the Unnest(sid) operation.

6 [Figure: XML tree T_a (nodes v_r, v_1 to v_12) and XML tree T_b (nodes v_r, v_1 to v_14), related by Unnest(sid). Both trees use the elements enroll, dept, dname, cid and sid, with dname values Physics and Chemistry, cid values Phys01, Phys02 and Chem02, and sid values 001 to 004.]

7 XML key consideration Key: K(enroll/dept,{cid}). Operation: Unnest(sid), from D_a to D_b. K is valid on D_a and K is satisfied by T_a. Is K transformed? No. Is K valid on D_b? Yes. Is K satisfied by T_b? No.

8 [Figure: XML trees T_a and T_b from slide 6. The cid values are distinct in T_a; Unnest(sid) produces duplicate cid values in T_b.]

9 Observation Observation 1: An XML key may not be preserved after transformation.

10 Motivating Example 2 The source DTD D_a is transformed into the target DTD D_b by an expand operation that replaces (cid,sid+) with a new element course. On D_a, K(enroll/dept,{cid}) is valid and satisfied. Is K still valid on D_b? No, because the path is transformed. The key therefore needs to be transformed, for example to K(enroll/dept/course,{cid}). Is the transformed key satisfied? Maybe or maybe not; this needs to be checked.

11 Expanding (cid,sid+) with new element course

12 Observation Observation 2: How XML keys should be transformed needs to be defined when the DTD is transformed.

13 Contributions: defining XML keys on DTD and their satisfaction; rules for transforming XML keys using important operations; key preservation [key to key]; defining XML functional dependencies (XFDs) and their satisfaction; key transition [key to XFD].

14 Contribution: defining XML keys on DTD and their satisfaction. Keys are defined on the schema definition (DTD). A novel technique is used to produce semantically correct values for key satisfaction. The definition can capture some properties of the relational key, in the sense of value completeness and disallowing redundant values. It can capture the ID properties of the DTD definition. It improves on the key notion in XML Schema.

15 XML Key Given a DTD D = (EN, β, ρ), an XML key on D is defined as K(Q, {P_1, …, P_l}), where l ≥ 0, Q is a complete path on D called the selector, and {P_1, …, P_i, …, P_l} (often denoted by P) is a set of fields, where each P_i is defined as P_i = p_i1 ∪ … ∪ p_in_i. Here "∪" means disjunction, and each p_ij (j ∈ [1, …, n_i]) is a simple path on D with β(last(p_ij)) = Str and the following syntax: p_ij = seq; seq = e | e/seq, where e ∈ EN. Q/p_ij is a complete path.

16 Example of XML keys on the source DTD D_a: K(enroll/dept,{cid}) has selector = enroll/dept and field = {cid}; β(cid) = #PCDATA means Str, so β(last(cid)) = Str. K(enroll/dept,{cid,sid}) has selector = enroll/dept and fields = {cid,sid}, with β(last(cid)) = β(last(sid)) = Str.
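
As a rough illustration of the validity conditions in the key definition, the following Python sketch checks a key against a DTD. The names `BETA`, `ROOT`, `is_path` and `is_valid_key` are hypothetical, and the content of D_a is an assumption, since the DTD itself is not reproduced in this transcript. "Complete path" is read here as a path that starts at the root, and the disjunction inside a field is not modelled.

```python
# Hypothetical encoding of a DTD: element name -> list of child element
# names, or the marker STR for text-only (#PCDATA) elements.
STR = "Str"
ROOT = "enroll"  # assumed root of D_a
BETA = {         # assumed content of D_a
    "enroll": ["dept"],
    "dept": ["dname", "cid", "sid"],
    "dname": STR,
    "cid": STR,
    "sid": STR,
}


def is_path(start, steps):
    """True if `steps` can be walked downward, starting below `start`."""
    current = start
    for step in steps:
        children = BETA.get(current)
        if children is None or children == STR or step not in children:
            return False
        current = step
    return True


def is_valid_key(selector, fields):
    """Check the syntactic conditions of K(Q, {P_1, ..., P_l})."""
    q = selector.split("/")
    # Q must be a complete path: here, a path that starts at the root.
    if q[0] != ROOT or not is_path(ROOT, q[1:]):
        return False
    for field in fields:
        p = field.split("/")
        # Q/p must be a path on the DTD and must end in a text element.
        if not is_path(q[-1], p) or BETA.get(p[-1]) != STR:
            return False
    return True
```

Under these assumptions, `is_valid_key("enroll/dept", ["cid", "sid"])` holds, while a key whose selector mentions an element that is not in the DTD, such as `enroll/dept/course`, is rejected.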

17 Some definitions for XML key satisfaction [P-tuple] Given a key K(Q, {P_1, …, P_l}) and a tree T, let T_Q be a tree in T. A P-tuple in T_Q is a tuple of pair-wise close sub-trees. By pair-wise close, we mean sub-trees in the same minimal hedge. A P-tuple is complete if […]. We call T_P = T_last(P) the prefixed format tree. For example, if P = enroll/dname, then T_P = T_dname.

18 Proposed techniques [Hedge] A hedge is a consecutive sequence of primary sub-trees of the same node. [Minimal structure] Given a DTD definition β(e) and two elements e_1 and e_2 in β(e), the minimal structure g of e_1 and e_2 in β(e) is the pair of brackets that encloses e_1 and e_2 such that no other structure inside g encloses both. [Minimal hedge] Given a hedge H of β(e), a minimal hedge of e_1 and e_2 is one of the H^g's in H.

19 Example of minimal structure, minimal hedge and P-tuple [Figure: XML tree T_a conforming to D_a, with the minimal hedges H_1^g, H_2^g and H_3^g marked.] Key: K(enroll/dept,{cid,sid}), with P_1 = cid and P_2 = sid. The minimal structure is g = (cid,sid+). The minimal hedges are H_1 = v_4 v_5 v_6 and H_2 = v_7 v_8 under node v_1, and H_3 = v_10 v_11 v_12 under node v_2. The P-tuples are F_1 = (v_4 v_5) and F_2 = (v_4 v_6) for hedge H_1 and F_3 = (v_7 v_8) for hedge H_2 under node v_1, and F_4 = (v_10 v_11) and F_5 = (v_10 v_12) for hedge H_3 under node v_2.
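
The P-tuple construction on this slide can be sketched in Python as follows. The node ids and hedges are the ones listed on the slide; the data layout and the function names `p_tuples` and `all_p_tuples` are illustrative assumptions, not the authors' implementation.

```python
from itertools import product

# Minimal hedges of T_a for g = (cid, sid+), as listed on the slide.
# Each node is written as (node id, element label).
HEDGES = {
    "H1": [("v4", "cid"), ("v5", "sid"), ("v6", "sid")],
    "H2": [("v7", "cid"), ("v8", "sid")],
    "H3": [("v10", "cid"), ("v11", "sid"), ("v12", "sid")],
}


def p_tuples(hedge, fields):
    """Combine one node per field, all taken from the same minimal hedge."""
    per_field = [[nid for nid, label in hedge if label == f] for f in fields]
    return list(product(*per_field))


def all_p_tuples(hedges, fields):
    """Collect the P-tuples of every hedge, in hedge order."""
    return [t for hedge in hedges.values() for t in p_tuples(hedge, fields)]
```

Calling `all_p_tuples(HEDGES, ["cid", "sid"])` yields the five tuples F_1 to F_5 named above. Restricting the pairing to one hedge at a time is what keeps v_4 from being combined with v_8.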

20 Produced P-tuples

21 XML Key Satisfaction An XML tree T satisfies a key K(Q, {P_1, …, P_l}) if the following hold. If {P_1, …, P_l} = ∅, then T satisfies K iff there exists one and only one T_Q in T. Else: there exists at least one P-tuple in T_Q; every P-tuple in T_Q is complete; every P-tuple in T_Q is value distinct; and P-tuples in different T_Q must be value distinct.

22 Checking satisfaction of key: T_Q = T_v1 and T_Q = T_v2
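
A minimal Python sketch of the non-empty-field case of this check is given below. It assumes the P-tuples have already been extracted as tuples of values, one list per T_Q, with `None` standing for a missing value; the function name `satisfies_key` and this encoding are assumptions. "At least one P-tuple in T_Q" is read as applying to every T_Q, and the empty-field case is not covered.

```python
def satisfies_key(trees):
    """trees: one list of P-tuples (tuples of field values) per T_Q."""
    seen = set()
    for tuples in trees:
        if not tuples:  # at least one P-tuple in this T_Q
            return False
        for t in tuples:
            if any(value is None for value in t):  # completeness
                return False
            if t in seen:  # value distinctness, within and across T_Q
                return False
            seen.add(t)
    return True


# cid values per dept for K(enroll/dept,{cid}); read off the example trees.
T_A = [[("Phys01",), ("Phys02",)], [("Chem02",)]]
T_B = [[("Phys01",), ("Phys01",), ("Phys02",)], [("Chem02",), ("Chem02",)]]
```

With this data, `satisfies_key(T_A)` holds and `satisfies_key(T_B)` does not, which mirrors the outcome of the first motivating example.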

23 Contribution: rules for transformation on key definition. A key is transformed if any path in the key is transformed. After the transformation, the key needs to be checked for validity on the target schema. If a key is not transformed, it is valid on the target DTD.

24 Transformation on key Unnest operation: g = (g_1 x g_2+)+ → g = (g_1 x g_2)+. Example: (cid,sid+)+ → (cid,sid)+. It turns the nested structure into a flat-like structure. No path is transformed, so the key definition does not change.

25 Transformation on key Nest operation: g = (g_1 x g_2)+ → g = (g_1 x g_2+)+. Example: (cid,sid)+ → (cid,sid+)+. It turns the flat-like structure into a nested structure. No path is transformed, so the key definition does not change.
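
The effect of unnest and nest on the data can be sketched in Python as below. The representation of a dept as a list of `(cid, [sid, ...])` groups, the sample sid values and the function names are assumptions made for illustration; the slides define the operations on DTD structures, not on this encoding.

```python
from itertools import groupby


def unnest(groups):
    """(cid, sid+)+ -> (cid, sid)+ : repeat the cid once per sid."""
    return [(cid, sid) for cid, sids in groups for sid in sids]


def nest(pairs):
    """(cid, sid)+ -> (cid, sid+)+ : merge adjacent pairs sharing a cid."""
    return [
        (cid, [sid for _, sid in run])
        for cid, run in groupby(pairs, key=lambda pair: pair[0])
    ]


# One dept in nested form; the sid values are illustrative.
DEPT = [("Phys01", ["001", "002"]), ("Phys02", ["003"])]
```

Here `unnest(DEPT)` repeats `Phys01` once per sid, which is where duplicate cid values come from, and `nest(unnest(DEPT))` gives back `DEPT`.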

26 Transformation on key Expand operation: g = (g_1 x g_2+)+ → g = (g_new)+, g_new = g_1 x g_2+. Example: g = (cid,sid+)+ → g = (course+), g_new = (cid,sid+). It pushes the structure one level down. The path is transformed in the DTD and so in the key. Rules are needed to transform the key correctly.

27 Transformation on key Transformation rules on key using expand: the result depends on where the new element is added in the key paths (either selector or field). With expand((cid,sid+), course), K(enroll/dept,{cid,sid}) becomes either K(enroll/dept/course,{cid,sid}) or K(enroll/dept,{course/cid,course/sid}). With expand(sid+, stIDs), K(enroll/dept,{cid,sid}) becomes K(enroll/dept,{cid,stIDs/sid}).
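
One possible reading of these expand rules, sketched in Python. The function `expand_key` and its parameters are hypothetical: `wrapped` is the set of element names pushed below the new element `new`, and the wrapped elements are assumed to be children of the last element of the selector. The slides give the rules only by example, so the general conditions below are an assumption.

```python
def expand_key(selector, fields, wrapped, new):
    """Return the candidate keys after expand(wrapped, new)."""
    touched = [f.split("/")[0] in wrapped for f in fields]
    if not any(touched):
        return [(selector, fields)]  # no path of the key is transformed
    # Option 1: the new element is added to every affected field path.
    candidates = [
        (selector, [f"{new}/{f}" if hit else f for f, hit in zip(fields, touched)])
    ]
    # Option 2: the new element is added to the selector. This is only
    # possible when every field now lives below the new element.
    if all(touched):
        candidates.append((f"{selector}/{new}", fields))
    return candidates
```

For `expand((cid,sid+), course)` both candidates from the slide are produced; for `expand(sid+, stIDs)` only the field-side rewrite applies, because `cid` stays outside the new element.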

28 Transformation on key Collapse operation: g = (g_coll)+, g_coll = g_1 x g_2+ → g = (g_1 x g_2+)+. Example: g = (dept+), g_dept = (cid,sid+) → g = (cid,sid+)+. It moves the structure one level up. The path is transformed in the DTD and so in the key. Rules are needed to transform the key correctly.

29 Transformation on key Transformation rules on key using collapse: the result depends on which element is deleted in the key paths (either selector or field). With collapse(dept), K(enroll/dept,{cid,sid}) becomes K(enroll,{cid,sid}), and K(enroll,{dept/cid,dept/sid}) becomes K(enroll,{cid,sid}).
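
Both collapse examples amount to deleting the collapsed element from whichever key path contains it. A small Python sketch, with the hypothetical helper `collapse_key` operating on slash-separated path strings:

```python
def collapse_key(selector, fields, removed):
    """Drop the collapsed element `removed` from every path of the key."""

    def strip(path):
        return "/".join(step for step in path.split("/") if step != removed)

    return strip(selector), [strip(f) for f in fields]
```

Whether the resulting key is still valid on the target DTD is a separate question; the sketch only rewrites the paths.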

30 Contribution: [Key preservation] Given a source DTD, a conforming document, and a valid key that is satisfied by the document, if the transformed key is valid on the target DTD and is satisfied by the target document, then the key is said to be preserved by the transformation.

31 Key preserving properties of operations Preserving: nest and collapse. Preserving under necessary and sufficient conditions: unnest and expand.

32 Theorem: The unnest operator is key preserving if some key fields don't cross g_1.

33 Example to explain Unnest(sid) on (cid,sid+)+, where g_1 = cid and g_2 = sid: the key K(enroll/dept,{cid}) is not preserved. However, if the key is K(enroll,{cid,sid}), then the key is preserved.

34 Theorem: The expand operator is key preserving if, whenever the selector is transformed, every tree for the selector has a P-tuple.

35 Example to explain K(enroll/dept,{cid}) is transformed to K(enroll/dept/course,{cid}) or K(enroll/dept,{course/cid}). No duplicate cid values are produced; the values remain distinct.

36 Contribution: [Key transition] Given a source DTD, a conforming document, and a valid key that is satisfied by the document, if the transformed key is valid on the target DTD but is not satisfied by the target document, and the key, once transformed to an XFD, is satisfied by the target document, then we say the XML key is transited to an XFD.

37 XML functional dependency (XFD) Given a DTD D = (EN, β, ρ), an XML functional dependency on D is defined as Φ(S, P → Q), where S is a complete path on D called the scope, P = {p_1, …, p_i, …, p_l} is a set of simple paths called the determinant or LHS, Q is a simple path or the empty path called the dependent or RHS, and S/P and S/Q are complete paths. If Q = ε, then the XFD Φ(S, P → ε) implies P → last(S), meaning that P determines S.

38 Tuple for XFD [Tuple] Given an XFD Φ(S, P → Q) and a tree T, let T_S be a tree in T. A tuple in T_S is a tuple of pair-wise close sub-trees. By pair-wise close, we mean sub-trees in the same minimal hedge. By P-tuple, we mean the tuple for paths P. By Q-tuple, we mean the tuple for path Q. A P-tuple is complete if […].

39 XFD satisfaction An XML tree satisfies an XFD Φ(S, P → Q) if the following hold. If Q = ε, then […] is complete; else […] are complete. For every pair of tuples F_1 and F_2 in T_S, if F_1[P] =_v F_2[P], then F_1[Q] =_v F_2[Q].
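
The pairwise condition can be sketched in Python as a standard functional-dependency check over already-extracted tuples. The function name `satisfies_fd` and the encoding are assumptions: each entry pairs a tuple of P values with one Q value, and `None` marks a missing P value. For Q = ε, one reading (an assumption, since the slide does not spell it out) is to use the identity of the scope node as the Q value, so that the check expresses "P determines S".

```python
def satisfies_fd(tuples):
    """True if equal P values always come with equal Q values."""
    q_for_p = {}
    for p_values, q_value in tuples:
        if None in p_values:  # an incomplete P-tuple
            return False
        # Remember the first Q value seen for these P values and
        # compare every later occurrence against it.
        if q_for_p.setdefault(p_values, q_value) != q_value:
            return False
    return True


# Phi(enroll/dept, {cid} -> epsilon) on the unnested tree: each cid value
# is paired with the dept node it occurs under (v1 or v2).
T_B = [
    (("Phys01",), "v1"), (("Phys01",), "v1"), (("Phys02",), "v1"),
    (("Chem02",), "v2"), (("Chem02",), "v2"),
]
```

Under this encoding `satisfies_fd(T_B)` holds even though the cid values repeat, because every repeated cid stays under the same dept node.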

40 Key transition algorithm
check := CheckKeyTransformation(k, UnNest);
if check = TRUE then
    Φ := TransformKeyToXFD(k);
end if
if target T satisfies the XFD Φ then
    return Φ and "KeyTransited";
end if

41 Function CheckKeyTransformation(k, UnNest)
if g_1 crosses any P_i in [P_1, …, P_n] at an element e, where e is in g_1 and e is in P_i, then
    return TRUE;
else
    return FALSE;
end if

42 Function TransformKeyToXFD(k)
Φ[S] := k[Q];
for all i such that 1 ≤ i ≤ n do
    Φ[P_i] := k[P_i];
end for
Φ[Q] := ε;
return Φ(S, {P} → Q);
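
The three pieces of pseudocode on slides 40 to 42 can be put together in a short Python sketch. The classes `Key` and `XFD`, the snake_case function names and the `target_satisfies` callback are assumptions introduced for illustration; in particular, the satisfaction test on the target tree is passed in rather than implemented, and `None` is returned when no transition takes place.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Key:
    selector: str
    fields: tuple


@dataclass(frozen=True)
class XFD:
    scope: str
    lhs: tuple
    rhs: str  # "" stands for the empty path ε


def check_key_transformation(key, g1):
    """TRUE if some field path contains an element of g1 (slide 41)."""
    return any(step in g1 for f in key.fields for step in f.split("/"))


def transform_key_to_xfd(key):
    """Selector becomes scope, fields become the LHS, RHS is ε (slide 42)."""
    return XFD(scope=key.selector, lhs=key.fields, rhs="")


def key_transition(key, g1, target_satisfies):
    """Return the XFD if the key is transited, otherwise None (slide 40)."""
    if not check_key_transformation(key, g1):
        return None
    xfd = transform_key_to_xfd(key)
    return xfd if target_satisfies(xfd) else None
```

For K(enroll/dept,{cid}) and Unnest(sid) with g_1 = {cid}, the sketch produces Φ(enroll/dept, {cid} → ε), provided the supplied check reports that the target tree satisfies it.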

43 [Figure: XML trees T_a and T_b from slide 6. The cid values are distinct in T_a and duplicated in T_b.] The key K(enroll/dept,{cid}) on T_a is transited to the XFD Φ(enroll/dept, {cid} → ε) on T_b.

44 Theorem: An XML key on source DTD can only be transited to an XFD on the target DTD if the key is satisfied by the conforming source document.

45 We talked about XML data transformation with keys: a new definition for XML keys; transformation rules for keys; key preservation; key transition; and also a new definition for XML functional dependency (XFD).

46 Our papers: "On Defining Keys for XML", IEEE CIT 2008, Database and Data Mining Workshop, Sydney. "Key Preserving P2P Data Transformation for XML", LNCS, DBISP2P 2008 (VLDB Workshop), Auckland, New Zealand. "Transition of Keys in XML Data Transformation", IEEE CSA 2008, Hobart. "On Defining Functional Dependency for XML", IEEE IWSCA 2008, Korea.

47 Other research issues Already done: "Preserving functional dependency in XML data transformation", LNCS, ADBIS 2008, Finland; preserving inclusion dependency in XML data transformation. Future work: adaptation of constraints in XML data integration; detecting conflicts between source constraints and target constraints in XML settings; checking validation and satisfaction of the constraints (XML keys, XFDs and XML inclusion dependencies (XIDs)); performance of XML data transformation and integration with constraints.

48 Thank You Questions

