Producing XML Documents with Guaranteed “Good” Properties David W. Embley Brigham Young University Wai Y. Mok University of Alabama in Huntsville Sponsored.

1 Producing XML Documents with Guaranteed “Good” Properties
David W. Embley, Brigham Young University
Wai Y. Mok, University of Alabama in Huntsville
Sponsored in part by the National Science Foundation under grant number IIS-0083127

2 “Good” ~ XNF
Motivation
  XML is for information exchange.
  What constitutes a “good” XML document for information exchange?
Principles
  XML document properties:
    -- a few large trees
    -- no redundancy
  Information modeling:
    -- create a conceptual model
    -- generate “good” XML
  XNF:
    -- align XML trees with natural hierarchies in the data
    -- base redundancy elimination on functional dependencies (FDs), naturally occurring multivalued dependencies (MVDs), and inclusion dependencies (IDs)

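3 Example: XNF
Scheme tree: ( F D ( S P ( H )* )* ( H )* )*
(F = Faculty_Member, D = Department, S = Grad_Student, P = Program, H = Hobby)
Sample data:
  Kelly, CS
    -- Pat, PhD; hobbies: Hiking, Skiing
    -- Tracy, MS; hobbies: Hiking, Sailing
    -- Chris, MS
    -- Kelly's hobbies: Hiking, Skiing
  Lynn, Math
    -- Lynn's hobbies: Sailing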

4 Example: More Trees Than Necessary
Scheme trees: ( S P F ( H )* )* and ( D ( F ( H )* )* )*
Tree 1:
  -- Pat, PhD, Kelly; hobbies: Hiking, Skiing
  -- Tracy, MS, Kelly; hobbies: Hiking, Sailing
  -- Chris, MS, Kelly
Tree 2:
  -- CS: Kelly; hobbies: Hiking, Skiing
  -- Math: Lynn; hobbies: Sailing

5 Example: Redundancy
Scheme tree ( H ( S P )* )*:
  -- Hiking: (Pat, PhD), (Tracy, MS)
  -- Skiing: (Pat, PhD)
  -- Sailing: (Tracy, MS)
Scheme tree ( S ( H ( F )* )* )*:
  -- Pat: Hiking (Kelly), Skiing (Kelly)
  -- Tracy: Hiking (Kelly), Sailing (Lynn)
  -- Chris
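A rough way to see the redundancy in the two trees above is to count how often each stored fact appears. The Python sketch below is a heuristic, not the XNF definition: it treats each multi-value node tuple and each parent-child value pair as a fact and reports any that occur more than once. The record encoding and the names facts and repeated_facts are illustrative assumptions.

```python
from collections import Counter

# Hypothetical encoding: a scheme-tree instance is a list of records, each
# record is (values, children): `values` is the tuple of atomic values in one
# node instance, and `children` holds one list of records per child node.

def facts(records, parent=None):
    """Yield the facts an instance stores: multi-value node tuples, and
    (parent values, child values) pairs for every parent-child link."""
    for values, children in records:
        if len(values) > 1:
            yield ("node", values)
        if parent is not None:
            yield ("edge", parent, values)
        for group in children:
            yield from facts(group, values)

def repeated_facts(records):
    """Facts stored more than once: a rough signal of redundancy."""
    counts = Counter(facts(records))
    return {fact: n for fact, n in counts.items() if n > 1}

# ( H ( S P )* )*
HSP = [
    (("Hiking",), [[(("Pat", "PhD"), []), (("Tracy", "MS"), [])]]),
    (("Skiing",), [[(("Pat", "PhD"), [])]]),
    (("Sailing",), [[(("Tracy", "MS"), [])]]),
]

# ( S ( H ( F )* )* )*
SHF = [
    (("Pat",), [[(("Hiking",), [[(("Kelly",), [])]]),
                 (("Skiing",), [[(("Kelly",), [])]])]]),
    (("Tracy",), [[(("Hiking",), [[(("Kelly",), [])]]),
                   (("Sailing",), [[(("Lynn",), [])]])]]),
    (("Chris",), [[]]),
]

print(repeated_facts(HSP))  # {('node', ('Pat', 'PhD')): 2, ('node', ('Tracy', 'MS')): 2}
print(repeated_facts(SHF))  # {('edge', ('Hiking',), ('Kelly',)): 2}
```

In the first tree each student's program is stored once per hobby; in the second, the fact that Kelly's hobby is Hiking is stored once per student who hikes.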

6 XNF → XML
Scheme tree: ( F D ( S P ( H )* )* ( H )* )*, over the same sample data as slide 3 (Kelly, CS; Lynn, Math)

7 Naive DTD Generation
Scheme tree ( F D ( S P ( H )* )* ( H )* )* over the slide 3 sample data, and the DTD generated from it:
  <!DOCTYPE University[
    <!ELEMENT University ( ( Faculty_Member, Department, ( Grad_Student, Program, ( Hobby )* )*, ( Hobby )* )*, …
  ]>

8 Naive DTD Generation
  <!DOCTYPE University[
    <!ELEMENT University ( ( Faculty_Member, Department, ( Grad_Student, Program, ( Hobby )* )*, ( Hobby )* )*, …
  ]>
Data in document order: Kelly, CS, Pat, PhD, Hiking, Skiing, Tracy, MS, Hiking, Sailing, Chris, MS, Hiking, Skiing, Lynn, Sailing
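One way to see what can go wrong with the naive DTD is to serialize two different instances under it. The Python sketch below assumes a hypothetical tuple encoding of scheme trees and instances (the names hobbies, flatten, INSTANCE, and ALT are illustrative) and writes each instance in document order as (element, value) pairs with no grouping elements. Moving Kelly's hobbies to Chris produces exactly the same sequence, so the flat document cannot tell the two instances apart.

```python
NAMES = {"F": "Faculty_Member", "D": "Department", "S": "Grad_Student",
         "P": "Program", "H": "Hobby"}
# Hypothetical encoding: a scheme-tree node is (attribute labels, child nodes).
TREE = (["F", "D"], [(["S", "P"], [(["H"], [])]), (["H"], [])])

def hobbies(*names):
    """Records for a leaf ( H )* node: one single-value record per hobby."""
    return [([name], []) for name in names]

def flatten(scheme, records):
    """Serialize an instance in document order as (element, value) pairs,
    as the naive content model does: no wrapper elements around groups."""
    attrs, child_schemes = scheme
    out = []
    for values, child_groups in records:
        out += [(NAMES[a], v) for a, v in zip(attrs, values)]
        for child_scheme, group in zip(child_schemes, child_groups):
            out += flatten(child_scheme, group)
    return out

# Slide 3 data: Kelly's own hobbies are Hiking and Skiing; Chris has none.
INSTANCE = [
    (["Kelly", "CS"], [
        [(["Pat", "PhD"], [hobbies("Hiking", "Skiing")]),
         (["Tracy", "MS"], [hobbies("Hiking", "Sailing")]),
         (["Chris", "MS"], [hobbies()])],
        hobbies("Hiking", "Skiing")]),
    (["Lynn", "Math"], [[], hobbies("Sailing")]),
]

# A different instance: Chris hikes and skis; Kelly has no hobbies.
ALT = [
    (["Kelly", "CS"], [
        [(["Pat", "PhD"], [hobbies("Hiking", "Skiing")]),
         (["Tracy", "MS"], [hobbies("Hiking", "Sailing")]),
         (["Chris", "MS"], [hobbies("Hiking", "Skiing")])],
        hobbies()]),
    (["Lynn", "Math"], [[], hobbies("Sailing")]),
]

print(flatten(TREE, INSTANCE) == flatten(TREE, ALT))  # True
```

In both serializations the Hobby elements Hiking and Skiing come right after Chris, MS, which matches the document order listed on this slide.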

9 Sophisticated DTD Generation
Figure: scheme tree F D S P H H with the labels Faculty Members, Grad_Students, and Hobbies; sample values CS, PhD, Hiking, Skiing, …
  <!DOCTYPE University[
    <!ELEMENT Department (#PCDATA) …
  ]>

10 → XNF
Scheme tree: ( F D ( S P ( H )* )* ( H )* )*, over the slide 3 sample data
How do we generate XNF scheme-trees?

11 Alg. 1
Figure: CM hypergraph over F D S P H H
How do we generate XNF scheme-trees?
Algorithm 1
Until all vertices and edges are included:
  Find a start vertex:
    -- included in most enclosures
    -- back off by one, if possible
  Grow a tree as large as possible:
    -- cut out hierarchy (watch out for optionals)
    -- add adjacent vertices:
       -- within node (for functional edges)
       -- below node (for non-functional edges)
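The grow step of Algorithm 1 can be sketched for the university example. This Python sketch is a simplification under stated assumptions: the start vertex is supplied rather than chosen by the enclosure heuristic; hierarchy cutting, optionals, and the outer loop that starts additional trees are omitted; and each binary edge is encoded (hypothetically) as (a, b, a_determines_b, b_determines_a), with FDs F → D, S → F, and S → P assumed from the sample data. A vertex reached over a functional edge joins the current node; otherwise it starts a child node below it.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    attrs: list                      # vertices stored together in this node
    children: list = field(default_factory=list)

def grow(start, edges):
    """Grow one scheme tree from `start`, using each edge at most once."""
    root = Node([start])
    used = set()
    frontier = [(root, start)]       # (node holding vertex, vertex) to expand
    while frontier:
        node, v = frontier.pop(0)
        for i, (a, b, a_to_b, b_to_a) in enumerate(edges):
            if i in used or v not in (a, b):
                continue
            used.add(i)
            other, functional = (b, a_to_b) if v == a else (a, b_to_a)
            if functional:           # v -> other is functional: same node
                node.attrs.append(other)
                frontier.append((node, other))
            else:                    # many-valued: new child node below
                child = Node([other])
                node.children.append(child)
                frontier.append((child, other))
    return root, used

def render(node):
    """Write a scheme tree in the slides' ( ... )* notation."""
    parts = node.attrs + [render(c) for c in node.children]
    return "( " + " ".join(parts) + " )*"

# Hypothetical edge list for the example hypergraph (FDs are assumptions).
EDGES = [
    ("F", "D", True, False),   # assumed FD: F -> D
    ("S", "F", True, False),   # assumed FD: S -> F
    ("S", "P", True, False),   # assumed FD: S -> P
    ("S", "H", False, False),  # students and hobbies: many-many
    ("F", "H", False, False),  # faculty and hobbies: many-many
]

tree, used = grow("F", EDGES)
print(render(tree))  # ( F D ( S P ( H )* )* ( H )* )*
```

Starting at F reproduces the slide 3 scheme tree; because the sketch tracks edges rather than vertices, H appears twice, once for each hobby edge.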

12 Alg. 1: Start (slides 12 to 16 repeat the slide 11 figure and Algorithm 1)

13 Alg. 1: Start

14 Alg. 1: Grow

15 Alg. 1: Grow

16 Alg. 1: Grow

17 Algorithm 1 Yields XNF
Theorem. Given a canonical, binary conceptual-model (CM) hypergraph H, Algorithm 1 generates an XNF scheme-tree forest with respect to the FDs and MVDs of H.
Proof: Based on nested normal form (NNF) (Mok et al., TODS, 1996).
Questions:
  -- What is this restriction?
  -- Can we relax this constraint?
  -- Can we enlarge the set of dependencies?

18 Non-Canonical CM Hypergraphs
If the input CM hypergraph has redundancy, Algorithm 1 generates scheme trees with potential redundancy.
  -- Figure D F S S P H H: the set of students must be the same for every department.
  -- Figure F D S P D H H: a faculty member’s department is the same as the faculty member’s students’ department.
A CM hypergraph is canonical if:
  (1) no edge is redundant,
  (2) no edge is losslessly decomposable, and
  (3) no vertex is redundant.
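Condition (1) can be checked mechanically for functional edges with the standard attribute-closure test: an FD is redundant if the remaining FDs already imply it. This Python sketch assumes FDs S → F and F → D plus a direct edge S → D for the second example above; the stated constraint makes S → D derivable through F. It covers only functional edges, not MVDs, lossless decomposition, or redundant vertices, and the names closure and redundant_fds are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under FDs given as (lhs_set, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and rhs not in result:
                result.add(rhs)
                changed = True
    return result

def redundant_fds(fds):
    """FDs implied by the remaining FDs (candidates for redundant edges)."""
    return [fd for i, fd in enumerate(fds)
            if fd[1] in closure(fd[0], fds[:i] + fds[i + 1:])]

# Assumed FDs for the faculty/student/department example.
FDS = [({"S"}, "F"), ({"F"}, "D"), ({"S"}, "D")]
print(redundant_fds(FDS))  # [({'S'}, 'D')]
```

Dropping the reported edge S → D leaves a hypergraph in which the student's department is reached only through the faculty member, which is the non-redundant form condition (1) asks for.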

19 Non-Binary CM Hypergraphs
Figure label: Not Canonical: Decomposable

20 Generating Scheme Trees from Non-Binary CM Hypergraphs
Figure: CM hypergraph over N, A, M, C, D, T, P, with alternative configurations of the edge C D T (C D T or C D T or …)

21 Alg. 2
Figure: CM hypergraph over N, A, M, C, D, T, P
Algorithm 2
Until all vertices and edges are included:
  Find a start edge and configure it.
  Grow a tree as large as possible:
    -- cut out hierarchy (watch out for optionals)
    -- add and configure edges

22 Alg. 2: Start (slides 22 to 25 repeat the slide 21 figure and Algorithm 2)

23 Alg. 2: Grow

24 Alg. 2: Start Again and Grow

25 Alg. 2: Start Again and Grow

26 Algorithm 2 Yields XNF
Theorem. Given a canonical conceptual-model (CM) hypergraph H, Algorithm 2 generates an XNF scheme-tree forest with respect to the FDs and MVDs of H.
Proof: Based on NNF (Mok et al., TODS, 1996).

27 Inclusion Dependencies (IDs)
Figure label: optional connections

28 Inclusion Dependencies (IDs)
Figure note: This constraint makes this vertex redundant.

29 Canonical CM Hypergraph with IDs

30 Generating Scheme Trees from a Canonical CM Hypergraph with IDs
Figure: F D S P H F H S
Algorithm 3
  Collapse generalization/specialization (G/S) hierarchies.
  If the edges are all binary,
    execute Algorithm 1;
  else
    execute Algorithm 2.

31 Alg. 3: Collapse (slides 31 to 33 repeat the slide 30 figure and Algorithm 3)

32 Alg. 3: Collapse

33 Alg. 3: Execute

34 Algorithm 3 Yields XNF
Theorem. Given a canonical conceptual-model (CM) hypergraph H, Algorithm 3 generates an XNF scheme-tree forest with respect to the FDs, MVDs, and IDs of H.
Proof: Based on NNF (Mok et al., TODS, 1996).

35 Conclusions
XNF ~ “Good” XML:
  -- no redundancy
  -- as few trees as possible
Elegant DTD generation
Algorithms to generate XNF
Proofs of correctness
Contact: embley@cs.byu.edu, mokw@email.uah.edu

