Producing XML Documents with Guaranteed “Good” Properties David W. Embley Brigham Young University Wai Y. Mok University of Alabama in Huntsville Sponsored in part by the National Science Foundation under grant number IIS

“Good” ~ XNF
Motivation
  -- XML is for information exchange.
  -- What constitutes a “good” XML document for information exchange?
Principles
  -- XML document properties: a few large trees; no redundancy.
  -- Information modeling: create a conceptual model; generate “good” XML.
  -- XNF: align XML trees with natural hierarchies in the data. Base redundancy elimination on FDs, naturally occurring MVDs, and inclusion dependencies (IDs).

Example: XNF
Scheme tree: ( F D ( S P ( H )* )* ( H )* )*
F      D     S      P    H (of S)         H (of F)
Kelly  CS    Pat    PhD  Hiking, Skiing   Hiking, Skiing
             Tracy  MS   Hiking, Sailing
             Chris  MS
Lynn   Math                               Sailing
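The parenthesized notation on this slide nests each repeating group inside `( ... )*`. As a minimal sketch, the notation can be read into a nested structure. The function name `parse_scheme_tree` and the `(attributes, children)` tuple representation are illustrative assumptions, not part of the presentation.

```python
def parse_scheme_tree(text):
    """Parse scheme-tree notation such as "( F D ( H )* )*".

    Returns (attributes, children); each child has the same shape.
    Tokens must be separated by whitespace, as on the slides.
    """
    tokens = text.split()
    pos = 0

    def node():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != "(":
            raise ValueError("expected '('")
        pos += 1
        attrs, children = [], []
        while pos < len(tokens) and tokens[pos] != ")*":
            if tokens[pos] == "(":
                children.append(node())  # nested repeating group
            else:
                attrs.append(tokens[pos])  # attribute of this node
                pos += 1
        if pos >= len(tokens):
            raise ValueError("missing ')*'")
        pos += 1  # consume ")*"
        return (attrs, children)

    tree = node()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the tree")
    return tree


tree = parse_scheme_tree("( F D ( S P ( H )* )* ( H )* )*")
```

Here `tree` has the root node F D with two children: the node S P (which itself has one child H) and a second node H.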

Example: More Trees Than Necessary
Scheme tree: ( S P F ( H )* )*
S      P    F      H
Pat    PhD  Kelly  Hiking, Skiing
Tracy  MS   Kelly  Hiking, Sailing
Chris  MS   Kelly
Scheme tree: ( D ( F ( H )* )* )*
D     F      H
CS    Kelly  Hiking, Skiing
Math  Lynn   Sailing

Example: Redundancy
Scheme tree: ( H ( S P )* )*
H        S      P
Hiking   Pat    PhD
         Tracy  MS
Skiing   Pat    PhD
Sailing  Tracy  MS
Scheme tree: ( S ( H ( F )* )* )*
S      H        F
Pat    Hiking   Kelly
       Skiing   Kelly
Tracy  Hiking   Kelly
       Sailing  Lynn
Chris
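The redundancy in the ( H ( S P )* )* tree can be made concrete by counting how often the same student/program fact is stored. The sketch below assumes each student has exactly one program (S → P), which the slides imply by keeping S and P in one node. The names `by_hobby` and `repeated_facts` are hypothetical.

```python
from collections import Counter

# Instance of ( H ( S P )* )* from the slide: hobby -> [(student, program)].
by_hobby = {
    "Hiking": [("Pat", "PhD"), ("Tracy", "MS")],
    "Skiing": [("Pat", "PhD")],
    "Sailing": [("Tracy", "MS")],
}


def repeated_facts(nested):
    """Return the (student, program) pairs stored more than once."""
    counts = Counter(pair for pairs in nested.values() for pair in pairs)
    return {pair: n for pair, n in counts.items() if n > 1}


redundant = repeated_facts(by_hobby)
```

Both ("Pat", "PhD") and ("Tracy", "MS") are stored twice, once per hobby. In the XNF tree each student appears once, so each program value is stored once.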

XNF → XML
Scheme tree: ( F D ( S P ( H )* )* ( H )* )*
Data: the Kelly/Lynn sample from “Example: XNF”.

Naive DTD Generation
Scheme tree: ( F D ( S P ( H )* )* ( H )* )*
Data: the Kelly/Lynn sample from “Example: XNF”.
<!DOCTYPE University[
<!ELEMENT University ( ( Faculty_Member, Department, ( Grad_Student, Program, ( Hobby )* )*, ( Hobby )* )*, …
]>
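The naive translation maps each scheme-tree node to a parenthesized, starred group in the DTD content model. A minimal sketch of that mapping follows, assuming the tuple representation `(attributes, children)` and the hypothetical names `NAMES`, `TREE`, and `content_model`. The element names are those used on the slide. A DTD sequence needs a comma between every pair of members, so the sketch emits one before the second ( Hobby )* group as well.

```python
# Abbreviation -> element name, taken from the slide's DTD fragment.
NAMES = {
    "F": "Faculty_Member",
    "D": "Department",
    "S": "Grad_Student",
    "P": "Program",
    "H": "Hobby",
}

# ( F D ( S P ( H )* )* ( H )* )* as (attributes, children) tuples.
TREE = (["F", "D"], [(["S", "P"], [(["H"], [])]), (["H"], [])])


def content_model(node, names=NAMES):
    """Render one scheme-tree node as a starred DTD sequence group."""
    attrs, children = node
    parts = [names[a] for a in attrs]
    parts += [content_model(child, names) for child in children]
    return "( " + ", ".join(parts) + " )*"


model = content_model(TREE)
```

The result is only the content-model fragment. The surrounding `<!ELEMENT University ...>` declaration and the parts elided with "…" on the slide are not reconstructed here.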

Naive DTD Generation
[Figure labels: F D S P H H]
<!DOCTYPE University[
<!ELEMENT University ( ( Faculty_Member, Department, ( Graduate_Student, Program, ( Hobby )* )*, ( Hobby )* )*, …
]>
Document content, in order (element tags not preserved): Kelly CS Pat PhD Hiking Skiing Tracy MS Hiking Sailing Chris MS Hiking Skiing Lynn Sailing

Sophisticated DTD Generation
[Figure labels: F D S P H H; Faculty Members, Grad_Students, Hobbies]
<!DOCTYPE University[
<!ELEMENT Department (#PCDATA) …
]>
Sample values: CS PhD Hiking Skiing …

→ XNF
Scheme tree: ( F D ( S P ( H )* )* ( H )* )*
Data: the Kelly/Lynn sample from “Example: XNF”.
How do we generate XNF scheme-trees?

Alg. 1
[Figure labels: F D S P H H]
How do we generate XNF scheme-trees?
Algorithm 1
Until all vertices and edges are included:
  Find a start vertex:
    -- included in most enclosures
    -- back off by one, if possible
  Grow a tree as large as possible:
    -- cut out hierarchy (watch out for optionals)
    -- add adjacent vertices:
       -- within node (for functional edges)
       -- below node (for non-functional edges)

Alg. 1: Start
[Figure labels: F D S P H H; the Algorithm 1 steps are repeated from the “Alg. 1” slide]

Alg. 1: Grow
[Figure labels: F D S P H H; the Algorithm 1 steps are repeated from the “Alg. 1” slide, with ticks (√) added on successive frames]
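The “grow” step of Algorithm 1 can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the start vertex is supplied by the caller (the “most enclosures, back off by one” rule is not implemented), optional connections are ignored, edges are visited breadth-first, and the example edge list is inferred from the scheme tree on the slides, with H modeled as a single vertex that has two many-many edges. The names `Edge`, `Node`, `render`, and `grow_tree` are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    a: str
    b: str
    a_to_b: bool = False  # True if a functionally determines b
    b_to_a: bool = False  # True if b functionally determines a


@dataclass
class Node:
    attrs: list = field(default_factory=list)
    children: list = field(default_factory=list)


def render(node):
    """Print a node in the slides' ( ... )* notation."""
    parts = node.attrs + [render(child) for child in node.children]
    return "( " + " ".join(parts) + " )*"


def grow_tree(start, edges):
    """Grow one scheme tree from `start`, using each edge at most once."""
    root = Node([start])
    used = set()
    queue = deque([(start, root)])
    while queue:
        vertex, node = queue.popleft()
        for edge in edges:
            if edge in used or vertex not in (edge.a, edge.b):
                continue
            used.add(edge)
            if vertex == edge.a:
                other, functional = edge.b, edge.a_to_b
            else:
                other, functional = edge.a, edge.b_to_a
            if functional:
                node.attrs.append(other)  # within node
                queue.append((other, node))
            else:
                child = Node([other])  # below node
                node.children.append(child)
                queue.append((other, child))
    return root, used


# Assumed edges: F -> D, S -> F, S -> P, and many-many S-H and F-H.
EDGES = [
    Edge("F", "D", a_to_b=True),
    Edge("S", "F", a_to_b=True),
    Edge("S", "P", a_to_b=True),
    Edge("S", "H"),
    Edge("F", "H"),
]

tree, used = grow_tree("F", EDGES)
print(render(tree))  # ( F D ( S P ( H )* )* ( H )* )*
```

Starting from F reproduces the XNF scheme tree of the running example. The outer loop of Algorithm 1 would call `grow_tree` again with a new start vertex while any edge remains outside `used`. The choice of start vertex matters: the same sketch started from H nests students and faculty under hobbies and reintroduces redundancy.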

Algorithm 1 Yields XNF
Theorem. Given a canonical, binary conceptual-model (CM) hypergraph H, Algorithm 1 generates an XNF scheme-tree forest with respect to the FDs and MVDs of H.
Proof: Based on NNF (Mok, et al., TODS, 1996).
Questions raised on the slide:
  -- What is this restriction?
  -- Can we relax this constraint?
  -- Can we enlarge the set of dependencies?

Non-Canonical CM Hypergraphs
If the input CM hypergraph has redundancy, Algorithm 1 generates scheme trees with potential redundancy.
[Figure labels: D F S S P H H] The set of students must be the same for every department.
[Figure labels: F D S P D H H] A faculty member’s department is the same as the faculty member’s students’ department.
A CM hypergraph is canonical if:
  (1) No edge is redundant,
  (2) No edge is losslessly decomposable, and
  (3) No vertex is redundant.
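Condition (1) can be illustrated for functional edges with the standard attribute-closure test: an FD is redundant if the remaining FDs already imply it. The sketch below covers only that case. It does not test lossless decomposability or vertex redundancy, and the FD list is an assumed reading of the second figure (F → D, S → F, plus a direct S → D edge). The names `closure` and `redundant_fds` are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (pairs of tuples)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def redundant_fds(fds):
    """Return the FDs that are implied by the other FDs in the list."""
    found = []
    for fd in fds:
        others = [g for g in fds if g != fd]
        if set(fd[1]) <= closure(fd[0], others):
            found.append(fd)
    return found


FDS = [
    (("F",), ("D",)),  # a faculty member has one department
    (("S",), ("F",)),  # a student has one faculty member
    (("S",), ("D",)),  # a student has one department
]

redundant = redundant_fds(FDS)
```

Under these assumptions S → D is reported as redundant, because S → F and F → D already determine a student's department. This matches the slide's remark that a faculty member's department is the same as the department of that faculty member's students.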

Non-Binary CM Hypergraphs Not Canonical: Decomposable

Generating Scheme Trees from Non-Binary CM Hypergraphs
[Figure labels: N A M C; C D T; D T; A P; alternative configurations joined by “or”, …]

Alg. 2
[Figure labels: N A M C; C D T; A P]
Algorithm 2
Until all vertices and edges are included:
  Find a start edge and configure it
  Grow a tree as large as possible:
    -- cut out hierarchy (watch out for optionals)
    -- add and configure edges

Alg. 2: Start
[Figure labels: N A M C; C D T; A P; the Algorithm 2 steps are repeated from the “Alg. 2” slide, with a tick (√)]

Alg. 2: Grow
[Figure labels: N A M C; C D T; A P; the Algorithm 2 steps are repeated from the “Alg. 2” slide]

Alg. 2: Start Again and Grow
[Figure labels: N A M C; C D T; A P; the Algorithm 2 steps are repeated from the “Alg. 2” slide, with a tick (√)]

Algorithm 2 Yields XNF
Theorem. Given a canonical conceptual-model (CM) hypergraph H, Algorithm 2 generates an XNF scheme-tree forest with respect to the FDs and MVDs of H.
Proof: Based on NNF (Mok, et al., TODS, 1996).

Inclusion Dependencies (IDs) optional connections

Inclusion Dependencies (IDs) This constraint makes this vertex redundant.

Canonical CM Hypergraph with IDs

Generating Scheme Trees from Canonical CM Hypergraph with IDs
[Figure labels: F D S P H F H S]
Algorithm 3
Collapse G/S (generalization/specialization) hierarchies
If the edges are all binary
  Execute Algorithm 1
Else
  Execute Algorithm 2

Alg. 3: Collapse
[Figure labels: F D S P H F H S; the Algorithm 3 steps are repeated from the previous slide]

Alg. 3: Execute
[Figure labels: F D S P H F H S; the Algorithm 3 steps are repeated from the previous slide]
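The control flow of Algorithm 3 is a preprocessing step followed by a dispatch on edge arity. A minimal sketch follows, assuming edges are tuples of vertex names and that the collapse step and Algorithms 1 and 2 are supplied as callables. All names are hypothetical, and the collapse step itself is not specified on the slides.

```python
def algorithm_3(edges, collapse_gs, algorithm_1, algorithm_2):
    """Collapse G/S hierarchies, then dispatch on edge arity."""
    collapsed = collapse_gs(edges)
    if all(len(edge) == 2 for edge in collapsed):
        return algorithm_1(collapsed)  # every edge is binary
    return algorithm_2(collapsed)  # at least one non-binary edge


# Stand-ins that only report which branch was taken.
def identity(edges):
    return edges


binary_case = algorithm_3(
    [("F", "D"), ("S", "P")], identity, lambda e: "Algorithm 1", lambda e: "Algorithm 2"
)
nary_case = algorithm_3(
    [("C", "D", "T")], identity, lambda e: "Algorithm 1", lambda e: "Algorithm 2"
)
```

With the stand-ins above, the all-binary input takes the Algorithm 1 branch and the input with a ternary edge takes the Algorithm 2 branch.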

Algorithm 3 Yields XNF
Theorem. Given a canonical conceptual-model (CM) hypergraph H, Algorithm 3 generates an XNF scheme-tree forest with respect to the FDs, MVDs, and IDs of H.
Proof: Based on NNF (Mok, et al., TODS, 1996).

Conclusions
XNF ~ “Good” XML
  -- No redundancy
  -- As few trees as possible
Elegant DTD generation
Algorithms to generate XNF
Proofs of correctness