A Normal Form for XML Documents Marcelo Arenas Leonid Libkin Department of Computer Science University of Toronto.

Slides:



Advertisements
Similar presentations
TWO STEP EQUATIONS 1. SOLVE FOR X 2. DO THE ADDITION STEP FIRST
Advertisements

Delta Confidential 1 5/29 – 6/6, 2001 SAP R/3 V4.6c PP Module Order Change Management(OCM)
You have been given a mission and a code. Use the code to complete the mission and you will save the world from obliteration…
Mathematical Preliminaries
Introduction to Algorithms
Constraint Satisfaction Problems
Advanced Piloting Cruise Plot.
Copyright © 2011, Elsevier Inc. All rights reserved. Chapter 5 Author: Julia Richards and R. Scott Hawley.
1 Copyright © 2010, Elsevier Inc. All rights Reserved Fig 2.1 Chapter 2.
By D. Fisher Geometric Transformations. Reflection, Rotation, or Translation 1.
Source of slides: Introduction to Automata Theory, Languages and Computation.
By John E. Hopcroft, Rajeev Motwani and Jeffrey D. Ullman
Importance-Driven Focus of Attention and Meister Eduard Gröller 1 1 Vienna University of Technology, Austria 2 University of Girona, Spain 3 University.
Business Transaction Management Software for Application Coordination 1 Business Processes and Coordination.
Jeopardy Q 1 Q 6 Q 11 Q 16 Q 21 Q 2 Q 7 Q 12 Q 17 Q 22 Q 3 Q 8 Q 13
Jeopardy Q 1 Q 6 Q 11 Q 16 Q 21 Q 2 Q 7 Q 12 Q 17 Q 22 Q 3 Q 8 Q 13
Title Subtitle.
Multiplying binomials You will have 20 seconds to answer each of the following multiplication problems. If you get hung up, go to the next problem when.
0 - 0.
DIVIDING INTEGERS 1. IF THE SIGNS ARE THE SAME THE ANSWER IS POSITIVE 2. IF THE SIGNS ARE DIFFERENT THE ANSWER IS NEGATIVE.
SUBTRACTING INTEGERS 1. CHANGE THE SUBTRACTION SIGN TO ADDITION
MULT. INTEGERS 1. IF THE SIGNS ARE THE SAME THE ANSWER IS POSITIVE 2. IF THE SIGNS ARE DIFFERENT THE ANSWER IS NEGATIVE.
Addition Facts
Fourth normal form: 4NF 1. 2 Normal forms desirable forms for relations in DB design eliminate redundancies avoid update anomalies enforce integrity constraints.
Relational data objects 1 Lecture 6. Relational data objects 2 Answer to last lectures activity.
5NF and other normal forms
1 Term 2, 2004, Lecture 3, NormalisationMarian Ursu, Department of Computing, Goldsmiths College Normalisation 5.
Limitations of the relational model 1. 2 Overview application areas for which the relational model is inadequate - reasons drawbacks of relational DBMSs.
Relational data integrity
Dependency preservation, 3NF revisited and BCNF
Normal forms - 1NF, 2NF and 3NF
1 Term 2, 2004, Lecture 2, Normalisation - IntroductionMarian Ursu, Department of Computing, Goldsmiths College Normalisation Introduction.
1GR2-00 GR2 Advanced Computer Graphics AGR Lecture 2 Basic Modelling.
CS 319: Theory of Databases
Dr. Alexandra I. Cristea CS 319: Theory of Databases.
Dr. Alexandra I. Cristea CS 319: Theory of Databases: C3.
ZMQS ZMQS
Micro Focus Research 1 As far as youre aware, how does your organization plan to drive business growth over the next three years? (Respondents' first choices)
Slide 1 Copyright © 2004 Glenna R. Shaw & FTC Publishing Background Courtesy of Awesome BackgroundsAwesome BackgroundsDeliberation!Deliberation!
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide
© 2011 TIBCO Software Inc. All Rights Reserved. Confidential and Proprietary. Towards a Model-Based Characterization of Data and Services Integration Paul.
ABC Technology Project
Basic Laws of Electric Circuits Kirchhoff’s Voltage Law
XML and Databases Exercise Session 3 (courtesy of Ghislain Fourny/ETH)
A D ICHOTOMY ON T HE C OMPLEXITY OF C ONSISTENT Q UERY A NSWERING FOR A TOMS W ITH S IMPLE K EYS Paris Koutris Dan Suciu University of Washington.
Squares and Square Root WALK. Solve each problem REVIEW:
Backward Checking Search BT, BM, BJ, CBJ. An example problem Colour each of the 5 nodes, such that if they are adjacent, they take different colours 12.
© 2012 National Heart Foundation of Australia. Slide 2.
Manipulation of Query Expressions. Outline Query unfolding Query containment and equivalence Answering queries using views.
Chapter 5 Test Review Sections 5-1 through 5-4.
GG Consulting, LLC I-SUITE. Source: TEA SHARS Frequently asked questions 2.
1 On c-Vertex Ranking of Graphs Yung-Ling Lai & Yi-Ming Chen National Chiayi University Taiwan.
Addition 1’s to 20.
25 seconds left…...
Test B, 100 Subtraction Facts
Week 1.
We will resume in: 25 Minutes.
1 Unit 1 Kinematics Chapter 1 Day
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide
How Cells Obtain Energy from Food
Processing of structured documents Spring 2003, Part 1 Helena Ahonen-Myka.
From Approximative Kernelization to High Fidelity Reductions joint with Michael Fellows Ariel Kulik Frances Rosamond Technion Charles Darwin Univ. Hadas.
Instructor: Shengyu Zhang 1. Content Two problems  Minimum Spanning Tree  Huffman encoding One approach: greedy algorithms 2.
CS 253: Algorithms Chapter 22 Graphs Credit: Dr. George Bebis.
Interval Graph Test.
Well-designed XML Data
An Information-Theoretic Approach to Normal Forms for Relational and XML Data Marcelo Arenas Leonid Libkin University of Toronto.
Well-designed XML Data Marcelo Arenas and Leonid Libkin University of Toronto.
A Normal Form for XML Documents
A Normal Form for XML Documents
Presentation transcript:

A Normal Form for XML Documents Marcelo Arenas Leonid Libkin Department of Computer Science University of Toronto

2 Motivating Example courses course A+B+ cs100cs225 Fox 123Fox

3 Our Goal courses course* student* name, grade name S grade S DTD:Integrity Constraints:, info* name two students with the value must have the same is the key of info.

4 A Non-relational Example DBLP conf Dong2001Jarke

5 Problems to Address Functional dependencies for XML. Normal form for XML documents (XNF). Generalizes BCNF. Algorithm for normalizing XML documents. Implication problem for functional dependencies.

6 Framework: DTDs We do not consider mixed content, IDs and IDREFs. Paths(D): all paths in a DTD D courses.course courses.course.student.name courses.course.student.name.S We distinguish three kinds of elements: attributes strings (S) and element types.

7 Framework: XML Trees v1v1 v2v2 v3v3 v4v4 v5v5 v6v6 v7v7 v0v0... courses name name grade student FoxB+Smith A- SSSS

8 Towards FDs for XML We know how to handle FDs in relational DBs. We need a relational representation of XML. We do it by considering tree tuples: a tree tuple in a DTD D is a mapping t : Paths(D) Vertices Strings { } consistent with D: could be extracted from an XML tree conforming to D.

9 Tree Tuples v1v1 v2v2 v0v0 courses cs100 t(courses) = v 0 t(courses.course) = v 1 = cs100 t(courses.course.student) = v 2 t(p) =, for the remaining paths A tree tuple represents an XML tree:

10 XML Tree: set of Tree Tuples v1v1 v2v2 v3v3 v4v4 v5v5 v6v6 v7v7 v0v0... courses name name grade student FoxB+Smith A- SSSS v1v1 v2v2 courses cs100 student v0v0 v3v3 name grade 123 FoxB+ SS v5v5 v6v6 name grade student 456 Smith A- SS... course

11 Functional Dependencies Expressions of the form: X Y defined over a DTD D, where X, Y are finite non-empty subsets of Paths(D). XML tree T can be tested for satisfaction of X Y if: X Y Paths(T) Paths(D) T X Y if for every pair u, v of tree tuples in T: u.X = v.X and u.X implies u.Y = v.Y

12 FD: Examples University DTD:courses course* student* name, grade Two students with the value must have the same name: courses.course.student.name.S Every student can have at most one grade in every course: { courses.course, } courses.course.student.grade.S

13 Checking FD Satisfaction v1v1 v2v2 v3v3 v4v4 v6v6 v7v7 v8v8 v0v0 courses name name grade student 123 FoxB+FoxA+ SSSS cs225 student v1v1 v2v2 v3v3 v4v4 v0v0 courses name grade student 123 FoxB+ SS v6v6 v7v7 v8v8 name grade 123 FoxA+ SS cs225 student { courses.course, } courses.course.student.grade.S

14 Checking FD Satisfaction v1v1 v2v2 v3v3 v4v4 v5v5 v6v6 v7v7 v0v0 courses name name grade student 123 FoxB+FoxA+ SSSS student v1v1 v2v2 v3v3 v4v4 v0v0 courses name grade student 123 FoxB+ SS v5v5 v6v6 name grade 123 FoxA+ SS student { courses.course, } courses.course.student.grade.S

15 Implication Problem for FD Given a DTD D and a set of functional dependencies { }: (D, ) if for any XML tree T conforming to D and satisfying, it is the case that T (D, ) + = { | (D, ) } Functional dependency is trivial if it is implied by the DTD alone: (D, )

16 XNF: XML Normal Form XML specification: a DTD D and a set of functional dependencies. A Relational DB is in BCNF if for every non- trivial functional dependency X Y in the specification, X is a key. (D, ) is in XNF if: For each non-trivial FD X or X p.S in (D, ) +, X p is in (D, ) +.

17 Back to DBLP DBLP is not in XNF: DBLP.conf.issue (D, ) + DBLP.conf.issue DBLP.conf.issue.article (D, ) + Proposed solution is in XNF.

18 Normalization Algorithm The algorithm applies two transformations until the schema is in XNF. If there is an anomalous FD of the form: DBLP.conf.issue then apply the DBLP example rule. Otherwise: choose a minimal anomalous FD and apply the University example rule.

19 Normalizing XML Documents Theorem The decomposition algorithm terminates and outputs a specification in XNF. It does not lose information: UnnormalizedNormalized XML documentXML Document Q 1, Q 2 are XQuery core queries. It works even if we cannot compute (D, ) +. If we know (D, ) +, the output is better. Q1Q1 Q2Q2

20 Implication Problem (contd) Typically, regular expressions used in DTDs are rather simple. Trivial regular expression: s 1, s 2, …, s n Each s i is either one of a i, a i ?, a i + or a i * For i j, a i a j D is a simple DTD if all its productions use permutations of trivial regular expressions. Example: (a | b)* is a permutation of a*, b*

21 Implication Problem (contd) Theorem For simple DTDs: The implication problem for FDs is solvable in O(n 2 ). Testing if a specification is in XNF can be done in O(n 3 ). Other results: There is a larger class of DTDs for which these problems are tractable. There is an even larger class of DTDs for which they are coNP-complete.

22 Final Comments The normalization algorithm can be improved in various ways. Why XNF? Theorem XNF generalizes BCNF and a normal form for nested relations (called NNF) when those are coded as XML documents. A complete classification of the complexity of the implication problem for FDs remains open. Implementation.

Backup Slides

24 Normalization Algorithm We consider FDs of the form: {q, p 1, …, p n } p where n 0 and q ends with an element type. The algorithm applies two transformations until the schema is in XNF. If there is an anomalous FD with n = 0: DBLP.conf.issue then apply the DBLP example rule. Otherwise: choose a minimal anomalous FD and apply the University example rule.