Download presentation
Presentation is loading. Please wait.
Published byGage Sandford Modified over 10 years ago
1
A Normal Form for XML Documents Marcelo Arenas Leonid Libkin Department of Computer Science University of Toronto
2
2 Motivating Example courses course info @cno student @snonamegrade name@sno studentname@sno... 123 A+B+ cs100cs225 Fox 123Fox
3
3 Our Goal courses course* course @cno, student* student @sno, name, grade name S grade S DTD:Integrity Constraints:, info* info @sno, name two students with the same @sno value must have the same name. @sno is the key of info.
4
4 A Non-relational Example DBLP conf titleissue article @yeartitle @year ICDT @year author@yeartitleauthor 1999 Dong2001Jarke 2001...
5
5 Problems to Address Functional dependencies for XML. Normal form for XML documents (XNF). Generalizes BCNF. Algorithm for normalizing XML documents. Implication problem for functional dependencies.
6
6 Framework: DTDs We do not consider mixed content, IDs and IDREFs. Paths(D): all paths in a DTD D courses.course courses.course.@cno courses.course.student.name courses.course.student.name.S We distinguish three kinds of elements: attributes (@), strings (S) and element types.
7
7 Framework: XML Trees v1v1 v2v2 v3v3 v4v4 v5v5 v6v6 v7v7 v0v0... courses course @cno cs100 @sno name grade@sno name grade student 123456 FoxB+Smith A- SSSS
8
8 Towards FDs for XML We know how to handle FDs in relational DBs. We need a relational representation of XML. We do it by considering tree tuples: a tree tuple in a DTD D is a mapping t : Paths(D) Vertices Strings { } consistent with D: could be extracted from an XML tree conforming to D.
9
9 Tree Tuples v1v1 v2v2 v0v0 courses course @cnostudent cs100 t(courses) = v 0 t(courses.course) = v 1 t(courses.course.@cno) = cs100 t(courses.course.student) = v 2 t(p) =, for the remaining paths A tree tuple represents an XML tree:
10
10 XML Tree: set of Tree Tuples v1v1 v2v2 v3v3 v4v4 v5v5 v6v6 v7v7 v0v0... courses course @cno cs100 @sno name grade@sno name grade student 123456 FoxB+Smith A- SSSS v1v1 v2v2 courses course @cno cs100 student v0v0 v3v3 v4v4 @sno name grade 123 FoxB+ SS v5v5 v6v6 v7v7 @sno name grade student 456 Smith A- SS... course
11
11 Functional Dependencies Expressions of the form: X Y defined over a DTD D, where X, Y are finite non-empty subsets of Paths(D). XML tree T can be tested for satisfaction of X Y if: X Y Paths(T) Paths(D) T X Y if for every pair u, v of tree tuples in T: u.X = v.X and u.X implies u.Y = v.Y
12
12 FD: Examples University DTD:courses course* course @cno, student* student @sno, name, grade Two students with the same @sno value must have the same name: courses.course.student.@sno courses.course.student.name.S Every student can have at most one grade in every course: { courses.course, courses.course.student.@sno } courses.course.student.grade.S
13
13 Checking FD Satisfaction v1v1 v2v2 v3v3 v4v4 v6v6 v7v7 v8v8 v0v0 courses course @cno cs100 @sno name grade@sno name grade student 123 FoxB+FoxA+ SSSS v5v5 @cno cs225 student v1v1 v2v2 v3v3 v4v4 v0v0 courses course @cno cs100 @sno name grade student 123 FoxB+ SS v6v6 v7v7 v8v8 course @sno name grade 123 FoxA+ SS v5v5 @cno cs225 student { courses.course, courses.course.student.@sno } courses.course.student.grade.S
14
14 Checking FD Satisfaction v1v1 v2v2 v3v3 v4v4 v5v5 v6v6 v7v7 v0v0 courses course @cno cs100 @sno name grade@sno name grade student 123 FoxB+FoxA+ SSSS student v1v1 v2v2 v3v3 v4v4 v0v0 courses course @cno cs100 @sno name grade student 123 FoxB+ SS v5v5 v6v6 v7v7 @sno name grade 123 FoxA+ SS student { courses.course, courses.course.student.@sno } courses.course.student.grade.S
15
15 Implication Problem for FD Given a DTD D and a set of functional dependencies { }: (D, ) if for any XML tree T conforming to D and satisfying, it is the case that T (D, ) + = { | (D, ) } Functional dependency is trivial if it is implied by the DTD alone: (D, )
16
16 XNF: XML Normal Form XML specification: a DTD D and a set of functional dependencies. A Relational DB is in BCNF if for every non- trivial functional dependency X Y in the specification, X is a key. (D, ) is in XNF if: For each non-trivial FD X p.@l or X p.S in (D, ) +, X p is in (D, ) +.
17
17 Back to DBLP DBLP is not in XNF: DBLP.conf.issue DBLP.conf.issue.article.@year (D, ) + DBLP.conf.issue DBLP.conf.issue.article (D, ) + Proposed solution is in XNF.
18
18 Normalization Algorithm The algorithm applies two transformations until the schema is in XNF. If there is an anomalous FD of the form: DBLP.conf.issue DBLP.conf.issue.article.@year then apply the DBLP example rule. Otherwise: choose a minimal anomalous FD and apply the University example rule.
19
19 Normalizing XML Documents Theorem The decomposition algorithm terminates and outputs a specification in XNF. It does not lose information: UnnormalizedNormalized XML documentXML Document Q 1, Q 2 are XQuery core queries. It works even if we cannot compute (D, ) +. If we know (D, ) +, the output is better. Q1Q1 Q2Q2
20
20 Implication Problem (contd) Typically, regular expressions used in DTDs are rather simple. Trivial regular expression: s 1, s 2, …, s n Each s i is either one of a i, a i ?, a i + or a i * For i j, a i a j D is a simple DTD if all its productions use permutations of trivial regular expressions. Example: (a | b)* is a permutation of a*, b*
21
21 Implication Problem (contd) Theorem For simple DTDs: The implication problem for FDs is solvable in O(n 2 ). Testing if a specification is in XNF can be done in O(n 3 ). Other results: There is a larger class of DTDs for which these problems are tractable. There is an even larger class of DTDs for which they are coNP-complete.
22
22 Final Comments The normalization algorithm can be improved in various ways. Why XNF? Theorem XNF generalizes BCNF and a normal form for nested relations (called NNF) when those are coded as XML documents. A complete classification of the complexity of the implication problem for FDs remains open. Implementation.
23
Backup Slides
24
24 Normalization Algorithm We consider FDs of the form: {q, p 1.@l 1, …, p n.@l n } p where n 0 and q ends with an element type. The algorithm applies two transformations until the schema is in XNF. If there is an anomalous FD with n = 0: DBLP.conf.issue DBLP.conf.issue.article.@year then apply the DBLP example rule. Otherwise: choose a minimal anomalous FD and apply the University example rule.
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.