Keys for XML. Peter Buneman, Susan Davidson, Wenfei Fan, Carmem Hara, Wang-Chiew Tan. University of Pennsylvania, Temple University.

Presentation transcript:

Slide 1: Keys for XML
Peter Buneman, Susan Davidson, Wenfei Fan, Carmem Hara, Wang-Chiew Tan
University of Pennsylvania; Temple University; Universidade Federal do Parana, Brazil
Jonathan Mamou

Slide 2: Keys in DB design
• An essential part of DB design
• An invariant connection between a tuple and the real-world entity it represents
• Important in updates
  – guarantee that an update will affect precisely one tuple
• …

Slide 3: Keys in XML
• XML documents are to do (at least) double duty as databases
• Examination of existing DTDs reveals a number of cases in which some element or attribute is specified in comments as a "unique identifier"
• Various key specifications exist in the XML Standard, XML-Data and XML Schema

Slide 4: Components: XML vs. relational DB
[Figure: the same three records, as XML elements and as a relation]
  name    course  grade
  Smith   Math    B
  Jones   Math    A+
  Smith   CS      A-

Slide 5: Components: XML vs. relational DB (cont'd)
• DB: if 2 tuples agree on their name and course attributes, they agree everywhere
• XML: if 2 elements agree on their name and course subelements, then they are the same element
  – Node identification?
  – Equality?

Slide 6: Nodes: value equality
• name is a key for person nodes
• name may have a complex structure: first-name, last-name, ...
[Figure: a db tree with dept, company, government and employee nodes, in which names appear both as firstName "Bill" + lastName "Clinton" and as the string "Bill Clinton"]

Slide 7: Hierarchical structure
• Hierarchically structured databases, e.g. scientific data formats
• A top-level key identifies components of a document
• A secondary key identifies sub-components
  – book/chapter/section
  – Bible/book/chapter/verse

Slide 8: Absolute and relative keys
In an XML document, how do we identify
• a book?
• a chapter?
• a section?
[Figure: a db tree containing two books, "XML" and "SGML"; each book has chapters numbered 1 and 10, and chapters contain sections with number and text subelements (section numbers such as 1, 5 and 6)]

Slide 9: XML standard: the ID attribute
• Internal "pointers" rather than keys
• Scoping: an ID attribute is unique within the entire document rather than among a designated set of elements
  – cannot express relative keys, e.g., for chapters/sections
• Limited to attributes rather than elements
• Unary: at most one "key" can be defined, in terms of a single attribute
• Value equality: on text (strings) only
• Defined in an attribute type: keys must come with a DTD

Slide 10: XML-Data
• Introduces a notion of keys explicitly
• BUT
  – keys can only be defined for element types, rather than for certain collections of elements, e.g. book, articles, …

Slide 11: XPath
• Makes it possible to specify interesting fragments of a document
• Syntax similar to navigating directories in a file system:
  – //  arbitrary path
  – .   empty path
  – /   document root; path concatenator
  – *   any single node name

Slide 12: XPath example
• Select BBB elements which have any attribute: //BBB[@*]

Slide 13: XPath example (cont'd)
• //GGG/ancestor::* selects all ancestors of GGG elements
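The two XPath examples can be tried with Python's standard `xml.etree.ElementTree`, which supports only a limited subset of XPath. This sketch therefore does the attribute test and the ancestor walk in plain Python. The sample document is hypothetical and was chosen only to mirror the BBB/GGG names on the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical document mirroring the slides' BBB/GGG examples.
DOC = """<AAA>
  <BBB id="b1"/>
  <BBB name="bbb"/>
  <BBB/>
  <CCC><DDD><GGG/></DDD></CCC>
</AAA>"""

root = ET.fromstring(DOC)

# //BBB[@*] : BBB elements that carry at least one attribute.
bbb_with_attrs = [e for e in root.iter("BBB") if e.attrib]

# //GGG/ancestor::* : ElementTree keeps no parent pointers,
# so first build a child -> parent map.
parent = {child: p for p in root.iter() for child in p}

def ancestors(elem):
    """Return the ancestors of elem, nearest first."""
    out = []
    while elem in parent:
        elem = parent[elem]
        out.append(elem)
    return out

ggg_ancestors = [a.tag for g in root.iter("GGG") for a in ancestors(g)]
print(len(bbb_with_attrs), ggg_ancestors)  # 2 ['DDD', 'CCC', 'AAA']
```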

Slide 14: XML Schema ...

Slide 15: XML Schema (cont'd)
• Allows keys to be specified in terms of XPath expressions
• BUT
  – XPath is a relatively complex language: it can move down, sideways and upwards, and predicates and functions can be embedded
  – equivalence/containment of XPath expressions is unresolved, so there is no efficient way to tell whether two keys are equivalent
  – value equality is restricted to text
  – relative keys are not addressed
  – structural requirement: key paths must exist and be unique

Slide 16: A new key constraint language for XML
• Powerful enough to express absolute and relative keys
• Simple enough to be reasoned about efficiently
  – equivalence/containment
  – consistency (satisfiability)
  – implication (keys derived from others)
• Captures the semistructured nature of XML data:
  – independent of any types/schema
  – no structural requirements: tolerates missing/multiple key paths

Slide 17: Outline
• Node addresses: testing whether 2 nodes are the same node
• Value equality: testing whether 2 nodes have the same value
• Path expression language
• Absolute key
• Key inference
• Relative key
• Strong key
• Some issues

Slide 18: Tree representation
• DOM (Document Object Model)
• A document is a hierarchical structure of nodes
  – element nodes
  – attribute nodes
  – text nodes

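Slide 19: Tree representation (cont'd)
[Figure: a sample XML document rooted at db that describes two composers, J.S. Bach (born 1685; a work titled "Ich habe genug") and G.F. Handel (a work titled "Art Thou Troubled?")]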

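Slide 20: Tree representation (cont'd)
[Figure: the same document as a tree. The root db has two composer children with names "J.S. Bach" and "G.F. Handel"; the first also has born "1685". Their work children carry num values "BWV82", "BWV552" and "HWV19" and titles "Ich habe genug" and "Art Thou Troubled?"; a period node with the value "Baroque" also appears.]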

Slide 21: Tree representation (cont'd)
• Attribute node: name + text; terminal
• Text node: text; terminal
• Element node:
  – has a name and may have children
  – text and element children are held in an array; the index in the array is determined by the order of the subelement in the document
  – attribute children are held in a dictionary; the name of the attribute is used as the index
• Edge labels uniquely identify children

Slide 22: Node address
• A path of edge labels from the root uniquely identifies a node
• An attribute node can only occur at the end of a node address
• The order of attributes is unimportant
• The order of subelements is specified by their indexes
• Address of a subnode relative to a node
  – any subnode of the node with address a has a node address of the form a.b, where b is the address of the subnode relative to a
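Here is a minimal Python sketch of node addresses. It rests on our own assumptions: `Node`, `E`, `T` and `addresses` are hypothetical helpers, element and text children are numbered by 1-based position, and attribute edges are labelled `"@name"`, so an attribute can only be the last step of an address. The final check illustrates the relative-address property: every subnode of the `work` node has an address that begins with that node's address.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str | None = None                              # element name; None for a text node
    text: str | None = None                              # content of a text node
    kids: list[Node] = field(default_factory=list)       # ordered element/text children
    attrs: dict[str, str] = field(default_factory=dict)  # unordered attribute children

def E(name, *kids, **attrs):
    return Node(name=name, kids=list(kids), attrs=attrs)

def T(text):
    return Node(text=text)

def addresses(n, prefix=()):
    """Yield (address, label) for n and all of its subnodes.

    Element/text children are addressed by 1-based position and attributes
    by "@name", so an attribute label can only be the last step.
    """
    yield prefix, n.name if n.name is not None else repr(n.text)
    for a in sorted(n.attrs):
        yield prefix + ("@" + a,), "@" + a
    for i, kid in enumerate(n.kids, start=1):
        yield from addresses(kid, prefix + (i,))

# Hypothetical fragment modeled loosely on the composer example.
doc = E("db",
        E("composer",
          E("name", T("J.S. Bach")),
          E("work", E("title", T("Ich habe genug")), num="BWV82")))

addrs = dict(addresses(doc))
work = doc.kids[0].kids[1]                       # the node at address (1, 2)
# Subnode address = work's address (1, 2) followed by the relative address.
relative = dict(addresses(work, prefix=(1, 2)))
assert relative == {a: l for a, l in addrs.items() if a[:2] == (1, 2)}
```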

Slide 23: Value equality
• Value of a node:
  1. a set S of relative addresses of its subnodes
  2. a partial function from S to names
  3. a partial function from S to texts
• 2 nodes are value-equal if they agree on 1, 2 and 3
• Notation: a =v b
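Value equality can be sketched by computing, for each node, the map from relative addresses (the set S) to (name, text) pairs and then comparing the maps. The helpers below are hypothetical, and `None` stands for "undefined" in the two partial functions. Subelements are addressed by position and attributes by name, so child order matters but attribute order does not.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str | None = None                              # element name; None for a text node
    text: str | None = None                              # content of a text node
    kids: list[Node] = field(default_factory=list)       # ordered children
    attrs: dict[str, str] = field(default_factory=dict)  # unordered attributes

def E(name, *kids, **attrs):
    return Node(name=name, kids=list(kids), attrs=attrs)

def T(text):
    return Node(text=text)

def value(n, prefix=()):
    """Map every relative address in S to (name, text); None means 'undefined'."""
    v = {prefix: (n.name, n.text)}
    for a, t in n.attrs.items():
        v[prefix + ("@" + a,)] = ("@" + a, t)     # attribute node: name + text
    for i, kid in enumerate(n.kids, start=1):
        v.update(value(kid, prefix + (i,)))
    return v

def value_equal(a, b):
    """a =v b: same set S, same name function, same text function."""
    return value(a) == value(b)

george1 = E("name", E("firstName", T("George")), E("lastName", T("Bush")))
george2 = E("name", E("firstName", T("George")), E("lastName", T("Bush")))
swapped = E("name", E("lastName", T("Bush")), E("firstName", T("George")))

print(value_equal(george1, george2))   # True: distinct nodes, equal values
print(value_equal(george1, swapped))   # False: subelement order matters
print(value_equal(E("w", a="1", b="2"), E("w", b="2", a="1")))  # True
```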

Slide 24: Value equality (example)
[Figure: under db, two name nodes, each with firstName "George" and lastName "Bush"; both have the same set S = {., …} of relative addresses and the same names and texts, so they are value-equal]

Slide 25: Path expressions
• How do we identify nodes in a tree?
• An expression involving node names (tags and attributes) that describes a set of paths in the document tree
  – XPath (XML Schema)
  – regular expressions (semistructured data)

Slide 26: Regular path expressions
[Figure: a db tree with emps and depts subtrees containing emp, dept, mgr and name nodes, with the names "Mary", "John" and "Bill"]
In the usual syntax of regular expressions:
• db.emps.emp
• db.(depts.dept.mgr | emps.emp)
• db._*.name
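One way to make these expressions concrete is to write each node's label path as a string such as `db/emps/emp/` and translate the path expression into a Python regular expression. This is our own illustration, not the paper's algorithm. The tree is hypothetical because the slide's figure did not survive, and `to_regex`, `label_paths` and `select` are illustrative names.

```python
import re

# Tokens of the path syntax: labels, "_", ".", "|", "(", ")", "*".
TOKEN = re.compile(r"\s*(?:([A-Za-z@][\w@-]*)|(\S))")

def to_regex(expr):
    """Translate a regular path expression into a regex over 'a/b/c/' strings."""
    parts = []
    for label, sym in TOKEN.findall(expr.strip()):
        if label:
            parts.append("(?:" + re.escape(label) + "/)")  # one edge with this label
        elif sym == "_":
            parts.append("(?:[^/]+/)")                     # one edge, any label
        elif sym == "(":
            parts.append("(?:")
        elif sym in ")|*":
            parts.append(sym)
        elif sym != ".":                                   # "." is just concatenation
            raise ValueError(f"unexpected symbol {sym!r} in {expr!r}")
    return re.compile("".join(parts))

# Hypothetical tree (text values omitted):
# db -> emps -> emp -> name (twice);  db -> depts -> dept -> mgr -> name
TREE = ("db", [("emps", [("emp", [("name", [])]), ("emp", [("name", [])])]),
               ("depts", [("dept", [("mgr", [("name", [])])])])])

def label_paths(node, prefix=""):
    """Yield the label path 'db/.../x/' of every node, one entry per node."""
    label, kids = node
    here = prefix + label + "/"
    yield here
    for kid in kids:
        yield from label_paths(kid, here)

def select(expr, tree=TREE):
    rx = to_regex(expr)
    return [p for p in label_paths(tree) if rx.fullmatch(p)]

for expr in ["db.emps.emp", "db.(depts.dept.mgr | emps.emp)", "db._*.name"]:
    print(expr, "->", len(select(expr)), "nodes")
```

Because `_*` becomes `(?:[^/]+/)*`, it also matches zero edges, so `db._*.name` would match a `name` child directly under `db`.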

Slide 27: Language for path expressions
• 2 necessary properties:
  – a concatenation operation; XPath has no uniform presentation of it: concatenating a/b with //c/d must be written a/b//c/d
  – a path should only move down the tree (unlike XPath's navigation axes)

Slide 28: Language for path expressions (cont'd)
• Empty path: ε (XPath ".")
• Node name (tag or attribute name)
• Wildcard "_": any single node name (XPath "*")
• Arbitrary path "_*" (XPath "//")
• Concatenation of paths P and Q: P.Q (XPath "/")
• Notation
  – n[P]: the set of nodes (node addresses) reached by starting at node n and following a path that conforms to P
  – [P] := root[P]
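Below is a sketch of how n[P] could be evaluated for this path language. It assumes a simplified tree with element and text nodes only (attributes such as `num` are modeled as child elements), dot-separated path strings, and 1-based child positions as addresses. The sample document is a hypothetical rendering of the composer example, and the helper names are ours.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str                                        # element name, or "#text"
    text: str = ""                                   # content of a text node
    kids: list[Node] = field(default_factory=list)   # ordered children

def E(name, *kids):
    return Node(name, kids=list(kids))

def T(text):
    return Node("#text", text=text)

def eval_path(node, path, addr=()):
    """n[P]: map address -> node for every node reached from `node` along P.

    P is dot-separated over node names, "_" (any single name) and "_*"
    (any path, including the empty one); "ε" or "" is the empty path.
    Addresses are 1-based child positions appended to `addr`.
    """
    frontier = {addr: node}
    for step in (s for s in path.split(".") if s not in ("", "ε")):
        nxt = {}
        for a, n in frontier.items():
            if step == "_*":                         # n itself and all descendants
                stack = [(a, n)]
                while stack:
                    b, m = stack.pop()
                    nxt[b] = m
                    stack.extend((b + (i,), k) for i, k in enumerate(m.kids, 1))
            else:
                for i, k in enumerate(n.kids, 1):
                    if step in ("_", k.name):
                        nxt[a + (i,)] = k
        frontier = nxt
    return frontier

# Hypothetical document; num is modeled as a child element.
doc = E("db",
        E("composer", E("name", T("J.S. Bach")), E("born", T("1685")),
          E("work", E("title", T("Ich habe genug")), E("num", T("BWV82"))),
          E("work", E("num", T("BWV552")))),
        E("composer", E("name", T("G.F. Handel")),
          E("work", E("title", T("Art Thou Troubled?")), E("num", T("HWV19")))))

for p in ["title", "composer.work", "composer._", "_*.num"]:
    print(f"[{p}] has {len(eval_path(doc, p))} node(s)")
```

[P] is simply `eval_path(doc, P)`. Passing a starting node and its address gives n[P] with absolute addresses.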

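Slide 29: Examples
• Simple paths
  – [title] = { }
  – [composer.work] = the addresses of the three work nodes
• Complex paths
  – [_*] = the addresses of all nodes
  – [composer._] = the addresses of the six children of composer nodes
  – [_*.num] = the addresses of the three num nodes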

Slide 30: Absolute key

Slide 31: Key specification
It is necessary to specify
  – the set on which we are defining the key (cf. the relation)
  – the "attributes" (cf. a set of column names)
• A pair (Q, {P1, …, Pn})
  – target path Q: a path expression for the target set on which the key constraint is to hold
  – key paths {P1, …, Pn}: a set of simple path expressions

Slide 32: Key specification (cont'd)
  – Target path Q
  – Key paths {P1, …, Pn}
• For any node n in [Q], there is a set of nodes n[Pi] found by following Pi from n (possibly empty)
• Examples
  1. (person.employees, {name.firstname, name.lastname})
  2. (composer, {name})
  3. (composer, {born})

Slide 33: Formal definition
A node n satisfies a key specification (Q, {P1, …, Pk}) iff for any n1, n2 in n[Q], if for all i, 1 ≤ i ≤ k, there exist z1 in n1[Pi] and z2 in n2[Pi] such that z1 =v z2, then n1 = n2:
$$\forall n_1, n_2 \in n[Q]:\ \Big(\forall i \in \{1,\dots,k\}\ \exists z_1 \in n_1[P_i],\ z_2 \in n_2[P_i]:\ z_1 =_v z_2\Big) \Rightarrow n_1 = n_2$$
• Value equality: z1 =v z2
• Node equality: 2 nodes are equal (n1 = n2) if they have the same node address
• The values associated with the key paths uniquely identify a node in the target set
• Keys are not part of the schema; they constrain the data
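The definition can be checked directly, if inefficiently, by comparing every pair of target nodes. This sketch assumes a simplified tree (element and text nodes only), dot-separated paths, and hypothetical helpers `eval_path`, `value` and `satisfies_key`. It mirrors the existential reading of the definition: a missing key path never causes a violation, and an empty set of key paths allows at most one target node.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str                                        # element name, or "#text"
    text: str = ""                                   # content of a text node
    kids: list[Node] = field(default_factory=list)   # ordered children

def E(name, *kids):
    return Node(name, kids=list(kids))

def T(text):
    return Node("#text", text=text)

def eval_path(node, path):
    """n[P] as {relative address: node}; P uses names, "_", "_*" and "ε"."""
    frontier = {(): node}
    for step in (s for s in path.split(".") if s not in ("", "ε")):
        nxt = {}
        for a, n in frontier.items():
            if step == "_*":                          # n itself and every descendant
                stack = [(a, n)]
                while stack:
                    b, m = stack.pop()
                    nxt[b] = m
                    stack.extend((b + (i,), k) for i, k in enumerate(m.kids, 1))
            else:
                nxt.update((a + (i,), k) for i, k in enumerate(n.kids, 1)
                           if step in ("_", k.name))
        frontier = nxt
    return frontier

def value(n, prefix=()):
    """Hashable value of a node: the set of (relative address, name, text)."""
    out = {(prefix, n.name, n.text)}
    for i, k in enumerate(n.kids, 1):
        out |= value(k, prefix + (i,))
    return frozenset(out)

def values_at(n, P):
    return {value(z) for z in eval_path(n, P).values()}

def satisfies_key(n, Q, key_paths):
    """Does node n satisfy (Q, {P1, ..., Pk})? Distinct addresses = distinct nodes."""
    targets = list(eval_path(n, Q).values())
    for i, n1 in enumerate(targets):
        for n2 in targets[i + 1:]:
            # Clash: for every Pi, some z1 in n1[Pi] is value-equal to some z2 in n2[Pi].
            if all(values_at(n1, P) & values_at(n2, P) for P in key_paths):
                return False
    return True

doc = E("db",
        E("composer", E("name", T("J.S. Bach")), E("born", T("1685"))),
        E("composer", E("name", T("G.F. Handel"))))

print(satisfies_key(doc, "composer", ["name"]))  # True
print(satisfies_key(doc, "composer", ["born"]))  # True: Handel has no born, so no clash
print(satisfies_key(doc, "composer", []))        # False: empty key allows one composer

# Multiple nodes at the end of a key path: both A's have a B with value "2".
multi = E("r", E("A", E("B", T("1")), E("B", T("2"))),
               E("A", E("B", T("2")), E("B", T("3"))))
print(satisfies_key(multi, "A", ["B"]))          # False
```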

Slide 34: Remarks
• For any n1, n2 in [Q], if Pi is missing at either n1 or n2, then n1[Pi] and n2[Pi] are by definition disjoint
• Multiple nodes: [Figure: A elements with several B subelements] The document does not satisfy the key (A, {B}) with respect to the root.

Slide 35: Examples of keys
• (_*.person, {id})
  – no 2 person elements share an id value
• (person, {ε})
  – any 2 person nodes immediately under the root have different values
• (employee, {})
  – the empty key: there is at most one employee under the root
• (_*, {id})
  – any 2 nodes differ on their id fields, up to value equality
  – the semantics of the ID attribute in the XML standard

Slide 36: XML vs. relational
XML: paths that define keys
  – need not exist (null-valued keys)
  – do not have to be unique
  – key paths specify a set of addresses within a document
Relational DB: key values
  – cannot be null; must exist
  – have to be unique
  – 1NF requires each component of every tuple to be an atomic value, not a set

Slide 37: Remarks
• Equivalence of 2 path expressions is decidable
• Given a definition of equality on trees, do we need more than one key path in a key specification?
  – all key attributes would have to be represented as subnodes of some node
  – this node would have to be constrained to contain only those subnodes
  – too restrictive: unnecessary interference between key specifications and data models
• Allow a (possibly empty) set of nodes at the end of each key path
  – how can we require each key path to exist and to be unique?

Slide 38: Remarks (cont'd)
• The language of path expressions
  – something more powerful is needed to express the target path of (person.(mother | father)*, {id}): a person element followed by zero or more father or mother elements
• This is a provisional language of path expressions
• Changing it does not affect the theory

Slide 39: Key inference
• In a relational DB, some keys can be inferred from the presence of others
• If (Q, S) is a key and S ⊆ S′, then so is (Q, S′)
  – counterpart of the relational inference rule
• If (Q.Q′, {P}) is a key, then so is (Q, {Q′.P})
  – tree-like structure: if a node is identified in a tree, then its ancestors are also determined, i.e. if a key path P uniquely identifies a node n in [Q.Q′], then Q′.P is a key path for the ancestor of n in [Q]
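The two inference rules on this slide can be sketched as syntactic operations on keys, represented here as `(target path, frozenset of key paths)` pairs. This is an illustration only. `superkey` and `push_target_into_key` are hypothetical names, and the second rule is applied to simple dot-separated target paths, where going back up Q′ from a node reaches exactly one ancestor.

```python
def superkey(key, extra_paths):
    """Rule 1: if (Q, S) is a key and S ⊆ S', then (Q, S') is a key."""
    Q, S = key
    return (Q, S | frozenset(extra_paths))

def push_target_into_key(key):
    """Rule 2: from (Q.Q', {P}) derive (Q, {Q'.P}) for every split with Q' non-empty.

    Sketch assumption: the target is a simple path of dot-separated names.
    """
    target, key_paths = key
    if len(key_paths) != 1:          # the rule is stated for a single key path
        return []
    (P,) = key_paths
    steps = target.split(".")
    derived = []
    for cut in range(len(steps)):
        Q = ".".join(steps[:cut]) or "ε"
        Q_prime = ".".join(steps[cut:])
        new_path = Q_prime if P == "ε" else Q_prime + "." + P
        derived.append((Q, frozenset({new_path})))
    return derived

chapter_key = ("book.chapter", frozenset({"number"}))
for k in push_target_into_key(chapter_key):
    print(k)
# ('ε', frozenset({'book.chapter.number'}))
# ('book', frozenset({'chapter.number'}))
```

The split with Q = ε yields a key on the single root node, which holds trivially. The useful result is (book, {chapter.number}): if chapter numbers are unique across the document, a chapter number also identifies the book that contains it.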

Slide 40: Key inference (cont'd)
• If (Q, S) is a key and Q′ ⊆ Q (i.e. [Q′] ⊆ [Q]), then (Q′, S) is also a key
  – any key of the set [Q] is also a key for any subset of [Q]
• For any finite set Σ of keys, there exists a finite XML document satisfying Σ
  – key paths may be missing, e.g. (_*, {id}): if key paths were required to exist at all nodes specified by the target path, a document satisfying this key would have to be infinite
  – this only holds in the absence of DTDs

Slide 41: Key inference (cont'd)
• Key K = (X, {})
• DTD D: the root foo is required to have two X children
• No XML document both conforms to D and satisfies K
• DTDs interact with XML key constraints

Slide 42: Relative key

Slide 43: Relative key: motivation
• Motivated by scientific data formats: hierarchical structure, a large set of entries at the top level
• Protein sequence database Swiss-Prot
  – an accession number (key) for each entry
  – within each entry, a sequence of citations, each identified by a number 1, 2, 3, …
• Linguistic database: recordings of speech
  – data sets held in files
  – metadata provided by the directory structure
  – /timit/train/dr1/fcjf0/sa1.wav
  – TIMIT corpus, training set, dialect region 1, female speaker, speaker-ID "cjf0", sentence text "sa1", speech waveform file

Slide 44: An absolute key for books
An absolute key to identify a book: (book, {title})
• target path: book, starting from the root and identifying a collection of books
• key path: title; its value uniquely identifies a book
• absolute: defined on the entire document
[Figure: the book tree: books "XML" and "SGML" with numbered chapters and sections]

Slide 45: Relative key: definition
• Like the key of a weak entity set in a DB: Studios(name, address), Crews(number)
• A document satisfies a relative key specification (Q, (Q′, S)) iff for all nodes n in [Q], n satisfies the key (Q′, S)
• Absolute keys are a special case of relative keys: (Q′, S) is equivalent to (ε, (Q′, S))

Slide 46: A relative key for chapters
A relative key: (book, (chapter, {number}))
A chapter number uniquely identifies a chapter within a book!
• context path: book
• target path: chapter, starting at a book
• key path: number
• relative: defined on sub-documents, relative to the context
[Figure: the book tree: books "XML" and "SGML" with numbered chapters and sections]

Slide 47: Absolute vs. relative key
• What is the difference between
  – the absolute key (book.chapter, {number})
  – the relative key (book, (chapter, {number}))?
• The absolute key requires chapter numbers to be unique across the whole document; the relative key requires this only within each book. In this tree, where both books have a chapter 1, only the relative key holds.
[Figure: the book tree: books "XML" and "SGML" with numbered chapters and sections]
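The difference can be tested on a small document. In this hypothetical sketch (simplified tree, simple dot-separated paths, helper names of our own), the relative key is checked by evaluating the key (Q′, S) at every node in [Q], following the definition of relative keys. Both books have a chapter numbered 1, so the relative key holds while the absolute one fails.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str                                        # element name, or "#text"
    text: str = ""                                   # content of a text node
    kids: list[Node] = field(default_factory=list)   # ordered children

def E(name, *kids):
    return Node(name, kids=list(kids))

def T(text):
    return Node("#text", text=text)

def eval_path(node, path):
    """n[P] for a simple path of dot-separated names ("ε" = empty path)."""
    frontier = {(): node}
    for step in (s for s in path.split(".") if s not in ("", "ε")):
        frontier = {a + (i,): k for a, n in frontier.items()
                    for i, k in enumerate(n.kids, 1) if k.name == step}
    return frontier

def value(n, prefix=()):
    """Hashable value of a node: the set of (relative address, name, text)."""
    out = {(prefix, n.name, n.text)}
    for i, k in enumerate(n.kids, 1):
        out |= value(k, prefix + (i,))
    return frozenset(out)

def satisfies_key(n, Q, S):
    """Key (Q, S) evaluated at context node n."""
    targets = list(eval_path(n, Q).values())
    for i, n1 in enumerate(targets):
        for n2 in targets[i + 1:]:
            if all({value(z) for z in eval_path(n1, P).values()} &
                   {value(z) for z in eval_path(n2, P).values()} for P in S):
                return False
    return True

def satisfies_relative(doc, Q, Q_prime, S):
    """(Q, (Q', S)): every node in [Q] satisfies the key (Q', S)."""
    return all(satisfies_key(n, Q_prime, S) for n in eval_path(doc, Q).values())

def chapter(num):
    return E("chapter", E("number", T(num)))

# Hypothetical version of the slide's tree: both books have chapters 1 and 10.
doc = E("db",
        E("book", E("title", T("XML")), chapter("1"), chapter("10")),
        E("book", E("title", T("SGML")), chapter("1"), chapter("10")))

print(satisfies_relative(doc, "book", "chapter", ["number"]))  # True
print(satisfies_key(doc, "book.chapter", ["number"]))          # False
print(satisfies_key(doc, "book", ["title"]))                   # True
```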

Slide 48: A relative key for sections
Key: (book.chapter, (section, {number}))
A section number uniquely identifies a section within a particular chapter of a particular book: the key is relative to the chapter containing the section, and to the book containing the chapter.
[Figure: the book tree: books "XML" and "SGML" with numbered chapters and sections]

Slide 49: Transitivity of relative keys
• A relative key such as (bible.book.chapter, (verse, {number})) does not, on its own, uniquely identify a particular verse in the Bible
• Book name, chapter number, verse number ⇒ verse

Slide 50: The "immediately precedes" relation
(Q1, (Q′1, S1)) immediately precedes (Q2, (Q′2, S2)) if Q2 = Q1.Q′1
  – (bible, (book, {name})) immediately precedes (bible.book, (chapter, {number}))
  – any absolute key immediately precedes itself

Slide 51: The "precedes" relation
"Precedes" is the transitive closure of the "immediately precedes" relation
  – Qn = Q1.Q′1. … .Q′n-1
  – e.g. (bible, (book, {name})), (bible.book, (chapter, {number})), (bible.book.chapter, (verse, {number}))

Slide 52: Transitivity of relative keys
• A set Σ of relative keys is transitive if for any relative key K1 = (Q1, (Q′1, S1)) in Σ there is a key K2 = (ε, (Q′2, S2)) in Σ which precedes K1
• Any transitive set of relative keys must contain some absolute key

Slide 53: Transitivity of relative keys: example
A transitive set:
  (ε, (bible.book, {name}))
  (bible.book, (chapter, {number}))
  (bible.book.chapter, (verse, {number}))
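The "precedes" relation and the transitivity test can be sketched as a reachability search over the set Σ. Keys are hypothetical `RelKey(Q, Qp, S)` tuples with `"ε"` for the empty path. The self-loop for absolute keys follows the convention stated for "immediately precedes".

```python
from typing import NamedTuple

class RelKey(NamedTuple):
    Q: str          # context path ("ε" for an absolute key)
    Qp: str         # target path Q', relative to the context
    S: frozenset    # key paths

def K(Q, Qp, *S):
    return RelKey(Q, Qp, frozenset(S))

def concat(p, q):
    """Path concatenation with ε as the identity."""
    if p == "ε":
        return q
    if q == "ε":
        return p
    return p + "." + q

def immediately_precedes(k1, k2):
    # Q2 = Q1.Q1'; by convention an absolute key also immediately precedes itself.
    return k2.Q == concat(k1.Q, k1.Qp) or (k1 == k2 and k1.Q == "ε")

def precedes(sigma, k1, k2):
    """Transitive closure of immediately_precedes, restricted to keys in sigma."""
    todo, seen = [k1], {k1}
    while todo:
        k = todo.pop()
        for nxt in sigma:
            if immediately_precedes(k, nxt):
                if nxt == k2:
                    return True
                if nxt not in seen:
                    seen.add(nxt)
                    todo.append(nxt)
    return False

def is_transitive(sigma):
    """Every key in sigma is preceded by some absolute key (ε, (Q', S)) in sigma."""
    absolute = [k for k in sigma if k.Q == "ε"]
    return all(any(precedes(sigma, a, k) for a in absolute) for k in sigma)

bible = [K("ε", "bible.book", "name"),
         K("bible.book", "chapter", "number"),
         K("bible.book.chapter", "verse", "number")]

print(is_transitive(bible))      # True
print(is_transitive(bible[1:]))  # False: no absolute key at the top
```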

Slide 54: Insertion-friendly relative keys
• Transitive key specification:
  (ε, (university, {name}))
  (university, (dept.employee, {emp-id}))
• Identify an employee: university name + emp-id
• Adding an employee requires specifying a dept for the employee
• But there is no way to identify a dept
  – many ways to add an employee!!!

Slide 55: Insertion-friendly relative keys (cont'd)
• Insert an element into the "keyed" part of the document unambiguously, by using keys to specify where to insert it
• A set Σ of relative keys is insertion-friendly if it is transitive and, whenever (Q1, (Q′1.n, S1)) ∈ Σ, there is a relative key (Q2, (Q′2, S2)) ∈ Σ where |Q′2| > 0 and Q1.Q′1 = Q2.Q′2
  – n is a node name
• Every element reached along a prefix of the path Q1.Q′1 can be identified through some keys

Slide 56: Insertion-friendly relative keys (cont'd)
  (ε, (university, {name}))
  (university, (dept, {dept-name}))
  (university, (dept.employee, {emp-id}))
  n = employee
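Insertion-friendliness can be checked on top of a transitivity test. This sketch uses a small hypothetical `RelKey(Q, Qp, S)` representation with `"ε"` for the empty path. It reads the condition as applying only when Q′1 is non-empty. That reading is our assumption, made so that a top-level key such as (ε, (university, {name})) does not fail the test.

```python
from typing import NamedTuple

class RelKey(NamedTuple):
    Q: str          # context path ("ε" for an absolute key)
    Qp: str         # target path Q', relative to the context
    S: frozenset    # key paths

def K(Q, Qp, *S):
    return RelKey(Q, Qp, frozenset(S))

def concat(p, q):
    """Path concatenation with ε as the identity."""
    if p == "ε":
        return q
    if q == "ε":
        return p
    return p + "." + q

def immediately_precedes(k1, k2):
    return k2.Q == concat(k1.Q, k1.Qp) or (k1 == k2 and k1.Q == "ε")

def precedes(sigma, k1, k2):
    todo, seen = [k1], {k1}
    while todo:
        k = todo.pop()
        for nxt in sigma:
            if immediately_precedes(k, nxt):
                if nxt == k2:
                    return True
                if nxt not in seen:
                    seen.add(nxt)
                    todo.append(nxt)
    return False

def is_transitive(sigma):
    absolute = [k for k in sigma if k.Q == "ε"]
    return all(any(precedes(sigma, a, k) for a in absolute) for k in sigma)

def insertion_friendly(sigma):
    """Transitive, and for every (Q1, (Q1'.n, S1)) with Q1' non-empty there is
    a key (Q2, (Q2', S2)) with |Q2'| > 0 and Q1.Q1' = Q2.Q2'."""
    if not is_transitive(sigma):
        return False
    for k in sigma:
        steps = k.Qp.split(".")
        if len(steps) < 2:                              # Q1' = ε: nothing more to identify
            continue
        prefix = concat(k.Q, ".".join(steps[:-1]))      # Q1.Q1'
        if not any(k2.Qp != "ε" and concat(k2.Q, k2.Qp) == prefix for k2 in sigma):
            return False
    return True

without_dept = [K("ε", "university", "name"),
                K("university", "dept.employee", "emp-id")]
with_dept = without_dept + [K("university", "dept", "dept-name")]

print(insertion_friendly(without_dept))  # False: no key identifies a dept
print(insertion_friendly(with_dept))     # True
```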

Slide 57: Insertion-friendly relative keys (cont'd)
  (ε, (university, {name}))
  (university, (dept, {dept-name}))
  (university, (dept.employee, {emp-id}))
• Nothing about the dept is necessary to identify employees!!!
• This is the kind of anomaly that occurs in relational databases that are not in second normal form
• Employees should not be children of department nodes, but only of university nodes
• The linkage between employees and departments should be expressed through a foreign key

Slide 58: Notation for relative keys
• If a system of relative keys is transitive, it forms a hierarchical structure
• So we can create a compressed syntax for such systems
• Basic syntactic form: Q1{P1, ..., Pk1}.Q2{P1, ..., Pk2}. ... .Qn{P1, ..., Pkn}

Slide 59: Notation for relative keys (cont'd)
• bible{}.book{name}.chapter{number}.verse{number} stands for
  (ε, (bible, {}))
  (bible, (book, {name}))
  (bible.book, (chapter, {number}))
  (bible.book.chapter, (verse, {number}))
• company{name}[.employee{id}, .department{name}] stands for
  company{name}.employee{id}
  company{name}.department{name}
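The basic (linear) form of the notation can be expanded mechanically into the relative keys it abbreviates. This is a sketch assuming target paths contain no braces, and `expand` is a hypothetical name. The bracketed branching form, such as company{name}[...], is not handled and is rejected with a `ValueError`.

```python
import re

# One segment: an optional leading ".", a target path, and a {key paths} block.
SEGMENT = re.compile(r"\.?([^{}.][^{}]*)\{([^{}]*)\}")

def expand(compact):
    """Expand Q1{...}.Q2{...}....Qn{...} into (context, target, key paths) triples."""
    keys, context, pos = [], "ε", 0
    for m in SEGMENT.finditer(compact):
        if m.start() != pos:
            break                                    # gap: not in the basic form
        target = m.group(1).strip()
        paths = frozenset(p.strip() for p in m.group(2).split(",") if p.strip())
        keys.append((context, target, paths))
        context = target if context == "ε" else context + "." + target
        pos = m.end()
    if pos != len(compact) or not keys:
        raise ValueError(f"not in the basic syntactic form: {compact!r}")
    return keys

for key in expand("bible{}.book{name}.chapter{number}.verse{number}"):
    print(key)
```

A multi-step target such as university{name}.dept.employee{emp-id} also expands as expected, to (university, (dept.employee, {emp-id})) after the top-level key.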

Slide 60: Notation for relative keys (cont'd)
• Compact and understandable
• Ensures the internal consistency of the document
• Tells others how to cite a component of our document
• Our document has a structured "core"

Slide 61: Strong keys

Slide 62: Stronger definitions of keys
• Requirements imposed by a key in a relational DB:
  – uniqueness of the key
  – existence of the key
• Key paths exist and are unique (for 1 ≤ i ≤ n, n[Pi] contains exactly one node)
  – e.g. name is unique at the J.S. Bach composer node, while work and num are not unique at this node

Slide 63: Stronger definitions of keys (cont'd)
A node n satisfies a strong key specification (Q, {P1, …, Pk}) if
  – for all n′ in n[Q] and for all Pi, Pi exists and is unique at n′
  – for any n1, n2 in n[Q], if for all i, n1[Pi] =v n2[Pi], then n1 = n2
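Here is a sketch of the strong-key check, under the same kind of simplified assumptions (element and text nodes only, simple dot-separated paths, hypothetical helper names). Every key path must reach exactly one node, so the pairwise test reduces to checking that the per-target tuples of key values are all distinct.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str                                        # element name, or "#text"
    text: str = ""                                   # content of a text node
    kids: list[Node] = field(default_factory=list)   # ordered children

def E(name, *kids):
    return Node(name, kids=list(kids))

def T(text):
    return Node("#text", text=text)

def eval_path(node, path):
    """n[P] for a simple path of dot-separated names ("ε" = empty path)."""
    frontier = {(): node}
    for step in (s for s in path.split(".") if s not in ("", "ε")):
        frontier = {a + (i,): k for a, n in frontier.items()
                    for i, k in enumerate(n.kids, 1) if k.name == step}
    return frontier

def value(n, prefix=()):
    """Hashable value of a node: the set of (relative address, name, text)."""
    out = {(prefix, n.name, n.text)}
    for i, k in enumerate(n.kids, 1):
        out |= value(k, prefix + (i,))
    return frozenset(out)

def satisfies_strong_key(n, Q, S):
    """Strong key (Q, S) at node n: key paths exist, are unique, and determine the node."""
    targets = list(eval_path(n, Q).values())
    # Existence and uniqueness: t[Pi] holds exactly one node for every target t.
    if any(len(eval_path(t, P)) != 1 for t in targets for P in S):
        return False
    # Each key path now reaches one node, so compare one value tuple per target.
    key_values = [tuple(value(next(iter(eval_path(t, P).values()))) for P in S)
                  for t in targets]
    return len(set(key_values)) == len(key_values)

doc = E("db",
        E("composer", E("name", T("J.S. Bach")), E("born", T("1685")),
          E("work", E("num", T("BWV82"))), E("work", E("num", T("BWV552")))),
        E("composer", E("name", T("G.F. Handel")),
          E("work", E("num", T("HWV19")))))

print(satisfies_strong_key(doc, "composer", ["name"]))  # True
print(satisfies_strong_key(doc, "composer", ["work"]))  # False: Bach has two works
print(satisfies_strong_key(doc, "composer", ["born"]))  # False: Handel has no born
```

Under the weaker definition, (composer, {born}) holds because a missing key path never clashes. The existence requirement is what makes it fail here.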

Slide 64: Stronger definitions of keys (cont'd)
• (_*.person, {id})
  – every person element has exactly one id, and any 2 person elements differ on it
• (person, {ε})
  – unchanged
• (employees, {})
  – unchanged

Slide 65: Stronger definitions of keys (cont'd)
• (_*, {k})
  – every element has a key k, including elements whose name is k
• Finite satisfiability?
  – this imposes an infinite chain of k nodes, so no finite document satisfies it
  – the cause is the requirement that key paths exist, which makes the key a structural constraint

Slide 66: Relative strong key
A document satisfies a strong relative key specification (Q, (Q′, S)) iff for all nodes n in [Q], n satisfies the strong key (Q′, S)

Slide 67: "Unconstrained" XML: node names as key values

Slide 68: Node names as key values
• Key specifications must cover the practical cases without using definitions that are too complex to allow any kind of reasoning about keys
• An issue in "unconstrained" XML: interchanging structure (the names) with data (their values)

Slide 69: "Unconstrained" XML
[Figure: a parts element whose children are two widget elements and a gadget element]

Slide 70: Node names as key values (cont'd)
• "Unconstrained" XML
  – the type of a part is expressed in the tag
  – key constraint: parts{}[.widget{id}, .gadget{id}]
• Alternative XML representation
  – the type is expressed as an attribute or subelement of a part element
  – key constraint: parts{}[.part{type, id}]

Slide 71: Introducing a new part type
• Introduce a thingy
• "Unconstrained":
  – the key specification must change: parts{}[.widget{id}, .gadget{id}, .thingy{id}]
• Alternative:
  – no change: parts{}[.part{type, id}]
• The ability to interchange structure and data is supposed to be one of the strong points of semistructured data and XML

Slide 72: Solution
• Add a "virtual" subelement node-name to each named node, whose value is the node's name
• Key: parts{}._{node-name, id}
• Does not alter any of the properties expected to hold for keys
• Accounts for any practical use of tag names in keys
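The node-name idea can be sketched by materializing the virtual subelement in a copy of the tree. Everything here is illustrative: the helpers are hypothetical, and the key parts{}._{node-name, id} is checked as the relative key (parts, (_, {node-name, id})). The wildcard also matches the virtual node-name child of parts itself, but that node has no id, so it never clashes.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str                                        # element name, or "#text"
    text: str = ""                                   # content of a text node
    kids: list[Node] = field(default_factory=list)   # ordered children

def E(name, *kids):
    return Node(name, kids=list(kids))

def T(text):
    return Node("#text", text=text)

def eval_path(node, path):
    """n[P] for dot-separated names and the wildcard "_" ("ε" = empty path)."""
    frontier = {(): node}
    for step in (s for s in path.split(".") if s not in ("", "ε")):
        frontier = {a + (i,): k for a, n in frontier.items()
                    for i, k in enumerate(n.kids, 1) if step in ("_", k.name)}
    return frontier

def value(n, prefix=()):
    """Hashable value of a node: the set of (relative address, name, text)."""
    out = {(prefix, n.name, n.text)}
    for i, k in enumerate(n.kids, 1):
        out |= value(k, prefix + (i,))
    return frozenset(out)

def satisfies_key(n, Q, S):
    targets = list(eval_path(n, Q).values())
    for i, n1 in enumerate(targets):
        for n2 in targets[i + 1:]:
            if all({value(z) for z in eval_path(n1, P).values()} &
                   {value(z) for z in eval_path(n2, P).values()} for P in S):
                return False
    return True

def satisfies_relative(doc, Q, Q_prime, S):
    return all(satisfies_key(n, Q_prime, S) for n in eval_path(doc, Q).values())

def with_node_names(n):
    """Copy of the tree in which every named (non-text) node gets a virtual
    'node-name' child whose text is the node's own name."""
    if n.name == "#text":
        return n
    kids = [with_node_names(k) for k in n.kids]
    return Node(n.name, kids=kids + [E("node-name", T(n.name))])

doc = E("db", E("parts", E("widget", E("id", T("1"))),
                         E("gadget", E("id", T("1")))))

print(satisfies_relative(doc, "parts", "_", ["id"]))                 # False
vdoc = with_node_names(doc)
print(satisfies_relative(vdoc, "parts", "_", ["node-name", "id"]))  # True
```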

Slide 73: Conclusion
• A new key constraint language for XML:
  – independent of any schema specifications for XML
  – powerful enough to express absolute and relative keys
  – simple enough to be reasoned about efficiently
• In contrast to their relational counterparts:
  – XML keys are more complex
  – the analyses of XML keys are far more intricate

Slide 74: References
• Peter Buneman, Susan Davidson, Wenfei Fan, Carmem Hara, and Wang-Chiew Tan. Keys for XML. WWW10, 2001.
• Peter Buneman, Susan Davidson, Wenfei Fan, Carmem Hara, and Wang-Chiew Tan. Reasoning about keys for XML. Technical Report MS-CIS-00-26, University of Pennsylvania.