Presentation is loading. Please wait.

Presentation is loading. Please wait.

Lecture 11: XML and Semistructured Data

Similar presentations


Presentation on theme: "Lecture 11: XML and Semistructured Data"— Presentation transcript:

1 Lecture 11: XML and Semistructured Data
Friday, October 22, 2004

2 Outline XML syntax XML semantics
XML v.s. relational data and semistructured data DTD XPath

3 XML Syntax XML describes the content <bibliography>
<book> <title> Foundations… </title> <author> Abiteboul </author> <author> Hull </author> <author> Vianu </author> <publisher> Addison Wesley </publisher> <year> 1995 </year> </book> </bibliography> XML describes the content

4 XML Terminology tags: book, title, author, …
start tag: <book>, end tag: </book> elements: <book>…</book>,<author>…</author> elements are nested empty element: <red></red> abbrv. <red/> an XML document: single root element well formed XML document: if it has matching tags

5 More XML: Attributes <book price = “55” currency = “USD”>
<title> Foundations of Databases </title> <author> Abiteboul </author> <year> 1995 </year> </book>

6 Attributes v.s. Elements
<book price = “55” currency = “USD”> <title> Foundations of Databases </title> <author> Abiteboul </author> <year> 1995 </year> </book> <book price = “55” currency = “USD”> <title> Foundations of Databases </title> <author> Abiteboul </author> <year> 1995 </year> <price> 55 </price> <currency> USD </currency> </book> attributes are alternative ways to represent data

7 Comparison Elements Attributes Ordered Unordered May be repeated
Must be unique May be nested Must be atomic

8 More XML: Oids and References
<person id=“o555”> <name> Jane </name> </person> <person id=“o456”> <name> Mary </name> <children idref=“o123 o555”/> </person> <person id=“o123” mother=“o456”><name>John</name> oids and references in XML are just syntax

9 More XML: CDATA Section
Syntax: <![CDATA[ .....any text here...]]> Example: <example> <![CDATA[ some text here </notAtag> <>]]> </example>

10 More XML: Entity References
Syntax: &entityname; Example: <element> this is less than < </element> Some entities: < > & & &apos; " & Unicode char

11 More XML: Processing Instructions
Syntax: <?target argument?> Example: What do they mean ? <product> <name> Alarm Clock </name> <?ringBell 20?> <price> </price> </product>

12 More XML: Comments Syntax <!-- .... Comment text... -->
Yes, they are part of the data model !!!

13 XML Namespaces http://www.w3.org/TR/REC-xml-names (1/99)
name ::= [prefix:]localpart <book xmlns:isbn=“ <title> … </title> <number> 15 </number> <isbn:number> …. </isbn:number> </book>

14 Belong to this namespace
XML Namespaces syntactic: <number> , <isbn:number> semantic: provide URL for schema <tag xmlns:mystyle = “ <mystyle:title> … </mystyle:title> <mystyle:number> … </tag> Belong to this namespace

15 XML Semantics: a Tree ! Order matters !!! data person person id
Element node Text node Attribute node data <data> <person id=“o555” > <name> Mary </name> <address> <street> Maple </street> <no> 345 </no> <city> Seattle </city> </address> </person> <person> <name> John </name> <address> Thailand </address> <phone> </phone> </data> person person id address name address name phone o555 street no city Mary Thai John 23456 Maple 345 Seattle Order matters !!!

16 XML Data XML is self-describing
Schema elements become part of the data Reational schema: persons(name,phone) In XML <persons>, <name>, <phone> are part of the data, and are repeated many times Consequence: XML is much more flexible XML = semistructured data

17 Mapping Relational Data to XML Data
The canonical mapping: persons row row row Persons phone name phone name phone name “John” 3634 “Sue” 6343 “Dick” 6363 Name Phone John 3634 Sue 6343 Dick 6363 <persons> <row> <name>John</name> <phone> 3634</phone></row> <row> <name>Sue</name> <phone> 6343</phone> <row> <name>Dick</name> <phone> 6363</phone></row> </persons>

18 Mapping Relational Data to XML Data
Application specific mapping <persons> <person> <name> John </name> <phone> 3634 </phone> <order> <date> 2002 </date> <product> Gizmo </product> </order> <order> <date> 2004 </date> <product> Gadget </product> </person> <name> Sue </name> <phone> 6343 </phone> <order> <date> 2004 </date> </persons> Persons Name Phone John 3634 Sue 6343 Orders PersonName Date Product John 2002 Gizmo 2004 Gadget Sue

19 XML is Semi-structured Data
Missing attributes: Could represent in a table with nulls <person> <name> John</name> <phone>1234</phone> </person> <person> <name>Joe</name>  no phone ! name phone John 1234 Joe -

20 XML is Semi-structured Data
Repeated attributes Impossible in tables: <person> <name> Mary</name> <phone>2345</phone> <phone>3456</phone> </person>  two phones ! name phone Mary 2345 3456 ???

21 XML is Semi-structured Data
Attributes with different types in different objects Nested collections (no 1NF) Heterogeneous collections: <db> contains both <book>s and <publisher>s <person> <name> <first> John </first> <last> Smith </last> </name> <phone>1234</phone> </person>  structured name !

22 Document Type Definitions DTD
part of the original XML specification an XML document may have a DTD XML document: well-formed = if tags are correctly closed Valid = if it has a DTD and conforms to it validation is useful in data exchange

23 Very Simple DTD <!DOCTYPE company [
<!ELEMENT company ((person|product)*)> <!ELEMENT person (ssn, name, office, phone?)> <!ELEMENT ssn (#PCDATA)> <!ELEMENT name (#PCDATA)> <!ELEMENT office (#PCDATA)> <!ELEMENT phone (#PCDATA)> <!ELEMENT product (pid, name, description?)> <!ELEMENT pid (#PCDATA)> <!ELEMENT description (#PCDATA)> ]>

24 Very Simple DTD Example of valid XML document: <company>
<person> <ssn> </ssn> <name> John </name> <office> B432 </office> <phone> 1234 </phone> </person> <person> <ssn> </ssn> <name> Jim </name> <office> B123 </office> <product> ... </product> ... </company>

25 DTD: The Content Model <!ELEMENT tag (CONTENT)> Content model:
Complex = a regular expression over other elements Text-only = #PCDATA Empty = EMPTY Any = ANY Mixed content = (#PCDATA | A | B | C)* content model

26 DTD: Regular Expressions
XML sequence <!ELEMENT name (firstName, lastName)) <name> <firstName> </firstName> <lastName> </lastName> </name> optional <!ELEMENT name (firstName?, lastName)) <person> <name> </name> <phone> </phone> </person> Kleene star <!ELEMENT person (name, phone*)) alternation <!ELEMENT person (name, (phone| )))

27 Querying XML Data XPath = simple navigation through the tree
XQuery = the SQL of XML XSLT = recursive traversal will not discuss in class

28 Sample Data for Queries
<bib> <book> <publisher> Addison-Wesley </publisher> <author> Serge Abiteboul </author> <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author> <author> Victor Vianu </author> <title> Foundations of Databases </title> <year> 1995 </year> </book> <book price=“55”> <publisher> Freeman </publisher> <author> Jeffrey D. Ullman </author> <title> Principles of Database and Knowledge Base Systems </title> <year> 1998 </year> </book> </bib>

29 Data Model for XPath The root The root element . . . . bib book book
publisher author Addison-Wesley Serge Abiteboul

30 XPath: Simple Expressions
/bib/book/year Result: <year> 1995 </year> <year> 1998 </year> Result: empty (there were no papers) /bib/paper/year

31 XPath: Restricted Kleene Closure
//author Result:<author> Serge Abiteboul </author> <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author> <author> Victor Vianu </author> <author> Jeffrey D. Ullman </author> Result: <first-name> Rick </first-name> /bib//first-name

32 Xpath: Text Nodes /bib/book/author/text() Result: Serge Abiteboul
Victor Vianu Jeffrey D. Ullman Rick Hull doesn’t appear because he has firstname, lastname Functions in XPath: text() = matches the text value node() = matches any node (= * or text()) name() = returns the name of the current tag

33 Xpath: Wildcard Result: <first-name> Rick </first-name>
<last-name> Hull </last-name> * Matches any element //author/*

34 Xpath: Attribute Nodes
Result: “55” @price means that price is has to be an attribute

35 Xpath: Predicates /bib/book/author[firstname]
Result: <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author>

36 Xpath: More Predicates
Result: <lastname> … </lastname> <lastname> … </lastname> /bib/book/author[firstname][address[.//zip][city]]/lastname

37 Xpath: More Predicates
< 60] < 25] /bib/book[author/text()]

38 Xpath: Summary bib matches a bib element * matches any element
/ matches the root element /bib matches a bib element under root bib/paper matches a paper in bib bib//paper matches a paper in bib, at any depth //paper matches a paper at any depth paper|book matches a paper or a book @price matches a price attribute matches price attribute in book, in bib matches…


Download ppt "Lecture 11: XML and Semistructured Data"

Similar presentations


Ads by Google