1
1 Lecture 08: XML and Semistructured Data
2
2 Outline XML (Section 17) –XML syntax, semistructured data –Document Type Definitions (DTDs) XPath
3
3 Additional Readings on XML
XML
–http://www.w3.org/XML/1999/XML-in-10-points
–www.zvon.org/xxl/XMLTutorial/General/book_en.html
–http://db.bell-labs.com/galax/
–http://www.w3.org/TR/REC-xml-names (1/99)
XPath
–http://java.sun.com/webservices/docs/ea2/tutorial/doc/JAXPXSLT2.html
XQuery
–http://www.w3.org/TR/xmlquery-use-cases/
–http://www.xmlportfolio.com/xquery.html
Main source: www.w3.org (but hard to read)
4
4 XML eXtensible Markup Language XML 1.0 – a recommendation from W3C, 1998 Roots: SGML (used in publishing). After the roots: a format for sharing data
5
5 XML Data Relational data does not have a syntax –I can’t “give” you my relational database –Need to import it from other syntax, like CSV (comma-separated-values) XML = rich syntax for data –But XML is not relational: semistructured Usage: –Map any data to XML –Store it in files, exchange on the Web, etc. –Even query it directly, using XPath, XQuery
6
6 XML Data Sharing and Exchange [Diagram: applications and their relational, object-relational and legacy data exchange XML data over the Web (HTTP). Specific data management tasks: transform, integrate, warehouse.]
7
7 From HTML to XML HTML describes the layout
8
8 HTML Bibliography Foundations of Databases Abiteboul, Hull, Vianu Addison Wesley, 1995 Data on the Web Abiteboul, Buneman, Suciu Morgan Kaufmann, 1999
9
9 XML Foundations… Abiteboul Hull Vianu Addison Wesley 1995 … XML describes the structure
10
10 XML Terminology
tags: book, title, author, …
start tag: <book>, end tag: </book>
elements: <book>…</book>, <author>…</author>
elements are nested
empty element: <book></book>, abbreviated <book/>
an XML document is well formed if:
–it has matching tags
–tags are properly nested
–it has a single root element
–and more constraints, e.g. on names
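The well-formedness rules can be tried out with a parser. This is a minimal sketch using Python's standard xml.etree.ElementTree, which rejects documents that are not well formed; the helper is_well_formed and the four sample documents are illustrative assumptions, not part of the lecture.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical documents, one per rule on the slide.
good = "<book><title>Foundations of Databases</title><author>Abiteboul</author></book>"
mismatched = "<book><title>Foundations of Databases</author></book>"     # end tag does not match
badly_nested = "<book><title>Foundations<author></title></author></book>"  # tags overlap
two_roots = "<book/><book/>"                                             # more than one root

results = [is_well_formed(d) for d in (good, mismatched, badly_nested, two_roots)]
print(results)
```

Only the first document passes; each of the others violates one of the rules listed above.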
11
11 More XML: Attributes Foundations of Databases Abiteboul … 1995 attributes are alternative ways to represent data
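To illustrate "alternative ways to represent data", the sketch below stores the same value once as an attribute and once as a child element. The markup on the slide was lost, so the attribute name price, its value, and the element names are assumptions.

```python
import xml.etree.ElementTree as ET

# Assumed markup: price carried as an attribute of book.
as_attribute = ET.fromstring(
    '<book price="55"><title>Foundations of Databases</title>'
    '<author>Abiteboul</author><year>1995</year></book>'
)
# Same information, with price carried as a child element instead.
as_element = ET.fromstring(
    '<book><title>Foundations of Databases</title>'
    '<author>Abiteboul</author><year>1995</year><price>55</price></book>'
)

price_from_attribute = as_attribute.get("price")     # attribute lookup
price_from_element = as_element.findtext("price")    # child element lookup
print(price_from_attribute, price_from_element)
```

One practical difference: an attribute name may occur at most once per element, whereas child elements can repeat.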
12
12 More XML: IDs and References Jane Mary John Scope of IDs and references is the document
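A minimal sketch of how IDs and references might be resolved within one document. The slide's markup was stripped, so the element names, the attribute names id, idref and mother, and the identifier values are assumptions. ElementTree does not treat IDs specially, so the sketch builds its own index.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<people>
  <person id="o555"><name>Jane</name></person>
  <person id="o456"><name>Mary</name><children idref="o123 o555"/></person>
  <person id="o123" mother="o456"><name>John</name></person>
</people>
""")

# Index every person by its id; the scope of this index is the whole document.
by_id = {p.get("id"): p for p in doc.iter("person")}

# Follow the references from Mary to her children, and from John to his mother.
mary = by_id["o456"]
child_ids = mary.find("children").get("idref").split()
child_names = [by_id[ref].findtext("name") for ref in child_ids]
mother_of_john = by_id[by_id["o123"].get("mother")].findtext("name")
print(child_names, mother_of_john)
```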
13
13 More XML: CDATA Section Syntax: <![CDATA[ … ]]> Example: <![CDATA[ … <>]]>
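A small sketch of what a CDATA section does: the parser hands its content over as plain text, so characters such as < and > need no escaping inside it. The element name example and the text are assumptions.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("<example><![CDATA[ some text with </notAtag> and <> ]]></example>")

# The CDATA content arrives verbatim as the element's text.
text = doc.text
# On output, ElementTree escapes the special characters instead of re-creating CDATA.
serialized = ET.tostring(doc, encoding="unicode")
print(repr(text))
print(serialized)
```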
14
14 More XML: Entity References Syntax: &entityname; Used like macros Example: this is less than &lt; Some predefined entities: &lt; is <, &gt; is >, &amp; is &, &apos; is ', &quot; is ", &#38; is a Unicode char (numeric character reference) complete list: http://www.w3.org/TR/xhtml-modularization/dtd_module_defs.html
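The predefined entities can be checked directly: a parser replaces each reference with the character it stands for. A minimal sketch, with an assumed element name:

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<element>this is less than &lt; and also &gt; &amp; &apos; &quot; &#38;</element>"
)

# After parsing, the references are gone and the plain characters remain.
text = doc.text
print(text)
```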
15
15 More XML: Processing Instructions Syntax: <?target argument?> Example: Alarm Clock 19.99 Processed by external applications, e.g. PHP (bad style)
16
16 More XML: Comments Syntax: <!-- … comment text … --> Yes, they are part of the data model!
17
17 XML Data: a Tree! Mary Maple 345 Seattle John Thailand 23456 [Tree diagram: a data element with two person children. The first person has an id attribute (o555), a name (Mary) and an address with street (Maple), no (345) and city (Seattle). The second person has a name (John), an address (Thailand) and a phone (23456). Node kinds: element node, text node, attribute node.] Order matters!
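The tree view can be made concrete by walking a parsed document. This sketch reconstructs the markup from the node labels in the diagram (data, person, name, address, street, no, city, phone, id), which is an assumption since the original tags were lost. Note that ElementTree stores text in an element's text field rather than as a separate node object.

```python
import xml.etree.ElementTree as ET

DATA = """<data>
  <person id="o555">
    <name>Mary</name>
    <address><street>Maple</street><no>345</no><city>Seattle</city></address>
  </person>
  <person>
    <name>John</name><address>Thailand</address><phone>23456</phone>
  </person>
</data>"""

def describe(elem, depth=0):
    """List element, attribute and text nodes in document order."""
    pad = "  " * depth
    lines = [f"{pad}element {elem.tag}"]
    for name, value in elem.attrib.items():
        lines.append(f"{pad}  attribute {name}={value}")
    if elem.text and elem.text.strip():
        lines.append(f"{pad}  text {elem.text.strip()}")
    for child in elem:                     # children come back in document order
        lines.extend(describe(child, depth + 1))
    return lines

root = ET.fromstring(DATA)
lines = describe(root)
names_in_order = [p.findtext("name") for p in root]
print("\n".join(lines))
```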
18
18 From Relational Data to XML Data
persons
name  phone
John  3634
Sue  6343
Dick  6363
XML: a persons root element with one row element per tuple; each row has a name (“John”, “Sue”, “Dick”) and a phone (3634, 6343, 6363)
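The mapping from the persons table to XML can be sketched in a few lines. The element names persons, row, name and phone follow the slide; using Python tuples as the stand-in for the relational table is an assumption.

```python
import xml.etree.ElementTree as ET

# The relation persons(name, phone) as a list of tuples.
rows = [("John", 3634), ("Sue", 6343), ("Dick", 6363)]

persons = ET.Element("persons")
for name, phone in rows:
    row = ET.SubElement(persons, "row")          # one row element per tuple
    ET.SubElement(row, "name").text = name       # one child element per column
    ET.SubElement(row, "phone").text = str(phone)

xml_text = ET.tostring(persons, encoding="unicode")
print(xml_text)
```

The column names now appear inside every row, which is the repetition that makes XML self-describing.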
19
19 XML Data XML is self-describing Schema elements become part of the data –Relational schema: persons(name,phone) –In XML, <persons>, <name>, <phone> are part of the data, and are repeated many times Consequence: XML is much more flexible XML = semistructured data
20
20 Semistructured Data Explained Missing attributes: John 1234 Joe (no phone!) Could represent in a table with nulls:
name  phone
John  1234
Joe  -
21
21 Semistructured Data Explained Repeated attributes: Mary 2345 3456 (two phones!) Impossible in tables:
name  phone
Mary  2345  3456  ???
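Both cases, a missing phone and a repeated phone, can be seen by trying to flatten such data into (name, phone) rows. In this sketch the element names persons and person are assumptions; name and phone follow the table headers.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<persons>"
    "<person><name>John</name><phone>1234</phone></person>"
    "<person><name>Joe</name></person>"
    "<person><name>Mary</name><phone>2345</phone><phone>3456</phone></person>"
    "</persons>"
)

# Each person keeps however many phones it has: zero, one or several.
phones_by_name = [(p.findtext("name"), [ph.text for ph in p.findall("phone")]) for p in doc]

# Flattening to a table needs a null for Joe and two rows for Mary.
flat = [(name, phone) for name, phones in phones_by_name for phone in (phones or [None])]
print(flat)
```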
22
22 Semistructured Data Explained Attributes with different types in different objects: John Smith 1234 (structured name!) Nested collections (no 1NF) Heterogeneous collections: –one collection element contains elements of two different kinds
23
23 Document Type Definitions DTD part of the original XML specification an XML document may have a DTD XML document: well-formed = if tags are correctly closed valid = if it has a DTD and conforms to it validation is useful in data exchange
24
24 Very Simple DTD <!DOCTYPE company [ … ]>
25
25 Very Simple DTD Example of valid XML document: 123456789 John B432 1234 987654321 Jim B123 ...
26
26 DTD: The Content Model Content model: –Complex = a regular expression over other elements –Text-only = #PCDATA –Empty = EMPTY –Any = ANY –Mixed content = (#PCDATA | A | B | C)* content model
27
27 DTD: Regular Expressions
sequence: <!ELEMENT name (firstName, lastName)>
optional: <!ELEMENT name (firstName?, lastName)>
star (repeated occurrence): <!ELEMENT person (name, phone*)>
alternation: <!ELEMENT person (name, (phone|email))>
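Since a complex content model is a regular expression over element names, it can be checked with an ordinary regex engine. The sketch below is a toy checker under stated assumptions, not a DTD validator: the helpers content_model_to_regex and conforms are hypothetical, they handle only element-content models (no #PCDATA, EMPTY or ANY), and Python's xml.etree.ElementTree itself does not validate against DTDs.

```python
import re
import xml.etree.ElementTree as ET

def content_model_to_regex(model: str) -> re.Pattern:
    """Turn a model such as '(name, phone*)' into a regex over 'tag ' tokens."""
    def translate(match):
        token = match.group()
        if token == "," or token.isspace():
            return ""                              # sequence is plain concatenation
        return f"(?:{re.escape(token)} )"          # one child element with that name
    # Parentheses, '|', '?', '*' and '+' keep their usual regex meaning.
    return re.compile(re.sub(r"[A-Za-z_][\w.\-]*|,|\s+", translate, model))

def conforms(elem, model: str) -> bool:
    """Check the sequence of child element names against the content model."""
    children = "".join(child.tag + " " for child in elem)
    return content_model_to_regex(model).fullmatch(children) is not None

two_phones = ET.fromstring("<person><name/><phone/><phone/></person>")
no_name = ET.fromstring("<person><phone/></person>")
with_email = ET.fromstring("<person><name/><email/></person>")
both = ET.fromstring("<person><name/><phone/><email/></person>")
last_only = ET.fromstring("<name><lastName/></name>")

checks = [
    conforms(two_phones, "(name, phone*)"),          # star allows repetition
    conforms(no_name, "(name, phone*)"),             # name is required
    conforms(with_email, "(name, (phone|email))"),   # alternation: one of the two
    conforms(both, "(name, (phone|email))"),         # but not both
    conforms(last_only, "(firstName?, lastName)"),   # optional firstName
]
print(checks)
```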
28
28 DTD: Attributes Document Type Definition Document … … … mandatory optional default enumeration
29
29 DTD: Entities DTD: <!ENTITY name "… Tim … Berners Lee …"> (internal entity), plus an external entity named address Document: &name; &address;
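An internal entity behaves like a macro: it is declared in the DTD and expanded wherever it is referenced. A minimal sketch, assuming a plain-text replacement value and a hypothetical root element person; external entities are left out.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<!DOCTYPE person [ <!ENTITY name "Tim Berners Lee"> ]>'
    "<person>&name;</person>"
)

# The parser substitutes the declared replacement text for &name;.
expanded = doc.text
print(expanded)
```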
30
30 Inclusion of DTD in Documents "test" is a document element ]> &hello; ]> External DTD Declaration Internal DTD Declaration Mixed usage
31
31 XML Namespaces Different DTDs can use the same names! –how to avoid conflicts when combining names from different DTDs? XML namespace is a collection of names (markup vocabulary) –identified by a URI reference, which a document binds to a prefix
32
32 XML Namespaces name ::= [prefix:]localname <book xmlns='urn:loc.gov:book' xmlns:isbn='www.isbn-org.org/def'> … 15 … default name space: unprefixed names belong to the default name space
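How a parser resolves prefixed and unprefixed names can be sketched with ElementTree, which reports every name as {namespace}localname. The two namespace declarations are taken from the slide; the child element names title and number and the placeholder text are assumptions.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<book xmlns='urn:loc.gov:book' xmlns:isbn='www.isbn-org.org/def'>"
    "<title>placeholder title</title>"
    "<number>15</number>"
    "<isbn:number>placeholder isbn</isbn:number>"
    "</book>"
)

# Unprefixed names fall into the default namespace; isbn:number does not.
tags = [child.tag for child in doc]

# The same local name, number, is looked up in two different namespaces.
default_number = doc.findtext("{urn:loc.gov:book}number")
isbn_number = doc.findtext("isbn:number", namespaces={"isbn": "www.isbn-org.org/def"})
print(tags)
```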
33
33 … … XML Namespaces syntactic:, semantic: URL used as unique identifier –URL may not exist, has no function Belong to this namespace
34
34 Querying XML Data XPath = simple navigation through the tree XQuery = the SQL of XML XSLT = recursive traversal –will not discuss XQuery and XSLT build on XPath
35
35 Sample Data for Queries Addison-Wesley Serge Abiteboul Rick Hull Victor Vianu Foundations of Databases 1995 Freeman Jeffrey D. Ullman Principles of Database and Knowledge Base Systems 1998
36
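36 Data Model for XPath [Tree diagram: the root is a node above the root element bib; bib has book children; each book has publisher, author, … children holding text such as Addison-Wesley and Serge Abiteboul.]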
37
37 XPath: Simple Expressions
/bib/book/year Result: 1995 1998
/bib/paper/year Result: empty (there were no papers)
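These two queries can be tried with ElementTree, which supports a limited XPath subset evaluated relative to an element, so /bib/book/year becomes book/year from the bib element. The sample document is a reconstruction: its element names are inferred from the queries on these slides and should be read as assumptions.

```python
import xml.etree.ElementTree as ET

BIB = """
<bib>
  <book>
    <publisher>Addison-Wesley</publisher>
    <author>Serge Abiteboul</author>
    <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
    <author>Victor Vianu</author>
    <title>Foundations of Databases</title>
    <year>1995</year>
  </book>
  <book price="55">
    <publisher>Freeman</publisher>
    <author>Jeffrey D. Ullman</author>
    <title>Principles of Database and Knowledge Base Systems</title>
    <year>1998</year>
  </book>
</bib>
"""
bib = ET.fromstring(BIB)

years = [y.text for y in bib.findall("book/year")]    # /bib/book/year
papers = bib.findall("paper/year")                     # /bib/paper/year
print(years, papers)
```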
38
38 XPath: Restricted Kleene Closure
//author Result: Serge Abiteboul Rick Hull Victor Vianu Jeffrey D. Ullman
/bib//first-name Result: Rick
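The descendant step // can be sketched with ElementTree's .// form, evaluated from the bib element. The sample document is a reconstruction with assumed element names.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring("""
<bib>
  <book>
    <publisher>Addison-Wesley</publisher>
    <author>Serge Abiteboul</author>
    <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
    <author>Victor Vianu</author>
    <title>Foundations of Databases</title>
    <year>1995</year>
  </book>
  <book price="55">
    <publisher>Freeman</publisher>
    <author>Jeffrey D. Ullman</author>
    <title>Principles of Database and Knowledge Base Systems</title>
    <year>1998</year>
  </book>
</bib>
""")

# //author: author elements at any depth; join each one's text pieces for display.
authors = [" ".join(a.itertext()) for a in bib.findall(".//author")]
# /bib//first-name: first-name elements at any depth below bib.
first_names = [f.text for f in bib.findall(".//first-name")]
print(authors, first_names)
```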
39
39 XPath: Text Nodes
/bib/book/author/text() Result: Serge Abiteboul Victor Vianu Jeffrey D. Ullman
Rick Hull doesn’t appear because he has firstname, lastname
Functions in XPath:
–text() = matches the text value
–node() = matches any node (= * or @* or text())
–name() = returns the name of the current tag
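ElementTree's XPath subset has no text() step, so this sketch emulates /bib/book/author/text() by keeping authors whose own text is non-empty. With the sample data as reconstructed here (element names are assumptions), Victor Vianu is a text-only author as well, so he is returned alongside the other two, while Rick Hull is not.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring("""
<bib>
  <book>
    <publisher>Addison-Wesley</publisher>
    <author>Serge Abiteboul</author>
    <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
    <author>Victor Vianu</author>
    <title>Foundations of Databases</title>
    <year>1995</year>
  </book>
  <book price="55">
    <publisher>Freeman</publisher>
    <author>Jeffrey D. Ullman</author>
    <title>Principles of Database and Knowledge Base Systems</title>
    <year>1998</year>
  </book>
</bib>
""")

# An author contributes a text node only if it has direct, non-blank text.
text_nodes = [a.text for a in bib.findall("book/author") if a.text and a.text.strip()]
print(text_nodes)
```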
40
40 XPath: Wildcard
//author/* Result: Rick Hull
* Matches any element
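A short sketch of the wildcard with ElementTree: * selects every child element, so only the author with structured content produces results. The trimmed sample document and its element names are assumptions.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring("""
<bib>
  <book>
    <author>Serge Abiteboul</author>
    <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
    <author>Victor Vianu</author>
  </book>
  <book price="55">
    <author>Jeffrey D. Ullman</author>
  </book>
</bib>
""")

# //author/*: all element children of any author.
matches = [(e.tag, e.text) for e in bib.findall(".//author/*")]
print(matches)
```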
41
41 XPath: Attribute Nodes
/bib/book/@price Result: “55”
@price means that price has to be an attribute
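ElementTree cannot return attribute nodes from a path, so this sketch selects the books that carry a price attribute and then reads its value. Which book carries the price, and the element names, are assumptions; the value 55 comes from the slide.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring("""
<bib>
  <book>
    <title>Foundations of Databases</title>
    <year>1995</year>
  </book>
  <book price="55">
    <title>Principles of Database and Knowledge Base Systems</title>
    <year>1998</year>
  </book>
</bib>
""")

# book[@price] keeps only books that have the attribute; get() reads its value.
prices = [b.get("price") for b in bib.findall("book[@price]")]
print(prices)
```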
42
42 XPath: Predicates
/bib/book/author[firstname] Result: Rick Hull
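A predicate in square brackets filters by the existence of a child. In this sketch the reconstructed data spells the child first-name, as in the earlier /bib//first-name query, rather than firstname; that spelling and the other element names are assumptions.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring("""
<bib>
  <book>
    <author>Serge Abiteboul</author>
    <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
    <author>Victor Vianu</author>
  </book>
  <book price="55">
    <author>Jeffrey D. Ullman</author>
  </book>
</bib>
""")

# Keep only the authors that have a first-name child element.
structured = bib.findall("book/author[first-name]")
last_names = [a.findtext("last-name") for a in structured]
print(last_names)
```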
43
43 XPath: More Predicates
/bib/book/author[firstname][address[.//zip][city]]/lastname Result: … …
44
44 XPath: More Predicates
/bib/book[@price < "60"]
/bib/book[author/@age < "25"]
/bib/book[author/text()]
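ElementTree's XPath subset has no < comparison, so this sketch emulates /bib/book[@price < "60"] by selecting books with a price attribute and comparing in Python. XPath 1.0 converts both operands of < to numbers, which float() mirrors here. The sample document is a reconstruction with assumed element names.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring("""
<bib>
  <book>
    <author>Serge Abiteboul</author>
    <title>Foundations of Databases</title>
  </book>
  <book price="55">
    <author>Jeffrey D. Ullman</author>
    <title>Principles of Database and Knowledge Base Systems</title>
  </book>
</bib>
""")

# Books without a price attribute can never satisfy the comparison.
cheap_titles = [
    b.findtext("title")
    for b in bib.findall("book[@price]")
    if float(b.get("price")) < 60
]
print(cheap_titles)
```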
45
45 XPath: Summary
bib  matches a bib element
*  matches any element
/  matches the root element
/bib  matches a bib element under root
bib/paper  matches a paper in bib
bib//paper  matches a paper in bib, at any depth
//paper  matches a paper at any depth
paper|book  matches a paper or a book
@price  matches a price attribute
bib/book/@price  matches price attribute in book, in bib
bib/book[@price<"55"]/author/lastname  matches…