Agenda from now on Done: SQL, views, transactions, conceptual modeling, E/R, relational algebra. Starting: XML. To do: the database engine: –Storage –Query execution –Query optimization
XML
XML eXtensible Markup Language. XML 1.0 – a W3C Recommendation, 1998. Roots: SGML (a very nasty language). Beyond its roots: a format for sharing data.
Why XML is of Interest to Us XML is just syntax for data –Note: we have no standard syntax for relational data –But XML is not relational: it is semistructured This is exciting because: –Can translate any data to XML –Can ship XML over the Web (HTTP) –Can input XML into any application –Thus: data sharing and exchange on the Web
XML Data Sharing and Exchange [Figure: applications exchanging XML data over the Web (HTTP), with relational, legacy, and object-relational data sources; XML is used to transform, integrate, and warehouse data for specific data management tasks.]
From HTML to XML HTML describes the presentation
HTML Bibliography Foundations of Databases Abiteboul, Hull, Vianu Addison Wesley, 1995 Data on the Web Abiteboul, Buneman, Suciu Morgan Kaufmann, 1999
XML Foundations… Abiteboul Hull Vianu Addison Wesley 1995 … XML describes the content
Web Services A new paradigm for creating distributed applications? Systems communicate via messages and contracts. Example: an order processing system. MS .NET, J2EE – some of the platforms. XML – one part of the story: the data format.
XML Terminology tags: book, title, author, … start tag: <book>, end tag: </book> elements: <book>…</book>, <author>…</author> elements are nested empty element: <x></x>, abbreviated <x/> an XML document: single root element well-formed XML document: if it has matching tags
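The well-formedness rule (matching, properly nested tags and a single root element) can be tried with Python's standard-library parser, which rejects documents that break it. A minimal sketch, assuming Python 3 and `xml.etree.ElementTree`; the helper `is_well_formed` and the sample strings are hypothetical illustrations.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """True if `text` parses: matching, properly nested tags and one root element."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Nested elements with matching tags, plus an empty element written <x/>.
good = "<book><title>Foundations of Databases</title><x/></book>"
# </book> arrives before </title>: the tags do not nest properly.
bad_nesting = "<book><title>Foundations of Databases</book></title>"
# Two top-level elements: no single root element.
two_roots = "<book></book><book></book>"

for doc in (good, bad_nesting, two_roots):
    print(is_well_formed(doc))  # True, then False, False
```

The underlying parser (expat) checks well-formedness only; it does not validate a document against a DTD.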
More XML: Attributes Foundations of Databases Abiteboul … 1995 Attributes are an alternative way to represent data.
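To see that an attribute and a child element can carry the same fact, the sketch below stores a book's year both ways and reads it back with one helper. It assumes Python 3 and `xml.etree.ElementTree`; using `year` as the attribute and the name `year_of` are hypothetical choices for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# The same fact (the publication year) stored two ways; the layout is illustrative.
as_element = ET.fromstring(
    "<book><title>Foundations of Databases</title><year>1995</year></book>"
)
as_attribute = ET.fromstring(
    '<book year="1995"><title>Foundations of Databases</title></book>'
)


def year_of(book: ET.Element) -> Optional[str]:
    """Read the year whether it is stored as a child element or as an attribute."""
    child = book.find("year")
    if child is not None:
        return child.text
    return book.get("year")  # attribute lookup; None if the attribute is absent


print(year_of(as_element))    # 1995
print(year_of(as_attribute))  # 1995
```

The choice is not free: an element may carry a given attribute name at most once, and attribute values are plain strings, whereas child elements can repeat and nest.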
More XML: Oids and References Jane Mary John Oids and references in XML are just syntax.
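Because oids and references are just syntax, a generic parser returns them as ordinary strings, and following a reference is the application's job. A minimal sketch, assuming Python 3 and `xml.etree.ElementTree`; the document layout, the ids `o123`/`o456`/`o555`, and the `mother`/`idref` attribute names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: id attributes act as object ids, while the mother and
# idref attributes hold references. To the parser they are plain strings.
DOC = """<people>
  <person id="o555"><name>Jane</name></person>
  <person id="o456"><name>Mary</name><children idref="o123 o555"/></person>
  <person id="o123" mother="o456"><name>John</name></person>
</people>"""

root = ET.fromstring(DOC)

# Build the id -> element index ourselves; ElementTree does not resolve references.
by_id = {p.get("id"): p for p in root.iter("person")}

john = by_id["o123"]
mother = by_id[john.get("mother")]
print(mother.findtext("name"))  # Mary

# idref holds a space-separated list of ids; resolve each one through the index.
refs = mother.find("children").get("idref").split()
kids = [by_id[ref].findtext("name") for ref in refs]
print(kids)  # ['John', 'Jane']
```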
XML Semantics: a Tree! [Figure: the person data as a tree rooted at data; each person element has name and address children (Mary: street Maple, no 345, city Seattle; John: Thailand), and the figure also shows a phone element and an id attribute o555. Legend: element node, text node, attribute node.] Order matters!!!
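A pre-order walk makes the tree view concrete: it visits element, attribute, and text nodes, and children come out in the order they appear in the document. A minimal sketch, assuming Python 3 and `xml.etree.ElementTree`; the document loosely follows the figure, and its exact layout and the helper `walk` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document loosely following the figure.
DOC = (
    '<data><person id="o555"><name>Mary</name>'
    "<address><street>Maple</street><no>345</no><city>Seattle</city></address>"
    "</person>"
    "<person><name>John</name><address>Thailand</address></person>"
    "</data>"
)


def walk(elem, depth=0, out=None):
    """Pre-order walk listing element, attribute, and text nodes in document order."""
    if out is None:
        out = []
    pad = "  " * depth
    out.append(pad + "element " + elem.tag)
    for name, value in elem.attrib.items():
        out.append(pad + "  " + f"attribute {name}={value}")
    if elem.text and elem.text.strip():
        out.append(pad + "  " + f"text {elem.text.strip()}")
    for child in elem:  # ElementTree keeps children as an ordered list
        walk(child, depth + 1, out)
    return out


lines = walk(ET.fromstring(DOC))
print("\n".join(lines))
```

For brevity the walk ignores text that follows a child element (ElementTree stores it in the child's `.tail`), which only matters for mixed content.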
XML Data XML is self-describing Schema elements become part of the data –Relational schema: persons(name,phone) –In XML, <persons>, <name>, <phone> are part of the data, and are repeated many times Consequence: XML is much more flexible XML = semistructured data
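Relational Data as XML The relation person(name, phone) with rows (John, 3634), (Sue, 6343), (Dick, 6363) becomes XML: a person element containing one row element per tuple, each row with name and phone children. [Figure: the same data as a tree: person, then row, then name and phone leaves.]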
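The encoding is mechanical: one element for the relation, one `row` per tuple, and one child element per column. A minimal sketch, assuming Python 3 and `xml.etree.ElementTree`; the helper name `relation_to_xml` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The person(name, phone) relation from the slide, as a list of tuples.
columns = ("name", "phone")
rows = [("John", "3634"), ("Sue", "6343"), ("Dick", "6363")]


def relation_to_xml(relation_name, columns, rows):
    """Encode a relation as <relation><row><col>value</col>...</row>...</relation>."""
    root = ET.Element(relation_name)
    for values in rows:
        row = ET.SubElement(root, "row")
        for col, value in zip(columns, values):
            ET.SubElement(row, col).text = value
    return root


xml_text = ET.tostring(relation_to_xml("person", columns, rows), encoding="unicode")
print(xml_text)
```

Note how the column names, which a relational schema states once, are repeated in every row: this is the self-describing property.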
XML is Semi-structured Data Missing attributes: John has phone 1234, Joe has no phone! Could represent in a table with nulls: (name, phone) rows (John, 1234), (Joe, -)
XML is Semi-structured Data Repeated attributes: Mary has two phones! Impossible in tables: (name, phone) row (Mary, ???)
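Reading missing and repeated elements shows why they do not fit one flat table: a lookup for an absent element yields `None`, much like a NULL, while a repeated element naturally yields a list. A minimal sketch, assuming Python 3 and `xml.etree.ElementTree`; Mary's two phone numbers are hypothetical values.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: John has one phone, Joe has none, Mary has two.
DOC = (
    "<people>"
    "<person><name>John</name><phone>1234</phone></person>"
    "<person><name>Joe</name></person>"
    "<person><name>Mary</name><phone>5551</phone><phone>5552</phone></person>"
    "</people>"
)
root = ET.fromstring(DOC)

# Collect every phone per person: zero, one, or many.
phones_by_name = {
    p.findtext("name"): [ph.text for ph in p.findall("phone")]
    for p in root.findall("person")
}
print(phones_by_name)

# findtext returns None when the element is missing, like a NULL in a table.
joe = root.findall("person")[1]
print(joe.findtext("phone"))  # None
```

A relational encoding needs a NULL for Joe and either a separate phones table or duplicated rows for Mary.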
XML is Semi-structured Data Attributes with different types in different objects: e.g. John Smith 1234, where name is a structured element rather than plain text (structured name!) Nested collections (no 1NF) Heterogeneous collections: one element may contain children of different kinds
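Because the same element name may hold plain text in one object and sub-elements in another, code that reads it has to check which case it is facing. A minimal sketch, assuming Python 3 and `xml.etree.ElementTree`; the `first`/`last` tag names and the helper `name_value` are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Union

# Hypothetical data: one name is plain text, the other is structured.
DOC = (
    "<people>"
    "<person><name>John Smith</name><phone>1234</phone></person>"
    "<person><name><first>John</first><last>Smith</last></name>"
    "<phone>1234</phone></person>"
    "</people>"
)


def name_value(name: ET.Element) -> Union[str, Dict[str, str]]:
    """Return a string for a text-only <name>, or a dict for a structured one."""
    if len(name) == 0:  # no child elements: plain text
        return (name.text or "").strip()
    return {child.tag: (child.text or "").strip() for child in name}


root = ET.fromstring(DOC)
names = [name_value(p.find("name")) for p in root.findall("person")]
print(names)  # ['John Smith', {'first': 'John', 'last': 'Smith'}]
```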
Document Type Definitions DTD: part of the original XML specification. An XML document may have a DTD. XML document: well-formed = if tags are correctly closed; valid = if it has a DTD and conforms to it. Validation is useful in data exchange.
Very Simple DTD <!DOCTYPE company [ … ]>
Very Simple DTD Example of valid XML document: John B Jim B
DTD: The Content Model Content model: –Complex = a regular expression over other elements –Text-only = #PCDATA –Empty = EMPTY –Any = ANY –Mixed content = (#PCDATA | A | B | C)* content model
DTD: Regular Expressions sequence: <!ELEMENT name (firstName, lastName)> optional: <!ELEMENT name (firstName?, lastName)> Kleene star: <!ELEMENT person (name, phone*)> alternation: <!ELEMENT person (name, (phone| … ))>
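Since a content model is a regular expression over child element names, checking an element's children amounts to matching its sequence of child tags against that expression. The sketch below is a hypothetical illustration, not a DTD parser (Python's standard library does not validate DTDs): the four content models are translated by hand into Python `re` patterns, the names `CONTENT_MODELS` and `children_match` are made up, and `email` stands in as a hypothetical second alternative.

```python
import re
import xml.etree.ElementTree as ET

# Content models hand-translated into regexes over child tags, each written as
# "tag " (tag plus a space) so a DTD operator maps to the same regex operator.
CONTENT_MODELS = {
    "sequence": r"(firstName )(lastName )",        # (firstName, lastName)
    "optional": r"(firstName )?(lastName )",       # (firstName?, lastName)
    "kleene_star": r"(name )(phone )*",            # (name, phone*)
    "alternation": r"(name )((phone )|(email ))",  # (name, (phone|email)); email is hypothetical
}


def children_match(elem: ET.Element, model: str) -> bool:
    """True if the sequence of elem's child tags matches the named content model."""
    tags = "".join(child.tag + " " for child in elem)
    return re.fullmatch(CONTENT_MODELS[model], tags) is not None


full = ET.fromstring("<name><firstName>John</firstName><lastName>Smith</lastName></name>")
last_only = ET.fromstring("<name><lastName>Smith</lastName></name>")
two_phones = ET.fromstring("<person><name/><phone/><phone/></person>")
both = ET.fromstring("<person><name/><phone/><email/></person>")

print(children_match(full, "sequence"))           # True
print(children_match(last_only, "sequence"))      # False
print(children_match(last_only, "optional"))      # True
print(children_match(two_phones, "kleene_star"))  # True
print(children_match(both, "alternation"))        # False: the alternation picks one
```

The trailing space keeps tag boundaries unambiguous (`phone ` never matches `phones `). A real validator would also check #PCDATA, EMPTY, and attributes, and would escape tag names that contain regex metacharacters such as `.`.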
Querying XML Data XPath = simple navigation through the tree XQuery = the SQL of XML XSLT = recursive traversal –will not discuss in class
Sample Data for Queries Two book elements under bib. Book 1: publisher Addison-Wesley; authors Serge Abiteboul, Rick Hull, Victor Vianu (Rick Hull's author element has first-name and last-name children); title Foundations of Databases; year 1995. Book 2: publisher Freeman; author Jeffrey D. Ullman; title Principles of Database and Knowledge Base Systems; year 1998.
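Data Model for XPath [Figure: the sample data as a tree. The root sits above everything and has a single child, the root element bib; below it are book elements with publisher and author children and text leaves such as Addison-Wesley and Serge Abiteboul.]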
XPath: Simple Expressions /bib/book/year Result: the year elements 1995 and 1998 /bib/paper/year Result: empty (there were no papers)
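Python's standard `xml.etree.ElementTree` supports a limited subset of XPath, enough to try these expressions. A minimal sketch, assuming a compact, whitespace-free reconstruction of the sample data (the `first-name`/`last-name` tag names are assumed).

```python
import xml.etree.ElementTree as ET

# Compact reconstruction of the sample data; tag names are assumed.
BIB = (
    "<bib>"
    "<book><publisher>Addison-Wesley</publisher>"
    "<author>Serge Abiteboul</author>"
    "<author><first-name>Rick</first-name><last-name>Hull</last-name></author>"
    "<author>Victor Vianu</author>"
    "<title>Foundations of Databases</title><year>1995</year></book>"
    "<book><publisher>Freeman</publisher><author>Jeffrey D. Ullman</author>"
    "<title>Principles of Database and Knowledge Base Systems</title>"
    "<year>1998</year></book>"
    "</bib>"
)
bib = ET.fromstring(BIB)  # the root element <bib>

# ElementTree paths are relative to the element they are called on, so
# /bib/book/year becomes "book/year" evaluated at <bib>.
years = [y.text for y in bib.findall("book/year")]
papers = [y.text for y in bib.findall("paper/year")]
print(years)   # ['1995', '1998']
print(papers)  # []
```

Absolute paths such as `/bib/book/year` are not supported on an `Element`, which is why the sketch starts from the bib element itself.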
XPath: Restricted Kleene Closure //author Result: Serge Abiteboul, Rick Hull, Victor Vianu, Jeffrey D. Ullman /bib//first-name Result: Rick
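In ElementTree the descendant step `//` is written `.//` relative to the current element. A minimal sketch, assuming Python 3 and a compact reconstruction of the sample data with assumed `first-name`/`last-name` tags.

```python
import xml.etree.ElementTree as ET

# Compact reconstruction of the sample data; tag names are assumed.
BIB = (
    "<bib>"
    "<book><publisher>Addison-Wesley</publisher>"
    "<author>Serge Abiteboul</author>"
    "<author><first-name>Rick</first-name><last-name>Hull</last-name></author>"
    "<author>Victor Vianu</author>"
    "<title>Foundations of Databases</title><year>1995</year></book>"
    "<book><publisher>Freeman</publisher><author>Jeffrey D. Ullman</author>"
    "<title>Principles of Database and Knowledge Base Systems</title>"
    "<year>1998</year></book>"
    "</bib>"
)
bib = ET.fromstring(BIB)

# //author : every author element at any depth; itertext() gathers nested text,
# so Rick Hull's structured author still prints as one name.
authors = [" ".join(a.itertext()) for a in bib.findall(".//author")]

# /bib//first-name : first-name elements anywhere below <bib>.
first_names = [f.text for f in bib.findall(".//first-name")]

print(authors)      # ['Serge Abiteboul', 'Rick Hull', 'Victor Vianu', 'Jeffrey D. Ullman']
print(first_names)  # ['Rick']
```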
XPath: Text Nodes /bib/book/author/text() Result: Serge Abiteboul Jeffrey D. Ullman (Rick Hull doesn't appear because his author element has firstname and lastname children instead of text) Functions in XPath: –text() = matches text nodes –node() = matches any node (= * or text()) –name() = returns the name of the current tag
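ElementTree has no `text()` step, but its `.text` field gives a close approximation: it holds the text that appears before an element's first child. Plain-text authors therefore have a non-empty `.text`, while Rick Hull's author, which begins with a child element, has `None`. A minimal sketch, assuming a compact reconstruction of the sample data with assumed tag names; the element's `.tag` plays the role of `name()`.

```python
import xml.etree.ElementTree as ET

# Compact reconstruction of the sample data; tag names are assumed.
BIB = (
    "<bib>"
    "<book><publisher>Addison-Wesley</publisher>"
    "<author>Serge Abiteboul</author>"
    "<author><first-name>Rick</first-name><last-name>Hull</last-name></author>"
    "<author>Victor Vianu</author>"
    "<title>Foundations of Databases</title><year>1995</year></book>"
    "<book><publisher>Freeman</publisher><author>Jeffrey D. Ullman</author>"
    "<title>Principles of Database and Knowledge Base Systems</title>"
    "<year>1998</year></book>"
    "</bib>"
)
bib = ET.fromstring(BIB)

# /bib/book/author/text() : keep authors whose leading text is non-blank.
texts = [a.text for a in bib.findall("book/author") if a.text and a.text.strip()]
print(texts)  # ['Serge Abiteboul', 'Victor Vianu', 'Jeffrey D. Ullman']

# name() : the tag of the current element.
print(bib.find("book/author").tag)  # author
```

This is an approximation: XPath's `text()` also matches text nodes that follow a child element, which ElementTree stores in that child's `.tail`.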
XPath: Wildcard //author/* Result: Rick Hull (* matches any element)
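ElementTree supports the `*` wildcard step directly. A minimal sketch, assuming a compact reconstruction of the sample data with assumed `first-name`/`last-name` tags.

```python
import xml.etree.ElementTree as ET

# Compact reconstruction of the sample data; tag names are assumed.
BIB = (
    "<bib>"
    "<book><publisher>Addison-Wesley</publisher>"
    "<author>Serge Abiteboul</author>"
    "<author><first-name>Rick</first-name><last-name>Hull</last-name></author>"
    "<author>Victor Vianu</author>"
    "<title>Foundations of Databases</title><year>1995</year></book>"
    "<book><publisher>Freeman</publisher><author>Jeffrey D. Ullman</author>"
    "<title>Principles of Database and Knowledge Base Systems</title>"
    "<year>1998</year></book>"
    "</bib>"
)
bib = ET.fromstring(BIB)

# //author/* : any child element of any author; only Rick Hull's author has children.
matches = bib.findall(".//author/*")
print([e.text for e in matches])  # ['Rick', 'Hull']
print([e.tag for e in matches])   # ['first-name', 'last-name']
```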
XPath: Attribute Nodes /bib/book/@price Result: … (@price means that price has to be an attribute)
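ElementTree cannot return attribute nodes, but it supports the `[@name]` predicate, and `Element.get` reads the attribute's value. A minimal sketch, assuming Python 3; the `price="55"` value on the second book is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: only the second book carries a price attribute.
BIB = (
    "<bib>"
    "<book><title>Foundations of Databases</title><year>1995</year></book>"
    '<book price="55"><title>Principles of Database and Knowledge Base Systems</title>'
    "<year>1998</year></book>"
    "</bib>"
)
bib = ET.fromstring(BIB)

# /bib/book/@price : select the books that have the attribute, then read it.
prices = [b.get("price") for b in bib.findall("book[@price]")]
print(prices)  # ['55']

# A book without the attribute simply yields None from get().
print(bib.find("book").get("price"))  # None
```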
XPath: Predicates /bib/book/author[firstname] Result: Rick Hull
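The `[tag]` predicate, which keeps elements having at least one child with that tag, is part of ElementTree's XPath subset. A minimal sketch, assuming Python 3 and a trimmed version of the sample data that uses the slide's `firstname`/`lastname` tag names (assumed).

```python
import xml.etree.ElementTree as ET

# Trimmed sample data using the slide's firstname/lastname tags (assumed).
BIB = (
    "<bib><book>"
    "<author>Serge Abiteboul</author>"
    "<author><firstname>Rick</firstname><lastname>Hull</lastname></author>"
    "<author>Victor Vianu</author>"
    "</book></bib>"
)
bib = ET.fromstring(BIB)

# /bib/book/author[firstname] : author elements that have a firstname child.
hits = bib.findall("book/author[firstname]")
print([" ".join(a.itertext()) for a in hits])  # ['Rick Hull']
```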
XPath: More Predicates /bib/book/author[firstname][address[//zip][city]]/lastname Result: … …
XPath: More Predicates < “60”] < “25”] /bib/book[author/text()]
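ElementTree has no comparison operators in predicates, so a numeric test such as `[@price < "60"]` has to be applied in Python. XPath 1.0 converts both operands of `<` to numbers, which the sketch mimics with `float`. The `[author/text()]` test is approximated by checking each author's leading `.text`. The titles `A`/`B` and the prices are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: two priced books; book B's only author is structured.
BIB = (
    "<bib>"
    '<book price="55"><title>A</title><author>Jeffrey D. Ullman</author></book>'
    '<book price="70"><title>B</title>'
    "<author><first-name>Rick</first-name><last-name>Hull</last-name></author></book>"
    "</bib>"
)
bib = ET.fromstring(BIB)

# /bib/book[@price < "60"] : compare numerically, as XPath 1.0 does for <.
cheap = [
    b.findtext("title")
    for b in bib.findall("book[@price]")
    if float(b.get("price")) < 60
]

# /bib/book[author/text()] : books with at least one author whose content
# begins with non-blank text (approximating the text() node test).
with_text_author = [
    b.findtext("title")
    for b in bib.findall("book")
    if any(a.text and a.text.strip() for a in b.findall("author"))
]

print(cheap)             # ['A']
print(with_text_author)  # ['A']
```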
XPath: Summary –bib = matches a bib element –* = matches any element –/ = matches the root element –/bib = matches a bib element under root –bib/paper = matches a paper in bib –bib//paper = matches a paper in bib, at any depth –//paper = matches a paper at any depth –paper|book = matches a paper or a book –@price = matches a price attribute –bib/book/@price = matches price attribute in book, in bib –… matches…