1
About XML/XQuery/RDF (4/1)
2
[Figure: spectrum from less structure to more structure: text, XML, structured (relational) data]
3
HTML vs. XML
[Figure: the same bibliography in HTML and in XML.] The HTML version: Bibliography; Foundations of Databases, Abiteboul, Hull, Vianu, Addison Wesley, 1995; Data on the Web, Abiteboul, Buneman, Suciu, Morgan Kaufmann, 1999. The XML version tags each item separately: Foundations…, Abiteboul, Hull, Vianu, Addison Wesley, 1995, …
XML is “self-describing”:
– schema information is part of the data
– good for data exchange (albeit baroque for storage)
4
HTML vs. XML (cont.)
[Figure: the same bibliography in HTML and in XML, as on the previous slide.]
HTML describes presentation; XML describes content.
5
Why are database folks so excited about XML?
XML is just a syntax for (self-describing) data. This is still exciting because:
– there is no standard syntax for relational data
– with XML, we can translate any legacy data to XML and exchange data in XML format
– data can be shipped over the web and used as input to any application
6
XML: machine-accessible meaning
This is what a web page in natural language looks like to a machine. (Jim Hendler)
7
XML: machine-accessible meaning
[Figure: a CV annotated with the tags name, education, work, private.] XML allows “meaningful tags” to be added to parts of the text. (Jim Hendler)
8
XML: machine-accessible meaning
[Figure: the same CV tags (name, education, work, private).] But to your machine, the tags look like this… (Jim Hendler)
9
XML: machine-accessible meaning
Schemas help… by relating common terms between documents. (Jim Hendler)
10
But other people use other schemas
[Figure: the CV schema (name, education, work, private) next to a different one.] Someone else has one like this… (Jim Hendler)
11
But other people use other schemas
…which don’t fit in. Moral: there is still a need for ontology mapping. (Jim Hendler)
12
The X-standards…
– XML: an on-the-wire representation for data
 – XQuery: a query language for XML
 – XML Schema: a schema description language for XML data
– RDF: a language for metadata description
– WSDL/SOAP/UDDI: languages for describing services
13
XML Terminology
– tags: book, title, author, …
– start tag: <book>, end tag: </book>
– elements: <book>…</book>, <author>…</author>
– elements are nested
– empty element: <x></x>, abbreviated <x/>
– an XML document has a single root element
– well-formed XML document: one whose tags match
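A quick way to see the well-formedness rule in action is to hand candidate documents to a parser. The sketch below assumes Python's standard `xml.etree.ElementTree` (the slides name no tools): it rejects mismatched tags and a second root element, and it treats `<x></x>` and `<x/>` as the same empty element.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:  # raised for mismatched tags, a second root, etc.
        return False

good = "<book><title>Foundations of Databases</title><author>Abiteboul</author></book>"
bad_nesting = "<book><title>Foundations of Databases</book></title>"
two_roots = "<book></book><book></book>"

print(is_well_formed(good))         # True
print(is_well_formed(bad_nesting))  # False
print(is_well_formed(two_roots))    # False

# <x></x> and its abbreviation <x/> parse to the same empty element.
long_form, short_form = ET.fromstring("<x></x>"), ET.fromstring("<x/>")
print(long_form.tag == short_form.tag and long_form.text is None and short_form.text is None)  # True
```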
17
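More XML: Attributes
[Example: the book Foundations of Databases (Abiteboul …, 1995), with some of its data stored in attributes.]
– attributes are single-valued
– there is no guidance on when to use attributes and when to use sub-elements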
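The single-valued restriction can be checked directly with a parser. This is a sketch using Python's `xml.etree.ElementTree`; the `<book year="…">` markup is hypothetical, since the slide's own tags were lost. Repeated sub-elements hold several authors, but repeating an attribute name is rejected as not well-formed.

```python
import xml.etree.ElementTree as ET

def accepts(text: str) -> bool:
    """True if the parser accepts `text` as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical markup: the year is an attribute, the authors are sub-elements.
book = ET.fromstring(
    '<book year="1995">'
    "<title>Foundations of Databases</title>"
    "<author>Abiteboul</author><author>Hull</author><author>Vianu</author>"
    "</book>"
)
print(book.get("year"))                          # 1995
print([a.text for a in book.findall("author")])  # ['Abiteboul', 'Hull', 'Vianu']

# An attribute name may appear only once per element, so a second author
# cannot be stored as a second "author" attribute.
print(accepts('<book author="Abiteboul" author="Hull"/>'))  # False
```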
18
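More XML: OIDs and References
[Example: person elements for Jane, Mary and John that refer to one another through object identifiers (ID attributes) and references to those identifiers.]
OIDs and references in XML are just syntax.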
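Because OIDs and references are "just syntax", a plain XML parser does not follow them; an application has to index the identifiers and resolve references itself. A minimal sketch, assuming hypothetical attribute names `id`, `mother` and `children` (the slide's markup did not survive) and Python's `xml.etree.ElementTree`:

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: "id" holds an object identifier; "mother" and "children"
# hold whitespace-separated references to other identifiers (IDREF/IDREFS style).
PEOPLE = """<people>
  <person id="o555"><name>Jane</name></person>
  <person id="o456" children="o123 o555"><name>Mary</name></person>
  <person id="o123" mother="o456"><name>John</name></person>
</people>"""

root = ET.fromstring(PEOPLE)
# ElementTree treats "id" like any other attribute, so build the OID index by hand.
by_id = {p.get("id"): p for p in root.iter("person")}

def deref(person: ET.Element, attr: str) -> list:
    """Follow the references stored in `attr` and return the referenced names."""
    return [by_id[ref].findtext("name") for ref in person.get(attr, "").split()]

john, mary = by_id["o123"], by_id["o456"]
print(deref(john, "mother"))    # ['Mary']
print(deref(mary, "children"))  # ['John', 'Jane']
```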
19
XML vs. Relational Data
– XML is meant as a language that supports both text and structured data
 – conflicting demands...
– XML supports semi-structured data
 – in essence, the schema can be the union of multiple schemas
 – easy to represent books with or without prices, books with any number of authors, etc.
– XML supports free mixing of text and data
 – using the #PCDATA type
– XML is ordered (while relational data is unordered)
[Figure: spectrum from less structure to more structure: text, XML, structured (relational) data]
20
DTDs
[Example: two DTDs of the form <!DOCTYPE paper [ … ]>, one of them labeled “semi-structured”.]
Notice that a DTD is not in XML syntax…
21
XML Schemas
– a more recent proposal (with XML syntax)
– unifies previous schema proposals
– generalizes DTDs
– uses XML syntax
– two documents: structure and datatypes
 – http://www.w3.org/TR/xmlschema-1
 – http://www.w3.org/TR/xmlschema-2
22
XML Schema
23
RDF: Metadata Standard for the Web
[Example: RDF metadata with the values “birds, butterflies, snakes” and “John Smith”.]
Good ol’ semantic networks..?
24
Querying XML
Requirements:
– Need to handle the lack of a schema. We may not know much about the data, so we need to navigate the XML.
– Need to support both “information retrieval” and “SQL-style” queries.
– Ordered vs. unordered XML
– “Human readable” like SQL?
Candidates:
– many… based on conflicting requirements
– XSL: makes IR folks happy
– XML-QL: makes DB folks happy
– XQuery: W3C’s attempt to make everybody (un)happy
25
XQuery Resources
– XQuery 1.0: An XML Query Language, W3C Working Draft, 20 December 2001
– XML Query Use Cases, W3C Working Draft, 20 December 2001
– Microsoft .NET XQuery Language Demo: http://131.107.228.20/
– http://support.x-hive.com/xquery/index.html
 – supports querying on the documents described in the W3C Use Cases
– XQuery Tutorial by Fankhauser & Wadler: www.research.avayalabs.com/user/wadler/papers/xquery-tutorial/xquery-tutorial.pdf
26
FLoWeR Expressions
XQuery queries are made up of FLWR expressions that work on “paths”:
– For binds a variable to each node of a sequence, one at a time
– Let binds a variable to a whole sequence of nodes (e.g., to compute aggregates over it)
– Where applies a formula to find matching elements
– Return constructs the output elements
Path expressions are of the form: element//element/element[@attrib = value]
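The four clauses can be mimicked with ordinary loops. The sketch below is a Python rendering of a hypothetical FLWR query (written in its docstring) over the bib.xml data used in this deck. The markup in `BIB_XML` is an assumption: the slides lost their tags, so tag names and year values follow the W3C XML Query Use Cases sample bib.xml, while titles, names, publishers and prices are the ones listed on the data slides.

```python
import xml.etree.ElementTree as ET

BIB_XML = """<bib>
 <book year="1994"><title>TCP/IP Illustrated</title>
  <author><last>Stevens</last><first>W.</first></author>
  <publisher>Addison-Wesley</publisher><price>65.95</price></book>
 <book year="1992"><title>Advanced Programming in the Unix environment</title>
  <author><last>Stevens</last><first>W.</first></author>
  <publisher>Addison-Wesley</publisher><price>65.95</price></book>
 <book year="2000"><title>Data on the Web</title>
  <author><last>Abiteboul</last><first>Serge</first></author>
  <author><last>Buneman</last><first>Peter</first></author>
  <author><last>Suciu</last><first>Dan</first></author>
  <publisher>Morgan Kaufmann Publishers</publisher><price>39.95</price></book>
 <book year="1999"><title>The Economics of Technology and Content for Digital TV</title>
  <editor><last>Gerbarg</last><first>Darcy</first><affiliation>CITI</affiliation></editor>
  <publisher>Kluwer Academic Publishers</publisher><price>129.95</price></book>
</bib>"""

def count_authors(bib: ET.Element) -> ET.Element:
    """Python rendering of the hypothetical query
         for $b in /bib/book
         let $a := $b/author
         where count($a) > 0
         return <entry authors="{count($a)}">{ $b/title }</entry>
    """
    results = ET.Element("results")
    for b in bib.findall("book"):      # for: bind $b to one book node at a time
        a = b.findall("author")        # let: bind $a to the whole sequence of authors
        if len(a) > 0:                 # where: keep only bindings that pass the test
            entry = ET.SubElement(results, "entry", authors=str(len(a)))  # return: build output
            ET.SubElement(entry, "title").text = b.findtext("title")
    return results

out = count_authors(ET.fromstring(BIB_XML))  # fromstring returns the <bib> element
print([(e.get("authors"), e.findtext("title")) for e in out])
```

The edited book (no author children) drops out in the `where` step, which is the role the slide gives to Where.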
27
Comparison to SQL
Look at the use case descriptions in the XQuery manual.
– Supports all (?) SQL-style queries (with different syntax, of course) [default queries in the demo]
– Has support for:
 – “construction”: outputting the answers in arbitrary XML formats (use case “XMP”)
 – “path expressions”: navigating the XML tree (use case “seq”)
 – simple text queries (use case “text”)
 – queries on “tag” elements, which removes the “data/meta-data” barrier in queries
Example: for each book that has at least one author, list the title and first two authors, and an empty "et-al" element if the book has additional authors. [XMP use case 6]
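XMP use case 6 combines selection, a positional cut and construction. Here is a Python sketch of the same logic with `xml.etree.ElementTree`; the output element names (`bib`, `book`, `et-al`) mirror the use-case wording, and the `BIB_XML` markup is assumed (tag names and years follow the W3C Use Cases sample bib.xml, the other values come from the data slides).

```python
import copy
import xml.etree.ElementTree as ET

BIB_XML = """<bib>
 <book year="1994"><title>TCP/IP Illustrated</title>
  <author><last>Stevens</last><first>W.</first></author>
  <publisher>Addison-Wesley</publisher><price>65.95</price></book>
 <book year="1992"><title>Advanced Programming in the Unix environment</title>
  <author><last>Stevens</last><first>W.</first></author>
  <publisher>Addison-Wesley</publisher><price>65.95</price></book>
 <book year="2000"><title>Data on the Web</title>
  <author><last>Abiteboul</last><first>Serge</first></author>
  <author><last>Buneman</last><first>Peter</first></author>
  <author><last>Suciu</last><first>Dan</first></author>
  <publisher>Morgan Kaufmann Publishers</publisher><price>39.95</price></book>
 <book year="1999"><title>The Economics of Technology and Content for Digital TV</title>
  <editor><last>Gerbarg</last><first>Darcy</first><affiliation>CITI</affiliation></editor>
  <publisher>Kluwer Academic Publishers</publisher><price>129.95</price></book>
</bib>"""

def first_two_authors(bib: ET.Element) -> ET.Element:
    out = ET.Element("bib")
    for b in bib.findall("book"):
        authors = b.findall("author")
        if not authors:                 # only books with at least one author
            continue
        book = ET.SubElement(out, "book")
        book.append(copy.deepcopy(b.find("title")))
        for a in authors[:2]:           # the first two authors, in document order
            book.append(copy.deepcopy(a))
        if len(authors) > 2:            # empty marker element for the remaining authors
            ET.SubElement(book, "et-al")
    return out

result = first_two_authors(ET.fromstring(BIB_XML))
summary = [(bk.findtext("title"),
            [a.findtext("last") for a in bk.findall("author")],
            bk.find("et-al") is not None) for bk in result]
print(summary)
```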
28
DTD for http://www.bn.com/bib.xml
29
Example Query
“For all Addison-Wesley books published after 1991, return the title, with the year changed from a tag to an attribute”
Query:
<bib>
 {
  for $b in /bib/book
  where $b/publisher = "Addison-Wesley" and $b/@year > 1991
  return
    <book year="{ $b/@year }">
      { $b/title }
    </book>
 }
</bib>
Result: TCP/IP Illustrated; Advanced Programming in the Unix environment
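To see what the query computes, here is a Python sketch that evaluates the same filter and builds the same output shape with `xml.etree.ElementTree`. The `BIB_XML` markup is an assumption (tag names and years follow the W3C Use Cases sample bib.xml; the other values come from the data slides).

```python
import xml.etree.ElementTree as ET

BIB_XML = """<bib>
 <book year="1994"><title>TCP/IP Illustrated</title>
  <author><last>Stevens</last><first>W.</first></author>
  <publisher>Addison-Wesley</publisher><price>65.95</price></book>
 <book year="1992"><title>Advanced Programming in the Unix environment</title>
  <author><last>Stevens</last><first>W.</first></author>
  <publisher>Addison-Wesley</publisher><price>65.95</price></book>
 <book year="2000"><title>Data on the Web</title>
  <author><last>Abiteboul</last><first>Serge</first></author>
  <author><last>Buneman</last><first>Peter</first></author>
  <author><last>Suciu</last><first>Dan</first></author>
  <publisher>Morgan Kaufmann Publishers</publisher><price>39.95</price></book>
 <book year="1999"><title>The Economics of Technology and Content for Digital TV</title>
  <editor><last>Gerbarg</last><first>Darcy</first><affiliation>CITI</affiliation></editor>
  <publisher>Kluwer Academic Publishers</publisher><price>129.95</price></book>
</bib>"""

def addison_wesley_after_1991(bib: ET.Element) -> ET.Element:
    out = ET.Element("bib")
    for b in bib.findall("book"):
        # 1991 is a number literal in the query, so the year is compared numerically.
        if b.findtext("publisher") == "Addison-Wesley" and int(b.get("year")) > 1991:
            book = ET.SubElement(out, "book", year=b.get("year"))  # year as an attribute
            ET.SubElement(book, "title").text = b.findtext("title")
    return out

result = addison_wesley_after_1991(ET.fromstring(BIB_XML))
print([(bk.get("year"), bk.findtext("title")) for bk in result])
# [('1994', 'TCP/IP Illustrated'), ('1992', 'Advanced Programming in the Unix environment')]
```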
30
Example Query (2): Join
Return the books that cost more at Amazon than at Fatbrain:
let $amazon := document("http://www.amazon.com/books.xml")
let $fatbrain := document("http://www.fatbrain.com/books.xml")
for $am in $amazon/books/book,
    $fat in $fatbrain/books/book
where $am/isbn = $fat/isbn and $am/price > $fat/price
return { $am/title, $am/price, $fat/price }
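The two `for` variables plus a `where` on shared ISBNs make this a join. The Python sketch below evaluates it as a nested loop, the most literal reading of the query (a real engine might use a hash join instead). Both stores' documents are hypothetical: the ISBNs and prices are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical contents of the two books.xml files (ISBNs and prices are invented).
AMAZON_XML = """<books>
 <book><title>Data on the Web</title><isbn>1111</isbn><price>34.95</price></book>
 <book><title>TCP/IP Illustrated</title><isbn>2222</isbn><price>65.95</price></book>
</books>"""
FATBRAIN_XML = """<books>
 <book><title>TCP/IP Illustrated</title><isbn>2222</isbn><price>59.95</price></book>
 <book><title>Data on the Web</title><isbn>1111</isbn><price>39.95</price></book>
</books>"""

def pricier_at_amazon(amazon: ET.Element, fatbrain: ET.Element) -> list:
    """Nested-loop evaluation of the for/for/where join."""
    rows = []
    for am in amazon.findall("book"):         # for $am in $amazon/books/book
        for fat in fatbrain.findall("book"):  #     $fat in $fatbrain/books/book
            if (am.findtext("isbn") == fat.findtext("isbn")
                    and float(am.findtext("price")) > float(fat.findtext("price"))):
                rows.append((am.findtext("title"), am.findtext("price"), fat.findtext("price")))
    return rows

rows = pricier_at_amazon(ET.fromstring(AMAZON_XML), ET.fromstring(FATBRAIN_XML))
print(rows)  # [('TCP/IP Illustrated', '65.95', '59.95')]
```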
31
XML frenzy in the DB Community
Now that XML is there, what can we do with it?
– Convert all databases from relational to XML? Or provide XML views of relational databases?
– Develop a theory of native XML databases? Or assume that XML data will be stored in relational databases?
– Issues: what sort of storage mechanisms? What sort of indices?
32
XML middleware for Databases
XML adapters (middleware) received significant attention in the DB community:
– SilkRoute (AT&T)
– Xperanto (IBM)
Issues:
– Need to convert relational data into XML
 – tagging (easy)
– Need to convert XQuery queries into equivalent SQL queries
 – trickier, as XQuery supports schema querying
33
Don’t look beyond this.
34
XQuery Tutorial
Craig Knoblock, University of Southern California
37
Data for www.bn.com/bib.xml
– TCP/IP Illustrated; author: Stevens W.; publisher: Addison-Wesley; price: 65.95
– Advanced Programming in the Unix environment; author: Stevens W.; publisher: Addison-Wesley; price: 65.95
38
Data for www.bn.com/bib.xml (cont.)
– Data on the Web; authors: Abiteboul Serge, Buneman Peter, Suciu Dan; publisher: Morgan Kaufmann Publishers; price: 39.95
– The Economics of Technology and Content for Digital TV; editor: Gerbarg Darcy (CITI); publisher: Kluwer Academic Publishers; price: 129.95
39
Document References
A document can be referenced either explicitly or through the default namespace.
– In the Microsoft demo: /bib = document("http://www.bn.com/bib.xml")/bib
– We will use /bib throughout, but you must use the expansion to run the demo.
– In Theseus, the document for the XQuery is passed as input.
40
Projection
Return the names of all authors of books:
/bib/book/author
= Stevens W., Abiteboul Serge, Buneman Peter, Suciu Dan
41
Projection (cont.)
The same query can also be written as a for loop:
/bib/book/author
=
for $bk in /bib/book return
  for $aut in $bk/author return $aut
= Stevens W., Abiteboul Serge, Buneman Peter, Suciu Dan
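The equivalence of the path and the nested loops can be checked mechanically. This Python sketch uses `xml.etree.ElementTree` and assumed `BIB_XML` markup (tag names and years follow the W3C Use Cases sample bib.xml; values come from the data slides). Note that it returns Stevens twice: the two books each have their own author element, and a path expression returns every matching node.

```python
import xml.etree.ElementTree as ET

BIB_XML = """<bib>
 <book year="1994"><title>TCP/IP Illustrated</title>
  <author><last>Stevens</last><first>W.</first></author>
  <publisher>Addison-Wesley</publisher><price>65.95</price></book>
 <book year="1992"><title>Advanced Programming in the Unix environment</title>
  <author><last>Stevens</last><first>W.</first></author>
  <publisher>Addison-Wesley</publisher><price>65.95</price></book>
 <book year="2000"><title>Data on the Web</title>
  <author><last>Abiteboul</last><first>Serge</first></author>
  <author><last>Buneman</last><first>Peter</first></author>
  <author><last>Suciu</last><first>Dan</first></author>
  <publisher>Morgan Kaufmann Publishers</publisher><price>39.95</price></book>
 <book year="1999"><title>The Economics of Technology and Content for Digital TV</title>
  <editor><last>Gerbarg</last><first>Darcy</first><affiliation>CITI</affiliation></editor>
  <publisher>Kluwer Academic Publishers</publisher><price>129.95</price></book>
</bib>"""

root = ET.fromstring(BIB_XML)  # root is the <bib> element

# Path form: /bib/book/author
by_path = root.findall("book/author")

# Loop form: for $bk in /bib/book return for $aut in $bk/author return $aut
by_loops = [aut for bk in root.findall("book") for aut in bk.findall("author")]

names = [f"{a.findtext('last')} {a.findtext('first')}" for a in by_path]
print(by_path == by_loops)  # True: the very same element objects, in the same order
print(names)
```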
42
Selection
Return the titles of all books published before 1997:
/bib/book[@year < "1997"]/title
= TCP/IP Illustrated, Advanced Programming in the Unix environment
43
Selection (cont.)
Return the titles of all books published before 1997:
/bib/book[@year < "1997"]/title
=
for $bk in /bib/book
where $bk/@year < "1997"
return $bk/title
= TCP/IP Illustrated, Advanced Programming in the Unix environment
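A Python sketch of the selection, again over assumed `BIB_XML` markup (tag names and years follow the W3C Use Cases sample bib.xml). ElementTree's XPath subset has no `<` predicate, so the filter is a comprehension. Because the query compares against the string `"1997"`, the attribute is compared as text (the XQuery 1.0 rule for untyped data compared with a string); for four-digit years this gives the same answer as a numeric comparison.

```python
import xml.etree.ElementTree as ET

BIB_XML = """<bib>
 <book year="1994"><title>TCP/IP Illustrated</title>
  <author><last>Stevens</last><first>W.</first></author>
  <publisher>Addison-Wesley</publisher><price>65.95</price></book>
 <book year="1992"><title>Advanced Programming in the Unix environment</title>
  <author><last>Stevens</last><first>W.</first></author>
  <publisher>Addison-Wesley</publisher><price>65.95</price></book>
 <book year="2000"><title>Data on the Web</title>
  <author><last>Abiteboul</last><first>Serge</first></author>
  <author><last>Buneman</last><first>Peter</first></author>
  <author><last>Suciu</last><first>Dan</first></author>
  <publisher>Morgan Kaufmann Publishers</publisher><price>39.95</price></book>
 <book year="1999"><title>The Economics of Technology and Content for Digital TV</title>
  <editor><last>Gerbarg</last><first>Darcy</first><affiliation>CITI</affiliation></editor>
  <publisher>Kluwer Academic Publishers</publisher><price>129.95</price></book>
</bib>"""

root = ET.fromstring(BIB_XML)

# where $bk/@year < "1997": string comparison, as the quoted literal implies
as_strings = [bk.findtext("title") for bk in root.findall("book") if bk.get("year") < "1997"]
# the numeric reading gives the same result for these four-digit years
as_numbers = [bk.findtext("title") for bk in root.findall("book") if int(bk.get("year")) < 1997]

print(as_strings)  # ['TCP/IP Illustrated', 'Advanced Programming in the Unix environment']
```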
44
Selection (cont.)
Return the book with the title “Data on the Web”:
/bib/book[title = "Data on the Web"]
= Data on the Web; Abiteboul Serge, Buneman Peter, Suciu Dan; Morgan Kaufmann Publishers; 39.95
45
Selection (cont.)
Return the price of the book “Data on the Web”:
/bib/book[title = "Data on the Web"]/price
= 39.95
How would you return the book with a price of $39.95?
46
Selection (cont.)
Return the book with a price of $39.95:
for $bk in /bib/book
where $bk/price = "39.95"
return $bk
= Data on the Web; Abiteboul Serge, Buneman Peter, Suciu Dan; Morgan Kaufmann Publishers; 39.95
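ElementTree happens to support exactly this kind of equality predicate (`[tag='text']`), so the last three slides translate almost verbatim. The sketch assumes the `BIB_XML` markup below (tag names and years follow the W3C Use Cases sample bib.xml). Because the match is on the exact string, a literal with a stray leading space such as `" 39.95"` would select nothing.

```python
import xml.etree.ElementTree as ET

BIB_XML = """<bib>
 <book year="1994"><title>TCP/IP Illustrated</title>
  <author><last>Stevens</last><first>W.</first></author>
  <publisher>Addison-Wesley</publisher><price>65.95</price></book>
 <book year="1992"><title>Advanced Programming in the Unix environment</title>
  <author><last>Stevens</last><first>W.</first></author>
  <publisher>Addison-Wesley</publisher><price>65.95</price></book>
 <book year="2000"><title>Data on the Web</title>
  <author><last>Abiteboul</last><first>Serge</first></author>
  <author><last>Buneman</last><first>Peter</first></author>
  <author><last>Suciu</last><first>Dan</first></author>
  <publisher>Morgan Kaufmann Publishers</publisher><price>39.95</price></book>
 <book year="1999"><title>The Economics of Technology and Content for Digital TV</title>
  <editor><last>Gerbarg</last><first>Darcy</first><affiliation>CITI</affiliation></editor>
  <publisher>Kluwer Academic Publishers</publisher><price>129.95</price></book>
</bib>"""

root = ET.fromstring(BIB_XML)

# /bib/book[title = "Data on the Web"]/price
print(root.findtext("book[title='Data on the Web']/price"))  # 39.95

# /bib/book[price = "39.95"]: exact string match on the price text
matches = root.findall("book[price='39.95']")
print([bk.findtext("title") for bk in matches])  # ['Data on the Web']

# A leading space makes the comparison fail.
print(root.findall("book[price=' 39.95']"))  # []
```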
47
Construction
Return the year and title of all books published before 1997:
for $bk in /bib/book
where $bk/@year < "1997"
return { $bk/@year, $bk/title }
= TCP/IP Illustrated, Advanced Programming in the Unix environment
48
Grouping
Return the titles for each author:
for $author in distinct(/bib/book/author/last)
return { /bib/book[author/last = $author]/title }
= TCP/IP Illustrated, Advanced Programming in the Unix environment, Data on the Web, …
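Grouping here is "distinct keys, then a correlated selection per key". The Python sketch below mirrors that two-step shape with `xml.etree.ElementTree` over assumed `BIB_XML` markup (tag names and years follow the W3C Use Cases sample bib.xml). `dict.fromkeys` keeps first-seen order; the XQuery duplicate-elimination function itself does not promise any particular order.

```python
import xml.etree.ElementTree as ET

BIB_XML = """<bib>
 <book year="1994"><title>TCP/IP Illustrated</title>
  <author><last>Stevens</last><first>W.</first></author>
  <publisher>Addison-Wesley</publisher><price>65.95</price></book>
 <book year="1992"><title>Advanced Programming in the Unix environment</title>
  <author><last>Stevens</last><first>W.</first></author>
  <publisher>Addison-Wesley</publisher><price>65.95</price></book>
 <book year="2000"><title>Data on the Web</title>
  <author><last>Abiteboul</last><first>Serge</first></author>
  <author><last>Buneman</last><first>Peter</first></author>
  <author><last>Suciu</last><first>Dan</first></author>
  <publisher>Morgan Kaufmann Publishers</publisher><price>39.95</price></book>
 <book year="1999"><title>The Economics of Technology and Content for Digital TV</title>
  <editor><last>Gerbarg</last><first>Darcy</first><affiliation>CITI</affiliation></editor>
  <publisher>Kluwer Academic Publishers</publisher><price>129.95</price></book>
</bib>"""

def titles_by_author(bib: ET.Element) -> dict:
    groups = {}
    # distinct(/bib/book/author/last): drop repeated last names
    for last in dict.fromkeys(el.text for el in bib.findall("book/author/last")):
        # /bib/book[author/last = $author]/title
        groups[last] = [bk.findtext("title") for bk in bib.findall("book")
                        if any(a.findtext("last") == last for a in bk.findall("author"))]
    return groups

groups = titles_by_author(ET.fromstring(BIB_XML))
print(groups["Stevens"])  # ['TCP/IP Illustrated', 'Advanced Programming in the Unix environment']
```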
49
Join
Return the books that cost more at Amazon than at Fatbrain:
let $amazon := document("http://www.amazon.com/books.xml")
let $fatbrain := document("http://www.fatbrain.com/books.xml")
for $am in $amazon/books/book,
    $fat in $fatbrain/books/book
where $am/isbn = $fat/isbn and $am/price > $fat/price
return { $am/title, $am/price, $fat/price }
50
Example Query 1
<bib>
 {
  for $b in /bib/book
  where $b/publisher = "Addison-Wesley" and $b/@year > 1991
  return
    <book year="{ $b/@year }">
      { $b/title }
    </book>
 }
</bib>
What does this do?
51
Result Query 1
– TCP/IP Illustrated
– Advanced Programming in the Unix environment
52
Example Query 2
{
  for $b in document("http://www.bn.com/bib.xml")/bib/book,
      $t in $b/title,
      $a in $b/author
  return { $t } { $a }
}
53
Result Query 2
– TCP/IP Illustrated, Stevens
– Advanced Programming in the Unix environment, Stevens
– Data on the Web, Abiteboul
– Data on the Web, Buneman
– Data on the Web, Suciu
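Three `for` variables in one clause behave like nested loops, so the result has one row per (title, author) combination and a book with no authors yields no rows at all. A Python sketch of that behavior, over assumed `BIB_XML` markup (tag names and years follow the W3C Use Cases sample bib.xml):

```python
import xml.etree.ElementTree as ET

BIB_XML = """<bib>
 <book year="1994"><title>TCP/IP Illustrated</title>
  <author><last>Stevens</last><first>W.</first></author>
  <publisher>Addison-Wesley</publisher><price>65.95</price></book>
 <book year="1992"><title>Advanced Programming in the Unix environment</title>
  <author><last>Stevens</last><first>W.</first></author>
  <publisher>Addison-Wesley</publisher><price>65.95</price></book>
 <book year="2000"><title>Data on the Web</title>
  <author><last>Abiteboul</last><first>Serge</first></author>
  <author><last>Buneman</last><first>Peter</first></author>
  <author><last>Suciu</last><first>Dan</first></author>
  <publisher>Morgan Kaufmann Publishers</publisher><price>39.95</price></book>
 <book year="1999"><title>The Economics of Technology and Content for Digital TV</title>
  <editor><last>Gerbarg</last><first>Darcy</first><affiliation>CITI</affiliation></editor>
  <publisher>Kluwer Academic Publishers</publisher><price>129.95</price></book>
</bib>"""

root = ET.fromstring(BIB_XML)

pairs = [(t.text, a.findtext("last"))
         for b in root.findall("book")   # for $b in .../bib/book,
         for t in b.findall("title")     #     $t in $b/title,
         for a in b.findall("author")]   #     $a in $b/author
for title, last in pairs:
    print(title, "|", last)
```

The edited book contributes nothing because its `author` sequence is empty, which matches the five rows on the result slide.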
54
Example Query 3
{
  for $b in document("http://www.bn.com/bib.xml")//book,
      $a in document("http://www.amazon.com/reviews.xml")//entry
  where $b/title = $a/title
  return { $b/title } { $a/price/text() } { $b/price/text() }
}
55
Result Query 3
– TCP/IP Illustrated: 65.95
– Advanced Programming in the Unix environment: 65.95
– Data on the Web: 34.95, 39.95
56
Example Query 4
{
  for $b in document("http://www.bn.com/bib.xml")//book
  where $b/publisher = "Addison-Wesley" and $b/@year > "1991"
  return { $b/@year } { $b/title }
  sortby (title)
}
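`sortby` is the 2001 working-draft syntax; the final XQuery 1.0 Recommendation expresses the same idea with an `order by` clause inside the FLWOR expression. The Python sketch below filters and then sorts on the title, over assumed `BIB_XML` markup (tag names and years follow the W3C Use Cases sample bib.xml).

```python
import xml.etree.ElementTree as ET

BIB_XML = """<bib>
 <book year="1994"><title>TCP/IP Illustrated</title>
  <author><last>Stevens</last><first>W.</first></author>
  <publisher>Addison-Wesley</publisher><price>65.95</price></book>
 <book year="1992"><title>Advanced Programming in the Unix environment</title>
  <author><last>Stevens</last><first>W.</first></author>
  <publisher>Addison-Wesley</publisher><price>65.95</price></book>
 <book year="2000"><title>Data on the Web</title>
  <author><last>Abiteboul</last><first>Serge</first></author>
  <author><last>Buneman</last><first>Peter</first></author>
  <author><last>Suciu</last><first>Dan</first></author>
  <publisher>Morgan Kaufmann Publishers</publisher><price>39.95</price></book>
 <book year="1999"><title>The Economics of Technology and Content for Digital TV</title>
  <editor><last>Gerbarg</last><first>Darcy</first><affiliation>CITI</affiliation></editor>
  <publisher>Kluwer Academic Publishers</publisher><price>129.95</price></book>
</bib>"""

root = ET.fromstring(BIB_XML)

rows = [(bk.get("year"), bk.findtext("title"))
        for bk in root.iter("book")                    # //book: books at any depth
        if bk.findtext("publisher") == "Addison-Wesley"
        and bk.get("year") > "1991"]                   # string comparison, as with "1991"
rows.sort(key=lambda row: row[1])                      # sortby (title)
print(rows)
# [('1992', 'Advanced Programming in the Unix environment'), ('1994', 'TCP/IP Illustrated')]
```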
57
Example Result 4
– Advanced Programming in the Unix environment
– TCP/IP Illustrated
58
Impact of XML on Integration
If and when all sources accept XQueries and exchange data in XML format, then:
– the mediator can accept user queries in XQuery
– access sources using XQuery
– get data back in XML format
– merge results and send them to the user in XML format
How about now?
– Sources can use XML adapters (middleware)
59
Is XML standardization a magical solution for integration?
If all web sources standardize on the XML format:
– source access (wrapper generation issues) becomes easier to manage
– BUT all the other problems remain:
 – we still need to relate source (XML) schemas to the mediator (XML) schema
 – we still need to reason about source overlap, source access limitations, etc.
 – we still need to manage execution in the presence of source/network uncertainties
60
“Semantic Web”
The LAV/GAV approaches assume that some human expert will do the actual schema mapping.
The “semantic web” initiative attempts to automate schema mapping:
– Idea: allow pages to write logical axioms relating their vocabulary (tags) to other, external tags
– Support automatic inference of relations between source and mediator schemas using these rules
DAML+OIL
68
Data Model
69
Which will have XML Syntax
70
Document Type Definition: DTD
– part of the original XML specification
– an XML document may have a DTD
– terminology for XML:
 – well-formed: if tags are correctly closed
 – valid: if it has a DTD and conforms to it
– validation is useful in data exchange
71
Notice that a DTD is not in XML syntax…
72
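Two ways to specify a DTD
– External: the DTD is kept in a separate file that the document refers to
– Internal: the DTD is embedded in the document, e.g. <!DOCTYPE greeting [ … ]>
[Both examples are documents containing the greeting “Hello, world!”.]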
74
DTDs as Grammars
[Example: DTDs of the form <!DOCTYPE paper [ … ]>.]
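Reading a DTD as a grammar means each element declaration is a regular expression over the names of the element's children. The sketch below makes that concrete for one hypothetical rule, `<!ELEMENT paper (title, author+, year?)>`, hand-translated into a Python `re` pattern. It only checks content models; attributes, `#PCDATA` and the rest of DTD validation are out of scope.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical DTD rule:  <!ELEMENT paper (title, author+, year?)>
# The same content model as a regular expression over child tag names;
# each name is followed by a space so that names cannot run together.
PAPER_MODEL = re.compile(r"title (author )+(year )?")

def conforms(elem: ET.Element, model) -> bool:
    """True if the sequence of elem's child tags is a word of the content model."""
    word = "".join(child.tag + " " for child in elem)
    return model.fullmatch(word) is not None

ok = ET.fromstring("<paper><title/><author/><author/><year/></paper>")
no_author = ET.fromstring("<paper><title/></paper>")
wrong_order = ET.fromstring("<paper><author/><title/></paper>")

print(conforms(ok, PAPER_MODEL))           # True
print(conforms(no_author, PAPER_MODEL))    # False: author+ needs at least one author
print(conforms(wrong_order, PAPER_MODEL))  # False: title must come first
```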
77
Shortcomings of DTDs
Useful for documents, but not so good for data:
– No support for structural re-use
 – object-oriented-like structures aren’t supported
– No support for data types
 – can’t do data validation
– Can have a single key item (ID), but:
 – no support for multi-attribute keys
 – no support for foreign keys (references to other keys)
 – no constraints on IDREFs (e.g., “reference only a Section”)
78
XML Schema
– in XML format
– includes primitive data types (integers, strings, dates, etc.)
– supports value-based constraints (integers > 100)
– user-definable structured types
– inheritance (extension or restriction)
– foreign keys
– element-type reference constraints
79
XML Schemas
[Figure: a DTD with pre-specified tags.] How many different RDBMS schemas are needed here?
80
Sample XML Schema …
81
Subtyping in XML Schema
[Example using the path .//person[@ssn] and the attribute @ssn.]
82
DTDs as Schemas
Not so well suited:
– they impose unwanted constraints on order
– references cannot be constrained
– they can be too vague: a union of schemas..?
84
Although DB folks have several beefs…
Example: give me the names of people who are listed either as editor or as author of a book.
86
Differences between XML and SSD (semi-structured data)
– Pure SSD uses edge-labeled graphs as its data model
– XML is ordered; SSD is not
– XML can mix text and elements [example: a talk, “Making Java easier to type and easier to type”, by Phil Wadler]
– XML has lots of other stuff: entities, processing instructions, comments
87
XML vs. standard semi-structured data models
[Figure: the same person record (name Alan, age 42, email ab@com) drawn as an XML tree with node labeling and as a semi-structured graph with edge labeling.]
Semi-structured syntax: { person: &o123 { name: “Alan”, age: 42, email: “ab@com” } }
With a reference: { person: { father: &o123 … } }
The two models are similar on trees, different on graphs.
89
XML seen from the (R)DBMS world
– An RDBMS may want to “publish” data in XML [provide an XML view of its data]
 – “tagging” the output
 – support XML-based querying (converted to SQL queries)
  – a single XML-QL query may correspond to a set of SQL queries, e.g. schema queries
  – SilkRoute, Xperanto systems
 – support XML-based updating
  – Tukwila
– An RDBMS can be used to provide efficient storage for XML files
 – efficient indexing/retrieval of path expressions
90
Other Important XML Standards
– XSL/XSLT*: presentation and transformation standards
– RDF: resource description framework (meta-info such as ratings, categorizations, etc.)
– XPath/XPointer/XLink*: standards for linking to documents and to elements within them
– Namespaces: for resolving name clashes
– DOM: Document Object Model for manipulating XML documents
– SAX: Simple API for XML parsing
91
RDF
– http://www.w3.org/TR/REC-rdf-syntax (2/99)
– purpose: metadata for the Web
 – help search engines
– syntax in XML
– semantics: edge-labeled graphs
92
RDF: Metadata standard
[Example: RDF metadata with the values “birds, butterflies, snakes” and “John Smith”.]
93
More RDF Examples
95
RDF Terminology
– subject, predicate, object
– a (subject, predicate, object) triple is a statement
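An RDF graph is just a set of statements, so a list of triples is enough to experiment with. The resource URI and the property names `topic` and `author` below are hypothetical, loosely modeled on the birds/John Smith example; this is a plain-Python sketch, not an RDF library.

```python
from typing import NamedTuple

class Statement(NamedTuple):
    """One RDF statement: (subject, predicate, object)."""
    subject: str
    predicate: str
    object: str

# Hypothetical resource and property names.
PAGE = "http://www.example.org/page.html"
graph = [
    Statement(PAGE, "topic", "birds, butterflies, snakes"),
    Statement(PAGE, "author", "John Smith"),
]

def objects(graph: list, subject: str, predicate: str) -> list:
    """All objects o such that (subject, predicate, o) is in the graph."""
    return [s.object for s in graph if s.subject == subject and s.predicate == predicate]

print(objects(graph, PAGE, "author"))  # ['John Smith']
```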
96
More RDF: Containers
– bag, sequence, alternative
[Example: a container holding the resources s1 and s2.]
97
RDF Containers (cont’d)
[Figure: a container node a with rdf:type Bag; its members s1 and s2 are attached through the properties rdf:_1 and rdf:_2.]
98
More RDF: Higher-Order Statements
“The author of www.thispage.com says: ‘the topic of www.thatpage.com is environment’”
[Figure: the statement (www.thatpage.com, topic, environment) is itself a node; www.thispage.com has an author, who says that statement.]
RDF uses reification for this.
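Reification turns the inner statement into a resource described by `rdf:subject`, `rdf:predicate` and `rdf:object` (RDF's own reification vocabulary), so other triples can point at it. In this sketch the identifiers `stmt1` and `_:a` (a blank node for the unnamed author) are hypothetical. The inner claim is described but never asserted: the triple itself is not in the set.

```python
# Hypothetical identifiers: "stmt1" names the reified statement,
# "_:a" is a blank node standing for the author of www.thispage.com.
triples = {
    ("stmt1", "rdf:type", "rdf:Statement"),
    ("stmt1", "rdf:subject", "www.thatpage.com"),
    ("stmt1", "rdf:predicate", "topic"),
    ("stmt1", "rdf:object", "environment"),
    ("www.thispage.com", "author", "_:a"),
    ("_:a", "says", "stmt1"),
}

def value(subject: str, predicate: str) -> str:
    """The single object of (subject, predicate, ?); fails if there is not exactly one."""
    (obj,) = [o for s, p, o in triples if s == subject and p == predicate]
    return obj

def what_author_says(page: str) -> tuple:
    stmt = value(value(page, "author"), "says")
    return (value(stmt, "rdf:subject"), value(stmt, "rdf:predicate"), value(stmt, "rdf:object"))

print(what_author_says("www.thispage.com"))  # ('www.thatpage.com', 'topic', 'environment')
```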
100
XML Parsers
– traditional: return a data structure (DOM?)
– event-based: SAX (Simple API for XML)
 – http://www.megginson.com/SAX
 – write a handler for start tags and for end tags
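The event-based style means writing callbacks rather than walking a tree. Python's standard `xml.sax` module follows the SAX model, so a handler with `startElement`/`endElement` callbacks can be sketched directly. The handler class and the small document are illustrative assumptions.

```python
import xml.sax

class TitleHandler(xml.sax.ContentHandler):
    """Collects the text of every <title> element from SAX events."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buffer = None  # text chunks while inside <title>, otherwise None

    def startElement(self, name, attrs):  # called for every start tag
        if name == "title":
            self._buffer = []

    def characters(self, content):        # text may arrive in several chunks
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):           # called for every end tag
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._buffer = None

DOC = b"""<bib>
 <book year="1994"><title>TCP/IP Illustrated</title></book>
 <book year="2000"><title>Data on the Web</title></book>
</bib>"""

handler = TitleHandler()
xml.sax.parseString(DOC, handler)
print(handler.titles)  # ['TCP/IP Illustrated', 'Data on the Web']
```

No tree is ever built: the handler keeps only the state it needs, which is why event-based parsing suits very large documents.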
101
Need for Ontology standardization
102
XML Data Model does not exist
Document Object Model (DOM):
– http://www.w3.org/TR/REC-DOM-Level-1 (10/98)
– class hierarchy (node, element, attribute, …)
– objects have behavior
– defines an API to inspect/modify the document
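Python's `xml.dom.minidom` is one small implementation of the DOM interfaces, which makes it handy for seeing "objects with behavior" in practice: the document is inspected and then modified through node methods. The document content here is illustrative.

```python
from xml.dom import minidom

doc = minidom.parseString('<bib><book year="1994"><title>TCP/IP Illustrated</title></book></bib>')

# Inspect: navigate the node hierarchy through the DOM API.
book = doc.getElementsByTagName("book")[0]
print(book.getAttribute("year"))                              # 1994
print(book.getElementsByTagName("title")[0].firstChild.data)  # TCP/IP Illustrated

# Modify: create a new element node with a text child and attach it.
price = doc.createElement("price")
price.appendChild(doc.createTextNode("65.95"))
book.appendChild(price)
print(doc.documentElement.toxml())
```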
108
Start of 4/9 lecture
109
Querying XML
110
XML Data Model (Graph)
Issues:
– Should we distinguish between attributes and sub-elements?
– Should we conserve order?
Think of the labels as names of binary relations.
111
Need for XML querying
– Human-readable documents: to retrieve individual documents, to provide dynamic indexes, to perform context-sensitive searching, and to generate new documents.
– Data-oriented documents: to query (virtual) XML representations of databases, to transform data into new XML representations, and to integrate data from multiple heterogeneous data sources.
– Mixed-model documents: to perform queries on documents with embedded data, such as catalogs, patient health records, employment records, or business analysis documents.
112
Querying XML
Requirements:
– Query a graph, not a relation.
– The result should be a graph (representing an XML document), not a relation.
– No schema.
– We may not know much about the data, so we need to navigate the XML.
113
W3C requirements
The W3C Query Working Group has identified many technical requirements:
– at least one XML syntax; at least one human-readable syntax
– must be declarative
– must be protocol independent
– must respect the XML data model
– must be namespace aware
– must coordinate with XML Schema
– must work even if schemas are unavailable
– must support simple and complex datatypes
– must support universal and existential quantifiers
– must support operations on the hierarchy and sequence of document structures
– must combine information from multiple documents
– must support aggregation
– must be able to transform and to create XML structures
– must be able to traverse ID references
114
Query Languages
– XML-QL: invented by DB folks
 – XML-QL is relationally complete (allows joins) and also supports path expressions
 – can extract as well as transform data into different formats (like XSL)
 – XML-QL is not in XML syntax
– XSL: can also be seen as a query language
 – can transform data
115
XML-QL data model XML-QL works on an abstraction, called an XML graph, of the concrete XML document: comments and processing instructions are ignored; the relative order of elements is ignored; every node has an ID (autogenerated, if necessary); all leaves are character data. XML graphs are obtained from XML documents but are also generated by queries. A graph is mapped back into an XML document by choosing arbitrary orderings of element sequences. This abstraction is very similar to that from tables to relations: disregard the order of tuples and attributes.
116
Extracting Data by Query
Matching data using element patterns:
WHERE Addison-Wesley $t $a IN “www.a.b.c/bib.xml”
CONSTRUCT $a
The WHERE clause only specifies what must be in the pattern; the matched data can contain other content besides what is listed.
117
Constructing XML Data
WHERE Addison-Wesley $t $a IN “www.a.b.c/bib.xml”
CONSTRUCT $a $t
118
Grouping with Nested Queries
WHERE $t, Addison-Wesley CONTENT_AS $p IN “www.a.b.c/bib.xml”
CONSTRUCT $t
  WHERE $a IN $p
  CONSTRUCT $a
119
Joining Elements by Value (also integration)
Find all articles whose writers also published a book after 1995:
WHERE $f $l ELEMENT_AS $e IN “www.a.b.c/artbib.xml”,
      $f $l IN “www.a.b.c/bookbib.xml”, y > 1995
CONSTRUCT $e
Multiple patterns that share values ($f, $l) express the join.
120
Tag variables (schema queries)
WHERE $t 1995 Smith IN "www.a.b.c/bib.xml", $e IN {author, editor}
CONSTRUCT $t Smith
– $p matches book and article.
– $e matches author and editor.
– This saves us from writing four queries.
This finds all publications in 1995 where Smith is either author or editor.
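A tag variable ranges over element names, so one pattern covers book/article × author/editor. The Python sketch below shows the same effect by testing tag names instead of fixing them. The markup (year as a sub-element) and the titles are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical bib.xml mixing books and articles, with the year as a sub-element.
BIB = """<bib>
 <book><title>Title A</title><year>1995</year><author>Smith</author></book>
 <article><title>Title B</title><year>1995</year><editor>Smith</editor></article>
 <article><title>Title C</title><year>1996</year><author>Smith</author></article>
 <book><title>Title D</title><year>1995</year><author>Jones</author></book>
</bib>"""

def smith_1995(bib: ET.Element) -> list:
    titles = []
    for p in bib:  # $p: a tag variable over the publication element names
        if p.tag not in {"book", "article"} or p.findtext("year") != "1995":
            continue
        # $e ranges over {author, editor}: one test instead of separate queries
        if any(e.tag in {"author", "editor"} and e.text == "Smith" for e in p):
            titles.append(p.findtext("title"))
    return titles

print(smith_1995(ET.fromstring(BIB)))  # ['Title A', 'Title B']
```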
121
Path Expressions
WHERE $r Ford IN "www.a.b.c/parts.xml"
CONSTRUCT $r
WHERE $r IN "www.a.b.c/parts.xml"
CONSTRUCT $r
The pattern matches any sequence of nodes all of which are labeled part (you can substitute $ for part in the above…).
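A regular path over labels only follows edges that carry the label: a Ford part reached through a non-part element is not matched, unlike a plain "any descendant" search. This Python sketch of zero-or-more `part` steps uses a hypothetical parts.xml in which the root is itself a part.

```python
import xml.etree.ElementTree as ET

# Hypothetical parts.xml: parts nest inside parts; one Ford part sits under a <kit> instead.
PARTS = """<part><name>car</name>
 <part><name>engine</name><brand>Ford</brand>
  <part><name>piston</name></part>
 </part>
 <part><name>wheel</name></part>
 <kit><part><name>jack</name><brand>Ford</brand></part></kit>
</part>"""

def part_star(elem: ET.Element):
    """Yield elem and every node reachable from it along a chain of 'part' edges."""
    yield elem
    for child in elem.findall("part"):
        yield from part_star(child)

root = ET.fromstring(PARTS)
all_parts = [p.findtext("name") for p in part_star(root)]
ford_parts = [p.findtext("name") for p in part_star(root) if p.findtext("brand") == "Ford"]
# iter() visits every descendant named part, ignoring the labels in between.
any_depth = [p.findtext("name") for p in root.iter("part") if p.findtext("brand") == "Ford"]

print(all_parts)   # ['car', 'engine', 'piston', 'wheel']
print(ford_parts)  # ['engine']
print(any_depth)   # ['engine', 'jack']
```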
122
Due 30th April