About XML/XQuery/RDF 4/1

Presentation transcript:

[Figure: a spectrum running from less structure to more structure, with text at one end, structured (relational) data at the other, and XML placed between them.]

HTML vs. XML
[Example markup not preserved: one bibliography (Foundations of Databases, Abiteboul, Hull, Vianu, Addison Wesley, 1995; Data on the Web, Abiteboul, Buneman, Suciu, Morgan Kaufmann, 1999) shown as HTML and as XML.]
XML is “self-describing”:
–Schema info is part of the data
–Good for data exchange (albeit baroque for storage)

[Example markup not preserved: the same bibliography (Foundations of Databases, Abiteboul, Hull, Vianu, Addison Wesley, 1995; Data on the Web, Abiteboul, Buneman, Suciu, Morgan Kaufmann, 1999) as HTML and as XML.]
HTML describes presentation
XML describes content

Why are database folks so excited about XML?
XML is just a syntax for (self-describing) data
This is still exciting because
–No standard syntax for relational data
–With XML, we can
  translate any legacy data to XML
  exchange data in XML format
–Ship over the web, input to any application

XML  machine accessible meaning This is what a web-page in natural language looks like for a machine Jim Hendler

XML  machine accessible meaning CV name education work private XML allows “meaningful tags” to be added to parts of the text Jim Hendler

XML  machine accessible meaning CV name education work private But to your machine, the tags look like this…. Jim Hendler

XML  machine accessible meaning Schemas help…. …by relating common terms between documents  Jim Hendler

But other people use other schemas
CV name education work private
Someone else has one like this….
Jim Hendler

But other people use other schemas …which don’t fit in  Moral: There is still need for ontology mapping.. Jim Hendler

The X-standards…
XML: an on-the-wire representation for data
–XQuery: a query language for XML
–XML Schema: a schema description language for XML data
RDF: a language for metadata description
WSDL/SOAP/UDDI: languages for describing services

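XML Terminology
tags: book, title, author, …
start tag: <book>, end tag: </book>
elements: <book>…</book>, <author>…</author>
elements are nested
empty element: an element with no content, which can be abbreviated to a single self-closing tag
an XML document: single root element
well-formed XML document: if it has matching tags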
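The terminology above can be checked mechanically. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` parser; the helper name `is_well_formed` and the sample strings are made up for illustration and are not part of the slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as XML: one root element, matching tags."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Properly nested elements under a single root.
good = "<book><title>Data on the Web</title><author>Suciu</author></book>"
# End tags appear in the wrong order, so the tags do not match.
bad_nesting = "<book><title>Data on the Web</book></title>"
# Two top-level elements, so there is no single root.
two_roots = "<book/><book/>"

results = [is_well_formed(s) for s in (good, bad_nesting, two_roots)]
```

Note that this only tests well-formedness; checking validity against a DTD needs a validating parser, which the standard library module used here does not provide.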

More XML: Attributes Foundations of Databases Abiteboul … 1995 Attributes are single-valued --No guidance on when to use them

More XML: Oids and References Jane Mary John oids and references in XML are just syntax Object identifiers
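Since oids and references are "just syntax", an application has to follow them itself. A minimal sketch of that lookup follows, assuming a hypothetical document in which `id`, `idref` and `mother` attributes carry the identifiers; the slide's own markup is not preserved, so every element and attribute name here is an assumption.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical data: the names and oids are illustrative only.
DOC = """
<people>
  <person id="o555"><name>Jane</name></person>
  <person id="o456"><name>Mary</name><children idref="o123 o555"/></person>
  <person id="o123" mother="o456"><name>John</name></person>
</people>
"""

root = ET.fromstring(DOC)

# ElementTree treats id/idref values as plain attribute strings,
# so we build the oid -> element index ourselves.
by_id = {p.get("id"): p for p in root.iter("person")}


def resolve(idrefs: str) -> List[str]:
    """Follow a whitespace-separated list of references; return the names."""
    return [by_id[ref].findtext("name") for ref in idrefs.split()]


children_of_mary = resolve(by_id["o456"].find("children").get("idref"))
mother_of_john = resolve(by_id["o123"].get("mother"))
```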

XML vs. Relational Data
XML is meant as a language that supports both text and structured data
–Conflicting demands...
XML supports semi-structured data
–In essence, the schema can be a union of multiple schemas
–Easy to represent books with or without prices, books with any number of authors, etc.
XML supports free mixing of text and data
–using the #PCDATA type
XML is ordered (while relational data is unordered)
[Figure: spectrum from less structure (text) to more structure (structured relational data), with XML between them.]

DTDs
<!DOCTYPE paper [ ]> …
Notice that a DTD is not in XML syntax…
Semi-structured

XML Schemas
More recent proposal (with XML syntax)
unifies previous schema proposals
generalizes DTDs
uses XML syntax
two documents: structure and datatypes

XML Schema

RDF: Meta-data Standard for Web birds, butterflies, snakes John Smith Good’ol semantic networks..?

Querying XML
Requirements:
–Need to handle lack of schema. We may not know much about the data, so we need to navigate the XML.
–Need to support both “information retrieval” and “SQL-style” queries. Ordered vs. unordered XML
–“Human readable” like SQL?
Candidates
–Many… based on conflicting requirements
XSL: Makes IR folks happy
XML-QL: Makes DB folks happy
XQuery: W3C’s attempt to make everybody (un)happy

XQuery Resources
XQuery 1.0: An XML Query Language
–W3C Working Draft 20 December 2001
XML Query Use Cases
–W3C Working Draft 20 December 2001
Microsoft .NET XQuery Language Demo
–hive.com/xquery/index.ht ml
–Supports querying on the documents described in the W3C Use Cases
XQuery Tutorial by Fankhauser & Wadler
–user/wadler/papers/xquery- tutorial/ xquery-tutorial.pdf

FLoWeR Expressions
XQuery queries are made up of FLWR expressions that work on “paths”
For binds variables to nodes
Let computes aggregates
Where applies a formula to find matching elements
Return constructs the output elements
Path expressions are of the form: element//element/element[attrib=value]
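To make the four clauses concrete, here is a rough Python analogue of one FLWR expression, written against the standard `xml.etree.ElementTree` module rather than an XQuery engine. The sample document, its `year` attribute values and the function name `flwr` are assumptions made for the sketch.

```python
import xml.etree.ElementTree as ET

# Illustrative data; the year values are assumed, not taken from the slides.
BIB = """
<bib>
  <book year="1994"><title>TCP/IP Illustrated</title><publisher>Addison-Wesley</publisher></book>
  <book year="1992"><title>Advanced Programming in the Unix environment</title><publisher>Addison-Wesley</publisher></book>
  <book year="2000"><title>Data on the Web</title><publisher>Morgan Kaufmann Publishers</publisher></book>
</bib>
"""


def flwr(bib: ET.Element) -> ET.Element:
    """Mimic: for $b in /bib/book where ... return a new book element."""
    result = ET.Element("bib")
    for b in bib.findall("book"):            # For: bind b to each book node
        year = int(b.get("year"))            # Let: derive a value from the binding
        publisher = b.findtext("publisher")
        if publisher == "Addison-Wesley" and year > 1991:         # Where
            out = ET.SubElement(result, "book", year=str(year))   # Return
            ET.SubElement(out, "title").text = b.findtext("title")
    return result


result = flwr(ET.fromstring(BIB))
titles = [t.text for t in result.iter("title")]
```

The loop makes the evaluation order explicit, which an XQuery processor is free to change; the sketch is only meant to show what each clause contributes.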

Comparison to SQL
Look at the use case descriptions in the XQuery manual
Supports all (?) SQL-style queries (with different syntax, of course) [default queries in the demo]
Has support for
–“construction”: outputting the answers in arbitrary XML formats (use case “XMP”)
–“path expressions”: navigating the XML tree (use case “seq”)
–Simple text queries [use case “text”]
–Allows queries on “Tag” elements
  Removes the “data/meta-data” barrier in queries
–For each book that has at least one author, list the title and first two authors, and an empty "et-al" element if the book has additional authors. [XMP use case 6]

DTD for

Example Query
“For all books after 1991, return with Year changed from a tag to an attribute”
Query: { for $b in /bib/book where $b/publisher = "Addison-Wesley" and $b/@year > 1991 return { $b/title } }
Result: TCP/IP Illustrated, Advanced Programming in the Unix environment

Example Query (2): Join
Return the books that cost more at amazon than fatbrain
let $amazon := document(…)
let $fatbrain := document(…)
for $am in $amazon/books/book, $fat in $fatbrain/books/book
where $am/isbn = $fat/isbn and $am/price > $fat/price
return { $am/title, $am/price, $fat/price }
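The join can be sketched outside XQuery as well. The version below assumes two small in-memory documents standing in for the amazon and fatbrain sources (their contents, ISBNs and prices are invented) and uses a dictionary lookup, i.e. a hash join, instead of the nested loop that the query text literally describes.

```python
import xml.etree.ElementTree as ET

# Invented sample data standing in for the two bookstore documents.
AMAZON = """<books>
  <book><isbn>1</isbn><title>Data on the Web</title><price>39.95</price></book>
  <book><isbn>2</isbn><title>TCP/IP Illustrated</title><price>65.95</price></book>
</books>"""
FATBRAIN = """<books>
  <book><isbn>1</isbn><title>Data on the Web</title><price>34.95</price></book>
  <book><isbn>2</isbn><title>TCP/IP Illustrated</title><price>69.95</price></book>
</books>"""


def pricier_at_amazon(amazon_xml: str, fatbrain_xml: str):
    """Return (title, amazon price, fatbrain price) where amazon costs more."""
    amazon = ET.fromstring(amazon_xml)
    fatbrain = ET.fromstring(fatbrain_xml)
    # Index one side by ISBN so each amazon book needs a single lookup.
    fat_by_isbn = {b.findtext("isbn"): b for b in fatbrain.findall("book")}
    rows = []
    for am in amazon.findall("book"):
        fat = fat_by_isbn.get(am.findtext("isbn"))
        if fat is None:
            continue  # no matching ISBN, so no join partner
        am_price = float(am.findtext("price"))
        fat_price = float(fat.findtext("price"))
        if am_price > fat_price:
            rows.append((am.findtext("title"), am_price, fat_price))
    return rows


rows = pricier_at_amazon(AMAZON, FATBRAIN)
```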

XML frenzy in the DB Community
Now that XML is there, what can we do with it?
–Convert all databases from relational to XML? Or provide XML views of relational databases?
–Develop theory of native XML databases? Or assume that XML data will be stored in relational databases..
–Issues: What sort of storage mechanisms? What sort of indices?

XML middleware for Databases
XML adapters (middleware) received significant attention in the DB community
–SilkRoute (AT&T)
–Xperanto (IBM)
Issues:
–Need to convert relational data into XML: tagging (easy)
–Need to convert XQuery queries into equivalent SQL queries: trickier, as XQuery supports schema querying

Don’t look beyond this..

Xquery Tutorial Craig Knoblock University of Southern California

References XQuery 1.0: An XML Query Language –W3C Working Draft 20 December 2001 XML Query Use Cases –W3C Working Draft 20 December 2001 Microsoft.Net Xquery Language Demo – –Supports querying on the documents described in the W3C Use Cases Xquery Tutorial by Fankhauser & Wadler – y-tutorial/ xquery-tutorial.pdf

DTD for

Data for TCP/IP Illustrated Stevens W. Addison-Wesley Advanced Programming in the Unix environment Stevens W. Addison-Wesley 65.95

Data for (cont.) Data on the Web Abiteboul Serge Buneman Peter Suciu Dan Morgan Kaufmann Publishers The Economics of Technology and Content for Digital TV Gerbarg Darcy CITI Kluwer Academic Publishers

Document References Document can either be referenced explicitly or in the default namespace In the Microsoft Demo –/Bib = document(" We will use /bib throughout, but you must use the expansion to run the demo In Theseus the document for xquery is passed as input

Projection Return the names of all authors of books /bib/book/author = Stevens W. Abiteboul Serge Buneman Peter Suciu Dan

Project (cont.) The same query can also be written as a for loop /bib/book/author = for $bk in /bib/book return for $aut in $bk/author return $aut = Stevens W. Abiteboul Serge Buneman Peter Suciu Dan
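The equivalence of the path form and the loop form can be tried with Python's `xml.etree.ElementTree`, which supports a small subset of path syntax. This is a sketch under assumptions: the `last`/`first` element names and the sample document are guesses at the lost markup.

```python
import xml.etree.ElementTree as ET

# Assumed structure: each author has <last> and <first> children.
BIB = """<bib>
  <book><title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author></book>
  <book><title>Data on the Web</title>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author></book>
</bib>"""

bib = ET.fromstring(BIB)  # bib is the <bib> root element

# Path form: relative to the root, "book/author" mirrors /bib/book/author.
path_authors = [a.findtext("last") for a in bib.findall("book/author")]

# Loop form: for $bk in /bib/book return for $aut in $bk/author return $aut
loop_authors = [
    aut.findtext("last")
    for bk in bib.findall("book")
    for aut in bk.findall("author")
]
```

Both forms return the authors in document order, which matches the ordered data model the slides describe.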

Selection
Return the titles of all books published before 1997
/bib/book[@year < "1997"]/title
= TCP/IP Illustrated, Advanced Programming in the Unix environment

Selection (cont.)
Return the titles of all books published before 1997
/bib/book[@year < "1997"]/title
= for $bk in /bib/book where $bk/@year < "1997" return $bk/title
= TCP/IP Illustrated, Advanced Programming in the Unix environment

Selection (cont.) Return book with the title “Data on the Web” /bib/book[title = "Data on the Web"] = Data on the Web Abiteboul Serge Buneman Peter Suciu Dan Morgan Kaufmann Publishers 39.95

Selection (cont.) Return the price of the book “Data on the Web” /bib/book[title = "Data on the Web"]/price = How would you return the book with a price of $39.95?

Selection (cont.)
Return the book with a price of $39.95
for $bk in /bib/book where $bk/price = "39.95" return $bk
= Data on the Web Abiteboul Serge Buneman Peter Suciu Dan Morgan Kaufmann Publishers 39.95
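The selection queries above can be approximated in Python. `xml.etree.ElementTree` accepts equality predicates such as `book[title='...']` but not comparisons like `<`, so the year and price tests are done in ordinary Python here. The year values in the sample data are assumptions; the prices come from the slides.

```python
import xml.etree.ElementTree as ET

# Year attribute values are assumed for illustration.
BIB = """
<bib>
  <book year="1994"><title>TCP/IP Illustrated</title><price>65.95</price></book>
  <book year="1992"><title>Advanced Programming in the Unix environment</title><price>65.95</price></book>
  <book year="2000"><title>Data on the Web</title><price>39.95</price></book>
</bib>
"""

bib = ET.fromstring(BIB)

# Titles of books published before 1997 (comparison done in Python).
before_1997 = [
    b.findtext("title") for b in bib.findall("book") if int(b.get("year")) < 1997
]

# The book titled "Data on the Web", using an equality predicate in the path.
web_book = bib.find("book[title='Data on the Web']")
web_price = web_book.findtext("price")

# Books priced at 39.95, comparing numerically rather than as strings.
at_3995 = [
    b.findtext("title")
    for b in bib.findall("book")
    if float(b.findtext("price")) == 39.95
]
```

Comparing the price as a number avoids the pitfall of string comparison, where stray whitespace inside the quoted literal would make the match fail.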

Construction
Return year and title of all books published before 1997
for $bk in /bib/book where $bk/@year < "1997" return { $bk/title }
= TCP/IP Illustrated, Advanced Programming in the Unix environment

Grouping Return titles for each author for $author in distinct(/bib/book/author/last) return { /bib/book[author/last = $author]/title } = TCP/IP Illustrated Advanced Programming in the Unix environment Data on the Web …
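A rough Python counterpart of the grouping query is sketched below. Instead of calling `distinct` and then re-querying the document per author, it groups in a single pass with a dictionary; the sample document and its `author/last` structure are assumptions.

```python
import xml.etree.ElementTree as ET

# Assumed structure: authors are identified by a <last> child.
BIB = """<bib>
  <book><title>TCP/IP Illustrated</title><author><last>Stevens</last></author></book>
  <book><title>Advanced Programming in the Unix environment</title><author><last>Stevens</last></author></book>
  <book><title>Data on the Web</title>
    <author><last>Abiteboul</last></author>
    <author><last>Buneman</last></author>
    <author><last>Suciu</last></author></book>
</bib>"""

bib = ET.fromstring(BIB)

# Map each distinct author last name to the titles of that author's books.
titles_by_author = {}
for book in bib.findall("book"):
    title = book.findtext("title")
    for last in book.findall("author/last"):
        titles_by_author.setdefault(last.text, []).append(title)
```

The dictionary keys play the role of `distinct(/bib/book/author/last)`, and each value list corresponds to `/bib/book[author/last = $author]/title`.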

Join
Return the books that cost more at amazon than fatbrain
let $amazon := document(…)
let $fatbrain := document(…)
for $am in $amazon/books/book, $fat in $fatbrain/books/book
where $am/isbn = $fat/isbn and $am/price > $fat/price
return { $am/title, $am/price, $fat/price }

Example Query 1
{ for $b in /bib/book where $b/publisher = "Addison-Wesley" and $b/@year > 1991 return { $b/title } }
What does this do?

Result Query 1 TCP/IP Illustrated Advanced Programming in the Unix environment

Example Query 2 { for $b in document(" $t in $b/title, $a in $b/author return { $t } { $a } }

Result Query 2 TCP/IP Illustrated Stevens Advanced Programming in the Unix environment Stevens Data on the Web Abiteboul Data on the Web Buneman Data on the Web Suciu

Example Query 3 { for $b in document(" $a in document(" where $b/title = $a/title return { $b/title } { $a/price/text() } { $b/price/text() } }

Result Query 3 TCP/IP Illustrated Advanced Programming in the Unix environment Data on the Web

Example Query 4
{ for $b in document(" where $b/publisher = "Addison-Wesley" and $b/@year > "1991" return { $b/@year } { $b/title } sortby (title) }

Example Result 4 Advanced Programming in the Unix environment TCP/IP Illustrated

Impact of XML on Integration
If and when all sources accept XQuery queries and exchange data in XML format, then
–Mediator can accept user queries in XQuery
–Access sources using XQuery
–Get data back in XML format
–Merge results and send to user in XML format
How about now?
–Sources can use XML adapters (middleware)

Is XML standardization a magical solution for Integration?
If all Web sources standardize on XML format
–Source access (wrapper generation) issues become easier to manage
–BUT all other problems remain
  Still need to relate source (XML) schemas to mediator (XML) schema
  Still need to reason about source overlap, source access limitations, etc.
  Still need to manage execution in the presence of source/network uncertainties

“Semantic Web”
The LAV/GAV approaches assume that some human expert will do the actual schema mapping
The “semantic-web” initiative attempts to automate schema mapping
–Idea: Allow pages to write logical axioms relating their vocabulary (tags) to other external tags
–Support automatic inference of relations between source and mediator schema using these rules
DAML+OIL

Data Model

Which will have XML Syntax

Document Type Definition: DTD
part of the original XML specification
an XML document may have a DTD
terminology for XML:
–well-formed: if tags are correctly closed
–valid: if it has a DTD and conforms to it
validation is useful in data exchange

Notice that DTD is not In XML syntax… 

External DTD Internal Two ways to specify a DTD Hello, world! <!DOCTYPE greeting [ ]> Hello, world!

DTDs as Grammars <!DOCTYPE paper [ ]> <!DOCTYPE paper [ ]> …

Shortcomings of DTDs
Useful for documents, but not so good for data:
No support for structural re-use
–Object-oriented-like structures aren’t supported
No support for data types
–Can’t do data validation
Can have a single key item (ID), but:
–No support for multi-attribute keys
–No support for foreign keys (references to other keys)
–No constraints on IDREFs (reference only a Section)

XML Schema
In XML format
Includes primitive data types (integers, strings, dates, etc.)
Supports value-based constraints (integers > 100)
User-definable structured types
Inheritance (extension or restriction)
Foreign keys
Element-type reference constraints

XML Schemas DTD: Pre-specified tags How many different RDBMS Schemas are needed here?

Sample XML Schema …

@ssn Subtyping in XML Schema

DTDs as Schemas
Not so well suited:
impose unwanted constraints on order
references cannot be constrained
can be too vague
Union of schemas..?

XML Schemas
recent proposal
unifies previous schema proposals
generalizes DTDs
uses XML syntax
two documents: structure and datatypes

Although DB folks have several beefs Give me the names of people who are Listed either as editor or author of a book

Differences between XML and SSD
Pure SSD uses edge-labeled graphs as data model
XML is ordered, SSD is not
XML can mix text and elements: Making Java easier to type and easier to type Phil Wadler
XML has lots of other stuff: entities, processing instructions, comments

XML vs. standard semi- structured data models Alan 42 { person: &o123 { name: “Alan”, age: 42, } } person nameage person name age father … { person: { father: &o123 …} } similar on trees, different on graphs Node labeling Edge labeling

XML seen from (R)DBMS world
RDBMS may want to “publish” data in XML [provide an XML view of their data]
–“Tagging” the output
–Support XML-based querying (the queries are then converted to SQL queries)
  A single XML-QL query may correspond to a set of SQL queries
  –E.g. schema queries
  SilkRoute, Xperanto systems
–Support XML-based updating
  Tukwila
RDBMS can be used to provide efficient storage for XML files
–Efficient indexing/retrieval of path expressions

Other Important XML Standards
XSL/XSLT*: presentation and transformation standards
RDF: resource description framework (meta-info such as ratings, categorizations, etc.)
XPath/XPointer/XLink*: standard for linking to documents and elements within
Namespaces: for resolving name clashes
DOM: Document Object Model for manipulating XML documents
SAX: Simple API for XML parsing

RDF (2/99) purpose: metadata for Web –help search engines syntax in XML semantics: edge-labeled graphs

RDF Metadata standard birds, butterflies, snakes John Smith

More RDF Examples

RDF Terminology subject object predicate statement
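The four terms fit together as follows: a statement is a (subject, predicate, object) triple, and a set of statements forms an edge-labeled graph. The sketch below is a plain Python model of that idea, not an RDF library; the resource URI and the predicate names `creator` and `subject` are hypothetical.

```python
from typing import List, NamedTuple


class Statement(NamedTuple):
    """One RDF statement: an edge labeled `predicate` from subject to object."""
    subject: str
    predicate: str
    object: str


# Hypothetical statements about a single made-up resource.
graph = [
    Statement("http://example.org/doc", "creator", "John Smith"),
    Statement("http://example.org/doc", "subject", "birds, butterflies, snakes"),
]


def objects(statements: List[Statement], subject: str, predicate: str) -> List[str]:
    """Return every object reachable from `subject` along `predicate`."""
    return [
        s.object
        for s in statements
        if s.subject == subject and s.predicate == predicate
    ]


creators = objects(graph, "http://example.org/doc", "creator")
```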

More RDF: Containers bag, sequence, alternative s1 s2

RDF Containers (cont’d) Bag s1 s2 a rdf:type rdf_1 rdf_2

More RDF: Higher Order Statements “the author of says: ‘the topic of is environment’ “ environment topic says author RDF uses reification

XML Parsers
traditional: return data structure (DOM?)
event based: SAX (Simple API for XML)
–write handler for start tag and for end tag
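An event-based parser in the SAX style can be illustrated with Python's standard `xml.sax` module. In this minimal sketch the handler class name `TagLogger` and the input string are made up; the handler records one event per start tag and end tag instead of building a tree.

```python
import xml.sax


class TagLogger(xml.sax.ContentHandler):
    """Record (event, tag name, nesting depth) for every start and end tag."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.events = []

    def startElement(self, name, attrs):
        # Called by the parser when it reads a start tag.
        self.events.append(("start", name, self.depth))
        self.depth += 1

    def endElement(self, name):
        # Called by the parser when it reads the matching end tag.
        self.depth -= 1
        self.events.append(("end", name, self.depth))


handler = TagLogger()
xml.sax.parseString(b"<book><title>Data on the Web</title></book>", handler)
events = handler.events
```

Because nothing is kept beyond what the handler chooses to store, this style suits documents too large to hold in memory as a tree.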

Need for Ontology standardization

XML Data Model does not exist
Document Object Model (DOM):
–(10/98)
–class hierarchy (node, element, attribute, …)
–objects have behavior
–defines API to inspect/modify the document
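The inspect/modify API can be tried with `xml.dom.minidom`, a small DOM implementation in the Python standard library. This is a sketch with an invented one-book document; the added `price` element is illustrative.

```python
from xml.dom.minidom import parseString

# Parse a small document into a tree of DOM node objects.
doc = parseString("<book><title>Data on the Web</title></book>")
book = doc.documentElement

# Inspect: navigate to the title element and read its text node.
title = book.getElementsByTagName("title")[0]
title_text = title.firstChild.data

# Modify: create a new element with a text child and append it.
price = doc.createElement("price")
price.appendChild(doc.createTextNode("39.95"))
book.appendChild(price)

serialized = book.toxml()
```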

Start of 4/9 lecture

Querying XML

XML Data Model (Graph) Issues: distinguish between attributes and sub-elements? Should we conserve order? Think of the labels as names of binary relations.

Need for XML querying
human-readable documents: to retrieve individual documents, to provide dynamic indexes, to perform context-sensitive searching, and to generate new documents.
data-oriented documents: to query (virtual) XML representations of databases, to transform data into new XML representations, and to integrate data from multiple heterogeneous data sources.
mixed-model documents: to perform queries on documents with embedded data, such as catalogs, patient health records, employment records, or business analysis documents.

Querying XML
Requirements:
–Query a graph, not a relation.
–The result should be a graph (representing an XML document), not a relation.
–No schema.
–We may not know much about the data, so we need to navigate the XML.

W3C requirements
The W3C Query Working Group has identified many technical requirements:
at least one XML syntax;
at least one human-readable syntax;
must be declarative;
must be protocol independent;
must respect XML data model;
must be namespace aware;
must coordinate with XML Schema;
must work even if schemas are unavailable;
must support simple and complex datatypes;
must support universal and existential quantifiers;
must support operations on hierarchy and sequence of document structures;
must combine information from multiple documents;
must support aggregation;
must be able to transform and to create XML structures;
must be able to traverse ID references.

Query Languages
XML-QL: Invented by DB folks
–XML-QL is relational-complete (allows joins); also supports path expressions
  Can extract as well as transform data into different formats (like XSL)
–XML-QL is not in XML syntax
XSL: can also be seen as a query language
–Can transform data

XML-QL data model
XML-QL works on an abstraction, called an XML graph, of the concrete XML document:
–comments and processing instructions are ignored;
–the relative order of elements is ignored;
–every node has an ID (autogenerated, if necessary);
–all leaves are character data.
XML graphs are obtained from XML documents but are also generated by queries.
A graph is mapped back into an XML document by choosing arbitrary orderings of element sequences.
This abstraction is very similar to that from tables to relations: disregard the order of tuples and attributes.

Extracting Data by Query Matching data using elements patterns. WHERE Addison-Wesley $t $a IN “ CONSTRUCT $a “where” clause only specifies What must be in the pattern --pattern can have other stuff besides what is listed in where

Constructing XML Data WHERE Addison-Wesley $t $a IN “ CONSTRUCT $a $t

Grouping with Nested Queries WHERE $t, Addison-Wesley CONTENT_AS $p IN “ CONSTRUCT $t WHERE $a IN $p CONSTRUCT $a ”

Joining Elements by Value (also integration) WHERE $f $l ELEMENT_AS $e IN “ $f $l IN “ y > 1995 CONSTRUCT $e Find all articles whose writers also published a book after Multiple queries That share values

Tag variables (schema queries) WHERE $t 1995 Smith IN " $e IN {author, editor} CONSTRUCT $t Smith $p matches book and article. $e matches author and editor. this saves us from writing four queries. This finds all publications in 1995 where Smith is either author or editor
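The effect of tag variables can be imitated in Python by iterating over child elements without fixing their tag names. The sketch below assumes a hypothetical document in which the year is an attribute and names are held in a `lastname` child; the original query's markup is not preserved, so this structure is a guess.

```python
import xml.etree.ElementTree as ET

# Hypothetical data; the structure and titles are invented for illustration.
BIB = """<bib>
  <book year="1995"><title>A</title><author><lastname>Smith</lastname></author></book>
  <article year="1995"><title>B</title><editor><lastname>Smith</lastname></editor></article>
  <book year="1996"><title>C</title><author><lastname>Smith</lastname></author></book>
  <article year="1995"><title>D</title><author><lastname>Jones</lastname></author></article>
</bib>"""

ROLES = {"author", "editor"}  # the tags that $e is allowed to match


def smith_publications(root: ET.Element):
    """Return (publication tag, role tag, title) for Smith's 1995 publications."""
    found = []
    for pub in root:                      # $p: any child, whatever its tag
        if pub.get("year") != "1995":
            continue
        for role in pub:                  # $e: candidate role element
            if role.tag in ROLES and role.findtext("lastname") == "Smith":
                found.append((pub.tag, role.tag, pub.findtext("title")))
    return found


matches = smith_publications(ET.fromstring(BIB))
```

One loop covers all four combinations of publication type and role, which is the saving the slide attributes to tag variables.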

Path Expressions WHERE $r Ford IN " CONSTRUCT $r WHERE $r IN " CONSTRUCT $r Matches any sequence of nodes all of which are labeled part (can substitute $ for part in the above…)

Due 30 th April