“The reason that so many people are excited about XML is that so many people are excited about XML.” ANON CS 186 Spring 2006 XML Databases.

Slides:



Advertisements
Similar presentations
XML: Extensible Markup Language
Advertisements

XML May 3 rd, XQuery Based on Quilt (which is based on XML-QL) Check out the W3C web site for the latest. XML Query data model –Ordered !
XML, XML Schema, Xpath and XQuery Slides collated from various sources, many from Dan Suciu at Univ. of Washington.
1 Introduction to XML. XML eXtensible implies that users define tag content Markup implies it is a coded document Language implies it is a metalanguage.
Database Management Systems, R. Ramakrishnan1 Introduction to Semistructured Data and XML Chapter 27, Part D Based on slides by Dan Suciu University of.
Agenda from now on Done: SQL, views, transactions, conceptual modeling, E/R, relational algebra. Starting: XML To do: the database engine: –Storage –Query.
Querying XML (cont.). Comments on XPath? What’s good about it? What can’t it do that you want it to do? How does it compare, say, to SQL?
1 Lecture 10 XML Wednesday, October 18, XML Outline XML (4.6, 4.7) –Syntax –Semistructured data –DTDs.
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 357 Database Systems I Query Languages for XML.
1 COS 425: Database and Information Management Systems XML and information exchange.
1 Statistics XML: –Altavista: 800,000 pages returned. –Amazon.com: 242 books. In comparison: –God: 12,000 books, 7 Million pages –Bible: 32,000 books,
Query Languages - XQuery Slides partially from Dan Suciu.
XML May 1 st, XML for Representing Data John 3634 Sue 6343 Dick 6363 John 3634 Sue 6343 Dick 6363 row name phone “John”3634“Sue”“Dick” persons.
XML and Databases 198:541. XML Motivation  Huge amounts of unstructured data on the web: HTML documents  No structure information  Only format instructions.
XML –Query Languages, Extracting from Relational Databases ADVANCED DATABASES Khawaja Mohiuddin Assistant Professor Department of Computer Sciences Bahria.
XML, XML Schema, XPath and XQuery Query Languages CS561 Slides collated from several sources, including D. Suciu at Univ. of Washington.
1 Lecture 08: XML and Semistructured Data. 2 Outline XML (Section 17) –XML syntax, semistructured data –Document Type Definitions (DTDs) XPath.
Managing XML and Semistructured Data Lecture 2: XML Prof. Dan Suciu Spring 2001.
1 Lecture 08: XML and Semistructured Data. 2 Outline XML (Section 17) –XML syntax, semistructured data –Document Type Definitions (DTDs) XPath.
Xpath to XQuery February 23rd, Other Stuff HW 3 is out. Instructions for Phase 3 are out. Today: finish Xpath, start and finish Xquery. From Wednesday:
Querying XML February 12 th, Querying XML Data XPath = simple navigation through the tree XQuery = the SQL of XML XSLT = recursive traversal –will.
1 Advanced Topics XML and Databases. 2 XML u Overview u Structure of XML Data –XML Document Type Definition DTD –Namespaces –XML Schema u Query and Transformation.
XML: Extensible Markup Language FST-UMAC Gong Zhiguo.
XML – what is it? eXtensible Markup Language Standard for publishing and interchange on the web and over the wire simpler version of SGML adapted to internet.
XML Anisha K J Jerrin Thomas. Outline  Introduction  Structure of an XML Page  Well-formed & Valid XML Documents  DTD – Elements, Attributes, Entities.
Lecture 7 of Advanced Databases XML Querying & Transformation Instructor: Mr.Ahmed Al Astal.
Why XML ? Problems with HTML HTML design - HTML is intended for presentation of information as Web pages. - HTML contains a fixed set of markup tags. This.
Lecture 21 XML querying. 2 XSL (eXtensible Stylesheet Language) In HTML, default styling is built into browsers as tag set for HTML is predefined and.
XML By Dr.S.Sridhar, Ph.D.(JNUD), RACI(Paris, NICE), RMR(USA), RZFM(Germany) DIRECTOR ARUNAI ENGINEERING COLLEGE TIRUVANNAMALAI.
1 XML at a neighborhood university near you Innovation 2005 September 16, 2005 Kwok-Bun Yue University of Houston-Clear Lake.
Maziar Sanaii Ashtiani – SCT – EMU, Fall 2011/12.
Introduction to XQuery Resources: Official URL: Short intros:
XML by Dan Suciu 1 Introduction to Semistructured Data and XML Based on slides by Dan Suciu University of Washington.
XML and XPath. Web Services: XML+XPath2 EXtensible Markup Language (XML) a W3C standard to complement HTML A markup language much like HTML origins: structured.
TDDD43 XML and RDF Slides based on slides by Lena Strömbäck and Fang Wei-Kleiner 1.
XML and its applications: 4. Processing XML using PHP.
Lecture 6 of Advanced Databases XML Querying & Transformation Instructor: Mr.Eyad Almassri.
Extensible Markup and Beyond
End of XML February 19 th, FLWR (“Flower”) Expressions FOR... LET... WHERE... RETURN... FOR... LET... WHERE... RETURN...
Intro. to XML & XML DB Bun Yue Professor, CS/CIS UHCL.
Chapter 27 The World Wide Web and XML. Copyright © 2004 Pearson Addison-Wesley. All rights reserved.27-2 Topics in this Chapter The Web and the Internet.
XML A web enabled data description language 4/22/2001 By Mark Lawson & Edward Ryan L’Herault.
Database Systems Part VII: XML Querying Software School of Hunan University
5/2/20051 XML Data Management Yaw-Huei Chen Department of Computer Science and Information Engineering National Chiayi University.
XML Name: Niki Sardjono Class: CS 157A Instructor : Prof. S. M. Lee.
Lecture 5: XML Tuesday, January 16, Outline XML, DTDs (Data on the Web, 3.1) Semistructured data in XML (3.2) Exporting Relational Data in XML (8.3.1)
Chapter 27 The World Wide Web and XML. Copyright © 2004 Pearson Addison-Wesley. All rights reserved.27-2 Topics in this Chapter The Web and the Internet.
XML query. introduction An XML document can represent almost anything, and users of an XML query language expect it to perform useful queries on whatever.
Chapter 23 XML. 2 Introduction  XML: eXtensible Markup Language (What is a Markup language?)  Defined by the WWW Consortium (W3C)  Originally intended.
An Introduction to XML Paul Donohue May 8th 2002 Hotel Senator Zürich.
1 Introduction to Semistructured Data and XML. 2 How the Web is Today  HTML documents often generated by applications consumed by humans only easy access:
XML Engr. Faisal ur Rehman CE-105T Spring Definition XML-EXTENSIBLE MARKUP LANGUAGE: provides a format for describing data. Facilitates the Precise.
The Semistructured-Data Model Programming Languages for XML Spring 2011 Instructor: Hassan Khosravi.
CS 157B: Database Management Systems II February 11 Class Meeting Department of Computer Science San Jose State University Spring 2013 Instructor: Ron.
COMP9321 Web Application Engineering Semester 2, 2015 Dr. Amin Beheshti Service Oriented Computing Group, CSE, UNSW Australia Week 4 1COMP9321, 15s2, Week.
Computing & Information Sciences Kansas State University Friday, 20 Oct 2006CIS 560: Database System Concepts Lecture 24 of 42 Friday, 20 October 2006.
Martin Kruliš by Martin Kruliš (v1.1)1.
Lecture 17: XPath and XQuery Wednesday, Nov. 7, 2001.
SEMI-STRUCTURED DATA (XML) 1. SEMI-STRUCTURED DATA ER, Relational, ODL data models are all based on schema Structure of data is rigid and known is advance.
Lecture 14: Relational Algebra Projects XML?
XML: Extensible Markup Language
Querying and Transforming XML Data
XML in Web Technologies
Database Processing with XML
Lecture 12: XML, XPath, XQuery
Semi-Structured data (XML Data MODEL)
Lecture 9: XML Monday, October 17, 2005.
Lecture 8: XML Data Wednesday, October
XML and its applications: 4. Processing XML using PHP
Semi-Structured data (XML)
Presentation transcript:

“The reason that so many people are excited about XML is that so many people are excited about XML.” ANON CS 186 Spring 2006 XML Databases

XML Background eXtensible Markup Language Roots are HTML and SGML –HTML mixes formatting and semantics –SGML is cumbersome XML is focused on content –Designers (or others) can create their own sets of tags. –These tag definitions can be exchanged and shared among various groups (DTDs, XSchema). –XSL is a companion language to specify presentation. XML is ugly –Intended to be generated and consumed by applications --- not people!

From HTML to XML HTML describes the presentation

HTML Bibliography Foundations of Databases Abiteboul, Hull, Vianu Addison Wesley, 1995 Data on the Web Abiteoul, Buneman, Suciu Morgan Kaufmann, 1999

Example in XML Foundations… Abiteboul Hull Vianu Addison Wesley 1995 … XML describes the content

XML as a Wire Format People quickly figured out that XML is a convenient way to exchange data among applications. –E.g. Ford’s purchasing app generates a purchase order in XML format, s it to a billing app at Firestone. –Firestone’s billing app ingests the , generates a bill in XML format, and s it to Ford’s bank. Emerging standards to get the “ ” out of the picture: SOAP, WSDL, UDDI… The basis of “Web Services” --- potential impact is tremendous. Why is it catching on? It’s just text, so… Platform, Language, Vendor agnostic Easy to understand, manipulate and extend. Compare this to data trapped in an RDBMS.

What’s this got to do with Databases? Given that apps will communicate by exchanging XML data, then databases must at least be able to: –Ingest XML formatted data –Publish their own data in XML format Thinking a bit harder: –XML is kind of a data model. –Why convert to/from relational if everyone wants XML? More cosmically: –Like evolution from spoken language to written language! The (multi-) Billion Dollar Question: –Will people really want to store XML data directly? –Current opinion: All major vendors say Yes, or at least, “Maybe”

Another (partial) Example ABC Corp. 123 ABC Way Goods Inc. 17 Main St. widget thingy jobber

Can View XML Document as a Tree Invoice as a tree Invoice Buyer Seller Itemlist Name Address Item ABC Corp.123 ABC Way Goods Inc. 17 Main St. widgetthingyjobber Name Address Item

Mapping to Relational Relational systems handle highly structured data

New splinters from XML Difficult to search trees that are broken into tables Very expensive to store variable document types   

ParentIDLabel NULL … 0 1 NULL 2 NULL … article author E.F. Codd pages … Mapping to Relational I Question: What is a relational schema for storing XML data? Answer – Depends on how “Structured” it is… If unstructured – use an “Edge Map” … article authoryearnumber E.F. Codd pages journal CACM

STORED table (author, year, journal, …) Overflow buckets Mapping to Relational II Can leverage Schema (or DTD) information to create relational schema. Sometimes called “shredding” For semi-structured data use hybrid with edge map for overflow. E.F. Codd … article authoryearcdrom pages 1970 journal CACM P377.pdf

Other XML features Elements can have “attributes” (not clear why) XML docs can have IDs and IDREFs, URIs –reference to another document or document element Two APIs for interacting with/parsing XML Docs: –Document Object Model (DOM) A tree “object” API for traversing an XML doc Typically for Java –SAX Event-Driven: Fire an event for each tag encountered during parse. May not need to parse the entire document.

Document Type Definitions (DTDs) Grammar for describing the allowed structure of XML Documents. Specify what elements can appear and in what order, nesting, etc. DTDs are optional (!) Many “standard” DTDs have been developed for all sorts of industries, groups, etc. –e.g. NITF for news article dissemination DTDs are being replaced by XSchema (more in a moment)

DTD Example (partial) <!ATTLIST SupplierID domain %string; #REQUIRED > <!ATTLIST ItemSegment segmentKey %string; #IMPLIED > <!ATTLIST Contract effectiveDate %datetime.tz; #REQUIRED expirationDate %datetime.tz; #REQUIRED > Here’s a DTD for a Contract Elements contain others: ? = 0 or 1 * = 0 or more + = 1 or more

XML Schemas, etc. XML Documents can be described using XSchema –Has a notion of types and typechecking –Introduces some notions of IC’s –Quite complicated, controversial... But will replace simpler DTDs XML Namespaces –Can import tag names from others –Disambiguate by prefixing the namespace name i.e. usa:price is different from eurozone:price

Querying XML Xpath –A single-document language for “path expressions” XSLT –XPath plus a language for formatting output XQuery –An SQL-like proposal with XPath as a sub-language –Supports aggregates, duplicates, … –Data model is lists, not sets – “reference implementations” have appeared, but language is still not widely accepted. SQL/XML –the SQL standards community fights back

XPath Syntax for tree navigation and node selection –Navigation is defined by “paths” –Used by other standards: XSLT, XQuery, XPointer,XLink / = root node or separator between steps in path * matches any one element references attributes of the current node // references any descendant of the current node [] allows specification of a filter (predicate) at a step [n] picks the nth occurrence from a list of elements. The fun part: Filters can themselves contain paths

XPath Examples Parent/Child (‘/’) and Ancestor/Descendant (‘//’): /catalog/product//msrp Wildcards (match any single element): /catalog/*/msrp Element Node Filters to further refine the nodes: –Filters can contain nested path expressions //product[price/msrp < 300]/name //product[price/msrp < –Note, this last one is a kind of “join”

XQuery FOR $x in /bib/book WHERE $x/year > 1995 RETURN $x/title

XQuery Main Construct (replaces SELECT-FROM-WHERE): FLWR Expression: FOR-LET-WHERE-RETURN FOR/LET Clauses WHERE Clause RETURN Clause Ordered List of tuples Filtered list of tuples XML data: Instance of Xquery data model

XQuery FOR $x in expr -- binds $x to each value in the list expr LET $x = expr -- binds $x to the entire list expr –Useful for common subexpressions and for aggregations

XQuery FOR $p IN distinct(document("bib.xml")//publisher) LET $b := document("bib.xml")/book[publisher = $p] WHERE count($b) > 100 RETURN $p distinct = a function that eliminates duplicates count = a (aggregate) function that returns the number of elms

Nested Queries Invert the hierarchy from publishers inside books to books inside publishers FOR $p IN distinct(//publisher) RETURN {FOR $b IN //book[publisher = $p] RETURN {$b/title} {$b/price} }

Operators Based on Global Ordering Returns nodes in expr1 that are before (after) nodes in expr2 Find procedures where no anesthesia occurs before the first incision FOR $proc IN //section[title = “Procedure”] WHERE empty($proc//anesthesia BEFORE ($proc//incision)[1]) RETURN $proc expr1 BEFORE AFTER expr2

Advantages of XML vs. Relational ASCII makes things easy –Easy to parse –Easy to ship (e.g. across firewall, via , etc.) Self-documenting –Metadata (tag names) come with the data Nested –Can bundle lots of related data into one message –(Note: object-relational allows this) Can be sloppy –don’t have to define a schema in advance Standard –Lots of free Java tools for parsing and munging XML Expect lots of Microsoft tools (C#) for same Tremendous Momentum!

What XML does not solve XML doesn’t standardize metadata –It only standardizes the metadata language Not that much better than agreeing on an alphabet –E.g. my tag vs. your tag Mine includes shipping and federal tax, and is in $US Yours is manufacturer’s list price in ¥Japan –XML Schema is a proposal to help with some of this XML doesn’t help with data modeling –No notions of IC’s, FD’s, etc. –In fact, encourages non-first-normal form! You will probably have to translate to/from XML (at least in the short term) –Relational vendors will help with this ASAP –XML “features” (nesting, ordering, etc.) make this a pain –Flatten the XML if you want data independence (?)

Reminder: Benefits of Relational Data independence buys you: –Evolution of storage -- vs. XML? –Evolution of schema (via views) – vs. XML? Database design theory –IC’s, dependency theory, lots of nice tools for ER Remember, databases are long-lived and reused –Today’s “nesting” might need to be inverted tomorrow! Issues: –XML is good for transient data (e.g. messages) –XML is fine for data that will not get reused in a different way (e.g. Shakespeare, database output like reports) –Relational is far cleaner for persistent data (we learned this with OODBs) Will benefits of XML outweigh these issues?????

More on XML 100s of books published –Each seems to be 1000 pages Try some websites –xml.org provides a business software view of XML –xml.apache.org has lots of useful shareware for XML – has shareware, tutorials, reference info –xml.com is the O’Reilly resource site – is the official XML standard site –the most standardized XML dialects are: Ariba’s Commerce XML (“cxml”, see cxml.org) RosettaNet (see rosettanet.org) Microsoft trying to enter this arena (BizTalk, now.NET)