CSC056-Z1 – Database Management Systems – Vinnie Costa – Hofstra University
Database Management Systems, Session 10
Instructor: Vinnie Costa


Making A Difference
[Apple Advertisement, 10/13]
“It’s unfolded before your eyes. The revolution that is iPod first took the music scene by storm. Further spiced things up with full-color photos. Added a full complement of podcasts to the mix. And now iPod has turned the world topsy-turvy once again with video, letting you carry up to 150 hours of video wherever you go. Imagine: With iPod, you can play the DJ one minute. Rock with the latest Madonna or U2 music videos the next. Then get lost with “Lost”, or any of the other TV shows or short films now available for purchase and download from the iTunes Music Store.”
The Long Tail is becoming reality!!!

Tim Bray - Co-inventor of XML
• For more than 20 years, Tim Bray has been tackling projects as deep as the English language (computerized Oxford English Dictionary, 1987), as wide as the Web (one of the first Internet search engines, 1994), and as tall as the meaning of data (XML, 1996). He invented XML with Jon Bosak.
• “XML is used for banking transactions, for interchanging prices in condo developments and for exporting data from iTunes,” he points out. “None of those things were remotely on our minds when we were building it.”

Introduction to Semistructured Data and XML
Chapter 27

How the Web is Today
• HTML documents
  • often generated by applications
  • consumed by humans only
  • easy access: across platforms, across organizations
• No application interoperability:
  • HTML not understood by applications; screen scraping is brittle
  • Database technology: client-server, still vendor specific

New Universal Data Exchange Format: XML
A recommendation from the W3C
• XML = data
• XML generated by applications
• XML consumed by applications
• Easy access: across platforms, organizations

Paradigm Shift on the Web
• From documents (HTML) to data (XML)
• From information retrieval to data management
• For databases, also a paradigm shift:
  • from relational model to semistructured data
  • from data processing to data/query translation
  • from storage to transport

Semistructured Data
Origins:
• Integration of heterogeneous sources
• Data sources with non-rigid structure
  • Biological data
  • Web data

The Semistructured Data Model
[Figure: Object Exchange Model (OEM) graph rooted at Bib. Edges are labeled paper, book, paper, references, author, title, year, http, publisher, page, firstname, lastname, first, last. Internal nodes are complex objects; leaves such as “Serge”, “Abiteboul”, 1997, “Victor”, “Vianu” are atomic objects.]

Syntax for Semistructured Data
    Bib: 1 { paper: 12 { … },
             book: 24 { … },
             paper: 29 { author: 52 “Abiteboul”,
                         author: 96 { firstname: 243 “Victor”,
                                      lastname: 206 “Vianu” },
                         title: 93 “Regular path queries with constraints”,
                         references: 12,
                         references: 24,
                         pages: 25 { first: , last: } } }

Syntax for Semistructured Data
May omit oids:
    { paper: { author: “Abiteboul”,
               author: { firstname: “Victor”,
                         lastname: “Vianu” },
               title: “Regular path queries …”,
               page: { first: 122, last: 133 } } }
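
The oid-free syntax above can be mirrored in a general-purpose language. The following is only a sketch under one assumption: a complex object is held as a list of (label, value) pairs, because a label such as author may repeat, which rules out a plain dictionary. The helper name `children` is hypothetical.

```python
# Hypothetical encoding of the slide's OEM example (oids omitted).
# Complex object: list of (label, value) pairs, so labels may repeat.
# Atomic object: a plain str or int.
paper = [
    ("author", "Abiteboul"),
    ("author", [("firstname", "Victor"), ("lastname", "Vianu")]),
    ("title", "Regular path queries …"),
    ("page", [("first", 122), ("last", 133)]),
]
bib = [("paper", paper)]


def children(obj, label):
    """Return the values of all edges with the given label (empty for atoms)."""
    if not isinstance(obj, list):
        return []
    return [value for (lbl, value) in obj if lbl == label]


authors = children(paper, "author")
print(len(authors))                      # number of author edges
print(children(authors[1], "lastname"))  # lastname edges of the structured author
```

Note that the two author values have different types (a string and a nested object), which is exactly the kind of irregularity the semistructured model is meant to allow.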

Characteristics of Semistructured Data
• Missing or additional attributes
• Multiple attributes
• Different types in different objects
• Heterogeneous collections
Self-describing, irregular data, no a priori structure

Comparison with Relational Data
Relation:
    name    phone
    John    3634
    Sue     6343
    Dick    6363
Semistructured representation:
    { row: { name: “John”, phone: 3634 },
      row: { name: “Sue”, phone: 6343 },
      row: { name: “Dick”, phone: 6363 } }
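
The row-by-row encoding above can be sketched with Python's standard xml.etree.ElementTree module. The root element name `persons` and the function name `relation_to_xml` are assumptions made for illustration; the slide only fixes the labels row, name and phone.

```python
import xml.etree.ElementTree as ET

# The relation from the slide as a list of (name, phone) tuples.
rows = [("John", 3634), ("Sue", 6343), ("Dick", 6363)]


def relation_to_xml(rows, columns, root_name="persons"):
    """Encode each tuple as a <row> element with one sub-element per column."""
    root = ET.Element(root_name)  # root name is an assumption
    for values in rows:
        row = ET.SubElement(root, "row")
        for column, value in zip(columns, values):
            ET.SubElement(row, column).text = str(value)
    return root


persons = relation_to_xml(rows, ["name", "phone"])
print(ET.tostring(persons, encoding="unicode"))
```

Every value becomes text in this encoding, so the integer type of phone is lost unless a schema restores it.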

XML
• A W3C standard to complement HTML
• Origins: structured text, SGML
  • Large-scale electronic publishing
  • Data exchange on the web
• Motivation:
  • HTML describes presentation
  • XML describes content
• (version 2, 10/2000)

From HTML to XML
HTML describes the presentation

HTML
Bibliography
    Foundations of Databases
    Abiteboul, Hull, Vianu
    Addison Wesley, 1995
    Data on the Web
    Abiteboul, Buneman, Suciu
    Morgan Kaufmann, 1999

XML
    Foundations…
    Abiteboul
    Hull
    Vianu
    Addison Wesley
    1995
    …
XML describes the content

Why are we DB’ers interested?
• It’s data, stupid. That’s us.
• Proof by Google:
  • database+XML: 1,940,000 pages.
• Database issues:
  • How are we going to model XML? (graphs)
  • How are we going to query XML? (XQuery)
  • How are we going to store XML? (in a relational database? object-oriented? native?)
  • How are we going to process XML efficiently? (many interesting research questions!)

Document Type Definitions (DTDs)
• Sort of like a schema but not really.
• Inherited from the SGML DTD standard
• BNF grammar establishing constraints on element structure and content
• Definitions of entities

Shortcomings of DTDs
Useful for documents, but not so good for data:
• Element name and type are associated globally
• No support for structural re-use
  • Object-oriented-like structures aren’t supported
• No support for data types
  • Can’t do data validation
• Can have a single key item (ID), but:
  • No support for multi-attribute keys
  • No support for foreign keys (references to other keys)
  • No constraints on IDREFs (reference only a Section)

XML Schema
• In XML format
• Element names and types associated locally
• Includes primitive data types (integers, strings, dates, etc.)
• Supports value-based constraints (integers > 100)
• User-definable structured types
• Inheritance (extension or restriction)
• Foreign keys
• Element-type reference constraints

Sample XML Schema
…

Important XML Standards
• XSL/XSLT: presentation and transformation standards
• RDF: resource description framework (meta-info such as ratings, categorizations, etc.)
• XPath/XPointer/XLink: standard for linking to documents and elements within
• Namespaces: for resolving name clashes
• DOM: Document Object Model for manipulating XML documents
• SAX: Simple API for XML parsing
• XQuery: query language
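
To make the DOM and SAX entries in the list above concrete, here is a small sketch using Python's standard xml.dom.minidom and xml.sax modules. The document string and the class name `TitleHandler` are hypothetical; the point is only the contrast between building a tree and reacting to parse events.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical document; element names are illustrative.
DOC = (
    "<bib><book><title>Foundations of Databases</title></book>"
    "<book><title>Data on the Web</title></book></bib>"
)

# DOM: parse the whole document into a tree, then navigate it.
dom = minidom.parseString(DOC)
dom_titles = [t.firstChild.data for t in dom.getElementsByTagName("title")]


# SAX: react to parse events as they stream by; no tree is built.
class TitleHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self.titles.append("")

    def characters(self, content):
        if self._in_title:
            self.titles[-1] += content

    def endElement(self, name):
        if name == "title":
            self._in_title = False


handler = TitleHandler()
xml.sax.parseString(DOC.encode("utf-8"), handler)
print(dom_titles == handler.titles)
```

DOM is convenient when the document fits in memory and needs random access; SAX suits large inputs that are read once.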

XML Data Model (Graph)
Issues:
• Distinguish between attributes and sub-elements?
• Should we conserve order?

XML Terminology
• Tags: book, title, author, …
  • start tag:, end tag:
• Elements: …, …
  • elements can be nested
  • empty element: (Can be abbrv. )
• XML document: has a single root element
• Well-formed XML document: has matching tags
• Valid XML document: conforms to a schema
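
The well-formedness conditions in the list above (single root, matching and properly nested tags) can be checked with any XML parser. This is a minimal sketch with Python's xml.etree.ElementTree; the function name `is_well_formed` and the sample strings are hypothetical. It does not check validity against a schema, which the standard library parser does not do.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """True if the text parses as XML: one root element, properly nested tags."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


matching = is_well_formed("<book><title>Data on the Web</title></book>")
badly_nested = is_well_formed("<book><title>Data on the Web</book></title>")
two_roots = is_well_formed("<book/><book/>")
empty_element = is_well_formed("<book><index/></book>")
print(matching, badly_nested, two_roots, empty_element)
```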

More XML: Attributes
    Foundations of Databases
    Abiteboul
    …
    1995
Attributes are alternative ways to represent data
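
The claim that attributes are an alternative way to represent data can be illustrated as follows. The markup is hypothetical (the `price` field and its value are invented for the sketch); it records the same fact once as an attribute and once as a sub-element.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the same price as an attribute and as a sub-element.
as_attribute = ET.fromstring(
    '<book price="55"><title>Foundations of Databases</title></book>'
)
as_element = ET.fromstring(
    "<book><title>Foundations of Databases</title><price>55</price></book>"
)

price_from_attribute = as_attribute.get("price")     # attribute lookup
price_from_element = as_element.findtext("price")    # sub-element lookup
print(price_from_attribute, price_from_element)
```

An attribute can occur at most once per element and holds only text, while a sub-element can repeat and can nest further structure.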

More XML: Oids and References
    Jane
    Mary
    John
oids and references in XML are just syntax
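
Because oids and references are just syntax, an application has to resolve them itself. The sketch below assumes hypothetical markup in which an `id` attribute plays the role of an oid and the attributes `mother` and `idrefs` hold references; only the names Jane, Mary and John come from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: "id" acts as an oid; "mother" and "idrefs" hold references.
DOC = """
<people>
  <person id="o555"><name>Jane</name></person>
  <person id="o456"><name>Mary</name><children idrefs="o123 o555"/></person>
  <person id="o123" mother="o456"><name>John</name></person>
</people>
"""
root = ET.fromstring(DOC)

# Index persons by oid so that a reference can be followed with one lookup.
by_id = {p.get("id"): p for p in root.iter("person")}


def name_of(oid):
    return by_id[oid].findtext("name")


mother_name = name_of(by_id["o123"].get("mother"))
refs = by_id["o456"].find("children").get("idrefs").split()
mary_children = [name_of(ref) for ref in refs]
print(mother_name, mary_children)
```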

XQuery
Summary:
• FOR-LET-WHERE-ORDERBY-RETURN = FLWOR
[Figure: processing pipeline. FOR/LET clauses produce a list of tuples; the WHERE clause filters it into a list of tuples; the ORDERBY/RETURN clause produces an instance of the XQuery data model.]

XQuery
• FOR $x IN expr -- binds $x to each value in the list expr
• LET $x := expr -- binds $x to the entire list expr
  • Useful for common subexpressions and for aggregations

FOR vs. LET
    FOR $x IN document("bib.xml")/bib/book
    RETURN $x
Returns: ...
    LET $x := document("bib.xml")/bib/book
    RETURN $x
Returns: ...
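
The difference between the two queries above can be imitated in Python: FOR evaluates RETURN once per book, LET evaluates it once with the variable bound to the whole list. This is an analogy rather than an XQuery implementation, and the two-book document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for bib.xml.
BIB = ET.fromstring(
    "<bib><book><title>abc</title></book><book><title>def</title></book></bib>"
)
books = BIB.findall("book")

# FOR $x IN .../bib/book RETURN $x: RETURN runs once per book.
for_results = [[x] for x in books]

# LET $x := .../bib/book RETURN $x: RETURN runs once, $x is the whole list.
let_results = [books]

print(len(for_results), len(let_results))
```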

Path Expressions
• Abbreviated Syntax
  • /bib/paper[2]/author[1]
  • /bib//author
  • paper[author/lastname="Vianu"]
  • /bib/(paper|book)/title
• Unabbreviated Syntax
  • child::bib/descendant::author
  • child::bib/descendant-or-self::*/child::author
  • parent, self, descendant-or-self, attribute
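
The abbreviated path expressions above can be tried out with Python's xml.etree.ElementTree, which supports only a subset of XPath. In this sketch the document content is hypothetical (only the element names follow the slide), and the two expressions outside that subset, the path inside a predicate and the union, are rewritten as Python filters.

```python
import xml.etree.ElementTree as ET

# Hypothetical bib document; only the element names come from the slide.
BIB = ET.fromstring("""
<bib>
  <paper><author>First Author</author><title>First paper</title></paper>
  <book><author>Abiteboul</author><title>Data on the Web</title></book>
  <paper>
    <author><firstname>Victor</firstname><lastname>Vianu</lastname></author>
    <author>Abiteboul</author>
    <title>Regular path queries with constraints</title>
  </paper>
</bib>
""")

# /bib/paper[2]/author[1]: positions are 1-based among same-named siblings.
first_author = BIB.findall("paper[2]/author[1]")

# /bib//author: authors at any depth below bib.
all_authors = BIB.findall(".//author")

# paper[author/lastname="Vianu"]: a path inside a predicate is outside
# ElementTree's subset, so the filter is written in Python.
vianu_papers = [
    p for p in BIB.findall("paper")
    if any(ln.text == "Vianu" for ln in p.findall("author/lastname"))
]

# /bib/(paper|book)/title: the union is also expressed in Python.
titles = [
    t.text
    for child in BIB
    if child.tag in ("paper", "book")
    for t in child.findall("title")
]
print(len(all_authors), titles)
```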

XQuery
Find all book titles published after 1995:
    FOR $x IN document("bib.xml")/bib/book
    WHERE $x/year > 1995
    RETURN $x/title
Result:
    abc
    def
    ghi
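
The FOR/WHERE/RETURN query above maps naturally onto a list comprehension. The sketch below is a Python analogy, not XQuery; the small document stands in for bib.xml and is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for bib.xml.
BIB = ET.fromstring("""
<bib>
  <book><title>Foundations of Databases</title><year>1995</year></book>
  <book><title>Data on the Web</title><year>1999</year></book>
</bib>
""")

titles = [
    x.findtext("title")                 # RETURN $x/title
    for x in BIB.findall("book")        # FOR $x IN .../bib/book
    if int(x.findtext("year")) > 1995   # WHERE $x/year > 1995
]
print(titles)
```

The explicit `int(...)` conversion is needed here because element text is a string; the sketch assumes every book has a numeric year.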

XQuery
For each author of a book by Morgan Kaufmann, list all books she published:
    FOR $a IN distinct(document("bib.xml")/bib/book[publisher="Morgan Kaufmann"]/author)
    RETURN $a,
           FOR $t IN /bib/book[author=$a]/title
           RETURN $t
distinct = a function that eliminates duplicates

XQuery
Result:
    Jones
        abc
        def
    Smith
        ghi
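
The nested query and its result can be reproduced with a Python analogy. The document below is a hypothetical bib.xml chosen so that the output matches the result shown; `dict.fromkeys` plays the role of distinct while keeping first-seen order.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for bib.xml, chosen to reproduce the slide's result.
BIB = ET.fromstring("""
<bib>
  <book><publisher>Morgan Kaufmann</publisher><author>Jones</author><title>abc</title></book>
  <book><publisher>Morgan Kaufmann</publisher><author>Jones</author><title>def</title></book>
  <book><publisher>Morgan Kaufmann</publisher><author>Smith</author><title>ghi</title></book>
</bib>
""")
books = BIB.findall("book")

# distinct(.../book[publisher="Morgan Kaufmann"]/author), first-seen order.
authors = list(dict.fromkeys(
    a.text
    for b in books
    if b.findtext("publisher") == "Morgan Kaufmann"
    for a in b.findall("author")
))

# For each author, the inner FOR collects the titles of all her books.
result = {
    a: [b.findtext("title") for b in books
        if a in [x.text for x in b.findall("author")]]
    for a in authors
}
print(result)
```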

XQuery
    FOR $p IN distinct(document("bib.xml")//publisher)
    LET $b := document("bib.xml")//book[publisher = $p]
    WHERE count($b) > 100
    RETURN $p
count = an (aggregate) function that returns the number of elements
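
The count query above groups books by publisher and keeps the publishers whose group is large enough. A possible Python analogy follows; the function name `publishers_with_more_than`, the sample document, and the assumption that each book has at most one publisher are all mine. Unlike the query, it counts in a single pass instead of rescanning the document for each publisher.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical stand-in for bib.xml.
BIB = ET.fromstring("""
<bib>
  <book><publisher>Morgan Kaufmann</publisher><title>abc</title></book>
  <book><publisher>Addison Wesley</publisher><title>def</title></book>
  <book><publisher>Morgan Kaufmann</publisher><title>ghi</title></book>
</bib>
""")


def publishers_with_more_than(bib, threshold):
    """Publishers whose book count exceeds threshold (WHERE count($b) > threshold)."""
    counts = Counter(
        b.findtext("publisher")
        for b in bib.iter("book")
        if b.find("publisher") is not None
    )
    return [p for p, n in counts.items() if n > threshold]


print(publishers_with_more_than(BIB, 1))
```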

XQuery
Find books whose price is larger than average:
    LET $a := avg(document("bib.xml")/bib/book/price)
    FOR $b IN document("bib.xml")/bib/book
    WHERE $b/price > $a
    RETURN $b
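
The average-price query shows the typical use of LET: compute an aggregate once, then compare each item against it. Here is a Python analogy with hypothetical prices.

```python
import xml.etree.ElementTree as ET
from statistics import mean

# Hypothetical stand-in for bib.xml.
BIB = ET.fromstring("""
<bib>
  <book><title>abc</title><price>30</price></book>
  <book><title>def</title><price>50</price></book>
  <book><title>ghi</title><price>70</price></book>
</bib>
""")
books = BIB.findall("book")

# LET $a := avg(.../bib/book/price): one value for the whole collection.
a = mean(float(b.findtext("price")) for b in books)

# FOR $b IN .../bib/book WHERE $b/price > $a RETURN $b (titles shown for brevity).
above_average = [b.findtext("title") for b in books if float(b.findtext("price")) > a]
print(a, above_average)
```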

FOR vs. LET
FOR
• Binds node variables
  • iteration
LET
• Binds collection variables
  • one value

Sorting in XQuery
    FOR $p IN distinct(document("bib.xml")//publisher)
    ORDERBY $p
    RETURN $p/text(),
           FOR $b IN document("bib.xml")//book[publisher = $p]
           ORDERBY $b/price DESCENDING
           RETURN $b/title, $b/price
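
The two-level sort above (publishers ascending, each publisher's books by descending price) can be sketched in Python as follows. The document and the helper name `price` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for bib.xml.
BIB = ET.fromstring("""
<bib>
  <book><publisher>Morgan Kaufmann</publisher><title>abc</title><price>30</price></book>
  <book><publisher>Addison Wesley</publisher><title>def</title><price>45</price></book>
  <book><publisher>Morgan Kaufmann</publisher><title>ghi</title><price>60</price></book>
</bib>
""")


def price(book):
    return float(book.findtext("price"))


report = []
for p in sorted({e.text for e in BIB.iter("publisher")}):   # distinct + ORDERBY $p
    books = [b for b in BIB.iter("book") if b.findtext("publisher") == p]
    books.sort(key=price, reverse=True)                     # ORDERBY $b/price DESCENDING
    report.append((p, [(b.findtext("title"), price(b)) for b in books]))

print(report)
```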

If-Then-Else
    FOR $h IN //holding
    ORDERBY $h/title
    RETURN $h/title,
           IF = "Journal"
           THEN $h/editor
           ELSE $h/author
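
The conditional above can be imitated with a Python conditional expression. The left-hand side of the condition does not appear in the slide text, so this sketch assumes, purely for illustration, that each holding carries a `type` attribute; the document content is hypothetical as well.

```python
import xml.etree.ElementTree as ET

# Hypothetical holdings; the "type" attribute is an assumption, not from the slide.
HOLDINGS = ET.fromstring("""
<holdings>
  <holding type="Journal"><title>Journal X</title><editor>Editor E</editor></holding>
  <holding type="Book"><title>Data on the Web</title><author>Abiteboul</author></holding>
</holdings>
""")

report = [
    (
        h.findtext("title"),
        # IF (assumed type test) THEN $h/editor ELSE $h/author
        h.findtext("editor") if h.get("type") == "Journal" else h.findtext("author"),
    )
    for h in sorted(HOLDINGS.iter("holding"), key=lambda h: h.findtext("title"))
]
print(report)
```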

XML vs. Semistructured Data
• Both described best by a graph
• Both are schema-less, self-describing
• XML is ordered, ssd is not
• XML can mix text and elements:
    Making Java easier to type and easier to type
    Phil Wadler
• XML has lots of other stuff: attributes, entities, processing instructions, comments
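
Mixed content, the point where XML departs most visibly from the semistructured model, shows up in a parser as text interleaved with child elements. In this sketch the element names `talk` and `speaker` are assumptions; only the text comes from the slide.

```python
import xml.etree.ElementTree as ET

# Element names "talk" and "speaker" are assumed for illustration.
talk = ET.fromstring(
    "<talk>Making Java easier to type and easier to type "
    "<speaker>Phil Wadler</speaker></talk>"
)

leading_text = talk.text                   # text before the first child element
speaker = talk.find("speaker").text        # text inside the child element
all_text = "".join(talk.itertext())        # document-order concatenation

print(repr(leading_text), repr(speaker))
```

The order of text and elements matters here, which is why an ordered data model is needed for XML.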

“La commedia è finita!” (“The comedy is finished!”)
… Good Luck … Make A Difference!!!