Lecture 5: XML Tuesday, January 16, 2001. Outline XML, DTDs (Data on the Web, 3.1) Semistructured data in XML (3.2) Exporting Relational Data in XML (8.3.1)



Facts About XML 132 books at Amazon 875,340 pages at Every database vendor Z has Many applications are just fancier Websites But, most importantly, XML enables data sharing on the Web – hence our interest

XML eXtensible Markup Language XML 1.0 – a recommendation from W3C, 1998 Roots: SGML (a very nasty language). After the roots: a format for sharing data

XML Applications Sharing data between different components of an application. Format for storing all data in Office Format for CISCO routers system tables. Format for EDI (electronic data interchange): –Transactions between banks –Producers and suppliers sharing product data (auctions) –Extranets: building relationships between companies –Scientists sharing data about experiments.

Why XML is of Interest to Us XML is just syntax for data –Note: we have no syntax for relational data –But XML is not relational: semistructured This is exciting because: –Can translate any legacy data to XML –Can ship XML over the Web (HTTP) –Can input XML into any application –Thus: data sharing and exchange on the Web

XML Data Sharing and Exchange [Figure: relational, object-relational, and legacy data sources and applications exchange XML data over the Web (HTTP); specific data management tasks on the XML data: transform, integrate, warehouse]

What is XML ? From HTML to XML HTML describes the presentation: easy for humans

HTML Bibliography Foundations of Databases Abiteboul, Hull, Vianu Addison Wesley, 1995 Data on the Web Abiteboul, Buneman, Suciu Morgan Kaufmann, 1999 HTML is hard for applications

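XML <book> <title>Foundations…</title> <author>Abiteboul</author> <author>Hull</author> <author>Vianu</author> <publisher>Addison Wesley</publisher> <year>1995</year> </book> … XML describes the content: easy for applications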

XML Syntax Another example: <db> <book> <title>Complete Guide to DB2</title> <author>Chamberlin</author> </book> <book> <title>Transaction Processing</title> <author>Bernstein</author> <author>Newcomer</author> </book> <publisher> <name>Morgan Kaufman</name> <state>CA</state> </publisher> </db>

XML Terminology tags: book, title, author, … start tag: <book>, end tag: </book> start tags must correspond to end tags, and conversely

XML Terminology an element: everything between tags –example element: <title>Complete Guide to DB2</title> –example element: <book> <title>Complete Guide to DB2</title> <author>Chamberlin</author> </book> elements may be nested empty element: a start tag immediately followed by its end tag, abbreviated as a single self-closing tag an XML document has a unique root element well formed XML document: if it has matching tags
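The well-formedness rules above (matching tags, a unique root element) can be checked mechanically. The following is a minimal sketch using Python's standard xml.etree.ElementTree parser; the helper name is_well_formed and the sample strings are illustrative assumptions, not part of the lecture.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as a single well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # Raised for mismatched tags, a second root element, and similar errors.
        return False


# Illustrative inputs (hypothetical, built from the element names on the slides).
good = "<book><title>Complete Guide to DB2</title><author>Chamberlin</author></book>"
mismatched = "<book><title>Complete Guide to DB2</author></book>"
two_roots = "<book/><book/>"

print(is_well_formed(good), is_well_formed(mismatched), is_well_formed(two_roots))
```

Note that this only tests well-formedness; checking a document against a DTD (validity) is a separate step.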

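The XML Tree [Figure: tree rooted at db with children book, book, publisher; the first book has title “Complete Guide to DB2” and author “Chamberlin”; the second book has title “Transaction Processing” and authors “Bernstein”, “Newcomer”; publisher has name “Morgan Kaufman” and state “CA”] Tags on nodes Data values on leaves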
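To make the tree view concrete, the sketch below parses the db document from the XML Syntax slide and prints an indented outline, with tags on inner nodes and data values on leaves. The function name outline is a hypothetical helper chosen for this illustration.

```python
import xml.etree.ElementTree as ET

# The example document from the "XML Syntax" slide.
DOC = """<db>
  <book><title>Complete Guide to DB2</title><author>Chamberlin</author></book>
  <book><title>Transaction Processing</title><author>Bernstein</author><author>Newcomer</author></book>
  <publisher><name>Morgan Kaufman</name><state>CA</state></publisher>
</db>"""


def outline(node, depth=0):
    """Return one line per node: the tag for inner nodes, tag plus value for leaves."""
    indent = "  " * depth
    children = list(node)
    if not children:
        return [f'{indent}{node.tag}: "{(node.text or "").strip()}"']
    lines = [indent + node.tag]
    for child in children:
        lines.extend(outline(child, depth + 1))
    return lines


tree_lines = outline(ET.fromstring(DOC))
print("\n".join(tree_lines))
```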

More XML Syntax: Attributes <book price="…" currency="…"> <title>Complete Guide to DB2</title> <author>Chamberlin</author> <year>1998</year> </book> price, currency are called attributes

Replacing Attributes with Elements <book> <title>Complete Guide to DB2</title> <author>Chamberlin</author> <price>…</price> <currency>USD</currency> </book> attributes are alternative ways to represent data
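Since attributes and subelements are alternative encodings of the same data, one can be rewritten into the other mechanically. The sketch below turns every attribute into a child element of the same name. The function name attributes_to_elements is hypothetical, and the price value 55 is an assumed placeholder because the value did not survive in the transcript.

```python
import xml.etree.ElementTree as ET


def attributes_to_elements(elem):
    """Replace each attribute of `elem` (and of its descendants) by a subelement."""
    for name, value in list(elem.attrib.items()):
        child = ET.SubElement(elem, name)  # appended after the existing children
        child.text = value
    elem.attrib.clear()
    for child in elem:
        attributes_to_elements(child)


# price="55" is an assumed value used only for illustration.
book = ET.fromstring(
    '<book price="55" currency="USD">'
    "<title>Complete Guide to DB2</title><author>Chamberlin</author>"
    "</book>"
)
attributes_to_elements(book)
print(ET.tostring(book, encoding="unicode"))
```

The reverse direction is not always possible: an attribute holds a single string and may occur at most once per element, while subelements can repeat and nest.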

“Types” (or “Schemas”) for XML Document Type Definition – DTD Define a grammar for the XML document –we use it as a substitute for types/schemas Will be replaced by XML-Schema

An Example DTD <!DOCTYPE db [ ]> PCDATA means Parsed Character Data (a mouthful for string)

DTDs as Grammars db ::= (book|publisher)* book ::= (title,author*,year?) title ::= string author ::= string year ::= string publisher ::= string A DTD is an EBNF (Extended BNF) grammar An XML tree is precisely a derivation tree XML Documents that have a DTD and conform to it are called valid
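Because each production in the grammar above is a regular expression over child tags, validity can be sketched as a regular-expression match on the sequence of children of every element. The code below is a simplified illustration under that assumption, not a real DTD validator; the names CONTENT_MODELS, STRING_ELEMENTS and is_valid are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Each content model is a regex over child tag names, each followed by a space.
CONTENT_MODELS = {
    "db": r"(book |publisher )*",           # db ::= (book|publisher)*
    "book": r"title (author )*(year )?",    # book ::= (title,author*,year?)
}
# Elements whose production is "string": they must have no child elements.
STRING_ELEMENTS = {"title", "author", "year", "publisher"}


def is_valid(elem):
    """Check `elem` and its descendants against the grammar above."""
    if elem.tag in STRING_ELEMENTS:
        return len(elem) == 0
    model = CONTENT_MODELS.get(elem.tag)
    if model is None:
        return False  # tag not declared in the grammar
    child_tags = "".join(child.tag + " " for child in elem)
    if re.fullmatch(model, child_tags) is None:
        return False
    return all(is_valid(child) for child in elem)


valid_doc = ET.fromstring(
    "<db><book><title>T</title><author>A</author><author>B</author>"
    "<year>1999</year></book><publisher>P</publisher></db>"
)
invalid_doc = ET.fromstring(
    "<db><book><year>1999</year><title>T</title></book></db>"
)
print(is_valid(valid_doc), is_valid(invalid_doc))
```

The second document is well formed but not valid, because year appears before title.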

More on DTDs as Grammars <!DOCTYPE paper [ ]> … XML documents can be nested arbitrarily deep

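XML for Representing Data Relation person(name, phone) with rows (John, 3634), (Sue, 6343), (Dick, 6363) [Figure: the same data as a tree: a person root with one row child per tuple, each row having a name child and a phone child] XML: <person> <row> <name>John</name> <phone>3634</phone> </row> <row> <name>Sue</name> <phone>6343</phone> </row> <row> <name>Dick</name> <phone>6363</phone> </row> </person>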
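The mapping from a relation to XML shown above is mechanical: one element for the relation, one row element per tuple, one subelement per column. A minimal sketch follows, assuming the three tuples from the slide; the function name relation_to_xml is hypothetical.

```python
import xml.etree.ElementTree as ET

# The tuples from the slide.
ROWS = [("John", 3634), ("Sue", 6343), ("Dick", 6363)]


def relation_to_xml(relation, columns, rows):
    """Export a relation as <relation><row><col>value</col>...</row>...</relation>."""
    root = ET.Element(relation)
    for row in rows:
        row_elem = ET.SubElement(root, "row")
        for column, value in zip(columns, row):
            ET.SubElement(row_elem, column).text = str(value)
    return root


person_xml = ET.tostring(
    relation_to_xml("person", ["name", "phone"], ROWS), encoding="unicode"
)
print(person_xml)
```

Every tuple repeats the column names as tags, which is the self-describing overhead discussed on the next slides.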

DTDs as Schemas The DTD is: <!DOCTYPE db [ ]>

XML vs Other Data Models XML is self-describing Schema elements become part of the data –Relational schema: person(name,phone) –In XML, <person>, <name>, <phone> are part of the data, and are repeated many times Consequence: XML is much more flexible, but also less efficient XML = semistructured data

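Semistructured Data Explained Missing attributes: – <person> <name>John</name> <phone>1234</phone> </person> – <person> <name>Joe</name> </person> no phone ! Repeated attributes – <person> <name>Mary</name> <phone>…</phone> <phone>…</phone> </person>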

Semistructured Data Explained Attributes with different types in different objects – a <person> whose <name> has nested subelements for John and Smith: complex name ! and whose <phone> is 1234 Nested collections (no 1NF) Heterogeneous collections: – <db> contains both <book>s and <publisher>s
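An application reading semistructured data has to tolerate missing and repeated attributes rather than assume one value per field. The sketch below collects phones as a list per person, so a missing phone becomes an empty list and repeated phones become several entries. The people wrapper element, Mary's phone numbers, and the function name phones_by_name are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: the <people> root and Mary's numbers are assumed.
PEOPLE = ET.fromstring("""<people>
  <person><name>John</name><phone>1234</phone></person>
  <person><name>Joe</name></person>
  <person><name>Mary</name><phone>2345</phone><phone>3456</phone></person>
</people>""")


def phones_by_name(root):
    """Map each person's name to the (possibly empty) list of phone values."""
    return {
        person.findtext("name"): [phone.text for phone in person.findall("phone")]
        for person in root.findall("person")
    }


phones = phones_by_name(PEOPLE)
print(phones)
```

A relational table person(name, phone) would need a NULL for Joe and either two rows or a second table for Mary.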

XML Data vs. E/R, Relational Q: is XML better or worse? A: it serves different purposes –E/R, Relational models: for centralized processing, when we control the data –XML: data sharing between different systems, when we do not have control over the entire data Do NOT use XML to model your data! Use E/R, ODL, or relational instead.

Data Sharing with XML: Easy [Figure: a data source (e.g. relational database) exports XML over the Web to an application]

Projects All (except one) are related to XML All are potential research projects –Some of your colleagues already do research on these topics Readings have been added to the Website

Project 1: Indexing patterns An XML-QL pattern: <book> <author>Abiteboul</author> <title>$x</title> <year>$y</year> </book> Looking for titles, years of all books published by Abiteboul. Problem: given a large XML file, preprocess it in order to answer any pattern quickly Goal of the project: implement 2-3 simple methods, evaluate and compare them. Notice: Pradeep Shenoy is working on this
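As a baseline for comparison, the pattern above can be answered with no preprocessing by scanning every book element. The sketch below is such a naive scan, not one of the indexing methods the project asks for. The function name match_author_pattern and the sample bib document are hypothetical, and the sketch simplifies by binding only the first title and year of each book.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data using the bibliography entries from the earlier slides.
BIB = ET.fromstring("""<bib>
  <book><author>Abiteboul</author><author>Hull</author><author>Vianu</author>
        <title>Foundations of Databases</title><year>1995</year></book>
  <book><author>Bernstein</author><author>Newcomer</author>
        <title>Transaction Processing</title></book>
  <book><author>Abiteboul</author><author>Buneman</author><author>Suciu</author>
        <title>Data on the Web</title><year>1999</year></book>
</bib>""")


def match_author_pattern(root, author):
    """Return ($x, $y) = (title, year) bindings for books with the given author."""
    bindings = []
    for book in root.iter("book"):  # full scan: cost grows with document size
        if any(a.text == author for a in book.findall("author")):
            title, year = book.findtext("title"), book.findtext("year")
            if title is not None and year is not None:
                bindings.append((title, year))
    return bindings


matches = match_author_pattern(BIB, "Abiteboul")
print(matches)
```

An index would aim to avoid the full scan, for example by mapping author values directly to the books that contain them.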

Project 2: XPath containment XPath allows us to express simple patterns, with a single variable. E.g. /bib/book[author=Abiteboul, year=1999]/title Some queries are “contained” in others. E.g. the above is contained in: //*[author=Abiteboul]//*[year=1999]/title Containment for the full XPath language is probably complex Goal: define a “reasonable” subset of XPath, solve the containment problem, implement it. Notice: Gerome Miklau is working on this

Project 3: XPath query pruning We are given an XPath query and a DTD or XML-Schema. E.g.: XPath query: //editor DTD: [says that there are papers and books in the document, but only books may have an editor] Rewrite the XPath query to the “more efficient”: /bib/book/editor Goal: for a fragment of XPath and of DTDs (or XML-Schema), find an algorithm for pruning queries in that fragment against schemas; implement it.
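The effect of pruning can be observed on a small document that obeys the DTD described above: the descendant query and the pruned path return the same nodes, but the pruned path never descends into paper elements. The sketch uses the limited XPath subset of Python's xml.etree.ElementTree, where paths are written relative to the root element; the sample data is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document in which only books have an editor, as the DTD requires.
BIB = ET.fromstring("""<bib>
  <paper><title>Paper A</title></paper>
  <book><title>Book B</title><editor>Editor X</editor></book>
  <book><title>Book C</title></book>
</bib>""")

# //editor: searches every descendant of the root.
unpruned = BIB.findall(".//editor")
# /bib/book/editor: BIB is the bib element, so the path starts at its children.
pruned = BIB.findall("./book/editor")

same_answer = unpruned == pruned  # compares the lists element by element
print(same_answer, [e.text for e in pruned])
```

The equivalence holds only for documents valid against the DTD; on an arbitrary document the two queries may differ.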

Project 4: Storing XML as relational data Given an XML document, there are three ways to store it in relations (see papers) Goal of the project: evaluate the three alternatives on some large XML data instance (to be provided). Use SQL Server, or some other DBMS
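One generic way to store an XML tree in relations is a single edge table with one row per element, recording its parent, its position, its tag and its leaf value. The sketch below is an assumption made for illustration and is not necessarily one of the three methods in the project papers. It uses SQLite from Python's standard library instead of SQL Server; the table name edge and the function name shred are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

DOC = ET.fromstring("""<db>
  <book><title>Complete Guide to DB2</title><author>Chamberlin</author></book>
  <book><title>Transaction Processing</title><author>Bernstein</author><author>Newcomer</author></book>
  <publisher><name>Morgan Kaufman</name><state>CA</state></publisher>
</db>""")


def shred(root, conn):
    """Store every element as one row of edge(id, parent, ord, tag, value)."""
    conn.execute(
        "CREATE TABLE edge (id INTEGER PRIMARY KEY, parent INTEGER, "
        "ord INTEGER, tag TEXT, value TEXT)"
    )
    next_id = 1

    def visit(elem, parent_id, position):
        nonlocal next_id
        node_id = next_id
        next_id += 1
        # Only leaves carry a data value; inner nodes store NULL.
        value = (elem.text or "").strip() if len(elem) == 0 else None
        conn.execute(
            "INSERT INTO edge VALUES (?, ?, ?, ?, ?)",
            (node_id, parent_id, position, elem.tag, value),
        )
        for i, child in enumerate(elem):
            visit(child, node_id, i)

    visit(root, None, 0)


conn = sqlite3.connect(":memory:")
shred(DOC, conn)

# /db/book/title becomes a self-join on the edge table.
titles = [
    value
    for (value,) in conn.execute(
        "SELECT t.value FROM edge b JOIN edge t ON t.parent = b.id "
        "WHERE b.tag = 'book' AND t.tag = 'title' ORDER BY t.id"
    )
]
row_count = conn.execute("SELECT COUNT(*) FROM edge").fetchone()[0]
print(titles, row_count)
```

Each step of a path query costs one more self-join, which is the kind of trade-off the evaluation in this project would measure.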

Project 5: Publishing relational data as XML Two research prototypes are published in the literature (see references) How do commercial products do it ? Goal: do a study of how commercial products approach XML publishing. Implement query rewriting for one of them (say SQL Server)

Project 6: Bulk Processing of Recursive Transformations Today’s most popular XML language is XSLT: top-down, recursive processing How can we “translate” XSLT to SQL ? We can’t. For a nice subset (“structural recursion”) this is possible, using a sophisticated (read: inefficient) technique: epsilon-edges. Goal: choose a subset of structural recursion which can be translated efficiently to SQL, and implement the translation.

Project 7: Processing Ordered Collections in SQL Goal: design an extension of SQL with ordered collections, translate it into SQL over relations with an “index” attribute. Notice: Yana Kadiyska is working on this.
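The target of such a translation can be sketched directly: an ordered collection becomes an ordinary relation with an extra index attribute, and order-sensitive operations become predicates or ORDER BY clauses on that attribute. The example below uses SQLite from Python's standard library; the table author(book, idx, name) and the helper names are assumptions for illustration, not the project's design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# idx is the "index" attribute that encodes the position within each list.
conn.execute(
    "CREATE TABLE author (book TEXT, idx INTEGER, name TEXT, PRIMARY KEY (book, idx))"
)
conn.executemany(
    "INSERT INTO author VALUES (?, ?, ?)",
    [
        # Inserted out of order on purpose: relations themselves are unordered.
        ("Transaction Processing", 1, "Newcomer"),
        ("Transaction Processing", 0, "Bernstein"),
        ("Data on the Web", 2, "Suciu"),
        ("Data on the Web", 0, "Abiteboul"),
        ("Data on the Web", 1, "Buneman"),
    ],
)


def authors_in_order(book):
    """Rebuild the ordered author list of one book."""
    rows = conn.execute(
        "SELECT name FROM author WHERE book = ? ORDER BY idx", (book,)
    )
    return [name for (name,) in rows]


def author_at(book, position):
    """Positional access, e.g. 'the second author', as a selection on idx."""
    row = conn.execute(
        "SELECT name FROM author WHERE book = ? AND idx = ?", (book, position)
    ).fetchone()
    return row[0] if row else None


print(authors_in_order("Data on the Web"), author_at("Transaction Processing", 1))
```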