Lecture 8: XML Data Wednesday, October 11 2000.

Slides:



Advertisements
Similar presentations
XML e X tensible M arkup L anguage (XML) By: Albert Beng Kiat Tan Ayzer Mungan Edwin Hendriadi.
Advertisements

XML, XML Schema, Xpath and XQuery Slides collated from various sources, many from Dan Suciu at Univ. of Washington.
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 311 Database Systems I The Semistructured Data Model.
Database Management Systems, R. Ramakrishnan1 Introduction to Semistructured Data and XML Chapter 27, Part D Based on slides by Dan Suciu University of.
Agenda from now on Done: SQL, views, transactions, conceptual modeling, E/R, relational algebra. Starting: XML To do: the database engine: –Storage –Query.
CSE 636 Data Integration XML Semistructured Data Document Type Definitions.
1 Lecture 10 XML Wednesday, October 18, XML Outline XML (4.6, 4.7) –Syntax –Semistructured data –DTDs.
1 Lecture 10: Database Design XML Wednesday, October 20, 2004.
1 COS 425: Database and Information Management Systems XML and information exchange.
1 Statistics XML: –Altavista: 800,000 pages returned. –Amazon.com: 242 books. In comparison: –God: 12,000 books, 7 Million pages –Bible: 32,000 books,
XML May 2 nd, Agenda XML as a data model Querying XML Manipulating XML A lot of discussion, politics and stories.
XML May 1 st, XML for Representing Data John 3634 Sue 6343 Dick 6363 John 3634 Sue 6343 Dick 6363 row name phone “John”3634“Sue”“Dick” persons.
Semi-structured Data. Facts about the Web Growing fast Popular Semi-structured data –Data is presented for ‘human’-processing –Data is often ‘self-describing’
1 Introduction to XML Yanlei Diao UMass Amherst April 19, 2007 Slides Courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives and Gerome Miklau.
Lecture #6 XML November 2 nd, Administration Thanks for the mid-term comments Comment on the book & readings Project #2 Project #1 Homework #4 Homework.
XML and Databases 198:541. XML Motivation  Huge amounts of unstructured data on the web: HTML documents  No structure information  Only format instructions.
XML(EXtensible Markup Language). XML XML stands for EXtensible Markup Language. XML is a markup language much like HTML. XML was designed to describe.
End of SQL XML April 22 th, Null Values If x=Null then 4*(3-x)/7 is still NULL If x=Null then x=“Joe” is UNKNOWN Three boolean values: –FALSE =
1 Lecture 08: XML and Semistructured Data. 2 Outline XML (Section 17) –XML syntax, semistructured data –Document Type Definitions (DTDs) XPath.
Managing XML and Semistructured Data Lecture 2: XML Prof. Dan Suciu Spring 2001.
1 Lecture 08: XML and Semistructured Data. 2 Outline XML (Section 17) –XML syntax, semistructured data –Document Type Definitions (DTDs) XPath.
Introduction to XML This material is based heavily on the tutorial by the same name at
1 Advanced Topics XML and Databases. 2 XML u Overview u Structure of XML Data –XML Document Type Definition DTD –Namespaces –XML Schema u Query and Transformation.
XML – a data sharing standard DSC340 Mike Pangburn.
4/20/2017.
XML: Extensible Markup Language FST-UMAC Gong Zhiguo.
IS432: Semi-Structured Data Dr. Azeddine Chikh. 1. Semi Structured Data Object Exchange Model.
Web Data Management XML and its Syntax.
Introduction to XML cs3505. References –I got most of this presentation from this site –O’reilly tutorials.
XML by Dan Suciu 1 Introduction to Semistructured Data and XML Based on slides by Dan Suciu University of Washington.
XML and XPath. Web Services: XML+XPath2 EXtensible Markup Language (XML) a W3C standard to complement HTML A markup language much like HTML origins: structured.
1 © Netskills Quality Internet Training, University of Newcastle Introducing XML © Netskills, Quality Internet Training University.
1 Data Integration. 2 Motivating Examples An organization has on average 49 databases –can talk about the same topic, but use different vocabularies,
Avoid using attributes? Some of the problems using attributes: Attributes cannot contain multiple values (child elements can) Attributes are not easily.
Winter 2006Keller, Ullman, Cushing18–1 Plan 1.Information integration: important new application that motivates what follows. 2.Semistructured data: a.
XML A web enabled data description language 4/22/2001 By Mark Lawson & Edward Ryan L’Herault.
Lecture 5: XML Tuesday, January 16, Outline XML, DTDs (Data on the Web, 3.1) Semistructured data in XML (3.2) Exporting Relational Data in XML (8.3.1)
Jeff Ullman: Introduction to XML 1 XML Semistructured Data Extensible Markup Language Document Type Definitions.
1 Introduction to Semistructured Data and XML. 2 How the Web is Today  HTML documents often generated by applications consumed by humans only easy access:
More XML: semantics, DTDs, XPATH February 18, 2004.
Management of XML and Semistructured Data Lecture 10: Schemas Monday, April 30, 2001.
The Semistructured-Data Model Programming Languages for XML Spring 2011 Instructor: Hassan Khosravi.
CS 157B: Database Management Systems II February 11 Class Meeting Department of Computer Science San Jose State University Spring 2013 Instructor: Ron.
XML e X tensible M arkup L anguage (XML) By: Albert Beng Kiat Tan Ayzer Mungan Edwin Hendriadi.
Working with XML. Markup Languages Text-based languages based on SGML Text-based languages based on SGML SGML = Standard Generalized Markup Language SGML.
SEMI-STRUCTURED DATA (XML) 1. SEMI-STRUCTURED DATA ER, Relational, ODL data models are all based on schema Structure of data is rigid and known is advance.
CHAPTER NINE Accessing Data Using XML. McGraw Hill/Irwin ©2002 by The McGraw-Hill Companies, Inc. All rights reserved Introduction The eXtensible.
XML Extensible Markup Language
XML Databases Presented By: Pardeep MT15042 Anurag Goel MT15006.
Extensible Markup Language (XML) Pat Morin COMP 2405.
Metadata Michael J. Watts
eXtensible Markup Language
Lecture 14: Relational Algebra Projects XML?
Lecture 10 XML Monday, Oct. 21, 2001.
Management of XML and Semistructured Data
Management of XML and Semistructured Data
Management of XML and Semistructured Data
Managing XML and Semistructured Data
XML Data Introduction, Well-formed XML.
Lecture 11 XML Wednesday, Oct. 24, 2001.
eXtensible Markup Language (XML)
Lecture 12: XML, XPath, XQuery
Semi-Structured data (XML Data MODEL)
Lecture 9: XML Monday, October 17, 2005.
CSE 544: Lecture 5 XML 4/15/2002.
CSE591: Data Mining by H. Liu
Introduction to Database Systems CSE 444 Lecture 10 XML
Semi-Structured data (XML)
Lecture 11: XML and Semistructured Data
Lecture 14: XML Publishing & Storage Midterm Review
Presentation transcript:

Lecture 8: XML Data Wednesday, October 11 2000

Outline XML, DTDs (Data on the Web, 3.1) Semistructured data in XML (3.2) Exporting Relational Data in XML (8.3.1)

Facts About XML 132 books at Amazon 875,340 pages at www.altavista.com Every database vendor X has www.x.com/xml Many applications are just fancier Websites But, most importantly, XML enables data sharing on the Web – hence our interest

What is XML ? From HTML to XML HTML describes the presentation: easy for humans

HTML HTML is hard for applications <h1> Bibliography </h1> <p> <i> Foundations of Databases </i> Abiteboul, Hull, Vianu <br> Addison Wesley, 1995 <p> <i> Data on the Web </i> Abiteboul, Buneman, Suciu <br> Morgan Kaufmann, 1999 HTML is hard for applications

XML XML describes the content: easy for applications <bibliography> <book> <title> Foundations… </title> <author> Abiteboul </author> <author> Hull </author> <author> Vianu </author> <publisher> Addison Wesley </publisher> <year> 1995 </year> </book> … </bibliography> XML describes the content: easy for applications

XML eXtensible Markup Language Roots: comes from SGML (very nasty language). After the roots: a format for sharing data Emerging format for data exchange on the Web and between applications

XML Applications Sharing data between different components of an application. Format for storing all data in Office 2000. EDI: electronic data exchange: Transactions between banks Producers and suppliers sharing product data (auctions) Extranets: building relationships between companies Scientists sharing data about experiments.

XML Syntax Very simple: < db > < book > < title > Complete Guide to DB2 </ title > < author > Chamberlin </ author > </ book > < book > < title > Transaction Processing </ title > < author > Bernstein </ author > < author > Newcomer </ author > </ book > < publisher > < name > Morgan Kaufman </ name > < state > CA </ state > </ publisher > </ db >

XML Terminology tags: book, title, author, … start tag: <book>, end tag: </book> start tags must correspond to end tags, and conversely

XML Terminology well formed XML document: if it has matching tags an element: everything between tags example element: <title>Complete Guide to DB2</title> <book> <title> Complete Guide to DB2 </title> <author>Chamberlin</author> </book> elements may be nested empty element: <red></red> abbreviated <red/> an XML document has a unique root element well formed XML document: if it has matching tags

The XML Tree db book book publisher title author author name state “Complete Guide to DB2” “Morgan Kaufman” “CA” “Chamberlin” “Transaction Processing” “Bernstein” “Newcomer” Tags on nodes Data values on leaves

More XML Syntax: Attributes <book price = “55” currency = “USD”> <title> Complete Guide to DB2 </title> <author> Chamberlin </author> <year> 1998 </year> </book> price, currency are called attributes

Replacing Attributes with Elements <book> <title> Complete Guide to DB2 </title> <author> Chamberlin </author> <year> 1998 </year> <price> 55 </price> <currency> USD </currency> </book> attributes are alternative ways (worse ) to represent data

“Types” (or “Schemas”) for XML Document Type Definition – DTD Define a grammar for the XML document, but we use it as substitute for types/schemas Will be replaced by XML-Schema (will extend DTDs)

An Example DTD <!DOCTYPE db [ <!ELEMENT db ((book|publisher)*)> <!ELEMENT book (title,author*,year?)> <!ELEMENT title (#PCDATA)> <!ELEMENT author (#PCDATA)> <!ELEMENT year (#PCDATA)> <!ELEMENT publisher (#PCDATA)> ]> PCDATA means Parsed Character Data (a mouthful for string)

DTDs as Grammars db ::= (book|publisher)* book ::= (title,author*,year?) title ::= string author ::= string year ::= string publisher ::= string A DTD is a EBNF (Extended BNF) grammar An XML tree is precisely a derivation tree XML Documents that have a DTD and conform to it are called valid

More on DTDs as Grammars <!DOCTYPE paper [ <!ELEMENT paper (section*)> <!ELEMENT section ((title,section*) | text)> <!ELEMENT title (#PCDATA)> <!ELEMENT text (#PCDATA)> ]> <paper> <section> <text> </text> </section> <section> <title> </title> <section> … </section> <section> … </section> </section> </paper> XML documents can be nested arbitrarily deep

XML for Representing Data persons XML: persons row row row phone name phone name phone name “John” 3634 “Sue” 6343 “Dick” 6363 <persons> <row> <name>John</name> <phone> 3634</phone></row> <row> <name>Sue</name> <phone> 6343</phone> <row> <name>Dick</name> <phone> 6363</phone></row> </persons>

XML vs Data Models XML is self-describing Schema elements become part of the data Reational schema: persons(name,phone) In XML <persons>, <name>, <phone> are part of the data, and are repeated many times Consequence: XML is much more flexible XML = semistructured data

Semi-structured Data Explained Missing attributes: <person> <name> John</name> <phone>1234</phone> </person> <person><name>Joe</name></person> no phone ! Repeated attributes <person> <name> Mary</name> <phone>2345</phone> <phone>3456</phone>

Semistructured Data Explained Attributes with different types in different objects <person> <name> <first> John </first> <last> Smith </last> </name> <phone>1234</phone> </person> Nested collections (no 1NF) Heterogeneous collections: <db> contains both <book>s and <publisher>s

XML Data v.s. E/R, ODL, Relational Q: is XML better or worse ? A: serves different purposes E/R, ODL, Relational models: For centralized processing, when we control the data XML: Data sharing between different systems we do not have control over the entire data E.g. on the Web Do NOT use XML to model your data ! Use E/R, ODL, or relational instead.

Data Sharing with XML: Easy  Web Data source (e.g. relational Database) Application XML

Exporting Relational Data to XML Product(pid, name, weight) Company(cid, name, address) Makes(pid, cid, price) makes product company

Export data grouped by companies <db><company> <name> GizmoWorks </name> <address> Tacoma </address> <product> <name> gizmo </name> <price> 19.99 </price> </product> <product> …</product> … </company> <company> <name> Bang </name> <address> Kirkland </address> <product> <name> gizmo </name> <price> 22.99 </price> </db> Redundant representation of products

The DTD <!ELEMENT db (company*)> <!ELEMENT company (name, address, product*)> <!ELEMENT product (name,price)> <!ELEMENT name (#PCDATA)> <!ELEMENT address (#PCDATA)> <!ELEMENT price (#PCDATA)>

Export Data by Products <db> <product> <name> Gizmo </name> <manufacturer> <name> GizmoWorks </name> <price> 19.99 </price> <address> Tacoma </address> </manufacturer> <name> Bang </name> <price> 22.99 </price> <address> Kirkland </address> … </product> <product> <name> OneClick </name> … </db> Redundant representation of companies

Which One Do We Choose ? The structure of the XML data is determined by agreement, with our partners, or dictated by committees Many XML dialects (called applications) XML Data is often nested, irregular, etc No normal forms for XML 

Storing XML Data We got lots of XML data from the Web, how do we store it ? Ideally: convert to relational data, store in RDBMS Much harder than exporting relations to XML (why ?) DB Vendors currently work on tools for loading XML data into an RDBMS