Introduction to Database Systems CSE 444 Lecture 10 XML

Slides:



Advertisements
Similar presentations
XML, XML Schema, Xpath and XQuery Slides collated from various sources, many from Dan Suciu at Univ. of Washington.
Advertisements

XML Document Type Definitions ( DTD ). 1.Introduction to DTD An XML document may have an optional DTD, which defines the document’s grammar. Since the.
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 311 Database Systems I The Semistructured Data Model.
Database Management Systems, R. Ramakrishnan1 Introduction to Semistructured Data and XML Chapter 27, Part D Based on slides by Dan Suciu University of.
Agenda from now on Done: SQL, views, transactions, conceptual modeling, E/R, relational algebra. Starting: XML To do: the database engine: –Storage –Query.
CSE 636 Data Integration XML Semistructured Data Document Type Definitions.
1 Lecture 10 XML Wednesday, October 18, XML Outline XML (4.6, 4.7) –Syntax –Semistructured data –DTDs.
1 Lecture 10: Database Design XML Wednesday, October 20, 2004.
1 COS 425: Database and Information Management Systems XML and information exchange.
Managing XML and Semistructured Data
1 Introduction to XML Yanlei Diao UMass Amherst April 19, 2007 Slides Courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives and Gerome Miklau.
XML and Databases 198:541. XML Motivation  Huge amounts of unstructured data on the web: HTML documents  No structure information  Only format instructions.
End of SQL XML April 22 th, Null Values If x=Null then 4*(3-x)/7 is still NULL If x=Null then x=“Joe” is UNKNOWN Three boolean values: –FALSE =
1 Lecture 08: XML and Semistructured Data. 2 Outline XML (Section 17) –XML syntax, semistructured data –Document Type Definitions (DTDs) XPath.
Managing XML and Semistructured Data Lecture 2: XML Prof. Dan Suciu Spring 2001.
1 Lecture 08: XML and Semistructured Data. 2 Outline XML (Section 17) –XML syntax, semistructured data –Document Type Definitions (DTDs) XPath.
Introduction to XML This material is based heavily on the tutorial by the same name at
1 Advanced Topics XML and Databases. 2 XML u Overview u Structure of XML Data –XML Document Type Definition DTD –Namespaces –XML Schema u Query and Transformation.
XML – a data sharing standard DSC340 Mike Pangburn.
4/20/2017.
IS432: Semi-Structured Data Dr. Azeddine Chikh. 1. Semi Structured Data Object Exchange Model.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 XML Taken from Chapter 7.
XML by Dan Suciu 1 Introduction to Semistructured Data and XML Based on slides by Dan Suciu University of Washington.
XML and XPath. Web Services: XML+XPath2 EXtensible Markup Language (XML) a W3C standard to complement HTML A markup language much like HTML origins: structured.
Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives.
XML CPSC 315 – Programming Studio Fall 2008 Project 3, Lecture 1.
1 © Netskills Quality Internet Training, University of Newcastle Introducing XML © Netskills, Quality Internet Training University.
What is XML?  XML stands for EXtensible Markup Language  XML is a markup language much like HTML  XML was designed to carry data, not to display data.
 XML is designed to describe data and to focus on what data is. HTML is designed to display data and to focus on how data looks.  XML is created to structure,
1 Chapter 10: XML What is XML What is XML Basic Components of XML Basic Components of XML XPath XPath XQuery XQuery.
Of 33 lecture 3: xml and xml schema. of 33 XML, RDF, RDF Schema overview XML – simple introduction and XML Schema RDF – basics, language RDF Schema –
Lecture 6: XML Query Languages Thursday, January 18, 2001.
Introduction to XML This presentation covers introductory features of XML. What XML is and what it is not? What does it do? Put different related technologies.
XML Instructor: Charles Moen CSCI/CINF XML  Extensible Markup Language  A set of rules that allow you to create your own markup language  Designed.
Lecture 16 Introduction to XML Boriana Koleva Room: C54
Lecture 5: XML Tuesday, January 16, Outline XML, DTDs (Data on the Web, 3.1) Semistructured data in XML (3.2) Exporting Relational Data in XML (8.3.1)
An Introduction to XML Sandeep Bhattaram
XML Introduction. What is XML? XML stands for eXtensible Markup Language XML stands for eXtensible Markup Language XML is a markup language much like.
1 XML eXtensible Markup Language. 2 XML vs. HTML HTML is a HyperText Markup language HTML is a HyperText Markup language Designed for a specific application,
1 Introduction to Semistructured Data and XML. 2 How the Web is Today  HTML documents often generated by applications consumed by humans only easy access:
More XML: semantics, DTDs, XPATH February 18, 2004.
Management of XML and Semistructured Data Lecture 10: Schemas Monday, April 30, 2001.
The Semistructured-Data Model Programming Languages for XML Spring 2011 Instructor: Hassan Khosravi.
1 Indexing The syntax for creating a index is: CREATE [UNIQUE] INDEX index_name ON table_name (column1, column2,... column_n) [ COMPUTE STATISTICS ]; Why.
SEMI-STRUCTURED DATA (XML) 1. SEMI-STRUCTURED DATA ER, Relational, ODL data models are all based on schema Structure of data is rigid and known is advance.
XML – Basic Concepts (modified version from Dr. Praveen Madiraju) 2015, Fall Pusan National University Ki-Joune Li.
1 XML eXtensible Markup Language. 2 Introduction and Motivation Dr. Praveen Madiraju Modified from Dr.Sagiv’s slides.
XML Databases Presented By: Pardeep MT15042 Anurag Goel MT15006.
Extensible Markup Language (XML) Pat Morin COMP 2405.
Lecture 14: Relational Algebra Projects XML?
XML: Extensible Markup Language
Unit 4 Representing Web Data: XML
Lecture 10 XML Monday, Oct. 21, 2001.
XML QUESTIONS AND ANSWERS
Management of XML and Semistructured Data
Management of XML and Semistructured Data
CSCE 315 – Programming Studio Spring 2013
Managing XML and Semistructured Data
Lecture 11 XML Wednesday, Oct. 24, 2001.
eXtensible Markup Language (XML)
Lecture 12: XML, XPath, XQuery
Semi-Structured data (XML Data MODEL)
Alin Deutsch, University of Pennsylvania Mary Mernandez, AT&T Labs
Lecture 9: XML Monday, October 17, 2005.
eXtensible Markup Language
CSE 544: Lecture 5 XML 4/15/2002.
Lecture 8: XML Data Wednesday, October
Allyson Falkner Spokane County ISD
Semi-Structured data (XML)
Lecture 11: XML and Semistructured Data
Presentation transcript:

Introduction to Database Systems CSE 444 Lecture 10 XML October 17, 2007

XML Outline XML (4.6, 4.7) Syntax Semistructured data DTDs

Further Readings on XML Main source on XML, but hard to read: http://www.w3.org/XML/ Two tutorials out of myriads: http://www.w3.org/XML/1999/XML-in-10-points http://www.zvon.org/xxl/XMLTutorial/General/book_en.html You don’t need to read this for the class

Additional Readings on XPath/XQuery Recommended reading on Xquery http://www.w3.org/TR/xquery-use-cases/ Other suggested readings: http://www.w3.org/TR/xquery/ http://www.galaxquery.org/ Note: XML/XQuery is NOT covered in the textbook

We will study only XML as data A flexible syntax for data Used in: Data exchange Flexible databases: e.g. property lists Configuration files: e.g. Web.Config Document markup: e.g. XHTML Roots: SGML - a very nasty language We will study only XML as data

XML for Data Exchange Relational data does not have a syntax I can’t “give” you my relational database Examples of syntaxes: CSV (comma-separated-values), ASN.1 XML = syntax for data But XML is not relational: semistructured Usage: Export: Database  XML Transport/transform XML Import: XML  Databases or application

XML for Databases Relational databases have rigid schema Schema evolution is costly XML is flexible: semistructured data Store data in XML Warning: not normal form! Not even 1NF Don’t try this at home

From HTML to XML HTML describes the presentation

HTML <h1> Bibliography </h1> <p> <i> Foundations of Databases </i> Abiteboul, Hull, Vianu <br> Addison Wesley, 1995 <p> <i> Data on the Web </i> Abiteoul, Buneman, Suciu <br> Morgan Kaufmann, 1999

XML Syntax XML describes the content <bibliography> <book> <title> Foundations… </title> <author> Abiteboul </author> <author> Hull </author> <author> Vianu </author> <publisher> Addison Wesley </publisher> <year> 1995 </year> </book> … </bibliography> XML describes the content

XML Terminology tags: book, title, author, … start tag: <book>, end tag: </book> elements: <book>…</book>,<author>…</author> elements are nested empty element: <red></red> abbrv. <red/> an XML document: single root element well formed XML document: if it has matching tags

More XML: Attributes <book price = “55” currency = “USD”> <title> Foundations of Databases </title> <author> Abiteboul </author> … <year> 1995 </year> </book>

Attributes v.s. Elements <book price = “55” currency = “USD”> <title> Foundations of DBs </title> <author> Abiteboul </author> … <year> 1995 </year> </book> <book> <title> Foundations of DBs </title> <author> Abiteboul </author> … <year> 1995 </year> <price> 55 </price> <currency> USD </currency> </book> attributes are alternative ways to represent data

Comparison Elements Attributes Ordered Unordered May be repeated Must be unique May be nested Must be atomic

XML v.s. HTML What are the differences between XML and HTML ? In class HTML may be non-well formed: e.g. <br> without </br>. Better: <br/>; XML must be well formed HTML has semantics: <br> means newline, <i> means italic etc. XML has no semantics

More XML: Oids and References <person id=“o555”> <name> Jane </name> </person> <person id=“o456”> <name> Mary </name> <mother idref=“o555”/> Are just keys/ foreign keys design by someone who didn’t take 444 Don’t use them: use your own foreign keys instead. oids and references in XML are just syntax

More XML: CDATA Section Syntax: <![CDATA[ .....any text here...]]> Example: <example> <![CDATA[ some text here </notAtag> <>]]> </example>

More XML: Entity References Syntax: &entityname; Example: <element> this is less than < </element> Some entities: < > & & &apos; ‘ " “ & Unicode char

More XML: Processing Instructions Syntax: <?target argument?> Example: What do they mean ? <product> <name> Alarm Clock </name> <?ringBell 20?> <price> 19.99 </price> </product>

More XML: Comments Syntax <!-- .... Comment text... --> Yes, they are part of the data model !!!

Means nothing as URL; just a unique name XML Namespaces name ::= [prefix:]localpart <book xmlns:isbn=“www.isbn-org.org/def”> <title> … </title> <number> 15 </number> <isbn:number> …. </isbn:number> </book> Means nothing as URL; just a unique name

Belong to this namespace XML Namespaces syntactic: <number> , <isbn:number> semantic: provide URL for schema <tag xmlns:mystyle = “http://…”> … <mystyle:title> … </mystyle:title> <mystyle:number> … </tag> Belong to this namespace

XML Semantics: a Tree ! Order matters !!! <data> Element node Text node Attribute node data <data> <person id=“o555” > <name> Mary </name> <address> <street>Maple</street> <no> 345 </no> <city> Seattle </city> </address> </person> <person> <name> John </name> <address>Thailand <phone>23456</phone> </data> person person id address name address name phone o555 street no city Mary Thai John 23456 Maple 345 Seattle Order matters !!!

XML Data XML is self-describing Schema elements become part of the data Reational schema: persons(name,phone) In XML <persons>, <name>, <phone> are part of the data, and are repeated many times Consequence: XML is much more flexible XML = semistructured data

Mapping Relational Data to XML Data The canonical mapping: persons row row row Persons phone name phone name phone name “John” 3634 “Sue” 6343 “Dick” 6363 Name Phone John 3634 Sue 6343 Dick 6363 <persons> <row> <name>John</name> <phone> 3634</phone></row> <row> <name>Sue</name> <phone> 6343</phone> <row> <name>Dick</name> <phone> 6363</phone></row> </persons>

Mapping Relational Data to XML Data Application specific mapping <persons> <person> <name> John </name> <phone> 3634 </phone> <order> <date> 2002 </date> <product> Gizmo </product> </order> <order> <date> 2004 </date> <product> Gadget </product> </person> <name> Sue </name> <phone> 6343 </phone> <order> <date> 2004 </date> </persons> Persons Name Phone John 3634 Sue 6343 Orders PersonName Date Product John 2002 Gizmo 2004 Gadget Sue

XML is Semi-structured Data Missing attributes: Could represent in a table with nulls <person> <name> John</name> <phone>1234</phone> </person> <person> <name>Joe</name> no phone ! name phone John 1234 Joe -

XML is Semi-structured Data Repeated attributes Impossible in tables: <person> <name> Mary</name> <phone>2345</phone> <phone>3456</phone> </person> Two phones ! name phone Mary 2345 3456 ???

XML is Semi-structured Data Attributes with different types in different objects Nested collections (no 1NF) Heterogeneous collections: <db> contains both <book>s and <publisher>s <person> <name> <first> John </first> <last> Smith </last> </name> <phone>1234</phone> </person> Structured name !

Document Type Definitions DTD part of the original XML specification an XML document may have a DTD XML document: Well-formed = if tags are correctly closed Valid = if it has a DTD and conforms to it validation is useful in data exchange

DTD Goals: Define what tags and attributes are allowed Define how they are nested Define how they are ordered Superseded by XML Schema Very complex: DTDs still used widely

Very Simple DTD <!DOCTYPE company [ <!ELEMENT company ((person|product)*)> <!ELEMENT person (ssn, name, office, phone?)> <!ELEMENT ssn (#PCDATA)> <!ELEMENT name (#PCDATA)> <!ELEMENT office (#PCDATA)> <!ELEMENT phone (#PCDATA)> <!ELEMENT product (pid, name, description?)> <!ELEMENT pid (#PCDATA)> <!ELEMENT description (#PCDATA)> ]>

Very Simple DTD Example of valid XML document: <company> <person> <ssn> 123456789 </ssn> <name> John </name> <office> B432 </office> <phone> 1234 </phone> </person> <person> <ssn> 987654321 </ssn> <name> Jim </name> <office> B123 </office> <product> ... </product> ... </company>

DTD: The Content Model <!ELEMENT tag (CONTENT)> Content model: Complex = a regular expression over other elements Text-only = #PCDATA Empty = EMPTY Any = ANY Mixed content = (#PCDATA | A | B | C)* content model

DTD: Regular Expressions XML sequence <!ELEMENT name (firstName, lastName))> <name> <firstName> . . . . . </firstName> <lastName> . . . . . </lastName> </name> optional <!ELEMENT name (firstName?, lastName))> <person> <name> . . . . . </name> <phone> . . . . . </phone> . . . . . . </person> Kleene star <!ELEMENT person (name, phone*))> alternation <!ELEMENT person (name, (phone|email)))>