Lecture 9: XML Monday, October 17, 2005.

Slides:



Advertisements
Similar presentations
XML, XML Schema, Xpath and XQuery Slides collated from various sources, many from Dan Suciu at Univ. of Washington.
Advertisements

XML Document Type Definitions ( DTD ). 1.Introduction to DTD An XML document may have an optional DTD, which defines the document’s grammar. Since the.
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 311 Database Systems I The Semistructured Data Model.
Database Management Systems, R. Ramakrishnan1 Introduction to Semistructured Data and XML Chapter 27, Part D Based on slides by Dan Suciu University of.
Agenda from now on Done: SQL, views, transactions, conceptual modeling, E/R, relational algebra. Starting: XML To do: the database engine: –Storage –Query.
CSE 636 Data Integration XML Semistructured Data Document Type Definitions.
1 Lecture 10 XML Wednesday, October 18, XML Outline XML (4.6, 4.7) –Syntax –Semistructured data –DTDs.
1 Lecture 10: Database Design XML Wednesday, October 20, 2004.
1 COS 425: Database and Information Management Systems XML and information exchange.
Managing XML and Semistructured Data
1 Introduction to XML Yanlei Diao UMass Amherst April 19, 2007 Slides Courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives and Gerome Miklau.
XML Introduction What is XML –XML is the eXtensible Markup Language –Became a W3C Recommendation in 1998 –Tag-based syntax, like HTML –You get to make.
XML and Databases 198:541. XML Motivation  Huge amounts of unstructured data on the web: HTML documents  No structure information  Only format instructions.
End of SQL XML April 22 th, Null Values If x=Null then 4*(3-x)/7 is still NULL If x=Null then x=“Joe” is UNKNOWN Three boolean values: –FALSE =
XML, XML Schema, XPath and XQuery Query Languages CS561 Slides collated from several sources, including D. Suciu at Univ. of Washington.
1 Lecture 08: XML and Semistructured Data. 2 Outline XML (Section 17) –XML syntax, semistructured data –Document Type Definitions (DTDs) XPath.
Managing XML and Semistructured Data Lecture 2: XML Prof. Dan Suciu Spring 2001.
1 Lecture 08: XML and Semistructured Data. 2 Outline XML (Section 17) –XML syntax, semistructured data –Document Type Definitions (DTDs) XPath.
Introduction to XML This material is based heavily on the tutorial by the same name at
1 Advanced Topics XML and Databases. 2 XML u Overview u Structure of XML Data –XML Document Type Definition DTD –Namespaces –XML Schema u Query and Transformation.
XML – a data sharing standard DSC340 Mike Pangburn.
4/20/2017.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 XML Taken from Chapter 7.
XML by Dan Suciu 1 Introduction to Semistructured Data and XML Based on slides by Dan Suciu University of Washington.
XML and XPath. Web Services: XML+XPath2 EXtensible Markup Language (XML) a W3C standard to complement HTML A markup language much like HTML origins: structured.
Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives.
XML CPSC 315 – Programming Studio Fall 2008 Project 3, Lecture 1.
1 © Netskills Quality Internet Training, University of Newcastle Introducing XML © Netskills, Quality Internet Training University.
 XML is designed to describe data and to focus on what data is. HTML is designed to display data and to focus on how data looks.  XML is created to structure,
1 Chapter 10: XML What is XML What is XML Basic Components of XML Basic Components of XML XPath XPath XQuery XQuery.
Of 33 lecture 3: xml and xml schema. of 33 XML, RDF, RDF Schema overview XML – simple introduction and XML Schema RDF – basics, language RDF Schema –
Lecture 6: XML Query Languages Thursday, January 18, 2001.
Lecture 5: XML Tuesday, January 16, Outline XML, DTDs (Data on the Web, 3.1) Semistructured data in XML (3.2) Exporting Relational Data in XML (8.3.1)
An Introduction to XML Sandeep Bhattaram
XML Introduction. What is XML? XML stands for eXtensible Markup Language XML stands for eXtensible Markup Language XML is a markup language much like.
1 XML eXtensible Markup Language. 2 XML vs. HTML HTML is a HyperText Markup language HTML is a HyperText Markup language Designed for a specific application,
1 Introduction to Semistructured Data and XML. 2 How the Web is Today  HTML documents often generated by applications consumed by humans only easy access:
More XML: semantics, DTDs, XPATH February 18, 2004.
Management of XML and Semistructured Data Lecture 10: Schemas Monday, April 30, 2001.
The Semistructured-Data Model Programming Languages for XML Spring 2011 Instructor: Hassan Khosravi.
CS 157B: Database Management Systems II February 11 Class Meeting Department of Computer Science San Jose State University Spring 2013 Instructor: Ron.
1 Indexing The syntax for creating a index is: CREATE [UNIQUE] INDEX index_name ON table_name (column1, column2,... column_n) [ COMPUTE STATISTICS ]; Why.
XML e X tensible M arkup L anguage (XML) By: Albert Beng Kiat Tan Ayzer Mungan Edwin Hendriadi.
COMP9321 Web Application Engineering Semester 2, 2015 Dr. Amin Beheshti Service Oriented Computing Group, CSE, UNSW Australia Week 4 1COMP9321, 15s2, Week.
+ 1 XML eXtensible Markup Language. + 2 XML Lecture Adapted from the work of Dr. Praveen Madiraju of Marquette University.
SEMI-STRUCTURED DATA (XML) 1. SEMI-STRUCTURED DATA ER, Relational, ODL data models are all based on schema Structure of data is rigid and known is advance.
XML – Basic Concepts (modified version from Dr. Praveen Madiraju) 2015, Fall Pusan National University Ki-Joune Li.
XML Notes taken from w3schools. What is XML? XML stands for EXtensible Markup Language. XML was designed to store and transport data. XML was designed.
XML Databases Presented By: Pardeep MT15042 Anurag Goel MT15006.
Extensible Markup Language (XML) Pat Morin COMP 2405.
Lecture 14: Relational Algebra Projects XML?
XML: Extensible Markup Language
Lecture 10 XML Monday, Oct. 21, 2001.
XML QUESTIONS AND ANSWERS
Management of XML and Semistructured Data
Management of XML and Semistructured Data
Managing XML and Semistructured Data
Lecture 11 XML Wednesday, Oct. 24, 2001.
7. מבוא למסמכי XML ו-DTD (מבוסס על שקפים של אלדר פישר)
eXtensible Markup Language (XML)
Lecture 12: XML, XPath, XQuery
Semi-Structured data (XML Data MODEL)
Alin Deutsch, University of Pennsylvania Mary Mernandez, AT&T Labs
eXtensible Markup Language
CSE 544: Lecture 5 XML 4/15/2002.
Lecture 8: XML Data Wednesday, October
Allyson Falkner Spokane County ISD
Introduction to Database Systems CSE 444 Lecture 10 XML
Semi-Structured data (XML)
Lecture 11: XML and Semistructured Data
Presentation transcript:

Lecture 9: XML Monday, October 17, 2005

XML Outline XML (4.6, 4.7) This lecture: syntax, semistructured data Next lectures: DTDs, XPath, XQuery

Additional Readings on XML Main source: www.w3.org (but hard to read) http://www.w3.org/XML/ Strongly recommend readings: http://www.w3.org/XML/1999/XML-in-10-points www.zvon.org/xxl/XMLTutorial/General/book_en.html For XPath and XQuery (next lectures): http://www.galaxquery.org/

XML eXtensible Markup Language XML 1.0 – a recommendation from W3C, 1998 Roots: SGML (a very nasty language). After the roots: a format for sharing data

XML Data Relational data does not have a syntax I can’t “give” you my relational database Need to import it from other other syntax, like CSV (comma-separated-values) XML = rich syntax for data But XML is not relational: semistructured Usage: Map any data to XML Store it in files, exchange on the Web, etc. Even query it directly, using XPath, XQuery

From HTML to XML HTML describes the presentation

HTML <h1> Bibliography </h1> <p> <i> Foundations of Databases </i> Abiteboul, Hull, Vianu <br> Addison Wesley, 1995 <p> <i> Data on the Web </i> Abiteoul, Buneman, Suciu <br> Morgan Kaufmann, 1999

XML Syntax XML describes the content <bibliography> <book> <title> Foundations… </title> <author> Abiteboul </author> <author> Hull </author> <author> Vianu </author> <publisher> Addison Wesley </publisher> <year> 1995 </year> </book> … </bibliography> XML describes the content

XML Terminology tags: book, title, author, … start tag: <book>, end tag: </book> elements: <book>…</book>,<author>…</author> elements are nested empty element: <red></red> abbrv. <red/> an XML document: single root element well formed XML document: if it has matching tags

More XML: Attributes <book price = “55” currency = “USD”> <title> Foundations of Databases </title> <author> Abiteboul </author> … <year> 1995 </year> </book>

Attributes v.s. Elements <book price = “55” currency = “USD”> <title> Foundations of DBs </title> <author> Abiteboul </author> … <year> 1995 </year> </book> <book> <title> Foundations of DBs </title> <author> Abiteboul </author> … <year> 1995 </year> <price> 55 </price> <currency> USD </currency> </book> attributes are alternative ways to represent data

Comparison Elements Attributes Ordered Unordered May be repeated Must be unique May be nested Must be atomic

More XML: Oids and References <person id=“o555”> <name> Jane </name> </person> <person id=“o456”> <name> Mary </name> <mother idref=“o555”/> Are just keys/ foreign keys design by someone who didn’t take 444 Don’t use them: use your onwn foreign keys instead. oids and references in XML are just syntax

More XML: CDATA Section Syntax: <![CDATA[ .....any text here...]]> Example: <example> <![CDATA[ some text here </notAtag> <>]]> </example>

More XML: Entity References Syntax: &entityname; Example: <element> this is less than < </element> Some entities: < > & & &apos; ‘ " “ & Unicode char

More XML: Processing Instructions Syntax: <?target argument?> Example: What do they mean ? <product> <name> Alarm Clock </name> <?ringBell 20?> <price> 19.99 </price> </product>

More XML: Comments Syntax <!-- .... Comment text... --> Yes, they are part of the data model !!!

Means nothing as URL; just a unique name XML Namespaces name ::= [prefix:]localpart <book xmlns:isbn=“www.isbn-org.org/def”> <title> … </title> <number> 15 </number> <isbn:number> …. </isbn:number> </book> Means nothing as URL; just a unique name

Belong to this namespace XML Namespaces syntactic: <number> , <isbn:number> semantic: provide URL for schema <tag xmlns:mystyle = “http://…”> … <mystyle:title> … </mystyle:title> <mystyle:number> … </tag> Belong to this namespace

XML Semantics: a Tree ! Order matters !!! <data> Element node Text node Attribute node data <data> <person id=“o555” > <name> Mary </name> <address> <street>Maple</street> <no> 345 </no> <city> Seattle </city> </address> </person> <person> <name> John </name> <address>Thailand <phone>23456</phone> </data> person person id address name address name phone o555 street no city Mary Thai John 23456 Maple 345 Seattle Order matters !!!

XML Data XML is self-describing Schema elements become part of the data Reational schema: persons(name,phone) In XML <persons>, <name>, <phone> are part of the data, and are repeated many times Consequence: XML is much more flexible XML = semistructured data

Mapping Relational Data to XML Data The canonical mapping: persons row row row Persons phone name phone name phone name “John” 3634 “Sue” 6343 “Dick” 6363 Name Phone John 3634 Sue 6343 Dick 6363 <persons> <row> <name>John</name> <phone> 3634</phone></row> <row> <name>Sue</name> <phone> 6343</phone> <row> <name>Dick</name> <phone> 6363</phone></row> </persons>

Mapping Relational Data to XML Data Application specific mapping <persons> <person> <name> John </name> <phone> 3634 </phone> <order> <date> 2002 </date> <product> Gizmo </product> </order> <order> <date> 2004 </date> <product> Gadget </product> </person> <name> Sue </name> <phone> 6343 </phone> <order> <date> 2004 </date> </persons> Persons Name Phone John 3634 Sue 6343 Orders PersonName Date Product John 2002 Gizmo 2004 Gadget Sue

XML is Semi-structured Data Missing attributes: Could represent in a table with nulls <person> <name> John</name> <phone>1234</phone> </person> <person> <name>Joe</name> no phone ! name phone John 1234 Joe -

XML is Semi-structured Data Repeated attributes Impossible in tables: <person> <name> Mary</name> <phone>2345</phone> <phone>3456</phone> </person> Two phones ! name phone Mary 2345 3456 ???

XML is Semi-structured Data Attributes with different types in different objects Nested collections (no 1NF) Heterogeneous collections: <db> contains both <book>s and <publisher>s <person> <name> <first> John </first> <last> Smith </last> </name> <phone>1234</phone> </person> Structured name !

Document Type Definitions DTD part of the original XML specification an XML document may have a DTD XML document: well-formed = if tags are correctly closed Valid = if it has a DTD and conforms to it validation is useful in data exchange

Very Simple DTD <!DOCTYPE company [ <!ELEMENT company ((person|product)*)> <!ELEMENT person (ssn, name, office, phone?)> <!ELEMENT ssn (#PCDATA)> <!ELEMENT name (#PCDATA)> <!ELEMENT office (#PCDATA)> <!ELEMENT phone (#PCDATA)> <!ELEMENT product (pid, name, description?)> <!ELEMENT pid (#PCDATA)> <!ELEMENT description (#PCDATA)> ]>

Very Simple DTD Example of valid XML document: <company> <person> <ssn> 123456789 </ssn> <name> John </name> <office> B432 </office> <phone> 1234 </phone> </person> <person> <ssn> 987654321 </ssn> <name> Jim </name> <office> B123 </office> <product> ... </product> ... </company>

DTD: The Content Model <!ELEMENT tag (CONTENT)> Content model: Complex = a regular expression over other elements Text-only = #PCDATA Empty = EMPTY Any = ANY Mixed content = (#PCDATA | A | B | C)* content model

DTD: Regular Expressions XML sequence <!ELEMENT name (firstName, lastName)) <name> <firstName> . . . . . </firstName> <lastName> . . . . . </lastName> </name> optional <!ELEMENT name (firstName?, lastName)) <person> <name> . . . . . </name> <phone> . . . . . </phone> . . . . . . </person> Kleene star <!ELEMENT person (name, phone*)) alternation <!ELEMENT person (name, (phone|email)))