Lecture 08: XML and Semistructured Data

1 Lecture 08: XML and Semistructured Data

2 Outline
XML (Section 17)
– XML syntax, semistructured data
– Document Type Definitions (DTDs)
XPath

3 Additional Readings on XML
XML – – – – (1/99)
XPath –
XQuery – –
Main source: (but hard to read)

4 XML
eXtensible Markup Language
XML 1.0 – a recommendation from W3C, 1998
Roots: SGML (used in publishing)
After the roots: a format for sharing data

5 XML Data
Relational data does not have a syntax
– I can’t “give” you my relational database
– Need to import it from another syntax, like CSV (comma-separated values)
XML = rich syntax for data
– But XML is not relational: semistructured
Usage:
– Map any data to XML
– Store it in files, exchange on the Web, etc.
– Even query it directly, using XPath, XQuery

6 XML Data Sharing and Exchange
[Diagram: applications, relational data, object-relational data, and legacy data exchange XML data over the Web (HTTP); operations shown: Transform, Integrate, Warehouse]
Specific data management tasks

7 From HTML to XML
HTML describes the layout

8 HTML
Bibliography
Foundations of Databases
Abiteboul, Hull, Vianu
Addison Wesley, 1995
Data on the Web
Abiteboul, Buneman, Suciu
Morgan Kaufmann, 1999

9 XML
Foundations… Abiteboul Hull Vianu Addison Wesley 1995 …
XML describes the structure

10 XML Terminology
tags: book, title, author, …
start tag: <book>, end tag: </book>
elements: <book>…</book>, <author>…</author>
elements are nested
empty element: <book></book>, abbreviated <book/>
an XML document is well formed if
– it has matching tags
– tags are properly nested
– it has a single root element
and more constraints, e.g. on names
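
The well-formedness conditions on this slide can be checked mechanically, since any XML parser rejects a document that violates them. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed` and the sample documents are illustrative assumptions, not part of the lecture.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical documents, loosely modelled on the bibliography example.
good = "<book><title>Foundations of Databases</title><author>Abiteboul</author></book>"
badly_nested = "<book><title>Foundations of Databases</book></title>"  # tags cross
two_roots = "<book/><book/>"                                           # no single root element
empty_abbrev = "<book><note/></book>"                                  # <note/> abbreviates <note></note>

for label, doc in [
    ("good", good),
    ("badly nested", badly_nested),
    ("two roots", two_roots),
    ("empty element", empty_abbrev),
]:
    print(label, is_well_formed(doc))
```

Note that this only tests well-formedness. Checking validity against a DTD needs a validating parser, which the standard library does not provide.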

11 More XML: Attributes
Foundations of Databases Abiteboul … 1995
attributes are alternative ways to represent data

12 More XML: IDs and References
Jane Mary John
The scope of IDs and references is the document

13 More XML: CDATA Section
Syntax: <![CDATA[ … ]]>
Example: … <>]]>

14 More XML: Entity References
Syntax: &entityname;
Used like macros
Example: this is less than &lt;
Some predefined entities:
&lt;    <
&gt;    >
&amp;   &
&apos;  '
&quot;  "
&#nnn;  Unicode char
complete list:
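
To see what a parser does with CDATA sections and entity references, the sketch below parses one document of each kind and inspects the resulting text. It assumes Python's standard library; the two sample documents and the string passed to `escape` are made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical sample documents.
entity_doc = "<element>this is less than &lt;</element>"
cdata_doc = "<example><![CDATA[some text here </notAtag> <>]]></example>"

# The parser expands the entity reference into the character it stands for.
entity_text = ET.fromstring(entity_doc).text
print(entity_text)

# Inside a CDATA section, markup characters are taken literally.
cdata_text = ET.fromstring(cdata_doc).text
print(cdata_text)

# Going the other way: escape() replaces &, < and > with entity references
# so that arbitrary text can be embedded in element content.
escaped = escape("price < 60 & year > 1995")
print(escaped)
```

Either mechanism lets element content carry characters that would otherwise be read as markup. After parsing, the application sees the same plain text in both cases.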

15 More XML: Processing Instructions
Syntax: <?target argument?>
Example: Alarm Clock 19.99
Processed by external applications, e.g. php (bad style)

16 More XML: Comments
Syntax: <!-- comment text -->
Yes, they are part of the data model!
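
That comments and processing instructions belong to the data model can be observed with a DOM parser, which keeps them as nodes next to the elements. This is a small sketch with Python's `xml.dom.minidom`; the comment text and the `ringBell` instruction are hypothetical, and only "Alarm Clock" and "19.99" come from the slides.

```python
from xml.dom import minidom

# Hypothetical product document with a comment and a processing instruction.
doc = minidom.parseString(
    "<product>"
    "<!-- a hypothetical comment -->"
    "<name>Alarm Clock</name>"
    "<?ringBell 20?>"
    "<price>19.99</price>"
    "</product>"
)

# Record every child of the root element, in document order, with its node kind.
kinds = []
for node in doc.documentElement.childNodes:
    if node.nodeType == node.COMMENT_NODE:
        kinds.append(("comment", node.data))
    elif node.nodeType == node.PROCESSING_INSTRUCTION_NODE:
        kinds.append(("pi", node.target, node.data))
    elif node.nodeType == node.ELEMENT_NODE:
        kinds.append(("element", node.tagName))

for entry in kinds:
    print(entry)
```

Some APIs drop these nodes by default (for example `xml.etree.ElementTree`), so whether they survive a round trip depends on the tool.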

17 XML Data: a Tree!
Document text: Mary Maple 345 Seattle John Thailand
[Tree diagram with nodes: data, person, name, address, street, no, city, phone, id (o555); legend: element node, text node, attribute node]
Order matters!
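
The tree view can be made concrete by walking a parsed document and listing element, attribute, and text nodes in document order. The markup below is an assumed reconstruction: only the text values and the id `o555` appear in the transcript, so the exact nesting is a guess.

```python
import xml.etree.ElementTree as ET

# Assumed markup; the slide's original tags were lost in extraction.
DATA = """
<data>
  <person id="o555">
    <name>Mary</name>
    <address><street>Maple</street><no>345</no><city>Seattle</city></address>
  </person>
  <person>
    <name>John</name>
    <address>Thailand</address>
  </person>
</data>
"""


def walk(elem, depth=0, out=None):
    """Collect (depth, kind, value) triples in document order."""
    if out is None:
        out = []
    out.append((depth, "element", elem.tag))
    for key, value in elem.attrib.items():
        out.append((depth + 1, "attribute", f"{key}={value}"))
    if elem.text and elem.text.strip():
        out.append((depth + 1, "text", elem.text.strip()))
    for child in elem:  # children are visited in document order
        walk(child, depth + 1, out)
    return out


nodes = walk(ET.fromstring(DATA))
for depth, kind, value in nodes:
    print("  " * depth + f"{kind}: {value}")
```

Because children are stored as an ordered list, swapping the two `person` elements gives a different tree, which is the sense in which order matters.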

18 From Relational Data to XML Data
persons
name  phone
John  3634
Sue   6343
Dick  6363
XML: [tree diagram: persons → row → name, phone, with values “John” 3634, “Sue” 6343, “Dick” 6363]
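
The mapping from a table to XML is mechanical: one element for the table, one `row` element per tuple, and one child element per column. A minimal sketch, assuming Python's `xml.etree.ElementTree`; the function name `table_to_xml` is made up for this example.

```python
import xml.etree.ElementTree as ET

# The persons table from the slide.
rows = [("John", 3634), ("Sue", 6343), ("Dick", 6363)]


def table_to_xml(table_name, columns, rows):
    """Build <table_name> with one <row> per tuple and one child per column."""
    root = ET.Element(table_name)
    for values in rows:
        row = ET.SubElement(root, "row")
        for column, value in zip(columns, values):
            ET.SubElement(row, column).text = str(value)
    return root


persons = table_to_xml("persons", ["name", "phone"], rows)
xml_text = ET.tostring(persons, encoding="unicode")
print(xml_text)
```

The column names `name` and `phone` are repeated in every row of the output, which is the self-describing property discussed on the next slide.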

19 XML Data
XML is self-describing
Schema elements become part of the data
– Relational schema: persons(name,phone)
– In XML, <persons>, <name>, <phone> are part of the data, and are repeated many times
Consequence: XML is much more flexible
XML = semistructured data

20 Semistructured Data Explained
Missing attributes:
John 1234
Joe ← no phone!
Could represent in a table with nulls:
name  phone
John  1234
Joe   -

21 Semistructured Data Explained
Repeated attributes:
Mary ← two phones!
Impossible in tables:
name  phone
Mary  ???

22 Semistructured Data Explained
Attributes with different types in different objects:
John Smith 1234 ← structured name!
Nested collections (no 1NF)
Heterogeneous collections:
– … contains both …s and …s
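
The three irregularities from the last slides (a missing attribute, a repeated attribute, and a name that is plain text in one object but structured in another) can sit side by side in one document. The sketch below shows how application code might cope with them; Mary's phone numbers and the `first`/`last` element names are assumptions, since the transcript lost them.

```python
import xml.etree.ElementTree as ET

# Illustrative data; values not present in the transcript are assumptions.
DOC = """
<persons>
  <person><name>John</name><phone>1234</phone></person>
  <person><name>Joe</name></person>
  <person><name>Mary</name><phone>2345</phone><phone>3456</phone></person>
  <person><name><first>John</first><last>Smith</last></name><phone>1234</phone></person>
</persons>
"""


def describe(person):
    """Return (display name, list of phones) for one person element."""
    name = person.find("name")
    if len(name) == 0:
        # No child elements: the name is plain text.
        display = name.text
    else:
        # Structured name: join the text of its parts.
        display = " ".join(part.text for part in name)
    # findall returns zero, one, or many phones without special cases.
    phones = [p.text for p in person.findall("phone")]
    return display, phones


summary = [describe(p) for p in ET.fromstring(DOC)]
for display, phones in summary:
    print(display, phones)
```

A single flat table cannot hold these four records without nulls, extra rows, or a schema change, which is why the data is called semistructured.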

23 Document Type Definitions (DTD)
part of the original XML specification
an XML document may have a DTD
XML document:
– well-formed = if tags are correctly closed
– valid = if it has a DTD and conforms to it
validation is useful in data exchange

24 Very Simple DTD
<!DOCTYPE company [ … ]>

25 Very Simple DTD
Example of valid XML document: John B Jim B

26 DTD: The Content Model
Content model:
– Complex = a regular expression over other elements
– Text-only = #PCDATA
– Empty = EMPTY
– Any = ANY
– Mixed content = (#PCDATA | A | B | C)*

27 DTD: Regular Expressions
sequence: <!ELEMENT name (firstName, lastName)>
optional: <!ELEMENT name (firstName?, lastName)>
star (repeated occurrence): <!ELEMENT person (name, phone*)>
alternation: <!ELEMENT person (name, (phone| … ))>
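
Since a complex content model is a regular expression over child element names, conformance of one element can be tested by writing its children as a string and matching it. The sketch below does this with Python's `re` module for two of the declarations on this slide. It is not a DTD validator; the dictionary `CONTENT_MODELS`, the function names, and the sample documents are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Content models rewritten as regexes over child tag names.
# Each tag is followed by a space so that names cannot run together.
CONTENT_MODELS = {
    "name": re.compile(r"(firstName )?lastName "),  # (firstName?, lastName)
    "person": re.compile(r"name (phone )*"),        # (name, phone*)
}


def children_match(elem):
    """Check one element's child sequence against its content model."""
    model = CONTENT_MODELS.get(elem.tag)
    if model is None:
        # Simplification: a real DTD would reject undeclared elements.
        return True
    sequence = "".join(child.tag + " " for child in elem)
    return model.fullmatch(sequence) is not None


def conforms(elem):
    """Check the element and, recursively, all of its descendants."""
    return children_match(elem) and all(conforms(child) for child in elem)


# Hypothetical documents.
ok = ET.fromstring(
    "<person><name><lastName>Hull</lastName></name>"
    "<phone>1</phone><phone>2</phone></person>"
)
bad = ET.fromstring(
    "<person><phone>1</phone><name><lastName>Hull</lastName></name></person>"
)

print(conforms(ok), conforms(bad))
```

The second document fails because `phone` comes before `name`, so the child sequence does not match `(name, phone*)`.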

28 DTD: Attributes
[Two-column example, Document Type Definition and Document, not preserved]
Attribute declarations illustrated: mandatory, optional, default, enumeration

29 DTD: Entities
DTD:
– internal entity: <!ENTITY name "Tim Berners Lee">
– external entity: <!ENTITY address SYSTEM "…">
Document: &name; &address;

30 Inclusion of DTD in Documents
Three forms: External DTD Declaration, Internal DTD Declaration, Mixed usage
"test" is a document element
Example fragments: … ]> &hello; … ]>

31 XML Namespaces
Different DTDs can use the same names!
– how to avoid conflicts when combining names from different DTDs?
An XML namespace is a collection of names (markup vocabulary)
– identified by a URI reference, which the document binds to a prefix

32 XML Namespaces
name ::= [prefix:]localname
<book xmlns='urn:loc.gov:book' xmlns:isbn=' … 15 ….
xmlns='urn:loc.gov:book' declares the default namespace
unprefixed element names belong to the default namespace

33 XML Namespaces
syntactic: …, semantic: …
URL used as unique identifier
– URL may not exist, has no function
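
How prefixes resolve to namespace identifiers becomes visible once a document is parsed: the parser replaces each prefix with the URI it is bound to. The sketch below uses Python's `xml.etree.ElementTree`. The default namespace `urn:loc.gov:book` is taken from the slide, but the `isbn` URI is truncated there, so `urn:example:isbn` and the element contents are stand-ins.

```python
import xml.etree.ElementTree as ET

# 'urn:example:isbn' and the element contents are placeholders.
DOC = """
<book xmlns='urn:loc.gov:book' xmlns:isbn='urn:example:isbn'>
  <title>A Title</title>
  <isbn:number>15</isbn:number>
</book>
"""

root = ET.fromstring(DOC)

# After parsing, every tag carries its namespace URI in braces;
# the prefixes used in the source text are gone.
tags = [root.tag] + [child.tag for child in root]
print(tags)

# Queries may use any local prefixes, as long as they map to the same URIs.
ns = {"b": "urn:loc.gov:book", "i": "urn:example:isbn"}
title = root.find("b:title", ns).text
number = root.find("i:number", ns).text
print(title, number)
```

The query prefixes `b` and `i` differ from those in the document and the lookup still succeeds, which illustrates that only the identifier matters and that nothing is ever fetched from it.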

34 Querying XML Data
XPath = simple navigation through the tree
XQuery = the SQL of XML
XSLT = recursive traversal
– will not discuss
XQuery and XSLT build on XPath

35 Sample Data for Queries
Addison-Wesley; Serge Abiteboul, Rick Hull, Victor Vianu; Foundations of Databases; 1995
Freeman; Jeffrey D. Ullman; Principles of Database and Knowledge Base Systems; 1998

36 Data Model for XPath
[Tree diagram: the root → bib (the root element) → book → publisher (Addison-Wesley), author (Serge Abiteboul), …]

37 XPath: Simple Expressions
/bib/book/year
Result: 1995 1998
/bib/paper/year
Result: empty (there were no papers)

38 XPath: Restricted Kleene Closure
//author
Result: Serge Abiteboul, Rick Hull, Victor Vianu, Jeffrey D. Ullman
/bib//first-name
Result: Rick

39 XPath: Text Nodes
/bib/book/author/text()
Result: Serge Abiteboul, Victor Vianu, Jeffrey D. Ullman
Rick Hull doesn’t appear because he has firstname, lastname
Functions in XPath:
– text() = matches the text value
– node() = matches any node (= * or text())
– name() = returns the name of the current tag

40 XPath: Wildcard
//author/*
Result: Rick Hull
* matches any element
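
The path expressions from the last few slides can be tried out with Python's `xml.etree.ElementTree`, which implements a subset of XPath. Two caveats apply to this sketch. The markup of the sample data is an assumed reconstruction, because the transcript kept only the text values. And ElementTree evaluates paths relative to the root element, so `/bib/book/year` is written `book/year` and `//author` is written `.//author`.

```python
import xml.etree.ElementTree as ET

# Assumed reconstruction of the sample data; tag names follow the queries.
BIB = """
<bib>
  <book>
    <publisher>Addison-Wesley</publisher>
    <author>Serge Abiteboul</author>
    <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
    <author>Victor Vianu</author>
    <title>Foundations of Databases</title>
    <year>1995</year>
  </book>
  <book>
    <publisher>Freeman</publisher>
    <author>Jeffrey D. Ullman</author>
    <title>Principles of Database and Knowledge Base Systems</title>
    <year>1998</year>
  </book>
</bib>
"""
bib = ET.fromstring(BIB)

years = [y.text for y in bib.findall("book/year")]            # /bib/book/year
paper_years = bib.findall("paper/year")                       # /bib/paper/year
authors = bib.findall(".//author")                            # //author
first_names = [f.text for f in bib.findall(".//first-name")]  # /bib//first-name
wildcard = [e.tag for e in bib.findall(".//author/*")]        # //author/*

# ElementTree has no text() step, so /bib/book/author/text() is emulated by
# keeping only the authors that directly contain non-blank text.
author_texts = [
    a.text.strip()
    for a in bib.findall("book/author")
    if a.text and a.text.strip()
]

print(years, len(paper_years), len(authors))
print(first_names, wildcard, author_texts)
```

A full XPath 1.0 engine would accept the expressions exactly as written on the slides; the rewriting here is specific to ElementTree.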

41 XPath: Attribute Nodes
/bib/book/@price
Result: …
@price means that price has to be an attribute

42 XPath: Predicates
/bib/book/author[firstname]
Result: Rick Hull

43 XPath: More Predicates
/bib/book/author[firstname][address[.//zip][city]]/lastname
Result: … …

44 XPath: More Predicates
… < "60"]
… < "25"]
/bib/book[author/text()]
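
Predicates and attribute tests can also be approximated in Python's `xml.etree.ElementTree`, within the limits of its XPath subset. In this sketch the sample markup is again an assumed reconstruction, the attribute value `price="55"` is hypothetical, and the element names `first-name`/`last-name` follow the `/bib//first-name` query rather than the `firstname` spelling used in the predicate slides.

```python
import xml.etree.ElementTree as ET

# Assumed sample data; the price value is hypothetical.
BIB = """
<bib>
  <book>
    <author>Serge Abiteboul</author>
    <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
    <title>Foundations of Databases</title>
  </book>
  <book price="55">
    <author>Jeffrey D. Ullman</author>
    <title>Principles of Database and Knowledge Base Systems</title>
  </book>
</bib>
"""
bib = ET.fromstring(BIB)

# An existence predicate: authors that have a first-name child.
structured = bib.findall("book/author[first-name]")
last_names = [a.find("last-name").text for a in structured]

# ElementTree cannot return attribute nodes, so an attribute step is split
# into an [@price] filter followed by a lookup on each matching element.
prices = [b.get("price") for b in bib.findall("book[@price]")]

# Comparison predicates with < are outside ElementTree's subset,
# so the comparison is done in Python after the attribute filter.
cheap_titles = [
    b.find("title").text
    for b in bib.findall("book[@price]")
    if float(b.get("price")) < 60
]

print(last_names, prices, cheap_titles)
```

The first book has no `price` attribute, so it is excluded by `[@price]` before any comparison takes place.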

45 XPath: Summary
bib              matches a bib element
*                matches any element
/                matches the root element
/bib             matches a bib element under root
bib/paper        matches a paper in bib
bib//paper       matches a paper in bib, at any depth
//paper          matches a paper at any depth
paper|book       matches a paper or a book
@price           matches a price attribute
bib/book/@price  matches a price attribute in book, in bib