CSE 636 Data Integration XML Semistructured Data Document Type Definitions.

Presentation transcript:

CSE 636 Data Integration XML Semistructured Data Document Type Definitions

2 Semistructured Data
Another data model, based on trees
Motivation: flexible representation of data
–Often, data comes from multiple sources with differences in notation, meaning, etc.
Motivation: sharing of documents among systems and databases

3 Graphs of Semistructured Data
Nodes = objects
Labels on arcs (attributes, relationships)
Atomic values at leaf nodes (nodes with no arcs out)
Flexibility: no restriction on:
–Labels out of a node
–Number of successors with a given label

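4 Example: Data Graph
[Figure: a data graph starting at the node "root", with arcs labeled beer, bar, manf, servedAt, name, addr, prize, year and award, and leaf values Bud, M’lob, A.B., Gold, 1995, Joe’s and Maple. Callouts: "The bar object for Joe’s Bar" and "The beer object for Bud".]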

5 XML
HTML
Uses tags for formatting the presentation (e.g., “italic”)
Hard for applications to process
XML = Extensible Markup Language
Uses tags for semantics (e.g., “this is an address”)
–Similar to labels in semistructured data
Allows you to invent your own tags
Easy for applications to process

6 HTML → XML
HTML:
Bibliography
Foundations of Databases, Abiteboul, Hull, Vianu, Addison Wesley, 1995
Data on the Web, Abiteboul, Buneman, Suciu, Morgan Kaufmann, 1999
XML (one tagged field per value):
Foundations of Databases
Abiteboul
Hull
Vianu
Addison Wesley
1995
…

7 Why XML is of Interest to Us
XML is just syntax for data
–Note: we have no syntax for relational data
–But XML is not relational: semistructured
This is exciting because:
–Can translate any data to XML
–Can ship XML over the Web (HTTP, SOAP)
–Can input XML into any application
–Thus: data sharing and exchange on the Web

8 XML Data Sharing and Exchange
[Figure: applications, a relational DB, a web site, a web service and an XML DB exchange XML data over the Web (HTTP, SOAP). Operations shown on the XML data: Transform, Integrate, Warehouse.]

9 XML Tags & Elements
Tags: book, title, author, …
–XML tags are case sensitive
Tags, as in HTML, are normally matched pairs
–<book> … </book>
–Start tag: <book>, End tag: </book>
Elements: everything between tags
–Example 1: <title>Foundations of Databases</title>
–Example 2: <book> <title>Foundations of Databases</title> </book>
Elements may be nested arbitrarily
Empty element: a start tag immediately followed by its end tag
–Abbreviation: a single tag ending in />
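The vocabulary on this slide can be tried out with a short Python sketch using the standard library's xml.etree.ElementTree. The document string is invented for illustration: book, title and author come from the slide, while the empty note element is a hypothetical addition.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: book/title/author follow the slide, <note/> is made up.
doc = (
    "<book>"
    "<title>Foundations of Databases</title>"
    "<author>Abiteboul</author>"
    "<note/>"
    "</book>"
)

book = ET.fromstring(doc)

# Nested elements, in document order.
child_tags = [child.tag for child in book]

# The text between <title> and </title>.
title_text = book.find("title").text

# <note/> abbreviates <note></note>: no text and no children.
note = book.find("note")
note_is_empty = note.text is None and len(note) == 0

# Tags are case sensitive: <Title> is not closed by </title>.
try:
    ET.fromstring("<Title>x</title>")
    case_mismatch_rejected = False
except ET.ParseError:
    case_mismatch_rejected = True
```

The parser rejects the mismatched pair because, unlike HTML, XML treats Title and title as different names.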

10 XML Attributes
[Example: a book element (Foundations of Databases, Abiteboul, …, 1995) carrying attributes on its start tag]
Attributes are alternative ways to represent data

11 Replacing Attributes with Elements
[Example: the same book (Foundations of Databases, Abiteboul, …) with the former attribute values, such as USD, moved into child elements]
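A rough sketch of the attribute-to-element rewrite, in Python with xml.etree.ElementTree. The attribute names price and currency and the value 55 are assumptions made for this example, and attributes_to_elements is a hypothetical helper, not part of any library.

```python
import xml.etree.ElementTree as ET

# Assumed input: the attribute names and the price value are illustrative only.
with_attributes = ET.fromstring(
    '<book price="55" currency="USD">'
    "<title>Foundations of Databases</title>"
    "</book>"
)

def attributes_to_elements(elem):
    """Build a new element in which each attribute of elem is a child element."""
    result = ET.Element(elem.tag)
    for child in elem:
        result.append(child)          # existing children are kept as they are
    for name, value in elem.attrib.items():
        ET.SubElement(result, name).text = value
    return result

with_elements = attributes_to_elements(with_attributes)
print(ET.tostring(with_elements, encoding="unicode"))
```

Both forms carry the same information. The element form can later grow substructure (for example several prices), which an attribute value cannot.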

12 Elements vs. Attributes
Too many attributes make documents hard to read
Attributes do not specify document structure
Attributes are good for simple information

13 More XML: CDATA Section
Syntax: <![CDATA[ … any text here … ]]>
Example: <![CDATA[ … <>]]>

14 More XML: Entity References
Syntax: &entityname;
Example: this is less than &lt;
Some entities:
&lt; = <
&gt; = >
&amp; = &
&apos; = '
&quot; = "
&#38; = Unicode char (here &)
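CDATA sections and entity references are two ways of writing the same character data. The sketch below, which uses Python's xml.etree.ElementTree and an invented example element, parses both spellings and compares the resulting text.

```python
import xml.etree.ElementTree as ET

# Same character data, written once with entity references ...
escaped = ET.fromstring(
    "<example>this is less than &lt; and &#38; is an ampersand</example>"
)

# ... and once inside a CDATA section, where < and & need no escaping.
cdata = ET.fromstring(
    "<example><![CDATA[this is less than < and & is an ampersand]]></example>"
)

same_text = escaped.text == cdata.text

# When the tree is written back out, the serializer escapes the special characters.
serialized = ET.tostring(cdata, encoding="unicode")
print(serialized)
```

After parsing, an application sees only the text, so it cannot tell which of the two spellings the author used.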

15 More XML: Comments
Syntax: <!-- … comment text … -->
Yes, they are part of the data model !!!

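16 XML Semantics: a Tree !
[Figure: an XML document next to its tree. The root element data has person children. One person has name Mary and an address with street Maple, no 345 and city Seattle; the other has name John and address Thailand (drawn as "Thai"). The tree also shows nodes labeled phone and age (25). Legend: element node, text node, attribute node.]
Order matters!!!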
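To make the tree view concrete, the following Python sketch walks a document shaped like the one on this slide and lists its element, attribute and text nodes in document order. The document is a reconstruction for illustration only; in particular the id attribute and its value are hypothetical.

```python
import xml.etree.ElementTree as ET

# Reconstruction for illustration; the id attribute is a hypothetical addition.
doc = """<data>
  <person id="p1">
    <name>Mary</name>
    <address><street>Maple</street><no>345</no><city>Seattle</city></address>
  </person>
  <person>
    <name>John</name>
    <address>Thailand</address>
  </person>
</data>"""

def walk(elem, depth=0):
    """Yield (depth, kind, label) for element, attribute and text nodes."""
    yield depth, "element", elem.tag
    for name, value in elem.attrib.items():
        yield depth + 1, "attribute", f"{name}={value}"
    if elem.text and elem.text.strip():
        yield depth + 1, "text", elem.text.strip()
    for child in elem:                      # children come in document order
        yield from walk(child, depth + 1)

nodes = list(walk(ET.fromstring(doc)))
for depth, kind, label in nodes:
    print("  " * depth + f"{kind}: {label}")
```

Because children are visited in the order they appear, swapping the two person elements in the document changes the output, which is the point of "Order matters".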

17 Well-Formed XML
Start the document with a declaration, surrounded by <?xml … ?>
Normal declaration is: <?xml version="1.0" standalone="yes" ?>
–“Standalone” = “no DTD provided”
Has single root element surrounding nested elements
Has matching tags
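A minimal way to test the well-formedness rules is to hand the text to a parser and see whether it raises an error. The sketch below uses Python's xml.etree.ElementTree. The helper is_well_formed and the BAR element are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as a single well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Declaration, single root, matching tags.
good = '<?xml version="1.0" standalone="yes"?><BARS><BAR>Joe\'s Bar</BAR></BARS>'

# Two top-level elements: no single root.
two_roots = "<BAR>Joe's Bar</BAR><BAR>Sue's Bar</BAR>"

# <BAR> is never closed before </BARS>.
unmatched = "<BARS><BAR>Joe's Bar</BARS>"

for label, text in [("good", good), ("two_roots", two_roots), ("unmatched", unmatched)]:
    print(label, is_well_formed(text))
```

This checks well-formedness only. It says nothing about validity, which additionally requires a DTD.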

18 XML Data
XML is self-describing
Schema elements become part of the data
–Relational schema: person(name, phone)
–In XML, <person>, <name>, <phone> are part of the data, and are repeated many times
Consequence: XML is much more flexible
XML = semistructured data
–Well-Formed XML with nested tags is exactly the same idea as trees of semistructured data
–XML also enables nontree structures, as does the semistructured data model

19 XML is Semistructured Data
Missing attributes:
<person> <name>John</name> <phone>1234</phone> </person>
<person> <name>Joe</name> </person>  ← no phone !
Could represent in a table with nulls:
name | phone
John | 1234
Joe | -

20 XML is Semistructured Data
Repeated attributes:
<person> <name>Mary</name> <phone>…</phone> <phone>…</phone> </person>  ← two phones !
Impossible in tables:
name | phone
Mary | ???

21 XML is Semistructured Data
Attributes with different types in different objects
Nested collections (no 1NF)
Heterogeneous collections:
–<db> contains both <book>s and <publisher>s
Example: a person whose name is itself made of sub-elements (John, Smith), with phone 1234  ← structured name !
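The mismatch between semistructured data and flat tables can be seen by forcing such data into (name, phone) rows. In this Python sketch the people root element and Mary's two phone numbers are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Illustrative data: the root tag and Mary's phone numbers are assumptions.
doc = """<people>
  <person><name>John</name><phone>1234</phone></person>
  <person><name>Joe</name></person>
  <person><name>Mary</name><phone>2345</phone><phone>3456</phone></person>
</people>"""

rows = []
for person in ET.fromstring(doc).iter("person"):
    name = person.findtext("name")
    phones = [p.text for p in person.findall("phone")]
    # A flat table needs a NULL when phone is missing and one row per repeated phone.
    for phone in phones or [None]:
        rows.append((name, phone))

for row in rows:
    print(row)
```

Joe ends up with a null and Mary is split over two rows, whereas the XML holds each person as a single object.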

22 Document Type Definition (DTD)
Part of the original XML specification
An XML document may have a DTD
Valid XML: if it has a DTD and conforms to it
Validation is useful in data exchange

23 Very Simple DTD
<!DOCTYPE db [ … ]>

24 DTD: The Content Model
Content model:
–Complex = a regular expression over other elements
–Text-only = #PCDATA
–Empty = EMPTY
–Any = ANY
–Mixed content = (#PCDATA | A | B | C)*

25 DTD: Regular Expressions
sequence: <!ELEMENT name (firstName, lastName)>
optional: <!ELEMENT name (firstName?, lastName)>
zero or more: <!ELEMENT person (name, phone*)>
one or more: <!ELEMENT person (name, phone+)>
alternation: <!ELEMENT person (name, (phone | …))>
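Since a complex content model is a regular expression over element names, it can be checked with an ordinary regex engine. The Python sketch below is one possible encoding, not how a real validating parser works: every child tag is written as "tag;" and the content model is rewritten into a pattern over that string. It handles element-only models (no #PCDATA, EMPTY or ANY), and the email alternative is a hypothetical name.

```python
import re

def content_model_to_regex(model):
    """Translate an element-only DTD content model into a compiled regex."""
    pattern = re.sub(r"\s+", "", model)                             # drop whitespace
    pattern = re.sub(r"[A-Za-z_][\w.-]*", r"(?:\g<0>;)", pattern)   # name -> (?:name;)
    pattern = pattern.replace(",", "")                              # sequence = concatenation
    return re.compile(pattern)

def children_match(model, child_tags):
    """Check a list of child element names against a content model."""
    encoded = "".join(tag + ";" for tag in child_tags)
    return content_model_to_regex(model).fullmatch(encoded) is not None

# The operators ?, *, + and | mean the same in DTDs and in regexes.
print(children_match("(firstName?, lastName)", ["lastName"]))
print(children_match("(name, phone*)", ["name", "phone", "phone"]))
print(children_match("(name, phone+)", ["name"]))
print(children_match("(name, (phone | email))", ["name", "email"]))
```

The ";" terminator keeps names apart, so a child called firstName can never be mistaken for a shorter name such as name.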

26 DTD: Attributes
<!ATTLIST person age CDATA #REQUIRED height CDATA #IMPLIED>
<person age="25" height="6"> ... </person>

27 DTD: Attributes
Types:
CDATA = string
(Mon | Wed | Fri) = enumeration
ID = key
IDREF = foreign key
IDREFS = foreign keys separated by space
others = rarely used
Kind:
#REQUIRED
#IMPLIED = optional
"value" = default value
#FIXED "value" = the only value allowed

28 XML: IDs and References
Attributes can be pointers from one object to another
–Compare to HTML’s NAME = “foo” and HREF = “#foo”
Allows the structure of an XML document to be a general graph, rather than just a tree

29 XML: Creating ID’s
Give an element E an attribute A of type ID
When using tag <E> in an XML document, give its attribute A a unique value

30 XML: Creating References
To allow objects of type F to refer to another object with an ID attribute, give F an attribute of type IDREF
Or, let the attribute have type IDREFS, so the F-object can refer to any number of other objects

31 XML: IDs and References
[Example: person elements for Jane, Mary and John that point to one another through ID and IDREF attributes]
IDs and references in XML are just syntax

32 DTD: ID and IDREF(S) Attributes
<!ATTLIST person age CDATA #REQUIRED id ID #REQUIRED manager IDREF #REQUIRED manages IDREFS #REQUIRED>
<person age="25" id="p29432" manager="p48293" manages="p34982 p423234">
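What a validating parser enforces for ID, IDREF and IDREFS can be approximated by hand. The Python sketch below assumes the attribute typing of the ATTLIST on this slide (id is an ID, manager an IDREF, manages an IDREFS). The staff document and the check_references helper are hypothetical, and #REQUIRED is not checked.

```python
import xml.etree.ElementTree as ET

# Assumed attribute typing, following the ATTLIST on the slide.
ID_ATTRS = ["id"]
IDREF_ATTRS = ["manager"]
IDREFS_ATTRS = ["manages"]

def check_references(root):
    """Return a list of problems: duplicate IDs and references to missing IDs."""
    problems, seen = [], set()
    # First pass: collect all ID values; each must be unique in the document.
    for elem in root.iter():
        for attr in ID_ATTRS:
            if attr in elem.attrib:
                value = elem.attrib[attr]
                if value in seen:
                    problems.append(f"duplicate ID {value}")
                seen.add(value)
    # Second pass: every IDREF, and every token of an IDREFS, must name a known ID.
    for elem in root.iter():
        refs = [elem.attrib[a] for a in IDREF_ATTRS if a in elem.attrib]
        for attr in IDREFS_ATTRS:
            if attr in elem.attrib:
                refs.extend(elem.attrib[attr].split())
        problems.extend(f"dangling reference {r}" for r in refs if r not in seen)
    return problems

# Hypothetical document: p3 names a manager that does not exist.
doc = """<staff>
  <person id="p1" manages="p2 p3"/>
  <person id="p2" manager="p1"/>
  <person id="p3" manager="p9"/>
</staff>"""

problems = check_references(ET.fromstring(doc))
print(problems)
```

The check only confirms that a referenced ID exists somewhere in the document. It cannot require that manager points to a person rather than to some other element.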

33 Use of DTDs
1. Set standalone = “no”
2. Either:
a) Include the DTD as a preamble of the XML document, or
b) Follow DOCTYPE and the root tag by SYSTEM and a path to the file where the DTD can be found, or
c) Mix the two... (e.g. to override the external definition)

34 Example (a)
The DTD: <!DOCTYPE BARS [ … ]>
The document: Joe’s Bar; Bud 2.50; Miller 3.00; …

35 Example (b)
Assume the BARS DTD is in file bar.dtd
<!DOCTYPE BARS SYSTEM "bar.dtd">  ← Get the DTD from the file bar.dtd
The document: Joe’s Bar; Bud 2.50; Miller 3.00; …

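36 DTDs as Grammars
<!DOCTYPE db [
<!ELEMENT db (book|publisher)*>
<!ELEMENT book (title,author*,year?)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT publisher (#PCDATA)>
]>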

37 DTDs as Grammars
Same thing as:
db ::= (book|publisher)*
book ::= (title,author*,year?)
title ::= string
author ::= string
year ::= string
publisher ::= string
A DTD is an EBNF (Extended BNF) grammar
An XML tree is precisely a derivation tree
A valid XML document = a parse tree for that grammar
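Reading the DTD as a grammar suggests a simple recursive validity check: an element is valid if its children match its production and every child is valid in turn. The Python sketch below hand-encodes the grammar from this slide, writing each child tag as "tag;" and using None for "string". It is an illustration under those assumptions, not a replacement for a validating parser, and the two test documents are invented.

```python
import re
import xml.etree.ElementTree as ET

# The grammar from the slide. Each child tag is encoded as "tag;" and
# None stands for "string" (text only, no child elements).
GRAMMAR = {
    "db": r"(book;|publisher;)*",
    "book": r"title;(author;)*(year;)?",
    "title": None,
    "author": None,
    "year": None,
    "publisher": None,
}

def is_valid(elem):
    """Check elem and, recursively, its children against GRAMMAR."""
    if elem.tag not in GRAMMAR:
        return False
    rule = GRAMMAR[elem.tag]
    children = "".join(child.tag + ";" for child in elem)
    if rule is None:
        return children == ""
    return re.fullmatch(rule, children) is not None and all(is_valid(c) for c in elem)

valid_doc = ET.fromstring(
    "<db><book><title>Foundations of Databases</title>"
    "<author>Abiteboul</author><author>Hull</author><author>Vianu</author>"
    "<year>1995</year></book>"
    "<publisher>Addison Wesley</publisher></db>"
)

# author before title violates book ::= (title,author*,year?)
invalid_doc = ET.fromstring(
    "<db><book><author>Abiteboul</author><title>Data on the Web</title></book></db>"
)

print(is_valid(valid_doc), is_valid(invalid_doc))
```

Each successful call of is_valid corresponds to one node of the derivation tree, which is why the XML tree and the parse tree coincide.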

38 DTDs as Grammars
<!DOCTYPE paper [ … ]>
…
XML documents can be nested arbitrarily deep

39 DTDs as Schemas
Not so well suited:
impose unwanted constraints on order
references cannot be constrained
–ID/IDREFS can reference any ID
can be too vague

40 DTDs as Schemas
No context-dependent typing
Cannot distinguish between used car ads and new car ads
–Different structure in different contexts
[Figure: a dealer element with UsedCars and NewCars children, each containing ad elements; the tree also shows model and year nodes.]

41 XML APIs
Document Object Model - DOM
–Manipulation of XML Data
–Provides a representation of an XML Document as a tree
–Reads XML Document into memory
–Many implementations (Sun JAXP, Apache Xerces, …)
Simple API for XML - SAX
–Event-based framework for parsing XML data
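The difference between the two API styles can be sketched with Python's standard library, used here as a stand-in for the Java implementations named on the slide: xml.dom.minidom builds a tree in memory, while xml.sax reports events to a handler. The document string and the TitleHandler class are made up for this example.

```python
import xml.sax
from xml.dom import minidom

doc = (
    "<db>"
    "<book><title>Foundations of Databases</title></book>"
    "<book><title>Data on the Web</title></book>"
    "</db>"
)

# DOM: the whole document is read into memory as a tree that can be navigated.
dom = minidom.parseString(doc)
dom_titles = [node.firstChild.data for node in dom.getElementsByTagName("title")]

# SAX: the parser calls back on events; no tree is kept.
class TitleHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title, self._buffer = True, []

    def characters(self, content):
        if self._in_title:
            self._buffer.append(content)    # text may arrive in several chunks

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False

handler = TitleHandler()
xml.sax.parseString(doc.encode("utf-8"), handler)
sax_titles = handler.titles

print(dom_titles, sax_titles)
```

DOM is convenient when the application needs to revisit or modify parts of the document. SAX keeps memory use low because only the state stored by the handler survives.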

42 References
Lecture Slides
–Jeffrey D. Ullman
–Dan Suciu
–Alon Levy
BRICS XML Tutorial
–A. Moeller, M. Schwartzbach
W3C's XML homepage
XML School: an XML tutorial