XML: Extensible Markup Language

Slides:

Advertisements

Similar presentations

XML: text format Dr Andy Evans. Text-based data formats As data space has become cheaper, people have moved away from binary data formats. Text easier.

Advertisements

An Introduction to XML Based on the W3C XML Recommendations.

XML 6.3 DTD 6. XML and DTDs A DTD (Document Type Definition) describes the structure of one or more XML documents. Specifically, a DTD describes:  Elements.

XML Document Type Definitions ( DTD ). 1.Introduction to DTD An XML document may have an optional DTD, which defines the document’s grammar. Since the.

Document Type Definition DTDs CS-328. What is a DTD Defines the structure of an XML document Only the elements defined in a DTD can be used in an XML.

CS 898N – Advanced World Wide Web Technologies Lecture 21: XML Chin-Chih Chang

Document Type Definitions

 2002 Prentice Hall, Inc. All rights reserved. ISQA 407 XML/WML Winter 2002 Dr. Sergio Davalos.

Document Type Definitions. XML and DTDs A DTD (Document Type Definition) describes the structure of one or more XML documents. Specifically, a DTD describes:

Introduction to XML This material is based heavily on the tutorial by the same name at

1 Advanced Topics XML and Databases. 2 XML u Overview u Structure of XML Data –XML Document Type Definition DTD –Namespaces –XML Schema u Query and Transformation.

XP New Perspectives on XML Tutorial 3 1 DTD Tutorial – Carey ISBN

VICTORIA UNIVERSITY OF WELLINGTON Te Whare Wananga o te Upoko o te Ika a Maui SWEN 432 Advanced Database Design and Implementation Document Type Definition.

XML Anisha K J Jerrin Thomas. Outline  Introduction  Structure of an XML Page  Well-formed & Valid XML Documents  DTD – Elements, Attributes, Entities.

Lecture 7 of Advanced Databases XML Querying & Transformation Instructor: Mr.Ahmed Al Astal.

Introduction to XML cs3505. References –I got most of this presentation from this site –O’reilly tutorials.

Lecture 6 of Advanced Databases XML Schema, Querying & Transformation Instructor: Mr.Ahmed Al Astal.

©Silberschatz, Korth and Sudarshan10.1Database System ConceptsIntroduction XML: Extensible Markup Language Defined by the WWW Consortium (W3C) Originally.

XML By Dr.S.Sridhar, Ph.D.(JNUD), RACI(Paris, NICE), RMR(USA), RZFM(Germany) DIRECTOR ARUNAI ENGINEERING COLLEGE TIRUVANNAMALAI.

Chapter 10: XML.

XML CPSC 315 – Programming Studio Fall 2008 Project 3, Lecture 1.

Lecture 6 of Advanced Databases XML Querying & Transformation Instructor: Mr.Eyad Almassri.

Chapter 10: XML XML Structure of XML Data XML Document Schema Querying and Transformation Application Program Interfaces to XML Storage of XML Data.

Document Type Definitions Kanda Runapongsa Dept. of Computer Engineering Khon Kaen University.

XML 1 Enterprise Applications CE00465-M XML. 2 Enterprise Applications CE00465-M XML Overview Extensible Mark-up Language (XML) is a meta-language that.

XML Syntax - Writing XML and Designing DTD's

XMLI Structure of XML Data Structure of XML Data XML Document Schema XML Document Schema XPATH XPATH.

What is XML?  XML stands for EXtensible Markup Language  XML is a markup language much like HTML  XML was designed to carry data, not to display data.

XML. 2 XML- Some Links XML Tutorials – Some Links me=htmlhttp://

1 Tutorial 13 Validating Documents with DTDs Working with Document Type Definitions.

Avoid using attributes? Some of the problems using attributes: Attributes cannot contain multiple values (child elements can) Attributes are not easily.

1 Chapter 10: XML What is XML What is XML Basic Components of XML Basic Components of XML XPath XPath XQuery XQuery.

Computing & Information Sciences Kansas State University Thursday, 15 Mar 2007CIS 560: Database System Concepts Lecture 24 of 42 Thursday, 15 March 2007.

IS432 Semi-Structured Data Lecture 2: DTD Dr. Gamal Al-Shorbagy.

Introduction to XML This presentation covers introductory features of XML. What XML is and what it is not? What does it do? Put different related technologies.

XML Instructor: Charles Moen CSCI/CINF XML  Extensible Markup Language  A set of rules that allow you to create your own markup language  Designed.

Lecture 16 Introduction to XML Boriana Koleva Room: C54

An Introduction to XML Sandeep Bhattaram

XML Introduction. What is XML? XML stands for eXtensible Markup Language XML stands for eXtensible Markup Language XML is a markup language much like.

Chapter 23 XML. 2 Introduction  XML: eXtensible Markup Language (What is a Markup language?)  Defined by the WWW Consortium (W3C)  Originally intended.

COMP9321 Web Application Engineering Semester 2, 2015 Dr. Amin Beheshti Service Oriented Computing Group, CSE, UNSW Australia Week 4 1COMP9321, 15s2, Week.

When we create.rtf document apart from saving the actual info the tool saves additional info like start of a paragraph, bold, size of the font.. Etc. This.

ADT 2010 Introduction to XML, XPath (& XQuery) Chapter 10 in Silberschatz, Korth, Sudarshan “Database System Concepts” Stefan Manegold

Computing & Information Sciences Kansas State University Friday, 20 Oct 2006CIS 560: Database System Concepts Lecture 24 of 42 Friday, 20 October 2006.

Introduction to DTD A Document Type Definition (DTD) defines the legal building blocks of an XML document. It defines the document structure with a list.

XML CSC1310 Fall HTML (TIM BERNERS-LEE) HyperText Markup Language  HTML (HyperText Markup Language): December  Markup  Markup is a symbol.

CITA 330 Section 2 DTD. Defining XML Dialects “Well-formedness” is the minimal requirement for an XML document; all XML parsers can check it Any useful.

Extensible Markup Language (XML) Pat Morin COMP 2405.

CS 480: Database Systems Lecture 26 March 18, 2013.

XML Technologies DTD.

XML: Extensible Markup Language

Unit 4 Representing Web Data: XML

CS 480: Database Systems Lecture 25 March 15, 2013.

XML By Dr.S.Sridhar, Ph.D.(JNUD), RACI(Paris, NICE), RMR(USA), RZFM(Germany)

XML QUESTIONS AND ANSWERS

Chapter 23: XML Database System concepts,6th Ed.

XML and Databases.

Database Processing with XML

CHAPTER 9 JAVA AND XML.

Session III Chapter 6 – Creating DTDs

CSCE 315 – Programming Studio Spring 2013

Chapter 7 Representing Web Data: XML

New Perspectives on XML

DTD (Document Type Definition)

Sampath Jayarathna Cal Poly Pomona

Lecture 5- Semi-Structured Data (XML, JSON)

Semi-Structured Data (XML, JSON)

Session II Chapter 6 – Creating DTDs

Presentation transcript:

XML: Extensible Markup Language Sampath Jayarathna Cal Poly Pomona

XML Structure of XML Data XML Document Schema Application Program Interfaces to XML Storage of XML Data

Introduction XML: Extensible Markup Language Defined by the WWW Consortium (W3C) Documents have tags giving extra information about sections of the document E.g. <title> XML </title> <slide> Introduction …</slide> Extensible, unlike HTML Users can add new tags, and separately specify how the tag should be handled for display

XML Introduction (Cont.) The ability to specify new tags, and to create nested tag structures make XML a great way to exchange data, not just documents. Much of the use of XML has been in data exchange applications, not as a replacement for HTML Tags make data (relatively) self-documenting E.g. <?xml version = "1.0"?> <bank> <account> <account_number> A-101 </account_number> <branch_name> Downtown </branch_name> <balance> 500 </balance> </account> <depositor> <account_number> A-101 </account_number> <customer_name> Johnson </customer_name> </depositor> </bank>

XML: Motivation Data interchange is critical in today’s networked world Examples: Banking: funds transfer Order processing (especially inter-company orders) Scientific data Chemistry, Genetics Paper flow of information between organizations is being replaced by electronic flow of information Each application area has its own set of standards for representing information XML has become the basis for all new generation data interchange formats For awhile, XML (extensible markup language) was the only choice for open data interchange. But over the years there has been a lot of transformation in the world of open data sharing. The more lightweight JSON (Javascript object notation) has become a popular alternative to XML for various reasons.

XML Motivation (Cont.) Earlier generation formats were based on plain text with line headers indicating the meaning of fields Similar in concept to email headers Does not allow for nested structures, no standard “type” language Tied too closely to low level document structure (lines, spaces, etc) Each XML based standard defines what are valid elements, using XML type specification languages to specify the syntax DTD (Document Type Descriptors) XML Schema Plus textual descriptions of the semantics XML allows new tags to be defined as required However, this may be constrained by DTDs A wide variety of tools is available for parsing, browsing and querying XML documents/data

Comparison with Structured (Relational) Data Inefficient: tags, which in effect represent schema information, are repeated Better than relational tuples as a data-exchange format Unlike relational tuples, XML data is self-documenting due to presence of tags Non-rigid format: tags can be added Allows nested structures Wide acceptance, not only in database systems, but also in browsers, tools, and applications

Structure of XML Data Tag: label for a section of data Element: section of data beginning with <tagname> and ending with matching </tagname> Elements must be properly nested Proper nesting <account> … <balance> …. </balance> </account> Improper nesting <account> … <balance> …. </account> </balance> Formally: every start tag must have a unique matching end tag, that is in the context of the same parent element. Every document must have a single top-level element

Example of Nested Elements <?xml version = "1.0"?> <bank-1> <customer> <customer_name> Hayes </customer_name> <customer_street> Main </customer_street> <customer_city> Harrison </customer_city> <account> <account_number> A-102 </account_number> <branch_name> Perryridge </branch_name> <balance> 400 </balance> </account> … </customer> . . </bank-1>

Motivation for Nesting Nesting of data is useful in data transfer Example: elements representing customer_id, customer_name, and address nested within an order element Nesting is not supported, or discouraged, in relational databases With multiple orders, customer name and address are stored redundantly normalization replaces nested structures in each order by foreign key into table storing customer name and address information But nesting is appropriate when transferring data External application does not have direct access to data referenced by a foreign key

Structure of XML Data (Cont.) Mixture of text with sub-elements is legal in XML. Example: <account> This account is seldom used any more. <account_number> A-102</account_number> <branch_name> Perryridge</branch_name> <balance>400 </balance> </account> Useful for document markup, but discouraged for data representation

Attributes Elements can have attributes <account acct-type = “checking” > <account_number> A-102 </account_number> <branch_name> Perryridge </branch_name> <balance> 400 </balance> </account> Attributes are specified by name=value pairs inside the starting tag of an element An element may have several attributes, but each attribute name can only occur once <account acct-type = “checking” monthly-fee=“5”>

Class Activity 6 Convert the following Tree structure to bookstore.xml

Attributes vs. Subelements Distinction between subelement and attribute In the context of documents, attributes are part of markup, while subelement contents are part of the basic document contents In the context of data representation, the difference is unclear and may be confusing Same information can be represented in two ways <account account_number = “A-101”> …. </account> <account> <account_number>A-101</account_number> … </account> Suggestion: use attributes for identifiers of elements, and use subelements for contents

Namespaces XML data has to be exchanged between organizations Same tag name may have different meaning in different organizations, causing confusion on exchanged documents Specifying a unique string as a prefix avoids confusion xmlns:prefix Avoid using long unique names all over document by using XML Namespaces <h:table xmlns:h="http://www.w3.org/TR/html4/"> <h:tr> <h:td>Apples</h:td> <h:td>Bananas</h:td> </h:tr> </h:table> <f:table xmlns:f="https://www.w3schools.com/furniture"> <f:name>African Coffee Table</f:name> <f:width>80</f:width> <f:length>120</f:length> </f:table>

More on XML Syntax Elements without subelements or text content can be abbreviated by ending the start tag with a /> and deleting the end tag <account number=“A-101” branch=“Perryridge” balance=“200 /> To store string data that may contain tags, without the tags being interpreted as subelements, use CDATA as below <![CDATA[<account> … </account>]]> Here, <account> and </account> are treated as just strings CDATA stands for “character data”, text that will NOT be parsed by a parser

XML Document Schema Database schemas constrain what information can be stored, and the data types of stored values XML documents are not required to have an associated schema However, schemas are very important for XML data exchange Otherwise, a site cannot automatically interpret data received from another site Two mechanisms for specifying XML schema Document Type Definition (DTD) Widely used XML Schema Newer, increasing use

Why DTDs? XML documents are designed to be processed by computer programs If you can put just any tags in an XML document, it’s very hard to write a program that knows how to process the tags A DTD specifies what tags may occur, when they may occur, and what attributes they may (or must) have A DTD allows the XML document to be verified (shown to be legal) A DTD that is shared across groups allows the groups to produce consistent XML documents 3

Document Type Definition (DTD) The type of an XML document can be specified using a DTD DTD constraints structure of XML data What elements can occur What attributes can/must an element have What subelements can/must occur inside each element, and how many times. DTD does not constrain data types All values represented as strings in XML DTD syntax <!ELEMENT element (subelements-specification) > <!ATTLIST element (attributes) >

An XML example <?xml version = "1.0"?> <novel> <foreword> <paragraph>This is the great American novel.</ paragraph> </foreword> <chapter number="1"> <paragraph>It was a dark and stormy night.</paragraph> <paragraph>Suddenly, a shot rang out!</paragraph> </chapter> </novel> An XML document contains (and the DTD describes): Elements, such as novel and paragraph, consisting of tags and content Attributes, such as number="1", consisting of a name and a value 7

A DTD example <!DOCTYPE novel [ <!ELEMENT novel (foreword, chapter+)> <!ELEMENT foreword (paragraph+)> <!ELEMENT chapter (paragraph+)> <!ELEMENT paragraph (#PCDATA)> <!ATTLIST chapter number CDATA #REQUIRED> ]> A novel consists of a foreword and one or more chapters, in that order Each chapter must have a number attribute A foreword consists of one or more paragraphs A chapter also consists of one or more paragraphs A paragraph consists of parsed character data (text that cannot contain any other elements) 8

ELEMENT descriptions Suffixes: Separators Grouping ? optional foreword? + one or more chapter+ * zero or more appendix* Separators , both, in order foreword?, chapter+ | or section|chapter Grouping ( ) grouping (section|chapter)+

Elements without children The syntax is <!ELEMENT name category> The name is the element name used in start and end tags The category may be EMPTY: In the DTD: <!ELEMENT br EMPTY> In the XML: <br></br> or just <br /> In the XML, an empty element may not have any content between the start tag and the end tag An empty element may (and usually does) have attributes

Elements with unstructured children The syntax is <!ELEMENT name category> The category may be ANY This indicates that any content--character data, elements, even undeclared elements--may be used Since the whole point of using a DTD is to define the structure of a document, ANY should be avoided wherever possible The category may be (#PCDATA), indicating that only character data may be used In the DTD: <!ELEMENT paragraph (#PCDATA)> In the XML: <paragraph>A shot rang out!</paragraph> The parentheses are required! Note: In (#PCDATA), whitespace is kept exactly as entered Elements may not be used within parsed character data

Elements with children A category may describe one or more children: <!ELEMENT novel (foreword, chapter+)> Parentheses are required, even if there is only one child A space must precede the opening parenthesis Commas (,) between elements mean that all children must appear, and must be in the order specified “|” separators means any one child may be used All child elements must themselves be declared Children may have children Parentheses can be used for grouping: <!ELEMENT novel (foreword, (chapter+|section+))>

Elements with mixed content #PCDATA describes elements with only character data #PCDATA can be used in an “or” grouping: <!ELEMENT note (#PCDATA|message)*> This is called mixed content Certain (rather severe) restrictions apply: #PCDATA must be first The separators must be “|” The group must be starred (meaning zero or more)

An expanded DTD example <!DOCTYPE novel [ <!ELEMENT novel (foreword, chapter+, biography?, criticalEssay*)> <!ELEMENT foreword (paragraph+)> <!ELEMENT chapter (section+|paragraph+)> <!ELEMENT section (paragraph+)> <!ELEMENT biography(paragraph+)> <!ELEMENT criticalEssay (section+)> <!ELEMENT paragraph (#PCDATA)> ]> 9

Class Activity 7a Create bookstore.dtd for the following bookstore.xml

Attributes In addition to elements, a DTD may declare attributes An attribute describes information that can be put within the start tag of an element Attribute has the form in dtd, <!ATTLIST element-name att-name type requirement> In XML: <dog name="Spot" age="3"></dog> In DTD: <!ATTLIST dog name CDATA #REQUIRED age CDATA #IMPLIED >

Attribute Requirements Recall that an attribute has the form <!ATTLIST element-name att-name type requirement> The requirement is one of: A default value, enclosed in quotes Example: <!ATTLIST degree CDATA "PhD"> #REQUIRED The attribute must be present #IMPLIED The attribute is optional #FIXED "value" The attribute always has the given value If specified in the XML, the same value must be used

Attributes The format of an attribute is: <!ATTLIST element-name att-name type requirement att-name type requirement> where the name-type-requirement may be repeated as many times as desired Note that only spaces separate the parts, so careful counting is essential The element-name tells which element may have these attributes The att-name is the name of the attribute Each element has a type, such as CDATA (character data) Each element may be required, optional, or “fixed” In the XML, attributes may occur in any order

Important attribute types There are ten attribute types These are the most important ones: CDATA The value is character data (man|woman|child) The value is one from this list ID The value is a unique identifier ID values must be legal XML names and must be unique within the document NMTOKEN The value is a legal XML name This is sometimes used to disallow whitespace in the name It also disallows numbers, since an XML name cannot begin with a digit

Less important attribute types IDREF The ID of another element IDREFS A list of other IDs <person person_id="e10001" parent_id="e10002 e10003"> <!ATTLIST person person_id ID #REQUIRED parent_id IDREFS #IMPLIED> NMTOKENS A list of valid XML names ENTITY An entity ENTITIES A list of entities (&entity-name;) <author>&writer;&copyright;</author> <!ENTITY writer "Donald Duck."> <!ENTITY copyright "Copyright W3Schools."> NOTATION A notation (format of non-xml data within xml document)

Class Activity 7b Update your bookstore.dtd to include the attributes of the following bookstore.xml

Inline DTDs If a DTD is used only by a single XML document, it can be put directly in that document: <?xml version="1.0"> <!DOCTYPE myRootElement [  ]> <myRootElement>  </myRootElement> An inline DTD can be used only by the document in which it occurs 5

External DTDs An external DTD (a DTD that is a separate document) is declared with a SYSTEM or a PUBLIC command: <!DOCTYPE myRootElement SYSTEM "http://www.mysite.com/mydoc.dtd"> The name that appears after DOCTYPE (in this example, myRootElement) must match the name of the XML document’s root element Use SYSTEM for external DTDs that you define yourself, and use PUBLIC for official, published DTDs External DTDs can only be referenced with a URL The file extension for an external DTD is .dtd External DTDs are almost always preferable to inline DTDs, since they can be used by more than one document 6

Another example: XML mydoc.dtd <?xml version="1.0"?> <!ELEMENT weatherReport (date, location, temperature-range)> <!ELEMENT date (#PCDATA)> <!ELEMENT location (city, state, country)> <!ELEMENT city (#PCDATA)> <!ELEMENT state (#PCDATA)> <!ELEMENT country (#PCDATA)> <!ELEMENT temperature-range ((low, high)|(high, low))> <!ELEMENT low (#PCDATA)> <!ELEMENT high (#PCDATA)> <!ATTLIST low scale (C|F) #REQUIRED> <!ATTLIST high scale (C|F) #REQUIRED> <?xml version="1.0"?> <!DOCTYPE weatherReport SYSTEM "http://www.mysite.com/mydoc.dtd"> <weatherReport> <date>05/29/2002</date> <location> <city>Philadelphia</city>, <state>PA</state> <country>USA</country> </location> <temperature-range> <high scale="F">84</high> <low scale="F">51</low> </temperature-range> </weatherReport>

Limitations of DTDs DTDs are a very weak specification language You can’t put any restrictions on element contents It’s difficult to specify: All the children must occur, but may be in any order This element must occur a certain number of times There are only ten data types for attribute values But most of all: DTDs aren’t written in XML! If you want to do any validation, you need one parser for the XML and another for the DTD This makes XML parsing harder than it needs to be There is a newer and more powerful technology: XML Schemas However, DTDs are still very much in use

Application Program Interface There are two standard application program interfaces to XML data: SAX (Simple API for XML) Based on parser model, user provides event handlers for parsing events E.g. start of element, end of element DOM (Document Object Model) XML data is parsed into a tree representation Variety of functions provided for traversing the DOM tree E.g.: Java DOM API provides Node class with methods getParentNode( ), getFirstChild( ), getNextSibling( ) getAttribute( ), getData( ) (for text node) getElementsByTagName( ), … Also provides functions for updating DOM tree

Storage of XML Data XML data can be stored in Non-relational data stores Flat files Natural for storing XML XML databases Database built specifically for storing XML data, supporting DOM model Oracle Berkeley DB XML, eXist, BaseX Relational databases Data must be translated into relational form Advantage: mature database systems Disadvantages: overhead of translating data and queries IBM DB2, Oracle, MS SQL Server, PostgreSQL

Storage of XML in Relational Databases :String Representation Store each top level element as a string field of a tuple in a relational database Use a single relation to store all elements, or Use a separate relation for each top-level element type E.g. account, customer, depositor relations Benefits: Can store any XML data even without DTD As long as there are many top-level elements in a document, strings are small compared to full document Drawback: Need to parse strings to access values inside the elements Parsing is slow.

Mapping XML Data to Relations Relation created for each element type whose schema is known: An id attribute to store a unique id for each element A relation attribute corresponding to each element attribute A parent_id attribute to keep track of parent element All subelements that occur only once can become relation attributes For text-valued subelements, store the text as attribute value For complex subelements, can store the id of the subelement Subelements that can occur multiple times represented in a separate table Similar to handling of multivalued attributes when converting ER diagrams to tables