XML 1: Introduction, Syntax, DTDs and XSDs
Dr Kevin McManus
© 2015 the University of Greenwich


XML Basics
This lecture aims to cover:
- What is XML and why it is significant
- Content versus presentation
- Displaying XML documents
- What XML is actually used for
- Well-formed XML documents
- Further XML syntax
- Valid XML documents
- Introduction to Schemas, DTD and XSD
- Namespaces
- Technologies related to XML

What is XML?
1. eXtensible Markup Language
- HTML tags and attributes are restricted to those that the browser has been coded to recognise
- XML is extensible because tags and attributes can be invented to suit any application, e.g. a book entry for "Hamsters and other Furry Rodents"

What is XML?
2. A simplified version of SGML (Standard Generalized Markup Language)
- SGML is a language for defining mark-up languages
- XML and HTML are related via SGML, hence the family likeness
Diagram: XML is a subset of SGML. HTML and other SGML languages are defined using SGML. XHTML and other XML languages are defined using XML.

What is XML?
- SGML is too complex for
  - the average human to cope with
  - easy automatic processing
- Generic tools for manipulating SGML documents are expensive, large and complex
- XML is designed for
  - ease of use
  - easy automatic processing
- Generic tools for manipulating XML documents are relatively cheap and efficient

What is XML?
3. A W3C standard
- the core specification is XML 1.0
4. A pervasive technology
- but pervasive things can be a bit difficult to get a handle on
5. More than just hype
- although it has been heavily hyped

W3C Design Goals of XML
1. XML shall be straightforwardly usable over the Internet.
2. XML shall support a wide variety of applications.
3. XML shall be compatible with SGML.
4. It shall be easy to write programs which process XML documents.
5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
6. XML documents should be human-legible and reasonably clear.
7. The XML design should be prepared quickly.
8. The design of XML shall be formal and concise.
9. XML documents shall be easy to create.
10. Terseness in XML markup is of minimal importance.

Why XML?
- HTML tags and attributes are pre-defined in the HTML (XHTML) standard and describe presentation
- XML tags and attributes are defined to describe content and structure
- XML is used to model data
XML separates content from presentation

Separation of Content and Presentation
The same text, "Frogs and Toads of the British Isles", marked up in two ways:
- as XML: content and meaning is clear, presentation ?????
- as HTML: presentation in a web browser is defined, content and meaning ?????

Separation of Content and Presentation
Presentation can be rendered differently for different devices and needs. The same content, "Frogs and Toads of the British Isles", can be delivered to:
- a web browser on a PC
- an app
- printed paper
- a mobile phone
- audio
- assistive technology

Separation of Content and Presentation
Enables meaningful searches
- An XML search engine can run a structured query across many XML documents, e.g. several copies of the "Frogs and Toads of the British Isles" record
- query: FIND book WHERE ISBN=

Separation of Content and Presentation
A format for data exchange and communication
Diagram: a book publisher and a book retailer, running SQL Server on Windoze and Oracle on UNIX, exchange data as XML.

Separation of Content and Presentation
Data storage
- An alternative to database technology?
  - Not really. XML is not a replacement for an RDBMS but may be used in places where a full RDBMS would be overkill
- XML schemas are well established but the development of XML ontologies continues
  - e.g. OWL, DAML, OIL
An ontology is a formal naming and definition of the types, properties, and interrelationships of the entities that exist in a particular domain of discourse (Source: Wikipedia)

Displaying XML documents
- XML documents define content but not presentation
- Some browsers can conveniently display XML documents as a hierarchical structure

Displaying XML documents
- So how do you tell browsers (or other presentation software) how to display documents that use XML defined tags?
  - using style sheets of course
- There are two main style sheet languages
  - CSS: Cascading Style Sheets
  - XSL: eXtensible Stylesheet Language
- XSL is much more complex and powerful
  - XSL-FO and XSLT
- For now we'll just use CSS to explore some possibilities
  - we will look at XSLT later
XML document + style sheet => presentable document

Displaying XML documents
books-css.xml lists two books, "Hamsters and other Furry Rodents" and "Frogs and Toads of the British Isles", and is styled by books.css:
  book { display:block }
  ISBN { display:inline; font-family:arial; color:blue; font-size:10pt; font-weight:bold }
  title { display:inline; font-family:arial; }
  date { display:none }

Displaying XML documents
- This example mixes three XML languages that the browser understands: XHTML + SVG + MathML
- Note: the filename is .xml

Displaying XML documents
- Note: the filename is .html

Well Formed and Valid XML Documents
- An XML document that conforms to the strict syntax rules in the XML 1.0 specification is well-formed
  - makes life easy for an XML parser
- In addition, an XML document is valid if it conforms to a set of language rules defined in a schema, either
  - a Document Type Definition (DTD) or
  - an XML Schema (XSD)
- XML documents don't need to have an associated DTD or XSD
  - in which case they can be checked for being well formed but not for validity

XML Syntax Rules
1. Document has a single root element
2. Tags must be properly nested
   - no overlapping tag pairs
3. All tags must have a closing tag
   - or be self closing
4. Tag names are case sensitive
5. Tag attributes are in the opening tag
   - unique attribute name
   - attribute value must be quoted
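
The five rules can be tried out with any XML parser. A minimal sketch, assuming Python's standard xml.etree.ElementTree parser; the sample documents are invented for illustration and each breaks one rule:

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical samples: the first is fine, each of the others breaks one rule.
samples = {
    "well formed": "<book><title>Frogs and Toads</title></book>",
    "1. two root elements": "<title>Frogs</title><title>Toads</title>",
    "2. overlapping tags": "<p><b>bold <i>both</b> italic</i></p>",
    "3. missing closing tag": "<p>first paragraph<p>second paragraph</p>",
    "4. case mismatch": "<Title>Hamsters</title>",
    "5. unquoted value": "<book rating=good/>",
    "5. repeated attribute": '<book rating="ok" rating="good"/>',
}

results = {label: is_well_formed(doc) for label, doc in samples.items()}
for label, ok in results.items():
    print(f"{label:24} {'well formed' if ok else 'NOT well formed'}")
```

A non-validating parser like this one checks well-formedness only; it says nothing about validity.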

XML Syntax Rules
1. Only one root element is allowed in a document
- This is called the document element
- To be well-formed an XML document must have a document element that encloses all the other elements
- The slide contrasts two versions of a small document (title "Some HTML doc", text "A bit of text"): without an enclosing document element it is not well formed, with one it is well formed

XML Syntax Rules
2. All elements must be "properly nested"
- Any element contained inside another element has to be completely contained within it
  - you can't have one element partly within another
- Markup in which the bold and italic tags overlap ("bold text bold italic text italic text") may work as HTML but it is not well formed XML
- Whereas the same text with the italic element closed inside the bold element, and reopened after it, is well formed XML (XHTML)

XML Syntax Rules
Rules 1 and 2 combined mean that it is always possible to represent an XML document as a simple hierarchical tree:
  html
    head
      title: Some XHTML
    body
      h1: Some XHTML
      p: A bit of text
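
The tree view can be produced mechanically. A minimal sketch, assuming the slide's document looked roughly like the string below (the markup is reconstructed from the tree, not copied from the original) and using Python's standard library:

```python
import xml.etree.ElementTree as ET

# Assumed reconstruction of the slide's small XHTML document.
doc = """<html>
  <head><title>Some XHTML</title></head>
  <body>
    <h1>Some XHTML</h1>
    <p>A bit of text</p>
  </body>
</html>"""


def outline(element, depth=0):
    """Return one indented line per element, visiting the tree depth-first."""
    text = (element.text or "").strip()
    lines = ["  " * depth + element.tag + (f": {text}" if text else "")]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


tree_lines = outline(ET.fromstring(doc))
print("\n".join(tree_lines))
```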

XML Syntax Rules
3. All elements must have a closing tag
- Two paragraphs ("first paragraph", "second paragraph") opened without closing tags may be acceptable as HTML but are not well-formed XHTML
- Whereas the same paragraphs with each start tag matched by an end tag are well formed
- If the tag is truly empty (i.e. it has no content) then the empty tag notation may be used, so a start tag immediately followed by its end tag may be rewritten as a single self-closing tag

XML Syntax Rules
4. Tag names are case sensitive
- Tags whose names differ only in case are different tags
- Closing tags must match case
  - of course, closing the element around "Hamsters and other Furry Rodents" with a tag in a different case would clearly be wrong

XML Syntax Rules
Some rules concerning attributes
- Start tags and empty tags, but not end tags, can contain attributes
- Attributes always exist as name="value" pairs
- The attribute value must always be quoted with " or '
- The attribute name must be unique within the tag
- The slide's bad attribute examples wrap the content "Snow White", "Ford Ka", "credit" and "close account"

Some More XML Syntax
- Knowing about elements (i.e. tags), attributes and well-formed documents allows you to create basic XML documents
- Other aspects of XML syntax include
  - XML declaration
  - processing instructions
  - comments
  - character references and entities
  - special symbols
  - CDATA sections

XML Declaration
- Ideally all XML documents should start with an XML declaration (it has the form of an SGML processing instruction)
- If included the declaration must:
  - be the first thing in the document
  - begin with <?xml (conventionally written on a single line)
  - include version= to indicate the version of XML
    - in practice this is "1.0" (XML 1.1 exists but is rarely used)
- The declaration may optionally include:
  - encoding= indicates the encoding used to store the file, typically "UTF-8" (8 bit Unicode)
  - standalone="[yes|no]" does the document depend on external markup declarations?
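
A short sketch of what the encoding= attribute does, assuming Python's standard parser. The author name is taken from the goodbooks example later in the deck; the element name is illustrative:

```python
import xml.etree.ElementTree as ET

body = "<author>Benoît Marchal</author>"

# The same text stored in two encodings; each declaration names its own.
latin1 = ('<?xml version="1.0" encoding="ISO-8859-1"?>' + body).encode("iso-8859-1")
utf8 = ('<?xml version="1.0" encoding="UTF-8"?>' + body).encode("utf-8")

name_from_latin1 = ET.fromstring(latin1).text
name_from_utf8 = ET.fromstring(utf8).text
print(name_from_latin1 == name_from_utf8)

# A declaration that is not at the very start of the document is an error.
try:
    ET.fromstring(b'\n<?xml version="1.0"?><author/>')
    misplaced_accepted = True
except ET.ParseError:
    misplaced_accepted = False
print(misplaced_accepted)
```

The bytes on disk differ between the two files, but the parser recovers the same characters because each declaration tells it how to decode them.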

Processing Instructions
- Instructions intended for an application processing the XML document
- Processing instructions have the form <?target instruction?>
  - target identifies the program that the instruction is intended for
  - instruction is the instruction to the target program
- A very common PI is xml-stylesheet, which links a style sheet to the document
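
A minimal sketch of how an application sees a processing instruction, assuming Python's xml.dom.minidom. The document is hypothetical; books.css is the style sheet name used earlier in the deck:

```python
from xml.dom import minidom, Node

# Hypothetical document linking the books.css style sheet.
doc = """<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="books.css"?>
<books><book><title>Frogs and Toads of the British Isles</title></book></books>"""

dom = minidom.parseString(doc)

# Each PI is exposed as a (target, instruction) pair.
instructions = [
    (node.target, node.data)
    for node in dom.childNodes
    if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
]
print(instructions)
```

The XML declaration looks like a PI but is handled separately by the parser, so only the xml-stylesheet instruction is listed.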

Character References
- As in HTML these can be used to include non-standard characters in the document
  - i.e. things that can be displayed but not easily entered from a standard keyboard
- Format is &#DDD; or &#xHHH;
  - DDD is the decimal number, or HHH the hex number, representing the character in the character set
- Example: character references for Greek letters render as "it's Greek to me Φ Δ Δ"
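
A small sketch of the two formats, assuming the slide's example used the Unicode code points for Φ (decimal 934) and Δ (decimal 916, hex 394); the original references did not survive, so these are a plausible reconstruction:

```python
import xml.etree.ElementTree as ET

# Two decimal references and one hexadecimal reference (x394 == 916).
source = "<p>it's Greek to me &#934; &#916; &#x394;</p>"
phrase = ET.fromstring(source).text
print(phrase)

# The decimal and hex forms of delta resolve to the same character.
code_points = [ord(ch) for ch in phrase if ord(ch) > 127]
print(code_points)
```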

Entities
- Some symbols have a special meaning in XML and can be entered as entities (or character references)
- Predefined entities
  - less than symbol (<): &lt;
  - greater than symbol (>): &gt;
  - quotation mark ("): &quot;
  - apostrophe ('): &apos;
  - ampersand (&): &amp;
- copyright (©): &copy; is defined in HTML but not in XML, so it must be declared in a DTD or written as the character reference &#169;
- Customised ones, e.g. &copyw; to insert a copyright statement predefined in, say, a DTD
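
A sketch of the three cases, assuming Python's standard library. The note element and the replacement text for &copyw; are invented for illustration:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# 1. The five predefined entities protect special symbols in content.
raw = 'Hamsters & "other" <furry> rodents'
escaped = escape(raw, {'"': "&quot;", "'": "&apos;"})
round_trip = ET.fromstring(f"<note>{escaped}</note>").text

# 2. &copy; is not predefined in XML, so a bare use is rejected.
try:
    ET.fromstring("<note>&copy; 2015</note>")
    copy_is_predefined = True
except ET.ParseError:
    copy_is_predefined = False

# 3. A custom entity declared in an internal DTD subset (hypothetical text).
custom = ET.fromstring(
    '<!DOCTYPE note [<!ENTITY copyw "Copyright the University of Greenwich">]>'
    "<note>&copyw;</note>"
).text

print(escaped)
print(copy_is_predefined)
print(custom)
```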

Character References and Entities

CDATA Sections
- A way of including data that you don't want interpreted as XML
  - character data not to be parsed
- Form is <![CDATA[ ... ]]>
  - the section ends at the first ]]>
- Why would you do this?
  - to hide executable JavaScript in an XML document
  - perhaps to include examples of badly formed XML in an XML document
- Comments, as in HTML, use <!-- ... -->
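
A minimal sketch of the second use case, assuming Python's standard parser; the example element and its content are invented:

```python
import xml.etree.ElementTree as ET

# Badly nested markup and a bare ampersand, carried safely inside CDATA.
doc = "<example><![CDATA[<b>bold <i>both</b> italic</i> & more]]></example>"
content = ET.fromstring(doc).text
print(content)

# The same content outside a CDATA section is not well formed.
try:
    ET.fromstring("<example><b>bold <i>both</b> italic</i> & more</example>")
    parsed_without_cdata = True
except ET.ParseError:
    parsed_without_cdata = False
print(parsed_without_cdata)
```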

XML Applications
- Used by current generation user agents
  - eXtensible Hypertext Markup Language: XHTML
  - Scalable Vector Graphics: SVG
  - Mathematical Markup Language: MathML
- Other human-facing client software
  - Synchronised Multimedia Integration Language: SMIL
    - only supported by the Real browser
  - Voice Extensible Markup Language: VoiceXML (VXML)
    - specialised industry and commerce applications

XML Applications
- Standard vocabularies for representing and exchanging specialist data
  - e.g. legal, scientific, medical, mathematical vocabularies

XML Applications
- Meta data (data about data) to describe resources, e.g.
  - Resource Description Framework: RDF
  - DARPA Agent Markup Language: DAML
  - Ontology Inference Layer: OIL
  - Web Ontology Language: OWL
- The slide's metadata example describes a resource associated with "Fred Bloggs"

XML Applications
- Buried deep in application communications
  - SOAP, XML-RPC, WSDL, UDDI
- Business to business (B2B) data exchange
  - ebXML
- Probably of more value to B2B than to B2C website focussed e-commerce
  - competes with JSON in B2C applications

XML Technologies
- Applications of XML: CML, MathML, WML, VoiceXML, XHTML, SMIL, SVG, RDF, SOAP, UDDI, WSDL, ebXML, etc.
- Core XML: Syntax, DTD, XSD, Namespaces
- Supporting Specifications: XPath, XLink, XPointer, XQuery, XSLT, XSL-FO, CSS, DOM, etc.
- Supporting Tools
  - Browsers: IE, Mozilla
  - APIs: DOM, SAX
  - Parsers: Expat, MSXML, Xerces
  - IDEs: XMLSpy, Stylus

DTD and XSD Schemas
- Document Type Definitions (DTD) and XML Schemas (XSD) are alternative ways of defining an XML language
- They contain rules to specify the vocabulary and grammar of a language
  - the tags and attributes in the vocabulary
  - permissible values for attributes
  - optional and mandatory tags and attributes
  - tag nesting rules
- XML languages defined by DTDs or schemas are used to create valid XML documents

DTD and XSD Schemas
- For an XML document to be valid it must conform to the rules specified in its DTD or XSD
Diagram: a DTD or XSD defines an XML language (an encapsulated definition of the data model). XML documents that use the language defined in the DTD or XSD are valid XML documents.

Why do we need valid documents?
- Applications must validate all incoming data
  - data i/o is a major source of system error
  - check that required elements are present
  - check that attribute values are appropriate
- A DTD or XSD represents an agreed data model in a machine readable form
  - can be processed by standard software
  - COTS code used at each end to generate and check the data
  - validating parsers
Diagram: an Estate Agent and a Mortgage Broker exchange XML in an agreed format.

DTD and XSD Schemas
- DTDs
  - easy for humans to cope with
  - older than schemas
  - supported by a much wider range of XML tools and software
  - have poor support for namespaces
- XSDs
  - more verbose
  - much more expressive than DTDs
    - data types, constraints on values
  - an XML based vocabulary
    - can be manipulated with general purpose XML tools
  - namespace support

Defining DTDs
As an example we shall develop a DTD for an XML document type intended to list books recommended by lecturers for various courses. The first version of such documents will have the following structure:
- the root element is recommended_books
- the root element contains zero or more book elements
- each book element contains the following elements: author, title, year_published, publisher, course and recommended_by
- the author and recommended_by elements both consist of firstname and surname elements

goodbooks1.xml
- Book 1: author Stephen Spainhour; title Webmaster in a Nutshell; year_published 1999; publisher O'Reilly; course WAT; recommended_by Gill Windall
- Book 2: author Benoît Marchal; title Applied XML Solutions; year_published 2000; publisher Sams; course WAT; recommended_by Kevin McManus
- Note how the firstname and surname elements appear in both author and recommended_by elements
- None of the tags in this example contain attributes

goodbooks1.dtd
- contains 10 element definitions, e.g.
  <!ELEMENT book (author, title, year_published, publisher, course, recommended_by)>

goodbooks1.dtd
Each declaration gives the declaration type (ELEMENT), the element / tag name, and the element contents:
  <!ELEMENT recommended_books (book*)>
  <!ELEMENT book (author, title, year_published, publisher, course, recommended_by)>
  <!ELEMENT author (firstname, surname)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT year_published (#PCDATA)>
  <!ELEMENT publisher (#PCDATA)>
  <!ELEMENT course (#PCDATA)>
  <!ELEMENT recommended_by (firstname, surname)>
  <!ELEMENT firstname (#PCDATA)>
  <!ELEMENT surname (#PCDATA)>

goodbooks1.dtd
The DTD can be read as meaning:
- recommended_books contains zero or more book elements
- each book element contains, in order, the elements:
  - author
  - title
  - year_published
  - publisher
  - course
  - recommended_by
- the author and recommended_by elements both consist of firstname and surname elements
- the title, year_published, publisher, course, firstname and surname elements consist of text
  - the actual data

DTD syntax
  Expression      Meaning of contents
  eleA?           eleA is optional
  eleA+           eleA occurs one or more times
  eleA*           eleA occurs zero or more times
  eleA | eleB     eleA or eleB occurs but not both
  eleA, eleB      eleA is followed by eleB
  (eleA,eleB)*    parentheses ( ) are used to group elements, so this means zero or more occurrences of eleA followed by eleB
  #PCDATA         parsed character data, a string of text

Four Element Forms
- Empty Elements have no element content
  - can still contain information in attributes
- Element-Only Elements contain only child elements
  - content model is a list of child elements arranged using the expressions listed in the previous table
- Text-Only Elements contain only character data (text)
  - content model is simply #PCDATA
- Mixed Elements contain both child elements and character data
  - content model must contain a choice list beginning with #PCDATA
  - the rest of the choice list contains the child elements
  - it must end in an asterisk indicating that the entire choice group is optional
  - although this constrains the type of child element it does not constrain the order or quantity

Quick Quiz
- Here's a DTD
- Why is the following not a valid document according to the DTD?

goodbooks2.xml
- Extending the recommended books example to include attributes
- The definition of the document type is changed to:
  - make the year_published element optional
  - allow more than one course to be referenced
  - include a rating attribute of the book element which can take the values "ok", "good" or "excellent" and has a default value of "ok"

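goodbooks2.xml
- Book 1: rating attribute present; author Stephen Spainhour; title Webmaster in a Nutshell; year_published 1999; publisher O'Reilly; course WAT; course Internet Publishing (repeated course element); recommended_by Gill Windall
- Book 2: rating attribute omitted; author Benoît Marchal; title Applied XML Solutions; year_published omitted; publisher Sams; course WAT; recommended_by Kevin McManus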

goodbooks2.dtd
  <!ELEMENT book (author, title, year_published?, publisher, course+, recommended_by)>
- year_published is now optional
- course can occur more than once
- a new rule defines a rating attribute for the book element
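
A DTD content model behaves like a regular expression over the sequence of child elements. The sketch below is not a validating parser (Python's standard library does not include one); it hand-translates the book content model into a regular expression to show what ? and + permit. The three sample books are cut down for illustration:

```python
import re
import xml.etree.ElementTree as ET

# (author, title, year_published?, publisher, course+, recommended_by)
# rewritten as a regular expression over space-terminated child tag names.
BOOK_MODEL = re.compile(
    r"author title (year_published )?publisher (course )+recommended_by "
)


def children_match(element, model):
    """Return True if the element's child tags, in order, fit the model."""
    sequence = "".join(child.tag + " " for child in element)
    return model.fullmatch(sequence) is not None


# Hypothetical books; only the order of child tags matters here.
with_year = ET.fromstring(
    "<book><author/><title/><year_published/><publisher/>"
    "<course/><course/><recommended_by/></book>"
)
without_year = ET.fromstring(
    "<book><author/><title/><publisher/><course/><recommended_by/></book>"
)
no_course = ET.fromstring(
    "<book><author/><title/><publisher/><recommended_by/></book>"
)

for name, book in [("with year", with_year), ("without year", without_year),
                   ("no course", no_course)]:
    print(name, children_match(book, BOOK_MODEL))
```

A real validating parser also checks the content of each child and the attribute rules, which this sketch ignores.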

Attribute Rules
- "ok" is the default value from the rating enumerated series
- Other attribute defaults are possible:
  - #REQUIRED: the attribute is required
  - #IMPLIED: the attribute is optional
  - #FIXED value: the attribute has a fixed value (constant)
- As well as enumerated attribute types there are:
  - CDATA: unparsed character data
  - NOTATION: declared elsewhere in the DTD, usually a mime type
  - ENTITY: declared elsewhere in the DTD as an ENTITY (same as a name)
  - ID: unique identifier
  - IDREF: reference to an ID elsewhere in the document
  - NMTOKEN: name containing only token characters, i.e. no whitespace
- Attributes can be defined anywhere in the DTD
  - but are usually placed immediately after the corresponding element
- Multiple attributes for an element are declared in a single attribute list:
  <!ATTLIST book rating (ok | good | excellent) "ok" reviewer CDATA #IMPLIED >
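
A sketch of the default value in action, assuming Python's xml.parsers.expat, which reads attribute declarations from an inline DTD and supplies declared defaults even though it does not validate. The cut-down DTD (book holding only text) is a simplification for illustration:

```python
import xml.parsers.expat

# Simplified, hypothetical version of the goodbooks DTD, included inline.
doc = """<!DOCTYPE recommended_books [
  <!ELEMENT recommended_books (book*)>
  <!ELEMENT book (#PCDATA)>
  <!ATTLIST book rating (ok | good | excellent) "ok"
                 reviewer CDATA #IMPLIED>
]>
<recommended_books>
  <book rating="excellent">Webmaster in a Nutshell</book>
  <book>Applied XML Solutions</book>
</recommended_books>"""

ratings = []
reviewers = []


def start_element(name, attrs):
    """Record the rating and reviewer seen on each book start tag."""
    if name == "book":
        ratings.append(attrs.get("rating"))
        reviewers.append(attrs.get("reviewer"))


parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = start_element
parser.Parse(doc, True)

print(ratings)    # the second book picks up the declared default
print(reviewers)  # #IMPLIED attributes have no default, so nothing is supplied
```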

Not so Quick Quiz
- How do you decide if information should be in an element or an attribute?

Linking the DTD to the XML document
An XML document can refer to an external DTD using <!DOCTYPE root_element SYSTEM "URL">
- root_element is the name of the root element
- URL is the URL of the document containing the DTD

Linking the DTD to the XML document
Alternatively the DTD can be included inline within the XML document:
  <!DOCTYPE recommended_books [ <!ELEMENT book (author, title, year_published?, publisher, course+, recommended_by)> ]>

Quick Quiz
Suppose we want to define an element that can contain a mixture of other elements and plain text:
  This program was brought to you by Webbed Wonders. We can be contacted at Lettuce Towers Braythorpe Street Wessex WA1 7QT Thank you for your interest.
Which of the following do you think is the correct way of specifying in a DTD the element as used above?
- It's not possible because the document isn't well-formed.

What else can you do with DTDs?
- Specify that an attribute value is unique within a document (attribute type ID)
  - a bit like a primary key in a database table
- Specify that the value of one attribute refers to an ID value elsewhere in the document (attribute type IDREF)
  - like a foreign key
- The ID value must be a valid name so cannot start with a 0-9 character

What else can you do with DTDs?
- Define your own entities, often commonly used strings, e.g. &Disclaimer;
- Define ways of handling non-XML data

What can you not do with DTDs?
- Specify the data type (e.g. integer) of elements or attributes
  - the only element data type recognised is string
  - attributes can validate enumerated or ID values
- Easily mix XML vocabularies from different DTDs
  - namespaces are possible but not well supported
- Accurately define the structure of a mixed element
  - cf. the preceding quick quiz
- Because of these and other restrictions there have been a number of initiatives to develop alternatives to the DTD
  - W3C supports the XML Schema (XSD) specification

goodbooks3.xsd
Re-writing goodbooks2.dtd as an XML schema results in a significantly longer file. This is listed over the next 4 slides with the corresponding DTD for comparison.
  <xs:schema xmlns:xs=" elementFormDefault="qualified">

goodbooks3.xsd
  <!ELEMENT book (author, title, year_published?, publisher, course+, recommended_by)>
- unless stated, the value of minOccurs and maxOccurs is 1

goodbooks3.xsd
- Note how the attribute definition is nested within the definition of the book element

goodbooks3.xsd
- Note the data types

goodbooks3.xsd

Things to notice about goodbooks3.xsd
- XML schemas are much more verbose than DTDs
- The XML Schema language itself conforms to XML syntax rules and so can be manipulated using standard XML tools
- More specific restrictions can be made on the occurrence of elements than with DTDs
  - a DTD occurrence operator and its schema equivalent mean the same, but in schemas minOccurs and maxOccurs can be used to restrict the number of allowed occurrences
- In DTDs the only data type for elements is #PCDATA whereas schemas contain much more support for data types
  - a full range of data types is supported (e.g. boolean, float, dateTime) plus you can define your own
- XML Schemas make use of namespaces
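
Because a schema is itself XML, ordinary XML tools can read it. The fragment below is a hypothetical, simplified stand-in for part of goodbooks3.xsd (the original listing is not reproduced here, and author and recommended_by are flattened to strings); the sketch lists each child element with its occurrence limits, applying the default of 1:

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"  # the XML Schema namespace name

# Hypothetical simplification of the book element declaration.
xsd = f"""<xs:schema xmlns:xs="{XS}" elementFormDefault="qualified">
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="author" type="xs:string"/>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="year_published" type="xs:gYear" minOccurs="0"/>
        <xs:element name="publisher" type="xs:string"/>
        <xs:element name="course" type="xs:string" maxOccurs="unbounded"/>
        <xs:element name="recommended_by" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = ET.fromstring(xsd)

# minOccurs and maxOccurs default to 1 when they are not stated.
occurs = {
    el.get("name"): (el.get("minOccurs", "1"), el.get("maxOccurs", "1"))
    for el in schema.iter(f"{{{XS}}}element")
    if el.get("name") != "book"
}
for name, (low, high) in occurs.items():
    print(f"{name:16} min={low} max={high}")
```

Here minOccurs="0" plays the role of the DTD's ?, and maxOccurs="unbounded" with the default minimum of 1 plays the role of +.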

Linking a Schema to an XML document
Not totally standard and somewhat tied to W3C, but the method below works with at least some tools that support Schemas:
  <recommended_books xmlns:xsi=" xsi:noNamespaceSchemaLocation="goodbooks3.xsd">
- the xsi:noNamespaceSchemaLocation attribute associates the schema stored in goodbooks3.xsd, in the same directory, with the XML document

Namespaces
Namespaces are a way of avoiding name conflicts, i.e. where different XML vocabularies use the same names to mean different things.
In designing an XML based language we may want to include elements from several other XML languages, e.g. when defining a new XML language to describe invoice documents (InvoiceML) we may want to draw on existing languages for describing products (ProductML) and customers (CustomerML).

Namespaces
What to do about name clashes? It is likely that ProductML and CustomerML both contain elements with the same name, e.g. one holding "Giant Widget" and one holding "George Barford".
We don't want applications that process InvoiceML to confuse the elements:
  Dear Mr Giant Widget, Your George Barford has been despatched today...

Namespaces
Namespaces give a mechanism for "qualifying" element names with a prefix so that they are all unique, e.g. one prefix for the element holding "Giant Widget" and another for the element holding "George Barford".
Wherever you see element names including a prefix followed by a ":" you can be sure that namespaces are being used.

Namespaces
The prefix needs to be defined in the XML document that is using it by including the xmlns attribute. For example, to define the prod: and cust: prefixes in an invoice document:
  <invoices xmlns:prod=" xmlns:cust=" xmlns=" Giant Widget George Barford....
- xmlns:prod declares a namespace associated with the prod prefix
- xmlns:cust declares a namespace associated with the cust prefix
- xmlns declares a default namespace that uses no prefix

Namespaces
In the previous example it is tempting to guess that this line
  <invoices xmlns:prod=" xmlns:cust=" xmlns="
associates the prod: prefix with an XSD located at one URL and cust: with one at another. But these URLs need not be actual locations at all. They are simply unique names used to identify namespaces. URIs (URLs & URNs) are convenient ways of specifying unique values.
There is a way of tying prefixes to actual XSDs (but not DTDs) so that documents can be validated against multiple Schemas. The syntax is both messy and unclear.
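
A minimal sketch of how a namespace-aware application tells the two elements apart, assuming Python's standard parser. The namespace URIs and the invoice and name elements are hypothetical (the originals did not survive); as noted above, the URIs are only names and nothing is fetched from them:

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace names, used purely as unique identifiers.
NS = {
    "prod": "http://example.com/ProductML",
    "cust": "http://example.com/CustomerML",
    "inv": "http://example.com/InvoiceML",
}

doc = """<invoices xmlns:prod="http://example.com/ProductML"
          xmlns:cust="http://example.com/CustomerML"
          xmlns="http://example.com/InvoiceML">
  <invoice>
    <prod:name>Giant Widget</prod:name>
    <cust:name>George Barford</cust:name>
  </invoice>
</invoices>"""

root = ET.fromstring(doc)

# The parser replaces each prefix with its namespace name.
product = root.find(".//prod:name", NS).text
customer = root.find(".//cust:name", NS).text
print(root.tag)
print(f"Dear {customer}, Your {product} has been despatched today...")
```

The prefixes in the lookup table need not match those in the document; only the namespace names are compared.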

References
- There are masses of XML books and websites.
- SAMS Teach Yourself XML in 24 hours, Morrison
  - Cheap as chips, good scope but little depth
- W3Schools online tutorial
  - Try their online XML test
- World Wide Web consortium at
  - The home of the XML specification and so much more.
- XML in practice from
  - Articles, white papers, user groups and more

Summary
- XML is a meta-language used to define application specific markup languages
  - XHTML, SVG, MathML, RSS, SOAP, WSDL, etc.
- XML is designed to be straightforward and easy to use
- XML separates content from presentation
  - CSS and XSL can be used to render XML documents in a readable form
  - more on XML rendering next week
- XML provides simple syntactic rules that result in well-formed, hierarchically structured documents
- DTDs or XSDs are used to define valid XML languages
- DTDs
  - are widely supported
  - have limited features
- XSDs
  - are an XML language
  - provide tighter specification than DTDs
  - provide some support for namespaces

Questions