XML Fundamentals Cheng-Chia Chen.

XML Fundamentals Transparency No. 1 XML Fundamentals Cheng-Chia Chen November 2004.
Presentation transcript:

XML Fundamentals Cheng-Chia Chen

Contents
Well-formed XML: the concrete textual representation of XML
XML Data Model: the conceptual tree model
Namespaces: how does XML avoid name conflicts?

Well-formed XML Document
An XML document is a sequence of characters. Each character is an atomic unit of text as specified by ISO/IEC 10646 [Unicode].
An XML document is usually stored in a file with a .xml extension.
MIME media type: application/xml or text/xml
Ex:
<?xml version="1.0" encoding="UTF-8"?>
<student> 張得功 </student>

Characters used in XML
A character is an atomic unit of text as specified by ISO/IEC 10646 [ISO/IEC 10646]. Legal characters are tab, carriage return, line feed, and the legal graphic characters of Unicode and ISO/IEC 10646.
Character Range
[2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate blocks (#xD800-#xDFFF), #xFFFE, and #xFFFF */
The character encoding may vary from entity to entity. All XML processors must accept the UTF-8 and UTF-16 encodings.
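Production [2] can be checked mechanically. The following is a minimal Python sketch of that check; the function names `is_xml_char` and `first_illegal_char` are illustrative choices, not part of any XML library.

```python
def is_xml_char(code_point: int) -> bool:
    """Return True if code_point matches production [2] Char of XML 1.0."""
    return (
        code_point in (0x9, 0xA, 0xD)             # tab, line feed, carriage return
        or 0x20 <= code_point <= 0xD7FF           # up to just below the surrogates
        or 0xE000 <= code_point <= 0xFFFD         # excludes #xFFFE and #xFFFF
        or 0x10000 <= code_point <= 0x10FFFF      # supplementary planes
    )


def first_illegal_char(text: str) -> int:
    """Return the index of the first character that is not a legal Char, or -1."""
    for index, ch in enumerate(text):
        if not is_xml_char(ord(ch)):
            return index
    return -1
```

For instance, `first_illegal_char("ab\x00c")` points at the NUL character, which no XML 1.0 document may contain, even as a character reference.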

ASCII code
ASCII needs 7 bits of storage; codes 0-127 are used.

Whitespace
White Space: [3] S ::= (#x20 | #x9 | #xD | #xA)+
S (white space) consists of one or more space (#x20) characters, tabs, carriage returns or line feeds.
Whitespace can be used to separate otherwise indistinguishable parts of an XML document. Compare:
<student age="15">…</student>
<studentage="15">…</student>

XML Declaration
<?xml version="1.0" encoding="Big5" standalone="no" ?>
Besides using a file name extension, an XML document may use an XML declaration to identify itself as an XML document. If used, it must occur first in the document (no preceding whitespace allowed).
version: version of the XML specification, 1.0 or 1.1
encoding: character encoding of the document, expressed in Latin characters, e.g., UTF-8, UTF-16, iso-8859-1
standalone: no = parsing is affected by the external DTD subset; yes = not affected

Elements, tags and character data
The example:
<?xml version="1.0" encoding="UTF-8" ?>
<student> 張得功 </student>
is composed of a single element named student.
Start-tag: <student>
End-tag: </student>
Everything between the start-tag and the end-tag is called content. Content encompasses the real information. Whitespace is part of the content, though many applications will choose to ignore it.
<student> and </student> are markup; 張得功 and its surrounding whitespace are character data.
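To see the split between markup and character data in practice, the example can be parsed with Python's standard `xml.etree.ElementTree` module. This is only an illustrative sketch; the variable names are arbitrary.

```python
import xml.etree.ElementTree as ET

document = '<?xml version="1.0" encoding="UTF-8" ?>\n<student> 張得功 </student>'

# Parse from bytes so that the declared UTF-8 encoding is what the parser reads.
root = ET.fromstring(document.encode("utf-8"))

element_name = root.tag      # the name taken from the start-tag and end-tag
character_data = root.text   # the content, including surrounding whitespace
child_count = len(root)      # this element has no child elements
```

The parser keeps the leading and trailing spaces in `character_data`; deciding whether to strip them is left to the application.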

Structure of an element
Each XML document contains one or more elements, the boundaries of which are delimited either by start-tags and end-tags or, for empty elements, by an empty-element tag. Each element has a type, identified by name, and may have a set of attribute specifications. Each attribute specification has a name and a value.
The name used in the start-tag and the end-tag must be identical. Note: XML is case-sensitive, so <student> != <Student>.
Element
[39] element ::= EmptyElemTag | STag content ETag

Element (cont'd)
Content of Elements: the text between the start-tag and the end-tag is called the element's content:
[43] content ::= CharData? ((element | Reference | CDSect | PI | Comment) CharData?)*
[14] CharData ::= [^<&]* - ([^<&]* ']]>' [^<&]*)
i.e., any string containing none of <, & and ]]>.
If an element has empty content, it is represented either by a start-tag immediately followed by an end-tag or by an empty-element tag.
Tags for Empty Elements
[44] EmptyElemTag ::= '<' Name (S Attribute)* S? '/>'

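Examples of empty elements
<IMG align="left" src="http://www.w3.org/Icons/WWW/w3c_home" />
<IMG align="left" src="http://www.w3.org/Icons/WWW/w3c_home"></IMG>
1. <br></br>
2. <br/>
3. <br> </br>
Note: 1 = 2 != 3 (the third form has whitespace as its content, so it is not empty).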
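The "1 = 2 != 3" note can be observed with a parser. The sketch below uses Python's standard `xml.etree.ElementTree`, which reports the content of an empty element as `None`; the dictionary labels are illustrative only.

```python
import xml.etree.ElementTree as ET

forms = {
    "start-tag + end-tag": "<br></br>",
    "empty-element tag": "<br/>",
    "whitespace content": "<br> </br>",
}

# Map each form to the character data the parser reports for the element.
contents = {label: ET.fromstring(xml).text for label, xml in forms.items()}
```

The first two forms yield the same (absent) content, while the third carries a single space as character data.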

Start tag with attributes (in document) and end tag
<tag attributeName = "attribute-value" … > </tag>
tag: name (or type) of the element; the start-tag and end-tag names must match
attributeName: name of the attribute
attribute-value: value or values of the attribute, enclosed in single or double quotes (' or "), which must match
Each element may contain zero or more attributes.

Attributes
Attach additional information to elements.
An attribute is a name-value pair attached to an element's start-tag.
One element can have more than one attribute.
Name and value are separated by = and optional whitespace.
The attribute value is enclosed in double or single quotation marks:
<tel type="office">02-29381111</tel>
Attribute order is not significant:
<student age="20" gender="male"> 趙得勝 </student>
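As a small illustration of these points, the sketch below parses two spellings of the same start-tag with Python's standard `xml.etree.ElementTree`. The second spelling (reordered attributes, single quotes, spaces around `=`) is a variation made up for this example.

```python
import xml.etree.ElementTree as ET

# Same attributes, written in a different order, with single quotes
# and with optional whitespace around the equals sign.
first = ET.fromstring('<student age="20" gender="male"> 趙得勝 </student>')
second = ET.fromstring("<student gender = 'male' age = '20'> 趙得勝 </student>")

first_attributes = dict(first.attrib)
second_attributes = dict(second.attrib)
```

Both parses produce the same name-value mapping, which is why applications should not rely on attribute order or on the kind of quote used.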

Start Tag
Start-tag
[40] STag ::= '<' Name (S Attribute)* S? '>' [ WFC: Unique Att Spec ]
[41] Attribute ::= Name Eq AttValue
Example: <termdef id="dt-dog" term="dog">
End-tag
[42] ETag ::= '</' Name S? '>'
Legal: </termdef> and </termdef >; illegal: </ termdef> and < /termdef>

XML Names
Rules for naming elements, attributes…
Names and Tokens
[4] NameChar ::= Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender
[5] Name ::= (Letter | '_' | ':') (NameChar)*
[6] Names ::= Name ( #x20 Name)*
[7] Nmtoken ::= (NameChar)+
[8] Nmtokens ::= Nmtoken (#x20 Nmtoken)*
Names beginning with (x|X)(m|M)(l|L) are reserved.
Name is used for naming elements, attributes, entities, etc.
Nmtoken (Nmtokens) is used for the values of special attributes (ID, IDREFS, NMTOKEN, NMTOKENS).
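The productions above can be approximated with regular expressions. The sketch below is deliberately simplified: it covers ASCII letters and digits only and ignores the full Unicode Letter, CombiningChar and Extender classes, so it rejects some names that the real grammar accepts. All identifiers are illustrative.

```python
import re

# ASCII-only approximations of productions [5] Name and [7] Nmtoken.
ASCII_NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9._:\-]*")
ASCII_NMTOKEN = re.compile(r"[A-Za-z0-9._:\-]+")


def is_ascii_name(text: str) -> bool:
    """True if text is a Name made of ASCII characters only."""
    return ASCII_NAME.fullmatch(text) is not None


def is_ascii_nmtoken(text: str) -> bool:
    """True if text is an Nmtoken made of ASCII characters only."""
    return ASCII_NMTOKEN.fullmatch(text) is not None


def is_reserved(name: str) -> bool:
    """True if the name starts with 'xml' in any mix of upper and lower case."""
    return name[:3].lower() == "xml"
```

Note that every Name is also an Nmtoken, but not the other way round: `1abc` is a valid Nmtoken and an invalid Name because a Name may not start with a digit.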

AttValues (attribute value literals) are the strings that can occur as attribute values.
[10] AttValue ::= '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'"
They are enclosed by double or single quotes, and can contain entity/character references (see later slide) or any character data excluding <, & and the enclosing quote character (' or ").

Comments
Comments may appear
1. anywhere in a document outside other markup;
2. within the document type declaration at places allowed by the grammar.
They are not part of the document's character data. The string "--" (double-hyphen) must not occur within comments.
Comments
[15] Comment ::= '<!--' ( (Char - '-') | ('-' (Char - '-')) )* '-->'
Example:
<!-- declarations for <head> & <body> -->
<error <!-- comments cannot appear here! --> a="aa"> ..

Processing Instructions (PIs)
Processing instructions (PIs) allow documents to contain instructions for applications.
Processing Instructions:
[16] PI ::= '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'
[17] PITarget ::= Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))
The PI begins with a target (PITarget) used to identify the application. The remaining part is called PIData, which must not contain the substring "?>". The target names "XML", "xml", and so on are reserved for standardization.
Ex: <?xml-stylesheet type="text/css" href="style.css" ?>
xml-stylesheet is reserved for associating style sheets (e.g., CSS or XSLT) with a document.

Processing Instruction and comment
<?PItarget ***other stuff*** ?> may contain any characters except the substring "?>"
<!-- this is an explanation or comment --> may contain any characters except the string "--"

XML Document
[1] document ::= prolog element Misc*
The element is called the root or document element of the document.
[22] prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?
[23] XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
[27] Misc ::= Comment | PI | S

Character references
What if the character data inside an element contains < or & ?
<expr> x+1 < z </expr>
Instead of using '<', we can use a reference to its character code (60):
&#60; : decimal (#60)
&#x3c; or &#x3C; : hexadecimal (#x3c)
Rule: if C is a character with code point dddd (decimal) or yyyy (hexadecimal), then we can represent C using &#dddd; or &#xyyyy;
Cf: in C or Java, we use \t or \011 to represent HT (#x9), and \\ or \x5c to represent the backslash \ (#x5c).
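The rule can be tried out with Python's standard `xml.etree.ElementTree`: both the decimal and the hexadecimal reference are replaced by the character they denote during parsing. The helper `char_ref` is a hypothetical name introduced here to show the reverse direction.

```python
import xml.etree.ElementTree as ET

# Decimal and hexadecimal references to the same code point (60 = 0x3C).
decimal = ET.fromstring("<expr> x+1 &#60; z </expr>")
hexadecimal = ET.fromstring("<expr> x+1 &#x3C; z </expr>")


def char_ref(ch: str, use_hex: bool = False) -> str:
    """Build a character reference for a single character."""
    return f"&#x{ord(ch):X};" if use_hex else f"&#{ord(ch)};"
```

After parsing, `decimal.text` and `hexadecimal.text` are identical; the application never sees which notation the author used.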

Entity reference
Numeric codes are hard to remember, so we can use a name to denote a character or a string. Such a name is called an entity.
Entity reference: if xxx is an entity, then &xxx; is its entity reference.
While parsing an XML document, the XML processor replaces every entity reference it encounters with the entity's replacement text.
XML predefines 5 entities; you can define your own.
&lt; : the less-than sign (<)
&amp; : the ampersand (&)
&gt; : the greater-than sign (>), not needed in general
&quot; : the straight double quotation mark (")
&apos; : the straight single quote (')
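When generating XML from a program, the predefined entities are usually produced by a helper rather than typed by hand. A minimal sketch using `escape` and `unescape` from Python's standard `xml.sax.saxutils` follows; by default these functions handle only &, < and >, so the two quote entities are passed in explicitly. The sample string is made up for this example.

```python
from xml.sax.saxutils import escape, unescape

raw = 'x < y && z > "1"'

# escape() always handles &, < and >; the quote entities are supplied by us.
escaped = escape(raw, {'"': "&quot;", "'": "&apos;"})

# unescape() reverses the mapping, given the same extra entities.
restored = unescape(escaped, {"&quot;": '"', "&apos;": "'"})
```

Round-tripping through `escape` and `unescape` returns the original string unchanged.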

CDATA Section
What if my element content has a lot of special characters?
Ex: <expr> x < y && z < 1 </expr>
Solution 1: <expr> x &lt; y &amp;&amp; z &lt; 1 </expr>
Hard to read/comprehend.
Solution 2: <expr><![CDATA[ x < y && z < 1 ]]></expr>

CDATA Sections
CDATA sections may occur as part of the content of an element; they are used to escape blocks of text containing many special characters. They begin with the string "<![CDATA[" and end with the string "]]>":
CDATA Sections
[18] CDSect ::= CDStart CData CDEnd
[19] CDStart ::= '<![CDATA['
[20] CData ::= (Char* - (Char* ']]>' Char*))
[21] CDEnd ::= ']]>'
What cannot occur inside a CDATA section? Ans: ']]>'
Every character inside a CDATA section is recognized as a literal character, so '<' and '&' may and must occur in their literal form.
Example: <![CDATA[<greeting>Hello, world!</greeting>]]>
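Entity references and CDATA sections are two notations for the same character data. The sketch below parses both spellings of the x < y && z < 1 expression with Python's standard `xml.etree.ElementTree`, which does not record whether text came from a CDATA section.

```python
import xml.etree.ElementTree as ET

# Solution 1: escape each special character with an entity reference.
with_entities = ET.fromstring("<expr> x &lt; y &amp;&amp; z &lt; 1 </expr>")

# Solution 2: wrap the whole text in a CDATA section.
with_cdata = ET.fromstring("<expr><![CDATA[ x < y && z < 1 ]]></expr>")

entity_text = with_entities.text
cdata_text = with_cdata.text
```

Both parses deliver the same string to the application, so the choice between the two is a matter of readability for whoever writes the document.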

Character Data and Markup
An XML document consists of intermingled character data and markup.
Markup takes the form of start-tags, end-tags, empty-element tags, entity references, character references, comments, CDATA section delimiters, document type declarations, processing instructions, XML declarations, text declarations and white space outside the root element.
All text that is not markup constitutes the character data of the document, i.e., it may occur in the content of an element or in the content of a CDATA section.

Character Data and Markup (cont'd)
In the content of elements, character data is any string of characters that does not contain the start-delimiter (< or &) of any markup.
In a CDATA section, character data is any string of characters not including the CDATA-section-close delimiter, "]]>".
To allow attribute values to contain both single and double quotes, the apostrophe or single-quote character (') may be represented as "&apos;", and the double-quote character (") as "&quot;".
Character Data: [14] CharData ::= [^<&]* - ([^<&]* ']]>' [^<&]*)
i.e., any string containing none of <, & and ]]>.

Possible contents of an element [39] element ::= EmptyElemTag | STag content ETag Content of Elements [43] content ::= CharData? ((element | Reference | CDSect | PI | Comment) CharData?)* In addition to char data and child elements, an element may contain as children also references, PIs, comments or CDATA sections.

General rules for well-formed XML Documents
Rule 1: Balanced start and end tags
The set of tags is unlimited, but all start tags must have matching end tags.
Example of legal XML: <student> <name> DeTsi Wang</name> <email> wang@cs.nccu.edu.tw</email> <age> 20 </age> </student>
Rule 2: There must be exactly one root element

Rules for well-formed XML Documents Rule 3: Proper element nesting All tags must be nested correctly. Like HTML, XML can intermix tags and text, but tags may not overlap each other. Legal XML <student> <name> DeTsi Wang</name> <email> wang@cs.nccu.edu.tw</email> <age> 20 </age> </student> Illegal XML <b><i>This text is bold and italic</b> and italic</i>

Rules for well-formed XML Documents
Rule 4: Attribute values must be single or double quoted
Legal: <tag attribute="value"> <tag attribute='value'>
Illegal: <font size=6> <font size="60'>
Rule 5: An element may not have two attributes with the same name
Illegal: <font size="6" size = "10"/>
Rule 6: Comments and processing instructions may not appear inside tags
Illegal: <font <!-- error comment --> size = "6" />
Rule 7: No unescaped < or & signs may occur in the character data of an element or in attribute values
Illegal: <font size="<20"> 20&3 </font>
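The rules above are what any XML parser enforces before handing data to an application. As a rough sketch, well-formedness can be tested in Python by attempting a parse with the standard `xml.etree.ElementTree` module and catching its `ParseError`; the helper name `is_well_formed` is an assumption made for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts text as a well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# One sample per rule; each string is a complete (attempted) document.
samples = {
    "legal": "<student><name>DeTsi Wang</name><age>20</age></student>",
    "two roots": "<a/><b/>",
    "overlapping tags": "<b><i>bold and italic</b> and italic</i>",
    "unquoted value": "<font size=6/>",
    "duplicate attribute": '<font size="6" size="10"/>',
    "comment inside tag": '<font <!-- error comment --> size="6"/>',
    "unescaped < and &": '<font size="<20"> 20&3 </font>',
}
results = {label: is_well_formed(xml) for label, xml in samples.items()}
```

Only the first sample is accepted. A well-formedness check of this kind says nothing about whether the document follows a DTD or schema.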

An example XML document: Recipes in XML
Define our own "Recipe Markup Language".
Choose markup tags that correspond to concepts in this application domain: recipe, ingredient, amount, ...
No canonical choices:
granularity of markup? simply <date>14 Jun 95</date> or <date><y>95</y><m>6</m><d>14</d></date>
structuring?
elements or attributes?
...

Example (1/2) <collection> <description>Recipes suggested by Jane Dow</description> <recipe id="r117"> <title>Rhubarb Cobbler</title> <date>Wed, 14 Jun 95</date> <ingredient name="diced rhubarb" amount="2.5" unit="cup"/> <ingredient name="sugar" amount="2" unit="tablespoon"/> <ingredient name="fairly ripe banana" amount="2"/> <ingredient name="cinnamon" amount="0.25" unit="teaspoon"/> <ingredient name="nutmeg" amount="1" unit="dash"/> <preparation> <step> Combine all and use as cobbler, pie, or crisp. </step> </preparation>

Example (2/2) <comment> Rhubarb Cobbler made with bananas as the main sweetener. It was delicious. </comment> <nutrition calories="170" fat="28%" carbohydrates="58%" protein="14%"/> <related ref="42">Garden Quiche is also yummy</related> </recipe> </collection>

Building on the XML Notation
Defining the syntax of our recipe language: DTD, XML Schema, ...
Showing recipe documents in browsers: XPath, XSLT
Recipe collections as databases: XQuery
Building a Web-based recipe editor: HTTP, Servlets, JSP, ...
...

XML data models
An XML document may contain a lot of information that not all applications need or want to use.
e.g., do <abc>abc <![CDATA[ def ]]> end</abc> and <abc>abc def end</abc> need to be differentiated?
XML data models are abstracted views of XML documents, in which unneeded information of an XML document is ignored.
There is more than one XML data model: DOM (Document Object Model), XPath 1.0, XPath 2.0, XML Information Set, …
All use a tree structure to model an XML document, though we could also model XML documents as graphs.

XML Trees
Conceptually, an XML document is a tree structure:
node, edge
root, leaf
child, parent
sibling (ordered), ancestor, descendant

An Analogy: File Systems

Tree View of the XML Recipes

Nodes in XML Trees
Root nodes: every XML tree has one root node that represents the entire tree
Element nodes: define hierarchical logical groupings of contents; each has a name
Text nodes: carry the actual contents; leaf nodes
Attribute nodes: unordered; each is associated with an element node and has a name and a value
Namespace nodes: the effective namespaces associated with an element
Comment nodes: ignorable meta-information
Processing instructions: instructions to specific processors; each has a target and a value

Types of node in an XML tree The tree contains nodes. Types of nodes and their possible children: root nodes : element ( = 1), comment, PI element nodes: element, text, PI, comment, [attribute, namespace] text nodes: leaves attribute nodes : leaves namespace nodes: leaves processing instruction nodes : leaves comment nodes : leaves
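The node types listed above can be inspected with a DOM implementation. The sketch below uses Python's standard `xml.dom.minidom`; the tiny document and the `NODE_KINDS` table are made up for this example, and DOM is only one of the data models mentioned earlier (it has no separate namespace nodes, for instance).

```python
from xml.dom import Node, minidom

doc = minidom.parseString(
    '<?xml version="1.0"?><!-- note --><?app data?>'
    '<student age="20">Wang<email/></student>'
)

# Human-readable labels for the DOM node type constants used here.
NODE_KINDS = {
    Node.ELEMENT_NODE: "element",
    Node.TEXT_NODE: "text",
    Node.COMMENT_NODE: "comment",
    Node.PROCESSING_INSTRUCTION_NODE: "processing instruction",
}

# Children of the root (document) node: comments, PIs and exactly one element.
top_level = [NODE_KINDS[node.nodeType] for node in doc.childNodes]

# Children of the element node: text and a child element, but no attributes.
root = doc.documentElement
children = [NODE_KINDS[node.nodeType] for node in root.childNodes]
attribute_count = root.attributes.length
```

Attributes hang off their element without being among its children, which matches the bracketed "[attribute, namespace]" entry in the list above.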

XML Applications
Rough classification:
Data-oriented languages: inventory, customer and employee records in a company; regular, flat, wide trees; traditionally stored in databases
Document-oriented languages: XHTML, DocBook, WML, XML formats of Word and OpenOffice; loosely structured, tags ignorable, mixed content
Protocols and programming languages: XML Schema, XSLT, WSDL, ebXML, XMI, BML
Hybrids: patient record (billing info; notes from doctor), article collection (isbn, name; abstract)

Example: XHTML
XMLification of HTML:
end tags must not be omitted
element/attribute names are all in lower case
attribute values must be present and quoted
decomposed into modules reusable by other applications
<?xml version="1.0" encoding="UTF-8"?> <html xmlns="http://www.w3.org/1999/xhtml"> <head><title>Hello world!</title></head> <body> <h1>This is a heading</h1> This is some text. </body> </html>

Example: CML
CML: an XML-based data-oriented language for the representation of molecules and chemical reactions. Example: methanol.
<molecule id="METHANOL"> <atomArray> <stringArray builtin="id">a1 a2 a3 a4 a5 a6</stringArray> <stringArray builtin="elementType">C O H H H H</stringArray> <floatArray builtin="x3" units="pm"> -0.748 0.558 ... </floatArray> <floatArray builtin="y3" units="pm"> -0.015 0.420 ... <floatArray builtin="z3" units="pm"> 0.024 -0.278 ... </atomArray> </molecule>

Example: ebXML
ebXML: a worldwide initiative aiming to utilize XML for the exchange of electronic business data. It has delivered many XML standards for business processes, core data components, collaboration protocol agreements, messaging, registries and repositories.
<MultiPartyCollaboration name="DropShip"> <BusinessPartnerRole name="Customer"> <Performs initiatingRole='//binaryCollaboration[@name="Firm Order"]/ InitiatingRole[@name="buyer"]' /> </BusinessPartnerRole> <BusinessPartnerRole name="Retailer"> <Performs respondingRole='//binaryCollaboration[@name="Firm Order"]/ RespondingRole[@name="seller"]' /> <Performs initiatingRole='//binaryCollaboration[...]/ <BusinessPartnerRole name="DropShip Vendor"> ... </MultiPartyCollaboration>

Example: ThML
An XML-based markup language for theological texts.
<h3 class="s05" id="One.2.p0.2">Having a Humble Opinion of Self</h3> <p class="First" id="One.2.p0.3">EVERY man naturally desires knowledge <note place="foot" id="One.2.p0.4"> <p class="Footnote" id="One.2.p0.5"><added id="One.2.p0.6"> <name id="One.2.p0.7">Aristotle</name>, Metaphysics, i. 1. </added></p> </note>; but what good is knowledge without fear of God? Indeed a humble rustic who serves God is better than a proud intellectual who neglects his soul to study the course of the stars. <added id="One.2.p0.8"><note place="foot" id="One.2.p0.9"> <p class="Footnote" id="One.2.p0.10"> Augustine, Confessions V. 4. </p> </note></added>

XML Namespace

Motivation
Name clashes. Consider an XML language WidgetML which uses XHTML as a sublanguage for help messages:
<widget type="gadget"> <head size="medium"/> <body><subwidget ref="gizmo"/></body> <info> <head><title>Description of gadget</title> </head> <body><h1>Gadget</h1> A gadget contains a big gizmo </body> </info></widget>
The meanings of head and body depend on context! This complicates things for processors and might even cause ambiguities.
The solution: different namespaces for different uses of the same name.

The Idea
Assign a namespace to each set of elements/attributes (which forms an XML language).
Each namespace is identified and referenced by a URI, e.g., http://www.w3.org/1999/xhtml
Qualify every element/attribute name with the URI of its namespace: {http://www.w3.org/1999/xhtml}head => name = namespace URI + local part
XHTML names: html, head, body, h1, …
WidgetML names: widget, head, body, info, …

Problems for qualifying names
A URI as part of a name would use too much space, since it is usually a long string.
Not all URIs are legal attribute/element names: XML names restrict the use of special characters (. : _ - are allowed; / # % … are not).
Solution: use a namespace prefix as a proxy for the namespace URI: xmlns:aPfx = "aURI"
Notes: URI = URL ∪ URN (extended to IRI in version 1.1). The URI here is used only for identification; it doesn't have to point at anything.

Namespace declarations
Namespaces are declared, together with their associated prefixes, by special namespace attributes (xmlns:prefix or xmlns).
Example: <foo:e1 xmlns:foo="http://www.w3.org/TR/xhtml1"> ... <foo:head>...</foo:head> </...>
xmlns:prefix1="URI1" declares a namespace with the prefix prefix1 and the URI URI1.
Scope rule: lexical. A namespace declaration takes effect on the element containing the declaration and on all its descendants, unless it is overridden by a declaration in a nested element.
Both element and attribute names can be qualified with namespaces.
Note: the prefix is just a proxy; applications should use only the URI for identification.
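Python's standard `xml.etree.ElementTree` happens to expose qualified names in the {URI}local-part notation, which makes the "prefix is just a proxy" point easy to check. In this sketch the prefixes `h` and `x` are arbitrary choices made for the example.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"

# Two documents that bind different prefixes to the same namespace URI.
with_h = ET.fromstring(f'<h:html xmlns:h="{XHTML}"><h:head/></h:html>')
with_x = ET.fromstring(f'<x:html xmlns:x="{XHTML}"><x:head/></x:html>')

# The parser reports each name as "{namespace URI}local part".
head_name_h = with_h[0].tag
head_name_x = with_x[0].tag
```

Both child elements end up with the same expanded name, so code that compares expanded names is unaffected by the author's choice of prefix.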

The default namespace
For backward compatibility and simplicity.
Declaration: xmlns="aURI"
Unprefixed element names are assigned the default namespace aURI.
It can be disabled by xmlns="".
A default namespace declaration has no effect on attributes.

Example
<ex xmlns="http://test.com/" att1="…" xmlns:s="http://test.com/" >
<ex att1="abc" > … </ex>
<ex xmlns="" s:att1="abc">…</ex>
</ex>
Notes
1. The 1st and 2nd <ex> belong to the same namespace (http://test.com/), but the 3rd <ex> belongs to no namespace.
2. The global attribute s:att1 belongs to the namespace http://test.com/.
3. Both att1 attributes are local in the sense that they belong to the local namespace of ex and are different from s:att1.
4. Note the asymmetry of the default namespace on elements and attributes.
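The four notes can be verified by parsing the example with Python's standard `xml.etree.ElementTree`. In this sketch the elided attribute value and element contents are replaced with the placeholder values "1", "a" and "b", which are assumptions made only so that the document is complete.

```python
import xml.etree.ElementTree as ET

outer = ET.fromstring(
    '<ex xmlns="http://test.com/" att1="1" xmlns:s="http://test.com/">'
    '<ex att1="abc">a</ex>'
    '<ex xmlns="" s:att1="abc">b</ex>'
    '</ex>'
)
second, third = outer[0], outer[1]

element_names = [outer.tag, second.tag, third.tag]
outer_attributes = dict(outer.attrib)   # unprefixed attribute: no namespace
third_attributes = dict(third.attrib)   # prefixed attribute: in the namespace
```

The unprefixed `att1` keeps a bare name even though a default namespace is in scope, while `s:att1` is reported with the namespace URI attached.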

An example: WidgetML with namespaces
<widget xmlns="http://www.widget.org" xmlns:xhtml="http://www.w3.org/TR/xhtml1" type="gadget"> <head size="medium"/> <body><subwidget ref="gizmo"/></body> <info><xhtml:head> <xhtml:title>Description of gadget</xhtml:title> </xhtml:head> <xhtml:body> <xhtml:h1>Gadget</xhtml:h1> A gadget contains a big gizmo </xhtml:body> </info></widget>
The main part of WidgetML uses the default namespace, which has the URI http://www.widget.org; XHTML uses the namespace prefix xhtml, which is assigned the URI http://www.w3.org/TR/xhtml1.

Notes related to XML namespaces Namespace awareness. XML languages and applications should consider Namespaces as an inherent part of XML. Reserve colon (:) as a prefix/localpart separator and do not use it in your element/attribute names or any other names. It is the namespace URI instead of namespace prefix that is used for identifying a namespace. URI references which identify namespaces are considered identical only when they are exactly the same character-for-character. E.g. 1. http://a.b.c/~wine/d , 2. http://a.B.c/%7Ewine/d, 3. http://a.b.c/%7ewine/d , 4. d (relative URI deprecated) All 4 URIs are treated as equal in URI spec, but are seen as different namespace URIs in xml namespace. Note: Relative URI should not be used.

Uniqueness of Attributes Why are the <bad …> tags illegal ? <x xmlns:n1="http://www.w3.org" xmlns:n2="http://www.w3.org" > <bad a="1" a="2" /> <bad n1:a="1" n2:a="2" /> </x>

Uniqueness of Attributes (cont’d) Both <good … /> elements are legal. Why ? <x xmlns:n1="http://www.w3.org" xmlns="http://www.w3.org" > <good a="1" b="2" /> <good a="1" n1:a="2" /> </x>
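Both questions come down to comparing expanded names (namespace URI plus local part) rather than the names as written. A small sketch with Python's standard `xml.etree.ElementTree`, whose underlying parser is namespace-aware, illustrates this; the helper `parses` is a name invented for this example.

```python
import xml.etree.ElementTree as ET

NS_DECLS = 'xmlns:n1="http://www.w3.org" xmlns:n2="http://www.w3.org"'


def parses(text: str) -> bool:
    """Return True if the namespace-aware parser accepts the document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Same name as written, and same expanded name via two different prefixes.
same_plain_name = parses(f'<x {NS_DECLS}><bad a="1" a="2" /></x>')
same_expanded_name = parses(f'<x {NS_DECLS}><bad n1:a="1" n2:a="2" /></x>')

# Default namespaces do not apply to attributes, so a and n1:a differ.
good = ET.fromstring(
    '<x xmlns:n1="http://www.w3.org" xmlns="http://www.w3.org">'
    '<good a="1" b="2" /><good a="1" n1:a="2" /></x>'
)
second_good_attributes = dict(good[1].attrib)
```

In the second `<good>` element the unprefixed `a` has no namespace, whereas `n1:a` carries the namespace URI, so the two attributes have different expanded names and may coexist.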

Summary
XML: a notation for hierarchically structured text
Concrete textual representation and well-formedness
Conceptual tree model
Namespaces

Essential Online Resources http://www.w3.org/TR/xml11/ http://www.w3.org/TR/xml-names11 http://www.unicode.org/