XML eXtensible Markup Language by Darrell Payne
Experience: Logicon / Sterling Federal; C, C++, JavaScript/JScript, Shell Script, Perl. XML Training: XML Training Course 2001; DevXCon Training Conference. Currently developing an XML course for Logicon.
XML eXtensible Markup Language. Standard Generalized Markup Language (SGML): meta-markup language, used for creating other markup languages; adopted as a standard in 1986. HyperText Markup Language (HTML): application of SGML; formatting language. eXtensible Markup Language (XML): meta-markup language; XML = DATA. World Wide Web Consortium (W3C): ((!Standard) && (Specification)) // “C” code humor: the W3C publishes specifications (Recommendations), not standards. XML Version 1.0: February 1998. XML is not designed to replace HTML.
SGML – HTML – XML diagram: SGML and XML (meta-languages); HTML, an application of SGML; WML, an application of XML.
XML Family of Tools and Their Relationship (diagram): Namespace, SAX, DOM, XML Infoset, XML Schema, XPath, XSL, XSLT, XPointer, XLink, grouped as Style and Transformation; Linking and Pointing; Underlying and Object Model; Programmatic Interface; Complex Data Modeling. Source: XML Developer's Guide, McGraw Hill, page 12.
HTML vs. XML. HTML: predefined tags; loose syntax; file extensions usually “.html” or “.htm”; not required to be well-formed; some closing tags optional; attribute value quotes may be omitted. XML: user-defined tags; exact syntax; file extension usually “.xml”; closing tags mandatory; required to be well-formed.
Well-Formed: all XML documents must be syntactically correct! Single root element; every element start tag has a matching end tag; XML is case sensitive; properly nested tags //error //correct; attribute values in quotes: "value" or 'value'.
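A minimal sketch in Python (one of the parser languages named on the XML Parser slide), assuming the standard-library `xml.etree.ElementTree` parser: any violation of the rules above makes parsing raise `ParseError`. The helper `is_well_formed` and the element names in the samples are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Each sample follows (or breaks) one rule from the Well-Formed slide.
samples = {
    "correct": "<note><b><i>text</i></b></note>",
    "improper nesting": "<note><b><i>text</b></i></note>",
    "two root elements": "<note/><note/>",
    "missing end tag": "<note><b>text</note>",
    "case mismatch": "<Note>text</note>",
    "unquoted attribute": "<note id=1>text</note>",
}

for label, doc in samples.items():
    print(f"{label}: {is_well_formed(doc)}")
```

Only the "correct" sample prints `True`; every other sample breaks exactly one rule.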
Basic XML Parts. Markup: tags; attributes (names and values). Character data: text (PCDATA, CDATA); binary. An XML document has two main sections: the prolog and the root element. Misc (comments or processing instructions after the root element): optional and considered superfluous.
Simple XML File Hello World! <!-- More Comments -->
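Declaration: the XML declaration itself is optional. If it is used, the version attribute is required; it specifies the XML version to which the document conforms. XML documents without an XML declaration are treated as XML version 1.0. Other declaration attributes: encoding (optional): “UTF-8” is the default, good for ASCII text, 8-bit code units; “UTF-16” is good for foreign scripts, 16-bit code units; both are used for Unicode characters, and every XML parser must accept both, so to stay uniform use UTF-8 or UTF-16. standalone (optional): “yes” means no external subset (external markup declarations) is referenced; if external declarations exist and standalone is omitted, “no” is assumed.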
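A sketch, assuming Python's `xml.etree.ElementTree`, of the encoding attribute at work: the same hypothetical one-element document (element name `greeting` is made up) is saved once as UTF-8 and once as UTF-16, and the parser reads identical characters back from both.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; only the declared encoding differs between copies.
TEMPLATE = '<?xml version="1.0" encoding="{enc}"?>\n<greeting>Hello World! \u00e9</greeting>'

utf8_bytes = TEMPLATE.format(enc="UTF-8").encode("utf-8")
# Python's "utf-16" codec writes a byte-order mark, which the parser
# uses to detect the byte order of the 16-bit code units.
utf16_bytes = TEMPLATE.format(enc="UTF-16").encode("utf-16")

root8 = ET.fromstring(utf8_bytes)
root16 = ET.fromstring(utf16_bytes)

print(len(utf8_bytes), len(utf16_bytes))  # the UTF-16 copy is larger
print(root8.text == root16.text)          # same characters either way
```

For mostly ASCII text the UTF-8 file is roughly half the size, which is one reason UTF-8 is the usual choice.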
Comments: <!-- More Comments --> XML uses the same comment syntax as HTML.
Root Element: lines preceding the root element are contained in the prolog. Every XML document must contain exactly one root element. All other elements are nested inside it as “child elements”.
Child Element Hello World!
Sibling Element Hello World! Goodbye World! Nothing more to add!
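The root, child, and sibling relationships can be sketched with Python's `xml.etree.ElementTree`. The root name `document` is borrowed from the DOCTYPE on the Internal Subset slide; `greeting`, `farewell`, and `closing` are hypothetical, since the original markup did not survive.

```python
import xml.etree.ElementTree as ET

# Text comes from the slides; the child element names are hypothetical.
doc = """<?xml version="1.0"?>
<document>
  <greeting>Hello World!</greeting>
  <farewell>Goodbye World!</farewell>
  <closing>Nothing more to add!</closing>
</document>"""

root = ET.fromstring(doc)
print("root:", root.tag)

children = list(root)  # direct children of the root, in document order
for i, child in enumerate(children):
    # Siblings are the other children of the same parent.
    siblings = [c.tag for c in children if c is not child]
    print(f"child {i}: <{child.tag}> text={child.text!r} siblings={siblings}")
```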
Updating Microsoft’s Internet Explorer: instmsia.exe updates Microsoft’s Windows Installer; msxml3sp1.exe updates the XML parser used by Internet Explorer (MSXML 3.0 SP1). IE now has a built-in XML parser, “msxml”.
Create XML Document: include the declaration; create the root element; create a child element; enter the child element text “student name”; save the file with a “.xml” extension; open it using Internet Explorer. After success, add sibling elements and retest using Internet Explorer.
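A scripted version of this exercise, as a sketch using Python's `xml.etree.ElementTree` in place of a text editor. The file name `students.xml` and the element names `students`/`student` are hypothetical; the file is written to a temporary directory so the sketch runs anywhere.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical names: root <students>, child <student>.
root = ET.Element("students")
ET.SubElement(root, "student").text = "student name"

path = os.path.join(tempfile.mkdtemp(), "students.xml")
# xml_declaration=True writes an XML declaration before the root element.
ET.ElementTree(root).write(path, encoding="UTF-8", xml_declaration=True)

# "After success, add sibling elements and retest": reopen, append, re-save.
tree = ET.parse(path)
ET.SubElement(tree.getroot(), "student").text = "another student"
tree.write(path, encoding="UTF-8", xml_declaration=True)

with open(path, encoding="utf-8") as f:
    print(f.read())
```

`ET.parse` doubles as the retest: if the saved file were not well-formed, it would raise `ParseError` instead of returning a tree.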
Document viewed in Microsoft's Internet Explorer
More about Elements. Element types: Container element: contains other elements. Data element: contains DATA. Mixed content: contains other elements and DATA. Empty element: contains no elements or DATA.
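One way to tell the four element types apart programmatically, as a sketch over Python's `xml.etree.ElementTree`. The helper `element_type` is hypothetical, not a library function: it checks whether an element has child elements and whether it has non-whitespace character data of its own. Treating whitespace-only text as "no DATA" is an assumption.

```python
import xml.etree.ElementTree as ET


def has_text(s):
    return s is not None and s.strip() != ""


def element_type(elem: ET.Element) -> str:
    """Classify an element using the four types on the slide."""
    has_children = len(elem) > 0
    # Character data directly inside elem: its .text plus each child's .tail.
    has_data = has_text(elem.text) or any(has_text(c.tail) for c in elem)
    if has_children and has_data:
        return "mixed"
    if has_children:
        return "container"
    if has_data:
        return "data"
    return "empty"


doc = ET.fromstring(
    "<root>"
    "<box><item>one</item></box>"
    "<name>student name</name>"
    "<para>outer <b>inner</b> more outer</para>"
    "<br/>"
    "</root>"
)
for child in doc:
    print(child.tag, element_type(child))
```

The four children print as container, data, mixed and empty, in that order.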
Container Element Contains other elements Some text way down here in the center of it all
Data Element: contains DATA, either Parsed Character Data (PCDATA) or Character Data (CDATA).
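PCDATA: contains text; is parsed by the parser; cannot contain the literal characters < or &, which must be written as entity references (> " and ' may also be escaped, and must be when they would otherwise end markup or an attribute value).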
Entity References: XML provides five built-in entity references: &lt; (<), &gt; (>), &quot; ("), &apos; ('), &amp; (&).
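A sketch of the five built-in references in Python: `xml.sax.saxutils.escape` replaces `&`, `<` and `>` by default, an extra mapping adds the two quote characters, and parsing with `xml.etree.ElementTree` turns all five references back into the original characters. The element name `t` is hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = """5 < 6 && "quoted" > 'single'"""

# escape() handles &, < and > by default; the mapping adds " and '.
escaped = escape(raw, {'"': "&quot;", "'": "&apos;"})
print(escaped)
# 5 &lt; 6 &amp;&amp; &quot;quoted&quot; &gt; &apos;single&apos;

# The parser replaces the built-in references with their characters.
elem = ET.fromstring(f"<t>{escaped}</t>")
print(elem.text == raw)  # True
```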
CDATA: contains text; is a CDATA section, not parsed for markup; can contain the reserved characters <, >, ", ' and &; starts with <![CDATA[ and ends with ]]>, e.g. <![CDATA[ Data would be here ]]>; a CDATA section cannot contain the string ]]>.
Declarations
Why a CDATA section? “C++” code example. CDATA example: <![CDATA[ if (this->getX() < 5 && array1[0] != 3) cerr << this->displayError(); ]]> PCDATA example: if (this-&gt;getX() &lt; 5 &amp;&amp; array1[0] != 3) cerr &lt;&lt; this-&gt;displayError();
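The point of this slide, checked as a sketch with Python's `xml.etree.ElementTree`: the CDATA version and the escaped PCDATA version of the C++ line parse to the same text, while the same line left unescaped in PCDATA is not well-formed. The element name `code` is hypothetical.

```python
import xml.etree.ElementTree as ET

cpp = "if (this->getX() < 5 && array1[0] != 3) cerr << this->displayError();"

# The same C++ line written two ways.
as_cdata = f"<code><![CDATA[{cpp}]]></code>"
as_pcdata = (
    "<code>if (this-&gt;getX() &lt; 5 &amp;&amp; array1[0] != 3) "
    "cerr &lt;&lt; this-&gt;displayError();</code>"
)

# Both parse to identical character data; CDATA just saves the escaping.
print(ET.fromstring(as_cdata).text == ET.fromstring(as_pcdata).text == cpp)  # True

# Unescaped in PCDATA, "<" is read as the start of a tag and parsing fails.
try:
    ET.fromstring(f"<code>{cpp}</code>")
except ET.ParseError as err:
    print("not well-formed:", err)
```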
Mixed Content Elements and PCDATA combined outer element stuff inner element stuff more outer element stuff
Empty Element: contains no text or data; may have attributes; shortcut notation for an empty element. Does this look familiar? An HTML example of this type of tag: //Not Well-Formed //Well-Formed
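A sketch in Python's `xml.etree.ElementTree` comparing the two empty-element spellings; the `img` element and its `src` attribute are hypothetical. An HTML-style `<br>` without an end tag is included to show why XML needs the shortcut notation.

```python
import xml.etree.ElementTree as ET

long_form = ET.fromstring('<img src="logo.gif"></img>')
short_form = ET.fromstring('<img src="logo.gif"/>')

# Both spellings produce the same empty element with one attribute.
print(long_form.attrib == short_form.attrib, len(short_form), short_form.text)
# True 0 None

# ElementTree serializes empty elements with the shortcut notation.
print(ET.tostring(short_form, encoding="unicode"))  # <img src="logo.gif" />

# HTML-style <br> with no end tag is not well-formed XML.
try:
    ET.fromstring("<p>line one<br>line two</p>")
except ET.ParseError as err:
    print("not well-formed:", err)
```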
Elements: an element consists of a start-tag, content, and an end-tag.
Create XML Document 2: Include declaration. Create root element. Create child element; enter child element text “student name”. Create child element: child to root, sibling to; make this an empty element. Create child element: child to root, sibling to; enter the C++ code example as PCDATA. Create child element; enter the same C++ code in a CDATA section. Save file. Open using Internet Explorer.
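A sketch of Document 2 built with Python's `xml.dom.minidom`, which (unlike ElementTree) can write a real CDATA section through `createCDATASection`. All element names are hypothetical, because the slide's element names did not survive.

```python
from xml.dom.minidom import Document

CPP = "if (this->getX() < 5 && array1[0] != 3) cerr << this->displayError();"

doc = Document()
root = doc.appendChild(doc.createElement("student_record"))  # hypothetical names

name = root.appendChild(doc.createElement("name"))
name.appendChild(doc.createTextNode("student name"))

root.appendChild(doc.createElement("photo"))  # empty element

pcdata = root.appendChild(doc.createElement("code_pcdata"))
pcdata.appendChild(doc.createTextNode(CPP))     # serializer escapes < > & "

cdata = root.appendChild(doc.createElement("code_cdata"))
cdata.appendChild(doc.createCDATASection(CPP))  # written verbatim

print(doc.toprettyxml(indent="  "))
```

In the output, the PCDATA copy shows `&lt;` and `&amp;` while the CDATA copy shows the C++ line untouched.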
XML Parser: DOM & SAX. A parser is required to process an XML document; parsers exist for C, Java, Python, Perl. Parser types: Document Object Model (DOM): tree structure, like a drive's directory structure; slower and requires large amounts of memory, because the whole document is loaded. Simple API for XML (SAX): event driven (events = tags, text, etc.); smaller and faster, but requires the programmer to deal with the data. Parsers are either validating or non-validating.
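The DOM/SAX contrast in a sketch using two Python standard-library parsers, `xml.dom.minidom` (DOM) and `xml.sax` (SAX), on a hypothetical document. Both collect the same element names; DOM builds the whole tree first, while SAX only reports events to a handler class, here the hypothetical `TagCollector`.

```python
import xml.sax
from xml.dom.minidom import parseString

DOC = b"<document><greeting>Hello World!</greeting><farewell>Goodbye World!</farewell></document>"

# DOM: the whole tree is built in memory first, then navigated.
dom = parseString(DOC)
dom_tags = [e.tagName for e in dom.getElementsByTagName("*")]


# SAX: the parser calls back as it meets each start tag; nothing is kept
# unless the handler keeps it.
class TagCollector(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.tags = []

    def startElement(self, name, attrs):
        self.tags.append(name)


handler = TagCollector()
xml.sax.parseString(DOC, handler)

print(dom_tags)      # ['document', 'greeting', 'farewell']
print(handler.tags)  # ['document', 'greeting', 'farewell']
```

The SAX handler never holds the whole document, which is why SAX suits large files; the price is that the handler must store whatever data it needs itself.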
XML Structure. Logical structure: the document is divided into units, which allow sub-units; an XML document is a logical tree structure. Physical structure: data stored inside the document or outside the document; entities are one example.
Valid: conforms to some schema (lowercase “s”), such as a Document Type Definition (DTD) or an XML Schema. By definition, all valid XML documents are well-formed documents.
DTD: Document Type Definition. File extension “.dtd”. A DTD is not an XML document; a DTD is a schema (lowercase “s”). It is introduced into an XML document via the Document Type Declaration (DOCTYPE). Three types of DOCTYPE declarations: Internal subset: contained in the prolog. External subset: exists in a different file; the prolog contains a reference to the file containing the DTD, using the keyword SYSTEM or PUBLIC. Internal subset and external subset combination.
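A sketch, using Python's `xml.dom.minidom`, of a DOCTYPE that combines both subsets: an external subset referenced with SYSTEM and an internal subset declaring one hypothetical entity, `author`. Python's standard-library parsers are non-validating: they read the DOCTYPE and expand entities declared in the internal subset, but they do not fetch `HelloWorld3.dtd` or check the document against it.

```python
from xml.dom.minidom import parseString

# Hypothetical document: external subset via SYSTEM, plus an internal
# subset that declares one general entity.
DOC = """<?xml version="1.0"?>
<!DOCTYPE document SYSTEM "HelloWorld3.dtd" [
  <!ENTITY author "Darrell Payne">
]>
<document>Hello World! by &author;</document>"""

dom = parseString(DOC)
print(dom.doctype.name)      # document
print(dom.doctype.systemId)  # HelloWorld3.dtd
# The internal entity reference has been replaced by its text.
print(dom.documentElement.firstChild.data)
```

Checking the document against the DTD's element declarations needs a validating parser, such as the validator used in the exercises.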
Internal Subset <!DOCTYPE document [ ]> Hello World! <!-- More Comments -->
External Subset Hello World! <!-- More Comments -->
DTD for HelloWorld.xml
Internal Subset and External Subset combination I <!DOCTYPE document SYSTEM "HelloWorld3.dtd"[ ]> <!-- More Comments -->
Internal Subset and External Subset combination II
Putting it all together HelloWorld3.dtd HelloWorld3.xml <!DOCTYPE document SYSTEM "HelloWorld3.dtd"[ ]> <!-- More Comments -->
XML Validator Type in HelloWorld3.xml
Create XML Document 3: Create root element. Create child element of session1; enter child element text “xml class”. Create child element of session2; enter child element text “class information”. Create child element of session3; enter child element text “more”. Create a DTD for this file: dtd_info.dtd. Reference the file in the XML document. Save files. Open validate_vbs.html, enter the .xml file name, and validate.
Schema: schemas are XML documents, so they can be manipulated via a parser; more complicated than DTDs; Microsoft's XDR schemas declare elements with “ElementType”s.
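Because a schema is itself an XML document, an ordinary parser can read it. A sketch in Python's `xml.etree.ElementTree` that lists the elements declared by a small hypothetical schema; it swaps in the W3C XML Schema vocabulary (`xs:element`) rather than the Microsoft XDR `ElementType` vocabulary the slide mentions.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical W3C XML Schema for the Hello World document.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="document">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="greeting" type="xs:string"/>
        <xs:element name="farewell" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = ET.fromstring(SCHEMA)
# The schema is ordinary XML, so a plain parser can query its declarations.
declared = [(e.get("name"), e.get("type")) for e in schema.iter(XS + "element")]
print(declared)
# [('document', None), ('greeting', 'xs:string'), ('farewell', 'xs:string')]
```

A DTD offers no such option: its `<!ELEMENT ...>` syntax is not XML, so reading it needs a separate parser.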
Schema vs. DTD
Schema vs. DTD II
Schema vs. DTD III
Topics not covered: Namespace, Whitespace, XPath, XPointer, XLink, XSL, XSLT, SOAP, DDI, Web Services, SMIL, XHTML.