XML Extensible Markup Language
What is XML? ● meta-markup language ● a language for defining a family of languages ● semantic/structured mark-up language – defines structure and meaning, NOT formatting or presentation
Why XML? ● supports construction of domain specific markup languages ● creates a common data format ● facilitates data interchange ● structures large, complex documents
History ● Standard Generalized Markup Language (SGML) – too complex ● Hyper-Text Markup Language (HTML) – not extensible, limited to a small set of fixed tags – polluted with non-semantic tags ● XML working group formed in 1996 ● XML is really a slimmed-down SGML
XML Applications ● Chemical Markup Language (CML) – originally an SGML application – used to describe: molecular structures and sequences, spectrographic analysis, crystallography, chemical databases, and so on ● Mathematical Markup Language (MathML) – adequate for almost all educational, scientific, engineering, business, economics, and statistics needs – limited for advanced math/theoretical physics
MathML example ● markup: <apply> <power/> <apply> <plus/> <ci>a</ci> <ci>b</ci> </apply> <cn>2</cn> </apply> ● rendered: (a+b)²
XML Document “Goodness” ● well-formed – satisfies the basic rules of XML syntax ● valid – satisfies the domain-specific rules for the language as defined in the Document Type Definition (DTD)
Well-formed 1. Should start with an XML declaration (optional in XML 1.0, but if present it must come first) 2. Elements with content must have matching start and end tags 3. Empty elements must end with /> 4. The document must contain exactly one element that contains all other elements 5. Elements may nest but not overlap
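The structural rules above can be tried out with any XML parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the sample documents and the helper name `is_well_formed` are made up for illustration. It only checks well-formedness, not validity against a DTD.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts `text` as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical sample documents, one per rule
samples = {
    "matching tags": "<greeting>Hello!</greeting>",
    "missing end tag": "<greeting>Hello!",
    "empty element": "<doc><br/></doc>",
    "two root elements": "<a/><b/>",
    "overlapping elements": "<a><b></a></b>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {ok}")
```

Only "matching tags" and "empty element" are accepted; the other three violate rules 2, 4, and 5 respectively.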
XML Declaration ● <?xml version="1.0" standalone="yes"?> – standalone – yes if this file contains a complete document
Tags ● anything that begins with < and ends with > ● end tags begin with </ ● empty tags end with /> ● tag names – start with a letter or underscore (_) – remaining characters can be letters, digits, underscores, hyphens, or periods
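The naming rule can be written as a regular expression. This is a sketch only: it is restricted to ASCII letters, whereas real XML names allow many more Unicode characters, and the names `NAME_RE` and `is_valid_name` are invented for the example.

```python
import re

# ASCII-only simplification of the naming rule on the slide:
# first character is a letter or underscore, the rest may also be
# digits, hyphens, or periods.
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")

def is_valid_name(name: str) -> bool:
    """Check a candidate tag name against the simplified rule."""
    return NAME_RE.fullmatch(name) is not None

for candidate in ["greeting", "_note-1.2", "2nd", "-dash", ""]:
    print(repr(candidate), is_valid_name(candidate))
```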
Attributes ● start tags can include zero or more attributes ● attributes are name/value pairs separated by an equals sign (=) ● the rules for attribute names are the same as for tag names ● the value is any string enclosed in quotes (single or double) ● if the string contains the quote character that delimits it, an entity reference must be used: &apos; or &quot;
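A short sketch of how a parser reads attributes, again using Python's standard `xml.etree.ElementTree`. The `book` element and its attribute names are hypothetical; the point is that single- and double-quoted values are both accepted and that `&apos;` and `&quot;` are turned back into plain quote characters.

```python
import xml.etree.ElementTree as ET

# Hypothetical element: one single-quoted value containing an apostrophe,
# one double-quoted value containing double quotes.
xml_text = """<book title='Alice&apos;s Adventures' note="She said &quot;hi&quot;" pages="200"/>"""

element = ET.fromstring(xml_text)
attributes = element.attrib  # a plain dict of name -> value

for name, value in attributes.items():
    print(f"{name} = {value}")
```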
Comments ● <!-- comment text --> ● can't be nested or contained inside start/end tags
Entity References ● &amp; – & ● &lt; – < ● &gt; – > ● &quot; – " ● &apos; – '
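In practice the five predefined entity references are usually produced by a library rather than by hand. A minimal sketch with Python's standard `xml.sax.saxutils`: `escape` handles `&`, `<`, and `>` by default, and the extra mapping for the two quote characters is supplied explicitly here. The sample string is invented.

```python
from xml.sax.saxutils import escape, unescape

# escape() covers &, <, > on its own; quotes are added via an extra mapping
QUOTES = {'"': "&quot;", "'": "&apos;"}
QUOTES_REVERSED = {entity: char for char, entity in QUOTES.items()}

raw = "if (a < b && c > d) print \"it's ok\""
escaped = escape(raw, QUOTES)
restored = unescape(escaped, QUOTES_REVERSED)

print(escaped)
print(restored == raw)
```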
CDATA ● used for content that resembles XML: <![CDATA[ <greeting>Hello!</greeting> ]]>
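To see what a CDATA section does, compare how a parser treats markup-like text inside it. The sketch below uses Python's standard `xml.etree.ElementTree`; the wrapping `example` element is an assumption made for the illustration.

```python
import xml.etree.ElementTree as ET

# The CDATA section holds text that looks like an element but is not parsed as one
doc = "<example><![CDATA[<greeting>Hello!</greeting>]]></example>"

root = ET.fromstring(doc)
cdata_text = root.text   # the literal characters inside the CDATA section
child_count = len(root)  # number of child elements of <example>

print(cdata_text)
print(child_count)
```

The parser reports the section as plain character data, so `example` has no child elements.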
Valid Documents ● body matches Document Type Definition (DTD)
DTDs <!DOCTYPE greeting [ <!ELEMENT greeting (#PCDATA)> ]> <greeting>Hello!</greeting>
ELEMENTs ● <!ELEMENT name pattern> ● element name followed by a content pattern ● pattern is similar to a regular expression
ATTLIST ● specifies attributes for a tag
Internal Document Type <!DOCTYPE root_element_name [ declarations ]> <!DOCTYPE greeting [ <!ELEMENT greeting (#PCDATA)> ]>
External Document Type (System) <!DOCTYPE root_element_name SYSTEM DTD_URL> <!DOCTYPE greeting SYSTEM “ >
External Document Type (Public) <!DOCTYPE root_element_name PUBLIC DTD_name DTD_URL>
Internal General Entity... THE END &bkc;
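An internal general entity is declared in the DTD with `<!ENTITY name "replacement text">` and referenced in content as `&name;`. The sketch below shows the expansion with Python's standard `xml.etree.ElementTree`. The entity name `bkc` comes from the slide, but its replacement text and the `story` root element are placeholders invented here, since the original declaration is not shown.

```python
import xml.etree.ElementTree as ET

# "placeholder replacement text" and <story> are hypothetical;
# only the entity name bkc appears in the source.
doc = """<!DOCTYPE story [
  <!ENTITY bkc "placeholder replacement text">
]>
<story>THE END &bkc;</story>"""

root = ET.fromstring(doc)
expanded = root.text  # the parser substitutes the entity's replacement text

print(expanded)
```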
Internal Parameter Entity...
ELEMENTs
ELEMENT content_type ● ANY – anything ● #PCDATA – only character data – no contained elements ● EMPTY – element contains no content ● reg_exp – a regular expression denoting acceptable children
Regular expressions ● element_name – exactly that element ● re+ – 1 or more ● re* – 0 or more ● re? – 0 or 1 ● (re1 | re2 | ... | reN) – re1 or re2 ... or reN ● (re1, re2, ..., reN) – re1 followed by re2, ... reN
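Because these operators mirror ordinary regular expressions, a content model can be translated mechanically into one and matched against the sequence of child element names. The following is a rough sketch, not a validator: it ignores #PCDATA, EMPTY, and ANY, and the function names and the sample model are made up for the example.

```python
import re

def content_model_to_regex(model: str) -> re.Pattern:
    """Translate a DTD-style child pattern into a regex over 'name ' tokens."""
    pattern = re.sub(r"\s+|,", "", model)   # a sequence becomes plain concatenation
    pattern = pattern.replace("(", "(?:")   # groups do not need to capture
    # Each element name becomes a token followed by a single space
    pattern = re.sub(
        r"[A-Za-z_][\w.\-]*",
        lambda m: "(?:" + re.escape(m.group()) + " )",
        pattern,
    )
    return re.compile(pattern)

def children_match(model: str, children: list) -> bool:
    """Check a list of child element names against the content model."""
    sequence = "".join(name + " " for name in children)
    return content_model_to_regex(model).fullmatch(sequence) is not None

# Hypothetical content model: a title, one or more paras/lists, optional footer
MODEL = "(title, (para | list)+, footer?)"
print(children_match(MODEL, ["title", "para", "list"]))
print(children_match(MODEL, ["title", "para", "footer"]))
print(children_match(MODEL, ["title"]))
print(children_match(MODEL, ["para", "title"]))
```

The first two child sequences fit the model; the last two do not, because at least one para or list is required and the title must come first.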
XHTML head element ● what is the declaration for the XHTML “head” element: – must contain exactly one “title” element – may contain at most one “base” element – may contain 0 or more “meta” elements – title, base, and meta elements can appear in any order
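One declaration that satisfies the four constraints is given in `HEAD_DECL` below. It is offered as a possible answer, simplified to the three elements named on the slide, not as a quotation of the XHTML DTD. The sketch rewrites the same pattern as a Python regex over one-letter codes (an encoding chosen here for brevity) and compares it exhaustively against the constraints for all short child sequences.

```python
import re
from itertools import product

# Candidate answer (an assumption, simplified to title/base/meta only)
HEAD_DECL = (
    "<!ELEMENT head (meta*, ((title, meta*, (base, meta*)?)"
    " | (base, meta*, title, meta*)))>"
)

# The same pattern over one-letter codes: t = title, b = base, m = meta
HEAD_RE = re.compile(r"m*(?:tm*(?:bm*)?|bm*tm*)")
CODES = {"title": "t", "base": "b", "meta": "m"}

def matches_model(children) -> bool:
    """True if the child sequence is accepted by the candidate pattern."""
    encoded = "".join(CODES[name] for name in children)
    return HEAD_RE.fullmatch(encoded) is not None

def meets_constraints(children) -> bool:
    """Exactly one title, at most one base, any number of meta, any order."""
    return children.count("title") == 1 and children.count("base") <= 1

# Compare both checks on every child sequence of length 0 to 5
agree = all(
    matches_model(seq) == meets_constraints(seq)
    for n in range(6)
    for seq in product(CODES, repeat=n)
)
print(agree)
```

The two alternatives are needed because a DTD pattern has no "any order" operator: one branch covers title before base, the other base before title.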
ATTLISTs <!ATTLIST element_name attribute_name type "default value">