XML
  - Why: The HTML dilemma (HTML, SGML, XML)
  - How: Syntax, concept, language elements
    - Basics
    - Well-formed XML documents (without DTD)
    - Valid XML documents (with DTD)
    - Attributes, entities, style sheets
  - More concepts from the "XML family"
The HTML Dilemma
  HTML is a language for marking up documents. Its tags label the parts of a page by their role in the layout: Heading 1, Heading 2, paragraph, and so on.
The HTML Dilemma
  HTML is simple, but unfortunately it is limited in three respects:
  - Extensibility: no semantic markup
  - Structure: no complex structures beyond layout
  - Validity: little support for checking a document's structure
SGML
  SGML is a set of rules for defining markup languages.
  + Metalanguage: highly flexible
  + Architecture for processing data on different media without losing the structure of the data
  - Complexity (for users and programmers)
XML: The Language Concept
  What is XML? Extensible Markup Language (XML) is a text-based meta-markup language: it lets you define any number of markup languages that follow the rules XML lays down. Rather than providing a set of predefined tags, as HTML does, XML specifies the rules by which you define your own markup languages with their own sets of tags.
XML: The Language Concept
  Like SGML, XML is based on the idea of structured markup of data.
  - Tags and attributes can be defined individually.
  - Document structures of any complexity can be described.
  - XML documents can, but do not have to, contain a formal description of their grammar.
XML: The Language Concept
  XML consists of tags that enclose content. Tagged elements can be nested inside one another, and together they constitute an XML document, provided some well-formedness rules are met.
Well-formed documents
  - Every opening tag must be closed explicitly.
  - Empty elements, which HTML writes as a lone opening tag, are written in XML either as a self-closing tag or as an opening tag followed directly by its closing tag.
  - Attribute values must be enclosed in quotation marks.
  - Child markup must nest completely within parent markup; that is, markup must be strictly hierarchical (as in SGML).
  - The markup characters < and & must not appear literally in text; without a DTD, all attributes are treated as CDATA.
  - You should declare the XML version at the start of the document.
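The rules above can be tried out with any XML parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the snippets and the element names in them (p, item, br, a, b, i, t) are illustrative assumptions, not examples taken from the slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical snippets: one accepted and one rejected case per rule.
SAMPLES = {
    "version declared at the start": '<?xml version="1.0"?><p>text</p>',
    "element closed explicitly": "<p>text</p>",
    "element left open": "<p>text",
    "empty element, self-closing": "<item><br/></item>",
    "empty element, explicit end tag": "<item><br></br></item>",
    "empty element, HTML style": "<item><br></item>",
    "attribute value quoted": '<a href="x.html">link</a>',
    "attribute value unquoted": "<a href=x.html>link</a>",
    "proper nesting": "<b><i>text</i></b>",
    "overlapping tags": "<b><i>text</b></i>",
    "markup characters escaped": "<t>1 &lt; 2 &amp; 3</t>",
    "markup characters raw": "<t>1 < 2 & 3</t>",
}

for label, text in SAMPLES.items():
    print(f"{label}: {is_well_formed(text)}")
```

The parser only checks syntax here; whether the tags make sense for a given application is a separate question that a DTD addresses.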
Well-formed document "ORDER"
  An ORDER document naming Mustermann and listing two items: cd rom drive and monitor.
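The markup of the ORDER example did not survive in this transcript, so the following is only a plausible reconstruction. The element names ORDER, NAME, BODY and DESCRIPTION are borrowed from the style sheet slides later in the deck; how they nest is an assumption. The Python sketch parses the document and prints its hierarchy.

```python
import xml.etree.ElementTree as ET

# Assumed reconstruction of the ORDER document (nesting is a guess).
ORDER_XML = """<?xml version="1.0"?>
<ORDER>
  <NAME>Mustermann</NAME>
  <BODY>
    <DESCRIPTION>cd rom drive</DESCRIPTION>
    <DESCRIPTION>monitor</DESCRIPTION>
  </BODY>
</ORDER>"""


def outline(element, depth=0):
    """Return one indented line per element, with its text if it has any."""
    text = (element.text or "").strip()
    lines = ["  " * depth + element.tag + (": " + text if text else "")]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(ORDER_XML)
print("\n".join(outline(root)))
```

No DTD is involved: the document is accepted purely because it follows the well-formedness rules.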
XML Basics
  - XML documents are well-formed if they conform to the basic syntax requirements.
  - XML provides rules for defining markup languages. There are two ways of defining these rules (i.e. the grammar of a particular markup language): implicitly, by using tags in a well-formed document, or explicitly, in a DTD.
  - XML documents can contain an explicit definition of the required and allowed tags and their structure, i.e. a Document Type Definition (DTD).
  - XML documents that conform to a DTD are valid.
Valid document "Order"
  The order for Mustermann (cd rom drive, monitor), now tied to a DTD against which it can be checked.
DTD of valid document "Order": ORDER.DTD
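Declaration of elements in a DTD
  - Elements can contain other elements or character data.
  - Elements can have mixed content.
  - Elements can be defined as mandatory, optional, etc.
  <!ELEMENT a (b, c?, (d|e)+, f*)>
  In this content model, b is mandatory, c is optional (?), d or e must occur one or more times (+), and f may occur any number of times (*). A model such as (address, cc*, message, signature?) likewise requires address and message, allows any number of cc elements, and makes signature optional.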
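A content model behaves much like a regular expression over the sequence of child elements. As an illustration only (this is not how a validating parser is implemented), the model (b, c?, (d|e)+, f*) can be hand-translated into a Python regular expression; the helper name children_match is hypothetical.

```python
import re

# Hand translation of (b, c?, (d|e)+, f*); every child name is followed
# by a comma so that a name such as "dd" cannot be mistaken for "d".
MODEL_A = re.compile(r"b,(c,)?((d|e),)+(f,)*")


def children_match(child_names, model=MODEL_A):
    """Check whether a list of child element names satisfies the model."""
    sequence = "".join(name + "," for name in child_names)
    return model.fullmatch(sequence) is not None


print(children_match(["b", "d"]))                      # smallest valid sequence
print(children_match(["b", "c", "e", "d", "f", "f"]))  # all parts present
print(children_match(["b", "c"]))                      # d or e is missing
print(children_match(["b", "d", "f", "e"]))            # e comes after f
```

The order of the children matters: the comma in a content model means "followed by", so the last call fails even though every element in it is allowed.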
Attributes
  - All elements can carry attributes.
  - Attributes have to be declared, much like elements.
  - Attributes can be optional, mandatory or "fixed".
  <!ATTLIST DESCRIPTION
    ean     CDATA #REQUIRED
    picture CDATA #FIXED "..."
    status  (sale | normal) "normal">
Valid XML document
  Mustermann
  <DESCRIPTION ean="..." picture="..." status="sale">cd rom drive</DESCRIPTION>
DTD
  <!ATTLIST DESCRIPTION
    ean     CDATA #REQUIRED
    picture CDATA #FIXED "..."
    status  (sale | normal) "normal">
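What the three kinds of attribute declaration mean for a document can be sketched in a few lines of Python. The dictionary below mirrors the ATTLIST for DESCRIPTION; the function name apply_attlist, the rule keys, the sample EAN and the fixed value "placeholder.gif" are all hypothetical, since the real fixed value is not legible in the slides.

```python
# Assumed in-memory form of the ATTLIST declaration for DESCRIPTION.
ATTLIST_DESCRIPTION = {
    "ean": {"required": True},                       # CDATA #REQUIRED
    "picture": {"fixed": "placeholder.gif"},         # CDATA #FIXED (value assumed)
    "status": {"choices": ("sale", "normal"), "default": "normal"},
}


def apply_attlist(attrs, attlist=ATTLIST_DESCRIPTION):
    """Return (effective attributes, list of validity errors)."""
    effective = dict(attrs)
    errors = [f"undeclared attribute: {name}" for name in attrs if name not in attlist]
    for name, rule in attlist.items():
        if name not in effective:
            if rule.get("required"):
                errors.append(f"missing required attribute: {name}")
            elif "fixed" in rule:
                effective[name] = rule["fixed"]      # fixed value is supplied
            elif "default" in rule:
                effective[name] = rule["default"]    # default value is supplied
            continue
        value = effective[name]
        if "fixed" in rule and value != rule["fixed"]:
            errors.append(f"attribute {name} must be {rule['fixed']!r}")
        if "choices" in rule and value not in rule["choices"]:
            errors.append(f"attribute {name} must be one of {rule['choices']}")
    return effective, errors


print(apply_attlist({"ean": "0000000000000", "status": "sale"}))
print(apply_attlist({"status": "discontinued"}))
```

The second call reports two problems: the mandatory ean attribute is absent, and the status value is not one of the enumerated choices.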
Valid XML documents
  An XML document is valid if it is well-formed and conforms to the specifications defined in a DTD. Any well-formed XML document can become valid if it is made compliant with a DTD. Functionally, a DTD is analogous to a relational database schema or an IDL: applications can use the DTD to check an XML document instance for structural validity and to create new instances of the defined document type.
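Internal DTDs
  DTDs can also be part of a document instance: the declarations are placed between the square brackets of the document type declaration, ahead of the document content.
  <!DOCTYPE ORDER [ ... ]>
  ... Mustermann ...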
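A parser sees the declarations of an internal DTD before it reaches the root element. The sketch below uses Python's standard xml.parsers.expat module, which reports each element declaration through its ElementDeclHandler callback. The declarations inside the brackets are assumptions (the originals are not legible in the slides), and expat itself does not validate the document against them.

```python
import xml.parsers.expat

# Assumed internal DTD; only the names ORDER and Mustermann come from the slide.
DOC = """<?xml version="1.0"?>
<!DOCTYPE ORDER [
  <!ELEMENT ORDER (NAME, BODY)>
  <!ELEMENT NAME (#PCDATA)>
  <!ELEMENT BODY (DESCRIPTION+)>
  <!ELEMENT DESCRIPTION (#PCDATA)>
]>
<ORDER>
  <NAME>Mustermann</NAME>
  <BODY><DESCRIPTION>monitor</DESCRIPTION></BODY>
</ORDER>"""


def declared_and_used(doc):
    """Return (element names declared in the DTD, element names used)."""
    declared, used = [], []
    parser = xml.parsers.expat.ParserCreate()
    parser.ElementDeclHandler = lambda name, model: declared.append(name)
    parser.StartElementHandler = lambda name, attrs: used.append(name)
    parser.Parse(doc, True)
    return declared, used


declared, used = declared_and_used(DOC)
print("declared:", declared)
print("undeclared but used:", sorted(set(used) - set(declared)))
```

Comparing the two lists is only a first sanity check; full validation would also have to compare each element's children with its content model.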
Logical and physical structure of XML documents
  - The logical structure is determined by the sequence and nesting of tags in the document.
  - Irrespective of the logical structure, an XML document can be divided into any number of physical entities. It is thus possible to combine physically distributed XML data into one XML document.
  - Entity references are used to refer to external data. A reference to an entity is written between "&" and ";".
External entity references
  XML documents can be spread over different files:
  <!DOCTYPE ORDER [ ... ]>
  &Head; &ItemsPC; &ItemsCD-ROM;
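How a parser stitches the pieces together can be sketched with Python's standard xml.parsers.expat module, which hands every external entity reference to a callback. The entity names Head, ItemsPC and ItemsCD-ROM come from the slide; the file names and their contents are assumptions, and an in-memory dictionary stands in for the file system to keep the sketch self-contained.

```python
import xml.parsers.expat

# Hypothetical "files" holding the physical parts of the order.
FILES = {
    "head.xml": "<NAME>Mustermann</NAME>",
    "items_pc.xml": "<DESCRIPTION>monitor</DESCRIPTION>",
    "items_cdrom.xml": "<DESCRIPTION>cd rom drive</DESCRIPTION>",
}

MASTER = """<?xml version="1.0"?>
<!DOCTYPE ORDER [
  <!ENTITY Head SYSTEM "head.xml">
  <!ENTITY ItemsPC SYSTEM "items_pc.xml">
  <!ENTITY ItemsCD-ROM SYSTEM "items_cdrom.xml">
]>
<ORDER>&Head;&ItemsPC;&ItemsCD-ROM;</ORDER>"""


def parse_combined(master, files):
    """Parse master and return the element names of the combined document."""
    elements = []
    parser = xml.parsers.expat.ParserCreate()

    def resolve(context, base, system_id, public_id):
        # Parse the referenced entity with a child parser that shares
        # the parent's handlers, so its elements land in the same list.
        sub = parser.ExternalEntityParserCreate(context)
        sub.Parse(files[system_id], True)
        return 1  # non-zero tells expat the entity was handled

    parser.StartElementHandler = lambda name, attrs: elements.append(name)
    parser.ExternalEntityRefHandler = resolve
    parser.Parse(master, True)
    return elements


print(parse_combined(MASTER, FILES))
```

The application sees one logical ORDER document even though its content lives in three separate physical entities.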
XML Entities, Unicode
  <!DOCTYPE EXAMPLE [ ... ]>
  The new standard &xml; supports international character sets (ISO, Unicode); the example shows different notations for the number "1": 1 (in ASCII), ١ (in Arabic), १ (in Devanagari) and ൧ (in Malayalam).
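Both mechanisms on this slide, an internal entity and numeric character references, can be seen at work in a short Python sketch. The replacement text chosen for the entity xml is an assumption, as the original declaration is not legible; the code points are the Unicode digits one of the three scripts.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Assumed entity declaration; the character references are written in hex.
EXAMPLE = """<?xml version="1.0"?>
<!DOCTYPE EXAMPLE [
  <!ENTITY xml "Extensible Markup Language">
]>
<EXAMPLE>The new standard &xml; can write the digit one as 1, &#x661;, &#x967; and &#xD67;.</EXAMPLE>"""

text = ET.fromstring(EXAMPLE).text
digits = [ch for ch in text if ch.isdigit()]

print(text)
for ch in digits:
    # unicodedata.digit gives the numeric value, unicodedata.name the script.
    print(f"U+{ord(ch):04X}", unicodedata.digit(ch), unicodedata.name(ch))
```

By the time the application reads the text, the parser has already replaced the entity reference and the character references with their values.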
Presentation of XML documents
  - XML documents are presented using style sheets. A style sheet determines the document's layout.
  - A style sheet is referenced by a processing instruction in the document.
  - W3C is developing XSL, a style sheet language for XML.
  - In addition, XML documents can be presented, for example in a browser, using CSS, which is also used to display HTML.
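The processing instruction shown on the original slide is not legible here. The sketch below assumes the xml-stylesheet form standardized by the W3C and a hypothetical style sheet file order.css, and uses Python's standard xml.dom.minidom module to show how an application can find the instruction.

```python
from xml.dom import minidom

# Assumed document; "order.css" is a hypothetical file name.
DOC = """<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="order.css"?>
<ORDER><NAME>Mustermann</NAME></ORDER>"""


def stylesheet_instructions(text):
    """Return (target, data) for each top-level processing instruction."""
    document = minidom.parseString(text)
    return [
        (node.target, node.data)
        for node in document.childNodes
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
    ]


for target, data in stylesheet_instructions(DOC):
    print(target, "->", data)
```

The instruction sits outside the root element, so it does not affect the logical structure of the order itself.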
Why 2 style sheet languages?
  1) CSS: simple; every element is assigned a layout
  2) XSL: more than CSS (scripting, transformation), but more complex
  ORDER {background-color: blue}
  NAME, DATE, ... {display: block; font-size: 28pt; font-family: Times, serif}
  ... {color: yellow}
XML and CSS
  XML document: ... Mustermann ...
  CSS style sheet:
  ORDER {display: block; background-color: blue; float: left; padding: 15pt}
  NAME, DATE, ... {display: block; font-size: 28pt; font-family: Times, serif}
  ... {color: yellow}
  BODY {display: block; background-color: green; float: left; padding: 12pt}
  DESCRIPTION {font-size: 28pt; font-family: Times, sans-serif}
The XML family
  Besides the XML 1.0 specification (a W3C Recommendation), there are further W3C initiatives on XML. The most important related standards are:
  - XLink (Working Draft)
  - XPointer (Working Draft)
  - XML Namespaces (Recommendation)
  - XSL (Working Draft)
  - DOM (Recommendation)
  - RDF (Recommendation)
  - XML Schemas (Working Drafts; related proposals: XML-Data, DCD, SOX, DDML)
Linking in XML
  - XML supports much more powerful linking capabilities than HTML.
  - XLink describes unidirectional as well as sophisticated multidirectional links.
  - XPointer specifies a mechanism for pointing to fragments of a target document, even fragments without identifiers. By contrast, an HTML-style reference such as "book.html#section2" only reaches a fragment that carries an identifier.
  Three kinds of link are distinguished: simple link, extended link (XLink), and link to an element in an instance (XPointer).
Namespaces in XML
  How can an application know which meaning of a name applies when different DTDs are in use (e.g. for one's own documents, data exchange or search engines)? Namespaces were developed to prevent element and attribute names from colliding.
  Example: "Title" (heading, evidence of ownership)
  <EXAMPLE xmlns:h="..." xmlns:b="..." xmlns:p="...">
  My XML text
  XML, Java and the future of the Web
  realty
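The Title example can be sketched in Python with the standard xml.etree.ElementTree module. The prefixes h, b and p and the three texts come from the slide; the namespace URIs are placeholders, since the originals are not legible, and the assumption is that all three elements are named Title.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs (assumed); any unique URIs would do.
NS = {
    "h": "http://example.org/heading",
    "b": "http://example.org/book",
    "p": "http://example.org/property",
}

DOC = """<EXAMPLE xmlns:h="http://example.org/heading"
         xmlns:b="http://example.org/book"
         xmlns:p="http://example.org/property">
  <h:Title>My XML text</h:Title>
  <b:Title>XML, Java and the future of the Web</b:Title>
  <p:Title>realty</p:Title>
</EXAMPLE>"""

root = ET.fromstring(DOC)

# Each prefix selects a different Title, although the local name is the same.
titles = {prefix: root.find(f"{prefix}:Title", NS).text for prefix in NS}
for prefix, text in titles.items():
    print(prefix, "->", text)

# Internally the parser stores the full name as {namespace URI}local-name.
print([child.tag for child in root])
```

An application therefore distinguishes the three titles by namespace URI, not by the prefix, which is only a shorthand local to the document.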