XML and friends
Part 1 - XML and DTD
ELAG 2001 workshop 8
Jan Erik Kofoed
© BIBSYS Library Automation

XML © Jan Erik Kofoed
XML – eXtensible Markup Language
Describes content
The grammar of the language is fixed, but "any" word can be used as a tag name
Readable by both humans and machines
Generally usable as a meta language
Uses tags to describe content

SGML and XML
SGML – Standard Generalized Markup Language, ISO 8879:1986(E)
XML – Extensible Markup Language 1.0, 2nd Edition, W3C Recommendation
XML is compatible with SGML.
Many consider SGML difficult and expensive.
XML is designed to be easy to implement and to interoperate with both SGML and HTML.

Design goals for XML
1. XML shall be straightforwardly usable over the Internet.
2. XML shall support a wide variety of applications.
3. XML shall be compatible with SGML.
4. It shall be easy to write programs which process XML documents.
5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
6. XML documents should be human-legible and reasonably clear.
7. The XML design should be prepared quickly.
8. The design of XML shall be formal and concise.
9. XML documents shall be easy to create.
10. Terseness in XML markup is of minimal importance.

HTML and XML
HTML – HyperText Markup Language 4.01, W3C Recommendation
– Now replaced by: XHTML – Extensible HyperText Markup Language 1.0, W3C Recommendation
Both XML and HTML are compatible with SGML.
HTML describes presentation.
XML describes content.
XHTML is HTML with XML syntax.

HTML shows layout
Book
Hamsun, Knut: Markens grøde. Oslo, Aschehoug, 1948

XML marks up the content
Hamsun, Knut
Markens grøde
Oslo
Aschehoug
1948
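
Because the tags name the content instead of its appearance, a program can pick out each field by name and then lay it out however it likes. The sketch below assumes Python's standard `xml.etree.ElementTree`. The element names (`book`, `author`, `title`, `place`, `publisher`, `year`) are hypothetical, since the slide's original tags are not preserved.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names: the slide's original tags were not preserved.
RECORD = """<book>
  <author>Hamsun, Knut</author>
  <title>Markens grøde</title>
  <place>Oslo</place>
  <publisher>Aschehoug</publisher>
  <year>1948</year>
</book>"""


def fields_of(xml_text: str) -> dict:
    """Return the child elements of the record as {element name: text}."""
    book = ET.fromstring(xml_text)
    return {child.tag: child.text for child in book}


fields = fields_of(RECORD)
# Presentation is a separate decision: here it reuses the HTML slide's citation layout.
citation = f"{fields['author']}: {fields['title']}. {fields['place']}, {fields['publisher']}, {fields['year']}"
print(citation)
```

The last line renders the fields back into the citation style of the HTML slide. Another program or stylesheet could present the same record in any other layout.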

One simple document
Åse Østby
Furuveien
Skogheim

Remember to declare the encoding!
Åse Østby
Furuveien
Skogheim
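
The encoding declaration matters because a parser that finds no declaration assumes UTF-8 (or UTF-16). The minimal Python sketch below assumes the address has been saved as ISO-8859-1. The element names around the slide's text are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names around the slide's example address.
BODY = "<address><name>Åse Østby</name><street>Furuveien</street><city>Skogheim</city></address>"


def parse_bytes(data: bytes):
    """Return the root element, or None if the parser rejects the bytes."""
    try:
        return ET.fromstring(data)
    except ET.ParseError:
        return None


latin1 = BODY.encode("iso-8859-1")
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + latin1

# No declaration: the parser assumes UTF-8, and the Latin-1 byte for "Å" (0xC5) is not valid UTF-8 here.
print(parse_bytes(latin1))                                   # None
# With the declaration, the same bytes decode correctly.
print(parse_bytes(declared).find("name").text)               # Åse Østby
# UTF-8 needs no declaration.
print(parse_bytes(BODY.encode("utf-8")).find("city").text)   # Skogheim
```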

Architecture of XML documents
Processing instruction
Element content
– Empty element
Attribute
Comment
Entity &entity;
CDATA section <![CDATA[ ... ]]>
DTD
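
One way to see these building blocks is to let a parser report them. The sketch below uses the callbacks of Python's `xml.parsers.expat` on a small, hypothetical document. The stylesheet name, the entity `pub` and the element names are made up for illustration.

```python
from xml.parsers import expat

# Hypothetical document with one of each construct from the slide.
DOC = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book [
  <!ENTITY pub "Gyldendal">
]>
<?xml-stylesheet type="text/css" href="book.css"?>
<book id="b1">
  <!-- Dedication on the title page -->
  <note><![CDATA[<book> is the root element.]]></note>
  <publisher>&pub;</publisher>
  <illustrated/>
</book>"""


def list_constructs(xml_text: str):
    """Return (events, text): the constructs expat reports in document order,
    and all character data with entities expanded."""
    events, chunks = [], []
    p = expat.ParserCreate()
    p.XmlDeclHandler = lambda version, encoding, standalone: events.append(("xml-declaration", version))
    p.StartDoctypeDeclHandler = lambda name, sysid, pubid, has_subset: events.append(("doctype", name))
    p.EntityDeclHandler = lambda name, is_pe, value, base, sysid, pubid, notation: events.append(("entity", name, value))
    p.ProcessingInstructionHandler = lambda target, data: events.append(("pi", target))
    p.StartElementHandler = lambda name, attrs: events.append(("element", name, attrs))
    p.CommentHandler = lambda comment: events.append(("comment", comment.strip()))
    p.StartCdataSectionHandler = lambda: events.append(("cdata",))
    p.CharacterDataHandler = chunks.append
    p.Parse(xml_text, True)
    return events, "".join(chunks)


events, text = list_constructs(DOC)
for event in events:
    print(event)
```

The content of the CDATA section reaches the program as plain text, and `&pub;` arrives already replaced by its declared text.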

Example
is the root element. ]]>
Hamsun, Knut
Markens grøde
Kristiania
Gyldendal
1917
<Dedication "Til Marie" on the title page. >

Presented in MS Internet Explorer
is the root element. ]]>
Hamsun, Knut
Markens grøde
Kristiania
Gyldendal
1917

Two types of XML documents
Well-formed XML
– Must follow the well-formedness rules
Valid XML
– Must be well formed
– Must follow the rules given in a DTD (Document Type Definition)

Well-formed XML
1. The document should begin with an XML declaration (recommended; parsers also accept documents without one).
2. Every element that contains data must have both a start tag and an end tag.
3. Empty elements without an end tag must end with />.
4. One root element must enclose all other elements.
5. Elements may be nested, but may not overlap.
6. Attribute values must be quoted, with either " " or ' '.
7. The characters < and & may only be used to start markup such as tags and entity references; elsewhere, write &lt; and &amp;.
8. An element may not have two attributes with the same name.
9. Comments and processing instructions may not appear inside tags.
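
Any non-validating parser can check the rules above. The minimal Python sketch below uses `xml.etree.ElementTree`, which raises `ParseError` for text that is not well formed. The test strings are made up, roughly one per rule.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """True if a non-validating XML parser accepts the text."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Made-up test strings; each broken one violates the rule named in its comment.
CASES = {
    '<a><b>x</b><c/></a>': True,      # nested elements, empty element closed with />
    '<a/>': True,                     # a parser accepts a document without an XML declaration
    '<a><b>x</b>': False,             # rule 2: the end tag </a> is missing
    '<a><br></a>': False,             # rule 3: empty element not closed with />
    '<a/><b/>': False,                # rule 4: two root elements
    '<a><b>x</a></b>': False,         # rule 5: overlapping elements
    '<a x=1/>': False,                # rule 6: unquoted attribute value
    '<a>Fish & chips</a>': False,     # rule 7: bare &
    '<a>Fish &amp; chips</a>': True,  # & written as an entity reference
    '<a x="1" x="2"/>': False,        # rule 8: duplicate attribute
    '<a <!-- c --> />': False,        # rule 9: comment inside a tag
}

for text, expected in CASES.items():
    print(f"{text!r:28} well formed: {is_well_formed(text)}")
```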

Rules for XML
Names of elements and attributes
– must start with a letter or _
– can then contain letters, digits, -, . or _
– are case sensitive
– should not start with xml, XML, Xml, xMl ... (these names are reserved for standardization)
– may contain non-ASCII ("national") letters
The default character encoding is UTF-8 (Unicode)
– for other encodings, use the encoding attribute of the XML declaration.
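
The same kind of parser can be used to try out the naming rules. In the minimal sketch below, the element names are hypothetical: three follow the rules, and two break the rule about the first character.

```python
import xml.etree.ElementTree as ET


def parses(text: str) -> bool:
    """True if the parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical names: three that follow the rules, two that break the first rule.
NAMES = ["_book", "book-nr.2", "bøker", "2book", "-book"]
results = {name: parses(f"<{name}/>") for name in NAMES}
print(results)

# Case sensitivity: <Book> is not closed by </book>.
case_mismatch_ok = parses("<Book></book>")
print(case_mismatch_ok)  # False
```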

Reserved attributes (1)
xml:lang
– language code, defined in RFC
xml:space
– preserve: preserve spaces, tabs and line breaks
– default: the application's default white-space processing is used
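
`xml:lang` applies to the element it is set on and to everything inside it, unless an inner element sets its own value. `xml:space` is inherited in the same way. The minimal Python sketch below performs that lookup. ElementTree reports the reserved `xml:` prefix under the namespace URI `http://www.w3.org/XML/1998/namespace`. The element names are hypothetical; "Growth of the Soil" is the English title of Markens grøde.

```python
import xml.etree.ElementTree as ET

# ElementTree reports the reserved xml: prefix under its fixed namespace URI.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"

# Hypothetical element names.
RECORD = """<record xml:lang="no">
  <title>Markens grøde</title>
  <title xml:lang="en">Growth of the Soil</title>
</record>"""


def languages(element, inherited=None):
    """Yield (tag, text, language) for each element; xml:lang applies to
    descendants unless one of them sets its own value."""
    lang = element.get(XML_NS + "lang", inherited)
    yield element.tag, (element.text or "").strip(), lang
    for child in element:
        yield from languages(child, lang)


for tag, text, lang in languages(ET.fromstring(RECORD)):
    print(tag, repr(text), lang)
```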

Reserved attributes (2)
xml:link
– simple: one-way pointer
– document: pointer to a member of a group
– extended: multiple and extended pointers
– group: group with pointers to documents
xml:attribute
– old-attribute new-attribute: switches attributes

Entities in XML
Entities start with & and end with ;
&amp; for &
&lt; for <
&gt; for >
&quot; for "
&apos; for '
&#xnnnn; for the character with hexadecimal Unicode value nnnn
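
The minimal Python sketch below works with the predefined entities and a character reference in both directions. The parser decodes them, and `xml.sax.saxutils.escape` produces them when text is written. The sample strings are made up.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Made-up sample: the five predefined entities plus &#xC5; (the character Å).
SOURCE = "<t>&lt;b&gt; &amp; &quot;q&quot; &apos;a&apos; &#xC5;se</t>"
decoded = ET.fromstring(SOURCE).text
print(decoded)  # <b> & "q" 'a' Åse

# Writing: escape() always replaces &, < and >; further replacements go in the dict.
encoded = escape('Fish & <chips> "to go"', {'"': "&quot;"})
print(encoded)  # Fish &amp; &lt;chips&gt; &quot;to go&quot;
```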

Document type definition (DTD)
Rules for the structure of XML documents
Defines the names of elements and attributes
Defines succession (the order of elements)
Defines occurrence (how often an element may appear)
Defines the types of attributes
Defines default values for attributes
Defines which elements and attributes are required

DTD: entities (1)
General entity
– Ex.: &www;
Parameter entity
– Ex.:
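
When the document is parsed, an internal general entity is replaced by its declared text. In the minimal Python sketch below, the replacement text for `&www;` is an assumption, since the slide's declaration is not preserved. Without the declaration, the same reference is an error.

```python
import xml.etree.ElementTree as ET

# Hypothetical declaration: the slide's replacement text for &www; is not preserved.
DOC = """<!DOCTYPE page [
  <!ENTITY www "World Wide Web">
]>
<page>Welcome to the &www;!</page>"""


def page_text(xml_text: str):
    """Return the root element's text, or None if the parser rejects the document."""
    try:
        return ET.fromstring(xml_text).text
    except ET.ParseError:
        return None


print(page_text(DOC))                                   # Welcome to the World Wide Web!
print(page_text("<page>Welcome to the &www;!</page>"))  # None: the entity is undeclared
```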

DTD: entities (2)
External entities
– &addresses;
Non-XML entity
Notation

DTD: ELEMENT
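
Python's expat binding reports each ELEMENT declaration as a nested tuple (type, quantifier, name, children). To show how a content model is structured, the sketch below turns that tuple back into DTD syntax. The declarations in the sample DTD are hypothetical.

```python
from xml.parsers import expat
from xml.parsers.expat import model

# expat's quantifier constants mapped back to the DTD occurrence indicators.
QUANT = {
    model.XML_CQUANT_NONE: "",
    model.XML_CQUANT_OPT: "?",
    model.XML_CQUANT_REP: "*",
    model.XML_CQUANT_PLUS: "+",
}


def model_text(m) -> str:
    """Rebuild DTD syntax from expat's (type, quantifier, name, children) tuple."""
    ctype, quant, name, children = m
    if ctype == model.XML_CTYPE_EMPTY:
        return "EMPTY"
    if ctype == model.XML_CTYPE_ANY:
        return "ANY"
    if ctype == model.XML_CTYPE_NAME:
        return name + QUANT[quant]
    parts = [model_text(child) for child in children]
    if ctype == model.XML_CTYPE_MIXED:
        parts.insert(0, "#PCDATA")  # mixed content: text plus the listed elements
    sep = ", " if ctype == model.XML_CTYPE_SEQ else " | "
    return "(" + sep.join(parts) + ")" + QUANT[quant]


def element_declarations(xml_text: str) -> dict:
    """Map each declared element name to its content model in DTD syntax."""
    decls = {}

    def on_element(name, content_model):
        decls[name] = model_text(content_model)

    parser = expat.ParserCreate()
    parser.ElementDeclHandler = on_element
    parser.Parse(xml_text, True)
    return decls


# Hypothetical DTD and document.
DOC = """<!DOCTYPE book [
  <!ELEMENT book (author+, title, year?)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT title (#PCDATA | em)*>
  <!ELEMENT em (#PCDATA)>
  <!ELEMENT year (#PCDATA)>
  <!ELEMENT cover EMPTY>
]>
<book><author>Hamsun, Knut</author><title>Markens grøde</title></book>"""

for name, content in element_declarations(DOC).items():
    print(f"<!ELEMENT {name} {content}>")
```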

DTD: Definition of occurrences
(none) Must occur exactly once.
? Can occur zero times or once.
+ Must occur once or more.
* Can occur zero or more times.
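
The occurrence indicators mean the same as the regular-expression operators with the same symbols. As a sketch of that idea (not a real validator), the hypothetical helper below converts a simple element-only content model into a Python regular expression and checks an element's children against it. The element names are made up.

```python
import re
import xml.etree.ElementTree as ET


def content_model_regex(model: str) -> re.Pattern:
    """Hypothetical helper: turn an element-only content model such as
    "(author+, title, year?)" into a regex over child-element names.
    Handles names, ',', '|', parentheses and ?, + and *; not #PCDATA, ANY or EMPTY."""
    pieces = []
    for token in re.findall(r"[A-Za-z_][\w.\-]*|[(),|?+*]", model):
        if token == "(":
            pieces.append("(?:")
        elif token == ",":
            continue  # sequence: the parts simply follow each other
        elif token in (")", "|", "?", "+", "*"):
            pieces.append(token)  # same meaning in regex syntax
        else:
            pieces.append(f"(?:{re.escape(token)} )")  # one child element, space-terminated
    return re.compile("".join(pieces))


def children_match(element: ET.Element, model: str) -> bool:
    """True if the element's children, in order, fit the content model."""
    names = "".join(child.tag + " " for child in element)
    return content_model_regex(model).fullmatch(names) is not None


MODEL = "(author+, title, year?)"
ok = ET.fromstring("<book><author/><author/><title/></book>")
no_author = ET.fromstring("<book><title/><year/></book>")
two_titles = ET.fromstring("<book><author/><title/><title/></book>")
print(children_match(ok, MODEL), children_match(no_author, MODEL), children_match(two_titles, MODEL))
```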

DTD: attributes
Example:

DTD: Attribute types
– CDATA: string
– (name | name | ...): one of the listed values
– ENTITY: name of a declared unparsed (non-XML) entity
– ENTITIES: list of such entity names
– ID: unique identifier
– IDREF: reference to an ID
– IDREFS: list of ID references
– NMTOKEN: a word built from name characters
– NMTOKENS: list of NMTOKENs
– NOTATION: one of a list of declared notation names, for non-parsed data
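
A non-validating parser does not check ID and IDREF values. The hypothetical Python helper below performs the two checks a validating parser would make: IDs must be unique, and every IDREF (or each name in an IDREFS list) must match an ID. The sketch does not read the DTD, so the caller has to say which attributes are ID and IDREF. The sample data is made up.

```python
import xml.etree.ElementTree as ET


def check_ids(root, id_attr, idref_attrs):
    """Hypothetical helper: find duplicate ID values and IDREF(S) values that
    point at no ID. A validating parser learns which attributes are ID/IDREF
    from the DTD; here the caller passes the attribute names in."""
    seen, duplicates = set(), []
    for el in root.iter():
        value = el.get(id_attr)
        if value is not None:
            if value in seen:
                duplicates.append(value)
            seen.add(value)
    dangling = []
    for el in root.iter():
        for attr in idref_attrs:
            # IDREFS is a whitespace-separated list; a plain IDREF is a one-item list.
            dangling.extend(ref for ref in el.get(attr, "").split() if ref not in seen)
    return duplicates, dangling


# Made-up sample data.
PEOPLE = """<people>
  <person number="p1" name="Mary Hill"/>
  <person number="p2" name="John Hill" parents="p1 p3"/>
  <person number="p2" name="Ann Hill"/>
</people>"""

print(check_ids(ET.fromstring(PEOPLE), "number", ["parents"]))  # (['p2'], ['p3'])
```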

DTD: Examples (1)
<!ATTLIST person
  name   CDATA   #REQUIRED
  number ID      #REQUIRED
  sex    (M | K) #IMPLIED>
<person name="Mary Hill" number="p " sex="M" />
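
Python's standard-library parsers do not validate: expat reports the ATTLIST declarations but does not enforce them. The minimal sketch below uses the slide's declaration; the ID value `p1` is hypothetical. Real DTD validation needs a separate tool, for example the third-party lxml package or `xmllint --valid`.

```python
from xml.parsers import expat

# The slide's declaration in an internal DTD subset; "p1" is a hypothetical ID value.
DOC = """<!DOCTYPE person [
  <!ELEMENT person EMPTY>
  <!ATTLIST person name   CDATA   #REQUIRED
                   number ID      #REQUIRED
                   sex    (M | K) #IMPLIED>
]>
<person name="Mary Hill" number="p1" sex="M"/>"""


def attribute_declarations(xml_text: str) -> list:
    """Collect (element, attribute, type, default, required) for every ATTLIST entry."""
    decls = []

    def on_attlist(elname, attname, att_type, default, required):
        decls.append((elname, attname, att_type, default, bool(required)))

    parser = expat.ParserCreate()
    parser.AttlistDeclHandler = on_attlist
    parser.Parse(xml_text, True)
    return decls


for decl in attribute_declarations(DOC):
    print(decl)
```

Removing the required `number` attribute from the `person` element still parses without error, because expat only reads the declarations and does not check the document against them.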

DTD: Examples (2)
Gold feber

DTD: Examples (3)
<!DOCTYPE DOCUMENT [ ]>
Susan
Jack
Chelsea
David

DTD: Examples (4)

DTD: Examples (5a)

DTD: Examples (5b)
Hamsun, Knut
Markens grøde
Oslo
Aschehoug
1948