1 XML: Intro and formalism (D Nathan)

2 Roots
- A computer is not a typewriter
  - electronic texts are more than sequences of characters
  - they have structure and context
  - they also have multiple readings
- Markup provides a means of making structure, context and readings explicit
  - only that which is symbolically explicit can be digitally processed
  - digital processing is about more than reproducing paper

3 Textual ontologies
- As annotations, markup adds value to data
- It facilitates multiple readings and multiple usages:
  - different contexts
  - different formats
  - different audiences
  - different purposes
- There’s more: texts can not only be read but also analysed and manipulated

4 What is markup, again?
- A way of naming and identifying the parts of a document in a sharable and consistent way
- A way of making explicit the distinctions we want a computer to make when it processes a sequence of characters
- Making the document “machine readable” (computers can read and process it as if they understood it)

5 ... and again?
- “A set of codes that tell an agent how to interpret, process or display content”
- Thus, it’s usually more useful to mark up what things really are than what they look like

6 Example
James Bond
1007 Fast Drive
Aston Martin
420HP DB5
01865 007 080
25/10/06
Dear Mr Khazakstanspy
It is with some regret that....
- What is “25/10/06”?
- How do we know?
- What does the software know?
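A minimal Python sketch of how markup answers the slide's questions. The element and attribute names (letter, sender, date, when and so on) are hypothetical, not taken from the original slide, and reading “25/10/06” as 25 October 2006 (UK day/month/year order) is an assumption; the point is that the markup states that reading explicitly, so software no longer has to guess it.

```python
import datetime
import xml.etree.ElementTree as ET

# Hypothetical markup for the slide's letter. The element and attribute names
# are illustrative only, and reading "25/10/06" as 25 October 2006 (UK
# day/month/year order) is an assumption that the markup now states explicitly.
LETTER = """<letter>
  <sender>
    <name>James Bond</name>
    <address>1007 Fast Drive, Aston Martin, 420HP DB5</address>
    <phone>01865 007 080</phone>
  </sender>
  <date when="2006-10-25">25/10/06</date>
  <salutation>Dear Mr Khazakstanspy</salutation>
  <body>It is with some regret that....</body>
</letter>"""


def letter_date(xml_text: str) -> datetime.date:
    """Return the letter's date, read from the machine-readable 'when' attribute."""
    root = ET.fromstring(xml_text)
    date_elem = root.find("date")
    # The displayed text ("25/10/06") is kept as written for human readers;
    # software uses the ISO 8601 value instead of guessing the field order.
    return datetime.date.fromisoformat(date_elem.get("when"))


print(letter_date(LETTER))  # 2006-10-25
```

The human-readable text and the machine-readable value live side by side: what the date looks like is left alone, and what it is gets marked up.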

7 Design principles
- XML came out of SGML, a system for incremental and collaborative “enrichment” of texts
- XML design principles:
  1. XML shall be straightforwardly usable over the Internet.
  2. XML shall support a wide variety of applications.
  3. XML shall be compatible with SGML.
  4. It shall be easy to write programs which process XML documents.
  5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
  6. XML documents should be human-legible and reasonably clear.
  7. The XML design should be prepared quickly.
  8. The design of XML shall be formal and concise.
  9. XML documents shall be easy to create.
  10. Terseness in XML markup is of minimal importance.

8 XML
- eXtensible Markup Language: a generic markup language
- Simplifies the representation of structured data as linear character strings, i.e. an XML document can be thought of:
  - as a stream of text, and/or
  - as a (tree) structure
- XML looks like HTML, except that it:
  - is extensible
  - must be well-formed
  - can be validated
  - is application-, platform- and vendor-independent
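The two ways of thinking about an XML document can be sketched with Python's standard xml.etree.ElementTree module. The tiny document and its tag names are hypothetical; the same string is read once as a tree (nested elements with depths) and once as a linear stream of start/end events in document order.

```python
import xml.etree.ElementTree as ET

# A tiny illustrative document; the tag names are hypothetical.
DOC = "<s><np>the dog</np> <vp>chased the cat</vp></s>"


def as_tree(xml_text):
    """View 1: a tree. Return (tag, depth) for every element, top-down."""
    def walk(elem, depth):
        yield elem.tag, depth
        for child in elem:
            yield from walk(child, depth + 1)
    return list(walk(ET.fromstring(xml_text), 0))


def as_stream(xml_text):
    """View 2: a linear stream of start/end events, in document order."""
    parser = ET.XMLPullParser(events=("start", "end"))
    parser.feed(xml_text)
    parser.close()  # signal end of input; queued events remain readable
    return [(event, elem.tag) for event, elem in parser.read_events()]


print(as_tree(DOC))    # [('s', 0), ('np', 1), ('vp', 1)]
print(as_stream(DOC))
# [('start', 's'), ('start', 'np'), ('end', 'np'),
#  ('start', 'vp'), ('end', 'vp'), ('end', 's')]
```

Stream processing suits very large files that need not fit in memory; the tree view suits navigation and editing.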

9 XML landscape
- SGML, XML
- Markup languages: HTML
- XML languages: XHTML, MathML, SMIL, IXF, SVG, CBML
- Related technologies:
  - layout: CSS, XSL-FO
  - navigate, query: XPath, XQuery
  - transform: XSLT
  - link: XLink
- Grammars: Schema, DTD
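As a taste of the “navigate, query” row, Python's ElementTree supports a limited subset of XPath path expressions (full XPath and XQuery need other tools). The catalogue document here is hypothetical, with a “base” attribute that simply mirrors the slide's grouping.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalogue mirroring part of the slide's landscape.
LANDSCAPE = """<landscape>
  <lang name="HTML" base="SGML"/>
  <lang name="XHTML" base="XML"/>
  <lang name="MathML" base="XML"/>
  <lang name="SVG" base="XML"/>
  <tech name="XSLT" role="transform"/>
  <tech name="XPath" role="query"/>
  <tech name="XLink" role="link"/>
</landscape>"""

root = ET.fromstring(LANDSCAPE)

# ".//lang[@base='XML']": every <lang> descendant whose base attribute is XML.
xml_languages = [e.get("name") for e in root.findall(".//lang[@base='XML']")]
print(xml_languages)  # ['XHTML', 'MathML', 'SVG']

# "tech[@role='transform']": the first <tech> child with role="transform".
print(root.find("tech[@role='transform']").get("name"))  # XSLT
```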

10 XML Formalism
- Create explicit formal structures using only plain text
- Structures are defined by tags in angle brackets (< and >)
- Tags are usually in pairs, a start (open) tag and an end (close) tag: the dog chased...
- but can also be single and closed: the dog sat down
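A short Python sketch of paired and single (self-closing) tags. The tag names (sentence, subject, pause) are hypothetical stand-ins, since the slide's own examples did not survive; text after an end tag is what ElementTree calls the element's “tail”.

```python
import xml.etree.ElementTree as ET

# Hypothetical tags: a start/end pair around "the dog", and a single
# self-closing tag marking a point in the text rather than a span.
SENTENCE = "<sentence><subject>the dog</subject> sat down<pause/></sentence>"

root = ET.fromstring(SENTENCE)
subject = root.find("subject")
print(subject.text)              # the dog      (between <subject> and </subject>)
print(repr(subject.tail))        # ' sat down'  (after </subject>, before <pause/>)
print(root.find("pause").text)   # None         (a self-closing tag encloses nothing)

# Going the other way: ElementTree writes a childless, textless element
# as a single self-closing tag.
print(ET.tostring(ET.Element("pause"), encoding="unicode"))  # <pause />
```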

11 Elements, tags and content
- Elements
- Tags (opening, closing, empty)
- Content
  is not empty; it has no content
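In the XML specification an element with no content is empty, whether it is written with a single tag or with an adjacent start/end pair. A small Python check makes the distinction concrete; has_content is a hypothetical helper, not a library function.

```python
import xml.etree.ElementTree as ET


def has_content(elem: ET.Element) -> bool:
    """True if the element contains any character data or child elements."""
    return bool(elem.text) or len(elem) > 0


print(has_content(ET.fromstring("<pause/>")))          # False
print(has_content(ET.fromstring("<pause></pause>")))   # False: the same empty element
print(has_content(ET.fromstring("<pause> </pause>")))  # True: a space is character data
print(has_content(ET.fromstring("<p><pause/></p>")))   # True: a child element is content
```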

12 Attributes and values
- Tags can have attributes with values: the dog sat down
- Attribute names within an element must be unique
- The order of attribute/value pairs is insignificant: the dog sat
- Often attribute values have to be drawn from a closed set, e.g. consider: Fifi ?
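The three attribute rules can be tried out with Python's ElementTree. The dog element, its name and size attributes, and the ALLOWED_SIZES set are all hypothetical. Note that ElementTree is a non-validating parser: it enforces unique names, but a closed set of values is normally declared in a DTD or schema, so here it is checked by hand.

```python
import xml.etree.ElementTree as ET

# 1. Order of attribute/value pairs is insignificant.
a = ET.fromstring('<dog name="Rex" size="small">the dog sat</dog>')
b = ET.fromstring('<dog size="small" name="Rex">the dog sat</dog>')
print(a.attrib == b.attrib)  # True: the same name/value pairs

# 2. Attribute names must be unique within an element.
try:
    ET.fromstring('<dog name="Rex" name="Fido"/>')
except ET.ParseError as err:
    print("rejected:", err)  # the parser reports a duplicate attribute

# 3. A closed set of values. A non-validating parser does not check this
#    (a DTD or schema can), so this hypothetical check does it by hand.
ALLOWED_SIZES = {"small", "medium", "large"}


def checked_size(elem: ET.Element) -> str:
    value = elem.get("size")
    if value not in ALLOWED_SIZES:
        raise ValueError(f"size={value!r} is not one of {sorted(ALLOWED_SIZES)}")
    return value


print(checked_size(a))  # small
```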

13 Names
- You can name your elements, attributes or values (almost) anything, but...
- Names must begin with a letter (e.g. “a-z”) or “_”, not with a digit, hyphen or full stop
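A quick way to test the naming rule is to ask a real parser. The helper name_ok is hypothetical; it wraps a candidate name in a self-closing tag and reports whether Python's ElementTree accepts it as well-formed.

```python
import xml.etree.ElementTree as ET


def name_ok(name: str) -> bool:
    """Hypothetical helper: does <name/> parse as a well-formed element?"""
    try:
        ET.fromstring(f"<{name}/>")
    except ET.ParseError:
        return False
    return True


print(name_ok("_note"))   # True
print(name_ok("note2"))   # True: digits are fine after the first character
print(name_ok("Note"))    # True, but a different name from "note" (case matters)
print(name_ok("2note"))   # False: cannot start with a digit
print(name_ok("-note"))   # False: cannot start with a hyphen
```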

14 Characters
- XML must be ASCII or Unicode
- XML is case sensitive; in general, use lower case
- Reserved characters and their entity references:
  - less-than (<): &lt;
  - greater-than (>): &gt;
  - ampersand (&): &amp;
  - quote ("): &quot;
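Python's standard library can do the substitution for you. This sketch uses xml.sax.saxutils.escape, which always replaces &, < and > and accepts an extra mapping for other characters (here the double quote), plus ElementTree, which escapes text automatically when serialising. The sample strings are made up.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

text = 'if a < b && b > c then say "yes"'

# escape() always replaces &, < and >; the extra mapping adds the double quote.
escaped = escape(text, {'"': "&quot;"})
print(escaped)  # if a &lt; b &amp;&amp; b &gt; c then say &quot;yes&quot;
print(unescape(escaped, {"&quot;": '"'}) == text)  # True

# ElementTree escapes reserved characters automatically when serialising.
p = ET.Element("p")
p.text = "a < b & c"
print(ET.tostring(p, encoding="unicode"))  # <p>a &lt; b &amp; c</p>
```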

15 Character entity references
- “Stand in” for reserved characters, e.g. &lt;
- Provide standardised references, e.g. &t-pal;
- Provide “short cuts” for strings, e.g. &n;
- Have to be declared (apart from the five predefined ones, such as &lt; and &amp;), but can be created to purpose
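A minimal sketch of declaring a “short cut” entity in a document's internal DTD subset and letting a parser expand it. The entity name n and its replacement text are hypothetical. The predefined &lt; needs no declaration, while a reference to an undeclared entity makes the document not well-formed.

```python
import xml.etree.ElementTree as ET

# A hypothetical "short cut" entity, declared in the internal DTD subset.
DOC = """<!DOCTYPE note [
  <!ENTITY n "D Nathan">
]>
<note>Slides by &n; (3 &lt; 4)</note>"""

print(ET.fromstring(DOC).text)  # Slides by D Nathan (3 < 4)

# A reference to an entity that was never declared is a well-formedness error.
try:
    ET.fromstring("<note>&undeclared;</note>")
except ET.ParseError as err:
    print("rejected:", err)
```

Because entity expansion can be abused (for example, deeply nested entities that expand enormously), documents from untrusted sources deserve care.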

16 Syntax
- Nesting (hierarchy), but no overlap: the cat sat on the mat
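The no-overlap rule is enforced by every XML parser. In this Python sketch (the phrase tags are hypothetical), properly nested markup parses into a tree, while markup in which one element opens inside another but closes outside it is rejected.

```python
import xml.etree.ElementTree as ET

# Hypothetical phrase tags. In NESTED every element closes inside its parent;
# in OVERLAPPING, <vp> opens inside <np> but closes outside it.
NESTED = "<s><np>the cat</np> <vp>sat <pp>on <np>the mat</np></pp></vp></s>"
OVERLAPPING = "<s><np>the cat <vp>sat</np> on the mat</vp></s>"

tree = ET.fromstring(NESTED)
print([e.tag for e in tree.iter()])  # ['s', 'np', 'vp', 'pp', 'np']

try:
    ET.fromstring(OVERLAPPING)
except ET.ParseError as err:
    print("rejected:", err)  # the parser reports a mismatched tag
```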

17 More syntax
- All elements must be closed
- Every attribute must have a value; values must be enclosed in plain (straight) quotes, double or single
- There are no size or number limits
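These rules can be checked against a real parser. The helper is_well_formed is hypothetical, and each test case is annotated with the rule it exercises; the curly “smart” quotes that word processors insert are a common way to break attribute quoting.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Hypothetical helper: True if ElementTree parses the text without error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


CASES = [
    ("<p>unclosed", False),           # element never closed
    ("<p>closed</p>", True),
    ("<dog size=small/>", False),     # attribute value not quoted
    ('<dog size="small"/>', True),
    ("<dog size='small'/>", True),    # single quotes are allowed too
    ("<dog size=“small”/>", False),   # curly "smart" quotes are not quote marks
    ("<input checked/>", False),      # every attribute needs a value
]

for xml_text, expected in CASES:
    print(f"{xml_text!r}: {is_well_formed(xml_text)}")
    assert is_well_formed(xml_text) == expected
```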

18 The XML document
- A plain text file
- Main parts: prolog, body
- The body has a single root node (= element)
- Comments: <!-- ... -->
- Processing instructions (PI): <?target ... ?>
- The optional XML declaration looks like a special PI: <?xml version="1.0"?>
- Document type declaration: <!DOCTYPE root [ ... ]>
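A sketch of a complete (if tiny) document with every part listed on the slide: XML declaration, document type declaration, comment, processing instruction and a single root element. The letter element, the stylesheet name letter.css and the DTD content are hypothetical. ElementTree does not validate against the DTD, but it does enforce the single-root rule.

```python
import xml.etree.ElementTree as ET

# Prolog (declaration, doctype, comment, PI) followed by the single root element.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE letter [
  <!ELEMENT letter (#PCDATA)>
]>
<!-- a comment in the prolog -->
<?xml-stylesheet type="text/css" href="letter.css"?>
<letter>Dear Mr Khazakstanspy</letter>"""

root = ET.fromstring(DOCUMENT)
print(root.tag, "|", root.text)  # letter | Dear Mr Khazakstanspy

# Two top-level elements break the single-root rule.
try:
    ET.fromstring("<letter/><letter/>")
except ET.ParseError as err:
    print("rejected:", err)
```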

19 XML document layout  Is unimportant! ... in most circumstances, but some applications might treat the white space differently

20 This is the same as... Before the hammer descends on cap, his shield demolishes the evil mechanism! KRAK! The screaming suddenly stops-- and, in the ensuing silence, both men sink slowly to the ground...

21 ... this! Before the hammer descends on cap, his shield demolishes the evil mechanism! KRAK! The screaming suddenly stops-- and, in the ensuing silence, both men sink slowly to the ground...
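Slides 19 to 21 can be mirrored in Python: the same markup written compactly and with indentation differs as raw text, but is the same structure once whitespace around tags is ignored. The panel, caption and sfx tags are hypothetical, and same_ignoring_layout is a hypothetical helper. Stripping leading and trailing whitespace is itself a choice; an application that treats white space as significant would compare differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical tags for the comic-panel text; only the layout differs.
COMPACT = ("<panel><caption>Before the hammer descends on cap, his shield "
           "demolishes the evil mechanism!</caption><sfx>KRAK!</sfx></panel>")
INDENTED = """<panel>
    <caption>Before the hammer descends on cap, his shield demolishes the evil mechanism!</caption>
    <sfx>KRAK!</sfx>
</panel>"""


def same_ignoring_layout(a: ET.Element, b: ET.Element) -> bool:
    """Compare two elements, treating whitespace around tags as insignificant."""
    if a.tag != b.tag or a.attrib != b.attrib:
        return False
    if (a.text or "").strip() != (b.text or "").strip():
        return False
    if (a.tail or "").strip() != (b.tail or "").strip():
        return False
    return len(a) == len(b) and all(
        same_ignoring_layout(x, y) for x, y in zip(a, b))


print(COMPACT == INDENTED)  # False: the raw text differs
print(same_ignoring_layout(ET.fromstring(COMPACT), ET.fromstring(INDENTED)))  # True
```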

22 Putting it together

23 ... in XML (The Guardian, July 1, 1997, Andrew Higgins in Hong Kong) A last hurrah and an empire closes down With a clenched-jaw nod from the Prince of Wales, a last rendition of God Save the Queen, and a wind machine to keep the Union flag flying for a final 16 minutes of indoor pomp...
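A sketch of how the article on this slide might be marked up and then queried. The element and attribute names (article, source, byline, headline, title and so on) are hypothetical, not the ones used in the original presentation. Once the date, author and place are tagged, software can retrieve them directly instead of parsing the bracketed credit line.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for the article on the slide; tag and attribute
# names are illustrative, not those used in the original presentation.
ARTICLE = """<article>
  <source date="1997-07-01">The Guardian</source>
  <byline place="Hong Kong">Andrew Higgins</byline>
  <headline>A last hurrah and an empire closes down</headline>
  <p>With a clenched-jaw nod from the Prince of Wales, a last rendition of
  <title>God Save the Queen</title>, and a wind machine to keep the Union
  flag flying for a final 16 minutes of indoor pomp...</p>
</article>"""

root = ET.fromstring(ARTICLE)
print(root.findtext("source"), root.find("source").get("date"))
# The Guardian 1997-07-01
print(root.findtext("byline"), "in", root.find("byline").get("place"))
# Andrew Higgins in Hong Kong
print(root.findtext("headline"))
# A last hurrah and an empire closes down
print([t.text for t in root.iter("title")])  # ['God Save the Queen']
```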

24 XML-capable software (other than displaying “raw” XML)
- most browsers, which support XML, CSS and XSLT
- software using XML-based data formats, e.g. Transcriber (it may keep the XML hidden, but you can often manipulate it)
- software that exports data in some XML format, e.g. MS Excel, Toolbox, FileMaker Pro
- dedicated XML editing software, e.g. oXygen

