Published by Nigel Pearson. Modified over 9 years ago.
1
XML : a brief introduction
Managing networks : understanding new technologies, Birmingham, 13 September 2001
Pete Johnston
UKOLN, University of Bath
Bath, BA2 7AY
UKOLN is supported by:
Email p.johnston@ukoln.ac.uk
URL http://www.ukoln.ac.uk/
2
XML: a brief introduction
Markup & markup languages
SGML & XML
Two perspectives on XML
Some features of XML
XML & HTML
Uses of XML
3
Markup & markup languages
Markup
  – text added to the data content of a document in order to convey information about that data
  – markup pre-dates computers!
Marked-up document contains
  – data and
  – information about that data (markup)
4
Markup & markup languages
Markup language
  – formalised system for providing markup
Definition of markup language specifies
  – what markup is allowed
  – how markup is distinguished from data
  – what markup means
5
Exercise 1
From your own experience, can you suggest
  – some instances of where markup is used?
  – some examples of markup languages?
6
SGML and XML
Standard Generalized Markup Language (SGML)
  – ISO 8879:1986
  – general, flexible, powerful
  – used (mainly but not exclusively) in large publishing environments
Extensible Markup Language (XML)
  – Recommendation of W3C, 1998, 2000
  – subset of SGML
  – less flexible; easier to implement and use
  – used (increasingly) everywhere… often invisibly…
Both define a means of describing tree-structured data in text format, using markup embedded in the data
7
SGML and XML
  – not strictly markup languages!
  – “meta-languages”: languages for describing markup languages
  – can define an unlimited number of markup languages
All conforming languages can be processed by a single program (“parser”)
Rules made public so any programmer can write a parser
Many parsers available to the application developer
Data independent of platform and vendor
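To illustrate the point that one parser can process any conforming language, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The two vocabularies (`memo` and `order`) and the helper name `outline` are made up for illustration; they do not come from the presentation.

```python
import xml.etree.ElementTree as ET

def outline(xml_text):
    """Return (depth, element type name) pairs for any well-formed document."""
    rows = []

    def walk(elem, depth):
        rows.append((depth, elem.tag))
        for child in elem:
            walk(child, depth + 1)

    walk(ET.fromstring(xml_text), 0)
    return rows

# Two unrelated, hypothetical markup languages handled by the same code.
memo = "<memo><to>Ann</to><body>Meeting at <time>10:00</time></body></memo>"
order = "<order><item><sku>A1</sku><qty>2</qty></item></order>"

memo_outline = outline(memo)
order_outline = outline(order)
```

The parser knows nothing about memos or orders; it only relies on the shared syntax rules, which is what makes the data independent of any one application.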
8
A document perspective (1)
Individual documents have structure
  – component parts
  – relationships between parts
Physical structure
  – depends on medium
Logical structure
  – hierarchical, tree structure
  – independent of physical rendition
Document types
  – set of documents sharing common logical structural model
9
A document perspective (2)
Logical structure communicated to human reader through presentational conventions
Presentation defined by “procedural” markup
  – instructs “agent” what to do with text
  – e.g. how to format it
Problems
  – markup specific to processing system
  – specific to delivery medium
  – human interprets logical structure but software can’t
10
A document perspective (3)
Descriptive markup
  – identifies the logical components of a document
  – does not specify what procedures are to be applied to text
  – so processing (e.g. how to format it) must be specified separately
Benefits
  – markup (potentially) independent of processing system
  – permits reuse and delivery to multiple media
  – makes logical structure available to software
N.B. exchange requires consensus on what markup means!
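A small Python sketch of what "formatting specified separately" can look like in practice: one descriptively marked-up source, two independent renderings. The `note`, `title` and `para` element names and both rendering functions are assumptions chosen for the example, not part of the talk.

```python
import xml.etree.ElementTree as ET
from html import escape

# Descriptive markup: says what the parts are, not how they should look.
SOURCE = "<note><title>Reminder</title><para>Bring the report.</para></note>"

def to_plain_text(doc):
    """One possible rendering for a text-only medium."""
    return f"{doc.findtext('title').upper()}\n\n{doc.findtext('para')}"

def to_html(doc):
    """A second rendering of the same source for a web browser."""
    title = escape(doc.findtext("title"))
    para = escape(doc.findtext("para"))
    return f"<h1>{title}</h1><p>{para}</p>"

doc = ET.fromstring(SOURCE)
plain = to_plain_text(doc)
web = to_html(doc)
```

Both functions depend on agreeing what `title` and `para` mean, which is the consensus the slide's N.B. refers to.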
11
Exercise 2
HTML
  – conceived as describing the logical structure of hypertext document
  – acquired features which described presentation
  – extended by browser vendors
In the HTML examples, can you see
  – where markup describes presentation?
  – where markup describes logical structure?
12
A data perspective (1)
The structured document is just one type of structured data
Other types of structured data can be represented as tree structures
A “serialization” syntax is useful for various sorts of structured data (relational, object etc.)
  – for exchange between application programs on different platforms, across networks etc.
SGML is too complex and “heavyweight” for this, but XML is ideal
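As a rough sketch of XML used as a serialization syntax, the Python below writes a simple record out as XML text and reads it back. The `order` record, its field names, and the two helper functions are hypothetical; a real exchange format would be agreed between the communicating parties.

```python
import xml.etree.ElementTree as ET

def record_to_xml(name, record):
    """Serialize a flat dict as one element with a child element per field."""
    root = ET.Element(name)
    for field, value in record.items():
        ET.SubElement(root, field).text = str(value)
    return ET.tostring(root, encoding="unicode")

def xml_to_record(xml_text):
    """Rebuild a dict from the element tree; all values come back as text."""
    return {child.tag: child.text for child in ET.fromstring(xml_text)}

order = {"customer": "Ann", "item": "widget", "quantity": 3}
text = record_to_xml("order", order)
restored = xml_to_record(text)
```

Note that `quantity` returns as the string `"3"`: plain XML carries character data only, so the receiving application (or a schema) has to supply the data types.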
13
A data perspective (2)
A “document” might be any collection of information processed as a unit
  – a report
  – a patient record
  – a purchase order transaction
  – a configuration file for an operating system
  – some “structured information about a resource” (a metadata record)
  – … etc!
Applications less concerned with publishing, formatting, presentation
14
XML : elements
XML uses embedded tags to delimit and label parts of document
  – tags
Elements
  – containers delimited by tags which include element type name
  – start tag
  – end tag
Elements may contain
  – character data
  – other elements
  – both of the above
  – nothing (empty elements)
Document element as root of element tree
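The four kinds of element content can be seen in a short Python sketch. The `letter` document and the `content_kind` helper are invented for illustration; `xml.etree.ElementTree` is Python's standard-library parser.

```python
import xml.etree.ElementTree as ET

# <letter> is the document element, the root of the element tree.
doc = ET.fromstring(
    "<letter>"
    "<sender>Pete</sender>"                    # character data only
    "<address><city>Bath</city></address>"     # other elements only
    "<body>See you <emph>soon</emph>.</body>"  # both of the above
    "<signed/>"                                # empty element
    "</letter>"
)

def content_kind(elem):
    """Classify an element by what sits between its start and end tags."""
    has_text = bool((elem.text or "").strip()) or any(
        (child.tail or "").strip() for child in elem
    )
    has_children = len(elem) > 0
    if has_text and has_children:
        return "mixed"
    if has_children:
        return "elements"
    if has_text:
        return "character data"
    return "empty"

kinds = {child.tag: content_kind(child) for child in doc}
```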
15
XML : attributes
Attributes
  – pairs of names and values
  – occur inside element start tag, after element type name
Element can contain only one occurrence of each attribute
Attribute values may contain
  – character data only
Attribute values must be surrounded by quotes
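A brief Python sketch of the attribute rules, assuming a made-up `book` element with `isbn` and `lang` attributes. The parser exposes attributes as name/value pairs and rejects a repeated attribute or an unquoted value.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """True if the standard-library parser accepts the text as XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Attributes sit in the start tag, after the element type name.
book = ET.fromstring('<book isbn="0-123" lang="en">XML basics</book>')
attributes = dict(book.attrib)

duplicate_ok = is_well_formed('<book lang="en" lang="fr"/>')  # same attribute twice
unquoted_ok = is_well_formed("<book lang=en/>")               # value not in quotes
```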
16
XML : elements & attributes
Nouns and adjectives?
  – use character data for “content”
  – use attributes for “information about content”
Document-centric view?
No hard and fast rules
Design decisions tend to be based (wrongly!) on behaviour of tools
XML documents are human-readable…
… but ease of human-readability may not be the most important consideration in their design
17
XML : document types & vocabularies
“XML lets me make up names for element types! Great!”
But…
  – XML says nothing about what your names mean
  – will a human recipient of your document recognise your element?
  – will a software agent process your element correctly?
Communication requires consensus on
  – structural model of class of document/data
  – labelling of components
  – semantics of components
Shared use of common XML “vocabularies”
18
XML : DTDs, XML Schemas
Two methods to codify syntax rules of vocabulary used to describe document type
  – what markup is allowed
  – structural constraints on use of markup
  – say nothing about what markup means
Document Type Definition (DTD)
  – inherited from SGML
  – part of XML Recommendation
XML Schema
  – recent recommendation of W3C
  – support for data-typing i.e. tighter control on element content
  – support for combining vocabularies
  – use XML syntax
19
XML : Validation & well-formedness
Validation
  – parser can check markup of an individual document against rules expressed in DTD or Schema
  – authoring tool can enforce rules of DTD/Schema while document is edited
Well-formed documents
  – not checked against DTD/Schema, but do follow basic syntax rules, e.g.
  – all tags use proper delimiters
  – all elements have start and end tags
  – all elements properly nested
  – attribute values in quotes
  – appropriate use of special characters
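The well-formedness rules can be tried out with any XML parser. The Python sketch below uses the standard library's non-validating parser on a few made-up `memo` samples; it checks well-formedness only and does not attempt validation against a DTD or Schema.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """True if the text obeys XML's basic syntax rules."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

samples = {
    "ok": "<memo><to>Ann</to></memo>",
    "missing end tag": "<memo><to>Ann</memo>",
    "bad nesting": "<memo><b><i>urgent</b></i></memo>",
    "raw ampersand": "<memo>fish & chips</memo>",
    "escaped ampersand": "<memo>fish &amp; chips</memo>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
```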
20
Exercise 3
Well-formedness
  – Identify the errors which mean that the three examples are not well-formed XML
  – How would you correct the errors?
21
XML : namespaces (1)
Applications wish to use elements from multiple vocabularies (DTDs/Schemas)
  – particularly true of metadata applications
Problems of “name collisions”
  – in GPs Directory Schema
  – in MPs Appointments Schema
XML Namespaces
  – recommendation of W3C
  – provides universal naming mechanism
A Namespace is a collection of names
A Namespace is itself given a name, which has the form of a URI
22
XML : namespaces (2)
Element type names and attribute names can be qualified by a namespace name (a URI)
Association with namespace through use of a namespace prefix
Declaration of namespace
  – xmlns:health="http://nhs.gov.uk/xml/"
  – xmlns:parl="http://gov.gov.uk/xml/"
Use of qualified name
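A Python sketch of qualified names in use. The two namespace URIs and the `health` and `parl` prefixes are taken from the slide; the colliding element name `surgery`, the wrapper element `directory`, and the text content are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<directory xmlns:health="http://nhs.gov.uk/xml/" '
    'xmlns:parl="http://gov.gov.uk/xml/">'
    "<health:surgery>Dr Smith, weekdays</health:surgery>"
    "<parl:surgery>Constituency office, Fridays</parl:surgery>"
    "</directory>"
)

# The parser expands each prefix to its URI, so the two names no longer collide.
tags = [child.tag for child in doc]

# The prefix is only a local shorthand: a different prefix bound to the
# same URI finds the same element.
gp_surgery = doc.findtext("h:surgery", namespaces={"h": "http://nhs.gov.uk/xml/"})
```

The URI serves purely as a unique name here; the parser never tries to retrieve anything from it.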
23
XML and HTML
HyperText Markup Language (HTML)
  – recommendation of W3C (version 4.01)
  – designed as an application of SGML (not XML)
  – simple, easy to create
  – (partial?) support in browsers, editors
  – mixes description of structure and presentation
Browsers
  – permissive: will display invalid HTML
  – support proprietary extensions
Context
  – explosion of Web
  – new devices
24
XML and HTML (2)
XHTML 1.0
  – expression of HTML 4.01 as XML (not SGML)
  – same features but restrictions on syntax
  – case sensitivity, XML well-formedness rules
  – current W3C recommendation for creation of docs for Web
XHTML 1.1
  – modularisation of XHTML
  – separation of structural markup from presentational markup
  – support for managing extensions
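The syntax restrictions XHTML adds can be seen by feeding two fragments to an XML parser. This is a hedged Python sketch: both fragments are invented examples, and the standard-library parser checks only XML well-formedness, not conformance to the XHTML DTDs.

```python
import xml.etree.ElementTree as ET

def parses_as_xml(markup):
    """True if an XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

# Typical permissive HTML: mixed-case tag names and an unclosed <br>.
html_style = "<P>First line<br>second line</p>"

# The same content following XML rules: consistent case, empty-element syntax.
xhtml_style = "<p>First line<br />second line</p>"

html_ok = parses_as_xml(html_style)
xhtml_ok = parses_as_xml(xhtml_style)
```

A browser would display both fragments; only the second can be handled by generic XML tools.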
25
Uses of XML (1)
Data (and metadata) exchange
  – e-commerce
  – e-government (http://www.govtalk.gov.uk)
  – rights management
  – bibliographic data
  – news syndication
  – scientific data
  – health: patient records
  – (… plus hundreds more…)
  – Web services
Within systems and between systems
Many standards/protocols built on XML
26
Uses of XML (2)
Storage
  – publishing
  – scholarly texts
  – archival finding aids
  – document management
  – …
  – preservation
27
XML : summary (1)
Means of describing structured data in text format
Independent of platform, vendor
  – reuse of data
  – exchange of data
Used
  – for many types of structured data
  – in many different applications
  – both for storage and exchange
  – data may be stored in database, exposed as XML
28
XML : summary (2)
Use of XML
  – usually invisible to end-user
  – increasingly invisible to information manager?
  – generated and consumed by software
  – requires consensus amongst communication partners
29
Acknowledgements
UKOLN is funded by Resource: the Council for Museums, Archives and Libraries, the Joint Information Systems Committee (JISC) of the UK higher and further education funding councils, as well as by project funding from the JISC and the European Union. UKOLN also receives support from the University of Bath where it is based.
http://www.ukoln.ac.uk/