Introduction to Informatics - Fall 02

I. What is XML?
    - XML and HTML
    - Where does it fit in with other markup languages?
II. How does it work?
    - Your own private language
    - DTDs and schemas
    - XSLT: Extensible Stylesheet Language Transformations
    - XPath, XLink, XPointer, XForms
III. How will it change the web?
    - Examples of XML applications

I. What is XML?
- XML is Extensible Markup Language
- It is a meta-language
    - It is a language used to create languages that can describe data
- It is extensible
    - Authors can define their own tags and attributes that can be easily processed and displayed across platforms
- XML became a World Wide Web Consortium (W3C) Recommendation on 2/10/98 (corrected 10/6/00)

- Phase 1: Began 6/96, ended in the W3C XML 1.0 Recommendation, 2/98 (revised 10/00)
- Phase 2: Began 2/98
    - Working Groups developed the Recommendation for Namespaces in XML (1/99) and the Recommendation for style sheet linking (6/99)
- Phase 3: Began 9/99 with unfinished work from phase 2 and ended 5/02
    - Introduced a Working Group on XML Query
    - XML Protocol Activity was launched in 9/00
- Phase 4: Began 5/02
    - Focus on completing work in progress, cleaning up existing specs, and aligning them better with each other and with other W3C specifications

So what's wrong with HTML?
- It's simple enough for children to use
    - This is because it is rigid and inflexible
- It does a good job representing the structure and format of documents
    - It can't tell us anything about the meaning of documents
- It can be used across platforms
    - It is rife with proprietary markup
- It can be searched
    - The inability of search engines to capture the meaning of content leads to poor performance

So what's right with XML?
- It should be easily usable over the Internet
    - Web servers should require minimal configuration changes to be able to serve XML documents
- It should be easy to write programs that process XML documents
    - Experimental XML software is written in Java, with some XML parsers contained in class files of a few KB
- XML documents should be human-legible and clear
    - Users of XML can create their own tags and attributes with self-explanatory names
    - An XML file should be as readable as plain text

- The XML standard should be prepared quickly
- The design of XML shall be formal and concise
    - Syntax descriptions in the XML specification use a formal grammar that is concise, easy to understand, and easy to translate into code
- XML documents shall be easy to create
    - Well-formedness enables you to quickly mark up any document or translate it from HTML to XML
- Terseness in XML markup is of minimal importance
    - Clear and unambiguous syntax is always given preference over saving a few keystrokes

- XML is used to create specialized markup languages by defining sets of tags and attributes
    - It is a subset of SGML and allows "generalized markup"
    - It is useful for storing structured data that will be published in a variety of media
- By itself, XML does not define any tags
    - You create your own tags (your own markup language)
    - CML: Chemical Markup Language
    - MathML: Mathematical Markup Language
    - ebXML: Electronic Business Markup Language
- Properly done, XML documents can be viewed across platforms

- XML describes data in a human-readable and machine-understandable format
    - This format is intended to capture the meaning of the data
    - There is no indication of how the data are to be displayed
- It is a database-neutral and device-neutral language
    - Data marked up in XML can be targeted to different formats
    - XML can also be used to publish data on different platforms

[Figure: Some relationships among markup languages. Languages shown: SGML, HTML, HTML 3.2, HTML 4.01, XHTML, XML, CSS, CML, ebXML, MML, XSLT]

[Figure: How XML supports other Web markup languages and applications]

II. How does it work?
- An XML document is actually composed of three different files

1. The raw XML file (.xml)
- This file has the basic data marked up with XML tags
- It will contain markup that will link the file to both the DTD (or "schema") and the XSL stylesheet
- It must follow certain rules to be considered "well formed" and "valid"
    - This is necessary if the document is to be displayed by a browser or parser

Here's a simple HTML document:

    Memo form
    TO: John Doe
    CC: Jane Doe
    FROM: Bozo T. Clown
    Please take note: our phone number has changed.
    Yours in clownitude, Bozo

XML reflects the structure of the data by creating tags identifying:
- The type of document as a <memo>
- Its content divisions: a <header> and a <memotext>
- When it was sent: <date>
- An addressing scheme with two types of actions: <to> and <cc>
- The sender of the message as <sender>
- The name of the recipient as <name>
- The text of the memo
- The signature as an entity called &sig;

Here's the same document as an XML file

    To: John Doe
    CC: Jane Doe
    From: Bozo T. Clown
    Please take note our phone number has changed.
    &sig;

Rules for writing XML
- There must be a "root element"
- Documents must be "well formed"
    - Elements must be properly nested
- If a DTD is used, documents must be "valid"
    - Markup on the document must conform to the DTD
- Every tag must be closed
    - Empty tags are closed with a slash
- XML is case sensitive
- All attribute values must be in quotation marks
- All entities must be declared in a DTD before being referenced in a document (the five predefined ones, &lt; &gt; &amp; &apos; &quot;, need no declaration)

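Several of these rules can be tried out with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree module, which checks well-formedness only (it does not validate against a DTD). The sample documents, including the <urgent/> element, are made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical snippets, one per rule
samples = {
    "properly nested": '<memo type="informative"><to>John Doe</to></memo>',
    "empty tag closed with a slash": "<memo><urgent/></memo>",
    "tag never closed": "<memo><to>John Doe</memo>",
    "improper nesting": "<memo><to><name>John Doe</to></name></memo>",
    "case mismatch": "<memo><To>John Doe</to></memo>",
    "unquoted attribute value": "<memo type=informative></memo>",
    "two root elements": "<memo/><memo/>",
    "undeclared entity": "<memo>&sig;</memo>",
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

Only the first two samples should be accepted; each of the others breaks one rule from the list.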
2. A Document Type Definition (DTD)
- It is a set of rules that defines the elements, attributes, entities and other components that can be used in XML files
    - It determines how they can be used
    - It also specifies how they are logically related
    - Elements in a DTD are hierarchical and nested
- DTDs can be internal (within the document) or external (.dtd extension)
- For the XML document to be "valid," it must conform to the rules laid out in the DTD to which it is linked

DTDs have Elements
- These are the basic tags used in the markup
    - One must be a "root element" and is the most inclusive container
    - All other elements are nested within it
- An element can be defined by using other elements
    - It can also be defined as containing text (#PCDATA)
    - The sequence determines the nesting
    - Elements defined in the DTD must appear in the document
    - There is special markup that allows choice

The generic form of an element is:

    <!ELEMENT element_name (rule)>

- The "rule" is the "content model" of the element
    - It specifies the nested elements used to define the main element
    - It also specifies the order in which the elements must appear
- In our example the root element is <memo>
    - It is defined in terms of <header> and <memotext>
    - It is written as: <!ELEMENT memo (header, memotext)>

DTDs have Attributes
- These contain additional information associated with the element
    - The information is a form of metadata
    - It is "about" the element rather than part of the element
- They are useful for enumerated data (ex: product id #)
- There is a small predefined set of attributes that can be used
- Attributes and their values appear in the opening tag of a paired tag (or in the unpaired tag)

The generic form of an attribute is:

    <!ATTLIST element_name attribute_name attribute_type default_value>

- The element name is required because attributes must be attached to elements
- There is a set of attribute types that can be used to specify categories of content (for example)
    - CDATA: Character data (anything except markup)
    - ID: unique value (can only appear once in a document)
    - NOTATION: provides processing instructions (how to open a binary file)

- In our example there is an attribute called "type" that is placed in the opening <memo> tag
    - The value is "informative"
    - Assume this is one of several types of memos that could be sent
- In a DTD, it might look like this:

    <!ATTLIST memo type (informative | directive | scheduling) #REQUIRED>

- The "|" (pipe) is a separator
    - It sets a condition where only one value from the list may appear in the document markup

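A program reading the memo sees the attribute as a name/value pair on the element. The sketch below is only an illustration: Python's standard parser does not enforce a DTD, so the enumerated list is checked by hand, and the function name memo_type is made up for this example.

```python
import xml.etree.ElementTree as ET

# The values allowed by the enumerated list in the DTD
ALLOWED_TYPES = {"informative", "directive", "scheduling"}

def memo_type(text):
    """Return the memo's type attribute, or raise ValueError if it is missing or not allowed."""
    root = ET.fromstring(text)
    value = root.get("type")  # None when the attribute is absent
    if value not in ALLOWED_TYPES:
        raise ValueError(f"type must be one of {sorted(ALLOWED_TYPES)}, got {value!r}")
    return value

print(memo_type('<memo type="informative"></memo>'))
```

A validating parser would perform this kind of check automatically from the ATTLIST declaration.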
- Entities provide a type of shorthand in XML markup
    - They reference text or other elements and call them when used in the DTD or document
- General entities place data into the document
    - Internal means that they are used only within the document
    - External means that they are in an external DTD and can be reused
- Parameter entities are used in the DTD
    - They can refer to another element or group of elements and can be reused in the same or different DTDs

The entity has the generic form:

    <!ENTITY entity_name "replacement text">

- In the example, it appears in the DTD as:

    <!ENTITY sig "Yours in clownitude, Bozo">

- In our example, we represented a text string with an entity
    - "Yours in clownitude, Bozo" was represented in the document with: &sig;
    - The entity is expanded when the document is parsed
- This is a convenient way to include large blocks of text that only have to be entered once

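Entity expansion at parse time can be observed directly. In this sketch the entity is declared in an internal DTD subset so the example is self-contained; the element names memo and memotext are assumptions based on the memo example, and Python's standard xml.etree.ElementTree parser is used.

```python
import xml.etree.ElementTree as ET

# Internal DTD subset declaring the sig entity, followed by a memo that references it
document = """<!DOCTYPE memo [
  <!ENTITY sig "Yours in clownitude, Bozo">
]>
<memo><memotext>Please take note: our phone number has changed. &sig;</memotext></memo>
"""

root = ET.fromstring(document)
text = root.find("memotext").text  # the parser has already replaced &sig;
print(text)
```

The program never sees the reference itself, only the replacement text.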
Here's what a DTD (memo.dtd) would look like for this memo

    <!ELEMENT from (sender+)>

    + = must appear once and may appear many times
    ? = may be omitted or can appear once
    * = may be omitted or can appear many times
    | = one or the other but only one may appear
    #PCDATA = text

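The symbols +, ?, * and | behave much like the same operators in regular expressions. As a rough sketch (not how a real validating parser is built), an element-only content model can be translated into a Python regular expression and matched against the sequence of child element names. The helper names and the header content model used below are hypothetical, and #PCDATA is not handled.

```python
import re

def model_to_regex(model):
    """Translate an element-only DTD content model into a regex over space-terminated names."""
    compact = re.sub(r"\s+", "", model)
    # Wrap each element name in a group so that +, ? and * apply to the whole name
    with_names = re.sub(
        r"[A-Za-z_][\w.-]*",
        lambda m: "(?:" + re.escape(m.group(0)) + " )",
        compact,
    )
    # A comma means "followed by", which is plain concatenation in a regex
    return with_names.replace(",", "")

def children_match(model, children):
    """Return True if the list of child element names satisfies the content model."""
    sequence = "".join(name + " " for name in children)
    return re.fullmatch(model_to_regex(model), sequence) is not None

print(children_match("(sender+)", ["sender", "sender"]))
print(children_match("(sender+)", []))
print(children_match("(date, to+, from, cc*)", ["date", "to", "from"]))
```

The first and third calls should succeed and the second should fail, since + requires at least one sender.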
Schemas
- XML Schemas are an alternative to DTDs
- DTDs are "global," so an element can only be defined once
    - This is a problem if the element is used differently in two different contexts
    - Schemas allow global (the same everywhere) and local (differ in different contexts) elements
- DTDs cannot specify the data type of an element
    - Schemas can specify data types
- DTDs are not written in XML
    - Schemas are

Schemas divide content into two types
- Simple types
    - These contain only text
    - In DTDs these are represented by the content type "#PCDATA" (a name, integer, date ...)
- Complex types
    - These elements define the structure of the document
    - Some will contain other elements
    - Some will contain elements and text
    - Some will contain only text
    - Some will be empty

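To a DTD, a date and an integer are both just text; a schema can say which is which. Python's standard library has no schema validator, so the sketch below only imitates the idea by hand: it checks the text of a few elements against assumed simple types. The copies element and the SIMPLE_TYPES mapping are invented for this example, and date.fromisoformat is only an approximation of the schema date type.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical mapping from element name to a converter standing in for a simple type
SIMPLE_TYPES = {"date": date.fromisoformat, "copies": int}

def check_simple_types(root):
    """Return (tag, text) pairs whose text does not fit the assumed simple type."""
    problems = []
    for element in root.iter():
        convert = SIMPLE_TYPES.get(element.tag)
        if convert is None:
            continue  # no simple type assumed for this element
        try:
            convert((element.text or "").strip())
        except ValueError:
            problems.append((element.tag, element.text))
    return problems

good = ET.fromstring("<memo><date>2002-09-15</date><copies>3</copies></memo>")
bad = ET.fromstring("<memo><date>last Tuesday</date><copies>two</copies></memo>")
print(check_simple_types(good))
print(check_simple_types(bad))
```

Both documents are equally acceptable to a DTD that declares date and copies as #PCDATA; only a type-aware check tells them apart.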
Here is the memo DTD as a schema:

Here is how the memo calls the schema

    To: John Doe
    CC: Jane Doe
    From: Bozo T. Clown
    Please take note our phone number has changed.
    &sig;

3. An XSL stylesheet
- This file contains transformation rules that determine how the components of an XML file will be rendered and displayed in a range of formats (.xsl extension)
- With XSL-FO, specific formatting or style rules can be applied to specific components of a DTD
    - This language is not supported by any browsers yet
- With XSLT, a transformation process can be specified to convert XML documents into other formats (HTML, RTF, LaTeX, text)
    - This can be used
- An XSL stylesheet is also an XML document and must be "well formed"

- The process begins with an XML document and an XSLT style sheet
- The XSLT parser translates both into trees
    - The XML document is the source tree
    - The XSLT style sheet is the style tree
- Trees consist of nodes
    - Root node
    - Element nodes
    - Text nodes
    - Attribute nodes
    - Processing instruction nodes
    - Namespace node
- The XSLT processor uses these trees to create a result tree
    - This becomes the final or result document

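To get a feel for the source tree, a parsed document can be walked node by node. This is only an approximation of the node model: Python's xml.etree.ElementTree stores text and attributes as properties of element objects rather than as separate nodes, and it does not expose namespace nodes. The memo markup is an assumed reconstruction of the running example.

```python
import xml.etree.ElementTree as ET

def describe(element, depth=0):
    """Return indented lines naming the element, attribute and text nodes under an element."""
    pad = "  " * depth
    lines = [f"{pad}element: {element.tag}"]
    for name, value in element.attrib.items():
        lines.append(f"{pad}  attribute: {name}={value}")
    if element.text and element.text.strip():
        lines.append(f"{pad}  text: {element.text.strip()}")
    for child in element:
        lines.extend(describe(child, depth + 1))
    return lines

# Assumed memo markup, for illustration only
tree = ET.fromstring(
    '<memo type="informative"><header><to><name>John Doe</name></to></header></memo>'
)
lines = describe(tree)
print("\n".join(lines))
```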
[Figure: XML Memo as a source tree. Nodes shown by level: Memo; Header, Memotext; Date, To, From, CC; #PCDATA, Name, Sender, #PCDATA]

And here's what the XSL stylesheet might look like

    Memo form

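As a rough stand-in for what an XSLT processor does (this is plain Python, not XSLT), the sketch below reads a memo source tree and builds an HTML result tree from it. The memo markup and the transform function are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed memo markup, for illustration only
source = ET.fromstring(
    '<memo type="informative">'
    "<header><to><name>John Doe</name></to>"
    "<from><sender>Bozo T. Clown</sender></from></header>"
    "<memotext>Please take note: our phone number has changed.</memotext>"
    "</memo>"
)

def transform(memo):
    """Build an HTML result tree from the memo source tree."""
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h1").text = "Memo form"
    ET.SubElement(body, "p").text = "TO: " + memo.findtext("header/to/name")
    ET.SubElement(body, "p").text = "FROM: " + memo.findtext("header/from/sender")
    ET.SubElement(body, "p").text = memo.findtext("memotext")
    return html

result = transform(source)
print(ET.tostring(result, encoding="unicode"))
```

A real stylesheet expresses the same mapping declaratively, as template rules matched against nodes of the source tree, so the XML data can be sent to several output formats without being changed.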
There are other components of XML that greatly extend its power and flexibility

XPath
- This is a syntax that locates nodes in the hierarchical structure of an XML document
    - It is used in XSLT
    - This specifies the current node
- It uses patterns: these can be repeated throughout the document
- It also uses expressions: these are context specific
- This syntax is a sophisticated shorthand to use when writing processing instructions

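A few location paths can be tried with Python's xml.etree.ElementTree, which supports only a small subset of XPath. The memo markup is an assumed reconstruction of the running example; the cc element holding a name is a guess made for illustration.

```python
import xml.etree.ElementTree as ET

memo = ET.fromstring(
    '<memo type="informative">'
    "<header>"
    "<to><name>John Doe</name></to>"
    "<cc><name>Jane Doe</name></cc>"
    "<from><sender>Bozo T. Clown</sender></from>"
    "</header>"
    "<memotext>Please take note: our phone number has changed.</memotext>"
    "</memo>"
)

# "." is the current node
current = memo.find(".")

# A path of child steps from the current node
recipients = [e.text for e in memo.findall("./header/to/name")]

# "//" searches at any depth below the current node
all_names = [e.text for e in memo.findall(".//name")]

# A position predicate picks the first <to> among its siblings
first_to = memo.find("header/to[1]/name").text

# A predicate on child text selects the element whose <name> is Jane Doe
jane_parent = memo.find(".//*[name='Jane Doe']").tag

print(recipients, all_names, first_to, jane_parent)
```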
XLink
- This is the XML Linking Language
    - It allows more complex types of linking
    - Here's a simple link
- XLink defines "linksets" or extended links
    - A set of files can be connected through a chain of links moving from the first to the last file in the linkset
- Link behaviors: replace, new, onLoad

XPointer
- This is a syntax for linking to specific locations within XML documents
- It uses XPath expressions to define the locations

    #xpointer(element_name[position()=1])

- This is appended to the end of a URL in an XLink expression

XForms
- This is an application of XML that is going to be used someday to allow more complex forms to be created in XHTML

III. How will it change the web?
- XML has interesting potential to change a portion of the web
- It is expected to move us closer to write once, display anywhere (XSLT)
- It will be an important component of the "semantic web"
    - Search engines that can process XML should be much more precise and return more relevant results
- It can improve business processes, particularly if professions develop their own markup languages

Examples of XML applications

Resource Description Framework (RDF)
- This is a framework that allows the description and interchange of metadata
- Because it is designed to be platform independent, it becomes a hub for metadata activity
- RDF provides a model for metadata, and a syntax so that independent parties can exchange it and use it
    - RDF makes it possible to use multiple pieces of software to process the same metadata
    - It also allows a single piece of software to process (at least in part) many different metadata vocabularies

- Extensible Hypertext Markup Language (XHTML)
- Synchronized Multimedia Integration Language (SMIL)
- Math Markup Language (MathML)
- Chemical Markup Language (CheML)
- Commerce Markup Language (CML)
- Electronic Business XML (ebXML)
- National Library of Medicine XML Data formats
- Electronic Component Information Exchange (ECIX)
- Geography Markup Language (GML)
- Research Information Exchange Markup Language (RIXML)
- MARC to XML conversions