Download presentation
Presentation is loading. Please wait.
1
XML: eXtensible Markup Language
Creating portable data
2
Java E-Commerce © Martin Cooke, 2003
Lecture contents Motivation Applications Well-formedness & validation Next lecture: processing XML in Java 03/01/2019 Java E-Commerce © Martin Cooke, 2003
3
Why use XML?
4
Java E-Commerce © Martin Cooke, 2003
Sun’s recent slogan “Java + XML = portable programs + portable data” 03/01/2019 Java E-Commerce © Martin Cooke, 2003
5
Java E-Commerce © Martin Cooke, 2003
Markup Information added to data to enhance its meaning Identification of parts, boundaries and relationships of elements within documents Identification of attributes Without markup, most documents appear as meaningful to machines as this document does to humans 03/01/2019 Java E-Commerce © Martin Cooke, 2003
6
Java E-Commerce © Martin Cooke, 2003
What’s wrong with HTML? <html> <head> <title> Martin’s page </title> </head> <body bgcolour=”#ffffff”> <p align=”center”> Some text or other </p> </body> </html> HTML is the most successful electronic publishing language ever invented, but .... .... HTML is for presentation, not content, limiting its applicability 03/01/2019 Java E-Commerce © Martin Cooke, 2003
7
Java E-Commerce © Martin Cooke, 2003
XML example <?xml version="1.0" encoding="UTF-8"?> <Curriculum> <Course Title="Z" Lect="Bogdanov"> <Lecture>Props</Lecture> <Lecture>Predicates</Lecture> <Lecture>Sets</Lecture> </Course> </Curriculum> 03/01/2019 Java E-Commerce © Martin Cooke, 2003
8
Java E-Commerce © Martin Cooke, 2003
XML example Encodes boundaries <?xml version="1.0" encoding="UTF-8"?> <Curriculum> <Course Title="Z" Lect="Bogdanov"> <Lecture>Props</Lecture> <Lecture>Predicates</Lecture> <Lecture>Sets</Lecture> </Course> </Curriculum> 03/01/2019 Java E-Commerce © Martin Cooke, 2003
9
Java E-Commerce © Martin Cooke, 2003
XML example Encodes boundaries roles eg course v lecture <?xml version="1.0" encoding="UTF-8"?> <Curriculum> <Course Title="Z" Lect="Bogdanov"> <Lecture>Props</Lecture> <Lecture>Predicates</Lecture> <Lecture>Sets</Lecture> </Course> </Curriculum> 03/01/2019 Java E-Commerce © Martin Cooke, 2003
10
Java E-Commerce © Martin Cooke, 2003
XML example Encodes boundaries roles eg course v lecture positions <?xml version="1.0" encoding="UTF-8"?> <Curriculum> <Course Title="Z" Lect="Bogdanov"> <Lecture>Props</Lecture> <Lecture>Predicates</Lecture> <Lecture>Sets</Lecture> </Course> </Curriculum> 03/01/2019 Java E-Commerce © Martin Cooke, 2003
11
Java E-Commerce © Martin Cooke, 2003
XML example Encodes boundaries roles eg course v lecture positions containment eg lecture is part of course <?xml version="1.0" encoding="UTF-8"?> <Curriculum> <Course Title="Z" Lect="Bogdanov"> <Lecture>Props</Lecture> <Lecture>Predicates</Lecture> <Lecture>Sets</Lecture> </Course> </Curriculum> 03/01/2019 Java E-Commerce © Martin Cooke, 2003
12
Java E-Commerce © Martin Cooke, 2003
XML example Encodes boundaries roles eg course v lecture positions containment eg lecture is part of course attributes eg title <?xml version="1.0" encoding="UTF-8"?> <Curriculum> <Course Title="Z" Lect="Bogdanov"> <Lecture>Props</Lecture> <Lecture>Predicates</Lecture> <Lecture>Sets</Lecture> </Course> </Curriculum> 03/01/2019 Java E-Commerce © Martin Cooke, 2003
13
A brief history of markup
GML: Generalised Markup Language Developed in 60’s and 70’s by IBM Used for IBM technical manuals SGML: Standardised GML 70’s, 80’s with ANSI standard in 1983 Flexible and very general, but difficult and costly HTML Early 90’s: compact markup for hypertext docs Now seen as a step backwards XML 03/01/2019 Java E-Commerce © Martin Cooke, 2003
14
Java E-Commerce © Martin Cooke, 2003
XML is … Simpler than SGML More flexible than HTML An application of SGML Not a markup language, but a toolkit … … however, common to refer to documents as being “written in XML” Surrounded by a family of technologies which extend its use (eg transformation) 03/01/2019 Java E-Commerce © Martin Cooke, 2003
15
Java E-Commerce © Martin Cooke, 2003
XML features Represent most kinds of information, unambiguously (unlike HTML) Easily customisable Supports internationalisation through UNICODE Allows validation of documents Easy to read by humans and machines Open standard, managed by W3C 03/01/2019 Java E-Commerce © Martin Cooke, 2003
16
Applications
17
Applications of a better data format
Better search engines find all places selling X Customised data presentation from a single source HTML WML (wireless markup language) PDF Reliable information exchange CML (chemical structures) VoxML (voice) B2B transactions of all types MathML etc 03/01/2019 Java E-Commerce © Martin Cooke, 2003
18
Java E-Commerce © Martin Cooke, 2003
XHTML Source: 03/01/2019 Java E-Commerce © Martin Cooke, 2003
19
Java E-Commerce © Martin Cooke, 2003
MathML example Can be used for display or calculations Source: 03/01/2019 Java E-Commerce © Martin Cooke, 2003
20
SVG (scalable vector graphics)
SVG benefits: Zooming Text stays text. Text in SVG images remains editable and searchable Small file size Display independence Interactivity and intelligence Source: 03/01/2019 Java E-Commerce © Martin Cooke, 2003
21
Java E-Commerce © Martin Cooke, 2003
VoiceXML Source: 03/01/2019 Java E-Commerce © Martin Cooke, 2003
22
Java E-Commerce © Martin Cooke, 2003
DocBook “DocBook is a very popular set of tags for describing books, articles, and other prose documents, particularly technical documentation. DocBook is defined using the native DTD syntax of SGML and XML. Like HTML, DocBook is an example of a markup language defined in SGML/XML.” Source: Source: 03/01/2019 Java E-Commerce © Martin Cooke, 2003
23
Java E-Commerce © Martin Cooke, 2003
Apache documentation Source: 03/01/2019 Java E-Commerce © Martin Cooke, 2003
24
Java E-Commerce © Martin Cooke, 2003
WML Source: 03/01/2019 Java E-Commerce © Martin Cooke, 2003
25
A closer look at XML documents
26
Java E-Commerce © Martin Cooke, 2003
Simple example <?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE Curriculum SYSTEM "Curric.DTD"> <Curriculum> <Course Title="Z" Lect="Kyrill Bogdanov"> <Lecture>Propositions</Lecture> <Lecture>Predicates</Lecture> <Lecture>Sets</Lecture> </Course> <Course Title="UML" Lect="Marian Gheorge"> <Lecture>Use Cases</Lecture> <Lecture>Class Diagrams</Lecture> </Curriculum> 03/01/2019 Java E-Commerce © Martin Cooke, 2003
27
Java E-Commerce © Martin Cooke, 2003
Document prolog indicates that this document is marked up in XML Format: <?xml prop=value … ?> Param values must be quoted with single or double quotes (unlike HTML) <?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE Curriculum SYSTEM "Curric.DTD"> <Curriculum> <Course Title="Z" Lect="Kyrill Bogdanov"> <Lecture>Propositions</Lecture> <Lecture>Predicates</Lecture> <Lecture>Sets</Lecture> </Course> <Course Title="UML" Lect="Marian Gheorge"> <Lecture>Use Cases</Lecture> <Lecture>Class Diagrams</Lecture> </Curriculum> 03/01/2019 Java E-Commerce © Martin Cooke, 2003
28
Document type declaration
<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE Curriculum SYSTEM "Curric.DTD"> <Curriculum> <Course Title="Z" Lect="Kyrill Bogdanov"> <Lecture>Propositions</Lecture> <Lecture>Predicates</Lecture> <Lecture>Sets</Lecture> </Course> <Course Title="UML" Lect="Marian Gheorge"> <Lecture>Use Cases</Lecture> <Lecture>Class Diagrams</Lecture> </Curriculum> Specifies validation & root Format: <!DOCTYPE root-element SYSTEM dtd> DTD is optional (see later) 03/01/2019 Java E-Commerce © Martin Cooke, 2003
29
Java E-Commerce © Martin Cooke, 2003
Elements <?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE Curriculum SYSTEM "Curric.DTD"> <Curriculum> <Course Title="Z" Lect="Kyrill Bogdanov"> <Lecture>Propositions</Lecture> <Lecture>Predicates</Lecture> <Lecture>Sets</Lecture> </Course> <Course Title="UML" Lect="Marian Gheorge"> <Lecture>Use Cases</Lecture> <Lecture>Class Diagrams</Lecture> </Curriculum> Building blocks of XML documents One root element Format <name att=value …> content </name> Or: <name att=val /> None-empty elements must have a closing tag (unlike HTML) 03/01/2019 Java E-Commerce © Martin Cooke, 2003
30
Java E-Commerce © Martin Cooke, 2003
Elements <?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE Curriculum SYSTEM "Curric.DTD"> <Curriculum> <Course Title="Z" Lect="Kyrill Bogdanov"> <Lecture>Propositions</Lecture> <Lecture>Predicates</Lecture> <Lecture>Sets</Lecture> </Course> <Course Title="UML" Lect="Marian Gheorge"> <Lecture>Use Cases</Lecture> <Lecture>Class Diagrams</Lecture> </Curriculum> Elements may contain other elements An element’s start and end tags must reside within the same parent (ie boxes cannot overlap) 03/01/2019 Java E-Commerce © Martin Cooke, 2003
31
Java E-Commerce © Martin Cooke, 2003
Attributes <?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE Curriculum SYSTEM "Curric.DTD"> <Curriculum> <Course Title="Z" Lect="Kyrill Bogdanov"> <Lecture>Propositions</Lecture> <Lecture>Predicates</Lecture> <Lecture>Sets</Lecture> </Course> <Course Title="UML" Lect="Marian Gheorge"> <Lecture>Use Cases</Lecture> <Lecture>Class Diagrams</Lecture> </Curriculum> Used to identify specific elements, or to elaborate elements Values must be quoted (single or double) Not always clear whether to use attributes or elements 03/01/2019 Java E-Commerce © Martin Cooke, 2003
32
Attributes or elements?
<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE Curriculum SYSTEM "Curric.DTD"> <Curriculum> <Course Title="Z" Lect="Kyrill Bogdanov"> <Lecture>Propositions</Lecture> <Lecture>Predicates</Lecture> <Lecture>Sets</Lecture> </Course> <Course Title="UML" Lect="Marian Gheorge"> <Lecture>Use Cases</Lecture> <Lecture>Class Diagrams</Lecture> </Curriculum> Attributes shouldn’t really hold content Attribute order is ignored, whilst element order is significant Attributes values can be restricted (see DTDs later) Use attributes as unique refererences if needed 03/01/2019 Java E-Commerce © Martin Cooke, 2003
33
An XML document is a tree
<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE Curriculum SYSTEM "Curric.DTD"> <Curriculum> <Course Title="Z" Lect="Kyrill Bogdanov"> <Lecture>Propositions</Lecture> <Lecture>Predicates</Lecture> <Lecture>Sets</Lecture> </Course> <Course Title="UML" Lect="Marian Gheorge"> <Lecture>Use Cases</Lecture> <Lecture>Class Diagrams</Lecture> </Curriculum> Curriculum Course Course Lecture Lecture Lecture Lecture Lecture 03/01/2019 Java E-Commerce © Martin Cooke, 2003
34
… with content at its leaves
<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE Curriculum SYSTEM "Curric.DTD"> <Curriculum> <Course Title="Z" Lect="Kyrill Bogdanov"> <Lecture>Propositions</Lecture> <Lecture>Predicates</Lecture> <Lecture>Sets</Lecture> </Course> <Course Title="UML" Lect="Marian Gheorge"> <Lecture>Use Cases</Lecture> <Lecture>Class Diagrams</Lecture> </Curriculum> Curriculum Course Course Lecture Lecture Lecture Lecture Class diagrams Lecture Use cases Sets Propositions Predicates 03/01/2019 Java E-Commerce © Martin Cooke, 2003
35
Java E-Commerce © Martin Cooke, 2003
… and attributes <?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE Curriculum SYSTEM "Curric.DTD"> <Curriculum> <Course Title="Z" Lect="Kyrill Bogdanov"> <Lecture>Propositions</Lecture> <Lecture>Predicates</Lecture> <Lecture>Sets</Lecture> </Course> <Course Title="UML" Lect="Marian Gheorge"> <Lecture>Use Cases</Lecture> <Lecture>Class Diagrams</Lecture> </Curriculum> Curriculum Course Course Title=Z Lect=… Title=UML Lect=… Lecture Lecture Lecture Lecture Class diagrams Lecture Use cases Sets Propositions Predicates 03/01/2019 Java E-Commerce © Martin Cooke, 2003
36
Java E-Commerce © Martin Cooke, 2003
Well-formedness Element containing text or elements must have start and end tags Good <curriculum> <course>Z</course> <course>Java</course> </curriculum> Bad <curriculum> <course>Z <course>Java </curriculum> 03/01/2019 Java E-Commerce © Martin Cooke, 2003
37
Java E-Commerce © Martin Cooke, 2003
Well-formedness Element containing text or elements must have start and end tags Empty element’s tag must close with /> Good <graphic filename=“icon.png”/> Bad <graphic filename=“icon.png”> 03/01/2019 Java E-Commerce © Martin Cooke, 2003
38
Java E-Commerce © Martin Cooke, 2003
Well-formedness Element containing text or elements must have start and end tags Empty element’s tag must close with /> Attributes must be in quotes Good <course Title=“Java”> <course Title=‘Z’> Bad <course Title=UML> 03/01/2019 Java E-Commerce © Martin Cooke, 2003
39
Java E-Commerce © Martin Cooke, 2003
Well-formedness Element containing text or elements must have start and end tags Empty element’s tag must close with /> Attributes must be in quotes Elements must not overlap Good <curriculum> <course>Z</course> </curriculum> Bad <curriculum> <course>Z </curriculum> </course> 03/01/2019 Java E-Commerce © Martin Cooke, 2003
40
Java E-Commerce © Martin Cooke, 2003
Well-formedness Element containing text or elements must have start and end tags Empty element’s tag must close with /> Attributes must be in quotes Elements may not overlap Markup chars must not appear in parsed content Good <equation>5 < 2</equation> Bad <equation>5 < 2</equation> 03/01/2019 Java E-Commerce © Martin Cooke, 2003
41
Java E-Commerce © Martin Cooke, 2003
Well-formedness Element containing text or elements must have start and end tags Empty element’s tag must close with /> Attributes must be in quotes Elements may not overlap Markup chars must not appear in parsed content Element names start with letters or ‘_’, and contain letters, numbers, ‘-’, ‘.’ and ‘_’ Good <curriculum> <_course> <time-slot> <book.chapter> Bad <1example> <the first> <book*chapter> 03/01/2019 Java E-Commerce © Martin Cooke, 2003
42
Java E-Commerce © Martin Cooke, 2003
Why the rules? Unlike HTML, an arbitrary XML document doesn’t necessarily have a grammar eg HTML knows that a <p> cannot contain another <p>, so the end tag is optional 03/01/2019 Java E-Commerce © Martin Cooke, 2003
43
Pros and cons of adding a grammar
Grammars enables documents to be validated Enforce restrictions such as required fields, limited choices Serves as a clear description of the syntax for users and developers Can act as a standard eg XHTML Good for debugging CONS More effort during development Grammars need to be maintained Can slow down processing Need to learn a new syntax (although it is trivial for CS) 03/01/2019 Java E-Commerce © Martin Cooke, 2003
44
Document Type Definition (DTD)
<!ELEMENT Curriculum (Course)*> <!ELEMENT Course (Lecture)*> <!ATTLIST Course Title CDATA #REQUIRED Lect CDATA #REQUIRED > <!ELEMENT Lect (#PCDATA)> A DTD is a sequence of declarations Doesn’t conform to XML syntax Easy to understand for CS #PCDATA keyword stands for “parsed character data” and means that the textual content will be parsed to look for XML entities (see later) 03/01/2019 Java E-Commerce © Martin Cooke, 2003
45
Java E-Commerce © Martin Cooke, 2003
Element definitions <!ELEMENT article (title,subtitle?,author+,(para|table|list)*,biblio?) > a,b,c sequence a? option a* zero or more a+ one or more (a|b|c) alternatives (a,b,c) grouping 03/01/2019 Java E-Commerce © Martin Cooke, 2003
46
Attribute definitions
<!ATTLIST Course Title CDATA #REQUIRED Lect CDATA #REQUIRED Lect2 CDATA #IMPLIED Semester (first|second) “first” > 03/01/2019 Java E-Commerce © Martin Cooke, 2003
47
Java E-Commerce © Martin Cooke, 2003
Entities <!ENTITY uos “The University of Sheffield” > DTDs can also contain entity definitions Simplest use is to substitute in any parsed text (PCDATA) eg &uos; CDATA is not parsed, so entities will not be substituted 03/01/2019 Java E-Commerce © Martin Cooke, 2003
48
Virtual keyboard definition
03/01/2019 Java E-Commerce © Martin Cooke, 2003
49
Alternative to DTDs: XML Schema
XML Schema is a proposal to introduce a grammar definining language which uses XML Adds better typing Predefined: byte, float, long, time, date, boolean, binary, language, uri-reference, … Boundaries on data values Pattern-matching 03/01/2019 Java E-Commerce © Martin Cooke, 2003
50
Java E-Commerce © Martin Cooke, 2003
Overall summary Knowledge of XML is essential for all computer scientists Should lead to a better web with easier to find information Interoperability Impetus towards robust and open standards within industry sectors Supports internationalisation via UNICODE Hardly ever need to write a parser again! 03/01/2019 Java E-Commerce © Martin Cooke, 2003
51
Resources
52
Java E-Commerce © Martin Cooke, 2003
Books Ray (2001) Learning XML, O’Reilly, McLaughlin (2000) Java & XML, O’Reilly, Check if later version 03/01/2019 Java E-Commerce © Martin Cooke, 2003
53
Java E-Commerce © Martin Cooke, 2003
Online documents XML Tutorials for Programmers (online XML parser -- requires registration) Transforming XML to PDF Why XML? XSL for fun and diversion Simplify XML programming with JDOM Easy Java/XML integration with JDOM, Part 1 Tip: Using JDOM and XSLT 03/01/2019 Java E-Commerce © Martin Cooke, 2003
54
Java E-Commerce © Martin Cooke, 2003
Websites 03/01/2019 Java E-Commerce © Martin Cooke, 2003
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.