

Li Tak Sing COMPS311F

XML: Markup languages
Many people might not realize that markup languages existed even before computers were invented. A markup language consists of symbols used to annotate text in documents. For example, in the early days of printing, authors prepared manuscripts of their books on paper. Proofreaders and editors marked up the manuscripts with a markup language that the people working in print shops understood. The symbols in this type of markup language did not appear in the resulting books, but they gave instructions on how to present the text.

XML
HTML is a modern markup language that deals with how data should be displayed in a Web browser. HTML does this by enclosing text in begin and end tags. The sample HTML document has the heading "Sample HTML" followed by two paragraphs. The first paragraph reads: I am the first paragraph. I have two sentences. The second paragraph reads: I am the second paragraph. I am longer than the first paragraph. I have three sentences.

HTML
In the sample, we have the begin tags for a level 3 heading, a paragraph, bold font and italic font: <h3>, <p>, <b> and <i>. There are corresponding end tags with an extra slash, like </h3>, </p>, </b> and </i>. Note that the tags may be nested. For example, a pair of bold tags can be placed inside paragraph tags. The sample HTML document can be displayed in a Web browser such as Mozilla Firefox.
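The nesting of begin and end tags can be checked with a short Python sketch. The text comes from the sample, but the exact position of the bold and italic tags is an assumption, as are the names SAMPLE, TagLogger and tag_sequence.

```python
from html.parser import HTMLParser

# Assumed reconstruction: the wording is from the sample, the placement of
# the <b> and <i> tags is a guess made for illustration.
SAMPLE = """<h3>Sample HTML</h3>
<p>I am the <b>first</b> paragraph. I have two sentences.</p>
<p>I am the <i>second</i> paragraph. I am longer than the first paragraph.
I have three sentences.</p>"""


class TagLogger(HTMLParser):
    """Records begin and end tags in document order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append("<" + tag + ">")

    def handle_endtag(self, tag):
        self.events.append("</" + tag + ">")


def tag_sequence(html):
    """Return the list of tags found in the given HTML text."""
    logger = TagLogger()
    logger.feed(html)
    logger.close()
    return logger.events
```

Calling tag_sequence(SAMPLE) lists each begin tag before its matching end tag, with the bold and italic pairs sitting inside the paragraph pairs.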

XML basics
So what exactly is XML? XML stands for Extensible Markup Language. It has been defined by the World Wide Web Consortium (W3C) with design goals including, but not limited to, the following:
- Compatible with the Internet
- Useful for a wide range of applications
- Easy to create and process
- Readable by humans

XML basics
An XML document can hold many kinds of data: a purchase order, an invoice, an employment application, a price list, a collection of music CDs and so on. Below is a sample XML document, but keep in mind that XML documents found in the real world may be larger than the samples you see in this unit.

XML basics John Mary Lou 30 35

Processing instructions
The first line is a processing instruction enclosed in <? and ?>. It captures the XML version number and the character set used. If you use an XML tool to help you create XML documents, the tool will generate these values for you according to its current configuration. Other processing instructions are allowed. In general, processing instructions provide information to applications to help them process XML documents. For example, stylesheet information may be provided to help applications correctly interpret the XML documents.

Elements
An element is enclosed in a pair of begin and end tags. For instance, in the previous XML document, we have a begin tag <employee-list> and an end tag </employee-list>. The end tag looks just like the begin tag except for the extra slash. The employee-list element is the root element of this XML document. It has a child element employee, which in turn has child elements name, hours and rate. The data can be used to calculate the weekly payroll. We see that elements in XML documents can nest and repeat.
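As a minimal sketch of how such a document might be processed, the Python code below parses an employee-list document and computes the weekly payroll. The element names follow the description above. The employees, the numbers and the names EMPLOYEE_LIST and weekly_payroll are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# The element names follow the text; the employees and numbers are assumed.
EMPLOYEE_LIST = """<?xml version="1.0" encoding="UTF-8"?>
<employee-list>
  <employee>
    <name>John</name>
    <hours>30</hours>
    <rate>35</rate>
  </employee>
  <employee>
    <name>Mary Lou</name>
    <hours>40</hours>
    <rate>30</rate>
  </employee>
</employee-list>"""


def weekly_payroll(document):
    """Map each employee name to hours multiplied by rate."""
    root = ET.fromstring(document.encode("utf-8"))
    payroll = {}
    # The employee element repeats; name, hours and rate nest inside it.
    for employee in root.findall("employee"):
        hours = float(employee.findtext("hours"))
        rate = float(employee.findtext("rate"))
        payroll[employee.findtext("name")] = hours * rate
    return payroll
```

The loop relies on the two properties noted above: employee elements repeat under the root, and each one nests its own name, hours and rate.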

Elements
This document has elements named employee-list, employee, name, hours and rate. Element names are case sensitive in XML. Therefore an end tag such as </Employee> is not the proper end tag for the begin tag <employee>, because the case of the first character of the tag name does not match. The first character of an element name can be any letter of the alphabet or an underscore. The remaining characters can be alphanumeric characters, hyphens, underscores and even periods. Spaces are allowed in the content of an element, as in <name>Mary Lou</name>. Spaces are not permitted inside an element name. Taking a hypothetical element as an example, <hours worked>30</hours worked> is not allowed. After replacing the space with a hyphen or an underscore, <hours-worked>30</hours-worked> is allowed.
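These naming rules can be tried out with a parser. The sketch below uses the parser in Python's standard library. The helper name is_well_formed and the element names other than name are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(element_text):
    """Return True if the parser accepts the text, False otherwise."""
    try:
        ET.fromstring(element_text)
        return True
    except ET.ParseError:
        return False


# Spaces are fine in content, but the tag names must match in case.
CASE_MATCH = "<name>Mary Lou</name>"
CASE_MISMATCH = "<name>Mary Lou</Name>"

# A space inside an element name is rejected; hyphens and underscores are not.
SPACE_IN_NAME = "<hours worked>30</hours worked>"
HYPHEN_IN_NAME = "<hours-worked>30</hours-worked>"
UNDERSCORE_IN_NAME = "<hours_worked>30</hours_worked>"

# A name may start with an underscore and contain periods and digits,
# but it may not start with a digit.
UNDERSCORE_FIRST = "<_rate.1>35</_rate.1>"
DIGIT_FIRST = "<1rate>35</1rate>"
```

Only CASE_MISMATCH, SPACE_IN_NAME and DIGIT_FIRST are rejected by is_well_formed.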

Empty elements
You can use an empty element to represent that the item is unknown or not applicable. An empty element for a commission element can be represented in one of three ways.
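The three spellings in the Python sketch below are assumptions about what the three ways may be. They are the common ways to write an empty element, and a parser reports all of them identically. The names EMPTY_FORMS and describe are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed spellings of an empty commission element.
EMPTY_FORMS = [
    "<commission></commission>",  # begin tag immediately followed by end tag
    "<commission/>",              # self-closing tag
    "<commission />",             # self-closing tag with a space before the slash
]


def describe(element_text):
    """Return the tag name, the text content and the number of children."""
    element = ET.fromstring(element_text)
    return element.tag, element.text, len(element)
```

Each form parses to an element named commission with no text and no child elements.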

Whitespaces
The characters for spaces, line feeds, tabs and carriage returns are collectively called whitespaces. In XML, adjacent whitespaces inside a pair of begin and end tags are significant. Three elements that contain Oliver Au with one space, with several spaces or with a line feed between the two words are therefore different, unless programmers decide to treat them the same.

On the other hand, whitespaces outside of a pair of begin and end tags are insignificant, so two documents that differ only in such whitespaces are the same in XML. In HTML, however, two or more consecutive whitespaces are always treated the same as one whitespace.
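The difference can be observed with a parser. In the Python sketch below, the element name and the four variants are assumptions chosen for illustration, and normalize shows one way a programmer might decide to treat the variants the same.

```python
import xml.etree.ElementTree as ET

ONE_SPACE = "<name>Oliver Au</name>"
THREE_SPACES = "<name>Oliver   Au</name>"
LINE_FEED = "<name>Oliver\nAu</name>"
# Whitespace around the element and inside the tag markup itself.
PADDED_MARKUP = "\n  <name >Oliver Au</name  >\n"


def content(element_text):
    """Return the element content exactly as the parser reports it."""
    return ET.fromstring(element_text).text


def normalize(text):
    """Collapse every run of whitespace into one space, as HTML display does."""
    return " ".join(text.split())
```

The first three variants yield three distinct contents, while PADDED_MARKUP yields the same content as ONE_SPACE. After normalize, all of them read Oliver Au.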

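Entity references
Can you spot a problem with an element whose content is 3 < 5? The content of the element is a Boolean expression that makes use of the less-than operator, but a parser takes the < character (U+003C) as the beginning of a tag. The character must be written as the entity reference &lt; instead, so the content becomes 3 &lt; 5. XML predefines entity references for five characters.

Character | Unicode | Entity reference in XML and DTD
& | U+0026 | &amp;
< | U+003C | &lt;
> | U+003E | &gt;
" | U+0022 | &quot;
' | U+0027 | &apos;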
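A short Python sketch shows the problem and the fix. The element name condition and the helper parse_content are assumptions. escape is a function in Python's standard library that replaces &, < and > with their entity references.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def parse_content(element_text):
    """Return the element content, or None if the text is not well formed."""
    try:
        return ET.fromstring(element_text).text
    except ET.ParseError:
        return None


RAW = "<condition>3 < 5</condition>"          # the bare < breaks the parser
ESCAPED = "<condition>3 &lt; 5</condition>"   # the entity reference is accepted
ALL_FIVE = "<t>&amp;&lt;&gt;&quot;&apos;</t>"  # the five predefined references
```

The parser turns each entity reference back into the character it stands for, so the content of ESCAPED is reported as 3 < 5.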

XML attributes
An element can have any number of attributes. The following element captures the year of publication of a book as an attribute: <PUBLISHED year="2002">Wiley</PUBLISHED>. This is another difference between HTML and XML. The quotes around an attribute value, as in "2002", are optional in HTML but compulsory in XML, which accepts either double or single quotes.
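The quoting rule can be compared across the two languages with the XML and HTML parsers in Python's standard library. This is a sketch only, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

QUOTED = '<PUBLISHED year="2002">Wiley</PUBLISHED>'
UNQUOTED = '<PUBLISHED year=2002>Wiley</PUBLISHED>'


def xml_attributes(element_text):
    """Return the attributes seen by an XML parser, or None on a parse error."""
    try:
        return ET.fromstring(element_text).attrib
    except ET.ParseError:
        return None


class AttributeCollector(HTMLParser):
    """Collects the attributes of every begin tag it meets."""

    def __init__(self):
        super().__init__()
        self.attributes = {}

    def handle_starttag(self, tag, attrs):
        self.attributes.update(attrs)  # attrs is a list of (name, value) pairs


def html_attributes(element_text):
    """Return the attributes seen by an HTML parser."""
    collector = AttributeCollector()
    collector.feed(element_text)
    collector.close()
    return collector.attributes
```

The XML parser rejects UNQUOTED, while the HTML parser reads the same year from both spellings.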

XML parsers
The meaning of a sentence is not determined only by the words used. We often have to determine the sentence structure before we can correctly understand the sentence. In computer science and linguistics, parsing is the process of recognizing the structure of a program, an HTML document, an XML document or an English sentence. A program that performs this task is called a parser. All the popular Web browsers have a built-in XML parser. Even the programs that you write to process XML documents for a course assignment are XML parsers. Fortunately, you don't have to build the parsing capability from scratch, as it comes with Java's class library.

XML namespaces
XML elements have names. When an application processes two or more kinds of XML documents, there may be element name conflicts. Suppose we have an XML document that uses a table element to hold information about some fruit: Apples and Oranges.

We have another XML document that uses a table element to hold information about a piece of furniture: an Oak Dining Table.

XML namespaces
If we were to merge the two XML documents into one, XML parsers trying to process the merged document would be confused. The element name table is used for different purposes under distinct structures. We can use qualified names to prevent confusion. In the merged XML document, h and furn are prefixes that qualify the local name table, as in h:table and furn:table. Each prefix is declared with an attribute that begins with xmlns, which stands for XML namespace. Other prefixes are also allowed. The attribute, say xmlns:h, is given a uniform resource identifier (URI) as its value, which is a character string identifying an Internet resource. An XML parser does not actually access the URI, which merely identifies the namespace uniquely.

XML namespaces Apples Oranges Oak Dining Table

XML namespaces
Prefixes and namespaces can be defined for elements at any level. Once defined, the prefixes can be used in the child elements. You can also define two prefixes in one element as shown in the root element below.

Apples Oranges Oak Dining Table
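A minimal Python sketch of a merged document with both prefixes declared in the root element follows. The prefixes h and furn and the element name table come from the text. The URIs, the root element and the tr, td and name elements are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical URIs: they only need to be unique, nothing is downloaded.
NAMESPACES = {
    "h": "http://example.org/html",
    "furn": "http://example.org/furniture",
}

MERGED = """<root xmlns:h="http://example.org/html"
      xmlns:furn="http://example.org/furniture">
  <h:table>
    <h:tr><h:td>Apples</h:td><h:td>Oranges</h:td></h:tr>
  </h:table>
  <furn:table>
    <furn:name>Oak Dining Table</furn:name>
  </furn:table>
</root>"""

root = ET.fromstring(MERGED)

# The parser replaces each prefix with its URI, so the two tables differ.
expanded_names = [child.tag for child in root]

# Searches use the same prefix-to-URI mapping.
fruit = [cell.text for cell in root.findall("h:table/h:tr/h:td", NAMESPACES)]
furniture = root.findtext("furn:table/furn:name", namespaces=NAMESPACES)
```

The two table elements are reported as {http://example.org/html}table and {http://example.org/furniture}table, so the parser can no longer confuse them.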

Default namespace
Having to repeat the prefix on each tag is a tedious chore. An alternative is to define a default namespace with a plain xmlns attribute, without the prefixes h or furn. The elements inside the declaring element then need no prefixes for the distinction.

Default namespace Apples Oranges Oak Dining Table
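A sketch of the same data with default namespaces is given below in Python. As before, the URIs, the root element and the tr, td and name elements are assumptions.

```python
import xml.etree.ElementTree as ET

# Each table declares its own default namespace; no prefixes are needed.
DEFAULTED = """<root>
  <table xmlns="http://example.org/html">
    <tr><td>Apples</td><td>Oranges</td></tr>
  </table>
  <table xmlns="http://example.org/furniture">
    <name>Oak Dining Table</name>
  </table>
</root>"""


def expanded_child_names(document):
    """Return the namespace-qualified names of the root's child elements."""
    return [child.tag for child in ET.fromstring(document)]
```

The child elements inherit the default namespace of the table that contains them, so the name element is reported as {http://example.org/furniture}name.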

An XML document of library books
This XML document uses UTF-8, a popular and space-efficient character encoding that employs 1 byte to represent commonly used characters and more bytes for others, such as Chinese characters. It has the advantage of being backward compatible with the original ASCII character set. The document demonstrates the use of attributes and comments.
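The byte counts can be checked in Python. The helper name utf8_length and the sample characters are chosen for illustration.

```python
def utf8_length(character):
    """Return the number of bytes UTF-8 uses for the given character."""
    return len(character.encode("utf-8"))


LATIN_LETTER = "A"            # part of the original ASCII set
ACCENTED_LETTER = "\u00e9"    # e with an acute accent
CHINESE_CHARACTER = "\u4e2d"  # the character zhong, meaning middle

# Text made only of ASCII characters has identical bytes in both encodings.
SAME_AS_ASCII = "XML".encode("utf-8") == "XML".encode("ascii")
```

The three sample characters take 1, 2 and 3 bytes respectively.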

An XML document of library books
The document lists three books:
- Complete idiot's guide to XML, by David Gulbransen, published by Que
- Java developer's guide to e-commerce with XML and JSP, by William B. Brogden and Chris Minnick, published by Sybex
- XPath essentials, by Andrew Watt, published by Wiley
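One possible shape for such a document is sketched below in Python. The titles, authors and publishers come from the slides, as do the PUBLISHED element and the year 2002 for Wiley. The element names LIBRARY, BOOK, TITLE and AUTHOR and the other two years are assumptions.

```python
import xml.etree.ElementTree as ET

LIBRARY = """<?xml version="1.0" encoding="utf-8"?>
<!-- A comment: this parser skips it. -->
<LIBRARY>
  <BOOK>
    <TITLE>Complete idiot's guide to XML</TITLE>
    <AUTHOR>David Gulbransen</AUTHOR>
    <PUBLISHED year="2000">Que</PUBLISHED>
  </BOOK>
  <BOOK>
    <TITLE>Java developer's guide to e-commerce with XML and JSP</TITLE>
    <AUTHOR>William B. Brogden</AUTHOR>
    <AUTHOR>Chris Minnick</AUTHOR>
    <PUBLISHED year="2001">Sybex</PUBLISHED>
  </BOOK>
  <BOOK>
    <TITLE>XPath essentials</TITLE>
    <AUTHOR>Andrew Watt</AUTHOR>
    <PUBLISHED year="2002">Wiley</PUBLISHED>
  </BOOK>
</LIBRARY>"""


def catalogue(document):
    """Return one dictionary per book with its title, authors and publication."""
    root = ET.fromstring(document.encode("utf-8"))
    entries = []
    for book in root.findall("BOOK"):
        published = book.find("PUBLISHED")
        entries.append({
            "title": book.findtext("TITLE"),
            "authors": [author.text for author in book.findall("AUTHOR")],
            "publisher": published.text,
            "year": int(published.get("year")),  # attribute values are strings
        })
    return entries
```

The publisher is read from the element content and the year from the attribute, which is why the year needs an explicit conversion to a number.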

XML versus HTML
Due to their similar appearance and shared lineage, people often compare XML with HTML. It is true that both are written in plain text that can be edited with an ordinary editor and that their elements are enclosed in begin and end tags. But they also have important differences. The following table summarizes the differences between the two.

A comparison between HTML and XML
XML | HTML
Emphasizes data content | Emphasizes data display
Allows customized tags | Only allows pre-defined tags
Tags are case-sensitive | Tags are not case-sensitive
Multiple adjacent whitespaces in an element's content are different from a single whitespace | Multiple adjacent whitespaces in an element's content are the same as a single whitespace
Quotes around attribute values are compulsory | Quotes around attribute values are optional
Processed by tailor-made programs as well as generic XML parsers | Processed mainly by standard Web browsers

Metalanguages
A metalanguage is a language used to describe another language. Though XML is precise, it is also generic enough to allow many different documents to be syntactically correct. These documents are said to be well formed. For different applications, XML documents hold different kinds of data in different document structures. If one computer program produces XML documents for another program to process, the two programs must agree on the same document structure. A metalanguage builds on top of XML syntax to further describe the structure of the documents for the two programs to share. Starting in the next section, we will study two representative metalanguages: Document Type Definition (DTD) and XML Schema Definition (XSD).

Document Type Definition (DTD)
Many metalanguages have been used to specify XML document structures. DTD was the first such language proposed, and it is still taught and used today. However, the popularity of DTD has been overtaken by a more powerful alternative called XML Schema. Our coverage of DTD will therefore be relatively brief.

Referring to a DTD file
The following is the employee-list document with a <!DOCTYPE> declaration added: <!DOCTYPE employee-list SYSTEM "employee-list.dtd">. The first word after the DOCTYPE keyword must be the name of the root element, which in our case is employee-list. In this declaration, we specify "employee-list.dtd" as the file that holds the allowed syntax for the employee-list element. We use the SYSTEM keyword to indicate that the DTD file is defined by ourselves. An alternative PUBLIC keyword may be used, but it is not applicable to us in this course.

Referring to a DTD file John Mary 30 35

Referring to a DTD file
Without any path information, the DTD file is assumed to be in the same directory as the XML file. A declaration can instead specify a DTD file with a relative path, an absolute path or a URL. The double dot (..) in a relative path stands for the parent directory.
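The Python sketch below reads the declaration with the expat parser from the standard library and returns the declared root name, the DTD location and the actual root element. The three paths marked hypothetical are assumptions, as are the names TEMPLATE, LOCATIONS and doctype_and_root. Note that expat does not validate, so the program itself has to compare the DOCTYPE name with the root element.

```python
import xml.parsers.expat

TEMPLATE = """<?xml version="1.0"?>
<!DOCTYPE employee-list SYSTEM "{dtd}">
<employee-list></employee-list>"""

LOCATIONS = [
    "employee-list.dtd",                         # same directory as the XML file
    "../dtd/employee-list.dtd",                  # relative path (hypothetical)
    "/home/user/dtd/employee-list.dtd",          # absolute path (hypothetical)
    "http://example.org/dtd/employee-list.dtd",  # URL (hypothetical)
]


def doctype_and_root(document):
    """Return (DOCTYPE name, DTD location, name of the root element)."""
    found = {"doctype": None, "root": None}
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        found["doctype"] = (name, system_id)

    def on_start(name, attributes):
        if found["root"] is None:  # only the first begin tag is the root
            found["root"] = name

    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = on_start
    parser.Parse(document, True)
    name, system_id = found["doctype"]
    return name, system_id, found["root"]
```

The parser only reports the location and does not open the DTD file, so the sketch runs even though none of the listed files exist.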

Defining elements in DTD
The following is the content of the employee-list.dtd file with five declarations.

<!ELEMENT employee-list (employee*)>
<!ELEMENT employee (name, hours, rate)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT hours (#PCDATA)>
<!ELEMENT rate (#PCDATA)>

An <!ELEMENT> declaration has two pieces of information. The first one is the name of the element being defined. The second one is an expression that defines the element.

The first declaration defines employee-list as zero or more employee elements using a trailing asterisk: (employee*). The second declaration defines employee as a sequence of name, hours and rate separated by commas: (name, hours, rate). The remaining declarations define the individual elements name, hours and rate as parsed character data, denoted by #PCDATA.

Repetitions in DTD
The following are the characters you can place after an element in an expression to denote repetitions.

Symbol | Meaning
* | Zero or more times
+ | One or more times
? | Zero or one time

Choices in DTD
An element can be defined as one of several things. For example, a vehicle element may be defined as a motorcycle, car, van or truck. We use vertical strokes to separate the choices, as in <!ELEMENT vehicle (motorcycle | car | van | truck)>.
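Sequences, repetitions and choices in DTD have the same meaning as concatenation, the quantifiers *, + and ?, and the | operator in regular expressions. The Python sketch below uses that correspondence to check a list of child element names against an expression. It is an illustration of the idea and not a DTD validator: it ignores #PCDATA, and the function names are hypothetical.

```python
import re

# An element name, or one of the symbols used in a DTD expression.
TOKEN = re.compile(r"[A-Za-z_][\w.\-]*|[()|*+?,]")
SYMBOLS = {"(", ")", "|", "*", "+", "?"}


def model_to_regex(model):
    """Translate an expression such as '(name, hours, rate)' into a regex."""
    parts = []
    for token in TOKEN.findall(model):
        if token == ",":
            continue  # a sequence is plain concatenation in a regex
        if token in SYMBOLS:
            parts.append(token)  # same meaning in DTD and in a regex
        else:
            # Match the element name followed by the separating space.
            parts.append("(?:" + re.escape(token) + " )")
    return "".join(parts)


def children_match(model, children):
    """Return True if the child element names satisfy the expression."""
    sequence = "".join(name + " " for name in children)
    return re.fullmatch(model_to_regex(model), sequence) is not None
```

For example, children_match("(name, hours, rate)", ["name", "hours", "rate"]) holds, while a list with the same names in another order does not.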

Attributes in DTD
Recall the PUBLISHED element you saw earlier, now with two attributes, place and year, and the content Que. We can use an <!ATTLIST> declaration to define the list of attributes allowed in an element. If we want to allow two attributes place and year in the PUBLISHED element, we use the following declaration.

<!ATTLIST PUBLISHED place CDATA #REQUIRED year CDATA "2000">

Both attributes hold CDATA, which stands for character data. The place attribute is required in the PUBLISHED element, thus we use #REQUIRED. The year attribute has a default value of "2000" if not specified. Here are some additional options for attributes.

Option | Meaning
#REQUIRED | The attribute value must be specified in the XML element
#IMPLIED | The attribute value is optional in the XML element
"default value" | The attribute takes the default value if omitted
#FIXED "fixed value" | The attribute always has the fixed value
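The effect of a default value can be observed with the expat parser in Python's standard library, which applies defaults declared inside the document. In the sketch below, the declaration is placed in the document itself rather than in a separate file, and the place value Indianapolis is an assumption. Because expat does not validate, it would not complain if the required place attribute were missing.

```python
import xml.parsers.expat

DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE PUBLISHED [
  <!ELEMENT PUBLISHED (#PCDATA)>
  <!ATTLIST PUBLISHED place CDATA #REQUIRED
                      year  CDATA "2000">
]>
<PUBLISHED place="Indianapolis">Que</PUBLISHED>"""


def start_tag_attributes(document):
    """Map each element name to the attributes reported for its begin tag."""
    seen = {}
    parser = xml.parsers.expat.ParserCreate()

    def on_start(name, attributes):
        seen[name] = dict(attributes)

    parser.StartElementHandler = on_start
    parser.Parse(document, True)
    return seen
```

The element omits the year attribute, yet the parser reports year as "2000". A year written in the element takes precedence over the default.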

Drawbacks of DTD
DTD itself does not follow XML syntax, which means that people using DTD have to learn a separate set of rules in addition to the XML rules. In addition, DTD has a rather limited set of data types. We cannot describe data in more detail than #PCDATA. For example, even integer data can only be defined as #PCDATA. The ways to construct complex elements are limited to simple sequences, repetitions and choices. For example, assuming an element named course, we need an awkward definition such as (course, course, course, course?, course?, course?) to specify the course workload of a full-time student as three to six courses.

Drawbacks of DTD
Finally, DTD does not support reuse. If two elements have a similar structure, their structures must be repeated at the top level as follows.