SDPL 2011, 2: XML Basics

2 Basics of XML and XML documents
Survivor's Guide to XML, or XML for Computer Scientists / Dummies
  2.1 XML and XML documents
  2.2 Basics of XML DTDs
  2.3 XML Namespaces

2.1 XML and XML documents
• XML, Extensible Markup Language: W3C Recommendation, February 1998
  – not an official standard, but a stable industry standard
  – 2nd Ed. 2000, 3rd Ed. 2004, 4th Ed. 2006, 5th Ed. Nov 2008
    » editorial revisions, not new versions of XML 1.0
• a simplified subset of SGML, Standard Generalized Markup Language, ISO 8879:1986
  – what is said later about valid XML documents applies to SGML documents, too

What is XML?
• Extensible Markup Language is not a markup language!
  – it does not fix a tag set nor its semantics (unlike markup languages, e.g. HTML)
• XML documents have no inherent (processing or presentation) semantics
  – even though many think that XML is semantic or self-describing; see next

Semantics of XML Markup
• Meaning of this XML fragment?
  – The computer cannot know it either!
  – Implementing the semantics is the topic of this course

What is XML (2)?
• XML is
  (A) a way to use markup to represent information
  (B) a metalanguage
    » supports the definition of specific markup languages through XML DTDs or Schemas
    » e.g. XHTML, a reformulation of HTML using XML
• (C) Often “XML” ≈ XML + XML technology
  – that is, the processing models and languages we’re studying (and many others...)

How does it look?
  <invoice>
    <client …>
      … Pekka Kilpeläinen …
    </client>
    … XML Handbook …
    … XSLT Programmer’s Ref …
  </invoice>

Essential Features of XML
• Overview of XML essentials
  – many details skipped
    » some to be discussed in exercises or with other topics when the need arises
  – Learn to consult original sources (specifications, documentation etc.) for details!
    » The XML specification is easy to browse

XML Document Characters
• XML documents are made of ISO 10646 (32-bit) characters; in practice, of 16-bit Unicode chars (cf. Java)
• Three aspects of characters (see next):
  – representation by bytes/octets
  – numeric code points
  – visual presentation

External Aspects of Characters
• Documents are stored/transmitted as a sequence of bytes (or octets). An encoding determines how characters are represented by bytes.
  – UTF-8 (a superset of 7-bit ASCII) is the XML default encoding
  – encoding="iso-8859-1" ~> 256 Western European chars as single bytes
• A font (a collection of character images called glyphs) determines the visual presentation of characters
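To make the difference between code points and bytes concrete, here is a small Python sketch (the sample string is arbitrary). The code point of ä is the same whatever the encoding; only the byte sequence changes. Decoding UTF-8 bytes as ISO-8859-1 shows the typical damage when the assumed encoding does not match the actual one.

```python
text = "Kilpeläinen"

print(ord("ä"), hex(ord("ä")))       # numeric code point: 228 0xe4

utf8 = text.encode("utf-8")          # XML's default encoding
latin1 = text.encode("iso-8859-1")   # one byte per Western European character

print(len(text), len(utf8), len(latin1))   # 11 12 11: 'ä' takes two bytes in UTF-8
print(utf8)                                # b'Kilpel\xc3\xa4inen'
print(latin1)                              # b'Kilpel\xe4inen'

# Reading UTF-8 bytes with the wrong encoding garbles the non-ASCII character
print(utf8.decode("iso-8859-1"))           # KilpelÃ¤inen
```

The glyphs finally shown on screen depend on the font, which no amount of byte-level processing can change.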

XML Encoding of Structure 1
• XML is, essentially, just a textual encoding scheme of labelled, ordered and attributed trees, in which
  – internal nodes are elements labelled by type names
  – leaves are text nodes labelled by string values, or empty element nodes
  – the left-to-right order of the children of a node matters
  – element nodes may carry attributes (= name–string-value pairs)

XML Encoding of Structure 2
• XML encoding of a tree
  – corresponds to a pre-order walk
  – the start of an element node with type name A is denoted by a start tag <A>, and its end by an end tag </A>
  – possible attributes are written within the start tag: <A attr1="str1" … attrk="strk">
    » names must be unique: attr_k ≠ attr_h when k ≠ h
  – text nodes are written as their string value

XML Encoding of Structure: Example
• tree: S with children W ("Hello") and E (attribute A=1), where E has one child W ("world!")
• XML encoding: <S><W>Hello</W><E A="1"><W>world!</W></E></S>
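The pre-order encoding can be sketched in a few lines of Python. The tuple representation of nodes is an assumption made for this sketch, not a standard API; `escape` and `quoteattr` from the standard library take care of `<`, `&` and quoting in text and attribute values. The tree is the one from the example: S with children W ("Hello") and E (A=1), whose only child is W ("world!").

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical node format: (type_name, attributes, children); a text leaf is a str.
def to_xml(node) -> str:
    """Serialize a labelled, ordered, attributed tree by a pre-order walk."""
    if isinstance(node, str):
        return escape(node)                    # text node: its escaped string value
    name, attrs, children = node
    attr_text = "".join(f" {k}={quoteattr(v)}" for k, v in attrs.items())
    # start tag when entering the node, children left to right, end tag when leaving
    return f"<{name}{attr_text}>" + "".join(to_xml(c) for c in children) + f"</{name}>"

tree = ("S", {}, [("W", {}, ["Hello"]),
                  ("E", {"A": "1"}, [("W", {}, ["world!"])])])
print(to_xml(tree))   # <S><W>Hello</W><E A="1"><W>world!</W></E></S>
```

Keeping the attributes of a node in a dict guarantees the uniqueness of attribute names for free.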

XML: Logical Document Structure
• Elements
  – indicated by matching (case-sensitive!) tags <Name> … </Name>
  – can contain text and/or subelements
  – can be empty: <Name></Name> or <Name/> (e.g. <br/> in XHTML)
  – a unique root element ⇒ the document is a single tree

Logical document structure (2)
• Attributes
  – name-value pairs attached to elements
  – “metadata”, usually not treated as content
  – in the start tag after the element type name: <Name attr="value" …> … </Name>
• Also:
  – comments <!-- … -->
  – processing instructions <?Target … ?>
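With Python's standard `xml.etree.ElementTree`, attributes end up in a separate name-value mapping, while character content goes to the element's text. A minimal sketch; the element names `Doc`, `Item` and `Empty` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one element with attributes, two empty elements
doc = ET.fromstring('<Doc><Item id="i1" unit="EUR">XML Handbook</Item>'
                    '<Empty/><Empty></Empty></Doc>')

item = doc.find("Item")
print(item.attrib)   # {'id': 'i1', 'unit': 'EUR'}: name-value pairs, kept apart from content
print(item.text)     # XML Handbook: the element's content

# <Empty/> and <Empty></Empty> give the same kind of node: no children, no text
first, second = doc.findall("Empty")
print(len(first), len(second), first.text, second.text)   # 0 0 None None
```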

CDATA Sections
• “CDATA Sections” to include XML markup characters as textual content:
    <![CDATA[ if (Count < 0) ]]>
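A quick check with `xml.etree.ElementTree` (the element name `code` and the sample statement are hypothetical): the CDATA section and the escaped form produce identical character data, so the application cannot tell them apart.

```python
import xml.etree.ElementTree as ET

# Inside a CDATA section, '<' and '&' are plain character data, not markup
code = ET.fromstring("<code><![CDATA[if (Count < 0 && x > 1) return;]]></code>")
print(code.text)                    # if (Count < 0 && x > 1) return;

# Without CDATA, the same text needs the escapes &lt; and &amp;
escaped = ET.fromstring("<code>if (Count &lt; 0 &amp;&amp; x &gt; 1) return;</code>")
print(code.text == escaped.text)    # True
```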

Two levels of correctness (1)
• Well-formed documents
  – roughly: follow the syntax of XML, markup correct (elements properly nested, tag names match, attributes of an element have unique names, ...)
  – a violation is a fatal error
• Valid documents
  – (in addition to being well-formed) obey an associated grammar (DTD/Schema)
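Well-formedness can be checked with any conforming parser. A minimal sketch using Python's `xml.etree.ElementTree`, which is built on the non-validating expat parser and raises `ParseError` at the first well-formedness error; it does not check validity against a DTD.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """True if the text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:      # a well-formedness violation is a fatal error
        return False

print(is_well_formed("<a><b>ok</b></a>"))    # True
print(is_well_formed("<a><b>bad</a></b>"))   # False: improper nesting
print(is_well_formed("<a>x</A>"))            # False: tag names are case-sensitive
print(is_well_formed('<a x="1" x="2"/>'))    # False: duplicate attribute name
print(is_well_formed("<a/><b/>"))            # False: no unique root element
```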

XML docs and valid XML docs
• XML documents = well-formed XML documents
  – DTD-valid documents and Schema-valid documents are subsets of them

An XML Processor (Parser)
• Reads XML documents and reports their contents to an application
  – relieves the application from the details of markup
  – the XML Recommendation specifies, partly, the behaviour of XML processors:
    » recognition of characters as markup or data; what information to pass to applications; how to check the correctness of documents
    » validation based on comparing the document against its grammar
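A sketch of "reporting contents to an application" with Python's standard SAX interface: the parser calls back methods of the application's handler, which never sees the raw markup. The handler class name is arbitrary; note that in general the text of an element may arrive in several `characters` calls.

```python
import xml.sax

class EventCollector(xml.sax.ContentHandler):
    """Application-side handler: receives the document content as callbacks."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        self.events.append(("text", content))

handler = EventCollector()
xml.sax.parseString(b'<S><W>Hello</W><E A="1"/></S>', handler)
for event in handler.events:
    print(event)
```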

2.2 Basics of XML DTDs
• A Document Type Declaration provides a grammar (document type definition, DTD) for a class of documents [defined in the XML Rec]
• Syntax (in the prolog of a document instance):
    <!DOCTYPE rootElemType SYSTEM "…" [ … internal subset: markup declarations … ]>
• DTD = union of the external and internal subset (either could be empty); the internal subset has higher precedence
  – it can override entity and attribute declarations (see next)

Markup Declarations
• A DTD consists of markup declarations
  – element type declarations
    » similar to productions of ECFGs
  – attribute-list declarations
    » for declared element types
  – entity declarations
    » short-hand notations and physical storage units
  – notation declarations
    » information about external (binary) objects

What do declarations look like?
    <!ATTLIST item
        price NMTOKEN #REQUIRED
        unit (FIM | EUR) "EUR" >
• NMTOKEN: a string of name characters; name characters: (Letter | [0-9] | . | - | _ | :)+

Element type declarations
• The general form is <!ELEMENT elemType E>, where E is a content model (regular expression); (≈ ECFG production elemType → E)
• Content model operators:
    E | F : alternation        E, F : concatenation
    E?    : optional           E*   : zero or more
    E+    : one or more        (E)  : grouping
• No priorities: either (A,B)|C or A,(B|C), but no A,B|C
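Since content models are regular expressions over element type names, one way to check them is to translate them into an ordinary regex and match the sequence of child names. A rough sketch under simplifying assumptions: it handles only names and the operators above (no `#PCDATA`, `EMPTY` or `ANY`), does not enforce the no-priorities rule, and writes each child name followed by a space so that adjacent names cannot run together. The example model is hypothetical.

```python
import re

def model_to_regex(model: str):
    """Translate a content model such as "(title, (para | list)*, note?)"
    into a Python regex over child type names, each followed by a space."""
    parts = []
    for tok in re.findall(r"[A-Za-z_:][\w.:-]*|[(),|?*+]", model):
        if tok == ",":
            continue                                  # concatenation: juxtaposition
        elif tok == "(":
            parts.append("(?:")                       # grouping without capturing
        elif tok in ")|?*+":
            parts.append(tok)                         # same meaning in Python regexes
        else:
            parts.append("(?:" + re.escape(tok) + " )")   # an element type name
    return re.compile("".join(parts))

def children_match(model: str, children: list[str]) -> bool:
    return model_to_regex(model).fullmatch("".join(c + " " for c in children)) is not None

model = "(title, (para | list)*, note?)"
print(children_match(model, ["title", "para", "list", "note"]))   # True
print(children_match(model, ["para"]))                            # False
```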

Attribute-List Declarations
• Can declare attributes for elements:
  – name, data type and possible default value
• Semantics mainly up to the application
  – the parser checks that ID attributes are unique and that the targets of IDREF and IDREFS attributes exist
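The ID/IDREF checks amount to collecting all ID values and then looking up every reference. A minimal sketch; the mapping `ATTR_TYPES` stands in for the relevant ATTLIST declarations, and its element and attribute names are hypothetical, since ElementTree itself does not read attribute types from a DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for ATTLIST declarations:
# (element type, attribute name) -> declared attribute type
ATTR_TYPES = {("sec", "id"): "ID", ("xref", "to"): "IDREF", ("cite", "refs"): "IDREFS"}

def check_ids(root: ET.Element) -> list[str]:
    ids, refs, errors = set(), [], []
    for el in root.iter():
        for name, value in el.attrib.items():
            kind = ATTR_TYPES.get((el.tag, name))
            if kind == "ID":
                if value in ids:
                    errors.append(f"duplicate ID {value!r}")
                ids.add(value)
            elif kind == "IDREF":
                refs.append(value)
            elif kind == "IDREFS":
                refs.extend(value.split())    # whitespace-separated list of IDs
    # References are checked only after all IDs are known, so forward references are fine
    errors += [f"dangling IDREF {r!r}" for r in refs if r not in ids]
    return errors

doc = ET.fromstring('<doc><xref to="s2"/><sec id="s1"/><sec id="s2"/>'
                    '<cite refs="s1 s3"/><sec id="s1"/></doc>')
print(check_ids(doc))   # ["duplicate ID 's1'", "dangling IDREF 's3'"]
```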

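Mixed, Empty and Arbitrary Content
• Mixed content: <!ELEMENT E (#PCDATA | B1 | … | Bk)*>
  – may contain text (#PCDATA) and elements
• Empty content: <!ELEMENT E EMPTY>
• Arbitrary content: <!ELEMENT E ANY>
  (= <!ELEMENT E (#PCDATA | A1 | … | An)*>, where A1, …, An are all the declared element types)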

Entities (1)
• Named storage units or XML fragments (~ macros in some languages)
  – character entities:
    » &#60;, &#x3C; and &lt; all expand to ‘<’ (treated as data, not as start-of-markup)
    » other predefined entities: &amp; &gt; &apos; &quot; for & > ' "
  – general entities are shorthand notations:
    <!ENTITY … "… School of Computing">
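Python's `xml.etree.ElementTree` (via expat) expands character references, the predefined entities, and general entities declared in the internal DTD subset. A small sketch; the entity name `dept` and the element name `note` are made up for illustration.

```python
import xml.etree.ElementTree as ET

doc = """<!DOCTYPE note [
  <!ENTITY dept "School of Computing">
]>
<note>&#60; &#x3C; &lt; &amp; &gt; &apos; &quot; | &dept;</note>"""

note = ET.fromstring(doc)
print(note.text)   # < < < & > ' " | School of Computing
```

After parsing, the application only sees the expanded characters; whether a `<` came from `&#60;`, `&#x3C;` or `&lt;` is no longer visible.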

Entities (2)
• Physical storage units comprising a document
  – parsed entities
  – the document entity is the starting point of processing
  – entities and elements must nest properly:
      <!DOCTYPE doc [
        <!ENTITY chap1 ( … as above … ) >
      ]>
      <doc>&chap1;</doc>
    where chap1 contains <sec> … </sec> <sec> … </sec>

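Unparsed Entities
• For connecting external binary objects to XML documents (the XML processor handles only text)
  – declared as <!ENTITY name SYSTEM "…" NDATA notationName>, together with <!NOTATION notationName …>
• Usage: as the value of an attribute of type ENTITY
  – the parser provides the notation and the address to the application
  – an SGML legacy technique(?)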

An Alternative Way
• I have rarely used unparsed entities or notations
• Easier "HTML-style" linking:
  – the link indicates the format and the address directly
  – maintenance of numerous entities might be easier with (indirect references through) explicit declarations

Parameter Entities
• To parameterize and modularize DTDs:
    <!ENTITY % tableDTD SYSTEM "…">
    %tableDTD;
  (Parameter entities inside a markup declaration are allowed only in the external DTD subset)

Speculations about XML Parsing
• Parsing involves two things:
  1. Pulling the entities together, and checking the well-formedness
  2. Building a parse tree for the input (à la DOM), or otherwise informing the application about document content and structure (e.g. à la SAX)
• Task 2 is simple (thanks to the simplicity of XML markup; see next)
• Checking well-formedness is straightforward;
• Implementing validation is a bit more challenging

Building an XML Parse Tree
• input: <S><W>Hello</W><E><W>world!</W></E></S>
• tree: S with children W ("Hello") and E, where E has one child W ("world!")

Simplified XML Tree Building Algorithm (uses DOM-like tree operations)
  C ← new Document();   // current node
  scan document text until at end:
    case of scanned text:
      "<Type>":  N ← new ElementNode(Type);
                 C.addChildNode(N);
                 C ← N;
      "Txt":     C.addChildNode(new TextNode(Txt));
      "</Type>": C ← C.parentNode();
  return C;
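The tree building algorithm translates almost line by line into Python. This is a sketch under strong simplifying assumptions: tags carry no attributes, there are no entities, comments or CDATA sections, and end tags are not compared with start tags (a real parser must check that). The `Node` class is a hypothetical stand-in for DOM nodes.

```python
import re

class Node:
    """Hypothetical minimal stand-in for DOM nodes."""
    def __init__(self, kind, value=None):
        self.kind = kind          # "document", "element" or "text"
        self.value = value        # element type name or text string
        self.parent = None
        self.children = []

    def add_child(self, node):
        node.parent = self
        self.children.append(node)
        return node

# A start tag <Type>, an end tag </Type>, or a run of text
TOKEN = re.compile(r"<(/?)([A-Za-z_][\w.-]*)>|([^<]+)")

def build_tree(text):
    current = Node("document")                          # C <- new Document()
    for m in TOKEN.finditer(text):
        slash, type_name, txt = m.groups()
        if txt is not None:                             # "Txt"
            current.add_child(Node("text", txt))
        elif slash:                                     # "</Type>": back to the parent
            current = current.parent
        else:                                           # "<Type>": descend into new node
            current = current.add_child(Node("element", type_name))
    return current

def show(node):
    """Render a tree compactly, e.g. Document(S(W('Hello')))."""
    if node.kind == "text":
        return repr(node.value)
    label = node.value if node.kind == "element" else "Document"
    return f"{label}({', '.join(show(c) for c in node.children)})"

print(show(build_tree("<S><W>Hello</W><E><W>world!</W></E></S>")))
# Document(S(W('Hello'), E(W('world!'))))
```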

Sketching XML validation
• Treat the document as a tree d
• The document is valid w.r.t. a grammar (DTD/Schema) G iff d is a syntax tree over G
  – Check that the root is labelled by the start symbol of G
  – For each element node n of the tree, check that its
    » attributes match with those of the element type
    » content matches the content model of its type: if n is of type A and its children are of types B1, …, Bk, check that the grammar has a production A → E for which B1 … Bk ∈ L(E)   (1)
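The checks can be sketched as a walk over an ElementTree tree. Everything about the grammar here is hypothetical: the element types loosely follow the invoice example, and the content models are written directly as Python regexes over child names (each followed by a space) rather than parsed from a DTD. Text content and #REQUIRED attributes are not checked in this sketch.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical grammar G
START = "invoice"                               # start symbol
CONTENT = {                                     # content models as regexes
    "invoice": r"(?:client )(?:item )+",        # (client, item+)
    "client":  r"(?:name )(?:addr )?",          # (name, addr?)
    "name": r"", "addr": r"", "item": r"",      # no element children
}
ATTRS = {"invoice": set(), "client": set(), "name": set(), "addr": set(),
         "item": {"price", "unit"}}

def valid(root: ET.Element) -> bool:
    if root.tag != START:                       # root must be the start symbol
        return False
    for n in root.iter():                       # every element node n
        if n.tag not in CONTENT:
            return False                        # undeclared element type
        if not set(n.attrib) <= ATTRS[n.tag]:
            return False                        # undeclared attribute
        children = "".join(c.tag + " " for c in n)
        if re.fullmatch(CONTENT[n.tag], children) is None:
            return False                        # condition (1) fails
    return True

ok = ET.fromstring('<invoice><client><name>P. K.</name></client>'
                   '<item price="60" unit="EUR">XML Handbook</item></invoice>')
print(valid(ok))                                               # True
print(valid(ET.fromstring("<invoice><item/></invoice>")))      # False: no client
```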

Sketching the validation… (2)
• How to check condition (1), i.e. the matching of children with a content model?
  – by an automaton built from the content model E
  – Example:
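For instance, a deterministic automaton for the hypothetical content model (a, (b | c)*, d) checks the children's type names in a single left-to-right pass. The transition table below is written by hand for this one model; in general it would be constructed from the content model E.

```python
# Hand-built DFA for the hypothetical content model (a, (b | c)*, d):
# state 0 = start, 1 = after a (and any b/c), 2 = after d (accepting)
TRANSITIONS = {
    (0, "a"): 1,
    (1, "b"): 1,
    (1, "c"): 1,
    (1, "d"): 2,
}
ACCEPTING = {2}

def children_match(children: list[str]) -> bool:
    """Run the automaton over the children's type names, left to right."""
    state = 0
    for name in children:
        state = TRANSITIONS.get((state, name))
        if state is None:            # no transition defined: reject
            return False
    return state in ACCEPTING

print(children_match(["a", "b", "c", "b", "d"]))   # True
print(children_match(["a", "b"]))                  # False: d is missing
```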

2.3 XML Namespaces
• Documents often comprise parts processed by different applications (and/or defined in different schemas)
  – for example, XSLT scripts mix HTML elements with XSLT elements/instructions (… </xsl:template>)
  – How to manage multiple sets of names?

XML Namespaces (2/5)
• Solution: XML Namespaces, W3C Rec. Jan ’99, for separating possibly overlapping “vocabularies” (sets of element type and attribute names) within a single document
• by introducing (arbitrary) local name prefixes, and binding them to (fixed) globally unique URIs
  – for example, the local prefix “xsl:” is conventionally used in XSLT scripts

XML Namespaces briefly (3/5)
• A namespace is identified by a URI (through the associated local prefix), e.g. …/XSL/Transform for XSLT
  – using URLs is conventional but not required
  – the identifying URI has to be unique, but it does not have to be an existing address
• The association is inherited by sub-elements
  – see the next example (of an XSLT script)

XML Namespaces (4/5)
  <xsl:stylesheet …>
    … </xsl:template>
  </xsl:stylesheet>
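Namespace-aware parsers resolve each prefix to its URI. Python's `xml.etree.ElementTree` replaces `prefix:local` with `{URI}local`, so what identifies a name is the URI, not the prefix, and a binding declared on an element applies to its descendants. A small sketch using the XSLT namespace URI; the documents themselves are made up.

```python
import xml.etree.ElementTree as ET

XSLT = "http://www.w3.org/1999/XSL/Transform"   # namespace URI of XSLT

# Two documents binding different prefixes to the same URI
doc1 = ET.fromstring(f'<xsl:stylesheet xmlns:xsl="{XSLT}">'
                     '<xsl:template><p>text</p></xsl:template></xsl:stylesheet>')
doc2 = ET.fromstring(f'<t:stylesheet xmlns:t="{XSLT}"><t:template/></t:stylesheet>')

template1, template2 = doc1[0], doc2[0]
print(template1.tag)                    # {http://www.w3.org/1999/XSL/Transform}template
print(template1.tag == template2.tag)   # True: only the URI matters, not the prefix
print(template1[0].tag)                 # p: no prefix and no default namespace
```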

XML Namespaces briefly (5/5)
• Built on top of basic XML
  – by overloading attribute syntax (xmlns:foo="…")
  – does not affect validation
    » namespace attributes must be declared for DTD validity
    » all element type names must be declared (with their prefixes!)
  – ⇒ other schema languages (XML Schema, Relax NG) are better for validating documents with namespaces
• Processing languages allow declaring NS prefixes; see next

Example: Namespaces in XSLT
• XML: … aaa … bbb …
• XSLT:
    <xsl:transform version="1.0"
        xmlns:xsl="…"
        xmlns:bar="…">
      … </xsl:template>
    </xsl:transform>

Example: Namespaces in XQuery
• ns-test.xml: … aaa … bbb …
• query:
    xquery version "1.0";
    declare namespace zap = "…";
    doc("ns-test.xml")//zap:*
• For NS support in XML APIs, see next