Processing of structured documents Spring 2002, Part 2 Helena Ahonen-Myka.



2 XML data modelling issues. A data model describes which information contained in XML documents is accessible; it may be different for different processors and applications. It covers the format of input and output data, and data that is used internally, e.g. values of expressions.

3 Data modelling issues. The "normal" processing model: an XML processor parses an XML document, checks at least the well-formedness, may also validate the document, and provides the information of the document in some form to an application.

4 XML 1.0 reporting requirements. For instance: an XML processor must always provide all characters in a document that are not part of markup to the application; a validating XML processor must inform the application which of the character data in a document is white space appearing within element content; an XML processor must normalize line-ends to LF before passing them to the application.

5 XML 1.0 reporting requirements (cont.). A validating XML processor must include the replacement text of an entity in place of an entity reference; an XML processor must supply the default value of attributes declared in the DTD for a given element type but not appearing in the element’s start tag.
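
The reporting requirements on slides 4 and 5 can be observed with a small sketch. It assumes Python's standard xml.etree.ElementTree parser, which is non-validating but still reads the internal DTD subset; the document, element, attribute and entity names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: names and values are made up for this sketch.
SAMPLE = (
    '<!DOCTYPE note [\n'
    '  <!ENTITY sig "the author">\n'
    '  <!ATTLIST note lang CDATA "en">\n'
    ']>\n'
    '<note>line one\r\nline two, signed &sig;</note>'
)

note = ET.fromstring(SAMPLE)

# CR-LF in the input is reported as a single LF, and the entity reference
# is replaced by its replacement text.
print(repr(note.text))

# The attribute is absent from the start tag, so the parser supplies the
# default value declared in the DTD.
print(note.attrib)
```

The application never sees the CR character or the entity reference itself, only the normalized and expanded character data.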

6 XML data modelling issues. Several XML data models exist: XML Information Set (Infoset), the base for the other data models, which describes information after parsing; PSVI (Post-Schema-Validation Infoset), with type information added; the XQuery 1.0 and XPath 2.0 Data Model, also used in XSLT 2.0, covering input/output and internal representation; DOM.

7 XML Information set. W3C Recommendation: 24 Oct 2001. Purpose: to provide a set of definitions for use in other XML specifications that need to refer to the information in a well-formed XML document. Not meant to be exhaustive; not a set of minimum requirements that a processor has to return. Abstract definitions: no concrete interfaces etc. are provided.

8 XML Information set. An XML document’s information set consists of a number of information items. An information item is an abstract description of some part of an XML document, mainly to be used in other specifications. Each information item has a set of associated named properties.

9 XML Information set. Describes the tree structure provided by the processor (no special interface is specified), e.g. entities expanded to their replacement text and attributes with their default values. Properties: e.g. for each element, its child elements and attributes.

10 Information items. Document information item; element information items; attribute information items; processing instruction information items; unexpanded entity reference information items; character information items.

11 Information items (cont.). Comment information items; document type declaration information item; unparsed entity information items; notation information items; namespace information items.

12 Document information item. Corresponds to the document as a whole; do not confuse it with the "real" root element (-> document element). There is exactly one document information item in the information set. All information items are accessible from the properties of the document information item, either directly or indirectly through the properties of other information items.

13 Document information item. Properties: children, document element, notations, unparsed entities, base URI, character encoding scheme, standalone, version, all declarations processed.

14 Document information item. Children property: an ordered list of child information items, in document order: exactly one element information item (= document element); one processing instruction (PI) information item for each PI outside the document element (the same for comments), where comments and PIs within the DTD are excluded; a document type declaration information item (if there is a DTD). Document element property: the element information item corresponding to the document element.
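
The Infoset itself defines no interface, but the shape of the children property can be approximated with a DOM tree. The following is a minimal sketch, assuming Python's xml.dom.minidom as a stand-in; the sample document and the KIND labels are invented for illustration.

```python
from xml.dom import Node, minidom

# Hypothetical document with a comment and a PI outside the document element.
SAMPLE = "<!-- before --><?app hint?><root><child/></root><!-- after -->"
doc = minidom.parseString(SAMPLE)

# Labels chosen for this sketch only; they are not Infoset terminology.
KIND = {
    Node.ELEMENT_NODE: "element",
    Node.COMMENT_NODE: "comment",
    Node.PROCESSING_INSTRUCTION_NODE: "pi",
    Node.DOCUMENT_TYPE_NODE: "doctype",
}

# Children of the document, in document order.
kinds = [KIND[node.nodeType] for node in doc.childNodes]
print(kinds)

# The single element child is also reachable directly, much like the
# document element property.
print(doc.documentElement.tagName)
```

Exactly one of the children is an element, and it is the same node that documentElement returns.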

15 Element information items. There is an element information item for each element appearing in the XML document. One of the element information items is the value of the document element property of the document information item (root element); all other element information items are accessible recursively.

16 Element information items. An element information item has the following properties: namespace name, local name, prefix, children (element, PI, character, comment), attributes, namespace attributes, in-scope namespaces, base URI, parent.

17 Attribute information items. There is an attribute information item for each attribute (specified or defaulted) of each element in the document, including namespace declarations. Attributes declared in the DTD with no default value and not specified in the element’s start tag are not represented by attribute information items.

18 Attribute information items. An attribute information item has the following properties: namespace name, local name, prefix; normalized value; specified (was the value specified in the start tag or defaulted from the DTD?); attribute type (ID, IDREF, ENTITY, NMTOKEN, ...); references (target of IDREF = some ID); owner element.

19 Character information items. There is a character information item for each data character that appears in the document. Each character is a logically separate information item, but XML applications are free to chunk characters into larger groups as necessary. Properties of a character information item: character code; element content whitespace (is this character white space within element content?); parent.

20 Example <msg:message doc:date=” ” xmlns:doc=” xmlns:msg=” >Phone home!

21 The information items for the sample document A document information item an element information item with namespace name ” local part ”message”, and prefix ”msg” an attribute information item with the namespace name ” local part ”date”, prefix ”doc”, and normalized value ” ”

22 The information set for the sample document (cont.). Three namespace information items for the namespaces; two attribute information items for the namespace attributes; eleven character information items for the character data.
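
The counts on slides 21 and 22 can be checked against a DOM tree. This is a sketch only: it assumes Python's xml.dom.minidom, and the date value and namespace URIs below are placeholders invented here, not the values from the original slide.

```python
from xml.dom import minidom

# Placeholder date and namespace URIs (assumptions for this sketch).
SAMPLE = (
    '<msg:message doc:date="2002-01-01" '
    'xmlns:doc="http://example.org/doc" '
    'xmlns:msg="http://example.org/msg">Phone home!</msg:message>'
)

doc = minidom.parseString(SAMPLE)
root = doc.documentElement

# Namespace name, local part and prefix of the element.
print(root.namespaceURI, root.localName, root.prefix)

# Split the attributes into namespace declarations and ordinary attributes.
attrs = [root.attributes.item(i) for i in range(root.attributes.length)]
ns_decls = [a for a in attrs if a.name.startswith("xmlns")]
ordinary = [a for a in attrs if not a.name.startswith("xmlns")]
print(len(ns_decls), len(ordinary))

# One character information item per data character.
text = root.firstChild.data
print(len(text))
```

The third namespace information item mentioned on the slide is the implicit xml prefix, which has no declaration in the document and therefore no attribute in this tree.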

23 What is not in the information set? For instance: the document type name; the difference between the two forms of an empty element, <foo/> and <foo></foo>; the order of attributes within a start-tag; white space within start-tags (other than significant white space in attribute values) and end-tags; the difference between CR, CR-LF, and LF line termination.
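
One way to see that these differences carry no information is to canonicalize two lexically different documents and compare the results. The sketch below assumes Python 3.8 or later, where xml.etree.ElementTree.canonicalize is available; canonical XML is a separate specification from the Infoset and is used here only as a convenient comparison tool. The element and attribute names are invented.

```python
import xml.etree.ElementTree as ET

# Two spellings of the same hypothetical element: different empty-element
# form, different attribute order, extra white space inside the tags.
compact = '<item id="1" lang="en"/>'
verbose = '<item   lang="en"\n      id="1" ></item >'

canonical_compact = ET.canonicalize(compact)
canonical_verbose = ET.canonicalize(verbose)

# Both documents reduce to the same canonical form.
print(canonical_compact == canonical_verbose)
```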

24 Data model of XPath. Defines input for XPath, XSLT and XQuery: all values needed in the expressions of these specifications. Based on the XML Information Set, augmented by possible schema validation information -> input = Post-Schema-Validation Infoset (PSVI).

25 Data model of XPath. The data model also supports values that are not supported by the XML Information Set, e.g. well-formed document fragments, sequences of fragments, sequences of documents; atomic values (boolean, integer, ...), sequences of atomic values, and sequences mixing nodes and atomic values.

26 Tree model. A conceptual model: no particular implementation is assumed. A tree that contains nodes (7 types): document node, element nodes, attribute nodes, text nodes, namespace nodes, processing instruction nodes, comment nodes.

27 For each node: tree properties and content. "Tree properties": parent, children, attributes, namespaces. The content of a text, attribute, or element node can be interpreted in two ways: as a string value ("123") or as a typed value ((integer) 123).

28 Values of properties and content. Every value handled by the data model is a sequence of zero or more items. An item is either a node or an atomic value. A sequence cannot be a member of a sequence. A single item appearing on its own is modeled as a sequence containing one item.
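
The sequence rules on this slide can be mimicked with plain Python lists. This is a sketch only; make_sequence is a hypothetical helper invented here, not part of any XPath implementation.

```python
from typing import Any, List


def make_sequence(*items: Any) -> List[Any]:
    """Build a flat sequence: nested lists or tuples are spliced in, never nested."""
    flat: List[Any] = []
    for item in items:
        if isinstance(item, (list, tuple)):
            # A sequence cannot be a member of a sequence, so splice its items in.
            flat.extend(make_sequence(*item))
        else:
            flat.append(item)
    return flat


# A single item on its own is a sequence containing one item.
single = make_sequence(42)

# Combining sequences never produces nesting.
combined = make_sequence(1, [2, [3]], "a")

# The empty sequence has zero items.
empty = make_sequence()

print(single, combined, empty)
```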

29 Document order. Document order is defined on all the nodes in the document: the root node is the first node; element nodes occur in the order of their start tags; attribute nodes and namespace nodes come before the children of the element; namespace nodes come before attribute nodes.
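
A simplified traversal can illustrate this ordering. The sketch assumes Python's xml.etree.ElementTree, which does not expose the document node, namespace nodes or separate text nodes, so only elements and attributes are listed; document_order is a hypothetical helper and the sample document is invented.

```python
import xml.etree.ElementTree as ET


def document_order(elem):
    """Yield labels for an element, its attributes, then its descendants."""
    yield elem.tag
    # Attributes come after their element and before its children. The relative
    # order among attributes is not significant; dict order is used here.
    for name in elem.attrib:
        yield "@" + name
    for child in elem:
        yield from document_order(child)


root = ET.fromstring('<a x="1"><b y="2"><d/></b><c/></a>')
order = list(document_order(root))
print(order)
```

Element d precedes c because its start tag occurs earlier in the document, even though c is closer to the root.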

30 Document node. A tree whose root node is a document node is referred to as a document; otherwise the tree is a fragment. The element node for the document element is a child of the document node; other children: processing instruction nodes, comment nodes. String-value: concatenation of the string-values of all text node descendants of the document node in document order.

31 Element nodes. There is an element node for every element in the document. Children: element nodes (subelements), comment nodes, processing instruction nodes, text nodes (content). String-value: concatenation of the string-values of all text node descendants of the element node in document order.
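
The string-value of an element can be approximated in a few lines. This sketch assumes Python's xml.etree.ElementTree, whose itertext() yields the character data of an element and its descendants in document order; string_value is a hypothetical helper and the sample markup is invented.

```python
import xml.etree.ElementTree as ET


def string_value(elem):
    """Concatenate all descendant character data in document order."""
    return "".join(elem.itertext())


para = ET.fromstring("<p>Phone <em>home</em> now!</p>")

print(string_value(para))
print(string_value(para.find("em")))
```

The markup of the em element disappears from the string-value of p, but its character data stays in place.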

32 Attribute nodes. Each element node has an associated set of attribute nodes. The element node is the parent of each of these attribute nodes, but an attribute node is not a child of its parent element. A defaulted attribute is treated the same as a specified attribute.

33 Attribute nodes. If an attribute was declared for the element with the default #IMPLIED, but the attribute was not specified on the element, there is no attribute node for this attribute. String-value: the normalized value as specified by the XML specification.
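
The treatment of defaulted and #IMPLIED attributes can be observed with a small sketch. It assumes Python's xml.etree.ElementTree, which reads attribute declarations from the internal DTD subset; the element and attribute names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: "unit" has a default value, "note" is #IMPLIED,
# and only "weight" is written in the start tag.
SAMPLE = (
    '<!DOCTYPE item [\n'
    '  <!ATTLIST item unit CDATA "kg"\n'
    '                 note CDATA #IMPLIED>\n'
    ']>\n'
    '<item weight="12"><part/></item>'
)

item = ET.fromstring(SAMPLE)

# The defaulted attribute appears next to the specified one; the #IMPLIED
# attribute that was not specified is absent.
print(item.attrib)

# Attributes belong to the element but are not among its children.
children = [child.tag for child in item]
print(children)
```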

34 Namespace nodes. Each element has an associated set of namespace nodes: one for each distinct namespace prefix that is in scope for the element, and one for the default namespace if one is in scope for the element. The element is the parent of each of these namespace nodes, but a namespace node is not a child of its parent element. String-value: the namespace URI.
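
The set of namespaces in scope for each element can be reconstructed from parser events. The following is a rough sketch, assuming Python's xml.etree.ElementTree.iterparse with its start-ns and end-ns events; in_scope_namespaces is a hypothetical helper, the urn: URIs are invented, and undeclaring the default namespace with xmlns="" is not handled.

```python
import io
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"


def in_scope_namespaces(xml_text):
    """Return (tag, {prefix: uri}) for each element, in document order."""
    bindings = [("xml", XML_NS)]  # the xml prefix is always in scope
    result = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    events = ("start-ns", "start", "end-ns")
    for event, payload in ET.iterparse(source, events=events):
        if event == "start-ns":
            bindings.append(payload)  # (prefix, uri); "" is the default namespace
        elif event == "end-ns":
            bindings.pop()  # the most recent declaration goes out of scope
        else:
            # Later bindings shadow earlier ones with the same prefix.
            result.append((payload.tag, dict(bindings)))
    return result


SAMPLE = '<a xmlns:p="urn:example:p"><b xmlns="urn:example:d"/><c/></a>'
scopes = in_scope_namespaces(SAMPLE)
for tag, namespaces in scopes:
    print(tag, sorted(namespaces))
```

Element b sees the default namespace it declares in addition to the inherited prefix p, while its sibling c does not.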

35 PI nodes, comment nodes. There is a processing instruction node for every processing instruction and a comment node for every comment, except for PIs and comments in document type declarations. String-value of a comment node: the content of the comment, not including the opening <!-- and closing --> delimiters.

36 Text nodes. Character data is grouped into text nodes; as much character data as possible is grouped into each text node. String-value: the character data. Characters inside comments, processing instructions and attribute values do not produce text nodes.
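
The grouping of character data can be illustrated with a DOM tree. This sketch assumes Python's xml.dom.minidom, which happens to merge adjacent character data into one text node while parsing; the DOM in general does not guarantee this, and the sample document is invented.

```python
from xml.dom import Node, minidom

# Hypothetical element with an attribute, an entity reference, a comment
# and a processing instruction mixed into its content.
SAMPLE = (
    '<a title="not a text node">one &amp; two'
    '<!-- hidden -->three<?pi data?></a>'
)

doc = minidom.parseString(SAMPLE)
children = doc.documentElement.childNodes

# Only character data produces text nodes; the comment splits it in two.
texts = [node.data for node in children if node.nodeType == Node.TEXT_NODE]
print(texts)

# The comment and the PI are separate nodes between and after the text.
types = [node.nodeType for node in children]
print(types)
```

The expanded ampersand stays inside the first text node, so the character data on either side of it is not split.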

37 Example the specification contains an example (the same we had in the lecture): datamodel/#d0e datamodel/#d0e3694 See e.g. the string values of different nodes. There’s also a picture in the end. Don’t worry if you don’t understand everything!

38 Data model issues. We have seen: the general specification, XML Information Set; one specification that is based on the Information Set: the XPath and XQuery data model.