XML eXtensible Markup Language Part 2 (2005)

XML Entities

XML Entities should not be Confused with Entities in the Sense of the ER Model
An entity is a short name that denotes more complex information, which may reside inside or outside the XML document or its DTD.
– Entities save typing.
– Entities make changes easy when the same change would otherwise have to be made in many places.
– Sometimes entities must be used to avoid XML syntax violations (e.g., &lt; for a literal < character).
– Applications should decode and encode entities using their definitions.

General Entities
A general entity is declared in the DTD, e.g., <!ENTITY Name "replacement text">, and it is used in the document by writing &Name;
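A minimal sketch of general-entity expansion with Python's standard `xml.etree.ElementTree`. The document, the entity name `studio` and its replacement text are hypothetical. The underlying expat parser substitutes the declared text for every `&studio;` reference, so the text is typed once and can be changed in one place.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the entity "studio" and the movie data are made up.
DOC = """<?xml version="1.0"?>
<!DOCTYPE mdb [
  <!ENTITY studio "Example Studios Inc.">
]>
<mdb>
  <movie><title>First Film</title><producer>&studio;</producer></movie>
  <movie><title>Second Film</title><producer>&studio;</producer></movie>
</mdb>"""


def producers(xml_text: str) -> list:
    """Parse the document and return each movie's producer text.

    The parser replaces every &studio; reference with the replacement
    text declared in the internal DTD subset.
    """
    root = ET.fromstring(xml_text)
    return [movie.findtext("producer") for movie in root.iter("movie")]


print(producers(DOC))
```

Editing the single `<!ENTITY studio …>` declaration changes the text at every place the entity is referenced.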

Example <!DOCTYPE mdb [ ]> Oh God! Woody Allen $2M

Browser View

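Parameter Entities
Parameter entities are used only within DTDs; they are referenced as %Name;
– Internal entities have their replacement text given within the DTD itself.
– External entities draw their replacement text from outside files.
Parameter entity declaration: <!ENTITY % Name "replacement text"> (internal) or <!ENTITY % Name SYSTEM "URI"> (external)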

An Example of a Parameter Entity
<!ATTLIST person
    friend (yes | no) #IMPLIED
    id     ID         #REQUIRED
    knows  IDREFS     #IMPLIED>
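Parameter-entity references are expanded textually inside the DTD before the declarations are interpreted. The toy expander below is a sketch, not a real DTD parser: it handles only internal declarations of the form `<!ENTITY % name "value">` (no nested references) and, as the XML 1.0 specification requires for references outside literals, pads each replacement with one space on each side. The entity `yesno` is hypothetical; note that a reference inside a markup declaration, as here, is allowed only in an external DTD, not in the internal subset.

```python
import re

# Toy patterns: internal parameter-entity declarations with double-quoted
# values, and %name; references.
PE_DECL = re.compile(r'<!ENTITY\s+%\s+([\w.-]+)\s+"([^"]*)"\s*>')
PE_REF = re.compile(r"%([\w.-]+);")


def expand_parameter_entities(dtd: str) -> str:
    """Collect the declarations, drop them, and substitute each reference."""
    entities = dict(PE_DECL.findall(dtd))
    body = PE_DECL.sub("", dtd)
    # A reference outside a literal becomes " value " (one space on each side).
    return PE_REF.sub(lambda m: " " + entities[m.group(1)] + " ", body)


# Hypothetical external-DTD fragment that reuses one enumerated type.
DTD = """<!ENTITY % yesno "(yes | no)">
<!ATTLIST person friend %yesno; #IMPLIED>"""

print(expand_parameter_entities(DTD).strip())
```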

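Unparsed Entities
<!DOCTYPE mdb [
  <!ATTLIST movie
      id        ID     #REQUIRED
      opinion   CDATA  #IMPLIED
      starimage ENTITY #IMPLIED>
]>
Entities are defined. Types are defined.
An unparsed entity refers to non-XML data, such as an image. It is declared with an NDATA notation name, and it can be referenced only by name, as the value of an attribute of type ENTITY or ENTITIES (here, starimage).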

Data Oh God! Woody Allen $2M
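To see how a parser tells the kinds of entities apart, the sketch below uses the `EntityDeclHandler` callback of Python's low-level expat binding (`xml.parsers.expat`), which reports every entity declaration. Parameter entities are flagged explicitly, unparsed entities carry a notation name (their `NDATA` part), internal entities have a value, and external parsed entities have only a system identifier. All names and file paths in the DTD are hypothetical.

```python
import xml.parsers.expat

# Hypothetical DTD with one entity of each kind.
DOC = """<?xml version="1.0"?>
<!DOCTYPE mdb [
  <!NOTATION gif SYSTEM "image/gif">
  <!ENTITY % yesno "(yes | no)">
  <!ENTITY studio "Example Studios Inc.">
  <!ENTITY legal SYSTEM "legal.xml">
  <!ENTITY star1 SYSTEM "star1.gif" NDATA gif>
  <!ATTLIST movie starimage ENTITY #IMPLIED>
]>
<mdb><movie starimage="star1"/></mdb>"""


def classify_entities(xml_text: str) -> dict:
    """Map each declared entity name to its kind."""
    kinds = {}

    def on_entity(name, is_param, value, base, system_id, public_id, notation):
        if is_param:
            kinds[name] = "parameter"
        elif notation is not None:
            kinds[name] = "unparsed"  # declared with NDATA
        elif value is not None:
            kinds[name] = "internal general"
        else:
            kinds[name] = "external general"

    parser = xml.parsers.expat.ParserCreate()
    parser.EntityDeclHandler = on_entity
    parser.Parse(xml_text, True)
    return kinds


print(classify_entities(DOC))
```

Expat is non-validating, so it does not check that `starimage` names a declared unparsed entity; that is left to a validating parser or to the application.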

Defining Entities
Entities can be defined
– in the local document, as part of the DOCTYPE definition
– with a link to external files that contain the entity data (this, too, is done through the DOCTYPE definition)
– in an external DTD
Define an entity locally when it is used in only one particular document; define it by a link to an external file when it is used in many documents.

Defining Entities – An Example
Local definition:
<!DOCTYPE [
  <!ENTITY copyright "Copyright 2000, As The World Spins Corp. All rights reserved. Please do not copy or use without authorization. For authorization contact …">
]>
Global definition:
<!DOCTYPE [
  <!ENTITY copyright SYSTEM "…">
]>
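A global definition only names a file; the application decides how to fetch it. With Python's `xml.parsers.expat`, an `ExternalEntityRefHandler` receives the system identifier and can parse the entity's text with a child parser created by `ExternalEntityParserCreate`. In this sketch the "file" is an in-memory dictionary, and the file name `copyright.xml` and the root element `release` are hypothetical.

```python
import xml.parsers.expat

# Hypothetical shared "files", keyed by system identifier; a real application
# would read the file named in the SYSTEM literal from disk or the network.
SHARED_FILES = {
    "copyright.xml": "Copyright 2000, As The World Spins Corp. All rights reserved.",
}

DOC = """<?xml version="1.0"?>
<!DOCTYPE release [
  <!ENTITY copyright SYSTEM "copyright.xml">
]>
<release>&copyright;</release>"""


def release_text(xml_text: str) -> str:
    """Return the document's character data, external entity included."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = chunks.append

    def on_external_ref(context, base, system_id, public_id):
        # A child parser continues in the parent's context for the entity text.
        child = parser.ExternalEntityParserCreate(context)
        child.CharacterDataHandler = chunks.append
        child.Parse(SHARED_FILES[system_id], True)
        return 1  # nonzero tells expat the reference was handled

    parser.ExternalEntityRefHandler = on_external_ref
    parser.Parse(xml_text, True)
    return "".join(chunks)


print(release_text(DOC))
```

Because every document that declares `copyright` with the same SYSTEM identifier gets its text from the one shared file, a change to that file reaches all of them.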

Another Example
<!DOCTYPE [
  <!ENTITY copyright "Copyright 2000, As The World Spins Corp. All rights reserved. Please do not copy or use without authorization. For authorization contact …">
]>

Example (cont’d) Mini-globe revolutionizes keychain industry Today As The World Spins introduces a new approach to key chains. With the new MINI-GLOBE keys can be kept inside a chain, called for upon demand, and stored safely. Never more will consumers lose a key or stand at a door flipping through a stack of keys seeking the right one. &trademark;&copyright;

XML Namespaces

XML Namespaces
When an element name appears in two different XML documents, we would like to know whether it has the same meaning in both documents.
– Is the tag used as the XHTML tag of that name in both documents?
– If two documents about books use the same tag, does it mean that they use the same system for cataloging books?

What XML Namespaces are and What They are not
Namespaces merely provide a mechanism for creating unique names (for elements and attributes) that can be used in XML documents all over the Web.
– A namespace is just a collection of names that were created for a specific domain of applications.
Namespaces are not DTDs, and they do not provide a mechanism for validating XML documents against multiple DTDs.

Identifying an XML Namespace
A namespace is identified by a URI.
The URI does not have to point to anything.
– It is merely used as a mechanism for creating unique names.
An element or attribute name from a namespace has two parts, prefix:name
– prefix identifies the namespace
– name is just a name from the namespace

Namespaces are not Part of the XML 1.0 Recommendation
When an XML 1.0 parser sees a qualified name prefix:name, it treats this name just as it would treat any other element or attribute name (the character ":" is legal in element and attribute names).
Namespaces must therefore be hardwired into DTDs: a DTD has to declare the qualified names, prefixes included.
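The difference between plain XML 1.0 parsing and namespace processing can be seen with Python's expat binding. Without a `namespace_separator`, the start-tag handler receives the qualified name `foo:person` as an ordinary name that happens to contain a colon; with a separator, expat applies the namespace layer and reports the namespace URI and the local name instead. The prefix `foo` and the URI `http://example.org/people` are hypothetical.

```python
import xml.parsers.expat

DOC = ('<foo:person xmlns:foo="http://example.org/people">'
       '<foo:name>John Doe</foo:name></foo:person>')


def element_names(xml_text: str, namespace_aware: bool) -> list:
    """Collect element names as reported by expat's StartElementHandler."""
    names = []
    if namespace_aware:
        # URI and local name are joined with the separator character.
        parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")
    else:
        parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(xml_text, True)
    return names


print(element_names(DOC, namespace_aware=False))  # XML 1.0 view
print(element_names(DOC, namespace_aware=True))   # namespace-aware view
```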

But
When an application sees a qualified name, it may recognize it and act accordingly.
– A browser identifies tags that belong to the XHTML namespace and processes them.
– An XSLT processor identifies tags and attributes that belong to the XSLT namespace and executes them.

The W3C Recommendation for Namespaces in XML
The two-part naming system is the only thing defined in the W3C Namespace recommendation
– and even that is not so short!
This recommendation is just a collection of syntactic rules.
– Some rules are rather subtle.

Declaring a Namespace
An XML namespace is declared in an xmlns attribute: xmlns:foo="URI" binds the prefix foo to the namespace URI.
Example: XML Namespaces John Doe
Using foo as the prefix is more convenient than writing out the URI in every name.

The Default Namespace
The default namespace is declared without a prefix: xmlns="URI".
Example: XML Namespaces John Doe
All the unprefixed elements in its scope belong to the default namespace.

Technically
The namespace mechanism is just a mapping from prefixes to URIs: each qualified name prefix:name is, in effect, replaced with the pair (URI bound to prefix, name).
This is done in a processing layer that operates on the element tree resulting from XML 1.0 parsing.
It creates unique names.
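Python's `xml.etree.ElementTree` performs this mapping during parsing and reports each element under its expanded name `{URI}local-name`. The sketch below (hypothetical URI and element names) parses the same data once with a prefix and once with a default namespace; after the mapping, both trees have identical element names, which shows that the prefix itself carries no meaning.

```python
import xml.etree.ElementTree as ET

URI = "http://example.org/books"  # hypothetical namespace URI

WITH_PREFIX = (f'<foo:book xmlns:foo="{URI}"><foo:title>XML Namespaces</foo:title>'
               f'<foo:author>John Doe</foo:author></foo:book>')
WITH_DEFAULT = (f'<book xmlns="{URI}"><title>XML Namespaces</title>'
                f'<author>John Doe</author></book>')


def expanded_tags(xml_text: str) -> list:
    """Element names after the prefix-to-URI mapping, in document order."""
    return [el.tag for el in ET.fromstring(xml_text).iter()]


print(expanded_tags(WITH_PREFIX))
print(expanded_tags(WITH_DEFAULT))
```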

DTDs as Namespaces
The URI of a namespace may point to a DTD.
A DTD defines a namespace comprising all its element names and attribute names.
– But when a DTD's URI is used in a namespace declaration, it serves only as a namespace, not as a DTD!

Example
xmlns:bib="…" xmlns:isbn="…"
Proceedings of SIGMOD
This document is invalid according to either DTD! But the document is well formed (e.g., in the book element, attribute names are unique).

Alternatively, One Namespace can be Declared as the Default
xmlns="…" xmlns:isbn="…"
Proceedings of SIGMOD
This document is well formed but invalid according to either DTD!

Scope of Namespaces
The scope of a namespace declaration is the element containing the declaration and all its descendant elements.
– The prefix must be used for names from that namespace anywhere in the scope.
Only the default namespace can be undeclared (with xmlns=""); in Namespaces in XML 1.0 a prefix cannot be undeclared, although an inner element may rebind it to another URI.
More than one namespace can be declared in the same scope.
– At most one can be the default namespace.
– All others must have unique prefixes.
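A small sketch of scoping with `xml.etree.ElementTree`, using hypothetical `urn:example:` URIs: the default namespace declared on the root covers its descendants, an inner `xmlns="…"` overrides it for that subtree only, and `xmlns=""` undeclares it, so unprefixed names below that point are in no namespace.

```python
import xml.etree.ElementTree as ET

DOC = """<library xmlns="urn:example:lib">
  <book>
    <note xmlns="urn:example:notes"><text>inner default</text></note>
    <raw xmlns=""><text>no namespace</text></raw>
  </book>
</library>"""

root = ET.fromstring(DOC)
# Each tag is printed as {URI}name, or as a bare name when no namespace applies.
for el in root.iter():
    print(el.tag)
```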

What about Attributes?
Recall that element names and attribute names must be qualified if they belong to a nondefault namespace.
Unqualified element names belong to the default namespace (if they are inside its scope).
However, an unqualified attribute does not belong to the default namespace; it is in no namespace.
An unqualified attribute is interpreted according to the rules that apply to its element.
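With `xml.etree.ElementTree` this rule is visible in the attribute dictionary: an unprefixed attribute keeps its plain name even inside a default namespace, while a prefixed attribute gets an expanded `{URI}name`. The URIs, the `year` attribute and the ISBN value below are hypothetical stand-ins, not the slides' markup.

```python
import xml.etree.ElementTree as ET

DOC = """<book xmlns="urn:example:bib" xmlns:isbn="urn:example:isbn"
      year="1999" isbn:number="0-00-000000-0">
  <title>Proceedings of SIGMOD</title>
</book>"""

book = ET.fromstring(DOC)
print(book.tag)             # the unprefixed element is in the default namespace
print(sorted(book.attrib))  # only the prefixed attribute is namespace-qualified
```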

Namespaces and DTDs: The Problem
DTD syntax does not support namespaces.
The previous example showed an XML document with two DTDs that were used as namespaces.
– It is impossible to declare constraints that specify where fragments from each namespace can occur.

Namespaces and DTDs: The Solutions
Use a namespace-aware schema language, or
modify one of the two DTDs so that it becomes a DTD for the new document.
– Two alternatives, as illustrated on the next two slides, using the previous example

One Alternative
Add the required new elements to the DTD.
Give these elements the appropriate unique names by using parameter entities.

The Second Alternative
Add the required new elements to the DTD, using qualified names.
Use the attribute-list declaration of each new element to declare its namespace (the xmlns attribute) as a #FIXED value.
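A sketch of the second alternative, with hypothetical element names and URI: the DTD declares the qualified name `isbn:number` and supplies `xmlns:isbn` as a `#FIXED` attribute, so the document itself never writes the namespace declaration. Expat, the non-validating parser behind Python's `xml.etree.ElementTree`, still applies attribute defaults from the internal DTD subset, so the element ends up in the intended namespace.

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0"?>
<!DOCTYPE book [
  <!ELEMENT book (title, isbn:number)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT isbn:number (#PCDATA)>
  <!ATTLIST isbn:number xmlns:isbn CDATA #FIXED "urn:example:isbn">
]>
<book>
  <title>Proceedings of SIGMOD</title>
  <isbn:number>0-00-000000-0</isbn:number>
</book>"""

book = ET.fromstring(DOC)
# The prefix is bound by the defaulted xmlns:isbn attribute from the DTD.
number = book.find("{urn:example:isbn}number")
print(number.tag, number.text)
```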

Data Exchange and Data Representation in XML

Exchanging Relational Data
Each tuple can be wrapped inside an element.
See the example on the following slides.

Two Ways of Wrapping Relations in XML Documents
projects: title, budget, managedBy
employees: name, ssn, age

The Project and Employee Relations in XML
Pattern recognition Joe Joe Sandra Auto guided vehicle Sandra :
Projects and employees are intermixed.

Pattern recognition Joe Auto guided vehicles Sandra : Joe Sandra :
Employees follow projects (Projects, then Employees).

Pattern recognition Joe Auto guided vehicles Sandra : Joe Sandra :
Or without "separator" tags …
This can be done if it is clear where each employee and each project starts.
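A minimal sketch of the "employees follow projects" layout, built with Python's `xml.etree.ElementTree`: each tuple becomes one element and each column becomes a subelement. The element names `db`, `project` and `employee` and all budget, SSN and age values are hypothetical; the relation and column names, project titles and employee names come from the slides.

```python
import xml.etree.ElementTree as ET

# Tuples of projects(title, budget, managedBy) and employees(name, ssn, age);
# budgets, SSNs and ages are made-up values.
PROJECTS = [("Pattern recognition", "10000", "Joe"),
            ("Auto guided vehicles", "20000", "Sandra")]
EMPLOYEES = [("Joe", "111-11-1111", "30"),
             ("Sandra", "222-22-2222", "35")]


def wrap_relation(parent, relation_tag, tuple_tag, columns, rows):
    """Append one element per tuple, with one subelement per column."""
    relation = ET.SubElement(parent, relation_tag)  # the "separator" tag
    for row in rows:
        tup = ET.SubElement(relation, tuple_tag)
        for column, value in zip(columns, row):
            ET.SubElement(tup, column).text = value
    return relation


db = ET.Element("db")
wrap_relation(db, "projects", "project", ("title", "budget", "managedBy"), PROJECTS)
wrap_relation(db, "employees", "employee", ("name", "ssn", "age"), EMPLOYEES)
print(ET.tostring(db, encoding="unicode"))
```

Dropping the `projects` and `employees` wrappers gives the variant without separator tags; it stays unambiguous only because each `project` or `employee` start tag marks where a tuple begins.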

DTDs for the First Two Documents <!DOCTYPE db [... ]> <!DOCTYPE db [... ]>

Wrapping Relations is not a Good Design Strategy
When designing XML documents from ER diagrams:
– ER entities are described by XML elements.
– ER attributes can be described either by XML attributes or by subelements.
– How to represent ER relationships? By using the built-in relationship in XML between elements and subelements. But this is not always possible, so ID references might have to be used.

How to use XML Attributes
XML attributes describe properties of the contents, rather than the contents themselves.
Example: cheese fromage branza A food made …

Attributes (cont’d) Another common use for attributes is to express dimensions or types

Jeff Cohen Irma Levy Using Attributes

It is not Always Clear When to Use Attributes L. Simpson L. Simpson
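The two copies of "L. Simpson" suggest the same fact encoded once as an attribute and once as a subelement. The sketch below assumes hypothetical markup (`person`, `name`, `id`) for both encodings and reads the value either way with `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Two hypothetical encodings of the same fact.
AS_ATTRIBUTE = '<person id="p1" name="L. Simpson"/>'
AS_SUBELEMENT = '<person id="p1"><name>L. Simpson</name></person>'


def person_name(xml_text: str) -> str:
    """Read the name whichever way it was encoded."""
    person = ET.fromstring(xml_text)
    # An attribute value is a single flat string; a subelement could later
    # gain structure (e.g., first and last name) or be repeated.
    return person.get("name") or person.findtext("name")


print(person_name(AS_ATTRIBUTE), "|", person_name(AS_SUBELEMENT))
```

Either choice works for a single atomic value; the subelement form is the one that can grow, because attributes cannot hold multiple values or nested structure.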

Using IDs Jeff Cohen Irma Levy ID attributes

How to Represent Relationships
Two related ER entities, e.g., employees and departments, can be represented as follows: a department is an element, and the employees are subelements of the department.
The relationship must be many-to-one or one-to-one.
– The subelements are the “many” side.
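A sketch of the element-subelement representation with `xml.etree.ElementTree`; the company, department names and assignments are hypothetical. Because each employee appears inside exactly one department, the nesting itself records the many-to-one relationship.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: every employee is a subelement of exactly one department.
DOC = """<company>
  <department name="Research">
    <employee>Joe</employee>
    <employee>Sandra</employee>
  </department>
  <department name="Sales">
    <employee>Jeff Cohen</employee>
  </department>
</company>"""

root = ET.fromstring(DOC)


def employees_of(dept_name):
    """From the "one" side to the "many" side: read the subelements."""
    dept = root.find(f"department[@name='{dept_name}']")
    return [e.text for e in dept.findall("employee")]


def department_of(employee_name):
    """From the "many" side to the "one" side: ElementTree elements have
    no parent pointer, so find the department that contains the employee."""
    for dept in root.findall("department"):
        if any(e.text == employee_name for e in dept.findall("employee")):
            return dept.get("name")
    return None


print(employees_of("Research"), department_of("Jeff Cohen"))
```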

No Multiple Copies of the Same Element (to Avoid Redundancies)
The following cannot be represented in this way:
– a many-to-many relationship
– a relationship among more than two entities
– a binary relationship between an entity and itself, or between two entities that are related by an ISA relationship
ID references must be used in the above cases.
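A sketch of a many-to-many relationship through an IDREFS-style attribute, using hypothetical data: employees may work on several projects and projects may have several employees, so nesting would force copies. ElementTree does not read ID/IDREFS types from a DTD, so the sketch assumes the application knows that `id` is the ID attribute and builds its own index.

```python
import xml.etree.ElementTree as ET

# Hypothetical data; "worksOn" holds whitespace-separated project IDs (IDREFS).
DOC = """<db>
  <project id="p1"><title>Pattern recognition</title></project>
  <project id="p2"><title>Auto guided vehicles</title></project>
  <employee id="e1" worksOn="p1 p2"><name>Joe</name></employee>
  <employee id="e2" worksOn="p2"><name>Sandra</name></employee>
</db>"""

root = ET.fromstring(DOC)
# Application-built ID index: element for every "id" attribute value.
by_id = {el.get("id"): el for el in root.iter() if el.get("id") is not None}


def projects_of(emp_id):
    """Follow the IDREFS from an employee to its projects."""
    refs = by_id[emp_id].get("worksOn", "").split()
    return [by_id[ref].findtext("title") for ref in refs]


def employees_on(proj_id):
    """The other direction: scan the employees that reference the project."""
    return [e.findtext("name") for e in root.iter("employee")
            if proj_id in e.get("worksOn", "").split()]


print(projects_of("e1"), employees_on("p2"))
```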

More Problematic Cases
If there are several many-to-one relationships between two ER entities, then only one of them can be represented as an element-subelement relationship.
For example, employees can be subelements of their department.
But the relationship between a department and its manager (who is one of the employees) must be represented by an IDREF.

Missing Information is another Problem
If there could be an employee without a department, then employees cannot be represented as subelements of departments.
– ID references (IDREFs) have to be used.

Inverse Relationships
XML does not have built-in inverse relationships.
IDREFs must be used to represent inverse relationships.
For example, add an IDREF attribute to each employee element for denoting the department of the employee.
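The last three cases can be sketched together with a flat, hypothetical layout in which departments and employees are siblings: `manager` on a department is an IDREF for the second many-to-one relationship, `dept` on an employee is the IDREF for the inverse direction, and an employee without a department simply omits it. As before, the `id` index is built by the application, since ElementTree does not read attribute types from a DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat layout: relationships are carried only by IDREF attributes.
DOC = """<db>
  <department id="d1" manager="e1"><name>Research</name></department>
  <employee id="e1" dept="d1"><name>Joe</name></employee>
  <employee id="e2" dept="d1"><name>Sandra</name></employee>
  <employee id="e3"><name>Irma Levy</name></employee>
</db>"""

root = ET.fromstring(DOC)
by_id = {el.get("id"): el for el in root.iter() if el.get("id") is not None}


def manager_name(dept_id):
    """Department to manager: a second many-to-one link, via IDREF."""
    return by_id[by_id[dept_id].get("manager")].findtext("name")


def department_name(emp_id):
    """Employee to department; None when the IDREF is absent."""
    ref = by_id[emp_id].get("dept")
    return by_id[ref].findtext("name") if ref else None


def staff(dept_id):
    """Department to employees: no built-in inverse, so scan the IDREFs."""
    return [e.findtext("name") for e in root.iter("employee")
            if e.get("dept") == dept_id]


print(manager_name("d1"), department_name("e3"), staff("d1"))
```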

XML Schemas W3Schools on XML Schemas

XML Schemas
The W3C XML Schema language is also known as the language for XML Schema Definition (XSD).
There are other proposals for XML schema languages (e.g., RELAX NG).

XSDs have Types
XSDs use complex types that generalize the content models of DTDs (i.e., the regular expressions that describe elements).
Many simple types, e.g., xs:string, xs:integer
– generalize PCDATA and CDATA
Many facets of simple types, e.g., length, maxInclusive, maxExclusive
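The facets named above restrict the values of a simple type. The Python standard library has no XSD validator, so the toy checker below only illustrates what three facets mean; the function and its dict-based interface are hypothetical, while the facet names and their inclusive/exclusive semantics follow XSD.

```python
from decimal import Decimal

# Corresponding XSD restrictions would look like, e.g.:
#   <xs:restriction base="xs:string"><xs:length value="10"/></xs:restriction>
#   <xs:restriction base="xs:integer"><xs:maxExclusive value="100"/></xs:restriction>


def satisfies_facets(lexical: str, facets: dict) -> bool:
    """Check a value's lexical form against length / maxInclusive / maxExclusive."""
    if "length" in facets and len(lexical) != facets["length"]:
        return False  # length: exact number of characters (for strings)
    if "maxInclusive" in facets and Decimal(lexical) > Decimal(facets["maxInclusive"]):
        return False  # the bound itself is allowed
    if "maxExclusive" in facets and Decimal(lexical) >= Decimal(facets["maxExclusive"]):
        return False  # the bound itself is not allowed
    return True


print(satisfies_facets("100", {"maxInclusive": "100"}),
      satisfies_facets("100", {"maxExclusive": "100"}))
```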

xs:sequence and xs:all
An XSD can specify that subelements should appear in a specific order (xs:sequence) or in any order (xs:all).
– But xs:all is not as general as xs:sequence (in XSD 1.0, each element in an xs:all group may occur at most once).
The number of occurrences of subelements can be restricted with minOccurs and maxOccurs, e.g., a department can have between 10 and 100 employees.
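Since content models are regular expressions over child-element names, they can be sketched with Python's `re` module. The two models below are hypothetical: an xs:sequence of `name` followed by 10 to 100 `employee` elements (minOccurs="10" maxOccurs="100"), and an xs:all of `name` and `budget`, each required exactly once in either order. This only checks child order and counts; it is not an XSD validator.

```python
import re
import xml.etree.ElementTree as ET

# Each child name is written followed by one space.
DEPARTMENT = r"name (employee ){10,100}"  # xs:sequence with occurrence bounds
PROJECT = r"name budget |budget name "    # xs:all: both elements, any order


def children_match(xml_text: str, model: str) -> bool:
    """True if the root's child elements fit the content model."""
    names = "".join(child.tag + " " for child in ET.fromstring(xml_text))
    return re.fullmatch(model, names) is not None


dept = "<department><name/>" + "<employee/>" * 12 + "</department>"
print(children_match(dept, DEPARTMENT))
```

Writing xs:all as a regular expression means listing every order, which grows factorially with the number of elements; xs:all states the same constraint directly.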

References
A declaration can refer to a globally declared element or attribute, e.g., <xs:element ref="person"/> is a reference to “person”, where “person” is the name of an element.

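More Features
Mixed content can be defined more generally than in DTDs (a DTD allows only the form (#PCDATA | a | b)*, with no control over the order or number of child elements).
Local and global definitions of elements and types
Derived types, by restriction or extension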

XSDs and Namespaces
XSDs recognize namespaces.
It is easier (than with DTDs) to check the validity of a document with respect to multiple schemas.
– This is a very important feature when collecting information from multiple heterogeneous sources.
– XSDs are more extensible than DTDs.

Summary of XML
XML is a new data format, and its main virtues are:
– widespread acceptance
– the (important) ability to handle semistructured data (data without a schema)
DTDs provide some useful syntactic constraints on documents, but as schemas they are weak.
How to store large XML documents? How to query them? How to map between XML and other representations?