XML eXtensible Markup Language Part 2
XML Entities
XML Entities should not be Confused with Entities in the Sense of the ER Model
An entity is a short string that denotes more complex information, which may reside inside or outside the XML document or its DTD
Entities save typing
Entities facilitate easy changes (when the same change is likely to be made in many places)
Sometimes entities must be used to avoid XML syntax violations (for example, &lt; for a literal < in character data)
Applications should decode and encode entities, using their definitions
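The encoding and decoding step can be sketched with the helpers in Python's standard library. This is a minimal illustration that covers only the predefined entities for &, < and >; the sample string is made up.

```python
from xml.sax.saxutils import escape, unescape

# Hypothetical text containing characters that are illegal in XML character data.
raw = "if a < b && b > c"

# escape() replaces '&', '<' and '>' with the predefined entities.
encoded = escape(raw)

# unescape() reverses the replacement.
decoded = unescape(encoded)

print(encoded)
print(decoded == raw)
```

Entities declared in a DTD are not handled by these helpers; a parser expands those from their declarations.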
General entities
A general entity is defined in the DTD and is used in the document by writing &Name;
Example <!DOCTYPE mdb [ ]> Oh God! Woody Allen $2M
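The markup of the slide example did not survive, so the following is only a sketch of how a general entity might be declared and used. The entity name WA and the element names movie, title, director and budget are assumptions; only the text values come from the slide.

```python
import xml.etree.ElementTree as ET

# Entity name "WA" and all element names are hypothetical.
doc = """<!DOCTYPE mdb [
  <!ENTITY WA "Woody Allen">
]>
<mdb>
  <movie>
    <title>Oh God!</title>
    <director>&WA;</director>
    <budget>$2M</budget>
  </movie>
</mdb>"""

# The parser expands &WA; from its declaration in the internal DTD subset.
root = ET.fromstring(doc)
director = root.find("movie/director").text
print(director)
```

Changing the replacement text in the single ENTITY declaration changes every place where &WA; is used.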
Browser View
Parameter Entities
Parameter entities are used only within DTDs
Internal entities are references within the DTD
External entities are references that draw information from outside files
Parameter Entity declaration: <!ENTITY % name "replacement text">
An Example of a Parameter Entity
<!ATTLIST person
    friend (yes | no) #IMPLIED
    id     ID         #REQUIRED
    knows  IDREFS     #IMPLIED>
Unparsed Entities
<!DOCTYPE mdb [
<!ATTLIST movie
    id        ID     #REQUIRED
    opinion   CDATA  #IMPLIED
    starimage ENTITY #IMPLIED>
]>
Entities are defined
Types are defined
Data Oh God! Woody Allen $2M
Defining Entities
Entities can be defined
–in the local document as part of the DOCTYPE definition
–with a link to external files that contain the entity data (this, too, is done through the DOCTYPE definition)
–in an external DTD
Define locally when the entity is being used only in one particular document
Define by a link to an external file when the entity is being used in many documents
Defining Entities – An Example
Local Definition:
<!DOCTYPE [
<!ENTITY copyright "Copyright 2000, As The World Spins Corp. All rights reserved. Please do not copy or use without authorization. For authorization contact
]>
Global Definition:
<!DOCTYPE [
<!ENTITY copyright SYSTEM " >
]>
Another Example
<!DOCTYPE [
<!ENTITY copyright "Copyright 2000, As The World Spins Corp. All rights reserved. Please do not copy or use without authorization. For authorization contact
]>
Example (cont’d)
Mini-globe revolutionizes keychain industry
Today As The World Spins introduces a new approach to key chains. With the new MINI-GLOBE, keys can be kept inside a chain, called for upon demand, and stored safely. Never more will consumers lose a key or stand at a door flipping through a stack of keys seeking the right one.
&trademark;&copyright;
XML Namespaces
XML Namespaces
When an element name appears in two different XML documents, we would like to know that it has the same meaning in both documents
–Is a tag that has the name of an XHTML tag used as that XHTML tag in both documents?
–If two documents about books share a tag, does it mean that they use the same system for cataloging books?
What XML Namespaces are and What They are not
Namespaces merely provide a mechanism for creating unique names (for elements and attributes) that can be used in XML documents all over the Web
–A namespace is just a collection of names that were created for a specific domain of applications
Namespaces are not DTDs and they do not provide a mechanism for validation of XML documents using multiple DTDs
Identifying an XML Namespace
A namespace is identified by a URI
The URI does not have to point to anything
–It is merely used as a mechanism for creating unique names
An element or attribute name from a namespace has two parts, written prefix:name
–prefix identifies the namespace
–name is just a name from the namespace
Namespaces are not Part of the XML 1.0 Recommendation
When an XML 1.0 parser sees a qualified name prefix:name, the parser treats this name just as it would treat any other attribute or element name (it is legal to use the character “:” in element and attribute names)
Namespaces must be hardwired into DTDs
But
When an application sees a qualified name, it may recognize it and act accordingly
–A browser identifies tags that belong to the XHTML namespace and processes them
–An XSLT processor identifies tags and attributes that belong to the XSLT namespace and executes them
The W3C Recommendation for Namespaces in XML
The two-part naming system is the only thing defined in the W3C Namespace recommendation
–and even that is not so short!
This recommendation is just a collection of syntactic rules
–Some rules are rather subtle
Declaring a Namespace
An XML namespace is declared in an xmlns attribute (written xmlns:prefix="URI")
XML Namespaces John Doe
Using foo as the prefix, instead of using the URI, is more convenient
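A namespace declaration with the prefix foo might look like the sketch below. The URI http://example.org/foo and the element names book, title and author are assumptions, since the slide's markup was lost; only the prefix foo and the text values come from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical URI and element names; the prefix "foo" is the one named on the slide.
doc = """<foo:book xmlns:foo="http://example.org/foo">
  <foo:title>XML Namespaces</foo:title>
  <foo:author>John Doe</foo:author>
</foo:book>"""

root = ET.fromstring(doc)

# ElementTree reports each name as {URI}local-name; the prefix itself is gone.
print(root.tag)

# A query may bind any prefix to the same URI; "f" here is arbitrary.
title = root.find("f:title", {"f": "http://example.org/foo"}).text
print(title)
```

The query succeeds with a different prefix because only the URI identifies the namespace.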
The Default Namespace
The default namespace is declared without a prefix
XML Namespaces John Doe
All the elements belong to the default namespace
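As a sketch of the default namespace, the following compares a document that uses xmlns without a prefix to an equivalent prefixed document. The URI and the element names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

URI = "http://example.org/foo"  # hypothetical namespace URI

# Default namespace: no prefix on the declaration or on the element names.
default_form = f"""<book xmlns="{URI}">
  <title>XML Namespaces</title>
  <author>John Doe</author>
</book>"""

# The same names written with an explicit prefix.
prefixed_form = f"""<foo:book xmlns:foo="{URI}">
  <foo:title>XML Namespaces</foo:title>
  <foo:author>John Doe</foo:author>
</foo:book>"""

def expanded_names(text):
    """Return the {URI}local-name of every element, in document order."""
    return [el.tag for el in ET.fromstring(text).iter()]

default_names = expanded_names(default_form)
prefixed_names = expanded_names(prefixed_form)
print(default_names)
print(default_names == prefixed_names)
```

Both forms yield the same expanded names, so a namespace-aware application cannot tell them apart.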
Technically
The namespace mechanism is just a mapping from prefixes to URIs, e.g.,
–in a qualified name prefix:name, the prefix is replaced with the URI it is bound to
It is done in a processing layer that operates on the element tree resulting from XML 1.0 parsing
It creates unique names
DTDs as Namespaces
The URI of a namespace may point to a DTD
A DTD defines a namespace comprising all its element names and attribute names
–But it is just a namespace – not a DTD!
Example
xmlns:bib=“ xmlns:isbn=“
Proceedings of SIGMOD
This document is invalid according to either DTD!
But the document is well formed! (e.g., in the book element, attribute names are unique)
Alternatively, One Namespace can be Declared as the Default
xmlns=“ xmlns:isbn=“
Proceedings of SIGMOD
This document is well formed but invalid according to either DTD!
Scope of Namespaces
The scope of a namespace declaration is the element containing the declaration and all descendant elements
–Names from a prefixed namespace must be written with the prefix everywhere in the scope
A descendant element may redeclare the default namespace or rebind a prefix; in Namespaces in XML 1.0, only the default namespace can be undeclared (with xmlns="")
More than one namespace can be declared in the same scope
–At most one can be the default namespace
–All others must have unique prefixes
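The scoping rule can be observed with a small experiment. All URIs and element names below are hypothetical; the document declares a default namespace and a prefix p on the root, then overrides each of them on a nested element.

```python
import xml.etree.ElementTree as ET

# Hypothetical URIs and element names, chosen only to show scoping.
doc = """<a xmlns="http://example.org/outer" xmlns:p="http://example.org/p1">
  <b/>
  <c xmlns="http://example.org/inner">
    <d/>
  </c>
  <p:e xmlns:p="http://example.org/p2"/>
  <p:f/>
</a>"""

# Each declaration applies to its own element and that element's descendants.
tags = [el.tag for el in ET.fromstring(doc).iter()]
for tag in tags:
    print(tag)
```

Elements c and d fall in the inner default namespace, e uses the rebound prefix, and f, which lies outside the scope of the rebinding, still uses the original binding of p.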
What about Attributes?
Recall that element names and attribute names must be qualified if they belong to a nondefault namespace
Unqualified element names belong to the default namespace (if they are inside the scope)
However, an unqualified attribute does not belong to the default namespace
An unqualified attribute is processed according to the rules that apply to its element name
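The different treatment of attributes can be checked with a short sketch. The URIs and the attribute names year and number are assumptions; the title text is taken from the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical URIs and attribute names.
doc = """<book xmlns="http://example.org/bib"
      xmlns:isbn="http://example.org/isbn"
      year="1999" isbn:number="1234">
  <title>Proceedings of SIGMOD</title>
</book>"""

root = ET.fromstring(doc)

# The element name picks up the default namespace ...
print(root.tag)

# ... but the unqualified attribute "year" stays in no namespace,
# while the qualified attribute is expanded to {URI}number.
attribute_names = sorted(root.attrib)
print(attribute_names)
```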
Namespaces and DTDs: The Problem
DTD syntax does not support namespaces
The previous example showed an XML document with two DTDs that were used as namespaces
–It is impossible to declare constraints that specify where fragments from each namespace can occur
Namespaces and DTDs: The Solutions
Use a namespace-aware schema language, or
Modify one of the two DTDs so that it will be a DTD for the new document
–Two alternatives, as illustrated on the next two slides, using the previous example
One Alternative
Add the required new elements to the DTD
Give the appropriate unique names to these elements using parameter entities
The Second Alternative
Add the required new elements to the DTD, using qualified names
Use the attribute-list declaration for the new elements to declare the namespace as a fixed value
Data Exchange and Data Representation in XML
Exchanging Relational Data Each tuple can be wrapped inside an element See example on the following slides
Two Ways of Wrapping Relations in XML Documents
projects: title, budget, managedBy
employees: name, ssn, age
The Project and Employee Relations in XML
Pattern recognition Joe Joe Sandra Auto guided vehicle Sandra :
Projects and employees are intermixed
Pattern recognition Joe Auto guided vehicles Sandra : Joe Sandra : Employees follow projects Projects Employees
Or without “separator” tags …
Pattern recognition Joe Auto guided vehicles Sandra : Joe Sandra :
Can be done if it is clear where each employee and each project starts
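Wrapping each tuple in an element, with employees following projects, could be done along the following lines. This is a sketch, not the slides' exact markup: the element names db, projects, project, employees and employee are assumptions, and the budget, ssn and age values are invented because they did not survive in the slides.

```python
import xml.etree.ElementTree as ET

# Column names follow the two relations; budget, ssn and age values are made up.
projects = [
    {"title": "Pattern recognition", "budget": "10000", "managedBy": "Joe"},
    {"title": "Auto guided vehicle", "budget": "70000", "managedBy": "Sandra"},
]
employees = [
    {"name": "Joe", "ssn": "344556", "age": "34"},
    {"name": "Sandra", "ssn": "2234", "age": "35"},
]

def wrap(relation_name, tuple_name, rows):
    """Wrap every row in a tuple element and every column value in a subelement."""
    relation = ET.Element(relation_name)
    for row in rows:
        tup = ET.SubElement(relation, tuple_name)
        for column, value in row.items():
            ET.SubElement(tup, column).text = value
    return relation

db = ET.Element("db")
db.append(wrap("projects", "project", projects))
db.append(wrap("employees", "employee", employees))

xml_text = ET.tostring(db, encoding="unicode")
print(xml_text)
```

Dropping the projects and employees wrappers gives the variant without separator tags, which works as long as the tuple element names tell the two relations apart.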
DTDs for the First Two Documents <!DOCTYPE db [... ]> <!DOCTYPE db [... ]>
Wrapping Relations is not a Good Design Strategy
When designing XML documents from ER diagrams,
–ER entities are described by XML elements
–ER attributes can be described either by XML attributes or by subelements
–How to represent ER relationships?
By using the built-in relationship in XML between elements and subelements
But it is not always possible, so ID references might have to be used
How to use XML Attributes
XML attributes describe properties of the contents, rather than the contents
cheese fromage branza A food made …
Attributes (cont’d) Another common use for attributes is to express dimensions or types
Jeff Cohen Irma Levy Using Attributes
It is not Always Clear When to Use Attributes L. Simpson L. Simpson
Using IDs Jeff Cohen Irma Levy ID attributes
How to Represent Relationships
Two related ER entities, e.g., employees and departments, can be represented as follows
A department is an element, and the employees are subelements of the department
The relationship must be many-to-one or one-to-one
–Subelements are the “many”
No Multiple Copies of the Same Element (to Avoid Redundancies)
Cannot represent in this way
–A many-to-many relationship
–A relationship with more than two entities
–A binary relationship between an entity and itself or between two entities that are related by an ISA relationship
ID references must be used in the above cases
More Problematic Cases
If there are several many-to-one relationships between two ER entities, then only one can be represented as an element-subelement relationship
For example, employees can be subelements of their department
But the relationship between a department and its manager (who is one of the employees) must be represented by an IDREF
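The department example can be sketched as follows: employees are nested in their department, and the manager is named by a reference to an employee's ID. The attribute names id and manager and the department name are assumptions. ElementTree does not read ID or IDREF types from a DTD, so the reference is resolved by hand with a dictionary.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: "id" plays the role of an ID attribute, "manager" of an IDREF.
doc = """<db>
  <department name="Research" manager="e2">
    <employee id="e1"><name>Joe</name></employee>
    <employee id="e2"><name>Sandra</name></employee>
  </department>
</db>"""

root = ET.fromstring(doc)

# Index employees by their ID so that references can be followed.
by_id = {emp.get("id"): emp for emp in root.iter("employee")}

dept = root.find("department")

# The works-in relationship is the element-subelement relationship.
staff = [emp.find("name").text for emp in dept.findall("employee")]

# The manages relationship is followed through the reference.
manager_name = by_id[dept.get("manager")].find("name").text

print(staff)
print(manager_name)
```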
Missing Information is another Problem
If there could be an employee without a department, then employees cannot be represented as subelements of departments
–ID references have to be used
Inverse Relationships
XML does not have built-in inverse relationships
Must use IDREF to represent inverse relationships
For example, add an IDREF attribute to each employee element for denoting the department of the employee
XML Schemas W3Schools on XML Schemas
XML Schemas
W3C XML Schema Language, also known as the language for XML Schema Definition (XSD)
There are other proposals for XML schema languages
XSDs have Types
XSDs use complex types that generalize the content model of DTDs (i.e., the regular expressions for describing elements)
Many simple types, e.g., string, integer
–Generalize PCDATA and CDATA
Many facets of simple types, e.g., length, maxInclusive, maxExclusive
xs:sequence and xs:all
Can specify that subelements should appear in a specific order (i.e., sequence) or in any order (i.e., all)
–But xs:all is not as general as xs:sequence (in XSD 1.0, each element inside xs:all can occur at most once)
Can restrict the number of occurrences of subelements (with minOccurs and maxOccurs), e.g., a department can have between 10 and 100 employees
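The occurrence constraint might be written as in the schema fragment below. The element names department, name and employee are assumptions. Python's standard library has no XSD validator, so the code only parses the schema as ordinary XML and reads the bounds back; real validation would need a schema-aware tool.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema: a department has a name followed by 10 to 100 employees.
schema = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="department">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="employee" type="xs:string"
                    minOccurs="10" maxOccurs="100"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

# An XSD is itself an XML document, so any XML parser can read it.
root = ET.fromstring(schema)
employee = root.find(".//xs:element[@name='employee']", {"xs": XS})
bounds = (employee.get("minOccurs"), employee.get("maxOccurs"))
print(bounds)
```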
References References are to specific elements or attributes, e.g., a reference to “person”, where “person” is the name of an element
More Features
Mixed content can be defined more generally, compared to DTDs
Local and global definitions of elements and types
Derived types by restriction or extension
XSDs and Namespaces
XSDs recognize namespaces
Easier (than with DTDs) to check validity of a document with respect to multiple schemas
–A very important feature when collecting information from multiple heterogeneous sources
–XSDs are more extensible than DTDs
Summary of XML
XML is a new data format, and its main virtues are:
–widespread acceptance
–the (important) ability to handle semistructured data (data without schema)
DTDs provide some useful syntactic constraints on documents, but as schemas they are weak
How to store large XML documents?
How to query them?
How to map between XML and other representations?