Querying XML: XQuery, XPath, and SQL/XML in Context

Part II Metadata and XML
Chapter 4 Metadata – An Overview
Chapter 5 Structural Metadata
Chapter 6 The XML Information Set (Infoset) and Beyond
2007-7/KNU Querying XML: XQuery, XPath, and SQL/XML in Context

Chapter 4 Metadata – An Overview

4.1 Introduction
Metadata is “data about data.” We have found four different usages of the word metadata in the data management community:
Structural metadata: information about the structure of the data, the types of data fields, and the relationships between data fields. Some references refer to this sort of metadata as the schema for the data.
Semantic metadata: information defining the meanings of various data values and of the names given to data fields.
Catalog metadata: information providing high-level facts about desired data, often used to locate that data.
Integration metadata: information about the correspondence between data components, often from different sources, that is, which data fields or groups of data fields have the same meaning. For example, “firstname” together with “lastname” can be substituted for “fullname.” The term mapping metadata is sometimes applied to this concept.

4.2 Structural Metadata
Structural metadata is metadata that describes the structure, type, and relationships of data.
Example 4-1 Example SQL Table Definition
CREATE TABLE book_catalog.querying_xml.movies (
  movie_ID           INTEGER CONSTRAINT movie_ID_not_null NOT NULL,
  movie_title        CHARACTER VARYING (50),
  movie_description  CLOB(1M)
)

SQL itself stores such structural metadata in tables, for example the TABLES table defined in Example 4-2.
Example 4-2 Table Definition of the TABLES Table
CREATE TABLE tables (
  TABLE_CATALOG  INFORMATION_SCHEMA.SQL_IDENTIFIER,
  TABLE_SCHEMA   INFORMATION_SCHEMA.SQL_IDENTIFIER,
  TABLE_NAME     INFORMATION_SCHEMA.SQL_IDENTIFIER,
  TABLE_TYPE     INFORMATION_SCHEMA.CHARACTER_DATA
    CONSTRAINT TABLE_TYPE_NOT_NULL NOT NULL
    CONSTRAINT TABLE_TYPE_CHECK CHECK (TABLE_TYPE IN
      ('BASE TABLE', 'VIEW', 'GLOBAL TEMPORARY', 'LOCAL TEMPORARY')),
  …,
  CONSTRAINT TABLES_PRIMARY_KEY
    PRIMARY KEY (TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME),
  CONSTRAINT TABLES_FOREIGN_KEY_SCHEMATA
    FOREIGN KEY (TABLE_CATALOG, TABLE_SCHEMA) REFERENCES SCHEMATA,
  …
)

Table 4-1 Contents of the TABLES Table
CATALOG_NAME   SCHEMA_NAME    TABLE_NAME   TABLE_TYPE   …
BOOK_CATALOG   QUERYING_XML   MOVIES       BASE TABLE   …

Table 4-2 Contents of the COLUMNS Table
CATALOG_NAME   SCHEMA_NAME    TABLE_NAME   COLUMN_NAME         ORDINAL_POSITION   DATA_TYPE_DESCRIPTOR_IDENTIFIER   …
BOOK_CATALOG   QUERYING_XML   MOVIES       MOVIE_ID            1                  …
BOOK_CATALOG   QUERYING_XML   MOVIES       MOVIE_TITLE         2                  …
BOOK_CATALOG   QUERYING_XML   MOVIES       MOVIE_DESCRIPTION   3                  …

SQL data is completely regular: every row of a table has exactly the columns that the table definition declares. XML, by contrast, is inherently semi-structured, which means that some data elements might be missing entirely from a given XML document. As a result, the nature of XML structural metadata is significantly different from that of SQL.
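Because SQL structural metadata lives in tables, it can be queried like any other data. The sketch below uses SQLite, which does not implement the standard INFORMATION_SCHEMA; its PRAGMA table_info is assumed here as a rough stand-in for the COLUMNS table in Table 4-2. The table is an adaptation of Example 4-1 (SQLite has no catalogs, schemas, or CLOB(1M), so the name is unqualified and TEXT replaces the CLOB).

```python
import sqlite3

def column_metadata(conn, table):
    """Return (ordinal_position, column_name, declared_type, is_not_null) rows.

    SQLite has no INFORMATION_SCHEMA; PRAGMA table_info plays a similar role.
    The table name is interpolated directly, so it must be a trusted identifier.
    """
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk);
    # cid is 0-based, whereas ORDINAL_POSITION in Table 4-2 is 1-based.
    return [(cid + 1, name.upper(), decl_type, bool(notnull))
            for cid, name, decl_type, notnull, _default, _pk in rows]

conn = sqlite3.connect(":memory:")
# Example 4-1, adapted for SQLite (see the note above).
conn.execute("""
    CREATE TABLE movies (
        movie_ID INTEGER CONSTRAINT movie_ID_not_null NOT NULL,
        movie_title VARCHAR(50),
        movie_description TEXT
    )""")

for row in column_metadata(conn, "movies"):
    print(row)
```

Each printed row corresponds to one row of Table 4-2: a column name, its ordinal position, and its declared type.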

Example 4-3 Actors and Actresses
<actors>
  <actor>
    <name>Johnny Depp</name>
    <gender>Male</gender>
    <film runtime="122">
      <title>From Hell</title>
      <role>Inspector Fred Abberline</role>
    </film>
    <film>
      <title>Blow</title>
      <role>George Jung</role>
    </film>
    <film runtime="97">
      <title>Don Juan de Marco</title>
      <role>Don Juan</role>
    </film>
  </actor>
  <actor>
    <name>Illeana Douglas</name>
    <gender>Female</gender>
    <film runtime="111">
      <title>Ghost World</title>
      <role>Roberta</role>
    </film>
    <film runtime="116">
      <title>Grace of My Heart</title>
      <role>Denise Waverly</role>
      <role>Edna Buxton</role>
    </film>
    <film runtime="106">
      <title>The Thin Pink Line</title>
      <role>Julia Bullock</role>
    </film>
  </actor>
</actors>
In summary, structural metadata serves to describe the data components, the types of those components, and their relationships to one another. However, it says nothing about the meaning of the data being described.
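The semi-structured nature of Example 4-3 shows up as soon as a program reads it: the film “Blow” has no runtime attribute, and a film may have several roles. A minimal sketch with Python's standard xml.etree.ElementTree, using an excerpt of the example (the function name filmography is hypothetical), has to tolerate both situations explicitly:

```python
import xml.etree.ElementTree as ET

# Excerpt of Example 4-3 (one actor only).
ACTORS_XML = """<actors>
  <actor>
    <name>Johnny Depp</name>
    <gender>Male</gender>
    <film runtime="122"><title>From Hell</title><role>Inspector Fred Abberline</role></film>
    <film><title>Blow</title><role>George Jung</role></film>
    <film runtime="97"><title>Don Juan de Marco</title><role>Don Juan</role></film>
  </actor>
</actors>"""

def filmography(xml_text):
    """Map each film title to (runtime in minutes or None, list of roles)."""
    root = ET.fromstring(xml_text)
    result = {}
    for film in root.iter("film"):
        runtime = film.get("runtime")                   # attribute may be absent
        roles = [r.text for r in film.findall("role")]  # zero, one, or many roles
        result[film.findtext("title")] = (int(runtime) if runtime else None, roles)
    return result

for title, (runtime, roles) in filmography(ACTORS_XML).items():
    print(title, runtime, roles)
```

A SQL table would force every row to have a runtime column (possibly NULL); here the application must discover absence at run time.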

4.3 Semantic Metadata
Semantic metadata is metadata that describes the “meaning” of data: both the meaning of data values and the meaning of the names of things that can take on such values.
A metadata registry, managed by some registration authority, provides a mechanism by which the names of “things” and the values assigned to them can be managed, making them easier to find and interpret in various data sources. When data elements and value domains are well documented according to ISO/IEC 11179 and that documentation is managed in a metadata registry (MDR), it becomes easier to find and retrieve them from disparate databases and to send and receive them via electronic communications.

4.4 Catalog Metadata
Catalog metadata specifies information about identifying and locating data that (usually) cannot be found in the data itself.
The Dewey Decimal Classification (DDC, also called the Dewey Decimal System) is a proprietary system of library classification developed by Melvil Dewey in 1876; it has since been greatly modified and expanded through twenty-two major revisions, the most recent in 2004.
Examination of many types of resources for which cataloging is necessary led to a generalization of the requirements for cataloging, which in turn led to standardized vocabularies for catalog metadata. One important standard in this field is the Dublin Core, a set of metadata elements that can be used to describe any resource, that is, anything that has identity. The metadata elements specified by the Dublin Core are: Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, and Rights.

Table 4-3 A Movie Described Using the Dublin Core
Dublin Core Element Name   Value
Title                      Pitch Black
Creator                    David Twohy
Subject                    Science Fiction
Subject                    Drama
Description                It’s evil vs. evil in an electrifying showdown that USA Today calls “… best excuse to root for the bad guy since Arnold in the original Terminator.”
Publisher                  Universal Pictures
Publisher                  Interscope Communications
Contributor                Vin Diesel
Contributor                Radha Mitchell
Contributor                Cole Hauser
Contributor                Keith David
Date                       2000
Resource Type              Movie
Format                     DVD: Region 1
Resource Identifier        ISBN 0-7832-4922-5
Language                   en-US
etc.                       etc.
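In XML, Dublin Core elements are commonly written in the namespace http://purl.org/dc/elements/1.1/ with lowercase element names (dc:title, dc:creator, and so on). The sketch below encodes part of Table 4-3 that way. The <record> wrapper and the helper name dublin_core_record are hypothetical, because Dublin Core defines only the fifteen elements, not a containing document. Repeatable elements such as Subject simply occur more than once.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"   # Dublin Core element set namespace
ET.register_namespace("dc", DC_NS)           # serialize with the dc: prefix

def dublin_core_record(pairs):
    """Build a hypothetical <record> holding one dc:* element per (name, value)."""
    record = ET.Element("record")
    for element, value in pairs:
        ET.SubElement(record, f"{{{DC_NS}}}{element}").text = value
    return record

pitch_black = dublin_core_record([
    ("title", "Pitch Black"),
    ("creator", "David Twohy"),
    ("subject", "Science Fiction"),
    ("subject", "Drama"),            # Subject occurs twice in Table 4-3
    ("publisher", "Universal Pictures"),
    ("contributor", "Vin Diesel"),
    ("date", "2000"),
    ("type", "Movie"),
])
print(ET.tostring(pitch_black, encoding="unicode"))
```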

In the XML world, a well-known example of a catalog metadata standard is the Resource Description Framework (RDF). One of the documents in the standard, the Primer, clarifies that RDF is intended “for representing information about resources on the World Wide Web. It is particularly intended for representing metadata about Web resources, such as the title, author, and modification date of a web page, copyright and licensing information about a web document, or the availability schedule for some shared resources.”
In the RDF model, assertions about resources take the form of a triple: a subject, a predicate, and an object, where the predicate specifies the relationship between the subject and the object. For example, the website http://sqlx.org (the subject) is maintained (the predicate) by the authors of this book (the object).
Dublin Core and RDF are not competing ways of creating and representing catalog metadata about documents. Dublin Core defines a number of metadata elements used to describe documents and represents a consensus among information retrieval specialists about the minimal information necessary to identify and locate such documents. By contrast, RDF is an architecture for representing and organizing metadata (or, indeed, data) that does not predetermine what that metadata must be.
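The triple model can be sketched in a few lines of plain Python, without any RDF library. The predicate URI for “is maintained by” and the resource URI for the movie are hypothetical placeholders; the second triple borrows the Dublin Core creator element as a predicate, which illustrates how the two standards complement each other. The match function treats None as a wildcard, which is the basic idea behind RDF pattern queries.

```python
# Each assertion is a (subject, predicate, object) triple.
# MAINTAINED_BY and the example.org resource are hypothetical URIs.
MAINTAINED_BY = "http://example.org/terms/maintainedBy"
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"

triples = {
    ("http://sqlx.org", MAINTAINED_BY, "the authors of this book"),
    ("http://example.org/pitch-black", DC_CREATOR, "David Twohy"),
}

def match(graph, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [(s, p, o) for (s, p, o) in graph
            if subject in (None, s) and predicate in (None, p) and obj in (None, o)]

print(match(triples, subject="http://sqlx.org"))
print(match(triples, predicate=DC_CREATOR))
```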

In summary, catalog metadata is information that makes it easier to locate desired resources among a collection of like resources. Catalog metadata is not homogeneous, though. We identify three subcategories of catalog metadata: descriptive (or bibliographic, supporting discovery and interpretation of data), administrative (addressing rights management, physical media descriptions, encoding conventions, and so on), and preservational (to track the lineage or provenance of data, the archival requirements, etc.). There are undoubtedly many other ways of subdividing the notion of catalog metadata.

4.5 Integration Metadata
Integration metadata is metadata that makes it possible to pull together data designed or created by different organizations, where the data from all sources is intended to serve the same purpose. Example 4-4 holds the same kind of information as Example 4-3, but uses a different document design.
Example 4-4 Other Actors and Actresses
<Actors-and-Actresses>
  <Actor>
    <name>Martin Short</name>
    <sex code="1"/>
    <movie>
      <title>Mars Attacks!</title>
      <length>106</length>
      <released>1996</released>
      <character>Press Secretary Jerry Ross</character>
    </movie>
    <movie>
      <title>Innerspace</title>
      <length>120</length>
      <released>1987</released>
      <character>Jack Putter</character>
    </movie>
    <movie>
      <title>La La Wood</title>
      <length>unknown</length>
      <released>2003</released>
      <character>Jiminy Glick</character>
    </movie>
  </Actor>
  <Actress>
    <name>Ricki Lake</name>
    <sex code="2"/>
    <movie>
      <title>Serial Mom</title>
      <length>95</length>
      <released>1994</released>
      <character>Misty Sutphin</character>
    </movie>
    <movie>
      <title>Last Exit to Brooklyn</title>
      <length>102</length>
      <released>1989</released>
      <character>Donna</character>
    </movie>
    <movie>
      <title>Hairspray</title>
      <length>92</length>
      <released>1988</released>
      <character>Tracy Turnblad</character>
    </movie>
  </Actress>
</Actors-and-Actresses>

Integration metadata could provide a mapping between the two designs, as illustrated in Table 4-4.
Table 4-4 Integrating Two XML Document Designs
Data Purpose                 Example 4-3                  Example 4-4
Top-level container          <actors>                     <Actors-and-Actresses>
Individual actor container   <actor>                      <Actor> or <Actress>
Actor’s name                 <name> (content)             <name> (content)
Actor’s gender               <gender> (content)           <sex> (attribute: code)
Filmography                  <film>                       <movie>
Running time of film         attribute of <film>: runtime <length> (content)
Name of film                 <title> (content)            <title> (content)
Year film was released       (not present)                <released> (content)
Character played in film     <role> (content)             <character> (content)
In summary, integration metadata is information that assists in correlating data designed by different organizations or individuals.
4.6 Chapter Summary
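Integration metadata becomes useful once a program applies it. The sketch below turns the correspondences of Table 4-4 into a transformation from the Example 4-4 design to the Example 4-3 design. The meaning of the sex codes (1 = Male, 2 = Female) is an assumption inferred from the two sample documents, not something the source defines; the function name is hypothetical. Where the target design has no place for a value (<released>) it is dropped, and a non-numeric <length> such as “unknown” becomes an absent runtime attribute, which Example 4-3 already permits.

```python
import xml.etree.ElementTree as ET

# Assumed meaning of <sex code="..."/>, inferred from Example 4-4.
SEX_CODES = {"1": "Male", "2": "Female"}

def to_example_4_3_design(src_root):
    """Rewrite an Example 4-4 style tree into the Example 4-3 design (Table 4-4)."""
    actors = ET.Element("actors")                    # <Actors-and-Actresses> -> <actors>
    for person in src_root:                          # <Actor> or <Actress> -> <actor>
        actor = ET.SubElement(actors, "actor")
        ET.SubElement(actor, "name").text = person.findtext("name")
        sex = person.find("sex")
        code = sex.get("code") if sex is not None else None
        ET.SubElement(actor, "gender").text = SEX_CODES.get(code, "unknown")
        for movie in person.findall("movie"):        # <movie> -> <film>
            film = ET.SubElement(actor, "film")
            length = movie.findtext("length", default="")
            if length.isdigit():                     # <length> content -> runtime attribute
                film.set("runtime", length)
            ET.SubElement(film, "title").text = movie.findtext("title")
            ET.SubElement(film, "role").text = movie.findtext("character")
            # <released> has no counterpart in Example 4-3, so it is dropped.
    return actors

SOURCE = """<Actors-and-Actresses>
  <Actor><name>Martin Short</name><sex code="1"/>
    <movie><title>Innerspace</title><length>120</length>
      <released>1987</released><character>Jack Putter</character></movie>
    <movie><title>La La Wood</title><length>unknown</length>
      <released>2003</released><character>Jiminy Glick</character></movie>
  </Actor>
</Actors-and-Actresses>"""

result = to_example_4_3_design(ET.fromstring(SOURCE))
print(ET.tostring(result, encoding="unicode"))
```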

Chapter 5 Structural Metadata

5.1 Introduction
Structural metadata is metadata that describes the structure, type, and relationships of data.
A “well-formed” XML document is an XML document that has correct XML syntax. According to the W3C, this means that:
XML documents must have a root element.
XML elements must have a closing tag.
XML tags are case sensitive.
XML elements must be properly nested.
XML attribute values must always be quoted.
A well-formed document should not be confused with a valid XML document, which is a well-formed XML document that also conforms to the rules of a Document Type Definition (DTD) or of an XML Schema (XSD), which the W3C supports as an alternative to DTDs.
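Well-formedness can be checked by any conforming XML parser, because a parser must reject a document that breaks any of the rules above. A minimal sketch with Python's standard library follows; note that it checks well-formedness only, since xml.etree.ElementTree does not validate against a DTD or an XML Schema. The helper name is hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML (no validation)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# One sample per rule listed above.
cases = {
    '<note date="2007"><to>Tove</to></note>': "well-formed",
    "<a/><b/>": "two top-level elements, no single root",
    "<note><to>Tove</note>": "missing closing tag",
    "<Note></note>": "tag names differ in case",
    "<a><b></a></b>": "improper nesting",
    "<note date=2007/>": "unquoted attribute value",
}
for text, rule in cases.items():
    print(is_well_formed(text), rule)
```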

5.2 DTD
“The XML document type declaration contains or points to markup declarations that provide a grammar for a class of documents.” That grammar is the document type definition (DTD). When the DTD is contained within the document type declaration, it is referred to as an internal subset DTD (illustrated in Example 5-10); when the document type declaration points to the DTD, that DTD is called an external subset DTD (illustrated in Example 5-11).
“A markup declaration is an element type declaration, an attribute list declaration, an entity declaration, or a notation declaration.” DTDs use a non-XML syntax to provide those markup declarations.

5.2.1 SGML Heritage
DTDs were invented as a metadata description language for the Standard Generalized Markup Language (SGML). Why SGML used a non-SGML syntax for DTDs is unclear, but that decision was inherited by XML.
5.2.2 Relatively Simple, Easy to Write, and Easy to Read
A document type declaration in an XML document must occur as part of the document’s prolog. It has the syntax shown in Example 5-1.
Example 5-1 Document Type Declaration Syntax
<!DOCTYPE document-type-name optional-external-reference [ optional-internal-declarations ]>
Example 5-2 Example Document Type Declaration
<!DOCTYPE bibliography SYSTEM "biblio.dtd">
The name specified for the DOCTYPE must match the name of the root element of every document that depends on the DTD.
DTDs are defined using markup declarations, which come in several forms: element declarations, attribute list declarations, entity declarations, notation declarations, processing instructions, and comments.
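The rule that the DOCTYPE name must match the root element can be checked without reading the DTD at all. A minimal sketch using Python's xml.dom.minidom, which records the document type declaration in doc.doctype; minidom is assumed here not to fetch or validate against the external biblio.dtd, so this checks only the name rule. The helper name is hypothetical.

```python
from xml.dom import minidom

def doctype_matches_root(xml_text):
    """Check that the DOCTYPE name equals the root element's name."""
    doc = minidom.parseString(xml_text)
    if doc.doctype is None:          # no document type declaration at all
        return False
    return doc.doctype.name == doc.documentElement.tagName

print(doctype_matches_root('<!DOCTYPE bibliography SYSTEM "biblio.dtd"><bibliography/>'))
print(doctype_matches_root('<!DOCTYPE bibliography SYSTEM "biblio.dtd"><books/>'))
```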

Example 5-3 Examples of Element Declarations
<!ELEMENT catalogued EMPTY>
<!ELEMENT review ANY>
<!ELEMENT title (#PCDATA | ital | bold | under )*>
<!ELEMENT author (salutation?, given, family, suffix? )>
Example 5-4 Elements Based on Element Declaration Examples
<catalogued/>
<review>This is a <ital>really</ital> interesting book, but <pronoun>I</pronoun>, for one, didn’t <under>really</under> understand it and I doubt that <name>Roger</name> did, either.</review>
<title>A <bold>Bold</bold> Tale of <ital>Three</ital> Towns</title>
<author><salutation>Dr.</salutation><given>Bob</given><family>Smith</family></author>

All elements are declared in the DTD as “global” elements: each element type has exactly one declaration, which applies wherever that element appears in an XML document.
<!ELEMENT catalogued EMPTY>
The element must have no content. In an instance XML document, you may write this element either as <catalogued/> or as <catalogued></catalogued>, because there is no semantic difference between those representations. Elements declared to be EMPTY may have attribute list declarations associated with them.
<!ELEMENT review ANY>
The element is permitted to have any content at all, including an arbitrary mixture of text and child elements. The child elements have to be declared somewhere in the DTD, but they are not cited in the declaration of the element declared ANY.
<!ELEMENT title (#PCDATA | ital | bold | under )*>
The element has mixed content: text mixed with <ital>, <bold>, and <under> child elements, in any order and any number.
<!ELEMENT author ( salutation?, given, family, suffix? )>
The element has element content: an optional <salutation>, then <given> and <family>, then an optional <suffix>, in exactly that order.
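DTD content models are essentially regular expressions over the sequence of child element names, which makes them easy to mimic. The sketch below hand-translates the two models from Example 5-3 (it is not a DTD parser, and it ignores the rule that element content may contain only whitespace text). For author, the child names must match a sequence pattern in order; for the mixed-content title, a DTD can only restrict which children appear, never their order or count, so a set-membership test suffices. All function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# (salutation?, given, family, suffix?) as a regex over child names,
# each name followed by a space so "given " matches exactly one <given>.
AUTHOR_MODEL = re.compile(r"(salutation )?given family (suffix )?")
# (#PCDATA | ital | bold | under)*: only the set of allowed children matters.
TITLE_CHILDREN = {"ital", "bold", "under"}

def child_sequence(elem):
    return "".join(child.tag + " " for child in elem)

def author_is_valid(elem):
    """Element content: children must appear in exactly the declared order."""
    return AUTHOR_MODEL.fullmatch(child_sequence(elem)) is not None

def title_is_valid(elem):
    """Mixed content: any text plus allowed children, in any order and number."""
    return all(child.tag in TITLE_CHILDREN for child in elem)

bob = ET.fromstring("<author><salutation>Dr.</salutation><given>Bob</given>"
                    "<family>Smith</family></author>")
print(author_is_valid(bob))
```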

Example 5-5 Attribute List Declaration Syntax
<!ATTLIST element-name attribute-name attribute-type attribute-default …>
attribute-default:
#REQUIRED         // mandatory
#IMPLIED          // optional
#FIXED "value"    // constant
"value"           // default value
Example 5-6 Attribute Types
CDATA                                       // character data
ID                                          // unique among the values of all attributes of type ID throughout the containing document
IDREF, IDREFS                               // values must match the ID attribute of some element in the same document
ENTITY, ENTITIES                            // unparsed entities
NMTOKEN, NMTOKENS                           // name tokens
NOTATION ( notation-name )                  // notations
NOTATION ( notation-name | notation-name | … )
( identifier )                              // enumerations
( identifier | identifier | … )

Example 5-7 Examples of Attribute Declarations
<!ATTLIST book
  ISBN           ID               #REQUIRED
  retail-price   CDATA            #IMPLIED
  size           (folio|quarto)   "quarto"
  document-type  NMTOKEN          #FIXED "book" >

5.2.3 Limited Capabilities, Especially with Respect to Data Types
It is impossible for a DTD to govern the sequence (or the number) of child elements in an element defined to have mixed content; a mixed-content declaration can only list which child elements are allowed.
Example 5-8 Examples of Elements with Mixed Content
<!ELEMENT para (#PCDATA | ital | bold | under )*>
<para>This <ital>really</ital> is <bold>not</bold> helpful.</para>
<para>George should get a raise this year.</para>
<para><bold>Three</bold> years before <ital>this</ital> mast really is enough for anybody.</para>
<para><ital>I</ital> do <ital>my</ital> work <ital>my</ital> way.</para>
The only data type that DTDs support for element content is text (character data). Attribute types such as ID or NMTOKEN constrain the form of attribute values, but DTDs provide no numeric, date, or other data types.

5.2.4 An Example Document and DTD
Example 5-10 An XML Document with DTD
<!DOCTYPE bibliography SYSTEM "biblio.dtd" [
  <!ELEMENT author (salutation?, given, family, suffix?)>
]>
<bibliography>
  <books>
    <book ISBN="ISBN-0-19-853737-9" document-type="book" retail-price="189.00">
      <title>The SGML Handbook</title>
      <author><salutation>Dr.</salutation><given>Charles F.</given>
        <family>Goldfarb</family>
      </author>
      <review>
        <para>This review was</para>
        <para> This book is, make sense of it. </para>
        But beware: <emph>XML Pocket Reference</emph>, the SGML standard!
      </review>
      <catalogued date="1991-03-22"/>
    </book>
    <book ISBN="ISBN-1-55860-456-1" document-type="book">
      <title>SQL:1999 Understanding Relational Language Components</title>
      <author><given>Jim</given> <family>Melton</family></author>
      <author><given>Alan R.</given> <family>Simon</family> <suffix>PhD.</suffix></author>
      <catalogued/>
    </book>
    <book ISBN="ISBN-1-861005-06-7" retail-price="34.99" document-type="book">
      <title>XSLT Programmer’s Reference</title>
      <author><given>Michael</given><family>Kay</family></author>
      <review>Dang, this book is great!</review>
    </book>
    <book ISBN="ISBN-0-8647420321-7" document-type="book">
      <title>India</title>
      <author><given>Hugh</given><family>Finlay</family></author>
      <author><given>Tony</given><family>Wheeler</family></author>
      <author><given>Bryn</given><family>Thomas</family></author>
      <review>Not yet reviewed.</review>
      <catalogued date="2001-07-13"/>
    </book>
  </books>
  <papers/>
</bibliography>

Example 5-11 An External Subset DTD (in biblio.dtd)
<!ELEMENT bibliography ( books, papers )>
<!ELEMENT books ( book* )>
<!ELEMENT papers ( paper* )>
<!ELEMENT book ( title, author+, review?, catalogued? )>
<!ATTLIST book
  ISBN           ID                 #REQUIRED
  retail-price   CDATA              #IMPLIED
  size           (folio | quarto)   "quarto"
  document-type  NMTOKEN            #FIXED "book" >
<!ELEMENT title ( #PCDATA | ital | bold | under )*>
<!ELEMENT salutation ( #PCDATA )>
<!ELEMENT given ( #PCDATA )>
<!ELEMENT family ( #PCDATA )>
<!ELEMENT suffix ( #PCDATA )>
<!ELEMENT review ANY>
<!ELEMENT catalogued EMPTY>
<!ATTLIST catalogued
  date           CDATA              #IMPLIED >
<!ELEMENT ital ( #PCDATA | ital | bold | under )*>
<!ELEMENT bold ( #PCDATA | ital | bold | under )*>
<!ELEMENT under ( #PCDATA | ital | bold | under )*>
<!ELEMENT para ( #PCDATA | ital | bold | under | quote | emph )*>
<!ELEMENT quote ( #PCDATA )>
<!ELEMENT emph ( #PCDATA )>
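Declaring ISBN as type ID imposes two checks that a validating parser performs: each value must be an XML Name (so it cannot begin with a digit), and no two ID values in the document may be equal; #REQUIRED adds that the attribute must be present. The sketch below approximates those checks with a simplified, ASCII-only version of the Name production (the real production admits many non-ASCII letters too). The function name and its defaults are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Simplified, ASCII-only approximation of the XML Name production.
NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9._:-]*")

def check_id_attribute(root, element="book", attr="ISBN"):
    """Report problems with an attribute declared as ID #REQUIRED."""
    problems, seen = [], set()
    for elem in root.iter(element):
        value = elem.get(attr)
        if value is None:
            problems.append(f"missing {attr}")                 # #REQUIRED
        elif not NAME.fullmatch(value):
            problems.append(f"{value!r} is not an XML Name")   # ID values are Names
        elif value in seen:
            problems.append(f"{value!r} is not unique")        # IDs are unique per document
        else:
            seen.add(value)
    return problems

books = ET.fromstring('<books><book ISBN="ISBN-0-19-853737-9"/>'
                      '<book ISBN="1-55860-456-1"/></books>')
print(check_id_attribute(books))
```

A value that starts with a digit, such as 1-55860-456-1, fails the Name test, which is why the sample document prefixes its ISBN values with letters.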

5.3 XML Schema
XML Schema was published as a W3C Recommendation in 2001, in three documents. The first (XML Schema Part 0: Primer) is not normative; it is intended as a tutorial that illustrates important features of the normative parts. The second (XML Schema Part 1: Structures) specifies the language used to describe the structure of XML documents and to constrain their content. The last (XML Schema Part 2: Datatypes) defines a number of data types that can be used to specify the types of attribute values and element content.
The development of XML Schema, which began in late 1998, came about because of the increasing use of XML for purposes beyond simple document markup.

Example 5-12 Sample XML Document
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  …
  <xs:annotation>
    <xs:documentation xml:lang="en">
      Purchase order schema for Example.com.
    </xs:documentation>
  </xs:annotation>
  <xs:element name="purchaseOrder" type="PurchaseOrderType"/>
  <xs:element name="comment" type="xs:string"/>
  <xs:complexType name="PurchaseOrderType">
    <xs:sequence>
      <xs:element name="shipTo" type="USAAddress"/>
      <xs:element name="billTo" type="USAAddress"/>
      <xs:element ref="comment" minOccurs="0"/>
      <xs:element name="items" type="Items"/>
    </xs:sequence>
    <xs:attribute name="orderDate" type="xs:date"/>
  </xs:complexType>
  <xs:complexType name="USAAddress">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="street" type="xs:string"/>
      <xs:element name="city" type="xs:string"/>
      <xs:element name="state" type="xs:string"/>
      <xs:element name="zip" type="xs:decimal"/>
    </xs:sequence>
    <xs:attribute name="country" type="xs:NMTOKEN" fixed="US"/>
  </xs:complexType>
  <xs:complexType name="Items">
    <xs:sequence>
      <xs:element name="item" minOccurs="0" maxOccurs="unbounded">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="productName" type="xs:string"/>
            <xs:element name="quantity">
              <xs:simpleType>
                <xs:restriction base="xs:positiveInteger">
                  <xs:maxExclusive value="100"/>
                </xs:restriction>
              </xs:simpleType>
            </xs:element>
            <xs:element name="USPrice" type="xs:decimal"/>
            <xs:element ref="comment" minOccurs="0"/>
            <xs:element name="shipDate" type="xs:date" minOccurs="0"/>
          </xs:sequence>
          <xs:attribute name="partNum" type="SKU" use="required"/>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
  <!-- Stock Keeping Unit, a code for identifying products -->
  <xs:simpleType name="SKU">
    <xs:restriction base="xs:string">
      <xs:pattern value="\d{3}-[A-Z]{2}"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>

5.3.1 Exploring an XML Schema
Unlike a DTD, an XML Schema is itself written in XML; that is, it is an XML document.
(1) <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> … </xs:schema>
xs:schema identifies this bit of XML as an XML Schema document.
xmlns:xs defines a namespace by means of a Uniform Resource Identifier (URI) and a corresponding prefix by which the namespace will be referenced within this particular document.
xs: Throughout this XML Schema document, the namespace prefix xs: is used to reference the namespace identified by the URI http://www.w3.org/2001/XMLSchema.
(2) <xs:annotation> … </xs:annotation>
The content of the (optional) <xs:annotation> element documents all or part of an XML Schema document and can also provide information to applications that might process the schema document. It may contain <xs:documentation> elements (intended for human readers) and <xs:appinfo> elements (intended for applications):
<xs:documentation> … </xs:documentation>
<xs:appinfo> … </xs:appinfo>
(3) <xs:element name="purchaseOrder" type="PurchaseOrderType"/>
This element declaration describes the “root” element of a purchase order document. XML Schema does not require the root to be declared first: any globally declared element (a direct child of <xs:schema>) may serve as the root of an instance document.

(4) <xs:element name="comment" type="xs:string"/>
The type of an element can be:
a simple type (an ordinary data type, such as xs:string), or
a complex type (a structured type), which will be discussed in Section 5.3.3.
(5) <xs:complexType name="PurchaseOrderType">
      <xs:sequence>
        <xs:element name="shipTo" type="USAAddress"/>
        <xs:element name="billTo" type="USAAddress"/>
        <xs:element ref="comment" minOccurs="0"/>
        <xs:element name="items" type="Items"/>
      </xs:sequence>
      <xs:attribute name="orderDate" type="xs:date"/>
    </xs:complexType>
Complex types can be given an explicit name, as PurchaseOrderType is here, or they can be anonymous.
(6) <xs:complexType name="USAAddress"> … </xs:complexType>
(7) <xs:complexType name="Items"> … </xs:complexType>
(8) <xs:element name="quantity">
      <xs:simpleType>
        <xs:restriction base="xs:positiveInteger">
          <xs:maxExclusive value="100"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:element>
The quantity element uses an anonymous simple type that restricts xs:positiveInteger with the facet maxExclusive="100", so a quantity must be an integer from 1 through 99.
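The two restricted simple types in Example 5-12 can be imitated directly to see what their facets accept. This is a sketch under simplifying assumptions, not a schema validator: it assumes whitespace has already been collapsed and approximates the lexical rules of xs:positiveInteger with a regular expression. For SKU it relies on the fact that XML Schema patterns are implicitly anchored to the whole value, which corresponds to Python's re.fullmatch rather than re.search. The function names are hypothetical.

```python
import re

def is_valid_quantity(lexical):
    """Anonymous quantity type: xs:positiveInteger with maxExclusive="100"."""
    if not re.fullmatch(r"\+?[0-9]+", lexical):   # simplified lexical check
        return False
    value = int(lexical)                          # map lexical form to value
    return 0 < value < 100                        # positiveInteger, then maxExclusive

def is_valid_sku(lexical):
    """SKU type: xs:string restricted by the pattern \\d{3}-[A-Z]{2}."""
    # XSD patterns must match the entire value, hence fullmatch.
    return re.fullmatch(r"\d{3}-[A-Z]{2}", lexical) is not None

for q in ["1", "99", "100", "0"]:
    print(q, is_valid_quantity(q))
for sku in ["926-AA", "926-aa", "x926-AA"]:
    print(sku, is_valid_sku(sku))
```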

5.3.2 Simple Types (Primitive Types and Derived Types)

Simple types are specified in XML Schema Part 2: Datatypes. In that document, a data type is defined to be "a 3-tuple, consisting of (a) a set of distinct values, called its value space; (b) a set of lexical representations, called its lexical space; and (c) a set of facets that characterize properties of the value space, individual values, or lexical items."

For example, the character sequences "1", "01", and "0000000000001" are all lexical representations of the number we commonly call "one." Some data types allow other representations as well: xs:decimal also accepts "1.0", and xs:float and xs:double also accept "0.1E1."

In the context of XML Schema, the values belonging to a data type can be specified in several ways: axiomatically (that is, from fundamental notions, such as mathematical rules), by enumeration, or by restricting the values belonging to another data type. The lexical representations are character strings that represent those values. Facets then describe properties of the values. For example, a character string value has a length, and the string type uses two specific facets, minLength and maxLength, to specify the minimum and maximum allowed length, respectively.
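As a sketch of two of these ways of specifying values, the schema fragment below (assuming xs: is bound to the XML Schema namespace) restricts xs:string once with the minLength and maxLength facets and once by enumeration. The type names and the particular limits and values are hypothetical.

```xml
<!-- Hypothetical type: restricts xs:string with the length facets -->
<xs:simpleType name="ProductCodeType">
  <xs:restriction base="xs:string">
    <xs:minLength value="3"/>   <!-- at least 3 characters -->
    <xs:maxLength value="10"/>  <!-- at most 10 characters -->
  </xs:restriction>
</xs:simpleType>

<!-- Hypothetical type: values specified by enumeration -->
<xs:simpleType name="ShippingMethodType">
  <xs:restriction base="xs:string">
    <xs:enumeration value="ground"/>
    <xs:enumeration value="air"/>
    <xs:enumeration value="sea"/>
  </xs:restriction>
</xs:simpleType>
```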

XML Schema provides a number of built-in primitive data types, as well as a number of additional built-in types that are derived from the built-in primitives. All of the built-in data types of XML Schema belong to the XML Schema namespace, often indicated by the prefix "xs:". The corresponding namespace URI is http://www.w3.org/2001/XMLSchema. For example, an element whose value is a decimal number can be declared with the built-in type xs:decimal:

    <xs:element name="USPrice" type="xs:decimal"/>
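The prefix itself is only a local abbreviation; what identifies the built-in types is the namespace URI the prefix is bound to. As a sketch, the two hypothetical schema documents below declare the same USPrice element, one with the conventional xs: prefix and one with xsd:, which many schemas use instead.

```xml
<!-- Document A: the conventional xs: prefix -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="USPrice" type="xs:decimal"/>
</xs:schema>

<!-- Document B: a different prefix bound to the same namespace URI;
     xsd:decimal names exactly the same built-in type as xs:decimal -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="USPrice" type="xsd:decimal"/>
</xsd:schema>
```

Either prefix works, but only if it is declared: a reference such as xsd:date in a document that binds only xs: is an error.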

Table 5-1 Built-in Types
----------------------------------------------------------------
Primitive Types    Derived Types         Source of Derived Types
----------------------------------------------------------------
string             normalizedString      string
                   token                 normalizedString
                   language              token
                   NMTOKEN               token
                   NMTOKENS              NMTOKEN (derived by list)
                   Name                  token
                   NCName                Name
                   ID                    NCName
                   IDREF                 NCName
                   IDREFS                IDREF (derived by list)
                   ENTITY                NCName
                   ENTITIES              ENTITY (derived by list)
boolean
decimal            integer               decimal
                   nonPositiveInteger    integer
                   negativeInteger       nonPositiveInteger
                   long                  integer
                   int                   long
                   short                 int
                   byte                  short
                   nonNegativeInteger    integer
                   unsignedLong          nonNegativeInteger
                   unsignedInt           unsignedLong
                   unsignedShort         unsignedInt
                   unsignedByte          unsignedShort
                   positiveInteger       nonNegativeInteger
float
double
duration
dateTime
date
time
gYearMonth
gYear
gMonthDay
gDay
gMonth
hexBinary
base64Binary
anyURI
QName
NOTATION
----------------------------------------------------------------
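Types such as NMTOKENS, IDREFS, and ENTITIES are derived by list: each value is a whitespace-separated sequence of values of the item type. A user-defined type can be derived the same way with <xs:list>. The sketch below is a schema fragment (assuming xs: is bound to the XML Schema namespace); the names QuantityList and quantities are hypothetical.

```xml
<!-- Hypothetical type: a whitespace-separated list of positive integers,
     derived by list in the same way NMTOKENS is derived from NMTOKEN -->
<xs:simpleType name="QuantityList">
  <xs:list itemType="xs:positiveInteger"/>
</xs:simpleType>

<!-- Hypothetical element using the list type -->
<xs:element name="quantities" type="QuantityList"/>

<!-- A valid instance would carry three list items:
     <quantities>3 17 42</quantities> -->
```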

5.3.3 Complex Types and Structures

Example 5-15 Complex Type Definition: PurchaseOrderType
    <xs:complexType name="PurchaseOrderType">
      <xs:sequence>
        <xs:element name="shipTo" type="USAAddress"/>
        <xs:element name="billTo" type="USAAddress"/>
        <xs:element ref="comment" minOccurs="0"/>
        <xs:element name="items" type="Items"/>
      </xs:sequence>
      <xs:attribute name="orderDate" type="xs:date"/>
    </xs:complexType>

The <xs:complexType> element is used here to define a named complex type.
The <xs:sequence> element specifies that the content of the enclosing type is a sequence of child elements that must appear in the specified order.
Each <xs:element> element declares an element that is used as part of the content of the type in which it is contained; ref="comment" reuses the globally declared comment element instead of declaring a new one.
The minOccurs and maxOccurs attributes control how many times an element may occur. The default value of both is 1, so minOccurs="0" makes comment optional, and maxOccurs="unbounded" allows an element to occur any number of times.
The <xs:attribute> element declares an attribute of the complex type: here, orderDate, whose value must be an xs:date.
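The occurrence attributes can be combined freely. In the hypothetical type below (all names are invented for illustration), name must appear exactly once, phone may appear any number of times including zero, and email must appear one to three times. Because attributes are optional by default, the id attribute is made mandatory with use="required".

```xml
<!-- Hypothetical named complex type illustrating occurrence constraints -->
<xs:complexType name="ContactListType">
  <xs:sequence>
    <!-- defaults minOccurs="1" maxOccurs="1": exactly once -->
    <xs:element name="name" type="xs:string"/>
    <!-- zero or more times -->
    <xs:element name="phone" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
    <!-- one to three times -->
    <xs:element name="email" type="xs:string" maxOccurs="3"/>
  </xs:sequence>
  <!-- attribute that every instance must carry -->
  <xs:attribute name="id" type="xs:ID" use="required"/>
</xs:complexType>
```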

Example 5-16 Complex Type Definition: Items
Example 5-17 Anonymous Complex Type Definition

When <xs:complexType> appears without a name attribute, nested directly inside an <xs:element> declaration, it defines an anonymous complex type that applies only to that element.
Similarly, to place a restriction on an element's value, the element must be given a simple type defined with <xs:simpleType>, either anonymously (as in the quantity element) or by name.

When you need to declare an element that has both a simple type (such as xs:string, xs:positiveInteger, or xs:date) and an attribute, you cannot, counterintuitive though it may be, just use the type attribute. Instead, you must declare the element with an <xs:complexType> containing <xs:simpleContent>.

Example 5-18 Allowing Attributes on Elements of Simple Types
    <xs:element name="deliveryDate">
      <xs:complexType>
        <xs:simpleContent>
          <xs:extension base="xs:date">
            <xs:attribute name="verified" type="xs:boolean"/>
          </xs:extension>
        </xs:simpleContent>
      </xs:complexType>
    </xs:element>
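The <xs:complexType> in Example 5-18 is itself anonymous: it has no name attribute and belongs only to deliveryDate. Under that declaration, instance elements like the hypothetical ones below carry a date as text content and an optional boolean attribute (attributes are optional unless use="required" is given).

```xml
<!-- Valid: an xs:date value plus the verified attribute -->
<deliveryDate verified="true">2007-07-15</deliveryDate>

<!-- Valid: xs:boolean also accepts 1 and 0 -->
<deliveryDate verified="0">2007-07-15</deliveryDate>

<!-- Valid: the attribute may be omitted -->
<deliveryDate>2007-07-15</deliveryDate>

<!-- Invalid: "July 15" is not in the lexical space of xs:date -->
<deliveryDate verified="true">July 15</deliveryDate>
```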

Table 5-2 Features of XML Schema Part 1: Structures
----------------------------------------------------------------------------------------------------------------
Feature                      XML Schema                        DTD
----------------------------------------------------------------------------------------------------------------
Syntax                       XML document                      Non-XML
Simple types                 Part 2's xs: types                Strings and string-like attribute types
Occurrence constraints       minOccurs, maxOccurs attributes   ?, *, +
Complex type definition      <xs:complexType>                  No real analog
Mixed content                <xs:complexType mixed="true">     #PCDATA used with element names as alternatives
Sequence of child elements   <xs:sequence>                     Element names separated by commas
Choice of child elements     <xs:choice>                       Element names separated by vertical bars
Groups                       <xs:group>                        Parameter entities, parenthesized sequences, or parenthesized choices
Entities                     No analog                         <!ENTITY>
Type derivation              Yes                               No
Type re-use                  Yes                               No
----------------------------------------------------------------------------------------------------------------
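To relate several rows of the table, the sketch below expresses one hypothetical content model both ways: a contact element holding a name followed by any mix of phone and email elements. The DTD's comma and vertical bar correspond to <xs:sequence> and <xs:choice>, and its * corresponds to minOccurs="0" maxOccurs="unbounded". The two fragments are shown together only for comparison; in practice they would live in separate files (a DTD, and a schema document with xs: bound to the XML Schema namespace).

```xml
<!-- DTD: a name followed by zero or more phone or email elements -->
<!ELEMENT contact (name, (phone | email)*)>
<!ELEMENT name  (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
<!ELEMENT email (#PCDATA)>

<!-- XML Schema: the same content model, using an anonymous complex type -->
<xs:element name="contact">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element name="phone" type="xs:string"/>
        <xs:element name="email" type="xs:string"/>
      </xs:choice>
    </xs:sequence>
  </xs:complexType>
</xs:element>
```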

5.4 Other Schema Languages for XML
XML Schema is a complex language with great flexibility and power. Other ways of expressing structural metadata for XML have also been devised, though not within the W3C.
5.4.1 RELAX NG
5.4.2 Schematron
5.4.3 Decisions, Decisions, Decisions
5.5 Deriving an Implied Schema from a DTD
Example 5-22 DTD Equivalent to RELAX NG Schema
Example 5-23 An XML Schema Equivalent to RELAX NG Schema and DTD
5.6 Chapter Summary