XML Validation III Schemas

XML Validation III Schemas Robin Burke ECT 360

Outline: Admin; Namespaces review; XML Schemas (data types, elements, attributes); Break; Data types

Admin: Due today: Project milestone #3, Document analysis. Due next week: Homework #3. Due 10/24: Milestone #4: Schema or DTD.

Quiz

Assessment: Homework – 35%; Participation (including in-class exercises and labs) – 15%; Quizzes – 20%; Final project – 30%

Namespaces: A way to identify a set of labels (element / attribute names, attribute values) as belonging to a particular application. Example: course "title" vs. html "title".

Namespace idea: Associate a short prefix with an application. Use the prefix with a colon to "qualify" names: html:title, syll:title.

Namespace idea, cont'd: A namespace is an association between a set of names, a unique identifier (URI), and a prefix used to identify them.

Namespace declaration
Standalone (processing-instruction form from an early namespaces draft; not part of the final W3C Recommendation):
    <?xml:namespace ns="http://bookpeople.com/book" prefix="book"?>
Part of element:
    <html xmlns="http://www.w3.org/1999/xhtml"> (in this case, no prefix)
    <book:book xmlns:book="http://bookpeople.com/book">

Namespaces: Essential if we must combine documents from different applications. For example, we want to define a new XML language using XML: one namespace for the new language, one namespace for the defining language.
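
As a minimal sketch of combining two vocabularies in one document, the fragment below reuses the html and syll prefixes from the earlier slide. The syllabus namespace URI and the syll element names are made up for illustration; only the XHTML namespace URI is a real one.

```xml
<?xml version="1.0"?>
<!-- Hypothetical syllabus namespace URI; the XHTML URI is the real one. -->
<syll:course xmlns:syll="http://example.org/ns/syllabus"
             xmlns:html="http://www.w3.org/1999/xhtml">
  <!-- "title" from the course vocabulary -->
  <syll:title>XML Validation III</syll:title>
  <syll:description>
    <!-- elements from the html vocabulary, qualified by their own prefix -->
    <html:p>Covers <html:em>XML Schema</html:em>.</html:p>
  </syll:description>
</syll:course>
```

An html:title element could appear in the same document without clashing with syll:title, because the two names belong to different namespaces.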

XML so far: Languages defined by DTDs contain text elements and string attributes. OK for text documents. Not enough for databases or business process integration. Need data types.

Solution: XML Schema. Write language definition in XML. More control over document contents. XML document becomes a complex data type; XML language definition becomes a complex data type specification.

XML Schema: Always a separate document (no internal option). Written in XML (very verbose). Can be large and complex.

Schemas and namespaces: A schema uses elements from one application (the XML Schema language) to define another. Namespaces are necessary. Namespaces apply to elements, not values.

Example 1, XML
    <grades assignment="Homework 1">
      <grade>
        <student id="1234-12345">Jane Doe</student>
        <assigned-grade>A</assigned-grade>
      </grade>
      <grade>
        <student id="5432-54321">John Doe</student>
        <assigned-grade>B</assigned-grade>
      </grade>
    </grades>

Example 1, DTD
    <!ELEMENT grades (grade*)>
    <!ATTLIST grades assignment CDATA #IMPLIED>
    <!ELEMENT grade (student, assigned-grade)>
    <!ELEMENT student (#PCDATA)>
    <!ATTLIST student id CDATA #REQUIRED>
    <!ELEMENT assigned-grade (#PCDATA)>

Data types
grades: a collection of items of type grade; can never have more than 40 students
grade: a structure containing a student and an assigned grade
student: a structure containing an id and some text; probably should constrain the student id
assigned-grade: is text; probably should constrain to A-D, F, I

Built-in types: Part of the schema language. Base types: 19 fundamental types (examples: string, decimal). Derived types: 25 more types that use the base types (examples: ID, positiveInteger).

Built-in types, cont'd

To declare an element
    <xs:element name="assigned-grade" type="xs:string"/>
Equivalent to
    <!ELEMENT assigned-grade (#PCDATA)>

Simple data type
A renaming of an existing data type:
    <xs:element name="assigned-grade" type="xs:string"/>
Or a restriction of an existing type, e.g. strings beginning with "A-D"
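
The "strings beginning with A-D" restriction could be sketched roughly as follows. The type name startsWithAtoD is invented for this illustration, and the pattern facet it relies on is covered later in the deck.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical type name: strings whose first character is A, B, C, or D -->
  <xs:simpleType name="startsWithAtoD">
    <xs:restriction base="xs:string">
      <!-- schema patterns must match the whole value, hence the trailing .* -->
      <xs:pattern value="[A-D].*"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- the element now uses the restricted type instead of plain xs:string -->
  <xs:element name="assigned-grade" type="startsWithAtoD"/>
</xs:schema>
```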

Complex datatype
    <xs:element name="name">
      <xs:complexType>
        compositor
        element declarations
        attribute declarations
      </xs:complexType>
    </xs:element>

Compositor: sequence, choice, all

Sequence compositor: like "," in DTD
DTD
    <!ELEMENT foo (bar, baz)>
Schema
    <xs:element name="foo">
      <xs:complexType>
        <xs:sequence>
          <xs:element ref="bar" />
          <xs:element ref="baz" />
        </xs:sequence>
      </xs:complexType>
    </xs:element>

Elements in sequences
Can specify optional / # of occurrences (the DTD ?, *, +):
    ?   <xs:element ref="bar" minOccurs="0" />
    *   <xs:element ref="bar" minOccurs="0" maxOccurs="unbounded" />
    +   <xs:element ref="bar" minOccurs="1" maxOccurs="unbounded" />
What about...
    <xs:element ref="bar" minOccurs="2" maxOccurs="4" />
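
Explicit bounds have no direct DTD operator. As one possible illustration, the earlier observation that grades "can never have more than 40 students" might be written as below, assuming one grade element per student. The grade element is declared as a plain string here only to keep the sketch self-contained.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- placeholder declaration so that the ref below resolves -->
  <xs:element name="grade" type="xs:string"/>

  <xs:element name="grades">
    <xs:complexType>
      <xs:sequence>
        <!-- zero to forty grade children; a DTD can only say grade* -->
        <xs:element ref="grade" minOccurs="0" maxOccurs="40"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```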

Choice compositor: like "|" in DTD
DTD
    <!ELEMENT foo (bar | baz)>
Schema
    <xs:element name="foo">
      <xs:complexType>
        <xs:choice>
          <xs:element ref="bar" />
          <xs:element ref="baz" />
        </xs:choice>
      </xs:complexType>
    </xs:element>

All compositor: no simple DTD equivalent
DTD (both elements required, in either order)
    <!ELEMENT foo ( (bar, baz) | (baz, bar) )>
Schema
    <xs:element name="foo">
      <xs:complexType>
        <xs:all>
          <xs:element ref="bar" />
          <xs:element ref="baz" />
        </xs:all>
      </xs:complexType>
    </xs:element>

Nesting: Compositors can be combined
DTD
    <!ELEMENT foo ( (bar | baz) , (thud | grunt) )>
Schema
    <xs:element name="foo">
      <xs:complexType>
        <xs:sequence>
          <xs:choice>
            <xs:element ref="bar" />
            <xs:element ref="baz" />
          </xs:choice>
          <xs:choice>
            <xs:element ref="thud" />
            <xs:element ref="grunt" />
          </xs:choice>
        </xs:sequence>
      </xs:complexType>
    </xs:element>

Exercise
    <!ELEMENT personal-info (person-name, job-title)>
    <!ELEMENT address (street, city, state, zip)>
    <!ELEMENT street (#PCDATA)>

Local naming
Suppose we want to reuse an element name in a different place in the structure. Example:
    <!ELEMENT url-catalog (link*)>
    <!ELEMENT link (link, description?)>
Not a legal DTD. Schema? Legal: local naming is permitted. Maybe not wise, though.
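
A rough sketch of how local declarations allow the same name at two levels with different types. Treating the inner link as a URL typed xs:anyURI, and description as a plain string, are assumptions made for this illustration only.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="url-catalog">
    <xs:complexType>
      <xs:sequence>
        <!-- outer link: a structure, declared locally inside url-catalog -->
        <xs:element name="link" minOccurs="0" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <!-- inner link: same name, different (simple) type -->
              <xs:element name="link" type="xs:anyURI"/>
              <xs:element name="description" type="xs:string" minOccurs="0"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

The two link declarations live in different content models, so they do not conflict. A reader of the instance document still sees one name with two meanings, which is why the practice may not be wise.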

Using namespaces
Schema must say: to use the schema namespace; what namespace it is defining (targetNamespace); what namespace it is using
Document must say: that it is using the Schema Instance namespace; what namespace(s) it is using; what prefix(es) are used; where to find the relevant schemas for each namespace

Ugly
Schema root element
    <xs:schema elementFormDefault="qualified"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://josquin.cs.depaul.edu/~rburke/namespaces/business-card"
        xmlns="http://josquin.cs.depaul.edu/~rburke/namespaces/business-card">
Document root element
    <business-card
        xmlns="http://josquin.cs.depaul.edu/~rburke/namespaces/business-card"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://josquin.cs.depaul.edu/~rburke/namespaces/business-card biz-card.xsd">

Attributes
DTD attribute types: CDATA, enumeration, token
Schema: can be any of the basic or derived types; can also be user-defined types
Declaration (a default value is only allowed when use is "optional", so "required" and default cannot be combined):
    <xs:attribute name="x" type="xs:string" use="optional" default="abc" />
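
To illustrate attributes typed with built-in types beyond what a DTD offers, here is a small sketch. The element name item and both attribute names are hypothetical. The element has attributes only, so no compositor is needed.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- hypothetical empty element carrying two typed attributes -->
  <xs:element name="item">
    <xs:complexType>
      <!-- must always be present -->
      <xs:attribute name="code" type="xs:string" use="required"/>
      <!-- optional; the validator supplies 1 when the attribute is absent -->
      <xs:attribute name="quantity" type="xs:positiveInteger" default="1"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```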

Attribute declaration
Part of complex type; follows compositor (one exception)
Declaration
    <xs:attribute name="foo" type="xs:positiveInteger" />
What if the attribute is a more complex type itself? We'll get to that.

Exception: simple content. If an element has "simple content", no compositor is used; instead, use a simpleContent element and an extension to declare the type of the content.

Example
DTD
    <!ELEMENT student (#PCDATA)>
    <!ATTLIST student id CDATA #REQUIRED>
Schema
    <xs:element name="student">
      <xs:complexType>
        <xs:simpleContent>
          <xs:extension base="xs:string">
            <xs:attribute name="id" type="xs:string" use="required"/>
          </xs:extension>
        </xs:simpleContent>
      </xs:complexType>
    </xs:element>

How to read this: student is a complex type; it is not simply a renaming of an existing type, because of the attribute. Its content is simple, being of only one type (string), but with an attribute id of type string, which is required.

Exercise
    <!ATTLIST personal-info id ID #IMPLIED>
    <!ELEMENT contact (#PCDATA)>
    <!ATTLIST contact type (email | fax | phone | web) #REQUIRED>
    <!ELEMENT text (#PCDATA)>
    <!ATTLIST text type (endorsement | motto | services) #REQUIRED>

Named types
We can name a complex type and use it wherever a built-in type would work. Example:
    <xs:complexType name="barBaz">
      <xs:sequence>
        <xs:element ref="bar" />
        <xs:element ref="baz" />
      </xs:sequence>
    </xs:complexType>
    <xs:element name="foo" type="barBaz"/>

Built-in types: Part of the schema language. Base types: 19 fundamental types (examples: string, decimal). Derived types: 25 more types that use the base types (examples: ID, positiveInteger).

Built-in types, cont'd

User-defined types: Any use of complexType can be turned into a user-defined type, usually called "standalone". Simple types can be derived from the built-in types.

Standalone types
A type can stand outside of an element definition; it must have a name
    <xs:complexType name="bar-n-baz">
      <xs:sequence>
        <xs:element ref="bar" />
        <xs:element ref="baz" />
      </xs:sequence>
    </xs:complexType>
Used in element definition
    <xs:element name="foo" type="bar-n-baz" />

Mixed content
Can specify that an element has mixed content
    <xs:complexType name="bar-n-baz" mixed="true">
      <xs:sequence>
        <xs:element ref="bar" />
        <xs:element ref="baz" />
      </xs:sequence>
    </xs:complexType>

Mixed content, cont'd
Schema cannot control where the text appears. If this is legal
    <foo>text here <bar>thud</bar><baz>grunt</baz></foo>
So is this
    <foo><bar>thud</bar>more text<baz>grunt</baz>still more</foo>

Deriving types: DTDs do not allow type restrictions beyond enumeration, CDATA, token for attributes and PCDATA for content. Schemas have built-in types and also the capability to create your own.

Derivation operations: list (sequence of values); union (combine two types, allowing either); restriction (placing limits on the legal values)

List
    <xs:element name="partList">
      <xs:simpleType>
        <xs:list itemType="partNo" />
      </xs:simpleType>
    </xs:element>

    <partList>PN334-04 PN223-89 PQ1112-03</partList>
Items must be separated by spaces. Probably more useful to do this with document structure: partList -> partNo*
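
The list above refers to an item type partNo that the slide does not define. One way it might look is sketched here; the pattern is only a guess inferred from the three sample values (two capital letters, three or four digits, a hyphen, two digits).

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- assumed part-number format, inferred from the sample values only -->
  <xs:simpleType name="partNo">
    <xs:restriction base="xs:string">
      <xs:pattern value="[A-Z]{2}[0-9]{3,4}-[0-9]{2}"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- a whitespace-separated list of partNo values -->
  <xs:element name="partList">
    <xs:simpleType>
      <xs:list itemType="partNo"/>
    </xs:simpleType>
  </xs:element>
</xs:schema>
```

Each item is checked against the item type individually, so a single malformed part number makes the whole list invalid.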

Union
Allows data of either type to be used. Example:
    <xs:simpleType name="partNumberField">
      <xs:union memberTypes="partNumberType noPartNum" />
    </xs:simpleType>
Database situation: null is a possible value
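
The two member types are not shown on the slide. A sketch under assumed definitions follows: partNumberType is taken to be a positive integer, and noPartNum a single marker value "none" standing in for a database null. Both definitions are illustrative guesses.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- assumed: part numbers are positive integers -->
  <xs:simpleType name="partNumberType">
    <xs:restriction base="xs:positiveInteger"/>
  </xs:simpleType>

  <!-- assumed: the literal "none" plays the role of null -->
  <xs:simpleType name="noPartNum">
    <xs:restriction base="xs:string">
      <xs:enumeration value="none"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- a value is valid if it matches either member type -->
  <xs:simpleType name="partNumberField">
    <xs:union memberTypes="partNumberType noPartNum"/>
  </xs:simpleType>
</xs:schema>
```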

Restriction: Most useful. Allows the design to state exactly what values are legal: prices must be non-negative; SSN must follow a certain pattern; in-stock must be yes or no; etc.
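
The three constraints listed above could be sketched as follows. The type names and the exact SSN format (three digits, two digits, four digits, hyphen-separated) are assumptions for illustration.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- prices must be non-negative -->
  <xs:simpleType name="price">
    <xs:restriction base="xs:decimal">
      <xs:minInclusive value="0"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- SSN must follow a pattern (assumed format: 123-45-6789) -->
  <xs:simpleType name="ssn">
    <xs:restriction base="xs:string">
      <xs:pattern value="[0-9]{3}-[0-9]{2}-[0-9]{4}"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- in-stock must be yes or no -->
  <xs:simpleType name="inStock">
    <xs:restriction base="xs:string">
      <xs:enumeration value="yes"/>
      <xs:enumeration value="no"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```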

Restriction, cont'd: Restrict a base type according to "facets". Different facets are available for different data types.

Facets

Example: enumeration
    <xs:simpleType name="grade">
      <xs:restriction base="xs:string">
        <xs:enumeration value="A"/>
        <xs:enumeration value="B"/>
        <xs:enumeration value="C"/>
        <xs:enumeration value="D"/>
        <xs:enumeration value="F"/>
        <xs:enumeration value="I"/>
      </xs:restriction>
    </xs:simpleType>

Example: numeric
    <xs:simpleType name="drinkingAge">
      <xs:restriction base="xs:positiveInteger">
        <xs:minInclusive value="21"/>
      </xs:restriction>
    </xs:simpleType>

Example: pattern
Regular expressions again, derived from Perl
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:pattern value="([A-D]|F|I)(\+|\-)?" />
      </xs:restriction>
    </xs:simpleType>

Inheritance
Facet restrictions are inherited: new type derivations must honor them but can restrict them further; new derivations can also alter other facets
For example
monetary type: fractionDigits facet = 2
loan amount type: monetary type + maxValue = 100000
car loan amount: loan amount type + maxValue = 30000
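
The monetary chain above might be written roughly as below. The type names are invented, xs:decimal is an assumed base type, and the slide's "maxValue" is rendered with the actual facet name maxInclusive.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- at most two digits after the decimal point -->
  <xs:simpleType name="monetary">
    <xs:restriction base="xs:decimal">
      <xs:fractionDigits value="2"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- inherits fractionDigits = 2 and adds an upper bound -->
  <xs:simpleType name="loanAmount">
    <xs:restriction base="monetary">
      <xs:maxInclusive value="100000"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- tightens the inherited bound; raising it would not be allowed -->
  <xs:simpleType name="carLoanAmount">
    <xs:restriction base="loanAmount">
      <xs:maxInclusive value="30000"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```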

Complex Types: Possible to derive from complex types, i.e. elements. Use complexContent. Possibilities: extension; restriction (elements, attributes).
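
A minimal sketch of derivation by extension, building on the bar-n-baz type from the standalone-types slide. The derived type name and the extra thud element are chosen here purely for illustration.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- placeholder global elements so that the refs resolve -->
  <xs:element name="bar" type="xs:string"/>
  <xs:element name="baz" type="xs:string"/>
  <xs:element name="thud" type="xs:string"/>

  <xs:complexType name="bar-n-baz">
    <xs:sequence>
      <xs:element ref="bar"/>
      <xs:element ref="baz"/>
    </xs:sequence>
  </xs:complexType>

  <!-- extension appends to the base content: bar, baz, then thud -->
  <xs:complexType name="bar-n-baz-n-thud">
    <xs:complexContent>
      <xs:extension base="bar-n-baz">
        <xs:sequence>
          <xs:element ref="thud"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <xs:element name="foo" type="bar-n-baz-n-thud"/>
</xs:schema>
```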

Exercise
    <!ELEMENT state (#PCDATA)>
    <!ATTLIST contact type (email | fax | phone | web) #REQUIRED>
    <!ELEMENT street (#PCDATA)>

Design decisions: Attribute vs element; Level of granularity; Naming; Schema structure

Attribute vs element: Some specific rules: element ID must be an attribute. General principle: data vs metadata; element for document content, attribute for information about content. Not always easy to tell!

Element: consists of document content; will be shown to a human user; contains substructure; sequence may be important; could be very long; presence depends on other values

Attribute: (opposite of above); must be from an enumeration of values; also consistency

Level of granularity: How detailed to model the data? Very detailed: more work to mark up; more detail in expressing the schema; exceptions must be handled. Less detailed: easier to mark up; easier to schematize; document contents less accessible.

Element content granularity: Fine-grained model: salutation, first name, middle name, last name, appellation. Coarse-grained model: name. Tradeoff: search / sort / organize vs. document creation.

Levels vs recursion
Named levels: <chapter> <section> <subsection> <subsubsection>
Recursion
Tradeoff: ability to rearrange vs. transparency of markup
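
The recursive alternative to named levels can be sketched as a section element that may contain further sections. The title and para children are hypothetical, added only to give each section some content.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="section">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="para" type="xs:string"
                    minOccurs="0" maxOccurs="unbounded"/>
        <!-- recursion: a section may nest sections to any depth -->
        <xs:element ref="section" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

With this design a subsection can be promoted or demoted by moving it, with no renaming; the cost is that the markup no longer states the level explicitly.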

Naming: Case convention: UPPERCASE IS BAD, lowercase better. Multiple words: CapCase, camelCase, Underline_Convention.

Structure
Nested ("Russian doll"): schema looks like the document; small schemas only
Flat: elements defined at global level; references used in complex type definitions
Type-based ("Venetian blind"): all schema complexity in type definitions; one global element
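
As an illustration of the nested style, here is one possible "Russian doll" schema for the grades document of Example 1, translated from its DTD. Typing everything as xs:string mirrors the DTD and is a simplification; the schema shape is a sketch, not the course's reference solution.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- one global element; every other declaration is nested inside it -->
  <xs:element name="grades">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="grade" minOccurs="0" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <!-- text content plus a required id attribute -->
              <xs:element name="student">
                <xs:complexType>
                  <xs:simpleContent>
                    <xs:extension base="xs:string">
                      <xs:attribute name="id" type="xs:string" use="required"/>
                    </xs:extension>
                  </xs:simpleContent>
                </xs:complexType>
              </xs:element>
              <xs:element name="assigned-grade" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <!-- #IMPLIED in the DTD, so optional here -->
      <xs:attribute name="assignment" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

The schema mirrors the document's shape, which makes it easy to read at this size, but none of the nested declarations can be reused elsewhere.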