
Schemas 1 – XML Schema: More Powerful

Schemas 2 – DTD, Schema, RELAX
An XML schema is a description of a type of XML document, typically expressed in terms of constraints on the structure and content of documents of that type, above and beyond the basic syntactical constraints imposed by XML itself. An XML schema provides a view of the document type at a relatively high level of abstraction.
There are languages developed specifically to express XML schemas. The Document Type Definition (DTD) language, which is native to the XML specification, is a schema language of relatively limited capability, but it also has other uses in XML aside from the expression of schemas. Two other very popular, more expressive XML schema languages are XML Schema (W3C) and RELAX NG.

Schemas 3 – Schemas (outline)
- Benefits of Schema
- Explaining Schema
- Namespace
- Inlined Schema
- Datatypes
- Summary of Defining
- Advanced
- Strategy of Defining Schemas

Schemas 4 – It's all about integrated business processes
XML Schema: an XML vocabulary for expressing your data's business rules.
(Diagram: RosettaNet PIPs.)

Schemas 5 – What are XML Schemas?
- Data model: With XML Schemas you specify how your XML data will be organized, and the datatypes of your data. That is, with XML Schemas you model how your data is to be represented in an instance document.
- A contract: Organizations agree to structure their XML documents in conformance with an XML Schema. Thus, the XML Schema acts as a contract between the organizations.
- A rich source of metadata: An XML Schema document contains lots of data about the data in the XML instance documents, such as the datatype of the data, the data's range of values, and how the data is related to another piece of data (parent/child, sibling relationship). In other words, XML Schemas contain metadata.

Schemas 6 – XML Schemas Benefits
With the support for data types:
- It is easier to describe permissible document content
- It is easier to validate the correctness of data
- It is easier to work with data from a database
- It is easier to define data facets (restrictions on data)
- It is easier to define data patterns (data formats)
- It is easier to convert data between different data types

Schemas 7 – Example
Is this data valid? To be valid, it must meet these constraints (data business rules):
1. The location must be comprised of a latitude, followed by a longitude, followed by an indication of the uncertainty of the lat/lon measurements.
2. The latitude must be a decimal with a value between -90 and +90.
3. The longitude must be a decimal with a value between -180 and +180.
4. For both latitude and longitude, the number of digits to the right of the decimal point must be exactly six.
5. The value of uncertainty must be a non-negative integer.
6. The uncertainty units must be either meters or feet.
We can express all these data constraints using XML Schemas.
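The six rules could be written as a schema roughly as follows. This is a minimal sketch, not the slide's own schema: the element names (location, latitude, longitude, uncertainty), the decision to model the units as an attribute, and the namespace http://example.org/geo are all assumptions. A pattern facet is used for rule 4 because the fractionDigits facet only sets a maximum number of fraction digits, not an exact count.

```xml
<?xml version="1.0"?>
<!-- Sketch only: element names, the units attribute, and the
     http://example.org/geo namespace are hypothetical. -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://example.org/geo"
            xmlns="http://example.org/geo"
            elementFormDefault="qualified">

  <!-- Rule 1: latitude, then longitude, then uncertainty -->
  <xsd:element name="location">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="latitude" type="LatitudeType"/>
        <xsd:element name="longitude" type="LongitudeType"/>
        <xsd:element name="uncertainty" type="UncertaintyType"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

  <!-- Rules 2 and 4: decimal in [-90, +90] with exactly six fraction digits -->
  <xsd:simpleType name="LatitudeType">
    <xsd:restriction base="xsd:decimal">
      <xsd:minInclusive value="-90"/>
      <xsd:maxInclusive value="90"/>
      <xsd:pattern value="[+\-]?\d+\.\d{6}"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- Rules 3 and 4: decimal in [-180, +180] with exactly six fraction digits -->
  <xsd:simpleType name="LongitudeType">
    <xsd:restriction base="xsd:decimal">
      <xsd:minInclusive value="-180"/>
      <xsd:maxInclusive value="180"/>
      <xsd:pattern value="[+\-]?\d+\.\d{6}"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- Rules 5 and 6: non-negative integer content plus a required units attribute -->
  <xsd:complexType name="UncertaintyType">
    <xsd:simpleContent>
      <xsd:extension base="xsd:nonNegativeInteger">
        <xsd:attribute name="units" use="required">
          <xsd:simpleType>
            <xsd:restriction base="xsd:string">
              <xsd:enumeration value="meters"/>
              <xsd:enumeration value="feet"/>
            </xsd:restriction>
          </xsd:simpleType>
        </xsd:attribute>
      </xsd:extension>
    </xsd:simpleContent>
  </xsd:complexType>
</xsd:schema>
```

The pattern facet checks the lexical form (how many digits were written), while minInclusive and maxInclusive check the numeric value, so both are needed.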

Schemas 8 – Validating your data
The XML Schema validator checks the data against the schema:
- check that the latitude is between -90 and +90
- check that the longitude is between -180 and +180
- check that the number of fraction digits is 6 for lat and lon
- ...
Result: Data is ok!

Schemas 9 – Purpose of XML Schemas
The idea is to specify:
- the structure of instance documents: "this element contains these elements, which contain these other elements …"
- the datatype of each element/attribute: "this element is an integer with the range 0 to 15,000"

Schemas 10 – DTD Limitations
- It's a different syntax: you write your XML (instance) document using one syntax and the DTD using another syntax, which is inconsistent.
- Limited datatype capability: DTDs support a very limited capability for specifying datatypes. You can't, for example, express "I want the element to hold an integer with a range of 0 to 15,000".
- We want a set of datatypes compatible with those found in databases.
- DTD supports 10 datatypes; XML Schema supports 44+ datatypes.
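To make the datatype limitation concrete, here is a small sketch contrasting the two. The element name quantity is hypothetical, and the schema omits a targetNamespace for brevity (XML Schema permits such no-namespace schemas). A DTD can only say that the element holds character data, while XML Schema can restrict it to integers from 0 to 15,000.

```xml
<?xml version="1.0"?>
<!-- DTD equivalent, for comparison: it can only say "text":
     <!ELEMENT quantity (#PCDATA)> -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- quantity is a hypothetical element name -->
  <xsd:element name="quantity">
    <xsd:simpleType>
      <xsd:restriction base="xsd:integer">
        <xsd:minInclusive value="0"/>
        <xsd:maxInclusive value="15000"/>
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:element>
</xsd:schema>
```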

Schemas 11 – Advantages of Schemas
Schemas' advantages over DTDs:
- Enhanced datatypes: 44+ versus 10.
- You can create your own datatypes, e.g., "this type must follow the pattern ddd-dddd, where 'd' represents a digit".
- Written in the same syntax as instance documents: less syntax to remember.
- Object-oriented-ish: you can extend or restrict a type (derive new type definitions on the basis of old ones).
- Can express sets, i.e., can define the child elements to occur in any order.
- Can specify element content as being unique (keys on content), and uniqueness within a region.
- Can define multiple elements with the same name but different content.
- Can define elements with nil content.
- Can define substitutable elements, e.g., the "Book" element is substitutable for the "Publication" element.
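Two of the capabilities in the list, child elements in any order and nil content, look roughly like this. It is a sketch with hypothetical element names and no targetNamespace; note that in XML Schema 1.0 each child of xsd:all may occur at most once.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Book">
    <xsd:complexType>
      <!-- xsd:all: the children may appear in any order -->
      <xsd:all>
        <xsd:element name="Title" type="xsd:string"/>
        <xsd:element name="Author" type="xsd:string"/>
        <!-- nillable="true" lets an instance write <Date xsi:nil="true"/> -->
        <xsd:element name="Date" type="xsd:gYear" nillable="true"/>
      </xsd:all>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```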

Schemas 12 BookStore.dtd
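The DTD itself is not reproduced in this transcript. Based on the element names used throughout these slides and the Book declaration quoted on slide 16, it plausibly resembles the internal subset below. The content model of BookStore (one or more Book elements) is an assumption, and the bracketed Date and ISBN values are placeholders, not data from the slides.

```xml
<?xml version="1.0"?>
<!-- Reconstruction under assumptions: BookStore holds one or more Book elements -->
<!DOCTYPE BookStore [
  <!ELEMENT BookStore (Book+)>
  <!ELEMENT Book (Title, Author, Date, ISBN, Publisher)>
  <!ELEMENT Title (#PCDATA)>
  <!ELEMENT Author (#PCDATA)>
  <!ELEMENT Date (#PCDATA)>
  <!ELEMENT ISBN (#PCDATA)>
  <!ELEMENT Publisher (#PCDATA)>
]>
<BookStore>
  <Book>
    <Title>My Life and Times</Title>
    <Author>Paul McCartney</Author>
    <Date>[date placeholder]</Date>
    <ISBN>[isbn placeholder]</ISBN>
    <Publisher>McMillin Publishing</Publisher>
  </Book>
</BookStore>
```

Every leaf is #PCDATA, so the DTD accepts any text in Date or ISBN; that is exactly the weakness the following slides address.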

Schemas 13 – DTD Vocabulary
This is the vocabulary that DTDs provide to define your new vocabulary: ELEMENT, ATTLIST, #PCDATA, ID, NMTOKEN, ENTITY, CDATA, ...
Your new vocabulary: BookStore, Book, Title, Author, Date, ISBN, Publisher.

Schemas 14 – Schema Vocabulary
This is the vocabulary that XML Schemas provide to define your new vocabulary: element, complexType, schema, sequence, string, integer, boolean, ...
Your new vocabulary (the targetNamespace): BookStore, Book, Title, Author, Date, ISBN, Publisher.
One difference between XML Schemas and DTDs is that the XML Schema vocabulary is associated with a name (a namespace). Likewise, the new vocabulary that you define is associated with a name (a namespace, the targetNamespace).

Schemas 15 – Example A
NetStore.xsd (xsd = XML Schema Definition):
<xsd:schema xmlns:xsd="…" targetNamespace="…" xmlns="…" elementFormDefault="qualified">
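The body of the Example A schema is missing from the transcript. The sketch below shows what a schema in this style looks like, following the explanations on the next slides: xsd-prefixed schema vocabulary, the target namespace doubling as the default namespace, and ref attributes pointing at global element declarations. The namespace http://example.org/books is a hypothetical stand-in for the elided one, and declaring every leaf as xsd:string is an assumption.

```xml
<?xml version="1.0"?>
<!-- http://example.org/books is a hypothetical namespace -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://example.org/books"
            xmlns="http://example.org/books"
            elementFormDefault="qualified">
  <xsd:element name="BookStore">
    <xsd:complexType>
      <xsd:sequence>
        <!-- Unprefixed ref resolves through the default namespace (the targetNamespace) -->
        <xsd:element ref="Book" minOccurs="1" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="Book">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="Title" minOccurs="1" maxOccurs="1"/>
        <xsd:element ref="Author" minOccurs="1" maxOccurs="1"/>
        <xsd:element ref="Date" minOccurs="1" maxOccurs="1"/>
        <xsd:element ref="ISBN" minOccurs="1" maxOccurs="1"/>
        <xsd:element ref="Publisher" minOccurs="1" maxOccurs="1"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <!-- Global leaf declarations; xsd:string is an assumed type -->
  <xsd:element name="Title" type="xsd:string"/>
  <xsd:element name="Author" type="xsd:string"/>
  <xsd:element name="Date" type="xsd:string"/>
  <xsd:element name="ISBN" type="xsd:string"/>
  <xsd:element name="Publisher" type="xsd:string"/>
</xsd:schema>
```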

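Schemas 16 – Explanation A1
<xsd:schema xmlns:xsd="…" targetNamespace="…" xmlns="…" elementFormDefault="qualified">
The schema's declaration of the Book element (a sequence of Title, Author, Date, ISBN, and Publisher) corresponds to the DTD declaration:
<!ELEMENT Book (Title, Author, Date, ISBN, Publisher)>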

Schemas 17 – Explanation A2
<xsd:schema xmlns:xsd="…" targetNamespace="…" xmlns="…" elementFormDefault="qualified">
All XML Schemas have "schema" as the root element.

Schemas 18 – Explanation A3
<xsd:schema xmlns:xsd="…" targetNamespace="…" xmlns="…" elementFormDefault="qualified">
The elements and datatypes that are used to construct schemas (schema, element, complexType, sequence, string) come from the namespace declared by xmlns:xsd="…", i.e., the XMLSchema namespace.

Schemas 19 – XMLSchema Namespace
(Diagram: the XMLSchema namespace contains element, complexType, schema, sequence, string, integer, boolean, ...)

Schemas 20 – Explanation A4
<xsd:schema xmlns:xsd="…" targetNamespace="…" xmlns="…" elementFormDefault="qualified">
The targetNamespace attribute says that the elements defined by this schema (BookStore, Book, Title, Author, Date, ISBN, Publisher) are to go in this namespace.

Schemas 21 – Target namespace
(Diagram: the targetNamespace contains BookStore, Book, Title, Author, Date, ISBN, Publisher.)

Schemas 22 – Explanation A5
<xsd:schema xmlns:xsd="…" targetNamespace="…" xmlns="…" elementFormDefault="qualified">
ref="Book": this is referencing a Book element declaration. The Book in what namespace? Since there is no namespace qualifier, it is referencing the Book element in the default namespace, which is the targetNamespace! Thus, this is a reference to the Book element declaration in this schema. (The default namespace, declared by xmlns="…", has the same value as the targetNamespace.)

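Schemas 23 – Explanation A6
<xsd:schema xmlns:xsd="…" targetNamespace="…" xmlns="…" elementFormDefault="qualified">
elementFormDefault="qualified" is a directive to any instance documents which conform to this schema: any elements used by the instance document which were declared in this schema must be namespace qualified. (Globally declared elements are always qualified; this setting makes locally declared elements qualified as well.)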

Schemas 24 – Referencing a schema in an XML instance document
<BookStore xmlns="…" xmlns:xsi="…" xsi:schemaLocation="… BookStore.xsd">
(Book data on the slide: My Life and Times; Paul McCartney; July, …; McMillin Publishing.)
1. First, using a default namespace declaration, tell the schema validator that all of the elements used in this instance document come from the schema's target namespace.
2. Second, with schemaLocation, tell the schema validator that the namespace is defined by BookStore.xsd (i.e., schemaLocation contains a pair of values: the namespace name and the schema's location).
3. Third, tell the schema validator that the schemaLocation attribute we are using is the one in the XMLSchema-instance namespace.
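An instance document following these three steps could look like the sketch below. The namespace http://example.org/books is hypothetical, standing in for the elided one, and it must equal the schema's targetNamespace. The bracketed Date and ISBN values are placeholders (the slide's values are not recoverable); they validate only against a schema that declares those elements as xsd:string.

```xml
<?xml version="1.0"?>
<!-- Step 1: default namespace = the schema's targetNamespace (hypothetical URI) -->
<!-- Step 2: schemaLocation pairs that namespace with BookStore.xsd -->
<!-- Step 3: the xsi prefix binds schemaLocation to the XMLSchema-instance namespace -->
<BookStore xmlns="http://example.org/books"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://example.org/books BookStore.xsd">
  <Book>
    <Title>My Life and Times</Title>
    <Author>Paul McCartney</Author>
    <Date>[date placeholder]</Date>
    <ISBN>[isbn placeholder]</ISBN>
    <Publisher>McMillin Publishing</Publisher>
  </Book>
</BookStore>
```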

Schemas 25 – Referencing a schema in an XML instance document
- NetStore.xsd (targetNamespace="…") defines elements in the namespace.
- NetStore.xml (schemaLocation="… NetStore.xsd") uses elements from the namespace.
A schema defines a new vocabulary. Instance documents use that new vocabulary.

Schemas 26 – Note multiple levels of checking
NetStore.xml is checked against NetStore.xsd, which in turn is checked against XMLSchema.xsd (the schema-for-schemas):
- Validate that the XML document conforms to the rules described in NetStore.xsd.
- Validate that NetStore.xsd is a valid schema document, i.e., it conforms to the rules described in the schema-for-schemas.

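Schemas 27 – Default Value for minOccurs and maxOccurs
The default value for minOccurs is "1". The default value for maxOccurs is "1". So an element declaration that states minOccurs="1" maxOccurs="1" explicitly and one that omits both attributes are equivalent!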
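A sketch of the equivalence, using hypothetical type names (Explicit, Defaulted) and no targetNamespace: both content models accept exactly one Title.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Title" type="xsd:string"/>
  <!-- Occurrence constraints written out explicitly -->
  <xsd:complexType name="Explicit">
    <xsd:sequence>
      <xsd:element ref="Title" minOccurs="1" maxOccurs="1"/>
    </xsd:sequence>
  </xsd:complexType>
  <!-- Same meaning: both attributes default to "1" -->
  <xsd:complexType name="Defaulted">
    <xsd:sequence>
      <xsd:element ref="Title"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
```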

Schemas 28 – Qualify XMLSchema, Default targetNamespace
In the first example, we explicitly qualified all elements from the XML Schema namespace. The targetNamespace was the default namespace.
(Diagram: BookStore, Book, Title, Author, Date, ISBN, Publisher in the targetNamespace; element, complexType, schema, sequence, string, integer, boolean in the XMLSchema namespace.)

Schemas 29 – Default XML Schema, Qualify targetNamespace
Alternatively (equivalently), we can design our schema so that XMLSchema is the default namespace (Example B).
(Diagram: BookStore, Book, Title, Author, Date, ISBN, Publisher in the targetNamespace; element, complexType, schema, sequence, string, integer, boolean in the XMLSchema namespace.)

Schemas 30 – Example B
<schema xmlns="…" targetNamespace="…" xmlns:bk="…" elementFormDefault="qualified">
(see example02)
Note that the XMLSchema namespace is the default namespace. Consequently, there are no namespace qualifiers on schema, element, complexType, sequence, string.
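A sketch of an Example B style schema, assuming the hypothetical namespace http://example.org/books for both the targetNamespace and the bk prefix, and xsd:string content for the leaf elements. Because XMLSchema is the default namespace, the schema vocabulary and built-in types are unprefixed, while references to the schema's own elements need the bk: prefix.

```xml
<?xml version="1.0"?>
<!-- http://example.org/books is a hypothetical namespace -->
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://example.org/books"
        xmlns:bk="http://example.org/books"
        elementFormDefault="qualified">
  <element name="BookStore">
    <complexType>
      <sequence>
        <!-- bk:Book refers to the Book declaration in the targetNamespace -->
        <element ref="bk:Book" maxOccurs="unbounded"/>
      </sequence>
    </complexType>
  </element>
  <element name="Book">
    <complexType>
      <sequence>
        <element ref="bk:Title"/>
        <element ref="bk:Author"/>
        <element ref="bk:Date"/>
        <element ref="bk:ISBN"/>
        <element ref="bk:Publisher"/>
      </sequence>
    </complexType>
  </element>
  <!-- Unprefixed "string" resolves to the default (XMLSchema) namespace -->
  <element name="Title" type="string"/>
  <element name="Author" type="string"/>
  <element name="Date" type="string"/>
  <element name="ISBN" type="string"/>
  <element name="Publisher" type="string"/>
</schema>
```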

Schemas 31 – Explanation B1
<schema xmlns="…" targetNamespace="…" xmlns:bk="…" elementFormDefault="qualified">
ref="bk:Book": here we are referencing a Book element. Where is that Book element defined? In what namespace? The bk: prefix indicates what namespace this element is in, and bk: has been set to be the same as the targetNamespace.

Schemas 32 "bk:" References the targetNamespace BookStore Book Title Author Date ISBN Publisher (targetNamespace) bk element complexType schema sequence string integer boolean Consequently, bk:Book refers to the Book element in the targetNamespace.

Schemas 33 – Inlining Element Declarations
In the previous examples we declared an element and then referenced that element declaration (with ref). Alternatively, we can inline the element declarations. On the following slide is an alternate (equivalent) way of representing the schema shown previously, using inlined element declarations.

Schemas 34 – Example C: Inlined
<xsd:schema xmlns:xsd="…" targetNamespace="…" xmlns="…" elementFormDefault="qualified">
Note that we have moved all the element declarations inline, and we are no longer referring to the element declarations. This results in a much more compact schema!
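A sketch of the inlined form, again assuming the hypothetical namespace http://example.org/books and string-typed leaves. Every declaration except BookStore is now local, which is why elementFormDefault="qualified" matters here: it makes those local elements namespace qualified in instances.

```xml
<?xml version="1.0"?>
<!-- http://example.org/books is a hypothetical namespace -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://example.org/books"
            xmlns="http://example.org/books"
            elementFormDefault="qualified">
  <xsd:element name="BookStore">
    <xsd:complexType>
      <xsd:sequence>
        <!-- Book is declared inline instead of referenced with ref -->
        <xsd:element name="Book" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="Title" type="xsd:string"/>
              <xsd:element name="Author" type="xsd:string"/>
              <xsd:element name="Date" type="xsd:string"/>
              <xsd:element name="ISBN" type="xsd:string"/>
              <xsd:element name="Publisher" type="xsd:string"/>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```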

Schemas 35 – Explanation C1
<xsd:schema xmlns:xsd="…" targetNamespace="…" xmlns="…" elementFormDefault="qualified">
The complexTypes nested inside the inlined element declarations are anonymous types (they have no name).

Schemas 36 – Example D: Named types
<xsd:schema xmlns:xsd="…" targetNamespace="…" xmlns="…" elementFormDefault="qualified">
The advantage of splitting out Book's element declarations and wrapping them in a named type is that now this type can be reused by other elements.
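A sketch of the named-type form. The type name BookPublication is taken from slide 37, which cites it as an example complexType name; the namespace http://example.org/books and the string-typed leaves are assumptions.

```xml
<?xml version="1.0"?>
<!-- http://example.org/books is a hypothetical namespace -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://example.org/books"
            xmlns="http://example.org/books"
            elementFormDefault="qualified">
  <xsd:element name="BookStore">
    <xsd:complexType>
      <xsd:sequence>
        <!-- Book uses the named type; other elements could reuse it too -->
        <xsd:element name="Book" type="BookPublication" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <!-- Named (global) complexType -->
  <xsd:complexType name="BookPublication">
    <xsd:sequence>
      <xsd:element name="Title" type="xsd:string"/>
      <xsd:element name="Author" type="xsd:string"/>
      <xsd:element name="Date" type="xsd:string"/>
      <xsd:element name="ISBN" type="xsd:string"/>
      <xsd:element name="Publisher" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
```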

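Schemas 37 – Summary A of Declaring Elements
<xsd:element name="…" type="…" minOccurs="…" maxOccurs="…"/>
- type: a simple type (e.g., xsd:string) or the name of a complexType (e.g., BookPublication)
- minOccurs: a nonnegative integer
- maxOccurs: a nonnegative integer or "unbounded"
Alternatively, the element's type can be given inline as an anonymous xsd:complexType (…) inside the element declaration.
Note: minOccurs and maxOccurs can only be used in nested (local) element declarations.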

Schemas 38 – The date Datatype
A built-in datatype (i.e., schema validators know about this datatype). This datatype is used to represent a specific day (year-month-day). Elements declared to be of type date must follow this form: CCYY-MM-DD
- range for CC is: 00-99
- range for YY is: 00-99
- range for MM is: 01-12
- range for DD is: 01-28 if month is 2; 01-29 if month is 2 and the gYear is a leap gYear; 01-30 if month is 4, 6, 9, or 11; 01-31 if month is 1, 3, 5, 7, 8, 10, or 12
Example: 1999-05-31 represents May 31, 1999.

Schemas 39 – The gYear Datatype
A built-in datatype (Gregorian calendar year). Elements declared to be of type gYear must follow this form: CCYY
- range for CC is: 00-99
- range for YY is: 00-99
Example: 1999 indicates the gYear 1999.

Schemas 40 – Example E
<xsd:schema xmlns:xsd="…" targetNamespace="…" xmlns="…" elementFormDefault="qualified">
Here we are defining a new (user-defined) datatype, called ISBNType. We then declare Date to be of type gYear, and ISBN to be of type ISBNType (defined above).

Schemas 41 "I hereby declare a new type called ISBNType. It is a restricted form of the string type. Elements declared of this type must conform to one of the following patterns: - First Pattern: 1 digit followed by a dash followed by 5 digits followed by another dash followed by 3 digits followed by another dash followed by 1 more digit, or - Second Pattern: 1 digit followed by a dash followed by 3 digits followed by another dash followed by 5 digits followed by another dash followed by 1 more digit, or - Third Pattern: 1 digit followed by a dash followed by 2 digits followed by another dash followed by 6 digits followed by another dash followed by 1 more digit." These patterns are specified using Regular Expressions. Explanation E1

Schemas 42 – Equivalent Expressions
The three patterns can also be combined into a single pattern; within a regular expression, the vertical bar means "or".
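A sketch of the single-pattern equivalent. Because the whole value must match, A|B|C accepts exactly the values that match A, B, or C in full, which is the same set as three separate pattern facets in one restriction.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- One pattern with alternation; same value space as three pattern facets -->
  <xsd:simpleType name="ISBNType">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="\d{1}-\d{5}-\d{3}-\d{1}|\d{1}-\d{3}-\d{5}-\d{1}|\d{1}-\d{2}-\d{6}-\d{1}"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
```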

Schemas 43 – complexType or simpleType?
When do you use the complexType element and when do you use the simpleType element?
- Use the complexType element when you want to define child elements and/or attributes of an element.
- Use the simpleType element when you want to create a new type that is a refinement of a built-in type (string, date, gYear, etc.).

Schemas 44 – Built-in Datatypes
Primitive datatypes (atomic, built-in), with examples or formats:
- string: "Hello World"
- boolean: {true, false, 1, 0}
- decimal: 7.08
- float, double: 12.56E3, 12, 12560, 0, -0, INF, -INF, NaN
- duration: P1Y2M3DT10H30M12.3S
- dateTime: format CCYY-MM-DDThh:mm:ss
- time: format hh:mm:ss.sss
- date: format CCYY-MM-DD
- gYearMonth: format CCYY-MM
- gYear: format CCYY
- gMonthDay: format --MM-DD
Note: 'T' is the date/time separator. INF = infinity, NaN = not-a-number.

Schemas 45 – Built-in Datatypes (cont.)
Primitive datatypes (atomic, built-in):
- gDay: format ---DD (note the 3 dashes)
- gMonth: format --MM (the first edition of the XML Schema Recommendation wrote --MM--)
- hexBinary: a hex string
- base64Binary: a base64 string
- anyURI: a URI reference
- QName: a namespace qualified name
- NOTATION: a NOTATION from the XML spec

Schemas 46 – Built-in Datatypes (cont.)
Derived types (subtypes of primitive datatypes):
- normalizedString: a string without tabs, line feeds, or carriage returns
- token: a string without tabs, line feeds, leading/trailing spaces, or consecutive spaces
- language: any valid xml:lang value, e.g., EN, FR, ...
- IDREFS, ENTITIES, NMTOKEN, NMTOKENS: should be used only with attributes
- Name: an XML Name
- NCName: a name with no namespace qualifier (no colon), e.g., the local part of a QName
- ID, IDREF, ENTITY: should be used only with attributes
- integer: e.g., 456
- nonPositiveInteger: negative infinity to 0

Schemas 47 – Built-in Datatypes (cont.)
Derived types (subtypes of primitive datatypes):
- negativeInteger: negative infinity to -1
- long: -9223372036854775808 to 9223372036854775807
- int: -2147483648 to 2147483647
- short: -32768 to 32767
- byte: -128 to 127
- nonNegativeInteger: 0 to infinity
- unsignedLong: 0 to 18446744073709551615
- unsignedInt: 0 to 4294967295
- unsignedShort: 0 to 65535
- unsignedByte: 0 to 255
- positiveInteger: 1 to infinity
Note: for compatibility with DTDs, the following types should be used only with attributes (which we will discuss later): ID, IDREF, IDREFS, NMTOKEN, NMTOKENS, ENTITY, and ENTITIES.

Schemas 48 – Creating Your Own Datatypes
A new datatype can be defined from an existing datatype (called the "base" type) by specifying values for one or more of the optional facets for the base type.
Example: the string primitive datatype has six optional facets:
- length
- minLength
- maxLength
- pattern
- enumeration
- whiteSpace (legal values: preserve, replace, collapse)

Schemas 49 – New Datatype by Specifying Facet Values
1. This creates a new datatype called 'TelephoneNumber'.
2. Elements of this type can hold string values,
3. but the string length must be exactly 8 characters long, and
4. the string must follow the pattern ddd-dddd, where 'd' represents a digit.
(Obviously, in this example the regular expression makes the length facet redundant.)
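A sketch of the TelephoneNumber type as described in the four numbered points, without a targetNamespace (an assumption for brevity).

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:simpleType name="TelephoneNumber">
    <xsd:restriction base="xsd:string">
      <!-- Exactly 8 characters (redundant given the pattern below) -->
      <xsd:length value="8"/>
      <!-- Three digits, a dash, four digits -->
      <xsd:pattern value="\d{3}-\d{4}"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
```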

Schemas 50 – Another Example
This creates a new type called shape. An element declared to be of this type must have either the value circle, or triangle, or square.
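A sketch of the shape type using one enumeration facet per allowed value (no targetNamespace, an assumption for brevity).

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Only these three string values are allowed -->
  <xsd:simpleType name="shape">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="circle"/>
      <xsd:enumeration value="triangle"/>
      <xsd:enumeration value="square"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
```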

Schemas 51 – Facets of the integer Datatype
The integer datatype has 8 optional facets:
- totalDigits
- pattern
- whiteSpace
- enumeration
- maxInclusive
- maxExclusive
- minInclusive
- minExclusive

Schemas 52 – Example
This creates a new datatype called 'EarthSurfaceElevation'. Elements declared to be of this type can hold an integer. However, the integer is restricted to have a value between … and 29035, inclusive.
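A sketch of EarthSurfaceElevation. The lower bound is elided in this transcript, so only the upper bound of 29035 is written out; the comment marks where the missing minInclusive facet belongs.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:simpleType name="EarthSurfaceElevation">
    <xsd:restriction base="xsd:integer">
      <!-- The lower bound is elided in the transcript; add
           <xsd:minInclusive value="..."/> here with the intended value. -->
      <xsd:maxInclusive value="29035"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
```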

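Schemas 53 – General Form of Creating a New Datatype by Specifying Facet Values
An xsd:simpleType contains an xsd:restriction whose base attribute names the source type; inside the restriction, each facet is an element carrying a value attribute. …
- Facets: length, minLength, maxLength, pattern, enumeration, minInclusive, maxInclusive, minExclusive, maxExclusive, ...
- Sources: string, boolean, decimal, float, double, duration, dateTime, time, ...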

Schemas 54 – Multiple Facets: "and" them together, or "or" them together?
An element declared to be of type TelephoneNumber must be a string of length 8, and the string must follow the pattern: 3 digits, dash, 4 digits.
An element declared to be of type shape must be a string with a value of either circle, or triangle, or square.
Rule: patterns and enumerations given in the same restriction are "or"ed together; all other facets are "and"ed together.

Schemas 55 Creating a simpleType from another simpleType Thus far we have created a simpleType using one of the built-in datatypes as our base type. However, we can create a simpleType that uses another simpleType as the base.

Schemas 56 – Example F
This simpleType uses EarthSurfaceElevation as its base type.
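A sketch of deriving one simpleType from another. The derived type name LowlandElevation and its bounds (0 to 500) are hypothetical, not the slide's, and the base type omits its lower bound because the transcript elides it. A restriction may only narrow the base's range, so 500 must not exceed the base's 29035.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Base type; lower bound omitted because the transcript elides it -->
  <xsd:simpleType name="EarthSurfaceElevation">
    <xsd:restriction base="xsd:integer">
      <xsd:maxInclusive value="29035"/>
    </xsd:restriction>
  </xsd:simpleType>
  <!-- Hypothetical derived type: a narrower range within the base type -->
  <xsd:simpleType name="LowlandElevation">
    <xsd:restriction base="EarthSurfaceElevation">
      <xsd:minInclusive value="0"/>
      <xsd:maxInclusive value="500"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
```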

Schemas 57 – Fixing a Facet Value
Sometimes when we define a simpleType we want to require that one (or more) facets have an unchanging value, that is, to make the facet a constant. simpleTypes which derive from this simpleType may not change this facet.
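A sketch of a fixed facet, using hypothetical type names and bounds. The fixed="true" attribute on minInclusive freezes that facet, while maxInclusive stays open to further narrowing.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical type: the lower bound is fixed, the upper bound is not -->
  <xsd:simpleType name="ClassSize">
    <xsd:restriction base="xsd:nonNegativeInteger">
      <xsd:minInclusive value="10" fixed="true"/>
      <xsd:maxInclusive value="60"/>
    </xsd:restriction>
  </xsd:simpleType>
  <!-- Allowed: narrows maxInclusive and leaves the fixed minInclusive alone -->
  <xsd:simpleType name="SmallClassSize">
    <xsd:restriction base="ClassSize">
      <xsd:maxInclusive value="20"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
```

A derived type that set minInclusive to any value other than 10 would make the schema invalid.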

Schemas 58 – Element Containing a User-Defined Simple Type
Example: create a schema element declaration for an elevation element. Declare the elevation element to be an integer with a range of … to …. Here's one way of declaring the elevation element:

Schemas 59 – Element Containing a User-Defined Simple Type (cont.)
Here's an alternative method for declaring elevation: the simpleType definition is defined inline, so it is an anonymous simpleType definition. The disadvantage of this approach is that this simpleType cannot be reused by other elements.
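The two approaches side by side, as a sketch. The names ElevationType and elevationInline are hypothetical (two global elements cannot share the name elevation in one schema), and the bounds 0 and 1000 are illustrative only, since the slide's range is elided.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Approach 1: a named simpleType, referenced by name (reusable) -->
  <xsd:simpleType name="ElevationType">
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="0"/>    <!-- illustrative bound -->
      <xsd:maxInclusive value="1000"/> <!-- illustrative bound -->
    </xsd:restriction>
  </xsd:simpleType>
  <xsd:element name="elevation" type="ElevationType"/>

  <!-- Approach 2: an anonymous simpleType defined inline (not reusable) -->
  <xsd:element name="elevationInline">
    <xsd:simpleType>
      <xsd:restriction base="xsd:integer">
        <xsd:minInclusive value="0"/>
        <xsd:maxInclusive value="1000"/>
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:element>
</xsd:schema>
```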

Schemas 60 – Summary B of Declaring Elements

Schemas 61 – Annotating Schemas
The annotation element is used for documenting the schema, both for humans and for programs.
- Use documentation for providing a comment to humans.
- Use appinfo for providing a comment to programs.
The content is any well-formed XML. Note that annotations have no effect on schema validation.
Example: the following constraint is not expressible with XML Schema 1.0: the value of element A should be greater than the value of element B. So, we need to use a separate tool (e.g., Schematron) to check this constraint. We will express this constraint ("A should be greater than B") in the appinfo section.
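A sketch of carrying the A-greater-than-B rule in an annotation. The element names Pair, A, and B are hypothetical, and the rule is written as an ISO Schematron pattern; extracting embedded Schematron from a schema requires separate tooling, which is not shown. An XML Schema validator simply ignores the appinfo content.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Pair">
    <xsd:annotation>
      <!-- For humans -->
      <xsd:documentation xml:lang="en">A should be greater than B</xsd:documentation>
      <!-- For programs: an ISO Schematron rule, ignored by schema validation -->
      <xsd:appinfo>
        <sch:pattern xmlns:sch="http://purl.oclc.org/dsdl/schematron">
          <sch:rule context="Pair">
            <sch:assert test="number(A) &gt; number(B)">A should be greater than B</sch:assert>
          </sch:rule>
        </sch:pattern>
      </xsd:appinfo>
    </xsd:annotation>
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="A" type="xsd:integer"/>
        <xsd:element name="B" type="xsd:integer"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```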

Schemas 62 – Where Can You Put Annotations?
You cannot put annotations at just any random location in the schema. Here are the rules for where an annotation element can go:
- annotations may occur before and after any global component
- annotations may occur only at the beginning of non-global components

Schemas 63 – Example G
<xsd:schema xmlns:xsd="…" targetNamespace="…" xmlns="…" elementFormDefault="qualified">
Annotations can be put only at the locations marked on the slide. Suppose that you want to annotate, say, the Date element declaration. What do we do?

Schemas 64 – Explanation G1
<xsd:schema xmlns:xsd="…" targetNamespace="…" xmlns="…" elementFormDefault="qualified">
How to annotate the Date element: inline the annotation within the Date element declaration.
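A sketch of the inline annotation, assuming Date is a local declaration inside Book (no targetNamespace). Because Date is not global, its annotation must be the first child of the Date declaration itself. The documentation text is a hypothetical example.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Book">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Title" type="xsd:string"/>
        <!-- Local declaration: the annotation goes at its beginning -->
        <xsd:element name="Date" type="xsd:gYear">
          <xsd:annotation>
            <xsd:documentation>The year the book was published.</xsd:documentation>
          </xsd:annotation>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```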

Schemas 65 – Two Optional Attributes for the documentation Element
In the previous example we showed documentation with no attributes. Actually, it can have two attributes:
- source: this attribute contains a URL to a file which contains supplemental information
- xml:lang: this attribute specifies the language that the documentation was written in

Schemas 66 – Appinfo
In the previous example, appinfo was shown with no attributes. Actually, it can have one attribute:
- source: this attribute contains a URL to a file which contains supplemental information
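A sketch showing these attributes in place. Both URLs are hypothetical placeholders, and the documentation text is illustrative.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Date" type="xsd:gYear">
    <xsd:annotation>
      <!-- documentation with source and xml:lang (hypothetical URL) -->
      <xsd:documentation source="http://example.org/docs/date.html" xml:lang="en">
        Publication year.
      </xsd:documentation>
      <!-- appinfo with its source attribute (hypothetical URL) -->
      <xsd:appinfo source="http://example.org/tools/date-rules.xml"/>
    </xsd:annotation>
  </xsd:element>
</xsd:schema>
```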

Schemas 67 – Ways of Using Schema
An XML Schema can be used for:
- validating XML documents
- automatic GUI generation
- automatic API generation
- the Semantic Web???
- smart editors

Schemas 68 – Describing Metadata using Schemas
XML Schema strategy: two documents are used to provide metadata:
- a schema document specifies the properties (metadata) for a class of resources (objects);
- each instance document provides specific values for the properties.

Schemas 69 XML Schema: Specifies the Properties for a Class of Resources "For the class of Book resources, we identify five properties - Title, Author, Date, ISBN, and Publisher"

Schemas 70 – Regular Expressions
Recall that the string datatype has a pattern facet. The value of a pattern facet is a regular expression. Below are some examples of regular expressions (regular expression: example matches):
- Chapter \d: Chapter 1
- a*b: b, ab, aab, aaab, …
- [xyz]b: xb, yb, zb
- a?b: b, ab
- a+b: ab, aab, aaab, …
- [a-c]x: ax, bx, cx

Schemas 71 – Regular Expressions (cont.)
- [a-c]x: ax, bx, cx
- [-ac]x: -x, ax, cx
- [ac-]x: ax, cx, -x
- [^0-9]x: any non-digit char followed by x
- \Dx: any non-digit char followed by x
- Chapter\s\d: Chapter followed by a whitespace character (e.g., a blank) followed by a digit
- (ho){2} there: hoho there
- (ho\s){2} there: "ho" followed by a whitespace character, twice, then a space and "there"
- .abc: any (one) char followed by abc
- (a|b)+x: ax, bx, aax, bbx, abx, bax, ...

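Schemas 72 – Regular Expressions (cont.)
- a{1,3}x: ax, aax, aaax
- a{2,}x: aax, aaax, aaaax, …
- \w\s\w: a word character, followed by a whitespace character, followed by a word character (in XML Schema, \w matches any character except punctuation, separators, and "other" characters such as controls, so letters, digits, and symbols qualify but the dash does not)
- [a-zA-Z-[Ol]]*: a string comprised of any lower and upper case letters, except "O" and "l"
- \.: the period "." (without the backward slash, the period means "any character")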
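A sketch putting two of these expressions into pattern facets; the type names are hypothetical. Remember that an XML Schema pattern must match the whole value, so no ^ or $ anchors are needed (or available).

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Letters only, excluding "O" and "l" (character class subtraction) -->
  <xsd:simpleType name="UnambiguousLetters">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="[a-zA-Z-[Ol]]*"/>
    </xsd:restriction>
  </xsd:simpleType>
  <!-- \. is a literal period: accepts 1.0 or 12.34, rejects 1x0 -->
  <xsd:simpleType name="VersionNumber">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="\d+\.\d+"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
```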

Schemas 73 – Regular Expressions (cont.)
- \n: linefeed
- \r: carriage return
- \t: tab
- \\: the backward slash \
- \|: the vertical bar |
- \-: the hyphen -
- \^: the caret ^
- \?: the question mark ?
- \*: the asterisk *
- \+: the plus sign +
- \{ and \}: the open and close curly braces { }
- \( and \): the open and close parens ( )
- \[ and \]: the open and close square brackets [ ]

Schemas 74 – Regular Expressions (concluded)
- \p{L}: a letter, from any language
- \p{Lu}: an uppercase letter, from any language
- \p{Ll}: a lowercase letter, from any language
- \p{N}: a number (Roman numerals, fractions, etc.)
- \p{Nd}: a digit, from any language
- \p{P}: a punctuation symbol
- \p{Sc}: a currency sign, from any language
Example: a pattern meaning "currency sign from any language, followed by one or more digits from any language, optionally followed by a period and two digits from any language" matches values such as $45.99 and ¥300.
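One way to write the currency pattern, derived from its prose description since the slide's own pattern text is not preserved; the type name Price is hypothetical.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Currency sign, one or more digits, then optionally "." and two digits.
       Matches $45.99 and ¥300; rejects 45.99 (no currency sign). -->
  <xsd:simpleType name="Price">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="\p{Sc}\p{Nd}+(\.\p{Nd}\p{Nd})?"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
```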

Schemas 75 – Example R.E.
[1-9]?[0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5]
- [1-9]?[0-9]: 0 to 99
- 1[0-9][0-9]: 100 to 199
- 2[0-4][0-9]: 200 to 249
- 25[0-5]: 250 to 255
This regular expression restricts a string to have values between 0 and 255. Such a R.E. might be useful in describing an IP address...

Schemas 76 – IP Datatype Definition
<xsd:pattern value="(([1-9]?[0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\.){3}
([1-9]?[0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])"/>
Datatype for representing IP addresses. Examples: …, etc. This datatype restricts each field of the IP address to have a value between zero and 255, i.e., [0-255].[0-255].[0-255].[0-255].
Note: in the value attribute (above) the regular expression has been split over two lines. This is for readability purposes only. In practice the R.E. would all be on one line.
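A sketch of the complete type with the pattern on one line, as it would appear in practice. The type name IPAddress is hypothetical. The pattern covers dotted IPv4 addresses only, and because [1-9]?[0-9] cannot begin with 0, fields with leading zeros such as 01 are rejected.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:simpleType name="IPAddress">
    <xsd:annotation>
      <xsd:documentation>
        Datatype for representing IP addresses: four fields, each 0 to 255.
      </xsd:documentation>
    </xsd:annotation>
    <xsd:restriction base="xsd:string">
      <!-- Three "field." groups followed by a final field, all on one line -->
      <xsd:pattern value="(([1-9]?[0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\.){3}([1-9]?[0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
```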

Schemas 77 – Derived Types
We can do a form of subclassing complexType definitions. We call this "derived types":
- derive by extension: extend the parent complexType with more elements
- derive by restriction: create a type which is a subset of the base type. There are two ways to subset the elements: redefine a base type element to have a restricted range of values, or redefine a base type element to have a more restricted number of occurrences.

Schemas 78 – Example H
<xsd:schema xmlns:xsd="…" targetNamespace="…" xmlns="…" elementFormDefault="qualified">
Note that BookPublication extends the Publication type, i.e., we are doing derive by extension (see example06).

Schemas 79 – Explanation H1
Elements declared to be of type BookPublication will have 5 child elements: Title, Author, Date, ISBN, and Publisher. Note that the elements in the derived type are appended to the elements in the base type.
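A sketch of derivation by extension in the spirit of Example H. The occurrence constraints and leaf types are assumptions, and the targetNamespace is omitted for brevity. The extension's sequence is appended after the base type's sequence, giving Title, Author, Date, ISBN, Publisher.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Base type -->
  <xsd:complexType name="Publication">
    <xsd:sequence>
      <xsd:element name="Title" type="xsd:string" maxOccurs="unbounded"/>
      <xsd:element name="Author" type="xsd:string" maxOccurs="unbounded"/>
      <xsd:element name="Date" type="xsd:gYear"/>
    </xsd:sequence>
  </xsd:complexType>
  <!-- Derived by extension: ISBN and Publisher come after Date -->
  <xsd:complexType name="BookPublication">
    <xsd:complexContent>
      <xsd:extension base="Publication">
        <xsd:sequence>
          <xsd:element name="ISBN" type="xsd:string"/>
          <xsd:element name="Publisher" type="xsd:string"/>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
  <xsd:element name="Book" type="BookPublication"/>
</xsd:schema>
```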

Schemas 80 – Explanation H2
(Diagram: BookPublication consists of Publication's elements, Title, Author, and Date, followed by ISBN and Publisher.)

Schemas 81 – Deleting an element in the base type
Note that in this subtype we have eliminated the Author element, i.e., the subtype is just comprised of an unbounded number of Title elements followed by a single Date element. If the base type has an element with minOccurs="0", and the subtype wishes to not have that element, then it can simply leave it out.
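A sketch of derivation by restriction that drops Author. The type name ZeroAuthorPublication is hypothetical, and the base type is written with Author as minOccurs="0", which is what makes leaving it out legal. A restriction restates the full content model it keeps.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Base type: Author is optional, so a restriction may omit it -->
  <xsd:complexType name="Publication">
    <xsd:sequence>
      <xsd:element name="Title" type="xsd:string" maxOccurs="unbounded"/>
      <xsd:element name="Author" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/>
      <xsd:element name="Date" type="xsd:gYear"/>
    </xsd:sequence>
  </xsd:complexType>
  <!-- Derived by restriction: unbounded Titles followed by a single Date -->
  <xsd:complexType name="ZeroAuthorPublication">
    <xsd:complexContent>
      <xsd:restriction base="Publication">
        <xsd:sequence>
          <xsd:element name="Title" type="xsd:string" maxOccurs="unbounded"/>
          <xsd:element name="Date" type="xsd:gYear"/>
        </xsd:sequence>
      </xsd:restriction>
    </xsd:complexContent>
  </xsd:complexType>
</xsd:schema>
```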

Schemas 82 Prohibiting Derivations Sometimes we may want to create a type and disallow all derivations of it, or just disallow extension derivations, or just disallow restriction derivations. Rationale: "For example, I may create a complexType and make it publicly available for others to use. However, I don't want them to extend it with their proprietary extensions or subset it to remove, say, copyright information." (Jon Cleaver) This is done with the final attribute of the complexType. With final="#all", Publication can be neither extended nor restricted. With final="restriction", Publication cannot be restricted. With final="extension", Publication cannot be extended.
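Here is a sketch of the final attribute on a complexType. The element types are assumptions. The comment lists the other two values.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- final="#all": no extension and no restriction of Publication.
       final="restriction" would forbid only restriction;
       final="extension" would forbid only extension. -->
  <xsd:complexType name="Publication" final="#all">
    <xsd:sequence>
      <xsd:element name="Title" type="xsd:string" maxOccurs="unbounded"/>
      <xsd:element name="Author" type="xsd:string" maxOccurs="unbounded"/>
      <xsd:element name="Date" type="xsd:gYear"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
```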

Schemas 83 Terminology: Declaration vs Definition In a schema: - You declare elements and attributes. Schema components that are declared are those that have a representation in an XML instance document. - You define components that are used just within the schema document(s). Schema components that are defined are those that have no representation in an XML instance document. Declarations: - element declarations - attribute declarations Definitions: - type (simple, complex) definitions - attribute group definitions - model group definitions

Schemas 84 Terminology: Global versus Local Global element declarations, global type definitions: - These are element declarations/type definitions that are immediate children of <xsd:schema>. Local element declarations, local type definitions: - These are element declarations/type definitions that are nested within other elements/types.

Schemas 85 Example I: a schema labeled to show a global type definition, a global element declaration, local element declarations, and a local type definition. <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" targetNamespace="…" xmlns="…" elementFormDefault="qualified">
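This small no-namespace sketch labels the four kinds of component. All names are illustrative assumptions.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Global type definition: immediate child of xsd:schema -->
  <xsd:complexType name="PublicationType">
    <xsd:sequence>
      <!-- Local element declarations: nested inside a type -->
      <xsd:element name="Title" type="xsd:string"/>
      <xsd:element name="Author" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
  <!-- Global element declaration -->
  <xsd:element name="BookStore">
    <!-- Local (anonymous) type definition -->
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Book" type="PublicationType" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

Only global element declarations (here BookStore) can serve as the root of an instance document or be referenced elsewhere with the ref attribute.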

Schemas 86 Element Substitution In daily conversation there are several ways to express something. In Boston we use the words "T" and "subway" interchangeably. For example, "we took the T into town" or "we took the subway into town". Thus, "T" and "subway" are substitutable. Which one is used may depend upon what part of the state you live in, what mood you're in, or any number of factors. We would like to be able to express this substitutability in XML Schemas. That is, we would like to declare in a schema an element called "subway", an element called "T", and state that "T" may be substituted for "subway". Instance documents can then use either <subway> or <T>, depending on their preference.

Schemas 87 substitutionGroup We can define a group of substitutable elements (called a substitutionGroup) by declaring an element (called the head) and then declaring other elements which state that they are substitutable for the head element. In the example, subway is the head element, and T is substitutable for subway. So what's the big deal? Anywhere a head element can be used in an instance document, any member of the substitutionGroup can be substituted!

Schemas 88 Example J. Instance doc: a subway element containing "Red Line". Alternative instance doc (substitute T for subway): a T element containing "Red Line". This example shows the <subway> element being substituted with the <T> element.
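A minimal sketch of Example J follows. It assumes no target namespace and a hypothetical wrapper element named transportation.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Head element: must be a global declaration -->
  <xsd:element name="subway" type="xsd:string"/>
  <!-- Member: its type must be the head's type or derived from it -->
  <xsd:element name="T" type="xsd:string" substitutionGroup="subway"/>
  <!-- Hypothetical wrapper that refers only to the head -->
  <xsd:element name="transportation">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="subway"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

Both <transportation><subway>Red Line</subway></transportation> and <transportation><T>Red Line</T></transportation> should then be valid.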

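Schemas 89 Explanation J1. The same mechanism supports localization. An alternative instance doc, customized for our Spanish clients, can use a Spanish-named element from the same substitutionGroup, containing "Linea Roja" instead of "Red Line".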

Schemas 90 BookType and MagazineType Derive from PublicationType Diagram: PublicationType, with BookType and MagazineType derived from it. In order for Book and Magazine to be in a substitutionGroup with Publication, their types (BookType and MagazineType, respectively) must be the same as, or derived from, Publication's type (PublicationType).

Schemas 91 Example K
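Here is a sketch of Example K under stated assumptions: no target namespace, element types chosen for illustration, BookType derived by extension, and MagazineType derived by restriction (dropping the optional Author).

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="PublicationType">
    <xsd:sequence>
      <xsd:element name="Title" type="xsd:string"/>
      <xsd:element name="Author" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/>
      <xsd:element name="Date" type="xsd:gYear"/>
    </xsd:sequence>
  </xsd:complexType>
  <!-- Derived by extension: adds ISBN and Publisher -->
  <xsd:complexType name="BookType">
    <xsd:complexContent>
      <xsd:extension base="PublicationType">
        <xsd:sequence>
          <xsd:element name="ISBN" type="xsd:string"/>
          <xsd:element name="Publisher" type="xsd:string"/>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
  <!-- Derived by restriction: drops the optional Author -->
  <xsd:complexType name="MagazineType">
    <xsd:complexContent>
      <xsd:restriction base="PublicationType">
        <xsd:sequence>
          <xsd:element name="Title" type="xsd:string"/>
          <xsd:element name="Date" type="xsd:gYear"/>
        </xsd:sequence>
      </xsd:restriction>
    </xsd:complexContent>
  </xsd:complexType>
  <xsd:element name="Publication" type="PublicationType"/>
  <xsd:element name="Book" type="BookType" substitutionGroup="Publication"/>
  <xsd:element name="Magazine" type="MagazineType" substitutionGroup="Publication"/>
  <xsd:element name="BookStore">
    <xsd:complexType>
      <xsd:sequence>
        <!-- Any member of Publication's substitutionGroup may appear here -->
        <xsd:element ref="Publication" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```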

Schemas 92 Instance document for Example K: a Book (Illusions: The Adventures of a Reluctant Messiah, Richard Bach, Dell Publishing Co.), a Magazine (Natural Health, 1999), and a Book (The First and Last Freedom, J. Krishnamurti, Harper & Row). <BookStore> can contain any element in the substitutionGroup with Publication! XML ex K

Schemas 93 <!ATTLIST Book Category (autobiography | non-fiction | fiction) #REQUIRED InStock (true | false) "false" Reviewer CDATA " "> BookStore.dtd Example L: Attributes

Schemas 94 Example M (see example07): the XML Schema version of the DTD attribute declarations Category (autobiography | non-fiction | fiction) #REQUIRED, InStock (true | false) "false", and Reviewer CDATA " ".

Schemas 95 "Instance documents are required to have the Category attribute (as indicated by use="required"). The value of Category must be either autobiography, non-fiction, or fiction (as specified by the enumeration facets)." Note: attributes can only have simpleTypes (i.e., attributes cannot have child elements). Explanation M1
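A sketch of Example M using a named attributeGroup follows. The group name BookAttributes and the child elements of Book are assumptions.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:attributeGroup name="BookAttributes">
    <!-- Required, restricted by enumeration facets -->
    <xsd:attribute name="Category" use="required">
      <xsd:simpleType>
        <xsd:restriction base="xsd:string">
          <xsd:enumeration value="autobiography"/>
          <xsd:enumeration value="non-fiction"/>
          <xsd:enumeration value="fiction"/>
        </xsd:restriction>
      </xsd:simpleType>
    </xsd:attribute>
    <xsd:attribute name="InStock" type="xsd:boolean" default="false"/>
    <xsd:attribute name="Reviewer" type="xsd:string" default=" "/>
  </xsd:attributeGroup>
  <xsd:element name="Book">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Title" type="xsd:string"/>
        <xsd:element name="Author" type="xsd:string"/>
      </xsd:sequence>
      <!-- Attributes come after the content model -->
      <xsd:attributeGroup ref="BookAttributes"/>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

There is one difference from the DTD: xsd:boolean also accepts 1 and 0, not only true and false.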

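Schemas 96 Summary of Declaring Attributes 1. Declare the attribute with a type attribute naming a simple type. 2. Declare it with an inlined simpleType. The use attribute takes one of three values: required, optional, or prohibited. The type can be any simple type: xsd:string, xsd:integer, xsd:boolean, ... Do not combine default and fixed. If you specify a default value, either omit use or set it to "optional".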

Schemas 97 Inlining Attributes On the next slide is another way of expressing the last example - the attributes are inlined within the Book declaration rather than being separately defined in an attributeGroup. (I only show a portion of the schema - the Book element declaration.)

Schemas 98 (see example08) Example N
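Here is the same Book declaration with the attributes inlined instead of grouped. This is again a sketch, and the child elements are assumed.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Book">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Title" type="xsd:string"/>
        <xsd:element name="Author" type="xsd:string"/>
      </xsd:sequence>
      <!-- Inlined attribute declarations, after the content model -->
      <xsd:attribute name="Category" use="required">
        <xsd:simpleType>
          <xsd:restriction base="xsd:string">
            <xsd:enumeration value="autobiography"/>
            <xsd:enumeration value="non-fiction"/>
            <xsd:enumeration value="fiction"/>
          </xsd:restriction>
        </xsd:simpleType>
      </xsd:attribute>
      <xsd:attribute name="InStock" type="xsd:boolean" default="false"/>
      <xsd:attribute name="Reviewer" type="xsd:string" default=" "/>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```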

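Schemas 99 Notes about Attributes Within a complexType, the attribute declarations always come last, after the element declarations. The attributes always belong to the element within whose declaration they are defined (nested). For example, if the declarations of attributes bar and boo are nested within the declaration of element foo, then "bar and boo are attributes of foo".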

Schemas 100 These attributes apply to the element they are nested within (Book). That is, Book has three attributes: Category, InStock, and Reviewer. cont.

Schemas 101 Element with Simple Content and Attributes Example. Consider this: <elevation units="feet">5440</elevation> The elevation element has these two constraints: - it has simple (integer) content - it has an attribute called units How do we declare elevation? (see next slide)

Schemas 102 1. elevation contains an attribute, so we must use <xsd:complexType>. 2. However, elevation does not contain child elements (which is what we generally use <complexContent> to indicate). Instead, elevation contains simpleContent. 3. We wish to extend the simpleContent (an integer) with an attribute. cont.
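Here is a sketch of the declaration this reasoning leads to: a complexType with simpleContent that extends xsd:integer with a units attribute.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="elevation">
    <!-- complexType because elevation has an attribute -->
    <xsd:complexType>
      <!-- simpleContent because there are no child elements -->
      <xsd:simpleContent>
        <!-- integer text content, extended with a units attribute -->
        <xsd:extension base="xsd:integer">
          <xsd:attribute name="units" type="xsd:string"/>
        </xsd:extension>
      </xsd:simpleContent>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```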

Schemas 103 elevation: Use a Stronger Datatype In the declaration for elevation we allowed it to hold any integer. Further, we allowed the units attribute to hold any string. Let's restrict elevation to hold an integer within a bounded range, and let's restrict units to hold either the string "feet" or the string "meters".

Schemas 104 cont.
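The following sketches the stronger typing. It assumes a range of 0 to 12,000 purely for illustration. The type names elevationType and unitsType are hypothetical.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Assumed bounds, for illustration only -->
  <xsd:simpleType name="elevationType">
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="0"/>
      <xsd:maxInclusive value="12000"/>
    </xsd:restriction>
  </xsd:simpleType>
  <xsd:simpleType name="unitsType">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="feet"/>
      <xsd:enumeration value="meters"/>
    </xsd:restriction>
  </xsd:simpleType>
  <xsd:element name="elevation">
    <xsd:complexType>
      <xsd:simpleContent>
        <!-- Extend the restricted integer type with a restricted attribute -->
        <xsd:extension base="elevationType">
          <xsd:attribute name="units" type="unitsType"/>
        </xsd:extension>
      </xsd:simpleContent>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```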

Schemas 105 Summary of Declaring Elements 1. Element with Simple Content. Declaring an element using a built-in type: Declaring an element using a user-defined simpleType: An alternative formulation of the above shapes example is to inline the simpleType definition:
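Here are sketches of the three formulations. The element names numStudents, shape and shapeInline and the enumeration values are hypothetical. Two different element names are used because a schema cannot declare two global elements with the same name.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Built-in type -->
  <xsd:element name="numStudents" type="xsd:positiveInteger"/>

  <!-- User-defined simpleType, referenced by name -->
  <xsd:simpleType name="shapes">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="triangle"/>
      <xsd:enumeration value="rectangle"/>
      <xsd:enumeration value="square"/>
    </xsd:restriction>
  </xsd:simpleType>
  <xsd:element name="shape" type="shapes"/>

  <!-- Same constraint with the simpleType inlined (anonymous) -->
  <xsd:element name="shapeInline">
    <xsd:simpleType>
      <xsd:restriction base="xsd:string">
        <xsd:enumeration value="triangle"/>
        <xsd:enumeration value="rectangle"/>
        <xsd:enumeration value="square"/>
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:element>
</xsd:schema>
```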

Schemas 106 Summary of Declaring Elements (cont.) 2. Element Contains Child Elements Defining the child elements inline: An alternate formulation of the above Person example is to create a named complexType and then use that type:
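Here are sketches of the two Person formulations. The child element names and the second element name, Employee, are assumptions.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Child elements defined inline (anonymous complexType) -->
  <xsd:element name="Person">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="FirstName" type="xsd:string"/>
        <xsd:element name="LastName" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

  <!-- Named complexType, then an element declared with that type -->
  <xsd:complexType name="PersonType">
    <xsd:sequence>
      <xsd:element name="FirstName" type="xsd:string"/>
      <xsd:element name="LastName" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:element name="Employee" type="PersonType"/>
</xsd:schema>
```

The named form lets several elements share one type.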

Schemas 107 Summary of Declaring Elements (cont.) 3. Element Contains a complexType that is an Extension of another complexType

Schemas 108 Summary of Declaring Elements (cont.) 4. Element Contains a complexType that is a Restriction of another complexType

Schemas 109 Summary of Declaring Elements (concluded) 5. Element Contains Simple Content and Attributes. Example: an element whose content is the text "Large, green, sour" and which also carries an attribute.

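Schemas 110 complexContent versus simpleContent With complexContent you extend or restrict a complexType: in <xsd:complexContent><xsd:extension base="X"> …, X must be a complexType. With simpleContent you extend a simpleType, or extend or restrict a complexType that has simple content: in <xsd:simpleContent><xsd:extension base="Y"> …, Y must be a simpleType (or a complexType with simple content). Do Lab 8.b, 8.c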

Schemas 111 group Element The group element enables you to group together element declarations. Note: the group element is just for grouping together element declarations, no attribute declarations allowed!

Schemas 112 <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" targetNamespace="…" xmlns="…" elementFormDefault="qualified"> cont.

Schemas 113 Another example showing the use of the <group> element cont.
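This sketch defines a named model group and references it. The group name PublicationElements is hypothetical.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Named model group: element declarations only, no attributes -->
  <xsd:group name="PublicationElements">
    <xsd:sequence>
      <xsd:element name="Title" type="xsd:string"/>
      <xsd:element name="Author" type="xsd:string" maxOccurs="unbounded"/>
      <xsd:element name="Date" type="xsd:gYear"/>
    </xsd:sequence>
  </xsd:group>
  <xsd:element name="Book">
    <xsd:complexType>
      <xsd:sequence>
        <!-- The group's elements are spliced in here -->
        <xsd:group ref="PublicationElements"/>
        <xsd:element name="ISBN" type="xsd:string"/>
        <xsd:element name="Publisher" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

For attributes, the analogous construct is xsd:attributeGroup.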

Schemas 114 Expressing Alternates DTD: <!ELEMENT transportation (train | plane | automobile)> XML Schema: <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" targetNamespace="…" xmlns="…" elementFormDefault="qualified"> (see example10) Note: the choice is an exclusive-or, that is, transportation can contain only one element: either train, or plane, or automobile.
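Here is a sketch of the XML Schema side of the transportation example. The element content types are assumptions.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="transportation">
    <xsd:complexType>
      <!-- Exactly one of the three (minOccurs and maxOccurs default to 1) -->
      <xsd:choice>
        <xsd:element name="train" type="xsd:string"/>
        <xsd:element name="plane" type="xsd:string"/>
        <xsd:element name="automobile" type="xsd:string"/>
      </xsd:choice>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```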

Schemas 115 Expressing Repeatable Choice <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" targetNamespace="…" xmlns="…" elementFormDefault="qualified"> Notes: 1. An element can fix its value, using the fixed attribute. 2. When you don't specify a value for minOccurs, it defaults to "1". The same is true for maxOccurs. See the last example (transportation), where we used a <choice> element with no minOccurs or maxOccurs. (see example 11)
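This sketch shows a repeatable choice with fixed values. It uses the hypothetical names binary-string, zero and one, and corresponds to the DTD content model (zero | one)*.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="binary-string">
    <xsd:complexType>
      <!-- The choice itself repeats: zero or more picks of zero/one -->
      <xsd:choice minOccurs="0" maxOccurs="unbounded">
        <xsd:element name="zero" type="xsd:unsignedByte" fixed="0"/>
        <xsd:element name="one" type="xsd:unsignedByte" fixed="1"/>
      </xsd:choice>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

An instance such as <binary-string><one/><zero/><one/></binary-string> should validate, and each empty element takes its fixed value.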

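Schemas 116 fixed/default Element Values When you declare an element you can give it a fixed or default value. Then, in the instance document, you can leave the element empty. For example, if an element is declared with fixed="0", an empty instance of it is equivalent to one containing 0. If an element is declared with default="red", an empty instance of it is equivalent to one containing red.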

Schemas 117 Using <sequence> and <choice> <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" targetNamespace="…" xmlns="…" elementFormDefault="qualified">
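This sketch nests a choice inside a sequence. It uses hypothetical meal-related names. The equivalent DTD would be <!ELEMENT meal ((soup | salad), entree, dessert?)>.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="meal">
    <xsd:complexType>
      <xsd:sequence>
        <!-- First child: either soup or salad -->
        <xsd:choice>
          <xsd:element name="soup" type="xsd:string"/>
          <xsd:element name="salad" type="xsd:string"/>
        </xsd:choice>
        <!-- Then a required entree and an optional dessert, in order -->
        <xsd:element name="entree" type="xsd:string"/>
        <xsd:element name="dessert" type="xsd:string" minOccurs="0"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```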

Schemas 118 Expressing Any Order Problem: create an element, Book, which contains Author, Title, Date, ISBN, and Publisher, in any order (Note: this is very difficult and ugly with DTDs). XML Schema: <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" targetNamespace="…" xmlns="…" elementFormDefault="qualified"> <all> means that Book must contain all five child elements, but they may occur in any order. (see example 12)

Schemas 119 Constraints on using <all> - Elements declared within <all> must have a maxOccurs value of "1" (minOccurs can be either "0" or "1"). - If a complexType uses <all> and it extends another type, then that parent type must have empty content. - The <all> element cannot be nested within either <sequence>, <choice>, or another <all>. - The contents of <all> must be just elements. It cannot contain <sequence> or <choice>.
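Here is a sketch of Example 12's Book with <all>. The element types are assumptions. The constraints above are those of XML Schema 1.0. XML Schema 1.1 relaxes some of them; for example, it allows maxOccurs greater than 1 inside <all>.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Book">
    <xsd:complexType>
      <!-- All five children required, in any order; each occurs at most once -->
      <xsd:all>
        <xsd:element name="Author" type="xsd:string"/>
        <xsd:element name="Title" type="xsd:string"/>
        <xsd:element name="Date" type="xsd:gYear"/>
        <xsd:element name="ISBN" type="xsd:string"/>
        <xsd:element name="Publisher" type="xsd:string"/>
      </xsd:all>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```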

Schemas 120 Empty Element <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" targetNamespace="…" xmlns="…" elementFormDefault="qualified">
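This sketch shows an empty element that carries only an attribute. It assumes a hypothetical image element with an href attribute. The DTD equivalent is <!ELEMENT image EMPTY> plus <!ATTLIST image href CDATA #REQUIRED>.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="image">
    <!-- A complexType with attributes but no content model has empty content -->
    <xsd:complexType>
      <xsd:attribute name="href" type="xsd:anyURI" use="required"/>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

An instance would then look like <image href="photo.gif"/>.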

Schemas 121 No targetNamespace (noNamespaceSchemaLocation) Sometimes you may wish to create a schema without putting the elements within a namespace. The targetNamespace attribute is actually an optional attribute of <schema>. Thus, if you don’t want to specify a namespace for your schema, simply don’t use the targetNamespace attribute. Consequences of having no namespace: 1. In the instance document, don’t namespace-qualify the elements. 2. In the instance document, use noNamespaceSchemaLocation instead of schemaLocation.

Schemas 122 Note that there is no targetNamespace attribute, and note that there is no longer a default namespace. <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified"> cont.
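Here is a minimal sketch of a no-namespace BookStore.xsd. Book's child elements are assumptions based on the instance data on the next slide.

```xml
<!-- No targetNamespace and no default namespace:
     unprefixed references such as ref="Book" resolve to no-namespace components -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            elementFormDefault="qualified">
  <xsd:element name="BookStore">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="Book" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="Book">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Title" type="xsd:string"/>
        <xsd:element name="Author" type="xsd:string"/>
        <xsd:element name="Publisher" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```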

Schemas 123 <BookStore xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="BookStore.xsd"> My Life and Times Paul McCartney McMillin Publishing … (see example14) 1. Note that there is no default namespace declaration. So, none of the elements are associated with a namespace. 2. Note that we do not use xsi:schemaLocation (since it requires a pair of values: a namespace and a URL to the schema for that namespace). Instead, we use xsi:noNamespaceSchemaLocation. cont.

Schemas 124 Assembling an Instance Document from Multiple Schema Documents An instance document may be composed of elements from multiple schemas. Validation can apply to the entire XML instance document, or to a single element.

Schemas 125 Validating against two schemas <Library xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="… Book.xsd … Employee.xsd"> My Life and Times Paul McCartney Macmillan Publishing Illusions The Adventures of a Reluctant Messiah Richard Bach Dell Publishing Co. The First and Last Freedom J. Krishnamurti Harper & Row John Doe Sally Smith The <Book> elements are defined in Book.xsd, and the <Employee> elements are defined in Employee.xsd. The remaining elements (such as <Library>) are not defined in any schema! 1. A schema validator will validate each Book element against Book.xsd. 2. It will validate each Employee element against Employee.xsd. 3. It will not validate the other elements. cont.

Schemas 126 Assembling a Schema from Multiple Schema Documents The include element allows you to access components in other schemas. - All the schemas you include must have the same namespace as your schema (i.e., the schema that is doing the include), or no targetNamespace at all, in which case their components take on the including schema's namespace. - The net effect of include is as though you had typed all the definitions directly into the containing schema. Diagram: Library.xsd includes LibraryBook.xsd and LibraryEmployee.xsd.

Schemas 127 Library.xsd <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" targetNamespace="…" xmlns="…" elementFormDefault="qualified"> These are referencing element declarations in the other schemas. Nice!
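Here is a sketch of Library.xsd. It assumes a hypothetical namespace, http://www.example.org/library, that LibraryBook.xsd and LibraryEmployee.xsd also use as their targetNamespace. It also assumes those files declare global Book and Employee elements.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.example.org/library"
            xmlns="http://www.example.org/library"
            elementFormDefault="qualified">
  <!-- Included schemas share this targetNamespace (hypothetical URI) -->
  <xsd:include schemaLocation="LibraryBook.xsd"/>
  <xsd:include schemaLocation="LibraryEmployee.xsd"/>
  <xsd:element name="Library">
    <xsd:complexType>
      <xsd:sequence>
        <!-- References to declarations that live in the included files -->
        <xsd:element ref="Book" maxOccurs="unbounded"/>
        <xsd:element ref="Employee" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```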

Schemas 128 Assembling a Schema from Multiple Schema Documents with Different Namespaces The import element allows you to access elements and types in a different namespace: <xsd:import namespace="A" schemaLocation="A.xsd"/> <xsd:import namespace="B" schemaLocation="B.xsd"/> … Diagram: C.xsd imports A.xsd (namespace A) and B.xsd (namespace B).

Schemas 129 Ex: Multiple Schemas. Camera Schema: Camera.xsd uses components from Nikon.xsd, Olympus.xsd, and Pentax.xsd.

Schemas 130 Camera.xsd (see example 18) <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" targetNamespace="…" xmlns:nikon="…" xmlns:olympus="…" xmlns:pentax="…" elementFormDefault="qualified"> <xsd:import namespace="…" schemaLocation="Nikon.xsd"/> <xsd:import namespace="…" schemaLocation="Olympus.xsd"/> <xsd:import namespace="…" schemaLocation="Pentax.xsd"/> These import elements give us access to the components in these other schemas. Here I am using the body_type that is defined in the Nikon namespace. Ex cont.
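Here is a sketch of Camera.xsd. The example.org namespace URIs are hypothetical placeholders, as are the element names and the types lens_type and manual_adapter_type. Only body_type is named on the slide.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.example.org/camera"
            xmlns="http://www.example.org/camera"
            xmlns:nikon="http://www.example.org/nikon"
            xmlns:olympus="http://www.example.org/olympus"
            xmlns:pentax="http://www.example.org/pentax"
            elementFormDefault="qualified">
  <!-- Each import brings in the components of one foreign namespace -->
  <xsd:import namespace="http://www.example.org/nikon" schemaLocation="Nikon.xsd"/>
  <xsd:import namespace="http://www.example.org/olympus" schemaLocation="Olympus.xsd"/>
  <xsd:import namespace="http://www.example.org/pentax" schemaLocation="Pentax.xsd"/>
  <xsd:element name="camera">
    <xsd:complexType>
      <xsd:sequence>
        <!-- Local elements typed with types from the imported namespaces -->
        <xsd:element name="body" type="nikon:body_type"/>
        <xsd:element name="lens" type="olympus:lens_type"/>
        <xsd:element name="manual_adapter" type="pentax:manual_adapter_type"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

The body, lens and manual_adapter elements themselves belong to the camera namespace. Their child elements come from the imported namespaces, assuming the imported schemas also use elementFormDefault="qualified".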

Schemas 131 Nikon.xsd <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" targetNamespace="…" xmlns="…" elementFormDefault="qualified"> Olympus.xsd <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" targetNamespace="…" xmlns="…" elementFormDefault="qualified"> Pentax.xsd <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" targetNamespace="…" xmlns="…" elementFormDefault="qualified"> Ex. cont.

Schemas 132 Camera.xml <c:camera xmlns:c="…" xmlns:nikon="…" xmlns:olympus="…" xmlns:pentax="…" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="… Camera.xsd … Nikon.xsd … Olympus.xsd … Pentax.xsd"> Ergonomically designed casing for easy handling 300mm 1.2 1/10,000 sec to 100 sec The Camera instance uses elements from the Nikon, Olympus, and Pentax namespaces. Ex. cont.

Schemas 133 Creating Lists There are times when you will want an element to contain a list of values, e.g., "The contents of the Numbers element is a list of numbers". Example: a document containing a Lottery drawing might have a Numbers element holding the drawn numbers. How do we declare the element Numbers so that (1) it contains a list of integers, (2) each integer is restricted to be between 1 and 99, and (3) the total number of integers in the list is exactly six?

Schemas 134 Lottery.xml (see example19) <LotteryDrawings xmlns="…" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="… Lottery.xsd"> July July July cont.

Schemas 135 Lottery.xsd <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" targetNamespace="…" xmlns="…" elementFormDefault="qualified"> cont.

Schemas 136 LotteryNumbers: Need Stronger Datatyping The list in the previous schema has two problems: - It allows <Numbers> to contain an arbitrarily long list. - The numbers in the list may be any positiveInteger. We need to: - Restrict the list to length value="6". - Restrict the numbers to maxInclusive value="99".

Schemas 137 <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" targetNamespace="…" xmlns="…" elementFormDefault="qualified"> cont.

Schemas 138 NumbersList is a list where the type of each item is OneToNintyNine. LotteryNumbers restricts NumbersList to a length of six (i.e., an element declared to be of type LotteryNumbers must hold a list of numbers, between 1 and 99, and the length of the list must be exactly six). cont.
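Here is a sketch of the three types described above, assuming no target namespace. The element name Numbers comes from the earlier slides, and the type names are as written in the text.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Item type: positive integers up to 99, i.e. 1..99 -->
  <xsd:simpleType name="OneToNintyNine">
    <xsd:restriction base="xsd:positiveInteger">
      <xsd:maxInclusive value="99"/>
    </xsd:restriction>
  </xsd:simpleType>
  <!-- Whitespace-separated list of such items -->
  <xsd:simpleType name="NumbersList">
    <xsd:list itemType="OneToNintyNine"/>
  </xsd:simpleType>
  <!-- On a list type, length counts list items -->
  <xsd:simpleType name="LotteryNumbers">
    <xsd:restriction base="NumbersList">
      <xsd:length value="6"/>
    </xsd:restriction>
  </xsd:simpleType>
  <xsd:element name="Numbers" type="LotteryNumbers"/>
</xsd:schema>
```

<Numbers>21 3 67 8 90 12</Numbers> should validate. A seventh number, or a value of 100, should not.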

Schemas 139 Alternatively, This is read as: "We are creating a new type called LotteryNumbers. It is a restriction. At this point we can either use the base attribute or a simpleType child element to indicate the type that we are restricting (you cannot use both the base attribute and the simpleType child element). We want to restrict the type that is a list of OneToNintyNine. We will restrict that type to a length of 6."
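The inline alternative is shown below as a fragment. It assumes a OneToNintyNine simpleType is defined elsewhere in the same schema and that the xsd prefix is bound. The simpleType child must precede the length facet.

```xml
<xsd:simpleType name="LotteryNumbers">
  <!-- No base attribute: the anonymous simpleType child supplies the base -->
  <xsd:restriction>
    <xsd:simpleType>
      <xsd:list itemType="OneToNintyNine"/>
    </xsd:simpleType>
    <!-- Facets follow the simpleType child -->
    <xsd:length value="6"/>
  </xsd:restriction>
</xsd:simpleType>
```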

Schemas 140 Summary of Declaring simpleTypes 3. A simpleType that declares a list type, where the datatype OneToNintyNine is declared separately. 4. An alternate form of the above, where the list's datatype is specified using an inlined simpleType.

Schemas 141 5. A simpleType that declares a union type, where the datatype UnboundedType is declared separately. 6. An alternate form of the above, where the datatype UnboundedType is specified using an inline simpleType. cont.
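Here is a sketch of a union type. The names maxOccursType and limit are hypothetical. A value is valid if it matches any member type.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- The single string "unbounded" -->
  <xsd:simpleType name="UnboundedType">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="unbounded"/>
    </xsd:restriction>
  </xsd:simpleType>
  <!-- Either a non-negative integer or "unbounded" -->
  <xsd:simpleType name="maxOccursType">
    <xsd:union memberTypes="xsd:nonNegativeInteger UnboundedType"/>
  </xsd:simpleType>
  <xsd:element name="limit" type="maxOccursType"/>
</xsd:schema>
```

<limit>5</limit> and <limit>unbounded</limit> should both validate.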

Schemas 142 any Element The <any> element enables the instance document author to extend his/her document with elements not specified by the schema. Now an instance document author can optionally extend the content of <Book> elements with any element (after <Publisher>).
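This sketch assumes the five Book children used throughout. Setting processContents="lax" is a choice made here so that elements without a declaration are accepted. The default, strict, would require a declaration for each element that matches the wildcard.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Book">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Title" type="xsd:string"/>
        <xsd:element name="Author" type="xsd:string"/>
        <xsd:element name="Date" type="xsd:gYear"/>
        <xsd:element name="ISBN" type="xsd:string"/>
        <xsd:element name="Publisher" type="xsd:string"/>
        <!-- Optional extension point: any elements may follow Publisher -->
        <xsd:any minOccurs="0" maxOccurs="unbounded" processContents="lax"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```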

Schemas 143 Strategy for Defining Semantics of your XML Elements Capture the semantics in the XML Schema: - Describe the semantics within the <annotation> element. - Adopt the convention that every element and attribute has an annotation which provides information on its meaning. Advantages: - The XML Schema will capture the data structure, meta-data, and relationships between the elements. - Use of strong typing will capture much of the data content. - The annotations can capture definitions and other explanatory information. - The structure of the "definitions" will always be consistent with the structure used in the schema, since they are linked. - Since the schema itself is an XML document, we can use XSLT to extract the annotations and transform the "semantic" information into a format suitable for human consumption.
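Here is a sketch of the annotation convention applied to one element. The documentation wording is illustrative. For machine-readable information, xsd:annotation can also contain xsd:appinfo.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="ISBN" type="xsd:string">
    <!-- annotation must be the first child of the declaration -->
    <xsd:annotation>
      <xsd:documentation xml:lang="en">
        International Standard Book Number identifying this edition of the book.
      </xsd:documentation>
    </xsd:annotation>
  </xsd:element>
</xsd:schema>
```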

Schemas 144 Cont.