2
XML Documents and Schema in greater depth
3
In one sense XML is …
- A language-neutral way of representing structured data
- The analogy to a serialized object is easiest to understand in this context
- A great intermediate data format for applications that talk cross-platform, cross-language, etc.
4
Equivalently, XML is
- A flexible format for describing any kind of data (document).
  - Like HTML, but you can define whatever tags you want for your application.
  - Actually more like SGML: HTML is really a Document Type in SGML (Standard Generalized Markup Language).
- A self-describing format:
  - an XML document gives complete information about which fields its values are associated with
  - an application doesn't have to infer the field names from the order.
- It just describes a document.
  - It doesn't say what it means.
  - It doesn't tell how to display it.
5
Some sample XML documents
6
Article example
Direct Marketer Offended by Term 'Junk Mail'
Joe Garden
Tim Harrod
Dan Spengler, CEO of the direct-mail-marketing firm Mailbox of Savings, took umbrage Monday at the use of the term "junk mail."
http://www.theonion.com/archive/3-11-01.html
7
Order / Whitespace
Note that element order is important, but whitespace between elements is not. As far as the XML parser is concerned, this is the same document:
Direct Marketer Offended by Term 'Junk Mail' Joe Garden Tim Harrod Dan Spengler, CEO of the direct-mail-marketing firm Mailbox of Savings, took umbrage Monday at the use of the term "junk mail." http://www.theonion.com/archive/3-11-01.html
8
Molecule Example
<CML>
  <MOL>
    <ATOMS>
      <ARRAY>H O H</ARRAY>
    </ATOMS>
    <BONDS>
      <ARRAY>1 2</ARRAY>
      <ARRAY>2 3</ARRAY>
      <ARRAY>1 1</ARRAY>
    </BONDS>
  </MOL>
</CML>
9
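Rooms example
<rooms>
  <room>
    <capacity>10</capacity>
    <equipmentList>
      <equipment>Projector</equipment>
    </equipmentList>
  </room>
  <room>
    <capacity>5</capacity>
    <features>
      <feature>No Roof</feature>
    </features>
  </room>
</rooms>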
10
Suggestion
- Try building each of those documents in XMLSpy.
- Note: you are not required to create a schema to do this. Just create a new XML document and start building.
11
Dissecting an XML Document
12
Things that can appear in an XML document
- ELEMENTS: simple, complex, empty, or mixed content; attributes.
- The XML declaration
- Processing instructions (PIs)
  - Most common is the xml-stylesheet instruction
- Comments
13
Parts of an XML document
[Figure: the molecule document annotated with labels for the declaration, begin tags, end tags, attributes, and attribute values.]
An XML element is everything from (including) the element's start tag to (including) the element's end tag.
14
XML and Trees
Tags give the structure of a document. They divide the document up into elements, starting at the topmost element, the root element. The stuff inside an element is its content; content can include other elements along with 'character data'.
[Figure: tree for the molecule document. CML is the root element, with MOL beneath it, then ATOMS and BONDS, each holding ARRAY elements. The character data (H O H; 1 2; 2 3; 1 1) sits at the leaves, labeled "CDATA sections".]
15
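XML and Trees
[Figure: the molecule document shown next to its tree. CML is the root element, with MOL, ATOMS, BONDS, and ARRAY elements beneath it. The leaves (H O H; 1 2; 2 3; 1 1) are labeled "Data sections".]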
16
XML and Trees
[Figure: tree for the rooms document.]
rooms
  room
    capacity: 10
    equipmentlist
      equipment: projector
  room
    capacity: 5
    features
      feature: No Roof
17
More detail on elements
18
Element relationships
My First XML
Introduction to XML
What is HTML
What is XML
XML Syntax
Elements must have a closing tag
Elements must be properly nested
- Book is the root element.
- Title, prod, and chapter are child elements of book.
- Book is the parent element of title, prod, and chapter.
- Title, prod, and chapter are siblings (or sister elements) because they have the same parent.
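The markup of this example did not survive, so the following is only a sketch of what such a document could look like. The element names book, title, prod, and chapter come from the slide text; para is taken from the "Element content" slide; the exact nesting and the empty form of prod are assumptions.

```xml
<?xml version="1.0"?>
<!-- book is the root element; title, prod and chapter are its children -->
<book>
  <title>My First XML</title>
  <!-- prod is written as an empty element here (assumption) -->
  <prod/>
  <chapter>Introduction to XML
    <para>What is HTML</para>
    <para>What is XML</para>
  </chapter>
  <chapter>XML Syntax
    <para>Elements must have a closing tag</para>
    <para>Elements must be properly nested</para>
  </chapter>
</book>
```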
19
Element content
Elements can have different content types. An XML element is everything from (including) the element's start tag to (including) the element's end tag. An element can have element content, mixed content, simple content, or empty content. An element can also have attributes.
In the previous example:
- book has element content, because it contains other elements.
- chapter has mixed content, because it contains both text and other elements.
- para has simple content (or text content), because it contains only text.
- prod has empty content, because it carries no information.
20
Element naming
XML elements must follow these naming rules:
- Names can contain letters, numbers, and other characters
- Names must not start with a number or punctuation character
- Names must not start with the letters xml (or XML or Xml..)
- Names cannot contain spaces
Take care when you "invent" element names and follow these simple rules:
- Any name can be used, no words are reserved, but the idea is to make names descriptive.
- Names with an underscore separator are nice. Examples: first_name, last_name.
21
Element naming, cont.
- Avoid "-" and "." in names. For example, if you name something "first-name," it could be a mess if your software tries to subtract name from first. Or if you name something "first.name," your software may think that "name" is a property of the object "first."
- Element names can be as long as you like, but don't exaggerate. Names should be short and simple.
- XML documents often have a corresponding database, in which fields exist corresponding to elements in the XML document. A good practice is to use the naming rules of your database for the elements in the XML documents.
- Non-English letters like éòá are perfectly legal in XML element names, but watch out for problems if your software vendor doesn't support them.
- The ":" should not be used in element names because it is reserved for something called namespaces (more later).
22
Well formed XML
23
Well-formed vs Valid
- Recall that an XML document is said to be well-formed if it obeys the basic syntactic constraints of XML.
- This is different from a valid XML document, which (as we will see in more depth) properly matches a schema.
24
Rules for Well-Formed XML
An XML document is considered well-formed if it obeys the following rules:
- There must be one element that contains all others (the root element).
- All tags must be balanced: every start tag needs a matching end tag.
- Tags must be nested properly: an inner element must close before the element that contains it.
- Text is case-sensitive, so a start tag and its end tag must match in case. Mixing case is not OK, even though we do it all the time in HTML!
25
More Rules for Well-Formed XML
- The attribute values in a tag must be in quotes.
- Comments are allowed. They must begin with <!-- and end with -->.
- Special characters must be escaped: the most common are & (written &amp;) and < (written &lt;), so x < y+2x is written x &lt; y+2x.
- An XML document that obeys these rules is Well-Formed.
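The rules on the last two slides can be seen together in one small document. This is a minimal sketch; the element and attribute names (memo, priority, to, formula, company, attachment) are made up for illustration and do not come from the original slides.

```xml
<?xml version="1.0"?>
<!-- Exactly one root element (memo) contains everything else. -->
<memo priority="high">
  <!-- The attribute value above is quoted. -->
  <to>Tove</to>
  <!-- Proper nesting: i closes before b. -->
  <b><i>properly nested</i></b>
  <!-- The less-than sign is escaped as &lt; -->
  <formula>x &lt; y+2x</formula>
  <!-- The ampersand is escaped as &amp; -->
  <company>Smith &amp; Sons</company>
  <!-- An empty element can use the self-closing form. -->
  <attachment/>
</memo>
```

By contrast, `<b><i>text</b></i>` is not well-formed because the tags overlap, and `<Memo>…</memo>` is not well-formed because the start and end tags differ in case.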
26
Creating XML
- There are many XML editors.
  - Xeena
  - XMLSpy
- Xeena is on the CSPP machines.
- As with HTML, text editors are frequently the only thing available or the only thing that produces what you want.
- Test in IE6 or Netscape 7.0.
27
Next Step XML Schema
28
- XML allows any sort of tag you want.
- In a given application, you want to fix a vocabulary: what tags make sense.
- Use a Schema to define an XML dialect
  - MusicXML, VoiceXML, ADXML, etc.
- Restrict documents to those tags.
- Anyone who has your Schema can validate their document to see if it obeys the rules of the dialect.
29
A Schema determines …
- What sort of elements can appear in the document.
- What elements MUST appear.
- Which elements can appear as part of another element.
- What attributes can appear or must appear.
- What kind of values can/must be in an attribute.
30
Rooms XML Schema
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" attributeFormDefault="unqualified">
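Only the opening tag of this schema survives. As a rough sketch of how the rest might read, here is one possible schema for the rooms document shown earlier. The element names follow the rooms example; the types (xs:integer, xs:string) and the occurrence constraints are assumptions, not the original slide content.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified"
           attributeFormDefault="unqualified">
  <xs:element name="rooms">
    <xs:complexType>
      <xs:sequence>
        <!-- one or more room elements -->
        <xs:element name="room" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="capacity" type="xs:integer"/>
              <!-- equipmentList and features are both optional -->
              <xs:element name="equipmentList" minOccurs="0">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="equipment" type="xs:string" maxOccurs="unbounded"/>
                  </xs:sequence>
                </xs:complexType>
              </xs:element>
              <xs:element name="features" minOccurs="0">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="feature" type="xs:string" maxOccurs="unbounded"/>
                  </xs:sequence>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```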
31
Bookings XML Schema Note that there are four global types in this document!
32
Bookings, cont.
33
Bookings, cont.
34
An Example Bookings Document 2003 April 1 Democratic Party Green Room Republican Party Red Room
35
XML Schema (Document Type Definition)
- A Schema (or the older DTD) is a specification: it specifies the language that you speak.
- Check the DTDs for musicxml, adxml, etc. that are available off the course webpage.
  - These give you the basic structure of each of these applications.
  - Not many schemas are available, but they are much better.
- As we said before, a Schema is like a user-defined type in a programming language. It is also somewhat analogous to a database schema:
  - it says what components can appear
  - it gives default values and restrictions.
36
Dissecting Schema
37
What's in a Schema?
- A Schema is an XML document (a DTD is not).
- Because it is an XML document, it must have a root element.
  - The root element is <schema>.
- Within the root element, there can be
  - any number and combination of
    - inclusions
    - imports
    - re-definitions
    - annotations
  - followed by any number and combination of
    - simple and complex data type definitions
    - element and attribute definitions
    - model group definitions
    - annotations
38
Structure of a Schema <schema>.................. </schema>
39
Simple Types
40
Elements
What is a Simple Element?
- A simple element is an XML element that can contain only text. It cannot contain any other elements or attributes.
- You can also add restrictions (facets) to a data type in order to limit its content, and you can require the data to match a defined pattern.
41
Example Simple Element
The syntax for defining a simple element is <xs:element name="xxx" type="yyy"/>, where xxx is the name of the element and yyy is the data type of the element.
Here are the values of some XML elements: Refsnes, 34, and 1968-03-27. Each such element has a corresponding simple element definition.
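As an illustration of that syntax, the three values above could be declared as follows. The element names lastname, age, and dateborn are assumptions (the original tags were lost); the types are chosen to fit the values.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- would accept an element with text such as: Refsnes -->
  <xs:element name="lastname" type="xs:string"/>
  <!-- would accept an element with text such as: 34 -->
  <xs:element name="age" type="xs:integer"/>
  <!-- would accept an element with text such as: 1968-03-27 -->
  <xs:element name="dateborn" type="xs:date"/>
</xs:schema>
```

A matching instance element would then look like `<age>34</age>`.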
42
Common XML Schema Data Types
XML Schema has a lot of built-in data types. Here is a list of the most common types:
- xs:string
- xs:decimal
- xs:integer
- xs:boolean
- xs:date
- xs:time
43
Declare Default and Fixed Values for Simple Elements
- Simple elements can have a default value OR a fixed value set.
- A default value is automatically assigned to the element when no other value is specified, for example a default value of "red".
- A fixed value is also automatically assigned to the element, and you cannot specify another value, for example a fixed value of "red".
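A minimal sketch of both declarations, using the value "red" from the slide. The element names color and flagColor are hypothetical.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- default: an empty <color/> is treated as if it contained "red" -->
  <xs:element name="color" type="xs:string" default="red"/>
  <!-- fixed: <flagColor> may only ever contain "red" -->
  <xs:element name="flagColor" type="xs:string" fixed="red"/>
</xs:schema>
```

An element declaration can carry either default or fixed, but not both.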
44
Attributes (Another simple type)
- All attributes are declared as simple types.
- Only complex elements can have attributes!
45
What is an Attribute?
- Simple elements cannot have attributes.
- If an element has attributes, it is considered to be of complex type.
- But the attribute itself is always declared as a simple type.
- This means that an element with attributes always has a complex type definition.
46
How to Define an Attribute
- The syntax for defining an attribute is <xs:attribute name="xxx" type="yyy"/>, where xxx is the name of the attribute and yyy is the data type of the attribute.
- Here is the content of an XML element with an attribute: Smith
- The attribute has a corresponding simple attribute definition.
47
Declare Default and Fixed Values for Attributes
- Attributes can have a default value OR a fixed value specified.
- A default value is automatically assigned to the attribute when no other value is specified, for example a default value of "EN".
- A fixed value is also automatically assigned to the attribute, and you cannot specify another value, for example a fixed value of "EN".
48
Creating Optional and Required Attributes
- All attributes are optional by default. To explicitly specify that the attribute is optional, set the "use" attribute to "optional".
- To make an attribute required, set the "use" attribute to "required".
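The attribute slides above can be pulled together in one sketch. The element name lastname and the attribute names lang, country, note, and id are assumptions made for illustration; only the value "EN" and the text "Smith" come from the slides. Because the element carries attributes, it needs a complex type even though its content is plain text.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="lastname">
    <xs:complexType>
      <xs:simpleContent>
        <!-- text content is a string; the attributes are added to it -->
        <xs:extension base="xs:string">
          <!-- default: lang is "EN" when the attribute is omitted -->
          <xs:attribute name="lang" type="xs:string" default="EN"/>
          <!-- fixed: if present, country must be exactly "US" -->
          <xs:attribute name="country" type="xs:string" fixed="US"/>
          <!-- optional is the default, stated explicitly here -->
          <xs:attribute name="note" type="xs:string" use="optional"/>
          <!-- required: every lastname must carry an id -->
          <xs:attribute name="id" type="xs:string" use="required"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Under this sketch, `<lastname id="p1">Smith</lastname>` would validate, with lang taking the value "EN".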
49
Restrictions
- As we will see later, simple types can have ranges put on their values.
- These are known as restrictions.
50
Complex Types
51
Complex Elements
- A complex element is an XML element that contains other elements and/or attributes.
- There are four kinds of complex elements:
  - empty elements
  - elements that contain only other elements
  - elements that contain only text
  - elements that contain both other elements and text
- Note: Each of these elements may contain attributes as well!
52
Examples of Complex XML Elements
- A complex XML element, "product", which is empty
- A complex XML element, "employee", which contains only other elements (holding the values John and Smith)
- A complex XML element, "food", which contains only text (Ice cream)
53
Examples, cont.
- A complex XML element, "description", which contains both elements and text: It happened on 03.03.99....
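The four kinds of complex element could look like this in an instance document. The element names product, employee, food, and description and the text values come from the slides; the wrapper element examples, the child names firstname, lastname, and date, and the attributes pid and type are assumptions, since the original markup was lost.

```xml
<?xml version="1.0"?>
<examples>
  <!-- empty, but complex because it has an attribute -->
  <product pid="1345"/>
  <!-- contains only other elements -->
  <employee>
    <firstname>John</firstname>
    <lastname>Smith</lastname>
  </employee>
  <!-- contains only text, plus an attribute -->
  <food type="dessert">Ice cream</food>
  <!-- mixed content: text and an element together -->
  <description>It happened on <date>03.03.99</date> ....</description>
</examples>
```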
54
An Example XML Schema
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" attributeFormDefault="unqualified">
<xs:element name="feature" type="xs:string" maxOccurs="unbounded"/>
55
Referencing XML Schema in XML documents
56
Sample Schema header
The <schema> element may contain some attributes. A schema declaration often carries several of them, as the following slides describe.
57
Schema headers, cont.
- The following fragment:
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
  indicates that the elements and data types used in the schema (schema, element, complexType, sequence, string, boolean, etc.) come from the "http://www.w3.org/2001/XMLSchema" namespace.
- It also specifies that the elements and data types that come from the "http://www.w3.org/2001/XMLSchema" namespace should be prefixed with xs:
58
Schema header, cont.
- This fragment:
    targetNamespace="http://www.w3schools.com"
  indicates that the elements defined by this schema (note, to, from, heading, body) come from the "http://www.w3schools.com" namespace.
- This fragment:
    xmlns="http://www.w3schools.com"
  indicates that the default namespace is "http://www.w3schools.com".
- This fragment:
    elementFormDefault="qualified"
  indicates that any elements used by the XML instance document which were declared in this schema must be namespace qualified.
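Putting the four fragments together gives a header like the one below. The attribute values are the ones quoted on the slides; the body of the schema is an assumption, sketched from the element names the slide lists (note, to, from, heading, body) with xs:string chosen as a plausible type.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.w3schools.com"
           xmlns="http://www.w3schools.com"
           elementFormDefault="qualified">
  <!-- declarations here belong to the target namespace -->
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="from" type="xs:string"/>
        <xs:element name="heading" type="xs:string"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```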
59
Referencing schema in XML
This XML document has a reference to an XML Schema:
Tove
Jani
Reminder
Don't forget me this weekend!
60
Referencing schema in XML, cont.
- The following fragment:
    xmlns="http://www.w3schools.com"
  specifies the default namespace declaration.
- This declaration tells the schema-validator that all the elements used in this XML document are declared in the "http://www.w3schools.com" namespace.
61
…
- Once you have the XML Schema Instance namespace available:
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  you can use the schemaLocation attribute.
- This attribute has two values. The first value is the namespace to use. The second value is the location of the XML schema to use for that namespace:
    xsi:schemaLocation="http://www.w3schools.com note.xsd"
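Combined, the declarations above give an instance document along these lines. The namespace declarations and the schemaLocation value are quoted from the slides, and the element names are the ones listed on the schema header slide; which value belongs to which element is an assumption based on their order.

```xml
<?xml version="1.0"?>
<note xmlns="http://www.w3schools.com"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3schools.com note.xsd">
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>
```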
63
Using References
64
- You don't have to define the content of an element in the nested fashion just shown.
- You can define the element elsewhere and use a reference to it instead.
<xs:sequence> </xs:sequence></xs:complexType></xs:element> …</xs:element>
65
Rooms Schema using References
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" attributeFormDefault="unqualified">
</xs:element></xs:schema>
66
A Rooms Schema using References (Cont.)
<xs:element name="feature" type="xs:string" maxOccurs="unbounded"/>
</xs:element></xs:schema>
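Since most of this schema was lost, here is a simplified sketch of the reference style, assuming the same rooms vocabulary as the earlier example. Each element is declared once at the top level and pulled in with ref. The types and occurrence constraints are assumptions, and equipmentList is left out to keep the sketch short.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified"
           attributeFormDefault="unqualified">
  <xs:element name="rooms">
    <xs:complexType>
      <xs:sequence>
        <!-- refers to the global room declaration below -->
        <xs:element ref="room" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="room">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="capacity" type="xs:integer"/>
        <xs:element ref="features" minOccurs="0"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="features">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="feature" type="xs:string" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Occurrence constraints such as maxOccurs go on the reference or on a local declaration, not on the global declaration itself.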
67
A Rooms Schema using References (Graphical)
68
A Rooms Schema using References (Graphical Cont.)
70
Types
Both elements and attributes have types, which are defined in the Schema. One can reuse types by giving them names.
<xsd:complexType><xsd:sequence> </xsd:sequence></xsd:complexType></xsd:element>
OR
<xsd:sequence> </xsd:sequence></xsd:complexType></xsd:element>
71
Other XML Schema Features
- Foreign key facility (uses XPath)
- Rich datatype facility
  - Build up datatypes by inheritance
  - Don't need to list all of the attributes (can say "these attributes plus others")
  - Restrict strings using regular expressions
- Namespace aware
  - Can restrict the location of an element based on its namespace
72
Restrictions
73
Datatype Restrictions
- A DTD can only say that price can be any non-markup text.
- In Schema you can do better by giving price a more specific datatype,
- or even make your own restrictions.
</xsd:simpleType>
74
Restriction Ranges
- The restrictions must be "derived" from a base type, so it's object based.
<xs:simpleType> </xs:restriction></xs:simpleType></xs:element>
- The preceding type is "derived" from "integer".
- It has 2 restrictions (called "facets"):
  - The first says that the value must be greater than 41.
  - The second says that it must be less than 43.
- The value in the XML file is "42".
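A sketch of a declaration matching that description: a type derived from integer with two facets, one excluding everything up to 41 and one excluding everything from 43. The element name answer is hypothetical, and the choice of minExclusive/maxExclusive (rather than the inclusive facets) is an assumption based on the wording "greater than" and "less than".

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="answer">
    <xs:simpleType>
      <!-- derived from the built-in integer type -->
      <xs:restriction base="xs:integer">
        <!-- value must be greater than 41 -->
        <xs:minExclusive value="41"/>
        <!-- value must be less than 43 -->
        <xs:maxExclusive value="43"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>
```

The only instance that validates is `<answer>42</answer>`.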
75
Facet: Description
enumeration: Defines a list of acceptable values
fractionDigits: The maximum number of decimal places allowed. Must be >= 0
length: The exact number of characters or list items allowed. Must be >= 0
maxExclusive: The upper bound for numeric values (the value must be less than the value specified)
maxInclusive: The upper bound for numeric values (the value must be less than or equal to the value specified)
maxLength: The maximum number of characters or list items allowed. Must be >= 0
minExclusive: The lower bound for numeric values (the value must be greater than the value specified)
minInclusive: The lower bound for numeric values (the value must be greater than or equal to the value specified)
minLength: The minimum number of characters or list items allowed. Must be >= 0
pattern: The sequence of acceptable characters, based on a regular expression
totalDigits: The maximum number of digits allowed. Must be > 0
whiteSpace: Specifies how white space (line feeds, tabs, spaces, and carriage returns) is handled
76
Enumeration Facet <xs:simpleType> </xs:restriction></xs:simpleType></xs:element>
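Only the closing tags of this example survive. A minimal sketch of the enumeration facet, with a hypothetical element name and value list:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="color">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <!-- only these three values are acceptable -->
        <xs:enumeration value="red"/>
        <xs:enumeration value="green"/>
        <xs:enumeration value="blue"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>
```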
77
Patterns (Regular Expressions)
- One interesting facet is the pattern, which allows restrictions based on a regular expression.
- This regular expression specifies a normal word of one or more characters:
</xs:restriction></xs:simpleType></xs:element>
78
Patterns (Regular Expressions)
- Individual characters may be repeated a specific number of times in the regular expression.
- The following regular expression restricts the string to exactly 8 alpha-numeric characters:
<xs:simpleType> </xs:restriction></xs:simpleType></xs:element>
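The two pattern slides could be realized as below. The element names word and code and the exact regular expressions are assumptions, since the originals were lost; they are chosen to match the descriptions (a word of one or more letters, and exactly 8 alphanumeric characters).

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="word">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <!-- one or more letters -->
        <xs:pattern value="[a-zA-Z]+"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
  <xs:element name="code">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <!-- exactly 8 letters or digits -->
        <xs:pattern value="[a-zA-Z0-9]{8}"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>
```

A schema pattern always has to match the whole value, so no ^ or $ anchors are needed.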
79
Whitespace facet
- The "whiteSpace" facet controls how whitespace in the element will be processed.
- There are three possible values for the whiteSpace facet:
  - "preserve" causes the processor to keep all whitespace as-is
  - "replace" causes the processor to replace all whitespace characters (tabs, carriage returns, line feeds, spaces) with space characters
  - "collapse" causes the processor to replace all strings of whitespace characters (tabs, carriage returns, line feeds, spaces) with a single space character, and to remove leading and trailing whitespace
<xs:simpleType> </xs:restriction></xs:simpleType>
80
Types
Both elements and attributes have types, which are defined in the Schema. One can reuse types by giving them names.
Addr.xsd:
<xsd:complexType><xsd:sequence> </xsd:sequence></xsd:complexType></xsd:element>
OR
<xsd:sequence> </xsd:sequence></xsd:complexType>
81
Types
The usage in the XML file is identical:
1108 E. 58th St.
Ryerson 155
60637
1108 E. 58th St.
Ryerson 155
60637
</PurchaseOrder>
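The two ways of typing an element can be sketched side by side. The xsd prefix follows the slide's fragments; the names ShipTo, BillTo, AddrType, street, room, and zip are hypothetical, chosen to fit the address values shown.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Option 1: an anonymous type nested inside the element -->
  <xsd:element name="ShipTo">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="street" type="xsd:string"/>
        <xsd:element name="room" type="xsd:string"/>
        <xsd:element name="zip" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <!-- Option 2: a named type that any element can reuse -->
  <xsd:complexType name="AddrType">
    <xsd:sequence>
      <xsd:element name="street" type="xsd:string"/>
      <xsd:element name="room" type="xsd:string"/>
      <xsd:element name="zip" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:element name="BillTo" type="AddrType"/>
</xsd:schema>
```

Either way, the instance looks the same, e.g. `<BillTo><street>1108 E. 58th St.</street><room>Ryerson 155</room><zip>60637</zip></BillTo>`.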
82
Type Extensions
A third way of creating a complex type is to extend another complex type (like OO inheritance).
<xs:sequence> </xs:sequence></xs:complexType>
<xs:complexContent> <xs:sequence> </xs:sequence></xs:extension></xs:complexContent></xs:complexType>
83
Type Extensions (use)
To use a type that is an extension of another, write the instance as though everything were defined in a single type:
<FirstName>King</FirstName><LastName>Arthur</LastName> Round Table <City>Camelot</City><Country>England</Country></Employee>
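A sketch of how such an extension might be declared. The element names FirstName, LastName, City, Country, and Employee come from the slide; the type names PersonType and EmployeeType, the element name Address (for the "Round Table" value), and the split between base and extension are assumptions.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- base type -->
  <xs:complexType name="PersonType">
    <xs:sequence>
      <xs:element name="FirstName" type="xs:string"/>
      <xs:element name="LastName" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <!-- extended type: the base elements come first, then the new ones -->
  <xs:complexType name="EmployeeType">
    <xs:complexContent>
      <xs:extension base="PersonType">
        <xs:sequence>
          <xs:element name="Address" type="xs:string"/>
          <xs:element name="City" type="xs:string"/>
          <xs:element name="Country" type="xs:string"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
  <xs:element name="Employee" type="EmployeeType"/>
</xs:schema>
```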
84
Simple Content in Complex Type
- If a type contains only simple content (text and attributes), a <simpleContent> element can be put inside the <complexType>.
- <simpleContent> must have either an <extension> or a <restriction>.
- This example is from the (Bridge of Death) Episode Dialog:
<xs:complexType> <xs:attribute name="speaker" type="xs:string" use="required"/> </xs:complexType></xs:element>
85
Model Groups
- Model Groups are used to define an element that has
  - mixed content (elements and text mixed)
  - element content
- Model Groups can be
  - all: the elements specified must all be there, but in any order
  - choice: exactly one of the elements specified appears (unless occurrence constraints say otherwise)
  - sequence: all of the elements specified must appear in the specified order
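As a sketch of the "all" group combined with mixed content, here is one way the BookCover element from the "All" Model Group slide might be declared. The child element names Title, Author, and Publisher are assumptions; only BookCover and the text values appear in the slides.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="BookCover">
    <!-- mixed="true" allows text between the child elements -->
    <xs:complexType mixed="true">
      <!-- all three children must appear, in any order -->
      <xs:all>
        <xs:element name="Title" type="xs:string"/>
        <xs:element name="Author" type="xs:string"/>
        <xs:element name="Publisher" type="xs:string"/>
      </xs:all>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

An instance such as `<BookCover>Title: <Title>The Holy Grail</Title> Published: <Publisher>Moose</Publisher> Author: <Author>Monty Python</Author></BookCover>` would then be valid, even though the children are not in declaration order.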
86
"All" Model Group
- The following schema specifies 3 elements and mixed content
</xs:all></xs:complexType></xs:element>
- The following XML file is valid in the above schema
Title: The Holy Grail
Published: Moose
Author: Monty Python
</BookCover>
87
Attributes
<xs:attribute name="speaker" type="xs:string" use="required"/> …
The attribute declaration is part of the type of the element.
88
Attributes
<xsd:complexType><xsd:sequence> </xsd:sequence> <xsd:simpleType> </xsd:restriction></xsd:simpleType></xsd:attribute></xsd:complexType></xsd:element>
If an attribute type is more complicated than a basic type, then we spell out the type in a type declaration.
89
Optional and Required Attributes
- All attributes are optional by default. To explicitly specify that the attribute is optional, set the "use" attribute to "optional".
- To make an attribute required, set the "use" attribute to "required".
90
Referencing an XML Schema
- Can apply different validation rules to different elements in the document!
- Need example
- Also schema include & import
91
Other XML Schema Features
- Foreign key facility (uses XPath)
- Rich datatype facility
  - Build up datatypes by inheritance
  - Don't need to list all of the attributes (can say "these attributes plus others")
  - Restrict strings using regular expressions
- Namespace aware
  - Can restrict the location of an element based on its namespace
92
XML Schema Status
- Became a W3C recommendation Spring 2001
- World domination expected imminently.
- Supported in Xalan.
- Supported in XMLSpy and other editor/validators.
- On the other hand:
  - More complex than DTDs.
  - Ultra verbose.
93
Validating a Schema
- By using Xeena or XMLSpy or XML Notepad.
  - When publishing hand-written XML docs, this is the way to go.
- By using a Java program that performs validation.
  - When validating on-the-fly, you must do it this way.
94
Some guidelines for Schema design
95
Designing a Schema
Analogous to database schema design: look for intuitive names.
Can start with an E-R diagram, and then convert:
  Attributes to attributes
  Subobjects to subelements
  Relationships to IDREFS
Normalization? It still makes sense to avoid repetition whenever possible:
  If you have an Enrolment document, list only the IDs of students, not their names.
  Store names in a separate document.
  Leave it to tools to connect them.
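As a rough sketch of the normalization advice, an Enrolment document might refer to students by ID only. The element and attribute names, the course code, the IDs, and the file names are all hypothetical.

```xml
<?xml version="1.0"?>
<!-- enrolment.xml (hypothetical): students appear by ID only -->
<enrolment course="CS101">
  <student ref="s1001"/>
  <student ref="s1002"/>
</enrolment>
<!-- The names live elsewhere, for example in a hypothetical students.xml:
     <student id="s1001"><name>Ada Example</name></student> -->
```

Note that ID/IDREF checking in XML works within a single document, so linking the two files is left to the tools that process them, as the slide says.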
96
Designing a Schema (cont.)
Difficulties:
  Many more degrees of freedom than with database schemas:
  e.g. one can associate information with something by including it as an attribute or as a subelement.
  <ADDRESS> Martin Sheen … 4145 </ADDRESS>
ELEMENTS are more extensible: use them when there is a possibility that more substructure will be added.
ATTRIBUTES are easier to search on.
97
"Rules" for Designing a Schema
Never leave structure out. The following is definitely a bad idea:
  <ADDRESS>Martin Sheen 1222 Alameda Drive, Carmel, CA 40145</ADDRESS>
Better would be to give each part of the address its own attribute, or its own subelement:
  <ADDRESS>
    <name><first>Martin</first><last>Sheen</last></name>
    <street>1222 Alameda Drive</street>
    <city>Carmel</city>
    <state>CA</state>
    <zip>40145</zip>
  </ADDRESS>
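An attribute-based way of keeping the same structure could look like the sketch below. The attribute names are assumptions chosen to mirror the subelement names; the original slide's own version is not preserved.

```xml
<ADDRESS name="Martin Sheen"
         street="1222 Alameda Drive"
         city="Carmel"
         state="CA"
         zip="40145"/>
```

Either form lets an application read the zip code directly instead of parsing it out of a free-text string.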
98
More "Rules" for Designing a Schema
When to use Elements (instead of Attributes)
Do not put large text blocks inside an attribute.
  (Bad Idea) <book type="memoir" content="Bravely bold Sir Robin rode forth from Camelot. He was not afraid to die, O brave Sir Robin. He was not at all afraid to be killed in nasty ways, Brave, brave, brave, brave Sir Robin! He was not in the least bit scared to be mashed into a pulp, Or to have his eyes gouged out and his elbows broken, To have his kneecaps split and his body burned away And his limbs all hacked and mangled, brave Sir Robin! His head smashed in and his heart cut out And his liver removed and his bowels unplugged…">
Elements are more flexible, so use an Element if you think you might have to add more substructure later on.
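By contrast, the long text could be carried as element content. This is a sketch only: the child element name `content` is an assumption mirroring the attribute name above, and only the first two lines of the text are repeated.

```xml
<book type="memoir">
  <content>
    Bravely bold Sir Robin rode forth from Camelot.
    He was not afraid to die, O brave Sir Robin.
  </content>
</book>
```

If verses or chapters need to be marked up later, they can be added as children of `content` without changing the rest of the document.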
99
More "Rules" for Designing Schemas
More on when to use Elements (instead of Attributes)
Use an embedded element when the information you are recording is a constituent part of the parent element.
  One's head and one's height are both inherent to a human being:
  you can't be a conventionally structured human being without having a head and having a height.
  One's head is a constituent part and one's height isn't: you can cut off my head, but not my height.
Use embedded elements for complex structure validation (obvious).
Use embedded elements when you need to show order (attributes are not ordered).
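The head/height distinction might be written as in the sketch below. The names `person`, `head`, `eye`, `side`, and the height value are all hypothetical.

```xml
<!-- height is inherent but not a part: an attribute -->
<person height="1.80m">
  <!-- the head is a constituent part with its own substructure: an element -->
  <head>
    <eye side="left"/>
    <eye side="right"/>
  </head>
</person>
```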
100
More "Rules" for Designing Schemas
When to use Attributes instead of Elements
Use an attribute when the information is inherent to the parent but not a constituent part (height instead of head).
Use attributes to stress the one-to-one relationship among pieces of information,
  to stress that the element represents a tuple of information.
  Dangerous rule, though:
  it leads to the extreme formulation that an element can have a TITLE= attribute,
  and then to the conclusion that it really ought to have a CONTENT= attribute too.
  Then you find yourself writing the entire document as an empty element with an attribute value as long as the Quest for the Holy Grail.
Use attributes for simple datatype validation (obviously).
101
Schema Notes
Fully supported (now) in XMLSpy
Unknown if supported in Xalan, but probably
Fully supported in Xerces DOM & SAX
  validation works
Unknown if supported in JAXM/JAXP
Java JDK 1.4 does not support schemas
Nice set of schemas at http://www.griphyn.org/working_groups/VDS/
102
Structuring a DTD or document with Entities
The previous constructs are enough to create a full DTD. However, if there are many elements, this would lead to huge DTD files.
Need a facility to:
  Reuse the same description of an element more than once.
  Abbreviate commonly used definitions.
  Import definitions from other files.
  Import data that could never fit into one file (e.g. binary).
You do this using Entities. An entity is a subunit of an XML document that is given a fixed abbreviation.
  An entity declaration declares the abbreviation.
  An entity reference (&entname; or %entname;) is a use of the abbreviation.
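The kinds of entity listed above can be sketched in one small document. Everything named here is hypothetical: the entity names `org`, `common`, and `logo`, the files `common.dtd` and `logo.png`, and the elements. The sketch also assumes that `common.dtd` declares `report`, `title`, and `figure`, with the `src` attribute of `figure` typed as ENTITY.

```xml
<?xml version="1.0"?>
<!DOCTYPE report [
  <!-- internal general entity: abbreviates commonly used text -->
  <!ENTITY org "Example Widgets Inc.">

  <!-- external parameter entity: imports declarations from another file -->
  <!ENTITY % common SYSTEM "common.dtd">
  %common;

  <!-- unparsed external entity: points at binary data that cannot be inlined -->
  <!NOTATION png SYSTEM "image/png">
  <!ENTITY logo SYSTEM "logo.png" NDATA png>
]>
<report>
  <!-- &org; is a general entity reference, expanded by the parser -->
  <title>Annual report of &org;</title>
  <!-- an unparsed entity is named in an attribute value, not referenced with & -->
  <figure src="logo"/>
</report>
```

References of the form `&name;` are used in document content, while `%name;` references are used inside the DTD itself.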