XML Schema Languages Dongwon Lee, Ph.D.


1 XML Schema Languages Dongwon Lee, Ph.D.
The Pennsylvania State University IST 516 / Fall 2010

2 In the Last Class

3 Today
DUE: Proj #1 Planning Doc (team-based)
Assignment: Lab #1 (individual)

4 Course Objectives
Understand the purpose of using schemas
Study regular expressions as the framework for formalizing schema languages
Understand the details of DTD and XML Schema
Learn the concept of schema validation

5 Outline
Schema languages in general
Content models
DTD
XML Schema (XSD)
Schema validation

6 Motivation
The company “Nittany Vacations, LLC” wants to standardize all internal documents using an XML-based format, nvML.
Gather requirements from all employees: an informal (i.e., narrative) description of what nvML should look like.
Q: How can nvML be described formally and unambiguously? How can an nvML document be validated?

7 Motivation: Schema = Rules
XML Schema is all about expressing rules:
Rules about what data is allowed
Rules about how the data must be organized
Rules about the relationships between data

8 Motivation: sep-9.xml (example modified from Roger L. Costello’s slides @ xfront.com)
<Vacation date=" " guide-by="Lee">
  <Trip segment="1" mode="air">
    <Transportation>airplane</Transportation>
  </Trip>
  <Trip segment="2" mode="water">
    <Transportation>boat</Transportation>
  </Trip>
  <Trip segment="3" mode="ground">
    <Transportation>car</Transportation>
  </Trip>
</Vacation>

9 Motivation: Validate
Validate the XML document against the XML Schema (nvML.dtd or nvML.xsd):
<Vacation date=" " guide-by="Lee">
  <Segment id="1" mode="air">
    <Transportation>airplane</Transportation>
  </Segment>
  <Segment id="2" mode="water">
    <Transportation>boat</Transportation>
  </Segment>
  <Segment id="3" mode="ground">
    <Transportation>car</Transportation>
  </Segment>
</Vacation>
XML Schema = RULES
Rule 1: A vacation has segments.
Rule 2: Each segment is uniquely identified.
Rule 3: There are three modes of transportation: air, water, ground.
Rule 4: Each segment has a mode of transportation.
Rule 5: Each segment must identify the specific mode used.

10 Schema Languages
Schema: a formal description of structures and constraints
E.g., a relational schema describes tables, attributes, keys, ...
Schema language: a formal language for describing schemas
E.g., SQL DDL for the relational model:
CREATE TABLE employees (
  id          INTEGER PRIMARY KEY,
  first_name  CHAR(50) NULL,
  last_name   CHAR(75) NOT NULL,
  dateofbirth DATE NULL
);
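The relational analogy can be tried directly. The sketch below is only an illustration, assuming Python's built-in sqlite3 module as the DBMS (the slide names none): it runs the slide's DDL and lets the schema reject rows that break its rules. The sample names are made up.

```python
import sqlite3

# In-memory SQLite database (an assumed stand-in engine); the DDL is the slide's.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employees (
        id          INTEGER PRIMARY KEY,
        first_name  CHAR(50) NULL,
        last_name   CHAR(75) NOT NULL,
        dateofbirth DATE NULL
    )
""")

def try_insert(row):
    """Insert one row; True if the schema accepts it, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO employees VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

accepted = try_insert((1, "Dongwon", "Lee", None))     # satisfies every rule
no_last_name = try_insert((2, "Jane", None, None))     # breaks NOT NULL on last_name
duplicate_id = try_insert((1, "John", "Smith", None))  # breaks PRIMARY KEY on id
print(accepted, no_last_name, duplicate_id)            # True False False
```

SQLite does not enforce the CHAR(50)/CHAR(75) lengths, so that part of the schema goes unchecked here; other engines may behave differently.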

11 Rules in Formal Schema Lang.
Why bother formalizing the syntax with a schema?
A formal definition provides a precise but human-readable reference.
Schema processing can be done with existing implementations.
Your own tools for your own language benefit too: by piping input documents through a schema processor, a tool can assume that its input is valid and that defaults have been inserted.

12 Schema Processing

13 Requirements for Schema Lang.
Expressiveness
Efficiency
Comprehensibility

14 Regular Expressions (RE)
Commonly used to describe sequences of characters or elements in schema languages; REs capture content models.
Σ: a finite alphabet
α ∈ Σ: matches exactly the single character α
α?: matches zero or one α
α*: matches zero or more α's
α+: matches one or more α's
α β: matches the concatenation of α and β
α | β: matches the union of α and β

15 RE Examples
a|b* denotes {ε, a, b, bb, bbb, ...}
(a|b)* denotes the set of all strings with no symbols other than a and b, including the empty string: {ε, a, b, aa, ab, ba, bb, aaa, ...}
ab*(c|ε) denotes the set of strings starting with a, then zero or more b's, and finally, optionally, a c: {a, ac, ab, abc, abb, abbc, ...}
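These examples can be checked mechanically. A minimal sketch, assuming Python's re syntax as a stand-in for the slide's notation: "(c|ε)" becomes "c?", and fullmatch is used because an RE denotes a set of whole strings, not substrings.

```python
import re

# The slide's REs in Python syntax. Python has no ε symbol,
# so "(c|ε)" is written as the equivalent optional "c?".
PATTERNS = {
    "a|b*": re.compile(r"a|b*"),
    "(a|b)*": re.compile(r"(a|b)*"),
    "ab*(c|ε)": re.compile(r"ab*c?"),
}

def in_language(name, s):
    """True if the whole string s belongs to the language of the named RE."""
    # fullmatch, not match: match would also accept a mere prefix of s.
    return PATTERNS[name].fullmatch(s) is not None

samples = ["", "a", "b", "bb", "ab", "ba", "ac", "abbc"]
for name in PATTERNS:
    members = [s or "ε" for s in samples if in_language(name, s)]
    print(f"{name:10} {members}")
```

For these samples the loop lists ε, a, b, bb for a|b*; ε, a, b, bb, ab, ba for (a|b)*; and a, ab, ac, abbc for ab*(c|ε).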

16 RE Examples
Valid integers:
0 | -? (1|2|3|4|5|6|7|8|9) (0|1|2|3|4|5|6|7|8|9)*
Valid contents of the table element in XHTML:
caption? (col* | colgroup*) thead? tfoot? (tbody+ | tr+)
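To show how an RE over element names can act as a content model, here is a sketch (an illustration, not how any particular validator is built) that turns each child element name into one token. It assumes the XHTML 1.0 DTD form of the table model, in which at least one tbody or tr is required (tbody+ | tr+); INTEGER, TABLE_MODEL and valid_table_children are hypothetical names.

```python
import re

# Valid integers: "0", or an optional minus sign, a digit 1-9, then any digits 0-9.
INTEGER = re.compile(r"0|-?[1-9][0-9]*")

# Table content model over element names. Each child name becomes one
# token "<name>", so the regex matches whole names, never parts of names.
# Assumption: the XHTML 1.0 form ending in (tbody+ | tr+).
TABLE_MODEL = re.compile(
    r"(<caption>)?((<col>)*|(<colgroup>)*)(<thead>)?(<tfoot>)?((<tbody>)+|(<tr>)+)"
)

def valid_table_children(children):
    """Check a list of child element names of <table> against the content model."""
    tokens = "".join(f"<{name}>" for name in children)
    return TABLE_MODEL.fullmatch(tokens) is not None

print(valid_table_children(["caption", "colgroup", "thead", "tbody"]))  # True
print(valid_table_children(["tr", "caption"]))  # False: caption must come first
```

The token encoding keeps the regex working on element names rather than characters, which is exactly the sense in which schema languages use REs.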

17 Which Schema Language?
Many proposals compete for acceptance.
W3C proposals: DTD, XML Data, DCD, DDML, SOX, XML-Schema, …
Non-W3C proposals: Assertion Grammars, Schematron, DSD, TREX, RELAX, XDuce, RELAX-NG, …
Different applications have different needs from a schema language.

18 By Rick Jelliffe

19 Expressive Power (content model)
Figure: content-model expressive power, from least to most expressive: DTD, XML-Schema, XDuce / RELAX-NG.
Source: “Taxonomy of XML Schema Languages using Formal Language Theory”, Makoto Murata, Dongwon Lee, Murali Mani, Kohsuke Kawaguchi, ACM Trans. on Internet Technology (TOIT), Vol. 5, No. 4, pp. 1-45, November 2005

20 Closure (content model)
DTD, XML-Schema: closed under INTERSECT
XDuce, RELAX-NG: closed under INTERSECT, UNION, DIFFERENCE

21 DTD: Document Type Definition
XML DTD is a subset of SGML DTD.
XML DTD is the standard XML schema language of the past (and maybe of the present…).
It is one of the simplest and least expressive schema languages proposed for the XML model.
It does not use XML tag notation, but its own (rather odd) notation.
It cannot express relatively complex constraints (e.g., keys with scope) well.
It is being replaced by W3C's XML-Schema and OASIS's RELAX-NG.

22 DTD: Elements
<!ELEMENT element-name content-model>
Associates a content model with all elements of the given name.
Content models:
EMPTY: no content is allowed
ANY: any content is allowed
Mixed content: (#PCDATA | e1 | … | en)*, an arbitrary sequence of character data and the listed elements

23 DTD: Elements
E.g., a “Name” element consists of an optional FirstName element followed by a mandatory LastName element, where both are text strings:
<!ELEMENT Name (FirstName?, LastName)>
<!ELEMENT FirstName (#PCDATA)>
<!ELEMENT LastName (#PCDATA)>
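Since a DTD content model is a regular expression over child element names, the Name rule can be checked by hand. A minimal sketch using Python's xml.etree.ElementTree, which parses XML but does not validate against DTDs; name_is_valid and NAME_MODEL are hypothetical names introduced for illustration.

```python
import re
import xml.etree.ElementTree as ET

# <!ELEMENT Name (FirstName?, LastName)> as a regex over child tokens "<tag>".
NAME_MODEL = re.compile(r"(<FirstName>)?<LastName>")

def name_is_valid(xml_text):
    """Check one <Name> element against the three declarations on the slide."""
    name = ET.fromstring(xml_text)
    if name.tag != "Name":
        return False
    children = list(name)
    # Element content: only whitespace may appear between the children.
    if any(t and t.strip() for t in [name.text] + [c.tail for c in children]):
        return False
    # Children must match (FirstName?, LastName), in this order.
    if NAME_MODEL.fullmatch("".join(f"<{c.tag}>" for c in children)) is None:
        return False
    # FirstName and LastName are (#PCDATA): text only, no sub-elements.
    return all(len(c) == 0 for c in children)
```

A real validating parser does the same work for every declared element, generically, from the DTD itself.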

24 DTD: Attributes
<!ATTLIST element-name attr-name attr-type attr-default ...>
Declares which attributes are allowed or required in which elements.
Attribute types:
CDATA: any value is allowed (the default)
(value|...): enumeration of allowed values
ID, IDREF, IDREFS: ID attribute values must be unique (they carry "element identity"); IDREF attribute values must match some ID (a reference to an element)
ENTITY, ENTITIES, NMTOKEN, NMTOKENS, NOTATION: consider them obsolete…

25 DTD: Attributes
Attribute defaults:
#REQUIRED: the attribute must be explicitly provided
#IMPLIED: the attribute is optional; no default is provided
"value": if not explicitly provided, this value is inserted by default
#FIXED "value": as above, but only this value is allowed
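The four defaults can be read as a small procedure a validating parser runs for each element. The sketch below is hypothetical: the function, the (kind, value) encoding and the sample declarations are invented for illustration, and attribute types are ignored.

```python
# A declaration maps an attribute name to (kind, value); value is None when unused.
REQUIRED, IMPLIED, DEFAULT, FIXED = "#REQUIRED", "#IMPLIED", "default", "#FIXED"

def apply_defaults(declared, attrs):
    """Return attrs with defaults inserted; raise ValueError on a violation."""
    result = dict(attrs)
    for name, (kind, value) in declared.items():
        if name not in attrs:
            if kind == REQUIRED:
                raise ValueError(f"missing required attribute {name!r}")
            if kind in (DEFAULT, FIXED):
                result[name] = value  # value inserted by default
            # IMPLIED: optional and no default, so nothing is inserted
        elif kind == FIXED and attrs[name] != value:
            raise ValueError(f"{name!r} is #FIXED to {value!r}")
    return result

# Hypothetical declarations for some element.
DECLARED = {
    "id": (REQUIRED, None),
    "note": (IMPLIED, None),
    "lang": (DEFAULT, "en"),
    "version": (FIXED, "1.0"),
}
print(apply_defaults(DECLARED, {"id": "x1"}))
# {'id': 'x1', 'lang': 'en', 'version': '1.0'}
```

This is the "defaults have been inserted" guarantee from slide 11: downstream code sees lang and version even when the document omits them.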

26 DTD: Attributes
E.g., a “Name” element has an optional FirstName attribute and a mandatory LastName attribute, both text strings:
<!ELEMENT Name EMPTY>
<!ATTLIST Name
  FirstName CDATA #IMPLIED
  LastName  CDATA #REQUIRED>

27 DTD: Attributes ID vs. IDREF/IDREFS
ID: document-wide unique ID (like a key in a DB)
IDREF: referring attribute (like a foreign key in a DB)
<!ELEMENT employee (…)>
<!ATTLIST employee
  eID  ID    #REQUIRED
  boss IDREF #IMPLIED>
<employee eID="a">…</employee> ….
<employee eID="b" boss="a">…</employee>
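What a validator checks for ID and IDREF can be sketched in a few lines. Assumptions: xml.etree.ElementTree does not read the DTD, so the ID and IDREF attribute names are passed in explicitly, and the staff root element is invented to make the fragment a complete document.

```python
import xml.etree.ElementTree as ET

def check_ids(root, id_attr="eID", ref_attrs=("boss",)):
    """Return "ok", or a message naming the first ID/IDREF violation."""
    ids = set()
    for elem in root.iter():
        value = elem.get(id_attr)
        if value is None:
            continue
        if value in ids:
            return f"duplicate ID {value!r}"  # IDs are unique document-wide
        ids.add(value)
    for elem in root.iter():
        for attr in ref_attrs:
            # split() also handles IDREFS, a whitespace-separated list of IDs.
            for ref in (elem.get(attr) or "").split():
                if ref not in ids:
                    return f"dangling IDREF {ref!r}"
    return "ok"

doc = ET.fromstring('<staff><employee eID="a"/><employee eID="b" boss="a"/></staff>')
print(check_ids(doc))  # ok
```

The two passes mirror the key/foreign-key analogy: first collect the keys, then verify every reference points at one.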

28 DTD Example
XML document that conforms to “event.dtd”:
<?xml version="1.0"?>
<!DOCTYPE event SYSTEM "../../dir/event.dtd">
<event eID="sigmod02">
  <acronym>SIGMOD</acronym>
  <society>ACM</society>
  <url> </url>
  <loc>
    <city>Madison</city>
    <state>WI</state>
  </loc>
  <year>2002</year>
</event>

29 DTD: Example // event.dtd
<!ELEMENT event (acronym, society*, url?, loc, year)>
<!ATTLIST event eID ID #REQUIRED>
<!ELEMENT acronym (#PCDATA)>
<!ELEMENT society (#PCDATA)>
<!ELEMENT url (#PCDATA)>
<!ELEMENT loc (city, state)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ELEMENT year (#PCDATA)>

30 DTD: Type Declaration
Associates a DTD schema with an XML document; the declaration goes at the beginning of the XML document.
// DTD File: event.dtd
<!ELEMENT …. >
// XML File: event.xml
<?xml version="1.0"?>
<!DOCTYPE event SYSTEM “

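31 Exercise: Citation
ER diagram. Entities: Authors (*AID, Name, Title), Papers (*PID, Area), Venues (*Vname, Year, City). Relationships: Publish (Authors, Papers), Attend (Authors, Venues), Appear (Papers, Venues). * marks a key attribute.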

32 Exercise: Citation
To RDBMS:
Authors(*AID, Name, Title)
Papers(*PID, Area, *Vname)
Venues(*Vname, Year, City)
Publish(*AID, *PID)
Attend(*AID, *Vname)
Appear(*PID, *Vname)

33 Exercise: Citation
Relational → XML:
Create a dummy root element.
Make entities 1st-level children of the root.
Make the columns of entities attributes of those elements.
Represent relationships as attributes.
A many-to-many relationship can be stored as a multi-valued attribute, without the 1st-normal-form concerns of an RDBMS: one => IDREF, many => IDREFS.
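The mapping steps can be sketched with xml.etree.ElementTree. The rows are hypothetical stand-ins taken from the exercise narrative on the next slides (Venue cities are omitted because the slides elide them); a real conversion would read the tables from the RDBMS.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows from the exercise narrative.
authors = [("1", "Lee", "Prof."), ("2", "John", "Prof.")]  # (AID, Name, Title)
papers = [("p1", "DB", "X"), ("p2", "DB", "Y")]            # (PID, Area, Vname)
venues = [("X", "2006"), ("Y", "2006")]                    # (Vname, Year)
publish = [("1", "p1"), ("1", "p2"), ("2", "p2")]          # many-to-many (AID, PID)
attend = [("1", "Y"), ("2", "X")]                          # many-to-many (AID, Vname)

root = ET.Element("Dummy")  # step 1: dummy root element
for aid, name, title in authors:
    # Steps 2-3: entity -> 1st-level child, columns -> attributes.
    author = ET.SubElement(root, "Author", AID=aid, Name=name, Title=title)
    # Step 4: "many" relationships -> IDREFS (space-separated ID lists).
    author.set("Publish", " ".join(p for a, p in publish if a == aid))
    author.set("Attend", " ".join(v for a, v in attend if a == aid))
for pid, area, vname in papers:
    # "One" relationship (a paper appears in one venue) -> IDREF.
    ET.SubElement(root, "Paper", PID=pid, Area=area, Vname=vname)
for vname, year in venues:
    ET.SubElement(root, "Venue", Vname=vname, Year=year)

print(ET.tostring(root, encoding="unicode"))
```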

34 Exercise: Citation
“Lee” publishes two papers, “p1” and “p2”, which appear in venues “X” and “Y” in 2006, respectively, and attends only “Y”. “p2” is co-authored by “John”, who attends “X”.
<Author AID='1' Name='Lee' Title='Prof.'/>
<Author AID='2' Name='John' Title='Prof.'/>
<Paper PID='p1' Area='DB'/>
<Paper PID='p2' Area='DB'/>
<Venue Vname='X' Year='2006' … />
<Venue Vname='Y' Year='2006' … />

35 Exercise: Citation
“Lee” publishes two papers, “p1” and “p2”, which appear in venues “X” and “Y” in 2006, respectively, and attends only “Y”. “p2” is co-authored by “John”, who attends “X”.
<Author AID='1' Name='Lee' Title='Prof.'/>
<Author AID='2' Name='John' Title='Prof.'/>
<Paper PID='p1' Area='DB' Vname='X' />
<Paper PID='p2' Area='DB' Vname='Y' />
<Venue Vname='X' Year='2006' … />
<Venue Vname='Y' Year='2006' … />

36 Exercise: Citation
“Lee” publishes two papers, “p1” and “p2”, which appear in venues “X” and “Y” in 2006, respectively, and attends only “Y”. “p2” is co-authored by “John”, who attends “X”.
<Author AID='1' Name='Lee' Title='Prof.' Publish='p1 p2' Attend='Y' />
<Author AID='2' Name='John' Title='Prof.' Publish='p2' Attend='X' />
<Paper PID='p1' Area='DB' Vname='X' />
<Paper PID='p2' Area='DB' Vname='Y' />
<Venue Vname='X' Year='2006' … />
<Venue Vname='Y' Year='2006' … />

37 Exercise: Citation
<Dummy>
  <Author AID='1' Name='Lee' Title='Prof.' Publish='p1 p2' Attend='Y' />
  <Author AID='2' Name='John' Title='Prof.' Publish='p2' Attend='X' />
  <Paper PID='p1' Area='DB' Vname='X' />
  <Paper PID='p2' Area='DB' Vname='Y' />
  <Venue Vname='X' Year='2006' … />
  <Venue Vname='Y' Year='2006' … />
</Dummy>

38 Exercise: Citation
<!ELEMENT Dummy (Author*, Paper*, Venue*)>
<!ELEMENT Author EMPTY>
<!ATTLIST Author
  AID     ID     #REQUIRED
  Name    CDATA  #IMPLIED
  Title   CDATA  #IMPLIED
  Publish IDREFS #IMPLIED
  Attend  IDREFS #IMPLIED>
<!ELEMENT Paper EMPTY>
<!ATTLIST Paper
  PID   ID    #REQUIRED
  Area  CDATA #IMPLIED
  Vname IDREF #REQUIRED>
<!ELEMENT Venue EMPTY>
<!ATTLIST Venue
  Vname ID    #REQUIRED
  Year  CDATA #IMPLIED
  City  CDATA #IMPLIED>

39 XML Schema
New XML schema language from W3C; successor of DTD
Unlike DTD, XML Schema is written in XML syntax:
<xsd:complexType name="PurchaseOrderType">
  <xsd:sequence>
    <xsd:element name="shipTo" type="USAddress"/>
    <xsd:element name="billTo" type="USAddress"/>
    <xsd:element ref="comment" minOccurs="0"/>
    <xsd:element name="items" type="Items"/>
  </xsd:sequence>
  <xsd:attribute name="orderDate" type="xsd:date"/>
</xsd:complexType>

40 XML Schema vs. DTD: What’s New
XML Schemas are extensible to future additions: XML Schema 1.0 → 1.1 → …
XML Schemas are richer and more powerful than DTDs.
XML Schemas are written in XML: no <!ELEMENT …> or <!ATTLIST ..> notation.
XML Schemas support data types.
XML Schemas support namespaces.

41 New: Data Types
XML Schema supports data types. They make it easier to:
Describe allowable document content
Validate the correctness of data
Work with data from a database
Define data facets (restrictions on data)
Define data patterns (data formats)
Convert data between different data types
E.g., <date type="date"> </date>
This ensures a mutual understanding of the content: the XML data type "date" requires the format “YYYY-MM-DD”.
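As an illustration of the "date" format rule, here is a sketch of a lexical check for YYYY-MM-DD in Python. It is a deliberate simplification: the full xs:date type also permits an optional time zone, negative years and years with more than four digits, none of which are handled; is_basic_xs_date is a hypothetical name.

```python
import re
from datetime import date

# Basic xs:date shape YYYY-MM-DD, ASCII digits only.
# Assumption: time zones, signs and 5+ digit years are out of scope.
BASIC_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")

def is_basic_xs_date(text):
    """True if text has the YYYY-MM-DD shape and names a real calendar day."""
    if BASIC_DATE.fullmatch(text) is None:
        return False
    try:
        date.fromisoformat(text)  # rejects e.g. month 13 or February 30
    except ValueError:
        return False
    return True
```

Checking both the shape and the calendar mirrors the two layers of a data type: its lexical form and its value.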

42 New: in XML Notation
XML Schema uses XML notation: <> and </>.
An XML Schema file is itself an XML file, too:
No need to learn a new language
No need to use new tools
Use an XML editor to edit XML Schema files
Use an XML parser to parse XML Schema files
Manipulate an XML Schema using DOM
Transform an XML Schema with XSLT

43 New: Extensibility
XML Schema is extensible because XML is extensible. XML Schema lets you:
Reuse your schema in other schemas
Create your own data types derived from the standard types (inheritance)
Reference multiple schemas in the same document

44 Well-Formed: Not Enough
Well-formed: a document conforms to XML syntax rules such as:
Begins with an XML declaration (recommended, though optional in XML 1.0)
Has exactly one root element
Case-sensitive tag names
Matching start and end tags
Properly nested elements
Well-formed documents can still contain semantic errors or inconsistencies → we need documents that are VALID according to a schema
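The gap between well-formed and valid shows up with any non-validating parser. A sketch using Python's xml.etree.ElementTree, which checks syntax only; the Trip fragments echo the nvML example, with the mode "space" invented here to break Rule 3.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Syntax check only: True if the text parses as XML. No schema is consulted."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Unclosed <Transportation> before </Trip>: not well-formed.
broken = "<Trip><Transportation>airplane<Transportation></Trip>"
# Well-formed, yet it breaks an nvML rule the parser cannot know:
# "space" is not one of the modes air, water, ground.
off_model = '<Trip segment="1" mode="space"><Transportation>rocket</Transportation></Trip>'

print(is_well_formed(broken), is_well_formed(off_model))  # False True
```

Catching the second document requires a schema that lists the allowed modes, which is the job of a DTD or XML Schema.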

45 note.xml
<?xml version="1.0"?>
// Reference to schema goes here
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

46 note.dtd <!ELEMENT note (to, from, heading, body)> <!ELEMENT to (#PCDATA)> <!ELEMENT from (#PCDATA)> <!ELEMENT heading (#PCDATA)> <!ELEMENT body (#PCDATA)>

47 note.xml with Reference to DTD
<?xml version="1.0"?> <!DOCTYPE note SYSTEM " <note> <to>Tove</to> <from>Jani</from> <heading>Reminder</heading> <body>Don't forget me this weekend!</body> </note>

48 note.xsd <?xml version="1.0"?> <xs:schema xmlns:xs= “ targetNamespace= “ xmlns= “ elementFormDefault= "qualified"> <xs:element name="note"> <xs:complexType> <xs:sequence> <xs:element name="to" type="xs:string"/> <xs:element name="from" type="xs:string"/> <xs:element name="heading" type="xs:string"/> <xs:element name="body" type="xs:string"/> </xs:sequence> </xs:complexType> </xs:element> </xs:schema>

49 <schema> element
<?xml version="1.0"?> <xs:schema xmlns:xs = “ targetNamespace = “ xmlns = “ elementFormDefault= "qualified"> </xs:schema> <schema> element is the root element of every XML Schema

50 <schema> element
<?xml version="1.0"?> <xs:schema xmlns:xs = “ targetNamespace = “ xmlns = “ elementFormDefault= "qualified"> </xs:schema> Elements & data types in this schema file come from namespace They are to be prefixed with “xs:”

51 <schema> element
<?xml version="1.0"?> <xs:schema xmlns:xs = “ targetNamespace = “ xmlns = “ elementFormDefault= "qualified"> </xs:schema> Indicates that the elements defined by this schema (e.g., note, to, from, heading, body) come from the target namespace

52 <schema> element
<?xml version="1.0"?> <xs:schema xmlns:xs = “ targetNamespace = “ xmlns = “ elementFormDefault= "qualified"> </xs:schema> Default namespace

53 <schema> element
<?xml version="1.0"?> <xs:schema xmlns:xs = “ targetNamespace = “ xmlns = “ elementFormDefault= "qualified"> </xs:schema> Any elements used by the XML instance document which were declared in this schema must be namespace qualified.

54 note.xml with Reference to XML Schema
<?xml version="1.0"?> <note xmlns=" xmlns:xsi=" xsi:schemaLocation=" note.xsd”> <to>Tove</to> <from>Jani</from> <heading>Reminder</heading> <body>Don't forget me this weekend!</body> </note>

55 note.xml with Reference to XML Schema
<?xml version="1.0"?> <note xmlns=" xmlns:xsi=" xsi:schemaLocation=" note.xsd”> <to>Tove</to> <from>Jani</from> <heading>Reminder</heading> <body>Don't forget me this weekend!</body> </note> Default namespace for the “note.xml” file Tell schema validator that all the elements used in “note.xml” file are declared in this namespace

56 note.xml with Reference to XML Schema
<?xml version="1.0"?> <note xmlns=" xmlns:xsi=" xsi:schemaLocation=" note.xsd”> <to>Tove</to> <from>Jani</from> <heading>Reminder</heading> <body>Don't forget me this weekend!</body> </note> Once the XML Schema Instance namespace is available, the schemaLocation attribute can be used

57 note.xml with Reference to XML Schema
<?xml version="1.0"?> <note xmlns=" xmlns:xsi=" xsi:schemaLocation=" note.xsd”> <to>Tove</to> <from>Jani</from> <heading>Reminder</heading> <body>Don't forget me this weekend!</body> </note> First value: the namespace to use Second value: the name/location of the XML schema to use for that namespace
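Because the xsi:schemaLocation value is a whitespace-separated list of namespace/location pairs, a tool can read it as below. A sketch with Python's xml.etree.ElementTree; the namespace https://example.com/note is a hypothetical placeholder, not the one used on the slide, and schema_locations is an invented helper.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"  # XML Schema Instance namespace

def schema_locations(xml_text):
    """Return the (namespace, location) pairs from the root's xsi:schemaLocation."""
    root = ET.fromstring(xml_text)
    tokens = root.get(f"{{{XSI}}}schemaLocation", "").split()
    # Read the list two tokens at a time: namespace first, then location.
    return list(zip(tokens[0::2], tokens[1::2]))

# Hypothetical namespace URI used only for this sketch.
doc = f'''<note xmlns="https://example.com/note"
      xmlns:xsi="{XSI}"
      xsi:schemaLocation="https://example.com/note note.xsd">
  <to>Tove</to>
</note>'''
print(schema_locations(doc))  # [('https://example.com/note', 'note.xsd')]
```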

58 Main Features
XML Schema defines elements.
Simple elements:
Contain only “text”: no sub-elements or attributes
The “text” can be of different types: types built into XML Schema (e.g., boolean, string, date) or user-defined types
Restrictions (facets) can be added to a data type to limit its content

59 Simple Element
<xs:element name="xxx" type="yyy"/>
“xxx”: the name of the element
“yyy”: the data type of the element
Common built-in types in XML Schema: xs:string, xs:decimal, xs:integer, xs:boolean, xs:date, xs:time

60 Simple Element Some simple XML elements:
<lastname>Refsnes</lastname> <age>36</age> <dateborn> </dateborn> Corresponding simple element definitions: <xs:element name="lastname" type="xs:string"/> <xs:element name="age" type="xs:integer"/> <xs:element name="dateborn" type="xs:date"/>

61 Simple Element Simple elements may have a default value OR a fixed value specified. Default value is automatically assigned to the element when no other value is specified <xs:element name="color" type="xs:string" default="red"/> Fixed value is also automatically assigned to the element, and one cannot specify another value <xs:element name="color" type="xs:string" fixed="red"/>

62 <xs:attribute>
The syntax for defining an attribute is: <xs:attribute name="xxx" type="yyy"/> Where xxx is the name of the attribute and yyy specifies the data type of the attribute. Simple elements can’t have attributes!

63 <xs:attribute>
An XML element with an attribute: <lastname lang="EN">Smith</lastname>
Corresponding attribute definition: <xs:attribute name="lang" type="xs:string"/>
Attributes can have default or fixed values. If the attribute is required, add use="required".

64 Conforming to Types When an XML element or attribute has a data type defined, it puts restrictions on the element's or attribute's content If an XML element is of type "xs:date" and contains a string like "Hello World", the element will not validate With XML Schemas, you can also add your own restrictions to your XML elements and attributes

65 Constraining User-Defined Types
Defines an element called "age" with a restriction The value of age cannot be lower than 0 or greater than 120 <xs:element name="age"> <xs:simpleType> <xs:restriction base="xs:integer"> <xs:minInclusive value="0"/> <xs:maxInclusive value="120"/> </xs:restriction> </xs:simpleType> </xs:element>
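The two facets amount to two checks: is the text an xs:integer at all, and does its value lie in [0, 120]? A minimal sketch, assuming only the plain lexical form (optional sign, decimal digits); valid_age is a hypothetical name.

```python
import re

# Base type xs:integer: optional sign followed by decimal digits (plain form only).
INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")

def valid_age(text, min_inclusive=0, max_inclusive=120):
    """Check text against xs:integer restricted by minInclusive/maxInclusive."""
    value = text.strip(" \t\r\n")  # integer types collapse surrounding whitespace
    if INTEGER_LEXICAL.fullmatch(value) is None:
        return False               # not an xs:integer, so the facets never apply
    return min_inclusive <= int(value) <= max_inclusive
```

Keeping the base-type check separate from the facet check matches how a restriction narrows an existing type rather than defining a new one from scratch.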

66 Constraining User-Defined Types
Defines an element called "car" with a restriction The only acceptable values are: Audi, Golf, BMW: <xs:element name="car" type="carType"/> <xs:simpleType name="carType"> <xs:restriction base="xs:string"> <xs:enumeration value="Audi"/> <xs:enumeration value="Golf"/> <xs:enumeration value="BMW"/> </xs:restriction> </xs:simpleType> Note: In this case the type "carType" can be used by other elements because it is not a part of the "car" element.
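The reuse point can be mirrored in code: the restricted type is one object, and several element declarations refer to it. A sketch under the assumption that only enumeration facets matter; EnumeratedString, ELEMENT_TYPES and the second element rentalCar are invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EnumeratedString:
    """A named simple type: xs:string restricted by enumeration facets."""
    name: str
    values: frozenset

    def accepts(self, text: str) -> bool:
        # xs:string preserves whitespace, so " Audi" is a different value.
        return text in self.values

# Defined once at top level, like <xs:simpleType name="carType">.
CAR_TYPE = EnumeratedString("carType", frozenset({"Audi", "Golf", "BMW"}))

# "car" is from the slide; "rentalCar" is hypothetical, showing reuse of the type.
ELEMENT_TYPES = {"car": CAR_TYPE, "rentalCar": CAR_TYPE}

def valid_element(tag, text):
    return ELEMENT_TYPES[tag].accepts(text)
```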

67 Complex Element What is a Complex Element?
A complex element is an XML element that contains other elements and/or attributes. There are four kinds of complex elements: empty elements elements that contain only other elements elements that contain only text elements that contain both other elements and text Note: Each of these elements may contain attributes as well!

68 Complex Element: Type 1 A complex XML element, "product", which is empty: <product pid="1345"/>

69 Complex Element: Type 2 A complex XML element, "employee", which contains only other elements: <employee> <firstname>John</firstname> <lastname>Smith</lastname> </employee>

70 Complex Element: Type 3 A complex XML element, "food", which contains only text: <food type="dessert">Ice cream</food>

71 Complex Element: Type 4 A complex XML element, "description", which contains both elements and text: <description> It happened on <date lang="norwegian"> </date> .... </description>

72 Eg, Define a Complex Element
Type 2: element with only sub-elements <employee> <firstname>John</firstname> <lastname>Smith</lastname> </employee>

73 Eg, Define a Complex Element
Method 1: no re-use foreseen
<xs:element name="employee">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="firstname" type="xs:string"/>
      <xs:element name="lastname" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

74 Eg, Define a Complex Element
Method 2: the named type “myInfo” can be reused (a named type is declared at the top level, not inside the element)
<xs:element name="employee" type="myInfo"/>
<xs:complexType name="myInfo">
  <xs:sequence>
    <xs:element name="firstname" type="xs:string"/>
    <xs:element name="lastname" type="xs:string"/>
  </xs:sequence>
</xs:complexType>

75 Eg, Define a Complex Element
Method 2: 3 elements can reuse “myInfo” type <xs:element name="employee" type="myInfo"/> <xs:element name="student" type="myInfo"/> <xs:element name="member" type="myInfo"/> <xs:complexType name="myInfo"> <xs:sequence> <xs:element name="firstname" type="xs:string"/> <xs:element name="lastname" type="xs:string"/> </xs:sequence> </xs:complexType>

76 Indicators
Order: All (child elements in any order, each at most once); Choice (either A or B occurs); Sequence (children appear in a specific order)
Occurrence: maxOccurs, minOccurs
Group: group name, attributeGroup name

77 Indicators Example <?xml version="1.0" encoding="ISO "?> <persons xmlns:xsi=" xsi:noNamespaceSchemaLocation="family.xsd” <person> <full_name>Hege Refsnes</full_name> <child_name>Cecilie</child_name> </person> <person> <full_name>Tove Refsnes</full_name> <child_name>Hege</child_name> <child_name>Stale</child_name> <child_name>Jim</child_name> <child_name>Borge</child_name> </person> <person> <full_name>Stale Refsnes</full_name> </person> </persons>

78 Indicators Example <?xml version="1.0" encoding="ISO "?> <xs:schema xmlns:xs=" elementFormDefault="qualified"> <xs:element name="persons">   <xs:complexType>     <xs:sequence>       <xs:element name="person" maxOccurs="unbounded">         <xs:complexType>           <xs:sequence>             <xs:element name="full_name" type="xs:string"/>             <xs:element name="child_name" type="xs:string"             minOccurs="0" maxOccurs="5"/>           </xs:sequence>         </xs:complexType>       </xs:element>     </xs:sequence>   </xs:complexType> </xs:element> </xs:schema>
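Occurrence indicators map onto RE repetition: xs:sequence is concatenation, and minOccurs="0" maxOccurs="5" is the bounded repeat {0,5}. A sketch that checks the family example this way with Python's re and xml.etree.ElementTree; the helper names are hypothetical, and namespaces are ignored since the instance uses noNamespaceSchemaLocation.

```python
import re
import xml.etree.ElementTree as ET

# person's content model as a regex over child tokens "<tag>":
# full_name once, then child_name between 0 and 5 times.
PERSON_MODEL = re.compile(r"<full_name>(<child_name>){0,5}")

def person_ok(person):
    tokens = "".join(f"<{child.tag}>" for child in person)
    return PERSON_MODEL.fullmatch(tokens) is not None

def persons_ok(xml_text):
    root = ET.fromstring(xml_text)
    people = root.findall("person")
    # person: maxOccurs="unbounded" with the default minOccurs="1";
    # persons may contain nothing but person elements.
    return (root.tag == "persons" and len(people) >= 1
            and len(people) == len(root) and all(person_ok(p) for p in people))
```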

79 DTD vs. XML Schema
<!ELEMENT e1 ((e2,e3?)+|e4)>
<element name="e1">
  <complexType>
    <choice>
      <sequence maxOccurs="unbounded">
        <element ref="e2"/>
        <element ref="e3" minOccurs="0"/>
      </sequence>
      <element ref="e4"/>
    </choice>
  </complexType>
</element>

80 Schema Validation

81 Schema Validation: DTD

82 Schema Validation: XML Schema

83 Schema Editors
Many schema editors are available on the Web, some open-source or shareware. E.g., XMLSpy (a commercial product):
Visual schema writing (DTD, XML Schema)
And more: XPath, XQuery, XSLT, WSDL, SOAP

84 Your Project #1 If your project is to use XML data at some point
Then you should define your schema using either DTD or XML Schema And validate all your XML data according to your schema Include this aspect as part of your project #1 tasks

85 Lab #1 (DUE: Sep. 15) Individual Lab
Given XML files, infer a DTD and an XML Schema.
Validate them using W3C’s schema validator, accessible from the Web.
Submit: the DTD and XML Schema files, and screenshots showing that validation succeeded.

86 References
An Introduction to XML and Web Technologies, Anders Møller and Michael I. Schwartzbach, Addison-Wesley, 2006
W3Schools XML Schema Tutorial (many of the XML Schema slides in the 2nd half are modified from W3Schools materials)
Zvon XML Schema Tutorial

