Published by Ethan Sharp. Modified over 9 years ago.
1
XML I
- Structure of XML Data
- XML Document Schema
- XPath
2
Introduction
- XML: Extensible Markup Language
- Defined by the WWW Consortium (W3C)
- Derived from SGML (Standard Generalized Markup Language), but simpler to use than SGML
- Documents have tags giving extra information about sections of the document
  - E.g. XML Introduction …
- Extensible, unlike HTML
  - Users can add new tags, and separately specify how the tag should be handled for display
3
XML Introduction (Cont.)
- The ability to specify new tags and to create nested tag structures makes XML a great way to exchange data, not just documents.
  - Much of the use of XML has been in data exchange applications, not as a replacement for HTML
- Tags make data (relatively) self-documenting
  - E.g. A-101, Downtown, 500; A-101, Johnson
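To illustrate the self-documenting point, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. Only the values (A-101, Downtown, 500, Johnson) come from the slide. The tag names bank, account, depositor, account_number, branch_name, balance and customer_name are assumptions made for this example, because the markup of the slide's example did not survive.

```python
import xml.etree.ElementTree as ET

# Tag names below are assumed for illustration; only the values are from the slide.
BANK_XML = """
<bank>
  <account>
    <account_number>A-101</account_number>
    <branch_name>Downtown</branch_name>
    <balance>500</balance>
  </account>
  <depositor>
    <account_number>A-101</account_number>
    <customer_name>Johnson</customer_name>
  </depositor>
</bank>
"""

bank = ET.fromstring(BANK_XML)


def describe(record):
    """Turn one record element into a {tag: text} dict, using only its tags."""
    return {field.tag: field.text for field in record}


# The receiver needs no separate schema to label the values: the tags do it.
records = [(record.tag, describe(record)) for record in bank]
for kind, fields in records:
    print(kind, fields)
```

The receiving side recovers field names directly from the data, which is what "self-documenting" means here.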
4
XML: Motivation
- Data interchange is critical in today’s networked world
  - Examples:
    - Banking: funds transfer
    - Order processing (especially inter-company orders)
    - Scientific data
      - Chemistry: ChemML, …
      - Genetics: BSML (Bio-Sequence Markup Language), …
  - Paper flow of information between organizations is being replaced by electronic flow of information
- Each application area has its own set of standards for representing information
- XML has become the basis for all new generation data interchange formats
5
XML Motivation (Cont.)
- Earlier generation formats were based on plain text with line headers indicating the meaning of fields
  - Similar in concept to email headers
  - Do not allow for nested structures, and have no standard “type” language
  - Tied too closely to low-level document structure (lines, spaces, etc.)
- Each XML-based standard defines what the valid elements are, using
  - XML type specification languages to specify the syntax
    - DTD (Document Type Definition)
    - XML Schema
  - Plus textual descriptions of the semantics
- XML allows new tags to be defined as required
  - However, this may be constrained by DTDs
- A wide variety of tools is available for parsing, browsing and querying XML documents/data
6
Comparison with Relational Data
- Inefficient: tags, which in effect represent schema information, are repeated
- Better than relational tuples as a data-exchange format
  - Unlike relational tuples, XML data is self-documenting due to the presence of tags
  - Non-rigid format: tags can be added
  - Allows nested structures
  - Wide acceptance, not only in database systems, but also in browsers, tools, and applications
7
Structure of XML Data
- Tag: label for a section of data
- Element: section of data beginning with <tagname> and ending with matching </tagname>
- Elements must be properly nested
  - Proper nesting: an inner element is closed before the element that contains it
  - Improper nesting: an outer element is closed while an inner element is still open
  - Formally: every start tag must have a unique matching end tag that is in the context of the same parent element.
- Every document must have a single top-level element
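The nesting rules can be checked mechanically. The sketch below, assuming Python's standard xml.etree.ElementTree parser, treats "parses without error" as the test for well-formedness. The account and balance tag names are placeholders chosen for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Inner element closed before the outer one: proper nesting.
proper = "<account><balance>500</balance></account>"
# Outer element closed while the inner one is still open: improper nesting.
improper = "<account><balance>500</account></balance>"
# Two top-level elements: violates the single top-level element rule.
two_roots = "<account/><account/>"

for label, text in [("proper", proper), ("improper", improper), ("two_roots", two_roots)]:
    print(label, is_well_formed(text))
```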
8
Example of Nested Elements
- A customer (Hayes, Main, Harrison) containing a nested account (A-102, Perryridge, 400), …
9
Motivation for Nesting
- Nesting of data is useful in data transfer
  - Example: elements representing customer_id, customer_name, and address nested within an order element
- Nesting is not supported, or is discouraged, in relational databases
  - With multiple orders, customer name and address are stored redundantly
  - Normalization replaces the nested structures in each order by a foreign key into a table storing customer name and address information
  - Nesting is supported in object-relational databases
- But nesting is appropriate when transferring data
  - External application does not have direct access to data referenced by a foreign key
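As a rough sketch of the trade-off, the Python below starts from hypothetical normalized tables (the customers and orders dictionaries, their keys and their values are all invented for the example) and resolves the foreign key while building each order element, so that the exported XML is self-contained for an external receiver.

```python
import xml.etree.ElementTree as ET

# Hypothetical normalized data: each order refers to a customer by foreign key.
customers = {"C1": {"customer_name": "Hayes", "address": "Main, Harrison"}}
orders = [
    {"order_id": "O1", "customer_id": "C1"},
    {"order_id": "O2", "customer_id": "C1"},
]


def order_to_xml(order):
    """Build an <order> element with the customer data nested inside it."""
    elem = ET.Element("order")
    ET.SubElement(elem, "order_id").text = order["order_id"]
    ET.SubElement(elem, "customer_id").text = order["customer_id"]
    # Resolve the foreign key now: the receiver cannot look it up later.
    for field, value in customers[order["customer_id"]].items():
        ET.SubElement(elem, field).text = value
    return elem


batch = ET.Element("orders")
for order in orders:
    batch.append(order_to_xml(order))

xml_text = ET.tostring(batch, encoding="unicode")
print(xml_text)
```

The customer name now appears once per order, which is the redundancy that normalization avoids in storage but that is acceptable in a transfer format.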
10
Structure of XML Data (Cont.)
- Mixture of text with sub-elements is legal in XML.
  - Example: an element containing the text “This account is seldom used any more.” followed by sub-elements holding A-102, Perryridge and 400
- Useful for document markup, but discouraged for data representation
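A short sketch of how mixed content looks to a program, assuming Python's xml.etree.ElementTree. The sentence and the three values are from the slide; the tag names are assumed for illustration.

```python
import xml.etree.ElementTree as ET

# Tag names are assumptions; the text and values follow the slide's example.
mixed = ET.fromstring(
    "<account>This account is seldom used any more."
    "<account_number>A-102</account_number>"
    "<branch_name>Perryridge</branch_name>"
    "<balance>400</balance>"
    "</account>"
)

# ElementTree exposes the text before the first sub-element as .text.
note = mixed.text
# The sub-elements are still available as ordinary children.
fields = {child.tag: child.text for child in mixed}
# itertext() walks every piece of text in document order.
all_text = "".join(mixed.itertext())

print(note)
print(fields)
```

Code that expects pure data records has to handle the free text separately, which is one reason mixed content is discouraged for data representation.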
11
Attributes
- Elements can have attributes
  - E.g. A-102, Perryridge, 400
- Attributes are specified by name=value pairs inside the starting tag of an element
- An element may have several attributes, but each attribute name can only occur once
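The two rules about attributes can be observed with Python's xml.etree.ElementTree. The attribute names acct_type and currency and their values are hypothetical, chosen only for this sketch.

```python
import xml.etree.ElementTree as ET

# Several attributes on one start tag, written as name="value" pairs.
account = ET.fromstring(
    '<account acct_type="checking" currency="USD">'
    "<account_number>A-102</account_number>"
    "</account>"
)
attributes = dict(account.attrib)


def parses(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Repeating an attribute name inside one start tag is rejected by the parser.
duplicate_ok = parses('<account acct_type="checking" acct_type="savings"/>')

print(attributes, duplicate_ok)
```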
12
Attributes vs. Subelements
- Distinction between subelement and attribute
  - In the context of documents, attributes are part of markup, while subelement contents are part of the basic document contents
  - In the context of data representation, the difference is unclear and may be confusing
    - Same information can be represented in two ways: as an attribute in the start tag, or as a subelement (e.g. the value A-101)
  - Suggestion: use attributes for identifiers of elements, and use subelements for contents
13
Namespaces
- XML data has to be exchanged between organizations
- Same tag name may have different meaning in different organizations, causing confusion on exchanged documents
- Specifying a unique string as an element name avoids confusion
- Better solution: use unique-name:element-name
- Avoid using long unique names all over document by using XML Namespaces
  - E.g. a bank element that binds a short prefix to http://www.FirstBank.com, then uses that prefix on the elements holding Downtown and Brooklyn
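A sketch of how a namespace prefix is declared once and then resolved, assuming Python's xml.etree.ElementTree. The namespace URI and the values Downtown and Brooklyn are from the slide; the prefix FB and the element names branch, branchname and branchcity are assumptions for this example.

```python
import xml.etree.ElementTree as ET

# Prefix and element names are assumed; the URI and values are from the slide.
doc = ET.fromstring(
    '<bank xmlns:FB="http://www.FirstBank.com">'
    "<FB:branch>"
    "<FB:branchname>Downtown</FB:branchname>"
    "<FB:branchcity>Brooklyn</FB:branchcity>"
    "</FB:branch>"
    "</bank>"
)

# The prefix used when querying is local to this code; only the URI must match.
ns = {"fb": "http://www.FirstBank.com"}
branch = doc.find("fb:branch", ns)

# Internally the parser stores the full unique name, not the short prefix.
expanded_tag = branch.tag
city = branch.findtext("fb:branchcity", namespaces=ns)
# An unqualified lookup does not match the namespaced element.
unqualified = doc.find("branch")

print(expanded_tag, city, unqualified)
```

The short prefix keeps the document readable while the long unique name is what actually identifies the element.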
14
More on XML Syntax
- Elements without subelements or text content can be abbreviated by ending the start tag with a /> and deleting the end tag
- To store string data that may contain tags, without the tags being interpreted as subelements, use CDATA as below
  - <![CDATA[ … ]]>
  - Here, the tags inside the CDATA section are treated as just strings
  - CDATA stands for “character data”
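Both pieces of syntax can be tried out with Python's xml.etree.ElementTree. The element names account and note and the CDATA text are made up for this sketch.

```python
import xml.etree.ElementTree as ET

# The abbreviated form and the explicit start/end pair describe the same element.
empty_short = ET.fromstring("<account/>")
empty_long = ET.fromstring("<account></account>")
same_serialization = ET.tostring(empty_short) == ET.tostring(empty_long)

# Inside CDATA, angle brackets are plain characters, not markup.
note = ET.fromstring(
    "<note><![CDATA[<account> is just text here </account>]]></note>"
)
cdata_text = note.text
child_count = len(note)  # no sub-elements were created from the CDATA content

print(same_serialization, repr(cdata_text), child_count)
```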
15
XML Document Schema
- Database schemas constrain what information can be stored, and the data types of stored values
- XML documents are not required to have an associated schema
- However, schemas are very important for XML data exchange
  - Otherwise, a site cannot automatically interpret data received from another site
- Two mechanisms for specifying XML schema
  - Document Type Definition (DTD)
    - Widely used
  - XML Schema
    - Newer, increasing use
16
Document Type Definition (DTD)
- The type of an XML document can be specified using a DTD
- A DTD constrains the structure of XML data
  - What elements can occur
  - What attributes an element can/must have
  - What subelements can/must occur inside each element, and how many times
- A DTD does not constrain data types
  - All values are represented as strings in XML
- DTD syntax
  - <!ELEMENT element (subelements-specification) >
  - <!ATTLIST element (attributes) >
17
Element Specification in DTD
- Subelements can be specified as
  - names of elements, or
  - #PCDATA (parsed character data), i.e., character strings
  - EMPTY (no subelements) or ANY (anything can be a subelement)
- Example: …
- Subelement specification may have regular expressions
  - Notation:
    - “|” - alternatives
    - “+” - 1 or more occurrences
    - “*” - 0 or more occurrences
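Because a subelement specification is a regular expression over element names, it can be sketched with an ordinary regex engine. Python's standard library does not validate against DTDs, so the code below is only an illustration of the idea, not a DTD validator: the helper names compile_model and children_match are invented here, the first content model is an assumption, and keywords such as #PCDATA, EMPTY and ANY are not handled.

```python
import re

# An element name as it may appear inside a content model.
NAME = re.compile(r"[A-Za-z_][\w.-]*")


def compile_model(model):
    """Compile a DTD-style content model into a regex over 'name;' tokens."""
    # Each name becomes one token. The operators | + * and the parentheses
    # already mean the same thing in regex syntax; "," is plain concatenation.
    pattern = NAME.sub(lambda m: "(?:" + re.escape(m.group()) + ";)", model)
    pattern = re.sub(r"[\s,]+", "", pattern)
    return re.compile(pattern)


def children_match(model, child_tags):
    """True if the ordered list of child tag names satisfies the model."""
    encoded = "".join(tag + ";" for tag in child_tags)
    return compile_model(model).fullmatch(encoded) is not None


# Assumed model: one or more account, customer or depositor children.
bank_model = "(account | customer | depositor)+"
# Sequence model for customer, with the child names used in these slides.
customer_model = "(customer_name, customer_street, customer_city)"

print(children_match(bank_model, ["account", "customer", "account"]))
print(children_match(bank_model, []))
print(children_match(customer_model, ["customer_city", "customer_name", "customer_street"]))
```

The ";" terminator keeps a name such as account from matching the start of account_number.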
18
Bank DTD <!DOCTYPE bank [ ]>
19
Attribute Specification in DTD
- Attribute specification: for each attribute
  - Name
  - Type of attribute
    - CDATA
    - ID (identifier) or IDREF (ID reference) or IDREFS (multiple IDREFs)
      - more on this later
  - Whether
    - mandatory (#REQUIRED)
    - has a default value (value),
    - or neither (#IMPLIED)
- Examples
  - <!ATTLIST customer customer_id ID #REQUIRED accounts IDREFS #REQUIRED >
20
IDs and IDREFs
- An element can have at most one attribute of type ID
- The ID attribute value of each element in an XML document must be distinct
  - Thus the ID attribute value is an object identifier
- An attribute of type IDREF must contain the ID value of an element in the same document
- An attribute of type IDREFS contains a set of (0 or more) ID values. Each ID value must contain the ID value of an element in the same document
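The two constraints (distinct IDs, references that resolve within the same document) can be sketched as a small check in Python. The attribute names account_number, owners, customer_id and accounts follow the bank-2 DTD in these slides; the ID values A-401, C100 and C102, the element contents, and the ID_ATTR and IDREFS_ATTR tables (standing in for what a DTD would declare) are assumptions for the example.

```python
import xml.etree.ElementTree as ET

# ID values and element contents are assumed for illustration.
BANK2 = """
<bank-2>
  <account account_number="A-401" owners="C100 C102">
    <balance>500</balance>
  </account>
  <customer customer_id="C100" accounts="A-401">
    <customer_name>Joe</customer_name>
  </customer>
  <customer customer_id="C102" accounts="A-401">
    <customer_name>Mary</customer_name>
  </customer>
</bank-2>
"""

# Which attribute has type ID / IDREFS on each element (a DTD would say this).
ID_ATTR = {"account": "account_number", "customer": "customer_id"}
IDREFS_ATTR = {"account": "owners", "customer": "accounts"}


def check_ids(root):
    """Return (index, problems): IDs must be distinct, references must resolve."""
    index, problems = {}, []
    for elem in root.iter():
        attr = ID_ATTR.get(elem.tag)
        if attr is not None and attr in elem.attrib:
            value = elem.attrib[attr]
            if value in index:
                problems.append("duplicate ID " + value)
            index[value] = elem
    for elem in root.iter():
        attr = IDREFS_ATTR.get(elem.tag)
        if attr is None:
            continue
        # An IDREFS value is a blank-separated list of ID values.
        for ref in elem.get(attr, "").split():
            if ref not in index:
                problems.append("dangling reference " + ref)
    return index, problems


root = ET.fromstring(BANK2)
index, problems = check_ids(root)
owner_refs = root.find("account").get("owners").split()
owners = [index[ref].findtext("customer_name") for ref in owner_refs]
print(problems, owners)
```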
21
Bank DTD with Attributes
- Bank DTD with ID and IDREF attribute types.
    <!DOCTYPE bank-2 [
      <!ATTLIST account
          account_number ID #REQUIRED
          owners IDREFS #REQUIRED>
      <!ELEMENT customer (customer_name, customer_street, customer_city)>
      <!ATTLIST customer
          customer_id ID #REQUIRED
          accounts IDREFS #REQUIRED>
      … declarations for branch, balance, customer_name, customer_street and customer_city
    ]>
22
XML data with ID and IDREF attributes
    <bank-2>
      <account …> Downtown 500 </account>
      <customer …> Joe Monroe Madison </customer>
      <customer …> Mary Erin Newark </customer>
    </bank-2>
23
Limitations of DTDs
- No typing of text elements and attributes
  - All values are strings, no integers, reals, etc.
- Difficult to specify unordered sets of subelements
  - Order is usually irrelevant in databases (unlike in the document-layout environment from which XML evolved)
  - (A | B)* allows specification of an unordered set, but
    - Cannot ensure that each of A and B occurs only once
- IDs and IDREFs are untyped
  - The owners attribute of an account may contain a reference to another account, which is meaningless
    - owners attribute should ideally be constrained to refer to customer elements
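The unordered-set limitation is easy to see by treating (A | B)* as an ordinary regular expression. This is a sketch in Python in which the child sequence is encoded as "A;" and "B;" tokens; the encoding and the accepts helper are devices of this example, not DTD syntax.

```python
import re

# The pattern mirrors the content model (A | B)* over "A;" / "B;" tokens.
unordered = re.compile(r"(?:A;|B;)*")


def accepts(children):
    """True if the child tag sequence satisfies (A | B)*."""
    encoded = "".join(tag + ";" for tag in children)
    return unordered.fullmatch(encoded) is not None


print(accepts(["A", "B"]))   # either order is allowed ...
print(accepts(["B", "A"]))
print(accepts(["A", "A"]))   # ... but so are repeats and omissions
print(accepts([]))
```

Any order is accepted, which is the goal, but nothing stops A from appearing twice or B from being absent.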
24
Tree Model of XML Data
- Query and transformation languages are based on a tree model of XML data
- An XML document is modeled as a tree, with nodes corresponding to elements and attributes
  - Element nodes have child nodes, which can be attributes or subelements
  - Text in an element is modeled as a text node child of the element
  - Children of a node are ordered according to their order in the XML document
  - Element and attribute nodes (except for the root node) have a single parent, which is an element node
  - The root node has a single child, which is the root element of the document
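The tree model can be made visible by walking a parsed document. Python's xml.etree.ElementTree stores attributes and text as properties of elements rather than as separate nodes, so the generator below (tree_nodes, a name invented here) produces the element, attribute and text nodes of the slide's model explicitly. The sample document and its tag names are assumptions.

```python
import xml.etree.ElementTree as ET


def tree_nodes(elem, depth=0):
    """Yield (depth, kind, label) for element, attribute and text nodes."""
    yield depth, "element", elem.tag
    for name, value in elem.attrib.items():
        yield depth + 1, "attribute", name + "=" + value
    if elem.text and elem.text.strip():
        yield depth + 1, "text", elem.text.strip()
    for child in elem:  # children come back in document order
        yield from tree_nodes(child, depth + 1)
        # Text following a child still belongs to the current element.
        if child.tail and child.tail.strip():
            yield depth + 1, "text", child.tail.strip()


doc = ET.fromstring(
    '<account account_number="A-102">'
    "<branch_name>Perryridge</branch_name>"
    "<balance>400</balance>"
    "</account>"
)
nodes = list(tree_nodes(doc))
for depth, kind, label in nodes:
    print("  " * depth + kind + ": " + label)
```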
25
XPath
- XPath is used to address (select) parts of documents using path expressions
- A path expression is a sequence of steps separated by “/”
  - Think of file names in a directory hierarchy
- Result of path expression: set of values that along with their containing elements/attributes match the specified path
- E.g. /bank/customer/customer_name evaluated on the bank data we saw earlier returns
    <customer_name>Hayes</customer_name>
    <customer_name>Johnson</customer_name>
- E.g. /bank/customer/customer_name/text( ) returns the same names, but without the enclosing tags
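The same two queries can be approximated in Python. xml.etree.ElementTree supports only a subset of XPath: paths are evaluated relative to the element they are called on (so the leading /bank step is dropped here) and text( ) is not available, so the .text property is used instead. The sample document is an assumed stand-in for the bank data; only the customer names Hayes and Johnson come from the slide.

```python
import xml.etree.ElementTree as ET

# Assumed stand-in for the bank data used in the slides.
BANK = """<bank>
  <customer>
    <customer_name>Hayes</customer_name>
    <account><account_number>A-102</account_number><balance>400</balance></account>
  </customer>
  <customer>
    <customer_name>Johnson</customer_name>
    <account><account_number>A-101</account_number><balance>500</balance></account>
  </customer>
</bank>"""

bank = ET.fromstring(BANK)

# Roughly /bank/customer/customer_name: the matching elements themselves.
name_elements = bank.findall("customer/customer_name")
serialized = [ET.tostring(e, encoding="unicode").strip() for e in name_elements]

# Roughly /bank/customer/customer_name/text(): only the enclosed text.
names = [e.text for e in name_elements]

print(serialized)
print(names)
```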
26
XPath (Cont.)
- The initial “/” denotes root of the document (above the top-level tag)
- Path expressions are evaluated left to right
  - Each step operates on the set of instances produced by the previous step
- Selection predicates may follow any step in a path, in [ ]
  - E.g. /bank/customer/account[balance > 400]
    - returns account elements with a balance value greater than 400
    - /bank/customer/account[balance] returns account elements containing a balance subelement
- Attributes are accessed using “@”
  - E.g. /bank/customer/account[balance > 400]/@account_number
    - returns the account numbers of accounts with balance > 400
    - Here we assume account_number is an attribute
    - Otherwise /bank/customer/account[balance > 400]/account_number
  - IDREF attributes are not dereferenced automatically (more on this later)
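A sketch of these predicates in Python. xml.etree.ElementTree understands the existence predicate [balance] but not numeric comparisons such as [balance > 400], so the comparison is done in ordinary Python after the path step. The sample accounts and their numbers are invented for the example, with account_number modeled as an attribute.

```python
import xml.etree.ElementTree as ET

# Invented sample data; account_number is an attribute here.
bank = ET.fromstring("""<bank>
  <customer>
    <account account_number="A-101"><balance>500</balance></account>
    <account account_number="A-102"><balance>400</balance></account>
    <account account_number="A-103"/>
  </customer>
</bank>""")

# Roughly /bank/customer/account[balance]: accounts that have a balance child.
with_balance = bank.findall("customer/account[balance]")

# Roughly /bank/customer/account[balance > 400]: the numeric test is applied
# to the set produced by the previous step.
over_400 = [a for a in with_balance if float(a.findtext("balance")) > 400]

# Roughly .../@account_number: read the attribute of each selected account.
over_400_numbers = [a.get("account_number") for a in over_400]

print(len(with_balance), over_400_numbers)
```

Filtering the output of one step to feed the next mirrors the left-to-right evaluation described above.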
27
Functions in XPath
- XPath provides several functions
  - The function count() at the end of a path counts the number of elements in the set generated by the path
    - E.g. /bank/customer[count(./account) > 1]
      - Returns customers with > 1 accounts
  - Also function for testing position (1, 2, ..) of node w.r.t. siblings
- Boolean connectives and and or and function not() can be used in predicates
- IDREFs can be referenced using function id()
  - id() can also be applied to sets of references such as IDREFS and even to strings containing multiple references separated by blanks
  - E.g. /bank/customer/account/id(@owners)
    - returns all customers referred to from the owners attribute of account elements.
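Python's xml.etree.ElementTree does not implement count() or id(), so the sketch below reproduces their effect by hand: len() over a child query stands in for count(), and a dictionary keyed by ID value stands in for id(). The sample data, including the ID values and the deref helper, is assumed for illustration.

```python
import xml.etree.ElementTree as ET

# Invented sample data with ID-style and IDREFS-style attributes.
bank = ET.fromstring("""<bank>
  <customer customer_id="C100">
    <customer_name>Joe</customer_name>
    <account account_number="A-401" owners="C100 C102"/>
    <account account_number="A-402" owners="C100"/>
  </customer>
  <customer customer_id="C102">
    <customer_name>Mary</customer_name>
    <account account_number="A-403" owners="C102"/>
  </customer>
</bank>""")

# Effect of /bank/customer[count(./account) > 1]
multi_account = [
    c.findtext("customer_name")
    for c in bank.findall("customer")
    if len(c.findall("account")) > 1
]

# Effect of id(): index elements by ID once, then look references up.
by_id = {c.get("customer_id"): c for c in bank.iter("customer")}


def deref(refs):
    """Resolve a blank-separated string of ID references to elements."""
    return [by_id[ref] for ref in refs.split()]


# Effect of /bank/customer/account/id(@owners), with duplicates removed.
owner_names = sorted(
    {owner.findtext("customer_name")
     for account in bank.iter("account")
     for owner in deref(account.get("owners"))}
)

print(multi_account, owner_names)
```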
28
More XPath Examples
- Element AA with two ancestors
  - /*/*/AA
- First BB element of AA element
  - /AA/BB[1]
- All the CC elements of the BB elements which have a sub-element A with value '3'
  - /BB[A='3']/CC
- Any elements AA or elements CC of elements BB
  - //AA | /BB/CC
29
Even More XPath Examples
- Select all sub-elements of elements AA of elements BB
  - /BB/AA/*
  - When you do not know the sub-elements
  - Different from /BB/AA
- Select all attributes named 'aa'
  - //@aa
- Select all CITIES elements with an attribute named aa
  - //CITIES[@aa]
- Select all CITIES elements with an attribute named aa with value '123'
  - //CITIES[@aa = '123']
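Several of these patterns fall inside the XPath subset that Python's xml.etree.ElementTree supports, so they can be tried directly. This is a sketch on an invented document whose top-level element is AA; paths are written relative to that element, attributes are reached through the elements that carry them because ElementTree has no attribute nodes, and the "|" union is replaced by list concatenation.

```python
import xml.etree.ElementTree as ET

# Invented document; the top-level element is AA.
root = ET.fromstring("""<AA>
  <BB><A>3</A><CC>first</CC></BB>
  <BB><A>4</A><CC>second</CC></BB>
  <CITIES aa="123"/>
  <CITIES aa="456"/>
  <CITIES/>
</AA>""")

# /AA/BB[1]: first BB child of AA.
first_bb = root.findall("BB[1]")

# /AA/BB[A='3']/CC: CC children of BB elements whose A child has the text 3.
cc_of_a3 = [e.text for e in root.findall("BB[A='3']/CC")]

# /AA/BB/*: every sub-element of BB, whatever its name.
bb_children = [e.tag for e in root.findall("BB/*")]

# //@aa: collect the attribute from every element that has it.
aa_values = [e.get("aa") for e in root.findall(".//*[@aa]")]

# //CITIES[@aa] and //CITIES[@aa='123']
cities_with_aa = root.findall(".//CITIES[@aa]")
cities_123 = root.findall(".//CITIES[@aa='123']")

# A union such as //A | //CC becomes a concatenation of two result lists.
a_or_cc = root.findall(".//A") + root.findall(".//CC")

print(cc_of_a3, bb_children, aa_values, len(cities_with_aa), len(cities_123), len(a_or_cc))
```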