Published by Iris Hollie Bates. Modified over 9 years ago.
1 Chapter 10: XML
What is XML
Basic Components of XML
XPath
XQuery
2 What is XML?
Extensible Markup Language
Structured markup
Simplified SGML
Next-generation HTML
W3C Recommendation (spec)
  W3C: World Wide Web Consortium
3 Family Tree
GML (1969)
SGML (1985)
HTML (1993)
XML (1998)
4 HTML Example
HTML example
This is an example of HTML markup codes.
Example
5 HTML and XML
HTML:
  Content and presentation are mixed; structure?
  Tags are fixed and specify presentation
XML:
  Content, presentation, and structure are separated
  Users can define new tags with meaningful annotation
6 Basic Syntax
Starts with XML declaration
Rest of document inside the "root element"
<TEI.2>…</TEI.2>
<state> Texas TX </state>
7 Two Kinds of XML
Standalone
Using Document Type Definition (DTD)
  DTD is the metadata that describes the available tags
<!DOCTYPE state [ ]>
8 HTML is an application of SGML (XHTML is the XML-based reformulation)
Available tags are used to describe presentation
Where is the DTD of HTML?
9 Well-formed vs. Valid
XML must be well-formed
  Correct syntax
  Tags match, tags nest, all characters legal
  Parser must reject the document if it is not well-formed
XML may be valid with respect to a DTD (Document Type Definition)
  Tags are used correctly
  Tags are all declared
  Attributes are declared
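The well-formedness rule can be tried out with any XML parser. Below is a minimal sketch using Python's standard xml.etree.ElementTree module. The sample strings and the helper name is_well_formed are illustrative assumptions, not part of the slides. ElementTree does not validate against a DTD, so the sketch covers well-formedness only.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # The parser rejects documents whose tags do not match or nest.
        return False

# Hypothetical sample documents (tag names are assumptions).
good = "<state><name>Texas</name><code>TX</code></state>"
bad = "<state><name>Texas</code></state>"  # end tag does not match start tag

print(is_well_formed(good))
print(is_well_formed(bad))
```

A validating parser would additionally check the document against its DTD; that step needs a library outside this sketch.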
10 Validity Checking
Checks everything specified in a DTD
Can't check text (currency, spelling)
Checks against DTD: this is a valid memo, book, bibliography, ...
11 XML Syntax
The XML declaration
Elements
Entities
Text
Declarations and Notations
Processing Instructions
Comments
12 The XML Declaration
At the very beginning of the file
Officially optional, but always use it
Can declare version, encoding, standalone
  Must be in that order
  Encoding and standalone are optional; version is required whenever the declaration is present
  Must declare encodings other than UTF-8 or UTF-16
13 Elements
Basic building block of XML
Start and end tag
  Nico
Attributes:
May be abbreviated by:
Elements can be arbitrarily nested to describe very rich information structures
14 Elements and Attributes
Attributes can parameterize an element
  Texas TX
Can be represented by a sub-element
  <state> Southern Texas TX
15 Attribute Syntax
Name may contain Unicode letters, digits, '.', '-', and '_'
Cannot repeat: the same attribute name cannot appear more than once in an element
Order doesn't matter
Values must be quoted (single or double)
Values may not contain "<"
Values may have defaults in DTD
16 Attributes and Sub-elements
A matter of preference
Main differences:
  An attribute name cannot repeat in the same element
  A sub-element can repeat
  Attribute values are always string data
  Sub-elements can have further sub-elements
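The difference between the two encodings can be seen from how a program reads them. This is a small sketch using Python's xml.etree.ElementTree. The document, the attribute name region, and the element names are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: 'region' is an attribute on the first state
# and a sub-element on the second.
doc = """
<states>
  <state region="Southern"><name>Texas</name><code>TX</code></state>
  <state><region>Southern</region><name>Georgia</name><code>GA</code></state>
</states>
"""
root = ET.fromstring(doc)
first, second = root.findall("state")

attr_value = first.get("region")        # attribute value: always a string
elem_value = second.findtext("region")  # sub-element: text of a child element
print(attr_value, elem_value)

# An attribute name may not repeat within one element; the parser rejects it.
try:
    ET.fromstring('<state code="TX" code="GA"/>')
    duplicate_accepted = True
except ET.ParseError:
    duplicate_accepted = False
print(duplicate_accepted)
```

A repeated sub-element, by contrast, is accepted and simply shows up as several children.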
17 Special Attributes
id holds a unique identifier for an element
idref references an id
Texas TX DAL Dallas
18 Entities
A unit of text
Five predefined entities
  &amp; (&)  &apos; (')  &lt; (<)  &gt; (>)  &quot; (")
Define your own in DTD
Use numeric character references
  &#8364; or &#x20AC; (€)
19 Text
Character strings
Use predefined entities (&lt; &amp; …)
XML example: &lt; (<)  &gt; (>)  &amp; (&)
CDATA ("character data") section for raw text without using entities
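To see how entities and CDATA sections reach an application, here is a short sketch with Python's xml.etree.ElementTree. The element names note and price and the sample text are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Predefined entities are replaced by the characters they stand for.
escaped = ET.fromstring("<note>1 &lt; 2 &amp; 3 &gt; 2</note>")

# A CDATA section carries the same raw text without any escaping.
cdata = ET.fromstring("<note><![CDATA[1 < 2 & 3 > 2]]></note>")

# A numeric character reference names a character by its code point.
euro = ET.fromstring("<price>&#8364;10</price>")

print(escaped.text)
print(cdata.text)
print(euro.text)
```

Both forms of the note deliver the same string to the program; only the way it is written in the file differs.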
20 Declarations
Allow validity checking
Optional
May be internal (in document), external, or both
DTD (Document Type Definition) is all active declarations
Use existing DTDs when possible
21 External DTD
Most common
Use DOCTYPE declaration before root element
Hello, world!
22 Internal (standalone) DTD
For custom documents
Also uses DOCTYPE declaration
]> Hello, world!
Specify in XML declaration
23 External plus Internal DTD
Usually to declare entities
Use DOCTYPE declaration before root element
]> Hello, world!
24 Element Type Declarations
Declare name
Declare allowed content
25 Attribute List Declarations
Declare attributes for an element
Declare value types
Declare defaults
26 Entity Declarations
27 Processing Instructions
Instructions to applications
  fonts?
  security?
  correctness checks?
Linking to a style sheet
Instructions to indexing robots
28 Comments
Like HTML and SGML
Anything is OK inside a comment except the string "--"
<!-- … & … are elements -->
<!-- declaration goes here -->
29 What is a DTD?
"Document Type Definition"
Bunch of XML declarations
Usually external to document
Designed for some purpose (use one that matches your needs)
Best left to experts
30 A Bug Report Document
xmltron 1.1 RTE 4.0 1999-11-03 doesn’t work at all none yet
31 Make a Document Type
]>...
Doctype and root element must match
32 Declarations for Elements
]>
33 Declaration for Root Element is optional, others required and must be in this order.
34 Declarations for Attributes
"CDATA" instead of "PCDATA" means it isn't "parsed" for entities
35 Declarations for Attributes
"CDATA" instead of "PCDATA" means it isn't "parsed" for entities (no markup)
#IMPLIED means optional (value implied by document)
Separate ATTLIST declarations for the same element are OK
Internal ATTLIST declarations override external
36 documents = contents + style
Extensible Stylesheet Language (XSL)
Specifications still in draft
But implementations keeping pace
37 <PARTS>
Computer Parts
Motherboard  ASUS  P3B-F  123.00
Video Card  ATI  All-in-Wonder Pro  160.00
Sound Card  Creative Labs  Sound Blaster Live  80.00
inch Monitor  LG Electronics  995E  290.00
</PARTS>
Using a cascading style sheet, we will see
38 XPath
Used to access part of an XML document
Compact, non-XML syntax
Uses a pattern expression to identify nodes in an XML document
Has a library of standard functions
W3C Standard
39 XPath Example
Sample XML
The root element
  /STATES
The SCODE of all STATE elements of the STATES element
  /STATES/STATE/SCODE
All CAPITAL elements with a CNAME sub-element equal to 'Atlanta', under the STATE elements of the STATES element
  /STATES/STATE/CAPITAL[CNAME='Atlanta']
All CITIES elements in the XML document
  //CITIES
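The path expressions on this slide can be tried with the limited XPath subset in Python's xml.etree.ElementTree. The sample document below is an assumption: the slide's own sample XML was not preserved, so the SNAME and CITY elements and all values are made up. ElementTree evaluates paths relative to an element, so the leading /STATES step becomes "." on the root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document shaped after the paths on the slide.
doc = """
<STATES>
  <STATE>
    <SNAME>Georgia</SNAME><SCODE>GA</SCODE>
    <CAPITAL><CNAME>Atlanta</CNAME></CAPITAL>
    <CITIES><CITY>Savannah</CITY></CITIES>
  </STATE>
  <STATE>
    <SNAME>Texas</SNAME><SCODE>TX</SCODE>
    <CAPITAL><CNAME>Austin</CNAME></CAPITAL>
    <CITIES><CITY>Dallas</CITY></CITIES>
  </STATE>
</STATES>
"""
root = ET.fromstring(doc)  # root is the STATES element

# /STATES/STATE/SCODE, written relative to the root element
scodes = [e.text for e in root.findall("./STATE/SCODE")]

# /STATES/STATE/CAPITAL[CNAME='Atlanta']
atlanta = root.findall("./STATE/CAPITAL[CNAME='Atlanta']")

# //CITIES
cities = root.findall(".//CITIES")

print(scodes, len(atlanta), len(cities))
```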
40 More XPath Examples
Element AA with two ancestors
  /*/*/AA
First BB element of the AA element
  /AA/BB[1]
All CC elements of BB elements that have a sub-element A with value '3'
  /BB[A='3']/CC
Any AA elements, or CC elements of BB elements
  //AA | /BB/CC
41 Even More XPath Examples
Select all sub-elements of AA elements of BB elements
  /BB/AA/*
  Useful when you do not know the sub-elements
  Different from /BB/AA
Select all attributes named 'aa'
  //@aa
Select all CITIES elements with an attribute named aa
  //CITIES[@aa]
Select all CITIES elements with an attribute named aa with value '123'
  //CITIES[@aa = '123']
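The attribute tests can be sketched the same way. The document below is a made-up assumption. ElementTree's XPath subset supports the [@aa] and [@aa='123'] predicates but cannot return attribute nodes, so //@aa is emulated here by iterating over all elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; only the names CITIES and aa come from the slide.
doc = """
<AA>
  <CITIES aa="123"><BB bb="x"/></CITIES>
  <CITIES aa="456"/>
  <CITIES/>
</AA>
"""
root = ET.fromstring(doc)

# //CITIES[@aa]
with_aa = root.findall(".//CITIES[@aa]")

# //CITIES[@aa = '123']
aa_123 = root.findall(".//CITIES[@aa='123']")

# //@aa, emulated: collect the value of every aa attribute in document order
all_aa_values = [e.get("aa") for e in root.iter() if "aa" in e.attrib]

print(len(with_aa), len(aa_123), all_aa_values)
```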
42 Axis
Context node
  Evaluation of XPath is from left to right
  The context node is the current node (set) being evaluated
Axis
  Specifies the relationship of the resulting nodes relative to the context node
Example:
  /child::AA : AA children of the root, abbreviated as /AA
  //AA/ancestor::BB : BB elements that are ancestors of any AA element
43 Axes
ancestor: //BBB/ancestor::*
44 Axes
ancestor: //BBB/ancestor::DDD
45 Axes
attribute: contains all attributes of the current node
  //BBB/attribute::* (abbreviated as //BBB/@*)
  //BBB/attribute::bb
46 Axes
child
  /AAA/DDD/child::BBB (child can be omitted for abbreviation)
47 Axes
descendant
  /AAA/descendant::*
  /AAA/descendant::CCC ?
48 Axes
parent
  //BBB/parent::*
  //BBB/parent::DDD ?
49 Axes
descendant-or-self
following
following-sibling
preceding
preceding-sibling
self
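A rough way to experiment with the axes is to emulate them in Python. ElementTree has no ancestor axis, so the sketch below builds a child-to-parent map and walks it. The sample tree and the helper name ancestors are assumptions, since the slides' example trees were figures that did not survive.

```python
import xml.etree.ElementTree as ET

# Hypothetical tree using the element names that appear on the Axes slides.
doc = "<AAA><DDD><BBB/><CCC><BBB/></CCC></DDD><CCC/></AAA>"
root = ET.fromstring(doc)

# ElementTree stores no parent links, so map every child to its parent.
parent_of = {child: parent for parent in root.iter() for child in parent}

def ancestors(elem):
    """Emulate ancestor::* by following parent links up to the root."""
    result = []
    while elem in parent_of:
        elem = parent_of[elem]
        result.append(elem)
    return result

# //BBB/ancestor::*  (collected as a set of tag names)
ancestor_tags = {a.tag for b in root.iter("BBB") for a in ancestors(b)}

# /AAA/descendant::CCC  (root is AAA; iter() also visits the start element,
# so it is filtered out to match descendant rather than descendant-or-self)
descendant_ccc = [e for e in root.iter("CCC") if e is not root]

# //BBB/parent::*  using the '..' step, which ElementTree does support
parents_of_bbb = root.findall(".//BBB/..")

print(sorted(ancestor_tags), len(descendant_ccc), [p.tag for p in parents_of_bbb])
```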
50 Predicates
Filters an element set
A predicate is placed inside square brackets ( [ ] )
Example: //BBB[position() mod 2 = 0]
51 Predicates
//BBB[@aa='31']
Is it different from //BBB/attribute::bb?
52 XQuery
XQuery is a general-purpose query language for XML data
XQuery uses a for … let … where … result … syntax
  for corresponds to SQL from
  where corresponds to SQL where
  result corresponds to SQL select
  let allows temporary variables, and has no equivalent in SQL
53 FLWR Syntax in XQuery
Simple FLWR expression in XQuery
Find all accounts with balance > 400, with each result enclosed in an .. tag
  for $x in /bank-2/account
  let $acctno := $x/@account-number
  where $x/balance > 400
  return $acctno
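For readers without an XQuery engine at hand, the same for / let / where / return steps can be mimicked in plain Python. This is only an analogy under stated assumptions: the bank-2 document and its values are made up, and the result is a Python list rather than XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical bank-2 document; account numbers and balances are invented.
doc = """
<bank-2>
  <account account-number="A-401"><balance>500</balance></account>
  <account account-number="A-402"><balance>300</balance></account>
  <account account-number="A-403"><balance>900</balance></account>
</bank-2>
"""
root = ET.fromstring(doc)

results = []
for x in root.findall("./account"):           # for $x in /bank-2/account
    acctno = x.get("account-number")          # let $acctno := $x/@account-number
    if float(x.findtext("balance")) > 400:    # where $x/balance > 400
        results.append(acctno)                # return $acctno

print(results)
```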
54 Path Expressions and Functions
The function distinct( ) can be used to remove duplicates in path expression results
The function document(name) returns the root of the named document
  E.g. document("bank-2.xml")/bank-2/account
Aggregate functions such as sum( ) and count( ) can be applied to path expression results
55 Joins
Joins are specified in a manner very similar to SQL
  for $a in /bank/account,
      $c in /bank/customer,
      $d in /bank/depositor
  where $a/account-number = $d/account-number
    and $c/customer-name = $d/customer-name
  return $c $a
56 The same query can be expressed with the selections specified as XPath selections:
  for $a in /bank/account,
      $c in /bank/customer,
      $d in /bank/depositor[
            account-number = $a/account-number and
            customer-name = $c/customer-name ]
  return $c $a
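The join semantics can be illustrated with nested iteration in Python. This is a sketch under assumptions: the bank document and its values are invented, and each result is a (customer-name, account-number) pair instead of the XML fragments that the query returns.

```python
import xml.etree.ElementTree as ET

# Hypothetical bank document with account, customer and depositor elements.
doc = """
<bank>
  <account><account-number>A-101</account-number><balance>500</balance></account>
  <account><account-number>A-102</account-number><balance>400</balance></account>
  <customer><customer-name>Hayes</customer-name></customer>
  <customer><customer-name>Jones</customer-name></customer>
  <depositor><account-number>A-101</account-number><customer-name>Jones</customer-name></depositor>
  <depositor><account-number>A-102</account-number><customer-name>Hayes</customer-name></depositor>
</bank>
"""
bank = ET.fromstring(doc)

# One loop per 'for' binding; the 'if' plays the role of the where clause.
pairs = [
    (c.findtext("customer-name"), a.findtext("account-number"))
    for a in bank.findall("account")
    for c in bank.findall("customer")
    for d in bank.findall("depositor")
    if a.findtext("account-number") == d.findtext("account-number")
    and c.findtext("customer-name") == d.findtext("customer-name")
]

print(pairs)
```

A real query processor would avoid the triple loop, but the result set is the same.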
57 Changing Nesting Structure
  for $c in /bank/customer
  return
      $c/*
      for $d in /bank/depositor[customer-name = $c/customer-name],
          $a in /bank/account[account-number = $d/account-number]
      return $a
58 XQuery Path Expressions
$c/text() gives the text content of an element without any subelements/tags
XQuery path expressions support the "->" operator for dereferencing IDREFs
  Equivalent to the id( ) function of XPath, but simpler to use
  Can be applied to a set of IDREFs to get a set of results
  June 2001 version of the standard has changed "->" to "=>"
59 Sorting in XQuery
Sortby clause can be used at the end of any expression. E.g. to return customers sorted by name:
  for $c in /bank/customer
  return $c/* sortby(name)
60 Can sort at multiple levels of nesting (sort by customer-name, and by account-number within each customer)
  for $c in /bank/customer
  return
      $c/*
      for $d in /bank/depositor[customer-name = $c/customer-name],
          $a in /bank/account[account-number = $d/account-number]
      return $a/* sortby(account-number)
  sortby(customer-name)
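The two levels of sorting can be pictured with Python's sorted(). The sketch makes several assumptions: the data is invented, the helper name accounts_of is hypothetical, and for brevity the account numbers are read directly from depositor elements instead of joining with account.

```python
import xml.etree.ElementTree as ET

# Hypothetical bank document; account elements are omitted for brevity.
doc = """
<bank>
  <customer><customer-name>Jones</customer-name></customer>
  <customer><customer-name>Hayes</customer-name></customer>
  <depositor><account-number>A-305</account-number><customer-name>Jones</customer-name></depositor>
  <depositor><account-number>A-101</account-number><customer-name>Jones</customer-name></depositor>
  <depositor><account-number>A-215</account-number><customer-name>Hayes</customer-name></depositor>
</bank>
"""
bank = ET.fromstring(doc)

def accounts_of(name):
    """Account numbers held by the named customer, in ascending order."""
    numbers = [
        d.findtext("account-number")
        for d in bank.findall("depositor")
        if d.findtext("customer-name") == name
    ]
    return sorted(numbers)  # inner level: sortby(account-number)

# Outer level: sortby(customer-name)
names = sorted(c.findtext("customer-name") for c in bank.findall("customer"))
nested = [(name, accounts_of(name)) for name in names]

print(nested)
```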
61 Application Program Interface
There are two standard application program interfaces to XML data:
SAX (Simple API for XML)
  Based on parser model; user provides event handlers for parsing events
    E.g. start of element, end of element
  Not suitable for database applications
DOM (Document Object Model)
  XML data is parsed into a tree representation
  Variety of functions provided for traversing the DOM tree
  E.g.: Java DOM API provides Node class with methods getParentNode( ), getFirstChild( ), getNextSibling( ), getAttribute( ), getData( ) (for text node), getElementsByTagName( ), …
  Also provides functions for updating DOM tree
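The slide names the Java DOM API. The same two models exist in Python's standard library (xml.sax and xml.dom.minidom), which the sketch below uses to contrast them. The document and the class name TagRecorder are assumptions for illustration.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical document, written without whitespace between elements.
doc = ("<bank><account><balance>500</balance></account>"
       "<account><balance>300</balance></account></bank>")

class TagRecorder(xml.sax.ContentHandler):
    """SAX: the parser calls the handler as each parsing event occurs."""
    def __init__(self):
        super().__init__()
        self.starts = []

    def startElement(self, name, attrs):
        # Called once for every start tag, in document order.
        self.starts.append(name)

handler = TagRecorder()
xml.sax.parseString(doc.encode("utf-8"), handler)

# DOM: the whole document is parsed into a tree that can be traversed.
dom = minidom.parseString(doc)
first_account = dom.documentElement.firstChild           # like getFirstChild()
balance_text = first_account.firstChild.firstChild.data  # like getData() on a text node
account_count = len(dom.getElementsByTagName("account"))

print(handler.starts, balance_text, account_count)
```

SAX keeps no tree in memory, which suits streaming; DOM holds the full tree, which suits navigation and updates.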
62 Storage of XML Data
XML data can be stored in
Non-relational data stores
  Flat files
    Natural for storing XML
    But has all problems discussed in Chapter 1 (no concurrency, no recovery, …)
  XML database
    Database built specifically for storing XML data, supporting DOM model and declarative querying
    Currently no commercial-grade systems
Relational databases
  Data must be translated into relational form
  Advantage: mature database systems
  Disadvantages: overhead of translating data and queries
63 Storage of XML in Relational Databases
Alternatives:
  String Representation
  Tree Representation
  Map to relations
64 String Representation
Store each top-level element as a string field of a tuple in a relational database
  Use a single relation to store all elements, or
  Use a separate relation for each top-level element type
    E.g. account, customer, depositor relations
    Each with a string-valued attribute to store the element
Indexing:
  Store values of subelements/attributes to be indexed as extra fields of the relation, and build indices on these fields
    E.g. customer-name or account-number
  Oracle 9 supports function indices, which use the result of a function as the key value
    The function should return the value of the required subelement/attribute
65 String Representation (Cont.)
Benefits:
  Can store any XML data even without DTD
  As long as there are many top-level elements in a document, strings are small compared to full document
    Allows fast access to individual elements
Drawback: need to parse strings to access values inside the elements
  Parsing is slow
66 Tree Representation
Tree representation: model XML data as a tree and store using relations
  nodes(id, type, label, value)
  child(child-id, parent-id)
Each element/attribute is given a unique identifier
Type indicates element/attribute
Label specifies the tag name of the element/name of attribute
Value is the text value of the element/attribute
The relation child notes the parent-child relationships in the tree
  Can add an extra attribute to child to record ordering of children
Example tree: bank (id: 1), customer (id: 2), account (id: 5), customer-name (id: 3), account-number (id: 7)
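The nodes and child relations can be sketched with Python's sqlite3 module. Everything beyond the two relation schemas is an assumption: the sample document, the helper name store, the position column (the optional ordering attribute the slide mentions), and the underscore column names used in place of child-id and parent-id because hyphens are awkward in SQL identifiers.

```python
import itertools
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical document to shred into the two relations.
doc = (
    "<bank>"
    "<customer><customer-name>Hayes</customer-name></customer>"
    '<account account-number="A-102"><balance>400</balance></account>'
    "</bank>"
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, type TEXT, label TEXT, value TEXT)")
conn.execute("CREATE TABLE child (child_id INTEGER, parent_id INTEGER, position INTEGER)")
ids = itertools.count(1)  # source of unique node identifiers

def store(elem, parent_id=None, position=None):
    """Insert an element, its attributes and its subtree; return its id."""
    node_id = next(ids)
    text = (elem.text or "").strip() or None
    conn.execute("INSERT INTO nodes VALUES (?, ?, ?, ?)",
                 (node_id, "element", elem.tag, text))
    if parent_id is not None:
        conn.execute("INSERT INTO child VALUES (?, ?, ?)",
                     (node_id, parent_id, position))
    for name, value in elem.attrib.items():
        attr_id = next(ids)
        conn.execute("INSERT INTO nodes VALUES (?, ?, ?, ?)",
                     (attr_id, "attribute", name, value))
        conn.execute("INSERT INTO child VALUES (?, ?, ?)", (attr_id, node_id, None))
    for i, sub in enumerate(elem):
        store(sub, node_id, i)
    return node_id

root_id = store(ET.fromstring(doc))

# Finding customer-name under customer already takes two joins.
rows = conn.execute("""
    SELECT n2.value
    FROM nodes n1
    JOIN child c ON c.parent_id = n1.id
    JOIN nodes n2 ON n2.id = c.child_id
    WHERE n1.label = 'customer' AND n2.label = 'customer-name'
""").fetchall()
node_count = conn.execute("SELECT COUNT(*) FROM nodes").fetchone()[0]

print(rows, node_count)
```

Each extra step in a path expression adds another pair of joins, so long paths become expensive.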
67 Tree Representation (Cont.)
Benefit: can store any XML data, even without DTD
Drawbacks:
  Data is broken up into too many pieces, increasing space overheads
  Even simple queries require a large number of joins, which can be slow
68 Mapping XML Data to Relations
Map to relations
  If DTD of document is known, can map data to relations
  A relation is created for each element type
    Elements (of type #PCDATA) and attributes are mapped to attributes of relations
    More details on next slide …
Benefits:
  Efficient storage
  Can translate XML queries into SQL, execute efficiently, and then translate SQL results back to XML
Drawbacks: need to know DTD, translation overheads still present
69 Mapping XML Data to Relations (Cont.)
Relation created for each element type contains
  An id attribute to store a unique id for each element
  A relation attribute corresponding to each element attribute
  A parent-id attribute to keep track of the parent element
    As in the tree representation
    Position information (i-th child) can be stored too
All subelements that occur only once can become relation attributes
  For text-valued subelements, store the text as attribute value
  For complex subelements, can store the id of the subelement
Subelements that can occur multiple times are represented in a separate table
  Similar to handling of multivalued attributes when converting ER diagrams to tables
70 Mapping XML Data to Relations (Cont.)
E.g. for the bank-1 DTD with account elements nested within customer elements, create relations
  customer(id, parent-id, customer-name, customer-street, customer-city)
    parent-id can be dropped here since the parent is the sole root element
    All other attributes were subelements of type #PCDATA, and occur only once
  account(id, parent-id, account-number, branch-name, balance)
    parent-id keeps track of which customer an account occurs under
    Same account may be represented many times with different parents
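The customer and account relations can be tried out with Python's sqlite3 module. The sketch rests on assumptions: the bank-1 document and its values are invented, column names use underscores because hyphens are awkward in SQL identifiers, and customer omits parent-id as the slide permits.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical bank-1 document with account nested inside customer.
doc = """
<bank-1>
  <customer>
    <customer-name>Hayes</customer-name>
    <customer-street>Main</customer-street>
    <customer-city>Harrison</customer-city>
    <account>
      <account-number>A-102</account-number>
      <branch-name>Perryridge</branch-name>
      <balance>400</balance>
    </account>
  </customer>
</bank-1>
"""

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE customer (
    id INTEGER PRIMARY KEY, customer_name TEXT,
    customer_street TEXT, customer_city TEXT)""")
conn.execute("""CREATE TABLE account (
    id INTEGER PRIMARY KEY, parent_id INTEGER,
    account_number TEXT, branch_name TEXT, balance REAL)""")

for cust in ET.fromstring(doc).findall("customer"):
    # Subelements that occur once become columns of the customer row.
    cur = conn.execute(
        "INSERT INTO customer (customer_name, customer_street, customer_city) "
        "VALUES (?, ?, ?)",
        (cust.findtext("customer-name"),
         cust.findtext("customer-street"),
         cust.findtext("customer-city")))
    cust_id = cur.lastrowid
    for acct in cust.findall("account"):
        # parent_id records which customer this account is nested under.
        conn.execute(
            "INSERT INTO account (parent_id, account_number, branch_name, balance) "
            "VALUES (?, ?, ?, ?)",
            (cust_id, acct.findtext("account-number"),
             acct.findtext("branch-name"), float(acct.findtext("balance"))))

rows = conn.execute("""
    SELECT c.customer_name, a.account_number, a.balance
    FROM account a JOIN customer c ON a.parent_id = c.id
""").fetchall()

print(rows)
```

Compared with the tree representation, one join recovers the nesting, at the cost of needing the DTD up front to design the tables.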