Dr. Alexandra I. Cristea XML
2 XML history Inception: circa 1996. The Extensible Markup Language (XML) became a W3C Recommendation on 10 February 1998. It is currently used in very many places – see HESA
3 What is XML? XML stands for EXtensible Markup Language. XML was designed to describe data. XML is more of a standard and supporting structure than a standalone programming language. XML is often said to be a markup language much like HTML – wrong! It is a meta-language: a language for defining markup languages.
4 How does XML work? XML tags are not predefined. You must define your own tags XML uses a Document Type Definition (DTD) or an XML Schema to describe the data XML with a DTD or XML Schema is designed to be self-descriptive
5 XML is Free and Extensible XML tags are not predefined. You must "invent" your own tags. The tags used to mark up HTML documents and the structure of HTML documents are predefined. The author of HTML documents can only use tags that are defined in the HTML standard (like <p>, <h1>, etc.). XHTML is XML but not vice-versa.
6 XML does not DO anything XML was created to structure, store and to send information <note> <to>John</to> <from>Jane</from> <heading>Reminder</heading> <body>Don't forget the book!</body> </note>
7 Main Difference XML, HTML XML was designed to carry data. XML is not a replacement for HTML. XML and HTML were designed with different goals: –XML was designed to describe data and to focus on what data is. –HTML was designed to display data and to focus on how data looks. HTML is about displaying information, while XML is about describing information. Syntax: XML is well formed, just like XHTML
8 XML is a Complement to HTML XML is not a replacement for HTML. –In Web development XML is used to describe the data, while HTML is used to format and display the same data. XML is a cross-platform, software and hardware independent tool for transmitting information.
9 Benefits The extensibility and structured nature of XML allow it to be used for communication between different systems. From one source of XML-based information you can format and distribute content via a multitude of different channels – XSL files act as templates, allowing a single stylesheet to be used to format multiple pages, or the same content for multiple distribution channels
10 XML in Web Development XML is everywhere. the XML standard has been developed quickly and a large number of software vendors have adopted it. XML might be the most common tool for all data manipulation and data transmission.
11 XML Can be Used to Create New Languages XML is the mother of WAP and WML. –WAP: standard for web browser for mobile devices –The Wireless Markup Language (WML), used to markup Internet applications for handheld devices like mobile phones, is written in XML. And many others … search for more as homework
12 Question: When should I use XML? Answer: When you need a buzzword in your resume.
13 Viewing XML to view XML documents hierarchically or view their output, you need an XML parser and processor. there are a number of these tools available: See examples at: Please note, however: XML was not designed to display data.
14 The basic XML flow
15 XML-based languages: RSS, Twitter API, MathML, SVG, SOAP, WSDL, Microsoft Office (pptx, docx, xlsx), Open Office XML, SMIL, RDF
16 XML Rules
1. Every start-tag must have a matching end-tag.
2. Tags cannot overlap. Proper nesting is required.
3. XML documents can only have one root element.
4. Element names must obey the following XML naming conventions:
a) Names must start with a letter or the "_" character. Names cannot start with digits or punctuation characters.
b) After the first character, digits and the punctuation characters "-", "." and "_" are also allowed.
17 XML Rules (cont.)
c) Names cannot contain spaces.
d) Names should not contain the ":" character as it is a "reserved" character.
e) Names cannot start with the letters "xml" in any combination of case.
f) The element name must come directly after the "<" without any spaces between them.
5. XML is case sensitive.
6. XML preserves white space within text.
7. Elements may contain attributes. If an attribute is present, it must have a value, even if it is an empty string "".
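A parser can check several of these rules mechanically. The following is a minimal sketch, assuming Python's standard-library `xml.etree.ElementTree`; the helper name `is_well_formed` and the sample documents are made up for illustration, and the sketch does not cover every rule (for example, it does not enforce the "xml" name prefix restriction).

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Illustrative samples, one per rule that the parser enforces.
samples = {
    "matching tags": "<note><to>Tove</to></note>",
    "missing end-tag": "<note><to>Tove</note>",
    "overlapping tags": "<note><to><b>Tove</to></b></note>",
    "two root elements": "<note/><note/>",
    "case mismatch": "<Note></note>",
    "attribute without value": "<note urgent></note>",
}

for label, document in samples.items():
    print(f"{label}: {is_well_formed(document)}")
```

Only the first sample parses; each of the others breaks one of the rules above.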
18 Spot the error! Tove Jani
19 Spot the error! Tove Jani
20 With XML, CR / LF is converted to LF Windows: CR + LF Unix: LF Macintosh: CR
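The line-ending normalization can be observed with any conforming parser. This is a small sketch, assuming Python's standard-library `xml.etree.ElementTree`; the `<body>` element and its text are invented for the example.

```python
import xml.etree.ElementTree as ET

# The same two-line text with Windows, old Macintosh and Unix line endings.
windows = ET.fromstring("<body>line one\r\nline two</body>")
old_mac = ET.fromstring("<body>line one\rline two</body>")
unix = ET.fromstring("<body>line one\nline two</body>")

# After parsing, the application sees a single LF in every case.
for name, element in [("windows", windows), ("old_mac", old_mac), ("unix", unix)]:
    print(name, repr(element.text))
```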
21 There is Nothing Special About XML XML is plain text with XML tags. Software that can handle plain text can also handle XML. In an XML-aware application, the XML tags can be handled specially: –Visibility, –Functional meaning, etc.
22 Is this an error? Tove Jani Don't forget me this weekend! Reminder
23 XML Elements have Relationships Elements are related as parents and children. Root element / Parents Children / Siblings
24 Elements An element consists of all the information from the beginning of a start-tag to the end of an end-tag, including everything in between. E.g. from (X)HTML, all of the following would be the equivalent of one element, named h1: <h1>This is a heading.</h1> –Where <h1> is the start tag, </h1> is the end tag, and the content is in between. Each XML document has a root element within which all other elements are nested.
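The parent, child and sibling relationships can be explored programmatically. Below is a minimal sketch, assuming Python's standard-library `xml.etree.ElementTree`; the tag names `note`, `to`, `from` and `body` are assumptions chosen for the example. ElementTree keeps no parent pointers, so the sketch builds a parent map itself.

```python
import xml.etree.ElementTree as ET

# Assumed sample document; the tag names are illustrative.
root = ET.fromstring(
    "<note><to>Tove</to><from>Jani</from>"
    "<body>Don't forget me this weekend!</body></note>"
)

# Children of the root element, in document order.
child_tags = [child.tag for child in root]

# Map every element to its parent by walking the whole tree once.
parent_of = {child: parent for parent in root.iter() for child in parent}

# Siblings of <to> are the other children of its parent.
to = root.find("to")
sibling_tags = [c.tag for c in parent_of[to] if c is not to]

print(root.tag, child_tags, sibling_tags)
```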
25 Examples See at: – urses/CS253/2009/books.xml –Search more by yourself and familiarize yourself with the syntax!
26 XML Attributes XML elements can have attributes. From HTML you will remember this: The SRC attribute provides additional information about the IMG element.
27 Attributes versus Elements Anna Smith female Anna Smith
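The same piece of information can be stored either as an attribute or as a child element, and the choice changes how it is read back. A sketch, assuming Python's standard-library `xml.etree.ElementTree`; the tag and attribute names (`person`, `sex`, `firstname`, `lastname`) are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Variant 1: the value is carried by an attribute (assumed names).
as_attribute = ET.fromstring(
    '<person sex="female">'
    "<firstname>Anna</firstname><lastname>Smith</lastname>"
    "</person>"
)

# Variant 2: the same value is carried by a child element.
as_element = ET.fromstring(
    "<person><sex>female</sex>"
    "<firstname>Anna</firstname><lastname>Smith</lastname>"
    "</person>"
)

sex_from_attribute = as_attribute.get("sex")    # attribute lookup
sex_from_element = as_element.findtext("sex")   # child element lookup

print(sex_from_attribute, sex_from_element)
```

Both variants carry the same information; an attribute can hold only a single string, while an element can be extended later with children of its own.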
28 Comments As in other languages, comments are lines whose sole purpose is to give the developer, and anyone reading the document in the future, information about it. XML comments are written as <!-- comment text -->.
29 XML Validation: Well Formed-ness An XML document is well formed, if all the XML rules are obeyed. (with 7 XML rules as defined in slides 16-17)
30 XML declaration An XML document should begin with an XML declaration (not mandatory, but good practice): <?xml version="1.0"?> Or, using the optional attributes: <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
31 Document Type Definition (DTD) A DTD defines which tags and attributes are allowed, where they can be placed, and whether or not they can be nested within a given document.
32 Document Type Declaration (DOCTYPE) Root document element URL to DTD (external subset via a system identifier)
33 Internal vs External DTD declaration Internal: ]> External, public:
34 Valid XML Documents A "Valid" XML document is a "Well Formed" XML document, which also conforms to the rules of a Document Type Definition (DTD): Tom Jane Reminder Don't forget me this weekend!
35 Validator Also at:
36 Internal DTD <!DOCTYPE note [ ]> Tove Jani Reminder Don't forget me this weekend!
37 External DTD >> saved as file Note.dtd
38 XML Schema (XSD) XML Schema is an XML-based alternative to DTD, supported by the W3C:
39 Displaying your XML Files with CSS? It is possible to use CSS to format an XML document. Example: XML file: The CD catalog; style sheet: The CSS file; product: The CD catalog formatted with the CSS file. Below is a fraction of the XML file. The second line, an xml-stylesheet processing instruction, links the XML file to the CSS file
40 Displaying XML with XSL XSL (the eXtensible Stylesheet Language) is the preferred style sheet language of XML and is far more sophisticated than CSS. Examples: –View the XML file, the XSL style sheet, and the result.
41 XML Conclusions We have learned: –XML history –What it is –How it works –Differences to (X)HTML –XML flow –XML Rules –XML Elements, Relationships, Attributes, Comments –Well-formed-ness concept –XML supporting frame: XML Schema or DTD –Generics on displaying XML
42 Why an XML Editor? Development projects use many XML-based technologies: XML Schema to define XML structures and data types; XSLT to transform XML data; SOAP to exchange XML data between applications; WSDL to describe web services; RDF to describe web resources; XPath and XQuery to access XML data; SMIL to describe multimedia presentations. An XML editor helps with all of them, e.g. Altova's XMLSpy –30 days free trial
43 Next: –We look at how to access elements and attributes inside the XML –This can be done via … –XPATH
44 Previously we looked at: –XML Next: –XPath –Namespaces
45 XPath
46 XPath XPath is a syntax for defining parts of an XML document XPath uses path expressions to navigate in XML documents XPath contains a library of standard functions XPath is a major element in XSLT XPath is a W3C recommendation, thus a Standard (16. November 1999 )
47 XPath Path Expressions Uses path expressions to select nodes or node-sets in an XML document. –These path expressions look very much like the expressions you see when you work with a traditional computer file system.
48 XPath Standard Functions over 100 built-in functions. –string values, –numeric values, –date and time comparison, –node and QName manipulation, –sequence manipulation, –Boolean values, –and more.
49 XPath Terminology Nodes Atomic values Items (atomic values or nodes) Relationships of nodes –Parent –Children –Siblings –Ancestors –Descendants
50 XPath Nodes 7 kinds of nodes: 1.element, 2.attribute, 3.text, 4.namespace, 5.processing-instruction, 6.comment, and 7.document (root) nodes. XML documents are treated as trees of nodes. The root of the tree is called the document node (or root node).
51 Nodes Examples Harry Potter J K. Rowling Document (root) nodeElement node Attribute node
52 Atomic values Examples* Harry Potter J K. Rowling *nodes with no children or parent
53 Selecting nodes
Expression   Description
nodename     Selects all child nodes with this name
/            Selects from the root node
//           Selects nodes in the document from the current node down that match the selection, no matter where they are
.            Selects the current node
..           Selects the parent of the current node
@            Selects attributes
54 Examples of selecting nodes
Path Expression   Result
bookstore         Selects all the bookstore elements
/bookstore        Selects the root element bookstore. Note: If the path starts with a slash ( / ) it always represents an absolute path to an element!
bookstore/book    Selects all book elements that are children of bookstore
//book            Selects all book elements no matter where they are in the document
bookstore//book   Selects all book elements that are descendants of the bookstore element, no matter where they are under the bookstore element
//@lang           Selects all attributes that are named lang
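These selections can be tried out in code. The sketch below assumes Python's standard-library `xml.etree.ElementTree`, which implements only a subset of XPath: paths are written relative to the element they are called on, and attribute nodes cannot be selected directly. The `BOOKS` document is an assumed stand-in for books.xml, which is not reproduced in these slides.

```python
import xml.etree.ElementTree as ET

# Assumed stand-in for books.xml (titles, languages and prices are illustrative).
BOOKS = """
<bookstore>
  <book><title lang="eng">Everyday Italian</title><price>30.00</price></book>
  <book><title lang="eng">Harry Potter</title><price>29.99</price></book>
  <book><title lang="fr">XQuery Kick Start</title><price>49.99</price></book>
  <book><title>Learning XML</title><price>39.95</price></book>
</bookstore>
"""

root = ET.fromstring(BOOKS)  # the root element <bookstore>

# bookstore/book: book elements that are children of bookstore.
children = root.findall("book")

# //book: book elements anywhere below the starting node.
anywhere = root.findall(".//book")

# //@lang: not supported by ElementTree, so read the attribute from each title.
langs = [t.get("lang") for t in root.iter("title") if t.get("lang") is not None]

print(root.tag, len(children), len(anywhere), langs)
```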
55 Predicates Predicates are used to find a specific node or a node that contains a specific value. Predicates are always embedded in square brackets.
56 Example predicates
Path Expression                  Result
/bookstore/book[1]               Selects the first book element that is the child of the bookstore element
/bookstore/book[last()]          Selects the last book element that is the child of the bookstore element
/bookstore/book[last()-1]        Selects the last but one book element that is the child of the bookstore element
/bookstore/book[position()<3]    Selects the first two book elements that are children of the bookstore element
57 Example predicates – cont.
Path Expression                       Result
//title[@lang]                        Selects all the title elements that have an attribute named lang
//title[@lang='eng']                  Selects all the title elements that have an attribute named lang with a value of 'eng'
/bookstore/book[price>35.00]          Selects all the book elements of the bookstore element that have a price element with a value greater than 35.00
/bookstore/book[price>35.00]/title    Selects all the title elements of the book elements of the bookstore element that have a price element with a value greater than 35.00
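Predicates can be sketched in code as follows, assuming Python's standard-library `xml.etree.ElementTree`. Its XPath subset supports positional and attribute predicates but not `position()<3` or numeric comparisons such as `price>35.00`, so those two are emulated in plain Python. The `BOOKS` document is an assumed stand-in for books.xml.

```python
import xml.etree.ElementTree as ET

# Assumed stand-in for books.xml (values are illustrative).
BOOKS = """
<bookstore>
  <book><title lang="eng">Everyday Italian</title><price>30.00</price></book>
  <book><title lang="eng">Harry Potter</title><price>29.99</price></book>
  <book><title lang="fr">XQuery Kick Start</title><price>49.99</price></book>
  <book><title>Learning XML</title><price>39.95</price></book>
</bookstore>
"""
root = ET.fromstring(BOOKS)

# Positional predicates (XPath positions start at 1).
first = root.findall("book[1]/title")[0].text
last = root.findall("book[last()]/title")[0].text
last_but_one = root.findall("book[last()-1]/title")[0].text

# book[position()<3] is emulated with a slice.
first_two = [b.findtext("title") for b in root.findall("book")[:2]]

# Attribute predicates.
with_lang = [t.text for t in root.findall(".//title[@lang]")]
english = [t.text for t in root.findall(".//title[@lang='eng']")]

# book[price>35.00]/title is emulated with a Python filter.
pricey = [
    b.findtext("title")
    for b in root.findall("book")
    if float(b.findtext("price")) > 35.00
]

print(first, last, last_but_one, first_two, with_lang, english, pricey)
```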
58 Selecting Unknown Nodes
Wildcard   Description
*          Matches any element node
@*         Matches any attribute node
node()     Matches any node of any kind
59 Example: selecting several paths
Path Expression                     Result
//book/title | //book/price         Selects all the title as well as price elements of all book elements
//title | //price                   Selects all the title as well as price elements in the document
/bookstore/book/title | //price     Selects all the title elements of the book element of the bookstore element as well as all the price elements in the document
60 XPath Axes self, child, parent, ancestor, descendant, ancestor-or-self, descendant-or-self, preceding-sibling, following-sibling, preceding, following, attribute, namespace
61 axisname::nodetest[predicate] //DDD/parent::*
62 axisname::nodetest[predicate] //BBB/child::* Note: /AAA is equivalent to /child::AAA
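The two axis examples above can be approximated in code. This sketch assumes Python's standard-library `xml.etree.ElementTree`, which has no `axis::` syntax; its abbreviated steps `..` and `*` stand in for `parent::*` and `child::*`. The AAA/BBB/CCC/DDD document is invented for the example.

```python
import xml.etree.ElementTree as ET

# Invented sample tree using the element names from the slides.
root = ET.fromstring(
    "<AAA>"
    "<BBB><DDD/></BBB>"
    "<CCC><DDD/><EEE/></CCC>"
    "</AAA>"
)

# //DDD/parent::*  ->  every element that is the parent of some DDD.
parents_of_ddd = [e.tag for e in root.findall(".//DDD/..")]

# //BBB/child::*   ->  every element that is a child of some BBB.
children_of_bbb = [e.tag for e in root.findall(".//BBB/*")]

print(parents_of_ddd, children_of_bbb)
```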
63 More examples Check basics, //, *, predicates, attributes, functions (new ones: count, name, normalize-space, starts-with, contains, string-length, floor, ceiling), axes, operators (mod) –Note: The ancestor, descendant, following, preceding and self axes partition a document (ignoring attribute and namespace nodes): they do not overlap and together they contain all the nodes in the document. (see example)
64 XPath Conclusion We have learned: –XPath definition –Path expressions –Standard functions –Terminology –Predicates –Location paths –Axes –Some operators
65 Before we go on, one more thing about XML: XML Namespaces
66 Naming ambiguity
67 The Idea to Solve it Assign a URI (~ URL) to every sub-language: –E.g., for XHTML 1.0: http://www.w3.org/1999/xhtml Qnames: Qualify element names with URIs: –{http://www.w3.org/1999/xhtml}head See: Web Naming and Addressing Overview (URIs, URLs,...)
68 The actual solution Namespace declarations bind URIs to prefixes: xmlns:prefix="…" Default namespace (no prefix) declared with: xmlns="…" Lexical Scope Attribute names can also be prefixed
69 Applying namespaces
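How namespaces remove the naming ambiguity can be sketched in code, assuming Python's standard-library `xml.etree.ElementTree`. The document below is invented: both vocabularies define a `table` element, and `urn:example:furniture` is a made-up namespace URI.

```python
import xml.etree.ElementTree as ET

# Two sub-languages that both define <table>; the furniture URI is hypothetical.
DOC = """
<root xmlns:h="http://www.w3.org/1999/xhtml" xmlns:f="urn:example:furniture">
  <h:table><h:tr><h:td>Apples</h:td></h:tr></h:table>
  <f:table><f:name>Coffee table</f:name></f:table>
</root>
"""

# Prefix-to-URI bindings used when searching; the prefixes are local shorthand.
NS = {"h": "http://www.w3.org/1999/xhtml", "f": "urn:example:furniture"}

root = ET.fromstring(DOC)
html_table = root.find("h:table", NS)
furniture_table = root.find("f:table", NS)

# Internally each name is qualified with its URI, written {uri}localname.
print(html_table.tag)
print(furniture_table.tag)
```

The prefix is only an abbreviation; two elements are the same kind of element when their URI and local name match, whatever prefix the document happens to use.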
70 Next we look at how to query XML This can be done, to some extent, as we have seen, within XSLT, but the main language developed for this purpose is …
71 Previously we looked at: –XPath –Namespaces Next: –XQuery
72 XQuery
73 What is XQuery? XQuery is the language for querying XML data XQuery for XML is like SQL for databases XQuery is built on XPath expressions XQuery is defined by the W3C XQuery is supported by all the major database engines (IBM, Oracle, Microsoft, etc.) XQuery is a W3C recommendation (Jan 2007; latest 14 Dec 2010) thus a standard
74 XQuery - Examples of Use Extract information to use in a Web Service Generate summary reports Transform XML data to XHTML Search Web documents for relevant information
75 XQuery compared to XPath XQuery 1.0 and XPath 2.0 share the same data model and support the same functions and operators. XQuery 1.0 is a strict superset of XPath 2.0: every XPath 2.0 expression is directly an XQuery 1.0 expression (a query). The extra expressive power is the ability to: –Join information from different sources and –Generate new XML fragments
76 Xquery ‘compilers’ Download: Or try out at*: Syntax check at:
77 XQuery query makeup Prolog –Like XPath, XQuery expressions are evaluated relative to a context –This context is explicitly provided by a prolog (a header with definitions) Body –The actual query: Generate, Join, Select
78 XQuery Ex.: Prolog + Query
79 XQuery Prolog (i.e., header(s)) Settings define various parameters for the XQuery processor language, such as:
xquery version "1.0";
declare base-uri "…";
declare default element namespace "…";
declare namespace xs = "…";
import module "…" at "logo.xq";
declare variable $x as xs:integer := 7;
declare function addLogo($root as node()) as node()* { };
(: etc :)
80 Module definition
xquery version "1.0";
module namespace mylib = "…";
declare variable $mylib:foo as xs:string := "foo";
declare function mylib:foobar() as xs:string { concat($mylib:foo, "bar") };
81 Body: Constructors Direct constructors in Xquery: my fragment –Evaluates to the given XML fragment
82 Explicit constructors computed constructors
83 Variable bindings (implicit constructors) {$name} {$job} {$deptno} {$SGMLspecialist }
84 How to Select Nodes with XQuery? Functions –XQuery uses functions to extract data from XML documents. (X)Path Expressions –XQuery uses path expressions to navigate through elements in an XML document. Predicates –XQuery uses predicates to limit the extracted data from XML documents.
85 Functions doc() –function to open a file Example: –doc("books.xml") Note: A call to a function can appear where an expression may appear.
86 Path Expressions Example: select all the title elements in the "books.xml" file: doc("books.xml")/bookstore/book/title
87 Predicates Example: select all the book elements under the bookstore element that have a price element with a value that is less than 30 : doc("books.xml")/bookstore/book[price<30]
88 At a glance: function, path, predicate
89 FLWOR For, Let, Where, Order by, Return = main engine ~ SQL syntax (SFW(GH)O) ~ programs and function calls
90 FLWOR by comparison with Path expressions Select all the title elements under the book elements that are under the bookstore element and that have a price element with a value higher than 30.
Path expression: doc("books.xml")/bookstore/book[price>30]/title
FLWOR expression:
for $x in doc("books.xml")/bookstore/book
where $x/price>30
return $x/title
91 Sorting in FLWOR
for $x in doc("books.xml")/bookstore/book
where $x/price>30
order by $x/title
return $x/title
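To make the FLWOR clauses concrete, the sorted query can be mirrored step by step in a general-purpose language. This is a rough analogy rather than an XQuery implementation, assuming Python's standard-library `xml.etree.ElementTree`; the `BOOKS` document is an assumed stand-in for books.xml.

```python
import xml.etree.ElementTree as ET

# Assumed stand-in for books.xml (values are illustrative).
BOOKS = """
<bookstore>
  <book><title>Everyday Italian</title><price>30.00</price></book>
  <book><title>Harry Potter</title><price>29.99</price></book>
  <book><title>XQuery Kick Start</title><price>49.99</price></book>
  <book><title>Learning XML</title><price>39.95</price></book>
</bookstore>
"""
root = ET.fromstring(BOOKS)

# for $x in doc("books.xml")/bookstore/book
books = root.findall("book")

# where $x/price > 30
selected = [x for x in books if float(x.findtext("price")) > 30]

# order by $x/title
selected.sort(key=lambda x: x.findtext("title"))

# return $x/title (only the text of each title is kept here)
titles = [x.findtext("title") for x in selected]

print(titles)
```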
92 Present the Result In an HTML List <ul> { for $x in doc("books.xml")/bookstore/book/title order by $x return <li>{$x}</li> } </ul>
93 Result HTML List Everyday Italian Harry Potter Learning XML XQuery Kick Start
94 Eliminate element (here: title) <ul> { for $x in doc("books.xml")/bookstore/book/title order by $x return <li>{data($x)}</li> (: also text{} :) } </ul>
95 New result HTML List Everyday Italian Harry Potter Learning XML XQuery Kick Start
96 Another FLWOR Expression { for $s in doc("students.xml")//student let $m := $s/major where count($m) ge 2 order by return { $s/name/text()} }
97 The Difference between for and let
98 The Difference between for and let := in
99 The Difference between for and let
100 The Difference between for and let
101 FLWOR Basic Building Blocks
102 General rules for and let may be used many times in any order only one where is allowed many different sorting criteria can be specified (descending, ascending, etc.)
103 Reversing order reverse() reverses the order of a sequence of nodes or atomic values: reverse((1, 2, 3)) -> (3, 2, 1)
104 Joining documents for $p in doc(" for $n in doc("neighbors.xml")//neighbor[ssn = $p/ssn] return { $p/ssn } { $n/name } { $p/income }
105 Two-way join in a where Clause for $item in doc("ord.xml")//item, $product in doc("cat.xml")//product where … = $product/number return <item name="{$product/name}" />
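The idea behind a two-way join can be sketched outside XQuery as a nested loop with a matching condition. The sketch assumes Python's standard-library `xml.etree.ElementTree`; both documents are invented, and the `num` attribute on `item` is an assumption, since the left-hand side of the join condition is not shown in the slide.

```python
import xml.etree.ElementTree as ET

# Invented stand-ins for ord.xml and cat.xml.
order = ET.fromstring(
    '<order><item num="557"/><item num="563"/><item num="443"/></order>'
)
catalog = ET.fromstring(
    "<catalog>"
    "<product><number>557</number><name>Fleece Pullover</name></product>"
    "<product><number>563</number><name>Floppy Sun Hat</name></product>"
    "</catalog>"
)

# Two "for" variables range over both documents; the condition keeps only
# the pairs whose item number (assumed attribute) equals the product number.
joined = [
    product.findtext("name")
    for item in order.iter("item")
    for product in catalog.iter("product")
    if item.get("num") == product.findtext("number")
]

print(joined)
```

Item 443 has no matching product, so it does not appear in the result, just as in an SQL inner join.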
106 Aggregating Make summary calculations on grouped data Functions: –sum, avg, max, min, count
107 Conditionals for $b in doc(“bib.xml”)/book return {$b/title} {if ( count($b/author) and others ) }
108 Nesting Conditional Expressions Conditional expressions can be nested; 'else if' functionality is provided: if ( count($b/author) = 1 ) then $b/author else if ( count($b/author) = 2 ) then (:.. :) else ( $b/author[1], "and others" )
109 Logical Expressions and, or operators: –and has precedence over or –Parentheses can change precedence if ($isDiscounted and ($discount > 5 or $discount < 0 ) ) then 5 else $discount not function for negations: if (not($isDiscounted)) then 0 else $discount
110 XQuery Built-in Functions The XQuery function namespace URI is: http://www.w3.org/2005/xpath-functions The default prefix is fn:. E.g.: fn:string() or fn:concat(). Since fn: is the default prefix of the namespace, function names do not need to be prefixed when called.
111 Built-in Functions String-related –substring, contains, matches, concat, normalize-space, tokenize Date-related –current-date, month-from-date, adjust-time-to-timezone Number-related –round, avg, sum, ceiling Sequence-related –index-of, insert-before, reverse, subsequence, distinct-values
112 Built-in Functions (2) Node-related –data, empty, exists, id, idref Name-related –local-name, in-scope-prefixes, QName, resolve-QName Error handling and trapping –error, trace, exactly-one Document and URI-related –collection, doc, root, base-uri
113 Function calls doc("books.xml")//book[substring(title,1,5)='Harry'] let $name := (substring($booktitle,1,4)) {upper-case($booktitle)}
114 for $x in doc(" stea/courses/CS253/2009/books.xml")// book/title for $y in data($x) for $name in (substring($y,1,4)) return $name
115 User Defined Functions declare function prefix:function_name($parameter as datatype) as returnDatatype { (:...function code here... :) };
116 User-defined Functions
declare function local:depth($e as node()) as xs:integer {
  if (empty($e/*)) then 1
  else max(for $c in $e/* return local:depth($c)) + 1
};
(: usage :)
for $b in doc("bib.xml")/book return local:depth($b)
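The recursion in the depth function may be easier to follow in a general-purpose language. Below is an equivalent sketch, assuming Python's standard-library `xml.etree.ElementTree`; the sample `book` element is invented.

```python
import xml.etree.ElementTree as ET


def depth(element: ET.Element) -> int:
    """Number of element levels in the subtree rooted at `element`."""
    children = list(element)
    if not children:          # empty($e/*): a leaf element has depth 1
        return 1
    # Otherwise: one more than the deepest child subtree.
    return max(depth(child) for child in children) + 1


# Invented sample: book -> author -> last gives three levels.
book = ET.fromstring(
    "<book><title>Data on the Web</title>"
    "<author><last>Abiteboul</last></author></book>"
)
print(depth(book))
```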
117 Existential and Universal Quantifiers
Return books where at least one author is "Ullman":
for $b in doc("bib.xml")/book where some $author in $b/author satisfies $author/text() = "Ullman" return $b
Return books where all authors are "Ullman":
for $b in doc("bib.xml")/book where every $author in $b/author satisfies $author/text() = "Ullman" return $b
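The two quantifiers correspond closely to "any" and "all" tests in other languages. A sketch of that correspondence, assuming Python's standard-library `xml.etree.ElementTree`; the `BIB` document is an invented stand-in for bib.xml.

```python
import xml.etree.ElementTree as ET

# Invented stand-in for bib.xml.
BIB = """
<bib>
  <book><title>A</title><author>Ullman</author><author>Widom</author></book>
  <book><title>B</title><author>Ullman</author></book>
  <book><title>C</title><author>Abiteboul</author></book>
</bib>
"""
books = ET.fromstring(BIB).findall("book")

# some $author in $b/author satisfies $author/text() = "Ullman"
some_ullman = [
    b.findtext("title")
    for b in books
    if any(a.text == "Ullman" for a in b.findall("author"))
]

# every $author in $b/author satisfies $author/text() = "Ullman"
every_ullman = [
    b.findtext("title")
    for b in books
    if all(a.text == "Ullman" for a in b.findall("author"))
]

print(some_ullman, every_ullman)
```

As with Python's `all`, an `every` test over an empty sequence is true, so a book with no author elements would also pass the second query.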
119 Comparisons Value comparisons: eq, ne, lt, le, gt, ge. Used to compare individual values. Each operand must be a single atomic value (or a node containing a single atomic value). General comparisons: =, !=, <, <=, >, >=. Can be used with sequences of multiple items.
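The difference between the two kinds of comparison can be modelled with ordinary lists standing in for XQuery sequences. This is a simplified sketch of the semantics for equality only; the function names `general_eq` and `value_eq` are made up, and type promotion and atomization are ignored.

```python
from itertools import product


def general_eq(left: list, right: list) -> bool:
    """Model of '=': true if ANY pair of items from the two sequences is equal."""
    return any(a == b for a, b in product(left, right))


def value_eq(left: list, right: list):
    """Model of 'eq': each operand must hold at most one item."""
    if len(left) > 1 or len(right) > 1:
        raise TypeError("value comparison needs single values")
    if not left or not right:
        return None  # stands in for the empty sequence
    return left[0] == right[0]


authors = ["Ullman", "Widom"]

print(general_eq(authors, ["Ullman"]))   # existential match over the sequence
print(value_eq(["Ullman"], ["Ullman"]))  # single values on both sides
```

Calling `value_eq(authors, ["Ullman"])` raises an error in this model, which is why `=` rather than `eq` is used when an operand may contain several items.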
120 Example
121 XQuery Syntax Declarative, functional language ~ SQL Nested expressions Case sensitive White spaces: –Tabs, space, CR, LF –Ignored between language constructs –Significant in quoted strings No special EOL character
122 Keywords and names Keywords and operators –Case-sensitive, generally lower case –May have several meanings depending on the context E.g. “*” or “in” –No reserved words All names must be valid XML names –variables, functions, elements, attributes –Can be associated with a namespace
123 XQuery gives you a choice: Path Expressions: –If you just want to copy certain elements and attributes as is FLWOR Expressions: –Allow sorting –Allow adding elements/attributes –Verbose, but can be clearer
124 XQuery tools Stylus Studio: oad.html (free trial version) –See also short XQuery intro at:
125 Other info: –XQuery on Distributed Resources –Extensions for generic programming with XML
126 XQuery on Distributed Sources
132 XML and programming XSLT, XPath and XQuery provide tools for specialized tasks. But many applications are not covered: –domain-specific tools for concrete XML languages –general tools that nobody has thought of yet
133 XML in general-purpose programming languages Typical tasks: –parse XML documents into XML trees –navigate through XML trees –construct XML trees –output XML trees as XML documents DOM and SAX are corresponding APIs that are language independent and supported by numerous languages. JDOM is an API that is tailored to Java.
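The four tasks (parse, navigate, construct, output) look roughly like this in practice. The sketch uses Python's standard-library `xml.etree.ElementTree` instead of DOM, SAX or JDOM, purely as an illustration; the `note` document is invented.

```python
import xml.etree.ElementTree as ET

# 1. Parse an XML document into a tree.
root = ET.fromstring("<note><to>Tove</to><from>Jani</from></note>")

# 2. Navigate through the tree.
recipient = root.findtext("to")
sender = root.findtext("from")

# 3. Construct a new tree: a reply with sender and recipient swapped.
reply = ET.Element("note")
ET.SubElement(reply, "to").text = sender
ET.SubElement(reply, "from").text = recipient

# 4. Output the tree as an XML document.
serialized = ET.tostring(reply, encoding="unicode")
print(serialized)
```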
134 XQuery Conclusion We have learned: –XQuery definition –Usage scenarios –Comparison w. XSLT and XPath –Capabilities –Functions, path expressions and predicates –FLWOR