XPath
/*/*/self::*
Dongwon Lee, Ph.D. IST 516 Fall 2011
XPath
  Path-based XML query language
  V1.0 (1999): http://www.w3.org/TR/xpath
  V2.0 (W3C Recommendation, 2007): http://www.w3.org/TR/xpath20/
    Functional, strongly-typed query language
  http://www.w3schools.com/xpath/xpath_intro.asp
Apps of XPath
  XQuery: a full-blown query language for XML
    for $x in doc("books.xml")/bookstore/book
    where $x/price>30
    order by $x/title
    return $x/title
  XPointer/XLink: a standard way to create hyperlinks in XML
    <book title="Harry Potter">
      <description xlink:type="simple"
                   xlink:href="http://book.com/images/HPotter.gif"
                   xlink:show="new">
        As his fifth year at Hogwarts School of Witchcraft and Wizardry approaches, 15-year-old Harry Potter is.......
      </description>
    </book>
Apps of XPath
  XSLT: a stylesheet language that transforms XML from one format to another
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:template match="/">
        <xsl:for-each select="catalog/cd">
          <tr>
            <td><xsl:value-of select="title"/></td>
            <td><xsl:value-of select="artist"/></td>
          </tr>
        </xsl:for-each>
      </xsl:template>
    </xsl:stylesheet>
XPath vs. SQL
  XPath          SQL
  XML Model      Relational Model
  Trees          Tables
  Hierarchy      Flat
  Order          Orderless (except ORDER BY)
XPath vs. XQuery
  XPath
    XML Model
    Trees
    Hierarchy
    Order
  XQuery
    Can do all that XPath does, but not vice versa
    Turing-complete, general-purpose PL
    Can retrieve, update, and transform XML data
    FLWOR expression
XPath Expression
  An expression (the basic building block) evaluates to one of 4 object types:
    node-set (an unordered collection of nodes without duplicates)
    boolean (true or false)
    number (a floating-point number)
    string (a sequence of characters)
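The four result types can be observed from Java. The following is a minimal sketch, assuming the JDK's built-in javax.xml.xpath engine (XPath 1.0); the class name XPathResultTypes and its helper methods are made up for illustration, and the course data is borrowed from the IST examples in these slides.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathResultTypes {
    // Course data borrowed from the IST examples in these slides.
    static final String XML =
        "<Courses>"
        + "<Undergrad ID='IST462'><Room>110 IST</Room><Instructor/><TA>Robert Luo</TA></Undergrad>"
        + "<Graduate ID='IST597'><Room>210 IST</Room></Graduate>"
        + "</Courses>";

    // Evaluates expr against XML and asks the engine for the given result type.
    static Object eval(String expr, QName type) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return XPathFactory.newInstance().newXPath().evaluate(expr, doc, type);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static int nodeCount(String expr) { return ((NodeList) eval(expr, XPathConstants.NODESET)).getLength(); }
    static boolean bool(String expr)  { return (Boolean) eval(expr, XPathConstants.BOOLEAN); }
    static double number(String expr) { return (Double) eval(expr, XPathConstants.NUMBER); }
    static String string(String expr) { return (String) eval(expr, XPathConstants.STRING); }

    public static void main(String[] args) {
        System.out.println(nodeCount("/Courses/*"));                      // node-set with 2 nodes
        System.out.println(bool("count(/Courses/*) = 2"));                // boolean: true
        System.out.println(number("count(//Room)"));                      // number: 2.0
        System.out.println(string("string(/Courses/Graduate/Room)"));     // string: 210 IST
    }
}
```

The caller chooses the result type through XPathConstants; the engine converts the expression's value to that type where XPath allows it.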
XPath Nodes
  <?xml version="1.0" encoding="UTF-8"?>
  <note xmlns="http://pike.psu.edu"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="note.xsd">
    <to>Tove</to>
    <!-- <from>Jani</from> -->
    <heading>Reminder</heading>
    <body>Don't forget me this weekend!</body>
  </note>
  Labels on the example: processing-instruction, namespace, document, comment, text
  Nodes: 7 types
    element, attribute, text, namespace, processing-instruction, comment, document
Location Step
  axis :: node-test [predicate]
  Location steps are evaluated in order, from left to right
    Absolute: /step/step/…
    Relative: step/step/…
  Axis: specifies the node relationship
  Node test: specifies the node type and name
  Predicate: instructions to filter nodes
  Preferred: faster to evaluate
1. Axis
  / selects the root of the node hierarchy
    <document/> as the default root of the XML document
  Forward axes: child::, descendant::, attribute::, self::, descendant-or-self::, following-sibling::, following::
  Backward axes: ancestor::, preceding-sibling::, preceding::, ancestor-or-self::
  Relative to the current context (axis::context)
    child::emp: "emp" is a child element of the current node
    attribute::date: "date" is an attribute of the current node
Node Relationships
  [Figure: tree of Courses, Undergrad, Graduate, Room, Instructor, Name, Office, and Phone nodes, annotated with the relationships Self, Parent, Child, Grandchild, Sibling, Ancestors, and Descendants]
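1. Axis Abbreviation
  Unabbreviated                    Abbreviated
  /descendant-or-self::node()/     //
  child::                          (omitted; child is the default axis, steps are separated by /)
  attribute::                      @
  self::node()                     .
  parent::node()                   ..
  Eg, each pair selects the same nodes
    /child::doc/descendant::chapter     /doc//chapter
    //doc/attribute::type               //doc/@type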
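The equivalence of the long and short forms can be checked mechanically. The sketch below assumes the JDK's javax.xml.xpath engine; the document, the class name AxisAbbreviationDemo, and the helper sameNodes are all invented for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AxisAbbreviationDemo {
    // Hypothetical document, invented for this sketch.
    static final String XML =
        "<doc type='PDF'><chapter/><part><chapter/></part></doc>";

    static NodeList select(Document doc, String expr) throws Exception {
        return (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
    }

    // True when both expressions select the same non-empty list of nodes.
    static boolean sameNodes(String longForm, String shortForm) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeList a = select(doc, longForm);
            NodeList b = select(doc, shortForm);
            if (a.getLength() == 0 || a.getLength() != b.getLength()) return false;
            for (int i = 0; i < a.getLength(); i++) {
                if (!a.item(i).isSameNode(b.item(i))) return false;
            }
            return true;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sameNodes("/child::doc/descendant::chapter", "/doc//chapter"));
        System.out.println(sameNodes("//doc/attribute::type", "//doc/@type"));
        System.out.println(sameNodes("/doc/part/self::node()", "/doc/part/."));
        System.out.println(sameNodes("/doc/part/chapter/parent::node()", "/doc/part/chapter/.."));
    }
}
```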
2. Node Test
  node(): matches all nodes
  text(): matches all text nodes
  ElementName: matches all elements of type 'ElementName'
  *: matches all elements
  @*: matches all attributes
2. Node Test
  * (wildcard) is often used to match unknown XML elements
  /catalog/cd/*: all the child elements of all the cd elements of the catalog element
  /*: all child elements of the root <document/>
  /*/*: all grandchild elements of the root <document/>
  //*: all elements of the XML document
3. Predicate
  path-expression[ filtering condition ]
    Selects the nodes of the path expression that satisfy the filtering condition
  Eg, //doc[@type='PDF']
    Finds all <doc> elements whose "type" attribute value is 'PDF'
    This returns the <doc> elements, not their "type" attributes
  The filtering condition does not affect which kind of node is returned (ie, the projection) of XPath
    It just adds more constraints to satisfy
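To see that a predicate filters without changing what is returned, the following sketch runs the slide's query and, for contrast, a query that explicitly steps to the attribute. It assumes the JDK's javax.xml.xpath engine; the library document, the class name PredicateDemo, and the select helper are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PredicateDemo {
    // Hypothetical document, invented for this sketch.
    static final String XML =
        "<library>"
        + "<doc type='PDF'>a.pdf</doc>"
        + "<doc type='HTML'>b.html</doc>"
        + "<doc type='PDF'>c.pdf</doc>"
        + "</library>";

    // Returns "nodeName=textContent" for every node selected by expr.
    static List<String> select(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Node n = nodes.item(i);
                out.add(n.getNodeName() + "=" + n.getTextContent());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The predicate filters, but the <doc> elements are what comes back.
        System.out.println(select("//doc[@type='PDF']"));        // [doc=a.pdf, doc=c.pdf]
        // To get the attributes themselves, step to them explicitly.
        System.out.println(select("//doc[@type='PDF']/@type"));  // [type=PDF, type=PDF]
    }
}
```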
Location Step Examples
Examples of usage
IST Example
  What IST Classes are in Room IST 110?
  /Courses/*[child::Room='110 IST']
  Original XML
    <Courses>
      <Undergrad ID="IST462">
        <Room>110 IST</Room>
        <Instructor />
        <TA>Robert Luo</TA>
      </Undergrad>
      <Graduate ID="IST597">
        <Room>210 IST</Room>
      </Graduate>
    </Courses>
  Result
    The <Undergrad ID="IST462"> element, the only course whose Room is 110 IST
IST Example
  What IST courses have TAs?
  /Courses/*/TA/parent::*
  Original XML
    <Courses>
      <Undergrad ID="IST462">
        <Room>110 IST</Room>
        <Instructor />
        <TA>Robert Luo</TA>
      </Undergrad>
      <Graduate ID="IST597">
        <Room>210 IST</Room>
      </Graduate>
    </Courses>
  Result
    The <Undergrad ID="IST462"> element, the only course with a <TA> child
IST Example
  What rooms are used by IST courses?
  /Courses/*/Room/text()
  Original XML
    <Courses>
      <Undergrad ID="IST402">
        <Room>110 IST</Room>
        <Instructor />
        <TA>Robert Luo</TA>
      </Undergrad>
      <Graduate ID="IST597">
        <Room>210 IST</Room>
      </Graduate>
    </Courses>
  Result
    110 IST
    210 IST
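The three IST queries can be run against the course document with the JDK's javax.xml.xpath package. This is only a sketch: the class name IstExamples and the select helper are made up, and the XML is the slide's course data with straight quotes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class IstExamples {
    static final String XML =
        "<Courses>"
        + "<Undergrad ID='IST402'><Room>110 IST</Room><Instructor/><TA>Robert Luo</TA></Undergrad>"
        + "<Graduate ID='IST597'><Room>210 IST</Room></Graduate>"
        + "</Courses>";

    // Returns the element name for element nodes and the text for text nodes.
    static List<String> select(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Node n = nodes.item(i);
                out.add(n.getNodeType() == Node.ELEMENT_NODE ? n.getNodeName() : n.getNodeValue());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select("/Courses/*[child::Room='110 IST']")); // [Undergrad]
        System.out.println(select("/Courses/*/TA/parent::*"));           // [Undergrad]
        System.out.println(select("/Courses/*/Room/text()"));            // [110 IST, 210 IST]
    }
}
```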
Comparison
  Comparison can be performed using =, !=, <=, <, >=, and >
  Examples
    [child::Room != '205 IST']
    [child::Time > 1220]
  NOTE: When used within a predicate, child::Room is compared by its string value, so for an element that contains only text, child::Room behaves the same as child::Room/text()
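The comparison examples, and the note about child::Room versus child::Room/text(), can be tried as follows. This sketch assumes the JDK's javax.xml.xpath engine; the <Time> elements and their values are hypothetical additions to the course data, as are the class name ComparisonDemo and the ids helper.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ComparisonDemo {
    // Course data from the slides; the <Time> elements are assumed for this sketch.
    static final String XML =
        "<Courses>"
        + "<Undergrad ID='IST402'><Room>110 IST</Room><Time>1230</Time></Undergrad>"
        + "<Graduate ID='IST597'><Room>210 IST</Room><Time>900</Time></Graduate>"
        + "</Courses>";

    // Returns the ID attribute of every element selected by expr.
    static List<String> ids(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(((Element) nodes.item(i)).getAttribute("ID"));
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(ids("/Courses/*[child::Room != '205 IST']"));       // [IST402, IST597]
        System.out.println(ids("/Courses/*[child::Time > 1220]"));             // [IST402]
        // An element is compared by its string value, so these two agree here.
        System.out.println(ids("/Courses/*[child::Room = '110 IST']"));        // [IST402]
        System.out.println(ids("/Courses/*[child::Room/text() = '110 IST']")); // [IST402]
    }
}
```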
Math Operators
  + : performs addition
  - : performs subtraction
  * : performs multiplication
  div : performs division
  mod : returns the remainder of division
  Examples:
    [child::Time mod 100 = 30]
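A small sketch of the arithmetic operators, assuming the JDK's javax.xml.xpath engine. The <Time> values are hypothetical (the slides do not show any), and the class name MathOperatorsDemo and helper number are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class MathOperatorsDemo {
    // Hypothetical course times, invented for this sketch.
    static final String XML =
        "<Courses>"
        + "<Undergrad ID='IST402'><Time>1230</Time></Undergrad>"
        + "<Graduate ID='IST597'><Time>900</Time></Graduate>"
        + "</Courses>";

    // Evaluates expr and returns its value as an XPath number.
    static double number(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return (Double) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NUMBER);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(number("7 + 2"));   // 9.0
        System.out.println(number("7 - 2"));   // 5.0
        System.out.println(number("7 * 2"));   // 14.0
        System.out.println(number("7 div 2")); // 3.5 (numbers are floating point)
        System.out.println(number("7 mod 2")); // 1.0
        // Only 1230 leaves a remainder of 30 when divided by 100.
        System.out.println(number("count(/Courses/*[child::Time mod 100 = 30])")); // 1.0
    }
}
```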
Node Functions
  last(): returns the numeric position of the last node in a list
  position(): returns the numeric position of the current node
  count(): returns the number of nodes in a list
  name(): returns the name of a node
  id(): selects elements by their unique ID
Node Function Example
  Which courses have more than 2 child elements?
  /Courses/*[count(child::*)>2]
  Original XML
    <Courses>
      <Undergrad ID="IST402">
        <Room>110 IST</Room>
        <Instructor />
        <TA>Robert Luo</TA>
      </Undergrad>
      <Graduate ID="IST597">
        <Room>210 IST</Room>
      </Graduate>
    </Courses>
  Result
    The <Undergrad ID="IST402"> element, which has 3 child elements
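The node functions can be exercised on the same course document. The sketch below assumes the JDK's javax.xml.xpath engine; the class name NodeFunctionsDemo and its helpers are hypothetical. id() is left out because it only works when a DTD declares an attribute of type ID, which this document does not have.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class NodeFunctionsDemo {
    static final String XML =
        "<Courses>"
        + "<Undergrad ID='IST402'><Room>110 IST</Room><Instructor/><TA>Robert Luo</TA></Undergrad>"
        + "<Graduate ID='IST597'><Room>210 IST</Room></Graduate>"
        + "</Courses>";

    static Object eval(String expr, QName type) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return XPathFactory.newInstance().newXPath().evaluate(expr, doc, type);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static double number(String expr) { return (Double) eval(expr, XPathConstants.NUMBER); }
    static String string(String expr) { return (String) eval(expr, XPathConstants.STRING); }

    public static void main(String[] args) {
        System.out.println(number("count(/Courses/*)"));                   // 2.0
        System.out.println(string("name(/Courses/*[count(child::*)>2])")); // Undergrad
        System.out.println(string("name(/Courses/*[position()=1])"));      // Undergrad
        System.out.println(string("name(/Courses/*[last()])"));            // Graduate
    }
}
```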
String Functions
  concat(string, string): concatenates the string arguments
  starts-with(string, string): returns true if the first string starts with the second string
  contains(string, string): returns true if the first string contains the second string
  Eg,
    concat('sh', 'oe') = 'shoe'
    starts-with('cat', 'ca') = true
    contains('puppy', 'upp') = true
String Functions
  substring(string, number, [number]): returns a substring of the provided string
  string-length(string): returns the number of characters in the string
  Eg,
    substring('chicken', 3, 4) = 'icke'
    substring('chicken', 3) = 'icken'
    string-length('cat') = 3
String Functions Examples
  //Book[starts-with(child::Title, "X")]/Price
  //Book[string-length(Author/FN)=3]/Title
  <Catalog>
    <Book>
      <Title>XML</Title>
      <Price>19.9</Price>
      <Author>
        <FN>Joe</FN>
      </Author>
    </Book>
    <Book>
      <Title>XSLT</Title>
      <Price>22.9</Price>
      <FN>HJ</FN><LN>Kyle</LN>
    </Book>
  </Catalog>
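The string functions and the two catalog queries can be evaluated as sketched below, assuming the JDK's javax.xml.xpath engine. The catalog is the slide's data with closing tags written out in full; element names are case-sensitive, so the step is written /Price to match <Price>. The class name StringFunctionsDemo and its helpers are made up.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class StringFunctionsDemo {
    static final String XML =
        "<Catalog>"
        + "<Book><Title>XML</Title><Price>19.9</Price><Author><FN>Joe</FN></Author></Book>"
        + "<Book><Title>XSLT</Title><Price>22.9</Price><FN>HJ</FN><LN>Kyle</LN></Book>"
        + "</Catalog>";

    static Object eval(String expr, QName type) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return XPathFactory.newInstance().newXPath().evaluate(expr, doc, type);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static String string(String expr) { return (String) eval(expr, XPathConstants.STRING); }

    // Returns the text content of every node selected by expr.
    static List<String> values(String expr) {
        NodeList nodes = (NodeList) eval(expr, XPathConstants.NODESET);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            out.add(nodes.item(i).getTextContent());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(string("concat('sh', 'oe')"));          // shoe
        System.out.println(string("substring('chicken', 3, 4)"));  // icke
        System.out.println(string("substring('chicken', 3)"));     // icken
        // Both titles start with "X", so both prices are returned.
        System.out.println(values("//Book[starts-with(child::Title, 'X')]/Price")); // [19.9, 22.9]
        // Only the first book has an Author/FN, and "Joe" has length 3.
        System.out.println(values("//Book[string-length(Author/FN)=3]/Title"));     // [XML]
    }
}
```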
Number Functions
  sum(node-set): returns the sum of values for each node in a node set
    Eg, sum(//@price)
  floor(number): returns the largest integer that is not greater than the argument
    Eg, floor(2.6) = 2
  ceiling(number): returns the smallest integer that is not less than the argument
    Eg, ceiling(2.6) = 3
  round(number): returns the closest integer to the argument
    Eg, round(2.4) = 2
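A short sketch of the number functions, assuming the JDK's javax.xml.xpath engine. The <Items> document and its price attributes are invented so that sum(//@price) has something to add up; the class name NumberFunctionsDemo and helper number are hypothetical too.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class NumberFunctionsDemo {
    // Hypothetical price list, invented for this sketch.
    static final String XML =
        "<Items><Item price='10.5'/><Item price='20'/><Item price='5.5'/></Items>";

    // Evaluates expr and returns its value as an XPath number.
    static double number(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return (Double) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NUMBER);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(number("sum(//@price)")); // 36.0
        System.out.println(number("floor(2.6)"));    // 2.0
        System.out.println(number("ceiling(2.6)"));  // 3.0
        System.out.println(number("round(2.4)"));    // 2.0
    }
}
```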
Boolean OPs in XPath
  Conjunction: "and"
    //Product[@price>10.8 and @year>2000]
  Disjunction: "or"
    /Customer[@cname='Lee' or @cid>100]
  Disjunction: "|"
    Computes both node-sets and returns the union
    //Book | //Tape
  NOTE: some XPath engines currently support only one of "|" and "or" for disjunction
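The sketch below tries "and", "or", and "|" with the JDK's javax.xml.xpath engine, which supports all three. The <Shop> document and every attribute value in it are invented; because <Customer> is not the root of this hypothetical document, the customer query is written /Shop/Customer[...] instead of /Customer[...]. The class name BooleanOpsDemo and helper attr are made up as well.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BooleanOpsDemo {
    // Hypothetical shop data, invented for this sketch.
    static final String XML =
        "<Shop>"
        + "<Product name='a' price='12.5' year='2005'/>"
        + "<Product name='b' price='9' year='2010'/>"
        + "<Product name='c' price='30' year='1999'/>"
        + "<Customer cname='Lee' cid='7'/>"
        + "<Customer cname='Kim' cid='150'/>"
        + "<Customer cname='Park' cid='20'/>"
        + "<Book id='bk1'/><Tape id='tp1'/><Book id='bk2'/>"
        + "</Shop>";

    // Returns the named attribute of every element selected by expr.
    static List<String> attr(String expr, String attrName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(((Element) nodes.item(i)).getAttribute(attrName));
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "and": both conditions must hold, so only product a qualifies.
        System.out.println(attr("//Product[@price>10.8 and @year>2000]", "name"));
        // "or": either condition is enough, so Lee (by name) and Kim (by cid) qualify.
        System.out.println(attr("/Shop/Customer[@cname='Lee' or @cid>100]", "cname"));
        // "|": union of two node-sets, here two books and one tape.
        System.out.println(attr("//Book | //Tape", "id"));
    }
}
```

Note the difference in kind: "and" and "or" combine boolean conditions inside a predicate, while "|" combines two node-sets.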
XPath Lab [www.zvon.org]
  /AAA/CCC: selects both <CCC> children of <AAA>
  /AAA/DDD/BBB: selects the <BBB> child of <DDD>
  <AAA>
    <BBB/>
    <CCC/>
    <BBB/>
    <BBB/>
    <DDD>
      <BBB/>
    </DDD>
    <CCC/>
  </AAA>
XPath Lab [www.zvon.org]
  //BBB: selects all four <BBB> elements, at any depth
  /AAA/*: selects all six child elements of <AAA>
  <AAA>
    <BBB/>
    <CCC/>
    <BBB/>
    <BBB/>
    <DDD>
      <BBB/>
    </DDD>
    <CCC/>
  </AAA>
XPath Lab [www.zvon.org]
  /AAA/BBB[1]: selects the first <BBB> child of <AAA>
  /AAA/BBB[last()]: selects the last (third) <BBB> child of <AAA>
  <AAA>
    <BBB/>
    <CCC/>
    <BBB/>
    <BBB/>
    <DDD>
      <BBB/>
    </DDD>
    <CCC/>
  </AAA>
XPath Lab [www.zvon.org]
  /AAA//BBB[1]: selects every <BBB> that is the first <BBB> child of its parent
  /AAA//BBB[last()]: selects every <BBB> that is the last <BBB> child of its parent
    Under <AAA>: Position=3
    Under <DDD>: Position=1
  <AAA>
    <BBB/>
    <CCC/>
    <BBB/>
    <BBB/>
    <DDD>
      <BBB/>
    </DDD>
    <CCC/>
  </AAA>
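Position Explanation
  "/AAA//BBB" is short for /AAA/descendant-or-self::node()/child::BBB, so the last step is evaluated once per parent and yields two lists:
    Three <BBB> as the children of <AAA>
    One <BBB> as the grandchild of <AAA> (the child of <DDD>)
  Then, a position predicate like [1] or [2] is applied to the answers in each list SEPARATELY
  /AAA//BBB[1] returns both:
    First <BBB> from the first list -- a child of <AAA>
    First <BBB> from the second list -- a grandchild of <AAA>
  /AAA//BBB[last()] likewise returns one node from each list:
    last() returns the position of the last node in the current list
    Last <BBB> from the first list -- position 3 among the children of <AAA>
    Last <BBB> from the second list -- position 1, the only <BBB> in <DDD>
  To apply the position to the combined answer instead, parenthesize the path: (/AAA//BBB)[1] or (/AAA//BBB)[last()]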
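How an engine actually treats positions after // can be checked directly. The sketch below assumes the JDK's javax.xml.xpath engine and uses the lab document with hypothetical id attributes (b1 to b4) added only so the selected <BBB> elements can be told apart; the class name PositionDemo and helper ids are made up. With this engine, /AAA//BBB[last()] returns one <BBB> per parent, which matches the Position=3 and Position=1 callouts on the lab slide.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PositionDemo {
    // Lab document; the id attributes are assumed, added only to label the nodes.
    static final String XML =
        "<AAA>"
        + "<BBB id='b1'/><CCC/><BBB id='b2'/><BBB id='b3'/>"
        + "<DDD><BBB id='b4'/></DDD>"
        + "<CCC/>"
        + "</AAA>";

    // Returns the id attribute of every element selected by expr.
    static List<String> ids(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(((Element) nodes.item(i)).getAttribute("id"));
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The predicate is applied per parent: one answer from <AAA>, one from <DDD>.
        System.out.println(ids("/AAA//BBB[1]"));        // [b1, b4]
        System.out.println(ids("/AAA//BBB[last()]"));   // [b3, b4]
        // Parentheses apply the predicate to the combined node-set instead.
        System.out.println(ids("(/AAA//BBB)[1]"));      // [b1]
        System.out.println(ids("(/AAA//BBB)[last()]")); // [b4]
    }
}
```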
XPath Lab [www.zvon.org]
  //@id: selects the two id attributes (b1 and b2)
  //BBB[@id="b2"]: selects the <BBB> element whose id is b2
  <AAA>
    <BBB id = "b1"/>
    <BBB id = "b2"/>
    <BBB name = "bbb"/>
    <BBB/>
  </AAA>
XPath Lab [www.zvon.org]
  //*[count(BBB)=2]: selects <DDD>, the only element with exactly two <BBB> children
  //*[count(*)=3]: selects <AAA> and the first <CCC>, the elements with exactly three child elements
  <AAA>
    <CCC>
      <BBB/>
      <BBB/>
      <BBB/>
    </CCC>
    <DDD>
      <BBB/>
      <BBB/>
    </DDD>
    <EEE>
      <CCC/>
      <DDD/>
    </EEE>
  </AAA>
XPath Evaluation S/W
  Many software tools now have built-in support for XPath 1.0 and 2.0
  Eg,
    XPath Visualizer: Windows only
      http://xpathvisualizer.codeplex.com/
    XMLSpy: Windows only
    <oXygen/>: Mac and Windows
    XMLPad: Windows only
#1. XPath Visualizer
  [Screenshot: Answer #1 and Answer #2 for //letter/paragraph; callout: minor bug here]
#2. XMLSpy
  [Screenshots: choose "Evaluate XPath"; Answer #1 and Answer #2 for //letter/paragraph]
#3. <oXygen/>
  [Screenshots: press Enter key; Answer #1 and Answer #2 for //letter/paragraph]
#4. XMLPad
XPath Evaluation in Programming
  XPath Engines / Libraries
    Apache Xalan-Java: http://xml.apache.org/xalan-j/
    Saxon: http://saxon.sourceforge.net/
    Jaxen: http://jaxen.codehaus.org/
  PL-specific APIs
    Java: package javax.xml.xpath + DOM
    PHP: domxml's xpath_eval() (v4), SimpleXML (v5)
Eg. XPath in JAVA
  public Node findAddress(String name, Document source) throws Exception {
      // need to recreate a few helper objects
      XMLParserLiaison xpathSupport = new XMLParserLiaisonDefault();
      XPathProcessor xpathParser = new XPathProcessorImpl(xpathSupport);
      PrefixResolver prefixResolver = new PrefixResolverDefault(source.getDocumentElement());

      // create the XPath and initialize it
      XPath xp = new XPath();
      String xpString = "//address[child::addressee[text() = '" + name + "']]";
      xpathParser.initXPath(xp, xpString, prefixResolver);

      // now execute the XPath select statement
      XObject list = xp.execute(xpathSupport, source.getDocumentElement(), prefixResolver);
      return list.nodeset().item(0);
  }
  http://www.javaworld.com/javaworld/jw-09-2000/jw-0908-xpath.html?page=3
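For comparison, the same lookup can be sketched with the JDK's javax.xml.xpath package instead of the Xalan helper classes used above. This is not the article's code: the address book document, the class name FindAddressDemo, and the streetOf helper are hypothetical, and binding the name as the XPath variable $name (rather than concatenating it into the expression) is a substitution made here so that names containing quotes do not break the query.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class FindAddressDemo {
    // Hypothetical address book; element names follow the XPath string in the slide.
    static final String XML =
        "<addressbook>"
        + "<address><addressee>Ann Lee</addressee><street>1 Main St</street></address>"
        + "<address><addressee>Bob O'Neil</addressee><street>2 Oak Ave</street></address>"
        + "</addressbook>";

    // Returns the first <address> whose <addressee> text equals name, or null if none.
    static Node findAddress(String name, Document source) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Resolve the XPath variable $name to the Java argument.
        xpath.setXPathVariableResolver(qname -> "name".equals(qname.getLocalPart()) ? name : null);
        return (Node) xpath.evaluate("//address[child::addressee[text() = $name]]",
                source, XPathConstants.NODE);
    }

    // Convenience wrapper for the sketch: the street of the matching address, or null.
    static String streetOf(String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            Node address = findAddress(name, doc);
            if (address == null) return null;
            return ((Element) address).getElementsByTagName("street").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(streetOf("Ann Lee"));     // 1 Main St
        System.out.println(streetOf("Bob O'Neil"));  // 2 Oak Ave
        System.out.println(streetOf("Nobody"));      // null
    }
}
```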
Eg. SimpleXML in PHP
  http://www.tuxradar.com/practicalphp/12/3/3
  <?php
      $xml = simplexml_load_file('employees.xml');

      echo "<strong>Using direct method...</strong><br />";
      $names = $xml->xpath('/employees/employee/name');
      foreach($names as $name) {
          echo "Found $name<br />";
      }
      echo "<br />";

      echo "<strong>Using indirect method...</strong><br />";
      $employees = $xml->xpath('/employees/employee');
      foreach($employees as $employee) {
          echo "Found {$employee->name}<br />";
      }
      echo "<br />";

      echo "<strong>Using wildcard method...</strong><br />";
      $names = $xml->xpath('//name');
      foreach($names as $name) {
          echo "Found $name<br />";
      }
  ?>
Lab #2 (DUE: Sep. 25 11:55PM)
  https://online.ist.psu.edu/ist516/labs
  Tasks:
    Individual lab
    Using XML files, practice XPath queries
  Turn-In
    XPath queries and English interpretation
    Screenshot of results of XPath queries