Querying XML Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems October 6, 2003 Some slide content courtesy of Susan Davidson.


Querying XML Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems October 6, 2003 Some slide content courtesy of Susan Davidson & Raghu Ramakrishnan

2 DTDs Aren’t Enough for Some People
DTDs capture grammatical structure, but have some drawbacks:
• Not themselves in XML – inconvenient to build tools for them
• Don’t capture database datatypes’ domains
• IDs aren’t a good implementation of keys
  • Why not?
• No way of defining OO-like inheritance
• No way of expressing functional dependencies (FDs)

3 XML Schema
Aims to address the shortcomings of DTDs … but an “everything including the kitchen sink” spec
Features:
• XML syntax
• Better way of defining keys using XPaths
• Type subclassing that’s more complex than in a programming language
  • Programming languages don’t consider order of member variables!
  • Subclassing “by extension” and “by restriction”
• … And, of course, domains and built-in datatypes

4 Simple Schema Example

5 Designing an XML Schema/DTD
• XML data design is generally not as formalized as relational data design
  • We can still use ER diagrams to break the data into entity sets and relationship sets
  • ER diagrams have extensions for “aggregation” – treating smaller diagrams as entities – and for composite attributes
  • Note that often we already have our data in relations and need to design the XML schema to export them!
• We generally orient the XML tree around the “central” objects in a particular application
• A big decision: element vs. attribute
  • Element if it has its own properties, or if you *might* have more than one of them
  • Attribute if it is a single property – or perhaps not!

6 XML as a Data Model
XML is a non-first-normal-form representation
• Can represent documents, data
• Standard data exchange format
• Several competing schema formats – esp., DTD and XML Schema – provide typing information
Now: the core ideas in querying XML

7 Querying XML
How do you query a directed graph? A tree?
The standard approach used by many XML, semistructured-data, and object query languages:
• Define some sort of a template describing traversals from the root of the directed graph
• In XML, the basis of this template is called an XPath

8 XPaths
In its simplest form, an XPath is like a path in a file system:
/mypath/subpath/*/morepath
• The XPath returns a node set representing the XML nodes (and their subtrees) at the end of the path(s)
• XPaths can have node tests at the end, returning only particular node types, e.g., text(), processing-instruction(), comment(), element(), attribute()
• XPath is fundamentally an ordered language: it can query in order-aware fashion, and it returns nodes in document order
• XPath will often do type coercion in comparing different-typed entities (e.g., nodes with scalars)

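9 Sample XML
mastersthesis: Kurt P. Brown (author); PRPL: A Database Workload Specification Language (title); 1992 (year); Univ. of Wisconsin-Madison (school)
article: Paul R. McJones (editor); The 1995 SQL Reunion (title); Digital System Research Center Report (journal); SRC (volume); db/labs/dec/SRC html (ee)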

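10 XML Data Model Visualized
[Figure: the sample document drawn as a tree, with a legend of node kinds: root, p-i, element, attribute, text. The Root has two children: ?xml (p-i) and dblp (element). dblp has two element children: mastersthesis (attributes mdate = 2002…, key = ms/Brown92; child elements author, title, year, school) and article (attributes mdate = 2002…, key = tr/dec/…; child elements editor, title, year, journal, volume, ee). Each child element holds one text node.]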

11 Some Example XPath Queries
• /dblp/mastersthesis/title
• /dblp/*/editor
• //title
• //title/text()
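The four queries above can be tried out with a small Python sketch. It uses the standard-library `xml.etree.ElementTree` module, which supports only a subset of XPath, so `text()` is emulated by reading `.text`. The miniature document and its `key` values are assumptions made up for illustration, loosely following the sample XML slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of the DBLP sample; values are illustrative only.
DOC = """
<dblp>
  <mastersthesis key="ms/Brown92">
    <author>Kurt P. Brown</author>
    <title>PRPL: A Database Workload Specification Language</title>
    <year>1992</year>
  </mastersthesis>
  <article key="tr/dec/example">
    <editor>Paul R. McJones</editor>
    <title>The 1995 SQL Reunion</title>
  </article>
</dblp>
"""

# fromstring returns the <dblp> element, so paths are written relative to it.
root = ET.fromstring(DOC)

# /dblp/mastersthesis/title
thesis_titles = root.findall("./mastersthesis/title")

# /dblp/*/editor
editors = root.findall("./*/editor")

# //title
all_titles = root.findall(".//title")

# //title/text()  (ElementTree has no text() node test; read .text instead)
title_texts = [t.text for t in all_titles]

print(title_texts)
```

Note that `findall` returns elements in document order, which matches the ordered semantics described on the XPaths slide.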

12 Context Nodes and Relative Paths
XPath has a notion of a context node: it’s analogous to a current directory
• “.” represents this context node
• “..” represents the parent node
• We can express relative paths: subpath/sub-subpath/../.. gets us back to the context node
By default, the document root is the context node
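As a rough illustration of context nodes, the sketch below uses Python's standard-library `xml.etree.ElementTree`, where the element a path is evaluated on plays the role of the context node. The tiny document is an assumption for illustration; ElementTree's `..` only works for parents inside the subtree of the starting element.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<dblp><mastersthesis><title>PRPL</title><year>1992</year></mastersthesis></dblp>"
)

# With <dblp> as the context node, a relative path walks down from it.
title = root.find("mastersthesis/title")

# "." is the context node itself.
same = root.find(".")

# ".." steps back up to the parent of the node reached so far.
parent = root.find("mastersthesis/title/..")

# Two ".." steps undo the two downward steps and land on the context node again.
back_home = root.findall("mastersthesis/title/../..")

# Changing the context node changes what the same relative path means.
thesis = root.find("mastersthesis")
year = thesis.find("year")
```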

13 Predicates – Selection Operations
A predicate allows us to filter the node set based on selection-like conditions over sub-XPaths:
/dblp/article[title = "Paper1"]
which is equivalent to:
/dblp/article[./title/text() = "Paper1"]
because of type coercion.
What does this do:
…[… = "123" and ./title/text() = "Paper1" and ./author/*/element()]
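A minimal sketch of predicate filtering, again with Python's standard-library `xml.etree.ElementTree`. Its XPath subset has no `and` operator, so a conjunction is written here as two chained predicates; the document and its `key` attributes are hypothetical.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""
<dblp>
  <article key="123"><title>Paper1</title><author>A. Author</author></article>
  <article key="456"><title>Paper2</title></article>
</dblp>
""")

# /dblp/article[title = "Paper1"]: keep articles with a matching child title
by_title = root.findall("article[title='Paper1']")

# Attribute test: keep articles whose key attribute equals "123"
by_key = root.findall("article[@key='123']")

# Existence test: keep articles that have at least one author child
with_author = root.findall("article[author]")

# Conjunction, written as chained predicates in ElementTree's subset
both = root.findall("article[@key='123'][title='Paper1']")
```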

14 Axes: More Complex Traversals
Thus far, we’ve seen XPath expressions that go down the tree (and up one step)
• But we might want to go up, left, right, etc.
• These are expressed with so-called axes:
  • self::path-step
  • child::path-step, parent::path-step
  • descendant::path-step, ancestor::path-step
  • descendant-or-self::path-step, ancestor-or-self::path-step
  • preceding-sibling::path-step, following-sibling::path-step
  • preceding::path-step, following::path-step
• The previous XPaths we saw were in “abbreviated form”

15 Querying Order
• We saw in the previous slide that we could query for preceding or following siblings or nodes
• We can also query a node for its position according to some index:
  • fn:last() returns the index of the last node matching the last step; positions are 1-based, so the first such node has index 1 (there is no first() function)
  • fn:position() gives the position of the current node among the nodes matching the step
child::article[fn:position() = fn:last()]
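Positional predicates can be sketched with Python's standard-library `xml.etree.ElementTree`, which accepts an integer position, `last()`, and `last()-N` inside a predicate. The three-article document is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<dblp>"
    "<article><title>A</title></article>"
    "<article><title>B</title></article>"
    "<article><title>C</title></article>"
    "</dblp>"
)

# child::article[position() = 1]; positions start at 1, not 0
first = root.find("article[1]")

# child::article[position() = last()]
last = root.find("article[last()]")

# The node just before the last one
second_last = root.find("article[last()-1]")
```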

16 Users of XPath
• XML Schema uses simple XPaths in defining keys and uniqueness constraints
• XLink and XPointer, hyperlinks for XML
• XQuery
• XSLT

17 XQuery
A strongly-typed, Turing-complete XML manipulation language
• Attempts to do static typechecking against XML Schema
• Based on an object model derived from Schema
Unlike SQL, fully compositional, highly orthogonal:
• Inputs & outputs collections (sequences or bags) of XML nodes
• Anywhere a particular type of object may be used, may use the results of a query of the same type
• Designed mostly by DB and functional language people
Attempts to satisfy the needs of data management and document management
• The database-style core is mostly complete (even has support for NULLs in XML!!)
• The document keyword querying features are still in the works – shows in the order-preserving default model

18 XQuery’s Basic Form
• Has an analogous form to SQL’s SELECT..FROM..WHERE..GROUP BY..ORDER BY
• The model: bind nodes (or node sets) to variables; operate over each legal combination of bindings; produce a set of nodes
• “FLWOR” statement:
for {iterators that bind variables}
let {collections}
where {conditions}
order by {order-conditions}
return {output constructor}

19 “Iterations” in XQuery
A series of (possibly nested) FOR statements assigning the results of XPaths to variables
for $root in document("…")
for $sub in $root/rootElement, $sub2 in $sub/subElement, …
• Something like a template that pattern-matches, produces a “binding tuple”
• For each of these, we evaluate the WHERE and possibly output the RETURN template
• document() or doc() function specifies an input file as a URI
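The binding-tuple model can be imitated in Python. This is an analogue of the evaluation order, not XQuery itself: nested iteration produces the tuples, a filter plays the role of WHERE, and a constructor plays the role of RETURN. The sample document and the `result` wrapper element are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""
<dblp>
  <proceedings><title>Conf A</title><year>1999</year></proceedings>
  <proceedings><title>Conf B</title><year>2001</year></proceedings>
</dblp>
""")

# for $p in /dblp/proceedings, $yr in $p/year
# Each legal combination of bindings becomes one tuple ($p, $yr).
bindings = [
    (p, yr)
    for p in root.findall("proceedings")
    for yr in p.findall("year")
]

# where $yr = "1999": evaluated once per binding tuple
kept = [(p, yr) for (p, yr) in bindings if yr.text == "1999"]

# return: build one output node per surviving tuple
# ("result" is a hypothetical wrapper element name)
results = []
for p, yr in kept:
    out = ET.Element("result")
    out.extend(p.findall("title"))
    results.append(out)
```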

20 Two XQuery Examples
{ for $p in document("dblp.xml")/dblp/proceedings, $yr in $p/yr
  where $yr = "1999"
  return {$p} }

for $i in doc("dblp.xml")/dblp/inproceedings[author/text() = "John Smith"]
return { $i/title/text() } { } { $i/crossref }

21 Nesting in XQuery
Nesting XML trees is perhaps the most common operation
In XQuery, it’s easy – put a subquery in the return clause where you want things to repeat!
for $u in doc("dblp.xml")/universities
where $u/country = "USA"
return { $u/title } { for $mt in $u/../mastersthesis
                      where $mt/year/text() = "1999"
                      return $mt/title }

22 Collections & Aggregation in XQuery
• In XQuery, many operations return collections
  • XPaths, sub-XQueries, functions over these, …
  • The let clause assigns the results to a variable
• Aggregation simply applies a function over a collection, where the function returns a value (very elegant!)
let $allpapers := doc("dblp.xml")/dblp/article
return { fn:count(fn:distinct-values($allpapers/author)) }
{ for $paper in doc("dblp.xml")/dblp/article
  let $pauth := $paper/author
  return { $paper/title } { fn:count($pauth) } }
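The two aggregation queries above have a straightforward Python analogue: bind a collection to a name (the role of `let`), then apply a function such as a count over it. This is a sketch of the idea rather than XQuery; the sample articles and author names are made up for illustration.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""
<dblp>
  <article><title>P1</title><author>Ann</author><author>Bob</author></article>
  <article><title>P2</title><author>Ann</author></article>
</dblp>
""")

# let $allpapers := /dblp/article
all_papers = root.findall("article")

# fn:count(fn:distinct-values($allpapers/author))
distinct_authors = {a.text for paper in all_papers for a in paper.findall("author")}
author_count = len(distinct_authors)

# for $paper ... let $pauth := $paper/author return (title, fn:count($pauth))
per_paper = [
    (paper.findtext("title"), len(paper.findall("author")))
    for paper in all_papers
]
```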

23 Sorting in XQuery
• SQL actually allows you to sort its output, with a special ORDER BY clause (which we haven’t discussed)
• XQuery borrows this idea
• In XQuery, what we order is the sequence of “result tuples” output by the return clause:
for $x in doc("dblp.xml")/dblp/proceedings
order by $x/title/text()
return $x
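In Python terms, the `order by` clause behaves roughly like sorting the bound nodes on a key before they are returned. The sketch below assumes a small hypothetical document and uses plain string comparison on the title text.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""
<dblp>
  <proceedings><title>Zeta</title></proceedings>
  <proceedings><title>Alpha</title></proceedings>
  <proceedings><title>Mid</title></proceedings>
</dblp>
""")

# for $x in /dblp/proceedings
procs = root.findall("proceedings")

# order by $x/title/text() return $x
ordered = sorted(procs, key=lambda x: x.findtext("title", default=""))

ordered_titles = [x.findtext("title") for x in ordered]
```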

24 What If Order Doesn’t Matter?
• By default:
  • SQL is unordered
  • XQuery is ordered everywhere!
  • But unordered queries are much faster to answer
• XQuery has a way of telling the DBMS to avoid preserving order:
  • for $x in fn:unordered(mypath) …
• Some of us feel the default is “wrong”…

25 Distinct-ness
• XQuery has a notion that DISTINCT-ness happens as a function over a collection
  • But since we have nodes, we can do duplicate removal according to value or node
  • Can do fn:distinct-values(collection) to remove duplicate values, or fn:distinct-nodes(collection) to remove duplicate nodes
for $years in fn:distinct-values(doc("dblp.xml")//year/text())
return $years
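The difference between removing duplicate values and removing duplicate nodes can be sketched in Python: values are compared by content, nodes by identity. This is an analogue only, with a made-up document in which two different `year` nodes carry the same value.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""
<dblp>
  <article><year>1999</year></article>
  <article><year>2001</year></article>
  <article><year>1999</year></article>
</dblp>
""")

# Duplicate removal by value: two distinct nodes both hold "1999".
years = [y.text for y in root.iter("year")]
distinct_values = list(dict.fromkeys(years))  # keeps first-seen order

# Duplicate removal by node: the same three nodes reached via two paths.
nodes = root.findall(".//year") + root.findall("article/year")
distinct_nodes = list({id(n): n for n in nodes}.values())
```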

26 Querying & Defining Metadata – Can’t Do This in SQL
• Can get a node’s name by querying node-name():
for $x in document("dblp.xml")/dblp/*
return node-name($x)
• Can construct elements and attributes using computed names:
for $x in document("dblp.xml")/dblp/*, $year in $x/year, $title in $x/title/text()
return element {node-name($x)} { attribute {concat("year-", $year)} { $title } }
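A Python analogue of querying names and constructing nodes with computed names, using the standard-library `xml.etree.ElementTree`. Here `.tag` stands in for `node-name()`, and the attribute name is built at run time by string concatenation. The sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""
<dblp>
  <mastersthesis><title>PRPL</title><year>1992</year></mastersthesis>
  <article><title>Reunion</title><year>1997</year></article>
</dblp>
""")

# for $x in /dblp/* return node-name($x)
names = [x.tag for x in root]

# Construct an element whose name and attribute name are both computed.
constructed = []
for x in root:
    year = x.findtext("year")
    title = x.findtext("title")
    # Element name copied from the source node; attribute name is "year-" + year.
    out = ET.Element(x.tag, {"year-" + year: title})
    constructed.append(out)

for out in constructed:
    print(ET.tostring(out, encoding="unicode"))
```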

27 XQuery Summary
• Very flexible and powerful language for XML
  • Clean and orthogonal: can always replace a collection with an expression that creates collections
  • DB and document-oriented (we hope)
  • The core is relatively clean and easy to understand
• Actually Turing Complete – we’ll talk more about XQuery functions soon

28 XSL(T): The Bridge Back to HTML
• XSL (Extensible Stylesheet Language) is actually divided into two parts:
  • XSL-FO: formatting for XML
  • XSLT: a special transformation language
• We’ll leave XSL-FO for you to read up on if you’re interested (www.w3.org)
• XSLT is actually able to convert from XML → HTML, which is how many people do their formatting today
• Products like Apache Cocoon generally translate XML → HTML on the server side

29 A Different Style of Language
• XSLT is based on a series of templates that match different parts of an XML document
  • There’s a policy for what rule or template is applied if more than one matches (it’s not what you’d think!)
  • XSLT templates can invoke other templates
  • XSLT templates can be nonterminating (beware!)
• XSLT templates are based on XPath “match”es, and we can also apply other templates (potentially to “select”ed XPaths)
• Within each template, we describe what should be output
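The template style can be imitated with a toy dispatcher in Python. This is not XSLT and does not use an XSLT processor: it is a loose analogy in which each "template" is a function keyed by element name, and `apply_templates` recurses into children the way templates invoke other templates. The function names, the HTML shape, and the fallback rule (descend into children and ignore text, which differs from XSLT's built-in rules) are all assumptions for illustration; no HTML escaping is done.

```python
import xml.etree.ElementTree as ET


def dblp_template(node):
    # Template for <dblp>: emit a page frame, then process the children.
    return "<html><body><h1>This is DBLP</h1>" + apply_templates(node) + "</body></html>"


def title_template(node):
    # Template for <title>: emit the title text as a paragraph.
    return "<p>" + (node.text or "") + "</p>"


# Hypothetical rule table: element name -> template function.
TEMPLATES = {"dblp": dblp_template, "title": title_template}


def apply_templates(node):
    # Process each child with its matching template; if none matches,
    # fall through and keep descending (simplified fallback rule).
    parts = []
    for child in node:
        rule = TEMPLATES.get(child.tag)
        parts.append(rule(child) if rule else apply_templates(child))
    return "".join(parts)


def transform(root):
    # Start at the root element, as a processor would start at the document.
    rule = TEMPLATES.get(root.tag)
    return rule(root) if rule else apply_templates(root)


doc = ET.fromstring(
    "<dblp><article><title>T1</title><year>1999</year></article></dblp>"
)
html = transform(doc)
print(html)
```

Because templates call `apply_templates`, which in turn calls templates, a rule that re-applies itself to the same node would recurse forever, which mirrors the nontermination warning above.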

30 An XSLT Stylesheet This is DBLP …

31 What XSLT Can and Can’t Do
• XSLT is great at converting XML to other formats
  • XML → diagrams in SVG; HTML; LaTeX
  • …
• XSLT doesn’t do joins (well), it only works on one XML file at a time, and it’s limited in certain respects
  • It’s not a query language, really
  • … But it’s a very good formatting language
• Most web browsers (post Netscape 4.7x) support XSLT and XSL formatting objects
• But most real implementations use XSLT with something like Apache Cocoon
• You may want to use XSL/XSLT for your projects – see … for the spec

32 Wrapping Up
We’ve seen three XML manipulation formalisms today:
• XPath: the basic language for “projecting and selecting” (evaluating path expressions and predicates) over XML
• XQuery: a statically typed, Turing-complete XML processing language
• XSLT: a template-based language for transforming XML documents
Each is extremely useful for certain applications!