Presentation is loading. Please wait.

Presentation is loading. Please wait.

XML Querying and Views Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems November 1, 2005 Some slide content courtesy.

Similar presentations


Presentation on theme: "XML Querying and Views Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems November 1, 2005 Some slide content courtesy."— Presentation transcript:

1 XML Querying and Views Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems November 1, 2005 Some slide content courtesy of Susan Davidson, Dan Suciu, & Raghu Ramakrishnan

2 2 Administrivia Homework 4 due 11/3  XQuery Project plan (~2 pp) due 11/3  Milestones  Division of responsibilities  For non-RSS projects: proposal including scope, milestones, and what you plan to demonstrate Review the Shanmugasundaram paper by Tuesday 11/8  Post a 1-page summary to the newsgroup  What problems did they address? Basic techniques? Strengths & weaknesses of their methods?

3 3 XQuery’s Basic Form  Has an analogous form to SQL’s SELECT..FROM..WHERE..GROUP BY..ORDER BY  The model: bind nodes (or node sets) to variables; operate over each legal combination of bindings; produce a set of nodes  “FLWOR” statement [note case sensitivity!]: for {iterators that bind variables} let {collections} where {conditions} order by {order-conditions}(older version was “SORTBY”) return {output constructor}

4 4 “Iterations” in XQuery A series of (possibly nested) FOR statements assigning the results of XPaths to variables for $root in document(“http://my.org/my.xml”)http://my.org/my.xml for $sub in $root/rootElement, $sub2 in $sub/subElement, …  Something like a template that pattern-matches, produces a “binding tuple”  For each of these, we evaluate the WHERE and possibly output the RETURN template  document() or doc() function specifies an input file as a URI  Old version was “document”; now “doc” but it depends on your XQuery implementation

5 5 Example XML Data Root ?xml dblp mastersthesis article mdate key authortitleyearschool editortitleyearjournalvolumeee mdate key 2002… ms/Brown92 Kurt P…. PRPL… 1992 Univ…. 2002… tr/dec/… Paul R. The… Digital… SRC… 1997 db/labs/dec http://www. attribute root p-i element text

6 6 Two XQuery Examples { for $p in document(“dblp.xml”)/dblp/proceedings, $yr in $p/yr where $yr = “1999” return {$p} } for $i in document(“dblp.xml”)/dblp/inproceedings[author/text() = “John Smith”] return { $i/title/text() } { $i/@key } { $i/crossref }

7 7 Another Example Root ?xml universities name Univ…. attribute root p-i element text university country USA mastersthesis key authortitleyear ms/Brown92 Kurt P…. PRPL… 1999 … … school Univ….

8 8 Nesting in XQuery Nesting XML trees is perhaps the most common operation In XQuery, it’s easy – put a subquery in the return clause where you want things to repeat! for $u in document(“dblp.xml”)/universities where $u/country = “USA” return { $u/title } { for $mt in $u/../mastersthesis where $mt/year/text() = “1999” and ____________ return $mt/title }

9 9 Collections & Aggregation in XQuery In XQuery, many operations return collections  XPaths, sub-XQueries, functions over these, …  The let clause assigns the results to a variable Aggregation simply applies a function over a collection, where the function returns a value (very elegant!) let $allpapers := document(“dblp.xml”)/dblp/article return { fn:count(fn:distinct-values($allpapers/authors)) } {for $paper in doc(“dblp.xml”)/dblp/article let $pauth := $paper/author return {$paper/title} { fn:count($pauth) } }

10 10 Collections, Ctd. Unlike in SQL, we can compose aggregations and create new collections from old: { let $avgItemsSold := fn:avg( for $order in document(“my.xml”)/orders/order let $totalSold = fn:sum($order/item/quantity) return $totalSold) return $avgItemsSold }

11 11 Distinct-ness In XQuery, DISTINCT-ness happens as a function over a collection  But since we have nodes, we can do duplicate removal according to value or node  Can do fn:distinct-values(collection) to remove duplicate values, or fn:distinct-nodes(collection) to remove duplicate nodes for $years in fn:distinct-values(doc(“dblp.xml”)//year/text() return $years

12 12 Sorting in XQuery  SQL actually allows you to sort its output, with a special ORDER BY clause (which we haven’t discussed, but which specifies a sort key list)  XQuery borrows this idea  In XQuery, what we order is the sequence of “result tuples” output by the return clause: for $x in document(“dblp.xml”)/proceedings order by $x/title/text() return $x

13 13 What If Order Doesn’t Matter? By default:  SQL is unordered  XQuery is ordered everywhere!  But unordered queries are much faster to answer XQuery has a way of telling the query engine to avoid preserving order:  unordered { for $x in (mypath) … }

14 14 Querying & Defining Metadata – Can’t Do This in SQL Can get a node’s name by querying node-name(): for $x in document(“dblp.xml”)/dblp/* return node-name($x) Can construct elements and attributes using computed names: for $x in document(“dblp.xml”)/dblp/*, $year in $x/year, $title in $x/title/text(), element node-name($x) { attribute {“year-” + $year} { $title } }

15 15 XQuery Wrap-up  XQuery is very SQL-like, but in some ways cleaner and more orthogonal  It is based on paths and binding tuples, with collections and trees as its first-class objects  See www.w3.org/TR/xquery/ for more details on the languagewww.w3.org/TR/xquery/

16 16 A Problem  We frequently want to reference data in a way that differs from the way it’s stored  XML data  HTML, text, etc.  Relational data  XML data  Relational data  Different relational representation  XML data  Different XML representation  Generally, these can all be thought of as different views over the data  A view is a named query  Let’s start with a special presentation language for XML  HTML

17 17 XSL(T): XML  “Other Stuff”  XSL (XML Stylesheet Language) is actually divided into two parts:  XSL:FO: formatting for XML  XSLT: a special transformation language  We’ll leave XSL:FO for you to read off www.w3.org, if you’re interestedwww.w3.org  XSLT is actually able to convert from XML  HTML, which is how many people do their formatting today  Products like Apache Cocoon generally translate XML  HTML on the server side  Your browser will do XML  HTML on the client side

18 18 A Different Style of Language  XSLT is based on a series of templates that match different parts of an XML document  There’s a policy for what rule or template is applied if more than one matches (it’s not what you’d think!)  XSLT templates can invoke other templates  XSLT templates can be nonterminating (beware!)  XSLT templates are based on XPath “match”es, and we can also apply other templates (potentially to “select”ed XPaths)  Within each template, we describe what should be output  (Matches to text default to outputting it)

19 19 An XSLT Stylesheet This is DBLP …

20 20 Results of XSLT Stylesheet Paper1 Smith Chakrabarti Gray Paper2 This Is DBLP Paper1 Smith Paper2 Chakrabarti Gray

21 21 What XSLT Can and Can’t Do  XSLT is great at converting XML to other formats  XML  diagrams in SVG; HTML; LaTeX  …  XSLT doesn’t do joins (well), it only works on one XML file at a time, and it’s limited in certain respects  It’s not a query language, really  … But it’s a very good formatting language  Most web browsers (post Netscape 4.7x) support XSLT and XSL formatting objects  But most real implementations use XSLT with something like Apache Cocoon – compatible with more browsers  You should use XSL/XSLT to format the forms and pages for your projects – see www.w3.org/TR/xslt or the chapter we handed out for more detailswww.w3.org/TR/xslt

22 22 Other Forms of Views XSLT is a language primarily designed from going from XML  non-XML Obviously, we can do XML  XML in XQuery … Or relations  relations … What about relations  XML and XML  relations? Let’s start with XML  XML, relations  relations

23 23 Views in SQL and XQuery  A view is a named query  We use the name of the view to invoke the query (treating it as if it were the relation it returns) SQL: CREATE VIEW V(A,B,C) AS SELECT A,B,C FROM R WHERE R.A = “123” XQuery: declare function V() as element(content)* { for $r in doc(“R”)/root/tree, $a in $r/a, $b in $r/b, $c in $r/c where $a = “123” return {$a, $b, $c} } SELECT * FROM V, R WHERE V.B = 5 AND V.C = R.C for $v in V()/content, $r in doc(“r”)/root/tree where $v/b = $r/b return $v Using the views:

24 24 What’s Useful about Views Providing security/access control  We can assign users permissions on different views  Can select or project so we only reveal what we want! Can be used as relations in other queries  Allows the user to query things that make more sense Describe transformations from one schema (the base relations) to another (the output of the view)  The basis of converting from XML to relations or vice versa  This will be incredibly useful in data integration, discussed soon… Allow us to define recursive queries

25 25 Materialized vs. Virtual Views  A virtual view is a named query that is actually re-computed every time – it is merged with the referencing query CREATE VIEW V(A,B,C) AS SELECT A,B,C FROM R WHERE R.A = “123”  A materialized view is one that is computed once and its results are stored as a table  Think of this as a cached answer  These are incredibly useful!  Techniques exist for using materialized views to answer other queries  Materialized views are the basis of relating tables in different schemas SELECT * FROM V, R WHERE V.B = 5 AND V.C = R.C

26 26 Views Should Stay Fresh  Views (sometimes called intensional relations) behave, from the perspective of a query language, exactly like base relations (extensional relations)  But there’s an association that should be maintained:  If tuples change in the base relation, they should change in the view (whether it’s materialized or not)  If tuples change in the view, that should reflect in the base relation(s)

27 27 View Maintenance and the View Update Problem  There exist algorithms to incrementally recompute a materialized view when the base relations change  We can try to propagate view changes to the base relations  However, there are lots of views that aren’t easily updatable:  We can ensure views are updatable by enforcing certain constraints (e.g., no aggregation), but this limits the kinds of views we can have AB 12 22 BC 24 23 R S ABC 124 123 224 223 R⋈SR⋈S delete?

28 28 Views as a Bridge between Data Models A claim we’ve made several times: “XML can’t represent anything that can’t be expressed in in the relational model” If this is true, then we must be able to represent XML in relations Store a relational view of XML (or create an XML view of relations)

29 29 A View as a Translation between XML and Relations  You’ll be reviewing the most-cited paper in this area (Shanmugasundaram et al), and there are many more (Fernandez et al., …)  Techniques already making it into commercial systems  XPERANTO at IBM Research will appear in DB2 v9  SQL Server has some XML support; Oracle is also doing some XML  … Next time you’ll see how it works!


Download ppt "XML Querying and Views Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems November 1, 2005 Some slide content courtesy."

Similar presentations


Ads by Google