CSE544 XML, XQuery Wednesday, April 5, 2006

Presentation transcript:

1 CSE544 XML, XQuery Wednesday, April 5, 2006

2 Announcements Project Ideas are posted (to discuss in class) Groups due today Proposals due Monday Two paper reviews are due on Monday Please read about E/R diagrams and OQL somewhere (e.g. in the textbook)

3 Outline XML: syntax, semantics, data, DTDs XPath XQuery Storage Chamberlin: XQuery: An XML query language. IBM Systems Journal 2002

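4 XML Syntax
  tags: book, title, author, …
  start tag: <book>, end tag: </book>
  elements: <book>…</book>, <author>…</author>
  elements are nested
  empty element: a start tag immediately followed by its end tag, abbreviated as a single self-closing tag
  an XML document: single root element
  well-formed XML document: if it has matching tags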

5 XML Syntax Foundations of Databases Abiteboul … 1995 attributes are alternative ways to represent data

6 XML Semantics: a Tree !
  data
    person (attribute id = o555)
      name: Mary
      address
        street: Maple
        no: 345
        city: Seattle
    person
      name: John
      address: Thailand
      phone
  Node kinds: element node, text node, attribute node
  Order matters !!!
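The tree reading of a document can be checked with a few lines of code. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the document text is a hypothetical reconstruction of the slide's example (the phone value `23456` is made up), and `walk` is an illustrative helper, not part of the lecture.

```python
import xml.etree.ElementTree as ET

# Hypothetical document modeled on the slide's tree; the phone value is invented.
DOC = """<data>
  <person id="o555">
    <name>Mary</name>
    <address><street>Maple</street><no>345</no><city>Seattle</city></address>
  </person>
  <person>
    <name>John</name>
    <address>Thailand</address>
    <phone>23456</phone>
  </person>
</data>"""


def walk(elem, depth=0, out=None):
    """Collect (depth, kind, label) triples in document order."""
    if out is None:
        out = []
    out.append((depth, "element", elem.tag))
    for name, value in elem.attrib.items():
        out.append((depth + 1, "attribute", f"{name}={value}"))
    if elem.text and elem.text.strip():
        out.append((depth + 1, "text", elem.text.strip()))
    for child in elem:  # children are visited in document order
        walk(child, depth + 1, out)
    return out


nodes = walk(ET.fromstring(DOC))
for depth, kind, label in nodes:
    print("  " * depth + f"{kind}: {label}")
```

Because children are visited in the order they appear, swapping the two person elements in the input changes the output, which is the sense in which order matters.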

7 XML Data XML is self-describing Schema elements become part of the data –Relational schema: persons(name,phone) –In XML, <persons>, <name>, <phone> are part of the data, and are repeated many times Consequence: XML is much more flexible XML = semistructured data

8 Relational Data as XML
  Table person:
    name   phone
    John   3634
    Sue    6343
    Dick   6363
  XML:
    <person>
      <row> <name>John</name> <phone>3634</phone> </row>
      <row> <name>Sue</name> <phone>6343</phone> </row>
      <row> <name>Dick</name> <phone>6363</phone> </row>
    </person>
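The mapping from a table to XML is mechanical: one element for the table, one `row` element per tuple, one child element per column. A minimal sketch in Python, assuming the standard `xml.etree.ElementTree` module; `table_to_xml` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# The three tuples of the person table from the slide.
rows = [("John", "3634"), ("Sue", "6343"), ("Dick", "6363")]


def table_to_xml(table, columns, tuples):
    """Build <table><row><col>value</col>...</row>...</table>."""
    root = ET.Element(table)
    for tup in tuples:
        row = ET.SubElement(root, "row")
        for col, val in zip(columns, tup):
            ET.SubElement(row, col).text = val
    return root


person = table_to_xml("person", ["name", "phone"], rows)
xml_text = ET.tostring(person, encoding="unicode")
print(xml_text)
```

Note how the column names, which appear once in the relational schema, are repeated in every row of the XML output.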

9 What is "Semi-Structured" ? Missing attributes: John 1234, Joe (no phone !) Could represent in a table with nulls:
    name   phone
    John   1234
    Joe    -

10 What is "Semi-Structured" ? Repeated attributes: Mary (two phones !) Impossible in tables:
    name   phone
    Mary   ???

11 What is "Semi-Structured" ? Attributes with different types in different objects: John Smith 1234 (structured name !) Nested collections (non 1NF) Heterogeneous collections: –one element contains children of different kinds

12 Document Type Definitions DTD DTD’s = old and archaic –You’ll hate it XML Schema = new, baroque, and horrible –Won’t discuss, or you’ll hate me XML document: well-formed = if tags are correctly closed Valid = if it has a DTD and conforms to it Validation is useful in data exchange
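Well-formedness can be tested by simply trying to parse the text. The sketch below uses Python's standard `xml.etree.ElementTree`, which checks well-formedness only; it does not validate against a DTD, so checking validity would need a different tool. `is_well_formed` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the text parses as a single well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<a><b>x</b></a>"))   # tags correctly nested and closed
print(is_well_formed("<a><b>x</a></b>"))   # tags closed in the wrong order
print(is_well_formed("<a/><b/>"))          # two root elements
```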

13 Very Simple DTD <!DOCTYPE company [ ]>

14 Very Simple DTD Example of valid XML document: John B Jim B

15 DTD: The Content Model Content model: –Complex = a regular expression over other elements –Text-only = #PCDATA –Empty = EMPTY –Any = ANY –Mixed content = (#PCDATA | A | B | C)*

16 DTD: Regular Expressions
  sequence:     <!ELEMENT name (firstName, lastName)>
  optional:     <!ELEMENT name (firstName?, lastName)>
  Kleene star:  <!ELEMENT person (name, phone*)>
  alternation:  <!ELEMENT person (name, (phone | …))>
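Since a complex content model is a regular expression over child element names, it can be checked with an ordinary regex engine. This is a rough sketch under that reading, not a DTD validator: the `MODELS` table hand-translates two of the declarations above, each child tag is encoded as the tag name followed by a space, and elements without an entry are accepted unchecked.

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated content models (an assumption of this sketch):
MODELS = {
    "name": r"(firstName )?lastName ",  # (firstName?, lastName)
    "person": r"name (phone )*",        # (name, phone*)
}


def children_match(elem):
    """Check one element's child sequence against its content model."""
    model = MODELS.get(elem.tag)
    if model is None:
        return True  # no model declared in this sketch: accept
    sequence = "".join(child.tag + " " for child in elem)
    return re.fullmatch(model, sequence) is not None


def conforms(root):
    """True if every element in the tree matches its content model."""
    return all(children_match(e) for e in root.iter())


good = ET.fromstring(
    "<person><name><lastName>Hull</lastName></name>"
    "<phone>1</phone><phone>2</phone></person>"
)
bad = ET.fromstring(
    "<person><phone>1</phone><name><lastName>Hull</lastName></name></person>"
)
print(conforms(good), conforms(bad))
```

The second document has the same children as the first but in the wrong order, so it fails the sequence `(name, phone*)`.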

17 Very Simple DTD <!DOCTYPE company [ ]>

18 Very Simple DTD Example of valid XML document: John B Jim B

19 Attributes in DTDs

20 Attributes in DTDs
  <!ATTLIST person age     CDATA  #REQUIRED
                   id      ID     #REQUIRED
                   manager IDREF  #REQUIRED
                   manages IDREFS #REQUIRED >
  <person age="25" id="p29432" manager="p48293" manages="p34982 p423234">

21 Attributes in DTDs Types:
  CDATA = string
  ID = key
  IDREF = foreign key
  IDREFS = foreign keys separated by space
  (Monday | Wednesday | Friday) = enumeration
  NMTOKEN = must be a valid XML name
  NMTOKENS = multiple valid XML names
  ENTITY = you don't want to know this
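The key / foreign-key reading of ID, IDREF and IDREFS can be made concrete with a small check. This is a hedged sketch, not how a validating parser is implemented: the document and the ids `p1`, `p2`, `p3`, `p9` are invented, and `dangling_refs` is a hypothetical helper that assumes the `manager` and `manages` attributes from the ATTLIST example.

```python
import xml.etree.ElementTree as ET

# Invented example: p3 refers to a manager id that does not exist.
DOC = """<company>
  <person id="p1" manages="p2 p3"/>
  <person id="p2" manager="p1"/>
  <person id="p3" manager="p9"/>
</company>"""


def dangling_refs(root):
    """Return (person id, missing reference) pairs; IDs must be unique."""
    ids = [p.get("id") for p in root.iter("person")]
    if len(ids) != len(set(ids)):
        raise ValueError("ID values must be unique within the document")
    known = set(ids)
    missing = []
    for p in root.iter("person"):
        # IDREF holds one id; IDREFS holds several ids separated by spaces.
        refs = p.get("manager", "").split() + p.get("manages", "").split()
        missing += [(p.get("id"), r) for r in refs if r not in known]
    return missing


print(dangling_refs(ET.fromstring(DOC)))
```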

22 Attributes in DTDs Kind: #REQUIRED; #IMPLIED = optional; value = default value; #FIXED value = the only value allowed

23 Using DTDs Must include in the XML document Either include the entire DTD: –<!DOCTYPE rootElement [ … ]> Or include a reference to it: –<!DOCTYPE rootElement SYSTEM "…"> Or mix the two... (e.g. to override the external definition)

24 Querying XML Data XPath = simple navigation through the tree XQuery = the SQL of XML XSLT = recursive traversal –will not discuss in class

25 Sample Data for Queries
  <bib>
    <book>
      <publisher>Addison-Wesley</publisher>
      <author>Serge Abiteboul</author>
      <author> <first-name>Rick</first-name> <last-name>Hull</last-name> </author>
      <author>Victor Vianu</author>
      <title>Foundations of Databases</title>
      <year>1995</year>
    </book>
    <book price="…">
      <publisher>Freeman</publisher>
      <author>Jeffrey D. Ullman</author>
      <title>Principles of Database and Knowledge Base Systems</title>
      <year>1998</year>
    </book>
  </bib>

26 Data Model for XPath
  The root
    bib (the root element)
      book
        publisher: Addison-Wesley
        author: Serge Abiteboul
        ..

27 XPath: Simple Expressions
  /bib/book/year     Result: 1995 1998
  /bib/paper/year    Result: empty (there were no papers)

28 XPath: Restricted Kleene Closure
  //author           Result: Serge Abiteboul, Rick Hull, Victor Vianu, Jeffrey D. Ullman
  /bib//first-name   Result: Rick

29 XPath: Text Nodes
  /bib/book/author/text()   Result: Serge Abiteboul, Victor Vianu, Jeffrey D. Ullman
  Rick Hull doesn't appear because he has firstname, lastname
  Functions in XPath:
  –text() = matches the text value
  –node() = matches any node (= * or text())
  –name() = returns the name of the current tag

30 XPath: Wildcard
  //author/*   Result: Rick Hull
  * Matches any element

31 XPath: Attribute Nodes
  /bib/book/@price   Result: …
  @price means that price has to be an attribute

32 XPath: Predicates
  /bib/book/author[firstname]   Result: Rick Hull

33 XPath: More Predicates
  /bib/book/author[firstname][address[.//zip][city]]/lastname   Result: … …

34 XPath: More Predicates
  /bib/book[@price < 60]
  … < 25]
  /bib/book[author/text()]

35 XPath: Summary
  bib               matches a bib element
  *                 matches any element
  /                 matches the root element
  /bib              matches a bib element under root
  bib/paper         matches a paper in bib
  bib//paper        matches a paper in bib, at any depth
  //paper           matches a paper at any depth
  paper|book        matches a paper or a book
  @price            matches a price attribute
  bib/book/@price   matches a price attribute in book, in bib
  …                 matches …
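Several of the path expressions above can be tried directly. The sketch below uses Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath (no `text()`, no comparison predicates such as `< 60`) and requires paths relative to an element, so `/bib/book/year` is written `book/year` starting from the `bib` root. The document is a reconstruction of the sample data; the attribute value `price="55"` is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

# Reconstructed sample data; price="55" is an assumed value.
BIB = """<bib>
  <book>
    <publisher>Addison-Wesley</publisher>
    <author>Serge Abiteboul</author>
    <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
    <author>Victor Vianu</author>
    <title>Foundations of Databases</title>
    <year>1995</year>
  </book>
  <book price="55">
    <publisher>Freeman</publisher>
    <author>Jeffrey D. Ullman</author>
    <title>Principles of Database and Knowledge Base Systems</title>
    <year>1998</year>
  </book>
</bib>"""

bib = ET.fromstring(BIB)

years = [y.text for y in bib.findall("book/year")]              # /bib/book/year
papers = bib.findall("paper/year")                              # /bib/paper/year
first_names = [e.text for e in bib.findall(".//first-name")]    # /bib//first-name
structured = bib.findall("book/author[first-name]")             # author[first-name]
author_children = [e.tag for e in bib.findall(".//author/*")]   # //author/*
priced = [b.findtext("title") for b in bib.findall("book[@price]")]  # has @price

print(years, papers, first_names, len(structured), author_children, priced)
```

A full XPath engine would be needed for the remaining expressions; this subset is enough to see child steps, descendant steps, wildcards and existence predicates at work.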

36 XQuery Based on Quilt, which is based on XML-QL Uses XPath to express more complex queries

37 FLWR ("Flower") Expressions FOR ... LET ... WHERE ... RETURN ...

38 FOR-WHERE-RETURN Find all book titles published after 1995:
  FOR $x IN document("bib.xml")/bib/book
  WHERE $x/year/text() > 1995
  RETURN $x/title
  Result: abc def ghi
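A FOR-WHERE-RETURN expression reads much like a list comprehension. The following is an analogy in Python, not an XQuery engine: the three books and their years are invented for the example, and the document is parsed from a string rather than loaded from `bib.xml`.

```python
import xml.etree.ElementTree as ET

# Invented data: one book from before 1995 and two from after.
bib = ET.fromstring("""<bib>
  <book><title>xyz</title><year>1990</year></book>
  <book><title>abc</title><year>1997</year></book>
  <book><title>def</title><year>2001</year></book>
</bib>""")

titles = [
    x.find("title")                      # RETURN $x/title
    for x in bib.findall("book")         # FOR $x IN .../bib/book
    if int(x.findtext("year")) > 1995    # WHERE $x/year/text() > 1995
]

print([ET.tostring(t, encoding="unicode").strip() for t in titles])
```

The comprehension returns the title elements themselves, just as the query returns `$x/title` rather than its text.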

39 FOR-WHERE-RETURN Equivalently (perhaps more geekish):
  FOR $x IN document("bib.xml")/bib/book[year/text() > 1995]/title
  RETURN $x
  And even shorter:
  document("bib.xml")/bib/book[year/text() > 1995]/title

40 FOR-WHERE-RETURN Find all book titles and the year when they were published:
  FOR $x IN document("bib.xml")/bib/book
  RETURN <answer>
           <title> { $x/title/text() } </title>
           <year> { $x/year/text() } </year>
         </answer>
  We can construct whatever XML results we want !

41 Answer
  How to cook a Turkey, 2003
  Cooking While Watching TV, 2004
  Turkeys on TV, …

42 FOR-WHERE-RETURN Notice the use of "{" and "}" What is the result without them ?
  FOR $x IN document("bib.xml")/bib/book
  RETURN <answer>
           <title> $x/title/text() </title>
           <year> $x/year/text() </year>
         </answer>

43 XQuery: Nesting For each author of a book by Morgan Kaufmann, list all books she published:
  FOR $b IN document("bib.xml")/bib,
      $a IN $b/book[publisher/text()="Morgan Kaufmann"]/author
  RETURN { $a,
           FOR $t IN $b/book[author/text()=$a/text()]/title
           RETURN $t }
  In the RETURN clause comma concatenates XML fragments

44 XQuery Result:
  Jones  abc  def
  Smith  ghi

45 Aggregates Find all books with more than 3 authors:
  FOR $x IN document("bib.xml")/bib/book
  WHERE count($x/author)>3
  RETURN $x
  count = a function that counts
  avg = computes the average
  sum = computes the sum
  distinct-values = eliminates duplicates

46 Aggregates Same thing:
  FOR $x IN document("bib.xml")/bib/book[count(author)>3]
  RETURN $x

47 XQuery Find books whose price is larger than average:
  FOR $b in document("bib.xml")/bib
  LET $a := avg($b/book/price/text())
  FOR $x in $b/book
  WHERE $x/price/text() > $a
  RETURN $x
  LET binds a variable to one value; FOR iterates a variable over a list of values We will come back to that
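The roles of LET and FOR in this query map onto a single assignment followed by a loop. A small Python sketch of the same computation; the three books and their prices (30, 50, 100) are invented, and price is modeled as a child element because the query uses `price/text()`.

```python
import xml.etree.ElementTree as ET
from statistics import mean

# Invented data: prices 30, 50 and 100 give an average of 60.
bib = ET.fromstring("""<bib>
  <book><title>A</title><price>30</price></book>
  <book><title>B</title><price>50</price></book>
  <book><title>C</title><price>100</price></book>
</bib>""")

books = bib.findall("book")

# LET $a := avg($b/book/price/text())  -- bound once, to a single value
a = mean(float(b.findtext("price")) for b in books)

# FOR $x in $b/book WHERE $x/price/text() > $a RETURN $x  -- one iteration per book
above = [x.findtext("title") for x in books if float(x.findtext("price")) > a]

print(a, above)
```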

48 FOR-WHERE-RETURN "Flatten" the authors, i.e. return a list of (author, title) pairs:
  FOR $b IN document("bib.xml")/bib/book,
      $x IN $b/title/text(),
      $y IN $b/author/text()
  RETURN { $x } { $y }
  Answer: abc efg; abc hkj

49 FOR-WHERE-RETURN For each author, return all book titles he/she wrote:
  FOR $b IN document("bib.xml")/bib,
      $x IN $b/book/author/text()
  RETURN { $x }
         { FOR $y IN $b/book[author/text()=$x]/title
           RETURN $y }
  Answer: efg abc klm ....
  What about duplicate authors ?

50 FOR-WHERE-RETURN Same, but eliminate duplicate authors:
  FOR $b IN document("bib.xml")/bib
  LET $a := distinct-values($b/book/author/text())
  FOR $x IN $a
  RETURN $x
         { FOR $y IN $b/book[author/text()=$x]/title
           RETURN $y }

51 FOR-WHERE-RETURN Same thing:
  FOR $b IN document("bib.xml")/bib,
      $x IN distinct-values($b/book/author/text())
  RETURN $x
         { FOR $y IN $b/book[author/text()=$x]/title
           RETURN $y }
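The effect of `distinct-values` followed by a nested FOR can be mimicked in a few lines. This Python sketch is an analogy only; the two books, their titles and the author names Jones and Smith are invented data, and duplicates are removed while keeping first-occurrence order.

```python
import xml.etree.ElementTree as ET

# Invented data: Jones wrote both books, Smith co-wrote the first.
bib = ET.fromstring("""<bib>
  <book><title>abc</title><author>Jones</author><author>Smith</author></book>
  <book><title>def</title><author>Jones</author></book>
</bib>""")

books = bib.findall("book")

# distinct-values($b/book/author/text()), keeping first-occurrence order
authors = []
for b in books:
    for a in b.findall("author"):
        if a.text not in authors:
            authors.append(a.text)

# For each distinct author, the inner FOR collects the matching titles.
result = {
    x: [b.findtext("title") for b in books
        if x in [a.text for a in b.findall("author")]]
    for x in authors
}

print(result)
```

Without the duplicate elimination, Jones would appear once per book he wrote, each time with the full list of his titles.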

52 SQL and XQuery Side-by-side Product(pid, name, maker, price) Find all product names, prices, sort by price
  SQL:
    SELECT x.name, x.price
    FROM Product x
    ORDER BY x.price
  XQuery:
    FOR $x in document("db.xml")/db/Product/row
    ORDER BY $x/price/text()
    RETURN { $x/name, $x/price }

53 Answers
  XQuery's answer: abc 7 def …
  SQL's answer:
    Name  Price
    abc   7
    def   23
    ...
  Notice: this is NOT a well-formed document ! (WHY ???)

54 Producing a Well-Formed Answer
  { FOR $x in document("db.xml")/db/Product/row
    ORDER BY $x/price/text()
    RETURN { $x/name, $x/price } }

55 XQuery's Answer abc 7 def … Now it is well-formed !

56 SQL and XQuery Side-by-side Product(pid, name, maker, price) Company(cid, name, city, revenues) Find all products made in Seattle
  SQL:
    SELECT x.name
    FROM Product x, Company y
    WHERE x.maker=y.cid and y.city='Seattle'
  XQuery:
    FOR $r in document("db.xml")/db,
        $x in $r/Product/row,
        $y in $r/Company/row
    WHERE $x/maker/text()=$y/cid/text() and $y/city/text() = "Seattle"
    RETURN { $x/name }
  Cool XQuery:
    FOR $y in /db/Company/row[city/text()="Seattle"],
        $x in /db/Product/row[maker/text()=$y/cid/text()]
    RETURN { $x/name }
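The SQL side of this comparison can be run as is. A minimal sketch using Python's standard `sqlite3` module with an in-memory database; the company and product rows (Acme, Globex, gizmo, widget, gadget) are invented sample data.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Company(cid TEXT PRIMARY KEY, name TEXT, city TEXT, revenues REAL);
CREATE TABLE Product(pid TEXT PRIMARY KEY, name TEXT, maker TEXT, price REAL);
-- Invented sample rows
INSERT INTO Company VALUES ('c1', 'Acme', 'Seattle', 500000);
INSERT INTO Company VALUES ('c2', 'Globex', 'Portland', 2000000);
INSERT INTO Product VALUES ('p1', 'gizmo', 'c1', 120);
INSERT INTO Product VALUES ('p2', 'widget', 'c2', 15);
INSERT INTO Product VALUES ('p3', 'gadget', 'c1', 80);
""")

# Same join as on the slide; ORDER BY is added only to make the output stable.
seattle = [row[0] for row in con.execute("""
    SELECT x.name
    FROM Product x, Company y
    WHERE x.maker = y.cid AND y.city = 'Seattle'
    ORDER BY x.name
""")]

print(seattle)
```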

57 Answer: abc efg … …

58 SQL and XQuery Side-by-side For each company with revenues < 1M count the products over $100
  SQL:
    SELECT y.name, count(*)
    FROM Product x, Company y
    WHERE x.price > 100 and x.maker=y.cid and y.revenue < 1000000
    GROUP BY y.cid, y.name
  XQuery:
    FOR $r in document("db.xml")/db,
        $y in $r/Company/row[revenue/text() < 1000000]
    RETURN { $y/name/text() }
           { count($r/Product/row[maker/text()=$y/cid/text()][price/text()>100]) }

59 SQL and XQuery Side-by-side Find companies with more than 30 products, and their average price
  SQL:
    SELECT y.name, avg(x.price)
    FROM Product x, Company y
    WHERE x.maker=y.cid
    GROUP BY y.cid, y.name
    HAVING count(*) > 30
  XQuery:
    FOR $r in document("db.xml")/db,
        $y in $r/Company/row
    LET $p := $r/Product/row[maker/text()=$y/cid/text()]
    WHERE count($p) > 30
    RETURN { $y/name/text() } avg($p/price/text())
  $p is a collection; $y is an element
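The GROUP BY / HAVING form can be exercised on a tiny database. This sketch uses Python's standard `sqlite3`; the rows are invented, and the threshold is lowered from 30 to 1 (an assumption made purely so that two products are enough to pass the filter).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Company(cid TEXT PRIMARY KEY, name TEXT, city TEXT, revenues REAL);
CREATE TABLE Product(pid TEXT PRIMARY KEY, name TEXT, maker TEXT, price REAL);
-- Invented sample rows: Acme makes two products, Globex makes one
INSERT INTO Company VALUES ('c1', 'Acme', 'Seattle', 500000);
INSERT INTO Company VALUES ('c2', 'Globex', 'Portland', 2000000);
INSERT INTO Product VALUES ('p1', 'gizmo', 'c1', 100);
INSERT INTO Product VALUES ('p2', 'gadget', 'c1', 200);
INSERT INTO Product VALUES ('p3', 'widget', 'c2', 15);
""")

# Same shape as the slide's query, with the threshold 30 replaced by 1.
rows = con.execute("""
    SELECT y.name, avg(x.price)
    FROM Product x, Company y
    WHERE x.maker = y.cid
    GROUP BY y.cid, y.name
    HAVING count(*) > 1
""").fetchall()

print(rows)
```

In the XQuery version the group is the collection bound by LET, and the HAVING condition becomes an ordinary WHERE on `count($p)`.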

60 FOR vs. LET FOR: binds node variables → iteration. LET: binds collection variables → one value

61 FOR vs. LET
  FOR $x IN /bib/book RETURN { $x }
  Returns: one result per book ...
  LET $x := /bib/book RETURN { $x }
  Returns: a single result containing all the books ...
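The difference between the two bindings shows up in how many results are produced. A toy illustration in Python, where each book is just a string and the `<result>` wrapper element is an assumed name for the element built in the RETURN clause:

```python
# Stand-ins for the nodes selected by /bib/book (invented content).
books = ["<book>A</book>", "<book>B</book>"]

# FOR $x IN /bib/book RETURN <result>{ $x }</result>
# $x is bound to each book in turn, so there is one result per book.
for_results = [f"<result>{x}</result>" for x in books]

# LET $x := /bib/book RETURN <result>{ $x }</result>
# $x is bound once, to the whole collection, so there is a single result.
let_result = "<result>" + "".join(books) + "</result>"

print(for_results)
print(let_result)
```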

62 XML from/to Relational Data XML publishing: –relational data → XML XML storage: –XML → relational data

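63 XML Publishing The Web application sends XPath/XQuery to the XML publishing layer, which sends SQL to the relational database; tuple streams flow back from the database and reach the application as XML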

64 XML Publishing Relational schema: Student(sid, name, address) Course(cid, title, room) Enroll(sid, cid, grade) Enroll links Student and Course

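65 XML Publishing
  <xmlview>
    <course>
      <title>Operating Systems</title> <room>MGH084</room>
      <student> <name>John</name> <address>Seattle</address> <grade>3.8</grade> </student>
      <student> … </student>
    </course>
    <course>
      <title>Database</title> <room>EE045</room>
      <student> <name>Mary</name> <address>Shoreline</address> <grade>3.9</grade> </student>
      <student> … </student>
    </course>
    …
  </xmlview>
  Group by courses: redundant representation of students
  Other representations possible too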

66 XML Publishing First thing to do: design the DTD:

67 Now we write an XQuery to export relational data → XML
  <xmlview>
  { FOR $x IN /db/Course/row
    RETURN
      <course>
        <title> { $x/title/text() } </title>
        <room> { $x/room/text() } </room>
        { FOR $y IN /db/Enroll/row[cid/text() = $x/cid/text()],
              $z IN /db/Student/row[sid/text() = $y/sid/text()]
          RETURN
            <student>
              <name> { $z/name/text() } </name>
              <address> { $z/address/text() } </address>
              <grade> { $y/grade/text() } </grade>
            </student> }
      </course> }
  </xmlview>
  Note: the result conforms to the right DTD
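The publishing query is a pair of nested loops: one over courses, and for each course one over its enrollments joined with students. Here is a sketch of the same nesting in Python, building the view with `xml.etree.ElementTree`. The in-memory tables stand in for the relational database, and the third enrollment (Mary in Operating Systems with grade 4.0) is invented so that the follow-up lookup has something to find.

```python
import xml.etree.ElementTree as ET

# Stand-ins for Student(sid, name, address), Course(cid, title, room),
# Enroll(sid, cid, grade). The last Enroll row is invented.
students = {"s1": ("John", "Seattle"), "s2": ("Mary", "Shoreline")}
courses = {"c1": ("Operating Systems", "MGH084"), "c2": ("Database", "EE045")}
enroll = [("s1", "c1", "3.8"), ("s2", "c2", "3.9"), ("s2", "c1", "4.0")]


def publish():
    """Group enrollments by course, as in the XQuery on the slide."""
    view = ET.Element("xmlview")
    for cid, (title, room) in courses.items():          # FOR $x IN Course
        course = ET.SubElement(view, "course")
        ET.SubElement(course, "title").text = title
        ET.SubElement(course, "room").text = room
        for sid, enrolled_cid, grade in enroll:         # FOR $y IN Enroll[cid = $x/cid]
            if enrolled_cid != cid:
                continue
            name, address = students[sid]               # $z IN Student[sid = $y/sid]
            student = ET.SubElement(course, "student")
            ET.SubElement(student, "name").text = name
            ET.SubElement(student, "address").text = address
            ET.SubElement(student, "grade").text = grade
    return view


view = publish()

# Mary's grade in Operating Systems, asked against the published view.
mary_os = [s.findtext("grade") for s in
           view.findall("course[title='Operating Systems']/student[name='Mary']")]
print(mary_os)
```

Mary's name and address are copied into every course she takes, which is the redundancy that grouping by course introduces.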

68 XML Publishing Query: find Mary's grade in Operating Systems
  XQuery:
    FOR $x IN /xmlview/course[title/text()="Operating Systems"],
        $y IN $x/student[name/text()="Mary"]
    RETURN $y/grade/text()
  SQL:
    SELECT Enroll.grade
    FROM Student, Enroll, Course
    WHERE Student.name='Mary' and Course.title='Operating Systems'
      and Student.sid = Enroll.sid and Enroll.cid = Course.cid
  The translation from the XQuery to the SQL can be done automatically

69 XML Publishing How do we choose the output structure ? Determined by agreement with partners/users Or dictated by committees –XML dialects (called applications) = DTDs XML Data is often nested, irregular, etc No normal forms for XML

70 XML Storage Most often the XML data is small –E.g. a SOAP message –Parsed directly into the application (DOM API) Sometimes XML data is large –need to store/process it in a database The XML storage problem: –How do we choose the schema of the database ?

71 XML Storage Three solutions: Schema derived from DTD Storing XML as a graph: “Edge relation” Store it as a BLOB –Simple, boring, inefficient –Won’t discuss in class

72 Designing a Schema from DTD Design a relational schema for: <!DOCTYPE company [ ]>

73 Designing a Schema from DTD First, construct the DTD graph:
  company
    person *
      ssn, name, office, phone *
    product *
      pid, name, price, avail., descr.
  We ignore the order

74 Designing a Schema from DTD Next, design the relational schema from the DTD graph, using common sense.
  Person(ssn, name, office)
  Phone(ssn, phone)
  Product(pid, name, price, avail., descr.)
  Which attributes may be NULL ? (Look at the DTD)

75 Designing a Schema from DTD What happens to queries:
  XQuery:
    FOR $x IN /company/product[description]
    RETURN { $x/name, $x/description }
  SQL:
    SELECT Product.name, Product.description
    FROM Product
    WHERE Product.description IS NOT NULL

76 Storing XML as a Graph Sometimes we don’t have a DTD: How can we store the XML data ? Every XML instance is a tree Store the edges in an Edge table Store the #PCDATA in a Value table

77 Storing XML as a Graph
  db
    book
      title: "Complete Guide to DB2"
      author: "Chamberlin"
    book
      title: "Transaction Processing"
      author: "Bernstein"
      author: "Newcomer"
    publisher
      title: "Morgan Kaufman"
      state: "CA"
  Edge:
    Source  Tag     Dest
    0       db      1
    1       book    2
    2       title   3
    2       author  4
    1       book    5
    5       title   6
    5       author  7
    ...
  Value:
    Source  Val
    3       Complete guide ...
    4       Chamberlin
    6       ...
  Can be ANY XML data (don't know DTD)
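Filling the Edge and Value tables is a single preorder traversal that hands out node ids as it goes. A minimal sketch in Python; `shred` is a hypothetical helper, ids are assigned in document order starting at 1 with 0 standing for the document root, and the input document is a reconstruction of the two books on the slide (the publisher subtree is left out).

```python
import itertools
import xml.etree.ElementTree as ET

# Reconstruction of the two book subtrees from the slide.
DOC = """<db>
  <book><title>Complete Guide to DB2</title><author>Chamberlin</author></book>
  <book><title>Transaction Processing</title>
        <author>Bernstein</author><author>Newcomer</author></book>
</db>"""


def shred(root):
    """Return (edge, value): edge rows are (source, tag, dest),
    value rows are (source, text)."""
    edge, value = [], []
    ids = itertools.count(1)

    def visit(elem, parent_id):
        node_id = next(ids)
        edge.append((parent_id, elem.tag, node_id))
        if elem.text and elem.text.strip():
            value.append((node_id, elem.text.strip()))
        for child in elem:
            visit(child, node_id)

    visit(root, 0)
    return edge, value


edge, value = shred(ET.fromstring(DOC))
for row in edge:
    print(row)
for row in value:
    print(row)
```

No schema information is used anywhere in the traversal, which is why this layout can hold any XML document.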

78 Storing XML as a Graph What happens to queries:
  FOR $x IN /db/book[author/text()="Chamberlin"]
  RETURN $x/title
  Query pattern: db → book → author → "Chamberlin", and book → title → return value
  Edge aliases: xdb, xbook, xauthor, xtitle; Value aliases: vauthor, vtitle

79 Storing XML as a Graph What happens to queries:
  SELECT vtitle.value
  FROM Edge xdb, Edge xbook, Edge xauthor, Edge xtitle, Value vauthor, Value vtitle
  WHERE xdb.source = 0 and xdb.tag = 'db'
    and xdb.dest = xbook.source and xbook.tag = 'book'
    and xbook.dest = xauthor.source and xauthor.tag = 'author'
    and xbook.dest = xtitle.source and xtitle.tag = 'title'
    and xauthor.dest = vauthor.source and vauthor.value = 'Chamberlin'
    and xtitle.dest = vtitle.source
  A 6-way join !!!
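The six-way join can be run against a small Edge/Value instance to see that it returns the expected title. A sketch using Python's standard `sqlite3`; the rows follow the Edge and Value tables shown earlier, with the elided Value entries filled in from the tree (an assumption of this example).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Edge(source INTEGER, tag TEXT, dest INTEGER);
CREATE TABLE Value(source INTEGER, value TEXT);
INSERT INTO Edge VALUES (0, 'db', 1);
INSERT INTO Edge VALUES (1, 'book', 2);
INSERT INTO Edge VALUES (2, 'title', 3);
INSERT INTO Edge VALUES (2, 'author', 4);
INSERT INTO Edge VALUES (1, 'book', 5);
INSERT INTO Edge VALUES (5, 'title', 6);
INSERT INTO Edge VALUES (5, 'author', 7);
INSERT INTO Value VALUES (3, 'Complete Guide to DB2');
INSERT INTO Value VALUES (4, 'Chamberlin');
INSERT INTO Value VALUES (6, 'Transaction Processing');
INSERT INTO Value VALUES (7, 'Bernstein');
""")

# One Edge alias per step of the path, one Value alias per text comparison/output.
QUERY = """
SELECT vtitle.value
FROM Edge xdb, Edge xbook, Edge xauthor, Edge xtitle, Value vauthor, Value vtitle
WHERE xdb.source = 0 AND xdb.tag = 'db'
  AND xdb.dest = xbook.source AND xbook.tag = 'book'
  AND xbook.dest = xauthor.source AND xauthor.tag = 'author'
  AND xbook.dest = xtitle.source AND xtitle.tag = 'title'
  AND xauthor.dest = vauthor.source AND vauthor.value = 'Chamberlin'
  AND xtitle.dest = vtitle.source
"""

titles = [row[0] for row in con.execute(QUERY)]
print(titles)
```

Each additional step in the path expression adds another self-join on Edge, which is the cost of storing the document without a schema.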

80 XML in SQL Server 2005 Create tables with attributes of type XML Use XQuery in SQL queries Rest of the slides are from: Shankar Pal et al., Indexing XML data stored in a relational database, VLDB'2004

81 CREATE TABLE DOCS ( ID int primary key, XDOC xml)
  SELECT ID, XDOC.query(' for $s in … " "]//SECTION return {data($s/TITLE)} ')
  FROM DOCS

82 XML Methods in SQL query() = returns XML data type value() = extracts scalar values exist() = checks conditions on XML nodes nodes() = returns a rowset of XML nodes that the XQuery expression evaluates to

83 Examples From here: p?url=/library/en-us/dnsql90/html/sql2k5xml.asp

84 XML Type CREATE TABLE docs ( pk INT PRIMARY KEY, xCol XML not null )

85 Inserting an XML Value INSERT INTO docs VALUES (2, ' XML Schema Benefits Features ')

86 Query( ) SELECT pk, … = 123]//section') FROM docs

87 Exist( ) SELECT … = 123]//section') FROM docs WHERE xCol.exist … = 123]') = 1

88 Value( ) SELECT xCol.value( … = 3]/title)[1])', 'nvarchar(max)') FROM docs

89 Nodes( ) SELECT nref.value('first-name[1]', 'nvarchar(50)') FirstName, nref.value('last-name[1]', 'nvarchar(50)') LastName … AS R(nref) WHERE nref.exist('.[first-name != "David"]') = 1

90 Nodes( ) SELECT … 'varchar(max)') LastName FROM docs CROSS APPLY xCol.nodes('//book') AS R(nref)

91 Internal Storage XML is "shredded" as a table A few important ideas: –Dewey decimal numbering of nodes; store in clustered B-tree index –Use only odd numbers to allow insertions –Reverse PATH-ID encoding, for efficient processing of postfix expressions like //a/b/c –Add more indexes, e.g. on data values
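The numbering idea can be sketched in a few lines: each node's label is its parent's label plus one more component, and siblings receive the odd numbers 1, 3, 5, ... so that even numbers stay free for later insertions. This is an illustration of the idea only, not the actual encoding used inside SQL Server; `dewey_labels` is a hypothetical helper and the small BOOK document is invented.

```python
import xml.etree.ElementTree as ET

# Invented document with the BOOK/SECTION/TITLE shape used on later slides.
DOC = "<BOOK><TITLE/><SECTION><TITLE/></SECTION><SECTION/></BOOK>"


def dewey_labels(elem, prefix=(1,), out=None):
    """Map Dewey-style labels (tuples) to tags, using odd sibling ordinals."""
    if out is None:
        out = {}
    out[prefix] = elem.tag
    for i, child in enumerate(elem):
        dewey_labels(child, prefix + (2 * i + 1,), out)
    return out


labels = dewey_labels(ET.fromstring(DOC))

# Sorting the labels lexicographically gives back document order,
# and a node's parent label is its own label minus the last component.
in_order = [labels[k] for k in sorted(labels)]
parent_of_nested_title = (1, 3, 1)[:-1]

for key in sorted(labels):
    print(".".join(map(str, key)), labels[key])
```

Because sorted labels follow document order, a clustered index on the label keeps each subtree physically contiguous, and parent/child tests reduce to prefix comparisons.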

92 Bad Bugs Nobody loves bad bugs. Tree Frogs All right-thinking people love tree frogs.

93

94 Infoset Table

95 … = " "]/SECTION
  SELECT SerializeXML (N2.ID, N2.ORDPATH)
  FROM infosettab N1 JOIN infosettab N2 ON (N1.ID = N2.ID)
  WHERE N1.PATH_ID = … AND N1.VALUE = ' '
    AND N2.PATH_ID = PATH_ID(BOOK/SECTION)
    AND Parent (N1.ORDPATH) = Parent (N2.ORDPATH)