1 Lecture 5: Relational Algebra and XML Monday, April 26th, 2004.

2 Course Agenda
Today: XML and relational algebra.
Next two weeks: the internals of a DBMS.
– Covered in gory detail in the book, but stay tuned for reading assignments.
May 20th (not 17th!): Phil Bernstein on meta-data management.
May 24th: data integration.
May 27th: final exam.

3 Agenda
Relational algebra
XML:
– What is it and why do we care?
– Data model
– Query language: XPath
– Real query language: XQuery
– General ruminations about XML

4 Relational Algebra
A formalism for creating new relations from existing ones.
Its place in the big picture:
– Declarative query language: SQL, relational calculus
– Algebra: relational algebra
– Implementation: relational bag algebra

5 Relational Algebra
Five operators:
– Union: ∪
– Difference: −
– Selection: σ
– Projection: π
– Cartesian product: ×
Derived or auxiliary operators:
– Intersection, complement
– Joins (natural join, equi-join, theta join, semijoin)
– Renaming: ρ
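A minimal Python sketch of these operators, assuming a relation is a schema (a tuple of attribute names) plus a frozenset of value tuples, so that set semantics, including the duplicate removal in projection, come for free. The `Relation` type, the function names and the salary figures are hypothetical, chosen only for illustration.

```python
from typing import Callable, FrozenSet, NamedTuple, Tuple

class Relation(NamedTuple):
    attrs: Tuple[str, ...]      # schema, e.g. ("Name", "SSN")
    rows: FrozenSet[tuple]      # instance; a set, so no duplicate tuples

def union(r: Relation, s: Relation) -> Relation:
    assert r.attrs == s.attrs, "union needs identical schemas"
    return Relation(r.attrs, r.rows | s.rows)

def difference(r: Relation, s: Relation) -> Relation:
    assert r.attrs == s.attrs, "difference needs identical schemas"
    return Relation(r.attrs, r.rows - s.rows)

def select(r: Relation, cond: Callable[[dict], bool]) -> Relation:
    # σ_c(R): keep the tuples whose {attribute: value} view satisfies c
    return Relation(r.attrs, frozenset(t for t in r.rows if cond(dict(zip(r.attrs, t)))))

def project(r: Relation, *names: str) -> Relation:
    # π_{A1,…,An}(R): keep the named columns; building a frozenset drops duplicates
    idx = [r.attrs.index(n) for n in names]
    return Relation(tuple(names), frozenset(tuple(t[i] for i in idx) for t in r.rows))

def product(r: Relation, s: Relation) -> Relation:
    # R × S: concatenate every tuple of R with every tuple of S
    assert not set(r.attrs) & set(s.attrs), "rename clashing attributes first"
    return Relation(r.attrs + s.attrs, frozenset(a + b for a in r.rows for b in s.rows))

def rename(r: Relation, *names: str) -> Relation:
    # ρ_{B1,…,Bn}(R): new schema, same instance
    assert len(names) == len(r.attrs)
    return Relation(tuple(names), r.rows)

# Hypothetical data; the salaries are made up.
employee = Relation(("Name", "Salary"),
                    frozenset({("John", 50000), ("Tony", 30000), ("Sue", 45000)}))
rich = select(employee, lambda t: t["Salary"] > 40000)
print(sorted(project(rich, "Name").rows))  # [('John',), ('Sue',)]
```

Intersection needs no new code: `difference(r1, difference(r1, r2))` computes R1 ∩ R2, exactly as the derived-operator identity states.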

6 1. Union and 2. Difference
R1 ∪ R2
Example:
– ActiveEmployees ∪ RetiredEmployees
R1 − R2
Example:
– AllEmployees − RetiredEmployees

7 What about Intersection?
It is a derived operator: R1 ∩ R2 = R1 − (R1 − R2).
It can also be expressed as a join (we will see this later).
Example:
– UnionizedEmployees ∩ RetiredEmployees

8 3. Selection
Returns all tuples that satisfy a condition.
Notation: σ_c(R)
Examples:
– σ_{Salary > 40000}(Employee)
– σ_{name = “Smith”}(Employee)
The condition c can use =, <, ≤, >, ≥, <>

9 Find all employees with salary more than $40,000: σ_{Salary > 40000}(Employee)

10 4. Projection
Eliminates columns, then removes duplicates.
Notation: π_{A1,…,An}(R)
Example: project social-security numbers and names:
– π_{SSN, Name}(Employee)
– Output schema: Answer(SSN, Name)

11 π_{SSN, Name}(Employee)

12 5. Cartesian Product
Each tuple in R1 is paired with each tuple in R2.
Notation: R1 × R2
Example:
– Employee × Dependents
Very rare in practice; mainly used to express joins.

14 Renaming
Changes the schema, not the instance.
Notation: ρ_{B1,…,Bn}(R)
Example:
– ρ_{LastName, SocSocNo}(Employee)
– Output schema: Answer(LastName, SocSocNo)

15 Renaming Example
Employee:
Name | SSN
John | …
Tony | …
ρ_{LastName, SocSocNo}(Employee):
LastName | SocSocNo
John | …
Tony | …

16 Natural Join
Notation: R1 ⋈ R2
Meaning: R1 ⋈ R2 = π_A(σ_C(R1 × R2))
Where:
– The selection σ_C checks equality of all common attributes
– The projection π_A eliminates the duplicate common attributes

17 Natural Join Example
Employee:
Name | SSN
John | …
Tony | …
Dependents:
SSN | Dname
… | Emily
… | Joe
Employee ⋈ Dependents:
Name | SSN | Dname
John | … | Emily
Tony | … | Joe
Employee ⋈ Dependents = π_{Name, SSN, Dname}(σ_{SSN=SSN2}(Employee × ρ_{SSN2, Dname}(Dependents)))

18 Natural Join
R:
A | B
X | Y
X | Z
Y | Z
Z | V
S:
B | C
Z | U
V | W
Z | V
R ⋈ S:
A | B | C
X | Z | U
X | Z | V
Y | Z | U
Y | Z | V
Z | V | W
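The definition R1 ⋈ R2 = π_A(σ_C(R1 × R2)) can be checked against the R and S tables with a short Python sketch. The `(attributes, rows)` pair representation and the `natural_join` name are assumptions for illustration. The comprehension walks the Cartesian product, keeps the pairs that agree on every common attribute (the selection), and drops S's copy of those attributes (the projection).

```python
from typing import FrozenSet, Tuple

Relation = Tuple[Tuple[str, ...], FrozenSet[tuple]]  # (attribute names, rows)

def natural_join(r: Relation, s: Relation) -> Relation:
    r_attrs, r_rows = r
    s_attrs, s_rows = s
    common = [a for a in r_attrs if a in s_attrs]
    extra = [a for a in s_attrs if a not in r_attrs]  # S's non-shared columns
    ri = {a: i for i, a in enumerate(r_attrs)}
    si = {a: i for i, a in enumerate(s_attrs)}
    rows = frozenset(
        t + tuple(u[si[a]] for a in extra)            # projection: drop S's copy of common attrs
        for t in r_rows                               # Cartesian product R × S ...
        for u in s_rows
        if all(t[ri[a]] == u[si[a]] for a in common)  # ... filtered by equality on common attrs
    )
    return r_attrs + tuple(extra), rows

R = (("A", "B"), frozenset({("X", "Y"), ("X", "Z"), ("Y", "Z"), ("Z", "V")}))
S = (("B", "C"), frozenset({("Z", "U"), ("V", "W"), ("Z", "V")}))
attrs, rows = natural_join(R, S)
print(attrs, sorted(rows))
# ('A', 'B', 'C') [('X', 'Z', 'U'), ('X', 'Z', 'V'), ('Y', 'Z', 'U'), ('Y', 'Z', 'V'), ('Z', 'V', 'W')]
```

With no common attributes, `all(...)` over an empty list is `True`, so the result is the Cartesian product; with identical schemas, the result is the intersection.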

19 Natural Join
Given the schemas R(A, B, C, D), S(A, C, E), what is the schema of R ⋈ S?
Given R(A, B, C), S(D, E), what is R ⋈ S?
Given R(A, B), S(A, B), what is R ⋈ S?

20 Theta Join
A join that involves a predicate θ:
R1 ⋈_θ R2 = σ_θ(R1 × R2)
Here θ can be any condition.

21 Equi-join
A theta join where θ is an equality:
R1 ⋈_{A=B} R2 = σ_{A=B}(R1 × R2)
Example:
– Employee ⋈_{SSN=SSN} Dependents
Most useful join in practice.

22 Semijoin
R ⋉ S = π_{A1,…,An}(R ⋈ S)
where A1, …, An are the attributes of R.
Example:
– Employee ⋉ Dependents

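23 Semijoins in Distributed Databases
Semijoins are used in distributed databases.
Employee(SSN, Name, ...) and Dependents(SSN, Dname, Age, ...) are stored at different sites, connected by a network.
Query: Employee ⋈_{ssn=ssn} (σ_{age>71}(Dependents))
Plan:
T = π_SSN(σ_{age>71}(Dependents))
R = Employee ⋉ T
Answer = R ⋈ σ_{age>71}(Dependents)
Only the SSNs in T are shipped to Employee's site, and only the matching Employee tuples R are shipped back.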
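A sketch of the semijoin plan in Python, using sets of tuples with made-up Employee and Dependents data (all names and values are hypothetical). The direct join and the three-step plan give the same answer, while the plan only moves SSNs and matching employees between the two sites. The last join must use the dependents older than 71: joining R with all of Dependents would also return younger dependents of the same employees, as `naive` shows.

```python
# Hypothetical data. Site 1 stores Employee(SSN, Name);
# site 2 stores Dependents(SSN, Dname, Age).
employee = {(1, "John"), (2, "Tony"), (3, "Mary")}
dependents = {(1, "Emily", 75), (1, "Ann", 40), (2, "Joe", 12)}

old = {d for d in dependents if d[2] > 71}          # σ_{age>71}(Dependents), at site 2

# Direct evaluation: Employee ⋈_{ssn=ssn} σ_{age>71}(Dependents)
direct = {e + d[1:] for e in employee for d in old if e[0] == d[0]}

# Semijoin plan
T = {d[0] for d in old}                   # π_SSN(...): only SSNs are shipped to site 1
R = {e for e in employee if e[0] in T}    # Employee ⋉ T: only matches are shipped back
answer = {e + d[1:] for e in R for d in old if e[0] == d[0]}

# Joining R with *all* dependents would also return John's 40-year-old dependent.
naive = {e + d[1:] for e in R for d in dependents if e[0] == d[0]}

print(sorted(answer))  # [(1, 'John', 'Emily', 75)]
```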

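24 Complex RA Expressions
(Figure: an expression tree over Person, Purchase, Person and Product, built from σ_{name=fred}, σ_{name=gizmo}, π_pid, π_ssn, the joins ⋈_{seller-ssn=ssn}, ⋈_{pid=pid} and ⋈_{buyer-ssn=ssn}, and a final π_name.)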

25 Operations on Bags
A bag = a set with repeated elements.
All operations need to be defined carefully on bags:
– {a,b,b,c} ∪ {a,b,b,b,e,f,f} = {a,a,b,b,b,b,b,c,e,f,f}
– {a,b,b,b,c,c} − {b,c,c,c,d} = {a,b,b}
– σ_C(R): preserve the number of occurrences
– π_A(R): no duplicate elimination
– Cartesian product, join: no duplicate elimination
Important! Relational engines work on bags, not sets!
Reading assignment: 5.3 – 5.4
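Python's `collections.Counter` behaves like a bag, so the union and difference examples can be checked directly. This sketch assumes the additive union used on the slide (multiplicities add, as in SQL's `UNION ALL`) and a difference that subtracts multiplicities and floors them at zero, which is what `Counter`'s `-` operator does. The helper names are hypothetical. Note that d cannot survive the difference, because it does not occur in the left-hand bag.

```python
from collections import Counter

def bag(*xs: str) -> Counter:
    return Counter(xs)

def bag_union(r: Counter, s: Counter) -> Counter:
    # An element occurring m times in r and n times in s occurs m + n times.
    return r + s

def bag_difference(r: Counter, s: Counter) -> Counter:
    # Counts subtract; Counter '-' drops any count that falls to zero or below.
    return r - s

u = bag_union(bag(*"abbc"), bag(*"abbbeff"))
d = bag_difference(bag(*"abbbcc"), bag(*"bcccd"))
print(sorted(u.elements()))  # ['a', 'a', 'b', 'b', 'b', 'b', 'b', 'c', 'e', 'f', 'f']
print(sorted(d.elements()))  # ['a', 'b', 'b']
```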

26 Finally: RA Has Limitations!
How do we compute the “transitive closure”?
Find all direct and indirect relatives of Fred:
Name1 | Name2 | Relationship
Fred | Mary | Father
Mary | Joe | Cousin
Mary | Bill | Spouse
Nancy | Lou | Sister
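Each join of the relation with itself follows one more level of relationship, and the number of levels needed depends on the data; that is why no single relational algebra expression computes the closure for every instance. A loop that repeats the join until nothing new appears does the job. This Python sketch uses the Name1 → Name2 pairs from the table; the `relatives` function is hypothetical, follows edges in one direction only, and ignores the Relationship column. SQL later added recursive queries (`WITH RECURSIVE`) for this purpose.

```python
def relatives(edges: set, start: str) -> set:
    """All names reachable from `start` by following Name1 -> Name2 edges."""
    reached = {b for a, b in edges if a == start}   # direct relatives
    while True:
        # One more join step: extend every reached name by one edge.
        step = {b for a, b in edges if a in reached}
        if step <= reached:                         # fixpoint: nothing new found
            return reached
        reached |= step

family = {("Fred", "Mary"), ("Mary", "Joe"), ("Mary", "Bill"), ("Nancy", "Lou")}
print(sorted(relatives(family, "Fred")))  # ['Bill', 'Joe', 'Mary']
```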

27 XML

28 XML
eXtensible Markup Language
XML 1.0: a recommendation from the W3C, 1998
Roots: SGML (a very nasty language)
After the roots: a format for sharing data

29 Why XML is of Interest to Us
XML is just syntax for data
– Note: we have no syntax for relational data
– But XML is not relational: semistructured
This is exciting because:
– Can translate any data to XML
– Can ship XML over the Web (HTTP)
– Can input XML into any application
– Thus: data sharing and exchange on the Web

30 XML Data Sharing and Exchange
(Figure: applications exchange XML data over the Web (HTTP); relational, legacy and object-relational data are turned into XML, and specific data management tasks transform, integrate and warehouse it.)

31 From HTML to XML
HTML describes the presentation.

32 HTML
Bibliography
Foundations of Databases, Abiteboul, Hull, Vianu, Addison Wesley, 1995
Data on the Web, Abiteboul, Buneman, Suciu, Morgan Kaufmann, 1999

33 XML
Foundations… Abiteboul Hull Vianu Addison Wesley 1995 …
XML describes the content.

34 Web Services
A new paradigm for creating distributed applications?
Systems communicate via messages and contracts.
Example: an order processing system.
MS .NET, J2EE: some of the platforms.
XML: a part of the story; the data format.

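35 XML Terminology
tags: book, title, author, …
start tag: <book>, end tag: </book>
elements: <book>…</book>, <author>…</author>
elements are nested
empty element: an element with no content, abbreviated as a single tag ending in />
an XML document: single root element
well-formed XML document: if it has matching tags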

36 More XML: Attributes
Foundations of Databases Abiteboul … 1995
Attributes are an alternative way to represent data.
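A small Python sketch, using the standard `xml.etree.ElementTree` module, of the point that the same data can be stored either as a subelement or as an attribute, so a reader must be prepared for both. The two documents and the `year_of` helper are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# The same fact encoded two ways: year as a subelement, year as an attribute.
as_element = ET.fromstring("<book><title>Foundations of Databases</title><year>1995</year></book>")
as_attribute = ET.fromstring('<book year="1995"><title>Foundations of Databases</title></book>')

def year_of(book: ET.Element) -> Optional[str]:
    """Return the year, whichever way it was encoded."""
    return book.findtext("year") or book.get("year")

print(year_of(as_element), year_of(as_attribute))  # 1995 1995
```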

37 More XML: Oids and References
Jane Mary John
Oids and references in XML are just syntax.
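Because references in XML are just syntax, an application has to resolve them itself. This Python sketch assumes hypothetical `id` and `mother` attributes on person elements (the slide's markup is not recoverable) and resolves them with a dictionary built from `xml.etree.ElementTree`; ElementTree does not interpret ID/IDREF declarations from a DTD.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical markup: each person has an id, and "mother" holds another person's id.
DOC = """<family>
  <person id="o1"><name>Jane</name></person>
  <person id="o2" mother="o1"><name>Mary</name></person>
  <person id="o3" mother="o2"><name>John</name></person>
</family>"""

root = ET.fromstring(DOC)
people = root.findall("person")
by_id = {p.get("id"): p for p in people}    # the application builds the id index itself

def mother_name(name: str) -> Optional[str]:
    person = next(p for p in people if p.findtext("name") == name)
    ref = person.get("mother")              # a reference is just a string...
    return by_id[ref].findtext("name") if ref else None  # ...until we look it up

print(mother_name("John"), mother_name("Mary"), mother_name("Jane"))  # Mary Jane None
```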

38 XML Semantics: a Tree!
Mary Maple 345 Seattle John Thailand
(Figure: the document drawn as a tree rooted at data, with person elements whose name, address (street, no, city) and phone children lead to text nodes such as Mary, Maple, 345, Seattle, John and Thai, plus an id attribute node with value o555. Legend: element node, text node, attribute node.)
Order matters!!!
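A Python sketch of the tree view using `xml.etree.ElementTree`: it lists element, attribute and text nodes in document order, which the parser preserves. The document is a hypothetical reconstruction loosely based on the slide's example. For simplicity the walk ignores text that follows a child element (ElementTree's `tail`), so it does not handle mixed content.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

# Hypothetical document, loosely based on the slide's example.
DOC = """<data>
  <person id="o555">
    <name>Mary</name>
    <address><street>Maple</street><no>345</no><city>Seattle</city></address>
  </person>
  <person>
    <name>John</name>
    <address>Thailand</address>
  </person>
</data>"""

def walk(elem: ET.Element, depth: int = 0, out: Optional[List[str]] = None) -> List[str]:
    """List element, attribute and text nodes in document order, indented by depth."""
    out = [] if out is None else out
    pad = "  " * depth
    out.append(f"{pad}element {elem.tag}")
    for key, value in elem.attrib.items():
        out.append(f"{pad}  attribute {key}={value}")
    if elem.text and elem.text.strip():       # skip whitespace-only text
        out.append(f"{pad}  text {elem.text.strip()!r}")
    for child in elem:                        # children are visited in document order
        walk(child, depth + 1, out)
    return out

lines = walk(ET.fromstring(DOC))
print("\n".join(lines))
```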

39 XML Data
XML is self-describing.
Schema elements become part of the data:
– Relational schema: persons(name, phone)
– In XML, <persons>, <name>, <phone> are part of the data, and are repeated many times
Consequence: XML is much more flexible.
XML = semistructured data

40 Relational Data as XML
person:
name | phone
John | 3634
Sue | 6343
Dick | 6363
(Figure: the XML tree has root person with one row element per tuple; each row has a name and a phone child, e.g. “John” and 3634.)
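Exporting a relation this way is mechanical: one element for the relation, one `row` element per tuple, one child element per column. A Python sketch using `xml.etree.ElementTree` with the rows from the table; the `relation_to_xml` function is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Sequence, Tuple

def relation_to_xml(name: str, attrs: Sequence[str], rows: Sequence[Tuple[str, ...]]) -> ET.Element:
    root = ET.Element(name)                        # one element for the whole relation
    for tup in rows:
        row = ET.SubElement(root, "row")           # one <row> per tuple
        for attr, value in zip(attrs, tup):
            ET.SubElement(row, attr).text = value  # one child element per column
    return root

rows = [("John", "3634"), ("Sue", "6343"), ("Dick", "6363")]
xml = ET.tostring(relation_to_xml("person", ("name", "phone"), rows), encoding="unicode")
print(xml)
```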

41 XML is Semi-structured Data
Missing attributes:
John 1234
Joe ← no phone!
Could represent in a table with nulls:
name | phone
John | 1234
Joe | -

42 XML is Semi-structured Data
Repeated attributes:
Mary ← two phones!
Impossible in tables:
name | phone
Mary | ???

43 XML is Semi-structured Data
Attributes with different types in different objects:
John Smith 1234 ← structured name!
Nested collections (no 1NF)
Heterogeneous collections: one parent element may contain children of different kinds

44 Document Type Definitions
DTD: part of the original XML specification
An XML document may have a DTD.
XML document:
– well-formed = if tags are correctly closed
– valid = if it has a DTD and conforms to it
Validation is useful in data exchange.
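Well-formedness can be checked by simply parsing. This Python sketch uses `xml.etree.ElementTree`, which raises `ParseError` on crossed or unclosed tags and on more than one root element. It does not check validity against a DTD, which needs a validating parser (lxml's `etree.DTD` class is one option). The `is_well_formed` name is hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """True if the text parses as XML: tags correctly nested and closed, one root."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<person><name>John</name></person>"))  # True
print(is_well_formed("<person><name>John</person></name>"))  # False: tags cross
print(is_well_formed("<a/><b/>"))                            # False: two root elements
```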

45 Very Simple DTD
<!DOCTYPE company [ … ]>

46 Very Simple DTD
Example of a valid XML document:
John B Jim B

47 DTD: The Content Model
Content model:
– Complex = a regular expression over other elements
– Text-only = #PCDATA
– Empty = EMPTY
– Any = ANY
– Mixed content = (#PCDATA | A | B | C)*

48 DTD: Regular Expressions
sequence: <!ELEMENT name (firstName, lastName)>
optional: <!ELEMENT name (firstName?, lastName)>
Kleene star: <!ELEMENT person (name, phone*)>
alternation: <!ELEMENT person (name, (phone|…))>
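Since a content model is a regular expression over child element names, one way to check it is to translate it into a Python regex and match it against the sequence of child tags. This is a simplified sketch, not a DTD validator: the `CONTENT_MODELS` table is written by hand for two of the declarations, and text, attributes and nested elements are not checked.

```python
import re
import xml.etree.ElementTree as ET

# Hand-written translation of two content models into regexes over a
# space-terminated sequence of child element names.
CONTENT_MODELS = {
    "name": r"(firstName )?(lastName )",  # (firstName?, lastName)
    "person": r"(name )(phone )*",        # (name, phone*)
}

def conforms(elem: ET.Element) -> bool:
    """Check the children of elem against its content model (one level only)."""
    children = "".join(child.tag + " " for child in elem)
    return re.fullmatch(CONTENT_MODELS[elem.tag], children) is not None

print(conforms(ET.fromstring("<person><name/><phone/><phone/></person>")))  # True
print(conforms(ET.fromstring("<person><phone/><name/></person>")))          # False: wrong order
print(conforms(ET.fromstring("<name><lastName/></name>")))                  # True: firstName omitted
```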

49 Querying XML Data
XPath = simple navigation through the tree
XQuery = the SQL of XML
XSLT = recursive traversal
– will not discuss in class

50 Sample Data for Queries
Book 1: publisher Addison-Wesley; authors Serge Abiteboul, Rick Hull, Victor Vianu; title Foundations of Databases; year 1995
Book 2: publisher Freeman; author Jeffrey D. Ullman; title Principles of Database and Knowledge Base Systems; year 1998

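51 Data Model for XPath
(Figure: the document as a tree. The root sits above the root element bib; bib has book children, whose publisher, author, … children hold text such as Addison-Wesley and Serge Abiteboul.)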

52 XPath: Simple Expressions
/bib/book/year
Result: the two year elements, 1995 and 1998
/bib/paper/year
Result: empty (there were no papers)

53 XPath: Restricted Kleene Closure
//author
Result: Serge Abiteboul, Rick Hull, Victor Vianu, Jeffrey D. Ullman
/bib//first-name
Result: Rick

54 XPath: Text Nodes
/bib/book/author/text()
Result: Serge Abiteboul, Victor Vianu, Jeffrey D. Ullman
Rick Hull doesn’t appear because he has first-name and last-name subelements.
Functions in XPath:
– text() = matches the text value
– node() = matches any node (= * or text())
– name() = returns the name of the current tag

55 XPath: Wildcard
//author/*
Result: Rick Hull
* matches any element

56 XPath: Attribute Nodes
/bib/book/@price
Result: …
@price means that price has to be an attribute

57 XPath: Predicates
/bib/book/author[firstname]
Result: Rick Hull

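58 XPath: More Predicates
/bib/book/author[firstname][address[.//zip][city]]/lastname
Result: … …
(.//zip finds a zip at any depth below address; a bare //zip inside the predicate would search from the document root.)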

59 XPath: More Predicates
/bib/book[@price < "60"]
/bib/book[… < "25"]
/bib/book[author/text()]

60 XPath: Summary
bib matches a bib element
* matches any element
/ matches the root
/bib matches a bib element under root
bib/paper matches a paper in bib
bib//paper matches a paper in bib, at any depth
//paper matches a paper at any depth
paper|book matches a paper or a book
@price matches a price attribute
bib/book/@price matches a price attribute in book, in bib
… matches …
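Python's `xml.etree.ElementTree` implements a small subset of XPath, enough to try several of these expressions. The document below is a hypothetical reconstruction of the sample data: element names follow the queries on the slides, and the `price` attribute value is invented. ElementTree paths are relative to the element they are called on, and there is no `text()`, no `|` and no `<` comparison, so `text()` is emulated by keeping the authors without child elements. lxml's `.xpath()` method supports full XPath 1.0.

```python
import xml.etree.ElementTree as ET

# Hypothetical reconstruction of the sample data; price="55" is invented.
BIB = """<bib>
  <book>
    <publisher>Addison-Wesley</publisher>
    <author>Serge Abiteboul</author>
    <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
    <author>Victor Vianu</author>
    <title>Foundations of Databases</title>
    <year>1995</year>
  </book>
  <book price="55">
    <publisher>Freeman</publisher>
    <author>Jeffrey D. Ullman</author>
    <title>Principles of Database and Knowledge Base Systems</title>
    <year>1998</year>
  </book>
</bib>"""

bib = ET.fromstring(BIB)  # the root element; paths below are relative to it

years = [y.text for y in bib.findall("book/year")]            # /bib/book/year
papers = bib.findall("paper/year")                             # /bib/paper/year
authors = bib.findall(".//author")                             # //author
first_names = [f.text for f in bib.findall(".//first-name")]  # /bib//first-name
# /bib/book/author/text(): only authors whose content is plain text
text_authors = [a.text for a in bib.findall("book/author") if len(a) == 0]
structured = bib.findall("book/author[first-name]")           # author[first-name]
priced = [b.get("price") for b in bib.findall("book[@price]")]  # /bib/book/@price

print(years, papers, first_names, text_authors, priced)
```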

61 Comments on XPath? What’s good about it? What can’t it do that you want it to do? How does it compare, say, to SQL?

62 XQuery
Based on Quilt, which is based on XML-QL.
Uses XPath to express more complex queries.

63 FLWR (“Flower”) Expressions
FOR ...
LET ...
WHERE ...
RETURN ...

64 XQuery
Find all book titles published after 1995:
FOR $x IN document("bib.xml")/bib/book
WHERE $x/year > 1995
RETURN { $x/title }
Result: abc def ghi
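The FLWR query maps directly onto a Python list comprehension: FOR becomes the loop, WHERE the filter, RETURN the output expression. A sketch over a hypothetical bib.xml (titles, years and authors are made up); it assumes each book has exactly one year, whereas XQuery's general comparison `>` is true if any year in the sequence satisfies it.

```python
import xml.etree.ElementTree as ET

# Hypothetical bib.xml contents.
BIB = """<bib>
  <book><title>abc</title><author>Jones</author><year>1996</year></book>
  <book><title>def</title><author>Jones</author><author>Smith</author><year>1999</year></book>
  <book><title>ghi</title><author>Smith</author><year>1990</year></book>
</bib>"""
bib = ET.fromstring(BIB)

# FOR $x IN document("bib.xml")/bib/book   -> loop over the book elements
# WHERE $x/year > 1995                     -> numeric test on the year child
# RETURN { $x/title }                      -> emit each surviving title
titles = [x.findtext("title") for x in bib.findall("book") if int(x.findtext("year")) > 1995]
print(titles)  # ['abc', 'def']
```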

65 XQuery
Find book titles by the coauthors of “Database Theory”:
FOR $x IN bib/book[title/text() = "Database Theory"]/author,
    $y IN bib/book[author/text() = $x/text()]/title
RETURN { $y/text() }
Result: abc def ghi
The answer will contain duplicates!

66 XQuery
Same as before, but eliminate duplicates:
LET $c := bib/book[title/text() = "Database Theory"]/author/text()
FOR $y IN distinct(bib/book[author/text() = $c]/title)
RETURN { $y/text() }
Result: abc def ghi
distinct = a function that eliminates duplicates. It is applied to the whole collection of titles: applying it inside a per-coauthor FOR binding would still return a title once for each coauthor who wrote it.
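A Python sketch of why the coauthor query returns duplicates: it emits one title per (coauthor, title) pair, so a book written by two coauthors shows up twice. Removing the duplicates requires a distinct over the whole result, done here with `dict.fromkeys`, which keeps first-seen order. The data and the `authors` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical bib.xml: Jones and Smith co-wrote "Database Theory".
BIB = """<bib>
  <book><title>Database Theory</title><author>Jones</author><author>Smith</author></book>
  <book><title>abc</title><author>Jones</author><author>Smith</author></book>
  <book><title>def</title><author>Jones</author></book>
  <book><title>ghi</title><author>Smith</author></book>
</bib>"""
books = ET.fromstring(BIB).findall("book")

def authors(book):
    return [a.text for a in book.findall("author")]

# FOR $x IN .../author, $y IN .../title: one output per (coauthor, title) pair
coauthors = [a for b in books if b.findtext("title") == "Database Theory" for a in authors(b)]
with_dups = [b.findtext("title") for x in coauthors for b in books if x in authors(b)]

# distinct over the whole collection
without_dups = list(dict.fromkeys(with_dups))

print(with_dups)     # ['Database Theory', 'abc', 'def', 'Database Theory', 'abc', 'ghi']
print(without_dups)  # ['Database Theory', 'abc', 'def', 'ghi']
```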

67 XQuery: Nesting
For each author of a book by Morgan Kaufmann, list all books she published:
FOR $a IN distinct(document("bib.xml")/bib/book[publisher = "Morgan Kaufmann"]/author)
RETURN { $a,
         FOR $t IN document("bib.xml")/bib/book[author = $a]/title
         RETURN $t }

68 XQuery
Result:
Jones abc def
Smith ghi
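The nested query is a grouping: the outer FOR picks distinct authors, and the inner FOR runs once per author. A Python sketch over hypothetical (publisher, title, authors) records, chosen so that the output matches the result shown; the function name is also hypothetical.

```python
from typing import Dict, List, Tuple

Book = Tuple[str, str, List[str]]  # (publisher, title, authors)

# Hypothetical records.
BOOKS: List[Book] = [
    ("Morgan Kaufmann", "abc", ["Jones"]),
    ("Addison-Wesley", "def", ["Jones"]),
    ("Morgan Kaufmann", "ghi", ["Smith"]),
    ("Freeman", "jkl", ["Brown"]),
]

def titles_by_mk_authors(books: List[Book]) -> Dict[str, List[str]]:
    # Outer FOR: distinct authors of Morgan Kaufmann books, in first-seen order
    mk_authors = dict.fromkeys(
        a for publisher, _, authors in books if publisher == "Morgan Kaufmann" for a in authors
    )
    # Inner FOR, evaluated once per outer binding: every title by that author
    return {a: [title for _, title, authors in books if a in authors] for a in mk_authors}

print(titles_by_mk_authors(BOOKS))  # {'Jones': ['abc', 'def'], 'Smith': ['ghi']}
```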

69 XQuery
FOR $x IN expr -- binds $x to each value in the list expr
LET $x := expr -- binds $x to the entire list expr
– Useful for common subexpressions and for aggregations

70 XQuery
count = an (aggregate) function that returns the number of elements
FOR $p IN distinct(document("bib.xml")//publisher)
LET $b := document("bib.xml")/bib/book[publisher = $p]
WHERE count($b) > 100
RETURN { $p }
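The LET/count pattern is a group-by followed by a filter on the group size, like SQL's HAVING. A Python sketch with `collections.Counter`; the input list and the `big_publishers` name are hypothetical, and the threshold defaults to the slide's 100.

```python
from collections import Counter
from typing import List

def big_publishers(publisher_per_book: List[str], threshold: int = 100) -> List[str]:
    """Publishers with more than `threshold` books, in first-seen order."""
    counts = Counter(publisher_per_book)  # count($b) for every publisher at once
    # FOR $p IN distinct(...) ... WHERE count($b) > threshold
    return [p for p in dict.fromkeys(publisher_per_book) if counts[p] > threshold]

# Hypothetical input: the publisher of each book, one entry per book.
sample = ["Addison-Wesley"] * 150 + ["Freeman"] * 20 + ["Morgan Kaufmann"] * 101
print(big_publishers(sample))  # ['Addison-Wesley', 'Morgan Kaufmann']
```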

71 XQuery
Find books whose price is larger than average:
LET $a := avg(document("bib.xml")/bib/book/price)
FOR $b IN document("bib.xml")/bib/book
WHERE $b/price > $a
RETURN { $b }
Let’s try to write this in SQL…
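One way to answer the slide's question: in SQL, the LET becomes a scalar subquery. This Python sketch runs that SQL against an in-memory SQLite table and also computes the same answer directly; the `book(title, price)` table and its rows are hypothetical.

```python
import sqlite3
from statistics import mean

# Hypothetical book table standing in for bib.xml.
BOOKS = [("abc", 30.0), ("def", 55.0), ("ghi", 70.0)]

# SQL: the LET becomes a scalar subquery that computes the average.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (title TEXT, price REAL)")
conn.executemany("INSERT INTO book VALUES (?, ?)", BOOKS)
rows = conn.execute(
    "SELECT title FROM book WHERE price > (SELECT AVG(price) FROM book) ORDER BY title"
).fetchall()
conn.close()

# Plain Python: LET $a := avg(...), then FOR/WHERE over the books.
a = mean(price for _, price in BOOKS)
above = [title for title, price in BOOKS if price > a]

print(rows)   # [('def',), ('ghi',)]
print(above)  # ['def', 'ghi']
```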

72 XQuery Summary
FOR-LET-WHERE-RETURN = FLWR
FOR/LET clauses → list of tuples → WHERE clause → list of tuples → RETURN clause → instance of the XQuery data model

73 FOR vs. LET
FOR: binds node variables → iteration
LET: binds collection variables → one value

74 FOR vs. LET
FOR $x IN document("bib.xml")/bib/book
RETURN { $x }
Returns: ... (one result per book)
LET $x := document("bib.xml")/bib/book
RETURN { $x }
Returns: ... (a single result containing all the books)
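The same difference in a few lines of Python, with strings standing in for book elements (all names hypothetical): FOR runs the RETURN clause once per item, LET runs it once for the whole collection.

```python
books = ["book1", "book2", "book3"]  # hypothetical stand-ins for the book elements

# FOR $x IN ... RETURN { $x }: the RETURN clause runs once per book
for_results = [[x] for x in books]

# LET $x := ... RETURN { $x }: $x is bound once, to the whole collection
let_results = [list(books)]

print(for_results)  # [['book1'], ['book2'], ['book3']]
print(let_results)  # [['book1', 'book2', 'book3']]
```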

75 Collections in XQuery
Ordered and unordered collections:
– /bib/book/author = an ordered collection
– distinct(/bib/book/author) = an unordered collection
LET $b := /bib/book → $b is a collection
$b/author → a collection (several authors...)
RETURN { $b/author }
Returns: ...

76 Collections in XQuery
What about collections in expressions?
$b/price → list of n prices
$b/price * 0.7 → list of n numbers
$b/price * $b/quantity → list of n × m numbers??
$b/price * ($b/quant1 + $b/quant2) ≠ $b/price * $b/quant1 + $b/price * $b/quant2 !!

77 Sorting in XQuery
FOR $p IN distinct(document("bib.xml")//publisher)
RETURN { $p/text() },
       FOR $b IN document("bib.xml")//book[publisher = $p]
       RETURN { $b/title, $b/price }
       SORTBY(price DESCENDING)
SORTBY(name)
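A Python sketch of the two-level sort: publishers ordered by name, and within each publisher the books ordered by price, descending. The (publisher, title, price) records and the function name are hypothetical.

```python
from typing import Dict, List, Tuple

def publisher_listing(
    books: List[Tuple[str, str, float]]
) -> List[Tuple[str, List[Tuple[str, float]]]]:
    """Group (publisher, title, price) records by publisher.

    Outer SORTBY(name): publishers alphabetically.
    Inner SORTBY(price DESCENDING): each publisher's books, most expensive first.
    """
    groups: Dict[str, List[Tuple[str, float]]] = {}
    for publisher, title, price in books:
        groups.setdefault(publisher, []).append((title, price))
    return [(p, sorted(groups[p], key=lambda tp: tp[1], reverse=True)) for p in sorted(groups)]

sample = [("Morgan Kaufmann", "abc", 40.0), ("Addison-Wesley", "def", 65.0),
          ("Morgan Kaufmann", "ghi", 55.0)]
print(publisher_listing(sample))
# [('Addison-Wesley', [('def', 65.0)]), ('Morgan Kaufmann', [('ghi', 55.0), ('abc', 40.0)])]
```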

78 If-Then-Else
FOR $h IN //holding
RETURN { $h/title,
         IF $h/@type = "Journal" THEN $h/editor ELSE $h/author }
SORTBY (title)

79 Existential Quantifiers
FOR $b IN //book
WHERE SOME $p IN $b//para SATISFIES
      contains($p, "sailing") AND contains($p, "windsurfing")
RETURN { $b/title }

80 Universal Quantifiers
FOR $b IN //book
WHERE EVERY $p IN $b//para SATISFIES contains($p, "sailing")
RETURN { $b/title }
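SOME and EVERY correspond to Python's `any` and `all`. In this sketch the books and their paragraphs are hypothetical, and `contains` is modelled as a substring test. As in XQuery, EVERY over an empty collection is true, so a book with no paragraphs qualifies.

```python
from typing import Dict, List

# Hypothetical books: title -> list of paragraph strings.
BOOKS: Dict[str, List[str]] = {
    "abc": ["We went sailing and windsurfing.", "Then we ate."],
    "def": ["sailing basics", "advanced sailing"],
    "ghi": [],
}

def some_para(books: Dict[str, List[str]], *words: str) -> List[str]:
    # SOME $p IN $b//para SATISFIES ...: at least one paragraph contains every word
    return [t for t, paras in books.items() if any(all(w in p for w in words) for p in paras)]

def every_para(books: Dict[str, List[str]], word: str) -> List[str]:
    # EVERY $p IN $b//para SATISFIES ...: all paragraphs contain the word
    return [t for t, paras in books.items() if all(word in p for p in paras)]

print(some_para(BOOKS, "sailing", "windsurfing"))  # ['abc']
print(every_para(BOOKS, "sailing"))               # ['def', 'ghi']
```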

81 Other Stuff in XQuery
BEFORE and AFTER
– for dealing with order in the input
FILTER
– deletes some edges in the result tree
Recursive functions
– Currently: arbitrary recursion
– Perhaps more restrictions in the future?

82 Final Comments on XML
How are we going to process XML efficiently?
– Special purpose XML engines, or
– Add functionality to relational engines?
Need to manage XML streams.
Here, data management is much closer to other programming tasks.