Lecture 14: XML Publishing & Storage Midterm Review

Slides:

Advertisements

Similar presentations

XML: Extensible Markup Language

Advertisements

XML May 3 rd, XQuery Based on Quilt (which is based on XML-QL) Check out the W3C web site for the latest. XML Query data model –Ordered !

Agenda from now on Done: SQL, views, transactions, conceptual modeling, E/R, relational algebra. Starting: XML To do: the database engine: –Storage –Query.

Querying XML (cont.). Comments on XPath? What’s good about it? What can’t it do that you want it to do? How does it compare, say, to SQL?

1 Lecture 10 XML Wednesday, October 18, XML Outline XML (4.6, 4.7) –Syntax –Semistructured data –DTDs.

1 Lecture 12: XQuery in SQL Server Monday, October 23, 2006.

1 Lecture 9: XQuery. 2 XQuery Motivation XPath expressivity insufficient –no join queries (as in SQL) –no changes to the XML structure possible –no quantifiers.

1 Managing XML and Semistructured Data Part 1: Preliminaries, Motivation and Overview Acknowledgement: Part of the materials in this set of XML slides.

1 COS 425: Database and Information Management Systems XML and information exchange.

1 Lecture 4 XML/Xpath/XQuery Tuesday, January 30, 2007.

End of SQL XML April 22 th, Null Values If x=Null then 4*(3-x)/7 is still NULL If x=Null then x=“Joe” is UNKNOWN Three boolean values: –FALSE =

1 Lecture 12: XML Publishing, XML Storage Monday, October 24, 2005.

Company LOGO OODB and XML Database Management Systems – Fall 2012 Matthew Moccaro.

End of XML February 19 th, FLWR (“Flower”) Expressions FOR... LET... WHERE... RETURN... FOR... LET... WHERE... RETURN...

Intro. to XML & XML DB Bun Yue Professor, CS/CIS UHCL.

28 October 2008CIS 340 # 1 Topics To define XML as a technology To place XML in the context of system architectures Online support:

Lecture 5: XML Tuesday, January 16, Outline XML, DTDs (Data on the Web, 3.1) Semistructured data in XML (3.2) Exporting Relational Data in XML (8.3.1)

XML e X tensible M arkup L anguage (XML) By: Albert Beng Kiat Tan Ayzer Mungan Edwin Hendriadi.

1 Final Review Tuesday, March 6, The Final Date: Tuesday, March 13, 2007 Time: 6:30 - 8:30 Room: EE 037 You must come to campus Open book exam.

+ 1 XML eXtensible Markup Language. + 2 XML Lecture Adapted from the work of Dr. Praveen Madiraju of Marquette University.

Computing & Information Sciences Kansas State University Friday, 20 Oct 2006CIS 560: Database System Concepts Lecture 24 of 42 Friday, 20 October 2006.

XML May 6th, Instructor AnHai Doan Brief bio –high school in Vietnam & undergrad in Hungary –M.S. at Wisconsin –Ph.D. at Washington under Alon &

1 CSE544 XML, XQuery Wednesday, April 5, Announcements Project Ideas are posted (to discuss in class) Groups due today Proposals due Monday Two.

SEMI-STRUCTURED DATA (XML) 1. SEMI-STRUCTURED DATA ER, Relational, ODL data models are all based on schema Structure of data is rigid and known is advance.

SQL: Interactive Queries (2) Prof. Weining Zhang Cs.utsa.edu.

XPERANTO: A Middleware for Publishing Object-Relational Data as XML Documents Michael Carey Daniela Florescu Zachary Ives Ying Lu Jayavel Shanmugasundaram.

1 CSE544 XML, XQuery Monday+Wednesday, April 11+13, 2004.

Midterm Review. Main Topics ER model Relational model Relational Database Design (Theory)

Conceptual Modeling for XML Data

Lecture 11: Functional Dependencies

XML to Relational Database Mapping

XML: Extensible Markup Language

Lecture 10 XML Monday, Oct. 21, 2001.

CPSC-310 Database Systems

Lecture 15: Midterm Review

Database Processing with XML

Managing XML and Semistructured Data

Lecture 11 XML Wednesday, Oct. 24, 2001.

Lecture 06 Data Modeling: E/R Diagrams

eXtensible Markup Language (XML)

Lecture 09: Functional Dependencies, Database Design

Introduction to Database Systems CSE 444 Lecture 23: Final Review

Lecture 12: XML, XPath, XQuery

Semi-Structured data (XML Data MODEL)

Introduction to Database Systems CSE 444 Lecture 12 More Xquery and Xquery in SQL Server April 25, 2008.

Lecture 8: Database Design

Lecture 07: E/R Diagrams and Functional Dependencies

Lecture 9: XML Monday, October 17, 2005.

Lecture 30: Final Review Wednesday, December 6, 2000.

Lecture 5: The Relational Data Model

Wednesday, May 29, 2002 XML Storage Final Review

Lecture 8: XML Data Wednesday, October

Lecture 24: Final Review Friday, March 10, 2006.

Lecture 30: Final Review Wednesday, December 10, 2003.

Syllabus Introduction Website Management Systems

CSE591: Data Mining by H. Liu

Introduction to Database Systems CSE 444 Lecture 23: Final Review

Wednesday, May 22, 2002 XML Publishing, Storage

Lecture 08: E/R Diagrams and Functional Dependencies

Introduction to Database Systems CSE 444 Lecture 10 XML

Final Review Friday, December 8, 2006.

Lecture 12: XQuery in SQL Server

Introduction to Database Systems CSE 444

Introduction to Database Systems CSE 444 Lecture 12 Xquery in SQL Server October 22, 2007.

Semi-Structured data (XML)

Lecture 11: XML and Semistructured Data

Lecture 11: Functional Dependencies

Lecture 13: XQuery XML Publishing, XML Storage

Lecture 29: Final Review Wednesday, December 11, 2002.

Presentation transcript:

Lecture 14: XML Publishing & Storage Midterm Review Wednesday, October 30, 2002

Outline XML publishing XML storage Final review

XML from/to Relational Data XML publishing: relational data  XML XML storage: XML  relational data

XML Publishing XML Tuple streams Web Relational XML publishing Database XML publishing Application Xpath/ XQuery SQL

XML Publishing Exporting the data is easy: we do this for HTML Translating XQuery  SQL is hard XML publishing systems: Research: Xperanto (IBM/DB2), SilkRoute (AT&T Labs and UW) XQuery  SQL Commercial: SQL Server, Oracle only Xpath  SQL and with restrictions

XML Publishing Relational schema: Student(sid, name, address) Will follow SilkRoute, more or less Relational schema: Student(sid, name, address) Course(cid, title, room) Enroll(sid, cid, grade) Enroll Student Course

XML Publishing Group by courses: redundant representation of students <xmlview> <course> <title> Operating Systems </title> <room> MGH084 </room> <student> <name> John </name> <address> Seattle </address > <grade> 3.8 </grade> </student> <student> …</student> … </course> <course> <title> Database </title> <room> EE045 </room> <student> <name> Mary </name> <address> Shoreline </address > <grade> 3.9 </grade> </xmlview> Group by courses: redundant representation of students Other representations possible too

XML Publishing First thing to do: design the DTD: <!ELEMENT xmlview (course*)> <!ELEMENT course (title, room, student*)> <!ELEMENT student (name,address,grade)> <!ELEMENT name (#PCDATA)> <!ELEMENT address (#PCDATA)> <!ELEMENT grade (#PCDATA)> <!ELEMENT title (#PCDATA)>

Now we write an XQuery to export relational data  XML Note: result is is the right DTD <xmlview> { FOR $x IN /db/Course/row RETURN <course> <title> { $x/title/text() } </title> <room> { $x/room/text() } </room> { FOR $y IN /db/Enroll/row[cid/text() = $x/cid/text()] $z IN /db/Student/row[sid/text() = $y/sid/text()] RETURN <student> <name> { $z/name/text() } </name> <address> { $z/address/text() } </address> <grade> { $y/grade/text() } </grade> </student> } </course> } </xmlview>

SilkRoute does this automatically XML Publishing Query: find Mary’s grade in Operating Systems XQuery FOR $x IN /xmlview/course[title/text()=“Operating Systems”], $y IN $x/student/[name/text()=“Mary”] RETURN <answer> $y/grade/text() </answer> SilkRoute does this automatically SELECT Enroll.grade FROM Student, Enroll, Course WHERE Student.name=“Mary” and Course.title=“OS” and Student.sid = Enroll.sid and Enroll.cid = Course.cid SQL

XML Publishing How do we choose the output structure ? Determined by agreement Or dictated by committees XML dialects (called applications) = DTDs XML Data is often nested, irregular, etc No normal forms for XML

XML Storage Most often the XML data is small E.g. a SOAP message Parsed directly into the application (DOM API) Sometimes XML data is large need to store/process it in a database The XML storage problem: How do we choose the schema of the database ?

XML Storage Two solutions: Schema derived from DTD Storing XML as a graph: “Edge relation”

Designing a Schema from DTD Design a relational schema for: <!DOCTYPE company [ <!ELEMENT company ((person|product)*)> <!ELEMENT person (ssn, name, office?, phone*)> <!ELEMENT ssn (#PCDATA)> <!ELEMENT name (#PCDATA)> <!ELEMENT office (#PCDATA)> <!ELEMENT phone (#PCDATA)> <!ELEMENT product (pid, name, ((price,availability)|description))> <!ELEMENT pid (#PCDATA)> <!ELEMENT description (#PCDATA)> ]>

Designing a Schema from DTD <!DOCTYPE company [ <!ELEMENT company ((person|product)*)> <!ELEMENT person (ssn, name, office?, phone*)> <!ELEMENT ssn (#PCDATA)> <!ELEMENT name (#PCDATA)> <!ELEMENT office (#PCDATA)> <!ELEMENT phone (#PCDATA)> <!ELEMENT product (pid, name, ((price,availability)|description))> <!ELEMENT pid (#PCDATA)> <!ELEMENT description (#PCDATA)> ]> First, construct the DTD graph: We ignore the order company * * person product * ssn name office phone pid price avail. descr.

Designing a Schema from DTD Next, design the relational schema, using common sense. company * * person product * ssn name office phone pid price avail. descr. Person(ssn, name, office) Phone(ssn, phone) Product(pid, name, price, avail., descr.) Which attributes may be NULL ?

Designing a Schema from DTD What happens to queries: FOR $x IN /company/product[description] RETURN <answer> { $x/name, $x/description } </answer> SELECT Product.name, Product.description FROM Product WHERE Product.description IS NOT NULL

Storing XML as a Graph Sometimes we don’t have a DTD: How can we store the XML data ? Every XML instance is a tree Store the edges in an Edge table Store the #PCDATA in a Value table

Can be ANY XML data (don’t know DTD) Storing XML as a Graph Can be ANY XML data (don’t know DTD) db book publisher title author state “Complete Guide to DB2” “Chamberlin” “Transaction Processing” “Bernstein” “Newcomer” “Morgan Kaufman” “CA” 1 2 3 4 5 6 7 8 9 10 11 Edge Source Tag Dest db 1 book 2 title 3 author 4 5 6 7 . . . Value Source Val 3 Complete guide . . . 4 Chamberlin 6 . . .

Storing XML as a Graph FOR $x IN /db/book[author/text()=“Chamberlin”] What happens to queries: FOR $x IN /db/book[author/text()=“Chamberlin”] RETURN $x/title xdb xbook xauthor xtitle vauthor vtitle db book author title “Chamberlin” Return value

Storing XML as a Graph What happens to queries: A 6-way join !!! SELECT vtitle.value FROM Edge xdb, Edge xbook, Edge xauthor, Edge xtitle, Value vauthor, Value vtitle WHERE xdb.source = 0 and xdb.tag = ‘db’ and xdb.dest = xbook.source and xbook.tag = ‘book’ and xbook.dest = xauthor.source and xauthor.tag = ‘author’ and xbook.dest = xtitle.source and xtitle.tag = ‘title’ and xauthor.dest = vauthor.source and vauthor.value = ‘Chamberlin” and xtitle.dest = vtitle.source

Storing XML as a Graph Edge relation summary: Same relational schema for every XML document: Edge(Source, Tag, Dest) Value(Source, Val) Generic: works for every XML instance But inefficient: Repeat tags multiple times Need many joins to reconstruct data

Other XML Topics Name spaces XML API: XML languages: XML Schema DOM = “Document Object Model” XML languages: XSLT XML Schema Xlink, XPointer SOAP Available from www.w3.org (but don’t spend rest of your life reading those standards !)

Research on XML Data Management at UW Processing: Query languages (XML-QL, a precursor of XQuery) Tukwila XML updates XML publishing/storage SilkRoute STORED XML tools XML Compressor: Xmill XML Toolkit (xsort, xagg, xgrep, xtransf, etc) Theory: Typechecking Xpath containment

The Midterm Open book exam Four questions: And open notebooks, lecture notes, .... But no computers Four questions: SQL E/R Relational model (algebra, FDs, normal forms) XML

1. SQL Selection/project/join Understand well duplicates Aggregate queries avoid nested queries when a GROUP BY suffices Nested queries More difficult ones are with ANY, and NOT IN Updates, table creations, views

2. E/R Diagrams One/many v.s. many/many relationships Inheritance Translation to relations Remember: no table for one/many !

3. Relational Model Relational model What is a semijoin ? FD’s: make sure you can compute S+, find keys Normal forms: BCNF, 3NF

4. XML XML: XPath XQuery How to store XML data Basic syntax: elements + attributes DTDs: elements only The tree model Canonical XML view of a relation (<row>...) XPath XQuery Use it to publish XML from relational data How to store XML data

Final Thoughts Open book Some question(s) may be hard(er) But read the book before the exam Some question(s) may be hard(er) Answer first the questions that are easier The answers should not be very complex