Publishing Relational Data in XML David McWherter.

What's the issue?
● All Important Data
  – Relational models
● All Important Tools
  – Relational models
● Inter-Business Data Exchange
  – XML graph models
● What's in the Kool-Aid?
  – XML graph models
Solution: Make DB2 output XML

XML Model vs Relational Model
● Relational schema, e.g.:
  – Customer( id, name, acctset )
  – AccountSet( id, acctid )
  – Account( id, name ) ...
● Relational model: foreign-key relationships
● XML model: containment relationships
Easiest solution: too easy; produces unnatural XML
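To make the contrast concrete, here is a minimal Python sketch (not from the talk) using the slide's Customer/AccountSet/Account schema with made-up rows. `flat_xml` is one reading of the "easiest solution": each table is dumped as rows, so relationships survive only as foreign-key values. `nested_xml` follows the foreign keys and turns them into containment. All element and attribute names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows for the slide's schema (values are made up for illustration).
customer = [(1, "Ann", 100), (2, "Bob", 101)]                    # Customer(id, name, acctset)
account_set = [(100, 10), (100, 11), (101, 12)]                  # AccountSet(id, acctid)
account = [(10, "checking"), (11, "savings"), (12, "checking")]  # Account(id, name)


def flat_xml():
    """Table-per-section dump: relationships survive only as foreign-key values."""
    root = ET.Element("db")
    tables = (
        ("Customer", ("id", "name", "acctset"), customer),
        ("AccountSet", ("id", "acctid"), account_set),
        ("Account", ("id", "name"), account),
    )
    for table, cols, rows in tables:
        section = ET.SubElement(root, table)
        for row in rows:
            ET.SubElement(section, "row", {c: str(v) for c, v in zip(cols, row)})
    return ET.tostring(root, encoding="unicode")


def nested_xml():
    """Follow Customer.acctset -> AccountSet -> Account and nest accounts inside customers."""
    acct_name = dict(account)  # Account.id -> Account.name
    root = ET.Element("customers")
    for cid, name, set_id in customer:
        cust = ET.SubElement(root, "customer", id=str(cid), name=name)
        for sid, acctid in account_set:
            if sid == set_id:
                ET.SubElement(cust, "account", id=str(acctid), name=acct_name[acctid])
    return ET.tostring(root, encoding="unicode")


print(flat_xml())
print(nested_xml())
```

A consumer of the flat version has to redo the joins itself; the nested version already encodes "this customer owns these accounts" in the tree shape, which is what the rest of the talk tries to produce efficiently.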

Outline
● Procedure:
  – INPUT: relational data
  – Add tags (convert to XML text)
  – Add nested structure
  – OUTPUT: XML
● How do we add tags and structure?
  – Inside or outside the DBMS?
  – Early or late in the query?
● Evaluate simple algorithms
  – Find interesting tradeoffs

Algorithm Parameter Space
● Early tags, early structure
  – DBMS queries follow the XML structure
  – DBMS munges XML fragments
● Early tags, late structure
  – DBMS is free to choose joins
  – DBMS munges XML fragments
● Late tags, late structure
  – DBMS is free to choose joins
  – DBMS returns tuples
  – Postprocessing converts the output tuples to XML

Early Tags, Early Structure, Outside the DBMS
● Stored procedures:
    C = select * from customers;
    foreach c in C {
        A = select * from accounts where cid = c.id;
        P = select * from porders where cid = c.id order by date;
        foreach p in P {
            select * from porder_items where pid = p.id;
            ...
        }
    }
  – Stitch together XML as the query runs
● Used everywhere, but:
  – Forces a nested-loop join
  – Thousands of queries (one per tuple)
● Early tags: tags are output as soon as possible
● Early structure: queries follow the form of the result
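A minimal Python/sqlite3 sketch of the stored-procedure approach. The table names follow the slide; the column names (`cid`, `pid`, `acctnum`, `odate`, `item`), the element names, and the `CountingDB` wrapper are hypothetical. Every parent tuple triggers its own child query, and XML text is yielded as soon as each row arrives (early tags), so the query count grows with the data.

```python
import sqlite3
from xml.sax.saxutils import escape, quoteattr

# Hypothetical schema and rows; table names follow the slide, column names are assumed.
SCHEMA = """
CREATE TABLE customers    (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE accounts     (id INTEGER PRIMARY KEY, cid INTEGER, acctnum TEXT);
CREATE TABLE porders      (id INTEGER PRIMARY KEY, cid INTEGER, odate TEXT);
CREATE TABLE porder_items (id INTEGER PRIMARY KEY, pid INTEGER, item TEXT);
INSERT INTO customers    VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO accounts     VALUES (10, 1, 'A-100'), (11, 2, 'B-200');
INSERT INTO porders      VALUES (20, 1, '2000-01-05'), (21, 1, '2000-03-01'), (22, 2, '2000-02-07');
INSERT INTO porder_items VALUES (30, 20, 'pen'), (31, 20, 'ink'), (32, 22, 'pad');
"""


class CountingDB:
    """Wraps a sqlite3 connection and counts every query sent to the DBMS."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = 0

    def q(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def publish_nested_loop(db):
    """Early tags, early structure: one query per parent tuple, XML emitted as rows arrive."""
    yield "<customers>"
    for cid, name in db.q("SELECT id, name FROM customers ORDER BY id"):
        yield f'<customer id="{cid}"><name>{escape(name)}</name>'
        for aid, num in db.q("SELECT id, acctnum FROM accounts WHERE cid = ? ORDER BY id", (cid,)):
            yield f'<account id="{aid}">{escape(num)}</account>'
        for pid, odate in db.q("SELECT id, odate FROM porders WHERE cid = ? ORDER BY odate", (cid,)):
            yield f'<porder id="{pid}" date={quoteattr(odate)}>'
            for (item,) in db.q("SELECT item FROM porder_items WHERE pid = ? ORDER BY id", (pid,)):
                yield f"<item>{escape(item)}</item>"
            yield "</porder>"
        yield "</customer>"
    yield "</customers>"


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
db = CountingDB(conn)
xml_text = "".join(publish_nested_loop(db))
print(db.queries)  # 8 here: 1 for customers + 2 per customer + 1 per purchase order
```

With thousands of customers and orders this turns into thousands of round trips, and because the loops are written out by hand the join method is fixed to a nested loop.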

Early Tags, Early Structure
● Correlated CLOBs
  – Build a horrible nested SQL query
  – XML() aggregate function to preserve XML document order
  – Constructors for XML formatting
● select cust.name, CUST(cust.id, cust.name,
      (select XML(ACCT(acct.id, acct.acctnum)) from Acct ...),
      (select XML(PORD(...)) ...))
● CUST(cid, cname, acctlst, porderlst) => an XML element wrapping {cname} {acctlst} {porderlst}
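The DB2 constructor and `XML()` aggregate syntax is not reproduced here. As a rough analogue, this sqlite3 sketch uses string concatenation (`||`) in place of the CUST/ACCT/PORD constructors and `group_concat` in place of the XML aggregate. Schema, data, and element names are assumptions. Each subquery is correlated on `c.id`, so conceptually it is re-evaluated per customer row, and each nesting level copies the child strings into the parent's string. Unlike the slide's `XML()` aggregate, SQLite's `group_concat` does not guarantee order, and the sketch does no XML escaping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cust   (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE acct   (id INTEGER PRIMARY KEY, cid INTEGER, acctnum TEXT);
CREATE TABLE porder (id INTEGER PRIMARY KEY, cid INTEGER, odate TEXT);
INSERT INTO cust   VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO acct   VALUES (10, 1, 'A-100'), (11, 2, 'B-200');
INSERT INTO porder VALUES (20, 1, '2000-01-05'), (21, 1, '2000-03-01');
""")

# Stand-ins (assumed, not DB2 syntax): '||' plays the constructors,
# group_concat plays the XML() aggregate. Each subquery is correlated on c.id.
CORRELATED = """
SELECT '<customer id="' || c.id || '"><name>' || c.name || '</name>'
    || '<accounts>'
    || COALESCE((SELECT group_concat('<account id="' || a.id || '">' || a.acctnum || '</account>', '')
                   FROM acct a WHERE a.cid = c.id), '')
    || '</accounts><porders>'
    || COALESCE((SELECT group_concat('<porder id="' || p.id || '" date="' || p.odate || '"/>', '')
                   FROM porder p WHERE p.cid = c.id), '')
    || '</porders></customer>'
FROM cust c
ORDER BY c.id
"""

fragments = [row[0] for row in conn.execute(CORRELATED)]
for fragment in fragments:
    print(fragment)
```

Only one statement reaches the DBMS, but the correlation still forces per-row evaluation of the subqueries, which matches the trade-off listed on the next slide.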

Early Tags, Early Structure (cont.)
● Correlated CLOBs
  – Good: fewer queries
  – Bad:
    ● Forces a nested-loop join
    ● Copying and concatenation of CLOBs
● Decorrelated CLOBs
  – Find all table paths from the XML root to the leaves
    ● cust->acct; cust->porder->item; cust->porder->paymt
  – Join the tables along each path and build XML fragments
    ● Reuse common join subexpressions
  – Join the XML fragments on parent keys
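The first step of the decorrelated approach is schema-level bookkeeping. This hypothetical Python sketch enumerates the root-to-leaf table paths of the slide's view tree, generates one join query per path (assuming each child table has a `<parent>_id` foreign key, a naming convention invented here), and reports multi-table prefixes shared by several paths, which are the candidates for a reused common join subexpression.

```python
from typing import Dict, List

# Hypothetical XML view tree from the slide: parent table -> tables nested under it.
VIEW_TREE: Dict[str, List[str]] = {
    "cust": ["acct", "porder"],
    "porder": ["item", "paymt"],
}


def root_to_leaf_paths(tree: Dict[str, List[str]], node: str) -> List[List[str]]:
    """Every table path from `node` down to a leaf of the view tree."""
    children = tree.get(node, [])
    if not children:
        return [[node]]
    return [[node] + tail for child in children for tail in root_to_leaf_paths(tree, child)]


def shared_join_prefixes(paths: List[List[str]]) -> List[List[str]]:
    """Multi-table path prefixes used by more than one path: candidate common join subexpressions."""
    counts: Dict[tuple, int] = {}
    for path in paths:
        for end in range(2, len(path)):  # proper prefixes containing at least one join
            prefix = tuple(path[:end])
            counts[prefix] = counts.get(prefix, 0) + 1
    return [list(p) for p, n in counts.items() if n > 1]


def path_join_sql(path: List[str]) -> str:
    """One join query per path; assumes each child has a `<parent>_id` column referencing parent.id."""
    sql = f"SELECT * FROM {path[0]}"
    for parent, child in zip(path, path[1:]):
        sql += f" JOIN {child} ON {child}.{parent}_id = {parent}.id"
    return sql


paths = root_to_leaf_paths(VIEW_TREE, "cust")
for p in paths:
    print(" -> ".join(p), "|", path_join_sql(p))
print("reuse:", shared_join_prefixes(paths))
```

Here `cust JOIN porder` is shared by the `item` and `paymt` paths, so it only needs to be computed once.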

Late Tags, Late Structure
● Stupidest solution
  – Join all the tables
    ● Cust ⋈ ... ⋈ Paymt
  – Return tuples to the app/DBMS
  – Filter and tag the result
● Good:
  – Free join order
  – No CLOB munging
● Bad:
  – Too much redundancy
    ● e.g., Cust fields are copied into every joined row
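A minimal sketch of the "stupidest solution", with a hypothetical three-table schema and made-up rows: one wide LEFT JOIN goes to the DBMS, and the application filters out the repeated values while tagging. With one customer, two accounts, and three purchase orders, the join already returns six rows, each carrying a copy of the customer's fields (in this sketch the two sibling branches also multiply each other).

```python
import sqlite3
from xml.sax.saxutils import escape

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cust   (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE acct   (id INTEGER PRIMARY KEY, cid INTEGER, acctnum TEXT);
CREATE TABLE porder (id INTEGER PRIMARY KEY, cid INTEGER, odate TEXT);
INSERT INTO cust   VALUES (1, 'Ann');
INSERT INTO acct   VALUES (10, 1, 'A-100'), (11, 1, 'A-101');
INSERT INTO porder VALUES (20, 1, '2000-01-05'), (21, 1, '2000-03-01'), (22, 1, '2000-04-09');
""")

# One wide join: the DBMS may pick any join order, but every output row
# repeats the customer columns.
rows = conn.execute("""
    SELECT c.id, c.name, a.id, a.acctnum, p.id, p.odate
    FROM cust c
    LEFT JOIN acct a ON a.cid = c.id
    LEFT JOIN porder p ON p.cid = c.id
    ORDER BY c.id, a.id, p.id
""").fetchall()


def filter_and_tag(rows):
    """Application-side pass: drop the duplicated values and emit nested XML."""
    custs = {}  # cust id -> (name, {acct id: acctnum}, {porder id: date}); dicts keep insertion order
    for cid, cname, aid, anum, pid, pdate in rows:
        _, accts, porders = custs.setdefault(cid, (cname, {}, {}))
        if aid is not None:
            accts[aid] = anum
        if pid is not None:
            porders[pid] = pdate
    parts = []
    for cid, (name, accts, porders) in custs.items():
        parts.append(f'<customer id="{cid}"><name>{escape(name)}</name>')
        parts += [f'<account id="{a}">{escape(n)}</account>' for a, n in accts.items()]
        parts += [f'<porder id="{p}" date="{d}"/>' for p, d in porders.items()]
        parts.append("</customer>")
    return "".join(parts)


print(len(rows), "joined rows for 1 customer")
print(filter_and_tag(rows))
```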

Late Tags, Late Structure
● Path-outer-union
  – Start with the decorrelated CLOB joins
    ● (cust, acct), (cust, porder, item), (cust, porder, paymt)
  – Return sets of tuples, not XML:
    ● (xml-level, col1, col2, col3, ...)
      – (0, custid, cust.b, null, null, null, ...)
      – (1, custid, null, acct.a, acct.b, null, ...)
      – (2, custid, null, null, null, porder.a, paymt.a, ...)
  – Tag the result
● Good:
  – No CLOB munging
● Bad:
  – So many nulls (need null compression)
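A sqlite3 sketch of a path-outer-union query over a hypothetical cust/acct/porder schema (a two-path subset of the slide's example, with made-up data). Each UNION ALL branch is one root-to-leaf join from the decorrelated plan, padded with NULLs to a shared column layout, and the `kind` column stands in for the slide's xml-level. Counting the NULL cells shows why null compression matters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cust   (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE acct   (id INTEGER PRIMARY KEY, cid INTEGER, acctnum TEXT);
CREATE TABLE porder (id INTEGER PRIMARY KEY, cid INTEGER, odate TEXT);
INSERT INTO cust   VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO acct   VALUES (10, 1, 'A-100'), (11, 2, 'B-200'), (12, 2, 'B-201');
INSERT INTO porder VALUES (20, 1, '2000-01-05');
""")

# One branch per root-to-leaf path, padded with NULLs to a common layout:
# (kind, custid, cust.name, acct.id, acct.acctnum, porder.id, porder.odate)
OUTER_UNION = """
SELECT 0 AS kind, c.id AS custid, c.name AS name,
       NULL AS acct_id, NULL AS acctnum, NULL AS po_id, NULL AS odate
  FROM cust c
UNION ALL
SELECT 1, c.id, NULL, a.id, a.acctnum, NULL, NULL
  FROM cust c JOIN acct a ON a.cid = c.id
UNION ALL
SELECT 2, c.id, NULL, NULL, NULL, p.id, p.odate
  FROM cust c JOIN porder p ON p.cid = c.id
ORDER BY custid, kind, acct_id, po_id
"""

rows = conn.execute(OUTER_UNION).fetchall()
nulls = sum(v is None for row in rows for v in row)
cells = sum(len(row) for row in rows)
for row in rows:
    print(row)
print(f"{nulls} of {cells} cells are NULL")
```

The ORDER BY is only there to make the output readable; ordering the result so that it can be tagged in one pass is the sorting trick discussed next.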

Late Tags: How to Tag
● Tagging (textifying) can be hard
● Two solutions:
  – Hashing
    ● In-memory hash table
    ● Remember (id/idref) pairs and entity structures
    ● Output at EOF
  – Sorting
    ● Sort the path-outer-union result in the DBMS
    ● Can be done so that entities occur in XML order
    ● Cute trick
    ● Makes it “Late-Tag, Early-Structure”
● Hashing: good iff the table fits in memory
● Sorting: good otherwise
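The two tagging strategies can be sketched in Python over path-outer-union tuples (hypothetical layout and data in a cust/acct/porder example). `tag_by_hashing` accepts tuples in any order and holds everything in dictionaries until EOF. `tag_by_sorting` relies on the tuples being sorted by (customer id, kind), done here with Python's `sorted` as a stand-in for an ORDER BY in the DBMS, so each entity arrives in XML order and only the currently open customer needs to be tracked.

```python
from xml.sax.saxutils import escape

CUST, ACCT, PORDER = 0, 1, 2  # hypothetical 'kind' codes, as in a path-outer-union result

# Unsorted outer-union tuples: (kind, custid, name, acct_id, acctnum, po_id, odate)
TUPLES = [
    (ACCT, 2, None, 11, "B-200", None, None),
    (CUST, 1, "Ann", None, None, None, None),
    (PORDER, 1, None, None, None, 20, "2000-01-05"),
    (CUST, 2, "Bob", None, None, None, None),
    (ACCT, 1, None, 10, "A-100", None, None),
    (ACCT, 2, None, 12, "B-201", None, None),
]


def child_xml(t):
    kind, _, _, acct_id, acctnum, po_id, odate = t
    if kind == ACCT:
        return f'<account id="{acct_id}">{escape(acctnum)}</account>'
    return f'<porder id="{po_id}" date="{odate}"/>'


def tag_by_hashing(tuples):
    """Any input order: buffer entities in in-memory hash tables, emit everything at EOF."""
    names, children = {}, {}
    for t in tuples:
        if t[0] == CUST:
            names[t[1]] = t[2]
        else:
            children.setdefault(t[1], []).append(t)
    out = ["<customers>"]
    for cid in sorted(names):
        out.append(f'<customer id="{cid}"><name>{escape(names[cid])}</name>')
        out += [child_xml(t) for t in sorted(children.get(cid, []), key=lambda t: t[0])]
        out.append("</customer>")
    out.append("</customers>")
    return "".join(out)


def tag_by_sorting(tuples):
    """Sort by (custid, kind) so entities arrive in XML order, then tag in one streaming pass.
    Python's sorted() stands in for the ORDER BY the DBMS would run."""
    out, open_cust = ["<customers>"], None
    for t in sorted(tuples, key=lambda t: (t[1], t[0])):
        if t[0] == CUST:
            if open_cust is not None:
                out.append("</customer>")
            out.append(f'<customer id="{t[1]}"><name>{escape(t[2])}</name>')
            open_cust = t[1]
        else:
            out.append(child_xml(t))  # belongs to the currently open customer
    if open_cust is not None:
        out.append("</customer>")
    out.append("</customers>")
    return "".join(out)


print(tag_by_sorting(TUPLES))
```

Both produce the same document for this input. The difference is memory: the hash-based tagger's tables grow with the whole result, while the sort-based tagger only tracks the open customer (a real implementation would stream `out` instead of collecting it).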

Performance Points
● Stored procedures
  – “Devastating” (2x worse than the best)
● Decorrelated queries
  – Good, but need null compression
● CLOBs
  – CLOB overhead matters only with deep trees
● Outer-union solutions
  – Roughly on par with decorrelated CLOBs
● Computation in the DBMS
  – Binding data to applications is slow
  – Pipelining can reduce the pain