XML Publishing: Introduction, General approach, XPERANTO, SilkRoute, Microsoft SQL Server 2000, Summary.



Introduction: What is XML Publishing?
XML publishing is the task of transforming relational data into XML for the purpose of exchange over the Internet. More specifically, publishing XML data involves joining tables, selecting and projecting the data that needs to be exported, creating XML hierarchies, and processing values in an application-specific manner.
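The steps named above (join, select and project, build the hierarchy, process values) can be sketched in a few lines of Python. This is a minimal illustration, not any of the systems discussed later. The `customer` and `purchase` tables, their columns, and the element names are hypothetical, and SQLite stands in for the relational database.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, internal_rating TEXT);
    CREATE TABLE purchase (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customer VALUES (1, 'Smith Construction', 'A'), (2, 'Western Builders', 'B');
    INSERT INTO purchase VALUES (10, 1, 250.0), (11, 1, 75.5), (12, 2, 40.0);
""")

# Join, select and project in SQL; internal_rating is deliberately not exported.
rows = conn.execute("""
    SELECT c.id, c.name, p.id, p.total
    FROM customer c JOIN purchase p ON p.customer_id = c.id
    WHERE p.total > 50
    ORDER BY c.id, p.id
""").fetchall()

# Build the XML hierarchy: one <customer> per customer with nested <purchase> children.
root = ET.Element("customers")
current_id, current_elem = None, None
for cust_id, name, purchase_id, total in rows:
    if cust_id != current_id:
        current_elem = ET.SubElement(root, "customer", name=name)
        current_id = cust_id
    purchase = ET.SubElement(current_elem, "purchase", id=str(purchase_id))
    purchase.text = f"{total:.2f}"  # application-specific value formatting

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Sorting by the parent key lets the nesting be built in a single pass over the rows, which is the same idea the taggers described later rely on.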

Introduction: Why do we need XML Publishing?
- Most business data is stored in relational database systems.
- XML is a standard for exchanging business data on the web.
- XML is a simple, platform-independent, Unicode-based syntax for which simple and efficient parsers are widely available.
- XML can represent structured data and also provides a uniform syntax for semi-structured data and marked-up content.

Introduction: Two data models
Relational data
- fragmented into many flat relations
- normalized
- proprietary
XML data
- nested
- un-normalized
- public (450 schemas at

General Approach
Create XML views over relational data. Each XML view provides an alternative, application-specific view of the underlying relational data. Through these views, business partners can access existing relational data as though it were in some industry-standard XML format.

Virtual vs. Materialized
- Materialized XML publishing: materialize the entire XML view on request and return the resulting XML document.
- Virtual XML publishing: support queries over XML views and return only what user applications actually ask for.

Virtual vs. Materialized
Materialized XML publishing
- applications can access all the data without interfering with the relational engine
- the XML view needs to be refreshed periodically
- inefficient when an application needs only a small part of the view
Virtual XML publishing
- guarantees data freshness
- leverages the processing power of the relational engine
- translating an XML query over an XML view into SQL may be complex
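The trade-off can be made concrete with a small sketch. Both functions below answer the same question over a hypothetical `orders` table (SQLite stands in for the relational engine, and the element names are assumptions). The materialized variant builds the whole XML view and filters it in the middleware. The virtual variant translates the filter into SQL so that only matching rows leave the database.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, custname TEXT);
    INSERT INTO orders VALUES (9, 'Western Builders'), (10, 'Smith Construction');
""")

def tag(rows):
    """Wrap (id, custname) tuples in <order> elements."""
    root = ET.Element("orders")
    for order_id, custname in rows:
        order = ET.SubElement(root, "order", id=str(order_id))
        ET.SubElement(order, "customer").text = custname
    return root

def materialized(prefix):
    # Materialize the entire view, then filter the XML outside the database.
    view = tag(conn.execute("SELECT id, custname FROM orders ORDER BY id"))
    result = ET.Element("orders")
    for order in view.findall("order"):
        if order.findtext("customer").startswith(prefix):
            result.append(order)
    return result

def virtual(prefix):
    # Push the predicate into SQL so the relational engine does the filtering.
    # Note: SQLite's LIKE is case-insensitive for ASCII, unlike str.startswith.
    rows = conn.execute(
        "SELECT id, custname FROM orders WHERE custname LIKE ? ORDER BY id",
        (prefix + "%",),
    )
    return tag(rows)

materialized_xml = ET.tostring(materialized("Smith"), encoding="unicode")
virtual_xml = ET.tostring(virtual("Smith"), encoding="unicode")
print(materialized_xml == virtual_xml)
```

The two results agree here, but the materialized path touches every row of the table, while the virtual path depends on being able to express the XML predicate in SQL.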

Middleware System
An interface between the relational database and the user application that:
- defines and manages XML views
- translates incoming XML queries into SQL and submits them to the database system
- receives the query results and translates them back into XML

Figure 1: A high-level architecture of the middleware system. Components: applications (Web/intranet), the middleware system (XML Query Processor, XML Views Manager, XML Tagger), and the RDBMS. Labeled flows: view definition, view description, user XML queries, SQL queries, tuple streams, result XML documents.

XPERANTO vs. SilkRoute
- IBM XPERANTO: a pure-XML, single-query-language approach. XML views are defined in an XML query language that uses the type system of XML Schema.
- SilkRoute: XML views are defined using a declarative query language called RXL (Relational to XML Transformation Language).

XPERANTO vs. SilkRoute
XPERANTO
- users only need to be familiar with XML
- both relational data and metadata can be represented and queried in the same framework
- can publish object-relational structures
- pushes all relational logic down to the database engine

Figure 2: XPERANTO architecture. Components: Query Translation (XML-QL Parser, Query Rewrite, SQL Translation, with XQGM as the internal representation), XML View Services (XML Schema Generator), XML Tagger, and the O-R database (SQL Query Processor, Stored Tables, System Catalog). Labeled flows: view definition, view description, SQL queries, data tuples, catalog information, XML Schema, XML result.

Example 1: Relational Schema vs. XML View Schema
DDL (Data Definition Language) for the O-R schema in SQL99 terms:
1. CREATE TABLE book (bookID CHAR(30), name VARCHAR(225), publisher VARCHAR(30))
2. CREATE TABLE publisher (name VARCHAR(30), address VARCHAR(255))
3. CREATE TYPE author_type AS (bookID CHAR(30), first VARCHAR(30), last VARCHAR(30))
4. CREATE TABLE author OF author_type (REF IS ssn USER GENERATED)

XML View Schema over Example O-R database (XML Schema listing omitted; its parts were labeled with the DDL statements they correspond to: CREATE TYPE author_type AS ..., CREATE TABLE book ..., CREATE TABLE author OF ...)

Default XML View over Example O-R database (XML listing omitted)

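Example 2: From XQuery to SQL
Figure: the XPERANTO query engine. Stages: XQuery Parser, Query Rewrite & View Composition, Computation Pushdown, with XQGM as the representation passed between stages. Outputs: a SQL query sent to the RDBMS and a tagger graph for the Tagger Runtime, which turns the returned tuples into the query result.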

A Purchase Order Database and its Default View
Tables: order (id, custname, custnum), item (oid, desc, cost), payment (oid, due, amt).
Sample values that survive: order 10 (Smith Construction) and order 9 (Western Builders); items "generator" and "backhoe" for order 10; a payment for order 10 due 1/10/01.
(Default XML view listing omitted.)

XML Purchase Order: Smith Construction …

User-defined XML "orders" view:
create view orders as (
  for $order in view("default")/order/row
  return … $order/custname …
    for $item in view("default")/item/row
    where $order/id = $item/oid
    return … $item/cost …
    for $payment in view("default")/payment/row
    where $order/id = $payment/oid
    return … $payment/amount …
)

XQuery over the "orders" view (the input to the XQuery Parser):
for $order in view("orders")
where $order/customer/text() like "Smith%"
return $order

Query Parsing
XQGM (XML Query Graph Model)
- an extension of the Query Graph Model (QGM), an internal representation for SQL queries
- consists of a set of operators and functions designed to capture the semantics of an XML query

Part of the XML Functions and Operators in XQGM

OPERATOR   DESCRIPTION
Table      Represents a table in a relational database
Project    Computes results based on its input
Select     Restricts its input
Join       Joins two or more inputs
Groupby    Applies aggregate functions and grouping
Orderby    Sorts input based on column values
Union      Unions two or more inputs
Unnest     Applies super-scalar functions to input
View       Represents a view
Function   Represents an XQuery function

     XML FUNCTION                DESCRIPTION                                                                      OPERATORS
1    cr8Elem(Tag, Atts, Clist)   Creates an element with tag name Tag, attribute list Atts, and contents Clist    Project
2    cr8AttList(A1,…,An)         Creates a list of attributes from the attributes passed as parameters            Project
3    cr8XMLFragList(C1,…,Cn)     Creates an XML fragment list from the content (element/text) parameters          Project
4    aggXMLFrags(C)              Aggregate function that creates an XML fragment list from content inputs         Groupby
5    getTagName(Elem)            Returns the element name of Elem                                                 Project, Select
6    getAttributes(Elem)         Returns the list of attributes of Elem                                           Project, Select
7    getAttName(Att)             Returns the name of attribute Att                                                Project, Select
8    isElement(E)                Returns true if E is an element, false otherwise                                 Select
9    isText(T)                   Returns true if T is text, false otherwise                                       Select
10   unnest(List)                Superscalar function that unnests a list                                         Unnest

Figure: XQGM for the XML orders view. Operators, from the base tables up:
- table: order ($id, $custname); table: item ($oid, $desc, $cost); table: payment ($oid, $due, $amt)
- select: $oid = $id, applied to item and to payment (correlation on order.id)
- project: $item = …; project: $pmt = …
- groupby: $items = aggXMLFrags($item)
- groupby, orderby (on $due): $pmts = aggXMLFrags($pmt)
- join (correlated): $id, $custname, $items, $pmts
- project: $order = … (built from $custname, $items, $pmts)
- view result: $order

Figure: XQGM for the query over the orders view
  for $order in view("orders") where $order/customer/text() like "Smith%" return $order
Operators, from the view up:
- view: orders, producing $order
- project: $elems = getContents($order)
- unnest: $elem = unnest($elems)
- select: isElement($elem) and getTagName($elem) = "customer"
- project: $vals = getContents($elem)
- unnest: $val = unnest($vals)
- select: isText($val) and $val like "Smith%"
- join (correlated): $order, with correlation on $order

View Composition
After the query parsing stage, the XQGM is composed with the views it references (here, the orders view). Rewrite optimizations are then performed to eliminate the construction of intermediate XML fragments and to push down predicates.

Composition Rules

     FUNCTION        COMPOSES WITH               REDUCTION
1    getTagName      cr8Elem(Tag, Atts, Clist)   Tag
2    getAttributes   cr8Elem(Tag, Atts, Clist)   Atts
3    getContents     cr8Elem(Tag, Atts, Clist)   Clist
4    getAttName      cr8Att(Name, Val)           Name
5    getAttValue     cr8Att(Name, Val)           Val
6    isElement       cr8Elem(Tag, Atts, Clist)   True
7    isElement       Other than cr8Elem          False
8    isText          PCDATA                      True
9    isText          Other than PCDATA           False
10   unnest          aggXMLFrags(C)              C
11   unnest          cr8XMLFragList(C1,…,Cn)     C1 ∪ … ∪ Cn
12   unnest          cr8AttList(A1,…,An)         A1 ∪ … ∪ An
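A few of the composition rules above can be sketched as a bottom-up rewrite over nested tuples. This is only an illustration of the idea, not XPERANTO's implementation. The tuple encoding, the `compose` function and the `"union"` marker are assumptions made for the example, and only rules 1, 2, 3, 6, 10 and 11 are covered.

```python
def compose(expr):
    """Apply a subset of the composition rules bottom-up.

    Expressions are nested tuples of the form (function_name, arg1, ...);
    strings stand for tag names, column variables, or text.
    """
    if not isinstance(expr, tuple):
        return expr
    op, *args = expr
    args = [compose(arg) for arg in args]
    arg = args[0] if args else None
    arg_op = arg[0] if isinstance(arg, tuple) else None

    if arg_op == "cr8Elem":
        _, tag, atts, clist = arg
        # Rules 1, 2, 3 and 6: an accessor applied to a constructor
        # reduces to the matching constructor argument.
        reductions = {"getTagName": tag, "getAttributes": atts,
                      "getContents": clist, "isElement": True}
        if op in reductions:
            return reductions[op]
    if op == "unnest" and arg_op == "aggXMLFrags":
        return arg[1]                 # rule 10
    if op == "unnest" and arg_op == "cr8XMLFragList":
        return ("union", *arg[1:])    # rule 11
    return (op, *args)                # no rule applies

# Hypothetical fragment of a view definition: <order><customer>…</customer>…</order>
customer = ("cr8Elem", "customer", ("cr8AttList",), "$custname")
order = ("cr8Elem", "order", ("cr8AttList",),
         ("cr8XMLFragList", customer, "$items"))

print(compose(("getTagName", customer)))
print(compose(("unnest", ("getContents", order))))
```

After rewriting, the query no longer refers to the constructed `order` element at all, which is how composition removes the construction of intermediate XML fragments.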

Figure: Predicate pushdown. The query and view graphs are shown composed. The query's predicate appears as select: $custname like "Smith%" applied directly above table: order ($id, $custname). The rest of the view graph is as before: select: $oid = $id on item and payment, project: $item = … and project: $pmt = …, groupby: $items = aggXMLFrags($item), groupby with orderby (on $due): $pmts = aggXMLFrags($pmt), join (correlated, on order.id), and project: $order = ….

Computation Pushdown
The goal in this phase of query processing is to push all data- and memory-intensive operations down to the relational engine as an efficient SQL query. Two techniques are used:
1. Query decorrelation
2. Tagger pull-up

Query Decorrelation
Complex expressions in XQuery can be represented using correlations. However, earlier work has shown that executing correlated XML queries over a relational database leads to poor performance, so query decorrelation is a necessary step for efficient XML query execution.
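The difference can be sketched with SQLite standing in for the relational engine. The correlated version below runs the nested item query once per order, while the decorrelated version replaces the correlation with a single left outer join and regroups the rows by order id. The table names, the `descr` column and the cost values are hypothetical, and this is a simplified illustration rather than the XQGM rewrite itself.

```python
import sqlite3

# Hypothetical tables and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, custname TEXT);
    CREATE TABLE item (oid INTEGER, descr TEXT, cost INTEGER);
    INSERT INTO orders VALUES (9, 'Western Builders'), (10, 'Smith Construction');
    INSERT INTO item VALUES (10, 'generator', 8000), (10, 'backhoe', 24000);
""")

def correlated():
    """Evaluate the nested item query once per outer order row."""
    result, statements = {}, 1  # 1 for the outer query
    order_ids = conn.execute("SELECT id FROM orders ORDER BY id").fetchall()
    for (order_id,) in order_ids:
        statements += 1
        items = conn.execute(
            "SELECT descr FROM item WHERE oid = ? ORDER BY descr", (order_id,)
        ).fetchall()
        result[order_id] = [descr for (descr,) in items]
    return result, statements

def decorrelated():
    """One left outer join; grouping on the order id replaces the correlation."""
    result = {}
    rows = conn.execute("""
        SELECT o.id, i.descr
        FROM orders o LEFT OUTER JOIN item i ON i.oid = o.id
        ORDER BY o.id, i.descr
    """)
    for order_id, descr in rows:
        items = result.setdefault(order_id, [])
        if descr is not None:  # NULL marks an order without items
            items.append(descr)
    return result, 1

print(correlated())
print(decorrelated())
```

The outer join keeps orders that have no items, which the correlated form also returns with an empty list.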

Figure: XQGM after decorrelation. Operators, from the base tables up:
- table: order ($id, $custname) with select: $custname like "Smith%"
- table: item ($oid, $desc, $cost) and table: payment ($oid, $due, $amt), each combined with the selected orders by join: $oid = $id
- project: $item = …; project: $pmt = …
- groupby (on $id): $items = aggXMLFrags($item)
- groupby, orderby (on $due): $pmts = aggXMLFrags($pmt)
- left outer join: $id = $id and right outer join: $id = $id, combining $id, $custname, $items, $pmts
- project: $order = …

Tagger Pull-up
This step comes right after query decorrelation. It separates the tagger operations from the SQL operations before the SQL queries are generated.
- Relational operations are pushed to the bottom of the graph. SQL statements are generated from them and sent to the relational engine for execution.
- XML construction functions are pulled up to the top of the query graph and transformed into a "tagger run-time" graph, which produces the result XML documents.

XQGM after Tagger Pull-up
Tagger run-time graph:
- input: $id, $custname
- input: $oid, $desc, $cost and input: $oid, $due, $amt (matched on $oid = $id)
- merge: $item = …; merge: $pmt = …
- aggregate: $items = aggXMLFrags($item); aggregate: $pmts = aggXMLFrags($pmt)
- merge: $order = …, combining $items and $pmts by correlation on id
SQL queries sent to the relational engine:
select o.id, o.custname from order o where o.custname like 'Smith%' order by o.id
select i.oid, i.desc, i.cost from item i, order o where o.custname like 'Smith%' and i.oid = o.id order by o.id
select p.oid, p.due, p.amt from payment p, order o where o.custname like 'Smith%' and p.oid = o.id order by o.id, p.due
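What the tagger does with the three sorted tuple streams can be sketched as a single merge pass. This is an assumption-laden illustration. SQLite stands in for the relational engine, the tables are renamed `orders` and `descr` to avoid SQL reserved words, the sample rows are made up, and the element names (`order`, `customer`, `item`, `payment`) are guesses because the view's markup is not recoverable from the slides.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, custname TEXT);
    CREATE TABLE item (oid INTEGER, descr TEXT, cost INTEGER);
    CREATE TABLE payment (oid INTEGER, due TEXT, amt INTEGER);
    INSERT INTO orders VALUES (9, 'Western Builders'), (10, 'Smith Construction'),
                              (11, 'Smithson Ltd');
    INSERT INTO item VALUES (10, 'generator', 8000), (10, 'backhoe', 24000),
                            (11, 'crane', 5000);
    INSERT INTO payment VALUES (10, '2001-06-10', 12000), (10, '2001-01-10', 20000);
""")

# The three queries share the predicate and the sort key (order id).
PRED = "o.custname LIKE 'Smith%'"
order_rows = conn.execute(
    f"SELECT o.id, o.custname FROM orders o WHERE {PRED} ORDER BY o.id").fetchall()
item_rows = conn.execute(
    f"SELECT i.oid, i.descr, i.cost FROM item i, orders o "
    f"WHERE {PRED} AND i.oid = o.id ORDER BY o.id").fetchall()
payment_rows = conn.execute(
    f"SELECT p.oid, p.due, p.amt FROM payment p, orders o "
    f"WHERE {PRED} AND p.oid = o.id ORDER BY o.id, p.due").fetchall()

def take_group(rows, pos, order_id):
    """Return the run of rows starting at pos whose first column is order_id."""
    end = pos
    while end < len(rows) and rows[end][0] == order_id:
        end += 1
    return rows[pos:end], end

def tag_orders(order_rows, item_rows, payment_rows):
    """Merge three streams sorted on order id into nested <order> elements."""
    root = ET.Element("orders")
    item_pos = payment_pos = 0
    for order_id, custname in order_rows:
        order = ET.SubElement(root, "order")
        ET.SubElement(order, "customer").text = custname
        items, item_pos = take_group(item_rows, item_pos, order_id)
        for _, descr, cost in items:
            ET.SubElement(order, "item", description=descr).text = str(cost)
        payments, payment_pos = take_group(payment_rows, payment_pos, order_id)
        for _, due, amt in payments:
            ET.SubElement(order, "payment", due=due).text = str(amt)
    return root

orders_xml = tag_orders(order_rows, item_rows, payment_rows)
print(ET.tostring(orders_xml, encoding="unicode"))
```

Because every stream is sorted on the order id, the merge only moves forward through each list, so the tagger needs memory for one order at a time.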

SilkRoute Approach

Figure: SilkRoute's architecture. Components: applications (Web/intranet), SilkRoute (Query Composer, Plan Generator, XML Tagger), and the RDBMS. Labeled items and flows: RXL query (virtual view or materialized view), user XML queries, source description, SQL queries, XML template, tuple streams, result XML documents.

SilkRoute Approach
- The database administrator starts by writing an RXL query that defines the XML view of the database. This is called the view query.
- A materialized view is fed directly into the Plan Generator, which generates a set of SQL queries and one XML template.
- A virtual view is first composed with a user query by the Query Composer, resulting in another RXL query, which is then fed into the Plan Generator.
- The SQL queries are sent to the RDBMS server, which returns one sorted tuple stream per SQL query.
- The XML Tagger merges the tuple streams and produces the XML document, which is returned to the application.

Query Composer
This component takes a user XML-QL query and composes it with the RXL view query, resulting in a new RXL query that combines fragments of the view query and the user query. It plays a role similar to that of the Query Parser and Query Rewrite components in XPERANTO.

Plan Generator
This component uses a greedy optimization algorithm to choose an optimal set of SQL queries for a given RXL view definition. The algorithm bases its decisions on query cost estimates provided by the relational engine. It can return more than one plan, and these plans can then be passed to additional optimization algorithms that optimize specific parameters, such as network traffic or server load. Details of the greedy algorithm can be found in: M. Fernández et al., "Efficient evaluation of XML middle-ware queries."

XML Publishing: SQL Server
Two approaches:
- SQL-centric approach: extend SQL queries to perform the transformation. The extension is the FOR XML clause.
- Virtual XML views approach: use the XDR (XML-Data Reduced) schema language to define virtual XML views over the relational database, then query them with XPath.

XML Publishing: SQL Server
The SQL-centric approach has three modes:
- RAW mode
- AUTO mode
- EXPLICIT mode

XML Publishing: SQL Server, RAW Mode
SELECT Customers.CustomerID, OrderID
FROM Customers LEFT OUTER JOIN Orders ON Customers.CustomerID = Orders.CustomerID
FOR XML RAW
- flat XML
- default tag and attribute names
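The shape of RAW-mode output can be emulated in Python to show what "flat XML with default names" means: every result row becomes one `row` element whose attributes are named after the columns, and NULL values produce no attribute. This sketch does not call SQL Server. SQLite holds hypothetical sample data, and `emulate_for_xml_raw` is a made-up helper for the illustration.

```python
import sqlite3
from xml.sax.saxutils import quoteattr

# Hypothetical data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID TEXT PRIMARY KEY);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID TEXT);
    INSERT INTO Customers VALUES ('ALFKI'), ('ANATR'), ('PARIS');
    INSERT INTO Orders VALUES (10643, 'ALFKI'), (10692, 'ALFKI'), (10308, 'ANATR');
""")

def emulate_for_xml_raw(cursor):
    """Emit one flat <row .../> element per result tuple."""
    names = [column[0] for column in cursor.description]
    lines = []
    for row in cursor:
        attributes = " ".join(
            f"{name}={quoteattr(str(value))}"
            for name, value in zip(names, row)
            if value is not None  # NULL columns yield no attribute
        )
        lines.append(f"<row {attributes}/>")
    return "\n".join(lines)

cursor = conn.execute("""
    SELECT Customers.CustomerID AS CustomerID, Orders.OrderID AS OrderID
    FROM Customers LEFT OUTER JOIN Orders
         ON Customers.CustomerID = Orders.CustomerID
    ORDER BY Customers.CustomerID, Orders.OrderID
""")
raw_xml = emulate_for_xml_raw(cursor)
print(raw_xml)
```

The customer without orders still appears, as a row that carries only the `CustomerID` attribute.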

XML Publishing: SQL Server, AUTO Mode
SELECT Customers.CustomerID, OrderID
FROM Customers LEFT OUTER JOIN Orders ON Customers.CustomerID = Orders.CustomerID
ORDER BY Customers.CustomerID
FOR XML AUTO
- default tag and attribute names
- no differently typed sibling elements
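AUTO mode nests elements by table, using table names as tags and column names as attributes. The Python sketch below emulates that shape for the two-table query above. It is not SQL Server's algorithm: the sample data is hypothetical, `emulate_for_xml_auto` is a made-up helper, and the wrapping `root` element is added only so the result is a single well-formed document.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID TEXT PRIMARY KEY);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID TEXT);
    INSERT INTO Customers VALUES ('ALFKI'), ('ANATR'), ('PARIS');
    INSERT INTO Orders VALUES (10643, 'ALFKI'), (10692, 'ALFKI'), (10308, 'ANATR');
""")

def emulate_for_xml_auto(rows):
    """Nest <Orders> under <Customers>; a new parent opens when the key changes."""
    root = ET.Element("root")  # wrapper added for well-formedness (assumption)
    current_id, current = None, None
    for customer_id, order_id in rows:
        if current is None or customer_id != current_id:
            current = ET.SubElement(root, "Customers", CustomerID=customer_id)
            current_id = customer_id
        if order_id is not None:  # no child element for a NULL order
            ET.SubElement(current, "Orders", OrderID=str(order_id))
    return root

rows = conn.execute("""
    SELECT Customers.CustomerID, Orders.OrderID
    FROM Customers LEFT OUTER JOIN Orders
         ON Customers.CustomerID = Orders.CustomerID
    ORDER BY Customers.CustomerID, Orders.OrderID
""").fetchall()
auto_xml = emulate_for_xml_auto(rows)
print(ET.tostring(auto_xml, encoding="unicode"))
```

Since a parent element is opened whenever the parent key changes between consecutive rows, the ORDER BY clause determines whether all orders of a customer end up under one element.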

XML Publishing: SQL Server, EXPLICIT Mode
- nested XML
- user-defined tags and attributes
- idea: write SQL queries with complex column names
- ad hoc, order-dependent semantics
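The "complex column names" can be illustrated with a small interpreter for an EXPLICIT-style universal table. In that format the first two columns are `Tag` and `Parent`, and each data column is named `ElementName!TagNumber!AttributeName`. The Python below is a simplified sketch, not SQL Server's implementation: directives are ignored, the rows are hypothetical, and a `root` wrapper is added so the result is one document.

```python
import xml.etree.ElementTree as ET

def tag_universal_table(columns, rows):
    """Build nested XML from an EXPLICIT-style universal table."""
    # Data column names follow ElementName!TagNumber!AttributeName.
    specs = []
    for name in columns[2:]:
        element, tag_no, attribute = name.split("!")
        specs.append((element, int(tag_no), attribute))
    element_for_tag = {tag_no: element for element, tag_no, _ in specs}

    root = ET.Element("root")  # wrapper so the result is one document (assumption)
    last_open = {0: root}      # tag number -> most recently created element
    for row in rows:
        tag, parent = row[0], row[1] or 0  # NULL parent means top level
        elem = ET.SubElement(last_open[parent], element_for_tag[tag])
        for (_, tag_no, attribute), value in zip(specs, row[2:]):
            # Only columns that belong to this row's tag become attributes.
            if tag_no == tag and value is not None:
                elem.set(attribute, str(value))
        last_open[tag] = elem
    return root

columns = ["Tag", "Parent", "Customer!1!CustomerID", "Order!2!OrderID"]
rows = [
    (1, None, "ALFKI", None),
    (2, 1, "ALFKI", 10643),
    (2, 1, "ALFKI", 10692),
    (1, None, "ANATR", None),
    (2, 1, "ANATR", 10308),
]
explicit_xml = tag_universal_table(columns, rows)
print(ET.tostring(explicit_xml, encoding="unicode"))
```

Each child row attaches to the most recently seen parent row, so reordering the rows changes the nesting. That is the order-dependent semantics noted above.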

XML Publishing: SQL Server, Virtual XML Views
The core mechanism for providing XML views over relational data is the annotated schema, which consists of a schema description of the XML view plus annotations that map the XML schema constructs to relational schema constructs. An XPath query, together with the annotated schema, is translated into a FOR XML query that returns only the data required by the query.

Summary
- IBM XPERANTO: a pure-XML, single-query-language approach. XML views are defined in an XML query language that uses the type system of XML Schema.
- SilkRoute: XML views are defined using a declarative query language called RXL (Relational to XML Transformation Language).
- Microsoft SQL Server 2000: supports queries over XML views, but the support is very limited because queries are specified in XPath, which is a subset of XQuery.

Future Work
- IBM XPERANTO: provide support for insertable and updatable XML views; push tagging inside the database system.
- SilkRoute: look for better algorithms for translating RXL into efficient SQL and for minimizing composed RXL views.
- Microsoft SQL Server: find out whether query composition and decomposition are possible for the complete XQuery language or only for a subset of it.