Using XML to View Relational Data
Xin He
AMPS Seminar, November 30, 2001


Outline
- Introduction
- An XML Primer
- Several related systems and results
- An XML middleware system: SilkRoute
- XML and Spreadsheets
- XML and OLAP

Introduction
- XML: the basic ideas are simple, but the potential impact is significant
  - Easy to read
  - Simple and flexible
  - Easy to extract useful information
- Research opportunities that XML brings to database management
  - XML will “turn the Web into a database”
  - Thus general database management issues arise for XML

Introduction: Why use XML to view relational data?
- XML is emerging as the standard data-exchange format between applications on the Web
- Meanwhile, most existing data is stored in relational databases
- This scenario is common
- This scenario is challenging
  - Relational data is flat and normalized, and its schema is often proprietary
  - XML data is nested and unnormalized, and its schema is public
  - So the mapping is inherently complex and may be difficult to compute efficiently

An XML (eXtensible Markup Language) Primer
- Example XML file (from an Apache Tomcat 4.0 configuration file)
  [listing not preserved; surviving text: mail.smtp.host, localhost]
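To make the idea of a configuration file written in XML concrete, here is a minimal sketch in Python using the standard `xml.etree.ElementTree` module. The fragment is hypothetical: its element and attribute names are illustrative assumptions and are not the original Tomcat listing. Only the values `mail.smtp.host` and `localhost` come from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration fragment; element and attribute names are
# illustrative assumptions, not the original Tomcat listing.
CONFIG_XML = """
<Server port="8005">
  <Connector className="example.HttpConnector" port="8080"/>
  <Resource name="mail/Session">
    <parameter>
      <name>mail.smtp.host</name>
      <value>localhost</value>
    </parameter>
  </Resource>
</Server>
"""

root = ET.fromstring(CONFIG_XML)

# Attributes are read from the element's attribute dictionary.
connector_port = root.find("Connector").get("port")

# Nested elements are reached by walking the tree.
params = {
    p.findtext("name"): p.findtext("value")
    for p in root.iter("parameter")
}

print(connector_port)
print(params)
```

The point of the sketch is only that the nesting and the attribute values are recoverable by a generic parser, with no application-specific file format.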

An XML Primer
- Example DTD (Document Type Definition):
  <!ATTLIST Connector
      className CDATA #REQUIRED
      port      CDATA #REQUIRED>

An XML Primer
- XML is a method for putting structured data in a text file
- XML looks a bit like HTML but isn't HTML
- XML is a family of technologies
  - XML is new, but not that new
- XML is license-free, platform-independent and well-supported

Several Related Systems and Results
- An XML middleware system from AT&T research labs: SilkRoute
  - Automates the conversion of relational data into XML
  - A newer paper optimizes the query processing algorithm
- IBM research center: Efficiently Publishing Relational Data as XML Documents
  - The language specification is based on SQL with minor extensions, so standard APIs like ODBC can be used
  - Query performance is worse than that of the revised SilkRoute

Several Related Systems and Results
- UCSD: MIX (Mediation of Information using XML)
  - DTD inference
  - Concentrates on information integration
  - A more complicated architecture
- University of Wisconsin-Madison: Relational Databases for Querying XML Documents
  - The objective is different, but some of the techniques are related
  - Limitations and opportunities: some valuable points

SilkRoute Introduction
- Public DTDs: numerous industries are working on them
  - Automatically construct XML views that conform to the public DTDs from vast stores of relational data
- The system is general, dynamic, and efficient

Motivating Example
- A simple example from electronic commerce: suppliers provide product information to resellers
- For mutual benefit, they have agreed on a particular DTD
- The supplier's business data is organized according to a relational schema
- Supplier: converts its relational data into an XML view that conforms to the DTD and makes the XML view available to resellers
  - Assume the supplier wants to export a subset of its inventory (e.g. only its winter-outerwear stock)
- Resellers: access that data by formulating queries over the XML view
  - A reseller is typically interested in only a small subset of the information (e.g. items whose sale price is less than half of the retail price)
- Relational schemas differ from supplier to supplier

Architecture
[Figure: SilkRoute’s architecture]

The View Query: RXL
- Full power on both sides
  - Joins, selection conditions, aggregates, and nested queries
  - Generates XML data with arbitrary levels of nesting
- RXL has three powerful features that make it possible to create arbitrarily complex XML structures
  - Nested queries, Skolem functions, and block structure

construct
    "Acme Clothing"
    { from Clothing $c
      where $c.category = "outerwear"
      construct
          $c.item
          $c.category
          $c.price
          { from SalePrice $s
            where $s.pid = $c.pid
            construct
                $s.price
          }
          { from Problems $p
            where $p.pid = $c.pid
            construct
                $p.comments
          }
    }
RXL (Relational to XML transformation Language) view query (V)
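The nesting in the view query can be illustrated with a small Python sketch that materializes a comparable XML view from an in-memory SQLite database. This is not SilkRoute's implementation. The table names and the columns pid, item, category, price and comments come from the slide; the column types, the sample rows, and every XML element name (`supplier`, `company`, `product`, `name`, `retail`, `sale`, `report`) are assumptions made for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical schema and rows; only the table and column names are
# taken from the slide.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE Clothing (pid INTEGER, item TEXT, category TEXT, price REAL);
CREATE TABLE SalePrice (pid INTEGER, price REAL);
CREATE TABLE Problems (pid INTEGER, comments TEXT);
INSERT INTO Clothing VALUES (1, 'Parka', 'outerwear', 200.0),
                            (2, 'Raincoat', 'outerwear', 100.0),
                            (3, 'T-shirt', 'tops', 20.0);
INSERT INTO SalePrice VALUES (1, 80.0), (2, 90.0);
INSERT INTO Problems VALUES (1, 'zipper sticks');
""")

def materialize_view(db):
    """Build the nested XML view; all element names are assumptions."""
    supplier = ET.Element("supplier")
    ET.SubElement(supplier, "company").text = "Acme Clothing"
    # Outer block: from Clothing $c where $c.category = "outerwear"
    products = db.execute(
        "SELECT pid, item, category, price FROM Clothing "
        "WHERE category = 'outerwear' ORDER BY pid").fetchall()
    for pid, item, category, price in products:
        product = ET.SubElement(supplier, "product")
        ET.SubElement(product, "name").text = item
        ET.SubElement(product, "category").text = category
        ET.SubElement(product, "retail").text = str(price)
        # Nested block: from SalePrice $s where $s.pid = $c.pid
        for (sale,) in db.execute(
                "SELECT price FROM SalePrice WHERE pid = ?", (pid,)):
            ET.SubElement(product, "sale").text = str(sale)
        # Nested block: from Problems $p where $p.pid = $c.pid
        for (comments,) in db.execute(
                "SELECT comments FROM Problems WHERE pid = ?", (pid,)):
            ET.SubElement(product, "report").text = comments
    return supplier

view = materialize_view(db)
print(ET.tostring(view, encoding="unicode"))
```

Each nested block becomes a correlated lookup on `pid`, which is why a product with no sale price or no problem report still appears in the view.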

The User Query: XML-QL
construct
    { where
          $company
          $name
          $retail
          $sale
      in
          $sale < 0.5 * $retail
      construct
          $company
          $name
    }
XML-QL user query (U)
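What the user query asks for can be sketched in Python over an already materialized XML document. The document below is hypothetical, and its element names are assumptions chosen to mirror the variables `$company`, `$name`, `$retail` and `$sale` in the query. The filter `$sale < 0.5 * $retail` is the one from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical materialized view; element names are assumptions.
VIEW_XML = """
<supplier>
  <company>Acme Clothing</company>
  <product><name>Parka</name><retail>200.0</retail><sale>80.0</sale></product>
  <product><name>Raincoat</name><retail>100.0</retail><sale>90.0</sale></product>
  <product><name>Fleece</name><retail>60.0</retail></product>
</supplier>
"""

def discounted_items(view):
    """Return (company, name) pairs whose sale price is under half of retail."""
    company = view.findtext("company")
    results = []
    for product in view.findall("product"):
        retail = product.findtext("retail")
        # A product matches once per <sale> element that passes the filter.
        for sale in product.findall("sale"):
            if float(sale.text) < 0.5 * float(retail):
                results.append((company, product.findtext("name")))
    return results

matches = discounted_items(ET.fromstring(VIEW_XML))
print(matches)
```

A product without a `sale` element never matches, because the pattern in the query requires all four bindings to be present.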

The Query Composer
- The composed RXL query is equivalent to the user query evaluated on the materialized view
- The composed query often contains constraints on scalar values that can be evaluated using indexes in the relational database

The Query Composer
construct
    { from Clothing $c, SalePrice $s
      where $c.category = "outerwear",
            $c.pid = $s.pid,
            $s.price < 0.5 * $c.retail
      construct
          "Acme Clothing"
          $c.item
    }
Composed RXL query (C)

Composition Algorithm
- Problem statement (V: view query, U: user query, C: composed query, RDB: relational database, XD: materialized XML document, A: answer):
  - C = U ∘ V
  - XD = V(RDB)
  - A = U(XD) = U(V(RDB))
  - C(RDB) = A = U(V(RDB)) = (U ∘ V)(RDB)
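The identity C(RDB) = U(V(RDB)) can be checked on a toy database. The sketch below is a hand-written illustration in Python and not the composition algorithm itself: the composed query `C` was written by hand as one SQL statement. The schema, the sample rows and the XML element names are assumptions, and the retail price is assumed to be stored in the column `Clothing.price`.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical schema and data; retail price is assumed to be Clothing.price.
rdb = sqlite3.connect(":memory:")
rdb.executescript("""
CREATE TABLE Clothing (pid INTEGER, item TEXT, category TEXT, price REAL);
CREATE TABLE SalePrice (pid INTEGER, price REAL);
INSERT INTO Clothing VALUES (1, 'Parka', 'outerwear', 200.0),
                            (2, 'Raincoat', 'outerwear', 100.0),
                            (3, 'Scarf', 'accessories', 30.0);
INSERT INTO SalePrice VALUES (1, 80.0), (2, 90.0), (3, 10.0);
""")

def V(rdb):
    """View query: materialize the XML document XD = V(RDB)."""
    root = ET.Element("supplier")
    rows = rdb.execute(
        "SELECT pid, item, price FROM Clothing "
        "WHERE category = 'outerwear' ORDER BY pid").fetchall()
    for pid, item, retail in rows:
        product = ET.SubElement(root, "product")
        ET.SubElement(product, "name").text = item
        ET.SubElement(product, "retail").text = str(retail)
        for (sale,) in rdb.execute(
                "SELECT price FROM SalePrice WHERE pid = ? ORDER BY price",
                (pid,)):
            ET.SubElement(product, "sale").text = str(sale)
    return root

def U(xd):
    """User query: names of products on sale for less than half of retail."""
    return [p.findtext("name")
            for p in xd.findall("product")
            for s in p.findall("sale")
            if float(s.text) < 0.5 * float(p.findtext("retail"))]

def C(rdb):
    """Composed query: one SQL statement, no intermediate XML document."""
    return [item for (item,) in rdb.execute(
        "SELECT c.item FROM Clothing c JOIN SalePrice s ON c.pid = s.pid "
        "WHERE c.category = 'outerwear' AND s.price < 0.5 * c.price "
        "ORDER BY c.pid, s.price")]

answer_via_view = U(V(rdb))
answer_composed = C(rdb)
print(answer_via_view, answer_composed)
```

Both paths return the same answer, but `C` never builds XD, and its price filter is an ordinary scalar predicate that the relational engine can evaluate itself.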

Composition Algorithm
- Key idea: match U's patterns on V directly, without constructing XD
- First step: match U's patterns with V's template (the next slide shows V again, with the matched patterns from U highlighted)
- Second step: construct C
  - C's construct clause is the same as U's construct clause, with variable substitutions
  - C's from and where clauses consist of all the "relevant" from and where clauses in V and all the where filters in U, with variable renaming

construct
    "Acme Clothing"                <- matched by U
    { from Clothing $c
      where $c.category = "outerwear"
      construct
          $c.item                  <- matched by U
          $c.category
          $c.price                 <- matched by U
          { from SalePrice $s
            where $s.pid = $c.pid
            construct
                $s.price           <- matched by U
          }
          { from Problems $p
            where $p.pid = $c.pid
            construct
                $p.comments
          }
    }
RXL view query (V), with the patterns from the XML-QL query marked (shown in red on the original slide)

Composition Algorithm
[Figure: diagram of query composition]

Translator and XML Generator
- The translator takes an RXL query and decomposes it into one or more SQL queries and an XML template
  - The initial SilkRoute uses a full partition strategy
  - The IBM research paper: sorted outer union strategy
  - The 2001 SIGMOD paper gives an optimal algorithm
- The XML generator merges the result tuples into an XML document in a single pass
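The pairing of one sorted SQL result with a single-pass XML generator can be sketched as follows. This is a simplified Python illustration of the sorted outer union idea and not the algorithm from either paper. The schema, the sample rows, the `kind` discriminator column and the XML element names are all assumptions.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical schema and data for illustration.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE Clothing (pid INTEGER, item TEXT, category TEXT);
CREATE TABLE SalePrice (pid INTEGER, price REAL);
CREATE TABLE Problems (pid INTEGER, comments TEXT);
INSERT INTO Clothing VALUES (1, 'Parka', 'outerwear'),
                            (2, 'Raincoat', 'outerwear');
INSERT INTO SalePrice VALUES (1, 80.0), (2, 90.0);
INSERT INTO Problems VALUES (2, 'seam leaks'), (1, 'zipper sticks');
""")

# One SQL query: each branch pads the columns it does not use with NULL.
# Sorting on (pid, kind) places every product row directly before its
# child rows.
SORTED_OUTER_UNION = """
SELECT pid, 0 AS kind, item AS name, NULL AS sale, NULL AS report
  FROM Clothing WHERE category = 'outerwear'
UNION ALL
SELECT s.pid, 1, NULL, s.price, NULL
  FROM SalePrice s JOIN Clothing c ON c.pid = s.pid
  WHERE c.category = 'outerwear'
UNION ALL
SELECT p.pid, 2, NULL, NULL, p.comments
  FROM Problems p JOIN Clothing c ON c.pid = p.pid
  WHERE c.category = 'outerwear'
ORDER BY 1, 2
"""

def tag_rows(rows):
    """Merge sorted tuples into XML in a single pass over the result."""
    root = ET.Element("supplier")
    product = None
    for pid, kind, name, sale, report in rows:
        if kind == 0:        # a new product starts here
            product = ET.SubElement(root, "product")
            ET.SubElement(product, "name").text = name
        elif kind == 1:      # sale row belonging to the current product
            ET.SubElement(product, "sale").text = str(sale)
        else:                # problem report for the current product
            ET.SubElement(product, "report").text = report
    return root

doc = tag_rows(db.execute(SORTED_OUTER_UNION))
print(ET.tostring(doc, encoding="unicode"))
```

Because the rows arrive in document order, the generator only needs to remember the current product, so memory use does not grow with the size of the result.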

Other Scenarios
- Minor changes to the information flow permit other scenarios
- Export the entire database as one large XML document by materializing the view query
- The result of query composition can be kept virtual for later composition with other user queries

Alternative Approaches
- Materialized XML view
  - Precompute or compute on demand
  - Feasible when the XML view is small and the application needs to load the entire view in memory
  - Data may become stale
- Use a native XML database engine
  - Stanford DB group: Lore project
  - One can materialize an XML view using SilkRoute and store the result in an XML engine
  - Avoids the cost of query composition
  - Performance is unlikely to compete with SQL engines anytime soon
  - Cannot guarantee data freshness, and incurs a high space cost

XML and Spreadsheets
- XML support in Microsoft Excel 2002 for Office XP
- "...these new features mean that Microsoft Excel is set to play an important role in any organization's application environment."
- Bidirectional transformation
  - Excel can recognize and open XML documents, including XSL processing
- XML flattening
  - Any Excel spreadsheet can be saved as an XML file while preserving "the new XML Spreadsheet file format"
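As a rough illustration of what flattening means in general, the Python sketch below turns each repeating element of a nested XML document into one spreadsheet-style row and writes the result as CSV. It does not reproduce Excel's behavior. The document, its element names and the `flatten` helper are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical XML view; element names are assumptions.
VIEW_XML = """
<supplier>
  <product><name>Parka</name><retail>200.0</retail><sale>80.0</sale></product>
  <product><name>Raincoat</name><retail>100.0</retail></product>
</supplier>
"""

def flatten(xml_text, row_tag, columns):
    """Turn each <row_tag> element into one row; missing children become ''."""
    root = ET.fromstring(xml_text)
    return [[elem.findtext(col, default="") for col in columns]
            for elem in root.iter(row_tag)]

columns = ["name", "retail", "sale"]
rows = flatten(VIEW_XML, "product", columns)

# Write the rows as CSV, a format any spreadsheet can open.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(columns)
writer.writerows(rows)
sheet = buffer.getvalue()
print(sheet)
```

Optional elements are the main wrinkle: a product without a sale price still needs a cell in the `sale` column, so the sketch fills it with an empty string.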

XML and Spreadsheets
- Microsoft enables spreadsheets to manipulate XML documents
- We can enable spreadsheets to manipulate relational data using XML views!

XML and Spreadsheets
- Can we execute spreadsheet-style processing directly on XML files?
- XML is hierarchical and unnormalized, which is exactly what people would like to see in spreadsheets
- Given a system like SilkRoute, can we define a mapping from spreadsheet functions to XML views?
  - Think of function executions as XML view queries
  - Defining one function in terms of another is similar to defining a new XML view from an existing composed view
  - This will also raise challenges for view generation systems

XML and OLAP (OnLine Analytical Processing)
- Physically integrating unexpected data into OLAP systems is time-consuming
- Logical integration is the better choice
- XML’s increasing use in data exchange suggests that the required data can be made available through XML views
- Possibilities:
  - Reference external XML data in OLAP queries
  - Present XML data along with dimensional data in the result of an OLAP query
  - Use XML data for selection and grouping
- Microsoft and Hyperion published the "Open XML for Analysis Specification" in April 2001
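The last possibility, using XML data for grouping, can be sketched in a few lines of Python. This is a toy illustration of logical integration and not an OLAP engine or the XML for Analysis protocol. The fact rows, the XML document and all names in it are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical fact rows as an OLAP system might hold them: (store, amount).
SALES_FACTS = [("S1", 120.0), ("S2", 75.0), ("S3", 40.0), ("S1", 30.0)]

# Hypothetical external XML describing the same stores.
STORES_XML = """
<stores>
  <store id="S1"><region>North</region></store>
  <store id="S2"><region>South</region></store>
  <store id="S3"><region>North</region></store>
</stores>
"""

def total_by_region(facts, stores_xml):
    """Group fact amounts by a region attribute that lives only in the XML."""
    region_of = {s.get("id"): s.findtext("region")
                 for s in ET.fromstring(stores_xml).iter("store")}
    totals = defaultdict(float)
    for store, amount in facts:
        totals[region_of[store]] += amount
    return dict(totals)

totals = total_by_region(SALES_FACTS, STORES_XML)
print(totals)
```

The region never has to be loaded into the warehouse: it is looked up from the XML at query time, which is the sense in which the integration is logical and not physical.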