CSE 636 Data Integration XML Distributed Query Processing Slides by Yannis Papakonstantinou.

Slide 2: Overview
- The Virtual XML View Approach towards Data Integration
- Query Processing in XML Mediators
  - Issues Overview
  - An Algebra-Based Architecture
  - Navigation-Driven Evaluation

Slide 3: Data Integration Requirements in eBusiness Applications
It starts with: “Provide Application X to customers, partners, and employees,” where X may be in Business Intelligence, Customer Support, …
Then the problem comes up: “The application uses information assets widely distributed across my enterprise.”
If only: “Give the application a single place to go to access all the information it requires. Requirements are evolving, so make sure the system can be easily maintained and upgraded.”

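Slide 4: View-Based Approach: Wrappers Export Basic Source Views
Figure: a client application queries the integrated (XML) view exported by a mediator; the mediator uses the (XML) views exported by wrappers over two sources, an Orders relational DB and a Customers relational DB.
The Customers wrapper turns the relational rows (John, 56, Chicago), (George, 58, Chicago), … into the XML view customer_table, with one customer element per row: customer (name John, id 56, city Chicago), customer (name George, id 58, city Chicago), …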
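The export direction of such a wrapper can be sketched in a few lines of Python. This is a minimal sketch that assumes the Customers table arrives as a list of (name, id, city) rows and uses the standard-library xml.etree.ElementTree module. The function name export_customer_table is hypothetical.

```python
import xml.etree.ElementTree as ET

def export_customer_table(rows):
    """Hypothetical wrapper: expose (name, id, city) rows as the customer_table XML view."""
    table = ET.Element("customer_table")
    for name, cust_id, city in rows:
        customer = ET.SubElement(table, "customer")
        # One child element per column; XML text content must be a string.
        for tag, value in (("name", name), ("id", cust_id), ("city", city)):
            ET.SubElement(customer, tag).text = str(value)
    return table

rows = [("John", 56, "Chicago"), ("George", 58, "Chicago")]
print(ET.tostring(export_customer_table(rows), encoding="unicode"))
```

The sketch covers only the export side. Translating the mediator's queries into SQL against the underlying table is not shown.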

Slide 5: Wrappers Export Basic Source Views
The Orders wrapper exports order_table, with one order element per row: order (id 1034, cid 56, item chips), order (id 1567, cid 56, item salsa), …
Figure: the same architecture (client application, mediator with integrated XML view, wrappers with XML views, Orders and Customers relational DBs).

Slide 6: Mediators Export Integrated Views, Tailored to Application Needs
Source views: order_table with order (id 1034, cid 56, item chips), order (id 1567, cid 56, item salsa), …; customer_table with customer (name John, id 56, city Chicago), customer (name George, id 58, city Chicago), …
The mediator's integrated view nests each customer's orders under that customer: customers → customer (name John, id 56, city Chicago, orders → order (id 1034, item chips), order …), customer …
Figure: the same architecture (client application, mediator, wrappers, Orders and Customers relational DBs).

Slide 7: Virtual Views: Query-Driven Mediator Operation
The application asks the mediator: “Find all Chicago customer names, along with their ordered items.” The mediator then sends two queries through the wrappers. To the Customers database it sends “Retrieve Chicago customer names and ids,” and to the Orders database it sends “Retrieve all cids and item names of orders.”

Slide 8: On-Demand (Query-Driven) Mediator Operation
The wrappers return customer (name John, id 56), … and order (cid 56, item chips), order (cid 56, item salsa), … The mediator combines them and returns the following to the application: customers → customer (name John, ordered_items → item chips, item salsa), customer …
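The query-driven operation of Slides 7 and 8 can be sketched in Python as follows. The two wrapper functions stand in for the source queries “Retrieve Chicago customer names and ids” and “Retrieve all cids and item names of orders.” Their names, the in-memory data, and the hash-based combination step are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical source contents, following the slides.
CUSTOMERS = [("John", 56, "Chicago"), ("George", 58, "Chicago")]  # (name, id, city)
ORDERS = [(1034, 56, "chips"), (1567, 56, "salsa")]              # (id, cid, item)

def customers_wrapper():
    """Source query 1: Chicago customer names and ids."""
    return [(name, cid) for name, cid, city in CUSTOMERS if city == "Chicago"]

def orders_wrapper():
    """Source query 2: cid and item name of every order."""
    return [(cid, item) for _oid, cid, item in ORDERS]

def mediator_query():
    """Find all Chicago customer names, along with their ordered items."""
    items_by_cid = {}
    for cid, item in orders_wrapper():
        items_by_cid.setdefault(cid, []).append(item)
    result = ET.Element("customers")
    for name, cid in customers_wrapper():
        customer = ET.SubElement(result, "customer")
        ET.SubElement(customer, "name").text = name
        ordered = ET.SubElement(customer, "ordered_items")
        for item in items_by_cid.get(cid, []):
            ET.SubElement(ordered, "item").text = item
    return result
```

In this sketch George, who has no orders, appears with an empty ordered_items element. An inner-join reading of the query would drop him instead.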

Slide 9: Multiple Plans are Possible
For example, instead of retrieving all orders: retrieve the customers, then, for each customer, find the matching orders.
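The two plans can be compared by counting source queries. In this sketch the orders source is assumed to accept an optional cid restriction. The data and function names are hypothetical.

```python
# Hypothetical sources; every wrapper call counts as one source query.
CUSTOMERS = [("John", 56), ("George", 58)]
ORDERS = [(1034, 56, "chips"), (1567, 56, "salsa")]  # (id, cid, item)
calls = {"customers": 0, "orders": 0}

def get_customers():
    calls["customers"] += 1
    return list(CUSTOMERS)

def get_orders(cid=None):
    """Orders source; the optional cid restriction is an assumed capability."""
    calls["orders"] += 1
    return [o for o in ORDERS if cid is None or o[1] == cid]

def plan_join_in_mediator():
    """Plan A: fetch all orders once, then join in the mediator via a hash table."""
    by_cid = {}
    for _oid, cid, item in get_orders():
        by_cid.setdefault(cid, []).append(item)
    return {name: by_cid.get(cid, []) for name, cid in get_customers()}

def plan_per_customer():
    """Plan B: retrieve customers, then for each customer find matching orders."""
    return {name: [item for _, _, item in get_orders(cid)] for name, cid in get_customers()}
```

Plan A issues one query per source but ships every order. Plan B ships only the matching orders but issues one orders query per customer. Which plan is cheaper depends on the costs the optimizer estimates.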

Slide 10: A New Kind of Query Processing Problem
Build and run an “optimal” plan, consisting of operators that
- collect source information using the queries and commands the sources support, and
- combine that information into the XML result.

Slide 11: Challenges in Query Processing & Optimization
- Operate within the limited and different capabilities of the sources
  - Describe sets of supported queries
  - Use the most efficient supported queries
- Optimize plans/queries sent to sources
  - Estimate costs of plans
  - Adapt plans along the way
  - Beyond conjunctive queries
  - Compose queries/views efficiently
    - Schema inference & optimization
    - Combine navigation & querying

Slide 12: From Limited Wrappers to Efficient Plans for Extended Query Sets
Figure: for a given source's data and schema, the queries supported by the wrapper form a small subset of all queries over the schema. The mediator supports a larger set by building plans on top of the wrapper's queries.
- Answering queries using views, but with infinite sets of views
- Increasing relevance due to Web services
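The gap between the wrapper's query set and the mediator's query set can be illustrated with a toy capability description. The description format below is an assumption, not the slides' description language. The wrapper answers only a selection that binds one of a few attributes to a constant, and the mediator extends this to conjunctions by filtering locally. Order 1601 is made-up data.

```python
# Hypothetical capability description: the orders wrapper answers only
# selections that bind exactly one of these attributes to a constant.
SUPPORTED_BINDINGS = {"cid", "item"}
ORDERS = [
    {"id": 1034, "cid": 56, "item": "chips"},
    {"id": 1567, "cid": 56, "item": "salsa"},
    {"id": 1601, "cid": 58, "item": "salsa"},  # made-up row
]

def wrapper_query(attr, value):
    """The only query form the wrapper accepts."""
    if attr not in SUPPORTED_BINDINGS:
        raise ValueError(f"unsupported binding: {attr}")
    return [o for o in ORDERS if o[attr] == value]

def mediator_query(conditions):
    """Answer a conjunction of equalities: one supported wrapper query,
    followed by local filtering in the mediator for the remaining conditions."""
    pushable = [a for a in conditions if a in SUPPORTED_BINDINGS]
    if not pushable:
        raise ValueError("no supported query covers these conditions")
    first = pushable[0]
    candidates = wrapper_query(first, conditions[first])
    return [o for o in candidates if all(o[a] == v for a, v in conditions.items())]
```

Always pushing the first supported condition is naive. Choosing the most selective supported query is where cost estimation enters.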

Slide 13: Challenges in Query Processing & Optimization
- Operate within the limited and different capabilities of the sources
  - Describe sets of supported queries
  - Use the most efficient supported queries
- Optimize plans/queries sent to sources
  - Estimate costs of plans
  - Adapt plans along the way
  - Beyond conjunctive queries
  - XQuery processing
    - Schema inference & optimization
    - Combine navigation & querying
  - Build iterator models for low memory footprint

Slide 14: Navigation-Driven Evaluation of Query Result
Figure: the source views order_table and customer_table and the integrated customers view, as on Slide 6.

Slide 15: Navigation-Driven Evaluation
A lazy mediator with view definition ans = q(s1, …, sn) sits between the client and the XML sources s1 … sn.
- Input: client navigations on the result, e.g. down(p) and right(p) from a node p
- Output: source navigations
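Navigation-driven evaluation can be sketched with Python generators. Here down(p) and right(p) follow the slide's navigation commands, but the LazyNode class and the generator-based source are assumptions for illustration. Each result node is materialized only when a navigation first reaches it, so the source is read no further than the client has navigated.

```python
class LazyNode:
    """A result node whose children come from a generator, one at a time."""

    def __init__(self, label, make_children=lambda: iter(())):
        self.label = label
        self._make_children = make_children  # called on the first down() only
        self._siblings = None                # iterator yielding the following siblings
        self._child = self._right = None     # cached navigation results
        self._down_done = self._right_done = False


def _next_node(it):
    """Pull one node from a sibling iterator and remember where to continue."""
    node = next(it, None)
    if node is not None:
        node._siblings = it
    return node


def down(p):
    """First child of p; touches the source only the first time."""
    if not p._down_done:
        p._down_done = True
        p._child = _next_node(p._make_children())
    return p._child


def right(p):
    """Next sibling of p; advances the shared iterator by at most one node."""
    if not p._right_done:
        p._right_done = True
        p._right = _next_node(p._siblings) if p._siblings is not None else None
    return p._right


fetched = []  # records each source navigation, to make laziness visible


def customers_source():
    for name in ["John", "George", "Mary"]:  # "Mary" is made-up data
        fetched.append(name)
        yield LazyNode(name)


root = LazyNode("customers", customers_source)
```

Caching the results of down and right means that repeating a navigation does not touch the source again.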

Slide 20: Mixing Querying & Navigation
Figure: the integrated customers view (customer with name John, id 56, city Chicago, orders → order (id 1034, item chips), order …; customer …).
Query issued at a visited node: “Find details of all salsa orders below the visited node.”

Slide 21: Challenges in Mixing Querying & Navigation
- Two-dimensional navigation
  - Reminiscent of cursors, but with multiple continuation points
- Controlling size + shape
- Contextualizing queries by navigation
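A query contextualized by navigation can be sketched with ElementTree. The query from Slide 20 is evaluated only over the subtree of the node the client has reached. The view is fully materialized here for simplicity, and George's order 1601 is made-up data.

```python
import xml.etree.ElementTree as ET

# The integrated view of Slide 20; George's order 1601 is made-up data.
VIEW = ET.fromstring(
    "<customers>"
    "<customer><name>John</name><id>56</id><city>Chicago</city>"
    "<orders><order><id>1034</id><item>chips</item></order>"
    "<order><id>1567</id><item>salsa</item></order></orders></customer>"
    "<customer><name>George</name><id>58</id><city>Chicago</city>"
    "<orders><order><id>1601</id><item>salsa</item></order></orders></customer>"
    "</customers>"
)

def salsa_orders_below(node):
    """Query contextualized by navigation: look only inside the visited node's subtree."""
    return [o.findtext("id") for o in node.iter("order") if o.findtext("item") == "salsa"]

visited = VIEW[0]  # the client has navigated down to John's customer element
```

A lazy mediator would have to produce only the visited subtree on demand, which is part of the challenge listed on Slide 21.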

Slide 22: Overview
- The Virtual XML View Approach towards Data Integration
- Query Processing in XML Mediators
  - Issues Overview
  - An Algebra-Based Architecture
  - Navigation-Driven Evaluation

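Slide 23: An Algebra-Based Query Processor Architecture
Figure: an XQuery (over XQuery views) goes through Translation to Algebra, which yields an algebra plan. The Rewriter/Optimizer uses source descriptions, function descriptions, and source schemas & types to turn that plan into a physical algebra plan. The Plan Execution Engine runs the physical plan: it sends queries & fetch requests to the sources (and calls functions), and it answers the client's navigation requests with results.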

Slide 24: Query Processing on Tuple-Oriented Algebra Enables…
- Well-known efficient physical implementations of the operators
- Join optimization
- Nested data by nested plans or group-by
- Efficient iterator model
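The iterator model mentioned above is the classic open/next/close protocol. Each operator returns one tuple per next() call, so a pipeline needs to hold only the current tuple rather than whole intermediate results. The following is a minimal Python sketch with a scan and a selection. The class names are illustrative and do not claim to match the platform's operator set.

```python
class Scan:
    """Leaf operator over an in-memory list of tuples (dicts)."""
    def __init__(self, rows):
        self.rows = rows

    def open(self):
        self.pos = 0

    def next(self):
        if self.pos >= len(self.rows):
            return None                 # end of stream
        row = self.rows[self.pos]
        self.pos += 1
        return row

    def close(self):
        pass


class Select:
    """Pipelined selection: pulls from its child until a tuple qualifies."""
    def __init__(self, child, pred):
        self.child, self.pred = child, pred

    def open(self):
        self.child.open()

    def next(self):
        while (row := self.child.next()) is not None:
            if self.pred(row):
                return row
        return None

    def close(self):
        self.child.close()


def run(plan):
    """Drive a plan to completion, one next() call per output tuple."""
    plan.open()
    out = []
    while (row := plan.next()) is not None:
        out.append(row)
    plan.close()
    return out
```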

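Slide 25: XQuery: Queries & Views for XML
    (: Element names are reconstructed: customers/customer follow the view on
       Slide 6, and order follows the crEl order constructor on Slide 30.
       doc() is the standard name of the early-draft document() function. :)
    <customers>{
      for $cust in doc("db")/customer
      return <customer>{
        $cust/id,
        for $order in doc("db")/order
        where $order/cid = $cust/id
        return <order>{ $order/id }</order>
      }</customer>
    }</customers>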
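Read naively, the nested FLWOR expression on Slide 25 is a nested loop: for each customer, scan all orders and keep those whose cid equals the customer's id. The Python sketch below mirrors that reading with ElementTree. It assumes that customer and order elements are children of one root element and reuses the assumed element names customers, customer, and order.

```python
import copy
import xml.etree.ElementTree as ET

# Assumed layout: customer and order elements are children of one root element.
DB = ET.fromstring(
    "<db>"
    "<customer><name>John</name><id>56</id></customer>"
    "<customer><name>George</name><id>58</id></customer>"
    "<order><id>1034</id><cid>56</cid><item>chips</item></order>"
    "<order><id>1567</id><cid>56</cid><item>salsa</item></order>"
    "</db>"
)

def evaluate():
    result = ET.Element("customers")
    for cust in DB.findall("customer"):                      # for $cust in ...
        out = ET.SubElement(result, "customer")
        out.append(copy.deepcopy(cust.find("id")))            # $cust/id (constructors copy nodes)
        for order in DB.findall("order"):                     # inner for $order in ...
            if order.findtext("cid") == cust.findtext("id"):  # where $order/cid = $cust/id
                o = ET.SubElement(out, "order")
                o.append(copy.deepcopy(order.find("id")))     # <order>{ $order/id }</order>
    return result
```

George still appears in the result, with no order children, because the inner for returns an empty sequence rather than filtering out the customer. A plain join would lose him.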

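Slide 26: Access and Navigation
Source data (customer_table): customer (name John, id 56), customer (name George, id 58). Each operator extends a table of variable bindings; c1 and c2 are the customer nodes, and i1 and i2 are their id nodes.
source db, [$db1] binds $db1 to ct (the customer_table node):
  $db1
  ct
getD $db1, customer → $cust yields one tuple per customer:
  $db1 | $cust
  ct   | c1
  ct   | c2
getD $cust, id → $cust_id adds each customer's id:
  $db1 | $cust | $cust_id
  ct   | c1    | i1
  ct   | c2    | i2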

Slide 27: Simplification Using Schema Inference
The two getD steps of Slide 26 collapse into a single step, getD $db1, customer/id → $cust_id, applied to source db, [$db1]:
  $db1 | $cust_id
  ct   | i1
  ct   | i2
This is valid since $cust_id → $cust, and $cust is “useless” otherwise.
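The binding-table semantics of source and getD can be sketched in Python, with tables represented as lists of dicts from variable names to XML nodes. The function names mirror the slides' operators, but their signatures are assumptions, and path steps use ElementTree's limited XPath support (findall). The last lines show the simplification: the one-step customer/id navigation yields the same ($db1, $cust_id) tuples as the two-step plan with $cust projected out.

```python
import xml.etree.ElementTree as ET

CT = ET.fromstring(
    "<customer_table>"
    "<customer><name>John</name><id>56</id></customer>"
    "<customer><name>George</name><id>58</id></customer>"
    "</customer_table>"
)

def source(doc, var):
    """source db, [var]: a one-tuple table binding var to the source's root node."""
    return [{var: doc}]

def getD(table, var, path, out):
    """getD var, path -> out: one output tuple per node reached from var via path."""
    return [{**t, out: node} for t in table for node in t[var].findall(path)]

# Two navigation steps, as on Slide 26.
two_step = getD(getD(source(CT, "$db1"), "$db1", "customer", "$cust"),
                "$cust", "id", "$cust_id")
# One combined step, as on Slide 27 ($cust is never bound).
one_step = getD(source(CT, "$db1"), "$db1", "customer/id", "$cust_id")
```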

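Slide 28: Nested Plans
Input:
  $db1 | $cust_id
  ct   | i1
  ct   | i2
for $part binds each input tuple, as a one-tuple table, to $part:
  $db1 | $cust_id | $part
  ct   | i1       | {(ct, i1)}
  ct   | i2       | {(ct, i2)}
apply $part, p → $orders runs plan p once per tuple. Plan p starts with nestedSrc $part, which reads that tuple, e.g. (ct, i1), … The output binds $orders to p's result:
  $db1 | $cust_id | $orders
  ct   | i1       | [o11, …]
  ct   | i2       | [o21, …]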

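Slide 29: Joins and Selections
Inside plan p:
- nestedSrc $part yields ($db1, $cust_id) = (ct, i1).
- source db, [$db2], followed by getD $db2, order → $order, getD $order, cid → $cust_id2, and getD $order, id → $order_id, yields tuples ($db2, $order, $cust_id2, $order_id), …
- The two are combined on the condition $cust_id2 = $cust_id, which gives tuples ($db1, $cust_id, $db2, $order, $cust_id2, $order_id), …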
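Slides 28 and 29 together can be sketched in Python as follows. The for $part operator wraps each tuple, apply runs a nested plan per tuple, and the nested plan joins its nestedSrc input with the orders on $cust_id2 = $cust_id. Node values are plain strings standing in for the nodes i1, o11, and so on. The ORDERS table, which plays the role of the source and getD steps of Slide 29, and the function signatures are assumptions.

```python
# Binding tables are lists of dicts; strings stand in for XML nodes (hypothetical).
OUTER = [{"$db1": "ct", "$cust_id": "56"}, {"$db1": "ct", "$cust_id": "58"}]
ORDERS = [  # what source db, [$db2] plus the getD steps would produce
    {"$db2": "ot", "$order": "o1", "$cust_id2": "56", "$order_id": "1034"},
    {"$db2": "ot", "$order": "o2", "$cust_id2": "56", "$order_id": "1567"},
]

def for_part(table, part):
    """for $part: bind each tuple, as a one-tuple table, to part."""
    return [{**t, part: [t]} for t in table]

def nested_src(outer_tuple, part):
    """nestedSrc $part: read the table bound to part."""
    return outer_tuple[part]

def join(left, right, cond):
    """Nested-loop join: concatenations of tuple pairs that satisfy cond."""
    return [{**l, **r} for l in left for r in right if cond(l, r)]

def plan_p(outer_tuple):
    """Nested plan p: the orders of the current customer."""
    part = nested_src(outer_tuple, "$part")
    return join(part, ORDERS, lambda l, r: r["$cust_id2"] == l["$cust_id"])

def apply(table, part, plan, out):
    """apply $part, p -> out: run the nested plan once per tuple; drop part."""
    return [{**{k: v for k, v in t.items() if k != part}, out: plan(t)} for t in table]

result = apply(for_part(OUTER, "$part"), "$part", plan_p, "$orders")
```

Customer 58 keeps its tuple with an empty $orders list. This is the property that any unnesting rewrite of the plan has to preserve.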

Slide 30: Constructors
crList $order_id → $oidL wraps each value in a list:
  $order_id | $oidL
  o1        | [o1]
  o2        | [o2]
crEl order, $oidL → $oidE builds a new order element (e1, e2, …) from each list:
  $oidL | $oidE
  [o1]  | e1
  [o2]  | e2
listify $oidE → $orders collects the column into a single list:
  $orders
  [e1, e2]
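The three constructors compose as in this Python sketch over binding tables (lists of dicts). The function names mirror crList, crEl, and listify, but their signatures are assumptions. Element content is deep-copied, as XML constructors copy their content.

```python
import copy
import xml.etree.ElementTree as ET

def cr_list(table, var, out):
    """crList var -> out: wrap each tuple's value in a one-item list."""
    return [{**t, out: [t[var]]} for t in table]

def cr_el(table, tag, var, out):
    """crEl tag, var -> out: build a new element whose content is the list in var."""
    rows = []
    for t in table:
        el = ET.Element(tag)
        el.extend([copy.deepcopy(item) for item in t[var]])  # constructors copy content
        rows.append({**t, out: el})
    return rows

def listify(table, var, out):
    """listify var -> out: collapse the whole column into a single list value."""
    return [{out: [t[var] for t in table]}]

ids = [ET.fromstring("<id>1034</id>"), ET.fromstring("<id>1567</id>")]
table = [{"$order_id": i} for i in ids]
orders = listify(cr_el(cr_list(table, "$order_id", "$oidL"), "order", "$oidL", "$oidE"),
                 "$oidE", "$orders")
```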

Slide 31: Algebra Example

Slide 32: Plan Decomposition Within the Rewriting Optimizer
- Rules replacing “leaf” trees
- May move commutable parts
- Catch: no projection limitation

Slide 33: Plan After Decomposition

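Slide 34: Replacing Nested Plans with GroupBy/Outerjoin Combinations
Figure: a plan over subplans p1, p2, and p3 that uses for $part followed by apply $part, p → $R (with a nested plan starting at nestedSrc $part) is rewritten into a plan in which groupBy S(p1) → $part replaces for $part, again followed by apply $part, p → $R with nestedSrc $part.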
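A small sketch shows why the combination uses an outer join. A plain join would lose customer 58, who has no orders, while the nested plan returns that customer with an empty list. The sketch below is a hypothetical simplification (one grouping key, flat string values), not the optimizer's actual rewrite rule. It checks that a left outer join followed by group-by reproduces the nested plan's output.

```python
CUSTOMERS = [{"cust_id": "56"}, {"cust_id": "58"}]
ORDERS = [{"cid": "56", "order_id": "1034"}, {"cid": "56", "order_id": "1567"}]

def nested_plan():
    """Per customer, evaluate the nested plan (a filtered scan of ORDERS)."""
    return [{**c, "orders": [o["order_id"] for o in ORDERS if o["cid"] == c["cust_id"]]}
            for c in CUSTOMERS]

def left_outer_join(left, right, cond):
    """Keep every left tuple; pad with None when no right tuple matches."""
    out = []
    for l in left:
        matches = [r for r in right if cond(l, r)]
        if matches:
            out.extend({**l, **r} for r in matches)
        else:
            out.append({**l, "cid": None, "order_id": None})
    return out

def group_by(table, keys, var, out):
    """Group on keys, collecting the non-None values of var into a list."""
    groups = {}
    for t in table:
        k = tuple(t[key] for key in keys)
        groups.setdefault(k, [])
        if t[var] is not None:
            groups[k].append(t[var])
    return [{**dict(zip(keys, k)), out: v} for k, v in groups.items()]

def unnested_plan():
    joined = left_outer_join(CUSTOMERS, ORDERS, lambda c, o: o["cid"] == c["cust_id"])
    return group_by(joined, ["cust_id"], "order_id", "orders")
```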

Slide 35: Multiple Possible Plans

Slide 36: Overview
- The Virtual XML View Approach towards Data Integration
- Query Processing in XML Mediators
  - Issues Overview
  - An Algebra-Based Architecture
  - Navigation-Driven Evaluation

Slide 37: Building Navigation-Driven Evaluation on the Algebra
Figure: the client navigates the result of an algebra plan whose leaves are source access operators over the source.

Slide 38: Think of Each Operator as a Lazy Mediator
Source data as on Slide 26: customer_table with customer nodes c1 and c2 and their id nodes i1 and i2.
getD $cust, id → $cust_id turns
  $db1 | $cust
  ct   | c1
  ct   | c2
into
  $db1 | $cust | $cust_id
  ct   | c1    | i1
  ct   | c2    | i2
The output table is exposed as a navigable tree: a root whose children are tuple nodes, each with children $db1, $cust, and $cust_id.

Slide 39: Navigation-Driven Evaluation of Operators
A lazy operator sits on top of the results of the operators below it (s1 … sn).
- Input: client navigations on its result
- Output: navigations on the results of the operators below
These navigations are augmented with nextTuple(p) and p.attr.
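A lazy operator can be sketched as follows. Each nextTuple(p) call from above pulls from the operator below only when the tuples derived so far are used up, and p.attr is a plain field lookup. The names LazyGetD, next_tuple, and CountingInput are hypothetical, and the sketch covers only the tuple-level interface, not navigation into attribute values.

```python
class CountingInput:
    """Stands in for the operator below: serves tuples on request and counts requests."""
    def __init__(self, rows):
        self._it = iter(rows)
        self.pulls = 0

    def next_tuple(self):
        self.pulls += 1
        return next(self._it, None)


class LazyGetD:
    """Lazy version of getD var, child -> out: output is computed only when asked for."""
    def __init__(self, below, var, children_of, out):
        self.below, self.var, self.children_of, self.out = below, var, children_of, out
        self._pending = iter(())  # output tuples derived from the current input tuple

    def next_tuple(self):
        while True:
            t = next(self._pending, None)
            if t is not None:
                return t
            inp = self.below.next_tuple()   # pull from below only when needed
            if inp is None:
                return None
            self._pending = iter([{**inp, self.out: c}
                                  for c in self.children_of(inp[self.var])])


# Hypothetical data: customer c3 and the id names are made up.
ID_OF = {"c1": ["i1"], "c2": ["i2"], "c3": ["i3"]}
below = CountingInput([{"$db1": "ct", "$cust": c} for c in ("c1", "c2", "c3")])
op = LazyGetD(below, "$cust", lambda c: ID_OF[c], "$cust_id")
p = op.next_tuple()
print(p["$cust_id"], below.pulls)  # p.attr is a plain lookup
```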

Slide 40: Use of Semantic Ids in Navigation-Driven Evaluation
A navigation command r/d( ) (right or down) arrives on a node whose id encodes the operator state: V1 = f1, V2 = f2, …, Vn = fn, plus other state. Proceeding down or right yields a new operator state, V1 = f'1, V2 = f'2, …, Vn = f'n.
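One way to read this slide is that a node's id encodes the operator state it was produced from. Proceeding right from that node then requires only decoding the id and computing the next state, so no per-client cursor has to be kept. In this hypothetical sketch, the state of a getD-like operator is the pair (customer position, id position). That encoding is an assumption for illustration, not the encoding used in the references.

```python
# Hypothetical operator: output tuples pair a customer with one of its ids,
# as for getD $cust, id -> $cust_id. The data is made up.
CUSTOMERS = ["c1", "c2", "c3"]
IDS_OF = {"c1": ["i1"], "c2": [], "c3": ["i3a", "i3b"]}

def produce(state):
    """Decode a semantic id (cust_pos, id_pos) back into the tuple it names."""
    cust_pos, id_pos = state
    cust = CUSTOMERS[cust_pos]
    return {"$cust": cust, "$cust_id": IDS_OF[cust][id_pos]}

def first_from(cust_pos):
    """Smallest valid state at or after customer position cust_pos, or None."""
    for pos in range(cust_pos, len(CUSTOMERS)):
        if IDS_OF[CUSTOMERS[pos]]:
            return (pos, 0)
    return None

def down_root():
    """Down from the root: the id of the first output tuple."""
    return first_from(0)

def right(state):
    """Proceed right: compute the next state from the id alone."""
    cust_pos, id_pos = state
    if id_pos + 1 < len(IDS_OF[CUSTOMERS[cust_pos]]):
        return (cust_pos, id_pos + 1)
    return first_from(cust_pos + 1)
```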

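Slide 41: Fragments Reduce the “Set State” – “Produce State” Overhead
Figure: a fragment of the result containing root, customer (name, “John”), order (oid, 123), and lineitem nodes; unexpanded parts are marked Hole 1, Hole 2, and Hole 3.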

Slide 42: Fragments Reduce the “Set State” – “Produce State” Overhead (continued)
Figure: the fragment from Slide 41, plus a second fragment containing an order (ordnum=16) and a lineitem, with its own holes, Hole 4 and Hole 5.
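Fragments can be sketched as subtrees shipped up to a node budget. The unshipped remainder is replaced by numbered holes that the client can later ask to fill. The tree shape, the preorder budget rule, and the fragment/fill functions are assumptions for illustration, since the slides do not specify how fragment boundaries are chosen.

```python
import itertools

# Hypothetical result tree as (label, children); labels loosely follow Slides 41-42.
TREE = ("root", [
    ("customer", [
        ("name John", []),
        ("order", [("oid 123", []), ("lineitem", [])]),
        ("order", [("ordnum 16", []), ("lineitem", [])]),
    ]),
])

_hole_ids = itertools.count(1)
HOLES = {}  # server side: hole id -> subtree not yet shipped

def fragment(node, budget):
    """Ship up to `budget` nodes in preorder; every unshipped subtree becomes a hole."""
    remaining = budget

    def ship(n):
        nonlocal remaining
        label, children = n
        remaining -= 1
        out = []
        for child in children:
            if remaining > 0:
                out.append(ship(child))
            else:
                hole_id = next(_hole_ids)
                HOLES[hole_id] = child
                out.append(("Hole", hole_id))
        return (label, out)

    return ship(node)

def fill(hole_id, budget):
    """Client asks for a hole: the answer is a new fragment, possibly with new holes."""
    return fragment(HOLES.pop(hole_id), budget)

f = fragment(TREE, 3)
```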

Slide 43: Controlling the Size and Shape of Fragments
Figure: a client-server interaction controller sits between the client and a plan in which listify sits above source access operators over the source.

Slide 44: Fragment size determines memory footprint, which in turn determines performance.

Slide 45: Fragmentation Strategies
- Fixed fragment size
  - Ideal for depth-first, left-to-right navigation
- Adaptive fragment size
  - Assign larger pieces to those who use them
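The two strategies can be contrasted with a toy sizing policy. The doubling rule and the cap below are assumptions, not the rule from the references. They only show how “larger pieces to those who use them” could be tracked per client, while the cap bounds the memory footprint of any single fragment.

```python
class FixedPolicy:
    """Every hole is filled with a fragment of the same size."""
    def __init__(self, size):
        self.size = size

    def next_size(self, client):
        return self.size


class AdaptivePolicy:
    """Hypothetical rule: a client that keeps requesting fragments gets larger ones."""
    def __init__(self, initial=4, cap=256):
        self.initial, self.cap = initial, cap
        self.sizes = {}  # client -> size of its next fragment

    def next_size(self, client):
        size = self.sizes.get(client, self.initial)
        self.sizes[client] = min(size * 2, self.cap)  # double for the next request
        return size
```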

Slide 46: Response Performance for Breadth-First and Depth-First
Figure: response performance for depth-first traversal and for breadth-first traversal.

Slide 47: References
- Bertram Ludäscher, Yannis Papakonstantinou, Pavel Velikhov. Navigation-Driven Evaluation of Virtual Mediated Views. EDBT 2000.
- Yannis Papakonstantinou, Vasilis Vassalos. Architecture and Implementation of an XQuery-based Information Integration Platform. IEEE Data Eng. Bull. 25(1), 2002.
- Yannis Papakonstantinou, Vinayak R. Borkar, Maxim Orgiyan, Konstantinos Stathatos, Lucian Suta, Vasilis Vassalos, Pavel Velikhov. XML queries and algebra in the Enosys integration platform. Data Knowl. Eng. 44(3), 2003.