1
1 CSE 636 Data Integration
XML Distributed Query Processing
Slides by Yannis Papakonstantinou
2
2 Overview
The Virtual XML View Approach towards Data Integration
Query Processing in XML Mediators
– Issues Overview
– An Algebra-Based Architecture
– Navigation-Driven Evaluation
3
3 Data Integration Requirements in eBusiness Applications
It starts with …
– “Provide Application X to customers, partners, and employees”, where X may be in Business Intelligence, Customer Support, …
Then the problem comes up …
– “The application uses information assets widely distributed across my enterprise”
If only …
– “Give the application a single place to go to access all the information required. Requirements are evolving, so make sure the system can be easily maintained and upgraded”
4
4 View-Based Approach: Wrappers Export Basic Source Views
[Diagram, top to bottom: Client Application; Mediator exporting the Integrated (XML) View; Wrapper exporting an (XML) View per source; Customers Rel. DB and Orders Rel. DB]
Rows of the Customers relational DB: (John, 56, Chicago), (George, 58, Chicago), …
XML view exported by the wrapper:
customer_table
  customer: name John, id 56, city Chicago
  customer: name George, id 58, city Chicago
  …
5
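5 Wrappers Export Basic Source Views
[Diagram, top to bottom: Client Application; Mediator exporting the Integrated (XML) View; Wrapper exporting an (XML) View per source; Customers Rel. DB and Orders Rel. DB]
XML view exported for the Orders relational DB:
order_table
  order: id 1034, cid 56, item chips
  order: id 1567, cid 56, item salsa
  …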
6
6 Mediators Export Integrated Views, Tailored to Application Needs
[Diagram, top to bottom: Client Application; Mediator exporting the Integrated (XML) View; Wrapper exporting an (XML) View per source; Customers Rel. DB and Orders Rel. DB]
Source views:
order_table
  order: id 1034, cid 56, item chips
  order: id 1567, cid 56, item salsa
  …
customer_table
  customer: name John, id 56, city Chicago
  customer: name George, id 58, city Chicago
  …
Integrated view:
customers
  customer: name John, id 56, city Chicago
    orders
      order: id 1034, item chips
      order …
  customer …
7
7 Virtual Views: Query-Driven Mediator Operation
Application to Mediator: Find all Chicago customer names, along with their ordered items
Mediator to Wrapper of the Customers Database: Retrieve Chicago customer names and ids
Mediator to Wrapper of the Orders Database: Retrieve all cids and item names of orders
8
8 On-Demand (Query-Driven) Mediator Operation
Returned by the Wrapper of the Customers Database:
customer: name John, id 56
…
Returned by the Wrapper of the Orders Database:
order: cid 56, item chips
order: cid 56, item salsa
…
Returned by the Mediator to the Application:
customers
  customer: name John
    ordered_items: item chips, item salsa
  customer …
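The query-driven operation on the two slides above can be sketched in Python. This is only an illustration under stated assumptions: the two wrappers are replaced by hypothetical in-memory functions (`customers_wrapper`, `orders_wrapper`), and the data is the sample data from the slides.

```python
# Hypothetical in-memory stand-ins for the two wrapped sources (slide data).
CUSTOMERS = [
    {"name": "John", "id": 56, "city": "Chicago"},
    {"name": "George", "id": 58, "city": "Chicago"},
]
ORDERS = [
    {"id": 1034, "cid": 56, "item": "chips"},
    {"id": 1567, "cid": 56, "item": "salsa"},
]


def customers_wrapper(city):
    """Source query 1: names and ids of the customers in one city."""
    return [{"name": c["name"], "id": c["id"]} for c in CUSTOMERS if c["city"] == city]


def orders_wrapper():
    """Source query 2: cid and item name of every order."""
    return [{"cid": o["cid"], "item": o["item"]} for o in ORDERS]


def chicago_customers_with_items():
    """Mediator: answer the client query by combining the two source answers."""
    items_by_cid = {}
    for order in orders_wrapper():
        items_by_cid.setdefault(order["cid"], []).append(order["item"])
    return [
        {"name": c["name"], "ordered_items": items_by_cid.get(c["id"], [])}
        for c in customers_wrapper("Chicago")
    ]


result = chicago_customers_with_items()
```

Nothing is materialized ahead of time: the integrated view is computed only when the client query arrives.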
9
9 Multiple Plans are Possible
Retrieve customers
For each customer, find matching orders
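A minimal sketch of why the choice of plan matters, assuming a hypothetical source interface `fetch_orders` that counts how often it is called. The first plan follows the slide (one order lookup per customer); the second, a bulk fetch followed by a join at the mediator, is an assumed alternative and is not taken from the slides.

```python
CUSTOMERS = [{"name": "John", "id": 56}, {"name": "George", "id": 58}]
ORDERS = [
    {"id": 1034, "cid": 56, "item": "chips"},
    {"id": 1567, "cid": 56, "item": "salsa"},
]
calls = {"orders": 0}  # number of requests sent to the orders source


def fetch_customers():
    return list(CUSTOMERS)


def fetch_orders(cid=None):
    """Hypothetical source call: all orders, or only those of one customer."""
    calls["orders"] += 1
    return [o for o in ORDERS if cid is None or o["cid"] == cid]


def plan_per_customer():
    """Plan 1: retrieve customers, then find matching orders for each one."""
    return [(c["name"], [o["item"] for o in fetch_orders(c["id"])])
            for c in fetch_customers()]


def plan_bulk_join():
    """Plan 2 (assumed alternative): fetch all orders once, join at the mediator."""
    by_cid = {}
    for o in fetch_orders():
        by_cid.setdefault(o["cid"], []).append(o["item"])
    return [(c["name"], by_cid.get(c["id"], [])) for c in fetch_customers()]


calls["orders"] = 0
per_customer = plan_per_customer()
per_customer_calls = calls["orders"]

calls["orders"] = 0
bulk = plan_bulk_join()
bulk_calls = calls["orders"]
```

Both plans return the same answer but differ in the number of source requests, which is the kind of difference a cost-based optimizer has to weigh.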
10
10 A New Kind of Query Processing Problem
Build and Run “Optimal” Plan
– Consisting of operators that
  – Collect source info using supported queries and commands
  – Combine info into XML result
11
11 Challenges in Query Processing & Optimization
Operate within the Limited and Different Capabilities of the Sources
– Describe sets of supported queries
– Use most efficient supported queries
Optimize plans/queries sent to sources
– Estimate Costs of Plans
– Adapt Plans Along the Way
– Beyond Conjunctive Queries
– Compose Queries/Views Efficiently
  – Schema inference & optimization
  – Combine navigation & querying
12
12 From Limited Wrappers to Efficient Plans for Extended Query Sets
Answering Queries Using Views
But with Infinite Sets of Views
Increasing Relevance due to Web Services
[Diagram labels: Source Data & Schema; Queries supported by wrapper; Queries supported by mediator; all queries over schema]
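One way to picture the step from a limited wrapper to a larger set of mediator queries is sketched here. Everything in it is an assumption made for illustration: a hypothetical `CityOnlyWrapper` that supports a single query form (customers by city), and a mediator function that pushes the supported condition to the wrapper and evaluates the remaining conditions itself.

```python
CUSTOMERS = [
    {"name": "John", "id": 56, "city": "Chicago"},
    {"name": "George", "id": 58, "city": "Chicago"},
]


class CityOnlyWrapper:
    """Hypothetical wrapper whose only supported query is 'customers by city'."""

    SUPPORTED = {"city"}

    def query(self, **conditions):
        if set(conditions) != self.SUPPORTED:
            raise ValueError("query not supported by this wrapper")
        return [c for c in CUSTOMERS if c["city"] == conditions["city"]]


def mediator_query(wrapper, **conditions):
    """Push the supported conditions to the wrapper, filter the rest locally."""
    pushed = {k: v for k, v in conditions.items() if k in wrapper.SUPPORTED}
    residual = {k: v for k, v in conditions.items() if k not in wrapper.SUPPORTED}
    rows = wrapper.query(**pushed)
    return [r for r in rows if all(r[k] == v for k, v in residual.items())]


wrapper = CityOnlyWrapper()
johns_in_chicago = mediator_query(wrapper, city="Chicago", name="John")
```

The wrapper alone rejects the two-condition query, while the mediator answers it by composing a supported query with a local selection.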
13
13 Challenges in Query Processing & Optimization
Operate within the Limited and Different Capabilities of the Sources
– Describe sets of supported queries
– Use most efficient supported queries
Optimize plans/queries sent to sources
– Estimate Costs of Plans
– Adapt Plans Along the Way
– Beyond Conjunctive Queries
– XQuery processing
  – Schema inference & optimization
  – Combine navigation & querying
– Build iterator models for low memory footprint
14
14 Navigation-Driven Evaluation of Query Result
Source views:
order_table
  order: id 1034, cid 56, item chips
  order: id 1567, cid 56, item salsa
  …
customer_table
  customer: name John, id 56, city Chicago
  customer: name George, id 58, city Chicago
  …
Query result:
customers
  customer: name John, id 56, city Chicago
    orders
      order: id 1034, item chips
      order …
  customer …
15
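15 Navigation-Driven Evaluation
[Diagram: the Client navigates the result exported by the Lazy Mediator, which sits on top of the XML sources s1 … sn; from a node p the client issues down(p) and right(p)]
Lazy Mediator
– View definition: ans = q(s1 … sn)
– Input: client navigations
– Output: source navigations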
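The lazy mediator idea can be sketched as follows. The sketch assumes a hypothetical `Source` class that exposes only `down` and `right`, addresses nodes by index paths, and treats reading the label of the current node as free. The view is an assumed example ("the children of the root that are customers"), chosen so that one client navigation can cost several source navigations.

```python
class Source:
    """A navigable XML-like tree; nodes are (label, children) pairs."""

    def __init__(self, tree):
        self.tree = tree
        self.navigations = 0  # number of down/right commands received

    def _node(self, path):
        node = self.tree
        for i in path:
            node = node[1][i]
        return node

    def label(self, path):
        return self._node(path)[0]

    def down(self, path):
        """First child of the node at path, or None."""
        self.navigations += 1
        return path + (0,) if self._node(path)[1] else None

    def right(self, path):
        """Next sibling of the node at path, or None."""
        self.navigations += 1
        if not path:
            return None
        parent, i = path[:-1], path[-1]
        return parent + (i + 1,) if i + 1 < len(self._node(parent)[1]) else None


class LazyCustomerView:
    """Lazy mediator for the view 'children of the root that are customers'."""

    def __init__(self, source):
        self.source = source

    def _skip(self, p):
        # Below the root, step right until a customer is found.
        while p is not None and len(p) == 1 and self.source.label(p) != "customer":
            p = self.source.right(p)
        return p

    def down(self, p):
        return self._skip(self.source.down(p))

    def right(self, p):
        return self._skip(self.source.right(p))


db = ("db", [("customer", []), ("order", []), ("order", []), ("customer", [])])
source = Source(db)
view = LazyCustomerView(source)
first = view.down(())        # one source navigation
second = view.right(first)   # three source navigations: two orders are skipped
navigations_so_far = source.navigations
```

No part of the view is computed before the client asks for it; each client command is translated into just enough source commands to answer it.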
20
20 Mixing Querying & Navigation
Query issued during navigation: Find details of all salsa orders below visited node
Result being navigated:
customers
  customer: name John, id 56, city Chicago
    orders
      order: id 1034, item chips
      order …
  customer …
21
21 Challenges in Mixing Querying & Navigation
Two-dimensional navigation
– Reminiscent of cursors, but there are multiple continuation points
Controlling size + shape
Contextualizing queries by navigation
22
22 Overview
The Virtual XML View Approach towards Data Integration
Query Processing in XML Mediators
– Issues Overview
– An Algebra-Based Architecture
– Navigation-Driven Evaluation
23
23 An Algebra-Based Query Processor Architecture
[Diagram: the Client sends XQuery and Navigation Requests and receives Results. Processing stages: XQuery and XQuery Views; Translation to Algebra; Algebra Plan; Rewriter/Optimizer; Physical Algebra Plan; Plan Execution Engine; Queries & Fetch Requests to Sources. Supporting inputs: Source Description; Function Description; Functions; Source Schemas & Types]
24
24 Query Processing on Tuple-Oriented Algebra Enables…
Well-known efficient physical implementations of the operators
Join optimization
Nested data by nested plans or group-by
Efficient iterator model
25
25 XQuery: Queries & Views for XML
{
  for $cust in document("db")/customer
  return {
    $cust/id,
    for $order in document("db")/order
    where $order/cid = $cust/id
    return { $order/id }
  }
}
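For readers less familiar with XQuery, the nested for/where/return structure of the query above can be paraphrased in Python. This is only a paraphrase over the sample data from the earlier slides; the dictionary keys `id` and `order_ids` are hypothetical names for the result structure, not part of the original query.

```python
customers = [{"id": 56}, {"id": 58}]
orders = [{"id": 1034, "cid": 56}, {"id": 1567, "cid": 56}]

# Outer loop over customers; inner loop over orders with the join condition
# $order/cid = $cust/id in the filter.
result = [
    {
        "id": cust["id"],
        "order_ids": [order["id"] for order in orders if order["cid"] == cust["id"]],
    }
    for cust in customers
]
```

A customer without orders still appears in the result, with an empty inner list, which is the behavior the nested plans on the following slides have to preserve.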
26
26 Access and Navigation
Source (node labels in parentheses):
customer_table (ct)
  customer (c1): name John, id 56 (i1)
  customer (c2): name George, id 58 (i2)
Plan, bottom-up, with the tuples each operator outputs:
source db, [$db1]
  $db1
  ct
getD $db1, customer → $cust
  $db1  $cust
  ct    c1
  ct    c2
getD $cust, id → $cust_id
  $db1  $cust  $cust_id
  ct    c1     i1
  ct    c2     i2
27
27 Simplification Using Schema Inference
Source (node labels in parentheses):
customer_table (ct)
  customer: name John, id 56 (i1)
  customer: name George, id 58 (i2)
source db, [$db1]
  $db1
  ct
getD $db1, customer/id → $cust_id
  $db1  $cust_id
  ct    i1
  ct    i2
Since $cust_id $cust and $cust is “useless” otherwise
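The two plans on the slides above (two getD steps, and the simplified single getD over the path customer/id) can be sketched in Python. The representation is an assumption: tuples are dictionaries from variable names to nodes, nodes are dictionaries built by a hypothetical helper `el`, and `source` and `getD` are simplified stand-ins for the algebra operators of the same names.

```python
def el(label, *children, value=None):
    """Hypothetical helper that builds a tree node."""
    return {"label": label, "children": list(children), "value": value}


def source(root, var):
    """source: emit one tuple that binds var to the root of a source."""
    yield {var: root}


def getD(tuples, in_var, path, out_var):
    """getD: bind out_var to each node reached from in_var by following path."""
    steps = path.split("/")
    for t in tuples:
        nodes = [t[in_var]]
        for step in steps:
            nodes = [c for n in nodes for c in n["children"] if c["label"] == step]
        for n in nodes:
            yield {**t, out_var: n}


ct = el(
    "customer_table",
    el("customer", el("name", value="John"), el("id", value=56)),
    el("customer", el("name", value="George"), el("id", value=58)),
)

# Plan with two getD steps: $cust is bound, then $cust_id.
step_by_step = list(
    getD(getD(source(ct, "$db1"), "$db1", "customer", "$cust"), "$cust", "id", "$cust_id")
)

# Simplified plan: one getD over customer/id, $cust is never bound.
simplified = list(getD(source(ct, "$db1"), "$db1", "customer/id", "$cust_id"))

ids_step_by_step = [t["$cust_id"]["value"] for t in step_by_step]
ids_simplified = [t["$cust_id"]["value"] for t in simplified]
```

Both plans bind `$cust_id` to the same nodes; the simplified one produces narrower tuples because it never carries `$cust`.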
28
28 Nested Plans
for $part
  Input tuples:
    $db1  $cust_id
    ct    i1
    ct    i2
  Output tuples, each with its own partition in $part:
    $db1  $cust_id  $part
    ct    i1        [($db1 = ct, $cust_id = i1)]
    ct    i2        [($db1 = ct, $cust_id = i2)]
apply $part, p → $orders
  Plan p starts with nestedSrc $part, which reads the tuples of one partition, e.g. (ct, i1)
  Output tuples:
    $db1  $cust_id  $orders
    ct    i1        [o11 …]
    ct    i2        [o21 …]
29
29 Joins and Selections
Plan p, bottom-up:
nestedSrc $part
  $db1  $cust_id
  ct    i1
source db, [$db2]
getD $db2, order → $order
getD $order, cid → $cust_id2
getD $order, id → $order_id
  $db2  $order  $cust_id2  $order_id
  …
Join on $cust_id2 = $cust_id
  $db1  $cust_id  $db2  $order  $cust_id2  $order_id
  …
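The interplay of the nested plan and the join can be sketched as follows. This is a simplified reading of the two slides above, not the platform's implementation: `partition_each` stands in for the operator that produces `$part` (assuming each input tuple forms its own partition), `plan_p` stands in for plan p, and node labels are replaced by the sample values 56, 58, 1034, and 1567.

```python
ORDERS = [{"id": 1034, "cid": 56}, {"id": 1567, "cid": 56}]


def partition_each(tuples, part_var):
    """Assumed behavior of 'for $part': every tuple gets its own partition."""
    for t in tuples:
        yield {**t, part_var: [dict(t)]}


def plan_p(partition):
    """Nested plan p: join the partition's tuples with the orders."""
    joined = [
        {**outer, "$cust_id2": o["cid"], "$order_id": o["id"]}
        for outer in partition                  # nestedSrc $part
        for o in ORDERS                         # source db and getD over order
        if o["cid"] == outer["$cust_id"]        # join on $cust_id2 = $cust_id
    ]
    return [t["$order_id"] for t in joined]


def apply(tuples, part_var, plan, out_var):
    """apply: run the nested plan on each partition and bind its result."""
    for t in tuples:
        rest = {k: v for k, v in t.items() if k != part_var}
        yield {**rest, out_var: plan(t[part_var])}


inputs = [{"$db1": "ct", "$cust_id": 56}, {"$db1": "ct", "$cust_id": 58}]
output = list(apply(partition_each(inputs, "$part"), "$part", plan_p, "$orders"))
```

Because the nested plan runs once per outer tuple, a customer without orders keeps its tuple and simply receives an empty `$orders` list.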
30
30 Constructors
crList $order_id → $oidL
  Input tuples:
    …  $order_id
    …  o1
    …  o2
  Output tuples:
    …  $order_id  $oidL
    …  o1         [o1]
    …  o2         [o2]
crEl order, $oidL → $oidE
  Output tuples:
    …  $oidL  $oidE
    …  [o1]   e1
    …  [o2]   e2
listify $oidE → $orders
  Output:
    $orders
    [e1, e2]
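A small Python sketch of the three constructor operators, assuming tuples are dictionaries and an element is represented as a (label, content) pair. The function names mirror the operators on the slide, but the bodies are illustrative guesses at their behavior based on the tuples shown.

```python
def crList(tuples, in_var, out_var):
    """crList: wrap the value of in_var in a one-element list."""
    for t in tuples:
        yield {**t, out_var: [t[in_var]]}


def crEl(tuples, label, in_var, out_var):
    """crEl: create an element with the given label and the list as content."""
    for t in tuples:
        yield {**t, out_var: (label, t[in_var])}


def listify(tuples, in_var):
    """listify: collect the values of in_var across all tuples into one list."""
    return [t[in_var] for t in tuples]


tuples = [{"$order_id": 1034}, {"$order_id": 1567}]
with_lists = crList(tuples, "$order_id", "$oidL")
with_elements = crEl(with_lists, "order", "$oidL", "$oidE")
orders = listify(with_elements, "$oidE")
```

Here `orders` corresponds to `$orders = [e1, e2]` on the slide, with each element shown as a pair such as `("order", [1034])`.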
31
31 Algebra Example
32
32 Plan Decomposition Within Rewriting Optimizer
Rules replacing “leaf” trees
May move commutable parts
Catch: No projection limitation
33
33 Plan After Decomposition
34
34 Replacing Nested Plans with GroupBy/Outerjoin Combinations
[Diagram: two equivalent plans over the subplans p1, p2, and p3. Before: for $part; apply $part, p → $R; nestedSrc $part. After: groupBy S(p1) → $part; apply $part, p → $R; nestedSrc $part]
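The equivalence behind this rewriting can be checked on the sample data. The sketch compares a nested evaluation (one inner lookup per customer) with a left outer join followed by a group-by; it illustrates the general idea under simplified assumptions and does not reproduce the exact operators in the diagram.

```python
CUSTOMERS = [{"id": 56}, {"id": 58}]
ORDERS = [{"id": 1034, "cid": 56}, {"id": 1567, "cid": 56}]


def nested_plan(customers, orders):
    """Nested evaluation: run the inner lookup once per customer."""
    return [(c["id"], [o["id"] for o in orders if o["cid"] == c["id"]])
            for c in customers]


def outerjoin_groupby(customers, orders):
    """Left outer join of customers with orders, then group by customer id."""
    joined = []
    for c in customers:
        matches = [o for o in orders if o["cid"] == c["id"]]
        # The outer join pads customers without orders with None.
        for o in matches or [None]:
            joined.append((c["id"], o))
    groups = {}
    for cid, o in joined:
        groups.setdefault(cid, [])
        if o is not None:
            groups[cid].append(o["id"])
    return list(groups.items())


nested = nested_plan(CUSTOMERS, ORDERS)
flat = outerjoin_groupby(CUSTOMERS, ORDERS)
```

The outer join matters: with a plain inner join, customer 58 would vanish before the group-by and the two plans would no longer agree.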
35
35 Multiple Possible Plans
36
36 Overview
The Virtual XML View Approach towards Data Integration
Query Processing in XML Mediators
– Issues Overview
– An Algebra-Based Architecture
– Navigation-Driven Evaluation
37
37 Building Navigation-Driven Evaluation on the Algebra
[Diagram labels: Client; Source access (twice); Source]
38
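38 Think of Each Operator as a Lazy Mediator
Source (node labels in parentheses):
customer_table (ct)
  customer (c1): name John, id 56 (i1)
  customer (c2): name George, id 58 (i2)
Operator: getD $cust, id → $cust_id
Input tuples:
  $db1  $cust
  ct    c1
  ct    c2
Output tuples:
  $db1  $cust  $cust_id
  ct    c1     i1
  ct    c2     i2
The output is viewed as a tree: a root with one tuple node per output tuple, each with the children $db1, $cust, and $cust_id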
39
39 Navigation-Driven Evaluation of Operators
[Diagram: a Lazy Operator exports its result on top of the results of the operators below, s1 … sn]
Lazy Operator
– Input: client navigations
– Output: source navigations
Navigation is augmented with nextTuple(p) and p.attr
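A minimal pull-based sketch of lazy operators, assuming each operator exposes a single method `next_tuple` (a Python-style spelling of nextTuple) and pulls from the operator below only when asked. The classes `Scan` and `Select` and the third order in the data are hypothetical and serve only to make the laziness visible.

```python
class Scan:
    """Leaf operator: hands out source tuples one at a time and counts reads."""

    def __init__(self, rows):
        self.rows = rows
        self.pos = 0
        self.reads = 0

    def next_tuple(self):
        if self.pos == len(self.rows):
            return None
        self.reads += 1
        self.pos += 1
        return self.rows[self.pos - 1]


class Select:
    """Lazy selection: pulls from its child only until one tuple qualifies."""

    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def next_tuple(self):
        while True:
            t = self.child.next_tuple()
            if t is None or self.predicate(t):
                return t


orders = [
    {"id": 1034, "item": "chips"},
    {"id": 1567, "item": "salsa"},
    {"id": 2001, "item": "salsa"},  # hypothetical extra order
]
scan = Scan(orders)
salsa_orders = Select(scan, lambda t: t["item"] == "salsa")

first = salsa_orders.next_tuple()
reads_after_first = scan.reads  # the third order has not been read yet
```

If the client stops after the first result, the remaining source tuples are never fetched, which is what keeps the memory footprint low.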
40
40 Use of Semantic Ids in Navigation-Driven Evaluation
[Diagram: a navigation command r/d( ) carries the fields f1, f2, …, fn, which set the Operator State (V1: …, V2: …, …, Vn: …, Other: …); the operator proceeds down/right and reaches a new Operator State, which produces the fields f'1, f'2, …, f'n]
41
41 Fragments Reduce the “Set State” – “Produce State” Overhead
[Diagram: a result fragment with the nodes root, customer (name, “John”), order (oid, 123), and lineitem; the parts not yet shipped are marked Hole 1, Hole 2, and Hole 3]
42
42 Fragments Reduce the “Set State” – “Produce State” Overhead
[Diagram: the same fragment (root, customer with name “John”, order with oid 123, lineitem, Hole 1, Hole 3), extended by a further fragment containing order (ordnum=16), lineitem, Hole 4, and Hole 5]
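The fragment-and-hole mechanism can be sketched as follows. The design is an assumption made for illustration: a hypothetical `FragmentServer` ships the tree down to a fixed depth, replaces every deeper non-leaf subtree with a `("Hole", id)` placeholder, and ships the missing subtree when the client asks for that hole. The leaf `item=chips` is made-up sample content.

```python
import itertools

TREE = ("root", [
    ("customer", [
        ("name=John", []),
        ("order", [
            ("oid=123", []),
            ("lineitem", [("item=chips", [])]),  # hypothetical content
        ]),
    ]),
])


class FragmentServer:
    """Ships the result in fragments; unexplored subtrees become holes."""

    def __init__(self, tree, depth):
        assert depth >= 1
        self.tree = tree
        self.depth = depth
        self.holes = {}               # hole id -> subtree not shipped yet
        self.ids = itertools.count(1)
        self.requests = 0             # client-server round trips

    def _cut(self, node, depth):
        label, children = node
        if depth == 0 and children:
            hole_id = next(self.ids)
            self.holes[hole_id] = node
            return ("Hole", hole_id)
        return (label, [self._cut(c, depth - 1) for c in children])

    def fetch_root(self):
        self.requests += 1
        return self._cut(self.tree, self.depth)

    def fill(self, hole_id):
        self.requests += 1
        return self._cut(self.holes.pop(hole_id), self.depth)


server = FragmentServer(TREE, depth=2)
first_fragment = server.fetch_root()
order_fragment = server.fill(1)
```

One request now returns several nodes at once, so the operator state is set and produced once per fragment instead of once per navigation step.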
43
43 Controlling the Size and Shape of Fragments
[Diagram labels: Client; Client-Server Interaction Controller; listify; Source access (twice); Source]
44
44 Fragment size determines the memory footprint, which in turn affects performance
45
45 Fragmentation Strategies
Fixed Fragment Size
– Ideal for depth-first, left-to-right navigation
Adaptive Fragment Size
– Assign larger pieces to those who use them
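A back-of-the-envelope sketch of the two strategies. It counts how many fetches a client needs to read a list of siblings left to right. The doubling rule used for the adaptive case is an assumption chosen for illustration; the slides only say that larger pieces go to those who use them.

```python
def requests_needed(total_items, first_size, grow):
    """Count the fetches needed to read total_items siblings left to right."""
    fetched, size, requests = 0, first_size, 0
    while fetched < total_items:
        fetched += size
        requests += 1
        size = grow(size)  # size of the next fragment
    return requests


# Fixed strategy: every fragment holds 10 items.
fixed = requests_needed(100, 10, lambda size: size)

# Adaptive strategy (assumed rule): double the fragment each time the
# client comes back for more of the same list.
adaptive = requests_needed(100, 10, lambda size: 2 * size)
```

With these numbers the fixed strategy needs 10 fetches and the adaptive one 4 (fragments of 10, 20, 40, and 80 items), at the price of shipping up to 50 items the client may never visit.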
46
46 Response Performance for Breadth-First and Depth-First
[Plot: response performance under depth-first traversal and under breadth-first traversal]
47
47 References
Navigation-Driven Evaluation of Virtual Mediated Views
– Bertram Ludäscher, Yannis Papakonstantinou, Pavel Velikhov
– EDBT 2000
Architecture and Implementation of an XQuery-based Information Integration Platform
– Yannis Papakonstantinou, Vasilis Vassalos
– IEEE Data Eng. Bull. 25(1), 2002
XML queries and algebra in the Enosys integration platform
– Yannis Papakonstantinou, Vinayak R. Borkar, Maxim Orgiyan, Konstantinos Stathatos, Lucian Suta, Vasilis Vassalos, Pavel Velikhov
– Data Knowl. Eng. 44(3), 2003