
1 A Bottom-up Strategy for Query Decomposition
Le Thi Thu Thuy*, Doan Dai Duong*, Virendrakumar C. Bhavsar* and Harold Boley**
* Faculty of Computer Science, University of New Brunswick, Fredericton, NB, Canada, {Thuy_Thi_Thu.Le, Duong_Dai.Doan, bhavsar}@unb.ca
** Institute for Information Technology - e-Business, NRC, Fredericton, NB, Canada, harold.boley@nrc-cnrc.gc.ca
First IEEE International Conference on Digital Information Management (ICDIM), December 6-8, 2006

2 Agenda
- Introduction
- Lausen and Marron (LM) Approach
- Proposed Approach
- Query Decomposition Algorithm
- Additional Cases of Input Queries
- Conclusion

3 Introduction
- Utilization of available heterogeneous web data sources is still a demanding task
- Automatic retrieval of relevant data from distributed and chaotic sources
- Avoid generation of such data from scratch
- Global-As-View (GAV)
- Distributed data sources follow their own schemas
- Integration system integrates heterogeneous schemas to a global schema
- Users interact with the integration system through a global schema

4 Introduction
- A user's query is based on the global schema
- It cannot be directly employed to query distributed sources, because the structure of the global schema differs from the structures of the distributed sources
- To access data from distributed sources, the global query must be decomposed into subqueries conforming to the structures of the distributed sources
- Query decomposition plays an important role

5 General Scenario for Query Decomposition of Distributed Databases
[Figure: users send a query to the integrated data and receive results; QUERY DECOMPOSITION turns the query into Query 1, Query 2, ..., Query n for System 1, System 2, ..., System n; Result 1, Result 2, ..., Result n come back through DATA CONVERSION. Mappings are needed.]

6 Problems with Mappings
- Building mappings is a difficult task
- Mappings are normally handcrafted
- Can we decompose a global query into subqueries without mappings?

7 Lausen and Marron (LM02) Approach
- XML data sources
- Use XPath query Qglobal='/p1/p2/…/pi/…/pn-1/pn'
- Decompose global query into subqueries without mappings
- Use top-down approach
- Process from top (root node) of a tree (schema) to its bottom (leaf nodes)
- Process global query from left to right (P1 → Pn)
G. Lausen and P.J. Marron, “Adaptive evaluation techniques for querying XML-based E-Catalogs,” DBLP, 2002, pp. 19-28.

8 Proposed Approach
- XML data sources
- Use XPath query Qglobal='/p1/p2/…/pi/…/pn-1/pn'
- Decompose global query into subqueries without mappings
- Use bottom-up approach
- Process from bottom (leaf nodes) of a tree (schema) to the top (root node)
- Process global query from right to left (Pn → P1)

9 Example
[Fig. 1. Example of a global schema and two local schemas (from LM): a. Global schema, b. SESP schema, c. BIGGER schema]
- XPath query based on global schema: Qglobal='/p1/p2/…/pi/…/pn-1/pn'
- Qglobal='/department/mobile/products/jammer[price<200]'
- Find QSESP and QBIGGER for schemas SESP and BIGGER?
- QSESP='/products/jammer[price<200]'
- QBIGGER='/department/mobile/jammer[price<200]'

10 Query Decomposition Algorithm
- Given Qglobal='/p1/p2/…/pi/…/pn-1/pn'
- Take rightmost part pn to evaluate
- If pn is not found in local schema → no subquery for schema. Stop the algorithm
- Else, pn is found at a node in the tree (local schema); mark that node so that the next search will only be performed on its ancestor nodes
- Sequentially, consider pi (i=(n-1),...,1) of the query for evaluation

11 Flowchart of the algorithm
[Flowchart. Recoverable elements:
- Initialization: Anchor := LeftmostLeafNode; Subquery := ''; i := |Qglobal|
- Test Check(Pi, Anchor): Pi exists in the local schema S from Anchor up to the root node
- Other tests: Subquery = ''; Pi = Anchor; i > 1; i = 1; Pi is matched with the root of S
- Assignments: Subquery := Pi; Subquery := Pi ⊗ '/' ⊗ Subquery; Subquery := Pi ⊗ '//' ⊗ Subquery; Subquery := '/' ⊗ Subquery; Subquery := '//' ⊗ Subquery; Anchor := father(Pi) in S; i := i-1
- Exits: Stop; Return Subquery]
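
The bottom-up procedure described on the two slides above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `split_steps`, `step_name`, and `decompose` are hypothetical, a local schema is represented as a child-to-parent dictionary, element names are assumed unique within a schema, and predicates are assumed to contain no '/'. Fig. 1 is not reproduced in the transcript, so the two example schemas are inferred from the subqueries given in the example.

```python
def split_steps(xpath):
    """Split an absolute path such as '/a/b/c[d<1]' into its steps."""
    return [s for s in xpath.strip().split("/") if s]


def step_name(step):
    """Return the element name of a step, without any [predicate]."""
    return step.split("[", 1)[0]


def decompose(global_query, parent_of):
    """Build a local subquery by scanning the global query right to left.

    parent_of maps each element name of the local schema to its parent
    (the root maps to None). Returns None when the rightmost step does
    not occur in the local schema.
    """
    subquery = ""
    anchor = None        # node where the next (upper) step is expected
    last_matched = None  # most recently matched schema node
    for step in reversed(split_steps(global_query)):
        name = step_name(step)
        if subquery == "":
            # Rightmost step: it must exist somewhere in the schema.
            if name not in parent_of:
                return None
            subquery = step
        else:
            # Search only from the anchor up to the root.
            node = anchor
            while node is not None and node != name:
                node = parent_of[node]
            if node is None:
                continue  # step absent from this schema: skip it
            # Direct parent gives '/', a more distant ancestor gives '//'.
            separator = "/" if name == anchor else "//"
            subquery = step + separator + subquery
        last_matched = name
        anchor = parent_of[name]
    # Absolute path if the topmost match is the schema root.
    prefix = "/" if parent_of[last_matched] is None else "//"
    return prefix + subquery


# Hypothetical schemas, inferred from the example subqueries.
SESP = {"products": None, "jammer": "products", "price": "jammer"}
BIGGER = {"department": None, "mobile": "department",
          "jammer": "mobile", "price": "jammer"}
QUERY = "/department/mobile/products/jammer[price<200]"
```

With these assumed schemas, `decompose(QUERY, SESP)` and `decompose(QUERY, BIGGER)` reproduce the two subqueries listed in the example slide. Restricting each search to the ancestors of the previous match is what keeps the scan short.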

12 Additional Cases of Input Queries (Constraints in Queries)
- XPath queries contain constraints (filter expressions)
- Qglobal := '/department/mobile/products/jammer[price<200]'
- Idea: examine price before jammer. Avoid transforming the whole query if price does not exist in local schema → considerable reduction in execution time

13 Additional Cases of Input Queries (Constraints in Queries)
- Qglobal := '/department/mobile/products/jammer[price<200]'
- [Fig. 2. Local schema without price leaf node (adapted from LM); nodes: products, jammer, name, company]
- In this case, no subquery for local schema from the global query
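
The constraint check can be sketched as a cheap test that runs before any decomposition. This is an illustrative sketch only: `predicate_names` and `constraints_satisfiable` are hypothetical names, and the regular expressions assume simple predicates of the form `[name<value]` whose first token is an element name.

```python
import re


def predicate_names(xpath):
    """Return the element names that open each [predicate] in the query."""
    names = []
    for predicate in re.findall(r"\[([^\]]*)\]", xpath):
        match = re.match(r"\s*([A-Za-z_][\w.-]*)", predicate)
        if match:
            names.append(match.group(1))
    return names


def constraints_satisfiable(xpath, schema_nodes):
    """True if every element named in a predicate exists in the schema."""
    return all(name in schema_nodes for name in predicate_names(xpath))


QUERY = "/department/mobile/products/jammer[price<200]"
# Node names of the local schema in Fig. 2, which has no price leaf.
FIG2_NODES = {"products", "jammer", "name", "company"}
```

For the schema of Fig. 2, `constraints_satisfiable(QUERY, FIG2_NODES)` is false, so the query can be rejected for that source without examining jammer or any step to its left.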

14 Algorithm Analysis
- We evaluate the parts of the input query from right to left and the XML tree from bottom to top
- Worst case
- No subquery for a local schema
- The rightmost part Pn of global query has to be compared to all nodes of local schema
- Time complexity for a query having n parts to full k-ary tree of height h
- Number of nodes in full k-ary tree of height h
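
As a reminder of the quantity the last bullet refers to, the standard closed form for the node count of a full k-ary tree is sketched below. It is the textbook result rather than a formula copied from the slide, and the height convention (a single root has height 0) is an assumption, chosen because it agrees with the 2^(h+1) - 1 cost quoted for binary trees in the LM analysis.

```latex
% Assumptions: k >= 2, and a tree consisting of a single root has height 0.
N(k,h) \;=\; \sum_{i=0}^{h} k^{i} \;=\; \frac{k^{h+1}-1}{k-1}
% Binary case: N(2,h) = 2^{h+1} - 1.
% Worst case: p_n is compared with every node of the local schema,
% i.e. at most N(k,h) comparisons, assuming one unit per comparison.
```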

15 Algorithm Analysis
- Best case
- The rightmost part pn matches with a leaf node of tree at first
- The above is true for all pi nodes at upper levels of tree
- Time complexity for a query having n parts to a full k-ary tree of height h ≈ min(n,h), because the algorithm stops when either
  ▪ all n parts of Qglobal are processed, or
  ▪ all nodes from bottom to top of the tree (with height h) are traversed

16 LM Approach – Algorithm Analysis
- Transform XPath query Qglobal='/p1/p2/…/pi/…/pn-1/pn' into local subqueries for local schemas
- At each pi, to evaluate pi for a binary tree of height h, three operators are used to compute and select suitable elements from global query to form local subqueries
  ▪ No transformation: 1 unit time
  ▪ Subquery generalization: 2^(h+1) - 1 unit times
  ▪ Subquery elimination: 1 unit time

17 LM Approach: Algorithm Analysis (cont.)
- Time complexity to evaluate the whole query
- Time complexity of the algorithm for a query having n parts given a full k-ary tree of height h

18 Comparison
[Table comparing the LM algorithm and our algorithm in the worst case and the best case]

19 Conclusion
- Proposed an efficient bottom-up algorithm for query decomposition without predefined mappings
- Global query is efficiently processed based on its constraints
- Our algorithm can be extended to work not only with XPath queries, but also with general path expressions like those in Object-Oriented Databases


