
1 Evaluation of Partial Path Queries on XML Data Stefanos Souldatos (NTUA, GREECE) Xiaoying Wu (NJIT, USA) Dimitri Theodoratos (NJIT, USA) Theodore Dalamagas (NTUA, GREECE) Timos Sellis (NTUA, GREECE)

2 Evaluation of Partial Path Queries on XML Data. Outline: Partial path queries, Query processing, Query evaluation, Experiments, Conclusion

3 Difficulties in Querying XML Data [Figure: XML tree of theHotel.gr with elements Location, Island, City and Center and values Creta, Chania, Athens, Poros and Heraklio]

4 Difficulties in Querying XML Data. Search problem: Name: Xiaoying Wu; Place: Athens Center, Heraklio; Purpose: Sightseeing; Problem: structural difference [Figure: the theHotel.gr tree, annotated with Parthenon (438 BC) and Phaistos' Disk (1700 BC)]

5 Difficulties in Querying XML Data. Search problem: Name: Theodore Dalamagas; Place: Islands; Purpose: Sea sports; Problem: structural inconsistency [Figure: the theHotel.gr tree, annotated with Windsurf and Jet ski]

6 Difficulties in Querying XML Data. Search problem: Name: Dimitri Theodoratos; Place: Heraklio; Purpose: HDMS Conference; Problem: unknown structure [Figure: the theHotel.gr tree, annotated with HDMS 2008]

7 Difficulties in Querying XML Data. Search problem: Name: Stefanos Souldatos; Place: Any island; Purpose: Escape from PhD!; Problem: multiple sources [Figure: sources theHotel.gr, hotels.gr and holidays.gr; 1400 islands]

8 Difficulties in Querying XML Data. Can we use existing query languages (XPath, XQuery) to express our queries? Can we use existing techniques to evaluate our queries? [Figure: the theHotel.gr tree]

9 Path Queries in XPath. Partial path queries range from no structure (keywords) to full structure (path patterns). [Figure: three queries over the nodes theHotel.gr, City and Island]
   //theHotel.gr[descendant-or-self::*[ancestor-or-self::City][ancestor-or-self::Island]]
   /theHotel.gr/City//Island
   //theHotel.gr//City[descendant-or-self::*[ancestor-or-self::Island]]
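The partial pattern on slide 9 can be made concrete with a minimal Python sketch. This is a brute-force check, not the authors' evaluation method. It tests the condition expressed by //theHotel.gr//City[descendant-or-self::*[ancestor-or-self::Island]]: some root-to-node path contains theHotel.gr with a City below it, and an Island anywhere on the same path. The three sample documents and the helper names are made up for illustration.

```python
import xml.etree.ElementTree as ET

def root_paths(elem, prefix=()):
    """Yield the tag sequence of every root-to-node path in the tree."""
    path = prefix + (elem.tag,)
    yield path
    for child in elem:
        yield from root_paths(child, path)

def matches_partial(xml_text):
    """True if one path holds theHotel.gr above City, plus Island anywhere."""
    root = ET.fromstring(xml_text)
    for path in root_paths(root):
        if "Island" not in path:
            continue
        for i, tag in enumerate(path):
            if tag == "theHotel.gr" and "City" in path[i + 1:]:
                return True
    return False

# Hypothetical documents: the same three labels in different arrangements.
ISLAND_ABOVE_CITY = "<theHotel.gr><Island><City/></Island></theHotel.gr>"
CITY_ABOVE_ISLAND = "<theHotel.gr><City><Island/></City></theHotel.gr>"
SEPARATE_BRANCHES = "<theHotel.gr><City/><Island/></theHotel.gr>"

for doc in (ISLAND_ABOVE_CITY, CITY_ABOVE_ISLAND, SEPARATE_BRANCHES):
    print(matches_partial(doc))
```

The first two documents match because City and Island lie on one path in either order. The third does not, because City and Island sit in different branches. The full path pattern /theHotel.gr/City//Island would accept only the second document.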

10 Partial Path Queries. Legend: root node (optional); query node labelled by "a"; child relationship; descendant relationship [Figure: a partial path query over nodes r, a, b, c, d]

11 Partial Path Queries. partial path query -> QUERY PROCESSING -> partial path query in canonical form -> QUERY EVALUATION [Figure: the query before and after processing]

12 Evaluation of Partial Path Queries on XML Data. Outline: Partial path queries, Query processing, Query evaluation, Experiments, Conclusion

13 Query Processing. Steps: 1. Full form 2. Satisfiability 3. Redundant nodes 4. Canonical form [Figure: example query over nodes r, a, b, c, d]

14 Query Processing. Steps: 1. Full form 2. Satisfiability 3. Redundant nodes 4. Canonical form [Figure: example query, IR1 applied]
   INFERENCE RULES (x, y, z, w: query nodes; ai, bj: nodes labelled by a, b)
   (IR1) |- r//ai
   (IR2) x/y |- x//y
   (IR3) x//y, y//z |- x//z
   (IR4) x/ai, x//bj |- ai//bj
   (IR5) ai/x, bj//x |- bj//ai
   (IR6) x/y, y/w, x//z, z//w |- x/z
   (IR7) x/y, x//z, w/z, w//y |- x/z
   (IR8) x/y, y/w, x/z |- z/w
   (IR9) x//y, y//w, x/z |- z//w
   (IR10) x/y, w/y, w/z |- x/z
   (IR11) x//y, w/y, w//z |- x//z
   (IR12) x/y, y/w, z/w |- x/z
   (IR13) x//y, y//w, z/w |- x//z

15 Query Processing. Same steps and inference rules (IR1 to IR13) as on slide 14 [Figure: example query, IR4 applied]

16 Query Processing. Same steps and inference rules (IR1 to IR13) as on slide 14 [Figure: example query, IR4 applied]

17 Query Processing. Same steps and inference rules (IR1 to IR13) as on slide 14 [Figure: example query]
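As a rough illustration of how inference rules build up the full form, here is a minimal Python sketch that closes the descendant relation under IR1, IR2 and IR3 only. Rules IR4 to IR13 are deliberately left out, so this is not the complete procedure from the slides. The function name, the edge representation as (ancestor, descendant) pairs, and the tiny example query are assumptions made for illustration.

```python
def derive_descendants(root, nodes, child_edges, desc_edges):
    """Close the descendant relation under IR1, IR2 and IR3 only."""
    desc = set(desc_edges)
    # IR1: the root is an ancestor of every other query node.
    desc |= {(root, n) for n in nodes if n != root}
    # IR2: every child edge x/y also gives x//y.
    desc |= set(child_edges)
    # IR3: x//y and y//z give x//z; repeat until nothing new appears.
    changed = True
    while changed:
        changed = False
        for (x, y) in list(desc):
            for (y2, z) in list(desc):
                if y == y2 and (x, z) not in desc:
                    desc.add((x, z))
                    changed = True
    return desc

# Hypothetical query: a/b and b//c under an optional root r.
nodes = ["r", "a", "b", "c"]
closed = derive_descendants("r", nodes,
                            child_edges={("a", "b")},
                            desc_edges={("b", "c")})
print(sorted(closed))
```

Here a//c is derived by IR3 from a//b (itself from IR2) and b//c, and r//a, r//b, r//c come from IR1.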

18 Query Processing. Steps: 1. Full form 2. Satisfiability 3. Redundant nodes 4. Canonical form. A query is unsatisfiable if its full form contains a trivial cycle [Figure: a cycle between two nodes x and y]
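The satisfiability test can be sketched in a few lines of Python. The sketch reads "trivial cycle" as two nodes x and y with both x//y and y//x in the transitively closed descendant relation; that reading is an assumption, since the slide's figure is not recoverable. The function name and the sample edge sets are hypothetical.

```python
def has_trivial_cycle(desc_edges):
    """True if the transitive closure contains both x//y and y//x."""
    closed = set(desc_edges)
    changed = True
    while changed:  # transitive closure (rule IR3)
        changed = False
        for (x, y) in list(closed):
            for (y2, z) in list(closed):
                if y == y2 and (x, z) not in closed:
                    closed.add((x, z))
                    changed = True
    # A self-loop (x, x) also counts: it is its own reverse edge.
    return any((y, x) in closed for (x, y) in closed)

cyclic = {("a", "b"), ("b", "c"), ("c", "a")}
acyclic = {("a", "b"), ("b", "c")}
print(has_trivial_cycle(cyclic), has_trivial_cycle(acyclic))
```

No XML path can place a node both above and below another node, which is why such a cycle rules out any match.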

19 Query Processing. Steps: 1. Full form 2. Satisfiability 3. Redundant nodes 4. Canonical form. A node y is redundant if one of the following patterns occurs: a), b), c), d) [Figure: four patterns over the nodes x, y and z, and the example query]

20 Query Processing. Steps: 1. Full form 2. Satisfiability 3. Redundant nodes 4. Canonical form. Canonical form of satisfiable query = full form – IR2 – IR3 – redundant nodes. The canonical form of a query is a directed acyclic graph (dag). [Figure: example query in canonical form]
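One part of the canonical-form equation, dropping the edges contributed by IR2 and IR3, can be sketched as follows. This minimal Python sketch assumes the input descendant relation is already transitively closed and acyclic, and it does not remove redundant nodes, because the four patterns from slide 19 are not recoverable here. The function name and the example are hypothetical.

```python
def strip_derived_edges(child_edges, closed_desc):
    """Drop descendant edges that only restate IR2 or IR3 consequences."""
    nodes = {n for edge in closed_desc for n in edge}
    kept = set()
    for (x, y) in closed_desc:
        if (x, y) in child_edges:
            continue  # x//y merely restates the child edge x/y (IR2)
        if any((x, z) in closed_desc and (z, y) in closed_desc for z in nodes):
            continue  # x//y follows from x//z and z//y (IR3)
        kept.add((x, y))
    return kept

# Hypothetical full form for the query a/b, b//c.
child = {("a", "b")}
closed = {("a", "b"), ("b", "c"), ("a", "c")}
remaining = strip_derived_edges(child, closed)
print(remaining)
```

Only b//c survives: a//b is covered by the child edge a/b, and a//c is implied by the other two edges.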

21 Evaluation of Partial Path Queries on XML Data. Outline: Partial path queries, Query processing, Query evaluation, Experiments, Conclusion

22 Evaluation Algorithms
   Based on PathStack [Bruno et al. '02]:
      Produce all possible path queries…
      Decompose into root-to-leaf paths…
      PartialMJ: Decompose a spanning tree into paths…
   Extending PathStack [Bruno et al. '02]:
      PartialPathStack: Produce a topological order of the query nodes and extend PathStack to handle it…

23 Based on PathStack. 1. Producing all possible path queries… [Figure: example partial path query over nodes r, a, b, c, d, e, f, g]

24 Based on PathStack. 1. Producing all possible path queries… [Figure: the example query and several path queries produced from it]

25 Based on PathStack. 1. Producing all possible path queries… [Figure: the example query and further path queries produced from it]

26 Based on PathStack. 1. Producing all possible path queries… Problems: too many queries to evaluate; multiple traversals of the XML tree [Figure: one of the produced path queries]
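The "too many queries" problem can be illustrated with a small Python sketch that enumerates every ordering of the query nodes consistent with the query's edges. Treating each such ordering as one candidate path query is a simplifying assumption: the sketch ignores the choice between child and descendant edges, and the example graph is hypothetical because the slides' query figure is not recoverable.

```python
def all_orders(nodes, edges):
    """Yield every ordering of nodes that respects all (upper, lower) edges."""
    preds = {n: {x for (x, y) in edges if y == n} for n in nodes}

    def extend(prefix, placed):
        if len(prefix) == len(nodes):
            yield tuple(prefix)
            return
        for n in nodes:
            # A node may be placed once all its predecessors are placed.
            if n not in placed and preds[n] <= placed:
                yield from extend(prefix + [n], placed | {n})

    yield from extend([], frozenset())

# Hypothetical query graph: b and c are unordered relative to each other.
nodes = ["r", "a", "b", "c", "d"]
edges = [("r", "a"), ("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
orders = list(all_orders(nodes, edges))
print(orders)
```

Even this five-node graph yields two orderings. Each additional pair of mutually unordered nodes multiplies the count, and every produced query needs its own pass over the XML tree.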

27 Based on PathStack. 2. Decomposing into root-to-leaf paths… [Figure: the example query decomposed into root-to-leaf paths]

28 Based on PathStack. 2. Decomposing into root-to-leaf paths… Each path is evaluated with PathStack. [Figure: the root-to-leaf paths]

29 Based on PathStack. 2. Decomposing into root-to-leaf paths… Problems: path overlaps; more than one component to evaluate; intermediate results [Figure: the root-to-leaf paths]
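The decomposition step can be sketched in Python as an enumeration of all root-to-sink paths of the query graph. This is a minimal sketch under assumptions: the graph below is a hypothetical stand-in for the slides' example, and the function name is made up.

```python
def root_to_sink_paths(root, edges):
    """List every path from root to a node without outgoing edges."""
    children = {}
    for x, y in edges:
        children.setdefault(x, []).append(y)

    def walk(node, prefix):
        path = prefix + [node]
        if node not in children:  # sink node: no outgoing edges
            yield path
            return
        for nxt in children[node]:
            yield from walk(nxt, path)

    return list(walk(root, []))

# Hypothetical query graph with two sinks, e and f.
edges = [("r", "a"), ("a", "b"), ("a", "c"), ("b", "d"),
         ("c", "d"), ("d", "e"), ("d", "f")]
paths = root_to_sink_paths("r", edges)
for p in paths:
    print("/".join(p))
```

The four resulting paths all share r, a and d. That sharing is the path overlap named on slide 29, and the per-path results must be kept as intermediate results until they are combined.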

30 Based on PathStack. PartialMJ. Using a spanning tree… Remove edges to create a spanning tree [Figure: the example query and its spanning-tree paths]

31 Based on PathStack. PartialMJ. Using a spanning tree… [Figure: the spanning-tree paths and the example query]

32 Based on PathStack. PartialMJ. Using a spanning tree… Each path is evaluated with PathStack. [Figure: the spanning-tree paths and the example query]

33 Based on PathStack. PartialMJ. Using a spanning tree… Join conditions (identity, structural, path) [Figure: the spanning-tree paths and the example query]

34 Based on PathStack. PartialMJ. Using a spanning tree… Join conditions (identity, structural, path) [Figure: the spanning-tree paths and the example query]

35 Based on PathStack. PartialMJ. Using a spanning tree… Join conditions (identity, structural, path) [Figure: the spanning-tree paths and the example query]

36 Based on PathStack. PartialMJ. Using a spanning tree… [Figure: the spanning-tree paths and the example query]

37 Based on PathStack. PartialMJ. Using a spanning tree… Problems: path overlaps; more than one component to evaluate; intermediate results [Figure: the example query]
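The edge-removal step of PartialMJ can be sketched in Python as follows. The sketch keeps the first incoming edge of each node and sets the others aside; which edge a real implementation keeps is not stated on the slides, so that choice, the function name and the example graph are all assumptions.

```python
def split_spanning_tree(edges):
    """Keep one incoming edge per node; return (tree_edges, removed_edges)."""
    tree, removed, has_parent = [], [], set()
    for x, y in edges:
        if y in has_parent:
            removed.append((x, y))  # y is already attached to the tree
        else:
            tree.append((x, y))
            has_parent.add(y)
    return tree, removed

# Hypothetical query graph in which d has two incoming edges.
edges = [("r", "a"), ("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
tree, removed = split_spanning_tree(edges)
print(tree)
print(removed)
```

In an acyclic query graph with a single root, the kept edges form a spanning tree because every other node has exactly one parent. Each removed edge, here c to d, has to be enforced afterwards as a join condition over the results of the tree's paths.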

38 Extending PathStack. PartialPathStack. Employ a topological order… [Figure: the example query and a topological order of its nodes]

39 Extending PathStack. PartialPathStack. Employ a topological order… The ordered query is evaluated with PartialPathStack. [Figure: the example query and a topological order of its nodes]
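A topological order of the query nodes can be produced with Kahn's algorithm, sketched below in Python. The slides do not say which ordering procedure PartialPathStack uses, so this is one standard option shown under that assumption, on a hypothetical query graph.

```python
from collections import deque

def topological_order(nodes, edges):
    """Return the nodes so that every edge goes from earlier to later."""
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for x, y in edges:
        children[x].append(y)
        indegree[y] += 1
    ready = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for c in children[n]:
            indegree[c] -= 1
            if indegree[c] == 0:  # all predecessors of c are placed
                ready.append(c)
    if len(order) != len(nodes):
        raise ValueError("query graph has a cycle")
    return order

nodes = ["r", "a", "b", "c", "d"]
edges = [("r", "a"), ("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
order = topological_order(nodes, edges)
print(order)
```

A single order is enough here, in contrast to enumerating every consistent ordering: the query stays one component, and a node with several incoming edges, such as d, is checked against all of its predecessors.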

40 PartialPathStack Example [Figure: query over nodes r, a, b, c, d, e; XML tree with nodes r, a1, b1, d1, c1, c2, e1, e2, d2; stacks Sr, Sa, Sb, Sd, Sc, Se; sink nodes; results]

41 PartialPathStack Example. Stack contents: r

42 PartialPathStack Example. Stack contents: r, a1

43 PartialPathStack Example. Stack contents: r, a1, b1

44 PartialPathStack Example. Stack contents: r, a1, b1, d1

45 PartialPathStack Example. Stack contents: r, a1, b1, d1, c1

46 PartialPathStack Example. Stack contents: r, a1, b1, d1, c1, e1. OUTPUT!!!

47-50 PartialPathStack Example (animation frames, same state as slide 46). OUTPUT!!!

51 PartialPathStack Example. Stack contents: r, a1, b1, d1, c1, e1. OUTPUT!!! Results: (r, a1, b1, d1, c1, e1)

52 PartialPathStack Example. Stack contents: r, a1, b1, d1, c1, e1, d2. Results: (r, a1, b1, d1, c1, e1)

53 PartialPathStack Example. Stack contents: r, a1, b1, d1, c1, e1, d2, c2. OUTPUT!!! Results: (r, a1, b1, d1, c1, e1)

54-57 PartialPathStack Example (animation frames, same state as slide 53). OUTPUT!!!

58 PartialPathStack Example. Stack contents: r, a1, b1, d1, c1, e1, d2, c2. OUTPUT!!! Results: (r, a1, b1, d1, c1, e1), (r, a1, b1, d1, c2, e1)

59 PartialPathStack Example. Stack contents: r, a1, b1, d1, c1, e1, d2, c2. Results: (r, a1, b1, d1, c1, e1), (r, a1, b1, d1, c2, e1)

60 PartialPathStack Example. Stack contents: r, a1, b1, d1, c1, e1, d2, e2. OUTPUT!!! Results: (r, a1, b1, d1, c1, e1), (r, a1, b1, d1, c2, e1)

61-64 PartialPathStack Example (animation frames, same state as slide 60). OUTPUT!!!

65 PartialPathStack Example. Stack contents: r, a1, b1, d1, c1, e1, d2, e2. OUTPUT!!! Results: (r, a1, b1, d1, c1, e1), (r, a1, b1, d1, c2, e1), (r, a1, b1, d1, c1, e2)

66 PartialPathStack Example. Results: (r, a1, b1, d1, c1, e1), (r, a1, b1, d1, c2, e1), (r, a1, b1, d1, c1, e2). Only one component to evaluate; no intermediate results. [Figure: the query and the XML tree]

67 Evaluation Algorithms
   Problems: Many queries / components to evaluate; Path overlaps; Intermediate results
   Algorithms: Produce all path queries…; Decompose into paths…; PartialMJ (spanning tree); PartialPathStack
   [Table: the per-algorithm marks are not recoverable]

68 PartialPathStack vs PathStack
   PathStack: Path queries; Indegree = 1; Outdegree = 1; O(input + output)
   PartialPathStack: Partial path queries; Indegree > 1; Outdegree > 1; O(input*indegree + output*outdegree)
   [Figure: a path query and a partial path query over nodes r, a, b, c, d, e, f, g]

69 Evaluation of Partial Path Queries on XML Data. Outline: Partial path queries, Query processing, Query evaluation, Experiments, Conclusion

70 Queries Used in the Experiments [Figure: four query shapes over nodes r, a, b, c, d, e, f, labelled Q1/Q5, Q2/Q6, Q3/Q7 and Q4/Q8]

71 Experiment 1. Execution time on Treebank… 2.5 million nodes [Plot]

72 Experiment 1. Execution time on Treebank… 2.5 million nodes [Plot, annotated: path queries]

73 Experiment 1. Execution time on Treebank… 2.5 million nodes [Plot, annotated: too many results]

74 Experiment 1. Execution time on Synthetic data… 2.5 million nodes (IBM AlphaWorks XML generator) [Plot]

75 Experiment 2. Execution time varying the size of the XML tree… (1 - 3 million nodes) [Plots for Q2, Q3 and Q7 comparing PartialMJ and PartialPathStack]

76 Evaluation of Partial Path Queries on XML Data. Outline: Partial path queries, Query processing, Query evaluation, Experiments, Conclusion

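77 Conclusion
   Partial Path Queries: Evaluation: CIKM '07; Containment: SSDBM '06; Heuristics for Containment: CIKM '06
   Queries with repetitions: Evaluation: ?; Containment: SSDBM '06; Heuristics for Containment: CIKM '06
   Partial Tree Queries: Evaluation: ?; Containment: SSDBM '06; Heuristics for Containment: CIKM '06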

78 Questions?

