Flexible Search on XML Data (ΕΥΕΛΙΚΤΗ ΑΝΑΖΗΤΗΣΗ ΣΕ ΔΕΔΟΜΕΝΑ XML). Stefanos Souldatos.


1 Flexible Search on XML Data. Stefanos Souldatos

2 Flexible Search on XML Data
Outline: Partial queries, Query processing, Query evaluation, Query containment, Experiments, Conclusion

3 Difficulties on Querying XML Data
[Figure: example XML tree of hotel data with element labels Hotels, City, Island, Location, Center, Athens, Creta, Chania, Heraklio, Poros]

4 Difficulties on Querying XML Data
Search problem. Name: Xiaoying Wu. Place: Athens Center, Heraklio. Purpose: Sightseeing (Parthenon, 438 BC; Phaistos' Disk, 1700 BC).
Problem: structural difference
[Figure: the example XML tree of hotel data]

5 Difficulties on Querying XML Data
Search problem. Name: Theodore Dalamagas. Place: Islands. Purpose: Sea sports (Windsurf, Jet ski).
Problem: structural inconsistency
[Figure: the example XML tree of hotel data]

6 Difficulties on Querying XML Data
Search problem. Name: Dimitri Theodoratos. Place: Heraklio. Purpose: HDMS Conference (HDMS 2008).
Problem: unknown structure
[Figure: the example XML tree of hotel data]

7 Difficulties on Querying XML Data
Search problem. Name: Stefanos Souldatos. Place: Any island. Purpose: Escape from PhD!
Problem: multiple sources (theHotel.gr, hotels.gr, holidays.gr; 1400 islands)

8 Difficulties on Querying XML Data
Can we use existing query languages (XPath, XQuery) to express our queries?
Can we use existing techniques to evaluate our queries?
[Figure: the example XML tree of hotel data]

9 Partial Queries in XPath
1. //Hotels[descendant-or-self::*[ancestor-or-self::City][ancestor-or-self::Athens]]
2. //Hotels[/City[descendant-or-self::*[ancestor-or-self::Athens]]]
3. //Hotels[/City//Athens]
4. //Hotels[/City[descendant-or-self::*[ancestor-or-self::Athens]]][//City[descendant-or-self::*[ancestor-or-self::Island]]]
5. //Hotels[/City//Athens][/City//Island]
[Figure: the five queries drawn as graphs over Hotels, City, Athens and Island, placed on a scale from 0% to 100% structure. Queries 1 to 3 are path queries; queries 4 and 5 are tree-pattern queries.]
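A partial query with no structure at all only requires that its nodes lie on one path, in any order. The sketch below checks that condition on a small tree with the Python standard library. The sample document, its nesting, and the helper name `some_path_contains` are assumptions made for illustration; the check walks the tree by hand because `xml.etree.ElementTree` does not support the ancestor axes used in the XPath expressions.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: the element names follow the slide's figure,
# but the nesting is an assumption.
DOC = """
<Hotels>
  <City><Athens><Center/></Athens></City>
  <Creta><City><Heraklio/></City></Creta>
  <Island><Poros/></Island>
</Hotels>
"""

def some_path_contains(node, labels):
    """Return True if some downward path starting at `node` contains
    every label in `labels`, regardless of the order in which they occur."""
    remaining = set(labels) - {node.tag}
    if not remaining:
        return True
    return any(some_path_contains(child, remaining) for child in node)

root = ET.fromstring(DOC)
print(some_path_contains(root, {"Hotels", "City", "Athens"}))    # same path
print(some_path_contains(root, {"Hotels", "Island", "Athens"}))  # different branches
```

The second call fails because Island and Athens sit on different branches, which is the same-path constraint a partial path query expresses.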

10 Partial Queries
Legend: root node r (optional); query node labelled by "a"; child relationship; descendant relationship
[Figure: example partial query with nodes r, a, b, c, d]

11 Conclusions (up to now)
Need for queries with partial structure
We introduce partial queries
Partial queries can be expressed in XPath

12 Flexible Search on XML Data
Outline: Partial queries, Query processing, Query evaluation, Query containment, Experiments, Conclusion

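13 Query Processing
QUERY PROCESSING takes a partial path query and produces a partial path query in canonical form, which is then passed to QUERY EVALUATION.
[Figure: example partial path query with nodes r, a, b, c, d, a, c and its canonical form with nodes r, a, b, c, d, a]

14 Query Processing
1. Full form
2. Satisfiability
3. Redundant nodes
4. Canonical form
[Figure: example partial path query with nodes r, a, b, c, d, a, c]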

15 Query Processing
Steps: 1. Full form 2. Satisfiability 3. Redundant nodes 4. Canonical form
[Figure: example partial path query with nodes r, a, b, c, d, a, c; annotation: IR1]
INFERENCE RULES
(IR1) |- r//a_i
(IR2) x/y |- x//y
(IR3) x//y, y//z |- x//z
(IR4) x/a_i, x//b_j |- a_i//b_j
(IR5) a_i/x, b_j//x |- b_j//a_i
(IR6) x/y, y/w, x//z, z//w |- x/z
(IR7) x/y, x//z, w/z, w//y |- x/z
(IR8) x/y, y/w, x/z |- z/w
(IR9) x//y, y//w, x/z |- z//w
(IR10) x/y, w/y, w/z |- x/z
(IR11) x//y, w/y, w//z |- x//z
(IR12) x/y, y/w, z/w |- x/z
(IR13) x//y, y//w, z/w |- x//z
x, y, z, w: query nodes; a_i, b_j: nodes labelled by a, b
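As a rough illustration of how inference rules extend a query, the sketch below closes a set of edges under IR2 (a child edge implies a descendant edge) and IR3 (descendant edges are transitive) only. It is not the full procedure: the other eleven rules are left out, and the function name and the tuple encoding of edges are assumptions made for this example.

```python
def close_under_ir2_ir3(child_edges, desc_edges):
    """Return all descendant edges derivable with IR2 and IR3.

    child_edges and desc_edges are sets of (x, y) pairs meaning x/y and x//y.
    """
    closure = set(desc_edges) | set(child_edges)   # IR2: x/y |- x//y
    changed = True
    while changed:
        changed = False
        for x, y in list(closure):
            for y2, z in list(closure):
                # IR3: x//y, y//z |- x//z
                if y == y2 and (x, z) not in closure:
                    closure.add((x, z))
                    changed = True
    return closure

# Hypothetical query: r/a, a//b, b//c
edges = close_under_ir2_ir3({("r", "a")}, {("a", "b"), ("b", "c")})
print(sorted(edges))
```

The fixpoint loop is quadratic per pass, which is fine for queries with a handful of nodes.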

16 Query Processing
Steps: 1. Full form 2. Satisfiability 3. Redundant nodes 4. Canonical form
Inference rules IR1 to IR13 as on slide 15.
[Figure: example partial path query with nodes r, a, b, c, d, a, c; annotation: IR4]

17 Query Processing
Steps: 1. Full form 2. Satisfiability 3. Redundant nodes 4. Canonical form
Inference rules IR1 to IR13 as on slide 15.
[Figure: example partial path query with nodes r, a, b, c, d, a, c; annotation: IR4]

18 Query Processing
Steps: 1. Full form 2. Satisfiability 3. Redundant nodes 4. Canonical form
Inference rules IR1 to IR13 as on slide 15.
[Figure: example partial path query with nodes r, a, b, c, d, a, c; annotations: IR6, IR8]

19 Query Processing
Steps: 1. Full form 2. Satisfiability 3. Redundant nodes 4. Canonical form
Inference rules IR1 to IR13 as on slide 15.
[Figure: example partial path query with nodes r, a, b, c, d, a, c]

20 Query Processing
Steps: 1. Full form 2. Satisfiability 3. Redundant nodes 4. Canonical form
A query is unsatisfiable if its full form contains a trivial cycle.
[Figure: a cycle between two nodes x and y; example partial path query with nodes r, a, b, c, d, a, c]
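A satisfiability check of this kind might be sketched as follows. The sketch reads "trivial cycle" as a pair of nodes x, y with both x//y and y//x derivable, which is an assumption based on the figure, and it derives edges with IR2 and IR3 only rather than the full rule set. Function names are hypothetical.

```python
def descendant_closure(child_edges, desc_edges):
    """Close the edges under IR2 (child implies descendant) and IR3 (transitivity)."""
    closure = set(child_edges) | set(desc_edges)
    changed = True
    while changed:
        changed = False
        for x, y in list(closure):
            for y2, z in list(closure):
                if y == y2 and (x, z) not in closure:
                    closure.add((x, z))
                    changed = True
    return closure

def is_unsatisfiable(child_edges, desc_edges):
    """A cycle shows up in the closure as a node that is its own descendant."""
    closure = descendant_closure(child_edges, desc_edges)
    return any(x == y for x, y in closure)

print(is_unsatisfiable(set(), {("x", "y"), ("y", "x")}))  # cycle between x and y
print(is_unsatisfiable({("r", "a")}, {("a", "b")}))       # plain path, no cycle
```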

21 Query Processing
Steps: 1. Full form 2. Satisfiability 3. Redundant nodes 4. Canonical form
A node y is redundant if one of the following patterns occurs: a), b), c).
[Figure: three patterns a), b), c) over nodes x, y, z; example partial path query with nodes r, a, b, c, d, a, c]

22 Query Processing
Steps: 1. Full form 2. Satisfiability 3. Redundant nodes 4. Canonical form
canonical form of satisfiable query = full form – IR2 – IR3 – redundant nodes
[Figure: example query after processing, with nodes r, a, b, c, d, a]

23 Canonical Form
partial tree-pattern query: directed acyclic graph with same-path constraints
partial path query: directed acyclic graph with same-path constraint
[Figure: one example of each, both over nodes r, b, c, d, e]

24 Conclusions (up to now)
Need for queries with partial structure
We introduce partial queries
Partial queries can be expressed in XPath
We can process any partial query into a DAG

25 Flexible Search on XML Data
Outline: Partial queries, Query processing, Query evaluation, Query containment, Experiments, Conclusion

26 Evaluation Algorithms
Partial Path Queries
PQGen: Produce path queries
PathJoin: Decompose into paths
PartialMJ: Decompose into spanning tree paths
PartialPathStack: novel holistic
Partial Tree-Pattern Queries
TPQGen: Produce TPQs
PPJoin: Decompose into PPs
PartialTreeStack: novel holistic
[Figure: example partial tree-pattern query and example partial path query, both over nodes r, b, c, d, e]

27 Partial Path Queries: PQGen
Producing all possible path queries…
1. Produce all possible path queries
2. Evaluate paths using existing algorithms
3. Keep all results
Partial path query over nodes r, b, c, d, e. Path queries produced:
r b d c e
r b d e c
r d b c e
r d b e c
r d e b c
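Step 1 can be pictured as enumerating every total order of the query nodes that respects the precedence edges of the query graph. The sketch below does this by brute force. The edge set is an assumption inferred from the root-to-leaf paths shown on the PathJoin slide (r-b-c, r-d-c, r-d-e), all edges are treated as plain precedence constraints, and the choice between child and descendant edges in each produced path is not modelled.

```python
from itertools import permutations

# Assumed edges of the example query: (x, y) means x must appear above y.
EDGES = [("r", "b"), ("r", "d"), ("b", "c"), ("d", "c"), ("d", "e")]

def path_queries(edges):
    """Return every ordering of the nodes that is consistent with all edges."""
    nodes = sorted({n for edge in edges for n in edge})
    result = []
    for order in permutations(nodes):
        position = {n: i for i, n in enumerate(order)}
        if all(position[x] < position[y] for x, y in edges):
            result.append(order)
    return result

for path in path_queries(EDGES):
    print(" ".join(path))
```

Under these assumptions the enumeration yields the five orderings listed on the slide. The number of orderings grows quickly with the number of unordered node pairs, which is the "many queries to evaluate" cost of this approach.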

30 Partial Path Queries: PathJoin
Decomposing into root-to-leaf paths…
1. Decompose into root-to-leaf paths
2. Evaluate paths using existing algorithms
3. Join conditions (identity, path)
Partial path query over nodes r, b, c, d, e. Root-to-leaf paths:
r b c
r d c
r d e
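The decomposition step can be sketched as a depth-first walk over the query graph. The edge list below is an assumption read off the three paths on the slide, and `root_to_leaf_paths` is a hypothetical helper name; evaluation of each path and the join step are not shown.

```python
# Assumed edges of the example query graph (a DAG rooted at r).
EDGES = [("r", "b"), ("r", "d"), ("b", "c"), ("d", "c"), ("d", "e")]

def root_to_leaf_paths(edges, root):
    """List every path from `root` to a node without outgoing edges."""
    children = {}
    for x, y in edges:
        children.setdefault(x, []).append(y)

    def walk(node, prefix):
        path = prefix + [node]
        if node not in children:      # leaf of the query graph
            return [path]
        found = []
        for child in children[node]:
            found.extend(walk(child, path))
        return found

    return walk(root, [])

print(root_to_leaf_paths(EDGES, "r"))
```

Node c appears in two of the paths and d in two of them, so the partial results have to be joined on node identity, which is where the path overlaps arise.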

33 Partial Path Queries: PartialMJ
Using a spanning tree…
1. Create a spanning tree of the query
2. Decompose into root-to-leaf paths
3. Evaluate paths using an extension of PathStack
4. Join conditions (identity, structural, path)
Partial path query over nodes r, b, c, d, e. Root-to-leaf paths of the spanning tree:
r b c
r d e
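Steps 1 and 2 might look like the sketch below: a breadth-first traversal keeps the first edge that reaches each node as a tree edge and sets the remaining edges aside as conditions to check at join time. The edge list is an assumption inferred from the slides, the traversal order is one arbitrary choice among several valid spanning trees, and the helper names are hypothetical.

```python
from collections import deque

# Assumed edges of the example query graph (a DAG rooted at r).
EDGES = [("r", "b"), ("r", "d"), ("b", "c"), ("d", "c"), ("d", "e")]

def spanning_tree(edges, root):
    """Split the edges into tree edges and leftover (non-tree) edges."""
    children = {}
    for x, y in edges:
        children.setdefault(x, []).append(y)
    seen = {root}
    tree, extra = [], []
    queue = deque([root])
    while queue:
        x = queue.popleft()
        for y in children.get(x, []):
            if y in seen:
                extra.append((x, y))   # checked later as a join condition
            else:
                seen.add(y)
                tree.append((x, y))
                queue.append(y)
    return tree, extra

def root_to_leaf_paths(edges, root):
    """List every path from `root` to a node without outgoing edges."""
    children = {}
    for x, y in edges:
        children.setdefault(x, []).append(y)

    def walk(node, prefix):
        path = prefix + [node]
        if node not in children:
            return [path]
        return [p for child in children[node] for p in walk(child, path)]

    return walk(root, [])

tree, extra = spanning_tree(EDGES, "r")
print(root_to_leaf_paths(tree, "r"), extra)
```

With this edge order the tree paths are r-b-c and r-d-e, and the edge from d to c is the one left for the join.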

37 Partial Path Queries: PartialPathStack
[Figure: step-by-step trace of PathStack on the path query r, b, d, c, e (one leaf node) next to PartialPathStack on the partial path query over r, b, d, c, e (several leaf nodes). Both run on an example tree with nodes r, b1, d1, d2, c1, c2, e1, e2 and use stacks S_r, S_b, S_d, S_c, S_e.]
Results (PathStack): none. Results (PartialPathStack): none.

38 Partial Path Queries: PartialPathStack
Current tree node: r. Results: none for either algorithm.

39 Partial Path Queries: PartialPathStack
Current tree node: b1. Results: none for either algorithm.

40 Partial Path Queries: PartialPathStack
Current tree node: d1. Results: none for either algorithm.

41 Partial Path Queries: PartialPathStack
Current tree node: c1. Results: none for either algorithm.

42 Partial Path Queries: PartialPathStack
Current tree node: e1.
Results (PathStack): r a1 b1 d1 c1 e1
Results (PartialPathStack): r a1 b1 d1 c1 e1

43 Partial Path Queries: PartialPathStack
Current tree node: d2.
Results (PathStack): r a1 b1 d1 c1 e1
Results (PartialPathStack): r a1 b1 d1 c1 e1

44 Partial Path Queries: PartialPathStack
Current tree node: c2.
Results (PathStack): r a1 b1 d1 c1 e1
Results (PartialPathStack): r a1 b1 d1 c1 e1, r a1 b1 d1 c2 e1

45 Partial Path Queries: PartialPathStack
Current tree node: e2.
Results (PathStack): r a1 b1 d1 c1 e1, r a1 b1 d1 c1 e2
Results (PartialPathStack): r a1 b1 d1 c1 e1, r a1 b1 d1 c2 e1, r a1 b1 d1 c1 e2

46 Partial Path Queries: PartialPathStack
PathStack [Bruno et al, 2002]: optimal for path queries, O(input + output)
PartialPathStack [Souldatos et al, 2007]: optimal for partial path queries, O(input*indegree + output*outdegree)
[Figure: the example tree, the path query and the partial path query]

47 Partial Path Queries: Comparison
Problems: Many queries to evaluate; Path overlaps; Intermediate results
Algorithms: PQGen (path queries); PathJoin (decomposition into paths); PartialMJ (spanning tree); PartialPathStack
[Table: which problems affect which algorithm; the marks are not preserved]

48 Evaluation Algorithms
Partial Path Queries
PQGen: Produce path queries
PathJoin: Decompose into paths
PartialMJ: Decompose into spanning tree paths
PartialPathStack: novel holistic
Partial Tree-Pattern Queries
TPQGen: Produce TPQs
PartialPathJoin: Decompose into PPs
PartialTreeStack: novel holistic
[Figure: example partial tree-pattern query and example partial path query, both over nodes r, b, c, d, e]

49 Partial Tree-Pattern Queries: TPQGen
Producing all possible tree-pattern queries…
1. Produce all possible tree-pattern queries
2. Evaluate queries using existing algorithms
3. Keep all results
[Figure: partial tree-pattern query over nodes r, b, c, d, e and two tree-pattern queries produced from it]

52 Partial Tree-Pattern Queries: PartialPathJoin
Decomposing into partial paths…
1. Decompose into partial paths
2. Evaluate partial paths using PartialPathStack
3. Join conditions (identity)
[Figure: partial tree-pattern query over nodes r, b, c, d, e decomposed into two partial paths, one over r, d, b, c and one over r, d, e]

55 Partial Tree-Pattern Queries: PartialTreeStack
[Figure: step-by-step trace of TwigStack on a tree-pattern query over r, b, d, c, e next to PartialTreeStack on a partial tree-pattern query over r, d, e, b, c. Both run on an example tree with nodes r, b1, d1, d2, c1, c2, e1, e2 and use stacks S_r, S_b, S_d, S_c, S_e.]

56 Partial Tree-Pattern Queries: PartialTreeStack
Current tree node: r.

57 Partial Tree-Pattern Queries: PartialTreeStack
Current tree node: b1.

58 Partial Tree-Pattern Queries: PartialTreeStack
Current tree node: d1.

59 Partial Tree-Pattern Queries: PartialTreeStack
Current tree node: c1.
Path solutions shown: r b1 d1 c1; r d1 b1 c1

60 Partial Tree-Pattern Queries: PartialTreeStack
Current tree node: e1.
Path solutions shown: r b1 d1 c1; r d1 b1 c1; r b1 d1 e1; r d1 e1

61 Partial Tree-Pattern Queries: PartialTreeStack
Current tree node: d2.
Path solutions shown: r b1 d1 c1; r d1 b1 c1; r b1 d1 e1; r d1 e1

62 Partial Tree-Pattern Queries: PartialTreeStack
Current tree node: c2.
Path solutions shown: r b1 d1 c1, r b1 d1 c2, r b1 d2 c2; r d1 b1 c1, r d1 b1 c2, r d2 b1 c2; r b1 d1 e1; r d1 e1

63 Partial Tree-Pattern Queries: PartialTreeStack
Current tree node: e2.
Path solutions shown: r b1 d1 c1, r b1 d1 c2, r b1 d2 c2; r d1 b1 c1, r d1 b1 c2, r d2 b1 c2; r b1 d1 e1, r b1 d1 e2, r b1 d2 e2; r d1 e1, r d1 e2, r d2 e2

64 Partial Tree-Pattern Queries: PartialTreeStack
Path solutions shown: r b1 d1 c1, r b1 d1 c2, r b1 d2 c2; r d1 b1 c1, r d1 b1 c2, r d2 b1 c2; r b1 d1 e1, r b1 d1 e2, r b1 d2 e2; r d1 e1, r d1 e2, r d2 e2
Merged results: r b1 d1 c1 e1, r b1 d1 c1 e2, r b1 d1 c2 e1, r b1 d1 c2 e2, r b1 d2 c2 e2

65 Partial Tree-Pattern Queries: PartialTreeStack
TwigStack: optimal for tree-pattern queries, O(input + output)
PartialTreeStack: optimal for "small" partial tree-pattern queries, O(input*|Q|*|PP| + output*N)
|Q| = nodes + edges; |PP| = number of PPs; N = nodes
[Figure: the example tree, the tree-pattern query and the partial tree-pattern query]

66 Partial Tree-Pattern Queries: Comparison
Problems: Many queries to evaluate; Path overlaps; Intermediate results
Algorithms: TPQGen (TPQs); PartialPathJoin (decomposition into PPs); PartialTreeStack
[Table: which problems affect which algorithm; the marks are not preserved]

67 Conclusions (up to now)
Need for queries with partial structure
We introduce partial queries
Partial queries can be expressed in XPath
We can process any partial query into a DAG
We proposed algorithms for their evaluation

68 Flexible Search on XML Data
Outline: Partial queries, Query processing, Query evaluation, Query containment, Experiments, Conclusion

69 Absolute Query Containment
Q1 ⊆ Q2: each result of Q1 is a result of Q2.
[Figure: two example queries Q1 and Q2 over nodes r, a, b, c]

70 Absolute Query Containment
Q1 ⊆ Q2: each result of Q1 is a result of Q2.
Test: homomorphism from Q2 to the full form of Q1
[Figure: two example queries Q1 and Q2 over nodes r, a, b, c]
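A homomorphism test of this kind can be sketched by brute force for small queries. In the sketch below a query is a triple of node labels, child edges and descendant edges; that encoding, the function names and the two example queries are assumptions made for illustration. The full form of Q1 is approximated by closing its edges under IR2 and IR3 only, so the sketch can miss containments that need the remaining inference rules.

```python
from itertools import product

def descendant_closure(child_edges, desc_edges):
    """Approximate full form: close the edges under IR2 and IR3 only."""
    closure = set(child_edges) | set(desc_edges)
    changed = True
    while changed:
        changed = False
        for x, y in list(closure):
            for y2, z in list(closure):
                if y == y2 and (x, z) not in closure:
                    closure.add((x, z))
                    changed = True
    return closure

def contained_in(q1, q2):
    """True if some label-preserving mapping from q2's nodes to q1's nodes sends
    child edges to child edges and descendant edges to derivable descendant edges."""
    labels1, child1, desc1 = q1
    labels2, child2, desc2 = q2
    full_desc1 = descendant_closure(child1, desc1)
    nodes2 = list(labels2)
    candidates = [[n1 for n1 in labels1 if labels1[n1] == labels2[n2]]
                  for n2 in nodes2]
    for image in product(*candidates):
        h = dict(zip(nodes2, image))
        if (all((h[x], h[y]) in child1 for x, y in child2)
                and all((h[x], h[y]) in full_desc1 for x, y in desc2)):
            return True
    return False

# Hypothetical queries: Q1 is r/a/b, Q2 is r//b.
Q1 = ({"r": "r", "a1": "a", "b1": "b"}, {("r", "a1"), ("a1", "b1")}, set())
Q2 = ({"r": "r", "b1": "b"}, set(), {("r", "b1")})
print(contained_in(Q1, Q2), contained_in(Q2, Q1))
```

Here Q1 is contained in Q2 because r//b is derivable in Q1, while the reverse test fails since Q2 has no node labelled a.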

72 Absolute Query Containment
Q1 ⊆ Q2: each result of Q1 is a result of Q2.
Test: homomorphism from Q2 to the full form of Q1
=> Checking absolute query containment is very fast (homomorphism)
[Figure: two example queries Q1 and Q2 over nodes r, a, b, c]

73 Relative Query Containment
Some important stuff first:
1. Dimension graphs: summarize the structure of an XML tree
[Figure: an XML tree and its dimension graph]

74 Relative Query Containment
Some important stuff first:
2. Dimension trees: equivalent to a query in a specific dimension graph
[Figure: Q1 + dimension graph = DT1.1]

75 Relative Query Containment
Some important stuff first:
2. Dimension trees: equivalent to a query in a specific dimension graph
[Figure: Q2 + dimension graph = DT2.1, DT2.2]

76 Relative Query Containment
Q1 ⊆_G Q2: each result of Q1 in G is a result of Q2 in G.
[Figure: queries Q1 and Q2 and a dimension graph G]

77 Relative Query Containment
Q1 ⊆_G Q2: each result of Q1 in G is a result of Q2 in G.
Test: homomorphism from the Dimension Trees of Q2 to the Dimension Trees of Q1
[Figure: queries Q1 and Q2 and a dimension graph G]

78 Relative Query Containment
Q1 ⊆_G Q2: each result of Q1 in G is a result of Q2 in G.
Test: homomorphism from the Dimension Trees of Q2 to the Dimension Trees of Q1
[Figure: dimension trees DT2.1, DT2.2 and DT1.1 in G]

79 Relative Query Containment
Q1 ⊆_G Q2: each result of Q1 in G is a result of Q2 in G.
Test: homomorphism from the Dimension Trees of Q2 to the Dimension Trees of Q1
=> Checking relative query containment can be very slow (#dimension trees)
[Figure: dimension trees DT2.1, DT2.2 and DT1.1 in G]

80 Heuristic for Relative Containment
1. Extract info from the dimension graph
2. Add it to Q1
3. Check Q1 ⊆ Q2
[Figure: queries Q1 and Q2 and a dimension graph G]

81 Heuristic for Relative Containment
Steps 1 to 3 as on slide 80.
[Figure: step 1, information extracted from the dimension graph G]

82 Heuristic for Relative Containment
Steps 1 to 3 as on slide 80.
[Figure: step 2, the extracted information added to Q1]

83 Heuristic for Relative Containment
Steps 1 to 3 as on slide 80.
[Figure: step 3, the check Q1 ⊆ Q2 succeeds (OK)]

84 Flexible Search on XML Data
Outline: Partial queries, Query processing, Query evaluation, Query containment, Experiments, Conclusion

85 Queries Used in the Experiments
[Figure: four query graphs over nodes r, a, b, c, d, e, f, labelled Q1/Q5, Q2/Q6, Q3/Q7 and Q4/Q8]

86 Query Evaluation
Execution time on Treebank (2.5 million nodes)
[Chart not preserved]

87 Query Evaluation
Execution time on Treebank (2.5 million nodes)
[Chart not preserved; annotation: path queries]

88 Query Evaluation
Execution time on Treebank (2.5 million nodes)
[Chart not preserved; annotation: too many results]

89 Query Evaluation
Execution time on synthetic data (2.5 million nodes, IBM AlphaWorks XML generator)
[Chart not preserved]

90 Query Evaluation
Execution time varying the size of the XML tree
[Charts not preserved: PartialMJ and PartialPathStack on queries Q2, Q3 and Q7]

91 Query Containment
Execution time varying the graph size
[Chart not preserved: Time (sec) against Number of Graph Paths for Relative Containment, On-The-Fly Heuristic and Precomputed Heuristics]
Heuristic accuracy: > 98%, > 90%, > 78%, > 60%

92 Query Containment
Execution time varying the query size
[Chart not preserved: Time (sec) against Number of Nodes per Query Path for Relative Containment, On-The-Fly Heuristic and Precomputed Heuristics]
Heuristic accuracy: > 98%, > 79%, > 39%, > 32%

93 Conclusions (up to now)
Need for queries with partial structure
We introduce partial queries
Partial queries can be expressed in XPath
We can process any partial query into a DAG
We proposed algorithms for their evaluation
We showed that our algorithms for evaluation and containment outperform other techniques

94 Flexible Search on XML Data
Outline: Partial queries, Query processing, Query evaluation, Query containment, Experiments, Conclusion

95 Conclusions
Need for queries with partial structure
We introduce partial queries
Partial queries can be expressed in XPath
We can process any partial query into a DAG
We proposed algorithms for their evaluation
We showed that our algorithms for evaluation and containment outperform other techniques

96 Contribution
Columns: Partial Path Queries; Partial Tree-Pattern Queries
Evaluation: CIKM '07, WWW '08, EDBT '09??
Containment: SSDBM '06, VLDB Journal '08
Heuristics for Containment: CIKM '06, CIKM '08

97 Publications: QUERY EVALUATION
Stefanos Souldatos, Xiaoying Wu, Dimitri Theodoratos, Theodore Dalamagas, Timos Sellis. Evaluation of Partial Path Queries on XML Data. 16th CIKM Conference, Lisboa, Portugal, 2007.
Xiaoying Wu, Stefanos Souldatos, Dimitri Theodoratos, Theodore Dalamagas, Timos Sellis. Efficient Evaluation of Generalized Path Pattern Queries on XML Data. 17th WWW Conference, Beijing, China, 2008.

98 Publications: QUERY CONTAINMENT
Dimitri Theodoratos, Theodore Dalamagas, Pawel Placek, Stefanos Souldatos, Timos Sellis. Containment of Partially Specified Tree-Pattern Queries. 18th SSDBM Conference, Vienna, Austria, 2006.
Dimitri Theodoratos, Pawel Placek, Theodore Dalamagas, Stefanos Souldatos, Timos Sellis. Containment of Partially Specified Tree-Pattern Queries in the Presence of Dimension Graphs. VLDB Journal, 2008.

99 Publications: HEURISTICS FOR CONTAINMENT
Dimitri Theodoratos, Stefanos Souldatos, Theodore Dalamagas, Pawel Placek, Timos Sellis. Heuristic Containment Check of Partial Tree-Pattern Queries in the Presence of Index Graphs. 15th CIKM Conference, Arlington, USA, 2006.
Pawel Placek, Dimitri Theodoratos, Stefanos Souldatos, Theodore Dalamagas, Timos Sellis. Heuristic Approaches for Checking Containment of Generalized Tree-Pattern Queries. 17th CIKM Conference, Napa Valley, California, USA, 2008.

100 Publications: WEB SEARCH PERSONALIZATION
Stefanos Souldatos, Theodore Dalamagas, Timos Sellis. Sailing the Web with Captain Nemo: a Personalized Metasearch Engine. Learning in Web Search Workshop, 22nd ICML Conference, Bonn, Germany, 2005.
Stefanos Souldatos, Theodore Dalamagas, Timos Sellis. Captain Nemo: A Metasearch Engine with Personalized Hierarchical Search Space. Informatica Journal, 2006.
Stefanos Souldatos, Theodore Dalamagas, Timos Sellis. Sailing the Web with Captain Nemo: a Personalized Metasearch Engine. Internet Search Engines (book), ICFAI University (Institute of Chartered Financial Analysts of India), 2007. Reprint of the publication in Learning in Web Search Workshop.

101 Questions?
Outline: Partial queries, Query processing, Query evaluation, Query containment, Experiments, Conclusion

