FLEXIBLE SEARCH ON XML DATA. STEFANOS SOULDATOS
FLEXIBLE SEARCH ON XML DATA. Outline: Partial queries; Query processing; Query evaluation; Query containment; Experiments; Conclusion
3 Difficulties on Querying XML Data. [Figure: XML tree of hotel data; node labels include Hotels, City, Island, Location, Center, Athens, Creta, Chania, Heraklio and Poros]
4 Difficulties on Querying XML Data. Search problem: Name: Xiaoying Wu; Place: Athens Center, Heraklio; Purpose: Sightseeing; Problem: structural difference. [Figure: the hotel XML tree of slide 3, with pictures captioned Parthenon (438 BC) and Phaistos' Disk (1700 BC)]
5 Difficulties on Querying XML Data. Search problem: Name: Theodore Dalamagas; Place: Islands; Purpose: Sea sports; Problem: structural inconsistency. [Figure: the hotel XML tree of slide 3, with pictures captioned Windsurf and Jet ski]
6 Difficulties on Querying XML Data. Search problem: Name: Dimitri Theodoratos; Place: Heraklio; Purpose: HDMS Conference; Problem: unknown structure. [Figure: the hotel XML tree of slide 3, with the caption HDMS 2008]
7 Difficulties on Querying XML Data. Search problem: Name: Stefanos Souldatos; Place: Any island; Purpose: Escape from PhD!; Problem: multiple sources. [Figure: the sources theHotel.gr, hotels.gr and holidays.gr, with the caption 1400 islands]
8 Difficulties on Querying XML Data. Can we use existing query languages (XPath, XQuery) to express our queries? Can we use existing techniques to evaluate our queries? [Figure: the hotel XML tree of slide 3]
9 Partial Queries in XPath
1. //Hotels[descendant-or-self::*[ancestor-or-self::City][ancestor-or-self::Athens]]
2. //Hotels[/City[descendant-or-self::*[ancestor-or-self::Athens]]]
3. //Hotels[/City//Athens]
4. //Hotels[/City[descendant-or-self::*[ancestor-or-self::Athens]]][//City[descendant-or-self::*[ancestor-or-self::Island]]]
5. //Hotels[/City//Athens][/City//Island]
[Figure: queries 1 to 5 drawn as graphs over the nodes Hotels, City, Athens and Island, placed on an axis running from 0% to 100% specified structure and grouped under the headings Path queries and Tree-pattern queries]
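Query 1 only asks that Hotels, City and Athens lie on one root-to-leaf path, in any order. The following is a minimal Python sketch of that same-path test. It walks the tree directly rather than calling an XPath engine, because the standard-library xml.etree.ElementTree module does not implement the ancestor-or-self axis. The sample document and the helper names paths_to_nodes and on_same_path are illustrative assumptions, not part of the original slides.

```python
import xml.etree.ElementTree as ET


def paths_to_nodes(root):
    """Yield, for every node, the list of tags on the path from the root to it."""
    stack = [(root, [root.tag])]
    while stack:
        node, path = stack.pop()
        yield path
        for child in node:
            stack.append((child, path + [child.tag]))


def on_same_path(root, labels):
    """True if some root-to-node path contains all labels, in any order."""
    wanted = set(labels)
    return any(wanted <= set(path) for path in paths_to_nodes(root))


# Hypothetical document, loosely modelled on the hotel example.
DOC = ET.fromstring(
    "<Hotels>"
    "<City><Athens><Center/></Athens></City>"
    "<Island><Creta><City><Heraklio/></City></Creta></Island>"
    "</Hotels>"
)

print(on_same_path(DOC, ["Hotels", "City", "Athens"]))
print(on_same_path(DOC, ["Hotels", "Island", "Athens"]))
```

In this sample, City sits above Athens in one branch and below Island in another. The order-free test accepts both placements, whereas a fully specified path query would accept only one of them.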
10 Partial Queries. Legend: root node r (optional); query node labelled by "a"; child relationship; descendant relationship. [Figure: example partial query over nodes r, a, b, c, d]
11 Conclusions (up to now): Need for queries with partial structure; We introduce partial queries; Partial queries can be expressed in XPath
FLEXIBLE SEARCH ON XML DATA. Outline: Partial queries; Query processing; Query evaluation; Query containment; Experiments; Conclusion
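13 Query Processing. Pipeline: partial path query -> QUERY PROCESSING -> partial path query in canonical form -> QUERY EVALUATION. [Figure: example query over nodes r, a, b, c, d before and after processing]
14 Query Processing. Steps: 1. Full form 2. Satisfiability 3. Redundant nodes 4. Canonical form. [Figure: example query over nodes r, a, b, c, d]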
15 Query Processing. Steps: 1. Full form 2. Satisfiability 3. Redundant nodes 4. Canonical form
INFERENCE RULES (x, y, z, w: query nodes; ai, bj: nodes labelled by a, b)
(IR1) |- r//ai
(IR2) x/y |- x//y
(IR3) x//y, y//z |- x//z
(IR4) x/ai, x//bj |- ai//bj
(IR5) ai/x, bj//x |- bj//ai
(IR6) x/y, y/w, x//z, z//w |- x/z
(IR7) x/y, x//z, w/z, w//y |- x/z
(IR8) x/y, y/w, x/z |- z/w
(IR9) x//y, y//w, x/z |- z//w
(IR10) x/y, w/y, w/z |- x/z
(IR11) x//y, w/y, w//z |- x//z
(IR12) x/y, y/w, z/w |- x/z
(IR13) x//y, y//w, z/w |- x//z
[Figure: example query over nodes r, a, b, c, d, annotated with IR1]
16 Query Processing. Steps and inference rules IR1 to IR13 as on slide 15. [Figure: example query over nodes r, a, b, c, d, annotated with IR4]
18 Query Processing. Steps and inference rules IR1 to IR13 as on slide 15. [Figure: example query over nodes r, a, b, c, d, annotated with IR6 and IR8]
19 Query Processing. Steps and inference rules IR1 to IR13 as on slide 15. [Figure: example query over nodes r, a, b, c, d]
20 Query Processing. Steps: 1. Full form 2. Satisfiability 3. Redundant nodes 4. Canonical form. A query is unsatisfiable if its full form contains a trivial cycle. [Figure: a cycle between two nodes x and y; example query over nodes r, a, b, c, d]
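The satisfiability test can be sketched in Python as follows. This is a deliberately reduced illustration: it applies only IR1, IR2 and IR3 (not all thirteen rules), and it assumes that a "trivial cycle" means some node is derived to be its own descendant. The function names closure and is_unsatisfiable and the edge-set representation are hypothetical.

```python
def closure(nodes, child, desc, root="r"):
    """Descendant edges derivable with IR1, IR2 and IR3 only."""
    derived = set(desc) | set(child)                     # IR2: x/y |- x//y
    derived |= {(root, n) for n in nodes if n != root}   # IR1: |- r//ai
    changed = True
    while changed:                                       # IR3: x//y, y//z |- x//z
        changed = False
        for (x, y) in list(derived):
            for (y2, z) in list(derived):
                if y == y2 and (x, z) not in derived:
                    derived.add((x, z))
                    changed = True
    return derived


def is_unsatisfiable(nodes, child, desc, root="r"):
    """Assumed reading of 'trivial cycle': some node x with x//x derived."""
    return any(x == y for (x, y) in closure(nodes, child, desc, root))


# r/a, a//b: no cycle.
print(is_unsatisfiable(["r", "a", "b"], {("r", "a")}, {("a", "b")}))
# a/b together with b//a: a becomes its own descendant.
print(is_unsatisfiable(["r", "a", "b"], {("a", "b")}, {("b", "a")}))
```

A complete implementation would also apply IR4 to IR13, which can derive further child and descendant edges before the cycle test is run.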
21 Query Processing. Steps: 1. Full form 2. Satisfiability 3. Redundant nodes 4. Canonical form. A node y is redundant if one of the following patterns occurs: a), b), c). [Figure: patterns a), b) and c) drawn over nodes x, y, z; example query over nodes r, a, b, c, d]
22 Query Processing. Steps: 1. Full form 2. Satisfiability 3. Redundant nodes 4. Canonical form. Canonical form of a satisfiable query = full form - IR2 - IR3 - redundant nodes. [Figure: example query over nodes r, a, b, c, d]
23 Canonical Form. Partial path query: directed acyclic graph with same-path constraint. Partial tree-pattern query: directed acyclic graph with same-path constraints. [Figure: one example graph of each kind over nodes r, b, c, d, e]
24 Conclusions (up to now): Need for queries with partial structure; We introduce partial queries; Partial queries can be expressed in XPath; We can process any partial query dag
FLEXIBLE SEARCH ON XML DATA. Outline: Partial queries; Query processing; Query evaluation; Query containment; Experiments; Conclusion
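26 Evaluation Algorithms. Partial Path Queries: PQGen (produce path queries); PathJoin (decompose into paths); PartialMJ (decompose into spanning-tree paths); PartialPathStack (novel holistic algorithm). Partial Tree-Pattern Queries: TPQGen (produce tree-pattern queries, TPQs); PartialPathJoin (decompose into partial paths, PPs); PartialTreeStack (novel holistic algorithm). [Figure: an example partial path query and an example partial tree-pattern query over nodes r, b, c, d, e]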
27 Partial Path Queries: PQGen. Producing all possible path queries: 1. Produce all possible path queries 2. Evaluate paths using existing algorithms 3. Keep all results. [Figure: the partial path query over nodes r, b, d, c, e and the five path queries produced from it]
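Step 1 of PQGen can be pictured as enumerating every total order of the query nodes that respects the edges of the query graph. The Python sketch below does this for an assumed shape of the example query (descendant edges r//b, r//d, b//c, d//c, d//e); the function name path_queries is hypothetical, and child edges, which would additionally require adjacency, are not handled.

```python
def path_queries(nodes, edges):
    """Enumerate every total order of the nodes that places x before y for each edge (x, y)."""
    preds = {n: {x for (x, y) in edges if y == n} for n in nodes}

    def extend(prefix, remaining):
        if not remaining:
            yield tuple(prefix)
            return
        for n in sorted(remaining):
            if preds[n] <= set(prefix):      # all predecessors already placed
                yield from extend(prefix + [n], remaining - {n})

    yield from extend([], set(nodes))


# Assumed shape of the example partial path query (descendant edges only).
NODES = ["r", "b", "d", "c", "e"]
EDGES = [("r", "b"), ("r", "d"), ("b", "c"), ("d", "c"), ("d", "e")]

for order in path_queries(NODES, EDGES):
    print("//".join(order))
```

Under these assumptions the enumeration yields five orders, which matches the number of path queries drawn on the slide. The number of orders can grow factorially with the number of unordered nodes, which is the "many queries to evaluate" cost of this approach.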
30 Partial Path Queries: PathJoin. Decomposing into root-to-leaf paths: 1. Decompose into root-to-leaf paths 2. Evaluate paths using existing algorithms 3. Join conditions (identity, path). [Figure: the partial path query over nodes r, b, d, c, e and its three root-to-leaf paths: r b c, r d c, r d e]
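The decomposition step of PathJoin can be sketched as a walk over the query graph that emits one path per leaf. The Python sketch below uses the same assumed edge set for the example query as the slide's three paths suggest; root_to_leaf_paths and shared_nodes are hypothetical helper names. The second helper only lists the query nodes that appear in more than one path, that is, the nodes on which identity join conditions would be needed; the joins themselves are not implemented.

```python
from collections import Counter


def root_to_leaf_paths(edges, root="r"):
    """Return every path from the root to a node without outgoing edges."""
    children = {}
    for x, y in edges:
        children.setdefault(x, []).append(y)

    def walk(node, prefix):
        path = prefix + [node]
        if node not in children:         # leaf: no outgoing edge
            yield tuple(path)
            return
        for nxt in children[node]:
            yield from walk(nxt, path)

    return list(walk(root, []))


def shared_nodes(paths):
    """Query nodes that occur in more than one path."""
    counts = Counter(n for p in paths for n in p)
    return sorted(n for n, k in counts.items() if k > 1)


# Assumed shape of the example partial path query.
EDGES = [("r", "b"), ("r", "d"), ("b", "c"), ("d", "c"), ("d", "e")]

PATHS = root_to_leaf_paths(EDGES)
print(PATHS)
print(shared_nodes(PATHS))
```

Because c and d each appear in two of the three paths, the same data is matched more than once, which corresponds to the "path overlaps" problem listed in the comparison slide.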
33 Partial Path Queries: PartialMJ. Using a spanning tree: 1. Create a spanning tree of the query 2. Decompose into root-to-leaf paths 3. Evaluate paths using an extension of PathStack 4. Join conditions (identity, structural, path). [Figure: the partial path query over nodes r, b, d, c, e, a spanning tree of it, and the two root-to-leaf paths of that tree: r b c, r d e]
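Step 1 of PartialMJ can be illustrated by keeping one incoming edge per query node. The Python sketch below is one simple way to obtain a spanning tree, assuming the query graph is acyclic and the root is its only node without incoming edges; it is not necessarily how the original algorithm chooses the tree. The function name spanning_tree and the edge order are assumptions.

```python
def spanning_tree(edges):
    """Keep the first incoming edge of every node; return (tree_edges, left_over_edges)."""
    has_parent = set()
    tree, rest = [], []
    for x, y in edges:
        if y in has_parent:
            rest.append((x, y))      # becomes a join condition checked later
        else:
            has_parent.add(y)
            tree.append((x, y))
    return tree, rest


# Assumed shape of the example partial path query.
EDGES = [("r", "b"), ("r", "d"), ("b", "c"), ("d", "c"), ("d", "e")]

TREE, REST = spanning_tree(EDGES)
print(TREE)
print(REST)
```

With this edge order the tree has the two root-to-leaf paths r b c and r d e, as on the slide, and the left-over edge from d to c is the relationship that has to be enforced when the path results are joined.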
37 Partial Path Queries: PartialPathStack. [Animation, slides 37 to 45: PathStack evaluating a path query and PartialPathStack evaluating a partial path query, both over query nodes r, b, d, c, e, on a data tree with nodes r, b1, d1, d2, c1, c2, e1, e2 and with one stack per query node (Sr, Sb, Sd, Sc, Se). Data nodes are pushed in the order r, b1, d1, c1, e1, d2, c2, e2. The path query has a single leaf node; the partial path query has several leaf nodes.]
Results after the last step. PathStack: r a1 b1 d1 c1 e1; r a1 b1 d1 c1 e2. PartialPathStack: r a1 b1 d1 c1 e1; r a1 b1 d1 c2 e1; r a1 b1 d1 c1 e2.
46 Partial Path Queries: PartialPathStack. PathStack: optimal for path queries, O(input + output) [Bruno et al, 2002]. PartialPathStack: optimal for partial path queries, O(input*indegree + output*outdegree) [Souldatos et al, 2007]. [Figure: data tree with nodes r, b1, d1, d2, c1, c2, e1, e2; the path query and the partial path query over r, b, d, c, e]
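47 Partial Path Queries: Comparison. Algorithms: PQGen (path queries); PathJoin (decomposition into paths); PartialMJ (spanning tree); PartialPathStack. Problems compared: many queries to evaluate; path overlaps; intermediate results. [Table: algorithms against problems; the cell marks did not survive extraction]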
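As a reference point for what any of these four algorithms should return, the semantics of a partial path query can be written as a brute-force search: try every label-compatible assignment of data nodes to query nodes, and keep those that satisfy every descendant edge and place all matched nodes on one path. The Python sketch below is not any of the algorithms above and makes no claim about their internals. The data tree, the parent-map encoding and the names ancestors and evaluate are assumptions, and only descendant edges are supported.

```python
from itertools import product


def ancestors(node, parent):
    """All proper ancestors of a data node, nearest first."""
    out = []
    while node in parent:
        node = parent[node]
        out.append(node)
    return out


def evaluate(query_nodes, edges, label, parent):
    """Brute force: keep assignments that satisfy every edge and the same-path constraint."""
    candidates = [[d for d in sorted(label) if label[d] == q] for q in query_nodes]
    results = []
    for combo in product(*candidates):
        m = dict(zip(query_nodes, combo))
        edges_ok = all(m[x] in ancestors(m[y], parent) for x, y in edges)
        same_path = all(
            a == b or a in ancestors(b, parent) or b in ancestors(a, parent)
            for a in combo for b in combo
        )
        if edges_ok and same_path:
            results.append(combo)
    return results


# Hypothetical data tree: r > b1 > d1, then d1 > c1 > e1 and d1 > e2 > c2.
LABEL = {"r": "r", "b1": "b", "d1": "d", "c1": "c", "e1": "e", "c2": "c", "e2": "e"}
PARENT = {"b1": "r", "d1": "b1", "c1": "d1", "e1": "c1", "e2": "d1", "c2": "e2"}

QUERY_NODES = ["r", "b", "d", "c", "e"]
EDGES = [("r", "b"), ("r", "d"), ("b", "c"), ("d", "c"), ("d", "e")]  # assumed shape

RESULTS = evaluate(QUERY_NODES, EDGES, LABEL, PARENT)
print(RESULTS)
```

In this sample the query leaves the order of c and e open, so it matches both the branch where c is above e and the branch where e is above c. A single fully ordered path query would match only one of the two.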
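48 Evaluation Algorithms. Partial Path Queries: PQGen (produce path queries); PathJoin (decompose into paths); PartialMJ (decompose into spanning-tree paths); PartialPathStack (novel holistic algorithm). Partial Tree-Pattern Queries: TPQGen (produce tree-pattern queries, TPQs); PartialPathJoin (decompose into partial paths, PPs); PartialTreeStack (novel holistic algorithm). [Figure: an example partial path query and an example partial tree-pattern query over nodes r, b, c, d, e]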
49 Partial Tree-Pattern Queries: TPQGen. Producing all possible tree-pattern queries: 1. Produce all possible tree-pattern queries 2. Evaluate queries using existing algorithms 3. Keep all results. [Figure: the partial tree-pattern query over nodes r, b, d, c, e and the tree-pattern queries produced from it]
52 Partial Tree-Pattern Queries: PartialPathJoin. Decomposing into partial paths: 1. Decompose into partial paths 2. Evaluate partial paths using PartialPathStack 3. Join conditions (identity). [Figure: the partial tree-pattern query over nodes r, b, d, c, e and its two partial paths, one over r, d, b, c and one over r, d, e]
55 Partial Tree-Pattern Queries: PartialTreeStack. [Animation, slides 55 to 64: TwigStack evaluating a tree-pattern query and PartialTreeStack evaluating a partial tree-pattern query, both over query nodes r, b, d, c, e, on a data tree with nodes r, b1, d1, d2, c1, c2, e1, e2 and with one stack per query node (Sr, Sb, Sd, Sc, Se). Data nodes are pushed in the order r, b1, d1, c1, e1, d2, c2, e2.]
Solutions shown on the last frame: r b1 d1 c1; r b1 d1 c2; r b1 d2 c2 | r b1 d1 e1; r b1 d1 e2; r b1 d2 e2 | r b1 d1 c1 e1; r b1 d1 c1 e2; r b1 d1 c2 e1; r b1 d1 c2 e2; r b1 d2 c2 e2 | r d1 b1 c1; r d1 b1 c2; r d2 b1 c2 | r d1 e1; r d1 e2; r d2 e2.
65 Partial Tree-Pattern Queries: PartialTreeStack. TwigStack: O(input + output), optimal for tree-pattern queries. PartialTreeStack: O(input*|Q|*|PP| + output*N), optimal for "small" partial tree-pattern queries, where |Q| = nodes + edges, |PP| = number of partial paths (PPs), N = nodes. [Figure: data tree with nodes r, b1, d1, d2, c1, c2, e1, e2; the tree-pattern query and the partial tree-pattern query over r, b, d, c, e]
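66 Partial Tree-Pattern Queries: Comparison. Algorithms: TPQGen (TPQs); PartialPathJoin (decomposition into PPs); PartialTreeStack. Problems compared: many queries to evaluate; path overlaps; intermediate results. [Table: algorithms against problems; the cell marks did not survive extraction]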
67 Conclusions (up to now): Need for queries with partial structure; We introduce partial queries; Partial queries can be expressed in XPath; We can process any partial query dag; We proposed algorithms for their evaluation
FLEXIBLE SEARCH ON XML DATA. Outline: Partial queries; Query processing; Query evaluation; Query containment; Experiments; Conclusion
69 Absolute Query Containment. Q1 ⊆ Q2: each result of Q1 is a result of Q2. Test: homomorphism from Q2 to the full form of Q1. => Checking absolute query containment is very fast (homomorphism). [Figure: queries Q1 and Q2 over nodes r, a, b, c]
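The homomorphism test can be sketched in Python as a search for a label-preserving mapping from the nodes of Q2 to the nodes of Q1 that sends child edges to child edges and descendant edges to derived descendant edges. This is a simplified illustration under stated assumptions: the full form of Q1 is approximated with IR2 and IR3 only, the search is exhaustive rather than optimised, and the dictionary layout and the names descendant_closure and contained_in are hypothetical.

```python
from itertools import product


def descendant_closure(child, desc):
    """IR2 and IR3 only: child edges imply descendant edges, which are transitive."""
    closed = set(child) | set(desc)
    changed = True
    while changed:
        changed = False
        for (x, y) in list(closed):
            for (y2, z) in list(closed):
                if y == y2 and (x, z) not in closed:
                    closed.add((x, z))
                    changed = True
    return closed


def contained_in(q1, q2):
    """True if a homomorphism from q2 into the (approximated) full form of q1 exists."""
    closed = descendant_closure(q1["child"], q1["desc"])
    nodes2 = list(q2["labels"])
    options = [
        [n1 for n1, lab in q1["labels"].items() if lab == q2["labels"][n2]]
        for n2 in nodes2
    ]
    for combo in product(*options):
        h = dict(zip(nodes2, combo))
        if all((h[x], h[y]) in q1["child"] for x, y in q2["child"]) and all(
            (h[x], h[y]) in closed for x, y in q2["desc"]
        ):
            return True
    return False


# Hypothetical queries: Q1 is r/a/b, Q2 is r//b.
Q1 = {"labels": {"r": "r", "a": "a", "b": "b"}, "child": {("r", "a"), ("a", "b")}, "desc": set()}
Q2 = {"labels": {"r": "r", "b": "b"}, "child": set(), "desc": {("r", "b")}}

print(contained_in(Q1, Q2))
print(contained_in(Q2, Q1))
```

Here Q1 is contained in Q2 because r//b follows from r/a and a/b, while the reverse test fails because Q2 has no node labelled a for Q1's node to map to.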
73 Relative Query Containment. Preliminaries: 1. Dimension graphs: summarize the structure of an XML tree. [Figure: an XML tree and its dimension graph]
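One simple way to picture a structural summary is a graph with one node per element name and one edge per parent-child pair of names that occurs in the document. The Python sketch below builds exactly that. It is a simplification made for illustration: the dimension graphs of the original work may be defined differently, and the function name summary_graph and the sample document are assumptions.

```python
import xml.etree.ElementTree as ET


def summary_graph(root):
    """Set of (parent tag, child tag) pairs that occur in the tree."""
    edges = set()
    for parent in root.iter():
        for child in parent:
            edges.add((parent.tag, child.tag))
    return edges


# Hypothetical document, loosely modelled on the hotel example.
DOC = ET.fromstring(
    "<Hotels>"
    "<City><Athens/></City>"
    "<Island><City><Heraklio/></City></Island>"
    "<City><Chania/></City>"
    "</Hotels>"
)

for edge in sorted(summary_graph(DOC)):
    print(edge)
```

The summary is typically much smaller than the document: the two City elements directly under Hotels collapse into a single edge from Hotels to City.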
74 Relative Query Containment. Preliminaries: 2. Dimension trees: equivalent to a query in a specific dimension graph. [Figure: Q1 + dimension graph = DT1.1]
75 Relative Query Containment. Preliminaries: 2. Dimension trees: equivalent to a query in a specific dimension graph. [Figure: Q2 + dimension graph = DT2.1, DT2.2]
76 Relative Query Containment. Q1 ⊆G Q2 (containment relative to the dimension graph G): each result of Q1 in G is a result of Q2 in G. Test: homomorphism from the dimension trees of Q2 to the dimension trees of Q1. => Checking relative query containment can be very slow (#dimension trees). [Figure: Q1, Q2, the dimension graph G and the dimension trees DT1.1, DT2.1, DT2.2]
80 Heuristic for Relative Containment. 1. Extract information from the dimension graph G 2. Add it to Q1 3. Check Q1 ⊆ Q2. [Animation, slides 80 to 83: Q1, Q2 and the dimension graph G; the last frame is marked OK]
FLEXIBLE SEARCH ON XML DATA. Outline: Partial queries; Query processing; Query evaluation; Query containment; Experiments; Conclusion
85 Queries Used in the Experiments. [Figure: four query shapes over nodes r, a, b, c, d, e, f, labelled Q1/Q5, Q2/Q6, Q3/Q7 and Q4/Q8]
86 Query Evaluation. Execution time on Treebank (2.5 million nodes). [Chart: execution times; annotations on later frames: "path queries", "too many results"]
89 Query Evaluation. Execution time on synthetic data (2.5 million nodes, IBM AlphaWorks XML generator). [Chart: execution times]
90 Query Evaluation. Execution time varying the size of the XML tree. [Charts for Q2, Q3 and Q7, with curves labelled PartialMJ and PartialPathStack]
91 Query Containment. Execution time varying the graph size. [Chart: time (sec) against number of graph paths, for Relative Containment, On-The-Fly Heuristic and Precomputed Heuristics; heuristic accuracy labels: > 98%, > 90%, > 78%, > 60%]
92 Query Containment. Execution time varying the query size. [Chart: time (sec) against number of nodes per query path, for Relative Containment, On-The-Fly Heuristic and Precomputed Heuristics; heuristic accuracy labels: > 98%, > 79%, > 39%, > 32%]
93 Conclusions (up to now): Need for queries with partial structure; We introduce partial queries; Partial queries can be expressed in XPath; We can process any partial query dag; We proposed algorithms for their evaluation; We showed that our algorithms for evaluation and containment outperform other techniques
FLEXIBLE SEARCH ON XML DATA. Outline: Partial queries; Query processing; Query evaluation; Query containment; Experiments; Conclusion
95 Conclusions: Need for queries with partial structure; We introduce partial queries; Partial queries can be expressed in XPath; We can process any partial query dag; We proposed algorithms for their evaluation; We showed that our algorithms for evaluation and containment outperform other techniques
96 Contribution. [Table with columns Partial Path Queries and Partial Tree-Pattern Queries; the column placement of the entries did not survive extraction]
Evaluation: CIKM '07, WWW '08, EDBT '09??
Containment: SSDBM '06, VLDB Journal '08
Heuristics for Containment: CIKM '06, CIKM '08
97 Publications: QUERY EVALUATION
Stefanos Souldatos, Xiaoying Wu, Dimitri Theodoratos, Theodore Dalamagas, Timos Sellis. Evaluation of Partial Path Queries on XML Data. 16th CIKM Conference, Lisboa, Portugal.
Xiaoying Wu, Stefanos Souldatos, Dimitri Theodoratos, Theodore Dalamagas, Timos Sellis. Efficient Evaluation of Generalized Path Pattern Queries on XML Data. 17th WWW Conference, Beijing, China, 2008.
98 Publications: QUERY CONTAINMENT
Dimitri Theodoratos, Theodore Dalamagas, Pawel Placek, Stefanos Souldatos, Timos Sellis. Containment of Partially Specified Tree-Pattern Queries. 18th SSDBM Conference, Vienna, Austria.
Dimitri Theodoratos, Pawel Placek, Theodore Dalamagas, Stefanos Souldatos, Timos Sellis. Containment of Partially Specified Tree-Pattern Queries in the Presence of Dimension Graphs. VLDB Journal, 2008.
99 Publications: HEURISTICS FOR CONTAINMENT
Dimitri Theodoratos, Stefanos Souldatos, Theodore Dalamagas, Pawel Placek, Timos Sellis. Heuristic Containment Check of Partial Tree-Pattern Queries in the Presence of Index Graphs. 15th CIKM Conference, Arlington, USA.
Pawel Placek, Dimitri Theodoratos, Stefanos Souldatos, Theodore Dalamagas, Timos Sellis. Heuristic Approaches for Checking Containment of Generalized Tree-Pattern Queries. 17th CIKM Conference, Napa Valley, California, USA, 2008.
100 Publications: WEB SEARCH PERSONALIZATION
Stefanos Souldatos, Theodore Dalamagas, Timos Sellis. Sailing the Web with Captain Nemo: a Personalized Metasearch Engine. Learning in Web Search Workshop, 22nd ICML Conference, Bonn, Germany.
Stefanos Souldatos, Theodore Dalamagas, Timos Sellis. Captain Nemo: A Metasearch Engine with Personalized Hierarchical Search Space. Informatica Journal.
Stefanos Souldatos, Theodore Dalamagas, Timos Sellis. Sailing the Web with Captain Nemo: a Personalized Metasearch Engine. Internet Search Engines (book), ICFAI University (Institute of Chartered Financial Analysts of India). Reprint of the publication in Learning in Web Search Workshop, 2007.
Questions?