Flexible Search in XML Data. Stefanos Souldatos.

Presentation transcript:

Flexible Search in XML Data. Stefanos Souldatos

Flexible Search in XML Data. Outline: Partial queries, Query processing, Query evaluation, Query containment, Experiments, Conclusion

3 Difficulties in Querying XML Data. [Figure: XML tree of hotel data with node labels Hotels, Location, Island, City, Center, Creta, Chania, Heraklio, Athens, Poros.]

4 Difficulties in Querying XML Data. Search problem. Name: Xiaoying Wu. Place: Athens Center, Heraklio. Purpose: Sightseeing. Problem: structural difference. [Figure: the hotel XML tree, with pictures of the Parthenon (438 BC) and Phaistos' Disk (1700 BC).]

5 Difficulties in Querying XML Data. Search problem. Name: Theodore Dalamagas. Place: Islands. Purpose: Sea sports. Problem: structural inconsistency. [Figure: the hotel XML tree, with pictures labelled Windsurf and Jet ski.]

6 Difficulties in Querying XML Data. Search problem. Name: Dimitri Theodoratos. Place: Heraklio. Purpose: HDMS Conference. Problem: unknown structure. [Figure: the hotel XML tree, with the label HDMS 2008.]

7 Difficulties in Querying XML Data. Search problem. Name: Stefanos Souldatos. Place: Any island. Purpose: Escape from PhD! Problem: multiple sources. [Figure: sources theHotel.gr, hotels.gr and holidays.gr; 1400 islands.]

8 Difficulties in Querying XML Data. Can we use existing query languages (XPath, XQuery) to express our queries? Can we use existing techniques to evaluate our queries? [Figure: the hotel XML tree.]

9 Partial Queries in XPath
1. //Hotels[descendant-or-self::*[ancestor-or-self::City][ancestor-or-self::Athens]]
2. //Hotels[/City[descendant-or-self::*[ancestor-or-self::Athens]]]
3. //Hotels[/City//Athens]
4. //Hotels[/City[descendant-or-self::*[ancestor-or-self::Athens]]][//City[descendant-or-self::*[ancestor-or-self::Island]]]
5. //Hotels[/City//Athens][/City//Island]
The queries are arranged on an axis from 0% to 100% specified structure. Queries 1, 2 and 3 are path queries over Hotels, City and Athens; queries 4 and 5 are tree-pattern queries over Hotels, City, Athens and Island.
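To make the difference between the two ends of the structure axis concrete, the sketch below evaluates query 1 and query 3 with Python's standard xml.etree.ElementTree. The two sample documents and the function names are invented for illustration, and query 3 is read here as "a Hotels element with a City child that has an Athens descendant", which is an assumption about the intended pattern.

```python
import xml.etree.ElementTree as ET

def ancestors_or_self(node, parent):
    """Yield node, then its parent, grandparent, ... up to the root."""
    while node is not None:
        yield node
        node = parent.get(node)

def query1(root):
    """Query 1: Hotels, City and Athens lie on one path, in any order."""
    parent = {child: p for p in root.iter() for child in p}
    hits = []
    for hotels in root.iter("Hotels"):
        for n in hotels.iter():  # descendant-or-self of Hotels
            tags = {a.tag for a in ancestors_or_self(n, parent)}
            if {"City", "Athens"} <= tags:
                hits.append(hotels)
                break
    return hits

def query3(root):
    """Query 3 (assumed reading): Hotels / City // Athens."""
    return [h for h in root.iter("Hotels")
            if any(c.find(".//Athens") is not None for c in h.findall("City"))]

# Two hypothetical sources that organise the same information differently.
DOC_A = ET.fromstring("<Hotels><City><Athens><Center/></Athens></City></Hotels>")
DOC_B = ET.fromstring("<Location><Athens><City><Hotels/></City></Athens></Location>")

for name, doc in (("A", DOC_A), ("B", DOC_B)):
    print(name, len(query1(doc)), len(query3(doc)))
```

The fully structured query 3 matches only document A, while the structure-free query 1 matches both documents.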

10 Partial Queries. Legend: root node (optional); query node labelled by "a"; child relationship; descendant relationship. [Figure: example partial query over nodes r, a, b, c, d.]

11 Conclusions (up to now). There is a need for queries with partial structure. We introduce partial queries. Partial queries can be expressed in XPath.

Flexible Search in XML Data. Outline: Partial queries, Query processing, Query evaluation, Query containment, Experiments, Conclusion

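13 Query Processing. Query processing turns a partial path query into a partial path query in canonical form, which is then passed to query evaluation. [Figure: example query over nodes r, a, b, c, d before and after processing.]

14 Query Processing. Steps: 1. Full form 2. Satisfiability 3. Redundant nodes 4. Canonical form. [Figure: example query over nodes r, a, b, c, d.]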

15 Query Processing, step 1 (full form). Inference rules; IR1 is applied to the example query.
(IR1) |- r//ai
(IR2) x/y |- x//y
(IR3) x//y, y//z |- x//z
(IR4) x/ai, x//bj |- ai//bj
(IR5) ai/x, bj//x |- bj//ai
(IR6) x/y, y/w, x//z, z//w |- x/z
(IR7) x/y, x//z, w/z, w//y |- x/z
(IR8) x/y, y/w, x/z |- z/w
(IR9) x//y, y//w, x/z |- z//w
(IR10) x/y, w/y, w/z |- x/z
(IR11) x//y, w/y, w//z |- x//z
(IR12) x/y, y/w, z/w |- x/z
(IR13) x//y, y//w, z/w |- x//z
Here x, y, z, w are query nodes, and ai, bj are nodes labelled by a and b respectively.

16-17 Query Processing, step 1 (full form). Rule IR4 is applied to the example query.

18 Query Processing, step 1 (full form). Rules IR6 and IR8 are applied to the example query.

19 Query Processing, step 1 (full form). [Figure: the example query over nodes r, a, b, c, d, with no rule highlighted.]

20 Query Processing, step 2 (satisfiability). A query is unsatisfiable if its full form contains a trivial cycle. [Figure: a cycle between two nodes x and y.]
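A minimal sketch of the first two steps, under strong simplifications: it applies only IR2 and IR3 until nothing new is derived, and it assumes that a "trivial cycle" means a node derived to be a proper descendant of itself. The remaining rules (IR1, IR4 to IR13) are not implemented, and the function names and example queries are hypothetical.

```python
def full_descendant_edges(child, desc):
    """Close the descendant relation under IR2 and IR3 only."""
    derived = set(desc) | set(child)          # IR2: x/y |- x//y
    changed = True
    while changed:                            # IR3: x//y, y//z |- x//z
        changed = False
        for x, y in list(derived):
            for y2, z in list(derived):
                if y == y2 and (x, z) not in derived:
                    derived.add((x, z))
                    changed = True
    return derived

def has_trivial_cycle(derived):
    """Assumed reading: some node is derived to be below itself."""
    return any(x == y for x, y in derived)

# r/a, a//b, b//c: satisfiable.
OK_QUERY = full_descendant_edges({("r", "a")}, {("a", "b"), ("b", "c")})
# r/a, a//b, b//a: a and b would each have to be below the other.
BAD_QUERY = full_descendant_edges({("r", "a")}, {("a", "b"), ("b", "a")})

print(has_trivial_cycle(OK_QUERY), has_trivial_cycle(BAD_QUERY))
```

A pair is stored as (ancestor, descendant), so the cycle test reduces to looking for a pair whose two components are equal.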

21 Query Processing, step 3 (redundant nodes). A node y is redundant if one of the patterns a), b) or c) occurs. [Figure: three patterns over nodes x, y and z.]

22 Query Processing, step 4 (canonical form). Canonical form of a satisfiable query = full form - IR2 - IR3 - redundant nodes. [Figure: the example query over nodes r, a, b, c, d in canonical form.]
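One possible reading of "full form - IR2 - IR3" is that descendant edges which IR2 or IR3 could re-derive are dropped again. The sketch below implements only that reading; it does not remove redundant nodes, and the function name and sample edges are hypothetical.

```python
def strip_derived_edges(child, full_desc):
    """Drop descendant edges that IR2 or IR3 would re-derive (assumed reading)."""
    reach = set(child) | set(full_desc)
    nodes = {n for edge in reach for n in edge}
    kept = set()
    for x, z in full_desc:
        if (x, z) in child:                   # derivable by IR2 from x/z
            continue
        if any((x, y) in reach and (y, z) in reach for y in nodes):
            continue                          # derivable by IR3 via some y
        kept.add((x, z))
    return kept

CHILD = {("r", "a")}
FULL = {("r", "a"), ("a", "b"), ("b", "c"),
        ("r", "b"), ("a", "c"), ("r", "c")}
CANONICAL_DESC = strip_derived_edges(CHILD, FULL)
print(sorted(CANONICAL_DESC))
```

For an acyclic full form this amounts to a transitive reduction of the descendant edges, leaving r/a, a//b and b//c in the example.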

23 Canonical Form. The canonical form of a partial path query is a directed acyclic graph with a same-path constraint. The canonical form of a partial tree-pattern query is a directed acyclic graph with same-path constraints. [Figure: one example of each over nodes r, b, c, d, e.]

24 Conclusions (up to now). There is a need for queries with partial structure. We introduce partial queries. Partial queries can be expressed in XPath. We can process any partial query into a DAG.

Flexible Search in XML Data. Outline: Partial queries, Query processing, Query evaluation, Query containment, Experiments, Conclusion

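26 Evaluation Algorithms
Partial path queries: PQGen (produce path queries); PathJoin (decompose into paths); PartialMJ (decompose into spanning-tree paths); PartialPathStack (novel holistic algorithm).
Partial tree-pattern queries: TPQGen (produce tree-pattern queries, TPQs); PartialPathJoin, abbreviated PPJoin (decompose into partial paths, PPs); PartialTreeStack (novel holistic algorithm).
[Figure: an example partial path query and an example partial tree-pattern query over nodes r, b, c, d, e.]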

27-29 Partial Path Queries: PQGen. Producing all possible path queries: 1. Produce all possible path queries. 2. Evaluate the paths using existing algorithms. 3. Keep all results. [Figure: the example partial path query over nodes r, b, d, c, e and the five path queries produced from it.]
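Step 1 of PQGen can be sketched as enumerating every total order of the query nodes that respects the edges of the query DAG. The edge set below (r to b, r to d, b to c, d to c, d to e) is an assumption read off the decompositions shown on the PathJoin and PartialMJ slides, and the function name is hypothetical; child versus descendant edges are not distinguished.

```python
def linear_extensions(nodes, edges):
    """Yield every ordering of nodes in which each edge points forward."""
    preds = {n: {x for x, y in edges if y == n} for n in nodes}

    def extend(prefix, remaining):
        if not remaining:
            yield tuple(prefix)
            return
        for n in sorted(remaining):
            if preds[n] <= set(prefix):       # all ancestors already placed
                yield from extend(prefix + [n], remaining - {n})

    yield from extend([], frozenset(nodes))

NODES = ["r", "b", "d", "c", "e"]
EDGES = [("r", "b"), ("r", "d"), ("b", "c"), ("d", "c"), ("d", "e")]
PATH_QUERIES = list(linear_extensions(NODES, EDGES))

for q in PATH_QUERIES:
    print("//".join(q))
```

For this DAG the enumeration yields five path queries, which agrees with the number of path queries drawn on the slide. The count grows quickly as the query becomes less constrained, which is the "many queries to evaluate" problem listed for PQGen in the comparison.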

30-32 Partial Path Queries: PathJoin. Decomposing into root-to-leaf paths: 1. Decompose into root-to-leaf paths. 2. Evaluate the paths using existing algorithms. 3. Join the path results (join conditions: identity, path). [Figure: the example query over nodes r, b, d, c, e and its three root-to-leaf paths over {r, b, c}, {r, d, c} and {r, d, e}.]
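The decomposition step can be sketched as a depth-first enumeration of the root-to-leaf paths of the query DAG. The edge set is the same assumed reading of the example query as elsewhere (r to b, r to d, b to c, d to c, d to e), and both function names are hypothetical. Query nodes that occur in more than one path are the ones on which an identity join is needed afterwards.

```python
from collections import Counter

def root_to_leaf_paths(edges, root):
    """List every root-to-leaf path of a DAG given as (parent, child) edges."""
    children = {}
    for x, y in edges:
        children.setdefault(x, []).append(y)

    def walk(path):
        kids = children.get(path[-1], [])
        if not kids:
            yield tuple(path)
        for k in kids:
            yield from walk(path + [k])

    return list(walk([root]))

def shared_nodes(paths):
    """Query nodes occurring in more than one path (identity join needed)."""
    counts = Counter(n for p in paths for n in p)
    return sorted(n for n, c in counts.items() if c > 1)

EDGES = [("r", "b"), ("r", "d"), ("b", "c"), ("d", "c"), ("d", "e")]
PATHS = root_to_leaf_paths(EDGES, "r")
print(PATHS)
print(shared_nodes(PATHS))
```

The paths r-d-c and r-d-e share the prefix r-d, which illustrates the "path overlaps" problem named in the comparison slide.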

33-36 Partial Path Queries: PartialMJ. Using a spanning tree: 1. Create a spanning tree of the query. 2. Decompose it into root-to-leaf paths. 3. Evaluate the paths using an extension of PathStack. 4. Join the path results (join conditions: identity, structural, path). [Figure: the example query over nodes r, b, d, c, e, its spanning tree, and the two root-to-leaf paths over {r, b, c} and {r, d, e}.]
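A sketch of steps 1 and 2, assuming a breadth-first spanning tree and the same assumed edge set for the example query (r to b, r to d, b to c, d to c, d to e). Edges left out of the tree are returned separately, since they presumably have to be checked later as structural join conditions. Function names are hypothetical, and the actual algorithm may choose its spanning tree differently.

```python
from collections import deque

def spanning_tree(edges, root):
    """Split DAG edges into breadth-first tree edges and leftover edges."""
    children = {}
    for x, y in edges:
        children.setdefault(x, []).append(y)
    seen = {root}
    tree, extra = [], []
    queue = deque([root])
    while queue:
        x = queue.popleft()
        for y in children.get(x, []):
            if y in seen:
                extra.append((x, y))          # not part of the tree
            else:
                seen.add(y)
                tree.append((x, y))
                queue.append(y)
    return tree, extra

def tree_paths(tree, root):
    """Root-to-leaf paths of the spanning tree."""
    children = {}
    for x, y in tree:
        children.setdefault(x, []).append(y)
    stack, paths = [(root,)], []
    while stack:
        path = stack.pop()
        kids = children.get(path[-1], [])
        if not kids:
            paths.append(path)
        stack.extend(path + (k,) for k in reversed(kids))
    return paths

EDGES = [("r", "b"), ("r", "d"), ("b", "c"), ("d", "c"), ("d", "e")]
TREE, EXTRA = spanning_tree(EDGES, "r")
print(tree_paths(TREE, "r"), EXTRA)
```

With this edge order the tree paths are r-b-c and r-d-e, matching the two paths on the slide, and the single leftover edge is d to c.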

37-45 Partial Path Queries: PartialPathStack. [Animation: PathStack runs on a path query over nodes r, b, d, c, e (one leaf node) and PartialPathStack runs on the partial path query over the same nodes (several leaf nodes). Both read a data tree with nodes r, b1, d1, c1, e1, d2, c2, e2 and keep one stack per query node (Sr, Sb, Sd, Sc, Se). The data nodes are pushed in the order r, b1, d1, c1, e1, d2, c2, e2. Results shown in the last frame, as printed: PathStack: r a1 b1 d1 c1 e1, r a1 b1 d1 c1 e2. PartialPathStack: r a1 b1 d1 c1 e1, r a1 b1 d1 c2 e1, r a1 b1 d1 c1 e2.]

46 Partial Path Queries: PartialPathStack. PathStack is optimal for path queries: O(input + output) [Bruno et al, 2002]. PartialPathStack is optimal for partial path queries: O(input*indegree + output*outdegree) [Souldatos et al, 2007].
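As a point of reference for what these algorithms compute, here is a naive evaluator. It is not PathStack or PartialPathStack: it simply tries every assignment of data nodes to query nodes along each root-to-leaf path of the data tree. The data tree, the label convention (a label followed by a digit suffix), the edge set of the partial query, and all function names are assumptions made for illustration.

```python
from itertools import product

def root_to_leaf_paths(children, root):
    """Yield each root-to-leaf path of a tree given as node -> child list."""
    stack = [[root]]
    while stack:
        path = stack.pop()
        kids = children.get(path[-1], [])
        if not kids:
            yield path
        for k in kids:
            stack.append(path + [k])

def label(node):
    """'d1' -> 'd'; assumed naming convention for data nodes."""
    return node.rstrip("0123456789")

def evaluate(query_nodes, desc_edges, children, root):
    """All bindings of query nodes to data nodes lying on one path."""
    results = set()
    for path in root_to_leaf_paths(children, root):
        pos = {n: i for i, n in enumerate(path)}
        candidates = [[n for n in path if label(n) == q] for q in query_nodes]
        for combo in product(*candidates):
            m = dict(zip(query_nodes, combo))
            if all(pos[m[x]] < pos[m[y]] for x, y in desc_edges):
                results.add(combo)
    return results

QUERY_NODES = ("r", "b", "d", "c", "e")
PARTIAL = [("r", "b"), ("r", "d"), ("b", "c"), ("d", "c"), ("d", "e")]
PATH = [("r", "b"), ("b", "d"), ("d", "c"), ("c", "e")]
# Hypothetical data: a single path in which d appears above b.
DATA = {"r": ["d1"], "d1": ["b1"], "b1": ["c1"], "c1": ["e1"]}

print(evaluate(QUERY_NODES, PARTIAL, DATA, "r"))
print(evaluate(QUERY_NODES, PATH, DATA, "r"))
```

The partial path query leaves the order of b and d open, so it matches this data, while the fully ordered path query r//b//d//c//e does not.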

47 Partial Path Queries: Comparison
Problems: many queries to evaluate; path overlaps; intermediate results.
Algorithms: PQGen (path queries), PathJoin (decomposition into paths), PartialMJ (spanning tree), PartialPathStack.
[Table: PQGen, PathJoin and PartialMJ each carry a problem mark; PartialPathStack carries none.]

48 Evaluation Algorithms
Partial path queries: PQGen (produce path queries); PathJoin (decompose into paths); PartialMJ (decompose into spanning-tree paths); PartialPathStack (novel holistic algorithm).
Partial tree-pattern queries: TPQGen (produce tree-pattern queries, TPQs); PartialPathJoin (decompose into partial paths, PPs); PartialTreeStack (novel holistic algorithm).

49-51 Partial Tree-Pattern Queries: TPQGen. Producing all possible tree-pattern queries: 1. Produce all possible tree-pattern queries. 2. Evaluate the queries using existing algorithms. 3. Keep all results. [Figure: the example partial tree-pattern query over nodes r, d, e, b, c and two tree-pattern queries produced from it.]

52-54 Partial Tree-Pattern Queries: PartialPathJoin. Decomposing into partial paths: 1. Decompose into partial paths. 2. Evaluate the partial paths using PartialPathStack. 3. Join the results (join condition: identity). [Figure: the example query over nodes r, d, e, b, c and its two partial paths over {r, d, b, c} and {r, d, e}.]

55-64 Partial Tree-Pattern Queries: PartialTreeStack. [Animation: TwigStack runs on a tree-pattern query over nodes r, b, d, c, e and PartialTreeStack runs on the partial tree-pattern query over nodes r, d, e, b, c. Both read the data tree with nodes r, b1, d1, c1, e1, d2, c2, e2 and keep one stack per query node (Sr, Sb, Sd, Sc, Se). Path solutions such as r b1 d1 c1 and r b1 d1 e1 (TwigStack) or r d1 b1 c1 and r d1 e1 (PartialTreeStack) accumulate as the data nodes are pushed. Merged results shown in the last frame: r b1 d1 c1 e1, r b1 d1 c1 e2, r b1 d1 c2 e1, r b1 d1 c2 e2, r b1 d2 c2 e2.]

65 Partial Tree-Pattern Queries: PartialTreeStack. TwigStack: O(input + output), optimal for tree-pattern queries. PartialTreeStack: O(input*|Q|*|PP| + output*N), optimal for "small" partial tree-pattern queries, where |Q| = nodes + edges, |PP| = number of partial paths, and N = number of nodes.

66 Partial Tree-Pattern Queries: Comparison
Problems: many queries to evaluate; path overlaps; intermediate results.
Algorithms: TPQGen (TPQs), PartialPathJoin (decomposition into PPs), PartialTreeStack.
[Table: TPQGen and PartialPathJoin each carry a problem mark; PartialTreeStack carries none.]

67 Conclusions (up to now). There is a need for queries with partial structure. We introduce partial queries. Partial queries can be expressed in XPath. We can process any partial query into a DAG. We proposed algorithms for their evaluation.

Flexible Search in XML Data. Outline: Partial queries, Query processing, Query evaluation, Query containment, Experiments, Conclusion

69-72 Absolute Query Containment. Q1 ⊆ Q2: each result of Q1 is a result of Q2. This holds if and only if there is a homomorphism from Q2 to the full form of Q1. => Checking absolute query containment is very fast (homomorphism). [Figure: example queries Q1 and Q2 over nodes r, a, b, c.]
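The homomorphism test can be sketched by brute force. The sketch assumes that a homomorphism maps each node of Q2 to a node of Q1 with the same label, sends child edges to child edges, and sends descendant edges to descendant edges of Q1's full form. Only IR2 and IR3 are used to build that full form here, so a True answer is sound but a False answer may be premature for queries that need the other rules. The query encoding, function names and example queries are all hypothetical.

```python
from itertools import product

def descendant_closure(child, desc):
    """Descendant edges of the full form, using IR2 and IR3 only."""
    d = set(child) | set(desc)
    while True:
        new = {(x, z) for x, y in d for y2, z in d if y == y2} - d
        if not new:
            return d
        d |= new

def homomorphism_exists(q2, q1):
    """q = (labels: node -> label, child edges, descendant edges)."""
    labels2, child2, desc2 = q2
    labels1, child1, desc1 = q1
    full1 = descendant_closure(child1, desc1)
    nodes2 = list(labels2)
    options = [[n1 for n1 in labels1 if labels1[n1] == labels2[n2]]
               for n2 in nodes2]
    for combo in product(*options):
        h = dict(zip(nodes2, combo))
        if (all((h[x], h[y]) in child1 for x, y in child2)
                and all((h[x], h[y]) in full1 for x, y in desc2)):
            return True
    return False

# Q1: r/a, a/b, b//c.   Q2: r//a, a//c.
Q1 = ({"r": "r", "a": "a", "b": "b", "c": "c"},
      {("r", "a"), ("a", "b")}, {("b", "c")})
Q2 = ({"r": "r", "a": "a", "c": "c"},
      set(), {("r", "a"), ("a", "c")})

print(homomorphism_exists(Q2, Q1))   # is Q1 contained in Q2?
print(homomorphism_exists(Q1, Q2))   # is Q2 contained in Q1?
```

Q2 maps into the full form of Q1 because a//c is derivable there from a/b and b//c; the reverse direction fails since Q2 has no node labelled b.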

73 Relative Query Containment. Some important stuff first: 1. Dimension graphs summarize the structure of an XML tree. [Figure: an XML tree and its dimension graph.]

74 Relative Query Containment. Some important stuff first: 2. Dimension trees are equivalent to a query in a specific dimension graph. [Figure: Q1 + dimension graph = DT1.1.]

75 Relative Query Containment. Some important stuff first: 2. Dimension trees are equivalent to a query in a specific dimension graph. [Figure: Q2 + dimension graph = DT2.1, DT2.2.]

76-79 Relative Query Containment. Q1 ⊆G Q2: each result of Q1 in G is a result of Q2 in G, where G is a dimension graph. This holds if and only if there is a homomorphism from the dimension trees of Q2 to the dimension trees of Q1. => Checking relative query containment can be very slow (it depends on the number of dimension trees). [Figure: dimension graph G with dimension trees DT2.1, DT2.2 and DT1.1.]

80-83 Heuristic for Relative Containment. 1. Extract information from the dimension graph G. 2. Add it to Q1. 3. Check Q1 ⊆ Q2. [Animation: the three steps applied to example queries Q1 and Q2; the last frame shows the check succeeding (OK).]

Flexible Search in XML Data. Outline: Partial queries, Query processing, Query evaluation, Query containment, Experiments, Conclusion

85 Queries Used in the Experiments. [Figure: four query shapes over nodes r, a, b, c, d, e, f, labelled Q1/Q5, Q2/Q6, Q3/Q7 and Q4/Q8.]

86-88 Query Evaluation. Execution time on Treebank (2.5 million nodes). [Figure: execution-time chart with the annotations "path queries" and "too many results".]

89 Query Evaluation. Execution time on synthetic data (2.5 million nodes, IBM AlphaWorks XML generator). [Figure: execution-time chart.]

90 Query Evaluation. Execution time varying the size of the XML tree. [Figure: charts for Q2, Q3 and Q7 comparing PartialMJ and PartialPathStack.]

91 Query Containment. Execution time varying the graph size. [Figure: time (sec) against number of graph paths for Relative Containment, On-The-Fly Heuristic and Precomputed Heuristics. Heuristic accuracy annotations: > 98%, > 90%, > 78%, > 60%.]

92 Query Containment. Execution time varying the query size. [Figure: time (sec) against number of nodes per query path for Relative Containment, On-The-Fly Heuristic and Precomputed Heuristics. Heuristic accuracy annotations: > 98%, > 79%, > 39%, > 32%.]

93 Conclusions (up to now). There is a need for queries with partial structure. We introduce partial queries. Partial queries can be expressed in XPath. We can process any partial query into a DAG. We proposed algorithms for their evaluation. We showed that our algorithms for evaluation and containment outperform other techniques.

Flexible Search in XML Data. Outline: Partial queries, Query processing, Query evaluation, Query containment, Experiments, Conclusion

95 Conclusions. There is a need for queries with partial structure. We introduce partial queries. Partial queries can be expressed in XPath. We can process any partial query into a DAG. We proposed algorithms for their evaluation. We showed that our algorithms for evaluation and containment outperform other techniques.

96 Contribution (columns: Partial Path Queries, Partial Tree-Pattern Queries)
Evaluation: CIKM '07, WWW '08, EDBT '09??
Containment: SSDBM '06, VLDB Journal '08
Heuristics for Containment: CIKM '06, CIKM '08

97 Publications: QUERY EVALUATION
Stefanos Souldatos, Xiaoying Wu, Dimitri Theodoratos, Theodore Dalamagas, Timos Sellis. Evaluation of Partial Path Queries on XML Data. 16th CIKM Conference, Lisboa, Portugal,
Xiaoying Wu, Stefanos Souldatos, Dimitri Theodoratos, Theodore Dalamagas, Timos Sellis. Efficient Evaluation of Generalized Path Pattern Queries on XML Data. 17th WWW Conference, Beijing, China, 2008.

98 Publications: QUERY CONTAINMENT
Dimitri Theodoratos, Theodore Dalamagas, Pawel Placek, Stefanos Souldatos, Timos Sellis. Containment of Partially Specified Tree-Pattern Queries. 18th SSDBM Conference, Vienna, Austria,
Dimitri Theodoratos, Pawel Placek, Theodore Dalamagas, Stefanos Souldatos, Timos Sellis. Containment of Partially Specified Tree-Pattern Queries in the Presence of Dimension Graphs. VLDB Journal, 2008.

99 Publications: HEURISTICS FOR CONTAINMENT
Dimitri Theodoratos, Stefanos Souldatos, Theodore Dalamagas, Pawel Placek, Timos Sellis. Heuristic Containment Check of Partial Tree-Pattern Queries in the Presence of Index Graphs. 15th CIKM Conference, Arlington, USA,
Pawel Placek, Dimitri Theodoratos, Stefanos Souldatos, Theodore Dalamagas, Timos Sellis. Heuristic Approaches for Checking Containment of Generalized Tree-Pattern Queries. 17th CIKM Conference, Napa Valley, California, USA, 2008.

100 Publications: WEB SEARCH PERSONALIZATION
Stefanos Souldatos, Theodore Dalamagas, Timos Sellis. Sailing the Web with Captain Nemo: a Personalized Metasearch Engine. Learning in Web Search Workshop, 22nd ICML Conference, Bonn, Germany,
Stefanos Souldatos, Theodore Dalamagas, Timos Sellis. Captain Nemo: A Metasearch Engine with Personalized Hierarchical Search Space. Informatica Journal,
Stefanos Souldatos, Theodore Dalamagas, Timos Sellis. Sailing the Web with Captain Nemo: a Personalized Metasearch Engine. Internet Search Engines (book), ICFAI University (Institute of Chartered Financial Analysts of India). Reprint of the publication in Learning in Web Search Workshop, 2007.

Questions? Outline: Partial queries, Query processing, Query evaluation, Query containment, Experiments, Conclusion