Containment of Partially Specified Tree-Pattern Queries

Slides:



Advertisements
Similar presentations
National Technical University of Athens Department of Electrical and Computer Engineering Image, Video and Multimedia Systems Laboratory
Advertisements

Evaluating XML-Extended OLAP Queries Based on a Physical Algebra Xuepeng Yin and Torben B. Pedersen Department of Computer Science Aalborg University.
Processing XML Keyword Search by Constructing Effective Structured Queries Jianxin Li, Chengfei Liu, Rui Zhou and Bo Ning Swinburne University of Technology,
Correlation Search in Graph Databases Yiping Ke James Cheng Wilfred Ng Presented By Phani Yarlagadda.
Structural Joins: A Primitive for Efficient XML Query Pattern Matching Al Khalifa et al., ICDE 2002.
XSEarch XML Search Engine Jonathan MAMOU October 2002.
Michael Armbrust A Functional Query Optimization Framework.
1 Rewriting Nested XML Queries Using Nested Views Nicola Onose joint work with Alin Deutsch, Yannis Papakonstantinou, Emiran Curtmola University of California,
Xyleme A Dynamic Warehouse for XML Data of the Web.
NaLIX: A Generic Natural Language Search Environment for XML Data Presented by: Erik Mathisen 02/12/2008.
Approximate XML Query Answers Alkis Polyzotis (UC Santa Cruz) Minos Garofalakis (Bell Labs) Yannis Ioannidis (U. of Athens, Hellas)
Evaluating Reachability Queries over Path Collections P. Bouros 1, S. Skiadopoulos 2, T. Dalamagas 3, D. Sacharidis 3, T. Sellis 1,3 1 National Technical.
Visual Web Information Extraction With Lixto Robert Baumgartner Sergio Flesca Georg Gottlob.
The Data Mining Visual Environment Motivation Major problems with existing DM systems They are based on non-extensible frameworks. They provide a non-uniform.
1 Extending PRIX for Similarity-based XML Query Group Members: Yan Qi, Jicheng Zhao, Dan Situ, Ning Liao.
Hierarchical Constraint Satisfaction in Spatial Database Dimitris Papadias, Panos Kalnis And Nikos Mamoulis.
Storing and Querying Ordered XML Using a Relational Database System By Khang Nguyen Based on the paper of Igor Tatarinov and Statis Viglas.
Presented by Zeehasham Rasheed
Inbal Yahav A Framework for Using Materialized XPath Views in XML Query Processing VLDB ‘04 DB Seminar, Spring 2005 By: Andrey Balmin Fatma Ozcan Kevin.
Page 1 Multidatabase Querying by Context Ramon Lawrence, Ken Barker Multidatabase Querying by Context.
Computer vision: models, learning and inference Chapter 10 Graphical Models.
XML –Query Languages, Extracting from Relational Databases ADVANCED DATABASES Khawaja Mohiuddin Assistant Professor Department of Computer Sciences Bahria.
Overview of XPath Author: Dan McCreary Date: October, 2008 Version: 0.2 with TEI Examples M D.
SD2520 Databases using XML and JQuery
Query Planning for Searching Inter- Dependent Deep-Web Databases Fan Wang 1, Gagan Agrawal 1, Ruoming Jin 2 1 Department of Computer.
OMAP: An Implemented Framework for Automatically Aligning OWL Ontologies SWAP, December, 2005 Raphaël Troncy, Umberto Straccia ISTI-CNR
Querying Tree-Structured Data Using Dimension Graphs Dimitri Theodoratos (New Jersey Institute of Technology, USA) Theodore Dalamagas (National Techn.
Ontology Alignment/Matching Prafulla Palwe. Agenda ► Introduction  Being serious about the semantic web  Living with heterogeneity  Heterogeneity problem.
Notes for Chapter 12 Logic Programming The AI War Basic Concepts of Logic Programming Prolog Review questions.
ΕΥΕΛΙΚΤΗ ΑΝΑΖΗΤΗΣΗ ΣΕ ΔΕΔΟΜΕΝΑ XML ΣΤΕΦΑΝΟΣ ΣΟΥΛΔΑΤΟΣ.
Efficient Keyword Search over Virtual XML Views Feng Shao and Lin Guo and Chavdar Botev and Anand Bhaskar and Muthiah Chettiar and Fan Yang Cornell University.
1 Holistic Twig Joins: Optimal XML Pattern Matching ACM SIGMOD 2002.
Semantic Matching Fausto Giunchiglia work in collaboration with Pavel Shvaiko The Italian-Israeli Forum on Computer Science, Haifa, June 17-18, 2003.
Chapter 13 Query Processing Melissa Jamili CS 157B November 11, 2004.
DBXplorer: A System for Keyword- Based Search over Relational Databases Sanjay Agrawal, Surajit Chaudhuri, Gautam Das Cathy Wang
Querying Structured Text in an XML Database By Xuemei Luo.
Join Synopses for Approximate Query Answering Swarup Achrya Philip B. Gibbons Viswanath Poosala Sridhar Ramaswamy Presented by Bhushan Pachpande.
Keyword Searching and Browsing in Databases using BANKS Seoyoung Ahn Mar 3, 2005 The University of Texas at Arlington.
August Chapter 6 - XPath & XPointer Learning XML by Erik T. Ray Slides were developed by Jack Davis College of Information Science and Technology.
RRXS Redundancy reducing XML storage in relations O. MERT ERKUŞ A. ONUR DOĞUÇ
Lecture 6: XML Query Languages Thursday, January 18, 2001.
Database Systems Part VII: XML Querying Software School of Hunan University
Evaluation of Partial Path Queries on XML Data Stefanos Souldatos (NTUA, GREECE) Xiaoying Wu (NJIT, USA) Dimitri Theodoratos (NJIT, USA) Theodore Dalamagas.
Introduction to Digital Libraries hussein suleman uct cs honours 2003.
Q2Semantic: A Lightweight Keyword Interface to Semantic Search Haofen Wang 1, Kang Zhang 1, Qiaoling Liu 1, Thanh Tran 2, and Yong Yu 1 1 Apex Lab, Shanghai.
BNCOD07Indexing & Searching XML Documents based on Content and Structure Synopses1 Indexing and Searching XML Documents based on Content and Structure.
Templated Search over Relational Databases Date: 2015/01/15 Author: Anastasios Zouzias, Michail Vlachos, Vagelis Hristidis Source: ACM CIKM’14 Advisor:
XML and Database.
Tree-Pattern Queries on a Lightweight XML Processor MIRELLA M. MORO Zografoula Vagena Vassilis J. Tsotras Research partially supported by CAPES, NSF grant.
Sept. 27, 2002 ISDB’02 Transforming XPath Queries for Bottom-Up Query Processing Yoshiharu Ishikawa Takaaki Nagai Hiroyuki Kitagawa University of Tsukuba.
Johannes Kepler University Linz Department of Business Informatics Data & Knowledge Engineering Altenberger Str. 69, 4040 Linz Austria/Europe
Answering Tree Pattern Queries Using Views Laks V.S. Lakshmanan, Hui (Wendy) Wang, and Zheng (Jessica) Zhao University of British Columbia Vancouver, BC.
Issues in Ontology-based Information integration By Zhan Cui, Dean Jones and Paul O’Brien.
QUERY PROCESSING RELATIONAL DATABASE KUSUMA AYU LAKSITOWENING
1 Approximate XML Query Answers Presenter: Hongyu Guo Authors: N. polyzotis, M. Garofalakis, Y. Ioannidis.
Hierarchical Search in SemantEco Support Varied Ontology Design Patterns Session: "Semantics for Biodiversity: Interoperability with genomic and ecological.
Ferdowsi University of Mashhad 1 Automatic Semantic Web Service Composition based on owl-s Research Proposal presented by : Toktam ghafarian.
A Portrait of the Semantic Web in Action Jeff Heflin and James Hendler IEEE Intelligent Systems December 6, 2010 Hyewon Lim.
Holistic Twig Joins Optimal XML Pattern Matching Nicolas Bruno Columbia University Nick Koudas Divesh Srivastava AT&T Labs-Research SIGMOD 2002.
1 Holistic Twig Joins: Optimal XML Pattern Matching Nicolas Bruno, Nick Koudas, Divesh Srivastava ACM SIGMOD 2002 Presented by Jun-Ki Min.
1 Ontology Evolution within Ontology Editors Presentation at EKAW, Sigüenza, October 2002 L. Stojanovic, B. Motik FZI Research Center for Information Technologies.
Efficient Semantic Web Service Discovery in Centralized and P2P Environments Dimitrios Skoutas 1,2 Dimitris Sacharidis.
1 Efficient Processing of Partially Specified Twig Queries Junfeng Zhou Renmin University of China.
Semantic Graph Mining for Biomedical Network Analysis: A Case Study in Traditional Chinese Medicine Tong Yu HCLS
Probabilistic Data Management
2/18/2019.
Two – One Problem Legal Moves: Slide Rules: 1s’ move right Hop
Two – One Problem Legal Moves: Slide Rules: 1s’ move right Hop
Structural Joins: A Primitive for Efficient XML Query Pattern Matching
Deniz Beser A Fundamental Tradeoff in Knowledge Representation and Reasoning Hector J. Levesque and Ronald J. Brachman.
Presentation transcript:

Containment of Partially Specified Tree-Pattern Queries Dimitri Theodoratos (NJIT, USA) Theodore Dalamagas (NTUA, GREECE) Pawel Placek (NJIT, USA) Stefanos Souldatos (NTUA, GREECE) Timos Sellis (NTUA, GREECE)

Introduction

Stefanos Souldatos - SSDBM 2006 Motivating example Consider a database including shops that sell spare parts for motorbikes. The authors of the paper need spare parts for their motorbikes. BUT… r ATHENS HONDA GREECE USA YAMAHA BMW TRAVEL VARADERO 125cc 1000cc ON-OFF 200cc SERROW 650cc F650 F650GS NJ Stefanos Souldatos - SSDBM 2006

1: Structural Differences Dimitri Theodoratos lives in NY. He owns a Yamaha Serrow motorbike in Greece. So, he searches for spare parts in Greece or USA. A structural difference emerges… r ATHENS HONDA GREECE USA YAMAHA BMW TRAVEL VARADERO 125cc 1000cc ON-OFF 200cc SERROW 650cc F650 F650GS NJ ? Stefanos Souldatos - SSDBM 2006

2: Structural Inconsistencies Theodore Dalamagas owns a BMW motorbike and looks for spare parts worldwide. A structural inconsistency emerges… r ATHENS HONDA GREECE USA YAMAHA BMW TRAVEL VARADERO 125cc 1000cc ON-OFF 200cc SERROW 650cc F650 F650GS NJ 650cc/F650GS vs F650GS/650cc Stefanos Souldatos - SSDBM 2006

Stefanos Souldatos - SSDBM 2006 3: Unknown Structure Stefanos Souldatos owns a Honda Varadero. But, he is not fully aware of the hierarchy... r ATHENS HONDA GREECE USA YAMAHA BMW TRAVEL VARADERO 125cc 1000cc ON-OFF 200cc SERROW 650cc F650 F650GS NJ Stefanos Souldatos - SSDBM 2006

Stefanos Souldatos - SSDBM 2006 4: Source Integration Pawel Placek wants to buy a motorbike that he can easily find spare parts for. He searches in many hierarchies with different structure… r ATHENS HONDA GREECE USA YAMAHA BMW TRAVEL VARADERO 125cc 1000cc ON-OFF 200cc SERROW 650cc F650 F650GS NJ r ATHENS HONDA GREECE USA YAMAHA BMW TRAVEL VARADERO 125cc 1000cc ON-OFF 200cc SERROW 650cc F650 F650GS NJ r ATHENS HONDA GREECE USA YAMAHA BMW TRAVEL VARADERO 125cc 1000cc ON-OFF 200cc SERROW 650cc F650 F650GS NJ Stefanos Souldatos - SSDBM 2006

Stefanos Souldatos - SSDBM 2006 Motivation Querying tree-structured data requires exact knowledge of the structure. BUT, structure is not always strictly defined! MOREOVER, the user needs to set queries without always dealing with structure: Find shops for Honda spare parts in Athens, Greece …where Athens is a descendant of Greece …but I don’t know if Athens is an ancestor or a descendant of Honda. Stefanos Souldatos - SSDBM 2006

Stefanos Souldatos - SSDBM 2006 Previous Approaches Keyword-based search approach Total absence of structure. Approximation techniques Relax the initial query and retrieve more answers. Naive approach All possible query patterns are generated and evaluated against the tree structure. Traditional integration approach Global structure and mapping rules. Stefanos Souldatos - SSDBM 2006

Stefanos Souldatos - SSDBM 2006 Our Approach We have designed a Query Language that allows for partial specification of the structure (Partially Specified Tree-Pattern Queries). Query formulation and evaluation is based on Dimension Graphs that summarize the structure of the trees. Dimension graphs group sets of semantically related nodes called Dimensions. We study the problem of Query Containment for Partially Specified Tree-Pattern Queries. Stefanos Souldatos - SSDBM 2006

Data Model

Stefanos Souldatos - SSDBM 2006 Dimension Graphs R C B T M L E Dimension graphs summarize the structure of the tree. r ATHENS HONDA GREECE USA YAMAHA BMW TRAVEL VARADERO 125cc 1000cc ON-OFF 200cc SERROW 650cc F650 F650GS NJ DIMENSIONS R (oot) C (ountry) = {GREECE, USA} L (ocation) B (rand) T (ype) M (odel) E (ngine) Stefanos Souldatos - SSDBM 2006

Stefanos Souldatos - SSDBM 2006 Dimension Graphs… R C B T M L E Offer a summary of the structure of the tree. Provide the necessary semantics for query formulation. Set the framework for querying sources with structural differences and inconsistencies. Support query evaluation and optimization. DIMENSIONS R (oot) C (ountry) L (ocation) B (rand) T (ype) M (odel) E (ngine) Stefanos Souldatos - SSDBM 2006

Partially Specified Tree-pattern Queries B T M L E C = {Greece} B = {BMW} M = ? E = ? DIMENSIONS R (oot) C (ountry) L (ocation) B (rand) Query: Find shops with spare parts for all models and engines of BMW motorbikes in Greece. T (ype) M (odel) E (ngine) Stefanos Souldatos - SSDBM 2006

Partially Specified Tree-pattern Queries B T M L E C = {Greece} B = {BMW} M = ? E = ? PSP *p2 PSP p1 DIMENSIONS R (oot) C (ountry) partially specified paths (PSP) L (ocation) B (rand) Query: Find shops with spare parts for all models and engines of BMW motorbikes in Greece. T (ype) M (odel) E (ngine) Stefanos Souldatos - SSDBM 2006

Partially-Specified Tree-pattern Queries B T M L E C = {Greece} B = {BMW} M = ? E = ? PSP *p2 PSP p1 DIMENSIONS R (oot) C (ountry) partially specified paths (PSP) output path (*) L (ocation) B (rand) Query: Find shops with spare parts for all models and engines of BMW motorbikes in Greece. T (ype) M (odel) E (ngine) Stefanos Souldatos - SSDBM 2006

Partially-Specified Tree-pattern Queries B T M L E parent child ancestor descendant C = {Greece} B = {BMW} M = ? E = ? PSP *p2 PSP p1 DIMENSIONS R (oot) C (ountry) partially specified paths (PSP) output path (*) L (ocation) B (rand) Query: Find shops with spare parts for all models and engines of BMW motorbikes in Greece. T (ype) M (odel) E (ngine) Stefanos Souldatos - SSDBM 2006

Partially-Specified Tree-pattern Queries B T M L E node sharing expression (NSE) parent child ancestor descendant C = {Greece} B = {BMW} M = ? E = ? PSP *p2 PSP p1 DIMENSIONS R (oot) C (ountry) partially specified paths (PSP) output path (*) L (ocation) B (rand) Query: Find shops with spare parts for all models and engines of BMW motorbikes in Greece. T (ype) M (odel) E (ngine) Stefanos Souldatos - SSDBM 2006

Dimension Trees and Query Evaluation

Stefanos Souldatos - SSDBM 2006 Dimension Trees R C B T M L E C = {Greece} B = {BMW} M = ? E = ? PSP *p2 PSP p1 We apply Inference Rules to the query to produce its Full Form. We search for paths in the graph that match the paths of the query. Paths in the graph correspond to Dimension Trees. Dimension trees are translated into XPath queries. Stefanos Souldatos - SSDBM 2006

Stefanos Souldatos - SSDBM 2006 Dimension Trees R C B T M L E C = {Greece} B = {BMW} M = ? E = ? PSP *p2 PSP p1 We apply Inference Rules to the query to produce its Full Form. We search for paths in the graph that match the paths of the query. Paths in the graph correspond to Dimension Trees. Dimension trees are translated into XPath queries. Stefanos Souldatos - SSDBM 2006

Stefanos Souldatos - SSDBM 2006 Dimension Trees R C B T M L E C = {Greece} B = {BMW} M = ? E = ? PSP *p2 PSP p1 R C = {Greece} B = {BMW} T M E We apply Inference Rules to the query to produce its Full Form. We search for paths in the graph that match the paths of the query. Paths in the graph correspond to Dimension Trees. Dimension trees are translated into XPath queries. Stefanos Souldatos - SSDBM 2006

r/Greece/BMW/*T[*E]/*M Dimension Trees R C B T M L E C = {Greece} B = {BMW} M = ? E = ? PSP *p2 PSP p1 R C = {Greece} B = {BMW} T M E We apply Inference Rules to the query to produce its Full Form. We search for paths in the graph that match the paths of the query. Paths in the graph correspond to Dimension Trees. Dimension trees are translated into XPath queries. r/Greece/BMW/*T[*E]/*M Stefanos Souldatos - SSDBM 2006

r/Greece/BMW/*T[*E]/*M Dimension Trees R C B T M L E C = {Greece} B = {BMW} M = ? E = ? PSP *p2 PSP p1 R C = {Greece} B = {BMW} T M E So, Dimension Trees are equivalent to the Query in a specific Dimension Graph: “Dimension Trees = Query + Graph” r/Greece/BMW/*T[*E]/*M Stefanos Souldatos - SSDBM 2006

Query Containment

Stefanos Souldatos - SSDBM 2006 Query Containment Absolute Containment Query Q1 is absolutely contained in query Q2 if the answers of Q1 in every dimension graph are also answers of Q2. Relative Containment (with respect to graph) Query Q1 is contained in query Q2 with respect to graph G if the answers of Q1 in G are also answers of Q2 in G. Stefanos Souldatos - SSDBM 2006

Stefanos Souldatos - SSDBM 2006 Query Containment Absolute Containment Query Q1 is absolutely contained in query Q2 if the answers of Q1 in every dimension graph are also answers of Q2. Relative Containment (with respect to graph) Query Q1 is contained in query Q2 with respect to graph G if the answers of Q1 in G are also answers of Q2 in G. Stefanos Souldatos - SSDBM 2006

Checking Absolute Containment Query Q1 is absolutely contained in query Q2 if there is a mapping from Q2 to Q1.  Q1 Q2 C = ? B = ? M = ? E = ? PSP p2 PSP *p1 C = ? B = ? E = ? PSP p4 PSP *p3 Stefanos Souldatos - SSDBM 2006

Checking Relative Containment Query Q1 is contained in query Q2 with respect to graph G if every dimension tree of Q1 is contained in a dimension tree of Q2.  A dimension tree of Q1 A dimension tree of Q2 R R C C B B T T E M E Stefanos Souldatos - SSDBM 2006

Absolute & Relative Containment Checking Absolute Containment: 1msec Checking Relative Containment: 100msec – 1sec We suggest a technique that improves the performance of Relative Containment check. (Relative Containment 2 – RC2) It is a sound but not complete technique However, it maintains high accuracy for containment check. Stefanos Souldatos - SSDBM 2006

Relative Containment 2 (RC2) For each possible parent-child or ancestor-descendant relation in the query, we extract relations that hold in all graph paths that include it: R=>C : RC LB : RC, CL We add these relations in Q1 and check absolutely whether Q1’ Q2. R C B T M L E Stefanos Souldatos - SSDBM 2006

Stefanos Souldatos - SSDBM 2006 Example Q1 is NOT absolutely contained in Q2. But, Q1 is relatively contained in Q2. Q1 Q2 C = ? L = ? L = ? B = ? PSP *p1 PSP *p2 Stefanos Souldatos - SSDBM 2006

Stefanos Souldatos - SSDBM 2006 Example Q1 is NOT absolutely contained in Q2. But, Q1 is relatively contained in Q2. Relative Containment 2 LB : RC, CL G Q1 Q2 R = ? C = ? C = ? L = ? L = ? B = ? PSP *p1 PSP *p2 Stefanos Souldatos - SSDBM 2006

Experiments

Stefanos Souldatos - SSDBM 2006 Experiments We measured… execution time for checking Absolute Containment, Relative Containment and Relative Containment 2 (RC2) the accuracy of RC2 …varying the density of the dimension graphs (dimensions, root-to-leaf paths) the size of the queries (PSPs, nodes per PSP) Stefanos Souldatos - SSDBM 2006

Stefanos Souldatos - SSDBM 2006 Varying Graphs RC2 accuracy > 80% RC2 accuracy > 50% 30 dimensions Relative Containment Relative Containment 2 time (sec) Absolute Containment Stefanos Souldatos - SSDBM 2006 graph paths

Stefanos Souldatos - SSDBM 2006 Varying Queries RC2 accuracy > 93% 2 PSPs Relative Containment time (sec) Relative Containment 2 Absolute Containment Stefanos Souldatos - SSDBM 2006 dimensions per PSP

Conclusion and Future Work

Stefanos Souldatos - SSDBM 2006 Conclusion We addressed the problem of Query Containment for Partially Specified Tree-Pattern Queries (PSTPQs). We suggested a sound technique for checking Relative Query Containment. Experiments showed that the technique improves checking of relative containment one order of magnitude with accuracy over 80% for graphs of common sizes (based on XML benchmarks e.g. XMark, XMach etc). Stefanos Souldatos - SSDBM 2006

Stefanos Souldatos - SSDBM 2006 Future Work Suggest and experiment with a set of heuristics for checking Relative Query Containment. Discuss the trade-off between time and accuracy for each of those heuristics. Stefanos Souldatos - SSDBM 2006

Questions?

Stefanos Souldatos - SSDBM 2006 Links Introduction (2-10) Data Model (11-18) Query Evaluation (19-24) Query Containment (25-32) Experiments (33-36) Conclusion – Future Work (37-39) Stefanos Souldatos - SSDBM 2006

Appendix

Stefanos Souldatos - SSDBM 2006 Dimension Trees R C B T M L E C = {Greece} B = {BMW} M = ? E = ? PSP *p2 PSP p1 R C = {Greece} B = {BMW} T M E R C = {Greece} B = {BMW} T M E R C = {Greece} B = {BMW} T M E R C = {Greece} B = {BMW} T E M r/Greece/BMW/ *T[*E]/*M r/Greece/BMW/ *T/*M [*E] r/Greece/BMW/ *T[*M/*E]/*E*M r/Greece/BMW/ *T/*E/*M Stefanos Souldatos - SSDBM 2006