Trie Indexes for Efficient XML Query Processing

Presentation transcript:

Trie Indexes for Efficient XML Query Processing Sofia Brenes, Yuqing Wu, Dirk Van Gucht, Pablo Santa Cruz Indiana University, Bloomington {sbrenesb, yuqwu, vgucht, psantacr}@cs.indiana.edu

XML and Queries – An Example Query 1: //A/B/C Query 2: //B/C Query 3: //A/B[./D]/C Query 4: //A[./B[./D]]/B/C

Index and XML Query Evaluation Challenges: Structure. Data: containment relationship. Query: pattern matching, (nested) predicates.

Structural Indices for XML Data
Consider both value and structure
Index features and the structural indices that provide them:
Pure structural summaries: DataGuide, T-index
Local bi-similarity: A(k), UD(k,i), D(k), M(k)
Workload-aware: D(k), M(k), M*(k)
Encoded sequence: ViST, Index Fabric
Index chooser: XIST

Expected Features for an XML Index Reasonable size Easy to construct and adjust Query evaluation Index-only plan for most queries.

Outline Introduction Methodology Partition induced by structural characteristics of XML Partition induced by fragments of XPath Algebra Coupling and Block Union Theorems Trie Indices and Query Evaluation Experimental Evaluation Future Directions

Rewind – back to the world of RDB RDBMS Engineering Techniques RDBMS Theory

Our approach: Study the XML query language and its fragments. Study the indistinguishability of components in an XML document. Reason about existing XML indices. Design new XML indices.

XML Data Model: Represent an XML document D as a finite unordered node-labeled tree D = (V, Ed, r, λ). Nodes: V. Edges: Ed. Root: r. Labels: λ.

Label Path: LP(m,n) and LP(n,k)
LP(m,n) = (A,B,C)
LP(n,0) = (C)
LP(n,1) = (B,C)
LP(n,4) = (A,A,B,C)
LP(n,7) = (A,A,B,C)
[Figure: example tree with nodes m and n marked]
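The label-path notation can be illustrated with a small sketch. The tree below is reconstructed from the parent-child pairs listed in the P[1] partition later in this transcript; reading n as node C2 and m as node A2 is an assumption (the slide's figure did not survive), chosen because it reproduces the values shown above. The function names and the convention that a node id starts with its label are illustrative only.

```python
# Parent map reconstructed from the parent-child pairs in the P[1] table.
PARENT = {"A2": "A1", "B1": "A1", "B4": "A1", "B2": "A2", "B3": "A2",
          "B5": "B4", "C1": "B1", "C2": "B2", "C3": "B3", "C4": "B5",
          "D1": "B2"}


def label(node):
    # Assumed naming convention: node id = label letter + serial number.
    return node[0]


def label_path(node, k):
    """LP(node, k): labels from the ancestor at most k levels up, down to node."""
    path = [label(node)]
    while k > 0 and node in PARENT:
        node = PARENT[node]
        path.append(label(node))
        k -= 1
    return tuple(reversed(path))


def label_path_between(m, n):
    """LP(m, n): labels from ancestor m down to n, or None if m is not above n."""
    path = [label(n)]
    while n != m:
        if n not in PARENT:
            return None
        n = PARENT[n]
        path.append(label(n))
    return tuple(reversed(path))


print(label_path("C2", 1))             # ('B', 'C')
print(label_path_between("A2", "C2"))  # ('A', 'B', 'C')
```

Once k reaches the depth of the node, LP(n,k) stops growing, which is why LP(n,4) and LP(n,7) coincide.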

N [k] Equivalence Given an XML document and value k

N[k] Partition (k = 1)
N[1] partition (label path: block):
(A): {A1}
(A,A): {A2}
(A,B): {B1, B2, B3, B4}
(B,B): {B5}
(B,C): {C1, C2, C3, C4}
(B,D): {D1}
Example: N[1][(A,B)] = {B1, B2, B3, B4}
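A minimal sketch of how such a partition could be computed, assuming that two nodes fall in the same N[k] block exactly when their label paths LP(n,k) agree (this matches the table above, but the formal definition on the slide was lost). The tree is reconstructed from the parent-child pairs in the P[1] table; all names are illustrative.

```python
from collections import defaultdict

# Parent map reconstructed from the parent-child pairs in the P[1] table.
PARENT = {"A2": "A1", "B1": "A1", "B4": "A1", "B2": "A2", "B3": "A2",
          "B5": "B4", "C1": "B1", "C2": "B2", "C3": "B3", "C4": "B5",
          "D1": "B2"}
NODES = ["A1"] + list(PARENT)  # A1 is the root


def label_path(node, k):
    """LP(node, k); a node's label is assumed to be the first letter of its id."""
    path = [node[0]]
    while k > 0 and node in PARENT:
        node = PARENT[node]
        path.append(node[0])
        k -= 1
    return tuple(reversed(path))


def n_partition(k):
    """Group nodes into blocks keyed by their label path of length at most k."""
    blocks = defaultdict(set)
    for node in NODES:
        blocks[label_path(node, k)].add(node)
    return dict(blocks)


for path, block in sorted(n_partition(1).items()):
    print(path, sorted(block))
```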

P [k] Equivalence Given an XML document and value k

P[k] Partition (k = 1)
P[1] partition (label path: block):
(A): {(A1, A1), (A2, A2)}
(B): {(B1, B1), (B2, B2), (B3, B3), (B4, B4), (B5, B5)}
(C): {(C1, C1), (C2, C2), (C3, C3), (C4, C4)}
(D): {(D1, D1)}
(A,A): {(A1, A2)}
(A,B): {(A1, B1), (A2, B2), (A2, B3), (A1, B4)}
(B,B): {(B4, B5)}
(B,C): {(B1, C1), (B2, C2), (B3, C3), (B5, C4)}
(B,D): {(B2, D1)}
Example: P[1][(A,A)] = {(A1, A2)}

P[k] Partition (k = 2)
P[2] partition (label path: block):
(A): {(A1, A1), (A2, A2)}
(B): {(B1, B1), (B2, B2), (B3, B3), (B4, B4), (B5, B5)}
(C): {(C1, C1), (C2, C2), (C3, C3), (C4, C4)}
(D): {(D1, D1)}
(A,A): {(A1, A2)}
(A,B): {(A1, B1), (A2, B2), (A2, B3), (A1, B4)}
(B,B): {(B4, B5)}
(B,C): {(B1, C1), (B2, C2), (B3, C3), (B5, C4)}
(B,D): {(B2, D1)}
(A,A,B): {(A1, B2), (A1, B3)}
(A,B,B): {(A1, B5)}
(A,B,C): {(A1, C1), (A2, C2), (A2, C3)}
(A,B,D): {(A2, D1)}
(B,B,C): {(B4, C4)}
Example: P[2][(A,B,C)] = {(A1, C1), (A2, C2), (A2, C3)}
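The P[k] tables can be reproduced with a short sketch, assuming that a block collects all pairs (m, n) where m is n itself or an ancestor at most k levels up, grouped by the label path LP(m,n). That reading is inferred from the tables rather than taken from the lost formal definition, and the function name is illustrative.

```python
from collections import defaultdict

# Parent map: exactly the length-1 pairs listed in the P[1] table.
PARENT = {"A2": "A1", "B1": "A1", "B4": "A1", "B2": "A2", "B3": "A2",
          "B5": "B4", "C1": "B1", "C2": "B2", "C3": "B3", "C4": "B5",
          "D1": "B2"}
NODES = ["A1"] + list(PARENT)  # A1 is the root


def p_partition(k):
    """Group (ancestor-or-self, node) pairs at distance <= k by label path."""
    blocks = defaultdict(set)
    for n in NODES:
        m, path = n, (n[0],)           # label = first letter of the node id
        blocks[path].add((m, n))       # distance 0: the pair (n, n)
        for _ in range(k):
            if m not in PARENT:
                break
            m = PARENT[m]
            path = (m[0],) + path      # extend the label path upward
            blocks[path].add((m, n))
    return dict(blocks)


for path, block in sorted(p_partition(2).items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(path, sorted(block))
```

Because P[k] keeps every pair at distance 0 through k, the P[2] partition contains all of the P[1] blocks plus the five blocks for label paths of length 2.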

XPath Algebra Path semantics Node semantics

Fragments of XPath Algebra
D algebra: XPath algebra without ↑ and π1
D[ ] algebra: XPath algebra without ↑
D[k] algebra: D algebra up to length k
D[ ][k] algebra: D[ ] algebra up to length k

D [k] Equivalence Given an XML document and value k and (m1, n1), (m2, n2) in DownPairs(D) For any E in D [k]

Coupling Theorem: Let D be a document and k an integer. Under the path semantics, the P[k]-partition of D and the D[k]-partition of D are the same. Under the node semantics, the N[k]-partition of D and the D[k]-partition of D are the same.

k-Label-Path Set: The set of label paths of length k in an XML document that satisfy an XPath expression in algebra D.

Label-Union Theorem: Let D be a document, k an integer, and E a D[k] expression. Then there exists a class of partition blocks of the P[k]-partition (N[k]-partition) of D such that

Query Evaluation Using Label-Union Theorem
Query 2: //B/C
LPS(E,2) = {(A,B,C), (B,B,C)}
N[2] partition (label path: block):
(A): {A1}
(A,A): {A2}
(A,B): {B1, B4}
(A,A,B): {B2, B3}
(A,B,B): {B5}
(A,B,C): {C1, C2, C3}
(B,B,C): {C4}
(A,B,D): {D1}
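The evaluation step above can be sketched as a union of partition blocks. This is a simplified illustration, not the authors' algorithm: it assumes the query is a plain child-axis path such as //B/C with at most k steps after its first label, and it selects every N[k] block whose label path ends with the query's labels. The tree and helper names are reconstructed or hypothetical.

```python
from collections import defaultdict

# Parent map reconstructed from the parent-child pairs in the P[1] table.
PARENT = {"A2": "A1", "B1": "A1", "B4": "A1", "B2": "A2", "B3": "A2",
          "B5": "B4", "C1": "B1", "C2": "B2", "C3": "B3", "C4": "B5",
          "D1": "B2"}
NODES = ["A1"] + list(PARENT)


def label_path(node, k):
    path = [node[0]]                   # label = first letter of the node id
    while k > 0 and node in PARENT:
        node = PARENT[node]
        path.append(node[0])
        k -= 1
    return tuple(reversed(path))


def n_partition(k):
    blocks = defaultdict(set)
    for node in NODES:
        blocks[label_path(node, k)].add(node)
    return dict(blocks)


def evaluate(labels, k):
    """Answer //l1/.../lm as a union of N[k] blocks (requires m - 1 <= k)."""
    labels = tuple(labels)
    if len(labels) - 1 > k:
        raise ValueError("query is longer than the index depth k")
    blocks = n_partition(k)
    # Label paths in the document whose suffix matches the query.
    lps = {path for path in blocks if path[-len(labels):] == labels}
    answer = set().union(*(blocks[p] for p in lps))
    return lps, answer


lps, answer = evaluate(("B", "C"), 2)
print(sorted(lps))     # [('A', 'B', 'C'), ('B', 'B', 'C')]
print(sorted(answer))  # ['C1', 'C2', 'C3', 'C4']
```

No document node is touched at query time: the answer is assembled from whole blocks, which is what makes an index-only plan possible.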

N[k]-Trie Index
Keep track of the N[k]-partitions. Use the reverse label path as key.
N[2] partition (label path: block):
(A): {A1}
(A,A): {A2}
(A,B): {B1, B4}
(A,A,B): {B2, B3}
(A,B,B): {B5}
(A,B,C): {C1, C2, C3}
(B,B,C): {C4}
(A,B,D): {D1}

Query Evaluation with N[k]-Trie Index
Query 1: //A/B/C
LPS(E,2) = {(A,B,C)}
N[2] partition (label path: block):
(A): {A1}
(A,A): {A2}
(A,B): {B1, B4}
(A,A,B): {B2, B3}
(A,B,B): {B5}
(A,B,C): {C1, C2, C3}
(B,B,C): {C4}
(A,B,D): {D1}

Query Evaluation with N[k]-Trie Index
Query 2: //B/C
LPS(E,2) = {(A,B,C), (B,B,C)}
N[2] partition (label path: block):
(A): {A1}
(A,A): {A2}
(A,B): {B1, B4}
(A,A,B): {B2, B3}
(A,B,B): {B5}
(A,B,C): {C1, C2, C3}
(B,B,C): {C4}
(A,B,D): {D1}
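Keying the trie on the reverse label path turns the suffix match a query needs into a prefix walk from the trie root. The sketch below shows one way this could look in memory; the TrieNode class, build_trie, and lookup are hypothetical names, the tree is reconstructed from the P[1] pairs, and the TIMBER prototype's actual layout is not described in this transcript.

```python
# Parent map reconstructed from the parent-child pairs in the P[1] table.
PARENT = {"A2": "A1", "B1": "A1", "B4": "A1", "B2": "A2", "B3": "A2",
          "B5": "B4", "C1": "B1", "C2": "B2", "C3": "B3", "C4": "B5",
          "D1": "B2"}
NODES = ["A1"] + list(PARENT)


def label_path(node, k):
    path = [node[0]]                   # label = first letter of the node id
    while k > 0 and node in PARENT:
        node = PARENT[node]
        path.append(node[0])
        k -= 1
    return tuple(reversed(path))


class TrieNode:
    def __init__(self):
        self.children = {}   # label -> TrieNode
        self.block = set()   # N[k] block whose reversed label path ends here


def build_trie(k):
    """Insert every node under the reverse of its label path LP(node, k)."""
    root = TrieNode()
    for node in NODES:
        cur = root
        for lab in reversed(label_path(node, k)):
            cur = cur.children.setdefault(lab, TrieNode())
        cur.block.add(node)
    return root


def lookup(root, labels):
    """Answer //l1/.../lm: walk the reversed labels, then union the subtree."""
    cur = root
    for lab in reversed(labels):
        cur = cur.children.get(lab)
        if cur is None:
            return set()
    result, stack = set(), [cur]
    while stack:
        t = stack.pop()
        result |= t.block
        stack.extend(t.children.values())
    return result


trie = build_trie(2)
print(sorted(lookup(trie, ("A", "B", "C"))))  # ['C1', 'C2', 'C3']
print(sorted(lookup(trie, ("B", "C"))))       # ['C1', 'C2', 'C3', 'C4']
```

For Query 2 the walk C, B reaches a trie node with two subtrees, one per label path in LPS(E,2), and their blocks are unioned.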

P[k]-Trie Index
Keep track of the P[k]-partitions. Use the reverse label path as key.
P[2] partition (label path: block):
(A): {(A1, A1), (A2, A2)}
(B): {(B1, B1), (B2, B2), (B3, B3), (B4, B4), (B5, B5)}
(C): {(C1, C1), (C2, C2), (C3, C3), (C4, C4)}
(D): {(D1, D1)}
(A,A): {(A1, A2)}
(A,B): {(A1, B1), (A2, B2), (A2, B3), (A1, B4)}
(B,B): {(B4, B5)}
(B,C): {(B1, C1), (B2, C2), (B3, C3), (B5, C4)}
(B,D): {(B2, D1)}
(A,A,B): {(A1, B2), (A1, B3)}
(A,B,B): {(A1, B5)}
(A,B,C): {(A1, C1), (A2, C2), (A2, C3)}
(A,B,D): {(A2, D1)}
(B,B,C): {(B4, C4)}

Query Evaluation with P[k]-Trie Index Query 1: //A/B/C

Query Evaluation with P[k]-Trie Index Query 2: //B/C

Query Evaluation with P[k]-Trie Index Query 3: //A/B[./D]/C
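The plan shown on this slide did not survive extraction, so the following is only one plausible index-only evaluation of Query 3, not the authors' plan. It assumes the three P[k] blocks for label paths (A,B), (B,C), and (B,D), copied from the P[k]-partition table earlier in the transcript, and joins them on the shared B node; the variable names are illustrative.

```python
# Blocks copied from the P[k]-partition table in the transcript.
AB = {("A1", "B1"), ("A2", "B2"), ("A2", "B3"), ("A1", "B4")}   # (A,B)
BC = {("B1", "C1"), ("B2", "C2"), ("B3", "C3"), ("B5", "C4")}   # (B,C)
BD = {("B2", "D1")}                                             # (B,D)

# //A/B[./D]/C: a C node qualifies if its parent B has an A parent
# (first step) and at least one D child (the predicate).
b_with_a_parent = {b for (_, b) in AB}
b_with_d_child = {b for (b, _) in BD}
qualifying_b = b_with_a_parent & b_with_d_child

answer = {c for (b, c) in BC if b in qualifying_b}
print(sorted(answer))  # ['C2']
```

Pairs are what make the predicate answerable from the index alone: the (B,D) block records which B nodes have a D child, information that a node-only N[k] block does not keep.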

Experimental Setup: Indices prototyped in the TIMBER system. Results reported on DBLP data: 127 MB, 3.3M nodes.

Index Sizes

Index Creation Time

Query Evaluation //dblp/inproceedings/title/i/sub

Query Evaluation //dblp/inproceedings[./title[./i]/sub]/ee

Outline Introduction Methodology Partition induced by structural characteristics of XML Partition induced by fragments of XPath Algebra Coupling and Block Union Theorems Trie Indices and Query Evaluation Experimental Evaluation Conclusion

Conclusion: The P[k]-Trie index facilitates index-only plans for most queries, and therefore consistently and significantly outperforms the N[k]-Trie and the A(k)-index. A modest k value is sufficient for significant performance improvements.

Thanks!! Questions?

Research Direction Further study of query decomposition and inversion algorithms Study workload driven index creation Develop other appropriate index structures