ViST: a dynamic index method for querying XML data by tree structures Authors: Haixun Wang, Sanghyun Park, Wei Fan, Philip Yu Presenter: Elena Zheleva,

ViST: a dynamic index method for querying XML data by tree structures Authors: Haixun Wang, Sanghyun Park, Wei Fan, Philip Yu Presenter: Elena Zheleva, November 2003

Overview
- Modeling XML Queries
- Structure-Encoded Sequences
- Indexing
- ViST
- Experimental Results

Modeling XML Queries

DTD of purchase records:
<!ELEMENT purchases (purchase*)>
<!ELEMENT purchase (seller, buyer)>
<!ATTLIST seller ID ID location CDATA name CDATA>
<!ELEMENT seller (item*)>
<!ATTLIST buyer ID ID location CDATA name CDATA>
<!ELEMENT item (item*)>
<!ATTLIST item name CDATA manufacturer CDATA>

Modeling XML Queries
A central focus of XML query language design is the ability to express complex structural (graph-shaped) queries.

Modeling XML Queries
- Querying XML data means finding substructures of the data graph that match the query.
- Structure-encoded sequences: a sequential representation of both XML data and XML queries.

Structure-Encoded Sequences

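- Both the data and the queries are mapped to structure-encoded sequences.
- A query is then answered by subsequence matching.
- Purpose: to avoid as many join operations as possible.
- Definition: a structure-encoded sequence is a sequence of (symbol, prefix) pairs, where the symbols are the tree's nodes in preorder and each prefix is the path of ancestor symbols leading from the root to that node.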
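
A formal version of the definition, followed by a worked example on a small hypothetical tree whose element names are abbreviated to single letters (the tree is made up for illustration):

$$
\mathrm{Seq}(T) = \langle (a_1, p_1), (a_2, p_2), \ldots, (a_n, p_n) \rangle
$$

where $a_1, \ldots, a_n$ are the nodes of $T$ in preorder and $p_i$ is the string of ancestor symbols on the path from the root to $a_i$, so $p_1 = \epsilon$. For a tree whose root $P$ has children $S$ and $B$, where $S$ has a child $N$ with value $v_1$:

$$
\mathrm{Seq}(T) = \langle (P, \epsilon), (S, P), (N, PS), (v_1, PSN), (B, P) \rangle
$$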

Mapping Data
- Traverse the XML document tree in preorder.
- Represent it as a structure-encoded sequence.
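
The mapping can be sketched with Python's standard ElementTree parser. This is a minimal illustration, not the authors' code: it assumes that attributes become child nodes of their element and that attribute values and text become leaf value nodes, and the function name `structure_encode` and the sample purchase record are hypothetical. Prefixes are kept as tuples so that multi-character tag names stay unambiguous.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# (symbol, prefix) pair; the prefix is the tuple of ancestor symbols from the root.
Pair = Tuple[str, Tuple[str, ...]]


def structure_encode(root: ET.Element) -> List[Pair]:
    """Preorder walk that emits one (symbol, prefix) pair per node.

    Assumption for this sketch: attributes are child nodes of their element,
    and attribute values / text content are leaf value nodes.
    """
    seq: List[Pair] = []

    def visit(elem: ET.Element, prefix: Tuple[str, ...]) -> None:
        seq.append((elem.tag, prefix))
        path = prefix + (elem.tag,)
        for name, value in elem.attrib.items():
            seq.append((name, path))             # attribute node
            seq.append((value, path + (name,)))  # its value as a leaf
        text = (elem.text or "").strip()
        if text:
            seq.append((text, path))             # text content as a leaf
        for child in elem:
            visit(child, path)

    visit(root, ())
    return seq


# Hypothetical purchase record following the DTD shown earlier.
doc = ET.fromstring(
    '<purchase><seller name="dell"><item manufacturer="ibm"/></seller>'
    '<buyer location="boston"/></purchase>'
)
for symbol, prefix in structure_encode(doc):
    print(symbol, "/".join(prefix))
```

Each pair carries its full ancestor path, which is what later lets a matcher check ancestor-descendant relationships without joins.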

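Mapping Queries
- Queries are mapped to structure-encoded sequences in the same way as the data.
- Benefit of sequence matching: the query is processed as a whole rather than being decomposed into pieces that must be joined.
- Example: a path expression and its sequence (figure)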

Structure-Encoded Sequences (figure: a query and the data, each shown as a structure-encoded sequence)

Querying XML through Structure-Encoded Sequence Matching
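
Answering a query by structure-encoded sequence matching means checking that the query sequence occurs as a (not necessarily contiguous) subsequence of the data sequence. A minimal sketch under stated assumptions: single-letter symbols with string prefixes, a hypothetical document and query, and a plain linear scan standing in for an index. The function name `match_positions` is hypothetical, and this simplified check does not verify that the matched pairs form a consistent tree embedding.

```python
from typing import List, Optional, Tuple

Pair = Tuple[str, str]  # (symbol, prefix); single-letter symbols, prefix as a string


def match_positions(query: List[Pair], data: List[Pair]) -> Optional[List[int]]:
    """Return the positions at which `query` occurs as a subsequence of `data`
    (leftmost match for each element), or None if it does not occur."""
    positions: List[int] = []
    j = 0
    for q in query:
        while j < len(data) and data[j] != q:
            j += 1
        if j == len(data):
            return None
        positions.append(j)
        j += 1
    return positions


# Hypothetical document: P(S(N(v1), I(M(v2))), B(L(v3)))
data = [("P", ""), ("S", "P"), ("N", "PS"), ("v1", "PSN"), ("I", "PS"),
        ("M", "PSI"), ("v2", "PSIM"), ("B", "P"), ("L", "PB"), ("v3", "PBL")]

# Branching query P(S(N(v1)), B(L(v3))): both branches are matched in one pass.
query = [("P", ""), ("S", "P"), ("N", "PS"), ("v1", "PSN"),
         ("B", "P"), ("L", "PB"), ("v3", "PBL")]
print(match_positions(query, data))  # [0, 1, 2, 3, 7, 8, 9]
```

The branching query is matched in a single pass over one sequence instead of being split into two paths whose results are joined afterwards.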

Indexing

Role of Indexing
Provide an algorithm that performs this sequence matching. Desired features for the algorithm:
- efficient support for subsequence matching
- use of well-supported DB indexing techniques such as B+ trees
- dynamic index insertion

What Indexing Is Useful For
- Auxiliary access structures that speed up the retrieval of records in response to certain search conditions
- Efficient support for arbitrary structured queries, including the wildcards // and *
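
The wildcards // and * can be folded into sequence matching by treating each query prefix as a pattern. A minimal sketch under a notation assumed here (not necessarily the paper's): in a query prefix, * matches exactly one ancestor symbol and // matches zero or more; symbols are single letters, and the names `prefix_regex` and `wildcard_match` are hypothetical. Like any pair-by-pair check, it does not by itself verify that the matched pairs form a consistent tree embedding.

```python
import re
from typing import List, Optional, Tuple

Pair = Tuple[str, str]  # (symbol, prefix); single-letter symbols assumed


def prefix_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a query prefix into a regex over ancestor symbols:
    '//' -> any run of symbols, '*' -> exactly one symbol."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("//", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append(".")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def wildcard_match(query: List[Pair], data: List[Pair]) -> Optional[List[int]]:
    """Leftmost subsequence match where each query prefix is a pattern."""
    compiled = [(sym, prefix_regex(p)) for sym, p in query]
    positions: List[int] = []
    j = 0
    for sym, rx in compiled:
        while j < len(data) and not (data[j][0] == sym and rx.fullmatch(data[j][1])):
            j += 1
        if j == len(data):
            return None
        positions.append(j)
        j += 1
    return positions


# Hypothetical document: P(S(N(v1), I(M(v2))), B(L(v3)))
data = [("P", ""), ("S", "P"), ("N", "PS"), ("v1", "PSN"), ("I", "PS"),
        ("M", "PSI"), ("v2", "PSIM"), ("B", "P"), ("L", "PB"), ("v3", "PBL")]

# Roughly /P//M[v2]: an M anywhere below P whose value is v2.
print(wildcard_match([("P", ""), ("M", "P//"), ("v2", "P//M")], data))  # [0, 5, 6]
```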

Indexing: State-of-the-Art Approaches
- Indexes on paths
- Indexes on nodes
- Indexes on both (structures): ViST

ViST

Algorithms
- Naïve algorithm based on suffix trees
- RIST: Relationships Indexed Suffix Tree
- ViST: Virtual Suffix Tree

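Algorithm Using Suffix Trees
- Suffix tree: a compact index to all distinct, contiguous substrings of a string
- D-Ancestorship: the ancestor-descendant relationship in the XML document tree, determined through the structure-encoded sequence
- S-Ancestorship: the ancestor-descendant relationship in the suffix tree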
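
D-ancestorship can be read directly off two (symbol, prefix) pairs: x is a D-ancestor of y when x's prefix followed by x's symbol is a prefix of y's prefix. A minimal sketch, assuming single-letter symbols so that prefixes can be compared as plain strings; the function name `is_d_ancestor` is hypothetical. At the pair level this cannot distinguish between sibling nodes that carry the same label.

```python
from typing import Tuple

Pair = Tuple[str, str]  # (symbol, prefix); single-letter symbols assumed


def is_d_ancestor(x: Pair, y: Pair) -> bool:
    """Pair-level D-ancestorship: the path to x, extended by x's own symbol,
    must be a prefix of the path to y."""
    return y[1].startswith(x[1] + x[0])


print(is_d_ancestor(("S", "P"), ("N", "PS")))  # True: N sits below S
print(is_d_ancestor(("S", "P"), ("L", "PB")))  # False: L sits below B
```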

Example Using Suffix Trees

Algorithm Using Suffix Trees
Search proceeds:
- first by S-ancestorship: searching below the current node in the suffix tree
- then by D-ancestorship: matching nodes and prefixes
Disadvantages:
- Costly: the search must traverse a large portion of the subtree
- Most commercial DBMSs do not support suffix trees
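
The naive approach can be sketched with an uncompressed suffix trie (a simplification of a suffix tree) built over all suffixes of each document's sequence. Each query element is searched for anywhere below the node matched for the previous element (its S-descendants), while the D-ancestorship constraints travel inside the prefixes of the pairs. This is a sketch under those assumptions; the class and function names and the two sample documents are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (symbol, prefix)


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[Pair, "TrieNode"] = {}
        self.doc_ids: Set[int] = set()  # documents with a suffix passing through here


def build_suffix_trie(docs: Dict[int, List[Pair]]) -> TrieNode:
    """Uncompressed suffix trie over all suffixes of every document's sequence."""
    root = TrieNode()
    for doc_id, seq in docs.items():
        for start in range(len(seq)):
            node = root
            for pair in seq[start:]:
                node = node.children.setdefault(pair, TrieNode())
                node.doc_ids.add(doc_id)
    return root


def naive_search(root: TrieNode, query: List[Pair]) -> Set[int]:
    """Match query[i] anywhere below the node matched for query[i - 1]."""
    results: Set[int] = set()

    def walk(node: TrieNode, i: int) -> None:
        if i == len(query):
            results.update(node.doc_ids)
            return
        stack = list(node.children.items())
        while stack:  # this full-subtree scan is what makes the approach costly
            pair, child = stack.pop()
            if pair == query[i]:
                walk(child, i + 1)
            stack.extend(child.children.items())

    walk(root, 0)
    return results


docs = {
    1: [("P", ""), ("S", "P"), ("N", "PS"), ("v1", "PSN")],
    2: [("P", ""), ("B", "P"), ("L", "PB"), ("v3", "PBL")],
}
trie = build_suffix_trie(docs)
print(naive_search(trie, [("P", ""), ("N", "PS")]))  # {1}
print(naive_search(trie, [("P", "")]))               # {1, 2}
```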

RIST: Indexing by Ancestor-Descendant Relationships
- Given a node X, jumps directly to the nodes Y of which X is both a D-ancestor and an S-ancestor
- Index construction: uses B+ trees

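RIST: Indexing by Ancestor-Descendant Relationships
Subsequence matching:
- Determine D-ancestorship by prefixes
- Determine S-ancestorship by labels: each suffix-tree node x (the root of an S-subtree) is labeled <n_x, size_x>, where n_x is its preorder traversal number and size_x is its number of descendants; a node y is an S-descendant of x iff n_x < n_y <= n_x + size_x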
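
The labeling scheme turns "find S-descendants of x that carry a given (symbol, prefix)" into a range query on preorder numbers. A minimal sketch under stated assumptions: an uncompressed suffix trie is built in memory, and for each (symbol, prefix) key the labels are kept in a sorted Python list searched with `bisect`, standing in for the B+ tree. All names and the sample documents are hypothetical.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]             # (symbol, prefix)
Entry = Tuple[int, int, Set[int]]  # (n, size, doc_ids) for one trie node


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[Pair, "TrieNode"] = {}
        self.doc_ids: Set[int] = set()


def build_suffix_trie(docs: Dict[int, List[Pair]]) -> TrieNode:
    root = TrieNode()
    for doc_id, seq in docs.items():
        for start in range(len(seq)):
            node = root
            for pair in seq[start:]:
                node = node.children.setdefault(pair, TrieNode())
                node.doc_ids.add(doc_id)
    return root


def build_rist_index(root: TrieNode):
    """Label every node <n, size> and group labels by (symbol, prefix), sorted by n."""
    index: Dict[Pair, List[Entry]] = defaultdict(list)
    counter = 0

    def label(pair, node) -> Tuple[int, int]:
        nonlocal counter
        n = counter
        counter += 1
        for child_pair, child in node.children.items():
            label(child_pair, child)
        size = counter - n - 1  # nodes numbered after n so far are exactly its descendants
        if pair is not None:
            index[pair].append((n, size, node.doc_ids))
        return n, size

    root_label = label(None, root)
    for entries in index.values():
        entries.sort(key=lambda e: e[0])
    return index, root_label


def rist_search(index, root_label, query: List[Pair]) -> Set[int]:
    results: Set[int] = set()

    def step(n: int, size: int, i: int, doc_ids: Set[int]) -> None:
        if i == len(query):
            results.update(doc_ids)
            return
        entries = index.get(query[i], [])
        keys = [e[0] for e in entries]
        # Range query: S-descendants of <n, size> satisfy n < n_y <= n + size.
        lo = bisect.bisect_right(keys, n)
        hi = bisect.bisect_right(keys, n + size)
        for child_n, child_size, child_docs in entries[lo:hi]:
            step(child_n, child_size, i + 1, child_docs)

    step(root_label[0], root_label[1], 0, set())
    return results


docs = {
    1: [("P", ""), ("S", "P"), ("N", "PS"), ("v1", "PSN")],
    2: [("P", ""), ("B", "P"), ("L", "PB"), ("v3", "PBL")],
}
index, root_label = build_rist_index(build_suffix_trie(docs))
print(rist_search(index, root_label, [("P", ""), ("N", "PS")]))  # {1}
```

Unlike the naive search, each step jumps straight to the candidate nodes for the next query element instead of scanning the whole subtree.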

ViST: the Virtual Suffix Tree
- Uses the same sequence-matching algorithm as RIST, but supports dynamic insertion
- Assigns labels dynamically; once assigned, labels are fixed and are not affected by subsequent data insertion or deletion
- Labels the suffix tree without building it
- Relies on statistical information about the XML data

ViST: the Virtual Suffix Tree (figure: the sequences already in the index, a sequence to be inserted, and the dynamic scope assigned to node x)
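
The key property of dynamic scopes is that a new child gets a sub-range carved out of its parent's unused range, so labels never change once assigned. A minimal sketch of that idea only: ViST derives its allocation from statistics about the data, whereas here a hypothetical fixed `FRACTION` of the remaining range stands in for that estimate, and the tree is kept in memory purely to make the allocation visible. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Pair = Tuple[str, str]

# Hypothetical stand-in for ViST's statistics-based estimate: the share of a
# node's still-unused range handed to each new child.
FRACTION = 0.5


@dataclass
class ScopedNode:
    n: int      # label of the node
    size: int   # its scope is the range (n, n + size]
    next_free: int = field(init=False)
    children: Dict[Pair, "ScopedNode"] = field(default_factory=dict)

    def __post_init__(self) -> None:
        self.next_free = self.n + 1  # first unallocated position in the scope


def child_for(node: ScopedNode, pair: Pair) -> ScopedNode:
    """Return the child labelled `pair`, allocating a sub-scope on first use."""
    if pair in node.children:
        return node.children[pair]  # existing labels are never changed
    remaining = node.n + node.size - node.next_free + 1
    if remaining <= 0:
        raise OverflowError("scope exhausted")
    child_size = max(int(remaining * FRACTION) - 1, 0)
    child = ScopedNode(n=node.next_free, size=child_size)
    node.next_free += child_size + 1
    node.children[pair] = child
    return child


def insert_path(root: ScopedNode, seq: List[Pair]) -> List[Tuple[int, int]]:
    """Insert one sequence as a root-to-leaf path; return the (n, size) labels on it."""
    labels, node = [], root
    for pair in seq:
        node = child_for(node, pair)
        labels.append((node.n, node.size))
    return labels


root = ScopedNode(n=0, size=1_000_000)
a = insert_path(root, [("P", ""), ("S", "P"), ("N", "PS")])
b = insert_path(root, [("P", ""), ("B", "P"), ("L", "PB")])
print(a)  # [(1, 499999), (2, 249998), (3, 124998)]
print(b)  # [(1, 499999), (250001, 124999), (250002, 62498)]
```

Inserting the second sequence reuses P's label and places B after S's scope, so the labels assigned by the first insertion are untouched and the range test n_x < n_y <= n_x + size_x keeps working.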

ViST: the Virtual Suffix Tree

Experimental Results
Datasets used:
- DBLP: computer science bibliography; 289,627 records (publications); each publication is a tree of maximum depth 6; the average length of a structure-encoded sequence is 31
- XMARK: a single record with a complicated tree structure
- Synthetic data

Experimental Results
Methods compared:
- Index Fabric: indexes XML paths
- XISS: uses nodes as the basic query unit
- ViST: takes about 1/10 of the query time of the other methods, which must perform (multiple) join operations

Experimental Results
Index structure and size (1/3 less than the suffix tree):
- DocId B+ tree: N elements
- Combined D-ancestor and S-ancestor B+ tree: N x L elements
Index construction
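
As a rough worked example, assume N is the number of DBLP records and L the average structure-encoded sequence length reported for the dataset (both readings are assumptions about the slide's notation). The combined B+ tree would then hold on the order of

$$
N \times L = 289{,}627 \times 31 = 8{,}978{,}437 \text{ elements}
$$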

Conclusion
XML queries = subsequence matching
Advantages of ViST, an algorithm for subsequence matching:
- Avoids expensive join operations
- Indexes both the content and the structure of XML documents
- Uses B+ trees, which are well supported for disk-based data
- Supports dynamic data insertion and deletion