Efficient Filtering of XML Documents with XPath Expressions


Chee-Yong Chan, Pascal Felber, Minos Garofalakis, Rajeev Rastogi
Information Sciences Research Center, Bell Laboratories, Lucent Technologies

Motivation
Growing interest in content-based filtering and routing of data.
[Figure: data publishers send data to a content-based router, whose filtering engine consults a subscription table and forwards to each consumer only the subset of relevant data.]
XML enables more expressive, XPath-based subscriptions (e.g., Intel's NetStructure XML Accelerator).
Challenge: how can XML data be filtered efficiently against XPath-based subscriptions?

Problem Abstraction
Given an XML document D and a set S of XPath expressions (XPEs), the XPath filter returns the subset of S that match D.
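As a point of reference for this abstraction, the naive approach evaluates every XPE in S against D separately. The brute-force sketch below is not XTrie: it assumes single-path XPEs built only from "/", "//", and "*" (no predicates), and treats an XPE as matching D when its steps can be embedded along some root-to-leaf path of D. filter_xpes and its helpers are hypothetical names; parsing uses Python's standard xml.etree.ElementTree.

```python
import re
import xml.etree.ElementTree as ET

# Split a single-path XPE such as "/a//b/c" into (axis, name) steps.
STEP = re.compile(r"(//|/)([^/]+)")

def parse_steps(xpe):
    return STEP.findall(xpe)

def leaf_paths(elem, prefix=()):
    """Yield the element-name path from the root to every leaf element."""
    path = prefix + (elem.tag,)
    children = list(elem)
    if not children:
        yield path
    for child in children:
        yield from leaf_paths(child, path)

def embeds(steps, path, i=0, j=0):
    """True if steps[i:] fit on path[j:]: "/" must use position j, "//" any position >= j."""
    if i == len(steps):
        return True
    axis, name = steps[i]
    positions = [j] if axis == "/" else range(j, len(path))
    return any(
        k < len(path) and name in ("*", path[k]) and embeds(steps, path, i + 1, k + 1)
        for k in positions
    )

def filter_xpes(document, xpes):
    """Return, in input order, the XPEs that match some root-to-leaf path of the document."""
    paths = list(leaf_paths(ET.fromstring(document)))
    return [x for x in xpes if any(embeds(parse_steps(x), p) for p in paths)]
```

For instance, against <a><b><c/></b><d/></a> the XPEs /a/b/c, //c, /a/*/c, and //a//d match, while /b and /a/c do not. The work grows with the number of XPEs times the number of paths, which is the per-XPE cost that an index such as XTrie is meant to reduce.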

Challenges
Filtering with XPath expressions (XPEs) is non-trivial:
Complexity of XPEs: they are tree-structured patterns that include the "*" and "//" operators.
Both unordered and ordered matching must be supported.
Example: p = //a/b[c/*/d]//e/f
[Figure: XPE tree of p: root a (//) with child b (/); b has two branches, c (/) with d reached via /*/, and e (//) with child f (/).]
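The XPE tree in the figure can be built with a small recursive-descent parser. This is a minimal sketch assuming a restricted syntax: element names, "*", "/", "//", and nested [...] predicates, with no attributes or value comparisons. Node, parse_xpe, and show are hypothetical names. A predicate becomes an extra child branch of the node it qualifies, which is what makes the pattern tree-structured rather than a single path.

```python
import re
from dataclasses import dataclass, field

NAME = re.compile(r"\*|[A-Za-z_][\w.-]*")

@dataclass
class Node:
    name: str                      # element name, or "*" for a wildcard
    axis: str                      # relation to the parent: "/" (child) or "//" (descendant)
    children: list = field(default_factory=list)

def parse_xpe(text):
    """Parse an XPE such as //a/b[c/*/d]//e/f into an XPE tree and return its root."""
    pos = 0

    def parse_path():
        """Parse a chain of steps up to ']' or the end of input; return (first, last) node."""
        nonlocal pos
        first = last = None
        while pos < len(text) and text[pos] != "]":
            if text.startswith("//", pos):
                axis, pos = "//", pos + 2
            elif text[pos] == "/":
                axis, pos = "/", pos + 1
            elif first is None:
                axis = "/"         # a relative path such as c/*/d starts with an implicit child step
            else:
                raise ValueError(f"unexpected {text[pos]!r} at position {pos}")
            m = NAME.match(text, pos)
            if m is None:
                raise ValueError(f"expected an element name at position {pos}")
            pos = m.end()
            node = Node(m.group(), axis)
            if last is None:
                first = node
            else:
                last.children.append(node)
            last = node
            while pos < len(text) and text[pos] == "[":
                pos += 1           # each [...] predicate adds a branch under the current node
                branch, _ = parse_path()
                if branch is None or pos >= len(text) or text[pos] != "]":
                    raise ValueError(f"malformed predicate near position {pos}")
                last.children.append(branch)
                pos += 1
        return first, last

    root, _ = parse_path()
    if root is None or pos != len(text):
        raise ValueError(f"cannot parse {text!r}")
    return root

def show(node, depth=0):
    """Render the tree as indented lines, one node per line, prefixed by its axis."""
    lines = ["  " * depth + node.axis + node.name]
    for child in node.children:
        lines += show(child, depth + 1)
    return lines
```

Here parse_xpe("//a/b[c/*/d]//e/f") yields a tree in which node b has two children: the predicate branch c/*/d and the main-path step //e.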

Our Solution: XTrie
Speed up XPE filtering with a novel index called XTrie.
Key idea: decompose each complex, tree-structured XPE into a set of simple, linear patterns (substrings), and index these substrings with a trie.


Architecture of XTrie
[Figure: XTrie index construction algorithm: complex, tree-structured XPEs are decomposed into sets of simple, linear patterns (substrings), from which the XTrie index (a trie plus a substring table) is built. XTrie matching algorithm: a SAX-based XML parser turns XML document D into start/end element events, which are checked against the index to produce the set of XPEs that match D.]
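On the matching side, the algorithm consumes only the parser's start/end element events. A minimal sketch using Python's standard xml.sax module; EventRecorder and element_events are hypothetical names. It records each event together with the element's nesting depth, which is the level information a matcher needs to tell "/" (exactly one level down) from "//" (any number of levels down).

```python
import xml.sax

class EventRecorder(xml.sax.ContentHandler):
    """Record start/end element events with the element's nesting depth (root = 1)."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.events = []

    def startElement(self, name, attrs):
        self.depth += 1
        self.events.append(("start", name, self.depth))

    def endElement(self, name):
        self.events.append(("end", name, self.depth))
        self.depth -= 1

def element_events(document):
    handler = EventRecorder()
    xml.sax.parseString(document.encode("utf-8"), handler)
    return handler.events
```

Character data is ignored because the handler does not override characters(), so only the structural events that drive XPE matching are kept.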


Decomposition of XPEs
Decompose each XPE p into a set of substrings that "cover" p.
A substring is a sequence of element names along some path in the XPE tree in which every consecutive pair of nodes is related by the "/" operator (with no "*" or "//" in between).
Example: p = //a/b[c/*/d]//e/f
Substrings in p = {a, b, c, d, e, f, ab, bc, ef, abc}.
One possible decomposition of p is {a, bc, d, ef}.
[Figure: XPE tree of p.]
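The substring definition can be checked mechanically. A minimal sketch, assuming the XPE tree is given as nested (name, axis, children) tuples with "*" as a node of its own; all_nodes, chains_from, and substrings are hypothetical helper names. It enumerates every downward chain of named nodes joined by "/" edges and reproduces the ten substrings listed for p.

```python
# XPE tree of p = //a/b[c/*/d]//e/f as (name, axis, children); "*" marks a wildcard node.
P = ("a", "//", [
    ("b", "/", [
        ("c", "/", [("*", "/", [("d", "/", [])])]),
        ("e", "//", [("f", "/", [])]),
    ]),
])

def all_nodes(node):
    """Yield every node of the XPE tree, parents before children."""
    yield node
    for child in node[2]:
        yield from all_nodes(child)

def chains_from(node):
    """All substrings that start at `node`, as tuples of element names."""
    name, _, children = node
    if name == "*":
        return []                   # a wildcard never appears inside a substring
    result = [(name,)]
    for child in children:
        if child[1] == "/":         # only "/" edges keep a substring contiguous
            result += [(name,) + rest for rest in chains_from(child)]
    return result

def substrings(root):
    """Set of all substrings of an XPE tree (single-letter names joined for display)."""
    return {"".join(s) for node in all_nodes(root) for s in chains_from(node)}
```

The chain c to d is not a substring because the wildcard between them breaks it, and b to e is not one because e hangs off a "//" edge.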

Decomposition of XPEs
In general, there are many possible decompositions.
[Figure: XPE tree of p under two decompositions: the single-element decomposition and the minimal decomposition.]

Decomposition of XPEs
"Enhanced" minimal decomposition = minimal decomposition with a substring ending at each branching node.
[Figure: XPE tree of p under the single-element, minimal, and "enhanced" minimal decompositions.]
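The single-element and minimal decompositions can be computed with one tree walk. This sketch assumes, consistent with the example {a, bc, d, ef}, that a decomposition partitions the named nodes of the XPE tree into substrings (wildcards belong to none). decompose and as_strings are hypothetical names, and the "enhanced" variant is not modeled here.

```python
# XPE tree of p = //a/b[c/*/d]//e/f as (name, axis, children); "*" marks a wildcard node.
P = ("a", "//", [
    ("b", "/", [
        ("c", "/", [("*", "/", [("d", "/", [])])]),
        ("e", "//", [("f", "/", [])]),
    ]),
])

def decompose(node, single=False, current=None, out=None):
    """Partition the named nodes of an XPE tree into substrings (lists of names).

    single=True puts every element in its own substring. Otherwise the current
    substring is extended through the first '/' child that is a named element;
    a new substring starts at the root, after a '//' edge, below a wildcard,
    or at any further '/' child of the same node."""
    if out is None:
        out = []
    name, _, children = node
    if name != "*":
        if current is None:
            current = [name]
            out.append(current)
        else:
            current.append(name)
    extended = False
    for child in children:
        child_name, child_axis, _ = child
        if not single and not extended and name != "*" and child_axis == "/" and child_name != "*":
            extended = True
            decompose(child, single, current, out)   # continue the current substring
        else:
            decompose(child, single, None, out)      # start a new substring below this edge

    return out

def as_strings(parts):
    return ["".join(part) for part in parts]
```

For p this gives {a, b, c, d, e, f} and {abc, d, ef}. The second is minimal because every substring ends at a leaf of the forest formed by "/" edges between named nodes, and each such leaf must end some substring in any partition.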

XTrie
The XTrie index consists of two components: a trie and a substring table.

XTrie
XPEs: p = //a/a/b/*/a/b and q = /a/b[c]//b/c

XTrie: decomposed substrings
[Figure: XPE trees of p and q, each split into its decomposed substrings.]

XTrie: substring table
[Figure: decomposed substrings of p and q and the resulting substring table, with rows for the substrings aab, ab, ab, abc, and bc, and columns Parent, Rel. Level, Num Child, and Rank.]

XTrie: trie
[Figure: trie over the distinct substrings aab, ab, abc, and bc next to the substring table (columns include Parent, Rel. Level, Num Child, Rank, and Next Row); trie nodes carry child-node pointers and substring-table pointers.]

XTrie: trie with max-suffix pointers
[Figure: the same trie and substring table, with each trie node additionally holding a max-suffix pointer.]
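A minimal sketch of the trie side of the index, assuming that a max-suffix pointer plays the role of an Aho-Corasick failure link: it points to the node for the longest proper suffix of the node's string that is also in the trie. With one trie state kept per open element, each SAX start event advances along the current root-to-element path and reports every decomposed substring ending at that element, together with its depth. The substring table, relative levels, and the final assembly of XPE matches are not modeled; Trie, SubstringMatcher, and match_substrings are hypothetical names.

```python
import xml.sax
from collections import deque

class Trie:
    """Trie over element-name sequences with one max-suffix pointer per node."""

    def __init__(self, substrings):
        self.child = [{}]     # child[n][name] -> id of the child node
        self.output = [[]]    # substrings whose last element is node n
        for s in substrings:
            n = 0
            for name in s:
                if name not in self.child[n]:
                    self.child.append({})
                    self.output.append([])
                    self.child[n][name] = len(self.child) - 1
                n = self.child[n][name]
            self.output[n].append(s)
        # suffix[n]: node of the longest proper suffix of n's string that is in the
        # trie (0 = root), computed breadth-first as for Aho-Corasick failure links.
        self.suffix = [0] * len(self.child)
        queue = deque(self.child[0].values())
        while queue:
            n = queue.popleft()
            for name, c in self.child[n].items():
                self.suffix[c] = self.step(self.suffix[n], name)
                queue.append(c)

    def step(self, n, name):
        """Follow the child edge for `name`, falling back along max-suffix pointers."""
        while n and name not in self.child[n]:
            n = self.suffix[n]
        return self.child[n].get(name, 0)

    def matches(self, n):
        """Substrings ending at node n: its own output plus those along its suffix chain."""
        found = []
        while n:
            found += self.output[n]
            n = self.suffix[n]
        return found

class SubstringMatcher(xml.sax.ContentHandler):
    """Record (substring, depth) whenever a substring ends at the current element."""

    def __init__(self, trie):
        super().__init__()
        self.trie = trie
        self.states = [0]     # trie state per open element; states[0] stands for the document
        self.hits = []

    def startElement(self, name, attrs):
        n = self.trie.step(self.states[-1], name)
        self.states.append(n)
        depth = len(self.states) - 1
        self.hits += [("".join(s), depth) for s in self.trie.matches(n)]

    def endElement(self, name):
        self.states.pop()     # restore the parent's state for the next sibling

def match_substrings(substrings, document):
    handler = SubstringMatcher(Trie(substrings))
    xml.sax.parseString(document.encode("utf-8"), handler)
    return handler.hits
```

For the substrings aab, ab, abc, and bc and the document <a><a><b><c/></b></a></a>, this reports aab and ab at depth 3 and abc and bc at depth 4. Keeping a per-depth stack means an end tag simply restores the parent's state, so sibling subtrees never see each other's partial matches.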

Optimizations for XTrie
"Lazy" variant of XTrie: reduce the number of substring-table accesses by probing the table only when the matched substring is a leaf substring of some XPE.
XTrie for single-path XPEs: optimize the data structures and algorithms by exploiting the simpler structure of single-path XPEs.

Related Work
Commercial products (e.g., BEA, Intel).
XFilter [Altinel & Franklin, VLDB'00]:
Models single-path XPEs as finite state machines (FSMs); e.g., p = /a//b/c.
Builds a hash index on the FSMs' transitions (i.e., element names), where each element name (a, b, c) has a candidate list and a wait list.
Optimizations: XFilter-LB = XFilter with list balancing; Prefiltering = two parses over the XML data to pre-filter some XPEs.
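A simplified sketch in the spirit of XFilter's element-name index, for single-path XPEs with "/", "//", and "*". Each index entry keyed by an element name holds the path-node states currently waiting for that name (roughly a candidate list); a matching start tag promotes the next step of each satisfied state together with its level constraint, and that promotion is withdrawn at the element's end tag. This is an illustration under stated assumptions, not Altinel and Franklin's implementation: wait lists, list balancing, and prefiltering are omitted, and XFilterSketch and xfilter are hypothetical names.

```python
import re
import xml.sax
from collections import defaultdict

STEP = re.compile(r"(//|/)([^/]+)")

class XFilterSketch(xml.sax.ContentHandler):
    """index[name]: states waiting for an element called `name` ("*" accepts any name)."""

    def __init__(self, xpes):
        super().__init__()
        self.steps = [STEP.findall(x) for x in xpes]
        self.index = defaultdict(list)
        self.added = []       # per open element: the states it promoted
        self.depth = 0
        self.matched = set()
        for qid, steps in enumerate(self.steps):
            axis, name = steps[0]
            # A state is (query id, step index, exact level or None, minimum level).
            self.index[name].append((qid, 0, 1 if axis == "/" else None, 1))

    def startElement(self, name, attrs):
        self.depth += 1
        promoted = []
        # Concatenation copies the lists, so appending below is safe during the loop.
        for qid, i, exact, minimum in self.index[name] + self.index["*"]:
            if (self.depth != exact) if exact is not None else (self.depth < minimum):
                continue      # level constraint ("/" = exact level, "//" = at least) not met
            if i + 1 == len(self.steps[qid]):
                self.matched.add(qid)
                continue
            axis, next_name = self.steps[qid][i + 1]
            new = (qid, i + 1, self.depth + 1 if axis == "/" else None, self.depth + 1)
            self.index[next_name].append(new)
            promoted.append((next_name, new))
        self.added.append(promoted)

    def endElement(self, name):
        # Withdraw the states promoted by this element so later siblings cannot use them.
        for key, state in self.added.pop():
            self.index[key].remove(state)
        self.depth -= 1

def xfilter(document, xpes):
    handler = XFilterSketch(xpes)
    xml.sax.parseString(document.encode("utf-8"), handler)
    return [x for qid, x in enumerate(xpes) if qid in handler.matched]
```

For the document <a><x><b><c/></b></x><c/></a> and the XPEs /a//b/c, /a/c, /a/b, //x/*/c, and //c, it returns every XPE except /a/b, whose b would have to sit directly under the root element.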

Experimental Evaluation
DTD: NITF (News Industry Text Format), 123 elements, 513 attributes.
XML data: generated with IBM's XML Generator (size = 20, 100, 1000 tag pairs).
XPath expressions: generated with our own generator (P = number of XPEs, L = max. depth, Pw = probability of "*", Pd = probability of "//", z = skew of element names).
Algorithms: Eager and Lazy XTrie; XFilter and XFilter-LB [Altinel & Franklin, VLDB'00].
System: Sun Ultra-250 (296 MHz) with 512 MB memory running Solaris 2.7.
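The slide names the XPE workload parameters but not how the generator uses them. The hypothetical generator below shows one plausible reading: each XPE gets between 1 and L steps, a step is "*" with probability Pw and uses "//" with probability Pd, and element names are drawn from a Zipf-like distribution with skew z (z = 0 is uniform). generate_xpes and the Zipf weighting are assumptions, not the authors' generator.

```python
import random

def generate_xpes(names, P, L, Pw, Pd, z, seed=0):
    """Generate P random single-path XPEs (hypothetical stand-in for the authors' generator).

    Each XPE has 1..L steps; a step uses '//' with probability Pd and is '*' with
    probability Pw; otherwise its name is drawn with weight 1 / rank**z."""
    rng = random.Random(seed)                       # fixed seed -> reproducible workload
    weights = [1.0 / (rank + 1) ** z for rank in range(len(names))]
    xpes = []
    for _ in range(P):
        steps = []
        for _ in range(rng.randint(1, L)):
            axis = "//" if rng.random() < Pd else "/"
            name = "*" if rng.random() < Pw else rng.choices(names, weights=weights)[0]
            steps.append(axis + name)
        xpes.append("".join(steps))
    return xpes
```

Higher z concentrates the XPEs on a few popular element names, which is the kind of skew the "Skew" experiments vary.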

Scalability (number of XPEs) [plot]

Scalability (number of tags) [plot]

Conclusions
XTrie: a novel index structure that supports efficient filtering of streaming XML data based on XPath expressions.
Features:
Indexes both simple, single-path and complex, tree-structured XPath expressions.
Handles ordered, unordered, and hybrid modes of matching.

Speedup vs. number of XPEs (m) [plot]

Wildcards (m) [plot]

Descendants (m) [plot]

Number of levels (m) [plot]

Skew (m) [plot]