Early Profile Pruning on XML-aware Publish-Subscribe Systems

Early Profile Pruning on XML-aware Publish-Subscribe Systems. Mirella M. Moro, Petko Bakalov, Vassilis J. Tsotras, University of California. VLDB 2007.

Presentation transcript:

Early Profile Pruning on XML-aware Publish-Subscribe Systems
Mirella M. Moro, Petko Bakalov, Vassilis J. Tsotras
University of California, Riverside
2/2/2019

Overview
Motivation
Bottom-up Filtering FSM (BUFF)
Bounding-based XML Filtering (BoXFilter): core modules, filtering algorithms
Experimental results

Motivation
Publish-subscribe systems: message transmission is determined by the message content.
Examples: notification websites such as hotwire.com or ticketmaster.com.
[Figure: publishers, documents, matching algorithm, results, profiles (submit, update, delete), subscribers]

Publish-subscribe systems
The data is exchanged in XML format, and each document is viewed as a tree.
Nodes correspond to elements, attributes, or text values.
Edges represent immediate element-subelement or element-value relationships.
(a) Document:
    <Bib>
      <article vol="7" no="11">
        <title>t1</title>
        <author>
          <last>DeWitt</last>
          <mi>J</mi>
          <first>David</first>
        </author>
        <journal>TPDS</journal>
        <year>1996</year>
      </article>
      <article>
        <title>t2</title>
        <last>Florescu</last>
        <first>Daniela</first>
        <proceedings>SIGMOD</proceedings>
        <year>2006</year>
      </article>
    </Bib>
(b) Tree representation: [Figure: the document above drawn as a tree rooted at Bib]
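The node and edge model described on this slide can be illustrated with a minimal Python sketch. It uses the standard library's xml.etree.ElementTree and a shortened copy of the slide's document. The helper name tree_edges and the "@" prefix for attribute nodes are assumptions made for this illustration, not part of the original talk.

```python
import xml.etree.ElementTree as ET

# Shortened version of the slide's document (first article only).
DOC = """<Bib>
  <article vol="7" no="11">
    <title>t1</title>
    <author><last>DeWitt</last><mi>J</mi><first>David</first></author>
    <journal>TPDS</journal>
    <year>1996</year>
  </article>
</Bib>"""


def tree_edges(elem):
    """Yield (parent, child) edges of the document tree.

    Elements, attributes, and text values all become nodes.
    Attribute nodes are written as "@name" (a convention chosen here).
    """
    for name, value in elem.attrib.items():
        yield (elem.tag, "@" + name)   # element -> attribute
        yield ("@" + name, value)      # attribute -> value
    text = (elem.text or "").strip()
    if text:
        yield (elem.tag, text)         # element -> text value
    for child in elem:
        yield (elem.tag, child.tag)    # element -> subelement
        yield from tree_edges(child)


edges = list(tree_edges(ET.fromstring(DOC)))
for parent, child in edges:
    print(parent, "->", child)
```

Each printed pair is one edge of the tree representation, so the element-subelement and element-value relationships can be inspected directly.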

Publish-subscribe systems (cont.)
User profiles are expressed in an XML query language (XPath, XQuery).
An XML query contains structural constraints and value-based constraints.
Structural constraints: ////article[/author[@last="Smith"]]//procs[@conf="VLDB"]
Tree pattern: article, with one branch author (last) and one branch proceedings (conf).

Related Work/Our Contribution
Current work covers: construction of the overlay network; dissemination and indexing of profiles (queries); processing of the stream of messages.
We focus on the matching process that takes place within a broker. We:
improve the performance of the regular FSM by using a bottom-up evaluation of the document;
develop an index-based filtering technique that performs early pruning of the query profiles.

Bottom-up vs. Top-down filtering
State machines are among the most common methods for the XML matching process.
Top-down approach (i.e., in-order traversal or depth-first order): the state machine is advanced for each XML element (or attribute) read. It does not consider any form of early pruning.
Bottom-up approach: takes into consideration the (usual) fact that an XML document has its more selective elements located at its leaves.

Example
The top-down approach groups the queries according to their common prefixes.
The bottom-up approach groups them according to their common suffixes.
[Figure: (a) Document, (b) Queries Q1 to Q6, (c) Top-down machine, (d) Bottom-up machine]
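The difference between prefix and suffix grouping can be sketched with a small trie. The queries below are hypothetical linear paths (the exact queries in the slide's figure are not recoverable from the transcript), and the names build_trie and count_states are assumptions for this illustration. A top-down machine shares states along common prefixes; building the same trie over reversed paths shares states along common suffixes.

```python
def build_trie(paths):
    """Build a trie (nested dicts) over query paths.

    The key "$queries" marks where a query ends; it cannot clash with
    an XML tag name because "$" is not allowed in tag names.
    """
    root = {}
    for name, steps in paths.items():
        node = root
        for step in steps:
            node = node.setdefault(step, {})
        node.setdefault("$queries", []).append(name)
    return root


def count_states(trie):
    """Count trie nodes, i.e. states that are not shared between queries."""
    return sum(1 + count_states(child)
               for key, child in trie.items() if key != "$queries")


# Hypothetical queries, written as root-to-leaf tag paths.
QUERIES = {
    "Q1": ["a", "b", "c", "d"],
    "Q2": ["a", "e", "f", "h"],
    "Q3": ["e", "f", "h"],
    "Q4": ["e", "g", "h"],
}

top_down = build_trie(QUERIES)                                       # shared prefixes
bottom_up = build_trie({q: s[::-1] for q, s in QUERIES.items()})     # shared suffixes

print("top-down states:", count_states(top_down))
print("bottom-up states:", count_states(bottom_up))
```

With these particular queries the suffix trie is smaller because three queries end in h. Whether suffix grouping wins in general depends on the query workload.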

BUFF
An FSM-based bottom-up approach for XML filtering.
BUFF avoids translating documents and queries to Prüfer sequences (as the other algorithms do) and employs a more direct evaluation algorithm.
The document is parsed by a SAX parser, which triggers events for specific marks (tags) in the XML document.
The machine keeps a runtime stack that stores the current document path being processed.
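The SAX-plus-stack mechanism can be sketched in Python with the standard xml.sax module. This is not the BUFF algorithm itself: it only shows how a runtime stack tracks the current document path, and it replaces the state machine with a naive suffix comparison on each closing tag. The class name PathStackHandler, the function filter_document, and the restriction to linear child-only paths are assumptions of this sketch.

```python
import xml.sax


class PathStackHandler(xml.sax.ContentHandler):
    """Track the current root-to-node path on a stack while parsing."""

    def __init__(self, queries):
        super().__init__()
        # queries: name -> non-empty list of tags (linear path, child steps only)
        self.queries = queries
        self.stack = []
        self.matched = set()

    def startElement(self, name, attrs):
        self.stack.append(name)

    def endElement(self, name):
        # On a closing tag, compare each query with the tail of the
        # current path (a naive stand-in for advancing a bottom-up FSM).
        for qname, steps in self.queries.items():
            if self.stack[-len(steps):] == steps:
                self.matched.add(qname)
        self.stack.pop()


def filter_document(xml_text, queries):
    """Return the names of the queries matched by the document."""
    handler = PathStackHandler(queries)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.matched


doc = "<a><b><c><d/></c></b><e><f/></e></a>"
queries = {"Q1": ["b", "c", "d"], "Q2": ["e", "f"], "Q3": ["b", "f"]}
print(sorted(filter_document(doc, queries)))
```

Checking every query on every closing tag is linear in the number of profiles, which is exactly the cost a shared state machine is meant to avoid.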

BUFF Example
[Figure: (a) Document and BUFF machine for queries Q1 and Q2; panels (b) to (g) show processing steps for parser events such as <a>, <b>, </e>, </f>, </d>, </c>]

Bounding-based XML Filtering
Two major processes work asynchronously: Profile Management and Profile Matching.
[Figure: profiles (queries) P1, P2, P3; Prüfer Sequence; Profile Manager; Profile Index; Matching Module with Matching Algorithm; input documents; matched documents]

Prüfer Sequence
A unique sequential encoding of a labeled tree.
Algorithm: iteratively remove nodes from the tree until all nodes but the last two have been removed. At each iteration, find and remove the leaf with the smallest label and add the label of that leaf's parent to the Prüfer sequence.
Theorem: If a query tree Q is a subgraph of a document tree D, then the Prüfer sequence of Q is a subsequence of the Prüfer sequence of D.
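The construction described on this slide can be sketched in Python as follows. The sketch assumes a rooted tree given as a child-to-parent map with distinct, comparable labels, and it treats any non-root node without remaining children as a leaf. The function names prufer_sequence and is_subsequence are chosen for this illustration. The second helper shows only the subsequence test that the theorem relies on; how the talk's system labels and numbers XML nodes is not specified in the slides.

```python
import heapq


def prufer_sequence(parent):
    """Prüfer sequence of a rooted tree.

    parent maps every non-root node label to its parent's label.
    Repeatedly removes the smallest-labelled leaf and records its
    parent, stopping when two nodes remain.
    """
    nodes = set(parent) | set(parent.values())
    child_count = {n: 0 for n in nodes}
    for par in parent.values():
        child_count[par] += 1

    # Min-heap of current leaves (the root is never removed).
    leaves = [n for n in nodes if child_count[n] == 0 and n in parent]
    heapq.heapify(leaves)

    sequence = []
    remaining = len(nodes)
    while remaining > 2:
        leaf = heapq.heappop(leaves)
        par = parent[leaf]
        sequence.append(par)
        remaining -= 1
        child_count[par] -= 1
        if child_count[par] == 0 and par in parent:
            heapq.heappush(leaves, par)   # parent has become a leaf
    return sequence


def is_subsequence(query_seq, doc_seq):
    """True if query_seq appears in doc_seq in order (gaps allowed)."""
    it = iter(doc_seq)
    return all(symbol in it for symbol in query_seq)


# Example tree: 1, 2, 3 are children of 4; 4 is a child of 5; 5 of root 6.
tree = {1: 4, 2: 4, 3: 4, 4: 5, 5: 6}
print(prufer_sequence(tree))
```

A tree with n nodes yields a sequence of length n - 2, which matches the stopping rule of leaving two nodes.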

Sequence Envelope
Assume a set of k Prüfer sequences S1, ..., Sk representing user profiles.
We can derive two new sequences:
Upper bound U: for each position, take the largest element.
Lower bound L: for each position, take the smallest element.
L and U form the smallest possible bounding envelope that encompasses all members of the set of sequences from above and below.

Example
Assume 3 sequences with 11 symbols each:
abcabababcd
cdcdecdcdec
dedededebab
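For the three sequences in this example, the lower and upper bounds can be computed position by position. The following Python sketch assumes equal-length sequences and ordinary character ordering; the function names sequence_envelope and inside_envelope are chosen for this illustration and do not come from the talk.

```python
def sequence_envelope(sequences):
    """Return (lower, upper): per-position minimum and maximum symbols."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("all sequences must have the same length")
    lower = "".join(min(column) for column in zip(*sequences))
    upper = "".join(max(column) for column in zip(*sequences))
    return lower, upper


def inside_envelope(seq, lower, upper):
    """True if every symbol of seq lies between the bounds at its position."""
    return len(seq) == len(lower) and all(
        lo <= s <= up for lo, s, up in zip(lower, seq, upper)
    )


profiles = ["abcabababcd", "cdcdecdcdec", "dedededebab"]
lower, upper = sequence_envelope(profiles)
print("L =", lower)   # abcabababab
print("U =", upper)   # dedeeededed
```

Every member of the set lies inside the envelope, but so do sequences that are not members (L itself is not one of the three profiles), so an envelope test can only rule candidates out and cannot confirm a match on its own.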

Sequence Envelope (cont.)
The key property of the sequence envelope is that it can be used as an aggregate of the underlying set of sequences.

BoXFilter Tree
Sequence envelopes can be nested, forming a BoXFilter tree.

Filtering algorithms
The profiles in the system are organized in a BoXFilter tree. Documents are traversed through the tree.
There are two variations of the filtering algorithm:
Sequential: documents are processed one by one.
Batch processing: documents are organized in a tree like the queries, and both trees are joined.
After the traversal of the BoXFilter tree, there is a verification step.

Experimental Results
We generated datasets with 1,000, 10,000, and 100,000 small documents (up to 8 KB each).
We generated up to 100,000 queries with selectivity fixed at 50%.
[Figure: panels (a), (b), (c)]

Experimental Results (cont.)
In this set of experiments, we vary the number of documents that match any of the profile queries. (A selectivity of 1% means that one percent of the documents satisfy any of the queries.)

Thank You!