R-SOX: Runtime Semantic Query Optimization over XML Streams. Song Wang, Hong Su, Ming Li, Mingzhu Wei, Shoushen Yang, Drew Ditto, Elke A. Rundensteiner.


R-SOX: Runtime Semantic Query Optimization over XML Streams
Song Wang, Hong Su, Ming Li, Mingzhu Wei, Shoushen Yang, Drew Ditto, Elke A. Rundensteiner and Murali Mani
Database Systems Research Group, Department of Computer Science, Worcester Polytechnic Institute, Worcester, Massachusetts, USA
VLDB 2006, Seoul, Korea

Background: XML Stream Applications
Wide-ranging and growing applications
- Examples: news publishing and on-line auction systems
Characteristics
- Real-time processing: short response time
- Limited resources: minimize memory
[Figure: News Publishing; On-line Auction]

Background: Optimization Using Constraints
Constraint Properties
- Document Type Definition (DTD) or XML Schema
- Constraints are statically available beforehand
General XML Semantic Query Optimization (SQO)
- Tree minimization
- Recursion optimization
Stream-specific XML SQO
- Context-aware shortcutting
- Token-granularity data output

R-SOX: Motivation and Goals
Motivation
- Scenarios where a static schema cannot be applied
- Challenges when the schema arrives dynamically:
  - how to represent and manage the runtime schema
  - how to exploit the dynamic schema for runtime optimization
  - how to propagate the runtime schema downstream
Goals
- Runtime schema encoding and synchronization
- Semantic query optimization techniques
- Runtime schema propagation

R-SOX: Architecture and Workflow
[Figure: R-SOX system built on the extended Raindrop XQuery engine. Inputs: input stream with RSI, schema knowledge, XQuery. Components: Schema Information Manager, Query Plan Generator, Query Plan Adaptor, Stream Annotator. Internal flows: query plan, plan refinement. Outputs: result stream, result schema, annotated output stream.]
Workflow stages: Basic XQuery Evaluation, Runtime Schema Refinement, Runtime Semantic Query Optimization, Downstream Schema Propagation
Legend: R-SOX Contributions, Future Work, Raindrop Engine, Demo Focus

Basic XQuery Evaluation
Raindrop XQuery Engine
- Construction of the Raindrop plan
- Automaton-based query evaluation
XQuery Q1-1:
  FOR $o in document("news.xml")/stream/news
  RETURN $o/source, $o/comments
[Figure: Raindrop XQuery plan over the stream data. Operators: Nav stream//news -> $x; Nav $x//source -> $b; Nav $x//comments -> $c; ExtractNest $b; ExtractNest $c; SJoin on $x.]
[Figure: Query automata. States s0 to s6 with transitions labeled stream, news, source, comments, content.]
[Figure: Input token stream, e.g. CNN… President…]
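The automaton-based evaluation of Q1-1 can be illustrated with a minimal sketch. It is not the Raindrop implementation: the token encoding, the function name `evaluate_q1`, and the use of a plain path stack in place of numbered automaton states are all assumptions made for illustration. It also collects only text content rather than whole elements.

```python
from typing import Iterable, Iterator

# Hypothetical token encoding: ("start", name), ("end", name) or ("text", value).
Token = tuple[str, str]

def evaluate_q1(tokens: Iterable[Token]) -> Iterator[dict[str, list[str]]]:
    """Yield one record per /stream/news with the text of its source and comments children."""
    path: list[str] = []   # stack of open element names; stands in for the automaton state
    record = None          # buffers of the news element currently being processed
    for kind, value in tokens:
        if kind == "start":
            path.append(value)
            if path == ["stream", "news"]:
                record = {"source": [], "comments": []}
        elif kind == "text":
            # Extract step: keep text directly under news/source or news/comments.
            if record is not None and len(path) == 3 and path[2] in record:
                record[path[2]].append(value)
        elif kind == "end":
            if path == ["stream", "news"]:
                yield record   # the join fires when the news element closes
                record = None
            path.pop()

tokens = [
    ("start", "stream"), ("start", "news"),
    ("start", "source"), ("text", "CNN"), ("end", "source"),
    ("start", "comments"), ("text", "first"), ("end", "comments"),
    ("end", "news"), ("end", "stream"),
]
results = list(evaluate_q1(tokens))
```

Each record is produced as soon as its news element ends, which is what keeps buffering bounded per element rather than per document.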

Runtime Schema Refinement
Runtime Schema Information (RSI)
- Representing RSI: RSI grammar
- Encoding RSI:
  - embedded into the input XML token stream
  - extracted using a DFA stream loader
Managing Schema Information
- Schema graph: directed ordered graph
- Schema graph synchronization with newly received RSIs
- History-aware RSI rollback
Example of RSI:
  News ((source | comment)+, date+)
  RSI 1: ((news,inf,TIME), (/news/comment,, ),-)
  News (source+, date+)
  RSI 2: ((/news,200,COUNT), (/news/comment, /news/source, *), +)
  News (source*, comment+, date+)
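Schema graph synchronization and history-aware rollback can be sketched as follows. This is an illustration under strong assumptions: the class `SchemaGraph`, its methods, and the reduction of an RSI to a `(parent, child, op)` triple are hypothetical, and the sketch ignores child order, occurrence counts and RSI lifetimes, all of which the slides' schema graph carries.

```python
class SchemaGraph:
    """Hypothetical simplification: maps each element to the set of children it allows."""

    def __init__(self, children):
        self.children = {parent: set(kids) for parent, kids in children.items()}
        self.history = []  # undo log, one entry per applied RSI

    def apply(self, parent, child, op):
        """Apply a simplified RSI: op "-" removes the edge, op "+" adds it."""
        kids = self.children.setdefault(parent, set())
        changed = (child in kids) if op == "-" else (child not in kids)
        if op == "-":
            kids.discard(child)
        else:
            kids.add(child)
        self.history.append((parent, child, op, changed))

    def rollback(self):
        """Undo the most recently applied RSI, if it changed the graph."""
        parent, child, op, changed = self.history.pop()
        if changed:
            if op == "-":
                self.children[parent].add(child)
            else:
                self.children[parent].discard(child)

graph = SchemaGraph({"news": ["source", "comment", "date"]})
graph.apply("news", "comment", "-")        # in the spirit of RSI 1
after_rsi = set(graph.children["news"])
graph.rollback()
after_rollback = set(graph.children["news"])
```

Recording whether an RSI actually changed the graph keeps rollback from removing an edge that was already present before the RSI arrived.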

Runtime SQO: Overview
Runtime Plan Adaptor
- Incremental plan migration
- Rule library
- Rule applier
Query Execution
- Modifying automata computations
- Switching execution modes
- Performing event-condition actions
Supported SQO techniques:
(1) Tree Minimization
(2) Recursion Optimization
(3) Fast Data Output
(4) Navigation Shortcutting

Runtime SQO: Tree Minimization
Benefits
- Expedite document traversal during pattern retrieval by avoiding unnecessary navigation
- Change the query plan at runtime by adjusting the automata
Query Execution
- Temporarily removing and adding automaton states
XQuery Q1:
  FOR $o in document("news.xml")/stream/news
  RETURN $o/source, $o/comments
RSIs:
  P1: ((stream,inf,Count), (/news, source, ), -)
  P2: ((stream,inf,Count), (/news, comments,), -)
[Figure: Schema graph refinement. Nodes stream, news, source, date, comments with occurrence labels (1, ∞); edges marked "Cut by P1" and "Cut by P2".]
[Figure: Query automata refinement. States s0 to s6 with transitions labeled stream, news, source, comments, content; transitions marked "Disable the transition by P1" and "Disable the transition by P2".]
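Temporarily disabling and re-enabling automaton transitions can be sketched as below. The class `QueryAutomaton`, its method names, and the concrete state numbering are assumptions for illustration; the slides only show that P1 and P2 each disable one transition.

```python
class QueryAutomaton:
    """Hypothetical transition table that supports switching transitions off and on."""

    def __init__(self, transitions):
        self.transitions = dict(transitions)  # (state, element name) -> next state
        self.disabled = set()                 # transitions currently cut by an RSI

    def disable(self, state, name):
        self.disabled.add((state, name))

    def enable(self, state, name):
        self.disabled.discard((state, name))

    def step(self, state, name):
        """Return the next state, or None if the transition is absent or disabled."""
        key = (state, name)
        if key in self.disabled:
            return None
        return self.transitions.get(key)

# State names are assumed; they loosely follow the automaton drawn for Q1.
automaton = QueryAutomaton({
    ("s0", "stream"): "s1",
    ("s1", "news"): "s2",
    ("s2", "source"): "s3",
    ("s2", "comments"): "s5",
})
before = automaton.step("s2", "source")
automaton.disable("s2", "source")   # schema says no source under news for now
during = automaton.step("s2", "source")
automaton.enable("s2", "source")    # the RSI expired or was rolled back
after = automaton.step("s2", "source")
```

Keeping the transition table intact and masking entries makes the change cheap to reverse, which matches the "temporarily" in the slide.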

Runtime SQO: Recursion Optimization
Benefits
- Improve performance by avoiding the unnecessary overhead of recursion handling
Optimization Processing
- Detect recursion by analyzing the runtime schema knowledge
- Switch between recursion-aware and non-recursive operators
- Characterize safe moments for runtime migration
XQuery Q2 (slightly different from Q1):
  FOR $o in document("news.xml")/stream//news
  RETURN $o/source, $o/comments
RSIs:
  P1: ((news,inf,Count), (/news, news, ), - )
  P2: ((news,inf,Count), (/news, news, ), +)
[Figure: Operator switching in the query plan. Recursive operators over the stream data: RecurNav stream//news -> $x; RecurNav $x//source -> $b; RecurNav $x//comments -> $c; RecurExtractNest $b; RecurExtractNest $c; RecurSJoin on $x. P1 and P2 label the switches between the recursive and non-recursive operators.]
Recursion-aware operators are switched to non-recursive operators if the input XML data is not recursive.
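Detecting recursion from schema knowledge amounts to asking whether an element can reach itself in the schema graph. The sketch below assumes the schema is available as a plain parent-to-children mapping; the function names and the operator labels returned by `pick_navigation` are illustrative and do not reproduce the R-SOX plan adaptor.

```python
def is_recursive(schema, element):
    """Return True if `element` can appear as its own descendant under `schema`."""
    stack = list(schema.get(element, ()))
    seen = set()
    while stack:
        node = stack.pop()
        if node == element:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(schema.get(node, ()))
    return False

def pick_navigation(schema, element):
    """Choose the recursion-aware operator only when the schema allows recursion."""
    return "RecurNav" if is_recursive(schema, element) else "Nav"

# Schema after an RSI like P2 (news may nest inside news) and after one like P1.
recursive_schema = {"stream": ["news"], "news": ["source", "comments", "news"]}
flat_schema = {"stream": ["news"], "news": ["source", "comments"]}

op_recursive = pick_navigation(recursive_schema, "news")
op_flat = pick_navigation(flat_schema, "news")
```

The sketch does not address when the switch is safe to perform; the slide lists that as a separate concern.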

Runtime SQO: Fast Data Output
Benefits
- Minimize memory consumption by avoiding unnecessary data storage and releasing buffered data at the earliest moment
Optimization Processing
- Augment query automata with Glushkov automata
- Encode event-condition actions
XQuery Q1:
  FOR $o in document("news.xml")/stream/news
  RETURN $o/source, $o/comments
Case 1: Overall schema knowledge is News((source | comments | date)+)
- No order constraints can be used; comments/content must be stored.
Case 2: Overall schema knowledge is News(source+, comments+, date+)
- Global order constraint: Order(source, comments)
- No storage is needed.
Case 3: Overall schema knowledge is News((source | comment)+, date+, comment+)
- Local order constraint: LocalOrder(source, comments)
- Same as Case 1 at the beginning. The Glushkov automaton for the type "news" indicates when the source elements are complete; after that, comments/content no longer needs to be stored.
[Figure: Actions encoded into the automata. States s1 to s7 with transitions labeled stream, news, source, comments, content.]
[Figure: Glushkov automaton for type "News". States start and S1 to S4 with transitions labeled source, comments, date.]
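The difference between Case 1 and Case 2 can be made concrete with a small sketch that measures how many comments must be buffered for one news element. The function `process_news`, the `(name, text)` child encoding, and the `ordered` flag are assumptions for illustration; Case 3 and the Glushkov automaton are not modeled.

```python
def process_news(children, ordered):
    """Emit source texts, then comments texts, for one news element.

    children: list of (name, text) pairs in document order.
    ordered:  True if Order(source, comments) is known to hold (Case 2).
    Returns (output, peak number of buffered comments).
    """
    output, buffer, peak = [], [], 0
    for name, text in children:
        if name == "source":
            output.append(text)
        elif name == "comments":
            if ordered:
                # All sources have already been seen, so comments can stream out.
                output.append(text)
            else:
                # A later source may still arrive, so comments must wait.
                buffer.append(text)
                peak = max(peak, len(buffer))
    output.extend(buffer)  # flushed when the news element ends
    return output, peak

children = [("source", "CNN"), ("source", "BBC"),
            ("comments", "c1"), ("comments", "c2")]
with_order = process_news(children, ordered=True)
without_order = process_news(children, ordered=False)
```

Both runs produce the same output; only the peak buffer size differs, which is the memory saving the slide refers to.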

Runtime SQO: Navigation Shortcut (I)
Benefit
- Expedite document-order traversal during pattern retrieval by filtering out failed patterns early
Optimization Rules
- Order, occurrence and exclusive rules
- Completeness and minimal-cost optimization are guaranteed
Query Execution
- Introduce new pattern look-ups into the query automata
- Encode event-condition actions

Runtime SQO: Navigation Shortcut (II)
XQuery Q3:
  FOR $a in stream(bids)/auction,
      $b in $a/seller[homepage],
      $c in $a/bidder[sameAddr]
  WHERE $b/*/phone = "508"
  RETURN $b, $c
Utilizing Occurrence Constraints
- Overall schema knowledge: Occurrence(phone, 2)
- When [...] is encountered twice, check /*/phone: if the predicate fails, suspend states s2 and s3
Utilizing Order Constraints
- Overall schema knowledge: Order(primary, homepage)
- When [...] is encountered once, check /homepage: if it is not present, suspend states s10, s3 and s2
[Figure: Actions encoded into the automata]
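The occurrence-based shortcut can be sketched as an early decision over the children of one seller element. This is an illustration only: `decide_seller`, the flat `(name, text)` encoding, and the idea of returning the number of tokens consumed are assumptions, and the sketch does not model automaton states or their suspension.

```python
def decide_seller(children, max_phones, wanted="508"):
    """Decide whether the phone predicate can still succeed for one seller.

    children:   list of (name, text) pairs in arrival order.
    max_phones: upper bound from a constraint like Occurrence(phone, 2).
    Returns (matches, number of children consumed before deciding).
    """
    phones_seen = 0
    for consumed, (name, text) in enumerate(children, start=1):
        if name != "phone":
            continue
        phones_seen += 1
        if text == wanted:
            return True, consumed
        if phones_seen == max_phones:
            # No further phone can appear, so the predicate has failed for good.
            return False, consumed
    return False, len(children)

children = [("phone", "617"), ("phone", "781"), ("homepage", "h"), ("name", "n")]
with_constraint = decide_seller(children, max_phones=2)
without_constraint = decide_seller(children, max_phones=10)
```

With the occurrence bound the failure is known after two children; without a useful bound the whole element has to be scanned.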

R-SOX System Demonstration
- Runtime Schema Refinement
- Runtime SQO
- Algebraic Query Plan Generation
Application Scenarios:
- On-line auction data
- News publishing data

Recent Publications
- S. Wang et al. R-SOX: Runtime Semantic Query Optimization over XML Streams. VLDB
- H. Su et al. Automata Meets Algebra. DKE Journal
- M. Wei et al. Processing Recursive XQuery over XML Streams: the Raindrop Approach. XSDM
- H. Su et al. Semantic Query Optimization in an Automata-Algebra Combined XQuery Engine. VLDB
- H. Su et al. Semantic Query Optimization for XQuery over XML Streams. VLDB
Acknowledgement
- NSF for the support on grants IIS and CNS
Source Code Release
- Raindrop 1.0 is released: Raindrop Project