RAINDROP: XML Stream Processing Engine
Murali Mani, DB seminar, June 08, 2006
Partially supported by NSF grant IIS 0414567.


June 08, 2006, DSRG, WPI

Acknowledgements
- NSF for the financial support
- Joint work with several others:
  - Prof. Elke A. Rundensteiner
  - Graduate students: Hong Su, Ming Li, Mingzhu Wei, Shoushen Wang, Jinhui Jian
  - Undergraduate students: Drew Ditto, Bogomil Tselkov, …

Applications
- Need for efficient stream data processing
- Monitor patient data in real time
- Sensor networks: fire detection, battlefield deployment, traffic congestion
- Others: news delivery, network traffic monitoring, …

XML Stream Processing
- Token-by-token access manner
- Pattern retrieval + filtering + restructuring

[Figure: token stream along a timeline, carrying the text "No", "Calendar of French Impressionism by Monet", "$20", …]

    for $a in open_auctions/auction[privacy]
    let $e := $a/description/emph
    where $e = "French Impressionism"
    return $a, $e

Option 1: Automata-Based Pattern Retrieval

    for $a in open_auctions/auction[privacy]
    let $e := $a/description/emph
    where $e = "French Impressionism"
    return $a, $e

[Figure: automaton with states 0 to 5 and transitions labeled auctions, auction, privacy, description, emph]

- When patterns are retrieved depends on the data
- Additional data structures for buffering, filtering, restructuring, …
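As an illustration of automata-based pattern retrieval, here is a minimal Python sketch of a stack-based automaton driven by stream tokens. The token encoding, the `TRANSITIONS` table, the state numbering, and the function name are assumptions made for this example (the element names follow the query rather than the figure); this is not Raindrop's actual implementation.

```python
# Hypothetical token encoding: ("start", tag), ("text", value), ("end", tag).
# Assumed transition table for the paths used by the query.
TRANSITIONS = {
    (0, "open_auctions"): 1,
    (1, "auction"): 2,
    (2, "privacy"): 3,
    (2, "description"): 4,
    (4, "emph"): 5,
}
DEAD = -1  # state for elements that match no pattern


def run_automaton(tokens):
    """Yield ("enter", state) / ("leave", state) as tags move the automaton."""
    stack = [0]  # one state per open element, plus the initial state
    for kind, value in tokens:
        if kind == "start":
            state = TRANSITIONS.get((stack[-1], value), DEAD)
            stack.append(state)
            if state != DEAD:
                yield ("enter", state)
        elif kind == "end":
            state = stack.pop()  # return to the parent element's state
            if state != DEAD:
                yield ("leave", state)
        # text tokens do not change the state
```

Which states are entered, and when, is driven entirely by the order in which tokens arrive, which is the sense in which pattern retrieval "depends on the data". Buffering, filtering, and restructuring would have to be layered on top of these events.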

Option 2: "DOM"-Based Pattern Retrieval

    for $a in open_auctions/auction[privacy]
    let $e := $a/description/emph
    where $e = "French Impressionism"
    return $a, $e

Logic Plan (operators):
- Navigate $a, /description/emph -> $e
- Navigate $a, /privacy -> $p
- Tagger
- Select $e = "French Impressionism"

Rewritten Logic Plan (rewrite by "pushing down selection"):
- Navigate $a, /privacy -> $p
- Navigate $a, /description/emph -> $e
- Select $e = "French Impressionism"
- Tagger

Physical Plan (choose low-level implementation alternatives):
- Navigate-Index $a, /description/emph -> $e
- Select $e = "French Impressionism"
- Tagger
- Navigate-Scan $a, /privacy -> $p

When patterns are retrieved depends on other patterns.

Which paradigm is better?
- Minimal pushdown plans win over maximal pushdown plans when selectivity < 50%

Problem
- How can we provide a framework for choosing between these paradigms?
- Model both paradigms uniformly as algebraic operators.
- Use a cost model to choose the optimal plan given data statistics.

Automaton as TokenNav

    for $a in open_auctions/auction[privacy]
    let $e := $a/description/emph
    where $e = "French Impressionism"
    return $a, $e

[Figure: automaton with states 0 to 5 and transitions labeled auctions, auction, privacy, description, emph]

Plan operators:
- StructuralJoin $a
- Extract $a
- TokenNav $s, /auctions/auction -> $a
- Select non-empty($b)
- Select $e = "French …"
- Extract $b
- Extract $e
- TokenNav $a, /privacy -> $b
- TokenNav $a, /desc/emph -> $e

DOM Navigation as NodeNav

    for $a in open_auctions/auction[privacy]
    let $e := $a/description/emph
    where $e = "French Impressionism"
    return $a, $e

[Figure: automaton with states 0 to 2 and transitions labeled auctions, auction]

Plan operators:
- Extract $a
- TokenNav $s, /auctions/auction -> $a
- Select non-empty($b)
- Select $e = "French …"
- NodeNav $a, /privacy -> $b
- NodeNav $a, /desc/emph -> $e

Exploring the Search Space
- A pattern can be retrieved inside the automaton or outside the automaton
- However, there are dependencies. For the bindings
      for $a in …/a, $b in $a/…, $c in $b/…
  - NodeNav for $b => NodeNav for $c
  - TokenNav for $b => TokenNav or NodeNav for $c
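The dependency rule can be made concrete with a small Python sketch that enumerates the allowed operator choices for a chain of dependent variables. The function name is hypothetical, and applying the rule to every adjacent pair in the chain (not only to $b and $c) is an assumption of this example.

```python
from itertools import product


def valid_assignments(chain):
    """Enumerate TokenNav/NodeNav choices for a chain of dependent variables."""
    plans = []
    for choice in product(("TokenNav", "NodeNav"), repeat=len(chain)):
        # Assumed rule: a NodeNav parent forces its child pattern to be
        # NodeNav too; a TokenNav parent leaves both options open.
        allowed = all(
            not (parent == "NodeNav" and child == "TokenNav")
            for parent, child in zip(choice, choice[1:])
        )
        if allowed:
            plans.append(dict(zip(chain, choice)))
    return plans


plans = valid_assignments(["$a", "$b", "$c"])
```

Under this assumption a chain of three variables leaves four of the eight combinations, so the dependencies prune the search space the optimizer has to explore.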

Run-time Optimization
- Statistics are unknown before data arrives
- Statistics could change over time
- We need techniques for efficient statistics monitoring, search space exploration, and plan migration (safe points for migration)

Run-time Optimization
1. Create an initial plan
2. Run the initial plan and collect statistics at the same time
3. Generate a new plan using the statistics collected
4. Pause receiving the stream
5. Migrate to the new plan
6. Resume receiving the stream

[Figure: components Stream, Query plan executor, Query Optimizer, Plan Migrator; flows labeled statistics, Initial Query plan, New Query plan]

Executing a Raindrop Plan

Key Ideas
- Minimum memory requirements
- Discard data early
- Output data early

In-Time Structural Join

    for $a in open_auctions/auction[privacy]
    let $e := $a/description/emph
    where $e = "French Impressionism"
    return $a, $e

[Figure: automaton with states 0 to 5 and transitions labeled auctions, auction, privacy, description, emph]

Plan operators:
- StructuralJoin $a
- Extract $a
- TokenNav $s, /auctions/auction -> $a
- Select non-empty($b)
- Select $e = "French …"
- Extract $b
- Extract $e
- TokenNav $a, /privacy -> $b
- TokenNav $a, /desc/emph -> $e
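One way to read "in-time" is that the join fires as soon as the end tag of the binding element ($a) arrives, after which its buffers can be released. The Python sketch below illustrates that reading under stated assumptions: the token encoding, the path constants, and the function name are hypothetical, the element names follow the query rather than the figure, and the single loop stands in for the separate TokenNav, Extract, Select, and StructuralJoin operators of the plan.

```python
# Hypothetical token encoding: ("start", tag), ("text", value), ("end", tag).
AUCTION = ["open_auctions", "auction"]
PRIVACY = AUCTION + ["privacy"]
EMPH = AUCTION + ["description", "emph"]


def in_time_structural_join(tokens, wanted="French Impressionism"):
    """Yield ($a tokens, $e values) as soon as a qualifying auction closes."""
    path = []            # names of the currently open elements
    buffered = None      # tokens of the current auction, None outside one
    has_privacy = False  # stands in for Select non-empty($b)
    emph_values = []     # text of each description/emph match ($e)
    for kind, value in tokens:
        if kind == "start":
            path.append(value)
            if path == AUCTION:
                buffered, has_privacy, emph_values = [], False, []
            elif path == PRIVACY:
                has_privacy = True
            elif path == EMPH:
                emph_values.append("")
        elif kind == "text" and path == EMPH:
            emph_values[-1] += value
        if buffered is not None:
            buffered.append((kind, value))
        if kind == "end":
            if path == AUCTION:
                # Join at the end tag of $a, then release the buffers.
                if has_privacy and wanted in emph_values:
                    yield buffered, emph_values
                buffered = None
            path.pop()
```

Because the function is a generator, each result is available to the consumer before the next auction is read, and at most one auction is buffered at a time.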

Better than In-Time Structural Join

    for $r in /root
    return $r/a, $r/b

[Figure: automaton for root with child transitions a and b]

Plan operators:
- StructuralJoin $r
- Extract $a
- TokenNav $s, /root -> $r
- Extract $b
- TokenNav $r, /a -> $a
- TokenNav $r, /b -> $b

"a" tokens need not be stored.

Evaluating Predicates

    for $r in /root
    where $r/a = "value"
    return $r/b

[Figure: automaton for root with child transitions a and b]

Plan operators:
- StructuralJoin $r
- Select $a = "value"
- Extract $a
- TokenNav $s, /root -> $r
- Extract $b
- TokenNav $r, /a -> $a
- TokenNav $r, /b -> $b

Once $a = "value" is satisfied, "b" tokens need not be stored.
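The effect of the predicate on buffering can be sketched in Python as follows. This is a simplified illustration, not the engine's code: the children of one root element are modeled as hypothetical (tag, text) pairs in document order, and the function name and the peak-buffer counter are introduced only for this example.

```python
def eval_root(children):
    """Process one root element; return (output b values, peak buffer size)."""
    satisfied = False  # becomes True once some a equals "value"
    pending = []       # b values seen before the predicate is known to hold
    output = []
    peak = 0
    for tag, text in children:
        if tag == "a" and text == "value" and not satisfied:
            satisfied = True
            output.extend(pending)  # flush what was buffered so far
            pending.clear()
        elif tag == "b":
            if satisfied:
                output.append(text)  # output early, nothing is stored
            else:
                pending.append(text)
                peak = max(peak, len(pending))
    # If the predicate never held, the pending b values are simply dropped.
    return output, peak
```

For the children b1, a = "value", b2, b3 the sketch buffers only b1; b2 and b3 pass straight to the output.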

Using Schema Knowledge

    for $r in /root
    return $r/a, $r/b

Schema: root -> (a*, b*)

[Figure: automaton for root with child transitions a and b]

Plan operators:
- StructuralJoin $r
- Extract $a
- TokenNav $s, /root -> $r
- Extract $b
- TokenNav $r, /a -> $a
- TokenNav $r, /b -> $b

"a" and "b" tokens need not be stored.

Using Schema Knowledge for Predicates

    for $r in /root
    where $r/a = "value"
    return $r/b

Schema: root -> (b*, a*, c)

[Figure: automaton for root with child transitions a and b]

Plan operators:
- StructuralJoin $r
- Select $a = "value"
- Extract $a
- TokenNav $s, /root -> $r
- Extract $b
- TokenNav $r, /a -> $a
- TokenNav $r, /b -> $b

Once "c" is seen and $a = "value" is not yet satisfied, "b" tokens can be discarded.
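A minimal Python sketch of this schema-based discard, under the same simplified model of one root element as hypothetical (tag, text) pairs in document order. The function name, the `end_marker` parameter, and the returned discard position are assumptions for illustration only.

```python
def eval_root_with_schema(children, end_marker="c"):
    """Return (output b values, index at which pending b values were dropped).

    Assumes the schema root -> (b*, a*, c): once the end marker is seen,
    no further "a" element can arrive for this root.
    """
    satisfied = False
    decided = False      # True once the predicate outcome cannot change
    pending, output = [], []
    discarded_at = None  # position of the child that triggered the discard
    for index, (tag, text) in enumerate(children):
        if tag == "a" and text == "value":
            if not satisfied:
                satisfied = decided = True
                output.extend(pending)
                pending.clear()
        elif tag == end_marker and not satisfied:
            decided = True
            discarded_at = index
            pending.clear()  # the predicate can no longer become true
        elif tag == "b":
            if satisfied:
                output.append(text)
            elif not decided:
                pending.append(text)
    return output, discarded_at
```

The buffered "b" values are released at the "c" element rather than at the end tag of root, which is the saving that the schema makes possible.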

Conclusions
- Raindrop integrates automaton and "DOM" navigation into one algebraic framework.
- Cost-based optimization is possible.
- Execution minimizes memory requirements.

Ongoing Work
- Load shedding in XML stream processing.
- Utilizing dynamic schema changes for optimization.

Fragment of XQuery Supported
- FLWR expressions (no conditionals or user-defined functions)
- Path expressions use only forward axes (child, descendant, descendant-or-self, attribute)
- Predicates supported are of the form: pathExpr relOp constant

Issues with Correlated Queries

    for $r in /root
    return
        for $a in $r/a
        return $r/b

Visit the website of RAINDROP, our XQuery engine over XML streams project.