A Unified Model for XQuery Evaluation over XML Data Streams Jinhui Jian Hong Su Elke A. Rundensteiner Worcester Polytechnic Institute ER 2003.



Need for Stream Processing
New environment
- Data sources are everywhere
- Data requests are everywhere
New applications
- Sensor networks
- Analysis of XML web logs
- Selective dissemination of XML information (e.g., news)

Specific Challenges for XML Streams
[Figure: an XML token stream laid out on a timeline; sample book data: Dream Catcher, King S., Bt Bound, 20, …]
- Token-by-token access manner
- Token: not a direct counterpart of a tuple
- Pattern retrieval + filtering/restructuring
Example query:
FOR $b in doc("biditems.xml")//book
LET $p := $b/price/text(), $t := $b/title
WHERE $p < 30
RETURN $t
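To make the token-by-token access manner concrete, here is a minimal sketch of how a small document could be flattened into a token stream. The sample document, the `tokenize` helper and the `(kind, value)` token shape are assumptions made for illustration; they are not taken from the Raindrop implementation, and the tokenizer only handles attribute-less tags and plain text.

```python
import re

# Simplified tokenizer (assumption): attribute-less tags and plain text only.
TOKEN_RE = re.compile(r"<(/?)([A-Za-z_][\w.-]*)\s*>|([^<]+)")


def tokenize(xml_text):
    """Turn an XML string into a flat list of (kind, value) tokens."""
    tokens = []
    for match in TOKEN_RE.finditer(xml_text):
        closing, name, text = match.groups()
        if name is not None:
            # A tag matched: "</x>" is an end token, "<x>" a start token.
            tokens.append(("end" if closing else "start", name))
        elif text.strip():
            # Character data between tags (whitespace-only runs are dropped).
            tokens.append(("text", text.strip()))
    return tokens


# Hypothetical sample document, modelled on the slide's book example.
SAMPLE = (
    "<biditems><book><title>Dream Catcher</title>"
    "<price>20</price></book></biditems>"
)

if __name__ == "__main__":
    for token in tokenize(SAMPLE):
        print(token)
```

Note that no single token corresponds to a tuple such as (title, price): the title and price of one book are spread over several tokens, which is why pattern retrieval has to happen before any tuple-based filtering.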

Two Computation Paradigms
- Automata-based [yfilter02, x-scan01, xsm02, xsq03, xpush03, …]
- Algebraic [niagara00, …]
This project intends to integrate both paradigms into one.

Automata Paradigm
Query:
FOR $b in stream("biditems.xml")//book
LET $p := $b/price/text(), $t := $b/title
WHERE $p < 30
RETURN $t
[Figure: automaton with states 1 to 5 and transitions labelled *, book, title, price, text(); it recognises the patterns //book, //book/title and //book/price/text()]
Auxiliary structures for:
1. Buffering data
2. Evaluating predicates
3. Restructuring buffered data
…

Algebraic Computation
Query:
FOR $b in doc("biditems.xml")//book
LET $p := $b/price/text(), $t := $b/title
WHERE $p < 30
RETURN $t
[Figure: sample XML tree: book with children title, author (last, first), publisher and price, each holding text]
[Figure: two algebraic plans built from Navigate //book, price; Navigate //book, title; Select price < 30; and Tagger. In the second plan the selection is pushed down.]
- Selection push-down enabled

Observations
Automata paradigm
- Good and long studied for pattern retrieval on tokens
- Patches needed for complex filtering and restructuring
Algebraic paradigm
- Good and long studied for expressing and optimizing query plans on sets of tuples
- Tokenized inputs not accommodated yet
Each paradigm has deficiencies; the two paradigms complement each other.

Research Challenges
- How to integrate the two models?
- How to optimize a query within the integrated query model?

Raindrop Approach: Uniform Modeling in an Algebraic Framework

Uniform Algebraic Plan
[Figure: XML data stream -> algebraic plan -> query answer]

Uniform Algebraic Plan
[Figure: XML data stream -> token-based plan (automata plan) -> tuple stream -> tuple-based plan -> query answer]

Modeling the Automata in the Algebraic Plan: Black Box [xscan] vs. White Box
Query:
FOR $b in stream("biditems.xml")//book
LET $p := $b/price/text(), $t := $b/title
WHERE $p < 30
RETURN $t
Black box: a single Xscan operator binds $b := //book, $p := $b/price, $t := $b/title
White box: Extract //book/price and Extract //book/title feed SJoin //book

A Unified Process at the Logical View
Query:
FOR $b in doc("biditems.xml")//book
LET $p := $b/price/text(), $t := $b/title
WHERE $p < 30
RETURN $t
[Figure: token-based plan (automata plan) below a tuple-based plan]

A Unified Process at the Logical View (continued)
[Figure: the token-based plan is filled in with Extract $p, //book/price and Extract $t, //book/title feeding SJoin //book; the tuple-based plan sits above it]

A Unified Process at the Logical View (continued)
[Figure: full plan: Extract //book/price and Extract //book/title feed SJoin //book, followed by Select //book/price > 50 and Navigate //book, //book/title]

The Algebra Core
Relational-like operators
- Selection: filter tuples based on the predicate pred
- Projection: filter columns in the input tuples based on the variable list v
- Join: join input tuples based on the predicate pred
- Aggregate: aggregate over input tuples with the aggregate function f, e.g., sum and average
XML-specific operators
- Tagger: format outputs based on the pattern pt, i.e., reconstruct XML tags
- Navigate: take input elements of path p1 and output ancestor elements of path p2
- Extract: identify elements of path p from the input stream
- Structural Join (SJ): join input tuples on their structural relationship, e.g., the common parent relationship p

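Extract Operator
[Figure: an automaton with transitions *, book and title runs over the token stream and drives Extract //book/title, which outputs the matched title element "Dream Catcher"]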
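The behaviour of Extract can be sketched with a stack of open element names standing in for the automaton. This is an illustration under stated assumptions, not the Raindrop operator: the `extract` function, the token shape and the second book in the sample stream are hypothetical, and nested matches inside an element that is already being buffered are ignored.

```python
# Hypothetical token stream: (kind, value) pairs for two books.
TOKENS = [
    ("start", "biditems"),
    ("start", "book"),
    ("start", "title"), ("text", "Dream Catcher"), ("end", "title"),
    ("start", "price"), ("text", "20"), ("end", "price"),
    ("end", "book"),
    ("start", "book"),
    ("start", "title"), ("text", "Second Book"), ("end", "title"),
    ("start", "price"), ("text", "45"), ("end", "price"),
    ("end", "book"),
    ("end", "biditems"),
]


def extract(tokens, path):
    """Yield the tokens of every element matching //path[0]/.../path[-1].

    `path` is a list of element names, e.g. ["book", "title"].
    """
    stack = []            # names of the currently open elements
    buffer = None         # tokens of the element being collected, if any
    match_depth = None    # stack depth at which the current match started
    for kind, value in tokens:
        if kind == "start":
            stack.append(value)
            # The match starts when the innermost open elements equal `path`.
            if buffer is None and stack[-len(path):] == path:
                buffer, match_depth = [], len(stack)
        if buffer is not None:
            buffer.append((kind, value))
        if kind == "end":
            # The match ends when its own end tag arrives.
            if buffer is not None and len(stack) == match_depth:
                yield buffer
                buffer = None
            stack.pop()


if __name__ == "__main__":
    for element in extract(TOKENS, ["book", "title"]):
        print(element)
```

The stack plays the role of the automaton state here; a real automaton would let several patterns such as //book/title and //book/price share one pass over the stream.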

Structural Join Operator
Query:
FOR $b in doc("biditems.xml")//book
LET $p := $b/price/text(), $t := $b/title
WHERE $p < 30
RETURN $t
[Figure: automaton with states 1 to 4 and transitions *, book, title, price; Extract //book/title and Extract //book/price feed SJoin //book over the stream containing "Dream Catcher"]
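A rough sketch of the structural join on the common-parent relationship: titles and prices are buffered while a book is open and paired when its end tag arrives. The `sjoin_book` name, the token shape and the second book are assumptions for illustration; for brevity the sketch joins text values rather than whole elements and does not handle nested book elements.

```python
# Hypothetical token stream: (kind, value) pairs for two books.
TOKENS = [
    ("start", "biditems"),
    ("start", "book"),
    ("start", "title"), ("text", "Dream Catcher"), ("end", "title"),
    ("start", "price"), ("text", "20"), ("end", "price"),
    ("end", "book"),
    ("start", "book"),
    ("start", "title"), ("text", "Second Book"), ("end", "title"),
    ("start", "price"), ("text", "45"), ("end", "price"),
    ("end", "book"),
    ("end", "biditems"),
]


def sjoin_book(tokens):
    """Return (title, price) tuples for titles and prices sharing a book parent."""
    stack = []                 # names of the currently open elements
    titles, prices = [], []    # values buffered for the book being read
    rows = []
    for kind, value in tokens:
        if kind == "start":
            stack.append(value)
            if value == "book":
                titles, prices = [], []   # a new book starts: reset buffers
        elif kind == "text":
            if stack[-2:] == ["book", "title"]:
                titles.append(value)
            elif stack[-2:] == ["book", "price"]:
                prices.append(value)
        else:  # "end"
            if value == "book":
                # End of the common parent: emit the joined tuples.
                rows.extend((t, p) for t in titles for p in prices)
            stack.pop()
    return rows


if __name__ == "__main__":
    rows = sjoin_book(TOKENS)
    print(rows)
    # The tuple-based Select of the example query, applied to the join output.
    print([title for title, price in rows if float(price) < 30])
```

The join is triggered by the end of the book element, so only the data of one book needs to be buffered at a time in this simplified setting.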

Optimization via Query Rewriting

In or Out?
[Figure: XML data stream -> token-based plan (automata plan) -> tuple stream -> tuple-based plan -> query answer; the question is which of the two plans performs pattern retrieval]

Plan Alternatives
The pull-out plan: Extract //book, Navigate /price, Select price < 30, Navigate book/title
The push-in plan: Extract //book/price, Extract //book/title, SJoin //book, Select price < 30
Tagger
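The two alternatives can be compared on a toy stream. In the sketch below the pull-out plan lets the token side recognise only //book and navigates inside each buffered book on the tuple side, while the push-in plan retrieves title and price on the token side and joins them. All function names, the token shape and the second book are assumptions for illustration, the Tagger step is omitted, and the code says nothing about the relative cost of the two plans.

```python
# Hypothetical token stream: (kind, value) pairs for two books.
TOKENS = [
    ("start", "biditems"),
    ("start", "book"),
    ("start", "title"), ("text", "Dream Catcher"), ("end", "title"),
    ("start", "price"), ("text", "20"), ("end", "price"),
    ("end", "book"),
    ("start", "book"),
    ("start", "title"), ("text", "Second Book"), ("end", "title"),
    ("start", "price"), ("text", "45"), ("end", "price"),
    ("end", "book"),
    ("end", "biditems"),
]


def extract_books(tokens):
    """Token side of the pull-out plan: hand over each whole book element."""
    book, depth = None, 0
    for kind, value in tokens:
        if kind == "start" and value == "book" and book is None:
            book, depth = [], 0
        if book is not None:
            book.append((kind, value))
            if kind == "start":
                depth += 1
            elif kind == "end":
                depth -= 1
                if depth == 0:      # the book's own end tag
                    yield book
                    book = None


def navigate_text(element, child):
    """Tuple side Navigate: text of the direct children named `child`."""
    out, depth, inside = [], 0, False
    for kind, value in element:
        if kind == "start":
            depth += 1
            if depth == 2:          # a direct child of the element root
                inside = value == child
        elif kind == "end":
            if depth == 2:
                inside = False
            depth -= 1
        elif inside and depth == 2:
            out.append(value)
    return out


def pull_out_plan(tokens):
    result = []
    for book in extract_books(tokens):
        for price in navigate_text(book, "price"):
            if float(price) < 30:                      # Select price < 30
                result.extend(navigate_text(book, "title"))
    return result


def push_in_plan(tokens):
    stack, titles, prices, result = [], [], [], []
    for kind, value in tokens:
        if kind == "start":
            stack.append(value)
            if value == "book":
                titles, prices = [], []
        elif kind == "text":
            if stack[-2:] == ["book", "title"]:        # Extract //book/title
                titles.append(value)
            elif stack[-2:] == ["book", "price"]:      # Extract //book/price
                prices.append(value)
        else:
            if value == "book":                        # SJoin //book, then Select
                result.extend(t for t in titles for p in prices if float(p) < 30)
            stack.pop()
    return result


if __name__ == "__main__":
    print(pull_out_plan(TOKENS))
    print(push_in_plan(TOKENS))
```

Both functions return the same titles on this input; the plans differ in how much work is done on tokens versus on buffered tuples, which is the trade-off the rewriting is meant to explore.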

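Pattern Retrieval Alternatives
In automata (/title, /price): [Figure: automaton with states 1 to 4 and transitions *, book, title, price over the stream Dream Catcher, King S., Bt Bound, 20, …]
Out of automata (/title, /price): [Figure: automaton with states 1 and 2 and transitions *, book; items labelled t2 and t10 feed SJ]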

Experiment
[Figure: two plots, one for selectivity = 5% and one for selectivity = 90%]

Related Work

Camp 1: Complete Automata Model [XSQ, XSM, XPush]
Query: FOR $x in $R/a RETURN FOR $y in $x/b RETURN $y, $x
[Figure: the complete automaton for this query, with states labelled 0,0,0 through 2,2,2 and low-level transition rules over read and write pointers]

Camp 1: Complete Automata Model [XSQ, XSM, XPush]
All details are presented on the same level (and a low level!)
- Hard to understand
- Not suitable for optimizing at different levels
Little has been studied on using automata as a query processing paradigm.

Camp 2: Automata-Algebra Loosely Coupled Model [Tukwila, YFilter]
Fixed interface for automata computation (all pattern retrieval pushed down)
- No opportunity for pushing computation into, or pulling it out of, the automata
Bloated, black-box operator
- Algebraic rewriting impossible for internal optimization
[Figure: a single Automata Plan operator binding $b := //book, $p := //book/price, $t := //book/title and outputting $b, $p, $t]

Contributions
Combining automata and algebra leads to a powerful query processing model
- Modeling: uniform, simple logical view, for better understandability
- Optimization: uniform rewriting, for more optimization opportunities (e.g., push-in/pull-out)
Optimization necessity is verified by experiments.

Experiment 2
[Figure: two plots, one for number of patterns = 2 and one for number of patterns = 20]