1 A Unified Model for XQuery Evaluation over XML Data Streams
Jinhui Jian, Hong Su, Elke A. Rundensteiner
Worcester Polytechnic Institute
ER 2003
2 Need for Stream Processing
New environment:
  Data sources are everywhere
  Data requests are everywhere
New applications:
  Sensor networks
  Analysis of XML web logs
  Selective dissemination of XML information (e.g., news)
New features:
  On-line arriving data
  Potentially unstable data
  Real-time response requirement
  Scalability requirement
3 Specific Challenges for XML Streams
Pattern retrieval on nested data + filtering/restructuring:
  FOR $b in doc("bib.xml")//book
  LET $p := $b/price, $t := $b/title
  WHERE $p > 50
  RETURN $t
Sample data: TCP/IP Illustrated, Stevens W., Addison-Wesley, 65.95, …
Token-by-token access manner (timeline): TCP/IP Illustrated …
A token can be an open tag, a close tag, or PCDATA; it is not a direct counterpart of a tuple.
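To make the token-by-token access manner concrete, here is a minimal Python sketch that turns a small XML string into open-tag, close-tag, and PCDATA tokens. The sample markup, the `tokenize` function, and the regular expression are illustrative assumptions, not part of the original slides; attributes, comments, and entities are deliberately ignored.

```python
import re

# Hypothetical sample; element names follow the book/title/price schema used
# on the slides, but the exact markup is an assumption.
SAMPLE = (
    "<bib><book><title>TCP/IP Illustrated</title>"
    "<price>65.95</price></book></bib>"
)

# Either a tag without attributes, or a run of character data.
TOKEN_RE = re.compile(r"<(/?)([A-Za-z_][\w.-]*)>|([^<]+)")


def tokenize(xml_text):
    """Yield (kind, value) pairs; kind is 'open', 'close', or 'pcdata'."""
    for match in TOKEN_RE.finditer(xml_text):
        slash, name, text = match.groups()
        if name is not None:
            yield ("close" if slash else "open", name)
        elif text.strip():
            # Whitespace-only text between tags is dropped.
            yield ("pcdata", text.strip())


tokens = list(tokenize(SAMPLE))
for token in tokens:
    print(token)
```

A single book spans many tokens, so no individual token corresponds to a tuple; something has to assemble tokens into bindings first.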
4 Observations and Questions
Observations:
  Pattern retrieval: the automata model has long been studied for pattern retrieval on tokens
  Filtering/restructuring: the algebraic model has long been studied for optimizing query plans on tuples
Questions:
  How to integrate the two models?
  How to optimize a query within the integrated query model?
5 Uniform Modeling in an Algebraic Framework
6 A Running Example
Give me book titles whose price is greater than 50:
  FOR $b in doc("bib.xml")//book
  WHERE $b/price > 50
  RETURN $b/title
Sample data:
  TCP/IP Illustrated, Stevens W., Addison-Wesley, 65.95
  Languages and Machines, Sudkamp T., Addison-Wesley, 39.95
  …
Input XML stream (timeline): TCP/IP Illustrated Stevens … …
7 Automata Computation: NFAs + Buffers
  FOR $b in doc("bib.xml")//book
  WHERE $b/price > 50
  RETURN $b/title
NFA (figure): state 1 with a * self-loop; 1 -book-> 2; 2 -title-> 4; 2 -price-> 3
Buffer for title: TCP/IP Illustrated
Buffer for price: 65.95
Input (t0 to t7): TCP/IP Illustrated 65.95 …
Active-state changes: +0; +1; +1,2; +1,4; -1,4; +1,3; …
Stack snapshots (bottom to top): [0]; [0][1]; [0][1][1,2]; [0][1][1,2][1,4]; [0][1][1,2]; [0][1][1,2][1,3]; …
No materialization needed
Multiple patterns resolved in one pass
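The stack-based NFA run described on this slide can be sketched in Python as follows. This is an illustrative sketch, not the Raindrop implementation: the transition table mirrors the state numbering on the slide, while the token list, the `run` function, and the choice to start from state 1 (omitting the slide's state 0) are assumptions.

```python
# Assumed NFA for //book/title and //book/price:
# state 1 loops on any tag (*), 1 -book-> 2, 2 -title-> 4, 2 -price-> 3.
TRANSITIONS = {(1, "book"): 2, (2, "title"): 4, (2, "price"): 3}
SELF_LOOP = {1}
BUFFER_OF = {4: "title", 3: "price"}  # accepting state -> buffer name

# Hypothetical token stream for one book (open tag / close tag / PCDATA).
TOKENS = [
    ("open", "bib"), ("open", "book"),
    ("open", "title"), ("pcdata", "TCP/IP Illustrated"), ("close", "title"),
    ("open", "price"), ("pcdata", "65.95"), ("close", "price"),
    ("close", "book"), ("close", "bib"),
]


def run(tokens):
    """Process tokens in one pass; return (buffers, trace of active sets)."""
    stack = [{1}]  # assumption: start directly in state 1
    buffers = {"title": [], "price": []}
    trace = []
    for kind, value in tokens:
        if kind == "open":
            current = stack[-1]
            nxt = {s for s in current if s in SELF_LOOP}
            nxt |= {TRANSITIONS[(s, value)]
                    for s in current if (s, value) in TRANSITIONS}
            stack.append(nxt)          # push the new active set
            trace.append(sorted(nxt))
        elif kind == "close":
            stack.pop()                # backtrack to the parent's active set
        else:
            # PCDATA: copy the text into the buffer of any accepting state.
            for state in stack[-1]:
                if state in BUFFER_OF:
                    buffers[BUFFER_OF[state]].append(value)
    return buffers, trace


buffers, trace = run(TOKENS)
print(buffers)
print(trace)
```

Both patterns are resolved in the same pass because the active set carries states for title and price side by side; the stack only remembers what to restore on a close tag.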
8 Algebraic Computation
  FOR $b in doc("bib.xml")//book
  WHERE $b/price > 50
  RETURN $b/title
Plan: Extract //book -> Navigate //book, price -> Select price > 50 -> Navigate //book, title -> Tagger
Element structure (figure): book (title, author (last, first), publisher, price), with Text leaves
Selection push-down enabled
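As a rough illustration of the tuple-based plan, the sketch below chains Extract, Navigate, Select, and Tagger as Python generators. It is a sketch under stated assumptions: the operator function names, the sample markup, and the use of `xml.etree.ElementTree` (which parses the whole document rather than a stream) are stand-ins chosen for brevity, not the system's actual operators.

```python
import xml.etree.ElementTree as ET

# Hypothetical input; titles and prices are the ones shown on the slides,
# the markup itself is an assumption.
SAMPLE = """<bib>
<book><title>TCP/IP Illustrated</title><price>65.95</price></book>
<book><title>Languages and Machines</title><price>39.95</price></book>
</bib>"""


def extract(root, tag):
    """Extract //tag: emit one tuple per matching element."""
    for elem in root.iter(tag):
        yield {tag: elem}


def navigate(tuples, source, child):
    """Navigate source, child: add the child's text as a new column."""
    for t in tuples:
        yield {**t, child: t[source].findtext(child)}


def select(tuples, pred):
    """Select: keep only tuples that satisfy the predicate."""
    return (t for t in tuples if pred(t))


def tagger(tuples, column):
    """Tagger: rebuild XML tags around one column of each tuple."""
    for t in tuples:
        yield f"<{column}>{t[column]}</{column}>"


root = ET.fromstring(SAMPLE)
books = extract(root, "book")
priced = navigate(books, "book", "price")
expensive = select(priced, lambda t: float(t["price"]) > 50)
titled = navigate(expensive, "book", "title")  # runs after the selection
answer = list(tagger(titled, "title"))
print(answer)
```

Because the selection sits below the second Navigate, titles are looked up only for books that pass the price filter, which is the push-down the slide refers to.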
9 The Raindrop Approach
Uniform: automata computation modeled in an algebraic manner
Tight coupling: automata and regular tuple-based computation interchangeable
10 Path Bindings in XQuery
  FOR $b in doc("bib.xml")//book
  LET $p := $b/price, $t := $b/title
  WHERE $p > 50
  RETURN $t
FLWR expression: FOR…LET…WHERE…RETURN…
FOR and LET: path bindings
WHERE and RETURN: filtering and restructuring
“The purpose of path bindings is to produce a tuple stream in which each tuple consists of one or more bound variables” [W3C]
11 Data Flow
XML data stream -> Automata plan -> Tuple stream -> Regular algebraic plan -> Query answer
12 Modeling the Automata Plan: Black Box [xscan] vs. White Box
Black box: a single Automata Plan operator with Q1 := //book, Q2 := //book/price, Q3 := //book/title
White box: SJoin //book over Extract //book/price and Extract //book/title
13 A Unified Process at the Logical View
Operators, top to bottom:
  Select //book/price > 50
  Navigate //book, //book/title
  SJoin //book
  Extract //book/price, Extract //book/title
14 The Algebra Core
Relational-like operators:
  Selection: filter tuples based on the predicate pred
  Projection: filter columns in the input tuples based on the variable list v
  Join: join input tuples based on the predicate pred
  Aggregate: aggregate over input tuples with the aggregate function f, e.g., sum and average
XML-specific operators:
  Tagger: format outputs based on the pattern pt, i.e., reconstruct XML tags
  Navigate: take input elements of path p1 and output ancestor elements of path p2
  Extract: identify elements of path p from the input stream
  Structural Join: join input tuples on their structural relationship, e.g., the common parent relationship p
15 The Extract Operator
Extract //book/title
NFA (figure): state 1 with a * self-loop; 1 -book-> 2; a title transition out of state 2
Input stream: TCP/IP Illustrated … …
Output titles:
  TCP/IP Illustrated
  Data on the Web
  Advanced Programming in the Unix environment
16 The Structural Join Operator
  FOR $b in doc("bib.xml")//book
  LET $p := $b/price, $t := $b/title
  WHERE $p > 50
  RETURN $t
NFA (figure): state 1 with a * self-loop; 1 -book-> 2; 2 -title-> 3; 2 -price-> 4
Operators: Extract //book/title and Extract //book/price feed SJoin //book
Input stream: … TCP/IP Illustrated … … …
Tight coupling
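One way to picture the structural join on //book is to buffer the extracted titles and prices while a book element is open and to combine them when its close tag arrives. The Python sketch below follows that idea; the trigger-on-close-tag behaviour, the `sjoin_on_book` function, and the token list are assumptions made for illustration, and nested book elements are not handled.

```python
from itertools import product

# Hypothetical token stream for two books (open tag / close tag / PCDATA).
TOKENS = [
    ("open", "bib"),
    ("open", "book"),
    ("open", "title"), ("pcdata", "TCP/IP Illustrated"), ("close", "title"),
    ("open", "price"), ("pcdata", "65.95"), ("close", "price"),
    ("close", "book"),
    ("open", "book"),
    ("open", "title"), ("pcdata", "Languages and Machines"), ("close", "title"),
    ("open", "price"), ("pcdata", "39.95"), ("close", "price"),
    ("close", "book"),
    ("close", "bib"),
]


def sjoin_on_book(tokens):
    """Join buffered //book/title and //book/price values per book element."""
    titles, prices = [], []   # stand-ins for the two Extract buffers
    path = []                 # names of the currently open elements
    for kind, value in tokens:
        if kind == "open":
            path.append(value)
        elif kind == "pcdata":
            if path[-2:] == ["book", "title"]:
                titles.append(value)
            elif path[-2:] == ["book", "price"]:
                prices.append(value)
        else:
            path.pop()
            if value == "book":
                # The common parent is complete: emit joined tuples.
                for title, price in product(titles, prices):
                    yield {"title": title, "price": price}
                titles.clear()
                prices.clear()


tuples = list(sjoin_on_book(TOKENS))
answer = [t["title"] for t in tuples if float(t["price"]) > 50]
print(tuples)
print(answer)
```

The output of the join is an ordinary tuple stream, so the price selection can then run as a regular tuple-based operator.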
17 The Navigate Operator
Navigate //book, title
Input: TCP/IP Illustrated, Stevens W., Addison-Wesley, 65.95 …
18 Optimization
19 In or Out?
XML data stream -> Automata plan -> Tuple stream -> Regular algebraic plan -> Query answer
Pattern retrieval: in the automata plan or in the regular algebraic plan?
20 Pattern Retrieval Alternatives
Input: TCP/IP Illustrated, Stevens W., Addison-Wesley, 65.95 …
In automata (/title, /price): NFA with state 1 (* self-loop); 1 -book-> 2; 2 -title-> 4; 2 -price-> 3
Out of automata (/title, /price): NFA with state 1 (* self-loop); 1 -book-> 2
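21 Plan Alternatives
The pull-out plan:
  Extract //book -> Navigate //book, price -> Select price > 50 -> Navigate //book, title
  NFA (figure): state 1 with a * self-loop; 1 -book-> 2
The push-in plan:
  Extract //book/price and Extract //book/title -> SJoin //book -> Select //book/price > 50 -> Tagger
  NFA (figure): state 1 with a * self-loop; 1 -book-> 2; states 3 and 4 reached from 2 on title and price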
22 Experiment 1
23 Experiment 2
24 Camp 1: Complete Automata Model [XSQ, XSM, XPush]
All details on the same level
Hard to understand
Not suitable for optimizing at different levels
Use of automata as a query processing paradigm is little studied
Example query: for $x in $R/a return for $y in $x/b return $y, $x
[Figure: automaton for the example query; state and transition labels not recoverable]
25 Camp 2: Automata-Algebra Loosely Coupled Model [Tukwila, YFilter]
Fixed interface for automata computation (all pattern retrieval pushed down)
  No opportunity to push computation into the automata or pull it out
Bloated, black-box operator
  Algebraic rewriting impossible for internal optimization
Automata Plan (figure): $b := //book, $p := //book/price, $t := //book/title; output columns $b, $p, $t
26 Contribution
Automata and algebra modeled in one framework, allowing a uniform logical view
Query rewriting provides opportunities to push computation into the automata and to pull it out
Optimization necessity verified by experiments
27 http://davis.wpi.edu/dsrg/raindrop/