1 A Uniform and Layered Algebraic Framework for XQueries on XML Streams
Hong Su, Jinhui Jian, Elke A. Rundensteiner
Worcester Polytechnic Institute
CIKM, Nov 5, 2003
2 Need for Stream Processing
- New computing environment
  - Data sources can be anywhere/anytime: data arrives online
  - Data requests can be anywhere/anytime: real-time responses are required
- New applications
  - Relational: sensor networks
  - XML: analysis of XML web logs; selective dissemination of XML information (e.g., news)
3 What's Special for XML Stream Processing
[Figure: the example book (title "Dream Catcher", author first name "S." and last name "King", publisher "Bt Bound", price 30, year 2001) arrives token by token along a timeline]
- Token-by-token access: a token is not a direct counterpart of a tuple
- Queries combine pattern retrieval, filtering and restructuring:
  FOR $b IN stream("biditems.xml")//book
  LET $p := $b/price, $t := $b/title
  WHERE $p < 30
  RETURN $t
- The core problem: pattern retrieval on token streams
4 Two Computation Paradigms
- Automata-based [yfilter02, xscan01, xsm02, xsq03, xpush03, …]
- Algebraic [niagara00, …]
The Raindrop framework aims to integrate both paradigms into one.
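5 Automata-Based Paradigm
FOR $b IN stream("biditems.xml")//book
LET $p := $b/price, $t := $b/title
WHERE $p < 30
RETURN $t
[Figure: an automaton with states 1-4 recognizing //book, //book/title and //book/price; the start state loops on * and moves on book, and from the book state separate transitions on title and price lead to accepting states]
The automaton needs auxiliary structures for:
1. Buffering data
2. Filtering
3. Restructuring
…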
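The automaton on this slide can be sketched as a small NFA driven by start and end tokens. A stack of active state sets lets an end token restore the states that were active before the matching start token. This is a minimal sketch under stated assumptions: the token tuples, state names and the `step`/`run_automaton` helpers are hypothetical, text tokens are ignored, and the auxiliary structures for buffering, filtering and restructuring are left out.

```python
# Hypothetical token format: ("start", tag) or ("end", tag); text tokens omitted.
Token = tuple[str, str]

# NFA for //book, //book/title and //book/price; state names are illustrative.
# "root" loops on any tag (the * edge), which makes //book a descendant step.
TRANSITIONS: dict[tuple[str, str], set[str]] = {
    ("root", "*"): {"root"},
    ("root", "book"): {"book"},
    ("book", "title"): {"title"},
    ("book", "price"): {"price"},
}
ACCEPTING = {"book": "//book", "title": "//book/title", "price": "//book/price"}


def step(states: set[str], tag: str) -> set[str]:
    """Follow exact-tag and wildcard edges out of every active state."""
    nxt: set[str] = set()
    for s in states:
        nxt |= TRANSITIONS.get((s, tag), set())
        nxt |= TRANSITIONS.get((s, "*"), set())
    return nxt


def run_automaton(tokens: list[Token]) -> list[str]:
    """Report which pattern each start tag matches, in document order."""
    stack = [{"root"}]  # active state set before each open element
    matches: list[str] = []
    for kind, tag in tokens:
        if kind == "start":
            states = step(stack[-1], tag)
            stack.append(states)
            matches.extend(ACCEPTING[s] for s in sorted(states) if s in ACCEPTING)
        elif kind == "end":
            stack.pop()  # backtrack to the states active before this element
    return matches


example = [("start", "bids"), ("start", "book"), ("start", "title"), ("end", "title"),
           ("start", "price"), ("end", "price"), ("end", "book"), ("end", "bids")]
print(run_automaton(example))  # ['//book', '//book/title', '//book/price']
```

The stack is what lets a single pass over tokens handle nesting: each open element keeps the state set it was entered with, so no element needs to be materialized just to recognize the patterns.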
6 Algebraic Computation
FOR $b IN stream("biditems.xml")//book
LET $p := $b/price, $t := $b/title
WHERE $p < 30
RETURN $t
[Figure: the document tree of a book element with children title, author (last, first), publisher and price, and the tuples binding $b and $t]
- Logical plan (bottom-up): Navigate $b, /title -> $t; Navigate $b, /price -> $p; Select $p < 30; Tagger
- Rewriting by "pushing down selection" gives the rewritten logical plan: Navigate $b, /price -> $p; Select $p < 30; Navigate $b, /title -> $t; Tagger
- Choosing low-level implementation alternatives gives the physical plan: Navigate-Index $b, /price -> $p; Select $p < 30; Navigate-Scan $b, /title -> $t; Tagger
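Why pushing the selection down is safe here can be checked by writing both logical plans as pipelines of generator operators over rows. The sketch is illustrative only: `navigate`, `select` and `tagger` are hypothetical stand-ins for Navigate, Select and Tagger, it runs over a stored document parsed with Python's `xml.etree.ElementTree` rather than a token stream, the price is compared as a number, and the second book is made-up sample data.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterable, Iterator

Row = dict  # one binding per variable, e.g. {"$b": <book>, "$p": 12.0}


def navigate(rows: Iterable[Row], src: str, path: str, dst: str,
             convert: Callable = lambda node: node) -> Iterator[Row]:
    """Navigate src, path -> dst: bind dst to the first node under src matching path."""
    for row in rows:
        node = row[src].find(path)
        if node is not None:
            yield {**row, dst: convert(node)}


def select(rows: Iterable[Row], pred: Callable[[Row], bool]) -> Iterator[Row]:
    """Select: keep only the rows satisfying the predicate."""
    return (row for row in rows if pred(row))


def tagger(rows: Iterable[Row], tag: str, var: str) -> Iterator[ET.Element]:
    """Tagger: wrap the node bound to var in a new element named tag."""
    for row in rows:
        wrapper = ET.Element(tag)
        wrapper.append(row[var])
        yield wrapper


def to_price(node: ET.Element) -> float:
    return float(node.text)


# Hypothetical sample data; only "Dream Catcher" with price 30 comes from the slides.
DOC = ET.fromstring(
    "<bids><book><title>Dream Catcher</title><price>30</price></book>"
    "<book><title>Cheap Read</title><price>12</price></book></bids>"
)
books = [{"$b": b} for b in DOC.iter("book")]


def original_plan(rows):
    rows = navigate(rows, "$b", "title", "$t")
    rows = navigate(rows, "$b", "price", "$p", to_price)
    rows = select(rows, lambda row: row["$p"] < 30)
    return tagger(rows, "Inexpensive", "$t")


def pushed_down_plan(rows):
    rows = navigate(rows, "$b", "price", "$p", to_price)
    rows = select(rows, lambda row: row["$p"] < 30)  # filter before touching titles
    rows = navigate(rows, "$b", "title", "$t")      # runs only for surviving books
    return tagger(rows, "Inexpensive", "$t")


for plan in (original_plan, pushed_down_plan):
    print([ET.tostring(e, encoding="unicode") for e in plan(books)])
```

Both plans return the same result; the pushed-down plan simply navigates to `title` only for books that survive the filter, which is the point of the rewrite.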
7 Observations
- Either paradigm has deficiencies
- The two paradigms complement each other

Automata paradigm vs. algebra paradigm:
- Pattern retrieval on tokens: automata are good at it; algebra does not support token inputs
- Filtering and restructuring: automata need patches; algebra is good at it
- Levels of description: automata present all details at the same low level; algebra supports multiple descriptive levels (declarative -> procedural)
- Maturity: automata are little studied as a query processing paradigm; algebra is well studied as one
8 How to Integrate the Two Paradigms
9 How to Integrate the Two Models?
Design choices:
- Extend the algebraic paradigm to support automata?
- Extend the automata paradigm to support algebra?
- Come up with a completely new paradigm?
Chosen: extend the algebraic paradigm to support automata, because it is
- Practical: existing algebraic query processing engines can be reused and extended
- Natural: details of the automata computation are presented at a low level, while the semantics of the automata computation (the target patterns) are presented at a high level
10 Raindrop: Four-Level Framework
From high abstraction level (declarative) to low abstraction level (procedural):
1. Semantics-focused Plan
2. Stream Logical Plan
3. Stream Physical Plan
4. Stream Execution Plan
11 Level I: Semantics-focused Plan [Rainbow-ZPR02]
- Expresses query semantics regardless of whether the input sources are stored or streamed
- Reuses existing techniques for stored XML processing:
  - Query parser
  - Initial plan constructor
  - Rewriting optimization: decorrelation, selection push down, …
12 Example Semantics-focused Plan
FOR $b IN stream("biditems.xml")//book
LET $p := $b/price, $t := $b/title
WHERE $p < 30
RETURN $t
Plan (bottom-up):
NavUnnest $S1, //book -> $b
NavNest $b, /price/text() -> $p
NavNest $b, /title -> $t
Select $p < 30
Tagger "Inexpensive", $t -> $r
[Figure: the tuples grow by one column per operator: ($S1), then ($S1, $b), then ($S1, $b, $p = 30), then ($S1, $b, $p = 30, $t = <title>Dream Catcher</title>)]
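The two navigation operators differ in the shape of their output: NavUnnest emits one tuple per match of its path, while NavNest adds a single column that holds the whole collection of matches. A minimal sketch over a stored document, assuming tuples are Python dicts and relying on the limited XPath subset of `xml.etree.ElementTree` (`.//book` stands in for `//book`, and a `text` flag approximates `/text()`); the function names mirror the operators but are hypothetical, and the second book is made up.

```python
import xml.etree.ElementTree as ET


def nav_unnest(rows, src, path, dst):
    """NavUnnest src, path -> dst: one output row per node matched by path."""
    for row in rows:
        for node in row[src].findall(path):
            yield {**row, dst: node}


def nav_nest(rows, src, path, dst, text=False):
    """NavNest src, path -> dst: one output row whose dst holds all matches.

    text=True approximates a trailing /text() step by keeping node.text only.
    """
    for row in rows:
        nodes = row[src].findall(path)
        yield {**row, dst: [n.text for n in nodes] if text else nodes}


def select_any(rows, var, pred):
    """Select over a nested column: keep the row if any member satisfies pred."""
    return (row for row in rows if any(pred(v) for v in row[var]))


# Hypothetical input; the first book mirrors the slide, the second is made up.
doc = ET.fromstring(
    "<bids><book><title>Dream Catcher</title><price>30</price></book>"
    "<book><title>Cheap Read</title><price>12</price></book></bids>"
)
rows = [{"$S1": doc}]
rows = nav_unnest(rows, "$S1", ".//book", "$b")        # //book
rows = nav_nest(rows, "$b", "price", "$p", text=True)  # /price/text()
rows = nav_nest(rows, "$b", "title", "$t")             # /title
rows = list(rows)
for row in rows:
    print(list(row), row["$p"], [t.text for t in row["$t"]])

hits = list(select_any(rows, "$p", lambda v: float(v) < 30))  # Select $p < 30
```

Because `$p` is a nested collection, the selection is written with existential semantics, matching how an XQuery general comparison such as `$p < 30` treats a sequence.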
13 Level II: Stream Logical Plan
Extends the semantics-focused plan to accommodate tokenized stream inputs:
- New input data format: contextualized tokens
- New operators: StreamSource, Nav, ExtractUnnest, ExtractNest, StructuralJoin
- New rewrite rules: Push-into-Automata
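The slides do not define the token format, so the following sketch assumes one: each contextualized token carries its kind, its tag name or text, and, as its context, the id and depth of the element it belongs to. `CToken`, `Tokenizer` and `tokenize` are hypothetical names; the parsing itself uses the standard `xml.sax` module.

```python
import xml.sax
from dataclasses import dataclass


@dataclass(frozen=True)
class CToken:
    kind: str     # "start", "text" or "end"
    value: str    # tag name or character data
    elem_id: int  # assumed context: id of the element the token belongs to
    depth: int    # assumed context: nesting depth of that element


class Tokenizer(xml.sax.ContentHandler):
    """Turn SAX callbacks into a flat list of contextualized tokens."""

    def __init__(self) -> None:
        super().__init__()
        self.tokens: list[CToken] = []
        self.open_ids: list[int] = []  # ids of currently open elements
        self.next_id = 0
        self.text: list[str] = []      # SAX may split one text node into chunks

    def _flush_text(self) -> None:
        data = "".join(self.text)
        self.text.clear()
        if data.strip():  # drop whitespace-only text
            self.tokens.append(CToken("text", data, self.open_ids[-1], len(self.open_ids)))

    def startElement(self, name, attrs):
        self._flush_text()
        self.next_id += 1
        self.open_ids.append(self.next_id)
        self.tokens.append(CToken("start", name, self.next_id, len(self.open_ids)))

    def characters(self, content):
        self.text.append(content)

    def endElement(self, name):
        self._flush_text()
        self.tokens.append(CToken("end", name, self.open_ids[-1], len(self.open_ids)))
        self.open_ids.pop()


def tokenize(xml_bytes: bytes) -> list[CToken]:
    handler = Tokenizer()
    xml.sax.parseString(xml_bytes, handler)
    return handler.tokens


for tok in tokenize(b"<book><title>Dream Catcher</title><price>30</price></book>"):
    print(tok)
```

Tagging each token with its element id is what lets later operators tell which `<book>` a text token such as `30` belongs to, something a bare token does not reveal.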
14 One Uniform Algebraic View
The algebraic stream logical plan: XML data stream -> token-based plan (automata plan) -> tuple stream -> tuple-based plan -> query answer
15 Modeling the Automata in the Algebraic Plan: Black Box [XScan01] vs. White Box
FOR $b IN stream("biditems.xml")//book
LET $p := $b/price, $t := $b/title
WHERE $p < 30
RETURN $t
- Black box: a single XScan operator retrieves the patterns $b := //book, $p := $b/price and $t := $b/title
- White box: the automaton is exposed as operators (bottom-up): Navigate $S1, //book -> $b; Navigate $b, /price -> $p; Navigate $b, /title -> $t; ExtractNest $b, $p; ExtractNest $b, $t; StructuralJoin $b
16 Example Uniform Algebraic Plan
FOR $b IN stream("biditems.xml")//book
LET $p := $b/price, $t := $b/title
WHERE $p < 30
RETURN $t
[Figure: the plan consists of a token-based plan (automata plan) at the bottom feeding a tuple-based plan on top]
17 Example Uniform Algebraic Plan
FOR $b IN stream("biditems.xml")//book
LET $p := $b/price, $t := $b/title
WHERE $p < 30
RETURN $t
Token-based plan (automata plan), bottom-up: Navigate $S1, //book -> $b; Navigate $b, /price -> $p; Navigate $b, /title -> $t; ExtractNest $b, $p; ExtractNest $b, $t; StructuralJoin $b
Tuple-based plan: placed on top of StructuralJoin $b
18 Example Uniform Algebraic Plan
FOR $b IN stream("biditems.xml")//book
LET $p := $b/price, $t := $b/title
WHERE $p < 30
RETURN $t
Token-based plan (automata plan), bottom-up: Navigate $S1, //book -> $b; Navigate $b, /price -> $p; Navigate $b, /title -> $t; ExtractNest $b, $p; ExtractNest $b, $t; StructuralJoin $b
Tuple-based plan, on top: Select $p < 30; Tagger "Inexpensive", $t -> $r
19 From Semantics-focused Plan to Stream Logical Plan
Semantics-focused plan (bottom-up): NavUnnest $S1, //book -> $b; NavNest $b, /price/text() -> $p; NavNest $b, /title -> $t; Select $p < 30; Tagger "Inexpensive", $t -> $r
Applying "push into automata" yields the stream logical plan (bottom-up): Nav $S1, //book -> $b; Nav $b, /price/text() -> $p; Nav $b, /title -> $t; ExtractNest $b, $p; ExtractNest $b, $t; StructuralJoin $b; Select $p < 30; Tagger "Inexpensive", $t -> $r
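The rewrite on this slide can be illustrated with a toy rewriter over a linear, bottom-up plan. It is a hypothetical sketch, not the framework's implementation: `Op` and `push_into_automata` are invented names, and the rewriter handles only the NavUnnest-then-NavNest shape shown here, since the rule's general preconditions are not given in the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str    # operator name, e.g. "NavUnnest"
    args: tuple  # operator arguments, e.g. ("$S1", "//book", "$b")


def push_into_automata(plan: list[Op]) -> list[Op]:
    """Toy version of the rewrite for a linear, bottom-up plan.

    A leading NavUnnest (the anchor, e.g. $b) and the NavNest operators that
    navigate from that anchor become automaton Nav operators; each nested
    variable gets an ExtractNest, and a StructuralJoin on the anchor rebuilds
    the tuples. Everything after the pattern-retrieval prefix is kept as-is.
    """
    navs: list[Op] = []
    extracts: list[Op] = []
    rest: list[Op] = []
    anchor = None
    for i, op in enumerate(plan):
        if op.name == "NavUnnest" and anchor is None:
            anchor = op.args[2]
            navs.append(Op("Nav", op.args))
        elif op.name == "NavNest" and anchor is not None and op.args[0] == anchor:
            navs.append(Op("Nav", op.args))
            extracts.append(Op("ExtractNest", (anchor, op.args[2])))
        else:
            rest = plan[i:]
            break
    if anchor is None:
        return plan  # nothing to push into the automata
    return navs + extracts + [Op("StructuralJoin", (anchor,))] + rest


plan = [
    Op("NavUnnest", ("$S1", "//book", "$b")),
    Op("NavNest", ("$b", "/price/text()", "$p")),
    Op("NavNest", ("$b", "/title", "$t")),
    Op("Select", ("$p < 30",)),
    Op("Tagger", ('"Inexpensive"', "$t", "$r")),
]
for op in push_into_automata(plan):
    print(op.name, *op.args)
```

Keeping operators as plain data makes such a rule a pure function from plan to plan, which is also how the "In or Out" alternatives later in the deck could be enumerated and compared.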
20 Level III: Stream Physical Plan
- For each stream logical operator, defines how outputs are generated from given inputs
- Multiple physical implementations may be provided for a single logical operator
- Automata details of some physical implementations are exposed at this level: Nav, ExtractNest, ExtractUnnest, StructuralJoin
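21 One Implementation of Extract/Structural Join
[Figure: an automaton with states 1-4 (transitions on *, book, title and price) implements Nav ., //book -> $b; Nav $b, /title -> $t; and Nav $b, /price -> $p; these feed ExtractNest $b, $t and ExtractNest $b, $p, which are combined by SJoin //book; tokens such as "… Dream Catcher …" flow through the operators]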
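One way to picture this implementation is to let the automaton's accepting states drive buffering directly. In the sketch below, tracking element depth stands in for the automaton of the figure (an assumption made to keep the example short), matches of `$b/price` and `$b/title` are buffered token by token as ExtractNest would, and the end of each `<book>` triggers the structural join. The token format and `extract_and_join` are hypothetical, and nested `<book>` elements are not handled.

```python
# Hypothetical token format: ("start", tag), ("text", data) or ("end", tag).
BRANCHES = {"price": "$p", "title": "$t"}  # $p := $b/price, $t := $b/title


def extract_and_join(tokens):
    """Yield one joined tuple per <book>: {"$b": n, "$p": [...], "$t": [...]}.

    Each match of $p or $t is the buffered list of its tokens.
    """
    depth = 0       # number of currently open elements
    book = None     # (book number, depth, {variable: [matches]}) for the open <book>
    capture = None  # (variable, depth, buffered tokens) for the match in progress
    book_no = 0
    for kind, value in tokens:
        if kind == "start":
            depth += 1
            if value == "book" and book is None:
                # //book state reached: open a new group for $b
                book_no += 1
                book = (book_no, depth, {var: [] for var in BRANCHES.values()})
            elif (book is not None and capture is None
                  and depth == book[1] + 1 and value in BRANCHES):
                # //book/price or //book/title state reached: start extracting
                capture = (BRANCHES[value], depth, [])
        if capture is not None:
            capture[2].append((kind, value))  # buffer every token of the match
        if kind == "end":
            if capture is not None and depth == capture[1]:
                book[2][capture[0]].append(capture[2])  # ExtractNest: add the match
                capture = None
            elif book is not None and depth == book[1]:
                yield {"$b": book[0], **book[2]}        # StructuralJoin on $b
                book = None
            depth -= 1


tokens = [("start", "bids"), ("start", "book"),
          ("start", "title"), ("text", "Dream Catcher"), ("end", "title"),
          ("start", "price"), ("text", "30"), ("end", "price"),
          ("end", "book"), ("end", "bids")]
for row in extract_and_join(tokens):
    print(row)
```

The join needs no value comparison: since both collections were gathered while the same `<book>` was open, the end tag of that book is enough to emit the combined tuple.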
22 Level IV: Stream Execution Plan
Describes the coordination between operators, i.e., when an operator fetches its inputs:
- When the input operator generates one output tuple
- When the input operator generates a batch
- When a time period has elapsed
- …
The potentially unstable data arrival rate of a stream makes a fixed scheduling strategy unsuitable:
- Delayed data under scheduling may stall the engine
- Bursty data not under scheduling may cause overflow
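The fetch policies listed above can be compared with a small simulation. This is a hypothetical sketch: `schedule` groups timestamped tuples into the batches handed to the next operator, where `batch_size=1` corresponds to fetching after every output tuple, a larger `batch_size` to fetching per batch, and `max_wait` to a time-period trigger. The clock is only checked when a tuple arrives, a simplification a real engine would replace with a timer.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def schedule(arrivals: Iterable[Tuple[float, str]], batch_size: int = 1,
             max_wait: Optional[float] = None) -> Iterator[List[str]]:
    """Group (arrival_time, tuple) pairs into the batches handed to the next operator.

    A batch is released when it holds batch_size tuples, or, when max_wait is
    set, once the oldest waiting tuple has waited at least max_wait time units
    (checked only when the next tuple arrives).
    """
    pending: List[str] = []
    oldest = 0.0
    for t, item in arrivals:
        if pending and max_wait is not None and t - oldest >= max_wait:
            yield pending  # time trigger: release tuples that waited too long
            pending = []
        if not pending:
            oldest = t
        pending.append(item)
        if len(pending) >= batch_size:
            yield pending  # count trigger: one tuple or a full batch
            pending = []
    if pending:
        yield pending      # end of stream: flush the remainder


arrivals = [(0.0, "a"), (0.1, "b"), (0.2, "c"), (5.0, "d")]
print(list(schedule(arrivals)))                               # per tuple
print(list(schedule(arrivals, batch_size=3)))                 # per batch
print(list(schedule(arrivals, batch_size=2, max_wait=0.15)))  # batch or timeout
```

With a burst followed by a long gap, as in `arrivals`, a large fixed batch holds the burst until the next tuple shows up, while per-tuple fetching hands over every item immediately; this is the tension between stalling and overflow that the slide describes.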
23 Raindrop: Four-Level Framework (Recap)
- Semantics-focused Plan: expresses the semantics of the query regardless of input sources
- Stream Logical Plan: accommodates tokenized input streams
- Stream Physical Plan: describes how operators manipulate the given data
- Stream Execution Plan: decides the coordination among operators
24 Optimization Opportunities
25 Optimization Opportunities
- Semantics-focused Plan: general rewriting (e.g., selection push down)
- Stream Logical Plan: break-linear-navigation rewriting
- Stream Physical Plan: choosing physical implementations
- Stream Execution Plan: choosing execution strategies
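26 From Semantics-focused to Stream Logical Plan: In or Out?
[Figure: XML data stream -> token-based plan (automata plan) -> tuple stream -> tuple-based plan -> query answer; applying "push into automata" decides which pattern retrievals of the semantics-focused plan move into the token-based plan (In) and which stay in the tuple-based plan (Out)]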
27 Plan Alternatives
Semantics-focused plan (bottom-up): NavUnnest $S1, //book -> $b; NavNest $b, /price -> $p; NavNest $b, /title -> $t; Select $p < 30; Tagger "Inexpensive", $t -> $r
- "In" (all navigation pushed into the automata): Nav $S1, //book -> $b; Nav $b, /price -> $p; Nav $b, /title -> $t; ExtractNest $b, $p; ExtractNest $b, $t; SJoin //book; Select price < 30; Tagger
- "Out" (only //book pushed into the automata): Nav $S1, //book -> $b; ExtractNest $S1, $b; Navigate /price; Select price < 30; Navigate book/title; Tagger
28 Experimentation Results
29 Contributions
- Combined the automata-based and algebra-based paradigms into one uniform algebraic paradigm
- Provided four layers in the algebraic paradigm:
  - Query semantics are expressed at the high layer
  - Automata computation on streams is hidden at the low layer
- Supported optimization in an iterative manner (from high to low abstraction level)
- Illustrated the enriched optimization opportunities by experiments
30 Email: suhong@cs.wpi.edu