Combined Static and Dynamic Analysis for Effective Buffer Minimization in Streaming XQuery Evaluation
Michael Schmidt, Stefanie Scherzinger, Christoph Koch
Saarland University Database Group, Saarbrücken, Germany
2007 IEEE 23rd International Conference on Data Engineering, April 17, 2007
2 Outline
I. Streaming XQuery Evaluation
– Motivation and Requirements
– Desiderata for Streaming and In-Memory XQuery Engines
– Existing Approaches
II. Combining Static and Dynamic Buffer Minimization
– Query Normalization
– The Concept of Roles
– Active Garbage Collection
– System Architecture
– Optimizations
III. The GCX XQuery Engine
– Prototype Implementation
– Benchmark Results
IV. Summary
3 Motivation and Requirements
The growing importance of streaming XML processing comes along with the proliferation of the WWW
Streams may arrive at very high rates
– storing incoming data to disk is often infeasible
A main-memory DOM tree representation of XML documents is very space-consuming
– buffer management becomes the key prerequisite to performance
The problem becomes even more urgent when evaluating (powerful fragments of) XQuery rather than simple filters on data streams
Streaming techniques are very useful for in-memory XQuery engines
4 Desiderata for in-memory XQuery Engines
(1) Only buffer data that is relevant for query evaluation
(2) Avoid multiple copies of the data in main memory
(3) Do not keep data buffered longer than necessary
Claim: A combination of static and dynamic analysis is required to satisfy all desiderata
5 Existing Approaches (1)
(1) Only buffer data that is relevant for query evaluation
Document Projection
– Static query analysis
– Detect parts of the document that are relevant to query evaluation
– Project away those parts of the document that are not relevant to query evaluation
A. Marian and J. Siméon. “Projecting XML Documents”. In Proc. VLDB’03, pages 213–224, 2003.
S. Bréssan, B. Catania, Z. Lacroix, Y. G. Li, and A. Maddalena. “Accelerating Queries by Pruning XML Documents”. TKDE, 54(2):211–240, 2005.
V. Benzaken, G. Castagna, D. Colazzo, and K. Nguyen. “Type-Based XML Projection”. In Proc. VLDB’06, 2006.
6 Existing Approaches (2)
Document Projection
XQuery:
  { for $b in /bib/book
    where ($b/author = "A. Turing" and fn:exists($b/price))
    return $b/title }
Projection Paths (dos := descendant-or-self):
  { /bib/book,
    /bib/book/author/dos::node(),
    /bib/book/price,
    /bib/book/title/dos::node() }
XML document (figure): tree rooted at bib; nodes shown include book (with author, price and title children), article and isbn
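The projection step on this slide can be illustrated with a small Python sketch. It is only an illustration under stated assumptions, not the algorithm of the cited papers: the `Node` class, the `project` function and the encoding of each path as a tuple of tags plus a flag for a trailing dos::node() are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    tag: str
    children: list = field(default_factory=list)

# Projection paths from the slide; the flag marks a trailing dos::node().
PATHS = [
    (("bib", "book"), False),
    (("bib", "book", "author"), True),
    (("bib", "book", "price"), False),
    (("bib", "book", "title"), True),
]

def project(node, prefix=()):
    """Return a pruned copy of node, or None if no projection path needs it."""
    path = prefix + (node.tag,)
    # At or below a dos::node() match: keep the whole subtree.
    if any(dos and path[:len(p)] == p for p, dos in PATHS):
        return Node(node.tag, [project(c, path) for c in node.children])
    # Otherwise keep the node only if it matches a path or leads to one.
    if not any(p[:len(path)] == path for p, _ in PATHS):
        return None
    kept = [k for k in (project(c, path) for c in node.children) if k is not None]
    return Node(node.tag, kept)
```

In this sketch an isbn child of a book or an article child of bib is dropped, while price is kept without its subtree because the query only tests its existence.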
7 Existing Approaches (3)
(2) Avoid multiple copies of the data in main memory
(3) Do not keep data buffered longer than necessary
It is hard to satisfy both desiderata in combination
XQuery:
  { for $x1 in //book return
      for $x2 in //* return
        for $x3 in //article return }
Two approaches:
(1) Single DOM tree
(2) Buffers for variables
8 The Big Picture
Architecture figure. Components: XQuery, Normalized XQuery, Projection Tree, Roles, Rewritten XQuery (role updates), Buffer (nodes annotated with roles), Evaluator, input stream, output stream
Edge labels: transformation, extraction; input, output; communication; variable bindings; role removals, active garbage collection
9 Query Normalization
(1) Rewriting where-expressions to if-statements
(2) Pushing down if-statements
Before:
  { for $b in /bib
    where (fn:exists($b/book))
    return { $b/book } }
After:
  { for $b in /bib return (
      if (fn:exists($b/book)) then else (),
      if (fn:exists($b/book)) then $b/book else (),
      if (fn:exists($b/book)) then else () ) }
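The two normalization steps can be sketched in Python over a toy expression tree. This is a minimal sketch under assumptions: the tuple encoding (`"for"`, `"if"`, `"seq"`) and the names `normalize` and `push_if` are hypothetical and do not come from the slides.

```python
EMPTY = ("seq",)  # the empty sequence ()

def push_if(cond, body):
    """Step (2): distribute `if cond` over every item of a sequence."""
    if isinstance(body, tuple) and body[0] == "seq":
        return ("seq",) + tuple(push_if(cond, item) for item in body[1:])
    return ("if", cond, body, EMPTY)

def normalize(expr):
    """Rewrite where-clauses to if-expressions and push them down."""
    if isinstance(expr, str):                 # leaf, e.g. a path expression
        return expr
    kind = expr[0]
    if kind == "seq":
        return ("seq",) + tuple(normalize(item) for item in expr[1:])
    if kind == "if":
        _, cond, then, other = expr
        return ("if", cond, normalize(then), normalize(other))
    if kind == "for":
        _, var, source, where, body = expr
        body = normalize(body)
        if where is not None:                 # step (1), then step (2)
            body = push_if(where, body)
        return ("for", var, source, None, body)
    raise ValueError(f"unknown expression kind: {kind}")
```

A for-expression is encoded as `("for", var, source, where, body)` with `where` set to `None` when absent; after `normalize`, no for-expression carries a where-clause and each item of the returned sequence is guarded by its own if.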
10 Deriving Roles
XQuery:
  { for $bib in /bib return
      (for $x in $bib/* return
         if (not(fn:exists($x/price))) then $x else (),
       for $b in $bib/book return $b/title) }
Projection tree (figure): nodes /, /bib, /*, /book, /price[1], /title, dos::node()
Role | Path | Variable / expression
r1 | / |
r2 | /bib | $bib
r3 | /bib/* | $x
r4 | /bib/*/price[1] | $x/price
r5 | /bib/*/dos::node() | $x
r6 | /bib/book | $b
r7 | /bib/book/title/dos::node() | $b/title
11 Assigning Roles
Matching document nodes are assigned roles when projected into the buffer
Roles are assigned on-the-fly while reading the input
Nodes without roles and without role-carrying ancestors need not be buffered (projection)
XML document with assigned roles:
  bib {r2}
    book {r3, r5, r6}
      author {r5}
      title {r5, r7}
Roles:
  r1 /
  r2 /bib
  r3 /bib/*
  r4 /bib/*/price[1]
  r5 /bib/*/dos::node()
  r6 /bib/book
  r7 /bib/book/title/dos::node()
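The role assignment shown on this slide can be reproduced with a short Python sketch. It is illustrative only and makes simplifying assumptions: a node is identified by the tuple of tags from the document element down to it, `*` is treated as matching any tag, and the positional test `[1]` is modeled by a caller-supplied flag. The names `ROLES`, `step_matches` and `roles_for` are hypothetical.

```python
ROLES = {
    "r1": (),
    "r2": ("bib",),
    "r3": ("bib", "*"),
    "r4": ("bib", "*", "price[1]"),
    "r5": ("bib", "*", "dos::node()"),
    "r6": ("bib", "book"),
    "r7": ("bib", "book", "title", "dos::node()"),
}

def step_matches(step, tag, first_of_its_tag):
    if step == "*":
        return True
    if step.endswith("[1]"):
        return first_of_its_tag and step[:-3] == tag
    return step == tag

def roles_for(path, first_of_its_tag=True):
    """path: tags from the document element down to the node, e.g. ("bib", "book")."""
    assigned = set()
    for role, steps in ROLES.items():
        if steps and steps[-1] == "dos::node()":
            prefix = steps[:-1]
            # descendant-or-self: the node sits at or below a match of the prefix
            ok = len(path) >= len(prefix) and all(
                step_matches(s, t, True) for s, t in zip(prefix, path))
        else:
            last = len(path) - 1
            ok = len(path) == len(steps) and all(
                step_matches(s, t, i < last or first_of_its_tag)
                for i, (s, t) in enumerate(zip(steps, path)))
        if ok:
            assigned.add(role)
    return assigned
```

For the four nodes of the slide's example document, `roles_for` yields the same role sets as the figure: {r2} for bib, {r3, r5, r6} for book, {r5} for author and {r5, r7} for title.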
12 Inserting Role Updates
Original query:
  { for $bib in /bib return
      (for $x in $bib/* return
         if (not(fn:exists($x/price))) then $x else (),
       for $b in $bib/book return $b/title) }
Rewritten query with role updates:
  { for $bib in /bib return (
      for $x in $bib/* return (
        if (not(exists($x/price))) then $x else (),
        signOff($x,r3),
        signOff($x/price[1],r4),
        signOff($x/dos::node(),r5) ),
      for $b in $bib/book return (
        $b/title,
        signOff($b,r6),
        signOff($b/title/dos::node(),r7) ),
      signOff($bib,r2) ) }
Role | Path | Variable / expression
r1 | / |
r2 | /bib | $bib
r3 | /bib/* | $x
r4 | /bib/*/price[1] | $x/price
r5 | /bib/*/dos::node() | $x
r6 | /bib/book | $b
r7 | /bib/book/title/dos::node() | $b/title
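How the signOff statements follow from the role table can be sketched in a few lines of Python. This is a hypothetical illustration, not the rewriting procedure of the system: the `ROLES` list pairs each role with the loop variable at whose end of scope it is released and the path relative to that variable, and `sign_offs` and `close_loop` are made-up helper names.

```python
# (role, loop variable that releases it, path relative to that variable)
ROLES = [
    ("r2", "$bib", ""),
    ("r3", "$x", ""),
    ("r4", "$x", "/price[1]"),
    ("r5", "$x", "/dos::node()"),
    ("r6", "$b", ""),
    ("r7", "$b", "/title/dos::node()"),
]

def sign_offs(var):
    """signOff statements to append at the end of the loop body of var."""
    return [f"signOff({v}{rel},{role})" for role, v, rel in ROLES if v == var]

def close_loop(var, body):
    """Wrap a loop body and its role updates into one sequence expression."""
    return "(" + ", ".join([body] + sign_offs(var)) + ")"
```

With these definitions, `close_loop("$b", "$b/title")` produces the body of the second inner loop of the rewritten query on this slide.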
13 Active Garbage Collection
  { for $bib in /bib return (
      for $x in $bib/* return (
        if (not(exists($x/price))) then $x else (),
        signOff($x,r3),
        signOff($x/price[1],r4),
        signOff($x/dos::node(),r5) ),
      for $b in $bib/book return (
        $b/title,
        signOff($b,r6),
        signOff($b/title/dos::node(),r7) ),
      signOff($bib,r2) ) }
Figure (animation frames): Input stream, Buffer and Output stream panels; buffered nodes bib {r2}, book {r3, r5, r6}, title {r5, r7}, author {r5}; later frames show the reduced role sets {r5, r6}, {r7}, {} and {r6}
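The effect of signOff on the buffer can be sketched as follows. This is a simplified Python illustration, not the GCX implementation: `BufferNode` and its methods are hypothetical names, roles are kept in a `Counter` on the assumption that a node might hold the same role more than once, and the sketch assumes a node must stay buffered as long as any of its children is still buffered.

```python
from collections import Counter

class BufferNode:
    def __init__(self, tag, roles=(), parent=None):
        self.tag = tag
        self.roles = Counter(roles)   # assumption: a role may be held repeatedly
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def sign_off(self, role):
        """Drop one instance of role, then try to reclaim the node."""
        if self.roles[role] > 0:
            self.roles[role] -= 1
        self._collect()

    def _collect(self):
        # A node is garbage once it holds no role and has no buffered children;
        # removing it may in turn make its parent garbage.
        node = self
        while (node is not None
               and sum(node.roles.values()) == 0
               and not node.children):
            parent = node.parent
            if parent is not None:
                parent.children.remove(node)
            node.parent = None
            node = parent
```

Collection is triggered by the role update itself rather than by a separate tracing pass, which is what makes the scheme "active" in the sense of the slide.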
14 Optimizations
Rewrite path steps to for-expressions
Use aggregated roles
Remove redundant roles
Query:
  { for $bib in /bib return $bib/book }
With signOff statements:
  { for $bib in /bib return
      ($bib/book, signOff($bib,r1), signOff($bib/book/dos::node(),r2)) }
After rewriting path steps to for-expressions:
  { for $bib in /bib return
      for $_1 in $bib/book return $_1/book }
With signOff statements:
  { for $bib in /bib return
      (for $_1 in $bib/book return
         ($_1/book, signOff($_1/book/dos::node(),r2)),
       signOff($bib,r1)) }
15 Garbage Collected XQuery
Implemented in C++ for a fragment of composition-free XQuery
– Arbitrarily nested single-step for-loops
– FWR-expressions
– Child and descendant axes
– Node tests for tags, wildcards, node(), text()
– If-expressions with and, or, not, fn:exists
– Let/some-expressions and aggregations not yet supported
– No support for attributes (no restriction)
Open Source (Berkeley Software Distribution Licence)
GCX project page: http://www.infosys.uni-sb.de/projects/streams/gcx/index.php
GCX download page: http://www.infosys.uni-sb.de/software/gcx/
16 Benchmark Results (1)
Time and memory consumption
Queries and documents from the XMark Benchmark
Queries and documents modified to match the supported fragment
3 GHz Intel Pentium IV CPU with 2 GB RAM
SuSE Linux 10.0, J2RE v1.4.2 for Java-based systems
Time limit: 1 hour
Benchmarks against the following systems
– FluX: Java in-memory engine for streaming XQuery evaluation.
– MonetDB v4.12.0/XQuery v0.12.0: A secondary storage engine written in C++. Loading of the document is included in time measurements.
– QizX/open v1.1: Free in-memory XQuery engine written in Java.
– Saxon v8.7.1: Free in-memory XQuery engine written in Java.
17 Benchmark Results (2)
XMark Q1:
  { for $s in /site return
      for $p in $s/people return
        for $pe in $p/person return
          if ($pe/person_id="person0") then { $pe/name } else () }
Figure: running time (s)
18 Benchmark Results (3)
XMark Q1:
  { for $s in /site return
      for $p in $s/people return
        for $pe in $p/person return
          if ($pe/person_id="person0") then { $pe/name } else () }
Figure: memory consumption (MB)
19 Benchmark Results (4)
XMark Q8:
  { for $root in (/) return
      for $site in $root/site return
        for $people in $site/people return
          for $person in $people/person return
            { ( { $person/name },
                { for $site2 in $root/site return
                    for $cas in $site2/closed_auctions return
                      for $ca in $cas/closed_auction return
                        for $buyer in $ca/buyer return
                          if ($buyer/buyer_person=$person/person_id)
                          then { $ca } else () } ) } }
20 Benchmark Results (5)
XMark Q8
Figures: running time (s) and memory consumption (MB)
Failure for 100MB: MonetDB
Failure for 200MB: GCX, FluxQuery, MonetDB
21 Summary
Combination of static and dynamic buffer minimization
Roles are derived from the XQuery and assigned to matching document nodes in the preprojection phase
The XQuery expression is statically rewritten: at runtime, signOff-statements cause buffered nodes to lose roles
An active garbage collection mechanism removes nodes from buffers that have lost their last role
Document projection is integrated in the role concept
The technique behaves very well for composition-free XQuery w.r.t. execution time and memory consumption
Applicable in streaming contexts, but also useful for common in-memory XQuery engines
22 Thank you for your attention!
23 References
Z. Bar-Yossef, M. Fontoura, and V. Josifovski. “On the Memory Requirements of XPath Evaluation over XML Streams”. In Proc. PODS’04, pages 177–188, 2004.
M. Benedikt, W. Fan, and F. Geerts. “XPath Satisfiability in the Presence of DTDs”. In Proc. PODS, pages 25–36, 2005.
V. Benzaken, G. Castagna, D. Colazzo, and K. Nguyen. “Type-Based XML Projection”. In Proc. VLDB’06, 2006.
S. Bréssan, B. Catania, Z. Lacroix, Y. G. Li, and A. Maddalena. “Accelerating Queries by Pruning XML Documents”. TKDE, 54(2):211–240, 2005.
L. Fegaras, R. Dash, and Y. Wang. “A Fully Pipelined XQuery Processor”. In XIME-P, 2006.
L. Fegaras, D. Levine, S. Bose, and V. Chaluvadi. “Query Processing of Streamed XML Data”. In Proc. CIKM 2002, pages 126–133, 2002.
T. J. Green, G. Miklau, M. Onizuka, and D. Suciu. “Processing XML Streams with Deterministic Automata”. In Proc. ICDT’03, pages 173–189, 2003.
C. Koch. “On the complexity of nonrecursive XQuery and functional query languages on complex values”. ACM Transactions on Database Systems, 31(4), 2006.
C. Koch, S. Scherzinger, N. Schweikardt, and B. Stegmaier. “Schema-based Scheduling of Event Processors and Buffer Minimization for Queries on Structured Data Streams”. In Proc. VLDB’04, pages 228–239, 2004.
X. Li and G. Agrawal. “Efficient evaluation of XQuery over streaming data”. In Proc. VLDB’05, pages 265–276, 2005.
A. Marian and J. Siméon. “Projecting XML Documents”. In Proc. VLDB’03, pages 213–224, 2003.
D. Olteanu, H. Meuss, T. Furche, and F. Bry. “XPath: Looking Forward”. In EDBT 02: Proceedings of the Workshops XMLDM, MDDE, and YRWS on XML-Based Data Management and Multimedia Engineering-Revised Papers, pages 109–127, 2002.
D. Olteanu, T. Kiesling, and F. Bry. “An Evaluation of Regular Path Expressions with Qualifiers against XML Streams”. In Proc. ICDE’03, page 702, 2003.
H. Su, E. A. Rundensteiner, and M. Mani. “Semantic Query Optimization for XQuery over XML Streams”. In Proc. VLDB, pages 277–288, 2005.
P. R. Wilson. “Uniprocessor Garbage Collection Techniques”. In Proc. IWMM’92, pages 1–42, 1992.
24 Additional Resources
25 Full Benchmark Results
Each cell gives running time / memory consumption.
Query | Size | GCX | FluxQuery | Galax | MonetDB | Saxon | Qizx/open
Q1 | 10MB | 0.18s / 1.2MB | 1.59s / 50MB | 5.45s / 186MB | 0.86s / 30MB | 1.48s / 80MB | 1.20s / 38MB
Q1 | 50MB | 0.92s / 1.2MB | 3.96s / 111MB | 42.33s / 880MB | 3.69s / 98MB | 4.29s / 292MB | 3.74s / 195MB
Q1 | 100MB | 1.87s / 1.2MB | 6.94s / 111MB | 02:07m / 1.8GB | 7.19s / 225MB | 7.96s / 547MB | 6.56s / 285MB
Q1 | 200MB | 3.53s / 1.2MB | 12.27s / 111MB | timeout | 13.60s / 244MB | 14.30s / 973MB | 11.82s / 480MB
Q6 | 10MB | 0.34s / 1.2MB | n/a | 7.66s / 240MB | 0.98s / 29MB | 1.73s / 82MB | 1.56s / 33MB
Q6 | 50MB | 1.68s / 1.2MB | n/a | 57.98s / 1.2GB | 5.06s / 111MB | 5.78s / 292MB | 6.13s / 169MB
Q6 | 100MB | 3.33s / 1.2MB | n/a | 5:08m / 2GB | 9.94s / 253MB | 10.85s / 622MB | 11.74s / 484MB
Q6 | 200MB | 6.42s / 1.2MB | n/a | timeout | 19.95s / 337MB | 20.14s / 1.2GB | 20.33s / 805MB
Q8 | 10MB | 13.15s / 9.8MB | 18.04s / 128MB | 01:04m / 377MB | 02:56m / 407MB | 6.61s / 145MB | 9.89s / 148MB
Q8 | 50MB | 05:13m / 43MB | 06:51m / 169MB | 33:08m / 1.8GB | 03:26m / 1.35GB | 02:02m / 352MB | 03:38m / 265MB
Q8 | 100MB | 22:07m / 86MB | 27:01m / 216MB | timeout | - | 08:39m / 650MB | 14:27m / 397MB
Q8 | 200MB | timeout | | | - | 32:43m / 1.15GB | 52:05m / 636MB
Q13 | 10MB | 0.17s / 1.2MB | 1.60s / 52MB | 5.92s / 182MB | 0.80s / 31MB | 1.53s / 48MB | 1.26s / 28MB
Q13 | 50MB | 0.85s / 1.2MB | 3.98s / 111MB | 43.91s / 899MB | 3.64s / 98MB | 4.45s / 292MB | 3.85s / 195MB
Q13 | 100MB | 1.69s / 1.2MB | 7.00s / 111MB | 02:04m / 1.8GB | 7.34s / 224MB | 8.35s / 547MB | 6.81s / 285MB
Q13 | 200MB | 3.24s / 1.2MB | 12.33s / 111MB | timeout | 13.52s / 271MB | 15.02s / 1.05GB | 12.30s / 480MB
Q20 | 10MB | 0.25s / 1.2MB | 1.65s / 48MB | 6.95s / 215MB | 0.85s / 34MB | 1.65s / 62MB | 1.43s / 39MB
Q20 | 50MB | 1.24s / 1.2MB | 4.19s / 111MB | 53.08s / 1.5GB | 4.17s / 120MB | 4.90s / 292MB | 4.18s / 195MB
Q20 | 100MB | 2.48s / 1.2MB | 7.37s / 111MB | 03:14m / 2GB | 8.47s / 247MB | 9.13s / 622MB | 8.71s / 350MB
Q20 | 200MB | 4.74s / 1.2MB | 13.14s / 111MB | timeout | 16.40s / 296MB | 16.58s / 1.15GB | 15.80s / 628MB
26 Benchmark Queries (1)
XMark Q1:
  { for $s in /site return
      for $p in $s/people return
        for $pe in $p/person return
          if ($pe/person_id="person0") then { $pe/name } else () }
XMark Q6:
  { for $site in //site return
      for $regions in $site/regions return
        $regions//item }
27 Benchmark Queries (2)
XMark Q8:
  { for $root in (/) return
      for $site in $root/site return
        for $people in $site/people return
          for $person in $people/person return
            { ( { $person/name },
                { for $site2 in $root/site return
                    for $cas in $site2/closed_auctions return
                      for $ca in $cas/closed_auction return
                        for $buyer in $ca/buyer return
                          if ($buyer/buyer_person=$person/person_id)
                          then { $ca } else () } ) } }
28 Benchmark Queries (3)
  { for $site in /site return
      for $regions in $site/regions return
        for $australia in $regions/australia return
          for $item in $australia/item return
            { ( { $item/name }, { $item/description } ) } }
29 Benchmark Queries (4)
  { for $site in /site return
      for $people in $site/people return
        for $person in $people/person return
          if (fn:not(fn:exists($person/person_income))) then $person else () }
30 Buffer Plot (1)
  { for $site in //site return
      for $regions in $site/regions return
        $regions//item }
Figure: buffer plot for XMark Q6 on a 10MB input document
According to the DTD, all regions occur at the beginning of the document
31 Buffer Plot (2)
  { for $root in (/) return
      for $site in $root/site return
        for $people in $site/people return
          for $person in $people/person return
            { ( { $person/name },
                { for $site2 in $root/site return
                    for $cas in $site2/closed_auctions return
                      for $ca in $cas/closed_auction return
                        for $buyer in $ca/buyer return
                          if ($buyer/buyer_person=$person/person_id)
                          then { $ca } else () } ) } }
Figure: buffer plot for XMark Q8 on a 10MB input document
First partition of join partners: persons
Second partition of join partners: buyers
32 Buffer Plot (3)
XQuery:
  { for $bib in /bib return
      (for $x in $bib/* return
         if (not(exists($x/price))) then $x else (),
       for $b in $bib/book return $b/title) }
Document structure: bib contains (book|article)*, with title, author and price below
Figure: buffer plots for two inputs, labeled "9 x article + 1 x book" and "9 x book + 1 x article"
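33 The GCX Runtime Engine
Architecture figure. Components: Stream Preprojector, Buffer Manager, Buffer, Evaluator; XQuery, input stream, output stream
Requests and replies shown in the figure:
– getNext($x/π) is answered with node/NULL
– signOff($x/π,r) is answered with OK
– nextNode() is answered with node/eos
Other labels: nodes/roles, node lookup, garbage collection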
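The request and reply pattern of this figure can be illustrated with a small Python sketch. It is a loose illustration under assumptions rather than the GCX interfaces: the class `BufferManager`, the function `evaluate`, and the choice that the evaluator calls getNext and signOff while the buffer manager pulls from the stream via nextNode are hypothetical. Nodes are plain tag strings, the input is assumed to be projected already, and roles are omitted, so a signOff drops the node at once.

```python
class BufferManager:
    def __init__(self, stream):
        self._stream = iter(stream)   # stands in for the stream preprojector
        self._buffer = {}             # stream position -> tag of a live node
        self._read = 0
        self.peak = 0                 # largest number of nodes buffered at once

    def _next_node(self):
        """nextNode(): buffer one more node; return its position, None at eos."""
        try:
            tag = next(self._stream)
        except StopIteration:
            return None
        pos = self._read
        self._read += 1
        self._buffer[pos] = tag
        self.peak = max(self.peak, len(self._buffer))
        return pos

    def get_next(self, tag, after=-1):
        """getNext(): position of the next node with this tag, or None (NULL)."""
        for pos, buffered in self._buffer.items():
            if pos > after and buffered == tag:
                return pos
        while (pos := self._next_node()) is not None:
            if self._buffer[pos] == tag:
                return pos
        return None

    def sign_off(self, pos):
        """signOff(): the node is no longer needed and is dropped at once."""
        self._buffer.pop(pos, None)


def evaluate(manager, tag):
    """Evaluator loop: request matches one by one, release each after use."""
    out, pos = [], -1
    while (pos := manager.get_next(tag, pos)) is not None:
        out.append(tag)
        manager.sign_off(pos)
    return out
```

Because input is only pulled when the evaluator asks for the next node and each node is released right after use, the `peak` counter stays at 1 for a stream that consists of matching nodes only.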
34 System Architecture
Architecture figure. Components: XQuery, Normalized XQuery, Rewritten XQuery (role updates), Projection Paths, Roles, Projection DFA (constructed lazily, assigns roles), Stream Preprojector, Buffer (nodes & roles), Evaluator, input stream, output stream
Edge labels: role updates, input
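A lazily constructed projection DFA that assigns roles can be sketched in Python as an on-demand subset construction. The slides only state that the DFA is built lazily and assigns roles; everything else here is an assumption of the sketch, including the class name `LazyDFA`, the restriction to child steps and `*`, and the representation of a DFA state as a set of (role, matched steps) pairs.

```python
class LazyDFA:
    def __init__(self, paths):
        self.paths = paths            # role -> tuple of child steps
        # A DFA state is a frozenset of (role, number of steps matched so far).
        self.start = frozenset((role, 0) for role in paths)
        self.transitions = {}         # (state, tag) -> state, filled on demand

    def step(self, state, tag):
        """Follow the transition for an opening tag, building it if missing."""
        key = (state, tag)
        if key not in self.transitions:
            nxt = set()
            for role, i in state:
                steps = self.paths[role]
                if i < len(steps) and steps[i] in ("*", tag):
                    nxt.add((role, i + 1))
            self.transitions[key] = frozenset(nxt)
        return self.transitions[key]

    def roles(self, state):
        """Roles assigned to a node that is reached in this state."""
        return {role for role, i in state if i == len(self.paths[role])}
```

Only transitions for tags that actually occur in the input are ever materialized, and an empty state signals that no path can match below the current node, so the subtree could be skipped.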