Schema-based Scheduling of Event Processors and Buffer Minimization for Queries on Structured Data Streams Bernhard Stegmaier (TU München) Joint work with.


Schema-based Scheduling of Event Processors and Buffer Minimization for Queries on Structured Data Streams
Bernhard Stegmaier (TU München)
Joint work with Christoph Koch (TU Wien), Stefanie Scherzinger (TU Wien), Nicole Schweikardt (HU Berlin)

FluX – Intl. Conf. on Very Large Databases
Outline
- Motivation
- FluX Query Language
- Translating XQuery into FluX
- Further Aspects
- Experiments
- Conclusion

Traditional Approach
Bibliography DTD
List title(s) and authors of books
{for $b in /bib/book return {$b/title} {$b/author} }
Evaluation of book node
1. Print
2. Buffer titles and authors
3. Output titles
4. Output authors
5. Print
Example: … Kemper Datenbanksysteme Eickler 40€ …
Buffer: Kemper Datenbanksysteme Eickler
Output: Datenbanksysteme Kemper Eickler

The FluX Approach
Bibliography DTD
List title(s) and authors of books
{for $b in /bib/book return {$b/title} {$b/author} }
FluX query (for book node)
… {process-stream $b: on title as $t return $t; on-first past (title,author) return {for $a in $b/author return $a}} …
Example: … Kemper Datenbanksysteme Eickler 40€ …
Buffer: Kemper Eickler
Output: Datenbanksysteme Kemper Eickler
→ Less buffering using order constraints

The FluX Approach II
Bibliography DTD
List title(s) and authors of books
{for $b in /bib/book return {$b/title} {$b/author} }
FluX query
… {process-stream $b: on title as $t return $t; on author as $a return $a;} …
Example: … Datenbanksysteme Kemper Eickler 40€ …
Buffer: (empty)
Output: Datenbanksysteme Kemper Eickler
→ No buffering using order constraints!
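
The three evaluation strategies from the motivation slides can be compared with a small simulation. This is only a sketch, not the FluX engine: the function names, the `(name, text)` child representation, and the buffer count as a memory measure are assumptions made for illustration.

```python
from typing import List, Tuple

# One child of a book node: (element name, text content). Hypothetical model.
Child = Tuple[str, str]


def eval_traditional(children: List[Child]) -> Tuple[List[str], int]:
    """Buffer all titles and authors, then output titles followed by authors."""
    buffer = [(n, t) for n, t in children if n in ("title", "author")]
    output = [t for n, t in buffer if n == "title"]
    output += [t for n, t in buffer if n == "author"]
    return output, len(buffer)


def eval_flux(children: List[Child]) -> Tuple[List[str], int]:
    """Stream titles as they arrive; buffer only authors until title and author are past."""
    output: List[str] = []
    buffer: List[str] = []
    for name, text in children:
        if name == "title":
            output.append(text)      # handler: on title as $t return $t
        elif name == "author":
            buffer.append(text)      # needed later by on-first past(title,author)
    peak = len(buffer)
    output.extend(buffer)            # handler: on-first past(title,author)
    return output, peak


def eval_flux_ordered(children: List[Child]) -> Tuple[List[str], int]:
    """Assumes the DTD guarantees titles before authors, so both can be streamed."""
    output = [text for name, text in children if name in ("title", "author")]
    return output, 0


unordered = [("author", "Kemper"), ("title", "Datenbanksysteme"),
             ("author", "Eickler"), ("price", "40€")]
ordered = [("title", "Datenbanksysteme"), ("author", "Kemper"),
           ("author", "Eickler"), ("price", "40€")]

print(eval_traditional(unordered))   # three items buffered
print(eval_flux(unordered))          # two items buffered
print(eval_flux_ordered(ordered))    # nothing buffered
```

All three calls produce the same output sequence; only the number of buffered items differs, matching the Buffer lines on the slides.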

FluX Query Language
Based on XQuery fragment XQuery⁻
- ε (empty)
- s (output fixed string)
- α β (sequence)
- {for $x in $y/π [where χ] return α} (for loop)
- {$x/π} (output path)
- {$x} (output)
- {if χ then α} (conditional)

FluX Query Language
XQuery⁻ expression is simple
→ Can be executed without buffering the stream
Example 1: {$x} {if $x/b = 5 then 5 } simple
Example 2: {$x} not simple

FluX Query Language (ctd.)
FluX expressions
- Simple XQuery⁻ expression
- s {process-stream $y: H } s´
Event handlers H
- on-first past(S) return α
  α: XQuery⁻ expression, S: set of symbols
- on a as $x return Q
  a: symbol name, $x: variable, Q: FluX expression
α executed on buffers
Q executed in event-based fashion

Safe FluX Queries
FluX query is safe: no XQuery⁻ expression refers to elements that may still be encountered in the stream
Bibliography DTD
FluX query
… {process-stream $b: on title as $t return $t; on-first past (title,author) return {for $p in $b/price return $p}} …
Data stream: … Kemper Datenbanksysteme Eickler 39€ …
execute → Not safe!

Safe FluX Queries
FluX query is safe: no XQuery⁻ expression refers to elements that may still be encountered in the stream
Bibliography DTD
FluX query
… {process-stream $b: on title as $t return $t; on-first past (title,author, price) return {for $p in $b/price return $p}} …
Data stream: … Kemper Datenbanksysteme Eickler 39€ …
execute → Safe!
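
The two safety slides differ only in whether price is listed in the past(...) set. A deliberately simplified check can be sketched as follows; it assumes safety reduces to "every child symbol the buffered expression reads is in S" and ignores any additional guarantees a DTD might give. The function names are hypothetical.

```python
from typing import Iterable, Set


def unsafe_symbols(past_symbols: Iterable[str], referenced: Iterable[str]) -> Set[str]:
    """Symbols read by the handler body that may still arrive when the handler fires."""
    return set(referenced) - set(past_symbols)


def is_safe(past_symbols: Iterable[str], referenced: Iterable[str]) -> bool:
    """Simplified safety test: the body reads only symbols listed in past(S)."""
    return not unsafe_symbols(past_symbols, referenced)


# on-first past(title,author) return {for $p in $b/price return $p}
print(is_safe({"title", "author"}, {"price"}))
# on-first past(title,author,price) return {for $p in $b/price return $p}
print(is_safe({"title", "author", "price"}, {"price"}))
```

The first handler reads price before the stream guarantees that all price elements have been seen, which is why the slide marks it as not safe.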

XQuery to FluX
Rewrite XQuery⁻ Q to FluX query F using (non-recursive) DTD
- F is safe w.r.t. DTD
- F is equivalent to Q
- F has low memory consumption
→ Appropriate scheduling of event processors
Steps
1. Normalization of Q
2. Rewriting into FluX

Normalization
Rule-based rewriting of XQuery
- Split paths into single-step for loops
- Eliminate where using if
- Push down if expressions
- Rewrite paths $x/a/… to for loops
XMP, Q1
{for $b in $ROOT/bib/book where χ return {$b/year} {$b/title} }
Normalized:
{for $bib in $ROOT/bib return {for $b in $bib/book return {if χ then } {for $year in $b/year return {if χ then {$year}}} {for $title in $b/title return {if χ then {$title}}} {if χ then }}}
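
The first normalization rule, splitting a multi-step path into nested single-step for loops, can be sketched as plain string rewriting. This is an illustration only: the function name and the scheme of naming intermediate variables after their path step (for example $bib) are assumptions, and the where/if rules are not covered.

```python
def split_path(source_var: str, path: str, target_var: str, body: str) -> str:
    """Rewrite {for target_var in source_var/path return body} into single-step loops."""
    steps = path.strip("/").split("/")
    # Intermediate variables are named after their step (hypothetical naming scheme).
    loop_vars = [f"${step}" for step in steps[:-1]] + [target_var]
    parents = [source_var] + loop_vars[:-1]
    expr = body
    # Build the nesting from the innermost loop outwards.
    for parent, step, var in reversed(list(zip(parents, steps, loop_vars))):
        expr = f"{{for {var} in {parent}/{step} return {expr}}}"
    return expr


print(split_path("$ROOT", "bib/book", "$b", "{$b/title}"))
```

For the path $ROOT/bib/book this yields an outer loop over $ROOT/bib and an inner loop over $bib/book, the same shape as the normalized query on the slide.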

Example
{for $bib in $ROOT/bib return {for $b in $bib/book return {for $t in $b/title return {$t}} {for $a in $b/author return {$a}} }}
function rewrite(Variable parentVar, Set H, XQuery⁻ β): FluX
rewrite($ROOT, {}, Q)
Delay execution of β
Bibliography DTD

Example
{for $bib in $ROOT/bib return {for $b in $bib/book return {for $t in $b/title return {$t}} {for $a in $b/author return {$a}} }}
Marked subexpressions: β1, β2
rewrite($ROOT, {}, β1) → β1 simple, no delay
generate on-first past() return …

Example
{ps $ROOT: on-first past() return {for $bib in $ROOT/bib return {for $b in $bib/book return {for $t in $b/title return {$t}} {for $a in $b/author return {$a}} }}
Marked subexpression: β2
rewrite($ROOT, {}, β2)

Example
{ps $ROOT: on-first past() return {for $bib in $ROOT/bib return {for $b in $bib/book return {for $t in $b/title return {$t}} {for $a in $b/author return {$a}} }}
Marked subexpressions: β21, β22
rewrite($ROOT, {}, β2) → β21, β22
rewrite($ROOT, {}, β21) → no delay
generate on bib as $bib return …

Example
{ps $ROOT: on-first past() return on bib as $bib return {for $b in $bib/book return {for $t in $b/title return {$t}} {for $a in $b/author return {$a}} }}
Marked subexpression: α1
rewrite($bib, {}, α1) → no delay
generate on book as $b return …

Example
{ps $ROOT: on-first past() return on bib as $bib return {ps $bib: on book as $b return {for $t in $b/title return {$t}} {for $a in $b/author return {$a}} }
Marked subexpression: α2
rewrite($b, {}, α2) → as before, no delays
generate on-first past() return …
generate on title as $t return …

Example
{ps $ROOT: on-first past() return on bib as $bib return {ps $bib: on book as $b return {ps $b: on-first past() return ; on title as $t return {$t}; {for $a in $b/author return {$a}} }
Marked subexpressions: α32, α41, α42
Assure all titles before α32
rewrite($b, {title}, α32)
rewrite($b, {title}, α41) → delay execution after title, buffered execution
generate on-first past(title,author) return …

Example
{ps $ROOT: on-first past() return on bib as $bib return {ps $bib: on book as $b return {ps $b: on-first past() return ; on title as $t return {$t}; on-first past(title,author) return {for $a in $b/author return {$a}}; }
Marked subexpression: α42
Assure all titles and authors before α42
rewrite($b, {title,author}, α42) → α42 simple, delay execution after title,author
generate on-first past(title,author) return …

Example
{ps $ROOT: on-first past() return on bib as $bib return {ps $bib: on book as $b return {ps $b: on-first past() return ; on title as $t return {$t}; on-first past(title,author) return {for $a in $b/author return {$a}}; on-first past(title,author) return ;};

Example
{ps $ROOT: on-first past() return on bib as $bib return {ps $bib: on book as $b return {ps $b: on-first past() return ; on title as $t return {$t}; on-first past(title,author) return {for $a in $b/author return {$a}}; on-first past(title,author) return ;} on-first past(bib) return ;}

Example – Order Constraints
{ps $ROOT: on-first past() return on bib as $bib return {ps $bib: on book as $b return {ps $b: on-first past() return ; on title as $t return {$t}; {for $a in $b/author return {$a}} }
Marked subexpressions: α41, α42
Assure all titles before α41
rewrite($b, {title}, α41) → DTD ensures titles before authors
→ generate on author as $a return …

Example
{ps $ROOT: on-first past() return on bib as $bib return {ps $bib: on book as $b return {ps $b: on-first past() return ; on title as $t return {$t}; on author as $a return {$a}; on-first past(title,author) return ;}; on-first past(bib) return ;}
Assure all titles before α41
rewrite($b, {title}, α41)
- H={title}
- DTD ensures titles before authors
→ generate on author as $a return …
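
The choice between the buffered handler (without order constraints) and the streaming handler (with order constraints) in the example can be sketched as a single decision. This is a hedged illustration of that one step, not the rewrite function itself: `schedule_loop`, its parameters, and the encoding of DTD order constraints as `(earlier, later)` pairs are assumptions.

```python
from typing import Set, Tuple


def schedule_loop(symbol: str, var: str, body: str, parent: str,
                  pending: Tuple[str, ...],
                  ordered_before: Set[Tuple[str, str]]) -> str:
    """Pick a handler for {for var in parent/symbol return body}.

    pending        -- symbols (the set H) that must be past before this loop may run
    ordered_before -- pairs (a, b): the DTD ensures all a elements precede all b elements
    """
    if all((h, symbol) in ordered_before for h in pending):
        # Everything in H is guaranteed to arrive earlier: process in event-based fashion.
        return f"on {symbol} as {var} return {body}"
    # Otherwise delay until H and the loop's own symbol are past, and run on buffers.
    past = ",".join([*pending, symbol])
    return f"on-first past({past}) return {{for {var} in {parent}/{symbol} return {body}}}"


# No order constraints: buffered execution.
print(schedule_loop("author", "$a", "{$a}", "$b", ("title",), set()))
# DTD ensures titles before authors: streaming execution.
print(schedule_loop("author", "$a", "{$a}", "$b", ("title",), {("title", "author")}))
```

The two printed handlers correspond to the two variants of the author loop shown in the example slides.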

Further Aspects
→ Visit our demonstration (Group 3: XML)
[Figure: system architecture. Labels: XQuery, DTD, Query Compiler, To Normal Form, Algebraic Optimizations, To FluX, Query Optimizer, Runtime Engine, Streamed Query Evaluator, XSAX, Memory Buffers, XML Input Stream, XML Output Stream]

Experiments
Based on XMark
Queries adapted to XQuery⁻ fragment
Environment
- AMD Athlon XP 2000, 512MB RAM
- Linux, Sun JDK 1.4.2_03
Measurements
- Execution time
- Memory consumption

Experiments

Query  Size   FluX                  Galax                 AnonX
              time [s]   memory     time [s]   memory     time [s]
Q1     5M     2,1        0          13,4       37M        3,4
       10M    2,8        0          29,8       83M        6,7
       50M    7,8        0          -          >500M      38,3
       100M   14,0       0          -          >500M      -
Q8     5M     6,8        1,54M      296,9      50M        143,8
       10M    17,2       3,16M      1498,3     100M       534,8
       50M    357,8      16,00M     -          >500M      -
       100M   11566,9    32,25M     -          >500M      -
Q11    5M     5,6        374k       277,0      50M        n/a
       10M    11,4       741k       1663,7     100M       n/a
       50M    170,8      3,64M      -          >500M      n/a
       100M   626,8      7,27M      -          >500M      n/a
Q13    5M     2,2        0          12,8       38M        3,0
       10M    3,1        0          27,2       73M        5,2
       50M    7,9        0          230,1      344M       88,0
       100M   13,9       0          -          >500M      -
Q20    5M     2,8        4,66k      13,2       36M        2,5
       10M    3,4        5,18k      29,7       80M        6,2
       50M    8,7        7,01k      -          >500M      151,9
       100M   15,4       7,02k      -          >500M      -

Conclusion
FluX
- Event-based extension of XQuery
- Rewriting of XQuery into FluX
- Usage of DTD information
FluX supports buffer-conscious query processing
- Low main memory consumption
- Efficient and scalable query execution on data streams
Future work
- Recursive DTDs
- Extension of XQuery⁻ subset (e.g., //, aggregate operators)
- Improve execution (joins)

Related Work
- Altinel, Franklin. “Efficient Filtering of XML Documents for Selective Dissemination of Information”. VLDB 2000
- Buneman, Grohe, Koch. “Path Queries on Compressed XML”. VLDB 2003
- Chan, Felber, Garofalakis, Rastogi. “Efficient Filtering of XML Documents with XPath Expressions”. ICDE 2002
- Deutsch, Tannen. “Reformulation of XML Queries and Constraints”. ICDT 2003
- Fegaras, Levine, Bose, Chaluvadi. “Query Processing on Streamed XML Data”. CIKM 2002
- Green, Miklau, Onizuka, Suciu. “Processing XML Streams with Deterministic Automata”. ICDT 2003
- Gupta, Suciu. “Stream Processing of XPath Queries with Predicates”. SIGMOD 2003
- Ludäscher, Mukhopadhyay, Papakonstantinou. “A Transducer-Based XML Query Processor”. VLDB 2002
- Marian, Siméon. “Projecting XML Documents”. VLDB 2003
- Olteanu, Kiesling, Bry. “An Evaluation of Regular Path Expressions with Qualifiers against XML Streams”. ICDE 2003

FluX Query Language
Based on XQuery fragment XQuery⁻
- ε (empty)
- s (output fixed string)
- α β (sequence)
- {for $x in $y/π [where χ] return α} (for loop)
- {$x/π} (output path)
- {$x} (output)
- {if χ then α} (conditional)
Difference to XQuery in treating fixed strings
{$ROOT/bib/book}

FluX Query Language (ctd.)
XQuery⁻ expression α β γ is simple if
1. α, γ are (possibly empty) sequences of fixed strings s or {if χ then s}
2. β is empty, {$u}, or {if χ then {$u}}, and $u is not in a condition of α γ
→ Can be executed without buffering on streams
Example: {$x} {if $x/b=5 then 5 } simple; {$x}{$x} not simple

Dependencies
… { for $title in $book/title return { if $book/publisher = “Addison-Wesley” and $book/year > 1991 then {$title} } } …
{ps $book: … on-first past(publisher,year,title) return { for $title in $book/title return { … } } }; … }
dependencies($book, “{for …}”)
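
One way to picture what dependencies($book, …) computes is to collect the child symbols of $book that the expression reads. The sketch below does this textually with a regular expression, which is an assumption made for brevity (a real implementation would presumably work on the parsed query); only the first path step after the variable is considered.

```python
import re
from typing import Set


def dependencies(var: str, expression: str) -> Set[str]:
    """Child symbols of `var` that `expression` refers to (first path step only)."""
    pattern = re.escape(var) + r"/([A-Za-z_][\w.-]*)"
    return set(re.findall(pattern, expression))


query = ('{ for $title in $book/title return '
         '{ if $book/publisher = "Addison-Wesley" and $book/year > 1991 '
         'then {$title} } }')

print(sorted(dependencies("$book", query)))
```

The resulting symbols are the ones listed in the on-first past(publisher,year,title) handler on the slide.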

Further Aspects
The XSAX (eXtended SAX) parser
- Generates on-first events
Execution of FluX queries
- Using XSAX
Projection scheme
- Additional reduction of buffer size
Algebraic pre-optimizations
→ Visit our demonstration (Group 3: XML)
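
How a parser might generate on-first events can be sketched for the simplest case. The sketch assumes the parent's content model is a plain ordered sequence of symbols (for example title*, author*, price), so that a set S is past as soon as a child appears whose position lies behind every symbol in S, or when the parent ends. The function name and event tuples are hypothetical and not the XSAX interface.

```python
from typing import FrozenSet, List, Sequence, Tuple

Event = Tuple[str, object]


def xsax_events(children: Sequence[str], order: Sequence[str],
                watched: Sequence[FrozenSet[str]]) -> List[Event]:
    """Interleave ("on", name) events with ("on-first past", S) events."""
    rank = {symbol: i for i, symbol in enumerate(order)}
    fired = set()
    events: List[Event] = []

    def fire_ready(current_rank: int) -> None:
        # Fire each watched set once, as soon as none of its symbols can still occur.
        for symbols in watched:
            if symbols not in fired and all(rank[s] < current_rank for s in symbols):
                fired.add(symbols)
                events.append(("on-first past", tuple(sorted(symbols))))

    for name in children:
        fire_ready(rank[name])
        events.append(("on", name))
    fire_ready(len(order))  # end of the parent element: everything is past
    return events


for event in xsax_events(["title", "author", "author", "price"],
                         ["title", "author", "price"],
                         [frozenset(), frozenset({"title", "author"})]):
    print(event)
```

With this input the empty set fires before the first child, and past(title,author) fires immediately before the price element, which is the earliest point at which buffered authors could be released.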