Published by Samuel Barber. Modified over 8 years ago.
1
R-SOX: Runtime Semantic Query Optimization over XML Streams
Song Wang, Hong Su, Ming Li, Mingzhu Wei, Shoushen Yang, Drew Ditto, Elke A. Rundensteiner and Murali Mani
Database Systems Research Group, Department of Computer Science
Worcester Polytechnic Institute, Worcester, Massachusetts, USA
VLDB 2006, Seoul, Korea
2
Background: XML Stream Applications
Wide-ranging and growing applications
- Examples: news publishing and on-line auction systems
Characteristics
- Real-time processing: short response time
- Limited resources: minimize memory
[Figure panels: News Publishing; On-line Auction]
3
Background: Optimization Using Constraints
Constraint Properties
- Document Type Definition (DTD) or XML Schema
- Constraints are statically available beforehand
General XML Semantic Query Optimization (SQO)
- Tree minimization
- Recursion optimization
Stream-specific XML SQO
- Context-aware shortcutting
- Token-granularity data output
4
R-SOX: Motivation and Goal
Motivation
- Scenarios where a static schema cannot be applied
- Challenges when the schema arrives dynamically:
  - how to represent and manage the runtime schema
  - how to exploit the dynamic schema for runtime optimization
  - how to propagate the runtime schema downstream
Goals
- Runtime schema encoding and synchronization
- Semantic query optimization techniques
- Runtime schema propagation
5
R-SOX: Architecture and Workflow
[Architecture diagram: R-SOX System built on the Extended Raindrop XQuery Engine]
- Components: Schema Inf. Manager, Query Plan Generator, Query Plan Adaptor, Stream Annotator
- Data-flow labels: Input Stream, RSI, Schema knowledge, XQuery, Query Plan, Plan Refinement, Result Schema, Result Stream, Annotated Output Stream
- Functions: Basic XQuery Evaluation, Runtime Schema Refinement, Runtime Semantic Query Optimization, Downstream Schema Propagation
- Legend: Raindrop Engine, R-SOX Contributions, Demo Focus, Future Work
6
Basic XQuery Evaluation
Raindrop XQuery Engine
- Construction of Raindrop plan
- Automaton-based query evaluation
XQuery Q1:
  FOR $o in document("news.xml")/stream/news
  RETURN $o/source, $o/comments
Raindrop XQuery Plan (figure, top to bottom):
- SJoin on $x
- ExtractNest $b; ExtractNest $c
- Nav $x//source -> $b; Nav $x//comments -> $c
- Nav stream//news -> $x
- Stream Data
Query Automata (figure): states s0 to s6; transition labels stream, news, source, comments, content
Input Token Stream (figure): CNN … President …
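Automaton-based evaluation of Q1 can be illustrated with a small stack-driven matcher. This is a minimal sketch under stated assumptions, not the Raindrop implementation: the token format, the `TRANSITIONS` table, and the `evaluate` function are hypothetical, and text is assumed to sit directly under `source` and `comments` rather than under a nested `content` element.

```python
from typing import Iterable, List, Optional, Tuple

Token = Tuple[str, str]  # (kind, value), kind is "start", "end" or "text"

# Hypothetical transition table for /stream/news/{source,comments}.
TRANSITIONS = {
    "s0": {"stream": "s1"},
    "s1": {"news": "s2"},
    "s2": {"source": "s3", "comments": "s5"},
}
# States whose text content the query returns.
ACCEPTING = {"s3": "source", "s5": "comments"}


def evaluate(tokens: Iterable[Token]) -> List[Tuple[str, str]]:
    """Return (pattern, text) pairs for text seen in accepting states."""
    stack: List[Optional[str]] = ["s0"]  # one automaton state per open element
    results: List[Tuple[str, str]] = []
    for kind, value in tokens:
        if kind == "start":
            current = stack[-1]
            # None marks an element that the query does not navigate into.
            nxt = TRANSITIONS.get(current, {}).get(value) if current else None
            stack.append(nxt)
        elif kind == "end":
            stack.pop()
        elif kind == "text" and stack[-1] in ACCEPTING:
            results.append((ACCEPTING[stack[-1]], value))
    return results
```

Pushing one state per open element means an end tag simply pops the stack, so the matcher never needs to re-scan earlier tokens.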
7
Runtime Schema Refinement
Runtime Schema Information (RSI)
- Representing RSI: RSI Grammar
- Encoding RSI:
  - embedded into input XML token stream
  - extracted using DFA stream loader
Managing Schema Information
- Schema Graph: directed ordered graph
- Schema graph synchronization with the newly received RSIs
- History-aware RSI rollback
Example of RSI:
  Initial schema: News((source | comment)+, date+)
  RSI 1: ((news, inf, TIME), (/news/comment, , ), -)
  Schema after RSI 1: News(source+, date+)
  RSI 2: ((/news, 200, COUNT), (/news/comment, /news/source, *), +)
  Schema after RSI 2: News(source*, comment+, date+)
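History-aware rollback can be pictured as a schema graph that snapshots itself before each RSI is applied. The sketch below is an illustration only: the `SchemaGraph` class and its methods are hypothetical, an RSI is reduced to an edge plus a `+` or `-` operator, and the scope, duration, ordering and occurrence parts of the RSI grammar are ignored.

```python
class SchemaGraph:
    """Simplified schema graph: a set of (parent, child) edges with history."""

    def __init__(self, edges):
        self.edges = set(edges)
        self.history = []  # snapshots taken before each applied RSI

    def apply(self, parent, child, op):
        """Apply a simplified RSI: '+' adds an edge, '-' removes it."""
        if op not in ("+", "-"):
            raise ValueError(f"unknown RSI operator: {op!r}")
        self.history.append(frozenset(self.edges))
        if op == "+":
            self.edges.add((parent, child))
        else:
            self.edges.discard((parent, child))

    def rollback(self):
        """Undo the most recently applied RSI, if there is one."""
        if self.history:
            self.edges = set(self.history.pop())
```

A real schema graph is ordered and carries occurrence information, so a fuller version would snapshot those annotations as well.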
8
Runtime SQO: Overview
Runtime Plan Adaptor
- Incremental plan migration
- Rule library
- Rule applier
Query Execution
- Modifying automata computations
- Switching execution modes
- Performing event-condition actions
Supported SQO techniques:
(1) Tree Minimization
(2) Recursion Optimization
(3) Fast Data Output
(4) Navigation Shortcutting
9
Runtime SQO: Tree Minimization
Benefits
- Expedite document traversal on pattern retrieval by avoiding unnecessary navigation
- Change query plan at run-time by adjusting automata
Query Execution
- Temporarily removing and adding automaton states
XQuery Q1:
  FOR $o in document("news.xml")/stream/news
  RETURN $o/source, $o/comments
RSIs:
  P1: ((stream, inf, Count), (/news, source, ), -)
  P2: ((stream, inf, Count), (/news, comments, ), -)
Schema Graph Refinement (figure): nodes stream, news, source, date, comments; edges annotated (1, ∞); edges marked "Cut by P1" and "Cut by P2"
Query Automata Refinement (figure): states s0 to s6 over stream, news, source, comments, content; transitions marked "Disable the transition by P1" and "Disable the transition by P2"
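Temporarily removing and re-adding automaton transitions can be sketched as a set of disabled (state, label) pairs that the step function consults. The `QueryAutomaton` class and the state names used with it are assumptions made for illustration; the slide does not describe the engine's actual data structures.

```python
class QueryAutomaton:
    """Automaton whose transitions can be switched off and back on."""

    def __init__(self, transitions):
        self.transitions = transitions  # state -> {label -> next state}
        self.disabled = set()           # (state, label) pairs switched off

    def disable(self, state, label):
        """Switch a transition off, e.g. when an RSI removes a schema edge."""
        self.disabled.add((state, label))

    def enable(self, state, label):
        """Switch a transition back on when the schema allows it again."""
        self.disabled.discard((state, label))

    def step(self, state, label):
        """Return the next state, or None if there is no active transition."""
        if (state, label) in self.disabled:
            return None
        return self.transitions.get(state, {}).get(label)
```

Keeping the transition table intact and masking entries makes the change cheap to reverse, which fits the idea of removing states only temporarily.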
10
Runtime SQO: Recursion Optimization
Benefits
- Improve performance by avoiding unnecessary overhead of recursion handling
Optimization Processing
- Detect recursion by analyzing the runtime schema knowledge
- Switch between recursion-aware and non-recursive operators
- Characterize safe moments for runtime migration
XQuery Q2 (slightly different from Q1):
  FOR $o in document("news.xml")/stream//news
  RETURN $o/source, $o/comments
RSIs:
  P1: ((news, inf, Count), (/news, news, ), -)
  P2: ((news, inf, Count), (/news, news, ), +)
Operator Switching in the Query Plan (figure, top to bottom):
- RecurSJoin on $x
- RecurExtractNest $b; RecurExtractNest $c
- RecurNav $x//source -> $b; RecurNav $x//comments -> $c
- RecurNav stream//news -> $x
- Stream Data
Recursion-aware operators are switched to their non-recursive counterparts when the input XML data is not recursive.
[Figure: Recursive Operator and Non-recursive Operator, with switches labelled P1 and P2]
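Detecting recursion from schema knowledge amounts to asking whether an element type can reach itself in the schema graph. The following sketch assumes the schema is given as a plain adjacency dictionary; `is_recursive` and `choose_nav_operator` are hypothetical helper names, and the safe-migration timing mentioned on the slide is not modelled.

```python
def is_recursive(schema, element):
    """Return True if `element` can appear below itself in the schema graph."""
    stack = list(schema.get(element, []))
    seen = set()
    while stack:
        node = stack.pop()
        if node == element:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(schema.get(node, []))
    return False


def choose_nav_operator(schema, element):
    """Pick the recursion-aware operator only when the schema requires it."""
    return "RecurNav" if is_recursive(schema, element) else "Nav"
```

With a `news -> news` edge present, as after an RSI like P2, the check selects `RecurNav`; once that edge is removed, as after P1, it selects `Nav`.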
11
Runtime SQO: Fast Data Output
Benefits
- Minimize memory consumption by avoiding unnecessary data storage and releasing buffered data at the earliest moment
Optimization Processing
- Augment query automata with Glushkov automata
- Encode event-condition actions
XQuery Q1:
  FOR $o in document("news.xml")/stream/news
  RETURN $o/source, $o/comments
Case 1: Overall schema knowledge is news((source | comments | date)+)
- No order constraints can be used.
- comments/content must be stored.
Case 2: Overall schema knowledge is News(source+, comments+, date+)
- Global order constraint: Order(source, comments)
- No storage is needed.
Case 3: Overall schema knowledge is News((source | comments)+, date+, comments+)
- Local order constraint: LocalOrder(source, comments)
- Same as Case 1 at the beginning. A Glushkov automaton on the type "news" is used to indicate the completeness of source elements. After that, storage of comments/content is not needed.
Actions Encoded into the Automata (figure): states s1 to s7 over stream, news, source, comments, content
Glushkov Automata for Type "News" (figure): states start, S1, S2, S3, S4; transition labels source, comments, date
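The difference between Case 1 and Case 2 can be shown with a toy model of one `news` element. This is a sketch under assumptions: `process_news` is a hypothetical function, each child is a (name, text) pair, and the query output is taken to be all `source` values followed by all `comments` values, as in Q1.

```python
def process_news(children, source_before_comments):
    """Emit Q1 output for one news element and report peak buffer size.

    children: list of (name, text) pairs in document order.
    source_before_comments: True when Order(source, comments) is known.
    """
    output, buffered, peak = [], [], 0
    for name, text in children:
        if name == "source":
            output.append((name, text))
        elif name == "comments":
            if source_before_comments:
                # No later source can arrive, so output immediately.
                output.append((name, text))
            else:
                # A source may still follow, so hold the comment back.
                buffered.append((name, text))
                peak = max(peak, len(buffered))
    output.extend(buffered)  # flush whatever was held at the end of news
    return output, peak
```

Case 3 would sit between the two branches: buffering would start as in Case 1 and stop once an automaton over the content model signals that no further `source` can occur.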
12
Runtime SQO: Navigation Shortcut (I)
Benefit
- Expedite document-order traversal on pattern retrieval by early filtering of failed patterns
Optimization Rules
- Order, occurrence and exclusive rules
- Completeness and minimal-cost optimization are guaranteed
Query Execution
- Introduce new pattern look-up into query automata
- Encode event-condition actions
13
Runtime SQO: Navigation Shortcut (II)
XQuery Q3:
  FOR $a in stream(bids)/auction,
      $b in $a/seller[homepage],
      $c in $a/bidder[sameAddr]
  WHERE $b/*/phone = "508"
  RETURN $b, $c
Utilizing Occurrence Constraints
- Overall schema knowledge: Occurrence(phone, 2)
- When phone is encountered twice, check /*/phone: if it fails the predicate, suspend states s2 and s3
Utilizing Order Constraints
- Overall schema knowledge: Order(primary, homepage)
- When primary is encountered once, check /homepage: if it is not present, suspend states s10, s3 and s2
[Figure: Actions Encoded into the Automata]
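The occurrence-based shortcut can be sketched as an early exit: once the maximum number of `phone` elements has been seen without a match, the rest of the seller can be skipped. The function `seller_has_phone` and the flat (name, text) child list are assumptions for illustration; in Q3 the phone sits one level deeper (`$b/*/phone`), which this sketch flattens.

```python
def seller_has_phone(children, wanted, max_phones=None):
    """Check the phone predicate and report how many children were read.

    children: list of (name, text) pairs in document order.
    max_phones: occurrence bound from the schema, or None if unknown.
    """
    phones_seen = 0
    for read, (name, text) in enumerate(children, start=1):
        if name != "phone":
            continue
        if text == wanted:
            return True, read
        phones_seen += 1
        if max_phones is not None and phones_seen >= max_phones:
            # The schema rules out further phones: stop navigating.
            return False, read
    return False, len(children)
```

Without the bound, a failing seller is read to the end; with `max_phones=2`, reading stops at the second non-matching phone.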
14
R-SOX System Demonstration
- Runtime Schema Refinement
- Runtime SQO
- Algebraic Query Plan Generation
Application Scenarios:
- On-line auction data
- News publishing data
15
Recent Publications
- S. Wang et al. R-SOX: Runtime Semantic Query Optimization over XML Streams. VLDB 2006.
- H. Su et al. Automata Meets Algebra. DKE Journal 2006.
- M. Wei et al. Processing Recursive XQuery over XML Streams: the Raindrop Approach. XSDM 2006.
- H. Su et al. Semantic Query Optimization in an Automata-Algebra Combined XQuery Engine. VLDB 2004.
- H. Su et al. Semantic Query Optimization for XQuery over XML Streams. VLDB 2005.
Acknowledgement
- NSF, for support under grants IIS 0414567 and CNS 0551584
Source Code Release
- Raindrop 1.0 is released: http://davis.wpi.edu/dsrg/raindrop/release
Raindrop Project
- http://davis.wpi.edu/dsrg/raindrop