Serge Abiteboul Omar Benjelloun Bogdan Cautis Ioana Manolescu Tova Milo Nicoleta Preda Lazy Query Evaluation for Active XML.


Outline
- Introduction
- Preliminaries
- Finding relevant calls
- Sequencing relevant calls
- Using types
- Faster relevance detection
- Implementation and experiments
- Conclusion

AXML documents
- AXML stands for ‘Active’ XML: XML documents with embedded calls to web services, a useful paradigm for distributed data management on the Web.
- Documents contain both data and function calls to web services.
- AXML supports many features to control the activation of service calls embedded in documents.

AXML structure [figure: document tree with function nodes]

Tree pattern query [figure]

Approaches to query evaluation
Naïve approach: invoke all calls recursively.
- Not efficient.
Not-so-naïve approach: build a query processor that traverses the document top-down and invokes the calls encountered while evaluating the query.
- Mixing query processing and service invocation would result in bad performance.
- Forcing a particular query processing strategy eliminates opportunities for query optimization.

AXML query evaluation
- Service calls may appear anywhere in the data.
- Service calls may appear dynamically in the results of previously invoked calls.
- The relevance of one call may depend on the result of another.
- The return types of the service calls should be taken into consideration.

Lazy evaluation
What is ‘lazy’ evaluation?
- Avoid invoking service calls that do not bring relevant data.
- A service call specified as lazy is invoked only when its result may participate in the answer to a pending query.
Example query: /goingOut/movies//show[title="The Hours"]/schedule
- Avoided: calls under /goingOut/restaurants, and also services under /goingOut/movies whose signature indicates that the returned data is irrelevant.
The main contribution of this paper is an algorithm for the lazy evaluation of queries on AXML documents.

The lazy evaluation technique
Lazy query evaluation makes use of the following techniques:
- Computing the set of relevant service calls
- Service call sequencing
- Pruning via typing
- Service call guides
- Pushing queries

Preliminaries
Embedding
- An embedding of query ‘q’ into document ‘d’ is a match of the tree pattern query against ‘d’.
- Result of an embedding: bindings of the output variables on the witness tree.
Relevant calls
- Function calls that bring relevant data.
Relevant rewriting
- Expansion of the document after retrieving data from the relevant calls.

Formal definition of embedding [figure]

Example of embedding [figure]

If we evaluate getNearbyRestos() [figure]

Example of embedding (cont’d) [figure]

Relevant calls
Relevant calls are calls that actually contribute to the resulting embedding. For example, getNearbyRestos() is a relevant call and getNearbyMuseums() is not.
In general, a function node is relevant if there exists some rewriting of the document in which some of the nodes it produces belong to a match.

Relevant calls: formal definition [figure]

Relevant rewriting and completeness
- Rewriting the document by invoking relevant function nodes produces relevant rewritings.
- After a sequence of relevant calls, the process may reach a point where no relevant calls remain. The document ‘d’ is then said to be complete for the given query ‘q’.

Relevant rewriting and completeness (cont’d) [figure]

Efficient evaluation
Given an Active XML document d and a query q, find an efficient way to evaluate the query over the document.
- Naïve approach: interleave query evaluation with function calls.
- Better: compute (a superset of) the relevant function calls for q, then execute q over the rewriting of d that results from executing these calls.
Efficiency tradeoff:
- Time to compute the approximation of the set of relevant functions (larger for a more accurate approximation).
- Time to execute the function calls (smaller for a more accurate approximation).
- Time to execute the query over the resulting rewriting of the document (smaller document for a more accurate approximation).

Linear path queries (LPQ)
/*()
/nyHotels/*()
/nyHotels/hotel/*()
/nyHotels/hotel/name/*()
/nyHotels/hotel/rating/*()
/nyHotels/hotel/nearby/*()
/nyHotels/hotel/nearby//*()
/nyHotels/hotel/nearby//restaurant/*()
/nyHotels/hotel/nearby//restaurant/name/*()
/nyHotels/hotel/nearby//restaurant/address/*()
/nyHotels/hotel/nearby//restaurant/rating/*()

LPQ continued Correct, but usually inefficient Ignores filtering conditions in the path from the root or in other branches that could make some of the functions irrelevant (e.g. there is no chance that a getNearbyRestos() function node under a hotel is relevant, if the hotel rating is not “*****”)
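The LPQ list for the nyHotels query can be generated mechanically from the query tree. The sketch below is only an illustration: the `QNode` class, the `QUERY` object and the generation rule (one `/*()` query per element node, plus a `//*()` query wherever a descendant edge leaves the node) are assumptions inferred from the listed LPQs, not the paper's definition. Value and variable leaves such as “*****”, X and Y are left out of the model.

```python
from dataclasses import dataclass, field


@dataclass
class QNode:
    """Hypothetical query-tree node (element nodes only)."""
    label: str
    axis: str = "/"  # edge from the parent: "/" (child) or "//" (descendant)
    children: list = field(default_factory=list)


def linear_path_queries(root):
    """Return the LPQs of a query tree, in preorder."""
    lpqs = ["/*()"]  # a call may sit where the root element is expected

    def visit(node, prefix):
        path = prefix + node.axis + node.label
        lpqs.append(path + "/*()")  # calls that are children of this node
        if any(child.axis == "//" for child in node.children):
            lpqs.append(path + "//*()")  # calls anywhere below this node
        for child in node.children:
            visit(child, path)

    visit(root, "")
    return lpqs


# The example query from the slides, without value/variable leaves.
QUERY = QNode("nyHotels", children=[
    QNode("hotel", children=[
        QNode("name"),
        QNode("rating"),
        QNode("nearby", children=[
            QNode("restaurant", axis="//", children=[
                QNode("name"), QNode("address"), QNode("rating"),
            ]),
        ]),
    ]),
])

if __name__ == "__main__":
    for lpq in linear_path_queries(QUERY):
        print(lpq)
```

Note that none of the generated paths carries a filter such as rating = “*****”, which is exactly why LPQs over-approximate the relevant calls.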

Node Focused Queries (NFQs)
- For each node in the query tree, replace it with an OR node that adds a *() branch matching any function (as with LPQs).
- Then, for every node v in the resulting query tree, create q_v = q - {v and its subtree}, with output node f_v at the position of the *() OR-sibling of v.
- Each such query tree involves the path from the root to the node (as in an LPQ) plus any parts of the tree that would have to be matched anyway for the whole query tree to match.

Example of NFQ
[Figure sequence: the query tree nyHotels / hotel (name, rating = “*****”, nearby) / restaurant (name = X, address = Y, rating = “*****”), extended with * function branches, followed by the NFQs derived from it step by step. The later steps show the hotel name “Best Western”.]

Sequencing relevant calls
Naïve NFQA algorithm:
1. Evaluate all NFQs.
2. Pick one of the returned functions, say f_v.
3. Evaluate the function and rewrite the document (d → d’).
4. Repeat from step 1 until all NFQs return empty results (i.e., there are no more relevant calls).
After every iteration, although the NFQs remain the same, their results can change, since evaluating a function in step 3 can introduce new function nodes or make some results irrelevant.
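The naïve NFQA loop can be sketched as a small driver function. This is a minimal illustration, not the paper's implementation: `evaluate` and `invoke` are hypothetical callbacks standing in for NFQ evaluation and service invocation, and the loop only terminates if the calls eventually stop producing new relevant calls.

```python
def nfqa(doc, nfqs, evaluate, invoke):
    """Naive NFQA loop.

    doc      -- the current document (any representation)
    nfqs     -- the node-focused queries (any representation)
    evaluate -- hypothetical callback: evaluate(nfq, doc) -> list of calls
    invoke   -- hypothetical callback: invoke(call, doc) -> rewritten doc
    Returns the final document and the calls invoked, in order.
    """
    invoked = []
    while True:
        # Step 1: evaluate all NFQs on the current document.
        relevant = []
        for nfq in nfqs:
            relevant.extend(evaluate(nfq, doc))
        # Step 4: stop once no NFQ returns a call.
        if not relevant:
            return doc, invoked
        # Step 2: pick one of the returned calls.
        call = relevant[0]
        # Step 3: invoke it and rewrite the document (d -> d').
        doc = invoke(call, doc)
        invoked.append(call)
```

Every NFQ is reevaluated after every single call here, which is the cost that the layering and independence analysis on the following slides tries to reduce.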

Improving NFQA “Predict” when NFQ results could not have possibly changed and avoid reevaluating them. Identify dependences between NFQs and the effect of executing functions they return

Influence of NFQs
[Figure: two NFQs over the nyHotels query tree, labeled NFQ 1 and NFQ 2.]
NFQ 1 can influence NFQ 2, but not vice versa.

Influence of NFQs
- NFQ 1 may influence NFQ 2 iff the output function node of NFQ 1 is an ancestor (in the query tree) of the output node of NFQ 2.
- Two NFQs belong to the same layer if they may influence each other (directly or transitively).
- Inside every layer, every NFQ has to be reevaluated after every function call.

NFQ layers
- L_1 < L_2 iff some NFQ in L_1 may influence (directly or transitively) some NFQ in L_2.
- L_1 has to be processed before L_2 (and does not need to be processed again afterwards).
- Once processing of L_1 has finished, the OR-nodes corresponding to the returned functions are redundant, so the NFQs in L_2 can be simplified by removing them.
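Given a "may influence" relation, the layers and their processing order can be computed as the groups of mutually reachable NFQs, ordered so that influencing layers come first. The sketch below assumes the relation is handed in as explicit pairs; the function name, the NFQ names in the usage note and the closure-based approach are illustrative choices, not the paper's algorithm.

```python
def nfq_layers(nfqs, influences):
    """Group NFQs into layers and order the layers for processing.

    nfqs       -- list of NFQ names
    influences -- set of (a, b) pairs meaning "a may directly influence b"
    Returns a list of layers (lists of names); a layer that influences
    another one always appears before it.
    """
    # reach[q] = every NFQ that q may influence, directly or transitively.
    reach = {q: {q} for q in nfqs}
    changed = True
    while changed:
        changed = False
        for a, b in influences:
            for q in nfqs:
                if a in reach[q] and not reach[b] <= reach[q]:
                    reach[q] |= reach[b]
                    changed = True

    # A layer is a set of NFQs that can all influence each other.
    # A layer that influences another reaches strictly more NFQs,
    # so sorting by descending reach size gives a valid order.
    layers, seen = [], set()
    for q in sorted(nfqs, key=lambda n: -len(reach[n])):
        if q in seen:
            continue
        layer = [p for p in nfqs if p in reach[q] and q in reach[p]]
        seen.update(layer)
        layers.append(layer)
    return layers
```

For example, with the hypothetical relation hotel → nearby, nearby ↔ restaurant and hotel → rating, the result is three layers: the hotel NFQ first, then nearby and restaurant together, then rating.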

Parallelizing calls
- Let q_lin be the linear path from the root to the output node of NFQ q, not inclusive (note: q_lin is a regular expression).
- Two NFQs q, q’ that belong to the same layer are independent iff the regular languages of q_lin and q’_lin have no words in common.
- E.g., //a and //b are independent, but //a//c and //b//c are not (both match /a/b/c).
- If all NFQs in a layer are independent, all functions returned by the same NFQ in a step of NFQA can be called in parallel.
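The independence test amounts to checking that two path languages have an empty intersection. The sketch below assumes linear paths built only from `/`, `//`, element labels and `*`; it treats each path as a small automaton over label sequences and searches the product of the two for a shared word. The function names and the `#other` placeholder label are hypothetical.

```python
from collections import deque


def parse_path(path):
    """Split '/a//b' into [('/', 'a'), ('//', 'b')] (well-formed input assumed)."""
    steps, i = [], 0
    while i < len(path):
        axis = "//" if path.startswith("//", i) else "/"
        i += len(axis)
        j = i
        while j < len(path) and path[j] != "/":
            j += 1
        steps.append((axis, path[i:j]))
        i = j
    return steps


def advance(steps, state, symbol):
    """States reachable from `state` after reading one element label."""
    out = set()
    if state < len(steps):
        axis, label = steps[state]
        if axis == "//":
            out.add(state)  # the symbol is an intermediate element
        if label in ("*", symbol):
            out.add(state + 1)  # the symbol matches the current step
    return out


def independent(p, q):
    """True iff no label sequence is matched by both linear paths."""
    a, b = parse_path(p), parse_path(q)
    # Labels used by either path, plus one stand-in for any other label.
    alphabet = {label for _, label in a + b if label != "*"} | {"#other"}
    start = (0, 0)
    seen, todo = {start}, deque([start])
    while todo:
        i, j = todo.popleft()
        if i == len(a) and j == len(b):
            return False  # both paths accept the same word
        for symbol in alphabet:
            for ni in advance(a, i, symbol):
                for nj in advance(b, j, symbol):
                    if (ni, nj) not in seen:
                        seen.add((ni, nj))
                        todo.append((ni, nj))
    return True
```

On the examples from the slide, `independent("//a", "//b")` holds, while `independent("//a//c", "//b//c")` does not because both paths accept the word a, b, c.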

Using types
- Use the function return type to “predict” the shape of the data that a function call can return.
- If this shape cannot match the corresponding part of the query pattern, the call can be discarded.
Refined NFQs
- Use the set of function names with an appropriate return type instead of *().
- Use F-guides (later) to refine them even further.

Refined NFQs
[Figure: NFQs over nyHotels / hotel (name, rating = “*****”, nearby) in which * branches are replaced by the function names getRating and getNearbyRestos; the hotel name is “Best Western”.]

Pushing queries
- Add sub-queries to the service calls to get precisely the data needed rather than all the data.
- For instance, rather than invoking getNearbyRestos() and getting all the nearby restaurants, we would also ship the sub-query //restaurant[rating="*****",name=X,address=Y] (with X and Y marked as result nodes) with the call.
- This reduces the amount of useless data transferred (assuming functions correspond to remote web services) by filtering out irrelevant matches and projecting only on output variable nodes.

Function call guides Function call guide: An access structure that provides information about the function calls that are present in the document. Used to speed up processing. We use a tree structure that summarizes the paths that occur in a given document, containing a single occurrence of each path leading to a function call. For each path we keep pointers to the corresponding function call nodes in the document.
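A function call guide of this kind can be sketched as a trie keyed by element labels. The document model below (elements as `(label, children)` tuples, text as strings, calls as `Call` objects) and all function names are assumptions made for the illustration; the slides do not specify the actual data structures.

```python
from dataclasses import dataclass


@dataclass
class Call:
    """Hypothetical function-call node embedded in the document."""
    name: str


def calls_with_paths(node, path=()):
    """Yield (path, call) for every call node below an element."""
    label, children = node
    here = path + (label,)
    for child in children:
        if isinstance(child, Call):
            yield here, child
        elif isinstance(child, tuple):  # nested element; strings are text
            yield from calls_with_paths(child, here)


def build_fguide(doc):
    """Trie with one branch per distinct path that leads to a call."""
    root = {"children": {}, "calls": []}
    for path, call in calls_with_paths(doc):
        node = root
        for label in path:
            node = node["children"].setdefault(
                label, {"children": {}, "calls": []})
        node["calls"].append(call)  # pointer to the call node itself
    return root


def lookup(guide, path):
    """Call nodes whose parent element is reached by `path`."""
    node = guide
    for label in path:
        node = node["children"].get(label)
        if node is None:
            return []
    return node["calls"]
```

Because the guide only contains paths that end in a call, a lookup for a path with no calls (for example the hotel name path in a document where all names are materialized) fails fast without touching the document.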

Filtering techniques Type based filtering: Discarding the functions whose output type does not satisfy the query sub-tree rooted at the end of the linear path by which they were retrieved. NFQ filtering: Filter the function call candidates using the NFQs.

Implementation and experiments
- Used ToXgene to generate AXML documents with function calls.
- Used the algorithms described here to implement a module that decides which services need to be invoked for query evaluation and pushes, when possible, the relevant sub-queries to them.
- Used a customized query processor similar to XQuery.
- Experiments show the effects of NFQ filtering, F-guides, type checking and sub-query pushing, as well as the combination of all of them.

Experimental setting
The document size is fixed at 1.6 MB, and the number of function calls in the document is varied by instructing ToXgene to set the content of, e.g., rating elements f% of the time as a call to getRating and (100-f)% of the time as materialized data.
Queries:
- Linear: hotels//hotel
- Tree pattern: hotels/hotel[rating="***"]/nearby/hotel[rating="***"]/nearby/hotel[rating="***"]

Experiments and results

Effect of pushing subqueries

AXML: an emerging technology
- The paper focuses on optimizing query evaluation on AXML documents using several algorithms.
- There are other works on the use of AXML documents in special environments (e.g., P2P data sharing networks). The techniques presented in this paper deal with a specific problem and are complementary to these works.
- The techniques in this paper can be modified and customized to optimize evaluation based on how AXML documents are used, to obtain better results.
- Several techniques used in this paper are inspired by other works, but the novelty here is in customizing them to the problem at hand.

Thank you!