UNIVERSITY of PENNSYLVANIA, Grigoris Karvounarakis, October 04: Lazy Query Evaluation for Active XML (Abiteboul, Benjelloun, Cautis, Manolescu, Milo, Preda)

1 Lazy Query Evaluation for Active XML
Abiteboul, Benjelloun, Cautis, Manolescu, Milo, Preda (INRIA Futurs)
Presented by: Grigoris Karvounarakis, Univ. of Pennsylvania, CIS 650, October 14, 2004

2 Active XML [figure; label: function nodes]

3 Tree Pattern Queries [figure; labels: result nodes, descendant edge]

4 Tree Pattern Queries
Similar to the pattern trees of the TAX/TLC algebra, plus:
- variable nodes, used to bind variables to subtrees (variable nodes with the same name must be mapped to elements with the same tag name)
- result nodes
Embedding (of a query q into a document d) = match
Result of an embedding = bindings of the output variables on the witness tree
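A minimal Python sketch of computing embeddings, under simplifying assumptions that are not from the slides: documents and patterns are small hypothetical `Node`/`Pat` classes, pattern nodes only test labels (variable nodes are not modeled), and an optional `out` name marks a result node and records the document node it is mapped to. Function nodes are never matched directly, so data hidden behind a call cannot contribute to a match until the call is evaluated.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Node:
    """Document node; is_call=True marks an Active XML function (service call) node."""
    label: str
    children: list = field(default_factory=list)
    is_call: bool = False


@dataclass(eq=False)
class Pat:
    """Tree-pattern node; desc=True means the edge from its parent is // (descendant)."""
    label: str
    children: list = field(default_factory=list)
    desc: bool = False
    out: Optional[str] = None  # name of the result node bound here, if any


def descendants(n):
    for c in n.children:
        yield c
        yield from descendants(c)


def embeddings(p, n):
    """Return one binding dict per embedding of pattern p that maps p's root to n."""
    if n.is_call or n.label != p.label:
        return []  # a call must be invoked before its results can be matched
    results = [{p.out: n} if p.out else {}]
    for c in p.children:
        candidates = list(descendants(n)) if c.desc else n.children
        # combine every partial embedding with every embedding of the child pattern
        results = [{**b, **b2} for b in results
                   for m in candidates for b2 in embeddings(c, m)]
        if not results:
            break
    return results


query = Pat("nyHotels", [Pat("hotel", [
    Pat("name", [Pat("Best Western")]),
    Pat("nearby", [Pat("restaurant", [Pat("name", out="X")], desc=True)]),
])])
doc = Node("nyHotels", [Node("hotel", [
    Node("name", [Node("Best Western")]),
    Node("nearby", [Node("getNearbyRestos", is_call=True)]),
])])
print(embeddings(query, doc))  # [] while the restaurants are still behind the call
```

Returning a list of binding dictionaries keeps every witness; a boolean "is there a match" test is just `bool(embeddings(...))`.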

5 No embedding …

6 No embedding … but if we evaluate function node 1 …

7 Embedding Example

8 Embedding Example

9 Embedding Example [figure; labels: X, Y]

10 Relevant rewriting
getNearbyRestos (function node 1 in the figure) is a relevant function node.
- In general, a function node is relevant if there exists some rewriting of the document in which some of the nodes it produces belong to a match.
- Rewriting the document by invoking relevant function nodes produces relevant rewritings: $d_1 \xrightarrow{v_1} d_2 \xrightarrow{v_2} \cdots \xrightarrow{v_{n-1}} d_n$
- A document that contains no calls relevant to a query q is said to be complete for q.
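One rewriting step $d \xrightarrow{v} d'$ can be sketched in Python as below. This assumes, as a simplification, that the forest returned by a call replaces the call node; the `services` dictionary is a hypothetical local stand-in for remote Web services, and a call sitting at the document root is not handled.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    label: str
    children: list = field(default_factory=list)
    is_call: bool = False  # True for an Active XML function node


def invoke(doc, call, services):
    """One rewriting step d -v-> d': replace `call` (somewhere below doc) by the
    forest its service returns. Returns False if `call` does not occur below doc."""
    for i, c in enumerate(doc.children):
        if c is call:
            doc.children[i:i + 1] = services[call.label]()
            return True
        if invoke(c, call, services):
            return True
    return False


# Hypothetical service: returns one restaurant element.
services = {"getNearbyRestos": lambda: [Node("restaurant", [Node("name", [Node("Resto A")])])]}
call = Node("getNearbyRestos", is_call=True)
doc = Node("nyHotels", [Node("hotel", [Node("nearby", [call])])])
invoke(doc, call, services)
nearby = doc.children[0].children[0]
print([c.label for c in nearby.children])  # ['restaurant']
```

Repeating this step only on relevant calls yields the sequence $d_1, d_2, \ldots, d_n$ from the slide.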


12 Problem definition
Given an Active XML document d and a query q, find an efficient way to evaluate q over d.
- Naïve approach: interleave query evaluation with function calls.
- Better: compute (a superset of) the function calls relevant to q, execute them, and then evaluate q over the resulting rewriting of d.
Efficiency tradeoff:
- time to compute an approximation of the set of relevant function calls (larger for a more accurate approximation)
- time to execute the function calls (smaller for a more accurate approximation) and time to evaluate the query over the resulting rewriting of the document (smaller document for a more accurate approximation)

13 Outline
- Definitions
- Finding relevant calls
- Sequencing relevant calls
- Improving accuracy
- Reducing detection time
- Conclusions and discussion

14 Linear Path Queries
/*()
/nyHotels/*()
/nyHotels/hotel/*()
/nyHotels/hotel/name/*()
/nyHotels/hotel/rating/*()
/nyHotels/hotel/nearby/*()
/nyHotels/hotel/nearby//*()
/nyHotels/hotel/nearby//restaurant/*()
/nyHotels/hotel/nearby//restaurant/name/*()
/nyHotels/hotel/nearby//restaurant/address/*()
/nyHotels/hotel/nearby//restaurant/rating/*()
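One rule that reproduces the list above, sketched in Python: emit /*() for the root position, then path/*() for every query node, plus path//*() under any node that has a // child. The `Pat` class is a hypothetical encoding of the query tree, and value tests such as "Best Western" are left out because they do not change the LPQs. The rule is inferred from this example, not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Pat:
    label: str
    children: list = field(default_factory=list)
    desc: bool = False  # True if the edge from the parent is // rather than /


def linear_path_queries(root):
    out = ["/*()"]  # the root itself may be produced by a call

    def visit(p, parent_path):
        path = parent_path + ("//" if p.desc else "/") + p.label
        out.append(path + "/*()")          # a call directly below this node
        if any(c.desc for c in p.children):
            out.append(path + "//*()")     # a call anywhere below a // edge
        for c in p.children:
            visit(c, path)

    visit(root, "")
    return out


query = Pat("nyHotels", [Pat("hotel", [
    Pat("name"), Pat("rating"),
    Pat("nearby", [Pat("restaurant", [Pat("name"), Pat("address"), Pat("rating")], desc=True)]),
])])
for lpq in linear_path_queries(query):
    print(lpq)
```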

15 Linear Path Queries
Correct, but usually inaccurate
- Ignores filtering conditions on the path from the root, or in other branches, that could make some of the functions irrelevant (e.g., a getNearbyRestos() function node under a hotel cannot be relevant if the hotel's rating is not “*****”)
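The inaccuracy shows up in a small Python sketch with a hypothetical document and a hand-written evaluator that supports only child and descendant steps: the LPQ /nyHotels/hotel/nearby/*() returns the getNearbyRestos calls under both hotels, although only the one under the 5-star hotel can contribute to a match.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    label: str
    children: list = field(default_factory=list)
    is_call: bool = False


def descendants(n):
    for c in n.children:
        yield c
        yield from descendants(c)


def eval_lpq(root, steps):
    """Evaluate a linear path given as [(axis, label), ...], axis '/' or '//';
    the label '*()' matches call nodes, any other label matches data nodes."""
    current = [Node("#document", [root])]  # virtual parent of the root element
    for axis, label in steps:
        nxt = []
        for n in current:
            for m in (n.children if axis == "/" else descendants(n)):
                ok = m.is_call if label == "*()" else (not m.is_call and m.label == label)
                if ok and all(m is not x for x in nxt):
                    nxt.append(m)
        current = nxt
    return current


def hotel(name, stars):  # hypothetical hotel entry with a pending getNearbyRestos call
    return Node("hotel", [Node("name", [Node(name)]), Node("rating", [Node(stars)]),
                          Node("nearby", [Node("getNearbyRestos", is_call=True)])])


doc = Node("nyHotels", [hotel("Best Western", "*****"), hotel("Other Inn", "***")])
lpq = [("/", "nyHotels"), ("/", "hotel"), ("/", "nearby"), ("/", "*()")]
print(len(eval_lpq(doc, lpq)))  # 2: the call under the 3-star hotel is returned as well
```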

16 Node Focused Queries
For each node in the query tree, replace it with an OR node (adding a branch *() that matches any function, as with LPQs). Then, for every node v in the resulting query tree, create q_v = q − {v and its subtree}, with output node f_v pointing at the position of the *() OR-sibling of v.
- Each such query tree involves the path from the root to the node (as in LPQs), plus any parts of the tree that would have to be matched anyway for the whole query tree to match.
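A minimal Python sketch of evaluating one NFQ q_v directly on a document, assuming that calls may return data of arbitrary type. The ancestors of v must be data nodes; every other query node behaves like an OR node, satisfied either by matching data or by a call standing at its position. The classes and the document are hypothetical, and the NFQ for the root position (/*()) is omitted.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    label: str
    children: list = field(default_factory=list)
    is_call: bool = False


@dataclass(eq=False)
class Pat:
    label: str
    children: list = field(default_factory=list)
    desc: bool = False  # edge from the parent is // rather than /


def descendants(n):
    for c in n.children:
        yield c
        yield from descendants(c)


def candidates(n, p):
    """Document nodes that can sit at pattern node p's position below n."""
    return list(descendants(n)) if p.desc else n.children


def contains(p, v):
    return p is v or any(contains(c, v) for c in p.children)


def sat(p, n):
    """OR-node test: n matches p, or n is a call that might return a match."""
    if n.is_call:
        return True
    return n.label == p.label and all(
        any(sat(c, m) for m in candidates(n, c)) for c in p.children)


def nfq_calls(p, n, v):
    """Evaluate q_v: return the calls at v's position in every embedding of
    p minus v's subtree that maps p to n. v's ancestors must be data nodes."""
    if n.is_call or n.label != p.label:
        return []
    found = []
    for c in p.children:
        cands = candidates(n, c)
        if c is v:                      # output: the *() OR-sibling of v
            found = [m for m in cands if m.is_call]
        elif contains(c, v):            # the branch leading to v
            found = [f for m in cands for f in nfq_calls(c, m, v)]
        elif any(sat(c, m) for m in cands):
            continue                    # another branch that is still matchable
        else:
            return []
        if not found:
            return []
    return found


restaurant = Pat("restaurant", [Pat("name"), Pat("address"),
                                Pat("rating", [Pat("*****")])], desc=True)
query = Pat("nyHotels", [Pat("hotel", [Pat("name", [Pat("Best Western")]),
                                       Pat("rating", [Pat("*****")]),
                                       Pat("nearby", [restaurant])])])


def hotel(rating_value):  # hypothetical hotel with a pending getNearbyRestos call
    return Node("hotel", [Node("name", [Node("Best Western")]),
                          Node("rating", [rating_value]),
                          Node("nearby", [Node("getNearbyRestos", is_call=True)])])


doc = Node("nyHotels", [hotel(Node("*****")), hotel(Node("***")),
                        hotel(Node("getRating", is_call=True))])
print(len(nfq_calls(query, doc, restaurant)))  # 2
```

Unlike the LPQ, this NFQ skips the call under the 3-star hotel, but keeps the one under the hotel whose rating is still an unevaluated getRating call.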

17 NFQ Example [figure: the query tree (nyHotels, hotel, name, nearby, “Best Western”, “*****”, restaurant, name, address, rating, “*****”, labels X, Y) with a *() OR-branch at every node]

18 NFQ Example [figure: same query tree as slide 17]

19 NFQ Example [figure: nyHotels, *()]

20 NFQ Example [figure: nyHotels, *()]

21 NFQ Example [figure: nyHotels, *()]

22 Another NFQ Example [figure: the full query tree (nyHotels, hotel, name, nearby, “Best Western”, “*****”, restaurant, name, address, rating, “*****”, labels X, Y) with *() OR-branches]

23 Another NFQ Example [figure: nyHotels, hotel, name, nearby, rating, “Best Western”, “*****”, with *() OR-branches]

24 Another NFQ Example [figure: nyHotels, hotel, name, nearby, rating, “Best Western”, “*****”, with *() OR-branches]

25 Another NFQ Example [figure: nyHotels, hotel, name, nearby, rating, “Best Western”, “*****”, with fewer *() OR-branches]

26 Node Focused Queries
Assuming that functions can return data of arbitrary type, the function nodes that are relevant for a query q are precisely those retrieved by the NFQs of q.

27 Outline
- Definitions
- Finding relevant calls
- Sequencing relevant calls
- Improving accuracy
- Reducing detection time
- Conclusions and discussion

28 Sequencing relevant calls
Naïve NFQA algorithm:
1. Evaluate all NFQs.
2. Pick one of the returned functions, say f_v.
3. Evaluate the function and rewrite the document ($d \xrightarrow{f_v} d'$).
4. Repeat until all NFQs return empty results (i.e., there are no more relevant calls).
After every iteration, although the NFQs remain the same, their results can change, since evaluating a function in step 3 can introduce new function nodes or make some results irrelevant.
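The naïve NFQA loop, sketched in Python under assumptions of our own: each NFQ is passed as a function from a document to the calls it returns, and `invoke` performs one rewriting step. The toy run uses a flat list of labels as the "document" and a hypothetical `SERVICES` table, only to exercise the control flow; `max_steps` is an added guard, not part of the algorithm on the slide.

```python
def naive_nfqa(doc, nfqs, invoke, max_steps=10_000):
    """Naive NFQA: after every call, re-evaluate all NFQs, until none returns a call."""
    for _ in range(max_steps):
        relevant = [c for q in nfqs for c in q(doc)]  # 1. evaluate all NFQs
        if not relevant:                              # 4. stop: doc is complete for q
            return doc
        doc = invoke(doc, relevant[0])                # 2.-3. pick one call and rewrite
    raise RuntimeError("no fixpoint reached; calls keep introducing new relevant calls")


# Toy run: a "document" is a flat list of labels, and calls are the labels in SERVICES.
SERVICES = {"getHotels": ["hotel", "getRating"], "getRating": ["*****"]}


def toy_invoke(doc, call):
    i = doc.index(call)
    return doc[:i] + SERVICES[call] + doc[i + 1:]


def toy_nfq(doc):
    return [x for x in doc if x in SERVICES]


print(naive_nfqa(["nyHotels", "getHotels"], [toy_nfq], toy_invoke))
# ['nyHotels', 'hotel', '*****']
```

The getRating call only appears after getHotels has been evaluated, which is why every NFQ is re-evaluated on each iteration.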

29 Improving NFQA
“Predict” when NFQ results cannot possibly have changed, and avoid reevaluating them
- Identify dependencies between NFQs and the effect of executing the functions they return

30 Influence of NFQs [figure: nyHotels, hotel, name, nearby, rating, “Best Western”, “*****”, with *() OR-branches; NFQ 1 and NFQ 2 marked]
NFQ 1 can influence NFQ 2, but not vice versa.

31 Influence of NFQs
NFQ 1 may influence NFQ 2 iff the output function node of NFQ 1 is an ancestor (in the query tree) of the output node of NFQ 2.
Two NFQs belong to the same layer if they may influence each other (directly or transitively).
- Inside every layer, we have to reevaluate every NFQ after every function call.
- Multiple equivalent NFQs (i.e., in the same layer) can only exist under //, because, without knowing the output type, each of the two nodes could appear as a descendant of the other. E.g., for //a and //b: in /a/b, //a matches /a and //b matches /a/b, while in /b/a, //b matches /b and //a matches /b/a.
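Since a layer groups NFQs that may influence each other directly or transitively, layers are the strongly connected components of the "may influence" graph. A Python sketch using Tarjan's algorithm, a standard technique swapped in here (the slides do not say how layers are computed); the NFQ names and edges are hypothetical, with q_a and q_b standing for //a and //b.

```python
def layers(influence):
    """Layers = strongly connected components of the 'may influence' graph,
    via Tarjan's algorithm. `influence` maps an NFQ to the NFQs it may influence."""
    index, low, on_stack, stack, result = {}, {}, set(), [], []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in influence.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component: pop the whole component
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            result.append(component)

    nodes = set(influence) | {w for ws in influence.values() for w in ws}
    for v in sorted(nodes):
        if v not in index:
            visit(v)
    return result


# Hypothetical NFQs: q_a (//a) and q_b (//b) may influence each other.
influence = {"q_root": ["q_a", "q_b"], "q_a": ["q_b"], "q_b": ["q_a"]}
print(layers(influence))
```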

32 Influence of NFQs
L1 < L2 iff some NFQ in L1 may influence (directly or transitively) some NFQ in L2
- We have to process L1 before L2 (and never need to process L1 again afterwards).
- When processing L1 has finished, the OR-nodes corresponding to the functions it returned are redundant, so the NFQs in L2 can be simplified by removing them.
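Given the layers, an order respecting L1 < L2 is a topological order of the graph of layers, which is acyclic because each layer is a strongly connected component. A sketch using the standard library's `graphlib.TopologicalSorter` (Python 3.9+); the layers and edges below are hypothetical.

```python
from graphlib import TopologicalSorter


def layer_order(layers, influence):
    """Return the layers so that L1 precedes L2 whenever some NFQ in L1 may influence
    some NFQ in L2. `layers` must be the strongly connected components of `influence`."""
    layer_of = {q: i for i, layer in enumerate(layers) for q in layer}
    preds = {i: set() for i in range(len(layers))}
    for q, targets in influence.items():
        for t in targets:
            if layer_of[q] != layer_of[t]:
                preds[layer_of[t]].add(layer_of[q])  # q's layer must be processed first
    return [layers[i] for i in TopologicalSorter(preds).static_order()]


influence = {"q_root": ["q_a", "q_b"], "q_a": ["q_b"], "q_b": ["q_a"]}
layers = [{"q_a", "q_b"}, {"q_root"}]
print(layer_order(layers, influence))  # q_root's layer first, then {q_a, q_b}
```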

33 Parallelizing calls
Let q_lin be the linear path from the root to the output node of NFQ q, not inclusive (note: q_lin is a regular expression). Two NFQs q, q' that belong to the same layer are independent iff the regular languages of q_lin and q'_lin have no words in common.
- E.g., //a and //b are independent,
- but //a//c and //b//c are not (both match, e.g., /a/b/c).
If all NFQs in a layer are independent, all functions returned by the same NFQ in one step of NFQA can be called in parallel.
- Other sufficient conditions could exist, too …
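Because q_lin uses only child (/) and descendant (//) steps, the independence test reduces to asking whether the product of the two path automata can reach its accepting state. A Python sketch, assuming paths are strings such as "//a//c" whose steps are single tag names ('*' stands for any tag); the helper names are hypothetical.

```python
import re
from collections import deque
from itertools import combinations


def parse(path):
    """'//a//c' -> [('//', 'a'), ('//', 'c')]"""
    return re.findall(r"(//?)([^/]+)", path)


def independent(p, q):
    """True iff linear paths p and q (lists of (axis, tag); tag '*' matches any tag)
    share no word, checked by BFS over pairs of automaton states."""
    def fits(a, b):
        return a == "*" or b == "*" or a == b

    start, goal = (0, 0), (len(p), len(q))
    seen, todo = {start}, deque([start])
    while todo:
        i, j = todo.popleft()
        if (i, j) == goal:
            return False  # some word is in both languages
        if i == len(p) or j == len(q):
            continue      # one path is finished and has no // loop left
        moves = []
        if fits(p[i][1], q[j][1]):
            moves.append((i + 1, j + 1))  # both step on the same tag
        if q[j][0] == "//":
            moves.append((i + 1, j))      # q's // absorbs the tag that p steps on
        if p[i][0] == "//":
            moves.append((i, j + 1))      # p's // absorbs the tag that q steps on
        for m in moves:
            if m not in seen:
                seen.add(m)
                todo.append(m)
    return True


def all_independent(paths):
    return all(independent(parse(a), parse(b)) for a, b in combinations(paths, 2))


print(independent(parse("//a"), parse("//b")))        # True
print(independent(parse("//a//c"), parse("//b//c")))  # False (e.g. /a/b/c)
```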

34 Outline
- Definitions
- Finding relevant calls
- Sequencing relevant calls
- Improving accuracy
- Reducing detection time
- Conclusions and discussion

35 Using types
Use the function return type to “predict” the shape of the data that a function call can return
- Similar to checking for the existence of a possible rewriting
- If this shape cannot match the (corresponding part of the) query pattern, the call can be discarded
- In some cases, one can go further and restrict not only the output type but also the specific names of functions that could match
Refined NFQs
- Use the set of function names with an appropriate return type instead of *()
- Use F-guides (later) to make them even more refined
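A Python sketch of how a refined NFQ might pick function names instead of *(), under assumptions of our own: a hypothetical schema maps each element to its possible children, a hypothetical table gives the top-level labels each service may return ("#text" stands for a text value), and results are assumed to contain no further calls. Under a / edge the call must return the target label itself; under // the target may also occur deeper in the result.

```python
# Hypothetical schema (element -> possible child elements) and service return types.
GRAMMAR = {
    "restaurant": {"name", "address", "rating"},
    "museum": {"name", "address"},
}
RETURN_TYPES = {
    "getNearbyRestos": {"restaurant"},
    "getNearbyMuseums": {"museum"},
    "getRating": {"#text"},
}


def reachable(labels, grammar):
    """All labels that can occur in a tree rooted at one of `labels`."""
    seen, todo = set(labels), list(labels)
    while todo:
        for child in grammar.get(todo.pop(), ()):
            if child not in seen:
                seen.add(child)
                todo.append(child)
    return seen


def refine(target, desc, return_types=RETURN_TYPES, grammar=GRAMMAR):
    """Function names that may replace *() at the position of pattern node `target`."""
    return {f for f, roots in return_types.items()
            if target in (reachable(roots, grammar) if desc else roots)}


print(sorted(refine("restaurant", desc=True)))  # ['getNearbyRestos']
```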

36 Refined NFQ example [figure: nyHotels, hotel, name, nearby, rating, “Best Western”, “*****”, with *() OR-branches]

37 Refined NFQ example [figure: the same tree, with some *() branches replaced by getRating and getNearbyRestos]

38 Pushing queries
Similar to pushing selections onto scans in relational queries, or pushing queries to data sources in mediator systems
- Reduce the amount of (useless) data transferred (assuming functions correspond to remote Web services) by filtering out irrelevant matches and projecting only on output variable nodes
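A toy Python sketch of the effect of pushing: a simulated service with hypothetical data and a hypothetical interface applies a pushed selection and projection before serializing its answer, so fewer bytes would cross the network. It only illustrates the idea; it does not show how Active XML actually ships subqueries to services.

```python
import json

# Hypothetical data behind a remote getNearbyRestos service.
RESTOS = [
    {"name": "Resto A", "address": "1 Main St", "rating": "*****", "menu": "long text"},
    {"name": "Resto B", "address": "2 Main St", "rating": "**", "menu": "long text"},
]


def get_nearby_restos(pushed=None):
    """Simulated service. `pushed` is an optional (predicate, fields) pair: the
    selection and projection are applied at the source, before serialization."""
    rows = RESTOS
    if pushed is not None:
        predicate, fields = pushed
        rows = [{k: r[k] for k in fields} for r in rows if predicate(r)]
    return json.dumps(rows)  # stands for the payload sent over the network


full = get_nearby_restos()
pushed = get_nearby_restos((lambda r: r["rating"] == "*****", ["name", "address"]))
print(len(full), len(pushed), json.loads(pushed))
```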

39 Outline
- Definitions
- Finding relevant calls
- Sequencing relevant calls
- Improving accuracy
- Reducing detection time
- Conclusions and discussion

40 Lenient rewriting
Trade accuracy for efficiency
- Use XPath or LPQs instead of NFQs (faster processing)
- Use a lenient form of type checking (ignoring the order and cardinality of elements)


42 Function call guides
Similar to dataguides, but for function calls
- One occurrence for each path that leads to some function node, plus pointers to the function nodes
[figure label: paths that don’t lead to functions are left out]
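A minimal Python sketch of building an F-guide, flattened into a dictionary keyed by label path instead of the tree drawn on the slides; the document and its call names are hypothetical, loosely following the running example.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    label: str
    children: list = field(default_factory=list)
    is_call: bool = False


def build_fguide(root):
    """Map each distinct label path (root to a call's parent) to pointers to the calls
    found there; paths that lead to no call are left out."""
    guide = defaultdict(list)

    def walk(n, path):
        for c in n.children:
            if c.is_call:
                guide[path].append(c)
            else:
                walk(c, path + "/" + c.label)

    if root.is_call:
        guide[""].append(root)  # a call at the document root
    else:
        walk(root, "/" + root.label)
    return dict(guide)


def call(name):
    return Node(name, is_call=True)


doc = Node("nyHotels", [
    Node("hotel", [Node("rating", [call("getRating")]),
                   Node("nearby", [call("getNearbyRestos"), call("getNearbyMuseums")])]),
    Node("hotel", [Node("name", [Node("Best Western")]),
                   Node("nearby", [call("getNearbyRestos")])]),
    call("getHotels"),
])
for path, calls in build_fguide(doc).items():
    print(path, [c.label for c in calls])
```

Both hotels share the entry /nyHotels/hotel/nearby, which is what keeps the guide small; /nyHotels/hotel/name leads to no call and is absent.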

43 Function call guides [figure labels: pointers to getRating calls; pointers to getNearbyRestos, getNearbyMuseums calls; pointers to getHotels calls]

44 Function call guides
Use F-guides for:
- Generation of refined NFQs (use the return types within the appropriate part of the F-guide to get only the function names that can actually appear in the corresponding tree fragment)
- Efficient approximation of relevant function nodes:
  - evaluate queries (NFQs) on the F-guide
  - evaluate queries on the original document using LPQs
- Initial filtering: discard NFQs for nodes that don’t have any children in the F-guide
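A Python sketch of the path-only part of these uses, under stated assumptions: the F-guide is given as a hypothetical dictionary from label paths to call names, and an LPQ is compiled into a regular expression and matched against the guide's paths, so the document itself is not scanned. An LPQ that matches no guide path can be dropped up front, which corresponds to the initial filtering.

```python
import re


def lpq_regex(lpq):
    """Compile an LPQ ending in /*() or //*() into a regex over F-guide paths
    (label paths from the root to a call's parent)."""
    if lpq.endswith("//*()"):
        prefix, tail = lpq[:-len("//*()")], "(?:/[^/]+)*"  # call at any depth below
    elif lpq.endswith("/*()"):
        prefix, tail = lpq[:-len("/*()")], ""              # call directly below
    else:
        raise ValueError("expected an LPQ ending in /*() or //*()")
    body = "".join(("/" if axis == "/" else "(?:/[^/]+)*/") + re.escape(label)
                   for axis, label in re.findall(r"(//?)([^/]+)", prefix))
    return re.compile(body + tail)


def calls_via_fguide(guide, lpq):
    """Calls pointed to by every F-guide path the LPQ matches; [] means it can be dropped."""
    pattern = lpq_regex(lpq)
    return [c for path, calls in guide.items() if pattern.fullmatch(path) for c in calls]


# Hypothetical F-guide: label path of a call's parent -> names of the calls found there.
guide = {
    "/nyHotels": ["getHotels"],
    "/nyHotels/hotel/rating": ["getRating"],
    "/nyHotels/hotel/nearby": ["getNearbyRestos", "getNearbyMuseums"],
}
print(calls_via_fguide(guide, "/nyHotels/hotel/nearby//*()"))  # ['getNearbyRestos', 'getNearbyMuseums']
print(calls_via_fguide(guide, "/nyHotels/hotel/name/*()"))      # []
```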

45 Conclusions
Active XML: an interesting new area
- Nothing fundamentally novel
- Applies known tools (distributed processing, lazy evaluation) in a new context, giving new life to documents
Greatest challenge: formulating the right research questions well
- Answers to these well-formulated questions are fairly easy.
Contributions of this paper:
- Formulates such an interesting question
- Thorough understanding of different aspects of the problem (accuracy vs. performance and their effect on overall efficiency)

46 Questions?