Ting Chen, Jiaheng Lu, Tok Wang Ling

Presentation transcript:

On Boosting Holism in XML Twig Pattern Matching Using Structural Indexing Techniques
Ting Chen, Jiaheng Lu, Tok Wang Ling

Outline
- Background
  - XML Twig Pattern Query
  - Previous Twig Join algorithms
  - Limit of the original holistic method TwigStack
- Our holistic Twig Pattern Matching algorithms
  - Two Refined Indexing Schemes: Tag+Level and PPS
  - A generalized holistic matching theory
  - iTwigJoin: a generalized holistic matching algorithm
- Experiments
- Conclusion

Background: XML and Region Coding
- An XML document is modeled as a tree in our work.
- Region coding for the XML document tree: each element carries a <start, end, level> label.
- Containment property: a.start < b.start AND a.end > b.end if and only if a is an ancestor of b.
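The containment property can be sketched in a few lines of Python. This is only an illustration: the names `Element`, `label_tree`, `is_ancestor` and `is_parent` are hypothetical, the labels are assigned by a plain depth-first counter (the slides do not say how labels are generated), and the parent test via `level` is the usual reading of the level field rather than something stated on the slide.

```python
from typing import NamedTuple


class Element(NamedTuple):
    tag: str
    start: int
    end: int
    level: int


def label_tree(tree):
    """Assign <start, end, level> labels by a depth-first walk.

    `tree` is a nested (tag, [children]) tuple; this input shape is an
    assumption made for the sketch.
    """
    out = []
    counter = 0

    def visit(node, level):
        nonlocal counter
        tag, children = node
        counter += 1          # position where the element opens
        start = counter
        for child in children:
            visit(child, level + 1)
        counter += 1          # position where the element closes
        out.append(Element(tag, start, counter, level))

    visit(tree, 1)
    return sorted(out, key=lambda e: e.start)  # document order


def is_ancestor(a, b):
    # Containment property from the slide.
    return a.start < b.start and a.end > b.end


def is_parent(a, b):
    # Assumption: a parent is an ancestor exactly one level above.
    return is_ancestor(a, b) and a.level + 1 == b.level


doc = ("a", [("b", [("c", [])]), ("c", [])])
for element in label_tree(doc):
    print(element)
```

With labels of this kind, both structural relationships used in twig queries reduce to integer comparisons, with no tree traversal needed.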

Background: XML twig pattern queries
- An XML twig query is a small tree whose edges are parent-child or ancestor-descendant relationships.
- Given an XML document D and an XML twig query Q, our problem is to find all occurrences of Q in D.
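To make the problem statement concrete, here is a brute-force matcher over region-coded elements. It is a naive reference sketch, not TwigStack or iTwigJoin, and everything in it is an assumption for illustration: the query encoding `(tag, [(axis, subquery), ...])` with axis `"/"` or `"//"`, and the names `related`, `match_at` and `twig_matches`.

```python
from itertools import product
from typing import NamedTuple


class Element(NamedTuple):
    tag: str
    start: int
    end: int
    level: int


def related(a, b, axis):
    """True if b is a descendant of a ("//") or a child of a ("/")."""
    inside = a.start < b.start and a.end > b.end
    return inside and (axis == "//" or a.level + 1 == b.level)


def match_at(query, e, elems):
    """All matches of `query` with its root bound to element e.

    Each match is a tuple of elements in preorder of the query nodes.
    """
    tag, edges = query
    if e.tag != tag:
        return []
    per_child = []
    for axis, sub in edges:
        found = [m for c in elems if related(e, c, axis)
                 for m in match_at(sub, c, elems)]
        per_child.append(found)
    # Combine one match per query child; empty if any child has none.
    return [(e,) + sum(combo, ()) for combo in product(*per_child)]


def twig_matches(query, elems):
    return [m for e in elems for m in match_at(query, e, elems)]


# Document: a contains b (which contains c) and a second c.
elems = [
    Element("a", 1, 8, 1),
    Element("b", 2, 5, 2),
    Element("c", 3, 4, 3),
    Element("c", 6, 7, 2),
]
# The query a[.//c]//b in the encoding assumed above.
query = ("a", [("//", ("c", [])), ("//", ("b", []))])
print(len(twig_matches(query, elems)))
```

The brute-force version rescans the whole element list at every query node, which is the cost the holistic stream-based algorithms are designed to avoid.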

Previous XML Twig Join algorithms
- Edge based
  - Binary Structural Join [Al-Khalifa et al ICDE02]
  - Join Order Selection [Wu et al ICDE03]
- Path based
  - BLAS [Chen et al SIGMOD04]
- Tree (holistic) based
  - TwigStack [Bruno et al SIGMOD02]
  - TwigStackList [Lu et al CIKM04]
- Index based
  - B tree [Chien et al VLDB02]
  - XR tree [Jiang et al ICDE02]
  - TSGeneric+ [Jiang et al VLDB03]

Holistic Twig Matching
- TwigStack [Bruno et al SIGMOD02] is a holistic twig join algorithm.
- E.g., for the query A[.//C]//B there may be many matches to A//B alone, but TwigStack outputs results only for A elements that have both B and C descendants.
- No join order selection is required.
- TwigStack is optimal only for twig patterns whose edges are all ancestor-descendant.
- Reordering the elements in a stream does not help [Choi et al DEXA03].

Sub-optimality of TwigStack
- Not optimal for twigs with parent-child edges.
[Figure: example document tree with elements a1 … an, b1 … bn, c1 … cn, and query tree with nodes A, B, C]

Two Refined Streaming Schemes (1)
- To enlarge the class of queries for which TwigStack is optimal, our paper proposes two refined streaming schemes.
- Tag+Level: elements with the same tag and level are grouped together.
- For this query, the Tag+Level streaming scheme guarantees optimality.
[Figure: example document tree with elements a1 … an, b1 … bn, c1 … cn, and query tree with nodes A, B, C]
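The difference between plain tag streaming and Tag+Level streaming can be sketched as two ways of partitioning the same element list. The function names and the sample labels below are hypothetical; the only thing taken from the slide is the grouping key (same tag and same level).

```python
from collections import defaultdict
from typing import NamedTuple


class Element(NamedTuple):
    tag: str
    start: int
    end: int
    level: int


def tag_streams(elems):
    """One stream per tag, each in document (start) order."""
    streams = defaultdict(list)
    for e in sorted(elems, key=lambda e: e.start):
        streams[e.tag].append(e)
    return dict(streams)


def tag_level_streams(elems):
    """One stream per (tag, level) pair, each in document order."""
    streams = defaultdict(list)
    for e in sorted(elems, key=lambda e: e.start):
        streams[(e.tag, e.level)].append(e)
    return dict(streams)


elems = [
    Element("a", 1, 8, 1),
    Element("b", 2, 5, 2),
    Element("c", 3, 4, 3),
    Element("c", 6, 7, 2),
]
print(sorted(tag_streams(elems)))        # stream keys under tag streaming
print(sorted(tag_level_streams(elems)))  # stream keys under Tag+Level
```

Under Tag+Level the two `c` elements land in different streams, so a parent-child edge from a level-1 node only ever needs to read the level-2 stream.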

Two Refined Streaming Schemes (1)
- Given a more complex query and document, however, Tag+Level cannot guarantee optimality. For example:
[Figure: document tree with elements a1, a2, b1, b2, c1, c2, d1, d2, d3, e1, and query tree with nodes A, B, C, D]

Two Refined Streaming Schemes (2)
- Prefix Path Streaming (PPS): elements with the same root-to-node path are grouped together.
- In this example, every element in the document is stored as an individual stream.
- PPS is optimal for this example: d1, d2, c1, c2 are separated into different streams.
[Figure: document tree with elements a1, a2, b1, b2, c1, c2, d1, d2, d3, e1, and query tree with nodes A, B, C, D]
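A minimal sketch of PPS partitioning follows, assuming region-coded elements as input. The root-to-node path is recovered with a stack of currently open ancestors; the function name, the `/`-joined path key and the sample document are all hypothetical choices for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Element(NamedTuple):
    tag: str
    start: int
    end: int
    level: int


def prefix_path_streams(elems):
    """Group elements by root-to-node tag path, in document order."""
    streams = defaultdict(list)
    open_elems = []  # ancestors of the current element, root first
    for e in sorted(elems, key=lambda e: e.start):
        # Drop elements that closed before e starts: not ancestors of e.
        while open_elems and open_elems[-1].end < e.start:
            open_elems.pop()
        open_elems.append(e)
        path = "/".join(x.tag for x in open_elems)
        streams[path].append(e)
    return dict(streams)


elems = [
    Element("a", 1, 12, 1),
    Element("b", 2, 5, 2),
    Element("c", 3, 4, 3),
    Element("b", 6, 9, 2),
    Element("c", 7, 8, 3),
    Element("c", 10, 11, 2),
]
for path, stream in prefix_path_streams(elems).items():
    print(path, [e.start for e in stream])
```

Elements that share both tag and level can still be split by PPS when their ancestor tags differ, which is why PPS is the more refined of the two schemes; the price is a potentially large number of streams.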

Two Refined Streaming Schemes (2)
- A natural question: is PPS guaranteed to be optimal for all queries and data?
- The answer is NO. For example, c1 and c2 are in the same stream; similarly, e1 and e2 are in the same stream.
[Figure: document tree with elements a1 … a4, b1 … b5, c1, c2, d1, d2, e1, e2 (head elements marked), and query tree with nodes A, B, C, D, E]

A general algorithm: iTwigJoin
- We propose a general algorithm, called iTwigJoin, which can be used with various data streaming schemes.
- Our key idea is to classify every current head element into one of three classes:
  - Subtree-matching
  - Useless
  - Blocked

Classifying Head Elements
- Subtree-matching element: an element e of tag E is called a subtree-matching element for query Q if
  - e is in a match to QE (QE is the subtree of Q rooted at E); and
  - e is NOT in any future match to QP, where P is the parent of E in Q.
- Useless element: an element e is called useless if e is not in any future match to QE.
- Blocked element: an element that is neither subtree-matching nor useless.

Example: Classifying Head Elements (Tag+Level Streaming)
[Figure: document tree with elements a1, a2, b1, b2, c1, c2, d1, d2, d3, e1 (head elements marked), and two query trees Q1 and Q2 over nodes A, B, C, D]
Q1:
- Subtree-matching: none
- Useless: none
- Blocked: d1, c1, a1, a2, b2, b1
Q2:
- Subtree-matching: d1
- Useless: a1, b2
- Blocked: c1, b1, a2

Example: Classifying Head Elements (Tag+Level Streaming)
[Figure: document tree and the two query trees over nodes A, B, C, D]
Q1: subtree-matching: none; useless: none; blocked: a1, a2, b1, b2, c1, d1
Q2: subtree-matching: d1; useless: a1, b2; blocked: a2, b1, c1
- A useless element can be discarded safely.
- A subtree-matching element is pushed onto the corresponding stack.
- A blocked element causes problems:
  - It CANNOT be discarded, because that may cause loss of results.
  - It CANNOT be pushed onto the stack, because that may cause useless results.
- When all head elements are blocked, optimal holistic matching CANNOT be guaranteed.

iTwigJoin
- In our algorithm, in order to output all correct answers, we push blocked elements onto the stack, which may produce useless intermediate results in some cases.
- Example (Tag+Level streaming, query Q1): since all head elements are blocked, we have to push a1 onto the stack and output one path solution (a1, d1).
- If there is no c2, then (a1, d1) is a useless path solution.
[Figure: document tree with elements a1, a2, b1, b2, c1, c2, d1, d2, d3, e1, and query tree Q1 with nodes A, B, C, D]

iTwigJoin: Two Main Components
- Stream Manager: controls the advance operation of the streams and sends elements to temporary storage.
- Temporary Storage: pushes elements onto stacks and outputs intermediate paths.
[Figure: Stream Manager with streams SA, SB, SC feeding Temporary Storage]

Flowchart of iTwigJoin
1. Label the current head elements as subtree-matching, useless, or blocked.
2. If a useless element is found, discard the useless elements and go back to step 1.
3. Select a subtree-matching or blocked element e.
4. Pop some elements from the stack.
5. Push e onto the stack and output intermediate paths if e is a leaf.
6. If not all streams have ended, go back to step 1.

Optimal classes of iTwigJoin for three streaming schemes
(A-D: ancestor-descendant; P-C: parent-child)
- Tag Streaming: A-D only patterns
- Tag+Level Streaming: A-D/P-C only patterns
- Prefix Path Streaming: A-D/P-C only or 1-branch-node patterns
- The more refined the streaming scheme, the larger the optimal class.
[Figure: nested optimal classes, from innermost to outermost: A-D only; A-D/P-C only; A-D/P-C only or 1-branch node]

Experiments
- Benchmarks
  - XMark: synthetic data
  - Treebank: real data from the Wall Street Journal

Experiments: I/O Performance
- Queries: Tree1: A-D only; Tree2: P-C only; Tree3: P-C only; Tree4: 1-branchnode; Tree5: 1-branchnode
- By pruning irrelevant streams, PPS usually scans the fewest elements.

Experiments: Number of Intermediate Paths
- Queries: Tree1: A-D only; Tree2: P-C only; Tree3: P-C only; Tree4: 1-branchnode; Tree5: 1-branchnode
- For Treebank 5 there are no matching results, so Tag+Level and PPS do not output any intermediate results.

Experiments: Running Time
- Queries: XMark1: path pattern; XMark2: A-D only; XMark3: P-C only; XMark4: 1-branchnode; XMark5: non-optimal
- Tag+Level and PPS perform better than TwigStack and TwigStackList on the XMark data.

Experiments: Summary
- Both PPS and Tag+Level help to reduce I/O costs, and PPS saves more.
- PPS may result in too many streams for deep XML data; Tag+Level seems to be a good compromise.
- PPS and Tag+Level completely avoided the output of redundant intermediate paths in all cases we tested, though they cannot guarantee optimality in theory.

Conclusions
- We develop a general algorithm to perform holistic twig joins on the Tag+Level and PPS streaming schemes.
- We identify two I/O-optimal classes for the Tag+Level and PPS streaming schemes.
- Since our experiments show that the Tag+Level streaming scheme produces very few useless intermediate results in most cases, we recommend the Tag+Level scheme for efficient XML twig pattern matching.

END
Thank you! Q & A

Backup: iTwigJoin Algorithm
While (not all streams end):
    Label the current head elements as Matching, Useless, or Blocked.
    If any head element is Useless, discard it and continue.
    Let e1 be the matching element with the smallest startPos.
    Let e2 be the blocked element with the smallest endPos.
    If e2.endPos < e1.startPos, let e be the blocked element with the smallest startPos; else let e be e1.
    Advance the stream e belongs to.
    Pop from e's stack the elements whose endPos < e.startPos.
    If e has a parent/ancestor in the temporary storage system, push e onto its stack.
    If the tag of e is a leaf node in Q, output all paths involving e.
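The element-selection and stack-cleaning steps of the loop above can be sketched as follows. This is a partial illustration, not the authors' implementation: the labeling step is taken as given (the caller supplies the lists of matching and blocked head elements), the names `choose_next` and `pop_finished` are hypothetical, and the fallback used when no matching element exists is an assumption, since the slide does not cover that case.

```python
from typing import NamedTuple


class Element(NamedTuple):
    tag: str
    start: int
    end: int
    level: int


def choose_next(matching, blocked):
    """Pick the head element to process next, or None if both lists are empty.

    `matching` and `blocked` hold the head elements already labeled as
    subtree-matching and blocked (useless ones are assumed discarded).
    """
    e1 = min(matching, key=lambda e: e.start, default=None)
    e2 = min(blocked, key=lambda e: e.end, default=None)
    if e1 is None:
        # Assumption: with no matching element, fall back to the
        # blocked element with the smallest start position.
        return min(blocked, key=lambda e: e.start, default=None)
    if e2 is not None and e2.end < e1.start:
        # A blocked element closes before e1 opens: handle blocked first.
        return min(blocked, key=lambda e: e.start)
    return e1


def pop_finished(stack, e):
    """Pop stack entries that end before e starts (they cannot contain e)."""
    while stack and stack[-1].end < e.start:
        stack.pop()
    return stack


d = Element("d", 5, 6, 3)
a = Element("a", 1, 4, 1)
b = Element("b", 2, 3, 2)
print(choose_next([d], [a, b]))
print(pop_finished([Element("a", 1, 10, 1), b], d))
```

Separating selection from stack maintenance mirrors the Stream Manager and Temporary Storage split described in the deck, though the actual interface between those two components is not specified there.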