Structural Join Algorithms – Examples

1 Structural Join Algorithms – Examples
Key property: x is an ancestor (resp., parent) of y iff x.docId = y.docId and x.StartPos < y.StartPos <= y.EndPos < x.EndPos (and y.Level = x.Level + 1).
A node n for us is (D, S:E, L), i.e., (docId, StartPos:EndPos, Level). Call this the node id for convenience.
What is a structural join (SJ)?
- Given lists Alist and Dlist of nodes,
- output pairs (x, y) of nodes [x in Alist, y in Dlist] such that x is an ancestor (resp., parent) of y.
- Frequently, assume the input lists are ordered by node id.
- Might want to order the output by the first operand's node id or by the second's. (What difference does it make?)
A tree pattern query (TPQ) is evaluated by computing several SJs and stitching the results together.
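A minimal sketch of the containment test, assuming the region encoding (docId, StartPos:EndPos, Level) described on this slide. The `Node` type, the function names, and the sample positions are illustrative assumptions and do not come from the slides. The first argument plays the ancestor (resp., parent) role.

```python
from typing import NamedTuple


class Node(NamedTuple):
    """Region-encoded node id: (docId, StartPos:EndPos, Level)."""
    doc: int
    start: int
    end: int
    level: int


def is_ancestor(x: Node, y: Node) -> bool:
    # x is an ancestor of y iff both are in the same document and
    # x's region strictly contains y's region.
    return x.doc == y.doc and x.start < y.start <= y.end < x.end


def is_parent(x: Node, y: Node) -> bool:
    # Parent-child additionally requires y to sit exactly one level below x.
    return is_ancestor(x, y) and y.level == x.level + 1


# Hypothetical positions for three nested elements: a1 > a2 > d1.
a1 = Node(doc=1, start=1, end=20, level=1)
a2 = Node(doc=1, start=2, end=5, level=2)
d1 = Node(doc=1, start=3, end=4, level=3)

print(is_ancestor(a1, d1), is_parent(a1, d1), is_parent(a2, d1))
```

Both tests take constant time per pair, which is what lets the join algorithms work from the node ids alone without touching the document.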

2 SJ variants
There is also a so-called holistic join algorithm (Bruno, Koudas, and Srivastava, SIGMOD 2002).
- It extends binary join ideas to finding matches for paths/twigs.
In XML query processing, we also need the following variants of SJ:
- Given Alist and Dlist, whenever x in Alist has a relative y in Dlist, output (x, y); else just output x (structural outerjoin).
- Given …, output x in Alist whenever there exists y in Dlist such that y is a relative of x (structural semijoin).
- Given …, output x in Alist whenever it has no relative y in Dlist (structural semi-antijoin).
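The three variants can be pinned down with naive reference definitions. This is only a sketch of their semantics, not an efficient algorithm: each function is quadratic. It assumes that "relative" means descendant, that all nodes belong to one document, and that a node is a plain `(start, end)` pair. The function names and the use of `None` to mark an unmatched x are illustrative choices.

```python
def contains(x, y):
    """True if region x = (start, end) strictly contains region y."""
    return x[0] < y[0] and y[1] < x[1]


def structural_outerjoin(alist, dlist):
    # Output (x, y) for every match; an unmatched x is kept as (x, None).
    out = []
    for x in alist:
        matches = [(x, y) for y in dlist if contains(x, y)]
        out.extend(matches if matches else [(x, None)])
    return out


def structural_semijoin(alist, dlist):
    # Keep x whenever at least one y in dlist is its descendant.
    return [x for x in alist if any(contains(x, y) for y in dlist)]


def structural_anti_semijoin(alist, dlist):
    # Keep x whenever no y in dlist is its descendant.
    return [x for x in alist if not any(contains(x, y) for y in dlist)]


# Hypothetical input: two nested ancestors with one descendant, one childless node.
alist = [(1, 10), (2, 5), (11, 14)]
dlist = [(3, 4)]

print(structural_outerjoin(alist, dlist))
print(structural_semijoin(alist, dlist))
print(structural_anti_semijoin(alist, dlist))
```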

3-6 Tree-Merge Join (ordered by ancestor)
[Figure: example XML tree with ancestor-list nodes a1 to a4 and descendant-list nodes d1 to d6, shown next to the two input lists.]
Output after each step:
- Slide 3: (a1,d1), …, (a1,d6),
- Slide 4: adds (a2,d1),
- Slide 5: adds (a3,d2), (a3,d3),
- Slide 6: adds (a4,d5), (a4,d6).
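A sketch of Tree-Merge ordered by ancestor that reproduces the output sequence above. Several things here are assumptions rather than content of the slides: the tree shape (a1 is the root with children a2, a3, d4, a4; d1 is under a2; d2 and d3 under a3; d5 and d6 under a4) is inferred from the output pairs, the start/end positions are invented to match that shape, all nodes are taken to be in one document, and the `Node` type and function name are illustrative.

```python
from typing import NamedTuple


class Node(NamedTuple):
    name: str
    start: int
    end: int


def tree_merge_anc(alist, dlist):
    """Ancestor-descendant join; both lists sorted by start, output ordered by ancestor."""
    out = []
    begin = 0  # first descendant that can still match the current or a later ancestor
    for a in alist:
        # Descendants starting before a cannot match a or any later ancestor.
        while begin < len(dlist) and dlist[begin].start < a.start:
            begin += 1
        # Scan the descendants that start inside a's region.
        i = begin
        while i < len(dlist) and dlist[i].start < a.end:
            if dlist[i].end < a.end:
                out.append((a.name, dlist[i].name))
            i += 1
    return out


# Assumed region encoding of the example tree.
alist = [Node("a1", 1, 20), Node("a2", 2, 5), Node("a3", 6, 11), Node("a4", 14, 19)]
dlist = [Node("d1", 3, 4), Node("d2", 7, 8), Node("d3", 9, 10),
         Node("d4", 12, 13), Node("d5", 15, 16), Node("d6", 17, 18)]

for pair in tree_merge_anc(alist, dlist):
    print(pair)
```

Note that the inner scan restarts from `begin` for every ancestor, so descendants under nested ancestors are visited more than once; each such visit produces an output pair in the ancestor-descendant case.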

7-12 Tree-Merge Join (ordered by descendant)
[Figure: the same example tree and input lists as on the previous slides.]
Output after each step:
- Slide 7: (a1,d1), (a2,d1),
- Slide 8: adds (a1,d2), (a3,d2),
- Slide 9: adds (a1,d3), (a3,d3),
- Slide 10: adds (a1,d4),
- Slide 11: adds (a1,d5), (a4,d5),
- Slide 12: adds (a1,d6), (a4,d6).
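A sketch of Tree-Merge ordered by descendant, under the same assumptions as before: the tree shape is inferred from the output pairs, the start/end positions are invented, all nodes share one document, and the names are illustrative rather than taken from the slides.

```python
from typing import NamedTuple


class Node(NamedTuple):
    name: str
    start: int
    end: int


def tree_merge_desc(alist, dlist):
    """Ancestor-descendant join; both lists sorted by start, output ordered by descendant."""
    out = []
    begin = 0  # first ancestor that can still match the current or a later descendant
    for d in dlist:
        # Ancestors ending before d starts cannot match d or any later descendant.
        while begin < len(alist) and alist[begin].end < d.start:
            begin += 1
        # Scan the ancestors that start before d; only some of them contain d.
        i = begin
        while i < len(alist) and alist[i].start < d.start:
            if d.end < alist[i].end:
                out.append((alist[i].name, d.name))
            i += 1
    return out


# Assumed region encoding of the example tree.
alist = [Node("a1", 1, 20), Node("a2", 2, 5), Node("a3", 6, 11), Node("a4", 14, 19)]
dlist = [Node("d1", 3, 4), Node("d2", 7, 8), Node("d3", 9, 10),
         Node("d4", 12, 13), Node("d5", 15, 16), Node("d6", 17, 18)]

for pair in tree_merge_desc(alist, dlist):
    print(pair)
```

Because a1 spans the whole input, `begin` never moves past it, so a2 and a3 are re-examined for later descendants even though they can no longer match. That wasted scanning is the source of the quadratic worst case.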

13 Which is more efficient?
Tree-Merge-anc: time and space complexity is O(|Alist| + |Dlist| + |OutputList|). Note: this is linear in the combined input and output size, so no quadratic work is done beyond what the output itself requires.
However, Tree-Merge-desc has quadratic worst-case time complexity.
- Saw some evidence in the previous example.
- Here is another "bad" input. What is the amount of work done by Tree-Merge-desc on this input?
[Figure: input tree with ancestor-list nodes a0, a1, a2, ..., an and descendant-list nodes d1, d2, ..., dn.]
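One way to explore the question is to count containment tests on a concrete input. The figure did not survive in this transcript, so the shape used below is an assumption: a0 encloses everything, and each ai (i = 1..n) has a single child di. The `Node` type, the position formulas, and the function names are likewise illustrative. Under this assumed shape the output has 2n pairs, while the number of tests grows roughly as n²/2.

```python
from typing import NamedTuple


class Node(NamedTuple):
    name: str
    start: int
    end: int


def bad_input(n):
    """Assumed shape: a0 encloses a1..an, and each ai has a single child di."""
    alist = [Node("a0", 1, 4 * n + 2)]
    dlist = []
    for i in range(1, n + 1):
        alist.append(Node(f"a{i}", 4 * i - 2, 4 * i + 1))
        dlist.append(Node(f"d{i}", 4 * i - 1, 4 * i))
    return alist, dlist


def tree_merge_desc_counting(alist, dlist):
    """Tree-Merge ordered by descendant, also returning the number of containment tests."""
    out, tests, begin = [], 0, 0
    for d in dlist:
        while begin < len(alist) and alist[begin].end < d.start:
            begin += 1
        i = begin
        while i < len(alist) and alist[i].start < d.start:
            tests += 1  # one (ancestor, descendant) containment test
            if d.end < alist[i].end:
                out.append((alist[i].name, d.name))
            i += 1
    return out, tests


for n in (10, 100, 1000):
    out, tests = tree_merge_desc_counting(*bad_input(n))
    print(f"n={n}: output pairs={len(out)}, containment tests={tests}")
```

The enclosing a0 pins the scan's starting point at the head of Alist, so for di the algorithm walks past a1 through a(i-1) again before reaching ai.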

14 More analysis
What about finding (parent, child) pairs? Does the same upper bound apply for T-M-par? Consider the input below. The size of the output list is O(|Alist| + |Dlist|). What is the amount of work done by T-M-par on this input?
[Figure: input tree with nodes a1, a2, ..., an and d1, d2, ..., dn, dn+1, ..., d2n-1, d2n.]
A family of stack-tree SJ algorithms has been developed to overcome the deficiencies of the T-M algorithms.

15-24 Stack-Tree Join (ordered by descendant)
[Figure: the same example tree and input lists, with the stack contents drawn beside them.]
Stack contents (bottom to top) and output after each step:
- Slide 15: stack a1; no output yet.
- Slide 16: stack a1 a2.
- Slide 17: stack a1 a2; output (a1,d1), (a2,d1).
- Slide 18: stack a1 (a2 popped).
- Slide 19: stack a1 a3.
- Slide 20: stack a1 a3; adds (a1,d2), (a3,d2).
- Slide 21: stack a1 a3; adds (a1,d3), (a3,d3).
- Slide 22: stack a1 (a3 popped); adds (a1,d4).
- Slide 23: stack a1 a4; adds (a1,d5), (a4,d5).
- Slide 24: stack a1 a4; adds (a1,d6), (a4,d6).
Time and space complexity: O(|Alist| + |Dlist| + |OutputList|) (for both ancestor-descendant and parent-child relationships!).
Unlike T-M-anc, the I/O complexity is similarly bounded (modulo the blocking factor).
Can handle streaming input lists: it is a non-blocking algorithm.
Stack-Tree-anc is similar, with similar bounds.
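A sketch of Stack-Tree ordered by descendant for the ancestor-descendant case, written to mirror the stack and output trace of this example. As assumptions not found in the slides: the tree shape is inferred from the output pairs, the start/end positions are invented, all nodes share one document, and the `Node` type and function name are illustrative.

```python
from typing import NamedTuple


class Node(NamedTuple):
    name: str
    start: int
    end: int


def stack_tree_desc(alist, dlist):
    """Ancestor-descendant join; both lists sorted by start, output ordered by descendant."""
    out, stack = [], []
    i = j = 0
    while j < len(dlist):  # leftover ancestors cannot produce output
        a = alist[i] if i < len(alist) else None
        d = dlist[j]
        next_start = d.start if a is None else min(a.start, d.start)
        # Pop ancestors whose region ended before the next node begins.
        while stack and stack[-1].end < next_start:
            stack.pop()
        if a is not None and a.start < d.start:
            stack.append(a)  # a is nested inside everything still on the stack
            i += 1
        else:
            # Every node on the stack contains d; emit bottom to top.
            out.extend((s.name, d.name) for s in stack)
            j += 1
    return out


# Assumed region encoding of the example tree.
alist = [Node("a1", 1, 20), Node("a2", 2, 5), Node("a3", 6, 11), Node("a4", 14, 19)]
dlist = [Node("d1", 3, 4), Node("d2", 7, 8), Node("d3", 9, 10),
         Node("d4", 12, 13), Node("d5", 15, 16), Node("d6", 17, 18)]

for pair in stack_tree_desc(alist, dlist):
    print(pair)
```

Each input node is read once and each ancestor is pushed and popped at most once, so the only remaining cost is emitting the output. Pairs for a descendant are emitted as soon as it is read, which is why this variant suits streaming inputs.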

25 Extensions
- Can you adapt the SJ algorithms to handle the SJ variants mentioned before?
- Can you make the Tree-Merge algorithms more efficient, e.g., by bookkeeping?
- We have seen that a TPQ is a sequence of joins on the results of SJs; what is the best way to order these joins?
- Can we reuse join-order optimization from relational DB optimization? (What is the right cost model?)
- What if (universal) quantifiers are present?
- How can we handle aggregation?