Efficiently Answering Reachability Queries on Large Directed Graphs Ruoming Jin Kent State University Joint work with Yang Xiang (KSU), Ning Ruan (KSU),

Presentation transcript:

Efficiently Answering Reachability Queries on Large Directed Graphs Ruoming Jin Kent State University Joint work with Yang Xiang (KSU), Ning Ruan (KSU), and Haixun Wang (IBM T.J. Watson)

Reachability Query The problem: given two vertices u and v in a directed graph G, is there a path from u to v? Examples: ?Query(1,11): Yes. ?Query(3,9): No. Directed Graph → DAG (directed acyclic graph) by coalescing the strongly connected components
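As a baseline for the problem stated on this slide, a reachability query can be answered with no index at all by a breadth-first search from u. The sketch below is only an illustration: the function name `reachable`, the adjacency-dictionary format, and the small `example_graph` are assumptions made here and are not the graph drawn on the slide.

```python
from collections import deque

def reachable(adj, u, v):
    """Return True if a directed path leads from u to v (u reaches itself)."""
    seen = {u}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if x == v:
            return True
        for y in adj.get(x, ()):
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return False

# Hypothetical toy graph: vertex -> list of out-neighbors.
example_graph = {1: [2, 3], 2: [4], 3: [4], 4: [], 5: [1]}

print(reachable(example_graph, 1, 4))
print(reachable(example_graph, 4, 1))
```

Each query of this kind may touch every vertex and edge, which is the cost that index-based methods try to avoid.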

Applications: XML; biological networks; ontology; knowledge representation (lattice operation); object programming (class relationship); distributed systems (reachable states); graph databases

Prior Work
Method | Query time | Construction | Index size
DFS/BFS | O(n+m) | |
Transitive Closure | O(1) | O(nm)/O(n^3) | O(n^2)
Optimal Chain Cover (Jagadish, TODS’90) | O(k) | O(nm) | O(nk)
Optimal Tree Cover (Agrawal et al., SIGMOD’89) | O(n) | O(nm) | O(n^2)
Dual-Labeling (Wang et al., ICDE’06) | O(1) | O(n+m+t^3) | O(n+t^2)
Labeling+SSPI (Chen et al., VLDB’05) | O(m-n) | O(n+m) |
GRIPP (Trißl et al., SIGMOD’07) | O(m-n) | O(n+m) |
Also: 2-HOP (O(nm^(1/2)) and O(n^4)), HOPI, and heuristic algorithms

Limitations of Tree-Based Approaches Finding a good tree cover is expensive. A tree cover cannot represent some common types of DAGs, such as grids. Compression limitations: –Chain (1 parent, 1 child) –Tree (1 parent, multiple children) –Most existing methods that use a tree cover are strongly affected by how many edges are left uncovered

Overview of Path-Tree Chain → Tree → Path-Tree (2 parents / multiple children). A path-tree cover is a spanning subgraph of G in a tree shape (T). A node in the tree T corresponds to a path in G, and an edge in T corresponds to the edges between two paths in G. A 3-tuple labeling exists for any path-tree to answer reachability queries in O(1)

Path-Tree in a Nutshell [Figure: two diagrams over paths P1, P2, P3, P4] The path-graph is not necessarily a planar graph. The reachability between any two nodes can be answered in O(1)

Key Problems How to construct a path-tree? –Algorithm How can a path-tree help with reachability queries? –Labeling –Transitive Closure Compression How does path-tree compare with the existing methods? –Optimality

Constructing Path-Tree Step 1: Path-Decomposition of DAG Step 2: Minimal Equivalent Edge Set between any two paths Step 3: Path-Graph Construction Step 4: Path-Tree Cover Extraction

Step 1: Path-Decomposition [Figure: paths P1, P2, P3, P4; example vertex label (PID, SID) = (2, 5)] For any two nodes u and v in the same path, u → v if and only if u.sid ≤ v.sid. A simple linear-time algorithm based on topological sort can produce a path-decomposition
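The slide does not spell out the decomposition algorithm, so the following is only one plausible greedy sketch under stated assumptions: vertices are scanned in topological order, every still-unassigned vertex starts a new path, and the path is extended through any unassigned successor. The names `topological_order`, `path_decomposition`, and `same_path_reaches`, and the requirement that every vertex appears as a key of `adj`, are choices made for this illustration. Each vertex receives a (PID, SID) pair, and the same-path test is the SID comparison from the slide.

```python
from collections import deque

def topological_order(adj):
    """Kahn's algorithm; adj maps every vertex to its list of successors."""
    indeg = {v: 0 for v in adj}
    for v in adj:
        for w in adj[v]:
            indeg[w] += 1
    queue = deque(v for v in adj if indeg[v] == 0)
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    return order

def path_decomposition(adj):
    """Greedily split a DAG into vertex-disjoint paths; returns v -> (pid, sid)."""
    label = {}
    pid = 0
    for start in topological_order(adj):
        if start in label:
            continue
        pid += 1
        v, sid = start, 1
        while v is not None:
            label[v] = (pid, sid)
            sid += 1
            # Extend the path through any successor not yet on a path.
            v = next((w for w in adj[v] if w not in label), None)
    return label

def same_path_reaches(label, u, v):
    """Within one path, u reaches v exactly when u.sid <= v.sid."""
    return label[u][0] == label[v][0] and label[u][1] <= label[v][1]

# Hypothetical toy DAG.
dag = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
labels = path_decomposition(dag)
print(labels)
```

The same-path test says nothing about vertices on different paths; in the toy DAG, c reaches d but they lie on different paths, which is the case the later steps address.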

Step 2: Minimal Equivalent Edge Set [Figure: edges from path P1 to path P2 (P1 → P2), before and after reduction] The reachability between any two paths can be captured by a unique minimal set of edges. The edges in the minimal equivalent edge set do not cross (they are always parallel)!

Step 3: Path-Graph Construction [Figure: paths P1, P2, P3, P4 and the weighted directed path-graph over them] The weight reflects the cost we have to pay for the transitive closure computation if we exclude this path-tree edge

Step 4: Extracting Path-Tree Cover [Figure: the weighted directed path-graph over P1, P2, P3, P4 and its maximal directed spanning tree] Chu-Liu/Edmonds algorithm, O(m’ + k log k)

Key Problems How to construct a path-tree? –Algorithm How can path-tree help with reachability queries? –Labeling –Transitive Closure Compression How does path-tree compare with the existing methods? –Optimality

3-Tuple Labeling for Reachability [Figure: path-tree over P1, P2, P3, P4 with interval labels [1,1], [2,2], [1,3], [1,4]] DFS labeling (1-tuple). Interval labeling (2-tuple): high-level description about paths (Pi → Pj?)

DFS Labeling [Figure: paths P1, P2, P3, P4] 1. Start from the first vertex in the root path. 2. Always try to visit the next vertex in the same path. 3. Label a node when all its neighbors have been visited. L(v) = N - x, where x is the number of nodes that have already been labeled
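The three rules above can be sketched as a recursive depth-first search. This is a minimal illustration, not the authors' implementation: `succ` (successor lists in the cover), `next_in_path` (same-path successor, if any), and the toy paths a→b→c and x→y are hypothetical, N is assumed to be the total number of vertices, and only vertices reachable from `start` get a label.

```python
def dfs_labels(succ, next_in_path, start):
    """Assign L(v) = N - x when v finishes, where x = labels handed out so far."""
    n = len(succ)
    label = {}
    visited = set()

    def visit(v):
        visited.add(v)
        nxt = next_in_path.get(v)
        # Rule 2: try the next vertex on the same path before other neighbors.
        ordered = ([nxt] if nxt is not None else []) + [w for w in succ[v] if w != nxt]
        for w in ordered:
            if w not in visited:
                visit(w)
        # Rule 3: label v once all of its neighbors have been visited.
        label[v] = n - len(label)

    visit(start)  # Rule 1: start from the first vertex of the root path.
    return label

# Hypothetical cover: root path a -> b -> c, second path x -> y,
# plus cross edges a -> x and y -> c.
succ = {"a": ["x", "b"], "b": ["c"], "c": [], "x": ["y"], "y": ["c"]}
next_in_path = {"a": "b", "b": "c", "x": "y"}
dfs = dfs_labels(succ, next_in_path, "a")
print(dfs)
```

In this toy run every edge goes from a smaller label to a larger one, so the labels form a topological numbering of the visited vertices.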

3-Tuple Labeling for Reachability [Figure: path-tree over P1, P2, P3, P4 with interval labels [1,1], [2,2], [1,3], [1,4]] u → v if and only if 1) interval label I(u) ⊇ I(v) and 2) DFS label L(u) ≤ L(v). ?Query(9,15): P4[1,4] ⊇ P1[1,1] and 5 < 15, so Yes. ?Query(9,2) ?Query(5,9)
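The comparison symbols on this slide were lost in the transcript. Reading the Query(9,15) example as "the interval of u's path contains the interval of v's path, and the DFS label of u does not exceed that of v" gives the constant-time test sketched below. That reading, the tuple layout (interval low, interval high, DFS label), the function name `reaches`, and the sample label values are all assumptions made for illustration.

```python
def reaches(label_u, label_v):
    """Constant-time test on 3-tuples (interval_lo, interval_hi, dfs_label)."""
    lo_u, hi_u, dfs_u = label_u
    lo_v, hi_v, dfs_v = label_v
    # Condition 1 (assumed direction): I(u) contains I(v).
    interval_ok = lo_u <= lo_v and hi_v <= hi_u
    # Condition 2 (assumed direction): L(u) <= L(v).
    return interval_ok and dfs_u <= dfs_v

# Hypothetical labels shaped like the slide's example: [1,4] with 5, [1,1] with 15.
u = (1, 4, 5)
v = (1, 1, 15)
print(reaches(u, v))
print(reaches(v, u))
```

Both conditions are plain integer comparisons, which is consistent with the O(1) query time claimed for the labeling.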

Transitive Closure Compression An efficient procedure can compute and compress the transitive closure in O(mk), where k is the number of paths in the path-tree. The path-tree cover (including labeling) can be constructed in O(m + n log n)

Key Problems How to construct a path-tree? –Algorithm How can path-tree help with reachability query? –Labeling –Transitive Closure Compression How does path-tree compare with the existing methods? –Optimality

Theoretical Analysis Optimal Path-Tree Cover (OPTC) Problem: –Given a path-decomposition, what is the optimal path-tree cover to maximally compress the transitive closure? –OptIndex weight assignment based on computing the predecessor set. Optimal Path-Decomposition (OPD) Problem: –Assuming we only use path-decomposition to compress the transitive closure, what is the optimal path-decomposition to maximally compress the transitive closure? –Minimal-cost flow problem –What is the overall optimal path-decomposition?

Superiority of Path-Tree Cover The optimal tree cover is a special case of path-tree cover in which each vertex corresponds to a single path and the weight is based on OptIndex. The path-tree cover approach can compress the transitive closure to a size smaller than or equal to that of the optimal tree cover approach (and consequently that of the optimal chain cover approach).

Experimental Evaluation Implementation in C++. 12 real datasets, used in the Dual-Labeling paper and the GRIPP paper. Synthetic datasets: –Sparse DAG with edge density = 2. Platform: AMD Opteron 2.0GHz / 2GB / Linux. PTree1 (OptIndex) and PTree2: –Mainly compared with Optimal Tree Cover

Real Datasets Table columns: Graph Name, #V, #E, DAG #V, DAG #E. Graphs: AgroCyc, aMaze, Anthra, Ecoo, HpyCyc, Human, Kegg, Mtbrv, Nasa, Reactome, Vchocyc, Xmark

Experimental Result (Real Data) Table columns: Transitive Closure Size, Construction Time (in ms), and Query Time (in ms), each reported for Tree, Ptree-1, and Ptree-2. Graphs: AgroCyc, aMaze, Anthra, Ecoo, HpyCyc, Human, Kegg, Mtbrv, Nasa, Reactome, Vchocyc, Xmark. On average 10 times better than Tree. On average 3 times better than Tree

Experimental Result (Synthetic Data)

Conclusion A novel path-tree structure is proposed to assist the compression of the transitive closure and the answering of reachability queries. Path-tree has the potential to integrate with other existing methods to further improve the efficiency of reachability query processing

Thanks!!

Step 3: Path-Graph Construction [Figure: paths P1, P2, P3, P4 and the weighted directed path-graph over them] The weight reflects the penalty if we exclude this path-tree edge

Step 2: Constructing Minimal Equivalent Edge Set (Pi → Pj) [Figure: paths P1 and P2, P1 → P2] 1. Order the vertices in Pi and Pj in decreasing order. 2. Find the first vertex v in Pj that Pi can reach. 3. Find the last vertex u in Pi that reaches v. 4. Remove all the edges that cross (u, v) and repeat steps 2-4
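One way to see why the minimal equivalent edge set is unique and non-crossing is to treat each edge from Pi to Pj as a pair (s, t) of sequence IDs. An edge (s, t) is then implied by another edge (s', t') whenever s ≤ s' and t' ≤ t, because s reaches s' along Pi and t' reaches t along Pj. The sketch below keeps only the edges that no other edge implies. It uses a simple dominance sweep rather than the exact loop listed on the slide, and the function name and the sample edges are hypothetical.

```python
def minimal_equivalent_edges(edges):
    """edges: iterable of (s, t), s = SID of the source in Pi, t = SID of the target in Pj."""
    kept = []
    best_t = float("inf")
    # Scan sources from last to first; among equal sources, earliest target first.
    for s, t in sorted(edges, key=lambda e: (-e[0], e[1])):
        # Keep an edge only if it reaches Pj strictly earlier than every edge
        # leaving from the same or a later vertex of Pi.
        if t < best_t:
            kept.append((s, t))
            best_t = t
    return sorted(kept)

# Hypothetical edges between two paths.
cross_edges = [(1, 3), (2, 2), (2, 5), (3, 4), (4, 6), (1, 1)]
minimal = minimal_equivalent_edges(cross_edges)
print(minimal)
```

In the returned list both coordinates increase together, so no two kept edges cross, which matches the "always parallel" observation about the minimal equivalent edge set.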
