An Efficient Algorithm for Answering Graph Reachability Queries Yangjun Chen, Yibin Chen Dept. Applied Computer Science, University of Winnipeg 515 Portage.


An Efficient Algorithm for Answering Graph Reachability Queries Yangjun Chen, Yibin Chen Dept. Applied Computer Science, University of Winnipeg 515 Portage Ave. Winnipeg, Manitoba, Canada R3B 2E9

Outline
- Motivation
- Graph decomposition and compression of transitive closure
- Algorithm for graph decomposition: generating a minimum set of disjoint chains
  - DAG stratification
  - Bipartite graphs
  - Virtual nodes and virtual node resolution
- Experiments
- Conclusion

Motivation
Goal: an efficient method to evaluate graph reachability queries.
Graph reachability queries: given a directed graph G, check whether a node v is reachable from another node u through a path in G.
Application: in CAD/CAM, CASE, office systems, software management, geographical navigation and ontology queries, data are normally organized into a directed graph, and the ancestor-descendant relationship of nodes (whether a node is reachable from another node through a path) is queried frequently.

Graph decomposition and compression of transitive closure
Graph decomposition: given a directed acyclic graph (DAG) G, generate a set of disjoint chains such that on each chain, if node v appears above node u, there is a path from v to u in G.
Example
[Figure: a DAG with nodes a, b, c, d, e, f, g, h, i and its decomposition into disjoint chains]

Compression of transitive closure
Assign to each node an index as follows:
(1) Number each chain and number each node on a chain.
(2) The jth node on the ith chain is assigned a pair (i, j).
In addition, each node v on the ith chain is associated with an index sequence of length k - 1:
    (1, j_1) ... (i - 1, j_{i-1}) (i + 1, j_{i+1}) ... (k, j_k)
such that any node with index (x, y) is a descendant of v if x = i and y < j, or x ≠ i but y ≤ j_x, where k is the number of disjoint chains.
Example
[Figure: the example DAG with nodes a to i, its chain decomposition, and the label attached to each node]
Labels shown in the figure (index, followed by index sequence):
(3, 1), (1, 2)(2, _)
(3, 2), (1, _)(2, 4)
(3, 3), (1, _)(2, _)
(2, 1), (1, 2)(3, 3)
(2, 2), (1, 2)(3, 3)
(2, 3), (1, 2)(3, _)
(2, 4), (1, _)(3, _)
(1, 1), (2, 2)(3, 3)
(1, 2), (2, _)(3, _)
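The descendant test above can be sketched in Python. This is a minimal illustration, not the authors' implementation. The function name `is_descendant` and the list-of-pairs layout are assumptions. The inequalities are taken as written on the slide, and the comparison `y <= j_x` for other chains is an assumption chosen to match the direction of `y < j`; if positions are numbered in the opposite direction along a chain, both comparisons flip. Keeping the index sequence sorted by chain number lets a binary search find the entry for chain x, which is one way to reach a lookup cost that grows with the logarithm of the number of chains.

```python
from bisect import bisect_left

def is_descendant(v_index, v_seq, u_index):
    """Return True if the node labelled u_index is a descendant of v.

    v_index: the pair (i, j) assigned to v.
    v_seq:   v's index sequence as a list of (x, j_x) pairs sorted by chain
             number x. Chains marked "_" on the slide are simply omitted.
    u_index: the pair (x, y) assigned to the candidate descendant.
    """
    i, j = v_index
    x, y = u_index
    if x == i:
        # Same chain: compare positions directly (rule as stated on the slide).
        return y < j
    # Other chain: binary search for the entry (x, j_x).
    pos = bisect_left(v_seq, (x,))
    if pos < len(v_seq) and v_seq[pos][0] == x:
        return y <= v_seq[pos][1]   # assumed direction, see lead-in
    return False                    # no node on chain x is reachable

# Label "(2, 3), (1, 2)(3, _)" from the example, with the "_" entry omitted.
v_index, v_seq = (2, 3), [(1, 2)]
print(is_descendant(v_index, v_seq, (2, 1)))  # same chain
print(is_descendant(v_index, v_seq, (1, 2)))  # chain 1, covered by (1, 2)
print(is_descendant(v_index, v_seq, (3, 1)))  # chain 3, nothing reachable
```

Only the label of v and the index of u are needed for a query; the graph itself is never traversed.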

Algorithm for graph decomposition
Minimum set of disjoint chains: stratification of DAGs
Definition 1. Let G(V, E) be a DAG. The stratification of G is a decomposition of V into subsets V_1, V_2, ..., V_h such that V = V_1 ∪ V_2 ∪ ... ∪ V_h and each node in V_i has its children appearing only in V_{i-1}, ..., V_1 (i = 2, ..., h), where h is the height of G, i.e., the length of the longest path in G.
For each node v in V_i, we say its level is i, denoted l(v) = i.
C_j(v) (j < i) is a set of links, each pointing to one of v’s children that appears in V_j.
Therefore, for each v in V_i, there exist i_1, ..., i_k (i_l < i, l = 1, ..., k) such that the set of its children equals C_{i_1}(v) ∪ ... ∪ C_{i_k}(v).

Algorithm for stratifying a DAG
Algorithm graph-stratification(G)
begin
  V_1 := all the nodes with no outgoing edges;
  for i = 1 to h - 1 do
  { W := all the nodes that have at least one child in V_i;
    for each node v in W do
    { let v_1, ..., v_k be v’s children appearing in V_i;
      C_i(v) := {links to v_1, ..., v_k};
      if d(v) > k then remove v from W;
      G := G \ {(v, v_1), ..., (v, v_k)};
      d(v) := d(v) - k; }
    V_{i+1} := W;
  }
end
Here d(v) is the current outdegree of v. The algorithm needs only O(n) time.
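The stratification procedure can be sketched in Python as follows. This is an illustrative version under stated assumptions, not the authors' code: the function name `stratify`, the dictionary-of-sets graph representation, and the sample edge list are all hypothetical. The edge list was chosen so that the C sets of b, c, g and h agree with the ones listed in the slides' example; the edges a→b and f→g are guesses.

```python
def stratify(children):
    """Stratify a DAG given as {node: set of child nodes}.

    Returns (levels, links) where levels[i - 1] is V_i and
    links[v][i] is C_i(v), the children of v that lie in V_i.
    """
    # Copy the adjacency sets so edges can be removed level by level.
    remaining = {v: set(cs) for v, cs in children.items()}
    for cs in children.values():
        for c in cs:
            remaining.setdefault(c, set())
    parents = {v: set() for v in remaining}
    for v, cs in remaining.items():
        for c in cs:
            parents[c].add(v)

    links = {v: {} for v in remaining}
    current = sorted(v for v, cs in remaining.items() if not cs)  # V_1
    levels = []
    while current:
        levels.append(current)
        level_no = len(levels)
        level_set = set(current)
        # W: nodes with at least one child in the current level.
        candidates = set()
        for c in current:
            candidates |= parents[c]
        next_level = []
        for v in sorted(candidates):
            kids = remaining[v] & level_set
            links[v][level_no] = sorted(kids)   # C_i(v)
            remaining[v] -= kids                # remove the edges (v, v_l)
            if not remaining[v]:                # no child left at a higher level
                next_level.append(v)
        current = next_level
    return levels, links

# Hypothetical edge list for a nine-node DAG (see lead-in).
dag = {"a": {"b"}, "b": {"c", "i"}, "c": {"d", "e"},
       "f": {"g"}, "g": {"h", "d"}, "h": {"e", "i"}}
levels, links = stratify(dag)
print(levels)
print(links["b"], links["g"])
```

A node joins V_{i+1} only once all of its remaining children lie in V_i, which corresponds to the test on d(v) in the pseudocode.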

Example
[Figure: the example DAG with nodes a to i, stratified into four levels]
V_4: a, f    C_3(a) = {c}; C_3(f) = {b}
V_3: b, g    C_1(b) = {i}, C_2(b) = {c}; C_1(g) = {d}, C_2(g) = {h}
V_2: c, h    C_1(c) = {d, e}; C_1(h) = {e, i}
V_1: d, e, i

Main idea to generate a minimum set of disjoint chains
- Construct a series of bipartite graphs for G(V, E). Each bipartite graph is established for the nodes in V_i and V_{i+1} (i = 1, 2, ..., h - 1).
- Find a maximum matching for each of these bipartite graphs using the Hopcroft-Karp algorithm.
- The union of all the maximum matchings makes up a minimum set of disjoint chains.
- Virtual nodes may be introduced into the V_i’s to facilitate the computation.

Important concepts about bipartite graphs
Definition 2. (bipartite graph) An undirected graph G(V, E) is bipartite if the node set V can be partitioned into two sets T and S in such a way that no two nodes from the same set are adjacent. We also denote such a graph as G(T, S; E).
Definition 3. (matching) Let G(V, E) be a bipartite graph. A subset of edges E’ ⊆ E is called a matching if no two edges have a common end node. A matching with the largest possible number of edges is called a maximum matching, denoted as M_G.
- Covered node: a node v is covered by M if some edge of M is incident to v.
- Free node: an uncovered node.
- A path or cycle is alternating, relative to M, if its edges are alternately in E\M and M.
- A path is an augmenting path if it is an alternating path with free origin and terminus.
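To make the notions of free node and augmenting path concrete, here is a short Python sketch that finds a maximum matching by repeatedly searching for an augmenting path. It is the simple one-path-at-a-time method (often called Kuhn's algorithm), used here as a stand-in because it is short; it is not the Hopcroft-Karp algorithm that the slides rely on, which finds many shortest augmenting paths per phase. The function name `maximum_matching` and the sample graphs are hypothetical.

```python
def maximum_matching(top, adj):
    """Maximum matching of a bipartite graph G(T, S; E).

    top: list of the nodes in T.
    adj: dict mapping each T-node to the list of S-nodes adjacent to it.
    Returns a dict mapping each covered S-node to its matched T-node.
    """
    match_of = {}  # S-node -> T-node currently matched to it

    def augment(t, seen):
        # Look for an augmenting path that starts at the free T-node t.
        for s in adj.get(t, []):
            if s in seen:
                continue
            seen.add(s)
            # s is free, or the T-node matched to s can be re-matched elsewhere.
            if s not in match_of or augment(match_of[s], seen):
                match_of[s] = t
                return True
        return False

    for t in top:
        augment(t, set())
    return match_of

# Hypothetical example: x takes p first, then y forces x to move to q.
m = maximum_matching(["x", "y"], {"x": ["p", "q"], "y": ["p"]})
print(m)

# Hypothetical bipartite graph between two adjacent levels of a stratified DAG.
m1 = maximum_matching(["c", "h"], {"c": ["d", "e"], "h": ["e", "i"]})
print(m1)
```

In the second call three bottom nodes compete for two top nodes, so one bottom node stays free whichever maximum matching is found.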

Example
[Figure: for the example DAG, the bipartite graphs between V_2 and V_1, between V_3 and V_2, and between V_4 and V_3, with their maximum matchings M_1, M_2 and M_3, and the chains formed by M_1 ∪ M_2 ∪ M_3]
The number of disjoint chains is 5. It is not minimum.
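How a union of level-by-level matchings turns into chains can be sketched as follows. The function name `chains_from_matchings` and the matchings used below are hypothetical; they are not the matchings drawn on the slide, so the resulting number of chains differs from the 5 reported there. The sketch assumes every matching only pairs nodes of adjacent levels, so each node has at most one matched parent and at most one matched child.

```python
def chains_from_matchings(nodes, matched_child):
    """Assemble disjoint chains from the union of the matchings.

    nodes:         all nodes, listed from the top level downwards.
    matched_child: dict parent -> child, the union of all matching edges.
    Returns a list of chains, each a list of nodes from top to bottom.
    """
    has_parent = set(matched_child.values())
    chains = []
    for v in nodes:
        if v in has_parent:
            continue                      # v is not the head of a chain
        chain = [v]
        while chain[-1] in matched_child: # follow matching edges downwards
            chain.append(matched_child[chain[-1]])
        chains.append(chain)
    return chains

# Hypothetical matchings for a nine-node DAG with four levels.
m1 = {"c": "d", "h": "e"}   # between V_2 and V_1
m2 = {"b": "c", "g": "h"}   # between V_3 and V_2
m3 = {"a": "b", "f": "g"}   # between V_4 and V_3
union = {**m1, **m2, **m3}
order = ["a", "f", "b", "g", "c", "h", "d", "e", "i"]
chains = chains_from_matchings(order, union)
print(chains)
```

Every node left free by the matching below it starts a chain of its own. Matchings between adjacent levels cannot use edges that skip a level, which is why the plain union can produce more chains than necessary.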

Virtual nodes
M_i: the found maximum matching of G(V_{i+1}, V_i; C_i), where C_i = C_i(v_1) ∪ ... ∪ C_i(v_k) with v_l ∈ V_{i+1} (l = 1, ..., k).
M_i’: the found maximum matching of G(V_{i+1}, V_i’; C_i’), where
  V_i’ = V_i ∪ {all the virtual nodes added into V_i},
  C_i’ = C_i ∪ {(u, v) | u ∈ V_{i+1}, v is a virtual node in V_i’}.
Definition 4. (virtual nodes) Let G(V, E) be a DAG, divided into V_1, ..., V_h (i.e., V = V_1 ∪ ... ∪ V_h). Let v be a free (actual or virtual) node in V_i’ (if i = 1, we take M_1 as M_1’). Add a virtual node v’ into V_{i+1} (i = 1, ..., h - 1), labeled as follows.

1. If there exist some covered nodes u_1, ..., u_k (relative to M_i’) in V_i’ such that each u_g (g = 1, ..., k) shares a covered parent node w_g (i.e., (w_g, u_g) ∈ M_i’) with v, label v’ with
   v[(w_1, {(n_11, S_11), ..., (n_1j_1, S_1j_1)}), ..., (w_k, {(n_k1, S_k1), ..., (n_kj_k, S_kj_k)})],
   where n_gj (g = 1, ..., k; j = 1, ..., j_g) is an odd number indicating a position on the alternating path starting at w_g, and S_gj is the set of all parents of the node pointed to by n_gj which appear in V_i.
2. If no such covered node exists, v’ is labeled with v[ ].
In addition, for a virtual node v’ (generated for v), we establish an edge (u, v’) for every u ∈ S_11 ∪ ... ∪ S_1j_1 ∪ ... ∪ S_k1 ∪ ... ∪ S_kj_k.
v’ also inherits the edges incident to v, except the edges from a node in V_{i+1} to v. That is, for each parent w of v, we establish an edge (w, v’) if w appears in V_{i+2}.
A virtual edge (v’, v) is constructed to facilitate the virtual node resolution process.
Finally, we set V_{i+1}’ to be V_{i+1} ∪ {all those virtual nodes}, and C_{i+1}’ to be C_{i+1} ∪ {(u, v) | u ∈ V_{i+2}, v is a virtual node in V_{i+1}’}.

Example
[Figure: the example DAG with the virtual nodes e’ (added into V_2’) and h’ (added into V_3’), the matchings M_1, M_2’ and M_3’, and the chains formed by M_1 ∪ M_2’ ∪ M_3’]
Label of e’: e[(c, {(1, {b})}), (h, {(1, {g})})]
Label of h’: h[(g, {(1, { }), (3, {a})})]

Virtual node resolution
The virtual nodes are resolved along the chains level by level in a top-down way:
1. If v’ is an unanchored node, remove v’ from the corresponding chain. If its child along the chain is also a virtual node, then that virtual node becomes unanchored.
2. If v’ is an anchored node, resolve it according to the following rule.
   (i) Assume that v’ is reached along an edge (u, v’), and that v’ is labeled with
       v[(w_1, {(n_11, S_11), ..., (n_1j_1, S_1j_1)}), ..., (w_k, {(n_k1, S_k1), ..., (n_kj_k, S_kj_k)})].
   (ii) If there exists an n_ij such that u is a parent of the node pointed to by n_ij, do the following operations:
        - Transfer the edges on the alternating path starting at w_i and ending at the (n_ij + 1)th node w. Add (w_i, v).
        - Remove (u, v’) and v’.
        - Add (u, w).
        Otherwise, remove v’ and connect u to the child node of v’ along the chain.
Example
[Figure: resolution of the virtual nodes h’ and e’ in the example DAG, showing the edge transfer along the alternating path relative to M_2’ between V_3 and V_2’]

Experiments
Tested methods
- DAG decomposition: Jagadish’s heuristic (DD for short)
- Tree encoding by Chen (TE for short)
- 2-hop labeling by Cohen et al. (2-hop for short)
- Dual labeling by Wang et al. (Dual-II for short)
- Matrix multiplication by Warren (MM for short)
- ours (discussed in this paper)

Theoretical computational complexities
β: the number of leaf nodes of the spanning tree of G
t: the number of non-tree edges
b: the width of G

Method                  Query time    Labeling time      Space overhead
Graph-traversal         O(e)          0                  0
DAG-decomposition       O(log b)      O(n^3)             O(bn)
Tree-labeling           O(log β)      O(βe)              O(βn)
Dual-II                 O(log t)      O(n + e + t^3)     O(n + t^3)
2-hop                   O(e^(1/2))    O(n^4)             O(n e^(1/2) log n)
Matrix-multiplication   O(1)          O(n^3)             O(n^2)
ours                    O(log b)      O(be)              O(bn)

Test results
1) Tests on Sparse Graphs: In this group of experiments, we tested a series of graphs with nodes. The edges are randomly generated, ranging from edges to edges. For each generated graph, Tarjan’s algorithm is used to find SCCs (strongly connected components) as a preprocessor. All SCCs are then removed.
[Table: Size of TC and time for generating it. Rows: size of data structure (16 bits), time for generating TC (sec.). Columns: ours, DD, TE, Dual-II, 2-hop, MM]
[Figure: Query time]

2) Tests on Non-sparse Graphs: In the second group of experiments, we mainly tested two types of DAGs:
(1) DAG systematically generated (DSG): a DAG of 640 roots with about four children per non-leaf, about three parents per non-root, eight levels, nodes and edges.
(2) DAG semi-randomly generated (DSRG): any graph of this type is generated as follows:
    (i) construct a tree with each node having a random number of children from zero to six;
    (ii) the tree contains a minimum of nodes; and
    (iii) add randomly up to edges to the tree while ensuring that no cycle is formed.
[Table: number of nodes, number of edges, average out-degree and average path length for DSG and DSRG]

[Table: Size of DSG’s TC and time for generating it. Rows: time for generating TC (sec.), size of data structure (16 bits). Columns: MM, Dual-II, TE, DD, ours]
[Figure: Query time on DSG. Axes: number of queries, time (sec.). Curves: MM, ours, DD, TE, Dual-II]
[Table: Size of DSRG’s TC and time for generating it. Rows: time for generating TC (sec.), size of data structure (16 bits). Columns: MM, Dual-II, TE, DD, ours]
[Figure: Query time on DSRG. Axes: number of queries, time (sec.)]

3) Tests on Dense Graphs: In the third group of experiments, we tested some DAGs with density near 0.25 (referred to as 0.25-DAG). Any graph of this type contains 3000 nodes connected by edges generated randomly. The density of the graph is |E|/|V|^2.
[Table: Size of 0.25-DAG’s TC and time for generating it. Rows: time for generating TC (sec.), size of data structure (16 bits). Columns: MM, Dual-II, TE, DD, ours]
[Figure: Query time on 0.25-DAG. Axes: number of queries, time (sec.)]

Conclusion
- Algorithm for decomposing a DAG into a minimum set of disjoint chains, based on the Hopcroft-Karp algorithm
  - time complexity: O(n^2 + bn·n^(1/2))
  - space complexity: O(e + bn)
- Algorithm for compressing transitive closure
  - time complexity: O(be)
  - space complexity: O(bn)
  - query time: O(log b)
- Future work
  - Reducing graph labeling time
  - Reducing query time