1
An Efficient Algorithm for Answering Graph Reachability Queries
Yangjun Chen, Yibin Chen
Dept. Applied Computer Science, University of Winnipeg
515 Portage Ave., Winnipeg, Manitoba, Canada R3B 2E9
2
Outline
- Motivation
- Graph decomposition and compression of transitive closure
- Algorithm for graph decomposition: generating a minimum set of disjoint chains
  - DAG stratification
  - Bipartite graphs
  - Virtual nodes and virtual node resolution
- Experiments
- Conclusion
3
Motivation
Goal: an efficient method to evaluate graph reachability queries.
Graph reachability queries: given a directed graph G, check whether a node v is reachable from another node u through a path in G.
Application: in CAD/CAM, CASE, office systems, software management, geographical navigation, and ontology queries, data are normally organized into a directed graph, and the ancestor-descendant relationship of nodes (whether one node is reachable from another through a path) is queried frequently.
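As a point of reference for the query defined above, here is a minimal sketch of the plain graph-traversal baseline: a breadth-first search that answers one reachability query in time proportional to the number of edges, with no precomputed index. The function name `is_reachable` and the toy graph are illustrative assumptions, not part of the presentation.

```python
from collections import deque


def is_reachable(graph, u, v):
    """Return True if v can be reached from u by following directed edges.

    graph maps each node to a list of its children (hypothetical format).
    A node is treated as reachable from itself.
    """
    seen = {u}
    queue = deque([u])
    while queue:
        node = queue.popleft()
        if node == v:
            return True
        for child in graph.get(node, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return False


# Hypothetical toy graph (not taken from the slides).
toy_graph = {"a": ["b"], "b": ["c"], "c": [], "d": ["c"]}
print(is_reachable(toy_graph, "a", "c"))
print(is_reachable(toy_graph, "c", "a"))
```

Every query repeats the traversal, which is the cost that an index over the graph is meant to avoid.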
4
Graph decomposition and compression of transitive closure
Graph decomposition: given a directed acyclic graph (DAG) G, generate a set of disjoint chains such that on each chain, if node v appears above node u, there is a path from v to u in G.
Example: [Figure: a DAG over the nodes a, b, c, d, e, f, g, h, i and its decomposition into disjoint chains]
5
Compression of transitive closure
Assign to each node an index as follows:
(1) Number each chain and number each node on a chain.
(2) The jth node on the ith chain is assigned the pair (i, j).
In addition, each node v on the ith chain is associated with an index sequence of length k - 1:
(1, j_1) … (i - 1, j_{i-1}) (i + 1, j_{i+1}) … (k, j_k),
where k is the number of disjoint chains. Any node with index (x, y) is a descendant of v if x = i and y > j, or if x ≠ i and y ≥ j_x. Nodes are numbered from the top of each chain, as in the example labels below, and an entry (x, _) means that no node on chain x is reachable from v.
Example: [Figure: the example DAG decomposed into three chains, each node annotated with its index pair and index sequence]
Labels shown in the figure (index pair, then index sequence):
(1, 1): (2, 2)(3, 3)
(1, 2): (2, _)(3, _)
(2, 1): (1, 2)(3, 3)
(2, 2): (1, 2)(3, 3)
(2, 3): (1, 2)(3, _)
(2, 4): (1, _)(3, _)
(3, 1): (1, 2)(2, _)
(3, 2): (1, _)(2, 4)
(3, 3): (1, _)(2, _)
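The index scheme on this slide can be turned into a reachability test along the following lines. This is a sketch under stated assumptions: positions are numbered from the top of each chain (as the example labels suggest), a missing entry plays the role of (x, _), and the index sequence is stored in a dictionary for brevity instead of the sorted sequence that would be binary-searched in O(log b) time. The node names, the `labels` layout, and `is_descendant` are hypothetical.

```python
def is_descendant(labels, u, v):
    """Return True if v is a proper descendant of u.

    labels[node] = ((chain, position), first_reachable), where
    first_reachable maps another chain number x to j_x, the position of the
    topmost node on chain x reachable from the node. A missing key stands
    for the entry (x, _).
    """
    (i, j), first_reachable = labels[u]
    (x, y), _ = labels[v]
    if x == i:
        # Same chain: descendants sit strictly below u.
        return y > j
    j_x = first_reachable.get(x)
    return j_x is not None and y >= j_x


# Hypothetical DAG: chain 1 is p -> q -> r, chain 2 is s -> t,
# with the extra edges s -> q and p -> t.
labels = {
    "p": ((1, 1), {2: 2}),
    "q": ((1, 2), {}),
    "r": ((1, 3), {}),
    "s": ((2, 1), {1: 2}),
    "t": ((2, 2), {}),
}
print(is_descendant(labels, "s", "r"))
print(is_descendant(labels, "q", "t"))
```

Each node stores at most one entry per other chain, so the space per node is bounded by the number of chains.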
6
Algorithm for graph decomposition
Minimum set of disjoint chains
Stratification of DAGs
Definition 1. Let G(V, E) be a DAG. The stratification of G is a decomposition of V into subsets V_1, V_2, ..., V_h such that V = V_1 ∪ V_2 ∪ ... ∪ V_h and each node in V_i has its children appearing only in V_{i-1}, ..., V_1 (i = 2, ..., h), where h is the height of G, i.e., the length of the longest path in G.
For each node v in V_i, we say its level is i, denoted l(v) = i.
C_j(v) (j < i): a set of links, each pointing to one of v's children that appears in V_j.
Therefore, for each v in V_i, there exist i_1, ..., i_k (i_l < i, l = 1, ..., k) such that the set of its children equals C_{i_1}(v) ∪ ... ∪ C_{i_k}(v).
7
Algorithm for stratifying a DAG
Algorithm graph-stratification(G)
begin
  V_1 := all the nodes with no outgoing edges;
  for i = 1 to h - 1 do {
    W := all the nodes that have at least one child in V_i;
    for each node v in W do {
      let v_1, ..., v_k be v's children appearing in V_i;
      C_i(v) := {links to v_1, ..., v_k};
      if d(v) > k then remove v from W;
      G := G \ {(v, v_1), ..., (v, v_k)};
      d(v) := d(v) - k;
    }
    V_{i+1} := W;
  }
end
Here d(v) is the current out-degree of v. The algorithm needs only O(n) time.
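The stratification procedure can be sketched in Python as follows. This is an illustrative version, not the authors' implementation: it finds the candidate set W through precomputed parent lists instead of scanning the graph, and it assumes every node appears as a key of the input dictionary with no duplicate child entries. The name `stratify` and the input format are assumptions. The sample graph uses only the edges among b, c, d, e, g, h, i that the example slide lists explicitly.

```python
def stratify(children):
    """Split a DAG into levels V_1, V_2, ... and collect the link sets C_i(v).

    children: dict mapping every node to the list of its children.
    Returns (levels, links): levels[i - 1] is the set V_i, and
    links[(v, i)] is the sorted list of v's children that lie in V_i.
    """
    remaining = {v: len(cs) for v, cs in children.items()}  # d(v)
    parents = {v: [] for v in children}
    for v, cs in children.items():
        for c in cs:
            parents[c].append(v)

    current = {v for v, d in remaining.items() if d == 0}  # V_1
    levels, links = [], {}
    while current:
        levels.append(current)
        i = len(levels)
        touched = {}  # W: nodes with at least one child in V_i
        for c in current:
            for p in parents[c]:
                touched.setdefault(p, []).append(c)
        next_level = set()
        for p, cs in touched.items():
            links[(p, i)] = sorted(cs)
            remaining[p] -= len(cs)
            if remaining[p] == 0:  # no children left in higher levels
                next_level.add(p)
        current = next_level
    return levels, links


example = {
    "b": ["i", "c"], "g": ["d", "h"],
    "c": ["d", "e"], "h": ["e", "i"],
    "d": [], "e": [], "i": [],
}
levels, links = stratify(example)
print(levels)
print(links[("b", 1)], links[("b", 2)])
```

A node joins V_{i+1} only once all of its outgoing edges have been consumed, which mirrors the `d(v) > k` test in the pseudocode.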
8
Example
[Figure: the example DAG over the nodes a, b, c, d, e, f, g, h, i]
Stratification of the example:
V_4: a, f    C_3(a) = {c}; C_3(f) = {b}
V_3: b, g    C_1(b) = {i}, C_2(b) = {c}; C_1(g) = {d}, C_2(g) = {h}
V_2: c, h    C_1(c) = {d, e}; C_1(h) = {e, i}
V_1: d, e, i
9
Main idea to generate a minimum set of disjoint chains
- Construct a series of bipartite graphs for G(V, E). Each bipartite graph is established for the nodes in V_i and V_{i+1} (i = 1, 2, …, h - 1).
- Find a maximum matching for each of these bipartite graphs using the Hopcroft-Karp algorithm.
- The union of all the maximum matchings makes up a minimum set of disjoint chains.
- Virtual nodes may be introduced into the V_i's to facilitate the computation.
10
Important concepts about bipartite graphs
Definition 2. (bipartite graph) An undirected graph G(V, E) is bipartite if the node set V can be partitioned into two sets T and S in such a way that no two nodes from the same set are adjacent. We also denote such a graph as G(T, S; E).
Definition 3. (matching) Let G(V, E) be a bipartite graph. A subset of edges E' ⊆ E is called a matching if no two edges have a common end node. A matching with the largest possible number of edges is called a maximum matching, denoted as M_G.
Covered node: a node v is covered by M if some edge of M is incident to v.
Free node: an uncovered node.
A path or cycle is alternating, relative to M, if its edges are alternately in E\M and M.
A path is an augmenting path if it is an alternating path with free origin and terminus.
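To make the notions of free node and augmenting path concrete, the sketch below computes a maximum matching by repeatedly searching for an augmenting path from each node of T. This is the simple augmenting-path method (often called Kuhn's algorithm), used here only because it is short; it is not the Hopcroft-Karp algorithm named in the slides, which finds many shortest augmenting paths per phase. The function name and input format are assumptions.

```python
def maximum_matching(top, edges):
    """Maximum matching of a bipartite graph G(T, S; E).

    top: list of the nodes in T.
    edges: dict mapping each node of T to its neighbours in S.
    Returns a dict mapping every covered S-node to its matched T-node.
    """
    match_of = {}  # S-node -> T-node currently matched to it

    def try_augment(t, visited):
        # Depth-first search for an augmenting path that starts at t.
        for s in edges.get(t, ()):
            if s in visited:
                continue
            visited.add(s)
            # s is free, or its current partner can be re-matched elsewhere.
            if s not in match_of or try_augment(match_of[s], visited):
                match_of[s] = t
                return True
        return False

    for t in top:
        try_augment(t, set())
    return match_of


# Bipartite graph between V_2 = {c, h} and V_1 = {d, e, i} from the example.
m1 = maximum_matching(["c", "h"], {"c": ["d", "e"], "h": ["e", "i"]})
print(m1)
```

Flipping the matched and unmatched edges along an augmenting path enlarges the matching by one edge, which is what the recursive call does when it re-matches an already covered node.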
11
Example
[Figure: the bipartite graphs G(V_2, V_1), G(V_3, V_2), and G(V_4, V_3) of the example DAG with their maximum matchings M_1, M_2, and M_3, and the chains formed by M_1 ∪ M_2 ∪ M_3]
The number of disjoint chains is 5. It is not minimum.
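Turning a union of level-wise matchings into chains amounts to following matched edges from every node that has no matched parent. The sketch below assumes each node occurs at most once as a parent and at most once as a child among the matched edges, which holds when every matching is taken between two adjacent levels. The function name and the four-node data are hypothetical and unrelated to the figure.

```python
def chains_from_matchings(nodes, matched_edges):
    """Assemble disjoint chains from matched (parent, child) edges.

    nodes: iterable of all nodes, in the order chains should be reported.
    matched_edges: iterable of (parent, child) pairs from all matchings.
    Returns a list of chains, each listed from its top node downwards.
    """
    next_on_chain = dict(matched_edges)
    has_matched_parent = set(next_on_chain.values())
    chains = []
    for v in nodes:
        if v in has_matched_parent:
            continue  # v is in the interior or at the bottom of a chain
        chain = [v]
        while chain[-1] in next_on_chain:
            chain.append(next_on_chain[chain[-1]])
        chains.append(chain)
    return chains


# Hypothetical data: r -> p and p -> x are matched, y is left uncovered.
print(chains_from_matchings(["r", "p", "x", "y"], [("r", "p"), ("p", "x")]))
```

The number of chains equals the number of nodes minus the number of matched edges, so every free node left by a level-wise matching starts an additional chain.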
12
Virtual nodes
M_i: the maximum matching found for G(V_{i+1}, V_i; C_i), where C_i = C_i(v_1) ∪ ... ∪ C_i(v_k) with v_l ∈ V_{i+1} (l = 1, ..., k).
M_i': the maximum matching found for G(V_{i+1}, V_i'; C_i'), where
V_i' = V_i ∪ {all the virtual nodes added into V_i},
C_i' = C_i ∪ {(u, v) | u ∈ V_{i+1}, v is a virtual node in V_i'}.
Definition 4. (virtual nodes) Let G(V, E) be a DAG, divided into V_1, ..., V_h (i.e., V = V_1 ∪ ... ∪ V_h). Let v be a free (actual or virtual) node in V_i' (if i = 1, we take M_1 as M_1'). Add a virtual node v' into V_{i+1} (i = 1, ..., h - 1), labeled as follows.
13
1. If there exist some covered nodes u_1, ..., u_k (relative to M_i') in V_i' such that each u_g (g = 1, ..., k) shares a covered parent node w_g (i.e., (w_g, u_g) ∈ M_i') with v, label v' with
v[(w_1, {(n_11, S_11), ..., (n_{1j_1}, S_{1j_1})}), ..., (w_k, {(n_k1, S_k1), ..., (n_{kj_k}, S_{kj_k})})],
where n_gj (g = 1, ..., k; j = 1, ..., j_g) is an odd number indicating a position on the alternating path starting at w_g, and S_gj is the set of all the parents of the node pointed to by n_gj that appear in V_{i+2}.
2. If no such covered node exists, v' is labeled with v[ ].
In addition, for a virtual node v' (generated for v), we establish an edge (u, v') for every u ∈ S_11 ∪ ... ∪ S_{1j_1} ∪ ... ∪ S_k1 ∪ ... ∪ S_{kj_k}.
v' also inherits the edges incident to v, except the edges from a node in V_{i+1} to v. That is, for each parent w of v, we establish an edge (w, v') if w appears in V_{i+2}.
A virtual edge (v', v) is constructed to facilitate the virtual node resolution process.
Finally, we set V_{i+1}' to be V_{i+1} ∪ {all those virtual nodes}, and C_{i+1}' to be C_{i+1} ∪ {(u, v) | u ∈ V_{i+2}, v is a virtual node in V_{i+1}'}.
14
Example
[Figure: the example DAG with virtual nodes e' (added to V_2') and h' (added to V_3'), the bipartite graphs with maximum matchings M_1, M_2', and M_3', and the chains formed by M_1 ∪ M_2' ∪ M_3']
Labels of the virtual nodes:
e': e[(c, {(1, {b})}), (h, {(1, {g})})]
h': h[(g, {(1, { }), (3, {a})})]
15
Virtual node resolution
The virtual nodes are resolved along the chains level by level, in a top-down way:
1. If v' is an unanchored node, remove v' from the corresponding chain. If its child along the chain is also a virtual node, then that virtual node becomes unanchored.
2. If v' is an anchored node, resolve it according to the following rule.
(i) Assume that v' is reached along an edge (u, v') and that v' is labeled with
v[(w_1, {(n_11, S_11), ..., (n_{1j_1}, S_{1j_1})}), ..., (w_k, {(n_k1, S_k1), ..., (n_{kj_k}, S_{kj_k})})].
(ii) If there exists an n_ij such that u is a parent of the node pointed to by n_ij, do the following operations:
16
- Transfer the edges on the alternating path starting at w_i and ending at the (n_ij + 1)th node w. Add (w_i, v).
- Remove (u, v') and v'.
- Add (u, w).
Otherwise, remove v' and connect u to the child node of v' along the chain.
Example: [Figure: resolution of the virtual nodes e' and h' on the example chains, showing the edge transfer along an alternating path relative to M_2']
17
Experiments
Tested methods
- DAG decomposition, Jagadish's heuristic (DD for short)
- Tree encoding by Chen (TE for short)
- 2-hop labeling by Cohen et al. (2-hop for short)
- Dual labeling by Wang et al. (Dual-II for short)
- Matrix multiplication by Warren (MM for short)
- Ours (discussed in this paper)
18
Theoretical computational complexities
β: the number of the leaf nodes of the spanning tree of G
t: the number of non-tree edges
b: the width of G

Method                  Query time    Labeling time      Space overhead
Graph-traversal         O(e)          0                  0
DAG-decomposition       O(log b)      O(n^3)             O(bn)
Tree-labeling           O(log β)      O(βe)              O(βn)
Dual-II                 O(log t)      O(n + e + t^3)     O(n + t^3)
2-hop                   O(e^{1/2})    O(n^4)             O(n e^{1/2} log n)
Matrix-multiplication   O(1)          O(n^3)             O(n^2)
ours                    O(log b)      O(be)              O(bn)
19
Test results
1) Tests on sparse graphs:
In this group of experiments, we tested a series of graphs with 15000 nodes. The edges are randomly generated, ranging from 16000 to 20000 edges. For each generated graph, Tarjan's algorithm is used as a preprocessor to find the SCCs (strongly connected components). All SCCs are then removed.
Size of TC and time for generating it (columns: size of data structure in 16-bit units, then time for generating TC in sec.):
ours 3912615.764
DD 17078667.683
TE 3035712.025
Dual-II 3638942.227
2-hop 80121724145
MM 14063750675.812
Query time: [Figure: query time plot]
20
2) Tests on non-sparse graphs:
In the second group of experiments, we mainly tested two types of DAGs:
(1) DAG systematically generated (DSG): a DAG of 640 roots with about four children per non-leaf, about three parents per non-root, eight levels, 31525 nodes, and 71786 edges.
(2) DAG semi-randomly generated (DSRG): any graph of this type is generated as follows:
(i) construct a tree with each node having a random number of children from zero to six;
(ii) the tree contains a minimum of 20000 nodes; and
(iii) randomly add up to 10000 edges to the tree while ensuring that no cycle is formed.

        Number of nodes   Number of edges   Average out-degree   Average path length
DSG     31525             71786             3                    8.0
DSRG    20004             30003             2.3                  10.11
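The DSRG recipe in steps (i) to (iii) can be sketched as below. Several details are assumptions not stated in the slides: the tree is grown level by level, a level that happens to draw no children at all is redrawn so that growth cannot stall, and acyclicity is guaranteed by numbering nodes in creation order and only adding edges from a lower to a higher number. The function name, parameters, and seed handling are hypothetical.

```python
import random


def generate_dsrg(min_nodes=20000, max_children=6, extra_edges=10000, seed=0):
    """Generate a semi-random DAG: a random tree plus extra forward edges.

    Returns (n, edges) where nodes are 0..n-1 and edges is a set of
    (parent, child) pairs with parent < child, so no cycle can exist.
    """
    rng = random.Random(seed)
    edges = set()
    frontier = [0]  # node 0 is the root
    n = 1
    # Steps (i) and (ii): grow the tree until it has at least min_nodes nodes.
    while n < min_nodes:
        next_frontier = []
        for parent in frontier:
            for _ in range(rng.randint(0, max_children)):
                edges.add((parent, n))
                next_frontier.append(n)
                n += 1
        # Assumption: redraw the same level if it produced no children.
        frontier = next_frontier or frontier
    tree_edges = len(edges)
    # Step (iii): add up to extra_edges edges from lower to higher numbers.
    attempts = 0
    while len(edges) < tree_edges + extra_edges and attempts < 20 * extra_edges:
        attempts += 1
        u, v = sorted(rng.sample(range(n), 2))
        edges.add((u, v))
    return n, edges


n, edges = generate_dsrg(min_nodes=200, extra_edges=50, seed=1)
print(n, len(edges))
```

Because children always receive larger numbers than their parents, every added edge respects a topological order, so no explicit cycle check is needed in this sketch.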
21
Size of DSG's TC and time for generating it:

Method    Size of data structure (16 bits)   Time for generating TC (sec.)
ours      169853                             21.572
DD        307460                             182.261
TE        267831                             53.253
Dual-II   77182041                           1269.359
MM        62114102                           789.703

Size of DSRG's TC and time for generating it:

Method    Size of data structure (16 bits)   Time for generating TC (sec.)
ours      68167                              7.813
DD        356310                             100.989
TE        200278                             27.432
Dual-II   31613640                           591.015
MM        25010001                           286.235

Query time: [Figure: query time in sec. (0.0 to 10.0) versus number of queries (0 to 100000) for MM, ours, DD, TE, and Dual-II]
22
3) Tests on dense graphs:
In the third group of experiments, we tested some DAGs with density near 0.25 (referred to as 0.25-DAG). Any graph of this type contains 3000 nodes connected by 2230196 randomly generated edges. The density of the graph is |E|/|V|^2 = 2230196/9000000 = 0.247.
Size of 0.25-DAG's TC and time for generating it:

Method    Size of data structure (16 bits)   Time for generating TC (sec.)
ours      96900                              23.000
DD        444420                             235.354
TE        209784                             101.000
Dual-II   1402622                            2554.218
MM        562500                             141.99

Query time: [Figure: query time in sec. (0.0 to 10.0) versus number of queries (0 to 100000)]
23
Conclusion
Algorithm for decomposing a DAG into a minimum set of disjoint chains, based on the Hopcroft-Karp algorithm
- time complexity: O(n^2 + bn·n^{1/2})
- space complexity: O(e + bn)
Algorithm for compressing transitive closure
- time complexity: O(be)
- space complexity: O(bn)
- query time: O(log b)
Future work
- Reducing graph labeling time
- Reducing query time