Next we build a ShortestPathtree, SPG1 for G1

A graph is a set of vertices, V, and a set of edges, E, each connecting a pair of those vertices. An edge between vertex h and vertex k is realized as the unordered set {h,k} (or just hk), and can be viewed as an undirected line from h to k. We can either list the edges in a two-column table (the Edge Table), or use a three-column table in which the first two columns list all possible vertex pairs (in raster order) and the third column is a bitmap with a 1-bit for each pair that is an edge and a 0-bit for each pair that is not. This second option is called the edge map or edge mask, and is shown below for a small graph, G1. The edge map has |V|^2 rows; if raster ordering is always assumed, it is just a single column of bits.

The edge map can be compressed into a pTree (predicate Tree) by dividing the bits into "strides" of |V| bits each (4 for G1). These strides form the lowest level of the pTree (level_0). An upper level (level_1) records the truth of the predicate "Not Purely Zeros" for each level_0 pTree, and can be used to avoid retrieving level_0 pTrees that are purely zeros. We use the notation Ek for the kth level_0 pTree (which bitmaps the endpoints of the edges adjacent to vertex k) and call it the Edge pTree of k.

A path is a sequence of edges connecting a sequence of vertices that are distinct, except possibly for the endpoints. Simple paths are assumed throughout; in particular, self-loops (v,v) are disallowed.

Next we build a Shortest Path Tree, SPG1, for G1. It starts with level_0 of the Edge pTree: for each vertex k, Ek is a mask, Sk, of the endpoints of the edges adjacent to k (the shortest paths of length 1 starting at k). These endpoints never need to be considered again, since all shortest paths from k to them have been found. The complement of Ek, with bit k also turned off, marks the vertices not yet reached from k; we call these pTrees the "Not Reached Yet masks", or "N masks".
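A minimal Python sketch of the two-level, stride-|V| pTree described above, assuming vertices numbered 1..|V| and plain Python lists standing in for uncompressed bit vectors. The function names are hypothetical, and the G1 edge list {1,3}, {1,4}, {2,4}, {3,4} is an assumption inferred from the S and N formulas for G1; the slides do not list it explicitly.

```python
from typing import List, Tuple

def edge_map_from_edges(n: int, edges: List[Tuple[int, int]]) -> List[int]:
    """Raster-ordered edge map over vertices 1..n.

    Bit (h-1)*n + (k-1) is 1 iff {h,k} is an edge; an undirected edge
    therefore sets both the (h,k) and the (k,h) position.
    """
    bits = [0] * (n * n)
    for h, k in edges:
        bits[(h - 1) * n + (k - 1)] = 1
        bits[(k - 1) * n + (h - 1)] = 1
    return bits

def build_ptree(edge_map: List[int], n: int):
    """Two-level pTree with stride n.

    level_0[k-1] is the Edge pTree E_k (n bits); level_1[k-1] is 1 iff
    E_k is "Not Purely Zeros".
    """
    if len(edge_map) != n * n:
        raise ValueError("edge map must have n*n bits")
    level_0 = [edge_map[i * n:(i + 1) * n] for i in range(n)]
    level_1 = [1 if any(stride) else 0 for stride in level_0]
    return level_0, level_1

def fetch_edge_ptree(level_0, level_1, k: int) -> List[int]:
    """Return E_k, skipping the level_0 read when level_1 marks it pure zeros."""
    if level_1[k - 1] == 0:
        return [0] * len(level_1)
    return level_0[k - 1]

# Hypothetical G1 edge list, inferred from the S/N formulas for G1.
G1_EDGES = [(1, 3), (1, 4), (2, 4), (3, 4)]
level_0, level_1 = build_ptree(edge_map_from_edges(4, G1_EDGES), 4)
print(level_0)  # [[0, 0, 1, 1], [0, 0, 0, 1], [1, 0, 0, 1], [1, 1, 1, 0]]
print(level_1)  # [1, 1, 1, 1]
```

fetch_edge_ptree shows the point of level_1: a 0 there means the corresponding level_0 pTree never has to be read.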
[Figure: G1 with its vertex masks M1–M4, its edge map (columns V1, V2, E over the raster-ordered pairs 1,1 through 4,4), and its 2-level, stride-4 Edge pTree (level_0 pTrees E1–E4 plus level_1).]

We use the notation Shk for the map of the endpoints of shortest paths through h then k (necessarily of length 2), and NLv for the map of vertices not reached by length-L shortest paths from vertex v.

Length-2 masks:
S13 = N11 & E3, S14 = N11 & E4
S24 = N12 & E4
S31 = N13 & E1, S34 = N13 & E4
S41 = N14 & E1, S42 = N14 & E2, S43 = N14 & E3 (we can avoid these calculations by noting that Ct(N14) = 0)

Not-reached masks after length 2:
N21 = N11 & (S13 | S14)'
N22 = N12 & (S24)'
N23 = N13 & (S31 | S34)'
N24 = N14 & (S41 | S42 | S43)'

The SPTree is shown by the green links. The next level (S142 = N21 & E2, S241 = N22 & E1, S243 = N22 & E3, S312 = N23 & E1, S342 = N23 & E2) is unnecessary to construct, since |N2k| = 0 for all k. The connectivity components can be deduced from the zero set of the final NLk masks.

Girvan and Newman started a flurry of research by suggesting that a graph's edges could be labeled with an edge between-ness measure (which counts the shortest-path participations of the edge), and that the graph could be usefully partitioned into communities by divisive hierarchical clustering, removing edges in descending order of between-ness. Btwn14 = Btwn24 = Btwn34 = 1.
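The level-by-level construction can be sketched with Python ints as uncompressed stand-ins for the pTrees (bit v-1 represents vertex v). This is an illustrative sketch, not the authors' pTree implementation; n_masks is a hypothetical name, and G1's Edge pTrees are assumed to be E1 = {3,4}, E2 = {4}, E3 = {1,4}, E4 = {1,2,3}, which is what the S and N formulas for G1 imply. Under that assumption it reproduces Ct(N14) = 0 and |N2k| = 0 for every k.

```python
from typing import Dict, List

def bit(v: int) -> int:
    """Mask with only vertex v (1-based) set."""
    return 1 << (v - 1)

def members(mask: int, n: int) -> List[int]:
    """Vertices (1-based) whose bits are set in mask."""
    return [v for v in range(1, n + 1) if mask & bit(v)]

def n_masks(E: Dict[int, int], n: int, k: int) -> List[int]:
    """Not-Reached-Yet masks N1k, N2k, ... for start vertex k.

    N1k is E_k' with bit k turned off.  At each level, every vertex h on
    the current frontier contributes S = N & E_h (endpoints newly reached
    through h); the next N mask is N & (OR of those S)'.  Stops when N is
    empty (k's whole component reached) or nothing new is reached.
    """
    full = (1 << n) - 1
    N = full & ~E[k] & ~bit(k)      # N1k
    frontier = E[k]                 # endpoints of length-1 shortest paths
    levels = [N]
    while N and frontier:
        reached = 0
        for h in members(frontier, n):
            reached |= N & E[h]     # S mask for paths continuing through h
        N &= ~reached               # next-level N mask
        frontier = reached
        levels.append(N)
    return levels

# Assumed G1 Edge pTrees E1..E4 (inferred from the formulas for G1).
G1_E = {1: bit(3) | bit(4), 2: bit(4), 3: bit(1) | bit(4),
        4: bit(1) | bit(2) | bit(3)}
for k in range(1, 5):
    print(k, [members(N, 4) for N in n_masks(G1_E, 4, k)])
# 1 [[2], []]
# 2 [[1, 3], []]
# 3 [[2], []]
# 4 [[]]
```

A nonzero final mask means some vertices are unreachable from k, i.e. they lie in another connectivity component.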

[Figure: G7.1, with vertices labeled 1–9 and a–y.]
At this point every vertex has been reached from 1: the radius from 1 is r1 = 3 (3 hops get you from 1 to any k). Are the vertices k for which rk is minimal some sort of centroids? Would they be useful? But to get all shortest paths from 1 we need to finish the 3-hop analysis (though we need not do any 4-hop analysis).

[Table: level-1 S masks and N1 masks for all 34 vertices of G7.1 (rows and columns 1–9, a–y), with per-vertex counts.]

Figure: G7.1, friendships between 34 members of a karate club. Coloring is GN's between-ness partition.
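In standard graph terminology, the "radius from k", rk, is the eccentricity of k, and the vertices minimizing it form the center of the graph. A hedged sketch computing rk with the same frontier-OR idea as the N masks, assuming adjacency bitmasks stored in Python ints; radius_from, centers and the example path graph are hypothetical (the example is not G7.1).

```python
from typing import Dict, List, Tuple

def from_edges(n: int, edges: List[Tuple[int, int]]) -> Dict[int, int]:
    """Adjacency bitmasks: bit u-1 of adj[v] is set iff u-v is an edge."""
    adj = {v: 0 for v in range(1, n + 1)}
    for h, k in edges:
        adj[h] |= 1 << (k - 1)
        adj[k] |= 1 << (h - 1)
    return adj

def radius_from(adj: Dict[int, int], n: int, k: int) -> int:
    """r_k: number of hops needed to reach every vertex from k."""
    full = (1 << n) - 1
    reached = frontier = 1 << (k - 1)
    hops = 0
    while reached != full:
        nxt = 0
        for v in range(1, n + 1):
            if frontier & (1 << (v - 1)):
                nxt |= adj[v]
        nxt &= ~reached                 # keep only not-reached-yet vertices
        if nxt == 0:
            raise ValueError("some vertex is unreachable from k")
        reached |= nxt
        frontier = nxt
        hops += 1
    return hops

def centers(adj: Dict[int, int], n: int) -> List[int]:
    """Vertices k whose r_k is minimal."""
    r = {k: radius_from(adj, n, k) for k in range(1, n + 1)}
    best = min(r.values())
    return [k for k in sorted(r) if r[k] == best]

# Hypothetical example: the path 1-2-3-4-5 (not G7.1).
P5 = from_edges(5, [(1, 2), (2, 3), (3, 4), (4, 5)])
print([radius_from(P5, 5, k) for k in range(1, 6)])  # [4, 3, 2, 3, 4]
print(centers(P5, 5))                                  # [3]
```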

Counting SP participations: if a12 is the count of 1→2 participations and a21 is the count of 2→1 participations, then the full participation is a12 + a21 + a12*a21, since for every 2→1 shortest path, 1—2 will also participate in the middle of another shortest path for each 1→2 participation. So the problem is computing all ahk correctly (loops are trouble!).

[Figure: graph G2 (vertices 1–7) with its N1 and S masks, and a larger tree.]

N2k = N1k & (OR over h ∈ List(Ek) of Skh)', where Shk = N1h & Ek, so N21 = N11 & (S12 | S13 | S14 | S16)'. Since |N21| = 0, we can now deduce that the graph is connected: CC1 = all.

So Btwn1—2 = Btwn1—b = Btwn1—c = Btwn2—3 = Btwn2—4 = … = 56, Btwnb—d = Btwnc—e = Btwn3—5 = Btwn4—6 = … = 45, and Btwnd—f = Btwnd—g = Btwne—h = Btwne—i = Btwnr—7 = Btwn5—8 = Btwn6—9 = Btwn6—a = … = 17.

Radius is an interesting (centroid?) vertex label, and MIN(ahk, akh) is an interesting (centroid?) edge label. The minimum number of SP hops from an edge in either direction is another edge radius (much like the vertex radius?).

Removing the zero pTrees, the Shortest Path tree for G2 is (using colors for pointers):

[Figure: Shortest Path tree for G2, with the BtwnsGN_DNIv, Btwn1_DNIv and Btwn2_DNIv partitions.]

Let's see if the shortcut formula Btwnh—k = ahk + akh + ahk*akh works:
Btwn1—2 = (1+2) + 0*(1+2) = 3
Btwn1—3 = (1+2) + 0*(1+2) = 3
Btwn1—4 = (1+2) + 0*(1+2) = 3

[Figure: shortest path trees for G2 with ties broken left-to-right, top-to-bottom; right-to-left, top-to-bottom; and right-to-left, bottom-to-top (the last has the same ordering as Btwn1).]

Breaking ties in these ways is arbitrary. We do have the count information as well (for free).
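On a tree, shortest paths are unique, so the shortcut formula can be checked by brute force. This sketch assumes ahk counts the shortest paths that start at h, use edge hk, and continue past k (so ahk = |k's side| - 1); under that reading, ahk + akh + ahk*akh counts every vertex pair whose path crosses hk except the pair {h,k} itself. The helper names and the double-star example tree are hypothetical. With cycles ("loops") the counts no longer split this cleanly, which is why the check is restricted to trees.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]

def adjacency(edges: List[Edge]) -> Dict[int, Set[int]]:
    adj: Dict[int, Set[int]] = {}
    for h, k in edges:
        adj.setdefault(h, set()).add(k)
        adj.setdefault(k, set()).add(h)
    return adj

def side(adj, start: int, blocked: int) -> Set[int]:
    """Vertices reachable from start without stepping onto blocked."""
    seen, stack = {start}, [start]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y != blocked and y not in seen:
                seen.add(y)
                stack.append(y)
    return seen

def tree_path(adj, u: int, v: int) -> List[int]:
    """The unique u..v path in a tree, via BFS parent pointers."""
    parent = {u: None}
    queue = [u]
    for x in queue:                  # the list grows while we iterate
        for y in adj[x]:
            if y not in parent:
                parent[y] = x
                queue.append(y)
    path = [v]
    while path[-1] != u:
        path.append(parent[path[-1]])
    return path[::-1]

def check_shortcut(edges: List[Edge]) -> Dict[Edge, Tuple[int, int]]:
    """For each edge h-k: (brute-force count, ahk + akh + ahk*akh)."""
    adj = adjacency(edges)
    out = {}
    for h, k in edges:
        brute = 0
        for u, v in combinations(sorted(adj), 2):
            if {u, v} == {h, k}:
                continue                      # the edge's own endpoints
            p = tree_path(adj, u, v)
            if any({p[i], p[i + 1]} == {h, k} for i in range(len(p) - 1)):
                brute += 1
        a_hk = len(side(adj, k, h)) - 1       # paths from h continuing past k
        a_kh = len(side(adj, h, k)) - 1       # paths from k continuing past h
        out[(h, k)] = (brute, a_hk + a_kh + a_hk * a_kh)
    return out

# Hypothetical tree: 1-2, with leaves 6, 7, 8 on 1 and 3, 4, 5 on 2.
DOUBLE_STAR = [(1, 2), (1, 6), (1, 7), (1, 8), (2, 3), (2, 4), (2, 5)]
print(check_shortcut(DOUBLE_STAR)[(1, 2)])  # (15, 15)
```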
We could break this tie by maximizing the sum of the hub adjacency counts (counts > 1): 2+3 = 5 is the maximum here, and it gives a final dendrogram partition isomorphic to BtwnsGN_DNIv.
Btwn1—6 = … = 11
Btwn2—3 = … = 0
Btwn2—4 = … = 0
Btwn3—4 = … = 0
Btwn5—6 = (2+3) + (2+3)*0 = 5
Btwn6—7 = (2+3) + (2+3)*0 = 5

[Figure: the resulting shortest path tree and dendrogram partitions of G2.]

Btwn2.4hk = (|Ph|-1) + (|Pk|-1) + (|Ph|-1)*(|Pk|-1), with DNIv and DNB(4-cliques)
Computing actual between-ness is a massive effort. We approximate it cheaply with the edge label Btwn1, which adds the shortest paths of length 2 starting at h and going through k to the shortest paths of length 2 starting at k and going through h: Btwn1hk = (|Ph|-1) + (|Pk|-1) + 1, so Btwn1_12 = 3+3+1 = 7. (In G7.1, the 5-cliques are 12348 and 1234e, so 12348e is a 6-vertex 1-plex; their 4-vertex subsets are 4-cliques.)

[Figure: edge 1—2, with 1 adjacent to 6, 7, 8 (count = 3) and 2 adjacent to 3, 4, 5 (count = 3).]

But what about the participation of 1—2 in the 9 shortest paths between {6,7,8} and {3,4,5}? They would bring the count to 15, and require only adding the product of two numbers already computed (no ANDs). We add two stopping rules: Do Not Isolate vertices (DNIv; cheap: never let an adjacency count fall to 0) and Do Not Break 4-cliques (DNB(4-cliques); expensive: all 4-cliques must be mined first).

[Figure: G7.1 partitioned by Btwn2.4 with DNIv and DNB(4-cliques).]

The red arrow shows the partition closest to the color partition, but how does one determine that? To build the dendrogram, one has to check for the split-off of a connectivity component after each deletion (very expensive!). Or should this be done only once, after the final deletion, by checking adjacency counts to find hubs?

Btwn2.3hk = (|Ph|-1) + (|Pk|-1) + (|Ph|-1)*(|Pk|-1), with DNIv and DNB(3-cliques)

[Figure: successive dendrogram levels for Btwn2.3 on G7.1.]

This produced no partitioning at all, suggesting that the DNB(3-cliques) rule is too strong for G7.
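Mining all 4-cliques up front, which DNB(4-cliques) requires, can reuse the same edge-endpoint ANDs: every 4-clique {h, k, i, j} contains an edge hk with i and j both in Ph & Pk. A hypothetical sketch with Python ints as adjacency bitmasks (bit u of Pv set iff uv is an edge); it illustrates the idea under those assumptions and is not the authors' pTree miner.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

def from_edges(edges: List[Tuple[int, int]]) -> Dict[int, int]:
    """Adjacency bitmasks P_v: bit u of adj[v] is set iff u-v is an edge."""
    adj: Dict[int, int] = {}
    for h, k in edges:
        adj[h] = adj.get(h, 0) | (1 << k)
        adj[k] = adj.get(k, 0) | (1 << h)
    return adj

def mine_4cliques(adj: Dict[int, int]) -> List[Tuple[int, int, int, int]]:
    """Each 4-clique once, as an increasing tuple (h, k, i, j)."""
    found = []
    verts = sorted(adj)
    for h in verts:
        for k in verts:
            if k <= h or not (adj[h] >> k) & 1:
                continue                       # need an edge h-k with h < k
            common = adj[h] & adj[k]           # Ph & Pk
            cand = [v for v in verts if v > k and (common >> v) & 1]
            for i, j in combinations(cand, 2):
                if (adj[i] >> j) & 1:          # i-j completes the 4-clique
                    found.append((h, k, i, j))
    return found

def dnb4_protected(cliques) -> Set[Tuple[int, int]]:
    """Edges (u < v) that DNB(4-cliques) forbids deleting."""
    return {(u, v) for q in cliques for u, v in combinations(q, 2)}

# Hypothetical example: K4 on 1..4 plus a pendant edge 4-5.
K4_PLUS_TAIL = from_edges([(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5)])
cliques = mine_4cliques(K4_PLUS_TAIL)
print(cliques)                                  # [(1, 2, 3, 4)]
print((4, 5) in dnb4_protected(cliques))        # False
```

Because h and k are always the two smallest vertices of the clique, each 4-clique is emitted exactly once.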
[Figure: the resulting dendrogram on G7.1.]

Btwn1 is a+b and Btwn2 is a+b+ab+1, where ahk = |Ph|-1 and bhk = |Pk|-1. These numbers are still not quite right: if there are any 3-cliques, they will be way off. For example:

Btwn3hk = (|Ph|-1-|Ph&Pk|) + (|Pk|-1-|Ph&Pk|) + (|Ph|-1-|Ph&Pk|)(|Pk|-1-|Ph&Pk|), with DNIv

[Figure: edge 1—2 with 1 adjacent to 6, 7, 8 and 2 adjacent to 3, 4, 5, redrawn with 5=8; with 5=8 and 3=6; and with 5=8, 3=6 and 4=7 (one, two, and three shared neighbors).]

In this case we should subtract |Ph&Pk| from a and b, because 1—2 does not participate in any shortest paths involving 5=8. That means using a = |Ph|-1-|Ph&Pk| and b = |Pk|-1-|Ph&Pk| gives a more accurate between-ness measure (but one far more expensive to calculate, due to the &). If there is one 3-clique, a = 2 and b = 2, so Btwn1 = 4 and Btwn2 = 9. If there are two 3-cliques, a = 1 and b = 1, so Btwn1 = 2 and Btwn2 = 4. If there are three 3-cliques, a = 0 and b = 0, so Btwn1 = 0 and Btwn2 = 1. So Btwn3hk = a+b+ab+1, with these formulas for a and b, may give a vastly different partition.
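A minimal sketch comparing Btwn2 and Btwn3 on the example edge between 1 and 2, assuming Python ints as adjacency bitmasks and bin(x).count("1") as the popcount. The helper merge is hypothetical: it identifies vertices, e.g. {8: 5} for 5=8. Btwn2 stays at 16 while Btwn3 falls to 9, 4 and 1 as the edge joins one, two and three 3-cliques.

```python
from typing import Dict, List, Tuple

def from_edges(edges: List[Tuple[int, int]]) -> Dict[int, int]:
    """Adjacency bitmasks P_v: bit u of adj[v] is set iff u-v is an edge."""
    adj: Dict[int, int] = {}
    for h, k in edges:
        adj[h] = adj.get(h, 0) | (1 << k)
        adj[k] = adj.get(k, 0) | (1 << h)
    return adj

def popcount(x: int) -> int:
    return bin(x).count("1")

def btwn2(adj: Dict[int, int], h: int, k: int) -> int:
    a, b = popcount(adj[h]) - 1, popcount(adj[k]) - 1
    return a + b + a * b + 1

def btwn3(adj: Dict[int, int], h: int, k: int) -> int:
    """Btwn2 with a and b each reduced by |Ph & Pk| (3-cliques on h-k)."""
    shared = popcount(adj[h] & adj[k])
    a = popcount(adj[h]) - 1 - shared
    b = popcount(adj[k]) - 1 - shared
    return a + b + a * b + 1

# The example: 1 adjacent to 6, 7, 8 and 2 adjacent to 3, 4, 5.
BASE = [(1, 2), (1, 6), (1, 7), (1, 8), (2, 3), (2, 4), (2, 5)]

def merge(edges, pairs):
    """Identify vertices on 1's side with vertices on 2's side, e.g. {8: 5}."""
    return [(pairs.get(u, u), pairs.get(v, v)) for u, v in edges]

for pairs in ({8: 5}, {8: 5, 6: 3}, {8: 5, 6: 3, 7: 4}):
    adj = from_edges(merge(BASE, pairs))
    print(pairs, btwn2(adj, 1, 2), btwn3(adj, 1, 2))
# {8: 5} 16 9
# {8: 5, 6: 3} 16 4
# {8: 5, 6: 3, 7: 4} 16 1
```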

Edges of G7.1 ranked by between-ness score (edge h—k, score): y—x 203, x—3 119, y—w 101, w—1 95, y—e 84, y—9 84, y—o 84, e—1 79, x—w 71, y—u 67, y—v 67, y—s 67, x—9 59, x—o 59, y—t 50, y—k 50, e—3 49, b—1 47, k—1 47, x—u 47, x—v 47, e—2 44, s—3 39, v—2 35, y—j 33, y—l 33, y—a 33, y—n 33, y—g 33, y—f 33, y—r 33, d—1 31, i—1 31, m—1 31, e—4 29, t—3 29, k—2 26, x—l 23, x—n 23, x—j 23, x—f 23, x—g 23, a—3 19, s—o 19, u—o 19, v—9 19, i—2 17, m—2 17, w—t 17, w—p 17, w—q 17, c—1 15, q—o 14, b—6 11, d—4 11, s—p 11, b—5 8, q—p 8, h—7 7, h—6 7, u—r 7.

Back to DBtwn2hk = (|Ph|-1) + (|Pk|-1) + (|Ph|-1)*(|Pk|-1) + 1, with just the DNIv stopping rule, on G7.1 (very cheap).

[Figure: G7.1 and the hub-and-spoke wheels at the end of this divisive clustering.]

Let's try DBtwn2hk = (|Ph|-1) + (|Pk|-1) + (|Ph|-1)*(|Pk|-1) + 1 with the DNIve stopping rule (Do Not Isolate vertices or edges) on G7.1. This is still very cheap. The cost of imposing the DNIe rule is a little higher than DNIv: we would have to update the actual pTrees (two with each edge deletion), but the update is simply turning off a single bit in each of the endpoint vertex pTrees. (If edge h—k is deleted, turn off h in Pk and k in Ph.)

[Figure: G7.1 and the hub-and-spoke wheels at the end of this divisive clustering under DNIve.]
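A hedged sketch of divisive clustering with DBtwn2 and the DNIv stopping rule. Assumptions: scores are recomputed from the current adjacency counts after every deletion (only the counts are decremented, as the text describes), ties are broken by edge label (any tie-break is arbitrary), and the hub-and-spoke wheels are taken to be the connected components of the surviving edges. divisive_dniv, wheels and the barbell example are hypothetical; a DNIve variant would additionally need the actual pTree bits, which this sketch does not model.

```python
from typing import Callable, Dict, List, Set, Tuple

Edge = Tuple[int, int]

def dbtwn2(deg_h: int, deg_k: int) -> int:
    """DBtwn2 from the current adjacency counts |Ph| and |Pk|."""
    a, b = deg_h - 1, deg_k - 1
    return a + b + a * b + 1

def divisive_dniv(edges: List[Edge], score: Callable[[int, int], int]):
    """Delete the top-scoring edge until every remaining edge would
    isolate a vertex (DNIv).  Returns (deletion order, remaining edges)."""
    deg: Dict[int, int] = {}
    for h, k in edges:
        deg[h] = deg.get(h, 0) + 1
        deg[k] = deg.get(k, 0) + 1
    remaining: Set[Edge] = set(edges)
    deleted: List[Edge] = []
    while True:
        # DNIv: deletable only if both endpoints keep at least one edge.
        cands = [(score(deg[h], deg[k]), (h, k)) for h, k in remaining
                 if deg[h] > 1 and deg[k] > 1]
        if not cands:
            break
        _, (h, k) = max(cands)       # ties broken (arbitrarily) by edge label
        remaining.remove((h, k))
        deg[h] -= 1                  # just decrement |Ph| and |Pk|
        deg[k] -= 1
        deleted.append((h, k))
    return deleted, remaining

def wheels(remaining: Set[Edge]) -> List[List[int]]:
    """Connected components of the remaining edges."""
    adj: Dict[int, Set[int]] = {}
    for h, k in remaining:
        adj.setdefault(h, set()).add(k)
        adj.setdefault(k, set()).add(h)
    seen: Set[int] = set()
    comps = []
    for v in sorted(adj):
        if v in seen:
            continue
        seen.add(v)
        stack, comp = [v], []
        while stack:
            x = stack.pop()
            comp.append(x)
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        comps.append(sorted(comp))
    return comps

# Hypothetical example: triangles {1,2,3} and {4,5,6} joined by 3-4.
BARBELL = [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (3, 4)]
deleted, remaining = divisive_dniv(BARBELL, dbtwn2)
print(deleted)            # [(3, 4), (5, 6), (2, 3)]
print(wheels(remaining))  # [[1, 2, 3], [4, 5, 6]]
```

The run ends with two hub-and-spoke wheels (hub 1 with spokes 2, 3 and hub 4 with spokes 5, 6), and no vertex is ever isolated.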

[Figure: G7.1 partitioned by DBtwn1hk = (|Ph|-1) + (|Pk|-1) + 1 with DNIv, and, using the same between-ness, with DNIve (the "Do Not Isolate vertices or edges" rule).]

Edges of G7.1 ranked by between-ness score (edge h—k, score): y—x 27, y—w 21, w—1 20, x—3 20, y—o 20, y—9 20, y—e 20, e—1 19, y—u 19, y—s 19, y—v 19, y—k 18, y—t 18, b—1 17, k—1 17, y—a 17, y—l 17, y—f 17, y—n 17, y—j 17, y—g 17, y—r 17, d—1 16, i—1 16, m—1 16, x—w 16, c—1 15, x—9 15, x—o 15, x—v 14, x—u 14, e—3 13, e—2 12, s—3 12, x—l 12, x—j 12, x—g 12, x—f 12, x—n 12, t—3 11, v—2 11, a—3 10, k—2 10, e—4 9, i—2 9, m—2 9, s—o 7, u—o 7, v—9 7, w—q 7, w—t 7, w—p 7, d—4 6, q—o 6, b—6 5, s—p 5, b—5 4, h—7 4, h—6 4, q—p 4, u—r 4.

Imposing a Do Not Isolate vertices (DNIv) rule, the final level of the divisive dendrogram is composed of these hub-and-spoke wheels. (With no stopping rule, the final level of any divisive dendrogram is always singleton vertex sets.) Backing up the dendrogram, there is a level at which the yellow and blue partitions are intact (and the 3-spoke green wheels coalesce). There may be ways to agglomerate these wheels, other than backing up the dendrogram, that will produce better partitions.

The cost of DBtwn1_DNIv is very low, even for very big graphs, once they are expressed vertically using Edge pTrees. The counts |Pk|, for each vertex k, already exist and can simply be decremented with each edge deletion (if edge hk is deleted, decrement |Ph| and |Pk| by 1; the pTrees themselves do not have to be updated for DNIv). The main cost is determining new dendrogram partitions (a new dendrogram partition results when a new connectivity component splits off). This is always a high-cost step for divisive graph partitioning. We could do it only once at the end, but then we would not have the dendrogram to back up in.
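Detecting whether a deletion splits off a new connectivity component only requires checking whether h can still reach k after edge hk is removed, since deleting one edge can split at most one component. The sketch uses a bit-parallel BFS over adjacency bitmasks; it is a generic illustration with hypothetical names, not the "connectivity partition check" algorithm mentioned in the slides. Because this check needs the current edge set, each deletion here also turns off k in Ph and h in Pk.

```python
from typing import Dict, List, Tuple

def from_edges(edges: List[Tuple[int, int]]) -> Dict[int, int]:
    """Adjacency bitmasks P_v: bit u of adj[v] is set iff u-v is an edge."""
    adj: Dict[int, int] = {}
    for h, k in edges:
        adj[h] = adj.get(h, 0) | (1 << k)
        adj[k] = adj.get(k, 0) | (1 << h)
    return adj

def reaches(adj: Dict[int, int], h: int, k: int) -> bool:
    """Bit-parallel BFS: OR together the neighbor masks of the frontier."""
    reached = frontier = 1 << h
    while frontier and not (reached >> k) & 1:
        nxt, v, f = 0, 0, frontier
        while f:                      # walk the set bits of the frontier
            if f & 1:
                nxt |= adj.get(v, 0)
            f >>= 1
            v += 1
        frontier = nxt & ~reached     # not-reached-yet vertices only
        reached |= frontier
    return bool((reached >> k) & 1)

def split_events(adj: Dict[int, int], deletions: List[Tuple[int, int]]) -> List[int]:
    """Indices of deletions after which a new connectivity component splits off.
    Mutates adj: each deletion turns off k in Ph and h in Pk."""
    events = []
    for i, (h, k) in enumerate(deletions):
        adj[h] &= ~(1 << k)
        adj[k] &= ~(1 << h)
        if not reaches(adj, h, k):
            events.append(i)
    return events

# Hypothetical example: triangles {1,2,3} and {4,5,6} joined by 3-4.
BARBELL = [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (3, 4)]
print(split_events(from_edges(BARBELL), [(3, 4), (5, 6), (2, 3)]))  # [0]
```

Each recorded index marks a new dendrogram level; all other deletions leave the partition unchanged.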
We have a very competitive "connectivity partition check" algorithm using pTrees, but to produce the entire dendrogram it has to be applied after each deletion. We are looking for speed improvements.

Imposing a "Do Not Isolate vertices or edges" rule results in a final partition much closer to the color partition. The cost of imposing the additional DNIe rule is a little higher: we would have to update the actual pTrees (two with each edge deletion), but the update is simply turning off one bit in each of the endpoint pTrees (if edge hk is deleted, turn off h in Pk and k in Ph).

Other stopping rules can be added, e.g., "Do Not Break 3-cliques", as done on the previous slide. If 3-cliques are not to be broken, then no cliques will be broken, because every edge of a k-clique (k ≥ 3) lies in one of its 3-cliques. Adding this stopping rule will keep communities together; however, we must know all 3-cliques before we start. We can also impose the stopping rule "Do Not Break 4-cliques", but the resulting partition was not superior to Btwn1 with DNIve.

3-cliques can be identified by performing all edge-endpoint pairwise ANDs, Ph&Pk: edge hk is in |Ph&Pk| 3-cliques. The cost is just the AND cost, since a new pop-count primitive counts the 1s at no additional cost during every AND operation. However, if the graph is extremely large (e.g., Facebook friends, with ~a billion vertices and a buzzillion edges), that cost may still be prohibitive.
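Per-edge 3-clique counts need a single AND and popcount per edge. A minimal sketch, assuming Python ints as the vertex pTrees Ph and bin(x).count("1") standing in for the pop-count primitive; triangles_per_edge, dnb3_deletable and the K4-plus-pendant example are hypothetical. Each 3-clique is counted once on each of its three edges, so the total divided by 3 is the number of 3-cliques.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Edge = Tuple[int, int]

def from_edges(edges: List[Edge]) -> Dict[int, int]:
    """Adjacency bitmasks P_v: bit u of adj[v] is set iff u-v is an edge."""
    adj: Dict[int, int] = {}
    for h, k in edges:
        adj[h] = adj.get(h, 0) | (1 << k)
        adj[k] = adj.get(k, 0) | (1 << h)
    return adj

def triangles_per_edge(edges: List[Edge]) -> Dict[Edge, int]:
    """|Ph & Pk| for every edge h-k: the number of 3-cliques containing it."""
    adj = from_edges(edges)
    return {(h, k): bin(adj[h] & adj[k]).count("1") for h, k in edges}

def dnb3_deletable(counts: Dict[Edge, int]) -> List[Edge]:
    """Edges that "Do Not Break 3-cliques" still allows deleting."""
    return [e for e, c in counts.items() if c == 0]

# Hypothetical example: K4 on 1..4 plus a pendant edge 4-5.
EDGES = list(combinations(range(1, 5), 2)) + [(4, 5)]
counts = triangles_per_edge(EDGES)
print(sum(counts.values()) // 3)   # 4 triangles (each is counted on 3 edges)
print(dnb3_deletable(counts))      # [(4, 5)]
```

Every K4 edge is protected, consistent with the observation that no clique breaks when no 3-clique breaks.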
