Published byΞάνθος Οικονόμου Modified over 5 years ago
1
Divisive Graph Clustering: Girvan and Newman delete edges with maximum “betweenness”, i.e., maximum participation in shortest paths (of all lengths). We look for situations where pTrees give us an advantage. Can SPPC (Shortest Path Participation Count) be constructed more efficiently with pTrees? We try finding the edge with maximum “Fore-Aft” Shortest Path Participation Difference. pTrees should provide a great advantage in those calculations. Delete the edge with minimum “Fore-Aft” SP co-participation. The pTree calculation of FA(h,k) is instantaneous, but is the resulting clustering a good one? Here we use MFA on S1P only, and call it MFA1.
[Figure: example graphs G1 and G1_1 through G1_7, each shown with its S1P pTrees, the pairwise ANDs S1P(h) & S1P(k) for its edges, and the resulting FA counts.]
Panel annotations, in slide order: MFA1 picks 24. Correct. MFA1 says all edges are equal (seems correct). MFA1 says all edges are equal (seems correct). MFA1 says all edges are equal (correct?). MFA1 picks correctly. MFA1 picks 23 correctly. MFA1 says all edges are equal. G1_7: MFA1 picks correctly.
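The MFA1 selection described on this slide can be sketched as follows. This is a minimal illustration, not the author's pTree implementation: it assumes FA(h,k) is the 1-count of S1P(h) & S1P(k), stores each S1P bitmap as a plain Python integer, and uses a hypothetical example graph (two triangles joined by a bridge) that does not appear on the slides. The function names are made up for the sketch.

```python
def s1p_masks(n, edges):
    """Return S1P bitmaps: bit v of masks[h] is set when (h, v) is an edge."""
    masks = [0] * (n + 1)  # vertices are numbered 1..n; index 0 is unused
    for h, k in edges:
        masks[h] |= 1 << k
        masks[k] |= 1 << h
    return masks


def fa1_counts(n, edges):
    """Count the 1-bits of S1P(h) & S1P(k) for every edge (h, k)."""
    masks = s1p_masks(n, edges)
    return {(h, k): bin(masks[h] & masks[k]).count("1") for h, k in edges}


def mfa1_candidates(n, edges):
    """Edges with the minimum co-participation count (deletion candidates)."""
    counts = fa1_counts(n, edges)
    lowest = min(counts.values())
    return sorted(edge for edge, c in counts.items() if c == lowest)


# Hypothetical example (assumption): two triangles joined by the bridge (3, 4).
TWO_TRIANGLES = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
print(mfa1_candidates(6, TWO_TRIANGLES))
```

On this toy graph only the bridge has no common neighbor between its endpoints, so it is the single candidate; every triangle edge has a count of 1.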
2
Divisive Graph Clustering
MFA1 on G7: Delete the edge with the minimum “Fore-Aft” S1P co-participation.
[Figure: G7 with its S1P pTrees and the S1P pairwise ANDs for each edge.]
MFA1 says delete the edges with the zero counts above. … and 23 with 33 and 34 get deleted because they have only 33 and 34 as nbrs, and 33 and 34 are not nbrs (i.e., they are friends with two enemies). They should not be deleted! Solution?
Edge by edge: (1,12) is deleted because 12 is only connected to 1. (1,32) correct. (2,31) correct. (3,10) correct. (3,28) correct. (3,29) correct. (14,34) correct. (20,34) correct. (24,26) and (25,28) are incorrect. But recall that 24 and 28 are ambiguous with respect to cluster? (10,34) is deleted because, now, 10 is only connected to 34.
We can solve the “delete if only connected to one point” problem by checking the nbr count. The first round goes a long way toward splitting white-blue from green-yellow.
3
McC: Delete edge(s) with the minimum number of common first cousins, CFC(h,k) = S2P(h) & S2P(k).
Del McC0: (1,5) (1,6) (1,11) (1,12) incorrect.
McS0-McC0: Delete all common Siblings=0 (McS0) and all common Cousins=0 (McC0), unless the deletion results in an isolated singleton or doubleton (so keep 1,12). Del 1,32 2,31 3,10 3,28 3,29 14,34 20,34 15,33 16,33 19,33 21,33 23,33 24,26 25,28. This is McS0-McC0.
So do the one-time SiblingANDs (S1Ph & S1Pk) and CousinANDs (S2Ph & S2Pk). Then, in one pass reading the counts, McS0-McC0 deletes 12 edges (whereas Girvan-Newman makes one pass per edge deletion and recalculates on each new pass). Next we could delete more edges with our current counts, or recalculate the counts and redo McS0-McC0.
Use DelThresh=1 on Siblings (recalculating nothing). Delete additionally: 1,9 1,13 1,18 1,20 1,22 3,33 6,11 6,17 9,34 24,28 24,33 25,26 27,30 29, ,33 31, ,34 (but not 2,18 2,20 2,22 4,13 5,7 5,11 7,17 25,32 26,32 27,34 28,34 29,34; DO NOT ISOLATE rule). This is McS1-McC0.
Use DelThresh=1 on Cousins: del 1,4 (but not 7, , , , ,34 27,34 due to the DO NOT ISOLATE rule). This is McS1-McC1.
Likely, in the next round (after recalculating CS and CC), 1,7 and 3,9 will delete. Note: { } has already separated as a component. Then the other clusters would be: { } TheGreens TheYellows.
[Figure: edge (h,k) with neighbors a b c d e f g i j; S2Ph = blue and orange, S2Pk = red and green; tables of S1P pairwise ANDs and S2P pairwise AND counts (S2P-AND-OP-1, S2P-AND-OP-2).]
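A one-pass McS/McC deletion with a do-not-isolate check might look like the sketch below. Several points are assumptions rather than the slide's definition: an edge is deleted only when both its common-sibling count and its common-cousin count are at or below their thresholds, S2P(h) is taken to be the vertices at shortest-path distance exactly 2 from h, and the do-not-isolate rule is reduced to "never remove a vertex's last remaining edge" (the doubleton case is not handled). The example graph is hypothetical.

```python
def popcount(mask):
    return bin(mask).count("1")


def s1p_masks(n, edges):
    masks = [0] * (n + 1)  # vertices 1..n; index 0 unused
    for h, k in edges:
        masks[h] |= 1 << k
        masks[k] |= 1 << h
    return masks


def s2p_masks(n, s1p):
    """S2P(h): vertices whose shortest path from h has length exactly 2."""
    s2p = [0] * (n + 1)
    for h in range(1, n + 1):
        two_hop = 0
        for v in range(1, n + 1):
            if (s1p[h] >> v) & 1:
                two_hop |= s1p[v]
        s2p[h] = two_hop & ~s1p[h] & ~(1 << h)  # drop neighbors and h itself
    return s2p


def mcs_mcc_pass(n, edges, s_thresh=0, c_thresh=0):
    """One pass over counts computed once; returns the deleted edges."""
    s1p = s1p_masks(n, edges)
    s2p = s2p_masks(n, s1p)
    degree = [popcount(m) for m in s1p]
    deleted = []
    for h, k in edges:
        siblings = popcount(s1p[h] & s1p[k])
        cousins = popcount(s2p[h] & s2p[k])
        if siblings <= s_thresh and cousins <= c_thresh:
            # Simplified do-not-isolate rule: keep a vertex's last edge.
            if degree[h] > 1 and degree[k] > 1:
                degree[h] -= 1
                degree[k] -= 1
                deleted.append((h, k))
    return deleted


# Hypothetical example: two bridged triangles plus a pendant vertex 7.
PENDANT = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6), (6, 7)]
print(mcs_mcc_pass(7, PENDANT))
```

Here the bridge (3,4) is deleted, while the pendant edge (6,7) also has zero siblings and zero cousins but is kept because deleting it would isolate vertex 7.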
4
McS2-McC0 on G7: Unless a singleton/doubleton is isolated, delete CommonCousins0 and CommonSiblings2.
Del CC0: (1,5) (1,6) (1,11) (1,12) saved by the DNI rule.
Del CS2: 1:5,6,7,9,11,12,13,18,20,22,32 2:18,20,22,31 3:9,10,28,29, : :7, :11, : : : : :33, :33, :33, : :33, :33, :26,28, :26,28, : :30, : : : : :34
We get Yellow, Green(-20), {20, 24, 28, 29, 10, 15, 16, 19, 21, 23, 27, 30, 34} and {9, 31, 33, 25, 26, 32}. So again Black and Blue are confused, but Yellow and Green are almost perfect.
At this point we have looked at several threshold combinations for siblings and cousins. I think McS0-McC0 followed by a recalculation and then a reapplication of McS0-McC0 might be best.
[Figure: tables of S1P pairwise ANDs and S2P pairwise AND counts (S2P-AND-OP-1, S2P-AND-OP-2).]
5
Divisive Graph Clustering
Delete the edge with minimum “Fore-Aft” SP co-participation. Calculating FA(h,k) is fast with pTrees, but is the resulting clustering a good one? Here we use MFA on S1P and S2P, called MFA2.
Define FA2(1,2) = S1P(1) & S1P(2) | S1P(1) & S2P(2) | S2P(1) & S1P(2) | S2P(1) & S2P(2), where S2P(h) = OR over k of S2P(hk).
Define FA2_1(hk) = S1P(1) & S1P(2) | S2P(hk) & S2P(kh).
[Figure: example graphs G1 and G1_1 through G1_5 with their S1P and S2P pTrees, pairwise ANDs, and FA counts.]
Panel annotations, in slide order: MFA1 says all edges are equal (correct?). MFA1 says all edges are equal (seems correct). MFA1 says all edges are equal (seems correct). MFA2 says … are the best to delete (more sensitive!). MFA1 picks 24. Correct. MFA1 picks 23, correctly. MFA1 says all edges are equal. MFA2: … are best, … are 2nd best, 23 worst. I like it: the 4cycle with two 1hairs is best, the 4cycle with one 2hair is 2nd best, and the 6cycle is worst.
6
[Figure: G7 with E and the SkP pTrees, k=2,3,4, for vertices 1, 2, 3, 33, …, and the resulting SPPC counts.]
Count only Shortest Path Participations emanating from vertices with S1P-counts ≥ 50% of the maxS1Pcount=16 (i.e., 8). This specifies starting vertices of … only.
7
[Figure: G7 with E and the shortest path pTrees for vertices 1 and 34, and the resulting SPPC counts.]
Count only Shortest Path Participations emanating from vertices with S1P-counts ≥ 75% of the maxS1Pcount=16 (i.e., 12). This specifies starting vertices of 1 and 34 only.
8
G5: Delete (1,2) and {3,6,8} and do over.
[Figure: G5 (vertices 1-8) with its E, SP2, SP3 and SP4 pTrees, their counts, and the SPPC (Shortest Path Participation Counts) per edge.]
SP gives the connectivity component partition: CC(1)={1,2,4,5,7} is a 5plex since EdgeCt=5=COMBO(5,2)-5. CC(3)={3,6,8} is a 0plex since EdgeCt=3=COMBO(3,2)-0.
Delete (1,2) and {3,6,8} and do over.
[Figure: the remaining graph with its E and SP2 pTrees and counts.]
SP gives the connectivity component partition: CC(1)={1,5,7} is a 0plex since EdgeCt=3=COMBO(3,2)-0. CC(2)={2,4} is a 0plex since EdgeCt=1=COMBO(2,2)-0.
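The edge-count test used on this slide (a component on n vertices with EdgeCt edges is a kplex with k = COMBO(n,2) - EdgeCt) can be checked with a short sketch. The G5 edge list below is an assumption inferred from the component and edge-count statements on the slide, and the components are found with an ordinary depth-first search instead of SP pTrees.

```python
from math import comb


def connected_components(vertices, edges):
    """Return components as sets, ordered by their smallest vertex."""
    adj = {v: set() for v in vertices}
    for h, k in edges:
        adj[h].add(k)
        adj[k].add(h)
    seen, comps = set(), []
    for v in sorted(vertices):
        if v in seen:
            continue
        stack, comp = [v], set()
        while stack:
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(adj[u] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def missing_edges(component, edges):
    """k = COMBO(|component|, 2) - EdgeCt, the slide's kplex number."""
    inside = sum(1 for h, k in edges if h in component and k in component)
    return comb(len(component), 2) - inside


def plex_report(vertices, edges):
    return [(sorted(c), missing_edges(c, edges))
            for c in connected_components(vertices, edges)]


# Assumed G5 edge list (inferred from the slide, not given explicitly).
G5 = [(1, 2), (1, 5), (1, 7), (2, 4), (5, 7), (3, 6), (3, 8), (6, 8)]
print(plex_report(range(1, 9), G5))
# After deleting edge (1,2) and excising the clique {3,6,8}:
REST = [e for e in G5 if e != (1, 2) and not set(e) <= {3, 6, 8}]
print(plex_report([1, 2, 4, 5, 7], REST))
```

Under that assumed edge list, the first report reproduces the slide's 5plex and 0plex, and the second reproduces the two 0plexes {1,5,7} and {2,4}.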
9
CC(9)={9 a b c} is a 3plex since EdgeCt=3=COMBO(4,2)-3
[Figure: G6 (vertices 1-9, a-g) with its E and SP2 through SP6 pTrees and the SPPC (Shortest Path Participation Counts) per edge.]
SP gives the connectivity component partition: CC(1)={ } is a 20plex since EdgeCt=8=COMBO(8,2)-20. CC(9)={9 a b c} is a 3plex since EdgeCt=3=COMBO(4,2)-3. CC(d)={d f g} is a 0plex since EdgeCt=3=COMBO(3,2). CC(e)={e}.
Delete (1,3) (SPPC=16 max) and delete {d f g}, {e} and do over. Also delete {9 a b c} as a 4VertexHubSpoke3plex.
[Figure: the remaining graph on vertices 1-8 with E, SP2, SP3, SP4, SP5 and SPPC.]
SP gives the connected components: CC(1)={ } is a 2plex since EdgeCt=4=COMBO(4,2)-2. CC(2)={ } is a 3plex since EdgeCt=3=COMBO(4,2)-3 (a 4VertexHubSpoke). Delete { } as a 4VHubSpoke3plex, and delete (1,6).
[Figure: the remaining graph with E, SP2 and SP3 (all pure0).]
SP gives the connected components: CC(1)={1}, CC(5)={5 6 7}, a 0plex since EdgeCt=3=COMBO(3,2)-0. Done!
10
This is an Agglomerative Method based on a weighted sum of SPk counts to identify 1 and 34 as centers. Then, among their individual nbrs, select their communities with a threshold on the weighted sum (= -20), giving the light green “1community” and the black “34community” (overlapping). The membership rule is: If (WtSum >= -20 & Nbr(1)) then 1 else 0. Next, excise those and iterate. When all are in a community, probably do a k-means reshuffle to improve?
[Table: per-vertex SP1 through SP5 counts on G7 with weights 2, -1, -1, -1, -1, the WeightSum, and the Nbrs masks of the two centers.]
Iterate on the remaining vertices using weights of 0,1,2,4,6 for SP1,2,3,4,5 resp. [Table: weighted SP counts and WeightSum; SP1|2(17).]
Iterate again on the remaining vertices using weights of 5,5,1,1,0 for SP1,2,3,4,5 resp. [Table: weighted SP counts and WeightSum; SP1|2(8), SP1|2(33).]
This method uses site betweenness, not edge betweenness (SPPC is not computed), but gives a good overlapping clustering (close to the author's). One could attempt a few k-means rounds to try to improve it.
11
APPENDIX To extend to PT:
Edge (E), Path (PT), ShortestPath (SPT) and AcyclicPath (APT) Trees and CycleList (CL) of G1.
[Figure: G1 (vertices 1-4) with the one-level edge pTree PE1 keyed on (h,k) pairs, the 2-level edge pTree 2LEG1 (stride=|V|=4), the path pTrees PE2 and PE3 keyed on the E2 and E3 triples, and the trees PTG1, APTG1, SPTG1 with the cycle list CLG1. The predicate is NotPureZero.]
First, construct the stride=|V|, 2-level Edge pTree; all others are constructed concurrently from it. PTG1 is an extension of EG1. SPTG1 is initialized with E1=SP1,1 E2=SP2,1 E3=SP3,1 E4=SP4,1.
CLG1: 1341 1431 3413 3143 4134 4314. All are 3-hop cycles. Each has 3 start points and 2 directions, so each repeats 6 times: 6/6 = 1 three-hop cycle (1341).
Once SPT is completed: for big graphs, we could stop here (e.g., Friends has ~1B vertices but a diameter of 4, so we would only need to build PT for 4-hop paths), possibly expressed as a tree of lists rather than a tree of bitmaps. For sparse big graphs, E could be leveled further and/or stored as a tree of lists (then APT and SPT will be also).
SPT(G)k (with k turned on) is the mask (>0 is “yes”) for the connectivity component containing vertex k, COMP(G)k. For a bitmap of COMPk, bit-slice SPT (SPTk,h..SPTk,0, k=1..|V|); then COMPk = OR over j=h..0 of SPTk,j.
The SPT structure may be useful as separate “categorical” bitmaps by Shortest Path Length (SPk,h, h=1..H). Also keep a mask of Shortest Paths so far, SPSFk, for each vertex k. With each new SP bitmap, SPB: SPk,h+1 = SPB & SPSFk' (keep only the vertices not reached earlier), then SPSFk = SPSFk | SPB.
To extend to PT: for each k in List(Eh), PT2hk = Ek after zeroing the h bit of Ek. For each k in List(PT2hj), PT3hjk = Ek after zeroing the j bit of Ek. For each k in List(PT3hij), PT4hijk = Ek after zeroing the i and j bits of Ek.
E, PT, SPT and APT of a graph as predicate Trees on E(MaxPathLength): EG1, E 1-level, pred=NPZ; E 2-level, stride=4, pred=NPZ; PTG1, E3 pred=(NPZ)|(PZ&AcyclicPathEnd); APTG1, E3 predicate=(NPZ&NotCycleEnd)|(PZ&AcyclicPathEnd).
SPT gives the Connectivity Component Partition and the Maximal Cliques (go across SPk,1, then look within subsets of those k's for commonality). Note: cliques are 0-plexes. Each mask SPk,1 masks a 1-plex. Each SPk,1&SPk,2 masks a 2-plex (which is SPSFk,2? So if we save each SPSF instead of overwriting, we have k-plex masks without further work?), etc.
Next, construct predicates for each of the path-related data structures (PT, APT, SPT, SPSF) to make them into pTrees on a k-path table: E, E2, E3, …
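The level-by-level construction of the shortest-path bitmaps with a "shortest paths so far" mask can be sketched as below. This is an illustration under assumptions: bitmaps are Python integers instead of pTrees, the mask is seeded with the start vertex itself, each new level keeps only vertices not already in the mask, and the G5 edge list is inferred from the slides.

```python
def edge_masks(n, edges):
    masks = [0] * (n + 1)  # vertices 1..n; index 0 unused
    for h, k in edges:
        masks[h] |= 1 << k
        masks[k] |= 1 << h
    return masks


def bits(mask):
    """List the vertex numbers whose bit is set in mask."""
    return [v for v in range(mask.bit_length()) if (mask >> v) & 1]


def shortest_path_levels(n, edges):
    """levels[k][h-1] is the bitmap of vertices at distance h from k;
    comp[k] is the final 'so far' mask, i.e. the component of k."""
    e = edge_masks(n, edges)
    levels, comp = {}, {}
    for k in range(1, n + 1):
        spsf = 1 << k                 # reached so far (start vertex included)
        frontier = e[k] & ~spsf
        per_length = []
        while frontier:
            per_length.append(frontier)
            spsf |= frontier
            spb = 0                   # everything one hop past the frontier
            for v in bits(frontier):
                spb |= e[v]
            frontier = spb & ~spsf    # keep only newly reached vertices
        levels[k], comp[k] = per_length, spsf
    return levels, comp


# Assumed G5 edge list (inferred from the slides, not given explicitly).
G5 = [(1, 2), (1, 5), (1, 7), (2, 4), (5, 7), (3, 6), (3, 8), (6, 8)]
LEVELS, COMP = shortest_path_levels(8, G5)
print([bits(m) for m in LEVELS[4]])   # distance 1, 2, 3 from vertex 4
print(bits(COMP[1]), bits(COMP[3]))   # the two connectivity components
```

The final mask for a vertex doubles as its connectivity-component mask, which is how the partition {1,2,4,5,7}, {3,6,8} falls out without a separate traversal.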
12
SG Clique Mining.
[Figure: graphs G2 and G3 (7 vertices) and G4 (8 vertices), each with its edge mask pTree E (PE, EU) laid out on keys (h,k) in raster order, and the clique masks C and CU.]
k=2: the 2Cliques (2 vertices) are the edges. Find the endpoints of each edge from its raster position n: (Int((n-1)/7)+1, Mod(n-1,7)+1).
7-vertex example. k=2: E = … (already have). k=3: PE(2,3)=1, so 123 ∈ CS3. PE(2,4)=1, so 124 ∈ CS3. PE(1,4)=1, so 134 ∈ CS3. PE(2,3)=1, so 234 ∈ CS3. PE(6,7)=1, so 567 ∈ CS3. PE(2,6)=0, PE(1,7)=0, PE(1,5)=0. k=4: PE(2,4)=1, so 1234 ∈ CS4. 123, 124, 134, 234 and 1234 are cliques; 1234 is the only 4-clique.
Using the EdgeCount theorem on C={1,2,3,4}: CU=C&EU, and C is a clique since ct(CU)=comb(4,2)=4!/(2!2!)=6.
EdgeCount Algorithm (EC): if |PUC| = (k+1)!/((k-1)!2!) then C ∈ CCS. EC requires counting 1's in the mask pTree of each subgraph (or candidate clique, if we take the time to generate the CCSs; but then clearly the fastest way to finish up is simply to look up the single bit position in E, i.e., use EC).
The SG algorithm only needs the Edge Mask pTree, E, and a fast way to find those pairs of subgraphs in CSk that share k-1 vertices (then check E to see if the two different kth vertices are an edge in G). Again, this is a standard part of the Apriori ARM algorithm and has therefore been optimized and engineered ad infinitum!
G4 (8-vertex) example. k=2: E (have). k=3: PE(2,3)=1, so 123 ∈ CS3. PE(2,4)=1, so 124 ∈ CS3. PE(2,8)=1, so 128 ∈ CS3. PE(1,4)=1, so 134 ∈ CS3. PE(3,8)=1, so 138 ∈ CS3. PE(4,8)=1, so 148 ∈ CS3. PE(2,3)=1, so 234 ∈ CS3. PE(3,8)=1, so 238 ∈ CS3. PE(4,8)=1, so 248 ∈ CS3. PE(4,8)=1, so 348 ∈ CS3. PE(6,7)=1, so 567 ∈ CS3. PE(2,6)=0, PE(1,5)=0, PE(1,7)=0, PE(6,8)=0. k=4: PE(2,4)=1, so 1234 ∈ CS4. PE(3,8)=1, so 1238 ∈ CS4. PE(4,8)=1, so 1248 ∈ CS4. PE(3,8)=1, so 1348 ∈ CS4. PE(4,8)=1, so 2348 ∈ CS4. k=5: PE(4,8)=1, so 12348 ∈ CS5 = CS5.
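The Apriori-style join behind the SG algorithm (pair two k-cliques that share k-1 vertices, then look up one bit in E for the two differing vertices) can be sketched as follows. The edge set stands in for the edge mask pTree, the function names are made up for the sketch, and the example graph (a 4-clique, a triangle, and one connecting edge) is hypothetical, not one of the slide's graphs.

```python
from itertools import combinations


def endpoints(n, size):
    """Endpoints of raster position n (1-based) in a size x size key table."""
    return ((n - 1) // size + 1, (n - 1) % size + 1)


def cliques_by_size(edges):
    """Return {k: sorted list of k-cliques as sorted tuples}, starting at k=2."""
    edge_set = {frozenset(e) for e in edges}
    level = sorted(tuple(sorted(e)) for e in edge_set)  # CS2 = the edges
    result, k = {}, 2
    while level:
        result[k] = level
        joined = set()
        for a, b in combinations(level, 2):
            # Same first k-1 vertices, and the two last vertices form an edge.
            if a[:-1] == b[:-1] and frozenset((a[-1], b[-1])) in edge_set:
                joined.add(a + (b[-1],))
        level = sorted(joined)
        k += 1
    return result


# Hypothetical example: K4 on {1,2,3,4}, triangle {5,6,7}, and edge (4,5).
EXAMPLE = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4),
           (4, 5), (5, 6), (5, 7), (6, 7)]
CS = cliques_by_size(EXAMPLE)
print(CS[3])
print(CS[4])
print(endpoints(9, 7))
```

Because each level is kept sorted, two cliques with the same prefix always meet with the smaller last vertex first, so every (k+1)-clique is generated exactly once from its two lexicographically smallest k-subsets.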
13
The EdgepTree (E), PathTree (PT), ShortestPathTree (SPT), AcyclicPathTree (APT) and CycleList (CL) of a graph, G5.
[Figure: G5 (vertices 1-8) with EG5 (2-level, stride=8), PTG5, APTG5, SPTG5 and the cycle list CLG5.]
CLG5: 1571 1751 3683 3863 5175 5715 6386 6836 7157 7517 8368 8638.
PT Clique Miner Algorithm: a clique is all cycles. Extend to a k-plex (k-core) mining algorithm? PT (=APT+CL) and SPT are powerful data mining tools with closure properties (to eliminate branches).
Max clique mining: a kCycle is a kClique iff it is found in CLk as PERM(k-1,k-1)/2=(k-1)!/2 kCycles (e.g., vertices are repeated in CL for 3cycles, 2!/2=1; 4cycles, 3!/2=3; 5cycles, 4!/2=12; 6cycles, 5!/2=60).
Downward closure: once a 4cycle is established as a 4clique (by the fact that {1,2,3,4} occurs 3!/2=3 times in CL), all 3-vertex subsets are 3cliques ({1,2,3}, {1,2,4}, {1,3,4}, {2,3,4}), so there is no need to check further.
k-plex (missing k edges) mining alg? k-core (has k edges) mining alg? Density (internal edge density >> external|avg) mining alg? Degree (internal vertex degree >> external|avg) mining alg?
DiameterG5 is max{Diameterk} = max{2,2,1,3,2,1,3,1}=3. The connected component containing V1 is COMP1={1,2,4,5,7}. Pick the 1st vertex not in COMP1, which is 3: COMP3={3,6,8}. Done. The partition is { {1,2,4,5,7}, {3,6,8} }. To pick the first vertex not in COMP1, mask off COMP1 with SPTv1' and then pick the first vertex in this complement.
14
cycles in blue (not in APT)
[Figure: G6 (vertices 1-9, a-g) with E=A1Ps, the acyclic path pTrees A2Ps through A6Ps, the shortest path pTrees SP1 through SP6, and the cumulative masks SP1&2, SP1&2&3, SP1&2&3&4 and SP1&2&3&4&5 (COMPLETE). Cycles are shown in blue (not in APT).]
15
All Shortest Path pTrees for a unipartite undirected graph, G7 (SP1, SP2, SP3, SP4, SP5)
[Figure: G7 with its SP1 (=1deg), SP2 (=2deg), SP3 (=3deg), SP4 (=4deg) and SP5 (=5deg) pTrees. SP4: 10,25,26,28,29,33,34 not shown (only 17 on). SP5: 15,16,19,21,23,24,27,30 (only 17 on).]
16
G8 Trying Hamming Similarity to detect communities on G7 and G8 40 41
Zachary's karate club, a standard benchmark in community detection (best partition found by optimizing the modularity of Newman and Girvan).
[Figure: G7 and G8 with vertex degree tables (Deg) and shortest-path levels marked =1deg, =2deg, =3deg, =4deg, =5deg.]
Hamming similarity: S(S1,S2)=DegkDif(S1,S2).
To produce an [all?] actual shortest path[s] between x and y:
Thm: To produce a [all?] S2P[s], take a [all?] middle vertex[es], x1, from SP1x & SP1y and produce xx1y. To produce a [all?] S3P[s], take a [all?] vertex[es], x1, from SP1x and a [all?] vertex[es], x2, from S2P(x1,y), and produce xx1x2y; etc. Is it productive to actually produce (one time) a tree of [all?] shortest paths? I think it is not!
We can see that Hamming similarity works poorly at 1. Not working! On the other hand, our standard community mining techniques (for kplexes) worked well on G7. On the next slide, let's try Hamming on G8.
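The theorem above can be read as a recursion: the next vertex on a shortest path from x to y is any neighbor of x that is one step closer to y. The sketch below assumes that reading, represents each "distance h from k" bitmap as a Python integer, and uses a hypothetical 4-cycle as the example; the function names are invented for the illustration.

```python
def distance_levels(n, edges):
    """levels[k][h-1]: bitmap of vertices at shortest-path distance h from k."""
    e = [0] * (n + 1)
    for h, k in edges:
        e[h] |= 1 << k
        e[k] |= 1 << h
    levels = {}
    for k in range(1, n + 1):
        reached, frontier, per_length = 1 << k, e[k] & ~(1 << k), []
        while frontier:
            per_length.append(frontier)
            reached |= frontier
            step = 0
            for v in range(1, n + 1):
                if (frontier >> v) & 1:
                    step |= e[v]
            frontier = step & ~reached
        levels[k] = per_length
    return levels


def all_shortest_paths(x, y, levels, n):
    """Every shortest path from x to y as a list of vertices."""
    if x == y:
        return [[x]]
    d = next((h for h, m in enumerate(levels[y], start=1) if (m >> x) & 1), None)
    if d is None:
        return []                       # x and y are in different components
    if d == 1:
        return [[x, y]]
    # Neighbors of x that lie at distance d-1 from y (the 'middle' vertices).
    middle = levels[x][0] & levels[y][d - 2]
    paths = []
    for x1 in range(1, n + 1):
        if (middle >> x1) & 1:
            paths += [[x] + rest for rest in all_shortest_paths(x1, y, levels, n)]
    return paths


# Hypothetical example: the 4-cycle 1-2-3-4-1.
SQUARE = [(1, 2), (2, 3), (3, 4), (4, 1)]
SQUARE_LEVELS = distance_levels(4, SQUARE)
print(all_shortest_paths(1, 3, SQUARE_LEVELS, 4))
```

For length 2 the `middle` mask is exactly SP1x & SP1y from the theorem; longer paths reuse the same AND one level further out, so the paths are produced on demand rather than stored as a tree.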
17
G9 G9, Agglomerative clustering of ESP2 using Hamming Similarity
In ESP2, using Hamming similarity, we get three Event clusters, clustering events iff their pTrees are [Hamming] identical: EventCluster1={1,2,3,4,5} EventCluster2={6,7,8,9} EventCluster3={10,11,12,13,14}.
[Figure: G9 with the ESP and WSP pTrees (Women W = 1-18, Events E = 1-14).]
The Degree % of affiliation of Women with R, G, B events is:
W    R      G      B
1    100%   75%    0%
2    …%     75%    0%
3    …%     100%   0%
4    …%     75%    0%
5    60%    25%    0%
6    …%     50%    0%
7    …%     75%    0%
8    …%     75%    0%
9    …%     75%    0%
10   …%     75%    20%
11   0%     50%    40%
12   0%     50%    80%
13   0%     75%    80%
14   0%     75%    100%
15   0%     50%    60%
16   0%     50%    0%
17   0%     25%    20%
18   0%     25%    20%
ESP3=ESP1’ and ESP4=ESP2’ so again, in this case, all information is already available in ESP1 and ESP2 (all shortest paths are of length 1 or 2). We don't need ESPk for k>2.
WSP3=WSP1’ and WSP4=WSP2’ so, in this case, all information is already available in WSP1 and WSP2 (all shortest paths are of length 1 or 2). We don't need WSPk for k>2.
Clustering Women using Degree% RGB affiliation: WomanClusterR={1,2,4,5} WomanClusterG={3,6,7,8,9,10,11,16,17,18} WomanClusterB={12,13,14,15}. This clustering seems fairly close to the author's. Other methods are possible, and if another method puts event 6 with 12345, then everything changes and the result seems even closer to the author's intent.
18
G9: K-plex search on G9 (a k-plex is a SG missing k edges)
If H is a k-plex and F is an ISG, then F is a kplex. A graph (V,E) is a k-plex iff |V|(|V|-1)/2 – |E| ≤ k.
[Figure: G9 with ESP2 (Events 1-14) and WSP2 (Women 1-18) pTrees.]
Search on ESP2 (degrees in base-19 digits):
Events …abcde: 14*13/2=91, degs=88888dddd88888, |Edges|=66, kplex k=25.
Events …abcde: degs= 7777cccc… (not calculating k until it gets lower).
Events …abcde: degs= 666bbbb88888.
Events 456789abcde: degs= 55aaaa88888.
Events 56789abcde: degs= ….
Events 6789abcde: 9*8/2=36, |Edges|=36, kplex k=0. A 9Clique! So take out {6789abcde} and start over.
Events …: 5*4/2=10, degs: 44444, |Edges|=10, kplex k=0. A 5clique!
If we had used the full algorithm, which pursues each minimum degree tie path, one of them would start by eliminating 14 instead of 1. That will result in the 9Clique … and the 5Clique abcde. All the other 8 ties would result in one of these two situations. How can we know that ahead of time and avoid all those unproductive minimum degree tie paths?
Search on WSP2:
Women …abcdefghi: 18*17/2=153, degs=hfhfbffghhgghhhgcc, |Edges|=139, kplex k=14.
Women …abcdefgh: degs=gfgfbfffggffgggfc.
Women …abcdefg: degs=ffffbffeffeefffe.
Women …abcdefg: 15*14/2=105, degs=eeeeeeeeeeeeeee, |Edges|=…, kplex k=0. A 15Clique. So take out {…abcdefg} and start over.
Women 5hi: 3*2/2=3, degs=011, |Edges|=1, kplex k=2.
Women hi: 2*1/2=1, degs=11, |Edges|=1, kplex k=0. Clique.
We get no information from applying our kplex search algorithm to WSP2. Again, how could we know this ahead of time to avoid all the work? Possibly by noticing the very high 1-density of the pTrees (only 28 zeros)?
Every ISG of a Clique is a Clique, so 6789 and 789 are Cliques (which seems to be the author's intent?). If the goal is to find all maximal Cliques, how do we know that CA=… is maximal? If it weren't, then there would be at least one of abcde which, when added to CA=…, would result in a 10Clique.
Checking a: PCA&Pa would have to have count=9 (it doesn't; it has count=5) and PCA(a) would have to be 1 (it isn't; it's 0). The same is true for bcde. The same type of analysis shows 6789abcde is maximal. I think one can prove that any Clique obtained by our algorithm would be maximal (without the above expensive check), since we start with the whole vertex set and throw out one vertex at a time until we get a clique, so it has to be maximal?
The Women associated strongly with the blue EventClique, abcde, are { } and associated but loosely are { }. The Women associated strongly with the green EventClique, …, are { } and associated but loosely are {6 7 9}.
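The search traced above (start from the whole vertex set, repeatedly throw out a minimum-degree vertex, stop when no edges are missing) can be sketched as follows. This follows a single tie path only, breaking ties by lowest vertex label, which is an assumption; it does not check maximality, and the example graph is hypothetical, not G9.

```python
from math import comb


def peel_to_clique(vertices, edges):
    """Remove minimum-degree vertices until the induced subgraph is a clique.

    Returns (clique, trace) where each trace entry is
    (vertex count, edge count, missing edges k) before a removal decision.
    """
    adj = {v: set() for v in vertices}
    for h, k in edges:
        adj[h].add(k)
        adj[k].add(h)
    verts, trace = set(vertices), []
    while True:
        deg = {v: len(adj[v] & verts) for v in verts}
        edge_ct = sum(deg.values()) // 2
        k = comb(len(verts), 2) - edge_ct       # the slide's kplex number
        trace.append((len(verts), edge_ct, k))
        if k == 0:
            return sorted(verts), trace
        # Tie-break on the lowest label (one tie path only, by assumption).
        verts.remove(min(verts, key=lambda v: (deg[v], v)))


# Hypothetical example: K4 on {1,2,3,4} with a tail 4-5-6.
TAILED = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5), (5, 6)]
CLIQUE, TRACE = peel_to_clique(range(1, 7), TAILED)
print(CLIQUE)
print(TRACE)
```

The trace mirrors the per-step lines on the slide: vertex count, edge count, and the missing-edge count k, which drops to 0 when a clique is reached.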
19
G10 E=SP1 2level pTrees LevelOneStride=19 (labelled 0-i), Level0Stride=10 (labelled 0-9)
Note: SP1 should be called S1PDV for “Shortest 1 Path Destination Vertices”, because each one, e.g. S1PDV(v1), maps all such destination vertices from the given starting vertex, v1.
G10: Web graph of the pages of a website and their hyperlinks. Communities by color (Girvan-Newman algorithm). |V|=180 (1-i0) and |E|=266. Vertices with OutDeg=0 (leaves) do not have pTrees shown because pTrees display only OutEdges, and thus those with OD=0 have a pure0 pTree.
[Figure: G10 with, for each non-leaf vertex, its OutDeg and its 2-level pTree (a tens-digit level labelled 0-i and units-digit levels labelled 0-9).]
20
G10 leaves (OutDegree=0):
[Figure: G10 E=SP1 2-level pTrees, LevelOneStride=19 (labelled 0-i), Level0Stride=10 (labelled 0-9), showing for each vertex its OutDeg (OD), its L1 bit vector and its L0 bit vectors.]
G10 leaves (OutDegree=0): a3 a6 a8 a9 b0 b7 b8 b9 e7 e8 e9 f0 f1 f2 f3 f4 f5 f8 f9 g0 g8 g9 h7
21
[Figure: G10 as lists. For each vertex, the E=SP1 list (direct out-neighbors) and the SP2 list (vertices reached by shortest paths of length 2).]
OD=0: a3 a6 a8 a9 b0 b7 b8 b9 e7 e8 e9 f0 f1 f2 f3 f4 f5 f8 f9 g0 g8 g9 h7
22
[Figure: G10 Edge Matrix (EM), with an asterisk marking each edge.]
Raster ordering of EM gives the E table: cardinality(E) = 180*180 = 32,400.
23
GN: Remove the edge with the largest betweenness. Recalculate betweennesses. Repeat.
Use both a fore and an aft pTree.
[Figure: G1_3 (vertices 1-5) with its edge key table Ekey (1,1) through (5,5), the edge pTree E, and the S, P and C columns.]
To construct SPPC(hk) = SPPC(kh) (Shortest Path Participation Count) if (hk) ∈ E: ct = 1 + CtS2P(hk) + CtS2P(kh) + CtS3P(hkg) + CtS3P(ghk), for all g, + CtS4P(hkfm) + CtS4P(fhkm) + CtS4P(fmhk), for all f,m. Etc.
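As a cross-check for SPPC values on small graphs, the count can be computed without pTrees. The sketch below assumes SPPC(hk) is the number of shortest paths, over all unordered vertex pairs, that traverse edge (h,k); it uses breadth-first search with path counting rather than the fore/aft pTree construction, and the example graph is hypothetical.

```python
from collections import deque


def bfs_counts(adj, s):
    """Distances from s and the number of shortest paths from s to each vertex."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v], sigma[v] = dist[u] + 1, 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma


def sppc(vertices, edges):
    """Shortest Path Participation Count for every edge."""
    vertices = list(vertices)
    adj = {v: [] for v in vertices}
    for h, k in edges:
        adj[h].append(k)
        adj[k].append(h)
    info = {s: bfs_counts(adj, s) for s in vertices}
    counts = {}
    for h, k in edges:
        total = 0
        for s in vertices:
            ds, ss = info[s]
            for t in vertices:
                if s < t and t in ds:            # unordered pair, same component
                    dt, st = info[t]
                    for u, v in ((h, k), (k, h)):  # both traversal directions
                        if u in ds and v in dt and ds[u] + 1 + dt[v] == ds[t]:
                            total += ss[u] * st[v]
        counts[(h, k)] = total
    return counts


# Hypothetical example: two triangles joined by the bridge (3, 4).
TWO_TRIANGLES = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
COUNTS = sppc(range(1, 7), TWO_TRIANGLES)
print(max(COUNTS, key=COUNTS.get), COUNTS)
```

On this toy graph the bridge carries all nine cross-triangle shortest paths, so it has the largest count and would be the first edge removed under the Girvan-Newman rule.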
24
G7 MCFC: Delete the edge(s) with the Minimum # of Common First Cousins, where CFC(h,k) = S2P(h) & S2P(k).
[Figure: edge (h,k) with neighbors a b c d e f g i j and all S2P paths; S2P(h) = blue and orange, S2P(k) = red and green.]
25
Divisive Graph Clustering: Girvan and Newman (Girvan and Newman, 02; 04) delete edges with max “betweenness”, i.e., max participation in shortest paths (of all lengths). Edges are deleted based on a measure of edge betweenness:
1. Compute the edge betweenness for all edges;
2. Remove the edge with the largest betweenness; in case of ties with other edges, one is picked at random;
3. Recalculate betweenness on the running graph;
4. Iterate the cycle from step 2.
We look for situations where pTrees give us an advantage. Can SPPC (Shortest Path Participation Count) be constructed with pTrees more efficiently? What other measure can pTrees compute much more efficiently that can help choose the best edge to delete? Later we will try finding the edge with maximum “Fore-Aft” Shortest Path Participation Difference in S1P, S2P, S3P,… (or some combination). pTrees should provide great advantage in the calculation of FAD(h,k). The other important question to answer is: does it create a good clustering?
[Figure residue: key table of pairs 1,1 through 7,7 in raster order with columns E, SP2, SP3, SP and SPPC and their counts (ct); cell values not recoverable.]
While constructing Shortest Path pTrees, SP2…, record the Shortest Path Participation Count (SPPC) of each edge. The edge(s) with max SPPC should be the best candidates for removal?
SP gives the connectivity component partition: CC(1)={1,2,3,4}, a 0plex since EdgeCt=12=2*COMBO(4,2); CC(5)={5,6,7}, a 1plex since EdgeCt=4=2*(COMBO(3,2)-1).
We will try FAD(h,k) = |S1P(h)&S1P(k)| / |S1P(h)|*|S1P(k)|. Or use S2P? Or both? Or S3P?
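A minimal sketch of the S1P-based FAD measure, with Python integers used as bitmasks in place of pTrees. Three assumptions are mine and not confirmed by the slides: the denominator is read as the product |S1P(h)|*|S1P(k)|, S1P(h) excludes h itself, and the edge with the minimum value is the deletion candidate (following the "minimum Fore-Aft co-participation" wording of the first slide). The graph and function names are hypothetical.

```python
from fractions import Fraction

def s1p_masks(n, edges):
    """mask[h] has bit k set when (h, k) is an edge; index 0 is unused."""
    mask = [0] * (n + 1)
    for h, k in edges:
        mask[h] |= 1 << k
        mask[k] |= 1 << h
    return mask

def popcount(x):
    return bin(x).count("1")

def fad(s1p, h, k):
    """|S1P(h) & S1P(k)| / (|S1P(h)| * |S1P(k)|) for an existing edge (h, k)."""
    common = popcount(s1p[h] & s1p[k])          # one AND plus a one-count
    return Fraction(common, popcount(s1p[h]) * popcount(s1p[k]))

# Two triangles joined by the bridge (3, 4).
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
s1p = s1p_masks(6, edges)
scores = {e: fad(s1p, *e) for e in edges}
weakest = min(scores, key=scores.get)
print(scores)
print(weakest)
```

Each score costs a single AND and three one-counts, which is the kind of calculation the slides expect pTrees to make cheap. Whether the resulting clustering is good is the separate question raised above.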
[Figure: graph G2 on vertices 1-7 with pTree columns E, SP2, SP3 and SP and their counts (ct); cell values not recoverable.]
SP = SP1 | SP2 | SP3. SP gives the connectivity component partition: CC(1) = {1} ∪ List(SP(1)) = {1,2,3,4,5,6,7}, a 12plex since EdgeCt = 9 = COMBO(7,2) - 12.
[SPPC column with counts; values not recoverable.] Delete (1,6) and do over.
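The component partition from SP = SP1 | SP2 | SP3 | ... can be sketched by OR-ing successive frontiers until nothing new is reached. Integers serve as bitmask stand-ins for pTrees. The edge list is a hypothetical graph chosen so that its components match CC(1)={1,2,3,4} and CC(5)={5,6,7} from the earlier slide, and the "missing edges" count is my reading of the kplex number (COMBO(n,2) minus the undirected edge count).

```python
from math import comb

def s1p_masks(n, edges):
    """mask[h] has bit k set when (h, k) is an edge; index 0 is unused."""
    mask = [0] * (n + 1)
    for h, k in edges:
        mask[h] |= 1 << k
        mask[k] |= 1 << h
    return mask

def bits(mask):
    return [v for v in range(mask.bit_length()) if mask >> v & 1]

def reach_mask(s1p, h):
    """{h} together with SP1(h) | SP2(h) | ... : everything reachable from h."""
    reached = 1 << h
    frontier = s1p[h] & ~reached
    while frontier:
        reached |= frontier
        nxt = 0
        for v in bits(frontier):     # OR together the neighbour masks of the frontier
            nxt |= s1p[v]
        frontier = nxt & ~reached    # keep only newly reached vertices
    return reached

def components(n, edges):
    s1p = s1p_masks(n, edges)
    seen, comps = 0, []
    for h in range(1, n + 1):
        if seen >> h & 1:
            continue
        r = reach_mask(s1p, h)
        seen |= r
        comps.append(bits(r))
    return comps

def missing_edges(comp, edges):
    """COMBO(|comp|, 2) minus the undirected edges inside comp; 0 means a clique."""
    inside = sum(1 for h, k in edges if h in comp and k in comp)
    return comb(len(comp), 2) - inside

edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (5, 6), (6, 7)]
for comp in components(7, edges):
    print(comp, missing_edges(comp, edges))
```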
26
GN Delete max SPPC edge. Recalc SPPCs. Repeat.
Divisive Graph Clustering
[Figure residue: Ekey tables (vertex pairs in raster order) with E and SPPC columns, a 2Pkey table (triples 1,1,1 through 4,4,4), and S1P/S2P/S3P pTree diagrams for graphs G1, G1_1, G1_2, G1_3 and G1_4; cell values not recoverable.]
Check SPPC(34)=SPPC(43) (verify that SPs backwards from hk get counted): (34)∈E so ct=1; + CountS2P(34)=1 + CountS2P(43)=1, so ct=3; + CtS3P(34g)=0 + CtS3P(g34)=1 for g=1, so ct=4.
Callouts on the graphs, in slide order: GN says delete (3,4)! GN says delete any edge! GN says delete 12 | 25 | 34 | 36. GN: delete 12 | 23 | 25, not 34, 45. Not 23, 16, 45.
To construct SPPC(hk) = SPPC(kh) (Shortest Path Participation Count): if (hk)∈E, count = 1
+ OneCountS2P(hk) + OneCountS2P(kh)
+ OneCountS3P(hkg) + OneCountS3P(ghk), for all g
+ OneCountS4P(hkfm) + OneCountS4P(fhkm) + OneCountS4P(fmhk), for all f,m. Etc.
SPPC recalculation and repeat steps?
Anyone see a shortcut? Or do we just start the calculation over on the reduced graph? Do the pointers help? In S2P(hk) one has to search out S2P(kh), and in S3P(hk) one has to find all S3P(hkg) and S3P(ghk) for all g. In the appendix I begin work on uniquely representing shortest k paths using both a fore and aft pTree. Consider that in G1_4 S3P(16)=2.
Notes: If any OneCount=0, no subsequence exists. It might be useful to use pointers to make this process easier.
GN edge betweenness specifies pruning (2,4).
[Figure residue: S3P pTree diagram and repeated G1_3 vertex labels 1-5.]