Shortest Path Trees Construction

(We don't need the Path Trees to get the Shortest Path Trees, because a subpath of a shortest path is itself a shortest path.)

[Figure: graph G6 on vertices 1-9, a, b, c, with the bit matrices SkP, SPSFk and SPSFk' for vertices 1, 2 and 3.]

Step 1: S1P = E
Step 2: S2P_1 = SPSF1'_1 & (OR over j in S1P_1 of E_j)
        S2P_2 = SPSF1'_2 & (OR over j in S1P_2 of E_j)
        S2P_3 = SPSF1'_3 & (OR over j in S1P_3 of E_j)
Step 3: S3P_1 = SPSF2'_1 & (OR over j in S2P_1 of E_j)
        S3P_3 = SPSF2'_3 & (OR over j in S2P_3 of E_j)
Step 4: S4P_1 = SPSF3'_1 & (OR over j in S3P_1 of E_j)
        S4P_3 = SPSF3'_3 & (OR over j in S3P_3 of E_j)

Done with Vertex 1 Shortest Paths. Diam(1) = 4. Done with Vertex 3 Shortest Paths.

What is the cost of creating the SPs? For each v in V there are about AvgDiam = Avg{Diam(v) | v in V} steps, and each step costs:
1 complement of SPSF (cost = compl)
1 OR of about Avg|E_k| pTrees (cost = OR * Avg|E_k|)
1 AND of SPSF' with the OR result above (cost = AND)
1 OR to update SPSF (cost = OR)

Cost = |V| * AvgDiam * (compl + OR*AD + AND + OR), so O(|V|), i.e., linear in the number of vertices, assuming AD = AvgDeg and AvgDiam are small. This is a one-time construction that parallelizes over the vertices.

For Friends, it is B*4*(3*pTOP + AD*pTOP) = 4B*(3 + AD)*pTOP = B*pTOP*(12 + 4AD), where pTOP is the cost of a pTree operation (complement, &, OR) and B = billion. Parallelized over an n-node cluster, this one-time Shortest Path Tree construction cost would be B*pTOP*(12 + 4*AvgDeg) / n.

The SnPs capture only the shortest path lengths between all pairs of vertices. We could have captured the actual shortest paths (all shortest paths? all paths in the PTs?), since we construct (but do not retain) that information along the way. How should it be structured, indexed, or residualized?
Vertices 4-c: SPs are done the same way.

SPSF1_i = S1P_i OR M_i, where M_i has a 1 only at position i
S(k+1)P_i = SPSFk'_i & (OR over j in SkP_i of E_j)
SPSF(k+1)_i = SPSFk_i OR S(k+1)P_i

"The mask pTree of the shortest (k+1)-paths starting at vertex i is the complement of the Shortest Paths So Far, ANDed with the OR of the edge pTrees E_j over all j in the Shortest k Path list of i."
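The recurrence above can be sketched in a few lines. This is a minimal illustration, not the pTree implementation: Python integers stand in for pTrees (bit t of a mask represents vertex t), and the names `masks_from_edges` and `shortest_path_levels` are hypothetical helpers introduced only for this sketch.

```python
def masks_from_edges(n, edges):
    """Build E_j for j = 0..n-1 as int bitmasks from an undirected edge list."""
    masks = [0] * n
    for a, b in edges:
        masks[a] |= 1 << b
        masks[b] |= 1 << a
    return masks


def shortest_path_levels(adj_masks, i):
    """Return [S1P_i, S2P_i, ...] as bitmasks for source vertex i."""
    n = len(adj_masks)
    full = (1 << n) - 1
    level = adj_masks[i]            # S1P_i = E_i
    so_far = level | (1 << i)       # SPSF1_i = S1P_i OR M_i
    levels = []
    while level:
        levels.append(level)
        reach = 0
        for j in range(n):          # OR of E_j over all j in SkP_i
            if (level >> j) & 1:
                reach |= adj_masks[j]
        level = (full & ~so_far) & reach   # S(k+1)P_i = SPSFk'_i & (OR ...)
        so_far |= level                    # SPSF(k+1)_i = SPSFk_i OR S(k+1)P_i
    return levels


# Hypothetical 4-vertex path graph 0-1-2-3 (not G6).
PATH_MASKS = masks_from_edges(4, [(0, 1), (1, 2), (2, 3)])
LEVELS_FROM_0 = shortest_path_levels(PATH_MASKS, 0)
```

The number of levels returned for vertex i plays the role of Diam(i) on the slide, and each loop iteration uses exactly the four operation kinds counted in the cost estimate (one complement, one multi-way OR, one AND, one OR).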
K-plex Search on G6: A k-plex is a subgraph missing k edges.

All subgraphs are induced subgraphs (ISGs), defined by their vertex sets. Subgraph S has |E_S| = s edges and |V_S| = v vertices. S is a kplex iff C(v,2) - s = v(v-1)/2 - s ≤ k.
If S is a kplex and S' adds one vertex x to S (V(S') = V(S) ∪ {x}), then S' is a kplex iff (v+1)v/2 - (deg(x,S') + s) ≤ k.

[Figure: graph G6 on vertices 1-9, a, b, c, with bit matrices SP1 = E, SP2, SP3, SP4.]

Edges are 1-plexes. |E{123}| = |PE123| = 3, so 123 is a 0plex (clique) and a 1plex. |E{124}| = |PE124| = 3, so 124 is a 0plex (clique).

If H is an ISG with |V_H| = h vertices and |E_H| edges, then H is a kplex iff C(h,2) - |E_H| ≤ k, where C(h,2) = h(h-1)/2.
If H is a kplex and F is an ISG of H, then F is a kplex: F inherits all H edges involving its vertices, so if F is missing an edge then H is missing that edge also, and F cannot be missing more edges than H.

Search procedure: If G isn't a kplex, let F1 be the ISG of G with a vertex of least degree removed. If F1 isn't a kplex, let F2 be the ISG of F1 with a vertex of least degree removed, and so on, until we find Fj to be a kplex. Remove Fj. Repeat until all vertices are removed.

We did a k-plex search of G6 by simply calculating edge counts (which are 1-counts of ANDed pTrees), using only SP1 = E.

Round 1:
G = ISG{123456789abc}: C(12,2) = 66, |E| = 19. G is a kplex for k ≥ 47.
H1 = ISG{12346789abc} (deg5 = 2): C(11,2) = 55, |E| = 17. H1 is a kplex for k ≥ 38.
H2 = ISG{1234789abc} (deg6 = 2): C(10,2) = 45, |E| = 15. H2 is a kplex for k ≥ 30.
H3 = ISG{123489abc} (deg7 = 1): C(9,2) = 36, |E| = 14. H3 is a kplex for k ≥ 22.
H4 = ISG{12389abc} (deg4 = 2): C(8,2) = 28, |E| = 12. H4 is a kplex for k ≥ 16.
H5 = ISG{1239abc} (deg8 = 2): C(7,2) = 21, |E| = 10. H5 is a kplex for k ≥ 11.
H6 = ISG{239abc} (deg1 = 2): C(6,2) = 15, |E| = 8. H6 is a kplex for k ≥ 7.
H7 = ISG{39abc} (deg2 = 1): C(5,2) = 10, |E| = 7. H7 is a kplex for k ≥ 3.
H8 = ISG{9abc} (deg3 = 1): C(4,2) = 6, |E| = 6. H8 is a kplex for k ≥ 0.
So take out {9abc} and start over.

Round 2:
G = ISG{12345678}: C(8,2) = 28, |E| = 10. G is a kplex for k ≥ 18. deg = 33322331
H1 = ISG{1234567} (deg8 = 1): C(7,2) = 21, |E| = 9. H1 is a kplex for k ≥ 12. deg = 2223223
H2 = ISG{234567} (deg1 = 2): C(6,2) = 15, |E| = 6. H2 is a kplex for k ≥ 9. deg = 112223
H3 = ISG{34567} (deg2 = 1): C(5,2) = 10, |E| = 4. H3 is a kplex for k ≥ 6. deg = 01222
H4 = ISG{4567} (deg3 = 0): C(4,2) = 6, |E| = 4. H4 is a kplex for k ≥ 2. deg = 1222
H5 = ISG{567} (deg4 = 1): C(3,2) = 3, |E| = 3. H5 is a kplex for k ≥ 0. deg = 222
So take out {567} and start over.

Round 3:
G = ISG{12348}: C(5,2) = 10, |E| = 5. G is a kplex for k ≥ 5. deg = 33220
H1 = ISG{1234} (deg8 = 0): C(4,2) = 6, |E| = 5. H1 is a kplex for k ≥ 1. deg = 3322
H2 = ISG{124} (deg3 = 2): C(3,2) = 3, |E| = 3. H2 is a kplex for k ≥ 0. deg = 222

This is exactly what we want! 1234 is a 1plex (missing only 1 edge) and 124 was determined to be a clique (0plex, missing no edges). It would have been great if 123 had revealed itself as a clique also, and if 89abc had been detected as a 1plex before 9abc was detected as a clique. How might we make progress in these directions? Try returning to remove all degree ties before moving on? We will try that on the next slide.
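The peel-and-restart procedure can be sketched as follows. This is an illustrative reading of the slide, with assumptions: plain Python sets replace pTree 1-counts, a fixed k is passed in (the slide instead reports the smallest k at every step), ties in least degree are broken by smallest label, and the example graph is a hypothetical pair of triangles, not G6. The function names are made up for this sketch.

```python
def missing_edges(vertices, edges):
    """C(v,2) - s for the subgraph induced on `vertices`."""
    vs = set(vertices)
    s = sum(1 for a, b in edges if a in vs and b in vs)
    v = len(vs)
    return v * (v - 1) // 2 - s


def peel_to_kplex(vertices, edges, k):
    """Remove a least-degree vertex until at most k edges are missing."""
    vs = set(vertices)
    while vs and missing_edges(vs, edges) > k:
        deg = {x: 0 for x in vs}
        for a, b in edges:
            if a in vs and b in vs:
                deg[a] += 1
                deg[b] += 1
        # Assumption: break degree ties by taking the smallest label.
        vs.remove(min(sorted(vs), key=deg.get))
    return vs


def kplex_search(vertices, edges, k):
    """Find a kplex by peeling, take it out, and start over on the rest."""
    remaining = set(vertices)
    found = []
    while remaining:
        plex = peel_to_kplex(remaining, edges, k)
        found.append(sorted(plex))
        remaining -= plex
    return found


# Hypothetical graph: triangles {1,2,3} and {4,5,6} joined by edge 3-4.
EDGES = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
CLIQUES = kplex_search(range(1, 7), EDGES, 0)
```

With a different tie-breaking rule the search can surface different kplexes first, which is the issue the slide raises about 123 versus 124.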
G7: Very Simple Weighted SP1 k-plex Search on G7

Weighting: 0-path and 1-path neighbors of x are weighted 1; 2-path neighbors of x are weighted 0.

[Figure: graph G7 and its SP1 bit matrix, vertices 1-34.]

Each step lists C(h,2) for the h remaining vertices, the edge count |E|, the resulting kplex bound, the kcore value given on the slide, and the degree row D (one base-65 symbol per vertex column, reproduced as on the slide). "Cut 123" removes the vertices whose current degree is 1, 2 or 3, and so on.

Pass 1, columns 1-34:
Start:   C(h,2) = 561, |E| = 77, kplex for k ≥ 484, kcore 77. D = g9a63444523125222223222533243446bg
Cut 123: C(h,2) = 120, |E| = 38, kplex for k ≥ 82, kcore 38. D = 9685322452322522222322243323334367
Cut 23:  C(h,2) = 55, |E| = 26, kplex for k ≥ 29, kcore 26. D = 6675322452322522222322223323334344
Cut 24:  C(h,2) = 15, |E| = 12, kplex for k ≥ 3, kcore 12. D = 5454322422322422222322223323334344
Cut 2:   C(h,2) = 10, |E| = 10, kplex for k ≥ 0, kcore 10. D = 4444322422322422222322223323334344
{1,2,3,4,14} is a clique. {1,2,3,4,9,14} is a 3plex.

Pass 2, columns 5-13 and 15-34:
Start:   D = 232031200222021202533232435af
Cut 012: C(h,2) = 55, |E| = 19, kplex for k ≥ 36, kcore 19. D = 20203120022202120253323233456
Cut 03:  C(h,2) = 6, |E| = 4, kplex for k ≥ 2, kcore 6. D = 20203120022202120223323233222
{24,32,33,34} is a 2plex.

Pass 3, columns 5-13, 15-23 and 25-31:
Start:   D = 2330102000020000002111011
Cut 01:  C(h,2) = 15, |E| = 6, kplex for k ≥ 9, kcore 6. D = 2330102000020000000111011
Cut 0:   C(h,2) = 10, |E| = 6, kplex for k ≥ 4, kcore 6. D = 2330102000020000000111011
{5,6,7,11,17} is a 4plex.
Cut 0:   C(h,2) = 21, |E| = 4, kplex for k ≥ 17, kcore 4. D = 2330102000020000002111011
Cut 1 leaves 25 only.

Pass 4, columns 8, 9, 10, 12, 13, 15, 16, 18-23, 25-31 (counts as given on the slide):
Start:   D = 01000000000002111011
Step:    counts 19 and 4, kplex for k ≥ 15, kcore 4. D = 01000000000002010011
Cut 0:   counts 19 and 4, kplex for k ≥ 15, kcore 4. D = 01000000000002010001. Cut 0 leaves {9,31} as a 0plex.
Step:    counts 17 and 2, kplex for k ≥ 15, kcore 2. D = 01000000000002010011. Cut 0 leaves {27,30} as a 0plex.
Step:    counts 14 and 0, kplex for k ≥ 14, kcore 0. D = 0100000000000201001. No edges left.

The expected communities are mostly not detected as kplexes or kcores.

Symbols for base 65 (values 0-64, in order): 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ@#$
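The D rows on this slide pack one degree per character using the base-65 symbol list. A small decoding sketch, assuming each character is a single digit whose value is its position in that list (the function name `decode_degrees` is made up for illustration):

```python
# Base-65 symbol list copied from the slide; a symbol's index is its value.
SYMBOLS = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ@#$"


def decode_degrees(row):
    """Turn a D row such as 'g9a6...' into a list of integer degrees."""
    return [SYMBOLS.index(ch) for ch in row]


# First D row of the G7 search (all 34 vertices).
G7_DEGREES = decode_degrees("g9a63444523125222223222533243446bg")

h = len(G7_DEGREES)                      # number of vertices
edge_count = sum(G7_DEGREES) // 2        # each edge is counted at both ends
missing = h * (h - 1) // 2 - edge_count  # smallest k for which G7 is a kplex
```

Decoding the first row this way gives the same values as the 1deg row printed on the later G7 slides, and the derived counts agree with the 561, 77 and 484 reported for the first step.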
SP1 and SP2 for G8

[Figure: SP1 and SP2 bit matrices for G8, vertices 1-53.]

Degree row shown with SP1 (base-65 symbols, vertices 1-53):
4 4 4 4 4 b a 5 6 4 5 g 9 7 4 6 b 2 b 8 6 4 f 9 f 4 9 3 8 6 d 4 5 4 5 4 2 3 6 7 5 7 6 7 3 5 3 5 3 4 9 6 5
Agglomerative Clustering with similarity = DegDif (G8)

For a single vertex v, DegDif(v) = 0 - deg(v), since intdeg(v) = 0. For a cluster, the values below are written as (internal count) - (external count).

[Figure: graph G8 and its SP1 bit matrix, vertices 1-54.]

Degree row (base-65 symbols, vertices 1-54):
4 4 4 4 4 b b 5 7 5 6 h 9 7 4 7 c 3 c 8 6 4 f a g 4 9 3 8 6 e 4 5 4 6 5 3 4 6 8 5 7 7 8 3 5 3 5 3 4 9 6 5 j

DegDif(18) = -3 (max). Agglomerate with siblings: {17,18,19,54}, DegDif = 6 - 16 = -10.
DegDif(28) = -3 (max). Agglomerate with siblings: {25,27,28,29}, DegDif = 6 - 14 = -8.
DegDif(37) = -3 (max). Agglomerate with siblings: {17,18,19,37,54}, DegDif = 10 - 31 = -21.
DegDif(45) = -3 (max). Agglomerate with siblings: {8,17,18,19,37,45,54}.
(Note that we have linked up with the "Light" cluster from the Astronomy cluster.)

It occurs to me that using an agglomerative method on an example that is known to have overlapping clusters is a bad idea (agglomerative methods always produce a partition, with no overlapping clusters). Therefore, let's start over, applying the agglomerative method to a different example that is not expected to have overlapping clusters.
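A small sketch of the DegDif bookkeeping used in these steps, assuming DegDif of a vertex set is the number of edges inside the set minus the number of edges leaving it (which reproduces 0 - deg(v) for a single vertex). The function names and the example graph are hypothetical, not taken from G8.

```python
def deg_dif(cluster, edges):
    """Internal edge count minus external edge count for a vertex set."""
    c = set(cluster)
    internal = sum(1 for a, b in edges if a in c and b in c)
    external = sum(1 for a, b in edges if (a in c) != (b in c))
    return internal - external


def merge_with_siblings(v, edges):
    """Agglomerate v with its siblings (its direct neighbors)."""
    sibs = {b for a, b in edges if a == v} | {a for a, b in edges if b == v}
    return sibs | {v}


# Hypothetical graph: triangle 1-2-3 with a pendant vertex 4 attached to 1.
TOY_EDGES = [(1, 2), (1, 3), (2, 3), (1, 4)]

# The singleton with the largest DegDif is the vertex of least degree.
start = max([1, 2, 3, 4], key=lambda v: deg_dif({v}, TOY_EDGES))
merged = merge_with_siblings(start, TOY_EDGES)
```

In this toy graph the pendant vertex 4 is picked first and merged with its only sibling, giving the cluster {1, 4} with DegDif = 1 - 2 = -1.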
Agglomerative Clustering with similarity = DegDif (G7)

For a single vertex v, DegDif(v) = 0 - deg(v), since intdeg(v) = 0.

[Figure: graph G7 and its SP1 bit matrix, vertices 1-34.]

1deg row (vertices 1-34):
16 9 10 6 3 4 4 4 5 2 3 1 2 5 2 2 2 2 2 3 2 2 2 5 3 3 2 4 3 4 4 6 11 16

DegDif(12) = -1 (max). Agglomerate with siblings: {1,12}, DegDif = 1 - 15 = -14.
DegDif(10) = -2 (max). Agglomerate with siblings: {3,10,34}, DegDif = 2 - 25 = -23.
DegDif(13) = -2 (max). Agglomerate with siblings: {1,4,12,13}, DegDif = 4 - 16 = -12.
DegDif(15,16) = -2 (max). Agglomerate with siblings: {3,10,15,16,33,34}, DegDif = 7 - 28 = -25.
DegDif(17) = -2 (max). Agglomerate with siblings: {6,7,17}, DegDif = 3 - 1 = 2.
DegDif(6,7,17) = 2 (max). Agglomerate with siblings: {5,6,7,11,17}, DegDif = 6 - 4 = 2.
DegDif(18) = -2 (max). Agglomerate with siblings: {1,2,4,12,13,18}, DegDif = 8 - 17 = -9.
DegDif(19,21) = -2 (max). Agglomerate with siblings: {3,10,15,16,19,21,33,34}, DegDif = 11 - 24 = -13.
DegDif(22) = -2 (max). Agglomerate with siblings: {1,2,4,12,13,18,22}, DegDif = 10 - 15 = -5.
DegDif(23,27,30) = -2 (max). Agglomerate with siblings: {3,10,15,16,19,21,23,27,30,33,34}, DegDif = 6 - 19 = -13.
DegDif(25) = -3. Agglomerate with siblings: {25,26,28,32}, DegDif = 4 - 7 = -3.

Even though there is no cluster overlap here, our method does not follow the usual agglomeration methodology, in which there is a similarity measure between pairs. (Starting out, all subclusters are points, so the initial similarity is between points; later it involves similarity between a point and a subset, and between two subsets.) There needs to be a consistent definition of similarity across all these types of pairs, which we do not have here. Therefore, let's start over and try to define a correct similarity.

Given a similarity, there are two standard clustering approaches: k-means and agglomerative. Agglomerative requires the complete similarity described above (between pairs of subsets, one or both of which can be singletons), while k-means simply requires a similarity between pairs of points.

One similarity we might consider is a weighted sum of common cousins. E.g., let c0 be the number of common 0th cousins (siblings), c1 the number of common 1st cousins, etc. If we sum the common cousin counts with weights w0, w1, ... (presumably decreasing), then we have a similarity measure which is complete. We try this similarity on the next slide, first for agglomeration, then for k-means.

G7 is Zachary's karate club, a standard benchmark in community detection. The colors correspond to the best partition found by optimizing the modularity of Newman and Girvan.
Agglomerative Clustering with similarity = DegDif (G7)

For a single vertex v, DegDif(v) = 0 - deg(v), since intdeg(v) = 0.

DegDif(12) = -1 (max). Agglomerate with siblings: {1,12}, DegDif = 1 - 15 = -14.
DegDif(10) = -2 (max). Agglomerate with siblings: {3,10,34}, DegDif = 2 - 24 = -22.
DegDif(12) = -1 (max). Agglomerate with siblings: {1,4,12,13}, DegDif = 4 - 13 = -9.
DegDif(15) = -2 (max). Agglomerate with siblings: {3,10,15,33,34}, DegDif = 4 - 22 = -18.
DegDif(16) = -1 (max). Agglomerate with siblings: {3,10,15,16,33,34}, DegDif = 6 - 20 = -14.
DegDif(19) = -1 (max). Agglomerate with siblings: {3,10,15,16,19,33,34}, DegDif = 8 - 18 = -10.
DegDif(21) = -1 (max). Agglomerate with siblings: {3,10,15,16,19,21,33,34}, DegDif = 10 - 16 = -6.
DegDif(23) = -1. Agglomerate with siblings: {3,10,15,16,19,21,23,33,34}, DegDif = 12 - 14 = -2.
DegDif(27,30) = -2. {3,10,15,16,19,21,23,27,30,33,34}, DegDif = 16 - 10 = 6.
DegDif(17) = -2. {6,7,17}, DegDif = 3 - 2 = 1.
DegDif(22) = -2 (max). Agglomerate with siblings: {1,4,12,13,22}, DegDif = 5 - 12 = -7.
DegDif(18) = -2 (max). Agglomerate with siblings: {1,4,12,13,18,22}, DegDif = 6 - 11 = -5.
DegDif(29) = -2 (max). Agglomerate with siblings: {29,32}, DegDif = 1 - 5 = -4.
DegDif(31) = -3. {3,9,10,15,16,19,21,23,27,30,31,33,34}, DegDif = 22 - 6 = 16.
DegDif(25,26,28) = -2. Agglomerate with siblings: {25,26,28,29,32}, DegDif = 5 - 4 = 1.

1deg row (vertices 1-34):
16 9 10 6 3 4 4 4 5 2 3 1 2 5 2 2 2 2 2 3 2 2 2 5 3 3 2 4 3 4 4 6 11 16

DegDif (dgdf) rows as they appear on the slide:
-14 -9 -10 -6 -3 -4 -4 -4 -5 -2 -3 -2 -5 -2 -2 -2 -2 -2 -3 -2 -2 -2 -5 -3 -3 -2 -4 -3 -4 -4 -6 -11 -16
-14 -9 -10 -6 -3 -4 -4 -4 -4 -22 -3 -2 -5 -2 -2 -2 -2 -2 -3 -2 -2 -2 -5 -3 -3 -2 -3 -2 -4 -4 -6 -11 -16
-9 -8 -10 -6 -3 -4 -4 -3 -3 -2 -3 -2 -3 -2 -1 -2 -2 -1 -3 -1 -2 -1 -4 -3 -3 -2 -3 -2 -3 -4 -6 -10 -16
-5 -6 6 -6 -3 -4 -4 -3 -3 -2 -3 -2 -3 -2 -1 1 -2 -1 -3 -1 -2 -1 -3 -3 -3 -2 -3 -4 -3 -3 -4 -10 -16
-4 -5 16 - -3 -4 -4 -3 -3 -2 -3 -2 -3 -2 -1 1 -2 -1 -3 -1 -2 -1 -3 -3 -3 -2 -3 -4 -3 -3 -4 -10 -16
-4 -5 17 - -3 -4 -4 -3 -3 -2 -3 -2 -3 -2 -1 1 -2 -1 -3 -1 -2 -1 -2 -3 -3 -2 -3 1 -3 -3 -4 -10 -16
-8 -8 6 -6 -3 -4 -4 -3 -3 -2 -3 -2 -3 -2 -1 1 -2 -1 -3 -1 -2 -1 -3 -3 -3 -2 -3 -2 -3 -3 -4 -10 -16
-9 -8 -14 -6 -3 -4 -4 -3 -3 -2 -3 -2 -3 -2 -1 -2 -2 -1 -3 -1 -2 -1 -4 -3 -3 -2 -3 -2 -3 -4 -6 -10 -16
-9 -8 -18 -6 -3 -4 -4 -3 -3 -2 -3 -2 -3 -2 -1 -2 -2 -1 -3 -1 -2 -1 -4 -3 -3 -2 -3 -2 -3 -4 -6 -10 -16
-5 -6 6 -6 -3 -4 -4 -3 -3 -2 -3 -2 -3 -2 -1 1 -2 -1 -3 -1 -2 -1 -3 -3 -3 -2 -3 -2 -3 -3 -4 -10 -16
-9 -8 -10 -6 -3 -4 -4 -3 -3 -2 -3 -2 -3 -2 -1 -2 -2 -1 -3 -1 -2 -1 -4 -3 -3 -2 -3 -2 -3 -4 -6 -10 -16
-9 -8 -2 -6 -3 -4 -4 -3 -3 -2 -3 -2 -3 -2 -1 -2 -2 -1 -3 -1 -2 -1 -3 -3 -3 -2 -3 -2 -3 -3 -4 -10 -16
-9 -8 6 -6 -3 -4 -4 -3 -3 -2 -3 -2 -3 -2 -1 -2 -2 -1 -3 -1 -2 -1 -3 -3 -3 -2 -3 -2 -3 -3 -4 -10 -16
-4 -5 17 - -3 -4 -4 -3 -3 -2 -3 -2 -3 -2 -1 1 -2 -1 -3 -1 -2 -1 -2 -3 -3 -2 -3 1 -3 -3 -4 -10 -16
Similarity Clustering: sim(x,y) = W + SUM over k = 0..n of w_k(x,y) * c_k(x,y)

c0 = number of common siblings (other than themselves), c1 = number of common 1st cousins, c2 = number of common 2nd cousins, and so on.
W = 5 iff x and y are siblings; w0 = 2, w1 = 1, all other weights 0.

Agglomerative first. Calculate initial similarities:

[Figure: graph G7 annotated with the initial similarity values, with the SP1 and SP2 bit matrices, vertices 1-34.]

1deg row (SP1, vertices 1-34):
16 9 10 6 3 4 4 4 5 2 3 1 2 5 2 2 2 2 2 3 2 2 2 5 3 3 2 4 3 4 4 6 11 16
2deg row (SP2, vertices 1-34):
9 13 19 16 13 12 13 17 24 19 14 25 14 25 15 15 3 15 16 26 15 16 16 15 6 6 13 20 21 15 20 26 11 6
Similarity Clustering: sim(x,y) = W + SUM over h = 1..n, k = 1..n of w_hk(x,y) * c_hk(x,y)

c_hk = count(SPh(x) & SPk(y)).
W = 6, w11 = 3, w12 = w21 = 2, w22 = 1, all other weights 0.

[Figure: graph G7 annotated with similarity values, with the SP1 and SP2 bit matrices, vertices 1-34.]
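The c_hk counts are 1-counts of ANDed masks, so the similarity can be sketched directly. This is an illustration under assumptions: Python integers stand in for pTrees, SP2(x) is taken to be the vertices at distance exactly 2 from x, W is added only when x and y are siblings (as on the previous slide), and the names `sp_masks` and `sim` and the 4-vertex path graph are hypothetical.

```python
def sp_masks(adj):
    """From neighbor masks, return {1: SP1 masks, 2: SP2 masks}."""
    n = len(adj)
    sp2 = []
    for x in range(n):
        reach = 0
        for j in range(n):
            if (adj[x] >> j) & 1:
                reach |= adj[j]
        # Distance exactly 2: reachable in two steps, not x, not a neighbor.
        sp2.append(reach & ~adj[x] & ~(1 << x))
    return {1: list(adj), 2: sp2}


def sim(x, y, sp, weights, W):
    """W if x and y are siblings, plus SUM of w_hk * count(SPh(x) & SPk(y))."""
    total = W if (sp[1][x] >> y) & 1 else 0
    for (h, k), w in weights.items():
        total += w * bin(sp[h][x] & sp[k][y]).count("1")
    return total


# Weights from the slide: w11 = 3, w12 = w21 = 2, w22 = 1, W = 6.
WEIGHTS = {(1, 1): 3, (1, 2): 2, (2, 1): 2, (2, 2): 1}

# Hypothetical path graph 0-1-2-3 (not G7), as neighbor bitmasks.
PATH_ADJ = [0b0010, 0b0101, 0b1010, 0b0100]
SP = sp_masks(PATH_ADJ)
S01 = sim(0, 1, SP, WEIGHTS, 6)
S02 = sim(0, 2, SP, WEIGHTS, 6)
```

Because the measure is defined for any pair of masks, the same function could be applied to the OR of the masks of a cluster's members, which is one way to obtain the point-to-subset and subset-to-subset similarities that agglomeration needs; that extension is a suggestion, not something stated on the slide.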
A Divisive Method 2

The two centroids are the max and the max non-nbr, so 1 and 17. No vertex is in neither set. Vertices 6 and 7 are in both S1 and S17. Decide by the count of siblings of 1 and 17 (s), then cousins (c), then 2nd cousins (d), then 3rd cousins (e).
S6: 2,1, so 6 goes in 1. S7: 2,1, so 7 goes in 1.

[Figure: graph G7 with SP1 and SP2 bit matrices, vertices 1-34.]

1deg row shown on the slide: 14 8 5 6 3 4 4 4 3 1 2 4 2 2 2 2
2deg row shown on the slide: 1 6 9 8 12 11 11 10 12 13 12 10 3 12 12 12
A Divisive Method 3

The two centroids are the max and the max non-nbr, so 33 and 34. No vertex is in neither set. Vertices 6 and 7 are in both S1 and S17. Decide by the count of siblings of 1 and 17 (s), then cousins (c), then 2nd cousins (d), then 3rd cousins (e).
S6: 2,1, so 6 goes in 1. S7: 2,1, so 7 goes in 1.

[Figure: graph G7 with SP1 and SP2 bit matrices, vertices 1-34.]

1deg row shown on the slide: 3 1 2 2 2 2 2 5 3 3 2 3 2 4 3 5 10 14
G7: Zachary's karate club, a standard benchmark in community detection (best partition found by optimizing the modularity of Newman and Girvan).

[Figure: graph G7 with bit matrices SP1-SP5, vertices 1-34.]

1deg (SP1, vertices 1-34):
16 9 10 6 3 4 4 4 5 2 3 1 2 5 2 2 2 2 2 3 2 2 2 5 3 3 2 4 3 4 4 6 11 16
2deg (SP2, vertices 1-34):
9 13 19 16 13 12 13 17 24 19 14 25 14 25 15 15 3 15 16 26 15 16 16 15 6 6 13 20 21 15 20 26 11 6
3deg (SP3, vertices 1-34):
8 11 4 11 8 8 8 12 3 11 8 8 9 3 6 6 12 8 6 4 6 8 6 4 23 23 6 7 8 5 8 1 10 10
4deg (SP4, only the columns shown on the slide):
8 8 8 8 8 8 9 10 8 8 8 8 8 8 8 10 8
Vertices 10, 25, 26, 28, 29, 33, 34 are not shown in SP4 (only bit 17 is on, so 4deg = 1).
5deg (SP5, only the columns shown on the slide):
1 1 8 1 1 1 1 1 1
Vertices 15, 16, 19, 21, 23, 24, 27, 30 have only bit 17 on, so 5deg = 1; vertex 17 has 5deg = 8.

Let’s try