Shortest Path Trees Construction

Presentation transcript:

Shortest Path Trees Construction (We don't need the Path Trees to get the Shortest Path Trees! That's because a subpath of a shortest path is a shortest path.)

S1P = E (the level-1 shortest-path pTrees are just the edge pTrees).
SPSF1i = S1Pi OR Mi, where Mi has a 1 only at position i.
S(k+1)Pi = SPSFk'i & (OR of Ej over all j ∈ SkPi)
SPSF(k+1)i = SPSFki OR S(k+1)Pi
"The mask pTree of the shortest-(k+1)-path set starting at vertex i is the complement of the Shortest-Paths-So-Far pTree ANDed with the OR of the edge pTrees Ej over all j in the shortest-k-path set."

[The slide works this recursion out on the example graph G6, showing the SkP and SPSF bit columns for vertices 1 and 3 at each level; those pTree columns are omitted here. Vertex 1's shortest paths are done after four levels, so Diam(1) = 4; vertex 3 finishes similarly, and vertices 4-c are done the same way.]

What is the cost of creating the SPs? For each v ∈ V there are ~Avg{Diam(v): v ∈ V} steps, and each step costs: one complement of SPSF (cost = compl), an OR of ~AD edge pTrees where AD = AvgDeg (cost = OR·AD), one AND of SPSF' with that OR result (cost = AND), and one OR to update SPSF (cost = OR). Cost = |V|·AvgDiam·(compl + OR·AD + AND + OR), so O(|V|), i.e., linear in the number of vertices, assuming AD = AvgDeg is small. This is a one-time construction, parallelizable over the vertices. For Friends (B = a billion vertices, AvgDiam ≈ 4), it is B·4·(3·pTOP + AD·pTOP) = 4B·(3 + AD)·pTOP = B·pTOP·(12 + 4·AD), where pTOP is the cost of one pTree operation (complement, AND, OR). Parallelized over an n-node cluster, this one-time Shortest Path Tree construction cost would be B·pTOP·(12 + 4·AvgDeg) / n.

The SnP's capture only the shortest path lengths between all pairs of vertices. We could have captured the actual shortest paths (all shortest paths? all paths, as in Path Trees?), since we construct, but do not retain, that information along the way. How should we structure it, index it, residualize it?
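A minimal sketch of this recursion, using Python integers as bit vectors in place of pTrees (my own illustration, not the authors' pTree code; E[j] plays the role of the edge pTree Ej):

# Level-by-level construction: S(k+1)P_i = SPSFk'_i & (OR of E_j over j in SkP_i).
def shortest_path_levels(E, i):
    """Return [S1P_i, S2P_i, ...]: bit masks of the vertices at distance 1, 2, ... from i."""
    n = len(E)
    full = (1 << n) - 1
    levels = []
    SkP = E[i]                        # S1P_i = E_i: the neighbors of i
    SPSF = SkP | (1 << i)             # SPSF1_i = S1P_i OR M_i (M_i has a 1 only at i)
    while SkP:
        levels.append(SkP)
        frontier_or = 0
        for j in range(n):            # OR of E_j over all j in the current level SkP_i
            if (SkP >> j) & 1:
                frontier_or |= E[j]
        SkP = ~SPSF & full & frontier_or   # S(k+1)P_i = SPSFk'_i & (OR_j E_j)
        SPSF |= SkP                        # SPSF(k+1)_i = SPSFk_i OR S(k+1)P_i
    return levels

# Tiny example: the path graph 0-1-2-3.
E = [0b0010, 0b0101, 0b1010, 0b0100]
print([bin(m) for m in shortest_path_levels(E, 0)])   # distance-1, -2, -3 sets from vertex 0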

K-plex Search on G6: A k-plex is a subgraph missing ≤ k edges. All subgraphs here are induced subgraphs (ISGs) defined by their vertex set. If subgraph S has |ES| = s edges and |VS| = v vertices, then S is a k-plex iff C(v,2) − s = v(v−1)/2 − s ≤ k. If S is a k-plex and S' adds one vertex x to S (V(S') = V(S) ∪ {x}), then S' is a k-plex iff (v+1)v/2 − (deg(x,S') + s) ≤ k.

Edges are 1-plexes. |E{123}| = 3, so {123} is a 0-plex (clique) and a 1-plex. |E{124}| = 3, so {124} is a 0-plex (clique).

If H is an ISG with h = |VH| vertices, eH = |EH| edges, and ēH = h(h−1)/2 possible edges, then H is a k-plex iff ēH − eH ≤ k. If H is a k-plex and F is an ISG of H, then F is a k-plex (if F is missing an edge then H is missing that edge too, since F inherits all H edges involving its vertices; F cannot be missing more edges than H).

If G isn't a k-plex, let F1 be the ISG of G with a vertex of least degree removed. If F1 isn't a k-plex, let F2 be the ISG of F1 with a vertex of least degree removed, and so on, until some Fj is a k-plex. Remove Fj and repeat until all vertices are removed. We did a k-plex search of G6 by simply calculating edge counts (which are just 1-counts of ANDed pTrees) using only SP1 = E. [The slide shows the SP1 = E bit matrix for G6, along with SP2-SP4, next to the trace; those matrices are omitted here.]

ēG = 12·11/2 = 66, eG = 19: G is a k-plex for k ≥ 47.
H1 = ISG{12346789abc} (remove vertex 5, degree 2): ē = 55, e = 17, k-plex for k ≥ 38.
H2 = ISG{1234789abc} (remove vertex 6, degree 2): ē = 45, e = 15, k-plex for k ≥ 30.
H3 = ISG{123489abc} (remove vertex 7, degree 1): ē = 36, e = 14, k-plex for k ≥ 22.
H4 = ISG{12389abc} (remove vertex 4, degree 2): ē = 28, e = 12, k-plex for k ≥ 16.
H5 = ISG{1239abc} (remove vertex 8, degree 2): ē = 21, e = 10, k-plex for k ≥ 11.
H6 = ISG{239abc} (remove vertex 1, degree 2): ē = 15, e = 8, k-plex for k ≥ 7.
H7 = ISG{39abc} (remove vertex 2, degree 1): ē = 10, e = 7, k-plex for k ≥ 3.
H8 = ISG{9abc} (remove vertex 3, degree 1): ē = 6, e = 6, k-plex for k ≥ 0. So take out {9abc} and start over.

G = {12345678}: ē = 8·7/2 = 28, e = 10, k-plex for k ≥ 18; deg = 33322331.
H1 = ISG{1234567} (remove vertex 8, degree 1): ē = 21, e = 9, k ≥ 12; deg = 2223223.
H2 = ISG{234567} (remove vertex 1, degree 2): ē = 15, e = 6, k ≥ 9; deg = 112223.
H3 = ISG{34567} (remove vertex 2, degree 1): ē = 10, e = 4, k ≥ 6; deg = 01222.
H4 = ISG{4567} (remove vertex 3, degree 0): ē = 6, e = 4, k ≥ 2; deg = 1222.
H5 = ISG{567} (remove vertex 4, degree 1): ē = 3, e = 3, k ≥ 0; deg = 222. So take out {567} and start over.

G = {12348}: ē = 5·4/2 = 10, e = 5, k-plex for k ≥ 5; deg = 33220.
H1 = ISG{1234} (remove vertex 8, degree 0): ē = 6, e = 5, k ≥ 1; deg = 3322.
H2 = ISG{124} (remove vertex 3, degree 2): ē = 3, e = 3, k ≥ 0; deg = 222.

This is exactly what we want! {1234} is a 1-plex (missing only 1 edge) and {124} was determined to be a clique (0-plex, missing no edges). It would have been great if {123} had revealed itself as a clique also, and if {89abc} had been detected as a 1-plex before {9abc} was detected as a clique. How might we make progress in these directions? Try returning to remove all degree ties before moving on? We will try that on the next slide.
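A minimal sketch of the peel-and-restart search just described, using an ordinary adjacency dict in place of pTrees (my own helper, not the authors' code): repeatedly drop a least-degree vertex until the remaining induced subgraph is missing at most k edges, report it, take it out, and start over.

def kplex_peel(adj, k):
    """adj: dict vertex -> set of neighbors. Yields vertex sets that are k-plexes."""
    remaining = set(adj)
    while remaining:
        current = set(remaining)
        while current:
            deg = {v: len(adj[v] & current) for v in current}
            n, edges = len(current), sum(deg.values()) // 2
            if n * (n - 1) // 2 - edges <= k:          # missing <= k edges: found a k-plex
                yield set(current)
                break
            current.remove(min(current, key=deg.get))  # peel a least-degree vertex
        if not current:                                # can only happen if k < 0
            break
        remaining -= current                           # take the found k-plex out, start over

# Example: a triangle {0,1,2} with a pendant vertex 3 attached to 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(list(kplex_peel(adj, k=0)))   # [{0, 1, 2}, {3}]: the triangle, then the leftover vertex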

Very Simple Weighted SP1 k-plex Search on G7. Weighting: 0- and 1-path neighbors of x are weighted 1; 2-path neighbors of x are weighted 0.

[The slide shows the SP1 bit matrix for G7 and a cut-by-cut trace: after each cut it lists the vertex-index header, the possible-edge count, the actual edge count, the resulting k-plex bound, the degree vector D, and a k-core figure. Degree values are written in base-65 digits 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ@#$. The numeric rows are omitted here.]

Results read off the trace: {1,2,3,4,14} is a clique and {1,2,3,4,9,14} is a 3-plex. {24,32,33,34} is a 2-plex. One cut leaves {9,31} as a 0-plex, another leaves {27,30} as a 0-plex, and one branch leaves only vertex 25. {5,6,7,11,17} is a 4-plex; after that, no edges are left. The expected communities are mostly not detected as k-plexes or k-cores.
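The bookkeeping in the trace above only needs, for each remaining vertex set H, the possible-edge count, the actual edge count (a 1-count of ANDed pTrees in the vertical representation), the resulting k-plex bound, and the degree vector D. A minimal sketch of that bookkeeping over an ordinary adjacency dict (my own helper, not the authors' pTree code):

def kplex_stats(adj, H):
    """adj: dict vertex -> set of neighbors; H: iterable of the vertices kept."""
    H = set(H)
    deg = {v: len(adj[v] & H) for v in H}        # D: degree of each vertex inside H
    e = sum(deg.values()) // 2                   # actual edges of the induced subgraph
    possible = len(H) * (len(H) - 1) // 2        # possible edges |H|(|H|-1)/2
    return possible, e, possible - e, deg        # H is a k-plex for every k >= possible - e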

SP1 and SP2 for G8. [The slide shows the SP1 and SP2 bit matrices for graph G8, with the per-vertex 1-counts written along the bottom; the bit columns are omitted here.]
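For illustration, a minimal bit-vector sketch (not the authors' pTree code) of how an SP2 mask follows from SP1 = E: the 2-hop mask of vertex i is the OR of the edge masks of i's neighbors, with i itself and i's 1-hop neighbors masked off.

def sp2(E, i):
    """E: list of edge bit masks (bit v of E[j] is set iff j-v is an edge)."""
    reach = 0
    for j in range(len(E)):
        if (E[i] >> j) & 1:                       # j is a 1-hop neighbor of i
            reach |= E[j]                         # OR in j's edge mask
    return reach & ~E[i] & ~(1 << i) & ((1 << len(E)) - 1)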

Agglomerative Clustering with similarity = DegDif, where DegDif(v) = 0 − deg(v) since intdeg(v) = 0 for a singleton.

DegDif(18) = −3 (max). Agglomerate with its siblings: {17,18,19,54}, DegDif = 6 − 16 = −10.
DegDif(28) = −3 (max). Agglomerate with siblings: {25,27,28,29}, DegDif = 6 − 14 = −8.
DegDif(37) = −3 (max). Agglomerate with siblings: {17,18,19,37,54}, DegDif = 10 − 31 = −21.
DegDif(45) = −3 (max). Agglomerate with siblings: {8,17,18,19,37,45,54}. (Note that we have now linked up with the "Light" cluster from the Astronomy cluster.)

It occurs to me that using an agglomerative method on an example that is known to have overlapping clusters is a bad idea (agglomerative methods always produce a partition, with no overlapping clusters). Therefore, let's start over, applying the agglomerative method to a different example that is not expected to have overlapping clusters.

[The slide also shows the SP1 bit matrix for G8 with its vertex ordering and per-vertex 1-counts written in base-65 digits 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ@#$; omitted here.]
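A minimal sketch of the merge score, assuming the reading suggested by the worked steps above (edges inside the cluster minus edges leaving it); deg_dif is my own name, not the authors':

def deg_dif(adj, cluster):
    """adj: dict vertex -> set of neighbors; cluster: set of vertices."""
    internal = sum(len(adj[v] & cluster) for v in cluster) // 2   # edges inside the cluster
    external = sum(len(adj[v] - cluster) for v in cluster)        # edges leaving the cluster
    return internal - external

# For a singleton {v} this gives 0 - deg(v), matching DegDif(v) = 0 - deg(v) above.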

Agglomerative Clustering with similarity = DegDif, where DegDif(v) = 0 − deg(v) since intdeg(v) = 0 for a singleton.

DegDif(12) = −1 (max). Agglomerate with siblings: {1,12}, DegDif = 1 − 15 = −14.
DegDif(10) = −2 (max). Agglomerate with siblings: {3,10,34}, DegDif = 2 − 25 = −23.
DegDif(13) = −2 (max). Agglomerate with siblings: {1,4,12,13}, DegDif = 4 − 16 = −12.
DegDif(15,16) = −2 (max). Agglomerate with siblings: {3,10,15,16,33,34}, DegDif = 7 − 28 = −25.
DegDif(17) = −2 (max). Agglomerate with siblings: {6,7,17}, DegDif = 3 − 1 = 2.
DegDif({6,7,17}) = 2 (max). Agglomerate with siblings: {5,6,7,11,17}, DegDif = 6 − 4 = 2.
DegDif(18) = −2 (max). Agglomerate with siblings: {1,2,4,12,13,18}, DegDif = 8 − 17 = −9.
DegDif(19,21) = −2 (max). Agglomerate with siblings: {3,10,15,16,19,21,33,34}, DegDif = 11 − 24 = −13.
DegDif(22) = −2 (max). Agglomerate with siblings: {1,2,4,12,13,18,22}, DegDif = 10 − 15 = −5.
DegDif(23,27,30) = −2 (max). Agglomerate with siblings: {3,10,15,16,19,21,23,27,30,33,34}, DegDif = 6 − 19 = −13.
DegDif(25) = −3. Agglomerate with siblings: {25,26,28,32}, DegDif = 4 − 7 = −3.

[The slide also shows the SP1 bit matrix for G7 with per-vertex 1-counts (1deg) and marks each merge with its resulting DegDif; omitted here.]

Even though there is no cluster overlap here, our method does not follow the usual agglomeration methodology, in which there is a similarity measure between pairs: starting out, all subclusters are points, so the initial similarity is between points, but it then involves similarity between a point and a subset and between two subsets. There needs to be a consistent definition of similarity across all these types of pairs, which we do not have here. Therefore, let's start over and try to define a correct similarity. Given a similarity, there are two standard clustering approaches: k-means and agglomerative. Agglomerative requires the complete similarity described above (between pairs of subsets, one or both of which can be singletons), while k-means simply requires a similarity between pairs of points. One similarity we might consider is a weighted sum of common cousins. E.g., let c0 be the number of common 0th cousins (siblings), c1 the number of common 1st cousins, etc. If we sum the common-cousin counts with weights w0, w1, ... (presumably decreasing), we have a similarity measure which is complete. We try this similarity on the next slide, first for agglomeration, then for k-means.

G7 is Zachary's karate club, a standard benchmark in community detection. The colors correspond to the best partition found by optimizing the modularity of Newman and Girvan.

Agglomerative Clustering with similarity = DegDif, where DegDif(v) = 0 − deg(v) since intdeg(v) = 0 for a singleton (a second run on G7).

DegDif(12) = −1 (max). Agglomerate with siblings: {1,12}, DegDif = 1 − 15 = −14.
DegDif(10) = −2 (max). Agglomerate with siblings: {3,10,34}, DegDif = 2 − 24 = −22.
DegDif(12) = −1 (max). Agglomerate with siblings: {1,4,12,13}, DegDif = 4 − 13 = −9.
DegDif(15) = −2 (max). Agglomerate with siblings: {3,10,15,33,34}, DegDif = 4 − 22 = −18.
DegDif(16) = −1 (max). Agglomerate with siblings: {3,10,15,16,33,34}, DegDif = 6 − 20 = −14.
DegDif(19) = −1 (max). Agglomerate with siblings: {3,10,15,16,19,33,34}, DegDif = 8 − 18 = −10.
DegDif(21) = −1 (max). Agglomerate with siblings: {3,10,15,16,19,21,33,34}, DegDif = 10 − 16 = −6.
DegDif(23) = −1. Agglomerate with siblings: {3,10,15,16,19,21,23,33,34}, DegDif = 12 − 14 = −2.
DegDif(27,30) = −2. Agglomerate: {3,10,15,16,19,21,23,27,30,33,34}, DegDif = 16 − 10 = 6.
DegDif(17) = −2. Agglomerate: {6,7,17}, DegDif = 3 − 2 = 1.
DegDif(22) = −2 (max). Agglomerate with siblings: {1,4,12,13,22}, DegDif = 5 − 12 = −7.
DegDif(18) = −2 (max). Agglomerate with siblings: {1,4,12,13,18,22}, DegDif = 6 − 11 = −5.
DegDif(29) = −2 (max). Agglomerate with siblings: {29,32}, DegDif = 1 − 5 = −4.
DegDif(31) = −3. Agglomerate: {3,9,10,15,16,19,21,23,27,30,31,33,34}, DegDif = 22 − 6 = 16.
DegDif(25,26,28) = −2. Agglomerate with siblings: {25,26,28,29,32}, DegDif = 5 − 4 = 1.

(The same observation as on the previous slide applies: without a consistent similarity between points, point-subset pairs, and subset pairs, this is not a standard agglomeration. We define the common-cousin similarity on the next slide and try it first for agglomeration, then for k-means.)
[The slide lists the full DegDif vector over vertices 1-34 (labelled "=dgdf") after each agglomeration step; those numeric rows are omitted here.]

Similarity Clustering: sim(x,y) = W + Σk=0..n wk(x,y)·ck(x,y), where c0 = # of common siblings (other than x and y themselves), c1 = # of common 1st cousins, c2 = # of common 2nd cousins, and so on. Here W = 5 iff x and y are siblings, w0 = 2, w1 = 1, and all other weights are 0. Agglomerative first: calculate the initial similarities. [The slide tabulates the initial pairwise similarities for G7 and shows the SP1 and SP2 bit matrices with per-vertex 1-counts (1deg, 2deg); omitted here.]
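A minimal sketch of this common-cousin similarity with the weights given above (W = 5 when x and y are siblings, w0 = 2, w1 = 1, others 0), taking the k-th cousins of v to be the vertices at distance k+1 from v and representing the SP1 and SP2 neighborhoods as Python sets rather than pTrees (my own illustration, not the authors' code):

def sim(x, y, SP1, SP2, W=5, w0=2, w1=1):
    """SP1[v], SP2[v]: sets of vertices at distance exactly 1 and 2 from v."""
    s = W if y in SP1[x] else 0                    # sibling bonus
    s += w0 * len((SP1[x] & SP1[y]) - {x, y})      # c0: common siblings, excluding x and y
    s += w1 * len(SP2[x] & SP2[y])                 # c1: common 1st cousins
    return s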

Similarity Clustering: sim(x,y) = W + Σh=1..n Σk=1..n whk(x,y)·chk(x,y), where chk = count(SPh(x) & SPk(y)). Here W = 6, w11 = 3, w12 = w21 = 2, w22 = 1, and all other weights are 0. [The slide lists the resulting values for G7 and shows the SP1 and SP2 bit matrices with per-vertex 1-counts (1deg, 2deg); omitted here.]
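A minimal sketch with the weights stated above (W = 6, w11 = 3, w12 = w21 = 2, w22 = 1), using Python ints as bit masks so that chk is just a 1-count of an AND; adding W only when x and y are adjacent is my assumption, by analogy with the previous slide:

def sim2(x, y, SP, W=6):
    """SP[h][v]: bit mask of the vertices at distance exactly h from v."""
    w = {(1, 1): 3, (1, 2): 2, (2, 1): 2, (2, 2): 1}     # w_hk from the slide; all others 0
    s = W if (SP[1][x] >> y) & 1 else 0                  # assumed: W added iff x, y adjacent
    for (h, k), w_hk in w.items():
        s += w_hk * bin(SP[h][x] & SP[k][y]).count("1")  # c_hk = 1-count of SP_h(x) & SP_k(y)
    return s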

A Divisive Method (2): the two centroids are the max-degree vertex and the max-degree non-neighbor of it, so 1 and 17. No vertex is a neighbor of neither; 6 and 7 are in both S1 and S17. Decide such ties by the count of common siblings with 1 vs. 17 (s), then common 1st cousins (c), then 2nd cousins (d), then 3rd cousins (e). For vertex 6 the counts are 2, 1, so 6 goes with 1; for vertex 7 they are 2, 1, so 7 goes with 1. [The slide shows the SP1 and SP2 bit matrices for G7 with per-vertex 1-counts (1deg, 2deg); omitted here.]

A Divisive Method (3): the two centroids are now the max-degree vertex and the max-degree non-neighbor, so 33 and 34. Ties (vertices adjacent to both centroids) are decided as on the previous slide, by counts of common siblings, then 1st, 2nd, and 3rd cousins. [The slide shows the SP1 and SP2 bit matrices for G7 with per-vertex 1-counts; omitted here.]
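A minimal sketch of the divisive step under my reading of the rule on these two slides (assign each vertex to the centroid with which it shares more siblings, falling back to common 1st, 2nd, ... cousins on ties); the function and its SP argument are illustrative, not the authors' code:

def divide(SP, c1, c2, vertices, depth=3):
    """SP[h][v]: set of vertices at distance exactly h from v, for h = 1..depth."""
    side1, side2 = {c1}, {c2}
    for v in vertices:
        if v in (c1, c2):
            continue
        for h in range(1, depth + 1):                 # siblings, then cousins, then ...
            s1 = len(SP[h][v] & SP[h][c1])
            s2 = len(SP[h][v] & SP[h][c2])
            if s1 != s2:
                (side1 if s1 > s2 else side2).add(v)
                break
        else:
            side1.add(v)                              # unresolved tie: default to centroid 1
    return side1, side2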

SP1 through SP5 for G7, with the per-vertex 1-counts of each level (1deg through 5deg). [The bit matrices are omitted here; the degree vectors are repeated on the next slide.] Notes from the slide: the SP4 pTrees of vertices 10, 25, 26, 28, 29, 33, 34 are not shown since only vertex 17 is on (4deg = 1); at level 5, vertices 15, 16, 19, 21, 23, 24, 27, 30 have only vertex 17 on (5deg = 1), and vertex 17 itself has 5deg = 8.
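The per-level degree vectors shown here (1deg, 2deg, ...) are just the 1-counts of the SP masks: kdeg(v) is the number of vertices at distance exactly k from v. A minimal standalone sketch using breadth-first search over an adjacency dict instead of pTree operations:

from collections import deque

def level_degrees(adj):
    """adj: dict vertex -> set of neighbors. Returns {v: [1deg, 2deg, ...]}."""
    out = {}
    for v in adj:
        dist = {v: 0}
        queue = deque([v])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        ecc = max(dist.values())                      # eccentricity of v
        counts = [0] * ecc
        for u, d in dist.items():
            if d >= 1:
                counts[d - 1] += 1                    # count vertices at each exact distance
        out[v] = counts
    return out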

Zachary's karate club (G7), a standard benchmark in community detection; the colors on the slide show the best partition found by optimizing the modularity of Newman and Girvan. Per-vertex 1-counts of the SP pTrees, i.e., the number of vertices at each shortest-path distance, for vertices 1-34:
1deg: 16 9 10 6 3 4 4 4 5 2 3 1 2 5 2 2 2 2 2 3 2 2 2 5 3 3 2 4 3 4 4 6 11 16
2deg: 9 13 19 16 13 12 13 17 24 19 14 25 14 25 15 15 3 15 16 26 15 16 16 15 6 6 13 20 21 15 20 26 11 6
3deg: 8 11 4 11 8 8 8 12 3 11 8 8 9 3 6 6 12 8 6 4 6 8 6 4 23 23 6 7 8 5 8 1 10 10
4deg (partial row, as listed on the slide): 8 8 8 8 8 8 9 10 8 8 8 8 8 8 8 10 8
5deg (partial row, as listed on the slide): 1 1 8 1 1 1 1 1 1
Let's try