Shortest Path Trees Construction

Presentation on theme: "Shortest Path Trees Construction"— Presentation transcript:

1 Shortest Path Trees Construction
(We don’t need the Path Trees to get the Shortest Path Trees! That’s because a subpath of a shortest path is a shortest path.)

The construction for source vertex i proceeds level by level using only pTree operations:

S1P_i = E_i (the edge pTree of vertex i gives the vertices at distance 1).
SPSF1_i = S1P_i OR M_i, where M_i is the mask pTree with a 1 only at position i (SPSF = Shortest Paths So Far).
S(k+1)P_i = SPSFk'_i & ( OR_{j ∈ SkP_i} E_j ), i.e., "the mask pTree of the shortest (k+1)-paths starting at vertex i is the complement of the Shortest-Paths-So-Far pTree ANDed with the OR of the edge pTrees E_j over all j in the shortest k-path list."
SPSF(k+1)_i = SPSFk_i OR S(k+1)P_i.

[Worked example on G6 for source vertices 1 and 3: the slide steps through S2P, S3P, S4P and the corresponding SPSF pTrees; the bit columns are not reproduced here, and the vertex-3 columns are noted to be identical to vertex 1's from that point on. Done with Vertex 1 shortest paths: Diam(1) = 4. Done with Vertex 3 shortest paths. Vertices 4-c are done the same way.]

What is the cost of creating the SPs? For each v ∈ V there are ~Avg{Diam(v) : v ∈ V} steps, and each step costs 1 complement of SPSF (cost = compl), an OR of ~Avg|E_k| pTrees (cost = OR * Avg|E_k|), 1 AND of SPSF' with the above OR result (cost = AND), and 1 OR to update SPSF (cost = OR). Cost = |V| * AvgDiam * (compl + OR*AD + AND + OR), so O(|V|), i.e., linear in the number of vertices, assuming AD = AvgDeg is small. This is a one-time, parallelizable construction over the vertices. For Friends, it is B*4*(3*pTOP + AD*pTOP) = 4B*(3+AD)*pTOP = B*pTOP*(12+4AD), where pTOP is the cost of a pTree operation (complement, AND, OR) and B = billion. Parallelized over an n-node cluster, this one-time Shortest Path Tree construction cost would be B*pTOP*(12+4AvgDeg)/n.

The SkP's capture only the shortest path lengths between all pairs of vertices. We could have captured the actual shortest paths (all shortest paths? all paths in PTs?), since we construct (but do not retain) that information along the way. How to structure it, index it, residualize it?
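The recurrence maps directly onto bitwise operations. Below is a minimal Python sketch (not the slides' pTree implementation): it assumes each pTree is represented as a plain integer bitmask, with E[j] the edge pTree (neighbor mask) of vertex j and vertices numbered 0..n-1.

```python
def shortest_path_levels(E, i):
    """Level sets S1P_i, S2P_i, ... for source vertex i.
    E[j] is an int bitmask with bit v set iff {j, v} is an edge."""
    n = len(E)
    full = (1 << n) - 1                  # all-ones mask over the n vertices
    SkP = E[i]                           # S1P_i = E_i: vertices at distance 1
    SPSF = SkP | (1 << i)                # SPSF1_i = S1P_i OR M_i
    levels = [SkP] if SkP else []
    while SkP:
        # OR of the edge pTrees E_j over all j in the current level SkP_i
        frontier, rest = 0, SkP
        while rest:
            low = rest & -rest           # isolate one vertex j of SkP_i
            frontier |= E[low.bit_length() - 1]
            rest ^= low
        SkP = (~SPSF & full) & frontier  # S(k+1)P_i = SPSFk'_i & (OR_j E_j)
        if SkP:
            levels.append(SkP)
            SPSF |= SkP                  # SPSF(k+1)_i = SPSFk_i OR S(k+1)P_i
    return levels                        # len(levels) = Diam(i)
```

The number of levels returned is Diam(i) (4 for vertex 1 of G6 in the slide's worked example), and each pass performs exactly the complement, OR, and AND operations counted in the cost analysis above.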

2 K-plex Search on G6: A k-plex is a subgraph missing ≤ k edges
K-plex search on G6: a k-plex is a subgraph missing ≤ k edges. All subgraphs will be induced subgraphs (ISGs) defined by their vertex set. Subgraph S has |E_S| = s edges and |V_S| = v vertices. S is a k-plex iff C(v,2) - s = v(v-1)/2 - s ≤ k. If S is a k-plex and S' adds one vertex x to S (V(S') = V(S) ∪ {x}), then S' is a k-plex iff (v+1)v/2 - (deg(x,S') + s) ≤ k.

Edges are 1-plexes. |E{123}| = 3, so 123 is a 0-plex (clique) and a 1-plex. |E{124}| = 3, so 124 is a 0-plex (clique). If H is an ISG with |V_H| = h vertices and e_H edges, and the maximum possible edge count is h(h-1)/2, then H is a k-plex iff h(h-1)/2 - e_H ≤ k. If H is a k-plex and F is an ISG of H, then F is a k-plex (if F is missing an edge, then H is missing that edge also, since F inherits all H edges involving its vertices; F cannot be missing more edges than H).

Search procedure: if G isn't a k-plex, let F1 be the ISG of G with a vertex of least degree removed. If F1 isn't a k-plex, let F2 be the ISG of F1 with a vertex of least degree removed, etc., until we find some Fj that is a k-plex. Remove Fj and repeat until all vertices are removed. We did a k-plex search of G6 by simply calculating edge counts (which are simply 1-counts of ANDed pTrees) using only SP1 = E.

[Worked trace on G6 (bit columns and some vertex lists not fully recoverable): starting from G with 12*11/2 = 66 possible edges, peeling minimum-degree vertices shrinks G through H1, H2, ... until {9abc} is found to be a clique and removed; restarting, the peel isolates {567} as a clique and removes it; restarting again on {12348}, H1 = ISG{1234} is a 1-plex (5 of 6 edges) and H2 = ISG{124} is a clique.]

This is exactly what we want! {1234} is a 1-plex (missing only 1 edge) and 124 was determined to be a clique (0-plex, missing no edges). It'd have been great if 123 had revealed itself as a clique also, and if 89abc had been detected as a 1-plex before 9abc was detected as a clique. How might we make progress in these directions? Try returning to remove all degree ties before moving on? We will try that on the next slide.
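Here is a minimal sketch of the peel-and-restart search described above, assuming an ordinary adjacency-set representation (adj maps each vertex to its neighbor set) rather than pTrees; the edge count it computes per induced subgraph is what the slide obtains as the 1-count of ANDed pTrees.

```python
def is_kplex(adj, H, k):
    """The ISG on vertex set H is a k-plex iff h(h-1)/2 - e_H <= k."""
    h = len(H)
    e_H = sum(len(adj[u] & H) for u in H) // 2   # edges of the ISG on H
    return h * (h - 1) // 2 - e_H <= k

def kplex_peel(adj, k):
    """Strip least-degree vertices until the remainder is a k-plex,
    report it, remove it from the graph, and start over."""
    remaining, found = set(adj), []
    while remaining:
        H = set(remaining)
        while not is_kplex(adj, H, k):
            H.remove(min(H, key=lambda u: len(adj[u] & H)))  # least degree in H
        found.append(H)
        remaining -= H
    return found
```

Ties for least degree are broken arbitrarily here; the slide's closing question is whether removing all tied vertices before re-checking would expose cliques like 123 or 1-plexes like 89abc earlier.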

3 Very Simple Weighted SP1 k-plex Search on G7
Weighting: 0- and 1-path neighbors of x count times 1; 2-path neighbors of x count times 0.

[Figure/trace residue: the SP1 bit matrix for G7 and the step-by-step cut trace (H sizes, k-plex thresholds, k-core values) are not reproduced. Recoverable results: {1,2,3,4,14} is a clique; {1,2,3,4,9,14} is a 3-plex; {24,32,33,34} is a 2-plex; one cut leaves {9,31} as a 0-plex and another leaves {27,30} as a 0-plex; {5,6,7,11,17} is a 4-plex; one cut leaves only vertex 25; eventually no edges are left.]

The expected communities are mostly not detected as k-plexes or k-cores. (Symbols for base 65.)

4 SP1 and SP2 for G8

[Figure residue: the SP1 (1-hop) and SP2 (2-hop) pTree bit matrices for G8 are not reproduced here.]

5 Agglomerative Clustering with similarity = DegDif; DegDif(v) = 0 - deg(v) since intdeg(v) = 0 for a single vertex
For a cluster C, DegDif(C) = intdeg(C) - deg(C) (internal degree minus total degree); a single vertex has intdeg = 0, so DegDif(v) = -deg(v).

[Figure residue: the G8 graph drawing, vertex ordering, and the SP1 bit matrix are not reproduced.]

DegDif(18) = -3 (max). Agglomerate with its siblings, {17,18,19,54}; the merged cluster has DegDif = 6 - 16 = -10.
DegDif(28) = -3 (max). Agglomerate with its siblings; the merged cluster has DegDif = 6 - 14 = -8.
DegDif(37) = -3 (max). Agglomerate with its siblings; the merged cluster has DegDif = 10 - 31 = -21.
DegDif(45) = -3 (max). Agglomerate with its siblings. (Note that we have linked up with the "Light" cluster from the Astronomy cluster.)

It occurs to me that using an agglomerative method for an example that is known to have overlapping clusters is a bad idea (agglomerative methods always produce a partition with no overlapping clusters). Therefore, let's start over, applying the agglomerative method to a different example that is not expected to have overlapping clusters.
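A small sketch of DegDif under this reading (internal degree minus total degree), again with an assumed adjacency-set representation; it is the score the trace above maximizes at each step.

```python
def deg_dif(cluster, adj):
    """DegDif(C) = intdeg(C) - deg(C): for each vertex of C, count its
    neighbors inside C and subtract all of its neighbors.
    For a singleton {v}, intdeg = 0, so DegDif({v}) = -deg(v)."""
    intdeg = sum(len(adj[v] & cluster) for v in cluster)
    deg = sum(len(adj[v]) for v in cluster)
    return intdeg - deg
```

The trace's starting value DegDif(18) = -3 would then correspond to vertex 18 having degree 3 in G8, and the merged value 6 - 16 = -10 to intdeg = 6, deg = 16 for {17,18,19,54}.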

6 Agglomerative Clustering with similarity = DegDif; DegDif(v) = 0 - deg(v) since intdeg(v) = 0 for a single vertex
Agglomeration trace on G7 (the SP1 = 1-deg bit matrix is figure residue and is not reproduced; several sibling lists and intermediate values are only partially recoverable):

DegDif(12) = -1 (max). Agglomerate with siblings {1,12}; merged DegDif = -14.
DegDif(10) = -2 (max). Agglomerate with siblings; merged DegDif = -23.
DegDif(13) = -2 (max). Agglomerate with siblings; merged DegDif = -12.
DegDif(15,16) = -2 (max). Agglomerate with siblings; merged DegDif = -25.
DegDif(17) = -2 (max). Agglomerate with siblings {6,7,17}; merged DegDif = 3 - 1 = 2.
DegDif({6,7,17}) = 2 (max). Agglomerate with siblings; merged DegDif = 6 - 4 = 2.
DegDif(18) = -2 (max). Agglomerate with siblings; merged DegDif = -9.
DegDif(19,21) = -2 (max). Agglomerate with siblings; merged DegDif = -13.
DegDif(22) = -2 (max). Agglomerate with siblings; merged DegDif = -5.
DegDif(23,27,30) = -2 (max). Agglomerate with siblings; merged DegDif = -13.
DegDif(25) = -3. Agglomerate with siblings; merged DegDif = 4 - 7 = -3.

Even though there is no cluster overlap here, our method does not follow the usual agglomeration methodology, in which there is a similarity measure between pairs: starting out with all subclusters being points, the initial similarity is between points, and then it involves similarity between a point and a subset and also between two subsets. There needs to be a consistent definition of similarity across all these types of pairs, which we do not have here. Therefore, let's start over, trying to define a correct similarity.

Given a similarity, there are two standard clustering approaches: k-means and agglomerative. Agglomerative requires the above complete similarity (between pairs of subsets, one or both of which can be singletons), while k-means simply requires a similarity between pairs of points. One similarity we might consider is some weighted sum of common cousins. E.g., let c0 be the number of common 0th cousins (siblings), c1 the number of common 1st cousins, etc. If we sum the common-cousin counts with weights w0, w1, ... (presumably decreasing), then we have a similarity measure which is complete. We try this similarity on the next slide, first for agglomeration, then k-means.

G7 is Zachary's karate club, a standard benchmark in community detection. The colors correspond to the best partition found by optimizing the modularity of Newman and Girvan.

7 Agglomerative Clustering with similarity = DegDif; DegDif(v) = 0 - deg(v) since intdeg(v) = 0 for a single vertex
A second DegDif agglomeration pass over G7 (the SP1 = 1-deg bit matrix and per-cluster "dgdf" annotations are figure residue and are not reproduced):

DegDif(12) = -1 (max). Agglomerate with siblings {1,12}; merged DegDif = -14.
DegDif(10) = -2 (max). Agglomerate with siblings; merged DegDif = -22.
DegDif(12) = -1 (max). Agglomerate with siblings {1,4,12,13}; merged DegDif = -9.
DegDif(15) = -2 (max). Agglomerate with siblings; merged DegDif = -18.
DegDif(16) = -1 (max). Agglomerate with siblings; merged DegDif = -14.
DegDif(19) = -1 (max). Agglomerate with siblings; merged DegDif = -10.
DegDif(21) = -1 (max). Agglomerate with siblings; merged DegDif = 10 - 16 = -6.
DegDif(23) = -1. Agglomerate with siblings; merged DegDif = -2.
DegDif(27,30) = -2. Agglomerate with siblings; merged DegDif = 6.
DegDif(17) = -2. Agglomerate with siblings {6,7,17}; merged DegDif = 3 - 2 = 1.
DegDif(22) = -2 (max). Agglomerate with siblings {1,4,12,13,22}; merged DegDif = -7.
DegDif(18) = -2 (max). Agglomerate with siblings {1,4,12, }; merged DegDif = -5.
DegDif(29) = -2 (max). Agglomerate with siblings {29,32}; merged DegDif = -4.
DegDif(31) = -3. Agglomerate with siblings; merged DegDif = 16.
DegDif( ) = -2. Agglomerate with siblings; merged DegDif = 1.

8 Similarity Clustering: sim(x,y) = W + Σ_{k=0..n} wk*ck(x,y)
Similarity Clustering sim(x,y)=W+k=0..n wk(x,y)*ck(x,y), c0=#common siblings (other than themselves), c1=# common 1st cousins, C2=# common 2nd cousins.. W=5 iff siblings, w0=2, w1=1, else 0. Agglomerative first. Calculate initial similarities: 1 22 18 21 10 9 16 6 5 7 15 4 8 11 3 12 2 20 25 7 23 15 9 11 5 18 4 3 6 8 3 9 10 4 11 2 5 14 3 6 14 3 7 13 3 8 13 9 14 10 6 7 11 14 3 12 15 13 4 2 5 1 7 6 16 3 13 2 14 5 15 5 16 5 17 1 18 2 19 6 20 7 21 6 22 2 23 6 24 11 25 6 26 5 27 7 28 11 29 12 30 9 31 10 32 10 33 25 34 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 G7 1 SP1 =1dg 1 2 3 4 5 6 7 8 9 SP2 =2dg 1

9 Similarity Clustering: sim(x,y) = W + Σ_{h=1..n} Σ_{k=1..n} whk*chk(x,y)
Similarity Clustering: sim(x,y) = W + Σ_{h=1..n} Σ_{k=1..n} whk*chk(x,y), where chk = Count(SPh(x) & SPk(y)). Here W = 6, w11 = 3, w12 = w21 = 2, w22 = 1, and all other weights are 0.

[Figure residue: the slide's pairwise similarity values and the SP1 (=1-deg) and SP2 (=2-deg) pTree matrices for G7 are not reproduced.]
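A sketch of the generalized double-sum form, with SP[h][v] the bitmask of vertices exactly h hops from v. The slide does not say whether W = 6 is unconditional or, as on the previous slide, added only when x and y are siblings, so that flag below is an assumption.

```python
def sim_hk(x, y, SP,
           weights={(1, 1): 3, (1, 2): 2, (2, 1): 2, (2, 2): 1},
           W=6, W_for_siblings_only=True):
    """sim(x,y) = W + sum over (h,k) of w_hk * Count(SP_h(x) & SP_k(y))."""
    s = 0 if (W_for_siblings_only and not ((SP[1][x] >> y) & 1)) else W
    for (h, k), w in weights.items():
        s += w * bin(SP[h][x] & SP[k][y]).count("1")  # c_hk = 1-count of AND
    return s
```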

10 A Divisive Method 2 (the two centroids are the max and the max non-neighbor, so 1 and 17)
No vertex lies in neither cluster; 6 and 7 lie in both S1 and S17. Decide such ties by count of shared siblings with each centroid (s), then cousins (c), then 2nd cousins (d), then 3rd cousins (e). For vertex 6 the sibling counts with (1, 17) are (2, 1), so 6 goes to S1; for vertex 7 they are also (2, 1), so 7 goes to S1.

[Figure residue: the SP1 (=1-deg) and SP2 (=2-deg) pTree matrices for G7 are not reproduced.]
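The slide leaves the mechanics implicit, so the sketch below is one plausible reading (assumed, not the slides' exact procedure): the centroids are a maximum-degree vertex and a maximum-degree non-neighbor of it; every other vertex joins the centroid it is adjacent to, and a vertex adjacent to both or to neither is decided by shared-sibling count, then shared-cousin count. adj maps a vertex to its neighbor set and SP2 to its 2-hop set.

```python
def divisive_split(adj, SP2):
    """Two-centroid divisive split of the vertex set into Sa and Sb."""
    a = max(adj, key=lambda v: len(adj[v]))                  # "the max"
    b = max((v for v in adj if v != a and v not in adj[a]),  # max non-neighbor
            key=lambda v: len(adj[v]))
    Sa, Sb = {a}, {b}
    for v in adj:
        if v in (a, b):
            continue
        in_a, in_b = v in adj[a], v in adj[b]
        if in_a != in_b:                   # adjacent to exactly one centroid
            (Sa if in_a else Sb).add(v)
        else:                              # in both or in neither: break tie
            key_a = (len(adj[v] & adj[a]), len(SP2[v] & SP2[a]))  # sibs, cousins
            key_b = (len(adj[v] & adj[b]), len(SP2[v] & SP2[b]))
            (Sa if key_a >= key_b else Sb).add(v)
    return Sa, Sb
```

Reading the slide's trace this way, on G7 the centroids are 1 and 17, vertices 6 and 7 fall in both neighborhoods, and the counts (2 vs. 1) send both to S1.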

11 A Divisive Method 3 (the two centroids are the max and the max non-neighbor, so 33 and 34)
No vertex lies in neither cluster; 6 and 7 lie in both S1 and S17. Ties are decided by count of shared siblings (s), then cousins (c), then 2nd cousins (d), then 3rd cousins (e); both 6 and 7 are assigned to S1.

[Figure residue: the SP1 and SP2 pTree matrices for G7 are not reproduced.]

12 SP1 through SP5 pTrees for G7

[Figure residue: the SP1 (=1-deg), SP2 (=2-deg), SP3 (=3-deg), SP4 (=4-deg), and SP5 (=5-deg) bit matrices for G7 are not reproduced. Recoverable annotations: "10,25,26,28,29,33,34 not shown (only 17 on, 1=4dg)"; "15,16,19,21,23,24,27,30 only 17 on, 5deg=1"; "17 SP5 8=5dg".]

13 Zachary's karate club (G7), a standard benchmark in community detection; the colors show the best partition found by optimizing the modularity of Newman and Girvan.

[Figure residue: the colored graph drawing and the 1-deg through 5-deg columns are not reproduced.]

Let's try


Download ppt "Shortest Path Trees Construction"

Similar presentations


Ads by Google