Shortest Path Trees Construction


Shortest Path Trees Construction

We don't need the Path Trees to get the Shortest Path Trees, because a subpath of a shortest path is itself a shortest path.

Notation: E_j is the edge pTree of vertex j, SkP_i is the mask pTree of the vertices whose shortest path from vertex i has length k, SPSFk_i is the "Shortest Paths So Far" mask of vertex i after step k, a prime (') denotes the complement, and M_i has a 1 only at position i.

S1P = E
SPSF1_i = S1P_i OR M_i
S(k+1)P_i = SPSFk'_i & (OR of E_j over all j in SkP_i)
SPSF(k+1)_i = SPSFk_i OR S(k+1)P_i

"The mask pTree of the shortest (k+1)-paths starting at vertex i is the Shortest-Paths-So-Far complement ANDed with the OR of the edge pTrees E_j over all j in the shortest k-path list of vertex i."

For example, S2P_1 = SPSF1'_1 & (OR of E_j over all j in S1P_1), and likewise for S2P_2 and S2P_3, then S3P_1, S3P_3, S4P_1 and S4P_3.

[Figure: graph G6 on vertices 1-9, a, b, c, with the bit columns S1P = E, SkP, SPSFk and SPSFk' for vertices 1, 2 and 3; one column is marked "identical to 1 from here on".]

Done with vertex 1 shortest paths: Diam(1) = 4. Done with vertex 3 shortest paths. The shortest paths for vertices 4-c are done the same way.

What is the cost of creating the SPs? For every v in V there are about AvgDiam = Avg{Diam(v) | v in V} steps, and each step costs:
1 complement of SPSF (cost = compl),
1 OR of about Avg|E_k| = AD pTrees (cost = OR*AD),
1 AND of SPSF' with the OR result above (cost = AND),
1 OR to update SPSF (cost = OR).

Cost = |V| * AvgDiam * (compl + OR*AD + AND + OR), so O(|V|), i.e., linear in the number of vertices, assuming AvgDiam and AD = AvgDeg are small. This is a one-time construction that is parallelizable over the vertices. For Friends it is B*4*(3*pTOP + AD*pTOP) = 4B*(3 + AD)*pTOP = B*pTOP*(12 + 4AD), where pTOP is the cost of one pTree operation (complement, &, OR) and B = billion. Parallelized over an n-node cluster, this one-time Shortest Path Tree construction would cost B*pTOP*(12 + 4*AvgDeg)/n.

The SnPs capture only the shortest path lengths between all pairs of vertices. We could capture the actual shortest paths (all shortest paths? all paths in the PTs?), since we construct (but do not retain) that information along the way. How should it be structured, indexed, or residualized?
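The recurrences for SkP and SPSF can be sketched with plain Python integers standing in for pTrees (one bit per vertex). This is a minimal illustration, not the pTree implementation: the function names are hypothetical, and the G6 edge list is an assumption inferred from the degrees and edge counts quoted on the slides.

```python
LABELS = "123456789abc"
# Assumption: G6 edge list inferred from the slides' degree and edge counts.
G6_EDGES = "12 13 14 23 24 3c 47 56 57 67 68 89 8a 9a 9b 9c ab ac bc".split()

def edge_masks(labels, edges):
    """Return one int bitmask per vertex; bit m of mask j is set iff jm is an edge."""
    pos = {v: n for n, v in enumerate(labels)}
    masks = [0] * len(labels)
    for a, b in edges:
        masks[pos[a]] |= 1 << pos[b]
        masks[pos[b]] |= 1 << pos[a]
    return masks

def shortest_path_masks(masks, i):
    """Return [S1P_i, S2P_i, ...]: masks of the vertices at distance 1, 2, ... from i."""
    layers = [masks[i]]                  # S1P_i = E_i
    spsf = masks[i] | (1 << i)           # SPSF1_i = S1P_i OR M_i
    while True:
        reach = 0
        for j, e in enumerate(masks):    # OR of E_j over all j in SkP_i
            if layers[-1] >> j & 1:
                reach |= e
        nxt = reach & ~spsf              # S(k+1)P_i = SPSFk'_i & (OR ...)
        if nxt == 0:
            return layers
        layers.append(nxt)
        spsf |= nxt                      # SPSF(k+1)_i = SPSFk_i OR S(k+1)P_i

def decode(mask, labels=LABELS):
    """Turn a bitmask back into a set of vertex labels."""
    return {v for n, v in enumerate(labels) if mask >> n & 1}

masks = edge_masks(LABELS, G6_EDGES)
layers_from_1 = [decode(m) for m in shortest_path_masks(masks, 0)]
```

Under that assumed edge list, the number of layers returned for vertex 1 is its Diam(1), and each loop pass uses exactly the four operations counted in the cost estimate (one complement, one multi-way OR, one AND, one OR).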

Shortest Path Trees: SPk degrees for G7

The SPk degree of a vertex (kdg) is the 1-count of its SPk pTree, i.e., the number of vertices at shortest-path distance k from it. The table gives the degrees of the 34 vertices of G7 as base-65 digits (key below); for example, vertices 1 and 34 have 1dg = g = 16.

              1         2         3
ver  1234567890123456789012345678901234
1dg  g9a63444523125222223222533243446bg
2dg  9djgdcdhojepepff3fgqfggf66dklfkqb6
3dg  8b4b888c3b889366c8646864nn678581aa
4dg  000088800188809a8880888811a1180011
5dg  0000000000000011801010110010010000

In SP4, vertices 10, 25, 26, 28, 29, 33 and 34 have only bit 17 on (4dg = 1). In SP5, vertices 15, 16, 19, 21, 23, 24, 27 and 30 have only bit 17 on (5dg = 1), and vertex 17 has 5dg = 8.

17 is an outlier. Try clustering by SPdeg from 17. The SPk17 pTrees mask the clustering (next slide).

Base-65 digit key (the symbol in position n, counting from 0, stands for the value n):
0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ@#$

17 is an outlier. Try clustering by SPdeg from 17. The SPk17 pTrees mask the clustering.

SPdeg from 17   Vertices
1               6 7
2               1 5 11
3               2 3 4 8 9 12 13 14 18 20 22 32
4               10 25 26 28 29 31 33 34
5               15 16 19 21 23 24 27 30

[Figure: G7 with its vertices colored by SPdeg from 17.]

Now we would want to make this divisive and recursive. The maroon cluster could be broken apart into white and blue. Then one could use DegreeDifference within clusters to trade vertices among clusters to improve the DegDif quality measure. Maybe an agglomerative or divisive approach using SPdeg? Agglomerate two pieces iff the SPdegdif is improved (or still exceeds a threshold?). One could use Genetic Algorithm hill climbing to optimize the clustering, with GAs applied to the SPdeg arrays. The bottom line is that there is a wealth of value in ShortestPathDegrees. One can easily mask subsets and recalculate SPdeg.
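Grouping vertices by SPdeg from a seed vertex, and recalculating after masking a subset, can be sketched as a breadth-first layering. This is only an illustration: it uses adjacency sets rather than pTrees, the helper names are hypothetical, and it runs on the small graph G6 (edge list assumed, inferred from the slides' degree and edge counts) because the edge list of G7 is not given in the text.

```python
from collections import deque

# Assumption: G6 edge list inferred from the slides' degree and edge counts.
G6_EDGES = "12 13 14 23 24 3c 47 56 57 67 68 89 8a 9a 9b 9c ab ac bc".split()

def adjacency(edges):
    """Build a dict of neighbour sets from two-character edge strings."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def sp_layers(adj, seed, allowed=None):
    """Map each shortest-path distance k >= 1 to the set of vertices at that distance.

    `allowed` masks the graph to a vertex subset (the seed is assumed to be in it).
    """
    allowed = set(adj) if allowed is None else set(allowed)
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w in allowed and w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    layers = {}
    for v, d in dist.items():
        if d > 0:
            layers.setdefault(d, set()).add(v)
    return layers

adj = adjacency(G6_EDGES)
layers_from_8 = sp_layers(adj, "8")
masked_layers = sp_layers(adj, "8", allowed="56789abc")
```

The layer sizes are the SPdeg values of the seed; passing `allowed` is one way to "mask a subset and recalculate".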

[Figure: the SP1 to SP5 degree rows for G7, as on the Shortest Path Trees slide.]

1 and 34 have the highest SP1deg (most siblings) at 16. Start with clusters S(1), S(34) of siblings. Break ties with the DegreeDiffs defined below.

intdeg_S(x) = number of edges from x to S-vertices.
extdeg_S(x) = number of edges from x to S'-vertices.
DegDif_S(x) = intdeg_S(x) - extdeg_S(x) (or intdeg_S(x)/(1 + extdeg_S(x))?).

Start with S (and T, U, ... if there are ties) = the siblings of the x of highest SP1 degree. So for G7, S = Sibl(1) and T = Sibl(34). Add y in (S' - T) to S iff DegDif_S(y) > thresh1, and subtract z in S from S iff DegDif_S(z) < thresh2.
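The DegDif rule can be sketched as one refinement pass over a seed cluster. This is an illustration under stated assumptions: the function names are hypothetical, additions are evaluated against the original S and removals against the grown S (the slide does not fix the order), and the example runs on G6 (edge list assumed, inferred from the slides' counts) rather than G7.

```python
# Assumption: G6 edge list inferred from the slides' degree and edge counts.
G6_EDGES = "12 13 14 23 24 3c 47 56 57 67 68 89 8a 9a 9b 9c ab ac bc".split()

def adjacency(edges):
    """Build a dict of neighbour sets from two-character edge strings."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def deg_dif(adj, S, x):
    """DegDif_S(x) = intdeg_S(x) - extdeg_S(x)."""
    S = set(S)
    intdeg = len(adj[x] & S)   # edges from x to S-vertices
    extdeg = len(adj[x] - S)   # edges from x to S'-vertices
    return intdeg - extdeg

def refine(adj, S, T, thresh1, thresh2):
    """One pass: add y in (S' - T) with DegDif > thresh1, then drop z with DegDif < thresh2."""
    S, T = set(S), set(T)
    added = {y for y in set(adj) - S - T if deg_dif(adj, S, y) > thresh1}
    grown = S | added
    removed = {z for z in grown if deg_dif(adj, grown, z) < thresh2}
    return grown - removed

adj = adjacency(G6_EDGES)
refined = refine(adj, "123", "9abc", thresh1=0, thresh2=0)
```

Starting from S = {1,2,3} with T = {9,a,b,c}, only vertex 4 has more edges into S than out of it, so the pass grows S to {1,2,3,4}.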

K-plex Search on G6

A k-plex is a subgraph missing ≤ k edges. All subgraphs are induced subgraphs (ISGs) defined by their vertex set. A subgraph S has |E_S| = s edges and |V_S| = v vertices. S is a kplex iff C(v,2) - s = v(v-1)/2 - s ≤ k. If S is a kplex and S' adds one vertex x to S (V(S') = V(S) ∪ {x}), then S' is a kplex iff (v+1)v/2 - (deg(x,S') + s) ≤ k.

Edges are 1-plexes. |E{123}| = |PE123| = 3, so 123 is a 0plex (clique) and a 1plex. |E{124}| = |PE124| = 3, so 124 is a 0plex (clique).

If H is an ISG with h vertices, at most h(h-1)/2 edges ("max edges" below) and |E_H| actual edges, then H is a kplex iff h(h-1)/2 - |E_H| ≤ k. If H is a kplex and F is an ISG of H, then F is a kplex: if F is missing an edge then H is missing that edge also, since F inherits all H edges involving its vertices, so F cannot be missing more edges than H.

Search: if G isn't a kplex, let F1 be the ISG of G with a vertex of least degree removed. If F1 isn't a kplex, let F2 be the ISG of F1 with a vertex of least degree removed, etc., until we find Fj to be a kplex. Remove Fj. Repeat until all vertices are removed.

We did a k-plex search of G6 by simply calculating edge counts (which are simply 1-counts of ANDed pTrees) using only SP1 = E.

[Figure: graph G6 on vertices 1-9, a, b, c, with its SP1 = E, SP2, SP3 and SP4 bit matrices.]

Subgraph  Vertex set      Removed  Max edges  Edges  kplex for
G         {123456789abc}           66         19     k ≥ 47
H1        {12346789abc}   deg5=2   55         17     k ≥ 38
H2        {1234789abc}    deg6=2   45         15     k ≥ 30
H3        {123489abc}     deg7=1   36         14     k ≥ 22
H4        {12389abc}      deg4=2   28         12     k ≥ 16
H5        {1239abc}       deg8=2   21         10     k ≥ 11
H6        {239abc}        deg1=2   15         8      k ≥ 7
H7        {39abc}         deg2=1   10         7      k ≥ 3
H8        {9abc}          deg3=1   6          6      k ≥ 0

So take out {9abc} and start over.

Subgraph  Vertex set  Removed  Max edges  Edges  kplex for  Degrees
G         {12345678}           28         10     k ≥ 18     33322331
H1        {1234567}   deg8=1   21         9      k ≥ 12     2223223
H2        {234567}    deg1=2   15         6      k ≥ 9      112223
H3        {34567}     deg2=1   10         4      k ≥ 6      01222
H4        {4567}      deg3=0   6          4      k ≥ 2      1222
H5        {567}       deg4=1   3          3      k ≥ 0      222

So take out {567} and start over.

Subgraph  Vertex set  Removed  Max edges  Edges  kplex for  Degrees
G         {12348}              10         5      k ≥ 5      33220
H1        {1234}      deg8=0   6          5      k ≥ 1      3322
H2        {124}       deg3=2   3          3      k ≥ 0      222

This is exactly what we want! 1234 is a 1plex (missing only 1 edge) and 124 was determined to be a clique (0plex, missing no edges). It would have been great if 123 had revealed itself as a clique also, and if 89abc had been detected as a 2plex before 9abc was detected as a clique. How might we make progress in these directions? Try returning to remove all degree ties before moving on? We will try that on the next slide.
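The peel-and-restart search can be sketched as follows. This is a minimal illustration with adjacency sets instead of pTree 1-counts; the function names are hypothetical, ties among least-degree vertices are broken by smallest label (an assumption, so intermediate steps can differ from the slide), and the G6 edge list is inferred from the slides' counts.

```python
from itertools import combinations

# Assumption: G6 edge list inferred from the slides' degree and edge counts.
G6_EDGES = "12 13 14 23 24 3c 47 56 57 67 68 89 8a 9a 9b 9c ab ac bc".split()

def adjacency(edges):
    """Build a dict of neighbour sets from two-character edge strings."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def missing_edges(adj, vs):
    """C(v,2) - s for the induced subgraph on vs."""
    s = sum(1 for a, b in combinations(sorted(vs), 2) if b in adj[a])
    return len(vs) * (len(vs) - 1) // 2 - s

def peel_to_kplex(adj, vs, k):
    """Remove least-degree vertices until the induced subgraph misses <= k edges."""
    vs = set(vs)
    while missing_edges(adj, vs) > k:
        low = min(sorted(vs), key=lambda x: len(adj[x] & vs))  # tie: smallest label
        vs.remove(low)
    return vs

def kplex_cover(adj, k):
    """Repeat the peel on the remaining vertices until none are left."""
    rest, found = set(adj), []
    while rest:
        h = peel_to_kplex(adj, rest, k)
        found.append(h)
        rest -= h
    return found

adj = adjacency(G6_EDGES)
cliques_found = kplex_cover(adj, k=0)
```

With k = 0 the sketch returns 9abc, then 567, then 124, followed by the leftover singletons 8 and 3, which matches the sets the slide takes out.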

K-plex search on G6 continued

A k-plex is a subgraph missing ≤ k edges. If H is a kplex and F is an ISG of H, then F is a kplex. If H is an ISG with h vertices, at most h(h-1)/2 edges and |E_H| actual edges, H is a kplex iff h(h-1)/2 - |E_H| ≤ k. If F is missing an edge, H is missing that edge too (F inherits all H edges), so F can't be missing more edges than H.

A k-core is a subgraph containing ≥ k edges. If F is a kcore ISG of H, then H is a kcore.

Mining all kplexes and kcores: at each step we (potentially) branch to each of the lowest-degree vertices (note that many of them are skipped in this illustration).

[Figure: graph G6 with its SP1, SP2, SP3 and SP4 bit matrices.]

Subgraph  Vertex set      Removed  Max edges  Edges  kplex for  kcore for  Degrees
H0 = G    {123456789abc}           66         19     k ≥ 47     k ≤ 19     333323334434
H1        {12346789abc}   deg5=2   55         17     k ≥ 38     k ≤ 17     33332234434
H26       {1234789abc}    deg6=2   45         15     k ≥ 30     k ≤ 15     3333124434
H27       {1234689abc}    deg7=2   45         15     k ≥ 30     k ≤ 15     3332134434
(H26 and H27 specify removal of 7 and 6 respectively, so remove both.)
H2        {123489abc}              36         14     k ≥ 22     k ≤ 14     333224434
H34       {12389abc}               28         12     k ≥ 16     k ≤ 12     22324434
H38       {12349abc}               28         13     k ≥ 15     k ≤ 13     33324434
H348      {1239abc}                21         10     k ≥ 11     k ≤ 10     2233334
H341      {2389abc}                21         10     k ≥ 11     k ≤ 10     1224434
H342      {1389abc}                21         10     k ≥ 11     k ≤ 10     1224434
(H341, H342 and H38 specify removal of 1 and 2, so remove both.)
H4        {389abc}                 15         9      k ≥ 6      k ≤ 9      124434
H5        {89abc}                  10         8      k ≥ 2      k ≤ 8      24433
H6        {9abc}          deg8=2   6          6      k ≥ 0      k ≤ 6      3333

This is what we want: 89abc is a 2plex and 9abc is a 0plex.

Subgraph  Vertex set  Max edges  Edges  kplex for  kcore for  Degrees
H0        {1234567}   21         9      k ≥ 12     k ≤ 9      3323223
H03       {124567}    15         8      k ≥ 7      k ≤ 8      333223
H05       {123467}    15         8      k ≥ 7      k ≤ 8      332323
H06       {123457}    15         8      k ≥ 7      k ≤ 8      332323
H035      {12467}     10         8      k ≥ 7      k ≤ 8      22312
H036      {12457}     10         8      k ≥ 7      k ≤ 8      22322
H0356     {1247}      6          4      k ≥ 2      k ≤ 4      2231
H03567    {124}       3          3      k ≥ 0      k ≤ 3      222

This is what we want. Remove 12489abc.

Subgraph  Vertex set  Max edges  Edges  kplex for  kcore for  Degrees
H7        {3567}      6          3      k ≥ 3      k ≤ 3      0222
H7        {567}       3          3      k ≥ 0      k ≤ 3      222

We might want the kplex and/or kcore structure around a particular vertex. Use SP1, SP2, ... For example, to find the kplex and kcore structure around v = 1:

SPL1(1) = 234, SPL2(1) = 7c, SPL3(1) = 569ab, SPL4(1) = 8.

To check the kplex/kcore status of 1234, check whether there are edges 23, 24, 34 (y, y, n). Thus 123 and 124 are 0plexes and 3cores; 134 and 234 are 1plexes and 2cores; 1234 is a 1plex and a 5core.

To check the kplex/kcore status of 12347c, check edges 17 1c 27 2c 37 3c 47 4c 7c (n n n n n y y n n). 12347c is a (Comb(6,2) - 7)plex = 8plex and a 7core.
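The status checks around a vertex reduce to counting 1-bits of ANDed masks. A minimal sketch follows, with Python integers standing in for pTrees; the function names are hypothetical and the G6 edge list is an assumption inferred from the slides' counts. Note that "kcore" here follows the slides' definition (a subgraph containing ≥ k edges), which differs from the usual minimum-degree definition of a k-core.

```python
LABELS = "123456789abc"
# Assumption: G6 edge list inferred from the slides' degree and edge counts.
G6_EDGES = "12 13 14 23 24 3c 47 56 57 67 68 89 8a 9a 9b 9c ab ac bc".split()

def edge_masks(labels, edges):
    """One int bitmask per vertex; bit m of mask j is set iff jm is an edge."""
    pos = {v: n for n, v in enumerate(labels)}
    masks = [0] * len(labels)
    for a, b in edges:
        masks[pos[a]] |= 1 << pos[b]
        masks[pos[b]] |= 1 << pos[a]
    return masks

def plex_core_status(masks, labels, members):
    """Return (k_plex, k_core): edges missing from, and edges present in, the ISG."""
    pos = {v: n for n, v in enumerate(labels)}
    members = sorted(set(members))
    subset = 0
    for v in members:
        subset |= 1 << pos[v]
    # Each internal edge is counted once from each endpoint, hence the halving.
    twice_edges = sum(bin(masks[pos[v]] & subset).count("1") for v in members)
    edges = twice_edges // 2
    n = len(members)
    return n * (n - 1) // 2 - edges, edges

masks = edge_masks(LABELS, G6_EDGES)
status_1234 = plex_core_status(masks, LABELS, "1234")
status_12347c = plex_core_status(masks, LABELS, "12347c")
```

The first component is the smallest k for which the set is a kplex and the second is the largest k for which it is a kcore in the slides' sense.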

K-Degree-Difference Community Search on G6

A kDegreeDifference community of a graph G is a subgraph H such that dd_H ≡ IntDeg_H - ExtDeg_H ≥ k.

Theorem: if h ∈ H, then dd_(H-h) = dd_H - (2*id_h - ed_h). So we want to remove the h for which (2*id_h - ed_h) is minimum.

H               id            ed            dd_H  dd_H/|V_H|    2id-ed       Action
{123456789abc}  333323334434  000000000000  38    38/12 = 3.17               Remove 5
{12346789abc}   33333334434   00001100000   34    34/11 = 3.09  66665568868  Remove 6, 7
{123489abc}     333224434     000110000     26    26/9 = 2.89   666338868    Remove 4, 8
{1239abc}       2233334       1101100       16    16/7 = 2.29   3365568      Remove 1, 2
{39abc}         13334         21100         10    10/5 = 2.00   05568        Remove 3
{9abc}          3333          1101          9     9/4 = 2.25    5565         Clique, so start over with 12345678
{12345678}      33232331      00100002      17    17/8 = 2.13   66563660     Remove 8
{1234567}       3323223       0010010       16    16/7 = 2.29   6636436      Remove 3, 6
{12457}         22312         11011         6     6/5 = 1.20    33613        Remove 5
{1247}          2231          1102          4     4/4 = 1.00    3360         Remove 7
{124}           222           111           3     3/3 = 1.00    333          Clique, so start over with 35678
{35678}         02321         30012         2     2/5 = 0.40    -3,4,6,3,0   Remove 3
{5678}          2321          0012          5     5/4 = 1.25    4630         Remove 8
{567}           222           011           4     4/3 = 1.33    433          Clique, so remove 567 and start over with 38 (but it has 0 id)

[Figure: graph G6 with its SP1 bit matrix.]
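The degree difference of a candidate community can be computed directly from its definition (total internal degree minus total external degree), which avoids depending on any incremental update formula. This is a small sketch with hypothetical function names, run on G6 with an edge list that is an assumption inferred from the slides' counts.

```python
# Assumption: G6 edge list inferred from the slides' degree and edge counts.
G6_EDGES = "12 13 14 23 24 3c 47 56 57 67 68 89 8a 9a 9b 9c ab ac bc".split()

def adjacency(edges):
    """Build a dict of neighbour sets from two-character edge strings."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def degree_difference(adj, H):
    """dd_H = sum of internal degrees - sum of external degrees over the vertices of H."""
    H = set(H)
    internal = sum(len(adj[h] & H) for h in H)
    external = sum(len(adj[h] - H) for h in H)
    return internal - external

def id_ed_rows(adj, H):
    """Return the per-vertex internal and external degree strings, in label order."""
    members = sorted(set(H))
    inside = set(members)
    ids = "".join(str(len(adj[h] & inside)) for h in members)
    eds = "".join(str(len(adj[h] - inside)) for h in members)
    return ids, eds

adj = adjacency(G6_EDGES)
dd_9abc = degree_difference(adj, "9abc")
rows_9abc = id_ed_rows(adj, "9abc")
```

For the three cliques the search ends on (9abc, 124, 567), the sketch reproduces the id, ed and dd_H values in the table.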

Very Simple Weighted SP1 and SP2 K-plex Search on G6

Weighting: the degrees of the 0,1-path nbrs of x are multiplied by 3 and the degrees of the 2-path nbrs of x by 2. This is done until all degrees have been weighted; then we go back to the actual subgraph (SG) degrees. Degree rows are over H = {123456789abc} in base-65 digits. "Cut 2,3,4" removes the vertices whose degree is 2, 3 or 4.

Unweighted degrees: 333323334434

x=1: weighted degrees 999923634438. After cutting 2,3,4: max edges 15, edges 7, kplex for k ≥ 8. After cutting 2,3,4,6,8: max edges 6, edges 5, kplex for k ≥ 1.
x=2: weighted degrees 999923634438. After cutting 2,3,4: max edges 15, edges 7, kplex for k ≥ 8. After cutting 2,3,4,6,8: max edges 6, edges 5, kplex for k ≥ 1.
x=3: weighted degrees 99962333886c. After cutting 2,3,6,8: max edges 6, edges 4, a 2plex. Then, with actual subgraph degrees (row 222623338861), after cutting 1: max edges 3, edges 3, a 0plex.
x=4: weighted degrees 996946334434. After cutting 2,3,4,6: max edges 3, edges 3, a 0plex.
x=5: weighted degrees 333669964434. After cutting 3,4: max edges 10, edges 5, a 5plex. Then, from SG degrees (row 333123314434), after cutting 1: max edges 3, edges 3, a 0plex.
x=6: weighted degrees 333669998834, then cut 3,4. With SG degrees (row 33312333223), after cutting 1,2: max edges 3, edges 2, a 1plex (SG degrees 211).
x=7: weighted degrees 333969934434 (row 333969998834 after cutting 3,4). With SG degrees (row 333122232234), after cutting 1: max edges 3, edges 3, a 0plex.
x=8: weighted degrees 33334969cc68, then cut 3,4. With SG degrees (row 333342134433), after cutting 1,2: a 2plex.
x=9: weighted degrees 33632639cc9c. After cutting 2,3,6: max edges 10, edges 8, kplex for k ≥ 2.
x=a: weighted degrees 33632639cc9c. After cutting 2,3,6: max edges 10, edges 8, kplex for k ≥ 2.
x=b: weighted degrees 33632336cc9c. After cutting 2,3,6: max edges 6, edges 6, kplex for k ≥ 0.
x=c: weighted degrees 66932336cc9c. After cutting 2,3,6: max edges 6, edges 6, kplex for k ≥ 0.

By weighting the initial round we have gotten nearly perfect information for this example (G6). The weightings, 3 and 2, were chosen arbitrarily but worked here. In general, one should devise a formula to determine them. We could also weight SP3, etc. If we have paid the price of constructing SPk for k > 1, this is a much simpler way to do it than the Clique Percolation Method of Palla (next slide).

[Figure: graph G6 with its SP1, SP2, SP3 and SP4 bit matrices.]
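The weighting step can be sketched as follows: multiply the degree of every 0,1-path neighbour of x by 3 and of every 2-path neighbour by 2, print the row in the slides' base-65 digits, and keep the vertices whose weighted degree exceeds a cut value. The function names are hypothetical and the G6 edge list is an assumption inferred from the slides' counts.

```python
LABELS = "123456789abc"
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ@#$"
# Assumption: G6 edge list inferred from the slides' degree and edge counts.
G6_EDGES = "12 13 14 23 24 3c 47 56 57 67 68 89 8a 9a 9b 9c ab ac bc".split()

def adjacency(edges):
    """Build a dict of neighbour sets from two-character edge strings."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def weighted_values(adj, labels, x, w1=3, w2=2):
    """Degrees with 0,1-path nbrs of x times w1 and 2-path nbrs of x times w2."""
    near = adj[x] | {x}                                   # distance 0 or 1 from x
    two = set().union(*(adj[y] for y in adj[x])) - near   # distance exactly 2
    return [len(adj[v]) * (w1 if v in near else w2 if v in two else 1)
            for v in labels]

def weighted_degrees(adj, labels, x, w1=3, w2=2):
    """The weighted degree row as a base-65 string (values must stay below 65)."""
    return "".join(DIGITS[d] for d in weighted_values(adj, labels, x, w1, w2))

def survivors(adj, labels, x, cut, w1=3, w2=2):
    """Vertices whose weighted degree is strictly greater than `cut`."""
    vals = weighted_values(adj, labels, x, w1, w2)
    return {v for v, d in zip(labels, vals) if d > cut}

adj = adjacency(G6_EDGES)
row_x1 = weighted_degrees(adj, LABELS, "1")
kept_x1 = survivors(adj, LABELS, "1", cut=4)
```

For x = 1 the survivors of the first cut are 12347c, the 8plex named on the previous slide, and raising the cut to 8 leaves 1234.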

Very Simple Weighted SP1 k-plex Search on G7

Weighting: 0,1-path nbrs of x times 1; 2-path nbrs of x times 0.

[Figure: graph G7 with its SP1 bit matrix over vertices 1-34.]

D is the degree row in base-65 digits over the listed vertices. "Cut 1,2,3" removes the vertices whose degree is 1, 2 or 3.

All 34 vertices:

Step        Max edges  Edges  kplex for  kcore for  D
G7          561        77     k ≥ 484    k ≤ 77     g9a63444523125222223222533243446bg
Cut 1,2,3   120        38     k ≥ 82     k ≤ 38     9685322452322522222322243323334367
Cut 2,3     55         26     k ≥ 24     k ≤ 26     6675322452322522222322223323334344
Cut 2,4     15         12     k ≥ 3      k ≤ 12     5454322422322422222322223323334344
Cut 2       10         10     k ≥ 0      k ≤ 10     4444322422322422222322223323334344

{1,2,3,4,14} is a clique. {1,2,3,4,9,14} is a 3plex.

Remaining 29 vertices (5-13, 15-34), D = 232031200222021202533232435af:

Step        Max edges  Edges  kplex for  kcore for  D
Cut 0,1,2   55         19     k ≥ 36     k ≤ 19     20203120022202120253323233456
Cut 0,3     6          4      k ≥ 2      k ≤ 6      20203120022202120223323233222

{24,32,33,34} is a 2plex.

Remaining 25 vertices (5-13, 15-23, 25-31), D = 2330102000020000002111011:

Step        Max edges  Edges  kplex for  kcore for  D
Cut 0       21         4      k ≥ 17     k ≤ 4      2330102000020000002111011
Cut 0,1     15         6      k ≥ 9      k ≤ 6      2330102000020000000111011
Cut 0       10         6      k ≥ 4      k ≤ 6      2330102000020000000111011

{5,6,7,11,17} is a 4plex. (After the first Cut 0, the slide notes "Cut 1 leaves 25 only".)

Remaining 20 vertices (8-10, 12, 13, 15, 16, 18-23, 25-31), D = 01000000000002111011:

H=19, edges 4, kplex for k ≥ 15, kcore for k ≤ 4, D = 01000000000002010011. Cut 0 leaves {9,31} as a 0plex.
H=17, edges 2, kplex for k ≥ 15, kcore for k ≤ 2, D = 01000000000002010011. Cut 0 leaves {27,30} as a 0plex.
H=14, edges 0, kplex for k ≥ 14, kcore for k ≤ 0: no edges left.

The expected communities are mostly not detected as kplexes or kcores.

Symbols for base 65 (the symbol in position n, counting from 0, stands for the value n):
0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ@#$

ISG EdgeCount kplex Search Alg on G8

G8 is a graph of word associations starting from the word BRIGHT, using USF Free Association. An edge AB means that some people associate the word B to the word A. We try to determine the 4 categories: Intelligence, Astronomy, Light, Colors.

[Figure: graph G8 on vertices 1-54, with its SP1 bit matrix.]

Deg is the degree row in base-65 digits over vertices 1-54.

Step       Max edges  Edges  kplex for  kcore for  Deg
G8         1431       197    k ≥ 1234   k ≤ 197    44444bb5656h9747c3c864fag4a386e4546534685768353534965j
Cut 0-9    45         22     k ≥ 13     k ≤ 22     444442456565974733386446544386545465346857683535349656
Cut 2,3,4  10         8      k ≥ 2      k ≤ 8      444442456562974733286444344386345465346857683535349654

So {12,24,25,31,54} = {sun, yellow, color, red, bright} is a 2plex.

Attempt 2: remove bright and double the weight of the nbrs of 12 (the vertex of max degree). Degree rows over vertices 1-53:

Degrees with bright removed: 44444ba5645g9746b2b864f9f49386d4545423675767353534965
Nbrs of 12 doubled, Cut 1-9: 44484mka68agie4cm2b8c4fif49386d454542367576e356a349c5
Cut 3:                       44484664684c66467238444444938634545423675764356334935

G8 vertex key:
1 Scientist, 2 Science, 3 Astronomy, 4 Earth, 5 Space, 6 Moon, 7 Star, 8 Ray, 9 Intelligent, 10 Golden,
11 Glare, 12 Sun, 13 Sky, 14 Moonlight, 15 Eyes, 16 Sunshine, 17 Light, 18 Lit, 19 Dark, 20 Brown,
21 Tan, 22 Orange, 23 Blue, 24 Yellow, 25 Color, 26 Gray, 27 Black, 28 Race, 29 White, 30 Green,
31 Red, 32 Crayon, 33 Pink, 34 Velvet, 35 Flashlight, 36 Glow, 37 Dim, 38 Gifted, 39 Genius, 40 Smart,
41 Inventor, 42 Einstein, 43 Brilliant, 44 Shine, 45 Laser, 46 Telescope, 47 Horizon, 48 Sunset, 49 Ribbon, 50 Violet,
51 Purple, 52 Beam, 53 Night, 54 Bright

SP1 and SP2 for G8

[Figure: the SP1 and SP2 bit matrices of G8 over vertices 1-53.]

SP1 degrees of vertices 1-53 (base-65 digits): 44444ba5645g9746b2b864f9f49386d4545423675767353534965

Very Simple Weighted SP1 and SP2 K-plex Search on G8

[Figure: graph G8, with the weighted degree rows of vertices 1-53 for each starting vertex x.]

Start with x = 1. Weighting: the 0,1-path neighbors of x (1, 2, 40, 41, 42) times 5; the 2-path nbrs of x (3, 9, 38, 39, 43) times 3. Next cut < 18 (instead of cut < 19).

This gives C0 = {1,2,9,39,40,41,42,43}, which is exactly the Intelligence class except that v = 38 (gifted) is missing. It is a kplex for k ≥ 8 (not that strong of a community!).

Within the Intelligence class there is the 1plex C1 = {1,2,40,41,42} (the only edge missing is (2,40)), with C1-degrees 4 3 3 4 4. Thus if we cut next using C1-degrees (cut 2 and 40), that leaves the clique (0plex) C2 = {1,41,42}.

Cutting C0 and starting over with x = 3 on G - C0. Weighting: the 0,1-path neighbors of x (3, 6, 7) times 5; the 2-path nbrs of x (4, 5, 12, 13, 14, 17, 44, 48, 53) times 3. Next cut < 10, then cut < 12.

This gives C2 = {3,4,5,6,7,12,13,14,15,17,23,25,31,44,48,53}, whereas Astronomy is 3,4,5,6,7,8,10,11,12,13,14,16,17,44,45,46,47,48,52,53. So, not a good fit! On the next slide we try again with replacement, but using as starting vertex the remaining vertex of highest degree.

Very Simple Weighted SP1 and SP2 K-plex Search on G8 Continued

[Figure: graph G8, with the weighted degree rows of vertices 1-53 for x = 12, x = 25 and x = 1.]

With replacement, but using as starting vertex the remaining vertex of highest degree (first, v = 12).

Astronomy is 3,4,5,6,7,8,10,11,12,13,14,16,17,44,45,46,47,48,52,53.

x = 12, weighting 0,1 SP nbrs times 5 and 2 SP nbrs times 3: cut < 20.
x = 12, weighting 0,1 SP nbrs times 6 and 2 SP nbrs times 3: cut < 30.
x = 12, weighting 0,1 SP nbrs times 6 and 2 SP nbrs times 1: 5 astronomy vertices are missing {3,5,45,46,53} and 2 non-astronomy vertices are included {21,24}.
x = 25, weighting 0,1 SP nbrs times 6: 4 colors are missing but zero non-colors are included.

SP1 degrees of vertices 1-53 (base-65 digits): 44444ba5645g9746b2b864f9f49386d4545423675767353534965

APPENDIX: G8 1 Scientist 3 Astronomy 2 Science 5 Space 4 Earth 7 Star 6 Moon 8 Ray 10 Golden 9 Intelligent 12 Sun 11 Glare 14 Moonlight 13 Sky 15 Eyes 17 Light 16 Sunshine 19 Dark 18 Lit 21 Tan 20 Brown 22 Orange 23 Blue 25 Color 24 Yellow 26 Gray 28 Race 27 Black 29 White 30 Green 32 Crayon 31 Red 33 Pink 35 Flashlight 34 Velvet 36 Glow 37 Dim 39 Genius 38 Gifted 42 Einstein 41 Inventor 40 Smart 44 Shine 43 Brilliant 46 Telescope 45 Laser 48 Sunset 47 Horizon 49 Ribbon 51 Purple 50 Violet 53 Night 52 Beam 54 BRIGHT 1 2 3 4 5 6 40 41 42 46 7 13 12 14 44 53 17 48 54 8 16 52 45 9 43 39 38 10 20 21 24 11 15 47 23 22 19 25 36 18 37 35 27 26 28 29 31 32 33 30 51 50 34 49 Fortunato: A graph of word assoc. starting from BRIGHT. It builds on U. S. Florida Free Association. An edge between words A and B indicates that some people associate B to the word A. 4 categories Intelligence, Astronomy, Light, Colors.“bright" is related to all. e.g. “dark" is in Colors and Light. For overlapping communities introduce a further variable, the membership of vertices in different communities, which enormously increases the number of possible covers wrt standard partitions. Clique Percolation Method (Palla) based on; internal edges of a community are likely to form cliques due to their high density. On the other hand, it is unlikely that intercommunity edges form cliques. Palla used term k-clique to indicate a complete graph with k vertices (k-clique is different from the n-clique). Two k-cliques are adjacent if they share k-1 vertices. The union of adjacent k-cliques is a k-clique chain. Two k-cliques are connected if they are part of a k-clique chain. Finally, a k-clique community is the largest connected subgraph obtained by the union of a k-clique and of all k-cliques which are connected to it (a k-clique community is identified by making a k-clique roll over adjacent k-cliques, where rolling means rotating a k-clique about the k vertices it shares with any adjacent k-clique.) 
k-clique communities can share vertices, so they can be overlapping. There may be vertices belonging to non-adjacent k-cliques, which are reached by different paths and end up in different clusters. Unfortunately, there are also vertices that cannot be reached by any k-clique, e.g. vertices with degree one.

In order to find k-clique communities, one first searches for maximal cliques. Then a clique-clique overlap matrix O is built, which is an nc by nc matrix (nc = number of cliques). Oij is the number of vertices shared by cliques i and j. To find k-clique communities, keep the entries of O that are ≥ k-1, set the others to 0, and find the connected components of the resulting matrix.

Detecting maximal cliques is known to require a running time that grows exponentially with the size of the graph. However, the authors found that, for the real networks they analyzed, the procedure is quite fast, due to the fairly limited number of cliques, and that (sparse) graphs with up to 10^5 vertices can be analyzed in a short time.

[Figure: G8 with its SP1 matrix, vertices indexed 1-54.]
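The clique percolation steps above (maximal cliques, overlap matrix O, threshold at k-1, connected components) can be sketched as follows. This is a minimal illustration, not Palla's implementation: the function names are hypothetical, maximal cliques are found with a plain Bron-Kerbosch recursion, and only maximal cliques with at least k vertices are kept (an assumption that stands in for a separate diagonal threshold on O).

```python
def maximal_cliques(adj):
    """Plain Bron-Kerbosch recursion; returns every maximal clique as a frozenset."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def k_clique_communities(edges, k):
    """Union of maximal cliques (size >= k) linked by overlaps of at least k-1 vertices."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    cliques = [c for c in maximal_cliques(adj) if len(c) >= k]
    nc = len(cliques)
    # Clique-clique overlap matrix O: O[i][j] = number of shared vertices.
    overlap = [[len(cliques[i] & cliques[j]) for j in range(nc)] for i in range(nc)]
    seen, communities = set(), []
    for start in range(nc):
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], set()
        while stack:  # connected components of the thresholded matrix
            i = stack.pop()
            members |= cliques[i]
            for j in range(nc):
                if j not in seen and overlap[i][j] >= k - 1:
                    seen.add(j)
                    stack.append(j)
        communities.append(members)
    return communities


# Two triangles sharing vertex 3, plus a degree-one vertex 6 (hypothetical toy graph).
toy = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (3, 5), (5, 6)]
found = sorted(sorted(c) for c in k_clique_communities(toy, 3))
```

On this toy graph the two triangles share only one vertex, so for k=3 they form two communities that overlap in vertex 3, while the degree-one vertex 6 belongs to none.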

SG Clique Mining

The edge mask pTree E (with its variants EU and PE) is indexed by a key listing the vertex pairs 1,1 1,2 ... 7,7 in row-major order. To find the endpoints of the edge at position n, use (Int((n-1)/7)+1, Mod(n-1,7)+1).

[Figure: graphs G2 and G3 on vertices 1-7, with the key and the bit columns E, EU, PE, C and CU.]

k=2: 2-cliques (2 vertices): 12 13 14 16 23 24 34 56 67
k=3: 123 124 134 234
k=4: 1234 (123, 124, 134 and 234 are cliques). 123,134 gives 1234. 123,234 gives 1234. 124,134 gives 1234. 124,234 gives 1234. 134,234 gives 1234. So 1234 is the only 4-clique.

Using the EdgeCount theorem on C={1,2,3,4}, with CU=C&EU: C is a clique since ct(CU)=comb(4,2)=4!/(2!2!)=6.

With E = 12 13 14 16 23 24 34 56 57 67 at k=2 (CSk is the set of k-cliques; "have" marks a candidate that was already found):
PE(2,3)=1, so 123∈CS3. PE(2,4)=1, so 124∈CS3. PE(2,6)=0. PE(1,4)=1, so 134∈CS3. PE(2,3)=1, so 234∈CS3. PE(1,5)=0. PE(1,7)=0. PE(6,7)=1, so 567∈CS3. Have 123, 124, 134, 234; already have 567.
k=3: 123 124 134 234 567
PE(2,4)=1, so 1234∈CS4. Have 1234.

EdgeCount Algorithm (EC): a candidate C with k+1 vertices is a clique, C∈CS(k+1), when the 1-count of its masked edge pTree is comb(k+1,2) = (k+1)!/((k-1)!2!). EC requires counting 1's in the mask pTree of each subgraph (or candidate clique, if we take the time to generate the CCSs; but then clearly the fastest way to finish up is simply to look up the single bit position in E, i.e., use EC).

The SG algorithm only needs the edge mask pTree, E, and a fast way to find those pairs of subgraphs in CSk that share k-1 vertices (then check E to see if the two different kth vertices are an edge in G). Again, this is a standard part of the Apriori ARM algorithm and has therefore been optimized and engineered ad infinitum!
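As a minimal sketch of the position-to-endpoint mapping and the EdgeCount test, the edge mask can be modelled as a Python integer used as a bit vector. The integer stands in for a pTree, the names `position`, `endpoints` and `is_clique` are hypothetical, and the edge list is the ten-edge list from the slide.

```python
from itertools import combinations
from math import comb

N = 7  # number of vertices in the example graph
EDGES = [(1, 2), (1, 3), (1, 4), (1, 6), (2, 3), (2, 4), (3, 4), (5, 6), (5, 7), (6, 7)]


def position(i, j, n=N):
    """1-based position of the pair (i, j) in the row-major n-by-n key."""
    return (i - 1) * n + j


def endpoints(pos, n=N):
    """Inverse mapping: (Int((pos-1)/n)+1, Mod(pos-1,n)+1)."""
    return (pos - 1) // n + 1, (pos - 1) % n + 1


# Upper-triangular edge mask: bit (pos-1) is set when the pair at pos (i < j) is an edge.
edge_mask = 0
for i, j in EDGES:
    edge_mask |= 1 << (position(min(i, j), max(i, j)) - 1)


def is_clique(vertices):
    """EdgeCount test: the candidate is a clique iff it contains comb(|C|, 2) edges."""
    candidate_mask = 0
    for i, j in combinations(sorted(vertices), 2):
        candidate_mask |= 1 << (position(i, j) - 1)
    edge_count = bin(edge_mask & candidate_mask).count("1")
    return edge_count == comb(len(vertices), 2)
```

For C={1,2,3,4} the masked count is 6 = comb(4,2), so `is_clique({1, 2, 3, 4})` holds, while `is_clique({1, 2, 6})` does not because the pair 2,6 is not an edge.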
[Figure: graph G4 on vertices 1-8, with the 8-by-8 key 1,1 1,2 ... 8,8 and the edge mask E.]

k=2: 12 13 14 16 23 24 34 56 57 67 18 28 38 48.
PE(2,3)=1, so 123∈CS3. PE(2,4)=1, so 124∈CS3. PE(2,8)=1, so 128∈CS3. PE(2,6)=0. PE(1,4)=1, so 134∈CS3. PE(3,8)=1, so 138∈CS3. PE(4,8)=1, so 148∈CS3. PE(1,5)=0. PE(1,7)=0. PE(6,8)=0. PE(2,3)=1, so 234∈CS3. PE(3,8)=1, so 238∈CS3. PE(4,8)=1, so 248∈CS3. PE(4,8)=1, so 348∈CS3. PE(6,7)=1, so 567∈CS3.
k=3: 123 124 134 234 567 128 138 148 238 248 348
PE(2,4)=1, so 1234∈CS4. PE(3,8)=1, so 1238∈CS4. PE(4,8)=1, so 1248∈CS4. PE(3,8)=1, so 1348∈CS4. PE(4,8)=1, so 2348∈CS4.
k=4: 1234 1238 1248 1348 2348
PE(4,8)=1, so 12348∈CS5.
k=5: 12348 = CS5.
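The level-by-level construction above can be sketched as an Apriori-style join: two k-cliques that share k-1 vertices yield a (k+1)-clique exactly when their two differing vertices are joined by an edge. The sketch below uses Python sets in place of pTrees; `next_cliques`, `clique_sets` and `labels` are hypothetical names, and the edge list is the k=2 list given for G4.

```python
from itertools import combinations

G4_EDGES = [(1, 2), (1, 3), (1, 4), (1, 6), (2, 3), (2, 4), (3, 4), (5, 6),
            (5, 7), (6, 7), (1, 8), (2, 8), (3, 8), (4, 8)]


def next_cliques(cs_k, edges):
    """Join pairs of k-cliques sharing k-1 vertices; keep the union if the
    two differing vertices form an edge (a single lookup in the edge set)."""
    out = set()
    for a, b in combinations(cs_k, 2):
        diff = a ^ b  # symmetric difference has size 2 iff a and b share k-1 vertices
        if len(diff) == 2 and diff in edges:
            out.add(a | b)
    return out


def clique_sets(edge_list):
    """Return {k: set of k-cliques} for k = 2, 3, ... until a level is empty."""
    edges = {frozenset(e) for e in edge_list}
    levels = {2: set(edges)}
    k = 2
    while levels[k]:
        levels[k + 1] = next_cliques(levels[k], edges)
        k += 1
    del levels[k]  # drop the final empty level
    return levels


def labels(cliques):
    """Render cliques as sorted digit strings such as '123'."""
    return sorted("".join(str(v) for v in sorted(c)) for c in cliques)


levels = clique_sets(G4_EDGES)
```

Here `labels(levels[3])` lists the eleven 3-cliques, `labels(levels[4])` the five 4-cliques, and `labels(levels[5])` the single 5-clique 12348, in agreement with the lists above.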

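A kDensityDifference Community (kDenDif) of a graph, G, is a subgraph, H, such that $\mathrm{dendif}_H \equiv \mathrm{IntDen}_H - \mathrm{ExtDen}_H \ge k$.

Notation: $g=|V_G|$, $G=|E_G|$, $h=|V_H|$, $H=|E_H|$.

$\mathrm{IntDen}_H = H/\binom{h}{2} = 2H/(h(h-1))$

$\mathrm{ExtDen}_H = (G-H)/(h(g-h))$

So $\mathrm{dendif}_H = \dfrac{2H}{h(h-1)} - \dfrac{G-H}{h(g-h)}$.

For $x \in H$, removing x takes away one vertex and the $\deg_H x$ edges of H incident to x, so

$\mathrm{dendif}_{H-x} = \dfrac{2(H-\deg_H x)}{(h-1)(h-2)} - \dfrac{G-H+\deg_H x}{(h-1)(g-h+1)}$, where $(h-1)(g-h+1) = hg - h^2 + 2h - g - 1$.

Theorem: If $x \in H$, $\mathrm{dendif}_{H-x} = \mathrm{dendif}_H - (2\,id_x - ed_x)$, where $id_x$ and $ed_x$ are the internal and external degrees of x. So we want to remove the x such that $(2\,id_x - ed_x)$ is minimum.

[Figure: graph G6 on vertices 1-9, a, b, c, with its SP1 matrix.]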
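A small numeric sketch of the density difference may help. It makes one labeled assumption: the external density counts only boundary edges (edges with exactly one endpoint in H), as in Fortunato's definition of inter-cluster density, rather than the G-H numerator used on the slide. The function names and the toy graph are hypothetical, and the best vertex to remove is found by brute force rather than by the theorem.

```python
def dendif(edges, community, num_vertices):
    """Internal density minus external density of a vertex set (needs 2 <= h < g)."""
    h = len(community)
    internal = sum(1 for u, v in edges if u in community and v in community)
    # Assumption: external density uses boundary edges only.
    boundary = sum(1 for u, v in edges if (u in community) != (v in community))
    int_den = internal / (h * (h - 1) / 2)
    ext_den = boundary / (h * (num_vertices - h))
    return int_den - ext_den


def best_removal(edges, community, num_vertices):
    """Brute force: the vertex whose removal leaves the largest density difference."""
    return max(community, key=lambda x: dendif(edges, community - {x}, num_vertices))


# Two triangles {1,2,3} and {4,5,6} joined by the edge 3-4 (hypothetical toy graph).
toy_edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
score = dendif(toy_edges, {1, 2, 3}, 6)           # 1 - 1/9
drop = best_removal(toy_edges, {1, 2, 3, 4}, 6)   # vertex 4 dilutes the triangle
```

Starting from {1,2,3,4}, removing vertex 4 recovers the triangle {1,2,3}, which has the highest density difference among the four possible removals.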

ELEMENTS OF COMMUNITY DETECTION (Fortunato)

The identification of structural clusters is possible only if graphs are sparse, i.e. if the number of edges m is of the order of the number of nodes n of the graph. If m>>n, the distribution of edges among the nodes is too homogeneous for communities to make sense. In this case the problem turns into something rather different, close to data clustering, which requires concepts and methods of a different nature. The main difference: while communities in graphs are related to the concept of edge density (inside versus outside the community), in data clustering communities are sets of points which are "close" to each other, with respect to a measure of distance or similarity defined for each pair of points.

We can relax the notion of clique to subgraphs which are still clique-like by using properties related to reachability, i.e. to the existence (and length) of paths between vertices. An n-clique is a maximal subgraph such that the distance between each pair of its vertices is not larger than n. For n=1 it is a clique, so each geodesic (shortest path) has length 1. This definition, more flexible than that of a clique, still has some limitations, deriving from the fact that the geodesic paths need not run on the vertices of the subgraph under study. There are two consequences. First, the diameter of the subgraph may exceed n, even if in principle each vertex of the subgraph is at most n steps away from any of the others. Second, the subgraph may be disconnected, which is not consistent with the notion of cohesion one tries to enforce.

There are two possible solutions, the n-clan and the n-club. An n-clan is an n-clique whose diameter is not larger than n, i.e. a subgraph such that the distance between any two of its vertices, computed over shortest paths within the subgraph, does not exceed n. An n-club is a maximal subgraph of diameter n. The two definitions are close: an n-clan is maximal under the constraint of being an n-clique, whereas an n-club is maximal under the constraint imposed by the length of the diameter.
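The gap between the n-clique condition (distances measured in the whole graph) and the n-clan condition (distances measured inside the subgraph) can be illustrated with a breadth-first search. This is a sketch only: the function names are hypothetical, the example is a 5-cycle chosen for illustration, and maximality of the vertex set is not checked.

```python
from collections import deque


def bfs_distances(adj, source, allowed=None):
    """Hop distances from source; if allowed is given, paths may use only those vertices."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if (allowed is None or v in allowed) and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def max_pairwise_distance(adj, subset, inside_only):
    """Largest distance between two vertices of subset (inf if some pair is unreachable)."""
    allowed = subset if inside_only else None
    worst = 0
    for s in subset:
        dist = bfs_distances(adj, s, allowed)
        for t in subset:
            if t not in dist:
                return float("inf")
            worst = max(worst, dist[t])
    return worst


# 5-cycle 1-2-3-4-5-1; the subset leaves out vertex 5.
cycle = {1: {2, 5}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4, 1}}
subset = {1, 2, 3, 4}
in_graph = max_pairwise_distance(cycle, subset, inside_only=False)
in_subgraph = max_pairwise_distance(cycle, subset, inside_only=True)
```

In the whole graph vertices 1 and 4 are two steps apart through vertex 5, so every pair of the subset is within distance 2; inside the subgraph the only path from 1 to 4 is 1-2-3-4, so its diameter is 3.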
The example below is a network of karate club members, a well-known graph used as a benchmark to test community detection algorithms. It consists of 34 vertices, the members of a karate club in the United States, who were observed during a period of three years. Edges connect individuals who were observed to interact outside the activities of the club. A conflict between the president and the instructor led to the fission of the club into two separate groups (indicated by squares and circles). The question is whether it is possible to infer the composition of the two groups from the original network structure. One can distinguish two aggregations, one around vertices 33 and 34 (34 is the president), the other around vertex 1 (the instructor). One can also identify several vertices lying between the two main structures, like 3, 9, 10; such vertices are often misclassified by community detection methods.

[Figure: G7, the karate club network on vertices 1-34, with its SP1 matrix.]

[Figure: the SP3, SP2 and SP1 matrices, vertices indexed 1-53.]

Text Mining using pTrees

Running example: 3 documents (Doc1, Doc2, Doc3), 9 terms (AAPL, all, always, an, and, apple, April, are, buy) and 7 positions.

DTPe Data Cube (dimensions Doc, Term, Position). It can be flattened in three ways, each stored as a pTreeSet:

DTPe Document Table (DpTreeSet indexed by (T,P)):
Doc  T1P1…T1P7  . . .  T9P1…T9P7
1    1 … 0      . . .  0 … 0
2    0 … 0      . . .  1 … 0
3    0 … 0      . . .  1 … 1

DTPe Term Table (TpTreeSet indexed by (D,P)):
Term  P1D1 P1D2 P1D3 ... P7D1…P7D3
1     1    0    1    ... 0 … 0
. . .
9     0 … 0          ... 1 … 1

DTPe Position Table (PpTreeSet indexed by (T,D)):
Pos  T1D1 T1D2 T1D3 ... T9D1…T9D3
1    1    0    1    ... 0 … 0
. . .
7    0 … 0          ... 1 … 1

DTPe Term Usage Table: each cell holds the part of speech of the term at that position (noun, verb, adj, adv, …).

Cards of the DTPe cube: TD cards for P=k, k=1..7 (TDRolodexCd); PD cards for T=k, k=1..9 (PDCd); PT cards for D=k, k=1,2,3 (PTCd).

Classical Document Table (Classical DocTbl DpTreeSet):
Doc  Auth…  Date    . . .  Subj1 … Subjm
1    1      1/2/13  . . .  0 … 0
2    0      2/2/15  . . .  1 … 0
3    0      3/3/14  . . .  1 … 1

DTtf (DocTerm termfreq Data Cube): tf is the +rollup of the DTPe datacube along the position dimension.

One can use any measurement or data structure of measurements, e.g., DT tfidf, in which each cell has a decimal tfidf. It can be bitsliced directly into whole-number bitslices plus fractional bitslices (one for each binary digit to the right of the binary point; no need to shift!) using MOD(INT(x/2^k),2). E.g., a tfidf = 3.5 is
k:   3  2  1  0 -1 -2
bit: 0  0  1  1  1  0

DT tfidf Doc Table (DT tfidf DpTreeSet: T1k1, T1k0, T1k-1, T1k-2, ...):
Doc  T1   T2  . . .  T9
1    .75  0   . . .  1
2    0    1   . . .  .25
3    0    0   . . .  0

DT SR (DocTerm StockRating Cube): rating of T=stock at doc date close: 1=sell, 2=hold, 3=buy, 0=non-stock. It can be stored as a DT SR bitmap DpTreeSet (T2,R=buy; T2,R=hold; T2,R=sell) or as a DT SR bitslice DpTreeSet (T2k2, T2k1).
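The bitslice formula MOD(INT(x/2^k),2) and the tf rollup can be sketched in a few lines of Python. Plain lists and dicts stand in for pTrees here, the names `bit`, `bitslices` and `tf_rollup` are hypothetical, and the small cube is made-up illustration data.

```python
import math


def bit(x, k):
    """Bit k of a non-negative number x: MOD(INT(x / 2^k), 2); k may be negative."""
    return math.floor(x / 2 ** k) % 2


def bitslices(x, high, low):
    """Bits of x from slice k=high down to k=low (fractional slices have k < 0)."""
    return [bit(x, k) for k in range(high, low - 1, -1)]


def tf_rollup(dtpe):
    """Term frequency: sum the DTPe existence bits along the position dimension."""
    return {doc: {term: sum(bits) for term, bits in terms.items()}
            for doc, terms in dtpe.items()}


# tfidf = 3.5 is 11.1 in binary.
slices = bitslices(3.5, 3, -2)

# Illustration data: one existence bit per (doc, term, position), 7 positions.
cube = {
    "Doc1": {"apple": [1, 0, 0, 1, 0, 0, 0], "buy": [0, 0, 0, 0, 0, 0, 0]},
    "Doc2": {"apple": [0, 0, 0, 0, 0, 0, 0], "buy": [0, 1, 0, 0, 0, 0, 1]},
}
tf = tf_rollup(cube)
```

No shifting is needed for the fractional slices because dividing by 2^k with negative k multiplies x by a power of two before the integer part is taken.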

Shortest-path pTrees on G5 (16 vertices: 1-9, a-g)

Number of vertices at shortest-path distance k from each vertex (SP1=E; SPA is the total over all k):

vertex  1 2 3 4 5 6 7 8 9 a b c d e f g
SP1=E   2 1 2 3 2 3 2 1 1 1 1 3 2 0 2 2
SP2     3 2 3 1 1 1 1 2 2 2 2 0 0 0 0 0
SP3     2 1 2 1 1 1 1 1 0 0 0 0 0 0 0 0
SP4     0 1 0 2 1 2 1 1 0 0 0 0 0 0 0 0
SP5     0 2 0 0 2 0 2 2 0 0 0 0 0 0 0 0
SPA     7 7 7 7 7 7 7 7 3 3 3 3 2 0 2 2

[Figure: graph G5 with the SP1=E, SP2, SP3, SP4 and SP5 matrices and the SPSF2, SPSF3, SPSF4 and SPSF5 matrices, vertices indexed 1-9, a-g.]

How do we know we don't have to go further (that Diam(G5)=5)? We really should have continued one more step and then noticed that SPSF6 = all pure0 pTrees. Then we could conclude, since all 6-paths are non-shortest, that no extension of a 6-path can be a shortest path. Done!

SPSF6 = all pure0 since:
SP5(2)=5,7. E5, E7 have only 6, already in SP5(2).
SP5(5)=2,8. E2, E8 have only 4, already in SP5(5).
SP5(7)=2,8. E2, E8 have only 4, already in SP5(7).
SP5(8)=5,7. E5, E7 have only 6, already in SP5(8).

SPSF5 gives the connectivity partition and can be formed anytime as OR(k=1..5) SPk. Since we've formed it already, we should retain it for that use. Others? I don't see value in SPSF1-4.
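The stopping rule discussed above (keep extending until the next level is empty for every vertex) can be sketched with integer bitmasks standing in for pTrees. The function name `sp_levels` is hypothetical, the accumulated mask here includes the vertex itself (an assumption about how SPSF is defined), and the example graph is a made-up path with an isolated vertex, since the edges of G5 are not listed in the text.

```python
def sp_levels(edge_rows):
    """edge_rows[v] is a bitmask of the neighbours of vertex v (this is SP1 = E).

    Returns (sp, so_far): sp[k-1][v] is the mask of vertices at shortest-path
    distance k from v, and so_far[v] is the OR of all levels plus v itself.
    """
    n = len(edge_rows)
    sp = [list(edge_rows)]
    so_far = [edge_rows[v] | (1 << v) for v in range(n)]
    while True:
        nxt = []
        for v in range(n):
            reach = 0
            frontier = sp[-1][v]
            for u in range(n):
                if (frontier >> u) & 1:
                    reach |= edge_rows[u]      # OR the edge rows of the frontier
            nxt.append(reach & ~so_far[v])     # keep only vertices not reached before
        if not any(nxt):                       # next level empty everywhere: stop
            break
        sp.append(nxt)
        so_far = [so_far[v] | nxt[v] for v in range(n)]
    return sp, so_far


# Path 0-1-2-3 plus the isolated vertex 4 (bit u of row v is set when u-v is an edge).
rows = [0b00010, 0b00101, 0b01010, 0b00100, 0b00000]
sp, so_far = sp_levels(rows)
diameter = len(sp)                                  # number of non-empty levels
sp2_counts = [bin(m).count("1") for m in sp[1]]     # per-vertex counts, as in the table
```

Vertices with equal `so_far` masks lie in the same connected component, which is the connectivity partition mentioned above; the number of non-empty levels is the largest finite shortest-path distance.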