Lecture 3: 1. Different centrality measures of nodes 2. Hierarchical Clustering 3. Line graphs


1 Lecture 3: 1. Different centrality measures of nodes 2. Hierarchical Clustering 3. Line graphs

2 1. Centrality measures Within graph theory and network analysis, there are various measures of the centrality of a vertex that quantify the relative importance of that vertex within the graph. We will discuss the following centrality measures: degree centrality, betweenness centrality, closeness centrality, eigenvector centrality, and subgraph centrality.

3 Degree centrality Degree centrality is defined as the number of links incident upon a node, i.e. the degree of the node. It is often interpreted in terms of the immediate risk of the node catching whatever is flowing through the network (such as a virus or some information). [Figure: the blue nodes have higher degree centrality.]
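As a minimal sketch of the definition above, degree centrality can be computed by counting how many edges touch each node. The edge list below is a hypothetical toy graph, not the one shown on the slide.

```python
from collections import defaultdict

def degree_centrality(edges):
    """Return {node: degree} for an undirected graph given as (u, v) pairs."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1  # the edge is incident upon u
        degree[v] += 1  # and upon v
    return dict(degree)

# Hypothetical toy graph (an assumption for illustration only).
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
degrees = degree_centrality(edges)
print(degrees)
```

Here node c touches three edges, so it has the highest degree centrality in this toy graph.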

4 Betweenness centrality The vertex betweenness centrality BC(v) of a vertex v is defined as follows: $BC(v) = \sum_{u \neq v \neq w} \frac{\sigma_{uw}(v)}{\sigma_{uw}}$. Here $\sigma_{uw}$ is the total number of shortest paths between nodes u and w, and $\sigma_{uw}(v)$ is the number of shortest paths between u and w that pass through node v. Vertices that occur on many shortest paths between other vertices have higher betweenness than those that do not.

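5 Betweenness centrality Calculation for node c [Figure: example graph with nodes a, b, c, d, e, f.] Each unordered pair (u,w) of the other nodes is counted once.

(u,w)   σ_uw   σ_uw(c)   σ_uw(c)/σ_uw
(a,b)   1      0         0
(a,d)   1      1         1
(a,e)   1      1         1
(a,f)   1      1         1
(b,d)   1      1         1
(b,e)   1      1         1
(b,f)   1      1         1
(d,e)   1      0         0
(d,f)   1      0         0
(e,f)   1      0         0

Betweenness centrality of node c = 6. Betweenness centrality of node a = 0.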
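The calculation for node c can be checked with a short sketch. The transcript does not preserve the slide's figure, so the edge list below is an assumption: it is one graph on nodes a to f that is consistent with the tabulated path counts. Shortest paths are counted by breadth-first search, and a path from u to w passes through v exactly when d(u,v) + d(v,w) = d(u,w).

```python
from collections import deque
from itertools import combinations

def shortest_path_counts(adj, source):
    """BFS from source: return (distance, number of shortest paths) per node."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:            # first time y is reached
                dist[y] = dist[x] + 1
                sigma[y] = 0
                queue.append(y)
            if dist[y] == dist[x] + 1:   # x is a predecessor of y
                sigma[y] += sigma[x]
    return dist, sigma

def betweenness(adj, v):
    """Sum of sigma_uw(v) / sigma_uw over unordered pairs u, w (both != v)."""
    dist_v, sigma_v = shortest_path_counts(adj, v)
    total = 0.0
    others = [n for n in adj if n != v]
    for u, w in combinations(others, 2):
        dist_u, sigma_u = shortest_path_counts(adj, u)
        if v not in dist_u or w not in dist_u:
            continue                     # u cannot reach v or w
        if dist_u[v] + dist_v[w] == dist_u[w]:
            total += sigma_u[v] * sigma_v[w] / sigma_u[w]
    return total

# Assumed edge list (not taken from the slide figure).
edges = [("a", "b"), ("a", "c"), ("b", "c"),
         ("c", "d"), ("c", "e"), ("c", "f"),
         ("d", "e"), ("d", "f"), ("e", "f")]
adj = {}
for u, w in edges:
    adj.setdefault(u, []).append(w)
    adj.setdefault(w, []).append(u)

bc_c = betweenness(adj, "c")
bc_a = betweenness(adj, "a")
print(bc_c, bc_a)
```

Recomputing the BFS for every pair keeps the sketch short; for large graphs a dedicated algorithm such as Brandes' would be the usual choice.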

6 Betweenness centrality Nodes of high betweenness centrality are important for transport: if they are blocked, transport becomes less efficient, and if their capacity is improved, transport becomes more efficient. Edge betweenness is calculated using a similar concept. [Figure: hue (from red = 0 to blue = max) shows the node betweenness.] http://en.wikipedia.org/wiki/Betweenness_centrality#betweenness

7 Closeness centrality The farness of a vertex is the sum of the shortest-path distances from that vertex to every other vertex in the graph. The reciprocal of farness is the closeness centrality (CC): $CC(v) = \frac{1}{\sum_{t \neq v} d(v,t)}$. Here, d(v,t) is the shortest distance between vertex v and vertex t. Closeness centrality can be viewed as the efficiency of a vertex in spreading information to all other vertices.
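A minimal sketch of farness and closeness for an unweighted graph, using breadth-first search for the shortest distances. It assumes a connected graph (otherwise some distances are undefined), and the three-node path graph is a hypothetical example.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path distance (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def closeness(adj, v):
    """Reciprocal of farness; assumes the graph is connected."""
    dist = bfs_distances(adj, v)
    farness = sum(d for node, d in dist.items() if node != v)
    return 1.0 / farness

# Hypothetical path graph a - b - c.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
cc = {v: closeness(adj, v) for v in adj}
print(cc)
```

The middle node b has farness 1 + 1 = 2 and so the highest closeness, while the end nodes have farness 1 + 2 = 3.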

8 Eigenvector centrality Let A be the N×N adjacency matrix of a graph, λ the largest eigenvalue of A, and x the corresponding N×1 eigenvector. Then $A x = \lambda x$ (1), where λ is obtained from $|A - \lambda I| = 0$ and I is the identity matrix. The i-th component of the eigenvector x gives the eigenvector centrality score of the i-th node in the network. From (1), $x_i = \frac{1}{\lambda} \sum_{j=1}^{N} A_{ij} x_j$. Therefore, for any node, the eigenvector centrality score is proportional to the sum of the scores of all nodes connected to it. Consequently, a node has a high value of EC either if it is connected to many other nodes or if it is connected to others that themselves have high EC.
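One way to obtain λ and x numerically is power iteration, sketched below. This is a swapped-in numerical technique rather than the slide's method of solving the characteristic equation, and it assumes a connected, non-bipartite graph so that the iteration converges. The 4-node adjacency matrix (a triangle 0-1-2 with node 3 attached to node 2) is a hypothetical example.

```python
def eigenvector_centrality(A, iterations=200):
    """Power iteration: repeatedly apply A and rescale so that max(x) == 1."""
    n = len(A)
    x = [1.0] * n
    lam = 0.0
    for _ in range(iterations):
        y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        lam = max(y)                 # estimate of the largest eigenvalue
        x = [yi / lam for yi in y]   # rescale to avoid overflow
    return lam, x

# Hypothetical graph: triangle 0-1-2 plus pendant node 3 attached to node 2.
A = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]
lam, x = eigenvector_centrality(A)

# Residual of equation (1), A x = lambda x, for each component.
residual = [abs(sum(A[i][j] * x[j] for j in range(4)) - lam * x[i])
            for i in range(4)]
print(lam, x)
```

Node 2 gets the highest score because it has the most neighbours, and node 3 the lowest, although node 3 still benefits from being attached to the highest-scoring node.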

9 Subgraph centrality The number of closed walks of length k starting and ending on vertex i in the network is given by the local spectral moments $\mu_k(i)$, which are simply defined as the i-th diagonal entry of the k-th power of the adjacency matrix A: $\mu_k(i) = (A^k)_{ii}$. Closed walks can be trivial or nontrivial and are directly related to the subgraphs of the network. (Subgraph Centrality in Complex Networks, Physical Review E 71, 056103 (2005))

10 Subgraph centrality Adjacency matrix: M_uv = 1 if there is an edge between nodes u and v, and 0 otherwise.

M =
0 1 0 0 0 0 0 0 0 0 0 0 0 0
1 0 1 1 0 1 0 0 0 0 0 0 0 0
0 1 0 1 1 1 0 0 0 0 0 0 0 0
0 1 1 0 1 1 0 1 0 0 0 0 0 0
0 0 1 1 0 1 0 0 0 0 0 0 0 0
0 1 1 1 1 0 1 0 0 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0 1 0 0 0
0 0 0 1 0 0 0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 1 0 1 0 0 1 1
0 0 0 0 0 0 0 0 1 0 1 0 1 1
0 0 0 0 0 0 1 0 0 1 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 1 1 0 1 0 1
0 0 0 0 0 0 0 0 1 1 0 0 1 0

11 Subgraph centrality Local spectral moment: (M^2)_uv for u ≠ v represents the number of common neighbors of the nodes u and v. The diagonal entries (M^2)_uu are the local spectral moments μ_2(u), which equal the degree of node u.

M^2 =
1 0 1 1 0 1 0 0 0 0 0 0 0 0
0 4 2 2 3 2 1 1 0 0 0 0 0 0
1 2 4 3 2 3 1 1 0 0 0 0 0 0
1 2 3 5 2 3 1 0 1 0 0 0 0 0
0 3 2 2 3 2 1 1 0 0 0 0 0 0
1 2 3 3 2 5 0 1 0 0 1 0 0 0
0 1 1 1 1 0 2 0 0 1 0 0 0 0
0 1 1 0 1 1 0 2 0 1 0 0 1 1
0 0 0 1 0 0 0 0 4 2 1 1 2 2
0 0 0 0 0 0 1 1 2 4 0 1 2 2
0 0 0 0 0 1 0 0 1 0 2 0 1 1
0 0 0 0 0 0 0 0 1 1 0 1 0 1
0 0 0 0 0 0 0 1 2 2 1 0 4 2
0 0 0 0 0 0 0 1 2 2 1 1 2 3
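The local spectral moments can be computed directly from matrix powers, as in the sketch below. The last function also sums them into the subgraph centrality of the cited Physical Review E paper, SC(i) = Σ_k μ_k(i)/k!; that formula comes from the paper rather than from the slide text, and the truncation at 30 terms is an arbitrary choice for this sketch. The 4-node matrix is a hypothetical example, smaller than the 14-node matrix on the slides.

```python
def mat_mul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

def local_spectral_moments(A, k):
    """mu_k(i) = (A^k)_ii: number of closed walks of length k at node i."""
    P = identity(len(A))
    for _ in range(k):
        P = mat_mul(P, A)
    return [P[i][i] for i in range(len(A))]

def subgraph_centrality(A, terms=30):
    """Truncated sum over k of mu_k(i) / k! (truncation is an assumption)."""
    n = len(A)
    P = identity(n)
    factorial = 1.0
    sc = [0.0] * n
    for k in range(terms):
        if k > 0:
            P = mat_mul(P, A)
            factorial *= k
        for i in range(n):
            sc[i] += P[i][i] / factorial
    return sc

# Hypothetical graph: triangle 0-1-2 plus pendant node 3 attached to node 2.
A = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]
mu2 = local_spectral_moments(A, 2)   # equals the node degrees
mu3 = local_spectral_moments(A, 3)   # twice the number of triangles per node
sc = subgraph_centrality(A)
print(mu2, mu3, sc)
```

In this toy graph μ_2 reproduces the degrees and μ_3 is nonzero only for the three nodes on the triangle, which illustrates how closed walks relate to subgraphs.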

12 Table 2. Summary of results of eight real-world complex networks.

13 Hierarchical Clustering Data are not always available as binary relations, as they are for protein-protein interactions, where we can directly apply network clustering algorithms. [Example binary relations: AtpB-AtpA, AtpG-AtpE, AtpA-AtpH, AtpB-AtpH, AtpG-AtpH, AtpE-AtpH.] In many cases, for example in microarray gene expression analysis, the data are multivariate. (An Introduction to Bioinformatics Algorithms by Jones & Pevzner)

14 Hierarchical Clustering We can convert multivariate data into networks and then apply network clustering algorithms, which we will discuss in the next class. If the dimension of the multivariate data is 3 or less, we can cluster them by plotting directly. (An Introduction to Bioinformatics Algorithms by Jones & Pevzner)

15 Hierarchical Clustering However, when the dimension is more than 3, we can apply hierarchical clustering to multivariate data. In hierarchical clustering the data are not partitioned into particular clusters in a single step. Instead, a series of partitions takes place. Some data reveal good cluster structure when plotted, but some do not. [Figure: data plotted in 2 dimensions.]

16 Hierarchical Clustering Hierarchical clustering is a technique that organizes elements into a tree. A tree is a connected graph that has no cycle. A tree with n nodes has exactly n-1 edges. [Figure: a graph; a tree.]

17 Hierarchical Clustering Hierarchical clustering is subdivided into 2 types: 1. agglomerative methods, which proceed by a series of fusions of the n objects into groups, and 2. divisive methods, which separate the n objects successively into finer groupings. Agglomerative techniques are more commonly used. The data can be viewed at any level, from a single cluster containing all objects to n clusters each containing a single object.

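18 Hierarchical Clustering Distance measurements: the Euclidean distance between $g_1$ and $g_2$ is $d(g_1, g_2) = \sqrt{\sum_{k} (g_{1k} - g_{2k})^2}$, where $g_{1k}$ and $g_{2k}$ are the k-th components of $g_1$ and $g_2$.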

19 Hierarchical Clustering Instead of Euclidean distance, correlation can also be used as a distance measurement. For biological analysis involving genes and proteins, nucleotide and/or amino acid sequence similarity can also be used as the distance between objects. (An Introduction to Bioinformatics Algorithms by Jones & Pevzner)
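The two distance measurements can be sketched as follows. Turning a correlation r into a distance as 1 - r is a common convention but an assumption here, since the slides do not say how the conversion is done; the vectors are hypothetical expression profiles.

```python
import math

def euclidean(g1, g2):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(g1, g2)))

def pearson(g1, g2):
    """Pearson correlation coefficient; assumes neither vector is constant."""
    n = len(g1)
    m1, m2 = sum(g1) / n, sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    var1 = sum((a - m1) ** 2 for a in g1)
    var2 = sum((b - m2) ** 2 for b in g2)
    return cov / math.sqrt(var1 * var2)

def correlation_distance(g1, g2):
    """Assumed conversion: 0 for perfectly correlated, 2 for anti-correlated."""
    return 1.0 - pearson(g1, g2)

# Hypothetical expression profiles.
d_euclid = euclidean([0.0, 0.0], [3.0, 4.0])
d_same_shape = correlation_distance([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
d_opposite = correlation_distance([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
print(d_euclid, d_same_shape, d_opposite)
```

The second pair shows why correlation can be preferable for expression data: the two profiles differ in magnitude but rise together, so their correlation distance is zero even though their Euclidean distance is not.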

20 Hierarchical Clustering An agglomerative hierarchical clustering procedure produces a series of partitions of the data, $P_n, P_{n-1}, \ldots, P_1$. The first, $P_n$, consists of n single-object 'clusters'; the last, $P_1$, consists of a single group containing all n cases. At each stage the method joins together the two clusters which are closest together (most similar). (At the first stage, of course, this amounts to joining together the two objects that are closest together, since at the initial stage each cluster has one object.)
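The procedure can be sketched as a naive loop that records every partition from P_n down to P_1. How the distance between two clusters is measured is left as a parameter; the example uses the minimum pairwise distance, which is only one possible choice, and the one-dimensional points are hypothetical.

```python
def agglomerate(points, cluster_distance):
    """Return the partitions P_n, ..., P_1 as lists of index lists."""
    clusters = [[i] for i in range(len(points))]
    partitions = [[list(c) for c in clusters]]        # P_n
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters that are closest together.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = cluster_distance(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
        partitions.append([list(c) for c in clusters])
    return partitions

def min_pair_distance(A, B, points):
    """Example cluster distance: smallest distance between members."""
    return min(abs(points[i] - points[j]) for i in A for j in B)

# Hypothetical one-dimensional data.
points = [0.0, 1.0, 5.0, 7.0, 15.0]
partitions = agglomerate(points, min_pair_distance)
for p in partitions:
    print(p)
```

Each pass scans all cluster pairs, so this sketch favours clarity over speed; the recorded sequence of merges is what a dendrogram draws as a tree.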

21 Hierarchical Clustering Differences between methods arise because of the different ways of defining distance (or similarity) between clusters. (An Introduction to Bioinformatics Algorithms by Jones & Pevzner)

22 Hierarchical Clustering How can we measure distances between clusters? Single linkage clustering: the distance between two clusters A and B is computed as D(A,B) = Min { d(i,j) : object i is in cluster A and object j is in cluster B }.

23 Hierarchical Clustering Complete linkage clustering: the distance between two clusters A and B is computed as D(A,B) = Max { d(i,j) : object i is in cluster A and object j is in cluster B }.

24 Hierarchical Clustering Average linkage clustering: the distance between two clusters A and B is computed as $D(A,B) = T_{AB} / (N_A \cdot N_B)$, where $T_{AB}$ is the sum of all pairwise distances between objects of cluster A and objects of cluster B, and $N_A$ and $N_B$ are the sizes of clusters A and B respectively. [Figure: a total of $N_A \cdot N_B$ edges.]

25 Hierarchical Clustering Average group linkage clustering: the distance between two clusters A and B is computed as D(A,B) = Average { d(i,j) : observations i and j are in cluster t, the cluster formed by merging clusters A and B }. [Figure: a total of n(n-1)/2 edges, where n is the number of objects in t.]
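The four cluster-distance definitions can be compared side by side in a short sketch. The two clusters of one-dimensional points and the absolute-difference distance d are hypothetical choices for illustration.

```python
from itertools import combinations

def d(x, y):
    """Distance between two objects (absolute difference, for illustration)."""
    return abs(x - y)

def single_linkage(A, B):
    return min(d(i, j) for i in A for j in B)

def complete_linkage(A, B):
    return max(d(i, j) for i in A for j in B)

def average_linkage(A, B):
    # T_AB / (N_A * N_B): mean over the N_A * N_B between-cluster pairs.
    total = sum(d(i, j) for i in A for j in B)
    return total / (len(A) * len(B))

def average_group_linkage(A, B):
    # Mean over all n(n-1)/2 pairs inside the merged cluster t.
    t = A + B
    pairs = list(combinations(t, 2))
    return sum(d(i, j) for i, j in pairs) / len(pairs)

# Hypothetical clusters of one-dimensional objects.
A = [0.0, 1.0]
B = [4.0, 6.0]
results = {
    "single": single_linkage(A, B),
    "complete": complete_linkage(A, B),
    "average": average_linkage(A, B),
    "average_group": average_group_linkage(A, B),
}
print(results)
```

Average group linkage also counts the within-cluster pairs (0,1) and (4,6), which is why it comes out smaller than average linkage for these clusters.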

26 Alizadeh et al. Nature 403: 503-511 (2000). Hierarchical Clustering

27 Classifying bacteria based on 16S rRNA sequences.

28 Line Graphs Given a graph G, its line graph L(G) is a graph such that each vertex of L(G) represents an edge of G, and two vertices of L(G) are adjacent if and only if their corresponding edges share a common endpoint ("are adjacent") in G. [Figure panels: graph G; vertices in L(G) constructed from edges in G; added edges in L(G); the line graph L(G).] http://en.wikipedia.org/wiki/Line_graph
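The definition translates almost directly into code. The sketch below assumes a simple undirected graph (no self-loops or repeated edges) given as an edge list; the four-edge graph is a hypothetical example, not the one in the figure.

```python
from itertools import combinations

def line_graph(edges):
    """Return (vertices, adjacencies) of L(G) for a simple undirected graph G."""
    # Each edge of G becomes a vertex of L(G); sort endpoints for a stable name.
    vertices = [tuple(sorted(e)) for e in edges]
    # Two vertices of L(G) are adjacent iff the edges share an endpoint in G.
    adjacent = [(e, f) for e, f in combinations(vertices, 2)
                if set(e) & set(f)]
    return vertices, adjacent

# Hypothetical graph G.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("b", "d")]
vertices, adjacent = line_graph(edges)
for e, f in adjacent:
    print(e, "--", f)
```

In this example the edges a-b and c-d have no endpoint in common, so they are the only pair of vertices in L(G) that is not joined.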

29 Line Graphs RASCAL: Calculation of Graph Similarity using Maximum Common Edge Subgraphs, by John W. Raymond, Eleanor J. Gardiner and Peter Willett, The Computer Journal, Vol. 45, No. 6, 2002. This paper introduced a new graph similarity calculation procedure for comparing labeled graphs. The chemical graphs G1 and G2 are shown in Figure a, and their respective line graphs are depicted in Figure b.

30 Line Graphs Detection of Functional Modules From Protein Interaction Networks, by Jose B. Pereira-Leal, Anton J. Enright, and Christos A. Ouzounis, PROTEINS: Structure, Function, and Bioinformatics 54:49–57 (2004). Transforming a network of proteins to a network of interactions. a: Schematic representation illustrating a graph representation of protein interactions: nodes correspond to proteins and edges to interactions. b: Schematic representation illustrating the transformation of the protein graph connected by interactions to an interaction graph connected by proteins. Each node represents a binary interaction and edges represent shared proteins. Note that labels that are not shared correspond to terminal nodes in (a). A star is transformed into a clique.
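The remark that a star is transformed into a clique follows from a simple count, sketched here for a simple graph G. Two edges of G are adjacent in L(G) exactly when they meet at a vertex, so each vertex v contributes one edge of L(G) per pair of edges incident on it. In a star with n leaves, written $K_{1,n}$, all n edges meet at the center, so every pair of them is adjacent and the line graph is the complete graph $K_n$.

```latex
|E(L(G))| = \sum_{v \in V(G)} \binom{\deg(v)}{2},
\qquad
L(K_{1,n}) = K_n \quad \text{with} \quad \binom{n}{2} = \frac{n(n-1)}{2} \ \text{edges}.
```

In the protein setting this means that a hub protein with n interaction partners becomes a clique of n interaction nodes in the interaction graph.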

