Lecture 3: 1. Different centrality measures of nodes, 2. Hierarchical Clustering, 3. Line graphs


1. Centrality measures
Within graph theory and network analysis, there are various measures of the centrality of a vertex, which determine the relative importance of that vertex within the graph. We will discuss the following centrality measures: degree centrality, betweenness centrality, closeness centrality, eigenvector centrality and subgraph centrality.

Degree centrality
Degree centrality is defined as the number of links incident upon a node, i.e., the degree of the node. It is often interpreted in terms of the immediate risk of the node catching whatever is flowing through the network (such as a virus or some information). In the figure, the blue nodes have higher degree centrality.
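
As a minimal sketch (Python, standard library only), degree centrality can be read straight off an adjacency list. The example graph and the helper names `undirected` and `degree_centrality` are hypothetical, chosen only for illustration. Some texts also normalise the degree by n − 1, which is omitted here.

```python
def undirected(edges):
    """Build a symmetric adjacency list (node -> set of neighbours)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_centrality(adj):
    """Degree centrality: the number of links incident upon each node."""
    return {node: len(neighbours) for node, neighbours in adj.items()}


# Hypothetical example graph: "hub" is linked to every other node.
g = undirected([("hub", "x"), ("hub", "y"), ("hub", "z"), ("x", "y")])
print(degree_centrality(g))  # {'hub': 3, 'x': 2, 'y': 2, 'z': 1}
```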

Betweenness centrality
The vertex betweenness centrality BC(v) of a vertex v is defined as

$$BC(v) = \sum_{u \neq v \neq w} \frac{\sigma_{uw}(v)}{\sigma_{uw}}$$

where the sum runs over unordered pairs of vertices u and w distinct from v, $\sigma_{uw}$ is the total number of shortest paths between nodes u and w, and $\sigma_{uw}(v)$ is the number of those shortest paths that pass through node v. Vertices that occur on many shortest paths between other vertices have higher betweenness than those that do not.

Worked example (graph with nodes a, b, c, d, e, f): calculation of betweenness centrality for node c.

Pair (u,w)   σ_uw   σ_uw(c)   σ_uw(c)/σ_uw
(a,b)        1      0         0
(a,d)        1      1         1
(a,e)        1      1         1
(a,f)        1      1         1
(b,d)        1      1         1
(b,e)        1      1         1
(b,f)        1      1         1
(d,e)        1      0         0
(d,f)        1      0         0
(e,f)        1      0         0

Betweenness centrality of node c = 6 (the sum of the last column). Betweenness centrality of node a = 0.
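
The definition can be checked against the table with a short Python sketch (standard library only). The slide's figure is not recoverable, so the edge list below is an assumption. It is a graph in which a, b and c form a triangle, d, e and f form a triangle, and c is joined to d, e and f, chosen so that every row of the table holds. The helper names are hypothetical. Breadth-first search from each vertex gives distances and shortest-path counts. v lies on a shortest u–w path exactly when d(u,v) + d(v,w) = d(u,w), and in that case σ_uw(v) = σ_uv · σ_vw.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, source):
    """Distances and numbers of shortest paths from source (unweighted graph)."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:              # y reached for the first time
                dist[y], sigma[y] = dist[x] + 1, 0
                queue.append(y)
            if dist[y] == dist[x] + 1:     # x precedes y on a shortest path
                sigma[y] += sigma[x]
    return dist, sigma


def betweenness(adj, v):
    """BC(v): sum of sigma_uw(v) / sigma_uw over unordered pairs {u, w}, u, w != v."""
    info = {s: bfs_counts(adj, s) for s in adj}
    dist_v, sigma_v = info[v]
    total = 0.0
    for u, w in combinations([n for n in adj if n != v], 2):
        dist_u, sigma_u = info[u]
        if w not in dist_u or v not in dist_u:
            continue                       # no u-w path, or none through v
        if dist_u[v] + dist_v[w] == dist_u[w]:
            total += sigma_u[v] * sigma_v[w] / sigma_u[w]
    return total


# Assumed edge list, consistent with every row of the node-c table.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("c", "e"),
         ("c", "f"), ("d", "e"), ("d", "f"), ("e", "f")]
adj = {}
for p, q in edges:
    adj.setdefault(p, set()).add(q)
    adj.setdefault(q, set()).add(p)

print(betweenness(adj, "c"), betweenness(adj, "a"))  # 6.0 0.0
```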

In the betweenness figure, hue (from red = 0 to blue = max) shows the node betweenness. Nodes of high betweenness centrality are important for transport: if they are blocked, transport becomes less efficient, and if their capacity is improved, transport becomes more efficient. Edge betweenness is calculated using a similar concept, counting the shortest paths that pass through an edge rather than through a vertex.
ness_centrality#betweenness
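
Edge betweenness can be sketched the same way (Python, standard library only). This sketch assumes one convention: all unordered vertex pairs are counted, including the edge's own endpoints. Edge (x, y) lies on a shortest u–w path exactly when d(u,x) + 1 + d(y,w) = d(u,w) for one of its two orientations. The graph is an assumed one consistent with the node-c table: triangles a–b–c and d–e–f, with c joined to d, e and f. The helper names are hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, source):
    """Distances and numbers of shortest paths from source (unweighted graph)."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y], sigma[y] = dist[x] + 1, 0
                queue.append(y)
            if dist[y] == dist[x] + 1:
                sigma[y] += sigma[x]
    return dist, sigma


def edge_betweenness(adj, x, y):
    """Sum of sigma_uw(e) / sigma_uw over all unordered pairs {u, w}, e = (x, y)."""
    info = {s: bfs_counts(adj, s) for s in adj}
    total = 0.0
    for u, w in combinations(adj, 2):
        dist_u, sigma_u = info[u]
        if w not in dist_u:
            continue                        # u and w are disconnected
        for p, q in ((x, y), (y, x)):       # try both orientations of the edge
            dist_q, sigma_q = info[q]
            if p in dist_u and w in dist_q and dist_u[p] + 1 + dist_q[w] == dist_u[w]:
                total += sigma_u[p] * sigma_q[w] / sigma_u[w]
    return total


# Assumed edge list, consistent with the node-c table.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("c", "e"),
         ("c", "f"), ("d", "e"), ("d", "f"), ("e", "f")]
adj = {}
for p, q in edges:
    adj.setdefault(p, set()).add(q)
    adj.setdefault(q, set()).add(p)

print(edge_betweenness(adj, "c", "d"), edge_betweenness(adj, "a", "c"))  # 3.0 4.0
```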

Closeness centrality
The farness of a vertex is the sum of the shortest-path distances from the vertex to all other vertices in the graph. The reciprocal of farness is the closeness centrality (CC):

$$CC(v) = \frac{1}{\sum_{t \in V \setminus \{v\}} d(v,t)}$$

where V is the set of vertices and d(v,t) is the shortest-path distance between vertex v and vertex t. Closeness centrality can be viewed as the efficiency of a vertex in spreading information to all other vertices.
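
A minimal sketch in Python (standard library only), assuming a connected, unweighted graph with at least two vertices. Breadth-first search gives the hop distances, and their sum is the farness. The path graph and the function names are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path (hop) distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def closeness(adj, v):
    """CC(v) = 1 / farness(v); assumes a connected graph with >= 2 vertices."""
    dist = bfs_distances(adj, v)
    if len(dist) < len(adj):
        raise ValueError("graph is disconnected: farness is undefined")
    farness = sum(dist.values())  # d(v, v) = 0 adds nothing
    return 1.0 / farness


# Hypothetical path graph a - b - c - d.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
for node in adj:
    print(node, closeness(adj, node))
```

The inner vertices b and c have farness 1 + 1 + 2 = 4 and score 0.25. The endpoints have farness 6 and score 1/6, so they are less efficient spreaders.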

Eigenvector centrality
Let A be the N×N adjacency matrix of a graph, λ its largest eigenvalue and x the corresponding N×1 eigenvector, so that

$$A x = \lambda x \qquad (1)$$

where λ satisfies the characteristic equation $|A - \lambda I| = 0$ and I is the identity matrix. The i-th component of the eigenvector x gives the eigenvector centrality (EC) score of the i-th node in the network. From (1),

$$x_i = \frac{1}{\lambda} \sum_{j=1}^{N} A_{ij} x_j$$

Therefore, the eigenvector centrality score of any node is proportional to the sum of the scores of all nodes connected to it. Consequently, a node has a high EC value either if it is connected to many other nodes or if it is connected to others that themselves have high EC.
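
A common way to compute x in practice is power iteration: repeatedly multiply a positive starting vector by the matrix and renormalise. The sketch below (Python, standard library only) iterates with A + I instead of A. This shift is an assumption made for robustness. It leaves the eigenvectors unchanged but prevents the oscillation that plain power iteration shows on bipartite graphs. The sketch assumes a connected undirected graph; the star graph and the function name are hypothetical.

```python
import math


def eigenvector_centrality(adj, iterations=1000, tol=1e-12):
    """Power iteration for the leading eigenvector of A (connected, undirected graph).

    Iterates with A + I: same eigenvectors as A, eigenvalues shifted by 1,
    which avoids the sign oscillation plain iteration shows on bipartite graphs.
    Returns (lambda, x) with x scaled to unit Euclidean length.
    """
    nodes = list(adj)
    x = {n: 1.0 for n in nodes}               # positive starting vector
    for _ in range(iterations):
        y = {n: x[n] + sum(x[m] for m in adj[n]) for n in nodes}   # (A + I) x
        norm = math.sqrt(sum(v * v for v in y.values()))
        y = {n: v / norm for n, v in y.items()}
        converged = max(abs(y[n] - x[n]) for n in nodes) < tol
        x = y
        if converged:
            break
    # Rayleigh quotient x^T A x (x has unit length) estimates lambda.
    lam = sum(x[n] * sum(x[m] for m in adj[n]) for n in nodes)
    return lam, x


# Hypothetical star graph: a hub linked to three leaves.
star = {"hub": {"l1", "l2", "l3"}, "l1": {"hub"}, "l2": {"hub"}, "l3": {"hub"}}
lam, x = eigenvector_centrality(star)
print(round(lam, 6), round(x["hub"] / x["l1"], 6))  # both sqrt(3): 1.732051 1.732051
```

For the star, (1) gives λ x_hub = 3 x_leaf and λ x_leaf = x_hub, so λ = √3 and the hub scores √3 times higher than each leaf.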

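Subgraph centrality
The number of closed walks of length k starting and ending on vertex i in the network is given by the local spectral moment $\mu_k(i)$, which is simply the i-th diagonal entry of the k-th power of the adjacency matrix A:

$$\mu_k(i) = (A^k)_{ii}$$

Closed walks can be trivial or nontrivial and are directly related to the subgraphs of the network. The subgraph centrality of vertex i sums these closed walks over all lengths, weighting walks of length k by 1/k! so that shorter walks contribute more:

$$SC(i) = \sum_{k=0}^{\infty} \frac{\mu_k(i)}{k!}$$

(Subgraph Centrality in Complex Networks, Physical Review E 71, (2005))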

Adjacency matrix M (denoted A above): $M_{uv} = 1$ if there is an edge between nodes u and v, and 0 otherwise.

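For u ≠ v, the entry $(M^2)_{uv}$ is the number of common neighbors of nodes u and v, i.e., the number of walks of length 2 between them. The diagonal entry $(M^2)_{uu}$ is the local spectral moment $\mu_2(u)$. It counts the closed walks u → w → u and therefore equals the degree of u.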
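
The spectral moments and subgraph centrality can be sketched in plain Python with integer matrix powers. The sketch makes two assumptions. First, the infinite series SC(i) = Σ μ_k(i)/k! (Estrada and Rodríguez-Velázquez's definition in the cited paper) is truncated at a hypothetical cutoff `k_max`. Second, the triangle graph used as a check is chosen only because its values are easy to verify by hand: μ_2 = 2 (the degree), μ_3 = 2 (the two directions round the triangle), and SC = (e² + 2e⁻¹)/3 from its eigenvalues 2, −1, −1.

```python
import math


def mat_mult(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def spectral_moments(M, k_max):
    """Diagonals of M^0 .. M^k_max: moments[k][i] = (M^k)_ii = mu_k(i)."""
    n = len(M)
    power = [[int(i == j) for j in range(n)] for i in range(n)]  # M^0 = I
    moments = []
    for _ in range(k_max + 1):
        moments.append([power[i][i] for i in range(n)])
        power = mat_mult(power, M)
    return moments


def subgraph_centrality(M, k_max=30):
    """SC(i) = sum_{k=0}^{k_max} mu_k(i) / k!  (truncated series)."""
    moments = spectral_moments(M, k_max)
    return [sum(moments[k][i] / math.factorial(k) for k in range(k_max + 1))
            for i in range(len(M))]


# Triangle graph K3: every pair of the three nodes is linked.
M = [[0, 1, 1],
     [1, 0, 1],
     [1, 1, 0]]
moments = spectral_moments(M, 3)
print(moments[2], moments[3])       # [2, 2, 2] [2, 2, 2]
print(subgraph_centrality(M))
```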

Table 2. Summary of results of eight real-world complex networks.

2. Hierarchical Clustering
Data are not always available as binary relations, as they are in the case of protein-protein interactions (figure: interaction pairs AtpB–AtpA, AtpG–AtpE, AtpA–AtpH, AtpB–AtpH, AtpG–AtpH, AtpE–AtpH), to which network clustering algorithms can be applied directly. In many cases, for example in microarray gene expression analysis, the data are multivariate. (An Introduction to Bioinformatics Algorithms by Jones & Pevzner)

Multivariate data can be converted into networks, and network clustering algorithms, which we will discuss in the next class, can then be applied. If the data have three or fewer dimensions, they can be clustered by plotting them directly. (An Introduction to Bioinformatics Algorithms by Jones & Pevzner)

However, when the data have more than three dimensions, hierarchical clustering can be applied. In hierarchical clustering, the data are not partitioned into clusters in a single step; instead, a series of partitions takes place. Some data reveal good cluster structure when plotted, but some do not (figure: data plotted in 2 dimensions).

Hierarchical clustering is a technique that organizes elements into a tree. A tree is a connected graph that has no cycles; a tree with n nodes has exactly n − 1 edges (figure: a graph and a tree).

Hierarchical clustering is subdivided into two types:
1. agglomerative methods, which proceed by a series of fusions of the n objects into groups, and
2. divisive methods, which successively separate the n objects into finer groupings.
Agglomerative techniques are more commonly used. Either way, the data can be viewed at every level of the hierarchy, from a single cluster containing all objects to n clusters each containing a single object.

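Distance measurements
The Euclidean distance between two objects g_1 and g_2 (e.g., the expression profiles of two genes measured under m conditions) is

$$d(g_1, g_2) = \sqrt{\sum_{i=1}^{m} (g_{1i} - g_{2i})^2}$$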

Instead of Euclidean distance, correlation can also be used as a distance measure. For biological analyses involving genes and proteins, nucleotide and/or amino acid sequence similarity can also be used as the distance between objects. (An Introduction to Bioinformatics Algorithms by Jones & Pevzner)
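
Both distances can be sketched in a few lines of Python (standard library only). The text does not specify how correlation is turned into a distance. This sketch assumes the common choice d = 1 − r, where r is the Pearson correlation, so perfectly correlated profiles are at distance 0 and perfectly anti-correlated ones at distance 2. The expression profiles are hypothetical.

```python
import math


def euclidean(g1, g2):
    """Euclidean distance between two equal-length expression vectors."""
    if len(g1) != len(g2):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(g1, g2)))


def pearson(g1, g2):
    """Pearson correlation r (undefined, division by zero, for a constant vector)."""
    n = len(g1)
    m1, m2 = sum(g1) / n, sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    s1 = math.sqrt(sum((a - m1) ** 2 for a in g1))
    s2 = math.sqrt(sum((b - m2) ** 2 for b in g2))
    return cov / (s1 * s2)


def correlation_distance(g1, g2):
    """Assumed convention: distance = 1 - r."""
    return 1.0 - pearson(g1, g2)


# Hypothetical profiles over four conditions; g2 is g1 scaled by 2.
g1 = [1.0, 2.0, 3.0, 4.0]
g2 = [2.0, 4.0, 6.0, 8.0]
print(euclidean(g1, g2))             # sqrt(30), about 5.477
print(correlation_distance(g1, g2))  # close to 0: same shape
```

The example shows why the choice matters. Two genes whose expression rises and falls together but at different magnitudes are far apart in Euclidean distance yet almost identical under the correlation distance.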

An agglomerative hierarchical clustering procedure produces a series of partitions of the data, $P_n, P_{n-1}, \ldots, P_1$. The first, $P_n$, consists of n single-object 'clusters'; the last, $P_1$, consists of a single group containing all n cases. At each stage the method joins together the two clusters that are closest together (most similar). (At the first stage, of course, this amounts to joining together the two objects that are closest together, since initially each cluster has one object.)

Differences between methods arise because of the different ways of defining distance (or similarity) between clusters. (An Introduction to Bioinformatics Algorithms by Jones & Pevzner)
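
The agglomerative procedure can be sketched as a naive Python loop (standard library only) that records every partition from P_n down to P_1. Single linkage is used as an assumed default for the between-cluster distance, and any other cluster distance can be passed in its place. The one-dimensional points and the |x − y| distance are hypothetical. The naive search over all cluster pairs is fine for illustration but too slow for large data sets. Ties are broken by taking the first closest pair found.

```python
from itertools import combinations


def single_linkage(A, B, d):
    """D(A, B) = min d(i, j) over i in A, j in B."""
    return min(d(i, j) for i in A for j in B)


def agglomerate(objects, d, linkage=single_linkage):
    """Return the partitions P_n, P_(n-1), ..., P_1 as lists of frozensets."""
    clusters = [frozenset([o]) for o in objects]   # P_n: n singleton clusters
    partitions = [list(clusters)]
    while len(clusters) > 1:
        # Pick the two closest clusters under the chosen linkage.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]], d))
        merged = clusters[a] | clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
        partitions.append(list(clusters))
    return partitions


points = [0, 1, 5, 6, 20]
parts = agglomerate(points, lambda x, y: abs(x - y))
for level in parts:
    print(sorted(sorted(c) for c in level))
```

The run prints five partitions, from five singletons through [[0, 1], [5, 6], [20]] to the single cluster [[0, 1, 5, 6, 20]].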

How can we measure distances between clusters?
Single linkage clustering: the distance D(A,B) between two clusters A and B is

$$D(A,B) = \min \{ d(i,j) : i \in A,\ j \in B \}$$

Complete linkage clustering: the distance D(A,B) between two clusters A and B is

$$D(A,B) = \max \{ d(i,j) : i \in A,\ j \in B \}$$

Average linkage clustering: the distance D(A,B) between two clusters A and B is

$$D(A,B) = \frac{T_{AB}}{N_A \cdot N_B}$$

where $T_{AB}$ is the sum of all pairwise distances between objects of cluster A and objects of cluster B, and $N_A$ and $N_B$ are the sizes of clusters A and B respectively, so the average is taken over $N_A \cdot N_B$ pairs.

Average group linkage clustering: the distance D(A,B) between two clusters A and B is

$$D(A,B) = \operatorname{Average} \{ d(i,j) : i, j \in t,\ i \neq j \}$$

where t is the cluster formed by merging clusters A and B. If t contains n objects, the average is taken over n(n − 1)/2 pairs.
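
The four cluster distances can be written directly from their definitions (Python, standard library only). Clusters are assumed to be disjoint sets of objects, and d can be any pairwise distance function. The two one-dimensional clusters are hypothetical and small enough to check by hand. The cross distances 5, 6, 4, 5 give single = 4, complete = 6 and average = 20/4. The six within-t distances sum to 22, giving an average group linkage of 22/6.

```python
from itertools import combinations


def single(A, B, d):
    """Single linkage: smallest distance between a member of A and one of B."""
    return min(d(i, j) for i in A for j in B)


def complete(A, B, d):
    """Complete linkage: largest distance between a member of A and one of B."""
    return max(d(i, j) for i in A for j in B)


def average(A, B, d):
    """Average linkage: T_AB / (N_A * N_B) over all N_A * N_B cross pairs."""
    t_ab = sum(d(i, j) for i in A for j in B)
    return t_ab / (len(A) * len(B))


def average_group(A, B, d):
    """Average group linkage: mean d(i, j) over all n(n-1)/2 pairs in t = A | B."""
    pairs = list(combinations(A | B, 2))
    return sum(d(i, j) for i, j in pairs) / len(pairs)


def dist(x, y):
    """Hypothetical 1-D distance."""
    return abs(x - y)


A, B = {0, 1}, {5, 6}
print(single(A, B, dist), complete(A, B, dist), average(A, B, dist))  # 4 6 5.0
print(average_group(A, B, dist))  # 22 / 6, about 3.667
```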

Alizadeh et al. Nature 403: (2000).

Classifying bacteria based on 16S rRNA sequences.

3. Line Graphs
Given a graph G, its line graph L(G) is a graph such that each vertex of L(G) represents an edge of G, and two vertices of L(G) are adjacent if and only if their corresponding edges share a common endpoint ("are adjacent") in G. (Figure panels: graph G; vertices in L(G) constructed from edges in G; added edges in L(G); the line graph L(G).)
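
The definition translates almost word for word into Python (standard library only). The sketch assumes a simple undirected graph given as an edge list, and each vertex of L(G) is stored as a frozenset of the two endpoints of the corresponding edge of G. The path graph used as an example is hypothetical. Its three edges become three vertices of L(G), and consecutive edges share an endpoint, so L(G) is a path with two edges.

```python
from itertools import combinations


def line_graph(edges):
    """Return (vertices, edges) of L(G) for a simple undirected graph G."""
    vertices = [frozenset(e) for e in edges]          # one vertex per edge of G
    lg_edges = set()
    for e, f in combinations(vertices, 2):
        if e & f:                                     # edges share an endpoint in G
            lg_edges.add(frozenset((e, f)))
    return vertices, lg_edges


# Hypothetical path graph 1 - 2 - 3 - 4.
vertices, lg_edges = line_graph([(1, 2), (2, 3), (3, 4)])
# Print each L(G) edge as two sorted endpoint tuples, in a stable order.
for e, f in sorted(tuple(sorted(tuple(sorted(v)) for v in pair)) for pair in lg_edges):
    print(e, "--", f)  # (1, 2) -- (2, 3), then (2, 3) -- (3, 4)
```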

RASCAL: Calculation of Graph Similarity using Maximum Common Edge Subgraphs, by John W. Raymond, Eleanor J. Gardiner and Peter Willett, The Computer Journal, Vol. 45, No. 6, 2002. This paper introduced a new graph similarity calculation procedure for comparing labeled graphs. The chemical graphs G1 and G2 are shown in Figure a, and their respective line graphs are depicted in Figure b.

Detection of Functional Modules From Protein Interaction Networks, by Jose B. Pereira-Leal, Anton J. Enright and Christos A. Ouzounis, PROTEINS: Structure, Function, and Bioinformatics 54:49–57 (2004).
Figure: transforming a network of proteins into a network of interactions. (a) Schematic representation of protein interactions as a graph: nodes correspond to proteins and edges to interactions. (b) Schematic representation of the transformation of the protein graph connected by interactions into an interaction graph connected by proteins: each node represents a binary interaction, and edges represent shared proteins. Note that labels that are not shared correspond to terminal nodes in (a). A star is transformed into a clique.
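
The protein-to-interaction transformation can also be built protein by protein, which makes the "star becomes a clique" remark concrete. All interactions that share a protein are joined pairwise, so a protein with k interaction partners contributes a clique of k nodes (k(k − 1)/2 edges) to the interaction graph. The Python sketch below (standard library only) is an illustration, not the authors' code. The protein names and the function name are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def interaction_graph(interactions):
    """Nodes: binary interactions. Edges: pairs of interactions sharing a protein."""
    by_protein = defaultdict(list)
    for pair in interactions:
        interaction = frozenset(pair)
        for protein in interaction:
            by_protein[protein].append(interaction)
    edges = set()
    for incident in by_protein.values():
        # Every two interactions involving the same protein become adjacent.
        for e, f in combinations(incident, 2):
            edges.add(frozenset((e, f)))
    return edges


# Hypothetical star: hub protein P interacts with A, B, C and D.
star = [("P", "A"), ("P", "B"), ("P", "C"), ("P", "D")]
edges = interaction_graph(star)
nodes = set().union(*edges)        # interactions that take part in some edge
print(len(nodes), len(edges))      # 4 6 -> the four interactions form a clique K4
```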