Published by Blanche Watkins. Modified over 6 years ago.
1
Assessing Hierarchical Modularity in Protein Interaction Networks
Young-Rae Cho, Woochang Hwang, Murali Ramanathan and Aidong Zhang
State University of New York at Buffalo
2
Scale-free & Modular Networks
Scale-free Networks
- Power-law degree distribution [4,1]: P(k) ~ k^(-γ), with 2 < γ < 3
- Disassortativity [4,1,5]:
  - Frequent connections between a hub and a peripheral node
  - Infrequent connections between two hubs
- Small-world property [11,1]: average geodesic path length l ~ log log N
Modular Networks [3,8,1]
- Bridges
- Bridging Nodes

Here we map the samples and the genes into 2-dimensional space. As we can see, the genes have some dense areas; if we remove the outliers and zoom in on a dense area, we find finer dense areas and further outliers. So the gene distribution has a hierarchical dense structure. The samples, however, are very sparse in the high-dimensional space. Even when they are mapped into 2-dimensional space, no class structure can be detected. We can partition the samples with many hyperplanes, but cannot judge which partition is better. So the techniques that are effective for gene-based analysis are not adequate for analyzing samples. Effective and efficient sample-based analysis remains a challenging problem.
3
Bridge Measurement
Global Measurement
- Betweenness Centrality [7]
Local Measurement
- Clustering Coefficient [11]
- Neighbor Significance
- Similarity of Nodes
Combined Bridge Measurement
- Bridging Nodes
- Bridges

The existing methods of selecting informative genes to cluster samples fall into two major categories: supervised analysis and unsupervised analysis. The supervised approach assumes that additional information is attached to some (or all) of the data, for example, that biological samples are labeled as diseased vs. normal. The best-known supervised method is neighborhood analysis, introduced in a Science paper published in 1999, which stimulated research on sample phenotype detection. Other supervised methods include tree harvesting, support vector machines, decision trees, genetic algorithms, artificial neural networks, and a variety of ranking-based methods. These supervised methods share the same basic steps: first, select a subset of samples as the training set; next, use the phenotypes as a reference to select a small percentage of informative genes that manifest the phenotype partition within the training samples; finally, group the whole set of samples according to the selected informative genes.
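To make the two cited measures on the Bridge Measurement slide concrete, the following is a minimal sketch of the local clustering coefficient and of shortest-path betweenness centrality in plain Python. It uses the standard textbook definitions for unweighted, undirected graphs; the exact normalization used on the slide is not preserved in this transcript, so the scaling here is an assumption. The toy graph (two triangles joined through node "d") and all function names are illustrative only. The neighbor significance, node similarity, and combined bridge measures are not sketched because their formulas are missing from the transcript.

```python
from collections import deque

def clustering_coefficient(adj, v):
    """Local clustering coefficient: fraction of neighbor pairs of v that are linked."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    # Count each edge among the neighbors of v once.
    links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
    return 2.0 * links / (k * (k - 1))

def betweenness_centrality(adj):
    """Unnormalized shortest-path betweenness (Brandes-style accumulation)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate pair dependencies from the farthest nodes.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2.0 for v, b in bc.items()}

# Toy graph (illustrative, not real data): two triangles joined through "d".
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d", "f", "g"}, "f": {"e", "g"}, "g": {"e", "f"},
}
bc = betweenness_centrality(adj)
cc = {v: clustering_coefficient(adj, v) for v in adj}
```

In this toy graph the connecting node "d" has the highest betweenness and a clustering coefficient of zero, which is the kind of contrast between global and local measures that the slide combines.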
4
Hierarchical Modularization
Constraint
∆C = C(G) - C(G') ≥ 0 if v is a bridging node, where V' = V - {v}
- C(G): average clustering coefficient of the nodes in graph G.
- C(G'): average clustering coefficient of the nodes in the reduced graph G', in which the node v with the highest BR(v) has been removed.
Algorithm
1. Start from the graph G and successively remove bridging nodes.
2. If ∆C < 0, switch to successive removal of bridges.
3. After each removal, check whether G is split into subgraphs Gi'.
   - No: continue the removal.
   - Yes: replace G with each Gi' whose size exceeds θsize, and repeat.

Therefore, it is natural to ask: "Can we find both the phenotypes and the informative genes automatically at the same time?" For example, if the samples' phenotypes are unknown, can we correctly distinguish three classes of samples, as well as output gene1 ~ gene3 as informative genes? Unsupervised sample clustering is much more complex than the supervised approach.
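The split-and-recurse part of the Hierarchical Modularization slide can be sketched as follows. This is a simplified illustration under several assumptions, not the authors' implementation: the bridging score BR(v) is not preserved in this transcript, so `standin_score` (one minus the local clustering coefficient) is a hypothetical placeholder; the ∆C test and the bridge (edge) removal phase are omitted; components no larger than `theta_size` are returned as final modules; and a graph that never splits is returned whole.

```python
def local_cc(adj, v):
    """Local clustering coefficient of v."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
    return 2.0 * links / (k * (k - 1))

def standin_score(adj, v):
    """Hypothetical placeholder for BR(v); NOT the score used on the slides."""
    return 1.0 - local_cc(adj, v)

def remove_node(adj, v):
    """Return a copy of the graph without node v."""
    return {u: nbrs - {v} for u, nbrs in adj.items() if u != v}

def components(adj):
    """Connected components as a list of node sets."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def modularize(adj, score=standin_score, theta_size=3):
    """Remove top-scoring nodes until the graph splits, then recurse on the parts."""
    if len(adj) <= theta_size:
        return [set(adj)]
    g = adj
    while len(g) > theta_size:
        v = max(g, key=lambda u: score(g, u))
        g = remove_node(g, v)
        parts = components(g)
        if len(parts) > 1:
            modules = []
            for p in parts:
                sub = {u: g[u] & p for u in g if u in p}
                modules.extend(modularize(sub, score, theta_size))
            return modules
    # Assumption: a graph that never splits is kept as a single module.
    return [set(adj)]

# Toy graph (illustrative): two triangles joined through "d".
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d", "f", "g"}, "f": {"e", "g"}, "g": {"e", "f"},
}
modules = modularize(adj)
```

On the toy graph, removing "d" splits the graph into the two triangles, which are returned as modules; the removed node itself is not assigned to any module in this sketch.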
5
Topological Analysis
Data: Core protein interaction data of Saccharomyces cerevisiae from DIP [10,2]
Method: Iteratively remove the node v with the highest BR(v) and compute C(G).
Results:
- I & II (~30%): Bridging and interconnecting node removal zone.
- III (~30%): Core node removal zone.
- IV (~40%): Peripheral node removal zone.
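The removal procedure in the Method line can be sketched as a loop that records the average clustering coefficient C(G) after each removal. This is a minimal sketch under stated assumptions: the score function is supplied by the caller because BR(v) is not preserved in this transcript, `one_minus_cc` is a hypothetical placeholder for it, and scores are recomputed after every removal (the slide does not say whether the ranking is fixed or updated).

```python
def local_cc(adj, v):
    """Local clustering coefficient of v."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
    return 2.0 * links / (k * (k - 1))

def avg_cc(adj):
    """Average clustering coefficient C(G); 0.0 for an empty graph."""
    if not adj:
        return 0.0
    return sum(local_cc(adj, v) for v in adj) / len(adj)

def one_minus_cc(adj, v):
    """Hypothetical placeholder for BR(v); NOT the score used on the slides."""
    return 1.0 - local_cc(adj, v)

def clustering_curve(adj, score, steps):
    """Return [C(G) before removal, C(G) after removal 1, ...]."""
    g = {v: set(nbrs) for v, nbrs in adj.items()}
    curve = [avg_cc(g)]
    for _ in range(steps):
        if not g:
            break
        # Remove the currently top-scoring node, then re-measure C(G).
        v = max(g, key=lambda u: score(g, u))
        g = {u: nbrs - {v} for u, nbrs in g.items() if u != v}
        curve.append(avg_cc(g))
    return curve

# Toy graph (illustrative, not the DIP data): two triangles joined through "d".
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d", "f", "g"}, "f": {"e", "g"}, "g": {"e", "f"},
}
curve = clustering_curve(adj, one_minus_cc, steps=2)
```

Plotting such a curve against the fraction of removed nodes is one way to look for removal zones like those listed under Results; the zone boundaries themselves come from the authors' analysis, not from this sketch.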
6
Biological Analysis
Data: Core protein interaction data of Saccharomyces cerevisiae from DIP [10,2]
Method: Iteratively remove a set S of nodes v with the highest BR(v) and compute the proportion of lethal proteins in S.
Results:
- Bridging and interconnecting nodes are less lethal than core nodes.
- Bridging and interconnecting nodes are more lethal than peripheral nodes.
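The lethality computation in the Method line can be sketched as a batch-wise proportion over a ranked node list. This is a minimal sketch with assumptions: the ranking is computed once and passed in (the slide does not say whether BR(v) is recomputed between batches), and the identifiers `p1`..`p6` and the lethal set are made-up placeholders, not real proteins or results.

```python
def lethality_by_batch(ranked, lethal, batch_size):
    """Proportion of lethal proteins in each successive batch S of the ranking."""
    proportions = []
    for start in range(0, len(ranked), batch_size):
        batch = ranked[start:start + batch_size]
        hits = sum(1 for protein in batch if protein in lethal)
        proportions.append(hits / len(batch))
    return proportions

# Made-up placeholder data, ordered from highest to lowest score.
ranked = ["p1", "p2", "p3", "p4", "p5", "p6"]
lethal = {"p3", "p4"}
proportions = lethality_by_batch(ranked, lethal, batch_size=2)
```

Each entry of `proportions` corresponds to one removed set S, so the list can be read in the same order as the removal zones of the topological analysis.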
7
Modularization Results [6,9]
8
References
1. Barabasi, A.-L. and Oltvai, Z. N., Nature Reviews Genetics (2004)
2. Deane, C. M., Salwinski, L., Xenarios, I. and Eisenberg, D., Molecular and Cellular Proteomics (2002)
3. Hartwell, L. H., Hopfield, J. J., Leibler, S. and Murray, A. W., Nature (1999)
4. Jeong, H., Mason, S. P., Barabasi, A.-L. and Oltvai, Z. N., Nature (2001)
5. Maslov, S. and Sneppen, K., Science (2002)
6. Mewes, H. W., et al., Nucleic Acids Research (2006)
7. Newman, M. E. J., Physical Review E (2001)
8. Ravasz, E., Somera, A. L., Mongru, D. A., Oltvai, Z. N. and Barabasi, A.-L., Science (2002)
9. Ruepp, A., et al., Nucleic Acids Research (2004)
10. Salwinski, L., et al., Nucleic Acids Research (2004)
11. Watts, D. J. and Strogatz, S. H., Nature (1998)