Gene and Protein Networks II Monday, April 16 2006 CSCI 4830: Algorithms for Molecular Biology Debra Goldberg.


Outline
1. Recap
2. Confidence assessment, edge prediction (cont’d)
3. Predicting protein function
4. Predicting protein complexes/functional groups
5. Network integration
6. Caveats, cautions, practical issues

Summary of network models
Random: not grown, low clustering, short distances, Poisson degree distribution
Regular (lattice): high clustering, long distances
Small world: high clustering, short distances
Scale-free: power law degree distribution
Hierarchical: high clustering, modular, power law degree distribution

There is information in a gene’s position in the network
We can use this to predict:
Relationships
–Interactions
–Regulatory relationships
Protein function
–Process
–Complex / “molecular machine”

Confidence assessment
Can use topology to assess confidence if true edges and false edges have different network properties
Assess how well each edge fits the topology of the true network
Can also predict unknown relations

Prediction
Predict a v-w edge if it would have a high clustering coefficient
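
As a rough illustration of scoring a candidate v-w edge by how clustered it would be, the sketch below uses a Jaccard-style mutual clustering coefficient (shared neighbors divided by all neighbors of either node). This particular definition, the helper names, and the toy edge list are assumptions made for the example, not taken from the slides.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def mutual_clustering(adj, v, w):
    """Jaccard-style score: shared neighbors / all neighbors of v or w."""
    nv = adj.get(v, set()) - {w}
    nw = adj.get(w, set()) - {v}
    union = nv | nw
    if not union:
        return 0.0
    return len(nv & nw) / len(union)

# Toy network (hypothetical): v and w are not linked but share a, b, c.
edges = [("v", "a"), ("v", "b"), ("v", "c"),
         ("w", "a"), ("w", "b"), ("w", "c"),
         ("x", "a")]
adj = build_adjacency(edges)
score_vw = mutual_clustering(adj, "v", "w")  # all neighbors are shared
score_vx = mutual_clustering(adj, "v", "x")  # only a is shared
```

Candidate pairs can then be ranked by score, with the highest-scoring unlinked pairs treated as predicted edges.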

Interaction generality
Confidence measure for an edge based on the topology around its neighbors
Saito, Suzuki, and Hayashizaki 2002, 2003

Confidence assessment
Integrate experimental details with local topology
–Degree
–Clustering coefficient
–Degree of neighbors
–Etc.
Used logistic regression
Bader, et al., Nature Biotechnology 2003

The synthetic lethal network has many triangles
Xiaofeng Xin, Boone Lab

2-hop predictors for SSL
SSL – SSL (S-S)
Homology – SSL (H-S)
Co-expressed – SSL (X-S)
Physical interaction – SSL (P-S)
2 physical interactions (P-P)
Edge types on the 2-hop path between v and w:
S: Synthetic sickness or lethality (SSL)
H: Sequence homology
X: Correlated expression
P: Stable physical interaction
Wong, et al., PNAS 2004
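
To make the idea of a 2-hop predictor concrete, the sketch below enumerates pairs (v, w) joined through an intermediate node by one edge of a first type and one edge of a second type, for example P-S. The function name and the toy edge lists are hypothetical; this only finds candidate pairs and does not reproduce the scoring used in the cited paper.

```python
from collections import defaultdict

def two_hop_pairs(first_edges, second_edges):
    """Pairs {v, w} with a path v -(first type)- u -(second type)- w."""
    first = defaultdict(set)
    second = defaultdict(set)
    for a, b in first_edges:
        first[a].add(b)
        first[b].add(a)
    for a, b in second_edges:
        second[a].add(b)
        second[b].add(a)
    pairs = set()
    for u in list(first):
        if u not in second:
            continue  # u cannot be the middle node of a mixed path
        for v in first[u]:
            for w in second[u]:
                if v != w:
                    pairs.add(frozenset((v, w)))
    return pairs

# Toy data (hypothetical): one physical interaction, two SSL edges.
physical = [("A", "B")]
ssl = [("B", "C"), ("C", "D")]
ps_candidates = two_hop_pairs(physical, ssl)  # P-S paths
ss_candidates = two_hop_pairs(ssl, ssl)       # S-S paths
```

Here A-B (P) followed by B-C (S) proposes the pair A, C, and B-C (S) followed by C-D (S) proposes the pair B, D.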

Multi-color motifs
S: Synthetic sickness or lethality
H: Sequence homology
X: Correlated expression
P: Stable physical interaction
R: Transcriptional regulation
Zhang, et al., Journal of Biology 2005

Computationally predicting protein function
Homology
Machine Learning
Graph-theoretic methods

Majority method
Consider immediate neighbors
“Guilt by association”
–Schwikowski, et al., Nature Biotechnology 2001
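
A minimal sketch of the majority ("guilt by association") idea: predict for an unannotated protein the function that occurs most often among its immediate neighbors. The data layout and names are assumptions made for the example, and the sketch reports only the single most frequent function.

```python
from collections import Counter

def majority_function(adj, annotations, protein):
    """Return the function seen most often among annotated neighbors."""
    counts = Counter()
    for neighbor in adj.get(protein, ()):
        counts.update(annotations.get(neighbor, ()))
    if not counts:
        return None  # no annotated neighbors, so no prediction
    return counts.most_common(1)[0][0]

# Toy data (hypothetical): protein "u" is unannotated.
adj = {"u": {"p1", "p2", "p3"}, "p1": {"u"}, "p2": {"u"}, "p3": {"u"}}
annotations = {"p1": {"transport"}, "p2": {"transport"}, "p3": {"splicing"}}
predicted = majority_function(adj, annotations, "u")
```

Ties are broken arbitrarily in this sketch; a real implementation would need an explicit rule.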

Neighborhood method
How does frequency affect assignment?
Consider a given radius
–Hishigaki, et al., Yeast 2001
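
One way to account for function frequency within a given radius is a chi-square style score, (observed - expected)^2 / expected, where the expected count comes from how common the function is among all annotated proteins. The sketch below assumes that form of the statistic; the helper names and toy data are hypothetical.

```python
from collections import deque

def neighborhood(adj, start, radius):
    """All nodes within `radius` hops of start (excluding start), via BFS."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == radius:
            continue  # do not expand beyond the radius
        for neighbor in adj.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    seen.discard(start)
    return seen

def chi_square_score(adj, annotations, protein, function, radius):
    """(observed - expected)^2 / expected for one function in the neighborhood."""
    background = sum(function in fs for fs in annotations.values()) / len(annotations)
    annotated = [p for p in neighborhood(adj, protein, radius) if p in annotations]
    expected = len(annotated) * background
    if expected == 0:
        return 0.0
    observed = sum(function in annotations[p] for p in annotated)
    return (observed - expected) ** 2 / expected

# Toy data (hypothetical): "u" is unannotated; e and f are annotated but isolated.
adj = {"u": {"a", "b"}, "a": {"u", "c"}, "b": {"u", "d"}, "c": {"a"}, "d": {"b"}}
annotations = {"a": {"kinase"}, "b": {"kinase"}, "c": {"kinase"},
               "d": {"other"}, "e": {"other"}, "f": {"other"}}
score_r1 = chi_square_score(adj, annotations, "u", "kinase", radius=1)
score_r2 = chi_square_score(adj, annotations, "u", "kinase", radius=2)
```

Note that this score is also large when a function is strongly under-represented, so an implementation would normally check that observed exceeds expected before predicting.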

Minimum Cut methods
Minimize interactions between proteins with different annotations
–Vazquez, et al., Nature Biotech. 2003
–Karaoz, et al., PNAS 2004

Functional flow
Use network flow algorithm to “transport” function annotation
–Nabieva, et al., Bioinformatics 2005

A Markov Random Field method
Function prediction based on
–Frequency of each function
–# neighbors
–# of these neighbors with function in question
Functional linkage graph
Iterate twice
–Letovsky and Kasif, Bioinformatics 2003

Community structure
Proteins in a community may be involved in a common process or function
Communities are dense subgraphs with sparse interconnections

Hierarchical clustering (1)
Using natural edge weights
Gene co-expression
e.g., Eisen MB, et al., PNAS 1998
[Figure from www.medscape.com]

Hierarchical clustering (2)
Adjacency vector
Function cluster: Tong et al., Science 2004
Find drug targets: Parsons et al., Nature Biotechnology 2004

Topological overlap
A measure of neighborhood similarity
l_{i,j} is 1 if there is a direct link between i and j, 0 otherwise
Ravasz, et al., Science 2002
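
The overlap formula itself did not survive in this transcript. The sketch below assumes one commonly cited form: the number of neighbors shared by i and j, plus 1 if i and j are directly linked, divided by the smaller of the two degrees. Treat that form, the function name, and the toy graph as assumptions.

```python
def topological_overlap(adj, i, j):
    """(shared neighbors + l_ij) / min(k_i, k_j), assumed form of the overlap."""
    ni = adj.get(i, set())
    nj = adj.get(j, set())
    l_ij = 1 if j in ni else 0      # direct link indicator
    shared = len(ni & nj)           # neighbors common to i and j
    k_min = min(len(ni), len(nj))   # smaller of the two degrees
    if k_min == 0:
        return 0.0
    return (shared + l_ij) / k_min

# Toy graph (hypothetical): triangle a-b-c with a tail c-d-e.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
       "d": {"c", "e"}, "e": {"d"}}
overlap_ab = topological_overlap(adj, "a", "b")  # linked, share c
overlap_ad = topological_overlap(adj, "a", "d")  # not linked, share c
overlap_ae = topological_overlap(adj, "a", "e")  # nothing in common
```

The resulting pairwise matrix can be used as a similarity measure for hierarchical clustering.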

Spectral clustering
Compute adjacency matrix eigenvectors
Each eigenvector defines a cluster:
–Proteins with high magnitude contributions
Bu, et al., Nucleic Acids Research 2003
[Figure panels: positive eigenvalue, negative eigenvalue]

Dense subgraphs
Spirin and Mirny, PNAS 2003
–Find fully connected subgraphs (cliques), OR
–Find subgraphs that maximize density: 2m / (n(n-1))
Bader and Hogue, BMC Bioinformatics 2003
–Weight vertices by neighborhood density, connectedness
–Find connected communities with high weights
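
The density 2m / (n(n-1)) compares the m edges inside a subgraph of n nodes with the n(n-1)/2 edges a clique on those nodes would have. A small sketch follows; the function name and toy edge list are hypothetical.

```python
def density(nodes, edges):
    """Density 2m / (n(n-1)) of the subgraph induced by `nodes`."""
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    # Count each undirected edge once, keeping only edges inside the subgraph.
    inside = {frozenset((a, b)) for a, b in edges
              if a in nodes and b in nodes and a != b}
    m = len(inside)
    return 2 * m / (n * (n - 1))

# Toy graph (hypothetical): triangle a-b-c plus a pendant edge c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
clique_density = density({"a", "b", "c"}, edges)       # fully connected
looser_density = density({"a", "b", "c", "d"}, edges)  # 4 of 6 possible edges
```

A clique has density 1, so a search for dense subgraphs looks for node sets whose density stays close to 1 as they grow.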

“Betweenness” centrality
Consider the shortest path(s) between all pairs of nodes
“Betweenness” centrality of an edge is a measure of how many shortest paths traverse this edge
Edges between communities have higher centrality
Girvan, et al., PNAS 2002
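
Edge betweenness can be computed with one breadth-first search per source node followed by a backward accumulation pass (a Brandes-style approach, chosen here for the sketch rather than taken from the slides). When a pair has several shortest paths, each path contributes a fractional share. The toy graph of two triangles joined by a bridge is hypothetical.

```python
from collections import deque

def edge_betweenness(adj):
    """Shortest-path betweenness of every edge in an unweighted, undirected graph."""
    bet = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist, sigma, preds, order = {s: 0}, {s: 1.0}, {s: []}, []
        queue = deque([s])
        while queue:  # BFS counting shortest paths from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    sigma[w] = 0.0
                    preds[w] = []
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in order}
        for w in reversed(order):  # push path counts back toward s
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                bet[frozenset((v, w))] += share
                delta[v] += share
    # Every unordered pair was counted from both of its endpoints.
    return {edge: value / 2.0 for edge, value in bet.items()}

# Toy graph (hypothetical): triangles a-b-c and d-e-f joined by the bridge c-d.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
       "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"}}
betweenness = edge_betweenness(adj)
bridge = max(betweenness, key=betweenness.get)
```

Repeatedly removing the highest-betweenness edge and recomputing is the basis of the community-finding procedure cited on the slide; here the bridge c-d carries all nine paths between the two triangles.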

Finding motifs

Motif function and aggregation

Relationships between network data types
Distinct data sources generally lead to better inferences.
Associations not independent
Errors independent

Various methods with varying goals

Incorporating experimental conditions
Luscombe, et al., Nature 2004

Party and date hubs
Protein interaction network
Partition hubs by expression correlation of neighbors
Han, et al., Nature 2004
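
A minimal sketch of partitioning hubs by expression correlation: compute the average Pearson correlation between a hub's expression profile and those of its interaction partners, then compare it with a cutoff. The cutoff value, the function names, and the toy expression profiles are all hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_neighbor_correlation(hub, neighbors, expression):
    """Mean correlation between a hub and each of its partners."""
    values = [pearson(expression[hub], expression[n]) for n in neighbors]
    return sum(values) / len(values)

def classify_hub(avg_correlation, cutoff=0.5):
    """Label a hub; the cutoff is an illustrative assumption."""
    return "party" if avg_correlation > cutoff else "date"

# Toy expression profiles (hypothetical).
expression = {
    "hub1": [1, 2, 3, 4], "n1": [2, 4, 6, 8], "n2": [1, 2, 3, 5],
    "hub2": [1, 2, 3, 4], "n3": [4, 3, 2, 1], "n4": [1, 2, 3, 4],
}
avg_hub1 = average_neighbor_correlation("hub1", ["n1", "n2"], expression)
avg_hub2 = average_neighbor_correlation("hub2", ["n3", "n4"], expression)
labels = (classify_hub(avg_hub1), classify_hub(avg_hub2))
```

In this toy data hub1 is expressed together with both partners, while hub2's partners are expressed at different times, so their correlations cancel out.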

Network connectivity
Scale-free networks are:
–Robust to random failures
–Vulnerable to attacks on hubs
Removing hubs quickly disconnects a network and reduces the size of the largest component
Albert, et al., Nature 2000
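
The effect of removing a hub versus a low-degree node can be checked by measuring the largest connected component before and after removal. The sketch below does this on a small hypothetical star-like graph; the function name and data are assumptions for illustration.

```python
def largest_component_size(adj, removed=frozenset()):
    """Size of the largest connected component after deleting `removed` nodes."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            size += 1
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        best = max(best, size)
    return best

# Toy graph (hypothetical): hub h linked to a..e, plus one extra edge a-f.
adj = {"h": {"a", "b", "c", "d", "e"}, "a": {"h", "f"}, "b": {"h"},
       "c": {"h"}, "d": {"h"}, "e": {"h"}, "f": {"a"}}
hub = max(adj, key=lambda node: len(adj[node]))
size_intact = largest_component_size(adj)
size_without_leaf = largest_component_size(adj, removed={"e"})
size_without_hub = largest_component_size(adj, removed={hub})
```

Removing a leaf barely changes the largest component, while removing the hub leaves only small fragments.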

Removing date hubs shatters network into communities
Removing date hubs: many sub-networks
Removing party hubs: a single main component

Multiple species

Network alignment
Across or within species
Interaction network and genome sequence
e.g., Ogata, et al., Nucleic Acids Research 2000

Bias: Protein abundance
Abundant proteins are
–More likely to be represented in some types of experiments
–More likely to be essential
Correlation between degree (hubs) and essentiality disappears or is reduced when corrected for protein abundance
Bloom and Adami, BMC Evolutionary Biology 2003

Bias: Degree correlation
Anti-correlation of degrees of interacting proteins disappears in un-biased data
Coulomb, et al., Proceedings of the Royal Society B 2005
[Figure: average degree K1 versus degree k, for essential and non-essential proteins]

Data quality and sparseness

No gold standard
Insufficient highly-accurate data
Gold-standards often used to train and validate
Insufficient standardization of procedures

Significance

Final words
Network analysis has become an essential tool for analyzing complex systems
–There is still much biologists can learn from scientists in other disciplines
–Network analysis is itself a new and evolving field
