Social behavior of proteins? Rui Alves
Organization of the talk: (1) Social behavior of the protein?!?!?!? (2) Using meta-text analysis (3) Using phylogenetic profiling (4) Using pathway homology (5) Using protein docking (6) Using microarray data (7) Using protein interaction data
Proteins do not work alone!
Networks of “interactions” predict global function. Knowing the network of proteins/genes in which your protein/gene is embedded provides predictive information, for example: which cellular pathways or processes is your protein/gene likely to be involved in?
Publication databases are a source of information
Meta-text databases create social models of proteins from the analysis of publications
iHOP is a sophisticated context-analysis engine
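How does meta-text analysis create networks? A server/program runs scripts over three databases: a literature database (entries), a gene names database (gene list, e.g. Ste20) and a language rules database (rule list, e.g. activate, inhibit, rescue). Given your genes, it returns the list of entries mentioning your gene and uses the language rules to relate it to other genes.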
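The pipeline above can be approximated by a simple co-mention search. This is a minimal sketch, assuming the three databases have already been reduced to in-memory sets (`GENES`, `RULES`) and a list of abstract strings, all hypothetical. It links two genes when they appear in the same sentence as a rule word; real systems use far richer language processing, so treat it only as an illustration of the idea.

```python
import itertools
import re

# Hypothetical stand-ins for the gene names and language rules databases.
GENES = {"Ste20", "Ste5", "Ste11"}
RULES = {"activate", "activates", "inhibit", "inhibits", "rescue", "rescues"}

def extract_edges(abstracts):
    """Return {(gene_a, gene_b): rule_words} for genes co-mentioned with a rule word."""
    edges = {}
    for text in abstracts:
        # Naive sentence split: break after ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            words = set(re.findall(r"[A-Za-z0-9]+", sentence))
            genes = sorted(GENES & words)
            verbs = {w.lower() for w in words} & RULES
            if len(genes) < 2 or not verbs:
                continue  # need at least two genes and one rule word in the sentence
            for a, b in itertools.combinations(genes, 2):
                edges.setdefault((a, b), set()).update(verbs)
    return edges
```

For example, the made-up abstract "Ste20 activates Ste11 in the pheromone pathway. Ste5 is a scaffold." yields a single edge between Ste11 and Ste20 labelled "activates"; the second sentence mentions only one gene and is ignored.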
Organization of the talk: (1) Social behavior of the protein?!?!?!? (2) Meta-text analysis (3) Evolution-based protein interaction prediction (4) Using pathway homology (5) Using protein docking (6) Using microarray data (7) Using protein interaction data
Proteins that have coevolved share a function. If protein A has coevolved with protein B, they are likely to be involved in the same process, so looking for proteins that coevolved helps predict the social networks of proteins. There are many methods to look for coevolution of proteins: phylogenetic profiling, gene neighborhoods, gene fusion events, phylogenetic trees…
Creating phylogenetic profiles. Take the sequence of each protein in the target genome (from a database of proteins in one genome) and run a homology search against each genome in a database of proteins in fully sequenced genomes. For each protein, record whether it has a homologue in genome 1, genome 2, … (e.g. Protein A: Y N …). The result is a database of profiles for each protein in each organism.
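Profile building can be sketched as follows, assuming the homology searches have already been summarized as a mapping from (protein, genome) to the best E-value found. The function name and the 1e-5 E-value cut-off are illustrative assumptions, not part of the original method.

```python
def build_profiles(proteins, genomes, best_evalues, cutoff=1e-5):
    """Presence/absence profile (a string of Y/N, one letter per genome) per protein.

    best_evalues maps (protein, genome) -> best E-value of that protein's
    homology search in that genome; a missing key means no hit at all.
    """
    profiles = {}
    for protein in proteins:
        calls = []
        for genome in genomes:
            evalue = best_evalues.get((protein, genome))
            has_homologue = evalue is not None and evalue <= cutoff
            calls.append("Y" if has_homologue else "N")
        profiles[protein] = "".join(calls)
    return profiles
```

A hit that exists but is weaker than the cut-off (here an E-value of 1e-2) is recorded as "N", the same as no hit.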
Using phylogenetic profiles to predict protein interactions. Submit your sequence (protein A) to a server/program that holds a database of proteins in fully sequenced genomes and a database of profiles for each protein in each organism. (1) Retrieve the profile of A (homologue in genome 1? homologue in genome 2? …) and of every other protein, e.g. B and C. (2) Calculate a coincidence index for each pair of proteins: the number of genomes in which their profiles agree divided by the number of genomes (in the example, A with A: 1, A with C: 0.9, A with B: 0.11). Proteins (A and C) that are present and absent in the same set of genomes are likely to be involved in the same process and therefore interact. Similarly, if protein A is absent in all genomes in which protein B is present, there is a likelihood that they perform the same function!
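A minimal sketch of the coincidence index, assuming (one plausible reading of the slide) that it is the fraction of genomes in which two presence/absence profiles agree; `coincidence_index` and `rank_partners` are hypothetical names. An index near 1 flags proteins that are present and absent together (like A and C), and an index near 0 flags complementary profiles (like A and B).

```python
def coincidence_index(p, q):
    """Fraction of genomes in which two Y/N profiles agree."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same genomes")
    return sum(a == b for a, b in zip(p, q)) / len(p)

def rank_partners(query, profiles):
    """Score every other protein against `query`, highest index first."""
    scores = {name: coincidence_index(profiles[query], prof)
              for name, prof in profiles.items() if name != query}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

With ten genomes, a profile that differs from A's in a single genome scores 0.9, and one that is A's exact opposite scores 0.0.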
How to do it? Download genomes; use BLAST for the homology searches; use Perl for homology processing and coincidence index calculations.
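For the homology-processing step, here is a sketch of reading BLAST tabular output (`-outfmt 6`), assuming the default 12 columns, in which the E-value is the 11th. It is written in Python rather than the Perl mentioned on the slide; the function name is hypothetical, and the caller must say which genome the searched database represents. It produces a (protein, genome) to best E-value table of the kind a profile builder needs.

```python
import csv
import io

EVALUE_COLUMN = 10  # 0-based index of the E-value in default -outfmt 6 output

def best_evalues_from_blast(tabular_text, genome):
    """Best (lowest) E-value per query protein from one BLAST tabular report.

    `genome` names the genome whose protein database was searched.
    Returns {(query_id, genome): best_evalue}.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines (e.g. from -outfmt 7)
        key = (row[0], genome)
        evalue = float(row[EVALUE_COLUMN])
        if key not in best or evalue < best[key]:
            best[key] = evalue
    return best
```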
Synteny/conservation of gene neighborhoods. Compare the order of the genes for proteins A, B, C and D in genome 1, genome 2, genome 3, …: which of these proteins “interact”? Proteins A and B are in a conserved relative position in most genomes, which indicates that they are likely to interact.
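Gene-neighborhood conservation can be scored, as a toy sketch, by counting in how many genomes two genes sit close to each other. This assumes each genome is reduced to an ordered list of gene names and ignores strand, circular chromosomes and operon structure; `adjacency_conservation` and `max_gap` are hypothetical names.

```python
def adjacency_conservation(genomes, a, b, max_gap=1):
    """Fraction of genomes containing both a and b in which they lie close together.

    Each genome is a list of gene names in chromosomal order; `max_gap` is the
    largest allowed difference in position (1 = directly adjacent).
    """
    both = close = 0
    for order in genomes:
        if a in order and b in order:
            both += 1
            if abs(order.index(a) - order.index(b)) <= max_gap:
                close += 1
    return close / both if both else 0.0
```

In the test data, A and B stay adjacent in three of four genomes (0.75), while C and D are adjacent in only one (0.25).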
How to do it? Download genomes; use Perl for the analysis.
Gene fusion events. Compare how proteins A, B, C and D are encoded in genome 1, genome 2, genome 3, …: in some genomes two of them are encoded by a single fused gene. Which of these proteins interact? Proteins A and B have undergone gene fusion events in at least some genomes, which indicates that they are likely to interact.
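A toy sketch of detecting fusion events, assuming genes have already been annotated with the protein families they encode (a fused gene is a tuple such as `("A", "B")`); in practice that annotation itself comes from sequence similarity searches. The function name and data layout are hypothetical.

```python
from itertools import combinations

def fusion_pairs(genomes):
    """Protein pairs fused into one gene in some genome but encoded separately in another."""
    fused, separate = set(), set()
    for genes in genomes:
        # Pairs of families that share a single (fused) gene in this genome.
        for gene in genes:
            fused.update(combinations(sorted(gene), 2))
        # Pairs of families that each have their own stand-alone gene here.
        singles = sorted({gene[0] for gene in genes if len(gene) == 1})
        separate.update(combinations(singles, 2))
    return fused & separate
```

A pair that is fused in every genome where it occurs is not reported, because there is no genome in which the two are separate to compare against.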
How to do it? Download genomes; use Perl for the analysis.
Building phylogenetic trees of proteins. For each protein (A, B, C, D) found in genome 1, genome 2, genome 3, …, get the sequences of all homologues, align them and build a phylogenetic tree. Phylogenetic trees represent the evolutionary history of homologous genes/proteins based on their sequences.
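Distance-based phylogenetic trees. Align the homologues (A1: ACTDEEGGGGSRGHI…, A2: A-TEEDGGAASRGHI…, A3: ACFDDEGGGGSRGHL…, …) and count the substitutions between each pair: A1 vs A2, 5 substitutions; A1 vs A3, 3 substitutions; A2 vs A3, 8 substitutions. In the resulting tree A1 and A3 (3 substitutions apart) group together, with A2 (5 substitutions from A1) outside them.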
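The substitution counts above can be reproduced from the visible parts of the aligned sequences, counting a gap as a difference. Picking the pair with the smallest distance is the first joining step of a simple clustering method such as UPGMA; the sketch stops at that step, and the helper names are hypothetical.

```python
from itertools import combinations

def count_substitutions(s, t):
    """Aligned positions where two sequences differ (a gap counts as a difference)."""
    if len(s) != len(t):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(s, t))

def distance_matrix(seqs):
    """Pairwise substitution counts for every pair of named sequences."""
    return {(x, y): count_substitutions(seqs[x], seqs[y])
            for x, y in combinations(sorted(seqs), 2)}

def closest_pair(dist):
    """The pair with the fewest substitutions, i.e. the first pair to join."""
    return min(dist, key=dist.get)

aligned = {
    "A1": "ACTDEEGGGGSRGHI",
    "A2": "A-TEEDGGAASRGHI",
    "A3": "ACFDDEGGGGSRGHL",
}
dist = distance_matrix(aligned)
print(dist)                # {('A1', 'A2'): 5, ('A1', 'A3'): 3, ('A2', 'A3'): 8}
print(closest_pair(dist))  # ('A1', 'A3')
```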
Maximum likelihood phylogenetic trees. Start from the alignment (ACTDEEGGGGSRGHI…, A-TEEDGGAASRGHI…, ACFDDEGGGGSRGHL…, …) and a table giving the probability of each amino acid substitution (rows and columns A, -, E, D, …).
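Maximum likelihood phylogenetic trees (continued). From the alignment, count the substitutions between each pair (A1 vs A2: 5, A1 vs A3: 3, A2 vs A3: 8) and use the substitution probabilities to compute the probability of each pair, p(1,2), p(1,3) and p(2,3). Here p(2,3) < p(1,2) < p(1,3), so the most likely tree groups A1 with A3 and places A2 outside.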
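A toy illustration of the pairwise probabilities p(i,j), assuming a one-parameter substitution table: each site stays unchanged with probability 1 - mu, or changes to one particular other symbol with probability mu/20 (21 symbols, counting the gap). The value `mu = 0.1` is arbitrary. This only reproduces the ordering on the slide; real maximum likelihood programs evaluate whole trees, optimizing topology and branch lengths.

```python
import math
from itertools import combinations

ALPHABET_SIZE = 21  # 20 amino acids plus the gap symbol (toy assumption)

def pair_log_prob(s, t, mu=0.1):
    """Log-probability of aligned sequence t given s under a toy substitution table."""
    stay = math.log(1 - mu)                      # residue unchanged
    change = math.log(mu / (ALPHABET_SIZE - 1))  # change to one specific other symbol
    return sum(stay if a == b else change for a, b in zip(s, t))

aligned = {
    "A1": "ACTDEEGGGGSRGHI",
    "A2": "A-TEEDGGAASRGHI",
    "A3": "ACFDDEGGGGSRGHL",
}
log_p = {(x, y): pair_log_prob(aligned[x], aligned[y])
         for x, y in combinations(sorted(aligned), 2)}
ranking = sorted(log_p, key=log_p.get)  # least likely pair first
print(ranking)  # [('A2', 'A3'), ('A1', 'A2'), ('A1', 'A3')]
```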
Similarity of phylogenetic trees indicates “interaction” between proteins. Build one tree for the homologues of each protein (A1, A2, A3, …; B1, B2, B3, …; C1, C2, C3, …; D1, D2, D3, …). Proteins A and B have similar evolutionary trees and thus are likely to “interact”.
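Tree similarity is often quantified by correlating the two proteins' distance matrices over the same set of organisms, which is the idea behind mirrortree-style methods. A minimal sketch, assuming the inter-organism distances have already been computed for each protein (for example, from substitution counts); the names are hypothetical.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length number lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def tree_similarity(dist_a, dist_b, organisms):
    """Correlate two proteins' inter-organism distance matrices.

    dist_x[(o1, o2)] is the distance between the homologues of protein x in
    organisms o1 and o2, keyed with o1 < o2.
    """
    pairs = list(combinations(sorted(organisms), 2))
    return pearson([dist_a[p] for p in pairs], [dist_b[p] for p in pairs])
```

A value close to 1 means the two trees have proportional branch patterns; unrelated proteins give values near zero or below.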
How to do it? Download genomes; use BLAST, … for the analysis; use Clustal, PHYLIP, PAUP, … for tree building.
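Pathway homology. Submit your sequence to a server/program that combines a database of protein sequences in genomes, a database of pathways in genomes and a database of interactions in genomes. The program finds the homologue(s) of your sequence and outputs the pathways and interactions in which they take part.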
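The lookup step of pathway homology can be sketched as annotation transfer. This minimal sketch assumes the homologue search has already been run and that the pathway and interaction databases are available as in-memory Python structures; all IDs and names are hypothetical.

```python
def transfer_annotations(query_homologues, pathways, interactions):
    """Collect pathways and interaction partners of a query's homologues.

    query_homologues: IDs of the homologue(s) found for your sequence.
    pathways: protein ID -> set of pathway names (pathway database).
    interactions: set of frozenset({id1, id2}) pairs (interaction database).
    """
    predicted_pathways = set()
    predicted_partners = set()
    for h in query_homologues:
        predicted_pathways |= pathways.get(h, set())
        for pair in interactions:
            if h in pair:
                predicted_partners |= pair - {h}  # the other member of the pair
    return predicted_pathways, predicted_partners
```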
Pathway homology complements protein homology
What is protein docking? Protein A and protein B are placed against each other in many relative positions and each docking pose is scored (figure: alternative poses of proteins A and B; poses that share the same area of interaction; positive and negative scores; the best docking).
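To make the idea of scoring poses concrete, here is a deliberately tiny sketch on a 2D grid: the receptor and ligand are sets of cells, only translations are tried (no rotations), and the score rewards side-by-side contacts while heavily penalizing overlaps. The scoring function and penalty value are assumptions for illustration; real docking programs such as GRAMM or HEX also search rotations and use much faster search techniques.

```python
NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))

def dock_score(receptor, ligand, shift, clash_penalty=10):
    """Toy score: +1 per ligand-receptor side contact, -clash_penalty per overlapping cell."""
    dx, dy = shift
    moved = {(x + dx, y + dy) for x, y in ligand}
    clashes = len(moved & receptor)
    contacts = sum((x + nx, y + ny) in receptor
                   for x, y in moved for nx, ny in NEIGHBOURS)
    return contacts - clash_penalty * clashes

def best_docking(receptor, ligand, search=range(-5, 6)):
    """Try every translation in the search box and keep the best-scoring pose."""
    poses = [(dx, dy) for dx in search for dy in search]
    best = max(poses, key=lambda s: dock_score(receptor, ligand, s))
    return best, dock_score(receptor, ligand, best)
```

For a 3x3 receptor with a notch at (1, 2), a one-cell ligand docks into the notch, where it touches three receptor cells; placing it inside the receptor scores -7 because of the clash penalty.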
Caveats of using protein docking to predict interaction. Proteins may not come into contact in the cell even though they could interact if they did (e.g. proteins A, B and C working in separate processes such as glycolysis and DNA synthesis). Docking is also very computationally heavy.
When should we use protein docking to predict network structure? When we have a group of proteins that are known to be involved in the same function and we want to predict how the different proteins interact with each other.
How to do it? Download structures or create structure predictions; use GRAMM, HEX, …
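Predicting protein interactions using microarray data. Expose a population of cells to a stimulus, prepare cDNA from the stimulated and unstimulated populations, and compare the cDNA levels of corresponding genes in the different populations. This separates genes overexpressed as a result of the stimulus, genes underexpressed as a result of the stimulus, and genes whose expression is independent of the stimulus. The over- and underexpressed genes define a group of proteins involved in the response to the stimulus.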
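The classification step can be sketched with log2 fold changes between stimulated and control signals for the same gene. The two-fold cut-off (`threshold=1.0`) and the function name are illustrative assumptions; real microarray analyses also normalize the data and test replicates for significance.

```python
import math

def classify_genes(control, stimulated, threshold=1.0):
    """Split genes by log2(stimulated / control); threshold=1.0 means two-fold."""
    up, down, unchanged = [], [], []
    for gene in sorted(control):
        lfc = math.log2(stimulated[gene] / control[gene])
        if lfc >= threshold:
            up.append(gene)          # overexpressed after the stimulus
        elif lfc <= -threshold:
            down.append(gene)        # underexpressed after the stimulus
        else:
            unchanged.append(gene)   # expression independent of the stimulus
    return up, down, unchanged
```

The genes in `up` plus `down` form the group of proteins responding to the stimulus.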
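Predicting protein networks using protein interaction data. Submit your sequence (A) to a server/program that queries a database of protein interactions to find A's partners (e.g. B and C), then query again with those partners to find theirs (e.g. D, E and F). Continue until you are satisfied or have completed the network.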
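The iterative search is essentially a breadth-first expansion. A minimal sketch, assuming the interaction database is available as a Python dict from protein to partners, with `max_depth` standing in for "until you are satisfied"; names and data are hypothetical.

```python
from collections import deque

def expand_network(seed, interactions, max_depth=2):
    """Grow a network outward from `seed` by repeatedly querying partners.

    interactions: dict protein -> set of partners (stand-in for the database).
    Returns the set of edges found, each as a sorted (protein, protein) tuple.
    """
    edges = set()
    depth = {seed: 0}
    queue = deque([seed])
    while queue:
        protein = queue.popleft()
        if depth[protein] == max_depth:
            continue  # stop expanding at the requested depth
        for partner in interactions.get(protein, set()):
            edges.add(tuple(sorted((protein, partner))))
            if partner not in depth:
                depth[partner] = depth[protein] + 1
                queue.append(partner)
    return edges
```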
Summary: (1) Social behavior of the protein?!?!?!? (2) Using meta-text analysis (3) Using phylogenetic profiling (4) Using pathway homology (5) Using protein docking (6) Using microarray data (7) Using protein interaction data