Correlated Mutations and Co-evolution. May 1st, 2002
What is Co-evolution (Correlated Mutation)? Individual regions of proteins interact. These regions can be on the same chain or on different chains (complexes). A mutation in one half of an interacting pair induces a change in the other half. “The tendency of positions in proteins to mutate co-ordinately” (Pazos et al. 1997).
“Correlated Mutations Contain Information about Protein-protein interactions” (Pazos et al.). A possible aid to the “docking” problem, using only sequence information. Docking: the process by which protein domains fit together when they interact with one another.
Methodology. A correlation coefficient is computed for each pair of positions i and j; S is the similarity between the residues at positions i/j of type k versus l. The cutoff is chosen arbitrarily: the M pairs with the greatest correlation values are taken as predicted contacts, with M = L/2.
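A minimal sketch of the ranking step on this slide, under stated assumptions: the slide's equation did not survive, so the code uses a plain Pearson correlation between per-position similarity profiles, reads k and l as indices of two aligned sequences, and takes L to be the alignment length. The `similarity` function is a hypothetical identity score standing in for whatever substitution matrix the paper used, and no sequence weighting is applied.

```python
from itertools import combinations
import math


def similarity(a, b):
    # Hypothetical stand-in: 1.0 for identical residues, 0.0 otherwise.
    return 1.0 if a == b else 0.0


def position_profile(alignment, i):
    # Similarity of the residues in column i for every sequence pair (k, l).
    n_seqs = len(alignment)
    return [similarity(alignment[k][i], alignment[l][i])
            for k, l in combinations(range(n_seqs), 2)]


def pearson(x, y):
    # Plain Pearson correlation; returns 0.0 if either profile is constant.
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0.0 or sd_y == 0.0:
        return 0.0
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (sd_x * sd_y)


def predicted_contacts(alignment):
    # Score every pair of columns, then keep the M = L/2 highest-scoring pairs.
    length = len(alignment[0])
    profiles = [position_profile(alignment, i) for i in range(length)]
    scored = [(pearson(profiles[i], profiles[j]), i, j)
              for i, j in combinations(range(length), 2)]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[: length // 2]


if __name__ == "__main__":
    # Toy alignment: columns 0 and 1 change together, column 3 is conserved.
    toy = ["ACDA", "ACEA", "GTDA", "GTEA"]
    for score, i, j in predicted_contacts(toy):
        print(i, j, round(score, 3))
```

Fully conserved columns get a score of 0.0 here because their similarity profile has no variance; a real implementation would more likely exclude them before ranking.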
The Harmonic Average (Xd). A measure of “correlatedness”. P_ic is the percentage of correlated pairs at a given distance; P_ia is the same percentage for all pairs.
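The formula itself is missing from the slide. One reading consistent with the definitions above is Xd = sum over distance bins i of (P_ic - P_ia) / (d_i * n), where d_i is the upper edge of bin i and n is the number of bins. The sketch below assumes that form; the 4 Å bin width, the 15 bins, and the use of fractions rather than percentages are illustrative assumptions, not values taken from the slide.

```python
def bin_fractions(distances, edges):
    # Fraction of pairs falling in each distance bin (upper edge inclusive).
    # Pairs beyond the last edge count toward the total but toward no bin.
    counts = [0] * len(edges)
    for d in distances:
        for b, upper in enumerate(edges):
            if d <= upper:
                counts[b] += 1
                break
    total = len(distances)
    if total == 0:
        return [0.0] * len(edges)
    return [c / total for c in counts]


def harmonic_average_xd(correlated_distances, all_distances,
                        bin_width=4.0, n_bins=15):
    # Assumed form: Xd = sum_i (P_ic - P_ia) / (d_i * n).
    edges = [bin_width * (b + 1) for b in range(n_bins)]
    p_c = bin_fractions(correlated_distances, edges)
    p_a = bin_fractions(all_distances, edges)
    return sum((pc - pa) / (d * n_bins) for pc, pa, d in zip(p_c, p_a, edges))


if __name__ == "__main__":
    all_pairs = [3.0, 10.0, 20.0, 30.0, 40.0, 50.0]
    print(harmonic_average_xd([3.0, 3.5], all_pairs))    # correlated pairs are close
    print(harmonic_average_xd([50.0, 55.0], all_pairs))  # correlated pairs are far
```

Dividing by d_i weights short distances most heavily, so Xd is positive when correlated pairs sit closer together than pairs in general and negative when they sit farther apart.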
Comparisons of Correlations
Docking solutions test. Note: larger percentages imply worse performance. Special mention of 2gcr and 3adk: “sequence information does not seem to be sufficient to discriminate”.
Figure 5: Scatter plot of Xd vs. RMS distance. Examples shown: 9pap; hemoglobin (1hbb).
Prediction: Hsc70. Figure 6: predicted contacts between the Nt and Ct domains of Hsc70. These could be verified experimentally.
Coevolving Protein Residues: Maximum Likelihood and Relationship to Structure (Pollock et al. 1999). Uses size and charge characteristics to define co-evolution (correlation). Negative correlation: correlation due to differences in charge (and thus also co-evolution).
The Markov process model (simulated evolution). Two states, A and a. Equation 1 gives the probability of transitioning between states; λ is the rate parameter and π is the equilibrium frequency.
Use of parameters in the model. The basic model for how the authors simulate evolutionary steps.
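Equation 1 is not reproduced on the slide. As a sketch only, the textbook transition probability for a two-state continuous-time Markov process with rate λ and equilibrium frequencies π is P(j | i, t) = π_j + (δ_ij - π_j) e^(-λt); whether the paper scales λ in exactly this way is an assumption. The function names, and the parameter `pi_A` for the equilibrium frequency of state A, are hypothetical.

```python
import math
import random


def transition_probability(start, end, t, lam, pi_A):
    # Textbook two-state form (assumed): P = pi_end + (delta - pi_end) * exp(-lam * t).
    pi = {"A": pi_A, "a": 1.0 - pi_A}
    delta = 1.0 if start == end else 0.0
    return pi[end] + (delta - pi[end]) * math.exp(-lam * t)


def simulate_branch(start, t, lam, pi_A, rng):
    # Draw the state at the end of a branch of length t, given the start state.
    p_end_A = transition_probability(start, "A", t, lam, pi_A)
    return "A" if rng.random() < p_end_A else "a"


if __name__ == "__main__":
    rng = random.Random(0)
    # Evolve a single site along a chain of five branches of length 0.5.
    state = "A"
    history = [state]
    for _ in range(5):
        state = simulate_branch(state, 0.5, lam=1.0, pi_A=0.3, rng=rng)
        history.append(state)
    print("".join(history))
```

At t = 0 the process stays where it started, and as t grows the end state forgets the start state and follows the equilibrium frequencies.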
Likelihood Ratio Test Statistic (LR). L_I and L_D are the maximum likelihood values for the independent and dependent models. The LR is the method of determining whether dependence is statistically significant.
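The slide does not show how LR is formed from L_I and L_D. A common convention, assumed here, is LR = 2 (ln L_D - ln L_I), with significance judged against LR values obtained from data simulated under the independent model. Both function names are hypothetical, and the numbers in the usage lines are made up for illustration.

```python
def likelihood_ratio(log_l_independent, log_l_dependent):
    # Assumed convention: LR = 2 * (ln L_D - ln L_I).
    return 2.0 * (log_l_dependent - log_l_independent)


def empirical_p_value(observed_lr, null_lrs):
    # Share of null (independent-model) LR values at least as large as the
    # observed one, with the usual +1 correction so p is never exactly 0.
    exceed = sum(1 for value in null_lrs if value >= observed_lr)
    return (exceed + 1) / (len(null_lrs) + 1)


if __name__ == "__main__":
    observed = likelihood_ratio(-105.0, -100.0)   # illustrative log-likelihoods
    simulated_null = [1.0, 2.0, 3.0, 12.0]        # illustrative null LR values
    print(observed, empirical_p_value(observed, simulated_null))
```

Comparing against simulated values avoids relying on an asymptotic distribution for LR, which is one reason a simulation-based null is a natural fit for a model that is already used to simulate evolution.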
Test of Significance (LR values for change in parameters)
Myoglobin. Used the structure of myoglobin and compared differences in sequences. A variety of species was used for sequence information; the 3D protein structure is from sperm whale.
LR distributions for myoglobin: size and charge. Note the large negative-correlation LR values in charge.
Co-evolution of Proteins with their Interaction Partners (Goh et al.). Applied to PGK and chemokines.
What is PGK?
Methodology. Two independent sequence alignments, for the N and C regions, using PSI-BLAST. ClustalW was used to create a distance matrix between complete domains. The correlation was determined with a correlation coefficient r, where X and Y correspond to the two domains and r measures the relatedness between them.
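The equation for r is not reproduced on the slide. A minimal sketch, assuming r is the ordinary linear (Pearson) correlation between corresponding entries of the two domains' distance matrices, with both matrices built over the same species in the same order. The function names and the toy matrices are illustrative.

```python
import math


def upper_triangle(matrix):
    # Off-diagonal entries above the diagonal, in row-major order.
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def matrix_correlation(x_matrix, y_matrix):
    # Pearson correlation between matching pairwise distances of X and Y.
    x = upper_triangle(x_matrix)
    y = upper_triangle(y_matrix)
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("matrices must cover the same species (at least 3)")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0.0 or sd_y == 0.0:
        raise ValueError("distances must vary in both matrices")
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return cov / (sd_x * sd_y)


if __name__ == "__main__":
    # Toy distance matrices for three species; Y is X scaled by two.
    n_domain = [[0.0, 1.0, 2.0], [1.0, 0.0, 3.0], [2.0, 3.0, 0.0]]
    c_domain = [[0.0, 2.0, 4.0], [2.0, 0.0, 6.0], [4.0, 6.0, 0.0]]
    print(matrix_correlation(n_domain, c_domain))
```

Only the upper triangle is used because a distance matrix is symmetric with a zero diagonal, so the remaining entries carry no extra information.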
PGK correlations
Chemokines. Role of chemokines; importance in immunity (HIV, cancer). Four categories (mean nothing to me).
Clustering of Chemokines
Clustering of Chemokine receptors