Correlated Mutations and Co-evolution, May 1st, 2002

What is co-evolution (correlated mutation)?
- Individual regions of proteins interact.
- The interacting regions can be on the same chain or on different chains (complexes).
- A mutation in one half of the pair induces a change in the other half of the pair.
- “the tendency of positions in proteins to mutate co-ordinately” (Pazos et al. 1997)

“Correlated Mutations Contain Information about Protein-protein Interactions” (Pazos et al.)
- A possible aid to the “docking” problem, using only sequence information.
- Docking: the process by which protein domains interact with, and fit onto, one another.

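Methodology
- The correlation coefficient is built from S, the similarity between the residues of sequences k and l at position i (and likewise at position j).
- The cutoff is arbitrary: the M pairs with the greatest correlation values are taken as predicted contacts, with M = L/2 (L being the sequence length).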
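The column-pair correlation and the top-L/2 selection can be sketched as follows. This is a minimal sketch that assumes a Göbel-style coefficient. For each alignment column, compute a similarity score for every pair of sequences, then take the Pearson correlation of the two columns' similarity vectors. The function `identity_sim` is a hypothetical stand-in for a real amino-acid similarity matrix. Sequence weighting is omitted, and all function names are illustrative only.

```python
from itertools import combinations
from math import sqrt


def identity_sim(a, b):
    """Hypothetical stand-in for an amino-acid similarity matrix: 1 if identical, else 0."""
    return 1.0 if a == b else 0.0


def pair_similarities(column, sim):
    """Similarity s_kl for every unordered sequence pair (k < l) at one alignment column."""
    return [sim(a, b) for a, b in combinations(column, 2)]


def correlation(col_i, col_j, sim=identity_sim):
    """Pearson correlation of the pairwise-similarity vectors of columns i and j."""
    x = pair_similarities(col_i, sim)
    y = pair_similarities(col_j, sim)
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = sqrt(sum((b - my) ** 2 for b in y) / n)
    if sx == 0 or sy == 0:
        return 0.0  # a fully conserved column carries no correlation signal
    return cov / (sx * sy)


def predict_contacts(alignment, sim=identity_sim):
    """Return the M = L/2 position pairs (i < j) with the highest correlation."""
    length = len(alignment[0])
    columns = [[seq[p] for seq in alignment] for p in range(length)]
    scored = [
        (correlation(columns[i], columns[j], sim), i, j)
        for i in range(length)
        for j in range(i + 1, length)
    ]
    # Stable sort on the score alone, so tied pairs keep their (i, j) order.
    scored.sort(key=lambda t: t[0], reverse=True)
    m = length // 2
    return [(i, j) for _, i, j in scored[:m]]


alignment = ["ACDA", "ACEA", "GTDA", "GTEA"]
print(predict_contacts(alignment))
```

In this toy alignment, columns 0 and 1 change together (A with C, G with T), so the pair (0, 1) ranks first. With a real similarity matrix and a real alignment, the scores and ranking would differ.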

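The harmonic average (Xd)
- A measure of “correlatedness”: whether correlated pairs lie closer together in the structure than residue pairs in general.
- P_ic: percentage of correlated pairs in distance bin i; P_ia: the same percentage for all pairs.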
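Below is a sketch of the Xd computation. It assumes the form Xd = Σ_i (P_ic − P_ia) / (d_i · n), summed over n distance bins, where d_i is the distance that represents bin i. Three choices are assumptions made only for illustration: the 4 Å bin width, the 15 bins, and using each bin's upper edge as d_i. Dividing each term by d_i weights the short-distance bins most heavily. As a result, Xd > 0 when correlated pairs are over-represented at short distances.

```python
def bin_percentages(distances, bin_width, n_bins):
    """Percentage of pairs in each distance bin [k*w, (k+1)*w).

    Distances beyond the last bin are not counted in any bin but still
    count towards the total, so percentages need not sum to 100.
    """
    counts = [0] * n_bins
    for d in distances:
        k = int(d // bin_width)
        if 0 <= k < n_bins:
            counts[k] += 1
    return [100.0 * c / len(distances) for c in counts]


def xd(correlated_distances, all_distances, bin_width=4.0, n_bins=15):
    """Harmonic-average measure Xd; positive when correlated pairs are closer than average."""
    p_c = bin_percentages(correlated_distances, bin_width, n_bins)
    p_a = bin_percentages(all_distances, bin_width, n_bins)
    total = 0.0
    for k in range(n_bins):
        d_k = (k + 1) * bin_width  # assumed: the bin's upper edge represents its distance
        total += (p_c[k] - p_a[k]) / (d_k * n_bins)
    return total


# Hypothetical residue-pair distances in Å.
print(xd([3.5, 5.0], [3.5, 5.0, 12.0, 30.0]))
```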

Comparisons of Correlations

Docking solutions test
- Note: larger percentages imply worse performance.
- Special mention of 2gcr and 3adk: “sequence information does not seem to be sufficient to discriminate”.

Figure 5: Scatter plot of Xd vs. RMS distance (labelled examples: 9pap; hemoglobin, 1hbb)

Prediction: Hsc70
- Figure 6: predicted contacts between the N-terminal (Nt) and C-terminal (Ct) domains of Hsc70.
- These predictions could be verified experimentally.

Coevolving Protein Residues: Maximum Likelihood and Relationship to Structure (Pollock et al. 1999)
- Uses residue size and charge characteristics to define co-evolution (correlation).
- Negative correlation: correlation due to differences in charge (this also counts as co-evolution).

The Markov process model (simulated evolution)
- Two states, A and a.
- Equation 1 gives the probability of a transition between states.
- λ: rate parameter; π: equilibrium frequency.
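Take a two-state process with rate λ and equilibrium frequencies π_A and π_a = 1 − π_A. The standard transition probability over a branch of length t is P_ij(t) = π_j + (δ_ij − π_j) e^(−λt), where δ_ij is 1 if i = j and 0 otherwise. The sketch below assumes Equation 1 takes this standard form. The rate and frequencies in it are hypothetical.

```python
from math import exp


def transition_prob(i, j, t, lam, pi):
    """P(state j at time t | state i at time 0) for a two-state Markov process.

    pi maps each state ("A", "a") to its equilibrium frequency; lam is the rate.
    """
    delta = 1.0 if i == j else 0.0
    return pi[j] + (delta - pi[j]) * exp(-lam * t)


pi = {"A": 0.7, "a": 0.3}
for t in (0.0, 0.5, 5.0):
    row = [transition_prob("A", j, t, lam=1.0, pi=pi) for j in ("A", "a")]
    print(t, [round(p, 3) for p in row])
```

At t = 0 the process stays in its starting state. As t grows, each row approaches the equilibrium frequencies π, and every row sums to 1.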

Use of parameters in the model
- The basic model for how evolutionary steps are simulated.
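One way to simulate such evolutionary steps draws the state at the end of each branch from the transition probabilities for that branch length. This sketch uses the same two-state assumption. The branch lengths, rate and frequencies are hypothetical. Only a single, independently evolving site is simulated; a dependent model, which tracks the joint state of two sites, is not shown.

```python
import random
from math import exp


def transition_prob(i, j, t, lam, pi):
    """Two-state transition probability P_ij(t) = pi_j + (delta_ij - pi_j) e^(-lam t)."""
    delta = 1.0 if i == j else 0.0
    return pi[j] + (delta - pi[j]) * exp(-lam * t)


def evolve(start, branch_lengths, lam, pi, rng):
    """Simulate one site along a path of branches; return the state after each branch."""
    states = []
    state = start
    for t in branch_lengths:
        stay = transition_prob(state, state, t, lam, pi)
        if rng.random() >= stay:
            state = "a" if state == "A" else "A"  # switch to the other state
        states.append(state)
    return states


rng = random.Random(1)
pi = {"A": 0.7, "a": 0.3}
print(evolve("A", [0.1, 0.5, 2.0], lam=1.0, pi=pi, rng=rng))
```

Short branches rarely change the state. After a very long branch, the end state is drawn essentially from π, whatever the start state.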

Likelihood ratio (LR) test statistic
- L_I and L_D: maximum likelihood values for the independent and dependent models.
- The LR is used to determine whether dependence between two sites is statistically significant.
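A common form of the statistic is LR = 2(ln L_D − ln L_I). The sketch below is deliberately simplified: it treats the sequences as independent observations and ignores the phylogeny, which a full model would not. Under the independent model, the states of the two sites are fitted separately. Under the dependent model, the joint states are fitted directly. In both cases the maximum-likelihood estimates are just observed frequencies. The function names and data are illustrative only.

```python
from collections import Counter
from math import log


def loglik_independent(pairs):
    """Max log-likelihood when the two sites are modelled separately (product of marginals)."""
    n = len(pairs)
    first = Counter(a for a, _ in pairs)
    second = Counter(b for _, b in pairs)
    return sum(log(first[a] / n) + log(second[b] / n) for a, b in pairs)


def loglik_dependent(pairs):
    """Max log-likelihood when the joint state of the two sites is modelled directly."""
    n = len(pairs)
    joint = Counter(pairs)
    return sum(log(joint[p] / n) for p in pairs)


def lr_statistic(pairs):
    """LR = 2 (ln L_D - ln L_I); non-negative up to rounding, as the dependent model nests the independent one."""
    return 2.0 * (loglik_dependent(pairs) - loglik_independent(pairs))


# Hypothetical states of two sites across ten sequences.
coupled = [("A", "B"), ("a", "b")] * 5
print(lr_statistic(coupled))
```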

Test of Significance (LR values for change in parameters)
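One way to judge whether an observed LR is significant is to compare it with LR values from data simulated under the independent model. This sketch again ignores phylogeny. It uses a hypothetical helper, `simulate_null`, which redraws each site's states independently from the observed ones. It then reports the fraction of simulated LR values at least as large as the observed LR, which is an empirical p-value.

```python
import random
from collections import Counter
from math import log


def lr_statistic(pairs):
    """LR = 2 (ln L_D - ln L_I) with frequency-based ML estimates (phylogeny ignored)."""
    n = len(pairs)
    joint = Counter(pairs)
    first = Counter(a for a, _ in pairs)
    second = Counter(b for _, b in pairs)
    return 2.0 * sum(
        log(joint[(a, b)] / n) - log(first[a] / n) - log(second[b] / n) for a, b in pairs
    )


def simulate_null(pairs, rng):
    """Hypothetical helper: redraw each site independently from its observed states."""
    firsts = [a for a, _ in pairs]
    seconds = [b for _, b in pairs]
    return [(rng.choice(firsts), rng.choice(seconds)) for _ in pairs]


def empirical_p_value(pairs, n_sim=1000, seed=0):
    """Fraction of null-simulated LR values at least as large as the observed LR."""
    rng = random.Random(seed)
    observed = lr_statistic(pairs)
    null = [lr_statistic(simulate_null(pairs, rng)) for _ in range(n_sim)]
    return sum(lr >= observed for lr in null) / n_sim


coupled = [("A", "B"), ("a", "b")] * 10
print(empirical_p_value(coupled))
```

Perfectly coupled sites give a very small p-value. Sites whose states vary independently give a large one.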

Myoglobin
- Sequences from a variety of species were compared to identify differences.
- The 3D structure of sperm whale myoglobin was used as the reference structure.

LR distributions for myoglobin: size and charge
- Note the large LR values for negative correlation in charge.

Co-evolution of Proteins with their Interaction Partners (Goh et al.)
- Applied to PGK and to chemokines.

What is PGK (phosphoglycerate kinase)?

Methodology
- Two independent sequence alignments, one for the N-terminal and one for the C-terminal region, were built with PSI-BLAST.
- ClustalW was used to create distance matrices for the complete domains.
- Correlation between the matrices was measured with a coefficient r, where X and Y correspond to the two domains; r measures the relatedness between these domains.
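The correlation r between the two domains' distance matrices can be sketched as a Pearson coefficient over the entries above the diagonal: r = Σ(X_ij − X̄)(Y_ij − Ȳ) / sqrt(Σ(X_ij − X̄)² · Σ(Y_ij − Ȳ)²). The sketch assumes this form. It also assumes that both matrices list the same species in the same order. The matrices are hypothetical; in practice they would be the ClustalW distances for each domain.

```python
from math import sqrt


def upper_triangle(matrix):
    """Entries above the diagonal of a square, symmetric distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def matrix_correlation(x_matrix, y_matrix):
    """Pearson r between two distance matrices over species pairs i < j."""
    x = upper_triangle(x_matrix)
    y = upper_triangle(y_matrix)
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


# Hypothetical distances between the same three species for each domain.
n_domain = [[0.0, 0.2, 0.5], [0.2, 0.0, 0.4], [0.5, 0.4, 0.0]]
c_domain = [[0.0, 0.3, 0.7], [0.3, 0.0, 0.6], [0.7, 0.6, 0.0]]
print(round(matrix_correlation(n_domain, c_domain), 3))
```

Similar trees for the two domains give r close to 1. Unrelated evolutionary histories give r near 0.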

PGK correlations

Chemokines
- Role of chemokines and their importance in immunity (HIV, cancer).
- Four categories (CC, CXC, C and CX3C).

Clustering of Chemokines

Clustering of Chemokine receptors