

Correlated mutations

The phenomenon of several mutations occurring simultaneously and in dependence on each other.

According to the current hypothesis of positive Darwinian selection at the molecular level, correlated mutations are related to the changes occurring in their neighborhood; they reflect protein-protein interactions and preserve the biological activity and structural properties of the molecule.

Four unrelated protein families have been studied for the occurrence and characteristics of correlated mutations:

Eglin-like proteinase inhibitor family (25 sequences)
Bowman-Birk proteinase inhibitor family (52 sequences)
Myoglobins (74 sequences)
Lysozymes (56 sequences)

The amino acids occurring at variable positions of eglin and Bowman-Birk family

EGLIN-LIKE PROTEINS

Position  Residues      Position  Residues
8         KR            43        -EILV
9         -ELNQRST      44        -DL
10        EFMQRST       45        -ANS
11        FW            46        DGM
12        P             47        GQSTV
13        EHQ           48        AFHINPV
14        LV            49        -W
15        CILV          50        -AFIV
16        EG            51        -T
17        ACKLMSTV      52        AEKLMQT
18        DGPRST        53        DEN
19        AGITV         54        EFILY
20        ADEKLS        55        DKLNR
21        ADEFKLQVY     56        CFILPY
22        A             57        DEKNQ
23        AEKMRV        58        R
24        AEGKQT        59        IV
25        -U1IKTVY      60        FR
26        FIV           61        ILV
27        EKLQT         62        FLWY
28        AEKLQRT       63        DNVY
29        DEHQ          64        ADHNT
30        KMNRY         65        -DEIKLPRV
31        PSV           66        -AGLNRS
32        DEKLNQRS      67        -DGNT
33        AILVY         68        -DFIKLNSTY
34        DEKQRST       69        IV
35        -AINV         70        ANTV
36        -E            71        DKNQRSZ
37        -V            72        AHIMPTV
38        -EHIQY        73        -ASV
39        -FILTV        74        -P
40        -LMSV         75        -AHKQRSTV
41        -P            76        IV
42        -EHIQRS       77        AGT

BOWMAN-BIRK INHIBITORS

Position  Residues        Position  Residues
3         -DEKQST         35        ADET
4         -RSTVY          36        C
5         -KPST           37        DEKLNS
6         EGHKPSTW        38        ADEFGHKLRST
7         AEGKP           39        C
8         C               40        AEGILMV
9         C               41        CEKP
10        DNRS            42        ANRSTV
11        EFHIKLQRST      43        EFHKLRTVY
12        ACQ             44        DGS
13        -ADFIKLMPRSTV   45        DEFIMNQSY
14        C               46        -DPS
15        C               47        -AGPS
16        AKR             48        KLMPQR
17        S               49        CHR
18        DEFIKMNQR       50        FHIQRSVY
19        P               51        -I
20        AP              52        C
21        EFIKMQT         53        ABEFGLQTVY
22        C               54        DN
23        HQRSTV          55        -IMQTV
24        C               56        DHKNQTY
25        AEHMNQRSTV      57        -DHIKNRTV
26        DNQ             58        -FGY
27        -IKLMQTV        59        -CDI
28        GLRV            60        -HPTY
29        DEFIL           61        -ADEGKP
30        DEKNQRT         62        -AKPQS
31        -S              63        -MT
32        C               64        -C
33        AHPS            65        -DEHKNR
34        ADS             66        -DENPS

The position variability patterns of myoglobins and lysozymes

The observed number and contribution of three correlation types in four different protein families

The correlation sets consist of 2 to over 20 residues.

The correlation statistics (table columns): total number of correlation sets observed; number of dispersed sets; number of narrow clusters; number of undirected clusters; number of sets related to active center.

The protein family (number of correlated positions/set):
Eglin-like proteins (2-13)
Bowman-Birk proteinase inhibitors (2-28)
Myoglobins (2-29): n.a.
Lysozymes (2-15)

All families: 125 correlation sets observed (100%); 59 dispersed sets (47.2%); 38 narrow clusters (30.4%); 28 undirected clusters (22.4%); sets related to active center: -
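The percentages in the "All families" row can be re-derived from the set counts. The following is a minimal Python sketch that uses only the totals given in that row; the variable names are illustrative and not taken from the presentation.

```python
# Counts of correlation sets over all four families, taken from the summary row.
total_sets = 125
counts = {"dispersed": 59, "narrow": 38, "undirected": 28}

# The three correlation types together account for every observed set.
assert sum(counts.values()) == total_sets

# Contribution of each type as a percentage of all sets, to one decimal place.
shares = {kind: round(100 * n / total_sets, 1) for kind, n in counts.items()}

for kind, share in shares.items():
    print(f"{kind}: {counts[kind]} sets ({share}%)")
```

The dispersed sets form the largest share, which is the basis for the "almost 50%" figure quoted in the conclusions.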

Program FEEDBACK – what does it do?

The program FEEDBACK is designed to analyze multiple aligned protein sequences for the occurrence of correlated mutations. For each residue occurring at each position, it returns all residues that occur at every other position of the aligned proteins. The results are visualized with the help of MS Excel. The application is available as freeware upon request.
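The kind of tabulation described for FEEDBACK can be sketched in Python. This is not the FEEDBACK source code, which is not shown in the presentation. It is a minimal illustration that assumes the alignment is a list of equal-length strings and that "correlation" is read as a narrowing of the residue set at one column once the residue at a reference column is fixed. All function names, the toy alignment, and the flagging criterion are hypothetical.

```python
from collections import defaultdict


def residues_per_position(sequences):
    """Return, for every alignment column, the sorted string of residues seen there."""
    length = len(sequences[0])
    return ["".join(sorted({seq[i] for seq in sequences})) for i in range(length)]


def residues_given_reference(sequences, ref_index):
    """Group sequences by their residue at ref_index (0-based) and return,
    per residue, the group size and the residues found in every column."""
    groups = defaultdict(list)
    for seq in sequences:
        groups[seq[ref_index]].append(seq)
    return {
        residue: (len(members), residues_per_position(members))
        for residue, members in groups.items()
    }


def candidate_correlated_columns(sequences, ref_index):
    """Flag columns whose residue set narrows for at least one reference residue.
    This criterion is an assumption made for illustration only."""
    overall = residues_per_position(sequences)
    grouped = residues_given_reference(sequences, ref_index)
    flagged = []
    for col in range(len(overall)):
        if col == ref_index:
            continue
        if any(set(cols[col]) < set(overall[col]) for _, cols in grouped.values()):
            flagged.append(col)
    return flagged


# Hypothetical toy alignment; a '-' character would mark a gap.
toy = ["ADK", "AEK", "GDR", "GER", "GDR"]
print(residues_per_position(toy))
print(residues_given_reference(toy, 0))
print(candidate_correlated_columns(toy, 0))
```

In the toy alignment the third column follows the first (K with A, R with G), while the second column varies freely. The sketch applies no significance test, so with few sequences per group a narrowed residue set may arise by chance.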

The three types of distribution of correlated positions present in eglin-like proteins

The residue location and relative distribution are shown on the tertiary structure of eglin C (P01051).

The dispersed correlation

Position no. and occurring residues    Correlation versus position
[–DGNT]                                D (8), G (9)
10 [–ELNQRST]                          ETLNQRS

The three types of distribution of correlated positions present in eglin-like proteins

The residue location and relative distribution are shown on the tertiary structure of eglin C (P01051).

The narrow correlation cluster

Position no. and occurring residues    Correlation versus position
[DEKQRST]                              T (10), Q (6)
15 [CILV]                              CILV
17 [DGPRST]                            PRST
27 [EKLQT]                             EQKL
28 [AEKLQRT]                           KTEQR
30 [KMNRY]                             NKM
32 [DEKLNQRS]                          KLSDEN
56 [CFILPY]                            CIP
68 [–DFIKLNSTY]                        DFI–KNT

The three types of distribution of correlated positions present in eglin-like proteins

The residue location and relative distribution are shown on the tertiary structure of eglin C (P01051).

The spot correlation cluster

Position no. and occurring residues    Correlation versus position
[KMNRY]                                K (6), N (15)
18 [DGPRST]                            SDGPRT
27 [EKLQT]                             LEQ
29 [DEHQ]                              DEQ
33 [AILVY]                             AILV
35 [–AINV]                             IV–AN
68 [–DFIKLNSTY]                        –NSDFIKLS Y

The three types of distribution of correlated positions present in the Bowman-Birk inhibitor family

The residue location and relative distribution are shown on the tertiary structure of the Bowman-Birk inhibitor from soybean (P01055).

The dispersed correlation

Position no. and occurring residues    Correlation versus position
[DEFIL]                                L (37), E (12)
6 [EGHKPSTW]                           EGKSTW
13 [–ADFIKLMPRSTV]                     –AFILPRTM
40 [AEGILMV]                           AILMVE
48 [KLMPQR]                            KLMQR

The three types of distribution of correlated positions present in the Bowman-Birk inhibitor family

The residue location and relative distribution are shown on the tertiary structure of the Bowman-Birk inhibitor from soybean (P01055).

The narrow correlation cluster

Position no. and occurring residues    Correlation versus position
[–ADFIKLMPRSTV]                        L (11), M (10), A (8)
4 [–RSTVY]                             V–SS
5 [–KPST]                              K–SS
7 [AEGKP]                              APP
11 [EFHIKLQRST]                        TEHQS
21 [EFIKMQT]                           TQEQ

The three types of distribution of correlated positions present in the Bowman-Birk inhibitor family

The residue location and relative distribution are shown on the tertiary structure of the Bowman-Birk inhibitor from soybean (P01055).

The spot correlation cluster

Position no. and occurring residues    Correlation versus position
[AEHMNQRSTV]                           A (15), V (9)
11 [EFHIKLQRST]                        EFKLRSHQ
23 [HQRSV]                             QR
50 [FHIQRSVY]                          HRSFI

The three types of distribution of correlated positions present in myoglobins

The residue location and relative distribution are shown on the tertiary structure of human myoglobin (P0244, pdb1bzp).

The dispersed correlation

Position no. and occurring residues    Correlation versus position
[AGPQST]                               A (6), G (49), N (9)
128 [ABEHQ]                            QBEHQQ
137 [ILNSV]                            LLILNSV

The three types of distribution of correlated positions present in myoglobins

The residue location and relative distribution are shown on the tertiary structure of human myoglobin (P0244, pdb1bzp).

The narrow correlation cluster

Position no. and occurring residues    Correlation versus position
[AEGST]                                A (7), G (55), S (10)
22 [AEGPSTV]                           PSTAEGP STV P
26 [EGHKLQ]                            LQEGH QK Q
27 [ADEFLNT]                           AENADEF T E
30 [ILMTV]                             ILIMTVI
53 [ADEGQ]                             DEQADEGD
54 [ADEILQ]                            AELDELQE
59 [ADEF]                              DEADEFE
128 [ABEHQ]                            QABEH Q Q

The three types of distribution of correlated positions present in myoglobins

The residue location and relative distribution are shown on the tertiary structure of human myoglobin (P0244, pdb1bzp).

The spot correlation cluster

Position no. and occurring residues    Correlation versus position
[AMSTV]                                A (58), S (7)
27 [ADEFLNT]                           ADEFNTE
31 [GKRS]                              GKRSR
78 [AKLQ]                              KALQ
109 [DEGNT]                            DEGTE
116 [AEHKQST]                          AEHKQSA
117 [AEKNQS]                           AEKQSE
122 [BDEN]                             BDEND

The three types of distribution of correlated positions present in lysozymes

The residue location and relative distribution are shown on the tertiary structure of lysozyme from rat (P00697, pdb5lyz).

The dispersed correlation

Position no. and occurring residues    Correlation versus position
[GHKNR]                                G (7), H (31), N (16)
30 [ILMV]                              MVILMVV
40 [DFKNR]                             DNNFKNR

The three types of distribution of correlated positions present in lysozymes

The residue location and relative distribution are shown on the tertiary structure of lysozyme from rat (P00697, pdb5lyz).

The narrow correlation cluster

Position no. and occurring residues    Correlation versus position
[FL]                                   F (38), L (18)
26 [–ILMV]                             –ILMVL
33 [AISTV]                             AISTVA
44 [FIMRTVY]                           FIMRTVYT
54 [–KRSTY]                            –KRSTYT
84 [AKNRS]                             AKNRSS

The three types of distribution of correlated positions present in lysozymes

The residue location and relative distribution are shown on the tertiary structure of lysozyme from rat (P00697, pdb5lyz).

The spot correlation cluster

Position no. and occurring residues    Correlation versus position
[–DWY]                                 W (16), Y (36)
13 [–ILM]                              MIL
15 [–AEKNQRS]                          RAEKNQ RS
20 [DEGKN]                             KNDG
27 [–AEGPR]                            GAEP
44 [FIMRTVY]                           TFIRTY
46 [GHKNPRTY]                          RGHNRY
105 [–AGHPQRV]                         GRV–APQ
109 [DGKNQRST]                         NGKRST
121 [–HKQRT]                           THKQR

CONCLUSIONS

Almost 50% of the observed correlated mutations involve residues that are neither in contact nor interacting with each other.

The dispersed correlations are present in various protein families, and they occur independently of the mechanism of structure stabilization.

The phenomenon of correlated mutations is not limited to interacting residues or to residues known to determine biological activity.

The current hypothesis of positive Darwinian selection does not fully explain the mechanism and occurrence of correlated mutations.

Łukasz Becella 1, Monika Sobczyk 1, Jacek Leluk 1,2
1 Institute of Biochemistry and Molecular Biology, University of Wrocław
2 Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw
Monika Grabiec 1
Correlated mutations team
Similarity estimation team