Prediction of protein disorder Zsuzsanna Dosztányi Institute of Enzymology, Budapest, Hungary


Protein Structure/Function Paradigm. Dominant view: 3D structure is a prerequisite for protein function. Amino acid sequence → Structure → Function

But… Heat stability; protease sensitivity; failed attempts to crystallize; lack of NMR signals; “weird” sequences; …

IDPs: intrinsically disordered proteins/regions (IDPs/IDRs). They do not adopt a well-defined structure in isolation under native-like conditions. They form highly flexible ensembles with little secondary structure and no folded structure, yet they are functional proteins.

Protein disorder is prevalent. [Figure: percentage of proteins with long disordered regions (LDR, more than 40 residues) by kingdom: B, A, E]

Protein disorder is important. Prion protein: prion disease; CFTR: cystic fibrosis; τ: Alzheimer’s; α-synuclein: Parkinson’s; p53, BRCA1: cancer.

Protein disorder is functional. [Figure: percentage of proteins (%) in the regulatory, signaling, biosynthetic and metabolic classes versus length of disordered region (30<, 40<, 50<, 60<)] Iakoucheva et al. (2002) J. Mol. Biol. 323, 573

p53 tumor suppressor. Domains: TAD (transactivation), DBD (DNA-binding), TD (tetramerization), RD (regulation). Wells et al. PNAS 2008; 105: 5762

Heterogeneity in protein disorder: flexible loop, RC-like, compact, transient structures.

Modularity in proteins. Many proteins contain multiple domains and are composed of ordered and disordered segments. Average length of a PDB chain is < 300. Average length of a human protein is ~ 500. Average length of cancer-related proteins is > 900. Structural properties of full-length proteins …

Bioinformatics of protein disorder. Part 1: databases; prediction of protein disorder. Part 2: prediction of functional regions within IDPs.

Datasets. Ordered proteins in the PDB: over structures, few 1000 folds. Some structures in the PDB classify as disordered! They only adopt a well-defined structure in complex (in crystals, with cofactors, proteins, …). Disorder in the PDB: missing electron density regions from the PDB; NMR structures with large structural variations; less than 10% of all positions; usually short (<10 residues), often at the termini.

DisProt. Current release: 6.02. Release date: 05/24/2013. Number of proteins: 694. Number of disordered regions: 1539. Experimentally verified disordered proteins collected from the literature (X-ray, NMR, CD, proteolysis, SAXS, heat stability, gel filtration, …)

Additional databases. Combining experiments and predictions; genome-level annotations: MobiDB, D2P2, IDEAL.

Amino acid compositions He et al. Cell Res. 2009; 19: 929

Sequence properties of disordered proteins. Amino acid compositional bias: high proportion of polar and charged amino acids (Gln, Ser, Pro, Glu, Lys); low proportion of bulky, hydrophobic amino acids (Val, Leu, Ile, Met, Phe, Trp, Tyr). Low sequence complexity. Signature sequences identifying disordered proteins. Protein disorder is encoded in the amino acid sequence.

Uversky plot: charge-hydrophobicity (two parameters). Axes: mean hydrophobicity and mean net charge. Uversky (2002) Eur. J. Biochem. 269, 2
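The two parameters of the plot can be computed directly from a sequence. The sketch below is only an illustration under stated assumptions: it uses the Kyte-Doolittle hydropathy scale rescaled to 0..1 and the commonly quoted linear boundary (mean net charge = 2.785 × mean hydrophobicity − 1.151). Neither the scale nor the coefficients appear on the slide, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; not given on the slide).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydrophobicity(seq):
    """Mean Kyte-Doolittle hydropathy, rescaled so each residue lies in 0..1."""
    return sum((KD[aa] + 4.5) / 9.0 for aa in seq) / len(seq)


def mean_net_charge(seq):
    """Absolute net charge per residue, counting K/R as +1 and D/E as -1."""
    positive = seq.count("K") + seq.count("R")
    negative = seq.count("D") + seq.count("E")
    return abs(positive - negative) / len(seq)


def boundary_margin(seq):
    """Signed distance-like score relative to the assumed boundary line.

    Positive values fall on the folded side, negative values on the
    disordered side (high net charge, low hydrophobicity).
    """
    return 2.785 * mean_hydrophobicity(seq) - mean_net_charge(seq) - 1.151
```

A strongly hydrophobic, uncharged sequence gives a positive margin, while a highly charged, polar one gives a negative margin. Standard one-letter codes are assumed; other characters raise a `KeyError`.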

Making it position specific: FoldIndex. Example: p53. Prilusky (2005) Bioinformatics 21, 3435
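One way to make a whole-sequence charge-hydrophobicity score position specific is to evaluate it in a sliding window centred on each residue. The sketch below assumes the Kyte-Doolittle scale, the commonly quoted coefficients 2.785 and 1.151, a default window of 51 residues, and truncated windows at the termini; none of these details are given on the slide, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; not given on the slide).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def fold_score(segment):
    """Charge-hydrophobicity score of one segment (positive: folded-like)."""
    hydrophobicity = sum((KD[aa] + 4.5) / 9.0 for aa in segment) / len(segment)
    net = sum((aa in "KR") - (aa in "DE") for aa in segment)
    return 2.785 * hydrophobicity - abs(net) / len(segment) - 1.151


def windowed_profile(seq, window=51):
    """Score for each position, using a window centred on that position.

    Windows are truncated at the sequence ends (an assumption of this sketch).
    """
    half = window // 2
    return [
        fold_score(seq[max(0, i - half): i + half + 1])
        for i in range(len(seq))
    ]
```

Positions with negative scores would be read as putatively unfolded, positive ones as putatively folded.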

Disorder Prediction Methods: amino acid propensity scales. GlobPlot compares the tendency of amino acids to be in coil (irregular) structure with their tendency to be in regular secondary structure elements. Linding (2003) NAR 31, 3701

GlobPlot

From position-specific predictions: Where are the ordered domains? Where are the longer disordered segments? Noise vs. real data.

GlobPlot: downhill regions correspond to putative domains (GlobDom); uphill regions correspond to predicted protein disorder.
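The uphill/downhill reading follows from plotting a running sum of per-residue propensities. The sketch below illustrates only that idea. The ±1 propensities are a toy scale built from the residue groups named in the sequence-properties slide, not the scale GlobPlot actually uses, and the function names are hypothetical.

```python
from itertools import accumulate

# Toy propensities (assumption): +1 for residues the slides list as enriched
# in disordered proteins, -1 for the bulky hydrophobic ones, 0 otherwise.
DISORDER_PROMOTING = set("QSPEK")
ORDER_PROMOTING = set("VLIMFWY")


def toy_propensity(aa):
    """Return the toy disorder propensity of a single residue."""
    if aa in DISORDER_PROMOTING:
        return 1.0
    if aa in ORDER_PROMOTING:
        return -1.0
    return 0.0


def running_sum(seq):
    """Cumulative sum of propensities along the sequence.

    Rising stretches suggest disorder; falling stretches suggest
    putative globular domains.
    """
    return list(accumulate(toy_propensity(aa) for aa in seq))
```

For example, `running_sum("SSLL")` rises over the two serines and falls back over the two leucines.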

Disorder Prediction Methods: physical principles. IUPred: if a residue cannot form enough favorable interactions within its sequential environment, it will not adopt a well-defined structure; it will be disordered. Dosztányi (2005) JMB 347, 827

Energy description of proteins. For example: the L-I interaction is frequent (hydrophobic effect), so the L-I interaction energy is low (favorable); the K-R interaction is rare (electrostatic repulsion), so the K-R interaction energy is high (unfavorable). Interaction energies are estimated from statistical potentials, calculated from the frequency of amino acid interactions in globular proteins alone, based on the Boltzmann hypothesis.
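The link between contact frequency and energy can be written in the generic inverse-Boltzmann form below. This is a textbook-style sketch, not necessarily the exact parametrisation used by any particular predictor. The symbols are assumptions: N_ij^obs is the observed number of contacts between residue types i and j in globular structures, N_ij^exp is the number expected in a reference state, and kT sets the energy unit.

```latex
e_{ij} = -kT \,\ln\!\left( \frac{N_{ij}^{\mathrm{obs}}}{N_{ij}^{\mathrm{exp}}} \right)
```

Pairs observed more often than expected (such as L-I) get a negative, favorable energy; pairs observed less often than expected (such as K-R) get a positive, unfavorable one.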

Predicting protein disorder - IUPred. The algorithm: …PSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGPDEAPRMPEAAPRVAPAPAAPTPAA... Based only on the composition of the environment of a residue (here, a D), we try to predict whether it is in a disordered region or not. Amino acid composition of the environment: A – 10%, C – 0%, D – 12%, E – 10%, F – 2%, etc. Estimate the interaction energy between the residue and its environment. Decide the probability of the residue being disordered based on this.
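The first two steps of the algorithm (environment composition, then an estimated energy) can be sketched as follows. Everything specific here is an assumption made for illustration: the toy pair-energy matrix is invented and is not the published energy predictor matrix, the separation limits `min_sep` and `max_sep` are illustrative defaults, and the function names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("VLIMFWY")
POSITIVE = set("KR")
NEGATIVE = set("DE")


def toy_pair_energy(a, b):
    """Invented pair energy: hydrophobic pairs favorable, like charges not."""
    if a in HYDROPHOBIC and b in HYDROPHOBIC:
        return -1.0
    if (a in POSITIVE and b in POSITIVE) or (a in NEGATIVE and b in NEGATIVE):
        return 1.0
    return 0.0


# Hypothetical 20 x 20 matrix standing in for a real statistical potential.
TOY_MATRIX = {a: {b: toy_pair_energy(a, b) for b in AMINO_ACIDS}
              for a in AMINO_ACIDS}


def environment_composition(seq, i, min_sep=2, max_sep=100):
    """Fractional amino acid composition of the sequence around position i."""
    lo, hi = max(0, i - max_sep), min(len(seq), i + max_sep + 1)
    env = [seq[j] for j in range(lo, hi) if abs(i - j) >= min_sep]
    if not env:
        return {}
    return {aa: env.count(aa) / len(env) for aa in set(env)}


def estimated_energy(seq, i, matrix=TOY_MATRIX, min_sep=2, max_sep=100):
    """Composition-weighted sum of pair energies for residue i."""
    composition = environment_composition(seq, i, min_sep, max_sep)
    return sum(matrix[seq[i]][aa] * frac for aa, frac in composition.items())
```

A residue whose estimated energy is unfavorable (high) would then be assigned a higher probability of being disordered; that final mapping is not sketched here.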

IUPred:

Disorder Prediction Methods: machine learning. DISOPRED2: binary classification problem. Ward (2004) JMB 337, 635

DISOPRED2: …..AMDDLMLSPDDIEQWFTED….. → F(inp) → assign label: D or O. SVM with linear kernel.
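The scheme on this slide (a sequence window goes in, a D/O decision comes out of a linear function) can be illustrated with a one-hot window encoding and a linear decision value. This is a simplified sketch and not the DISOPRED2 implementation. The one-hot input, the window half-width, and the weights are all assumptions, and a real predictor would use trained weights and may use richer inputs such as sequence profiles.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: k for k, aa in enumerate(AMINO_ACIDS)}
PAD = 20  # extra slot for positions beyond the termini or unknown residues


def one_hot_window(seq, i, half_width=7):
    """Concatenated one-hot columns for the window centred on position i."""
    features = []
    for j in range(i - half_width, i + half_width + 1):
        column = [0.0] * 21
        if 0 <= j < len(seq):
            column[INDEX.get(seq[j], PAD)] = 1.0
        else:
            column[PAD] = 1.0
        features.extend(column)
    return features


def linear_decision(features, weights, bias=0.0):
    """Decision value w.x + b of a linear classifier (weights are assumed)."""
    return sum(w * x for w, x in zip(weights, features)) + bias
```

The sign or size of the decision value would then be turned into a D or O label for the central residue.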

DISOPRED2 Cutoff value!
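The role of the cutoff can be shown with a short sketch that turns per-residue scores into D/O labels and then keeps only disordered segments of a minimum length, which is one simple way to separate noise from longer segments. The default cutoff of 0.5, the `min_length` filter, and the function names are illustrative assumptions, not values taken from DISOPRED2.

```python
def call_disorder(scores, cutoff=0.5):
    """Label each residue D (disordered) or O (ordered) by a score cutoff."""
    return "".join("D" if s >= cutoff else "O" for s in scores)


def disordered_segments(labels, min_length=1):
    """Return (start, end) index pairs, inclusive, of runs of D labels."""
    segments = []
    start = None
    # A trailing "O" sentinel closes a run that reaches the C-terminus.
    for i, label in enumerate(labels + "O"):
        if label == "D" and start is None:
            start = i
        elif label != "D" and start is not None:
            if i - start >= min_length:
                segments.append((start, i - 1))
            start = None
    return segments
```

Raising the cutoff trades sensitivity for specificity, so reported accuracies depend on where it is set.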

PONDR VSL2. Short and long disorder differ in amino acid composition; methods trained on one type of dataset and tested on the other resulted in lower efficiencies. PONDR VSL2: separate predictors for short and long disorder, combined into length-independent predictions. Peng (2006) BMC Bioinformatics 7, 208

PONDR-FIT: meta-predictor. Sequence → disorder prediction methods (PONDR VLXT, PONDR VL3, PONDR VSL2, IUPred, FoldIndex, TopIDP) → ANN → prediction. Xue et al. Biochim Biophys Acta. 2010; 1804: 996

Complexity of protein disorder

Prediction of protein disorder. Disordered residues can be predicted from the amino acid sequence: ~ 80% at the residue level. Methods can be specific to certain types of disorder; accordingly, accuracies vary depending on datasets. Predictions are based on binary classification of disorder.
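Residue-level performance of a binary D/O prediction can be measured as in the sketch below. The choice of metrics (sensitivity, specificity, accuracy, balanced accuracy) and the function name are assumptions for illustration; the slides do not say which measure the ~ 80% figure refers to.

```python
def residue_level_metrics(predicted, observed):
    """Compare two D/O label strings position by position.

    "D" is treated as the positive class. Ratios with an empty
    denominator are returned as NaN.
    """
    if len(predicted) != len(observed):
        raise ValueError("label strings must have the same length")
    pairs = list(zip(predicted, observed))
    tp = sum(p == "D" and o == "D" for p, o in pairs)
    tn = sum(p == "O" and o == "O" for p, o in pairs)
    fp = sum(p == "D" and o == "O" for p, o in pairs)
    fn = sum(p == "O" and o == "D" for p, o in pairs)
    nan = float("nan")
    sensitivity = tp / (tp + fn) if tp + fn else nan
    specificity = tn / (tn + fp) if tn + fp else nan
    accuracy = (tp + tn) / len(pairs) if pairs else nan
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }
```

Because disordered residues are usually the minority class, balanced accuracy is often more informative than plain accuracy on such datasets.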