Intrinsically disordered proteins Zsuzsanna Dosztányi EMBO course Budapest, 3 June 2016.

Slides:



Advertisements
Similar presentations
Assignment of PROSITE motifs to topological regions: Application to a novel database of well characterised transmembrane proteins Tim Nugent.
Advertisements

Assignment of PROSITE motifs to topological regions: Application to a novel database of well characterised transmembrane proteins Tim Nugent 6 Month.
Secondary structure prediction from amino acid sequence.
Review: Amino Acid Side Chains Aliphatic- Ala, Val, Leu, Ile, Gly Polar- Ser, Thr, Cys, Met, [Tyr, Trp] Acidic (and conjugate amide)- Asp, Asn, Glu, Gln.
©CMBI 2005 Exploring Protein Sequences - Part 2 Part 1: Patterns and Motifs Profiles Hydropathy Plots Transmembrane helices Antigenic Prediction Signal.
© Wiley Publishing All Rights Reserved. Analyzing Protein Sequences.
Improved prediction of protein-protein binding sites using a support vector machine ( James Bradford, et al (2004)) Tapan Patel CISC841 Trypsin (and inhibitor.
Structural bioinformatics
Biology 224 Dr. Tom Peavy Sept 27 & 29 Protein Structure & Analysis.
CENTER FOR BIOLOGICAL SEQUENCE ANALYSISTECHNICAL UNIVERSITY OF DENMARK DTU Homology Modeling Anne Mølgaard, CBS, BioCentrum, DTU.
Protein-DNA interactions: amino acid conservation and the effects of mutations on binding specificity Nicholas M. Luscombe and Janet M. Thornton JMB (2002)
1 Levels of Protein Structure Primary to Quaternary Structure.
Sequence analysis June 18, 2008 Learning objectives-Understand the concept of sliding window programs. Understand difference between identity, similarity.
©CMBI 2008 Aligning Sequences The most powerful weapon in the bioinformaticist’s armory is sequence alignment. Why? Lets’ think about an alignment. It.
Introduction to bioinformatics
Protein Modules An Introduction to Bioinformatics.
Methods for Improving Protein Disorder Prediction Slobodan Vucetic1, Predrag Radivojac3, Zoran Obradovic3, Celeste J. Brown2, Keith Dunker2 1 School of.
C enter For C omputational B iology and B ioinformatics Protein Intrinsic Disorder, Cell Signaling and Alternative Splicing.
Detecting the Domain Structure of Proteins from Sequence Information Niranjan Nagarajan and Golan Yona Department of Computer Science Cornell University.
Protein Tertiary Structure Prediction Structural Bioinformatics.
BTN323: INTRODUCTION TO BIOLOGICAL DATABASES Day2: Specialized Databases Lecturer: Junaid Gamieldien, PhD
Protein Tertiary Structure Prediction
Protein Bioinformatics Course
CRB Journal Club February 13, 2006 Jenny Gu. Selected for a Reason Residues selected by evolution for a reason, but conservation is not distinguished.
Predicting protein disorder Peter Tompa Institute of Enzymology Hungarian Academy of Sciences Budapest, Hungary.
Makromolekulak_2010_12_07 Simon István. Prion protein.
Representations of Molecular Structure: Bonds Only.
Prediction of protein disorder Zsuzsanna Dosztányi MTA-ELTE Momentum Bioinformatics Group Department of Biochemistry Eotvos Lorand University, Budapest,
PART II. Prediction of functional regions within disordered proteins Zsuzsanna Dosztányi MTA-ELTE Momentum Bioinformatics Group Department of Biochemistry.
Sequence analysis: Macromolecular motif recognition Sylvia Nagl.
Prediction of protein disorder Zsuzsanna Dosztányi Institute of Enzymology, Budapest, Hungary
Multiple Alignment and Phylogenetic Trees Csc 487/687 Computing for Bioinformatics.
From Structure to Function. Given a protein structure can we predict the function of a protein when we do not have a known homolog in the database ?
Secondary structure prediction
Exploring Alternative Splicing Features using Support Vector Machines Feature for Alternative Splicing Alternative splicing is a mechanism for generating.
Bioinformatics Ayesha M. Khan 9 th April, What’s in a secondary database?  It should be noted that within multiple alignments can be found conserved.
©CMBI 2009 Alignment & Secondary Structure You have learned about: Data & databases Tools Amino Acids Protein Structure Today we will discuss: Aligning.
Protein-Protein Interaction Hotspots Carved into Sequences Yanay Ofran 1,2, Burkhard Rost 1,2,3 1.Department of Biochemistry and Molecular Biophysics,
Protein Secondary Structure Prediction G P S Raghava.
Meng-Han Yang September 9, 2009 A sequence-based hybrid predictor for identifying conformationally ambivalent regions in proteins.
Study of Protein Prediction Related Problems Ph.D. candidate Le-Yi WEI 1.
Identification of amino acid residues in protein-protein interaction interfaces using machine learning and a comparative analysis of the generalized sequence-
Comp. Genomics Recitation 9 11/3/06 Gene finding using HMMs & Conservation.
Introduction to Protein Structure Prediction BMI/CS 576 Colin Dewey Fall 2008.
Artificial Intelligence Research Laboratory Bioinformatics and Computational Biology Program Computational Intelligence, Learning, and Discovery Program.
Alignment & Secondary Structure You have learned about: Data & databases Tools Amino Acids Protein Structure Today we will discuss: Aligning sequences.
Russell Group, Protein Evolution _________ ____ Rob Russell Cell Networks University of Heidelberg Interactions and Modules: the how and why of molecular.
Feature Extraction Artificial Intelligence Research Laboratory Bioinformatics and Computational Biology Program Computational Intelligence, Learning, and.
Prediction of Protein Binding Sites in Protein Structures Using Hidden Markov Support Vector Machine.
Fehérjék 3. Simon István. p27 Kip1 IA 3 FnBP Tcf3 Bound IUP structures.
Motif Search and RNA Structure Prediction Lesson 9.
Combining Evolutionary Information Extracted From Frequency Profiles With Sequence-based Kernels For Protein Remote Homology Detection Name: ZhuFangzhi.
Machine Learning Methods of Protein Secondary Structure Prediction Presented by Chao Wang.
Protein Tertiary Structure Prediction Structural Bioinformatics.
Predicting Structural Features Chapter 12. Structural Features Phosphorylation sites Transmembrane helices Protein flexibility.
Structural Bioinformatics Elodie Laine Master BIM-BMC Semestre 3, Laboratoire de Biologie Computationnelle et Quantitative (LCQB) e-documents:
Bioinformatics Overview
Madhavi Ganapathiraju Graduate student Carnegie Mellon University
Evaluating classifiers for disease gene discovery
Introduction Feature Extraction Discussions Conclusions Results
Predicting Active Site Residue Annotations in the Pfam Database
Support Vector Machine (SVM)
Protein Bioinformatics Course
Haixu Tang School of Inforamtics
Aligning Sequences You have learned about: Data & databases Tools
Levels of Protein Structure
Protein structure prediction.
Protein Disorder Prediction
Systems-wide Identification of cis-Regulatory Elements in Proteins
Volume 26, Issue 1, Pages e2 (January 2018)
Presentation transcript:

Intrinsically disordered proteins Zsuzsanna Dosztányi EMBO course Budapest, 3 June 2016

IDPs Intrinsically disordered proteins/regions (IDPs/IDRs) Do not adopt a well-defined structure in isolation under native-like conditions Highly flexible ensembles Functional proteins Involved in various diseases

Ordered structures from the PDB Over PDB structures Not everything in the PDB is ordered Cofactors, complex, DNA-RNA, crystal contacts

Failed attempts to crystallize Lack of NMR signals Heat stability Protease sensitivity Increased molecular volume “Freaky” sequences … In the literature Disprot database: Disprot database: Where can we find disordered proteins?

NMR structures with large structural variations In the PDB Missing electron density regions from the PDB tegniddsliggnasaegpegegtestv

p53 tumor suppressor Wells et al. PNAS 2008; 105: 5762 TAD DBD TD RD disordered ordered disordered

Funnels Flock et al Curr Opin Struct Biol. 2014; 26:62

Sequence properties of IDPs Amino acid compositional bias High proportion of polar and charged amino acids (Gln, Ser, Pro, Glu, Lys) Low proportion of bulky, hydrophobhic amino acids (Val, Leu, Ile, Met, Phe, Trp, Tyr) Low sequence complexity Signature sequences identifying disordered proteins

Protein disorder is encoded in the amino acid sequence TDVEAAVNSLVNLYLQAS YLS How can we discriminate ordered and disordered regions ?

Prediction: classification problem InputMethod 1.sequence 2.propensity vector 3.alignment (profile) 4.interaction energies 1. statistical methods 2. machine learning 3. structural approach Output (property) 1. DisProt 2. PDB 1. binary 2. score Training/Assessment

DISOPRED2 …..AMDDLMLSPDDIEQWFTED….. Assign label: D or O D O F(inp) SVM with linear kernel Trained in missing residues from X-ray structures Ward et al. J.Mol. Biol. 2004; 337:653

IUPred Globular proteins form many favorable interactions to ensure the stability of the structure Disordered protein cannot form enough favourable interactions Energy estimation method Based on globular proteins No training on disordered proteins Energy estimation method Based on globular proteins No training on disordered proteins Dosztanyi (2005) JMB 347, 827

Globular proteins form regular secondary structures, and different amino acids have different tendencies to be in them Linding (2003) NAR 31, 3701 GlobPlot Compare the tendency of amino acids: to be in coil (irregular) structure. to be in regular secondary structure elements

Typical output

GlobPlot downhill regions correspond to putative domains (GlobDom) up-hill regions correspond to predicted protein disorder

Different flavors of disorder Short and long disordered regions have different compositional biases

PONDR VSL2 Differences in short and long disorder  amino acid composition  Short disorder is often at the termini  methods trained on one type of dataset tested on other dataset resulted in lower efficiencies - Short version – Long version - PONDR VSL2: separate predictors for short and long disorder combined length independent predictions Peng (2006) BMC Bioinformatics 7, 208

Evaluation ROC curve

Prediction of protein disorder Disordered is encoded in the amino acid sequence Can be predicted from the sequence ~80% accuracy Large-scale studies Evolution Function Binary classification

Genome level annotations Bridging over the large number of sequences and the small number of experimentally verified cases Combining experiments and predictions  MobiDB:  D2P2:  IDEAL: Multiple predictors How to resolve contradicting experiments/ predictions?  Majority rules

Functions of IDPs I Entropic chains ‏ II Linkers III Molecular recognition IV Protein modifications (e.g. phosphorylation) V Assembly of large multiprotein complexes

Protein interactions of IDPs Complex between p53 and MDM2

Coupled folding and binding Entropic penalty Functional advantages Weak transient, yet specific interactions Post-translational modifications Flexible binding regions that can overlap Evolutionary plasticity SignalingRegulationSignalingRegulation

Binding regions within IDPs Experimental characterization is very difficult Computational methods Complexes of IDPs in the PDB:~ 200 Known instances:~ Estimated number of such interactions in the human proteome:~

Binding regions within IDPs SLIMs: Short linear motifs 3-11 residues long, average size 6-7 residues although enriched in IDRs, around 20% are in located within IDRs Disordered binding regions, Morfs undergo disorder to order transition upon binding usually less then 30 residues, can be up to 70 Intrinsically disordered domains evolutionary conserved disordered segments

p27 Inhibitor of CDK2-CyclinA complex.

Bioinformatical approaches (~10, as opposed to the more than 50 disorder prediction methods)  Biophysical properties (ANCHOR)  Machine Learning methods (MorfPred, Morf chibi, DISOPRED3)  Linear motifs (Regular Expression, PSSMs)  Conservations patterns (SlimPrints, PhyloHMM )

Prediction of binding sites located within IDPs Heterogeneity  adopted secondary structure elements  size of the binding regions  flexibility in the bound form  Interaction sites are usually linear (consist of only 1 part)  enrichment of interaction prone amino acids  can be predicted from sequence without predicting the structure

Prediction of disordered binding regions – ANCHOR What discriminates disordered binding regions? A cannot form enough favorable interactions with their sequential environment It is favorable for them to interact with a globular protein Based on simplified physical model Based on an energy estimation method using statistical potentials Captures sequential context

ANCHOR C-terminal region of human p53 Dosztanyi et al. Bioinformatics. 2009;25:2745

DISOPRED3  Uses three SVMs Simple sequence profile PSI-Blast profiles (very slow) PSI-Blast profiles with global features  trained on short chains in complex Jones DT, Cozzetto D. Bioinformatics. 2015; 31:857

DISOPRED3

Prediction of binding regions within IDPs Combined predictions provide more biologically meaningful predictions Lot of rooms for improvements... What is the binding partner?