Download presentation
Presentation is loading. Please wait.
Published byChristian Cross Modified over 9 years ago
1
Prediction of protein disorder Zsuzsanna Dosztányi Institute of Enzymology, Budapest, Hungary zsuzsa@enzim.hu zsuzsa@enzim.hu
2
Protein Structure/Function Paradigm Dominant view: 3D structure is a prerequisite for protein function Amino acid sequence Structure Function
3
But…. Heat stability Protease sensitivity Failed attempts to crystallize Lack of NMR signals “Weird” sequences …
5
IDPs Intrinsically disordered proteins/regions (IDPs/IDRs) Do not adopt a well-defined structure in isolation under native-like conditions Highly flexible ensembles, little secondary structure, no folded structure Functional proteins
6
Protein disorder is prevalent 20 40 60 0 LDR (40<) protein, % kingdom B A E
7
Protein disorder is important Prion proteinPrion disease CFTRCystic fibrosis Alzheimer’s -synucleinParkinson’s p53, BRCA1cancer
8
Protein disorder is functional regulatory signaling biosynthetic metabolic protein (%) 30< 40< 50< 60 < length of disordered region Iakoucheva et al. (2002) J. Mol. Biol. 323, 573
9
p53 tumor suppressor TAD DBDTDRD transactivation DNA-binding tetramerization regulation Wells et al. PNAS 2008; 105: 5762
10
Heterogeneity in protein disorder Flexible loop RC-like Compact Transient structures
11
Modularity in proteins Many proteins contains multiple domains Composed of ordered and disordered segments Average length of a PDB chain is < 300 Average length of a human proteins ~ 500 Average length of cancer-related proteins > 900 Structural properties of full length proteins …
12
Bioinformatics of protein disorder Part 1 Databases Prediction of protein disorder Part 2 Prediction of functional regions within IDPs
13
Datasets Ordered proteins in the PDB over 94000 structures few 1000 folds Some structures in the PDB classify as disordered! only adopt a well-defined structure in complex in crystals, with cofactors, proteins, … Disorder in the PDB Missing electron density regions from the PDB NMR structures with large structural variations Less than 10% of all positions Usually short (<10 residues), often at the termini
14
Disprot www.disprot.org Current release: 6.02 Release date: 05/24/2013 Number of proteins: 694 Number of disordered regions: 1539 Experimentally verified disordered proteins collected from literature (X-ray, NMR, CD, proteolysis, SAXS, heat stability, gel filtration, …)
15
Additional databases Combining experiments and predictions Genome level annotations MobiDB: http://mobidb.bio.unipd.it D2P2: http://d2p2.pro IDEAL: http://www.ideal.force.cs.is.nagoya-u.ac.jp/IDEAL
16
Amino acid compositions He et al. Cell Res. 2009; 19: 929
17
Sequence properties of disordered proteins Amino acid compositional bias High proportion of polar and charged amino acids (Gln, Ser, Pro, Glu, Lys) Low proportion of bulky, hydrophobhic amino acids (Val, Leu, Ile, Met, Phe, Trp, Tyr) Low sequence complexity Signature sequences identifying disordered proteins Protein disorder is encoded in the amino acid sequence
18
Uversky plot: charge-hydrophobicity (two parameters) Mean hydrophobicity Mean net charge Uversky (2002) Eur. J. Biochem. 269, 2
19
p53 Prilusky (2005) Bioinformatics 21, 3435 Making it position specific: FoldIndex http://bip.weizmann.ac.il/fldbin/findex
20
Disorder Prediction Methods Amino acid propensity scales GlobPlot Compare the tendency of amino acids: to be in coil (irregular) structure. to be in regular secondary structure elements Linding (2003) NAR 31, 3701
21
GlobPlot
22
From position specific predictions Where are the ordered domains? Longer disordered segments? Noise vs. real data
23
GlobPlot: http://globplot.embl.de/ downhill regions correspond to putative domains (GlobDom) up-hill regions correspond to predicted protein disorder
24
Disorder Prediction Methods Physical principles IUPred If a residue cannot form enough favorable interactions within its sequential environment, it will not adopt a well defined structure it will be disordered Dosztanyi (2005) JMB 347, 827
25
Energy description of proteins For example: L-I interaction is frequent (hydrophobic effect) L-I interaction energy is low (favorable) K-R interaction is rare (electrostatic repulsion) K-R interaction energy is high (unfavorable) Estimation of interaction energies based on statistical potentials: Calculated from the frequency of amino acid interactions in globular proteins alone, based on the Boltzmann hypothesis.
26
Predicting protein disorder - IUPred The algorithm: …PSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGPDEAPRMPEAAPRVAPAPAAPTPAA... Based only on the composition of environment of D’s we try to predict if it is in a disordered region or not: Amino acid composition of environ- ment: A – 10% C – 0% D – 12 % E – 10 % F – 2 % etc… Estimate the interaction energy between the residue and its environment Decide the probability of the residue being disordered based on this
27
IUPred: http://iupred.enzim.hu/
28
Disorder Prediction Methods Machine learning DISOPRED2 Binary classification problem Ward (2004) JMB 337, 635
29
DISOPRED2 …..AMDDLMLSPDDIEQWFTED….. Assign label: D or O D O F(inp) SVM with linear kernel
30
DISOPRED2 Cutoff value!
31
PONDR VSL2 Differences in short and long disorder amino acid composition methods trained on one type of dataset tested on other dataset resulted in lower efficiencies PONDR VSL2: separate predictors for short and long disorder combined length independent predictions Peng (2006) BMC Bioinformatics 7, 208
32
PONDR-FIT Sequence Disorder prediction methods PONDR VLXT PONDR VL3 PONDR VSL2 IUPred FoldIndex TopIDP Prediction ANN Meta-predictor Xue et al. Biochem Biophys Acta. 2010; 180: 996
33
Complexity of protein disorder
34
Prediction of protein disorder Disordered residues can be predicted from the amino acid sequence ~ 80% at the residue level Methods can be specific to certain type of disorder accordingly, accuracies vary depending on datasets Predictions are based on binary classification of disorder
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.