Download presentation
Presentation is loading. Please wait.
Published bySibyl Jordan Modified over 9 years ago
1
C enter For C omputational B iology and B ioinformatics Bioinformatics and Intrinsically Disordered Proteins (IDPs) A. Keith Dunker Biochemistry and Molecular Biology & Center for Computational Biology / Bioinformatics Indiana University School of Medicine Presented at: October 22, 2010
2
Outline What are “Intrinsically Disordered Proteins” ? Bioinformatics Applications to IDPs –Why don’t IDPs form structure? –Predicting IDPs from amino acid sequence –Some important results from IDP prediction –An improved order / disorder amino acid scale –Predicting phosphorylation sites –Disorder and function: two examples Importance of bioinformatics to IDP research
3
Definitions: Intrinsically Disordered Proteins (IDPs) and ID Regions (IDRs) Whole proteins and regions of proteins are intrinsically disordered if they lack stable 3D structure under physiological conditions, But exist instead as highly dynamic, rapidly interconverting ensembles without particular equilibrium values for their coordinates or bond angles and with non- cooperative conformational changes.
4
Outline What are “Intrinsically Disordered Proteins” ? Bioinformatics Applications to IDPs –Why don’t IDPs form structure? –Predicting IDPs from amino acid sequence –Some important results from IDP prediction –An improved order / disorder amino acid scale –Predicting phosphorylation sites –Disorder and function: two examples Importance of bioinformatics to IDP research
5
Why are IDPs / IDRs unstructured? From the 1950s to now, >> 1,000 IDPs / IDRs studied and characterized Visit: http://www.disprot.org Why do IDPs & IDRs lack structure? –Lack a ligand or partner? –Denatured during isolation? –Folding requires conditions found inside cells? –Lack of folding encoded by amino acid sequence?
6
Amino Acid Compositions Surface Buried
7
Why are IDPs / IDRs unstructured? To a first approximation, amino acid composition determines whether a protein folds or remains intrinsically disordered. Given a composition that favors folding, the sequence details determine which fold. Given a composition that favors not folding, the sequence details provide motifs for biological function.
8
Outline What are “Intrinsically Disordered Proteins” ? Bioinformatics Applications to IDPs –Why don’t IDPs form structure? –Predicting IDPs from amino acid sequence –Some important results from IDP prediction –An improved order / disorder amino acid scale –Predicting phosphorylation sites –Disorder and function: two examples Importance of bioinformatics to IDP research
9
Prediction of Intrinsic Disorder Predictor Validation on Out-of-Sample Data Prediction Attribute Selection or Extraction Separate Training and Testing Sets Predictor Training Ordered / Disordered Sequence Data Aromaticity, Hydropathy, Charge, Complexity Neural Networks, SVMs, etc.
10
First Machine-learning Predictor SDR/MDR/LDR Predictors 1.Short Disordered Regions (SDR): 7 – 21 missing AA Medium Disordered Regions (MDR): 22 – 44 Long Disordered Regions (LDR): 45 or more 2.SDR / MDR / LDR predictors: Neural networks 3.Training dataset: proteins with missing AA SDR: 34 proteins, 11,050 AA, 38 IDR, 411 IDAA MDR: 20 proteins, 4,764 AA, 22 IDR, 464 IDAA LDR: 7 proteins, 2,069 AA, 7 IDR, 465 IDAA 4.Feature selection: standard sequential forward selection 5.Accuracy: 59 – 67% estimated by 5-cross validation 6.Better than chance; Better on self than on not self Romero P, et.al. Proc. IEEE International Conference on Neural Networks. 1:90-95 (1997)
11
Next: PONDR®VL-XT XN (1) XC (1) VL1 (2) VL-XT (2) 11 14 N-11 N-14 XN, VL1, and : neural networks (1) Li X et al., Genome Informat. 9:201-213 (1999) (2) Romero P et al., Proteins 42:38-48 (2001) Input features: XN: 8 VL1: 10 XC: 8
12
Inputs for PONDR ® VL-XT XNCoordination No. VVIYFWMNHDPEVK-- VL1Coordination No. Net chargeWFYWYFDEKR XCCoordination No. HydropathyVIYFWMTH-PEVK-R Accuracy (ACC) = (% Corr-O)/2 + (%Corr-D)/2 ACC ( estimated by cross-validation ) ~ 72 ± 4% Li X. et.al. Genome Informat. 9:201-213(1999) Romero P. et.al. Proteins 42:38-48(2001)
13
Disorder Prediction in CASP Critical Assessment of Structure Prediction http://predictioncenter.org CASP1(1994) to CASP9 (2010) Experimentalists provide amino acid sequences as they are determining the structures of proteins Groups register and make structure predictons After structures determined, predictions evaluated Disorder predictions introduced in CASP5 (2002) CASP PREDICTIONS ARE TRULY BLIND!!!
14
Disorder Prediction in CASP CASP5 (2002), sensitivity replaced AUC VSL2 PreDisorder
15
Our Performance in CASP Used VL-XT, poor on short disordered regions in CASP5, but very well on long disordered regions. VL trained mainly on long disordered regions. Changed predictor in CASP6 and CASP7, new predictor ranked #1. Big improvement !! Did not participate in CASP 8, but would not have ranked #1 with current predictors. What was change that led to large improvement in CASP6??
16
Predictors of Natural Disordered Regions PONDR ® VL-XT and PONDR ® VSL2 (1) Li X et al., Genome Informat. 9:201-213 (1999) (2) Romero P et al., Proteins 42:38-48 (2001) (3) Peng K et al., BMC Bioinfo. 7:208 (2006) N (1) C (1) VL1 (2) VL-XT (2) 11 14 N-11 N-14 VL2 (3) VS2 (3) VSL2 (3) OMOM 1-O M OLOL OSOS VSL2 Score = O L ×O M + O S ×(1-O M ) M1 (3) N, VL1, and C are neural networks N-term: 8 inputs VL1: 10 inputs C-term: 8 inputs M1, VSL2-L, and VSL2-S are support vector machines M1: 54 inputs VL2: 20 inputs VS2: 20 inputs
17
Comparison on CASP 8 Dataset Zhang P, et.al. (unpublished results; not quite same as CASP evaluation) ACC = 80% ACC = (%Corr-O)/2 + (%Corr-D)/2 AUC = Area Under Curve AUC = 0.89
18
(+) Disordered XPA (–) Structured PONDR ® VL-XT, PONDR ® VSL2B and PreDisorder Iakoucheva L et al., Protein Sci 3: 561-571 (2001) Dunker AK et al., FEBS J 272: 5129-5148 (2005) Deng X., et al., BMC Bioinformatics 10:436 (2009)
19
Published Predictors of Disordered Proteins He B, et al., Cell Res 19: 929-949 (2009) Year PONDRs: - VSL1: Ranked #1 in CASP 6 (2004); - VSL2: Ranked #1 in CASP 7 (2006); PONDRS # +,- / # phobics CASP 5 6 7 8
20
Outline What are “Intrinsically Disordered Proteins” (IDPs) Bioinformatics Applications to IDPs –Why don’t IDPs form structure? –Predicting IDPs from amino acid sequence –Some important results from IDP prediction –An improved order / disorder amino acid scale –Predicting phosphorylation sites –Disorder and function: two examples Importance of bioinformatics to IDP research
21
How Abundant are IDRs/IDPs? To Estimate Abundance of IDPs/IDRs: predict on whole proteomes from many organisms. ALERT!! Lack of membrane-protein-specific disorder predictors means that Estimates of disorder will be too low by a small percentage.
22
Organisms# Orgs. # Proteins Avg. # Proteins % Disordered AA %Proteins IDR >30 %Proteins Natively Unfolded Archaea73536 – 4234 219912.5 – 37.2% 0 – 60.0% 3.2 – 31.5% Bacteria951182 – 9320 333112.0 – 36.1% 11.5 – 53.7% 2.7 – 29.2% Single-cell Eukarya 581909 – 16365 909822.3 – 49.9% 17.0 – 76.8% 16.8 – 47.6% Multi-cell Eukarya 511775 – 35942 1129510.4 – 49.0% 4.4 – 66.5% 6.9 – 48.7% VSL2 Prediction of Abundance** of Intrinsically Disordered Proteins **Are organism-specific predictors sometimes needed?
23
Archaea Phylogenetic Tree >30% >21% >14% Todd Lowe (http://archaea.ucsc.edu/) >17% <14%
24
Predicted Disorder vs. Proteome Size
25
Why So Much Disorder? Hypothesis: Disorder Used for Signaling Sequence Structure Function – Catalysis, – Membrane transport, – Binding small molecules. Sequence Disordered Ensemble Function – Signaling, – Regulation, Dunker AK, et al., Biochemistry 41: 6573-6582 (2002) – Recognition, Dunker AK, et al., Adv. Prot. Chem. 62: 25-49 (2002 ) – Control. Xie H, et al., Proteome Res. 6: 1882-1932 (2007)
26
Outline What are “Intrinsically Disordered Proteins” (IDPs) Bioinformatics Applications to IDPs –Why don’t IDPs form structure? –Predicting IDPs from amino acid sequence –Some important results from IDP prediction –An improved order / disorder amino acid scale –Predicting phosphorylation sites Importance of bioinformatics to IDP research
27
A New Order / Disorder AA Scale, Part 1 Collect equal numbers of O and D windows of length 21. Calculate the value of attribute, x, for each window. For each interval of x, count how many windows are O and D; from this, determine P (O I x) and P (D I x) Plot P (O I x) and P (D I x) versus x. Determine the areas between the two curves. Area Ratio Value = (area between curves / total area) Apply to 517 aa scales: http://www.genome.jp/aaindex.http://www.genome.jp/aaindex Rank scales from smallest to largest Campen A, et al Protein Pept Lett 15: 956-963 (2008)
28
A New Order / Disorder AA Scale, Part 2 Overall idea: make random changes to a scale, test for higher ARV, repeat until no larger value is found. Genetic Algorithm Pseudocode: –Choose initial population –Repeat –Evaluate the fitness of each individual –Select a certain portion of best-ranking individuals –Breed new population through crossover + mutation –Until terminating condition ARV value improved from 0.69 for best of 517 scales to 0.76 for new scale, called TOP-ID Campen A, et al Protein Pept Lett 15: 956-963 (2008)
29
P (D l x) and P (O I x) Versus x Plots: Area Between Curves Used to Rank Attributes, X Flexibility Positive Charge Extracellular Protein AA Composition TOP-IDP Campen A et al., Protein & Peptide Lett 15: 956-963 (2008) ARV = 0.69, Rank = #1/517 ARV = 0.07, Rank #517/517 ARV = 0.36, Rank = #238/517 ARV = 0.76
30
Analysis of the disorder propensity in p53 by Top-IDP (A), PONDR® VLXT (B) and PONDR® VSL1 (C).
31
Chronology of Amino Acid Evolution DISORDER TO ORDER, NON-LIFE TO LIFE Di Mauro E, et al., in Genesis: Origin of Life on Earth and Other Planets (In press)
32
Outline What are “Intrinsically Disordered Proteins” (IDPs) Bioinformatics Applications to IDPs –Why don’t IDPs form structure? –Predicting IDPs from amino acid sequence –Some important results from IDP prediction –An improved order / disorder amino acid scale –Predicting phosphorylation sites –Disorder and function: two examples Importance of bioinformatics to IDP research
33
New Phosphorylation Predictor KNN – similarity to known sites (+ / -) of phosphorylation Disorder Scores – used VSL2 AA frequencies – at sequence positions before and after phophorylation sites Gao J et al Mol and Cell Proteomics (In press)
34
Disorder Score vs. Phosphorylation Gao J et al., Mol & Cell Proteomics 9 (Epub) (2010) Residue Positions 91.3% > 0.5 87.6% > 0.5 54.9% > 0.5 50.5% > 0.5
35
Outline What are “Intrinsically Disordered Proteins” (IDPs) Bioinformatics Applications to IDPs –Why don’t IDPs form structure? –Predicting IDPs from amino acid sequence –Some important results from IDP prediction –An improved order / disorder amino acid scale –Predicting phosphorylation sites –Disorder and function: two examples Importance of bioinformatics to IDP research
36
Signaling Example 1: Calcineurin and Calmodulin A-Subunit B-Subunit Autoinhibitory Peptide Active Site Kissinger C et al., Nature 378:641-644 (1995) Meador W et al., Science 257: 1251-1255 (1992)
37
Example 2: p27kip1: A Disordered Domain Cyclin A CDK p27kip1 3D Structure: Russo AA et al., Nature 382: 325-331 (1996) DD: Tompa P et al., Bioessays 4: 328-340 (2008) 25 93 (69 residues)
38
The p27kip1 Disordered Domain: Used for Signal Integration Y 88 T 187 pY 88 T 187 ATP pY 88 pT 187 Ub’n pY 88 pT 18 7 ♦ ♦♦ ♦ ? ? ? 1 2 3 4 1. NRTK phosphorylation @ Y88, signal #1. 2. Intra-molecular phosphorylation @ T187, #2. 3. Ubiquitination @ several possible loci, #3. 4. Proteasome digestion of p27, then cell cycle progression. Galea CA et al., J Mol Biol 376: 827-838 (2008) Dunker AK & Uversky VN, Nat Chem Biol 4: 229-230 (2008)
39
Outline What are “Intrinsically Disordered Proteins” (IDPs) Bioinformatics Applications to IDPs –Predicting IDPs from amino acid sequence –Some important results from IDP prediction –An improved order / disorder amino acid scale –Predicting phosphorylation sites –Disorder and function: two examples Importance of bioinformatics to IDP research
40
Importance of Bioinformatics to IDP and Protein Research Thousands of IDPs and IDRs have been found. Not one IDP or IDR is discussed in any current biochemistry textbook! Why? - IDPs and IDRs don’t fit Sequence Structure Function New paradigm developed from bioinformatics Sequence Disordered Ensemble Function IDP prediction is changing fundamental views of structure-function relationships!
41
Thank You ! ! !
42
Collaborators Indiana University Bin Xue Jake Chen Bill Sullivan Predrag Radivojac Jennifer Chen Pedro Romero Marc Cortese Derrick Johnson Chris Oldfield Amrita Mohan Yunlong Liu Ann Roman Tom Hurley Anna DePaoli-Roach Yuro Tagaki Siama Zaidi Jingwei Meng Wei-Lun Hsu Hua Lu Fei Huang Vladimir Uversky UCSD Lilia Iakoucheva Sebat Temple University Zoran Obradovic Slobodan Vucetic Vladimir Vacic Kang Peng Hiongbo Xie Siyuan Ren Uros Midic Enzyme Institute Peter Tompa Zsuzsanna Dosztanyi Istvan Simon Monika Fuxreiter USU Robert Williams Harbin Engineering University Bo He Kejun Wang University of Idaho Celeste J. Brown Chris Williams Molecular Kinetics Yugong Cheng Tanguy LeGall Aaron Santner Plant and Food Research Xaiolin Sun USF Gary Daughdrill Wright State University Oleg Paliy
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.