C enter For C omputational B iology and B ioinformatics Bioinformatics and Intrinsically Disordered Proteins (IDPs) A. Keith Dunker Biochemistry and Molecular.

Slides:



Advertisements
Similar presentations
Functional Site Prediction Selects Correct Protein Models Vijayalakshmi Chelliah Division of Mathematical Biology National Institute.
Advertisements

Using phylogenetic profiles to predict protein function and localization As discussed by Catherine Grasso.
NCBI data, sliding window programs and dot plots Sept. 25, 2012 Learning objectives-Become familiar with OMIM and PubMed. Understand the difference between.
Protein Threading Zhanggroup Overview Background protein structure protein folding and designability Protein threading Current limitations.
1 PharmID: A New Algorithm for Pharmacophore Identification Stan Young Jun Feng and Ashish Sanil NISSMPDM 3 June 2005.
Presented at: Pacific Symposium on Biocomputing January 3, 2012.
CENTER FOR BIOLOGICAL SEQUENCE ANALYSISTECHNICAL UNIVERSITY OF DENMARK DTU Homology Modeling Anne Mølgaard, CBS, BioCentrum, DTU.
Chapter 9 Structure Prediction. Motivation Given a protein, can you predict molecular structure Want to avoid repeated x-ray crystallography, but want.
Thomas Blicher Center for Biological Sequence Analysis
Methods for Improving Protein Disorder Prediction Slobodan Vucetic1, Predrag Radivojac3, Zoran Obradovic3, Celeste J. Brown2, Keith Dunker2 1 School of.
C enter For C omputational B iology and B ioinformatics Protein Intrinsic Disorder, Cell Signaling and Alternative Splicing.
CENTER FOR BIOLOGICAL SEQUENCE ANALYSISTECHNICAL UNIVERSITY OF DENMARK DTU Homology Modelling Thomas Blicher Center for Biological Sequence Analysis.
Project list 1.Peptide MHC binding predictions using position specific scoring matrices including pseudo counts and sequences weighting clustering (Hobohm)
Detecting the Domain Structure of Proteins from Sequence Information Niranjan Nagarajan and Golan Yona Department of Computer Science Cornell University.
A Neural Network Predictor for Peptide Fragmentation in Mass Spectrometry Arunima Ram Advisor : Dr. Predrag Radivojac Co-Advisor : Dr. Haixu Tang Co-Advisor.
Protein Tertiary Structure Prediction
Structural Bioinformatics R. Sowdhamini National Centre for Biological Sciences Tata Institute of Fundamental Research Bangalore, INDIA.
CRB Journal Club February 13, 2006 Jenny Gu. Selected for a Reason Residues selected by evolution for a reason, but conservation is not distinguished.
Protein Secondary Structure Prediction with inclusion of Hydrophobicity information Tzu-Cheng Chuang, Okan K. Ersoy and Saul B. Gelfand School of Electrical.
Friday 17 rd December 2004Stuart Young Capstone Project Presentation Predicting Deleterious Mutations Young SP, Radivojac P, Mooney SD.
Predicting protein disorder Peter Tompa Institute of Enzymology Hungarian Academy of Sciences Budapest, Hungary.
Makromolekulak_2010_12_07 Simon István. Prion protein.
Introduction Prev i ous Work Method Backup Acknowledgments Reference Background Investigating mRNA’s of intrinsically disordered proteins Harini Gopalakrishnan.
Prediction of protein disorder Zsuzsanna Dosztányi MTA-ELTE Momentum Bioinformatics Group Department of Biochemistry Eotvos Lorand University, Budapest,
PART II. Prediction of functional regions within disordered proteins Zsuzsanna Dosztányi MTA-ELTE Momentum Bioinformatics Group Department of Biochemistry.
Prediction of protein disorder Zsuzsanna Dosztányi Institute of Enzymology, Budapest, Hungary
Protein Secondary Structure Prediction Based on Position-specific Scoring Matrices Yan Liu Sep 29, 2003.
1/27 Discrete and Genetic Algorithms in Bioinformatics 許聞廉 中央研究院資訊所.
Intrinsically disordered proteins: drug development and the most interesting examples Peter Tompa Institute of Enzymology Hungarian Academy of Sciences.
What is a Project Purpose –Use a method introduced in the course to describe some biological problem How –Construct a data set describing the problem –Define.
Biological Signal Detection for Protein Function Prediction Investigators: Yang Dai Prime Grant Support: NSF Problem Statement and Motivation Technical.
BLAST: Basic Local Alignment Search Tool Altschul et al. J. Mol Bio CS 466 Saurabh Sinha.
Associating Biomedical Terms: Case Study for Acetylation Aaron Buechlein Indiana University School of Informatics Advisor: Dr. Predrag Radivojac.
Protein Disordered Regions and the Evolution of Eukaryotes Allan Wu Phar 201 Phil Bourne.
1 Web Site: Dr. G P S Raghava, Head Bioinformatics Centre Institute of Microbial Technology, Chandigarh, India Prediction.
PREDICTION OF CATALYTIC RESIDUES IN PROTEINS USING MACHINE-LEARNING TECHNIQUES Natalia V. Petrova (Ph.D. Student, Georgetown University, Biochemistry Department),
Intrinsic Disorder and Functional Proteomics Predrag Radivojac, Lilia M. Iakoucheva, Christopher J. Oldfield, Zoran Obradovic, Vladimir N. Uversky, A.
1 Improve Protein Disorder Prediction Using Homology Instructor: Dr. Slobodan Vucetic Student: Kang Peng.
Russell Group, Protein Evolution _________ ____ Rob Russell Cell Networks University of Heidelberg Interactions and Modules: the how and why of molecular.
Feature Extraction Artificial Intelligence Research Laboratory Bioinformatics and Computational Biology Program Computational Intelligence, Learning, and.
Fehérjék 3. Simon István. p27 Kip1 IA 3 FnBP Tcf3 Bound IUP structures.
Protein Folding recognition with Committee Machine Mika Takata.
Structural classification of Proteins SCOP Classification: consists of a database Family Evolutionarily related with a significant sequence identity Superfamily.
Final Report (30% final score) Bin Liu, PhD, Associate Professor.
Stephen Altschul National Center for Biotechnology Information
Ubiquitination Sites Prediction Dah Mee Ko Advisor: Dr.Predrag Radivojac School of Informatics Indiana University May 22, 2009.
PROTEIN INTERACTION NETWORK – INFERENCE TOOL DIVYA RAO CANDIDATE FOR MASTER OF SCIENCE IN BIOINFORMATICS ADVISOR: Dr. FILIPPO MENCZER CAPSTONE PROJECT.
Intrinsically disordered proteins Zsuzsanna Dosztányi EMBO course Budapest, 3 June 2016.
Structural Bioinformatics Elodie Laine Master BIM-BMC Semestre 3, Laboratoire de Biologie Computationnelle et Quantitative (LCQB) e-documents:
University of California at San Diego
The V3 Region Expresses More Diversity in Amino Acid Sequence in AIDS Diagnosed Patients than in Non-Trending and AIDS Progressing Patients HIV Structure.
Prediction of RNA Binding Protein Using Machine Learning Technique
Extra Tree Classifier-WS3 Bagging Classifier-WS3
חיזוי ואפיון אתרי קישור של חלבון לדנ"א מתוך הרצף
Predicting Active Site Residue Annotations in the Pfam Database
University of California at San Diego
Support Vector Machine (SVM)
1 Department of Engineering, 2 Department of Mathematics,
Genomes and Their Evolution
binding sites 58 of the 473 unambiguously assigned phosphorylation sites are predicted by Scansite to be sites for binding. 50 of these correspond.
Distribution of disorder in the cytosolic phosphoproteome
1 Department of Engineering, 2 Department of Mathematics,
Phosphorylation and sequence disorder in microtubule-associated protein Tau.A, schematic illustration of the domain profile of Tau with all known phosphorylation.
1 Department of Engineering, 2 Department of Mathematics,
Directed Mutagenesis and Protein Engineering
Protein Disorder Prediction
Combining Predictors for Short and Long Protein Disorder
Department of Chemical and Systems Biology
Volume 26, Issue 1, Pages e2 (January 2018)
Chloe Jones, Isabel Gonzaga, and Nicole Anguiano
Presentation transcript:

C enter For C omputational B iology and B ioinformatics Bioinformatics and Intrinsically Disordered Proteins (IDPs) A. Keith Dunker Biochemistry and Molecular Biology & Center for Computational Biology / Bioinformatics Indiana University School of Medicine Presented at: October 22, 2010

Outline What are “Intrinsically Disordered Proteins” ? Bioinformatics Applications to IDPs –Why don’t IDPs form structure? –Predicting IDPs from amino acid sequence –Some important results from IDP prediction –An improved order / disorder amino acid scale –Predicting phosphorylation sites –Disorder and function: two examples Importance of bioinformatics to IDP research

Definitions: Intrinsically Disordered Proteins (IDPs) and ID Regions (IDRs) Whole proteins and regions of proteins are intrinsically disordered if they lack stable 3D structure under physiological conditions, But exist instead as highly dynamic, rapidly interconverting ensembles without particular equilibrium values for their coordinates or bond angles and with non- cooperative conformational changes.

Outline What are “Intrinsically Disordered Proteins” ? Bioinformatics Applications to IDPs –Why don’t IDPs form structure? –Predicting IDPs from amino acid sequence –Some important results from IDP prediction –An improved order / disorder amino acid scale –Predicting phosphorylation sites –Disorder and function: two examples Importance of bioinformatics to IDP research

Why are IDPs / IDRs unstructured? From the 1950s to now, >> 1,000 IDPs / IDRs studied and characterized Visit: Why do IDPs & IDRs lack structure? –Lack a ligand or partner? –Denatured during isolation? –Folding requires conditions found inside cells? –Lack of folding encoded by amino acid sequence?

Amino Acid Compositions Surface Buried

Why are IDPs / IDRs unstructured? To a first approximation, amino acid composition determines whether a protein folds or remains intrinsically disordered. Given a composition that favors folding, the sequence details determine which fold. Given a composition that favors not folding, the sequence details provide motifs for biological function.

Outline What are “Intrinsically Disordered Proteins” ? Bioinformatics Applications to IDPs –Why don’t IDPs form structure? –Predicting IDPs from amino acid sequence –Some important results from IDP prediction –An improved order / disorder amino acid scale –Predicting phosphorylation sites –Disorder and function: two examples Importance of bioinformatics to IDP research

Prediction of Intrinsic Disorder Predictor Validation on Out-of-Sample Data Prediction Attribute Selection or Extraction Separate Training and Testing Sets Predictor Training Ordered / Disordered Sequence Data Aromaticity, Hydropathy, Charge, Complexity Neural Networks, SVMs, etc.

First Machine-learning Predictor SDR/MDR/LDR Predictors 1.Short Disordered Regions (SDR): 7 – 21 missing AA Medium Disordered Regions (MDR): 22 – 44 Long Disordered Regions (LDR): 45 or more 2.SDR / MDR / LDR predictors: Neural networks 3.Training dataset: proteins with missing AA SDR: 34 proteins, 11,050 AA, 38 IDR, 411 IDAA MDR: 20 proteins, 4,764 AA, 22 IDR, 464 IDAA LDR: 7 proteins, 2,069 AA, 7 IDR, 465 IDAA 4.Feature selection: standard sequential forward selection 5.Accuracy: 59 – 67% estimated by 5-cross validation 6.Better than chance; Better on self than on not self Romero P, et.al. Proc. IEEE International Conference on Neural Networks. 1:90-95 (1997)

Next: PONDR®VL-XT XN (1) XC (1) VL1 (2) VL-XT (2) N-11 N-14 XN, VL1, and : neural networks (1) Li X et al., Genome Informat. 9: (1999) (2) Romero P et al., Proteins 42:38-48 (2001) Input features: XN: 8 VL1: 10 XC: 8

Inputs for PONDR ® VL-XT XNCoordination No. VVIYFWMNHDPEVK-- VL1Coordination No. Net chargeWFYWYFDEKR XCCoordination No. HydropathyVIYFWMTH-PEVK-R Accuracy (ACC) = (% Corr-O)/2 + (%Corr-D)/2 ACC ( estimated by cross-validation ) ~ 72 ± 4% Li X. et.al. Genome Informat. 9: (1999) Romero P. et.al. Proteins 42:38-48(2001)

Disorder Prediction in CASP Critical Assessment of Structure Prediction CASP1(1994) to CASP9 (2010) Experimentalists provide amino acid sequences as they are determining the structures of proteins Groups register and make structure predictons After structures determined, predictions evaluated Disorder predictions introduced in CASP5 (2002) CASP PREDICTIONS ARE TRULY BLIND!!!

Disorder Prediction in CASP CASP5 (2002), sensitivity replaced AUC VSL2 PreDisorder

Our Performance in CASP Used VL-XT, poor on short disordered regions in CASP5, but very well on long disordered regions. VL trained mainly on long disordered regions. Changed predictor in CASP6 and CASP7, new predictor ranked #1. Big improvement !! Did not participate in CASP 8, but would not have ranked #1 with current predictors. What was change that led to large improvement in CASP6??

Predictors of Natural Disordered Regions PONDR ® VL-XT and PONDR ® VSL2 (1) Li X et al., Genome Informat. 9: (1999) (2) Romero P et al., Proteins 42:38-48 (2001) (3) Peng K et al., BMC Bioinfo. 7:208 (2006) N (1) C (1) VL1 (2) VL-XT (2) N-11 N-14 VL2 (3) VS2 (3) VSL2 (3) OMOM 1-O M OLOL OSOS VSL2 Score = O L ×O M + O S ×(1-O M ) M1 (3) N, VL1, and C are neural networks N-term: 8 inputs VL1: 10 inputs C-term: 8 inputs M1, VSL2-L, and VSL2-S are support vector machines M1: 54 inputs VL2: 20 inputs VS2: 20 inputs

Comparison on CASP 8 Dataset Zhang P, et.al. (unpublished results; not quite same as CASP evaluation) ACC = 80%  ACC = (%Corr-O)/2 + (%Corr-D)/2 AUC = Area Under Curve AUC = 0.89 

(+) Disordered XPA (–) Structured PONDR ® VL-XT, PONDR ® VSL2B and PreDisorder Iakoucheva L et al., Protein Sci 3: (2001) Dunker AK et al., FEBS J 272: (2005) Deng X., et al., BMC Bioinformatics 10:436 (2009)

Published Predictors of Disordered Proteins He B, et al., Cell Res 19: (2009) Year PONDRs: - VSL1: Ranked #1 in CASP 6 (2004); - VSL2: Ranked #1 in CASP 7 (2006); PONDRS # +,- / # phobics CASP

Outline What are “Intrinsically Disordered Proteins” (IDPs) Bioinformatics Applications to IDPs –Why don’t IDPs form structure? –Predicting IDPs from amino acid sequence –Some important results from IDP prediction –An improved order / disorder amino acid scale –Predicting phosphorylation sites –Disorder and function: two examples Importance of bioinformatics to IDP research

How Abundant are IDRs/IDPs? To Estimate Abundance of IDPs/IDRs: predict on whole proteomes from many organisms. ALERT!! Lack of membrane-protein-specific disorder predictors means that Estimates of disorder will be too low by a small percentage.

Organisms# Orgs. # Proteins Avg. # Proteins % Disordered AA %Proteins IDR >30 %Proteins Natively Unfolded Archaea73536 – – 37.2% 0 – 60.0% 3.2 – 31.5% Bacteria – – 36.1% 11.5 – 53.7% 2.7 – 29.2% Single-cell Eukarya – – 49.9% 17.0 – 76.8% 16.8 – 47.6% Multi-cell Eukarya – – 49.0% 4.4 – 66.5% 6.9 – 48.7% VSL2 Prediction of Abundance** of Intrinsically Disordered Proteins **Are organism-specific predictors sometimes needed?

Archaea Phylogenetic Tree >30% >21% >14% Todd Lowe ( >17% <14%

Predicted Disorder vs. Proteome Size

Why So Much Disorder? Hypothesis: Disorder Used for Signaling Sequence  Structure  Function – Catalysis, – Membrane transport, – Binding small molecules. Sequence  Disordered Ensemble  Function – Signaling, – Regulation, Dunker AK, et al., Biochemistry 41: (2002) – Recognition, Dunker AK, et al., Adv. Prot. Chem. 62: (2002 ) – Control. Xie H, et al., Proteome Res. 6: (2007)

Outline What are “Intrinsically Disordered Proteins” (IDPs) Bioinformatics Applications to IDPs –Why don’t IDPs form structure? –Predicting IDPs from amino acid sequence –Some important results from IDP prediction –An improved order / disorder amino acid scale –Predicting phosphorylation sites Importance of bioinformatics to IDP research

A New Order / Disorder AA Scale, Part 1 Collect equal numbers of O and D windows of length 21. Calculate the value of attribute, x, for each window. For each interval of x, count how many windows are O and D; from this, determine P (O I x) and P (D I x) Plot P (O I x) and P (D I x) versus x. Determine the areas between the two curves. Area Ratio Value = (area between curves / total area) Apply to 517 aa scales: Rank scales from smallest to largest Campen A, et al Protein Pept Lett 15: (2008)

A New Order / Disorder AA Scale, Part 2 Overall idea: make random changes to a scale, test for higher ARV, repeat until no larger value is found. Genetic Algorithm Pseudocode: –Choose initial population –Repeat –Evaluate the fitness of each individual –Select a certain portion of best-ranking individuals –Breed new population through crossover + mutation –Until terminating condition ARV value improved from 0.69 for best of 517 scales to 0.76 for new scale, called TOP-ID Campen A, et al Protein Pept Lett 15: (2008)

P (D l x) and P (O I x) Versus x Plots: Area Between Curves Used to Rank Attributes, X Flexibility Positive Charge Extracellular Protein AA Composition TOP-IDP Campen A et al., Protein & Peptide Lett 15: (2008) ARV = 0.69, Rank = #1/517 ARV = 0.07, Rank #517/517 ARV = 0.36, Rank = #238/517 ARV = 0.76

Analysis of the disorder propensity in p53 by Top-IDP (A), PONDR® VLXT (B) and PONDR® VSL1 (C).

Chronology of Amino Acid Evolution DISORDER TO ORDER, NON-LIFE TO LIFE Di Mauro E, et al., in Genesis: Origin of Life on Earth and Other Planets (In press)

Outline What are “Intrinsically Disordered Proteins” (IDPs) Bioinformatics Applications to IDPs –Why don’t IDPs form structure? –Predicting IDPs from amino acid sequence –Some important results from IDP prediction –An improved order / disorder amino acid scale –Predicting phosphorylation sites –Disorder and function: two examples Importance of bioinformatics to IDP research

New Phosphorylation Predictor KNN – similarity to known sites (+ / -) of phosphorylation Disorder Scores – used VSL2 AA frequencies – at sequence positions before and after phophorylation sites Gao J et al Mol and Cell Proteomics (In press)

Disorder Score vs. Phosphorylation Gao J et al., Mol & Cell Proteomics 9 (Epub) (2010) Residue Positions 91.3% > % > % > % > 0.5

Outline What are “Intrinsically Disordered Proteins” (IDPs) Bioinformatics Applications to IDPs –Why don’t IDPs form structure? –Predicting IDPs from amino acid sequence –Some important results from IDP prediction –An improved order / disorder amino acid scale –Predicting phosphorylation sites –Disorder and function: two examples Importance of bioinformatics to IDP research

Signaling Example 1: Calcineurin and Calmodulin A-Subunit B-Subunit Autoinhibitory Peptide Active Site Kissinger C et al., Nature 378: (1995) Meador W et al., Science 257: (1992)

Example 2: p27kip1: A Disordered Domain Cyclin A CDK p27kip1 3D Structure: Russo AA et al., Nature 382: (1996) DD: Tompa P et al., Bioessays 4: (2008) 25  93  (69 residues)

The p27kip1 Disordered Domain: Used for Signal Integration Y 88 T 187 pY 88 T 187 ATP pY 88 pT 187 Ub’n pY 88 pT 18 7 ♦ ♦♦ ♦ ? ? ? NRTK Y88, signal #1. 2. Intra-molecular T187, #2. 3. several possible loci, #3. 4. Proteasome digestion of p27, then cell cycle progression. Galea CA et al., J Mol Biol 376: (2008) Dunker AK & Uversky VN, Nat Chem Biol 4: (2008)

Outline What are “Intrinsically Disordered Proteins” (IDPs) Bioinformatics Applications to IDPs –Predicting IDPs from amino acid sequence –Some important results from IDP prediction –An improved order / disorder amino acid scale –Predicting phosphorylation sites –Disorder and function: two examples Importance of bioinformatics to IDP research

Importance of Bioinformatics to IDP and Protein Research Thousands of IDPs and IDRs have been found. Not one IDP or IDR is discussed in any current biochemistry textbook! Why? - IDPs and IDRs don’t fit Sequence  Structure  Function New paradigm developed from bioinformatics Sequence  Disordered Ensemble  Function IDP prediction is changing fundamental views of structure-function relationships!

Thank You ! ! !

Collaborators Indiana University Bin Xue Jake Chen Bill Sullivan Predrag Radivojac Jennifer Chen Pedro Romero Marc Cortese Derrick Johnson Chris Oldfield Amrita Mohan Yunlong Liu Ann Roman Tom Hurley Anna DePaoli-Roach Yuro Tagaki Siama Zaidi Jingwei Meng Wei-Lun Hsu Hua Lu Fei Huang Vladimir Uversky UCSD Lilia Iakoucheva Sebat Temple University Zoran Obradovic Slobodan Vucetic Vladimir Vacic Kang Peng Hiongbo Xie Siyuan Ren Uros Midic Enzyme Institute Peter Tompa Zsuzsanna Dosztanyi Istvan Simon Monika Fuxreiter USU Robert Williams Harbin Engineering University Bo He Kejun Wang University of Idaho Celeste J. Brown Chris Williams Molecular Kinetics Yugong Cheng Tanguy LeGall Aaron Santner Plant and Food Research Xaiolin Sun USF Gary Daughdrill Wright State University Oleg Paliy