1 7/27/2008 Center for Computational Intelligence, Learning, and Discovery Bioinformatics and Computational Biology Program ROC 2008 meeting A Computational.

Presentation transcript:

1 7/27/2008 Center for Computational Intelligence, Learning, and Discovery Bioinformatics and Computational Biology Program ROC 2008 meeting
A Computational Method to Identify Amino Acid Residues in RNA-protein Interactions
Michael Terribilini & Jae-Hyung Lee
Cornelia Caragea, Deepak Reyon, Ben Lewis, Jeffry Sander, Robert Jernigan, Vasant Honavar and Drena Dobbs
Bioinformatics and Computational Biology Program; Center for Computational Intelligence, Learning, and Discovery; L.H. Baker Center for Bioinformatics & Biological Statistics; BCB NSF IGERT

2 PROBLEM: Given the sequence of a protein (and possibly its structure), predict which amino acids participate in protein-RNA interactions.
APPROACH: Generate datasets of known complexes from the PDB to train and test machine learning algorithms (Naïve Bayes, SVM, etc.).
GOAL: Classify each amino acid in the target protein as either an interface or a non-interface residue.
– Guiding hypothesis: Principal determinants of protein binding sites are reflected in local sequence features.
– Observation: Binding site residues are often clustered within the primary amino acid sequence.

3 PROBLEM: Given the sequence of a protein (and possibly its structure), predict which amino acids participate in protein-RNA interactions.
Sequence-Based Classifier:
– RB181 non-redundant dataset: 181 protein-RNA complexes from the PDB
– Input: window of amino acid identities centered on the target residue and contiguous in the protein sequence
– Classifier: Naïve Bayes
– Leave-one-out cross validation
– Example input for target Ser 28: QSVSTSSFRYM
Structure-Based Classifier:
– Calculate the distance between each pair of residues in the known structure
– Input: identities of the nearest n spatial neighbors
– Classifier: Naïve Bayes
– Leave-one-out cross validation
– Example input for target Ser 28: SSFRLNKSGRT
PSSM-Based Classifier:
– PSI-BLAST against the NCBI nr database to generate PSSMs
– Input: PSSM vectors for residues contiguous in sequence
– Classifier: Support Vector Machine (SVM)
– 10-fold cross validation
– Example input for target Ser 28: one PSSM vector of 20 values per residue in the window QSVSTSSFRYM (-3,7,8,…; 5,-4,-6,…; …; …,5,9,-1,…)
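
The two identity-based inputs on slide 3 can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the padding character, the window half-width of 5 (chosen only because the slide's example window has 11 residues), and the choice to count the target residue among its own spatial neighbors are all assumptions made for illustration.

```python
import math


def sequence_windows(sequence, half_width=5, pad="-"):
    """Return one window of residue identities per position, centered on the target.

    Positions near the termini are padded with `pad` (an assumed convention).
    """
    padded = pad * half_width + sequence + pad * half_width
    size = 2 * half_width + 1
    return [padded[i:i + size] for i in range(len(sequence))]


def nearest_spatial_neighbors(residues, coords, target, n):
    """Return the identities of the n residues closest in space to residues[target].

    `coords` holds one (x, y, z) point per residue (for example a C-alpha
    position). The target is at distance 0 from itself, so it comes first
    (an assumption).
    """
    order = sorted(
        range(len(residues)),
        key=lambda j: math.dist(coords[target], coords[j]),
    )
    return "".join(residues[j] for j in order[:n])


windows = sequence_windows("QSVSTSSFRYM")
print(windows[5])  # prints QSVSTSSFRYM (the window centered on the sixth residue)
```

Each window string (or neighbor string) would then be passed, one per target residue, to whichever classifier is being trained.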

4 Dataset of RNA-protein Interface Residues
– PDB: extract all protein-RNA complexes
– Select high-resolution structures (< 3.5 Å resolution): 503 complexes
– Filter using PISCES (< 30% pair-wise sequence identity): 181 chains, 48,791 residues
– Identify interface residues using a distance cutoff of 5 Å: 7,456 interface residues (positive examples) and 41,335 non-interface residues (negative examples)
PISCES: Wang and Dunbrack, 2003, Bioinformatics 19:1589
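
The labeling step on slide 4 can be sketched as follows. The slide gives only the 5 Å cutoff, so this sketch assumes that a residue counts as an interface residue when any of its atoms lies within the cutoff of any RNA atom. The function name, the input layout, and the use of "less than or equal" are illustrative assumptions, not details from the presentation.

```python
import math


def label_interface_residues(protein_atoms, rna_atoms, cutoff=5.0):
    """Label each residue True (interface) or False (non-interface).

    protein_atoms: dict mapping a residue id to a list of (x, y, z) atom coordinates
    rna_atoms:     list of (x, y, z) coordinates for all RNA atoms in the complex
    cutoff:        distance threshold in angstroms (5 Å on the slide)
    """
    labels = {}
    for residue_id, atoms in protein_atoms.items():
        # Assumed criterion: any protein atom within `cutoff` of any RNA atom.
        labels[residue_id] = any(
            math.dist(atom, rna_atom) <= cutoff
            for atom in atoms
            for rna_atom in rna_atoms
        )
    return labels


# Toy coordinates, invented purely to exercise the function.
toy_protein = {"Ser28": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], "Ala29": [(20.0, 0.0, 0.0)]}
toy_rna = [(4.0, 0.0, 0.0)]
print(label_interface_residues(toy_protein, toy_rna))
```

Residues labeled True become positive examples and the rest negative examples, which is how the dataset arrives at its two residue counts.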

5 Performance in predicting interface residues, using only protein sequence as input
Complex | Protein-Protein | Protein-DNA | Protein-RNA
Classifier | 2-stage classifier (SVM + Naïve Bayes) | Naïve Bayes | Naïve Bayes
Accuracy | 72% | 77% | 85%
Specificity | 58% | 37% | 51%
Sensitivity | 39% | 43% | 38%
Correlation coefficient | | |
Reference | Yan et al., 2004 Bioinformatics | Yan et al., 2006 BMC Bioinformatics | Terribilini et al., 2006 RNA
Related work | Jones & Thornton, Ofran & Rost, many others | Jones et al., Thornton et al., Ahmad & Sarai | Jeong et al., Miyano et al., Go et al.

6 A few "good" predictions mapped onto structures, using only protein sequence as input
– Protein-Protein (2-stage classifier, SVM + Naïve Bayes): Ab FabN10, Acc = 87%, CC = 0.65
– Protein-DNA (Naïve Bayes): Repressor, Acc = 88%, CC = 0.66
– Protein-RNA (Naïve Bayes): dsRNA Binding Protein, Acc = 86%, CC = 0.59
Yan, Bioinformatics 2004; Yan, BMC Bioinformatics 2006; Terribilini, RNA 2006

7 Combining Sequence, Structure & PSSM-Based Classifiers Improves Prediction of RNA-Binding Residues
Table columns: ID-Seq, ID-Struct, ID-PSSM, Combined
Table rows: Specificity (1), Sensitivity (2), Accuracy, Correlation Coefficient, AUC of ROC (3)
(1) Specificity (Precision for the positive, RNA-binding class)
(2) Sensitivity (Recall for the positive, RNA-binding class)
(3) Area Under the Curve (AUC) from a Receiver Operating Characteristic (ROC) curve
Predictions illustrated on 3D structures: 30S ribosomal protein S17 (PDB ID 1FJG:Q). Panels: Sequence-Based, Structure-Based, PSSM-Based, Combined (for clarity, bound RNA is not shown)
TP = True Positive = interface residues predicted as such
FP = False Positive = non-interface residues predicted as interface residues
TN = True Negative = non-interface residues predicted as such
FN = False Negative = interface residues predicted as non-interface
Combined results for 1FJG:Q: Spec+ = 0.89, Sens+ = 0.96, Accuracy = 0.91, Correlation Coefficient =
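
The performance measures named on slide 7 can be computed from the four counts it defines. The sketch below follows the slide's definitions of Specificity (precision for the positive class) and Sensitivity (recall for the positive class). The slide does not give a formula for the correlation coefficient, so the Matthews correlation coefficient is used here as an assumption. The function name, the dictionary keys, and the example counts are invented for illustration.

```python
import math


def interface_metrics(tp, fp, tn, fn):
    """Compute slide-7 style measures from confusion-matrix counts."""
    # Specificity+ on the slide is precision for the RNA-binding class.
    specificity_plus = tp / (tp + fp) if (tp + fp) else 0.0
    # Sensitivity+ on the slide is recall for the RNA-binding class.
    sensitivity_plus = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # Assumed formula: Matthews correlation coefficient.
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    correlation = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {
        "specificity_plus": specificity_plus,
        "sensitivity_plus": sensitivity_plus,
        "accuracy": accuracy,
        "correlation_coefficient": correlation,
    }


# Hypothetical counts, not taken from the presentation.
print(interface_metrics(tp=3, fp=1, tn=5, fn=1))
```

Because non-interface residues far outnumber interface residues in this kind of dataset, accuracy alone can look high for a weak predictor, which is one reason to report the correlation coefficient alongside it.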

8 Predictions for Signal Recognition Particle 19kDa protein (PDB ID 1JID_A)
– IDSeq predictions: Accuracy = 80%, Specificity = 56%, Sensitivity = 21%, CC = 0.25
– Combined predictions: Accuracy = 82%, Specificity = 55%, Sensitivity = 75%, CC = 0.52

9 RNABindR: An RNA Binding Site Prediction Server

10 Applications
– Lentiviral Rev proteins
– Telomerase Reverse Transcriptase (TERT)

11 Rev - a potential target for novel HIV therapies
– Rev is a multifunctional regulatory protein that plays an essential role in the production of infectious virus
– A small nucleo-cytoplasmic shuttling protein (HIV Rev 115 aa; EIAV Rev 165 aa)
– Recognizes a specific binding site on viral RNA, the Rev Responsive Element (RRE)
– Contains specific domains that mediate nuclear localization, RNA binding and nuclear export
Rev's critical role in lentiviral replication makes it an attractive target for antiviral (AIDS) therapy

12 Problem: no high-resolution Rev structure, not even for HIV Rev, despite intense effort
Why?
– Rev aggregates at concentrations needed for NMR or X-ray crystallography
– The only high-resolution information available is for short peptide fragments of HIV-1 Rev: a 22 amino acid fragment of Rev bound to a 34 nucleotide RRE RNA fragment
What about insights from sequence comparisons?
– The HIV Rev sequence has low sequence identity with proteins of known structure
– Very little sequence similarity among different Rev family members (e.g., EIAV vs HIV < 10%)

13 HIV-1 Rev: Predictions vs Experiments
Prediction on RNA-binding protein HIV-1 Rev: DTRQARRNRR RRWRERQRAA AA (figure rows: Actual IR, Predicted)
Sequence-based prediction on HIV-1 Rev (not included in the training set) identified every interface residue, plus 3 false positives
Structure panels: Predicted, Actual. NMR structure (1ETF:B): 22 aa Rev peptide bound to RNA; Battiste et al., 1996, Science 273:1547
Figure key: interface residues = red, non-interface residues = grey, RNA = green

14 EIAV Rev: Predictions vs Experiments
EIAV Rev sequence as shown: QRGDFSAWGDYQQAQERRWGEQSSPRVLRPGDS KRRRK HL ARRHLGPGPTQHTPS RRDRW IREQILQAEVLQ ERLE WRI GP L ESDQWCRV L RQS L PEEKISSQTCI
Labeled motifs: RRDRW, ERLE, KRRRK, NES, NLS
PREDICTED: structure, protein binding residues, RNA binding residues (KRRRK, RRDRW)
VALIDATED: protein binding residues, RNA binding residues (figure labels: MBP, WT)
Ihm, Ho, Carpenter
Lee, J Virol 2006; Terribilini, RNA 2006

15 Mutations in EIAV Rev: Experimental evaluation of RNA binding sites
EIAV Rev sequence as shown: QRGDFSAWGDYQQAQERRWGEQSSPRVLRPGDS KRRRK HL ARRHLGPGPTQHTPS RRDRW IREQILQAEVLQ ERLE WRI GP L ESDQWCRV L RQS L PEEKISSQTCI
Labeled motifs: RRDRW, ERLE, KRRRK, NES, NLS
Variants shown: WT, AADAA, AALA, KAAAK, ERDE
Lee, J Virol 2006; Terribilini, RNA 2006

16 Summary
Figure labels: HIV-1 Rev; EIAV Rev (KRRRK, RRDRW)
Results show that the predicted protein and RNA binding sites in the Rev proteins of HIV-1 and EIAV agree with available experimental data

17 Telomerase Reverse Transcriptase (TERT)
Functions:
– "Cap" ends of chromosomes to prevent recombination, end-to-end fusion and degradation
– Allow complete replication of chromosomes
Interactions:
– Protein-DNA: binds linear chromosome ends (& extends them)
– Protein-RNA: telomerase reverse transcriptase (TERT) subunit contains an essential RNA component
– Protein-Protein: Dyskerin, a component of the active human telomerase complex; many other interacting proteins, e.g., PPI1, RAP1, TEP1, HSP90
Lingner (1997) Science 276:
Adapted from P. J. Mason

18 Human TERT: Preliminary docking of 3 modeled domains
Preliminary model (lacking TEN domain)
Kurcinski, Kolinski, Kloczkowski

19 Predicted vs Actual RNA-Binding Residues in Human TRBD (figure panels: Predicted, Actual)

20 Current & future work
Progress towards our goals?
√ Model TERT domains from human
√ Dock domains to generate a complete model for TERT protein
– Generate a working model for TERT-TR complex
– Predict TR RNA tertiary structure, then dock with protein
Underway…
Future:
– Experimentally interrogate protein-RNA interfaces suggested by this work
– Investigate these interfaces as potential therapeutic targets

21 Conclusions
– A combined classifier that uses the query sequence plus additional information derived from the known structure and a PSSM generated from PSI-BLAST sequence homologs (trained and tested on RB181, a dataset of diverse protein-RNA interfaces) predicts interface residues with ~86% overall accuracy, CC = 0.43
– Combining structure prediction with machine learning has the potential to provide valuable insights into the structure and function of important large RNP complexes, especially those for which high-resolution experimental structural information is not yet available
– Computational methods can provide insight into protein-RNA interfaces, even for "recalcitrant" proteins whose structures are not yet available

22 Acknowledgements
Dobbs, Iowa State University: Drena Dobbs, BCB & GDCB –Michael Terribilini –Jeffry Sander –Peter Zaback –Deepak Reyon –Ben Lewis
Honavar, Iowa State University: Vasant Honavar, BCB & Computer Science –Cornelia
Kolinski, University of Warsaw: Andrzej Kolinski, Chemistry –Mateusz
Iowa State University: Andrzej Kloczkowski, BBMB; Robert Jernigan, BBMB; Kai-Ming Ho, Physics
Washington State University: Susan Carpenter, Vet Micro &
UCLA: Yungok Ihm, Biochemistry
Supported by: NSF IGERT Computational Molecular Biology; USDA MGET Animal Genomics
Iowa State University: Bioinformatics & Computational Biology Program (BCB); LH Baker Center for Bioinformatics & Biological Statistics; Center for Integrated Animal Genomics (CIAG); Center for Computational Intelligence, Learning & Discovery (CILD)