Madhavi Ganapathiraju Graduate student Carnegie Mellon University

Presentation transcript:

TMpro & Comparison of Algorithms for “Protein Stability Prediction Upon Mutations”. Madhavi Ganapathiraju, Graduate student, Carnegie Mellon University

Overview: TMpro evaluations on PDBTM, TMPDB and MPtopo are complete. Additional inputs to TMpro are being studied: Yule values (not successful) and evolutionary profile (promising). The TMpro website has been completed. Evaluation of algorithms to predict protein stability changes upon mutations.

Part 1: TMpro

TMpro evaluations at segment and residue level
Method | Qok | F score | Recall | Precision | Q2 | Misclassified as soluble
MPtopo (101 TM proteins)
2a TMHMM | 66 | 91 | 89 | 94 | 84 | 5
2b TMpro NN: 60, 93, 92, 79
PDBTM (191 TM proteins)
3a: 68, 90, 13
3b: 57, 81, 2

TMpro web server is fully functional! Competition for TMpro logo. Prize: see your logo on the web!

Attempts to overcome confusion with globular soluble helices (1): Yule value features to be added. Yule value features that discriminate amino acid neighbor propensities between TM and non-TM helices were computed earlier. I tried to add these features as input to the NN predictor, but could not achieve a quantitative improvement. I will discuss this in the future when I have results to present.

Attempts to overcome confusion with globular soluble helices (2): Evolutionary profile information. It is known that knowledge of the evolutionary profile of a protein can improve prediction accuracy to a great extent. TMpro is capable of predicting TM helices without requiring knowledge of the profile, which is useful when you cannot extract sequence alignments from known proteins. But where the profile is known, we would like to use that additional information.

Profile generation: Get multiple sequence alignments. (Those of you who have worked with evolutionary analysis before, please give feedback.) Compute a position-specific scoring matrix (PSSM) for each protein, with 21 rows (20 amino acids, and 1 row for gaps). A profile is generated for each protein in the training and test sets. PSSM(i,j) = log(C(i,j) / total counts at position j) log(C(i,j) / unigram count of i in the protein)
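A minimal sketch of the counting step, assuming the alignment is given as equal-length strings with `-` for gaps. It computes only the first term on the slide, log(C(i,j) / total counts at position j); how the unigram term is combined with it is not clear from the slide, so it is left out. The function name, the pseudocount, the row order and the toy alignment are all my own assumptions for illustration.

```python
import math

# 20 amino acids plus one row for gaps (row order is an assumption)
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"

def column_log_frequencies(alignment, pseudocount=1.0):
    """Return a 21 x L matrix of log(C(i,j) / total counts at position j).

    C(i,j) is the (pseudocount-smoothed) count of symbol i in column j.
    The pseudocount avoids log(0) and is not part of the slide's formula.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    profile = [[0.0] * length for _ in ALPHABET]
    for j in range(length):
        counts = {symbol: pseudocount for symbol in ALPHABET}
        for seq in alignment:
            # symbols outside the alphabet are counted as gaps (assumption)
            symbol = seq[j] if seq[j] in counts else "-"
            counts[symbol] += 1
        total = sum(counts.values())
        for i, symbol in enumerate(ALPHABET):
            profile[i][j] = math.log(counts[symbol] / total)
    return profile

# Hypothetical toy alignment: three sequences, five columns
toy_alignment = ["MKV-L", "MRV-L", "MKI-L"]
profile = column_log_frequencies(toy_alignment)
```

Each column of `profile` then gives one 21-dimensional vector per alignment position.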

Doubts: What labels to assign to gaps? We have labels for training sequences. But when the original sequence has gaps when aligned, how to interpret the labels of the gaps?
                 --n------n----n------nnn-----n------n-----------------M-----
2a65         369 --D------E----L------KLS-----R------K-----------------H----- 377
2A65_A       369 --.------.----.------...-----.------.-----------------.----- 377
AAC07817     369 --.------.----.------...-----.------.-----------------.----- 377
YP_001956    364 --E------S----F------G.K-----.------.-----------------T----- 372
                 -M------M------M------M-------M----------M---------MM-------
2a65         378 -A------V------L------W-------T----------A---------AI------- 385
2A65_A       378 -.------.------.------.-------.----------.---------..------- 385
AAC07817     378 -.------.------.------.-------.----------.---------..------- 385
YP_001956    373 -S------C------.-----------------------------------IL------- 377
Even TM regions have gaps, as shown above.

Doubts: What to do with missing segment information for some sequences? When nothing is shown (gap/alignment) for some sequences, I am counting those as gaps.
XP_659910 47 L-......K.----------...KAP----RSNQV.-..FVAGTMGLASAVGA.AT 86
AAW43619 100 .....A..A-----------KNP----NTTRNV-..FMVGALGALGASSV.ST 136
CAB59195 59 ----.N.RP.-A..VIGSARFAYMAWTRVA 83
XP_466001 107 SKRA.-A.FVLSGGRFIYASLLRLL 130
AAA20832 103 SKRA.-A.FVLTGGRFVYASLVRLL 126

Using profile for prediction: Studied independently of TMpro. Neural network with 21 input, 21 hidden and 1 output neurons (nonmembrane = 0, membrane = 1). [Figure: predicted output versus residue number, with experimentally observed locations of TM helices]
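A rough sketch of what a 21-21-1 network looks like in a forward pass, to make the architecture concrete. The slide gives only the layer sizes and the 0/1 target; the sigmoid activations, the random untrained weights, the 0.5 threshold, and feeding one 21-dimensional profile column per residue are all assumptions made here for illustration.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_layer(n_out, n_in, rng):
    """One weight row per output neuron; the last entry of each row is the bias."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] + [0.0]
            for _ in range(n_out)]

def layer_forward(layer, inputs):
    """Weighted sum plus bias, followed by a sigmoid, for every neuron."""
    return [sigmoid(sum(w * x for w, x in zip(row[:-1], inputs)) + row[-1])
            for row in layer]

def predict_residue(hidden, output, column):
    """Map one 21-dimensional profile column to a score between 0 and 1."""
    return layer_forward(output, layer_forward(hidden, column))[0]

rng = random.Random(0)              # fixed seed so the sketch is repeatable
hidden = init_layer(21, 21, rng)    # 21 inputs -> 21 hidden neurons
output = init_layer(1, 21, rng)     # 21 hidden -> 1 output neuron

column = [0.0] * 21                 # hypothetical profile column for one residue
column[10] = 1.0
score = predict_residue(hidden, output, column)
label = 1 if score >= 0.5 else 0    # nonmembrane = 0, membrane = 1
```

With untrained weights the score is meaningless; the point is only the shape of the computation, one score per residue along the sequence.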

Another output

NN architecture needs to be modified. But instead I did post-processing of the neural network output: computed the wavelet transform (Mexican hat wavelet, scale = 10).
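The post-processing step can be sketched as a single-scale continuous wavelet transform of the per-residue network output. This is my own plain-Python illustration, not the code used for the plots: the truncation at five scales, the 1/sqrt(scale) normalisation, and the toy input with a 20-residue plateau are assumptions.

```python
import math

def mexican_hat(t):
    """Mexican hat (Ricker) wavelet, up to a constant normalisation factor."""
    return (1.0 - t * t) * math.exp(-0.5 * t * t)

def wavelet_transform(signal, scale=10.0):
    """Wavelet coefficients of a 1-D signal at one scale, one per position."""
    n = len(signal)
    half_width = int(5 * scale)  # the wavelet is negligible beyond 5 scales
    coeffs = []
    for b in range(n):
        lo = max(0, b - half_width)
        hi = min(n, b + half_width + 1)
        total = sum(signal[k] * mexican_hat((k - b) / scale)
                    for k in range(lo, hi))
        coeffs.append(total / math.sqrt(scale))
    return coeffs

# Hypothetical NN output: high scores for residues 40-59, low elsewhere
nn_output = [0.1] * 40 + [0.9] * 20 + [0.1] * 40
coeffs = wavelet_transform(nn_output, scale=10.0)
peak = max(range(len(coeffs)), key=coeffs.__getitem__)
```

At scale 10 the positive lobe of the wavelet is about 20 residues wide, so a plateau of roughly helix length gives a strong positive coefficient at its centre, while isolated spikes in the network output are smoothed out.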

Some more wavelet outputs. Note that these are from the training data itself. Yet to check how it performs overall.

Part 2: Stability upon Mutations

Evaluation of predictions of protein stability changes upon mutations: Effects of mutations on 2 TM proteins are available in our group. The two proteins are rhodopsin and bacteriorhodopsin. Data are available for how much misfolding occurs and how the stability of the protein is affected. There are algorithms that can also predict these changes. We evaluated how accurate and reliable the prediction methods are by comparing their results with our experimental data.

3 prediction algorithms. I-Mutant 2.0: support vector machine. Features: amino acid neighbors in a 9 Å (0.9 nm) sphere, temperature, pH, relative solvent accessible surface area. http://gpcr2.biocomp.unibo.it/cgi/predictors/I-Mutant2.0/I-Mutant2.0.cgi DFIRE: knowledge-based statistical potentials. http://phyyz4.med.buffalo.edu/hzhou/mutation.html FoldX: statistical mechanics; accounts for various energy terms. http://fold-x.embl-heidelberg.de:1100/

Authors’ claims in 3 papers

Our results: Rhodopsin (PDB: 1U19), Bacteriorhodopsin (PDB: 1QM8)

Bias in the number of mutations that increase/decrease stability: Database bias affects the apparent accuracies of the algorithms. I-Mutant, for example, predicts a decrease in stability for a majority of the mutations. Whether the mutations studied through experiments preserve the natural bias toward stability-decreasing mutations affects the apparent accuracy of the prediction algorithms.
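A small worked example of why this matters. The counts are made up for illustration and are not our experimental data: a trivial predictor that always answers "decrease" scores as well as the fraction of stability-decreasing mutations in the test set, so the same predictor looks good on a biased set and no better than chance on a balanced one.

```python
def accuracy(predicted, observed):
    """Fraction of mutations whose predicted direction matches the observed one."""
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)

def always_decrease(n):
    """Trivial baseline: predict a decrease in stability for every mutation."""
    return ["decrease"] * n

# Hypothetical test sets (made-up counts)
biased = ["decrease"] * 80 + ["increase"] * 20
balanced = ["decrease"] * 50 + ["increase"] * 50

acc_biased = accuracy(always_decrease(len(biased)), biased)
acc_balanced = accuracy(always_decrease(len(balanced)), balanced)
```

Comparing a method's accuracy against this baseline on the same data set is one way to separate real predictive power from the effect of the bias.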

Correlation with known data: Reported correlations for these methods are quite high (>0.7). On the data compared here, the correlations are quite low.

Notes: Local installations of blast and netblast are on cologne: /usr1/blast-2.2.13/ and /usr1/netblast-2.2.13/. Java SDK on cologne: /usr1/j2sdk1.4.2_11/

Acknowledgements: Judith Klein-Seetharaman; Christopher Jon Jursa, Pitt Information Sciences (for developing web interface)