TMpro: Transmembrane Helix Prediction using Amino Acid Properties and Latent Semantic Analysis Madhavi Ganapathiraju, N. Balakrishnan, Raj Reddy and Judith.


2 TMpro: Transmembrane Helix Prediction using Amino Acid Properties and Latent Semantic Analysis
  Madhavi Ganapathiraju, N. Balakrishnan, Raj Reddy and Judith Klein-Seetharaman
  Carnegie Mellon University
  6th International Conference on Bioinformatics, Hong Kong, PR China, August 29th, 2007

3 Outline
  Introduction
  Membrane proteins
  Transmembrane helix prediction
  Previous methods
  Drawbacks
  Amino acid properties
  Approach
  Algorithm
  Features and models
  Evaluations
  Web server
  Navigation bar: Introduction | Properties | Approach | Algorithm | Web Server | Previous Methods

4 Membrane Proteins
  An important class of proteins that carries out many important functions
  Provide access to the cell for drug targeting
  Embedded in the cell or organelle membrane
  [Figure labels: cell membrane, membrane protein, soluble protein]

5 Transmembrane Segment Characteristics
  [Figure, side view: cytoplasm (aqueous medium); transmembrane region with a 30 Å hydrophobic core; extracellular side (aqueous medium)]
  A helix has to be 19 residues long to go from one side to the other
  Questions to be addressed by a prediction algorithm:
    How many transmembrane segments are there?
    Where are the transmembrane segments located in the primary sequence?
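The residue count on this slide can be checked against helix geometry. The following is a rough calculation, assuming the textbook α-helix rise of about 1.5 Å per residue (a value not stated on the slide) and a helix running perpendicular to the membrane:

```latex
\[
n \approx \frac{d}{h} = \frac{30\ \text{\AA}}{1.5\ \text{\AA/residue}} = 20\ \text{residues},
\qquad
19 \times 1.5\ \text{\AA} = 28.5\ \text{\AA}.
\]
```

Under these assumptions, about 19 to 20 residues are enough to cross the hydrophobic core; a tilted helix would need more.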

6 Transmembrane Helix Prediction
  Important for: protein family; structure and function; regions accessible from the extracellular side
  Challenges: little available training data; overtraining; difficulty in discovering novel architectures

7 Hydrophobicity scale
  Average hydrophobicity over a 9-residue window
  Scales: KD, GES, WW, ...
  Limitations: unclear segment boundaries and low accuracy
  [Figure: Kyte-Doolittle hydrophobicity profile]
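A windowed hydrophobicity profile of the kind described on this slide could be computed roughly as follows. This is a minimal sketch and not the authors' code: the table holds the published Kyte-Doolittle hydropathy values, the 9-residue window comes from the slide, the example sequence is the one shown on the "Text Domain Equivalent" slide, and the function name `window_hydrophobicity` is a hypothetical helper.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def window_hydrophobicity(seq, window=9):
    """Mean KD hydropathy of every full window; one value per window start."""
    if window > len(seq):
        return []
    scores = [KD[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]

seq = "VQLAHHFSEPEITLIIFGVMAGVIGTILLISYGIRRLIKK"
profile = window_hydrophobicity(seq)

# Report the most hydrophobic window as a crude TM candidate.
peak = max(range(len(profile)), key=profile.__getitem__)
print(peak, seq[peak:peak + 9], round(profile[peak], 2))
```

A threshold on this profile gives a crude segment call, which illustrates the limitation the slide names: the averaged curve does not say where a segment starts or ends.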

8 Hidden Markov Model Methods (TMHMM)
  Current best methods use HMMs
  Limitations: too many parameters and restrictive topology
  [Figure: actual vs. predicted topology, potassium channel]

9 TMpro: property-based algorithm for transmembrane helix prediction

10 Opportunities for Improvement
  Previous methods:
    Do not employ all possible property distributions
    Find average occurrences of amino acids
  [Figure panels, amino acid properties: nonpolar residues, charged residues, aromatic residues]

11 Properties We Studied

12 Modified Representation of Primary Sequence
  Amino acid property sequences: charge, polarity, aromaticity, size, electronic properties
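Recoding a primary sequence as property sequences can be sketched as below. The residue groupings are illustrative assumptions, not taken from the slides: histidine is treated as positively charged and a common polar/nonpolar split is used; TMpro's actual tables may differ. The function names and the symbols `p`, `n` and `-` are likewise hypothetical.

```python
# Assumed residue groupings (illustrative only; not from the slides).
POSITIVE = set("KRH")        # histidine counted as positive here
NEGATIVE = set("DE")
POLAR = set("RNDQEHKSTY")    # every other residue is treated as nonpolar

def charge_sequence(seq):
    """Recode residues as 'p' (positive), 'n' (negative) or '-' (neutral)."""
    return "".join(
        "p" if aa in POSITIVE else "n" if aa in NEGATIVE else "-"
        for aa in seq
    )

def polarity_sequence(seq):
    """Recode residues as 'p' (polar) or '-' (nonpolar)."""
    return "".join("p" if aa in POLAR else "-" for aa in seq)

window = "VQLAHHFSEPEITLI"
print(window)
print(charge_sequence(window))
print(polarity_sequence(window))
```

Each property yields one parallel string of the same length as the protein, so the same window of residues can be read in several small alphabets at once.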

13 Predictive Capability of Each Property
  Adjust the parameters of TMHMM (v 1.0) so that it emits one of the property values
  Properties considered:
    Polarity: polar, non-polar
    Aromaticity: aromatic, aliphatic, neutral
    Electronic properties: strong donor, weak donor, neutral, weak acceptor, strong acceptor
  Three-valued property observations achieve 91% of the accuracy of 20-valued amino acid observations

14 Approach: Biology-Language Analogy
  Mapping (Language -> Biology):
    Raw text stored in databases, libraries, websites -> Multiple genome sequences
    Meaning of words, sentences, phrases, paragraphs -> Expression, folding, structure, function and activity of proteins
    Knowledge about a topic -> Understand complex biological systems
  Tasks: retrieval, summarization, translation, extraction, decoding
  Ganapathiraju et al. (2004), LNCS 3345

15 Text Domain Equivalent: Documents and Words
  Words: property values
  Documents: 15-residue windows
  Example sequence: VQLAHHFSEPEITLIIFGVMAGVIGTILLISYGIRRLIKK
  Example property encodings of one window:
    ----ppn-n-n----
    -p--pp-p----p--
    -.-.RRR....-.--
    OOO.OOO.O.OOoOO
  Word list:
    W1: positively charged
    W2: polar
    W3: nonpolar
    W4: aromatic
    W5: aliphatic
    W6: strong electron acceptor
    W7: strong electron donor
    W8: weak electron acceptor
    W9: weak electron donor
    W10: medium sized
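The document/word framing can be made concrete by counting how often each property "word" occurs in each 15-residue "document". The sketch below assumes NumPy and uses a hypothetical six-word vocabulary with assumed residue groupings; it stands in for, and does not reproduce, the word list above, whose residue assignments are not given in the slides.

```python
import numpy as np

# Hypothetical vocabulary: word -> residues counted as that word (assumed).
WORDS = {
    "positive": set("KRH"),
    "negative": set("DE"),
    "polar": set("RNDQEHKSTY"),
    "nonpolar": set("AVLIMFWPGC"),
    "aromatic": set("FWY"),
    "aliphatic": set("AVLI"),
}

def windows(seq, size=15, step=1):
    """All full-length windows ('documents') of the sequence."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, step)]

def word_document_matrix(seq, size=15):
    """Rows are words, columns are windows, entries are occurrence counts."""
    docs = windows(seq, size)
    W = np.zeros((len(WORDS), len(docs)))
    for j, doc in enumerate(docs):
        for i, members in enumerate(WORDS.values()):
            W[i, j] = sum(aa in members for aa in doc)
    return W

seq = "VQLAHHFSEPEITLIIFGVMAGVIGTILLISYGIRRLIKK"
W = word_document_matrix(seq)
print(W.shape)
```

Because one residue can carry several words (for example, polar and positive), the counts in a column can sum to more than the window length.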

16 Latent Semantic Analysis
  Build the word-document matrix W (words by documents)
  Decompose: W = U S V^T
  Reduced dimensions: 4
  For classification, S V^T can be used as feature vectors
  [Figure: Dimension 1 vs. Dimension 2; distinct features of TM and non-TM achieved]
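The decomposition and dimension reduction on this slide can be sketched with NumPy's SVD. The matrix below is random stand-in data, not TMpro's training matrix, and `lsa_features` is a hypothetical helper; only the factorization W = U S V^T, the use of S V^T as features, and the choice of 4 dimensions come from the slide.

```python
import numpy as np

def lsa_features(W, k=4):
    """Truncated SVD of a word-document matrix.

    Returns (features, U_k): features is S_k @ V_k^T with one column per
    document, and U_k holds the top-k left singular vectors.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    k = min(k, len(s))
    features = np.diag(s[:k]) @ Vt[:k]
    return features, U[:, :k]

# Stand-in data: 10 words by 50 documents of random counts.
rng = np.random.default_rng(0)
W = rng.integers(0, 16, size=(10, 50)).astype(float)

features, U_k = lsa_features(W, k=4)
print(features.shape)

# A new document's count vector d can be mapped into the same
# reduced space as U_k.T @ d.
new_doc = W[:, 0]
print(U_k.T @ new_doc)
```

Since U_k^T W equals S_k V_k^T, projecting with `U_k.T` places unseen windows in the same space as the training documents, which is what a downstream classifier needs.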

17 Different Classifiers/Models
  Support vector machines; neural networks; linear classifier; hidden Markov modeling; decision trees
  The neural network with LSA features is called TMpro

18 Evaluations
  Benchmark server results: http://cubic.bioc.columbia.edu/services/tmh_benchmark/
  [Figure annotation: uses evolutionary information and many more model parameters]
  Evaluation on larger datasets

19 TMpro Web Interface
  Novel features for manual annotation
  http://linzer.blm.cs.cmu.edu/tmpro/

20 Acknowledgements
  Co-authors: Judith Klein-Seetharaman, Raj Reddy, N. Balakrishnan
  Web-site development: Christopher Jon Jursa, Hassan A. Karimi

21 Thank you!

22 Larger training data does not improve TMHMM
  STMHMM is TMHMM trained with 145 recent TM proteins

23 Performance on Recent Large Dataset

     Method         Qok   F score   Qhtm %obs   Qhtm %prd   Q2   Confusion with soluble
  PDBTM (191 proteins, 789 TM segments)
  1  TMHMM           68        90          89          90   84   13
  2  SOSUI           60        87          86          87        17
  3  DAS TMfilter    62        90                      91   85   10
  4  TMpro NN        57        93                           81    2
  MPtopo (101 proteins, 443 TM segments)
  6  TMHMM           66        91          89          94   84    5
  7  SOSUI           68        89          91          87   82    7
  8  DAS TMfilter    66        88          87          90   78    5
  9  TMpro NN        60        93          92          95   79    1
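For readers unfamiliar with the column names, Q2 is the per-residue two-state accuracy, and the Qhtm columns are computed over whole segments. The sketch below shows Q2 and the segment extraction that segment-level scores start from. It is an illustration, not the benchmark server's code: the label alphabet (`M` for membrane, `-` for non-membrane) and the function names are assumptions, and no particular segment-overlap criterion is implied.

```python
def q2(observed, predicted):
    """Percentage of residues whose two-state label (TM / non-TM) matches."""
    if len(observed) != len(predicted):
        raise ValueError("label strings must have equal length")
    matches = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * matches / len(observed)

def segments(labels, tm="M"):
    """(start, end) index pairs, end exclusive, for each run of TM labels."""
    segs, start = [], None
    for i, label in enumerate(labels):
        if label == tm and start is None:
            start = i
        elif label != tm and start is not None:
            segs.append((start, i))
            start = None
    if start is not None:
        segs.append((start, len(labels)))
    return segs

observed = "--MMMM--MM"
predicted = "--MMM---MM"
print(q2(observed, predicted))
print(segments(observed), segments(predicted))
```

Segment-level recall (%obs) and precision (%prd) would then compare the two segment lists under whatever overlap rule the benchmark defines.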

