Alpha-helical transmembrane protein fold prediction using residue contacts Timothy Nugent and David Jones Bioinformatics Group, Department of Computer.


Alpha-helical transmembrane protein fold prediction using residue contacts

Timothy Nugent and David Jones
Bioinformatics Group, Department of Computer Science, University College London, Gower Street, London WC1E 6BT

Introduction

Alpha-helical transmembrane (TM) proteins constitute roughly 30% of a typical genome and are involved in a wide variety of important biological processes, including cell signaling, transport of membrane-impermeable molecules and cell recognition. Many are also prime drug targets, and it has been estimated that more than half of all drugs currently on the market target membrane proteins.

Despite significant efforts to predict TM protein topology, little attention has been directed toward developing a method to pack the helices together. Since the membrane-spanning region is predominantly composed of alpha-helices with a common alignment, this task should, in principle, be easier than predicting the fold of globular proteins. However, structural features including re-entrant, tilted and kinked helices mean that simple lattice models, which may work for regularly packed proteins, cannot model the diverse packing arrangements now present in structural databases.

We present a novel method to predict lipid exposure, residue-residue contacts, helix-helix contacts and finally the helical packing arrangement of TM proteins, benchmarked with full cross-validation on a data set of 74 sequences, each containing at least two TM helices, and all with crystal structures available.

Predicting lipid exposure

To predict lipid exposure, PSI-BLAST profile data with a sliding window approach were used to train a support vector machine (SVM) classifier. To label training examples, we used the Coarse Grained Database, a repository of molecular dynamics simulation data that gives the proportion of simulation time for which each residue is exposed to lipid.
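The sliding window step can be illustrated with a short sketch. This is not the authors' code: the function name `window_features`, the zero-padding at the sequence ends and the toy two-column profile are all assumptions made for illustration (a real PSI-BLAST profile has one column per amino acid type), and the window width used for lipid exposure is not stated in the poster, so it is left as a parameter.

```python
def window_features(profile, centre, width):
    """Flatten the profile rows in a window centred on residue `centre`.

    `profile` is a list of per-residue rows (one score per column).
    Positions that fall off either end of the sequence are zero-padded,
    which is an assumption of this sketch, not a detail from the poster.
    """
    if width % 2 == 0:
        raise ValueError("width must be odd so the window has a centre")
    n_cols = len(profile[0])
    half = width // 2
    features = []
    for pos in range(centre - half, centre + half + 1):
        if 0 <= pos < len(profile):
            features.extend(profile[pos])
        else:
            features.extend([0.0] * n_cols)
    return features


# Toy profile with three residues and two columns per residue.
toy_profile = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
first = window_features(toy_profile, 0, width=3)
middle = window_features(toy_profile, 1, width=3)
```

Each residue then contributes one fixed-length feature vector, which is the form an SVM classifier expects as input.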
All residues within the membrane that were exposed to lipid for more than half of the simulation time were labelled as positive training examples. Under stringent cross-validation using a jackknife test, we were able to predict lipid exposure for each residue with 70% accuracy and a Matthews correlation coefficient (MCC) of [value not preserved]. This compares favourably with results from the LIPS server when benchmarked on the same test set (62% accuracy, MCC 0.23).

Predicting residue contacts and helix-helix interactions

Using topology information derived from crystal structures, we analysed interactions between residues on different TM helices. For this study, we only considered interactions within a single chain, rather than between chains. To define residue-residue contacts, and to compare our approach with other methods, three contact definitions were used to label data: (i) backbone/side chain heavy atoms are within 5.5 Å, (ii) C-beta atoms are within 6 Å or the distance between the interacting pair is less than the sum of their VDW radii [value not preserved] Å and (iii) C-beta atoms are within 8 Å (C-alpha for glycine).

To predict residue contacts, we also used an SVM classifier. Features included a 7-residue window centred on each residue in the interacting pair (again using PSI-BLAST profiles), predicted lipid exposure for each residue, and a number of sequence-based statistics. These included a binary vector representing the distance between residues, and values representing the relative position of each residue in its TM helix, equivalent to a Z coordinate. SVM training files were roughly balanced, with a positive/negative ratio of 1:1.25. Results are shown in Table 1.

Table 1. Residue-residue contact prediction results. L5 = top L/5 scoring predictions assessed, where L is the combined length of all TM helices. Our method is labelled 'TM Contact Predictor'.

To assess helix-helix interactions, one residue-residue contact pair was required to be correctly predicted.
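As a rough illustration of contact definition (iii) and of the L/5 scoring described in the Table 1 caption, the sketch below labels residue pairs from C-beta coordinates and computes precision over the top L/5 predictions. The function names, the dictionary-based inputs and the toy coordinates are hypothetical, and the SVM that would produce the scores is not shown.

```python
import math


def is_contact(coord_a, coord_b, cutoff=8.0):
    """Definition (iii): C-beta atoms (C-alpha for glycine) within 8 Å."""
    return math.dist(coord_a, coord_b) <= cutoff


def label_contacts(cb_coords, helix_of, cutoff=8.0):
    """Return the set of contacting residue pairs on different TM helices.

    cb_coords maps residue number -> (x, y, z); helix_of maps residue
    number -> helix index. Both layouts are assumptions of this sketch.
    """
    residues = sorted(cb_coords)
    contacts = set()
    for i, a in enumerate(residues):
        for b in residues[i + 1:]:
            if helix_of[a] != helix_of[b] and is_contact(
                cb_coords[a], cb_coords[b], cutoff
            ):
                contacts.add((a, b))
    return contacts


def top_l5_precision(scored_pairs, true_contacts, total_tm_length):
    """Precision over the top L/5 scoring predictions.

    scored_pairs is a list of (score, pair); L is the combined length
    of all TM helices.
    """
    n_top = max(1, total_tm_length // 5)
    ranked = sorted(scored_pairs, key=lambda item: item[0], reverse=True)
    ranked = ranked[:n_top]
    if not ranked:
        return 0.0
    hits = sum(1 for _, pair in ranked if pair in true_contacts)
    return hits / len(ranked)


# Toy example: residues 1-2 on helix 0, residues 30-31 on helix 1.
cb = {1: (0.0, 0.0, 0.0), 2: (3.8, 0.0, 0.0),
      30: (0.0, 7.5, 0.0), 31: (20.0, 0.0, 0.0)}
helices = {1: 0, 2: 0, 30: 1, 31: 1}
observed = label_contacts(cb, helices)
```

Only pairs on different helices are labelled, matching the poster's focus on inter-helix interactions within a single chain.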
The helix-helix data set contained 593 interacting helices and 815 non-interacting helices. Helix-helix prediction results are shown in Table 2.

[Results table; numeric values not preserved. Columns: Method, Contact, Precision, FPR, FNR, MCC. Methods: TM Contact Predictor, TMHCon, TMHIT, SVMCon and ProfCon, some scored on the top L/5 predictions (L5). Contact definitions: 5.5 Å, 6 Å, 8 Å.]

Table 2. Helix-helix interaction results. + No cross-validation on 41 sequences common to the TMHIT training set.

Results show a significant improvement in accuracy and MCC scores against all methods other than TMHIT. TMHIT was, however, trained on 41 sequences that were common to our test set, so its score is likely to be an overestimate, as we were unable to cross-validate TMHIT results. In the absence of cross-validation for these 41 sequences, our method performs substantially better. Full cross-validation on a smaller test set of 14 sequences resulted in accuracies of 68.4% for our method and 66.5% for TMHIT.

Despite significant efforts to predict alpha-helical transmembrane protein topology, relatively little attention has been directed towards developing a method to pack the helices together. We present a novel approach that uses predicted lipid exposure, residue contacts, sequence statistics and a force-directed algorithm to find the optimal helix packing arrangement.

This work was funded by the Biotechnology and Biological Sciences Research Council, and supported by the BioSapiens project, which is funded by the European Commission within its FP6 Programme, under the thematic area "Life sciences, genomics and biotechnology for health," contract number LSHG-CT

A graph-based approach to find the optimal helix packing arrangement

To find the optimal helix packing arrangement, the structure is represented as a graph with helices forming vertices and helix-helix interactions forming edges. By employing a force-directed algorithm, the method attempts to minimise edge crossings while maintaining uniform edge lengths, attributes common in native structures.
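The poster does not say which force-directed algorithm was used. As one possible illustration, the sketch below applies a Fruchterman-Reingold-style layout to a helix graph: every pair of vertices repels, connected vertices attract, and a cooling step size limits movement. The function name, parameters and starting temperature are assumptions chosen for the example.

```python
import math
import random


def force_directed_layout(n_vertices, edges, iterations=300, k=1.0, seed=0):
    """Place helices (vertices) in 2D so that interacting helices (edges)
    end up roughly k apart. Fruchterman-Reingold-style sketch."""
    rng = random.Random(seed)
    pos = [[rng.random(), rng.random()] for _ in range(n_vertices)]
    t0 = 0.1 * k  # initial maximum step size (assumed value)
    for step in range(iterations):
        disp = [[0.0, 0.0] for _ in range(n_vertices)]
        # Repulsion between every pair of vertices: magnitude k^2 / d.
        for v in range(n_vertices):
            for u in range(v + 1, n_vertices):
                dx = pos[v][0] - pos[u][0]
                dy = pos[v][1] - pos[u][1]
                dist = max(math.hypot(dx, dy), 1e-9)
                force = k * k / dist
                disp[v][0] += dx / dist * force
                disp[v][1] += dy / dist * force
                disp[u][0] -= dx / dist * force
                disp[u][1] -= dy / dist * force
        # Attraction along each edge: magnitude d^2 / k.
        for u, v in edges:
            dx = pos[v][0] - pos[u][0]
            dy = pos[v][1] - pos[u][1]
            dist = max(math.hypot(dx, dy), 1e-9)
            force = dist * dist / k
            disp[v][0] -= dx / dist * force
            disp[v][1] -= dy / dist * force
            disp[u][0] += dx / dist * force
            disp[u][1] += dy / dist * force
        # Move each vertex, capped by a linearly cooling temperature.
        t = t0 * (1.0 - step / iterations)
        for v in range(n_vertices):
            length = max(math.hypot(disp[v][0], disp[v][1]), 1e-9)
            scale = min(length, t) / length
            pos[v][0] += disp[v][0] * scale
            pos[v][1] += disp[v][1] * scale
    return pos


# Four helices interacting in a chain: 0-1, 1-2, 2-3.
layout = force_directed_layout(4, [(0, 1), (1, 2), (2, 3)])
```

Balancing repulsion against edge attraction tends to give similar edge lengths and few crossings, the two properties the poster describes as common in native structures.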
Once the helix positions are determined, a genetic algorithm is used to rotate all helices such that the sum of predicted residue-residue contact distances is minimised. Results for Halorhodopsin are shown in Figure 1.

Figure 1. Predicted packing arrangement for Halorhodopsin [PDB: 1E1K] (above) and PDB structure with observed helix-helix interactions labelled (below).

Conclusions

Our results demonstrate that predicted lipid exposure data, combined with evolutionary information and sequence-based statistics, can be used to accurately predict the packing arrangement of TM proteins. This method can be used to gain insights into TM protein folding and direct further experimental work.

[Results table; layout not recoverable. Columns: Method, Contact, Precision, FPR, FNR, MCC, Accuracy. Methods: TM Contact Predictor, TM Contact Predictor+, TMHCon, TMHIT, SVMCon and ProfCon, some scored on the top L/5 predictions (L5). Contact definitions: 5.5 Å, 6 Å, 8 Å. Surviving values, in transcript order: [...]%, 52.3%, 66.7%, 82.6%, 73.2%, 66.7%, 59.3%, 59.5%, 45.4%, 62.0%.]
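The helix rotation step from the packing section, where a genetic algorithm minimises the summed distance between predicted contacting residues, might look roughly like the sketch below. The geometry is a deliberately simplified, hypothetical model: helices are fixed points in 2D, residues sit on a circle around the helix centre at 100° per residue (about 3.6 residues per turn), and `RADIUS`, the population size, the mutation width and the selection scheme are arbitrary illustrative choices, not details from the poster.

```python
import math
import random

RADIUS = 5.0                 # assumed residue distance from helix axis, in Å
TURN = math.radians(100.0)   # angular step per residue in an alpha-helix


def residue_xy(centre, angle, index):
    """2D position of residue `index` on a helix rotated by `angle`."""
    phi = angle + index * TURN
    return (centre[0] + RADIUS * math.cos(phi),
            centre[1] + RADIUS * math.sin(phi))


def contact_score(angles, centres, contacts):
    """Sum of distances over predicted contacts (helix_a, res_a, helix_b, res_b)."""
    total = 0.0
    for ha, ra, hb, rb in contacts:
        ax, ay = residue_xy(centres[ha], angles[ha], ra)
        bx, by = residue_xy(centres[hb], angles[hb], rb)
        total += math.hypot(ax - bx, ay - by)
    return total


def optimise_rotations(centres, contacts, pop_size=30, generations=100, seed=0):
    """Search for helix rotation angles that minimise contact_score."""
    rng = random.Random(seed)
    n = len(centres)

    def fitness(individual):
        return contact_score(individual, centres, contacts)

    # Start from the unrotated arrangement plus random individuals.
    population = [[0.0] * n]
    population += [[rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
                   for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=fitness)
        survivors = population[:pop_size // 2]  # elitism: best half is kept
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            child = [rng.choice(genes) for genes in zip(a, b)]  # uniform crossover
            i = rng.randrange(n)                                # mutate one angle
            child[i] = (child[i] + rng.gauss(0.0, 0.3)) % (2.0 * math.pi)
            children.append(child)
        population = survivors + children
    return min(population, key=fitness)


# Two helices 10 Å apart with two predicted inter-helix contacts.
centres = [(0.0, 0.0), (10.0, 0.0)]
contacts = [(0, 0, 1, 0), (0, 4, 1, 3)]
best = optimise_rotations(centres, contacts)
```

Because the unrotated arrangement is seeded into the first generation and the best half of each generation is kept unchanged, the returned rotation never scores worse than leaving the helices as they are.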