Improved prediction of protein-protein binding sites using a support vector machine (James Bradford et al., 2004). Tapan Patel, CISC841.

Improved prediction of protein-protein binding sites using a support vector machine (James Bradford et al., 2004). Tapan Patel, CISC841. [Figures: trypsin with its inhibitor binding site; thermitase]

Motivation: annotation of unknown proteins
- Protein structures exist whose function is not known.
- Knowing the possible binding sites can hint at a protein's function.
- Main reason: the residue makeup of a binding site gives important information about the enzymatic reactions and organic mechanisms that are possible.

Motivation: pharmaceutical applications
- Designing an inhibitor that can occupy the binding site of a harmful protein, e.g. HIV protease.
- Predicting possible binding sites reduces the search space for a biologist trying to identify a binding site by mutagenesis experiments.

Is this even possible?
- There are thousands of proteins, each with a unique 3D structure. How can we even begin to predict whether a region of protein surface is a binding site?
- Protein-protein interfaces have several fundamental properties that differ from the rest of the protein surface, so we can use these properties to predict interface regions.
- A solution is possible!

Overall Method
- Generate training data
- Calculate solvent-excluded surface
- Label each surface vertex with seven chemical, geometrical, or physical properties
- Define true binding site
- Generate patches (interface and non-interface patches for training)
- Calculate patch attributes
- Train SVM / predict

Training data
- Comprehensive set of complexes chosen from the PDB:
  - Homodimers, enzymes, obligomers, transient complexes
  - Heterodimers, inhibitors, etc.
- Filtered to avoid redundancy (which would bias the SVM):
  - Proteins with more than 20% structural similarity removed
  - Documented in vivo interaction required (true positive)
  - Complex interfaces, such as those spanning more than one chain or having more than one binding site, removed (to keep things simple)

After filtering: 180 proteins remain in total: 36 enzyme-inhibitor, 27 hetero-obligate, 87 homo-obligate, 30 transient.

Surface generation
- From the 3D structure, the solvent-excluded surface (SES) is generated with a probe sphere of radius 1.5 Å (MSMS code).
- SAS (solvent-accessible surface): traced by the center of the rolling probe (solvent).
- SES: the contour inaccessible to the probe.

Patch generation
- Patch: a small region of protein surface containing surface vertices (atoms on the surface).
- Interface patch:
  - Circular
  - Centered on the center of the binding site
  - Radius = 0.08 * size of the smallest protein (in actuality, interface size is about 13% of the size of the smallest protein)
- Non-interface patch:
  - Same as an interface patch, but with the center randomly selected from the set of non-interface vertices
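The patch construction can be sketched in a few lines. This is only an illustration under stated assumptions: vertices are plain 3D coordinate tuples, "circular" is approximated by straight-line (Euclidean) distance from the center vertex, and the radius is passed in directly rather than derived from protein size. The function names `make_patch` and `random_non_interface_patch` are hypothetical and do not come from the slides.

```python
import math
import random

def make_patch(vertices, center_idx, radius):
    """Return indices of all vertices within `radius` of the center vertex.

    Assumption: Euclidean distance stands in for distance along the surface.
    """
    center = vertices[center_idx]
    return [i for i, v in enumerate(vertices) if math.dist(v, center) <= radius]

def random_non_interface_patch(vertices, interface_idx, radius, rng):
    """Pick a random non-interface vertex as center and build a patch around it."""
    candidates = [i for i in range(len(vertices)) if i not in interface_idx]
    center_idx = rng.choice(candidates)
    return center_idx, make_patch(vertices, center_idx, radius)

# Toy surface: two neighbouring interface vertices and one distant vertex.
toy_vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
interface_patch = make_patch(toy_vertices, 0, 1.5)
center, non_interface_patch = random_non_interface_patch(
    toy_vertices, {0, 1}, 1.5, random.Random(0)
)
```

In the toy example the only non-interface vertex is index 2, so the random choice is forced and the resulting patch contains just that vertex.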

So far we have:
- A 3D structure with a known binding site
- A generated solvent-excluded surface with annotated patch regions
[Figure labels: non-interface patch, interface patch, actual binding site]

Properties for distinction
Seven properties are used to distinguish the binding site from the rest of the protein:
- Surface shape (shape index, curvedness)
- Conservation
- Electrostatic potential
- Hydrophobicity
- Residue interface propensity
- Solvent accessible surface area (ASA)

Conservation
- Residues are more conserved at binding sites than at non-binding sites.
- For a given protein in the training set, run a BLAST search to identify homologous sequences, then build a multiple sequence alignment (MSA) with CLUSTAL W.
- Clusters of conserved residues may characterize a functional site.

Conservation
[Figure: conservation at the BPTI binding site on trypsin (PDB code: 2ptc). Color scale: conserved, intermediate, variable. Plot: conservation for non-interface vs. interface.]

Surface shape
- Shape index: describes the shape of the local surface. Ranges from -1 (concave) through 0 (flat) to 1 (convex).
- Curvedness: curvature (change in tangent-tangent correlation vector).
[Figure: trypsin inhibitor (PDB: 1tab). Shape index scale: convex, flat, concave. Curvedness scale: highly curved, curved, less curved. Plots: shape index and curvedness for non-interface vs. interface.]
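The slide does not give formulas for these two descriptors. One common way to compute them is from the two principal curvatures at a vertex (the Koenderink and van Doorn definitions), sketched below. This is an assumption about the definitions used, not something the slide states. The sign convention assumed here is that convex regions have positive curvature, which matches the -1 (concave) to 1 (convex) range on the slide.

```python
import math

def shape_index(k1, k2):
    """Shape index in [-1, 1] from two principal curvatures.

    Assumed convention: positive curvature means convex. atan2 is used so that
    equal curvatures (sphere-like caps and cups) and the perfectly flat case
    k1 == k2 == 0 are handled without a division by zero.
    """
    k_max, k_min = max(k1, k2), min(k1, k2)
    return (2.0 / math.pi) * math.atan2(k_max + k_min, k_max - k_min)

def curvedness(k1, k2):
    """Magnitude of curvature: zero for a plane, larger for tighter bends."""
    return math.sqrt((k1 * k1 + k2 * k2) / 2.0)

cap = shape_index(1.0, 1.0)     # sphere-like bump
cup = shape_index(-1.0, -1.0)   # sphere-like dent
flat = shape_index(0.0, 0.0)    # plane (atan2(0, 0) evaluates to 0)
```

Shape index describes the kind of shape independent of scale, while curvedness captures how strongly the surface bends, so the two are complementary.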

Electrostatic potential
- The interface region may be especially positively or negatively charged, which stabilizes the complex upon binding a polar partner.
[Figure: eglin c binding site on thermitase. Color scale: positive, neutral, negative. Plot: electrostatic potential for non-interface vs. interface. The positive potential on eglin c is complementary to the negative potential on its partner thermitase (right).]

Other properties
- Hydrophobicity: use an existing scale.
- Residue interface propensity = [formula shown on the slide is missing from the transcript]
- Each patch has many vertices, and the 7 properties are calculated for each vertex, so we take the mean and standard deviation of each property over the patch. This gives a total of 14 SVM attributes per patch.
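The step from 7 per-vertex properties to 14 per-patch attributes can be illustrated as follows. This is a minimal sketch: the function name `patch_attributes` is hypothetical, and the use of the population standard deviation (rather than the sample standard deviation) is an assumption, since the slide does not say which was used.

```python
from statistics import fmean, pstdev

def patch_attributes(vertex_properties):
    """Collapse per-vertex property tuples into one attribute vector per patch.

    `vertex_properties` is a list with one 7-tuple per vertex. The result is
    the 7 means followed by the 7 standard deviations (14 values in total).
    """
    columns = list(zip(*vertex_properties))   # one column per property
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) for col in columns]   # assumption: population std
    return means + stds

# Toy patch with two vertices whose seven properties are all 0 or all 2.
toy_patch = [(0.0,) * 7, (2.0,) * 7]
attrs = patch_attributes(toy_patch)
```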

Train SVM
- Use the mySVM software (Rüping 2000) with a radial kernel, $\Phi = \exp(-0.01\,r^2)$.
- Use labeled interface and non-interface patches (each with 14 attributes) to train the SVM to distinguish interacting patches from non-interacting patches.
- At the end, rank the patches by confidence value and filter out overlapping patches (>10% residue similarity).
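The kernel and the final filtering step can be sketched as below. This is not mySVM code. It assumes that r is the Euclidean distance between two 14-attribute vectors, that patches are sets of residue identifiers already sorted best-first by SVM confidence, and that "residue similarity" means shared residues as a fraction of the smaller patch. The last point in particular is an assumption, because the slide does not define the overlap measure.

```python
import math

def rbf_kernel(x, y, gamma=0.01):
    """Radial kernel exp(-gamma * r^2), with r the Euclidean distance."""
    r_squared = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * r_squared)

def filter_overlapping(ranked_patches, max_overlap=0.10):
    """Greedily keep patches, best first, dropping any that overlap a kept one.

    Assumption: overlap = shared residues / size of the smaller patch.
    Patches are expected to be non-empty sets.
    """
    kept = []
    for patch in ranked_patches:
        overlaps = (len(patch & k) / min(len(patch), len(k)) for k in kept)
        if all(o <= max_overlap for o in overlaps):
            kept.append(patch)
    return kept

ranked = [set(range(0, 10)), set(range(5, 15)), set(range(20, 30))]
kept = filter_overlapping(ranked)   # the second patch shares 50% with the first
```

A greedy pass keeps the highest-ranked patch from each group of overlapping candidates, so the surviving list stays in rank order.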

Leave-one-out cross-validation
Assess the accuracy of the trained SVM by:
- Taking one protein out of the training set
- Training the SVM on the reduced training set (smaller by 1)
- Using this SVM to predict the binding site of the held-out protein, whose true binding site is known
- Applying specificity and sensitivity measures to each predicted patch:
  - Specificity = # of interface residues in patch / # of patch residues (the proportion of patch residues that are interface residues)
  - Sensitivity = # of interface residues in patch / # of interface residues (the proportion of interface residues that are included in the patch)
- Repeating for each protein in the set
We want high specificity and reasonable sensitivity. A prediction counts as a success if a patch with >50% specificity and >20% sensitivity is ranked in the top three.
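The two measures and the success criterion translate directly into code. The sketch below follows the slide's definitions (its "specificity" is what is often called precision elsewhere). The function names are hypothetical, and patches and interfaces are assumed to be non-empty sets of residue identifiers.

```python
def patch_scores(patch_residues, interface_residues):
    """Return (specificity, sensitivity) for one predicted patch."""
    hits = len(patch_residues & interface_residues)
    specificity = hits / len(patch_residues)       # fraction of patch that is interface
    sensitivity = hits / len(interface_residues)   # fraction of interface covered
    return specificity, sensitivity

def is_success(ranked_patches, interface_residues,
               top_n=3, min_spec=0.5, min_sens=0.2):
    """True if any of the top-ranked patches passes both thresholds."""
    for patch in ranked_patches[:top_n]:
        spec, sens = patch_scores(patch, interface_residues)
        if spec > min_spec and sens > min_sens:
            return True
    return False

true_interface = set(range(10))
good_patch = {0, 1, 2, 3, 10, 11}   # 4 of 6 residues lie in the interface
bad_patch = {50, 51}
spec, sens = patch_scores(good_patch, true_interface)
```

Only the first three patches in the ranking are inspected, so a good patch ranked fourth does not count as a success.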

Leave-one-out: overall success
- Able to predict the location of the interface on 76% of proteins (136/180)
- 64% (23/36) for enzyme-inhibitor
- 82% (93/114) for obligate binding sites
- 65% (43/66) for transient (the 36 enzyme-inhibitor plus the 30 other transient complexes)
- The SVM may be biased towards obligate interfaces because of their large number in the training set, or transient interfaces may simply be harder to predict.

Heterogeneous cross-validation
- Train the SVM on obligate-type proteins only and predict on transient types, and vice versa.
- The success rate is comparable to leave-one-out.
- This implies that transient and obligate interfaces share enough similarity to be distinguished from non-interacting parts of the surface.

Unbound proteins
- The SVM was originally trained on proteins in their bound states.
- In practice, the crystal structure of an unknown protein is usually in its unbound state. Can our SVM successfully predict binding sites on such unbound structures?
- Tested on enzyme-inhibitor complexes:
  - Take an enzyme in its unbound form and predict the binding site.
  - Compare the prediction with the known binding site of the same enzyme in the enzyme-inhibitor complex.
- Overall, the SVM predicts interfaces on unbound proteins well.

Conclusion
- Developed an SVM-based method for predicting protein-protein binding sites.
- 14 attributes are used in prediction; using a greater number of attributes may increase the success rate.
- An improvement over older methods, which could predict only obligate or only transient binding sites. This method can predict both.
- Limitation: patches that matched the interface in size and shape were rarely produced, which limits specificity and sensitivity. A better way of estimating patch size would improve results.