A new protein-protein docking scoring function based on interface residue properties Reporter: Yu Lun Kuo (D95922037)

Presentation transcript:

A new protein-protein docking scoring function based on interface residue properties Reporter: Yu Lun Kuo (D ) Date: November 10, 2008 Bioinformatics Vol. 23 no (Pages )

Introduction
A protein-protein docking procedure traditionally consists of two successive tasks:
– A search algorithm generates a large number of candidate solutions by sampling the rotational/translational degrees of freedom, so that the two partners contact each other in many different orientations
– A scoring function is used to rank them; the best solutions are selected by evaluating a score

Introduction
The scoring function expresses:
– The geometric complementarity of the two molecular surfaces in contact
– The strength of the interaction, based on the physico-chemical characteristics of the amino acids in contact with each other

Introduction
The formation of a complex may:
– Concern only the side-chain conformation of amino acid residues at the interface
– Imply motions of the protein backbone, the nature and amplitude of which remain very difficult to predict
Unbound docking predictions are therefore much more difficult than bound ones

Introduction
Predicting protein-protein complexes in a complete genome with thousands of genes requires a docking procedure that is:
– Very reliable: the score of the best solution is high, and the best solution is close to the native complex
– Very fast: the inspection of a whole genome requires the modeling of many hundreds of thousands of potential complexes

Methods
Voronoi diagrams: each Voronoi cell
– Includes all points of space that are closer to the cell centroid than to any other centroid
– Is the smallest polyhedron defined by the bisecting planes between its centroid and all others
The Delaunay tessellation is obtained by tracing the edges that join the centroids of cells sharing a common face
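The defining property of a Voronoi cell can be illustrated with a minimal sketch: a point belongs to the cell of whichever centroid is closest to it. The function name `voronoi_cell_of` and the toy coordinates are illustrative assumptions, not part of the presentation, and a real tessellation would be built with a computational-geometry library rather than by brute force.

```python
import math

def voronoi_cell_of(point, centroids):
    """Return the index of the centroid whose Voronoi cell contains `point`.

    By definition this is the centroid closest to the point.
    Ties (points on a bisecting plane) go to the lowest index.
    """
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))

# Hypothetical residue centroids in 3D (arbitrary units).
centroids = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 4.0, 0.0)]

print(voronoi_cell_of((1.0, 0.5, 0.0), centroids))  # nearest to centroid 0
print(voronoi_cell_of((3.0, 0.5, 0.0), centroids))  # nearest to centroid 1
```

The point (2.0, 0.0, 0.0) lies on the bisecting plane between the first two centroids, which is exactly where the two cells share a face.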

Voronoi diagrams [figure]

Tessellation Definitions (1/2)
– Two residues are neighbors if their Voronoi cells share a common face
– A residue belongs to the protein interior if all its neighbors are residues of the same protein
– A residue belongs to the protein surface if one or more of its neighbors is solvent

Tessellation Definitions (2/2)
– A residue belongs to the protein–protein interface if one or more of its neighbors belongs to the other protein
– An interface residue belongs to the core of the interface if none of its neighbors is solvent
– The cell facets shared by residues of both proteins constitute the interface
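The tessellation definitions translate directly into a neighbor-graph test. The sketch below assumes a hypothetical data model (an `owner` dictionary mapping each cell to "A", "B" or "solvent", and a `neighbors` dictionary listing the cells that share a face); these names and the toy complex are illustrative and not taken from the presentation.

```python
def classify_residue(res, owner, neighbors):
    """Return the set of labels that apply to residue `res`.

    owner: dict mapping each cell id to "A", "B" or "solvent".
    neighbors: dict mapping each cell id to the ids of cells sharing a face.
    """
    me = owner[res]
    neighbor_owners = [owner[n] for n in neighbors[res]]
    touches_solvent = "solvent" in neighbor_owners
    touches_partner = any(o not in (me, "solvent") for o in neighbor_owners)

    labels = set()
    if all(o == me for o in neighbor_owners):
        labels.add("interior")        # every neighbor is in the same protein
    if touches_solvent:
        labels.add("surface")         # at least one solvent neighbor
    if touches_partner:
        labels.add("interface")       # at least one neighbor in the partner
        if not touches_solvent:
            labels.add("interface core")  # interface residue with no solvent
    return labels

# Toy complex: three residues of protein A, two of protein B, one solvent cell.
owner = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B", "w1": "solvent"}
neighbors = {
    "a1": {"a2", "a3"},
    "a2": {"a1", "a3", "b1"},
    "a3": {"a1", "a2", "b2", "w1"},
    "b1": {"a2", "b2"},
    "b2": {"a3", "b1", "w1"},
    "w1": {"a3", "b2"},
}

for res in ("a1", "a2", "a3"):
    print(res, sorted(classify_residue(res, owner, neighbors)))
```

Returning a set rather than a single label reflects that the categories overlap: a residue can be both on the surface and at the interface.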

Voronoi Description (1p2k)
[Figure: the two protein chains and the solvent, with the Voronoi polyhedra of residues of the two proteins representing the interface]

Training Set
The training set consists of two subsets:
– Positive examples: complexes of known 3D structure, taken from the 2004 #1 release of the Protein Data Bank (PDB)
– Negative examples (decoys): generated from the positive examples using a docking procedure

Decoys
– An imperfect scoring function can mislead by predicting incorrect ligand geometries or by selecting nonbinding molecules over true ligands
– These false-positive hits may be considered decoys

Training Attributes (1/2)
The number of parameters that may be used in training is limited by the size of the training set
To define pair attributes, we grouped residue types into six categories:
– Hydrophobic (H)
– Aromatic (Φ)
– Positively charged (+)
– Negatively charged (-)
– Polar (P)
– Small (S)

Training Attributes (2/2)
The attributes include 84 parameters in six classes:
– P1: the Voronoi interface area (1 parameter)
– P2: the total number of core interface residues (1)
– P3: the number fraction of each type of core interface residue (20)
– P4: the mean volume of the Voronoi cells for the core interface residues of each type (20)
– P5: the number fraction of pairs of each category (21)
– P6: the mean centroid-centroid distance in pairs of each category (21)
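The parameter counts can be checked with a short calculation: six residue categories give 21 unordered category pairs (15 mixed pairs plus 6 same-category pairs), and the six attribute classes then add up to 84. The category labels below are plain-ASCII stand-ins for the symbols on the slide.

```python
from itertools import combinations_with_replacement

CATEGORIES = ["H", "Phi", "+", "-", "P", "S"]  # the six residue categories
N_RESIDUE_TYPES = 20                            # standard amino acid types

# Unordered pairs of categories, including same-category pairs such as (H, H).
category_pairs = list(combinations_with_replacement(CATEGORIES, 2))

parameter_counts = {
    "P1": 1,                     # Voronoi interface area
    "P2": 1,                     # number of core interface residues
    "P3": N_RESIDUE_TYPES,       # number fraction per residue type
    "P4": N_RESIDUE_TYPES,       # mean Voronoi cell volume per residue type
    "P5": len(category_pairs),   # number fraction per category pair
    "P6": len(category_pairs),   # mean centroid-centroid distance per pair
}
total = sum(parameter_counts.values())

print(len(category_pairs), total)  # 21 84
```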

Learning Methods
The values of the 84 parameters were measured on:
– The 102 native complexes (positive)
– The decoys of the training set (negative)
Three learning methods were used:
– Logistic function
– SVM (Support Vector Machines)
– ROGER (a ROc based GEnetic learner)

Learning Methods
Logistic function
– A linear combination of the parameters, with weights optimized to yield a value close to 1 on the native models and close to 0 on the decoys
– Fitted using the generalized linear model (GLM) of the R software
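How a fitted logistic function turns a parameter vector into a score between 0 and 1 can be sketched as follows. This only shows the evaluation step; the weights, bias and feature values are made-up placeholders, and the fitting itself (done with R in the presentation) is not reproduced here.

```python
import math

def logistic_score(features, weights, bias=0.0):
    """Logistic function of a linear combination of the parameters.

    Values near 1 are meant to indicate native-like models,
    values near 0 decoys.
    """
    z = bias + sum(w * x for w, x in zip(weights, features))
    # Numerically stable evaluation of 1 / (1 + exp(-z)).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# Hypothetical weights and two hypothetical candidate complexes.
weights = [0.8, -1.5, 2.0]
native_like = [2.0, 0.1, 1.5]
decoy_like = [0.2, 1.8, 0.1]

print(logistic_score(native_like, weights))
print(logistic_score(decoy_like, weights))
```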

Learning Methods
SVM (Support Vector Machine)
– Divides a high-dimensional space into regions containing only positive examples or only decoys
– Implemented with SVMTorch, which efficiently solves large-scale regression problems

Learning Methods
ROGER (ROc based GEnetic learner)
– The receiver operating characteristic (ROC) procedure is often used to evaluate learning, by cross-validation on examples
– ROGER uses a genetic algorithm to find a family of functions that optimize the ROC criterion
– The functions are parameterized by a central value ci and a weight wi
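The slide names a central value ci and a weight wi, but the formula itself did not survive in the transcript. One form consistent with those two quantities is a weighted sum of absolute deviations from the central values; the sketch below assumes that form purely for illustration and should not be read as the exact ROGER function. The function name and numbers are hypothetical.

```python
def weighted_deviation_score(x, c, w):
    """Weighted sum of absolute deviations of parameters x from centers c.

    ASSUMED functional form: sum_i w[i] * |x[i] - c[i]|.
    In a genetic learner, c and w would be the values being evolved.
    """
    if not (len(x) == len(c) == len(w)):
        raise ValueError("x, c and w must have the same length")
    return sum(wi * abs(xi - ci) for xi, ci, wi in zip(x, c, w))

# Hypothetical two-parameter example.
print(weighted_deviation_score([1.0, 2.0], [0.0, 2.0], [2.0, 5.0]))  # 2.0
```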

Results and Discussion
Performance of the learning procedures
– The ROC curve was evaluated on the training set for four different scores: the sum of the mean square deviation, the logistic function, the SVMs, and the ROGER scoring function
– A perfect selection (100% true positives and no false positives) should make the area under the ROC curve (AUC) equal to 1
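The AUC criterion can be made concrete with a minimal sketch. It uses the standard pairwise interpretation of the area under the ROC curve: the fraction of (positive, negative) pairs in which the positive example gets the higher score, with ties counted as one half. The scores below are invented for illustration.

```python
def roc_auc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison.

    Equals 1.0 when every positive outscores every negative
    (a perfect selection) and about 0.5 for a random score.
    """
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

# Hypothetical scores for native complexes and decoys.
print(roc_auc([0.9, 0.8], [0.1, 0.2]))  # 1.0: perfect separation
print(roc_auc([0.9, 0.3], [0.5, 0.1]))  # 0.75: one decoy outranks a native
```

The double loop is quadratic in the number of examples, which is fine for a sketch but would be replaced by a rank-based computation for large decoy sets.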

ROGER and SVMs did much better than the other two scores
– AUC of 0.98 and 0.99, respectively
– Very few false positives among their best scoring solutions
We retained the ROGER score for further studies, as the SVMs only give a binary classification, which is ill-suited to our problem of “finding a needle in a haystack”

Results on the targets of CAPRI rounds 3-6
We tested the scoring functions on models of the targets of CAPRI rounds 3-6, generated by two docking programs:
– DOCK (1991)
– HADDOCK (2003)

– Fnat: fraction of native contacts present in the solution
– Fint: fraction of interface residues correctly predicted
– Class 1: Fnat > 0.75
– Class 2: 0.5 < Fnat < 0.75
– Class 3: 0.25 < Fnat < 0.5
– Class 4: 0 < Fnat < 0.25
– Class 5: Fint > 0 and Fnat = 0
– Class 6: Fint = 0 and Fnat = 0
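The class definitions can be written as a small lookup function. The slide uses strict inequalities on both sides, so values falling exactly on 0.75, 0.5 or 0.25 are left unassigned; the sketch below assumes they go to the lower-quality class. The function name `solution_class` is illustrative.

```python
def solution_class(fnat, fint):
    """Map Fnat and Fint (both in [0, 1]) to a class from 1 (best) to 6.

    ASSUMPTION: boundary values (exactly 0.75, 0.5, 0.25) are placed in
    the lower-quality class, since the slide does not assign them.
    """
    if fnat > 0.75:
        return 1
    if fnat > 0.5:
        return 2
    if fnat > 0.25:
        return 3
    if fnat > 0:
        return 4
    if fint > 0:
        return 5
    return 6

# A solution with 31% of native contacts falls in class 3.
# The Fint value here is a made-up placeholder.
print(solution_class(0.31, 0.6))  # 3
```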

Results table [table not preserved; annotations only]
– Target 18: the best solution in the set has 31% of native contacts, so the best solution in the set is class 3
– Annotated columns: the best rank given by our scoring function to a solution of the best class in the set; the rank of the first solution of the second best class (class 4 in this case), which is 1; the original rank of the rank 1 solution; the number of models with ROGER ranks < 50 that belong to the best class
– The rank given by our function is better than the rank given by HADDOCK
– Example: a solution of the best class (class 2) was re-ranked 4th, and the next best class (class 3) was ranked 1st. The top 50 ROGER scores included 4 models of class 2 and 44 models of class 3, so very few, if any, of the top 50 were false positives
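The quantities annotated on this slide (best class present, best rank reached by that class, and how many models of that class appear near the top) can be computed from a scored list of models. The sketch below assumes that a higher score ranks first and uses a "top 50" cutoff; both are assumptions, and the toy data are invented.

```python
def summarize_ranking(models, top_n=50):
    """Summarize a re-ranked set of docking models.

    models: list of (score, solution_class) tuples, class 1 being best.
    ASSUMPTION: a higher score means a better rank.
    Returns (best class in the set, best rank reached by that class,
             number of models of that class within the top_n ranks).
    """
    ranked = sorted(models, key=lambda m: m[0], reverse=True)
    classes = [cls for _, cls in ranked]
    best_class = min(classes)
    best_rank = classes.index(best_class) + 1  # ranks start at 1
    in_top = classes[:top_n].count(best_class)
    return best_class, best_rank, in_top

# Toy set: three class 3 models outrank the single class 2 model.
models = [(0.9, 3), (0.8, 3), (0.7, 3), (0.6, 2), (0.5, 4)]
print(summarize_ranking(models))  # (2, 4, 1)
```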

Conclusion
– For most targets, a best or second best class solution was found in the top 10 ranking solutions
– In more than half of the cases, the top ranking solution belonged to the best or second best class
– For all targets but one, the rank given to the first best class solution by our scoring function is better than the rank given by the original method (DOCK or HADDOCK)

Thanks for your attention