Comprehensive evaluation of protein structure alignment methods: scoring by geometric measures
Rachel Kolodny, Patrice Koehl, Michael Levitt
Stanford University



1960 (Perutz): compare myoglobin with hemoglobin
Today: structures in a database (PDB), e.g. human growth hormone, human prolactin, …, oxy-myoglobin
Compare( , ): generate a list; we want an ordered list of similar structures

The Structural Alignment Problem
Comparison of structures
Two chains in $\mathbb{R}^3$: $A = (a_1, a_2, \ldots, a_n)$, $B = (b_1, b_2, \ldots, b_m)$
Find sub-chains $s_a$ and $s_b$ s.t.
– $(a_{s_a(1)}, a_{s_a(2)}, \ldots, a_{s_a(k)})$ and $(b_{s_b(1)}, b_{s_b(2)}, \ldots, b_{s_b(k)})$ are similar
– $k$ is maximal
The two goals trade off against each other

Similarity in Structure
Two common similarity measures: cRMS and dRMS
Captures how close the corresponding pairs are in space
Euclidean similarity
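
The slide names cRMS and dRMS without formulas. The following is a minimal sketch under the usual textbook definitions, not the authors' code: cRMS is the root mean square distance between corresponding points after superposition, and dRMS is the root mean square difference between corresponding intra-chain distances. The function names are hypothetical, and `coordinate_rms` assumes the second chain has already been optimally superimposed (the rotation and translation search is omitted here).

```python
import math


def coordinate_rms(a, b):
    """RMS distance between corresponding points of two equal-length chains.

    Assumption: b is ALREADY superimposed on a. A true cRMS also minimizes
    over rigid motions, which this sketch does not do.
    """
    if len(a) != len(b) or not a:
        raise ValueError("chains must be non-empty and of equal length")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(total / len(a))


def drms(a, b):
    """RMS difference between corresponding intra-chain distances.

    Needs no superposition, because internal distances do not change
    under rotation or translation.
    """
    k = len(a)
    if k != len(b) or k < 2:
        raise ValueError("chains must have equal length of at least 2")
    total = 0.0
    n_pairs = 0
    for i in range(k):
        for j in range(i + 1, k):
            diff = math.dist(a[i], a[j]) - math.dist(b[i], b[j])
            total += diff * diff
            n_pairs += 1
    return math.sqrt(total / n_pairs)


# Toy chains: chain_b is chain_a shifted by 5 units along x.
chain_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 3.8)]
chain_b = [(x + 5.0, y, z) for x, y, z in chain_a]

rms_without_superposition = coordinate_rms(chain_a, chain_b)
drms_value = drms(chain_a, chain_b)
```

In this toy case the shifted copy has a dRMS of essentially zero, while the coordinate RMS without superposition is about 5, which is why cRMS is only meaningful after the chains have been superimposed.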

Alignment Scores
A single number that allows comparison of two alignments
Given two alignments, judge which is better
We have: length k, cRMS
Use SI, MI [Kleywegt & Jones 1994]
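
The slide names SI and MI without giving formulas, and later slides use SAS. The sketch below assumes the commonly quoted definitions: SAS = 100 · cRMS / k, SI = cRMS · min(n, m) / k, and MI = 1 − (1 + k) / ((1 + cRMS / w0)(1 + min(n, m))) with w0 = 1.5 Å. These formulas, the value of w0, and the function names are assumptions rather than content of the slides. For all three scores, lower is better.

```python
def sas(crms, n_aligned):
    """Structural Alignment Score: cRMS scaled to 100 aligned residues (assumed form)."""
    return crms * 100.0 / n_aligned


def si(crms, n_aligned, len_a, len_b):
    """Similarity Index: cRMS scaled by the shorter chain length (assumed form)."""
    return crms * min(len_a, len_b) / n_aligned


def mi(crms, n_aligned, len_a, len_b, w0=1.5):
    """Match Index in [0, 1]; w0 = 1.5 is an assumed weighting constant."""
    shorter = min(len_a, len_b)
    return 1.0 - (1.0 + n_aligned) / ((1.0 + crms / w0) * (1.0 + shorter))


# Two hypothetical alignments of the same pair of chains (lengths 150 and 200):
# a longer, looser one and a shorter, tighter one.
long_loose = sas(2.0, 100)    # k = 100, cRMS = 2.0
short_tight = sas(1.5, 80)    # k = 80,  cRMS = 1.5
```

With these made-up numbers the shorter, tighter alignment gets the lower SAS, which illustrates how a single score resolves the tradeoff between length k and cRMS.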

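Easy To See Best Alignment
[Table: rows are structure pairs (1,2), (1,3), …, (N,N-1); columns are methods A, B, C, D; stars mark the best alignment for each pair]
Distribution of stars shows which method is best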

‘Best-of-All’ Can Join Efforts
[Table: rows are structure pairs (1,2), (1,3), …, (N,N-1); columns are methods A, B, C, D and Best-of-All]
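
The tables on this slide and the previous one can be sketched in a few lines: for each pair, pick the method with the best score, and count how often each method wins. This is an illustration only, assuming a lower-is-better score such as SAS; the function names, method labels, and numbers are made up.

```python
from collections import Counter


def best_of_all(scores_by_pair):
    """Map each pair to (winning method, winning score); lower score is better."""
    best = {}
    for pair, scores in scores_by_pair.items():
        method = min(scores, key=scores.get)
        best[pair] = (method, scores[method])
    return best


def star_counts(best):
    """Count how many pairs each method wins (the 'stars' in the table)."""
    return Counter(method for method, _ in best.values())


# Hypothetical scores for three pairs and four methods A-D.
example = {
    (1, 2): {"A": 3.1, "B": 2.4, "C": 5.0, "D": 2.9},
    (1, 3): {"A": 7.2, "B": 6.8, "C": 6.5, "D": 9.9},
    (2, 3): {"A": 4.0, "B": 3.3, "C": 4.1, "D": 3.7},
}

best = best_of_all(example)
stars = star_counts(best)
```

By construction, the Best-of-All column is at least as good as every individual method on every pair, so it can serve as a reference point when comparing methods.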

Large Scale Comparison
2930 CATH domains
– 769 fold classes
– Sequence diverse
All against all: 8,581,970 alignments (2930 × 2929 ordered pairs)
Over 800 CPU days on 2.8 GHz processors
[Figure: CATH hierarchy levels: class, architecture, topology]

Methods Compared
SSAP             Taylor & Orengo, 1989
STRUCTAL         Subbiah, Laurents & Levitt, 1993; Gerstein & Levitt, 1998
DALI             Holm & Sander, 1993; Holm & Park, 2000
DEJAVU/LSQMAN    Kleywegt, 1996
CE               Shindyalov & Bourne, 1998
SSM              Krissinel & Henrick, 2003
Best-of-All      Best of above methods

Previous Comparisons
Sierk & Pearson [2004]
– ROC curves using CATH
Novotny et al. [2004]
– Checked a few dozen cases
– Use CATH as gold standard
Leplae & Hubbard [2002]
– ROC curves using SCOP

Comparison Using ROC Curves
1. Gold standard: label each pair as a positive (1) or a negative (0)
2. Sort by similarity (score/SAS)
3. Draw ROC curves: true positives % (sensitivity) against false positives % (100 – specificity)
[Figure: ROC plot with reference curves for a random and a perfect measure]
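
The procedure on this slide can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: it assumes a lower score means more similar (as with SAS), the function names and the toy data are made up, and tied scores get no special handling.

```python
def roc_points(labels, scores):
    """Return ROC points as (false positives %, true positives %).

    labels: 1 for gold-standard positives, 0 for negatives.
    scores: lower means more similar. Ties are not treated specially.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")
    ranked = sorted(zip(scores, labels))  # most similar pairs first
    points = [(0.0, 0.0)]
    tp = fp = 0
    for _, label in ranked:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((100.0 * fp / n_neg, 100.0 * tp / n_pos))
    return points


def area_under_curve(points):
    """Trapezoidal area under the ROC curve, rescaled to the range [0, 1]."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area / (100.0 * 100.0)


# Toy data: six pairs, three of which the gold standard calls similar.
labels = [1, 1, 0, 1, 0, 0]
scores = [1.2, 2.0, 2.5, 3.1, 4.0, 6.3]

points = roc_points(labels, scores)
auc = area_under_curve(points)
```

A perfect measure ranks every positive before every negative and has area 1, while a random measure follows the diagonal and has area of about 0.5.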

ROC Curve Issues
Uses only internal ordering
– Estimation of similarity can be very wrong
Converts a classification gold standard into binary truth
[Figure: pairs ranked by native scores or SAS]
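
The first issue can be demonstrated directly: any strictly increasing distortion of the scores leaves the ROC area unchanged, so a method whose similarity estimates are badly off in value can still look identical on a ROC curve. The sketch below uses a rank-based area (the fraction of positive/negative pairs that are correctly ordered); the function name, toy data, and the choice of distortion are assumptions made for illustration.

```python
import math


def auc_by_rank(labels, scores):
    """Fraction of (positive, negative) pairs where the positive scores lower.

    Lower score means more similar; ties count as half a correct ordering.
    """
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p < n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 1, 0, 0]
scores = [1.2, 2.0, 2.5, 3.1, 4.0, 6.3]

# A strictly increasing distortion: values change wildly, the order does not.
distorted = [math.exp(3.0 * s) for s in scores]

auc_original = auc_by_rank(labels, scores)
auc_distorted = auc_by_rank(labels, distorted)
```

Both areas are identical here even though the distorted scores differ from the originals by many orders of magnitude, which is one motivation for comparing score values directly.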

SAS & Native ROC Curves

Comparing SAS Values Directly
[Figure: direct comparison of SAS values, including Best-of-All]

Cross Fold Similarities
[Figure: panels for GSAS and SAS; labels CAT and CA]

GSAS & SAS Distributions
[Figure: Best-of-All distributions for same-CAT pairs and all pairs; axis: percent]

More Tests
Cases that are hard for all methods but one
– STRUCTAL and LSQMAN do well
– Relies on comparing alignments directly
Time
– SSM does best, then LSQMAN
– To be fast: give up quickly in hard cases

Summary
A new methodology for comparing structural alignment methods
Allows defining a ‘Best-of-All’ method
Now can use Best-of-All data
– Maybe to improve database-wide comparisons to new structures?

Thank You