Development of automatic score to simulate manual assessment for CASP FM targets Qian Cong from Grishin lab.

Slides:



Advertisements
Similar presentations
Structural Classification and Prediction of Reentrant Regions in Alpha-Helical Transmembrane Proteins: Application to Complete Genomes Håkan Viklunda,
Advertisements

Improvements and extras Paul Thomas CSIRO. Overview of the lectures 1.Introduction to information retrieval (IR) 2.Ranked retrieval 3.Probabilistic retrieval.
Functional Site Prediction Selects Correct Protein Models Vijayalakshmi Chelliah Division of Mathematical Biology National Institute.
1 Introduction to Sequence Analysis Utah State University – Spring 2012 STAT 5570: Statistical Bioinformatics Notes 6.1.
Secondary structure assignment
Protein Structure Prediction using ROSETTA
The amino acids in their natural habitat. Topics: Hydrogen bonds Secondary Structure Alpha helix Beta strands & beta sheets Turns Loop Tertiary & Quarternary.
Protein Threading Zhanggroup Overview Background protein structure protein folding and designability Protein threading Current limitations.
Three-Stage Prediction of Protein Beta-Sheets Using Neural Networks, Alignments, and Graph Algorithms Jianlin Cheng and Pierre Baldi Institute for Genomics.
Protein Secondary Structures
CENTER FOR BIOLOGICAL SEQUENCE ANALYSISTECHNICAL UNIVERSITY OF DENMARK DTU Homology Modeling Anne Mølgaard, CBS, BioCentrum, DTU.
Chapter 9 Structure Prediction. Motivation Given a protein, can you predict molecular structure Want to avoid repeated x-ray crystallography, but want.
Protein-a chemical view A chain of amino acids folded in 3D Picture from on-line biology bookon-line biology book Peptide Protein backbone N / C terminal.
Determination of alpha-helix propensities within the context of a folded protein Blaber et al. J. Mol. Biol 1994.
FAST: A Novel Protein Structure Alignment Algorithm Jianhua Zhu and Zhiping Weng PROTEINS: Structure, Function, and Bioinformatics 58:618–627 (2005) Created.
Agenda A brief introduction The MASS algorithm The pairwise case Extension to the multiple case Experimental results.
It & Health 2009 Summary Thomas Nordahl Petersen.
Structures and Structure Descriptions Chapter 8 Protein Bioinformatics.
Protein structure prediction May 30, 2002 Quiz#4 on June 4 Learning objectives-Understand difference between primary secondary and tertiary structure.
Hybrid Protein Model Quality Assessment Jianlin Cheng Computer Science Department & Informatics Institute University of Missouri, Columbia, MO, USA.
Structural Bioinformatics Dr. Avraham Samson Course no.: Credit points: 1.5 Final grade is based on 10 assignments Course homepage:
Detecting the Domain Structure of Proteins from Sequence Information Niranjan Nagarajan and Golan Yona Department of Computer Science Cornell University.
Evolving Models of Biological Sequence Similarity Daniel P. Miranker The University of Texas at Austin [Chenetal98]
Structural alignment Protein structure Every protein is defined by a unique sequence (primary structure) that folds into a unique.
PDBe-fold (SSM) A web-based service for protein structure comparison and structure searches Gaurav Sahni, Ph.D.
3-Dimensional Structure of Proteins 4 levels of protein structure:
Protein Secondary Structure Prediction Some of the slides are adapted from Dr. Dong Xu’s lecture notes.
Representations of Molecular Structure: Bonds Only.
Multiple Alignment and Phylogenetic Trees Csc 487/687 Computing for Bioinformatics.
Protein Secondary Structure Prediction Based on Position-specific Scoring Matrices Yan Liu Sep 29, 2003.
Order independent structural alignment of circularly permutated proteins T. Andrew Binkowski Bhaskar DasGupta  Jie Liang ‡ Bioengineering Computer Science.
Protein Folding Programs By Asım OKUR CSE 549 November 14, 2002.
Designing Example Critiquing Interaction Boi Faltings Pearl Pu Marc Torrens Paolo Viappiani IUI 2004, Madeira, Portugal – Wed Jan 14, 2004 LIAHCI.
Protein Secondary Structure Prediction
Secondary structure prediction
A Study of Residue Correlation within Protein Sequences and its Application to Sequence Classification Christopher Hemmerich Advisor: Dr. Sun Kim.
Protein Structure Comparison. Sequence versus Structure The protein sequence is a string of letters: there is an optimal solution (DP) to the problem.
Part I : Introduction to Protein Structure A/P Shoba Ranganathan Kong Lesheng National University of Singapore.
Multiple Mapping Method with Multiple Templates (M4T): optimizing sequence-to-structure alignments and combining unique information from multiple templates.
A Technical Introduction to the MD-OPEP Simulation Tools
Protein Secondary Structure Prediction G P S Raghava.
BLAST: Basic Local Alignment Search Tool Altschul et al. J. Mol Bio CS 466 Saurabh Sinha.
Domains or not domains? ShuoYong Shi, Indraneel Majumdar and Nick V. Grishin Howard Hughes Medical Institute, Department of Biochemistry, University of.
A data-mining approach for multiple structural alignment of proteins WY Siu, N Mamoulis, SM Yiu, HL Chan The University of Hong Kong Sep 9, 2009.
Lessons from CASP targets ShuoYong Shi, Lisa Kinch, Jimin Pei, Ruslan Sadreyev, and Nick V. Grishin Howard Hughes Medical Institute, Department of Biochemistry,
Protein Structure Prediction ● Why ? ● Type of protein structure predictions – Sec Str. Pred – Homology Modelling – Fold Recognition – Ab Initio ● Secondary.
COT 6930 HPC and Bioinformatics Multiple Sequence Alignment Xingquan Zhu Dept. of Computer Science and Engineering.
Typically, classifiers are trained based on local features of each site in the training set of protein sequences. Thus no global sequence information is.
EMBL-EBI Eugene Krissinel SSM - MSDfold. EMBL-EBI MSDfold (SSM)
Lecture 10 CS566 Fall Structural Bioinformatics Motivation Concepts Structure Solving Structure Comparison Structure Prediction Modeling Structural.
An Efficient Index-based Protein Structure Database Searching Method 陳冠宇.
Protein Structure Prediction: Threading and Rosetta BMI/CS 576 Colin Dewey Fall 2008.
FM Model Assessment: Old scores and New Combinations ShuoYong Shi Nick Grishin Lab.
Lab Lab 10.2: Homology Modeling Lab Boris Steipe Departments of Biochemistry and.
Graph clustering to detect network modules
Chapter 14 Protein Structure Classification
Database extraction of residue-specific empirical potentials
Assessment of CASP11 Contact-Assisted Predictions
TEMPLATE-BASED METHODS FOR PROTEIN MODEL QA
Introduction to Bioinformatics II
Protein Folding and Protein Threading
Rosetta: De Novo determination of protein structure
Protein structure prediction.
Yang Liu, Perry Palmedo, Qing Ye, Bonnie Berger, Jian Peng 
Blind Test of Physics-Based Prediction of Protein Structures
A Molecular Dynamics Study of Ca2+-Calmodulin: Evidence of Interdomain Coupling and Structural Collapse on the Nanosecond Timescale  Craig M. Shepherd,
Volume 20, Issue 3, Pages (March 2012)
Label propagation algorithm
Protein structure prediction
Presentation transcript:

Development of automatic score to simulate manual assessment for CASP FM targets Qian Cong from Grishin lab

Tesla: curiosity driven Me: laziness driven Qian Cong from Grishin lab Development of automatic score to simulate manual assessment for CASP FM targets

Tesla: curiosity driven Me: laziness driven Qian Cong from Grishin lab 36 targets (whole chain + domain) around models Development of automatic score to simulate manual assessment for CASP FM targets

Inspiration from expert’s manual analysis Expert: global features + local features Local feature: secondary structure assignment of each residue Global feature: global positions of each Secondary Structure Elements (SSEs) packing and interactions between SSEs

Inspiration from expert’s manual analysis Expert: global features + local features Local feature: secondary structure assignment of each residue Global feature: global positions of each Secondary Structure Elements (SSEs) packing and interactions between SSEs Develop a score to “mimic” expert inspection: check each secondary structure element, and inspect their packing and interactions.

Overview of features get considered Measurements on single secondary structure element or residue The global position of each SSE The length of each SSE The residue DSSP assignment Measurements on secondary structure pairs or residue pairs The angle between SSE pair The interactions between SSE pair The residue contact score (used for CASP8)

Overview of features get considered Measurements on single secondary structure element or residue The global position of each SSE The length of each SSE The residue DSSP assignment Measurements on secondary structure pairs or residue pairs The angle between SSE pair The interactions between SSE pair The residue contact score (used for CASP8) Local features as modulator

Step 1.1: Get SSE definition and vector set for target T0531

Step 1.1: Get SSE definition and vector set for target I.Majumdar et al. (2005) BMC Bioinformatics HELIX SER 26 PRO 32 7 SHEET GLU 14 CYS 19 6 SHEET GLU 19 CYS 24 6 SHEET GLU 39 CYS 43 5 SHEET GLY 43 SER 48 6 SHEET SER 49 CYS 57 9 Type Start End Length PALSSE T0531

Step 1.1: Get SSE definition and vector set for target I.Majumdar et al. (2005) BMC Bioinformatics HELIX SER 26 PRO 32 7 SHEET GLU 14 CYS 19 6 SHEET GLU 19 CYS 24 6 SHEET GLU 39 CYS 43 5 SHEET GLY 43 SER 48 6 SHEET SER 49 CYS 57 9 Type Start End Length PALSSE T0531

Step 1.1: Get SSE definition and vector set for target I.Majumdar et al. (2005) BMC Bioinformatics HELIX SER 26 PRO 32 7 SHEET GLU 14 CYS 19 6 SHEET GLU 19 CYS 24 6 SHEET GLU 39 CYS 43 5 SHEET GLY 43 SER 48 6 SHEET SER 49 CYS 57 9 Type Start End Length PALSSE T0531

Step 1.2: Get the interacting residue pairs Interactions criteria: 1. The shortest distance of central part of two SSEs 2. Below 8.5 Å T0531

Step 1.2: Get the interacting residue pairs 15, 46 15, 52 17, 42 20, 44 21, 42 21, 55 23, 29 32, 41 32, 54 42, 52 45, 55 Interactions criteria: 1. The shortest distance of central part of two SSEs 2. Below 8.5 Å T0531

Step 1.2: Get the interacting residue pairs 15, 46 15, 52 17, 42 20, 44 21, 42 21, 55 23, 29 32, 41 32, 54 42, 52 45, 55 Interactions criteria: 1. The shortest distance of central part of two SSEs 2. Below 8.5 Å T0531

Step 1.2: Get the interacting residue pairs 15, 46 15, 52 17, 42 20, 44 21, 42 21, 55 23, 29 32, 41 32, 54 42, 52 45, 55 Interactions criteria: 1. The shortest distance of central part of two SSEs 2. Below 8.5 Å T0531

Step 2: Simplify models into vectors and key points The SSE definition and interacting residue pair definition are propagated to models, and thus models are simplified as a set of vectors and point pairs too.

Step 2: Simplify models into vectors and key points TS490_2 TS399_4 The SSE definition and interacting residue pair definition are propagated to models, and thus models are simplified as a set of vectors and point pairs too.

Step 2: Simplify models into vectors and key points TS490_2 TS399_4 The SSE definition and interacting residue pair definition are propagated to models, and thus models are simplified as a set of vectors and point pairs too.

What should we look at? TargetA good model A “bad” model

Definition of global position: the distance between the geometry center of SSE and the center of the whole protein Step 3.1: compare the global position of SSE vectors

Definition of global position: the distance between the geometry center of SSE and the center of the whole protein Step 3.1: compare the global position of SSE vectors

Definition of global position: the distance between the geometry center of SSE and the center of the whole protein Step 3.1: compare the global position of SSE vectors

Definition of global position: the distance between the geometry center of SSE and the center of the whole protein Step 3.1: compare the global position of SSE vectors

Definition of global position: the distance between the geometry center of SSE and the center of the whole protein Step 3.1: compare the global position of SSE vectors

Assumption: wrong SS prediction and improper break of SSE will result in difference from target in vector length Step 3.2: compare the length of SSE vectors

Assumption: wrong SS prediction and improper break of SSE will result in difference from target in vector length Step 3.2: compare the length of SSE vectors

Assumption: wrong SS prediction and improper break of SSE will result in difference from target in vector length Step 3.2: compare the length of SSE vectors

Assumption: wrong SS prediction and improper break of SSE will result in difference from target in vector length Step 3.2: compare the length of SSE vectors Too broad a measurement for secondary structure quality measurement

Percent agreement of DSSP assignment reflects the detailed quality of secondary structures Step 3.3: compare DSSP assignment Target:CCCCCCCCSTTTSCEEEEEEEEECCHHHHHHCGGGTTTCEEEEEEETTTTEEEEEECCHHHHSCC Model1:CCHHHHSHHHHTTCEECCCCCSGGGCSSCSSCCCCGGGTCCEEEEETTTTEEEEESSCHHHHTCC Model2:CCCCSSSTTSSTTHHHHTTSCSSCSSCCSSCCCCCCSSCCCCEEEETTTTEEEECCCSCHHHHHC

Step 3.4: compare the angle between SSE vector pairs

Step 3.5: compare the interactions between SSE pairs Motivation: some key interactions defined the general packing of elements, they should be emphasized more

C-alpha contact score is added as a modulator for key SSE interaction score Step 3.6: compare all C-alpha contact score Define all alpha contact at a cut off of 8.44 Å, similar program is proved to be good measurement by CASP8 assessors Shuoyong ShiShuoyong Shi, Jimin Pei, Ruslan I. Sadreyev, Lisa N. Kinch, Indraneel Majumdar, Jing Tong, Hua Cheng, Bong-Hyun Kim, Nick V. Grishin. Analysis of CASP8 targets, predictions and assessment methods. Database: The Journal of Biological Database and Curation (2009).Nick V. GrishinDatabase: The Journal of Biological Database and Curation

Let’s sum up all the scores

superimposition independent global and local comparison manual analysis simulating score SIGLACMASS ?

Let’s sum up all the scores superimposition independent global and local comparison manual analysis simulating score SIGLACMASS ? Qian Cong score = QCS

Let’s sum up all the scores superimposition independent global and local comparison manual analysis simulating score SIGLACMASS ? Qian Cong score = QCS = Quality control score

Global view: Correlations

Optimization on weight??? Correlation improve 2%

Global view: Correlations

Most cases, the top models selected by GDT generally agree with top models selected by QCS and manual assessment Go to individual: Top picks

Most cases, the top models selected by GDT generally agree with top models selected by QCS and manual assessment There are cases where QCS reveals features we like Go to individual: Top picks

324_5 382_5 Example where QCS reveals better model TS324_5 QCS: 67.4 GDT: 39.4 TS382_5 QCS: 80.7 GDT: 30.9 Target 561

324_5 382_5 T0561 QCS reveals good global topology TS324_5 QCS: 67.4 GDT: 39.4

324_5 382_5 T0561 QCS reveals good global topology TS324_5 QCS: 67.4 GDT: 39.4

324_5 382_5 T0561 QCS reveals good global topology TS324_5 QCS: 67.4 GDT: 39.4

324_5 QCS reveals good global topology T0561 TS382_5 QCS: 80.7 GDT: 30.9

QCS reveals model with good interactions

Get all 3 key SSE interactions

Get only 1 key SSE interactions QCS reveals model with good interactions Get all 3 key SSE interactions

Being lazy cannot be a final solution To assess CASP need a lot of diligent, efficient and smart and careful analysis in a large scale