Automated Structure Prediction using Robetta in CASP11
Baker Group
David Kim, Sergey Ovchinnikov, Frank DiMaio


Updates since last CASP
1. Domain parsing and assembly
2. Alignment cluster ranking
3. Sequence covariance restraints (GREMLIN)
4. Plus various bug fixes!
CAMEO Benchmark

Robetta Server Pipeline
Informatics:
  Sequence
  HHSearch, RaptorX, SPARKS-X
  Domain parsing, template alignment and spatial restraint generation
  Difficulty prediction
Modeling (for each domain):
  RosettaCM (template hybridization): all targets
  RosettaAB (fragment assembly): hard targets
  Model selection
  Domain assembly

Alignment clusters
  HHSearch, RaptorX, SPARKS-X (PDB100)
  Up to 10 alignments from each method
  Cluster partial threads for distinct topologies
  Rank clusters by P(correct): the probability that an alignment is within Δ GDT of the best
  Probability distribution varies considerably with target difficulty (panels: Easy target, Hard target)

Target Difficulty Prediction
Predict target difficulty based on the degree of structural consensus between the top-ranked alignments from each threading program.
GDT and predicted difficulty correlation
Used in:
  Domain parsing
  Modeling decision making
    Run RosettaAB also if < 0.2 (twilight regime)
  Amount of RosettaCM sampling
    Very easy targets run locally on cluster (>0.80 and sequence identity > 40%):
    Runs on
      Easy (>0.80): 2000
      Medium (>0.30): 4000
      Hard (<=0.30): 8000
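The thresholds on this slide can be read as a small decision rule. The sketch below is only an illustration: `ModelingPlan` and `plan_modeling` are hypothetical names, the score is assumed to run from 0 to 1 with higher values meaning easier targets (as the >0.80 "easy" cutoff suggests), sequence identity is passed as a fraction, and the sample count for very easy targets is left unset because the slide gives no number for it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelingPlan:
    category: str               # "very easy", "easy", "medium" or "hard"
    cm_samples: Optional[int]   # RosettaCM sample count; None where the slide gives no number
    run_locally: bool           # True only for the very easy case
    also_run_ab: bool           # True in the twilight regime (score < 0.2)


def plan_modeling(difficulty: float, sequence_identity: float) -> ModelingPlan:
    """Map a predicted difficulty score to the sampling amounts listed on the slide."""
    also_run_ab = difficulty < 0.2
    if difficulty > 0.80 and sequence_identity > 0.40:
        return ModelingPlan("very easy", None, True, also_run_ab)
    if difficulty > 0.80:
        return ModelingPlan("easy", 2000, False, also_run_ab)
    if difficulty > 0.30:
        return ModelingPlan("medium", 4000, False, also_run_ab)
    return ModelingPlan("hard", 8000, False, also_run_ab)
```

A call such as `plan_modeling(0.15, 0.10)` would fall in the hard category and also request RosettaAB.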

Domain Parsing
Objective is to identify optimal non-overlapping alignment clusters
1. Run alignment method and partial-thread clustering on sequence
2. Identify potential "chunks" based on windowed difficulty of top alignment cluster
   30 residue window
   0.1, 0.2, 0.3 difficulty thresholds
   30 residue minimum domain length
3. Boundaries are fine-tuned using PSIPRED loop probability
4. All potential "chunks" are run through steps 1 to 3.
5. Final "chunks" are selected based on difficulty.
6. For twilight "chunks" (difficulty < 0.2) parsing is also based on Pfam and MSA (CASP9 GINZU method)
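Step 2 of the domain parsing procedure can be sketched roughly as follows. This is not the Robetta implementation: the slide does not say how the windowed difficulty is computed, so the sketch assumes a per-residue score list (a hypothetical input), smooths it with a 30-residue sliding mean, and reports maximal runs of at least 30 residues whose smoothed score stays at or above a threshold. The function names are illustrative.

```python
from typing import List, Tuple


def windowed_mean(scores: List[float], window: int = 30) -> List[float]:
    """Sliding mean over `window` residues; windows are truncated at the sequence ends."""
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        lo = max(0, i - half)
        hi = min(len(scores), i + half)
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))
    return smoothed


def find_chunks(
    scores: List[float],
    threshold: float,
    window: int = 30,
    min_length: int = 30,
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) residue ranges whose smoothed score is >= threshold."""
    smoothed = windowed_mean(scores, window)
    chunks: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(smoothed):
        if value >= threshold:
            if start is None:
                start = i          # open a new candidate chunk
        elif start is not None:
            if i - start >= min_length:
                chunks.append((start, i))
            start = None           # close the current candidate
    if start is not None and len(smoothed) - start >= min_length:
        chunks.append((start, len(smoothed)))
    return chunks
```

Running `find_chunks` once per threshold (0.1, 0.2, 0.3) would give the candidate sets that the later steps refine with PSIPRED loop probabilities.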

Modeling Method
Inputs:
  Sequence
  Template alignments
  Sequence-based fragments
  Restraint functions
  Threaded templates
  GREMLIN
Sampling and refinement:
  Torsion space fragment insertion
  Cartesian space template chunk recombination
  Gradient-based energy minimization/loop closure
  Full-atom refinement

TARGET
ALIGNMENT
  HHblits, Jackhmmer
  90% identity redundancy cutoff
  To find at least 2L sequences, vary e-value and coverage: 1e-20 to 1e-4 and 75% to 50%
CONTACT PREDICTIONS (GREMLIN)
  If >= 1L sequences
  GREMLIN used if difficulty = 1L
  Used for 27 domains
BAKER SERVER MODELS
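The alignment step above amounts to relaxing the search settings until the alignment is deep enough. A minimal sketch under stated assumptions: `count_sequences` is a hypothetical callback standing in for an HHblits or Jackhmmer run followed by the 90% redundancy filter, the intermediate grid values are invented (the slide gives only the endpoints), and the order in which e-value and coverage are relaxed is a guess.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical grids; only the endpoints (1e-20 to 1e-4, 75% to 50%) come from the slide.
E_VALUES = (1e-20, 1e-10, 1e-4)
COVERAGES = (0.75, 0.60, 0.50)

Result = Tuple[float, float, int]  # (e-value, coverage, number of sequences)


def find_alignment_settings(
    length: int,
    count_sequences: Callable[[float, float], int],
    e_values: Sequence[float] = E_VALUES,
    coverages: Sequence[float] = COVERAGES,
) -> Optional[Result]:
    """Relax e-value, then coverage, until at least 2L sequences are found."""
    target = 2 * length
    best: Optional[Result] = None
    for coverage in coverages:
        for e_value in e_values:
            n = count_sequences(e_value, coverage)
            if n >= target:
                return (e_value, coverage, n)   # first setting that is deep enough
            if best is None or n > best[2]:
                best = (e_value, coverage, n)   # remember the deepest alignment so far
    return best


def enough_for_gremlin(n_sequences: int, length: int) -> bool:
    """The slide's depth condition for contact prediction: at least 1L sequences."""
    return n_sequences >= length
```

If no setting reaches 2L, the sketch falls back to the deepest alignment seen, which can still pass the 1L check.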

RosettaCM performance using GREMLIN
CASP11 targets
Panels: Best vs Best (rerun without GREMLIN); Model1 vs Model1 (rerun without GREMLIN)
Highlighted target: T0768-D1
GREMLIN is used in ranking alignment clusters and sampling

RosettaAB performance using GREMLIN
CASP11 targets
Panels: Best vs Best (rerun without GREMLIN); Model1 vs Model1 (rerun without GREMLIN)
Highlighted targets: T0789-D1, T0790-D2
GREMLIN is used in sampling

T0789-D1 trimmed (76 aa)
GREMLIN predicted contacts helped with ~2L sequences
Domain over-parsed as vs official parse
2.84 Å RMSD over 71 res
Models generated: Top scoring models clustered: 4519
Structures shown: NATIVE, T0789-D1

T0790-D2 (130 aa)
GREMLIN predicted contacts helped with ~3L sequences
Domain under-parsed as vs official parse
Å RMSD over 92 res
Models generated: Top scoring models clustered:
Structures shown: NATIVE, T0790-D2

T0767-D2 (180 aa)
MSA-based domain parse was accurate
Domain correctly parsed as vs official parse
3.99 Å RMSD over 92 res
Models generated: Top scoring models clustered: 1565

What went wrong
1. Ranking
   Twilight target ranking (to ab initio or not to ab initio, that is the ?)
   New hybrid domain assembler (T0840-D1, T0852-D2)
2. Informatics
   Domain parse errors (T0808-D1, T0812-D1)
   Incorrect template (T0816-D1)

Ranking improved by a simple switch for twilight targets
Submitted model 1 (CM) vs Submitted model 2 (AB)
Simply choose submitted model 2 for AB targets (difficulty < 0.2)
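The proposed switch is small enough to write down directly. The sketch below is illustrative rather than the server's code: `order_submissions` is a hypothetical helper, and the 0.2 cutoff is assumed to be the same twilight threshold used for the RosettaAB decision and for domain parsing earlier in the talk.

```python
from typing import List

TWILIGHT_CUTOFF = 0.2  # assumed to match the twilight regime defined earlier in the talk


def order_submissions(cm_model: str, ab_model: str, difficulty: float) -> List[str]:
    """Put the RosettaAB model first for twilight targets, the RosettaCM model first otherwise."""
    if difficulty < TWILIGHT_CUTOFF:
        return [ab_model, cm_model]
    return [cm_model, ab_model]
```

For a target scored at 0.15, `order_submissions("cm.pdb", "ab.pdb", 0.15)` would rank the AB model first.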

Model1 CM GDT vs Model1 AB GDT, colored by difficulty

Acknowledgements
  Hetunandan Kamisetty (GREMLIN)
  Johannes Söding (HHpred)
  Jinbo Xu (RaptorX)
  Yaoqi Zhou and Yuedong Yang (Sparks-X)
  Rosetta Commons
  David Baker
  users for generous computing resources
  Juergen Haas (CAMEO)
  CASP organizers, assessors, structural biologists who provided structures
  Andriy Kryshtafovych