Abstracts of main servers in CASP11


Abstracts of main servers in CASP11 presented by Chao Wang

Official ranking by sum of Z-scores (> -2.0)
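As a rough illustration of a summed Z-score ranking, the sketch below standardizes each server's score per target and sums the results. It assumes that the "> -2.0" annotation means per-target Z-scores are floored at -2.0; the official CASP assessment procedure may differ in detail. The function name, the input layout, and the use of a population standard deviation are all hypothetical choices made for the example.

```python
from statistics import mean, pstdev

def sum_z_scores(scores_by_target, floor=-2.0):
    """Sum per-target Z-scores for each server.

    scores_by_target: {target_id: {server_name: raw_score}} (hypothetical layout).
    Z-scores below `floor` are raised to `floor` (assumed reading of "> -2.0").
    """
    totals = {}
    for scores in scores_by_target.values():
        mu = mean(scores.values())
        sigma = pstdev(scores.values())
        for server, raw in scores.items():
            # If all servers tie on a target, every Z-score is taken as 0.
            z = (raw - mu) / sigma if sigma > 0 else 0.0
            totals[server] = totals.get(server, 0.0) + max(z, floor)
    return totals
```

Sorting the returned dictionary by value in descending order gives the ranking.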

We focus on: Zhang-Server, QUARK, BAKER-ROSETTASERVER, RaptorX, MULTICOM-CLUSTER, Pcons

We don't focus on: HHpred, ZHOU-SPARKS-X, MUFOLD-Server

The CASP11 proceedings contain no abstracts for HHpred or ZHOU-SPARKS-X.

MUFOLD formulates the structure prediction problem as a graph realization problem and employs the multi-dimensional scaling (MDS) technique:

Cut the sequence into segments (blocks) and generate distance matrices for the blocks.
Cluster the distance matrices of each block.
Recombine the cluster centers of each block to generate new distance matrices, and filter out poor distance matrices using a set of criteria such as the triangle inequality.
Generate new structures from the sampled distance matrices.
Use a consensus method to select the best candidates from the new structures.
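The filtering criterion mentioned for MUFOLD (the "triangle law", read here as the triangle inequality) can be sketched as follows. This is only an illustration of the check, not MUFOLD's code; the function names and the tolerance parameter are hypothetical.

```python
from itertools import combinations

def satisfies_triangle_inequality(dist, tol=1e-9):
    """Return True if every triple (i, j, k) of a symmetric distance matrix
    obeys d(i, k) <= d(i, j) + d(j, k) for all three orderings."""
    n = len(dist)
    for i, j, k in combinations(range(n), 3):
        a, b, c = dist[i][j], dist[j][k], dist[i][k]
        if a > b + c + tol or b > a + c + tol or c > a + b + tol:
            return False
    return True

def filter_distance_matrices(matrices):
    """Keep only the matrices that could come from points in a metric space."""
    return [m for m in matrices if satisfies_triangle_inequality(m)]
```

A recombined matrix that fails this check cannot correspond to any set of 3D coordinates, so discarding it before structure generation saves work.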

Zhang-Server based on the I-TASSER pipeline

In addition to the classic I-TASSER pipeline, several approaches were recently developed and integrated into I-TASSER to improve structure modeling for distant-homology targets. First, the top models generated by QUARK ab initio folding were merged into the threading template pool and used as starting conformations for the I-TASSER simulations. Second, since hard targets generally lack global templates, the sequences were broken into segments of 2-4 consecutive secondary structure elements, which were then threaded through the PDB by the segmental threading tool SEGMER to identify super-secondary structure motifs. Third, SVM-SEQ and SPcon (Shen et al, in preparation) were used to generate residue contact maps. For multiple-domain proteins, ThreaDom was used to predict the domain boundaries and linker regions.
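The segmentation step (segments of 2-4 consecutive secondary structure elements) can be sketched from a predicted three-state secondary structure string. This is an assumption-laden illustration: it treats only helix (H) and strand (E) runs as elements, ignores coil, and uses hypothetical function names; the actual I-TASSER segmentation rules are not given in the abstract.

```python
from itertools import groupby

def secondary_structure_elements(ss):
    """Return (state, start, end) for each run of H or E; `end` is exclusive."""
    elements = []
    pos = 0
    for state, run in groupby(ss):
        length = len(list(run))
        if state in "HE":
            elements.append((state, pos, pos + length))
        pos += length
    return elements

def sse_segments(ss, min_elems=2, max_elems=4):
    """Enumerate residue ranges spanning min_elems..max_elems consecutive SSEs."""
    elems = secondary_structure_elements(ss)
    segments = []
    for size in range(min_elems, max_elems + 1):
        for i in range(len(elems) - size + 1):
            # Segment runs from the first residue of the first element
            # to the last residue of the last element in the window.
            segments.append((elems[i][1], elems[i + size - 1][2]))
    return segments
```

Each returned range would then be the query for a segmental threading run.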

QUARK

QUARK has been developed for ab initio protein structure prediction. It starts by collecting continuously distributed structural fragments of 1-20 residues from unrelated proteins in the PDB. Full-length structure models are then assembled from the fragments by replica-exchange Monte Carlo (REMC) simulations, which are guided by a composite physics- and knowledge-based force field that contains a variety of local structure features derived from sequence. For proteins that LOMETS deems Easy or Medium targets, i.e. there is at least one structure template with a Z-score above the confidence cutoff, a new template-based QUARK pipeline is used to generate the structure prediction. In this pipeline, each replica in the REMC simulation starts from a different top LOMETS template.

The weights of the QUARK force field have been reparameterized in this pipeline to enhance the knowledge-based components derived from threading alignments. For multiple-domain proteins, ThreaDom is used.
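The replica-exchange Monte Carlo simulation mentioned for QUARK relies on periodically swapping conformations between replicas at neighboring temperatures. The sketch below shows the standard Metropolis swap criterion only; it is not QUARK's implementation, and the function names, the unit Boltzmann constant, and the sequential sweep over neighbors are assumptions made for illustration.

```python
import math
import random

def swap_probability(energy_i, energy_j, temp_i, temp_j, k_b=1.0):
    """Standard REMC acceptance: min(1, exp((beta_i - beta_j) * (E_i - E_j)))."""
    beta_i = 1.0 / (k_b * temp_i)
    beta_j = 1.0 / (k_b * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)

def attempt_swaps(energies, temps, rng=random):
    """Sweep once over neighboring temperature slots and try to swap them.

    energies[c] is the energy of conformation c; temps[r] is the temperature
    of slot r. Returns `order`, where order[r] is the conformation now held
    at temperature slot r.
    """
    order = list(range(len(temps)))
    for r in range(len(temps) - 1):
        a, b = order[r], order[r + 1]
        p = swap_probability(energies[a], energies[b], temps[r], temps[r + 1])
        if rng.random() < p:
            order[r], order[r + 1] = b, a
    return order
```

A swap that moves the lower-energy conformation to the colder replica is always accepted, which is what lets good conformations migrate toward the low-temperature end.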

BAKER-ROSETTASERVER

Robetta is a fully automated structure prediction server that consists of three main steps: domain boundary identification, structure modeling, and domain assembly.

Domain boundary identification: Domain boundaries are predicted by identifying PDB templates with optimal sequence similarity and structural coverage to the target through an iterative process. For each iteration, we use locally installed programs, HHSearch, Sparks, and Raptor, to identify templates and generate alignments. The target sequence is threaded onto the template structures to generate partial-threaded models, which are then clustered to identify distinct topologies that are ranked based on the likelihood of the alignments. Regions of the target sequence that are not covered by the partial-threads or are not similar in structure within the top ranked cluster are passed on to the next search iteration.
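The hand-off between iterations of the domain boundary search can be sketched as an interval computation: given the residue ranges covered by the top-ranked cluster, return the regions still to be searched. This covers only the coverage criterion, not the structural-similarity one, and the function name and the `min_len` cutoff are hypothetical rather than taken from Robetta.

```python
def uncovered_regions(length, covered, min_len=1):
    """Return half-open (start, end) ranges of a target of `length` residues
    that no interval in `covered` overlaps and that span at least `min_len`."""
    regions = []
    pos = 0  # first residue not yet known to be covered
    for start, end in sorted(covered):
        if start > pos and start - pos >= min_len:
            regions.append((pos, start))
        pos = max(pos, end)
    if length - pos >= min_len:
        regions.append((pos, length))
    return regions
```

For example, `uncovered_regions(100, [(10, 40), (30, 60)])` leaves the two terminal regions for the next search iteration.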

Structure modeling: For each predicted domain, models are generated using our comparative modeling protocol, RosettaCM, which recombines structural elements from the clustered partial-threads and models missing segments using a combination of fragment insertion and mixed torsion-Cartesian space minimization. For difficult domains, models are also generated using the Rosetta fragment assembly methodology (Rosetta Abinitio), and if GREMLIN contacts are predicted, they are used as restraints for sampling and refinement. All models are refined using a relax protocol that minimizes the Rosetta full-atom energy in torsion and Cartesian space to allow bond angle flexibility. Final models are selected by clustering the best scoring 100 models from each topologically distinct alignment cluster, and then averaging the models within each cluster and refining the final averaged models.

RaptorX

RaptorX is a template-based protein modeling server. Not finished. To significantly advance homology detection and fold recognition, we have developed a Markov Random Field (MRF) model of an MSA (multiple sequence alignment). MRFs can model long-range residue interactions and thus encode information about the global 3D structure of a protein family. Each node is associated with a function describing the position-specific amino acid mutation pattern. Similarly, each edge is associated with a function describing correlated mutation statistics between two columns.

To score the similarity of two MRFs, we use both node and edge alignment potentials, which measure the node (i.e., residue) similarity and edge (i.e., interaction pattern) similarity, respectively. To derive the node alignment potential, we use a set of 1400 protein pairs as the training data, which covers 458 SCOP folds. The reference alignment for a protein pair is generated by the structure alignment tool DeepAlign. The edge alignment potential is derived from the software package EPAD, which takes as input the PSSM and residue interaction strength and outputs the inter-residue distance probability distribution. The interaction strength of two residues can be calculated in different ways. In the current implementation we calculate the mutual information (MI) matrix. It is computationally challenging to optimize the MRFalign scoring function because of the edge alignment potential. We formulate this problem as an integer programming problem and then develop an ADMM (Alternating Direction Method of Multipliers) algorithm to solve it efficiently to a suboptimal solution.
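The mutual information matrix used as the interaction strength can be sketched directly from column frequencies of an MSA. This is a plain textbook MI computation under simplifying assumptions (no pseudocounts, no sequence weighting, gaps treated as ordinary symbols, natural logarithm); the function names are hypothetical and the RaptorX implementation may differ.

```python
import math
from collections import Counter

def column_mutual_information(msa, i, j):
    """MI between columns i and j of an MSA given as equal-length strings."""
    n = len(msa)
    pair_counts = Counter((seq[i], seq[j]) for seq in msa)
    counts_i = Counter(seq[i] for seq in msa)
    counts_j = Counter(seq[j] for seq in msa)
    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        p_a = counts_i[a] / n
        p_b = counts_j[b] / n
        mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi

def mutual_information_matrix(msa):
    """Symmetric L x L matrix of pairwise column MI values."""
    length = len(msa[0])
    return [[column_mutual_information(msa, i, j) for j in range(length)]
            for i in range(length)]
```

Two columns that always mutate together give a high MI, while independently varying columns give a value near zero.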

MULTICOM-CLUSTER

The method was based on a conformation ensemble approach to protein tertiary structure prediction. The basic conformation ensemble protocol in MULTICOM-CLUSTER generated an ensemble of protein models for each target using multiple templates identified by more than a dozen sequence/profile comparison tools (e.g., BLAST, PSI-BLAST, HHSearch, SAM, HMMer, MUSTER, RaptorX), combinations of alternative target-template alignments, and complementary model generation tools.

An ensemble of hundreds (e.g., 150-250) of models generally approximated the near native conformations of a relatively easy target well if one or more homologous templates were identified for the target. For relatively hard targets for which no good template was found, additional tens of models selected from hundreds of template-free models generated by a fragment assembly based tool (i.e. Rosetta) were added into the ensemble in order to increase the diversity of the model pool. The conformations of all the chunks will be combined into a full-length model using Modeller.

The ensemble of models of a target was evaluated by several different methods, including the single-model absolute model quality assessment tool ModelEvaluator, the fully pairwise model comparison tool APOLLO, the protein energy calculation tool SELECTpro, and the frequency of the templates used to generate the models, if any (i.e., the number of times that a template was chosen by different sequence/profile comparison tools). From the ensemble, MULTICOM-CLUSTER selected the top five models, ranked mostly by the APOLLO scores and supplemented by other information.

Trick (Chao comments)

Furthermore, several exception handling strategies were applied to remove seemingly bad models ranked in the top five, including replacing template-based models with very low template coverage, filling in terminal regions of models not covered by any template by model combination, replacing duplicate models within the top five, and removing models based on false positives of BLAST-based search.
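Ranking by a fully pairwise model comparison can be sketched as a consensus score: each model is scored by its average similarity to every other model in the ensemble. The sketch assumes a precomputed symmetric similarity matrix (for example, a structural similarity score in [0, 1]); it does not reproduce APOLLO itself, and the function name is hypothetical.

```python
def consensus_rank(similarity, top_n=5):
    """Rank models by mean pairwise similarity to all other models.

    similarity[i][j] is the similarity between models i and j.
    Returns (indices of the top_n models, list of consensus scores).
    """
    n = len(similarity)
    scores = []
    for i in range(n):
        others = [similarity[i][j] for j in range(n) if j != i]
        scores.append(sum(others) / len(others) if others else 0.0)
    # Highest average similarity first; ties keep the original model order.
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return ranked[:top_n], scores
```

A consensus score of this kind favors models near the center of the largest cluster, which is why it works best when the ensemble already contains many near-native models.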

Pcons

PconsFold is a fully automated pipeline for ab-initio protein structure prediction based on evolutionary information. PconsFold is based on PconsC contact prediction and uses the Rosetta folding protocol. PconsC2 is a novel method that uses a deep learning approach to identify protein-like contact patterns to improve contact predictions.

Advantages

RaptorX: alignment accuracy, statistical model and learning methods
MULTICOM: model selection, template selection
Rosetta: assembly
I-TASSER & QUARK: You see

We need to improve …

Clean vs. Dirty
Clean
Discovery vs. Performance
For understanding vs. For CASP

Clean
Single secondary structure element
Interaction of pair SSEs
Topology

Domain parsing: to avoid directly from threading alignments
Template selection: to avoid from only p-value or threading raw scores
Model generation: MODELLER is really NOT reliable when the gap length is over 15(?)
Model selection: to avoid selection based on only dDFIRE score

To develop:
loop modeling tools
template selection tools (SVM?)
consensus based selection strategy