Automated Structure Prediction using Robetta in CASP11
Baker Group
David Kim, Sergey Ovchinnikov, Frank DiMaio

Updates since last CASP
1. Domain parsing and assembly
2. Alignment cluster ranking
3. Sequence covariance restraints (GREMLIN)
4. Plus various bug fixes!
[Figure: CAMEO Benchmark]

Robetta Server Pipeline
[Flowchart; labels as extracted]
Informatics
  Sequence
  HHSearch, RaptorX, SPARKS-X
  Domain parsing, template alignment and spatial restraint generation
Modeling
  For each domain
  Difficulty prediction
  RosettaCM (template hybridization): All targets
  RosettaAB (fragment assembly): Hard
  Model selection
  Domain assembly

Alignment clusters
HHSearch, RaptorX, SPARKS-X
PDB100
Cluster partial threads for distinct topologies
Up to 10 alignments from each method
Rank clusters by P(correct): the probability that an alignment is within Δ GDT of the best
Probability distribution varies considerably with target difficulty
[Figure panels: Easy target, Hard target]

Target Difficulty Prediction
Predict target difficulty based on the degree of structural consensus between the top-ranked alignment from each threading program.
[Figure: GDT and predicted difficulty correlation]
Used in:
  Domain parsing
  Modeling decision making
    Run RosettaAB also if < 0.2 (twilight regime)
  Amount of RosettaCM sampling
    Very easy targets run locally on cluster (>0.80 and sequence identity > 40%):
    Runs on
    Easy (>0.80): 2000
    Medium (>0.30): 4000
    Hard (<=0.30): 8000

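
The thresholds on this slide can be read as a small decision rule. The sketch below is only an illustration of that reading, not the Robetta code: the function name, the returned keys, and the choice to leave the sample count unset for locally run targets are assumptions, and the slide does not say what the counts 2000/4000/8000 measure beyond the amount of RosettaCM sampling. Note that a higher predicted difficulty value corresponds to an easier target.

```python
from typing import Optional

TWILIGHT_CUTOFF = 0.2  # from the slide: run RosettaAB also if difficulty < 0.2


def plan_modeling(difficulty: float,
                  seq_identity_pct: Optional[float] = None) -> dict:
    """Map a predicted difficulty score to modeling decisions (hypothetical helper)."""
    # Very easy: difficulty > 0.80 and sequence identity > 40% -> run locally.
    run_locally = (
        difficulty > 0.80
        and seq_identity_pct is not None
        and seq_identity_pct > 40.0
    )
    if run_locally:
        # Assumption: the slide gives no count for locally run targets.
        cm_samples = None
    elif difficulty > 0.80:
        cm_samples = 2000   # Easy
    elif difficulty > 0.30:
        cm_samples = 4000   # Medium
    else:
        cm_samples = 8000   # Hard (<= 0.30)
    return {
        "run_locally": run_locally,
        "cm_samples": cm_samples,
        "run_rosetta_ab": difficulty < TWILIGHT_CUTOFF,
    }


if __name__ == "__main__":
    for d in (0.9, 0.5, 0.25, 0.1):
        print(d, plan_modeling(d))
```
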
Domain Parsing
Objective is to identify optimal non-overlapping alignment clusters
1. Run alignment method and partial-thread clustering on sequence
2. Identify potential "chunks" based on windowed difficulty of top alignment cluster
   30 residue window
   0.1, 0.2, 0.3 difficulty thresholds
   30 residue minimum domain length
3. Boundaries are fine-tuned using PSIPRED loop probability
4. All potential "chunks" are run through steps 1 to 3.
5. Final "chunks" are selected based on difficulty.
6. For twilight "chunks" (difficulty < 0.2) parsing is also based on Pfam and MSA (CASP9 GINZU method)

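
Step 2 can be sketched as a sliding-window scan. This is a minimal illustration under stated assumptions, not the Robetta implementation: it assumes a per-residue score in [0, 1] is available for the top alignment cluster (the slide does not define one), smooths it with a centered 30-residue window, and splits the sequence wherever the smoothed score crosses a threshold. The function names and the way runs are labeled are hypothetical.

```python
from typing import List, Sequence, Tuple


def windowed_mean(scores: Sequence[float], window: int = 30) -> List[float]:
    """Centered sliding-window mean; the window is truncated at the termini."""
    n = len(scores)
    half = window // 2
    smoothed = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half)
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))
    return smoothed


def find_chunks(scores: Sequence[float], threshold: float,
                window: int = 30, min_len: int = 30
                ) -> List[Tuple[int, int, bool]]:
    """Return (start, end, above_threshold) half-open runs of at least min_len."""
    smoothed = windowed_mean(scores, window)
    labels = [s >= threshold for s in smoothed]
    chunks = []
    start = 0
    for i in range(1, len(labels) + 1):
        # Close the current run at the end of the sequence or on a label change.
        if i == len(labels) or labels[i] != labels[start]:
            if i - start >= min_len:  # 30 residue minimum domain length
                chunks.append((start, i, labels[start]))
            start = i
    return chunks


if __name__ == "__main__":
    # Toy input: a well-covered 60-residue region followed by an uncovered one.
    toy = [1.0] * 60 + [0.0] * 60
    for t in (0.1, 0.2, 0.3):  # thresholds listed on the slide
        print(t, find_chunks(toy, t))
```

In this sketch the boundary fine-tuning of step 3 (PSIPRED loop probability) and the re-running of chunks in step 4 are left out.
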
Modeling Method
[Diagram; labels as extracted]
Sequence
Template Alignments
Sequence-based fragments
Restraint functions (Gremlin)
Threaded templates
Gradient-based energy minimization/loop closure
Torsion space fragment insertion
Cartesian space template chunk recombination
Full-atom refinement

[Flowchart; labels as extracted]
TARGET
ALIGNMENT
  HHblits, Jackhmmer
  90% identity redundancy cutoff
  To find at least 2L sequences, vary e-value and coverage: 1e-20 to 1e-4 and 75% to 50%
If >= 1L sequences
CONTACT PREDICTIONS
  GREMLIN
BAKER SERVER MODELS
GREMLIN used if difficulty = 1L
Used for 27 domains

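
The search relaxation described on this slide can be sketched as a loop over e-value and coverage settings that stops once 2L sequences are found. This is an illustration only: `search` is a hypothetical callable standing in for an HHblits or Jackhmmer run followed by the 90% identity filter, the intermediate e-value step and the order in which the two parameters are relaxed are assumptions, and only the end points (1e-20 to 1e-4, 75% to 50%) come from the slide.

```python
from typing import Callable, List, Sequence


def collect_sequences(search: Callable[[float, float], List[str]],
                      length: int,
                      evalues: Sequence[float] = (1e-20, 1e-10, 1e-4),
                      coverages: Sequence[float] = (0.75, 0.50)) -> List[str]:
    """Relax e-value, then coverage, until at least 2L sequences are found."""
    best: List[str] = []
    for coverage in coverages:
        for evalue in evalues:
            seqs = search(evalue, coverage)
            if len(seqs) > len(best):
                best = seqs
            if len(best) >= 2 * length:
                return best  # goal reached: at least 2L sequences
    return best  # goal not reached; return the largest set seen


def enough_for_contacts(seqs: List[str], length: int) -> bool:
    """Contact prediction is attempted only with >= 1L sequences."""
    return len(seqs) >= length


if __name__ == "__main__":
    def fake_search(evalue: float, coverage: float) -> List[str]:
        # Stand-in for a real search: looser e-values return more hits.
        return ["SEQ"] * (50 if evalue < 1e-5 else 250)

    msa = collect_sequences(fake_search, length=100)
    print(len(msa), enough_for_contacts(msa, 100))
```
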
RosettaCM performance using GREMLIN
CASP11 targets
GREMLIN is used in ranking alignment clusters and sampling
[Figure panels: Best vs Best (rerun without GREMLIN); Model1 vs Model1 (rerun without GREMLIN). Labeled target: T0768-D1]

RosettaAB performance using GREMLIN
CASP11 targets
GREMLIN is used in sampling
[Figure panels: Best vs Best (rerun without GREMLIN); Model1 vs Model1 (rerun without GREMLIN). Labeled targets: T0789-D1, T0790-D2]

T0789-D1 trimmed (76 aa)
GREMLIN predicted contacts helped with ~2L sequences
Domain over-parsed as vs official parse
2.84 Å RMSD over 71 res
Models generated: Top scoring models clustered: 4519
[Figure labels: NATIVE, T0789-D1]

T0790-D2 (130 aa)
GREMLIN predicted contacts helped with ~3L sequences
Domain under-parsed as vs official parse
Å RMSD over 92 res
Models generated: Top scoring models clustered:
[Figure labels: NATIVE, T0790-D2]

T0767-D2 (180 aa)
MSA based domain parse was accurate
Domain correctly parsed as vs official parse
3.99 Å RMSD over 92 res
Models generated: Top scoring models clustered: 1565

What went wrong
1. Ranking
   Twilight target ranking (to ab initio or not to ab initio, that is the question)
   New hybrid domain assembler (T0840-D1, T0852-D2)
2. Informatics
   Domain parse errors (T0808-D1, T0812-D1)
   Incorrect template (T0816-D1)

Ranking improved by a simple switch for twilight targets
[Figure: Submitted model 1 (CM) vs Submitted model 2 (AB)]
Simply choose submitted model 2 for AB targets (difficulty < 0.2)

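
The switch on this slide amounts to reordering the two submitted models below a difficulty cutoff. The sketch below is a hypothetical illustration: the function name and arguments are invented, and the default cutoff of 0.2 is taken from the twilight regime defined on the Target Difficulty Prediction and Domain Parsing slides.

```python
from typing import List


def order_models(cm_model: str, ab_model: str, difficulty: float,
                 twilight_cutoff: float = 0.2) -> List[str]:
    """Return the models in submission order (first entry is model 1)."""
    if difficulty < twilight_cutoff:
        # Twilight target: promote the RosettaAB model to model 1.
        return [ab_model, cm_model]
    # Otherwise keep the RosettaCM model as model 1.
    return [cm_model, ab_model]


if __name__ == "__main__":
    print(order_models("cm.pdb", "ab.pdb", difficulty=0.15))
    print(order_models("cm.pdb", "ab.pdb", difficulty=0.60))
```
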
[Figure: Model1 CM GDT vs Model1 AB GDT, colored by difficulty]

Acknowledgements
Hetunandan Kamisetty (GREMLIN)
Johannes Söding (HHpred)
Jinbo Xu (RaptorX)
Yaoqi Zhou and Yuedong Yang (SPARKS-X)
Rosetta Commons
David Baker
users for generous computing resources
Juergen Haas (CAMEO)
CASP organizers, assessors, structural biologists who provided structures
Andriy Kryshtafovych