Lessons from CASP targets ShuoYong Shi, Lisa Kinch, Jimin Pei, Ruslan Sadreyev, and Nick V. Grishin Howard Hughes Medical Institute, Department of Biochemistry,

Presentation transcript:

Lessons from CASP targets ShuoYong Shi, Lisa Kinch, Jimin Pei, Ruslan Sadreyev, and Nick V. Grishin Howard Hughes Medical Institute, Department of Biochemistry, University of Texas Southwestern Medical Center at Dallas

1. New folds: 397_1, 496_1; 2. A few known folds are predicted no better than new folds: 460, 407_2; 3. Short motif recognition = success: 465; 4. Short motif recognition = failure: 467; 5. Structural changes not predicted: 510; 6. Inspect your alignments carefully: 480

New fold: were there any? NF (new fold) is a historic category in CASP. 2008: where did the new folds go? Of 176 domains, 2 are possibly new folds: ~1%

N-domain of T0397: 3d4r chain A residues New fold #1: N-domain of T0397

First server models for T0397_1. Gaussian kernel density estimation of the GDT-TS scores of the first server models, plotted at various bandwidths (= standard deviations). The GDT-TS scores are shown as a spectrum along the horizontal axis: each bar represents one first server model. Bars are colored green for the top 10 servers, gray for the bottom 25%, and black for the rest. The family of curves with varying bandwidth is shown. Bandwidth varies from 0.3 to 8.2 GDT-TS % units in steps of 0.1, which corresponds to the color ramp from magenta through blue to cyan. The thicker curves (red, yellow-framed brown, and black) correspond to bandwidths of 1, 2, and 4, respectively.
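The family of curves described in this caption can be sketched in a few lines of Python. This is only an illustration of Gaussian kernel density estimation over a bandwidth range of 0.3 to 8.2 in steps of 0.1, not the code used to produce the slides. The scores are made-up placeholder values, and the function names `gaussian_kde` and `bandwidth_family` are hypothetical.

```python
import math

def gaussian_kde(scores, bandwidth, grid):
    """Density at each grid point: the average of Gaussian kernels centred
    on the scores, each with standard deviation equal to the bandwidth."""
    norm = 1.0 / (len(scores) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in scores)
        for x in grid
    ]

def bandwidth_family(start=0.3, stop=8.2, step=0.1):
    """Bandwidths from start to stop inclusive, rounded to avoid float drift."""
    n = round((stop - start) / step) + 1
    return [round(start + i * step, 1) for i in range(n)]

# Placeholder GDT-TS scores (assumption: NOT real CASP data).
scores = [12.5, 14.0, 15.2, 18.9, 21.3, 22.0, 30.4]

# Evaluate on a 0..100 GDT-TS grid with 0.5 spacing.
grid = [i * 0.5 for i in range(201)]

bandwidths = bandwidth_family()
curves = {h: gaussian_kde(scores, h, grid) for h in bandwidths}

# The three thicker curves on the slide correspond to these bandwidths.
highlighted = {h: curves[h] for h in (1.0, 2.0, 4.0)}
```

Small bandwidths resolve individual servers as separate peaks, while large ones merge them into a single broad mode, which is presumably why the slides show the whole family rather than one curve.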

structure and topology diagrams of ferredoxin fold – fold closest to T0397_1 Most similar: ferredoxin-like fold

N-domain of T0496: 3do9 chain A, residues New fold #2: N-domain of T0496

First server models for T0496_1. Gaussian kernel density estimation of the GDT-TS scores of the first server models, plotted at various bandwidths (= standard deviations). The GDT-TS scores are shown as a spectrum along the horizontal axis: each bar represents one first server model. Bars are colored green for the top 10 servers, gray for the bottom 25%, and black for the rest. The family of curves with varying bandwidth is shown. Bandwidth varies from 0.3 to 8.2 GDT-TS % units in steps of 0.1, which corresponds to the color ramp from magenta through blue to cyan. The thicker curves (red, yellow-framed brown, and black) correspond to bandwidths of 1, 2, and 4, respectively.

structure and topology diagrams of RNAseH fold – fold closest to T0496_1 Most similar: RNAse H fold

2. A few known folds are predicted no better than new folds: 460, 407_2

E.g. #1: T0460. Known fold: some predicted no better than new! First server models for T0460: Gaussian kernel density estimation of the GDT-TS scores of the first server models, plotted at various bandwidths (= standard deviations). The GDT-TS scores are shown as a spectrum along the horizontal axis: each bar represents one first server model. Bars are colored green for the top 10 servers, gray for the bottom 25%, and black for the rest. The family of curves with varying bandwidth is shown. Bandwidth varies from 0.3 to 8.2 GDT-TS % units in steps of 0.1, which corresponds to the color ramp from magenta through blue to cyan. The thicker curves (red, yellow-framed brown, and black) correspond to bandwidths of 1, 2, and 4, respectively.

T0460: very difficult target Cartoon diagram of 460: 2k4n model 1 residues 1-52,67-10 Jumping through 20 NMR models of 2k4n

T0460 is homologous to Nqo5. Cartoon diagram of 460: 2k4n model 1 residues 1-52,67-10 Cartoon diagram of NADH-quinone oxidoreductase: 2fug chain 5 residues This homologous template was NOT FOUND BY ANY SERVER! Why? Singleton sequence!

E.g. #2: C-domain of T0407. Known fold: some predicted no better than new! First server models for T0407_2: Gaussian kernel density estimation of the GDT-TS scores of the first server models, plotted at various bandwidths (= standard deviations). The GDT-TS scores are shown as a spectrum along the horizontal axis: each bar represents one first server model. Bars are colored green for the top 10 servers, gray for the bottom 25%, and black for the rest. The family of curves with varying bandwidth is shown. Bandwidth varies from 0.3 to 8.2 GDT-TS % units in steps of 0.1, which corresponds to the color ramp from magenta through blue to cyan. The thicker curves (red, yellow-framed brown, and black) correspond to bandwidths of 1, 2, and 4, respectively.

Date: Mon, 2 Jun :56: (CDT)
From: Nick Grishin
To: David Baker
Cc: Ruslan Sadreyev, Robert M Vernon
Subject: Re: C-terminus of T0407
I liked IG because of 1) length; 2) ~7 strands; 3) many IG are interaction domains in enzymes. These are very compelling reasons.

Cartoon diagram of 407, C-domain: 3e38 chain A residues Cartoon diagram of VAP-A MSP Homology Domain: 3z9l T0407_2 has Immunoglobulin fold

IG-based Baker model Top GDT server model: Phyre_de_novo TS1 No server predicted IG fold for T0407_2 Cartoon diagram of 407, C-domain: 3e38 chain A residues

3. Short motif recognition = success: 465

T0465: who found the template? HHpred !!!

T0465 is a diverged FYSH domain FYSH domain of hypothetical protein AF0491: 1t95 chain A residues Cartoon diagram of T0465: 3dfd chain A residues

T0465 fold is predicted by HHpred HHpred2 TS1 Cartoon diagram of T0465: 3dfd chain A residues Falcon TS1

4. Short motif recognition = failure: 467

T0467: most interesting target! Bioinfo.pl provides these predictions:

T0467: is bioinfo.pl correct?

T0467 OB-fold C-terminal fragment: 2k5q model 1 residues Sso7d SH3-fold C-terminal fragment: 2bf4 chain A residues You can say so (if you want)

However, only the local prediction is correct: extending it to cover the whole domain results in a wrong fold prediction! T0467 OB-fold: 2k5q model 1 residues 7-97 Sso7d SH3-fold: 2bf4 chain A

5. Structural changes not predicted: 510

T0510: “server only” target with a twist Cartoon diagram of 510 domains: 3doa, N-, middle and C-domains are shown in blue, green and red, respectively. Cartoon diagram of MutM domains: 1ee8_A, N-, middle and C-domains are shown in blue, green and red, respectively.

Closer look at the N-domains reveals large topological differences N-domain of 510: 3doa residues N-domain of MutM: 1ee8 chain A residues insertion close to the N-terminus is red insertion in the middle of the domain is blue

N-domains are nevertheless homologous

6. Inspect your alignments carefully: 480

T0480: easy alignment with templates

NADH pyrophosphatase intervening domain 1vk6: residues Ribbon diagram of 480: 2k4x model 1 residues Zinc ion is shown in magenta and side chains of its ligands (four Cys) are displayed. T0480: most predictions had an error 480 MULTICOM-CLUSTER TS1

Jumping through 20 NMR models of 2k4x Ribbon diagram of 480: 2k4x model 1 residues Zinc ion is shown in magenta and side chains of its ligands (four Cys) are displayed. T0480: unusual bulge

T0480: bulge could have been predicted

Summary: 1. New folds constitute less than 2% of newly solved non-redundant structures. 2. Many known folds cannot be predicted because templates are impossible to find. 3. Globalization of correct local alignment may or may not yield correct fold prediction. 4. Large structural changes happen in protein cores. 5. Careful inspection of alignments may solve some modeling problems.

Acknowledgement
Our group: Shuoyong Shi, Jing Tong, Ruslan Sadreyev, Lisa Kinch, Jimin Pei, Ming Tang, Sasha Safronova, Yuan Qi, Hua Cheng, Jamie Wrabl, Indraneel Majumdar, Erik Nelson, Yong Wang, S. Sri Krishna, Bong-Hyun Kim, Dorothee Staber
Collaborators: David Baker (U. Washington), Kimmen Sjölander (UC Berkeley), William Noble (U. Washington)
HHMI, NIH, UTSW, The Welch Foundation