Lessons from CASP targets ShuoYong Shi, Lisa Kinch, Jimin Pei, Ruslan Sadreyev, and Nick V. Grishin Howard Hughes Medical Institute, Department of Biochemistry,

2 Lessons from CASP targets ShuoYong Shi, Lisa Kinch, Jimin Pei, Ruslan Sadreyev, and Nick V. Grishin Howard Hughes Medical Institute, Department of Biochemistry, University of Texas Southwestern Medical Center at Dallas http://prodata.swmed.edu/CASP8

3 1. New folds: 397_1, 496_1; 2. A few known folds are predicted no better than new folds: 460, 407_2; 3. Short motif recognition = success: 465; 4. Short motif recognition = failure: 467; 5. Structural changes not predicted: 510; 6. Inspect your alignments carefully: 480

4 NF – new fold – historic category in CASP New fold: were there any? 2008 – where did the new folds go? 176 domains: 2 possibly new folds: ~1%

5 N-domain of T0397: 3d4r chain A residues -7-82 New fold #1: N-domain of T0397

6 First models for T0397_1: Gaussian kernel density estimation for GDT-TS scores of the first server models, plotted at various bandwidths (= standard deviations). The GDT-TS scores are shown as a spectrum along the horizontal axis: each bar represents one first server model. The bars are colored green, gray and black for the top 10 servers, the bottom 25% and the rest of the servers, respectively. The family of curves with varying bandwidth is shown. Bandwidth varies from 0.3 to 8.2 GDT-TS % units with a step of 0.1, which corresponds to the color ramp from magenta through blue to cyan. Thicker curves (red, yellow-framed brown and black) correspond to bandwidths 1, 2 and 4, respectively. First server models for T0397_1
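The density plots described on this slide can be approximated with a few lines of code. The following is only a minimal sketch, not the authors' plotting script: the function names are hypothetical, any scores passed in are assumed to be GDT-TS values in % units, and ties or overlaps between the "top 10" and "bottom 25%" groups are resolved in favor of the top 10, which is an assumption.

```python
import math


def gaussian_kde(scores, bandwidth, grid):
    """Evaluate a Gaussian kernel density estimate at each grid point.

    The bandwidth is the standard deviation of each Gaussian kernel,
    in the same units as the scores (GDT-TS % units on the slide).
    """
    norm = 1.0 / (len(scores) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in scores)
        for x in grid
    ]


def bandwidths(start=0.3, stop=8.2, step=0.1):
    """Bandwidth family from the caption: 0.3 to 8.2 with a step of 0.1."""
    count = round((stop - start) / step) + 1
    return [round(start + i * step, 1) for i in range(count)]


def bar_colors(scores):
    """Color each score: green for the top 10, gray for the bottom 25%,
    black for the rest (assumed tie-breaking: top 10 wins on overlap)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_bottom = len(scores) // 4
    colors = ["black"] * len(scores)
    for i in order[len(order) - n_bottom:]:
        colors[i] = "gray"
    for i in order[:10]:
        colors[i] = "green"
    return colors
```

Evaluating `gaussian_kde` once per value returned by `bandwidths()` gives the family of curves; the thicker curves on the slide correspond to the bandwidths 1, 2 and 4 within that family.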

7 structure and topology diagrams of ferredoxin fold – fold closest to T0397_1 Most similar: ferredoxin-like fold

8 N-domain of T0496: 3do9 chain A, residues 4-126 New fold #2: N-domain of T0496

9 First models for T0496_1: Gaussian kernel density estimation for GDT-TS scores of the first server models, plotted at various bandwidths (= standard deviations). The GDT-TS scores are shown as a spectrum along the horizontal axis: each bar represents one first server model. The bars are colored green, gray and black for the top 10 servers, the bottom 25% and the rest of the servers, respectively. The family of curves with varying bandwidth is shown. Bandwidth varies from 0.3 to 8.2 GDT-TS % units with a step of 0.1, which corresponds to the color ramp from magenta through blue to cyan. Thicker curves (red, yellow-framed brown and black) correspond to bandwidths 1, 2 and 4, respectively. First server models for T0496_1

10 structure and topology diagrams of RNAseH fold – fold closest to T0496_1 Most similar: RNAse H fold

11 1. New folds: 397_1, 496_1; 2. A few known folds are predicted no better than new folds: 460, 407_2; 3. Short motif recognition = success: 465; 4. Short motif recognition = failure: 467; 5. Structural changes not predicted: 510; 6. Inspect your alignments carefully: 480

12 E.g. #1: T0460 Known fold: some predicted no better than new! First models for T0460: Gaussian kernel density estimation for GDT-TS scores of the first server models, plotted at various bandwidths (= standard deviations). The GDT-TS scores are shown as a spectrum along the horizontal axis: each bar represents one first server model. The bars are colored green, gray and black for the top 10 servers, the bottom 25% and the rest of the servers, respectively. The family of curves with varying bandwidth is shown. Bandwidth varies from 0.3 to 8.2 GDT-TS % units with a step of 0.1, which corresponds to the color ramp from magenta through blue to cyan. Thicker curves (red, yellow-framed brown and black) correspond to bandwidths 1, 2 and 4, respectively.

13 T0460: very difficult target Cartoon diagram of 460: 2k4n model 1 residues 1-52,67-10 Jumping through 20 NMR models of 2k4n

14 Cartoon diagram of 460: 2k4n model 1 residues 1-52,67-10 Cartoon diagram of NADH-quinone oxidoreductase: 2fug chain 5 residues 1-106 T0460 is homologous to Nqo5 This homologous template was NOT FOUND BY ANY SERVER! Why? Singleton sequence!

15 E.g. #2: C-domain of T0407 Known fold: some predicted no better than new! First models for T0407_2: Gaussian kernel density estimation for GDT-TS scores of the first server models, plotted at various bandwidths (= standard deviations). The GDT-TS scores are shown as a spectrum along the horizontal axis: each bar represents one first server model. The bars are colored green, gray and black for the top 10 servers, the bottom 25% and the rest of the servers, respectively. The family of curves with varying bandwidth is shown. Bandwidth varies from 0.3 to 8.2 GDT-TS % units with a step of 0.1, which corresponds to the color ramp from magenta through blue to cyan. Thicker curves (red, yellow-framed brown and black) correspond to bandwidths 1, 2 and 4, respectively.

16 Date: Mon, 2 Jun 2008 23:56:39 -0500 (CDT) From: Nick Grishin To: David Baker Cc: Ruslan Sadreyev, Robert M Vernon Subject: Re: C-terminus of T0407 I liked IG because of 1) length; 2) ~7 strands; 3) many IG are interaction domains in enzymes. These are very compelling reasons.

17 Cartoon diagram of 407, C-domain: 3e38 chain A residues 277-363 Cartoon diagram of VAP-A MSP Homology Domain: 3z9l T0407_2 has Immunoglobulin fold

18 IG-based Baker model Top GDT server model: Phyre_de_novo TS1 No server predicted IG fold for T0407_2 Cartoon diagram of 407, C-domain: 3e38 chain A residues 277-363

19 1. New folds: 397_1, 496_1; 2. A few known folds are predicted no better than new folds: 460, 407_2; 3. Short motif recognition = success: 465; 4. Short motif recognition = failure: 467; 5. Structural changes not predicted: 510; 6. Inspect your alignments carefully: 480

20 T0465: who found the template? HHpred !!!

21 T0465 is a diverged FYSH domain FYSH domain of hypothetical protein AF0491: 1t95 chain A residues 11-94 Cartoon diagram of T0465: 3dfd chain A residues 21-136

22 T0465 fold is predicted by HHpred HHpred2 TS1 Cartoon diagram of T0465: 3dfd chain A residues 21-136 Falcon TS1

23 1. New folds: 397_1, 496_1; 2. A few known folds are predicted no better than new folds: 460, 407_2; 3. Short motif recognition = success: 465; 4. Short motif recognition = failure: 467; 5. Structural changes not predicted: 510; 6. Inspect your alignments carefully: 480

24 T0467: most interesting target! Bioinfo.pl provides these predictions:

25 T0467: is bioinfo.pl correct ?

26 T0467 OB-fold C-terminal fragment: 2k5q model 1 residues 64-97 Sso7d SH3-fold C-terminal fragment: 2bf4 chain A residues 30-64 You can say so (if you want)

27 However, only the local prediction is correct: extending it to cover the whole domain results in a wrong fold prediction! T0467 OB-fold: 2k5q model 1 residues 7-97 Sso7d SH3-fold: 2bf4 chain A

28 1. New folds: 397_1, 496_1; 2. A few known folds are predicted no better than new folds: 460, 407_2; 3. Short motif recognition = success: 465; 4. Short motif recognition = failure: 467; 5. Structural changes not predicted: 510; 6. Inspect your alignments carefully: 480

29 T0510: “server only” target with a twist Cartoon diagram of 510 domains: 3doa, N-, middle and C-domains are shown in blue, green and red, respectively. Cartoon diagram of MutM domains: 1ee8_A, N-, middle and C-domains are shown in blue, green and red, respectively.

30 Closer look at the N-domains reveals large topological differences N-domain of 510: 3doa residues 1-165 N-domain of MutM: 1ee8 chain A residues 1-121 insertion close to the N-terminus is red insertion in the middle of the domain is blue

31 N-domains are nevertheless homologous

32 1. New folds: 397_1, 496_1; 2. A few known folds are predicted no better than new folds: 460, 407_2; 3. Short motif recognition = success: 465; 4. Short motif recognition = failure: 467; 5. Structural changes not predicted: 510; 6. Inspect your alignments carefully: 480

33 T0480: easy alignment with templates

34 NADH pyrophosphatase intervening domain 1vk6: residues 94-127 Ribbon diagram of 480: 2k4x model 1 residues 17-50. Zinc ion is shown in magenta and side chains of its ligands (four Cys) are displayed. T0480: most predictions had an error 480 MULTICOM-CLUSTER TS1

35 Jumping through 20 NMR models of 2k4x Ribbon diagram of 480: 2k4x model 1 residues 17-50. Zinc ion is shown in magenta and side chains of its ligands (four Cys) are displayed. T0480: unusual bulge

36 T0480: bulge could have been predicted

37 Summary: 1. New folds constitute less than 2% of newly solved non-redundant structures. 2. Many known folds cannot be predicted because templates are impossible to find. 3. Globalization of correct local alignment may or may not yield correct fold prediction. 4. Large structural changes happen in protein cores. 5. Careful inspection of alignments may solve some modeling problems.

38 Acknowledgement Our group Collaborators HHMI, NIH, UTSW, The Welch Foundation Shuoyong Shi Jing Tong Ruslan Sadreyev Lisa Kinch Jimin Pei Ming Tang Sasha Safronova Yuan Qi Hua Cheng Jamie Wrabl Indraneel Majumdar Erik Nelson Yong Wang S. Sri Krishna Bong-Hyun Kim Dorothee Staber David Baker U. Washington Kimmen Sjölander UC Berkeley William Noble U. Washington

