1
NIH-PSI Target Selection, Nov 13-14, 2003 © Burkhard Rost (Columbia New York)
Comprehensive strategy for integrated target selection in structural genomics
Burkhard Rost, CUBIC, Columbia University
http://cubic.bioc.columbia.edu/mis/talks/
http://cubic.bioc.columbia.edu
2
Comprehensive strategy for integrated target selection
Our research goal and current reality
  Unit: sequence-structure families
  Goal: cover entire families with good models
  STAGE 1: CHOP + CLUP + filtering -> novel automatic organization of sequence-structure space
  STAGE 2: Refined, manual selection -> model all family members? stop-work/hold-work?
  STAGE 3: Explore experimental structure
Answers and perspectives
  How many structures needed for completion?
  Euka-proka-archae: overlap?
  Why collaborate on targets?
  Multiplexing helpful?
  High-throughput protein production in eukaryotes?
3
Computational biology & bioinformatics
4
Sequence-structure family
[Figure: sequence-structure families U and U’]
5
EVA: comparative modelling
V Eyrich, MA Marti-Renom, D Przybylski, A Fiser, F Pazos, A Valencia, A Sali & B Rost (2001) Bioinformatics 17, 1242-1243
MA Marti-Renom, MS Madhusudhan, A Fiser, B Rost, A Sali (2002) Structure 10, 435-440
Marc Marti-Renom & Andrej Sali (UCSF)
http://eva.compbio.ucsf.edu/~eva/cm/
http://cubic.bioc.columbia.edu/eva
[Figure: cumulative distributions of accuracy and coverage; PSI-BLAST 10^-3]
6
How to decide when we exclude/include?
C Sander & R Schneider 1991 Proteins, 9, 56-68
B Rost 1999 Prot Engng, 12, 85-94
7
Scooping families from proteomes, in practice
Problems: domain overlaps
8
Choose targets: single-linkage clustering
Liu, Hegyi, Acton, Montelione & Rost 2003 Proteins, in press
Liu & Rost 2003 Proteins, submitted
~100,000 eukaryotic proteins (yeast, fly, worm, weed, human)
22,112 clusters
46,318 in largest cluster
NONSENSE!
Conclusions:
  NO clustering of full-length proteins
  have to chop into structural-domain-like fragments
  (single-linkage DOES work on PrISM)
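Why single-linkage clustering of full-length proteins collapses into one giant cluster can be illustrated with a toy example. The sketch below is not the CUBIC pipeline; the protein names, the links, and the union-find implementation are all assumptions made for illustration. A two-domain protein (P2) links two otherwise unrelated proteins, and chopping it into fragments removes that bridge.

```python
from collections import defaultdict

def single_linkage(items, links):
    """Single-linkage clustering: any two linked items share a cluster."""
    parent = {item: item for item in items}

    def find(x):
        # Follow parent pointers to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for item in items:
        clusters[find(item)].add(item)
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])

# Hypothetical data: P1 carries domain X, P3 carries domain Y, P2 carries both.
full_length = single_linkage(
    ["P1", "P2", "P3"],
    [("P1", "P2"), ("P2", "P3")],  # P2 bridges P1 and P3
)

# Same data after chopping P2 into its two domain-like fragments.
fragments = single_linkage(
    ["P1", "P2_x", "P2_y", "P3"],
    [("P1", "P2_x"), ("P2_y", "P3")],
)

print(full_length)  # one cluster containing all three proteins
print(fragments)    # two clusters, one per domain
```

With many multi-domain proteins, such bridges chain together, which is consistent with the single cluster of 46,318 proteins reported on the slide.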
9
CHOP proteins into structural domains
Liu & Rost 2003 Proteins, submitted
10
CHOP: dissection of proteins into domains
Liu, Hegyi, Acton, Montelione & Rost 2003 Proteins, in press
Liu & Rost 2003 Proteins, submitted
Single-domain proteins:
  61% in PDB
  28% in 62 proteomes
Average domain length:
  in proteins with ≥ 2 domains: ~100 residues
  in proteins with 1 domain: 1.7-3 times longer
11
To take or not to take
Take if > 50 globular residues and no known 3D structure
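The take/skip rule on this slide can be written as a one-line predicate. This is a minimal sketch: the `Fragment` record and its field names are hypothetical, and how globular residues are counted or how a known 3D structure is detected is not specified here, so both are treated as given inputs.

```python
from dataclasses import dataclass

MIN_GLOBULAR_RESIDUES = 50  # threshold stated on the slide

@dataclass
class Fragment:
    name: str               # hypothetical identifier
    globular_residues: int  # assumed to be computed upstream
    has_known_3d: bool      # assumed to come from a PDB comparison upstream

def take(fragment: Fragment) -> bool:
    """Take a fragment if it has > 50 globular residues and no known 3D structure."""
    return (
        fragment.globular_residues > MIN_GLOBULAR_RESIDUES
        and not fragment.has_known_3d
    )

candidates = [
    Fragment("frag_a", 51, False),   # taken
    Fragment("frag_b", 50, False),   # skipped: not strictly more than 50
    Fragment("frag_c", 120, True),   # skipped: structure already known
]
selected = [f.name for f in candidates if take(f)]
print(selected)
```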
12
Structural residue coverage in reality (any)
J Liu & B Rost 2002 Bioinformatics, 18, 922-933
53% of residues to do!
[Figure labels: ~28%, ~19%]
13
If you believe 53% is pessimistic...
53% residue coverage today based on E-value 1!!
14
Clustering after CHOP
Liu, Montelione & Rost 2003 Proteins, in press
103,796 eukaryotic proteins (Yeast, Fly, Worm, Arabidopsis, Human/30)
247,222 domain-like fragments
167,717 no PDB (E-value 10^-1, HSSP-distance -3)
44,718 not good for us (membrane, coil, SEG, NORS, signal peptide)
122,999 to go
95,330 non-singleton
21,000 fragment clusters
Jinfeng
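The counts on this slide form a filtering funnel, and the arithmetic can be checked directly. In the sketch below only the raw counts come from the slide; the derived ratios are not stated there, and treating the non-singleton fragments as a subset of the fragments "to go" is an assumption.

```python
proteins = 103_796         # eukaryotic proteins
fragments = 247_222        # domain-like fragments after CHOP
no_pdb = 167_717           # fragments without a PDB match
not_suitable = 44_718      # membrane, coil, SEG, NORS, signal peptide
non_singleton = 95_330     # fragments in clusters with more than one member

# Removing the unsuitable fragments reproduces the slide's remaining count.
to_go = no_pdb - not_suitable
assert to_go == 122_999

# Derived ratios (not on the slide).
fragments_per_protein = fragments / proteins
share_without_pdb = no_pdb / fragments
share_non_singleton = non_singleton / to_go  # assumes non-singletons are counted within to_go

print(f"{fragments_per_protein:.2f} fragments per protein")
print(f"{share_without_pdb:.1%} of fragments have no PDB match")
print(f"{share_non_singleton:.1%} of remaining fragments are non-singletons")
```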
15
Computational biology & bioinformatics
16
Main goal of Stage 2 analysis
Refine Stage 1 automatic target selection through manual sequence analysis
Concept: USE comparative modeling and structural features directly for refined target selection
For each sequence-structure family from Stage 1: predict the minimal set of experimental structures needed to model the entire family at high quality.
Diana Murray, Cornell
17
Refinement protocol for new 3D
Input: PDB + NESG cluster
Toolbox:
  1. Fold recognition and sequence-to-structure profiles
  2. Comparative modeling (PrISM, Nest)
  3. Structure evaluation tools (e.g. Verify3d)
  4. Calculate biophysical properties
Recommend an additional structure if:
  1) NESG-cluster members poorly modeled
  2) Biophysical properties of models incompatible with known function
  3) Models suggest novel functionality
Target re-prioritization based on weekly PDB updates
Diana Murray, Cornell
18
Example of stop work recommendation
Target   Status
IR21     solved, PDB: 1MOS
ET28     Purified
JR15     Expressed
TT777    Expressed
GR7      Expressed
AR12     Cloned
WR204    Selected
XR4      Expressed
Stop work
SPINE/ZebaView
Experimental structure of IR21 yielded high-quality models for all members of this NESG sequence/structure family
Diana Murray, Cornell
19
NESG family: HR291 (99% identical to 1P9O), AR1731, HR2295, KR12, DR11
breaks into two clusters: A = (HR291, AR1731, HR2295) and B = (KR12, DR11)
Two structures required to cover family: predicted by Stage 2 analysis and verified by Stage 3 analysis
[Figure: panels showing cluster A (HR291, AR1731, HR2295) and cluster B (KR12, DR11)]
Recommendation: Solve structure of KR12 (purified)
Diana Murray, Cornell
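Finding a small set of structures that covers a whole family can be framed as a set-cover problem. The sketch below uses a greedy set cover, which is a swapped-in textbook heuristic and not the manual Stage 2 procedure described in the talk. The cluster memberships come from the slide; the `models_well` relation (a structure of any member models only its own cluster well) and the priority order of candidates are assumptions.

```python
def pick_structures(members, models_well):
    """Greedy set cover: repeatedly pick the candidate whose structure would
    give good models for the most still-uncovered family members.
    Ties go to the candidate listed first in `models_well`."""
    uncovered = set(members)
    chosen = []
    while uncovered:
        best = max(models_well, key=lambda c: len(models_well[c] & uncovered))
        gained = models_well[best] & uncovered
        if not gained:
            break  # remaining members cannot be modeled from any candidate
        chosen.append(best)
        uncovered -= gained
    return chosen, uncovered

cluster_a = {"HR291", "AR1731", "HR2295"}
cluster_b = {"KR12", "DR11"}
family = sorted(cluster_a | cluster_b)

# Assumed priority: HR291 (near-identical PDB entry) and KR12 (purified) first.
priority = ["HR291", "KR12", "AR1731", "HR2295", "DR11"]
models_well = {m: (cluster_a if m in cluster_a else cluster_b) for m in priority}

chosen, uncovered = pick_structures(family, models_well)
print(chosen, uncovered)
```

Under these assumptions the heuristic needs one structure per cluster, matching the slide's conclusion that two structures are required.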
20
Archaeal structure
NESG ID: GR2; PDB ID: 1QXF
The family of the Archaeoglobus fulgidus S27e protein has only archaeal and eukaryotic members.
Archaea and eukaryotes share a conserved hydrophobic motif (yellow).
Only eukaryotes have an N-terminal extension, and their models have strikingly different electrostatic properties.
Human protein recommended for structure determination!
Model suggests novel function
[Figure panels: 30S ribosomal protein S27; model for human homologue]
Diana Murray, Cornell
21
Summary Stage 2 refinement
Statistics:
families   targets   result
62         200+145   stop-work
40         110       hold-work
12                   another 3D
Many families currently under investigation
Hold work recommendation: family member at advanced experimental stage predicted to yield good models for entire family -> hold-work for members at early experimental stages; re-assess once structure done!
Diana Murray, Cornell
22
Computational biology & bioinformatics
23
Exploit structure to speculate about function
43 no previous annotation about function (defined by ‘no publication in biological journal’)
39 analyzed
31 result in some predictions about function
  8 clear success: functional annotation achieved
    e.g. predicted active site based on structure
    typically: confirmation of annotation transfer
  23 some hints (16 ‘hypothetical proteins’)
    e.g. some clue about active site
    mostly completely new!
8 no clue
Sharon Goldsmith & Barry Honig
24
Answers
How many structures needed for completion?
Euka-proka-archae: overlap?
Why collaborate on targets?
Multiplexing helpful?
High-throughput protein production in eukaryotes?
25
How many targets for prokaryotes + archae?
16,000 min
8,000 give:
  72% fragments
  72% proteins
  67% residues
26
How many targets for euka-proka-archae?
8,000 give:
  67% fragments
  67% proteins
  59% residues
BUT: 50% of residues remaining
27
Overlap between euka-proka-archae?
surprisingly small overlap overall
even lower for largest families
most big families are eukaryotic!
~60% of fragments from eukaryotes: no sequence-structure family member from prokaryotes or archae
much higher for ‘largest 8,000’:
  2,690 (34%) proka+archae only
  4,277 (53%) euka only
  1,033 (13%) mix
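The percentages for the ‘largest 8,000’ families follow directly from the counts on the slide. A short check, using only those counts (the dictionary keys are the slide's own labels):

```python
largest_families = {
    "proka+archae only": 2_690,
    "euka only": 4_277,
    "mix": 1_033,
}

total = sum(largest_families.values())

# Share of each category, rounded to whole percent as on the slide.
shares = {label: round(100 * n / total) for label, n in largest_families.items()}

print(total)
print(shares)
```

The three categories sum to exactly 8,000, and the rounded shares reproduce the slide's 34%, 53% and 13%.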
28
Why collaborate on target list?
competition between consortia has already hampered success-rate considerably!
32% overlap
29
Does multiplexing help?
Date: 2003-07-28
Multiplex DOUBLES success rate!
~4%
30
Integrated strategy
NESG unique, comprehensive, integrated strategy optimized to organize sequence space in structural terms:
  Stage 1: CHOP+CLUP+filter yields high success in focusing on sequence-structure families
  Stage 2: detailed refinement embeds comparative models into selection and optimizes structural coverage for family
  Stage 3: use experimental structure to increase structural family coverage and to allow functional exploitation
Needed to do ‘em all:
  ~38,000 non-singletons
  8,000 largest -> 50% of the residues that remain!
Genomics: Surprises + our structural perspective changed the ‘world’!
The revolutions continue...
31
Thanksgiving
$$: NIH/NSF
Data: Jinfeng Liu (CUBIC), Hedi Hegyi & Phil Carter (CUBIC), Marc Marti-Renom (UCSF)
NESG: Guy Montelione (Rutgers), Barry Honig (Columbia), Diana Murray (Cornell, NYC), Tom Acton (Rutgers), Liang Tong & John Hunt (Columbia), George DeTitta (Buffalo), Cheryl Arrowsmith (Toronto), Wayne Hendrickson (Columbia)
EVA: Andrej Sali & Marc Marti-Renom (UCSF), Alfonso Valencia (Madrid), Volker Eyrich, Ingrid Koh & Dariusz Przybylski (CUBIC)