




1 Do not reproduce without permission. (c) Mark Gerstein, 2002, Yale, bioinfo.mbb.yale.edu. Comparing Genomes in terms of Protein Structure: Surveys of a Finite Parts List. Talk at Keck Symposium, 02.04.11.

2 Comparing Genomes in terms of Protein Structure: Surveys of a Finite Parts List [images: '98 spoof; real thing, Apr '00]

3 Protein folds & families as scaffolds simplifying complex genomics data. The next step: ~1000 folds, compared with ~30,000 genes (human) and ~1000 genes (T. pallidum). Different from genomics.

4 Structural Genomics: World-wide Effort to Map Fold Space, determining a periodic table for biology. 7 original NIH centers focusing on large-scale structure determination (Eisenberg, Kim, Burley, Prestegard). Part of NESG (Montelione). Computational effort to classify folds; measure flexibility.

5 Comparing Genomes in terms of Structure: Surveys of a Finite Parts List
- PartsList integrative DB system
- Map folds & families onto genomes
  - Part of world-wide effort to chart "fold universe"
- Identify unique folds & families in small genomes
  - Significance: suggests targets for further work and for drugs (antibiotics)
  - Non-symmetrical folds, tending not to be
- Characterize folds & families in pseudogenes
  - Different distribution of ΨG compared to genes
    - On chromosome ends
    - Environmental response proteins
  - Worm survey; suggested mechanism in yeast
  - Significance: fossils shed light on past history & future evolution
- Future Directions

6 Comparing Genomes in terms of Structure: Surveys of a Finite Parts List (outline, as on slide 5)

7 PartsList.org: Integrative Database System [diagram: ORF query, PDB query, alignment server, alignment database, detailed tables, ranks, trees]

8 PartsList Aspects #1: Merging definitions of protein folds; characterizing fold flexibility [example: globins]

9 PartsList Aspects #1: Merging definitions of protein folds; characterizing fold flexibility [examples: FAD/NAD-linked reductases, globins]

10 PartsList Aspects #2 & #3: Map folds onto genomes, a large calculation (50 CPU days for worm) that requires parallel computing and large DBs. Integrate heterogeneous, dynamically changing annotation: changing sequences, gene predictions, repeats. [Diagram: protein DB mapped onto chromosome annotation layers such as Sequence 1-3, Genes A/B, Repeats 1/2]
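The fold-mapping step is described only at a high level. As a minimal sketch, assume the expensive part (comparing every ORF against proteins of known structure, the 50 CPU days) has already produced hits as hypothetical `(orf_id, fold_id, e_value)` tuples. The 1e-3 cutoff and the best-hit rule below are illustrative assumptions, not the PartsList procedure. Because each ORF is handled independently, the upstream search splits naturally across processors.

```python
from typing import Dict, List, Tuple

# Hypothetical hit record: (orf_id, fold_id, e_value) from an upstream sequence
# search of ORFs against proteins of known structure (assumed format).
Hit = Tuple[str, str, float]


def assign_folds(hits: List[Hit], e_cutoff: float = 1e-3) -> Dict[str, str]:
    """Give each ORF the fold of its best (lowest e-value) hit at or below e_cutoff."""
    best: Dict[str, Tuple[float, str]] = {}
    for orf, fold, e_value in hits:
        if e_value > e_cutoff:
            continue  # hit too weak to trust; ignore it
        if orf not in best or e_value < best[orf][0]:
            best[orf] = (e_value, fold)
    return {orf: fold for orf, (_, fold) in best.items()}


def fold_coverage(assignments: Dict[str, str], n_orfs: int) -> float:
    """Fraction of a genome's ORFs that received a fold assignment."""
    return len(assignments) / n_orfs if n_orfs else 0.0


# Toy hits for illustration only.
hits = [
    ("orf1", "globin", 1e-20),
    ("orf1", "TIM barrel", 1e-5),
    ("orf2", "P-loop", 0.5),
    ("orf3", "Ig", 1e-8),
]
assignments = assign_folds(hits)
print(assignments)                    # {'orf1': 'globin', 'orf3': 'Ig'}
print(fold_coverage(assignments, 4))  # 0.5
```

Keeping only the single best hit per ORF is the simplest choice; a multi-domain protein would need per-region assignment instead.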

11 PartsList Aspects #4: Tracking Database for NESG Consortium

12 Comparing Genomes in terms of Structure: Surveys of a Finite Parts List (outline, as on slide 5)

13 A Parts List Approach to Bike Maintenance

14 A Parts List Approach to Bike Maintenance: What are the shared parts (bolt, nut, washer, spring, bearing) and the unique parts (cogs, levers)? What are the common parts, i.e. the types of parts (nuts & washers)? How many roles can these play? How flexible and adaptable are they mechanically? Where are the parts located? Which parts interact?

15 Common Folds, Worm vs. E. coli: extracellular, signaling and transcription-factor folds (Ig, kinase, lectin, nuclear receptor, ZnF); metabolic folds (TIM, ferredoxin, Rossmann, P-loop hydrolase, FAD-binding) (partslist.org)

16 Shared Folds of 339: Worm, Yeast, E. coli [Venn diagram of fold counts]
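A shared-folds Venn diagram amounts to grouping each fold by the exact set of genomes that contain it. The sketch below assumes per-genome fold sets are already available (for example, from a fold-assignment step); the fold names are toy data, not the counts on the slide.

```python
from typing import Dict, FrozenSet, Set


def venn_regions(genomes: Dict[str, Set[str]]) -> Dict[FrozenSet[str], int]:
    """Count folds per Venn region; each key is the exact set of genomes containing a fold."""
    owners: Dict[str, Set[str]] = {}
    for genome, folds in genomes.items():
        for fold in folds:
            owners.setdefault(fold, set()).add(genome)
    regions: Dict[FrozenSet[str], int] = {}
    for genome_set in owners.values():
        key = frozenset(genome_set)
        regions[key] = regions.get(key, 0) + 1
    return regions


# Toy fold sets for illustration only (not the slide's data).
genomes = {
    "worm": {"Ig", "kinase", "TIM barrel", "P-loop"},
    "yeast": {"kinase", "TIM barrel", "P-loop"},
    "E. coli": {"TIM barrel", "P-loop", "ferredoxin"},
}
for region, n in sorted(venn_regions(genomes).items(), key=lambda kv: sorted(kv[0])):
    print(sorted(region), n)
```

Single-genome keys give the "unique" folds; the key containing all genomes gives the folds shared by every organism.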

17 Practical Relevance of Structural Genomics: pathogen-only folds as possible drug targets (human vs. T. pallidum). Example: the OspA protein in the Lyme-disease spirochete B. burgdorferi, previously identified as the antigen for a vaccine, has a novel fold [Lawson].
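Selecting pathogen-only folds is a set difference between the pathogen's assigned folds and the host's. The sketch below assumes hypothetical ORF-to-fold assignments for the pathogen and a set of folds observed in the host. Because fold assignments only cover proteins with detectable structural homologs, a fold "absent" from the host may simply be undetected there, so the output is a candidate list, not a confirmed one.

```python
from typing import Dict, List, Set


def pathogen_only_folds(pathogen: Dict[str, str], host_folds: Set[str]) -> Dict[str, List[str]]:
    """Group pathogen ORFs by assigned fold, keeping only folds not seen in the host."""
    candidates: Dict[str, List[str]] = {}
    for orf, fold in sorted(pathogen.items()):
        if fold not in host_folds:
            candidates.setdefault(fold, []).append(orf)
    return candidates


# Toy data: hypothetical pathogen ORF-to-fold assignments and a host fold set.
pathogen = {"tp1": "fold_A", "tp2": "P-loop", "tp3": "fold_A", "tp4": "fold_B"}
human = {"P-loop", "TIM barrel", "Ig"}
print(pathogen_only_folds(pathogen, human))  # {'fold_A': ['tp1', 'tp3'], 'fold_B': ['tp4']}
```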

18 Unique folds tend to be "non-symmetric" [chart: all-α, all-β, small; unique vs. common folds; * = symmetrical]

19 Prospective Target Selection: identify the 23 proteins in M. genitalium that are most atypical structurally; characterize 11 of these with homologs biophysically by circular dichroism (CD) [L Regan].

20 M. genitalium CD results: proteins classed as S (strange), N (normal) or U (unstructured) [plots of ellipticity vs. temperature T; normal melt vs. no melt]

21 Comparing Genomes in terms of Structure: Surveys of a Finite Parts List (outline, as on slide 5)

22 Pseudogenes (ΨG) as Disabled Homologies
S/T Protein Phosphatase PP1 (C-term): …SRILCMHGGLSPHLQTLDQLRQLPRPQDPPNPSIGIDLLWADPDQWVKGWQAN TRGVSYVFGQDVVADVCSRLDIDLVARAHQVVQDGYEFFASKKMVTIFSAPHYC GQFDNSAATMKVDENMVCTFVMYKPTPKSMRRG* IIIIIIIVVX
Worm genome, pseudogenic fragment: TKRTSNGFGQDVVVDLFSILDSGLVARAHX VLQDIFEFFASKKMVTIFS # APHSPHSAPH YCAQFDNSAATVKV
Most multiply disabled #
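A pseudogene is recognized as a homology that is disabled by premature stops or frameshifts. The sketch below counts such disablements in a translated alignment string. The symbols are assumptions (`*` for a stop, `#` for a frameshift), since the transcript does not define the slide's own markers; the test string is toy data, not the fragment above.

```python
from typing import Dict


def count_disablements(translated: str, stop: str = "*", frameshift: str = "#") -> Dict[str, int]:
    """Count stop and frameshift markers in a translated pseudogene alignment.

    Each character in `stop` / `frameshift` is treated as a marker (assumed
    convention). Trailing stop markers are stripped first, because a terminal
    stop codon is normal in a working gene, not a disablement.
    """
    body = translated.rstrip(stop)
    return {
        "stops": sum(body.count(s) for s in stop),
        "frameshifts": sum(body.count(f) for f in frameshift),
    }


def is_multiply_disabled(translated: str) -> bool:
    """True if the sequence carries more than one disablement of either kind."""
    counts = count_disablements(translated)
    return counts["stops"] + counts["frameshifts"] > 1


print(count_disablements("MKV*LLR#GG*K"))  # {'stops': 2, 'frameshifts': 1}
print(count_disablements("SKSMRRG*"))      # {'stops': 0, 'frameshifts': 0}
```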

23 Large-scale Assignment of Pseudogenes (*Chr 21+22 only)

24 ΨG: Questions 1) What are they composed of? (composition) 2) Where are they? (which organisms, chr. position) 3) Which type of proteins are they? Why? (functions)

25 Worm ΨG: Lots! Good Stats 1) What are they composed of? (composition) 2) Where are they? (which organisms, chr. position) 3) Which type of proteins are they? Why? (functions)

26 Amino-acid composition of pseudogenes is midway between genes and translated intergenic DNA [worm; plot of frequency by amino acid, sorted]
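One way to quantify "midway" is to compute pooled amino-acid frequencies for each sequence set and project the pseudogene composition onto the line from the gene composition to the intergenic composition. The projection is an illustrative measure chosen here, not necessarily the one behind the slide's plot; the sequences are toy data.

```python
from collections import Counter
from typing import Dict, Iterable

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seqs: Iterable[str]) -> Dict[str, float]:
    """Pooled frequency of each of the 20 standard amino acids (other symbols ignored)."""
    counts = Counter(c for s in seqs for c in s.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    return {aa: counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS}


def midway_fraction(genes: Dict[str, float], pseudo: Dict[str, float],
                    intergenic: Dict[str, float]) -> float:
    """Least-squares t in pseudo ≈ genes + t * (intergenic - genes).

    t = 0 means gene-like composition, t = 1 means intergenic-like.
    """
    num = sum((pseudo[a] - genes[a]) * (intergenic[a] - genes[a]) for a in AMINO_ACIDS)
    den = sum((intergenic[a] - genes[a]) ** 2 for a in AMINO_ACIDS)
    return num / den if den else 0.0


# Toy sequences for illustration only.
g = composition(["AAAA"])
i = composition(["LLLL"])
p = composition(["AALL"])
print(midway_fraction(g, p, i))  # 0.5
```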

27 ΨG Distribution on Worm Chromosomes: On Ends [plot: genes and default ΨG distribution; chr. X]

28 ΨG Distribution on Worm Chromosomes: On Ends [plot: genes and observed ΨG distribution; chr. X; chromosomes I, II, III, IV, V]
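The "on ends" observation can be checked by measuring what fraction of features fall near either chromosome end. The sketch below assumes feature start coordinates and a chromosome length in bp; the 10% end window is a hypothetical choice. The value for ΨG can be compared with the value for genes, or with 2 × `end_fraction`, the fraction expected if features were spread uniformly.

```python
from typing import Iterable


def fraction_near_ends(positions: Iterable[int], chrom_length: int,
                       end_fraction: float = 0.1) -> float:
    """Fraction of features lying within end_fraction of either chromosome end."""
    pos = list(positions)
    if not pos:
        return 0.0
    margin = end_fraction * chrom_length
    near = sum(1 for p in pos if p < margin or p > chrom_length - margin)
    return near / len(pos)


# Toy coordinates on a 1000 bp "chromosome", for illustration only.
pseudogenes = [10, 50, 500, 950]
genes = [100, 300, 500, 700]
print(fraction_near_ends(pseudogenes, 1000))  # 0.75
print(fraction_near_ends(genes, 1000))        # 0.0
```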

29 Default: #ΨG ∝ #genes in a family [scatter plot: # pseudogenes in family vs. # genes in family; labeled point RT (28, 59)]
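Under the default model, each family's pseudogene count is proportional to its gene count, so departures can be scored as observed / expected. The sketch below takes the proportionality constant to be the genome-wide pseudogene-to-gene ratio (an assumption). Families are given as hypothetical `(n_genes, n_pseudogenes)` pairs; a score above 1 marks a family with more pseudogenes than its size predicts.

```python
from typing import Dict, Tuple


def family_enrichment(families: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Observed / expected ΨG per family under the default model #ΨG ∝ #genes.

    families maps a family name to (n_genes, n_pseudogenes). Families with no
    living genes have no expectation under this model and are skipped.
    """
    total_genes = sum(g for g, _ in families.values())
    total_pseudo = sum(p for _, p in families.values())
    if total_genes == 0 or total_pseudo == 0:
        return {}
    ratio = total_pseudo / total_genes  # genome-wide ΨG per gene
    return {name: p / (ratio * g) for name, (g, p) in families.items() if g > 0}


# Toy families for illustration only.
families = {"A": (10, 5), "B": (10, 1), "C": (20, 6)}
print(family_enrichment(families))
```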

30 Completely Dead Families in Worm Genome: Extinction? [plot label: 0]

31 Completely Dead Families in Worm Genome: Extinction? Horizontal Transfer? [plot label: 0]

32 Completely Dead Families in Worm Genome: Extinction? Horizontal Transfer? Contamination? [plot label: 0]
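A "completely dead" family has pseudogenes but no living gene left in the genome. Flagging such families is a simple filter, sketched below for hypothetical `(n_genes, n_pseudogenes)` pairs. The filter only finds candidates; it cannot tell extinction, horizontal transfer, and contamination apart.

```python
from typing import Dict, List, Tuple


def dead_families(families: Dict[str, Tuple[int, int]]) -> List[str]:
    """Families with pseudogenes but no living genes; values are (n_genes, n_pseudogenes)."""
    return sorted(
        name
        for name, (n_genes, n_pseudo) in families.items()
        if n_genes == 0 and n_pseudo > 0
    )


# Toy families for illustration only.
print(dead_families({"A": (0, 3), "B": (2, 1), "C": (0, 0)}))  # ['A']
```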

33 Worm ΨG families: chemoreceptors & transposon functions [environmental response families]

34 Worm ΨG families: unique or highly expanded relative to fly [environmental response families]

35 Most Common Worm "Pseudofolds" [table: #1, genes rank]

36 Most Common Worm "Pseudofolds" [table: #1, genes rank]

37 Common Worm Pseudofolds [SCOP, Murzin] [table: ΨG rank, genes rank]

38 Common Worm Pseudofolds [SCOP, Murzin] [table: ΨG rank, genes rank]
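The pseudofold tables set each fold's rank among pseudogenes beside its rank among genes. The sketch below ranks folds by frequency in two hypothetical lists of per-protein fold assignments and reports the rank shift. Alphabetical tie-breaking is an assumption; a positive shift means the fold is relatively more common among ΨG than among genes.

```python
from collections import Counter
from typing import Dict, List


def rank_by_count(fold_assignments: List[str]) -> Dict[str, int]:
    """Rank folds by frequency (1 = most common); ties broken alphabetically."""
    counts = Counter(fold_assignments)
    ordered = sorted(counts, key=lambda f: (-counts[f], f))
    return {fold: i + 1 for i, fold in enumerate(ordered)}


def rank_shift(gene_folds: List[str], pseudo_folds: List[str]) -> Dict[str, int]:
    """Gene rank minus ΨG rank, for folds seen among both genes and pseudogenes."""
    gene_rank = rank_by_count(gene_folds)
    pseudo_rank = rank_by_count(pseudo_folds)
    return {f: gene_rank[f] - pseudo_rank[f] for f in pseudo_rank if f in gene_rank}


# Toy assignments for illustration only.
genes = ["kinase"] * 5 + ["Ig"] * 3 + ["7TM"]
pseudo = ["7TM"] * 4 + ["kinase"]
print(rank_shift(genes, pseudo))  # {'7TM': 2, 'kinase': -1}
```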

39 Yeast ΨG: Simple Story & Mechanism 1) What are they composed of? (composition) 2) Where are they? (which organisms, chr. position) 3) Which type of proteins are they? Why? (functions)

40 Yeast ΨG concentrated near telomeres [chromosome map: pseudogenes, genes]

41 Environmental response functions of yeast ΨG: the 5 most common families in ΨG are not the same as the most common families in genes [environmental response families]

42 Yeast ΨG come from yeast-specific families [bar chart: fraction having a non-yeast homolog, ΨG 40% vs. genes 80%]

43 Resurrecting pseudogenes: is it possible? Hypothetical example of a flocculin. The idea of "untranslatable intermediates" in protein evolution has been around for a while [Nei, '70; Koch, '72] [Walsh]. Functioning FLO8 causes filamentous growth in most strains [Fink]; FLO8 is disabled in the lab strain (S288C).

44 A speculative mechanism for resurrecting yeast ΨG via [PSI+], perhaps in environmental response. [PSI+] [Lindquist]: the prion form of Sup35p, a translation-termination protein; it causes read-through of stop codons and so causes phenotypic diversity through the expression of new or altered proteins [Partridge].

45 We find suggestive evidence that [PSI+] resurrects ΨG: 35 ΨG are easily resurrectable, having only 1 stop codon; microarrays show that some of these are expressed [M Snyder]; many are involved in environmental response. (A speculative mechanism for resurrecting yeast ΨG via [PSI+], perhaps in environmental response.)
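A pseudogene that stop-codon read-through could rescue is one whose reading frame is intact apart from a single premature stop. The sketch below counts in-frame stops in a coding-strand DNA string using the standard-code stop codons TAA, TAG and TGA, ignoring the normal terminal stop. Treating a length that is not a multiple of 3 as a frameshift is a crude, assumed proxy; real analyses would work from an alignment to the parent gene.

```python
from typing import List

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def in_frame_stops(dna: str) -> List[int]:
    """0-based codon indices of premature in-frame stops (terminal stop excluded)."""
    seq = dna.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    stops = [i for i, codon in enumerate(codons) if codon in STOP_CODONS]
    if stops and stops[-1] == len(codons) - 1:
        stops.pop()  # the normal terminal stop is not a disablement
    return stops


def resurrectable_by_readthrough(dna: str) -> bool:
    """True if the frame looks intact and exactly one premature stop is present."""
    return len(dna) % 3 == 0 and len(in_frame_stops(dna)) == 1


# Toy sequences for illustration only.
print(in_frame_stops("ATGAAATAGCCCTAA"))                # [2]
print(resurrectable_by_readthrough("ATGAAATAGCCCTAA"))  # True
print(resurrectable_by_readthrough("ATGTAATGACCCTAA"))  # False
```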

46 Practical Backdrop to Integrated Gene Annotation & Interpretation of DNA Arrays: 137 potential new yeast genes; integrated approach: homology search + transposons + microarrays; small ORFs & antisense to existing ORFs [Snyder] [map of chromosomes I-VIII]

47 Comparing Genomes in terms of Structure: Surveys of a Finite Parts List (outline, as on slide 5)

48 Future Plans: Extension to Human Genome, as part of CEGS Human Arrays Center. NIH CEGS Center on Human DNA Arrays: all of chr22 and further chromosomes on a chip in ~1 kb chunks, probed for expression. A mapped landscape (genes, ΨGs, repeats, SNPs, etc.) is needed to design the chip and interpret the results. Analysis of folds in the human genome allows identification of unique pathogen folds. From the 100 Mb worm genome to the 3 Gb human genome! Large infrastructure.
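Laying a chromosome onto a chip in ~1 kb chunks starts with a simple tiling of coordinates. The sketch below assumes half-open `[start, end)` tiles with the last tile allowed to be shorter; a real design would also consult the mapped landscape (for example, to avoid repeats). The arithmetic shows the jump in scale: a 100 Mb genome needs about 100,000 such tiles, a 3 Gb genome about 3,000,000.

```python
from typing import List, Tuple


def tile(length: int, chunk: int = 1000) -> List[Tuple[int, int]]:
    """Half-open [start, end) tiles of `chunk` bp covering a sequence of `length` bp."""
    if chunk <= 0:
        raise ValueError("chunk must be positive")
    return [(start, min(start + chunk, length)) for start in range(0, length, chunk)]


print(tile(2500))  # [(0, 1000), (1000, 2000), (2000, 2500)]
```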

49 Impact of Keck Funds
- '98: Want to do -omic science → not DIY → large-scale, expensive, requires collaborations (clubby)
- '99: Keck!! → $ → preliminary results → credibility → collaborations
- '00: NESG structural genomics center
- '01: CEGS human arrays center
- '02+: ... stay on budget but scale up!

50 Future Plans: scale up infrastructure to human genome proportions. Informatics core with robust infrastructure for human genome computing (DB admins, techs, servers, programmers) [diagram: CEGS Human Arrays Center; NESG (SG Center); Yale Center for Genomics & Proteomics; Keck flexible funds vs. constrained funds]

51 Comparing Genomes in terms of Structure: Surveys of a Finite Parts List (outline, as on slide 5)

52 Acknowledgements
- Unique Folds in Pathogens: Hedi Hegyi, Jimmy Lin, Dov Greenbaum
- Analysis of Pseudogenes: Paul Harrison, Zhaolei Zhang, Nathaniel Echols, Suganthi Balasubramanian, Nicholas Luscombe, Paul Bertone, Ted Johnson, Patrick McGarvey
- PartsList System: Jiang Qian, Vadim Alexandrov, Werner Krebs, Brad Stenger, Cyrus Wilson, Ning Lan
- Other Projects: Yang Liu, Jochen Junker, Rajdeep Das, Ronald Jansen, Amar Drawid, Yuval Kluger, Haiyuan Yu
- Collaborators: CEGS: M Snyder, S Weissman... (A Kumar, H Zhu, M Bilgin, C Horak …); NESG: G Montelione, C Arrowsmith, A Edwards, L Regan…
- PartsList.org, GeneCensus.org, NESG.org

