JCSG Bioinformatics core overview: 2006

1 JCSG Bioinformatics core overview: 2006

2 BIC - last two years
Organizational and personnel changes
Two sites (UCSD & Burnham)
Six people left, five new people hired
Transformation to a production center
Core tools developed, but still significant tool development
Increasing role of data analysis

3

4 Bioinformatics - convergence of methods, but also challenges
Maximizing production
Data management for high throughput
Covering the universe of proteins with structures
Maximizing impact of structures
Making sense of structures one at a time
Understanding protein universe using structures

5 Bioinformatics core of JCSG - integration within and outside
Integrating data across JCSG
Flow of data connects cores across physical locations, different proteins
“Intuitive crystallography” doesn’t scale up to high throughput, centralized data management does
Growing production, growing challenges, new robot, new databases
Leveraging JCSG experiences and results
CAMERA: developing new generation of biological databases, new horizons in protein universe
JCMM: improving modeling by protein structure analysis
“Experimental bioinformatics” - JCSG structures and bioinformatics function predictions leading biochemistry and biology experiments

6 CAMERA: first look at the ever expanding universe of proteins
New type of genomics
New types of data (and lots of it)
17M new (predicted!) proteins
4-5x growth in just a few months
New challenges of really high throughput genomics
Genomics without genomes - metagenomics and its challenges

7 Joint Center for Molecular Modeling
Newly funded (3/28/06) P20 center in response to NIGMS RFA “High accuracy protein structure modeling”
Burnham/UCSD collaboration
PI - Adam Godzik, coPIs - Pavel Pevzner (UCSD), Yuzhen Ye (Burnham)
Goals: Improve modeling by analysis of existing structures
Methods
New approaches to structure comparison
Evolution of protein structures
Protein is a graph
Comparing graphs has a long history and many tools are available
New ways of evaluating protein models

8 These tools allow us to study entire structural families

9 Multiple structural alignment is actually a graph (POG)
Partial order graphs have been extensively studied in mathematics and have many interesting properties
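To make the graph view concrete, here is a minimal Python sketch of a partial order graph built from several structures. It is an illustration only, not the tool described in the talk: the block labels, the `build_pog` helper, and the toy paths are all hypothetical. Each structure is assumed to be already reduced to the ordered list of aligned blocks it passes through.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def build_pog(paths):
    """Build a partial order graph from per-structure block paths.

    Each path lists, in chain order, the aligned-block labels that one
    structure passes through. Returns {node: set of direct predecessors}.
    """
    preds = {}
    for path in paths:
        for node in path:
            preds.setdefault(node, set())
        # Consecutive blocks in a chain give a "precedes" edge.
        for before, after in zip(path, path[1:]):
            preds[after].add(before)
    return preds


# Hypothetical toy data: three structures share blocks A, C and D;
# block B is an insertion present in only two of them.
paths = [
    ["A", "B", "C", "D"],
    ["A", "C", "D"],
    ["A", "B", "C", "D"],
]

pog = build_pog(paths)

# A topological order exists only if the edges form a partial order;
# TopologicalSorter raises CycleError otherwise (e.g. for a circular
# permutation between two structures).
order = list(TopologicalSorter(pog).static_order())
print(order)
```

In this toy case block C has two predecessors (A and B), which is exactly the branching that a single linear alignment row cannot express.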

10 Using these tools we can identify “microdomains” in proteins
Protein Kinases (SCOP family d )
Structures: d1a06_ d1blxa d1byga d1ckia d1cm8a d1csn_ d1f3mc d1fgka d1fmk_ d1fota d1fvra d1gjoa d1gz8a d1gzka d1h4la
Aligned segments length: 98 aa, Cα-RMSD: 1.8 Å

11 These “microdomains” move independently from each other
Protein Kinases (SCOP family d )
Structures: d1a06_ d1blxa d1byga d1ckia d1cm8a d1csn_ d1f3mc d1fgka d1fmk_ d1fota d1fvra d1gjoa d1gz8a d1gzka d1h4la
Aligned segments length: 33 aa, Cα-RMSD: 1.9 Å
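The Cα-RMSD figures quoted on these two slides measure how far apart equivalent Cα atoms are after superposition. A minimal Python sketch of that measure follows. It assumes the two coordinate sets are already superimposed and listed in the same residue order; the `ca_rmsd` name and the toy coordinates are hypothetical, and the optimal superposition step itself is not shown.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of
    (x, y, z) C-alpha coordinates, assumed already superimposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the paired atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical toy coordinates (in Å): every atom pair is 2 Å apart.
segment_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
segment_b = [(0.0, 0.0, 2.0), (3.8, 0.0, 2.0)]
rmsd = ca_rmsd(segment_a, segment_b)
print(f"{rmsd:.1f}")
```

Because the value is averaged over the aligned positions only, an RMSD is only meaningful together with the aligned length, which is why the slides report both.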

12 Universe of protein structures and PSI goals
Fold Superfamily Family

13 Evolution of folds and structures
[Figure labels: PDB; predicted new superfamilies in known folds; expected new superfamilies in yet-to-be-discovered folds; folds; “new” folds]

14 Nothing in Biology Makes Sense Except in the Light of Evolution
You are here
But most elements of the machinery of life were developed here
JCSG is here
Tree of life from Carl Woese et al.

15 We are built from the same parts!
E. coli – rat oxidoreductase: RMSD of 2.5 on 140 positions, 7% (!!!!) sequence id
E. coli – human ribokinase: RMSD of 2.4 on 300 aa, 18% sequence id
E. coli – mouse ribonucleotide reductase: 2.2/320

16 Some statistics
At least 70% of all human proteins have at least one domain that has homologs in bacteria.
Ribosomal proteins and enzymes involved in central metabolism are well represented, but so are stress response and regulatory proteins (and a lot of domains with unknown functions).

17 Domains of Central Machinery of Life
Present in Eukaryotes Pfam 430 No fold prediction Present in Prokaryotes

18 Distribution of CML targets in different prokaryotes
> but ~

19 CML targets - first results

20 Expanding the scope of target selection
Pfam 1367 No fold prediction Present in Prokaryotes

21 PFAM targets - very first results

22 Next steps - going where no PFAM has gone before
Universe of known proteins Pfam 400

23 The future - how large is the universe of proteins? First GOS results
GOS data (and we know it's just the beginning)
Universe of proteins we know today
Pfam 200

24 Growing structural coverage of T. maritima
Direct structural coverage of 32% of the expressed soluble proteins and ~13% of the proteome (238 unique PDB structures). With homology and fold-recognition models, coverage rises to over 72% (89% of predicted crystallizable non-orphan proteins), one of the highest structural coverages of any organism.

25 Structural coverage of the T. maritima proteome
~73% of feasible targets

26 What is the real impact of PSI - are new folds most important?
TM0875 from T. maritima
new fold
no homologs – an “orphan”
no corresponding Pfam family
Many examples, still working on statistics. In some cases newly solved representatives of major branches made it possible to improve models for thousands of proteins. Low-quality models could be built on known representatives of side branches.
from N. punctiforme
two domains of known folds but no recognizable sequence similarity to known structures
C-terminal domain provides the first structural template for a Pfam family of over 500 sequences (PF00877)

27 Scientific Advisory Board
GNF & TSRI Crystallomics Core: Scott Lesley, Mark Knuth, Dennis Carlton, Marc Deller, Thomas Clayton, Michael DiDonato, Glen Spraggon, Andreas Kreusch, Daniel McMullan, Heath Klock, Polat Abdubek, Eileen Ambing, Joanna C. Hale, Eric Hampton, Eric Koesema, Edward Nigoghossian, Aprilfawn White, Sanjay Agarwalla, Christina Trout, Ylva Elias, Hope Johnson, Jessica Paulsen, Linda Okach, Bernhard Geierstanger, Julie Feuerhelm, Jessica Canseco
Stanford/SSRL Structure Determination Core: Keith Hodgson, Ashley Deacon, Mitchell Miller, Herbert Axelrod, Hsiu-Ju (Jessica) Chiu, Kevin Jin, Christopher Rife, Qingping Xu, Silvya Oommachen, Henry van den Bedem, Scott Talafuse, Ronald Reyes, Abhinav Kumar, Jonathan Caruthers, Chloe Zabieta, Amanda Prado
UCSD & Burnham Bioinformatics Core: John Wooley, Adam Godzik, Slawomir Grzechnik, Lukasz Jaroszewski, Sri Krishna Subramanian, Andrew Morse, Tamara Astakhova, Lian Duan, Piotr Kozbial, Naomi Cotton, Dana Weekes, Lukasz Slabinski, Josie Alaoen
Scientific Advisory Board: Sir Tom Blundell, Univ. Cambridge; Homme Helinga, Duke University Medical Center; James Naismith, The Scottish Structural Proteomics facility, Univ. St. Andrews; James Paulson, Consortium for Functional Glycomics, The Scripps Research Institute; Robert Stroud, Center for Structure of Membrane Proteins, Membrane Protein Expression Center, UC San Francisco; Todd Yeates, UCLA-DOE, Inst. for Genomics and Proteomics; Soichi Wakatsuki, Photon Factory, KEK, Japan; James Wells
TSRI NMR Core: Kurt Wüthrich, Reto Horst, Maggie Johnson, Marcius Almeida, Michael Gerault, Wojtek Augustyniak, Pedro Serrano, Bill Pedrini
TSRI Administrative Core: Ian Wilson, Marc Elsliger, Jason Kay, Gye Won Han, David Marciano
The JCSG is supported by the NIH Protein Structure Initiative grant U54 GM from the National Institute of General Medical Sciences.

