JCSG Bioinformatics core overview: 2006


BIC - the last two years
Organizational and personnel changes
  Two sites (UCSD & Burnham)
  Six people left, five new people hired
Transformation to a production center
  Core tools developed, but still significant tool development
  Increasing role of data analysis

Bioinformatics - convergence of methods, but also challenges
Maximizing production
  Data management for high throughput
  Covering the universe of proteins with structures
Maximizing impact of structures
  Making sense of structures one at a time
  Understanding the protein universe using structures

Bioinformatics core of JCSG - integration within and outside
Integrating data across JCSG
  Flow of data connects cores across physical locations, different proteins
  “Intuitive crystallography” doesn’t scale up to high throughput; centralized data management does
  Growing production, growing challenges, new robot, new databases
Leveraging JCSG experiences and results
  CAMERA: developing a new generation of biological databases, new horizons in the protein universe
  JCMM: improving modeling by protein structure analysis
  “Experimental bioinformatics”: JCSG structures and bioinformatics function predictions leading biochemistry and biology experiments

CAMERA: first look at the ever-expanding universe of proteins
New type of genomics
New types of data (and lots of it)
  17M new (predicted!) proteins
  4-5x growth in just a few months
New challenges of really high-throughput genomics
Genomics without genomes - metagenomics and its challenges

Joint Center for Molecular Modeling
Newly funded (3/28/06) P20 center in response to NIGMS RFA “High accuracy protein structure modeling”
Burnham/UCSD collaboration
PI: Adam Godzik; co-PIs: Pavel Pevzner (UCSD), Yuzhen Ye (Burnham)
Goals: improve modeling by analysis of existing structures
Methods
  New approaches to structure comparison
  Evolution of protein structures
  Protein is a graph
  Comparing graphs has a long history and many tools are available
  New ways of evaluating protein models

These tools allow us to study entire structural families

Multiple structural alignment is actually a graph (POG)
Partial order graphs have been extensively studied in mathematics and have many interesting properties
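The slide does not show how the partial order graph is built, so the following is only a minimal sketch of the general idea, not the JCSG/JCMM implementation. It assumes each structure is given as a path of node labels (the labels `core1`, `loopA`, and so on are hypothetical), where a label shared by several paths stands for structurally equivalent residues. A consistent alignment then yields an acyclic graph, which can be checked with a standard topological sort (Kahn's algorithm).

```python
from collections import defaultdict, deque


def build_pog(paths):
    """Build a directed graph from per-structure paths of node labels.

    Each path lists, in chain order, the alignment node that every
    segment of one structure belongs to. A node shared between paths
    represents structurally equivalent segments.
    """
    nodes = set()
    edges = defaultdict(set)
    for path in paths:
        nodes.update(path)
        for a, b in zip(path, path[1:]):
            edges[a].add(b)
    return nodes, edges


def topological_order(nodes, edges):
    """Kahn's algorithm; returns None if the graph contains a cycle."""
    indegree = {n: 0 for n in nodes}
    for a in list(edges):
        for b in edges[a]:
            indegree[b] += 1
    queue = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for m in sorted(edges.get(n, ())):
            indegree[m] -= 1
            if indegree[m] == 0:
                queue.append(m)
    return order if len(order) == len(nodes) else None


# Hypothetical toy alignment: three structures share two core segments,
# two of them have different loops in between, one has no loop at all.
paths = [
    ["core1", "loopA", "core2"],
    ["core1", "loopB", "core2"],
    ["core1", "core2"],
]
nodes, edges = build_pog(paths)
order = topological_order(nodes, edges)
```

In this toy case `loopA` and `loopB` are not ordered relative to each other, which is what makes the order partial rather than linear. An alignment that places two segments in opposite orders in different structures produces a cycle, and `topological_order` returns `None`.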

Using these tools we can identify “microdomains” in proteins
d1a06_ d1blxa d1byga d1ckia d1cm8a d1csn_ d1f3mc d1fgka d1fmk_ d1fota d1fvra d1gjoa d1gz8a d1gzka d1h4la
Protein Kinases (SCOP family d.144.1.7)
Aligned segments length: 98 aa, Cα-RMSD: 1.8 Å

These “microdomains” move independently from each other
d1a06_ d1blxa d1byga d1ckia d1cm8a d1csn_ d1f3mc d1fgka d1fmk_ d1fota d1fvra d1gjoa d1gz8a d1gzka d1h4la
Protein Kinases (SCOP family d.144.1.7)
Aligned segments length: 33 aa, Cα-RMSD: 1.9 Å
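The Cα-RMSD values quoted on the two microdomain slides are root-mean-square deviations between equivalent Cα atoms of aligned segments. As a rough sketch of that calculation, the function below assumes the two coordinate sets are already superimposed and listed in matching order; the optimal superposition step is not shown, and the coordinates in the example are made up for illustration.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) C-alpha coordinates.

    Assumes the structures are already superimposed and that position i
    in one list is the structural equivalent of position i in the other.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical coordinates in angstroms: every atom is displaced by 2 A along z.
segment_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
segment_b = [(0.0, 0.0, 2.0), (3.8, 0.0, 2.0)]
rmsd = ca_rmsd(segment_a, segment_b)
```

Because the RMSD is computed per aligned segment, a low value for each microdomain taken separately is compatible with a much larger deviation when both are superimposed at once, which is one way to read the "move independently" statement.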

Universe of protein structures and PSI goals
[Figure labels: Fold, Superfamily, Family]

Evolution of folds and structures
[Figure labels: PDB; folds; “new” folds; expected new superfamilies in yet-to-be-discovered folds; predicted new superfamilies in known folds]

Nothing in Biology Makes Sense Except in the Light of Evolution
You are here
But most elements of the machinery of life were developed here
JCSG is here
Tree of life from Carl Woese et al.

We are built from the same parts!
E. coli – rat oxidoreductase: RMSD of 2.5 Å on 140 positions, 7% (!!!!) sequence identity
E. coli – human ribokinase: RMSD of 2.4 Å on 300 aa, 18% sequence identity
E. coli – mouse ribonucleotide reductase: RMSD of 2.2 Å on 320 aa
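The sequence identity figures on this slide are percentages of identical residues among aligned positions. The slide does not say which denominator was used, so the sketch below assumes one common convention: count only columns where neither sequence has a gap. The two aligned strings are invented for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned (non-gap) columns of a pairwise alignment.

    Both inputs must be rows of the same alignment, so they have equal
    length. Columns with a gap in either row are ignored (an assumption;
    other conventions divide by the shorter or longer sequence length).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    aligned = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        aligned += 1
        if a == b:
            identical += 1
    if aligned == 0:
        raise ValueError("no aligned positions")
    return 100.0 * identical / aligned


# Hypothetical alignment: 5 aligned columns, 3 of them identical.
pid = percent_identity("MKV-LA", "MRVQLG")
```

Under the same convention, 7% identity on 140 aligned positions corresponds to roughly 10 identical residues.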

Some statistics
At least 70% of all human proteins have at least one domain that has homologs in bacteria.
Ribosomal proteins and enzymes involved in central metabolism are well represented, but so are stress response and regulatory proteins (and a lot of domains with unknown functions).

Domains of Central Machinery of Life
[Figure labels: Pfam; Present in Eukaryotes; Present in Prokaryotes; No fold prediction; 430]

Distribution of CML targets in different prokaryotes > but ~

CML targets - first results

Expanding the scope of target selection
[Figure labels: Pfam; Present in Prokaryotes; No fold prediction; 1367]

PFAM targets - very first results

Next steps - going where no PFAM has gone before
[Figure labels: Universe of known proteins; Pfam; 400]

The future - how large is the universe of proteins?
First GOS results
[Figure labels: GOS data (and we know it's just the beginning); Universe of proteins we know today; Pfam; 200]

Growing structural coverage of T. maritima
Direct structural coverage of 32% of the expressed soluble proteins and ~13% of the proteome (238 unique PDB structures).
With homology and fold recognition models, coverage is over 72% (89% of predicted crystallizable non-orphan proteins), one of the highest structural coverages of any organism.

Structural coverage of the T. maritima proteome
~73% of feasible targets

What is the real impact of PSI - are new folds most important?
TM0875 from T. maritima
  new fold
  no homologs, an “orphan”
  no corresponding Pfam family
53686717 from N. punctiforme
  two domains of known folds but no recognizable sequence similarity to known structures
  C-terminal domain provides the first structural template for a Pfam family of over 500 sequences (PF00877)
Many examples, still working on statistics. In some cases newly solved representatives of major branches allowed models to be improved for thousands of proteins, where previously only low-quality models could be built on known representatives of side branches.

GNF & TSRI Crystallomics Core: Scott Lesley, Mark Knuth, Dennis Carlton, Marc Deller, Thomas Clayton, Michael DiDonato, Glen Spraggon, Andreas Kreusch, Daniel McMullan, Heath Klock, Polat Abdubek, Eileen Ambing, Joanna C. Hale, Eric Hampton, Eric Koesema, Edward Nigoghossian, Aprilfawn White, Sanjay Agarwalla, Christina Trout, Ylva Elias, Hope Johnson, Jessica Paulsen, Linda Okach, Bernhard Geierstanger, Julie Feuerhelm, Jessica Canseco
Stanford/SSRL Structure Determination Core: Keith Hodgson, Ashley Deacon, Mitchell Miller, Herbert Axelrod, Hsiu-Ju (Jessica) Chiu, Kevin Jin, Christopher Rife, Qingping Xu, Silvya Oommachen, Henry van den Bedem, Scott Talafuse, Ronald Reyes, Abhinav Kumar, Jonathan Caruthers, Chloe Zabieta, Amanda Prado
UCSD & Burnham Bioinformatics Core: John Wooley, Adam Godzik, Slawomir Grzechnik, Lukasz Jaroszewski, Sri Krishna Subramanian, Andrew Morse, Tamara Astakhova, Lian Duan, Piotr Kozbial, Naomi Cotton, Dana Weekes, Lukasz Slabinski, Josie Alaoen
Scientific Advisory Board:
  Sir Tom Blundell, Univ. Cambridge
  Homme Hellinga, Duke University Medical Center
  James Naismith, The Scottish Structural Proteomics Facility, Univ. St. Andrews
  James Paulson, Consortium for Functional Glycomics, The Scripps Research Institute
  Robert Stroud, Center for Structure of Membrane Proteins, Membrane Protein Expression Center, UC San Francisco
  Todd Yeates, UCLA-DOE Inst. for Genomics and Proteomics
  Soichi Wakatsuki, Photon Factory, KEK, Japan
  James Wells
TSRI NMR Core: Kurt Wüthrich, Reto Horst, Maggie Johnson, Marcius Almeida, Michael Gerault, Wojtek Augustyniak, Pedro Serrano, Bill Pedrini
TSRI Administrative Core: Ian Wilson, Marc Elsliger, Jason Kay, Gye Won Han, David Marciano
The JCSG is supported by the NIH Protein Structure Initiative grant U54 GM074898 from the National Institute of General Medical Sciences (www.nigms.nih.gov).