Comparative Analysis of Novel Proteins from the CATH Family of Zinc Peptidases

Debanu Das 1,2, Abhinav Kumar 1,2, Lukasz Jaroszewski 1,3 and Ashley Deacon 1,2
1 Joint Center for Structural Genomics, 2 Stanford Synchrotron Radiation Laboratory, Menlo Park, CA 94025, 3 Burnham Institute, La Jolla, CA

Acknowledgements

UCSD & Burnham (Bioinformatics Core): John Wooley, Adam Godzik, Lukasz Jaroszewski, Slawomir Grzechnik, Lian Duan, Sri Krishna Subramanian, Natasha Sefcovic, Piotr Kozbial, Andrew Morse, Prasad Burra, Tamara Astakhova, Josie Alaoen, Cindy Cook, Dana Weekes

TSRI (NMR Core): Kurt Wüthrich, Reto Horst, Maggie Johnson, Amaranth Chatterjee, Michael Geralt, Wojtek Augustyniak, Pedro Serrano, Bill Pedrini, William Placzek

Stanford/SSRL (Structure Determination Core): Keith Hodgson, Ashley Deacon, Mitchell Miller, Debanu Das, Hsiu-Ju (Jessica) Chiu, Kevin Jin, Christopher Rife, Qingping Xu, Silvya Oommachen, Scott Talafuse, Henry van den Bedem, Ronald Reyes, Christine Trame

GNF & TSRI (Crystallomics Core): Scott Lesley, Mark Knuth, Heath Klock, Dennis Carlton, Thomas Clayton, Kevin D. Murphy, Marc Deller, Daniel McMullan, Christina Trout, Polat Abdubek, Claire Acosta, Linda M. Columbus, Julie Feuerhelm, Joanna C. Hale, Thamara Janaratne, Hope Johnson, Linda Okach, Edward Nigoghossian, Sebastian Sudek, Aprilfawn White, Bernhard Geierstanger, Glen Spraggon, Ylva Elias, Sanjay Agarwalla, Charlene Cho, Bi-Ying Yeh, Anna Grzechnik, Jessica Canseco, Mimmi Brown

TSRI (Admin Core): Ian Wilson, Marc Elsliger, Gye Won Han, David Marciano, Henry Tien, Xiaoping Dai, Lisa van Veen

Scientific Advisory Board:
Sir Tom Blundell, Univ. Cambridge
Robert Stroud, Center for Structure of Membrane Proteins and Membrane Protein Expression Center, UC San Francisco
Homme Hellinga, Duke University Medical Center
James Naismith, The Scottish Structural Proteomics Facility, Univ. St. Andrews
James Paulson, Consortium for Functional Glycomics, The Scripps Research Institute
Soichi Wakatsuki, Photon Factory, KEK, Japan
Todd Yeates, UCLA-DOE Inst. for Genomics and Proteomics
James Wells, UC San Francisco

[Photo: annual meeting with SAB, 2007]

The JCSG is supported by the NIH Protein Structure Initiative (PSI) Grant U54 GM from NIGMS. Portions of this research were carried out at the Stanford Synchrotron Radiation Laboratory (SSRL). The SSRL is a national user facility operated by Stanford University on behalf of the U.S. Department of Energy, Office of Basic Energy Sciences. The SSRL Structural Molecular Biology Program is supported by the Department of Energy, Office of Biological and Environmental Research, and by the NIH.

I. Introduction

In March 2007, the JCSG initiated a study of the CATH homologous superfamily of zinc peptidases ( ), which belong to the phosphorylase/hydrolase-like fold in SCOP. These proteins show significant sequence divergence and have a broad phylogenetic distribution across both prokaryotes and eukaryotes. At the time, despite 40 unique experimentally determined structures in the PDB, only half of the family members had reliable homology models. JCSG is improving the structural coverage by determining novel structures which share <30% sequence identity with those in the PDB.

Hidden Markov Models from the CATH database were used to identify sequences in the JCSG genome pool. PSI-Blast seeded with these sequences was used to find additional proteins. These two sets yielded 226 unique targets. After removing targets with more than 30% sequence identity to any PDB structure or to any crystallized target from a structural genomics center, 161 targets remained. Further clustering at 90% sequence identity, to avoid nearly identical sequences, produced a set of 137 targets. To date, JCSG has contributed 6 new structures to the family and 7 other targets have been crystallized.

We present our progress towards complete structural coverage of this family, highlighting common and variant structural features that support different molecular and cellular roles, focusing on active site residues, ligand binding, protein size and oligomerization state. This analysis may provide insights into structural themes that dictate protein function and also allows modeling of protein structures related by sequence. Our structures serve as a nucleation point for the design of further structure-based experiments to probe the biochemical and biomedical roles of these proteins.

II. Background and Significance

CATH proteins are distributed across 8 Pfam families, which form the large peptidase_MH clan (CL0035). The clan is also known in the MEROPS peptidase database as clan MH/MC/MF of metallopeptidases. These proteins are involved in a variety of proteolytic activities, have a range of substrate specificities and are present in numerous microbial organisms, many of which are important human pathogens, such as S. aureus, S. typhimurium, T. vaginalis, M. tuberculosis, N. gonorrhoeae, N. meningitidis, C. trachomatis, G. intestinalis and E. coli. Several of these proteins have been investigated for their therapeutic potential and roles in disease (Canavan's disease, cancer therapy and prohormone/propeptide processing).

Pfam families in the clan (family: description; number of proteins; solved structures):
PF04952: Succinylglutamate desuccinylase / Aspartoacylase family (AstE-AspA); 458 proteins; 1 JCSG structure, 5 all other SG
PF02127: Aminopeptidase I zinc metalloprotease M[...]; [...] all other SG
PF01546: Peptidase family M20/M25/M[...]; [...] JCSG structures, 7 all other SG, 6 non-SG
PF00246: Zinc carboxypeptidase M[...]; [...] JCSG structures, 10 non-SG
PF04389: Peptidase family M[...]; [...] non-SG
PF00883: Cytosol aminopeptidase family, catalytic domain; 827 proteins; 1 all other SG, 1 non-SG
PF05343: M42 glutamyl aminopeptidase; 427 proteins; 1 JCSG structure, 1 all other SG, 1 non-SG
PF05450: Nicastrin (eukaryotic, not known to be a peptidase, part of the γ-secretase complex, no structures); 48 proteins; no structures

III. General structure and biochemistry

These metallopeptidases show a high degree of structural conservation in the CATH domain, which has an α/β/α sandwich architecture. The active site usually comprises histidines and carboxylates interacting with two zinc ions. Despite the variety of molecular functions and substrate specificities of these proteins, catalysis most likely involves a hydroxyl ion ligand in a nucleophilic attack. The full proteins often oligomerize and display some differences in their oligomerization state; however, the exact role of the oligomer in the molecular function is still unclear. In some cases, dimer formation results in assembly of a productive catalytic site. Dimerization is usually mediated by a dimerization domain. Higher oligomeric forms such as tetramers or octamers are also observed for some proteins.

[Figure: representative CATH structure]

IV. Progress of structure determination

[Figure: distribution of selected targets across Pfam families. Series: targets assigned in PfamA; targets unassigned in PfamA *]
[Figure: current status of 137 targets. All targets selected in March]

* Pfam assigned based on sequence homology detected with FFAS. There are 3 targets not assigned by PfamA or FFAS.
** 7 targets indicated show a significant FFAS match to both PF04389 and PF05450, and could possibly be distant bacterial homologs of the exclusively eukaryotic nicastrin family (PF05450).

V. Structures solved by JCSG

2RB7.pdb (HP1666A), 1.6 Å, R/Rfree = 15.4/18.0%. Unknown function, PF[...] close homologs from important human pathogens. Potential in cancer therapy.
2QYV.pdb (HP9625C), 2.11 Å, R/Rfree = 22.0/24.4%. Putative Xaa-His dipeptidase, PF01546, Zn2+ bound. 7 close homologs from important human pathogens.
2FVG.pdb (TM1049), 2.01 Å, R/Rfree = 20.3/24.4%. Endoglucanase, PF[...] close homologs from important human pathogens.
2QVP.pdb (HP10645A), 2.0 Å, R/Rfree = 16.1/21.3%. Unknown function, PF04952. Structure suggests the target may be closer in homology to the PF00246 family.
3B2Y.pdb (HP10645E), 1.74 Å, R/Rfree = 17.45/21.51%. Unknown function, PF04952, Ni2+ bound. Structure suggests the target may be closer in homology to the PF00246 family.
2QJ8.pdb (HP10622H), 2.0 Å, R/Rfree = 20.7/25.4%. Unknown function, PF04952. Homolog involved in Canavan's disease.

VI. Phylogenetic tree and structure tree

Sequences with >30% identity within a particular Pfam also cluster together in structure space.

VII. Suggestion of PfamA assignment based on structure

The HP10645A (2QVP) and HP10645E (3B2Y) sequences are assigned to PF04952 in PfamB. However, structural comparisons of the CATH domain show a stronger similarity to a member of PF00246 (1QMU, left) than to a member of PF04952 (2QJ8, center). This is also supported by the structure and phylogenetic trees and by FFAS. Also, like 1QMU, HP10645A/E lacks an ~70 amino acid insertion that forms a "C-terminal domain" (right, black circle), which is present in all PF04952 proteins and is important for biochemical function. These two pieces of evidence support assigning HP10645A/E to PF00246 in PfamA.

Common core of 226 aa, RMSD 2.45 Å
Common core of 191 aa, RMSD 2.49 Å

VIII. Comparison of two proteins with >30% sequence identity within the same Pfam

PF01546: 1CG2, 2RB7
1CG2: cleaves the C-terminal glutamate moiety from folic acid and its analogues, such as methotrexate
2RB7: unknown function, JCSG
Common core ~290 aa, RMSD ~3.0 Å

The target was selected based on 30% sequence identity over the full-length protein. Despite the similarity in the CATH domain, more diversity is observed in the dimerization domain. For structures that cluster together at the 30% level, structural conservation in the common core is the highest. Generally only slight rearrangement of secondary structural elements is observed within the domain.

IX. Proteins with <30% sequence identity within the same Pfam

PF01546: 2RB7, 2QYV (green)
Common core ~250 aa, RMSD ~3.0 Å
Common core ~190 aa, RMSD ~3.0 Å
PF04952: 2QJ8, 3B2Y (cyan)

Larger rearrangements and extensions of secondary structural elements are observed. Inserts and novel features are more common.

X. Active site study may lead to structural basis of substrate specificity

2RB7 (cyan) and 1CG2, PF[...]

Proteins in this Pfam with solved structures and >30% sequence identity with one another have functions that include succinyl-diaminopimelate desuccinylase activity; carboxypeptidase G2, which cleaves the C-terminal glutamate moiety from folic acid and its analogues, such as methotrexate; N-acetyl-L-citrulline deacetylase; and peptidase T tripeptidase.

The active site in 1CG2 is H112, D141, E200, E176, H385. Based on this, the putative active site in 2RB7 is H72, D99, D100, E138, E139, D162.

[Figure: hydrolysis of methotrexate by 1CG2]
[Figure: active site in 2RB7]

Based on this information, it would now be possible to perform targeted biochemical assays to determine the substrate for 2RB7, to try to understand the structural basis for substrate selection and specificity, and to exploit this information for its therapeutic potential. For example, can 2RB7 hydrolyse methotrexate? Can it do so more efficiently? Can active site engineering based on structural information produce a more potent enzyme?

XI. Elucidation of a unique oligomeric form

The 2QYV (PepD, MEROPS M20.007, clan MH, subfamily C) monomer is very similar in structure to the 1LFW monomer (PepV, MEROPS M20.004, subfamily A). Both are dipeptidases belonging to PF[...]. However, 1LFW is known to function as a monomer in which the molecular structure mimics that of a dimer seen in most other proteins in this Pfam. PepD in E. coli and Prevotella albensis are seen to function as dimers. 2QYV is the first crystal structure of a PepD. It is dimeric in the crystal (monomers in magenta and gold) as well as by size exclusion chromatography, and it shows the structural nature of the dimer. This novel structure serves as a starting point for further experiments to probe the effect of this unique dimer formation on protein function.

XII. Inferences and further work

In the quest for increasing structural coverage across protein families, it is expected that proteins similar in sequence within a protein family will be similar in structure. Increasing structural coverage provides better templates for modeling other proteins. The comparative structural analysis presented here provides experimental verification of the validity of this approach.

The structures of the proteins HP10645A and HP10645E suggest that they should be assigned to PF00246 in PfamA rather than to PF04952, as currently suggested by PfamB.

The 7 structures presented here provide a basis for enhancing the modeling of 2177 out of 7591 proteins (~29%) belonging to this Pfam clan. Furthermore, 3 of these JCSG structures (2QYV, 2QJ8 and 3B2Y) are the first structures for proteins within their particular sequence clusters and thus provide the basis for modeling 384 unique proteins (10 from organisms listed as top human pathogens) belonging to these 3 clusters from 2 different Pfams (PF01546 and PF04952).

2QYV/HP9625C represents the first crystal structure of a dipeptidase PepD showing a dimer.

Further analysis will be performed to try to understand the evolutionary relationships between these proteins, based on sequence-based phylogenetic trees and structure-based trees. Attempts will be made to use these structures and their comparative analyses to understand the structural basis for enzyme function and substrate specificity through analysis of active site amino acids, and to exploit this information for therapeutic purposes.
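The counts quoted in the Introduction and in section XII can be cross-checked with a few lines of arithmetic. The following is a minimal Python sketch that uses only numbers stated on the poster. The variable and function names are illustrative and are not part of any JCSG pipeline.

```python
# Counts taken directly from the poster text.
INITIAL_TARGETS = 226        # CATH HMM hits plus PSI-Blast hits
AFTER_IDENTITY_FILTER = 161  # after removing targets >30% identical to a PDB or crystallized SG entry
AFTER_CLUSTERING = 137       # after clustering at 90% identity
MODELABLE_PROTEINS = 2177    # proteins whose modeling the 7 structures can support
CLAN_PROTEINS = 7591         # proteins in the Pfam clan


def percent(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


# Targets dropped at each step of the selection funnel.
removed_by_identity_filter = INITIAL_TARGETS - AFTER_IDENTITY_FILTER
removed_by_clustering = AFTER_IDENTITY_FILTER - AFTER_CLUSTERING

# Share of the clan covered by the new structures.
coverage = percent(MODELABLE_PROTEINS, CLAN_PROTEINS)

print(removed_by_identity_filter)  # 65
print(removed_by_clustering)       # 24
print(f"{coverage:.1f}%")          # 28.7%
```

The computed coverage of 28.7% agrees with the rounded figure of ~29% given in section XII.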
HP10625B, 2.3 Å, work in progress. PF[...] close homologs from important human pathogens. Potential in cancer therapy.

[Figure: superimposition of all 6 structures in PF04952: 1YW4, 1YW6, 2BCO, 2G9D, 2GU2 and 2QJ8]
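The "common core, RMSD" values quoted for the pairwise comparisons are root-mean-square deviations over equivalent atoms in superimposed structures. The following is a minimal Python sketch of that final step only. It assumes the two coordinate lists are already superimposed and paired atom by atom; the alignment and superposition steps are omitted. The function name `rmsd` is illustrative, and the toy coordinates are invented for the example, not taken from any of the structures above.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) tuples, in the input units."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    if not coords_a:
        raise ValueError("coordinate lists must not be empty")
    # Sum of squared distances between paired atoms.
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Toy example: three paired atoms, each displaced by 1 Å along z.
core_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
core_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
print(rmsd(core_a, core_b))  # 1.0
```

Because the value depends on which residues are counted as equivalent, an RMSD is only meaningful alongside the size of the common core, which is why the poster reports the two together.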