Sequence Similarity Analysis Often Misses Evolutionary Relationships Which Can Be Detected by Combined Analysis of 3D Structure and Sequence

Presentation transcript:

Sequence Similarity Analysis Often Misses Evolutionary Relationships Which Can Be Detected by Combined Analysis of 3D Structure and Sequence
[Figure: axes are residues aligned and % sequence identity; points are marked homologous or non-homologous, with homologous relationships established by both 3D structure and sequence]
Adapted from work by Sanders and co-workers
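The two quantities plotted on this slide, residues aligned and % sequence identity, can be computed from any gapped pairwise alignment. The following is a minimal sketch, assuming the alignment is supplied as two equal-length strings with "-" for gaps; the function name `alignment_stats` and the choice to count identity only over gap-free columns are illustrative assumptions, not taken from the slides.

```python
def alignment_stats(aligned_a: str, aligned_b: str, gap: str = "-") -> tuple[int, float]:
    """Return (residues aligned, percent identity) for two gapped sequences.

    Assumption: identity is measured over columns where neither sequence
    has a gap. Other conventions (e.g. dividing by the shorter sequence
    length) give different percentages.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    aligned = 0    # columns with a residue in both sequences
    identical = 0  # aligned columns where the residues match
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        aligned += 1
        if a.upper() == b.upper():
            identical += 1

    percent = 100.0 * identical / aligned if aligned else 0.0
    return aligned, percent


if __name__ == "__main__":
    n_aligned, pct = alignment_stats("ACDE-FG", "ACDQKFG")
    print(n_aligned, round(pct, 1))
```

A short alignment with high identity and a long alignment with low identity can both arise by chance or by descent, which is why the slide plots identity against the number of residues aligned rather than identity alone.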

Structure-based Functional Genomics
Structure can often provide valuable clues to biochemical and biophysical aspects of protein function

Biological Functions of Genes and Proteins
Genetic Function / Phenotype
Cellular Function
Biochemical Function
Detailed Atomic Mechanism

An Important Approach to the Protein Folding Problem is to Characterize the “Natural Language of Proteins”
Representative 3D Structure from Each of Several Thousand Sequence Families of Domains

National Institutes of Health Protein Structure Initiative (PSI)
Long-Range Goal: To make the three-dimensional atomic-level structures of most proteins easily available from knowledge of their corresponding DNA sequences
J. Norvell

Expected PSI Benefits
Structure provides information on function and will aid in the design of experiments
Development of better therapeutic targets from comparisons of protein structures from:
– Pathogens vs. hosts
– Diseased vs. normal tissues
J. Norvell

PSI Benefits (cont'd)
Collection of structures will address key biochemical and biophysical problems
– Protein folding, prediction, folds, evolution, etc.
Benefits to biologists
– Technology developments
– Structural biology facilities
– Availability of reagents and materials
– Experimental outcome data on protein production and crystallization
J. Norvell

PSI Pilot Phase
5-year pilot phase, September 2000
Pilot phase goals
– Development of a high-throughput structural genomics pipeline to produce unique, non-redundant protein structures
– Pilots for testing all facets and strategies of structural genomics
PSI target selection policy
– Representatives of protein sequence families
– Public release of all targets, progress, results, and structures
J. Norvell

PSI Pilot Research Centers
Seven research centers funded in FY2000
Two additional research centers funded in FY2001
Co-funding by NIAID for two of the nine research centers
Many subprojects
J. Norvell

PSI Pilot Phase: Lessons Learned
Structural genomics pipelines can be constructed and scaled up
High-throughput operation works for many proteins
Genomic approach works for structures
Bottlenecks remain for some proteins
A coordinated, 5-year target selection policy must be developed
Homology modeling methods need improvement
J. Norvell

Northeast Structural Genomics Consortium: A SG Research Network
Bioinformatics: Barry Honig, Columbia University; Mark Gerstein, Yale University; Sharon Goldsmith, Columbia University; Chern Goh, Yale University; Igor Jurisica, Ontario Cancer Inst.; Andrew Laine, Columbia University; Jessica Lau, Rutgers University; Jinfeng Liu, Columbia University; Diana Murray, Cornell Medical School; Burkhard Rost, Columbia University; Mike Wilson, Yale University
X-ray Crystallography: Wayne Hendrickson, Columbia University; Peter Allen, Columbia University; George DeTitta, Hauptman-Woodward; John Hunt, Columbia University; Rich Karlin, Columbia University; Joe Luft, Hauptman-Woodward; Alex Kuzin, Columbia University; Phil Manor, Columbia University; Liang Tong, Columbia University; Kalyan Das, Rutgers University
Protein Production / Biophysics: Gaetano Montelione, Rutgers University; Thomas Acton, Rutgers University; Stephen Anderson, Rutgers University; Cheryl Arrowsmith, Ontario Cancer Inst.; YiWen Chiang, Rutgers University; Natasha Dennisova, Rutgers University; Masayori Inouye, RWJMS - UMDNJ; Lichung Ma, Rutgers University; Rong Xiao, Rutgers University; Adlinda Yee, Ontario Cancer Inst.
Protein NMR: Thomas Szyperski, SUNY Buffalo; James Aramani, Rutgers University; Cheryl Arrowsmith, Ontario Cancer Inst.; John Cort, Pacific Northwest Natl Labs; Michael Kennedy, Pacific Northwest Natl Labs; Gaouhua Liu, SUNY Buffalo; Theresa Ramelot, Pacific Northwest Natl Labs; Janet Huang, Rutgers University; Gaetano Montelione, Rutgers University; GVT Swapna, Rutgers University; Bin Wu, Ontario Cancer Inst.

Goals of the NESG Consortium
Short Term: Develop a Scalable Platform for Structural and Functional Proteomics of Prokaryotic and Eukaryotic Proteins
Long Term: Characterize the repertoire of eukaryotic protein structural domain families

The NESG Publication Network
PubNet
Douglas, Montelione, Gerstein, Bioinformatics, 2005, in press

Target Selection Strategy

Target Selection for Structural Proteomics
How many protein families can we identify in the genomes with/without structural representatives?
Which families should we target to maximise the structural coverage of the genomes?
Can we select families to optimise function coverage?
C. Orengo, Snowbird, UT

Rost Clusters: Structural Genomics Targets
Protein domain families / clusters
Full-length proteins < 340 amino acids
No member > 30% identity to PDB structures
No regions of low complexity
Not predicted to be membrane associated
~20,000 “NESG Clusters”
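The selection criteria on this slide can be read as a filter over candidate clusters. The following is a rough sketch only, assuming every criterion is applied to each member of a cluster and that the per-protein annotations (best identity to a PDB entry, low-complexity flag, membrane prediction) have already been computed by other tools. The `Member` record and its field names are hypothetical and do not come from the NESG pipeline.

```python
from dataclasses import dataclass

MAX_LENGTH = 340          # slide: full-length proteins < 340 amino acids
MAX_PDB_IDENTITY = 30.0   # slide: no member > 30% identity to PDB structures


@dataclass
class Member:
    """Hypothetical per-protein annotations for one cluster member."""
    name: str
    length: int                # number of amino acids
    best_pdb_identity: float   # highest % identity to any PDB structure
    has_low_complexity: bool   # any low-complexity region detected
    predicted_membrane: bool   # predicted to be membrane associated


def member_passes(m: Member) -> bool:
    """Apply the four slide criteria to a single protein."""
    return (
        m.length < MAX_LENGTH
        and m.best_pdb_identity <= MAX_PDB_IDENTITY
        and not m.has_low_complexity
        and not m.predicted_membrane
    )


def cluster_is_target(members: list[Member]) -> bool:
    """A cluster qualifies only if it is non-empty and every member passes."""
    return bool(members) and all(member_passes(m) for m in members)


if __name__ == "__main__":
    cluster = [
        Member("protA", 120, 12.0, False, False),
        Member("protB", 300, 29.9, False, False),
    ]
    print(cluster_is_target(cluster))
```

Requiring every member to pass is the strictest reading of "no member > 30% identity to PDB structures"; a pipeline could equally apply the length and membrane filters to a single representative per cluster.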

NESG Domain Clusters
Protein domain families / clusters
Full-length proteins < 340 amino acids
No member > 30% identity to PDB structures
No regions of low complexity
Not predicted to be membrane associated
Organisms: Aeropyrum pernix, Aquifex aeolicus, Arabidopsis thaliana, Archaeoglobus fulgidus, Bacillus subtilis, Brucella melitensis, Caenorhabditis elegans, Campylobacter jejuni, Caulobacter crescentus, Deinococcus radiodurans, Drosophila melanogaster, Escherichia coli, Fusobacterium nucleatum, Haemophilus influenzae, Helicobacter pylori, Homo sapiens, Human cytomegalovirus, Lactococcus lactis, M. thermoautotrophicum, Neisseria meningitidis, Other, Pyrococcus furiosus, Pyrococcus horikoshii, Saccharomyces cerevisiae, Staphylococcus aureus, Streptococcus pyogenes, Streptomyces coelicolor, Thermoplasma acidophilum, Thermotoga maritima, Thermus thermophilus, Vibrio cholerae
1 Euka : 2 Proka
Cloned / Expressed > 1000 Human Proteins
WR41 ET8
Liu, Hegi, Acton, Montelione, & Rost, PROTEINS
Wunderlich et al., PROTEINS
Acton et al., Meths Enzymol, in press

Protein Structure Production

Primer Prímer Program
Everett, Acton, & Montelione, J Struct Funct Genomics.

Auto-Steps with the Biorobot 8000
DNA mini-preps
PCR reaction set-up (96-well)
PCR purification
Restriction digest
Qiaquick purification
Ligation
Transformation
Colony PCR
Cycle sequencing
Big Dye removal

96-Well Expression
Overnight culture
24-well blocks, 2 ml of MJ9
Transfer ~200 µl of overnight culture to appropriate well

HR969 HSQC and HetNOE Screening
Amenability to Structural Determination by NMR Is Determined on NiNTA-Purified Samples

Critical NMR Observation From SPiNE
Some 30% of full-length, expressed, soluble eukaryotic proteins from the Rost Clusters produced in E. coli by NESG are DISORDERED based on heteronuclear 1H-15N NOE data
It may not be possible to determine 3D structures of a large portion of the Rost domain families in isolation!

Sample Optimization - Buffer Screening
Microdialysis Buttons - Optimization for NMR
Vary buffer conditions - stability
Screen for ppt.
100 mM Arginine
Small sample mass (50 µg/button)
Bagby S, Tong KI, Liu D, Alattia JR, Ikura M, J Biomol NMR.

Monodisperse Conditions
Aggregation Screening - Crystallization
Analytical Gel Filtration with Light Scattering
Proterion - 96 Well
Less Sample, More Conditions
[Figure traces: LS, RI]
Philip Manor, Roland Satterwhite and John Hunt

ÄKTAxpress™
4 modules in parallel, 16 samples
[Figure labels: 5 hours, 12 hours; AC-GF, AC, AC/GF]
Affinity Chromatography (AC): HiTrap™ Chelating HP, 1 and 5 ml
Gel Filtration (GF): HiLoad 16/60 Superdex 200 pg

Solubility / 2004 Stats
[Figure panels: 2004 Production; Solubility vs Organism; 2004 HR Success]
* defined as greater than 60% soluble by SDS-PAGE analysis
Many HR (Human) proteins in advanced stages of NMR
3 HR crystal structures
T. Acton et al

Internet-based Data Management

NESG PROGRESS SUMMARY Jan 1, 2005
Intrinsically Disordered Proteins
Full-length Proteins Produced in E. coli
Organism      % Unfolded
E. coli       8%
yeast         18%
fly / worm    25%
human         35%

Phylogenetic Distribution of 160 NESG Structures
Most (>95%) completed NESG structures are members of eukaryotic protein domain families
[Figure categories: Eukaryotic, Eubacteria, Archaea]
Some 35 (~20%) NESG structures submitted to the PDB are eukaryotic proteins

Uniqueness of NESG Structures

Leverage of NESG Structures
Lower panel: number of proteins for which the sequence-unique structures experimentally determined (red) by each consortium could be used to build homology models (light green).
Upper panel: number of new models that could be built for ten entirely sequenced eukaryotes (tan) and for the human genome (green).
Total Leverage ~20,000 Structures
Novel Leverage ~4,000 Structures
Liu and Rost
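The slide distinguishes total from novel leverage without defining either. One plausible reading, sketched below, treats total leverage as the number of proteins that can be modeled from a newly determined structure, and novel leverage as the subset of those that could not already be modeled from previously known structures. This interpretation, the function name `leverage`, and the toy protein identifiers are assumptions made for illustration; they are not taken from Liu and Rost's analysis.

```python
def leverage(modelable_from_new: set[str], modelable_before: set[str]) -> tuple[int, int]:
    """Return (total leverage, novel leverage) under the assumed definitions.

    modelable_from_new: proteins for which the new structure(s) could serve
        as a homology-modeling template.
    modelable_before: proteins that already had a usable template before
        the new structure(s) were determined.
    """
    total = set(modelable_from_new)
    novel = total - set(modelable_before)  # newly modelable proteins only
    return len(total), len(novel)


if __name__ == "__main__":
    # Toy identifiers, purely illustrative.
    new_models = {"P1", "P2", "P3", "P4", "P5"}
    already_covered = {"P2", "P5", "P9"}
    print(leverage(new_models, already_covered))
```

Under this reading, novel leverage can never exceed total leverage, which is consistent with the ~4,000 versus ~20,000 figures on the slide.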