Sequence Similarity Analysis Often Misses Evolutionary Relationships Which Can Be Detected by Combined Analysis of 3D Structural and Sequence Residues.


1 Sequence Similarity Analysis Often Misses Evolutionary Relationships Which Can Be Detected by Combined Analysis of 3D Structural and Sequence Residues [Figure: axes labeled "Aligned" and "% Sequence Identity"; legend: Homologous, Non-homologous; homologous relationships established by both 3D structure and sequence] Adapted from work by Sanders and co-workers
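
The "% Sequence Identity" axis on this slide can be illustrated with a minimal sketch. It assumes a pairwise alignment is already available as two equal-length gapped strings, and it counts identity over the aligned (non-gap) positions only; the function name `percent_identity` and the gap character are illustrative assumptions, not something taken from the presentation.

```python
from typing import Tuple


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> Tuple[int, float]:
    """Return (aligned residue count, % identity) for two gapped sequences.

    Assumption: both strings come from the same pairwise alignment,
    so they have equal length and use `gap` for insertions/deletions.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    aligned = 0    # columns where both sequences have a residue
    identical = 0  # aligned columns where the residues match

    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # a gap in either sequence is not an aligned pair
        aligned += 1
        if res_a.upper() == res_b.upper():
            identical += 1

    if aligned == 0:
        return 0, 0.0
    return aligned, 100.0 * identical / aligned
```

Both returned values matter for the point the slide makes: a given percent identity over a short aligned stretch is weaker evidence of homology than the same identity over a long one, which is why the plot shows alignment length alongside identity.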

2 Structure can often provide valuable clues to biochemical and biophysical aspects of protein function Structure-based Functional Genomics

3 Biological Functions of Genes and Proteins Genetic Function / Phenotype Cellular Function Biochemical Function Detailed Atomic Mechanism

4 An Important Approach to the Protein Folding Problem is to Characterize the “Natural Language of Proteins” Representative 3D Structure from Each of Several Thousand Sequence Families of Domains

5 National Institutes of Health Protein Structure Initiative (PSI) Long-Range Goal To make the three-dimensional atomic level structures of most proteins easily available from knowledge of their corresponding DNA sequences http://www.nigms.nih.gov/psi.html/ J. Norvell

6 Expected PSI Benefits Structure provides information on function and will aid in the design of experiments Development of better therapeutic targets from comparisons of protein structures from: –Pathogens vs. hosts –Diseased vs. normal tissues J. Norvell

7 PSI Benefits (cont'd) Collection of structures will address key biochemical and biophysical problems –Protein folding, prediction, folds, evolution, etc. Benefits to biologists –Technology developments –Structural biology facilities –Availability of reagents and materials –Experimental outcome data on protein production and crystallization J. Norvell

8 PSI Pilot Phase 5-year pilot phase, September 2000 Pilot phase goals –Development of high-throughput structural genomics pipeline to produce unique, non-redundant protein structures –Pilots for testing all facets and strategies of structural genomics PSI target selection policy –Representatives of protein sequence families –Public release of all targets, progress, results, and structures J. Norvell

9 PSI Pilot Research Centers Seven research centers funded in FY2000 Two additional research centers funded in FY2001 Co-funding by NIAID for two of the nine research centers Many subprojects J. Norvell

10 PSI Pilot Phase -- Lessons Learned Structural genomics pipelines can be constructed and scaled-up High throughput operation works for many proteins Genomic approach works for structures Bottlenecks remain for some proteins A coordinated, 5-year target selection policy must be developed Homology modeling methods need improvement J. Norvell

11 Northeast Structural Genomics Consortium: A SG Research Network
Bioinformatics: Barry Honig, Columbia University; Mark Gerstein, Yale University; Sharon Goldsmith, Columbia University; Chern Goh, Yale University; Igor Jurisica, Ontario Cancer Inst.; Andrew Laine, Columbia University; Jessica Lau, Rutgers University; Jinfeng Liu, Columbia University; Diana Murray, Cornell Medical School; Burkhard Rost, Columbia University; Mike Wilson, Yale University
X-ray Crystallography: Wayne Hendrickson, Columbia University; Peter Allen, Columbia University; George DeTitta, Hauptman-Woodward; John Hunt, Columbia University; Rich Karlin, Columbia University; Joe Luft, Hauptman-Woodward; Alex Kuzin, Columbia University; Phil Manor, Columbia University; Liang Tong, Columbia University; Kalyan Das, Rutgers University
Protein Production / Biophysics: Gaetano Montelione, Rutgers University; Thomas Acton, Rutgers University; Stephen Anderson, Rutgers University; Cheryl Arrowsmith, Ontario Cancer Inst.; YiWen Chiang, Rutgers University; Natasha Dennisova, Rutgers University; Masayori Inouye, RWJMS - UMDNJ; Lichung Ma, Rutgers University; Rong Xiao, Rutgers University; Adlinda Yee, Ontario Cancer Inst.
Protein NMR: Thomas Szyperski, SUNY Buffalo; James Aramini, Rutgers University; Cheryl Arrowsmith, Ontario Cancer Inst.; John Cort, Pacific Northwest Natl Labs; Michael Kennedy, Pacific Northwest Natl Labs; Gaohua Liu, SUNY Buffalo; Theresa Ramelot, Pacific Northwest Natl Labs; Janet Huang, Rutgers University; Gaetano Montelione, Rutgers University; GVT Swapna, Rutgers University; Bin Wu, Ontario Cancer Inst.

12 Goals of the NESG Consortium Short Term Develop a Scalable Platform for Structural and Functional Proteomics of Prokaryotic and Eukaryotic Proteins Long Term Characterize the repertoire of eukaryotic protein structural domain families

13 The NESG Publication Network PubNet Douglas, Montelione, Gerstein Bioinformatics, 2005 in press

14 Target Selection Strategy

15 Target Selection for Structural Proteomics C. Orengo, Snowbird, UT 4.17.04 How many protein families can we identify in the genomes with/without structural representatives? Which families should we target to maximise the structural coverage of the genomes? Can we select families to optimise function coverage?

16 Rost Clusters: Structural Genomics Targets Protein domain families / clusters: full-length proteins < 340 amino acids; no member > 30% identity to PDB structures; no regions of low complexity; not predicted to be membrane associated. ~20,000 “NESG Clusters”
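
The four selection criteria on this slide can be read as a simple filter over candidate clusters. The sketch below is only an illustration of that reading: the `Cluster` record, its field names, and the `is_target` function are hypothetical, and it assumes the per-cluster annotations (length, best identity to the PDB, low-complexity and membrane predictions) have already been computed by other tools. Only the two thresholds come from the slide.

```python
from dataclasses import dataclass
from typing import List

MAX_LENGTH = 340         # slide: full-length proteins < 340 amino acids
MAX_PDB_IDENTITY = 30.0  # slide: no member > 30% identity to PDB structures


@dataclass
class Cluster:
    """Hypothetical summary record for one protein domain cluster."""
    name: str
    length: int                # full-length protein size, in amino acids
    max_pdb_identity: float    # highest % identity of any member to a PDB entry
    has_low_complexity: bool   # any predicted low-complexity region
    predicted_membrane: bool   # predicted to be membrane associated


def is_target(cluster: Cluster) -> bool:
    """Apply the four criteria listed on the slide."""
    return (
        cluster.length < MAX_LENGTH
        and cluster.max_pdb_identity <= MAX_PDB_IDENTITY
        and not cluster.has_low_complexity
        and not cluster.predicted_membrane
    )


def select_targets(clusters: List[Cluster]) -> List[Cluster]:
    """Keep only clusters that pass every criterion."""
    return [c for c in clusters if is_target(c)]
```

For example, `select_targets([Cluster("demo1", 210, 22.5, False, False), Cluster("demo2", 410, 12.0, False, False)])` keeps only the first record, because the second exceeds the length limit. The cluster names here are made up for the example.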

17 NESG Domain Clusters
Protein domain families / clusters: full-length proteins < 340 amino acids; no member > 30% identity to PDB structures; no regions of low complexity; not predicted to be membrane associated
Organisms: Aeropyrum pernix, Aquifex aeolicus, Arabidopsis thaliana, Archaeoglobus fulgidus, Bacillus subtilis, Brucella melitensis, Caenorhabditis elegans, Campylobacter jejuni, Caulobacter crescentus, Deinococcus radiodurans, Drosophila melanogaster, Escherichia coli, Fusobacterium nucleatum, Haemophilus influenzae, Helicobacter pylori, Homo sapiens, Human cytomegalovirus, Lactococcus lactis, M. thermoautotrophicum, Neisseria meningitidis, Pyrococcus furiosus, Pyrococcus horikoshii, Saccharomyces cerevisiae, Staphylococcus aureus, Streptococcus pyogenes, Streptomyces coelicolor, Thermoplasma acidophilum, Thermotoga maritima, Thermus thermophilus, Vibrio cholerae, Other
Liu, Hegi, Acton, Montelione, & Rost PROTEINS 2004. 56: 188-200; Wunderlich et al. PROTEINS 2004. 56: 181-187; Acton et al. Meths Enzymol. 2005 in press
1 Euka: 2 Proka Cloned / Expressed > 1000 Human Proteins WR41 ET8

18 Protein Structure Production

19 Primer Prim'er Program http://www-nmr.cabm.rutgers.edu/bioinformatics/index.html Everett, Acton, & Montelione 2004. J Struct Funct Genomics.

20 Auto-Steps with the Biorobot 8000: DNA Mini-preps; PCR Reaction Set up-96 well; PCR Purification; Restriction Digest; Qiaquick Purify; Ligation; Transform; Colony PCR; Cycle Sequencing; Big Dye removal

21 96- Well Expression Overnight culture 24 Well Blocks 2 ml of MJ9 Transfer ~200 ul of overnight culture to appropriate well

22 HR969 HSQC and HetNOE Screening Amenability to Structural Determination by NMR Is Determined on NiNTA-Purified Samples

23 Critical NMR Observation From SPiNE Some 30% of full-length, expressed, soluble eukaryotic proteins from the Rost Clusters produced in E. coli by NESG are DISORDERED based on heteronuclear ¹H-¹⁵N NOE data. It may not be possible to determine 3D structures of a large portion of the Rost domain families in isolation!

24 Sample Optimization - Buffer Screening Microdialysis Buttons- Optimization for NMR Vary Buffer Conditions - Stability Screen for ppt. 100 mM Arginine Small sample mass (50 ug/button) Bagby S, Tong KI, Liu D, Alattia JR, Ikura M. 1997. J Biomol NMR.

25 Monodisperse Conditions Aggregation Screening - Crystallization Analytical Gel Filtration with Light Scattering Proterion - 96 Well Less Sample More Conditions Philip Manor, Roland Satterwhite and John Hunt LS RI

26 5 hours 12 hours ÄKTAxpress™ 4 modules in parallel 16 samples AC-GF AC AC/GF Affinity Chromatography (AC) HiTrap™ Chelating HP, 1 and 5 ml Gel Filtration (GF) HiLoad 16/60 Superdex 200 pg

27 Solubility / 2004 Stats * defined as greater than 60% soluble by SDS-PAGE analysis Many HR (Human) proteins in advanced stages of NMR 3 HR Crystal structures 2004 Production Solubility vs Organism 2004 HR Success T. Acton et al

28 Internet-based Data Management

29 NESG PROGRESS SUMMARY Jan 1, 2005 Intrinsically Disordered Proteins Full-length Proteins Produced in E. coli
Organism: % Unfolded
E. coli: 8%
yeast: 18%
fly / worm: 25%
human: 35%

30 Phylogenetic Distribution of 160 NESG Structures Most (>95%) completed NESG structures are members of eukaryotic protein domain families Eukaryotic Eubacteria Archaea Some 35 (~20%) NESG structures submitted to the PDB are eukaryotic proteins

31 Uniqueness of NESG Structures

32 Leverage of NESG Structures Lower panel: number of proteins for which the sequence-unique structures experimentally determined (red) by each consortium could be used to build homology models (light green). Upper panel: number of new models that could be built for ten entirely sequenced eukaryotes (tan) and for the human genome (green). Total Leverage ~20,000 Structures Novel Leverage ~4,000 Structures Liu and Rost

