NIH-PSI Target Selection, Nov 13-14, 2003 © Burkhard Rost (Columbia New York): Comprehensive strategy for integrated target selection in structural genomics.

Presentation transcript:

Comprehensive strategy for integrated target selection in structural genomics
Burkhard Rost, CUBIC, Columbia University

Comprehensive strategy for integrated target selection
Our research goal and current reality
Unit: sequence-structure families
Goal: cover entire families with good models
STAGE 1: CHOP + CLUP + filtering -> novel automatic organization of sequence-structure space
STAGE 2: Refined, manual selection -> model all family members? stop-work/hold-work?
STAGE 3: Explore experimental structure
Answers and perspectives
How many structures needed for completion?
Euka-proka-archae: overlap?
Why collaborate on targets?
Multiplexing helpful?
High-throughput protein production in eukaryotes?

Computational biology & bioinformatics

Sequence-structure family
[Figure: sequence-structure family U and sequence-structure family U']

EVA: comparative modelling
V Eyrich, MA Marti-Renom, D Przybylski, A Fiser, F Pazos, A Valencia, A Sali & B Rost (2001) Bioinformatics 17
MA Marti-Renom, MS Madhusudhan, A Fiser, B Rost, A Sali (2002) Structure 10
Marc Marti Renom & Andrej Sali (UCSF)
[Figure: cumulative distribution of accuracy and coverage; PSI-BLAST 10^-3]

How to decide when we exclude/include?
C Sander & R Schneider 1991 Proteins, 9
B Rost 1999 Prot Engng, 12, 85-94
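
One way to make the include/exclude decision concrete is the length-dependent identity threshold (HSSP-curve) published in the Rost 1999 reference cited on this slide. The sketch below assumes that curve is the criterion meant here; the function names are illustrative and do not come from the talk.

```python
import math

def hssp_threshold(length: int) -> float:
    """Percent sequence identity on the HSSP-curve (Rost 1999) for an alignment of `length` residues."""
    if length <= 11:
        return 100.0
    if length > 450:
        return 19.5
    # Length-dependent part of the curve for 11 < length <= 450.
    return 480.0 * length ** (-0.32 * (1.0 + math.exp(-length / 1000.0)))

def hssp_distance(percent_identity: float, length: int) -> float:
    """Distance from the curve in percentage points; positive means above the threshold."""
    return percent_identity - hssp_threshold(length)
```

For example, an alignment of 500 residues at 30% identity lies 10.5 percentage points above the curve, so `hssp_distance(30.0, 500)` returns `10.5`.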

Scooping families from proteomes, in practice
Problems: domain overlaps

Choose targets: single-linkage clustering
Liu, Hegyi, Acton, Montelione & Rost 2003 Proteins, in press
Liu & Rost 2003 Proteins, submitted
~100,000 eukaryotic proteins (yeast, fly, worm, weed, human) clusters in largest cluster NONSENSE!
Conclusions:
NO clustering of full-length proteins
have to chop into structural-domain-like fragments
(single-linkage DOES work on PrISM)
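
A likely reason single-linkage clustering fails on full-length proteins is chaining: a multi-domain protein links two otherwise unrelated families. The sketch below illustrates this with a small union-find implementation and invented toy proteins (P1 with domain A, P2 with domains A and B, P3 with domain B). It is an illustration under those assumptions, not the CLUP code used in the talk.

```python
from collections import defaultdict

def single_linkage(items, links):
    """Cluster items so that any chain of pairwise links joins them into one cluster."""
    parent = {x: x for x in items}

    def find(x):
        # Follow parent pointers to the cluster representative, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    clusters = defaultdict(set)
    for x in items:
        clusters[find(x)].add(x)
    # Largest clusters first, ties broken alphabetically for a stable result.
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: (-len(c), c))

# Full-length proteins: P2 shares domain A with P1 and domain B with P3.
full_length = single_linkage(["P1", "P2", "P3"], [("P1", "P2"), ("P2", "P3")])

# After chopping into domain-like fragments, the two families separate.
fragments = single_linkage(
    ["P1:A", "P2:A", "P2:B", "P3:B"],
    [("P1:A", "P2:A"), ("P2:B", "P3:B")],
)
```

Here `full_length` is a single cluster containing all three proteins, even though P1 and P3 share no domain, while `fragments` splits into one cluster per domain family.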

CHOP proteins into structural domains
Liu & Rost 2003 Proteins, submitted

CHOP: dissection of proteins into domains
Liu, Hegyi, Acton, Montelione & Rost 2003 Proteins, in press
Liu & Rost 2003 Proteins, submitted
Single-domain proteins: 61% in PDB, 28% in 62 proteomes
Average domain length:
in proteins with ≥ 2 domains: ~100 residues
in proteins with 1 domain: times longer

To take or not to take
Take if > 50 globular residues and no known 3D
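
The rule on this slide can be written as a simple predicate. This is a minimal sketch: the `Fragment` record and its field names are hypothetical, and how globular residues are counted is not specified in the talk.

```python
from dataclasses import dataclass

MIN_GLOBULAR_RESIDUES = 50  # threshold stated on the slide

@dataclass
class Fragment:
    name: str               # hypothetical identifier
    globular_residues: int  # assumed to be computed upstream
    has_known_3d: bool      # True if a structure is already known

def take(fragment: Fragment) -> bool:
    """Take the fragment only if it has more than 50 globular residues and no known 3D structure."""
    return fragment.globular_residues > MIN_GLOBULAR_RESIDUES and not fragment.has_known_3d
```

A fragment with exactly 50 globular residues is rejected, because the slide states a strict "greater than".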

Structural residue coverage in reality (any)
J Liu & B Rost 2002 Bioinformatics, 18
[Figure: % of residues to do! ~28%, ~19%]

If you believe 53% is pessimistic...
53% residue coverage today based on E-value 1!!

Clustering after CHOP
eukaryotic proteins (Yeast, Fly, Worm, Arabidopsis, Human/30)
domain-like fragments
no PDB (E-value 10^-1, HSSP-distance -3)
not good for us (membrane, coil, SEG, NORS, signal peptide)
go non-singleton
21,000 fragment clusters
Liu, Montelione & Rost 2003 Proteins, in press
Jinfeng

Computational biology & bioinformatics

Main goal of Stage 2 analysis
Refine Stage 1 automatic target selection through manual sequence analysis
Concept: USE comparative modeling and structural features directly for refined target selection
For each sequence-structure family from Stage 1: predict the minimal set of experimental structures needed to build high-quality models for the entire family.
Diana Murray, Cornell

Refinement protocol for new 3D
Input: PDB + NESG cluster
Toolbox:
1. Fold recognition and sequence-to-structure profiles
2. Comparative modeling (PrISM, Nest)
3. Structure evaluation tools (e.g. Verify3D)
4. Calculate biophysical properties
Recommend an additional structure if:
1) NESG-cluster members poorly modeled
2) Biophysical properties of models incompatible with known function
3) Models suggest novel functionality
Target re-prioritization based on weekly PDB updates
Diana Murray, Cornell

Example of stop work recommendation
Target   Status
IR21     solved, PDB: 1MOS
ET28     Purified
JR15     Expressed
TT777    Expressed
GR7      Expressed
AR12     Cloned
WR204    Selected
XR4      Expressed
Stop work
SPINE/ZebaView
Experimental structure of IR21 yielded high-quality models for all members of this NESG sequence/structure family
Diana Murray, Cornell

NESG family: HR291 (99% identical to 1P9O), AR1731, HR2295, KR12, DR11
Breaks into two clusters: A = (HR291, AR1731, HR2295) and B = (KR12, DR11)
Two structures required to cover family: predicted by Stage 2 analysis and verified by Stage 3 analysis
Recommendation: Solve structure of KR12 (purified)
Diana Murray, Cornell
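
Finding a small set of structures that covers a family can be framed as a set-cover problem. The sketch below uses a greedy heuristic, which is a stand-in chosen for illustration and not the Stage 2 protocol itself. It assumes that a solved structure yields good models only for members of its own cluster (A or B above).

```python
def greedy_cover(covers):
    """Pick structures until every family member is modeled.

    `covers` maps each candidate structure to the set of members it is
    assumed to model well. Returns the chosen candidates in pick order.
    """
    uncovered = set().union(*covers.values())
    chosen = []
    while uncovered:
        # Candidate that models the most still-uncovered members; ties go alphabetically.
        best = max(sorted(covers), key=lambda c: len(covers[c] & uncovered))
        chosen.append(best)
        uncovered -= covers[best]
    return chosen

cluster_a = {"HR291", "AR1731", "HR2295"}
cluster_b = {"KR12", "DR11"}

# Assumption: each member models exactly the members of its own cluster.
covers = {}
for cluster in (cluster_a, cluster_b):
    for member in cluster:
        covers[member] = set(cluster)

chosen = greedy_cover(covers)
```

The heuristic selects two structures, one from each cluster, matching the count on the slide. Which member it picks within a cluster is decided here by alphabetical tie-breaking, whereas the slide's recommendation of KR12 also weighs experimental progress.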

Model suggests novel function: 30S ribosomal protein S27
[Figure: archaeal structure (NESG ID: GR2; PDB ID: 1QXF) and model for human homologue]
The Archaeoglobus fulgidus S27e protein family has only archaeal and eukaryotic members.
Archaea and eukaryotes share a conserved hydrophobic motif (yellow).
Only eukaryotes have an N-terminal extension, and their models have strikingly different electrostatic properties.
Human protein recommended for structure determination!
Diana Murray, Cornell

Summary Stage 2 refinement
Statistics:
families targets result: stop-work 40 110 hold-work 12 another 3D
Many families currently under investigation
Hold work recommendation: family member at advanced experimental stage predicted to yield good models for entire family -> hold-work for members at early exp. stages
re-assess once structure done!
Diana Murray, Cornell
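
The stop-work and hold-work rules can be sketched as a small decision function. The stage ordering, the choice of "Purified" as the cutoff for an "advanced" stage, and the function and label names are assumptions made for illustration. The first data set is the IR21 family from the stop-work example; the second is invented.

```python
STAGES = ["Selected", "Cloned", "Expressed", "Purified", "Solved"]  # assumed ordering

def recommend(status_by_target, models_whole_family, advanced_stage="Purified"):
    """Return {target: recommendation} for one sequence-structure family.

    `models_whole_family` lists targets whose structure is predicted to
    yield good models for every family member.
    """
    rank = {stage: i for i, stage in enumerate(STAGES)}
    solved = [t for t in models_whole_family if status_by_target[t] == "Solved"]
    advanced = [t for t in models_whole_family
                if rank[status_by_target[t]] >= rank[advanced_stage]]
    recommendations = {}
    for target, status in status_by_target.items():
        if status == "Solved":
            recommendations[target] = "done"
        elif solved:
            # A solved structure already models the family.
            recommendations[target] = "stop-work"
        elif advanced and rank[status] < rank[advanced_stage]:
            # Wait for the advanced member, then re-assess.
            recommendations[target] = "hold-work"
        else:
            recommendations[target] = "continue"
    return recommendations

ir21_family = {
    "IR21": "Solved", "ET28": "Purified", "JR15": "Expressed", "TT777": "Expressed",
    "GR7": "Expressed", "AR12": "Cloned", "WR204": "Selected", "XR4": "Expressed",
}
stop_example = recommend(ir21_family, models_whole_family=["IR21"])

# Hypothetical family with one purified member predicted to model the rest.
hold_example = recommend({"T1": "Purified", "T2": "Cloned", "T3": "Expressed"},
                         models_whole_family=["T1"])
```

In the first case every unsolved member receives "stop-work"; in the second, T2 and T3 receive "hold-work" while T1 continues.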

Computational biology & bioinformatics

Exploit structure to speculate about function
43 no previous annotation about function (defined by 'no publication in biological journal')
39 analyzed
31 result in some predictions about function:
8 clear success: functional annotation achieved, e.g. predicted active site based on structure; typically: confirmation of annotation transfer
23 some hints (16 'hypothetical proteins'), e.g. some clue about active site; mostly completely new!
8 no clue
Sharon Goldsmith & Barry Honig

Answers
How many structures needed for completion?
Euka-proka-archae: overlap?
Why collaborate on targets?
Multiplexing helpful?
High-throughput protein production in eukaryotes?

How many targets for prokaryotes + archaea?
16,000 min
8,000 give: 72% fragments, 72% proteins, 67% residues

How many targets for euka-proka-archae?
8,000
8,000 give: 67% fragments, 67% proteins, 59% residues
BUT: 50% of residues remaining
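
Figures of the form "the N largest families give X% of fragments" come from ranking clusters by size and accumulating coverage. The sketch below shows that calculation on invented cluster sizes; the function names are illustrative and the numbers are not the talk's data.

```python
def coverage_of_largest(cluster_sizes, n):
    """Fraction of all fragments that fall into the n largest clusters."""
    ordered = sorted(cluster_sizes, reverse=True)
    total = sum(ordered)
    return sum(ordered[:n]) / total if total else 0.0

def targets_needed(cluster_sizes, goal):
    """Smallest number of largest clusters whose fragments reach the coverage goal (0..1)."""
    ordered = sorted(cluster_sizes, reverse=True)
    total = sum(ordered)
    if total == 0:
        return 0
    covered = 0
    for count, size in enumerate(ordered, start=1):
        covered += size
        if covered / total >= goal:
            return count
    return len(ordered)

toy_sizes = [5, 3, 1, 1]  # invented cluster sizes
top_two = coverage_of_largest(toy_sizes, 2)
needed = targets_needed(toy_sizes, 0.85)
```

With the toy sizes, the two largest clusters hold 8 of 10 fragments, and three clusters are needed to reach 85% coverage. The same ranking can be repeated with proteins or residues as the unit by changing what each cluster size counts.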

Overlap between euka-proka-archae?
surprisingly small overlap overall
even lower for largest families
most big families are eukaryotic!
~60% of fragments from eukaryotes: no sequence-structure family member from prokaryotes or archaea
much higher for 'largest 8,000':
2,690 (34%) proka+archae only
4,277 (53%) euka only
1,033 (13%) mix
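
The three categories can be derived by checking which kingdoms occur among each cluster's members, and the percentages follow from the counts. In this sketch the kingdom labels and function names are illustrative; only the three counts are taken from the slide.

```python
def classify(kingdoms):
    """Label a cluster by the set of kingdoms found among its members."""
    if not kingdoms:
        raise ValueError("cluster has no members")
    has_euka = "euka" in kingdoms
    has_other = bool(kingdoms & {"proka", "archae"})
    if has_euka and has_other:
        return "mix"
    return "euka only" if has_euka else "proka+archae only"

def shares(counts):
    """Percentage of clusters per category, rounded to whole percent."""
    total = sum(counts.values())
    return {label: round(100 * n / total) for label, n in counts.items()}

# Counts for the 'largest 8,000' as given on the slide.
slide_counts = {"proka+archae only": 2690, "euka only": 4277, "mix": 1033}
percentages = shares(slide_counts)
```

The counts sum to 8,000 and reproduce the percentages shown: 34%, 53% and 13%.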

Why collaborate on target list?
competition between consortia has already hampered success-rate considerably!
32% overlap

Does multiplexing help?
Date:
Multiplex DOUBLES success rate!
~4%

Integrated strategy
NESG: unique, comprehensive, integrated strategy optimized to organize sequence space in structural terms:
Stage 1: CHOP + CLUP + filter yields high success in focusing on sequence-structure families
Stage 2: detailed refinement embeds comparative models into selection and optimizes structural coverage for family
Stage 3: use experimental structure to increase structural family coverage and to allow functional exploitation
Needed to do 'em all: ~38,000 non-singletons
8,000 largest -> 50% of the residues that remain!
Genomics: Surprises + our structural perspective changed the 'world'!
The revolutions continue...

Thanksgiving
$$: NIH/NSF
Data: Jinfeng Liu (CUBIC), Hedi Hegyi & Phil Carter (CUBIC), Marc Marti-Renom (UCSF)
NESG: Guy Montelione (Rutgers), Barry Honig (Columbia), Diana Murray (Cornell, NYC), Tom Acton (Rutgers), Liang Tong & John Hunt (Columbia), George DeTitta (Buffalo), Cheryl Arrowsmith (Toronto), Wayne Hendrickson (Columbia)
EVA: Andrej Sali & Marc Marti-Renom (UCSF), Alfonso Valencia (Madrid), Volker Eyrich, Ingrid Koh & Dariusz Przybylski (CUBIC)