Proteogenomics: Refining and Improving Genome Annotation Samuel H Payne J Craig Venter Institute.

1 Proteogenomics: Refining and Improving Genome Annotation Samuel H Payne J Craig Venter Institute

2 State of Genome Annotation
Most prokaryotic genomes are auto-annotated. Sequence and function are inferred with comparative genomics; validation is sparse.
Difficulties with novel or horizontally transferred (HGT) genes
Mature protein features
  - localization
  - post-translational modification (PTM), cleavage
(Salzberg 2007)

3 Diversity or Confusion

4 Proteomics
Input: protein sample
Output: list of peptides

5 Proteogenomics
Definition: using proteomics data to do genome annotation
Goals:
  - Find all coding regions of the genome, annotated and unannotated
  - Submit improved annotation to NCBI
  - Identify "mature protein" features
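
The core computation implied by this definition, matching observed peptides against all six reading frames of a genome so that unannotated regions are covered too, can be sketched as follows. This is a minimal illustration and not the pipeline used in the talk: the function names are hypothetical, the standard genetic code is assumed, and real peptide identifications would come from a mass-spectrometry search engine rather than exact string matching.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame(dna):
    """Translate both strands in all three offsets."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[(strand, offset)] = translate(seq[offset:])
    return frames


def map_peptide(peptide, dna):
    """Return (strand, frame offset, residue index) for every exact match."""
    hits = []
    for (strand, offset), protein in six_frame(dna).items():
        pos = protein.find(peptide)
        while pos != -1:
            hits.append((strand, offset, pos))
            pos = protein.find(peptide, pos + 1)
    return hits


# Toy sequence, invented for illustration.
toy_dna = "ATGGCTAAACGTGAATAA"
print(map_peptide("AKRE", toy_dna))
```

A peptide that maps only outside annotated genes is a candidate novel gene; one that maps upstream of an annotated start, in frame, suggests a start-site correction.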

6 Proteogenomics Protocol
Data sources
  - Yersinia pestis: Pieper et al., 2008, 2009
  - Bacillus anthracis: PRC/NIAID

7 Correcting Errors
Unannotated genes
  - Both known and totally novel

8 Correcting Errors
Unannotated genes
  - Both known and totally novel

9 Correcting Errors Start site assignment

10 Exceptions to Rules
Multi-ORF genes: self-splicing, frameshift

11 Exceptions to Rules
Non-canonical start codons
  - infC: ATT (Sacerdot 1982, Payne 2010) in enterobacteria; ATA in Shewanella (Gupta 2007)
  - Deinococcus (Baudet 2009) suggests new non-standard starts
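
One way to picture how a peptide upstream of an annotated start leads to a start-site correction, including the rare starts listed above, is to walk upstream in frame from the peptide until a stop codon is reached and collect every start codon on the way. The sketch below is an assumption-laden illustration, not the method from the talk: the function name is hypothetical, ATG/GTG/TTG are taken as the usual bacterial starts, and only the two rare codons named on the slide (ATT, ATA) are added when requested.

```python
STOPS = {"TAA", "TAG", "TGA"}
CANONICAL_STARTS = {"ATG", "GTG", "TTG"}   # usual bacterial start codons
RARE_STARTS = {"ATT", "ATA"}               # non-canonical starts named on the slide


def upstream_starts(dna, peptide_start, allow_rare=False):
    """List in-frame start-codon positions upstream of a mapped peptide.

    dna           -- forward-strand DNA, upper case
    peptide_start -- nucleotide index of the peptide's first codon
    Returns positions ordered from nearest to farthest upstream, stopping
    at the first in-frame stop codon.
    """
    starts = (CANONICAL_STARTS | RARE_STARTS) if allow_rare else CANONICAL_STARTS
    found = []
    for i in range(peptide_start - 3, -1, -3):
        codon = dna[i:i + 3]
        if codon in STOPS:
            break
        if codon in starts:
            found.append(i)
    return found


# Toy sequence, invented for illustration: TAA ATT GCC ATG AAA CGT
toy = "TAAATTGCCATGAAACGT"
print(upstream_starts(toy, 12))                   # canonical starts only
print(upstream_starts(toy, 12, allow_rare=True))  # also consider ATT / ATA
```

Allowing rare starts enlarges the candidate set considerably, so in practice such a start would need supporting evidence, for example a peptide that covers the protein's N-terminus.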

12 Overlaps/Wrong Frames

13 Pseudo?genes
Expression of ABC transporter N-terminus. Missing critical motif elements.
5 peptides (with splicing) map to a transposable element gene. Sequence alignment to an Arabidopsis Ulp1.
(Castellana 2008)

14 Signal Peptide
N-terminal motif, targets protein for export (Perlman & Halvorson 1983)
  - Early basic residue, hydrophobic patch, AxB motif: A = [I,V,L,A,G,S], B = [A,G,S]
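
The three features listed here translate naturally into a simple scan of a protein's N-terminus. The sketch below is only a heuristic illustration of that description: the AxB residue classes come from the slide, while the function name, the hydrophobic residue set, and all window and length thresholds are assumptions chosen for the example.

```python
import re

BASIC = set("KR")
HYDROPHOBIC = "AVLIFMWC"  # assumption: residues counted toward the hydrophobic patch
# Zero-width lookahead so that overlapping AxB sites are all reported.
AXB_SITE = re.compile(r"(?=[IVLAGS].[AGS])")


def candidate_cleavage_sites(protein, basic_window=8, min_patch=7, search_len=40):
    """Return candidate cleavage positions (index of the first mature residue).

    A site is reported only if the N-terminus has an early basic residue,
    then a hydrophobic run of at least min_patch residues, then an AxB motif.
    All thresholds are illustrative assumptions.
    """
    n_term = protein[:search_len]
    if not any(res in BASIC for res in n_term[1:basic_window]):
        return []
    patch = re.search(f"[{HYDROPHOBIC}]{{{min_patch},}}", n_term)
    if patch is None:
        return []
    return [m.start() + 3 for m in AXB_SITE.finditer(n_term, patch.end())]


# Toy sequence used only to exercise the function.
print(candidate_cleavage_sites("MKKTAIAIAVALAGFATVAQAAPKDNTW"))
```

Because the motif is short and degenerate, several candidate sites are typical; an observed peptide whose N-terminus begins exactly at one of them is what singles out the real cleavage site.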

15 Profile of an Exported Protein
  - Early basic residue, hydrophobic patch, motif

16 Future
Rinse and repeat
30 proteomes in 3 years
Stable, robust pipeline for general use
  - Hosted at TeraGrid

                 Novel   New Start
Y. pestis            4           5
B. anthracis         4           6
D. radiodurans     225         117
D. vulgaris         55          89
L. interrogans      20          23

17 When Gene Predictors Fail
Are GC extremes difficult?
  - ~50% GC (Y. pestis): 4 genes missed
  - GC in the 30s (B. anthracis, L. interrogans): 4 and 20 missed
  - GC in the 60s (D. vulgaris, D. radiodurans): 55 and 225 missed

18 Are They Strange?
Relative GC: does gene prediction fail on genes whose GC content differs from that of the other genes?
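
The slide does not define "relative GC" precisely. A minimal sketch, assuming it means a gene's GC fraction minus the GC fraction of a reference sequence such as the whole genome (function names are hypothetical):

```python
def gc_content(seq):
    """Fraction of G and C bases in a DNA string (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def relative_gc(gene, reference):
    """Gene GC minus reference GC; negative means the gene is AT-richer."""
    return gc_content(gene) - gc_content(reference)


# Toy sequences, invented for illustration.
print(gc_content("ATGC"))
print(relative_gc("ATAT", "ATGC"))
```

Comparing the distribution of this value for missed genes against that of correctly predicted genes would show whether atypical composition explains the misses.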

19 Are They All Short?

20 We See What We Know
Proximity to Model Organism
  - Yersinia/Bacillus errors: 4/4
  - 'Remote species' errors: 20, 55, >200

21 We See What We Know
Hypothetical vs. Named
  - Compare novel genes to observed proteome
  - Hypergeometric test, where the null probability is taken from the observed proteome

                 Hypothetical   Named   p-value
B. anthracis                3       1   0.018
L. interrogans             12       8   0.018
D. radiodurans             31       8   10^-10
D. vulgaris                39      16   10^-14
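
The hypergeometric comparison can be sketched as an upper-tail probability: given an observed proteome of N proteins, K of them hypothetical, how likely is it that n novel genes drawn at random would include at least k hypotheticals? The proteome sizes behind the p-values on this slide are not given, so the function below (a hypothetical name) is shown with toy numbers only, and the exact test setup used in the talk may differ.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when drawing n items without replacement from N items,
    K of which are 'successes' (here: hypothetical proteins)."""
    total = comb(N, n)
    favourable = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1)
    )
    return favourable / total


# Toy numbers, invented for illustration: 10 proteins observed, 3 hypothetical;
# 4 novel genes, of which 3 are hypothetical.
print(hypergeom_upper_tail(3, 4, 3, 10))
```

A small p-value means hypothetical proteins are over-represented among the genes that predictors missed, relative to what the expressed proteome would predict.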

22 Expressed Protein Resource
Protein Sequences
  - >30 M sequences
  - nr, UniProt
  - JCVI metagenomics
  - JGI genomes
40,000 clusters
Cross-referenced with proteomics, for validated proteins

23 Acknowledgements
Eli Venter
Shih-Ting Huang, Rembert Pieper
Granger Sutton
Dick Smith, PNNL
NSF

