Top-down characterization of proteins in bacteria with unsequenced genomes Colin Wynne Catherine Fenselau University of Maryland, College Park Nathan Edwards Georgetown University Medical Center
2 Microorganism Identification: An important application of mass spectrometry. Match spectra with sequence to establish identity. Many bacteria will never be sequenced (pathogen simulants, for example), but many have been: about 1000 to date. Can we use the available sequence to identify proteins from unsequenced bacteria? Yes, for some proteins in some organisms! Yersinia rohdei, Erwinia herbicola, Enterobacter cloacae
3 Intact protein LC-MS/MS: Crude cell lysate. Capillary HPLC, C8 column. LTQ-Orbitrap XL. Precursor scan: 400 m/z. Data-dependent precursor selection: 5 most abundant ions, 10 second dynamic exclusion, charge state +3 or greater. CID product ion scan: 400 m/z
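The data-dependent selection rules on this slide (5 most abundant ions, 10 second dynamic exclusion, charge state +3 or greater) can be illustrated with a minimal sketch. This is not the instrument's acquisition software: the `Peak` type, the `select_precursors` function, and the 0.5 m/z exclusion window (`mz_tol`) are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    mz: float
    intensity: float
    charge: int


def select_precursors(peaks, now, excluded, top_n=5, min_charge=3,
                      exclusion_s=10.0, mz_tol=0.5):
    """Pick up to top_n precursors from one survey scan.

    excluded maps a previously selected m/z to the time (in seconds) at
    which it was selected; it is updated in place.
    mz_tol is an assumed exclusion window, not a value from the slides.
    """
    # Forget exclusions that have aged out.
    for mz in [m for m, t in excluded.items() if now - t >= exclusion_s]:
        del excluded[mz]

    chosen = []
    # Walk the peaks from most to least abundant.
    for peak in sorted(peaks, key=lambda p: p.intensity, reverse=True):
        if len(chosen) == top_n:
            break
        if peak.charge < min_charge:
            continue  # charge state below +3
        if any(abs(peak.mz - m) <= mz_tol for m in excluded):
            continue  # still under dynamic exclusion
        chosen.append(peak)
        excluded[peak.mz] = now
    return chosen
```

Calling the function on successive survey scans with the same `excluded` dictionary shows the effect of dynamic exclusion: an ion selected at time 0 is skipped until 10 seconds have elapsed, which gives less abundant proteins a chance to be fragmented.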
4 CID Protein Fragmentation Spectrum from Y. rohdei
5 Enterobacteriaceae Protein Sequences: Exhaustive set of all Enterobacteriaceae protein sequences from Swiss-Prot, TrEMBL, RefSeq, GenBank, and CMR, plus Glimmer3 predictions on Enterobacteriaceae genomes from RefSeq. Primary and alternative translation start sites. Filtered for intact mass in the range 1 kDa to 20 kDa. 253,626 distinct protein sequences, 256 species. Derived from "Rapid Microorganism Identification Database" (RMIDb.org) infrastructure.
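The intact-mass filter can be sketched as follows. This is an illustration, not the RMIDb pipeline: the function names are hypothetical, and the use of average (rather than monoisotopic) residue masses is an assumption, since the slide does not say which was used.

```python
# Standard average residue masses (Da) for the 20 common amino acids.
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594,
    "N": 114.1038, "D": 115.0886, "Q": 128.1307, "K": 128.1741,
    "E": 129.1155, "M": 131.1926, "H": 137.1411, "F": 147.1766,
    "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # average mass of H2O, added once per intact chain


def average_mass(sequence):
    """Average mass of an unmodified protein sequence, in Da."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in sequence) + WATER


def filter_by_intact_mass(sequences, low=1000.0, high=20000.0):
    """Keep distinct sequences whose intact mass lies in [low, high] Da."""
    seen = set()
    kept = []
    for seq in sequences:
        if seq in seen:
            continue  # keep only distinct sequences
        seen.add(seq)
        try:
            mass = average_mass(seq)
        except KeyError:
            continue  # assumption: skip sequences with non-standard residues
        if low <= mass <= high:
            kept.append(seq)
    return kept
```

Alternative translation start sites would enter this filter as additional, shorter candidate sequences for the same gene; generating them is outside the scope of this sketch.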
6 ProSightPC 2.0: Product ion scan decharging, enabled by high-resolution fragment ion measurements; THRASH algorithm implementation. Absolute mass search mode: 15 ppm fragment ion match tolerance, 250 Da precursor ion match tolerance. "Single-click" analysis of entire LC-MS/MS datafile.
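The two tolerances above work on different scales: the fragment tolerance is relative (parts per million), while the precursor tolerance is an absolute mass window. The sketch below shows the arithmetic only. It is not ProSightPC's implementation, and all function names are hypothetical.

```python
def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def count_fragment_matches(observed_masses, theoretical_masses, tol_ppm=15.0):
    """Count observed fragment masses within tol_ppm of any theoretical mass."""
    return sum(
        1
        for obs in observed_masses
        if any(abs(ppm_error(obs, theo)) <= tol_ppm
               for theo in theoretical_masses)
    )


def precursor_candidates(observed_mass, candidates, tol_da=250.0):
    """Names of candidate proteins whose mass is within tol_da of the precursor.

    candidates is an iterable of (name, theoretical_mass) pairs.
    """
    return [name for name, mass in candidates
            if abs(observed_mass - mass) <= tol_da]
```

A wide precursor window of this kind keeps a database protein in consideration even when the observed protein differs from it by a substitution or modification, while the tight fragment tolerance still requires that individual fragment ions agree closely.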
7 CID Protein Fragmentation Spectrum from Y. rohdei: Match to Y. pestis 50S Ribosomal Protein L32
8 Identified E. herbicola proteins 30S Ribosomal Protein S19 m/z , z 15+, E-value 1.96e-16, Δ Six proteins identified with |Δ| < 0.02
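The precursor m/z and charge state reported for each identification determine the neutral intact mass, from which a mass difference can be computed. The sketch below assumes that Δ is the observed minus the theoretical precursor mass in Da; the slides do not state the definition or the units, and the function names are hypothetical.

```python
PROTON = 1.007276466  # mass of a proton, Da


def neutral_mass(mz, z):
    """Neutral mass of an ion observed at m/z with z added protons."""
    return z * (mz - PROTON)


def mass_delta(mz, z, theoretical_mass):
    """Observed minus theoretical neutral mass, in Da (assumed meaning of Δ)."""
    return neutral_mass(mz, z) - theoretical_mass
```

Under this reading, a |Δ| near zero indicates that the database sequence accounts for the full observed mass, while a "large" |Δ| points to a sequence difference or modification somewhere in the protein.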
9 Identified E. herbicola proteins DNA-binding protein HU-alpha m/z , z 13+, E-value 7.5e-26, Δ Eight proteins identified with "large" |Δ|
10 Identified E. herbicola proteins DNA-binding protein HU-alpha m/z , z 13+, E-value 1.91e-58, Δ 0.11 Use "Sequence Gazer" to find mass shift
11 Identified E. herbicola proteins DNA-binding protein HU-alpha m/z , z 13+, E-value 7.5e-26, Δ Extract N- and C-terminus sequence supported by at least 3 b- or y-ions
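One possible reading of the extraction rule is sketched below. It assumes that an N-terminal prefix counts as supported when at least 3 matched b-ions reach that far or further into the sequence, and likewise for y-ions from the C-terminus. This interpretation, the function name, and the ion-index inputs are assumptions, not details given on the slide.

```python
def supported_termini(sequence, b_ions, y_ions, min_support=3):
    """Return (N-terminal prefix, C-terminal suffix) supported by fragment ions.

    b_ions and y_ions are the indices of matched ions: b_i covers the first
    i residues, y_i covers the last i residues.
    """
    def supported_length(indices):
        # Ignore indices that cannot correspond to a fragment of this sequence.
        valid = {i for i in indices if 1 <= i <= len(sequence)}
        ranked = sorted(valid, reverse=True)
        # The longest stretch covered by at least min_support ions ends at
        # the min_support-th largest matched index.
        if len(ranked) < min_support:
            return 0
        return ranked[min_support - 1]

    n_len = supported_length(b_ions)
    c_len = supported_length(y_ions)
    return sequence[:n_len], sequence[len(sequence) - c_len:]
```

For a hypothetical 10-residue sequence with matched b-ions at indices 2, 4, 5, and 7, the supported N-terminal stretch is the first 4 residues, because three ions (b4, b5, b7) extend at least that far.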
12 E. herbicola protein sequences
13 E. herbicola sequences found in other species
14 Phylogenetic placement of E. herbicola Phylogram Cladogram phylogeny.fr – "One-Click"
15 Genome Annotation Correction Serratia proteamaculans CSR, RPS19 Citrobacter koseri RPL32 Enterobacter sakazakii RPS21 RPL30 Enterobacter sakazakii Sodalis glossinidius Photorhabdus luminescens* Erwinia tasmaniensis Enterobacter sp. 638 Some spectra match Glimmer predictions only!
16 Conclusions Protein identification for unsequenced organisms. Identification and localization for sequence mutations and post-translational modifications. Extraction of confidently established sequence suitable for phylogenetic analysis. Genome annotation correction. New paradigm for phylogenetic analysis?
17 Acknowledgements Dr. Catherine Fenselau Colin Wynne, Joe Cannon University of Maryland Biochemistry Dr. Yan Wang University of Maryland Proteomics Core Dr. Art Delcher University of Maryland CBCB Funding: NIH/NCI
18 Shared "Biomarker" Proteins
19 Shared "Biomarker" Proteins