Top-down characterization of proteins in bacteria with unsequenced genomes Nathan Edwards Georgetown University Medical Center.

Presentation transcript:

Top-down characterization of proteins in bacteria with unsequenced genomes Nathan Edwards Georgetown University Medical Center

2 Microorganism Identification
Homeland-security/defense applications.
Long history of fingerprinting approaches.
Clinical applications in strain identification: selection of treatment and/or antibiotics.
New applications in microbiome analysis: bacterial colonies in the gut, chronic wound infections, ...
Compete with genomic approaches (PCR, next-gen sequencing)? The primary sales pitch is speed.

3 Microorganism Identifications
Match spectra with proteome (or genome) sequence for (species) identity.
Provides a robust match with respect to instrumentation and sample prep.
Many bacteria will never be sequenced or "finished" (pathogen simulants, for example)...
...but many have: about 2500 to date.

4 Microorganism Identifications
Match spectra with proteome (or genome) sequence for (species) identity.
Provides a robust match with respect to instrumentation and sample prep.
Many bacteria will never be sequenced or "finished" (pathogen simulants, for example)...
...but many have: about 2500 to date.
Can we use the available sequence to identify proteins from unknown, unsequenced bacteria?
Yes, for some proteins in some organisms!

5 Intact protein LC-MS/MS
Crude cell lysate
Capillary HPLC, C8 column
LTQ-Orbitrap XL
Precursor scan: 400 m/z
Data-dependent precursor selection: 5 most abundant ions, 10 second dynamic exclusion, charge state +3 or greater
CAD product ion scan: 400 m/z
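
The data-dependent selection rule on this slide can be illustrated with a minimal Python sketch. It is not the instrument's control software. The `Ion` record, the `select_precursors` function, and the `mz_tol` exclusion window are hypothetical names and values introduced for illustration; only the top-5 rule, the 10 second dynamic exclusion, and the +3 minimum charge come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ion:
    mz: float
    charge: int
    intensity: float


def select_precursors(ions, now, excluded, top_n=5, min_charge=3,
                      exclusion_s=10.0, mz_tol=0.5):
    """Pick up to top_n precursors from one survey scan.

    excluded maps an m/z value to the time (s) at which it was last selected.
    mz_tol (in m/z units) is an assumed exclusion window, not from the slide.
    """
    # Forget exclusion entries older than the dynamic exclusion period.
    for mz in [m for m, t in excluded.items() if now - t >= exclusion_s]:
        del excluded[mz]

    chosen = []
    # Walk the ions from most to least abundant.
    for ion in sorted(ions, key=lambda i: i.intensity, reverse=True):
        if len(chosen) == top_n:
            break
        if ion.charge < min_charge:
            continue  # charge state below +3
        if any(abs(ion.mz - m) <= mz_tol for m in excluded):
            continue  # fragmented recently, still excluded
        chosen.append(ion)
        excluded[ion.mz] = now
    return chosen
```

Calling the function once per survey scan with a shared `excluded` dictionary reproduces the effect of dynamic exclusion: an abundant ion is fragmented once, then skipped until the exclusion period has passed.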

6 CID Protein Fragmentation Spectrum from Y. rohdei

7 Enterobacteriaceae Protein Sequences
Exhaustive set of all Enterobacteriaceae family protein sequences from Swiss-Prot, TrEMBL, RefSeq, GenBank, and [CMR]...
...plus Glimmer3 predictions on RefSeq Enterobacteriaceae genomes.
Primary and alternative translation start sites.
Filter for intact mass in the range 1 kDa – 20 kDa.
253,626 distinct protein sequences, 256 species.
Derived from the "Rapid Microorganism Identification Database" (RMIDb.org) infrastructure.
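
The intact-mass filter can be sketched as follows. This is an illustration, not the RMIDb code: the function names are hypothetical, the residue masses are approximate average masses from standard tables, and the choice to skip sequences containing non-standard residue codes is an assumption.

```python
# Approximate average residue masses in Da (standard table values).
AVG_RESIDUE_MASS = {
    "G": 57.0513, "A": 71.0779, "S": 87.0773, "P": 97.1152, "V": 99.1311,
    "T": 101.1039, "C": 103.1429, "L": 113.1576, "I": 113.1576, "N": 114.1026,
    "D": 115.0874, "Q": 128.1292, "K": 128.1723, "E": 129.1140, "M": 131.1961,
    "H": 137.1393, "F": 147.1739, "R": 156.1857, "Y": 163.1733, "W": 186.2099,
}
WATER = 18.0153  # average mass of H2O, added once per intact chain


def average_mass(seq):
    """Average mass (Da) of an unmodified protein sequence."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER


def filter_by_mass(seqs, low=1000.0, high=20000.0):
    """Keep sequences whose intact mass lies in [low, high] Da."""
    kept = []
    for seq in seqs:
        # Assumption: sequences with ambiguous codes (X, B, Z, ...) are skipped.
        if any(aa not in AVG_RESIDUE_MASS for aa in seq):
            continue
        if low <= average_mass(seq) <= high:
            kept.append(seq)
    return kept
```

The default bounds correspond to the 1 kDa – 20 kDa range on the slide.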

8 ProSightPC 2.0
Product ion scan decharging: enabled by high-resolution fragment ion measurements; THRASH algorithm implementation.
Absolute mass search mode: 15 ppm fragment ion match tolerance; 250 Da precursor ion match tolerance.
"Single-click" analysis of entire LC-MS/MS datafile.
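
The two tolerances on this slide work on different scales: the fragment tolerance is relative (parts per million) and the precursor tolerance is absolute (Da). The Python sketch below shows the arithmetic only. It is not ProSightPC's scoring code, and all function names are hypothetical.

```python
def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def fragment_matches(observed, theoretical, tol_ppm=15.0):
    """Relative tolerance, as used for fragment ions."""
    return abs(ppm_error(observed, theoretical)) <= tol_ppm


def precursor_matches(observed, theoretical, tol_da=250.0):
    """Absolute tolerance in Da, as used for the intact precursor."""
    return abs(observed - theoretical) <= tol_da


def count_fragment_matches(observed_masses, theoretical_masses, tol_ppm=15.0):
    """Number of theoretical fragments matched by at least one observed mass."""
    return sum(
        1 for t in theoretical_masses
        if any(fragment_matches(o, t, tol_ppm) for o in observed_masses)
    )
```

At 15 ppm, a 1000 Da fragment must be measured to within 0.015 Da, which is why the slide ties decharging and matching to high-resolution fragment measurements. The wide 250 Da precursor window admits candidate sequences that differ from the measured protein by a substitution or modification.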

9 Other tools
Explored using standard search engines: decharge and format as charge +1 spectrum; X!Tandem scoring plugin (ProSight, delta M); OMSSA, Mascot, etc.
MS-Tools: MS-Deconv, MS-TopDown, MS-Align, MS-Align+, MS-Align-E!
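
"Decharge and format as charge +1 spectrum" amounts to converting each multiply charged peak to the m/z it would have as a singly protonated ion. The sketch below assumes the charge of every peak is already known (for example from isotope spacing) and that charge comes from protons. The function names are hypothetical.

```python
PROTON = 1.007276  # mass of a proton in Da


def neutral_mass(mz, z):
    """Neutral mass of an ion observed at m/z with z protons attached."""
    return z * (mz - PROTON)


def to_singly_charged(mz, z):
    """m/z the same species would have at charge +1."""
    return neutral_mass(mz, z) + PROTON


def decharge(peaks):
    """Convert (mz, charge, intensity) peaks to sorted (mz at +1, intensity)."""
    return sorted((to_singly_charged(mz, z), inten) for mz, z, inten in peaks)
```

A spectrum rewritten this way can be handed to a search engine that expects peptide-style, singly charged fragment peaks.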

10 CID Protein Fragmentation Spectrum from Y. rohdei: match to Y. pestis 50S Ribosomal Protein L32

11 Exact match sequence…

12 Phylogeny: Protein vs DNA
[Figure panels: Protein Sequence; 16S-rRNA Sequence]

13 What about mixtures?

14 Shared Small Ribosomal Proteins

15 Shared Small Ribosomal Proteins

16 Identified E. herbicola proteins
30S Ribosomal Protein S19: m/z , z 15+, E-value 1.96e-16, Δ
Six proteins identified with |Δ| < 0.02

17 Identified E. herbicola proteins
DNA-binding protein HU-alpha: m/z , z 13+, E-value 7.5e-26, Δ
Eight proteins identified with "large" |Δ|

18 Identified E. herbicola proteins
DNA-binding protein HU-alpha: m/z , z 13+, E-value 1.91e-58
Use "Sequence Gazer" to find the mass shift.
ΔM mode can "tolerate" one shift for free!

19 ProSightPC: ΔM mode
[Diagram labels: Protein Sequence; Experimental Precursor; ΔM; b- and y-ions]
Also: PIITA (Tsai et al. 2009)

20 ProSightPC: ΔM mode
[Diagram labels: Protein Sequence; Experimental Precursor; ΔM; b- and y-ions; b'- and y'-ions]
Also: PIITA (Tsai et al. 2009)
Match a single "blind" mass-shift for free!

21 ProSightPC: ΔM mode
[Diagram labels: Protein Sequence; Experimental Precursor; ΔM; b-, b'-, y- and y'-ions]
Also: PIITA (Tsai et al. 2009)
Match a single "blind" mass-shift for free!
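
The idea behind ΔM mode can be sketched in Python. This is an illustration under stated assumptions, not ProSightPC's implementation. It assumes ΔM is the experimental precursor mass minus the theoretical mass of the candidate sequence, that b'- and y'-ions are the ordinary b- and y-fragments shifted by ΔM, and that fragments are compared as neutral monoisotopic masses after decharging. The function names are hypothetical; the residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses in Da (standard table values).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565


def fragment_masses(seq):
    """Neutral b- and y-fragment masses for every backbone cleavage."""
    res = [MONO[aa] for aa in seq]
    b, y = [], []
    total = 0.0
    for m in res[:-1]:           # N-terminal fragments
        total += m
        b.append(total)
    total = WATER
    for m in reversed(res[1:]):  # C-terminal fragments
        total += m
        y.append(total)
    return b, y


def delta_m_matches(seq, precursor_mass, observed, tol_ppm=15.0):
    """Return (delta_m, number of observed masses matching b, y, b' or y')."""
    b, y = fragment_masses(seq)
    delta = precursor_mass - (sum(MONO[aa] for aa in seq) + WATER)
    # b' and y' are the same fragments carrying the whole mass shift.
    candidates = b + y + [m + delta for m in b] + [m + delta for m in y]

    def hit(obs):
        return any(abs(obs - c) <= tol_ppm * 1e-6 * obs for c in candidates)

    return delta, sum(1 for obs in observed if hit(obs))
```

A single shift at an unknown position leaves the b-ions before it and the y-ions after it unchanged, and moves the remaining fragments by exactly ΔM. Matching all four series therefore accounts for every fragment without locating the shift first, which is the sense in which one "blind" shift comes for free.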

22 Identified E. herbicola proteins
DNA-binding protein HU-alpha: m/z , z 13+, E-value 7.5e-26, Δ
Extract N- and C-terminus sequence supported by at least 3 b- or y-ions.

23 E. herbicola protein sequences

24 E. herbicola sequences found in other species

25 Phylogenetic placement of E. herbicola
[Figure panels: Phylogram; Cladogram]
phylogeny.fr – "One-Click"

26 Genome annotation errors
UniProt: E. coli Cell division protein ZapB, 22 (371) E. coli strains
MQFRRGMTMSLEVFEKLEAKVQQAIDTITL…
3 (204) 17 (166) 0 (2)

27 Genome annotation errors
UniProt: E. coli Cell division protein ZapB, 22 (371) E. coli strains
MQFRRGMTMSLEVFEKLEAKVQQAIDTITL…
3 (204) 17 (166) 0 (2)
Need ±1500 Da precursor tolerance…

28 Conclusions
Protein identification for unsequenced organisms.
Identification and localization for sequence mutations and post-translational modifications.
Extraction of confidently established sequence suitable for phylogenetic analysis.
Genome annotation correction.
New paradigm for phylogenetic analysis?

29 Acknowledgements
Dr. Catherine Fenselau, Avantika Dhabaria, Joe Cannon*, Colin Wynne* (University of Maryland Biochemistry)
Dr. Yan Wang (University of Maryland Proteomics Core)
Dr. Art Delcher (University of Maryland CBCB)
Funding: NIH/NCI