Proteomics and Glycoproteomics (Bio-)Informatics of Protein Isoforms Nathan Edwards Department of Biochemistry and Molecular & Cellular Biology Georgetown.

Presentation transcript:

Proteomics and Glycoproteomics (Bio-)Informatics of Protein Isoforms Nathan Edwards Department of Biochemistry and Molecular & Cellular Biology Georgetown University Medical Center

Outline: Tandem mass-spectrometry of peptides; detection of alternative splicing protein isoforms; phyloproteomics using top-down mass-spec.; characterization of glycoprotein microheterogeneity by mass-spectrometry.

Mass Spectrometer: Sample, Ionizer, Mass Analyzer, Detector. Ionizer: MALDI, Electro-Spray Ionization (ESI). Mass Analyzer: Time-Of-Flight (TOF), Quadrupole, Ion-Trap. Detector: Electron Multiplier (EM).

Mass Spectrum

Mass is fundamental

Sample Preparation for MS/MS: Enzymatic Digest and Fractionation

Single Stage MS

Tandem Mass Spectrometry (MS/MS): Precursor selection

Tandem Mass Spectrometry (MS/MS): Precursor selection + collision-induced dissociation (CID)

Why Tandem Mass Spectrometry? MS/MS spectra provide evidence for the amino-acid sequence of functional proteins. Key concepts: spectrum acquisition is unbiased; direct observation of amino-acid sequence; sensitive to small sequence variations.
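The sensitivity of MS/MS to small sequence variations can be illustrated with a minimal sketch that computes singly charged b- and y-ion m/z values for a peptide. The function name `fragment_ions` and the variant peptide NVAFVIDK are assumptions made for illustration; NVVFVIDK is the ITIH4 peptide that appears later in this deck. Standard monoisotopic residue masses are used.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def fragment_ions(peptide):
    """Return (b_ions, y_ions): singly charged m/z values, b1..b(n-1) and y1..y(n-1)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    total = 0.0
    for m in masses[:-1]:            # N-terminal prefixes
        total += m
        b_ions.append(total + PROTON)
    total = 0.0
    for m in reversed(masses[1:]):   # C-terminal suffixes
        total += m
        y_ions.append(total + WATER + PROTON)
    return b_ions, y_ions


# Reference peptide from the deck vs. a hypothetical V->A variant at position 3.
b_ref, y_ref = fragment_ions("NVVFVIDK")
b_var, y_var = fragment_ions("NVAFVIDK")

# b-ions that contain the substituted residue shift by the V/A mass difference.
shifted_b = [i + 1 for i, (r, v) in enumerate(zip(b_ref, b_var)) if abs(r - v) > 0.01]
print("shifted b-ions:", shifted_b)
```

Every fragment that contains the substituted residue shifts by about 28.03 Da, while fragments that do not contain it are unchanged, so the pattern of shifted and unshifted ions localizes the variation.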

Unannotated Splice Isoform: Human Jurkat leukemia cell-line. Lipid-raft extraction protocol, targeting T cells (von Haller, et al. MCP). LIME1 gene: LCK interacting transmembrane adaptor 1. LCK gene: Leukocyte-specific protein tyrosine kinase; proto-oncogene; chromosomal aberration involving LCK in leukemias. Multiple significant peptide identifications.

Unannotated Splice Isoform

Unannotated Splice Isoform

Splice Isoform Anomaly: Human erythroleukemia K562 cell-line. Depth of coverage study (Resing et al. Anal. Chem). Peptide Atlas A8_IP. SULT1A2 gene: Sulfotransferase family, cytosolic, 1A. 2 ESTs, 1 mRNA; mRNA from lung, small cell carcinoma sample. Single (significant) peptide identification: five agreeing search engines; PepArML FDR < 1%. All source engines have non-significant E-values.

Splice Isoform Anomaly

Splice Isoform Anomaly

Translation start-site correction: Halobacterium sp. NRC-1. Extreme halophilic Archaeon, insoluble membrane and soluble cytoplasmic proteins (Goo, et al. MCP). GdhA1 gene: Glutamate dehydrogenase A1. Multiple significant peptide identifications. Observed start is consistent with Glimmer 3.0 prediction(s).

Halobacterium sp. NRC-1 ORF: GdhA1. K-score E-value vs 10% FDR. Many peptides inconsistent with annotated translation start site of NP_

Translation start-site correction

What if there is no "smoking gun" peptide…

What if there is no "smoking gun" peptide…

What if there is no "smoking gun" peptide…

HER2/Neu Mouse Model of Breast Cancer (Paulovich, et al. JPR, 2007): Study of normal and tumor mammary tissue by LC-MS/MS. 1.4 million MS/MS spectra. Peptide-spectrum assignments: normal samples (N_n): 161,286 (49.7%); tumor samples (N_t): 163,068 (50.3%). 4270 proteins identified in total. 2-unique generalized protein parsimony.
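One way to compare a protein's spectral counts between the normal and tumor samples is an exact binomial test against the overall split of peptide-spectrum assignments. The slides do not state which test was used, so the sketch below is an assumption for illustration only; the helper name `binomial_two_sided_p` and the per-protein counts (5 normal, 40 tumor) are hypothetical. Only N_n and N_t come from the slide.

```python
from math import comb


def binomial_two_sided_p(k, n, p0):
    """Exact two-sided binomial p-value: total probability of all outcomes
    that are no more likely than the observed count k."""
    probs = [comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative slack guards against floating-point ties.
    return min(1.0, sum(p for p in probs if p <= observed * (1 + 1e-9)))


# Peptide-spectrum assignment totals from the slide.
N_NORMAL, N_TUMOR = 161_286, 163_068
P_NORMAL = N_NORMAL / (N_NORMAL + N_TUMOR)   # expected share of normal spectra

# Hypothetical spectral counts for one protein (not from the study).
normal_count, tumor_count = 5, 40
p_value = binomial_two_sided_p(normal_count, normal_count + tumor_count, P_NORMAL)

print(f"expected normal share: {P_NORMAL:.3f}")
print(f"two-sided p-value: {p_value:.2e}")
```

Using the overall share of normal spectra as the null proportion accounts for the slightly unequal totals in the two sample groups.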

Nascent polypeptide-associated complex subunit alpha x 10^-8

Pyruvate kinase isozymes M1/M x 10^-5

Phyloproteomics: Fragment intact proteins (top-down MS). Match the spectra to protein sequences. Place the organism phylogenetically. Works even for unknown microorganisms without any available sequences.

CID Protein Fragmentation Spectrum from Y. rohdei

CID Protein Fragmentation Spectrum from Y. rohdei: Match to Y. pestis 50S Ribosomal Protein L32

Exact match sequence…

Phylogeny: Protein vs DNA (Protein Sequence; 16S-rRNA Sequence)

What about mixtures?

Shared Small Ribosomal Proteins

Shared Small Ribosomal Proteins

Identified E. herbicola proteins: DNA-binding protein HU-alpha, m/z , z 13+, E-value 7.5e-26, Δ . Eight proteins identified with "large" |Δ|.

Identified E. herbicola proteins: DNA-binding protein HU-alpha, m/z , z 13+, E-value 1.91e-58. Use "Sequence Gazer" to find mass shift. ΔM mode can "tolerate" one shift for free!

Identified E. herbicola proteins: DNA-binding protein HU-alpha, m/z , z 13+, E-value 7.5e-26, Δ . Extract N- and C-terminus sequence supported by at least 3 b- or y-ions.

E. herbicola protein sequences

E. herbicola sequences found in other species

Phylogenetic placement of E. herbicola: Phylogram; Cladogram (phylogeny.fr – "One-Click")

Glycoprotein Microheterogeneity: Glycosylation is important, but our analytic tools are rather rudimentary. Detach glycans (PNGase-F) and analyze glycans: get glycan structures, but no association with protein or protein site. Or detach glycans (PNGase-F) and analyze peptides: get glycosylation sites, but no association with glycan structures. We analyze glycopeptides directly… Challenges all facets of glycoproteomics.

Altered N-Glycosylation in Cancer (Glycosyltransferase Expression or Glycan Analyses): Fut-VIII (α1-6 Fuc), Comunale, 2010; GnT-V (β1-6 GlcNAc), Wang, 2007; ST-VI Gal1 (α2-6 NeuAc), Hedlund, 2008; Fut-VI (α1-3 Fuc), Higai, 2008. [Figure: N-glycan attached at an N-X-S/T motif between NH3+ and COO- termini; legend: GalNAc, Sialic Acid, Gal, GlcNAc, Man.] K. Chandler

The informatics challenge: Identify glycopeptides in large-scale tandem mass-spectrometry datasets: many glycopeptide enriched fractions; many tandem mass-spectra / fraction. Good, but not great, instrumentation: QStar Elite – CID, good MS1/MS2 resolution. Strive for hypothesis-generating analysis: site-specific glycopeptide characterization; glycoform occupancy in differentiated samples.

CID Glycopeptide Spectrum

Observations: Oxonium ions (204, 366) help distinguish glycopeptides from peptides… …but do little to identify the glycopeptide. Few peptide b/y-ions to identify peptides… …but intact peptide fragments are common. If the peptide can be guessed, then… …the glycan's mass can be determined.
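The two observations above (oxonium ions flag glycopeptide spectra, and a guessed peptide fixes the glycan's mass) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the tolerance, the example peak list, and the example peptide mass are all hypothetical. The oxonium m/z values are the standard monoisotopic values for HexNAc (204) and Hex-HexNAc (366).

```python
# Standard monoisotopic oxonium ion m/z values (singly charged).
OXONIUM = {"HexNAc": 204.0867, "HexHexNAc": 366.1395}
PROTON = 1.007276


def has_oxonium(peaks, tol=0.05, min_hits=2):
    """True if at least min_hits oxonium ions are present among (m/z, intensity) peaks."""
    hits = [name for name, mz in OXONIUM.items()
            if any(abs(peak_mz - mz) <= tol for peak_mz, _ in peaks)]
    return len(hits) >= min_hits


def neutral_mass(mz, z):
    """Neutral monoisotopic mass of a precursor observed at m/z with charge z."""
    return (mz - PROTON) * z


def glycan_mass(precursor_mz, z, peptide_mass):
    """Mass left over for the glycan (sum of monosaccharide residue masses)
    once the guessed peptide's neutral mass is subtracted."""
    return neutral_mass(precursor_mz, z) - peptide_mass


# Hypothetical spectrum and peptide guess, for illustration only.
peaks = [(204.09, 100.0), (366.14, 50.0), (500.00, 10.0)]
peptide_guess_mass = 1000.0
precursor_mz = (peptide_guess_mass + 1622.5816 + 3 * PROTON) / 3  # z = 3

if has_oxonium(peaks):
    print(round(glycan_mass(precursor_mz, 3, peptide_guess_mass), 4))
```

The resulting glycan mass can then be looked up against candidate glycan compositions; 1622.5816 Da corresponds to four HexNAc plus five Hex residues.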

Haptoglobin Standard: Haptoglobin (HPT_HUMAN) peptides NLFLNHSE*NATAK, MVSHHNLTTGATLINE, VVLHPNYSQVDIGLIK. Legend: N-glycosylation motif (NXS/T); * = site of GluC cleavage. Pompach et al. Journal of Proteome Research 11.3 (2012): 1728–1740.
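Locating the N-glycosylation motif in the haptoglobin peptides above can be sketched with a regular expression. The sketch uses the conventional sequon definition, N-X-S/T with X being any residue except proline; the proline exclusion is standard but is not stated on the slide. The function name `find_sequons` is hypothetical, and the `*` cleavage marker is stripped before matching.

```python
import re

# N followed by any residue except P, then S or T. The lookahead lets
# overlapping or adjacent sequons each be reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(peptide):
    """Return 0-based positions of the Asn in each N-X-S/T motif."""
    return [m.start() for m in SEQUON.finditer(peptide)]


peptides = ["NLFLNHSE*NATAK", "MVSHHNLTTGATLINE", "VVLHPNYSQVDIGLIK"]
sites = {}
for raw in peptides:
    sequence = raw.replace("*", "")   # drop the GluC cleavage marker
    sites[sequence] = find_sequons(sequence)
    print(sequence, sites[sequence])
```

The first peptide carries two motifs (NHS and NAT), which the GluC cleavage site marked with `*` separates into distinct peptides.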

Tuning the filters… Oxonium ions: number & intensity, match tolerance. "Intact-peptide" fragments: number & intensity, match tolerance. Glycan composition: ICScore, constrain search space, match tolerance. Glycan database: constrain search space, match tolerance. Precursor ion: non-monoisotopic selection, sodium adducts, charge state. Peptide search space: semi-specific peptides, non-specific peptides, peptide MW range, variable modifications.

Tuning the filters… We estimate the number of false-positives… …so that the user can tune the search parameters.
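A common way to estimate the number of false positives at a given filter setting is a target-decoy count. The slides do not say how the estimate is made, so the sketch below is a generic illustration under that assumption, not the authors' procedure; the function name, the score scale, and the example matches are hypothetical.

```python
def estimate_fdr(matches, threshold):
    """Estimate FDR at a score threshold from (score, is_decoy) pairs:
    decoy matches passing the threshold stand in for false target matches."""
    targets = sum(1 for score, is_decoy in matches if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in matches if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


# Hypothetical scored matches: (score, is_decoy).
matches = [(10, False), (9, False), (8, True), (7, False), (6, True)]

# Sweep the threshold to see how the estimate responds to tuning.
for threshold in (9, 7, 6):
    print(threshold, round(estimate_fdr(matches, threshold), 3))
```

Sweeping a filter parameter and watching the estimate is what lets a user trade sensitivity against error rate.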

Application of Exoglycosidases to locate Fucose at ITIH4 site N LPTQNITFQTE (K. Chandler)

NVVFVIDK ITIH4 Glycopeptide (K. Chandler)

Similar Glycopeptide Spectra (mass Δ ~ +162 Da): MVSHHNLTTGATLINE ? +162 Da

Fragmented Glycopeptides (mass Δ ~ +162 Da): MVSHHNLTTGATLINE ? +162 Da MVSHHNLTTGATLINE
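A mass difference of about +162 Da matches one additional hexose residue (162.0528 Da monoisotopic). A minimal sketch of pairing precursors that differ by one hexose is shown below; the function name, the tolerance, and the example masses are hypothetical, and the slides do not describe how the pairing is actually done.

```python
HEXOSE = 162.0528  # monoisotopic mass of a hexose residue (Da)


def hexose_pairs(neutral_masses, tol=0.02):
    """Return (lighter, heavier) pairs of precursor neutral masses that
    differ by one hexose residue within tol Da."""
    ordered = sorted(neutral_masses)
    pairs = []
    for i, light in enumerate(ordered):
        for heavy in ordered[i + 1:]:
            delta = heavy - light
            if delta > HEXOSE + tol:
                break  # masses are sorted, so later ones are even farther away
            if abs(delta - HEXOSE) <= tol:
                pairs.append((light, heavy))
    return pairs


# Hypothetical precursor neutral masses (Da).
masses = [3000.00, 3162.05, 3324.10, 3500.00]
pairs = hexose_pairs(masses)
print(pairs)
```

Each pair links an identified glycopeptide to an unidentified spectrum whose glycan plausibly carries one more hexose.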

Propagating Annotations: MVS+A1G1, MVS+A2G2, VVL+A1G1, VVL+A2G2 (G. Berry)

Summary: Mass-spectrometry coupled with protein chemistry and good informatics can look beyond the obvious to the unexpected... …and there is plenty to find!

Acknowledgements: Edwards lab: Kevin Chandler, Gwenn Berry. Fenselau lab (UMD): Colin Wynne, Avantika Dhabaria. Goldman lab (GU): Kevin Chandler, Petr Pompach. NSF Graduate Fellowship (Chandler). Funding: NCI.