In-depth Analysis of Protein Amino Acid Sequence and PTMs with High-resolution Mass Spectrometry


In-depth Analysis of Protein Amino Acid Sequence and PTMs with High-resolution Mass Spectrometry
Lian Yang (2); Baozhen Shan (1); Bin Ma (2)
(1) Bioinformatics Solutions Inc., Canada
(2) University of Waterloo, Canada

Protein sequence analysis
Problem
  Complete protein sequence coverage
    o antibody confirmation
    o biomarker discovery
  Database search software alone is insufficient

Protein sequence analysis
Possible reasons for incomplete coverage
  “Non-database” peptides
    o unexpected modifications
    o mutated residues
    o novel peptides
  Database errors
Meanwhile, a large number of high-quality spectra are not matched.

Proposed workflow for in-depth analysis
A workflow to identify both the database and “non-database” peptides
Objective
  Maximize protein sequence coverage
  Explain more high-quality MS/MS spectra

Proposed workflow for in-depth analysis
Workflow
  Multiple protein digests with different enzymes
  High-accuracy MS for both precursor and fragment ions
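To illustrate why multiple digests can help coverage, here is a minimal Python sketch of in-silico digestion. The cleavage rules are simplifying assumptions (cut after K/R for trypsin, after K for LysC, after E for GluC, with no proline exception and no missed cleavages), the peptide length window is an arbitrary stand-in for what an instrument would detect, and the toy sequence is hypothetical. It is not part of the presented workflow.

```python
# Simplified cleavage rules (assumptions): cut C-terminal to the listed
# residues, ignoring the proline exception and missed cleavages.
CLEAVE_AFTER = {"Trypsin": "KR", "LysC": "K", "GluC": "E"}


def digest(protein, enzyme):
    """Return (start_index, peptide) pairs from a complete in-silico digest."""
    sites = CLEAVE_AFTER[enzyme]
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        if aa in sites:
            peptides.append((start, protein[start:i + 1]))
            start = i + 1
    if start < len(protein):
        # Trailing peptide after the last cleavage site.
        peptides.append((start, protein[start:]))
    return peptides


def coverage(protein, enzymes, min_len=6, max_len=30):
    """Fraction of residues inside at least one peptide of 'detectable' length.

    min_len and max_len are hypothetical detectability limits.
    """
    covered = [False] * len(protein)
    for enzyme in enzymes:
        for start, pep in digest(protein, enzyme):
            if min_len <= len(pep) <= max_len:
                for i in range(start, start + len(pep)):
                    covered[i] = True
    return sum(covered) / len(protein)


if __name__ == "__main__":
    toy = "AKAAAAEAAAAAAK"  # hypothetical sequence, not a real protein
    print(coverage(toy, ["Trypsin"]))
    print(coverage(toy, ["Trypsin", "GluC"]))
```

In the toy sequence, trypsin alone leaves the two N-terminal residues in a peptide too short to count, while adding the GluC digest produces a longer peptide that spans them.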

Proposed workflow for in-depth analysis
Workflow
  PEAKS: powerful software for peptide de novo sequencing by tandem mass spectrometry. Rapid Commun Mass Spectrom. 2003;17(20):
  Identify de novo sequence tags
  Reveal a set of high-quality spectra

Proposed workflow for in-depth analysis
Workflow
  PEAKS DB: De Novo sequencing assisted database search for sensitive and accurate peptide identification. Mol Cell Proteomics 2012; 11: , 1–8.
  Identify database peptides
  Database search result validated by de novo tags
  Reveal a set of confident proteins

Proposed workflow for in-depth analysis
Workflow
  PeaksPTM: Mass spectrometry-based identification of peptides with unspecified modifications. Journal of Proteome Research 10.7 (2011):
  Identify peptides with unexpected modifications
  Peptides from the set of confident proteins are “modified” in silico by trying all possible modifications in UNIMOD
  Speed-up by de novo tags
  For input spectra with
    + highly confident de novo tags
    - no significant database matches
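The idea of trying modifications in silico can be sketched as a precursor mass-shift lookup. This is a simplified illustration, not the PeaksPTM algorithm: the modification table below is a tiny hand-picked subset with standard monoisotopic mass shifts rather than the full UNIMOD list, only one modification per peptide is considered, and the function names and the 10 ppm tolerance are assumptions.

```python
# Monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565

# Small illustrative subset: (name, mass shift in Da, residues it may sit on).
MODIFICATIONS = [
    ("Deamidation", 0.984016, "NQ"),
    ("Methyl", 14.015650, "KR"),
    ("Oxidation", 15.994915, "M"),
    ("Acetyl", 42.010565, "K"),
    ("Carbamidomethyl", 57.021464, "C"),
    ("Phospho", 79.966331, "STY"),
]


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def candidate_mods(peptide, observed_mass, tol_ppm=10.0):
    """List (modification, positions) pairs that explain the mass difference.

    A modification is a candidate if its mass shift matches the difference
    between observed and unmodified mass within tol_ppm, and the peptide
    contains at least one residue that can carry it.
    """
    delta = observed_mass - peptide_mass(peptide)
    tolerance = observed_mass * tol_ppm * 1e-6
    hits = []
    for name, shift, residues in MODIFICATIONS:
        if abs(delta - shift) <= tolerance:
            positions = [i for i, aa in enumerate(peptide) if aa in residues]
            if positions:
                hits.append((name, positions))
    return hits
```

A real search would then score each candidate against the fragment ions to localize the modification; the precursor mass alone only narrows the list.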

Proposed workflow for in-depth analysis
Workflow
  SPIDER: software for protein identification from sequence tags with de novo sequencing error. J Bioinform Comput Biol Jun;3(3):
  Identify peptides with mutations, such as residue insertion, deletion, and substitution
  Screen the protein database to find short sequences similar to de novo tags
  Use both the de novo tags and the database sequence to reconstruct the most probable sequences that match the spectrum
  For input spectra with
    + highly confident de novo tags
    - no significant database matches
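The screening step, finding database regions similar to a de novo tag, can be approximated with plain approximate string matching. The sketch below uses edit distance against any substring of the protein; it is a stand-in for illustration only and does not model de novo sequencing errors (for example, equal-mass residue swaps) the way SPIDER does. The function name and example strings are hypothetical.

```python
def best_approx_match(tag, protein):
    """Return (edit_distance, end) for the protein substring closest to tag.

    'end' is the exclusive end index of that substring in protein.
    Substitutions, insertions and deletions each cost 1.
    """
    # Row 0 is all zeros so that a match may start anywhere in the protein.
    prev = [0] * (len(protein) + 1)
    for i, t in enumerate(tag, start=1):
        curr = [i] + [0] * len(protein)
        for j, p in enumerate(protein, start=1):
            curr[j] = min(
                prev[j - 1] + (t != p),  # match or substitution
                prev[j] + 1,             # tag residue absent from protein
                curr[j - 1] + 1,         # protein residue absent from tag
            )
        prev = curr
    # The best match may end anywhere; take the first minimum of the last row.
    end = min(range(len(prev)), key=prev.__getitem__)
    return prev[end], end


if __name__ == "__main__":
    # Hypothetical tag carrying one substitution relative to the sequence.
    print(best_approx_match("ALVTLLK", "KQTALVELLKHK"))
```

Running this over every protein in a database and keeping the low-distance hits gives the candidate regions that a later step would reconcile with the spectrum.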

Proposed workflow for in-depth analysis
Workflow
  Unassigned de novo sequence tags are reported as possible novel peptides

Proposed workflow for in-depth analysis
Result integration

In-depth analysis of BSA
Test the workflow with the standard bovine serum albumin
Sample
  Pure ALBU_BOVIN from SIGMA
  3 digests with Trypsin, LysC, GluC
  LC-MS/MS with Thermo LTQ-Orbitrap XL
Workflow
  Workflow implemented in PEAKS 6
  3 digests in one project
  Searched database: Swiss-Prot
[Figure: Trypsin, LysC, and GluC digests -> LC-MS/MS -> Workflow]

Result
More PSMs are identified in each additional step:
  5,152 MS/MS spectra
  1,737 PSMs
  906 PSMs
  44 PSMs
  38 MS/MS spectra (PEAKS ALC score > 70%)
Filtered at 1% FDR: 1,737 -> 2,687 PSMs
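The slide does not say how the 1% FDR was estimated. One common approach is target-decoy counting, sketched below under that assumption; the function name, the input format, and the assumption that all scores are distinct are illustrative choices, not details from the presentation.

```python
def score_threshold(psms, max_fdr=0.01):
    """Return the lowest score cutoff whose estimated FDR is within max_fdr.

    psms: iterable of (score, is_decoy) pairs with distinct scores.
    FDR at a cutoff is estimated as decoys / targets among PSMs scoring
    at or above it. Returns None if no cutoff qualifies.
    """
    targets = decoys = 0
    best = None
    # Walk from the best score downward, updating running counts.
    for score, is_decoy in sorted(psms, key=lambda p: p[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            best = score
    return best


if __name__ == "__main__":
    toy = [(10, False), (9, False), (8, True), (7, False)]  # hypothetical PSMs
    print(score_threshold(toy, max_fdr=0.1))
```

PSMs scoring at or above the returned cutoff would be kept; everything below it is discarded before counting identifications.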

Result
BSA coverage
The uncovered 4% is in the protein N-terminal region, which is most likely cleaved off and not in the purchased sample [1].
[1] specific binding site (Asp-Thr-His-Lys) for Cu(II) ions. T. Peters Jr., F.A. Blumenstock. J. Biol. Chem., 242 (1967), p. 1574

Result
Contaminants
Identified with at least 3 unique peptides.
  – Human keratin proteins (K2C1_HUMAN and K1C_HUMAN)
  – Bacterial protein (SSPA_STAAR)
  – Trypsin (TRY1_BOVIN)

Result
PTMs
Unsuspected modifications identified by PTM search
  – Three PTMs specified in database search
      » Carbamidomethylation (C)
      » Oxidation (M)
      » Deamidation (NQ)

Result
Mutation
214th amino acid: A -> T (Brown 1975, Fed. Proc. 34:591)

Result
Unexplained de novo tags
Might be…
  – Novel peptides outside of the searched database

KK.QTALVELLK.HK
     |||||||
   DPALVELLKK

Summary
A software workflow proposed for in-depth protein sequence analysis
Found many things in a “pure” sample
  – Contaminants
  – Unsuspected PTMs
  – Mutations
Improved protein sequence coverage
  – BSA coverage: 87% -> 96%
Explained more high-quality MS/MS spectra
  – Identified MS/MS spectra: 1,737 -> 2,687

Q / A