Protein sequencing and Mass Spectrometry. Sample Preparation Enzymatic Digestion (Trypsin) + Fractionation.


Protein sequencing and Mass Spectrometry

Sample Preparation Enzymatic Digestion (Trypsin) + Fractionation

Single Stage MS
Mass Spectrometry
LC-MS: 1 MS spectrum / second

Tandem MS
Secondary Fragmentation
Ionized parent peptide

The peptide backbone
H...-HN-CH(R_{i-1})-CO-NH-CH(R_i)-CO-NH-CH(R_{i+1})-CO-…OH
N-terminus on the left, C-terminus on the right; the side chains R_{i-1}, R_i, R_{i+1} belong to AA residues i-1, i, i+1.
The peptide backbone breaks to form fragments with characteristic masses.

Ionization
H...-HN-CH(R_{i-1})-CO-NH-CH(R_i)-CO-NH-CH(R_{i+1})-CO-…OH
N-terminus on the left, C-terminus on the right; the side chains R_{i-1}, R_i, R_{i+1} belong to AA residues i-1, i, i+1.
The peptide backbone breaks to form fragments with characteristic masses.
Ionized parent peptide: carries H+

Fragment ion generation
H...-HN-CH(R_{i-1})-CO   NH-CH(R_i)-CO-NH-CH(R_{i+1})-CO-…OH
(the gap marks the broken CO-NH bond)
N-terminus on the left, C-terminus on the right; the side chains R_{i-1}, R_i, R_{i+1} belong to AA residues i-1, i, i+1.
The peptide backbone breaks to form fragments with characteristic masses.
Ionized peptide fragment: carries H+

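Tandem MS for Peptide ID
[Figure: MS/MS spectrum, % Intensity vs m/z, with b-ion and y-ion ladders over the residue labels K, L, E, D, E, E, L, F, G, S between the mass labels 147 and 1166, and the [M+2H]2+ peak.]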

Peak Assignment
[Figure: MS/MS spectrum, % Intensity vs m/z, over the residue labels K, L, E, D, E, E, L, F, G, S between the mass labels 147 and 1166, with assigned peaks y2 to y9, b3 to b9, and [M+2H]2+.]
Peak assignment implies Sequence (Residue tag) Reconstruction!

Database Searching for peptide ID
For every peptide from a database
– Generate a hypothetical spectrum
– Compute a correlation between the hypothetical and experimental spectra
– Choose the best
Database searching is very powerful and is the de facto standard for MS.
– Sequest, Mascot, and many others
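The search loop above can be sketched in a few lines. This is a minimal illustration, not how Sequest or Mascot score: it assumes nominal integer residue masses, singly charged b- and y-ions only, and it swaps the correlation for a plain shared-peak count. The names `hypothetical_spectrum`, `shared_peak_count` and `best_match` are made up for this sketch.

```python
# Nominal (integer) residue masses: an assumption made for readability.
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "K": 128, "Q": 128,
    "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}

def hypothetical_spectrum(peptide):
    """Singly charged b- and y-ion peaks (b = prefix + 1, y = suffix + 19)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    total = sum(masses)
    peaks = set()
    prefix = 0
    for m in masses[:-1]:          # every internal cleavage site
        prefix += m
        peaks.add(prefix + 1)               # b-ion
        peaks.add(total - prefix + 19)      # y-ion
    return peaks

def shared_peak_count(observed, peptide, tolerance=0):
    """Stand-in score: observed peaks that match some hypothetical peak."""
    theo = hypothetical_spectrum(peptide)
    return sum(1 for p in observed if any(abs(p - t) <= tolerance for t in theo))

def best_match(observed, database):
    """Choose the database peptide with the highest score."""
    return max(database, key=lambda pep: shared_peak_count(observed, pep))

observed = [88, 145, 147, 274, 333]          # toy spectrum, no noise
print(best_match(observed, ["KEGS", "GASP", "SGEK"]))
```

A real search engine would also restrict candidates by parent mass and use a tolerance suited to the instrument.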

Spectra: the real story
Noise Peaks
Ions, not prefixes & suffixes
Mass to charge ratio, and not mass
– Multiply charged ions
Isotope patterns, not single peaks

Peptide fragmentation possibilities (ion types)
[Figure: cleavage sites along the backbone -HN-CH(R_i)-CO-NH-CH(CH-R'R'')-CO-NH-. Labels: a_i, b_i, c_i, x_{n-i}, y_{n-i}, z_{n-i}, y_{n-i-1}, b_{i+1}, d_{i+1}, v_{n-i}, w_{n-i}. Legend: low energy fragments, high energy fragments.]

Ion types, and offsets
P = prefix residue mass
S = suffix residue mass
b-ions = P+1
y-ions = S+19
a-ions = P-27
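As a worked example of these offsets, the sketch below lists the b-, a- and y-ion positions for every cleavage of a short peptide. It assumes nominal integer residue masses and singly charged ions; the function names and the toy peptide SGEK are illustrative only.

```python
# Offsets as given on the slide (nominal masses, charge 1).
def b_ion(prefix_mass):
    return prefix_mass + 1

def y_ion(suffix_mass):
    return suffix_mass + 19

def a_ion(prefix_mass):
    return prefix_mass - 27

# Nominal residue masses for the toy peptide only (assumption).
RESIDUE_MASS = {"S": 87, "G": 57, "E": 129, "K": 128}

def ladder(peptide):
    """Return (cleavage index, b, a, y) for each internal cleavage site."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    total = sum(masses)
    rows = []
    prefix = 0
    for i, m in enumerate(masses[:-1], start=1):
        prefix += m                       # P for this cleavage
        suffix = total - prefix           # S for the same cleavage
        rows.append((i, b_ion(prefix), a_ion(prefix), y_ion(suffix)))
    return rows

for row in ladder("SGEK"):
    print(row)
```

Note that a lone lysine suffix gives y = 128 + 19 = 147, which is consistent with the 147 label next to K in the spectrum figure.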

Mass-Charge ratio
The X-axis is (M+Z)/Z
– Z=1 implies that the peak is at M+1
– Z=2 implies that the peak is at (M+2)/2
   M=1000, Z=2, peak position is at 501
– Suppose you see a peak at 501. Is the mass 500, or is it 1000?
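The ambiguity in the last question can be made concrete. The sketch below uses the slide's convention that each charge adds a mass of 1; `peak_position` and `candidate_masses` are hypothetical helper names.

```python
def peak_position(mass, charge):
    """Position on the m/z axis: (M + Z) / Z."""
    return (mass + charge) / charge

def candidate_masses(peak, max_charge=3):
    """Invert (M + Z) / Z for each charge state up to max_charge."""
    return {z: peak * z - z for z in range(1, max_charge + 1)}

print(peak_position(1000, 2))        # 501.0
print(candidate_masses(501, 2))      # {1: 500, 2: 1000}
```

A single peak therefore does not determine the mass; in practice the charge is usually inferred from other evidence such as the spacing of the isotope pattern.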

Spectral Graph
Each prefix residue mass (PRM) corresponds to a node.
Two nodes are connected by an edge if the mass difference is a residue mass.
A path in the graph is a de novo interpretation of the spectrum.

Spectral Graph
Each peak, when assigned to a prefix/suffix ion type, generates a unique prefix residue mass.
Spectral graph:
– Each node u defines a putative prefix residue mass M(u).
– (u,v) in E if M(v)-M(u) is the residue mass of an a.a. (tag) or 0.
– Paths in the spectral graph correspond to an interpretation.
[Figure: example path labelled S, G, E, K.]
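A small sketch of the spectral graph follows. It assumes nominal integer masses, a PRM list that already contains 0 and the total residue mass, and it merges duplicate PRMs instead of linking them with 0-weight edges. The mass-to-letter table and the function names are illustrative; L/I and K/Q share a nominal mass, so only one letter of each is shown.

```python
# Nominal residue mass -> one-letter code (assumption: I and Q omitted).
MASS_TO_RESIDUE = {
    57: "G", 71: "A", 87: "S", 97: "P", 99: "V", 101: "T", 103: "C",
    113: "L", 114: "N", 115: "D", 128: "K", 129: "E", 131: "M",
    137: "H", 147: "F", 156: "R", 163: "Y", 186: "W",
}

def spectral_graph(prms):
    """Nodes are sorted PRMs; (u, v, aa) is an edge if v - u is a residue mass."""
    nodes = sorted(set(prms))
    edges = []
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            aa = MASS_TO_RESIDUE.get(v - u)
            if aa is not None:
                edges.append((u, v, aa))
    return nodes, edges

def interpretations(edges, start, end):
    """Enumerate every path start -> end and spell out its residues."""
    adjacent = {}
    for u, v, aa in edges:
        adjacent.setdefault(u, []).append((v, aa))

    def walk(u, seq):
        if u == end:
            yield seq
            return
        for v, aa in adjacent.get(u, []):
            yield from walk(v, seq + aa)

    return list(walk(start, ""))

# True PRMs of SGEK (0, 87, 144, 273, 401) plus one noise node at 200.
nodes, edges = spectral_graph([0, 87, 144, 200, 273, 401])
print(interpretations(edges, 0, 401))
```

On this input two paths survive, SGEK and SWK, because G + E and W have the same nominal mass (186). The noise node at 200 gains an edge from 87 but leads nowhere.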

Re-defining de novo interpretation
Find a subset of nodes in the spectral graph s.t.
– 0, M are included
– Each peak contributes at most one node (interpretation) (*)
– Each adjacent pair (when sorted by mass) is connected by an edge (valid residue mass)
– An appropriate objective function (ex: the number of peaks interpreted) is maximized

Two problems
Too many nodes.
– Only a small fraction correspond to b/y ions (leading to true PRMs) (learning problem)
– Even if the b/y ions were correctly predicted, each peak generates multiple possibilities, only one of which is correct. We need to find a path that uses each peak only once (algorithmic problem).
– In general, the forbidden pairs problem is NP-hard

However...
The b,y ions have a special non-interleaving property
Consider pairs (b1,y1), (b2,y2)
– If b1 < b2, then y1 > y2
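One way to see why this holds: for a single cleavage, b = P + 1 and y = S + 19, and P + S is the total residue mass M, so b + y = M + 20 for every pair. A larger b therefore forces a smaller y. The sketch below checks this on a toy peptide; it assumes nominal masses, and the function names are illustrative.

```python
def by_pairs(residue_masses):
    """(b, y) for every internal cleavage of a peptide given as a mass list."""
    total = sum(residue_masses)
    pairs = []
    prefix = 0
    for m in residue_masses[:-1]:
        prefix += m
        pairs.append((prefix + 1, total - prefix + 19))
    return pairs

def is_non_interleaving(pairs):
    """True if sorting by b in increasing order puts y in decreasing order."""
    ordered = sorted(pairs)
    return all(y1 > y2 for (_, y1), (_, y2) in zip(ordered, ordered[1:]))

pairs = by_pairs([87, 57, 129, 128])      # S, G, E, K
print(pairs)                              # [(88, 333), (145, 276), (274, 147)]
print({b + y for b, y in pairs})          # {421}, i.e. M + 20 with M = 401
print(is_non_interleaving(pairs))         # True
```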

Non-Intersecting Forbidden pairs
If we consider only b,y ions, 'forbidden' node pairs are non-intersecting.
The de novo problem can then be solved efficiently using a dynamic programming technique.

The forbidden pairs method
There may be many paths that avoid forbidden pairs.
We choose a path that maximizes an objective function,
– Ex: the number of peaks interpreted

The forbidden pairs method
Sort the PRMs according to increasing mass values.
For each node u, f(u) represents its forbidden pair (the other node generated by the same peak).
Let m(u) denote the mass value of the PRM u.
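The bookkeeping for m(u) and f(u) might look like the sketch below. It assumes each peak is read either as a b-ion (PRM = p - 1) or as a y-ion (PRM = M - (p - 19)), with nominal masses and a known total residue mass M. Representing nodes by their position in the sorted list, and the name `prm_nodes`, are choices made for this sketch.

```python
def prm_nodes(peaks, total_residue_mass):
    """Return (m, f): sorted PRM masses and the forbidden-partner index map."""
    candidates = []
    for peak_id, p in enumerate(peaks):
        candidates.append((p - 1, peak_id))                          # read as b-ion
        candidates.append((total_residue_mass - (p - 19), peak_id))  # read as y-ion
    candidates.sort()                    # increasing mass values

    positions = {}
    for pos, (_, peak_id) in enumerate(candidates):
        positions.setdefault(peak_id, []).append(pos)

    f = {}
    for first, second in positions.values():
        f[first] = second                # the two readings of one peak
        f[second] = first                # exclude each other
    m = [mass for mass, _ in candidates]
    return m, f

m, f = prm_nodes([88, 145, 147], total_residue_mass=401)
print(m)     # [87, 144, 146, 273, 275, 332]
print(f)     # partners: 0 <-> 5, 1 <-> 4, 2 <-> 3
```

The partner indices mirror each other around the middle of the sorted list, which is the nesting that the dynamic program relies on. A fuller version would also drop candidates that fall outside the range 0 to M.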

D.P. for forbidden pairs
Consider all pairs u,v with u in the lower half of the mass range and v in the upper half (m[u] up to M/2, m[v] from M/2 up to M)
Define S(u,v) as the best score of a forbidden pair path from 0->u, v->M
Is it sufficient to compute S(u,v) for all u,v?

D.P. for forbidden pairs
Note that the best interpretation is given by the maximum of S(u,v) over all (u,v) in E

D.P. for forbidden pairs
Note that we have one of two cases.
1. Either u < f(v) (and f(u) > v)
2. Or, u > f(v) (and f(u) < v)
Case 1.
– Extend u, do not touch f(v)

The complete algorithm
for all u /* increasing mass values from 0 to M/2 */
    for all v /* decreasing mass values from M to M/2 */
        if (u > f[v])
            ...
        else if (u < f[v])
            ...
        if (u,v) ∈ E
            /* maxI is the score of the best interpretation */
            maxI = max {maxI, S[u,v]}
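For illustration, here is one possible runnable formulation of a dynamic program over S[u,v]. The recurrence is a reconstruction for this sketch and is not taken from the slides. It assumes nominal integer masses, strictly nested forbidden pairs supplied as (low, high) tuples with every low below every high, and a score equal to the number of PRM nodes used. The name `best_interpretation_score` is hypothetical.

```python
def best_interpretation_score(pairs, total_mass, residue_masses):
    """Max number of PRM nodes on a path 0 -> total_mass, one node per pair at most."""
    pairs = sorted(pairs)                        # low ascending, so high descending
    lo = [0] + [p[0] for p in pairs]             # index 0 is the start node 0
    hi = [total_mass] + [p[1] for p in pairs]    # index 0 is the end node M
    k = len(lo)

    def edge(a, b):
        return (b - a) in residue_masses

    NEG = float("-inf")
    # S[i][j]: best score of a prefix path 0 -> lo[i] together with a
    # suffix path hi[j] -> M that never uses both nodes of one pair.
    S = [[NEG] * k for _ in range(k)]
    S[0][0] = 0
    best = NEG
    for i in range(k):
        for j in range(k):
            if i > j:
                # lo[i] is the innermost node, so extend the prefix side.
                prev = [S[w][j] for w in range(i) if edge(lo[w], lo[i])]
                S[i][j] = max(prev, default=NEG) + 1
            elif j > i:
                # hi[j] is the innermost node, so extend the suffix side.
                prev = [S[i][w] for w in range(j) if edge(hi[j], hi[w])]
                S[i][j] = max(prev, default=NEG) + 1
            # i == j > 0 would use both nodes of a pair and stays at -inf.
            if S[i][j] > NEG and edge(lo[i], hi[j]):
                best = max(best, S[i][j])        # join the two halves
    return best

NOMINAL = {57, 71, 87, 97, 99, 101, 103, 113, 114, 115,
           128, 129, 131, 137, 147, 156, 163, 186}
pairs = [(87, 332), (144, 275), (146, 273)]      # PRM pairs for peptide SGEK
print(best_interpretation_score(pairs, 401, NOMINAL))
```

On this input the best path is 0, 87, 144, 273, 401 (S, G, E, K), which uses three PRM nodes, one from each pair. The loops as written take cubic time in the number of pairs; the sketch returns only the score and omits the back-pointers needed to recover the path.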