Published by Shanon Gallagher. Modified over 6 years ago.
1
Proteomics, February 15, 2017
Dr. ir. Perry Moerland
Bioinformatics Laboratory, Academic Medical Center
Graduate School ‘Bioinformatics’
2
Mass spectrometry: MALDI-TOF
MALDI-TOF MS: Matrix-Assisted Laser Desorption Ionization Time-Of-Flight Mass Spectrometry (ion source + mass analyzer; figure from Lodish et al.). Needs only femtomoles of peptide material. The ionized peptides are accelerated, and their time of flight yields a mass spectrum (intensity versus mass).
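The time-of-flight principle fits in a few lines of Python. This is a simplified model, assuming a linear instrument with a hypothetical 20 kV accelerating voltage and a 1 m field-free drift tube, ions starting at rest and no reflectron; real instruments are calibrated empirically. Each ion gains kinetic energy zeU = ½mv², so the flight time t = L·√(m / (2zeU)) grows with the square root of m/z.

```python
import math

DALTON_KG = 1.66053906660e-27        # 1 Da in kg
ELEMENTARY_CHARGE = 1.602176634e-19  # C


def flight_time(mass_da: float, charge: int = 1,
                voltage: float = 20_000.0, drift_length: float = 1.0) -> float:
    """Time (s) for an ion to cross the drift tube after acceleration.

    Simplified model: the ion starts at rest and gains kinetic energy z*e*U,
    so z*e*U = 1/2 * m * v**2 and t = L / v.
    """
    m = mass_da * DALTON_KG
    v = math.sqrt(2 * charge * ELEMENTARY_CHARGE * voltage / m)
    return drift_length / v


for mass in (1000.0, 2000.0, 4000.0):
    print(f"{mass:7.1f} Da -> {flight_time(mass) * 1e6:.2f} us")
```

A four-fold higher m/z only doubles the flight time, and two ions with the same m/z (for example 2000 Da at 2+ and 1000 Da at 1+) arrive together.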
3
MS ID: peptide mass fingerprint
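Wet lab: protein from gel -> digest -> peptides -> measured spectrum. In silico: protein in database (> 59,000,000 entries) -> digest -> theoretical spectrum. The measured and theoretical spectra are compared and the match is scored (figure from Graves & Haystead; K = lysine, R = arginine, the residues after which trypsin cleaves). We will come back later in this lecture to how the stick representation of a spectrum is obtained.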
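The in silico half of the workflow can be sketched as below, assuming trypsin as the protease (cleavage after K or R, except before P), monoisotopic masses with unmodified cysteine, singly protonated peptides and a hypothetical 50 ppm tolerance. The protein sequence and peak list are toy data, and the score is simply the number of matched peaks, which the following slides show is not enough on its own.

```python
import re

# Monoisotopic residue masses in Da (cysteine unmodified).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added once per peptide (N-terminal H + C-terminal OH)
PROTON = 1.007276   # added for the singly protonated [M+H]+ ion


def tryptic_digest(sequence: str) -> list[str]:
    """Cleave C-terminally of K or R, but not when the next residue is P."""
    return [pep for pep in re.split(r"(?<=[KR])(?!P)", sequence) if pep]


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic peptide mass."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(observed_mh: list[float], protein: str, tol_ppm: float = 50.0) -> int:
    """Number of observed [M+H]+ peaks explained by a theoretical tryptic peptide."""
    theoretical = [peptide_mass(p) + PROTON for p in tryptic_digest(protein)]
    return sum(
        any(abs(obs - theo) / theo * 1e6 <= tol_ppm for theo in theoretical)
        for obs in observed_mh
    )


protein = "MKWVTFISLLLLFSSAYSRGVFRRDTHKSEIAHR"  # toy sequence
print(tryptic_digest(protein))
observed = [478.2772, 700.0]  # hypothetical peaks: GVFR [M+H]+ and one unexplained peak
print(count_matches(observed, protein))
```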
4
Identification: peptide mass fingerprint
Why a fingerprint and not the intact protein mass? The intact mass is affected by pre-sequences (e.g. a signal peptide), post-translational modifications, alternative splicing, errors in databases (1 in … bases, i.e. amino acid residues) and measurement error (a 10 ppm error on 100 kDa = 1 Da).
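The measurement-error figure follows from converting a relative error in parts per million (ppm) into an absolute mass error; the 1,500 Da tryptic peptide in the last line is a hypothetical comparison at the same relative accuracy.

```latex
\begin{aligned}
\Delta m &= m \times \text{ppm} \times 10^{-6} \\
\Delta m_{\text{protein}} &= 100\,000\ \text{Da} \times 10 \times 10^{-6} = 1\ \text{Da} \\
\Delta m_{\text{peptide}} &= 1\,500\ \text{Da} \times 10 \times 10^{-6} = 0.015\ \text{Da}
\end{aligned}
```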
5
Intact mass and protein identity
The intact protein mass is affected by variable/unknown modifications (e.g. Pi, Ac, GlcNAc or an unknown X), intron/exon boundaries, N-terminal and C-terminal processing, sequencing errors and polymorphisms, and known/fixed modifications. After digestion, only a few peptides are modified; most peptides will have the predicted mass.
6
Identification: how to score?
Simply counting the number of matches is not enough. The limited mass accuracy of MS can lead to both false positives and false negatives.
7
Identification: how to score?
Simply counting the number of matches is not enough. False positives arise because MS has only limited mass accuracy, and there is a bias towards ‘heavy’ proteins … (a large protein yields many peptides and therefore collects random matches more easily). False negatives arise because peptides can be post-translationally modified and because digestion is sometimes not perfect (missed cleavages). The limited mass accuracy of MS can lead to both false positives and false negatives.
8
Importance of mass accuracy
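Monoisotopic masses (Da): 1H 1.007825; 12C 12 (exactly, by definition); 14N 14.003074; 16O 15.994915; 31P 30.973762; 32S 31.972071.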
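A small worked example of why accuracy matters, using these isotope masses: the lysine (C6H12N2O) and glutamine (C5H8N2O2) residues share the nominal mass 128 Da but differ by about 0.036 Da. The 1,000 Da peptide mass is a hypothetical value used to express that difference in ppm.

```python
# Monoisotopic isotope masses in Da (12C defines the scale).
ISOTOPE_MASS = {"H": 1.00782503, "C": 12.0, "N": 14.0030740,
                "O": 15.9949146, "P": 30.9737620, "S": 31.9720707}


def monoisotopic_mass(composition: dict[str, int]) -> float:
    """Sum of isotope masses for an elemental composition such as {'C': 6, 'H': 12}."""
    return sum(ISOTOPE_MASS[element] * count for element, count in composition.items())


lys = monoisotopic_mass({"C": 6, "H": 12, "N": 2, "O": 1})  # lysine residue
gln = monoisotopic_mass({"C": 5, "H": 8, "N": 2, "O": 2})   # glutamine residue
delta = lys - gln
peptide = 1000.0  # hypothetical peptide mass in Da
print(f"K - Q = {delta:.4f} Da = {delta / peptide * 1e6:.1f} ppm at {peptide:.0f} Da")
```

At roughly 36 ppm, the difference can be resolved with an accuracy of a few ppm but not with an accuracy of 100 ppm.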
9
Peptide mass fingerprinting: Mascot
Trade-off between false positives (specificity) and false negatives (sensitivity)
10
Peptide mass fingerprinting: missed cleavages
Allowing missed cleavages: higher sensitivity, lower specificity
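Missed cleavages can be handled in silico by also considering peptides that span one or more uncleaved sites. A minimal sketch, assuming trypsin rules (cleave after K/R, not before P) and a toy sequence: each extra allowed missed cleavage adds candidate peptides, which explains more observed peaks (sensitivity) but also gives random peaks more chances to match (specificity).

```python
import re


def digest_with_missed_cleavages(sequence: str, max_missed: int = 1) -> list[str]:
    """Tryptic peptides allowing up to `max_missed` internal uncleaved K/R sites."""
    # Fully cleaved fragments: split after K/R unless followed by P.
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", sequence) if f]
    peptides = []
    for start in range(len(fragments)):
        # Join up to max_missed + 1 consecutive fragments.
        for end in range(start, min(start + max_missed + 1, len(fragments))):
            peptides.append("".join(fragments[start:end + 1]))
    return peptides


toy = "AKGRCK"  # toy sequence
print(digest_with_missed_cleavages(toy, max_missed=0))
print(digest_with_missed_cleavages(toy, max_missed=1))
```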
11
Scoring scheme: MASCOT
Score and Expect: what do they mean?
12
Scoring scheme: MOWSE/MASCOT (I)
Accumulate statistics on the size distribution of peptide masses as a function of protein mass (figure: # of matching proteins).
For protein A: Mowse score = 50 / (N · H), where 50 is the mass of an average protein (50 kDa), H is the molecular weight of protein A (in kDa), and N is the product of the match scores, each match score M (0 ≤ M ≤ 1) being weighted inversely proportionally to peptide mass.
Note: this implicitly incorporates the number of matches.
Warning: many scoring schemes (and there are tens of them) are very ad hoc and prone to false positives.
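The formula can be written out directly. A sketch, assuming the match scores M are already available; in MOWSE they are derived from the accumulated peptide-mass statistics, which are not modelled here, and the values below are hypothetical. Because every M is at most 1, each additional match shrinks N and raises the score, and dividing by H penalizes heavy proteins that match many peptides by chance.

```python
import math


def mowse_score(match_scores: list[float], protein_mass_kda: float,
                average_mass_kda: float = 50.0) -> float:
    """Mowse score = 50 / (N * H), with N the product of the match scores.

    Each match score M must satisfy 0 < M <= 1 for the score to be finite.
    """
    n = math.prod(match_scores)  # N: product of the match scores
    return average_mass_kda / (n * protein_mass_kda)


matches = [0.01, 0.02, 0.05]  # hypothetical match scores
print(mowse_score(matches, protein_mass_kda=50.0))
print(mowse_score(matches, protein_mass_kda=150.0))  # same matches, heavier protein
```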
13
Scoring scheme: MOWSE/MASCOT (II)
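Probabilistic: the Mowse score is transformed into a probability P that the match is a random event, reported as the score S = -10·log10(P).
Correct for the size of the database: for a significance level of 1/20 (p < 0.05) in a database of 5,000 entries, P = 1 / (20 × 5000) = 10^-5, so the threshold score is S = -10·log10(P) = 50.
Expect (E-value): the number of times you could expect to get this score or better by chance. E-value ≥ 1: the match is no better than random.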
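The arithmetic on this slide can be reproduced as follows. This is a sketch, assuming the simplified relation E = N × P between the expect value, the number of database entries N and the chance probability P; Mascot's own implementation may differ in detail. With 5,000 entries and a significance level of 1/20, the threshold score is 50, and a match scoring exactly at the threshold has an expect value of 0.05.

```python
import math


def score_from_probability(p: float) -> float:
    """Score S = -10 * log10(P)."""
    return -10.0 * math.log10(p)


def significance_threshold(n_entries: int, alpha: float = 0.05) -> float:
    """Score needed for P < alpha after correcting for the number of database entries."""
    return score_from_probability(alpha / n_entries)


def expect_value(score: float, n_entries: int) -> float:
    """Expected number of random matches with this score or better (simplified: E = N * P)."""
    return n_entries * 10 ** (-score / 10.0)


print(significance_threshold(5000))  # P = 1 / (20 x 5000)
print(expect_value(50.0, 5000))
```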
14
LC-MS/MS in 30 sec.
Figure from Peng and Gygi: LC chromatogram -> MS mass spectrum -> fragmentation.
15
MS/MS fragmentation
Breaks a selected peptide into smaller parts; the fragmentation spectrum identifies individual amino acids.
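The fragment ion series can be predicted from a peptide sequence. A sketch, assuming singly charged ions, monoisotopic residue masses and a toy peptide; the slide does not name the ion types, so the b (N-terminal) and y (C-terminal) series, which dominate low-energy CID spectra, are assumed here.

```python
# Monoisotopic residue masses in Da (cysteine unmodified).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def fragment_ladders(peptide: str) -> tuple[list[float], list[float]]:
    """Singly charged b ions (b1..b(n-1)) and y ions (y1..y(n-1)) of a peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, len(masses))]
    y_ions = [sum(masses[-i:]) + WATER + PROTON for i in range(1, len(masses))]
    return b_ions, y_ions


b, y = fragment_ladders("SEIAHR")  # toy peptide
# Consecutive b ions differ by exactly one residue mass: this is how the
# fragmentation spectrum "reads out" individual amino acids.
print([round(b2 - b1, 4) for b1, b2 in zip(b, b[1:])])
```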
16
MS/MS peptide identification
Three strategies: sequence searching, sequence tag and de novo.
17
MS/MS ID: sequence searching
Workflow: database (dBase) -> shortlist of candidate peptides (e.g. tubulin) -> predict MS/MS spectra -> compare with the measured spectrum -> match. The computer (e.g. Mascot) generates a theoretical fragmentation spectrum for each candidate. Some peptides show a very dominant fragmentation pattern from a few daughter ions (there are good chemical reasons for this behaviour), so no contiguous sequence can be established with a high degree of confidence. In these cases, searching with the raw MS/MS spectrum is much better suited to identifying the parent peptide. Raw MS/MS searching will be the future of fragmentation-based database searching.
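Comparing a measured MS/MS spectrum with the predicted spectra of shortlisted candidates can be sketched with the simplest possible score, the number of shared peaks. This is an illustration, not Mascot's probability-based scoring; it assumes singly charged b and y ions, a hypothetical 0.02 Da fragment tolerance, a toy candidate list and a simulated spectrum.

```python
# Monoisotopic residue masses in Da (cysteine unmodified).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def theoretical_fragments(peptide: str) -> list[float]:
    """Predicted singly charged b and y ion masses."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, len(masses))]
    y_ions = [sum(masses[-i:]) + WATER + PROTON for i in range(1, len(masses))]
    return b_ions + y_ions


def shared_peak_count(observed: list[float], peptide: str, tol_da: float = 0.02) -> int:
    """Number of observed peaks that match a predicted fragment within tol_da."""
    predicted = theoretical_fragments(peptide)
    return sum(any(abs(o - t) <= tol_da for t in predicted) for o in observed)


def rank_candidates(observed: list[float], candidates: list[str], tol_da: float = 0.02) -> list[str]:
    """Candidates sorted from best to worst shared peak count."""
    return sorted(candidates, key=lambda p: shared_peak_count(observed, p, tol_da), reverse=True)


observed = theoretical_fragments("SEIAHR") + [250.0, 612.3]  # simulated spectrum + noise
candidates = ["GVFR", "SEIAHR", "DTHK"]  # toy shortlist
print(rank_candidates(observed, candidates))
```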
18
MS/MS ID: sequence TAG
Format: 409, T1A2G ,13 (mass1, internal sequence, mass2). The precursor mass and the internal sequence tag are compared against the database (dBase) to obtain a shortlist; the fragments of the shortlisted peptides are then compared to find a match. A stretch of contiguous high peaks is easily found, and the peak-to-peak distance corresponds to an amino-acid mass. Low-energy CID produces predominantly y″ ions. The C- and N-terminal masses correspond to a combination of amino acids (all permutations). For a 2+ precursor charge state, fragment masses above the parent m/z ensure that these are products of a multiply (not singly) charged molecule.
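Finding a sequence tag amounts to looking for a chain of peaks whose consecutive differences equal amino-acid residue masses. A brute-force sketch, assuming a hypothetical 0.02 Da tolerance and a small simulated peak list (the search is exponential in the worst case, so real tools are smarter). Leucine and isoleucine have identical masses and are reported as L. The function returns the tag together with its flanking peak masses, mirroring the mass1, internal sequence, mass2 format.

```python
# Monoisotopic residue masses in Da (cysteine unmodified).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
# Map mass -> one-letter code; I is dropped because it is isobaric with L.
AA_BY_MASS = {mass: aa for aa, mass in RESIDUE_MASS.items() if aa != "I"}


def extend_tag(peaks: list[float], i: int, tol: float) -> tuple[str, int]:
    """Longest residue chain starting at peaks[i]; returns (tag, index of last peak)."""
    best = ("", i)
    for j in range(i + 1, len(peaks)):
        diff = peaks[j] - peaks[i]
        for mass, aa in AA_BY_MASS.items():
            if abs(diff - mass) <= tol:
                tag, end = extend_tag(peaks, j, tol)
                if len(tag) + 1 > len(best[0]):
                    best = (aa + tag, end)
    return best


def longest_tag(peaks: list[float], tol: float = 0.02) -> tuple[float, str, float]:
    """(peak before tag, tag, peak after tag) for the longest tag in the peak list."""
    peaks = sorted(peaks)
    result = (0.0, "", 0.0)
    for i in range(len(peaks)):
        tag, end = extend_tag(peaks, i, tol)
        if len(tag) > len(result[1]):
            result = (peaks[i], tag, peaks[end])
    return result


# Simulated spectrum: b1-b5 ions of SEIAHR plus two noise peaks.
peaks = [88.0393, 217.0819, 330.1660, 401.2031, 538.2620, 300.0, 450.5]
print(longest_tag(peaks))
```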
19
MS/MS ID: de novo (I)
De novo sequencing is needed for peptides/proteins that are not yet in a database or were not identified by fingerprint techniques, e.g. from unsequenced species, or modified peptides. It requires algorithms that extract long stretches of peptide sequence, and high-resolution, very good quality spectra.
20
MS/MS ID: de novo (II)
Problem: MS/MS spectra display a mixture of N- and C-terminal fragment ion series (a, b, c and x, y, z, respectively).
Solution: reduce the complexity of the MS/MS spectra. Digest proteins with a metalloendopeptidase with Lys-N specificity (cleavage N-terminally of lysine) and fragment with electron-transfer dissociation (ETD); the MS/MS spectrum is then dominated by c-ions and has few gaps. In a gold-standard dataset, 42% of the peptides identified via database search can be identified de novo (Van Breukelen et al., Proteomics (2010)).
21
MS/MS ID: validation (I)
Many MS/MS search engines are threshold-based. Probabilistic approaches instead compute the probability that a match is correct, e.g. Mascot and PeptideProphet (Nesvizhskii et al., Nature Methods (2007)).
22
MS/MS ID: validation (II)
Many MS/MS search engines are threshold-based. Target-decoy searching: the database is augmented with reversed or randomized versions of its sequences, and matches are filtered using score cutoffs that correspond to a chosen false discovery rate (FDR) (Nesvizhskii et al., Nature Methods (2007)).
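Target-decoy filtering can be sketched as follows, assuming a concatenated target + decoy search, reversed protein sequences as decoys and the simple estimate FDR = decoys / targets above a score cutoff (other estimators exist); the scores below are hypothetical. Choosing the lowest cutoff whose estimated FDR stays within the limit is equivalent to filtering on q-values.

```python
from typing import Optional


def reversed_decoys(proteins: dict[str, str]) -> dict[str, str]:
    """Decoy database: every target sequence reversed, with a tagged accession."""
    return {"DECOY_" + acc: seq[::-1] for acc, seq in proteins.items()}


def estimated_fdr(psms: list[tuple[float, bool]], cutoff: float) -> float:
    """psms: (score, is_decoy) pairs from a target + decoy search."""
    targets = sum(1 for score, is_decoy in psms if score >= cutoff and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= cutoff and is_decoy)
    return decoys / targets if targets else float("inf")


def cutoff_for_fdr(psms: list[tuple[float, bool]], max_fdr: float = 0.01) -> Optional[float]:
    """Lowest score cutoff whose estimated FDR does not exceed max_fdr."""
    for cutoff in sorted({score for score, _ in psms}):
        if estimated_fdr(psms, cutoff) <= max_fdr:
            return cutoff
    return None


# Hypothetical peptide-spectrum matches: (score, is_decoy).
psms = [(60, False), (55, False), (50, False), (45, True),
        (40, False), (35, True), (30, False), (25, True)]
print(reversed_decoys({"P1": "MKWV"}))
print(cutoff_for_fdr(psms, max_fdr=0.2))
```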
23
Cellular changes in T-cell expression patterns upon infection with HIV
Analysis of human T cells upon HIV-1 infection: 1921 protein spots detected with 2D-DIGE; 288 spots differentially expressed at peak infection; 93 unique proteins identified via peptide mass fingerprinting; 188 unidentified spots. Goal: find candidate proteins for the unidentified spots in silico (Nandal et al., BMC Bioinformatics 16:25 (2015)).
24
Recapitulation: 2D gel electrophoresis
Horizontal axis: separation by isoelectric point (pI). Vertical axis: separation by molecular weight (Mw); lower = lighter. 2D-DIGE: quantification of changes in protein expression using fluorescent labeling.
25
In-depth mining of 2D-DIGE data
26
Step 1: Calculation of pI and Mw
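Theoretical Mw and pI of a sequence can be computed as follows. A sketch, assuming average residue masses and one illustrative set of pKa values; tools such as ExPASy Compute pI/Mw use a different pKa set, so exact pI values will differ. The pI is found by bisection as the pH at which the Henderson-Hasselbalch net charge is zero.

```python
# Average residue masses in Da.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_AVERAGE = 18.01528

# Illustrative pKa values (assumption; different tools use different sets).
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def average_mass(seq: str) -> float:
    """Average (Mw) mass of the peptide/protein in Da."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER_AVERAGE


def net_charge(seq: str, ph: float) -> float:
    """Net charge at a given pH from the Henderson-Hasselbalch equation."""
    counts = {group: seq.count(group) for group in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisection on pH: the net charge decreases monotonically with pH."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid  # still positive: the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2


seq = "SEIAHR"  # toy sequence
print(f"Mw = {average_mass(seq):.1f} Da, pI = {isoelectric_point(seq):.2f}")
```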
27
Step 2: Fitting calibration curves
Cubic smoothing splines
28
Step 3: Generation of candidate list
The properties and pI/Mw ranges are sent to TagIdent, which returns a list of UniProt identifiers of proteins that are close to the predicted pI/Mw. Isoforms are disregarded, as STRING does not handle isoforms.
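The pI/Mw window query can be mimicked locally to show what such a candidate list looks like. This sketch does not call TagIdent; it filters a hypothetical in-memory protein table with made-up accessions, assuming an absolute pI window and a relative (percentage) Mw window.

```python
from dataclasses import dataclass


@dataclass
class Protein:
    accession: str  # hypothetical identifier
    pi: float       # theoretical isoelectric point
    mw_da: float    # theoretical molecular weight in Da


def candidate_list(proteins: list[Protein], pi_pred: float, mw_pred: float,
                   pi_range: float = 0.5, mw_range_pct: float = 10.0) -> list[str]:
    """Accessions within +/- pi_range of pi_pred and +/- mw_range_pct % of mw_pred."""
    mw_low = mw_pred * (1 - mw_range_pct / 100)
    mw_high = mw_pred * (1 + mw_range_pct / 100)
    return [p.accession for p in proteins
            if abs(p.pi - pi_pred) <= pi_range and mw_low <= p.mw_da <= mw_high]


table = [
    Protein("PROT_A", 5.1, 42_000),
    Protein("PROT_B", 5.4, 47_000),
    Protein("PROT_C", 6.3, 43_000),
    Protein("PROT_D", 5.0, 60_000),
]
print(candidate_list(table, pi_pred=5.2, mw_pred=44_000))
```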
29
Step 4: Prioritization of candidates
STRING
30
STRING: functional protein association network
STRING is a database of known and predicted protein interactions derived from different sources. Interactions are visualized as graphs; different line colors represent the types of evidence for the association. Version 9.1 covers >5,200,000 proteins from 1,133 organisms; STRING 10 covers 9,643,763 proteins from 2,031 organisms.
31
STRING network: 2D-DIGE identified proteins
Visualisation of the STRING network of the proteins identified by 2D-DIGE. Associations are represented by lines; different line colours code for different evidence categories. New proteins have to fit into this network as well as possible. 377 observed interactions compared to 66.2 expected interactions (P << ).
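One way to turn "new proteins have to fit into this network" into a ranking is to score each candidate by its associations with the proteins already identified. A sketch, assuming STRING-style edges given as (protein, protein, combined score) tuples with hypothetical identifiers and scores; this is an illustration, not necessarily the scoring used by Nandal et al.

```python
from collections import defaultdict


def prioritize(candidates: list[str], identified: list[str],
               edges: list[tuple[str, str, float]]) -> list[tuple[str, float]]:
    """Rank candidates by summed association score to already-identified proteins."""
    known = set(identified)
    support: dict[str, float] = defaultdict(float)
    for a, b, score in edges:
        # Only count edges that connect a candidate to the identified network.
        if a in known and b not in known:
            support[b] += score
        elif b in known and a not in known:
            support[a] += score
    return sorted(((c, support[c]) for c in candidates), key=lambda cs: cs[1], reverse=True)


identified = ["P1", "P2"]        # hypothetical proteins identified by fingerprinting
candidates = ["C1", "C2", "C3"]  # hypothetical candidates for one unidentified spot
edges = [("C1", "P1", 400), ("C2", "P1", 900), ("P2", "C2", 700), ("C3", "C1", 999)]
print(prioritize(candidates, identified, edges))
```

The edge between C3 and C1 is ignored because neither protein is part of the identified network.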
32
Step 5: gene expression-based filtering
Gene expression ~ protein expression?
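The filtering step can be sketched as dropping candidates whose gene is not expressed in the cell type studied. A minimal sketch, assuming a hypothetical dictionary of T-cell expression values and a hypothetical cutoff. Because mRNA level is only a proxy for protein level (the question posed on this slide), the filter keeps the existing ranking rather than re-ranking by expression.

```python
def filter_by_expression(ranked_candidates: list[str], expression: dict[str, float],
                         min_expression: float = 1.0) -> list[str]:
    """Keep candidates with expression >= min_expression, preserving their order.

    Candidates missing from `expression` are treated as not expressed.
    """
    return [c for c in ranked_candidates if expression.get(c, 0.0) >= min_expression]


ranked = ["C2", "C1", "C3", "C4"]                       # hypothetical ranked candidates
t_cell_expression = {"C1": 12.5, "C2": 0.2, "C3": 3.1}  # hypothetical expression values
print(filter_by_expression(ranked, t_cell_expression))
```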
33
Properties of candidate lists
Post-translational modifications; hydrophobic proteins
34
Results
TPR: true-positive rate. With optimal settings for the pI range and Mw range, ~44% of the correct proteins are in the top 5.
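The evaluation can be expressed as a top-k true-positive rate: the fraction of spots with a known identity for which that protein appears among the first k candidates. A sketch with hypothetical spot and protein identifiers; the slide's ~44% is not reproduced here.

```python
def top_k_tpr(ranked: dict[str, list[str]], truth: dict[str, str], k: int = 5) -> float:
    """Fraction of spots whose true protein is among the top-k ranked candidates."""
    hits = sum(1 for spot, protein in truth.items() if protein in ranked.get(spot, [])[:k])
    return hits / len(truth)


ranked = {  # hypothetical candidate rankings per spot
    "spot1": ["P1", "P2", "P3"],
    "spot2": ["P4", "P5", "P6", "P7", "P8", "P9"],
    "spot3": [],
}
truth = {"spot1": "P2", "spot2": "P9", "spot3": "P10"}  # hypothetical true identities
print(top_k_tpr(ranked, truth, k=5))
```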
35
Results: gene expression-based filtering
36
Further pointers
Catalogues: http://www.humanproteomemap.org/
Galaxy: Boekel et al., Nature Biotechnology 33(2), 139 (2015)