Mass Spectrometry in Life Science: Technology and Data Evaluation
H. Thiele, Bruker Daltonik, Germany
MALDI-TOF Mass Spectrometry: Bridging Proteomics & Genomics
- Proteomics (functional genomics): proteome analysis, investigation of protein diversity. MALDI-TOF MS task: identification, no a priori knowledge about the analyte.
- Genomics: SNP genotyping, search for genetic variations. MALDI-TOF MS task: screening, analyte of known MW.
Mass Spectrometer for Biopolymer Research The Technology
Principle of MALDI-TOF-MS (linear flight tube)
[Figure labels: laser, sample plate, analyte molecules in matrix, acceleration grids, drift region, ion detector, vacuum system, vacuum lock; mass spectrum plotted as flight time vs. m/z]
- All ions start with $E_{kin} = \frac{1}{2} m v^2$, subject to a space/energy uncertainty.
- 20 to 200 spectra have to be added; total duration 2 to 20 seconds with a 50 (200) Hertz laser.
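The relation between flight time and m/z in a linear tube can be sketched as follows. This is a minimal illustration, assuming an idealised instrument in which every ion receives the full energy z·e·U before entering a field-free drift region; the acceleration voltage (20 kV) and drift length (1.2 m) are hypothetical values chosen only for the example, not parameters from the slides.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge in coulombs
DALTON = 1.66053906660e-27   # unified atomic mass unit in kilograms


def flight_time_us(mass_da, charge, accel_voltage_v, drift_length_m):
    """Flight time in microseconds through a field-free drift region.

    Assumes E_kin = z * e * U = 1/2 * m * v^2 (idealised, no initial
    energy spread and no time spent in the acceleration region).
    """
    kinetic_energy_j = charge * E_CHARGE * accel_voltage_v
    velocity = math.sqrt(2.0 * kinetic_energy_j / (mass_da * DALTON))
    return drift_length_m / velocity * 1e6


# Hypothetical instrument settings, for illustration only.
for mass in (1000.0, 4000.0):
    print(mass, round(flight_time_us(mass, 1, 20000.0, 1.2), 2))
```

Because the velocity scales with the inverse square root of m/z, a fourfold heavier ion of the same charge needs twice the flight time.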
High resolution TOF-MS with Reflector
[Figure labels: laser, MALDI ion source, ion reflector (0 V, + kV), ion detector; HiRes mass spectrum plotted as flight time vs. m/z]
The reflector focuses ions of the same mass but different E kin (velocity) on the detector; high resolution is obtained.
MS/MS by PSD
MS/MS = fragment ion or tandem mass spectrometry
PSD = Post Source Decay
PSD by Reflectron TOF (Scheme)
[Figure labels: electrical potential and ion energy between source and reflector; adjustment of voltages for segment 1, segment 2, segment 3, segment 4]
- Metastable decay of molecular ions: the energy is reduced according to the mass ratio.
- $E = \frac{1}{2} m v^2$ with v = const., e.g. if M+ = 1000, then m = 500 has 4 keV, m = 100 has 0.8 keV and m = 25 has 200 eV.
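The energy partition after metastable decay can be checked with a few lines. This is a sketch under the assumption that fragments keep the parent velocity, so the kinetic energy scales with the mass ratio; the parent energy of 8 keV is not stated on the slide and is inferred here from the statement that m = 500 carries 4 keV.

```python
def fragment_energy_kev(parent_mass, fragment_mass, parent_energy_kev):
    """Kinetic energy of a PSD fragment that keeps the parent velocity.

    With v constant, E = 1/2 * m * v^2 is proportional to the mass.
    """
    return parent_energy_kev * fragment_mass / parent_mass


PARENT_MASS = 1000.0        # M+ from the slide example
PARENT_ENERGY_KEV = 8.0     # assumption, inferred from "m = 500 has 4 keV"

for fragment in (500.0, 100.0, 25.0):
    energy = fragment_energy_kev(PARENT_MASS, fragment, PARENT_ENERGY_KEV)
    print(fragment, energy)
```

The wide spread of fragment energies is the reason a single reflector voltage cannot focus all fragments at once.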
TOF-MS/MS by PSD
[Figure labels: laser, MALDI ion source, parent ion selector, ion reflector (strong field, weaker field, weak field; adjustment of voltages), ion detector; daughter ion mass spectrum]
- The daughter ion spectrum can only be measured in segments, which have to be pasted together; several segments are necessary.
- Manual operation: 20 to 40 minutes; automatic operation: 5 to 10 minutes per daughter ion spectrum (100 acquisitions in each segment).
In proteomics, many proteins have to be separated and analysed fast to avoid degradation. Regarding structure information, MALDI MS/MS appears to be optimal, but PSD is much too slow! Consequence: development of a fast MALDI MS/MS instrument!
MALDI TOF/TOF with post-acceleration by potential LIFT
TOF/TOF with LIFT (Scheme)
[Figure labels: electrical potential and ion energy along source, 1. TOF, LIFT, 2. TOF, reflector; decaying ions, energy reduced, low speed]
- The potential is switched when the ions are in the LIFT.
- All fragment ions can be analyzed simultaneously; no segmenting is necessary.
- Even low mass ions have high energy, which is good for detection.
TOF-MS/MS with post-acceleration by LIFT
[Figure labels: laser, MALDI ion source, LID, collision cell (CID), parent ion selector, potential LIFT for post acceleration, parent ion suppressor, ion reflector, ion detector; daughter ion mass spectrum]
- The MS/MS spectrum of daughter ions is measured in a single acquisition; no pasting of segments; low sample consumption, high speed, high sensitivity.
- 1 to 200 spectra needed; 1 to 10 seconds only with a 20 Hertz laser.
Data Evaluation
Goal: Identification of proteins (sequence of amino acids) and protein modifications
Method:
- Fragmentation of proteins / peptides resulting in PMF / PFF spectra
- Detection (annotation) of the masses of the fragments
- Identification by database searches
Problems to be solved by Bioinformatics
- Detection of peaks with low signal/noise ratio
- Identification (mass, area, intensity) of (overlapping) isotopic patterns
- Score the results
- Detection of multiple charges (TOF spectra z = 1,2)
[Figure: detection of the protonated molecular ion [M+H]+ at isotopic resolution; labels: nominal mass, monoisotopic mass, average mass]
Isotopic pattern of peptides
[Figure: isotopic species of a peptide ion with 93 carbon atoms (elements C, H, N, O, S). The species built from 12C, 1H, 14N, 16O and 32S only is the monoisotopic peak. Singly substituted species are labelled 1.4% (H), 88.9% (C), 8.1% (N), 0.7% and 0.9%; m values not shown.]
Deisotoping: Assigning monoisotopic masses
SNAP approach:
Peak selection
- Damping of chemical noise using FFT filtering
- Baseline correction
- Noise calculation
- Peak search
Iterative search for isotopic patterns
- Analysing the largest peaks first
- Alignment of patterns using peak list heuristic and FFT deconvolution
- Nonlinear fit using asymmetric line shape
- Subtraction of analysed patterns
Reevaluation
- Fit of intensities of overlapping patterns, optional addition of ICAT masses
- Calculation of Quality Factor
SNAP: Regularized FFT Deconvolution
Uncertainty of mean peptide isotopic distribution
SNAP: Nonlinear Fit - 2
Exponentially modified Gaussians for asymmetric line shapes: [equation not reproduced]
Local optima for least-squares fit: [figure]
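For orientation, the textbook form of an exponentially modified Gaussian (a Gaussian convolved with a one-sided exponential decay) can be written down as below. This is a sketch of the standard density only; the parameterisation actually used in SNAP is not given on the slide, and the names `mu`, `sigma` and `tau` as well as the numeric values are assumptions for illustration.

```python
import math


def emg(x, mu, sigma, tau):
    """Exponentially modified Gaussian density.

    mu, sigma: centre and width of the Gaussian part.
    tau: time constant of the exponential tail (tail towards larger x).
    """
    lam = 1.0 / tau
    exponent = 0.5 * lam * (2.0 * mu + lam * sigma * sigma - 2.0 * x)
    erfc_arg = (mu + lam * sigma * sigma - x) / (math.sqrt(2.0) * sigma)
    return 0.5 * lam * math.exp(exponent) * math.erfc(erfc_arg)


# Hypothetical peak at m/z 1000 with a tail towards higher m/z.
MU, SIGMA, TAU = 1000.0, 0.05, 0.1
left = emg(MU - 0.2, MU, SIGMA, TAU)
right = emg(MU + 0.2, MU, SIGMA, TAU)
print(left < right)  # the tail side carries more intensity
```

In a nonlinear least-squares fit, a line shape like this would be scaled by an amplitude per isotopic peak; that fitting step is not shown here.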
SNAP: Quality Factor
Idea: get a value for the quality of a pattern that can be used in preference to S/N or intensity for selecting the "best" peaks.
[Figure: basic scoring (area/width, mean deviation) for all patterns is combined with the kind of spectrum/instrument by fuzzy scoring to give the quality factor]
SNAP: Use Case
From overlapping peak groups to monoisotopic masses
Wavelet Methods for Denoising Proteomics Spectra
Denoising by hard thresholding: wavelet transform, hard thresholding, inverse wavelet transform
- Scale-adaptive thresholds
- Preservation of position, shape and amplitude of major peaks
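The three-step scheme (wavelet transform, hard thresholding, inverse transform) can be sketched in a few lines. The example below uses the Haar wavelet purely because it is the simplest orthonormal wavelet; the slides do not say which wavelet family or threshold rule was used, so the wavelet, the per-level thresholds and the toy signal are all assumptions.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_forward(signal, levels):
    """Multi-level orthonormal Haar transform.

    Returns the coarsest approximation and a list of detail
    coefficient lists, finest scale first. The signal length must
    be divisible by 2**levels.
    """
    approx = list(signal)
    details = []
    for _ in range(levels):
        half = len(approx) // 2
        pairs = [(approx[2 * i], approx[2 * i + 1]) for i in range(half)]
        details.append([(a - b) / SQRT2 for a, b in pairs])
        approx = [(a + b) / SQRT2 for a, b in pairs]
    return approx, details


def haar_inverse(approx, details):
    """Invert haar_forward."""
    current = list(approx)
    for detail in reversed(details):
        rebuilt = []
        for a, d in zip(current, detail):
            rebuilt.append((a + d) / SQRT2)
            rebuilt.append((a - d) / SQRT2)
        current = rebuilt
    return current


def denoise(signal, thresholds):
    """Hard thresholding with one threshold per scale (finest first)."""
    approx, details = haar_forward(signal, len(thresholds))
    kept = [
        [c if abs(c) > t else 0.0 for c in detail]
        for detail, t in zip(details, thresholds)
    ]
    return haar_inverse(approx, kept)


# Toy spectrum: one block-shaped peak plus small alternating noise.
noisy = [0.1, -0.1, 0.1, -0.1, 10.1, 9.9, 0.1, -0.1]
print([round(v, 6) for v in denoise(noisy, [0.5, 0.5])])
```

Hard thresholding keeps large coefficients unchanged, which is why the position and amplitude of the major peak survive while the small fluctuations are removed.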
Denoising by Hard Thresholding: Further Developments
- Baseline correction
- Deconvolution of isotopic patterns
- Scale-energy parameters for enhanced clustering
Charge Deconvolution: Without Isotopic Resolution
Charge states for ESI:
- Protein Z =
- Peptide Z = 1,2,3,4
- Small molecules Z = 1
[Figure: different m/z peaks of equine apomyoglobin]
The protein MW is calculated from the m/z differences between adjacent peaks by deconvolution software (result see inset).
Related ion deconvolution: peak picking (m/z; intensity) → deconvolution (envelope; distances) → result (Z + MW)
$[M+zH]^{z+}/z \rightarrow M$
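How two adjacent peaks of a charge envelope determine z and M can be illustrated as follows. The sketch assumes that the two peaks belong to the same molecule, differ by exactly one proton, and carry only protons as charge carriers; the function names and the test mass of 16950 Da are hypothetical and not taken from the slides.

```python
PROTON = 1.007276  # proton mass in Da


def mz_of(mass, z):
    """m/z of the ion [M + zH]^z+."""
    return (mass + z * PROTON) / z


def charge_from_adjacent(mz_high, mz_low):
    """Charge z of the peak at mz_high, given its neighbour at mz_low.

    Assumes mz_high carries z protons and mz_low carries z + 1, so that
    (mz_low - PROTON) / (mz_high - mz_low) = z.
    """
    return round((mz_low - PROTON) / (mz_high - mz_low))


def neutral_mass(mz_value, z):
    """Neutral mass M from one peak and its charge."""
    return z * (mz_value - PROTON)


# Illustrative protein mass, not a measured value.
peak_15 = mz_of(16950.0, 15)
peak_16 = mz_of(16950.0, 16)
z = charge_from_adjacent(peak_15, peak_16)
print(z, round(neutral_mass(peak_15, z), 3))
```

Deconvolution software applies the same relation across the whole envelope and averages the resulting mass estimates.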
Charge Deconvolution: Isotopic Resolution
[Figure: (M+4H)4+ with d(m/z) = 0.25 u; (M+5H)5+ with d(m/z) = 0.2 u]
For isotopically resolved patterns, the charge state and the mass can be determined from a single pattern.
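The step from isotopic spacing to charge and mass can be written out as a small sketch. It assumes that neighbouring isotopic peaks differ by one 13C-for-12C substitution (about 1.00335 u) and that the charge carriers are protons; the monoisotopic m/z of 500.25 used in the example is a hypothetical value.

```python
PROTON = 1.007276           # proton mass in Da
ISOTOPE_SPACING = 1.00335   # approximate 13C - 12C mass difference in u


def charge_from_spacing(delta_mz):
    """Charge state from the m/z distance between isotopic peaks."""
    return round(ISOTOPE_SPACING / delta_mz)


def neutral_monoisotopic_mass(mono_mz, delta_mz):
    """Neutral monoisotopic mass from a single resolved pattern."""
    z = charge_from_spacing(delta_mz)
    return z * (mono_mz - PROTON)


for spacing in (0.25, 0.2):
    print(spacing, charge_from_spacing(spacing))

# Hypothetical monoisotopic peak at m/z 500.25 with 0.25 u spacing.
print(round(neutral_monoisotopic_mass(500.25, 0.25), 4))
```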
Problems to be solved by Bioinformatics: Calibration
Get more accurate data
Automatic "Smart" Calibration
- External calibration: external calibration spots
- Internal calibrants: contaminants, self digestion
- Statistical references: mass distribution of peptides
- Automatic control based on external and internal data
- High precision correction improves stability & accuracy
- Resulting accuracy <10 ppm
$\mathrm{Tof}(m/z) = c_0 + c_1 (m/z)^{1/2} + c_2 (m/z)$ + fixed high precision correction
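The polynomial part of the calibration function can be fitted and inverted as sketched below. This is a minimal illustration, assuming exactly three calibrants (which determine c0, c1 and c2 uniquely) and ignoring the fixed high precision correction, whose form is not given on the slide. The calibrant masses and coefficient values are hypothetical.

```python
import math


def solve_3x3(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small system."""
    n = len(rhs)
    m = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_calibration(mz_values, tof_values):
    """Coefficients (c0, c1, c2) of Tof = c0 + c1*sqrt(m/z) + c2*(m/z)."""
    rows = [[1.0, math.sqrt(mz), mz] for mz in mz_values]
    return solve_3x3(rows, list(tof_values))


def tof_to_mz(tof, coeffs):
    """Invert the calibration: solve the quadratic in s = sqrt(m/z).

    Uses the cancellation-free form of the positive root.
    """
    c0, c1, c2 = coeffs
    disc = c1 * c1 + 4.0 * c2 * (tof - c0)
    s = 2.0 * (tof - c0) / (c1 + math.sqrt(disc))
    return s * s


def model(mz, coeffs):
    c0, c1, c2 = coeffs
    return c0 + c1 * math.sqrt(mz) + c2 * mz


# Hypothetical coefficients and calibrant masses, for illustration only.
true_coeffs = (0.1, 0.6, 1e-5)
calibrants = [1000.0, 2000.0, 3000.0]
fitted = fit_calibration(calibrants, [model(mz, true_coeffs) for mz in calibrants])
print(round(tof_to_mz(model(1500.0, true_coeffs), fitted), 6))
```

With more than three calibrants, the same model would be fitted by least squares instead of solved exactly.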
Statistical Calibration for Proteomics (using modified Mann's clustering)
Flow chart:
1. Input: peak list and statistical reference masses; initial error dErr < 500 ppm
2. Assign masses (dM < dErr)
3. Calibrate
4. dErr := Max(50, 0.5*dErr)
5. Decision on dErr >= 50 (Yes/No): return to step 2 or stop
Resulting accuracy <20 ppm
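The loop of assigning, calibrating and tightening the tolerance can be sketched as follows. Several points are assumptions rather than the method on the slide: the calibration model is reduced to a single multiplicative scale factor, the reference masses are passed in as a plain list (the clustering that produces them is not shown), and the loop is taken to end after a pass at the 50 ppm floor, since the stop condition is not fully legible in the flow chart.

```python
def ppm_error(measured, reference):
    """Signed mass error in parts per million."""
    return (measured - reference) / reference * 1e6


def iterative_calibration(peaks, references, initial_tol_ppm=500.0, floor_ppm=50.0):
    """Return (scale, final_tolerance) after iterative recalibration.

    scale is a hypothetical one-parameter calibration: corrected = scale * raw.
    """
    scale = 1.0
    tol = initial_tol_ppm
    while True:
        # Assign each corrected peak to its nearest reference mass.
        pairs = []
        for raw in peaks:
            corrected = raw * scale
            ref = min(references, key=lambda r: abs(corrected - r))
            if abs(ppm_error(corrected, ref)) < tol:
                pairs.append((raw, ref))
        # Calibrate: least-squares fit of ref = scale * raw.
        if pairs:
            scale = sum(raw * ref for raw, ref in pairs) / sum(raw * raw for raw, _ in pairs)
        if tol <= floor_ppm:
            return scale, tol
        tol = max(floor_ppm, 0.5 * tol)


# Hypothetical reference masses; peaks carry a 300 ppm systematic error,
# plus one peak (1750.0) that matches no reference.
refs = [1000.5, 1500.75, 2001.0, 2501.2]
measured = [r * (1.0 + 300e-6) for r in refs] + [1750.0]
scale, tol = iterative_calibration(measured, refs)
print(round(ppm_error(measured[0] * scale, refs[0]), 3), tol)
```

Halving the tolerance on each pass lets the first, loose assignment absorb a large systematic error while later passes reject peaks that only matched by chance.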
Details of the Calibration Routine: Internal Multipoint Calibration, an Example
[Figure: error [ppm] vs. measured mass [Da] for each step]
- Matching with contaminants: exclusion limit 800 ppm
- 1. calibration round: calibration, reject unmatched masses; exclusion limit 150 ppm; average error: 66.7 ppm
- 2. calibration round: calibration, reject inaccurate masses; exclusion limit 40 ppm; average error: 16.3 ppm
- Final calibration: calibration, reject inaccurate masses; average error: 13.4 ppm
Iterative Generation of Internal Calibrant List
Cycle: calibration → PMF search → generation of an improved calibrant list
- PMF identification starts with a default calibrant list, which usually consists of three typical trypsin peptides.
- Usually 2 repeats are sufficient.
- Improved calibrant lists typically contain [...] masses; on average [...] of these can be found in a spectrum.
Problems to be solved by Bioinformatics: Search Engines
MS based identity search
MS Protein Identification is Probability Based
- Masses determined by MS are not unique, so identification is probability based.
- How closely does a given protein or peptide sequence match the measured masses?
- There are several strategies for a matching "score", for example:
  - Probability based MOWSE score (Mascot)
  - Bayesian probability (ProFound)
  - Cross correlation (MS-Fit)
- Problem of assigning true probabilities to a given identification
Evaluation of PMF and Search Engines
Dataset: 168 MALDI PMF spectra; the data was acquired in the environment of a typical proteome project. About 10,000 searches have been performed to establish a statistical basis.
Part 1: Comparison of the performance of the search engines using a typical set of search parameters.
Part 2: Successively changing various search parameters to test their influence. Optimisation of search parameters.
Comparison of PMF Search Engines: Score Distribution
[Figure: three histograms of % of searches vs. score. ProFound: ProFound Z score, 5% significance level marked. Mascot: Mascot score, 5% significance level marked. MS-Fit: log (MS-Fit MOWSE Score).]
Converting the Scoring Distribution to a MetaScore
[Figure: ProFound scoring distribution, % of searches vs. ProFound Z score, with regions for random matches, range of uncertainty and correct identifications; 5% significance level marked]
Idea: integration of search results from different engines could improve significance and confidence!
An effective ranking of results can be assessed by individual search score distributions.
Ranking of Search Results of different PMF algorithms by MetaScore
- Effective sorting of reported results of several search engines
- More correct proteins are on rank number one
- Elimination of false positives
- Drawback: MetaScore does not reflect true probabilities
Problems to be solved by Bioinformatics: Search Engines
Automated validation of search results
From Automation to High Throughput
[Flow chart: PMF (m/z) → Fuzzy Engine / MetaScoring (result judgement) → Identified? Yes: MTP-Viewer (result visualization). No: auto MS/MS definition by search result driven queries → list of precursor masses → MS/MS (m/z)]
Fuzzy Engine for Protein Identification from PMF spectra
Inputs: FL Probability Score, Score Ratio to unrelated Sequence, Sequence Coverage, Correlation Coefficient, Peak Quality Factor
Outputs: Identified, Identified (multiple), Uncertain (unique), Uncertain (multiple), Undefined, Bad data
Problems to be solved by Bioinformatics: Automation & High Throughput
Automated MS/MS precursor ion selection
Strategies for automated MS/MS acquisition
Acknowledgement
Jens Decker, Michael Kuhn (Bruker Daltonik)
Martin Blüggel, Daniel Chamrad
Peter Maaß, Kristian Bredies