1 InCoB 2009, Singapore. René Hussong et al.: Highly accelerated feature detection in mass spectrometry data using modern graphics processing units. Bioinformatics 25 (2009). Junior Research Group for Protein-Protein-Interactions and Computational Proteomics, Saarland University, Saarbruecken, Germany
2 Outline
∙ Introduction & Motivation
  - The Differential Proteomics Pipeline
∙ Computational Proteomics
  - Signal Processing and Feature Detection
  - The Isotope Wavelet Transform
∙ Parallelization via GPUs
∙ Results & Discussion
3 The Differential Proteomics Pipeline. Two samples (e.g. sick vs. healthy) → Mass Spectrometer → List of differentially expressed proteins. Applications range from basic pharmaceutical research through medical diagnostics and therapy to biotechnology and engineering.
4 Principle of Biological Mass Spectrometry. Proteins → digest → Peptides → Fingerprint (intensity vs. mass). Peptides are ionized and accelerated.
5 Principle of Biological Mass Spectrometry. Figure labels: digest; Fingerprint (intensity vs. mass); annotation: mass of a single neutron.
7 (Simple) Feature Finding
Typically done by simple thresholding:
∙ Needs additional preprocessing steps, e.g.:
  - Baseline elimination (e.g. by morphological filters)
  - Noise reduction and/or smoothing
∙ (Mostly) needs resampling
∙ Needs additional postprocessing steps, e.g.:
  - Peak clustering (so-called “deconvolution”)
  - Model fitting, charge prediction
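The thresholding approach on this slide can be sketched as follows. This is a minimal illustration, not the procedure of any particular tool: the function names, the window sizes, and the choice of a morphological opening as the baseline estimate are assumptions made for the example.

```python
def erode(values, half_width):
    """Running minimum over a window of 2 * half_width + 1 samples."""
    return [min(values[max(0, i - half_width):i + half_width + 1])
            for i in range(len(values))]


def dilate(values, half_width):
    """Running maximum over a window of 2 * half_width + 1 samples."""
    return [max(values[max(0, i - half_width):i + half_width + 1])
            for i in range(len(values))]


def estimate_baseline(intensity, half_width):
    """Morphological opening (erosion, then dilation).

    Structures narrower than the window are removed, so what remains
    approximates the slowly varying baseline.
    """
    return dilate(erode(intensity, half_width), half_width)


def smooth(values, half_width):
    """Plain moving average as a stand-in for noise reduction."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - half_width):i + half_width + 1]
        out.append(sum(window) / len(window))
    return out


def threshold_peaks(mz, intensity, threshold,
                    baseline_half_width=3, smooth_half_width=1):
    """Return (m/z, height) for local maxima above the threshold."""
    baseline = estimate_baseline(intensity, baseline_half_width)
    corrected = [y - b for y, b in zip(intensity, baseline)]
    smoothed = smooth(corrected, smooth_half_width)
    peaks = []
    for i in range(1, len(smoothed) - 1):
        is_local_max = smoothed[i - 1] < smoothed[i] >= smoothed[i + 1]
        if is_local_max and smoothed[i] > threshold:
            peaks.append((mz[i], smoothed[i]))
    return peaks
```

Even this small sketch needs three tunable values (two window sizes and the threshold) and still leaves peak clustering and charge prediction to later steps.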
8 The Isotope Wavelet Transform
Convolution with a kernel function
- by construction robust against noise and baseline artifacts
- also acts as a filter for chemical noise
- simultaneously predicts the charge state
- needs no explicit resampling
- only a single parameter (threshold)
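Two of the properties listed above (no explicit resampling, robustness against a baseline) can be illustrated with a small sketch. It evaluates the transform value at each position b as the integral of s(x)·ψ(x − b), using the trapezoidal rule directly on the spectrum's own, possibly irregular, m/z grid. The kernel here is a hypothetical zero-mean stand-in (three sine periods), not the isotope wavelet itself; `SUPPORT` and all function names are assumptions.

```python
import math

SUPPORT = 3.0  # assumed kernel support in m/z units (hypothetical value)


def stand_in_kernel(t):
    """Hypothetical zero-mean kernel: three full sine periods on [0, SUPPORT]."""
    if 0.0 <= t <= SUPPORT:
        return math.sin(2.0 * math.pi * t)
    return 0.0


def transform_on_native_grid(mz, intensity, kernel, support):
    """Integrate intensity(x) * kernel(x - mz[b]) for every data point b.

    The trapezoidal rule uses the actual spacing between neighbouring
    m/z values, so the spectrum is never resampled to a uniform grid.
    """
    n = len(mz)
    result = [0.0] * n
    for b in range(n):
        acc = 0.0
        left = intensity[b] * kernel(0.0)
        i = b
        while i + 1 < n and mz[i] - mz[b] < support:
            right = intensity[i + 1] * kernel(mz[i + 1] - mz[b])
            acc += 0.5 * (left + right) * (mz[i + 1] - mz[i])
            left = right
            i += 1
        result[b] = acc
    return result
```

Because the stand-in kernel integrates to zero over its support, adding a constant offset to the spectrum leaves the transform essentially unchanged wherever the full support fits inside the data.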
9 Results – Myoglobin PMF
10–16 Parallelization via CUDA (animated figure sequence; recoverable labels: T0 … Tn, b-th data point)
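The decomposition suggested by these slides can be sketched as follows, assuming that T0 … Tn denote threads and that each thread computes the transform value for one data point b. The sketch uses a Python thread pool purely as a CPU stand-in for GPU threads; it shows why the work splits cleanly (each output value depends only on read-only input), not how the CUDA kernel is written. The kernel and all names are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor

SUPPORT = 3.0  # assumed kernel support in m/z units (hypothetical value)


def kernel(t):
    """Hypothetical stand-in kernel, not the isotope wavelet itself."""
    if 0.0 <= t <= SUPPORT:
        return math.sin(2.0 * math.pi * t)
    return 0.0


def transform_at(b, mz, intensity):
    """Work of one 'thread': the transform value at the b-th data point.

    Reads shared input only and writes nothing shared, so all b can be
    processed independently of each other.
    """
    acc = 0.0
    i = b
    while i + 1 < len(mz) and mz[i] - mz[b] < SUPPORT:
        left = intensity[i] * kernel(mz[i] - mz[b])
        right = intensity[i + 1] * kernel(mz[i + 1] - mz[b])
        acc += 0.5 * (left + right) * (mz[i + 1] - mz[i])
        i += 1
    return acc


def transform_sequential(mz, intensity):
    """Reference version: one loop over all data points."""
    return [transform_at(b, mz, intensity) for b in range(len(mz))]


def transform_parallel(mz, intensity, workers=4):
    """Same computation, with one task per data point b."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda b: transform_at(b, mz, intensity),
                             range(len(mz))))
```

In CPython the thread pool gives no real speedup for this arithmetic; the point is only that the per-point tasks share no mutable state, which is what makes a one-thread-per-data-point mapping possible.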
17 Parallelization via CUDA and TBB. Compared configurations: 1x CPU (2.3 GHz); 1x NVIDIA Tesla C870; 2x NVIDIA Tesla C870 via Intel Threading Building Blocks. Result: >200x speedup
18 Open Issues – Future Work
∙ Solutions for machine-specific ‘artifacts’, e.g.
  - Tailing effects in TOF analyzers
  - Severe mass discretization in high-resolution data
∙ Separating overlapping patterns
∙ Tests for MS^n spectra
  - Refined averagine model
GPU solutions
19 Availability: OpenMS
∙ An open source C++ library for mass spectrometry
∙ Designed for “users” as well as for “developers”
∙ TOPP - “The OpenMS proteomics pipeline”
  - suite of independent software tools
  - includes file handling / conversion
  - peak picking and feature detection
  - includes visualizer TOPPView …
20 References
∙ Hussong, R, Gregorius, B, Tholey, A, and Hildebrandt, A (2009). Highly accelerated feature detection in proteomics data sets using modern graphics processing units. Bioinformatics 25.
∙ Schulz-Trieglaff, O, Hussong, R, Gröpl, C, Leinenbach, A, Hildebrandt, A, Huber, C, and Reinert, K (2008). Computational Quantification of Peptides from LC-MS Data. Journal of Computational Biology 15 (7).
∙ Sturm, M, Bertsch, A, Gröpl, C, Hildebrandt, A, Hussong, R, Lange, E, Pfeifer, N, Schulz-Trieglaff, O, Zerck, A, Reinert, K, and Kohlbacher, O (2008). OpenMS - An open-source software framework for mass spectrometry. BMC Bioinformatics 9 (163).
∙ Hussong, R, Tholey, A, and Hildebrandt, A (2007). Efficient Analysis of Mass Spectrometry Data Using the Isotope Wavelet. In: COMPLIFE 2007: The Third International Symposium on Computational Life Science. American Institute of Physics (AIP) 940.
∙ Schulz-Trieglaff, O, Hussong, R, Gröpl, C, Hildebrandt, A, and Reinert, K (2007). A Fast and Accurate Algorithm for the Quantification of Peptides from Mass Spectrometry Data. In: Proceedings of the Eleventh Annual International Conference on Research in Computational Molecular Biology (RECOMB). Lecture Notes in Bioinformatics (LNBI) 4453.
21 The Isotope Wavelet Transform
Figure panels: kernel function for charge state 1, mass 1000 Da; kernel function for charge state 1, mass 2000 Da
Convolution with a kernel function
- by construction robust against noise and baseline artifacts
- also acts as a filter for chemical noise
- simultaneously predicts the charge state
- needs no explicit resampling
- only a single parameter (threshold)
22 The Isotope Wavelet Transform. Figure panels: MS spectrum (charge state 3); charge-1-transform; charge-2-transform; charge-3-transform
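The idea behind the three panels (one transform per candidate charge, with the correct charge responding most strongly) can be sketched under strong simplifications. The kernel below is a hypothetical stand-in with isotope spacing 1/z and no intensity envelope, and the scoring rule (largest transform value wins) is an assumption for illustration; the published method may score candidates differently.

```python
import math


def charge_kernel(t, charge, periods=3):
    """Hypothetical kernel: `periods` sine periods with spacing 1/charge."""
    if 0.0 <= t <= periods / charge:
        return math.sin(2.0 * math.pi * charge * t)
    return 0.0


def transform_for_charge(mz, intensity, charge, periods=3):
    """Transform of the whole spectrum for one candidate charge state."""
    support = periods / charge
    n = len(mz)
    result = [0.0] * n
    for b in range(n):
        acc = 0.0
        left = intensity[b] * charge_kernel(0.0, charge, periods)
        i = b
        while i + 1 < n and mz[i] - mz[b] < support:
            right = intensity[i + 1] * charge_kernel(mz[i + 1] - mz[b],
                                                     charge, periods)
            acc += 0.5 * (left + right) * (mz[i + 1] - mz[i])
            left = right
            i += 1
        result[b] = acc
    return result


def predict_charge(mz, intensity, charges=(1, 2, 3)):
    """Return (best charge, scores); score = largest transform value."""
    scores = {z: max(transform_for_charge(mz, intensity, z)) for z in charges}
    best = max(scores, key=scores.get)
    return best, scores
```

For a pattern whose peaks are 1/3 Th apart, only the charge-3 kernel adds all peaks in phase, so its score dominates; the mismatched kernels see the peaks with partly cancelling signs.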
23 The Sweep Line Idea. Figure axes: m/z [Th] vs. RT [s]. 2 additional parameters: RT_cutoff, RT_interleave
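A sweep over the retention-time axis with these two parameters might look like the sketch below. The slide does not define the parameters, so their meaning here is an assumption: `rt_cutoff` is read as the minimum number of scans in which a pattern must be seen, and `rt_interleave` as the number of consecutive scans in which it may be missing. The m/z tolerance and the data layout are hypothetical as well.

```python
def sweep_line(scans, rt_cutoff, rt_interleave, mz_tol=0.01):
    """Group per-scan candidates into features along the RT axis.

    scans: list of (rt, [candidate m/z values]) sorted by rt.
    Returns a list of dicts with keys mz, hits, rt_start, rt_end.
    """
    open_traces = []
    features = []
    for idx, (rt, candidates) in enumerate(scans):
        for mz in candidates:
            for trace in open_traces:
                # Extend an open trace if the candidate matches its m/z
                # and the trace was not already extended in this scan.
                if abs(trace["mz"] - mz) <= mz_tol and trace["last"] < idx:
                    trace["hits"] += 1
                    trace["last"] = idx
                    trace["rt_end"] = rt
                    break
            else:
                open_traces.append({"mz": mz, "hits": 1, "last": idx,
                                    "rt_start": rt, "rt_end": rt})
        still_open = []
        for trace in open_traces:
            # idx - last = number of consecutive scans without a match.
            if idx - trace["last"] > rt_interleave:
                if trace["hits"] >= rt_cutoff:
                    features.append(trace)
            else:
                still_open.append(trace)
        open_traces = still_open
    # Traces still open after the last scan are judged by the same rule.
    features.extend(t for t in open_traces if t["hits"] >= rt_cutoff)
    return features
```

With `rt_interleave=0` a single missing scan ends a trace, so raising it makes the sweep more tolerant of dropouts at the cost of possibly merging unrelated signals.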
24 Open Issues – Future Work. Figure labels: digest; Fingerprint (intensity vs. mass/charge, charge state 1); Fragment Fingerprint
25 Open Issues – Future Work ∙ Separating overlapping patterns
26 The Retention Time
27 Results – 2D noisy data
28 The Adaptive Isotope Wavelet Kernel
(kernel formula not preserved in the slide text)
- (symbol not preserved) denotes the Heaviside step function
- λ(m) is a linear function fitted to the averagine model
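Since the kernel formula itself did not survive on this slide, the following is only a sketch of how a mass-adaptive kernel with these two ingredients could look. It assumes a form consisting of a Heaviside cutoff, an oscillation with period 1/z, and a Poisson-like envelope exp(−λ)·λ^(z·t)/Γ(z·t + 1). The slope and intercept of λ(m) are illustrative placeholders, not the fitted averagine coefficients.

```python
import math


def lam(mass, slope=6e-4, intercept=0.05):
    """Linear lambda(m); slope and intercept are hypothetical placeholders."""
    return slope * mass + intercept


def envelope(t, mass, charge=1):
    """Poisson-like envelope exp(-l) * l**x / Gamma(x + 1) with x = charge * t.

    Evaluated in log space for numerical stability; valid for t >= 0.
    """
    x = charge * t
    l = lam(mass)
    return math.exp(-l + x * math.log(l) - math.lgamma(x + 1.0))


def adaptive_kernel(t, mass, charge=1):
    """Assumed kernel form: Heaviside(t) * oscillation * envelope."""
    if t < 0.0:  # Heaviside step: the kernel vanishes left of the origin
        return 0.0
    return math.sin(2.0 * math.pi * charge * t) * envelope(t, mass, charge)
```

With these placeholder coefficients, λ is about 0.65 at 1000 Da and 1.25 at 2000 Da, so the envelope is highest at the first isotope position for the lighter mass and at the second for the heavier one. This is the qualitative reason for adapting the kernel to the mass.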