Published by Lucinda Anderson. Modified over 8 years ago.
1
9th Benelux Bioinformatics Conference, 09/12/2014
2
Pattern mining of mass spectrometry quality control data
Wout Bittremieux
3
Mass spectrometry
[Figure: workflow diagram. Labels: protein sample, digestion, peptide sample, separation, generalized mass spectrometer (ion source, ion selector, fragmentation, fragment mass analyzer, detector), output spectra]
4
Quality control metrics
- Derived from experimental data
- Instrument settings
References:
Walzer, M. et al. qcML: An exchange format for quality control metrics from mass spectrometry experiments. Molecular & Cellular Proteomics 13, 1905–1913 (2014).
Bittremieux, W. et al. jqcML: An open-source Java API for mass spectrometry quality control data in the qcML format. Journal of Proteome Research 13, 3484–3487 (2014).
Bittremieux, W. et al. Mass spectrometry quality control through instrument monitoring. In preparation.
5
Metrics derived from experimental data
11
Instrument settings
17
High dimensionality
18
Previous approaches: Univariate
19
Previous approaches: Multivariate
21
Our approach: Subspace clustering
Try to find a suitable subset of the original feature space in which (dis)similar items can be found

Experiment   QC 1   QC 2   QC 3   QC 4
Exp 1           5      6   1000   2000
Exp 2           6      8    170    150
Exp 3           7      6    140    160
Exp 4        3000    400    160    110
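The idea on this slide can be checked numerically with a minimal sketch. It assumes the fused digits of the toy table split into four columns per experiment (5, 6, 1000, 2000 and so on); that split, the helper name `max_pairwise_distance`, and the choice of Euclidean distance are illustrative assumptions, not part of the presentation.

```python
import numpy as np

# Toy QC table; splitting the fused digits into four columns is an assumption.
experiments = ["Exp 1", "Exp 2", "Exp 3", "Exp 4"]
metrics = ["QC 1", "QC 2", "QC 3", "QC 4"]
qc = np.array([
    [5.0, 6.0, 1000.0, 2000.0],
    [6.0, 8.0, 170.0, 150.0],
    [7.0, 6.0, 140.0, 160.0],
    [3000.0, 400.0, 160.0, 110.0],
])


def max_pairwise_distance(rows, cols):
    """Largest Euclidean distance between any two of the given rows,
    measured only on the given columns (hypothetical helper)."""
    sub = qc[np.ix_(rows, cols)]
    diffs = sub[:, None, :] - sub[None, :, :]
    return float(np.sqrt((diffs ** 2).sum(axis=-1)).max())


all_cols = [0, 1, 2, 3]

# Exp 1-3: far apart in the full space, close in the subspace {QC 1, QC 2}.
full_123 = max_pairwise_distance([0, 1, 2], all_cols)
sub12 = max_pairwise_distance([0, 1, 2], [0, 1])

# Exp 2-4: far apart in the full space, close in the subspace {QC 3, QC 4}.
full_234 = max_pairwise_distance([1, 2, 3], all_cols)
sub34 = max_pairwise_distance([1, 2, 3], [2, 3])

print(f"Exp 1-3: full space {full_123:.1f}, subspace QC 1/2 {sub12:.1f}")
print(f"Exp 2-4: full space {full_234:.1f}, subspace QC 3/4 {sub34:.1f}")
```

Under these assumptions each group of experiments looks dissimilar when all four metrics are used, but tight once the distance is restricted to the right pair of metrics.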
26
Frequent itemset mining
Aksehirli, E. et al. Cartification: A neighborhood preserving transformation for mining high dimensional data. In 13th IEEE International Conference on Data Mining, 937–942 (2013).
Naulaerts, S. et al. A primer to frequent itemset mining for bioinformatics. Briefings in Bioinformatics (2013).
27
Cartification
[Figure: scatter plot of 11 items labeled 1 to 11; both axes have ticks 0, 4, 8, 12, 16, 20]
Transactions consist of the k nearest neighbors on a single dimension for each item
31
Cartification
k-nearest neighbors in the first dimension (X-axis): 1 2 3, 1 2 3, 2 3 4, 3 4 5, 3 4 5, 5 6 7, 7 8 9, 7 8 9, 8 9 10, 9 11, 9 10 11
k-nearest neighbors in the second dimension (Y-axis): 1 2 3, 1 2 3, 1 3 5, 3 4 5, 3 4 5, 4 6 8, 7 8 9, 7 8 9, 7 9, 9 10 11, 9 10 11
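A minimal sketch of the transformation described here, assuming absolute difference on a single coordinate as the distance and that each item counts as its own nearest neighbor. The function name `cartify` and the six example points are hypothetical; the coordinates of the slide's 11 points are not recoverable from the transcript.

```python
import numpy as np


def cartify(data, k):
    """For every dimension, build one transaction ("cart") per item:
    the ids (1-based) of its k nearest items along that dimension alone.
    Hypothetical helper; each item is included in its own cart."""
    n_items, n_dims = data.shape
    carts = {}
    for d in range(n_dims):
        column = data[:, d]
        carts[d] = []
        for i in range(n_items):
            dist = np.abs(column - column[i])
            # Stable sort: ties are broken by the lower item index.
            nearest = np.argsort(dist, kind="stable")[:k]
            carts[d].append(frozenset(int(j) + 1 for j in nearest))
    return carts


# Hypothetical 2-D points (rows are items 1..6, columns are X and Y).
points = np.array([
    [1.0, 2.0],
    [2.0, 1.0],
    [3.0, 9.0],
    [10.0, 3.0],
    [11.0, 10.0],
    [12.0, 11.0],
])

carts = cartify(points, k=3)
for d, name in enumerate(["X", "Y"]):
    print(name, [sorted(c) for c in carts[d]])
```

Items that are close on the X-axis share carts there even if they are far apart on the Y-axis, which is what lets the later mining step pick up structure one dimension at a time.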
32
Cartification
Frequent itemset mining: 4 maximal frequent itemsets with support = 4
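To make "maximal frequent itemset" and "support" concrete, here is a brute-force sketch. It is not the miner used in the presentation; the function names, the example transactions, and the threshold of 3 are illustrative assumptions, and the exhaustive search is only practical for very small inputs.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Map every itemset contained in at least min_support transactions
    to its support. Exhaustive search, for illustration only."""
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            support = sum(1 for t in transactions if cset <= t)
            if support >= min_support:
                frequent[cset] = support
                found = True
        if not found:
            # No frequent itemset of this size, so none of a larger size.
            break
    return frequent


def maximal(frequent):
    """Keep only the frequent itemsets with no frequent proper superset."""
    return {
        s: sup for s, sup in frequent.items()
        if not any(s < other for other in frequent)
    }


# Hypothetical carts for one dimension.
transactions = [
    frozenset({1, 2, 3}), frozenset({1, 2, 3}), frozenset({1, 2, 3}),
    frozenset({4, 5, 6}), frozenset({4, 5, 6}), frozenset({4, 5, 6}),
    frozenset({1, 2, 4}),
]

maximal_sets = maximal(frequent_itemsets(transactions, min_support=3))
for itemset, support in maximal_sets.items():
    print(sorted(itemset), "support =", support)
```

Reporting only the maximal itemsets keeps the output small: every subset of a frequent itemset is frequent as well, so the subsets carry no extra grouping information.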
33
CartiClus
1. Convert the high-dimensional database to a transaction database
2. Mine (maximal) frequent itemsets
3. Convert the itemsets to subspace clusters
4. Redo clustering projected on the detected subspaces (optional)
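Step 3 can be read in more than one way. The sketch below shows one deliberately simplified reading: an itemset that is frequent in the carts of several dimensions becomes a cluster, and those dimensions become its subspace. The function name `to_subspace_clusters`, the `min_dims` parameter, and the example input are assumptions for illustration; the published CartiClus procedure may differ in its details.

```python
def to_subspace_clusters(frequent_by_dim, min_dims=2):
    """Pair each itemset with the set of dimensions in which it is frequent,
    keeping only itemsets that recur in at least min_dims dimensions.
    Simplified, hypothetical reading of step 3."""
    dims_of = {}
    for dim, itemsets in frequent_by_dim.items():
        for itemset in itemsets:
            dims_of.setdefault(frozenset(itemset), set()).add(dim)
    return [
        (items, frozenset(dims))
        for items, dims in dims_of.items()
        if len(dims) >= min_dims
    ]


# Hypothetical output of step 2: frequent itemsets of experiment ids,
# mined separately from the carts of each QC metric.
frequent_by_dim = {
    "QC 1": [{1, 2, 3}],
    "QC 2": [{1, 2, 3}, {4, 5}],
    "QC 3": [{2, 3, 4}],
    "QC 4": [{2, 3, 4}],
}

clusters = to_subspace_clusters(frequent_by_dim)
for items, dims in clusters:
    print("experiments", sorted(items), "in subspace", sorted(dims))
```

With this input, experiments 1 to 3 form a cluster in the subspace of QC 1 and QC 2, experiments 2 to 4 form one in QC 3 and QC 4, and the itemset seen in a single metric is dropped.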
35
Results
Detected subspaces
- Various quartiles of the same metric
- Related metrics: significant overlap with previous manually defined groups of co-occurring metrics
- New relationships between metrics to be validated using expert knowledge
Detected clusters
- Highly dependent on projected subspaces
- Able to capture valid relationships between experiments
38
Conclusion
Different sources of qualitative data
- Metrics derived from experimental data
- Instrument settings
Subspace clustering to detect patterns in high-dimensional data
- Univariate insufficient: metrics influence each other
- Multivariate insufficient: global transformation
39
Conclusion
Cartification: Neighborhood-preserving transformation
- Finds relevant subspaces and discards noise
- Fast
Resulting subspace clustering
- Able to identify relationships between various qualitative metrics
- Clusters experiments exhibiting similar behavior
40
Acknowledgments
ADReM / biomina: Emin Aksehirli, Bart Cuypers, Aida Mrzic, Stefan Naulaerts, Pieter Meysman, Bart Goethals, Kris Laukens
InSPECtor: Hanny Willems, Lennart Martens, Dirk Valkenborg