9th Benelux Bioinformatics Conference, 09/12/2014
Pattern mining of mass spectrometry quality control data
Wout Bittremieux
Mass spectrometry
[Diagram labels: protein sample, protein digestion, peptide sample, peptide separation; generalized mass spectrometer with ion source, ion selector, fragmentation, fragment mass analyzer and detector; output spectra]
Quality control metrics
  Derived from experimental data
  Instrument settings

Walzer, M. et al. qcML: An exchange format for quality control metrics from mass spectrometry experiments. Molecular & Cellular Proteomics 13, 1905–1913 (2014).
Bittremieux, W. et al. jqcML: An open-source Java API for mass spectrometry quality control data in the qcML format. Journal of Proteome Research 13, 3484–3487 (2014).
Bittremieux, W. et al. Mass spectrometry quality control through instrument monitoring. In preparation.
Metrics derived from experimental data
Instrument settings
High dimensionality
Previous approaches: Univariate
Previous approaches: Multivariate
Our approach: Subspace clustering
Try to find a suitable subset of the original feature space in which (dis)similar items can be found.
[Table: one row per experiment (Exp), columns QC 1, QC 2, QC 3, QC 4]
Frequent itemset mining

Aksehirli, E. et al. Cartification: A neighborhood preserving transformation for mining high dimensional data. In 13th IEEE International Conference on Data Mining, 937–942 (2013).
Naulaerts, S. et al. A primer to frequent itemset mining for bioinformatics. Briefings in Bioinformatics (2013).
Cartification
For each item, a transaction consists of its k nearest neighbors on a single dimension.
Cartification
  k-nearest neighbors in the first dimension (X-axis)
  k-nearest neighbors in the second dimension (Y-axis)
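The transformation described above can be illustrated with a small sketch. It is not the authors' implementation: the function name `cartify`, the tie-breaking by item index, and the choice to count an item among its own neighbors are all assumptions made for this example.

```python
def cartify(points, k):
    """Build one transaction per item and per dimension.

    points: list of equal-length numeric tuples (one tuple per item).
    Returns {dimension index: [frozenset of item indices, ...]}, where the
    i-th transaction holds the k items closest to item i on that dimension.
    Assumption: an item is its own nearest neighbor (distance 0).
    """
    n_dims = len(points[0])
    carts = {}
    for dim in range(n_dims):
        values = [p[dim] for p in points]
        transactions = []
        for v in values:
            # Rank all items by 1-D distance to the current item;
            # ties are broken by item index (an arbitrary choice).
            ranked = sorted(range(len(values)),
                            key=lambda j: (abs(values[j] - v), j))
            transactions.append(frozenset(ranked[:k]))
        carts[dim] = transactions
    return carts


# Four items in two dimensions (X, Y), with k = 2.
example_points = [(1, 10), (2, 50), (3, 11), (9, 51)]
example_carts = cartify(example_points, k=2)
```

Items 0 and 2 lie far apart from items 1 and 3 on the Y-axis but not on the X-axis, so the transactions differ per dimension. This per-dimension view is what allows relationships to be found in a subset of the features.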
Cartification
Frequent itemset mining: 4 maximal frequent itemsets with support = 4
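To make the notion of a maximal frequent itemset concrete, here is a naive level-wise (Apriori-style) sketch. It is only meant for toy inputs and is not the miner used in the talk. The function names and the example transactions are hypothetical. An itemset is frequent if at least `min_support` transactions contain it, and maximal if no frequent proper superset exists.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for all frequent itemsets (naive search)."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    frequent = {}
    candidates = [frozenset([i]) for i in items]
    while candidates:
        # Count in how many transactions each candidate is fully contained.
        level = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t)
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        # Join frequent itemsets of size n into candidates of size n + 1.
        next_candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == len(a) + 1:
                next_candidates.add(union)
        candidates = list(next_candidates)
    return frequent


def maximal_itemsets(frequent):
    """Keep only itemsets that have no frequent proper superset."""
    return {s: sup for s, sup in frequent.items()
            if not any(s < other for other in frequent)}


example_transactions = [{1, 2, 3}, {1, 2, 3}, {1, 2}, {3, 4}, {3, 4}]
example_maximal = maximal_itemsets(frequent_itemsets(example_transactions, 2))
```

With a minimum support of 2, the maximal frequent itemsets in this toy database are {1, 2, 3} and {3, 4}; smaller frequent sets such as {1, 2} are subsumed by them.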
CartiClus
1. Convert the high-dimensional database to a transaction database
2. Mine (maximal) frequent itemsets
3. Convert the itemsets to subspace clusters
4. Redo clustering projected on the detected subspaces (optional)
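Step 3 is not detailed on the slide. One plausible reading, sketched here purely as an assumption, is that an itemset becomes a cluster whose subspace is the set of dimensions in which the itemset is frequent. The function names, the input layout (`carts` maps a dimension index to its list of transactions) and the example data are hypothetical.

```python
def itemset_subspace(itemset, carts, min_support):
    """Return the dimensions in which `itemset` reaches `min_support`."""
    itemset = frozenset(itemset)
    dims = []
    for dim, transactions in sorted(carts.items()):
        support = sum(1 for t in transactions if itemset <= frozenset(t))
        if support >= min_support:
            dims.append(dim)
    return dims


def to_subspace_clusters(itemsets, carts, min_support):
    """Pair each itemset with its subspace; drop itemsets frequent nowhere."""
    clusters = []
    for items in itemsets:
        dims = itemset_subspace(items, carts, min_support)
        if dims:
            clusters.append((sorted(items), dims))
    return clusters


# Hypothetical transaction databases for two dimensions and six items.
example_carts = {
    0: [{0, 1, 2}, {0, 1, 2}, {0, 1, 2}, {3, 4, 5}, {3, 4, 5}, {3, 4, 5}],
    1: [{0, 1, 5}, {0, 1, 5}, {2, 3, 4}, {2, 3, 4}, {2, 3, 4}, {0, 1, 5}],
}
example_clusters = to_subspace_clusters(
    [{0, 1}, {0, 1, 2}, {0, 3}], example_carts, min_support=3)
```

In this toy input, items 0 and 1 are neighbors in both dimensions, items 0, 1 and 2 only in dimension 0, and items 0 and 3 in neither, so the last itemset yields no cluster.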
Results
Detected subspaces
  - Various quartiles of the same metric
  - Related metrics: significant overlap with previous manually defined groups of co-occurring metrics
  - New relationships between metrics to be validated using expert knowledge
Detected clusters
  - Highly dependent on projected subspaces
  - Able to capture valid relationships between experiments
Conclusion
Different sources of qualitative data
  - Metrics derived from experimental data
  - Instrument settings
Subspace clustering to detect patterns in high-dimensional data
  - Univariate insufficient: metrics influence each other
  - Multivariate insufficient: global transformation
Conclusion
Cartification
  - Neighborhood-preserving transformation
  - Finds relevant subspaces and discards noise
  - Fast
Resulting subspace clustering
  - Able to identify relationships between various qualitative metrics
  - Clusters experiments exhibiting similar behavior
Acknowledgments
ADReM / biomina: Emin Aksehirli, Bart Cuypers, Aida Mrzic, Stefan Naulaerts, Pieter Meysman, Bart Goethals, Kris Laukens
InSPECtor: Hanny Willems, Lennart Martens, Dirk Valkenborg