9th Benelux Bioinformatics Conference, 09/12/2014.

1 9th Benelux Bioinformatics Conference, 09/12/2014

2 Pattern mining of mass spectrometry quality control data Wout Bittremieux

3 Mass spectrometry
[Figure: protein sample, protein digestion, peptide sample, peptide separation; generalized mass spectrometer with ion source, ion selector, fragmentation, fragment mass analyzer, and detector; output spectra]

4 Quality control metrics
  Derived from experimental data
  Instrument settings

  Walzer, M. et al. qcML: An exchange format for quality control metrics from mass spectrometry experiments. Molecular & Cellular Proteomics 13, 1905–1913 (2014).
  Bittremieux, W. et al. jqcML: An open-source Java API for mass spectrometry quality control data in the qcML format. Journal of Proteome Research 13, 3484–3487 (2014).
  Bittremieux, W. et al. Mass spectrometry quality control through instrument monitoring. In preparation.

5 Metrics derived from experimental data

11-16 Instrument settings

17 High dimensionality

18 Previous approaches: Univariate

19-20 Previous approaches: Multivariate

21-25 Our approach: Subspace clustering
Try to find a suitable subset of the original feature space in which (dis)similar items can be found.

Experiment   QC 1   QC 2   QC 3   QC 4
Exp 1           5      6   1000   2000
Exp 2           6      8    170    150
Exp 3           7      6    140    160
Exp 4        3000    400    160    110
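To make the idea of a "suitable subset of the feature space" concrete, here is a minimal Python sketch. It assumes the flattened example values on the slide split into four QC values per experiment (5, 6, 1000, 2000 for Exp 1, and so on), which is a reading of the transcript and not confirmed by the source. The helper name `spread` is hypothetical.

```python
# Example values from the slide, read as four QC values per experiment.
# Assumption: the flattened digits split as 5, 6, 1000, 2000 and so on.
data = {
    "Exp 1": [5, 6, 1000, 2000],
    "Exp 2": [6, 8, 170, 150],
    "Exp 3": [7, 6, 140, 160],
    "Exp 4": [3000, 400, 160, 110],
}


def spread(experiments, dims):
    """Largest value range over `dims` among the given experiments."""
    return max(
        max(data[e][d] for e in experiments) - min(data[e][d] for e in experiments)
        for d in dims
    )


first_three = ["Exp 1", "Exp 2", "Exp 3"]
last_three = ["Exp 2", "Exp 3", "Exp 4"]

# In the full space Exp 1-3 look far apart; in the subspace (QC 1, QC 2) they are close.
print(spread(first_three, [0, 1, 2, 3]))  # 1850
print(spread(first_three, [0, 1]))        # 2
# Exp 2-4 are comparatively close in the subspace (QC 3, QC 4).
print(spread(last_three, [2, 3]))         # 50
```

Under this reading, different groups of experiments are similar in different subspaces, which a single distance over all four metrics would hide.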

26 Frequent itemset mining

  Aksehirli, E. et al. Cartification: A neighborhood preserving transformation for mining high dimensional data. In 13th IEEE International Conference on Data Mining, 937–942 (2013).
  Naulaerts, S. et al. A primer to frequent itemset mining for bioinformatics. Briefings in Bioinformatics (2013).

27-30 Cartification
Transactions consist of the k nearest neighbors on a single dimension for each item.
[Figure: scatter plot of items 1 to 11, both axes ranging from 0 to 20; example transactions: 1 2 3, 1 2 3, 2 3 4]

31 Cartification
k-nearest neighbors in the first dimension (X-axis):
  1 2 3 | 1 2 3 | 2 3 4 | 3 4 5 | 3 4 5 | 5 6 7 | 7 8 9 | 7 8 9 | 8 9 10 | 9 11 | 9 10 11
k-nearest neighbors in the second dimension (Y-axis):
  1 2 3 | 1 2 3 | 1 3 5 | 3 4 5 | 3 4 5 | 4 6 8 | 7 8 9 | 7 8 9 | 7 9 | 9 10 11 | 9 10 11
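The transformation described on the Cartification slides can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name `cartify` is hypothetical, each transaction is assumed to include the item itself (as the slide's example transactions suggest), ties are broken by item id, and the coordinates are made up because the plotted values are not recoverable from the transcript.

```python
def cartify(points, k):
    """Return, per dimension, one transaction (set of item ids) per item.

    points: dict mapping item id -> tuple of coordinates.
    Each transaction holds the k items closest to the item on that single
    dimension, counting the item itself (an assumption, see lead-in).
    """
    n_dims = len(next(iter(points.values())))
    database = []
    for d in range(n_dims):
        transactions = []
        for coords in points.values():
            # Rank all items by 1-D distance; ties are broken by item id.
            ranked = sorted(
                points, key=lambda o: (abs(points[o][d] - coords[d]), o)
            )
            transactions.append(frozenset(ranked[:k]))
        database.append(transactions)
    return database


# Hypothetical coordinates, not the ones plotted on the slide.
points = {
    1: (1.0, 2.0),
    2: (2.0, 9.0),
    3: (3.0, 3.0),
    4: (10.0, 10.0),
    5: (11.0, 1.0),
}
carts = cartify(points, k=3)
for d, transactions in enumerate(carts):
    print(d, [sorted(t) for t in transactions])
```

Items that are neighbors on a dimension end up sharing transactions on that dimension, which is what lets a frequent itemset miner pick them up afterwards.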

32 Cartification
Frequent itemset mining: 4 maximal frequent itemsets with support = 4
[Figure: scatter plot of items 1 to 11, both axes ranging from 0 to 20]
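For readers unfamiliar with the term, a maximal frequent itemset is a frequent itemset with no frequent proper superset. The brute-force Python sketch below illustrates the definition on a handful of made-up transactions. It is not the miner used in the talk (the slides do not name one), the function name is hypothetical, and the enumeration is exponential in the number of items, so it is only suitable for toy inputs.

```python
from itertools import combinations


def maximal_frequent_itemsets(transactions, min_support):
    """Brute-force miner: itemsets contained in >= min_support transactions
    that have no frequent proper superset."""
    items = sorted(set().union(*transactions))
    frequent = []
    for size in range(1, len(items) + 1):
        level = [
            frozenset(c)
            for c in combinations(items, size)
            if sum(1 for t in transactions if set(c) <= t) >= min_support
        ]
        if not level:
            break  # anti-monotonicity: no larger itemset can be frequent
        frequent.extend(level)
    # Keep only itemsets that are not strictly contained in another frequent one.
    return [s for s in frequent if not any(s < other for other in frequent)]


# Made-up transactions for illustration only.
transactions = [
    frozenset({1, 2, 3}),
    frozenset({1, 2, 3}),
    frozenset({1, 2, 3}),
    frozenset({3, 4, 5}),
    frozenset({3, 4, 5}),
]
result = maximal_frequent_itemsets(transactions, min_support=2)
print(sorted(sorted(s) for s in result))  # [[1, 2, 3], [3, 4, 5]]
```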

33 CartiClus
  1. Convert the high-dimensional database to a transaction database
  2. Mine (maximal) frequent itemsets
  3. Convert the itemsets to subspace clusters
  4. Redo clustering projected on the detected subspaces (optional)
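The slides do not spell out how step 3 turns an itemset into a subspace cluster. One plausible reading, offered here only as an assumption, is that the items of a frequent itemset form the cluster and the dimensions in which that itemset is frequent form its subspace. The Python sketch below illustrates that reading; `subspace_of` and the transaction data are hypothetical.

```python
def subspace_of(itemset, carts_per_dimension, min_support):
    """Dimensions in which `itemset` occurs in at least min_support transactions.

    Assumed reading of step 3: the itemset is the cluster, the returned
    dimensions are its subspace.
    """
    return [
        d
        for d, transactions in enumerate(carts_per_dimension)
        if sum(1 for t in transactions if itemset <= t) >= min_support
    ]


# Hypothetical per-dimension transactions (one list per dimension).
carts_per_dimension = [
    [frozenset(t) for t in ({1, 2, 3}, {1, 2, 3}, {1, 2, 3}, {3, 4, 5}, {3, 4, 5})],
    [frozenset(t) for t in ({1, 3, 5}, {2, 3, 4}, {1, 3, 5}, {2, 3, 4}, {1, 3, 5})],
]

print(subspace_of(frozenset({1, 2, 3}), carts_per_dimension, min_support=2))  # [0]
print(subspace_of(frozenset({1, 3}), carts_per_dimension, min_support=2))     # [0, 1]
```

In this toy input, items 1, 2 and 3 cluster together only in the first dimension, while items 1 and 3 stay close in both.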

34 CartiClus

35 Results
  Detected subspaces
    Various quartiles of the same metric
    Related metrics: significant overlap with previous manually defined groups of co-occurring metrics
    New relationships between metrics to be validated using expert knowledge
  Detected clusters
    Highly dependent on projected subspaces
    Able to capture valid relationships between experiments

36-37 Results

38 Conclusion
  Different sources of qualitative data
    Metrics derived from experimental data
    Instrument settings
  Subspace clustering to detect patterns in high-dimensional data
    Univariate insufficient: metrics influence each other
    Multivariate insufficient: global transformation

39 Conclusion
  Cartification: Neighborhood-preserving transformation
    Finds relevant subspaces and discards noise
    Fast
  Resulting subspace clustering
    Able to identify relationships between various qualitative metrics
    Clusters experiments exhibiting similar behavior

40 Acknowledgments
  ADReM / biomina: Emin Aksehirli, Bart Cuypers, Aida Mrzic, Stefan Naulaerts, Pieter Meysman, Bart Goethals, Kris Laukens
  InSPECtor: Hanny Willems, Lennart Martens, Dirk Valkenborg
