Published by Osborn McKinney. Modified over 9 years ago.
1
DREAM 6 / FlowCAP 2 Challenge: Molecular Classification of Acute Myeloid Leukaemia
Michael Biehl, Kerstin Bunte, Petra Schneider
Johann Bernoulli Institute for Mathematics and Computer Science, University of Groningen, The Netherlands
1 Centre for Diabetes, Endocrinology & Metabolism, School of Clinical & Experimental Medicine, University of Birmingham, UK
Team Admire-LVQ: Adaptive Distance Measures In Relevance Learning Vector Quantization
3
DREAM6/FlowCAP2 challenge 2011

The DREAM project [www.the-dream-project.org]
Dialogue for Reverse Engineering Assessments and Methods
Organizers: Gustavo Stolovitzky, Robert Prill, Raquel Norel, Pablo Meyer (IBM Computational Biology Center); Julio Saez-Rodriguez (European Bioinformatics Institute, EMBL-EBI)

FlowCAP initiative [http://flowcap.flowsite.org]
Flow Cytometry: Critical Assessment of Population Identification Methods
Organizers: Ryan Brinkman (British Columbia Cancer Agency); Raphael Gottardo (Fred Hutchinson Cancer Research Center); Tim Mosmann (University of Rochester); Richard H. Scheuermann (University of Texas Southwestern Medical Center)
4
flow cytometry
- peripheral blood / bone marrow aspirate
- fluorophore-conjugated antibodies for specific proteins
preprocessing (Wade Rogers, U. of Pennsylvania)
- cell size, granularity, + 26 protein markers
- (tens of) thousands of events per marker
training set: 23 AML patients, 156 healthy donors
test set: 180 unlabeled patients
© www.the-dream-project.org
5
list of markers
1 FS Lin (~ cell size)
2 SS Log (~ granularity)
3 CD45 (protein marker)
(1-3: measured in all cells)
four diff. features
© www.the-dream-project.org
6
possible workflow:
- selection of cells, based on e.g. FS Lin, SS Log, CD45
- inspection of all markers only for selected cells, e.g. differential diagnosis (subtypes)
here: classification based on entire cell population and all markers
- target diagnosis: AML patient / healthy donor, unspecific with respect to types of AML
- consideration of frequencies / histograms only
- information about single cells disregarded
7
class-conditional mean histograms
[Figure: mean histograms for healthy donors and AML patients]
suggested set of features:
(1) mean (2) standard deviation (3) skewness (4) kurtosis (5) median (6) interquartile range
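A minimal sketch of how the six suggested features could be computed for one marker and concatenated into a patient's feature vector. The function names are hypothetical, and several conventions are assumptions not stated on the slide: the population standard deviation, the non-excess kurtosis, and the "inclusive" quartile method.

```python
import math
import statistics


def marker_features(values):
    """Six summary statistics of one marker's measured values."""
    n = len(values)
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # assumption: population standard deviation
    centered = [v - mean for v in values]
    m3 = sum(c ** 3 for c in centered) / n  # third central moment
    m4 = sum(c ** 4 for c in centered) / n  # fourth central moment
    skewness = m3 / std ** 3 if std > 0 else 0.0
    kurtosis = m4 / std ** 4 if std > 0 else 0.0  # assumption: non-excess kurtosis
    # assumption: quartiles via the "inclusive" interpolation method
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return [mean, std, skewness, kurtosis, median, q3 - q1]


def patient_feature_vector(markers):
    """Concatenate the six features of every marker into one vector."""
    vector = []
    for values in markers:
        vector.extend(marker_features(values))
    return vector
```

Called with a list of per-marker value lists, `patient_feature_vector` returns six numbers per marker, in marker order.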
9
feature vectors (186-dim.)
[Figure: mean feature vectors of healthy donors and AML patients]
10
matrix relevance LVQ
- nearest prototype classifier according to an adaptive distance measure
- simplest setting: 1 prototype per class (healthy donors / AML patients), vectors w in the 186-dim. feature space
Training:
- cost function based Generalized Matrix LVQ (GMLVQ); the cost function E compares the distances to the closest correct prototype and the closest wrong prototype
- gradient based optimization of E (prototypes and matrix Ω)
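The sketch below illustrates the ingredients named on this slide, following the GMLVQ formulation of Schneider et al. (2009) listed in the references: the adaptive distance d(x, w) = (x - w)^T Λ (x - w) with Λ = Ω^T Ω, the nearest prototype rule, and a cost that sums (d_correct - d_wrong) / (d_correct + d_wrong) over the training data. Function names are hypothetical, the identity is assumed for the outer function applied to each cost term, and the gradient updates of prototypes and Ω are omitted.

```python
def adaptive_distance(omega, x, w):
    """Squared distance |Omega (x - w)|^2 = (x - w)^T Lambda (x - w)."""
    diff = [a - b for a, b in zip(x, w)]
    projected = [sum(o * d for o, d in zip(row, diff)) for row in omega]
    return sum(p * p for p in projected)


def classify(omega, prototypes, labels, x):
    """Nearest prototype classification under the adaptive distance."""
    dists = [adaptive_distance(omega, x, w) for w in prototypes]
    return labels[dists.index(min(dists))]


def cost(omega, prototypes, labels, data, targets):
    """Sum of (d_correct - d_wrong) / (d_correct + d_wrong) over all samples.

    Each term lies in [-1, 1]; a negative term means the sample is
    classified correctly. Assumes d_correct + d_wrong > 0 for every sample.
    """
    total = 0.0
    for x, y in zip(data, targets):
        d_correct = min(adaptive_distance(omega, x, w)
                        for w, c in zip(prototypes, labels) if c == y)
        d_wrong = min(adaptive_distance(omega, x, w)
                      for w, c in zip(prototypes, labels) if c != y)
        total += (d_correct - d_wrong) / (d_correct + d_wrong)
    return total
```

With one prototype per class, `prototypes` holds two vectors and `labels` the two class names; training would lower `cost` by gradient steps on both the prototypes and `omega`.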
11
validation
- 5/6 of data for training, 1/6 for validation
- ROC, threshold-averaged over 50 random splits
[Figure: ROC curves (true positive rate vs. false positive rate) for FS Lin, SS Log, CD45, and all markers]
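One way to read the validation scheme is sketched below: draw a random 5/6 vs. 1/6 split, obtain scores on the validation part, and average the ROC points of all splits at each fixed threshold. All function names are hypothetical; training the classifier on each split is left out, and the splits here are not stratified, which is an assumption. Every validation set must contain both classes for the rates to be defined.

```python
import random


def random_split(n_samples, rng, val_fraction=1 / 6):
    """Return (training indices, validation indices) of one random split."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_val = round(n_samples * val_fraction)
    return indices[n_val:], indices[:n_val]


def roc_point(scores, labels, threshold):
    """(FPR, TPR) when samples with score >= threshold are called positive.

    labels are 1 (AML) or 0 (healthy); both classes must be present.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    positives = sum(labels)
    negatives = len(labels) - positives
    return fp / negatives, tp / positives


def threshold_averaged_roc(runs, thresholds):
    """Average the ROC points of all runs at each threshold.

    runs is a list of (scores, labels) pairs, one per validation split.
    """
    curve = []
    for t in thresholds:
        points = [roc_point(scores, labels, t) for scores, labels in runs]
        mean_fpr = sum(p[0] for p in points) / len(points)
        mean_tpr = sum(p[1] for p in points) / len(points)
        curve.append((mean_fpr, mean_tpr))
    return curve
```

For the 179 training patients, `random_split(179, random.Random(0))` yields 149 training and 30 validation indices; repeating this 50 times and passing the validation scores to `threshold_averaged_roc` gives one averaged curve.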
12
validation
- 5/6 of data for training, 1/6 for validation
- ROC, threshold-averaged over 50 random splits
- note: patient 116 consistently misclassified
[Figure: ROC curve, true positive rate vs. false positive rate]
13
validation
[Figure: training set errors and validation set errors; labeled: patient “116” (AML)]
14
visualization
[Figure: projection on the first eigenvector of Λ, with the prototypes and patient 116 labeled]
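The projection used for this visualization can be sketched as follows, assuming Λ = Ω^T Ω as in GMLVQ. The leading eigenvector is approximated here by plain power iteration to keep the example free of dependencies; the function names and the iteration count are hypothetical, and the uniform start vector assumes it is not orthogonal to the leading eigenvector.

```python
import math


def relevance_matrix(omega):
    """Lambda = Omega^T Omega (symmetric, positive semi-definite)."""
    n = len(omega[0])
    return [[sum(row[i] * row[j] for row in omega) for j in range(n)]
            for i in range(n)]


def leading_eigenvector(lam, iterations=500):
    """Approximate the eigenvector of the largest eigenvalue by power iteration."""
    n = len(lam)
    v = [1.0 / math.sqrt(n)] * n  # assumption: not orthogonal to the target
    for _ in range(iterations):
        w = [sum(lam[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def project(x, v):
    """Scalar projection of a feature vector (or prototype) onto v."""
    return sum(a * b for a, b in zip(x, v))
```

Applying `project` to every patient vector and to both prototypes gives the one-dimensional coordinates shown in such a plot. In practice a library routine such as `numpy.linalg.eigh` would be the more usual choice for the eigendecomposition.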
15
prediction: 180 test set patients
[Figure: projection of the test set on the first eigenvector of Λ, shown together with the prototypes]
16
prediction: 180 test set patients
[Figure: “AML – score” of the test set patients]
- 20 AML cases!
- perfect test set prediction, e.g. AUROC = 1 (achieved by 8 teams!)
Note: GMLVQ scores are not directly interpretable as “certainties” or probabilistic assignments
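As an illustration of why AUROC = 1 does not require calibrated scores, the area under the ROC curve can be computed from rankings alone: it is the fraction of (AML, healthy) pairs in which the AML patient receives the higher score. The sketch below uses that pairwise definition; the function name is hypothetical and ties are counted as one half by assumption.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    labels are 1 (AML) or 0 (healthy); both classes must be present.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # assumption: ties count as one half
    return wins / (len(positives) * len(negatives))
```

Any monotonic transformation of the scores leaves this value unchanged, which is consistent with the note that the scores are not probabilistic assignments.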
17
prototypes
difference vector “AML - healthy” prototype
here: components corresponding to mean values
18
relevances
relevance of markers: diagonal elements of Λ
in detail, per feature: mean, std. dev., skewness, kurtosis, median, iqr
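A sketch of how such relevances could be read off the trained matrix, assuming Λ = Ω^T Ω so that the diagonal element Λ_ii is the sum of the squared entries of column i of Ω. Two further assumptions are made that the slide does not confirm: the feature vector is ordered marker by marker with six features each, and a marker's relevance is the sum of its six diagonal elements. The function names are hypothetical.

```python
def diagonal_relevances(omega):
    """Diagonal of Lambda = Omega^T Omega: one relevance per feature."""
    n = len(omega[0])
    return [sum(row[i] ** 2 for row in omega) for i in range(n)]


def marker_relevances(diagonal, features_per_marker=6):
    """Sum the feature relevances belonging to each marker.

    Assumption: features are stored marker by marker, so consecutive
    blocks of `features_per_marker` entries belong to the same marker.
    """
    return [sum(diagonal[k:k + features_per_marker])
            for k in range(0, len(diagonal), features_per_marker)]
```

The off-diagonal elements of Λ, which weight pairs of features, are not used in this summary.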
19
relevances
relevance of markers
in detail, per feature: mean, std. dev., skewness, kurtosis, median, iqr
marker labeled in the figure: SS Log
20
scores, certainties, ranking?
“AML – score”
- 20 AML cases! perfect test set prediction, e.g. AUC = 1 (ROC)
- comparison of scores vs. ground truth (?): Pearson correlation 0.9703, sum of |differences| 3.8455
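The two comparison measures quoted on this slide can be sketched as below. The function names are hypothetical, and encoding the ground truth as 1 for AML and 0 for healthy is an assumption; the slide itself marks the ground truth comparison with a question mark.

```python
import math


def pearson_correlation(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def sum_abs_differences(a, b):
    """Sum of |a_i - b_i| over all samples."""
    return sum(abs(x - y) for x, y in zip(a, b))
```

The correlation is unchanged by shifting or rescaling the scores, whereas the sum of absolute differences is not, so the two measures can rank the same set of scores differently.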
21
scores, certainties, ranking?
“transformed AML – score”
- 20 AML cases! perfect test set prediction, e.g. AUC = 1 (ROC)
- comparison of scores vs. ground truth: Pearson correlation 0.9820, sum of |differences| 4.4347
- for comparison, “AML – score”: Pearson correlation 0.9703, sum of |differences| 3.8455
22
summary
feature vectors: moment based characteristics of flow cytometry data [mean, standard deviation, skewness, kurtosis, median, iqr]
Matrix Relevance Learning Vector Quantization
- perfect classification with respect to training and test set (e.g. AUC(ROC) = 1)
- weighting of features (pairs of features) according to their relevance in the classification
- visualization of the data set
- identification of outliers (“116”?)
23
outlook
selection of reduced feature set:
- relevance matrix results suggest a selection of protein markers and/or specific features
identification / diagnosis of AML subtypes:
- AML subtypes to be identified by specific marker profiles
- machine learning approach requires larger data sets, e.g. GMLVQ with several prototypes representing AML
- back to gating: selection of cells for differential diagnosis?
direct classification of histograms:
- non-Euclidean, histogram-specific distance measures, e.g. Divergence-based LVQ [Mwebaze et al., 2010]
24
references (www.cs.rug.nl/~biehl)
The method (GMLVQ):
P. Schneider, M. Biehl, B. Hammer, Adaptive relevance matrices in learning vector quantization, Neural Computation 21: 3532-3561 (2009)
A recent application in tumor classification:
W. Arlt, M. Biehl, A.E. Taylor et al., Urine Steroid Metabolomics as a Biomarker Tool for Detecting Malignancy in Patients with Adrenal Tumors, J Clinical Endocrinology & Metabolism, in press (2011)
25
Thanks