Michael Biehl Kerstin Bunte Petra Schneider DREAM 6 / FlowCAP 2 Challenge: Molecular Classification of Acute Myeloid Leukaemia Johann Bernoulli Institute.

Presentation transcript:

1 Michael Biehl, Kerstin Bunte, Petra Schneider. DREAM 6 / FlowCAP 2 Challenge: Molecular Classification of Acute Myeloid Leukaemia. Johann Bernoulli Institute for Mathematics and Computer Science, University of Groningen, The Netherlands. Centre for Diabetes, Endocrinology & Metabolism, School of Clinical & Experimental Medicine, University of Birmingham, UK. Team Admire-LVQ: Adaptive Distance Measures In Relevance Learning Vector Quantization.

3 DREAM6/FlowCAP2 challenge 2011. The DREAM project: Dialogue for Reverse Engineering Assessments and Methods. FlowCAP initiative: Flow Cytometry: Critical Assessment of Population Identification Methods. Organizers: Ryan Brinkman, British Columbia Cancer Agency; Raphael Gottardo, Fred Hutchinson Cancer Research Center; Tim Mosmann, University of Rochester; Richard H. Scheuermann, University of Texas Southwestern Medical Center. Organizers: Gustavo Stolovitzky, Robert Prill, Raquel Norel, Pablo Meyer, IBM Computational Biology Center; Julio Saez-Rodriguez, European Bioinformatics Institute (EMBL-EBI).

4 flow cytometry, preprocessing: cell size, granularity, + 26 protein markers; (tens of) thousands of events per marker. Training set: 23 AML patients, 156 healthy donors; test set: 180 unlabeled patients. Wade Rogers, U. of Pennsylvania. Peripheral blood / bone marrow aspirate; fluorophore-conjugated antibodies for specific proteins.

5 list of markers: (1) FS Lin (~ cell size), (2) SS Log (~ granularity), (3) CD45 (protein marker), measured in all cells; four diff. features.

6 possible workflow: selection of cells, based on e.g. FS Lin, SS Log, CD45; inspection of all markers only for selected cells, e.g. for differential diagnosis (subtypes). Here: classification based on the entire cell population and all markers; target diagnosis: AML patient / healthy donor, unspecific with respect to types of AML; consideration of frequencies / histograms only; information about single cells disregarded.

7 class-conditional mean histograms (healthy donors, AML patients). Suggested set of features: (1) mean, (2) standard deviation, (3) skewness, (4) kurtosis, (5) median, (6) interquartile range.

8 class-conditional mean histograms (healthy donors, AML patients). Suggested set of features: (1) mean, (2) standard deviation, (3) skewness, (4) kurtosis, (5) median, (6) interquartile range.

9 feature vectors (186-dim.): healthy donors (mean), AML patients (mean).
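
The six suggested features can be sketched in a few lines of Python. This is a minimal illustration, not the team's code: the function names `moment_features` and `patient_feature_vector` are hypothetical, and the population (1/n) estimators and the "inclusive" quartile rule are assumptions, since the slides do not state which conventions were used. Note that 186 = 6 × 31, which is consistent with six statistics per measured channel.

```python
import statistics

def moment_features(values):
    """Return [mean, std, skewness, kurtosis, median, iqr] for one marker.

    Assumption: population (1/n) moments; the slides do not specify this.
    """
    n = len(values)
    mean = statistics.fmean(values)
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    if std == 0.0:
        skew = kurt = 0.0  # degenerate case: all events identical
    else:
        skew = sum(((v - mean) / std) ** 3 for v in values) / n
        kurt = sum(((v - mean) / std) ** 4 for v in values) / n
    median = statistics.median(values)
    # Assumption: quartiles by the "inclusive" interpolation rule.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return [mean, std, skew, kurt, median, q3 - q1]

def patient_feature_vector(marker_events):
    """Concatenate the six features over all markers of one patient.

    marker_events: dict mapping a marker name to its list of event values.
    Markers are visited in sorted order so every patient gets the same layout.
    """
    features = []
    for marker in sorted(marker_events):
        features.extend(moment_features(marker_events[marker]))
    return features
```

With two markers the resulting vector has 12 components; the real data would yield the 186 components mentioned on the slide.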

10 matrix relevance LVQ. Simplest setting: 1 prototype per class (healthy donors / AML patients), vectors w in the 186-dim. feature space; nearest prototype classifier according to an adaptive distance measure. Training: cost function based Generalized Matrix LVQ (GMLVQ); gradient based optimization of E (prototypes and matrix Ω). Figure labels: correct prototype, wrong prototype.
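
A minimal sketch of the nearest prototype rule follows. It assumes the standard GMLVQ distance from Schneider et al. (2009), d(w, x) = (x − w)ᵀ Λ (x − w) with Λ = ΩᵀΩ, and the relative distance difference μ = (d_correct − d_wrong) / (d_correct + d_wrong) that enters the cost function E. The function names are hypothetical, and the gradient-based training of the prototypes and Ω is not shown.

```python
def adaptive_distance(x, w, omega):
    """d(w, x) = (x - w)^T Omega^T Omega (x - w) = |Omega (x - w)|^2."""
    diff = [xi - wi for xi, wi in zip(x, w)]
    projected = [sum(o * d for o, d in zip(row, diff)) for row in omega]
    return sum(p * p for p in projected)

def classify(x, prototypes, omega):
    """Return the label of the nearest prototype under the adaptive distance.

    prototypes: dict mapping a class label to its prototype vector.
    """
    return min(prototypes,
               key=lambda label: adaptive_distance(x, prototypes[label], omega))

def relative_distance_difference(x, label, prototypes, omega):
    """mu in [-1, 1]; negative values mean x is classified correctly.

    Undefined (division by zero) if x coincides with both prototypes.
    """
    d_correct = adaptive_distance(x, prototypes[label], omega)
    d_wrong = min(adaptive_distance(x, w, omega)
                  for other, w in prototypes.items() if other != label)
    return (d_correct - d_wrong) / (d_correct + d_wrong)
```

For example, with prototypes `{"healthy": [0, 0], "AML": [2, 2]}` the point `[1.6, 0.0]` is assigned to "healthy" under the Euclidean metric (Ω = identity), but to "AML" when Ω = [[1, 0], [0, 0]] ignores the second feature. This is the sense in which the learned matrix adapts the distance to the task.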

11 validation: 5/6 of data for training, 1/6 for validation; ROC, threshold-averaged over 50 random splits. Plot: true positive rate vs. false positive rate; curves for FS Lin, SS Log, CD45, all markers.

12 validation: 5/6 of data for training, 1/6 for validation; ROC, threshold-averaged over 50 random splits. Note: patient 116 consistently misclassified. Plot: true positive rate vs. false positive rate.
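
The validation scheme can be sketched as below, under stated assumptions: `split_indices` and `roc_auc` are hypothetical helper names, the split is not stratified, and the area under the ROC curve is computed by pairwise comparison of scores rather than by the threshold averaging of ROC curves used on the slides.

```python
import random

def split_indices(n, validation_fraction=1 / 6, rng=None):
    """Randomly split range(n) into (training, validation) index lists."""
    if rng is None:
        rng = random.Random(0)  # fixed seed only to make the sketch repeatable
    indices = list(range(n))
    rng.shuffle(indices)
    n_val = max(1, round(n * validation_fraction))
    return indices[n_val:], indices[:n_val]

def roc_auc(scores, labels):
    """Area under the ROC curve for binary labels (1 = AML, 0 = healthy).

    Equals the fraction of (positive, negative) pairs in which the positive
    example has the higher score; ties count one half.
    Requires at least one example of each class.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

For the 179 labeled patients (23 AML, 156 healthy) one split gives 149 training and 30 validation examples. Because only 23 patients are AML cases, an unstratified split can occasionally contain no AML case in the validation part; such a split would have to be redrawn before calling `roc_auc`.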

13 validation: training set errors, validation set errors; patient “116” (AML).

14 visualization: projection on first eigenvector of Λ; prototypes; patient 116.

15 prediction: 180 test set patients; projection on first eigenvector of Λ; test set, prototypes.
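
The projection used for these visualizations can be sketched as follows, assuming Λ = ΩᵀΩ as in GMLVQ. The helper names are hypothetical, and plain power iteration is used here only to keep the example dependency-free; any eigen-solver would do.

```python
def relevance_matrix(omega):
    """Lambda = Omega^T Omega (symmetric, positive semi-definite)."""
    n = len(omega[0])
    return [[sum(row[i] * row[j] for row in omega) for j in range(n)]
            for i in range(n)]

def leading_eigenvector(matrix, iterations=200):
    """Approximate the eigenvector of the largest eigenvalue by power iteration.

    Caveat: the fixed start vector must not be orthogonal to that eigenvector.
    """
    n = len(matrix)
    vec = [1.0 / n ** 0.5] * n
    for _ in range(iterations):
        nxt = [sum(m * v for m, v in zip(row, vec)) for row in matrix]
        norm = sum(c * c for c in nxt) ** 0.5
        if norm == 0.0:
            break
        vec = [c / norm for c in nxt]
    return vec

def project(x, direction):
    """Scalar coordinate of x along the given direction."""
    return sum(xi * di for xi, di in zip(x, direction))
```

Projecting every feature vector and both prototypes with `project` onto the leading eigenvector yields the one-dimensional view shown on the slides.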

16 prediction: 180 test set patients, “AML – score”. 20 AML cases! Perfect test set prediction, e.g. AUROC = 1 (achieved by 8 teams!). Note: GMLVQ scores are not directly interpretable as “certainties” or probabilistic assignments.

17 prototypes: difference vector “AML - healthy” prototype; here: components corresponding to mean values.

18 relevances: relevance of markers (diagonal elements of Λ); in detail: iqr, median, kurtosis, skewness, std. dev., mean.

19 relevances: relevance of markers; in detail: iqr, median, kurtosis, skewness, std. dev., mean; SS Log.
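
Reading relevances off the diagonal of Λ = ΩᵀΩ can be sketched as below. Two assumptions are made for illustration: the feature vector stores the six statistics of each marker contiguously, and a marker's overall relevance is taken as the sum of its six diagonal entries. The slides do not state how the per-marker values were aggregated, and the function names are hypothetical.

```python
FEATURES = ["mean", "std. dev.", "skewness", "kurtosis", "median", "iqr"]

def diagonal_relevances(omega):
    """Diagonal of Lambda = Omega^T Omega: Lambda_ii = sum_j Omega_ji^2."""
    n = len(omega[0])
    return [sum(row[i] ** 2 for row in omega) for i in range(n)]

def relevance_per_marker(omega, markers):
    """Sum the diagonal relevances over the six features of each marker.

    Assumption: features are ordered marker by marker, six per marker.
    """
    diag = diagonal_relevances(omega)
    k = len(FEATURES)
    return {marker: sum(diag[i * k:(i + 1) * k])
            for i, marker in enumerate(markers)}
```

The "in detail" view on the slides corresponds to the individual entries returned by `diagonal_relevances`, one per (marker, feature) pair.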

20 scores, certainties, ranking? “AML – score”: 20 AML cases! Perfect test set prediction, e.g. AUC = 1 (ROC). Comparison of scores vs. ground truth (?): Pearson correlation, sum of |differences|.

21 scores, certainties, ranking? “transformed AML – score”: 20 AML cases! Perfect test set prediction, e.g. AUC = 1 (ROC). Comparison of scores vs. ground truth: Pearson correlation, sum of |differences|.
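
The two comparison measures named on these slides can be computed as in the following sketch. The function names are hypothetical, the ground truth is assumed to be coded as 1 (AML) and 0 (healthy), and the score transformation used by the team is not reproduced here.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences.

    Undefined (division by zero) if either sequence is constant.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def sum_abs_differences(scores, truth):
    """Sum of |score - truth| over all patients."""
    return sum(abs(s - t) for s, t in zip(scores, truth))
```

Both measures depend on the scale of the scores, unlike the AUC, which depends only on their ranking. A monotone transformation of the scores can therefore change the correlation and the sum of |differences| while leaving AUC = 1 untouched.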

22 summary. Feature vectors: moment based characteristics of flow cytometry data [mean, standard deviation, skewness, kurtosis, median, iqr]. Matrix Relevance Learning Vector Quantization: perfect classification with respect to training and test set (e.g. AUC (ROC) = 1); weighting of features (pairs of features) according to their relevance in the classification; visualization of the data set; identification of outliers (“116”?).

23 outlook. Selection of a reduced feature set: relevance matrix results suggest a selection of protein markers and/or specific features. Identification / diagnosis of AML subtypes: AML subtypes to be identified by specific marker profiles; machine learning approach requires larger data sets, e.g. GMLVQ with several prototypes representing AML; back to gating, i.e. selection of cells for differential diagnosis? Direct classification of histograms: non-Euclidean, histogram-specific distance measures, e.g. Divergence-based LVQ [Mwebaze et al., 2010].

24 references. The method (GMLVQ): P. Schneider, M. Biehl, B. Hammer, Adaptive relevance matrices in learning vector quantization, Neural Computation 21 (2009). A recent application in tumor classification: W. Arlt, M. Biehl, A.E. Taylor et al., Urine Steroid Metabolomics as a Biomarker Tool for Detecting Malignancy in Patients with Adrenal Tumors, J Clinical Endocrinology & Metabolism, in press (2011).

25 Thanks