
Probabilistic Machine Learning Approaches to Medical Classification Problems
PhD defense, Chuan LU, 25/01/2005
ESAT-SCD/SISTA, Katholieke Universiteit Leuven
Jury: Prof. L. Froyen (chairman), Prof. J. Vandewalle, Prof. S. Van Huffel (promotor), Prof. J. Beirlant, Prof. J.A.K. Suykens (promotor), Prof. P.J.G. Lisboa, Prof. D. Timmerman, Prof. Y. Moreau

Clinical decision support systems
- Advances in technology facilitate data collection, enabling computer-based decision support systems.
- Human judgement alone is subjective and experience dependent.
- Artificial intelligence (AI) in medicine: expert systems, machine learning, diagnostic modelling, knowledge discovery.
[Figure: illustration of a computer model aimed at stopping coronary disease]

Medical classification problems
- Essential for clinical decision making.
- A constrained diagnosis problem, e.g. benign (-) vs. malignant (+) for tumors.
- Classification: find a rule that assigns an observation to one of the existing classes (supervised learning, pattern recognition).
- Our applications: ovarian tumor classification from patient data; brain tumor classification from MRS spectra; benchmarking cancer diagnosis from microarray data.
- Challenges: uncertainty, validation, curse of dimensionality.

Machine learning
- Apply learning algorithms for the autonomous acquisition and integration of knowledge, aiming at good performance.
- Approaches: conventional statistical learning algorithms; artificial neural networks; kernel-based models; decision trees; learning sets of rules; Bayesian networks.

Probabilistic framework
- Building classifiers, as a flowchart: training patterns + class labels feed a machine learning algorithm, which produces a classifier; a new pattern fed to the classifier yields a probability of disease and a predicted class. Feature selection, model selection, and testing/prediction are the main steps.
- Central issue: good generalization performance! Balance model fitness against complexity, via regularization and Bayesian learning.

Outline
- Supervised learning
- Bayesian frameworks for blackbox models
- Preoperative classification of ovarian tumors
- Bagging for variable selection and prediction in cancer diagnosis problems
- Conclusions

Conventional linear classifiers
- Linear discriminant analysis (LDA): discriminate using z = w^T x ∈ R, maximizing the between-class variance while minimizing the within-class variance.
- Logistic regression (LR): model the logit, log(odds) = w0 + w1 x1 + ... + wD xD, e.g. for the probability of malignancy from inputs such as a tumor marker, age and family history; parameters estimated by maximum likelihood.
[Figure: single-layer network diagram with inputs x1 ... xD, bias w0, weights w1 ... wD and one output]
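For concreteness, a minimal logistic-regression sketch (not the thesis code; the data and feature weights below are invented stand-ins):

```python
# Minimal sketch: logistic regression for a binary (benign/malignant)
# problem, fitted by maximum likelihood. Data are synthetic placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Hypothetical feature matrix; columns could stand for log(CA125), age, ...
X = rng.normal(size=(200, 3))
y = (X @ np.array([1.5, -0.7, 0.3]) + rng.normal(size=200) > 0).astype(int)

lr = LogisticRegression().fit(X, y)          # maximum-likelihood fit
p_malignant = lr.predict_proba(X[:5])[:, 1]  # P(y=1 | x) from the logit
print(p_malignant)
```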

Feedforward neural networks
- Multilayer perceptrons (MLP) and radial basis function (RBF) neural networks.
- Training (back-propagation, Levenberg-Marquardt, conjugate gradients, ...), validation, test; local minima are a problem.
- Regularization, Bayesian methods; automatic relevance determination (ARD): applied to an MLP it gives variable selection; applied to an RBF-NN it gives relevance vector machines (RVM).
[Figure: diagrams of an MLP (inputs x1 ... xD, hidden layer, output) and an RBF network (inputs, basis functions, bias, activation function)]

Support vector machines (SVM)
- For classification: a classifier of the functional form y(x) = sign(w^T φ(x) + b), grounded in statistical learning theory [Vapnik95].
- A kernel function implicitly defines the feature map x → φ(x).

Support vector machines (SVM)
- Margin maximization: separating hyperplane w^T x + b = 0, with class +1 where w^T x + b > 0 and class -1 where w^T x + b < 0; the margin between the two classes equals 2/||w||.
[Figure: two classes separated by a hyperplane, with the margin and support vectors indicated]

Support vector machines (SVM)
- Kernel trick: a positive definite kernel k(·,·) corresponds, by Mercer's theorem, to an inner product in feature space, k(x, z) = φ(x)^T φ(z).
- Examples: linear kernel k(x, z) = x^T z; RBF kernel k(x, z) = exp(-||x - z||² / σ²).
- Training by quadratic programming in the dual space: sparseness, unique solution.
- Additive kernel-based models: enhanced interpretability, useful for variable selection!
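To illustrate the kernel trick, a hedged sketch pairing a hand-written RBF kernel with scikit-learn's SVC as a stand-in SVM (synthetic data, not the thesis experiments):

```python
# Sketch: RBF kernel k(x, z) = exp(-||x - z||^2 / sigma^2) plugged into an
# SVM; only the support vectors end up carrying the solution (sparseness).
import numpy as np
from sklearn.svm import SVC

def rbf_kernel(X, Z, sigma=1.0):
    """Gram matrix of the RBF kernel for all pairs of rows of X and Z."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / sigma**2)

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = np.sign(np.sin(X[:, 0]) + 0.1 * rng.normal(size=100))

clf = SVC(kernel=lambda A, B: rbf_kernel(A, B, sigma=2.0)).fit(X, y)
print("number of support vectors:", clf.support_.size)
```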

Least squares SVMs
- LS-SVM classifier [Suykens99], an SVM variant: inequality constraints become equality constraints, so quadratic programming is replaced by solving a set of linear equations.
- The primal problem is solved in the dual space; the dual problem is a linear system in the support values α and the bias b.
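The dual system has a standard closed form in the LS-SVM literature; a minimal numpy sketch of it follows (kernel matrix and data are placeholders, not the thesis implementation):

```python
# LS-SVM classifier dual (after Suykens, 1999): solve the linear system
#   [[0, y^T], [y, Omega + I/gamma]] [b; alpha] = [0; 1],
# with Omega_ij = y_i y_j k(x_i, x_j).
import numpy as np

def lssvm_fit(K, y, gamma=1.0):
    n = len(y)
    Omega = np.outer(y, y) * K
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)        # one linear solve instead of QP
    return sol[0], sol[1:]               # bias b, support values alpha

def lssvm_predict(K_test_train, y, alpha, b):
    # latent output y(x) = sum_i alpha_i y_i k(x, x_i) + b
    return K_test_train @ (alpha * y) + b
```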

Model evaluation
- Performance measure: accuracy (correct classification rate); this assumes equal misclassification costs and a constant class distribution in the target environment.
- Receiver operating characteristic (ROC) analysis: ROC curve; area under the ROC curve, AUC = P[y(x-) < y(x+)].
- Confusion table:

               Test result -   Test result +
  True -            TN              FP
  True +            FN              TP

- Data split into training, validation and test sets.
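The probabilistic reading of the AUC can be computed directly; a small sketch with invented toy scores:

```python
# AUC estimated as P[y(x-) < y(x+)]: the probability that a randomly drawn
# positive case scores higher than a randomly drawn negative case.
import numpy as np

def auc_probabilistic(scores, labels):
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[None, :] - neg[:, None]
    # ties counted as half, as in the Wilcoxon-Mann-Whitney statistic
    return (diff > 0).mean() + 0.5 * (diff == 0).mean()

scores = np.array([0.1, 0.4, 0.35, 0.8])
labels = np.array([0, 0, 1, 1])
print(auc_probabilistic(scores, labels))  # 0.75 on this toy example
```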

Outline
- Supervised learning
- Bayesian frameworks for blackbox models
- Preoperative classification of ovarian tumors
- Bagging for variable selection and prediction in cancer diagnosis problems
- Conclusions

Bayesian frameworks for blackbox models
- Advantages: automatic control of model complexity, without cross-validation; the possibility to use prior information and hierarchical models for the hyperparameters; a predictive distribution for the output.
- Principle of Bayesian learning [MacKay95]: define the probability distribution over all quantities within the model; update the distribution given data using Bayes' rule; construct posterior probability distributions for the (hyper)parameters; predict based on the posterior distributions over all the parameters.

Bayesian inference
- Bayes' rule: posterior = (likelihood × prior) / evidence.
- Model evidence: obtained by marginalization over the parameters (Gaussian approximation) [MacKay95, Suykens02, Tipping01].
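Written out, a sketch of the two quantities under MacKay's evidence framework (the notation w for parameters, D for data and H for the model is assumed here):

```latex
% Parameter posterior and model evidence (evidence framework, MacKay 1995)
\begin{align}
  p(\mathbf{w} \mid \mathcal{D}, \mathcal{H})
    &= \frac{p(\mathcal{D} \mid \mathbf{w}, \mathcal{H})\,
             p(\mathbf{w} \mid \mathcal{H})}
            {p(\mathcal{D} \mid \mathcal{H})}
    && \text{(posterior = likelihood $\times$ prior / evidence)} \\
  p(\mathcal{D} \mid \mathcal{H})
    &= \int p(\mathcal{D} \mid \mathbf{w}, \mathcal{H})\,
            p(\mathbf{w} \mid \mathcal{H})\, d\mathbf{w}
    && \text{(marginalization, e.g.\ via a Gaussian approximation)}
\end{align}
```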

Sparse Bayesian learning (SBL)
- Automatic relevance determination (ARD) applied to f(x) = w^T φ(x): a separate hierarchical prior for each weight w_m leads to sparseness.
- Choice of basis function φ(x): original variables give a linear SBL model, hence variable selection; kernels give relevance vector machines (RVM), whose relevance vectors are prototypical patterns.
- Sequential SBL algorithm [Tipping03].

Sparse Bayesian LS-SVMs
- Iterative pruning of easy cases (support value α < 0) [Lu02], mimicking margin maximization as in SVM; the remaining support vectors lie close to the decision boundary.
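A hedged sketch of that pruning loop, reusing the lssvm_fit helper sketched earlier; the stopping schedule is an assumption, not the thesis algorithm verbatim:

```python
# Sparse LS-SVM by iterative pruning [Lu02], sketched: repeatedly drop
# training points with negative support values (easy cases) and retrain.
import numpy as np

def sparse_lssvm(K, y, gamma=1.0):
    idx = np.arange(len(y))                  # indices of surviving points
    while True:
        b, alpha = lssvm_fit(K[np.ix_(idx, idx)], y[idx], gamma)
        keep = alpha >= 0                    # prune cases with alpha < 0
        if keep.all() or keep.sum() == 0:    # converged (or nothing left)
            return idx, alpha, b
        idx = idx[keep]                      # shrink and refit
```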

Variable (feature) selection
- Importance in medical classification problems: economics of data acquisition; accuracy and complexity of the classifiers; insight into the underlying medical problem.
- Main families: filter, wrapper, embedded methods.
- We focus on model-evidence-based methods within the Bayesian framework [Lu02, Lu04]: forward/stepwise selection with Bayesian LS-SVM; sparse Bayesian learning models; accounting for uncertainty in variable selection via sampling methods (a sketch of the forward search follows).
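A minimal sketch of greedy forward selection driven by model evidence; the `evidence(X, y, subset)` callable is a hypothetical stand-in for the Bayesian LS-SVM evidence computation:

```python
# Greedy forward variable selection: at each step add the variable whose
# inclusion raises the model evidence the most; stop when nothing improves.
import numpy as np

def forward_select(X, y, evidence, max_vars=10):
    selected, remaining = [], list(range(X.shape[1]))
    best_ev = -np.inf
    while remaining and len(selected) < max_vars:
        scores = {j: evidence(X, y, selected + [j]) for j in remaining}
        j_best = max(scores, key=scores.get)
        if scores[j_best] <= best_ev:
            break                            # evidence no longer improves
        best_ev = scores[j_best]
        selected.append(j_best)
        remaining.remove(j_best)
    return selected, best_ev
```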

Outline
- Supervised learning
- Bayesian frameworks for blackbox models
- Preoperative classification of ovarian tumors
- Bagging for variable selection and prediction in cancer diagnosis problems
- Conclusions

Ovarian cancer diagnosis
- Problem: ovarian masses. Ovarian cancer has a high mortality rate and is difficult to detect early, and the treatments of the different tumor types differ. Goal: develop a reliable diagnostic tool to preoperatively discriminate between malignant and benign tumors, assisting clinicians in choosing the treatment.
- Medical techniques for preoperative evaluation: serum tumor marker (CA125 blood test); ultrasonography; color Doppler imaging and blood flow indexing.
- Two-stage study: a preliminary investigation (KULeuven pilot project, single-center) and an extensive study (IOTA project, international multi-center).

Ovarian cancer diagnosis
- Attempts to automate the diagnosis: Risk of Malignancy Index (RMI) [Jacobs90], RMI = score_morph × score_meno × CA125.
- Mathematical models: logistic regression, multilayer perceptrons, kernel-based models, Bayesian belief networks, hybrid methods.
- Our approach: kernel-based models within a Bayesian framework.
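For concreteness, a toy RMI computation; the scoring conventions below follow Jacobs et al. (1990) as commonly described, and the patient values are invented:

```python
# Toy Risk of Malignancy Index: RMI = score_morph x score_meno x CA125.
def rmi(n_ultrasound_features: int, postmenopausal: bool, ca125: float) -> float:
    # morphology score: 0 features -> 0, 1 feature -> 1, 2+ features -> 3
    score_morph = 0 if n_ultrasound_features == 0 else (1 if n_ultrasound_features == 1 else 3)
    score_meno = 3 if postmenopausal else 1   # menopausal score
    return score_morph * score_meno * ca125

# e.g. two morphologic features, postmenopausal, CA125 = 30 U/ml:
print(rmi(2, True, 30.0))  # 3 * 3 * 30 = 270; values above ~200 are often read as high risk
```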

Preliminary investigation – pilot project
- Patient data collected at Univ. Hospitals Leuven, Belgium, 1994~1999: 425 records (data with missing values were excluded), 25 features; 291 benign and 134 (32%) malignant tumors.
- Preprocessing: e.g. CA_125 → log transform; Color_score {1,2,3,4} → 3 binary design variables {0,1}.
- Descriptive statistics of the demographic, serum marker, color Doppler imaging and morphologic variables.

Experiment – pilot project
- Desired model properties: output a probability of malignancy; high sensitivity for malignancy at a low false positive rate.
- Compared models: Bayesian LS-SVM classifiers, RVM classifiers, Bayesian MLPs, logistic regression, RMI (reference).
- 'Temporal' cross-validation: training set of 265 data (1994~1997), test set of 160 data (1997~1999).
- Multiple runs of stratified randomized CV gave improved test performance; the conclusions of the model comparison were similar to those of the temporal CV.

Variable selection – pilot project
- Forward variable selection based on Bayesian LS-SVM with RBF kernels: 10 variables were selected from the training set (the 265 first-treated patients).
[Figure: evolution of the model evidence during forward selection]

Model evaluation – pilot project
- Compare the predictive power of the models given the selected variables.
[Figure: ROC curves on the test set (data from the 160 most recently treated patients)]

Model evaluation – pilot project
- Comparison of model performance on the test set with rejection based on the posterior probability; the rejected patients need further examination by human experts.
- The posterior probability is essential for medical decision making.

Extensive study – IOTA project
- International Ovarian Tumor Analysis: a protocol for data collection; a multi-center study with 9 centers in 5 countries (Sweden, Belgium, Italy, France, UK).
- 1066 data of the dominant tumors: 800 (75%) benign, 266 (25%) malignant; about 60 variables after preprocessing.

Data – IOTA project
[Figure: overview of the collected IOTA data]

Model development – IOTA project
- Randomly divide the data into a training set (N_train = 754) and a test set (N_test = 312), stratified for tumor types and centers.
- Model building on the training data: variable selection with/without CA125, using Bayesian LS-SVM with linear/RBF kernels.
- Compared models: LRs, Bayesian LS-SVMs, RVMs; kernels: linear, RBF, additive RBF.
- Model evaluation: ROC analysis; performance of all centers as a whole and of individual centers; model interpretation?

Model evaluation – IOTA project
- Comparison of model performance using different variable subsets: MODELa (12 var), MODELb (12 var), MODELaa (18 var).
- The variable subset matters more than the model type; linear models suffice.
[Figure: performance of the models across the pruned variable subsets]

Test in different centers – IOTA project
- Comparison of model performance in the different centers using MODELa and MODELb.
- The AUC range among the various models appears related to the test set size of the center.
- MODELa performs slightly better than MODELb, but the difference is not significant.

Model visualization – IOTA project
- Model fitted on the 754 training data, using the 12 variables from MODELa and a Bayesian LS-SVM with linear kernels.
- Test AUC: ; sensitivity: 85.3%; specificity: 89.5%.
[Figure: class-conditional densities and posterior probabilities of the fitted model]

Outline
- Supervised learning
- Bayesian frameworks for blackbox models
- Preoperative classification of ovarian tumors
- Bagging for variable selection and prediction in cancer diagnosis problems
- Conclusions

Bagging linear SBL models for variable selection in cancer diagnosis
- Microarray and magnetic resonance spectroscopy (MRS) data: high dimensionality vs. small sample size; noisy data.
- Basic variable selection method: a sequential sparse Bayesian learning algorithm based on logit models (no kernel). It is unstable and admits multiple solutions, so how can the procedure be stabilized?

Bagging strategy
- Bagging: bootstrap + aggregate. Draw B bootstrap samples from the training data, fit a linear SBL model to each (models 1 ... B, each performing its own variable selection), then average the model outputs on a test pattern to obtain the ensemble output, as sketched below.
[Figure: flowchart of bootstrap sampling, B linear SBL models and output averaging]
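A hedged sketch of the scheme; an L1-penalized logistic regression stands in here for the linear sparse Bayesian learner, which is a substitution, not the thesis model:

```python
# Bagging sketch: bootstrap resampling, one sparse linear model per
# resample, averaged ensemble output, and per-variable selection rates.
import numpy as np
from sklearn.linear_model import LogisticRegression

def bagged_ensemble(X, y, B=30, seed=0):
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(B):
        idx = rng.integers(0, len(y), size=len(y))   # bootstrap sample
        m = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
        models.append(m.fit(X[idx], y[idx]))
    return models

def ensemble_predict(models, X_test):
    # output averaging over the ensemble members
    return np.mean([m.predict_proba(X_test)[:, 1] for m in models], axis=0)

def selection_rate(models):
    # fraction of bootstrap models in which each variable survived
    return np.mean([m.coef_.ravel() != 0 for m in models], axis=0)
```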

Brain tumor classification
- Based on ¹H short echo magnetic resonance spectroscopy (MRS) spectra: 205 spectra × 138 L2-normalized magnitude values in the frequency domain.
- 3 classes of brain tumors: Class 1, meningiomas (N1 = 57); Class 2, astrocytomas grade II (N2 = 22); Class 3, glioblastomas and metastases (N3 = 126).
- Multiclass scheme: pairwise binary classification (1 vs 2, 1 vs 3, 2 vs 3) yields the pairwise conditional class probabilities P(C1 | C1 or C2), P(C1 | C1 or C3), P(C2 | C2 or C3); coupling these gives the joint posterior probabilities P(C1), P(C2), P(C3), from which the class is chosen (see the coupling sketch below).
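One simple coupling rule, sketched below, is the closed form of Price et al. (1995); the thesis' exact coupling scheme may differ, and the pairwise estimates here are invented:

```python
# Couple pairwise conditional probabilities p_ij = P(C_i | C_i or C_j)
# into joint posteriors P(C_i), via P(C_i) = 1 / (sum_{j!=i} 1/p_ij - (K-2)).
import numpy as np

def couple_pairwise(p, K=3):
    """p[i][j] = P(C_i | C_i or C_j) for i != j, with p[j][i] = 1 - p[i][j]."""
    post = np.empty(K)
    for i in range(K):
        inv_sum = sum(1.0 / p[i][j] for j in range(K) if j != i)
        post[i] = 1.0 / (inv_sum - (K - 2))
    return post / post.sum()                  # renormalize to a distribution

# toy pairwise estimates for classes (1, 2, 3):
p = {0: {1: 0.7, 2: 0.6}, 1: {0: 0.3, 2: 0.4}, 2: {0: 0.4, 1: 0.6}}
print(couple_pairwise(p))                     # joint posterior over 3 classes
```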

Brain tumor multiclass classification based on MRS spectra data
[Figure: mean accuracy (%) from 30 runs of CV for the different variable selection methods; accuracies of 89% and 86% highlighted]

Biological relevance of the selected variables – MRS spectra
[Figure: mean spectrum and selection rate of the variables using linSBL+Bag for the pairwise binary classifications]

Outline
- Supervised learning
- Bayesian frameworks for blackbox models
- Preoperative classification of ovarian tumors
- Bagging for variable selection and prediction in cancer diagnosis problems
- Conclusions

Conclusions
- Bayesian methods offer a unifying way to do model selection, variable selection and outcome prediction.
- Kernel-based models: fewer hyperparameters to tune than MLPs; good performance in our applications.
- Sparseness is good for kernel-based models: RVM via ARD on a parametric model; LS-SVM via iterative data point pruning.
- Variable selection: evidence-based selection is valuable in applications, and domain knowledge is helpful; the variable selection matters more than the model type in our applications.
- Sampling and ensembles stabilize variable selection and prediction.

Conclusions
- A compromise between model interpretability and complexity is possible for kernel-based models via additive kernels.
- Linear models suffice in our applications; nonlinear kernel-based models remain worth trying.
Contributions
- Automatic tuning of the kernel parameter for Bayesian LS-SVM.
- Sparse approximation for Bayesian LS-SVM.
- Two proposed variable selection schemes within the Bayesian framework.
- Additive kernels, kPCR and nonlinear biplots used to enhance the interpretability of kernel-based models.
- Development and evaluation of predictive models for ovarian tumor classification and other cancer diagnosis problems.

Future work
- Bayesian methods: integration for the posterior probability via sampling methods or variational methods.
- Robust modelling; joint optimization of model fitting and variable selection.
- Incorporate uncertainty and measurement cost into the inference.
- Enhance model interpretability by rule extraction?
- For the IOTA data analysis: multi-center analysis, prospective testing.
- Combine kernel-based models with belief networks (expert knowledge), addressing the missing value problem.

Acknowledgments
- Prof. S. Van Huffel and Prof. J.A.K. Suykens; Prof. D. Timmerman; Dr. T. Van Gestel, L. Ameye, A. Devos, Dr. J. De Brabanter.
- The IOTA project; the EU-funded research project INTERPRET coordinated by Prof. C. Arus; the EU integrated project eTUMOUR coordinated by B. Celda; the EU Network of Excellence BIOPATTERN.
- Doctoral scholarship of the KUL research council.

Thank you!