Variable selection using linear sparse Bayesian models for medical classification problems

Chuan LU, Dept. of Electrical Engineering (ESAT), K.U.Leuven

1 Introduction

In medical classification problems, variable selection can have an impact on the economics of data acquisition and on the accuracy and complexity of the classifiers, and it helps in understanding the underlying mechanism that generated the data. In this work, we investigate the use of Tipping's sparse Bayesian learning method with linear basis functions for variable selection. The selected variables were then used in different types of probabilistic linear classifiers, including linear discriminant analysis (LDA) models, logistic regression (LR) models, relevance vector machines (RVMs) with linear kernels [1], and Bayesian least squares support vector machines (LS-SVMs) with linear kernels [3].

2 Methods

2.1 Sparse Bayesian modelling

Sparse Bayesian learning is the application of Bayesian automatic relevance determination (ARD) to models that are linear in their parameters, by which sparse solutions to regression or classification tasks can be obtained [1]. Predictions are based on a function y(x) defined over the input space:

  y(x) = Σ_m w_m φ_m(x)

Two forms are used for the basis functions φ_m(x):
– original input variables: φ_m(x) = x_m
– kernel basis functions: φ_m(x) = K(x, x_m), where K(·,·) denotes some symmetric kernel function.

For a regression problem, the likelihood of the data for a sparse Bayesian model can be expressed as

  p(t | w, σ²) = Π_n N(t_n | y(x_n), σ²),

where σ² is the variance of the i.i.d. noise. The parameters w are given a Gaussian prior

  p(w | α) = Π_m N(w_m | 0, 1/α_m),

where α = {α_m} is a vector of hyperparameters, with a uniform prior on log(α_m). In regularization terms, this prior corresponds to a penalty function Σ_m log|w_m|, with a preference for smoother models. The hyperparameters are estimated by maximizing the marginal likelihood p(t | α, σ²) with respect to α and σ². This optimization can be performed efficiently using an iterative re-estimation procedure, and a fast sequential learning algorithm is also available [2]. This greedy selection procedure enables us to process data of high dimensionality efficiently.

2.2 Linear sparse Bayesian logit model for variable selection

For binary classification problems, the logistic function g(y) = 1/(1 + e^(−y)) is used [1]. The likelihood is Bernoulli (binomial); there is no noise variance in this case, and a local Gaussian approximation is used to compute the posterior distribution of the weights. If the original variables are taken as the basis functions of the linear sparse Bayesian classifier, the most relevant variables can be read off from the resulting sparse solution.

Be aware of the uncertainty involved, resulting from
– the existence of multiple solutions, and
– the sensitivity of the algorithm to small perturbations of the experimental conditions.
Attempts to tackle this problem include bagging, model averaging and committee machines; here we focus only on the selection of a single subset of variables.
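To make Sections 2.1–2.2 concrete, here is a minimal NumPy sketch of variable selection with a linear sparse Bayesian logit model. It is not the implementation behind the poster: the function name and thresholds are illustrative, and it uses Laplace-approximation Newton steps with MacKay-style re-estimation of α rather than the fast sequential algorithm of [2].

```python
# Minimal sketch (assumed details, not the poster's code): ARD logistic
# regression with the original variables as basis functions. Variables whose
# precision alpha_m diverges get weight ~0 and are pruned, i.e. deselected.
import numpy as np

def sparse_bayes_logit_select(X, t, n_outer=50, alpha_init=1e-2, alpha_max=1e6):
    """X: (N, M) standardized inputs; t: (N,) labels in {0, 1}.
    Returns (kept, w): indices of the selected variables and their weights."""
    N, M = X.shape
    alpha = np.full(M, alpha_init)        # one ARD precision per variable
    kept = np.arange(M)                   # indices still in the model
    w = np.zeros(M)
    for _ in range(n_outer):
        Phi = X[:, kept]
        # Laplace approximation: Newton steps to the posterior mode of w
        for _ in range(25):
            y = 1.0 / (1.0 + np.exp(-np.clip(Phi @ w, -30, 30)))
            g = Phi.T @ (y - t) + alpha[kept] * w               # gradient
            B = y * (1.0 - y)                                   # logistic variance
            H = (Phi * B[:, None]).T @ Phi + np.diag(alpha[kept])
            w = w - np.linalg.solve(H, g)
        Sigma = np.linalg.inv(H)                                # posterior covariance
        gamma = 1.0 - alpha[kept] * np.diag(Sigma)              # well-determined factors
        alpha[kept] = gamma / np.maximum(w ** 2, 1e-12)         # re-estimate alphas
        mask = alpha[kept] < alpha_max                          # prune diverged alphas
        kept, w = kept[mask], w[mask]
    return kept, w
```

The surviving indices (kept) form the selected variable subset, which would then be passed to the downstream LDA, LR, RVM or LS-SVM classifiers of Section 1; for the gene expression data, where M runs to thousands, the fast sequential algorithm of [2] would be the preferable starting point.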
3 Experiments

3.1 Data

Binary cancer classification
– Based on microarray gene expression data [4], normalized to zero mean and unit variance.

  cancer     no. samples   no. genes   task
  leukemia   72            7129        subtypes
  colon      62            2000        disease/normal

Multiclass classification of brain tumors
– Based on the 1H short echo magnetic resonance spectroscopy (MRS) spectra data [5]. Use of the brain tumor data provided by the EU-funded IST project INTERPRET (carbon.uab.es/INTERPRET) is gratefully acknowledged.
– Four major types of brain tumors:
  – malignant (glioblastomas, metastases)
  – benign (meningiomas, astrocytomas of grade II)
– 205 spectra, each represented by 138 L2-normalized magnitude values in the frequency domain.

3.2 Experimental settings

Since the number of samples is very small compared with the dimension of the variables, variable selection was not based purely on one single training set, for the two binary classification problems as well as for the multiclass classification problem.

For the multiclass classification problem, we
– reduced the 4-class classification problem to 6 pairwise binary classification problems, which yielded conditional pairwise probability estimates;
– coupled the conditional pairwise probabilities to obtain the joint posterior probability for each class using Hastie's method (a sketch of this coupling step is given at the end of this document);
– used as variables the union of the variables selected by the 6 binary sparse Bayesian logit models.

3.3 Results

LOO accuracy for binary classification problems: we obtained zero leave-one-out (LOO) errors using only 4 and 5 selected genes for the leukemia and colon cancer data, respectively, on 3 out of the 4 linear classifiers. Note: 'N/A' stands for 'not available', due to numerical problems.

Test performance for 4-class brain tumor classification: the average test performance over 30 random cross-validation (CV) trials increases from 68.48% to 75.34% accuracy when variable selection is used for the linear LS-SVM classifier, which performs best in this experiment.

4 Discussion and Conclusions

– Use of the proposed variable selection pre-processing can increase the generalization performance of the linear models.
– The algorithm appeared fast and efficient in dealing with datasets of very high dimensionality.
– The results from these experiments are somewhat biased.
– Future work requires more experiments in order to characterize this variable selection procedure (especially when combined with bagging) and to compare its performance with other variable selection methods.

References

[1] M.E. Tipping, "Sparse Bayesian learning and the relevance vector machine," Journal of Machine Learning Research, 2001.
[2] M.E. Tipping and A. Faul, "Fast marginal likelihood maximisation for sparse Bayesian models," in Proceedings of Artificial Intelligence and Statistics '03, 2003.
[3] J.A.K. Suykens, T. Van Gestel et al., Least Squares Support Vector Machines. Singapore: World Scientific, 2002.
[4] I. Guyon et al., "Gene selection for cancer classification using support vector machines," Machine Learning, 2002.
[5] L. Lukas, A. Devos et al., "Classification of brain tumours using 1H MRS spectra," internal report, ESAT-SISTA, K.U.Leuven.

Acknowledgements

This research was funded by the projects IUAP IV-02 and IUAP V-22, KUL GOA-MEFISTO-666, IDO/99/03, and two FWO G projects.

Further information

Chuan Lu, K.U.Leuven, Dept. ESAT, Division SCD-SISTA, Kasteelpark Arenberg, Leuven (Heverlee), Belgium.
Supervisors: Prof. Sabine Van Huffel and Prof. Johan A.K. Suykens.
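As referenced in Section 3.2, the sketch below shows one standard formulation of Hastie's pairwise coupling (the iterative scheme of Hastie and Tibshirani). The poster does not spell out its formulas, so the function name, arguments and stopping rule here are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of pairwise coupling: combine conditional pairwise estimates
# R[i, j] ~ p(class i | class i or j) into a single K-class posterior p by
# iteratively matching the model-implied pairwise probabilities p_i/(p_i+p_j).
import numpy as np

def pairwise_couple(R, n=None, n_iter=200, tol=1e-8):
    """R: (K, K) with R[i, j] = estimated p(i | i or j) and R[j, i] = 1 - R[i, j];
    n: (K, K) pairwise sample sizes (uniform weighting if None)."""
    K = R.shape[0]
    n = np.ones((K, K)) if n is None else n
    off = ~np.eye(K, dtype=bool)                       # mask of the K*(K-1) pairs
    num = (n * R)[off].reshape(K, K - 1).sum(axis=1)   # sum_j n_ij * r_ij (fixed)
    p = np.full(K, 1.0 / K)                            # start from a uniform guess
    for _ in range(n_iter):
        mu = p[:, None] / (p[:, None] + p[None, :])    # implied p(i | i or j)
        den = (n * mu)[off].reshape(K, K - 1).sum(axis=1)
        p_new = p * num / den                          # multiplicative update
        p_new /= p_new.sum()                           # renormalize to a posterior
        if np.max(np.abs(p_new - p)) < tol:
            break
        p = p_new
    return p_new
```

For the 4-class brain tumor task, K = 4 and R would be filled, for each test spectrum, from the outputs of the 6 binary sparse Bayesian logit models; the final prediction is then the class with the largest coupled posterior.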