Robust Fisher Discriminant Analysis


Robust Fisher Discriminant Analysis
Article presented at NIPS 2005 by Seung-Jean Kim, Alessandro Magnani, and Stephen P. Boyd
Presenter: Erick Delage, February 14, 2006

Outline
- Background on Fisher linear discriminant analysis
- Making the approach robust to small sample sets while maintaining computational efficiency
- Experimental results

Fisher Discriminant Analysis
Given two random variables X, Y ∈ R^n, find the linear discriminant α ∈ R^n that maximizes Fisher's discriminant ratio:
    f(α, μ_x, μ_y, Σ_x, Σ_y) = (α^T (μ_x − μ_y))^2 / (α^T (Σ_x + Σ_y) α)
- Unique solution (up to scaling): α* = (Σ_x + Σ_y)^{-1} (μ_x − μ_y)
- Easy to compute
- Probabilistic interpretation
- Kernelizable
- Naturally extends to k-class problems
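To make the "easy to compute" point concrete, here is a minimal NumPy sketch of nominal FDA (my own illustration; the function name fisher_lda and the toy data are not from the slides):

```python
import numpy as np

def fisher_lda(X, Y):
    """Nominal Fisher LDA for two classes given as (n_samples, n_features) arrays.
    Returns the discriminant direction alpha and a midpoint threshold b, so a new
    point z is assigned to the X class when alpha @ z >= b."""
    mu_x, mu_y = X.mean(axis=0), Y.mean(axis=0)
    Sigma_x = np.cov(X, rowvar=False)
    Sigma_y = np.cov(Y, rowvar=False)
    # Maximizer of the Fisher ratio: (Sigma_x + Sigma_y)^{-1} (mu_x - mu_y)
    alpha = np.linalg.solve(Sigma_x + Sigma_y, mu_x - mu_y)
    b = alpha @ (mu_x + mu_y) / 2.0  # threshold halfway between the projected means
    return alpha, b

# Toy usage on synthetic Gaussian data
rng = np.random.default_rng(0)
X = rng.normal(loc=[1.0, 0.0], scale=1.0, size=(100, 2))
Y = rng.normal(loc=[-1.0, 0.0], scale=1.0, size=(100, 2))
alpha, b = fisher_lda(X, Y)
print("direction:", alpha, "threshold:", b)
```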

Probabilistic Interpretation
The Fisher discriminant is the Bayes-optimal classifier for two normal distributions with equal covariance.
Fisher discriminant analysis can be shown to: (statement given as an equation on the original slide; see the note below for one standard reading)
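A standard way to make both claims precise (my reconstruction, not verbatim from the slide): for independent X ~ N(μ_x, Σ_x) and Y ~ N(μ_y, Σ_y), the difference of projections α^T X − α^T Y is normal with mean α^T(μ_x − μ_y) and variance α^T(Σ_x + Σ_y)α, so, with α oriented so that α^T(μ_x − μ_y) ≥ 0,

```latex
\Pr\!\left(\alpha^T X \ge \alpha^T Y\right)
  = \Phi\!\left(\frac{\alpha^T(\mu_x - \mu_y)}{\sqrt{\alpha^T(\Sigma_x + \Sigma_y)\,\alpha}}\right)
  = \Phi\!\left(\sqrt{f(\alpha, \mu_x, \mu_y, \Sigma_x, \Sigma_y)}\right)
```

so maximizing the Fisher ratio maximizes this probability. When Σ_x = Σ_y = Σ and the priors are equal, the Bayes rule is linear with direction Σ^{-1}(μ_x − μ_y), which coincides up to scaling with the FDA solution (Σ_x + Σ_y)^{-1}(μ_x − μ_y) = (2Σ)^{-1}(μ_x − μ_y).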

Using Kernels
When discriminating in feature space φ(x), we can use kernels k(x_i, x_j) = φ(x_i)^T φ(x_j), and show that α is of the form
    α = ∑_i λ_i φ(x_i),
so that a projection along α is given by
    α^T φ(x) = ∑_i λ_i k(x_i, x).
We then find λ by solving the Fisher problem written entirely in terms of the kernel matrix K, with K_{i,j} = k(x_i, x_j), where x_i and x_j are data samples i and j.
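For reference, one common way to write the kernelized problem, following the mathematical-programming formulation of Mika et al. (second reference below); the ridge term εI is my addition for numerical stability, and the within-class matrix N is only sketched, not spelled out:

```latex
\max_{\lambda \in \mathbb{R}^N}\;
  \frac{\big(\lambda^T (m_x - m_y)\big)^2}{\lambda^T (N + \varepsilon I)\,\lambda},
\qquad
(m_c)_j = \frac{1}{\ell_c} \sum_{i \in \text{class } c} k(x_j, x_i),
```

where ℓ_c is the number of samples in class c and N is a within-class scatter matrix that, like m_x and m_y, can be computed from the kernel matrix K alone, so the data enter only through kernel evaluations.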

Robust Fisher Discriminant Analysis
Uncertainty in (μ_x, Σ_x) & (μ_y, Σ_y): FDA is sensitive to estimation errors in these parameters (see the small simulation sketch below).
- Can we make it more robust using general convex uncertainty models on the problem data?
- Is it still a computationally feasible technique?
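A small synthetic illustration of that sensitivity (my own example, not from the paper): with few samples per class relative to the dimension, the estimated FDA direction can point far from the direction computed from the true parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 20                                    # dimension
mu_x, mu_y = np.zeros(d), 0.5 * np.ones(d)
Sigma = np.eye(d)                         # true (shared) covariance

def fda_direction(mu_x, mu_y, Sigma_x, Sigma_y):
    a = np.linalg.solve(Sigma_x + Sigma_y, mu_x - mu_y)
    return a / np.linalg.norm(a)

a_true = fda_direction(mu_x, mu_y, Sigma, Sigma)

for n in (25, 50, 500):                   # samples per class
    cosines = []
    for _ in range(200):
        X = rng.multivariate_normal(mu_x, Sigma, size=n)
        Y = rng.multivariate_normal(mu_y, Sigma, size=n)
        a_hat = fda_direction(X.mean(0), Y.mean(0),
                              np.cov(X, rowvar=False), np.cov(Y, rowvar=False))
        cosines.append(abs(a_hat @ a_true))
    print(f"n={n:4d}  mean |cos(angle to true direction)|: {np.mean(cosines):.3f}")
```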

Max Worst-case Fisher Discriminant Ratio
Assuming (μ_x, Σ_x, μ_y, Σ_y) ∈ U, where U is a convex compact subset, we can try optimizing the worst-case ratio:
    (1)  max_α  min_{(μ_x, Σ_x, μ_y, Σ_y) ∈ U}  f(α, μ_x, μ_y, Σ_x, Σ_y)
From basic min-max theory, we know (1) ≤ (2), where (2) swaps the order of the max and the min:
    (2)  min_{(μ_x, Σ_x, μ_y, Σ_y) ∈ U}  max_α  f(α, μ_x, μ_y, Σ_x, Σ_y)

Max Worst-case Fisher Discriminant Ratio
For any fixed (μ_x, Σ_x, μ_y, Σ_y), the maximization over α has the closed form
    max_α f(α, μ_x, μ_y, Σ_x, Σ_y) = (μ_x − μ_y)^T (Σ_x + Σ_y)^{-1} (μ_x − μ_y),
a matrix-fractional function that is jointly convex in the parameters. Because U is convex and compact, (1) is equivalent to (2), which is therefore a convex problem and can be solved efficiently using tractable general methods (e.g., interior-point methods). The robust discriminant is α* = (Σ_x* + Σ_y*)^{-1} (μ_x* − μ_y*), evaluated at the worst-case parameters that solve (2).
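A minimal CVXPY sketch of problem (2), assuming a simple norm-ball uncertainty model around nominal estimates (the particular uncertainty sets, radii, and names such as robust_fda are my assumptions for illustration, not the paper's exact setup):

```python
import cvxpy as cp
import numpy as np

def robust_fda(mu_x_nom, Sigma_x_nom, mu_y_nom, Sigma_y_nom,
               rho_mu=0.1, rho_sigma=0.1):
    """Solve problem (2) over a convex uncertainty set, then read off the
    robust discriminant as nominal FDA at the worst-case parameters."""
    n = mu_x_nom.shape[0]
    mu_x, mu_y = cp.Variable(n), cp.Variable(n)
    Sigma_x = cp.Variable((n, n), PSD=True)
    Sigma_y = cp.Variable((n, n), PSD=True)

    # (mu_x - mu_y)^T (Sigma_x + Sigma_y)^{-1} (mu_x - mu_y): jointly convex
    objective = cp.Minimize(cp.matrix_frac(mu_x - mu_y, Sigma_x + Sigma_y))

    constraints = [
        cp.norm(mu_x - mu_x_nom) <= rho_mu,                  # mean uncertainty balls
        cp.norm(mu_y - mu_y_nom) <= rho_mu,
        cp.norm(Sigma_x - Sigma_x_nom, 'fro') <= rho_sigma,  # covariance uncertainty balls
        cp.norm(Sigma_y - Sigma_y_nom, 'fro') <= rho_sigma,
    ]
    cp.Problem(objective, constraints).solve()

    # Robust discriminant: the nominal FDA formula at the worst-case parameters
    return np.linalg.solve(Sigma_x.value + Sigma_y.value, mu_x.value - mu_y.value)
```

Intuitively, the worst case shrinks the mean separation and inflates the covariances within the allowed balls; the final discriminant is simply the nominal FDA formula applied to those worst-case parameters.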

Experimental Results
Two benchmark problems from the UCI Machine Learning Repository:
- Sonar: 208 points, n = 60
- Ionosphere: 351 points, n = 34
Uncertainty models: (given graphically on the original slides)

References
S.-J. Kim, A. Magnani, and S. P. Boyd: Robust Fisher Discriminant Analysis. In Y. Weiss, B. Schölkopf, and J. Platt, editors, Advances in Neural Information Processing Systems 18, pp. 659-666, MIT Press, 2006.
S. Mika, G. Rätsch, and K.-R. Müller: A Mathematical Programming Approach to the Kernel Fisher Algorithm. In T. Leen, T. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems 13, pp. 591-597, MIT Press, 2001.