580.691 Learning Theory Reza Shadmehr Linear and quadratic decision boundaries Kernel estimates of density Missing data.

Bayesian classification. Suppose we wish to classify a vector x as belonging to one of the classes {1, …, L}. We are given labeled data and need to form a classification function: classify x into the class l that maximizes the posterior probability

P(l | x) = p(x | l) P(l) / p(x),

where p(x | l) is the likelihood, P(l) is the prior, and p(x) is the marginal.
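As a concrete illustration, here is a minimal Python sketch (not from the slides; the class names and Gaussian parameters are hypothetical) of picking the class that maximizes the posterior:

```python
from scipy.stats import norm

priors = {"c1": 0.5, "c2": 0.5}                      # P(l): equal priors assumed
likelihoods = {"c1": norm(loc=0.0, scale=1.0),       # p(x | l) for each class
               "c2": norm(loc=2.0, scale=1.0)}

def classify(x):
    # The posterior P(l | x) is proportional to p(x | l) * P(l); the marginal p(x)
    # is the same for every class, so it can be dropped from the argmax.
    scores = {l: likelihoods[l].pdf(x) * priors[l] for l in priors}
    return max(scores, key=scores.get)

print(classify(0.3))   # closer to the mean of c1
print(classify(1.8))   # closer to the mean of c2
```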

Classification when distributions have equal variance. Suppose we wish to classify a person as male or female based on height. What we have: the class-conditional densities of height for the female and male classes (the two curves plotted on the slide). What we want: a rule that assigns a class to a new height. Note that the two densities have equal variance. Assume equal prior probability of being male or female.

Classification when distributions have equal variance

Estimating the decision boundary between data of equal variance. Suppose the distribution of the data in each class is Gaussian. The decision boundary between any two classes is where the log of the ratio of their posteriors is zero. If the data in each class have a Gaussian density with equal variance, then the boundary between any two classes is a line.
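A small numerical sketch of the equal-variance case in one dimension (the height figures are hypothetical): setting the log-ratio of the posteriors to zero, the quadratic terms cancel and the boundary falls at the midpoint of the two class means.

```python
mu_f, mu_m, sigma = 1.65, 1.78, 0.07   # hypothetical mean heights (m) and shared std

def log_ratio(x):
    # log[ p(x|female) P(female) / (p(x|male) P(male)) ] with equal priors:
    # the x**2 terms cancel because the variances are equal, so this is linear in x.
    return (-(x - mu_f) ** 2 + (x - mu_m) ** 2) / (2 * sigma ** 2)

boundary = (mu_f + mu_m) / 2            # where the log-ratio crosses zero
print(boundary, log_ratio(boundary))    # 1.715, ~0.0
```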

Estimating the decision boundary from estimated densities. From the data we can get ML estimates of the Gaussian parameters of each class; the slide plots the estimated densities and the resulting boundaries for Class 1, Class 2, and Class 3.
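A minimal sketch of the estimation step with synthetic labeled data (the samples and labels are made up): the ML estimates are simply each class's sample mean, its 1/N-normalized covariance, and its relative frequency.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0.0, 0.0], 1.0, size=(50, 2)),    # class 1 samples
               rng.normal([3.0, 3.0], 1.0, size=(50, 2)),    # class 2 samples
               rng.normal([0.0, 4.0], 1.0, size=(50, 2))])   # class 3 samples
y = np.repeat([1, 2, 3], 50)

params = {}
for label in np.unique(y):
    Xl = X[y == label]
    params[label] = {"mean": Xl.mean(axis=0),
                     "cov": np.cov(Xl, rowvar=False, bias=True),  # ML estimate uses 1/N
                     "prior": len(Xl) / len(X)}

print(params[1]["mean"], params[2]["mean"], params[3]["mean"])
```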

Relationship between Bayesian classification and the Fisher discriminant. If we have two classes, class -1 and class +1, then the Fisher decision boundary is where the discriminant function equals zero. For the Bayesian classifier, under the assumption of equal variance, the decision boundary is where the log ratio of the posteriors is zero. The Fisher decision boundary is the same as the Bayesian one when the two classes have equal variance and equal prior probability.

Classification when distributions have unequal variance. What we have: class-conditional Gaussian densities with unequal variances. Classification: assign x to the class with the largest posterior probability. Assume: equal prior probabilities.

Quadratic discriminant: when data come from Gaussians with unequal variances. The decision boundary between any two classes is where the log of the ratio of their posteriors is zero. If the data in each class have a Gaussian density with unequal variance, then the boundary between any two classes is a quadratic function of x. The slide shows the resulting quadratic boundaries between the green, red, and blue classes.
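A hedged one-dimensional sketch (all parameters hypothetical): with unequal variances the x² terms in the log-ratio no longer cancel, so the discriminant is quadratic and in one dimension there are generally two boundary points.

```python
import numpy as np

mu1, s1 = 0.0, 1.0      # hypothetical class-1 mean and std
mu2, s2 = 2.0, 2.0      # hypothetical class-2 mean and (larger) std

def log_ratio(x):
    # log[ p(x|1) / p(x|2) ] with equal priors; quadratic in x because s1 != s2.
    return (np.log(s2 / s1)
            - (x - mu1) ** 2 / (2 * s1 ** 2)
            + (x - mu2) ** 2 / (2 * s2 ** 2))

xs = np.linspace(-6.0, 8.0, 2000)
crossings = xs[:-1][np.diff(np.sign(log_ratio(xs))) != 0]
print(crossings)        # two zero crossings: the boundary is quadratic, not a line
```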

Non-parametric estimate of densities: kernel density estimate. Suppose we have points x(i) that belong to class l, and we cannot assume that these points come from a Gaussian distribution. To estimate the density, we need to form a function that assigns a weight to each point x in our space, with the integral of this function equal to 1. The more data points x(i) we find around x, the larger the weight at x should be. The kernel density estimate places a Gaussian centered at each data point: where there are more data points, there are more Gaussians, and their normalized sum is the density estimate. The slide compares a histogram of the sampled data belonging to class l, the ML estimate of a Gaussian density, and the density estimate using a Gaussian kernel.
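A minimal kernel-density sketch (the bandwidth h is an assumed smoothing parameter, and the samples are synthetic): a Gaussian is centered at each sample and the average is taken, so the estimate integrates to one and is larger where samples are denser.

```python
import numpy as np

def kde(x, samples, h=0.3):
    # p_hat(x) = (1/N) * sum_i Normal(x; mean=x_i, std=h), evaluated on an array x
    z = (x - samples[:, None]) / h
    return np.exp(-0.5 * z ** 2).sum(axis=0) / (len(samples) * h * np.sqrt(2 * np.pi))

rng = np.random.default_rng(1)
samples = np.concatenate([rng.normal(-2.0, 0.5, 100), rng.normal(1.0, 1.0, 100)])
grid = np.linspace(-5.0, 5.0, 500)
density = kde(grid, samples)
print(density.sum() * (grid[1] - grid[0]))   # ~1.0: the estimate integrates to one
```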

Non-parametric estimate of densities: kernel density estimate applied to the green, red, and blue classes.
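Continuing the sketch above, per-class kernel density estimates can stand in for the Gaussian likelihoods in the Bayes classifier (the class names, samples, and bandwidth here are hypothetical):

```python
import numpy as np

def kde(x, samples, h=0.3):
    z = (np.atleast_1d(x) - samples[:, None]) / h
    return np.exp(-0.5 * z ** 2).sum(axis=0) / (len(samples) * h * np.sqrt(2 * np.pi))

rng = np.random.default_rng(2)
class_samples = {"red":   rng.normal(-2.0, 0.5, 80),
                 "green": rng.normal( 0.0, 0.8, 80),
                 "blue":  rng.normal( 2.0, 0.5, 80)}   # hypothetical 1-D classes

def classify(x, prior=1.0 / 3.0):
    # Replace each class-conditional Gaussian with that class's kernel density estimate.
    scores = {l: kde(x, s)[0] * prior for l, s in class_samples.items()}
    return max(scores, key=scores.get)

print(classify(-1.7), classify(0.2), classify(2.3))
```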

Classification with missing data. Suppose that we have built a Bayesian classifier and are now given a new data point to classify, but this new data point is missing some of the “features” that we normally expect to see. In the example below, we have two features (x1 and x2) and four classes, and the likelihood function is plotted. Suppose that we are given the data point (*, -1) to classify; this data point is missing a value for x1. If we assume the missing value is the average of the previously observed x1, then we would estimate it to be about 1. Assuming that the prior probabilities are equal among the four classes, we would classify (1, -1) as class c2. However, c4 is a better choice, because when x2 = -1, c4 is the most likely class: it has the highest likelihood.
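A hedged sketch of the alternative this argues for: instead of imputing the missing x1, marginalize it out. For a class model with independent Gaussian features (an assumption made only for this sketch; the per-class means below are hypothetical), that amounts to scoring each class with the observed feature alone.

```python
from scipy.stats import norm

# Hypothetical per-class means for (x1, x2); unit variances and equal priors assumed.
classes = {"c1": (0.0, 2.0), "c2": (1.0, 0.0), "c3": (3.0, 2.0), "c4": (4.0, -1.0)}

def classify(x1, x2):
    scores = {}
    for c, (m1, m2) in classes.items():
        s = 1.0
        if x1 is not None:
            s *= norm(m1, 1.0).pdf(x1)   # use a feature only if it was observed
        if x2 is not None:
            s *= norm(m2, 1.0).pdf(x2)
        scores[c] = s
    return max(scores, key=scores.get)

print(classify(None, -1.0))   # decided by x2 alone rather than by an imputed x1
```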

Classification with missing data: comparison of classification using good data versus bad (or missing) data.