Mathematical Foundations of BME


1 580.704 Mathematical Foundations of BME
Reza Shadmehr
Topics: linear and quadratic decision boundaries; kernel estimates of density; missing data

2 Bayesian classification
Suppose we wish to classify vector x as belonging to one of L classes: {1, …, L}. We are given labeled data and need to form a classification function. By Bayes rule, the posterior is the likelihood times the prior, divided by the marginal: p(l | x) = p(x | l) p(l) / p(x). Classify x into the class l that maximizes this posterior probability.
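The rule above can be sketched in a few lines of Python. This is a minimal illustration on a single feature (height in cm, anticipating the next slide); the class means, shared variance, and priors are illustrative assumptions, not values from the lecture.

```python
import numpy as np

# Illustrative (assumed) class parameters: equal-variance Gaussians
means = {"female": 165.0, "male": 180.0}
sigma = 7.0                           # shared standard deviation
priors = {"female": 0.5, "male": 0.5}

def gauss(x, mu, sd):
    # Gaussian density N(x; mu, sd^2)
    return np.exp(-(x - mu)**2 / (2 * sd**2)) / (sd * np.sqrt(2 * np.pi))

def classify(x):
    # The marginal p(x) is the same for every class, so maximizing the
    # posterior is the same as maximizing likelihood * prior.
    scores = {l: gauss(x, means[l], sigma) * priors[l] for l in means}
    return max(scores, key=scores.get)

print(classify(170.0))  # closer to the female mean -> "female"
```

Because p(x) is common to all classes, it never needs to be computed for classification.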

3 Classification when distributions have equal variance
Suppose we wish to classify a person as male or female based on height. What we have: the class-conditional densities p(height | female) and p(height | male). What we want: the posterior probabilities p(female | height) and p(male | height). Assume equal probability of being male or female. [Figure: the female and male height densities and the resulting posteriors.] Note that the two densities have equal variance.

4 Classification when distributions have equal variance
[Figure: the two class-conditional densities and the log posterior ratio as a function of height.]

5 Estimating the decision boundary between data of equal variance
Suppose the distribution of the data in each class is Gaussian. The decision boundary between any two classes is where the log of the posterior ratio is zero. If the data in each class have a Gaussian density with equal variance, then the boundary between any two classes is a line.
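In one dimension this can be checked directly: setting the log posterior ratio to zero and solving for x gives a single boundary point (a hyperplane in higher dimensions), because the x² terms cancel when the variances are equal. The parameters below are illustrative assumptions.

```python
import numpy as np

# Illustrative equal-variance Gaussian classes and priors
mu1, mu2, sigma = 165.0, 180.0, 7.0
p1, p2 = 0.5, 0.5

# Closed form from solving log[p(x|1)p(1)] - log[p(x|2)p(2)] = 0 for x
boundary = (mu1 + mu2) / 2 + sigma**2 * np.log(p2 / p1) / (mu1 - mu2)

def log_ratio(x):
    # Log posterior ratio; linear in x because the x**2 terms cancel
    return (-(x - mu1)**2 + (x - mu2)**2) / (2 * sigma**2) + np.log(p1 / p2)

print(boundary)                  # midpoint of the means when priors are equal
print(abs(log_ratio(boundary)))  # zero at the boundary
```

With equal priors the boundary sits exactly halfway between the two means; unequal priors shift it toward the less probable class.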

6 Estimating the decision boundary from estimated densities
From the data we can get an ML estimate of the Gaussian parameters of each class. Each pairwise log ratio gives us a line; with three classes (Class 1, Class 2, Class 3), that is a total of 3 lines. The winning class for each region is the class that has the largest numerator in the posterior probability ratio.

7 Relationship between Bayesian classification and Fisher discriminant
If we have two classes, class -1 and class +1, then the decision boundary is where the discriminant function equals 0. For the Bayesian classifier, under the assumption of equal variance, the decision boundary is where the log posterior ratio equals 0. The Fisher decision boundary is the same as the Bayesian one when the two classes have equal variance and equal prior probability.
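This equivalence can be checked numerically. With a shared covariance S and equal priors, the Bayes log ratio reduces to wᵀ(x − (m1+m2)/2) with w = S⁻¹(m1 − m2), which is the Fisher direction with the threshold at the midpoint of the projected means. The means and covariance below are illustrative assumptions.

```python
import numpy as np

S = np.array([[2.0, 0.5], [0.5, 1.0]])   # shared covariance (illustrative)
m1 = np.array([0.0, 0.0])
m2 = np.array([3.0, 1.0])

w = np.linalg.solve(S, m1 - m2)          # Fisher (and Bayes) direction
midpoint = (m1 + m2) / 2

def bayes_score(x):
    # Difference of Gaussian log likelihoods (equal priors, shared S)
    d1, d2 = x - m1, x - m2
    return -0.5 * d1 @ np.linalg.solve(S, d1) + 0.5 * d2 @ np.linalg.solve(S, d2)

def fisher_score(x):
    # Project onto w and threshold at the midpoint of the projected means
    return w @ (x - midpoint)

x = np.array([1.2, -0.7])
print(np.isclose(bayes_score(x), fisher_score(x)))  # the two rules agree
```

The quadratic terms cancel only because the covariance is shared; with unequal covariances the two rules differ, which is the subject of the quadratic discriminant below.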

8 Classification when distributions have unequal variance
What we have: class-conditional densities with unequal variance. Classification: pick the class with the larger posterior. Assume equal priors. [Figure: the two class-conditional densities (unequal variance) and the resulting posterior probabilities.]

9 [Figure: the unequal-variance class-conditional densities and the log posterior ratio as a function of height.]

10 Quadratic discriminant: when data comes from unequal variance Gaussians
The decision boundary between any two classes is where the log of the ratio is zero. If the data in each class have a Gaussian density with unequal variance, then the boundary between any two classes is a quadratic function of x. [Figure: quadratic decision boundaries between the red, green, and blue classes.]
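In one dimension the quadratic boundary is easy to see: with unequal variances the x² terms in the log ratio no longer cancel, so there can be up to two boundary points instead of one. The class parameters below are illustrative assumptions.

```python
import numpy as np

# Illustrative 1-D Gaussian classes with unequal variance, equal priors
mu1, s1 = 165.0, 5.0
mu2, s2 = 180.0, 12.0

def log_ratio(x):
    # log[p(x|1)/p(x|2)]: quadratic in x because the variances differ
    return (np.log(s2 / s1)
            - (x - mu1)**2 / (2 * s1**2)
            + (x - mu2)**2 / (2 * s2**2))

# Polynomial coefficients of log_ratio(x) = a*x**2 + b*x + c
a = -1 / (2 * s1**2) + 1 / (2 * s2**2)
b = mu1 / s1**2 - mu2 / s2**2
c = np.log(s2 / s1) - mu1**2 / (2 * s1**2) + mu2**2 / (2 * s2**2)

roots = np.roots([a, b, c])
print(sorted(roots))  # two boundary points, not one
```

The narrow-variance class wins on an interval between the two roots, and the wide-variance class wins on both sides of it.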

11 Non-parametric estimate of densities: Kernel density estimate
Suppose we have points x(i) that belong to class l, and we cannot assume that these points come from a Gaussian distribution. To estimate the density, we need to form a function that assigns a weight to each point x in our space, with the integral of this function equal to 1. Intuitively, the more data points x(i) we find near x, the larger the weight at x should be. The kernel density estimate puts a Gaussian (the kernel) centered at each data point. Where there are more data points, there are more Gaussians, and their sum is the density estimate. [Figure: histogram of the sampled data belonging to class l; ML estimate of a Gaussian density; density estimate using a Gaussian kernel.]
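The estimate above can be sketched directly: one Gaussian bump per sample, averaged so the result integrates to 1. The sample points and the bandwidth h are illustrative assumptions.

```python
import numpy as np

# Illustrative samples from class l, clustered in two groups plus an outlier
samples = np.array([-8.0, -7.5, -6.0, 0.5, 1.0, 1.2, 9.0])
h = 1.5  # kernel bandwidth: the standard deviation of each bump

def kde(x):
    # Average of N(x; x_i, h^2) over all samples; averaging (rather than
    # summing) keeps the total integral equal to 1.
    return np.mean(np.exp(-(x - samples)**2 / (2 * h**2))
                   / (h * np.sqrt(2 * np.pi)))

# Numerical check that the estimate integrates to ~1 on a wide grid
grid = np.linspace(-30.0, 30.0, 2001)
density = np.array([kde(x) for x in grid])
print(np.sum(density) * (grid[1] - grid[0]))  # close to 1.0
```

The bandwidth h plays the role the bin width plays for a histogram: too small and the estimate is spiky, too large and it smooths away real structure.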

12 Non-parametric estimate of densities: Kernel density estimate
[Figure: kernel density estimates for the red, green, and blue classes and the resulting decision boundaries.]

13 Classification with missing data
Suppose that we have built a Bayesian classifier and are now given a new data point to classify, but this new data point is missing some of the “features” that we normally expect to see. In the example below, we have two features (x1 and x2) and four classes; the likelihood function of each class is plotted. [Figure: likelihood contours of the four classes in the (x1, x2) plane.] Suppose we are given the data point (*, -1) to classify; this data point is missing a value for x1. If we assume the missing value is the average of the previously observed x1, then we would estimate it to be about 1. Assuming that the prior probabilities are equal among the four classes, we would classify (1, -1) as class c2. However, c4 is a better choice, because when x2 = -1, c4 is the most likely class: it has the highest likelihood.
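The contrast between imputing the mean and handling the missing feature properly can be sketched as follows. For Gaussian likelihoods with independent features, integrating the missing x1 out of the joint leaves just the 1-D marginal density of x2. The four class parameters below are illustrative assumptions chosen to reproduce the slide's situation (imputation picks c2, the marginal picks c4), not values from the lecture.

```python
import numpy as np

# Illustrative class parameters: (mean, sd) for x1 and x2, independent axes
classes = {
    "c1": {"mu": (-2.0,  2.0), "sd": (1.0, 1.0)},
    "c2": {"mu": ( 1.0,  0.5), "sd": (1.0, 1.0)},
    "c3": {"mu": ( 5.0,  2.0), "sd": (1.0, 1.0)},
    "c4": {"mu": ( 6.0, -1.0), "sd": (3.0, 0.5)},
}

def gauss(x, mu, sd):
    return np.exp(-(x - mu)**2 / (2 * sd**2)) / (sd * np.sqrt(2 * np.pi))

def classify_imputed(x1, x2):
    # Fill in the missing x1 with a guess, then use the joint likelihood
    scores = {c: gauss(x1, p["mu"][0], p["sd"][0]) * gauss(x2, p["mu"][1], p["sd"][1])
              for c, p in classes.items()}
    return max(scores, key=scores.get)

def classify_marginal(x2):
    # Integrate the missing x1 out of the joint likelihood; for these
    # Gaussians that leaves the 1-D marginal density of x2
    scores = {c: gauss(x2, p["mu"][1], p["sd"][1]) for c, p in classes.items()}
    return max(scores, key=scores.get)

print(classify_imputed(1.0, -1.0))  # imputing x1 = 1 picks c2
print(classify_marginal(-1.0))      # marginalizing x1 picks c4
```

Imputing drags the point toward a class that happens to sit near the guessed value, while marginalizing uses only the evidence actually observed.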

14 Classification with missing data
[Figure: partitioning the feature vector into good data and bad (or missing) data.]

