Bayes Classifier, Linear Regression
10701/15781 Recitation, January 29, 2008

Parts of these slides are from previous years' recitation and lecture notes, and from Prof. Andrew Moore's data mining tutorials.
Classification and Regression

Classification: learn the underlying function f: X (features) → Y (class, or category). E.g. words → "spam" or "not spam".
Regression: learn f: X (features) → Y (continuous values). E.g. GPA → salary.
Supervised Classification

Goal: find an unknown function f: X → Y (features → class), or equivalently P(Y|X).
Two kinds of classifier:
1. Find P(X|Y) and P(Y), then apply Bayes' rule (generative).
2. Find P(Y|X) directly (discriminative).
Classification: Learning P(Y|X)

1. Bayes' rule: P(Y|X) = P(X|Y)P(Y) / P(X) ∝ P(X|Y)P(Y). Learn P(X|Y) and P(Y): a "generative" classifier.
2. Learn P(Y|X) directly: a "discriminative" classifier (to be covered later in the class), e.g. logistic regression.
Generative Classifier: Bayes Classifier

Learn P(X|Y) and P(Y). Example: email classification with 3 classes, Y ∈ {spam, not spam, maybe}, and 10,000 binary features, X = ("Cash", "Rolex", ...).
How many parameters do we have?
P(Y): the 3 class probabilities sum to 1, so 2 free parameters.
P(X|Y): a full joint table over 10,000 binary features has 2^10000 - 1 free parameters per class, so 3 · (2^10000 - 1) in total.
Generative Learning: Naïve Bayes

Introduce conditional independence: P(X1, X2 | Y) = P(X1 | Y) P(X2 | Y).
For X = (X1, ..., Xn):
P(Y|X) = P(X|Y) P(Y) / P(X)
       = P(X1|Y) ... P(Xn|Y) P(Y) / P(X)
       = [∏i P(Xi|Y)] P(Y) / P(X)
Learn P(X1|Y), ..., P(Xn|Y) and P(Y) instead of learning the joint P(X1, ..., Xn | Y) directly.
Naïve Bayes

Same setup: 3 classes Y = {spam, not spam, maybe}, 10,000 binary features X = ("Cash", "Rolex", ...). Now, how many parameters?
P(Y): still 2 free parameters.
P(X|Y): one Bernoulli parameter P(Xi = 1 | Y = c) per feature per class, 3 × 10,000 = 30,000.
Far fewer parameters: a "simpler" model, less likely to overfit.
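The two parameter counts can be checked with a few lines of arithmetic, using the setup from the slides (3 classes, 10,000 binary features):

```python
# Parameter counting for the slides' email example:
# |Y| = 3 classes, d = 10,000 binary features.
classes, d = 3, 10_000

prior_params = classes - 1          # P(Y): probabilities sum to 1, so 2 are free

# Full Bayes: one joint table over all 2^d feature vectors per class,
# minus 1 per class because each table sums to 1.
full_bayes = classes * (2 ** d - 1)

# Naive Bayes: one Bernoulli parameter P(X_i = 1 | Y = c) per (feature, class).
naive_bayes = classes * d

print(prior_params, naive_bayes)    # 2 30000
```

`full_bayes` is an integer with about 3,000 digits, which is the whole point of the comparison.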
Full Bayes vs. Naïve Bayes: XOR

X1  X2 | Y
 1   0 | 1
 0   1 | 1
 1   1 | 0
 0   0 | 0

P(Y=1 | (X1, X2) = (0, 1)) = ?
Full Bayes: P(Y=1) = ?  P((X1, X2) = (0, 1) | Y=1) = ?
Naïve Bayes: P(Y=1) = ?  P((X1, X2) = (0, 1) | Y=1) = ?
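One way to work the quiz is to compute the empirical probabilities from the four-row table with exact fractions. Full Bayes recovers XOR exactly, while naïve Bayes, whose independence assumption is violated here, cannot do better than a coin flip:

```python
# Working the XOR query P(Y=1 | (X1,X2) = (0,1)) from the four-row table,
# with exact fractions and no smoothing.
from fractions import Fraction as F

data = [((1, 0), 1), ((0, 1), 1), ((1, 1), 0), ((0, 0), 0)]
query = (0, 1)

def prior(c):
    return F(sum(1 for _, y in data if y == c), len(data))

def full_likelihood(x, c):
    # P((X1,X2) = x | Y = c): fraction of class-c rows exactly equal to x
    rows = [xs for xs, y in data if y == c]
    return F(sum(1 for xs in rows if xs == x), len(rows))

def naive_likelihood(x, c):
    # prod_i P(X_i = x_i | Y = c): marginals multiplied together
    rows = [xs for xs, y in data if y == c]
    p = F(1)
    for i, xi in enumerate(x):
        p *= F(sum(1 for xs in rows if xs[i] == xi), len(rows))
    return p

def posterior(likelihood):
    num = {c: likelihood(query, c) * prior(c) for c in (0, 1)}
    return num[1] / (num[0] + num[1])

print(posterior(full_likelihood))   # 1    -- full Bayes gets XOR right
print(posterior(naive_likelihood))  # 1/2  -- naive Bayes cannot
```

So the slide's blanks resolve to P(Y=1) = 1/2 in both cases, but the likelihood of (0, 1) under Y=1 is 1/2 for full Bayes and 1/4 under the naïve factorization, which is what washes the signal out.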
Regression

Prediction of continuous variables, e.g. "I want to predict salaries from GPA."
Learn the mapping f: X → Y.
Linear regression: a model that is linear in the parameters, plus some noise.
Assume Gaussian noise; learn the MLE Θ.
1-Parameter Linear Regression

Model: yi = θxi + εi with εi ~ N(0, σ²); equivalently, P(yi | xi) = N(θxi, σ²).
MLE Θ? MLE σ²?
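Assuming the standard one-parameter model y = θx + Gaussian noise (no intercept; the slide's own equations were figures), maximizing the Gaussian likelihood is the same as minimizing squared error, which gives closed forms Θ = Σ xi·yi / Σ xi² and σ² = (1/n) Σ (yi − Θxi)². A minimal sketch:

```python
# MLE for the one-parameter model y = theta * x + Gaussian noise
# (standard form assumed; no intercept term).

def mle_theta(xs, ys):
    # Maximizing the Gaussian likelihood = minimizing squared error:
    # theta = sum(x_i * y_i) / sum(x_i^2)
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def mle_sigma2(xs, ys, theta):
    # MLE of the noise variance: the mean squared residual
    return sum((y - theta * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.1, 5.9]                 # roughly y = 2x plus noise
theta = mle_theta(xs, ys)            # close to 2
sigma2 = mle_sigma2(xs, ys, theta)
```

Note the MLE of σ² divides by n, not n − 1, so it is the (biased) maximum-likelihood variance rather than the unbiased sample variance.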
Multivariate Linear Regression

What if the inputs are vectors? Stack the data into matrices: X is n × k (n data points, k features each) and Y is n × 1.
MLE: Θ = (X^T X)^{-1} X^T Y.
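A sketch of the normal-equation solution Θ = (X^T X)^{-1} X^T Y on toy data (the data is mine, not from the slides); a linear solve is used instead of an explicit inverse, which is the numerically preferred form:

```python
# Normal-equation MLE for multivariate linear regression:
# theta = (X^T X)^{-1} X^T Y, computed via a linear solve.
import numpy as np

def mle_theta(X, Y):
    return np.linalg.solve(X.T @ X, X.T @ Y)

X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])           # n = 3 data points, k = 2 features
Y = X @ np.array([2.0, -1.0])        # exact linear data, so theta is recovered
theta = mle_theta(X, Y)              # [2., -1.]
```

With noisy data the same formula returns the least-squares fit rather than an exact recovery.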
Constant Term?

We may expect linear data that does not go through the origin, which a pure linear map Θ^T X cannot fit. Trick?
The Constant Term

Trick: add an extra input feature that is always 1. Its coefficient in Θ becomes the constant (intercept) term, and the same MLE formula applies unchanged.
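The usual trick, sketched on toy data of my own: append an always-1 feature to X so the intercept becomes one more entry of Θ, then reuse the normal equation as-is.

```python
# Constant-term trick: append an always-1 column so the intercept is
# learned as one more entry of theta (toy data, not from the slides).
import numpy as np

def add_ones(X):
    return np.hstack([X, np.ones((X.shape[0], 1))])

X = np.array([[0.0], [1.0], [2.0]])
Y = 3.0 * X[:, 0] + 5.0              # a line that misses the origin
Xa = add_ones(X)
theta = np.linalg.solve(Xa.T @ Xa, Xa.T @ Y)   # slope 3, intercept 5
```

Without the ones column the same solve would be forced through the origin and could not recover the offset of 5.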
Regression: Another Example

Assume the following model to fit the data. The model has one unknown parameter θ, to be learned from the data. What is the maximum likelihood estimate of θ?