Slide 1: Learning Gaussian Bayes Classifiers
Andrew W. Moore
Associate Professor, School of Computer Science, Carnegie Mellon University
www.cs.cmu.edu/~awm | awm@cs.cmu.edu | 412-268-7599
Sep 10th, 2001. Copyright © 2001, Andrew W. Moore.

Note to other teachers and users of these slides: Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the following link to the source repository of Andrew's tutorials: http://www.cs.cmu.edu/~awm/tutorials. Comments and corrections gratefully received.
Slide 2: Maximum Likelihood Learning of Gaussians for Classification
Why we should care
3 seconds to teach you a new learning algorithm
What if there are 10,000 dimensions?
What if there are categorical inputs?
Examples "out the wazoo"
Slide 3: Why We Should Care
One of the original "Data Mining" algorithms
Very simple and effective
Demonstrates the usefulness of our earlier groundwork
Slide 4: Where we were at the end of the MLE lecture…

                                     Categorical inputs only        Mixed Real/Cat okay   Real-valued inputs only
Classifier (predict category)        Dec Tree, Joint BC, Naïve BC   --                    --
Density Estimator (probability)      Joint DE, Naïve DE             --                    Gauss DE
Regressor (predict real no.)         --                             --                    --
Slide 5: This lecture…

                                     Categorical inputs only        Mixed Real/Cat okay   Real-valued inputs only
Classifier (predict category)        Dec Tree, Joint BC, Naïve BC   --                    Gauss BC
Density Estimator (probability)      Joint DE, Naïve DE             --                    Gauss DE
Regressor (predict real no.)         --                             --                    --
Slide 6: Road Map
Probability → PDFs → Gaussians → MLE of Gaussians → MLE Density Estimation → Bayes Classifiers → Decision Trees
Slide 7: Road Map
Probability → PDFs → Gaussians → MLE of Gaussians → MLE Density Estimation → Bayes Classifiers → Gaussian Bayes Classifiers → Decision Trees
Slide 8: Gaussian Bayes Classifier Assumption
The i'th record in the database is created using the following algorithm:
1. Generate the output (the "class") by drawing $y_i \sim \mathrm{Multinomial}(p_1, p_2, \ldots, p_{N_y})$.
2. Generate the inputs from a Gaussian PDF that depends on the value of $y_i$: $x_i \sim N(\mu_{y_i}, \Sigma_{y_i})$.
Test your understanding: given $N_y$ classes and $m$ input attributes, how many distinct scalar parameters need to be estimated?
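One way to do the count (a sketch of the answer, which the slide leaves as an exercise): the multinomial has $N_y - 1$ free probabilities, and each class needs an $m$-dimensional mean plus a symmetric $m \times m$ covariance:

$$(N_y - 1) + N_y \left( m + \frac{m(m+1)}{2} \right)$$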
Slide 9: MLE Gaussian Bayes Classifier
Under the same generative assumption, let $DB_i$ = the subset of database $DB$ in which the output class is $y = i$. Then

$$p_i^{mle} = \frac{|DB_i|}{|DB|}$$
Slides 10-11: MLE Gaussian Bayes Classifier (continued)
$(\mu_i^{mle}, \Sigma_i^{mle})$ = the MLE Gaussian fitted to $DB_i$, i.e. the sample mean and MLE covariance of the input vectors in $DB_i$.
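A minimal sketch of the fit just described, assuming numpy arrays X (n × m reals) and y (n integer class labels); the function and field names are mine, not from the slides:

```python
import numpy as np

def fit_gaussian_bayes(X, y):
    """MLE Gaussian Bayes classifier: a prior, mean, and covariance per class."""
    model = {}
    for c in np.unique(y):
        Xc = X[y == c]  # DB_c: the records whose class is c
        model[c] = {
            "prior": len(Xc) / len(X),                   # p_c^mle = |DB_c| / |DB|
            "mean": Xc.mean(axis=0),                     # mu_c^mle
            "cov": np.cov(Xc, rowvar=False, bias=True),  # Sigma_c^mle (bias=True gives the 1/n MLE)
        }
    return model
```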
Slide 12: Gaussian Bayes Classification
[Equation slide: the image did not survive extraction.]
Slide 13: Gaussian Bayes Classification
[Equation slide: the image did not survive extraction.]
How do we deal with that?
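The equations on slides 12-13 were images; as a sketch of the standard rule a Gaussian Bayes classifier applies (Bayes rule over the class-conditional Gaussians), reusing the model dictionary fitted above:

```python
import numpy as np
from scipy.stats import multivariate_normal

def predict(model, x):
    """Predict argmax_c  p_c * N(x; mu_c, Sigma_c), computed in log space.

    The evidence p(x) is identical for every class, so it drops out of the
    argmax; logs avoid numerical underflow in high dimensions.
    """
    def log_score(c):
        g = model[c]
        return np.log(g["prior"]) + multivariate_normal.logpdf(x, g["mean"], g["cov"])
    return max(model, key=log_score)
```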
Slide 14: Here is a dataset
48,000 records, 16 attributes [Kohavi 1995]
Slides 15-16: Predicting wealth from age
[Figure slides.]
Slide 17: Wealth from hours worked
[Figure slide.]
Slide 18: Wealth from years of education
[Figure slide.]
Slides 19-20: Predicting wealth from (age, hours)
[Figure slides.]
Slides 21-22: Predicting wealth from (age, hours)
[Figure slides.]
Having 2 inputs instead of one helps in two ways:
1. Combining evidence from two 1-d Gaussians
2. Off-diagonal covariance distinguishes class "shape"
Slides 23-24: Predicting wealth from (age, edunum)
[Figure slides.]
Slides 25-26: Predicting wealth from (hours, edunum)
[Figure slides.]
Slide 27: Accuracy
[Results figure.]
Slides 28-29: An "MPG" example
[Figure slides.]
Slide 30: An "MPG" example
[Figure slide.]
Things to note:
Class boundaries can be weird shapes (hyperconic sections)
Class regions can be non-simply-connected
But it's impossible to model arbitrarily weirdly shaped regions
Test your understanding: with one input, must classes be simply connected?
Slides 31-32: Overfitting dangers
Problem with the "Joint" Bayes classifier: the number of parameters is exponential in the number of dimensions. This means we just memorize the training data, and can overfit.
Problemette with the Gaussian Bayes classifier: the number of parameters is quadratic in the number of dimensions. With 10,000 dimensions and only 1,000 datapoints we could overfit.
Question: any suggested solutions?
Slides 33-34: General: O(m²) parameters
[Figure slides.]
Slides 35-36: Aligned: O(m) parameters
[Figure slides.]
Slides 37-38: Spherical: O(1) covariance parameters
[Figure slides.]
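A sketch of the three covariance restrictions named on slides 33-38, under the same numpy conventions as before (the kind names are mine):

```python
import numpy as np

def fit_class_covariance(Xc, kind="general"):
    """MLE covariance for one class under the three restrictions.

    general:   full symmetric matrix  -> m(m+1)/2 free parameters
    aligned:   axis-aligned diagonal  -> m free parameters
    spherical: sigma^2 * identity     -> 1 free parameter
    """
    m = Xc.shape[1]
    if kind == "general":
        return np.cov(Xc, rowvar=False, bias=True)
    if kind == "aligned":
        return np.diag(Xc.var(axis=0))             # per-attribute variances only
    if kind == "spherical":
        return Xc.var(axis=0).mean() * np.eye(m)   # one pooled variance for all attributes
    raise ValueError(f"unknown kind: {kind}")
```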
Slides 39-40: BCs that have both real and categorical inputs?

                                     Categorical inputs only        Mixed Real/Cat okay   Real-valued inputs only
Classifier (predict category)        Dec Tree, Joint BC, Naïve BC   BC Here???            Gauss BC
Density Estimator (probability)      Joint DE, Naïve DE             --                    Gauss DE
Regressor (predict real no.)         --                             --                    --

Easy! Guess how?
Slides 41-42: BCs that have both real and categorical inputs?

                                     Categorical inputs only        Mixed Real/Cat okay              Real-valued inputs only
Classifier (predict category)        Dec Tree, Joint BC, Naïve BC   Gauss/Joint BC, Gauss Naïve BC   Gauss BC
Density Estimator (probability)      Joint DE, Naïve DE             Gauss/Joint DE, Gauss Naïve DE   Gauss DE
Regressor (predict real no.)         --                             --                               --
Slide 43: Mixed Categorical / Real Density Estimation
Write $x = (u, v) = (u_1, u_2, \ldots, u_q, v_1, v_2, \ldots, v_{m-q})$, where the $u_j$ are real-valued and the $v_j$ are categorical.
$P(x|M) = P(u, v|M)$ (where $M$ is any density estimation model).
Slide 44: Not sure which tasty DE to enjoy? Try our… Joint / Gauss DE Combo
$P(u, v|M) = P(u|v, M)\, P(v|M)$
$P(u|v, M)$: a Gaussian whose parameters depend on $v$.
$P(v|M)$: a big $(m-q)$-dimensional lookup table.
Slides 45-46: MLE learning of the Joint / Gauss DE Combo
$P(u, v|M) = P(u|v, M)\, P(v|M)$, with $u|v, M \sim N(\mu_v, \Sigma_v)$ and $P(v|M) = q_v$, where
$\mu_v$ = mean of $u$ among records matching $v$
$\Sigma_v$ = covariance of $u$ among records matching $v$
$q_v$ = fraction of records that match $v$
(writing $R_v$ = number of records that match $v$, so $q_v = R_v / R$ for $R$ records in total; the explicit mean and covariance formulas on slide 46 were images).
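A sketch of that fit, assuming U holds the real attributes (n × q) and V the categorical part (n × (m−q)); the helper names are mine:

```python
from collections import defaultdict
import numpy as np

def fit_joint_gauss_de(U, V):
    """Joint / Gauss DE combo: P(u, v) = q_v * N(u; mu_v, Sigma_v)."""
    groups = defaultdict(list)
    for u, v in zip(U, V):
        groups[tuple(np.atleast_1d(v))].append(u)   # one bucket per distinct v
    model = {}
    for v, us in groups.items():
        us = np.asarray(us)
        model[v] = {
            "q": len(us) / len(U),                       # q_v = R_v / R
            "mean": us.mean(axis=0),                     # mu_v
            "cov": np.cov(us, rowvar=False, bias=True),  # Sigma_v
        }
    return model
```

Note that rare values of v give badly conditioned covariance estimates; that is exactly the overfitting worry slide 54 returns to.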
Slide 47: Gender and Hours Worked*
[Figure slide.]
*As with all the results from the UCI "adult census" dataset, we can't draw any real-world conclusions, since it's such a non-real-world sample.
Slide 48: Joint / Gauss DE Combo
What we just did.
[Figure slide.]
Slide 49: Joint / Gauss BC Combo
What we do next.
[Figure slide.]
Slide 50: Joint / Gauss BC Combo
[Equation slide: the image did not survive extraction.]
Slide 51: Joint / Gauss BC Combo
Rather so-so notation for "Gaussian with mean $\mu_{i,v}$ and covariance $\Sigma_{i,v}$, evaluated at $u$", where
$\mu_{i,v}$ = mean of $u$ among records matching $v$ and in which $y = i$
$\Sigma_{i,v}$ = covariance of $u$ among records matching $v$ and in which $y = i$
$q_{i,v}$ = fraction of "$y = i$" records that match $v$
$p_i$ = fraction of records that match "$y = i$"
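Putting slide 51's pieces together as code: a hedged sketch in which v is a single categorical attribute for brevity (the slides allow several), scoring the rule argmax_i p_i · q_{i,v} · N(u; mu_{i,v}, Sigma_{i,v}):

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_joint_gauss_bc(U, V, y):
    """Per class i: p_i plus a per-v table of (q_{i,v}, mu_{i,v}, Sigma_{i,v})."""
    model = {}
    for i in np.unique(y):
        Ui, Vi = U[y == i], V[y == i]
        table = {}
        for v in np.unique(Vi):
            match = Vi == v
            table[v] = {"q": match.mean(),                # q_{i,v}
                        "mean": Ui[match].mean(axis=0),   # mu_{i,v}
                        "cov": np.cov(Ui[match], rowvar=False, bias=True)}  # Sigma_{i,v}
        model[i] = {"p": (y == i).mean(), "table": table}  # p_i
    return model

def predict_joint_gauss_bc(model, u, v):
    """argmax_i  p_i * q_{i,v} * N(u; mu_{i,v}, Sigma_{i,v}), in log space."""
    def log_score(i):
        g = model[i]["table"][v]   # raises KeyError if v was never seen with class i
        return (np.log(model[i]["p"]) + np.log(g["q"])
                + multivariate_normal.logpdf(u, g["mean"], g["cov"]))
    return max(model, key=log_score)
```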
Slides 52-53: Predicting Wealth from (Gender, Hours)
[Figure slides.]
Slide 54: Joint / Gauss DE Combo and Joint / Gauss BC Combo: the downside
(Yawn… we've done this before…)
More than a few categorical attributes blah blah blah massive table blah blah lots of parameters blah blah just memorize training data blah blah blah do worse on future data blah blah need to be more conservative blah.
Slides 55-56: Naïve/Gauss combo for Density Estimation
How many parameters?
[Equation slides: the model formulas, split into a Real part and a Categorical part, were images.]
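A sketch of the count, on the assumption (this is the naïve factorization, not spelled out in the surviving text) that each real attribute gets its own 1-d Gaussian and each categorical attribute its own multinomial:

$$\underbrace{2q}_{\text{a mean and a variance per real } u_j} \;+\; \sum_{j=1}^{m-q} \bigl(\mathrm{arity}(v_j) - 1\bigr)$$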
Slides 57-58: Naïve/Gauss DE Example
[Figure slides.]
Slide 59: Naïve / Gauss BC
$\mu_{ij}$ = mean of $u_j$ among records in which $y = i$
$\sigma^2_{ij}$ = variance of $u_j$ among records in which $y = i$
$q_{ij}[h]$ = fraction of "$y = i$" records in which $v_j = h$
$p_i$ = fraction of records that match "$y = i$"
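A sketch of the fit for slide 59's parameters, under the same numpy conventions as before; prediction would multiply p_i by a 1-d Gaussian density per real attribute and by q_ij[v_j] per categorical attribute, again best done in log space:

```python
import numpy as np

def fit_gauss_naive_bc(U, V, y):
    """Gauss/Naive BC: within each class, every attribute is modelled independently."""
    model = {}
    for i in np.unique(y):
        Ui, Vi = U[y == i], V[y == i]
        model[i] = {
            "p": (y == i).mean(),    # p_i
            "mu": Ui.mean(axis=0),   # mu_ij for each real attribute j
            "var": Ui.var(axis=0),   # sigma^2_ij for each real attribute j
            "q": [{h: (Vi[:, j] == h).mean() for h in np.unique(V[:, j])}
                  for j in range(V.shape[1])],   # q_ij[h] frequency tables
        }
    return model
```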
Slides 60-61: Gauss / Naïve BC Example
[Figure slides.]
Slide 62: Learn Wealth from 15 attributes
[Results figure.]
Slide 63: Learn Wealth from 15 attributes
Same data, except all real values discretized to 3 levels.
[Results figure.]
Slide 64: Learn Race from 15 attributes
[Results figure.]
Slide 65: What you should know
A lot of this should have just been a corollary of what you already knew.
Turning Gaussian DEs into Gaussian BCs.
Mixing categorical and real-valued attributes.
Slide 66: Questions to Ponder
Suppose you wanted to create an example dataset where a BC involving Gaussians crushed decision trees like a bug. What would you do?
Could you combine Decision Trees and Bayes Classifiers? How? (Maybe there is more than one possible way.)