Slide 1: Learning Gaussian Bayes Classifiers
Andrew W. Moore
Associate Professor, School of Computer Science, Carnegie Mellon University
www.cs.cmu.edu/~awm | awm@cs.cmu.edu | 412-268-7599
Sep 10th, 2001. Copyright © 2001, Andrew W. Moore.

Note to other teachers and users of these slides: Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the following link to the source repository of Andrew's tutorials: http://www.cs.cmu.edu/~awm/tutorials. Comments and corrections gratefully received.
Slide 2: Maximum Likelihood Learning of Gaussians for Classification
Why we should care
3 seconds to teach you a new learning algorithm
What if there are 10,000 dimensions?
What if there are categorical inputs?
Examples "out the wazoo"
Slide 3: Why We Should Care
One of the original "Data Mining" algorithms
Very simple and effective
Demonstrates the usefulness of our earlier groundwork
Slide 4: Where we were at the end of the MLE lecture…

                                     Categorical inputs only        Mixed Real/Cat okay   Real-valued inputs only
Classifier (predict category)        Dec Tree, Joint BC, Naïve BC   --                    --
Density Estimator (probability)      Joint DE, Naïve DE             --                    Gauss DE
Regressor (predict real no.)         --                             --                    --
Slide 5: This lecture…

                                     Categorical inputs only        Mixed Real/Cat okay   Real-valued inputs only
Classifier (predict category)        Dec Tree, Joint BC, Naïve BC   --                    Gauss BC
Density Estimator (probability)      Joint DE, Naïve DE             --                    Gauss DE
Regressor (predict real no.)         --                             --                    --
Slide 6: Road Map
Probability → PDFs → Gaussians → MLE of Gaussians → MLE Density Estimation → Bayes Classifiers → Decision Trees
Slide 7: Road Map
Probability → PDFs → Gaussians → MLE of Gaussians → MLE Density Estimation → Bayes Classifiers → Gaussian Bayes Classifiers → Decision Trees
Slide 8: Gaussian Bayes Classifier Assumption
The i'th record in the database is created using the following algorithm:
1. Generate the output (the "class") by drawing $y_i \sim \mathrm{Multinomial}(p_1, p_2, \ldots, p_{N_y})$.
2. Generate the inputs from a Gaussian PDF that depends on the value of $y_i$: $x_i \sim N(\mu_{y_i}, \Sigma_{y_i})$.
Test your understanding: given $N_y$ classes and $m$ input attributes, how many distinct scalar parameters need to be estimated?
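One way to do the count (a sketch of the answer, which the slide leaves as an exercise): the multinomial has $N_y - 1$ free probabilities, and each class needs an $m$-dimensional mean plus a symmetric $m \times m$ covariance:

$$(N_y - 1) + N_y \left( m + \frac{m(m+1)}{2} \right)$$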
Slide 9: MLE Gaussian Bayes Classifier
Under the same generative assumption, let $DB_i$ = the subset of database $DB$ in which the output class is $y = i$. Then

$$p_i^{mle} = \frac{|DB_i|}{|DB|}$$
Slides 10-11: MLE Gaussian Bayes Classifier (continued)
$(\mu_i^{mle}, \Sigma_i^{mle})$ = the MLE Gaussian fitted to $DB_i$, i.e. the sample mean and MLE covariance of the input vectors in $DB_i$.
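A minimal sketch of the fit just described, assuming numpy arrays X (n × m reals) and y (n integer class labels); the function and field names are mine, not from the slides:

```python
import numpy as np

def fit_gaussian_bayes(X, y):
    """MLE Gaussian Bayes classifier: a prior, mean, and covariance per class."""
    model = {}
    for c in np.unique(y):
        Xc = X[y == c]  # DB_c: the records whose class is c
        model[c] = {
            "prior": len(Xc) / len(X),                   # p_c^mle = |DB_c| / |DB|
            "mean": Xc.mean(axis=0),                     # mu_c^mle
            "cov": np.cov(Xc, rowvar=False, bias=True),  # Sigma_c^mle (bias=True gives the 1/n MLE)
        }
    return model
```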
Slide 12: Gaussian Bayes Classification
[Equation slide: the image did not survive extraction.]
Slide 13: Gaussian Bayes Classification
[Equation slide: the image did not survive extraction.]
How do we deal with that?
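The equations on slides 12-13 were images; as a sketch of the standard rule a Gaussian Bayes classifier applies (Bayes rule over the class-conditional Gaussians), reusing the model dictionary fitted above:

```python
import numpy as np
from scipy.stats import multivariate_normal

def predict(model, x):
    """Predict argmax_c  p_c * N(x; mu_c, Sigma_c), computed in log space.

    The evidence p(x) is identical for every class, so it drops out of the
    argmax; logs avoid numerical underflow in high dimensions.
    """
    def log_score(c):
        g = model[c]
        return np.log(g["prior"]) + multivariate_normal.logpdf(x, g["mean"], g["cov"])
    return max(model, key=log_score)
```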
Slide 14: Here is a dataset
48,000 records, 16 attributes [Kohavi 1995]
Slides 15-16: Predicting wealth from age
[Figure slides.]
Slide 17: Wealth from hours worked
[Figure slide.]
Slide 18: Wealth from years of education
[Figure slide.]
Slides 19-20: Predicting wealth from (age, hours)
[Figure slides.]
Slides 21-22: Predicting wealth from (age, hours)
[Figure slides.]
Having 2 inputs instead of one helps in two ways:
1. Combining evidence from two 1-d Gaussians
2. Off-diagonal covariance distinguishes class "shape"
Slides 23-24: Predicting wealth from (age, edunum)
[Figure slides.]
Slides 25-26: Predicting wealth from (hours, edunum)
[Figure slides.]
Slide 27: Accuracy
[Results figure.]
Slides 28-29: An "MPG" example
[Figure slides.]
Slide 30: An "MPG" example
[Figure slide.]
Things to note:
Class boundaries can be weird shapes (hyperconic sections)
Class regions can be non-simply-connected
But it's impossible to model arbitrarily weirdly shaped regions
Test your understanding: with one input, must classes be simply connected?
Slides 31-32: Overfitting dangers
Problem with the "Joint" Bayes classifier: the number of parameters is exponential in the number of dimensions. This means we just memorize the training data, and can overfit.
Problemette with the Gaussian Bayes classifier: the number of parameters is quadratic in the number of dimensions. With 10,000 dimensions and only 1,000 datapoints we could overfit.
Question: any suggested solutions?
Slides 33-34: General: O(m²) parameters
[Figure slides.]
Slides 35-36: Aligned: O(m) parameters
[Figure slides.]
Slides 37-38: Spherical: O(1) covariance parameters
[Figure slides.]
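A sketch of the three covariance restrictions named on slides 33-38, under the same numpy conventions as before (the kind names are mine):

```python
import numpy as np

def fit_class_covariance(Xc, kind="general"):
    """MLE covariance for one class under the three restrictions.

    general:   full symmetric matrix  -> m(m+1)/2 free parameters
    aligned:   axis-aligned diagonal  -> m free parameters
    spherical: sigma^2 * identity     -> 1 free parameter
    """
    m = Xc.shape[1]
    if kind == "general":
        return np.cov(Xc, rowvar=False, bias=True)
    if kind == "aligned":
        return np.diag(Xc.var(axis=0))             # per-attribute variances only
    if kind == "spherical":
        return Xc.var(axis=0).mean() * np.eye(m)   # one pooled variance for all attributes
    raise ValueError(f"unknown kind: {kind}")
```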
Slides 39-40: BCs that have both real and categorical inputs?

                                     Categorical inputs only        Mixed Real/Cat okay   Real-valued inputs only
Classifier (predict category)        Dec Tree, Joint BC, Naïve BC   BC Here???            Gauss BC
Density Estimator (probability)      Joint DE, Naïve DE             --                    Gauss DE
Regressor (predict real no.)         --                             --                    --

Easy! Guess how?
Slides 41-42: BCs that have both real and categorical inputs?

                                     Categorical inputs only        Mixed Real/Cat okay              Real-valued inputs only
Classifier (predict category)        Dec Tree, Joint BC, Naïve BC   Gauss/Joint BC, Gauss Naïve BC   Gauss BC
Density Estimator (probability)      Joint DE, Naïve DE             Gauss/Joint DE, Gauss Naïve DE   Gauss DE
Regressor (predict real no.)         --                             --                               --
Slide 43: Mixed Categorical / Real Density Estimation
Write $x = (u, v) = (u_1, u_2, \ldots, u_q, v_1, v_2, \ldots, v_{m-q})$, where the $u_j$ are real-valued and the $v_j$ are categorical.
$P(x|M) = P(u, v|M)$ (where $M$ is any density estimation model).
Slide 44: Not sure which tasty DE to enjoy? Try our… Joint / Gauss DE Combo
$P(u, v|M) = P(u|v, M)\, P(v|M)$
$P(u|v, M)$: a Gaussian whose parameters depend on $v$.
$P(v|M)$: a big $(m-q)$-dimensional lookup table.
Slides 45-46: MLE learning of the Joint / Gauss DE Combo
$P(u, v|M) = P(u|v, M)\, P(v|M)$, with $u|v, M \sim N(\mu_v, \Sigma_v)$ and $P(v|M) = q_v$, where
$\mu_v$ = mean of $u$ among records matching $v$
$\Sigma_v$ = covariance of $u$ among records matching $v$
$q_v$ = fraction of records that match $v$
(writing $R_v$ = number of records that match $v$, so $q_v = R_v / R$ for $R$ records in total; the explicit mean and covariance formulas on slide 46 were images).
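A sketch of that fit, assuming U holds the real attributes (n × q) and V the categorical part (n × (m−q)); the helper names are mine:

```python
from collections import defaultdict
import numpy as np

def fit_joint_gauss_de(U, V):
    """Joint / Gauss DE combo: P(u, v) = q_v * N(u; mu_v, Sigma_v)."""
    groups = defaultdict(list)
    for u, v in zip(U, V):
        groups[tuple(np.atleast_1d(v))].append(u)   # one bucket per distinct v
    model = {}
    for v, us in groups.items():
        us = np.asarray(us)
        model[v] = {
            "q": len(us) / len(U),                       # q_v = R_v / R
            "mean": us.mean(axis=0),                     # mu_v
            "cov": np.cov(us, rowvar=False, bias=True),  # Sigma_v
        }
    return model
```

Note that rare values of v give badly conditioned covariance estimates; that is exactly the overfitting worry slide 54 returns to.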
Slide 47: Gender and Hours Worked*
[Figure slide.]
*As with all the results from the UCI "adult census" dataset, we can't draw any real-world conclusions, since it's such a non-real-world sample.
Slide 48: Joint / Gauss DE Combo
What we just did.
[Figure slide.]
Slide 49: Joint / Gauss BC Combo
What we do next.
[Figure slide.]
Slide 50: Joint / Gauss BC Combo
[Equation slide: the image did not survive extraction.]
Slide 51: Joint / Gauss BC Combo
Rather so-so notation for "Gaussian with mean $\mu_{i,v}$ and covariance $\Sigma_{i,v}$, evaluated at $u$", where
$\mu_{i,v}$ = mean of $u$ among records matching $v$ and in which $y = i$
$\Sigma_{i,v}$ = covariance of $u$ among records matching $v$ and in which $y = i$
$q_{i,v}$ = fraction of "$y = i$" records that match $v$
$p_i$ = fraction of records that match "$y = i$"
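Putting slide 51's pieces together as code: a hedged sketch in which v is a single categorical attribute for brevity (the slides allow several), scoring the rule argmax_i p_i · q_{i,v} · N(u; mu_{i,v}, Sigma_{i,v}):

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_joint_gauss_bc(U, V, y):
    """Per class i: p_i plus a per-v table of (q_{i,v}, mu_{i,v}, Sigma_{i,v})."""
    model = {}
    for i in np.unique(y):
        Ui, Vi = U[y == i], V[y == i]
        table = {}
        for v in np.unique(Vi):
            match = Vi == v
            table[v] = {"q": match.mean(),                # q_{i,v}
                        "mean": Ui[match].mean(axis=0),   # mu_{i,v}
                        "cov": np.cov(Ui[match], rowvar=False, bias=True)}  # Sigma_{i,v}
        model[i] = {"p": (y == i).mean(), "table": table}  # p_i
    return model

def predict_joint_gauss_bc(model, u, v):
    """argmax_i  p_i * q_{i,v} * N(u; mu_{i,v}, Sigma_{i,v}), in log space."""
    def log_score(i):
        g = model[i]["table"][v]   # raises KeyError if v was never seen with class i
        return (np.log(model[i]["p"]) + np.log(g["q"])
                + multivariate_normal.logpdf(u, g["mean"], g["cov"]))
    return max(model, key=log_score)
```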
Slides 52-53: Predicting Wealth from (Gender, Hours)
[Figure slides.]
Slide 54: Joint / Gauss DE Combo and Joint / Gauss BC Combo: the downside
(Yawn… we've done this before…)
More than a few categorical attributes blah blah blah massive table blah blah lots of parameters blah blah just memorize training data blah blah blah do worse on future data blah blah need to be more conservative blah.
Slides 55-56: Naïve/Gauss combo for Density Estimation
How many parameters?
[Equation slides: the model formulas, split into a Real part and a Categorical part, were images.]
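A sketch of the count, on the assumption (this is the naïve factorization, not spelled out in the surviving text) that each real attribute gets its own 1-d Gaussian and each categorical attribute its own multinomial:

$$\underbrace{2q}_{\text{a mean and a variance per real } u_j} \;+\; \sum_{j=1}^{m-q} \bigl(\mathrm{arity}(v_j) - 1\bigr)$$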
Slides 57-58: Naïve/Gauss DE Example
[Figure slides.]
Slide 59: Naïve / Gauss BC
$\mu_{ij}$ = mean of $u_j$ among records in which $y = i$
$\sigma^2_{ij}$ = variance of $u_j$ among records in which $y = i$
$q_{ij}[h]$ = fraction of "$y = i$" records in which $v_j = h$
$p_i$ = fraction of records that match "$y = i$"
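A sketch of the fit for slide 59's parameters, under the same numpy conventions as before; prediction would multiply p_i by a 1-d Gaussian density per real attribute and by q_ij[v_j] per categorical attribute, again best done in log space:

```python
import numpy as np

def fit_gauss_naive_bc(U, V, y):
    """Gauss/Naive BC: within each class, every attribute is modelled independently."""
    model = {}
    for i in np.unique(y):
        Ui, Vi = U[y == i], V[y == i]
        model[i] = {
            "p": (y == i).mean(),    # p_i
            "mu": Ui.mean(axis=0),   # mu_ij for each real attribute j
            "var": Ui.var(axis=0),   # sigma^2_ij for each real attribute j
            "q": [{h: (Vi[:, j] == h).mean() for h in np.unique(V[:, j])}
                  for j in range(V.shape[1])],   # q_ij[h] frequency tables
        }
    return model
```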
Slides 60-61: Gauss / Naïve BC Example
[Figure slides.]
Slide 62: Learn Wealth from 15 attributes
[Results figure.]
Slide 63: Learn Wealth from 15 attributes
Same data, except all real values discretized to 3 levels.
[Results figure.]
Slide 64: Learn Race from 15 attributes
[Results figure.]
Slide 65: What you should know
A lot of this should have just been a corollary of what you already knew.
Turning Gaussian DEs into Gaussian BCs.
Mixing categorical and real-valued attributes.
Slide 66: Questions to Ponder
Suppose you wanted to create an example dataset where a BC involving Gaussians crushed decision trees like a bug. What would you do?
Could you combine Decision Trees and Bayes Classifiers? How? (Maybe there is more than one possible way.)