1
3(+1) classifiers from the Bayesian world
21/03/2017
2
Bayes classifier
Bayes decision theory: P(ωj | x) = P(x | ωj) · P(ωj) / P(x)
Discriminant function in the case of a normal (Gaussian) likelihood
Parameter estimation: the form of the density is known, the density is defined by a few parameters, and the parameters are estimated from data
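A minimal sketch of this Bayes rule with Gaussian class-conditional likelihoods; the class names and parameter values below are illustrative assumptions, not taken from the slides:

    import numpy as np
    from scipy.stats import norm

    # Illustrative 1-D two-class setup (priors and Gaussians are made up).
    priors = {"leave": 0.3, "stay": 0.7}                          # P(omega_j)
    likelihoods = {"leave": norm(60, 10), "stay": norm(35, 12)}   # P(x | omega_j)

    def posterior(x):
        """Return P(omega_j | x) for every class via Bayes' rule."""
        joint = {c: likelihoods[c].pdf(x) * priors[c] for c in priors}  # P(x|omega_j) P(omega_j)
        evidence = sum(joint.values())                                   # P(x)
        return {c: joint[c] / evidence for c in joint}

    print(posterior(55.0))   # e.g. {'leave': ..., 'stay': ...}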
3
Example: predict leave? (yes / no) for a query record marked "?" from the attributes age (<21, 21–50, >50), debit (none, …), and income (<50K, 50K–200K, >200K).
4
Naïve Bayes
5
Naïve Bayes
The Naive Bayes classifier is a Bayes classifier where we assume conditional independence of the features given the class: P(x | ωj) = ∏i P(xi | ωj).
6
Naïve Bayes: two-category case
x = [x1, x2, …, xd]^t, where each xi is binary, and pi = P(xi = 1 | ω1), qi = P(xi = 1 | ω2)
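For reference, the standard form of this model (the usual textbook derivation, e.g. Duda–Hart, restated here rather than copied from the slide): the class-conditional likelihoods factor over the binary features, and the resulting discriminant is linear in x:

    P(\mathbf{x}\mid\omega_1)=\prod_{i=1}^{d} p_i^{x_i}(1-p_i)^{1-x_i},\qquad
    P(\mathbf{x}\mid\omega_2)=\prod_{i=1}^{d} q_i^{x_i}(1-q_i)^{1-x_i}

    g(\mathbf{x})=\sum_{i=1}^{d} x_i \ln\frac{p_i(1-q_i)}{q_i(1-p_i)}
      +\sum_{i=1}^{d}\ln\frac{1-p_i}{1-q_i}
      +\ln\frac{P(\omega_1)}{P(\omega_2)},\qquad \text{decide } \omega_1 \text{ if } g(\mathbf{x})>0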
8
Training Naive Bayes – MLE
Goal: estimate pi = P(xi = 1 | ω1) and qi = P(xi = 1 | ω2) from N training samples. Assuming the counts of xi = 1 are binomial, the maximum likelihood estimates are the relative frequencies, e.g. p̂i = (number of ω1 samples with xi = 1) / N1.
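A minimal sketch of the ML estimate as relative frequencies; the function name, variable names, and toy data are mine, not from the slide:

    import numpy as np

    def mle_estimates(X, y, cls):
        """ML estimate of P(x_i = 1 | class) for each binary feature x_i.

        X: (N, d) array of 0/1 features, y: (N,) array of class labels.
        Returns the relative frequency of x_i = 1 among the samples of `cls`.
        """
        Xc = X[y == cls]
        return Xc.mean(axis=0)        # count(x_i = 1, class) / N_class

    # toy data: 5 samples, 3 binary features
    X = np.array([[1, 0, 1], [1, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 1]])
    y = np.array([1, 1, 1, 2, 2])
    p_hat = mle_estimates(X, y, cls=1)   # estimates of p_i
    q_hat = mle_estimates(X, y, cls=2)   # estimates of q_i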
9
Training Naive Bayes – Bayes estimation
Beta distribution: X ~ Beta(a, b), with E[X] = a/(a+b) = 1/(1 + b/a)
10
Training Naive Bayes – Bayes estimation
Assume that pi and qi are binomial parameters; we use a Beta distribution to represent the uncertainty of the estimate … (the 2 steps of Bayes estimation) …
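As a reference point, this is the standard Beta–binomial conjugate update (the usual result for this setting, not necessarily the slide's exact derivation):

    p_i \sim \mathrm{Beta}(a,b) \;\Rightarrow\;
    p_i \mid \mathcal{D} \sim \mathrm{Beta}(a+k_i,\; b+N_1-k_i),\qquad
    \hat{p}_i = \mathbb{E}[p_i \mid \mathcal{D}] = \frac{a+k_i}{a+b+N_1}

where k_i is the number of class-ω1 training samples with xi = 1, out of N1 such samples.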
11
Training Naive Bayes – Bayes estimation (m-estimate)
In practice this avoids zero likelihood/posterior values: p̂i = (nc + m·p) / (n + m), where nc is the number of class samples with the feature value of interest and n is the number of class samples. m and p are constants (metaparameters): p is the prior guess for each pi, and m is the "equivalent sample size".
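A minimal sketch of the m-estimate; the function name and the values of m and p in the call are illustrative assumptions:

    def m_estimate(n_c, n, m, p):
        """m-estimate of a conditional probability.

        n_c: count of class samples with the feature value of interest,
        n:   total number of samples in the class,
        p:   prior guess for the probability,
        m:   equivalent sample size (weight of the prior).
        """
        return (n_c + m * p) / (n + m)

    # the case from the later example slide: 0 matching samples out of 2
    print(m_estimate(n_c=0, n=2, m=1, p=1/3))   # never exactly 0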
12
Naïve Bayes in practice
Not so naive: fast, easily distributable, low memory; a good choice if there are many features and potentially each feature can contribute to the solution.
13
Example (Naive Bayes with the m-estimate): the same customer table as before, with the priors P(ωj) estimated from the class frequencies. For the query record: P(age>50 | leave=yes) = (0 + m·p) / (2 + m), and similarly P(debit=none | leave=yes) and P(income>200K | leave=yes).
14
Generative vs. Discriminative Classifiers
Generative: models the data belonging to each class, i.e. how they are generated; in the Bayes classifier the likelihood P(x | ωj) and the prior P(ωj) are estimated.
Discriminative: the goal is the discrimination of the classes; the posterior P(ωj | x) is estimated directly.
(figures: graphical models over the features x1, x2, x3 for the two approaches)
15
Logistic Regression (Maximum Entropy Classifier)
Two-category case: Training (MLE):
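For reference, the standard two-class logistic regression posterior and its maximum-likelihood training objective (the usual formulation, assumed here since the slide's formulas are not reproduced):

    P(\omega_1\mid\mathbf{x})=\frac{1}{1+\exp\!\big(-(\mathbf{w}^{t}\mathbf{x}+w_0)\big)},\qquad
    P(\omega_2\mid\mathbf{x})=1-P(\omega_1\mid\mathbf{x})

    (\hat{\mathbf{w}},\hat{w}_0)=\arg\max_{\mathbf{w},w_0}\;
    \sum_{k=1}^{N}\Big[y_k\ln P(\omega_1\mid\mathbf{x}_k)+(1-y_k)\ln\big(1-P(\omega_1\mid\mathbf{x}_k)\big)\Big],
    \quad y_k\in\{0,1\}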
16
Non-parametric Bayes classifiers
17
Non-parametric estimation of densities
Non-parametric estimation techniques do not assume the form of the density. In a Bayes classifier this can mean non-parametric estimation of the likelihood P(x | ωj), i.e. a generative approach, or of the posterior P(ωj | x) directly, i.e. a discriminative approach.
18
Non-parametric estimation
Estimating p(x): the probability that a vector x falls in a region R is P = ∫R p(x′) dx′. P is a smoothed (or averaged) version of the density function p(x). For a sample of size n, the expected number of points k falling in R is E[k] = nP. (Pattern Classification, Chapter 2, Part 1)
19
Non-parametric estimation
Applying MLE for P gives P̂ = k/n. If p(x) is continuous and the region R is so small that p does not vary significantly within it, then P ≅ p(x)·V, where x is a point within R and V is the volume enclosed by R.
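Combining the two approximations gives the basic non-parametric density estimate used on the following slides (the standard derivation, e.g. Duda–Hart):

    \frac{k}{n}\simeq P=\int_{R} p(\mathbf{x}')\,d\mathbf{x}'\simeq p(\mathbf{x})\,V
    \quad\Longrightarrow\quad
    p(\mathbf{x})\simeq\frac{k/n}{V}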
20
Convergence of non-parametric estimations
The volume V needs to approach 0 if we want the estimate p(x) ≅ (k/n)/V to converge to the true density. In practice, however, V cannot be made arbitrarily small, since the number of samples is always limited.
21
Convergence of non-parametric estimations
Three necessary conditions must hold if we want pn(x) = (kn/n)/Vn to converge to p(x): lim Vn = 0, lim kn = ∞, and lim kn/n = 0 (as n → ∞).
22
23
Parzen windows
The volume and the shape of R are fixed, so Vn is constant for a given n. p(x) is estimated from the count k of training points that fall in R around x: pn(x) = (k/n)/Vn.
24
Parzen windows – hypercube
R is a d-dimensional hypercube with edge length hn, so Vn = hn^d. The window function φ((x − xi)/hn) is equal to unity if xi falls within the hypercube of volume Vn centered at x, and equal to zero otherwise (φ is called a kernel or window function).
25
Parzen windows – hypercube
The number of samples in this hypercube: kn = Σi=1..n φ((x − xi)/hn).
26
Generalized Parzen Windows
pn(x) estimates p(x) as an average of window (kernel) values between x and the samples xi (i = 1, …, n): pn(x) = (1/n) Σi=1..n (1/Vn) φ((x − xi)/hn). The kernel φ can be any suitable function of x and xi.
27
Parzen windows - example
p(x) ~ N(0,1), kernel φ(u) = (1/√(2π)) exp(−u²/2), and hn = h1/√n (n > 1)
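A minimal sketch of this Gaussian-kernel Parzen estimate in one dimension; the function name, h1, and the sample are illustrative choices, not values from the slide:

    import numpy as np

    def parzen_gaussian(x, samples, h1):
        """Parzen window estimate p_n(x) with a Gaussian kernel and h_n = h1/sqrt(n)."""
        n = len(samples)
        hn = h1 / np.sqrt(n)
        u = (x - samples[:, None]) / hn                   # (n, len(x)) scaled distances
        phi = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)    # kernel values
        return phi.mean(axis=0) / hn                      # (1/n) * sum (1/V_n) * phi, V_n = h_n

    rng = np.random.default_rng(0)
    samples = rng.normal(0.0, 1.0, size=100)              # draws from the true N(0,1)
    xs = np.linspace(-3, 3, 7)
    print(parzen_gaussian(xs, samples, h1=1.0))           # should roughly follow the N(0,1) pdf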
28–31 (figure-only slides: Parzen-window estimates for this example)
32
p(x) = ?
33
Real generator: p(x) = λ1·U(a,b) + λ2·T(c,d), a mixture of a uniform and a triangular density.
34
Parzen windows as classifiers
Parzen windows are used to model/estimate the multidimensional likelihood, which yields a generative classifier. The decision surfaces/regions depend strongly on the kernel and the kernel width (window length h).
35
36
Example (Parzen windows on the customer table): estimate the joint likelihood P(age>50, debit=none, income>200K | leave=yes) directly with a Parzen window; the kernel φ can, for instance, be based on the number of features in which the values of x and xi differ.
37
k nearest neighbor estimation
A solution to the problem of the unknown "best" window function: let the cell volume be a function of the training data. Center a cell about x and let it grow until it captures kn samples (kn = f(n)); these samples are called the kn nearest neighbors of x. There are two possibilities: if the density near x is high, the cell will be small and the resolution will be good; if the density is low, the cell will grow large and stops growing when it reaches a region of higher density. A family of estimates is obtained with the choice kn = k1·√n, for different choices of k1.
38
(figure) © Ethem Alpaydin: Introduction to Machine Learning, 2nd edition (2010)
39
k nearest neighbor classifier (knn)
Direct estimation of P(ωi | x) from n training samples: take the smallest region R around x which includes k samples out of the n; if ki of these k samples are labeled ωi, then pn(x, ωi) = ki/(nV).
40
k nearest neighbor classifier (knn)
ki/k is the fraction of the samples within the cell that are labeled ωi, i.e. the estimate of P(ωi | x). For minimum error rate, the most frequently represented category within the cell is selected. If k is large and the cell is sufficiently small, the performance approaches the best possible (Bayes) error rate.
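A minimal sketch of this decision rule; the function name and the default distance are mine, and any metric can be plugged in, as the later distance-metric slide suggests:

    import numpy as np
    from collections import Counter

    def knn_classify(x, X_train, y_train, k, dist=None):
        """Classify x as the majority label among its k nearest training samples."""
        if dist is None:
            dist = lambda a, b: np.linalg.norm(a - b)      # default: Euclidean distance
        distances = [dist(x, xi) for xi in X_train]
        nearest = np.argsort(distances)[:k]                # indices of the k closest samples
        votes = Counter(y_train[i] for i in nearest)       # k_i counts per class
        return votes.most_common(1)[0][0]                  # argmax_i of k_i / k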
41
42
Example (kNN on the customer table): classify the query record "?" with k = 3 and the distance metric defined as the number of features in which two records differ (a Hamming-style distance over the categorical attributes).
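Using the knn_classify sketch from above with such a Hamming-style distance; the encoded training records below are invented for illustration, not the slide's actual table:

    import numpy as np

    # categorical records encoded as strings: (age band, debit band, income band)
    X_train = np.array([["<21",   "none",     "<50K"],
                        ["21-50", "none",     "50K-200K"],
                        [">50",   "200K<",    "200K<"],
                        ["21-50", "50K-200K", "<50K"]])
    y_train = np.array(["yes", "yes", "no", "no"])
    query = np.array([">50", "none", "200K<"])

    hamming = lambda a, b: np.sum(a != b)     # number of differing features
    print(knn_classify(query, X_train, y_train, k=3, dist=hamming))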
43
Non-parametric classifiers
They HAVE got parameters! Non-parametric classifiers are Bayes classifiers which use non-parametric density estimation approaches: the Parzen-windows classifier (metaparameters: kernel and window length h; generative) and the k nearest neighbor classifier (metaparameters: distance metric and k; discriminative).
44
About distance metrics
46
Bayes classifiers in practice
Summary: Bayes classifiers in practice
Generative (estimation of the likelihood): parametric – Naive Bayes; non-parametric – Parzen windows classifier.
Discriminative (direct estimation of the posterior): parametric – Logistic Regression; non-parametric – k nearest neighbor classifier.