Simple Bayesian Supervised Models Saskia Klein & Steffen Bollmann 1
Content Saskia Klein & Steffen Bollmann 2 Recap from last week Bayesian Linear Regression What is linear regression? Applying Bayesian Theory to Linear Regression Example Comparison to Conventional Linear Regression Bayesian Logistic Regression Naive Bayes Classifier Sources: Bishop (ch. 3, 4); Barber (ch. 10)
Maximum a posteriori estimation The Bayesian approach to estimating the parameters of a distribution given a set of observations is to maximize the posterior distribution. This allows prior information to be taken into account: posterior = (likelihood × prior) / evidence, i.e. p(w|D) = p(D|w) p(w) / p(D).
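The idea above can be sketched numerically. A minimal Python illustration (not part of the original slides) for the simplest conjugate case, estimating a Gaussian mean with known noise variance under a Gaussian prior, where the MAP estimate coincides with the posterior mean:

```python
import numpy as np

def map_gaussian_mean(x, mu0, sigma0_sq, sigma_sq):
    """MAP estimate of a Gaussian mean with known noise variance
    sigma_sq and a Gaussian prior N(mu0, sigma0_sq) on the mean.
    The posterior is Gaussian, so its mode equals its mean."""
    n = len(x)
    return (sigma_sq * mu0 + sigma0_sq * np.sum(x)) / (sigma_sq + n * sigma0_sq)

x = np.array([4.1, 3.9, 4.3, 4.0])
# A prior centred at 0 pulls the estimate below the sample mean (4.075).
est = map_gaussian_mean(x, mu0=0.0, sigma0_sq=1.0, sigma_sq=1.0)
```

With few observations the prior dominates; as n grows, the estimate approaches the maximum-likelihood answer (the sample mean).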
Conjugate prior In general, for a given probability distribution p(x|η), we can seek a prior p(η) that is conjugate to the likelihood function, so that the posterior distribution has the same functional form as the prior. For any member of the exponential family, p(x|η) = h(x) g(η) exp(η^T u(x)), there exists a conjugate prior that can be written in the form p(η|χ, ν) = f(χ, ν) g(η)^ν exp(ν η^T χ). Important conjugate pairs include: Binomial – Beta, Multinomial – Dirichlet, Gaussian – Gaussian (for the mean), Gaussian – Gamma (for the precision), Exponential – Gamma.
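The first pair in the list makes conjugacy concrete. A short Python sketch (an illustration added here, not from the slides) of the Beta–Binomial update, where the posterior stays in the Beta family:

```python
# Beta-Binomial conjugacy: a Beta(a, b) prior on the success
# probability, combined with k successes in n Bernoulli trials,
# gives a Beta(a + k, b + n - k) posterior -- the same functional
# form as the prior, with the data absorbed as pseudo-counts.
def beta_binomial_update(a, b, k, n):
    return a + k, b + (n - k)

a, b = 2.0, 2.0                                  # prior pseudo-counts
a_post, b_post = beta_binomial_update(a, b, k=7, n=10)
post_mean = a_post / (a_post + b_post)           # posterior mean of Beta(9, 5)
```

The update is pure bookkeeping: no integration is needed, which is exactly why conjugate priors are convenient.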
Linear Regression Saskia Klein & Steffen Bollmann 5
Linear Regression Saskia Klein & Steffen Bollmann 6
Examples of linear regression models Saskia Klein & Steffen Bollmann 7
Bayesian Linear Regression Saskia Klein & Steffen Bollmann 8
Bayesian Linear Regression - Likelihood Saskia Klein & Steffen Bollmann 10
Bayesian Linear Regression - Prior Saskia Klein & Steffen Bollmann 13
Bayesian Linear Regression – Posterior Distribution Saskia Klein & Steffen Bollmann 15
Example Linear Regression Saskia Klein & Steffen Bollmann 16 (MATLAB demo)
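The MATLAB demo itself is not included in the transcript. A minimal Python stand-in for the same computation, assuming (as in Bishop ch. 3) a zero-mean Gaussian prior N(0, α⁻¹I) on the weights and Gaussian noise with precision β; all data here are synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: t = 0.5 + 2x plus Gaussian noise of std 0.2.
x = rng.uniform(-1, 1, size=20)
t = 0.5 + 2.0 * x + rng.normal(0, 0.2, size=20)

alpha, beta_ = 2.0, 25.0                        # prior precision, noise precision
Phi = np.column_stack([np.ones_like(x), x])     # design matrix: bias + linear term

# Posterior over the weights is Gaussian N(m_N, S_N) with
#   S_N = (alpha*I + beta * Phi^T Phi)^-1
#   m_N = beta * S_N Phi^T t
S_N = np.linalg.inv(alpha * np.eye(2) + beta_ * Phi.T @ Phi)
m_N = beta_ * S_N @ Phi.T @ t
```

The posterior mean m_N should land close to the true weights (0.5, 2.0), with S_N quantifying the remaining uncertainty.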
Predictive Distribution Saskia Klein & Steffen Bollmann 17
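The slide's figures are not in the transcript; the quantity it plots can be sketched in Python. Assuming the standard closed form for Bayesian linear regression (Bishop ch. 3), the predictive distribution at a new input x* is Gaussian with mean m_N^T φ(x*) and variance 1/β + φ(x*)^T S_N φ(x*):

```python
import numpy as np

def posterior(Phi, t, alpha, beta_):
    # Gaussian posterior N(m_N, S_N) over the weights.
    S_N = np.linalg.inv(alpha * np.eye(Phi.shape[1]) + beta_ * Phi.T @ Phi)
    m_N = beta_ * S_N @ Phi.T @ t
    return m_N, S_N

def predictive(x_star, m_N, S_N, beta_):
    """Predictive mean and variance at x*; the variance is noise
    (1/beta) plus weight uncertainty (phi^T S_N phi)."""
    phi = np.array([1.0, x_star])
    return m_N @ phi, 1.0 / beta_ + phi @ S_N @ phi

Phi = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
t = np.array([0.1, 1.0, 2.1])
m_N, S_N = posterior(Phi, t, alpha=1e-3, beta_=100.0)
mean_near, var_near = predictive(1.0, m_N, S_N, 100.0)  # inside the data range
mean_far, var_far = predictive(5.0, m_N, S_N, 100.0)    # far from the data
# The predictive variance grows away from the observed inputs.
```

This is the qualitative behaviour the slide's plots typically show: tight error bars near the data, widening extrapolation bands away from it.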
Common Problem in Linear Regression: Overfitting/Model Complexity Saskia Klein & Steffen Bollmann 18 Least squares approach (maximizing the likelihood): gives only a point estimate of the weights. Regularization: the regularization term and its value need to be chosen. Cross-validation: requires large datasets and high computational power. Bayesian approach: yields a distribution over the weights; needs a good prior; model comparison is computationally demanding, but validation data are not required.
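The regularization and Bayesian rows of this comparison are closely related: MAP estimation with a Gaussian prior is equivalent to ridge regression with λ = α/β. A small Python check of that equivalence (illustrative data, not from the slides):

```python
import numpy as np

# MAP with prior N(0, alpha^-1 I) and noise precision beta solves
#   (alpha*I + beta*Phi^T Phi) w = beta*Phi^T t,
# which is ridge regression with lambda = alpha / beta.
Phi = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
t = np.array([0.2, 0.9, 2.1, 2.9])
alpha, beta_ = 1.0, 10.0
lam = alpha / beta_

w_ridge = np.linalg.solve(lam * np.eye(2) + Phi.T @ Phi, Phi.T @ t)
m_N = beta_ * np.linalg.inv(alpha * np.eye(2) + beta_ * Phi.T @ Phi) @ Phi.T @ t
```

The two solutions coincide; the Bayesian view adds the posterior covariance (and hence error bars) on top of the ridge point estimate.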
From Regression to Classification Saskia Klein & Steffen Bollmann 19
Classification Saskia Klein & Steffen Bollmann 20 decision boundary
Bayesian Logistic Regression Saskia Klein & Steffen Bollmann 21
Bayesian Logistic Regression Saskia Klein & Steffen Bollmann 22
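Unlike the linear-regression case, the logistic posterior has no closed form; Barber's demo (referenced on the next slides) approximates it. A hedged Python sketch of one standard route, the Laplace approximation: find the MAP weights by gradient ascent under a Gaussian prior, then fit a Gaussian at that mode with covariance given by the inverse Hessian of the negative log-posterior. The data and settings here are illustrative, not Barber's:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def laplace_logistic(Phi, y, alpha=1.0, iters=200, lr=0.1):
    """MAP logistic regression with prior N(0, alpha^-1 I), plus the
    Laplace-approximation covariance at the mode."""
    w = np.zeros(Phi.shape[1])
    for _ in range(iters):
        p = sigmoid(Phi @ w)
        grad = Phi.T @ (y - p) - alpha * w      # gradient of log-posterior
        w += lr * grad
    p = sigmoid(Phi @ w)
    R = p * (1 - p)                              # Bernoulli variances at the mode
    H = alpha * np.eye(len(w)) + (Phi * R[:, None]).T @ Phi  # -log-posterior Hessian
    return w, np.linalg.inv(H)

Phi = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0, 0, 1, 1])
w_map, S = laplace_logistic(Phi, y)
```

The prior keeps w_map finite even though this toy dataset is linearly separable, and S supplies the weight uncertainty used for Bayesian predictive averaging.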
Example Saskia Klein & Steffen Bollmann 23 Barber: DemosExercises\demoBayesLogRegression.m
Example Saskia Klein & Steffen Bollmann 24 Barber: DemosExercises\demoBayesLogRegression.m
Naive Bayes Classifier Saskia Klein & Steffen Bollmann 25 Why naive? It makes strong independence assumptions: the presence/absence of a feature of a class is assumed to be unrelated to the presence/absence of any other feature, given the class variable. It ignores relations between features and assumes that all features contribute independently to a class. [http://en.wikipedia.org/wiki/Naive_Bayes_classifier]
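The independence assumption above makes the classifier almost trivial to implement: the class-conditional likelihood factorizes over features. A minimal Gaussian naive Bayes sketch in Python (an illustration added here, with toy data):

```python
import numpy as np

class GaussianNB:
    """Per class, fit an independent 1-D Gaussian to each feature
    (the 'naive' assumption) and classify by the largest log joint
    probability log p(c) + sum_j log N(x_j | mu_cj, var_cj)."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.mu_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        self.var_ = np.array([X[y == c].var(axis=0) + 1e-9 for c in self.classes_])
        self.logprior_ = np.log([np.mean(y == c) for c in self.classes_])
        return self

    def predict(self, X):
        # Sum of per-feature log densities -- independence given the class.
        ll = -0.5 * (np.log(2 * np.pi * self.var_[None]) +
                     (X[:, None, :] - self.mu_[None]) ** 2 / self.var_[None]).sum(-1)
        return self.classes_[np.argmax(ll + self.logprior_, axis=1)]

X = np.array([[1.0, 1.2], [1.1, 0.9], [3.0, 3.1], [2.9, 3.2]])
y = np.array([0, 0, 1, 1])
pred = GaussianNB().fit(X, y).predict(X)
```

Because each feature is modelled separately, training is a single pass of per-class means and variances, which is why naive Bayes scales so well despite its crude assumption.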
Saskia Klein & Steffen Bollmann Thank you for your attention 26