Lecture Slides for INTRODUCTION TO Machine Learning, ETHEM ALPAYDIN © The MIT Press, 2004

CHAPTER 4: Parametric Methods

Parametric Estimation

Given a sample $X = \{x^t\}_{t=1}^N$ where $x^t \sim p(x)$.

Parametric estimation: assume a form for $p(x \mid \theta)$ and estimate $\theta$, its sufficient statistics, using $X$; e.g., $p(x) \sim \mathcal{N}(\mu, \sigma^2)$ where $\theta = \{\mu, \sigma^2\}$.

Maximum Likelihood Estimation

Likelihood of $\theta$ given the sample $X$: $l(\theta \mid X) = p(X \mid \theta) = \prod_t p(x^t \mid \theta)$
Log likelihood: $\mathcal{L}(\theta \mid X) = \log l(\theta \mid X) = \sum_t \log p(x^t \mid \theta)$
Maximum likelihood estimator (MLE): $\theta^* = \arg\max_\theta \mathcal{L}(\theta \mid X)$
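
As a small illustration (not part of the slides), the log likelihood can also be maximized numerically. The sketch below assumes a hypothetical Gaussian sample with known variance and searches a grid of candidate means; the sample values and the grid are made up for the example.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=2.0, scale=1.0, size=100)   # hypothetical sample, x^t ~ N(2, 1)

def log_likelihood(mu, X, sigma=1.0):
    # L(mu | X) = sum_t log p(x^t | mu) for a Gaussian with known sigma
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2) - (X - mu)**2 / (2 * sigma**2))

candidates = np.linspace(0.0, 4.0, 401)              # grid of candidate mu values
L = np.array([log_likelihood(mu, X) for mu in candidates])
mu_mle = candidates[np.argmax(L)]                    # theta* = argmax_theta L(theta | X)
print(mu_mle, X.mean())                              # grid argmax is close to the sample mean

The grid search is only for illustration; the next slides give the closed-form maximizers.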

Examples: Bernoulli/Multinomial

Bernoulli: two states, failure/success, $x \in \{0, 1\}$
$P(x) = p_o^{x} (1 - p_o)^{1 - x}$
$\mathcal{L}(p_o \mid X) = \log \prod_t p_o^{x^t} (1 - p_o)^{1 - x^t}$
MLE: $\hat{p}_o = \sum_t x^t / N$

Multinomial: $K > 2$ states, $x_i \in \{0, 1\}$
$P(x_1, x_2, \ldots, x_K) = \prod_i p_i^{x_i}$
$\mathcal{L}(p_1, p_2, \ldots, p_K \mid X) = \log \prod_t \prod_i p_i^{x_i^t}$
MLE: $\hat{p}_i = \sum_t x_i^t / N$
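
A minimal sketch of these closed-form estimates, assuming a 0/1 Bernoulli sample and a one-hot-coded multinomial sample (both generated here only for illustration):

import numpy as np

rng = np.random.default_rng(1)

# Bernoulli: x^t in {0, 1}; MLE is p_hat = sum_t x^t / N
x = rng.binomial(1, 0.3, size=200)
p_hat = x.sum() / len(x)

# Multinomial: each x^t is one-hot over K states; MLE is p_hat_i = sum_t x_i^t / N
K, N = 4, 500
states = rng.choice(K, size=N, p=[0.1, 0.2, 0.3, 0.4])
x_onehot = np.eye(K)[states]
p_hat_multi = x_onehot.sum(axis=0) / N
print(p_hat, p_hat_multi)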

Gaussian (Normal) Distribution

$p(x) = \mathcal{N}(\mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left[-\frac{(x - \mu)^2}{2\sigma^2}\right]$

MLE for $\mu$ and $\sigma^2$:
$m = \frac{1}{N} \sum_t x^t$
$s^2 = \frac{1}{N} \sum_t (x^t - m)^2$
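
The same estimates in code, on a synthetic sample whose values are chosen only for illustration; note that $s^2$ uses the $1/N$ ML form, not the $1/(N-1)$ unbiased form.

import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(loc=5.0, scale=2.0, size=1000)   # hypothetical sample

m = X.mean()                         # m = (1/N) * sum_t x^t
s2 = np.mean((X - m) ** 2)           # s^2 = (1/N) * sum_t (x^t - m)^2  (ML, biased)
print(m, s2)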

Bias and Variance

Unknown parameter $\theta$; estimator $d_i = d(X_i)$ on sample $X_i$
Bias: $b_\theta(d) = E[d] - \theta$
Variance: $E[(d - E[d])^2]$
Mean square error: $r(d, \theta) = E[(d - \theta)^2] = (E[d] - \theta)^2 + E[(d - E[d])^2] = \text{Bias}^2 + \text{Variance}$
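
A hedged sketch (not from the slides) that estimates the bias, variance, and mean square error of the ML variance estimator $s^2$ by simulation; the true $\sigma^2 = 4$ and the sample size $N = 10$ are arbitrary choices for the illustration.

import numpy as np

rng = np.random.default_rng(3)
theta = 4.0          # true variance sigma^2
N, reps = 10, 100_000

# Draw many samples X_i and compute the estimator d(X_i) = s^2 on each
samples = rng.normal(0.0, np.sqrt(theta), size=(reps, N))
d = np.mean((samples - samples.mean(axis=1, keepdims=True)) ** 2, axis=1)

bias = d.mean() - theta                     # b_theta(d) = E[d] - theta  (about -theta/N here)
variance = np.mean((d - d.mean()) ** 2)     # E[(d - E[d])^2]
mse = np.mean((d - theta) ** 2)             # r(d, theta)
print(bias, variance, mse, bias**2 + variance)   # mse ≈ bias^2 + variance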

Bayes' Estimator

Treat $\theta$ as a random variable with prior $p(\theta)$
Bayes' rule: $p(\theta \mid X) = p(X \mid \theta)\, p(\theta) / p(X)$
Full (predictive) density: $p(x \mid X) = \int p(x \mid \theta)\, p(\theta \mid X)\, d\theta$
Maximum a posteriori (MAP): $\theta_{\text{MAP}} = \arg\max_\theta p(\theta \mid X)$
Maximum likelihood (ML): $\theta_{\text{ML}} = \arg\max_\theta p(X \mid \theta)$
Bayes' estimator: $\theta_{\text{Bayes}} = E[\theta \mid X] = \int \theta\, p(\theta \mid X)\, d\theta$

Bayes' Estimator: Example

$x^t \sim \mathcal{N}(\theta, \sigma_o^2)$ and prior $\theta \sim \mathcal{N}(\mu, \sigma^2)$
$\theta_{\text{ML}} = m$ (the sample mean)
$\theta_{\text{MAP}} = \theta_{\text{Bayes}} = \dfrac{N/\sigma_o^2}{N/\sigma_o^2 + 1/\sigma^2}\, m + \dfrac{1/\sigma^2}{N/\sigma_o^2 + 1/\sigma^2}\, \mu$
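
A short sketch of this example under assumed numbers (prior $\mathcal{N}(0, 1)$, data noise $\sigma_o^2 = 1$, and made-up data): the posterior mean interpolates between the sample mean $m$ and the prior mean $\mu$.

import numpy as np

rng = np.random.default_rng(4)
sigma0_sq = 1.0                 # known variance of x^t ~ N(theta, sigma0^2)
mu_prior, sigma_sq = 0.0, 1.0   # prior theta ~ N(mu, sigma^2)  (assumed values)

X = rng.normal(2.0, np.sqrt(sigma0_sq), size=20)
N, m = len(X), X.mean()

# theta_ML = m; theta_MAP = theta_Bayes = weighted average of m and the prior mean
w = (N / sigma0_sq) / (N / sigma0_sq + 1.0 / sigma_sq)
theta_bayes = w * m + (1.0 - w) * mu_prior
print(m, theta_bayes)           # posterior mean is pulled slightly toward the prior mean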

Parametric Classification

Discriminant: $g_i(x) = p(x \mid C_i)\, P(C_i)$, or equivalently $g_i(x) = \log p(x \mid C_i) + \log P(C_i)$
With Gaussian class likelihoods $p(x \mid C_i) = \mathcal{N}(\mu_i, \sigma_i^2)$:
$g_i(x) = -\frac{1}{2}\log 2\pi - \log \sigma_i - \dfrac{(x - \mu_i)^2}{2\sigma_i^2} + \log P(C_i)$

Given the sample $X = \{x^t, r^t\}_{t=1}^N$, with $r_i^t = 1$ if $x^t \in C_i$ and $0$ otherwise, the ML estimates are
$\hat{P}(C_i) = \dfrac{\sum_t r_i^t}{N}$, $\quad m_i = \dfrac{\sum_t x^t r_i^t}{\sum_t r_i^t}$, $\quad s_i^2 = \dfrac{\sum_t (x^t - m_i)^2 r_i^t}{\sum_t r_i^t}$
and the discriminant becomes
$g_i(x) = -\log s_i - \dfrac{(x - m_i)^2}{2 s_i^2} + \log \hat{P}(C_i)$ (dropping terms common to all classes)
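
A minimal sketch of this classifier in code, on made-up one-dimensional data for two classes: it plugs the ML estimates into the discriminant above and predicts the class with the largest $g_i(x)$.

import numpy as np

rng = np.random.default_rng(5)
# Hypothetical 1-D training data for two classes
x0 = rng.normal(-1.0, 1.0, size=100)   # class C_0
x1 = rng.normal(2.0, 1.5, size=60)     # class C_1
X = np.concatenate([x0, x1])
r = np.concatenate([np.zeros(len(x0)), np.ones(len(x1))]).astype(int)

# ML estimates per class: prior, mean, variance (1/N_i form)
priors, means, vars_ = [], [], []
for i in (0, 1):
    xi = X[r == i]
    priors.append(len(xi) / len(X))
    means.append(xi.mean())
    vars_.append(np.mean((xi - xi.mean()) ** 2))

def discriminant(x, i):
    # g_i(x) = log P(C_i) - log s_i - (x - m_i)^2 / (2 s_i^2)
    return (np.log(priors[i]) - 0.5 * np.log(vars_[i])
            - (x - means[i]) ** 2 / (2 * vars_[i]))

x_new = 0.7
pred = int(discriminant(x_new, 1) > discriminant(x_new, 0))
print(pred)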

Equal variances: a single boundary at halfway between the means.

Variances are different: two boundaries.

Regression

$r = f(x) + \varepsilon$, where $f(x)$ is the unknown function and $\varepsilon \sim \mathcal{N}(0, \sigma^2)$ is noise
Estimator $g(x \mid \theta)$, so that $p(r \mid x) \sim \mathcal{N}(g(x \mid \theta), \sigma^2)$

Regression: From LogL to Error

$\mathcal{L}(\theta \mid X) = \log \prod_t p(x^t, r^t) = \log \prod_t p(r^t \mid x^t) + \log \prod_t p(x^t)$
Maximizing the log likelihood (ignoring the term that does not depend on $\theta$) is equivalent to minimizing the sum of squared errors
$E(\theta \mid X) = \frac{1}{2} \sum_t \left[r^t - g(x^t \mid \theta)\right]^2$

Linear Regression

$g(x^t \mid w_1, w_0) = w_1 x^t + w_0$
Setting the derivatives of $E(w_1, w_0 \mid X)$ to zero gives the two normal equations
$\sum_t r^t = N w_0 + w_1 \sum_t x^t$
$\sum_t r^t x^t = w_0 \sum_t x^t + w_1 \sum_t (x^t)^2$
which are solved for the slope $w_1$ and the intercept $w_0 = \bar{r} - w_1 \bar{x}$.
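
A small sketch of the closed-form solution on synthetic data (the data-generating line and noise level are invented for the example); the same result can be obtained with np.polyfit(x, r, 1).

import numpy as np

rng = np.random.default_rng(6)
x = rng.uniform(0, 5, size=50)
r = 1.5 * x + 0.5 + rng.normal(0, 0.3, size=50)   # hypothetical data r = f(x) + noise

N = len(x)
# Solve the two normal equations for w1 (slope) and w0 (intercept)
w1 = (np.sum(x * r) - N * x.mean() * r.mean()) / (np.sum(x ** 2) - N * x.mean() ** 2)
w0 = r.mean() - w1 * x.mean()
print(w0, w1)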

Polynomial Regression

$g(x^t \mid w_k, \ldots, w_1, w_0) = w_k (x^t)^k + \cdots + w_1 x^t + w_0$
In matrix form, with design matrix $\mathbf{D}$ whose rows are $[1, x^t, (x^t)^2, \ldots, (x^t)^k]$ and target vector $\mathbf{r}$, the least-squares solution is
$\mathbf{w} = (\mathbf{D}^T \mathbf{D})^{-1} \mathbf{D}^T \mathbf{r}$
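
A sketch of the matrix form on synthetic data (the target function, noise, and polynomial order are assumptions for the illustration): build the design matrix $\mathbf{D}$ and solve the least-squares problem, here via a solver rather than the explicit inverse for numerical stability.

import numpy as np

rng = np.random.default_rng(7)
x = rng.uniform(-1, 1, size=80)
r = np.sin(3 * x) + rng.normal(0, 0.1, size=80)    # hypothetical nonlinear target

k = 3                                              # polynomial order (arbitrary choice)
D = np.vander(x, k + 1, increasing=True)           # columns 1, x, x^2, ..., x^k
w, *_ = np.linalg.lstsq(D, r, rcond=None)          # least-squares solution of D w ≈ r
print(w)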

Other Error Measures

Square error: $E(\theta \mid X) = \sum_t \left[r^t - g(x^t \mid \theta)\right]^2$
Relative square error: $E(\theta \mid X) = \dfrac{\sum_t \left[r^t - g(x^t \mid \theta)\right]^2}{\sum_t \left[r^t - \bar{r}\right]^2}$
Absolute error: $E(\theta \mid X) = \sum_t \left|r^t - g(x^t \mid \theta)\right|$
$\varepsilon$-sensitive error: $E(\theta \mid X) = \sum_t \mathbf{1}\!\left(\left|r^t - g(x^t \mid \theta)\right| > \varepsilon\right)\left(\left|r^t - g(x^t \mid \theta)\right| - \varepsilon\right)$
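
These measures in code, for a hypothetical vector of targets r and predictions g and an arbitrary $\varepsilon = 0.5$:

import numpy as np

r = np.array([1.0, 2.0, 3.5, 4.0])     # targets r^t (made up)
g = np.array([1.2, 1.7, 3.4, 5.0])     # predictions g(x^t | theta) (made up)
eps = 0.5

square_error = np.sum((r - g) ** 2)
relative_square_error = np.sum((r - g) ** 2) / np.sum((r - r.mean()) ** 2)
absolute_error = np.sum(np.abs(r - g))
eps_sensitive = np.sum((np.abs(r - g) > eps) * (np.abs(r - g) - eps))
print(square_error, relative_square_error, absolute_error, eps_sensitive)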

Bias and Variance

Expected squared error at $x$:
$E\!\left[(r - g(x))^2 \mid x\right] = E\!\left[(r - E[r \mid x])^2 \mid x\right] + \left(E[r \mid x] - g(x)\right)^2$ (noise + squared error)
Taking the expectation over samples $X$:
$E_X\!\left[\left(E[r \mid x] - g(x)\right)^2 \mid x\right] = \left(E[r \mid x] - E_X[g(x)]\right)^2 + E_X\!\left[\left(g(x) - E_X[g(x)]\right)^2\right]$ (bias$^2$ + variance)

Estimating Bias and Variance

$M$ samples $X_i = \{x^t_i, r^t_i\}$, $i = 1, \ldots, M$, are used to fit $g_i(x)$, $i = 1, \ldots, M$
$\bar{g}(x) = \frac{1}{M} \sum_i g_i(x)$
$\text{Bias}^2(g) = \frac{1}{N} \sum_t \left[\bar{g}(x^t) - f(x^t)\right]^2$
$\text{Variance}(g) = \frac{1}{N M} \sum_t \sum_i \left[g_i(x^t) - \bar{g}(x^t)\right]^2$
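
A sketch that estimates these two quantities by simulation for polynomial fits of a fixed order; the true function $f$, the noise level, and the number of samples $M$ are all assumptions made for the illustration.

import numpy as np

rng = np.random.default_rng(8)
f = lambda x: np.sin(3 * x)                   # assumed true function
x_eval = np.linspace(-1, 1, 25)               # evaluation points x^t
M, N, order = 100, 30, 3

# Fit g_i on M independently drawn samples X_i
fits = []
for _ in range(M):
    x = rng.uniform(-1, 1, size=N)
    r = f(x) + rng.normal(0, 0.2, size=N)
    w = np.polyfit(x, r, order)
    fits.append(np.polyval(w, x_eval))
G = np.array(fits)                            # shape (M, len(x_eval)): g_i(x^t)

g_bar = G.mean(axis=0)                                 # average fit g_bar(x)
bias_sq = np.mean((g_bar - f(x_eval)) ** 2)            # (1/N) sum_t [g_bar - f]^2
variance = np.mean((G - g_bar) ** 2)                   # (1/(N M)) sum_t sum_i [g_i - g_bar]^2
print(bias_sq, variance)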

Bias/Variance Dilemma

Example: $g_i(x) = 2$ has no variance and high bias; $g_i(x) = \sum_t r^t_i / N$ has lower bias but some variance
As we increase complexity, bias decreases (a better fit to the data) and variance increases (the fit varies more with the data)
Bias/variance dilemma (Geman et al., 1992)

[Figure: the target function f, the individual fits g_i, and their average ḡ, illustrating bias (distance of ḡ from f) and variance (spread of the g_i around ḡ).]

Polynomial Regression

[Figure: polynomial fits of increasing order to a sample; the best fit is taken as the one with minimum error.]

[Figure: error versus polynomial order; the best fit is chosen at the "elbow" of the error curve.]

Model Selection

Cross-validation: measure generalization accuracy by testing on data unused during training
Regularization: penalize complex models, E′ = error on data + λ · model complexity
Akaike's information criterion (AIC), Bayesian information criterion (BIC)
Minimum description length (MDL): Kolmogorov complexity, the shortest description of the data
Structural risk minimization (SRM)
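
A hedged sketch of the cross-validation idea: hold out part of the data, fit polynomials of increasing order on the rest, and pick the order with the lowest validation error. The data, the split, and the range of orders are invented for the example.

import numpy as np

rng = np.random.default_rng(9)
x = rng.uniform(-1, 1, size=60)
r = np.sin(3 * x) + rng.normal(0, 0.2, size=60)   # hypothetical data

# Simple hold-out split: train on 40 points, validate on 20
idx = rng.permutation(len(x))
tr, va = idx[:40], idx[40:]

errors = {}
for order in range(1, 9):
    w = np.polyfit(x[tr], r[tr], order)                            # fit on training data only
    errors[order] = np.mean((r[va] - np.polyval(w, x[va])) ** 2)   # validation error
best_order = min(errors, key=errors.get)
print(best_order, errors[best_order])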

Bayesian Model Selection

Prior on models, p(model)
Regularization corresponds to a prior that favors simpler models
Bayes: use the MAP of the posterior, p(model | data)
Average over a number of models with high posterior probability (voting, ensembles: Chapter 15)