Lecture Slides for INTRODUCTION TO Machine Learning
ETHEM ALPAYDIN © The MIT Press, 2004
alpaydin@boun.edu.tr
http://www.cmpe.boun.edu.tr/~ethem/i2ml
CHAPTER 4: Parametric Methods
Parametric Estimation
X = {x^t}_t where x^t ~ p(x)
Parametric estimation: assume a form for p(x|θ) and estimate θ, its sufficient statistics, using X
e.g., N(μ, σ²) where θ = {μ, σ²}
Maximum Likelihood Estimation
Likelihood of θ given the sample X: l(θ|X) = p(X|θ) = ∏_t p(x^t|θ)
Log likelihood: L(θ|X) = log l(θ|X) = ∑_t log p(x^t|θ)
Maximum likelihood estimator (MLE): θ* = argmax_θ L(θ|X)
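As a concrete illustration (my addition, not from the slides), a minimal sketch that locates the MLE of a Gaussian mean by evaluating the log likelihood on a grid of candidate θ values; the sample data and grid range are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=2.0, scale=1.0, size=100)   # sample x^t ~ p(x)

# Log likelihood L(theta|X) = sum_t log p(x^t|theta) for a Gaussian
# with known sigma = 1 and unknown mean theta.
def log_likelihood(theta, X, sigma=1.0):
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - (X - theta)**2 / (2 * sigma**2))

thetas = np.linspace(0.0, 4.0, 401)            # candidate values for theta
L = np.array([log_likelihood(t, X) for t in thetas])
theta_star = thetas[np.argmax(L)]              # theta* = argmax_theta L(theta|X)
print(theta_star, X.mean())                    # grid MLE agrees with the closed form (sample mean)
```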
Examples: Bernoulli/Multinomial
Bernoulli: two states, failure/success, x ∈ {0, 1}
P(x) = p_o^x (1 − p_o)^(1−x)
L(p_o|X) = log ∏_t p_o^(x^t) (1 − p_o)^(1−x^t)
MLE: p_o = ∑_t x^t / N
Multinomial: K > 2 states, x_i ∈ {0, 1}
P(x_1, x_2, ..., x_K) = ∏_i p_i^(x_i)
L(p_1, p_2, ..., p_K|X) = log ∏_t ∏_i p_i^(x_i^t)
MLE: p_i = ∑_t x_i^t / N
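A quick sketch (my own, not from the slides) confirming both closed-form MLEs as sample proportions; the data are synthetic and the true probabilities are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Bernoulli MLE: p_o = sum_t x^t / N
x = rng.binomial(n=1, p=0.3, size=1000)
p_hat = x.sum() / len(x)
print(p_hat)                       # close to the true p = 0.3

# Multinomial MLE: p_i = sum_t x_i^t / N, with x^t one-hot over K states
K, N = 4, 1000
states = rng.choice(K, size=N, p=[0.1, 0.2, 0.3, 0.4])
x_onehot = np.eye(K)[states]       # rows are indicator vectors x^t
p_hat = x_onehot.sum(axis=0) / N
print(p_hat)                       # close to [0.1, 0.2, 0.3, 0.4]
```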
Gaussian (Normal) Distribution
p(x) = N(μ, σ²): p(x) = [1 / (√(2π) σ)] exp[−(x − μ)² / (2σ²)]
MLE for μ and σ²:
m = ∑_t x^t / N
s² = ∑_t (x^t − m)² / N
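A minimal sketch (mine) computing the two plug-in estimates; note the MLE s² divides by N, unlike NumPy's unbiased default.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(loc=5.0, scale=2.0, size=500)

m = X.sum() / len(X)                 # m = sum_t x^t / N
s2 = ((X - m) ** 2).sum() / len(X)   # s^2 = sum_t (x^t - m)^2 / N  (divide by N, not N-1)
print(m, s2)                         # equivalently: X.mean(), X.var(ddof=0)
```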
Bias and Variance
Unknown parameter θ
Estimator d_i = d(X_i) on sample X_i
Bias: b_θ(d) = E[d] − θ
Variance: E[(d − E[d])²]
Mean square error: r(d, θ) = E[(d − θ)²] = (E[d] − θ)² + E[(d − E[d])²] = Bias² + Variance
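A small simulation (my addition) checking the decomposition empirically for the sample mean as an estimator of θ; the true θ, sample size, and repetition count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
theta, N, M = 2.0, 20, 100_000           # true parameter, sample size, repetitions

# d(X_i) = sample mean of each of M independent samples X_i
d = rng.normal(loc=theta, scale=1.0, size=(M, N)).mean(axis=1)

mse = np.mean((d - theta) ** 2)          # E[(d - theta)^2]
bias2 = (d.mean() - theta) ** 2          # (E[d] - theta)^2
var = d.var()                            # E[(d - E[d])^2]
print(mse, bias2 + var)                  # the two agree up to Monte Carlo error
```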
Bayes’ Estimator
Treat θ as a random variable with prior p(θ)
Bayes’ rule: p(θ|X) = p(X|θ) p(θ) / p(X)
Full (predictive) density: p(x|X) = ∫ p(x|θ) p(θ|X) dθ
Maximum a posteriori (MAP): θ_MAP = argmax_θ p(θ|X)
Maximum likelihood (ML): θ_ML = argmax_θ p(X|θ)
Bayes’: θ_Bayes = E[θ|X] = ∫ θ p(θ|X) dθ
Bayes’ Estimator: Example
x^t ~ N(θ, σ_o²) and θ ~ N(μ, σ²)
θ_ML = m
θ_MAP = θ_Bayes = [N/σ_o² / (N/σ_o² + 1/σ²)] m + [1/σ² / (N/σ_o² + 1/σ²)] μ
The posterior mean is a precision-weighted average of the sample mean m and the prior mean μ; as N grows, it approaches m.
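A sketch (my own) of this posterior mean; the true parameter, noise level, and prior parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
theta_true, sigma_o = 1.5, 1.0            # x^t ~ N(theta, sigma_o^2)
mu, sigma = 0.0, 0.5                      # prior: theta ~ N(mu, sigma^2)

X = rng.normal(theta_true, sigma_o, size=50)
N, m = len(X), X.mean()                   # theta_ML = m

# Posterior mean: precision-weighted average of sample mean and prior mean
w_data = N / sigma_o**2
w_prior = 1 / sigma**2
theta_bayes = (w_data * m + w_prior * mu) / (w_data + w_prior)
print(m, theta_bayes)                     # Bayes estimate is shrunk toward mu
```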
Parametric Classification
Given the sample X = {x^t, r^t}_t, where r_i^t = 1 if x^t ∈ C_i and 0 otherwise, and class likelihoods p(x|C_i) ~ N(μ_i, σ_i²),
ML estimates are:
P(C_i) = ∑_t r_i^t / N
m_i = ∑_t x^t r_i^t / ∑_t r_i^t
s_i² = ∑_t (x^t − m_i)² r_i^t / ∑_t r_i^t
The discriminant becomes:
g_i(x) = −(1/2) log 2π − log s_i − (x − m_i)² / (2 s_i²) + log P(C_i)
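A sketch (mine, not from the slides) of this one-dimensional Gaussian classifier on synthetic two-class data; the class means, variances, and sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic two-class 1-D sample
x = np.concatenate([rng.normal(0.0, 1.0, 60), rng.normal(3.0, 1.5, 40)])
r = np.concatenate([np.zeros(60, int), np.ones(40, int)])   # class labels

K = 2
P_hat = np.array([(r == i).mean() for i in range(K)])       # P(C_i)
m = np.array([x[r == i].mean() for i in range(K)])          # m_i
s2 = np.array([x[r == i].var(ddof=0) for i in range(K)])    # s_i^2 (MLE)

def g(x_new):
    # g_i(x) = -log s_i - (x - m_i)^2 / (2 s_i^2) + log P(C_i), constant dropped
    return -0.5 * np.log(s2) - (x_new - m) ** 2 / (2 * s2) + np.log(P_hat)

print(np.argmax(g(2.0)))    # predicted class for x = 2.0
```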
Equal variances: a single decision boundary, halfway between the two means.
Variances are different: two decision boundaries.
Regression
Estimator: g(x|θ)
r = f(x) + ε, where ε ~ N(0, σ²)
so p(r|x) ~ N(g(x|θ), σ²)
Regression: From LogL to Error
L(θ|X) = log ∏_t p(x^t, r^t) = log ∏_t p(r^t|x^t) + log ∏_t p(x^t)
Substituting the Gaussian form for p(r|x) and dropping terms that do not depend on θ:
L(θ|X) = −N log(√(2π) σ) − (1/(2σ²)) ∑_t [r^t − g(x^t|θ)]² + const
Maximizing the log likelihood is thus equivalent to minimizing the squared error:
E(θ|X) = (1/2) ∑_t [r^t − g(x^t|θ)]²
Linear Regression
g(x^t|w_1, w_0) = w_1 x^t + w_0
Setting the derivatives of E with respect to w_0 and w_1 to zero gives two linear (normal) equations:
∑_t r^t = N w_0 + w_1 ∑_t x^t
∑_t r^t x^t = w_0 ∑_t x^t + w_1 ∑_t (x^t)²
which are solved for w_0 and w_1.
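A minimal sketch (my own) solving these two normal equations directly on synthetic data; the true slope and intercept are assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.uniform(0, 10, 50)
r = 2.5 * x + 1.0 + rng.normal(0, 1, 50)     # r = f(x) + noise

# Normal equations in matrix form A w = y, with w = (w0, w1)
A = np.array([[len(x),      x.sum()],
              [x.sum(), (x ** 2).sum()]])
y = np.array([r.sum(), (r * x).sum()])
w0, w1 = np.linalg.solve(A, y)
print(w0, w1)                                # close to the true (1.0, 2.5)
```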
Polynomial Regression
g(x^t|w_k, ..., w_2, w_1, w_0) = w_k (x^t)^k + ... + w_2 (x^t)² + w_1 x^t + w_0
In matrix form, with design matrix D whose rows are [1, x^t, (x^t)², ..., (x^t)^k] and response vector r, the least-squares solution is w = (DᵀD)⁻¹ Dᵀ r.
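A sketch (mine) building the design matrix and solving the least-squares system; it uses lstsq rather than an explicit inverse for numerical stability, and the true coefficients are assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.uniform(-1, 1, 40)
r = 1 - 2 * x + 0.5 * x ** 2 + rng.normal(0, 0.1, 40)

k = 2                                          # polynomial order
D = np.vander(x, k + 1, increasing=True)       # rows: [1, x^t, (x^t)^2, ...]
w, *_ = np.linalg.lstsq(D, r, rcond=None)      # least-squares solution of D w = r
print(w)                                       # close to [1, -2, 0.5]
```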
Other Error Measures
Square error: E(θ|X) = ∑_t [r^t − g(x^t|θ)]²
Relative square error: E(θ|X) = ∑_t [r^t − g(x^t|θ)]² / ∑_t (r^t − r̄)²
Absolute error: E(θ|X) = ∑_t |r^t − g(x^t|θ)|
ε-sensitive error: E(θ|X) = ∑_t 1(|r^t − g(x^t|θ)| > ε) (|r^t − g(x^t|θ)| − ε)
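A short sketch (my addition) computing all four measures for a given predictor; the residual threshold ε and the toy values are assumptions.

```python
import numpy as np

def errors(r, g, eps=0.5):
    res = r - g                                   # residuals r^t - g(x^t|theta)
    square = np.sum(res ** 2)
    relative = square / np.sum((r - r.mean()) ** 2)
    absolute = np.sum(np.abs(res))
    eps_sensitive = np.sum(np.maximum(np.abs(res) - eps, 0.0))
    return square, relative, absolute, eps_sensitive

r = np.array([1.0, 2.0, 3.5, 5.0])
g = np.array([1.2, 1.8, 3.0, 5.5])
print(errors(r, g))
```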
Bias and Variance
E[(r − g(x))² | x] = E[(r − E[r|x])² | x] + (E[r|x] − g(x))²   (noise + squared error)
E_X[(E[r|x] − g(x))² | x] = (E[r|x] − E_X[g(x)])² + E_X[(g(x) − E_X[g(x)])²]   (bias² + variance)
Estimating Bias and Variance
M samples X_i = {x^t_i, r^t_i}, i = 1, ..., M, are used to fit g_i(x), i = 1, ..., M:
g̅(x) = (1/M) ∑_i g_i(x)
Bias²(g) = (1/N) ∑_t [g̅(x^t) − f(x^t)]²
Variance(g) = (1/(N M)) ∑_t ∑_i [g_i(x^t) − g̅(x^t)]²
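A simulation sketch (my own) of this procedure: fit polynomial regressors of a fixed order to M independent samples of a known f, then average. The target function, noise level, and order are assumptions.

```python
import numpy as np

rng = np.random.default_rng(8)
f = lambda x: np.sin(2 * x)                     # known target f(x)
x_eval = np.linspace(0, 3, 25)                  # fixed evaluation points x^t
M, N, order = 100, 20, 3

preds = np.empty((M, len(x_eval)))
for i in range(M):
    xi = rng.uniform(0, 3, N)                   # sample X_i
    ri = f(xi) + rng.normal(0, 0.3, N)
    w = np.polyfit(xi, ri, order)               # fit g_i(x)
    preds[i] = np.polyval(w, x_eval)

g_bar = preds.mean(axis=0)                      # average model over the M fits
bias2 = np.mean((g_bar - f(x_eval)) ** 2)
variance = np.mean((preds - g_bar) ** 2)
print(bias2, variance)
```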
Bias/Variance Dilemma
Example: g_i(x) = 2 has no variance but high bias; g_i(x) = ∑_t r^t_i / N has lower bias but nonzero variance.
As we increase model complexity, bias decreases (a better fit to the data) and variance increases (the fit varies more with the data).
Bias/variance dilemma (Geman et al., 1992)
Figure: the target f, the fits g_i on different samples, and their average g̅; bias is the distance from g̅ to f, variance the scatter of the g_i around g̅.
Polynomial Regression
Figure: best fit chosen at minimum error.
Figure: best fit chosen at the “elbow” of the error curve.
Model Selection
Cross-validation: measure generalization accuracy by testing on data unused during training.
Regularization: penalize complex models: E′ = error on data + λ · model complexity.
Akaike’s information criterion (AIC), Bayesian information criterion (BIC).
Minimum description length (MDL): Kolmogorov complexity, shortest description of the data.
Structural risk minimization (SRM).
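A sketch (my own) of cross-validation for choosing polynomial order: hold out part of the data, fit on the rest, and pick the order with the lowest validation error. The target function, noise, and split sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(9)
x = rng.uniform(0, 3, 60)
r = np.sin(2 * x) + rng.normal(0, 0.2, 60)

# Hold out a validation set unused during training
idx = rng.permutation(len(x))
tr, va = idx[:40], idx[40:]

val_err = {}
for order in range(1, 9):
    w = np.polyfit(x[tr], r[tr], order)                  # train on the training split
    pred = np.polyval(w, x[va])
    val_err[order] = np.mean((r[va] - pred) ** 2)        # validation error

best = min(val_err, key=val_err.get)
print(best, val_err[best])                               # order with lowest validation error
```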
Bayesian Model Selection
Prior on models, p(model).
Regularization, when the prior favors simpler models.
Bayes: MAP of the posterior, p(model|data).
Average over a number of models with high posterior (voting, ensembles: Chapter 15).