CHAPTER 4: Parametric Methods
Parametric Estimation

Given a sample X = {x^t}_{t=1..N}, the goal is to infer the probability distribution p(x).

Parametric estimation: assume a form for p(x|θ) and estimate θ, its sufficient statistics, from X.
Example: the Gaussian distribution N(μ, σ²), where θ = {μ, σ²}.

Problem: how can we obtain θ from X?
Assumption: X contains samples of a one-dimensional random variable. Multivariate estimation, where each instance contains multiple measurements rather than a single one, is treated later.
Maximum Likelihood Estimation

The density p with parameters θ is given, and x^t ~ p(x|θ).

Likelihood of θ given the sample X:
  l(θ|X) = p(X|θ) = ∏_t p(x^t|θ)

We look for the θ that maximizes the likelihood of the sample.

Log likelihood:
  L(θ|X) = log l(θ|X) = ∑_t log p(x^t|θ)

Maximum likelihood estimator (MLE):
  θ* = argmax_θ L(θ|X)

Homework: given the sample 0, 3, 3, 4, 5 and x ~ N(μ, σ²), use MLE to find (μ, σ²).
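As a minimal sketch (assuming Python with NumPy is available), the Gaussian MLE has a closed form that can be computed directly from the sample; applied here to the homework sample, it also serves as a check on the hand calculation:

```python
import numpy as np

# Homework sample: x ~ N(mu, sigma^2); estimate (mu, sigma^2) by MLE.
x = np.array([0.0, 3.0, 3.0, 4.0, 5.0])

# Maximizing L(theta|X) = sum_t log p(x^t | mu, sigma^2) gives the closed-form MLE:
mu_hat = x.mean()                      # mu* = (1/N) sum_t x^t
var_hat = ((x - mu_hat) ** 2).mean()   # sigma^2* = (1/N) sum_t (x^t - mu*)^2 (note: N, not N-1)

print(mu_hat, var_hat)                 # 3.0 and 2.8
```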
Bayes' Estimator

Treat θ as a random variable with prior p(θ).
Bayes' rule: p(θ|X) = p(X|θ) p(θ) / p(X)

Maximum a posteriori (MAP) estimator: θ_MAP = argmax_θ p(θ|X)
Maximum likelihood (ML) estimator: θ_ML = argmax_θ p(X|θ)
Bayes' estimator: θ_Bayes = E[θ|X] = ∫ θ p(θ|X) dθ

Comments:
- ML simply takes the value of θ that maximizes the likelihood p(X|θ).
- Compared with ML, MAP additionally takes the prior p(θ) into account.
- The Bayes' estimator averages over all possible values of θ, each weighted by how probable it is given the data, i.e. by the posterior p(θ|X).
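A hedged sketch (Python/NumPy) contrasting the three estimators for the mean of a Gaussian with an assumed known variance and an assumed Gaussian prior on the mean; for this conjugate choice the posterior is Gaussian, so MAP and Bayes' estimates coincide at the posterior mean:

```python
import numpy as np

x = np.array([0.0, 3.0, 3.0, 4.0, 5.0])    # reuse the sample above
N = len(x)
sigma2 = 2.8                                # assumed known noise variance
mu0, sigma0_2 = 0.0, 10.0                   # assumed prior: theta ~ N(mu0, sigma0^2)

# ML: argmax_theta p(X|theta) -> the sample mean.
theta_ml = x.mean()

# Posterior p(theta|X) is Gaussian with these moments (standard conjugate update):
post_var = 1.0 / (N / sigma2 + 1.0 / sigma0_2)
post_mean = post_var * (x.sum() / sigma2 + mu0 / sigma0_2)

# MAP: argmax_theta p(theta|X); Bayes: E[theta|X]. Both equal the posterior mean here.
theta_map = theta_bayes = post_mean

print(theta_ml, theta_map, theta_bayes)
```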
Parametric Classification

The discriminant is a kind of (unnormalized) posterior p(C_i|x):
  g_i(x) = p(x|C_i) P(C_i), or equivalently g_i(x) = log p(x|C_i) + log P(C_i)

Assuming the class-conditional densities are Gaussian,
  p(x|C_i) = (1 / (√(2π) σ_i)) exp[−(x − μ_i)² / (2σ_i²)]
the discriminant becomes
  g_i(x) = −(1/2) log 2π − log σ_i − (x − μ_i)² / (2σ_i²) + log P(C_i)
Parametric Classification (continued)

Using Bayes' theorem:
  P(C_1|x) = P(C_1) p(x|C_1) / p(x)
  P(C_2|x) = P(C_2) p(x|C_2) / p(x)
Since p(x) is the same in both formulas, we can drop it when comparing the classes.

From the data, the class likelihoods p(x|C_i) are estimated by ML (or MAP).
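A small sketch (Python, with made-up prior and likelihood values for illustration) of why p(x) can be dropped: dividing both unnormalized posteriors by the same evidence does not change which class wins.

```python
# Hypothetical numbers for illustration only.
P_C1, P_C2 = 0.6, 0.4            # priors P(C_i)
p_x_C1, p_x_C2 = 0.10, 0.25      # class likelihoods p(x|C_i) at some x

# Unnormalized posteriors: P(C_i|x) is proportional to P(C_i) * p(x|C_i).
g1 = P_C1 * p_x_C1
g2 = P_C2 * p_x_C2

# Dividing both by the evidence p(x) = g1 + g2 preserves the ranking.
p_x = g1 + g2
print(g1 / p_x, g2 / p_x)        # proper posteriors, same ordering as g1 vs g2
print("choose C1" if g1 > g2 else "choose C2")
```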
Given the sample X = {x^t, r^t}_{t=1..N}, where r_i^t = 1 if x^t ∈ C_i and 0 otherwise, the ML estimates are
  P̂(C_i) = ∑_t r_i^t / N
  m_i = ∑_t r_i^t x^t / ∑_t r_i^t
  s_i² = ∑_t r_i^t (x^t − m_i)² / ∑_t r_i^t
and the discriminant becomes
  g_i(x) = −(1/2) log 2π − log s_i − (x − m_i)² / (2s_i²) + log P̂(C_i)
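A minimal sketch (Python/NumPy, with a made-up labeled 1-D sample) of these per-class ML estimates and the resulting discriminant g_i(x):

```python
import numpy as np

# Hypothetical 1-D training data with class labels (r^t as 0/1 class indices).
x = np.array([1.0, 1.5, 2.0, 4.0, 4.5, 5.0])
r = np.array([0,   0,   0,   1,   1,   1  ])

def fit_class(xs):
    m = xs.mean()                    # m_i = sum_t r_i^t x^t / sum_t r_i^t
    s2 = ((xs - m) ** 2).mean()      # s_i^2 = sum_t r_i^t (x^t - m_i)^2 / sum_t r_i^t
    return m, s2

params = [fit_class(x[r == i]) for i in (0, 1)]
priors = [np.mean(r == i) for i in (0, 1)]          # P_hat(C_i)

def g(x_new, i):
    # g_i(x) = -0.5 log 2pi - log s_i - (x - m_i)^2 / (2 s_i^2) + log P_hat(C_i)
    m, s2 = params[i]
    return (-0.5 * np.log(2 * np.pi) - 0.5 * np.log(s2)
            - (x_new - m) ** 2 / (2 * s2) + np.log(priors[i]))

x_new = 3.0
print("class", int(g(x_new, 1) > g(x_new, 0)))
```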
Equal variances: a single decision boundary, halfway between the two means.
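A quick numeric check (Python, with made-up means and assumed equal variances and equal priors) that the two discriminants intersect exactly at x = (m_1 + m_2)/2:

```python
import numpy as np

m1, m2, s2, prior = 2.0, 5.0, 1.5, 0.5     # assumed equal variances and equal priors

def g(x, m):
    return -0.5 * np.log(2 * np.pi * s2) - (x - m) ** 2 / (2 * s2) + np.log(prior)

x_mid = (m1 + m2) / 2
print(np.isclose(g(x_mid, m1), g(x_mid, m2)))   # True: single boundary halfway between the means
```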
Variances are different: two decision boundaries. Homework!
Model Selection

- Cross-validation: measure generalization accuracy by testing on data unused during training.
- Regularization: penalize complex models, E' = error on data + λ · model complexity.
- Akaike's information criterion (AIC), Bayesian information criterion (BIC); see the sketch after this list.
- Minimum description length (MDL): Kolmogorov complexity, the shortest description of the data.
- Structural risk minimization (SRM).

Remark: model selection will be discussed in more depth later (Topic 11).
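As a hedged illustration (Python/NumPy; the penalty forms AIC = −2 log L + 2k and BIC = −2 log L + k log N are the standard ones, but the example data and variable names are made up):

```python
import numpy as np

def aic(log_likelihood, k):
    # Akaike's information criterion: lower is better; penalty grows with parameter count k.
    return -2.0 * log_likelihood + 2.0 * k

def bic(log_likelihood, k, n):
    # Bayesian information criterion: penalty also grows with sample size n.
    return -2.0 * log_likelihood + k * np.log(n)

# Example: Gaussian fit to a sample, k = 2 parameters (mu, sigma^2).
x = np.array([0.0, 3.0, 3.0, 4.0, 5.0])
mu, var = x.mean(), x.var()
log_l = np.sum(-0.5 * np.log(2 * np.pi * var) - (x - mu) ** 2 / (2 * var))
print(aic(log_l, 2), bic(log_l, 2, len(x)))
```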
CHAPTER 5: Multivariate Methods
Multivariate Data

- Multiple measurements (sensors) per instance.
- d inputs/features/attributes: the data are d-variate.
- N instances/observations/examples, collected in an N × d data matrix X.
Multivariate Parameters

Mean: E[x] = μ = [μ_1, ..., μ_d]^T
Covariance: σ_ij ≡ Cov(x_i, x_j); covariance matrix Σ ≡ Cov(x) = E[(x − μ)(x − μ)^T]
Correlation: ρ_ij ≡ Corr(x_i, x_j) = σ_ij / (σ_i σ_j)
Parameter Estimation

Sample mean: m = ∑_t x^t / N
Covariance matrix S: s_ij = ∑_t (x_i^t − m_i)(x_j^t − m_j) / N
Correlation matrix R: r_ij = s_ij / (s_i s_j)
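A minimal sketch (Python/NumPy, with a small made-up N × d data matrix) of these estimators:

```python
import numpy as np

# Hypothetical N x d data matrix (N = 5 instances, d = 2 features).
X = np.array([[1.0, 2.1],
              [2.0, 3.9],
              [3.0, 6.2],
              [4.0, 8.1],
              [5.0, 9.8]])
N = X.shape[0]

m = X.mean(axis=0)                     # sample mean vector m
S = (X - m).T @ (X - m) / N            # sample covariance S (divides by N, as above)
s = np.sqrt(np.diag(S))                # per-feature standard deviations s_i
R = S / np.outer(s, s)                 # sample correlation matrix, r_ij = s_ij / (s_i s_j)

print(m, S, R, sep="\n")
```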
Multivariate Normal Distribution

x ~ N_d(μ, Σ):
  p(x) = (2π)^(−d/2) |Σ|^(−1/2) exp[−(1/2)(x − μ)^T Σ^(−1)(x − μ)]   (5.9)

The quadratic form (x − μ)^T Σ^(−1)(x − μ) is the Mahalanobis distance between x and μ.
Mahalanobis Distance

The Mahalanobis distance between x and μ is based on the correlations between variables, by which different patterns can be identified and analyzed. It differs from the Euclidean distance in that it takes the correlations of the data set into account and is scale-invariant.
Multivariate Normal Distribution (continued)

Mahalanobis distance: (x − μ)^T Σ^(−1)(x − μ) measures the distance from x to μ in terms of Σ; it normalizes for differences in variances and for correlations.

Bivariate case (d = 2):
  Σ = [ σ_1²      ρσ_1σ_2 ]
      [ ρσ_1σ_2   σ_2²    ]
where ρ is the correlation between the two variables.

With the z-scores z_i = (x_i − μ_i) / σ_i, the density becomes
  p(x_1, x_2) = 1 / (2πσ_1σ_2 √(1 − ρ²)) · exp[−(z_1² − 2ρz_1z_2 + z_2²) / (2(1 − ρ²))]
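A short sketch (Python/NumPy, with an assumed mean vector and covariance matrix) computing the squared Mahalanobis distance and contrasting it with the squared Euclidean distance, which ignores scale and correlation:

```python
import numpy as np

# Assumed mean and covariance; in practice these would be the estimates m and S.
mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.8],
                  [0.8, 2.0]])

def mahalanobis2(x, mu, Sigma):
    # Squared Mahalanobis distance (x - mu)^T Sigma^{-1} (x - mu).
    d = x - mu
    return d @ np.linalg.solve(Sigma, d)

x = np.array([1.0, 1.0])
print(mahalanobis2(x, mu, Sigma))
print(np.sum((x - mu) ** 2))   # squared Euclidean distance, for comparison
```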
Bivariate Normal
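A hedged sketch (Python/NumPy, with assumed standard deviations and correlation) evaluating the bivariate density via the z-score form above and cross-checking it against the general d-variate formula:

```python
import numpy as np

mu1, mu2 = 0.0, 0.0
s1, s2, rho = 1.0, 1.4, 0.6            # assumed standard deviations and correlation

def bivariate_density(x1, x2):
    # p(x1, x2) using the z-score form of the bivariate normal density.
    z1, z2 = (x1 - mu1) / s1, (x2 - mu2) / s2
    q = (z1 ** 2 - 2 * rho * z1 * z2 + z2 ** 2) / (1 - rho ** 2)
    return np.exp(-0.5 * q) / (2 * np.pi * s1 * s2 * np.sqrt(1 - rho ** 2))

# Cross-check against the general formula with Sigma built from s1, s2, rho.
Sigma = np.array([[s1 ** 2, rho * s1 * s2],
                  [rho * s1 * s2, s2 ** 2]])
x = np.array([0.5, -0.3])
d = x - np.array([mu1, mu2])
general = np.exp(-0.5 * d @ np.linalg.solve(Sigma, d)) / (2 * np.pi * np.sqrt(np.linalg.det(Sigma)))
print(bivariate_density(*x), general)   # the two values agree
```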
Model Selection

As we increase complexity (a less restricted S), bias decreases and variance increases. Assume simple models (allowing some bias) to control variance (regularization).

Assumption                    Covariance matrix          No. of parameters
Shared, Hyperspheric          S_i = S = s²I              1
Shared, Axis-aligned          S_i = S, with s_ij = 0     d
Shared, Hyperellipsoidal      S_i = S                    d(d+1)/2
Different, Hyperellipsoidal   S_i                        K·d(d+1)/2
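A small sketch (Python; function and label names are made up) that encodes the parameter counts from the table, which can be handy when comparing model complexities for a given d and K:

```python
def covariance_params(d, K, assumption):
    # Free parameters in the covariance model(s) for d features and K classes.
    if assumption == "shared_hyperspheric":         # S_i = S = s^2 I
        return 1
    if assumption == "shared_axis_aligned":         # S_i = S, s_ij = 0 for i != j
        return d
    if assumption == "shared_hyperellipsoidal":     # S_i = S
        return d * (d + 1) // 2
    if assumption == "different_hyperellipsoidal":  # a separate S_i per class
        return K * d * (d + 1) // 2
    raise ValueError(assumption)

for a in ("shared_hyperspheric", "shared_axis_aligned",
          "shared_hyperellipsoidal", "different_hyperellipsoidal"):
    print(a, covariance_params(d=4, K=3, assumption=a))
```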