Basic Technical Concepts in Machine Learning
Introduction
Supervised learning
Problems in supervised learning
Bayesian decision theory

Ch1. Pattern Recognition & Machine Learning
Pattern recognition is concerned with automatically discovering regularities in data and, based on those regularities, classifying the data into categories.

Pattern Recognition & Machine Learning
Supervised learning: classification, regression
Unsupervised learning: clustering, density estimation, dimensionality reduction

Ch1. Pattern Recognition & Machine Learning Introduction Supervised learning Problems in supervised learning Bayesian decision theory

Supervised Learning
Training set X = {x1, x2, …, xN} with a class or target vector t = {t1, t2, …, tN}.
Goal: find a function y(x) that takes an input vector x and outputs a class label t.
Training data {(xn, tn)} → learned function y(x)

Supervised Learning
Medical example: X = {patient1, patient2, …, patientN}
patient1 = (high pressure, normal temp., high glucose, …), t1 = cancer
patient2 = (low pressure, normal temp., normal glucose, …), t2 = not cancer
new patient = (low pressure, low temp., normal glucose, …), t = ?
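A minimal sketch of this setup (not a method from the lecture): the toy classifier below encodes the hypothetical patients as numeric feature vectors and uses a simple 1-nearest-neighbour rule to stand in for the learned function y(x). All feature values and units are made up for illustration.

```python
# Toy supervised-learning example (hypothetical data, 1-nearest-neighbour stand-in for y(x)).
import numpy as np

# Training set X = {x1, ..., xN} with targets t = {t1, ..., tN}.
# Features: [blood pressure, temperature, glucose] in made-up units.
X = np.array([
    [160.0, 37.0, 180.0],   # patient1: high pressure, normal temp., high glucose
    [100.0, 37.0, 90.0],    # patient2: low pressure, normal temp., normal glucose
])
t = np.array(["cancer", "not cancer"])

def y(x_new):
    """Predict the class of x_new as the class of its nearest training example."""
    distances = np.linalg.norm(X - x_new, axis=1)
    return t[np.argmin(distances)]

new_patient = np.array([105.0, 35.5, 95.0])  # low pressure, low temp., normal glucose
print(y(new_patient))                        # -> not cancer
```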

Ch1. Pattern Recognition & Machine Learning Introduction Supervised learning Problems in supervised learning Bayesian decision theory

Pattern Recognition & Machine Learning
Two problems in supervised learning:
1. Overfitting and underfitting
2. The curse of dimensionality

Pattern Recognition & Machine Learning
Example: polynomial curve fitting (regression)

Pattern Recognition & Machine Learning
Fit the curve by minimizing the error between the model's predictions and the training targets.

Pattern Recognition & Machine Learning
Take the model to be a polynomial (linear in the weights w): y(x, w) = w0 + w1 x + w2 x^2 + … + wM x^M
Minimize the sum-of-squares error: E(w) = ½ Σn {y(xn, w) – tn}^2
What happens as we vary the order M?

Underfitting

Overfitting

Pattern Recognition & Machine Learning
Root-Mean-Square (RMS) error: E_RMS = sqrt(2 E(w*) / N), which makes errors comparable across data sets of different sizes.
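To see what happens as M varies, here is a minimal sketch that fits polynomials of increasing order to a small synthetic data set and reports the training and test RMS errors. The noisy-sinusoid data follow Bishop's running example; the sample size, noise level, and use of NumPy's polyfit are assumptions for illustration.

```python
# Polynomial curve fitting for several orders M on a hypothetical noisy sinusoid.
import numpy as np

rng = np.random.default_rng(0)
N = 10
x_train = np.linspace(0.0, 1.0, N)
t_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=0.2, size=N)
x_test = np.linspace(0.0, 1.0, 100)
t_test = np.sin(2 * np.pi * x_test) + rng.normal(scale=0.2, size=100)

def rms(w, xs, ts):
    """E_RMS = sqrt(2 E(w*) / N), i.e. the root of the mean squared residual."""
    return np.sqrt(np.mean((np.polyval(w, xs) - ts) ** 2))

for M in (0, 1, 3, 9):
    w = np.polyfit(x_train, t_train, deg=M)   # least-squares fit of an order-M polynomial
    print(f"M={M}: train RMS={rms(w, x_train, t_train):.3f}, "
          f"test RMS={rms(w, x_test, t_test):.3f}")

# Small M underfits (both errors stay high); M = 9 drives the training error
# towards zero while the test error grows, i.e. the model overfits.
```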

Regularization
One solution: regularization. Penalize models that are too complex by adding a term to the error function:
E(w) = ½ Σn {y(xn, w) – tn}^2 + (λ/2) ||w||^2
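A minimal sketch of the regularized fit, assuming the same hypothetical noisy-sinusoid data as above. It solves the ridge problem as an augmented least-squares system (equivalent to minimizing E(w) with the λ/2 penalty); the value of λ is chosen arbitrarily.

```python
# Regularized (ridge) polynomial fit with M = 9 on the same hypothetical data.
import numpy as np

rng = np.random.default_rng(0)
N = 10
x = np.linspace(0.0, 1.0, N)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=N)

M = 9
Phi = np.vander(x, M + 1, increasing=True)   # design matrix with columns 1, x, ..., x^M

for lam in (0.0, 1e-3):
    # Minimizing E(w) = 1/2 sum_n {y(xn,w) - tn}^2 + (lam/2)||w||^2 is equivalent to
    # ordinary least squares on the augmented system [Phi; sqrt(lam) I] w = [t; 0].
    A = np.vstack([Phi, np.sqrt(lam) * np.eye(M + 1)])
    b = np.concatenate([t, np.zeros(M + 1)])
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    print(f"lambda = {lam:g}: ||w|| = {np.linalg.norm(w):.2f}")

# With lambda = 0 the order-9 fit interpolates the data and the weights become huge;
# even a small penalty keeps them moderate and smooths the fitted curve.
```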

Pattern Recognition & Machine Learning
Two problems in supervised learning:
1. Overfitting and underfitting
2. The curse of dimensionality

Curse of Dimensionality
Example: classify a vector x into one of 3 classes:

Curse of Dimensionality
Solution: divide the input space into cells:

Curse of Dimensionality
But if the number of dimensions is high, the number of cells grows exponentially, so we would need an enormous amount of data just to populate the "neighborhood" of x.
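A minimal illustration of this blow-up, assuming each feature axis is split into a fixed number of bins (10 here, an arbitrary choice):

```python
# Number of cells needed when each of D feature axes is split into 10 bins.
bins_per_dimension = 10
for D in (1, 2, 3, 10, 20):
    n_cells = bins_per_dimension ** D
    print(f"D = {D:2d}: {n_cells:.2e} cells")

# The cell count (and hence the data needed to populate the cells) grows
# exponentially with the number of dimensions D.
```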

Introduction Supervised learning Problems in supervised learning Bayesian decision theory

Decision Theory
State of nature. Let C denote the state of nature; C is a random variable (e.g., C = C1 for sea bass, C = C2 for salmon).
A priori probabilities. Let P(C1) and P(C2) denote the a priori probabilities of C1 and C2 respectively; we know P(C1) + P(C2) = 1.
Decision rule. Decide C1 if P(C1) > P(C2); otherwise decide C2.

Basic Concepts
Class-conditional probability density function. Let x be a continuous random variable; p(x|C) is the probability density of x given the state of nature C.
For example, what is the density of lightness given that the class is salmon, p(lightness | salmon)? Or given sea bass, p(lightness | sea bass)?

Figure 2.1

Bayes Formula
How do we combine the a priori and class-conditional probabilities to obtain the probability of a state of nature given the observation?
Bayes formula: P(Cj | x) = p(x | Cj) P(Cj) / p(x)
posterior = (likelihood × prior) / evidence
Bayes decision: choose C1 if P(C1 | x) > P(C2 | x); otherwise choose C2.
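A minimal sketch of the Bayes decision rule on the sea bass / salmon example, assuming purely for illustration that the class-conditional densities of lightness are Gaussian; the priors and density parameters below are made up.

```python
# Bayes decision for the two-class fish example with made-up Gaussian likelihoods.
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

priors = {"sea bass": 0.6, "salmon": 0.4}                  # P(Cj), assumed
params = {"sea bass": (8.0, 1.5), "salmon": (11.0, 1.0)}   # (mu, sigma) of lightness, assumed

def posterior(c, x):
    likelihood = normal_pdf(x, *params[c])                                 # p(x | Cj)
    evidence = sum(normal_pdf(x, *params[k]) * priors[k] for k in priors)  # p(x)
    return likelihood * priors[c] / evidence                               # P(Cj | x)

x = 10.0   # observed lightness
decision = max(priors, key=lambda c: posterior(c, x))
print({c: round(posterior(c, x), 3) for c in priors}, "->", decision)
```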

Figure 2.2

Minimizing Error
What is the probability of error?
P(error | x) = P(C1 | x) if we decide C2, and P(C2 | x) if we decide C1.
Does the Bayes rule minimize the probability of error?
P(error) = ∫ P(error, x) dx = ∫ P(error | x) p(x) dx, and because the Bayes rule makes P(error | x) as small as possible for every x, the integral (and hence the overall error) is minimized. The answer is "yes".

Simplifying Bayes Rule
Bayes formula: P(Cj | x) = p(x | Cj) P(Cj) / p(x)
The evidence p(x) is the same for all classes, so we can drop it when making a decision.
Rule: choose C1 if p(x | C1) P(C1) > p(x | C2) P(C2); otherwise decide C2.
If p(x | C1) = p(x | C2), the decision depends only on the priors; if P(C1) = P(C2), it depends only on the likelihoods.

Loss Function
Let {C1, C2, …, Cc} be the possible states of nature and {a1, a2, …, ak} the possible actions.
The loss function λ(ai | Cj) is the loss incurred for taking action ai when the state of nature is Cj.
Expected (conditional) loss: R(ai | x) = Σj λ(ai | Cj) P(Cj | x)
Decision: select the action that minimizes the conditional risk; this gives the best achievable performance (the resulting minimum overall risk is called the Bayes risk).
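A minimal sketch of minimum-risk classification, assuming a hypothetical two-class problem in which deciding C2 when the truth is C1 (say, missing a cancer) is ten times as costly as the opposite error; the posteriors and loss values are made up.

```python
# Minimum conditional-risk decision with an asymmetric (made-up) loss matrix.
import numpy as np

posteriors = np.array([0.3, 0.7])   # P(C1|x), P(C2|x) for some observed x

# loss[i, j] = lambda(ai | Cj): loss of taking action ai when the true class is Cj.
loss = np.array([
    [0.0, 1.0],    # a1 = "decide C1": cheap false alarm if the truth is C2
    [10.0, 0.0],   # a2 = "decide C2": very costly if the truth is C1
])

risk = loss @ posteriors             # R(ai | x) = sum_j lambda(ai | Cj) P(Cj | x)
best = np.argmin(risk)               # action with minimum conditional risk
print(risk, "-> choose action", best + 1)

# With this asymmetric loss we decide C1 even though P(C2|x) > P(C1|x).
# With the zero-one loss of the next slide, the rule reduces to picking
# the class with the largest posterior.
```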

Zero-One Loss
We will normally be concerned with the symmetric, or zero-one, loss function:
λ(ai | Cj) = 0 if i = j, 1 if i ≠ j
In this case the conditional risk is:
R(ai | x) = Σj λ(ai | Cj) P(Cj | x) = Σ_{j≠i} P(Cj | x) = 1 – P(Ci | x)
so minimizing the risk amounts to choosing the class with the largest posterior probability P(Ci | x).