INTRODUCTION TO Machine Learning 3rd Edition


Lecture Slides for INTRODUCTION TO Machine Learning, 3rd Edition
ETHEM ALPAYDIN © The MIT Press, 2014
alpaydin@boun.edu.tr
http://www.cmpe.boun.edu.tr/~ethem/i2ml3e
Modified by Prof. Carolina Ruiz for CS539 Machine Learning at WPI

CHAPTER 3: Bayesian Decision Theory

Probability and Inference
The result of tossing a coin is $\in$ {Heads, Tails}. Define the random variable $X \in \{1, 0\}$, where 1 = Heads and 0 = Tails.
Bernoulli: $P(X = 1) = p_o$ and $P(X = 0) = 1 - p_o$.
Sample: $\mathcal{X} = \{x^t\}_{t=1}^N$
Estimation: $p_o = \#\{\text{Heads}\}/\#\{\text{Tosses}\} = \sum_t x^t / N$
Prediction of the next toss: Heads if $p_o > 1/2$, Tails otherwise.
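A minimal sketch of this estimation-and-prediction step; the sample below is made up for illustration:

```python
# Estimate the Bernoulli parameter p_o from a sample of coin tosses and
# predict the next toss; the sample is hypothetical.
sample = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]  # 1 = Heads, 0 = Tails

p_o = sum(sample) / len(sample)          # p_o = #{Heads}/#{Tosses}
prediction = "Heads" if p_o > 0.5 else "Tails"
print(f"p_o = {p_o:.2f}, predict {prediction}")  # p_o = 0.70, predict Heads
```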

Classification Example: Credit scoring Inputs are income and savings Output is low-risk vs high-risk Input: x = [x1,x2]T Output: C belongs to {0,1} Prediction:

Bayes' Rule
$$P(C \mid x) = \frac{P(C)\, p(x \mid C)}{p(x)}$$
posterior = (prior × likelihood) / evidence
For the case of 2 classes, $C = 0$ and $C = 1$:
$$p(x) = p(x \mid C = 1)\, P(C = 1) + p(x \mid C = 0)\, P(C = 0)$$
$$P(C = 0 \mid x) + P(C = 1 \mid x) = 1$$
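A small sketch of this computation; the prior and likelihood values are made-up numbers, not from the slides:

```python
# Bayes' rule for two classes: posterior = prior * likelihood / evidence.
prior = {0: 0.6, 1: 0.4}            # P(C=0), P(C=1); illustrative
likelihood = {0: 0.05, 1: 0.20}     # p(x|C=0), p(x|C=1) at the observed x

# Evidence: p(x) = p(x|C=0)P(C=0) + p(x|C=1)P(C=1)
evidence = sum(likelihood[c] * prior[c] for c in (0, 1))

# Posterior: P(C|x) = P(C) p(x|C) / p(x)
posterior = {c: likelihood[c] * prior[c] / evidence for c in (0, 1)}
print(posterior)  # the two posteriors sum to 1
```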

Bayes' Rule: K > 2 Classes
$$P(C_i \mid x) = \frac{p(x \mid C_i)\, P(C_i)}{p(x)} = \frac{p(x \mid C_i)\, P(C_i)}{\sum_{k=1}^{K} p(x \mid C_k)\, P(C_k)}$$
with $P(C_i) \geq 0$ and $\sum_{i=1}^{K} P(C_i) = 1$.
To classify x: choose $C_i$ if $P(C_i \mid x) = \max_k P(C_k \mid x)$.
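The same rule for K classes, as a sketch with illustrative priors and likelihoods:

```python
# Compute all K posteriors and choose the class with the maximum.
priors = [0.5, 0.3, 0.2]            # P(C_1), ..., P(C_K); sum to 1
likelihoods = [0.02, 0.10, 0.05]    # p(x|C_i) at the observed x

evidence = sum(p * l for p, l in zip(priors, likelihoods))
posteriors = [p * l / evidence for p, l in zip(priors, likelihoods)]

# choose C_i if P(C_i|x) = max_k P(C_k|x)
chosen = max(range(len(posteriors)), key=lambda i: posteriors[i])
print(f"choose C_{chosen + 1}")     # here: C_2, with posterior 0.6
```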

Losses and Risks
Actions: $\alpha_i$ means choosing class $C_i$.
Loss of $\alpha_i$ when the actual class is $C_k$: $\lambda_{ik}$
Expected risk (Duda and Hart, 1973):
$$R(\alpha_i \mid x) = \sum_{k=1}^{K} \lambda_{ik}\, P(C_k \mid x)$$
Choose $\alpha_i$ if $R(\alpha_i \mid x) = \min_k R(\alpha_k \mid x)$.
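A sketch of the expected-risk computation; the loss matrix and posteriors below are illustrative numbers:

```python
# R(alpha_i|x) = sum_k lambda_ik * P(C_k|x); pick the minimum-risk action.
loss = [[0, 10],        # lambda_1k: losses of choosing C_1 when truth is C_k
        [5,  0]]        # lambda_2k: losses of choosing C_2 when truth is C_k
posterior = [0.7, 0.3]  # P(C_1|x), P(C_2|x)

risk = [sum(loss[i][k] * posterior[k] for k in range(2)) for i in range(2)]
best = min(range(2), key=lambda i: risk[i])
print(f"risks = {risk}, choose alpha_{best + 1}")  # risks = [3.0, 3.5]
```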

Losses and Risks: 0/1 Loss
$$\lambda_{ik} = \begin{cases} 0 & \text{if } i = k \\ 1 & \text{if } i \neq k \end{cases}$$
Then $R(\alpha_i \mid x) = \sum_{k \neq i} P(C_k \mid x) = 1 - P(C_i \mid x)$.
For minimum risk, choose the most probable class.
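A quick check of this equivalence, with illustrative posteriors:

```python
# Under 0/1 loss, R(alpha_i|x) = 1 - P(C_i|x), so minimizing expected
# risk is exactly choosing the class with the highest posterior.
posterior = [0.2, 0.5, 0.3]          # illustrative P(C_i|x) values
risk = [1 - p for p in posterior]    # R(alpha_i|x) under 0/1 loss
assert min(range(3), key=lambda i: risk[i]) == \
       max(range(3), key=lambda i: posterior[i])
```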

Losses and Risks: Misclassification Cost
Which class $C_i$ should we pick, or should we reject all classes? Assume:
- there are K classes
- there is a loss function: $\lambda_{ik}$ is the cost of misclassifying an instance as class $C_i$ when it is actually of class $C_k$
- there is a "Reject" option (i.e., not classifying an instance into any class)
Let the cost of "Reject" be $\lambda$. A possible loss function is:
$$\lambda_{ik} = \begin{cases} 0 & \text{if } i = k \\ \lambda & \text{if } i = K + 1 \text{ (reject)} \\ 1 & \text{otherwise} \end{cases}$$
For minimum risk, choose the most probable class, unless it is better to reject: choose $C_i$ if $P(C_i \mid x) > P(C_k \mid x)$ for all $k \neq i$ and $P(C_i \mid x) > 1 - \lambda$; reject otherwise (see the sketch below).
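A sketch of this rule; $\lambda$ and the posteriors are illustrative values:

```python
# Minimum-risk classification with a reject option: accept the most
# probable class only if its posterior exceeds 1 - lambda.
lam = 0.2                       # cost of rejecting, 0 < lambda < 1
posterior = [0.45, 0.40, 0.15]  # illustrative P(C_i|x) values

best = max(range(len(posterior)), key=lambda i: posterior[i])
if posterior[best] > 1 - lam:
    print(f"choose C_{best + 1}")
else:
    print("reject")             # here 0.45 < 1 - 0.2 = 0.8, so we reject
```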

Example: Exercise 4 from Chapter 4
Assume 2 classes, $C_1$ and $C_2$.
- Case 1: the two misclassifications are equally costly and there is no reject option: $\lambda_{11} = \lambda_{22} = 0$, $\lambda_{12} = \lambda_{21} = 1$
- Case 2: the two misclassifications are not equally costly and there is no reject option: $\lambda_{11} = \lambda_{22} = 0$, $\lambda_{12} = 10$, $\lambda_{21} = 5$
- Case 3: like Case 2 but with a reject option: $\lambda_{11} = \lambda_{22} = 0$, $\lambda_{12} = 10$, $\lambda_{21} = 5$, $\lambda = 1$
See the optimal decision boundaries on the next slide; a numerical sketch follows below.
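The boundaries can be recovered by comparing expected risks directly. In the sketch below, only the $\lambda$ values come from the slide; the `decide` helper and the sample points are illustrative:

```python
# For 2 classes: R(alpha_1|x) = lambda_12 (1 - P1), R(alpha_2|x) = lambda_21 P1,
# and R(reject|x) = lambda, where P1 = P(C_1|x).
def decide(p1, l12, l21, reject_cost=None):
    """Return the minimum-risk action given P(C_1|x) = p1."""
    risks = {"C1": l12 * (1 - p1), "C2": l21 * p1}
    if reject_cost is not None:
        risks["reject"] = reject_cost
    return min(risks, key=risks.get)

for p1 in (0.1, 0.3, 0.55, 0.7, 0.95):
    print(p1,
          decide(p1, 1, 1),        # Case 1: boundary at P1 = 1/2
          decide(p1, 10, 5),       # Case 2: boundary at P1 = 2/3
          decide(p1, 10, 5, 1))    # Case 3: reject for 0.2 < P1 < 0.9
```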

Different Losses and Reject
[Figure: plots of $P(C_1 \mid x)$ and $P(C_2 \mid x)$ with the optimal decision boundaries for three cases: equal losses (classify x as $C_1$ or $C_2$ at the point where the posteriors cross), unequal losses (the boundary shifts toward the class that incurs the higher cost when misclassified), and with a reject option (a reject region appears between the "classify as $C_1$" and "classify as $C_2$" regions). See the solutions to Exercise 4 on pages 61-62 for the calculations behind these plots.]

Discriminant Functions
Classification can be seen as implementing a set of discriminant functions $g_i(x)$, $i = 1, \ldots, K$: choose $C_i$ if $g_i(x) = \max_k g_k(x)$.
For example, $g_i(x)$ can be $-R(\alpha_i \mid x)$, $P(C_i \mid x)$, or $p(x \mid C_i)\, P(C_i)$.
This divides the feature space into K decision regions $\mathcal{R}_1, \ldots, \mathcal{R}_K$, where $\mathcal{R}_i = \{x \mid g_i(x) = \max_k g_k(x)\}$.
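A sketch with illustrative numbers, using $g_i(x) = p(x \mid C_i)\, P(C_i)$:

```python
# Any of -R(alpha_i|x), P(C_i|x), or p(x|C_i)P(C_i) works as g_i(x);
# they all produce the same decision regions.
priors = [0.5, 0.3, 0.2]          # P(C_i); illustrative
likelihoods = [0.02, 0.10, 0.05]  # p(x|C_i) at the observed x

# g_i(x) = p(x|C_i) P(C_i): no need to divide by the evidence, which is
# the same for every class and does not change the argmax.
g = [l * p for l, p in zip(likelihoods, priors)]
chosen = max(range(len(g)), key=lambda i: g[i])
print(f"x falls in region R_{chosen + 1}: choose C_{chosen + 1}")
```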

K = 2 Classes (see Chapter 3 Exercises 2 and 3)
Some alternative ways of combining the discriminant functions $g_1(x) = P(C_1 \mid x)$ and $g_2(x) = P(C_2 \mid x)$ into a single $g(x)$:
- Define $g(x) = g_1(x) - g_2(x)$ and choose $C_1$ if $g(x) > 0$, $C_2$ otherwise.
- In terms of log odds: define $g(x) = \log \frac{P(C_1 \mid x)}{P(C_2 \mid x)}$ and again choose $C_1$ if $g(x) > 0$.
- In terms of the likelihood ratio: choose $C_1$ if $\frac{p(x \mid C_1)}{p(x \mid C_2)} > \frac{P(C_2)}{P(C_1)}$. If the priors are equal, $P(C_1) = P(C_2)$, the rule reduces to comparing the likelihoods, i.e., the discriminant is the likelihood ratio itself.
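A sketch of the two-class rule as a single discriminant. The Gaussian class likelihoods, their parameters, and the test point are assumptions for illustration, not from the slides:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Univariate Gaussian density; an assumed form for p(x|C_i)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def g(x, prior1=0.5, prior2=0.5):
    # g(x) = log[p(x|C1)/p(x|C2)] + log[P(C1)/P(C2)]; choose C1 if g(x) > 0.
    # With equal priors the second term vanishes: a pure likelihood-ratio test.
    return (math.log(gaussian_pdf(x, mu=0.0, sigma=1.0) /
                     gaussian_pdf(x, mu=2.0, sigma=1.0)) +
            math.log(prior1 / prior2))

print("C1" if g(0.5) > 0 else "C2")  # 0.5 is closer to mu_1 = 0, so C1
```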