Lecture Slides for INTRODUCTION TO MACHINE LEARNING, 3RD EDITION, by ETHEM ALPAYDIN © The MIT Press, 2014. Modified by Prof. Carolina Ruiz for CS539 Machine Learning at WPI.

CHAPTER 3: BAYESIAN DECISION THEORY

Probability and Inference
- Result of tossing a coin is in {Heads, Tails}
- Random variable X in {1, 0}, where 1 = Heads, 0 = Tails
- Bernoulli: P{X = 1} = p_o, P{X = 0} = 1 - p_o
- Sample: X = {x^t}, t = 1, ..., N
- Estimation: p_o = #{Heads} / #{Tosses} = (Σ_t x^t) / N
- Prediction of next toss: Heads if p_o > 1/2, Tails otherwise
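A minimal Python sketch of this estimate; the toss data below is made up for illustration:

tosses = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]    # hypothetical sample: 1 = Heads, 0 = Tails
p_hat = sum(tosses) / len(tosses)          # estimate p_o = #{Heads} / #{Tosses}
prediction = "Heads" if p_hat > 0.5 else "Tails"
print(p_hat, prediction)                   # 0.7 Heads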

Classification
- Example: credit scoring
- Inputs are income and savings
- Output is low-risk vs. high-risk
- Input: x = [x_1, x_2]^T, Output: C in {0, 1}
- Prediction: choose C = 1 if P(C = 1 | x_1, x_2) > 0.5, and C = 0 otherwise (i.e., choose the class with the higher posterior probability)

Bayes’ Rule
P(C | x) = P(C) p(x | C) / p(x)    (posterior = prior × likelihood / evidence)
For the case of 2 classes, C = 0 and C = 1:
- P(C = 0 | x) + P(C = 1 | x) = 1
- p(x) = p(x | C = 1) P(C = 1) + p(x | C = 0) P(C = 0)
- P(C = 0) + P(C = 1) = 1
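A small numeric illustration of the rule above for two classes; the prior and likelihood values are invented:

# Hypothetical prior and class likelihoods at some observed x
prior_c1 = 0.3                    # P(C = 1)
prior_c0 = 1 - prior_c1           # P(C = 0)
lik_c1, lik_c0 = 0.6, 0.2         # p(x | C = 1), p(x | C = 0)
evidence = lik_c1 * prior_c1 + lik_c0 * prior_c0   # p(x)
posterior_c1 = lik_c1 * prior_c1 / evidence        # P(C = 1 | x)
print(posterior_c1)               # 0.5625 > 0.5, so predict C = 1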

Bayes’ Rule: K > 2 Classes
- P(C_i | x) = p(x | C_i) P(C_i) / p(x) = p(x | C_i) P(C_i) / Σ_k p(x | C_k) P(C_k)
- with P(C_i) ≥ 0 and Σ_i P(C_i) = 1
- Choose C_i if P(C_i | x) = max_k P(C_k | x)
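The same computation for K classes, sketched with NumPy; the priors and likelihoods are again invented:

import numpy as np

priors = np.array([0.5, 0.3, 0.2])        # P(C_i), hypothetical
likelihoods = np.array([0.1, 0.4, 0.3])   # p(x | C_i) at the observed x, hypothetical
posteriors = likelihoods * priors / np.sum(likelihoods * priors)   # Bayes' rule
print(posteriors)                          # posteriors sum to 1
print(np.argmax(posteriors))               # choose the class with maximum posterior (here C_2)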

Losses and Risks
- Actions: α_i
- Loss of α_i when the state is C_k: λ_ik
- Expected risk (Duda and Hart, 1973): R(α_i | x) = Σ_k λ_ik P(C_k | x)
- Choose α_i if R(α_i | x) = min_k R(α_k | x)
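A minimal sketch of risk minimization with a made-up loss matrix and posteriors:

import numpy as np

loss = np.array([[0.0, 10.0],               # λ_ik: loss of action α_i when the true class is C_k
                 [5.0,  0.0]])              # (hypothetical values)
posteriors = np.array([0.4, 0.6])           # P(C_1 | x), P(C_2 | x), made up
risks = loss @ posteriors                   # R(α_i | x) = Σ_k λ_ik P(C_k | x)
print(risks)                                # [6.0, 2.0]
print(np.argmin(risks))                     # choose α_2, the minimum-risk action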

Losses and Risks: 0/1 Loss
- λ_ik = 0 if i = k, and 1 if i ≠ k
- R(α_i | x) = Σ_{k≠i} P(C_k | x) = 1 - P(C_i | x)
- For minimum risk, choose the most probable class

Losses and Risks: Misclassification Cost
Which class C_i should we pick, or should we reject all classes?
Assume:
- there are K classes
- there is a loss function giving the cost of a misclassification: λ_ik is the cost of classifying an instance as class C_i when it actually belongs to class C_k
- there is a “Reject” option (i.e., not assigning the instance to any class); let the cost of “Reject” be λ
For minimum risk, choose the most probable class, unless it is better (lower expected risk) to reject.

Example: Exercise 4 from Chapter 4
Assume 2 classes: C_1 and C_2
- Case 1: the two misclassifications are equally costly, and there is no reject option: λ_11 = λ_22 = 0, λ_12 = λ_21 = 1
- Case 2: the two misclassifications are not equally costly, and there is no reject option: λ_11 = λ_22 = 0, λ_12 = 10, λ_21 = 5
- Case 3: like Case 2, but with a reject option: λ_11 = λ_22 = 0, λ_12 = 10, λ_21 = 5, λ = 1
See the decision boundaries on the next slide; a small numeric sketch of these decision rules follows.
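A rough Python sketch of these decision rules, evaluated at one hypothetical posterior value; only the λ values come from the cases above:

def decide(p1, loss12, loss21, reject_cost=None):
    # Return the minimum-expected-risk action for a 2-class problem.
    # p1 = P(C_1 | x); choosing C_1 risks loss12 * P(C_2 | x), and vice versa.
    p2 = 1 - p1
    risks = {"C1": loss12 * p2, "C2": loss21 * p1}
    if reject_cost is not None:
        risks["reject"] = reject_cost
    return min(risks, key=risks.get)

p1 = 0.35                            # hypothetical posterior P(C_1 | x)
print(decide(p1, 1, 1))              # Case 1: equal losses -> 'C2'
print(decide(p1, 10, 5))             # Case 2: unequal losses -> 'C2'
print(decide(p1, 10, 5, 1))          # Case 3: with reject option -> 'reject'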

Different Losses and Reject
See the calculations for these plots in the solutions to Exercise 4.
[Plots of the resulting decision boundaries: equal losses, unequal losses, with reject]

Discriminant Functions
Classification can be seen as implementing a set of discriminant functions g_i(x), i = 1, ..., K:
- Choose C_i if g_i(x) = max_k g_k(x)
- g_i(x) can be taken as -R(α_i | x), P(C_i | x), or p(x | C_i) P(C_i)
- This divides the feature space into K decision regions R_1, ..., R_K, where R_i = {x | g_i(x) = max_k g_k(x)}
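A minimal sketch of classification by discriminant functions, using hypothetical 1-D Gaussian class likelihoods and priors:

import math

def gaussian_logpdf(x, mean, std):
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)

classes = [(0.0, 1.0, 0.6), (3.0, 1.5, 0.4)]   # (mean, std, prior) per class, made up

def discriminants(x):
    # g_i(x) = log p(x | C_i) + log P(C_i); any monotone transform defines the same regions
    return [gaussian_logpdf(x, m, s) + math.log(p) for m, s, p in classes]

g = discriminants(1.2)
print(g.index(max(g)))   # index i of the chosen class: x = 1.2 lies in decision region R_i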

K = 2 Classes (see Chapter 3, Exercises 2 and 3)
Some alternative ways of combining the discriminant functions g_1(x) = P(C_1 | x) and g_2(x) = P(C_2 | x) into a single g(x):
- Define g(x) = g_1(x) - g_2(x), and choose C_1 if g(x) > 0, C_2 otherwise
- In terms of the log odds: define g(x) = log [P(C_1 | x) / P(C_2 | x)], and choose C_1 if g(x) > 0
- In terms of the likelihood ratio: define g(x) = p(x | C_1) / p(x | C_2), and choose C_1 if g(x) > P(C_2) / P(C_1)
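A quick check, with invented numbers, that the three formulations lead to the same decision:

import math

prior1, prior2 = 0.6, 0.4              # hypothetical priors P(C_1), P(C_2)
lik1, lik2 = 0.2, 0.5                  # hypothetical likelihoods p(x | C_1), p(x | C_2)
post1 = lik1 * prior1 / (lik1 * prior1 + lik2 * prior2)   # P(C_1 | x)
post2 = 1 - post1

by_difference = post1 - post2 > 0
by_log_odds = math.log(post1 / post2) > 0
by_likelihood_ratio = lik1 / lik2 > prior2 / prior1
print(by_difference, by_log_odds, by_likelihood_ratio)    # all three agree (False: choose C_2)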

Utility Theory
- Probability of state k given evidence x: P(S_k | x)
- Utility of action α_i when the state is k: U_ik
- Expected utility: EU(α_i | x) = Σ_k U_ik P(S_k | x)
- Choose α_i if EU(α_i | x) = max_j EU(α_j | x)
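This mirrors the expected-risk computation, except that we maximize; a minimal sketch with invented utilities:

import numpy as np

utility = np.array([[100.0, -20.0],       # U_ik: utility of action α_i in state S_k (hypothetical)
                    [  0.0,   0.0]])      # e.g. α_2 = "do nothing"
state_probs = np.array([0.1, 0.9])        # P(S_k | x), made up
expected_utility = utility @ state_probs  # EU(α_i | x) = Σ_k U_ik P(S_k | x)
print(np.argmax(expected_utility))        # choose the action with maximum expected utility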

Association Rules
- Association rule: X → Y
- People who buy/click/visit/enjoy X are also likely to buy/click/visit/enjoy Y.
- A rule implies association, not necessarily causation.

Association measures
- Support (X → Y): P(X, Y) = #{customers who bought X and Y} / #{customers}
- Confidence (X → Y): P(Y | X) = P(X, Y) / P(X) = #{customers who bought X and Y} / #{customers who bought X}
- Lift (X → Y): P(X, Y) / [P(X) P(Y)] = P(Y | X) / P(Y)
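A small sketch computing these measures on made-up basket data:

transactions = [                         # hypothetical market-basket data
    {"milk", "bread"},
    {"milk", "diapers", "bread"},
    {"diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread"},
]

def measures(x, y):
    n = len(transactions)
    p_x = sum(x in t for t in transactions) / n
    p_y = sum(y in t for t in transactions) / n
    p_xy = sum(x in t and y in t for t in transactions) / n
    return p_xy, p_xy / p_x, p_xy / (p_x * p_y)   # support, confidence, lift

print(measures("milk", "bread"))   # (0.4, 0.666..., 1.111...)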


Apriori algorithm (Agrawal et al., 1996)
- For (X, Y, Z), a 3-item set, to be frequent (have enough support), (X, Y), (X, Z), and (Y, Z) should all be frequent.
- If (X, Y) is not frequent, none of its supersets can be frequent.
- Once we find the frequent k-item sets, we convert them into rules: X, Y → Z, ... and X → Y, Z, ...
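A compact sketch of the frequent-itemset phase of Apriori; the toy basket data and support threshold are made up, and rule generation is omitted:

from itertools import combinations

transactions = [                         # hypothetical basket data
    {"milk", "bread"},
    {"milk", "diapers", "bread"},
    {"diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread"},
]
min_support = 0.4                        # minimum fraction of transactions

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

# Level-wise search: frequent 1-item sets first, then candidate k-item sets built
# only from frequent (k-1)-item sets and pruned by the Apriori property.
items = {i for t in transactions for i in t}
frequent = [{frozenset([i]) for i in items if support({i}) >= min_support}]
k = 2
while frequent[-1]:
    candidates = {a | b for a in frequent[-1] for b in frequent[-1] if len(a | b) == k}
    level = {c for c in candidates
             if all(frozenset(s) in frequent[-1] for s in combinations(c, k - 1))
             and support(c) >= min_support}
    frequent.append(level)
    k += 1

for level in frequent:
    for itemset in level:
        print(set(itemset), support(itemset))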