Lecture Slides for INTRODUCTION TO MACHINE LEARNING, 2nd Edition. ETHEM ALPAYDIN © The MIT Press, 2010 (V1.0).


Probability and Inference
The result of tossing a coin is in {Heads, Tails}; random variable X ∈ {1, 0}.
Bernoulli distribution: P(X = 1) = p_o and P(X = 0) = 1 − p_o.
Sample: X = {x^t}, t = 1, ..., N
Estimation: p_o = #{Heads} / #{Tosses} = Σ_t x^t / N
Prediction of next toss: Heads if p_o > 1/2, Tails otherwise
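A minimal sketch of this estimate and prediction rule in Python (the toss sample below is made up for illustration):

```python
# Maximum likelihood estimate of the Bernoulli parameter from a sample of tosses,
# followed by the prediction rule "Heads if p_o > 1/2, Tails otherwise".
sample = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]    # hypothetical tosses: 1 = Heads, 0 = Tails

p_hat = sum(sample) / len(sample)           # estimate of p_o = #{Heads} / #{Tosses}
prediction = "Heads" if p_hat > 0.5 else "Tails"
print(f"estimated p_o = {p_hat:.2f}, predict {prediction} on the next toss")
```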

Classification
Credit scoring: inputs are income (x1) and savings (x2); output is low-risk vs. high-risk.
Input: x = [x1, x2]^T, Output: C ∈ {0, 1}
Prediction: choose C = 1 if P(C = 1 | x1, x2) > 0.5 and C = 0 otherwise; equivalently, choose C = 1 if P(C = 1 | x1, x2) > P(C = 0 | x1, x2).

Bayes’ Rule
P(C | x) = P(C) p(x | C) / p(x)
posterior = prior × likelihood / evidence
Likelihood p(x | C): conditional probability of observing x when the class is C.
Evidence p(x): marginal probability of x, p(x) = p(x | C = 1) P(C = 1) + p(x | C = 0) P(C = 0).

Bayes’ Rule: K > 2 Classes
P(C_i | x) = p(x | C_i) P(C_i) / p(x) = p(x | C_i) P(C_i) / Σ_{k=1..K} p(x | C_k) P(C_k)
with P(C_i) ≥ 0 and Σ_i P(C_i) = 1.
Choose C_i if P(C_i | x) = max_k P(C_k | x).
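A small sketch of this rule with made-up priors and likelihood values for a three-class problem:

```python
# Bayes' rule for K classes: posterior_i is proportional to likelihood_i * prior_i;
# normalize by the evidence, then pick the class with the largest posterior.
priors = [0.5, 0.3, 0.2]             # P(C_i), hypothetical values
likelihoods = [0.10, 0.40, 0.25]     # p(x | C_i) at one particular observation x

evidence = sum(l * p for l, p in zip(likelihoods, priors))             # p(x)
posteriors = [l * p / evidence for l, p in zip(likelihoods, priors)]   # P(C_i | x)
best = max(range(len(posteriors)), key=lambda i: posteriors[i])

print("posteriors:", [round(p, 3) for p in posteriors], "-> choose class", best)
```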

Losses and Risks
Action α_i: the decision to assign the input to class C_i.
Loss of α_i when the input actually belongs to C_k: λ_ik.
Expected risk of taking action α_i (Duda and Hart, 1973):
R(α_i | x) = Σ_k λ_ik P(C_k | x)
Choose α_i if R(α_i | x) = min_k R(α_k | x).

Losses and Risks: 0/1 Loss
In the special case of the zero-one loss, λ_ik = 0 if i = k and 1 otherwise. Then
R(α_i | x) = Σ_{k ≠ i} P(C_k | x) = 1 − P(C_i | x)
so for minimum risk, choose the most probable class.
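A minimal sketch of risk minimization with a loss matrix; the posteriors below are made up, and with the 0/1 loss shown the minimum-risk action is simply the most probable class:

```python
# Expected risk R(a_i | x) = sum_k lambda_ik * P(C_k | x); choose the minimum-risk action.
posteriors = [0.2, 0.7, 0.1]          # P(C_k | x), hypothetical values

loss = [[0, 1, 1],                    # 0/1 loss: 0 on the diagonal, 1 elsewhere
        [1, 0, 1],
        [1, 1, 0]]

risks = [sum(loss[i][k] * posteriors[k] for k in range(3)) for i in range(3)]
best_action = min(range(3), key=lambda i: risks[i])

print("risks:", [round(r, 2) for r in risks], "-> choose action", best_action)
# risks [0.8, 0.3, 0.9] -> action 1, the most probable class
```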

Losses and Risks: Reject
Define an additional action of reject, α_{K+1}, with loss
λ_ik = 0 if i = k, λ if i = K + 1, and 1 otherwise, where 0 < λ < 1.
The risk of rejecting is R(α_{K+1} | x) = λ, and the risk of choosing class C_i is 1 − P(C_i | x).
Optimal decision rule: choose C_i if P(C_i | x) > P(C_k | x) for all k ≠ i and P(C_i | x) > 1 − λ; reject otherwise.
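A sketch of this rule: pick the most probable class only when its posterior exceeds 1 − λ, otherwise reject (the value of λ and the posteriors are hypothetical):

```python
# Reject-option decision rule for 0/1 loss with reject cost lam (0 < lam < 1).
def decide(posteriors, lam=0.3):
    """Return the index of the chosen class, or 'reject' if no class is confident enough."""
    best = max(range(len(posteriors)), key=lambda i: posteriors[i])
    return best if posteriors[best] > 1 - lam else "reject"

print(decide([0.80, 0.15, 0.05]))   # 0.80 > 0.7, so choose class 0
print(decide([0.45, 0.40, 0.15]))   # no posterior exceeds 0.7, so 'reject'
```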

Discriminant Functions
Define a discriminant function g_i(x) per class and choose C_i if g_i(x) = max_k g_k(x).
With the Bayes’ classifier, set g_i(x) = −R(α_i | x); with the 0/1 loss function this becomes g_i(x) = P(C_i | x).
Ignoring the common term p(x), we can equivalently use g_i(x) = p(x | C_i) P(C_i).
The discriminants divide the feature space into K decision regions R_1, ..., R_K, where R_i = {x | g_i(x) = max_k g_k(x)}.

K = 2 Classes
With two classes we can define a single discriminant g(x) = g_1(x) − g_2(x) and choose C_1 if g(x) > 0, C_2 otherwise.
An example is the log odds: g(x) = log [P(C_1 | x) / P(C_2 | x)].

Utility Theory
Certain features may be costly to observe; utility theory lets us assess the value of the information that additional features would provide.
Probability of state S_k given evidence x: P(S_k | x).
Utility of action α_i when the state is S_k: U_ik, measuring how good it is to take action α_i in state S_k.
Expected utility: EU(α_i | x) = Σ_k U_ik P(S_k | x); choose α_i if EU(α_i | x) = max_j EU(α_j | x).
For example, in classification, decisions correspond to choosing one of the classes, and maximizing expected utility is equivalent to minimizing expected risk.

Association Rules
Association rule: X → Y. People who buy/click/visit/enjoy X are also likely to buy/click/visit/enjoy Y.
A rule implies association, not necessarily causation.

Association Measures
Association rule: X → Y
Support(X → Y): the statistical significance of the rule, Support(X → Y) = P(X, Y) = #{customers who bought both X and Y} / #{customers}.
Confidence(X → Y): the conditional probability, Confidence(X → Y) = P(Y | X) = P(X, Y) / P(X).
Lift(X → Y), the interest of the rule: Lift(X → Y) = P(X, Y) / [P(X) P(Y)] = P(Y | X) / P(Y).
If X and Y are independent, we expect the lift to be close to 1; a lift greater than 1 means X makes Y more likely, and a lift less than 1 means X makes Y less likely.
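A small sketch that computes the three measures from a toy list of market baskets (the transactions are made up):

```python
# Support, confidence, and lift of the rule X -> Y over a list of transactions.
transactions = [
    {"milk", "bread"},
    {"milk"},
    {"bread"},
    {"milk", "bread", "butter"},
]

def measures(x, y, data):
    n = len(data)
    p_x = sum(x <= t for t in data) / n              # P(X): fraction of baskets containing X
    p_y = sum(y <= t for t in data) / n              # P(Y)
    p_xy = sum((x | y) <= t for t in data) / n       # P(X, Y)
    return p_xy, p_xy / p_x, p_xy / (p_x * p_y)      # support, confidence, lift

support, confidence, lift = measures({"milk"}, {"bread"}, transactions)
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
# support=0.50 confidence=0.67 lift=0.89
```

Here the lift comes out below 1, so in this toy data buying milk makes bread slightly less likely rather than more.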

Apriori Algorithm (Agrawal et al., 1996)
For example, {X, Y, Z} is a 3-item set, and we may look for a rule such as X, Y → Z. We want all rules with high enough support and confidence.
Since a sales database is generally very large, we need an efficient algorithm (the Apriori algorithm) that finds these rules with a small number of passes over the database.
The Apriori algorithm has two steps:
1. Find frequent itemsets, that is, itemsets with enough support. If {X, Y, Z} is frequent, then {X, Y}, {X, Z}, and {Y, Z} must also be frequent; conversely, if {X, Y} is not frequent, none of its supersets can be frequent. This lets us build candidate (k+1)-item sets only from frequent k-item sets.
2. Convert the frequent itemsets to rules with enough confidence. Once we find the frequent k-item sets, we split them into rules such as X, Y → Z and X → Y, Z. For all possible single consequents, we check whether the rule has enough confidence and remove it if it does not.
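A compact sketch of the level-wise frequent-itemset step; the transaction list and the minimum support are made up, and rule generation with a confidence threshold would reuse the same counts:

```python
# Level-wise Apriori: frequent (k+1)-item sets are built only from frequent k-item sets.
from itertools import combinations

transactions = [{"milk", "bread"}, {"milk", "bread", "butter"},
                {"bread", "butter"}, {"milk", "butter"}, {"milk", "bread", "butter"}]
min_support = 0.4   # hypothetical threshold

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

# Frequent 1-item sets.
items = sorted({i for t in transactions for i in t})
frequent = [{frozenset([i]) for i in items if support(frozenset([i])) >= min_support}]

# Grow itemsets level by level; stop when no candidate survives.
k = 1
while frequent[-1]:
    candidates = {a | b for a in frequent[-1] for b in frequent[-1] if len(a | b) == k + 1}
    # Prune candidates that have an infrequent k-subset, then check support on the data.
    survivors = {c for c in candidates
                 if all(frozenset(s) in frequent[-1] for s in combinations(c, k))
                 and support(c) >= min_support}
    frequent.append(survivors)
    k += 1

for level in frequent[:-1]:
    for itemset in level:
        print(set(itemset), round(support(itemset), 2))
```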

Exercise
1. In a two-class, two-action problem with given losses λ_11, λ_12, λ_21, and λ_22, write the optimal decision rule.
2. Show that as we move an item from the antecedent to the consequent, confidence can never increase: confidence(A, B, C → D) ≥ confidence(A, B → C, D).


Bayesian Networks
Bayesian networks are also known as graphical models or probabilistic networks.
Nodes are hypotheses (random variables), and the probability attached to a node corresponds to our belief in the truth of the hypothesis.
Arcs are direct influences between hypotheses.
The structure is represented as a directed acyclic graph (DAG); the parameters are the conditional probabilities on the arcs.
(Pearl, 1988, 2000; Jensen, 1996; Lauritzen, 1996)
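A minimal sketch of how a DAG plus conditional probability tables defines a joint distribution, using the cloudy/sprinkler/rain/wet-grass network of the following slides. The point is the factorization P(C, S, R, W) = P(C) P(S | C) P(R | C) P(W | R, S); the numeric values follow the sprinkler example below, with the entries not shown there (P(S | C) and part of P(W | R, S)) assumed from the textbook figure:

```python
# A Bayesian network defines the joint as a product of local conditionals:
# P(C, S, R, W) = P(C) * P(S | C) * P(R | C) * P(W | R, S).
from itertools import product

P_C = {True: 0.5, False: 0.5}
P_S_given_C = {True: 0.1, False: 0.5}              # P(S=on | C), assumed values
P_R_given_C = {True: 0.8, False: 0.1}              # P(R=yes | C)
P_W_given_RS = {(True, True): 0.95, (True, False): 0.90,
                (False, True): 0.90, (False, False): 0.10}   # P(W=wet | R, S)

def joint(c, s, r, w):
    p = P_C[c]
    p *= P_S_given_C[c] if s else 1 - P_S_given_C[c]
    p *= P_R_given_C[c] if r else 1 - P_R_given_C[c]
    p *= P_W_given_RS[(r, s)] if w else 1 - P_W_given_RS[(r, s)]
    return p

total = sum(joint(c, s, r, w) for c, s, r, w in product([True, False], repeat=4))
print(round(total, 10))   # 1.0 -- the factorization defines a proper joint distribution
```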

Causes and Bayes’ Rule
Diagnostic inference: knowing that the grass is wet, what is the probability that rain is the cause?
The arc from rain to wet grass is causal; reasoning back from the wet grass to rain is diagnostic.
P(R | W) = P(W | R) P(R) / P(W) = P(W | R) P(R) / [P(W | R) P(R) + P(W | ~R) P(~R)]

Conditional Independence
X and Y are independent if P(X, Y) = P(X) P(Y).
X and Y are conditionally independent given Z if P(X, Y | Z) = P(X | Z) P(Y | Z), or equivalently P(X | Y, Z) = P(X | Z).
Three canonical cases for conditional independence:
1. Head-to-tail connection
2. Tail-to-tail connection
3. Head-to-head connection

Case 1: Head-to-Tail Connection
P(X, Y, Z) = P(X) P(Y | X) P(Z | Y); X and Z are independent given Y.
Example (cloudy → rain → wet grass) with P(C) = 0.4, P(R | C) = 0.8, P(R | ~C) = 0.1, P(W | R) = 0.9, P(W | ~R) = 0.2:
P(W | C) = P(W | R) P(R | C) + P(W | ~R) P(~R | C) = 0.9 × 0.8 + 0.2 × 0.2 = 0.76
P(R) = P(R | C) P(C) + P(R | ~C) P(~C) = 0.8 × 0.4 + 0.1 × 0.6 = 0.38
P(W) = P(W | R) P(R) + P(W | ~R) P(~R) = 0.9 × 0.38 + 0.2 × 0.62 = 0.47
P(C | W) = P(W | C) P(C) / P(W) = 0.76 × 0.4 / 0.47 = 0.65
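A quick numeric check of these chain computations, using the same values as the slide:

```python
# Head-to-tail chain C -> R -> W: marginalize over the middle node, then invert with Bayes' rule.
P_C = 0.4
P_R_given = {True: 0.8, False: 0.1}        # P(R | C), P(R | ~C)
P_W_given = {True: 0.9, False: 0.2}        # P(W | R), P(W | ~R)

P_W_given_C = P_W_given[True] * P_R_given[True] + P_W_given[False] * (1 - P_R_given[True])
P_R = P_R_given[True] * P_C + P_R_given[False] * (1 - P_C)
P_W = P_W_given[True] * P_R + P_W_given[False] * (1 - P_R)
P_C_given_W = P_W_given_C * P_C / P_W

print(round(P_W_given_C, 2), round(P_R, 2), round(P_W, 2), round(P_C_given_W, 2))
# 0.76 0.38 0.47 0.65
```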

Case 2: Tail-to-Tail Connection
P(X, Y, Z) = P(X) P(Y | X) P(Z | X); Y and Z are independent given X.
Example (cloudy is a common parent of sprinkler and rain), with P(C) = 0.5, P(R | C) = 0.8, P(R | ~C) = 0.1:
P(C | R) = P(R | C) P(C) / P(R) = P(R | C) P(C) / [P(R | C) P(C) + P(R | ~C) P(~C)] = 0.8 × 0.5 / (0.8 × 0.5 + 0.1 × 0.5) = 0.89
P(R | S) = P(R | C) P(C | S) + P(R | ~C) P(~C | S) = 0.22
So P(R | S) = 0.22 < P(R) = 0.45: observing that the sprinkler is on lowers the probability of rain, because the two children are dependent through their common parent.
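A numeric check; the sprinkler table P(S | C) = 0.1, P(S | ~C) = 0.5 is assumed from the textbook figure and reproduces the slide's 0.22:

```python
# Tail-to-tail: C is a common parent of S and R. Observing R raises belief in C,
# and observing S lowers belief in C, which in turn changes belief in the other child.
P_C = 0.5
P_R_given = {True: 0.8, False: 0.1}   # P(R | C), P(R | ~C)
P_S_given = {True: 0.1, False: 0.5}   # P(S | C), P(S | ~C), assumed values

P_R = P_R_given[True] * P_C + P_R_given[False] * (1 - P_C)                 # 0.45
P_C_given_R = P_R_given[True] * P_C / P_R                                  # 0.89
P_S = P_S_given[True] * P_C + P_S_given[False] * (1 - P_C)                 # 0.30
P_C_given_S = P_S_given[True] * P_C / P_S                                  # 0.17
P_R_given_S = P_R_given[True] * P_C_given_S + P_R_given[False] * (1 - P_C_given_S)

print(round(P_C_given_R, 2), round(P_R_given_S, 2), round(P_R, 2))
# 0.89 0.22 0.45 -- P(R | S) < P(R): the two children are dependent through C
```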

Case 3: Head-to-Head Connection
P(X, Y, Z) = P(X) P(Y) P(Z | X, Y); X and Y are independent (but become dependent once Z is observed).

Causal Inference
Causal inference: if the sprinkler is on, what is the probability that the grass is wet?
P(W | S) = P(W | R, S) P(R | S) + P(W | ~R, S) P(~R | S)
         = P(W | R, S) P(R) + P(W | ~R, S) P(~R)
         = 0.95 × 0.4 + 0.90 × 0.6 = 0.92
since R and S are independent, so P(R | S) = P(R).
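The same one-line computation as a sketch in code, using the values from the slide:

```python
# Causal (predictive) inference in the head-to-head network S -> W <- R:
# R and S are independent parents, so P(R | S) = P(R).
P_R = 0.4
P_W_RS, P_W_nRS = 0.95, 0.90              # P(W | R, S), P(W | ~R, S)

P_W_given_S = P_W_RS * P_R + P_W_nRS * (1 - P_R)
print(round(P_W_given_S, 2))              # 0.92
```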

Diagnostic Inference
Diagnostic inference: if the grass is wet, what is the probability that the sprinkler is on?
P(S | W) = P(W | S) P(S) / P(W) = 0.92 × 0.2 / 0.52 = 0.35 > 0.2 = P(S), where P(W) = 0.52.
P(S | R, W) = 0.21 < 0.35 = P(S | W)
Explaining away: knowing that it has rained decreases the probability that the sprinkler is on.
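A check by enumerating the joint of the sprinkler/rain/wet-grass network; the entries P(W | R, ~S) = 0.90 and P(W | ~R, ~S) = 0.10 are assumed from the textbook figure (they reproduce P(W) = 0.52):

```python
# Diagnostic inference and explaining away: P(S | W) and P(S | R, W) by enumerating
# the joint P(S, R, W) = P(S) P(R) P(W | R, S).
from itertools import product

P_S, P_R = 0.2, 0.4
P_W = {(True, True): 0.95, (True, False): 0.90,     # P(W=wet | R, S)
       (False, True): 0.90, (False, False): 0.10}   # entries for ~S assumed from the text

def joint(s, r, w):
    p = (P_S if s else 1 - P_S) * (P_R if r else 1 - P_R)
    return p * (P_W[(r, s)] if w else 1 - P_W[(r, s)])

p_w = sum(joint(s, r, True) for s, r in product([True, False], repeat=2))
p_s_given_w = sum(joint(True, r, True) for r in [True, False]) / p_w
p_s_given_rw = joint(True, True, True) / sum(joint(s, True, True) for s in [True, False])

print(round(p_w, 2), round(p_s_given_w, 2), round(p_s_given_rw, 2))
# 0.52 0.35 0.21 -- observing rain "explains away" the wet grass, lowering P(S | ...)
```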

Diagnostic Inference (figure)

Exercise (textbook p. 417, exercise 3)
Calculate P(R | W), P(R | W, S), and P(R | W, ~S).

Bayesian Networks: Causes
Causal inference with cloudy as the parent of both sprinkler and rain:
P(W | C) = P(W | R, S, C) P(R, S | C) + P(W | ~R, S, C) P(~R, S | C) + P(W | R, ~S, C) P(R, ~S | C) + P(W | ~R, ~S, C) P(~R, ~S | C) = 0.76
using the fact that P(R, S | C) = P(R | C) P(S | C), i.e., R and S are conditionally independent given C.
Diagnostic inference: P(C | W) = ? (exercise)
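A numeric check of the 0.76; the conditional probability tables are assumed to be the same ones used in the earlier tail-to-tail and causal-inference slides:

```python
# P(W | C) by summing over the two hidden parents R and S, using the conditional
# independence P(R, S | C) = P(R | C) P(S | C).
from itertools import product

P_R_given_C, P_S_given_C = 0.8, 0.1                 # from the tail-to-tail slide (P(S|C) assumed)
P_W = {(True, True): 0.95, (True, False): 0.90,     # P(W | R, S), from the causal-inference slide
       (False, True): 0.90, (False, False): 0.10}

p_w_given_c = sum(
    P_W[(r, s)]
    * (P_R_given_C if r else 1 - P_R_given_C)
    * (P_S_given_C if s else 1 - P_S_given_C)
    for r, s in product([True, False], repeat=2)
)
print(round(p_w_given_c, 2))   # 0.76
```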

Causal Inference (figure)

Bayesian Networks
Belief propagation (Pearl, 1988): used for inference when the network is a tree.
Junction trees (Lauritzen and Spiegelhalter, 1988): convert a given directed acyclic graph to a tree on which belief propagation can then be run.

Bayesian Networks: Classification
Classification is diagnostic inference: given the input x, compute P(C | x).
The network has an arc from the class C to the input x; Bayes’ rule inverts the arc: P(C | x) = p(x | C) P(C) / p(x).