INTRODUCTION TO Machine Learning


Lecture Slides for INTRODUCTION TO Machine Learning
ETHEM ALPAYDIN © The MIT Press, 2004
alpaydin@boun.edu.tr
http://www.cmpe.boun.edu.tr/~ethem/i2ml

CHAPTER 3: Bayesian Decision Theory

Probability
Tossing a coin is a random process: we cannot predict at any toss whether the outcome will be heads or tails. That is why we toss coins, buy lottery tickets, or get insurance.

Probability and Inference
Result of tossing a coin is ∈ {Heads, Tails}; random variable X ∈ {1, 0}
Bernoulli: P{X = x} = p_o^x (1 − p_o)^(1 − x)
Sample: X = {x^t}, t = 1, ..., N
Estimation: p̂_o = #{Heads} / #{Tosses} = Σ_t x^t / N
Prediction of next toss: Heads if p̂_o > 1/2, Tails otherwise
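
As a concrete sketch of this estimation step (the data below are simulated, not from the slides), the estimate p̂_o is simply the fraction of heads in the sample:

import random

# Simulated coin with a hypothetical bias; 1 = Heads, 0 = Tails
random.seed(0)
true_p = 0.7
sample = [1 if random.random() < true_p else 0 for _ in range(1000)]

# Estimate: p_hat = (# of Heads) / (# of tosses)
p_hat = sum(sample) / len(sample)

# Predict the next toss: Heads if p_hat > 0.5, Tails otherwise
prediction = "Heads" if p_hat > 0.5 else "Tails"
print(f"p_hat = {p_hat:.3f}, predict: {prediction}")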

Classification
Credit scoring: inputs are income and savings; output is low-risk vs. high-risk.
Input: x = [x1, x2]^T, Output: C ∈ {0, 1}
Prediction: choose C = 1 if P(C = 1 | x1, x2) > 0.5, and C = 0 otherwise (i.e., choose the class with the higher posterior probability).

Bayes’ Rule
posterior = prior × likelihood / evidence:
P(C | x) = P(C) p(x | C) / p(x)

Prior probability P(C = 1): the probability that a customer is high-risk, before observing x; P(C = 0) + P(C = 1) = 1.
Class likelihood P(x | C): the conditional probability that an example belonging to class C has the associated observation value x.
Evidence P(x): the marginal probability that an observation x is seen, regardless of whether it is a positive or a negative example: P(x) = P(x | C = 1) P(C = 1) + P(x | C = 0) P(C = 0).
Posterior probability: posterior = prior × likelihood / evidence.
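
A minimal numeric sketch of this decomposition, with made-up prior and likelihood values (not from the slides), just to show how the posterior is assembled:

# Hypothetical values for a single observation x in the credit-scoring example
prior_C1 = 0.3           # P(C=1): prior probability of high-risk
prior_C0 = 1 - prior_C1  # P(C=0)
lik_C1 = 0.08            # p(x | C=1): class likelihood under the high-risk class
lik_C0 = 0.02            # p(x | C=0): class likelihood under the low-risk class

# Evidence: p(x) = p(x|C=1)P(C=1) + p(x|C=0)P(C=0)
evidence = lik_C1 * prior_C1 + lik_C0 * prior_C0

# Posterior = prior * likelihood / evidence
post_C1 = lik_C1 * prior_C1 / evidence
post_C0 = lik_C0 * prior_C0 / evidence
print(f"P(C=1|x) = {post_C1:.3f}, P(C=0|x) = {post_C0:.3f}")  # the two posteriors sum to 1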

Bayes’ Rule: K > 2 Classes
P(C_i | x) = p(x | C_i) P(C_i) / p(x) = p(x | C_i) P(C_i) / Σ_k p(x | C_k) P(C_k)
with P(C_i) ≥ 0 and Σ_i P(C_i) = 1

Bayes’ Classifier
Choose C_i if P(C_i | x) = max_k P(C_k | x), i.e., assign x to the class with the highest posterior probability.

Losses and Risks
Actions: α_i
Loss of taking action α_i when the true state is C_k: λ_ik
Expected risk (Duda and Hart, 1973): R(α_i | x) = Σ_k λ_ik P(C_k | x)
Choose the action with minimum risk: α_i such that R(α_i | x) = min_k R(α_k | x)

Losses and Risks: 0/1 Loss
λ_ik = 0 if i = k, and 1 if i ≠ k
R(α_i | x) = Σ_{k ≠ i} P(C_k | x) = 1 − P(C_i | x)
For minimum risk, choose the most probable class.

Losses and Risks: Reject
Add a reject action α_{K+1} with loss λ, 0 < λ < 1, so that R(α_{K+1} | x) = λ.
Choose C_i if P(C_i | x) > P(C_k | x) for all k ≠ i and P(C_i | x) > 1 − λ; reject otherwise.
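
A small sketch of the 0/1-loss decision rule with a reject option; the posterior values and the rejection cost below are made up for illustration:

def decide(posteriors, reject_cost):
    """Pick the class with the highest posterior, or reject.

    posteriors  -- list of P(C_k | x) for k = 1..K (must sum to 1)
    reject_cost -- lambda, the loss of rejecting, with 0 < lambda < 1
    """
    best_k = max(range(len(posteriors)), key=lambda k: posteriors[k])
    # Under 0/1 loss, the risk of choosing class k is 1 - P(C_k | x);
    # reject when even the best class has risk above the reject cost.
    if 1 - posteriors[best_k] > reject_cost:
        return "reject"
    return f"class {best_k + 1}"

# Hypothetical posteriors for two different inputs
print(decide([0.5, 0.3, 0.2], reject_cost=0.3))  # 1 - 0.5 = 0.5 > 0.3, so reject
print(decide([0.8, 0.1, 0.1], reject_cost=0.3))  # 1 - 0.8 = 0.2 <= 0.3, so class 1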

Discriminant Functions
Choose C_i if g_i(x) = max_k g_k(x), where g_i(x) may be set to −R(α_i | x), P(C_i | x), or p(x | C_i) P(C_i).
The discriminants divide the feature space into K decision regions R_1, ..., R_K, where R_i = {x | g_i(x) = max_k g_k(x)}.

K = 2 Classes
Dichotomizer (K = 2) vs. polychotomizer (K > 2)
g(x) = g_1(x) − g_2(x); choose C_1 if g(x) > 0, C_2 otherwise
Log odds: log [P(C_1 | x) / P(C_2 | x)]

Utility Theory
Probability of state k given the evidence x: P(S_k | x)
Utility of taking action α_i when the state is k: U_ik
Expected utility: EU(α_i | x) = Σ_k U_ik P(S_k | x); choose α_i if EU(α_i | x) = max_j EU(α_j | x)

Value of Information
Expected utility using x only: EU(x) = max_i Σ_k U_ik P(S_k | x)
Expected utility using x and a new feature z: EU(x, z) = max_i Σ_k U_ik P(S_k | x, z)
z is useful if EU(x, z) > EU(x)
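
A toy sketch of this comparison, with invented utilities and posteriors (not from the slides), showing how an extra feature z can raise the expected utility by sharpening the posterior:

def expected_utility(utilities, posteriors):
    """EU of the best action: max_i sum_k U_ik * P(S_k | evidence)."""
    return max(
        sum(u * p for u, p in zip(row, posteriors))
        for row in utilities
    )

# Hypothetical utility matrix U_ik: rows = actions, columns = states
U = [[10, -5],   # action 1
     [0, 0]]     # action 2 (do nothing)

# Posterior over the two states given x alone, and given x and z
p_given_x = [0.5, 0.5]
p_given_xz = [0.8, 0.2]   # the extra feature z sharpens the posterior

eu_x = expected_utility(U, p_given_x)     # max(2.5, 0) = 2.5
eu_xz = expected_utility(U, p_given_xz)   # max(7.0, 0) = 7.0
print(f"EU(x) = {eu_x}, EU(x,z) = {eu_xz}, z useful: {eu_xz > eu_x}")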

Bayesian Networks
Also called belief networks or probabilistic networks, they are graphical models for representing the interaction between variables visually. A Bayesian network is composed of nodes and arcs between nodes. Each node corresponds to a random variable X and has a value corresponding to the probability of that random variable, P(X).

Bayesian Networks
If there is a directed arc from node X to node Y, this indicates that X has a direct influence on Y. This influence is specified by the conditional probability P(Y | X). The network is a directed acyclic graph (DAG); that is, there are no cycles.

Bayesian Networks
Also known as graphical models or probabilistic networks. Nodes are hypotheses (random variables), and the probability attached to a node corresponds to our belief in the truth of the hypothesis. Arcs are direct influences between hypotheses. The structure is represented as a directed acyclic graph (DAG); the parameters are the conditional probabilities on the arcs (Pearl, 1988, 2000; Jensen, 1996; Lauritzen, 1996).

Causes and Bayes’ Rule
Diagnostic inference: knowing that the grass is wet, what is the probability that rain is the cause? Bayes’ rule turns the causal probability P(W | R) into the diagnostic probability:
P(R | W) = P(W | R) P(R) / P(W) = P(W | R) P(R) / [P(W | R) P(R) + P(W | ~R) P(~R)]

Causal vs. Diagnostic Inference
Causal inference: if the sprinkler is on, what is the probability that the grass is wet?
P(W | S) = P(W | R, S) P(R | S) + P(W | ~R, S) P(~R | S)
         = P(W | R, S) P(R) + P(W | ~R, S) P(~R)
         = 0.95 × 0.4 + 0.90 × 0.6 = 0.92

Diagnostic inference: if the grass is wet, what is the probability that the sprinkler is on? P(S | W) = ?

P(S | W) = P(W | S) P(S) / P(W) = 0.92 × 0.2 / 0.52 ≈ 0.35
where
P(W) = P(W | R, S) P(R, S) + P(W | ~R, S) P(~R, S) + P(W | R, ~S) P(R, ~S) + P(W | ~R, ~S) P(~R, ~S)
     = P(W | R, S) P(R) P(S) + P(W | ~R, S) P(~R) P(S) + P(W | R, ~S) P(R) P(~S) + P(W | ~R, ~S) P(~R) P(~S)
     = 0.52

P(S | R, W) = P(W | R, S) P(S | R) / P(W | R) = P(W | R, S) P(S) / P(W | R) ≈ 0.21 < P(S | W) ≈ 0.35
Explaining away: given that we know it rained, the probability that the sprinkler caused the wet grass decreases. Knowing that the grass is wet, rain and sprinkler become dependent.
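
The sketch below reproduces these numbers by brute-force enumeration over the rain/sprinkler network. The entries P(W | R, ~S) = 0.90 and P(W | ~R, ~S) = 0.10 are not stated explicitly in the slides; they are assumed values consistent with P(W) = 0.52 and P(S | R, W) ≈ 0.21 above:

from itertools import product

# Priors and conditional probability table for the sprinkler network
P_R, P_S = 0.4, 0.2
P_W = {(True, True): 0.95,    # P(W | R,  S)
       (False, True): 0.90,   # P(W | ~R, S)
       (True, False): 0.90,   # P(W | R, ~S), assumed (consistent with P(W) = 0.52)
       (False, False): 0.10}  # P(W | ~R,~S), assumed

def joint(r, s, w):
    """P(R=r, S=s, W=w) = P(R) P(S) P(W | R, S); R and S are independent a priori."""
    pr = P_R if r else 1 - P_R
    ps = P_S if s else 1 - P_S
    pw = P_W[(r, s)] if w else 1 - P_W[(r, s)]
    return pr * ps * pw

def prob(query, given=lambda r, s, w: True):
    """P(query | given) by summing the joint over all configurations."""
    num = sum(joint(r, s, w) for r, s, w in product([True, False], repeat=3)
              if given(r, s, w) and query(r, s, w))
    den = sum(joint(r, s, w) for r, s, w in product([True, False], repeat=3)
              if given(r, s, w))
    return num / den

print(prob(lambda r, s, w: w, lambda r, s, w: s))        # P(W | S)    = 0.92
print(prob(lambda r, s, w: s, lambda r, s, w: w))        # P(S | W)    ~ 0.35
print(prob(lambda r, s, w: s, lambda r, s, w: r and w))  # P(S | R, W) ~ 0.21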

Bayesian Networks: Causes
Causal inference:
P(W | C) = P(W | R, S) P(R, S | C) + P(W | ~R, S) P(~R, S | C) + P(W | R, ~S) P(R, ~S | C) + P(W | ~R, ~S) P(~R, ~S | C)
using the fact that P(R, S | C) = P(R | C) P(S | C).
Diagnostic inference: P(C | W) = ?

Bayesian Nets: Local Structure
P(F | C) = ?

Bayesian Networks: Inference
P(C, S, R, W, F) = P(C) P(S | C) P(R | C) P(W | R, S) P(F | R)
P(C, F) = Σ_S Σ_R Σ_W P(C, S, R, W, F)
P(F | C) = P(C, F) / P(C)
Summing out the joint this way is not efficient; use belief propagation (Pearl, 1988) or junction trees (Lauritzen and Spiegelhalter, 1988).
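
A sketch of this sum-out computation on the five-variable network, using the factorization above. The P(W | R, S) entries reuse the earlier sprinkler numbers; the values for P(C), P(S | C), P(R | C), and P(F | R) are invented for illustration only:

from itertools import product

# Hypothetical CPTs (values invented; P_W reuses the sprinkler example)
P_C = 0.5
P_S = {True: 0.1, False: 0.5}                 # P(S | C)
P_R = {True: 0.8, False: 0.1}                 # P(R | C)
P_W = {(True, True): 0.95, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.10}   # P(W | R, S)
P_F = {True: 0.7, False: 0.05}                # P(F | R), hypothetical

def joint(c, s, r, w, f):
    """P(C,S,R,W,F) = P(C) P(S|C) P(R|C) P(W|R,S) P(F|R)."""
    p = P_C if c else 1 - P_C
    p *= P_S[c] if s else 1 - P_S[c]
    p *= P_R[c] if r else 1 - P_R[c]
    p *= P_W[(r, s)] if w else 1 - P_W[(r, s)]
    p *= P_F[r] if f else 1 - P_F[r]
    return p

# P(C=true, F=true): sum the joint over S, R, W
p_cf = sum(joint(True, s, r, w, True)
           for s, r, w in product([True, False], repeat=3))

# P(F=true | C=true) = P(C, F) / P(C)
print(f"P(F|C) = {p_cf / P_C:.3f}")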

Bayesian Networks: Classification
Bayes’ rule inverts the arc from class to observation, giving the diagnostic probability P(C | x) = P(C) p(x | C) / p(x).

Naive Bayes’ Classifier
Given C, the inputs x_j are independent:
p(x | C) = p(x_1 | C) p(x_2 | C) ... p(x_d | C)
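
A minimal sketch of a naive Bayes classifier for binary features, using the conditional independence assumption above; the tiny dataset and the add-one smoothing are illustrative choices, not part of the slides:

def train_naive_bayes(X, y):
    """Estimate P(C) and P(x_j = 1 | C) from binary data with add-one smoothing."""
    classes = set(y)
    n, d = len(X), len(X[0])
    prior = {c: sum(1 for label in y if label == c) / n for c in classes}
    cond = {c: [(sum(x[j] for x, label in zip(X, y) if label == c) + 1)
                / (sum(1 for label in y if label == c) + 2)
                for j in range(d)]
            for c in classes}
    return prior, cond

def predict(x, prior, cond):
    """Pick the class maximizing P(C) * prod_j p(x_j | C)."""
    scores = {}
    for c in prior:
        p = prior[c]
        for j, xj in enumerate(x):
            p *= cond[c][j] if xj == 1 else 1 - cond[c][j]
        scores[c] = p
    return max(scores, key=scores.get)

# Tiny made-up dataset: two binary features, labels 0/1
X = [[1, 1], [1, 0], [0, 1], [0, 0], [1, 1], [0, 0]]
y = [1, 1, 0, 0, 1, 0]
prior, cond = train_naive_bayes(X, y)
print(predict([1, 1], prior, cond), predict([0, 0], prior, cond))  # expects 1 and 0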

Influence Diagrams
In addition to chance nodes, an influence diagram contains decision nodes and utility nodes.

Association Rules
Association rule: X → Y
Support(X → Y): P(X, Y), the fraction of transactions that contain both X and Y.
Confidence(X → Y): P(Y | X) = P(X, Y) / P(X).
Apriori algorithm (Agrawal et al., 1996)
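
A small sketch computing support and confidence for one candidate rule over a hypothetical list of transactions (the basket contents are invented for illustration):

def support_confidence(transactions, x, y):
    """Support and confidence of the rule X -> Y over a list of item sets."""
    n = len(transactions)
    n_x = sum(1 for t in transactions if x in t)
    n_xy = sum(1 for t in transactions if x in t and y in t)
    support = n_xy / n       # P(X, Y)
    confidence = n_xy / n_x  # P(Y | X)
    return support, confidence

# Hypothetical market-basket data
baskets = [{"milk", "bread"}, {"milk", "diapers", "beer"},
           {"bread", "diapers"}, {"milk", "bread", "diapers"}]
print(support_confidence(baskets, "milk", "bread"))  # (0.5, 0.666...)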