Machine Learning – Dr. Alper Özpınar

CHAPTER 2: Supervised Learning

CHAPTER 4: Parametric Methods

Parametric Estimation
Sample X = {x^t}_{t=1..N} where x^t ~ p(x).
Parametric estimation: assume a form for p(x|θ) and estimate θ, its sufficient statistics, using X.
E.g., p(x) = N(μ, σ²) where θ = {μ, σ²}.

Maximum Likelihood Estimation
Likelihood of θ given the sample X: l(θ|X) = p(X|θ) = ∏_t p(x^t|θ)
Log likelihood: L(θ|X) = log l(θ|X) = ∑_t log p(x^t|θ)
Maximum likelihood estimator (MLE): θ* = argmax_θ L(θ|X)
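
A small sketch of the definition itself: evaluate the log likelihood L(θ|X) on a grid of candidate values and take the maximizer. The Bernoulli sample and its true parameter below are invented purely for illustration; the closed-form solution appears on the next slide.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.binomial(1, 0.6, size=200)          # sample X = {x^t}, assumed true p = 0.6

theta = np.linspace(0.001, 0.999, 999)      # candidate parameter values
# L(theta|X) = sum_t log p(x^t | theta) for a Bernoulli density
log_lik = x.sum() * np.log(theta) + (len(x) - x.sum()) * np.log(1 - theta)

theta_star = theta[np.argmax(log_lik)]      # argmax_theta L(theta|X)
print(f"grid MLE: {theta_star:.3f}, closed form: {x.mean():.3f}")
```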

Examples: Bernoulli/Multinomial
Bernoulli: two states, failure/success, x ∈ {0, 1}
P(x) = p_o^x (1 − p_o)^(1−x)
L(p_o|X) = log ∏_t p_o^{x^t} (1 − p_o)^{(1 − x^t)}
MLE: p̂_o = ∑_t x^t / N
Multinomial: K > 2 states, x_i ∈ {0, 1}
P(x_1, x_2, ..., x_K) = ∏_i p_i^{x_i}
L(p_1, p_2, ..., p_K|X) = log ∏_t ∏_i p_i^{x_i^t}
MLE: p̂_i = ∑_t x_i^t / N
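
A minimal NumPy sketch of these closed-form MLEs; the synthetic samples and their true parameters are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Bernoulli sample: N trials with an assumed true parameter 0.7
x = rng.binomial(1, 0.7, size=1000)
p_hat = x.sum() / len(x)                    # MLE: sum_t x^t / N
print(f"Bernoulli MLE p_o: {p_hat:.3f}")

# Multinomial sample: each x^t is a one-hot vector over K = 3 states
X = rng.multinomial(1, [0.2, 0.5, 0.3], size=1000)
p_hat_multi = X.sum(axis=0) / X.shape[0]    # MLE: sum_t x_i^t / N per state
print("Multinomial MLE p_i:", np.round(p_hat_multi, 3))
```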

Gaussian (Normal) Distribution
p(x) = N(μ, σ²), i.e. p(x) = (1/√(2πσ²)) exp(−(x − μ)² / (2σ²))
MLE for μ and σ²:
m = ∑_t x^t / N
s² = ∑_t (x^t − m)² / N
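
A short sketch of these two estimates on simulated data; the sample size and the true mean and variance are assumed.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=5.0, scale=2.0, size=1000)   # sample from N(5, 2^2)

m = x.mean()                       # MLE of the mean: sum_t x^t / N
s2 = np.mean((x - m) ** 2)         # MLE of the variance (divides by N, not N-1)
print(f"m = {m:.3f}, s^2 = {s2:.3f}")
```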

Bias and Variance
Unknown parameter θ; estimator d_i = d(X_i) on sample X_i
Bias: b_θ(d) = E[d] − θ
Variance: E[(d − E[d])²]
Mean square error: r(d, θ) = E[(d − θ)²] = (E[d] − θ)² + E[(d − E[d])²] = Bias² + Variance
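
A hedged simulation sketch of the decomposition, using the ML variance estimator s² (which is biased) as the estimator d; the true variance, sample size, and number of trials are invented numbers.

```python
import numpy as np

rng = np.random.default_rng(3)
true_var = 4.0                                  # theta: the assumed true variance
N, trials = 10, 100_000                         # small N makes the bias visible

# Draw many samples X_i and compute d_i = s^2 on each one
samples = rng.normal(0.0, np.sqrt(true_var), size=(trials, N))
d = np.mean((samples - samples.mean(axis=1, keepdims=True)) ** 2, axis=1)

bias = d.mean() - true_var                      # E[d] - theta (about -theta/N here)
variance = d.var()                              # E[(d - E[d])^2]
mse = np.mean((d - true_var) ** 2)              # E[(d - theta)^2]
print(f"bias^2 + variance = {bias**2 + variance:.3f}, MSE = {mse:.3f}")
```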

Bayes’ Estimator
Treat θ as a random variable with prior p(θ).
Bayes’ rule: p(θ|X) = p(X|θ) p(θ) / p(X)
Full: p(x|X) = ∫ p(x|θ) p(θ|X) dθ
Maximum a posteriori (MAP): θ_MAP = argmax_θ p(θ|X)
Maximum likelihood (ML): θ_ML = argmax_θ p(X|θ)
Bayes’ estimator: θ_Bayes = E[θ|X] = ∫ θ p(θ|X) dθ
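
A sketch comparing the three estimators for a Bernoulli parameter with a Beta prior (the conjugate case, where the posterior is Beta(a + Σx, b + N − Σx)). The prior hyperparameters and the sample are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.binomial(1, 0.7, size=20)               # small Bernoulli sample
n_heads, N = x.sum(), len(x)

a, b = 2.0, 2.0                                 # assumed Beta(a, b) prior on theta
a_post, b_post = a + n_heads, b + N - n_heads   # posterior is Beta(a_post, b_post)

theta_ml = n_heads / N                                    # argmax_theta p(X|theta)
theta_map = (a_post - 1) / (a_post + b_post - 2)          # posterior mode
theta_bayes = a_post / (a_post + b_post)                  # posterior mean E[theta|X]
print(f"ML={theta_ml:.3f}  MAP={theta_map:.3f}  Bayes={theta_bayes:.3f}")
```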

Probability and Inference
Result of tossing a coin is ∈ {Heads, Tails}; random variable X ∈ {1, 0}
Bernoulli: P{X = 1} = p_o, so P(x) = p_o^x (1 − p_o)^(1−x)
Sample: X = {x^t}_{t=1..N}
Estimation: p̂_o = #{Heads} / #{Tosses} = ∑_t x^t / N
Prediction of next toss: Heads if p̂_o > ½, Tails otherwise

Classification
Credit scoring: inputs are income and savings; output is low-risk vs. high-risk.
Input: x = [x_1, x_2]^T, output: C ∈ {0, 1}
Prediction: choose C = 1 if P(C = 1 | x_1, x_2) > 0.5, else choose C = 0
(equivalently, choose the class with the larger posterior probability)

Bayes’ Rule
posterior = prior × likelihood / evidence:
P(C|x) = P(C) p(x|C) / p(x)
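
A small worked sketch of the rule for the two-class credit-scoring setting; every probability below is a made-up number chosen only to show the arithmetic.

```python
# Assumed numbers: prior P(C=1) and class likelihoods p(x|C) at one observed x
p_c1 = 0.3                    # prior: 30% of applicants are high-risk
px_given_c1 = 0.08            # likelihood of the observed (income, savings) under C=1
px_given_c0 = 0.02            # likelihood of the same x under C=0

evidence = px_given_c1 * p_c1 + px_given_c0 * (1 - p_c1)    # p(x)
posterior_c1 = px_given_c1 * p_c1 / evidence                # P(C=1|x)
print(f"P(C=1|x) = {posterior_c1:.3f}")                     # ~0.632 -> classify as high-risk
```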

Bayes’ Rule: K > 2 Classes
P(C_i|x) = p(x|C_i) P(C_i) / p(x) = p(x|C_i) P(C_i) / ∑_k p(x|C_k) P(C_k)
with P(C_i) ≥ 0 and ∑_i P(C_i) = 1; choose C_i if P(C_i|x) = max_k P(C_k|x)

Losses and Risks
Actions: α_i. Loss of α_i when the state is C_k: λ_ik
Expected risk (Duda and Hart, 1973): R(α_i|x) = ∑_k λ_ik P(C_k|x)
Choose α_i if R(α_i|x) = min_k R(α_k|x)

Losses and Risks: 0/1 Loss
λ_ik = 0 if i = k, 1 if i ≠ k, so R(α_i|x) = ∑_{k≠i} P(C_k|x) = 1 − P(C_i|x)
For minimum risk, choose the most probable class.

Losses and Risks: Reject
Add a reject action α_{K+1} with loss λ, 0 < λ < 1: R(α_{K+1}|x) = λ, while R(α_i|x) = 1 − P(C_i|x) as before.
Choose C_i if P(C_i|x) > P(C_k|x) for all k ≠ i and P(C_i|x) > 1 − λ; reject otherwise.
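
A minimal sketch of risk-based decisions with a reject option; the posterior values and the reject loss λ below are assumed numbers.

```python
import numpy as np

posterior = np.array([0.55, 0.30, 0.15])    # assumed P(C_k|x) for K = 3 classes
reject_loss = 0.2                           # lambda: cost of rejecting (assumed)

# 0/1 loss: risk of choosing class i is 1 - P(C_i|x); risk of rejecting is lambda
risks = 1.0 - posterior
best = int(np.argmin(risks))

if risks[best] < reject_loss:
    print(f"choose class {best} (risk {risks[best]:.2f})")
else:
    print(f"reject (best class risk {risks[best]:.2f} >= lambda {reject_loss})")
```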

Different Losses and Reject
Figure (not reproduced in the transcript): decision thresholds for three cases: equal losses, unequal losses, and with a reject option.

Discriminant Functions
Choose C_i if g_i(x) = max_k g_k(x), where g_i(x) can be −R(α_i|x), P(C_i|x), or p(x|C_i) P(C_i)
K decision regions R_1, ..., R_K: R_i = {x | g_i(x) = max_k g_k(x)}

K = 2 Classes
Dichotomizer (K = 2) vs. polychotomizer (K > 2)
g(x) = g_1(x) − g_2(x): choose C_1 if g(x) > 0, C_2 otherwise
Log odds: log [ P(C_1|x) / P(C_2|x) ]
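
A tiny sketch of a dichotomizer built from the log odds; the priors and the one-dimensional Gaussian class likelihoods are assumptions chosen for illustration.

```python
import numpy as np
from scipy.stats import norm

# Assumed class-conditional densities and priors
prior = {1: 0.4, 2: 0.6}
lik = {1: norm(loc=0.0, scale=1.0), 2: norm(loc=2.0, scale=1.0)}

def log_odds(x):
    """g(x) = log P(C1|x) - log P(C2|x); the evidence p(x) cancels."""
    return (np.log(lik[1].pdf(x) * prior[1])
            - np.log(lik[2].pdf(x) * prior[2]))

for x in [-1.0, 1.0, 3.0]:
    g = log_odds(x)
    print(f"x={x:+.1f}  g(x)={g:+.2f}  ->  C{1 if g > 0 else 2}")
```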

Utility Theory
Probability of state k given evidence x: P(S_k|x)
Utility of α_i when the state is k: U_ik
Expected utility: EU(α_i|x) = ∑_k U_ik P(S_k|x); choose α_i if EU(α_i|x) = max_j EU(α_j|x)
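
A small sketch of picking the action with the highest expected utility; the utility matrix U_ik and the posterior probabilities below are invented numbers.

```python
import numpy as np

# Rows: actions (grant loan, deny loan); columns: states (repays, defaults) -- assumed
U = np.array([[ 1.0, -5.0],     # grant: small profit if repaid, large loss if default
              [ 0.0,  0.0]])    # deny: nothing gained or lost
posterior = np.array([0.8, 0.2])          # assumed P(S_k | x)

eu = U @ posterior                        # EU(alpha_i | x) = sum_k U_ik P(S_k|x)
best = int(np.argmax(eu))
print(f"expected utilities: {eu}, choose action {best}")   # here: deny, despite 80% repay
```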

Association Rules
Association rule: X → Y
People who buy/click/visit/enjoy X are also likely to buy/click/visit/enjoy Y.
A rule implies association, not necessarily causation.

Association measures
Support(X → Y): P(X, Y) = #{customers who bought X and Y} / #{customers}
Confidence(X → Y): P(Y|X) = P(X, Y) / P(X)
Lift(X → Y): P(X, Y) / (P(X) P(Y)) = P(Y|X) / P(Y)
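
A short sketch computing the three measures from a toy transaction list; the baskets below are invented.

```python
# Toy transactions (assumed); each is a set of purchased items
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread"},
    {"milk", "butter"},
    {"milk", "bread"},
]
n = len(transactions)

def p(items):
    """Fraction of transactions containing all the given items."""
    return sum(items <= t for t in transactions) / n

X, Y = {"milk"}, {"bread"}
support = p(X | Y)                       # P(X, Y)
confidence = support / p(X)              # P(Y|X)
lift = confidence / p(Y)                 # P(Y|X) / P(Y)
print(f"support={support:.2f}  confidence={confidence:.2f}  lift={lift:.2f}")
```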

Example

Apriori algorithm (Agrawal et al., 1996)
For (X, Y, Z), a 3-item set, to be frequent (have enough support), (X, Y), (X, Z), and (Y, Z) must all be frequent.
If (X, Y) is not frequent, none of its supersets can be frequent.
Once we find the frequent k-item sets, we convert them to rules: X, Y → Z, ... and X → Y, Z, ...
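
A compact sketch of the frequent-itemset phase of Apriori, reusing the same toy baskets as the previous sketch (repeated here so the snippet runs on its own); the minimum support threshold is assumed.

```python
from itertools import combinations

transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread"},
    {"milk", "butter"},
    {"milk", "bread"},
]
min_support = 0.4   # assumed threshold
n = len(transactions)

def support(itemset):
    return sum(itemset <= t for t in transactions) / n

# Level 1: frequent single items
items = sorted({i for t in transactions for i in t})
frequent = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
k = 1
while frequent:
    print(f"frequent {k}-item sets:", [sorted(s) for s in frequent])
    # Generate (k+1)-candidates whose k-subsets are all frequent, then filter by support
    candidates = {a | b for a in frequent for b in frequent if len(a | b) == k + 1}
    candidates = {c for c in candidates
                  if all(frozenset(sub) in frequent for sub in combinations(c, k))}
    frequent = [c for c in candidates if support(c) >= min_support]
    k += 1
```

Rule generation would then split each frequent itemset into antecedent and consequent and keep the splits whose confidence is high enough, as described on the slide.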