Machine Learning - Dr. Alper Özpınar
CHAPTER 2: Supervised Learning
CHAPTER 4: Parametric Methods
Parametric Estimation
Sample X = {x^t}_t where x^t ~ p(x)
Parametric estimation: assume a form for p(x|θ) and estimate θ, its sufficient statistics, using X
e.g., p(x) = N(μ, σ²) where θ = {μ, σ²}
Maximum Likelihood Estimation
Likelihood of θ given the sample X: l(θ|X) = p(X|θ) = ∏_t p(x^t|θ)
Log likelihood: L(θ|X) = log l(θ|X) = ∑_t log p(x^t|θ)
Maximum likelihood estimator (MLE): θ* = argmax_θ L(θ|X)
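A minimal Python sketch of this recipe (not from the slides): sum the log densities over the sample and take the argmax over a grid of candidate parameter values, using an exponential density p(x|λ) = λ e^(−λx) as the assumed model.

```python
import math
import random

# L(theta|X) = sum_t log p(x^t|theta); MLE = argmax over candidate thetas.
# Assumed example model: exponential density p(x|lam) = lam * exp(-lam * x).

def log_likelihood(sample, lam):
    return sum(math.log(lam) - lam * x for x in sample)

random.seed(0)
true_lam = 2.0
sample = [random.expovariate(true_lam) for _ in range(1000)]

# Crude grid search over candidate parameter values.
candidates = [0.1 * k for k in range(1, 100)]
mle = max(candidates, key=lambda lam: log_likelihood(sample, lam))

print("grid-search MLE:", mle)                        # should land near 2.0
print("closed-form MLE:", len(sample) / sum(sample))  # 1 / sample mean
```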
Examples: Bernoulli/Multinomial
Bernoulli: two states, failure/success, x ∈ {0, 1}
P(x) = p₀^x (1 − p₀)^(1−x)
L(p₀|X) = log ∏_t p₀^(x^t) (1 − p₀)^(1−x^t)
MLE: p̂₀ = ∑_t x^t / N
Multinomial: K > 2 states, x_i ∈ {0, 1}
P(x₁, x₂, ..., x_K) = ∏_i p_i^(x_i)
L(p₁, p₂, ..., p_K|X) = log ∏_t ∏_i p_i^(x_i^t)
MLE: p̂_i = ∑_t x_i^t / N
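Both MLEs reduce to relative frequencies; a short Python sketch with made-up data:

```python
from collections import Counter

# Bernoulli MLE: p0_hat = sum_t x^t / N  (illustrative sample, not from the slides)
x = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
p0_hat = sum(x) / len(x)
print("Bernoulli p0_hat =", p0_hat)   # 0.7

# Multinomial MLE: p_i_hat = sum_t x_i^t / N, i.e. the relative frequency of each state
states = ["a", "c", "a", "b", "a", "c", "a", "b", "a", "a"]
counts = Counter(states)
p_hat = {s: c / len(states) for s, c in counts.items()}
print("Multinomial p_hat =", p_hat)
```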
Gaussian (Normal) Distribution
p(x) = N(μ, σ²)
MLE for μ and σ²:
m = ∑_t x^t / N
s² = ∑_t (x^t − m)² / N
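A short Python sketch of these two estimates on a made-up sample (note the MLE of the variance divides by N, not N − 1):

```python
import math

# Gaussian MLE: m = sum_t x^t / N, s^2 = sum_t (x^t - m)^2 / N  (illustrative sample)
x = [2.1, 1.9, 2.4, 2.0, 1.8, 2.2, 2.3, 1.7]
N = len(x)
m = sum(x) / N
s2 = sum((xt - m) ** 2 for xt in x) / N
print("m =", m, " s^2 =", s2, " s =", math.sqrt(s2))
```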
Bias and Variance
Unknown parameter θ
Estimator d_i = d(X_i) on sample X_i
Bias: b_θ(d) = E[d] − θ
Variance: E[(d − E[d])²]
Mean square error: r(d, θ) = E[(d − θ)²] = (E[d] − θ)² + E[(d − E[d])²] = Bias² + Variance
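A Monte Carlo sketch of these quantities (setup assumed, not from the slides): the variance MLE s² = ∑(x − m)²/N from the Gaussian slide is a biased estimator of σ², and its simulated MSE matches Bias² + Variance.

```python
import random

# Simulate many samples, compute the variance MLE on each, then estimate
# bias, variance, and MSE of that estimator.
random.seed(0)
mu, sigma2, N, trials = 0.0, 4.0, 10, 20000

estimates = []
for _ in range(trials):
    sample = [random.gauss(mu, sigma2 ** 0.5) for _ in range(N)]
    m = sum(sample) / N
    estimates.append(sum((x - m) ** 2 for x in sample) / N)

e_d = sum(estimates) / trials
bias = e_d - sigma2                                          # E[d] - theta
variance = sum((d - e_d) ** 2 for d in estimates) / trials   # E[(d - E[d])^2]
mse = sum((d - sigma2) ** 2 for d in estimates) / trials     # E[(d - theta)^2]
print("bias =", bias)                        # roughly -sigma2/N = -0.4
print("bias^2 + variance =", bias ** 2 + variance)
print("MSE =", mse)                          # approximately bias^2 + variance
```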
Bayes' Estimator
Treat θ as a random variable with prior p(θ)
Bayes' rule: p(θ|X) = p(X|θ) p(θ) / p(X)
Full: p(x|X) = ∫ p(x|θ) p(θ|X) dθ
Maximum a Posteriori (MAP): θ_MAP = argmax_θ p(θ|X)
Maximum Likelihood (ML): θ_ML = argmax_θ p(X|θ)
Bayes' estimator: θ_Bayes = E[θ|X] = ∫ θ p(θ|X) dθ
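A sketch comparing the three estimators for a Bernoulli parameter, assuming a conjugate Beta(a, b) prior (the prior choice and the closed-form posterior are assumptions, not given on the slide):

```python
# With a Beta(a, b) prior and Bernoulli data, the posterior is Beta(a + s, b + N - s),
# so ML, MAP, and the Bayes' estimate all have closed forms.
x = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
N, s = len(x), sum(x)
a, b = 2.0, 2.0                                   # prior pseudo-counts (assumption)

theta_ml = s / N                                  # argmax_theta p(X|theta)
theta_map = (a + s - 1) / (a + b + N - 2)         # argmax_theta p(theta|X), posterior mode
theta_bayes = (a + s) / (a + b + N)               # E[theta|X], posterior mean

print(theta_ml, theta_map, theta_bayes)
```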
Probability and Inference
Result of tossing a coin is in {Heads, Tails}
Random variable X ∈ {1, 0}
Bernoulli: P(X = x) = p₀^x (1 − p₀)^(1−x)
Sample: X = {x^t}, t = 1, ..., N
Estimation: p̂₀ = #{Heads} / #{Tosses} = ∑_t x^t / N
Prediction of next toss: Heads if p̂₀ > ½, Tails otherwise
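The same estimate-then-predict rule in a few lines of Python (made-up tosses):

```python
# Estimate p0 from observed tosses, then predict the next toss.
tosses = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]     # 1 = Heads, 0 = Tails (illustrative sample)
p0 = sum(tosses) / len(tosses)              # #{Heads} / #{Tosses}
prediction = "Heads" if p0 > 0.5 else "Tails"
print(p0, prediction)
```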
Classification
Credit scoring: inputs are income and savings, output is low-risk vs. high-risk
Input: x = [x₁, x₂]^T, Output: C ∈ {0, 1}
Prediction: choose C = 1 if P(C = 1|x₁, x₂) > 0.5, and C = 0 otherwise
(equivalently, choose the class with the higher posterior probability P(C|x))
Bayes' Rule
posterior = (likelihood × prior) / evidence
P(C|x) = p(x|C) P(C) / p(x)
Bayes' Rule: K > 2 Classes
P(C_i|x) = p(x|C_i) P(C_i) / p(x) = p(x|C_i) P(C_i) / ∑_k p(x|C_k) P(C_k)
with P(C_i) ≥ 0 and ∑_i P(C_i) = 1
Choose C_i if P(C_i|x) = max_k P(C_k|x)
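A small Python sketch of this computation with made-up priors and likelihood values:

```python
# Posteriors for K > 2 classes from assumed priors P(C_i) and likelihoods p(x|C_i)
# evaluated at one particular x (numbers are illustrative).
priors = [0.5, 0.3, 0.2]          # P(C_i), sums to 1
likelihoods = [0.10, 0.40, 0.25]  # p(x|C_i)

evidence = sum(p * l for p, l in zip(priors, likelihoods))            # p(x)
posteriors = [p * l / evidence for p, l in zip(priors, likelihoods)]  # P(C_i|x)

best = max(range(len(posteriors)), key=lambda i: posteriors[i])
print(posteriors, "-> choose C%d" % (best + 1))
```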
Losses and Risks
Actions: α_i
Loss of α_i when the state is C_k: λ_ik
Expected risk (Duda and Hart, 1973): R(α_i|x) = ∑_k λ_ik P(C_k|x)
Choose α_i if R(α_i|x) = min_k R(α_k|x)
Losses and Risks: 0/1 Loss
λ_ik = 0 if i = k, 1 if i ≠ k
R(α_i|x) = ∑_{k≠i} P(C_k|x) = 1 − P(C_i|x)
For minimum risk, choose the most probable class
Losses and Risks: Reject
Add a reject action α_{K+1}: λ_ik = 0 if i = k, λ if i = K+1 (reject), 1 otherwise, with 0 < λ < 1
R(α_{K+1}|x) = λ and R(α_i|x) = 1 − P(C_i|x)
Choose C_i if P(C_i|x) > P(C_k|x) for all k ≠ i and P(C_i|x) > 1 − λ; reject otherwise
Different Losses and Reject
Decision regions and boundaries under equal losses, unequal losses, and with a reject option (figure)
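A Python sketch of the risk-based decision rule with an unequal loss matrix and a reject action of fixed loss λ; all numbers are illustrative:

```python
# Expected risk R(alpha_i|x) = sum_k lambda_ik P(C_k|x); pick the minimum-risk action,
# where rejecting costs a constant lambda.
posteriors = [0.55, 0.30, 0.15]        # P(C_k|x) for K = 3 classes

# Loss matrix lambda_ik: rows = actions alpha_1..alpha_3, columns = true class C_1..C_3.
# Unequal losses: mistaking C_3 for C_1 is made more expensive here.
losses = [
    [0.0, 1.0, 5.0],
    [1.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]
reject_loss = 0.3                      # lambda for the reject action, 0 < lambda < 1

risks = [sum(l * p for l, p in zip(row, posteriors)) for row in losses]
risks.append(reject_loss)              # risk of rejecting is constant

actions = ["choose C1", "choose C2", "choose C3", "reject"]
best = min(range(len(risks)), key=lambda i: risks[i])
print(list(zip(actions, risks)), "->", actions[best])
```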
Discriminant Functions
g_i(x), i = 1, ..., K; choose C_i if g_i(x) = max_k g_k(x)
g_i(x) can be set to −R(α_i|x), P(C_i|x), or p(x|C_i) P(C_i)
K decision regions R_1, ..., R_K, where R_i = {x | g_i(x) = max_k g_k(x)}
K = 2 Classes
Dichotomizer (K = 2) vs. polychotomizer (K > 2)
g(x) = g₁(x) − g₂(x): choose C₁ if g(x) > 0, C₂ otherwise
Log odds: log [P(C₁|x) / P(C₂|x)]
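A few lines of Python for the log-odds dichotomizer (posterior value made up):

```python
import math

# g(x) = log [P(C1|x) / P(C2|x)]; choose C1 if g(x) > 0.
p_c1_given_x = 0.7
p_c2_given_x = 1.0 - p_c1_given_x

g = math.log(p_c1_given_x / p_c2_given_x)
print("log odds =", g, "-> choose", "C1" if g > 0 else "C2")
```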
Utility Theory
Probability of state k given evidence x: P(S_k|x)
Utility of α_i when the state is S_k: U_ik
Expected utility: EU(α_i|x) = ∑_k U_ik P(S_k|x)
Choose α_i if EU(α_i|x) = max_j EU(α_j|x)
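A sketch of the expected-utility rule with made-up state probabilities and utilities (e.g., a grant-vs-refuse credit decision):

```python
# EU(alpha_i|x) = sum_k U_ik P(S_k|x); choose the action with maximum expected utility.
p_state = [0.6, 0.4]            # P(S_k|x)
utilities = [                   # U_ik: rows = actions, columns = states
    [100.0, -50.0],             # e.g. grant credit: gain if S_1, loss if S_2
    [0.0, 0.0],                 # e.g. refuse credit: no gain, no loss
]

eu = [sum(u * p for u, p in zip(row, p_state)) for row in utilities]
best = max(range(len(eu)), key=lambda i: eu[i])
print(eu, "-> action", best + 1)
```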
Association Rules
Association rule: X → Y
People who buy/click/visit/enjoy X are also likely to buy/click/visit/enjoy Y.
A rule implies association, not necessarily causation.
Association Measures
Support(X → Y): P(X, Y) = #{customers who bought X and Y} / #{customers}
Confidence(X → Y): P(Y|X) = P(X, Y) / P(X) = #{customers who bought X and Y} / #{customers who bought X}
Lift(X → Y): P(X, Y) / (P(X) P(Y)) = P(Y|X) / P(Y)
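A Python sketch computing the three measures for the rule {milk} → {bread} over a small made-up set of baskets:

```python
# Support, confidence, and lift for X -> Y estimated as relative frequencies.
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread"},
    {"milk", "butter"},
    {"milk", "bread"},
]
X, Y = {"milk"}, {"bread"}
n = len(baskets)

p_x = sum(X <= b for b in baskets) / n
p_y = sum(Y <= b for b in baskets) / n
p_xy = sum((X | Y) <= b for b in baskets) / n

support = p_xy                 # P(X, Y)
confidence = p_xy / p_x        # P(Y|X)
lift = p_xy / (p_x * p_y)      # P(Y|X) / P(Y)
print(support, confidence, lift)
```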
Example
Apriori Algorithm (Agrawal et al., 1996)
For (X, Y, Z), a 3-item set, to be frequent (have enough support), (X, Y), (X, Z), and (Y, Z) must all be frequent.
If (X, Y) is not frequent, none of its supersets can be frequent.
Once we find the frequent k-item sets, we convert them to rules: X, Y → Z, ... and X → Y, Z, ...
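A rough Python sketch of this level-wise idea (candidate generation, subset pruning, support check); it is a simplified illustration, not the full algorithm from the paper:

```python
from itertools import combinations

# Grow candidate item sets level by level, keep only those with enough support,
# and prune any candidate that has an infrequent (k-1)-subset.
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
    {"milk", "butter"},
]
min_support = 0.6
n = len(baskets)

def support(itemset):
    return sum(itemset <= b for b in baskets) / n

items = sorted({i for b in baskets for i in b})
frequent = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
k = 2
while frequent:
    print(f"frequent {k - 1}-item sets:", [sorted(s) for s in frequent])
    prev = set(frequent)
    # Candidate k-item sets: unions of frequent (k-1)-item sets ...
    candidates = {a | b for a in prev for b in prev if len(a | b) == k}
    # ... pruned so every (k-1)-subset is itself frequent, then support-checked.
    frequent = [c for c in candidates
                if all(frozenset(s) in prev for s in combinations(c, k - 1))
                and support(c) >= min_support]
    k += 1
```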