ECE 471/571 – Pattern Recognition Lecture 14 – Gradient Descent

Presentation transcript:

ECE 471/571 – Pattern Recognition
Lecture 14 – Gradient Descent
Hairong Qi, Gonzalez Family Professor
Electrical Engineering and Computer Science
University of Tennessee, Knoxville
http://www.eecs.utk.edu/faculty/qi
Email: hqi@utk.edu

Pattern Classification
- Statistical Approach
  - Supervised
    - Basic concepts: Bayesian decision rule (MPP, LR, discriminant functions)
    - Parameter estimation (ML, BL)
    - Non-parametric learning (kNN)
    - LDF (Perceptron)
    - NN (BP)
    - Support Vector Machine
    - Deep Learning (DL)
  - Unsupervised
    - Basic concepts: distance
    - Agglomerative method
    - k-means
    - Winner-takes-all
    - Kohonen maps
    - Mean-shift
- Non-Statistical Approach
  - Decision tree
  - Syntactic approach
- Dimensionality Reduction: FLD, PCA
- Performance Evaluation: ROC curve (TP, TN, FN, FP), cross validation
- Stochastic Methods: local optimization (GD), global optimization (SA, GA)
- Classifier Fusion: majority voting, NB, BKS

Review - Bayes Decision Rule
- Maximum posterior probability (MPP): for a given x, if P(w1|x) > P(w2|x), then x belongs to class 1, otherwise class 2.
- Likelihood ratio: if p(x|w1)/p(x|w2) > P(w2)/P(w1), then x belongs to class 1, otherwise class 2.
- Discriminant function
  - Case 1: minimum Euclidean distance (linear machine), Σi = σ²I
  - Case 2: minimum Mahalanobis distance (linear machine), Σi = Σ
  - Case 3: quadratic classifier, Σi arbitrary
- Non-parametric kNN: for a given x, if k1/k > k2/k, then x belongs to class 1, otherwise class 2.
- Dimensionality reduction
- Parameter estimation: estimate the Gaussian (maximum likelihood estimation, MLE); two-modal Gaussian
- Performance evaluation: ROC curve
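For reference, the standard forms of these three discriminant functions (following the usual Gaussian derivation, with class mean \mu_i, covariance \Sigma_i, and prior P(w_i); class-independent constants dropped) can be written as:

g_i(x) = -\frac{\|x - \mu_i\|^2}{2\sigma^2} + \ln P(w_i) \quad (Case 1: \Sigma_i = \sigma^2 I)
g_i(x) = -\frac{1}{2}(x - \mu_i)^T \Sigma^{-1} (x - \mu_i) + \ln P(w_i) \quad (Case 2: \Sigma_i = \Sigma)
g_i(x) = -\frac{1}{2}(x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) - \frac{1}{2}\ln|\Sigma_i| + \ln P(w_i) \quad (Case 3: arbitrary \Sigma_i)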

General Approach to Learning
- Optimization methods
  - Newton's method
  - Gradient descent
  - Exhaustive search through the solution space
- Objective functions
  - Maximum a-posteriori probability
  - Maximum likelihood estimate
  - Fisher's linear discriminant
  - Principal component analysis
  - k-nearest neighbor
  - Perceptron

General Approach to Learning (cont'd)
- Specify a model (objective function) and estimate its parameters.
- Use optimization methods to find the parameters:
  - Set the 1st derivative to 0 (worked example below)
  - Gradient descent
  - Exhaustive search through the solution space
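As a quick illustration of the "1st derivative = 0" route (a standard worked example, not from the slides): fitting a single parameter \theta to samples x_1, ..., x_N under the squared-error objective J(\theta) = \sum_{n=1}^{N}(x_n - \theta)^2 gives

\frac{dJ}{d\theta} = -2\sum_{n=1}^{N}(x_n - \theta) = 0 \quad\Rightarrow\quad \theta^* = \frac{1}{N}\sum_{n=1}^{N} x_n,

i.e., the sample mean. When the stationary point has no closed form, gradient descent or exhaustive search takes over.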

Newton-Raphson Method
Used to find solutions to equations, i.e., to find x such that f(x) = 0.

Newton-Raphson Method vs. Gradient Descent

Example functions (plotted on the slide): f(x) = x^2 - 5x - 4 and f(x) = x cos x.

Newton-Raphson method
- Used to find solutions to equations: find x such that f(x) = 0.
- The approach:
  - Step 1: select an initial x^0.
  - Step 2: x^{k+1} = x^k - \frac{f(x^k)}{f'(x^k)}
  - Step 3: if |x^{k+1} - x^k| < ε, stop; otherwise set x^k = x^{k+1} and go back to Step 2.

Gradient descent
- Used to find optima, i.e., solutions to f'(x) = 0.
- Find x* such that f(x*) < f(x) for x near x* (a local minimum).
- The approach:
  - Step 1: select an initial x^0.
  - Step 2: x^{k+1} = x^k - \frac{f'(x^k)}{f''(x^k)} = x^k - c f'(x^k)
  - Step 3: if |x^{k+1} - x^k| < ε, stop; otherwise set x^k = x^{k+1} and go back to Step 2.
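A minimal MATLAB sketch of the Newton-Raphson iteration above (the starting point and tolerance are illustrative choices, not from the slides):

% Newton-Raphson iteration for f(x) = 0 (illustrative sketch)
f  = @(x) x.^2 - 5*x - 4;        % example function from the slide
df = @(x) 2*x - 5;               % its first derivative
x = 6;                           % Step 1: initial guess (assumed)
epsilon = 1e-6;                  % stopping tolerance (assumed)
for k = 1:100
    xnew = x - f(x) / df(x);     % Step 2: Newton-Raphson update
    if abs(xnew - x) < epsilon   % Step 3: stop when the step is small
        break;
    end
    x = xnew;
end
fprintf('root: x = %.6f, f(x) = %.2e\n', xnew, f(xnew));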

On the Learning Rate
The learning rate c controls the step size: if c is too small, convergence is slow; if c is too large, the iterates can overshoot the minimum or fail to converge (try c = 0.1, 0.2, 1 in the demo code below).

Geometric Interpretation
(Figure: a curve with its tangent line at x0, where the gradient of the tangent is 2; the descent step moves from x0 to x1.)
Source: http://www.teacherschoice.com.au/Maths_Library/Calculus/tangents_and_normals.htm
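In symbols (a sketch using the notation from the previous slides): the tangent to f at x^k is

y = f(x^k) + f'(x^k)(x - x^k),

and the update x^{k+1} = x^k - c f'(x^k) steps against the slope of this tangent: where f'(x^k) > 0 the iterate moves left, where f'(x^k) < 0 it moves right, so for sufficiently small c each step moves downhill.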

% [x] = gd(x0, c, epsilon)
% - Demonstrate gradient descent
% - x0: the initial x
% - c: the learning rate
% - epsilon: controls the accuracy of the solution
% - x: an array that tracks x in each iteration
%
% Note: try c = 0.1, 0.2, 1; epsilon = 0.01, 0.1
function [x] = gd(x0, c, epsilon)
% plot the objective curve
clf;
fx = -10:0.1:10;
[fy, dfy] = g(fx);
plot(fx, fy);
hold on;

% find the minimum by gradient descent
x(1) = x0;
finish = 0;
i = 1;
while ~finish
    [y, dy] = g(x(i));
    plot(x(i), y, 'r*');
    pause                        % press a key to take the next step
    x(i+1) = x(i) - c * dy;      % gradient descent update
    if abs(x(i+1) - x(i)) < epsilon
        finish = 1;
    end
    i = i + 1;
end
x(i)                             % display the final estimate

% re-plot all trial points on the curve
[y, dy] = g(x);
for i = 1:length(x)
    plot(x(i), y(i), 'r*');
end
end

% objective function and its derivative
function [y, dy] = g(x)
y = 50*sin(x) + x .* x;
dy = 50*cos(x) + 2*x;
end
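To run the demo (a usage sketch; x0 = 9 is an arbitrary starting point, while the c and epsilon values follow the header comment):

x = gd(9, 0.1, 0.01);   % press a key after each plotted iterate to take the next step

Trying a larger rate, e.g. x = gd(9, 1, 0.01);, illustrates the overshooting behavior discussed under "On the Learning Rate".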