CS479/679 Pattern Recognition
Final Exam Review
Dr. George Bebis
2018-11-19
Final Exam Material
- Midterm exam material
- Feature Selection
- Linear Discriminant Functions
- Support Vector Machines
- Expectation-Maximization Algorithm
Case studies are also included in the final exam.
Feature Selection
What is the goal of feature selection?
- Select features having high discrimination power while ignoring, or paying less attention to, the rest.
What are the main steps in feature selection?
- Search the space of possible feature subsets.
- Evaluate candidate subsets and pick one that is optimal or near-optimal with respect to a certain criterion.
Feature Selection
What are the main search and evaluation strategies?
- Search strategies: optimal, heuristic, randomized.
- Evaluation strategies: filter, wrapper.
What is the main difference between filter and wrapper methods?
- In filter methods, evaluation is independent of the classification algorithm.
- In wrapper methods, evaluation depends on the classification algorithm (see the sketches below).
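For concreteness, here is a minimal sketch of a filter-style criterion: each feature is ranked by a Fisher-style score (between-class separation over within-class scatter), with no classifier involved. The two-class setup, the toy data, and the fisher_score name are illustrative assumptions, not material from the slides.

```python
import numpy as np

def fisher_score(X, y):
    """Filter criterion: rank each feature by the Fisher ratio,
    independently of any classification algorithm."""
    X0, X1 = X[y == 0], X[y == 1]
    num = (X0.mean(axis=0) - X1.mean(axis=0)) ** 2   # between-class separation
    den = X0.var(axis=0) + X1.var(axis=0) + 1e-12    # within-class scatter
    return num / den

# Toy data: feature 0 is discriminative, feature 1 is pure noise.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 1, (50, 2)),
               rng.normal([3, 0], 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
print(fisher_score(X, y))   # feature 0 scores much higher than feature 1
```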
Feature Selection
You need to be familiar with:
- Exhaustive and naive search
- Sequential Forward/Backward Selection (SFS/SBS)
- Plus-L Minus-R Selection
- Bidirectional Search
- Sequential Floating Selection (SFFS and SFBS)
- Feature selection using GAs
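As a wrapper-style counterpart, the following is a minimal sketch of Sequential Forward Selection, assuming scikit-learn is available; the choice of logistic regression and 5-fold cross-validated accuracy as the evaluation criterion is an assumption for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def sfs(X, y, k):
    """Sequential Forward Selection (wrapper): greedily add the feature
    that most improves cross-validated accuracy of the classifier."""
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < k:
        scores = [(cross_val_score(LogisticRegression(max_iter=1000),
                                   X[:, selected + [j]], y, cv=5).mean(), j)
                  for j in remaining]
        best_score, best_j = max(scores)   # best candidate feature this round
        selected.append(best_j)
        remaining.remove(best_j)
    return selected

# Toy usage on a standard dataset:
from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)
print(sfs(X, y, k=2))   # indices of the two features selected greedily
```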
Linear Discriminant Functions
General form of a linear discriminant: g(x) = wᵀx + w0
What is the form of the decision boundary?
- The decision boundary g(x) = 0 is a hyperplane.
What is the meaning of w and w0?
- The orientation and location of the hyperplane are determined by w and w0, respectively.
Linear Discriminant Functions
What is the geometric interpretation of g(x)?
- g(x) gives the signed distance of x from the decision boundary (hyperplane): the distance is r = g(x)/||w||, and the sign of g(x) indicates which side of the hyperplane x lies on.
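A tiny numeric sketch of this interpretation; the weight values are made up for illustration.

```python
import numpy as np

w = np.array([2.0, 1.0])   # hyperplane orientation (hypothetical values)
w0 = -4.0                  # hyperplane location (bias term)

def g(x):
    return w @ x + w0      # linear discriminant g(x) = w^T x + w0

x = np.array([3.0, 2.0])
r = g(x) / np.linalg.norm(w)   # signed distance of x from the hyperplane
print(g(x), r)                 # positive: x lies on the positive side
```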
Linear Discriminant Functions
How do we estimate w and w0?
- Apply learning using a set of labeled training examples.
What is the effect of each training example?
- It places a constraint on the solution.
[Figure: each training example restricts the feasible region, shown both in solution space (a1, a2) and in feature space (y1, y2).]
Linear Discriminant Functions
Iterative optimization: what is the main idea?
- Minimize some error function J(α) iteratively:
  α(k+1) = α(k) + η(k) p(k)
  where p(k) is the search direction and η(k) is the learning rate.
Linear Discriminant Functions
- Gradient descent: α(k+1) = α(k) − η(k) ∇J(α(k))
- Newton's method: α(k+1) = α(k) − H⁻¹ ∇J(α(k)), where H is the Hessian of J
- Perceptron rule: α(k+1) = α(k) + η(k) Σ y, summed over the currently misclassified samples y (a sketch follows below).
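A minimal sketch of the batch perceptron rule, assuming the usual "normalization" trick in which augmented samples from the second class are negated, so that a correct weight vector a satisfies aᵀy > 0 for every row; the constant learning rate and the toy data are illustrative choices.

```python
import numpy as np

def perceptron(Y, eta=1.0, max_iter=1000):
    """Batch perceptron on normalized augmented samples Y:
    each row is [1, x] for class 1 and -[1, x] for class 2."""
    a = np.zeros(Y.shape[1])
    for _ in range(max_iter):
        misclassified = Y[Y @ a <= 0]        # rows violating a^T y > 0
        if len(misclassified) == 0:
            return a                          # all samples classified correctly
        a = a + eta * misclassified.sum(axis=0)  # perceptron update
    return a

# Toy usage: augmented samples [1, x], class-2 rows negated.
Y = np.array([[ 1,  2.0,  1.0], [ 1,  1.5,  2.0],    # class 1
              [-1, -0.5, -1.0], [-1, -1.0, -0.2]])   # class 2 (negated)
print(perceptron(Y))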
Support Vector Machines
What is the capacity of a classifier? What is the VC dimension of a classifier?
What is structural risk minimization?
- Find solutions that (1) minimize the empirical risk and (2) have low VC dimension.
- It can be shown that, with probability (1 − δ):
  R(α) ≤ R_emp(α) + sqrt( (h(ln(2N/h) + 1) − ln(δ/4)) / N )
  where h is the VC dimension and N is the number of training examples.
Support Vector Machines
What is the margin of separation? How is it defined?
- The distance between the separating hyperplane and the closest training samples on either side; those closest samples are the support vectors.
What is the relationship between VC dimension and margin of separation?
- The VC dimension is minimized by maximizing the margin of separation.
Support Vector Machines
What is the criterion being optimized by SVMs?
- Maximize the margin 2/||w||, i.e., minimize J(w) = (1/2)||w||² subject to y_i(wᵀx_i + w0) ≥ 1 for every training sample.
Support Vector Machines
The SVM solution depends only on the support vectors:
  g(x) = Σ_i λ_i y_i (x_iᵀx) + w0, where the sum runs over the support vectors (the samples with λ_i > 0).
Soft margin classifier: tolerate "outliers" by introducing slack variables, penalized through a parameter C (see the example below).
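A short soft-margin example, assuming scikit-learn; the data and the value of C are illustrative. It shows that the fitted model is described entirely by its support vectors.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 1, (30, 2)), rng.normal(+1, 1, (30, 2))])
y = np.array([-1] * 30 + [+1] * 30)

# C controls the soft margin: small C tolerates more "outliers"
# (wider margin, more slack); large C approaches the hard margin.
clf = SVC(kernel='linear', C=1.0).fit(X, y)
print(clf.support_vectors_.shape)  # the solution is defined by these alone
print(clf.coef_, clf.intercept_)   # w and w0 recovered from support vectors
```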
Support Vector Machines
Non-linear SVM: what is the main idea?
- Map the data to a high-dimensional space (of dimensionality h) through a transformation Φ, where it is more likely to be linearly separable.
Support Vector Machines
What is the kernel trick?
- Compute the dot products in the transformed space directly from the original space using a kernel function, e.g., the polynomial kernel K(x, y) = (x · y)^d (see the demo below).
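A quick numeric check of the kernel trick for the degree-2 polynomial kernel: the explicit feature map phi below is the standard one for 2-D inputs, written out only to show that the kernel reproduces its dot product without ever computing phi.

```python
import numpy as np

def phi(x):
    """Explicit degree-2 polynomial feature map for 2-D input:
    phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)."""
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

x = np.array([1.0, 2.0])
y = np.array([3.0, 0.5])

explicit = phi(x) @ phi(y)     # dot product in the transformed space
kernel   = (x @ y) ** 2        # polynomial kernel K(x, y) = (x . y)^2
print(explicit, kernel)        # identical: the mapping is never needed
```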
Support Vector Machines
Comments about SVMs:
- SVM training is an exact (convex) optimization problem, so there are no local optima.
- Its complexity depends on the number of support vectors, not on the dimensionality of the transformed space.
- Performance depends on the choice of the kernel and its parameters.
Expectation-Maximization (EM)
What is the EM algorithm?
- An iterative method for performing ML estimation, i.e., maximizing p(D | θ).
When is EM useful?
- Most useful for problems where the data is incomplete or can be thought of as incomplete.
Expectation-Maximization (EM)
What are the steps of the EM algorithm?
- Initialization: pick an initial estimate θ^0.
- Expectation step: compute Q(θ; θ^t), the expectation of the complete-data log-likelihood given the observed data and θ^t.
- Maximization step: θ^(t+1) = argmax_θ Q(θ; θ^t).
- Test for convergence: stop when ||θ^(t+1) − θ^t|| < ε, or when the likelihood stops improving (see the skeleton below).
Convergence properties of EM?
- The likelihood never decreases from one iteration to the next.
- The solution depends on the initial estimate θ^0.
- There is no guarantee of finding the global maximum, but the method is stable.
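A schematic sketch of the EM loop matching these four steps; e_step, m_step, and log_lik are hypothetical callbacks a user would supply for a concrete model, not part of the course material.

```python
def em(theta0, e_step, m_step, log_lik, tol=1e-6, max_iter=200):
    """Generic EM driver: initialize, E-step, M-step, test for convergence."""
    theta, prev = theta0, -float('inf')
    for _ in range(max_iter):
        q = e_step(theta)      # expected complete-data log-likelihood terms
        theta = m_step(q)      # re-estimate parameters by maximizing them
        ll = log_lik(theta)
        if ll - prev < tol:    # the likelihood never decreases under EM
            break
        prev = ll
    return theta
```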
Expectation-Maximization (EM)
What is a Mixture of Gaussians (MoG)?
- A density of the form p(x) = Σ_j π_j N(x; μ_j, Σ_j), where the mixing weights π_j sum to 1.
How are the MoG parameters estimated?
- Using the EM algorithm.
How is EM used to estimate the MoG parameters?
- Introduce "hidden variables" indicating which component generated each sample; the observed data is then treated as incomplete.
Expectation-Maximization (EM)
Can you interpret the EM steps for MoGs?
- E-step: compute the responsibilities γ_ij = π_j N(x_i; μ_j, Σ_j) / Σ_k π_k N(x_i; μ_k, Σ_k), the posterior probability that component j generated x_i.
- M-step: re-estimate the parameters using the responsibilities as weights:
  π_j = (1/N) Σ_i γ_ij
  μ_j = Σ_i γ_ij x_i / Σ_i γ_ij
  Σ_j = Σ_i γ_ij (x_i − μ_j)(x_i − μ_j)ᵀ / Σ_i γ_ij
(a runnable sketch follows below).
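A minimal, self-contained EM implementation for a 1-D mixture of Gaussians, instantiating the E- and M-steps above; the toy data and the initialization scheme are illustrative assumptions.

```python
import numpy as np

def gauss(x, mu, var):
    """Univariate Gaussian density N(x; mu, var)."""
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def em_mog_1d(x, K=2, iters=100):
    """EM for a 1-D mixture of K Gaussians."""
    rng = np.random.default_rng(0)
    n = len(x)
    pi = np.full(K, 1.0 / K)               # mixing weights (theta^0)
    mu = rng.choice(x, K, replace=False)   # illustrative initialization
    var = np.full(K, x.var())
    for _ in range(iters):
        # E-step: responsibilities gamma[i, j] = P(component j | x_i)
        dens = pi * gauss(x[:, None], mu, var)          # shape (n, K)
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted ML re-estimates of pi, mu, var
        Nj = gamma.sum(axis=0)
        pi = Nj / n
        mu = (gamma * x[:, None]).sum(axis=0) / Nj
        var = (gamma * (x[:, None] - mu) ** 2).sum(axis=0) / Nj
    return pi, mu, var

# Toy usage: two well-separated components.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0, 1, 200), rng.normal(5, 1, 200)])
print(em_mog_1d(x))   # recovers weights near 0.5 and means near 0 and 5
```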