1 CS668: Pattern Recognition Ch 1: Introduction
Daniel Barbará

2 Patterns
Searching for patterns in data is a fundamental problem with a successful history. Many patterns have led to laws: for example, astronomical observations led to the laws of planetary motion, and patterns in atomic spectra led to quantum physics. Pattern recognition is the discovery of regularities through computer algorithms and the use of those regularities to make decisions (e.g., to classify data into categories).

3 Example Handwritten Digit Recognition

4 How can you do it?
One approach: develop a series of rules or heuristics describing the shapes of the digits. This is naïve and brittle.
A machine learning approach: characterize each digit as a vector of features x and discover a function y(x) that maps the feature vector to a category in {c1, c2, …, ck}. This is called supervised learning (the 'teacher' is the training set).
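
As an illustration of the idea of learning y(x) from labeled data, here is a minimal sketch (not the course's code; the nearest-centroid rule and all names below are assumptions of this example):

    import numpy as np

    def fit_centroids(X, t):
        """Compute one mean feature vector (centroid) per class label."""
        classes = np.unique(t)
        centroids = np.array([X[t == c].mean(axis=0) for c in classes])
        return classes, centroids

    def y(x, classes, centroids):
        """Map a feature vector x to the category of the nearest centroid."""
        distances = np.linalg.norm(centroids - x, axis=1)
        return classes[np.argmin(distances)]

    # Toy usage: 2-D feature vectors with labels 0 and 1 acting as the 'teacher'.
    X_train = np.array([[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]])
    t_train = np.array([0, 0, 1, 1])
    classes, centroids = fit_centroids(X_train, t_train)
    print(y(np.array([0.85, 0.95]), classes, centroids))  # prints 1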

5 Other pattern recognition problems
Unsupervised learning (clustering): discover previously unknown groups in the data.
Density estimation: discover the distribution of the data.
Prediction: like classification, but with real-valued outputs.
Reinforcement learning: find suitable actions to take in a given situation in order to maximize a reward.

6 Polynomial Curve Fitting

7 Sum-of-Squares Error Function
Minimizing an objective function: the error function.
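
For reference (the slide's equations appear only as images), the polynomial model and the sum-of-squares error standardly used with this example are

    y(x, w) = \sum_{j=0}^{M} w_j x^j,    E(w) = \frac{1}{2} \sum_{n=1}^{N} \{ y(x_n, w) - t_n \}^2.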

8 Choosing the order of the polynomial: 0th Order

9 1st Order Polynomial

10 3rd Order Polynomial

11 9th Order Polynomial

12 Observations
The 9th-order polynomial achieves zero error on the training data (is this the best?), but the fit shows lots of oscillations. How about predicting the future? This is OVERFITTING: the model is likely to do poorly on future data.

13 Over-fitting Root-Mean-Square (RMS) Error:
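
The RMS error referred to here (its standard definition; the slide shows it as an image) is

    E_{RMS} = \sqrt{ 2 E(w^\star) / N },

where dividing by N lets data sets of different sizes be compared on an equal footing and the square root puts the error on the same scale as the target t.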

14 What is going on? The data was generated from a smooth underlying function plus noise (in Bishop's running example, sin(2πx)).
A power-series expansion (e.g., a Taylor series) of that function contains terms of all orders, so we should expect improvement as we increase M. What gives?

15 Polynomial Coefficients

16 What is going on? Larger values of M yield coefficients that are increasingly tuned to the noise. Paying too much attention to the training data is not a good thing! The severity of this problem varies with the size of the training set.

17 Data Set Size: 9th Order Polynomial

18 Data Set Size: 9th Order Polynomial

19 Regularization Penalize large coefficient values
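
The regularized error function (standard form; the slide's equation is an image) adds a penalty on the coefficient magnitudes:

    \tilde{E}(w) = \frac{1}{2} \sum_{n=1}^{N} \{ y(x_n, w) - t_n \}^2 + \frac{\lambda}{2} \lVert w \rVert^2,

where λ controls the relative importance of the penalty term.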

20 Regularization:

21 Regularization:

22 Regularization: vs.
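
To make the effect of λ concrete, here is a minimal numerical sketch (illustrative only, not from the slides): the regularized sum-of-squares error has the closed-form minimizer w = (Φ^T Φ + λI)^{-1} Φ^T t, where Φ is the polynomial design matrix.

    import numpy as np

    def fit_polynomial(x, t, M, lam=0.0):
        """Minimize the regularized sum-of-squares error in closed form."""
        Phi = np.vander(x, M + 1, increasing=True)   # columns are x^0, x^1, ..., x^M
        return np.linalg.solve(Phi.T @ Phi + lam * np.eye(M + 1), Phi.T @ t)

    # Toy data: noisy samples of a smooth curve (sin(2*pi*x), as in the running example).
    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 1.0, 10)
    t = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(x.size)

    print(np.round(fit_polynomial(x, t, M=9), 1))            # unregularized: huge coefficients
    print(np.round(fit_polynomial(x, t, M=9, lam=1e-3), 1))  # regularized: coefficients shrink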

23 Polynomial Coefficients

24 Classification
Build a machine that can do: fingerprint identification, OCR (optical character recognition), DNA sequence identification.

25 An Example: “Sorting incoming fish on a conveyor according to species using optical sensing.” The two species are sea bass and salmon.

26 Problem Analysis
Set up a camera and take some sample images from which to extract features: length, lightness, width, number and shape of fins, position of the mouth, etc. This is the set of all suggested features to explore for use in our classifier!

27 Preprocessing
Use a segmentation operation to isolate the fish from one another and from the background. Information from a single fish is sent to a feature extractor, whose purpose is to reduce the data by measuring certain features. The features are then passed to a classifier.

28

29 Classification Select the length of the fish as a possible feature for discrimination

30

31 The length is a poor feature alone!
Select the lightness as a possible feature.

32

33 Task of decision theory
Threshold decision boundary and cost relationship: move the decision boundary toward smaller values of lightness in order to minimize the cost (i.e., reduce the number of sea bass that are classified as salmon!). This is the task of decision theory.

34 Adopt the lightness and add the width of the fish
Each fish is represented by a feature vector x^T = [x1, x2], where x1 is the lightness and x2 is the width.

35

36 We might add other features that are not correlated with the ones we already have. Care should be taken not to reduce performance by adding such “noisy features”. Ideally, the best decision boundary is the one that provides optimal performance, as in the following figure:

37

38 Issue of generalization!
However, our satisfaction is premature, because the central aim of designing a classifier is to correctly classify novel input. This is the issue of generalization!

39

40 Conclusion
The reader may feel overwhelmed by the number, complexity, and magnitude of the sub-problems of pattern recognition. Many of these sub-problems can indeed be solved, but many fascinating unsolved problems still remain.

41 Probability Theory Apples and Oranges

42 Probability Theory: marginal, joint, and conditional probability

43 Probability Theory Sum Rule Product Rule

44 The Rules of Probability
Sum Rule and Product Rule
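
Written out (the slide's equations survive only as images; these are the standard forms):

    sum rule:     p(X) = \sum_Y p(X, Y)
    product rule: p(X, Y) = p(Y | X) p(X)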

45 Bayes’ Theorem: posterior ∝ likelihood × prior
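
In symbols, using the sum and product rules above:

    p(Y | X) = \frac{p(X | Y) p(Y)}{p(X)},    where  p(X) = \sum_Y p(X | Y) p(Y).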

46 Probability Densities

47 Transformed Densities

48 Expectations Conditional Expectation (discrete)
Approximate Expectation (discrete and continuous)
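
Standard forms of these quantities (supplied here since the slide's formulas are images):

    E[f] = \sum_x p(x) f(x),    E_x[f | y] = \sum_x p(x | y) f(x),    E[f] \simeq \frac{1}{N} \sum_{n=1}^{N} f(x_n),

where the last expression approximates an expectation from a finite sample drawn from p(x).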

49 Variances and Covariances

50 The Gaussian Distribution
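
For reference, the univariate Gaussian density:

    \mathcal{N}(x | \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{(x - \mu)^2}{2\sigma^2} \right\}.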

51 Gaussian Mean and Variance

52 The Multivariate Gaussian

53 Gaussian Parameter Estimation
Likelihood function

54 Maximum (Log) Likelihood
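
The standard results behind this run of slides: for N i.i.d. Gaussian observations the log likelihood and its maximizers are

    \ln p(\mathbf{x} | \mu, \sigma^2) = -\frac{1}{2\sigma^2} \sum_{n=1}^{N} (x_n - \mu)^2 - \frac{N}{2} \ln \sigma^2 - \frac{N}{2} \ln(2\pi),
    \mu_{ML} = \frac{1}{N} \sum_{n=1}^{N} x_n,    \sigma^2_{ML} = \frac{1}{N} \sum_{n=1}^{N} (x_n - \mu_{ML})^2,

and the point of the next slide is that σ²_ML is biased: E[σ²_ML] = (N − 1)σ²/N.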

55 Properties of μ_ML and σ²_ML

56 Curve Fitting Re-visited

57 Maximum Likelihood
Determine w_ML by minimizing the sum-of-squares error.

58 Predictive Distribution

59 MAP: A Step towards Bayes
Determine w_MAP by minimizing the regularized sum-of-squares error.

60 Bayesian Curve Fitting

61 Bayesian Predictive Distribution

62 With all the detail… See MLE&Bayesian.pdf

63 Lessons
MLE: postulate a (parametric) distribution, form the log likelihood, and maximize it with respect to the parameters (use optimization techniques to find the optimal values).
Bayesian: postulate a prior and a likelihood distribution for the parameter (careful: use a conjugate prior so the functional form is preserved), then determine the distribution of the parameter(s) using Bayes' theorem. A toy sketch of both recipes follows below.
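
A toy sketch of the two recipes (illustrative only; the coin-flip data and the Beta(2, 2) prior are assumptions of this example, not content from the slides):

    import numpy as np

    # Observed coin flips (1 = heads), modeled with a Bernoulli(mu) likelihood.
    data = np.array([1, 0, 1, 1, 0, 1, 1, 1])
    N, heads = data.size, int(data.sum())

    # MLE recipe: form the log likelihood and maximize it with respect to mu.
    # For Bernoulli data the maximizer is available in closed form:
    mu_ML = heads / N

    # Bayesian recipe: postulate a conjugate Beta(a, b) prior for mu, so that
    # Bayes' theorem yields another Beta distribution as the posterior.
    a, b = 2.0, 2.0                              # prior pseudo-counts (assumed here)
    a_post, b_post = a + heads, b + (N - heads)  # posterior is Beta(a_post, b_post)
    mu_post_mean = a_post / (a_post + b_post)

    print(f"MLE estimate: {mu_ML:.3f}, posterior mean: {mu_post_mean:.3f}")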

64 Model Selection Cross-Validation
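
A minimal sketch of S-fold cross-validation (illustrative; "fit" and "error" stand for any training routine and error measure, and are not names from the slides):

    import numpy as np

    def cross_validation_error(X, t, fit, error, S=5, seed=0):
        """Train on S-1 folds, evaluate on the held-out fold, average over the S folds."""
        rng = np.random.default_rng(seed)
        folds = np.array_split(rng.permutation(len(X)), S)
        scores = []
        for i in range(S):
            held_out = folds[i]
            train = np.concatenate([folds[j] for j in range(S) if j != i])
            model = fit(X[train], t[train])
            scores.append(error(model, X[held_out], t[held_out]))
        return float(np.mean(scores))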

65 Curse of Dimensionality
Figure: the original problem and the grid approach.

66 Curse of Dimensionality

67 Volume
What fraction of the volume of a hypersphere is captured in a thin shell between r = 1 − ε and r = 1?
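
The answer follows from the volume of a D-dimensional sphere scaling as r^D:

    \frac{V_D(1) - V_D(1 - \epsilon)}{V_D(1)} = 1 - (1 - \epsilon)^D \to 1  as  D \to \infty,

so in high dimensions almost all of the volume is concentrated in a thin shell near the surface.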

68 Curse of Dimensionality
Examples: polynomial curve fitting with M = 3; Gaussian densities in higher dimensions.

69 Decision Theory
Inference step: determine either the posterior p(t | x) or the joint distribution p(x, t).
Decision step: for a given x, determine the optimal t.

70 Decision rule with only the prior information
Decide ω1 if P(ω1) > P(ω2), otherwise decide ω2.
Use of the class-conditional information: P(x | ω1) and P(x | ω2) describe the difference in lightness between the populations of sea bass and salmon.

71

72 Bayes Posterior, likelihood, evidence
P(ωj | x) = P(x | ωj) P(ωj) / P(x), where in the case of two categories the evidence is P(x) = P(x | ω1) P(ω1) + P(x | ω2) P(ω2).
Posterior = (Likelihood × Prior) / Evidence

73

74 Minimum Misclassification Rate

75 Decision given the posterior probabilities
x is an observation for which: if P(ω1 | x) > P(ω2 | x), the true state of nature is ω1; if P(ω1 | x) < P(ω2 | x), the true state of nature is ω2. Therefore, whenever we observe a particular x, the probability of error is: P(error | x) = P(ω1 | x) if we decide ω2, and P(error | x) = P(ω2 | x) if we decide ω1.

76 Minimizing the probability of error
Decide ω1 if P(ω1 | x) > P(ω2 | x); otherwise decide ω2.
Therefore P(error | x) = min[P(ω1 | x), P(ω2 | x)] (the Bayes decision). We cannot do better than this!

77 Minimum Expected Loss
Example: classify medical images as ‘cancer’ or ‘normal’. The loss matrix assigns a penalty to each combination of decision and truth.

78 Minimum Expected Loss
Decision regions are chosen to minimize the expected loss.
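
The quantity being minimized, in its standard form, where L_{kj} is the loss for deciding class C_j when the truth is C_k and R_j is the region assigned to C_j:

    E[L] = \sum_k \sum_j \int_{R_j} L_{kj} \, p(\mathbf{x}, C_k) \, d\mathbf{x}.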

79 Reject Option

80 Why Separate Inference and Decision?
Minimizing risk (the loss matrix may change over time); the reject option; unbalanced class priors; combining models.

81 Decision Theory for Regression
Inference step: determine the predictive distribution p(t | x).
Decision step: for a given x, make an optimal prediction y(x) for t.
Loss function: L(t, y(x)).

82 The Squared Loss Function
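
For squared loss the standard expressions are

    E[L] = \iint \{ y(\mathbf{x}) - t \}^2 \, p(\mathbf{x}, t) \, d\mathbf{x} \, dt,    y^\star(\mathbf{x}) = E_t[ t | \mathbf{x} ],

i.e., the optimal prediction under squared loss is the conditional mean of t given x.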

83 Generative vs Discriminative
Generative approach: model the class-conditional densities p(x | Ck) and the priors p(Ck), then use Bayes’ theorem to obtain the posterior p(Ck | x).
Discriminative approach: model the posterior p(Ck | x) directly.

84 Entropy
An important quantity in coding theory, statistical physics, and machine learning.

85 Entropy
Coding theory: x is discrete with 8 possible states; how many bits are needed to transmit the state of x if all states are equally likely?
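
Working out the answer from the definition of entropy:

    H[x] = -\sum_x p(x) \log_2 p(x) = -8 \times \tfrac{1}{8} \log_2 \tfrac{1}{8} = 3 bits.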

86 Entropy

87 Entropy
In how many ways can N identical objects be allocated among M bins? The multiplicity of such an allocation leads to the entropy.
Entropy is maximized when all bins are equally probable, i.e., p(xi) = 1/M.

88 Entropy

89 Differential Entropy
Put bins of width Δ along the real line and take the limit Δ → 0.
Differential entropy is maximized (for a fixed variance σ²) when p(x) is Gaussian, in which case H[x] = ½{1 + ln(2πσ²)}.

90 Conditional Entropy

91 The Kullback-Leibler Divergence
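
For reference, the standard definition:

    KL(p \| q) = -\int p(\mathbf{x}) \ln \frac{q(\mathbf{x})}{p(\mathbf{x})} \, d\mathbf{x} \ge 0,

with equality if and only if p = q; it measures the extra information needed when q is used to encode data generated by p.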

92 Mutual Information
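
The standard relations:

    I[\mathbf{x}, \mathbf{y}] = KL\big( p(\mathbf{x}, \mathbf{y}) \,\|\, p(\mathbf{x}) p(\mathbf{y}) \big) = H[\mathbf{x}] - H[\mathbf{x} | \mathbf{y}] = H[\mathbf{y}] - H[\mathbf{y} | \mathbf{x}].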

