Lecture 5: Statistical Methods for Classification CAP 5415: Computer Vision Fall 2006

Classifiers: The Swiss Army Tool of Vision A HUGE number of vision problems can be reduced to: Is this a _____ or not? The next two lectures will focus on making that decision. Classifiers that we will cover: Bayesian classification, logistic regression, boosting, support vector machines, nearest-neighbor classifiers.

Motivating Problem Which pixels in this image are “skin pixels”? Useful for tracking, finding people, finding images with too much skin.

How could you find skin pixels? Step 1: Get data. Label every pixel as skin or not skin.
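
As a concrete sketch of that step, assuming each training image comes as an H x W x 3 RGB array with a matching hand-labeled boolean mask (the collect_pixels helper and the tiny random example are illustrative, not from the original lecture):

```python
import numpy as np

def collect_pixels(image, skin_mask):
    """Split an H x W x 3 RGB image into skin and non-skin pixel arrays.

    image:     uint8 array of shape (H, W, 3)
    skin_mask: boolean array of shape (H, W), True where a pixel was
               hand-labeled as skin.
    Returns (skin_pixels, nonskin_pixels), each of shape (N, 3).
    """
    pixels = image.reshape(-1, 3)
    labels = skin_mask.reshape(-1)
    return pixels[labels], pixels[~labels]

# Tiny made-up example: a random "image" whose left half is labeled as skin.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(4, 6, 3), dtype=np.uint8)
mask = np.zeros((4, 6), dtype=bool)
mask[:, :3] = True
skin_px, nonskin_px = collect_pixels(image, mask)
print(skin_px.shape, nonskin_px.shape)   # (12, 3) (12, 3)
```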

Getting Probabilities Now that I have a bunch of examples, I can create probability distributions: P([r,g,b] | skin), the probability of an [r,g,b] tuple given that the pixel is skin, and P([r,g,b] | ~skin), the probability of an [r,g,b] tuple given that the pixel is not skin.
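
One common way to build those distributions is a normalized 3-D color histogram per class. The sketch below assumes 32 bins per channel and uses the hypothetical fit_color_histogram / likelihood names; the lecture does not specify these details:

```python
import numpy as np

def fit_color_histogram(pixels, bins=32):
    """Estimate P([r,g,b] | class) as a normalized 3-D color histogram.

    pixels: (N, 3) uint8 array of training pixels from one class.
    Returns a (bins, bins, bins) array that sums to 1.
    """
    idx = (pixels.astype(np.int64) * bins) // 256      # channel value -> bin index
    hist = np.zeros((bins, bins, bins))
    np.add.at(hist, (idx[:, 0], idx[:, 1], idx[:, 2]), 1)
    return hist / hist.sum()

def likelihood(hist, pixels, bins=32):
    """Look up P([r,g,b] | class) for each pixel under a fitted histogram."""
    idx = (pixels.astype(np.int64) * bins) // 256
    return hist[idx[:, 0], idx[:, 1], idx[:, 2]]

# Usage sketch, reusing the skin_px / nonskin_px arrays from the previous snippet:
# skin_hist = fit_color_histogram(skin_px)
# p_x_given_skin = likelihood(skin_hist, query_pixels)
```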

(From Jones and Rehg)

Using Bayes Rule Here x is the observation (the pixel's [r,g,b] value) and y is some underlying cause (skin / not skin). Bayes' rule relates them: P(y | x) = P(x | y) P(y) / P(x)

Using Bayes Rule In P(y | x) = P(x | y) P(y) / P(x), the term P(x | y) is the likelihood, P(y) is the prior, and P(x) is the normalizing constant.

Classification In this case P[skin | x] = 1 - P[~skin | x], so the classifier reduces to asking: is P[skin | x] > 0.5? We can change this to P[skin | x] > c and vary the threshold c.
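
A minimal sketch of that decision rule, assuming the two likelihoods come from class-conditional models like the histograms above and that the prior P(skin) is given (the function names and the default prior of 0.5 are illustrative):

```python
import numpy as np

def skin_posterior(p_x_given_skin, p_x_given_notskin, p_skin=0.5):
    """P(skin | x) from the two class likelihoods via Bayes' rule."""
    num = p_x_given_skin * p_skin
    den = num + p_x_given_notskin * (1.0 - p_skin)
    return num / np.maximum(den, 1e-12)   # avoid dividing by zero for unseen colors

def classify_skin(p_x_given_skin, p_x_given_notskin, c=0.5, p_skin=0.5):
    """Label a pixel as skin when P(skin | x) exceeds the threshold c."""
    return skin_posterior(p_x_given_skin, p_x_given_notskin, p_skin) > c

print(classify_skin(0.02, 0.001))   # posterior ~0.95, so True at c = 0.5
```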

The effect of varying c This is called a Receiver Operating Characteristic curve (or ROC curve). From Jones and Rehg
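
A sketch of how such a curve can be computed: sweep the threshold c and record, for each value, the fraction of true skin pixels detected and the fraction of non-skin pixels wrongly accepted. The function name and the 101-point threshold grid are assumptions, and both classes are assumed to appear in the labels:

```python
import numpy as np

def roc_points(posteriors, labels, thresholds=np.linspace(0.0, 1.0, 101)):
    """Detection and false-positive rates as the threshold c is swept.

    posteriors: (N,) array of P(skin | x) for each pixel.
    labels:     (N,) boolean array, True for pixels that truly are skin.
    Returns (false_positive_rates, detection_rates), one pair per threshold.
    """
    fpr, tpr = [], []
    for c in thresholds:
        predicted = posteriors > c
        tpr.append(np.mean(predicted[labels]))    # fraction of skin pixels caught
        fpr.append(np.mean(predicted[~labels]))   # fraction of non-skin accepted
    return np.array(fpr), np.array(tpr)
```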

Application: Finding Adult Pictures Let's say you needed to build a web filter for a library. You could look at a few simple measurements based on the skin model.

Example of Misclassified Image

Example of Correctly Classified Image

ROC Curve

Generative versus Discriminative Models The classifier that I have just described is known as a generative model: once you know all of the probabilities, you can generate new samples of the data. That may be more work than necessary; you could instead optimize a function that just discriminates skin from not skin.

Discriminative Classification using Logistic Regression Imagine we had two measurements and we plotted each sample on a 2D chart

Discriminative Classification using Logistic Regression Imagine we had two measurements and we plotted each sample on a 2D chart. To separate the two groups, we'll project each point onto a line. Some points will be projected to positive values and some will be projected to negative values.

Discriminative Classification using Logistic Regression This projection defines a separating line, and each point is classified based on which side of that line it falls on.

How do we get the line? Common option: logistic regression. Logistic function: g(z) = 1 / (1 + exp(-z))

The logistic function Notice that g(z) goes from 0 to 1, so we can use it to estimate the probability of something being an x or an o. We need to find a function of the measurements that takes large positive values for x's and large negative values for o's.
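
A small illustration of that idea with a made-up direction w and offset b: the linear score w·x + b is positive for one class and negative for the other, and the logistic function squashes those scores toward 1 and 0.

```python
import numpy as np

def logistic(z):
    """g(z) = 1 / (1 + exp(-z)), mapping any score to a value between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

# A made-up projection direction w and offset b.
w, b = np.array([2.0, -1.0]), 0.5

# One point from each class: the score w.x + b is large and positive for the
# first and large and negative for the second, so g maps them near 1 and near 0.
for point in (np.array([3.0, 1.0]), np.array([-2.0, 2.0])):
    score = point @ w + b
    print(point, score, logistic(score))   # ~0.996 and ~0.004
```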

Fitting the Line Remember, we want a line. For the diagram below, x's are labeled +1 and o's are labeled -1, and y denotes the label of a point (-1 or +1).

Fitting the line The logistic function gives us an estimate of the probability of an example being either +1 or -1: P(y | x) = g(y (w · x + b)), where x holds the measurements (also called features). We can fit the line by maximizing the conditional probability of the correct labeling of the training set.

Fitting the Line We have multiple samples that we assume are independent, so the probability of the whole training set is the product ∏_i g(y_i (w · x_i + b))

Fitting the line It is usually easier to optimize the log conditional probability: L(w) = Σ_i log g(y_i (w · x_i + b))

Optimizing Lots of options. Easiest option: gradient ascent, w ← w + η ∇L(w). η: the learning-rate parameter; there are many ways to choose this.
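
A minimal sketch of batch gradient ascent on that log likelihood, assuming labels in {-1, +1}, a made-up four-point training set, and a fixed learning rate; appending a constant 1 to each feature vector lets the last weight act as the bias b:

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(w, X, y):
    # Sum of log P(y_i | x_i; w); log g(z) = -log(1 + exp(-z)), via logaddexp
    return -np.sum(np.logaddexp(0.0, -y * (X @ w)))

def gradient(w, X, y):
    # Gradient of the log conditional likelihood: sum_i (1 - g(y_i w.x_i)) y_i x_i
    return X.T @ (y * (1.0 - logistic(y * (X @ w))))

# Toy 2-D data; a constant 1 is appended so the last weight acts as the bias b.
X = np.array([[ 2.0,  2.0, 1.0], [ 3.0, 1.0, 1.0],    # two examples labeled +1
              [-1.0, -2.0, 1.0], [-2.0, 0.0, 1.0]])   # two examples labeled -1
y = np.array([1.0, 1.0, -1.0, -1.0])

w, eta = np.zeros(3), 0.1             # eta is the learning rate
for _ in range(200):
    w = w + eta * gradient(w, X, y)   # gradient ascent update
print(w, log_likelihood(w, X, y))
```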

Choosing η My (current) personal favorite method: choose some value for η, update w, and compute the new probability. If the new probability does not rise, divide η by 2 (typically also undoing the step); otherwise multiply η by 1.1 (or something similar). This is called the “Bold-Driver” heuristic.
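
A sketch of the bold-driver loop on the same toy problem. The starting rate of 1.0 and the iteration count are arbitrary, and the step is discarded whenever the log likelihood drops, which is the usual form of the heuristic:

```python
import numpy as np

def log_likelihood(w, X, y):
    # log g(z) = -log(1 + exp(-z)), computed stably with logaddexp
    return -np.sum(np.logaddexp(0.0, -y * (X @ w)))

def gradient(w, X, y):
    g = 1.0 / (1.0 + np.exp(-y * (X @ w)))
    return X.T @ (y * (1.0 - g))

X = np.array([[ 2.0,  2.0, 1.0], [ 3.0, 1.0, 1.0],
              [-1.0, -2.0, 1.0], [-2.0, 0.0, 1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w, eta = np.zeros(3), 1.0
best = log_likelihood(w, X, y)
for _ in range(100):
    candidate = w + eta * gradient(w, X, y)            # try a step at the current rate
    value = log_likelihood(candidate, X, y)
    if value > best:
        w, best, eta = candidate, value, eta * 1.1     # it rose: keep step, be bolder
    else:
        eta = eta / 2.0                                # it fell: discard step, back off
print(w, best, eta)
```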

Faster Option Computing the gradient requires summing over every training example, which could be slow for a large training set. Speed-up: stochastic gradient ascent. Instead of computing the gradient over the whole training set, choose one point at random and do an update based on that one point.
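
A sketch of the stochastic variant on the same toy data: each iteration picks one example at random and steps along that single example's gradient term (learning rate and iteration count are arbitrary):

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[ 2.0,  2.0, 1.0], [ 3.0, 1.0, 1.0],
              [-1.0, -2.0, 1.0], [-2.0, 0.0, 1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

rng = np.random.default_rng(0)
w, eta = np.zeros(3), 0.1
for _ in range(1000):
    i = rng.integers(len(X))                    # pick one training point at random
    grad_i = (1.0 - logistic(y[i] * (X[i] @ w))) * y[i] * X[i]
    w = w + eta * grad_i                        # update using that point alone
print(w)
```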

Limitations Remember, we are only separating the two classes with a line. Try to separate the data shown here with a line; it can't be done. This is a fundamental problem: most things can't be separated by a line.

Overcoming these limitations Two options: train on a more complicated function (quadratic, cubic), or make a new set of features, for example mapping [x1, x2] to [x1, x2, x1², x2², x1·x2].

Advantages We achieve non-linear classification by doing linear classification on non-linear transformations of the features. We only have to rewrite the feature-generation code; the learning code stays the same.
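
A sketch of that point: the only new piece is a hypothetical quadratic_features mapping, after which the same linear gradient-ascent update from the earlier sketch runs unchanged on the transformed features. The toy data, with one class surrounding the other near the origin, is made up:

```python
import numpy as np

def quadratic_features(X):
    """Map 2-D points [x1, x2] to [x1, x2, x1^2, x2^2, x1*x2, 1]."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1**2, x2**2, x1 * x2, np.ones(len(X))])

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data no single line can separate: one class near the origin, one farther out.
X = np.array([[0.1, 0.0], [-0.2, 0.1], [0.0, -0.1],    # inner points, label -1
              [2.0, 0.0], [0.0, 2.0], [-2.0, -1.5]])   # outer points, label +1
y = np.array([-1.0, -1.0, -1.0, 1.0, 1.0, 1.0])

Phi = quadratic_features(X)          # the only new code is this feature mapping ...
w = np.zeros(Phi.shape[1])
for _ in range(5000):                # ... the linear learning code is unchanged
    w = w + 0.02 * (Phi.T @ (y * (1.0 - logistic(y * (Phi @ w)))))
print(np.sign(Phi @ w))              # should agree with y
```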

Nearest Neighbor Classifier Is the “?” an x or an o?

Basic idea For your new example, find the k nearest neighbors in the training set. Each neighbor casts a vote, and the label with the most votes wins. Disadvantages: you have to find the nearest neighbors, which can be slow for a large training set, though good approximate methods are available (LSH, Indyk).
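
A minimal sketch of k-nearest-neighbor voting with Euclidean distance; the knn_classify name, k = 3, and the toy training points are illustrative:

```python
import numpy as np

def knn_classify(query, train_X, train_y, k=3):
    """Label a query point by majority vote among its k nearest neighbors.

    train_X: (N, D) training points, train_y: (N,) labels (e.g. +1 / -1).
    """
    dists = np.linalg.norm(train_X - query, axis=1)   # distance to every example
    nearest = np.argsort(dists)[:k]                   # indices of the k closest
    votes = train_y[nearest]
    labels, counts = np.unique(votes, return_counts=True)
    return labels[np.argmax(counts)]                  # label with the most votes

# Toy data: x's (label +1) on the right, o's (label -1) on the left.
train_X = np.array([[ 2.0, 1.0], [ 3.0, 2.0], [ 2.5, 0.5],
                    [-2.0, 1.0], [-3.0, 0.0], [-2.5, 2.0]])
train_y = np.array([1, 1, 1, -1, -1, -1])
print(knn_classify(np.array([2.2, 1.5]), train_X, train_y, k=3))   # -> 1
```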