Classification ECE 847: Digital Image Processing Stan Birchfield Clemson University

Acknowledgment Many slides are courtesy of Frank Dellaert and Jim Rehg at Georgia Tech.

Classification problems
Detection – search a set, find all instances of a class
Recognition – given an instance, label its identity
Verification – given an instance and a hypothesized identity, verify whether it is correct
Tracking – like detection, but with a local search and a fixed identity

Classification issues
Feature extraction – needed for practical reasons; the distinction is somewhat arbitrary: perfect feature extraction ⇒ classification is trivial; a perfect classifier ⇒ no need for feature extraction
Occlusion (missing features)
Mereology – the study of part/whole relationships: POLOPONY reads as POLO PONY, and BEATS reads as BEATS (not BE EATS)
Segmentation – how can we classify before segmenting? how can we segment before classifying?
Context
Computational complexity: a 20x20 binary input has 2^400 ≈ 10^120 possible patterns!

Mereology example What does this say?

Decision theory The goal is to make a decision (i.e., set a decision boundary) so as to minimize cost. Pattern classification is perhaps the most important subfield of decision theory. Supervised learning: features, data sets, algorithm, decision boundary.

Overfitting A decision boundary could separate the training data perfectly using nearest neighbors, but it would generalize poorly (overfitting) – it will not work well on new data. Occam’s razor: the simplest explanation is the best (a philosophical principle based upon the orderliness of creation).

Bayes decision theory Problem: given a feature x, determine the most likely class, ω1 or ω2. The class-conditional pdfs p(x|ω1) and p(x|ω2) are easy to measure with enough examples.

Bayes’ rule P(ωi|x) = p(x|ωi) P(ωi) / p(x), i.e., posterior = likelihood × prior / evidence, where p(x|ωi) is the likelihood (class-conditional pdf), P(ωi) is the prior, and p(x) is the evidence (a normalization factor).

What is this P(ω1|x)? The probability of class 1 given data x. How do P(ω1|x) and P(ω2|x) relate? P(ω1|x) + P(ω2|x) = 1 for every x! Note: the area under each posterior curve is not 1.

Bayes classifier Classifier: select ω1 if P(ω1|x) > P(ω2|x), otherwise select ω2. Decision boundaries occur where P(ω1|x) = P(ω2|x). (Figure regions, left to right: select ω2, select ω1, select ω2.)

Bayes risk The shaded area under min[P(ω1|x), P(ω2|x)] is called the Bayes risk. The total risk is the expected loss when using the classifier: R = ∫ min[P(ω1|x), P(ω2|x)] p(x) dx (we’re assuming the loss is constant here, i.e., 0–1 loss).

Discriminative vs. generative Finding a decision boundary (discriminative) is not the same as modeling a conditional density (generative). Note: bug in the Forsyth–Ponce book: the curves shown do not satisfy P(ω1|x) + P(ω2|x) = 1.

Histograms One way to compute class-conditional pdfs is to collect a bunch of examples, store them in a histogram, and then normalize.

Application: Skin histograms Skin has a very small range of (intensity-independent) colors and little texture. Compute a color measure, check whether the color is in this range, and check whether there is little texture (median filter). See this as a classifier – we can set up the tests by hand, or learn them: get the class-conditional densities (histograms) and priors from data (by counting). Classifier: declare skin if P(skin|x) > P(~skin|x).
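A minimal sketch of such a histogram skin classifier in Python/NumPy (hedged: the 32-bin quantization and the 0.4 skin prior are illustrative choices, not values from the slides):

    import numpy as np

    BINS = 32  # coarse 32x32x32 RGB histogram

    def build_pdf(pixels):
        # pixels: (N, 3) uint8 RGB samples from labeled training data
        idx = (pixels // (256 // BINS)).astype(int)
        hist = np.zeros((BINS, BINS, BINS))
        np.add.at(hist, (idx[:, 0], idx[:, 1], idx[:, 2]), 1)
        return hist / hist.sum()  # normalize counts into a pdf

    def is_skin(pixel, pdf_skin, pdf_nonskin, prior_skin=0.4):
        i = tuple(np.asarray(pixel) // (256 // BINS))
        # Bayes: skin iff p(x|skin) P(skin) > p(x|~skin) P(~skin)
        return pdf_skin[i] * prior_skin > pdf_nonskin[i] * (1 - prior_skin)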

Finding skin color 3D histogram in RGB space M. J. Jones and J. M. Rehg, Statistical Color Models with Application to Skin Detection, Int. J. of Computer Vision, 46(1):81-96, Jan 2002.

Histograms: skin vs. non-skin

Results Note: We have assumed that all pixels are independent! Context is ignored

Confusion matrix
true positive = hit
false positive = false alarm = false detection = Type I error
false negative = miss = false dismissal = Type II error
sensitivity = true positive rate = hit rate = recall: TPR = TP / (TP + FN)
false negative rate: FNR = FN / (TP + FN)
false positive rate = false alarm rate = fallout: FPR = FP / (FP + TN)
specificity: SPC = TN / (FP + TN)
Note that TPR + FNR = 1 and FPR + SPC = 1.
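As a small worked sketch, the rates above follow directly from the four counts of the confusion matrix (Python; variable names are illustrative):

    def rates(tp, fp, fn, tn):
        tpr = tp / (tp + fn)  # sensitivity / hit rate / recall
        fnr = fn / (tp + fn)  # miss rate; TPR + FNR = 1
        fpr = fp / (fp + tn)  # false alarm rate / fallout
        spc = tn / (fp + tn)  # specificity; FPR + SPC = 1
        return tpr, fnr, fpr, spc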

Receiver operating characteristic (ROC) curve Plot TPR (vertical axis) against FPR (horizontal axis) as the decision threshold varies. Equal error rate (EER) = 88%. Confusion matrix for the image classifier shown.
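One hedged way to trace an ROC curve is to sweep the decision threshold over the classifier's scores (NumPy sketch; assumes binary 0/1 labels and that a higher score means more positive):

    import numpy as np

    def roc_curve(scores, labels):
        order = np.argsort(-scores)   # descending score = loosening threshold
        labels = labels[order]
        tp = np.cumsum(labels)        # positives accepted so far
        fp = np.cumsum(1 - labels)    # negatives accepted so far
        return fp / (1 - labels).sum(), tp / labels.sum()  # FPR, TPR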

Cross-validation

Naïve Bayes Quantize image patches, then compute a histogram of patch types within a face. But histograms suffer from the curse of dimensionality: a histogram in N dimensions is intractable for N > 5. To solve this, assume independence among the pixels; the features are the patch types: P(image|face) = P(label 1 at (x1,y1)|face) ... P(label k at (xk,yk)|face)
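A minimal sketch of evaluating that product, summing logs to avoid numerical underflow (the label_hist table of per-position label frequencies is a hypothetical structure for illustration):

    import numpy as np

    def log_p_image_given_face(patch_labels, positions, label_hist):
        # Naive Bayes: P(image|face) factors into independent
        # per-position terms, one per quantized patch label
        return sum(np.log(label_hist[pos][lab])
                   for pos, lab in zip(positions, patch_labels))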

Histograms applied to faces and cars H. Schneiderman, T. Kanade. "A Statistical Method for 3D Object Detection Applied to Faces and Cars". IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2000)

Alternative: Kernel density estimation (Parzen windows) K/N is the fraction of samples that fall into volume V, so the density estimate is p(x) ≈ K / (N V).

Parzen windows A non-parametric technique: center a kernel at each data point, then sum the results (and normalize) to get a pdf.
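A minimal 1-D sketch with a Gaussian kernel (NumPy; the bandwidth h is the smoothing parameter compared on a later slide):

    import numpy as np

    def parzen_pdf(x, samples, h):
        # center a Gaussian of width h on every sample, then average
        k = np.exp(-0.5 * ((x - samples) / h) ** 2) / (h * np.sqrt(2 * np.pi))
        return k.mean()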

Parzen windows

Gaussian Parzen Windows

Parzen Window Density Estimation

Comparison
Histograms: non-parametric; smoothing parameter = number of bins; data can be discarded afterwards; discontinuous; arbitrary boundaries; d dimensions ⇒ M^d bins (curse of dimensionality).
Parzen windows: non-parametric; smoothing parameter = kernel size; data must be kept; discontinuous (box) or continuous (Gaussian); boundaries data-driven (box) or absent (Gaussian); dimensionality is not as much of a curse.

Another alternative: Locally Weighted Averaging (LWA) Keep an instance database; at each query point, form a locally weighted average, Σi K(x, xi) f(i) / Σi K(x, xi), where f(i) = 1 for positive examples and 0 for negative examples. Equivalent to Parzen windows; memory-based, lazy learning, applicable to any kernel, can be slow.

LWA classifier with a circular kernel (figure panels: kernel weights; data, 2 classes; LWA posterior; all data)

K-Nearest Neighbors Classification = majority vote of K nearest neighbors
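A minimal k-NN sketch under Euclidean distance (NumPy; k = 5 is an illustrative default):

    import numpy as np
    from collections import Counter

    def knn_classify(x, data, labels, k=5):
        d = np.linalg.norm(data - x, axis=1)  # distance to every training point
        nearest = np.argsort(d)[:k]
        # majority vote among the k nearest neighbors
        return Counter(labels[i] for i in nearest).most_common(1)[0][0]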

Recognition by finding patterns We have seen very simple template matching (under filters). Some objects behave like quite simple templates, e.g., frontal faces. Strategy: find image windows, correct their lighting, and pass them to a statistical test (a classifier) that accepts faces and rejects non-faces.

Finding faces Faces “look like” templates (at least when they’re frontal). General strategy: search image windows at a range of scales; correct for illumination; present the corrected window to a classifier. Issues: how should the window be corrected? what features? what classifier? (Pipeline: training images from a training database → feature extraction → learner → classifier; test image → feature extraction → classifier → decision.)

Face detection

Face recognition

Linear discriminant functions g(x) = w^T x + w0; the decision surface is a hyperplane, and w is perpendicular to it. A neural network is a combination of linear discriminant functions; the sigmoid function is differentiable, which enables backpropagation.
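A sketch of a single unit built from this discriminant (NumPy; a network combines many such units):

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def unit(x, w, w0):
        # g(x) = w^T x + w0, squashed by a differentiable sigmoid,
        # which is what makes backpropagation possible
        return sigmoid(w @ x + w0)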

Neural networks for detecting faces Henry A. Rowley, Shumeet Baluja, and Takeo Kanade, Neural Network-Based Face Detection, IEEE Transactions on Pattern Analysis and Machine Intelligence, volume 20, number 1, pages 23-38, January 1998.

Neural networks for detecting faces Positive training images (scaled, rotated, translated, and mirrored) and negative training images.

Neural networks for detecting faces

Arbitration

Bootstrapping The hardest examples to classify are those near the decision boundary; these are also the most useful for training. Approach: run the detector, find examples of misclassification, and feed them back into the training process.
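A hedged pseudocode sketch of this loop; train, detect, and is_face are hypothetical helpers, not functions of any particular library:

    def bootstrap_train(pos, neg, images, rounds=3):
        clf = train(pos, neg)
        for _ in range(rounds):
            # the detector's own false positives are the hard
            # near-boundary examples worth adding to the negatives
            neg = neg + [w for w in detect(clf, images) if not is_face(w)]
            clf = train(pos, neg)
        return clf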

Results

Real-time face detection Components: cascade architecture; box-sum features (integral image). (Cascade diagram: stages H1, H2, …, Hn; each stage either rejects a window as non-face or passes it to the next; windows surviving every stage are labeled face.) Viola and Jones, CVPR 2001

Haar-like features (Integral image makes computation fast)

More features

Example A feature’s value is calculated as the difference between the sums of the pixels within the white and black rectangle regions.
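A sketch of why the integral image makes this fast: once ii(x, y) holds the sum of all pixels above and to the left, any rectangle sum takes only four array references (NumPy; inclusive row/column bounds are an illustrative convention):

    import numpy as np

    def integral_image(img):
        return img.cumsum(axis=0).cumsum(axis=1)

    def box_sum(ii, r0, c0, r1, c1):
        # sum of img over rows r0..r1 and cols c0..c1 (inclusive)
        s = ii[r1, c1]
        if r0 > 0: s -= ii[r0 - 1, c1]
        if c0 > 0: s -= ii[r1, c0 - 1]
        if r0 > 0 and c0 > 0: s += ii[r0 - 1, c0 - 1]
        return s

    # a two-rectangle Haar-like feature is box_sum(white) - box_sum(black)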

Boosting

AdaBoost The more distinctive the feature (i.e., the lower its weighted error), the larger its weight in the final classifier.
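A minimal AdaBoost sketch under this framing (labels y in {-1, +1}; weak_learners is a hypothetical list of callables returning +/-1 predictions on X):

    import numpy as np

    def adaboost(X, y, weak_learners, T=10):
        w = np.ones(len(y)) / len(y)        # uniform sample weights
        strong = []
        for _ in range(T):
            # pick the weak learner with the lowest weighted error
            h, err = min(((h, np.sum(w * (h(X) != y))) for h in weak_learners),
                         key=lambda p: p[1])
            alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))
            strong.append((alpha, h))       # distinctive feature -> large alpha
            w *= np.exp(-alpha * y * h(X))  # upweight misclassified samples
            w /= w.sum()
        return strong  # final classifier: sign(sum of alpha * h(x))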

Training images

Results

Training Viola-Jones vs. Direct Feature Selection (two orders of magnitude faster). Jianxin Wu, James M. Rehg, Matthew D. Mullin, Learning a Rare Event Detection Cascade by Direct Feature Selection, NIPS 2003.

Using OpenCV detector
1. Collect a database of positive samples and a database of negative samples.
2. Mark objects with objectmarker.exe.
3. Build a .vec file from the positive samples using createsamples.exe.
4. Run haartraining.exe to build the classifier.
5. Run performance.exe to evaluate the classifier.
6. Run haarconv.exe to convert the classifier to an .xml file.

Using OpenCV detector
1. Mark positive samples: info.txt
2. Use createsamples.exe to pack the positive samples into an “hw.vec” file: createsamples -info info.txt -vec hw.vec -w 15 -h 12 (the minimum size of the marked object was 15 by 12)
3. Use haartraining.exe to train the classifier: haartraining -data hw -vec hw.vec -bg background.txt -mem 100 -w 15 -h 12 -nstages 18
4. Convert the classifier to XML: convert hw hw.xml
5. Use performance.exe to check the performance: performance -data hw.xml -info info.txt -w 15 -h 12 -ni
6. Use the PatternDetector class in Blepo to display the results: m_Detector = new PatternDetector(xml_file_name);
7. In the results, you may see an object detected twice or more, with overlap.
from Zhichao Chen

Using OpenCV detector Results from checking performance: the classifier detected 469 positive objects and missed 36. The false-positive count is larger (1991) because a positive object may be detected many times at slightly different positions, so some “good” detections are counted as “false”. We used only 18 stages; more stages would reduce the false positives, at the expense of more training time. No background image was included for training. Conclusions: use the proper sample size for training – basically, the sample size should be similar to the minimum size of the marked object; if the FPR is too high, increase the number of stages. from Zhichao Chen

OpenCV detector links Original Viola-Jones paper: PR2001.pdf. OpenCV library. How-to build a cascade of boosted classifiers based on Haar-like features: bjectDetection_HowTo.pdf. Objectmarker.exe and haarconv.exe, *.dll. from Zhichao Chen
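For reference, a trained cascade like hw.xml can also be loaded from Python through OpenCV's CascadeClassifier API (a minimal sketch; the file names come from the steps above, and minNeighbors is the parameter that merges the overlapping multiple detections mentioned earlier):

    import cv2

    # load the cascade produced by haartraining + conversion to XML
    cascade = cv2.CascadeClassifier("hw.xml")
    img = cv2.imread("test.jpg")
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # minNeighbors groups overlapping detections into a single hit
    hits = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=3)
    for (x, y, w, h) in hits:
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)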

Fisher linear discriminant

Linear SVMs

Non-linear SVMs

Eigenfaces