Lecture 10 Pattern Recognition and Classification II
Course: T Computer Vision. Year: 2010.
Learning Objectives
After carefully listening to this lecture, students will be able to:
Demonstrate the use of the PCA technique in face recognition.
Explain the real-time robust object detection procedure developed by Viola and Jones.
Feature Selection
What features do we use, and how do we extract them from the image?
Using the images themselves as feature vectors is easy, but suffers from high dimensionality: a 128 x 128 image is a 16,384-dimensional feature vector!
What do we know about the structure of the categories in feature space?
Intuitively, we want features that result in well-separated classes.
Dimensionality Reduction
Functions y_i = y_i(x) can reduce the dimensionality of the feature space, giving more efficient classification.
If chosen intelligently, we won't lose much information and classification becomes easier.
Common methods:
Principal Components Analysis (PCA): projection maximizing the total variance of the data.
Fisher's Linear Discriminant (FLD): maximize the ratio of between-class variance to within-class variance.
Geometric Interpretation of Covariance
The covariance C = X X^T can be thought of as a linear transform that redistributes the variance of a unit normal distribution, where the zero-mean data matrix X is n (number of dimensions) x d (number of points). (adapted from Z. Dodds)
Geometric Factorization of Covariance
The SVD of the covariance matrix, C = R^T D R, describes the geometric components of this transform by extracting:
a diagonal scaling matrix D, whose entries give the major and minor axis lengths, and
a rotation matrix R, whose rows give the "best" and second-best axes.
[The slide's worked 2-D example, which factors X X^T for a small set of points and draws the resulting axes, is not reproduced here.]
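This factorization is easy to reproduce numerically. Below is a minimal NumPy sketch (not from the lecture): it builds the covariance of a small set of zero-mean 2-D points and extracts the rotation and scaling components via SVD. The point matrix X is an arbitrary illustration, not the slide's example.

    import numpy as np

    # Zero-mean data matrix X: n dimensions x d points (arbitrary illustrative values).
    X = np.array([[2.0, -2.0, 1.0],
                  [5.0, -5.0, -1.0]])

    C = X @ X.T                      # covariance (up to a 1/d factor)

    # For a symmetric PSD matrix the SVD gives C = R^T D R, with R a rotation
    # and D a diagonal scaling matrix.
    U, s, Vt = np.linalg.svd(C)
    R = Vt                           # rows of R are the principal ("best") axes
    D = np.diag(s)                   # variances along those axes

    print("principal axes (rows of R):\n", R)
    print("major/minor axis lengths:", np.sqrt(s))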
PCA for Dimensionality Reduction
Any point in the n-dimensional feature space can be expressed as a linear combination of the n eigenvectors (the rows of R) via a set of weights [w_1, w_2, …, w_n]; this is just a change of coordinate system.
By projecting points onto only the first k << n principal components (the eigenvectors with the largest eigenvalues), we are essentially throwing away the least important feature information.
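A minimal NumPy sketch of this projection step (illustrative, not the course's code): keep only the k eigenvectors with the largest eigenvalues and describe each point by its k weights.

    import numpy as np

    def pca_project(X, k):
        """Project zero-mean data X (n dims x d points) onto its top-k principal components."""
        C = X @ X.T                           # n x n covariance (up to a 1/d factor)
        eigvals, eigvecs = np.linalg.eigh(C)  # eigenvalues in ascending order
        order = np.argsort(eigvals)[::-1][:k]
        W = eigvecs[:, order]                 # n x k, columns = top-k principal components
        weights = W.T @ X                     # k x d, the new low-dimensional coordinates
        return W, weights

    # Example: 100 random 10-D points reduced to 3 dimensions.
    X = np.random.randn(10, 100)
    X -= X.mean(axis=1, keepdims=True)
    W, weights = pca_project(X, k=3)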
Projection onto Principal Components
[Figure: projection from the full n-dimensional space (here n = 2) onto a k-dimensional subspace (here k = 1). (adapted from Z. Dodds)]
Face Recognition
Which person is the query face ("?")? The training set contains 20 faces (i.e., classes) with 9 examples (i.e., training data) of each.
Simple Face Recognition
Idea: search the training set for the most similar image (e.g., in the SSD sense) and choose its class.
This is the same as a 1-nearest-neighbor classifier when the feature space is the image space.
Issues:
Large storage requirements (nd, where n is the image space dimensionality and d is the number of faces in the training set).
Correlation is computationally expensive.
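A sketch of this baseline in NumPy (the names train_images and train_labels are placeholders, not from the lecture):

    import numpy as np

    def nearest_neighbor_ssd(query, train_images, train_labels):
        """1-NN in raw image space: return the label of the training image
        with the smallest sum of squared differences (SSD) to the query."""
        diffs = train_images - query.ravel()   # d x n: one difference row per training face
        ssd = np.sum(diffs ** 2, axis=1)       # one SSD value per training face
        return train_labels[np.argmin(ssd)]

    # train_images: d x n array (each row a flattened face); train_labels: length-d array.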
Eigenfaces
Idea: compress image space to "face space" by projecting onto the principal components ("eigenfaces" = eigenvectors of image space).
Represent each face as a low-dimensional vector (weights on the eigenfaces).
Measure similarity in face space for classification.
Advantage: storage requirements are (n + d)k instead of nd.
Eigenfaces: Initialization
Calculate the eigenfaces:
Compute the n-dimensional mean face Ψ.
Compute the difference of every face from the mean face: Φ_j = Γ_j − Ψ.
Form the covariance matrix of these differences: C = A A^T, where A = [Φ_1, Φ_2, …, Φ_d].
Extract eigenvectors u_i of C such that C u_i = λ_i u_i.
The eigenfaces are the k eigenvectors with the largest eigenvalues.
[Figure: example eigenfaces.]
Eigenfaces: Initialization (cont.)
Project the faces into face space: get the eigenface weights for every face in the training set.
The weights [ω_j1, ω_j2, …, ω_jk] for face j are computed via dot products: ω_ji = u_i^T Φ_j.
Calculating Eigenfaces
The obvious way is to perform an SVD of the covariance matrix, but this is often prohibitively expensive: for 128 x 128 images, C is 16,384 x 16,384.
Instead, consider the eigenvector decomposition of the d x d matrix A^T A: A^T A v_i = μ_i v_i.
Multiplying both sides on the left by A gives A A^T (A v_i) = μ_i (A v_i).
So u_i = A v_i are eigenvectors of C = A A^T.
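A minimal NumPy sketch of this trick (assuming A holds the mean-subtracted training faces as columns; hypothetical code, not the course's):

    import numpy as np

    def compute_eigenfaces(A, k):
        """A: n x d matrix of mean-subtracted faces as columns (n = pixels, d = faces).
        Returns the top-k eigenfaces of C = A A^T without ever forming the n x n matrix."""
        small = A.T @ A                          # d x d instead of n x n
        eigvals, V = np.linalg.eigh(small)       # (A^T A) v_i = mu_i v_i
        order = np.argsort(eigvals)[::-1][:k]
        U = A @ V[:, order]                      # u_i = A v_i are eigenvectors of A A^T
        U /= np.linalg.norm(U, axis=0)           # normalize each eigenface
        return U                                 # n x k matrix of eigenfaces (columns)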
Eigenfaces: Recognition
Project the new face into face space.
Classify: assign the class of the nearest face from the training set, or precalculate class means over the training set and find the nearest mean class face.
[Figure: an original face approximated by its weights [ω_1, ω_2, …, ω_8] on 8 eigenfaces. (adapted from Z. Dodds)]
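A sketch of the recognition step (hypothetical names; it assumes the eigenfaces U, the mean face mean_face, and the per-face training weights train_weights were produced during initialization):

    import numpy as np

    def recognize(face, U, mean_face, train_weights, train_labels):
        """Project a new face into face space and return the label of the nearest
        training face, i.e. nearest neighbor on the k-dimensional weight vectors."""
        w = U.T @ (face.ravel() - mean_face)               # k weights for the new face
        dists = np.linalg.norm(train_weights - w, axis=1)  # distance to each stored weight vector
        return train_labels[np.argmin(dists)]

    # train_weights: d x k array, one row of eigenface weights per training face.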
Robust Real-time Object Detection by Paul Viola and Michael Jones
Presentation by Chen Goldberg, Computer Science, Tel Aviv University, June 13, 2007
About the paper
Presented in 2001 by Paul Viola and Michael Jones (published 2002 in IJCV).
Specifically demonstrated on (and motivated by) the face detection task.
Placed a strong emphasis on speed optimization.
Was arguably the first real-time face detection system.
Was widely adopted and re-implemented; Intel distributes this algorithm in its computer vision toolkit (OpenCV).
Framework scheme
The framework consists of two parts: a trainer and a detector.
The trainer is supplied with positive and negative samples:
Positive samples: images containing the object.
Negative samples: images not containing the object.
The trainer then creates a final classifier. This is a lengthy process, computed offline.
The detector applies the final classifier across a given input image.
Abstract detector
Iteratively sample image windows.
Run the final classifier on each window and mark it accordingly.
Repeat with a larger window.
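A rough sketch of such a scanning loop (the base window size, scale factor, step, and classify_window are placeholders, not values from the paper):

    def detect(image, classify_window, base_size=24, scale_factor=1.25, step=2):
        """Slide windows of increasing size over a 2-D image array and keep
        the windows that the final classifier accepts."""
        h, w = image.shape
        detections = []
        size = base_size
        while size <= min(h, w):
            for y in range(0, h - size + 1, step):
                for x in range(0, w - size + 1, step):
                    if classify_window(image, x, y, size):
                        detections.append((x, y, size))
            size = int(size * scale_factor)
        return detections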
Features
We describe an object using simple functions, also called Haar-like features.
Given a sub-window, a feature function calculates a brightness differential.
For example, the value of a two-rectangle feature is the difference between the sums of the pixels within the two rectangular regions.
Features example
Faces share many similar properties which can be represented with Haar-like features. For example, it is easy to notice that:
The eye region is darker than the upper cheeks.
The nose bridge region is brighter than the eyes.
Three challenges ahead
How can we evaluate features quickly? Feature evaluation happens extremely often, and computing an image scale pyramid is too expensive.
How do we obtain the best representative features possible?
How can we avoid wasting time on image background (i.e., non-object windows)?
Introducing Integral Image
Definition: the integral image at location (x, y) is the sum of the pixel values above and to the left of (x, y), inclusive.
We can calculate the integral image representation of the image in a single pass.
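In NumPy the single pass amounts to two cumulative sums (a sketch; for reference, OpenCV's cv2.integral does the same but pads an extra leading row and column of zeros):

    import numpy as np

    def integral_image(img):
        """ii[y, x] = sum of img values above and to the left of (y, x), inclusive.
        Implements ii(x, y) = i(x, y) + ii(x-1, y) + ii(x, y-1) - ii(x-1, y-1)
        via cumulative sums; the cast avoids overflow for uint8 inputs."""
        return img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)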
Rapid evaluation of rectangular features
Using the integral image representation, one can compute the value of any rectangular sum in constant time.
For example, the pixel sum inside rectangle D can be computed as ii(4) + ii(1) − ii(2) − ii(3), where ii(1), …, ii(4) are the integral image values at the four corners of D (as labeled in the paper's figure).
As a result, two-, three-, and four-rectangle features can be computed with 6, 8 and 9 array references respectively. Now that's fast!
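A sketch of the constant-time rectangle sum and a two-rectangle feature built on it (the inclusive coordinate convention is an assumption made for this example):

    def rect_sum(ii, top, left, bottom, right):
        """Sum of pixels in the rectangle [top..bottom] x [left..right] (inclusive),
        using at most four references into the integral image ii."""
        total = ii[bottom, right]
        if top > 0:
            total -= ii[top - 1, right]
        if left > 0:
            total -= ii[bottom, left - 1]
        if top > 0 and left > 0:
            total += ii[top - 1, left - 1]
        return total

    def two_rect_feature(ii, top, left, height, width):
        """Two horizontally adjacent rectangles (width assumed even): left sum minus right sum."""
        half = width // 2
        left_sum = rect_sum(ii, top, left, top + height - 1, left + half - 1)
        right_sum = rect_sum(ii, top, left + half, top + height - 1, left + width - 1)
        return left_sum - right_sum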
Scaling
The integral image enables us to evaluate rectangle sums of any size in constant time. Therefore, no image scaling is necessary: scale the rectangular features instead!
Feature selection
Given a feature set and a labeled training set of images, we create a strong object classifier.
However, there are 45,396 features associated with each image sub-window, so computing all of them is prohibitive.
Hypothesis: a combination of only a small number of discriminative features can yield an effective classifier.
Variety is the key here: if we want a small number of features, we must make sure they compensate for each other's flaws.
Boosting
Boosting is a machine learning meta-algorithm for performing supervised learning. It creates a "strong" classifier from a set of "weak" classifiers.
Definitions:
"Weak" classifier: has an error rate below 0.5 (i.e., it does better than chance).
"Strong" classifier: has a small error rate ε (i.e., our final classifier).
Additional information: a meta-algorithm is an algorithm that can be usefully considered to have other significant algorithms, not just elementary operations and simple control structures, as its constituents; also, an algorithm that has subordinate algorithms as variable and replaceable parameters. Thus a meta-algorithm defines a class of concrete algorithms.
AdaBoost
AdaBoost stands for "Adaptive Boosting".
AdaBoost is a boosting algorithm for finding a small number of good classifiers that have significant variety.
AdaBoost accomplishes this by giving misclassified training examples more weight (thus increasing their chances of being classified correctly next time). The weights tell the learning algorithm the importance of each example.
Additional information: AdaBoost is adaptive in the sense that subsequent classifiers are tweaked in favor of those instances misclassified by previous classifiers.
AdaBoost example
AdaBoost starts with a uniform distribution of "weights" over the training examples.
Select the classifier with the lowest weighted error (i.e., a "weak" classifier).
Increase the weights on the training examples that were misclassified. (Repeat.)
At the end, carefully form a linear combination of the weak classifiers obtained at all iterations.
Slide taken from a presentation by Qing Chen, Discover Lab, University of Ottawa.
Back to Feature selection
We use a variation of AdaBoost for aggressive feature selection, basically similar to the previous example.
Our training set consists of positive and negative images.
Our simple classifier consists of a single feature.
Simple classifier
A simple classifier depends on a single feature; hence there are 45,396 classifiers to choose from.
For each classifier we set an optimal threshold such that the minimum number of examples is misclassified.
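A sketch of training one such single-feature classifier (a decision stump): a brute-force search over thresholds and polarities, written for clarity rather than speed (it can be done more efficiently by scanning the sorted feature values once).

    import numpy as np

    def train_stump(feature_values, labels, weights):
        """Find the threshold and polarity for one feature that minimize the
        weighted number of misclassified examples. labels are +1 (object) / -1 (non-object)."""
        best = (np.inf, 0.0, 1)                   # (error, threshold, polarity)
        for threshold in np.unique(feature_values):
            for polarity in (1, -1):
                pred = np.where(polarity * feature_values < polarity * threshold, 1, -1)
                err = weights[pred != labels].sum()
                if err < best[0]:
                    best = (err, threshold, polarity)
        return best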
Feature selection pseudo-code
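[The slide's pseudo-code figure is not reproduced here.] As a rough stand-in, here is a sketch of the AdaBoost-style selection loop, reusing the hypothetical train_stump helper from the previous slide and assuming a precomputed matrix of feature values:

    import numpy as np

    def adaboost_select(feature_matrix, labels, rounds):
        """feature_matrix: (num_features, num_examples) precomputed Haar feature values.
        labels: +1 / -1 per example. Returns one (feature, threshold, polarity, alpha)
        tuple per boosting round."""
        n = labels.size
        w = np.full(n, 1.0 / n)                        # start with uniform weights
        chosen = []
        for _ in range(rounds):
            w /= w.sum()                               # normalize the weights
            # Pick the single feature whose best stump has the lowest weighted error.
            best = None
            for f, values in enumerate(feature_matrix):
                err, thr, pol = train_stump(values, labels, w)
                if best is None or err < best[0]:
                    best = (err, f, thr, pol)
            err, f, thr, pol = best
            beta = err / (1.0 - err)
            alpha = np.log(1.0 / max(beta, 1e-12))     # vote weight of this weak classifier
            pred = np.where(pol * feature_matrix[f] < pol * thr, 1, -1)
            w *= np.where(pred == labels, beta, 1.0)   # down-weight correctly classified examples
            chosen.append((f, thr, pol, alpha))
        return chosen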
The Attentional Cascade
The overwhelming majority of windows are in fact negative.
Simpler boosted classifiers can reject many of the negative sub-windows while detecting all positive instances.
A cascade of gradually more complex classifiers achieves good detection rates.
Consequently, on average, far fewer features are calculated per window.
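A sketch of how one window is pushed through such a cascade (the stage representation is an assumption made for this example):

    def cascade_classify(window_ii, stages):
        """stages: list of (weak_classifiers, stage_threshold); each weak classifier is
        (feature_fn, threshold, polarity, alpha). The window is rejected as soon as any
        stage's weighted vote falls below that stage's threshold."""
        for weak_classifiers, stage_threshold in stages:
            score = 0.0
            for feature_fn, thr, pol, alpha in weak_classifiers:
                if pol * feature_fn(window_ii) < pol * thr:
                    score += alpha
            if score < stage_threshold:
                return False        # early rejection: most windows exit here after a few features
        return True                 # passed every stage: report a detection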
Training a Cascaded Classifier
Subsequent classifiers are trained only on examples which pass through all the previous classifiers.
The task faced by classifiers further down the cascade is therefore more difficult.
Training a Cascaded Classifier (cont.)
Given a target overall false positive rate F and detection rate D, we would like to minimize the expected number of features evaluated per window.
Since this optimization is extremely difficult, the usual framework is to choose a maximum acceptable false positive rate and a minimum acceptable detection rate per layer.
Pseudo-Code for Cascade Trainer
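[The trainer's pseudo-code figure is likewise not reproduced.] A high-level sketch of the layer-building loop it describes follows; the helpers train_layer, evaluate_rates and collect_false_positives, and the rate parameters, are placeholders rather than the paper's code:

    def train_cascade(pos, neg, f_max, d_min, F_target):
        """Add boosted layers until the estimated overall false positive rate drops
        below F_target. Each layer must reach a false positive rate <= f_max and a
        detection rate >= d_min; the negatives for the next layer are the current
        cascade's false positives. pos/neg are lists of training windows."""
        cascade = []
        F = 1.0                                                   # overall false positive rate so far
        while F > F_target and neg:
            n_features = 0
            while True:
                n_features += 1
                layer = train_layer(pos, neg, n_features)         # boosted classifier (placeholder)
                f_rate, d_rate = evaluate_rates(layer, pos, neg)  # validation rates (placeholder)
                if f_rate <= f_max and d_rate >= d_min:
                    break
            cascade.append(layer)
            F *= f_rate
            neg = collect_false_positives(cascade, neg)           # mine new negatives (placeholder)
        return cascade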