"POOF: Part Based One-vs-One Features for Fine Grained Categorization, Face Verification, and Attribute Estimation", Thomas Berg and Peter Belhumeur, CVPR 2013
VGG Reading Group, 4.7.2013, Eric Sommerlade
Summary
A POOF is a scalar defined
- for a discriminative region,
- between two classes and two landmarks,
- for a set of base features (e.g. HOG or colour histograms).
Perks:
- regions automatically learned from the data set
- great performance
- transfers in knowledge from external datasets
Motivation:
Standard approach to part-based recognition:
- extract standard features (SIFT, HOG, LBP)
- train a classifier
- relevant regions tuned by hand
Idea: "standard" features are hardly optimal for a specific problem; the "best" features depend on
- the domain (dog features != bird features)
- the task (face recognition != gender classification)
POOF feature learning: From dataset with landmark annotations
POOF feature learning:
- Choose a feature part f and an alignment part a
- Align and crop to a 128x64 region (a rough sketch follows)
- Larger/shorter distance between the parts -> coarser/finer scale
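A minimal sketch of the alignment step, not the authors' code: it assumes a similarity transform that maps the two landmarks to fixed canonical positions inside the 128x64 crop; the exact canonical coordinates below are an assumption.

    import numpy as np
    import cv2  # only used for warpAffine; any image warping routine would do

    # Canonical landmark positions inside the 128x64 (width x height) crop.
    # These exact coordinates are an assumption, not taken from the paper.
    DST_F = np.array([32.0, 32.0])   # where the feature part f ends up
    DST_A = np.array([96.0, 32.0])   # where the alignment part a ends up
    CROP_W, CROP_H = 128, 64

    def align_and_crop(image, p_f, p_a):
        """Similarity-align landmarks p_f, p_a to DST_F, DST_A and crop 128x64."""
        p_f, p_a = np.asarray(p_f, float), np.asarray(p_a, float)
        # Solve q = s*R*p + t from one pair of point correspondences, using the
        # complex-number form of a 2D similarity transform.
        dp = complex(*(p_a - p_f))
        dq = complex(*(DST_A - DST_F))
        z = dq / dp                        # encodes scale and rotation
        A = np.array([[z.real, -z.imag],
                      [z.imag,  z.real]])  # scale * rotation
        t = DST_F - A @ p_f                # translation
        M = np.hstack([A, t[:, None]])     # 2x3 matrix for warpAffine
        return cv2.warpAffine(image, M, (CROP_W, CROP_H))

The farther apart the two landmarks are in the image, the more the crop is scaled down, which gives the coarser/finer scale behaviour mentioned above.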
POOF feature learning:
- Tile the 128x64 crop with square cells at two scales: 8x8 and 16x16 pixels
- 8*16 = 128 fine cells + 4*8 = 32 coarse cells = 160 cells in total
POOF feature learning:
Per cell, compute base features (sketch below):
- an 8-bin gradient direction histogram, Dg = 8 ('gradhist'), or Felzenszwalb HOG, Dg = 31
- a colour histogram, Dc = 32
Concatenated feature length: (Dg + Dc) * 160
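A sketch of the per-cell base features at one cell scale, assuming a simple unsigned 8-bin gradient histogram and a crude 32-bin RGB colour histogram; the paper's exact binning and normalisation are not shown on the slides.

    import numpy as np

    def cell_features(crop, cell=8, n_grad_bins=8, n_color_bins=32):
        """Per-cell base features on the 64x128 crop: an 8-bin gradient direction
        histogram ('gradhist', Dg=8) plus a 32-bin colour histogram (Dc=32).
        Binning and normalisation details are assumptions."""
        gray = crop.astype(float).mean(axis=2)
        gy, gx = np.gradient(gray)
        mag = np.hypot(gx, gy)
        ang = np.arctan2(gy, gx) % np.pi            # unsigned orientation in [0, pi)
        # crude 32-bin colour index: 2 bits red, 2 bits green, 1 bit blue
        r, g = crop[..., 0] // 64, crop[..., 1] // 64
        b = crop[..., 2] // 128
        color_idx = (r * 8 + g * 2 + b).astype(int)  # values 0..31

        h, w = gray.shape
        feats = []
        for y in range(0, h, cell):
            for x in range(0, w, cell):
                sl = np.s_[y:y + cell, x:x + cell]
                gh, _ = np.histogram(ang[sl], bins=n_grad_bins,
                                     range=(0, np.pi), weights=mag[sl])
                ch = np.bincount(color_idx[sl].ravel(), minlength=n_color_bins)
                v = np.concatenate([gh, ch]).astype(float)
                feats.append(v / (np.linalg.norm(v) + 1e-6))
        return np.concatenate(feats)   # length (Dg + Dc) * number of cells

Calling this with cell=8 and cell=16 and concatenating the results gives the 128 + 32 = 160 cells from the previous slide.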
POOF feature learning:
For each scale (8x8, 16x16), select the discriminative cells (sketch below):
- learn a linear SVM, get the weight vector w
- keep max |w_i| per cell c, call it max(w_c)
- keep cells with max(w_c) >= median over cells of max(w_c)
- keep the connected component (4-connected?) of kept cells starting at f
[Figure: per-cell weights w, per-cell maxima, and the thresholded cell mask]
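A sketch of this cell-selection step, using scikit-learn and SciPy for illustration; the C value, the seed-cell convention, and the use of 4-connectivity are assumptions (the slide itself is unsure about the connectivity).

    import numpy as np
    from sklearn.svm import LinearSVC
    from scipy.ndimage import label

    def select_cells(X, y, grid_shape, cell_dim, seed_cell):
        """X: (n_samples, n_cells*cell_dim) per-cell features at one scale,
        y: binary labels for the two classes of this POOF,
        grid_shape: (rows, cols) of the cell grid,
        seed_cell: (row, col) of the cell containing the feature part f."""
        rows, cols = grid_shape
        svm = LinearSVC(C=1.0).fit(X, y)
        w = svm.coef_.ravel().reshape(rows * cols, cell_dim)
        cell_score = np.abs(w).max(axis=1)                  # max |w_i| per cell
        keep = (cell_score >= np.median(cell_score)).reshape(grid_shape)
        if not keep[seed_cell]:
            return np.zeros(grid_shape, dtype=bool)         # empty mask if f's cell was dropped
        # keep only the 4-connected component of kept cells that contains the seed
        comp, _ = label(keep)                               # 4-connectivity is the default
        return comp == comp[seed_cell]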
POOF feature learning:
- Retrain the SVM on the selected cells only
- The resulting POOF is the cell mask (bitmap) plus the SVM weight vector
POOF feature extraction:
- Find the corresponding landmarks (the authors use the detector of Belhumeur et al., CVPR 2011)
- Align & crop to the 128x64 region
- Compute the base features
- The POOF value is the SVM score of the features in the masked region (sketch below)
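Putting the pieces together, a hedged sketch of evaluating one POOF on a test image, reusing align_and_crop and cell_features from the sketches above; the Poof fields are illustrative, not the authors' data structure.

    import numpy as np
    from dataclasses import dataclass

    @dataclass
    class Poof:
        f: int              # index of the feature landmark
        a: int              # index of the alignment landmark
        cell_size: int      # 8 or 16
        mask: np.ndarray    # boolean cell grid from select_cells
        w: np.ndarray       # retrained SVM weights over the masked cells' features
        b: float            # SVM bias

    def poof_score(image, landmarks, poof):
        """Scalar POOF value: SVM score of the base features in the masked cells."""
        crop = align_and_crop(image, landmarks[poof.f], landmarks[poof.a])
        feats = cell_features(crop, cell=poof.cell_size)
        per_cell = feats.reshape(poof.mask.size, -1)     # (n_cells, cell_dim)
        masked = per_cell[poof.mask.ravel()].ravel()     # only the kept cells
        return float(masked @ poof.w + poof.b)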
Results: categorization
- Caltech-UCSD Birds dataset, 200 classes, 13 landmarks used
- About 5M POOF combinations possible; a randomly chosen subset of 5000 POOFs is used
- The POOF scores form the feature vector of a one-vs-all linear SVM (sketch below)
- Evaluation on the ground-truth bounding box of the object, with ground-truth or detected landmarks
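A sketch of the categorization pipeline on top of poof_score above: stack a fixed random subset of POOF scores into a feature vector and train one-vs-all linear SVMs (scikit-learn's LinearSVC is one-vs-rest by default); the sample format and C value are assumptions.

    import numpy as np
    from sklearn.svm import LinearSVC

    def poof_vector(image, landmarks, poofs):
        """Scores of a fixed, randomly chosen set of POOFs (e.g. 5000 of them)."""
        return np.array([poof_score(image, landmarks, p) for p in poofs])

    def train_categorizer(train_samples, poofs):
        """train_samples: iterable of (image, landmarks, class_label) triples."""
        train_samples = list(train_samples)
        X = np.stack([poof_vector(img, lm, poofs) for img, lm, _ in train_samples])
        y = np.array([cls for _, _, cls in train_samples])
        return LinearSVC(C=1.0).fit(X, y)   # one-vs-rest over the 200 bird classes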
Results: categorization [figure slide]
Results: categorization (accuracy in %; POOFs with gradhist or HOG base features vs. a low-level feature baseline and the methods of [27], [4] (MKL), [33] (RF), [32], [8], [35]):
- 200 classes, detected landmarks: gradhist 54, HOG 56, low-level baseline 28
- 14 classes, detected landmarks: gradhist 65, HOG 70, low-level baseline 57
- 200 classes, gt landmarks: gradhist 69, HOG 73, low-level baseline 40, other baselines 17 and 19
- 14 classes, gt landmarks: gradhist 80, HOG 85, low-level baseline 44
- 5 classes, detected landmarks: 55
Results: Face Verification
- Are two images of the same person? LFW dataset
- 16 landmarks, 120 subjects -> ~3.5M possible POOF choices
- Each image I yields f(I), the scores of 10000 randomly chosen POOFs
- For an image pair (I, J), concatenate [ |f(I) - f(J)|, f(I) .* f(J) ]
- Train a same-vs-different classifier on this pair representation (sketch below)
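A sketch of the pair representation and the same-vs-different classifier; a linear SVM is used here, matching the next slide's point about linear rather than RBF kernels, and the C value is an assumption.

    import numpy as np
    from sklearn.svm import LinearSVC

    def pair_feature(f_i, f_j):
        """f_i, f_j: POOF score vectors (e.g. 10000-dim) of the two face images.
        Pair representation from the slide: [ |f(I)-f(J)| , f(I) .* f(J) ]."""
        return np.concatenate([np.abs(f_i - f_j), f_i * f_j])

    def train_verifier(pairs, same_labels):
        """pairs: list of (f_i, f_j); same_labels: 1 = same person, 0 = different."""
        X = np.stack([pair_feature(fi, fj) for fi, fj in pairs])
        return LinearSVC(C=1.0).fit(X, np.asarray(same_labels))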
Results: Face Verification [figure slide]
Results: Face Verification
- Performance equal to Tom-vs-Pete (BMVC 2012)
- But: support regions learned automatically; linear SVM instead of RBF, so faster
- Uses the same "identity preserving alignment" on landmark detections [2]
[Figure: alignment stages (input, affine, canonical); canonical positions from the mean of all closest in the dataset]
Results: Attribute classification
- Attributes such as gender, "big nose", "eyeglasses" (Kumar [14])
- POOFs learned as before, on the LFW dataset
- POOFs extracted from the attribute dataset; a linear SVM is trained for each attribute (sketch below)
- POOFs transfer discriminability learned from different classes: no need for a fully labelled attribute dataset
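A sketch of the attribute transfer, reusing poof_vector from the categorization sketch: the POOFs stay fixed and only per-attribute linear SVMs are trained on their scores over the attribute dataset; the sample format and C value are assumptions.

    import numpy as np
    from sklearn.svm import LinearSVC

    def train_attribute_classifiers(attr_samples, poofs, attribute_names):
        """attr_samples: (image, landmarks, {attribute_name: 0/1}) triples.
        The POOFs were learned on a different, class-labelled dataset and are
        reused unchanged; only the per-attribute SVMs see attribute labels."""
        attr_samples = list(attr_samples)
        X = np.stack([poof_vector(img, lm, poofs) for img, lm, _ in attr_samples])
        classifiers = {}
        for name in attribute_names:
            y = np.array([labels[name] for _, _, labels in attr_samples])
            classifiers[name] = LinearSVC(C=1.0).fit(X, y)  # one SVM per attribute
        return classifiers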
Results: Attribute classification
- Even with a restricted number of attribute samples, POOF features don't latch on to noise
…