Building local part models for category-level recognition
C. Schmid, INRIA Grenoble
Joint work with G. Dorko, S. Lazebnik, J. Ponce
Introduction
- Invariant local descriptors => robust recognition of specific objects or scenes
- Recognition of textures and object classes => description of intra-class variation, selection of discriminant features, spatial relations
(Figures: texture recognition, car detection)
Overview
1. Affine-invariant texture recognition (CVPR'03)
2. A two-layer architecture for texture segmentation and recognition (ICCV'03)
3. Feature selection for object class recognition (ICCV'03)
4. Building affine-invariant part models for recognition
Affine-invariant texture recognition
- Texture recognition under viewpoint changes and non-rigid transformations
- Use of affine-invariant regions:
  - invariance to viewpoint changes
  - spatial selection => more compact representation, reduced redundancy in the texton dictionary
[A sparse texture representation using affine-invariant regions, S. Lazebnik, C. Schmid and J. Ponce, CVPR 2003]
Spatial selection: clustering each pixel vs. clustering selected pixels
Overview of the approach
Region extraction: Harris detector and Laplace detector
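The Harris response underlying the region detector can be sketched as follows. This is a minimal single-scale sketch in numpy/scipy (the actual detector additionally performs scale selection with the Laplacian and affine adaptation of the regions); the function name and parameter values are illustrative:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def harris_response(img, sigma=1.0, k=0.05):
    """Harris corner response R = det(M) - k * trace(M)^2, where M is the
    second-moment (structure) matrix of image gradients, Gaussian-smoothed
    over a neighbourhood of each pixel."""
    img = img.astype(float)
    Iy, Ix = np.gradient(img)                 # image gradients
    # structure-tensor entries, Gaussian-weighted
    Sxx = gaussian_filter(Ix * Ix, sigma)
    Syy = gaussian_filter(Iy * Iy, sigma)
    Sxy = gaussian_filter(Ix * Iy, sigma)
    det = Sxx * Syy - Sxy ** 2
    trace = Sxx + Syy
    return det - k * trace ** 2
```

Large positive responses occur where both eigenvalues of M are large, i.e. at corner-like structures; edges give negative responses and flat regions give responses near zero.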
Descriptors – Spin images
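The intensity-domain spin image can be sketched as a 2D histogram over (distance from the patch centre, normalised intensity); discarding the angular coordinate makes it rotation-invariant. The sketch below assumes a square grayscale patch; the bin counts and normalisation are illustrative choices, not the paper's exact parameters:

```python
import numpy as np

def spin_image(patch, d_bins=10, i_bins=10):
    """Intensity-domain spin image: a 2D histogram of (distance from the
    patch centre, normalised intensity) over all pixels of the patch."""
    h, w = patch.shape
    yy, xx = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    dist = np.hypot(yy - cy, xx - cx).ravel()
    # normalise intensities to [0, 1] for binning
    inten = patch.astype(float).ravel()
    inten = (inten - inten.min()) / (inten.max() - inten.min() + 1e-12)
    hist, _, _ = np.histogram2d(dist, inten, bins=(d_bins, i_bins),
                                range=[[0, dist.max()], [0, 1]])
    return hist / hist.sum()        # normalise to a distribution
```

Because only the distance from the centre is kept, rotating the patch leaves the descriptor unchanged.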
Signature and EMD
- Hierarchical clustering => signature S = { (m_1, w_1), …, (m_k, w_k) }
- Earth Mover's Distance between signatures S and S′:
  D(S, S′) = [ Σ_{i,j} f_ij d(m_i, m′_j) ] / [ Σ_{i,j} f_ij ]
  - robust distance, optimizes the flow between distributions
  - can match signatures of different sizes
  - not sensitive to the number of clusters
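The EMD between two signatures is the solution of a small transportation problem. A sketch using scipy's LP solver, assuming Euclidean ground distance between cluster centres (function and variable names are illustrative):

```python
import numpy as np
from scipy.optimize import linprog

def emd(means1, w1, means2, w2):
    """Earth Mover's Distance between signatures {(m_i, w_i)} and {(m'_j, w'_j)}.

    Solves: minimise sum_ij f_ij d(m_i, m'_j)  subject to
    row sums <= w_i, column sums <= w'_j, total flow = min(sum w, sum w').
    Returns sum_ij f_ij d_ij / sum_ij f_ij, matching the formula above."""
    m, n = len(w1), len(w2)
    # ground distance between cluster centres
    d = np.linalg.norm(means1[:, None, :] - means2[None, :, :], axis=2).ravel()
    # inequality constraints over the flattened flow f (m*n variables)
    A_ub = np.zeros((m + n, m * n))
    for i in range(m):
        A_ub[i, i * n:(i + 1) * n] = 1     # sum_j f_ij <= w1[i]
    for j in range(n):
        A_ub[m + j, j::n] = 1              # sum_i f_ij <= w2[j]
    b_ub = np.concatenate([w1, w2])
    A_eq = np.ones((1, m * n))             # total flow is fixed
    b_eq = [min(w1.sum(), w2.sum())]
    res = linprog(d, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    flow = res.x
    return float(d @ flow / flow.sum())
```

Because the total-weight constraint uses the smaller of the two masses, signatures of different sizes (and different numbers of clusters) can be compared directly, which is the property exploited above.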
Database with viewpoint changes: 20 samples each of 10 different textures
Results: spin images vs. Gabor-like filters
Overview
1. Affine-invariant texture recognition (CVPR'03)
2. A two-layer architecture for texture segmentation and recognition (ICCV'03)
3. Feature selection for object class recognition (ICCV'03)
4. Building affine-invariant part models for recognition
A two-layer architecture
- Texture recognition + segmentation
- Classification of individual regions + spatial layout
[A generative architecture for semi-supervised texture recognition, S. Lazebnik, C. Schmid, J. Ponce, ICCV 2003]
A two-layer architecture
Modeling:
1. Distribution of the local descriptors (affine invariants): Gaussian mixture model estimated with EM, which allows incorporating unsegmented images
2. Co-occurrence statistics of sub-class labels over affinely adapted neighborhoods
Segmentation + recognition:
1. Generative model for initial class probabilities
2. Co-occurrence statistics + relaxation to improve the labels
Texture dataset – training images: T1 (brick), T2 (carpet), T3 (chair), T4 (floor 1), T5 (floor 2), T6 (marble), T7 (wood)
Effect of relaxation + co-occurrence
Original image. Top: before relaxation (individual regions); bottom: after relaxation (co-occurrence).
Recognition + Segmentation Examples
Animal dataset – training images
- no manual segmentation, weakly supervised
- 10 training images per animal (with background)
- no purely negative images
Recognition + Segmentation Examples
Overview
1. Affine-invariant texture recognition (CVPR'03)
2. A two-layer architecture for texture segmentation and recognition (ICCV'03)
3. Feature selection for object class recognition (ICCV'03)
4. Building affine-invariant part models for recognition
Object class detection/classification
- Description of intra-class variations of object parts
[Selection of scale-invariant regions for object class recognition, G. Dorko and C. Schmid, ICCV'03]
Object class detection/classification
- Description of intra-class variations of object parts
- Selection of discriminant features (weakly supervised)
Training the model
Training phase 1:
- Input: images of the object with background (positive images); no normalization or alignment of the image/object
- Extraction of local descriptors: Harris-Laplace, Kadir-Brady, SIFT
- Clustering: estimation of a Gaussian mixture with EM
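The clustering step can be sketched with a standard EM-fitted Gaussian mixture. The descriptors below are synthetic stand-ins; in practice each row would be a 128-D SIFT vector computed on a Harris-Laplace or Kadir-Brady region (sklearn usage is a choice of this sketch, not the paper's implementation):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic stand-in for real local descriptors: two well-separated
# appearance clusters in an 8-D descriptor space.
rng = np.random.default_rng(0)
descriptors = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(200, 8)),
    rng.normal(loc=2.0, scale=0.3, size=(200, 8)),
])

# EM estimation of a Gaussian mixture over the descriptor space;
# each mixture component plays the role of one appearance cluster.
gmm = GaussianMixture(n_components=2, covariance_type="diag",
                      random_state=0).fit(descriptors)
labels = gmm.predict(descriptors)
```

The fitted components (means, covariances, weights) are the clusters that phase 2 then ranks for discriminative power.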
Training the model
Training phase 2 (selection):
- Input: verification set of positive and negative images
- Rank each cluster by likelihood (or mutual information)
- MAP classifier built from the n top-ranked clusters
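The mutual-information ranking can be sketched as follows: for each cluster k, measure the mutual information between the binary events "descriptor falls in cluster k" and "descriptor comes from a positive image". This is a sketch of the selection criterion with illustrative names, not the paper's exact code:

```python
import numpy as np

def mutual_information(assign, positive):
    """Score each cluster by MI(descriptor in cluster k ; descriptor from
    a positive image), both treated as binary variables.

    assign:   (n,) cluster index of each verification-set descriptor
    positive: (n,) bool, True if the descriptor is from a positive image
    """
    scores = {}
    for k in np.unique(assign):
        in_k = assign == k
        mi = 0.0
        for a in (in_k, ~in_k):
            for b in (positive, ~positive):
                p_ab = np.mean(a & b)
                if p_ab > 0:          # zero-probability cells contribute 0
                    mi += p_ab * np.log(p_ab / (a.mean() * b.mean()))
        scores[k] = mi
    return scores
```

Clusters whose membership is independent of the image label score near zero; clusters that fire almost exclusively on positive images score high and are kept for the MAP classifier.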
Likelihood – mutual information
- likelihood: more discriminant but very specific
- mutual information: discriminant but not too specific
(Figures: clusters selected by likelihood and by mutual information.)
Results for test images
Detection with Harris-Laplace regions:
- Image 1 (354 points): 25 likelihood clusters: 49 correct + 37 incorrect; 10 mutual-information clusters: 31 correct + 20 incorrect
- Image 2 (277 points): 43 correct + 36 incorrect; 26 correct + 20 incorrect
Relaxation – propagation of probabilities
Classification
- Assign each test descriptor to the most probable cluster (MAP)
- Each descriptor assigned to one of the top n clusters counts as positive
- If the number of positive descriptors is above a threshold p, classify the image as positive
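The rule above can be sketched in a few lines. For brevity this sketch assigns each descriptor to its nearest cluster centre, a simplification of the full MAP assignment under the Gaussian mixture; names and parameters are illustrative:

```python
import numpy as np

def classify_image(descriptors, cluster_means, top_clusters, p_threshold):
    """Assign each descriptor to its nearest cluster centre, count the
    descriptors landing in the selected top-n clusters, and declare the
    image positive if the count exceeds the threshold p."""
    dists = np.linalg.norm(
        descriptors[:, None, :] - cluster_means[None, :, :], axis=2)
    assigned = dists.argmin(axis=1)
    n_positive = np.isin(assigned, top_clusters).sum()
    return bool(n_positive > p_threshold)
```

Varying p trades off precision against recall, which is why the results below report equal error rates as a function of p.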
Classification experiments
- Training phase 1: 200 positive images (airplanes, motorbikes), 25 (wild cats)
- Training phase 2 (verification): 200 / 25 positive images, 450 negative images
- Testing: 400 / 50 positive images, 450 negative images
- Airplane and motorbike images: http://www.robots.ox.ac.uk/~vgg/data; wild cat images: Corel Image Library
Results: motorbikes. Equal error rates as a function of p; receiver operating characteristic for p = 6.
Classification results: ROC equal error rates

                        Best        Estimated p    p = 6    Fergus
                        p    %      p    %         %        %
  Airplanes   Harris    8    97.5   5    97        97.25    -
              Kadir     18   97     30   96.5      96       94
  Motorbikes  Harris    9    99     5    98        98.25    -
              Kadir     19   98.75  32   98.25     98       96
  Wild cats   Harris    31   94     34   92        72       -
              Kadir     17   86     45   82        84       90
Overview
1. Affine-invariant texture recognition (CVPR'03)
2. A two-layer architecture for texture segmentation and recognition (ICCV'03)
3. Feature selection for object class recognition (ICCV'03)
4. Building affine-invariant part models for recognition
Affine-invariant part models
- Matching collections of local affine-invariant regions that map with an affine transformation => part
- Matching works for unsegmented images
- Model = a collection of parts
Matching: faces. Note the spurious match.
Matching: 3D objects (with closeup)
Matching: Finding Repeated Patterns
Matching: Finding Symmetry
Modeling for recognition
- Match multiple pairs of training images to produce several candidate parts.
- Use additional validation images to evaluate the repeatability of parts and individual patches.
- Retain a fixed number of parts with the best repeatability score as the class model.
- No background model.
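The selection step above can be sketched as a simple top-k ranking. Here "score" stands for a part's repeatability on the validation set (e.g. the fraction of validation images in which it is re-detected); the function and argument names are illustrative:

```python
def select_parts(parts, validation_scores, n_keep=10):
    """Keep the n parts with the best repeatability score on the
    validation set; the retained parts form the class model."""
    ranked = sorted(zip(parts, validation_scores),
                    key=lambda t: t[1], reverse=True)
    return [part for part, _ in ranked[:n_keep]]
```

Because there is no background model, selection relies entirely on this repeatability criterion rather than on discriminating against negatives.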
The butterfly dataset
- 16 training images (8 pairs) per class
- 10 validation images per class
- 437 test images
- 619 images total
Butterfly Models Top two rows: pairs of images used for modeling. Bottom two rows: closeup views of some of the parts making up the models of the seven butterfly classes.
Recognition
- Top 10 models per class used for recognition
- Multi-class classification results: total model size (smallest/largest)
Classification rate vs. number of parts (plot)
Successful detection examples
- Model part: yellow = detected in the test image, blue = occluded in the test image
- Test image: all ellipses / matched ellipses
- Note: only one of the two training images is shown
Successful Detection Examples (cont.)
Detection of Multiple Instances
Detection Failures
Future work
- Spatial relations: non-rigid models; relations between clusters and affine-invariant parts
- Feature selection: dimensionality reduction
- Shape information: appropriate descriptors
- Rapid search: structuring of the data