Latent SVMs for Human Detection with a Locally Affine Deformation Field Ľubor Ladický1 Phil Torr2 Andrew Zisserman1 1 University of Oxford 2 Oxford Brookes University 1 1 1
Human Detection Find all objects of interest Enclose them tightly in a bounding box 2
Human Detection Find all objects of interest Enclose them tightly in a bounding box 3
HOG Detector Sliding window using learnt HOG template Post-processing using non-maxima suppression Dalal & Triggs CVPR05
HOG Detector Sliding window using learnt HOG template Post-processing using non-maxima suppression Dalal & Triggs CVPR05
HOG Detector Sliding window using learnt HOG template Post-processing using non-maxima suppression Dalal & Triggs CVPR05
HOG Detector Sliding window using learnt HOG template Post-processing using non-maxima suppression Dalal & Triggs CVPR05
HOG Detector Sliding window using learnt HOG template Post-processing using non-maxima suppression Dalal & Triggs CVPR05
HOG Detector Sliding window using learnt HOG template Post-processing using non-maxima suppression Dalal & Triggs CVPR05
HOG Detector Sliding window using learnt HOG template Post-processing using non-maxima suppression Dalal & Triggs CVPR05
HOG Detector Does not fit well ! Sliding window using learnt HOG template Post-processing using non-maxima suppression Dalal & Triggs CVPR05
Deformable Part-based Model Allows parts to move relative to the centre Effectively allows the template to deform Multiple models based on an aspect ratio Felzenszwalb et al. CVPR08
Deformable Part-based Model Allows parts to move relative to the centre Effectively allows the template to deform Multiple models based on an aspect ratio Felzenszwalb et al. CVPR08
Comparison with other approaches HOG template (no deformation) Part-based model (rigid movable parts) Our model (deformation field) 14
HOG Detector Classifier response : Dalal & Triggs CVPR05 weights bias HOG feature over the set of cells c Dalal & Triggs CVPR05
HOG Detector Classifier response : The weights w* and the bias b* learnt using the Linear SVM as: regularization number of training samples hinge loss ground truth labels Dalal & Triggs CVPR05
Detector with Deformation Field Cells c displaced by the deformation field d:
over the deformed template Regularization of the deformation field Detector with Deformation Field Cells c displaced by the deformation field d: Classifier response : HOG feature over the deformed template Regularization of the deformation field
Detector with Deformation Field Cells c displaced by the deformation field d: Classifier response : Regularisation takes the form of a smoothness MRF prior: Pairwise weight Pairwise cost
Detector with Deformation Field Cells c displaced by the deformation field d: Classifier response : The weights w* and the bias b* learnt using the Latent Linear SVM as:
Why hasn’t anyone tried it before? Detector with Deformation Field Why hasn’t anyone tried it before?
Why hasn’t anyone tried it before? Detector with Deformation Field Why hasn’t anyone tried it before? Latent models with many latent variables tend to over-fit Inference not feasible for a sliding window
To resolve these problems we propose: Detector with Deformation Field To resolve these problems we propose: Flexible constraints on the deformation field which avoid over-fitting Feasible inference method under these constraints Clustering of the training data into multiple models
Locally Affine Deformation Field We restrict the deformation field to be locally affine ( ):
Locally Affine Deformation Field We restrict the deformation field to be locally affine ( ):
Locally Affine Deformation Field We restrict the deformation field to be locally affine ( ):
Locally Affine Deformation Field We restrict the deformation field to be locally affine ( ):
Locally Affine Deformation Field We restrict the deformation field to be locally affine ( ):
Locally Affine Deformation Field We restrict the deformation field to be locally affine ( ):
Locally Affine Deformation Field We restrict the deformation field to be locally affine ( ):
Locally Affine Deformation Field We restrict the deformation field to be locally affine ( ):
Locally Affine Deformation Field We restrict the deformation field to be locally affine ( ):
Locally Affine Deformation Field We restrict the deformation field to be locally affine ( ):
Optimisation Weights / bias (w*, b*) and the deformation fields dk estimated iteratively
Optimisation Weights / bias (w*, b*) and the deformation fields dk estimated iteratively Given the deformation fields the problem is a standard linear SVM:
Optimisation Given (w*, b*) the problem is a constrained MRF optimisation:
Optimisation Given (w*, b*) the problem is a constrained MRF optimisation: The last can be decomposed as :
Optimisation Given (w*, b*) the problem is a constrained MRF optimisation: The last can be decomposed as : By defining the optimisation becomes:
Didn’t we make the problem harder ? Optimisation Given (w*, b*) the problem is a constrained MRF optimisation: The last can be decomposed as : By defining the optimisation becomes: Didn’t we make the problem harder ?
Optimisation The location of the cells in the first row and in the first column fully determine the location of each cell
Optimisation The location of the cells in the first row and in the first column fully determine the location of each cell Any locally affine deformation field can be reached by two moves : move all columns where each column i can move by (Δcdix ,Δcdiy) move all rows where each row j can move by (Δrdjx ,Δrdjy)
Optimisation Columns move
Optimisation Rows move
Optimisation The location of the cells in the first row and in the first column fully determine the location of each cell Any locally affine deformation field can be reached by two moves : move all columns where each column i can move by (Δcdix ,Δcdiy) move all rows where each row j can move by (Δrdjx ,Δrdjy) These moves do not alter the local affinity Columns move Rows move
Optimisation The location of the cells in the first row and in the first column fully determine the location of each cell Any locally affine deformation field can be reached by two moves : move all columns where each column i can move by (Δcdix ,Δcdiy) move all rows where each row j can move by (Δrdjx ,Δrdjy) These moves do not alter the local affinity Both moves can be solved quickly using dynamic programming
Optimisation The location of the cells in the first row and in the first column fully determine the location of each cell Any locally affine deformation field can be reached by two moves : move all columns where each column i can move by (Δcdix ,Δcdiy) move all rows where each row j can move by (Δrdjx ,Δrdjy) These moves do not alter the local affinity Both moves can be solved quickly using dynamic programming Works for any form of pairwise potentials
Learning multiple poses / viewpoints We define a similarity measure between two training samples as : where
Learning multiple poses / viewpoints We define a similarity measure between two training samples as : where K-medoid clustering of S matrix clusters the data into multi models
Experiments Buffy dataset (typically used for pose estimation) Contains large variety of poses, viewpoints and aspect ratios Consists of 748 images Episode s5e3 used for training Episode s5e4 used for validation Episodes s5e2, s5e5 and s5e6 used for testing Ferrari et al. CVPR08
Clustering of training samples Each row corresponds to one model (out of 10 models)
Qualitative results
Qualitative results
Qualitative results
Quantitative results
Conclusion and Further Work We propose Novel inference for locally affine deformation field (LADF) Object detector using LADF Clustering using LADF Further work Explore usability for other vision problems (tracking, flow) Explore generalisations of LADF where the same inference method is applicable
Thank you Questions? 56 56