Ľubor Ladický1 Phil Torr2 Andrew Zisserman1

Slides:



Advertisements
Similar presentations
POSE–CUT Simultaneous Segmentation and 3D Pose Estimation of Humans using Dynamic Graph Cuts Mathieu Bray Pushmeet Kohli Philip H.S. Torr Department of.
Advertisements

Latent SVMs for Human Detection with a Locally Affine Deformation Field Ľubor Ladický 1 Phil Torr 2 Andrew Zisserman 1 1 University of Oxford 2 Oxford.
O BJ C UT M. Pawan Kumar Philip Torr Andrew Zisserman UNIVERSITY OF OXFORD.
Presenter: Duan Tran (Part of slides are from Pedro’s)
Learning Shared Body Plans Ian Endres University of Illinois work with Derek Hoiem, Vivek Srikumar and Ming-Wei Chang.
Combining Detectors for Human Hand Detection Antonio Hernández, Petia Radeva and Sergio Escalera Computer Vision Center, Universitat Autònoma de Barcelona,
Foreground Focus: Finding Meaningful Features in Unlabeled Images Yong Jae Lee and Kristen Grauman University of Texas at Austin.
Constrained Approximate Maximum Entropy Learning (CAMEL) Varun Ganapathi, David Vickrey, John Duchi, Daphne Koller Stanford University TexPoint fonts used.
Face Alignment with Part-Based Modeling
Crash Course on Machine Learning Part IV Several slides from Derek Hoiem, and Ben Taskar.
Large Scale Visual Recognition Challenge (ILSVRC) 2013: Detection spotlights.
Lecture 31: Modern object recognition
Many slides based on P. FelzenszwalbP. Felzenszwalb General object detection with deformable part-based models.
Mixture of trees model: Face Detection, Pose Estimation and Landmark Localization Presenter: Zhang Li.
Face Detection, Pose Estimation, and Landmark Localization in the Wild
Articulated Pose Estimation with Flexible Mixtures of Parts
Structural Human Action Recognition from Still Images Moin Nabi Computer Vision Lab. ©IPM - Oct
Intelligent Systems Lab. Recognizing Human actions from Still Images with Latent Poses Authors: Weilong Yang, Yang Wang, and Greg Mori Simon Fraser University,
Intro to DPM By Zhangliliang. Outline Intuition Introduction to DPM Model Inference(matching) Training latent SVM Training Procedure Initialization Post-processing.
Object Category Detection: Sliding Windows Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 04/10/12.
Enhancing Exemplar SVMs using Part Level Transfer Regularization 1.
Groups of Adjacent Contour Segments for Object Detection Vittorio Ferrari Loic Fevrier Frederic Jurie Cordelia Schmid.
More sliding window detection: Discriminative part-based models Many slides based on P. FelzenszwalbP. Felzenszwalb.
Simultaneous Segmentation and 3D Pose Estimation of Humans or Detection + Segmentation = Tracking? Philip H.S. Torr Pawan Kumar, Pushmeet Kohli, Matt Bray.
1. Introduction Humanising GrabCut: Learning to segment humans using the Kinect Varun Gulshan, Victor Lempitksy and Andrew Zisserman Dept. of Engineering.
Robust Higher Order Potentials For Enforcing Label Consistency
Generic Object Detection using Feature Maps Oscar Danielsson Stefan Carlsson
Graph Cut based Inference with Co-occurrence Statistics Ľubor Ladický, Chris Russell, Pushmeet Kohli, Philip Torr.
1 Human Detection under Partial Occlusions using Markov Logic Networks Raghuraman Gopalan and William Schwartz Center for Automation Research University.
On the Object Proposal Presented by Yao Lu
What, Where & How Many? Combining Object Detectors and CRFs
Lecture 29: Recent work in recognition CS4670: Computer Vision Noah Snavely.
Generic object detection with deformable part-based models
Face Alignment Using Cascaded Boosted Regression Active Shape Models
Object Bank Presenter : Liu Changyu Advisor : Prof. Alex Hauptmann Interest : Multimedia Analysis April 4 th, 2013.
“Secret” of Object Detection Zheng Wu (Summer intern in MSRNE) Sep. 3, 2010 Joint work with Ce Liu (MSRNE) William T. Freeman (MIT) Adam Kalai (MSRNE)
Marco Pedersoli, Jordi Gonzàlez, Xu Hu, and Xavier Roca
Learning Collections of Parts for Object Recognition and Transfer Learning University of Illinois at Urbana- Champaign.
Object Detection with Discriminatively Trained Part Based Models
Lecture 31: Modern recognition CS4670 / 5670: Computer Vision Noah Snavely.
Pedestrian Detection and Localization
BING: Binarized Normed Gradients for Objectness Estimation at 300fps
Deformable Part Models (DPM) Felzenswalb, Girshick, McAllester & Ramanan (2010) Slides drawn from a tutorial By R. Girshick AP 12% 27% 36% 45% 49% 2005.
Recognition II Ali Farhadi. We have talked about Nearest Neighbor Naïve Bayes Logistic Regression Boosting.
Associative Hierarchical CRFs for Object Class Image Segmentation Ľubor Ladický 1 1 Oxford Brookes University 2 Microsoft Research Cambridge Based on the.
Efficient Discriminative Learning of Parts-based Models M. Pawan Kumar Andrew Zisserman Philip Torr
Tell Me What You See and I will Show You Where It Is Jia Xu 1 Alexander G. Schwing 2 Raquel Urtasun 2,3 1 University of Wisconsin-Madison 2 University.
CVPR2013 Poster Detecting and Naming Actors in Movies using Generative Appearance Models.
O BJ C UT M. Pawan Kumar Philip Torr Andrew Zisserman UNIVERSITY OF OXFORD.
CS 1699: Intro to Computer Vision Detection II: Deformable Part Models Prof. Adriana Kovashka University of Pittsburgh November 12, 2015.
Object Detection Overview Viola-Jones Dalal-Triggs Deformable models Deep learning.
Discriminative Sub-categorization Minh Hoai Nguyen, Andrew Zisserman University of Oxford 1.
1 Scale and Rotation Invariant Matching Using Linearly Augmented Tree Hao Jiang Boston College Tai-peng Tian, Stan Sclaroff Boston University.
Object Recognition by Integrating Multiple Image Segmentations Caroline Pantofaru, Cordelia Schmid, Martial Hebert ECCV 2008 E.
Locally Linear Support Vector Machines Ľubor Ladický Philip H.S. Torr.
More sliding window detection: Discriminative part-based models
A Discriminatively Trained, Multiscale, Deformable Part Model Yeong-Jun Cho Computer Vision and Pattern Recognition,2008.
MASKS © 2004 Invitation to 3D vision Lecture 3 Image Primitives andCorrespondence.
Week 4: 6/6 – 6/10 Jeffrey Loppert. This week.. Coded a Histogram of Oriented Gradients (HOG) Feature Extractor Extracted features from positive and negative.
1 Bilinear Classifiers for Visual Recognition Computational Vision Lab. University of California Irvine To be presented in NIPS 2009 Hamed Pirsiavash Deva.
Cascade for Fast Detection
Object detection with deformable part-based models
Data Driven Attributes for Action Detection
Performance of Computer Vision
Nonparametric Semantic Segmentation
Object Localization Goal: detect the location of an object within an image Fully supervised: Training data labeled with object category and ground truth.
Object detection as supervised classification
A Tutorial on HOG Human Detection
Bilinear Classifiers for Visual Recognition
Weakly Supervised Action Recognition
Presentation transcript:

Latent SVMs for Human Detection with a Locally Affine Deformation Field Ľubor Ladický1 Phil Torr2 Andrew Zisserman1 1 University of Oxford 2 Oxford Brookes University 1 1 1

Human Detection Find all objects of interest Enclose them tightly in a bounding box 2

Human Detection Find all objects of interest Enclose them tightly in a bounding box 3

HOG Detector Sliding window using learnt HOG template Post-processing using non-maxima suppression Dalal & Triggs CVPR05

HOG Detector Sliding window using learnt HOG template Post-processing using non-maxima suppression Dalal & Triggs CVPR05

HOG Detector Sliding window using learnt HOG template Post-processing using non-maxima suppression Dalal & Triggs CVPR05

HOG Detector Sliding window using learnt HOG template Post-processing using non-maxima suppression Dalal & Triggs CVPR05

HOG Detector Sliding window using learnt HOG template Post-processing using non-maxima suppression Dalal & Triggs CVPR05

HOG Detector Sliding window using learnt HOG template Post-processing using non-maxima suppression Dalal & Triggs CVPR05

HOG Detector Sliding window using learnt HOG template Post-processing using non-maxima suppression Dalal & Triggs CVPR05

HOG Detector Does not fit well ! Sliding window using learnt HOG template Post-processing using non-maxima suppression Dalal & Triggs CVPR05

Deformable Part-based Model Allows parts to move relative to the centre Effectively allows the template to deform Multiple models based on an aspect ratio Felzenszwalb et al. CVPR08

Deformable Part-based Model Allows parts to move relative to the centre Effectively allows the template to deform Multiple models based on an aspect ratio Felzenszwalb et al. CVPR08

Comparison with other approaches HOG template (no deformation) Part-based model (rigid movable parts) Our model (deformation field) 14

HOG Detector Classifier response : Dalal & Triggs CVPR05 weights bias HOG feature over the set of cells c Dalal & Triggs CVPR05

HOG Detector Classifier response : The weights w* and the bias b* learnt using the Linear SVM as: regularization number of training samples hinge loss ground truth labels Dalal & Triggs CVPR05

Detector with Deformation Field Cells c displaced by the deformation field d:

over the deformed template Regularization of the deformation field Detector with Deformation Field Cells c displaced by the deformation field d: Classifier response : HOG feature over the deformed template Regularization of the deformation field

Detector with Deformation Field Cells c displaced by the deformation field d: Classifier response : Regularisation takes the form of a smoothness MRF prior: Pairwise weight Pairwise cost

Detector with Deformation Field Cells c displaced by the deformation field d: Classifier response : The weights w* and the bias b* learnt using the Latent Linear SVM as:

Why hasn’t anyone tried it before? Detector with Deformation Field Why hasn’t anyone tried it before?

Why hasn’t anyone tried it before? Detector with Deformation Field Why hasn’t anyone tried it before? Latent models with many latent variables tend to over-fit Inference not feasible for a sliding window

To resolve these problems we propose: Detector with Deformation Field To resolve these problems we propose: Flexible constraints on the deformation field which avoid over-fitting Feasible inference method under these constraints Clustering of the training data into multiple models

Locally Affine Deformation Field We restrict the deformation field to be locally affine ( ):

Locally Affine Deformation Field We restrict the deformation field to be locally affine ( ):

Locally Affine Deformation Field We restrict the deformation field to be locally affine ( ):

Locally Affine Deformation Field We restrict the deformation field to be locally affine ( ):

Locally Affine Deformation Field We restrict the deformation field to be locally affine ( ):

Locally Affine Deformation Field We restrict the deformation field to be locally affine ( ):

Locally Affine Deformation Field We restrict the deformation field to be locally affine ( ):

Locally Affine Deformation Field We restrict the deformation field to be locally affine ( ):

Locally Affine Deformation Field We restrict the deformation field to be locally affine ( ):

Locally Affine Deformation Field We restrict the deformation field to be locally affine ( ):

Optimisation Weights / bias (w*, b*) and the deformation fields dk estimated iteratively

Optimisation Weights / bias (w*, b*) and the deformation fields dk estimated iteratively Given the deformation fields the problem is a standard linear SVM:

Optimisation Given (w*, b*) the problem is a constrained MRF optimisation:

Optimisation Given (w*, b*) the problem is a constrained MRF optimisation: The last can be decomposed as :

Optimisation Given (w*, b*) the problem is a constrained MRF optimisation: The last can be decomposed as : By defining the optimisation becomes:

Didn’t we make the problem harder ? Optimisation Given (w*, b*) the problem is a constrained MRF optimisation: The last can be decomposed as : By defining the optimisation becomes: Didn’t we make the problem harder ?

Optimisation The location of the cells in the first row and in the first column fully determine the location of each cell

Optimisation The location of the cells in the first row and in the first column fully determine the location of each cell Any locally affine deformation field can be reached by two moves : move all columns where each column i can move by (Δcdix ,Δcdiy) move all rows where each row j can move by (Δrdjx ,Δrdjy)

Optimisation Columns move

Optimisation Rows move

Optimisation The location of the cells in the first row and in the first column fully determine the location of each cell Any locally affine deformation field can be reached by two moves : move all columns where each column i can move by (Δcdix ,Δcdiy) move all rows where each row j can move by (Δrdjx ,Δrdjy) These moves do not alter the local affinity Columns move Rows move

Optimisation The location of the cells in the first row and in the first column fully determine the location of each cell Any locally affine deformation field can be reached by two moves : move all columns where each column i can move by (Δcdix ,Δcdiy) move all rows where each row j can move by (Δrdjx ,Δrdjy) These moves do not alter the local affinity Both moves can be solved quickly using dynamic programming

Optimisation The location of the cells in the first row and in the first column fully determine the location of each cell Any locally affine deformation field can be reached by two moves : move all columns where each column i can move by (Δcdix ,Δcdiy) move all rows where each row j can move by (Δrdjx ,Δrdjy) These moves do not alter the local affinity Both moves can be solved quickly using dynamic programming Works for any form of pairwise potentials

Learning multiple poses / viewpoints We define a similarity measure between two training samples as : where

Learning multiple poses / viewpoints We define a similarity measure between two training samples as : where K-medoid clustering of S matrix clusters the data into multi models

Experiments Buffy dataset (typically used for pose estimation) Contains large variety of poses, viewpoints and aspect ratios Consists of 748 images Episode s5e3 used for training Episode s5e4 used for validation Episodes s5e2, s5e5 and s5e6 used for testing Ferrari et al. CVPR08

Clustering of training samples Each row corresponds to one model (out of 10 models)

Qualitative results

Qualitative results

Qualitative results

Quantitative results

Conclusion and Further Work We propose Novel inference for locally affine deformation field (LADF) Object detector using LADF Clustering using LADF Further work Explore usability for other vision problems (tracking, flow) Explore generalisations of LADF where the same inference method is applicable

Thank you Questions? 56 56