Generic object detection with deformable part-based models

Slides:



Advertisements
Similar presentations
Classification using intersection kernel SVMs is efficient
Advertisements

Latent SVMs for Human Detection with a Locally Affine Deformation Field Ľubor Ladický 1 Phil Torr 2 Andrew Zisserman 1 1 University of Oxford 2 Oxford.
Rich feature Hierarchies for Accurate object detection and semantic segmentation Ross Girshick, Jeff Donahue, Trevor Darrell, Jitandra Malik (UC Berkeley)
Presenter: Duan Tran (Part of slides are from Pedro’s)
Jan-Michael Frahm, Enrique Dunn Spring 2013
Ľubor Ladický1 Phil Torr2 Andrew Zisserman1
Classification using intersection kernel SVMs is efficient Joint work with Subhransu Maji and Alex Berg Jitendra Malik UC Berkeley.
Large Scale Visual Recognition Challenge (ILSVRC) 2013: Detection spotlights.
Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 1 Computer Vision: Histograms of Oriented.
Lecture 31: Modern object recognition
Many slides based on P. FelzenszwalbP. Felzenszwalb General object detection with deformable part-based models.
Steerable Part Models Hamed Pirsiavash and Deva Ramanan
Enhancing Exemplar SVMs using Part Level Transfer Regularization 1.
Viola/Jones: features “Rectangle filters” Differences between sums of pixels in adjacent rectangles { y t (x) = +1 if h t (x) >  t -1 otherwise Unique.
More sliding window detection: Discriminative part-based models Many slides based on P. FelzenszwalbP. Felzenszwalb.
Student: Yao-Sheng Wang Advisor: Prof. Sheng-Jyh Wang ARTICULATED HUMAN DETECTION 1 Department of Electronics Engineering National Chiao Tung University.
DISCRIMINATIVE DECORELATION FOR CLUSTERING AND CLASSIFICATION ECCV 12 Bharath Hariharan, Jitandra Malik, and Deva Ramanan.
Recognition using Regions CVPR Outline Introduction Overview of the Approach Experimental Results Conclusion.
1 Learning to Detect Objects in Images via a Sparse, Part-Based Representation S. Agarwal, A. Awan and D. Roth IEEE Transactions on Pattern Analysis and.
Generic Object Detection using Feature Maps Oscar Danielsson Stefan Carlsson
Object Recognizing We will discuss: Features Classifiers Example ‘winning’ system.
Object Detection using Histograms of Oriented Gradients
Spatial Pyramid Pooling in Deep Convolutional
What, Where & How Many? Combining Object Detectors and CRFs
Lecture 29: Recent work in recognition CS4670: Computer Vision Noah Snavely.
From R-CNN to Fast R-CNN
Object Recognizing. Object Classes Individual Recognition.
Object Recognizing. Recognition -- topics Features Classifiers Example ‘winning’ system.
“Secret” of Object Detection Zheng Wu (Summer intern in MSRNE) Sep. 3, 2010 Joint work with Ce Liu (MSRNE) William T. Freeman (MIT) Adam Kalai (MSRNE)
Detection, Segmentation and Fine-grained Localization
Marco Pedersoli, Jordi Gonzàlez, Xu Hu, and Xavier Roca
Object Detection with Discriminatively Trained Part Based Models
Lecture 31: Modern recognition CS4670 / 5670: Computer Vision Noah Snavely.
Pedestrian Detection and Localization
O BJECT D ETECTION WITH D ISCRIMINATIVELY T RAINED P ART B ASED M ODELS PRESENTED BY Xiaolong Wang.
Deformable Part Model Presenter : Liu Changyu Advisor : Prof. Alex Hauptmann Interest : Multimedia Analysis April 11 st, 2013.
Deformable Part Models (DPM) Felzenswalb, Girshick, McAllester & Ramanan (2010) Slides drawn from a tutorial By R. Girshick AP 12% 27% 36% 45% 49% 2005.
Training and Evaluating of Object Bank Models Presenter : Changyu Liu Advisor : Prof. Alex Interest : Multimedia Analysis May 16 th, 2013.
Object detection, deep learning, and R-CNNs
Methods for classification and image representation
CS 1699: Intro to Computer Vision Detection II: Deformable Part Models Prof. Adriana Kovashka University of Pittsburgh November 12, 2015.
Object Detection Overview Viola-Jones Dalal-Triggs Deformable models Deep learning.
Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation.
Object Recognition by Integrating Multiple Image Segmentations Caroline Pantofaru, Cordelia Schmid, Martial Hebert ECCV 2008 E.
Object Recognizing. Object Classes Individual Recognition.
CS 2750: Machine Learning Support Vector Machines Prof. Adriana Kovashka University of Pittsburgh February 17, 2016.
More sliding window detection: Discriminative part-based models
Object Recognizing. Object Classes Individual Recognition.
A Discriminatively Trained, Multiscale, Deformable Part Model Yeong-Jun Cho Computer Vision and Pattern Recognition,2008.
Fine-grained Fine-grained Recognition( 细粒度分类 ) 沈志强.
Rich feature hierarchies for accurate object detection and semantic segmentation 2014 IEEE Conference on Computer Vision and Pattern Recognition Ross Girshick,
Spatial Localization and Detection
1 Bilinear Classifiers for Visual Recognition Computational Vision Lab. University of California Irvine To be presented in NIPS 2009 Hamed Pirsiavash Deva.
Recent developments in object detection
Cascade for Fast Detection
Object detection with deformable part-based models
Data Driven Attributes for Action Detection
Lit part of blue dress and shadowed part of white dress are the same color
Object detection, deep learning, and R-CNNs
Object Localization Goal: detect the location of an object within an image Fully supervised: Training data labeled with object category and ground truth.
Object detection as supervised classification
R-CNN region By Ilia Iofedov 11/11/2018 BGU, DNN course 2016.
Object detection.
HOGgles Visualizing Object Detection Features
An HOG-LBP Human Detector with Partial Occlusion Handling
Image Classification.
“The Truth About Cats And Dogs”
Object Detection + Deep Learning
Outline Background Motivation Proposed Model Experimental Results
RCNN, Fast-RCNN, Faster-RCNN
Presentation transcript:

Generic object detection with deformable part-based models Many slides based on P. Felzenszwalb

Challenge: Generic object detection

Histograms of oriented gradients (HOG) Partition image into blocks at multiple scales and compute histogram of gradient orientations in each block 10x10 cells 20x20 cells N. Dalal and B. Triggs, Histograms of Oriented Gradients for Human Detection, CVPR 2005 Image credit: N. Snavely

Histograms of oriented gradients (HOG) Partition image into blocks at multiple scales and compute histogram of gradient orientations in each block N. Dalal and B. Triggs, Histograms of Oriented Gradients for Human Detection, CVPR 2005 Image credit: N. Snavely

Pedestrian detection with HOG Train a pedestrian template using a linear support vector machine positive training examples negative training examples N. Dalal and B. Triggs, Histograms of Oriented Gradients for Human Detection, CVPR 2005

Pedestrian detection with HOG Train a pedestrian template using a linear support vector machine At test time, convolve feature map with template Find local maxima of response For multi-scale detection, repeat over multiple levels of a HOG pyramid HOG feature map Template Detector response map N. Dalal and B. Triggs, Histograms of Oriented Gradients for Human Detection, CVPR 2005

Example detections [Dalal and Triggs, CVPR 2005]

Are we done? Single rigid template usually not enough to represent a category Many objects (e.g. humans) are articulated, or have parts that can vary in configuration Many object categories look very different from different viewpoints, or from instance to instance Slide by N. Snavely

Discriminative part-based models Root filter Part filters Deformation weights P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan, Object Detection with Discriminatively Trained Part Based Models, PAMI 32(9), 2010

Discriminative part-based models Multiple components P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan, Object Detection with Discriminatively Trained Part Based Models, PAMI 32(9), 2010

Discriminative part-based models P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan, Object Detection with Discriminatively Trained Part Based Models, PAMI 32(9), 2010

Object hypothesis Multiscale model: the resolution of part filters is twice the resolution of the root

Scoring an object hypothesis The score of a hypothesis is the sum of filter scores minus the sum of deformation costs Subwindow features Displacements Filters Deformation weights

Scoring an object hypothesis The score of a hypothesis is the sum of filter scores minus the sum of deformation costs Subwindow features Displacements Filters Deformation weights Concatenation of filter and deformation weights Concatenation of subwindow features and displacements

Detection Define the score of each root filter location as the score given the best part placements:

Detection Define the score of each root filter location as the score given the best part placements: Efficient computation: generalized distance transforms For each “default” part location, find the score of the “best” displacement Head filter Deformation cost

Detection Define the score of each root filter location as the score given the best part placements: Efficient computation: generalized distance transforms For each “default” part location, find the score of the “best” displacement Distance transform Head filter responses Head filter

Detection

Detection result

Training Training data consists of images with labeled bounding boxes Need to learn the filters and deformation parameters

Training Our classifier has the form w are model parameters, z are latent hypotheses Latent SVM training: Initialize w and iterate: Fix w and find the best z for each training example (detection) Fix z and solve for w (standard SVM training) Issue: too many negative examples Do “data mining” to find “hard” negatives

Car model Component 1 Component 2

Car detections

Person model

Person detections

Cat model

Cat detections

Bottle model

More detections

Quantitative results (PASCAL 2008) 7 systems competed in the 2008 challenge Out of 20 classes, first place in 7 classes and second place in 8 classes Bicycles Person Bird Proposed approach Proposed approach Proposed approach

Detection state of the art Object detection system overview. Our system (1) takes an input image, (2) extracts around 2000 bottom-up region proposals, (3) computes features for each proposal using a large convolutional neural network (CNN), and then (4) classifies each region using class-specific linear SVMs. R-CNN achieves a mean average precision (mAP) of 53.7% on PASCAL VOC 2010. For comparison, Uijlings et al. (2013) report 35.1% mAP using the same region proposals, but with a spatial pyramid and bag-of-visual-words approach. The popular deformable part models perform at 33.4%. R. Girshick, J. Donahue, T. Darrell, and J. Malik, Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, CVPR 2014, to appear.