OBJECT DETECTION WITH DISCRIMINATIVELY TRAINED PART-BASED MODELS Presented by Xiaolong Wang
DETECTION
CHALLENGE: Deformation (part of the slides from Ross Girshick)
CHALLENGE: Viewpoint
CHALLENGE: Variable structure
CHALLENGE (images from Chaitanya Desai)
DEFORMABLE PART MODELS: deformable 2-layer model (Leo Zhu, CVPR 2010)
HOG PYRAMID: root filter and part filters
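In the DPM papers the root filter is scored at a coarse pyramid level and the part filters at a level with twice the spatial resolution. A minimal sketch of that bookkeeping follows. It assumes λ = 10 levels per octave, 8-pixel HOG cells, and a stopping rule of at least 8 cells on the shorter side. These defaults and the names `pyramid_scales` and `part_level` are hypothetical, not taken from the DPM code.

```python
def pyramid_scales(image_size, lam=10, sbin=8, min_cells=8):
    """Scale factors 2**(-i/lam) of a feature pyramid (hypothetical defaults).

    lam is the number of levels per octave; levels stop once the shorter image
    side would contain fewer than min_cells HOG cells of sbin pixels.
    """
    scales = []
    i = 0
    while True:
        s = 2.0 ** (-i / lam)
        if min(image_size) * s / sbin < min_cells:
            break
        scales.append(s)
        i += 1
    return scales


def part_level(root_level, lam=10):
    """Parts live lam levels below the root, i.e. at twice the root's resolution."""
    if root_level < lam:
        raise ValueError("root level too fine: no level with twice its resolution")
    return root_level - lam
```

For a 480 × 640 image this yields 30 levels, and `scales[part_level(r)] / scales[r]` is 2 for any valid root level `r`.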
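FORMULATION: one root filter (i = 0) plus n part filters. A placement $(p_0, \dots, p_n)$ is scored as
$$\mathrm{score}(p_0, \dots, p_n) = \sum_{i=0}^{n} F_i \cdot \phi(H, p_i) - \sum_{i=1}^{n} d_i \cdot \phi_d(dx_i, dy_i) + b,$$
where $F_i$ are the model parameters for HOG (the filter weights), $\phi(H, p_i)$ are the HOG features at placement $p_i$, $d_i$ are the model parameters for deformation, $(dx_i, dy_i)$ is the displacement of part $i$ from its anchor, and $\phi_d(dx, dy) = (dx, dy, dx^2, dy^2)$.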
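The DPM score of one fixed placement is the sum of all filter responses, minus a quadratic deformation cost for each part, plus a bias. A minimal sketch of that computation is shown below. It assumes filters and feature windows are NumPy arrays of matching shape. The function and argument names are hypothetical.

```python
import numpy as np


def deformation_features(dx, dy):
    # phi_d(dx, dy) = (dx, dy, dx^2, dy^2): displacement of a part from its anchor
    return np.array([dx, dy, dx * dx, dy * dy], dtype=float)


def placement_score(filters, feats, defs, displacements, bias):
    """Score of one placement (p_0, ..., p_n).

    filters[i]         : filter weights F_i (i = 0 is the root filter)
    feats[i]           : HOG window phi(H, p_i) under filter i, same shape as filters[i]
    defs[i - 1]        : deformation weights d_i (length 4) of part i
    displacements[i-1] : (dx_i, dy_i) of part i relative to its anchor
    """
    appearance = sum(float(np.sum(F * phi)) for F, phi in zip(filters, feats))
    deformation = sum(float(np.dot(d, deformation_features(dx, dy)))
                      for d, (dx, dy) in zip(defs, displacements))
    return appearance - deformation + bias
```

With all-ones filters, a root window of 0.5s (12 cells × bins), a part window of 2s (3 values), weights d = (0, 0, 1, 1), displacement (1, 2) and bias −1, the score is 12 − 5 − 1 = 6.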
INFERENCE
MULTI-VIEWS
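At inference time each part is placed independently to maximize its filter response minus its deformation cost, for every possible anchor. The DPM papers compute this maximum in linear time with a generalized distance transform. The sketch below swaps in a plain brute-force search over a small radius for clarity. It also assumes, for simplicity, that every part's anchor coincides with the root location on the same grid, whereas real DPM anchors are offset and at twice the resolution. All names are hypothetical.

```python
import numpy as np


def best_part_placement(response, d, radius):
    """For each anchor (y, x): max over |dx|, |dy| <= radius of
    response[y+dy, x+dx] - d . (dx, dy, dx^2, dy^2), plus the argmax (dx, dy).
    Brute force; DPM itself uses a generalized distance transform."""
    H, W = response.shape
    best = np.full((H, W), -np.inf)
    arg = np.zeros((H, W, 2), dtype=int)
    for y in range(H):
        for x in range(W):
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < H and 0 <= xx < W:
                        cost = d[0] * dx + d[1] * dy + d[2] * dx * dx + d[3] * dy * dy
                        val = response[yy, xx] - cost
                        if val > best[y, x]:
                            best[y, x] = val
                            arg[y, x] = (dx, dy)
    return best, arg


def detection_scores(root_response, part_responses, defs, bias, radius=2):
    # Score map over root locations: root response + best placement of each part + bias
    # (assumption: part anchors sit at the root location on the same grid)
    total = root_response.astype(float) + bias
    for resp, d in zip(part_responses, defs):
        best, _ = best_part_placement(resp, d, radius)
        total += best
    return total
```

A peak in a part response pulls neighbouring anchors toward it, discounted by the deformation cost: with d = (0, 0, 1, 1), a response of 10 one cell to the right of an anchor contributes 10 − 1 = 9 at that anchor.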
LATENT ORIENTATION: the PAMI paper (DPM v3) has no orientation; DPM v4 uses latent orientation. Guess what this is? A right-facing horse.
UNSUPERVISED ORIENTATION CLUSTERING
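One simple way to cluster training examples by facing direction without labels is to alternate two steps. First, assign each example the orientation (as-is or mirrored) that better matches a template. Second, rebuild the template from the aligned examples. The sketch below is a hypothetical illustration of that idea, not necessarily the procedure used in DPM v4. It mirrors feature maps by reversing columns only; real HOG features would also need their orientation bins permuted.

```python
import numpy as np


def flip(x):
    # Horizontal mirror of a 2-D feature map (columns reversed).
    # Assumption: HOG orientation-bin permutation is omitted for simplicity.
    return x[:, ::-1]


def orientation_clusters(examples, iters=10):
    """Split examples into two mirror-image clusters, labels 0 and 1."""
    template = examples[0].astype(float)  # seed with the first example's orientation
    labels = None
    for _ in range(iters):
        new = np.array([0 if np.sum(x * template) >= np.sum(flip(x) * template) else 1
                        for x in examples])
        if labels is not None and np.array_equal(new, labels):
            break  # assignments are stable
        labels = new
        # Rebuild the template from examples aligned to the same facing direction
        aligned = [x if lab == 0 else flip(x) for x, lab in zip(examples, labels)]
        template = np.mean(aligned, axis=0)
    return labels
```

Examples whose mass sits on the left get one label and their mirror images get the other, regardless of overall magnitude.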
LATENT ORIENTATION: Inference: choose the best view and the best orientation. Learning: train the parameters for 3 views, then flip the weights horizontally to obtain 3 × 2 views.
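Flipping a HOG filter means mirroring its columns and also remapping its gradient-orientation bins, because a horizontal flip sends an unsigned orientation θ to π − θ. The sketch below assumes equal-width unsigned bins covering [0, π), so bin k maps to bin B − 1 − k (ignoring boundary cases). It also assumes every view's filter has the same shape as the scored window, whereas real DPM components differ in aspect ratio. Inference then takes the maximum over 3 views × 2 orientations. Names are hypothetical.

```python
import numpy as np


def mirror_filter(w):
    """Mirror a (rows, cols, bins) filter left-right.

    Assumes B equal-width unsigned-orientation bins over [0, pi): a horizontal
    flip maps bin k to bin B - 1 - k, so both the column and bin axes reverse.
    """
    return w[:, ::-1, ::-1]


def best_view(window, view_filters):
    """Return (score, view index, flipped?) maximizing the filter response."""
    best = None
    for v, w in enumerate(view_filters):
        for flipped, f in ((False, w), (True, mirror_filter(w))):
            s = float(np.sum(f * window))
            if best is None or s > best[0]:
                best = (s, v, flipped)
    return best
```

Only the 3 unflipped filters are learned; the other 3 orientations come for free from `mirror_filter`, which is its own inverse.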
HOW IMPORTANT IS IT? For horse: 1 view: 42.1%; 3 views: 47.3%; 3 × 2 views: 56.8%
HOW IMPORTANT IS IT? For all classes (DPM v4):
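LEARNING: Linear formulation. Put all features into one vector $\Phi(x, z)$ so that the score is linear in the model parameters $\beta$: $f_\beta(x) = \max_z \beta \cdot \Phi(x, z)$. The latent variable $z$ represents the part locations (and the component index for multiple views).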
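Stacking everything into one vector makes the DPM score a single dot product. Take β = (F_0, ..., F_n, d_1, ..., d_n, b) and Φ(x, z) = (φ(H, p_0), ..., φ(H, p_n), −φ_d(dx_1, dy_1), ..., −φ_d(dx_n, dy_n), 1). The minus sign on the deformation features makes β · Φ equal the filter responses minus the deformation costs plus the bias. A minimal sketch follows. The helper names are hypothetical, and `latent_score` simply maximizes over an explicit list of candidate placements.

```python
import numpy as np


def phi_d(dx, dy):
    return np.array([dx, dy, dx * dx, dy * dy], dtype=float)


def stack_model(filters, defs, bias):
    # beta = (F_0, ..., F_n, d_1, ..., d_n, b), flattened into one vector
    parts = [F.ravel() for F in filters] + [np.asarray(d, dtype=float) for d in defs]
    return np.concatenate(parts + [np.array([bias], dtype=float)])


def stack_features(feats, displacements):
    # Phi(x, z) = (phi(H, p_0), ..., phi(H, p_n), -phi_d(dx_1, dy_1), ..., 1)
    # where z fixes the placements; the minus sign turns d . phi_d into a cost
    parts = [f.ravel() for f in feats] + [-phi_d(dx, dy) for dx, dy in displacements]
    return np.concatenate(parts + [np.array([1.0])])


def latent_score(beta, candidates):
    # f_beta(x) = max over latent z of beta . Phi(x, z)
    return max(float(beta @ phi) for phi in candidates)
```

Because the model is linear in β for any fixed z, standard SVM machinery applies once z is chosen, which is what makes the latent SVM below tractable.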
LATENT SVM
Detection on positive samples: run the sliding-window detector and keep only placements whose root-node window overlaps the ground-truth box by more than 0.7.
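Choosing the latent placement for a positive example can be sketched as follows. Run the detector, discard detections whose root window overlaps the ground-truth box by 0.7 or less, and keep the highest-scoring survivor. The sketch below uses intersection-over-union on continuous (x1, y1, x2, y2) coordinates. The PASCAL VOC convention adds one pixel to widths and heights, which is omitted here as a simplifying assumption. Names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of boxes (x1, y1, x2, y2), continuous coordinates."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def select_positive(detections, gt_box, min_overlap=0.7):
    """detections: list of (score, root_box, latent_info).

    Return the highest-scoring detection whose root window overlaps gt_box
    by more than min_overlap, or None if no detection qualifies.
    """
    ok = [d for d in detections if iou(d[1], gt_box) > min_overlap]
    return max(ok, key=lambda d: d[0]) if ok else None
```

A high-scoring detection elsewhere in the image is ignored; among well-overlapping detections, the score decides.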
LATENT SVM: Hard negative mining (Carl Vondrick, HOGgles, ICCV 2013)
LATENT SVM: Hard negative mining. Hard negatives are windows with small or no overlap with the ground-truth boxes but a high detection score. Maintaining the sample cache: select no more than 500 negative samples per image; cache size = 20,000.
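A sketch of the cache bookkeeping follows, using the limits from the slide (500 negatives per image, cache size 20,000). It treats a negative as "hard" when its score exceeds −1, meaning it violates the SVM margin. After retraining, negatives scoring below −1 are dropped to make room. The function names, the (score, feature) representation, and the exact shrinking rule are assumptions for illustration.

```python
def mine_hard_negatives(cache, image_detections, max_per_image=500,
                        cache_size=20000, margin=-1.0):
    """Append hard negatives to cache, a list of (score, feature) pairs.

    image_detections: for each negative image, a list of (score, feature) for
    windows with small or no overlap with any ground-truth box. A window is
    hard if its score exceeds margin; at most max_per_image of the hardest
    windows are kept per image. Returns True if the cache filled up.
    """
    for dets in image_detections:
        hard = sorted((d for d in dets if d[0] > margin), key=lambda d: d[0], reverse=True)
        for d in hard[:max_per_image]:
            if len(cache) >= cache_size:
                return True
            cache.append(d)
    return False


def shrink_cache(cache, score_fn, margin=-1.0):
    # After retraining, drop easy negatives (new score below margin) in place
    cache[:] = [d for d in cache if score_fn(d[1]) >= margin]
```

Alternating `mine_hard_negatives`, retraining, and `shrink_cache` keeps memory bounded while the training set concentrates on the negatives the current model gets wrong.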
LATENT SVM: The dual method is not scalable. Stochastic gradient descent (DPM v4): important, reshuffle the data every time! L-BFGS (DPM v5): a quasi-Newton method that approximates second-order (Newton) information; faster, with better performance.
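The latent SVM objective is ½‖β‖² + C Σ_i max(0, 1 − y_i f_β(x_i)), with f_β(x) = max_z β · Φ(x, z). A minimal stochastic subgradient descent sketch is shown below, reshuffling the examples every epoch as the slide stresses. It is a simplification: it recomputes the best latent z for every example on the fly. DPM's training instead fixes the latent values of positives between rounds. Learning rate, epoch count, and all names are hypothetical.

```python
import random

import numpy as np


def sgd_latent_svm(examples, dim, C=1.0, lr=0.01, epochs=20, seed=0):
    """Stochastic subgradient descent on
        0.5 * ||beta||^2 + C * sum_i max(0, 1 - y_i * max_z beta . Phi(x_i, z)).

    examples: list of (y, candidates) with y in {+1, -1} and candidates an
    array of shape (k, dim) holding Phi(x, z) for each latent choice z.
    """
    rng = random.Random(seed)
    beta = np.zeros(dim)
    n = len(examples)
    order = list(range(n))
    for _ in range(epochs):
        rng.shuffle(order)  # reshuffle every epoch
        for i in order:
            y, cands = examples[i]
            scores = cands @ beta
            z = int(np.argmax(scores))  # best latent choice under the current model
            grad = beta / n  # regularizer spread evenly over the n examples
            if y * scores[z] < 1:  # margin violated: hinge term is active
                grad = grad - C * y * cands[z]
            beta -= lr * grad
    return beta
```

For a negative example the max over z makes the hinge term convex, so its subgradient is exact; for positives it is only a local subgradient, which is one reason DPM fixes their latent values instead.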
3-STEP INITIALIZATION: Step 1: train only the root filters, using positive data (the highest-overlap placement) and no hard negative mining (example: car).
3-STEP INITIALIZATION: Step 2: merge components, treating the choice of root (component) as a latent variable.
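3-STEP INITIALIZATION: Step 3: initialize the part filters. The number of parts is fixed at 8 (DPM v4/5). Slide a window over the root filter and compute the L1/L2 norm of the positive weights in each window; parts are placed where this norm is high.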
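Part placement can be sketched as a greedy search. Slide a part-sized window over the root filter, score each window by the norm of the positive weights inside it, take the best window, zero it out so the next part covers a new region, and repeat. The sketch below follows that idea. It skips the step in the DPM papers that first interpolates the root filter to twice its resolution, and it uses the squared L2 norm. The part size and names are hypothetical.

```python
import numpy as np


def init_parts(root, num_parts=8, part_shape=(3, 3), norm="l2"):
    """Greedily choose part windows on a (rows, cols, bins) root filter.

    Window energy = L1 norm or squared L2 norm of the positive weights inside it.
    Each chosen window is zeroed so later parts cover new regions.
    Returns a list of (row, col) top-left corners.
    """
    pos = np.maximum(root, 0.0)  # only positive weights count
    cell = pos.sum(axis=2) if norm == "l1" else (pos ** 2).sum(axis=2)
    ph, pw = part_shape
    H, W = cell.shape
    anchors = []
    for _ in range(num_parts):
        best, best_rc = -1.0, None
        for r in range(H - ph + 1):
            for c in range(W - pw + 1):
                e = cell[r:r + ph, c:c + pw].sum()
                if e > best:
                    best, best_rc = e, (r, c)
        anchors.append(best_rc)
        r, c = best_rc
        cell[r:r + ph, c:c + pw] = 0.0  # suppress the covered region
    return anchors
```

Negative root weights never attract a part, since they are clipped to zero before the energy is computed.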
POST-PROCESSING: Bounding-box regression: linear regression for (x1, y1, x2, y2). Non-maximum suppression: keep the high-scoring boxes and suppress lower-scoring boxes that overlap them. Context.
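Both post-processing steps are short to sketch. Bounding-box regression is ordinary least squares from per-detection features to the four box coordinates. The DPM papers regress from the root and part locations, but here the features are simply whatever array is passed in. NMS is shown as a generic greedy IoU variant: keep the best box, drop boxes overlapping it by more than a threshold, repeat. The overlap definition and threshold in the DPM code may differ. Names and the 0.5 default are assumptions.

```python
import numpy as np


def nms(boxes, scores, overlap=0.5):
    """Greedy non-maximum suppression on (x1, y1, x2, y2) boxes.
    Returns kept indices in descending score order."""
    boxes = np.asarray(boxes, dtype=float)
    order = np.argsort(-np.asarray(scores, dtype=float))
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        ix = np.maximum(0.0, np.minimum(boxes[i, 2], boxes[rest, 2]) - np.maximum(boxes[i, 0], boxes[rest, 0]))
        iy = np.maximum(0.0, np.minimum(boxes[i, 3], boxes[rest, 3]) - np.maximum(boxes[i, 1], boxes[rest, 1]))
        inter = ix * iy
        iou = inter / (areas[i] + areas[rest] - inter)
        order = rest[iou <= overlap]  # drop boxes that overlap the kept one too much
    return keep


def fit_bbox_regressor(features, targets):
    """Least-squares linear map (with bias) from features to (x1, y1, x2, y2)."""
    X = np.hstack([features, np.ones((len(features), 1))])
    W, *_ = np.linalg.lstsq(X, targets, rcond=None)
    return W


def apply_bbox_regressor(W, features):
    X = np.hstack([features, np.ones((len(features), 1))])
    return X @ W
```

Regression would normally run before NMS so that suppression compares the refined boxes.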
CONTEXT: Marr Prize 2009; Context SVM, CVPR 2010; segDPM, CVPR 2013
NUMBERS: VOC 2010: 29.6 and 32.2; VOC 2007: 33.7 and 35.4; VOC 2010: segDPM (with many additional cues) 40.4
LARGE-SCALE DATASET: ImageNet 2013, DPM v4 in C++
SUMMARY: Although DPMs are losing to CNNs, the techniques and small tricks learned from DPMs help solve many other vision problems.
QUESTIONS