1 TTIC_ECP: Deep Epitomic CNNs and Explicit Scale/Position Search
Deep Epitomic Nets and Scale/Position Search for Image Classification
TTIC_ECP team
George Papandreou, Toyota Technological Institute at Chicago
Iasonas Kokkinos, Ecole Centrale Paris/INRIA
2 TTIC_ECP entry in a nutshell
Goal: Invariance in Deep CNNs
Part 1: Deep epitomic nets: local translation (deformation)
Part 2: Global scaling and translation
Top-5 error. All DCNNs have 6 convolutional and 2 fully-connected layers.
(0) Baseline: max-pooled net 13.0%
(1) Epitomic DCNN 11.9% (~1% gain)
(2) Epitomic DCNN + search 10.56% (~1.5% gain)
Fusion (1)+(2) 10.22%
3 Deep Convolutional Neural Networks (DCNNs)
LeCun et al.: Gradient-Based Learning Applied to Document Recognition, Proc. IEEE 1998
Krizhevsky et al.: ImageNet Classification with Deep CNNs, NIPS 2012
Cascade of convolution + max-pooling blocks (deformation-invariant template matching)
Our work: different blocks (P1) & different architecture (P2)
Figure labels: convolutional, fully connected
4 Part 1: Deep epitomic nets
5 Epitomes: translation-invariant patch models
Patch templates
Jojic, Frey, Kannan: Epitomic analysis of appearance and shape, ICCV 2003
EM-based training
Benoit, Mairal, Bach, Ponce: Sparse image representation with epitomes, CVPR 2011
Grosse, Raina, Kwong, Ng: Shift-invariant sparse coding, UAI 2007
Epitomes: a lot more for just a bit more
Separate modeling: more data & less power per parameter
6 Mini-epitomes for image classification
Papandreou, Chen, Yuille: Modeling Image Patches with a Dictionary of Mini-Epitomes, CVPR 2014
Figure labels: Dictionary of mini-epitomes, Dictionary of patches (K-means)
Gains in (flat) BoW classification
7 From flat to deep: Epitomic convolution
Max-pooling: max over image positions
Epitomic convolution: max over epitome positions
G. Papandreou: Deep Epitomic Convolutional Neural Networks, arXiv, June 2014.
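The contrast between the two kinds of max on this slide can be sketched in NumPy. This is a minimal single-channel illustration, not the authors' implementation: the function names, the non-overlapping pooling grid, and the `stride` parameter are all assumptions made for the example.

```python
import numpy as np

def conv_max_pool(image, filt, pool):
    """Filter response at every image position, then max over
    non-overlapping pool x pool blocks of image positions."""
    k = filt.shape[0]
    h = image.shape[0] - k + 1
    w = image.shape[1] - k + 1
    resp = np.array([[np.sum(image[i:i + k, j:j + k] * filt)
                      for j in range(w)] for i in range(h)])
    hp, wp = h // pool, w // pool
    resp = resp[:hp * pool, :wp * pool]  # trim to a multiple of pool
    return resp.reshape(hp, pool, wp, pool).max(axis=(1, 3))

def epitomic_conv(image, epitome, k, stride):
    """For each image patch on a strided grid, max over the responses
    of every k x k sub-filter cut out of the (larger) epitome."""
    shifts = epitome.shape[0] - k + 1  # epitome positions per axis
    h = (image.shape[0] - k) // stride + 1
    w = (image.shape[1] - k) // stride + 1
    out = np.full((h, w), -np.inf)
    for a in range(shifts):
        for b in range(shifts):
            sub = epitome[a:a + k, b:b + k]  # one shifted sub-filter
            for i in range(h):
                for j in range(w):
                    y, x = i * stride, j * stride
                    r = np.sum(image[y:y + k, x:x + k] * sub)
                    out[i, j] = max(out[i, j], r)
    return out
```

In this sketch all sub-filters of one epitome share parameters, so the max over epitome positions always dominates the response of any single sub-filter.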
8 Deep Epitomic Convolutional Nets
Convolution + max-pooling
Epitomic convolution
Supervised dictionary learning by back-propagation
G. Papandreou: Deep Epitomic Convolutional Neural Networks, arXiv, June 2014.
9 Deep Epitomic Convolutional Nets
Parameter sharing: faster and more reliable model learning
(0) Baseline: max-pooled net 13.0%
(1) Epitomic DCNN 11.9% (~1% gain)
Consistent improvements
10 Part 2: Global scaling and translation
11 Scale Invariance challenge
Scale-dependent (area)
Category-dependent (ear detector)
Dogs
12 Scale Invariance challenge
Scale-dependent
Category-dependent (ear detector)
Dogs, Skyscrapers
13 Scale Invariance challenge
Scale-dependent
Category-dependent (ear detector)
Dogs, Skyscrapers
Training set
14 Scale Invariance challenge
Scale-dependent
Category-dependent (ear detector)
Dogs, Skyscrapers
Rule: Large skyscrapers have ears, large dogs don’t
15 Scale Invariant classification
Scale-dependent
Category-dependent
A. Howard. Some improvements on deep convolutional neural network based image classification
T. Dietterich et al. Solving the multiple-instance problem with axis-parallel rectangles. Artificial Intelligence
K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition
This work: MIL: End-to-end training!
Figure labels: ‘bag’ of features, feature
16 Step 1: Efficient multi-scale convolutional features
Pipeline: I(x,y) -> pyramid -> I(x,y,s) -> stitch -> Patchwork(x,y) -> GPU -> C(x,y) -> unstitch -> C(x,y,s)
C(x,y,s): multi-scale convolutional features
220x220x3 -> 5x5x512
Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R., LeCun, Y.: Overfeat. ICLR 2014
Dubout, C., Fleuret, F.: Exact acceleration of linear object detectors. ECCV 2012
Iandola, F., Moskewicz, M., Karayev, S., Girshick, R., Darrell, T., Keutzer, K.: Densenet. arXiv
17 Step 2: From fully connected to fully convolutional
Pipeline: I(x,y) -> pyramid -> I(x,y,s) -> stitch -> Patchwork(x,y) -> GPU (convolutional) -> F(x,y)
220x220x3 -> 1x1x4096
Figure labels: convolutional, fully connected
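The step from fully connected to fully convolutional can be illustrated by reshaping a fully connected weight matrix into a k x k convolution kernel and sliding it over a larger feature map. The sketch below is an illustration under assumptions, not the team's code: `fc_as_conv` is a hypothetical name, and it assumes the fully connected layer was defined on a k x k x C input flattened in row-major (y, x, channel) order.

```python
import numpy as np

def fc_as_conv(features, fc_weights, k):
    """Apply a fully connected layer (defined on k x k x C inputs)
    as a k x k convolution over an H x W x C feature map."""
    n_out = fc_weights.shape[0]
    c = features.shape[2]
    # Each row of the FC matrix becomes one k x k x C kernel.
    kernel = fc_weights.reshape(n_out, k, k, c)
    h = features.shape[0] - k + 1
    w = features.shape[1] - k + 1
    out = np.empty((h, w, n_out))
    for i in range(h):
        for j in range(w):
            window = features[i:i + k, j:j + k, :]
            # Contract over (y, x, channel): one FC evaluation per window.
            out[i, j] = np.tensordot(kernel, window,
                                     axes=([1, 2, 3], [0, 1, 2]))
    return out
```

On an input of exactly k x k x C the result is a single 1 x 1 x n_out vector identical to the fully connected output; on a larger map (such as the mosaic) the same weights yield one output vector per window position.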
18 Step 3: Global max-pooling
Pipeline: I(x,y) -> pyramid -> I(x,y,s) -> stitch -> Patchwork(x,y) -> GPU
Learned class-specific bias
For free: argmax yields 48% localization error
(0) Baseline: max-pooled net 13.0%
(1) Epitomic DCNN 11.9% (~1% gain)
(2) Epitomic DCNN + search 10.56% (~1.5% gain)
Fusion (1)+(2) 10.22%
Consistent, explicit position and scale search during training and testing
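Global max-pooling over position and scale, with the argmax kept as a localization estimate, can be sketched as follows. This is an assumption-laden illustration: the function name is hypothetical, the per-scale score maps are assumed to be already unstitched into separate arrays, and adding the class-specific bias after the max is a guess at how the bias enters, since the slide does not spell it out.

```python
import numpy as np

def global_max_pool(score_maps, class_bias):
    """score_maps: list of (H_s, W_s, n_classes) arrays, one per scale.
    Returns per-class scores and the (scale, y, x) argmax per class."""
    n_classes = class_bias.shape[0]
    best = np.full(n_classes, -np.inf)
    where = np.zeros((n_classes, 3), dtype=int)
    for s, fmap in enumerate(score_maps):
        flat = fmap.reshape(-1, n_classes)
        idx = flat.argmax(axis=0)                  # best position per class
        val = flat[idx, np.arange(n_classes)]
        y, x = np.unravel_index(idx, fmap.shape[:2])
        better = val > best                        # classes improved at this scale
        best[better] = val[better]
        cand = np.stack([np.full(n_classes, s), y, x], axis=1)
        where[better] = cand[better]
    # Assumed form of the bias: added once per class after pooling.
    return best + class_bias, where
```

The `where` array is the by-product the slide calls "for free": the winning scale and position for each class.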
19 Deep Epitomic Nets and Scale/Position Search for Image Classification
Goal: Invariance in Deep CNNs ?
DCNN: 6 Convolutional + 2 Fully Connected layers
(0) Baseline: max-pooled net 13.0%
(1) Epitomic DCNN 11.9% (~1% gain)
(2) Search 10.56% (~1.5% gain)
Fusion (1)+(2) 10.22%
The Deeper the Better: stay tuned!
20 Epitomic implementation details
Architecture of our deep epitomic net (11.94%)
Training took 3 weeks on a single Titan (60 epochs)
Standard choices for learning rate, momentum, etc.
21 Pyramidal search implementation details
Image is warped to a square image. Position in mosaic is fixed
Scales: 400, 300, 220, 160, 120, 90 pixels
Mosaic: 720 pixels
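One way the six square scales could be placed at fixed positions in a 720-pixel mosaic is a simple row-by-row ("shelf") layout. The slide only states that positions are fixed, so the layout rule, the nearest-neighbour warp, and all function names below are assumptions for illustration, not the layout actually used.

```python
import numpy as np

SCALES = (400, 300, 220, 160, 120, 90)  # from the slide
MOSAIC = 720                            # from the slide

def shelf_layout(sizes, mosaic):
    """Place squares left to right in rows; return (top, left) offsets."""
    offsets = []
    x = y = shelf_h = 0
    for s in sizes:
        if x + s > mosaic:               # row is full: start a new one
            x, y, shelf_h = 0, y + shelf_h, 0
        if y + s > mosaic:
            raise ValueError("squares do not fit in the mosaic")
        offsets.append((y, x))
        x += s
        shelf_h = max(shelf_h, s)
    return offsets

def resize_nearest(img, size):
    """Warp an image to size x size by nearest-neighbour sampling."""
    rows = np.arange(size) * img.shape[0] // size
    cols = np.arange(size) * img.shape[1] // size
    return img[rows][:, cols]

def stitch(img, sizes=SCALES, mosaic=MOSAIC):
    """Paste every scale of img into one mosaic at fixed offsets."""
    canvas = np.zeros((mosaic, mosaic) + img.shape[2:], dtype=img.dtype)
    offsets = shelf_layout(sizes, mosaic)
    for s, (top, left) in zip(sizes, offsets):
        canvas[top:top + s, left:left + s] = resize_nearest(img, s)
    return canvas, offsets
```

With this rule the 400 and 300 pixel copies share the first row and the four smaller copies share the second, so the whole pyramid occupies 620 of the 720 available rows. Because the offsets do not depend on the image, the same offsets can be reused to cut the per-scale feature maps back out after the network has run.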