Presentation is loading. Please wait.

Presentation is loading. Please wait.

Spatial Pyramid Pooling in Deep Convolutional

Similar presentations


Presentation on theme: "Spatial Pyramid Pooling in Deep Convolutional"— Presentation transcript:

1 Spatial Pyramid Pooling in Deep Convolutional
CS688/WST665 Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition Presenter ByungIn Yoo 1

2 Contents Introduction Motivation Previous work Main Idea Details
Experiments Conclusion

3 Introduction Web-scale image retrieval Why is this challenging?
Classify images or videos Detect and localize object Estimate semantic and geometrical attributes Why is this challenging? View point Illumination Occlusion Scale Deformation Clutter background

4 Convolutional Neural Network (CNN)
Motivation The current CNN require a fixed input image size (e.g., 224 x 224 ) Recognition accuracy is degraded! Content loss Crop Distortion 224x224 Convolutional Neural Network (CNN) Warp

5 Convolutional Neural Network (CNN)
Motivation Spatial Pyramid Pooling The current CNN require a fixed input image size (e.g., 224 x 224 ) Recognition accuracy is degraded! Content loss Crop Distortion 224x224 Convolutional Neural Network (CNN) Warp

6 Previous work (1/2) Spatial Pyramid Matching
- very successful in traditional computer vision Grauman et al, The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features, ICCV 2005. Lazebnik et al, Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, CVPR 2006.

7 Previous work (2/2) Zeiler-Fergus Architecture (2013, 1st)
Google LeNet (2014, 1st) 8 Layers Still low accuracy! & Fixed Image Size Convolution Pooling Softmax Other 22 Layers Too complex model! & Fixed Image Size M.D. Zeiler et al, “Visualizing and understanding convolutional neural networks”, aXiv: , 2013. Christian Szegedy et a, “Going Deeper with Convolutions”, arXiv: , 2014.

8 Main Idea (1/2) Add Spatial Pyramid Pooling layer! Previous Nets
SPP Net

9 Main Idea (2/2) Generate fixed length representation regardless of image size/scale. Simple (still 8 layers) and Powerful Model! Variable input size/scale Multi-size training, Multi-scale testing, Full image view Multi-level pooling Robust to deformation Operated on feature map Pooling in regions

10 Details – Convolutional Layers and Feature Maps
Inherently, the convolutional layers can accept arbitrary size image. Feature map involve not only the strength of the responses, but also their spatial positions.

11 Details – The Spatial Pyramid Pooling Layer
SPP-net is a new layer with Spatial Pyramid Pooling 256 x ( 4x4 + 2x2 + 1) = 5376 Dimension vector Conv1 Conv2 Conv3 Conv4 Conv5 SPP FC6 FC7 SoftMax 256 filters

12 Details – Training with the Spatial Pyramid Pooling
Single-size training Simply modify the configuration file of CNN frameworks Conv1 Conv2 Conv3 Conv4 Conv5 SPP FC6 FC7 SoftMax Feature map: 13x13

13 Details – Training with the Spatial Pyramid Pooling
Multiple-size training Multiple networks sharing all weights Each network for a single size. (e.g. 224x224, 180x180) Improve scale-invariance resize

14 Details – Fast CNN-based Object Detection
The features can be computed from entire image only once. Similar accuracy, much faster (24x~64x) than R-CNN 2000 Convolutions! 1 Convolution!

15 Experiments (1/4) ILSVRC image classification task
1000 object classes (1,431,167 images)

16 Experiments (2/4) ILSVRC image classification task (rank #3)
SPP improves all CNN architectures Top-5 test accuracy Top-5 val. accuracy

17 Experiments (3/4) ILSVRC image detection task
Fully annotated 200 object classes across 121,931 images Allows evaluation of generic object detection in cluttered scenes at scale Detected Region Ground-truth :True :False

18 Experiments (4/4) ILSVRC image detection task (rank #2)
More practical than R-CNN

19 Conclusion SPP is flexible solution for handling different scales, sizes, and aspect ration. Spatial Pyramid Pooling improves accuracy. Multi-size training improves accuracy. Full-image representation improves accuracy. Classification: SPP improves all CNNs in the literature. Detection: Practical, fast and accurate than R-CNN.


Download ppt "Spatial Pyramid Pooling in Deep Convolutional"

Similar presentations


Ads by Google