Image Segmentation and Generation: Upsampling through Transposed Convolution and Max Unpooling
Part II: Applications
CS6890 Deep Learning
Weizhen Cai
04-10-2018
Outline
Two applications of transposed convolution and max unpooling in neural networks:
- Visualization of feature maps
- Image segmentation
  - FCN
  - SegNet
Conclusion
Note: transposed convolution and max unpooling are usually applied within a neural network.
Applications: Visualization of the feature maps
- Maps activations at high layers back to the input pixel space.
- Shows the patterns that caused a given activation in the feature maps.
Transposed convolution
- Originally proposed as a method for unsupervised learning ([Zeiler11]).
- When used for visualization: no learning, no inference.
- Same operations as a CNN, but in reverse:
  - Unpool the feature maps.
  - Convolve the unpooled maps.
[Zeiler11] M. Zeiler, G. Taylor, and R. Fergus. Adaptive Deconvolutional Networks for Mid and High Level Feature Learning. ICCV 2011.
Applications: Visualization of the feature maps
- Feature activations are visualized on the ImageNet validation set after training is complete.
- Convnet: max pooling. Deconvnet: max unpooling.
- Transposed convolution uses the same filter as the convnet, but in reversed mode: the transposed filter (flipped horizontally and vertically).
Figure from: http://www.mdpi.com/1099-4300/19/6/242/htm
Applications: Visualization of the feature maps
Deconvnet model
- Input: feature maps coming from the convnet, one at a time.
- Max unpooling.
- Rectification: non-linearity (e.g. ReLU).
- Filtering: uses the transposed filters of the convnet (flipped horizontally and vertically).
- Output images: visualized filters/kernels (raw weights).
Left figure modified from: http://www.aisociety.kr/prml/PRMLSS_2015_deconvolution.pdf
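The filtering step above can be sketched in a few lines of plain Python. This is a minimal single-channel illustration, not the deconvnet implementation from the cited work: it assumes the convnet layer is a "valid" cross-correlation with stride 1, and all function names are hypothetical. It shows that scattering each activation through the kernel (the transpose of the forward operation) gives the same result as zero-padding the activations and filtering with the kernel flipped horizontally and vertically.

```python
def correlate_valid(x, k):
    """Forward convnet filtering: 'valid' cross-correlation of x with kernel k."""
    kh, kw = len(k), len(k[0])
    return [[sum(x[i + a][j + b] * k[a][b] for a in range(kh) for b in range(kw))
             for j in range(len(x[0]) - kw + 1)]
            for i in range(len(x) - kh + 1)]

def transposed_conv(y, k):
    """Transpose of correlate_valid: each y[i][j] scatters a scaled copy of k."""
    kh, kw = len(k), len(k[0])
    out = [[0] * (len(y[0]) + kw - 1) for _ in range(len(y) + kh - 1)]
    for i, row in enumerate(y):
        for j, v in enumerate(row):
            for a in range(kh):
                for b in range(kw):
                    out[i + a][j + b] += v * k[a][b]
    return out

def flip(k):
    """Flip a kernel vertically and horizontally (rotate by 180 degrees)."""
    return [row[::-1] for row in k[::-1]]

def zero_pad(y, ph, pw):
    """Surround y with ph rows and pw columns of zeros on every side."""
    width = len(y[0]) + 2 * pw
    middle = [[0] * pw + list(row) + [0] * pw for row in y]
    return [[0] * width for _ in range(ph)] + middle + [[0] * width for _ in range(ph)]

def filter_with_flipped_kernel(y, k):
    """Same result as transposed_conv, written as ordinary filtering."""
    return correlate_valid(zero_pad(y, len(k) - 1, len(k[0]) - 1), flip(k))

activations = [[1, 2], [3, 4]]   # toy feature map (made-up values)
kernel = [[1, 0], [2, 3]]        # toy convnet filter (made-up values)
by_scatter = transposed_conv(activations, kernel)
by_flipped = filter_with_flipped_kernel(activations, kernel)
print(by_scatter == by_flipped)
```

The scatter form makes the "reverse of a CNN" reading explicit, while the flipped-kernel form is the one described on the slide.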
Applications: Visualization of the feature maps
Results of visualizing a fully trained model with the deconvnet described earlier.
- Layer 1: edges, colors.
- Layer 2: corners.
- The visualization shows which part of the image a given feature map focuses on.
- Shown are the reconstructed patterns from the validation set that cause the top 9 activations in a given feature map.
- For each feature map, the corresponding image patches are also shown.
Figure from: https://zhuanlan.zhihu.com/p/24833574
Applications: Visualization of the feature maps
Results of visualization with the deconvnet.
- Layer 3: similar structures composed of lines.
Figure from: https://zhuanlan.zhihu.com/p/24833574
Applications: Visualization of the feature maps
Results of visualizing a fully trained model with the deconvnet described earlier.
- Layers 4 and 5: more of the information used for classification.
Figure from: https://zhuanlan.zhihu.com/p/24833574
Applications: Visualization of the feature maps
Visualization of the different layers during training.
Figure from: https://zhuanlan.zhihu.com/p/24833574
Applications: Image segmentation (FCN)
Brief introduction to FCN (here, FCN modified from AlexNet)
- Input: an image of arbitrary size.
- The fully connected layers of a classification network are converted to convolutional layers, followed by upsampling (8x, 16x, 32x).
- Output: same size as the input, with pixel-level prediction.
Figure from: Long, J., Shelhamer, E., & Darrell, T. (2014). Fully Convolutional Networks for Semantic Segmentation. arXiv:1411.4038 [cs]. Retrieved from http://arxiv.org/abs/1411.4038
Applications: Image segmentation (FCN)
Conversion from CNN to FCN: convolutionalization (e.g. AlexNet)
- A fully connected layer can also be viewed as a convolution with kernels that cover its entire input region.
- Fine-tuning starts from AlexNet.
- The output is 1000 heatmaps.
Figure: Transforming fully connected layers into convolution layers enables a classification net to output a heatmap.
Figure from: https://arxiv.org/pdf/1411.4038.pdf
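The convolutionalization idea above can be checked on a toy example. The sketch below is an illustration under simplifying assumptions (a single input channel, two made-up output "classes", hypothetical function names), not the AlexNet conversion itself. Each fully connected weight grid is reused as a convolution kernel that covers the whole original input region; on a larger input the same kernel then yields one heatmap per class.

```python
def fully_connected(x, weights):
    """One score per output unit; each weight grid covers the whole input x."""
    return [sum(x[i][j] * w[i][j] for i in range(len(w)) for j in range(len(w[0])))
            for w in weights]

def conv_valid(x, k):
    """'Valid' sliding-window filtering (cross-correlation) of x with kernel k."""
    kh, kw = len(k), len(k[0])
    return [[sum(x[i + a][j + b] * k[a][b] for a in range(kh) for b in range(kw))
             for j in range(len(x[0]) - kw + 1)]
            for i in range(len(x) - kh + 1)]

def convolutionalized(x, weights):
    """Reuse each fully connected weight grid as a kernel: one heatmap per unit."""
    return [conv_valid(x, w) for w in weights]

# Two hypothetical output units ("classes") trained on 2 x 2 inputs.
weights = [[[1, 0], [0, 1]],
           [[1, -1], [2, 1]]]
small = [[1, 2], [3, 4]]                             # original input size
large = [[1, 2, 0, 1], [3, 4, 1, 0], [0, 1, 2, 2]]   # larger input, 3 x 4

fc_scores = fully_connected(small, weights)
heat_small = convolutionalized(small, weights)   # 1 x 1 heatmaps: same as the scores
heat_large = convolutionalized(large, weights)   # 2 x 3 heatmaps per class
print(fc_scores, heat_small, heat_large)
```

On the original input size the heatmaps collapse to a single value per class, which is why the converted network reproduces the classifier; on larger inputs every heatmap entry is the classifier's score for one window of the image.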
Applications: Image segmentation (FCN)
Conversion from CNN to FCN (e.g. AlexNet)
- Convert all the fully connected layers to convolutional layers.
- Use transposed convolution layers to map the coarse activations back to positions that are meaningful relative to the image size. Intuitively, the activation map is scaled up to the size of the input image.
- Upsampling factors: 8x, 16x, 32x.
Figure from: Long, J., Shelhamer, E., & Darrell, T. (2014). Fully Convolutional Networks for Semantic Segmentation. arXiv:1411.4038 [cs]. Retrieved from http://arxiv.org/abs/1411.4038
Applications: Image segmentation (FCN)
Upsampling and the “skip” architecture (using AlexNet as the example)
- Upsampling with factor f is convolution with a fractional input stride of 1/f.
- Equivalently, it is deconvolution (backwards convolution) with an output stride of f, which simply reverses the forward and backward passes of convolution.
- Higher layers: deep, global, coarse; they capture "what".
- Lower layers: shallow, local, fine; they capture "where".
- The skip connections combine the coarse, high-layer predictions with the fine, low-layer ones.
Figures from: https://blog.csdn.net/taigw/article/details/51401448
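Upsampling by backwards convolution with an output stride of f can be sketched in one dimension. This is a minimal illustration with f = 2, not the FCN code: the function name is hypothetical, and the kernel [0.5, 1, 0.5] is an assumption chosen because it performs linear interpolation (the FCN paper itself initializes its upsampling filters to bilinear interpolation). Each input value scatters a scaled copy of the kernel, and consecutive copies are placed f positions apart.

```python
def transposed_conv1d(x, kernel, stride):
    """Backwards (transposed) convolution: output length is (len(x) - 1) * stride + len(kernel)."""
    out = [0.0] * ((len(x) - 1) * stride + len(kernel))
    for i, value in enumerate(x):
        for a, weight in enumerate(kernel):
            # Copies of the kernel start 'stride' positions apart and overlap-add.
            out[i * stride + a] += value * weight
    return out

coarse = [1.0, 3.0, 5.0]            # toy coarse activations (made-up values)
linear_kernel = [0.5, 1.0, 0.5]     # assumed interpolation kernel for f = 2
upsampled = transposed_conv1d(coarse, linear_kernel, stride=2)
cropped = upsampled[1:-1]           # drop the one-sample border on each side
print(upsampled)
print(cropped)
```

After cropping the border, the original values sit at every second position and the samples in between are their averages. Replacing the fixed kernel with trainable weights turns the same operation into a learned upsampling layer.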
Applications: Image segmentation (FCN)
Advantages
- Accepts input images of arbitrary size.
- Efficient: trains on entire images instead of patches.
Disadvantages
- Not sensitive to details.
- No spatial regularization.
- No learning of the upsampling in this paper, so it does not count as transposed convolution in the strict sense.
Using AlexNet as the example.
Applications: Image segmentation (SegNet)
- The decoder uses the pooling indices computed in the max-pooling step of the corresponding encoder to perform non-linear upsampling.
- The maxima keep their original locations; all other values are 0.
Figure from: https://arxiv.org/pdf/1511.00561.pdf
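The index-based upsampling described above can be sketched in plain Python. This is a minimal single-channel illustration, not the SegNet implementation: the function names are hypothetical, and the input height and width are assumed to be divisible by the pooling size. The encoder side records where each maximum came from; the decoder side writes each value back to that location and fills the rest with zeros.

```python
def max_pool_with_indices(x, size=2):
    """Non-overlapping max pooling that also returns the (row, col) of each maximum."""
    pooled, indices = [], []
    for i in range(0, len(x), size):
        pooled_row, index_row = [], []
        for j in range(0, len(x[0]), size):
            window = [(x[i + a][j + b], (i + a, j + b))
                      for a in range(size) for b in range(size)]
            value, position = max(window, key=lambda item: item[0])
            pooled_row.append(value)
            index_row.append(position)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices

def max_unpool(values, indices, height, width):
    """Place each value at its recorded location; every other entry is 0."""
    out = [[0] * width for _ in range(height)]
    for value_row, index_row in zip(values, indices):
        for value, (r, c) in zip(value_row, index_row):
            out[r][c] = value
    return out

feature_map = [[1, 2, 0, 1],
               [3, 0, 5, 2],
               [0, 7, 1, 1],
               [4, 2, 8, 0]]   # toy encoder activations (made-up values)
pooled, indices = max_pool_with_indices(feature_map)
unpooled = max_unpool(pooled, indices, 4, 4)
print(pooled)
print(unpooled)
```

In a decoder, the values passed to `max_unpool` would be the decoder's own feature map rather than the encoder's pooled output; only the indices are shared between the two.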
Conclusions
- Structured prediction: semantic segmentation.
- Visualization of features.
- Training a transposed CNN requires too many parameters:
  - Large dataset.
  - Large GPU memory.
- Transposed convolution has a lot of potential and many applications.
Acknowledgment
- Dr. Jundong Liu
- Zhewei Wang
- Nidal Abuhajar
- Tao Sun
- Dr. Razvan Bunescu
Additional Applications: Object generation (generator in GAN)
- Deconvolution: 5 x 5.
- Fixed unpooling: assumes the maximum value is always located at the upper-left corner of each block (e.g. 2 x 2 => 4 x 4).
A. Dosovitskiy, J. T. Springenberg and T. Brox. Learning to generate chairs with convolutional neural networks. CVPR 2015.
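Fixed unpooling needs no stored indices, which a short sketch makes clear. This is a minimal single-channel illustration with a hypothetical function name, assuming a block size of 2: every input value is written to the upper-left corner of its own 2 x 2 block, and the remaining entries are zeros, so a 2 x 2 map becomes a 4 x 4 map.

```python
def fixed_unpool(x, size=2):
    """Put each value in the upper-left corner of a size x size block of zeros."""
    out = [[0] * (len(x[0]) * size) for _ in range(len(x) * size)]
    for i, row in enumerate(x):
        for j, value in enumerate(row):
            out[i * size][j * size] = value
    return out

small_map = [[1, 2],
             [3, 4]]   # toy 2 x 2 feature map (made-up values)
large_map = fixed_unpool(small_map)
for row in large_map:
    print(row)
```

In a generator, a layer like this would typically be followed by a convolution (5 x 5 on the slide) that fills in the zeros.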