Feedforward semantic segmentation with zoom-out features Mostajabi, Yadollahpour and Shakhnarovich Toyota Technological Institute at Chicago
Main Ideas Casting semantic segmentation as classification of a set of superpixels. Extracting CNN features from different levels of spatial context around the superpixel at hand. Using an MLP as the classifier. Photo credit: Mostajabi et al.
Zoom-out feature extraction Photo credit: Mostajabi et al.
Zoom-out feature extraction Subscene-level features: take the bounding box of all superpixels within radius three of the superpixel at hand, warp it to 256 x 256 pixels, and use the activations of the last fully connected layer. Scene-level features: warp the whole image to 256 x 256 pixels.
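A minimal sketch of the subscene region selection described above: given a superpixel label map and a (hypothetical, assumed precomputed) mapping from each superpixel to its neighbors within the radius, compute the bounding box that would then be warped to 256 x 256 for feature extraction.

```python
import numpy as np

# Hypothetical sketch: find the bounding box covering a target superpixel
# plus all superpixels within a given radius of it. `superpixels` is an
# H x W array of superpixel labels; `neighbors` maps each label to the set
# of labels within the radius (assumed precomputed, not shown here).

def region_bbox(superpixels, target, neighbors):
    labels = {target} | set(neighbors.get(target, ()))
    mask = np.isin(superpixels, list(labels))
    ys, xs = np.nonzero(mask)
    # (top, left, bottom, right), inclusive coordinates
    return ys.min(), xs.min(), ys.max(), xs.max()
```

The returned box would be cropped from the image and resized (warped) to 256 x 256 before being fed to the CNN.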
Training Extract features from the image and its mirror image, and take the element-wise max over the resulting two feature vectors, giving a 12416-dimensional representation for each superpixel. Two classifiers are trained: a linear classifier (softmax), and an MLP: hidden layer (1024 neurons) + ReLU + hidden layer (1024 neurons) with dropout.
Loss Function The dataset is imbalanced, so a weighted loss function is used. Let f_c be the frequency of class c in the training data, with Σ_c f_c = 1.
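One common form of such a weighted loss, shown here as a sketch, weights each example's cross-entropy term by the inverse frequency 1/f_c of its class; the paper's exact weighting may differ in detail.

```python
import numpy as np

# Sketch of a class-frequency-weighted cross-entropy loss. f[c] is the
# empirical frequency of class c (sum_c f[c] = 1). Weighting each example's
# log-loss by 1 / f[label] counteracts class imbalance: rare classes
# contribute more per example.

def weighted_softmax_loss(scores, labels, f):
    # scores: (N, C) raw classifier outputs; labels: (N,) integer class ids
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    weights = 1.0 / f[labels]
    return -(weights * log_probs[np.arange(len(labels)), labels]).mean()
```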
Effect of Zoom-out Levels Columns show: Image, Ground Truth, G1:3, G1:5, G1:5+S1, G1:5+S1+S2. Photo and table credit: Mostajabi et al.
Quantitative Results Softmax results on VOC 2012. Table credit: Mostajabi et al.
Quantitative Results MLP results. Table credit: Mostajabi et al.
Qualitative Results Photo credit: Mostajabi et al.
Learning Deconvolution Network for Semantic Segmentation Noh, Hong and Han POSTECH, Korea
Motivations Columns show: Image, Ground Truth, FCN Prediction. Photo credit: Noh et al.
Motivations Photo credit: Noh et al.
Deconvolution Network Architecture Photo credit: Noh et al.
Unpooling Photo credit: Noh et al.
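Unpooling can be illustrated with a small sketch: max pooling records "switch" variables (the argmax location in each window), and unpooling places each pooled value back at its recorded location, filling the rest with zeros. This is a simplified single-channel, stride-k version for illustration only.

```python
import numpy as np

# Max pooling that records switch variables (argmax locations), and the
# corresponding unpooling that scatters values back to those locations.

def max_pool_with_switches(x, k=2):
    h, w = x.shape
    pooled = np.zeros((h // k, w // k))
    switches = np.zeros((h // k, w // k, 2), dtype=int)
    for i in range(h // k):
        for j in range(w // k):
            window = x[i*k:(i+1)*k, j*k:(j+1)*k]
            r, c = np.unravel_index(window.argmax(), window.shape)
            pooled[i, j] = window[r, c]
            switches[i, j] = (i*k + r, j*k + c)  # location of the max
    return pooled, switches

def unpool(pooled, switches, shape):
    out = np.zeros(shape)  # everything except the maxima stays zero
    for i in range(pooled.shape[0]):
        for j in range(pooled.shape[1]):
            r, c = switches[i, j]
            out[r, c] = pooled[i, j]
    return out
```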
Deconvolution Photo credit: Noh et al.
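The deconvolution (transposed convolution) operation can be sketched for a single channel: each input value scatters a copy of the kernel, scaled by that value, into a larger output, densifying the sparse unpooled map. Stride 1 and no padding here, for clarity; the network's actual layers are learned multi-channel versions.

```python
import numpy as np

# Single-channel transposed convolution: every input element adds a scaled
# copy of the kernel into the output, so overlapping contributions sum.

def deconv2d(x, kernel):
    h, w = x.shape
    kh, kw = kernel.shape
    out = np.zeros((h + kh - 1, w + kw - 1))
    for i in range(h):
        for j in range(w):
            out[i:i+kh, j:j+kw] += x[i, j] * kernel
    return out
```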
Unpooling and Deconvolution Effects Photo credit: Noh et al.
Pipeline Generating 2K object proposals using EdgeBoxes and selecting the top 50 based on their objectness scores. Aggregating the segmentation maps generated for each proposal using pixel-wise maximum or average. Constructing the class-conditional probability map using softmax. Applying a fully-connected CRF to the probability map. Ensemble with FCN: computing the mean of the probability maps generated by DeconvNet and FCN, then applying the CRF. Photo credit: Noh et al.
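The aggregation step above can be sketched as follows: each proposal's per-class score map is pasted into a full-size canvas, and overlapping proposals are combined by pixel-wise maximum. The box convention and interface here are assumptions for illustration, not the authors' code.

```python
import numpy as np

# Combine per-proposal segmentation score maps into one full-image map by
# pixel-wise maximum. `proposals` is a list of (box, score_map) pairs, with
# box = (top, left, bottom, right) using exclusive bottom/right, and
# score_map of shape (num_classes, box_h, box_w). Hypothetical interface.

def aggregate_proposals(proposals, image_shape, num_classes):
    agg = np.zeros((num_classes,) + image_shape)
    for (t, l, b, r), score_map in proposals:
        agg[:, t:b, l:r] = np.maximum(agg[:, t:b, l:r], score_map)
    return agg
```

Replacing `np.maximum` with a running sum and a count per pixel would give the pixel-wise average variant.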
Training the Deep Network Adding a batch normalization layer to the output of every convolutional and deconvolutional layer. Two-stage training: train on easy examples first, then fine-tune with more challenging ones. Easy examples are constructed by cropping object instances using the ground-truth annotations; limiting the variation in object location and size substantially reduces the search space for semantic segmentation.
Effect of Number of Proposals Photo credit: Noh et al.
Quantitative Results Table credit: Noh et al.
Qualitative Results Photo credit: Noh et al.
Qualitative Results Examples where FCN produces better results than DeconvNet. Photo credit: Noh et al.
Qualitative Results Examples where inaccurate predictions from DeconvNet and FCN are improved by the ensemble. Photo credit: Noh et al.