Feedforward semantic segmentation with zoom-out features Mostajabi, Yadollahpour and Shakhnarovich Toyota Technological Institute at Chicago
Main Ideas Casting semantic segmentation as classification of a set of superpixels. Extracting CNN features from different levels of spatial context around the superpixel at hand. Using an MLP as the classifier. Photo credit: Mostajabi et al.
Zoom-out feature extraction Photo credit: Mostajabi et al.
Zoom-out feature extraction Subscene-level features: take the bounding box of all superpixels within radius three of the superpixel at hand, warp it to 256 x 256 pixels, and use the activations of the last fully connected layer. Scene-level features: warp the whole image to 256 x 256 pixels.
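A minimal sketch of the subscene region selection described above: given a superpixel label map and a (hypothetical, assumed precomputed) mapping from each superpixel to its neighbors within the radius, compute the bounding box that would then be warped to 256 x 256 for feature extraction.

```python
import numpy as np

# Hypothetical sketch: find the bounding box covering a target superpixel
# plus all superpixels within a given radius of it. `superpixels` is an
# H x W array of superpixel labels; `neighbors` maps each label to the set
# of labels within the radius (assumed precomputed, not shown here).

def region_bbox(superpixels, target, neighbors):
    labels = {target} | set(neighbors.get(target, ()))
    mask = np.isin(superpixels, list(labels))
    ys, xs = np.nonzero(mask)
    # (top, left, bottom, right), inclusive coordinates
    return ys.min(), xs.min(), ys.max(), xs.max()
```

The returned box would be cropped from the image and resized (warped) to 256 x 256 before being fed to the CNN.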
Training Extract features from the image and its mirror image, and take the element-wise max over the resulting two feature vectors, giving a 12416-dimensional representation for each superpixel. Two classifiers are trained: a linear classifier (softmax), and an MLP: hidden layer (1024 neurons) + ReLU + hidden layer (1024 neurons) with dropout.
Loss Function The dataset is imbalanced, so a weighted loss function is used. Let f_c be the frequency of class c in the training data, with Σ_c f_c = 1.
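One common form of such a weighted loss, shown here as a sketch, weights each example's cross-entropy term by the inverse frequency 1/f_c of its class; the paper's exact weighting may differ in detail.

```python
import numpy as np

# Sketch of a class-frequency-weighted cross-entropy loss. f[c] is the
# empirical frequency of class c (sum_c f[c] = 1). Weighting each example's
# log-loss by 1 / f[label] counteracts class imbalance: rare classes
# contribute more per example.

def weighted_softmax_loss(scores, labels, f):
    # scores: (N, C) raw classifier outputs; labels: (N,) integer class ids
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    weights = 1.0 / f[labels]
    return -(weights * log_probs[np.arange(len(labels)), labels]).mean()
```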
Effect of Zoom-out Levels Columns show: Image, Ground Truth, G1:3, G1:5, G1:5+S1, G1:5+S1+S2. Photo and table credit: Mostajabi et al.
Quantitative Results Softmax results on VOC 2012. Table credit: Mostajabi et al.
Quantitative Results MLP results. Table credit: Mostajabi et al.
Qualitative Results Photo credit: Mostajabi et al.
Learning Deconvolution Network for Semantic Segmentation Noh, Hong and Han POSTECH, Korea
Motivations Columns show: Image, Ground Truth, FCN Prediction. Photo credit: Noh et al.
Motivations Photo credit: Noh et al.
Deconvolution Network Architecture Photo credit: Noh et al.
Unpooling Photo credit: Noh et al.
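Unpooling can be illustrated with a small sketch: max pooling records "switch" variables (the argmax location in each window), and unpooling places each pooled value back at its recorded location, filling the rest with zeros. This is a simplified single-channel, stride-k version for illustration only.

```python
import numpy as np

# Max pooling that records switch variables (argmax locations), and the
# corresponding unpooling that scatters values back to those locations.

def max_pool_with_switches(x, k=2):
    h, w = x.shape
    pooled = np.zeros((h // k, w // k))
    switches = np.zeros((h // k, w // k, 2), dtype=int)
    for i in range(h // k):
        for j in range(w // k):
            window = x[i*k:(i+1)*k, j*k:(j+1)*k]
            r, c = np.unravel_index(window.argmax(), window.shape)
            pooled[i, j] = window[r, c]
            switches[i, j] = (i*k + r, j*k + c)  # location of the max
    return pooled, switches

def unpool(pooled, switches, shape):
    out = np.zeros(shape)  # everything except the maxima stays zero
    for i in range(pooled.shape[0]):
        for j in range(pooled.shape[1]):
            r, c = switches[i, j]
            out[r, c] = pooled[i, j]
    return out
```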
Deconvolution Photo credit: Noh et al.
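The deconvolution (transposed convolution) operation can be sketched for a single channel: each input value scatters a copy of the kernel, scaled by that value, into a larger output, densifying the sparse unpooled map. Stride 1 and no padding here, for clarity; the network's actual layers are learned multi-channel versions.

```python
import numpy as np

# Single-channel transposed convolution: every input element adds a scaled
# copy of the kernel into the output, so overlapping contributions sum.

def deconv2d(x, kernel):
    h, w = x.shape
    kh, kw = kernel.shape
    out = np.zeros((h + kh - 1, w + kw - 1))
    for i in range(h):
        for j in range(w):
            out[i:i+kh, j:j+kw] += x[i, j] * kernel
    return out
```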
Unpooling and Deconvolution Effects Photo credit: Noh et al.
Pipeline Generating 2K object proposals using EdgeBoxes and selecting the top 50 based on their objectness scores. Aggregating the segmentation maps generated for each proposal using pixel-wise maximum or average. Constructing the class-conditional probability map using softmax. Applying a fully-connected CRF to the probability map. Ensemble with FCN: computing the mean of the probability maps generated by DeconvNet and FCN, then applying the CRF. Photo credit: Noh et al.
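The aggregation step above can be sketched as follows: each proposal's per-class score map is pasted into a full-size canvas, and overlapping proposals are combined by pixel-wise maximum. The box convention and interface here are assumptions for illustration, not the authors' code.

```python
import numpy as np

# Combine per-proposal segmentation score maps into one full-image map by
# pixel-wise maximum. `proposals` is a list of (box, score_map) pairs, with
# box = (top, left, bottom, right) using exclusive bottom/right, and
# score_map of shape (num_classes, box_h, box_w). Hypothetical interface.

def aggregate_proposals(proposals, image_shape, num_classes):
    agg = np.zeros((num_classes,) + image_shape)
    for (t, l, b, r), score_map in proposals:
        agg[:, t:b, l:r] = np.maximum(agg[:, t:b, l:r], score_map)
    return agg
```

Replacing `np.maximum` with a running sum and a count per pixel would give the pixel-wise average variant.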
Training the Deep Network Adding a batch normalization layer to the output of every convolutional and deconvolutional layer. Two-stage training: train on easy examples first, then fine-tune with more challenging ones. Easy examples are constructed by cropping object instances using the ground-truth annotations; limiting the variation in object location and size substantially reduces the search space for semantic segmentation.
Effect of Number of Proposals Photo credit: Noh et al.
Quantitative Results Table credit: Noh et al.
Qualitative Results Photo credit: Noh et al.
Qualitative Results Examples where FCN produces better results than DeconvNet. Photo credit: Noh et al.
Qualitative Results Examples where inaccurate predictions from DeconvNet and FCN are improved by the ensemble. Photo credit: Noh et al.