1
Feedforward semantic segmentation with zoom-out features
Mostajabi, Yadollahpour and Shakhnarovich, Toyota Technological Institute at Chicago
2
Main Ideas
- Cast semantic segmentation as classifying a set of superpixels.
- Extract CNN features from several levels of spatial context ("zoom-out" levels) around the superpixel at hand.
- Use an MLP as the classifier.
Photo credit: Mostajabi et al.
3
Zoom-out feature extraction
Photo credit: Mostajabi et al.
4
Zoom-out feature extraction
Subscene-level features
- Take the bounding box of the superpixels within radius three of the superpixel at hand.
- Warp the bounding box to 256 × 256 pixels.
- Use the activations of the last fully connected layer as the features.
Scene-level features
- Warp the whole image to 256 × 256 pixels and extract the same activations.
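As a concrete illustration, here is a minimal sketch of the subscene- and scene-level extraction, assuming a torchvision VGG-16 as a stand-in for the paper's CNN; the preprocessing (normalization, crop size) and the box-union helper are assumptions, not the authors' exact code.

```python
import torch
import torchvision.models as models
import torchvision.transforms.functional as TF

# Stand-in backbone; the paper's exact CNN and preprocessing may differ.
vgg = models.vgg16(weights="IMAGENET1K_V1").eval()

def fc_features(patch):
    """Last fully connected hidden activations (4096-d) for a patch.
    Warps to 256x256, then center-crops to VGG's 224x224 input;
    ImageNet normalization is omitted for brevity."""
    x = TF.resize(patch, [256, 256])
    x = TF.center_crop(x, [224, 224]).unsqueeze(0)
    with torch.no_grad():
        feats = vgg.avgpool(vgg.features(x)).flatten(1)
        return vgg.classifier[:-1](feats).squeeze(0)  # drop the final 1000-way layer

def subscene_features(image, neighbor_boxes):
    """Subscene level: union bounding box of the superpixels within radius
    three of the superpixel at hand, warped through the CNN.
    `image` is a CHW float tensor; boxes are (x0, y0, x1, y1) tuples."""
    x0 = min(b[0] for b in neighbor_boxes)
    y0 = min(b[1] for b in neighbor_boxes)
    x1 = max(b[2] for b in neighbor_boxes)
    y1 = max(b[3] for b in neighbor_boxes)
    return fc_features(image[:, y0:y1, x0:x1])

def scene_features(image):
    """Scene level: the whole image, warped to the fixed input size."""
    return fc_features(image)
```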
5
Training
- Extract features from the image and its mirror image, and take the element-wise max over the two resulting feature vectors, giving a 12,416-dimensional representation for each superpixel.
- Train two classifiers:
  - a linear (softmax) classifier;
  - an MLP with two 1024-unit hidden layers, ReLU activations, and dropout on the second hidden layer.
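A minimal sketch of the mirror-max feature pooling and the MLP head, assuming PyTorch; the exact ordering of ReLU and dropout around the second hidden layer is an assumption based on the slide's wording.

```python
import torch
import torch.nn as nn

def mirrored_max_features(extract, image):
    """Extract features from the image and its horizontal mirror,
    then take the element-wise max of the two feature vectors."""
    f = extract(image)
    f_mirrored = extract(torch.flip(image, dims=[-1]))  # flip the width axis
    return torch.maximum(f, f_mirrored)

num_classes = 21  # PASCAL VOC: 20 object classes + background
mlp = nn.Sequential(
    nn.Linear(12416, 1024),  # 12,416-d zoom-out features in
    nn.ReLU(),
    nn.Linear(1024, 1024),   # second hidden layer, with dropout
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(1024, num_classes),  # softmax is applied inside the loss
)
```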
6
Loss Function
The dataset is heavily class-imbalanced, so the standard log-loss

$$\mathcal{L} = -\sum_i \log p(y_i \mid x_i)$$

is replaced with a weighted version. Let $f_c$ be the frequency of class $c$ in the training data, with $\sum_c f_c = 1$; each superpixel's term is weighted by the inverse frequency of its ground-truth class:

$$\mathcal{L} = -\sum_i \frac{1}{f_{y_i}} \log p(y_i \mid x_i)$$
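A minimal sketch of this weighted loss, assuming PyTorch's cross-entropy with per-class weights; the frequencies below are made-up numbers, and note that PyTorch's "mean" reduction additionally divides by the total weight in the batch.

```python
import torch
import torch.nn as nn

f = torch.tensor([0.70, 0.20, 0.10])  # hypothetical class frequencies, summing to 1
weights = 1.0 / f                     # rare classes get proportionally larger weight

criterion = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(8, 3)            # batch of 8 superpixels, 3 classes
labels = torch.randint(0, 3, (8,))    # ground-truth class per superpixel
loss = criterion(logits, labels)
```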
7
Effect of Zoom-out Levels
Figure: segmentations using growing subsets of zoom-out levels. Columns: Image, Ground Truth, G1:3, G1:5, G1:5+S1, G1:5+S1+S2. Photo and table credit: Mostajabi et al.
8
Quantitative Results
Softmax results on VOC 2012. Table credit: Mostajabi et al.
9
Quantitative Results
MLP results. Table credit: Mostajabi et al.
10
Qualitative Results Photo credit: Mostajabi et al.
11
Learning Deconvolution Network for Semantic Segmentation
Noh, Hong and Han, POSTECH, Korea
12
Motivations
Figure columns: Image, Ground Truth, FCN Prediction. Photo credit: Noh et al.
13
Motivations Photo credit: Noh et al.
14
Deconvolution Network Architecture
Photo credit: Noh et al.
15
Unpooling Photo credit: Noh et al.
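Unpooling reverses max pooling by remembering where each maximum came from ("switch" locations) and placing the pooled value back there, leaving zeros elsewhere. A minimal sketch with PyTorch's built-in pooling/unpooling pair:

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)

x = torch.randn(1, 1, 4, 4)
pooled, switches = pool(x)            # 2x2 maxima plus their argmax locations
restored = unpool(pooled, switches)   # back to 4x4: maxima in place, zeros elsewhere
```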
16
Deconvolution Photo credit: Noh et al.
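The deconvolution (transposed convolution) layers then densify the sparse unpooled activations with learned filters. Since the spatial enlargement in DeconvNet comes from unpooling, a size-preserving transposed convolution is sketched here; channel counts and kernel size are placeholders:

```python
import torch
import torch.nn as nn

deconv = nn.ConvTranspose2d(in_channels=64, out_channels=64,
                            kernel_size=5, stride=1, padding=2)
x = torch.randn(1, 64, 14, 14)   # sparse activations from unpooling
y = deconv(x)                    # dense 14x14 map; stride > 1 would also upsample
```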
17
Unpooling and Deconvolution Effects
Photo credit: Noh et al.
18
Pipeline
- Generate roughly 2,000 object proposals using EdgeBoxes and select the top 50 by objectness score.
- Generate a segmentation map for each proposal, then aggregate the maps by pixel-wise maximum or average.
- Construct the class-conditional probability map using a softmax.
- Apply a fully-connected CRF to the probability map.
Ensemble with FCN: compute the mean of the probability maps generated by DeconvNet and FCN, then apply the CRF.
Photo credit: Noh et al.
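A minimal sketch of the aggregation and ensemble steps, assuming the per-proposal score maps have already been pasted back into full-image coordinates; the tensor shapes and the equal-weight ensemble are assumptions.

```python
import torch

def aggregate(proposal_maps, mode="max"):
    """Pixel-wise max (or mean) over the per-proposal segmentation maps.
    proposal_maps: (num_proposals, num_classes, H, W)."""
    if mode == "max":
        return proposal_maps.max(dim=0).values
    return proposal_maps.mean(dim=0)

scores = aggregate(torch.randn(50, 21, 64, 64))       # top-50 proposal maps
deconv_probs = torch.softmax(scores, dim=0)           # class-conditional map

fcn_probs = torch.softmax(torch.randn(21, 64, 64), dim=0)
ensemble = 0.5 * (deconv_probs + fcn_probs)           # mean of the two maps
# A fully-connected CRF (e.g., pydensecrf) would then refine `ensemble`.
```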
19
Training Deep Network
- Add a batch normalization layer to the output of every convolutional and deconvolutional layer.
- Two-stage training: train on easy examples first, then fine-tune with more challenging ones.
- Constructing easy examples: crop object instances using the ground-truth annotations. Limiting the variation in object location and size substantially reduces the search space for semantic segmentation.
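A minimal sketch of the batch-norm placement, assuming PyTorch; kernel sizes and channel counts are placeholders, not the paper's configuration.

```python
import torch.nn as nn

def conv_bn_relu(in_ch, out_ch):
    """Convolution followed by batch normalization, as added after
    every convolutional layer to ease optimization."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

def deconv_bn_relu(in_ch, out_ch):
    """The same pattern on the decoder side, after every deconvolution."""
    return nn.Sequential(
        nn.ConvTranspose2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )
```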
20
Effect of Number of Proposals
Photo credit: Noh et al.
21
Quantitative Results Table credit: Noh et al.
22
Qualitative Results Photo credit: Noh et al.
23
Qualitative Results
Examples where FCN produces better results than DeconvNet. Photo credit: Noh et al.
24
Qualitative Results
Examples where inaccurate predictions from DeconvNet and FCN are improved by the ensemble. Photo credit: Noh et al.