Accel: A Corrective Fusion Network for Efficient Semantic Segmentation on Video CVPR 2019 Oral Samvit Jain; Xin Wang; Joseph E. Gonzalez University of California, Berkeley
Semantic Segmentation on Video Problem Definition Input: a video clip, without ground-truth labels Output: a segmentation of each frame
Semantic Segmentation on Video Classical Approach Segment each frame independently
Accel, an efficient approach “Cheap” feature extraction net → ResNet-18 “Expensive” feature extraction net → ResNet-101
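A minimal sketch of how the outputs of the expensive and cheap branches might be combined, assuming the fusion is a 1×1 convolution over the stacked per-class score maps. The slides do not spell out the fusion operator, so the function name `fuse_scores`, the tensor shapes, and the weights are all illustrative assumptions.

```python
import numpy as np

def fuse_scores(ref_scores, upd_scores, weight, bias):
    """Fuse two per-class score maps with a 1x1 convolution (assumed fusion).

    ref_scores, upd_scores: arrays of shape (K, H, W), K = number of classes.
    weight: array of shape (K, 2K), the 1x1 convolution kernel.
    bias: array of shape (K,).
    Returns fused scores of shape (K, H, W).
    """
    # Stack the two score maps along the channel axis: (2K, H, W).
    stacked = np.concatenate([ref_scores, upd_scores], axis=0)
    # A 1x1 convolution is a per-pixel linear map over channels.
    fused = np.einsum("kc,chw->khw", weight, stacked)
    return fused + bias[:, None, None]

def predict_labels(fused_scores):
    """Per-pixel class prediction: argmax over the class axis."""
    return np.argmax(fused_scores, axis=0)
```

With a kernel that copies only the second input, the fused map equals the cheap branch's scores; a learned kernel would instead weigh the two branches per class.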
Network Architecture Optical Flow Warp Operation
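A minimal sketch of an optical-flow warp operation, assuming bilinear sampling of a feature map at flow-displaced positions. The function name `warp_features`, the flow convention (channel 0 is the horizontal displacement, channel 1 the vertical), and the border clamping are assumptions for illustration, not details from the slides.

```python
import numpy as np

def warp_features(feat, flow):
    """Bilinearly warp a feature map with a dense flow field.

    feat: array of shape (C, H, W).
    flow: array of shape (2, H, W); flow[0] = dx, flow[1] = dy (assumed).
    The output at (y, x) samples feat at (y + dy, x + dx).
    """
    C, H, W = feat.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # Source coordinates, clamped to the image border.
    sx = np.clip(xs + flow[0], 0, W - 1)
    sy = np.clip(ys + flow[1], 0, H - 1)
    # Integer corners surrounding each source coordinate.
    x0 = np.floor(sx).astype(int)
    y0 = np.floor(sy).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    # Fractional offsets act as interpolation weights.
    wx = sx - x0
    wy = sy - y0
    top = feat[:, y0, x0] * (1 - wx) + feat[:, y0, x1] * wx
    bottom = feat[:, y1, x0] * (1 - wx) + feat[:, y1, x1] * wx
    return top * (1 - wy) + bottom * wy
```

A zero flow field returns the input unchanged, which is a quick sanity check for the sampling convention.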
Experiment Ablation Study N_R is always ResNet-101
Experiment Accuracy vs. inference time On CityScapes Dataset On CamVid Dataset
Experiment Comparison with Others On CityScapes Dataset On CamVid Dataset
r1. Input frames r2. Accel N_R branch r3. Accel N_U branch r4. N_R + N_U, ResNet-18
Conclusions The structure is very simple. It appears to be faster and to reach higher accuracy. But the comparison is unfair: the baselines are not SOTA. Where are BiSeNet and ICNet?