Learning Multi-Domain Convolutional Neural Networks for Visual Tracking (arXiv:1510.07945 [cs.CV], v1 2015, v2 2016) Hyeonseob Nam, Bohyung Han, Dept. of Computer Science and Engineering, POSTECH, Korea

Outline Introduction Multi-Domain Network (MDNet) Online Tracking using MDNet Experiment Conclusion

Outline Introduction Multi-Domain Network (MDNet) Online Tracking using MDNet Experiment Conclusion

Introduction In recent years, correlation filters have gained attention in the area of visual tracking: [3] MOSSE, [18] KCF, [6] DSST, [21] MUSTer. [3] D. S. Bolme, J. R. Beveridge, B. A. Draper, and Y. M. Lui. Visual object tracking using adaptive correlation filters. In CVPR, 2010. [18] J. F. Henriques, R. Caseiro, P. Martins, and J. Batista. High-speed tracking with kernelized correlation filters. IEEE Trans. Pattern Anal. Mach. Intell., 37(3):583–596, 2015. [6] M. Danelljan, G. Häger, F. S. Khan, and M. Felsberg. Accurate scale estimation for robust visual tracking. In BMVC, 2014. [21] Z. Hong, Z. Chen, C. Wang, X. Mei, D. Prokhorov, and D. Tao. MUlti-Store Tracker (MUSTer): A cognitive psychology inspired approach to object tracking. In CVPR, 2015.

Introduction These trackers rely on low-level hand-crafted features, which are vulnerable in dynamic situations (illumination changes, occlusion, deformation, etc.); this motivates the use of CNN-based representations.

Introduction Early attempts to apply CNNs to tracking [10], [28], [20], [39] face two obstacles: offline training is hampered by the lack of tracking-specific training data, and using the representations of CNNs pretrained on a large-scale classification dataset such as ImageNet suffers from a fundamental inconsistency between the classification and tracking problems. [10] J. Fan, W. Xu, Y. Wu, and Y. Gong. Human tracking using convolutional neural networks. IEEE Trans. Neural Networks, 21(10):1610–1623, 2010. [28] H. Li, Y. Li, and F. Porikli. DeepTrack: Learning discriminative feature representations by convolutional neural networks for visual tracking. In BMVC, 2014. [20] S. Hong, T. You, S. Kwak, and B. Han. Online tracking by learning discriminative saliency map with convolutional neural network. In ICML, 2015. [39] N. Wang, S. Li, A. Gupta, and D.-Y. Yeung. Transferring rich feature hierarchies for robust visual tracking. arXiv preprint arXiv:1501.04587, 2015.

Introduction To obtain better representations for tracking, an approach that captures sequence-independent information should be incorporated: the Multi-Domain Network (MDNet).

Outline Introduction Multi-Domain Network (MDNet) Online Tracking using MDNet Experiment Conclusion

Multi-Domain Network (MDNet) Architecture: a simple, shallow network adapted from VGG-M [5], with layers shared across all training sequences followed by K domain-specific branches, each a binary classification layer with softmax cross-entropy loss. Learning method: SGD, where each domain is handled exclusively in turn; iteration nK+k draws its minibatch from the k-th sequence and updates only the k-th branch together with the shared layers. [5] K. Chatfield, K. Simonyan, A. Vedaldi, and A. Zisserman. Return of the devil in the details: Delving deep into convolutional nets. In BMVC, 2014.
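
To make the multi-domain training scheme concrete, here is a minimal PyTorch-style sketch (not the authors' implementation): the module shapes, the feature dimension, and names such as shared, branches, and train_step are hypothetical stand-ins; only the one-domain-per-iteration schedule and the softmax cross-entropy loss come from the slide.

```python
import torch
import torch.nn as nn

K = 58  # number of training sequences (domains), as used for the OTB experiments

# Stand-in for the shared part of the network (conv1-3 + fc4-5 in MDNet);
# here just two fc layers over precomputed features of an assumed dimension.
shared = nn.Sequential(nn.Linear(4608, 512), nn.ReLU(),
                       nn.Linear(512, 512), nn.ReLU())
# One binary-classification branch (fc6^k) per domain.
branches = nn.ModuleList([nn.Linear(512, 2) for _ in range(K)])

criterion = nn.CrossEntropyLoss()  # softmax cross-entropy over {target, background}
optimizer = torch.optim.SGD(list(shared.parameters()) + list(branches.parameters()),
                            lr=1e-3, momentum=0.9, weight_decay=5e-4)

def train_step(iteration, features, labels):
    """One SGD iteration: only the branch of the current domain is active."""
    k = iteration % K                        # iteration nK + k handles domain k
    scores = branches[k](shared(features))   # shared layers + k-th domain branch
    loss = criterion(scores, labels)
    optimizer.zero_grad()
    loss.backward()                          # gradients reach the shared layers and branch k only
    optimizer.step()
    return loss.item()
```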

Outline Introduction Multi-Domain Network (MDNet) Online Tracking using MDNet Experiment Conclusion

Online Tracking using MDNet After pretraining, the K domain-specific branches are replaced by a single new branch for the test sequence, and this branch together with the fully connected layers of the shared network is fine-tuned online.

Online Tracking using MDNet

Online Tracking using MDNet To estimate the target state in each frame: N target candidates x^1, ..., x^N are sampled around the previous target state and evaluated using the network, giving positive and negative scores f+(x^i) and f-(x^i). Optimal target state (Eq. 1): x* = argmax_{x^i} f+(x^i).
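
As a rough NumPy sketch of this step (Eq. 1): candidates is an N x 3 array of sampled states and score_fn is a hypothetical callable returning the network's (positive, negative) scores for each candidate.

```python
import numpy as np

def estimate_target(candidates, score_fn):
    """Return the candidate with the highest positive score, i.e. Eq. (1)."""
    scores = score_fn(candidates)          # shape (N, 2): [f+(x^i), f-(x^i)] per candidate
    best = int(np.argmax(scores[:, 0]))    # argmax over the positive scores
    return candidates[best], float(scores[best, 0])
```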

Online Tracking using MDNet

Online Tracking using MDNet Two complementary online update strategies: Long-term updates (for robustness) are performed at regular intervals using positive samples collected over a long period. Short-term updates (for adaptiveness) are performed whenever the estimated target becomes unreliable, i.e. f+(x*) < 0.5, using positive samples collected over a short recent period.
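
A small sketch of this schedule, assuming a helper is called once per frame with the frame index and the positive score of the estimated target; the function name and return convention are hypothetical, and the default interval follows the value given later in the slides.

```python
def choose_update(frame_idx, pos_score, tau_l=100):
    """Decide which online update (if any) to run for the current frame.

    Short-term update: triggered when the estimated target looks unreliable,
    i.e. f+(x*) < 0.5 (adaptiveness, recent positive samples).
    Long-term update: performed every tau_l frames (robustness, positive
    samples collected over a long period).
    """
    if pos_score < 0.5:
        return "short-term"
    if frame_idx % tau_l == 0:
        return "long-term"
    return None
```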

Online Tracking using MDNet

Online Tracking using MDNet Hard Negative (Minibatch) Mining [35]: in each iteration of the learning procedure, a minibatch consists of M+ positives and M_h- hard negatives. The hard negatives are found by testing M- (>> M_h-) negative samples and selecting the M_h- with the highest positive scores. [35] K.-K. Sung and T. Poggio. Example-based learning for view-based human face detection. IEEE Trans. Pattern Anal. Mach. Intell., 20(1):39–51, 1998.
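
A hedged sketch of the hard negative selection, reusing the hypothetical score_fn from the earlier snippet; the default m_h follows the value given later in the implementation details.

```python
import numpy as np

def mine_hard_negatives(neg_candidates, score_fn, m_h=96):
    """Keep the m_h negatives the network is currently most confused about.

    Scores all M- negative candidates and selects those with the highest
    positive scores f+(x); they form the hard negatives of the minibatch.
    """
    pos_scores = score_fn(neg_candidates)[:, 0]   # f+(x) for every negative sample
    hard_idx = np.argsort(-pos_scores)[:m_h]      # top-m_h by positive score
    return neg_candidates[hard_idx]
```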

Online Tracking using MDNet

Online Tracking using MDNet Bounding Box Regression (as in DPM and R-CNN): in the first frame of a test sequence, we train a simple linear regression model to predict the precise target location from the conv3 features of samples near the target location. In subsequent frames, we use it to adjust the target locations estimated from Eq. (1).
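
A sketch of this step using ridge regression from scikit-learn as a stand-in for the "simple linear regression model"; the R-CNN-style offset targets, the alpha value, and the helper names are assumptions, not details taken from the slide.

```python
import numpy as np
from sklearn.linear_model import Ridge

def fit_bbox_regressor(conv3_feats, box_offsets, alpha=1000.0):
    """First frame: fit a linear model from conv3 features to box offsets (dx, dy, dw, dh)."""
    reg = Ridge(alpha=alpha)            # regularized linear regression (assumed solver)
    reg.fit(conv3_feats, box_offsets)   # conv3_feats: (n, d), box_offsets: (n, 4)
    return reg

def refine_box(reg, conv3_feat, box):
    """Subsequent frames: adjust the box (x, y, w, h) estimated from Eq. (1)."""
    dx, dy, dw, dh = reg.predict(conv3_feat.reshape(1, -1))[0]
    x, y, w, h = box
    cx = x + w / 2 + dx * w                   # shift the box centre
    cy = y + h / 2 + dy * h
    w2, h2 = w * np.exp(dw), h * np.exp(dh)   # rescale width and height
    return (cx - w2 / 2, cy - h2 / 2, w2, h2)
```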

Update periods: τ_s = 20 frames (short-term), τ_l = 100 frames (long-term).

Implementation Details Target candidate generation: N (= 256) samples in the translation and scale dimensions, x_t^i = (x_t^i, y_t^i, s_t^i), i = 1, ..., N, drawn from a Gaussian distribution centred at the previous target state. Training data: offline, we collect 50 positive (IoU ≥ 0.7) and 200 negative (IoU ≤ 0.5) samples from every frame; online, we collect S_t^+ (= 50) positive (IoU ≥ 0.7) and S_t^- (= 200) negative (IoU ≤ 0.3) samples per frame, with S_1^+ = 500 and S_1^- = 5000 in the first frame.
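
A minimal sketch of the candidate sampling, assuming the target state is (x, y, s) with s a scale factor; the standard deviations used here are placeholders, not the paper's exact covariance settings.

```python
import numpy as np

def sample_candidates(prev_state, n=256, sigma=(0.3, 0.3, 0.05)):
    """Draw n candidate states (x, y, s) from a Gaussian around the previous state."""
    mean = np.asarray(prev_state, dtype=float)         # previous target state (x, y, s)
    noise = np.random.randn(n, 3) * np.asarray(sigma)  # placeholder standard deviations
    return mean + noise                                # shape (n, 3)
```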

Implementation Details Network learning: for offline training, we train the network for 100K iterations with learning rates 0.0001 for convolutional layers and 0.001 for fully connected layers. In the initial frame of a test sequence, we train for 30 iterations with learning rate 0.0001 for fc4-5 and 0.001 for fc6. For online updates, we train for 10 iterations with a learning rate three times larger than in the initial frame.

Implementation Details Momentum = 0.9, weight decay = 0.0005. Each minibatch consists of M+ (= 32) positives and M_h- (= 96) hard negatives selected out of M- (= 1024) negative examples.
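
The per-layer learning rates and SGD hyperparameters above can be wired up as in the following PyTorch-style sketch; the toy conv and fc modules are placeholders for the actual network parts.

```python
import torch
import torch.nn as nn

conv_part = nn.Conv2d(3, 96, kernel_size=7)   # stands in for the convolutional layers
fc_part = nn.Linear(512, 2)                   # stands in for the fully connected layers

optimizer = torch.optim.SGD(
    [{"params": conv_part.parameters(), "lr": 1e-4},   # 0.0001 for conv layers
     {"params": fc_part.parameters(),   "lr": 1e-3}],  # 0.001 for fc layers
    momentum=0.9, weight_decay=5e-4)
```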

Outline Introduction Multi-Domain Network (MDNet) Online Tracking using MDNet Experiment Conclusion

Experiment Benchmarks: OTB50 [41], OTB100 [40], and VOT2014 [25]. [25] M. Kristan, R. Pflugfelder, A. Leonardis, J. Matas, L. Čehovin, G. Nebehay, T. Vojíř, G. Fernandez, et al. The visual object tracking VOT2014 challenge results. In ECCVW, 2014. [40] Y. Wu, J. Lim, and M.-H. Yang. Object tracking benchmark. IEEE Trans. Pattern Anal. Mach. Intell., 37(9):1834–1848, 2015. [41] Y. Wu, J. Lim, and M.-H. Yang. Online object tracking: A benchmark. In CVPR, 2013.

Experiment-OTB For offline training of MDNet, we use 58 training sequences collected from VOT2013, VOT2014, and VOT2015, excluding the videos included in OTB100. We compare against six state-of-the-art trackers (MUSTer [21], CNN-SVM [20], MEEM [42], TGPR [12], DSST [6], KCF [18]) and the top two trackers included in the benchmark (SCM [44], Struck [17]).

Experiment-VOT2014 MDNet is pretrained using 89 sequences from OTB100 that do not overlap with the VOT2014 dataset. Trackers are initialized with either ground-truth bounding boxes (baseline) or randomly perturbed ones (region noise).

Outline Introduction Multi-Domain Network (MDNet) Online Tracking using MDNet Experiment Conclusion

Conclusion We propose a multi-domain learning framework based on CNNs. The CNN pretrained by multi-domain learning is updated online in the context of a new sequence to learn domain-specific information adaptively. Our extensive experiments demonstrate the outstanding performance of our tracking algorithm on two public benchmarks: Object Tracking Benchmark and VOT2014.

Thanks for listening!