Convolutional Neural Networks for Visual Tracking Computer Vision Lab. 남현섭
Contents
- Convolutional Neural Networks
- Tracking by CNN
  - J. Fan, et al., Human tracking using convolutional neural networks, IEEE Transactions on Neural Networks, 2010
  - H. Li, et al., DeepTrack: Learning Discriminative Feature Representations by Convolutional Neural Networks for Visual Tracking, BMVC, 2014
- On-going research
Convolutional Neural Network
J. Fan, et al., Human tracking using convolutional neural networks, IEEE Transactions on Neural Networks, 2010
Contributions
- Learn both spatial and temporal features from pairs of adjacent frames.
- Use multiple pathways in the CNN to fuse local and global information.
- Use a shift-variant CNN architecture to alleviate drift toward distracting objects.
CNN Architecture
Shift-Variant Architecture
[Figure: shift-invariant vs. shift-variant architecture]
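The distinction between the two architectures can be illustrated with a small 1-D sketch. A convolution with one shared kernel responds identically to a pattern wherever it appears, while a layer with a separate weight per output position does not. The `position_weighted` step and its weights are hypothetical, chosen only for illustration; this is not the layer used by Fan et al.

```python
def conv1d_valid(signal, kernel):
    """Slide one shared kernel over the signal (cross-correlation, 'valid' mode)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def position_weighted(response, weights):
    """Hypothetical shift-variant step: one separate weight per output position."""
    return [r * w for r, w in zip(response, weights)]


signal = [0, 0, 1, 2, 1, 0, 0, 0]
shifted = [0, 0, 0, 1, 2, 1, 0, 0]   # the same pattern moved one step to the right
kernel = [1, 2, 1]

# Shift-invariant part: shifting the input only shifts the response.
resp = conv1d_valid(signal, kernel)
resp_shifted = conv1d_valid(shifted, kernel)

# Shift-variant part: weights (assumed) favor the position where the target is expected.
weights = [1, 3, 5, 3, 1, 0]
score = max(position_weighted(resp, weights))
score_shifted = max(position_weighted(resp_shifted, weights))
```

With the shared kernel alone, both inputs reach the same peak response. Once position-specific weights are applied, the pattern at the expected location scores higher than the identical pattern elsewhere, which is the property that helps against similar-looking distractors.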
Handling Scale Change
Results
[Figure: compared variants: temporal & spatial features; spatial features only; global & local branch, shift-variant; global branch only; local branch only; shift-invariant]
Results
Results
H. Li, et al., DeepTrack: Learning Discriminative Feature Representations by Convolutional Neural Networks for Visual Tracking, BMVC, 2014
Contributions
- A candidate pool of multiple CNNs => temporal adaptation
- Structural loss function => large, reliable training examples
- Class-specific tracking => combine the class-level detector and the instance-level tracker
CNN Architecture
Structural Loss Function
[Equations: traditional loss function vs. structural loss function; annotated terms: structural importance, CNN loss, overlapping ratio]
=> Training samples with high importance can be used to avoid class ambiguity.
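A minimal sketch of the idea: each sample's CNN loss is scaled by a structural importance derived from its overlapping ratio with the target box. The slide's exact formulas are not reproduced here. The sigmoid-shaped `importance` function, its `steepness` parameter, and the `(box, cnn_loss)` sample layout are assumptions made for illustration.

```python
import math


def overlap_ratio(box_a, box_b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def importance(overlap, steepness=10.0):
    """Assumed form: 0 at overlap 0.5 (ambiguous), close to 1 near overlap 0 or 1."""
    return abs(2.0 / (1.0 + math.exp(-steepness * (overlap - 0.5))) - 1.0)


def structural_loss(samples, target_box):
    """Mean of importance * per-sample CNN loss; samples are (box, cnn_loss) pairs."""
    total = 0.0
    for box, cnn_loss in samples:
        total += importance(overlap_ratio(box, target_box)) * cnn_loss
    return total / len(samples)


def traditional_loss(samples):
    """Unweighted mean of the per-sample CNN losses, for comparison."""
    return sum(cnn_loss for _, cnn_loss in samples) / len(samples)
```

Under this assumed weighting, a sample that half-overlaps the target contributes almost nothing to the loss, so ambiguous boxes do not pull the classifier toward either class.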
Online Learning: A Coordinate-Descent => Reduce overfitting, increase training speed
Temporal Adaptation With a CNN Pool
Temporal Adaptation With a CNN Pool
- Accommodates as many appearance variations as possible without learning an ensemble of CNNs or a single very complicated CNN.
- Explicitly refines the model pool and discards unreliable CNNs.
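The pool maintenance described above might be sketched as follows. This is only an illustration under stated assumptions: `PoolEntry`, `ModelPool`, the `reliability` score, and the fixed `capacity` are hypothetical names and choices, and the actual selection and discard criteria in DeepTrack may differ.

```python
from dataclasses import dataclass, field


@dataclass
class PoolEntry:
    name: str           # stands in for a trained CNN snapshot
    reliability: float  # hypothetical score; higher means more trustworthy


@dataclass
class ModelPool:
    capacity: int
    entries: list = field(default_factory=list)

    def add(self, entry):
        """Insert a new model, then discard the least reliable one if over capacity."""
        self.entries.append(entry)
        if len(self.entries) > self.capacity:
            worst = min(self.entries, key=lambda e: e.reliability)
            self.entries.remove(worst)

    def best(self):
        """Return the most reliable model, used for the current frame."""
        return max(self.entries, key=lambda e: e.reliability)


pool = ModelPool(capacity=2)
pool.add(PoolEntry("cnn_t0", 0.9))
pool.add(PoolEntry("cnn_t10", 0.4))
pool.add(PoolEntry("cnn_t20", 0.7))   # pool is over capacity, so cnn_t10 is dropped
```

Keeping several snapshots from different time steps lets the tracker fall back on an older model when the newest one has adapted to a transient appearance.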
Class-Specific Tracking Combine the class-level detector and the instance-level tracker
Results
Results
Results – Class Specific Tracking
Observations
- Low-level and high-level information need to be combined.
- Deep CNN features lack precise localization ability.
- Learning a CNN from few examples leads to overfitting.
On-Going Research
[Diagram: learning a CNN; probability map; re-initialize]