1 Spatially Supervised Recurrent Neural Networks for Visual Object Tracking
Authors: Guanghan Ning, Zhi Zhang, Chen Huang, Xiaobo Ren, Haohong Wang, Canhui Cai, Zhihai (Henry) He

2 The Problem Object Tracking
Visual object tracking is the process of localizing a single target in a video or a sequence of images, given the target position in the first frame. Its significance lies in two aspects: (1) it has a wide range of applications, such as motion analysis, activity recognition, surveillance, and human-computer interaction; (2) it can be a prerequisite for, or a necessary component of, other systems. ICCAS 2017: Spatially Supervised Recurrent Neural Networks for Visual Object Tracking 7/1/2019 6:25 PM

3 Dataset Object Tracking Benchmark (OTB)
OTB is one of the most commonly used datasets. Each video is annotated with one or more of the following attributes: IV: Illumination Variation; SV: Scale Variation; OCC: Occlusion; DEF: Deformation; MB: Motion Blur; FM: Fast Motion; IPR: In-Plane Rotation; OPR: Out-of-Plane Rotation; OV: Out-of-View; BC: Background Clutters; LR: Low Resolution. Figure 1: OTB dataset.

4 Evaluation
1. How do we measure the performance?
Performance is measured by one-pass evaluation (OPE): each sequence is tested once, initialized from the ground-truth position in the first frame, and we report the [average precision] or [success rate].
2. How do we calculate the [average precision] or [success rate]?
Average precision is the average overlap score over all frames. A frame counts as a success when its overlap score is above a threshold.
3. How do we evaluate over a range of thresholds?
The [success plot] shows the ratio of successful frames as the threshold varies from 0 to 1. We use the [area under curve (AUC)] of each success plot to rank the tracking algorithms.
$$ S = \frac{|r_t \cap r_a|}{|r_t \cup r_a|} $$
where $S$ is the overlap score, $r_t$ is the tracked bounding box, $r_a$ is the ground-truth bounding box, $\cap$ and $\cup$ denote the intersection and union of two regions, and $|\cdot|$ is the number of pixels in a region.
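The overlap score, success rate, and AUC described above can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the official OTB toolkit: boxes are assumed to be axis-aligned `(x, y, w, h)` tuples, areas are computed continuously (which approximates the pixel count $|\cdot|$), and the 21 evenly spaced thresholds and the "AUC = mean success rate" approximation are assumptions. The helper names `overlap_score` and `evaluate_ope` are hypothetical.

```python
import numpy as np

def overlap_score(box_t, box_a):
    """S = |r_t ∩ r_a| / |r_t ∪ r_a| for axis-aligned boxes given as (x, y, w, h)."""
    xt, yt, wt, ht = box_t
    xa, ya, wa, ha = box_a
    # Width and height of the intersection rectangle (0 when the boxes do not overlap).
    iw = max(0.0, min(xt + wt, xa + wa) - max(xt, xa))
    ih = max(0.0, min(yt + ht, ya + ha) - max(yt, ya))
    inter = iw * ih
    union = wt * ht + wa * ha - inter
    return inter / union if union > 0 else 0.0

def evaluate_ope(tracked, ground_truth, num_thresholds=21):
    """One-pass evaluation of one sequence: average overlap, success plot, and its AUC."""
    scores = np.array([overlap_score(t, a) for t, a in zip(tracked, ground_truth)])
    thresholds = np.linspace(0.0, 1.0, num_thresholds)
    # A frame is a success when its overlap score is above the threshold.
    success = np.array([(scores > th).mean() for th in thresholds])
    # Assumption: AUC approximated as the mean success rate over the sampled thresholds.
    auc = success.mean()
    return scores.mean(), thresholds, success, auc

# Usage: three frames with a perfect, a partial, and a missed prediction.
gt = [(0, 0, 10, 10)] * 3
tracked = [(0, 0, 10, 10), (5, 0, 10, 10), (20, 20, 10, 10)]
avg, ths, success, auc = evaluate_ope(tracked, gt)
print(round(float(avg), 3), round(float(auc), 3))  # 0.444 0.429
```

Ranking trackers then amounts to computing `auc` per tracker (averaged over sequences) and sorting.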

5 Challenges
1. Appearance variations: target deformations, illumination variations, scale changes, background clutter, fast and abrupt motion.
2. Occlusion: partial occlusion, full occlusion.
3. Difficulties introduced by the camera: uneven lighting or illumination, blur, low resolution, perspective distortion.

6 Related Works What are the major related works?
1. Regression-based object recognition: YOLO is a regression-based CNN that detects generic objects, and it inspired us to investigate the regression capabilities of LSTM.
2. LSTM: LSTM is an RNN module with memory, which makes it temporally deep. We investigate combining CNN and LSTM to interpret high-level visual features both spatially and temporally, and propose to regress these features into object locations.

7 Related Works Our proposed network is the first work that incorporates CNN and LSTM for object tracking on real-world datasets. Two prior papers [1, 2] are closely related to this work:
[1] Quan Gan, Qipeng Guo, Zheng Zhang, and Kyunghyun Cho. First step toward model-free, anonymous object tracking with recurrent neural networks. arXiv preprint arXiv: ,
Differences: [1] uses a traditional RNN, not an LSTM, and focuses on artificially generated sequences and synthesized data, not real-world videos.

8 Related Works [2] Samira Ebrahimi Kahou, Vincent Michalski, and Roland Memisevic. RATM: Recurrent attentive tracking model. arXiv preprint arXiv: , 2015.
Differences: [2] uses a traditional RNN as an attention scheme. In contrast, we directly regress coordinates or heatmaps instead of using sub-region classifiers, and we use the LSTM for end-to-end spatio-temporal regression with a single evaluation.

9 Related Works Many related works [3, 4, 5] have appeared since our work was posted on arXiv in July 2016. Some [3, 4] extend our proposed YOLO + LSTM scheme with multi-target tracking and reinforcement learning, and [4] appears to be built upon our open-sourced code. Others [5] are similar but independent.
[3] Dan Iter et al. Target Tracking with Kalman Filtering, KNN and LSTMs, December 2016.
[4] Da Zhang et al. Deep Reinforcement Learning for Visual Object Tracking in Videos, April 2017.
[5] Anton Milan et al. Online Multi-Target Tracking Using Recurrent Neural Networks, December 2016.

10 Overview Figure 2: Flowchart of the proposed algorithm
Motives: interpret the high-level visual features produced by YOLO and regress them into object locations, and incorporate an LSTM module that also takes into account the temporal flow of these visual features.

11 Flowchart of the Proposed Algorithm
What is the network? Figure 3: Recurrent YOLO (ROLO). How do we incorporate YOLO and the recurrent module? (1) Extract high-level visual features with YOLO (a CNN with convolutional and pooling layers). (2) Use a fully connected (FC) layer to regress the features into target coordinates or heatmaps for spatial supervision. (3) Concatenate these with the visual features. (4) Feed the concatenated features into the LSTM modules.
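The four steps above can be sketched as a forward pass in NumPy. This is a hypothetical, untrained illustration of the data flow only, not the authors' implementation: the per-frame feature vectors are random stand-ins for YOLO features, the FC layer, the LSTM cell, and the output head use random weights, and all dimensions (`feat_dim`, `loc_dim=4` for `(x, y, w, h)`, `hidden_dim=32`) are assumptions. The names `LSTMCell` and `rolo_forward` are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class LSTMCell:
    """Minimal LSTM cell (forward pass only) with randomly initialized weights."""
    def __init__(self, input_dim, hidden_dim, rng):
        # One stacked weight matrix for the input, forget, output gates and the candidate.
        self.W = rng.normal(0.0, 0.1, size=(4 * hidden_dim, input_dim + hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        self.hidden_dim = hidden_dim

    def step(self, x, h, c):
        H = self.hidden_dim
        z = self.W @ np.concatenate([x, h]) + self.b
        i, f, o = sigmoid(z[:H]), sigmoid(z[H:2 * H]), sigmoid(z[2 * H:3 * H])
        g = np.tanh(z[3 * H:])
        c = f * c + i * g       # memory cell keeps information across frames
        h = o * np.tanh(c)      # hidden state passed to the next frame
        return h, c

def rolo_forward(frame_features, rng, loc_dim=4, hidden_dim=32):
    """Per-frame features -> FC location -> concat -> LSTM -> location prediction."""
    feat_dim = frame_features.shape[1]
    W_fc = rng.normal(0.0, 0.1, size=(loc_dim, feat_dim))     # hypothetical FC regression layer
    lstm = LSTMCell(feat_dim + loc_dim, hidden_dim, rng)
    W_out = rng.normal(0.0, 0.1, size=(loc_dim, hidden_dim))  # hypothetical output head
    h, c = np.zeros(hidden_dim), np.zeros(hidden_dim)
    predictions = []
    for feat in frame_features:                # step (1): features assumed precomputed
        loc = W_fc @ feat                      # step (2): regress features into a location
        x = np.concatenate([feat, loc])        # step (3): concatenate with the visual features
        h, c = lstm.step(x, h, c)              # step (4): feed into the LSTM
        predictions.append(W_out @ h)          # tracking prediction for this frame
    return np.stack(predictions)

# Usage: six frames of 64-d stand-in features.
rng = np.random.default_rng(0)
features = rng.normal(size=(6, 64))
print(rolo_forward(features, rng).shape)  # (6, 4)
```

Because the hidden and cell states carry over between frames, each prediction depends on both the current features and the history of earlier frames, which is the property the LSTM slide relies on.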

12 LSTM Why is LSTM useful in this network?
A traditional Kalman filter only takes location histories into account. The memory of an LSTM can store visual dynamics as well as location histories: high-level visual features and locations over frames (the LSTM input) are jointly regressed into tracking predictions (the LSTM output). This makes the tracker more robust in occlusion situations.
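To make the contrast concrete, here is a standard constant-velocity Kalman filter over the box center. It is a swapped-in illustrative baseline, not the exact [YOLO + Kalman] configuration compared in the results, and the noise values and the class name `ConstantVelocityKalman` are hypothetical. When the detector misses the target (`measurement=None`), the filter can only extrapolate from the location history; it never sees visual features.

```python
import numpy as np

class ConstantVelocityKalman:
    """Kalman filter over the state (cx, cy, vx, vy) of a box center."""
    def __init__(self, cx, cy, process_var=1.0, meas_var=4.0):
        self.x = np.array([cx, cy, 0.0, 0.0])
        self.P = np.eye(4) * 10.0                       # initial state uncertainty
        self.F = np.array([[1, 0, 1, 0],
                           [0, 1, 0, 1],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)  # constant velocity, dt = 1 frame
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)  # only the center is observed
        self.Q = np.eye(4) * process_var
        self.R = np.eye(2) * meas_var

    def step(self, measurement=None):
        # Predict: uses nothing but the previous state (the location history).
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        if measurement is not None:
            # Correct with the detector's box center when a detection exists.
            y = np.asarray(measurement, dtype=float) - self.H @ self.x
            S = self.H @ self.P @ self.H.T + self.R
            K = self.P @ self.H.T @ np.linalg.inv(S)
            self.x = self.x + K @ y
            self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2].copy()

# Usage: the target moves 2 px/frame, then the detector misses it for 3 frames.
kf = ConstantVelocityKalman(0.0, 0.0)
for t in range(1, 21):
    kf.step((2.0 * t, 0.0))
for _ in range(3):
    print(kf.step(None))  # extrapolates along the motion, close to x = 42, 44, 46
```

Straight-line extrapolation works for this toy trajectory, but under occlusion with abrupt motion the filter has no visual cue to recover from, which is the gap the LSTM's visual memory is meant to fill.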

13 Spatio-temporal Robustness Against Occlusion
Figure 5: Visualization with regression of locations (unseen frames). Green: ROLO; blue: YOLO; red: ground truth.

14 Spatio-temporal Robustness Against Occlusion
ROLO is effective for several reasons: (1) the representational power of the high-level visual features from ConvNets; (2) the feature interpretation power of the LSTM, and hence its ability to detect visual objects; (3) spatial supervision by a location or heatmap vector; (4) the capability of the LSTM to regress effectively with spatio-temporal information.
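Spatial supervision by a heatmap vector can be illustrated by rasterizing a target box onto a coarse grid and flattening it. This is a sketch under assumptions: the box format `(x, y, w, h)`, the 32 x 32 grid, the "cell center inside the box" rule, and the mean-squared-error loss are all illustrative choices, and `box_to_heatmap` and `heatmap_loss` are hypothetical names rather than the paper's code.

```python
import numpy as np

def box_to_heatmap(box, img_w, img_h, grid=32):
    """Rasterize an (x, y, w, h) box into a flattened grid x grid heatmap vector.

    A cell is 1.0 if its center lies inside the box, otherwise 0.0.
    """
    x, y, w, h = box
    # Centers of the grid cells, in image coordinates.
    cx = (np.arange(grid) + 0.5) * img_w / grid
    cy = (np.arange(grid) + 0.5) * img_h / grid
    inside_x = (cx >= x) & (cx < x + w)
    inside_y = (cy >= y) & (cy < y + h)
    heatmap = np.outer(inside_y, inside_x).astype(np.float32)  # rows = y, columns = x
    return heatmap.ravel()

def heatmap_loss(pred, target):
    # Mean squared error, used here as an assumed spatial-supervision loss.
    return float(np.mean((pred - target) ** 2))

# Usage: target heatmap for a box in a 320 x 240 frame.
target = box_to_heatmap((40, 30, 120, 90), img_w=320, img_h=240)
print(target.shape)  # (1024,)
```

A heatmap target, unlike four box coordinates, lets the network express spread-out or ambiguous evidence, which matches the minor heatmap noise noted when a similar target is in view.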

15 Spatio-temporal Robustness Against Occlusion
The figure shows that ROLO tracks the object through near-complete occlusion. Two similar targets appear in this video; ROLO tracks the correct one because the detection module inherently feeds the LSTM unit with a spatial constraint. Between frames 47 and 60, YOLO fails to detect the target, but ROLO does not lose track. When no detection is present, the heatmap contains minor noise because the similar target is still in sight. Figure 6: Visualization with regression of heatmaps (unseen videos).

16 Results: Tracking Results on the OTB Dataset
Figure 7: Tracking results. Ground-truth bounding boxes are shown in red, the detection results in blue, and the ROLO output from the LSTM modules in green.

17 Results: Comparing with Other Methods
Due to fast motion, occlusions, and the resulting poor detections, YOLO with the Kalman filter performs worse, since it lacks knowledge of the visual context. The LSTM is capable of regressing both visual context and location histories, and performs better than [YOLO + Kalman]. Figure 8: Success plot. The area under curve (AUC) score is shown at the top right.

18 Results: Steps vs. Speed
The frame rate (frames per second, fps) drops as the number of steps increases. Steps vs. accuracy: the accuracy does not appear to increase continually as the number of LSTM steps increases.

19 Conclusion and Contributions of This Paper
Our proposed ROLO method extends deep neural network learning and analysis into the spatio-temporal domain. We have also studied the LSTM's ability to interpret and regress high-level visual features. Our proposed tracker is both spatially and temporally deep, and can effectively tackle major occlusion and severe motion blur.

20 THANKS!
