Siamese Instance Search for Tracking

Slides:



Advertisements
Similar presentations
Request Dispatching for Cheap Energy Prices in Cloud Data Centers
Advertisements

SpringerLink Training Kit
Luminosity measurements at Hadron Colliders
From Word Embeddings To Document Distances
Choosing a Dental Plan Student Name
Virtual Environments and Computer Graphics
Chương 1: CÁC PHƯƠNG THỨC GIAO DỊCH TRÊN THỊ TRƯỜNG THẾ GIỚI
THỰC TIỄN KINH DOANH TRONG CỘNG ĐỒNG KINH TẾ ASEAN –
D. Phát triển thương hiệu
NHỮNG VẤN ĐỀ NỔI BẬT CỦA NỀN KINH TẾ VIỆT NAM GIAI ĐOẠN
Điều trị chống huyết khối trong tai biến mạch máu não
BÖnh Parkinson PGS.TS.BS NGUYỄN TRỌNG HƯNG BỆNH VIỆN LÃO KHOA TRUNG ƯƠNG TRƯỜNG ĐẠI HỌC Y HÀ NỘI Bác Ninh 2013.
Nasal Cannula X particulate mask
Evolving Architecture for Beyond the Standard Model
HF NOISE FILTERS PERFORMANCE
Electronics for Pedestrians – Passive Components –
Parameterization of Tabulated BRDFs Ian Mallett (me), Cem Yuksel
L-Systems and Affine Transformations
CMSC423: Bioinformatic Algorithms, Databases and Tools
Some aspect concerning the LMDZ dynamical core and its use
Bayesian Confidence Limits and Intervals
实习总结 (Internship Summary)
Current State of Japanese Economy under Negative Interest Rate and Proposed Remedies Naoyuki Yoshino Dean Asian Development Bank Institute Professor Emeritus,
Front End Electronics for SOI Monolithic Pixel Sensor
Face Recognition Monday, February 1, 2016.
Solving Rubik's Cube By: Etai Nativ.
CS284 Paper Presentation Arpad Kovacs
انتقال حرارت 2 خانم خسرویار.
Summer Student Program First results
Theoretical Results on Neutrinos
HERMESでのHard Exclusive生成過程による 核子内クォーク全角運動量についての研究
Wavelet Coherence & Cross-Wavelet Transform
yaSpMV: Yet Another SpMV Framework on GPUs
Creating Synthetic Microdata for Higher Educational Use in Japan: Reproduction of Distribution Type based on the Descriptive Statistics Kiyomi Shirakawa.
MOCLA02 Design of a Compact L-­band Transverse Deflecting Cavity with Arbitrary Polarizations for the SACLA Injector Sep. 14th, 2015 H. Maesaka, T. Asaka,
Hui Wang†*, Canturk Isci‡, Lavanya Subramanian*,
Fuel cell development program for electric vehicle
Overview of TST-2 Experiment
Optomechanics with atoms
داده کاوی سئوالات نمونه
Inter-system biases estimation in multi-GNSS relative positioning with GPS and Galileo Cecile Deprez and Rene Warnant University of Liege, Belgium  
ლექცია 4 - ფული და ინფლაცია
10. predavanje Novac i financijski sustav
Wissenschaftliche Aussprache zur Dissertation
FLUORECENCE MICROSCOPY SUPERRESOLUTION BLINK MICROSCOPY ON THE BASIS OF ENGINEERED DARK STATES* *Christian Steinhauer, Carsten Forthmann, Jan Vogelsang,
Particle acceleration during the gamma-ray flares of the Crab Nebular
Interpretations of the Derivative Gottfried Wilhelm Leibniz
Advisor: Chiuyuan Chen Student: Shao-Chun Lin
Widow Rockfish Assessment
SiW-ECAL Beam Test 2015 Kick-Off meeting
On Robust Neighbor Discovery in Mobile Wireless Networks
Chapter 6 并发:死锁和饥饿 Operating Systems: Internals and Design Principles
You NEED your book!!! Frequency Distribution
Y V =0 a V =V0 x b b V =0 z
Fairness-oriented Scheduling Support for Multicore Systems
Climate-Energy-Policy Interaction
Hui Wang†*, Canturk Isci‡, Lavanya Subramanian*,
Ch48 Statistics by Chtan FYHSKulai
The ABCD matrix for parabolic reflectors and its application to astigmatism free four-mirror cavities.
Measure Twice and Cut Once: Robust Dynamic Voltage Scaling for FPGAs
Online Learning: An Introduction
Factor Based Index of Systemic Stress (FISS)
What is Chemistry? Chemistry is: the study of matter & the changes it undergoes Composition Structure Properties Energy changes.
THE BERRY PHASE OF A BOGOLIUBOV QUASIPARTICLE IN AN ABRIKOSOV VORTEX*
Quantum-classical transition in optical twin beams and experimental applications to quantum metrology Ivano Ruo-Berchera Frascati.
The Toroidal Sporadic Source: Understanding Temporal Variations
FW 3.4: More Circle Practice
ارائه یک روش حل مبتنی بر استراتژی های تکاملی گروه بندی برای حل مسئله بسته بندی اقلام در ظروف
Decision Procedures Christoph M. Wintersteiger 9/11/2017 3:14 PM
Limits on Anomalous WWγ and WWZ Couplings from DØ
Presentation transcript:

Siamese Instance Search for Tracking 荷蘭 阿姆斯特丹大學 Ran Tao, Efstratios Gavves, Arnold W.M. Smeulders QUVA Lab, Science Park 904, 1098 XH Amsterdam CVPR 2016

Outline Introduction Siamese network architecture Siamese Instance Search Tracker Experiments Conclusion

Outline Introduction Siamese network architecture Siamese Instance Search Tracker Experiments Conclusion

Introduction At the core of many tracking algorithms is the function by which the image of the target is matched to the incoming frames We learn the matching mechanism from external videos that contain various disturbing factors the invariances without modeling these invariances Once the matching function has been learnt on the external data we do not adapt it anymore do not learn the tracking targets we apply it as is to new tracking videos of previously unseen target objects

Introduction Without explicit occlusion detection , model updating , tracker combination , forget mechanisms and other We simply match the initial target in the first frame with the candidates in a new frame

Outline Introduction Siamese network architecture Siamese Instance Search Tracker Experiments Conclusion

Siamese network architecture Siam 暹(ㄒㄧㄢ)羅 => now Thailand Siamese twins Siamese network [3] J. Bromley, J. W. Bentz, L. Bottou, I. Guyon, Y. LeCun, C. Moore, E. Sackinger, and R. Shah. Signature verification using a siamese time delay neural network. International Journal of Pattern Recognition and Artificial Intelligence, 7(04):669–688, 1993 1811年-1874 恩(Eng)昌(Chang) 馬戲團

Siamese network architecture Cosine distance Shared weights Time Delay Neural Networks 1 for genuine pairs -1 for forgery pairs

Siamese network architecture Loss function : 𝑙(d( 𝑥 𝑗 , 𝑥 𝑘 )) d( 𝑥 𝑗 , 𝑥 𝑘 ) = || 𝑓 𝑥 𝑗 −𝑓( 𝑥 𝑘 ) || 2 𝑓 𝑥 𝑗 𝑓 𝑥 𝑘 Siamese network CNN CNN W Patch 𝑥 𝑗 Patch 𝑥 𝑘

Outline Introduction Siamese network architecture Siamese Instance Search Tracker Experiments Conclusion

Siamese Instance Search Tracker 1 2 3 Candidate sampling Box refinement

random pick another frame random pick one frame AlexNet VGGNet max pooling reduce location precision lower level visual patterns, such as edges and angles higher layers get activated the most on more complex patterns, such as faces and wheels random pick another frame overlap> ρ + : positive (y=1) overlap< ρ − : negative (y=0) random pick one frame

Siamese Instance Search Tracker 𝑦 𝑗𝑘 𝜖 {0,1} 𝜖 : the minimum distance margin Euclidean distance

Siamese Instance Search Tracker Tracking Inference Candidate sampling Radius sampling strategy Box refinement As in RCNN, train {x, y, w, h} of the box based on the first frame

Outline Introduction Siamese network architecture Siamese Instance Search Tracker Experiments Conclusion

Experiments Implementation Details Candidate Sampling 10 radial and 10 angular divisions Search radius : the longer axis of the initial box in the first frame Scales : { 2 2 , 1, 2 }

pairs of boxes/pair of frame [41] A. W. Smeulders, D. M. Chu, R. Cucchiara, S. Calderara, A. Dehghan, and M. Shah. Visual tracking: an experimental survey. TPAMI, 36(7):1442–1468, 2014 Experiments Implementation Details Network training Load the pre-trained network parameters (ImageNet) and fine tune (Learning rate & Weight decay = 0.001) ALOV dataset [41] - removing the 12 videos (OTB) Positive : IoU > 0.7 Negative : IoU < 0.5 decreased by a factor of 10 after every 2 epochs pairs of frames pairs of boxes/pair of frame Training 60,000 128 Validation 2,000

Experiments Dataset Online tracking benchmark (OTB) [52] 50 videos [52] Y. Wu, J. Lim, and M.-H. Yang. Online object tracking: A benchmark. In CVPR, 2013 Experiments Dataset Online tracking benchmark (OTB) [52] 50 videos Evaluation metrics Success plot : Bounding boxes IoU (AUC) Precision plot : Bounding boxes center (Prec@20)

Experiments Design evaluation Network pre-tuned on ImageNet v.s. network tuned generically on external video data v.s. network fine tuned target-specifically on first frame To max pool or not to max pool? Multi-layer features vs. single-layer features Large net vs. small net

Experiments marginal improvement fine tuned using large amount of external data are to be preferred

Experiments To max pool or not to max pool?

Experiments Multi-layer features vs. single-layer features uses the outputs of layers conv4, conv5 and fc6 as features

Experiments Large net vs. small net

Experiments State-of-the-art comparison 29 trackers included in the benchmark [52] e.g. TLD [23], Struck [14] and SCM [57] recent trackers TGPR [9], MEEM [56], SO-DLT [47], KCFDP [19] and MUSTer [18] [18] Z. Hong, Z. Chen, C. Wang, X. Mei, D. Prokhorov, and D. Tao. Multi-store tracker (muster): A cognitive psychology inspired approach to object tracking. In CVPR, 2015

Experiments SINT+ : uses an adaptive candidate sampling strategy suggested by [48] and optical flow [4] The sampling range is adaptive to the image resolution 30/512*w (w is image width) Filter out motion inconsistent candidates estimated optical flow, remove the candidate boxes that contain less than 25% of those pixels focuses on the tracking matching function [48] N. Wang, J. Shi, D.-Y. Yeung, and J. Jia. Understanding and diagnosing visual tracking systems. In ICCV, 2015 [4] T. Brox and J. Malik. Large displacement optical flow: descriptor matching in variational motion estimation. TPAMI, 33(3):500–513, 2011

Experiments

Experiments Temporal and spatial robustness

Experiments Per distortion type comparison occlusion and deformation motion blur, fast motion, in-plane rotation, out of view and low resolution

Experiments Failure modes of SINT similar confusing object occlusion same uniform occluded by the lighter

Experiments Additional sequences and re-identification 6 newly collected sequences from YouTube

Experiments Fishing Rally BirdAttack Soccer GD Dancing scale change ✓ fast motion out-of-plane rotation non-rigid deformation low contrast illumination variation poorly textured

Experiments

Experiments We, furthermore, observe that provided a window sampling over the whole image using [58] SINT is accurate in target re-identification, after the target was missing for a significant amount of time from the video (Yoda) [58] C. L. Zitnick and P. Dollar. Edge boxes: Locating object proposals from edges. In ECCV, 2014

1500-frame, 12-shot Star Wars video appearing in 6 of the shots Red : present Black : absent

Outline Introduction Siamese network architecture Siamese Instance Search Tracker Experiments Conclusion

Conclusion Learn a matching function matching the initial target in the first frame with candidates in a new frame The matching function is learned on ALOV, based on the proposed Siamese network No overlap between the training videos and any of the videos for evaluation Do not aim to do any pre-learning of the tracking targets

Conclusion Reaches state-of-the-art performance on OTB without updating the target, tracker combination, occlusion detection and alike Further, SINT allows for target re-identification after the target was absent for a complete shot

Thanks for listening!