Weak-supervision based Multi-Object Tracking

Presentation transcript:

Weak-supervision based Multi-Object Tracking
Alex Ruiz, Jyoti Kini, Dr. Mubarak Shah
University of Central Florida

Project Description
Goal: Solve the Multiple Object Tracking (MOT) problem in a sequence of frames by finding dense correspondences between pairs of images in a weakly-supervised manner. The proposed network tracks the object detections by extracting pixel-to-pixel correspondences and generates tracklets per object by associating labels with the objects.
Steps:
- Obtain dense pixel-to-pixel matches between a set of images using the proposed similarity model.
- Key-point matching: extract the set of relevant key-points/image-pixels per object, then find the tracklets by mapping those pixels to the object detections (an illustrative association rule is sketched after the references).
- Tracklet association: introduce tracklet-history based cascading to account for occlusions and to reduce identity switches.

Architecture
Training:
- ResNet101 extracts features for each input image.
- A Self-Attention Module retains geometric patterns and long-range dependencies.
- A 4D correlation tensor represents the similarity matrix between a pair of image features.
- A Neighbourhood Consensus Network using 4D convolutions eliminates inconsistent matches to further refine the affinity between image pairs.

Loss: For a training pair (Ia, Ib), the weakly supervised loss is computed as

    L(Ia, Ib) = -y (s̄a + s̄b)

where s̄a and s̄b denote the mean matching scores; positive pairs are labeled with y = 1 and negative pairs with y = -1. (A minimal sketch of the correlation tensor and this loss is given after the references.)

Dataset
The MOT17 dataset primarily comprises 14 video sequences featuring crowded scenarios, camera motion, varying viewpoints, challenging weather conditions, and a balanced distribution of crowd density across the training and test sets. The dataset also provides object detections from existing detectors: DPM, FRCNN and SDP. Each datapoint in the detection CSV file has the format: frame number, identity number, bounding box coordinates (left, top, width, height), confidence score, class, visibility. (A sketch of a reader for this format follows the references.)
Detection annotation example: 1, 2, 164.1, 19.6, 66.5, 163.2, 1, 1, 0.5

Quantitative Results
Based on the reported quantitative results, the loss decreases and converges in both the training and testing phases, and the self-attention module improves the performance of the network.

Qualitative Results
Qualitative tracking examples are presented as figures on the poster.

Conclusion
The proposed model, consisting of a Self-Attention based module followed by the Neighbourhood Consensus Network, effectively captures both the global long-term dependencies and the local context to enable Multiple Object Tracking.

Future Work
We intend to implement a Multi-head Attention module to further improve the key-point matches.

References
[1] Rocco, I., Cimpoi, M., Arandjelović, R., Torii, A., Pajdla, T., & Sivic, J. (2018). Neighbourhood Consensus Networks. In Advances in Neural Information Processing Systems (pp. 1651-1662).
[2] Milan, A., Leal-Taixé, L., Reid, I., Roth, S., & Schindler, K. (2016). MOT16: A benchmark for multi-object tracking. arXiv preprint arXiv:1603.00831.
[3] Zhang, H., Goodfellow, I., Metaxas, D., & Odena, A. (2018). Self-attention generative adversarial networks. arXiv preprint arXiv:1805.08318.
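To make the Architecture and Loss sections concrete, below is a minimal PyTorch sketch of the 4D correlation tensor between a pair of feature maps and of the weakly supervised loss L(Ia, Ib) = -y(s̄a + s̄b). The poster only states that s̄a and s̄b are mean matching scores, so computing them as the mean of the best match per location (in the spirit of [1]) is an assumption here, as are the tensor shapes and function names.

```python
import torch
import torch.nn.functional as F

def correlation_4d(feat_a, feat_b):
    """Pairwise similarity between two L2-normalised feature maps.

    feat_a, feat_b: (B, C, H, W) tensors, e.g. ResNet101 features of the
    two images (same spatial size is assumed here for simplicity).
    Returns a 4D correlation tensor of shape (B, H, W, H, W).
    """
    b, c, h, w = feat_a.shape
    fa = F.normalize(feat_a, dim=1).view(b, c, h * w)   # (B, C, HW)
    fb = F.normalize(feat_b, dim=1).view(b, c, h * w)   # (B, C, HW)
    corr = torch.bmm(fa.transpose(1, 2), fb)            # (B, HW, HW)
    return corr.view(b, h, w, h, w)

def weak_supervision_loss(corr, y):
    """L(Ia, Ib) = -y * (s_bar_a + s_bar_b).

    s_bar_a / s_bar_b are taken here as the mean of the best matching
    score per source / target location (an assumption; the poster only
    calls them mean matching scores). y is +1 for a positive pair and
    -1 for a negative pair.
    """
    b, h, w, _, _ = corr.shape
    flat = corr.view(b, h * w, h * w)
    s_bar_a = flat.max(dim=2).values.mean(dim=1)  # best match per pixel of Ia
    s_bar_b = flat.max(dim=1).values.mean(dim=1)  # best match per pixel of Ib
    return (-y * (s_bar_a + s_bar_b)).mean()
```

In use, feat_a and feat_b would come from the shared backbone for the two frames of a training pair, the correlation tensor would then be refined by the 4D-convolution Neighbourhood Consensus Network, and y would carry the pair label defined in the Loss section.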
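The detection CSV format described in the Dataset section can be read as follows. This is an illustrative parser written for this transcript; the Detection class and load_detections function are hypothetical names, not part of any MOT17 toolkit, but the field order matches the annotation example "1, 2, 164.1, 19.6, 66.5, 163.2, 1, 1, 0.5".

```python
import csv
from dataclasses import dataclass

@dataclass
class Detection:
    frame: int        # frame number
    track_id: int     # identity number
    left: float       # bounding box: left
    top: float        # bounding box: top
    width: float      # bounding box: width
    height: float     # bounding box: height
    conf: float       # confidence score
    cls: int          # class
    visibility: float # visibility

def load_detections(path):
    """Parse a MOT17-style detection CSV file into Detection records."""
    detections = []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            row = [x.strip() for x in row]
            detections.append(Detection(
                frame=int(row[0]), track_id=int(row[1]),
                left=float(row[2]), top=float(row[3]),
                width=float(row[4]), height=float(row[5]),
                conf=float(row[6]), cls=int(row[7]),
                visibility=float(row[8]),
            ))
    return detections
```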
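The key-point matching step ("find the tracklets by mapping the pixels to the object detections") can be illustrated with a simple vote-counting rule that propagates identities from one frame to the next. This is only a plausible sketch, not the poster's exact procedure, and it omits the tracklet-history based cascading used to handle occlusions and identity switches.

```python
def associate_by_matches(dets_a, dets_b, matches):
    """Greedy identity propagation from frame A to frame B.

    dets_a, dets_b: lists of (left, top, width, height) detection boxes.
    matches: iterable of ((xa, ya), (xb, yb)) pixel correspondences
    produced by the similarity model.
    Returns {index of detection in B: index of detection in A}.
    """
    def box_of(point, boxes):
        # Return the index of the first box containing the point, if any.
        x, y = point
        for i, (l, t, w, h) in enumerate(boxes):
            if l <= x <= l + w and t <= y <= t + h:
                return i
        return None

    # Count how many correspondences link each pair of detections.
    votes = {}
    for pa, pb in matches:
        ia, ib = box_of(pa, dets_a), box_of(pb, dets_b)
        if ia is not None and ib is not None:
            votes[(ia, ib)] = votes.get((ia, ib), 0) + 1

    # Greedily keep the strongest one-to-one assignments.
    assignment, used_a, used_b = {}, set(), set()
    for (ia, ib), _ in sorted(votes.items(), key=lambda kv: -kv[1]):
        if ia not in used_a and ib not in used_b:
            assignment[ib] = ia  # detection ib in B inherits the identity of ia in A
            used_a.add(ia)
            used_b.add(ib)
    return assignment
```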