On-going Research on Object Detection (*some modifications after the seminar)


On-going research on Object Detection *Some modification after seminar Tackgeun YOU

Contents
- Baseline algorithm: Fast R-CNN
- Observations & proposals
- Fast R-CNN on Microsoft COCO

Object Detection
Definition: predict the location and label of the objects in the scene.
Traditional pipeline:
1. Approximate the search space with a sliding window or object proposals.
2. Evaluate the approximated regions.
3. Apply non-maximum suppression to obtain the final regions.

R-CNN (CVPR'14)
1. Object proposals: approximate the search space.
2. Fine-tuned CNN features + SVM: score each region.
3. Bounding-box regression: refine each region.
4. Non-maximum suppression.
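The last stage of the pipeline, non-maximum suppression, can be sketched as a greedy procedure. This is a minimal NumPy sketch under the assumption that boxes are given as [x1, y1, x2, y2]; it is not the exact implementation used in R-CNN.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes; boxes are [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_boxes = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_box + area_boxes - inter)

def nms(boxes, scores, iou_thresh=0.3):
    """Greedy non-maximum suppression: repeatedly keep the highest-scoring
    remaining box and discard all boxes overlapping it above iou_thresh."""
    order = np.argsort(scores)[::-1]  # proposal indices by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        order = rest[iou(boxes[i], boxes[rest]) <= iou_thresh]
    return keep
```

The heavily overlapping lower-scored box is suppressed, while distant boxes survive regardless of score.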

Training Pipeline of R-CNN
- Supervised pre-training: image-level annotations from ILSVRC 2012.
- Domain-specific fine-tuning: mini-batches of 128 samples; 32 positive samples (region proposals with IoU ≥ 0.5) and 96 negative samples (the rest).
- Object category classifier (SVM): positives are ground-truth boxes only; negatives have IoU < 0.3; hard negative mining.
- Bounding-box regression: uses nearby samples (the proposal with maximum overlap, IoU ≥ 0.6); ridge regression (regularization is important); iterating the regression does not improve the result.
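The fine-tuning mini-batch construction can be sketched as follows. This is a hypothetical helper, assuming each proposal's maximum IoU with any ground-truth box has already been computed.

```python
import numpy as np

def sample_minibatch(max_ious, pos_thresh=0.5, n_pos=32, n_neg=96, seed=0):
    """Split region proposals into fine-tuning positives/negatives by their
    maximum IoU with any ground-truth box: proposals with IoU >= pos_thresh
    are positive, the rest negative; then sample up to 32 / 96 of each."""
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(max_ious >= pos_thresh)
    neg = np.flatnonzero(max_ious < pos_thresh)
    pos = rng.choice(pos, size=min(n_pos, pos.size), replace=False)
    neg = rng.choice(neg, size=min(n_neg, neg.size), replace=False)
    return pos, neg
```

Note the contrast with the SVM stage, which uses a stricter definition (ground truth only as positives, IoU < 0.3 as negatives).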

Fast R-CNN (arXiv'15)
Training is single-stage (cf. R-CNN's multi-stage pipeline).
Multi-task loss:
  L(p, k*, t, t*) = L_cls(p, k*) + λ·[k* ≥ 1]·L_loc(t, t*)
Cross-entropy loss:
  L_cls(p, k*) = −log p_{k*}
Notation: k* is the true class label, t* the true bounding-box regression target, and t = wᵀφ(I) the predicted location.

Fast R-CNN (arXiv'15), continued
Multi-task loss:
  L(p, k*, t, t*) = L_cls(p, k*) + λ·[k* ≥ 1]·L_loc(t, t*),  with  L_cls(p, k*) = −log p_{k*}
Smooth regression loss:
  L_loc(t, t*) = Σ_{i ∈ {x,y,w,h}} smooth_L1(t_i − t_i*)
  smooth_L1(x) = 0.5·x²  if |x| ≤ 1,  |x| − 0.5  otherwise
Regression targets (G_k: ground-truth box, P_k: predicted box, k ∈ {x, y, w, h}):
  t_x = (G_x − P_x) / P_w,  t_y = (G_y − P_y) / P_h,  t_w = log(G_w / P_w),  t_h = log(G_h / P_h)
Targets are constructed by whitening the ground truth; t = wᵀφ(I) is the predicted location.
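The target definitions above translate directly into code. A minimal sketch, assuming the center-size (cx, cy, w, h) box parameterization:

```python
import math

def bbox_targets(P, G):
    """Regression targets t = (tx, ty, tw, th) mapping proposal P to ground
    truth G, with both boxes parameterized as (cx, cy, w, h): the center
    offsets are normalized by the proposal size, the scale terms are logs."""
    px, py, pw, ph = P
    gx, gy, gw, gh = G
    return ((gx - px) / pw, (gy - py) / ph,
            math.log(gw / pw), math.log(gh / ph))
```

The log parameterization makes the width/height targets scale-invariant: doubling both boxes leaves t_w and t_h unchanged.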

Fast R-CNN (arXiv'15): why smooth L1?
Smooth regression loss:
  L_loc(t, t*) = Σ_{i ∈ {x,y,w,h}} smooth_L1(t_i − t_i*)
  smooth_L1(x) = 0.5·x²  if |x| ≤ 1,  |x| − 0.5  otherwise
Training with an L2 loss instead requires careful tuning of the learning rate to prevent exploding gradients; smooth L1 bounds the gradient for large errors.
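The piecewise definition above is straightforward to implement; a minimal sketch:

```python
def smooth_l1(x):
    """Smooth L1 loss: quadratic near zero (like L2), linear in the tails
    (like L1), so the gradient magnitude is bounded by 1 for large errors."""
    ax = abs(x)
    return 0.5 * x * x if ax <= 1 else ax - 0.5
```

The two branches match in value and slope at |x| = 1 (both give 0.5 and unit slope), so the loss is continuously differentiable.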

Exploring VOC with Fast R-CNN
Observation: the detector fails to localize contiguous (adjacent) objects.
Hypothesis: a region covering multiple objects receives a higher confidence than a single-object region.
Experiment: check whether the maximum confidence lies on the tight object box, using MCMC iterations started from the ground truth.

[Plot legend] Red: ratio(IoU > 0.5); Blue: mean(IoU); Magenta: ratio(IoU > 0.5); Black: ratio(IoU < 0.3)

Desired behavior: tailoring confidence for precise localization
1. Whole body of a single object (highest)
2. Partial body of a single object (positive)
3. Region overlapping multiple objects (?)
4. Other classes (lowest)

Detailed Plans
- How should multiple-object regions be handled, and how should they be defined?
- Using Fast R-CNN: fine-tune with multi-object regions as negative samples in the mini-batch. Possible failure: this may decrease overall performance even while reducing the over-confidence on multiple-object regions.
- Adopting a more suitable loss function, e.g., a ranking loss.

Microsoft COCO
80 classes; Train (82,783 images), Validation (40,504), Test (81,434).

Split           #imgs   Submission   Score reported
Test-Dev        ~20K    Unlimited    Immediately
Test-Standard           Limited
Test-Challenge                       Workshop
Test-Reserve                         Never

ref. Microsoft COCO: Common Objects in Context

ref. What makes for effective detection proposals?

Fast R-CNN with 1k MCG proposals, 240k iterations (5.8 epochs on train)

Fast R-CNN with 1k MCG proposals, 240k + 130k iterations (6.4 epochs on val)

Processing Time of Fast R-CNN
Testing speed (with MCG @1k): 1.872 s/image
- ~21.06 hours on the validation set
- ~10 hours on the test-dev set
- ~42.35 hours on the test set
Training speed: 0.564 s/iteration (~6.48 hours per epoch on the training set)
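The hour figures follow from the per-image time and the split sizes given on the COCO slide (40,504 validation and 81,434 test images); a quick arithmetic check:

```python
def hours(seconds_per_item, n_items):
    """Convert a per-item processing time into total wall-clock hours."""
    return seconds_per_item * n_items / 3600

val_hours = hours(1.872, 40504)    # validation set, about 21.06 h
test_hours = hours(1.872, 81434)   # test set, about 42.35 h
```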

End

Samples
- http://mscoco.org/explore/?id=407286
- http://mscoco.org/explore/?id=161602
- http://mscoco.org/explore/?id=123835
- http://mscoco.org/explore/?id=242673

Label Difference between Fine-tuning & SVM Training
- Domain-specific fine-tuning: mini-batches of 128 samples; 32 positives (region proposals with IoU ≥ 0.5) and 96 negatives (the rest).
- Object category classifier (SVM): positives are ground-truth boxes only; the negative IoU threshold was swept over {0.0, 0.1, 0.2, 0.3, 0.4, 0.5} and selected by mAP on the validation set (0.0 → −4%, 0.5 → −5% relative to the chosen 0.3).
- Hard negative mining (fitting the training set directly is impossible).

Conjecture
- The definition of positive examples used in fine-tuning does not emphasize precise localization.
- The softmax classifier was trained on randomly sampled negative examples rather than on the subset of "hard negatives" used for SVM training.

Fast R-CNN (arXiv'15)
Training is single-stage (cf. R-CNN): fine-tuning with a multi-task loss that combines bounding-box regression and detection (classification).