On-going Research on Object Detection (*some modifications after the seminar)
Tackgeun YOU
Contents
- Baseline Algorithm: Fast R-CNN
- Observations & Proposals
- Fast R-CNN in Microsoft COCO
Object Detection
Definition
- Predict the location/label of objects in the scene
Traditional Pipeline
- Approximate a search space by Sliding Window or Object Proposals
- Evaluate the approximated regions
- Non-maximum suppression to obtain the final regions
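The non-maximum suppression step of the pipeline can be sketched as follows; a minimal NumPy implementation (the `[x1, y1, x2, y2]` box format and the 0.5 threshold are illustrative assumptions, not from the slides):

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression.

    boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) confidences.
    Returns indices of the boxes kept, highest score first.
    """
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]  # indices sorted by descending confidence
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of the top-scoring box with the remaining boxes
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # Drop every box that overlaps the kept box too much
        order = order[1:][iou <= iou_thresh]
    return keep
```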
R-CNN (CVPR '14)
- Object Proposals: approximate the search space
- Fine-tuned CNN Features + SVM: score each region
- Bounding Box Regression: refine each region
- Non-maximum Suppression
Training Pipeline of R-CNN
Supervised Pre-training
- Image-level annotation in ILSVRC 2012
Domain-specific Fine-tuning
- Mini-batch with 128 samples: 32 positive samples (region proposals with IoU ≥ 0.5) and 96 negative samples (the rest)
Object Category Classifier (SVM)
- Positive: only ground truth; Negative: IoU < 0.3
- Hard negative mining
Bounding Box Regression
- Uses nearby samples: proposals with maximum overlap (IoU ≥ 0.6)
- Ridge regression (regularization is important)
- Iterating the regression does not improve the result
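The two labeling schemes above can be sketched side by side; a minimal example (helper names are hypothetical; the `[x1, y1, x2, y2]` box format is an assumption):

```python
def iou(a, b):
    """IoU of two boxes in [x1, y1, x2, y2] format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def finetune_label(proposal, gt_box):
    # Fine-tuning: positive if IoU >= 0.5, negative otherwise.
    return 'pos' if iou(proposal, gt_box) >= 0.5 else 'neg'

def svm_label(proposal, gt_box, is_gt=False):
    # SVM: only ground-truth boxes are positive; IoU < 0.3 is negative;
    # the in-between band is ignored during SVM training.
    if is_gt:
        return 'pos'
    return 'neg' if iou(proposal, gt_box) < 0.3 else 'ignore'
```

A proposal covering half an object (IoU = 0.5) is a positive for fine-tuning but falls into the ignored band for the SVM, which is exactly the label difference discussed later in the deck.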
Fast R-CNN (arXiv '15)
Training is single-stage (cf. R-CNN). Multi-task loss:
$L(p, k^*, t, t^*) = L_{cls}(p, k^*) + \lambda\,[k^* \ge 1]\, L_{loc}(t, t^*)$
Cross-entropy loss:
$L_{cls}(p, k^*) = -\log p_{k^*}$, which for binary labels reduces to $-\frac{1}{N}\sum_{n=1}^{N}\bigl[p_n \log \hat{p}_n + (1 - p_n)\log(1 - \hat{p}_n)\bigr]$
- $k^*$: true class label
- $t^*$: true bounding-box regression target
- $t = w^T \phi(I)$: predicted location
Fast R-CNN (arXiv '15)
Training is single-stage (cf. R-CNN). Multi-task loss:
$L(p, k^*, t, t^*) = L_{cls}(p, k^*) + \lambda\,[k^* \ge 1]\, L_{loc}(t, t^*)$
$L_{cls}(p, k^*) = -\log p_{k^*}$
Smooth regression loss:
$L_{loc}(t, t^*) = \sum_{i \in \{x,y,w,h\}} \mathrm{smooth}_{L_1}(t_i - t_i^*)$
$\mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5 x^2 & \text{if } |x| \le 1 \\ |x| - 0.5 & \text{otherwise} \end{cases}$
Regression targets, with $G_k$ the ground-truth box and $P_k$ the predicted box, $k \in \{x, y, w, h\}$:
$t_x = (G_x - P_x)/P_w$, $t_y = (G_y - P_y)/P_h$, $t_w = \log(G_w/P_w)$, $t_h = \log(G_h/P_h)$
The targets are constructed by whitening the ground truth.
- $k^*$: true class label
- $t^*$: true bounding-box regression target
- $t = w^T \phi(I)$: predicted location
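The regression targets follow directly from the formulas above; a minimal sketch, assuming boxes are parameterized as (cx, cy, w, h) and omitting the whitening step:

```python
import math

def bbox_transform(P, G):
    """Regression targets t = (tx, ty, tw, th) from proposal P to ground truth G.

    Boxes are (cx, cy, w, h): center coordinates, width, height.
    Center offsets are normalized by the proposal size; scale
    changes are encoded in log space.
    """
    px, py, pw, ph = P
    gx, gy, gw, gh = G
    tx = (gx - px) / pw
    ty = (gy - py) / ph
    tw = math.log(gw / pw)
    th = math.log(gh / ph)
    return tx, ty, tw, th
```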
Fast R-CNN (arXiv '15): Smooth Regression Loss
$L_{loc}(t, t^*) = \sum_{i \in \{x,y,w,h\}} \mathrm{smooth}_{L_1}(t_i - t_i^*)$
$\mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5 x^2 & \text{if } |x| \le 1 \\ |x| - 0.5 & \text{otherwise} \end{cases}$
Training with an L2 loss requires careful tuning of the learning rate to prevent exploding gradients; the smooth L1 loss is less sensitive because its gradient magnitude never exceeds 1.
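The gradient-capping argument can be made concrete by comparing the two losses at a large regression error; a small sketch (function names are illustrative):

```python
def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear in the tails."""
    return 0.5 * x * x if abs(x) <= 1 else abs(x) - 0.5

def smooth_l1_grad(x):
    # Gradient is x inside [-1, 1] and sign(x) outside: its magnitude
    # never exceeds 1, so a large regression error cannot blow up the update.
    return x if abs(x) <= 1 else (1.0 if x > 0 else -1.0)

def l2_grad(x):
    # Gradient of 0.5 * x^2 is x itself: unbounded, so an outlier
    # target can produce an arbitrarily large update.
    return x
```

At an error of 10, the L2 gradient is 10 while the smooth L1 gradient is 1, which is why the L2 loss forces a smaller learning rate.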
Exploring VOC with Fast R-CNN
Observation
- Fails to localize contiguous objects
Hypothesis
- A region covering multiple objects gets a higher confidence than a tight single-object region
Experiment
- Check whether the confidence maximum lies on the tight object box
- MCMC iterations starting from the ground truth
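The experiment can be sketched as a greedy random walk around the ground-truth box (a simplified, hill-climbing variant of the MCMC probe described above); `score_fn` is a stand-in for the detector's confidence, and the step size is an assumption:

```python
import random

def probe_confidence(gt_box, score_fn, n_steps=1000, sigma=2.0, seed=0):
    """Random-walk probe around a ground-truth box.

    Perturbs the box coordinates and keeps moves that strictly raise
    the detector's confidence. If the walk drifts away from the tight
    ground-truth box, some looser (possibly multi-object) region
    scores higher than the single object, confirming the hypothesis.
    """
    rng = random.Random(seed)
    box, best = list(gt_box), score_fn(gt_box)
    for _ in range(n_steps):
        cand = [c + rng.gauss(0, sigma) for c in box]
        s = score_fn(cand)
        if s > best:  # greedy accept: only strictly better boxes
            box, best = cand, s
    return box, best
```

With a well-behaved toy scorer that peaks exactly at the ground truth, the walk stays put; a real detector whose confidence peaks elsewhere would drift away.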
(Plot legend) Red: ratio(IoU > 0.5); Blue: mean(IoU); Magenta: ratio(IoU > 0.5); Black: ratio(IoU < 0.3)
Desired condition: tailoring confidence for precise localization
1. Whole body of a single object (highest)
2. Partial body of a single object (positive)
3. Overlapped multiple objects (?)
4. Other classes (lowest)
Detailed Plans: dealing with multiple-object regions
- How to define a multiple-object region?
- Using Fast R-CNN: fine-tune with multi-object regions as negative samples in the batch
  - Possible failure: decreases overall performance, even while alleviating the over-confidence on multiple-object regions
- Adopting a proper loss function: ranking
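The "proper loss function / ranking" idea could, for instance, use a pairwise margin loss that pushes the whole-body region above partial and multi-object regions; a sketch (the margin value is an assumption, not from the slides):

```python
def margin_ranking_loss(score_hi, score_lo, margin=0.2):
    """Hinge loss pushing score_hi above score_lo by at least `margin`.

    score_hi: confidence of the region that should rank higher
              (e.g. the tight whole-body box).
    score_lo: confidence of the region that should rank lower
              (e.g. a multi-object or partial-body box).
    Returns 0 when the desired ordering already holds with margin.
    """
    return max(0.0, margin - (score_hi - score_lo))
```

Training on such pairs would directly encode the confidence ordering listed on the previous slide, instead of only labeling regions positive or negative.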
Microsoft COCO
80 classes; Train (82783 images), Validation (40504), Test (81434)

Split           #imgs   Submission   Score reported
Test-Dev        ~20K    Unlimited    Immediately
Test-Standard   ~20K    Limited      -
Test-Challenge  ~20K    -            At workshop
Test-Reserve    ~20K    -            Never
ref. Microsoft COCO: Common Objects in Context
ref. What makes for effective detection proposals?
Fast R-CNN with 1k-MCG proposals, 240k iterations (5.8 epochs on train)
Fast R-CNN with 1k-MCG proposals, 240k iterations + 130k iterations (6.4 epochs on val)
Processing Time of Fast R-CNN
Testing speed (~1.87 s/image)
- ~21.06 hours on the validation set
- ~10 hours on the test-dev set
- ~42.35 hours on the test set
Training speed
- 0.564 s/iteration (~6.48 hours per epoch on the training set)
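The reported training numbers are mutually consistent; a quick arithmetic check, assuming Fast R-CNN's default of two images per mini-batch (an assumption, not stated on the slide):

```python
# Sanity-check of the reported training speed against the epoch counts.
train_images = 82783          # COCO train split size from the earlier slide
sec_per_iter = 0.564          # reported training speed
imgs_per_batch = 2            # assumed Fast R-CNN default mini-batch

iters_per_epoch = train_images / imgs_per_batch   # ~41392 iterations
hours_per_epoch = iters_per_epoch * sec_per_iter / 3600  # ~6.48 hours
epochs_240k = 240000 / iters_per_epoch            # ~5.8 epochs
```

Both derived values match the slide's "~6.48 hours/epoch" and the earlier "240k-iters (5.8 epochs on train)", which supports the two-images-per-batch assumption.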
End
Samples http://mscoco.org/explore/?id=407286
Label Difference in Fine-tuning & SVM
Domain-specific Fine-tuning
- Mini-batch with 128 samples: 32 positive samples (region proposals with IoU ≥ 0.5) and 96 negative samples (the rest)
Object Category Classifier (SVM)
- Positive: only ground truth
- Negative: IoU below a threshold chosen from {0.0, 0.1, 0.2, 0.3, 0.4, 0.5}, tuned by mAP on the validation set (0.0 loses 4%, 0.5 loses 5%)
- Hard negative mining (the full training set does not fit in memory)
Conjecture The definition of positive examples used in fine-tuning does not emphasize precise localization. The softmax classifier was trained on randomly sampled negative examples rather than on the subset of “hard negatives” used for SVM training.
Fast R-CNN (arXiv '15)
- Training is single-stage (cf. R-CNN)
- Fine-tuning with a multi-task loss: bounding-box regression + detection