On-going Research on Object Detection (*some modifications after the seminar)


On-going research on Object Detection *Some modification after seminar Tackgeun YOU

Contents
- Baseline algorithm: Fast R-CNN
- Observations & proposals
- Fast R-CNN on Microsoft COCO

Object Detection
Definition: predict the location and label of the objects in the scene.
Traditional pipeline:
1. Approximate the search space with a sliding window or object proposals.
2. Evaluate the approximated regions.
3. Apply non-maximum suppression to obtain the final regions.

R-CNN (CVPR'14)
1. Object proposals: approximate the search space.
2. Fine-tuned CNN features + SVM: score each region.
3. Bounding-box regression: refine each region.
4. Non-maximum suppression.
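The last stage of the pipeline, non-maximum suppression, can be sketched as a greedy procedure. This is a minimal NumPy sketch under the assumption that boxes are given as [x1, y1, x2, y2]; it is not the exact implementation used in R-CNN.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes; boxes are [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_boxes = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_box + area_boxes - inter)

def nms(boxes, scores, iou_thresh=0.3):
    """Greedy non-maximum suppression: repeatedly keep the highest-scoring
    remaining box and discard all boxes overlapping it above iou_thresh."""
    order = np.argsort(scores)[::-1]  # proposal indices by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        order = rest[iou(boxes[i], boxes[rest]) <= iou_thresh]
    return keep
```

The heavily overlapping lower-scored box is suppressed, while distant boxes survive regardless of score.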

Training Pipeline of R-CNN
- Supervised pre-training: image-level annotations from ILSVRC 2012.
- Domain-specific fine-tuning: mini-batches of 128 samples; 32 positive samples (region proposals with IoU ≥ 0.5) and 96 negative samples (the rest).
- Object category classifier (SVM): positives are ground-truth boxes only; negatives have IoU < 0.3; hard negative mining.
- Bounding-box regression: uses nearby samples (the proposal with maximum overlap, IoU ≥ 0.6); ridge regression (regularization is important); iterating the regression does not improve the result.
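The fine-tuning mini-batch construction can be sketched as follows. This is a hypothetical helper, assuming each proposal's maximum IoU with any ground-truth box has already been computed.

```python
import numpy as np

def sample_minibatch(max_ious, pos_thresh=0.5, n_pos=32, n_neg=96, seed=0):
    """Split region proposals into fine-tuning positives/negatives by their
    maximum IoU with any ground-truth box: proposals with IoU >= pos_thresh
    are positive, the rest negative; then sample up to 32 / 96 of each."""
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(max_ious >= pos_thresh)
    neg = np.flatnonzero(max_ious < pos_thresh)
    pos = rng.choice(pos, size=min(n_pos, pos.size), replace=False)
    neg = rng.choice(neg, size=min(n_neg, neg.size), replace=False)
    return pos, neg
```

Note the contrast with the SVM stage, which uses a stricter definition (ground truth only as positives, IoU < 0.3 as negatives).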

Fast R-CNN (arXiv'15)
Training is single-stage (cf. R-CNN's multi-stage pipeline).
Multi-task loss:
  L(p, k*, t, t*) = L_cls(p, k*) + λ·[k* ≥ 1]·L_loc(t, t*)
Cross-entropy loss:
  L_cls(p, k*) = −log p_{k*}
Notation: k* is the true class label, t* the true bounding-box regression target, and t = wᵀφ(I) the predicted location.

Fast R-CNN (arXiv'15), continued
Multi-task loss:
  L(p, k*, t, t*) = L_cls(p, k*) + λ·[k* ≥ 1]·L_loc(t, t*),  with  L_cls(p, k*) = −log p_{k*}
Smooth regression loss:
  L_loc(t, t*) = Σ_{i ∈ {x,y,w,h}} smooth_L1(t_i − t_i*)
  smooth_L1(x) = 0.5·x²  if |x| ≤ 1,  |x| − 0.5  otherwise
Regression targets (G_k: ground-truth box, P_k: predicted box, k ∈ {x, y, w, h}):
  t_x = (G_x − P_x) / P_w,  t_y = (G_y − P_y) / P_h,  t_w = log(G_w / P_w),  t_h = log(G_h / P_h)
Targets are constructed by whitening the ground truth; t = wᵀφ(I) is the predicted location.
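The target definitions above translate directly into code. A minimal sketch, assuming the center-size (cx, cy, w, h) box parameterization:

```python
import math

def bbox_targets(P, G):
    """Regression targets t = (tx, ty, tw, th) mapping proposal P to ground
    truth G, with both boxes parameterized as (cx, cy, w, h): the center
    offsets are normalized by the proposal size, the scale terms are logs."""
    px, py, pw, ph = P
    gx, gy, gw, gh = G
    return ((gx - px) / pw, (gy - py) / ph,
            math.log(gw / pw), math.log(gh / ph))
```

The log parameterization makes the width/height targets scale-invariant: doubling both boxes leaves t_w and t_h unchanged.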

Fast R-CNN (arXiv'15): why smooth L1?
Smooth regression loss:
  L_loc(t, t*) = Σ_{i ∈ {x,y,w,h}} smooth_L1(t_i − t_i*)
  smooth_L1(x) = 0.5·x²  if |x| ≤ 1,  |x| − 0.5  otherwise
Training with an L2 loss instead requires careful tuning of the learning rate to prevent exploding gradients; smooth L1 bounds the gradient for large errors.
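The piecewise definition above is straightforward to implement; a minimal sketch:

```python
def smooth_l1(x):
    """Smooth L1 loss: quadratic near zero (like L2), linear in the tails
    (like L1), so the gradient magnitude is bounded by 1 for large errors."""
    ax = abs(x)
    return 0.5 * x * x if ax <= 1 else ax - 0.5
```

The two branches match in value and slope at |x| = 1 (both give 0.5 and unit slope), so the loss is continuously differentiable.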

Exploring VOC with Fast R-CNN
Observation: the detector fails to localize contiguous (adjacent) objects.
Hypothesis: a region covering multiple objects receives a higher confidence than a single-object region.
Experiment: check whether the maximum confidence lies on the tight object box, using MCMC iterations started from the ground truth.

[Plot legend] Red: ratio(IoU > 0.5); Blue: mean(IoU); Magenta: ratio(IoU > 0.5); Black: ratio(IoU < 0.3)

Desired behavior: tailoring confidence for precise localization
1. Whole body of a single object (highest)
2. Partial body of a single object (positive)
3. Region overlapping multiple objects (?)
4. Other classes (lowest)

Detailed Plans
- How should multiple-object regions be handled, and how should they be defined?
- Using Fast R-CNN: fine-tune with multi-object regions as negative samples in the mini-batch. Possible failure: this may decrease overall performance even while reducing the over-confidence on multiple-object regions.
- Adopting a more suitable loss function, e.g., a ranking loss.

Microsoft COCO
80 classes; Train (82,783 images), Validation (40,504), Test (81,434).

Split           #imgs   Submission   Score reported
Test-Dev        ~20K    Unlimited    Immediately
Test-Standard           Limited
Test-Challenge                       Workshop
Test-Reserve                         Never

ref. Microsoft COCO: Common Objects in Context

ref. What makes for effective detection proposals?

Fast R-CNN with 1k MCG proposals, 240k iterations (5.8 epochs on train)

Fast R-CNN with 1k MCG proposals, 240k + 130k iterations (6.4 epochs on val)

Processing Time of Fast R-CNN
Testing speed (with MCG @1k): 1.872 s/image
- ~21.06 hours on the validation set
- ~10 hours on the test-dev set
- ~42.35 hours on the test set
Training speed: 0.564 s/iteration (~6.48 hours per epoch on the training set)
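The hour figures follow from the per-image time and the split sizes given on the COCO slide (40,504 validation and 81,434 test images); a quick arithmetic check:

```python
def hours(seconds_per_item, n_items):
    """Convert a per-item processing time into total wall-clock hours."""
    return seconds_per_item * n_items / 3600

val_hours = hours(1.872, 40504)    # validation set, about 21.06 h
test_hours = hours(1.872, 81434)   # test set, about 42.35 h
```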

End

Samples
- http://mscoco.org/explore/?id=407286
- http://mscoco.org/explore/?id=161602
- http://mscoco.org/explore/?id=123835
- http://mscoco.org/explore/?id=242673

Label Difference between Fine-tuning & SVM Training
- Domain-specific fine-tuning: mini-batches of 128 samples; 32 positives (region proposals with IoU ≥ 0.5) and 96 negatives (the rest).
- Object category classifier (SVM): positives are ground-truth boxes only; the negative IoU threshold was swept over {0.0, 0.1, 0.2, 0.3, 0.4, 0.5} and selected by mAP on the validation set (0.0 → −4%, 0.5 → −5% relative to the chosen 0.3).
- Hard negative mining (fitting the training set directly is impossible).

Conjecture
- The definition of positive examples used in fine-tuning does not emphasize precise localization.
- The softmax classifier was trained on randomly sampled negative examples rather than on the subset of "hard negatives" used for SVM training.

Fast R-CNN (arXiv'15)
Training is single-stage (cf. R-CNN): fine-tuning with a multi-task loss that combines bounding-box regression and detection (classification).