R-CNN By Zhang Liliang.

Main idea: good features are not enough. VOC07 mAP: 35.1% -> 58.5%

Overview
(1) The R-CNN model
(2) R-CNN results
(3) Some discussions: visualizing learned features in the CNN; ablation studies; object proposal transformations; bounding-box regression; positive vs. negative examples and softmax

Model overview
(1) Use Selective Search (IJCV 2013) to generate region proposals.
(2) For each proposal, use a CNN to extract features (from the fc7 layer of AlexNet).
(3) Feed the features to every class-specific SVM to get per-class scores.
(4) Rank the scores to get the final detections.
(5) Post-processing: class-specific bounding-box regression on the detection results.
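Steps (3) and (4) can be sketched as follows. This is only a toy illustration under stated assumptions: the function name `score_proposals`, the 3-D features and the SVM weights are all made up for the example (real fc7 features are 4096-D), and proposal generation and the CNN itself are left out.

```python
from typing import Dict, List, Tuple


def score_proposals(
    features: List[List[float]],
    svm_weights: Dict[str, List[float]],
    svm_biases: Dict[str, float],
) -> List[Tuple[float, int, str]]:
    """Score every proposal feature with every class-specific linear SVM.

    Returns (score, proposal_index, class_name) tuples, best score first.
    """
    detections = []
    for idx, feat in enumerate(features):
        for cls, w in svm_weights.items():
            # A linear SVM score is a dot product plus a bias.
            score = sum(wi * fi for wi, fi in zip(w, feat)) + svm_biases[cls]
            detections.append((score, idx, cls))
    detections.sort(key=lambda d: d[0], reverse=True)
    return detections


# Hypothetical toy data: two proposals, two classes.
features = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]
svm_weights = {"cat": [1.0, -1.0, 0.5], "dog": [-1.0, 2.0, 0.0]}
svm_biases = {"cat": -0.5, "dog": 0.0}
ranked = score_proposals(features, svm_weights, svm_biases)
```

Here `ranked[0]` is the "dog" score of the second proposal, because that pair has the largest SVM response.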

Why can segmentation and multi-strategy grouping find objects?
(a) An image is intrinsically hierarchical.
(b) Color helps to distinguish objects.
(c) Texture helps to distinguish objects.
(d) The “enclosing” relation helps to distinguish objects.

Selective Search
(1) Use [13] to get the initial small segments.
(2) Use multi-strategy grouping to merge the two segments with the highest similarity into a bigger segment.
(3) Repeat until all segments have merged into the whole image.
(4) Rank all the segments and generate the bounding boxes.
[13] P. F. Felzenszwalb and D. P. Huttenlocher. Efficient Graph-Based Image Segmentation. IJCV, 59:167–181, 2004.
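The greedy merging in steps (2) and (3) can be sketched roughly as below. This is a simplified illustration, not the reference Selective Search code: regions are hypothetical `(mean_intensity, size)` tuples, adjacency is given by hand, and the similarity is a stand-in based on intensity only.

```python
def hierarchical_grouping(regions, neighbours, similarity, merge):
    """Greedily merge the most similar touching regions until none are left.

    regions:    dict id -> region description
    neighbours: set of frozenset({id_a, id_b}) for regions that touch
    Returns every region ever created (initial segments plus all merges).
    """
    regions = dict(regions)        # copy so the caller's dict is untouched
    neighbours = set(neighbours)
    hierarchy = list(regions.values())
    next_id = max(regions) + 1

    def pair_similarity(pair):
        a, b = sorted(pair)
        return similarity(regions[a], regions[b])

    while neighbours:
        best = max(neighbours, key=pair_similarity)
        a, b = sorted(best)
        merged = merge(regions[a], regions[b])
        # Neighbours of a or b become neighbours of the merged region.
        rewired = set()
        for pair in neighbours:
            if pair == best:
                continue
            others = pair - best
            rewired.add(pair if len(others) == 2 else frozenset(others | {next_id}))
        neighbours = rewired
        del regions[a], regions[b]
        regions[next_id] = merged
        hierarchy.append(merged)
        next_id += 1
    return hierarchy


def intensity_similarity(r, s):
    # Stand-in similarity: closer mean intensity means more similar.
    return -abs(r[0] - s[0])


def merge_regions(r, s):
    size = r[1] + s[1]
    return ((r[0] * r[1] + s[0] * s[1]) / size, size)


# Four toy segments in a row: two dark ones, then two bright ones.
segments = {0: (10.0, 4), 1: (12.0, 4), 2: (50.0, 4), 3: (55.0, 4)}
touching = {frozenset({0, 1}), frozenset({1, 2}), frozenset({2, 3})}
hierarchy = hierarchical_grouping(segments, touching, intensity_similarity, merge_regions)
```

In this toy run the two dark segments merge first, then the two bright ones, and the last entry of `hierarchy` covers the whole row. Every entry of `hierarchy` would yield one candidate bounding box.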

Multi-strategy grouping
(1) Multiple color spaces
(2) Multiple measures of distance between two segments: color, texture, enclosing relation, ...
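A rough sketch of what such measures can look like, written from memory of the Selective Search paper rather than taken from the slides: histogram intersection for color, a size term that favours merging small regions, and a "fill" term for how well two regions fit inside their joint bounding box. All function names are illustrative.

```python
def colour_similarity(hist_a, hist_b):
    """Histogram intersection of two L1-normalised colour histograms."""
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))


def size_similarity(size_a, size_b, image_size):
    """Close to 1 when both regions are small relative to the image."""
    return 1.0 - (size_a + size_b) / image_size


def fill_similarity(size_a, size_b, bbox_size, image_size):
    """Close to 1 when the joint bounding box has little empty space."""
    return 1.0 - (bbox_size - size_a - size_b) / image_size


def merged_histogram(hist_a, size_a, hist_b, size_b):
    """Size-weighted average, so merged regions need no pixel re-scan."""
    total = size_a + size_b
    return [(size_a * a + size_b * b) / total for a, b in zip(hist_a, hist_b)]


# Toy values: 3-bin histograms and pixel counts chosen by hand.
s_colour = colour_similarity([0.5, 0.5, 0.0], [0.25, 0.25, 0.5])
s_size = size_similarity(10, 30, 100)
s_fill = fill_similarity(10, 30, 60, 100)
h_merged = merged_histogram([1.0, 0.0], 1, [0.0, 1.0], 3)
```

Texture similarity can be handled the same way as color, with a texture histogram in place of the color histogram. Running the grouping several times with different color spaces and different combinations of these terms gives the "multi-strategy" diversity.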

How to get a score to rank the segments? Randomness on hierarchy:

Feature extraction: fc7 of AlexNet
Warp the region proposal to 227×227 (with a 16-pixel border added as context padding).
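One way to read the context padding: enlarge the proposal so that, once the crop is warped to 227×227, the original box fills the central 195×195 pixels and 16 pixels of surrounding context remain on each side. A minimal sketch under that reading (the helper name is made up, and handling of boxes that run past the image border is left out):

```python
def pad_box_for_warp(box, out_size=227, pad=16):
    """Expand (x1, y1, x2, y2) so the warped crop keeps `pad` pixels of context.

    Each axis is scaled independently because the warp is anisotropic.
    """
    x1, y1, x2, y2 = box
    inner = out_size - 2 * pad            # 195 for the default values
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w = (x2 - x1) * out_size / inner / 2.0
    half_h = (y2 - y1) * out_size / inner / 2.0
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


# A 195x195 proposal needs exactly 16 extra pixels on every side.
padded = pad_box_for_warp((100, 50, 295, 245))
```

For this 195×195 example the padded box is 227×227, so the warp is the identity and the 16-pixel border is visible directly.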

Feature extraction: fc7 of AlexNet
Pre-train a model on the ImageNet classification dataset (1.2M images) with a 1000-way softmax output.
Fine-tune the pre-trained model into a domain-specific CNN with an (N+1)-way softmax output (the “1” is for the background).
Proposals with IoU > 0.5 are positive samples; the rest are negative samples.
Learning rate: 0.001 (1/10 of the initial pre-training rate).
Batch size: 128, with 32 positive samples and 96 negative samples.
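The labelling rule and the 32/96 batch composition can be sketched as below. The helper names are illustrative, boxes are assumed to be `(x1, y1, x2, y2)` tuples, and sampling with replacement is an assumption made so that small pools can still fill a batch.

```python
import random


def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def finetune_label(proposal, gt_boxes, gt_classes, thresh=0.5):
    """Class index (1..N) of the best-overlapping ground truth, or 0 for background."""
    best_iou, best_cls = 0.0, 0
    for gt, cls in zip(gt_boxes, gt_classes):
        overlap = iou(proposal, gt)
        if overlap > best_iou:
            best_iou, best_cls = overlap, cls
    return best_cls if best_iou > thresh else 0


def sample_minibatch(labels, n_pos=32, n_neg=96, rng=random):
    """Indices of 32 positive and 96 background proposals (128 in total)."""
    pos = [i for i, label in enumerate(labels) if label > 0]
    neg = [i for i, label in enumerate(labels) if label == 0]
    # Assumption: sample with replacement; both pools must be non-empty.
    return rng.choices(pos, k=n_pos) + rng.choices(neg, k=n_neg)


labels = [3] * 5 + [0] * 50          # toy label list: 5 positives, 50 background
batch = sample_minibatch(labels, rng=random.Random(0))
```

Fixing the ratio keeps positives at a quarter of each batch even though background proposals vastly outnumber them.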

Class-specific linear SVM
For each class, train a linear SVM to distinguish it from the background.
IoU > 0.5 as positive samples.
IoU < 0.3 as negative samples (why not 0.5? Because mAP drops by 5%).
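With the thresholds on this slide, proposals whose overlap falls between 0.3 and 0.5 are neither positive nor negative. A minimal sketch of that rule, assuming the overlap with the best-matching ground-truth box of the class has already been computed (function names are illustrative):

```python
def svm_label(max_iou, pos_thresh=0.5, neg_thresh=0.3):
    """+1 for a positive, -1 for a negative, None for the ignored band."""
    if max_iou > pos_thresh:
        return 1
    if max_iou < neg_thresh:
        return -1
    return None


def split_training_set(max_ious):
    """Indices of positives and negatives for one class; the rest is dropped."""
    positives = [i for i, o in enumerate(max_ious) if svm_label(o) == 1]
    negatives = [i for i, o in enumerate(max_ious) if svm_label(o) == -1]
    return positives, negatives


# Toy overlaps of five proposals with the ground truth of one class.
positives, negatives = split_training_set([0.9, 0.45, 0.1, 0.31, 0.0])
```

The proposals with overlaps 0.45 and 0.31 end up in neither list, which keeps ambiguous partial overlaps out of SVM training.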

Results on PASCAL 35.1% -> 58.5%

Result on ImageNet 24.3% -> 31.4%

Discussion 1: Visualizing learned features in the CNN
One neuron at pool5 “sees” 195×195 pixels of the 227×227 input image.
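The 195×195 figure can be checked with the usual receptive-field recurrence. The kernel sizes and strides below are the standard AlexNet values and are an assumption here, since the slides do not list them.

```python
# (name, kernel size, stride) for every layer up to pool5.
ALEXNET_TO_POOL5 = [
    ("conv1", 11, 4),
    ("pool1", 3, 2),
    ("conv2", 5, 1),
    ("pool2", 3, 2),
    ("conv3", 3, 1),
    ("conv4", 3, 1),
    ("conv5", 3, 1),
    ("pool5", 3, 2),
]


def receptive_field(layers):
    """Return (receptive field, cumulative stride) of one output unit."""
    rf, jump = 1, 1
    for _name, kernel, stride in layers:
        rf += (kernel - 1) * jump   # each layer widens the field by (k - 1) input steps
        jump *= stride              # distance in input pixels between adjacent units
    return rf, jump


rf, jump = receptive_field(ALEXNET_TO_POOL5)
```

With these layer settings `rf` comes out as 195, so a single pool5 unit covers most of the 227×227 input.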

Discussion 1: Visualizing learned features in the CNN
Select a specific neuron at pool5.
Compute its activations on 10M proposals.
Rank them, apply NMS, and show the highest-scoring ones.
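The "rank, then NMS" step can be sketched as greedy non-maximum suppression over the proposals, using the neuron's activation as the score. The 0.3 overlap threshold and the function names are assumptions for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    ih = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = iw * ih
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thresh=0.3):
    """Greedy NMS: keep a box only if it does not overlap a better kept box."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in keep):
            keep.append(i)
    return keep


# Toy example: the second box overlaps the first heavily and is suppressed.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
activations = [0.9, 0.8, 0.7]
kept = nms(boxes, activations)
```

The surviving indices are already sorted by activation, so the first few entries of `kept` are the regions one would display for the neuron.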

Discussion 2: Ablation studies
Without fine-tuning: “Much of the CNN representational power comes from its conv layers”
With fine-tuning: “most of the improvement is gained from learning domain-specific non-linear classifiers on top of them”

Discussion 3: Object proposal transformations
Adding some context helps classification (3~5% mAP).

Discussion 4: Bounding-box regression
After getting the detection bbox, apply a regression to localize the object more precisely (3~4% mAP).
This makes sense given the 16-pixel context padding around the bbox.
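A sketch of the box parameterization commonly used for this regression, as I recall it from the R-CNN paper: center offsets scaled by the proposal size, and log-space width and height ratios. The regressor itself (which predicts the four targets from CNN features) is not shown, and the function names are illustrative.

```python
import math


def to_center(box):
    """(x1, y1, x2, y2) -> (center x, center y, width, height)."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return x1 + w / 2.0, y1 + h / 2.0, w, h


def regression_targets(proposal, gt):
    """Targets (tx, ty, tw, th) that map the proposal onto the ground truth."""
    px, py, pw, ph = to_center(proposal)
    gx, gy, gw, gh = to_center(gt)
    return ((gx - px) / pw, (gy - py) / ph, math.log(gw / pw), math.log(gh / ph))


def apply_regression(proposal, targets):
    """Inverse of regression_targets: refine a proposal with predicted targets."""
    px, py, pw, ph = to_center(proposal)
    tx, ty, tw, th = targets
    gx, gy = px + pw * tx, py + ph * ty
    gw, gh = pw * math.exp(tw), ph * math.exp(th)
    return (gx - gw / 2.0, gy - gh / 2.0, gx + gw / 2.0, gy + gh / 2.0)


proposal, gt = (0, 0, 10, 10), (2, 2, 14, 12)
targets = regression_targets(proposal, gt)
refined = apply_regression(proposal, targets)
```

Applying the exact targets recovers the ground-truth box, which is a quick sanity check that the two functions are inverses. At test time the targets come from the learned class-specific regressor instead.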

Discussion 5: Positive vs. negative examples and softmax
To train the CNN, IoU < 0.5 marks negative samples, while to train the SVM, IoU < 0.3 marks negative samples (5% mAP).
Softmax vs. SVM: 50.9% vs. 54.2%, because the SVM can use hard negative samples specific to each class.

Thanks