R-CNN By Zhang Liliang.

Slides:



Advertisements
Similar presentations
Pseudo-Relevance Feedback For Multimedia Retrieval By Rong Yan, Alexander G. and Rong Jin Mwangi S. Kariuki
Advertisements

Rich feature Hierarchies for Accurate object detection and semantic segmentation Ross Girshick, Jeff Donahue, Trevor Darrell, Jitandra Malik (UC Berkeley)
Image classification Given the bag-of-features representations of images from different classes, how do we learn a model for distinguishing them?
Lecture 6: Classification & Localization
Intro to DPM By Zhangliliang. Outline Intuition Introduction to DPM Model Inference(matching) Training latent SVM Training Procedure Initialization Post-processing.
DeepID-Net: deformable deep convolutional neural network for generic object detection Wanli Ouyang, Ping Luo, Xingyu Zeng, Shi Qiu, Yonglong Tian, Hongsheng.
Object-centric spatial pooling for image classification Olga Russakovsky, Yuanqing Lin, Kai Yu, Li Fei-Fei ECCV 2012.
OverFeat Part1 Tricks on Classification
Large-Scale Object Recognition with Weak Supervision
Content Based Image Clustering and Image Retrieval Using Multiple Instance Learning Using Multiple Instance Learning Xin Chen Advisor: Chengcui Zhang Department.
On the Relationship between Visual Attributes and Convolutional Networks Paper ID - 52.
1 Learning to Detect Objects in Images via a Sparse, Part-Based Representation S. Agarwal, A. Awan and D. Roth IEEE Transactions on Pattern Analysis and.
Spatial Pyramid Pooling in Deep Convolutional
From R-CNN to Fast R-CNN
Generic object detection with deformable part-based models
Richard Socher Cliff Chiung-Yu Lin Andrew Y. Ng Christopher D. Manning
“Secret” of Object Detection Zheng Wu (Summer intern in MSRNE) Sep. 3, 2010 Joint work with Ce Liu (MSRNE) William T. Freeman (MIT) Adam Kalai (MSRNE)
Detection, Segmentation and Fine-grained Localization
Classifiers Given a feature representation for images, how do we learn a model for distinguishing features from different classes? Zebra Non-zebra Decision.
Gang WangDerek HoiemDavid Forsyth. INTRODUCTION APROACH (implement detail) EXPERIMENTS CONCLUSION.
Fully Convolutional Networks for Semantic Segmentation
Deep Convolutional Nets
Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation.
Feedforward semantic segmentation with zoom-out features
Object Recognition as Ranking Holistic Figure-Ground Hypotheses Fuxin Li and Joao Carreira and Cristian Sminchisescu 1.
Convolutional Restricted Boltzmann Machines for Feature Learning Mohammad Norouzi Advisor: Dr. Greg Mori Simon Fraser University 27 Nov
Unsupervised Visual Representation Learning by Context Prediction
Cascade Region Regression for Robust Object Detection
Lecture 4a: Imagenet: Classification with Localization
Color Image Segmentation Mentor : Dr. Rajeev Srivastava Students: Achit Kumar Ojha Aseem Kumar Akshay Tyagi.
Rich feature hierarchies for accurate object detection and semantic segmentation 2014 IEEE Conference on Computer Vision and Pattern Recognition Ross Girshick,
Learning Hierarchical Features for Scene Labeling Cle’ment Farabet, Camille Couprie, Laurent Najman, and Yann LeCun by Dong Nie.
Spatial Localization and Detection
Deep Learning Overview Sources: workshop-tutorial-final.pdf
Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition arXiv: v4 [cs.CV(CVPR)] 23 Apr 2015 Kaiming He, Xiangyu Zhang, Shaoqing.
Lecture 4b Data augmentation for CNN training
City Forensics: Using Visual Elements to Predict Non-Visual City Attributes Sean M. Arietta, Alexei A. Efros, Ravi Ramamoorthi, Maneesh Agrawala Presented.
When deep learning meets object detection: Introduction to two technologies: SSD and YOLO Wenchi Ma.
Recent developments in object detection
CS 4501: Introduction to Computer Vision Object Localization, Detection, Semantic Segmentation Connelly Barnes Some slides from Fei-Fei Li / Andrej Karpathy.
Faster R-CNN – Concepts
Object Detection based on Segment Masks
Object detection with deformable part-based models
Data Mining, Neural Network and Genetic Programming
Lecture 24: Convolutional neural networks
Training Techniques for Deep Neural Networks
CS6890 Deep Learning Weizhen Cai
R-CNN region By Ilia Iofedov 11/11/2018 BGU, DNN course 2016.
Project Implementation for ITCS4122
Dynamic Routing Using Inter Capsule Routing Protocol Between Capsules
Image Question Answering
Object detection.
Introduction to Neural Networks
Image Classification.
Counting in Dense Crowds using Deep Learning
Object Detection + Deep Learning
On-going research on Object Detection *Some modification after seminar
Im2Calories: towards an automated mobile vision food diary
Faster R-CNN By Anthony Martinez.
Lecture: Deep Convolutional Neural Networks
Papers 15/08.
Outline Background Motivation Proposed Model Experimental Results
RCNN, Fast-RCNN, Faster-RCNN
Abnormally Detection
Deep Object Co-Segmentation
Motivation It can effectively mine multi-modal knowledge with structured textural and visual relationships from web automatically. We propose BC-DNN method.
VERY DEEP CONVOLUTIONAL NETWORKS FOR LARGE-SCALE IMAGE RECOGNITION
End-to-End Facial Alignment and Recognition
Jiahe Li
An introduction to Machine Learning (ML)
Presentation transcript:

R-CNN By Zhang Liliang

Main idea: good features are no enough VOC07: mAP:35.1% -> 58.5%

Overview (1) the model of R-CNN (2) the result of R-CNN (3) some discussions Visualizing learned feature in CNN Ablation Studies Object proposal transformations Bounding-box regression Positive vs. Negative examples and softmax

Model OverView (1) Use Selective Search(IJCV13) to get the region proposals (2) for each proposal, use CNN to get the Features(from the fc7 of AlexNet) (3) input the feature to all of the cls-specified SVM to predict scores (4) Rank the scores to get the final solution (5) post-processing: class-specific regression on detection results

Why using segmentation and multi-strategy grouping can find Objs? (a) an image is intrinsically hierarchical (b) color helps to distinguish the objs (c) texture helps to distinguish the objs (d) “enclosed relation” helps to distinguish the objs

Selective Search Selective Search (1) Use [13] to get the smallest segment (2) Use multi-strategy grouping to merge the two segments with the max similarity to form a big segment (3) on and on until all segments merge to the whole image. (4) rank all the segments and generate the bboxes [13] P. F. Felzenszwalb and D. P. Huttenlocher. Efficient GraphBased Image Segmentation. IJCV, 59:167–181, 2004. 1, 3, 4, 5, 7

multi-strategy grouping (1)multiply color space (2) multiply measurements of distances between two segments: color, texture, enclose relation….

How to get a score to rank the segments? Randomness on hierarchy:

Feature Extraction: fc7 at AlexNet warp the region proposal to 227*227( add 16 pixels border as context padding)

Feature Extraction: fc7 at AlexNet use ImageNet classification dataset(120w) to pre-train a model, with 1000 way softmax outputs. Use the pretrain model to fine-tuning a domain-specific CNN, with (N+1) way softmax outputs.(“1” for the background) IoU > 0.5 as positive samples, otherwise as negative samples Learn rate set as 0.001(1/10 of in initial pre-training rate) Batch=128, 32 positive samples and 96 negative samples

Cls-specific L-SVM For each class, train a Linear-SVM to distinguish it from the background. IoU>0.5 as the positive samples IoU<0.3 as the negative samples(why not 0.5? For mAP fell down by 5%)

Results on PASCAL 35.1% -> 58.5%

Result on ImageNet 24.3% -> 31.4%

Discuss1:Visualizing learned feature in CNN 1 neturon at pool5 “see” 195*195 pixel in the image of 227*227

Discuss1:Visualizing learned feature in CNN Select a spesical neutron at pool5 Compute the activations of 10M proposals Rank and select the highest ones of NMS

Discuss2: Ablation studies Without fine-tuning: “Much of the CNN representational power comes from its conv layers” With fine-tuning “most of the improvement is gained from learning domain-specific non-linear classifiers on top of them”

Discuss3:Object proposal transformations Add some context can helps classification(3~5% mAP)

Discuss4: Bounding-box regression After getting the detection bbox, process a regression to local the obj more precisely. (3~4% mAP) It make sense for the 16-pixel context padding in the bbox.

Discuss 5:Positive vs. Negative examples and softmax To train CNN, IoU<0.5 as negative samples, while to train SVM, IoU<0.3 as negative samples. (5% mAP) Softmax vs SVM, 50.9% vs 54.2%, for SVM can use hard negative samples specific for each class.

Thanks