Outline Background Motivation Proposed Model Experimental Results

Slides:

Advertisements

Similar presentations

Rich feature Hierarchies for Accurate object detection and semantic segmentation Ross Girshick, Jeff Donahue, Trevor Darrell, Jitandra Malik (UC Berkeley)

Advertisements

Regionlets for Generic Object Detection

Foreground Focus: Finding Meaningful Features in Unlabeled Images Yong Jae Lee and Kristen Grauman University of Texas at Austin.

Limin Wang, Yu Qiao, and Xiaoou Tang

Steerable Part Models Hamed Pirsiavash and Deva Ramanan

Object-centric spatial pooling for image classification Olga Russakovsky, Yuanqing Lin, Kai Yu, Li Fei-Fei ECCV 2012.

Large-Scale Object Recognition with Weak Supervision

Student: Yao-Sheng Wang Advisor: Prof. Sheng-Jyh Wang ARTICULATED HUMAN DETECTION 1 Department of Electronics Engineering National Chiao Tung University.

Spatial Pyramid Pooling in Deep Convolutional

What, Where & How Many? Combining Object Detectors and CRFs

From R-CNN to Fast R-CNN

Generic object detection with deformable part-based models

Object Bank Presenter ： Liu Changyu Advisor ： Prof. Alex Hauptmann Interest ： Multimedia Analysis April 4 th, 2013.

Hurieh Khalajzadeh Mohammad Mansouri Mohammad Teshnehlab

“Secret” of Object Detection Zheng Wu (Summer intern in MSRNE) Sep. 3, 2010 Joint work with Ce Liu (MSRNE) William T. Freeman (MIT) Adam Kalai (MSRNE)

Detection, Segmentation and Fine-grained Localization

Object Detection with Discriminatively Trained Part Based Models

Deformable Part Model Presenter ： Liu Changyu Advisor ： Prof. Alex Hauptmann Interest ： Multimedia Analysis April 11 st, 2013.

Fully Convolutional Networks for Semantic Segmentation

Face Alignment at 3000fps via Regressing Local Binary Features CVPR14 Shaoqing Ren, Xudong Cao, Yichen Wei, Jian Sun Presented by Sung Sil Kim.

CS 1699: Intro to Computer Vision Detection II: Deformable Part Models Prof. Adriana Kovashka University of Pittsburgh November 12, 2015.

Recognition Using Visual Phrases

Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation.

Regionlets for Generic Object Detection IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 37, NO. 10, OCTOBER 2015 Xiaoyu Wang, Ming.

Cascade Region Regression for Robust Object Detection

Fine-grained Fine-grained Recognition( 细粒度分类 ) 沈志强.

Rich feature hierarchies for accurate object detection and semantic segmentation 2014 IEEE Conference on Computer Vision and Pattern Recognition Ross Girshick,

Spatial Localization and Detection

Deep Residual Learning for Image Recognition

1 A Statistical Matching Method in Wavelet Domain for Handwritten Character Recognition Presented by Te-Wei Chiang July, 2005.

Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition arXiv: v4 [cs.CV(CVPR)] 23 Apr 2015 Kaiming He, Xiangyu Zhang, Shaoqing.

Convolutional Neural Networks at Constrained Time Cost (CVPR 2015) Authors : Kaiming He, Jian Sun (MSR) Presenter : Hyunjun Ju 1.

When deep learning meets object detection: Introduction to two technologies: SSD and YOLO Wenchi Ma.

Recent developments in object detection

Big data classification using neural network

Deep Residual Learning for Image Recognition

The Relationship between Deep Learning and Brain Function

Object Detection based on Segment Masks

Compact Bilinear Pooling

Object detection with deformable part-based models

Convolutional Neural Fabrics by Shreyas Saxena, Jakob Verbeek

Krishna Kumar Singh, Yong Jae Lee University of California, Davis

Learning Mid-Level Features For Recognition

Regularizing Face Verification Nets To Discrete-Valued Pain Regression

YOLO9000:Better, Faster, Stronger

Compositional Human Pose Regression

Huazhong University of Science and Technology

CS6890 Deep Learning Weizhen Cai

Deep Residual Learning for Image Recognition

R-CNN region By Ilia Iofedov 11/11/2018 BGU, DNN course 2016.

Object detection.

A Convolutional Neural Network Cascade For Face Detection

Layer-wise Performance Bottleneck Analysis of Deep Neural Networks

SBNet: Sparse Blocks Network for Fast Inference

Object Detection + Deep Learning

Oral presentation for ACM International Conference on Multimedia, 2014

8-3 RRAM Based Convolutional Neural Networks for High Accuracy Pattern Recognition and Online Learning Tasks Z. Dong, Z. Zhou, Z.F. Li, C. Liu, Y.N. Jiang,

Object Tracking: Comparison of

RCNN, Fast-RCNN, Faster-RCNN

Neural Network Pipeline CONTACT & ACKNOWLEDGEMENTS

边缘检测年度进展概述 Ming-Ming Cheng Media Computing Lab, Nankai University

Heterogeneous convolutional neural networks for visual recognition

Human-object interaction

Deep Object Co-Segmentation

Natalie Lang Tomer Malach

VERY DEEP CONVOLUTIONAL NETWORKS FOR LARGE-SCALE IMAGE RECOGNITION

Learning and Memorization

Object Detection Implementations

Deep Structured Scene Parsing by Learning with Image Descriptions

Presentation transcript:

Joint Deep Feature and Dictionary Pair Classifier Learning for Object Detection

Outline Background Motivation Proposed Model Experimental Results Conclusion

Outline Background Motivation Proposed Model Experimental Results Conclusion

Background Object Recognition Which objects and where? Horse Person

Challenges The large variation of object appearances Scene clutter Changes in shape, scale and viewpoint Photometric effects

Detect object via Deep Convolutional Neural Networks Girshick et al.[4] proposed to convert object detection into region classification, and employ CNNs to extract features.

Spatial Pyramid Pooling (SPP)-Net Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, “Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition,” PAMI, 2015.

Outline Background Motivation Proposed Model Experimental Results Conclusion

Existing Technical Issue Recent Studies all focus on developing novel DNNs Improving classification performance may also be a effective way Sufficient training data is required Annotating object is bothering and boring Semi or even weakly supervised learning strategy is needed to enable object detection for practical use.

Joint deep feature and dictionary pair classifier learning Motivation Joint deep feature and dictionary pair classifier learning Regard the efficient and effective dictionary pair learning Model [6] as a classifier layer for CNNs. Propose a joint training mechanism to benefit both feature learning and classifier training

Motivation Propose a structure model to recognize object in a coarse-and-fine manner Objectness: Distinguish objects of any category from the background coarsely Categoryness: Calculate the probability belonging to certain object category in a fine manner

Dictionary Pair Classifier Driven Accept By CVPR 2016 Dictionary Pair Classifier Driven Architecture Joint deep feature and dictionary pair classifier learning Recognize object in a coarse-and-fine manner The four values inside 2*2 grids are corresponding to the four input regions

Dictionary Pair Classifier Layer 𝐗=[𝐗 𝟏 ,…, 𝐗 k ,…, 𝐗 𝐊 ] denotes a set of the previous layer’s d-dimensional outputs for 𝐊 object categories; 𝐖 k is the importance weight for k-th category training samples. The goal: Find an analysis dictionary 𝐏 and a synthesis dictionary 𝐃 to analytically code and reconstruct 𝐗. 𝐗 k denotes the complementary data matrix of 𝐗 k from the whole training samples 𝐗.

Dictionary Pair Classifier Layer Learning of the Dictionary Pair 𝐗 k denotes the complementary data matrix of 𝐗 k from the whole training samples 𝐗.

Dictionary Pair Classifier Layer Dictionary Pair Back Propagation 𝐗 k denotes the complementary data matrix of 𝐗 k from the whole training samples 𝐗.

Objectness Dictionary Pair Layer The probability of x covering an object of any category { 𝐃 𝑜 , 𝐏 𝑜 } denotes dictionary pairs for object while { 𝐃 𝑏 , 𝐏 𝑏 } for background. k ∈{𝑜, 𝑏}

Categoryness Dictionary Pair Layer Representing how likely the feature x belongs to k-th category The final object detection score:

The cost function for training network parameters 𝜔 is Model Fine-tuning The cost function for training network parameters 𝜔 is Standard Back Propagation is employed Let 𝜙 denote the function of the CNN layers and 𝐈 𝑖 denote the input region with the category label 𝑦 𝑖 , we have the feature x = 𝜙( 𝐈 𝑖 ; 𝜔).

Experimental Setting Datasets PASCAL VOC 2007/2012 20 object categories Challenging: unbalance, large deformations

Experiment Results ODP -- Objectness Dictionary Pair layer CDP -- Categoryness Dictionary Pair layer -- BB -- Bounding Box Regression -- R-CNN (Alex) --R-CNN framework with Alex Net R-CNN (VGG) -- R-CNN framework with VGG Net

Experiment Results The significance of dictionary pair back propagation algorithm (DPBP) for fine-tuning the networks.

Qualitative Comparison Our model can localize the bounding box with less background. R-CNN[4] Ours

Qualitative Comparison Our method can recognize more objects R-CNN[4] Ours

Qualitative Comparison Our model can obtain better detection results. R-CNN[4] Ours

Empirical Analysis The effectiveness of predefined weights for training samples 1) To clarify significance of the proposed dictionary pair back propagation algorithm (DPBP) for fine-tuning the networks, we fix all the dictionary pairs in ODP and CDP. Under this circumstance, only the parameters of the networks are fine-tuned via the standard back propagation algorithm (BP). Then we denote this model as “BP”, and compare it with the original version “DPBP” on the test error rate, which is defined as one minus the classification accuracy. In Figure. 4, we visualize the test error rates on validation sets with different iterations of the two models from the same initialization. We observe that the error rates of DPBP decreases faster and reaches to the lower error rate after updating the dictionary pair in the 100-th iteration, compared with the dictionary pair fixed. 2) To demonstrate the effectiveness of predefined weights for training samples, we set all weights equal to 1 in our model. That is, the training samples have the same weights during the dictionary pair training inside ODP and CDP. We denote this model as “Ours (equal weights)”, and compare it with the original version “Ours (full)” by test error rate with the same initialization. Every 100 iteration, we optimize the dictionary pairs in both ODP and CDP. As illustrated in Figure , the test error rates of “Ours (without weights)” increase significantly when training dictionary pairs on the 200-th and 300-th iterations, compared with the original version. This reason is that the category specific dictionary pair tries to represent the parts of other category or background inside the training examples. By means of regarding the IoU as the predefined weights of training samples, this phenomenon can be suppressed to some extent. From Figure. 5, one can see that the introduced weights can make the training phase stable and achieve a lower error rate.

Conclusion Present a joint deep CNN feature and dictionary pair classifier learning framework. Develop a deep structure model to detect objects in a coarse(objectness) and fine (categoryness) manner.

Thank you!

Reference [1] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan, “Object detection with discriminatively trained partbased models,” PAMI, 2010. [2] K. E. van de Sande, J. R. Uijlings, T. Gevers, and A. W. Smeulders, “Segmentation as selective search for object recognition,” in ICCV, 2011. [3] X. Wang, M. Yang, S. Zhu, and Y. Lin, “Regionlets for generic object detection,” in ICCV, 2013. [4] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in CVPR, 2014. [5] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, “Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition,” PAMI, 2015. [6] S. Gu, L. Zhang, W. Zuo, and X. Feng. “Projective dictionary pair learning for pattern classification,” In Advances in neural information processing systems, 2014.