Learning Features and Parts for Fine-Grained Recognition Authors: Jonathan Krause, Timnit Gebru, Jia Deng, Li-Jia Li, Li Fei-Fei ICPR, 2014 Presented by:

Slides:

Advertisements

Similar presentations

Applications of one-class classification

Advertisements

Zhimin CaoThe Chinese University of Hong Kong Qi YinITCS, Tsinghua University Xiaoou TangShenzhen Institutes of Advanced Technology Chinese Academy of.

Image classification Given the bag-of-features representations of images from different classes, how do we learn a model for distinguishing them?

Limin Wang, Yu Qiao, and Xiaoou Tang

3 Small Comments Alex Berg Stony Brook University I work on recognition: features – action recognition – alignment – detection – attributes – hierarchical.

ImageNet Classification with Deep Convolutional Neural Networks

Tiled Convolutional Neural Networks TICA Speedup Results on the CIFAR-10 dataset Motivation Pretraining with Topographic ICA References [1] Y. LeCun, L.

Enhancing Exemplar SVMs using Part Level Transfer Regularization 1.

Large-Scale Object Recognition with Weak Supervision

Discriminative and generative methods for bags of features

Recognition: A machine learning approach

Image classification Given the bag-of-features representations of images from different classes, how do we learn a model for distinguishing them?

Unsupervised Learning With Neural Nets Deep Learning and Neural Nets Spring 2015.

Generic object detection with deformable part-based models

Wang, Z., et al. Presented by: Kayla Henneman October 27, 2014 WHO IS HERE: LOCATION AWARE FACE RECOGNITION.

Convolutional Neural Networks for Image Processing with Applications in Mobile Robotics By, Sruthi Moola.

Object Bank Presenter ： Liu Changyu Advisor ： Prof. Alex Hauptmann Interest ： Multimedia Analysis April 4 th, 2013.

Computer Vision CS 776 Spring 2014 Recognition Machine Learning Prof. Alex Berg.

Nonparametric Part Transfer for Fine-grained Recognition Presenter Byungju Kim.

End-to-End Text Recognition with Convolutional Neural Networks

Classifiers Given a feature representation for images, how do we learn a model for distinguishing features from different classes? Zebra Non-zebra Decision.

A Codebook-Free and Annotation-free Approach for Fine-Grained Image Categorization Authors Bangpeng Yao et al. Presenter Hyung-seok Lee ( 이형석 ) CVPR 2012.

School of Engineering and Computer Science Victoria University of Wellington Copyright: Peter Andreae, VUW Image Recognition COMP # 18.

Lecture 27: Recognition Basics CS4670/5670: Computer Vision Kavita Bala Slides from Andrej Karpathy and Fei-Fei Li

Visual Categorization With Bags of Keypoints Original Authors: G. Csurka, C.R. Dance, L. Fan, J. Willamowski, C. Bray ECCV Workshop on Statistical Learning.

Object detection, deep learning, and R-CNNs

Deep Convolutional Nets

Chapter 20 Classification and Estimation Classification – Feature selection Good feature have four characteristics: –Discrimination. Features.

Introduction to Deep Learning

Convolutional Restricted Boltzmann Machines for Feature Learning Mohammad Norouzi Advisor: Dr. Greg Mori Simon Fraser University 27 Nov

Object Recognizing. Deep Learning Success in 2012 DeepNet and speech processing.

Rich feature hierarchies for accurate object detection and semantic segmentation 2014 IEEE Conference on Computer Vision and Pattern Recognition Ross Girshick,

PANDA: Pose Aligned Networks for Deep Attribute Modeling Ning Zhang 1,2 Manohar Paluri 1 Marć Aurelio Ranzato 1 Trevor Darrell 2 Lumbomir Boudev 1 1 Facebook.

Neural networks (2) Reminder Avoiding overfitting Deep neural network Brief summary of supervised learning methods.

Deep Learning Overview Sources: workshop-tutorial-final.pdf

City Forensics: Using Visual Elements to Predict Non-Visual City Attributes Sean M. Arietta, Alexei A. Efros, Ravi Ramamoorthi, Maneesh Agrawala Presented.

Recent developments in object detection

Big data classification using neural network

Hybrid Deep Learning for Reflectance Confocal Microscopy Skin Images

The Relationship between Deep Learning and Brain Function

Compact Bilinear Pooling

Object detection with deformable part-based models

Data Mining, Neural Network and Genetic Programming

Data Driven Attributes for Action Detection

Article Review Todd Hricik.

Lecture 24: Convolutional neural networks

Lecture 5 Smaller Network: CNN

Training Techniques for Deep Neural Networks

Deep Belief Networks Psychology 209 February 22, 2013.

CS6890 Deep Learning Weizhen Cai

R-CNN region By Ilia Iofedov 11/11/2018 BGU, DNN course 2016.

State-of-the-art face recognition systems

A Convolutional Neural Network Cascade For Face Detection

By: Kevin Yu Ph.D. in Computer Engineering

Computer Vision James Hays

Image Classification.

Deep Learning Hierarchical Representations for Image Steganalysis

Fine-Grained Visual Categorization

Object Classes Most recent work is at the object level We perceive the world in terms of objects, belonging to different classes. What are the differences.

Lecture: Deep Convolutional Neural Networks

Outline Background Motivation Proposed Model Experimental Results

Visualizing and Understanding Convolutional Networks

RCNN, Fast-RCNN, Faster-RCNN

Model generalization Brief summary of methods

Heterogeneous convolutional neural networks for visual recognition

Convolutional Neural Network

Introduction to Neural Networks

CS295: Modern Systems: Application Case Study Neural Network Accelerator Sang-Woo Jun Spring 2019 Many slides adapted from Hyoukjun Kwon‘s Gatech “Designing.

VERY DEEP CONVOLUTIONAL NETWORKS FOR LARGE-SCALE IMAGE RECOGNITION

Image recognition.

Presentation transcript:

Learning Features and Parts for Fine-Grained Recognition Authors: Jonathan Krause, Timnit Gebru, Jia Deng, Li-Jia Li, Li Fei-Fei ICPR, 2014 Presented by: Paritosh 1

Problem addressed  Authors address the problem of Fine-Grained Recognition  Fine-Grained Recognition – distinguish at subordinate level – intraclass classification  Examples: Bird species, car models, cat breeds, etc.  what specie that bird belongs to? Is that car Mclaren P1 or Mclaren MP4-12C ? 2 (All the images are taken from the paper being presented)

How do we do Fine-Grained (FG) recognition?  By noticing distinguishing parts – parts which are different  In the previous example grills and lights are some parts that help in distinguishing, visually 3

Why not use popular descriptors ?  Popular descriptors:  SIFT  HOG  They are not expressive enough  Doesn’t, reliably, capture intricacies Almost similar HOG patterns 4

Go for very expressive framework  Convolutional Neural Networks or ConvNets or CNN’s are the State-of-the-Art  Very rich descriptions  Highly scalable  Consistently, best performance on the ImageNet 5

Deep Learning  Deep learning is a kind of hierarchical learning paradigm  For example, a neural network with 5 hidden layers; shallow (typical) neural nets are the ones with 1 or 2 hidden layers  Layers and their learned representation:  Input layer – raw data  Hidden layer 1 – learns very low level features from raw data - edges can be a good example  Hidden layer 2 – learns features that are of higher level than Hidden layer 1 – may be learns to identify squares  Level of features increase in a similar fashion as we go on to subsequent hidden layers 6

Deep Learning – CNN’s  CNN’s are deep learners  Combination of  Input layer  Convolutional layers  Activation layers (ReLU layers)  Sub-sampling/pooling layers  Fully connected layers  Contains several of these layers 7

CNN’s 8

back to the paper… The Big picture  Find discriminative parts  Describe these parts using rich descriptions, like the ones obtained from the CNN’s  Rich descriptions should be able to capture the subtle differences  Train a multiclass classifier 9

Dataset, ‘Cars’ - some images 10

Dataset ( annotations - BB’s, labels ) 11

The pipeline for FG recognition  Training:  Step 1: Detect parts using trained part detectors  Step 2: determine the appearance of these parts using CNN’s features. Concatenate these features. Concatenated features form what is known as ELLF (Ensemble of Localized Learned Features)  Step 3: Train Linear SVM’s for classification 12

Step 1: Part discovery & detection  For the purpose of scalability, go for unsupervised learning  Remove the human intervention (in the form of annotations) from the loop  Observation made by the authors:  Objects with same pose can be discovered using local low-level cues  Assumptions:  Images within a set are well aligned  Implies that same parts will have similar locations – for example, given a small set of car images, probability of finding wheels, grills, etc, in similar positions, is very high 13

Part discovery contd…  Discovering aligned images  Step 1: Randomly pick a seed image  Step 2: Perform Grabcut – removes background efficiently – we are left with just car, no background  Step 3: Find HOG features for the car images  Step 4: find nearest neighbors, based on HOG features  Repeat the above process multiple times  Results in set of images with more or less same pose 14

Part selection  Step 1: randomly sample large no. of regions of various sizes  Step 2: select parts which high variance in their HOG features  Step 3: keep the parts which have high variance in HOG features (these are the ones useful for discrimination)  Step 4: get rid of redundant parts  At last, they have 10 parts, started with

Detector learning  Equivalently, find template that minimize hinge loss (a loss function) – difference between true label and the outcome of the classifier  Essentially, this is telling the detector, what is positive and what is negative, based on HOG features  We start this learning under the assumption that images are well aligned  Once trained, we retrain, only this time we relax the assumption that images are well aligned  We do this by allowing the usage of true positive patch location – because, remember, we assumed that the parts will be occupying similar positions and not the same  If true positions are too far away, they are penalized  In effect, this is Gaussian distribution about the original location (the one under the well alignment assumption)  Helps in neglecting parts with appearance similar to positive patches 16

17

Ensemble of parts  Remember that there are multiple parts as an outcome part discovery  Learn detector for every part  Consequently, we would have a collection pf part detector, each detecting a specific – in cars, wheels, grills, etc.  Authors call this Ensemble of parts. 18

Feature Learning  Take AlexNet, CNN implementation of Alex Krizhevsky et al.  Tweak (for details refer the paper) to account for smaller dataset  Train the modified CNN – inputs are whole images from the dataset  Retain only the first two convolutional layers  Unlike regular CNN’s which do pooling at fixed positions, we do max pooling, of the descriptors, in the regions marked by bounding boxes (BB’s), which are obtained from part detectors  This yields us Ensemble of Local Learned Features (ELLF)  Feed these ELLF’s to a linear SVM – final classifier  Overfitting is taken care of 19

Experiments  Some of intuitive details were included in the previous slides  For minute details, CNN parameters and their tuning, please refer the paper  One important observation regarding number of parts used in ELLF features (different from part discovery)  100 parts is where ELLF significantly gets ahead of CNN  1000 parts – ELLF almost reaches saturation  Performance is hurt when GrabCut performs poorly 20