Learning Convolutional Feature Hierarchies for Visual Recognition


Learning Convolutional Feature Hierarchies for Visual Recognition Koray Kavukcuoglu, Pierre Sermanet, Y-Lan Boureau, Karol Gregor, Michael Mathieu, Yann LeCun NIPS 2010 Presented by Bo Chen

Outline 1. Drawbacks of Traditional Convolutional Methods 2. The Proposed Algorithm and Some Details 3. Experimental Results 4. Conclusions

Convolutional Sparse Coding Drawbacks of patch-based sparse coding: 1. The representations of whole images are highly redundant, because training and inference are performed at the patch level. 2. Inference for a whole image is computationally expensive.
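To make the redundancy point concrete, here is a minimal NumPy sketch of the standard convolutional sparse coding energy (function names and shapes are illustrative, not from the paper): one set of feature maps z covers the whole image, so the reconstruction is computed once per image rather than once per overlapping patch.

```python
import numpy as np
from scipy.signal import convolve2d

def conv_sparse_energy(x, filters, z, lam=0.1):
    """Convolutional sparse coding energy (standard form, a sketch):
    E(z) = 0.5 * ||x - sum_k d_k * z_k||^2 + lam * sum_k |z_k|_1
    x: (H, W) image; filters: (K, s, s); z: (K, H-s+1, W-s+1) feature maps.
    """
    # 'full' convolution maps each (H-s+1, W-s+1) feature map back to (H, W)
    recon = sum(convolve2d(zk, dk, mode="full") for zk, dk in zip(z, filters))
    return 0.5 * np.sum((x - recon) ** 2) + lam * np.sum(np.abs(z))
```

Because the dictionary is shared across all image locations via the convolution, nearby patches no longer carry independent (and therefore redundant) codes.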

Solutions 1. Introducing a Convolution Operator 2. Introducing a Nonlinear Encoder Module

Learning Convolutional Dictionaries 1. Boundary Effects Due to Convolutions: apply a mask to the derivatives of the reconstruction error, where the mask is a term-by-term multiplier that either zeros out the boundaries or gradually scales them down. 2. Computationally Efficient Derivatives
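A minimal sketch of such a boundary mask (the linear taper here is an assumption; the slide only says the mask zeros out or gradually scales down the boundaries):

```python
import numpy as np

def boundary_mask(shape, border):
    """Term-by-term multiplier for the reconstruction-error derivatives:
    1 in the interior, ramping down linearly over `border` pixels toward
    the image boundary (the exact taper is an illustrative choice)."""
    def ramp(n):
        d = np.minimum(np.arange(n) + 1, n - np.arange(n))  # distance to edge
        return np.minimum(d, border + 1) / (border + 1)
    H, W = shape
    return np.outer(ramp(H), ramp(W))
```

Multiplying the per-pixel reconstruction-error gradient by this mask keeps the partially observed border regions from dominating the dictionary update.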

Learning an Efficient Encoder 1. A New Smooth Shrinkage Operator 2. To aid faster convergence, use the stochastic diagonal Levenberg-Marquardt method to compute a positive diagonal approximation to the Hessian.
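A sketch of a smooth shrinkage operator of the kind the slide refers to: a differentiable approximation of soft-thresholding. The functional form and constants below are illustrative assumptions, not copied from the paper.

```python
import numpy as np

def smooth_shrink(s, beta=5.0, b=0.5):
    """Smooth shrinkage: behaves like soft-thresholding at threshold b
    for large |s|, but is zero at s = 0 and differentiable everywhere.
    beta controls how sharply it approximates the hard shrinkage corner."""
    return np.sign(s) * (
        np.log(np.exp(beta * b) + np.exp(beta * np.abs(s)) - 1.0) / beta - b
    )
```

For |s| much larger than b this returns roughly sign(s) * (|s| - b), i.e. ordinary soft-thresholding, while remaining smooth enough for gradient-based encoder training.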

Patch-Based vs. Convolutional Sparse Modeling The convolution operator enables the system to model local structures that appear anywhere in the signal. The convolutional dictionary does not waste resources modeling similar filter structures at multiple locations; instead, it models more orientations, frequencies, and diverse structures, including center-surround filters, double center-surround filters, and corner structures at various angles.

Multi-Stage Architecture The convolutional encoder can replace the patch-based sparse coding modules used in multi-stage object recognition architectures. Following previous findings, in each stage the encoder is followed by absolute value rectification, contrast normalization, and average subsampling.
Absolute Value Rectification: a simple pointwise absolute value function applied to the output of the encoder.
Contrast Normalization: reduces the dependencies between components (feature maps). When used between layers, the mean and standard deviation are computed across all feature maps within a 9 × 9 spatial neighborhood.
Average Pooling: a spatial pooling operation applied to each feature map independently.
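The three post-encoder operations can be sketched as follows. This is a simplification: a box window stands in for the weighted 9 × 9 neighborhood, and the function names and pooling stride are illustrative choices.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_contrast_norm(f, size=9, eps=1e-6):
    """Subtractive then divisive normalization of feature maps f (K, H, W),
    with statistics taken across all maps in a size x size spatial window."""
    mean = uniform_filter(f.mean(axis=0), size)            # across maps + space
    centered = f - mean                                    # broadcasts over maps
    std = np.sqrt(uniform_filter((centered ** 2).mean(axis=0), size))
    return centered / np.maximum(std, eps)                 # avoid divide-by-zero

def stage(f, pool=2):
    """One stage after the encoder: |.| rectification ->
    contrast normalization -> average pooling with subsampling."""
    f = np.abs(f)                                          # absolute value rectification
    f = local_contrast_norm(f)
    f = uniform_filter(f, size=(1, pool, pool))            # average over pool x pool
    return f[:, ::pool, ::pool]                            # spatial subsampling
```

Each feature map is pooled independently, while the contrast normalization couples the maps, matching the description above.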

Experiments 1: Object Recognition on the Caltech-101 Dataset
Preprocessing: 1. 30 training / 30 testing images per class; 2. resize to 151 × 143; 3. local contrast normalization.
Unsupervised training: Berkeley segmentation dataset.
Architecture: First layer: 64 filters of size 9 × 9; pooling over a 10 × 10 area with a 5-pixel stride. Second layer: 256 filters of size 9 × 9, where each dictionary element is constrained to connect to 16 dictionary elements from the first layer; pooling over a 6 × 6 area with stride 4.

Recognition Accuracy [comparison table: one-layer vs. two-layer systems] Ours: 65.8% (0.6)

Pedestrian Detection (1)
Dataset: original: positive = 2416, negative = 1218; augmented: positive = 11370 (1000), negative = 9001 (1000).
Architecture: Layer 1: 32 filters of 7 × 7; Layer 2: 64 filters of 7 × 7; pooling: 2 × 2.

Pedestrian Detection (2)

Conclusions 1. Convolutional training of feature extractors reduces the redundancy among filters compared with those obtained from patch-based models. 2. Two different convolutional encoder functions were introduced for efficient feature extraction, which is crucial for using sparse coding in real-world applications. 3. The proposed sparse modeling system has been applied, through a successful multi-stage architecture, to object recognition and pedestrian detection problems, performing comparably to similar systems.