What is the Best Multi-Stage Architecture for Object Recognition? Ruiwen Wu
[1] Jarrett, Kevin, et al. "What is the best multi-stage architecture for object recognition?." Computer Vision, 2009 IEEE 12th International Conference on. IEEE, 2009. (cited by 396)

Outline
- Usual architecture of neural networks
- Each part of the network
- Unsupervised learning conception
- Experiments
- Contribution of this paper

Deep Learning Methods
Deep learning methods aim at learning feature hierarchies, with features from higher levels of the hierarchy formed by the composition of lower-level features [2]:
- Neural networks with many hidden layers
- Graphical models with many levels of hidden variables
- Other methods
[2] Erhan, D., Bengio, Y., Courville, A., Manzagol, P. A., Vincent, P., & Bengio, S. Why Does Unsupervised Pre-training Help Deep Discriminant Learning?

Usual architecture of neural networks
- Non-linear operation: quantization, winner-take-all, sparsification, normalization, S-function
- Pooling operation: max, average, or histogramming operator
- Classifier: neural network (NN), k-nearest neighbor (KNN), support vector machine (SVM), logistic regression (LR)

Questions to address
This paper addresses three questions:
1. How do the non-linearities that follow the filter banks influence recognition accuracy?
2. Does learning the filter banks in an unsupervised or supervised manner improve performance over random or hardwired filters?
3. Is there any advantage to using an architecture with two stages of feature extraction rather than one?

Experiments
To address these three questions, they experimented with various combinations of architectures:
- One stage or two stages of feature extraction
- Different types of non-linearities
- Different types of filters
- Different filter learning methods (random, unsupervised, and supervised)
Test datasets: Caltech-101, NORB, and MNIST.

Model Architecture
- Filter bank layer (F_CSG)
- Local contrast normalization layer (N)
- Pooling and subsampling layer (P_A or P_M)

Filter Bank Layer (F_CSG)
The module computes

$y_j = g_j \tanh\!\left(\sum_i k_{ij} * x_i\right)$

where $*$ is the convolution operator, $\tanh$ is the hyperbolic tangent non-linearity, and $g_j$ is a trainable scalar coefficient. Output size: assuming each input map is $n_1 \times n_2$ and each kernel is $l_1 \times l_2$, each output map $y_j$ is $(n_1-l_1+1) \times (n_2-l_2+1)$. The kernels can be either trained in a supervised manner or pre-trained unsupervised.
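As a concrete illustration, here is a minimal NumPy/SciPy sketch of this module with random, untrained filters; all shapes and values below are illustrative assumptions rather than the paper's settings.

```python
import numpy as np
from scipy.signal import convolve2d

def filter_bank_csg(x, kernels, gains):
    """x: input maps (n_in, n1, n2); kernels: (n_out, n_in, l1, l2); gains: (n_out,)."""
    out = []
    for j in range(kernels.shape[0]):
        # sum of 'valid' 2-D convolutions over the input maps
        s = sum(convolve2d(x[i], kernels[j, i], mode="valid")
                for i in range(x.shape[0]))
        out.append(gains[j] * np.tanh(s))  # trainable gain times tanh non-linearity
    return np.stack(out)

x = np.random.randn(1, 32, 32)         # one 32x32 input map
k = np.random.randn(4, 1, 9, 9) * 0.1  # four random (untrained) 9x9 kernels
y = filter_bank_csg(x, k, gains=np.ones(4))
print(y.shape)                         # (4, 24, 24) = (32-9+1) per side
```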

Local Contrast Normalization Layer (N)
The module applies a subtractive step followed by a divisive step:

$v_{ijk} = x_{ijk} - \sum_{ipq} w_{pq} \, x_{i,j+p,k+q}$

$y_{ijk} = \frac{v_{ijk}}{\max(c, \sigma_{jk})}, \quad \sigma_{jk} = \left(\sum_{ipq} w_{pq} \, v_{i,j+p,k+q}^2\right)^{1/2}$

where $w_{pq}$ is a Gaussian weighting window and $c$ is the mean of $\sigma_{jk}$. (I do not quite understand this part.)
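A rough NumPy sketch of these two steps applied across a stack of feature maps; the Gaussian width and the exact weighting across maps are assumptions, since the slide does not fix them.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def local_contrast_norm(x, sigma=2.0):
    """x: feature maps of shape (n_maps, h, w); sigma: assumed Gaussian width."""
    # subtractive step: remove the Gaussian-weighted local mean across all maps
    local_mean = gaussian_filter(x.mean(axis=0), sigma)
    v = x - local_mean[None, :, :]
    # divisive step: divide by the Gaussian-weighted local standard deviation
    local_std = np.sqrt(gaussian_filter((v ** 2).mean(axis=0), sigma))
    c = local_std.mean()                 # c is the mean of sigma_jk
    return v / np.maximum(c, local_std)  # y = v / max(c, sigma_jk)

out = local_contrast_norm(np.random.randn(4, 24, 24))
print(out.shape)  # (4, 24, 24): shape is preserved
```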

Local Contrast Normalization Layer (N)
The result of this module: it seems that this module is doing edge extraction.

Pooling and Subsampling Layer (P_A or P_M)
For each small neighborhood, the module computes

$y_{ijk} = \sum_{pq} w_{pq} \, x_{i,j+p,k+q}$

where $w_{pq}$ is a uniform weighting window (average pooling, P_A) or a max weighting window (max pooling, P_M). Each output feature map is then subsampled spatially by a factor $S$ horizontally and vertically.
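A small NumPy sketch of both pooling variants; the window size and subsampling factor are assumed values, not the paper's.

```python
import numpy as np

def pool(x, window=4, stride=4, mode="avg"):
    """x: feature maps (n_maps, h, w); average (P_A) or max (P_M) pooling."""
    n, h, w = x.shape
    rows = range(0, h - window + 1, stride)
    cols = range(0, w - window + 1, stride)
    out = np.empty((n, len(rows), len(cols)))
    for a, r in enumerate(rows):
        for b, c in enumerate(cols):
            patch = x[:, r:r + window, c:c + window]
            out[:, a, b] = (patch.max(axis=(1, 2)) if mode == "max"
                            else patch.mean(axis=(1, 2)))
    return out  # each map subsampled by the stride factor S

y = pool(np.random.randn(4, 24, 24), mode="avg")
print(y.shape)  # (4, 6, 6)
```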

Combine Modules
There are three possible architectures for a feature-extraction stage of this network (a sketch of the second variant follows):
- F_CSG -> P_A
- F_CSG -> N -> P_A
- F_CSG -> P_M
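Chaining the sketches together gives the full F_CSG -> N -> P_A stage; this snippet assumes the filter_bank_csg, local_contrast_norm, and pool functions defined in the earlier sketches are in scope, with random filters and illustrative sizes throughout.

```python
import numpy as np

# assumes filter_bank_csg, local_contrast_norm, and pool from the sketches above
x = np.random.randn(1, 64, 64)               # one 64x64 input map
k = np.random.randn(8, 1, 9, 9) * 0.1        # eight random 9x9 kernels
h = filter_bank_csg(x, k, gains=np.ones(8))  # F_CSG: (8, 56, 56)
h = local_contrast_norm(h)                   # N:     (8, 56, 56)
h = pool(h, window=4, stride=4, mode="avg")  # P_A:   (8, 14, 14)
print(h.shape)
```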

Training Protocol
- R and RR: random features and supervised classifier
- U and UU: unsupervised features, supervised classifier
- R+ and R+R+: random features, global supervised refinement
- U+ and U+U+: unsupervised features, global supervised refinement
(A single symbol denotes a one-stage system; a repeated symbol denotes a two-stage system.)

Unsupervised Training of Filter Banks
For a given input $X$ and a matrix $W$ whose columns are the dictionary elements, the feature vector $Z^*$ is obtained by minimizing the following energy function:

$E(X, Z) = \|X - W Z\|_2^2 + \lambda \|Z\|_1$

where $\lambda$ is a sparsity hyper-parameter. For any input $X$, one needs to run a rather expensive optimization algorithm to find $Z^*$. To alleviate this problem, the PSD method is adopted.
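To see why exact inference is costly, here is a minimal ISTA sketch for this energy on random data; ISTA is one standard iterative solver for this problem, not necessarily the one used in the paper, and every value below is illustrative.

```python
import numpy as np

def ista(X, W, lam=0.1, n_iters=200):
    """Minimize ||X - W Z||_2^2 + lam * ||Z||_1 over Z by iterative shrinkage."""
    eta = 1.0 / (2 * np.linalg.norm(W, 2) ** 2)  # step from the Lipschitz constant
    Z = np.zeros(W.shape[1])
    for _ in range(n_iters):
        Z = Z - eta * 2 * W.T @ (W @ Z - X)      # gradient step on the quadratic
        Z = np.sign(Z) * np.maximum(np.abs(Z) - eta * lam, 0.0)  # soft threshold
    return Z

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 128))  # dictionary: columns are the atoms
X = rng.standard_normal(64)
Z_star = ista(X, W)
print(np.count_nonzero(Z_star), "nonzero entries out of", Z_star.size)
```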

Predictive Sparse Decomposition (PSD) [3]
PSD augments the sparse coding energy with a trainable feed-forward predictor that approximates $Z^*$:

$E(X, Z) = \|X - W Z\|_2^2 + \lambda \|Z\|_1 + \alpha \|Z - F(X)\|_2^2, \quad F(X) = \tanh(S X + D)$

where $S \in \mathbb{R}^{m \times n}$ is a filter matrix and $D \in \mathbb{R}^m$ is a vector of biases. After training, the expensive optimization for $Z^*$ is replaced by the cheap feed-forward pass $F(X)$.

[3] Kavukcuoglu, Koray, Marc'Aurelio Ranzato, and Yann LeCun. "Fast inference in sparse coding algorithms with applications to object recognition." arXiv preprint (2010). (cited by 94)
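A small NumPy sketch of the PSD idea: evaluating the combined energy and running the fast predictor. S, W, and D are random here purely for illustration; in PSD they are learned jointly by minimizing the combined energy.

```python
import numpy as np

def psd_energy(X, Z, W, S, D, lam=0.1, alpha=1.0):
    recon = np.sum((X - W @ Z) ** 2)                      # ||X - W Z||^2
    sparsity = lam * np.sum(np.abs(Z))                    # lam * ||Z||_1
    pred = alpha * np.sum((Z - np.tanh(S @ X + D)) ** 2)  # prediction penalty
    return recon + sparsity + pred

rng = np.random.default_rng(0)
n, m = 64, 128                     # input size n, code size m
X = rng.standard_normal(n)
W = rng.standard_normal((n, m))    # dictionary
S = rng.standard_normal((m, n))    # predictor filter matrix (random here)
D = np.zeros(m)                    # predictor biases
Z_fast = np.tanh(S @ X + D)        # fast feed-forward inference, no optimization
print(psd_energy(X, Z_fast, W, S, D))
```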

Results

Why Does Unsupervised Pre-training Help Deep Discriminant Learning? [2]

Reference of the graphs
[2] Erhan, D., Bengio, Y., Courville, A., Manzagol, P. A., Vincent, P., & Bengio, S. Why Does Unsupervised Pre-training Help Deep Discriminant Learning?
[3] Hinton, G. E., Osindero, S., & Teh, Y. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18, 1527–1554.
[4] Zhu, L., Chen, Y., & Yuille, A. (2009). Unsupervised learning of probabilistic grammar-markov models for object categories. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31, 114–128.
[5] Weston, J., Ratle, F., & Collobert, R. (2008). Deep learning via semi-supervised embedding. Proceedings of the Twenty-fifth International Conference on Machine Learning (ICML'08) (pp. 1168–1175). New York, NY, USA: ACM.
[6] LeCun, Yann, et al. "Gradient-based learning applied to document recognition." Proceedings of the IEEE (1998).

Non-convex objective
In deep learning, the objective function is usually a highly non-convex function of the parameters, so there are typically many local minima in the model parameter space. Supervised learning uses a fixed point or a random point as the initialization, so in many situations it converges to a poor local minimum.

Local Minima

Random Initialization

Unsupervised Pre-training

Reason
There are a few reasonable hypotheses for why pre-training might work. One possibility is that unsupervised pre-training acts as a kind of regularizer, putting the parameter values in the appropriate range for discriminant training. Another possibility is that pre-training initializes the model to a point in parameter space that somehow renders the optimization process more effective, in the sense of achieving a lower minimum of the empirical cost function.

Conclusion
Understanding and improving deep architectures remains a challenge. This work helps with such understanding via extensive simulations, and puts forward and confirms a hypothesis explaining the mechanisms behind the effect of unsupervised pre-training on the final discriminant learning task. Future work should clarify this hypothesis.

Reference
[1] Jarrett, Kevin, et al. "What is the best multi-stage architecture for object recognition?." Computer Vision, 2009 IEEE 12th International Conference on. IEEE, 2009. (cited by 396)
[2] Erhan, D., Bengio, Y., Courville, A., Manzagol, P. A., Vincent, P., & Bengio, S. Why Does Unsupervised Pre-training Help Deep Discriminant Learning?
[3] Hinton, G. E., Osindero, S., & Teh, Y. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18, 1527–1554.
[4] Zhu, L., Chen, Y., & Yuille, A. (2009). Unsupervised learning of probabilistic grammar-markov models for object categories. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31, 114–128.
[5] Weston, J., Ratle, F., & Collobert, R. (2008). Deep learning via semi-supervised embedding. Proceedings of the Twenty-fifth International Conference on Machine Learning (ICML'08) (pp. 1168–1175). New York, NY, USA: ACM.
[6] LeCun, Yann, et al. "Gradient-based learning applied to document recognition." Proceedings of the IEEE (1998).

Thank You!