AN ANALYSIS OF SINGLE-LAYER NETWORKS IN UNSUPERVISED FEATURE LEARNING [1]
Yani Chen, 10/14/2014

Outline
- Introduction
- Framework for feature learning
- Unsupervised feature learning algorithms
- Effect of some parameters
- Experiments and analysis of the results

Introduction
1. Much prior work has focused on employing complex unsupervised feature learning algorithms.
2. Simple factors, such as the number of hidden nodes, may be more important to achieving high performance than the learning algorithm or the depth of the model.
3. Using only a single-layer network can produce very good feature learning results.

Unsupervised feature learning framework
1. Extract random patches from unlabeled training images (images are used as the running example).
2. Apply a pre-processing stage to the patches.
3. Learn a feature mapping using an unsupervised feature learning algorithm.
4. Extract features from equally spaced sub-patches covering the input images.
5. Pool features together to reduce the number of feature values.
6. Train a linear classifier to predict the labels given the feature vectors.
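As an illustration of steps 1 and 2, here is a minimal numpy sketch of random patch extraction and per-patch normalization; the function names, the patch count, and the 10.0 variance regularizer are illustrative assumptions, not taken from the paper's code. The later slides cover the learning algorithms (step 3) and the convolutional extraction, pooling, and classifier (steps 4-6).

    import numpy as np

    def extract_random_patches(images, num_patches, w):
        # step 1: sample w x w sub-patches at random positions from random images
        # images: array of shape (num_images, H, W, C)
        n, H, W, C = images.shape
        patches = np.empty((num_patches, w * w * C))
        for i in range(num_patches):
            img = images[np.random.randint(n)]
            r = np.random.randint(H - w + 1)
            c = np.random.randint(W - w + 1)
            patches[i] = img[r:r + w, c:c + w, :].ravel()
        return patches

    def normalize_patches(patches, eps=10.0):
        # step 2: per-patch brightness and contrast normalization
        # (whitening, discussed later, is applied after this)
        patches = patches - patches.mean(axis=1, keepdims=True)
        return patches / np.sqrt(patches.var(axis=1, keepdims=True) + eps)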

Unsupervised learning algorithms
1. Sparse autoencoder
2. Sparse restricted Boltzmann machine
3. K-means clustering
4. Gaussian mixture model (GMM) clustering

Sparse auto-encoder
- Objective function (minimize)
- Feature mapping function
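The formulas themselves did not survive the transcript. As a sketch consistent with the paper's description, the autoencoder minimizes reconstruction error with a sparsity penalty on the hidden activations, and the learned feature mapping is a logistic-sigmoid encoder; the KL-style penalty form, its weight beta, and the target activation below are illustrative choices rather than the paper's exact settings.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def encode(X, W, b):
        # feature mapping f(x) = g(Wx + b), applied row-wise to a matrix of patches
        return sigmoid(X @ W.T + b)

    def sparse_autoencoder_loss(X, W, b, W_dec, b_dec, rho=0.05, beta=3.0):
        H = encode(X, W, b)                          # hidden activations
        X_hat = H @ W_dec.T + b_dec                  # linear decoder reconstruction
        recon = np.mean(np.sum((X_hat - X) ** 2, axis=1))
        rho_hat = H.mean(axis=0)                     # mean activation of each hidden unit
        kl = np.sum(rho * np.log(rho / (rho_hat + 1e-8))
                    + (1 - rho) * np.log((1 - rho) / (1 - rho_hat + 1e-8)))
        return recon + beta * kl                     # objective minimized by backpropagation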

Sparse restricted Boltzmann machine
- Energy function of an RBM
- The same type of sparsity penalty can be added as in the sparse autoencoder.
- Sparse RBMs can be trained using a contrastive divergence approximation [7].
- Feature mapping function
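The energy formula was likewise lost in the transcript. For reference, the standard binary-binary RBM energy is

    E(v, h) = -\sum_{i,j} v_i W_{ij} h_j - \sum_i b_i v_i - \sum_j c_j h_j

(the variant used for real-valued, whitened patches has Gaussian visible units, so the visible term differs), and after training the feature mapping takes the same sigmoid form as the autoencoder, f(x) = g(Wx + b).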

K-means clustering
- Objective function for learning K centroids
- Feature mapping function:
  1. hard assignment
  2. soft ("triangle") assignment
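A compact numpy sketch of the centroid learning and the two feature mappings; the Lloyd-style training loop and its initialization are simplifications, while the triangle mapping f_k(x) = max(0, mean(z) - z_k), with z_k the distance from x to centroid k, follows the description in the paper.

    import numpy as np

    def kmeans_fit(X, k, iters=10):
        # plain Lloyd iterations: minimize the sum of squared distances
        # from each point to its nearest centroid
        centroids = X[np.random.choice(len(X), k, replace=False)].astype(float)
        for _ in range(iters):
            d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
            assign = d.argmin(axis=1)
            for j in range(k):
                members = X[assign == j]
                if len(members):
                    centroids[j] = members.mean(axis=0)
        return centroids

    def hard_features(X, centroids):
        # 1. hard assignment: a 1 for the nearest centroid, 0 elsewhere
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        F = np.zeros_like(d)
        F[np.arange(len(X)), d.argmin(axis=1)] = 1.0
        return F

    def triangle_features(X, centroids):
        # 2. soft "triangle" assignment: f_k(x) = max(0, mean(z) - z_k)
        z = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        return np.maximum(0.0, z.mean(axis=1, keepdims=True) - z)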

GMM clustering
A Gaussian mixture model is a probabilistic model that assumes all data points are generated from a mixture of a finite number of Gaussian distributions with unknown parameters.

GMM (Gaussian mixture models)

EM algorithm
- EM (expectation-maximization) is an iterative method for finding maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models.
- E-step: assign points to clusters
- M-step: estimate model parameters

Gaussian mixtures
- Feature mapping function
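For the mixture model, the feature vector for a patch is the set of posterior membership probabilities of the mixture components. Below is a diagonal-covariance EM sketch and the corresponding feature mapping; the diagonal covariance, the iteration count, and the small regularizers are simplifying assumptions, not the paper's exact setup.

    import numpy as np

    def _responsibilities(X, pi, mu, var):
        # posterior probability of each diagonal-Gaussian component for each point
        log_p = (-0.5 * (((X[:, None, :] - mu) ** 2) / var
                         + np.log(2 * np.pi * var)).sum(axis=2) + np.log(pi))
        log_p -= log_p.max(axis=1, keepdims=True)      # for numerical stability
        p = np.exp(log_p)
        return p / p.sum(axis=1, keepdims=True)

    def gmm_fit(X, k, iters=20, eps=1e-6):
        n = X.shape[0]
        mu = X[np.random.choice(n, k, replace=False)].astype(float)
        var = np.tile(X.var(axis=0) + eps, (k, 1))
        pi = np.full(k, 1.0 / k)
        for _ in range(iters):
            resp = _responsibilities(X, pi, mu, var)   # E-step: assign points to clusters
            nk = resp.sum(axis=0) + eps                # M-step: re-estimate the parameters
            pi = nk / n
            mu = (resp.T @ X) / nk[:, None]
            var = (resp.T @ (X ** 2)) / nk[:, None] - mu ** 2 + eps
        return pi, mu, var

    def gmm_features(X, pi, mu, var):
        # feature mapping: the k posterior membership probabilities of each point
        return _responsibilities(X, pi, mu, var)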

Feature extraction and classification
- Convolutional feature extraction and (sum) pooling
- Classification: linear (L2) SVM
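A sketch of the convolutional extraction and the four-quadrant sum pooling; `encode` stands in for whichever learned feature mapping is plugged in, and the closing comment names sklearn's LinearSVC as one possible L2-SVM, which is an assumption rather than the paper's exact solver.

    import numpy as np

    def conv_extract_and_pool(img, encode, w=6, stride=1):
        # encode every w x w sub-patch taken with the given stride, then
        # sum-pool the resulting feature maps over the four image quadrants,
        # giving a 4K-dimensional vector for a K-dimensional feature mapping
        H, W = img.shape[:2]
        rows = range(0, H - w + 1, stride)
        cols = range(0, W - w + 1, stride)
        feats = np.array([[encode(img[r:r + w, c:c + w].ravel()) for c in cols]
                          for r in rows])              # shape: (num_rows, num_cols, K)
        mr, mc = feats.shape[0] // 2, feats.shape[1] // 2
        quads = [feats[:mr, :mc], feats[:mr, mc:], feats[mr:, :mc], feats[mr:, mc:]]
        return np.concatenate([q.sum(axis=(0, 1)) for q in quads])

    # The pooled vectors for all training images, stacked into a matrix with their
    # labels, then go to a linear L2-SVM, e.g. sklearn.svm.LinearSVC(C=...).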

Data
1. CIFAR-10 (used to tune the parameters)
2. NORB
3. Downsampled STL-10 (96x96 --> 32x32)

CIFAR-10 dataset
The CIFAR-10 dataset consists of 60,000 32x32 colour images in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images. [3]

NORB dataset
This dataset is intended for experiments in 3D object recognition from shape. It contains images of 50 toys belonging to 5 generic categories: animals, human figures, airplanes, trucks, and cars. There are 24,300 training image pairs (96x96) and 24,300 test image pairs. [4]

STL-10 dataset
The STL-10 dataset consists of 96x96 color images in 10 classes (airplane, bird, car, cat, deer, dog, horse, monkey, ship, truck). There are 500 labeled training images and 800 test images per class, plus 100,000 unlabeled images; here the images are downsampled to 32x32. [5]

Factors studied
1. With or without whitening
2. Number of features
3. Stride (spacing between patches)
4. Receptive field size

Effect of whitening
- Result of whitening: (1) the features are less correlated with each other; (2) the features all have the same variance.
- Sparse autoencoder and sparse RBM: with only 100 features there is a significant benefit from whitening; as the number of features grows, the advantage disappears.
- Clustering algorithms: whitening is a must-have step, because these algorithms cannot handle the correlations in the data.
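For reference, a minimal ZCA-whitening sketch of the kind typically applied to the normalized patches; the 0.1 eigenvalue regularizer is a common choice, not necessarily the paper's exact value.

    import numpy as np

    def zca_whiten(patches, eps=0.1):
        # rotate the data so the dimensions are decorrelated and have equal
        # variance; eps regularizes small eigenvalues of the covariance
        mean = patches.mean(axis=0)
        Xc = patches - mean
        cov = Xc.T @ Xc / len(Xc)
        d, V = np.linalg.eigh(cov)
        W = V @ np.diag(1.0 / np.sqrt(d + eps)) @ V.T
        return Xc @ W, mean, W     # reuse mean and W to whiten new patches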

Effect of number of features
- Numbers of features used: 100, 200, 400, 800, 1600
- All algorithms generally achieved higher performance by learning more features.

Effect of stride
- The stride is the spacing between the patches from which feature values are extracted.
- Performance decreases as the stride increases.

Effect of receptive field size
- The receptive field size is the patch size.
- Overall, a 6-pixel receptive field worked best.

Classification results

Table 1: Test recognition accuracy on CIFAR-10 (stride = 1, receptive field = 6x6, with whitening, large number of features)

    Algorithm                              Accuracy
    Raw pixels                             37.3%
    3-way factored RBM (3 layers)          65.3%
    Mean-covariance RBM (3 layers)         71.0%
    Improved Local Coord. Coding           74.5%
    Conv. Deep Belief Net (2 layers)       78.9%
    Sparse auto-encoder                    73.4%
    Sparse RBM                             72.4%
    K-means (Hard)                         68.6%
    K-means (Triangle, 1600 features)      77.9%
    K-means (Triangle, 4000 features)      79.6%

Classification results

Table 2: Test recognition accuracy (and error) on NORB, normalized-uniform (stride = 1, receptive field = 6x6, with whitening, large number of features)

    Algorithm                              Accuracy (error)
    Conv. Neural Network                   93.4% (6.6%)
    Deep Boltzmann Machine                 92.8% (7.2%)
    Deep Belief Network                    95.0% (5.0%)
    Best result of [6]                     94.4% (5.6%)
    Deep neural network                    97.13% (2.87%)
    Sparse auto-encoder                    96.9% (3.1%)
    Sparse RBM                             96.2% (3.8%)
    K-means (Hard)                         96.9% (3.1%)
    K-means (Triangle, 1600 features)      97.0% (3.0%)
    K-means (Triangle, 4000 features)      97.21% (2.79%)

Classification results

Table 3: Test recognition accuracy on STL-10

    Algorithm                              Accuracy
    Raw pixels                             31.8% (±0.62%)
    K-means (Triangle, 1600 features)      51.5% (±1.73%)

The proposed method is strongest when large labeled training sets are available; STL-10 provides comparatively few labeled training examples, hence the lower absolute accuracy.

Conclusion
- The best performance was obtained with k-means clustering.
- It is easy to use and fast.
- It has no hyperparameters to tune (beyond the number of centroids).
- A single-layer network can achieve good results.
- Use more features and dense (small-stride) extraction.

References
[1] Coates, Adam, Andrew Y. Ng, and Honglak Lee. "An analysis of single-layer networks in unsupervised feature learning." International Conference on Artificial Intelligence and Statistics (AISTATS), 2011.
[2]
[3] A. Krizhevsky. Learning Multiple Layers of Features from Tiny Images. Master's thesis, Dept. of Comp. Sci., University of Toronto, 2009.
[4] LeCun, Yann, Fu Jie Huang, and Léon Bottou. "Learning methods for generic object recognition with invariance to pose and lighting." Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Vol. 2, IEEE, 2004.
[5]
[6] Jarrett, Kevin, et al. "What is the best multi-stage architecture for object recognition?" 2009 IEEE 12th International Conference on Computer Vision (ICCV), IEEE, 2009.
[7] Goh, Hanlin, Nicolas Thome, and Matthieu Cord. "Biasing restricted Boltzmann machines to manipulate latent selectivity and sparsity." NIPS Workshop on Deep Learning and Unsupervised Feature Learning.

THANK YOU!