Data Mining, Neural Network and Genetic Programming


1 Data Mining, Neural Network and Genetic Programming
Deep Learning and Transfer Learning for Object Recognition Yi Mei

2 Outline
Object Recognition
Deep Learning
  Overview
  Recipes of deep learning
  Automated ANN Architecture Design
Transfer Learning
  DLID
  DeCAF
  DAN

3 Object Recognition Object recognition usually refers to object classification, but sometimes it refers to the whole procedure of finding objects in large pictures. Object detection also has other meanings or interpretations.
Figure: object recognition in relation to object classification, object localisation, and one-class vs. multi-class object detection.

4 Methods for Object Detection
Object Recognition
  Neural methods (classification only; classification and localisation): feed-forward networks, shared-weight networks, SOMs, high-order networks, deep learning, ...
  Genetic methods (classification only; classification and localisation): genetic algorithms, genetic programming

5 Deep Learning Machine learning algorithms based on learning multiple levels of representation / abstraction. Fig: I. Goodfellow

6 Deep Learning Has been successful in many areas: object recognition, object detection, speech recognition, natural language processing, …

7 Deep Learning LeNet: 7 layers [LeCun et al. 1998]
Subsampling (max-pooling)

8 Deep Learning Subsampling does not change the object (its class is preserved), but over-subsampling discards too much detail.

9 Deep Learning AlexNet: 8 layers [Krizhevsky et al. 2012]
Similar to LeNet, but a bigger model (7 hidden layers, 650K units, 60M parameters) Error: 16.4% for the ImageNet 2012 challenge (No. 1) image-net.org

10 Deep Learning VGGNet: 19 layers [Simonyan and Zisserman 2014]
Error: 7.32% for ImageNet 2014 challenge

11 Deep Learning GoogLeNet: 22 layers [Szegedy et al. 2014]
Error: 6.67% for ImageNet 2014 challenge (No. 1)

12 Deep Learning ResNet: 152 layers [He et al. 2015]
Error: 3.57% for ImageNet 2015 challenge (No. 1)

13 Deep Learning

14 Automated CNN Architecture Design
Manually designing a CNN architecture requires a lot of domain knowledge and trial-and-error. Use genetic programming to automatically evolve an architecture instead.

15 Automated CNN Architecture Design
Cartesian GP

16 Automated CNN Architecture Design
Functions (operators):
  ConvBlock (stride 1 with padding; keeps the input size)
  ResBlock (stride 1 with padding; keeps the input size)
  Max (average) pooling (2x2 filter, stride 2)
  Summation (element-wise addition)
Constraint on the architecture: only two feature maps of the same size can be summed (see the sketch below)
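As a rough illustration of the encoding, the sketch below represents a CGP genotype as a small graph over the function set above and decodes only the active nodes into a layer list. The node layout, names, and decoding are illustrative assumptions, not the exact scheme of the cited CGP-CNN work.

```python
# Rough sketch of a Cartesian-GP-style genotype for CNN architecture search.
# The function set mirrors the slide; the encoding and decoding details are
# illustrative, not the exact scheme of the cited CGP-CNN work.

FUNCTIONS = {           # function name -> arity (number of input feature maps)
    "ConvBlock": 1,
    "ResBlock": 1,
    "MaxPool": 1,
    "AvgPool": 1,
    "Sum": 2,           # element-wise addition of two same-sized feature maps
}

# Node 0 is the input image; node i (i >= 1) is described by genotype[i - 1]
# as (function name, indices of the earlier nodes it reads from).
genotype = [
    ("ConvBlock", [0]),     # node 1
    ("ConvBlock", [1]),     # node 2
    ("MaxPool",   [2]),     # node 3 (inactive: never reached from the output)
    ("ResBlock",  [0]),     # node 4 (inactive)
    ("Sum",       [2, 1]),  # node 5: both inputs keep the input size, so summing is valid
    ("MaxPool",   [5]),     # node 6: the output node
]

def active_nodes(genotype, output_index):
    """Trace back from the output node to find the nodes that are actually used."""
    active, stack = set(), [output_index]
    while stack:
        i = stack.pop()
        if i == 0 or i in active:       # node 0 is the raw input image
            continue
        active.add(i)
        stack.extend(genotype[i - 1][1])
    return sorted(active)

# Decode the genotype: only the active nodes become layers of the CNN.
for i in active_nodes(genotype, output_index=len(genotype)):
    name, inputs = genotype[i - 1]
    assert len(inputs) == FUNCTIONS[name]
    print(f"node {i}: {name}(inputs from nodes {inputs})")
```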

17 Automated CNN Architecture Design
Results: ConvSet is much better than VGG, and ResSet is much better than ResNet.

18 Automated CNN Architecture Design
Using ResSet, GP can evolve a much simpler architecture.

19 Why is Deep Learning Hard?
The vanishing gradient problem: gradients shrink as they are backpropagated through many layers, so the layers near the input learn very slowly.
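A toy numpy sketch of the problem (assuming sigmoid activations and small random weights; all sizes are illustrative): the gradient signal shrinks every time it is propagated back through a layer, so the layers near the input barely learn.

```python
import numpy as np

# Toy illustration of the vanishing gradient problem (assumes sigmoid units
# and small random weights; all sizes are illustrative).
rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_layers, width = 15, 50
x = rng.standard_normal(width)

weights, activations = [], [x]
for _ in range(n_layers):                       # forward pass
    W = 0.1 * rng.standard_normal((width, width))
    weights.append(W)
    activations.append(sigmoid(W @ activations[-1]))

grad = np.ones(width)                           # pretend dLoss/d(output) = 1
for W, a in zip(reversed(weights), reversed(activations[1:])):
    grad = W.T @ (grad * a * (1.0 - a))         # chain rule: sigmoid', then weights
    print(f"mean |gradient| = {np.abs(grad).mean():.2e}")   # shrinks layer by layer
```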

20 Recipes of Deep Learning
Mini-batch (online learning)
Proper loss function: cross entropy
New activation function: ReLU
Adaptive learning rate
Regularisation (weight decay)
Dropout
Data augmentation

21 Mini-batch
Offline learning updates the weights once per epoch, after using all the training examples.
Online learning updates the weights after each training example.
Mini-batch learning splits the training examples into a number of batches and updates the weights after each batch, e.g. 100 examples per mini-batch (see the sketch below).
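A minimal numpy sketch of mini-batch gradient descent on a toy linear-regression problem (the data, learning rate, and batch size are illustrative): the weights are updated after every batch, and shuffling gives new batches each epoch.

```python
import numpy as np

# Minimal mini-batch SGD sketch on a toy linear-regression problem
# (data, learning rate and batch size are illustrative).
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 10))
true_w = rng.standard_normal(10)
y = X @ true_w + 0.01 * rng.standard_normal(1000)

w, lr, batch_size = np.zeros(10), 0.1, 100      # e.g. 100 examples per mini-batch
for epoch in range(5):
    order = rng.permutation(len(X))             # reshuffle the examples each epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        err = X[idx] @ w - y[idx]
        grad = X[idx].T @ err / len(idx)        # gradient on this mini-batch only
        w -= lr * grad                          # update after every mini-batch
    # batch_size = 1 gives online learning; batch_size = len(X) gives offline learning
    print(f"epoch {epoch}: mse = {np.mean((X @ w - y) ** 2):.4f}")
```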

22 Mini-batch The objective function (error) changes from one mini-batch to another, so each update does not minimise the true error over the whole training set. In practice this often gives better performance (e.g. as measured by cross-validation).

23 Proper Loss Function For a 2-layer network (1 hidden layer), cross entropy gives a much steeper error surface than squared error when the output is far from the target, so gradient descent makes faster progress (see the sketch below).
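A small numpy illustration of why cross entropy is the better choice for a sigmoid output unit: when the output is confidently wrong, the squared-error gradient with respect to the pre-activation almost vanishes, while the cross-entropy gradient stays large. The formulas in the comments are standard derivations, not taken from the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

t = 1.0                                     # target label for one output unit
for z in [-6.0, -2.0, 0.0, 2.0]:            # pre-activation of the output unit
    y = sigmoid(z)
    # squared error:  L = (y - t)^2  ->  dL/dz = 2 (y - t) y (1 - y)
    g_square = 2 * (y - t) * y * (1 - y)
    # cross entropy:  L = -(t log y + (1 - t) log(1 - y))  ->  dL/dz = y - t
    g_cross = y - t
    print(f"z={z:+.1f}  y={y:.3f}  dL/dz square={g_square:+.4f}  cross-entropy={g_cross:+.4f}")
# When the unit is confidently wrong (z = -6, y close to 0 but t = 1), the squared-error
# gradient is almost zero, while the cross-entropy gradient is still close to -1.
```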

24 Rectified Linear Unit (ReLU)
Fast to compute. The gradient is fixed at 1 when the input is positive (so it does not vanish), and there is no learning (zero gradient) when the input is negative (see the sketch below).
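A minimal sketch of ReLU and its gradient in numpy, matching the bullet points above.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)            # fast: just a threshold at zero

def relu_grad(z):
    return (z > 0).astype(z.dtype)       # gradient is exactly 1 when positive,
                                         # 0 when negative (that unit gets no update)

z = np.array([-2.0, -0.5, 0.5, 3.0])
print(relu(z))        # [0.  0.  0.5 3. ]
print(relu_grad(z))   # [0. 0. 1. 1.]
```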

25 Rectified Linear Unit (ReLU)

26 Rectified Linear Unit (ReLU)

27 Adaptive Learning Rate
NN performance relies heavily on the learning rate (step size). Even with the correct direction (the gradient), it is unknown how far we should move along that direction.

28 Adaptive Learning Rate
Decrease the learning rate over time: at the beginning we are far from the destination, so we use a larger learning rate; after several epochs we are close to the destination, so we reduce the learning rate.
Adagrad: parameters with smaller past derivatives get a larger effective learning rate, and vice versa (see the sketch below).
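A minimal Adagrad sketch in numpy (the learning rate, epsilon, and toy gradients are illustrative): each parameter's step is divided by the root of its accumulated squared gradients, so parameters with a history of small derivatives move with a relatively larger effective learning rate.

```python
import numpy as np

def adagrad_update(w, grad, accum, lr=0.1, eps=1e-8):
    """One Adagrad step: divide the step for each parameter by the root of its
    accumulated squared gradients (small past derivatives -> larger effective step)."""
    accum = accum + grad ** 2
    w = w - lr * grad / (np.sqrt(accum) + eps)
    return w, accum

w, accum = np.zeros(3), np.zeros(3)
grad = np.array([1.0, 0.1, 0.01])            # toy gradients of very different sizes
for _ in range(3):
    w, accum = adagrad_update(w, grad, accum)
print(w)   # all three parameters have moved by comparable amounts
```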

29 Regularisation L2 regularisation (weight decay) prevents the weights from growing too large
The update becomes w ← (1 − ηλ)·w − η·∂L/∂w, where the decay factor (1 − ηλ) is slightly below 1 (e.g. ≈ 0.99), so weights that receive little gradient decay towards zero and can be ignored (see the sketch below)
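A one-step numpy sketch of the decayed update above (η and λ are illustrative values, chosen so that the decay factor is 0.99):

```python
import numpy as np

# One SGD step with L2 regularisation (weight decay), following
#   w <- (1 - eta * lam) * w - eta * dL/dw
# eta and lam are illustrative, chosen so the decay factor is 0.99.
eta, lam = 0.01, 1.0
w = np.array([0.5, -2.0, 0.0])
grad = np.array([0.1, -0.3, 0.2])
w = (1 - eta * lam) * w - eta * grad     # every weight shrinks slightly each step
print(w)                                 # weights with no gradient decay towards zero
```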

30 Dropout Each time before updating the weights, each neuron is dropped with probability p%. The network structure therefore changes. For each mini-batch, we resample which neurons are dropped.

31 Dropout Dropout effectively trains a set (ensemble) of different networks
Ideally, testing would average the outputs y1, y2, y3, y4, ... of all the trained networks, but we cannot compute them all

32 Dropout Approximation: test with the full network
If a neuron is dropped with probability p% during training, then at test time its weights are multiplied by (1 − p%) (see the sketch below)
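A numpy sketch of this train/test asymmetry (the dropout probability is illustrative; scaling the activations is equivalent to scaling the outgoing weights): drop units at random during training, and scale by the keep probability at test time. Note that many libraries instead use "inverted dropout", which rescales at training time and leaves the test pass unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.5                                      # dropout probability (illustrative)

def dropout_train(a, p):
    mask = rng.random(a.shape) >= p          # each unit is dropped with probability p
    return a * mask                          # a different "thinned" network each mini-batch

def dropout_test(a, p):
    return a * (1.0 - p)                     # approximate the ensemble average by
                                             # scaling with the keep probability

a = rng.standard_normal(8)                   # stand-in for a layer's activations
print(dropout_train(a, p))
print(dropout_test(a, p))
```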

33 Data Augmentation Create synthetic training images by transformations
Rotation, scaling, flipping, cropping, noise, …
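A small numpy sketch of such transformations (flip, crop, noise; the image, crop size, and noise level are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img, crop=24, noise_std=0.02):
    """Create one synthetic variant of an (H, W, C) image using a random
    horizontal flip, a random crop and additive Gaussian noise."""
    if rng.random() < 0.5:
        img = img[:, ::-1, :]                             # horizontal flip
    h, w, _ = img.shape
    top = rng.integers(0, h - crop + 1)
    left = rng.integers(0, w - crop + 1)
    img = img[top:top + crop, left:left + crop, :]        # random crop
    img = img + rng.normal(0.0, noise_std, img.shape)     # additive noise
    return np.clip(img, 0.0, 1.0)

image = rng.random((32, 32, 3))              # stand-in for a real training image
augmented = [augment(image) for _ in range(4)]
print(augmented[0].shape)                    # (24, 24, 3)
```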

34 Transfer Learning/Domain Adaptation
Use knowledge learned in the past (the source domain) to help solve the problem at hand (the target domain)

35 Transfer Learning/Domain Adaptation
Standard supervised learning assumes that the training and test examples (x, y) are drawn i.i.d. from a single distribution D.
In domain adaptation, the source and target domains have different but related distributions D_S and D_T.
One can extract/use knowledge from more than one source domain at the same time.
Unsupervised: labelled and unlabelled source examples + unlabelled target examples
Semi-supervised: also consider a small set of labelled target examples
Supervised: all examples are labelled

36 Transfer Learning/Domain Adaptation
DLID (Deep Learning by Interpolating between Domains)
DeCAF
DAN (Deep Adaptation Networks)

37 Dataset Three domains: Amazon, DSLR, and Webcam (the Office benchmark)

38 DLID [Chopra et al. 2013] Discrete interpolation from the source domain to the target domain: each interpolation point uses a different proportion of training examples from the source and target domains. Features are extracted at each point with an unsupervised trainer, and the new feature representation is obtained by combining all the extracted features.

39 DeCAF [Donahue et al. ICML 2014]
1. Take the AlexNet trained on ImageNet.
2. Do a feed-forward pass with AlexNet on the new images.
3. Take the 6th- or 7th-layer activations as "features" (DeCAF6 and DeCAF7).
4. Apply any classifier (e.g. SVM or logistic regression); see the sketch below.
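A hedged PyTorch/scikit-learn sketch of this recipe. The torchvision weights argument and the index of the fully-connected layer used as "DeCAF7" are assumptions about the torchvision AlexNet layout and should be checked against your installed version; the images and labels are placeholders.

```python
import torch
import torchvision
from sklearn.svm import LinearSVC

# DeCAF-style feature extraction (sketch).  The weights argument and the layer
# index for "DeCAF7" are assumptions about the torchvision AlexNet layout;
# check them against your installed torchvision version.
model = torchvision.models.alexnet(weights="IMAGENET1K_V1").eval()   # step 1

features = []
model.classifier[4].register_forward_hook(            # assumed: second FC layer
    lambda module, inputs, output: features.append(output.detach())
)

# Placeholders for preprocessed target-domain images and their labels
# (real images would be resized to 224x224 and normalised with ImageNet statistics).
images = torch.randn(16, 3, 224, 224)
labels = [i % 2 for i in range(16)]

with torch.no_grad():                                  # step 2: feed-forward only
    model(images)
X = torch.cat(features).numpy()                        # step 3: activations as features

clf = LinearSVC().fit(X, labels)                       # step 4: any simple classifier
```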

40 DAN [Long et al. ICML 2015] Fine-tune the AlexNet trained on ImageNet
Freeze layers 1-3, fine-tune layers 4-5, and encourage similar distributions between the source and target domains under the hidden representations in layers 6-8, using a regulariser computed with MK-MMD (multi-kernel maximum mean discrepancy); see the sketch below.
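For intuition, the sketch below computes a simplified single-Gaussian-kernel MMD between a batch of source and a batch of target hidden representations; DAN itself uses a multi-kernel, unbiased MK-MMD estimate, so this is only a stand-in showing what the regulariser measures. All sizes and values are illustrative.

```python
import numpy as np

def gaussian_mmd2(source, target, sigma=2.0):
    """Biased MMD^2 estimate with a single Gaussian kernel: a simplified stand-in
    for the multi-kernel MK-MMD regulariser used by DAN."""
    def kernel(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    return (kernel(source, source).mean()
            + kernel(target, target).mean()
            - 2 * kernel(source, target).mean())

rng = np.random.default_rng(0)
h_src = rng.normal(0.0, 1.0, (64, 8))       # stand-in for a source batch's hidden repr.
h_tgt = rng.normal(0.5, 1.0, (64, 8))       # target batch from a shifted distribution
print(gaussian_mmd2(h_src, h_tgt))          # grows as the two distributions diverge
# During fine-tuning, DAN adds this kind of discrepancy (for layers 6-8) to the loss,
# pushing the hidden representations of the source and target domains together.
```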

41 Summary
Deep learning is hard: standard backpropagation does not work well because of the vanishing gradient problem.
Many good ideas in deep learning (e.g. mini-batch, cross entropy, ReLU, adaptive learning rate, regularisation, dropout, data augmentation).
Transfer learning / domain adaptation is a trending direction (similar to human learning).
Strategies in transfer learning:
  Learn shared hidden representations (e.g. DLID)
  Share features (e.g. DeCAF, DAN)

