Data Mining, Neural Network and Genetic Programming


1 Data Mining, Neural Network and Genetic Programming
Deep Learning and Transfer Learning for Object Recognition Yi Mei

2 Outline
Object Recognition
Deep Learning
  Overview
  Recipes of deep learning
  Automated ANN Architecture Design
Transfer Learning
  DLID
  DeCAF
  DAN

3 Object Recognition Object recognition usually refers to object classification, but sometimes it refers to the whole procedure of finding objects in large pictures. Object detection also has other meanings or interpretations.
Figure: object recognition in relation to object classification, object localisation, and one-class vs. multi-class object detection.

4 Methods for Object Detection
Object Recognition
  Neural methods (classification only; classification and localisation): feed-forward networks, shared-weight networks, SOMs, high-order networks, deep learning, ...
  Genetic methods (classification only; classification and localisation): genetic algorithms, genetic programming

5 Deep Learning Machine learning algorithms based on learning multiple levels of representation / abstraction. Fig: I. Goodfellow

6 Deep Learning Has been successful in many areas: object recognition, object detection, speech recognition, natural language processing, …

7 Deep Learning LeNet: 7 layers [LeCun et al. 1998]
Subsampling (max-pooling)

8 Deep Learning Subsampling does not change the object (its class is preserved), but over-subsampling discards too much detail.

9 Deep Learning AlexNet: 8 layers [Krizhevsky et al. 2012]
Similar to LeNet, but a bigger model (7 hidden layers, 650K units, 60M parameters) Error: 16.4% for the ImageNet 2012 challenge (No. 1) image-net.org

10 Deep Learning VGGNet: 19 layers [Simonyan and Zisserman 2014]
Error: 7.32% for ImageNet 2014 challenge

11 Deep Learning GoogLeNet: 22 layers [Szegedy et al. 2014]
Error: 6.67% for ImageNet 2014 challenge (No. 1)

12 Deep Learning ResNet: 152 layers [He et al. 2015]
Error: 3.57% for ImageNet 2015 challenge (No. 1)

13 Deep Learning

14 Automated CNN Architecture Design
Manually designing a CNN architecture requires a lot of domain knowledge and trial-and-error. Use genetic programming to automatically evolve an architecture instead.

15 Automated CNN Architecture Design
Cartesian GP

16 Automated CNN Architecture Design
Functions (operators):
  ConvBlock (stride 1 with padding; keeps the input size)
  ResBlock (stride 1 with padding; keeps the input size)
  Max (average) pooling (2x2 filter, stride 2)
  Summation (element-wise addition)
Constraint on the architecture: only two feature maps of the same size can be summed (see the sketch below)
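As a rough illustration of the encoding, the sketch below represents a CGP genotype as a small graph over the function set above and decodes only the active nodes into a layer list. The node layout, names, and decoding are illustrative assumptions, not the exact scheme of the cited CGP-CNN work.

```python
# Rough sketch of a Cartesian-GP-style genotype for CNN architecture search.
# The function set mirrors the slide; the encoding and decoding details are
# illustrative, not the exact scheme of the cited CGP-CNN work.

FUNCTIONS = {           # function name -> arity (number of input feature maps)
    "ConvBlock": 1,
    "ResBlock": 1,
    "MaxPool": 1,
    "AvgPool": 1,
    "Sum": 2,           # element-wise addition of two same-sized feature maps
}

# Node 0 is the input image; node i (i >= 1) is described by genotype[i - 1]
# as (function name, indices of the earlier nodes it reads from).
genotype = [
    ("ConvBlock", [0]),     # node 1
    ("ConvBlock", [1]),     # node 2
    ("MaxPool",   [2]),     # node 3 (inactive: never reached from the output)
    ("ResBlock",  [0]),     # node 4 (inactive)
    ("Sum",       [2, 1]),  # node 5: both inputs keep the input size, so summing is valid
    ("MaxPool",   [5]),     # node 6: the output node
]

def active_nodes(genotype, output_index):
    """Trace back from the output node to find the nodes that are actually used."""
    active, stack = set(), [output_index]
    while stack:
        i = stack.pop()
        if i == 0 or i in active:       # node 0 is the raw input image
            continue
        active.add(i)
        stack.extend(genotype[i - 1][1])
    return sorted(active)

# Decode the genotype: only the active nodes become layers of the CNN.
for i in active_nodes(genotype, output_index=len(genotype)):
    name, inputs = genotype[i - 1]
    assert len(inputs) == FUNCTIONS[name]
    print(f"node {i}: {name}(inputs from nodes {inputs})")
```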

17 Automated CNN Architecture Design
Results: ConvSet is much better than VGG, and ResSet is much better than ResNet.

18 Automated CNN Architecture Design
Using ResSet, GP can evolve a much simpler architecture.

19 Why is Deep Learning Hard?
The vanishing gradient problem: gradients shrink as they are backpropagated through many layers, so the layers near the input learn very slowly.
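A toy numpy sketch of the problem (assuming sigmoid activations and small random weights; all sizes are illustrative): the gradient signal shrinks every time it is propagated back through a layer, so the layers near the input barely learn.

```python
import numpy as np

# Toy illustration of the vanishing gradient problem (assumes sigmoid units
# and small random weights; all sizes are illustrative).
rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_layers, width = 15, 50
x = rng.standard_normal(width)

weights, activations = [], [x]
for _ in range(n_layers):                       # forward pass
    W = 0.1 * rng.standard_normal((width, width))
    weights.append(W)
    activations.append(sigmoid(W @ activations[-1]))

grad = np.ones(width)                           # pretend dLoss/d(output) = 1
for W, a in zip(reversed(weights), reversed(activations[1:])):
    grad = W.T @ (grad * a * (1.0 - a))         # chain rule: sigmoid', then weights
    print(f"mean |gradient| = {np.abs(grad).mean():.2e}")   # shrinks layer by layer
```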

20 Recipes of Deep Learning
Mini-batch (online learning)
Proper loss function: cross entropy
New activation function: ReLU
Adaptive learning rate
Regularisation (weight decay)
Dropout
Data augmentation

21 Mini-batch
Offline learning updates the weights once per epoch, after using all the training examples.
Online learning updates the weights after each training example.
Mini-batch learning splits the training examples into a number of batches and updates the weights after each batch, e.g. 100 examples per mini-batch (see the sketch below).
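A minimal numpy sketch of mini-batch gradient descent on a toy linear-regression problem (the data, learning rate, and batch size are illustrative): the weights are updated after every batch, and shuffling gives new batches each epoch.

```python
import numpy as np

# Minimal mini-batch SGD sketch on a toy linear-regression problem
# (data, learning rate and batch size are illustrative).
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 10))
true_w = rng.standard_normal(10)
y = X @ true_w + 0.01 * rng.standard_normal(1000)

w, lr, batch_size = np.zeros(10), 0.1, 100      # e.g. 100 examples per mini-batch
for epoch in range(5):
    order = rng.permutation(len(X))             # reshuffle the examples each epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        err = X[idx] @ w - y[idx]
        grad = X[idx].T @ err / len(idx)        # gradient on this mini-batch only
        w -= lr * grad                          # update after every mini-batch
    # batch_size = 1 gives online learning; batch_size = len(X) gives offline learning
    print(f"epoch {epoch}: mse = {np.mean((X @ w - y) ** 2):.4f}")
```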

22 Mini-batch The objective function (error) changes from one mini-batch to another, so each update does not minimise the true error over the whole training set. In practice this often gives better performance (e.g. as measured by cross-validation).

23 Proper Loss Function For a 2-layer network (1 hidden layer), cross entropy gives a much steeper error surface than squared error when the output is far from the target, so gradient descent makes faster progress (see the sketch below).
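A small numpy illustration of why cross entropy is the better choice for a sigmoid output unit: when the output is confidently wrong, the squared-error gradient with respect to the pre-activation almost vanishes, while the cross-entropy gradient stays large. The formulas in the comments are standard derivations, not taken from the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

t = 1.0                                     # target label for one output unit
for z in [-6.0, -2.0, 0.0, 2.0]:            # pre-activation of the output unit
    y = sigmoid(z)
    # squared error:  L = (y - t)^2  ->  dL/dz = 2 (y - t) y (1 - y)
    g_square = 2 * (y - t) * y * (1 - y)
    # cross entropy:  L = -(t log y + (1 - t) log(1 - y))  ->  dL/dz = y - t
    g_cross = y - t
    print(f"z={z:+.1f}  y={y:.3f}  dL/dz square={g_square:+.4f}  cross-entropy={g_cross:+.4f}")
# When the unit is confidently wrong (z = -6, y close to 0 but t = 1), the squared-error
# gradient is almost zero, while the cross-entropy gradient is still close to -1.
```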

24 Rectified Linear Unit (ReLU)
Fast to compute. The gradient is fixed at 1 when the input is positive (so it does not vanish), and there is no learning (zero gradient) when the input is negative (see the sketch below).
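A minimal sketch of ReLU and its gradient in numpy, matching the bullet points above.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)            # fast: just a threshold at zero

def relu_grad(z):
    return (z > 0).astype(z.dtype)       # gradient is exactly 1 when positive,
                                         # 0 when negative (that unit gets no update)

z = np.array([-2.0, -0.5, 0.5, 3.0])
print(relu(z))        # [0.  0.  0.5 3. ]
print(relu_grad(z))   # [0. 0. 1. 1.]
```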

25 Rectified Linear Unit (ReLU)

26 Rectified Linear Unit (ReLU)

27 Adaptive Learning Rate
NN performance relies heavily on the learning rate (step size). Even with the correct direction (the gradient), it is unknown how far we should move along that direction.

28 Adaptive Learning Rate
Decrease the learning rate over time: at the beginning we are far from the destination, so we use a larger learning rate; after several epochs we are close to the destination, so we reduce the learning rate.
Adagrad: parameters with smaller past derivatives get a larger effective learning rate, and vice versa (see the sketch below).
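A minimal Adagrad sketch in numpy (the learning rate, epsilon, and toy gradients are illustrative): each parameter's step is divided by the root of its accumulated squared gradients, so parameters with a history of small derivatives move with a relatively larger effective learning rate.

```python
import numpy as np

def adagrad_update(w, grad, accum, lr=0.1, eps=1e-8):
    """One Adagrad step: divide the step for each parameter by the root of its
    accumulated squared gradients (small past derivatives -> larger effective step)."""
    accum = accum + grad ** 2
    w = w - lr * grad / (np.sqrt(accum) + eps)
    return w, accum

w, accum = np.zeros(3), np.zeros(3)
grad = np.array([1.0, 0.1, 0.01])            # toy gradients of very different sizes
for _ in range(3):
    w, accum = adagrad_update(w, grad, accum)
print(w)   # all three parameters have moved by comparable amounts
```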

29 Regularisation L2 regularisation (weight decay) prevents the weights from growing too large
The update becomes w ← (1 − ηλ)·w − η·∂L/∂w, where the decay factor (1 − ηλ) is slightly below 1 (e.g. ≈ 0.99), so weights that receive little gradient decay towards zero and can be ignored (see the sketch below)
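A one-step numpy sketch of the decayed update above (η and λ are illustrative values, chosen so that the decay factor is 0.99):

```python
import numpy as np

# One SGD step with L2 regularisation (weight decay), following
#   w <- (1 - eta * lam) * w - eta * dL/dw
# eta and lam are illustrative, chosen so the decay factor is 0.99.
eta, lam = 0.01, 1.0
w = np.array([0.5, -2.0, 0.0])
grad = np.array([0.1, -0.3, 0.2])
w = (1 - eta * lam) * w - eta * grad     # every weight shrinks slightly each step
print(w)                                 # weights with no gradient decay towards zero
```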

30 Dropout Each time before updating the weights, each neuron is dropped with probability p%. The network structure therefore changes. For each mini-batch, we resample which neurons are dropped.

31 Dropout Dropout effectively trains a set (ensemble) of different networks
Ideally, testing would average the outputs y1, y2, y3, y4, ... of all the trained networks, but we cannot compute them all

32 Dropout Approximation: test with the full network
If a neuron is dropped with probability p% during training, then at test time its weights are multiplied by (1 − p%) (see the sketch below)
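A numpy sketch of this train/test asymmetry (the dropout probability is illustrative; scaling the activations is equivalent to scaling the outgoing weights): drop units at random during training, and scale by the keep probability at test time. Note that many libraries instead use "inverted dropout", which rescales at training time and leaves the test pass unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.5                                      # dropout probability (illustrative)

def dropout_train(a, p):
    mask = rng.random(a.shape) >= p          # each unit is dropped with probability p
    return a * mask                          # a different "thinned" network each mini-batch

def dropout_test(a, p):
    return a * (1.0 - p)                     # approximate the ensemble average by
                                             # scaling with the keep probability

a = rng.standard_normal(8)                   # stand-in for a layer's activations
print(dropout_train(a, p))
print(dropout_test(a, p))
```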

33 Data Augmentation Create synthetic training images by transformations
Rotation, scaling, flipping, cropping, noise, …
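A small numpy sketch of such transformations (flip, crop, noise; the image, crop size, and noise level are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img, crop=24, noise_std=0.02):
    """Create one synthetic variant of an (H, W, C) image using a random
    horizontal flip, a random crop and additive Gaussian noise."""
    if rng.random() < 0.5:
        img = img[:, ::-1, :]                             # horizontal flip
    h, w, _ = img.shape
    top = rng.integers(0, h - crop + 1)
    left = rng.integers(0, w - crop + 1)
    img = img[top:top + crop, left:left + crop, :]        # random crop
    img = img + rng.normal(0.0, noise_std, img.shape)     # additive noise
    return np.clip(img, 0.0, 1.0)

image = rng.random((32, 32, 3))              # stand-in for a real training image
augmented = [augment(image) for _ in range(4)]
print(augmented[0].shape)                    # (24, 24, 3)
```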

34 Transfer Learning/Domain Adaptation
Use knowledge learned in the past (the source domain) to help solve the problem at hand (the target domain)

35 Transfer Learning/Domain Adaptation
Standard supervised learning assumes that the training and test examples (x, y) are drawn i.i.d. from a single distribution D.
In domain adaptation, the source and target domains have different but related distributions D_S and D_T.
One can extract/use knowledge from more than one source domain at the same time.
Unsupervised: labelled and unlabelled source examples + unlabelled target examples
Semi-supervised: also consider a small set of labelled target examples
Supervised: all examples are labelled

36 Transfer Learning/Domain Adaptation
DLID (Deep Learning by Interpolating between Domains)
DeCAF
DAN (Deep Adaptation Networks)

37 Dataset Three domains: Amazon, DSLR, and Webcam (the Office benchmark)

38 DLID [Chopra et al. 2013] Discrete interpolation from the source domain to the target domain: each interpolation point uses a different proportion of training examples from the source and target domains. Features are extracted at each point with an unsupervised trainer, and the new feature representation is obtained by combining all the extracted features.

39 DeCAF [Donahue et al. ICML 2014]
1. Take the AlexNet trained on ImageNet.
2. Do a feed-forward pass with AlexNet on the new images.
3. Take the 6th- or 7th-layer activations as "features" (DeCAF6 and DeCAF7).
4. Apply any classifier (e.g. SVM or logistic regression); see the sketch below.
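A hedged PyTorch/scikit-learn sketch of this recipe. The torchvision weights argument and the index of the fully-connected layer used as "DeCAF7" are assumptions about the torchvision AlexNet layout and should be checked against your installed version; the images and labels are placeholders.

```python
import torch
import torchvision
from sklearn.svm import LinearSVC

# DeCAF-style feature extraction (sketch).  The weights argument and the layer
# index for "DeCAF7" are assumptions about the torchvision AlexNet layout;
# check them against your installed torchvision version.
model = torchvision.models.alexnet(weights="IMAGENET1K_V1").eval()   # step 1

features = []
model.classifier[4].register_forward_hook(            # assumed: second FC layer
    lambda module, inputs, output: features.append(output.detach())
)

# Placeholders for preprocessed target-domain images and their labels
# (real images would be resized to 224x224 and normalised with ImageNet statistics).
images = torch.randn(16, 3, 224, 224)
labels = [i % 2 for i in range(16)]

with torch.no_grad():                                  # step 2: feed-forward only
    model(images)
X = torch.cat(features).numpy()                        # step 3: activations as features

clf = LinearSVC().fit(X, labels)                       # step 4: any simple classifier
```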

40 DAN [Long et al. ICML 2015] Fine-tune the AlexNet trained on ImageNet
Freeze layers 1-3, fine-tune layers 4-5, and encourage similar distributions between the source and target domains under the hidden representations in layers 6-8, using a regulariser computed with MK-MMD (multi-kernel maximum mean discrepancy); see the sketch below.
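For intuition, the sketch below computes a simplified single-Gaussian-kernel MMD between a batch of source and a batch of target hidden representations; DAN itself uses a multi-kernel, unbiased MK-MMD estimate, so this is only a stand-in showing what the regulariser measures. All sizes and values are illustrative.

```python
import numpy as np

def gaussian_mmd2(source, target, sigma=2.0):
    """Biased MMD^2 estimate with a single Gaussian kernel: a simplified stand-in
    for the multi-kernel MK-MMD regulariser used by DAN."""
    def kernel(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    return (kernel(source, source).mean()
            + kernel(target, target).mean()
            - 2 * kernel(source, target).mean())

rng = np.random.default_rng(0)
h_src = rng.normal(0.0, 1.0, (64, 8))       # stand-in for a source batch's hidden repr.
h_tgt = rng.normal(0.5, 1.0, (64, 8))       # target batch from a shifted distribution
print(gaussian_mmd2(h_src, h_tgt))          # grows as the two distributions diverge
# During fine-tuning, DAN adds this kind of discrepancy (for layers 6-8) to the loss,
# pushing the hidden representations of the source and target domains together.
```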

41 Summary
Deep learning is hard: standard backpropagation does not work well because of the vanishing gradient problem.
Many good ideas in deep learning (e.g. mini-batch, cross entropy, ReLU, adaptive learning rate, regularisation, dropout, data augmentation).
Transfer learning / domain adaptation is a trending direction (similar to human learning).
Strategies in transfer learning:
  Learn shared hidden representations (e.g. DLID)
  Share features (e.g. DeCAF, DAN)

