Slide 1
ImageNet Classification with Deep Convolutional Neural Networks
Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, NIPS 2012
Presented by Eunsoo Oh (오은수)
Slide 2
ILSVRC: ImageNet Large Scale Visual Recognition Challenge
An image classification challenge with 1,000 categories and 1.2 million training images. The winner of ILSVRC-2012 was a deep convolutional neural network.
Slide 3
Why Deep Learning? “Shallow” vs. “deep” architectures
Deep architectures learn a feature hierarchy all the way from raw pixels to the classifier.
Slide 4
Background: a neuron
A single neuron takes inputs x1, …, xd (e.g. raw pixel values), weights them by w1, …, wd, and produces the output f(w·x + b), where f is a nonlinearity and b is a bias term (figure: neuron diagram).
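As a concrete illustration (not from the slides), here is a minimal NumPy sketch of this computation, assuming a sigmoid nonlinearity for f:

```python
import numpy as np

def neuron(x, w, b, f=lambda z: 1.0 / (1.0 + np.exp(-z))):
    """Single artificial neuron: weighted sum of inputs plus bias, passed through f."""
    return f(np.dot(w, x) + b)

# Toy example: 3 inputs (e.g. raw pixel intensities), 3 weights, one bias.
x = np.array([0.2, 0.7, 0.1])
w = np.array([0.5, -0.3, 0.8])
b = 0.1
print(neuron(x, w, b))   # output f(w·x + b)
```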
Slide 5
Background: multi-layer neural networks
A multi-layer network is a nonlinear classifier. Learning can be done by gradient descent using the back-propagation algorithm, as sketched below (figure: input layer, hidden layer, output layer).
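To make the back-propagation idea concrete, here is a minimal sketch (not from the slides) of one gradient-descent step for a tiny one-hidden-layer network with a squared-error loss; the sizes and values are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny network: 3 inputs -> 4 hidden units (tanh) -> 1 output (linear).
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)

x, y = np.array([0.2, 0.7, 0.1]), np.array([1.0])  # one training example
lr = 0.1                                            # learning rate

# Forward pass.
h = np.tanh(W1 @ x + b1)
y_hat = W2 @ h + b2
loss = 0.5 * np.sum((y_hat - y) ** 2)

# Backward pass (chain rule), then a gradient-descent update.
d_out = y_hat - y                     # dLoss/dy_hat
dW2, db2 = np.outer(d_out, h), d_out
d_h = (W2.T @ d_out) * (1 - h ** 2)   # back-propagate through tanh
dW1, db1 = np.outer(d_h, x), d_h

W2 -= lr * dW2; b2 -= lr * db2
W1 -= lr * dW1; b1 -= lr * db1
print(loss)
```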
Slide 6
Background: convolutional neural networks
A convolutional neural network is a variation of the multi-layer neural network in which weights are shared across spatial locations in the form of a kernel (convolution matrix).
Slide 7
Background: a convolutional filter is slid over the input feature map, producing an output feature map (figure).
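A minimal NumPy sketch (not from the slides) of sliding a small filter over a single-channel input feature map; a real convolutional layer would also sum over input channels and add a bias:

```python
import numpy as np

def conv2d_valid(feature_map, kernel):
    """Slide `kernel` over `feature_map` (no padding, stride 1)."""
    H, W = feature_map.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(feature_map[i:i + kH, j:j + kW] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 input feature map
edge_kernel = np.array([[1.0, 0.0, -1.0],
                        [1.0, 0.0, -1.0],
                        [1.0, 0.0, -1.0]])         # simple vertical-edge filter
print(conv2d_valid(image, edge_kernel).shape)      # (3, 3)
```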
Slide 8
Proposed Method: a deep convolutional neural network
5 convolutional and 3 fully-connected layers; 650,000 neurons and 60 million parameters.
Techniques used to boost performance:
ReLU nonlinearity
Training on multiple GPUs
Overlapping max pooling
Data augmentation
Dropout
Slide 9
Rectified Linear Units (ReLU)
The ReLU nonlinearity f(x) = max(0, x) does not saturate, and deep networks using it train several times faster than equivalents with saturating units such as tanh (figure).
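A short NumPy sketch of the ReLU nonlinearity and its gradient (illustrative, not from the slides):

```python
import numpy as np

def relu(x):
    """Rectified linear unit: max(0, x), applied elementwise."""
    return np.maximum(0.0, x)

def relu_grad(x):
    """Derivative of ReLU: 1 where x > 0, else 0 (no saturation for positive inputs)."""
    return (x > 0).astype(float)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z), relu_grad(z))
```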
Slide 10
Training on Multiple GPUs
The network is spread across two GTX 580 GPUs, each with 3 GB of memory. Current GPUs are particularly well-suited to cross-GPU parallelization, and convolutional networks can be implemented very efficiently on them.
Slide 11
Pooling
Spatial pooling summarizes the responses in a local region, using either non-overlapping or overlapping regions and taking the sum or the max of each region (figure: max pooling vs. sum pooling). The proposed network uses overlapping max pooling; see the sketch below.
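A small NumPy sketch (not from the slides) of overlapping max pooling with a 3×3 window and stride 2, the configuration used in the paper:

```python
import numpy as np

def max_pool2d(x, size=3, stride=2):
    """Overlapping max pooling over a 2-D feature map (size > stride => regions overlap)."""
    H, W = x.shape
    out_h = (H - size) // stride + 1
    out_w = (W - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = x[i * stride:i * stride + size,
                          j * stride:j * stride + size].max()
    return out

fmap = np.random.default_rng(0).random((13, 13))  # toy feature map
print(max_pool2d(fmap).shape)                     # (6, 6)
```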
Slide 12
Data Augmentation
Enlarge the dataset: from each 256×256 training image, extract random 224×224 crops together with their horizontal flips (figure).
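A minimal NumPy sketch (not from the slides) of this crop-and-flip augmentation for one image:

```python
import numpy as np

def augment(image, crop=224, rng=np.random.default_rng()):
    """Random 224x224 crop from a 256x256 image, plus a random horizontal flip."""
    H, W, _ = image.shape
    top = rng.integers(0, H - crop + 1)
    left = rng.integers(0, W - crop + 1)
    patch = image[top:top + crop, left:left + crop]
    if rng.random() < 0.5:
        patch = patch[:, ::-1]        # horizontal flip
    return patch

img = np.zeros((256, 256, 3), dtype=np.uint8)  # placeholder training image
print(augment(img).shape)                      # (224, 224, 3)
```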
Slide 13
Dropout
Independently set each hidden unit's activity to zero with probability 0.5. Dropout is used in the two fully-connected hidden layers at the net's output.
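A small NumPy sketch (not from the slides) of dropout as described here: units are zeroed with probability 0.5 during training, and outputs are halved at test time to compensate:

```python
import numpy as np

def dropout(activations, p_drop=0.5, train=True, rng=np.random.default_rng()):
    """Zero units with probability p_drop when training; scale by (1 - p_drop) at test time."""
    if train:
        mask = rng.random(activations.shape) >= p_drop   # keep each unit with prob 0.5
        return activations * mask
    return activations * (1.0 - p_drop)                   # test-time scaling

h = np.ones(8)
print(dropout(h, train=True))    # roughly half of the units zeroed
print(dropout(h, train=False))   # all units scaled by 0.5
```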
Slide 14
Overall Architecture
Trained with stochastic gradient descent on two NVIDIA GPUs for about a week (5-6 days). 650,000 neurons, 60 million parameters, 630 million connections. The last layer contains 1,000 neurons, which produce a distribution over the 1,000 class labels.
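As an illustration of the layer layout, here is a minimal PyTorch sketch using the paper's channel counts and kernel sizes; the padding values, the handling of 224×224 inputs, and the single-GPU arrangement are assumptions (the original network is split across two GPUs):

```python
import torch
import torch.nn as nn

class AlexNetSketch(nn.Module):
    """Single-GPU sketch of the 5-conv + 3-FC architecture described on this slide."""
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2), nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),                 # overlapping max pooling
            nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(0.5), nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
            nn.Dropout(0.5), nn.Linear(4096, 4096), nn.ReLU(),
            nn.Linear(4096, num_classes),                          # 1,000-way output
        )

    def forward(self, x):
        x = self.features(x)
        return self.classifier(torch.flatten(x, 1))

print(AlexNetSketch()(torch.zeros(1, 3, 224, 224)).shape)  # torch.Size([1, 1000])
```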
Slide 15
Results on the ILSVRC-2010 test set, comparing the proposed method against the ILSVRC-2010 winner and the previous best published result (figure: error rates).
Slide 16
Results on ILSVRC-2012: the runner-up achieved a top-5 error rate of 26.172%, while the proposed method achieved a top-5 error rate of 15.3%.
Slide 17
Qualitative Evaluations
Slide 18
Qualitative Evaluations
Slide 19
ILSVRC-2013 Classification (figure)
Slide 20
ILSVRC-2014 Classification
The leading entries used much deeper networks: 22 layers (GoogLeNet) and 19 layers (VGGNet).
Slide 21
Conclusion
A large, deep convolutional neural network for large-scale image classification was proposed: 5 convolutional layers and 3 fully-connected layers, with 650,000 neurons and 60 million parameters. Several techniques were used to boost performance and several to reduce overfitting. The proposed method won ILSVRC-2012, achieving a winning top-5 error rate of 15.3%, compared to 26.2% for the second-best entry.
Slide 22
Q & A
Slide 23
Quiz
1. The proposed method used hand-designed features, so there was no need to learn features and feature hierarchies. (True / False)
2. Which technique was not used in this paper? ① Dropout ② Rectified Linear Units nonlinearity ③ Training on multiple GPUs ④ Local contrast normalization
Slide 24
Appendix: Feature Visualization
96 learned low-level (1st-layer) filters (figure).
Slide 25
Appendix: Visualizing CNNs
Reference: M. D. Zeiler and R. Fergus. Visualizing and Understanding Convolutional Networks. arXiv preprint arXiv:1311.2901, 2013.
Slide 26
Appendix: Local Response Normalization
Let a^i_{x,y} be the activity of a neuron computed by applying kernel i at position (x, y). The response-normalized activity is

b^i_{x,y} = a^i_{x,y} / ( k + α · Σ_{j = max(0, i−n/2)}^{min(N−1, i+n/2)} (a^j_{x,y})² )^β

where N is the total number of kernels in the layer and n = 5, k = 2, α = 10^(-4), β = 0.75 are hyper-parameters. This normalization aids generalization even though ReLUs don't require it, and it reduces the top-5 error rate by 1.2%.
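A NumPy sketch (not from the slides) of this normalization applied to a stack of feature maps indexed by kernel, using the hyper-parameters above:

```python
import numpy as np

def local_response_norm(a, n=5, k=2.0, alpha=1e-4, beta=0.75):
    """Normalize activity a[i, x, y] across nearby kernels i, as in the formula above."""
    N = a.shape[0]
    b = np.empty_like(a)
    for i in range(N):
        lo, hi = max(0, i - n // 2), min(N - 1, i + n // 2)
        denom = (k + alpha * np.sum(a[lo:hi + 1] ** 2, axis=0)) ** beta
        b[i] = a[i] / denom
    return b

acts = np.random.default_rng(0).random((96, 55, 55))  # e.g. 96 first-layer feature maps
print(local_response_norm(acts).shape)                # (96, 55, 55)
```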
Slide 27
Appendix: Another Form of Data Augmentation
Alter the intensities of the RGB channels in training images. Perform PCA on the set of RGB pixel values, then add multiples of the found principal components to each training image. Specifically, to each RGB image pixel add the quantity

[p₁, p₂, p₃] [α₁λ₁, α₂λ₂, α₃λ₃]ᵀ

where pᵢ and λᵢ are the i-th eigenvector and eigenvalue of the 3×3 covariance matrix of RGB pixel values, and each αᵢ is a random variable drawn from a Gaussian with mean 0 and standard deviation 0.1. This reduces the top-1 error rate by over 1%.
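A NumPy sketch of this PCA color augmentation for a single image (illustrative; the paper computes the eigen-decomposition once over the whole training set, whereas here it is computed from the one image for brevity):

```python
import numpy as np

def pca_color_augment(image, sigma=0.1, rng=np.random.default_rng()):
    """Add multiples of the principal components of RGB pixel values to every pixel."""
    pixels = image.reshape(-1, 3).astype(float)
    cov = np.cov(pixels, rowvar=False)              # 3x3 covariance of RGB values
    eigvals, eigvecs = np.linalg.eigh(cov)          # columns of eigvecs are the p_i
    alphas = rng.normal(0.0, sigma, size=3)         # alpha_i ~ N(0, 0.1^2)
    shift = eigvecs @ (alphas * eigvals)            # [p1 p2 p3][a1*l1, a2*l2, a3*l3]^T
    return image + shift                            # broadcast over all pixels

img = np.random.default_rng(0).random((256, 256, 3))
print(pca_color_augment(img).shape)                 # (256, 256, 3)
```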
Slide 28
Appendix: Details of Learning
Use stochastic gradient descent with a batch size of 128 examples, momentum of 0.9, and weight decay of 0.0005. The update rule for weight w was

v_{i+1} = 0.9 · v_i − 0.0005 · ε · w_i − ε · ⟨∂L/∂w⟩_{D_i}
w_{i+1} = w_i + v_{i+1}

where i is the iteration index, v is the momentum variable, ε is the learning rate (initialized at 0.01 and reduced three times prior to termination), and ⟨∂L/∂w⟩_{D_i} is the average over the i-th batch D_i of the derivative of the objective with respect to w, evaluated at w_i. Train for roughly 90 cycles through the training set of 1.2 million images.
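A NumPy sketch of this update rule for a single weight tensor (the gradient `grad` would come from back-propagation; the names and shapes here are illustrative):

```python
import numpy as np

def sgd_momentum_step(w, v, grad, lr=0.01, momentum=0.9, weight_decay=0.0005):
    """One update of the rule above: v <- 0.9*v - wd*lr*w - lr*grad; w <- w + v."""
    v = momentum * v - weight_decay * lr * w - lr * grad
    return w + v, v

w = np.zeros((4096, 1000))                             # e.g. last fully-connected layer
v = np.zeros_like(w)                                   # momentum buffer
grad = np.random.default_rng(0).normal(size=w.shape)   # stand-in for a batch gradient
w, v = sgd_momentum_step(w, v, grad)
```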