ImageNet Classification with Deep Convolutional Neural Networks
Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton (University of Toronto)
Presenter: Aydin Ayanzadeh
Computer Vision, Dr.-Ing. Hazım Kemal EKENEL, Spring 2018
Outline
● Introduction
● Dataset
● Architecture of the Network
● Reducing Over-fitting
● Results
ImageNet
● About 15M labeled high-resolution images
● Roughly 22K categories
● Collected from the web and labeled via Amazon Mechanical Turk
ILSVRC: ImageNet Large Scale Visual Recognition Challenge
● Task: 1.2M training images, 50K validation images, 150K test images, 1K categories
● Goal: low top-5 error
Top-5 error by year:
● 2010: NEC-UIUC (Lin), 28%
● 2011: XRCE (Perronnin), 28%
● 2012: SuperVision (Krizhevsky), 16%
● 2013: ZF-Net, 12%
● 2014: GoogLeNet (Szegedy), 7%
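The top-5 error used as the challenge metric can be sketched in a few lines of NumPy. This is a minimal illustration, not the official evaluation code; the function name `top_k_error` and the toy scores are assumptions made for the example.

```python
import numpy as np

def top_k_error(scores: np.ndarray, labels: np.ndarray, k: int = 5) -> float:
    """Fraction of examples whose true label is not among the k highest scores.

    scores: (num_examples, num_classes) class scores
    labels: (num_examples,) integer class labels
    """
    # Indices of the k largest scores per row (their order does not matter).
    top_k = np.argpartition(scores, -k, axis=1)[:, -k:]
    hit = (top_k == labels[:, None]).any(axis=1)
    return float(1.0 - hit.mean())

# Toy example: 3 images, 6 classes.
scores = np.array([
    [0.1, 0.5, 0.2, 0.9, 0.3, 0.0],  # true class 3 has the highest score
    [0.6, 0.1, 0.2, 0.3, 0.4, 0.5],  # true class 1 has the lowest score
    [0.2, 0.3, 0.1, 0.0, 0.5, 0.4],  # true class 5 has the second-highest score
])
labels = np.array([3, 1, 5])

top1 = top_k_error(scores, labels, k=1)  # 2 of 3 images are wrong
top5 = top_k_error(scores, labels, k=5)  # 1 of 3 images is wrong
print(f"top-1 error: {top1:.3f}, top-5 error: {top5:.3f}")
```

A prediction counts as correct under top-5 as long as the true label appears anywhere among the five highest-scoring classes, which is why the top-5 error is never larger than the top-1 error.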
Task in ImageNet
Rectified Linear Units (ReLUs)
● Much faster to train than classical saturating activation functions such as tanh
● Computationally cheap
● Converges quickly (about six times faster than tanh in the Fig. 2 experiment)
Fig. 2. A four-layer convolutional neural network with ReLUs (solid line) reaches a 25% training error rate on CIFAR-10 six times faster than an equivalent network with tanh neurons (dashed line). The learning rates for each network were chosen independently to make training as fast as possible. No regularization of any kind was employed. The magnitude of the effect demonstrated here varies with network architecture, but networks with ReLUs consistently learn several times faster than equivalents with saturating neurons.
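One way to see why a saturating unit such as tanh can slow learning is to compare gradients. The sketch below is only an illustration of the activation functions themselves, not a reproduction of the CIFAR-10 experiment; the helper names are chosen for this example.

```python
import numpy as np

def relu(x):
    """ReLU activation: max(0, x), applied element-wise."""
    return np.maximum(0.0, x)

def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return (x > 0).astype(float)

def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2, which shrinks toward 0 as |x| grows."""
    return 1.0 - np.tanh(x) ** 2

x = np.array([-4.0, -1.0, 0.5, 4.0])
print("relu      :", relu(x))
print("relu grad :", relu_grad(x))
print("tanh grad :", tanh_grad(x))
```

For large positive inputs the tanh gradient is close to zero (the unit saturates), while the ReLU gradient stays at 1, so the error signal passes through unattenuated.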
AlexNet: General Features
● 650K neurons
● 60M parameters
● 630M connections
● 7 hidden weight layers
● Rectified Linear Units (ReLUs)
● Dropout
● Randomly extracted patches of size 224×224
Architecture
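Architecture
The output size of a convolution is (W − F + 2P)/S + 1 for input width W, filter size F, padding P, and stride S.
The input image size cannot be 224×224, because the CONV1 output size (F = 11, S = 4, P = 0) would not be an integer:
(224 − 11 + 2·0)/4 + 1 = 54.25
With a 227×227 input it is:
(227 − 11 + 2·0)/4 + 1 = 55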
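The arithmetic on this slide can be checked with a short helper. This is a minimal sketch; the function name `conv_output_size` is an assumption made for the example.

```python
def conv_output_size(input_size: int, kernel: int, stride: int, pad: int = 0) -> float:
    """Spatial output size of a convolution: (W - F + 2P) / S + 1."""
    return (input_size - kernel + 2 * pad) / stride + 1

# CONV1 of AlexNet: 11x11 filters, stride 4, no padding.
size_224 = conv_output_size(224, kernel=11, stride=4, pad=0)
size_227 = conv_output_size(227, kernel=11, stride=4, pad=0)

print(size_224)  # 54.25, not an integer, so the filters do not tile the input
print(size_227)  # 55.0
```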
Architecture
Full (simplified) AlexNet architecture:
[227x227x3] INPUT
[55x55x96] CONV1: 96 11x11 filters at stride 4, pad 0
[27x27x96] MAX POOL1: 3x3 filters at stride 2
[27x27x96] NORM1: Normalization layer
[27x27x256] CONV2: 256 5x5 filters at stride 1, pad 2
[13x13x256] MAX POOL2: 3x3 filters at stride 2
[13x13x256] NORM2: Normalization layer
[13x13x384] CONV3: 384 3x3 filters at stride 1, pad 1
[13x13x384] CONV4: 384 3x3 filters at stride 1, pad 1
[13x13x256] CONV5: 256 3x3 filters at stride 1, pad 1
[6x6x256] MAX POOL3: 3x3 filters at stride 2
[4096] FC6: 4096 neurons
[4096] FC7: 4096 neurons with F=1
[1000] FC8: 1000 neurons (class scores)
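The shapes in this listing can be traced programmatically. The sketch below walks the simplified layer list, checks each spatial size, and counts neurons and parameters. It assumes dense (single-GPU) connectivity in every convolution and omits the normalization layers because they do not change shapes; the names `LAYERS`, `out_size`, and `trace` are chosen for this example.

```python
# (name, kind, filters, kernel, stride, pad), taken from the simplified listing.
LAYERS = [
    ("CONV1", "conv", 96, 11, 4, 0),
    ("POOL1", "pool", None, 3, 2, 0),
    ("CONV2", "conv", 256, 5, 1, 2),
    ("POOL2", "pool", None, 3, 2, 0),
    ("CONV3", "conv", 384, 3, 1, 1),
    ("CONV4", "conv", 384, 3, 1, 1),
    ("CONV5", "conv", 256, 3, 1, 1),
    ("POOL3", "pool", None, 3, 2, 0),
]
FC_SIZES = [4096, 4096, 1000]

def out_size(size, kernel, stride, pad):
    """(W - F + 2P) / S + 1, rejecting sizes that do not divide evenly."""
    span = size - kernel + 2 * pad
    if span % stride != 0:
        raise ValueError(f"size {size} does not fit kernel {kernel}, stride {stride}")
    return span // stride + 1

def trace(input_size=227, input_depth=3):
    """Return per-layer output shapes, total neuron count, and parameter count."""
    size, depth = input_size, input_depth
    shapes, neurons, params = [], 0, 0
    for name, kind, filters, kernel, stride, pad in LAYERS:
        size = out_size(size, kernel, stride, pad)
        if kind == "conv":
            params += filters * (kernel * kernel * depth + 1)  # weights + biases
            depth = filters
            neurons += size * size * depth  # only conv outputs count as neurons
        shapes.append((name, size, size, depth))
    fan_in = size * size * depth
    for width in FC_SIZES:
        params += width * (fan_in + 1)
        neurons += width
        fan_in = width
    return shapes, neurons, params

shapes, neurons, params = trace()
for name, h, w, d in shapes:
    print(f"[{h}x{w}x{d}] {name}")
print(f"neurons: {neurons:,}  parameters: {params:,}")
```

Under these assumptions the trace gives about 659K neurons and about 62.4M parameters. The parameter count is slightly above the 60M quoted for AlexNet because the original two-GPU network connects some convolutional layers only to the feature maps on the same GPU, which halves their weights. Calling `trace(input_size=224)` raises a `ValueError`, matching the 54.25 result for a 224×224 input.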
Local Response Normalization
● Reduces top-1 and top-5 error rates by 1.4% and 1.2%, respectively
● Hyperparameters: k = 2, n = 5, α = 10^-4, β = 0.75
● Applied after the ReLU nonlinearity in certain layers
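Local response normalization divides each activation by a term built from the squared activations of the n neighboring channels at the same spatial position: b_i = a_i / (k + α Σ_j a_j²)^β, where j runs over the channels from max(0, i − n/2) to min(N − 1, i + n/2). The NumPy sketch below follows that formula; the function name and the (channels, height, width) layout are assumptions made for the example, and α is taken as 10^-4 as in the AlexNet paper.

```python
import numpy as np

def local_response_norm(a, k=2.0, n=5, alpha=1e-4, beta=0.75):
    """Normalize activations `a` of shape (channels, height, width) across channels."""
    num_channels = a.shape[0]
    squared = a.astype(float) ** 2
    out = np.empty_like(a, dtype=float)
    half = n // 2
    for i in range(num_channels):
        lo = max(0, i - half)                  # window is clipped at the first channel
        hi = min(num_channels - 1, i + half)   # and at the last channel
        denom = (k + alpha * squared[lo:hi + 1].sum(axis=0)) ** beta
        out[i] = a[i] / denom
    return out

# Example: 8 channels of 4x4 activations, all equal to 1.
activations = np.ones((8, 4, 4))
normalized = local_response_norm(activations)
print(normalized[0, 0, 0], normalized[4, 0, 0])
```

Channels near the edge of the channel axis see a smaller window, so their normalization term is slightly smaller than for channels in the middle.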
Data Augmentation
● Reduces over-fitting
○ Artificially enlarges the dataset
● Types of data augmentation
○ Image translations and horizontal reflections: train on random 224×224 patches and their reflections; at test time, extract 5 patches of size 224×224 (four corners and center) plus their horizontal reflections, and average the predictions over the 10 patches
○ Altering the intensities of the RGB channels in training images (perform PCA on the RGB pixel values)
○ The RGB intensity scheme reduces the top-1 error rate by over 1%
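Both augmentation types can be sketched in NumPy. This is an illustrative sketch under stated assumptions, not the authors' code: the function names are hypothetical, images are assumed to be float arrays of shape (height, width, 3), the noise standard deviation of 0.1 is taken from the AlexNet paper, and the PCA is computed on a single image here, whereas the paper computes it over the RGB values of the whole training set.

```python
import numpy as np

def ten_crops(image, size=224):
    """Four corner crops and the center crop, plus their horizontal reflections."""
    h, w, _ = image.shape
    top, left = (h - size) // 2, (w - size) // 2
    origins = [(0, 0), (0, w - size), (h - size, 0), (h - size, w - size), (top, left)]
    crops = [image[r:r + size, c:c + size] for r, c in origins]
    return crops + [crop[:, ::-1] for crop in crops]  # reverse the width axis

def pca_color_jitter(image, rng, sigma=0.1):
    """Add a random multiple of the RGB principal components to every pixel."""
    pixels = image.reshape(-1, 3).astype(float)
    cov = np.cov(pixels, rowvar=False)           # 3x3 covariance of RGB values
    eigvals, eigvecs = np.linalg.eigh(cov)       # columns of eigvecs are components
    alphas = rng.normal(0.0, sigma, size=3)      # drawn once per image
    shift = eigvecs @ (alphas * eigvals)         # one RGB offset shared by all pixels
    return image + shift

rng = np.random.default_rng(0)
image = rng.random((256, 256, 3))                # stand-in for a 256x256 RGB image
crops = ten_crops(image)
jittered = pca_color_jitter(image, rng)
print(len(crops), crops[0].shape, jittered.shape)
```

The color jitter shifts all pixels of an image by the same RGB offset, so object identity is preserved while the overall intensity and color of the illumination change.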
Dropout [1]
● Reduces over-fitting
● Sets the output of each hidden neuron to zero with probability 0.5
● Roughly doubles the number of iterations required to converge
● Learns more robust features
● Applied in the first two fully connected layers
[1] Srivastava, Nitish, et al. "Dropout: A simple way to prevent neural networks from overfitting." The Journal of Machine Learning Research 15.1 (2014): 1929-1958.
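A minimal sketch of dropout as described in the AlexNet paper: during training each hidden output is zeroed with probability 0.5, and at test time all neurons are used but their outputs are multiplied by 0.5. The function names are assumptions made for this example, and this is the paper's original formulation rather than the "inverted" variant that rescales during training.

```python
import numpy as np

def dropout_train(activations, rng, p_drop=0.5):
    """Training-time dropout: zero each output independently with probability p_drop."""
    keep_mask = rng.random(activations.shape) >= p_drop
    return activations * keep_mask

def dropout_test(activations, p_drop=0.5):
    """Test-time counterpart: keep every neuron, scale outputs by the keep probability."""
    return activations * (1.0 - p_drop)

rng = np.random.default_rng(0)
hidden = np.ones(1000)                 # stand-in for a fully connected layer's outputs
train_out = dropout_train(hidden, rng)
test_out = dropout_test(hidden)
print(train_out.mean(), test_out.mean())
```

Scaling at test time keeps the expected input to the next layer the same as during training, when only about half of the neurons are active.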
Stochastic Gradient Descent
● SGD with a batch size of 128
● Learning rate initialized to 0.01 (equal for all layers; divided by 10 when the validation error stopped improving)
● Neuron biases initialized to 1 in convolutional layers 2, 4, and 5 and in the fully connected hidden layers (0 in the remaining layers)
● Trained on two NVIDIA GTX 580 GPUs (3 GB each)
● Weights initialized from a zero-mean Gaussian with standard deviation 0.01
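The update rule can be sketched as follows. The momentum of 0.9 and weight decay of 0.0005 are taken from the AlexNet paper rather than from this slide. The learning-rate rule is a hypothetical automatic stand-in: in the paper the rate was divided by 10 by hand when the validation error stopped improving. Both function names are assumptions made for the example.

```python
import numpy as np

def sgd_step(w, v, grad, lr=0.01, momentum=0.9, weight_decay=0.0005):
    """One momentum SGD update with weight decay.

    v_new = momentum * v - weight_decay * lr * w - lr * grad
    w_new = w + v_new
    """
    v_new = momentum * v - weight_decay * lr * w - lr * grad
    return w + v_new, v_new

def next_learning_rate(lr, val_errors, factor=10.0):
    """Hypothetical plateau rule: divide lr when the latest validation error
    is no better than the best error seen before it."""
    if len(val_errors) >= 2 and val_errors[-1] >= min(val_errors[:-1]):
        return lr / factor
    return lr

# One update on a single weight with gradient 2.0 (averaged over a batch).
w, v = np.array([1.0]), np.zeros(1)
w, v = sgd_step(w, v, grad=np.array([2.0]))
print(w, v)

lr_improving = next_learning_rate(0.01, [0.50, 0.45])
lr_plateau = next_learning_rate(0.01, [0.50, 0.45, 0.46])
print(lr_improving, lr_plateau)
```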
Results
Model        Top-1 (val)   Top-5 (val)   Top-5 (test)
SIFT + FVs   -             -             26.2%
1 CNN        40.7%         18.2%         -
5 CNNs       38.1%         16.4%         16.4%
1 CNN*       39.0%         16.6%         -
7 CNNs*      36.7%         15.4%         15.3%
Table 2: Comparison of error rates on ILSVRC-2012 validation and test sets. In italics are best results achieved by others. Models with an asterisk were “pre-trained” to classify the entire ImageNet 2011 Fall release. See Section 6 for details.
● Averaging the predictions of two CNNs pre-trained on the entire Fall 2011 release with the five CNNs gives a top-5 test error of 15.3%.
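The multi-CNN rows come from averaging the predictions of several networks. A minimal sketch of that averaging step, with made-up probabilities for two hypothetical models and three classes:

```python
import numpy as np

def ensemble_average(prob_list):
    """Average class probabilities from several models, element-wise."""
    return np.mean(np.stack(prob_list, axis=0), axis=0)

# Toy predictions for one image (illustrative values, not results from the paper).
model_a = np.array([0.6, 0.4, 0.0])   # model A prefers class 0
model_b = np.array([0.1, 0.9, 0.0])   # model B strongly prefers class 1
averaged = ensemble_average([model_a, model_b])
predicted = int(np.argmax(averaged))
print(averaged, predicted)
```

When the models make different mistakes, a confident correct prediction from one model can outweigh a weak wrong prediction from another, which is one intuition for why the ensembles in the table have lower error than a single CNN.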
Conclusion: AlexNet
● Rectified Linear Units (ReLUs)
● Dropout
● Data augmentation
● Trained with mini-batch stochastic gradient descent
● Top-5 error rate: 15.4%
Qualitative Evaluations
Visualizing the First Layer
Fig. 5. 96 convolutional kernels of size 11×11×3 learned by the first convolutional layer on the 224×224×3 input images. The top 48 kernels were learned on GPU 1 while the bottom 48 kernels were learned on GPU 2. See Section 6.1 for details.
● Top 48 kernels (GPU 1): color-agnostic
● Bottom 48 kernels (GPU 2): color-specific
