
1 Training Techniques for Deep Neural Networks
Deep Learning Seminar
School of Electrical Engineering – Tel Aviv University
Yuval Yaacoby & Tal Shapira

2 Presentation Based on the Paper

3 Outline
Background: Fisher vector; structure of convolutional neural networks
Paper scenarios
Training and target datasets
Paper results

4 Generic Visual Categorization Using Shallow Methods
Patch detection: interest points, segments, regular patches, ...
Feature extraction: SIFT, color statistics, moments, ...
Visual dictionary: K-means, GMM, random forest, ...
Image representation: BOV, FV, ...
Classification: SVM, softmax, ...
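A minimal sketch of such a shallow pipeline in the bag-of-visual-words (BOV) flavour, assuming random data in place of real SIFT descriptors and using scikit-learn's KMeans and LinearSVC (dictionary size and feature dimension are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
D, K = 128, 64                      # descriptor dim (e.g. SIFT) and dictionary size

# Stand-ins for local descriptors extracted from 20 training images.
train_descs = [rng.normal(size=(200, D)) for _ in range(20)]
train_labels = rng.integers(0, 2, size=20)

# Visual dictionary: cluster all training descriptors with K-means.
kmeans = KMeans(n_clusters=K, n_init=10, random_state=0).fit(np.vstack(train_descs))

def bov_encode(descs):
    """Bag-of-visual-words: L1-normalized histogram of nearest visual words."""
    words = kmeans.predict(descs)
    hist = np.bincount(words, minlength=K).astype(float)
    return hist / max(hist.sum(), 1.0)

X = np.array([bov_encode(d) for d in train_descs])
clf = LinearSVC().fit(X, train_labels)     # linear classifier on the image representations
```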

5 SIFT: Scale-Invariant Feature Transform
Distinctive Image Features from Scale-Invariant Keypoints, D. Lowe, University of British Columbia, 2004
Extracts keypoints and computes their descriptors
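A short sketch of keypoint detection and descriptor extraction with OpenCV (assuming OpenCV >= 4.4, where SIFT lives in the main module as cv2.SIFT_create; "image.jpg" is a placeholder path):

```python
import cv2

img = cv2.imread("image.jpg", cv2.IMREAD_GRAYSCALE)    # placeholder path
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)
# Each keypoint has a location, scale and orientation; each descriptor is a
# 128-dimensional vector that is largely invariant to scale and rotation.
print(len(keypoints), descriptors.shape)                # N keypoints, (N, 128)
```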

6 Fisher Vector (1)
The feature vector is the gradient of the sample's log-likelihood with respect to the parameters of a probabilistic model: $G^X_\lambda = \nabla_\lambda \log u_\lambda(X)$
Similarity is measured with the Fisher kernel $K(X, Y) = {G^X_\lambda}^\top F_\lambda^{-1} G^Y_\lambda$
Fisher information matrix: $F_\lambda = E_{x \sim u_\lambda}\left[\nabla_\lambda \log u_\lambda(x)\, \nabla_\lambda \log u_\lambda(x)^\top\right]$
Writing $F_\lambda^{-1} = L_\lambda^\top L_\lambda$, learning a classifier on the Fisher kernel equals learning a linear classifier on $\mathcal{G}^X_\lambda = L_\lambda G^X_\lambda$

7 Fisher Vector (2)
Choose $u_\lambda$ to be a Gaussian mixture model (GMM): $u_\lambda(x) = \sum_{i=1}^{K} w_i\, u_i(x)$
Let $X = \{x_t,\ t = 1, \dots, T\}$ be a set of $T$ $D$-dimensional local features extracted from an image
The Fisher vector is the concatenation of the gradient blocks $\mathcal{G}^X_{\mu,i}$ and $\mathcal{G}^X_{\sigma,i}$ for $i = 1, \dots, K$, and is therefore $2KD$-dimensional
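A sketch of this construction with a diagonal GMM from scikit-learn and random stand-in descriptors; the closed-form gradients with respect to the GMM means and standard deviations follow the standard Fisher vector derivation, and all sizes are illustrative:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
D, K = 64, 8                                     # descriptor dim and number of Gaussians

# Stand-in for pooled training descriptors; in practice these are (PCA-reduced) SIFT.
train_descs = rng.normal(size=(10000, D))
gmm = GaussianMixture(n_components=K, covariance_type="diag", random_state=0).fit(train_descs)

def fisher_vector(x):
    """2*K*D Fisher vector: gradients w.r.t. the GMM means and standard deviations."""
    w, mu, var = gmm.weights_, gmm.means_, gmm.covariances_      # (K,), (K, D), (K, D)
    gamma = gmm.predict_proba(x)                                  # (T, K) soft assignments
    n = x.shape[0]
    diff = (x[:, None, :] - mu[None, :, :]) / np.sqrt(var)[None, :, :]   # (T, K, D)
    g_mu = (gamma[:, :, None] * diff).sum(axis=0) / (n * np.sqrt(w)[:, None])
    g_sigma = (gamma[:, :, None] * (diff ** 2 - 1)).sum(axis=0) / (n * np.sqrt(2 * w)[:, None])
    return np.concatenate([g_mu.ravel(), g_sigma.ravel()])        # length 2*K*D

fv = fisher_vector(rng.normal(size=(500, D)))                     # descriptors of one image
print(fv.shape)                                                   # (1024,) = (2*K*D,)
```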

8 Improved Fisher Vector
Improving the Fisher Kernel for Large-Scale Image Classification, Florent Perronnin, Jorge Sánchez, and Thomas Mensink, 2010
L2 normalization: removes the FV's dependence on the amount of image-specific vs. background information
Power normalization: "unsparsifies" the FV representation
Spatial pyramids

9 IFV 1: L2 Normalization
By construction the Fisher vector discards descriptors that are likely to occur in any image, so the FV focuses on image-specific features
However, the FV still depends on the proportion of image-specific vs. background information
L2 normalization removes this dependence

10 IFV 2: Power Normalization
As the number of Gaussians increases, the FV becomes sparser: fewer descriptors are assigned a significant probability to each Gaussian
Power normalization "unsparsifies" the representation: $f(z) = \mathrm{sign}(z)\,|z|^{\alpha}$ with $0 < \alpha \le 1$ (typically $\alpha = 0.5$, the signed square root)
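A minimal numpy sketch combining this power normalization (alpha = 0.5, i.e. the signed square root) with the L2 normalization from the previous slide:

```python
import numpy as np

def improved_fv_normalize(fv, alpha=0.5):
    """Power ("signed square root") normalization followed by L2 normalization."""
    fv = np.sign(fv) * np.abs(fv) ** alpha       # un-sparsifies the representation
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv

raw = np.array([4.0, -0.25, 0.0, 1.0])
print(improved_fv_normalize(raw))                # signed sqrt, then unit L2 norm
```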

11 IFV 3: Spatial Pyramids
Multi-level recursive image decomposition that takes rough geometry into account
The image is repeatedly subdivided, and descriptor-level statistics are pooled within each cell at increasingly fine resolutions
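A sketch of pyramid pooling for a BOV-style histogram, using a 1x1 + 2x2 pyramid over hypothetical descriptor locations; the same per-cell pooling idea applies to FV statistics:

```python
import numpy as np

def spatial_pyramid_histogram(words, xy, img_w, img_h, n_words, levels=(1, 2)):
    """Concatenate visual-word histograms over the cells of a spatial pyramid."""
    parts = []
    for g in levels:                                   # g x g grid at this level
        cx = np.minimum(xy[:, 0] * g // img_w, g - 1)
        cy = np.minimum(xy[:, 1] * g // img_h, g - 1)
        for i in range(g):
            for j in range(g):
                in_cell = (cx == i) & (cy == j)
                parts.append(np.bincount(words[in_cell], minlength=n_words))
    return np.concatenate(parts).astype(float)

rng = np.random.default_rng(0)
words = rng.integers(0, 64, size=500)                  # visual-word index per descriptor
xy = rng.integers(0, 300, size=(500, 2))               # descriptor locations, 300x300 image
print(spatial_pyramid_histogram(words, xy, 300, 300, 64).shape)   # (5 * 64,) = (320,)
```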

12 Structure of CNN
(Figure: general CNN architecture)

13 Structure of AlexNet
ImageNet Classification with Deep Convolutional Neural Networks, Krizhevsky et al., 2012
1.2M images in 1K categories
5 convolutional layers and 3 fully connected layers

14 Convolutional Layer (1)
Accepts a volume of size W1 × H1 × D1
Requires four hyper-parameters: number of filters K, their spatial extent F (receptive field size), the stride S, and the amount of zero padding P
Produces a volume of size W2 × H2 × D2, where:
W2 = (W1 − F + 2P)/S + 1
H2 = (H1 − F + 2P)/S + 1
D2 = K
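A quick check of the output-size formula, assuming PyTorch is available; the numbers correspond to AlexNet's first convolutional layer from the next slide (a 227x227x3 input, 96 filters of spatial extent 11, stride 4, no padding):

```python
import torch
import torch.nn as nn

def conv_output_size(w, f, p, s):
    """W2 = (W1 - F + 2P) / S + 1 for one spatial dimension."""
    return (w - f + 2 * p) // s + 1

# AlexNet conv1: 96 filters, 11x11 receptive field, stride 4, no padding.
print(conv_output_size(227, 11, 0, 4))                    # 55

conv1 = nn.Conv2d(in_channels=3, out_channels=96, kernel_size=11, stride=4, padding=0)
x = torch.randn(1, 3, 227, 227)
print(conv1(x).shape)                                     # torch.Size([1, 96, 55, 55])
```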

15 Convolutional Layer (2)
AlexNet conv1: receptive field size 11, stride 4, zero padding 0, input depth 3
Output volume: 55 × 55 × 96
55 * 55 * 96 = 290,400 neurons, each with 11 * 11 * 3 = 363 weights
Parameter sharing: the layer keeps only 96 * 363 = 34,848 unique weights instead of one set per neuron

16 ReLU Nonlinearity
Non-saturating nonlinearity (ReLU): f(x) = max(0, x)
Quick to learn compared to saturating units such as tanh or sigmoid
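A tiny numpy illustration of why a non-saturating unit trains faster: the ReLU gradient stays at 1 for any positive input, while a saturating nonlinearity such as tanh has a gradient that vanishes for large activations:

```python
import numpy as np

z = np.array([-3.0, -0.5, 0.0, 2.0, 10.0])

relu = np.maximum(z, 0.0)
relu_grad = (z > 0).astype(float)        # gradient is 1 for every positive input

tanh_grad = 1.0 - np.tanh(z) ** 2        # saturates: nearly 0 for large |z|

print(relu_grad)                         # [0. 0. 0. 1. 1.]
print(tanh_grad)                         # last entry ~ 8e-9: vanishing gradient
```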

17 Pooling Layers
Max-pooling
Overlapping pooling: "We generally observe during training that models with overlapping pooling find it slightly more difficult to overfit" (Krizhevsky et al.)
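A short PyTorch sketch contrasting non-overlapping pooling with the 3x3, stride-2 overlapping pooling used in AlexNet; both give 27x27 maps here, but the overlapping windows share pixels:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 96, 55, 55)

# Non-overlapping max-pooling: window size equals stride.
plain = nn.MaxPool2d(kernel_size=2, stride=2)
# Overlapping max-pooling as in AlexNet: 3x3 windows with stride 2.
overlap = nn.MaxPool2d(kernel_size=3, stride=2)

print(plain(x).shape)      # torch.Size([1, 96, 27, 27])
print(overlap(x).shape)    # torch.Size([1, 96, 27, 27])
```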

18 Local Response Normalization (LRN)
Implements a form of "lateral inhibition": each activation is normalized across neighbouring channels ("brightness normalization")
Reduces AlexNet's top-1 and top-5 error rates by 1.4% and 1.2%, respectively
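A sketch using PyTorch's built-in layer with the hyper-parameters reported for AlexNet (n = 5, k = 2, alpha = 1e-4, beta = 0.75); note that torch.nn.LocalResponseNorm divides alpha by the window size internally, a small deviation from the paper's formula:

```python
import torch
import torch.nn as nn

# Local response normalization with AlexNet-style hyper-parameters.
lrn = nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0)

x = torch.randn(1, 96, 55, 55)     # activations after conv1 + ReLU
y = lrn(x)                         # each unit is divided by a term summed over
print(y.shape)                     # neighbouring channels; shape is unchanged
```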

19 Fully-Connected Layers
Penultimate layer
Dropout layer
Softmax layer
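A PyTorch sketch of an AlexNet-style fully connected head with dropout and a softmax output; the layer sizes follow the original AlexNet, and the input here is random:

```python
import torch
import torch.nn as nn

# Two 4096-d fully connected layers with dropout, then a softmax over 1000 classes.
head = nn.Sequential(
    nn.Dropout(p=0.5),
    nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(4096, 4096), nn.ReLU(),      # penultimate 4096-d layer
    nn.Linear(4096, 1000),
)

features = torch.randn(8, 256 * 6 * 6)     # flattened conv features for a batch of 8
probs = torch.softmax(head(features), dim=1)
print(probs.shape, probs.sum(dim=1))       # (8, 1000), each row sums to 1
```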

20 One-Vs-The-Rest SVM Classifier
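A minimal scikit-learn sketch of the one-vs-rest linear SVM used as the final classifier, with random stand-ins for the image descriptors (e.g. the 4096-d penultimate CNN activations or an IFV):

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 4096))     # stand-in image descriptors
y_train = rng.integers(0, 10, size=200)    # stand-in class labels

clf = OneVsRestClassifier(LinearSVC())     # one binary SVM per class
clf.fit(X_train, y_train)
print(clf.predict(rng.normal(size=(5, 4096))))
```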

21 Paper Scenarios
Scenario 1: shallow representation (IFV)
Scenario 2: deep representation (CNN) with pre-training
Scenario 3: deep representation (CNN) with pre-training and fine-tuning

22 Training Set
ILSVRC-2012: 1000 object categories from ImageNet, 1.2M training images
Trained with gradient descent with momentum
Data augmentation and RGB color jittering
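A minimal PyTorch sketch of one step of gradient descent with momentum; the model is a stand-in, and the momentum and weight-decay values are typical AlexNet-style settings used here purely for illustration:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)                          # stand-in for the CNN
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9,
                            weight_decay=5e-4)    # momentum + L2 regularization
criterion = nn.CrossEntropyLoss()

x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))
optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()                                  # one momentum-SGD update
```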

23 Data Augmentation
Generates additional examples of each class
3 strategies:
No augmentation (cropping only if needed)
Flip augmentation: mirroring images
C+F augmentation: cropping and flipping
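A numpy sketch of the C+F strategy (a random crop plus its horizontal mirror); the 224-pixel crop size matches the CNN input, and the image here is a stand-in:

```python
import numpy as np

def cf_augment(img, crop=224, rng=None):
    """C+F augmentation: a random crop of the image plus its horizontal mirror."""
    if rng is None:
        rng = np.random.default_rng()
    h, w, _ = img.shape
    top = rng.integers(0, h - crop + 1)
    left = rng.integers(0, w - crop + 1)
    patch = img[top:top + crop, left:left + crop]
    return patch, patch[:, ::-1]                    # crop and its left-right flip

img = np.zeros((256, 256, 3), dtype=np.uint8)       # stand-in image
patch, mirrored = cf_augment(img)
print(patch.shape, mirrored.shape)                  # (224, 224, 3) twice
```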

24 Target Datasets
ILSVRC-2012: top-5 classification error
PASCAL VOC (2007 and 2012): mAP
Caltech-101 and Caltech-256: mean class accuracy

25 Scenario 1: IFV
Modifications to the IFV baseline:
Intra-normalization of descriptor blocks
Spatially-extended local descriptors
Use of color features with SIFT descriptors

26 Scenario 2: CNN with Pre-Training
3 CNN architectures with different accuracy/speed trade-offs:
Fast (CNN-F)
Medium (CNN-M), also with lower-dimensional image representations (full7)
Slow (CNN-S)
Color vs. grayscale input images

27 Paper CNN Structures

28 Scenario 3: CNN with Pre-Training and Fine-Tuning
Fine-tuning on the target dataset: the last layer's output dimensionality is set to the number of target classes (using CNN-S)
VOC-2007 and VOC-2012 (multi-label datasets): one-vs-rest classification loss or ranking hinge loss
Caltech-101 (single-label dataset): softmax regression
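A hypothetical PyTorch sketch of this fine-tuning setup, assuming torchvision >= 0.13; torchvision's AlexNet stands in for the paper's CNN-S, and the 20-class output is the PASCAL VOC case:

```python
import torch.nn as nn
from torchvision.models import alexnet, AlexNet_Weights

# Start from a network pre-trained on ILSVRC and replace the last fully connected
# layer so its output dimensionality equals the number of target classes.
num_target_classes = 20                                 # e.g. PASCAL VOC
model = alexnet(weights=AlexNet_Weights.DEFAULT)        # pre-trained stand-in for CNN-S
in_features = model.classifier[-1].in_features          # 4096
model.classifier[-1] = nn.Linear(in_features, num_target_classes)

# The new last layer (and optionally earlier layers) is then trained on the target
# data: one-vs-rest or ranking hinge loss for multi-label VOC, softmax
# (cross-entropy) for single-label Caltech-101.
```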

29 Results (1)

30 Results (2)
Data augmentation improves performance by ~3% for both IFV and CNN
Color descriptors alone yield worse performance
Combining SIFT and color descriptors improves performance by ~1% for IFV
For CNN, grayscale input drops performance by ~3%
CNN-based methods outperform the shallow encodings by ~10%, and do so with smaller-dimensional output features

31 Results (3)
Intra-normalization improves performance by ~1% for IFV
Both CNN-M and CNN-S outperform CNN-F by a 2-3% margin; CNN-M is simpler and marginally faster
Output dimensionality can be reduced to 128 with only a ~2% drop
Fine-tuning on VOC-2007 with the ranking hinge loss improves performance by 2.7%

32 Results (4)

33 Conclusions
Rigorous empirical evaluation of CNN-based methods for image classification, compared with shallow feature encoding methods
The performance of shallow representations can be improved by adopting data augmentation
Deep architectures outperform the shallow methods
Fine-tuning further improves results

34 Thank you for your attention. Questions?

