Tuning CNN: Tips & Tricks


Tuning CNN: Tips & Tricks Dmytro Panchenko Machine learning engineer, Altexsoft

Workshop setup
1. Clone the code from https://github.com/hokmund/cnn-tips-and-tricks
2. Download the data and checkpoints from http://tiny.cc/4flryy
3. Extract them from the archive and place them under src/ in the source code folder
4. Run pip install -r requirements.txt

Agenda
- Workshop setup
- Transfer learning
- Learning curves interpretation
- Learning rate management & cyclic learning rate
- Augmentations
- Dealing with imbalanced classification
- TTA
- Pseudolabeling

Exploratory data analysis data-analysis.ipynb

Exploratory data analysis Real-world images of various goods. Different occlusions, illumination, etc. Most items are centered in the picture. Some classes are extremely similar to each other.

Exploratory data analysis

Dataset split Validation set is used for hyperparameter tuning. Test set is used for the final evaluation of the tuned model. Train set – 37184 samples (imbalanced). Validation set – 12800 samples (balanced). Test set – 25600 samples (balanced).

Transfer learning Transfer learning – using a CNN pre-trained on a very large dataset instead of training one from scratch.

Transfer learning
- You have little data, datasets are similar: train a classifier (usually logistic regression or MLP) on bottleneck features.
- You have a lot of data, datasets are similar: fine-tune several or all layers.
- You have little data, datasets are different: train a classifier on deep features of the CNN.
- You have a lot of data, datasets are different: fine-tune all layers (use pre-trained weights as an initialization for your CNN).
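As a concrete illustration of the "little data, similar datasets" case, here is a minimal sketch assuming tf.keras and an ImageNet-pretrained ResNet50; the backbone, head size, and input shape are placeholders and the workshop notebook may use a different architecture:

```python
from tensorflow.keras import layers, models, optimizers
from tensorflow.keras.applications import ResNet50

num_classes = 10  # hypothetical number of classes

# Frozen pre-trained convolutional base acting as a bottleneck-feature extractor.
base = ResNet50(weights="imagenet", include_top=False, pooling="avg",
                input_shape=(224, 224, 3))
base.trainable = False

# New classification head trained on top of the frozen features.
model = models.Sequential([
    base,
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer=optimizers.Adam(1e-3),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
# Later, unfreeze some (or all) layers of `base` and fine-tune with a lower LR.
```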

Fine-tuning pre-trained CNN fine-tuning.ipynb

Learning curve Underfitting (accuracy is still improving, so you probably need a higher learning rate and more training epochs)

Learning curve Underfitting (accuracy doesn't improve, so you need a deeper network)

Learning curve Overfitting (train accuracy increases while validation accuracy gets worse, so you need to add regularization or enlarge the dataset if possible)

Learning curve Overfitting with oscillations (the network became unstable after several epochs; you need to decrease the learning rate during training)

Learning curve Almost perfect learning curve
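The curves above are simply the training history; a minimal plotting sketch, assuming tf.keras (newer versions log "accuracy"/"val_accuracy", older Keras uses "acc"/"val_acc"):

```python
import matplotlib.pyplot as plt

def plot_learning_curves(history):
    """Plot train vs. validation accuracy from a Keras History object."""
    plt.plot(history.history["accuracy"], label="train")
    plt.plot(history.history["val_accuracy"], label="validation")
    plt.xlabel("epoch")
    plt.ylabel("accuracy")
    plt.legend()
    plt.show()

# history = model.fit(...); plot_learning_curves(history)
```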

Tuning more layers fine-tuning.ipynb

Learning rate strategies Time-based decay: lr = lr * 1 / (1 + decay * epoch). This decay is used by default in Keras optimizers.

Learning rate strategies Step decay: lr = lr_start * 1 / (1 - decay * drop), where drop = floor(epoch / step)
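Both schedules can be plugged into training via a LearningRateScheduler callback; a sketch assuming tf.keras, with illustrative constants, and with step decay written in its common multiplicative form (halve the LR every `step` epochs):

```python
import math
from tensorflow.keras.callbacks import LearningRateScheduler

lr_start, decay, step = 1e-3, 0.1, 10  # illustrative hyperparameters

def time_based(epoch):
    # Smoothly shrink the learning rate every epoch.
    return lr_start / (1.0 + decay * epoch)

def step_decay(epoch):
    # Drop the learning rate by a constant factor every `step` epochs.
    drop = math.floor(epoch / step)
    return lr_start * (0.5 ** drop)

scheduler = LearningRateScheduler(step_decay, verbose=1)
# model.fit(..., callbacks=[scheduler])
```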

Reducing learning rate on plateau Reducing learning rate whenever validation metric stops improving (can be combined with previously discussed strategies). Keras implementation – ReduceLROnPlateau callback.
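A typical configuration of the callback looks like this; the monitored metric, factor, and patience are illustrative, not the workshop's exact values:

```python
from tensorflow.keras.callbacks import ReduceLROnPlateau

# Halve the learning rate if validation accuracy has not improved for 3 epochs.
reduce_lr = ReduceLROnPlateau(monitor="val_accuracy", factor=0.5,
                              patience=3, min_lr=1e-6, verbose=1)
# model.fit(..., callbacks=[reduce_lr])
```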

Cyclic learning rate Learning rate increases and decreases in a cycle. The upper bound of the cycle can be static or can decrease with time. The upper bound is selected with the LR finder algorithm. The lower bound is chosen to be 1-2 orders of magnitude less than the upper bound. Original paper - https://arxiv.org/abs/1506.01186
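A minimal triangular CLR schedule, following the formulation in the paper but applied per epoch; this is a sketch assuming tf.keras, and the bounds and step size are placeholders to be chosen with the LR finder:

```python
import numpy as np
from tensorflow.keras.callbacks import LearningRateScheduler

base_lr, max_lr = 1e-5, 1e-3  # lower / upper bounds (placeholders)
step_size = 5                 # half-cycle length in epochs

def triangular_clr(epoch):
    # Map the epoch index to a triangle wave oscillating between the bounds.
    cycle = np.floor(1 + epoch / (2 * step_size))
    x = np.abs(epoch / step_size - 2 * cycle + 1)
    return float(base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x))

clr = LearningRateScheduler(triangular_clr, verbose=1)
# model.fit(..., callbacks=[clr])
```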

Learning rate finder
- Select a reasonably small lower bound (e.g. 1e-6).
- Usually, 1e0 is a good choice for the upper bound.
- Increase the learning rate exponentially.
- Plot the smoothed loss vs. LR.
- Select a point slightly lower than the global minimum.
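A rough LR-range-test sketch following these steps, assuming tf.keras and a tf.data.Dataset of batches; the function and variable names are made up for illustration, and the sweep perturbs the model's weights, so reload a checkpoint before real training:

```python
import numpy as np
import tensorflow as tf

def find_lr(model, dataset, min_lr=1e-6, max_lr=1.0, num_steps=100):
    """Sweep the LR exponentially over `num_steps` batches and record the loss."""
    lrs = np.geomspace(min_lr, max_lr, num_steps)
    losses = []
    for lr, (x, y) in zip(lrs, dataset.take(num_steps)):
        tf.keras.backend.set_value(model.optimizer.learning_rate, lr)
        logs = model.train_on_batch(x, y, return_dict=True)
        losses.append(logs["loss"])
    return lrs, losses

# Plot losses vs. lrs on a log x-axis and pick an LR slightly below the loss minimum.
```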

Snapshot ensemble Source - https://arxiv.org/pdf/1704.00109.pdf

Learning rate finder and CLR fine-tuning.ipynb

Augmentation Augmentation effectively increases the dataset size by applying natural transformations to images. Useful strategy: start with soft augmentation, make it harsher over time, and, if the dataset is big enough, finish training with several epochs of soft augmentation or none at all. Implementation: https://github.com/albu/albumentations
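A small albumentations pipeline for the "soft" stage, as a sketch; the exact transforms and probabilities used in the workshop may differ:

```python
import numpy as np
import albumentations as A

# Soft augmentations for the early stage of training.
soft_aug = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.ShiftScaleRotate(shift_limit=0.05, scale_limit=0.1, rotate_limit=10, p=0.5),
    A.RandomBrightnessContrast(p=0.3),
])

image = np.zeros((224, 224, 3), dtype=np.uint8)  # placeholder image
augmented_image = soft_aug(image=image)["image"]
```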

Tuning whole network fine-tuning.ipynb

Dealing with imbalanced train set Common ways to deal with imbalanced classification are upsampling and downsampling. In the case of deep learning there is also weighted loss. Weighted loss example: class A has 1000 samples, class B has 2000 samples, class C has 400 samples. Class weights are inversely proportional to class sizes: weight_A = 2, weight_B = 1, weight_C = 5. Overall loss: loss = (Σ_class weight_class * loss_class) / (Σ_class weight_class) = (2 * loss_A + loss_B + 5 * loss_C) / 8
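With tf.keras the same weighting is usually passed as class_weight to fit; a sketch with the slide's counts, where each weight is the largest class count divided by the class's own count:

```python
counts = {"A": 1000, "B": 2000, "C": 400}   # samples per class
largest = max(counts.values())

# Inverse-frequency weights: A -> 2.0, B -> 1.0, C -> 5.0
class_weight = {idx: largest / n for idx, n in enumerate(counts.values())}

# model.fit(x_train, y_train, class_weight=class_weight, ...)
```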

Weighted loss fine-tuning.ipynb

Test-time augmentation One way to apply TTA is to use augmentations similar to those used during training, but softer. Simpler strategies: only flips; flips + crops. Caution: TTA increases inference time!
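A minimal flips-only TTA sketch, assuming a Keras model and a batch of NHWC images as a numpy array:

```python
def predict_with_tta(model, images):
    """Average predictions over the original and horizontally flipped images."""
    preds = model.predict(images)
    preds_flipped = model.predict(images[:, :, ::-1, :])  # flip along the width axis
    return (preds + preds_flipped) / 2.0
```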

Predictions with TTA fine-tuning.ipynb

Semi-supervised approach The convolutional layers of a deep CNN learn fairly generic features. You can refine such feature extractors by training on unlabeled data. The most popular approach to such training is called pseudolabeling.

Pseudolabeling
1. Train a classifier on the initial training set.
2. Predict the validation / test set with your classifier.
3. Optional: remove images with low-confidence labels.
4. Add the pseudolabeled data to your training set.
5. Use it to train a CNN from scratch (as a kind of warm-up) or to refine your previous classifier.
Source - https://www.analyticsvidhya.com/blog/2017/09/pseudo-labelling-semi-supervised-learning-technique/
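A sketch of the confidence-filtering step (step 3 above), assuming softmax predictions and numpy arrays; the 0.9 threshold is illustrative:

```python
def make_pseudolabels(model, x_unlabeled, threshold=0.9):
    """Keep only the unlabeled samples the model is confident about."""
    probs = model.predict(x_unlabeled)
    confident = probs.max(axis=1) >= threshold
    pseudo_x = x_unlabeled[confident]
    pseudo_y = probs[confident].argmax(axis=1)
    return pseudo_x, pseudo_y

# Mix (pseudo_x, pseudo_y) into the training set and retrain or refine the classifier.
```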

Pseudolabeling constraints The test dataset has a reasonable size (at least comparable to the training set). The network trained on pseudolabels is deep enough (especially when the pseudolabels are generated by an ensemble of models). Training data and pseudolabeled data are mixed in 1:2 - 1:4 proportions respectively.

Using pseudolabeling
In competitions: label the test set with your ensemble; train a new model; add it to the final ensemble.
In production: collect as much data as possible (both labeled and unlabeled); train a model on the labeled data; apply pseudolabeling.

Pseudolabeling pseudolabeling.ipynb

Summary
- Train the network's head
- Add the head to the convolutional part
- Add augmentations and learning rate scheduling / CLR
- Select an appropriate loss
- Predict with test-time augmentations
- If you don't have enough training data, apply pseudolabeling
Good luck!

Other tricks (out of scope)
- How to select a network architecture (size, regularization, pooling type, classifier structure)
- How to select an optimizer (Adam, RMSprop, etc.)
- Training at a higher resolution
- Hard sample mining
- Ensembling

Thank you for your attention Questions are welcome