Incremental Training of Deep Convolutional Neural Networks

Presentation transcript:

Incremental Training of Deep Convolutional Neural Networks. R. Istrate, A. C. I. Malossi, C. Bekas, and D. Nikolopoulos. arXiv:1803.10232v1, 27 Mar 2018.

Depth trade-off The depth of a deep neural network determines its capacity, so how should we choose the depth of our network? A deep network has high capacity but requires a lot of resources; a shallow network has limited capacity but converges quickly. Existing solution: grid search. Disadvantage: only late in the process do we learn whether a network is well suited for the dataset.

Methodology Consider a generic CNN 𝒩 composed of n layers. Partition 𝒩 into K sub-networks S_k, k = 1, …, K, with K ≤ n. Each sub-network must contain learnable parameters; no sub-network consists only of pooling and dropout layers.
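As a rough illustration (not the authors' code), the partition can be expressed as splitting an ordered layer list into blocks while checking that every block has trainable parameters; the layer list, split points, and function name below are assumptions of this sketch.

```python
import torch.nn as nn

def partition_network(layers, boundaries):
    """Split an ordered layer list into sub-networks S_1..S_K.

    `boundaries` gives the index of the last layer of each sub-network;
    every sub-network must contain at least one learnable layer, i.e.
    none may consist only of pooling/dropout.
    """
    subnets, start = [], 0
    for end in boundaries:
        block = nn.Sequential(*layers[start:end + 1])
        assert any(p.requires_grad for p in block.parameters()), \
            "sub-network has no learnable parameters"
        subnets.append(block)
        start = end + 1
    return subnets

# Illustrative VGG-like layer list split into K = 2 sub-networks.
layers = [
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
]
S = partition_network(layers, boundaries=[2, 5])
```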

Methodology Training starts with sub-network S_1. To decide when it is time to insert the second sub-network S_2 between S_1 and the classifier, we compute the improvement in validation accuracy every window-size (ws) epochs. When the observed improvement falls below a fixed threshold, we stop training the current configuration and increase the network depth by adding the next sub-network.

Methodology

Criterion for ending training Every ws epochs, compute the angle α between the x-axis and the linear approximation of the last ws validation-accuracy points. Training of the current configuration stops when α_i ≤ γ·α_{i−1}, where γ is a predefined threshold and α_i is the angle characterizing the accuracy in the i-th window.
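A minimal sketch of this criterion, assuming the angle comes from a least-squares line fitted to the last ws accuracy points (the helper names are mine, not the paper's):

```python
import numpy as np

def window_angle(acc_history, ws):
    """Angle (radians) between the x-axis and the least-squares line
    fitted to the last `ws` validation-accuracy points."""
    y = np.asarray(acc_history[-ws:], dtype=float)
    x = np.arange(len(y))
    slope, _ = np.polyfit(x, y, deg=1)
    return float(np.arctan(slope))

def should_add_subnetwork(acc_history, ws, gamma, prev_angle):
    """Stop the current configuration when alpha_i <= gamma * alpha_{i-1}."""
    angle = window_angle(acc_history, ws)
    stop = prev_angle is not None and angle <= gamma * prev_angle
    return stop, angle
```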

Look-ahead initialization When a new sub-network S_{k+1} is inserted into the current architecture, its weights need to be initialized. Random initialization performs poorly in practice. Look-ahead initialization: fix the earlier sub-networks S_1, …, S_k and train only S_{k+1} for a few epochs. Because the look-ahead network is much shallower than the final network, this extra training is not considered expensive.
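One way this look-ahead step could be realized in PyTorch is sketched below: freeze S_1, …, S_k, train only the new sub-network and the classifier for a few epochs, then unfreeze. Names, optimizer, and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
import torch

def lookahead_init(frozen_subnets, new_subnet, classifier, loader,
                   loss_fn, epochs=2, lr=1e-3):
    """Warm up a newly inserted sub-network while S_1..S_k stay fixed."""
    for block in frozen_subnets:
        for p in block.parameters():
            p.requires_grad = False            # earlier sub-networks are frozen

    params = list(new_subnet.parameters()) + list(classifier.parameters())
    opt = torch.optim.SGD(params, lr=lr, momentum=0.9)

    for _ in range(epochs):                    # only a few epochs are needed
        for x, y in loader:
            with torch.no_grad():              # frozen feature extractor
                h = x
                for block in frozen_subnets:
                    h = block(h)
            loss = loss_fn(classifier(new_subnet(h)), y)
            opt.zero_grad()
            loss.backward()
            opt.step()

    for block in frozen_subnets:               # unfreeze for joint training
        for p in block.parameters():
            p.requires_grad = True
```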

Experiments Dataset: CIFAR-10. Base networks: VGGNet and ResNet.

Experiments Look-ahead initialization reduces the drop in validation accuracy when a new sub-network is added. For the same resource budget, incremental training outperforms the baseline model.

Experiments

Experiments

Conclusion Incremental learning in this work is not learning over a streaming dataset but learning the depth of the network; it can, however, be applied easily to the online-learning setting. The maximum depth of the model does not need to be fixed before training starts. Using equivalent blocks of ResNet or VGGNet, we can keep attaching new sub-networks until the model converges.

Online Deep Learning: Learning Deep Neural Networks on the Fly. D. Sahoo, Q. Pham, J. Lu, S. C. H. Hoi. Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18). arXiv:1711.03705, 10 Nov 2017.

Online deep learning In many applications data arrives sequentially as a stream and may be too large to store in memory; moreover, the data may exhibit concept drift. Online learning: the class of algorithms that optimize predictive models over a stream of data instances, one instance at a time.

Online deep learning for deep networks Previous online-learning methods focus on linear and kernel (two-layer) models. The key point is that in online learning the data is small at first and grows gradually, which raises the depth trade-off again: how should we choose the depth? Existing explicit and implicit methods make predictions only from the last hidden layer, which hinders learning of the lower layers' weights.

Proposed model Model parameters: W, Θ, α. The prediction is the α-weighted sum of classifiers attached to each hidden layer.
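A toy PyTorch module illustrating this structure: every hidden layer gets its own classifier head, and the final prediction is the α-weighted sum of the per-layer class probabilities. The module, its names, and the use of L (rather than L+1) heads are simplifying assumptions of this sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HedgedMLP(nn.Module):
    """L hidden layers, each with its own classifier head; the prediction
    is the alpha-weighted sum of the per-layer class probabilities."""
    def __init__(self, in_dim, hidden_dim, n_classes, L):
        super().__init__()
        dims = [in_dim] + [hidden_dim] * L
        self.hidden = nn.ModuleList(
            nn.Linear(dims[i], dims[i + 1]) for i in range(L))
        self.heads = nn.ModuleList(
            nn.Linear(hidden_dim, n_classes) for _ in range(L))
        self.register_buffer("alpha", torch.full((L,), 1.0 / L))

    def forward(self, x):
        h, per_layer = x, []
        for layer, head in zip(self.hidden, self.heads):
            h = torch.relu(layer(h))                       # W: feature layers
            per_layer.append(F.softmax(head(h), dim=-1))   # Theta: classifiers
        weighted = sum(a * p for a, p in zip(self.alpha, per_layer))
        return weighted, per_layer
```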

Hedge Algorithm Algorithm for learning α (Freund and Schapire, 1997), as used in AdaBoost. Loss function: ℒ(F(x), y) = Σ_{l=0}^{L} α^(l) ℒ(f^(l)(x), y). Initialize α_0^(l) uniformly, i.e. α_0^(l) = 1/(L+1). At every iteration t, update α^(l) as α_{t+1}^(l) ← α_t^(l) · β^{ℒ(f^(l)(x), y)}, where β ∈ (0, 1) is the discount-rate parameter and ℒ(f^(l)(x), y) ∈ (0, 1). Finally, normalize so that Σ_{l=0}^{L} α_{t+1}^(l) = 1.
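The α update itself, written out in numpy; `losses[l]` is assumed to already be ℒ(f^(l)(x), y) clipped into (0, 1):

```python
import numpy as np

def hedge_update(alpha, losses, beta=0.99):
    """One Hedge step: discount each classifier by beta**loss, then renormalize."""
    alpha = alpha * beta ** np.asarray(losses, dtype=float)
    return alpha / alpha.sum()

# Example with L + 1 = 4 classifiers, uniformly initialized.
alpha = np.full(4, 1 / 4)
alpha = hedge_update(alpha, losses=[0.9, 0.6, 0.4, 0.8])
```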

Hedge Algorithm Hedge enjoys a regret of R_T ≤ √(T ln N), where N is the number of experts (Freund and Schapire, 1999), which in this case is the network depth. Since shallow models tend to converge faster than deep ones, a plain hedging strategy would drive the α weights of the deeper classifiers down to very small values. To alleviate this, a smoothing parameter s ∈ (0, 1) sets a minimum weight for each classifier: α_t^(l) ← max(α_t^(l), s/L).
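Continuing the numpy sketch, the smoothing step could look like this (renormalizing afterwards so the weights still sum to 1 is my assumption):

```python
def smooth_alpha(alpha, s, L):
    """Keep each classifier's weight at least s / L, then renormalize."""
    alpha = np.maximum(alpha, s / L)
    return alpha / alpha.sum()
```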

Online Deep Learning using HBP W and Θ are learned with standard backpropagation, driven by the α-weighted loss defined above.
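Putting the pieces together, one possible HBP-style online update on the HedgedMLP sketch above: backpropagate the α-weighted sum of the per-classifier losses to update W and Θ, then apply the Hedge and smoothing updates to α. This is a sketch under the same assumptions as the earlier snippets, not the authors' code.

```python
def hbp_step(model, optimizer, x, y, beta=0.99, s=0.2):
    """One online update for a single mini-batch or instance (x, y)."""
    weighted_pred, per_layer = model(x)
    losses = [F.nll_loss(torch.log(p + 1e-12), y) for p in per_layer]

    # Backpropagation for W and Theta: alpha-weighted sum of classifier losses.
    total = sum(a * l for a, l in zip(model.alpha, losses))
    optimizer.zero_grad()
    total.backward()
    optimizer.step()

    # Hedge + smoothing update for alpha (losses clipped into (0, 1)).
    with torch.no_grad():
        clipped = torch.clamp(torch.stack(losses), 0.0, 1.0)
        alpha = model.alpha * beta ** clipped
        alpha = torch.clamp(alpha, min=s / len(losses))
        model.alpha.copy_(alpha / alpha.sum())
    return weighted_pred, float(total)
```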

Contributions Dynamic objective: a dynamically adaptive objective function mitigates the impact of vanishing gradients and helps escape saddle points and local minima. Also discussed: student-teacher learning, ensembles, concept drift, and convolutional networks.

Experiments - Datasets

Experiments – Traditional Online BP

Experiments – Comparison

Experiments – Convergence speed

Experiments – Evolution of weight 𝜶

Experiments – Robust to the Base Net