Recent Advances in Neural Architecture Search

Hao Chen (hao.chen01@Adelaide.edu.au)

Introduction to NAS
Neural architecture search (NAS) automates the design of artificial neural networks.
Manually designing a neural network can be time-consuming and error-prone.
There is growing interest in NAS for:
- Image processing
- Language modelling

NAS Process
- Search space: defines which architectures can be represented.
- Search strategy: details how to explore the search space.
- Performance estimation strategy: the process of estimating an architecture's performance on unseen data.
[Elsken et al. 2018]
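
The interplay of these three components can be read as a simple search loop. The sketch below is a minimal, hypothetical illustration; the `strategy`, `estimator`, and their methods are placeholder names, not the API of any particular NAS system.

```python
# Minimal NAS loop sketch (hypothetical interfaces, for illustration only).
def neural_architecture_search(search_space, strategy, estimator, budget):
    """Repeatedly propose architectures, estimate their quality, and feed
    the result back into the search strategy."""
    best_arch, best_score = None, float("-inf")
    for _ in range(budget):
        arch = strategy.propose(search_space)   # search strategy
        score = estimator.estimate(arch)        # performance estimation (proxy task, prediction, ...)
        strategy.update(arch, score)            # e.g. policy gradient, evolution, BO posterior update
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch
```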

Search Space Design
- Sequential search space
- Multi-branch search space
- Cell search space
[Elsken et al. 2018]

Sequential Search Space
- ProxylessNAS [Cai et al. 2019]: choose different kernel sizes for MobileNet v2 blocks
- Uniform [Guo et al. 2019] / DetNAS [Chen et al. 2019]: choose different kernel sizes for ShuffleNet v2 blocks

Multi-Branch Search Space
- NAS with RL [Zoph and Le 2017]: sequential layers with skip connections learned with RNN self-attention
- DPC [Chen et al. 2018]: 5 branches of conv3x3 with different dilation rates
- NAS-FPN [Ghiasi et al. 2019]

Cell Search Space
- NASNet [Zoph et al. 2018]: normal + reduction cell; 7x speed-up with better performance
- DARTS-like

Search Strategies
Random Search
- SMASH [Brock et al. 2017] / One-Shot [Bender et al. 2018]
Evolutionary Algorithm
- Regularized evolution [Real et al. 2018]
Reinforcement Learning
- REINFORCE: [Zoph and Le 2017], [Pham et al. 2018 1]
- Proximal policy optimization (PPO): [Zoph et al. 2018]
Bayesian Optimization
- GP with string kernel, Vizier [Chen et al. 2018]
- Guided ES [Liu et al. 2018 2]
- Hyperband + Bayesian optimization [Wang et al. 2018]
- BO and optimal transport [Kandasamy et al. 2018]
Gradient-Based Optimization
- SMASH [Brock et al. 2017]
- ENAS [Pham et al. 2018 2]
- DARTS [Liu et al. 2018 1]
- …

Policy Gradient RL
- The architecture is generated sequentially by an RNN controller.
- The gradient of the controller is estimated with REINFORCE or PPO.
- REINFORCE is unbiased; PPO has smaller variance and, in practice, performs better.
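
To make the controller update concrete, here is a minimal REINFORCE-style sketch in PyTorch. The toy `Controller`, the `reward_fn`, and the constant baseline are illustrative assumptions, not the exact setup of [Zoph and Le 2017].

```python
import torch
import torch.nn as nn

class Controller(nn.Module):
    """Toy RNN controller: at each step, pick one of `num_ops` operations."""
    def __init__(self, num_ops=5, steps=4, hidden=32):
        super().__init__()
        self.steps = steps
        self.embed = nn.Embedding(num_ops, hidden)
        self.rnn = nn.LSTMCell(hidden, hidden)
        self.head = nn.Linear(hidden, num_ops)

    def sample(self):
        h = c = torch.zeros(1, self.embed.embedding_dim)
        x = torch.zeros(1, self.embed.embedding_dim)
        actions, log_probs = [], []
        for _ in range(self.steps):
            h, c = self.rnn(x, (h, c))
            dist = torch.distributions.Categorical(logits=self.head(h))
            a = dist.sample()
            actions.append(a.item())
            log_probs.append(dist.log_prob(a))
            x = self.embed(a)
        return actions, torch.stack(log_probs).sum()

def reinforce_step(controller, optimizer, reward_fn, baseline=0.0):
    """One REINFORCE update: (reward - baseline) * grad log p(architecture)."""
    arch, log_prob = controller.sample()
    reward = reward_fn(arch)                  # e.g. proxy-task validation accuracy
    loss = -(reward - baseline) * log_prob    # the baseline reduces variance
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return arch, reward
```

PPO replaces this plain score-function estimator with a clipped surrogate objective over the same sampled architectures, which is what reduces the variance in practice.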

Evolutionary Strategy
AmoebaNet [Real et al. 2018]
- Aging evolution to explore the space more
- Connection/op mutation
DetNAS [Chen et al. 2019]
- Single-path training of a supernet
- Samples are single paths of the supernet
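
The aging ("regularized") evolution loop of [Real et al. 2018] is short enough to sketch directly. The helpers `random_architecture`, `mutate`, and `evaluate` below are hypothetical stand-ins for the search space and the proxy-task evaluation.

```python
import collections
import random

def aging_evolution(random_architecture, mutate, evaluate,
                    population_size=100, sample_size=25, cycles=1000):
    """Regularized (aging) evolution: always discard the OLDEST individual,
    so no architecture survives on fitness alone."""
    population = collections.deque()
    history = []
    # Seed the population with random architectures.
    while len(population) < population_size:
        arch = random_architecture()
        model = (arch, evaluate(arch))
        population.append(model)
        history.append(model)
    # Evolve: tournament selection, mutation, aging removal.
    while len(history) < cycles:
        sample = random.sample(list(population), sample_size)
        parent = max(sample, key=lambda m: m[1])   # best of the random sample
        child_arch = mutate(parent[0])             # connection or op mutation
        child = (child_arch, evaluate(child_arch))
        population.append(child)
        population.popleft()                       # remove the oldest (aging)
        history.append(child)
    return max(history, key=lambda m: m[1])
```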

Comparison of the Strategies
In the field of hyperparameter optimization and AutoML, a cleverly designed random search algorithm can be very reliable [Bergstra 2012]. Case studies comparing EA, RL and BO give no guarantee that any one search strategy is strictly better than the others [Liu et al. 2018 2, Kandasamy et al. 2018]. However, all of them are consistently better than random search [Real et al. 2018].

Computation Limitation
- The original NAS paper [Zoph and Le 2017]: 800 GPUs for 28 days on 32x32 CIFAR-10 images (roughly 22,400 GPU-days).
- With the cell search space and PPO [Zoph et al. 2018], the cost was reduced to about 2,000 GPU-days.

Speed-up Evaluation
Proxy task
- Smaller network: thinner and shallower (for cell structures or decoder search)
- Fewer iterations: 10 -> 50 epochs on COCO
- Lower resolution: CIFAR -> ImageNet
- Caching features of fixed structures
Performance prediction
- Loss curve interpolation
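
Loss-curve-based performance prediction can be as simple as fitting a parametric curve to the early part of the validation curve and extrapolating. The sketch below shows one naive possibility (a power-law fit); it is an assumed illustration, not the specific predictor used in any of the cited papers.

```python
import numpy as np

def extrapolate_learning_curve(epochs, val_errors, target_epoch):
    """Fit a power law err ~ a * epoch**b to the observed (partial) learning
    curve and extrapolate the error at `target_epoch`."""
    b, log_a = np.polyfit(np.log(epochs), np.log(val_errors), deg=1)
    return np.exp(log_a) * target_epoch ** b

# Example: errors observed for the first 5 epochs, predicted error at epoch 50.
# extrapolate_learning_curve([1, 2, 3, 4, 5], [0.9, 0.6, 0.5, 0.45, 0.42], 50)
```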

Fast Neural Architecture Search of Compact Semantic Segmentation Models via Auxiliary Cells [Nekrasov et al. 2019]
Progressive stages
- Stage 1: quick decoder weight adaptation. Fix the backbone output features and train the decoder with a very large batch size.
- Stage 2: progressive early stopping. After each training stage, if the reward is lower than the expectation, stop training this model with probability p (see the sketch below).
Training acceleration
- Weight averaging
- Knowledge distillation
- Auxiliary classifier (cls) or cell
Total search time: 8 GPU-days; the average time to evaluate one architecture is 7 minutes on 1 GPU.
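
The progressive early-stopping rule reduces to a few lines. The function below is a schematic reading of the bullet above; `reward`, `expected_reward`, and `p` are illustrative names, not the paper's exact formulation.

```python
import random

def continue_training(reward, expected_reward, p):
    """Progressive early stopping: if the reward after a stage falls below
    expectation, abandon this architecture with probability p."""
    if reward >= expected_reward:
        return True
    return random.random() >= p   # keep training with probability (1 - p)
```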

One-Shot NAS
DARTS [Liu et al. 2018 1]
- Gradient-based bi-level optimization
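
For reference, DARTS relaxes the categorical choice of operations on each edge with a softmax over architecture parameters, then optimizes a bi-level objective. The equations below restate the formulation of [Liu et al. 2018 1].

```latex
% Continuous relaxation: mixed operation on edge (i, j)
\bar{o}^{(i,j)}(x) = \sum_{o \in \mathcal{O}}
  \frac{\exp\bigl(\alpha_o^{(i,j)}\bigr)}{\sum_{o' \in \mathcal{O}} \exp\bigl(\alpha_{o'}^{(i,j)}\bigr)} \, o(x)

% Bi-level optimization: architecture parameters \alpha on validation data,
% network weights w on training data
\min_{\alpha} \; \mathcal{L}_{\mathrm{val}}\bigl(w^{*}(\alpha), \alpha\bigr)
\quad \text{s.t.} \quad
w^{*}(\alpha) = \arg\min_{w} \; \mathcal{L}_{\mathrm{train}}(w, \alpha)
```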

Fixing DARTS
The softmax relaxation is biased
- Reparameterization trick
- Variational optimization
Gap between search and evaluation
- Vulnerable to co-adaptation
- Proxy task and network differ from the target ones
- Drop-path
- Pruning during training
The supernet consumes too much memory
- Single path
- Modify the search space during training

Better Gradient Estimate?
- Temperature-dependent sigmoid [Noy et al. 2019]
- Gumbel-softmax trick [Xie et al. 2019]: as t -> 0 the bias becomes smaller, but the variance goes to infinity
- REBAR/RELAX: unbiased, low-variance estimators
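
To make the Gumbel-softmax point concrete, the sketch below draws a relaxed one-hot weighting over candidate operations from architecture logits, assuming a PyTorch setting; the names are illustrative (PyTorch also ships torch.nn.functional.gumbel_softmax).

```python
import torch
import torch.nn.functional as F

def gumbel_softmax_op_weights(alpha, temperature):
    """Relaxed one-hot sample over candidate operations.

    alpha: (num_ops,) architecture logits for one edge.
    As temperature -> 0 the sample approaches a hard one-hot choice
    (less bias), but the gradient variance blows up.
    """
    gumbel = -torch.log(-torch.log(torch.rand_like(alpha)))  # Gumbel(0, 1) noise
    return F.softmax((alpha + gumbel) / temperature, dim=-1)

# Usage: weight each candidate op's output by the relaxed sample.
# alpha = torch.zeros(5, requires_grad=True)
# w = gumbel_softmax_op_weights(alpha, temperature=1.0)
# mixed_out = sum(w[i] * op(x) for i, op in enumerate(candidate_ops))
```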

Dealing with the Evaluation Gap
- One-Shot [Bender et al. 2018]: linearly scheduled drop-path
- ASAP [Noy et al. 2019]: pruning during training with a threshold of 0.4/N on connection weights
- PDARTS [Chen et al. 2019 2]: growing the search space on the fly
Learning the architecture parameters is difficult
- Variational optimization or Bayesian network

Single Path Training
Cell search space
- ENAS [Pham et al. 2018 2]: unstable batch feature statistics
Sequential search space
- ProxylessNAS: STE/REINFORCE
- Uniform/DetNAS: uniform sampling of paths; evolutionary search without a proxy task -> direct hardware; limited search space
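
A single-path supernet forward pass is conceptually just sampling one candidate per choice block. The sketch below illustrates uniform path sampling with hypothetical choice blocks; it is a schematic, not the actual code of [Guo et al. 2019].

```python
import random
import torch.nn as nn

class ChoiceBlock(nn.Module):
    """One supernet layer: several candidate ops, only one is used per step."""
    def __init__(self, candidates):
        super().__init__()
        self.candidates = nn.ModuleList(candidates)

    def forward(self, x, choice):
        return self.candidates[choice](x)

class SinglePathSupernet(nn.Module):
    def __init__(self, blocks):
        super().__init__()
        self.blocks = nn.ModuleList(blocks)

    def sample_path(self):
        # Uniform sampling: every candidate in every block is equally likely.
        return [random.randrange(len(b.candidates)) for b in self.blocks]

    def forward(self, x, path=None):
        path = path if path is not None else self.sample_path()
        for block, choice in zip(self.blocks, path):
            x = block(x, choice)
        return x

# Each mini-batch trains the weights of one randomly sampled path; an
# evolutionary search then ranks paths using the shared, trained weights.
```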

https://stanstarks.github.io/tw5/#NAS%20Fact%20Sheet

References
[Anonymous 2018] Single Shot Neural Architecture Search via Direct Sparse Optimization
[Bender et al. 2018] Understanding and Simplifying One-Shot Architecture Search
[Bergstra 2012] Random Search for Hyper-Parameter Optimization, JMLR 2012
[Brock et al. 2017] SMASH: One-Shot Model Architecture Search through HyperNetworks
[Cai et al. 2019] ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware, ICLR 2019
[Chen et al. 2018] Searching for Efficient Multi-Scale Architectures for Dense Image Prediction, NIPS 2018
[Chen et al. 2019] DetNAS: Neural Architecture Search on Object Detection
[Chen et al. 2019 2] Progressive Differentiable Architecture Search: Bridging the Depth Gap between Search and Evaluation
[Elsken et al. 2018] Neural Architecture Search: A Survey, arXiv:1808.05377
[Ghiasi et al. 2019] NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection, CVPR 2019
[Guo et al. 2019] Single Path One-Shot Neural Architecture Search with Uniform Sampling
[Kandasamy et al. 2018] Neural Architecture Search with Bayesian Optimization and Optimal Transport
[Liu et al. 2018 1] DARTS: Differentiable Architecture Search
[Liu et al. 2018 2] Progressive Neural Architecture Search
[Nekrasov et al. 2019] Fast Neural Architecture Search of Compact Semantic Segmentation Models via Auxiliary Cells, CVPR 2019
[Noy et al. 2019] ASAP: Architecture Search, Anneal and Prune
[Pham et al. 2018 1] Faster Discovery of Neural Architectures by Searching for Paths in a Large Model, ICLR 2018
[Pham et al. 2018 2] Efficient Neural Architecture Search via Parameter Sharing
[Real et al. 2018] Regularized Evolution for Image Classifier Architecture Search
[Schulman et al. 2017] Proximal Policy Optimization Algorithms, arXiv:1707.06347
[Wang et al. 2018] Combination of Hyperband and Bayesian Optimization for Hyperparameter Optimization in Deep Learning
[Xie et al. 2019] SNAS: Stochastic Neural Architecture Search, ICLR 2019
[Zoph and Le 2017] Neural Architecture Search with Reinforcement Learning, ICLR 2017
[Zoph et al. 2018] Learning Transferable Architectures for Scalable Image Recognition