1
Recent Advances in Neural Architecture Search
Hao Chen
2
Introduction to NAS
- Automate the design of artificial neural networks
- Manually designing neural networks can be time-consuming and error-prone
- Growing interest in neural architecture search (NAS) for image processing and language modelling
3
NAS Process
- Search Space: defines which architectures can be represented
- Search Strategy: details how to explore the search space
- Performance Estimation Strategy: the process of estimating the performance of a candidate architecture on unseen data
[Elsken et al. 2018]
4
Search Space Design
- Sequential
- Multi-branch
- Cell search space
[Elsken et al. 2018]
5
Sequential Search Space
- ProxylessNAS [Cai et al. 2019]: choose different kernel sizes for MobileNet v2 blocks
- Uniform [Guo et al. 2019] / DetNAS [Chen et al. 2019]: choose different kernel sizes for ShuffleNet v2 blocks
6
Multi-Branch Search Space
- NAS with RL [Zoph and Le 2017]: sequential layers with skip connections learned with RNN self-attention
- DPC [Chen et al. 2018]: 5 branches of conv3x3 with different dilation rates
- NAS-FPN [Ghiasi et al. 2019]
7
Cell Search Space
- NASNet [Zoph et al. 2018]: normal + reduction cells
- 7x speed-up with better performance
- DARTS-like
8
Search Strategies
- Random search: SMASH [Brock et al. 2017] / One-Shot [Bender et al. 2018]
- Evolutionary algorithm: regularized evolution [Real et al. 2018]
- Reinforcement learning: REINFORCE [Zoph and Le 2017], [Pham et al ]; proximal policy optimization (PPO) [Zoph et al. 2018]
- Bayesian optimization: GP with string kernel; Vizier [Chen et al. 2018]; guided ES [Liu et al ]; Hyperband + Bayesian optimization [Wang et al. 2018]; BO and optimal transport [Kandasamy et al. 2018]
- Gradient-based optimization: SMASH [Brock et al. 2017], ENAS [Pham et al ], DARTS [Liu et al ], …
9
Policy Gradient RL
- The architecture is generated sequentially by an RNN controller
- The gradient of the controller is estimated with REINFORCE or PPO
- REINFORCE is unbiased; PPO has smaller variance and, in practice, performs better (see the sketch below)
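A minimal sketch of a REINFORCE-style controller update, assuming a simplified controller (independent softmax logits per decision instead of an RNN) and a placeholder reward standing in for "train the sampled network and measure validation accuracy"; all names here are illustrative, not from the cited papers:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy controller: independent softmax logits for each of L decisions,
# each choosing among K candidate ops (a stand-in for the RNN controller).
L, K = 4, 3
logits = np.zeros((L, K))

def softmax(x):
    z = np.exp(x - x.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)

def sample_architecture(logits):
    probs = softmax(logits)
    return np.array([rng.choice(K, p=probs[i]) for i in range(L)])

def reward(arch):
    # Placeholder reward; toy objective: prefer the last op everywhere.
    return float(np.mean(arch == K - 1))

baseline, lr = 0.0, 0.5
for step in range(200):
    arch = sample_architecture(logits)
    R = reward(arch)
    baseline = 0.9 * baseline + 0.1 * R          # moving-average baseline
    probs = softmax(logits)
    # REINFORCE: ascend (R - baseline) * grad log pi(a) for each decision
    for i, a in enumerate(arch):
        grad_logp = -probs[i]
        grad_logp[a] += 1.0
        logits[i] += lr * (R - baseline) * grad_logp

print(sample_architecture(logits))  # should mostly pick op K-1
```

PPO replaces the plain policy-gradient step with a clipped surrogate objective, which lowers the variance of the updates in practice.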
10
Evolutionary Strategy
- AmoebaNet [Real et al. 2018]: aging evolution to explore the space more; connection/op mutation (see the sketch below)
- DetNAS [Chen et al. 2019]: single-path training of a supernet; samples are single paths of the supernet
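A rough sketch of the aging-evolution loop behind regularized evolution [Real et al. 2018], with a toy op mutation and a placeholder fitness function standing in for training and validation; population size, sample size, and the op list are illustrative:

```python
import random
from collections import deque

random.seed(0)
OPS = ["conv3x3", "conv5x5", "sep3x3", "maxpool", "identity"]

def random_arch(n_layers=6):
    return [random.choice(OPS) for _ in range(n_layers)]

def mutate(arch):
    # Op mutation: resample the op at one random position
    # (the slide also mentions connection mutation, omitted here).
    child = list(arch)
    child[random.randrange(len(child))] = random.choice(OPS)
    return child

def fitness(arch):
    # Placeholder for training + validation accuracy; toy: count of "sep3x3".
    return arch.count("sep3x3")

def aging_evolution(pop_size=20, sample_size=5, cycles=200):
    population = deque()
    history = []
    for _ in range(pop_size):
        arch = random_arch()
        population.append((arch, fitness(arch)))
    for _ in range(cycles):
        candidates = random.sample(list(population), sample_size)
        parent = max(candidates, key=lambda t: t[1])
        child = mutate(parent[0])
        population.append((child, fitness(child)))
        history.append(population[-1])
        population.popleft()   # "aging": discard the oldest member, not the worst
    return max(history, key=lambda t: t[1])

print(aging_evolution())
```

Removing the oldest rather than the worst individual is what distinguishes aging evolution from plain tournament selection: good architectures survive only by being rediscovered, which regularizes the search.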
11
Comparison of the strategies
In the field of hyperparameter optimization and AutoML, a cleverly designed random search algorithm can be very reliable [Bergstra 2012]. According to the case studies comparing EA, RL and BO, there is no guarantee that one search strategy is strictly better than the others [Liu et al , Kandasamy et al. 2018]. However, they are all consistently better than random search [Real et al. 2018].
12
Computation limitation
- The original NAS paper: 800 GPUs x 28 days on 32x32 CIFAR-10 images
- With a cell search space and PPO, reduced to 2000 GPU-days
13
Speed-up Evaluation
Proxy task
- Smaller network: thinner & shallower (for cell structures or decoder search)
- Fewer iterations: 10 -> 50 epochs on COCO
- Lower resolution: CIFAR -> ImageNet
- Caching fixed-structure features
Performance prediction
- Loss curve interpolation (see the sketch below)
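One common way to predict performance from the loss curve is to fit a simple parametric learning-curve model to the first few epochs and extrapolate to the final epoch; this is a minimal sketch under that assumption (the power-law model, the simulated curve, and the 20-epoch cutoff are illustrative, not from the slides):

```python
import numpy as np
from scipy.optimize import curve_fit

# Simulated validation-loss curve over 100 epochs (stand-in for real training logs).
epochs = np.arange(1, 101)
true_loss = 0.3 + 1.5 * epochs ** -0.6
observed = true_loss + np.random.default_rng(0).normal(0, 0.01, size=epochs.size)

def power_law(t, a, b, c):
    # Simple parametric learning-curve model: loss(t) = a + b * t^(-c)
    return a + b * np.power(t, -c)

# Fit on the first 20 epochs only, then extrapolate to epoch 100.
cut = 20
params, _ = curve_fit(power_law, epochs[:cut], observed[:cut], p0=(0.1, 1.0, 0.5))
predicted_final = power_law(100, *params)
print(f"predicted loss @100: {predicted_final:.3f}, actual: {observed[-1]:.3f}")
```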
14
Fast Neural Architecture Search of Compact Semantic Segmentation Models via Auxiliary Cells
Progressive stages
- Stage 1: quick decoder weight adaptation (fix the backbone output features and train the decoder with a very large batch size)
- Stage 2: progressive early stopping (after each training stage, if the reward is lower than expected, stop training this model with probability p; see the sketch below)
Training acceleration
- Weight averaging
- Knowledge distillation
- Auxiliary classifier (cls) or cell
Total search time: 8 GPU-days; the average time to evaluate one architecture is 7 min on 1 GPU
[Nekrasov et al. CVPR19]
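A minimal sketch of the progressive early-stopping rule described above: abandon a candidate with probability p after any stage whose reward falls short of the expected reward. The function names, the expected-reward list, and the value of p are illustrative, not from the paper:

```python
import random

random.seed(0)

def progressive_early_stop(train_one_stage, expected_reward_per_stage, p=0.5):
    """Train an architecture stage by stage; after each stage, if the reward
    falls below the expectation for that stage, stop with probability p."""
    rewards = []
    for stage, expected in enumerate(expected_reward_per_stage):
        reward = train_one_stage(stage)
        rewards.append(reward)
        if reward < expected and random.random() < p:
            return rewards, "stopped early"
    return rewards, "fully trained"

# Toy usage: a candidate whose reward plateaus below expectation.
fake_rewards = [0.35, 0.40, 0.41]
result = progressive_early_stop(lambda s: fake_rewards[s],
                                expected_reward_per_stage=[0.30, 0.45, 0.55])
print(result)
```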
15
One-Shot NAS: DARTS [Liu et al. 2018]
Gradient-based bi-level optimization of the architecture parameters and the network weights (see the sketch below)
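A compact sketch of the DARTS-style continuous relaxation and the alternating first-order bi-level updates: architecture parameters alpha are updated on validation data, network weights on training data. The candidate op set, the single-edge model, and the toy regression loss are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """Continuous relaxation of one edge: softmax-weighted sum of candidate ops."""
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Conv2d(channels, channels, 5, padding=2),
            nn.Identity(),
        ])
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))  # architecture params

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

op = MixedOp(channels=8)
w_params = [p for n, p in op.named_parameters() if n != "alpha"]
opt_w = torch.optim.SGD(w_params, lr=0.01)
opt_a = torch.optim.Adam([op.alpha], lr=3e-3)

def loss_on(batch):
    x, y = batch
    return F.mse_loss(op(x), y)

train_batch = (torch.randn(4, 8, 16, 16), torch.randn(4, 8, 16, 16))
val_batch = (torch.randn(4, 8, 16, 16), torch.randn(4, 8, 16, 16))

for step in range(10):
    # Step 1: update architecture parameters alpha on validation data
    opt_a.zero_grad(); loss_on(val_batch).backward(); opt_a.step()
    # Step 2: update network weights w on training data
    opt_w.zero_grad(); loss_on(train_batch).backward(); opt_w.step()

print(F.softmax(op.alpha, dim=0))  # relaxed architecture after search
```

After search, the discrete architecture is recovered by keeping the highest-weighted op per edge, which is exactly where the gap between search and evaluation discussed on the next slide comes from.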
16
Fixing DARTS
- The softmax relaxation is biased: reparameterization trick; variational optimization
- Gap between search and evaluation (vulnerable to co-adaptation; proxy task and network differ from the main one): droppath; pruning during training
- The supernet consumes too much memory: single path; modify the search space during training
17
Better Gradient Estimate?
- Temperature-dependent sigmoid [Noy et al. 2019]
- Gumbel-softmax trick [Xie et al. 2019]: as t -> 0, the bias becomes smaller but the variance goes to infinity (see the sketch below)
- REBAR/RELAX: unbiased, low-variance estimators
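A minimal illustration of the Gumbel-softmax (Concrete) relaxation used by SNAS-style methods: sampling is reparameterized as softmax((logits + Gumbel noise) / t), so gradients flow to the logits; lowering t sharpens samples toward one-hot (less bias) at the cost of noisier gradients. The 5-op setting is illustrative; PyTorch also ships this as torch.nn.functional.gumbel_softmax.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.zeros(5, requires_grad=True)  # architecture logits for 5 candidate ops

def gumbel_softmax_sample(logits, temperature):
    # Reparameterized sample from the relaxed categorical (Concrete) distribution.
    gumbels = -torch.log(-torch.log(torch.rand_like(logits)))
    return F.softmax((logits + gumbels) / temperature, dim=-1)

# Higher temperature: soft mixtures; lower temperature: near one-hot samples.
for t in (5.0, 1.0, 0.1):
    y = gumbel_softmax_sample(logits, t)
    print(f"t={t}: {y.detach().numpy().round(3)}")
```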
18
Dealing with Evaluation Gap
- One-Shot [Bender et al. 2018]: linearly scheduled droppath (see the sketch below)
- ASAP [Noy et al. 2019]: pruning during training with a threshold of 0.4/N on connection weights
- PDARTS [Chen et al ]: growing the search space on the fly
- Architecture parameter learning is difficult: variational optimization or Bayesian networks
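One plausible reading of linearly scheduled droppath, sketched below: the per-op drop rate ramps up linearly over training so the one-shot model gradually learns to tolerate missing paths. The schedule shape, the final rate, and the keep-at-least-one-path rule are assumptions for illustration, not details taken from [Bender et al. 2018]:

```python
import torch

def drop_rate(step, total_steps, final_rate=0.9):
    # Linear schedule: start with (almost) no path dropping, ramp up to final_rate.
    return final_rate * step / total_steps

def drop_paths(op_outputs, rate):
    """Randomly zero whole candidate-op outputs, keeping at least one path alive."""
    keep = torch.rand(len(op_outputs)) >= rate
    if not keep.any():
        keep[torch.randint(len(op_outputs), (1,))] = True
    kept = [o for o, k in zip(op_outputs, keep) if k]
    return sum(kept) / len(kept)

# Toy usage: three candidate-op outputs on one edge, halfway through training.
outs = [torch.randn(2, 4), torch.randn(2, 4), torch.randn(2, 4)]
print(drop_paths(outs, drop_rate(step=5000, total_steps=10000)).shape)
```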
19
Single Path Training
Cell search space
- ENAS [Pham et al ]: unstable batch feature statistics
Sequential search space
- ProxylessNAS: STE/REINFORCE
- Uniform/DetNAS: uniform sampling of paths; evolutionary search without a proxy task, directly on the target hardware (see the sketch below)
- Limited search space
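A minimal sketch of single-path supernet training with uniform path sampling, in the spirit of the Uniform/DetNAS recipe: at each step one candidate op per block is sampled and only that path is trained with the shared weights. The block definition, op choices, and loss are placeholders:

```python
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

random.seed(0); torch.manual_seed(0)

class SearchableBlock(nn.Module):
    """One supernet block holding several candidate ops; one is active per step."""
    def __init__(self, channels):
        super().__init__()
        self.choices = nn.ModuleList([
            nn.Conv2d(channels, channels, k, padding=k // 2) for k in (3, 5, 7)
        ])

    def forward(self, x, choice):
        return F.relu(self.choices[choice](x))

blocks = nn.ModuleList([SearchableBlock(8) for _ in range(4)])
opt = torch.optim.SGD(blocks.parameters(), lr=0.01)

for step in range(20):
    # Uniform single-path sampling: pick one op per block, train only that path.
    path = [random.randrange(3) for _ in blocks]
    x = torch.randn(4, 8, 16, 16)
    out = x
    for block, choice in zip(blocks, path):
        out = block(out, choice)
    loss = out.pow(2).mean()           # placeholder training loss
    opt.zero_grad(); loss.backward(); opt.step()

# After supernet training, candidate paths can be ranked (e.g., by evolutionary
# search) using the shared weights, without retraining each one from scratch.
```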
20
https://stanstarks.github.io/tw5/#NAS%20Fact%20Sheet
21
[Anonymous 2018] Single Shot Neural Architecture Search Via Direct Sparse Optimization
[Bender et al. 2018] Understanding and Simplifying One-Shot Architecture Search
[Bergstra 2012] Random Search for Hyper-Parameter Optimization, JMLR 2012
[Brock et al. 2017] SMASH: One-Shot Model Architecture Search through HyperNetworks
[Cai et al. 2019] ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware, ICLR 2019
[Chen et al. 2018] Searching for Efficient Multi-Scale Architectures for Dense Image Prediction, NIPS 2018
[Chen et al. 2019] DetNAS: Neural Architecture Search on Object Detection
[Chen et al ] Progressive Differentiable Architecture Search: Bridging the Depth Gap between Search and Evaluation
[Elsken et al. 2018] Neural Architecture Search: A Survey, arXiv preprint, 2018
[Ghiasi et al. 2019] NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection, CVPR 2019
[Guo et al. 2019] Single Path One-Shot Neural Architecture Search with Uniform Sampling
[Kandasamy et al. 2018] Neural Architecture Search with Bayesian Optimization and Optimal Transport
[Liu et al ] DARTS: Differentiable Architecture Search
[Liu et al ] Progressive Neural Architecture Search
[Nekrasov et al. 2018] Fast Neural Architecture Search of Compact Semantic Segmentation Models via Auxiliary Cells, CVPR 2019
[Noy et al. 2019] ASAP: Architecture Search, Anneal and Prune
[Pham et al ] Faster Discovery of Neural Architectures by Searching for Paths in a Large Model, ICLR 2018
[Pham et al ] Efficient Neural Architecture Search via Parameter Sharing
[Real et al. 2018] Regularized Evolution for Image Classifier Architecture Search
[Schulman et al. 2017] Proximal Policy Optimization Algorithms, arXiv preprint, 2017
[Wang et al. 2018] Combination of Hyperband and Bayesian Optimization for Hyperparameter Optimization in Deep Learning
[Xie et al. 2019] SNAS: Stochastic Neural Architecture Search, ICLR 2019
[Zoph and Le 2017] Neural Architecture Search with Reinforcement Learning, ICLR 2017
[Zoph et al. 2018] Learning Transferable Architectures for Scalable Image Recognition, CVPR 2018