Compression of CNNs
Mooyeol Baek
Papers covered:
- Xiangyu Zhang, Jianhua Zou, Xiang Ming, Kaiming He, Jian Sun: Efficient and Accurate Approximations of Nonlinear Convolutional Networks.
- Yong-Deok Kim, Eunhyeok Park, Sungjoo Yoo, Taelim Choi, Lu Yang, Dongjun Shin: Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications.
Motivation
It is practically important to accelerate the test-time computation of CNNs. CNN filters can be approximately decomposed into a series of smaller filters by low-rank approximation.
Approaches
[Figure: an m x m x c input map is convolved into an n x n x d output. Zhang et al. replace the k x k x c x d convolution with a k x k x c x d' convolution followed by a 1 x 1 x d' x d convolution; Kim et al. replace it with a 1 x 1 x c x c' convolution, a k x k x c' x d' convolution, and a 1 x 1 x d' x d convolution.]
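A sketch of the two resulting layer structures in PyTorch (the channel counts are illustrative, not taken from either paper):

```python
import torch.nn as nn

k, c, d = 3, 256, 256        # original layer: d filters of size k x k x c
d_prime, c_prime = 64, 64    # example ranks

original = nn.Conv2d(c, d, k, padding=k // 2)

# Zhang et al.: k x k convolution with d' filters, then 1 x 1 back to d channels.
zhang = nn.Sequential(
    nn.Conv2d(c, d_prime, k, padding=k // 2),
    nn.Conv2d(d_prime, d, 1),
)

# Kim et al. (Tucker-2): 1 x 1 down to c' channels, k x k core, 1 x 1 up to d.
kim = nn.Sequential(
    nn.Conv2d(c, c_prime, 1),
    nn.Conv2d(c_prime, d_prime, k, padding=k // 2),
    nn.Conv2d(d_prime, d, 1),
)
```

Both replacements trade one large convolution for a sequence of cheaper ones; the speedup depends on how small the ranks d' (and c') can be made without hurting accuracy.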
Efficient and Accurate Approximations of Nonlinear Convolutional Networks. Xiangyu Zhang, Jianhua Zou, Xiang Ming, Kaiming He, Jian Sun
Contribution
- Low-rank approximation that minimizes the reconstruction error of the nonlinear responses.
- Asymmetric reconstruction that reduces the accumulated error of multiple approximated layers.
- An empirical PCA-energy measure for selecting a proper rank per layer.
Low-rank Approximation
[Figure: the k x k x c x d convolution on an m x m x c input is replaced by a k x k x c x d' convolution followed by a 1 x 1 x d' x d convolution, producing the same n x n x d output.]
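The slide's equations were lost; a reconstruction of the formulation, with notation following the paper ($r(\cdot)$ is the ReLU nonlinearity and $y_i = W x_i$ are the layer's responses on sample inputs):

$$\min_{M,\,b} \sum_i \big\| r(y_i) - r(M y_i + b) \big\|_2^2 \quad \text{s.t.} \quad \operatorname{rank}(M) \le d',$$

where $W \in \mathbb{R}^{d \times (k^2 c)}$ holds the original filters. Factoring $M = P Q^\top$ with $P, Q \in \mathbb{R}^{d \times d'}$ yields the two-layer structure above: $d'$ filters $Q^\top W$ of size $k \times k \times c$, followed by $d$ filters of size $1 \times 1 \times d'$ given by $P$ (plus the bias $b$).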
Low-rank Approximation: Relaxation
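The rank-constrained nonlinear problem is hard to optimize directly, so the paper relaxes it with auxiliary variables $z_i$:

$$\min_{M,\,b,\,\{z_i\}} \sum_i \| r(y_i) - r(z_i) \|_2^2 + \lambda \| z_i - (M y_i + b) \|_2^2.$$

Alternating minimization applies: with $\{z_i\}$ fixed, finding $(M, b)$ is a rank-constrained least-squares problem solvable via (generalized) SVD; with $(M, b)$ fixed, each entry of each $z_i$ has a simple closed-form solution. A large $\lambda$ pushes the relaxation back toward the original problem.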
Asymmetric Reconstruction
Uses the non-approximated responses of the original network as reconstruction targets to reduce the accumulated error of multiple approximated layers. [Figure: original vs. approximated computation paths.]
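Concretely, writing $x_i$ for a layer input computed by the original network and $\hat{x}_i$ for the input produced by the previously approximated layers, the asymmetric objective keeps the exact responses as targets while running the approximation on the input actually available at test time:

$$\min_{M,\,b} \sum_i \big\| r(W x_i) - r(M W \hat{x}_i + b) \big\|_2^2,$$

so the error introduced by earlier layers is compensated rather than compounded.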
Rank Selection
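The slide's figure is lost; the criterion, as described in the paper: the PCA energy of a layer's responses, $E_l(d'_l) = \sum_{a=1}^{d'_l} \sigma_{l,a}$ with $\sigma_{l,a}$ the eigenvalues of the response covariance in decreasing order, is empirically a good proxy for the accuracy of the approximated model. Ranks are assigned across layers to maximize the product of the kept energies under a whole-model complexity budget $C$,

$$\max_{\{d'_l\}} \prod_l \frac{E_l(d'_l)}{E_l(d_l)} \quad \text{s.t.} \quad \sum_l C_l(d'_l) \le C,$$

solved greedily by repeatedly lowering the rank whose reduction costs the least energy.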
Experiments [1]: Linear vs. Nonlinear Reconstruction
Experiments [2]: Symmetric vs. Asymmetric Reconstruction
Experiments [3]: Rank Selection
Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications. Yong-Deok Kim, Eunhyeok Park, Sungjoo Yoo, Taelim Choi, Lu Yang, Dongjun Shin
Contribution
A one-shot whole-network compression scheme consisting of three simple steps:
1. Rank selection (variational Bayesian matrix factorization, VBMF)
2. Low-rank tensor decomposition (Tucker decomposition)
3. Fine-tuning
Tensor Decomposition: Tucker decomposition
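For a kernel tensor $\mathcal{K} \in \mathbb{R}^{d \times c \times k \times k}$, the paper applies Tucker decomposition only along the input- and output-channel modes (Tucker-2), leaving the small spatial modes intact:

$$\mathcal{K}_{t,s,i,j} \approx \sum_{t'=1}^{d'} \sum_{s'=1}^{c'} \mathcal{C}_{t',s',i,j}\, U^{(s)}_{s,s'}\, U^{(t)}_{t,t'},$$

which maps directly onto three layers: a $1 \times 1$ convolution ($U^{(s)}$) reducing $c$ channels to $c'$, the $k \times k$ core convolution ($\mathcal{C}$), and a $1 \times 1$ convolution ($U^{(t)}$) restoring $d$ channels. The ranks $c'$ and $d'$ come from the VBMF step.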
Tensor Decomposition
[Figure: comparison of the two schemes, as on the earlier Approaches slide: Zhang et al.'s two-layer decomposition vs. Kim et al.'s three-layer Tucker-2 decomposition.]
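A minimal numpy sketch of Tucker-2 via truncated HOSVD (the paper computes the decomposition with VBMF-selected ranks; the helper name and the ranks below are illustrative):

```python
import numpy as np

def tucker2(K, c_rank, d_rank):
    """K: kernel of shape (d, c, k, k). Returns (core, U_c, U_d) with
    core (d_rank, c_rank, k, k), U_c (c, c_rank), U_d (d, d_rank)."""
    d, c, kh, kw = K.shape
    # Leading left singular vectors of the mode-c unfolding (c, d*k*k).
    U_c = np.linalg.svd(K.transpose(1, 0, 2, 3).reshape(c, -1),
                        full_matrices=False)[0][:, :c_rank]
    # Leading left singular vectors of the mode-d unfolding (d, c*k*k).
    U_d = np.linalg.svd(K.reshape(d, -1),
                        full_matrices=False)[0][:, :d_rank]
    # Core tensor: K contracted with U_c and U_d along the channel modes.
    core = np.einsum('dckl,cs,dt->tskl', K, U_c, U_d)
    return core, U_c, U_d

K = np.random.randn(256, 256, 3, 3)          # (d, c, k, k)
core, U_c, U_d = tucker2(K, 64, 64)
K_hat = np.einsum('tskl,cs,dt->dckl', core, U_c, U_d)
print(np.linalg.norm(K - K_hat) / np.linalg.norm(K))  # relative error
```

The three factors become the three convolutions of the previous slide: U_c as the first 1 x 1 layer, core as the k x k layer, and U_d as the final 1 x 1 layer.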
Fine-tuning
The whole decomposed network is fine-tuned end-to-end with standard back-propagation to recover the accuracy lost by the one-shot decomposition.
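A minimal sketch of this step, assuming a PyTorch model whose convolutions have already been replaced by their Tucker factors (the optimizer settings and the helper name are illustrative, not from the paper):

```python
import torch
import torch.nn as nn

def fine_tune(model, loader, epochs=1, lr=1e-3):
    # Whole-network fine-tuning with plain SGD to recover the accuracy
    # dropped by the one-shot decomposition.
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
```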
Experiments [1]
Experiments [2]