Learning Deep L0 Encoders
Qing Ling
Department of Automation, University of Science and Technology of China (USTC)
Joint work with Zhangyang Wang and Thomas Huang (UIUC)
The 2016 AAAI Conference on Artificial Intelligence (AAAI 2016)
The 2015 Youth Symposium of Scientific and Engineering Computation (YSSEC 2015)
2015/12/11
Starter: A Joke about Deep Learning
The way to do machine learning research 5 years ago:
Collect Data → Analyze Data → Design Feature → Build Model → Verify Model → Optimize Model → Evaluate Model
The way to do machine learning research now:
Collect Data → Tune Network → Collect Data → Tune Network → Collect Data → Tune Network → ...
Theme of This Talk
Behind the success of deep learning are difficulties in:
- Structure design, network initialization & parameter tuning
- Incorporation of problem-level priors & interpretation
From engineering (or art) to science:
- Statistical bounds
- Convergence analysis
- Bridging deep (big data) & shallow (small data) models
Our goal: a connection between deep learning & sparse coding
Outline
- A brief introduction to deep learning
- Connection between deep learning & sparse coding
- Deep L0-regularized encoder
- Deep M-sparse L0 encoder
- Numerical experiments
- Conclusions
Learning Deep Representations/Features
Example: a feed-forward network
- Train with BIG input & output data
- Inference maps input to output
In the training stage:
- Learn nonlinear features F_i (a linear weight followed by a nonlinear neuron)
- Optimize with stochastic (sub)gradient methods
In the inference stage:
- Transform the input with the learned features
- Fast end-to-end inference
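To make the layer structure concrete, here is a minimal sketch of such a feed-forward transform; the ReLU neuron and the random toy weights are illustrative assumptions, not the network used in this talk.

```python
import numpy as np

def relu(z):
    # A common nonlinear neuron; the choice of neuron here is an assumption.
    return np.maximum(z, 0.0)

def feedforward(x, weights, biases):
    # Each learned feature F_i is a linear weight followed by a nonlinear neuron.
    a = x
    for W, b in zip(weights, biases):
        a = relu(W @ a + b)
    return a

# Toy usage: random weights stand in for parameters learned by stochastic (sub)gradient.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((8, 4)), rng.standard_normal((3, 8))]
biases = [np.zeros(8), np.zeros(3)]
print(feedforward(rng.standard_normal(4), weights, biases))
```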
Power of Being Deep: Example of ILSVRC
[Figure: ILSVRC results; human error is 5.1%; the gains come from data, algorithms & systems]
Sparse Coding Revisited
Training in sparse coding:
- Given (X, Y), where Y is a sparse representation of X
- Learn a dictionary D such that X ≈ DY by some approach
Inference in sparse coding:
- Y = argmin_Y ||X - DY||_2^2 + r(Y)
- The regularizer r(Y) enforces sparsity of Y
Viewing sparse coding from the perspective of deep learning:
- Training & inference are done over different architectures
- Inference relies on an iterative algorithm, which is often slow
- Not end-to-end (classification, etc.)
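As a concrete (and typically slow) instance of such iterative inference, here is a minimal ISTA sketch for the common l1 choice r(Y) = lam * ||Y||_1; the step-size rule and the iteration count are illustrative assumptions.

```python
import numpy as np

def soft_threshold(z, tau):
    # Proximal operator of tau * ||.||_1.
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def ista(X, D, lam, n_iter=100):
    # Iterative soft thresholding for min_Y ||X - D Y||_2^2 + lam * ||Y||_1.
    t = 1.0 / (2.0 * np.linalg.norm(D, 2) ** 2)    # step size from the Lipschitz constant
    Y = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        grad = 2.0 * D.T @ (D @ Y - X)             # gradient of the quadratic data-fit term
        Y = soft_threshold(Y - t * grad, t * lam)  # proximal (shrinkage) step
    return Y
```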
Connect Sparse Coding & Deep Learning
Idea: truncate the iterative inference algorithm & unfold it into a network
- Training & inference are done in the same architecture
- Fast & end-to-end inference (add a new operator/neuron)
[Diagram: one iteration as a block of operators O1, O2, O3 mapping X to Y]
Example: the algorithm unfolded & truncated up to the second iteration
[Diagram: the unfolded network O1, O3, O2, O3, O2, O3 mapping X to Y]
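A minimal sketch of such an unfolded forward pass, with generic operators as an assumption: Wd plays the role of O1, W of O2, and nonlin of O3; in the actual network these weights would be trained end-to-end by back-propagation.

```python
import numpy as np

def truncated_encoder(X, Wd, W, nonlin, n_layers=2):
    # Forward pass of an iterative algorithm unfolded & truncated to a fixed depth.
    B = Wd @ X                   # input transform (O1), computed once and fed to every layer
    Y = nonlin(B)                # output of the first "iteration" (O3)
    for _ in range(n_layers - 1):
        Y = nonlin(B + W @ Y)    # each further layer replays one iteration: O2 then O3
    return Y
```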
Case Study: Deep L0-Regularized Encoder
L0-regularized least squares: Y = argmin_Y ||X - DY||_2^2 + c^2 ||Y||_0
Iterative hard thresholding (IHT): Y^{k+1} = h_c(D^T X + (I - D^T D) Y^k) = h_c(D^T X + W Y^k)
[Diagram: one IHT iteration as a block with weights D^T and W = I - D^T D and neuron h_c]
Trained as a deep network; fast & end-to-end inference
[Diagram: the unfolded network D^T, h_c, W, h_c, W, h_c mapping X to Y]
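A minimal numerical sketch of the IHT recursion above, assuming D has roughly unit spectral norm; in the deep encoder, D^T and W = I - D^T D become trainable layer weights instead of fixed matrices.

```python
import numpy as np

def hard_threshold(z, c):
    # h_c: keep entries with magnitude above c, zero out the rest.
    return z * (np.abs(z) > c)

def iht_encoder(X, D, c, n_iter=50):
    # Iterative hard thresholding for min_Y ||X - D Y||_2^2 + c^2 ||Y||_0
    # (assumes the spectral norm of D is at most about 1, as in standard IHT).
    W = np.eye(D.shape[1]) - D.T @ D      # recurrent weight W = I - D^T D
    B = D.T @ X                           # input weight D^T, applied once
    Y = hard_threshold(B, c)
    for _ in range(n_iter):
        Y = hard_threshold(B + W @ Y, c)  # Y^{k+1} = h_c(D^T X + W Y^k)
    return Y
```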
HELU: A New Nonlinear Neuron
h_c: tolerates large values, strongly penalizes small values
HELU, compared with the logistic, sigmoid & ReLU neurons, is discontinuous & hard to train with stochastic (sub)gradients
[Figure: activation curves of HELU and HELU_d]
HELU_d: close to HELU as d goes to 0; d is adjusted dynamically during training
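A sketch of both activations; the exact ramp form of HELU_d below (zero up to c - d, a linear ramp on (c - d, c), identity beyond c) is an assumed smoothing consistent with the slide, not necessarily the paper's exact parameterization.

```python
import numpy as np

def helu(z, c=1.0):
    # HELU: pass values with |z| > c through unchanged, zero out small values.
    return z * (np.abs(z) > c)

def helu_d(z, c=1.0, d=0.1):
    # Assumed continuous surrogate: 0 for |z| <= c - d, identity for |z| >= c,
    # and a linear ramp in between; it approaches HELU as d -> 0.
    a = np.abs(z)
    ramp = np.clip((a - (c - d)) / d, 0.0, 1.0) * c
    return np.where(a >= c, z, np.sign(z) * ramp)
```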
Case Study: Deep M-Sparse L0 Encoder
M-sparse constrained least squares: Y = argmin_Y ||X - DY||_2^2, s.t. ||Y||_0 ≤ M
Projected gradient descent (PGD): Y^{k+1} = p_M(D^T X + (I - D^T D) Y^k) = p_M(D^T X + W Y^k)
[Diagram: one PGD iteration as a block with weights D^T and W = I - D^T D and operator p_M]
Training & inference are similar to those of the deep L0-regularized encoder
[Diagram: the unfolded network D^T, p_M, W, p_M, W, p_M mapping X to Y]
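A minimal sketch of the PGD recursion, assuming column-wise projection and a D with roughly unit spectral norm; as before, the deep encoder would learn D^T and W = I - D^T D end-to-end rather than keep them fixed.

```python
import numpy as np

def project_M(Z, M):
    # p_M: per column, keep the M entries with the largest magnitudes, zero the rest.
    out = np.zeros_like(Z)
    idx = np.argsort(-np.abs(Z), axis=0)[:M]
    np.put_along_axis(out, idx, np.take_along_axis(Z, idx, axis=0), axis=0)
    return out

def m_sparse_encoder(X, D, M, n_iter=50):
    # Projected gradient for min_Y ||X - D Y||_2^2  s.t.  ||Y||_0 <= M (per column).
    W = np.eye(D.shape[1]) - D.T @ D
    B = D.T @ X
    Y = project_M(B, M)
    for _ in range(n_iter):
        Y = project_M(B + W @ Y, M)   # Y^{k+1} = p_M(D^T X + W Y^k)
    return Y
```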
Interpreting Max-M Pooling/Unpooling
p_M: keeps the coefficients with the top-M largest absolute values
This is exactly the well-known max-M pooling/unpooling operator
Explains its success in deep learning: it produces a sparse representation
[Figure: max-2 pooling followed by unpooling]
Comparing max-M pooling/unpooling & HELU:
- Different sparsification approaches
- The sparsity level is either fixed exactly (max-M) or learned from training samples (HELU)
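A 1-D toy sketch of the pooling/unpooling pair: pooling records the top-M values and their positions, unpooling scatters them back, so the composition is exactly p_M.

```python
import numpy as np

def max_m_pool(z, M):
    # Max-M pooling: return the M largest-magnitude values and their positions.
    idx = np.argsort(-np.abs(z))[:M]
    return z[idx], idx

def max_m_unpool(values, idx, length):
    # Unpooling: scatter the pooled values back to their original positions.
    out = np.zeros(length)
    out[idx] = values
    return out

z = np.array([0.3, -2.0, 0.1, 1.5, -0.2])
vals, idx = max_m_pool(z, 2)
print(max_m_unpool(vals, idx, len(z)))   # [ 0.  -2.   0.   1.5  0. ]  -- the 2-sparse p_M(z)
```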
Implementation Issues
Use (small-scale) sparse coding to initialize deep learning:
- Simplifies the initialization of deep learning
- Makes sparse coding scalable
Training & test data follow the same distribution: no magic here
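Under the IHT/PGD parameterization above, this initialization is a minimal mapping from a dictionary D, learned by small-scale sparse coding, to the encoder's starting weights (a sketch, assuming that parameterization).

```python
import numpy as np

def init_from_dictionary(D):
    # Initialize the unfolded encoder from a dictionary learned by small-scale sparse coding:
    # input weight Wd = D^T and recurrent weight W = I - D^T D, then fine-tune end-to-end.
    Wd = D.T
    W = np.eye(D.shape[1]) - D.T @ D
    return Wd, W
```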
Numerical Experiments
MNIST dataset: 60,000 samples for training & 10,000 for testing
The proposed encoders outperform iterative sparse coding & existing deep networks
- The L0-regularized encoder learns the regularization parameter
- The M-sparse L0 encoder incorporates the prior on the sparsity level M
Concluding Remarks
Bridging deep learning & sparse coding:
- Explains & exploits the structure design of deep learning
- Incorporates problem-level priors & interprets neurons
- Gives an effective initialization strategy for deep learning
Toward a general coding scheme that includes sparse & non-sparse models?
- Deep L1 encoder (LeCun et al., 2012)
- Learning sparse & low-rank models (Sprechmann et al., 2015)
- Laplacian regularization (Wang et al., 2015)
- Design & explanation of more general encoders
Thank you for your attention
Qing Ling
Department of Automation, University of Science and Technology of China
http://home.ustc.edu.cn/~qingling
qingling@mail.ustc.edu.cn