Deep and Sparse Learning in Speech and Language Processing: An Overview Dong Wang*, Qiang Zhou*, Amir Hussain+ *CSLT, Tsinghua University +University of Stirling
Contents Deep learning and sparse learning Sparse deep learning Deep sparse learning Applications in speech & language processing Conclusions
Sparse hypothesis Information represented by sparse codes [Northoff 2014] Encodes differences Excitation-inhibition balance General across the brain Sparse coding and long-term memory https://www.youtube.com/watch?v=Jy7JmG3eMnw
Hierarchical hypothesis Abstraction layer-by-layer Ubiquitous in brain N. Kruger, P. Janssen, S. Kalkan, M. Lappe, A. Leonardis, J. Piater, A. J. Rodriguez-Sanchez, L. Wiskott, "Deep Hierarchies in the Primate Visual Cortex: What Can We Learn for Computer Vision?", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 35, no. 8, pp. 1847-1871, Aug. 2013
Sparse models in machine learning Discover prominent patterns Easy to interpret Robust against noise F = f(x; w) s.t. sparsity constraints Lasso Sparse discriminant analysis Sparse SVM ...
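As a concrete instance of such a sparsity-constrained objective, the Lasso can be solved by iterating a gradient step with the L1 proximal (soft-thresholding) operator, i.e. ISTA. A minimal NumPy sketch; all dimensions and the penalty weight `lam` are illustrative:

```python
import numpy as np

def soft_threshold(w, lam):
    """Proximal operator of the L1 norm: shrinks values toward zero and
    sets small ones exactly to zero -- the source of sparsity."""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

def ista_lasso(X, y, lam=0.1, n_iter=500):
    """Solve min_w 0.5*||Xw - y||^2 + lam*||w||_1 by ISTA."""
    L = np.linalg.norm(X, 2) ** 2          # Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)
        w = soft_threshold(w - grad / L, lam / L)
    return w

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 20))
w_true = np.zeros(20)
w_true[:3] = [2.0, -1.5, 1.0]              # sparse ground truth
y = X @ w_true
w_hat = ista_lasso(X, y, lam=0.5)          # recovers a sparse solution
```

Most coefficients of `w_hat` are driven to exactly zero, which is what makes the recovered pattern easy to interpret.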
Hierarchical models in machine learning Probabilistic hierarchical models Hierarchical LDA Deep Boltzmann machines Layer-wise Bayesian networks Neural hierarchical models Deep neural networks All are called deep models! Very impressive results by deep learning David M. Blei et al., JMLR, 2003
Marry sparse and deep models? Sparse in representation and deep in structure Deep structure is for better representation Physiologically feasible*, but not much explored in machine learning This is the focus of our overview Honglak Lee et al., "Unsupervised Learning of Hierarchical Representations with Convolutional Deep Belief Networks", ICML 2009. * Peter Kloppenburg, Martin Paul Nawrot, "Neural Coding: Sparse but On Time", Current Biology, Volume 24, Issue 19, pp. R957-R959, 6 October 2014
Two marriage approaches Sparse deep models: deep models with sparsity ingredients Deep sparse models: stacked sparse models
Contents Deep learning and sparse learning Sparse deep learning Deep sparse learning Applications in speech & language processing Conclusions
Sparse deep learning Involving sparsity constraints in deep models Unit sparsity Weight sparsity Gradient sparsity
Sparse deep learning: Unit sparsity (1) Various regularizations on activations L0, L1, L1/L2 Ranzato, M., Boureau, Y.L., LeCun, Y.: Sparse feature learning for deep belief networks. NIPS'08
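A minimal sketch of the idea, assuming a single sigmoid hidden layer with illustrative dimensions: the L1 norm of the hidden activations is added to the training loss, and its (sub)gradient is backpropagated through the layer like any other term:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W = rng.standard_normal((100, 64)) * 0.1   # input dim 100 -> 64 hidden units
x = rng.standard_normal((8, 100))          # a mini-batch of 8 inputs

h = sigmoid(x @ W)                         # hidden activations
l1_penalty = np.abs(h).sum()               # added (scaled) to the main loss
# (Sub)gradient of the L1 penalty w.r.t. the pre-activations, via the
# chain rule through the sigmoid: sign(h) * h * (1 - h)
grad_pre = np.sign(h) * h * (1.0 - h)
```

During training this extra gradient pushes activations toward zero, so only a few units stay active per input.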
Sparse deep learning: Unit sparsity (2) Lee, H., Ekanadham, C., Ng, A.Y.: Sparse deep belief net model for visual area V2. NIPS'07
Sparse deep learning: Unit sparsity (3) Luo, H., Shen, R., Niu, C.: Sparse group restricted Boltzmann machines
Sparse deep learning: Unit sparsity (4) Sparse activation function Rectifier activation Resembles the firing of biological neurons Easier to train No pre-training required Glorot, X., Bordes, A., Bengio, Y.: Deep sparse rectifier neural networks. In: Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS). pp. 315-323 (2011)
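The sparsity of rectifier units can be seen directly: for roughly zero-mean pre-activations, about half of the outputs are exactly zero, with no explicit regularizer. A small NumPy illustration (dimensions are arbitrary):

```python
import numpy as np

def relu(z):
    """Rectifier: identity for positive inputs, exactly zero otherwise."""
    return np.maximum(z, 0.0)

rng = np.random.default_rng(0)
pre = rng.standard_normal((1000, 256))   # zero-mean pre-activations
h = relu(pre)
sparsity = np.mean(h == 0.0)             # fraction of exactly-zero activations
```

Here `sparsity` comes out close to 0.5, i.e. roughly half the units are silent for any given input.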
Sparse deep learning: Unit sparsity (5) Sparse activation function Sparsifying logistic Poultney, C., Chopra, S., Cun, Y.L., et al.: Efficient learning of sparse representations with an energy-based model. NIPS'06
Sparse deep learning: Unit sparsity (6) Sparse activation function Winner-take-all Lifetime sparsity: each unit keeps only its top p% of activations across the mini-batch Spatial sparsity: keep the single largest hidden activity within each feature map Makhzani, A., Frey, B.: A winner-take-all method for training sparse convolutional autoencoders. NIPS'14
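Both winner-take-all variants above can be sketched in a few lines of NumPy (batch size, unit counts, and p are illustrative; the paper applies these operations inside convolutional autoencoder training, not on random data):

```python
import numpy as np

def lifetime_sparsity(h, p):
    """Keep, for each hidden unit (column), only its top p% activations
    across the mini-batch; zero out the rest."""
    k = max(1, int(h.shape[0] * p / 100.0))
    thresh = np.sort(h, axis=0)[-k]          # per-unit k-th largest value
    return np.where(h >= thresh, h, 0.0)

def spatial_sparsity(fmap):
    """Within each feature map, keep only the single largest activation."""
    flat = fmap.reshape(fmap.shape[0], -1)
    out = np.zeros_like(flat)
    idx = flat.argmax(axis=1)
    out[np.arange(flat.shape[0]), idx] = flat[np.arange(flat.shape[0]), idx]
    return out.reshape(fmap.shape)

rng = np.random.default_rng(0)
h = rng.random((100, 16))                    # batch of 100, 16 hidden units
h_sparse = lifetime_sparsity(h, p=5)         # 5 survivors per unit
fmap = rng.random((16, 8, 8))                # 16 feature maps of 8x8
fmap_sparse = spatial_sparsity(fmap)         # one survivor per map
```

The gradients then flow only through the surviving "winner" activations.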
Sparse deep learning: Unit sparsity (7) Unit sparsity by pre-training Under some conditions, pre-training leads to sparse activations Sparsity seems to contribute to the effectiveness of pre-training ReLU does not need pre-training! Li, J., Zhang, T., Luo, W., Yang, J., Yuan, X.T., Zhang, J.: Sparseness analysis in the pretraining of deep neural networks. IEEE Transactions on Neural Networks and Learning Systems PP(99), 1-14 (2016)
Sparse deep learning: Weight sparsity L2 and L1 norm Sparse matrix factorization Liu, B., Wang, M., Foroosh, H., Tappen, M., Pensky, M.: Sparse convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 806-814 (2015)
Sparse deep learning: Weight sparsity Connection pruning Chao Liu, Dong Wang, Zhiyong Zhang, "Pruning Deep Neural Networks by Optimal Brain Damage", Interspeech 2014
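A hedged sketch of the pruning idea, using simple magnitude-based ranking as a stand-in (Optimal Brain Damage itself ranks connections by a second-order saliency estimate of the loss increase when each weight is removed, which requires the loss curvature; the overall prune-then-retrain workflow is the same):

```python
import numpy as np

def prune_by_magnitude(W, sparsity):
    """Zero out the fraction `sparsity` of weights with smallest magnitude.
    Returns the pruned matrix and a boolean mask of surviving connections."""
    flat = np.abs(W).ravel()
    k = int(flat.size * sparsity)
    if k == 0:
        return W.copy(), np.ones_like(W, dtype=bool)
    thresh = np.partition(flat, k - 1)[k - 1]   # k-th smallest magnitude
    mask = np.abs(W) > thresh
    return W * mask, mask

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512))             # one dense DNN weight matrix
W_pruned, mask = prune_by_magnitude(W, sparsity=0.9)
remaining = mask.mean()                         # ~10% of connections survive
```

In practice the mask is kept fixed and the surviving weights are retrained (fine-tuned) to recover accuracy.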
Sparse deep learning: Gradient sparsity Contractive AE: an L2 penalty on the Jacobian of the hidden units w.r.t. the input leads to sparse units Rifai, S., Vincent, P., Muller, X., Glorot, X., Bengio, Y.: Contractive auto-encoders: explicit invariance during feature extraction. ICML'11
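For a sigmoid hidden layer the contractive penalty, the squared Frobenius norm of the Jacobian of the hidden units with respect to the input, has a simple closed form. A small NumPy sketch with illustrative dimensions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def contractive_penalty(x, W, b):
    """||J||_F^2 for h = sigmoid(Wx + b). The Jacobian dh/dx is
    diag(h*(1-h)) @ W, so its squared Frobenius norm factorises as
    sum_j (h_j*(1-h_j))^2 * ||W_j||^2 (one term per hidden unit j)."""
    h = sigmoid(W @ x + b)
    return np.sum((h * (1.0 - h)) ** 2 * np.sum(W ** 2, axis=1))

rng = np.random.default_rng(0)
W = rng.standard_normal((32, 64)) * 0.1   # 64-dim input -> 32 hidden units
b = np.zeros(32)
x = rng.standard_normal(64)
penalty = contractive_penalty(x, W, b)    # added (scaled) to the AE loss
```

Minimising this term flattens the encoder around the data, which is what drives hidden units toward saturation and hence sparse, locally invariant responses.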
Contents Deep learning and sparse learning Sparse deep learning Deep sparse learning Applications in speech & language processing Conclusions
Deep sparse learning Hierarchical sparse coding Sparse codes are derived from raw bits The second-level sparse codes derived from the diagonal vectors of the covariance matrices of neighbouring patches Yu, K., Lin, Y., Lafferty, J.: Learning image representations from the pixel level via hierarchical sparse coding. In: Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. pp. 1713-1720
Deep sparse learning Stacked sparse coding He, Y., Kavukcuoglu, K., Wang, Y., Szlam, A., Qi, Y.: Unsupervised feature learning by deep sparse coding. arXiv preprint arXiv:1312.5783 (2013)
Deep sparse learning Learning sparse codes with deep models (predicted sparse decomposition, PSD) Kavukcuoglu, K., Fergus, R., LeCun, Y., et al.: Learning invariant features through topographic filter maps. In: Computer Vision and Pattern Recognition, 2009. CVPR 2009
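A minimal sketch of the PSD idea: infer sparse codes with an iterative solver (ISTA here), then fit a cheap feedforward encoder to predict those codes, so inference at test time is a single matrix product instead of an optimisation loop. The linear least-squares encoder below is a simplification (the paper trains a nonlinear encoder jointly with the dictionary); all dimensions and `lam` are illustrative:

```python
import numpy as np

def soft_threshold(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def sparse_codes(X, D, lam=0.1, n_iter=200):
    """ISTA: infer codes Z minimising 0.5*||X - ZD||^2 + lam*||Z||_1."""
    L = np.linalg.norm(D, 2) ** 2            # Lipschitz constant of the gradient
    Z = np.zeros((X.shape[0], D.shape[0]))
    for _ in range(n_iter):
        Z = soft_threshold(Z - (Z @ D - X) @ D.T / L, lam / L)
    return Z

rng = np.random.default_rng(0)
D = rng.standard_normal((32, 64))            # 32 dictionary atoms, 64-dim data
X = rng.standard_normal((100, 64))
Z = sparse_codes(X, D)                       # slow, iterative inference

# PSD step: fit a feedforward encoder W_e to *predict* Z from X, so that
# at test time one matrix multiply replaces the ISTA loop.
W_e, *_ = np.linalg.lstsq(X, Z, rcond=None)
Z_fast = X @ W_e                             # fast approximate codes
```

The encoder's prediction error is added to the sparse-coding objective during training, which couples the fast encoder and the dictionary.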
Contents Deep learning and sparse learning Sparse deep learning Deep sparse learning Applications in speech & language processing Conclusions
Sparse learning in speech processing Applied in many tasks Wang, D., Vipperla, R., Evans, N., Zheng, T.F.: Online non-negative convolutive pattern learning for speech signals. IEEE Transactions on Signal Processing 61(1), 44-56 (2013)
Deep learning in speech processing ASR, denoising, speaker recognition, language recognition, ...
Deep and sparse models in speech processing Mostly on weight regularization Rarely on unit sparsity Sivaram, G.S., Hermansky, H.: Multilayer perceptron with sparse hidden outputs for phoneme recognition. ICASSP'11
Deep and sparse model in speech processing Chao Liu, Dong Wang, Zhiyong Zhang, "Pruning Deep Neural Networks by Optimal Brain Damage", Interspeech 2014
Sparse models in language processing Sparse topic models L1 regularization Hierarchical model Zhu, J., Xing, E.P.: Sparse topical coding. In: UAI 2011
Sparse models in language processing Sparse coding for document clustering Wu, C., Yang, H., Zhu, J., Zhang, J., King, I., Lyu, M.R.: Sparse poisson coding for high dimensional document clustering. In: IEEE International Conference on Big Data (2013)
Sparse models in language processing Liu, H., Yu, H., Deng, Z.: Multi-document summarization based on two-level sparse representation model. In: National Conference on Artificial Intelligence (2015)
Deep models in language processing Language modeling, semantic parsing, paraphrase detection, machine translation, sentiment prediction... Cho, K., Merrienboer, B.V., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., Bengio, Y.: Learning phrase representations using RNN encoder-decoder for statistical machine translation. EMNLP 2014
Deep models in language processing Text generation Qixin Wang, Tianyi Luo, Dong Wang, Chao Xing, "Chinese Song Iambics Generation with Neural Attention-based Model", IJCAI 2016
Deep and sparse models in language processing Structured sparsity in learning word representations Yogatama, D.: Sparse Models of Natural Language Text. Ph.D. thesis, Carnegie Mellon University (2015)
Deep and sparse models in language processing Sparse word vectors Sun, F., Guo, J., Lan, Y., Xu, J., Cheng, X.: Sparse word embeddings using l1 regularized online learning. In: IJCAI 2016. pp. 2915-2921 Vyas, Y., Carpuat, M.: Sparse bilingual word representations for cross-lingual lexical entailment. In: NAACL 2016. pp. 1187-1197 (2016)
Contents Deep learning and sparse learning Sparse deep learning Deep sparse learning Applications in speech & language processing Conclusions
Conclusions Sparse learning and deep learning are two important aspects of modern machine learning They are correlated, but the marriage is still limited Two marriage approaches Sparse codes as information representation, deep learning as framework Deep learning leads to sparse codes How to proceed? Merge different sparsity constraints? Semi-supervised learning? Investigate the resemblance to biological neurons There is much room for further work
Thanks!