
1 Deep and Sparse Learning in Speech and Language Processing: An Overview
Dong Wang*, Qiang Zhou*, Amir Hussain+ *CSLT, Tsinghua University +University of Stirling

2 Contents
Deep learning and sparse learning
Sparse deep learning
Deep sparse learning
Applications in speech & language processing
Conclusions

3 Sparse hypothesis
Information is represented by sparse codes [Northoff 2014]
Encodes differences
Excitation-inhibition balance
General in the brain
Sparse coding and long-term memory

4 Hierarchical hypothesis
Abstraction layer by layer
Ubiquitous in the brain
N. Krüger, P. Janssen, S. Kalkan, M. Lappe, A. Leonardis, J. Piater, A. J. Rodriguez-Sanchez, L. Wiskott, "Deep Hierarchies in the Primate Visual Cortex: What Can We Learn for Computer Vision?", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 35, Aug. 2013

5 Sparse models in machine learning
Discover prominent patterns
Easy to interpret
Robust against noise
F = F(x; w) s.t. sparsity constraints
Examples: Lasso (sketched below), sparse discriminant analysis, sparse SVM, ...
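To make the idea concrete, here is a minimal sketch of a sparse linear model (the Lasso) on synthetic data; the shapes, the regularization strength alpha, and the data itself are all illustrative. The L1 penalty drives most coefficients to exactly zero, leaving only the prominent patterns:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
X = rng.randn(100, 50)                 # 100 samples, 50 features
true_w = np.zeros(50)
true_w[:3] = [2.0, -1.5, 1.0]          # only three features matter
y = X @ true_w + 0.1 * rng.randn(100)

model = Lasso(alpha=0.1).fit(X, y)     # alpha controls sparsity strength
print("non-zero coefficients:", np.sum(model.coef_ != 0))
```

With alpha large enough, only the few informative coefficients survive, which is exactly the interpretability and robustness argument made above.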

6 Hierarchical models in machine learning
Probabilistic hierarchical models: hierarchical LDA, deep Boltzmann machines, layer-wise Bayesian networks
Neural hierarchical models: deep neural networks
All are called deep models!
Very impressive results by deep learning
David M. Blei et al., JMLR, 2003

7 Marry sparse and deep models?
Sparse in representation and deep in structure
Deep structure is for better representation
Physiologically feasible*, but not much explored in machine learning
This is the focus of our overview
Honglak Lee et al., "Unsupervised Learning of Hierarchical Representations with Convolutional Deep Belief Networks", ICML 2009.
* Peter Kloppenburg, Martin Paul Nawrot, "Neural Coding: Sparse but On Time", Current Biology, vol. 24, issue 19, pp. R957–R959, 6 October 2014

8 Two marriage approaches
Sparse deep models: deep models with sparse ingredients
Deep sparse models: stacked sparse models

9 Contents
Deep learning and sparse learning
Sparse deep learning
Deep sparse learning
Applications in speech & language processing
Conclusions

10 Sparse deep learning
Involving sparsity constraints in deep models
Unit sparsity
Weight sparsity
Gradient sparsity

11 Sparse deep learning: Unit sparsity (1)
Various regularizations on the activations: L0, L1, L1/L2 (a PyTorch sketch follows below)
Ranzato, M., Boureau, Y.L., LeCun, Y.: Sparse feature learning for deep belief networks. NIPS'08
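As a hedged sketch of this family of methods (our illustration, not the exact objective of the cited paper), an L1 penalty on the hidden activations can simply be added to the task loss; the network, penalty weight, and mini-batch below are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative encoder, classifier head, and penalty weight.
encoder = nn.Sequential(nn.Linear(784, 256), nn.Sigmoid())
head = nn.Linear(256, 10)
l1_weight = 1e-3

x = torch.randn(32, 784)                     # dummy mini-batch
target = torch.randint(0, 10, (32,))

h = encoder(x)                               # hidden activations
loss = F.cross_entropy(head(h), target)      # task loss
loss = loss + l1_weight * h.abs().mean()     # L1 sparsity term on activations
loss.backward()
```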

12 Sparse deep learning: Unit sparsity (2)
Lee, H., Ekanadham, C., Ng, A.Y.: Sparse deep belief net model for visual area V2. NIPS'07

13 Sparse deep learning: Unit sparsity (3)
Luo, H., Shen, R., Niu, C.: Sparse group restricted Boltzmann machines

14 Sparse deep learning: Unit sparsity (4)
Sparse activation functions
Rectifier (ReLU) activation
Resembles human neurons
Easier to train; no pre-training required
Glorot, X., Bordes, A., Bengio, Y.: Deep sparse rectifier neural networks. In: Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS) (2011)

15 Sparse deep learning: Unit sparsity (5)
Sparse activation function: sparsifying logistic
Poultney, C., Chopra, S., LeCun, Y., et al.: Efficient learning of sparse representations with an energy-based model. NIPS'06.

16 Sparse deep learning: Unit sparsity (6)
Sparse activation function: winner-take-all
Lifetime sparsity: each unit keeps only the top p% of its activations across the mini-batch
Spatial sparsity: keep the single largest hidden activity within each feature map
(a PyTorch sketch of both operations follows below)
Makhzani, A., Frey, B.: A winner-take-all method for training sparse convolutional autoencoders. NIPS'14.
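A minimal PyTorch sketch of the two winner-take-all operations, assuming the conventional (batch, channels, H, W) layout for feature maps; the function names and the value of p are ours:

```python
import torch

def spatial_sparsity(fmap):
    # fmap: (batch, channels, H, W). Keep only the single largest
    # activation in each feature map; zero everything else.
    b, c, h, w = fmap.shape
    flat = fmap.view(b, c, -1)
    winners = flat.argmax(dim=2, keepdim=True)
    mask = torch.zeros_like(flat).scatter_(2, winners, 1.0)
    return (flat * mask).view(b, c, h, w)

def lifetime_sparsity(acts, p=0.05):
    # acts: (batch, units). For each unit, keep its top p% of
    # activations across the mini-batch; zero the rest.
    k = max(1, int(p * acts.shape[0]))
    thresh = acts.topk(k, dim=0).values[-1]   # per-unit k-th largest value
    return acts * (acts >= thresh).float()
```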

17 Sparse deep learning: Unit sparsity (7)
Unit sparsity by pre-training
Under some conditions, pre-training leads to sparse activations
Sparsity seems to contribute to the effectiveness of pre-training
ReLU does not need pre-training!
Li, J., Zhang, T., Luo, W., Yang, J., Yuan, X.T., Zhang, J.: Sparseness analysis in the pretraining of deep neural networks. IEEE Transactions on Neural Networks and Learning Systems PP(99), 1–14 (2016)

18 Sparse deep learning: Weight sparsity
L2 and L1 norms on weights
Sparse matrix factorization
Liu, B., Wang, M., Foroosh, H., Tappen, M., Pensky, M.: Sparse convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 806–814 (2015)
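For contrast with unit sparsity above, a hedged sketch of weight sparsity: the L1 penalty is applied to the connection weights rather than the activations (the layer and the stand-in task loss are illustrative):

```python
import torch
import torch.nn as nn

layer = nn.Linear(256, 256)
l1_weight = 1e-4

x = torch.randn(32, 256)
task_loss = layer(x).pow(2).mean()                       # stand-in task loss
loss = task_loss + l1_weight * layer.weight.abs().sum()  # L1 on the weights
loss.backward()                                          # gradients now favour sparse weight matrices
```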

19 Sparse deep learning: Weight sparsity
Connection pruning
Chao Liu, Dong Wang, Zhiyong Zhang, "Pruning Deep Neural Networks by Optimal Brain Damage", Interspeech 2014
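As a simplified illustration of connection pruning (magnitude-based, not the second-order saliency criterion that Optimal Brain Damage itself uses), one can zero the smallest-magnitude weights and keep a mask so the pruned connections stay at zero during further training:

```python
import torch
import torch.nn as nn

def magnitude_prune(layer, fraction):
    # Zero the given fraction of smallest-magnitude weights.
    # (Magnitude pruning is a simpler stand-in; OBD instead ranks
    # connections by a Hessian-based saliency.)
    w = layer.weight.data
    k = max(1, int(fraction * w.numel()))
    thresh = w.abs().flatten().kthvalue(k).values
    mask = (w.abs() > thresh).float()
    layer.weight.data *= mask
    return mask          # reuse the mask to keep pruned weights at zero

layer = nn.Linear(512, 512)
mask = magnitude_prune(layer, fraction=0.8)   # prune 80% of connections
print("remaining connections:", int(mask.sum().item()))
```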

20 Sparse deep learning: Gradient sparsity
Contractive AE: an L2 penalty on the gradients (the Jacobian of the hidden units with respect to the input) leads to sparse units
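A minimal sketch of this penalty for a sigmoid encoder, where the squared Frobenius norm of the Jacobian dh/dx has a closed form; all shapes and initial values are illustrative:

```python
import torch
import torch.nn as nn

# Sigmoid encoder h = sigmoid(x W^T + b); the contractive penalty is the
# squared Frobenius norm of the Jacobian dh/dx, which here has the
# closed form sum_j h_j^2 (1 - h_j)^2 * ||W_j||^2.
W = nn.Parameter(0.01 * torch.randn(256, 784))
b = nn.Parameter(torch.zeros(256))

x = torch.randn(32, 784)
h = torch.sigmoid(x @ W.t() + b)            # (batch, hidden)

dh = h * (1 - h)                            # sigmoid derivative per unit
contractive_penalty = (dh.pow(2) @ W.pow(2).sum(dim=1)).mean()
contractive_penalty.backward()              # added to the reconstruction loss in practice
```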

21 Contents
Deep learning and sparse learning
Sparse deep learning
Deep sparse learning
Applications in speech & language processing
Conclusions

22 Deep sparse learning: Hierarchical sparse coding
Sparse codes are derived from the raw pixels
The second-level sparse codes are derived from the diagonal vectors of the covariance matrices of neighbouring patches
Yu, K., Lin, Y., Lafferty, J.: Learning image representations from the pixel level via hierarchical sparse coding. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2011)

23 Deep sparse learning: Stacked sparse coding
He, Y., Kavukcuoglu, K., Wang, Y., Szlam, A., Qi, Y.: Unsupervised feature learning by deep sparse coding. arXiv preprint (2013)

24 Deep sparse learning: Learning sparse codes with deep models (predicted sparse decomposition, PSD)
Kavukcuoglu, K., Fergus, R., LeCun, Y., et al.: Learning invariant features through topographic filter maps. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2009)
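A rough sketch of the PSD idea: a fast feed-forward encoder stands in for iterative sparse inference. Full PSD additionally ties the predicted code to the code found by optimization; this sketch keeps only the joint reconstruction-plus-sparsity objective, and all names and shapes are illustrative:

```python
import torch
import torch.nn as nn

# Dictionary (linear decoder) and a fast feed-forward encoder.
D = nn.Parameter(0.01 * torch.randn(784, 128))
encoder = nn.Sequential(nn.Linear(784, 128), nn.Tanh())

x = torch.randn(32, 784)
z = encoder(x)                         # predicted (approximately sparse) code
recon = z @ D.t()                      # decode with the dictionary
loss = (recon - x).pow(2).mean() + 0.1 * z.abs().mean()
loss.backward()
```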

25 Contents
Deep learning and sparse learning
Sparse deep learning
Deep sparse learning
Applications in speech & language processing
Conclusions

26 Sparse learning in speech processing
Applied in many speech applications
Wang, D., Vipperla, R., Evans, N., Zheng, T.F.: Online non-negative convolutive pattern learning for speech signals. IEEE Transactions on Signal Processing 61(1) (2013)

27 Deep learning in speech processing
ASR, denoising, speaker recognition, language recognition, ...

28 Deep and sparse models in speech processing
Mostly weight regularization
Unit sparsity is rarely explored
Sivaram, G.S., Hermansky, H.: Multilayer perceptron with sparse hidden outputs for phoneme recognition. ICASSP'11.

29 Deep and sparse models in speech processing
Chao Liu, Dong Wang, Zhiyong Zhang, "Pruning Deep Neural Networks by Optimal Brain Damage", Interspeech 2014

30 Sparse models in language processing
Sparse topic models
L1 regularization
Hierarchical model
Zhu, J., Xing, E.P.: Sparse topical coding. In: UAI (2011)

31 Sparse models in language processing
Sparse coding for document clustering
Wu, C., Yang, H., Zhu, J., Zhang, J., King, I., Lyu, M.R.: Sparse Poisson coding for high dimensional document clustering. In: IEEE International Conference on Big Data (2013)

32 Sparse models in language processing
Multi-document summarization
Liu, H., Yu, H., Deng, Z.: Multi-document summarization based on two-level sparse representation model. In: National Conference on Artificial Intelligence (2015)

33 Deep models in language processing
Language modeling, semantic parsing, paraphrase detection, machine translation, sentiment prediction, ...
Cho, K., van Merrienboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., Bengio, Y.: Learning phrase representations using RNN encoder-decoder for statistical machine translation. EMNLP (2014)

34 Deep models in language processing
Text generation
Qixin Wang, Tianyi Luo, Dong Wang, Chao Xing, "Chinese Song Iambics Generation with Neural Attention-based Model", IJCAI 2016

35 Deep and sparse models in language processing
Structured sparsity in learning word representations
Yogatama, D.: Sparse Models of Natural Language Text. Ph.D. thesis, Carnegie Mellon University (2015)

36 Deep and sparse models in language processing
Sparse word vectors (a soft-thresholding sketch follows below)
Sun, F., Guo, J., Lan, Y., Xu, J., Cheng, X.: Sparse word embeddings using L1 regularized online learning. In: IJCAI (2016)
Vyas, Y., Carpuat, M.: Sparse bilingual word representations for cross-lingual lexical entailment. In: NAACL (2016)
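A hedged sketch of the kind of update behind L1-regularized online learning of embeddings: a proximal (soft-thresholding) step after each gradient step zeroes small entries. The values are illustrative and the cited papers' exact algorithms differ:

```python
import numpy as np

def soft_threshold(v, tau):
    # Proximal operator of the L1 norm: shrink toward zero and
    # set any entry with magnitude below tau exactly to zero.
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

# One proximal-SGD step on a single word vector (illustrative values):
lr, l1 = 0.1, 0.01
vec = np.random.randn(50)                        # a dense embedding
grad = np.random.randn(50)                       # task gradient for this word
vec = soft_threshold(vec - lr * grad, lr * l1)   # sparsity-inducing update
print("zero entries:", int((vec == 0).sum()))
```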

37 Contents
Deep learning and sparse learning
Sparse deep learning
Deep sparse learning
Applications in speech & language processing
Conclusions

38 Conclusions
Sparse learning and deep learning are two important aspects of modern machine learning
They are correlated, but the marriage is still limited
Two marriage approaches:
Sparse codes as information representation, deep learning as the framework
Deep learning leads to sparse codes
How to proceed?
Merge different sparsity constraints? Semi-supervised learning?
Investigate the resemblance to biological neurons
There is much room to explore

39 Thanks!

