
1 Deep and Sparse Learning in Speech and Language Processing: An Overview
Dong Wang*, Qiang Zhou*, Amir Hussain+ *CSLT, Tsinghua University +University of Stirling

2 Contents
Deep learning and sparse learning
Sparse deep learning
Deep sparse learning
Applications in speech & language processing
Conclusions

3 Sparse hypothesis
Information is represented by sparse codes [Northoff 2014]
Encodes differences
Excitation-inhibition balance
General in the brain
Sparse coding and long-term memory

4 Hierarchical hypothesis
Abstraction layer by layer
Ubiquitous in the brain
N. Krüger, P. Janssen, S. Kalkan, M. Lappe, A. Leonardis, J. Piater, A. J. Rodriguez-Sanchez, L. Wiskott, "Deep Hierarchies in the Primate Visual Cortex: What Can We Learn for Computer Vision?", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 35, Aug. 2013

5 Sparse models in machine learning
Discover prominent patterns
Easy to interpret
Robust against noise
F = F(x; w) s.t. sparsity constraints
Examples: Lasso (sketched below), sparse discriminant analysis, sparse SVM, ...
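To make the idea concrete, here is a minimal sketch of a sparse linear model (the Lasso) on synthetic data; the shapes, the regularization strength alpha, and the data itself are all illustrative. The L1 penalty drives most coefficients to exactly zero, leaving only the prominent patterns:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
X = rng.randn(100, 50)                 # 100 samples, 50 features
true_w = np.zeros(50)
true_w[:3] = [2.0, -1.5, 1.0]          # only three features matter
y = X @ true_w + 0.1 * rng.randn(100)

model = Lasso(alpha=0.1).fit(X, y)     # alpha controls sparsity strength
print("non-zero coefficients:", np.sum(model.coef_ != 0))
```

With alpha large enough, only the few informative coefficients survive, which is exactly the interpretability and robustness argument made above.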

6 Hierarchical models in machine learning
Probabilistic hierarchical models: hierarchical LDA, deep Boltzmann machines, layer-wise Bayesian networks
Neural hierarchical models: deep neural networks
All are called deep models!
Very impressive results by deep learning
David M. Blei et al., JMLR, 2003

7 Marry sparse and deep models?
Sparse in representation and deep in structure
Deep structure is for better representation
Physiologically feasible*, but not much explored in machine learning
This is the focus of our overview
Honglak Lee et al., "Unsupervised Learning of Hierarchical Representations with Convolutional Deep Belief Networks", ICML 2009.
* Peter Kloppenburg, Martin Paul Nawrot, "Neural Coding: Sparse but On Time", Current Biology, vol. 24, issue 19, pp. R957–R959, 6 October 2014

8 Two marriage approaches
Sparse deep models: deep models with sparse ingredients
Deep sparse models: stacked sparse models

9 Contents
Deep learning and sparse learning
Sparse deep learning
Deep sparse learning
Applications in speech & language processing
Conclusions

10 Sparse deep learning
Involving sparsity constraints in deep models
Unit sparsity
Weight sparsity
Gradient sparsity

11 Sparse deep learning: Unit sparsity (1)
Various regularizations on the activations: L0, L1, L1/L2 (a PyTorch sketch follows below)
Ranzato, M., Boureau, Y.L., LeCun, Y.: Sparse feature learning for deep belief networks. NIPS'08
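As a hedged sketch of this family of methods (our illustration, not the exact objective of the cited paper), an L1 penalty on the hidden activations can simply be added to the task loss; the network, penalty weight, and mini-batch below are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative encoder, classifier head, and penalty weight.
encoder = nn.Sequential(nn.Linear(784, 256), nn.Sigmoid())
head = nn.Linear(256, 10)
l1_weight = 1e-3

x = torch.randn(32, 784)                     # dummy mini-batch
target = torch.randint(0, 10, (32,))

h = encoder(x)                               # hidden activations
loss = F.cross_entropy(head(h), target)      # task loss
loss = loss + l1_weight * h.abs().mean()     # L1 sparsity term on activations
loss.backward()
```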

12 Sparse deep learning: Unit sparsity (2)
Lee, H., Ekanadham, C., Ng, A.Y.: Sparse deep belief net model for visual area V2. NIPS'07

13 Sparse deep learning: Unit sparsity (3)
Luo, H., Shen, R., Niu, C.: Sparse group restricted Boltzmann machines

14 Sparse deep learning: Unit sparsity (4)
Sparse activation functions
Rectifier (ReLU) activation
Resembles human neurons
Easier to train; no pre-training required
Glorot, X., Bordes, A., Bengio, Y.: Deep sparse rectifier neural networks. In: Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS) (2011)

15 Sparse deep learning: Unit sparsity (5)
Sparse activation function: sparsifying logistic
Poultney, C., Chopra, S., LeCun, Y., et al.: Efficient learning of sparse representations with an energy-based model. NIPS'06.

16 Sparse deep learning: Unit sparsity (6)
Sparse activation function: winner-take-all
Lifetime sparsity: each unit keeps only the top p% of its activations across the mini-batch
Spatial sparsity: keep the single largest hidden activity within each feature map
(a PyTorch sketch of both operations follows below)
Makhzani, A., Frey, B.: A winner-take-all method for training sparse convolutional autoencoders. NIPS'14.
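A minimal PyTorch sketch of the two winner-take-all operations, assuming the conventional (batch, channels, H, W) layout for feature maps; the function names and the value of p are ours:

```python
import torch

def spatial_sparsity(fmap):
    # fmap: (batch, channels, H, W). Keep only the single largest
    # activation in each feature map; zero everything else.
    b, c, h, w = fmap.shape
    flat = fmap.view(b, c, -1)
    winners = flat.argmax(dim=2, keepdim=True)
    mask = torch.zeros_like(flat).scatter_(2, winners, 1.0)
    return (flat * mask).view(b, c, h, w)

def lifetime_sparsity(acts, p=0.05):
    # acts: (batch, units). For each unit, keep its top p% of
    # activations across the mini-batch; zero the rest.
    k = max(1, int(p * acts.shape[0]))
    thresh = acts.topk(k, dim=0).values[-1]   # per-unit k-th largest value
    return acts * (acts >= thresh).float()
```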

17 Sparse deep learning: Unit sparsity (7)
Unit sparsity by pre-training
Under some conditions, pre-training leads to sparse activations
Sparsity seems to contribute to the effectiveness of pre-training
ReLU does not need pre-training!
Li, J., Zhang, T., Luo, W., Yang, J., Yuan, X.T., Zhang, J.: Sparseness analysis in the pretraining of deep neural networks. IEEE Transactions on Neural Networks and Learning Systems PP(99), 1–14 (2016)

18 Sparse deep learning: Weight sparsity
L2 and L1 norms on weights
Sparse matrix factorization
Liu, B., Wang, M., Foroosh, H., Tappen, M., Pensky, M.: Sparse convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 806–814 (2015)
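For contrast with unit sparsity above, a hedged sketch of weight sparsity: the L1 penalty is applied to the connection weights rather than the activations (the layer and the stand-in task loss are illustrative):

```python
import torch
import torch.nn as nn

layer = nn.Linear(256, 256)
l1_weight = 1e-4

x = torch.randn(32, 256)
task_loss = layer(x).pow(2).mean()                       # stand-in task loss
loss = task_loss + l1_weight * layer.weight.abs().sum()  # L1 on the weights
loss.backward()                                          # gradients now favour sparse weight matrices
```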

19 Sparse deep learning: Weight sparsity
Connection pruning
Chao Liu, Dong Wang, Zhiyong Zhang, "Pruning Deep Neural Networks by Optimal Brain Damage", Interspeech 2014
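As a simplified illustration of connection pruning (magnitude-based, not the second-order saliency criterion that Optimal Brain Damage itself uses), one can zero the smallest-magnitude weights and keep a mask so the pruned connections stay at zero during further training:

```python
import torch
import torch.nn as nn

def magnitude_prune(layer, fraction):
    # Zero the given fraction of smallest-magnitude weights.
    # (Magnitude pruning is a simpler stand-in; OBD instead ranks
    # connections by a Hessian-based saliency.)
    w = layer.weight.data
    k = max(1, int(fraction * w.numel()))
    thresh = w.abs().flatten().kthvalue(k).values
    mask = (w.abs() > thresh).float()
    layer.weight.data *= mask
    return mask          # reuse the mask to keep pruned weights at zero

layer = nn.Linear(512, 512)
mask = magnitude_prune(layer, fraction=0.8)   # prune 80% of connections
print("remaining connections:", int(mask.sum().item()))
```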

20 Sparse deep learning: Gradient sparsity
Contractive AE: an L2 penalty on the gradients (the Jacobian of the hidden units with respect to the input) leads to sparse units
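A minimal sketch of this penalty for a sigmoid encoder, where the squared Frobenius norm of the Jacobian dh/dx has a closed form; all shapes and initial values are illustrative:

```python
import torch
import torch.nn as nn

# Sigmoid encoder h = sigmoid(x W^T + b); the contractive penalty is the
# squared Frobenius norm of the Jacobian dh/dx, which here has the
# closed form sum_j h_j^2 (1 - h_j)^2 * ||W_j||^2.
W = nn.Parameter(0.01 * torch.randn(256, 784))
b = nn.Parameter(torch.zeros(256))

x = torch.randn(32, 784)
h = torch.sigmoid(x @ W.t() + b)            # (batch, hidden)

dh = h * (1 - h)                            # sigmoid derivative per unit
contractive_penalty = (dh.pow(2) @ W.pow(2).sum(dim=1)).mean()
contractive_penalty.backward()              # added to the reconstruction loss in practice
```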

21 Contents
Deep learning and sparse learning
Sparse deep learning
Deep sparse learning
Applications in speech & language processing
Conclusions

22 Deep sparse learning: Hierarchical sparse coding
Sparse codes are derived from the raw pixels
The second-level sparse codes are derived from the diagonal vectors of the covariance matrices of neighbouring patches
Yu, K., Lin, Y., Lafferty, J.: Learning image representations from the pixel level via hierarchical sparse coding. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2011)

23 Deep sparse learning: Stacked sparse coding
He, Y., Kavukcuoglu, K., Wang, Y., Szlam, A., Qi, Y.: Unsupervised feature learning by deep sparse coding. arXiv preprint (2013)

24 Deep sparse learning: Learning sparse codes with deep models (predicted sparse decomposition, PSD)
Kavukcuoglu, K., Fergus, R., LeCun, Y., et al.: Learning invariant features through topographic filter maps. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2009)
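A rough sketch of the PSD idea: a fast feed-forward encoder stands in for iterative sparse inference. Full PSD additionally ties the predicted code to the code found by optimization; this sketch keeps only the joint reconstruction-plus-sparsity objective, and all names and shapes are illustrative:

```python
import torch
import torch.nn as nn

# Dictionary (linear decoder) and a fast feed-forward encoder.
D = nn.Parameter(0.01 * torch.randn(784, 128))
encoder = nn.Sequential(nn.Linear(784, 128), nn.Tanh())

x = torch.randn(32, 784)
z = encoder(x)                         # predicted (approximately sparse) code
recon = z @ D.t()                      # decode with the dictionary
loss = (recon - x).pow(2).mean() + 0.1 * z.abs().mean()
loss.backward()
```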

25 Contents
Deep learning and sparse learning
Sparse deep learning
Deep sparse learning
Applications in speech & language processing
Conclusions

26 Sparse learning in speech processing
Applied in many speech applications
Wang, D., Vipperla, R., Evans, N., Zheng, T.F.: Online non-negative convolutive pattern learning for speech signals. IEEE Transactions on Signal Processing 61(1) (2013)

27 Deep learning in speech processing
ASR, denoising, speaker recognition, language recognition, ...

28 Deep and sparse models in speech processing
Mostly weight regularization
Unit sparsity is rarely explored
Sivaram, G.S., Hermansky, H.: Multilayer perceptron with sparse hidden outputs for phoneme recognition. ICASSP'11.

29 Deep and sparse models in speech processing
Chao Liu, Dong Wang, Zhiyong Zhang, "Pruning Deep Neural Networks by Optimal Brain Damage", Interspeech 2014

30 Sparse models in language processing
Sparse topic models
L1 regularization
Hierarchical model
Zhu, J., Xing, E.P.: Sparse topical coding. In: UAI (2011)

31 Sparse models in language processing
Sparse coding for document clustering
Wu, C., Yang, H., Zhu, J., Zhang, J., King, I., Lyu, M.R.: Sparse Poisson coding for high dimensional document clustering. In: IEEE International Conference on Big Data (2013)

32 Sparse models in language processing
Multi-document summarization
Liu, H., Yu, H., Deng, Z.: Multi-document summarization based on two-level sparse representation model. In: National Conference on Artificial Intelligence (2015)

33 Deep models in language processing
Language modeling, semantic parsing, paraphrase detection, machine translation, sentiment prediction, ...
Cho, K., van Merrienboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., Bengio, Y.: Learning phrase representations using RNN encoder-decoder for statistical machine translation. EMNLP (2014)

34 Deep models in language processing
Text generation
Qixin Wang, Tianyi Luo, Dong Wang, Chao Xing, "Chinese Song Iambics Generation with Neural Attention-based Model", IJCAI 2016

35 Deep and sparse models in language processing
Structured sparsity in learning word representations
Yogatama, D.: Sparse Models of Natural Language Text. Ph.D. thesis, Carnegie Mellon University (2015)

36 Deep and sparse models in language processing
Sparse word vectors (a soft-thresholding sketch follows below)
Sun, F., Guo, J., Lan, Y., Xu, J., Cheng, X.: Sparse word embeddings using L1 regularized online learning. In: IJCAI (2016)
Vyas, Y., Carpuat, M.: Sparse bilingual word representations for cross-lingual lexical entailment. In: NAACL (2016)
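A hedged sketch of the kind of update behind L1-regularized online learning of embeddings: a proximal (soft-thresholding) step after each gradient step zeroes small entries. The values are illustrative and the cited papers' exact algorithms differ:

```python
import numpy as np

def soft_threshold(v, tau):
    # Proximal operator of the L1 norm: shrink toward zero and
    # set any entry with magnitude below tau exactly to zero.
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

# One proximal-SGD step on a single word vector (illustrative values):
lr, l1 = 0.1, 0.01
vec = np.random.randn(50)                        # a dense embedding
grad = np.random.randn(50)                       # task gradient for this word
vec = soft_threshold(vec - lr * grad, lr * l1)   # sparsity-inducing update
print("zero entries:", int((vec == 0).sum()))
```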

37 Contents
Deep learning and sparse learning
Sparse deep learning
Deep sparse learning
Applications in speech & language processing
Conclusions

38 Conclusions
Sparse learning and deep learning are two important aspects of modern machine learning
They are correlated, but the marriage is still limited
Two marriage approaches:
Sparse codes as information representation, deep learning as the framework
Deep learning leads to sparse codes
How to proceed?
Merge different sparsity constraints? Semi-supervised learning?
Investigate the resemblance to biological neurons
There is much room to explore

39 Thanks!

