
Deep and Sparse Learning in Speech and Language Processing: An Overview Dong Wang*, Qiang Zhou*, Amir Hussain+ *CSLT, Tsinghua University +University of Stirling

Contents Deep learning and sparse learning Sparse deep learning Deep sparse learning Applications in speech & language processing Conclusions

Sparse hypothesis Information is represented by sparse codes [Northoff 2014]: encodes differences; excitation-inhibition balance; general across the brain. Sparse coding and long-term memory: https://www.youtube.com/watch?v=Jy7JmG3eMnw

Hierarchical hypothesis Abstraction layer by layer; ubiquitous in the brain. N. Kruger, P. Janssen, S. Kalkan, M. Lappe, A. Leonardis, J. Piater, A. J. Rodriguez-Sanchez, L. Wiskott, "Deep Hierarchies in the Primate Visual Cortex: What Can We Learn for Computer Vision?", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 35, no. 8, pp. 1847-1871, Aug. 2013

Sparse models in machine learning Discover prominent patterns; easy to interpret; robust against noise. Model y = F(x; w), trained subject to sparsity constraints on w. Examples: Lasso, sparse discriminant analysis, sparse SVM, ...
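As an illustration of the constrained-model idea, Lasso can be solved with a simple proximal-gradient (ISTA) loop; this is a generic sketch with made-up toy data, not anything from the slides:

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise soft-thresholding: the proximal operator of the L1 norm."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    """Minimise 0.5*||y - Xw||^2 + lam*||w||_1 by proximal gradient (ISTA)."""
    n, d = X.shape
    w = np.zeros(d)
    L = np.linalg.norm(X, 2) ** 2          # Lipschitz constant of the smooth part
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)
        w = soft_threshold(w - grad / L, lam / L)
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
w_true = np.zeros(20)
w_true[:3] = [2.0, -1.5, 1.0]              # only 3 features are truly active
y = X @ w_true + 0.01 * rng.normal(size=100)
w_hat = lasso_ista(X, y, lam=5.0)
print("nonzeros:", np.count_nonzero(np.abs(w_hat) > 1e-6))
```

The soft-threshold step is what drives inactive coefficients to exactly zero, which is the "prominent patterns, easy to interpret" property the slide refers to.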

Hierarchical models in machine learning Probabilistic hierarchical models: hierarchical LDA, deep Boltzmann machines, layer-wise Bayesian networks. Neural hierarchical models: deep neural networks. All are called deep models! Very impressive results by deep learning. David M. Blei et al., JMLR 2003

Marry sparse and deep models? Sparse in representation and deep in structure; the deep structure serves better representation. Physiologically feasible*, but not much explored in machine learning. This is the focus of our overview. Honglak Lee et al., "Unsupervised Learning of Hierarchical Representations with Convolutional Deep Belief Networks", ICML 2009. * Peter Kloppenburg, Martin Paul Nawrot, "Neural Coding: Sparse but On Time", Current Biology, vol. 24, no. 19, pp. R957-R959, 6 October 2014

Two marriage approaches Sparse deep models: deep models with sparse ingredients. Deep sparse models: stacked sparse models.

Contents Deep learning and sparse learning Sparse deep learning Deep sparse learning Applications in speech & language processing Conclusions

Sparse deep learning Involving sparse constraints in deep models: unit sparsity, weight sparsity, gradient sparsity

Sparse deep learning: Unit sparsity (1) Various regularizations on activations: L0, L1, L1/L2. Ranzato, M., Boureau, Y.-L., LeCun, Y.: Sparse feature learning for deep belief networks, NIPS'08
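The activation-regularization idea can be sketched in a few lines of numpy — a hypothetical sigmoid encoder with an L1 penalty on its hidden activations, not the cited papers' exact formulations:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hidden_l1_penalty(X, W, b, lam):
    """L1 sparsity penalty on hidden activations and its gradient w.r.t. W.

    Penalty: lam * sum |h| with h = sigmoid(X W + b); since sigmoid outputs
    are positive, |h| = h and the gradient chains through the sigmoid slope.
    """
    A = X @ W + b                  # pre-activations
    H = sigmoid(A)                 # hidden activations, all in (0, 1)
    penalty = lam * np.abs(H).sum()
    dA = lam * H * (1.0 - H)       # d(lam*h)/da, the sigmoid derivative
    grad_W = X.T @ dA
    return penalty, grad_W

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 5))
W = rng.normal(size=(5, 4))
b = np.zeros(4)
p0, g = hidden_l1_penalty(X, W, b, lam=0.1)
p1, _ = hidden_l1_penalty(X, W - 0.1 * g, b, lam=0.1)  # one descent step
print(p0, p1)  # the penalty decreases after the step
```

In training, this penalty term is simply added to the reconstruction or classification loss, pushing hidden units toward low average activation.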

Sparse deep learning: Unit sparsity (2) Lee, H., Ekanadham, C., Ng, A.Y.: Sparse deep belief net model for visual area V2, NIPS'07

Sparse deep learning: Unit sparsity (3) Luo, H., Shen, R., Niu, C.: Sparse group restricted Boltzmann machines

Sparse deep learning: Unit sparsity (4) Sparse activation function: rectifier activation. Resembles human neurons; easier to train; no pre-training required. Glorot, X., Bordes, A., Bengio, Y.: Deep sparse rectifier neural networks. In: Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS), pp. 315-323 (2011)
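A quick numpy check of why rectifiers are sparse by construction: with random weights, roughly half of all ReLU activations are exactly zero (toy data, illustration only):

```python
import numpy as np

# With zero-mean inputs and weights, pre-activations are symmetric around 0,
# so a ReLU layer zeroes out about half of its units without any extra penalty.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 64))          # a batch of random inputs
W = rng.normal(size=(64, 256)) * 0.1     # random weights, zero bias
H = np.maximum(X @ W, 0.0)               # ReLU activations
sparsity = np.mean(H == 0.0)             # fraction of exact zeros
print(f"fraction of exactly-zero activations: {sparsity:.2f}")  # ~0.5
```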

Sparse deep learning: Unit sparsity (5) Sparse activation function: sparsifying logistic. Poultney, C., Chopra, S., LeCun, Y., et al.: Efficient learning of sparse representations with an energy-based model, NIPS'06.

Sparse deep learning: Unit sparsity (6) Sparse activation function: winner-take-all. Lifetime sparsity: each unit keeps only its top p% of activations. Spatial sparsity: keep the single largest hidden activity within each feature map. Makhzani, A., Frey, B.: A winner-take-all method for training sparse convolutional autoencoders, NIPS'14.
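The spatial-sparsity rule is easy to state in code; a minimal numpy sketch with random feature maps and hypothetical shapes:

```python
import numpy as np

def spatial_winner_take_all(feature_maps):
    """Spatial sparsity: keep only the single largest activation per feature map.

    feature_maps: array of shape (n_maps, height, width).
    """
    n, h, w = feature_maps.shape
    flat = feature_maps.reshape(n, -1)
    out = np.zeros_like(flat)
    winners = flat.argmax(axis=1)                    # index of the max per map
    out[np.arange(n), winners] = flat[np.arange(n), winners]
    return out.reshape(n, h, w)

maps = np.random.default_rng(2).normal(size=(16, 8, 8))
sparse_maps = spatial_winner_take_all(maps)
print(np.count_nonzero(sparse_maps))  # one surviving activation per map
```

Lifetime sparsity is analogous but applies the top-k selection per unit across the minibatch instead of per map across positions.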

Sparse deep learning: Unit sparsity (7) Unit sparsity by pre-training. Under some conditions, pre-training leads to sparse activations; sparsity seems to contribute to the effectiveness of pre-training. ReLU does not need pre-training! Li, J., Zhang, T., Luo, W., Yang, J., Yuan, X.T., Zhang, J.: Sparseness analysis in the pretraining of deep neural networks. IEEE Transactions on Neural Networks and Learning Systems PP(99), 1-14 (2016)

Sparse deep learning: Weight sparsity L2 and L1 norms; sparse matrix factorization. Liu, B., Wang, M., Foroosh, H., Tappen, M., Pensky, M.: Sparse convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 806-814 (2015)

Sparse deep learning: Weight sparsity Connection pruning Chao Liu, Dong Wang, Zhiyong Zhang, "Pruning Deep Neural Networks by Optimal Brain Damage", Interspeech 2014
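For illustration, a simple magnitude-based pruning step in numpy — note this is a crude stand-in for the second-order OBD saliency criterion used in the cited paper:

```python
import numpy as np

def prune_by_magnitude(W, fraction):
    """Zero out the given fraction of smallest-magnitude weights.

    A magnitude criterion; OBD instead ranks weights by an estimate of the
    loss increase caused by removing them (a second-order saliency).
    """
    k = int(W.size * fraction)
    threshold = np.partition(np.abs(W).ravel(), k)[k]
    mask = np.abs(W) >= threshold        # True where the weight survives
    return W * mask, mask

rng = np.random.default_rng(3)
W = rng.normal(size=(100, 100))
W_pruned, mask = prune_by_magnitude(W, 0.9)
print("kept:", mask.mean())  # ~0.10 of the connections survive
```

In practice, pruning is usually followed by fine-tuning of the surviving weights to recover accuracy.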

Sparse deep learning: Gradient sparsity Contractive AE: an L2 penalty on the gradients (the Jacobian of the hidden activations with respect to the input) leads to sparse units
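For a sigmoid encoder, the contractive penalty ||dh/dx||_F^2 has a closed form; a small numpy sketch with toy sizes and hypothetical names:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def contractive_penalty(x, W, b):
    """||dh/dx||_F^2 for h = sigmoid(x W + b), the contractive-AE penalty.

    Since dh_j/dx_i = h_j * (1 - h_j) * W[i, j], the Frobenius norm squared
    reduces to sum_j (h_j (1 - h_j))^2 * ||W[:, j]||^2.
    """
    h = sigmoid(x @ W + b)
    return np.sum((h * (1 - h)) ** 2 * np.sum(W ** 2, axis=0))

rng = np.random.default_rng(4)
x = rng.normal(size=5)
W = rng.normal(size=(5, 3))
b = np.zeros(3)
print(contractive_penalty(x, W, b))
```

Minimising this term pushes hidden units into the flat (saturated or inactive) regions of the sigmoid, which is why an L2 penalty on gradients ends up producing sparse units.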

Contents Deep learning and sparse learning Sparse deep learning Deep sparse learning Applications in speech & language processing Conclusions

Deep sparse learning Hierarchical sparse coding. Sparse codes are derived from raw pixels; the second-level sparse codes are derived from the diagonal vectors of the covariance matrices of neighbouring patches. Yu, K., Lin, Y., Lafferty, J.: Learning image representations from the pixel level via hierarchical sparse coding. In: Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pp. 1713-1720

Deep sparse learning Stacked sparse coding He, Y., Kavukcuoglu, K., Wang, Y., Szlam, A., Qi, Y.: Unsupervised feature learning by deep sparse coding. arXiv preprint arXiv:1312.5783 (2013)
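A toy sketch of the stacked idea — a second sparse-coding layer applied to the first layer's codes. The top-k correlation coder below is a crude stand-in for a proper L1 solver; the dictionaries are random and all names are hypothetical:

```python
import numpy as np

def sparse_code(X, D, k):
    """Very simple sparse coding: keep the k dictionary atoms with the
    strongest correlation to each input (a stand-in for real L1 solvers)."""
    C = X @ D.T                                    # correlations with atoms
    idx = np.argsort(-np.abs(C), axis=1)[:, :k]    # top-k atoms per input
    codes = np.zeros_like(C)
    rows = np.arange(X.shape[0])[:, None]
    codes[rows, idx] = C[rows, idx]
    return codes

rng = np.random.default_rng(5)
patches = rng.normal(size=(32, 16))        # level-1 inputs (e.g. image patches)
D1 = rng.normal(size=(64, 16))             # level-1 dictionary
D2 = rng.normal(size=(32, 64))             # level-2 dictionary over level-1 codes
codes1 = sparse_code(patches, D1, k=4)     # first sparse layer
codes2 = sparse_code(codes1, D2, k=4)      # second layer codes the codes
print(codes1.shape, codes2.shape)
```

The stacking is the point: each level's sparse codes become the next level's inputs, giving a deep model built entirely from sparse-coding blocks.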

Deep sparse learning Learning sparse codes with deep models (predicted sparse decomposition, PSD). Kavukcuoglu, K., Fergus, R., LeCun, Y., et al.: Learning invariant features through topographic filter maps. In: Computer Vision and Pattern Recognition, 2009. CVPR 2009.

Contents Deep learning and sparse learning Sparse deep learning Deep sparse learning Applications in speech & language processing Conclusions

Sparse learning in speech processing Applied to many applications Wang, D., Vipperla, R., Evans, N., Zheng, T.F.: Online non-negative convolutive pattern learning for speech signals. IEEE Transactions on Signal Processing 61(1), 44-56 (2013)

Deep learning in speech processing ASR, denoising, speaker recognition, language recognition, ...

Deep and sparse model in speech processing Mostly weight regularization; unit sparsity is rarely used. Sivaram, G.S., Hermansky, H.: Multilayer perceptron with sparse hidden outputs for phoneme recognition. ICASSP'11.

Deep and sparse model in speech processing Chao Liu, Dong Wang, Zhiyong Zhang, "Pruning Deep Neural Networks by Optimal Brain Damage", Interspeech 2014

Sparse models in language processing Sparse topic models: L1 regularization; hierarchical model. Zhu, J., Xing, E.P.: Sparse topical coding. In: UAI 2011

Sparse models in language processing Sparse coding for document clustering. Wu, C., Yang, H., Zhu, J., Zhang, J., King, I., Lyu, M.R.: Sparse poisson coding for high dimensional document clustering. In: IEEE International Conference on Big Data (2013)

Sparse models in language processing Liu, H., Yu, H., Deng, Z.: Multi-document summarization based on two-level sparse representation model. In: National Conference on Artificial Intelligence (2015)

Deep models in language processing Language modeling, semantic parsing, paraphrase detection, machine translation, sentiment prediction... Cho, K., Merrienboer, B.V., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., Bengio, Y.: Learning phrase representations using RNN encoder-decoder for statistical machine translation. Computer Science (2014)

Deep models in language processing Text generation Qixin Wang, Tianyi Luo, Dong Wang, Chao Xing, "Chinese Song Iambics Generation with Neural Attention-based Model", IJCAI 2016 

Deep and sparse models in language processing Structured sparsity in learning word representations. Yogatama, D.: Sparse Models of Natural Language Text. Ph.D. thesis, Carnegie Mellon University (2015)

Deep and sparse models in language processing Sparse word vectors Sun, F., Guo, J., Lan, Y., Xu, J., Cheng, X.: Sparse word embeddings using l-1 regularized online learning. In: IJCAI 2016. pp. 2915-2921 Vyas, Y., Carpuat, M.: Sparse bilingual word representations for cross-lingual lexical entailment. In: NAACL 2016. pp. 1187-1197 (2016)
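A toy flavour of L1-regularized online learning for sparse word vectors — SGD steps followed by soft-thresholding, with a random stand-in gradient rather than a real task loss (all sizes and names hypothetical):

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding, the update that creates exact zeros."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

rng = np.random.default_rng(6)
E = rng.normal(size=(50, 20))                  # 50 "words", 20-dim embeddings
lam, lr = 0.05, 0.1
for _ in range(200):
    g = rng.normal(scale=0.1, size=E.shape)    # stand-in for a task gradient
    # SGD step, then the proximal step for the L1 regularizer:
    E = soft_threshold(E - lr * g, lr * lam)
sparsity = np.mean(E == 0.0)
print(f"fraction of zero coordinates: {sparsity:.2f}")
```

The proximal step after each gradient update is what makes many embedding coordinates exactly zero, giving interpretable, memory-efficient word vectors.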

Contents Deep learning and sparse learning Sparse deep learning Deep sparse learning Applications in speech & language processing Conclusions

Conclusions Sparse learning and deep learning are two important aspects of modern machine learning. They are correlated, but the marriage is still limited. Two marriage approaches: sparse codes as the information representation with deep learning as the framework; deep learning that leads to sparse codes. How to proceed? Merge different sparsity constraints? Semi-supervised learning? Investigate the resemblance to biological neurons. There is much room to explore.

Thanks!