Deep and Sparse Learning in Speech and Language Processing: An Overview Dong Wang*, Qiang Zhou*, Amir Hussain+ *CSLT, Tsinghua University +University of Stirling
Contents Deep learning and sparse learning Sparse deep learning Deep sparse learning Applications in speech & language processing Conclusions
Sparse hypothesis Information represented by sparse codes [Northoff 2014] Encodes differences Excitation-inhibition balance General across the brain Sparse coding and long-term memory https://www.youtube.com/watch?v=Jy7JmG3eMnw
Hierarchical hypothesis Abstraction layer-by-layer Ubiquitous in brain N. Kruger, P. Janssen, S. Kalkan, M. Lappe, A. Leonardis, J. Piater, A. J. Rodriguez-Sanchez, L. Wiskott, "Deep Hierarchies in the Primate Visual Cortex: What Can We Learn for Computer Vision?", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 35, no. 8, pp. 1847-1871, Aug. 2013
Sparse models in machine learning Discover prominent patterns Easy to interpret Robust against noise F = f(x; w) s.t. sparsity constraints Lasso Sparse discriminant analysis Sparse SVM ...
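As a concrete instance of such a sparsity-constrained objective, the Lasso can be solved by iterating a gradient step with the L1 proximal (soft-thresholding) operator, i.e. ISTA. A minimal NumPy sketch; all dimensions and the penalty weight `lam` are illustrative:

```python
import numpy as np

def soft_threshold(w, lam):
    """Proximal operator of the L1 norm: shrinks values toward zero and
    sets small ones exactly to zero -- the source of sparsity."""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

def ista_lasso(X, y, lam=0.1, n_iter=500):
    """Solve min_w 0.5*||Xw - y||^2 + lam*||w||_1 by ISTA."""
    L = np.linalg.norm(X, 2) ** 2          # Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)
        w = soft_threshold(w - grad / L, lam / L)
    return w

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 20))
w_true = np.zeros(20)
w_true[:3] = [2.0, -1.5, 1.0]              # sparse ground truth
y = X @ w_true
w_hat = ista_lasso(X, y, lam=0.5)          # recovers a sparse solution
```

Most coefficients of `w_hat` are driven to exactly zero, which is what makes the recovered pattern easy to interpret.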
Hierarchical models in machine learning Probabilistic hierarchical models Hierarchical LDA Deep Boltzmann machines Layer-wise Bayesian networks Neural hierarchical models Deep neural networks All are called deep models! Very impressive results by deep learning David M. Blei et al., JMLR, 2003
Marry sparse and deep models? Sparse in representation and deep in structure Deep structure is for better representation Physiologically feasible*, but not much explored in machine learning This is the focus of our overview Honglak Lee et al., "Unsupervised Learning of Hierarchical Representations with Convolutional Deep Belief Networks", ICML 2009. * Peter Kloppenburg, Martin Paul Nawrot, "Neural Coding: Sparse but On Time", Current Biology, Volume 24, Issue 19, pp. R957-R959, 6 October 2014
Two marriage approaches Sparse deep models: deep models with sparsity ingredients Deep sparse models: stacked sparse models
Contents Deep learning and sparse learning Sparse deep learning Deep sparse learning Applications in speech & language processing Conclusions
Sparse deep learning Involving sparsity constraints in deep models Unit sparsity Weight sparsity Gradient sparsity
Sparse deep learning: Unit sparsity (1) Various regularizations on activations L0, L1, L1/L2 Ranzato, M., Boureau, Y.L., LeCun, Y.: Sparse feature learning for deep belief networks. NIPS'08
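A minimal sketch of the idea, assuming a single sigmoid hidden layer with illustrative dimensions: the L1 norm of the hidden activations is added to the training loss, and its (sub)gradient is backpropagated through the layer like any other term:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W = rng.standard_normal((100, 64)) * 0.1   # input dim 100 -> 64 hidden units
x = rng.standard_normal((8, 100))          # a mini-batch of 8 inputs

h = sigmoid(x @ W)                         # hidden activations
l1_penalty = np.abs(h).sum()               # added (scaled) to the main loss
# (Sub)gradient of the L1 penalty w.r.t. the pre-activations, via the
# chain rule through the sigmoid: sign(h) * h * (1 - h)
grad_pre = np.sign(h) * h * (1.0 - h)
```

During training this extra gradient pushes activations toward zero, so only a few units stay active per input.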
Sparse deep learning: Unit sparsity (2) Lee, H., Ekanadham, C., Ng, A.Y.: Sparse deep belief net model for visual area V2. NIPS'07
Sparse deep learning: Unit sparsity (3) Luo, H., Shen, R., Niu, C.: Sparse group restricted Boltzmann machines
Sparse deep learning: Unit sparsity (4) Sparse activation function Rectifier activation Resembles the firing of biological neurons Easier to train No pre-training required Glorot, X., Bordes, A., Bengio, Y.: Deep sparse rectifier neural networks. In: Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS). pp. 315-323 (2011)
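The sparsity of rectifier units can be seen directly: for roughly zero-mean pre-activations, about half of the outputs are exactly zero, with no explicit regularizer. A small NumPy illustration (dimensions are arbitrary):

```python
import numpy as np

def relu(z):
    """Rectifier: identity for positive inputs, exactly zero otherwise."""
    return np.maximum(z, 0.0)

rng = np.random.default_rng(0)
pre = rng.standard_normal((1000, 256))   # zero-mean pre-activations
h = relu(pre)
sparsity = np.mean(h == 0.0)             # fraction of exactly-zero activations
```

Here `sparsity` comes out close to 0.5, i.e. roughly half the units are silent for any given input.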
Sparse deep learning: Unit sparsity (5) Sparse activation function Sparsifying logistic Poultney, C., Chopra, S., Cun, Y.L., et al.: Efficient learning of sparse representations with an energy-based model. NIPS'06
Sparse deep learning: Unit sparsity (6) Sparse activation function Winner-take-all Lifetime sparsity: each unit keeps only its top p% of activations across the mini-batch Spatial sparsity: keep the single largest hidden activity within each feature map Makhzani, A., Frey, B.: A winner-take-all method for training sparse convolutional autoencoders. NIPS'14
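Both winner-take-all variants above can be sketched in a few lines of NumPy (batch size, unit counts, and p are illustrative; the paper applies these operations inside convolutional autoencoder training, not on random data):

```python
import numpy as np

def lifetime_sparsity(h, p):
    """Keep, for each hidden unit (column), only its top p% activations
    across the mini-batch; zero out the rest."""
    k = max(1, int(h.shape[0] * p / 100.0))
    thresh = np.sort(h, axis=0)[-k]          # per-unit k-th largest value
    return np.where(h >= thresh, h, 0.0)

def spatial_sparsity(fmap):
    """Within each feature map, keep only the single largest activation."""
    flat = fmap.reshape(fmap.shape[0], -1)
    out = np.zeros_like(flat)
    idx = flat.argmax(axis=1)
    out[np.arange(flat.shape[0]), idx] = flat[np.arange(flat.shape[0]), idx]
    return out.reshape(fmap.shape)

rng = np.random.default_rng(0)
h = rng.random((100, 16))                    # batch of 100, 16 hidden units
h_sparse = lifetime_sparsity(h, p=5)         # 5 survivors per unit
fmap = rng.random((16, 8, 8))                # 16 feature maps of 8x8
fmap_sparse = spatial_sparsity(fmap)         # one survivor per map
```

The gradients then flow only through the surviving "winner" activations.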
Sparse deep learning: Unit sparsity (7) Unit sparsity by pre-training Under some conditions, pre-training leads to sparse activations Sparsity seems to contribute to the effectiveness of pre-training ReLU does not need pre-training! Li, J., Zhang, T., Luo, W., Yang, J., Yuan, X.T., Zhang, J.: Sparseness analysis in the pretraining of deep neural networks. IEEE Transactions on Neural Networks and Learning Systems PP(99), 1-14 (2016)
Sparse deep learning: Weight sparsity L2 and L1 norm Sparse matrix factorization Liu, B., Wang, M., Foroosh, H., Tappen, M., Pensky, M.: Sparse convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 806-814 (2015)
Sparse deep learning: Weight sparsity Connection pruning Chao Liu, Dong Wang, Zhiyong Zhang, "Pruning Deep Neural Networks by Optimal Brain Damage", Interspeech 2014
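A hedged sketch of the pruning idea, using simple magnitude-based ranking as a stand-in (Optimal Brain Damage itself ranks connections by a second-order saliency estimate of the loss increase when each weight is removed, which requires the loss curvature; the overall prune-then-retrain workflow is the same):

```python
import numpy as np

def prune_by_magnitude(W, sparsity):
    """Zero out the fraction `sparsity` of weights with smallest magnitude.
    Returns the pruned matrix and a boolean mask of surviving connections."""
    flat = np.abs(W).ravel()
    k = int(flat.size * sparsity)
    if k == 0:
        return W.copy(), np.ones_like(W, dtype=bool)
    thresh = np.partition(flat, k - 1)[k - 1]   # k-th smallest magnitude
    mask = np.abs(W) > thresh
    return W * mask, mask

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512))             # one dense DNN weight matrix
W_pruned, mask = prune_by_magnitude(W, sparsity=0.9)
remaining = mask.mean()                         # ~10% of connections survive
```

In practice the mask is kept fixed and the surviving weights are retrained (fine-tuned) to recover accuracy.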
Sparse deep learning: Gradient sparsity Contractive AE: an L2 penalty on the Jacobian of the hidden units w.r.t. the input leads to sparse units Rifai, S., Vincent, P., Muller, X., Glorot, X., Bengio, Y.: Contractive auto-encoders: explicit invariance during feature extraction. ICML'11
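For a sigmoid hidden layer the contractive penalty, the squared Frobenius norm of the Jacobian of the hidden units with respect to the input, has a simple closed form. A small NumPy sketch with illustrative dimensions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def contractive_penalty(x, W, b):
    """||J||_F^2 for h = sigmoid(Wx + b). The Jacobian dh/dx is
    diag(h*(1-h)) @ W, so its squared Frobenius norm factorises as
    sum_j (h_j*(1-h_j))^2 * ||W_j||^2 (one term per hidden unit j)."""
    h = sigmoid(W @ x + b)
    return np.sum((h * (1.0 - h)) ** 2 * np.sum(W ** 2, axis=1))

rng = np.random.default_rng(0)
W = rng.standard_normal((32, 64)) * 0.1   # 64-dim input -> 32 hidden units
b = np.zeros(32)
x = rng.standard_normal(64)
penalty = contractive_penalty(x, W, b)    # added (scaled) to the AE loss
```

Minimising this term flattens the encoder around the data, which is what drives hidden units toward saturation and hence sparse, locally invariant responses.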
Contents Deep learning and sparse learning Sparse deep learning Deep sparse learning Applications in speech & language processing Conclusions
Deep sparse learning Hierarchical sparse coding Sparse codes are derived from raw bits The second-level sparse codes derived from the diagonal vectors of the covariance matrices of neighbouring patches Yu, K., Lin, Y., Lafferty, J.: Learning image representations from the pixel level via hierarchical sparse coding. In: Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. pp. 1713-1720
Deep sparse learning Stacked sparse coding He, Y., Kavukcuoglu, K., Wang, Y., Szlam, A., Qi, Y.: Unsupervised feature learning by deep sparse coding. arXiv preprint arXiv:1312.5783 (2013)
Deep sparse learning Learning sparse codes with deep models (predicted sparse decomposition, PSD) Kavukcuoglu, K., Fergus, R., LeCun, Y., et al.: Learning invariant features through topographic filter maps. In: Computer Vision and Pattern Recognition, 2009. CVPR 2009
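A minimal sketch of the PSD idea: infer sparse codes with an iterative solver (ISTA here), then fit a cheap feedforward encoder to predict those codes, so inference at test time is a single matrix product instead of an optimisation loop. The linear least-squares encoder below is a simplification (the paper trains a nonlinear encoder jointly with the dictionary); all dimensions and `lam` are illustrative:

```python
import numpy as np

def soft_threshold(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def sparse_codes(X, D, lam=0.1, n_iter=200):
    """ISTA: infer codes Z minimising 0.5*||X - ZD||^2 + lam*||Z||_1."""
    L = np.linalg.norm(D, 2) ** 2            # Lipschitz constant of the gradient
    Z = np.zeros((X.shape[0], D.shape[0]))
    for _ in range(n_iter):
        Z = soft_threshold(Z - (Z @ D - X) @ D.T / L, lam / L)
    return Z

rng = np.random.default_rng(0)
D = rng.standard_normal((32, 64))            # 32 dictionary atoms, 64-dim data
X = rng.standard_normal((100, 64))
Z = sparse_codes(X, D)                       # slow, iterative inference

# PSD step: fit a feedforward encoder W_e to *predict* Z from X, so that
# at test time one matrix multiply replaces the ISTA loop.
W_e, *_ = np.linalg.lstsq(X, Z, rcond=None)
Z_fast = X @ W_e                             # fast approximate codes
```

The encoder's prediction error is added to the sparse-coding objective during training, which couples the fast encoder and the dictionary.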
Contents Deep learning and sparse learning Sparse deep learning Deep sparse learning Applications in speech & language processing Conclusions
Sparse learning in speech processing Applied in many tasks Wang, D., Vipperla, R., Evans, N., Zheng, T.F.: Online non-negative convolutive pattern learning for speech signals. IEEE Transactions on Signal Processing 61(1), 44-56 (2013)
Deep learning in speech processing ASR, denoising, speaker recognition, language recognition, ...
Deep and sparse models in speech processing Mostly on weight regularization Rarely on unit sparsity Sivaram, G.S., Hermansky, H.: Multilayer perceptron with sparse hidden outputs for phoneme recognition. ICASSP'11
Deep and sparse model in speech processing Chao Liu, Dong Wang, Zhiyong Zhang, "Pruning Deep Neural Networks by Optimal Brain Damage", Interspeech 2014
Sparse models in language processing Sparse topic models L1 regularization Hierarchical model Zhu, J., Xing, E.P.: Sparse topical coding. In: UAI 2011
Sparse models in language processing Sparse coding for document clustering Wu, C., Yang, H., Zhu, J., Zhang, J., King, I., Lyu, M.R.: Sparse poisson coding for high dimensional document clustering. In: IEEE International Conference on Big Data (2013)
Sparse models in language processing Liu, H., Yu, H., Deng, Z.: Multi-document summarization based on two-level sparse representation model. In: National Conference on Artificial Intelligence (2015)
Deep models in language processing Language modeling, semantic parsing, paraphrase detection, machine translation, sentiment prediction... Cho, K., Merrienboer, B.V., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., Bengio, Y.: Learning phrase representations using RNN encoder-decoder for statistical machine translation. EMNLP 2014
Deep models in language processing Text generation Qixin Wang, Tianyi Luo, Dong Wang, Chao Xing, "Chinese Song Iambics Generation with Neural Attention-based Model", IJCAI 2016
Deep and sparse models in language processing Structured sparsity in learning word representations Yogatama, D.: Sparse Models of Natural Language Text. Ph.D. thesis, Carnegie Mellon University (2015)
Deep and sparse models in language processing Sparse word vectors Sun, F., Guo, J., Lan, Y., Xu, J., Cheng, X.: Sparse word embeddings using l1 regularized online learning. In: IJCAI 2016. pp. 2915-2921 Vyas, Y., Carpuat, M.: Sparse bilingual word representations for cross-lingual lexical entailment. In: NAACL 2016. pp. 1187-1197 (2016)
Contents Deep learning and sparse learning Sparse deep learning Deep sparse learning Applications in speech & language processing Conclusions
Conclusions Sparse learning and deep learning are two important aspects of modern machine learning They are correlated, but the marriage is still limited Two marriage approaches Sparse codes as information representation, deep learning as framework Deep learning leads to sparse codes How to proceed? Merge different sparsity constraints? Semi-supervised learning? Investigate the resemblance to biological neurons There is much room for further work
Thanks!