
1 Deep Learning (II) Dong Wang, CSLT ML Summer Seminar (5)
Many slides are from Bengio and LeCun's NIPS'15 tutorial; a few slides are from Deng's ICASSP'16 keynote speech.

2 Content
- Special nets
- Regularization in deep models
- Generative models
- Achievements

3 CNN
- Convolution: adaptive filtering / feature learning
- Pooling: a way of imposing invariance
- A form of parameter sharing
- A form of structural knowledge
- A way of implementing human perception
(A convolution + pooling sketch follows.)
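To make the convolution/pooling pair concrete, here is a minimal NumPy sketch (not from the slides; all names and sizes are illustrative): the same 3x3 kernel is applied at every position (parameter sharing), and max pooling keeps only the strongest local response (invariance).

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (strictly, cross-correlation, as in most DL toolkits)."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling: keep the strongest response in each window."""
    H, W = feature_map.shape
    H, W = H - H % size, W - W % size            # drop the ragged border
    blocks = feature_map[:H, :W].reshape(H // size, size, W // size, size)
    return blocks.max(axis=(1, 3))

image  = np.random.randn(8, 8)                   # toy single-channel input
kernel = np.random.randn(3, 3)                   # one filter, shared over all positions
pooled = max_pool(conv2d(image, kernel))
print(pooled.shape)                              # (3, 3)
```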

4–5 (slides with no transcript text)

6 Recurrent net
- A way of parameter sharing (across time steps)
- A way of learning temporal regularities
- A way of learning logical programs (inference)
- A way of implementing human memory and inference (but mixed together)
(A minimal recurrence sketch follows.)
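A minimal sketch of the recurrence, assuming a vanilla (Elman-style) RNN; all weights and sizes are illustrative. The point is that the same W_xh and W_hh are reused at every time step.

```python
import numpy as np

def rnn_forward(x_seq, W_xh, W_hh, b_h):
    """Vanilla RNN: the SAME weights are reused at every time step (parameter sharing)."""
    h = np.zeros(W_hh.shape[0])
    states = []
    for x_t in x_seq:                            # walk through time
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        states.append(h)
    return np.stack(states)

T, d_in, d_h = 5, 3, 4
x_seq = np.random.randn(T, d_in)
W_xh  = np.random.randn(d_h, d_in) * 0.1
W_hh  = np.random.randn(d_h, d_h) * 0.1
b_h   = np.zeros(d_h)
print(rnn_forward(x_seq, W_xh, W_hh, b_h).shape)   # (5, 4)
```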

7–15 (slides with no transcript text)

16 Content
- Special nets
- Regularization in deep models
- Generative models
- Achievements

17 Regularization
- A DNN is a very flexible structure
- Prone to overfitting: the bias-variance trade-off
- Prone to underfitting: a complex Hessian leads to local minima, saddle points, slow training...
- Regularization is introduced to balance description and generalization, and to inject human knowledge (a Bayesian prior?)

18 Various regularizations
- Parameter norm penalty (L1, L2)
- Data augmentation
- Noise injection
- Hidden-unit noise injection (dropout)
- Target noise injection
(A sketch of an L2 penalty, input noise, and dropout follows.)
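A minimal NumPy sketch of three of these regularizers (L2 weight penalty, input noise injection, and dropout on hidden units); the two-layer network and the loss are dummy placeholders, not anything from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_penalty(weights, lam=1e-4):
    """Parameter norm penalty added to the training loss: lam * sum ||W||^2."""
    return lam * sum(np.sum(W ** 2) for W in weights)

def dropout(h, p_drop=0.5, training=True):
    """Hidden-unit noise injection: randomly zero units, rescale to keep the expectation."""
    if not training:
        return h
    mask = rng.random(h.shape) >= p_drop
    return h * mask / (1.0 - p_drop)

def noisy_input(x, sigma=0.1):
    """Input noise injection (a simple form of data augmentation)."""
    return x + sigma * rng.standard_normal(x.shape)

W = [rng.standard_normal((4, 3)), rng.standard_normal((2, 4))]
x = rng.standard_normal(3)
h = np.tanh(W[0] @ noisy_input(x))
y = W[1] @ dropout(h)
loss = np.sum(y ** 2) + l2_penalty(W)            # dummy task loss + weight penalty
print(round(float(loss), 3))
```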

19 Parameter sharing: CNN, RNN, NADE

20 Associated training
- Semi-supervised learning
- Multi-task learning
- Collaborative learning
- Supervised nets: Chen-Yu Lee et al., Deeply-Supervised Nets, DL Workshop 2014

21 Sparse representation
- L0 and L1 norm penalties on hidden units
- Pretraining leads to sparsity: CAE, sparse RBM, sparse DBN, denoising auto-encoder
- Winner-take-all convolution
- J. Li et al., Sparseness Analysis in the Pretraining of Deep Neural Networks, 2016
(A sketch of an L1 activity penalty follows.)
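As an illustration of "penalty on hidden units", a minimal sketch of an L1 activity penalty added to the training loss; the layer, sizes, and placeholder task loss are assumptions for the example only.

```python
import numpy as np

rng = np.random.default_rng(1)

def hidden_l1_penalty(h, lam=1e-3):
    """Sparsity regularizer: penalize the L1 norm of the hidden activations."""
    return lam * np.sum(np.abs(h))

W, b = rng.standard_normal((16, 8)) * 0.1, np.zeros(16)
x = rng.standard_normal(8)
h = np.maximum(0.0, W @ x + b)                   # ReLU hidden layer
task_loss = 0.0                                  # placeholder for the real task loss
loss = task_loss + hidden_l1_penalty(h)          # the penalty pushes activations toward 0
print(float(loss))
```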

22 Adversarial training
- Find examples that are easily misclassified
- Train with these data to avoid border effects
(A gradient-sign sketch follows.)
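One common way to construct such examples is a fast-gradient-sign style perturbation (Goodfellow et al.); the slides do not specify the method, so this is a sketch under that assumption, using a toy logistic-regression "network" with made-up weights.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adversarial_example(x, y, w, b, eps=0.1):
    """Perturb x in the direction that increases the loss,
    pushing it toward (or across) the decision border."""
    p = sigmoid(w @ x + b)
    grad_x = (p - y) * w                         # d(cross-entropy)/dx for logistic regression
    return x + eps * np.sign(grad_x)

rng = np.random.default_rng(2)
w, b = rng.standard_normal(5), 0.0
x, y = rng.standard_normal(5), 1.0
x_adv = adversarial_example(x, y, w, b)
# The model's confidence in the true label drops on the adversarial point.
print(sigmoid(w @ x + b), sigmoid(w @ x_adv + b))
```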

23 Manifold regularization
- Make the tangent directions of the data manifold orthogonal to the gradient of the cost
- The tangent directions can be obtained from unsupervised learning
(A tangent-penalty sketch follows.)
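This orthogonality constraint can be written as a penalty on the projection of the cost gradient onto the tangent directions (tangent-propagation style). A minimal sketch with a finite-difference gradient, a toy scalar "network output", and a made-up tangent vector; all of these are illustrative assumptions.

```python
import numpy as np

def numerical_grad(f, x, eps=1e-5):
    """Finite-difference gradient of a scalar function f at x."""
    g = np.zeros_like(x)
    for i in range(x.size):
        d = np.zeros_like(x); d[i] = eps
        g[i] = (f(x + d) - f(x - d)) / (2 * eps)
    return g

def tangent_penalty(f, x, tangents, lam=1.0):
    """Penalize the component of the cost gradient along the manifold tangents,
    i.e. encourage the output to stay flat when x moves along the manifold."""
    g = numerical_grad(f, x)
    return lam * sum(float(g @ t) ** 2 for t in tangents)

rng = np.random.default_rng(3)
w = rng.standard_normal(4)
f = lambda x: float(np.tanh(w @ x))              # toy scalar "network output"
x = rng.standard_normal(4)
tangents = [rng.standard_normal(4)]              # in practice: estimated by unsupervised learning
print(tangent_penalty(f, x, tangents))
```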

24 Dark knowledge transfer
- Train a teacher model, and let the teacher teach a student model
- Can be used to train a simple model from a complex one (e.g., an ensemble)
- Can be used to train a complex model from a simple one
- Lili Mou, Ge Li, Yan Xu, Lu Zhang, Zhi Jin, Distilling Word Embeddings: An Encoding Approach, 2015
- Mingsheng Long and Jianmin Wang, Learning Transferable Features with Deep Adaptation Networks, ICML 2015
(A distillation-loss sketch follows.)
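The usual recipe (Hinton et al.'s knowledge distillation) trains the student on the teacher's temperature-softened output distribution; a minimal NumPy sketch of that soft-target loss, with made-up logits.

```python
import numpy as np

def softmax(z, T=1.0):
    """Softmax with temperature T; larger T gives softer ('darker') probabilities."""
    z = z / T
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """Cross-entropy between the teacher's soft targets and the student's soft output."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    return -float(np.sum(p_teacher * np.log(p_student + 1e-12)))

teacher_logits = np.array([5.0, 2.0, 0.5])       # produced by a big (ensemble) model
student_logits = np.array([2.0, 1.0, 0.2])       # produced by the small model being trained
print(distillation_loss(student_logits, teacher_logits))
```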

25 Content
- Special nets
- Regularization in deep models
- Generative models
- Achievements

26 A DNN is basically not very generative
- G=(V,E) represents deterministic inference
- Probabilistic interpretation of the output: Gaussian, Binomial, or MDN
- Some 'randomness' on inputs, labels, or hidden units
- Basically not a generative model

27 Graphical models are generative
- G=(V,E) represents a joint probability over V, with conditionals or potentials attached to E
- Probabilistic variables
- Probabilistic inference
- Probabilistic generation
- Graphical Models and Variational Methods: Message-Passing and Relaxations, ICML 2008 Tutorial

28 Introducing latent variables for NNs
- Treat the NN as a feature transform
- In the transformed space the variable is simple, e.g., Gaussian
- Generation becomes simple: sample the latent variable and run it through the network

29 Variational Bayes with an auto-encoder
- Kingma, D. P. and Welling, M. (2014). Auto-Encoding Variational Bayes. In ICLR.
- Danilo J. Rezende, Shakir Mohamed, Daan Wierstra, Stochastic Backpropagation and Approximate Inference in Deep Generative Models, ICML 2014.

30 Variational Auto-encoder

31 What has changed?
- Bayesian perspective: an encoder (a parametric function) maps the input x to a code z, where the distribution p(z) is simpler than p(x). Given x, p(z|x) stays simple; given z, the conditional p(x|z) is simpler than p(x). Everything becomes simpler, and model training becomes parameter adjustment via backpropagation.
- Neural model perspective: randomness is introduced, so can we still backpropagate? Yes, by Monte Carlo sampling from the simple latent distribution (the reparameterization trick). The result looks like variational inference plus sampling.
(A reparameterization sketch follows.)
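A minimal NumPy sketch of the reparameterization trick and the two ELBO terms (reconstruction plus KL to a standard normal prior). The linear "encoder" and "decoder" here are placeholders of my own, not the cited papers' architectures, and no training loop is shown.

```python
import numpy as np

rng = np.random.default_rng(4)
d_x, d_z = 6, 2

# Linear "encoder" producing the Gaussian posterior q(z|x) = N(mu, diag(sigma^2))
W_mu, W_logvar = rng.standard_normal((d_z, d_x)) * 0.1, rng.standard_normal((d_z, d_x)) * 0.1
# Linear "decoder" producing the mean of p(x|z)
W_dec = rng.standard_normal((d_x, d_z)) * 0.1

x = rng.standard_normal(d_x)
mu, logvar = W_mu @ x, W_logvar @ x
sigma = np.exp(0.5 * logvar)

# Reparameterization trick: sample eps, then z is a deterministic, differentiable
# function of (mu, sigma), so gradients can flow through the sampling step.
eps = rng.standard_normal(d_z)
z = mu + sigma * eps

x_hat = W_dec @ z
recon = 0.5 * np.sum((x - x_hat) ** 2)                         # reconstruction term (Gaussian likelihood)
kl = 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar)     # KL(q(z|x) || N(0, I)) in closed form
neg_elbo = recon + kl                                          # minimize this = maximize the ELBO
print(float(neg_elbo))
```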

32 Extend to other encoder-decoder models

33 Stochastic Recurrent Networks (STORNs)
Bayer, J. and Osendorfer, C. (2014). Learning stochastic recurrent networks. In NIPS Workshop on Advances in Variational Inference

34 Variational Recurrent AE (VRAE)
Fabius, O. and van Amersfoort, J. R. (2014). Variational Recurrent Auto-Encoders. arXiv: (music generation)

35 Latent Variable Encoder-Decoder RNN
Iulian Vlad Serban, Alessandro Sordoni, Ryan Lowe, Laurent Charlin, Joelle Pineau, Aaron Courville, Yoshua Bengio, A Hierarchical Latent Variable Encoder-Decoder Model for Generating Dialogues, 2016/05/20

36 Variational RNN LM Bowman, S. R., Vilnis, L., Vinyals, O., Dai, A. M., Jozefowicz, R., and Bengio, S. (2015). Generating sentences from a continuous space. arXiv:

37 Variational image generation
DRAW: A Recurrent Neural Network For Image Generation

38 Generation using noise corruption
- Inject noise and let the NN recover the input
- This amounts to learning the manifold on which the data lie
- The manifold can then be used to generate x
(A denoising auto-encoder sketch follows.)
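A minimal sketch of the denoising objective: corrupt x, push it through a small encoder/decoder, and measure reconstruction against the clean input. The weights, sizes, and Gaussian corruption are illustrative assumptions, and no training loop is shown.

```python
import numpy as np

rng = np.random.default_rng(5)
d_x, d_h = 8, 4
W_enc, W_dec = rng.standard_normal((d_h, d_x)) * 0.1, rng.standard_normal((d_x, d_h)) * 0.1

def corrupt(x, sigma=0.3):
    """Gaussian corruption of the input."""
    return x + sigma * rng.standard_normal(x.shape)

def reconstruct(x_tilde):
    """Encoder/decoder pair mapping a corrupted input back toward the clean data."""
    h = np.tanh(W_enc @ x_tilde)
    return W_dec @ h

x = rng.standard_normal(d_x)
x_hat = reconstruct(corrupt(x))
denoising_loss = np.sum((x - x_hat) ** 2)        # compared against the CLEAN input
print(float(denoising_loss))
```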

39 Denoising AE

40 A DAE learns the manifold and the score (gradient of the log-density)
Guillaume Alain and Yoshua Bengio, What Regularized Auto-Encoders Learn from the Data Generating Distribution

41 A DAE can be used to sample x

42 Any corruption + any cost
Yoshua Bengio, Li Yao, Guillaume Alain, and Pascal Vincent, Generalized Denoising Auto-Encoders as Generative Models.

43 Introducing latent variables
Yoshua Bengio et al., Deep Generative Stochastic Networks Trainable by Backprop.

44 Multi-step generation
- Train a DAE with random corruption
- Reconstruct iteratively until convergence
- This amounts to settling into a minimum-energy state, i.e., a maximum of p(x)
- It can be proved that, with symmetric corruption, the chain converges to a stationary point
(A corrupt-and-reconstruct sampling loop follows.)
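A minimal sketch of the corrupt-and-reconstruct Markov chain. The toy DAE below is untrained (its weights are random placeholders), so it will not produce meaningful samples; the sketch only shows the control flow of multi-step generation.

```python
import numpy as np

rng = np.random.default_rng(6)
d_x, d_h = 8, 4
# Toy (untrained) DAE weights; in practice these come from denoising training.
W_enc, W_dec = rng.standard_normal((d_h, d_x)) * 0.1, rng.standard_normal((d_x, d_h)) * 0.1

def reconstruct(x):
    return W_dec @ np.tanh(W_enc @ x)

def dae_markov_chain(x0, steps=50, sigma=0.3):
    """Alternate corruption and reconstruction; the chain wanders over the learned manifold."""
    x = x0
    samples = []
    for _ in range(steps):
        x_tilde = x + sigma * rng.standard_normal(x.shape)   # corrupt
        x = reconstruct(x_tilde)                              # denoise back toward the manifold
        samples.append(x)
    return samples

samples = dae_markov_chain(rng.standard_normal(d_x))
print(len(samples), samples[-1].shape)            # 50 (8,)
```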

45 Content
- Special nets
- Regularization in deep models
- Generative models
- Achievements

46 Speech recognition

47 Facebook structure

48–52 (slides with no transcript text)

53 Natural language processing

54–57 (slides with no transcript text)

58 Joint semantic learning
- Combine text knowledge and graph knowledge
- Embed words in a common space
- Dongxu Zhang, Dong Wang, Rong Liu, "Joint Semantic Relevance Learning with Text Data and Graph Knowledge", ACL 2015 CVSC Workshop

59 Relation classification
Using an RNN to classify relations

60–62 (slides with no transcript text)

63 Poem generation

64–70 (slides with no transcript text)

71 Another generation

72–73 (slides with no transcript text)

74 Wrap up
- Deep learning research mostly focuses on imposing structure
- Deep learning can be used both predictively and descriptively: the former is more about function approximation, the latter more about abstraction and memorization
- Deep learning has achieved brilliant improvements so far and opens the door to true AI
- Certainly, true AI is not only deep learning, as we will see

