1
Deep Learning (II) Dong Wang CSLT ML Summer Seminar (5)
Many slides are from Bengio and LeCun's NIPS'15 tutorial. A few slides are from Deng's ICASSP'16 keynote speech.
2
Content Special nets Regularization in deep models Generative model
Achievements
3
CNN Convolution: an adaptive filtering / feature learning
Pooling: A way of imposing invariance A form of parameter sharing A form of structural knowledge A way of implementing human perception
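To make these points concrete, here is a minimal PyTorch sketch (not from the slides; layer sizes are illustrative): the convolution applies the same learned filters at every spatial position, and pooling makes the output insensitive to small translations.

import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, n_classes=10):
        super().__init__()
        # Convolution: adaptive filters, shared across all spatial positions.
        self.conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, padding=1)
        # Pooling: small shifts of the input give (nearly) the same output.
        self.pool = nn.MaxPool2d(kernel_size=2)
        self.fc = nn.Linear(8 * 14 * 14, n_classes)

    def forward(self, x):                # x: (batch, 1, 28, 28)
        h = torch.relu(self.conv(x))     # learned feature maps
        h = self.pool(h)                 # invariance / down-sampling
        return self.fc(h.flatten(1))

x = torch.randn(4, 1, 28, 28)
print(TinyCNN()(x).shape)                # torch.Size([4, 10])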
6
Recurrent net A way of parameter sharing
A way of learning temporal regularities A way of learning logical programs (inference) A way of implementing human memory and inference (but mixed)
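A minimal PyTorch sketch (illustrative, not from the slides) of the parameter-sharing idea: the same recurrent weights are reused at every time step, which is what lets the network pick up temporal regularities.

import torch
import torch.nn as nn

rnn = nn.RNN(input_size=16, hidden_size=32, batch_first=True)
x = torch.randn(4, 10, 16)           # (batch, time, features)
outputs, h_last = rnn(x)             # the same W_ih / W_hh are applied at all 10 steps
print(outputs.shape, h_last.shape)   # torch.Size([4, 10, 32]) torch.Size([1, 4, 32])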
16
Content Special nets Regularization in deep models Generative model
Achievements
17
Regularization DNN is a very flexible structure
Easy to overfit: bias-variance trade-off Easy to underfit: complex Hessian: local minima, saddle points, slow training... Introduce regularization to: Balance description & generalization Introduce human knowledge (a Bayesian prior?)
18
Various regularizations
Parameter norm penalty (L1, L2) Data augmentation Noise injection Hidden-unit noise injection (dropout) Target noise injection
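A hedged PyTorch sketch of three items from this list; the hyperparameters (weight decay, dropout rate, noise scale) are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(
    nn.Linear(100, 256), nn.ReLU(),
    nn.Dropout(p=0.5),                       # hidden-unit noise injection (dropout)
    nn.Linear(256, 10),
)
# L2 parameter norm penalty, implemented as weight decay in the optimizer.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-4)

x, y = torch.randn(32, 100), torch.randint(0, 10, (32,))
x_noisy = x + 0.1 * torch.randn_like(x)      # input noise injection
loss = F.cross_entropy(model(x_noisy), y)
loss.backward()
optimizer.step()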
19
Parameter sharing CNN RNN NADE
20
Associated training Semi-supervised learning Multitask learning
Collaborative learning Deeply-supervised nets Chen-Yu Lee et al., Deeply-Supervised Nets, DL workshop 2014.
21
Sparse representation
L0 and L1 norm penalties on hidden units Pretraining leads to sparsity CAE, sparse RBM, sparse DBN, denoising auto-encoder Winner-take-all convolution J. Li et al., Sparseness Analysis in the Pretraining of Deep Neural Networks, 2016
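A minimal sketch of a hidden-unit sparsity penalty, assuming a simple L1 term on the activations; this is one common realization, not the exact formulation used in the cited paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU())
decoder = nn.Linear(128, 784)

x = torch.rand(32, 784)
h = encoder(x)                              # hidden representation
recon_loss = F.mse_loss(decoder(h), x)
sparsity_penalty = 1e-3 * h.abs().mean()    # L1 penalty pushes hidden units toward zero
loss = recon_loss + sparsity_penalty
loss.backward()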
22
Adversarial training Find the misclassified (adversarial) examples
Train with these data to avoid the border effect
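A hedged sketch of adversarial training; the slide does not name a specific method, so the fast gradient sign method (FGSM) is used here as one common way to generate misclassified examples near the decision border.

import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(16, 20), torch.randint(0, 2, (16,))

# 1. Perturb x along the sign of the loss gradient to find adversarial examples.
x_adv = x.clone().requires_grad_(True)
F.cross_entropy(model(x_adv), y).backward()
x_adv = (x_adv + 0.1 * x_adv.grad.sign()).detach()

# 2. Train on the adversarial examples together with the clean ones.
opt.zero_grad()
loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
loss.backward()
opt.step()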
23
Manifold regularization
Make the tangent directions of the manifold orthogonal to the gradient of the cost The tangents can be obtained from unsupervised learning
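A sketch of one possible realization of this idea, assuming the tangent directions are given (e.g., estimated by unsupervised learning; `tangents` below is a placeholder): penalize the component of the cost gradient that lies along the tangents.

import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Linear(10, 32), nn.Tanh(), nn.Linear(32, 3))
x = torch.randn(8, 10, requires_grad=True)
y = torch.randint(0, 3, (8,))
tangents = torch.randn(8, 10)               # placeholder manifold tangent directions

loss = F.cross_entropy(model(x), y)
grad_x, = torch.autograd.grad(loss, x, create_graph=True)
# Penalize the projection of the cost gradient onto the tangent directions,
# pushing the two to be orthogonal.
ortho_penalty = ((grad_x * tangents).sum(dim=1) ** 2).mean()
(loss + 0.1 * ortho_penalty).backward()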
24
Dark knowledge transfer
Lili Mou, Ge Li, Yan Xu, Lu Zhang, Zhi Jin, Distilling Word Embeddings: An Encoding Approach, 2015. Train a teacher, and let the teacher teach the student Can be used to train a simple model from a complex model (e.g., an ensemble) Can be used to train a complex model from a simple model Long, Mingsheng and Wang, Jianmin, Learning Transferable Features with Deep Adaptation Networks, ICML 2015
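A hedged sketch of dark knowledge transfer (distillation): the student is trained to match the teacher's softened output distribution. The architectures and the temperature T are illustrative assumptions, not those of the cited papers.

import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(50, 256), nn.ReLU(), nn.Linear(256, 10))
student = nn.Sequential(nn.Linear(50, 32), nn.ReLU(), nn.Linear(32, 10))
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
T = 4.0                                       # temperature softens the targets

x = torch.randn(64, 50)
with torch.no_grad():
    soft_targets = F.softmax(teacher(x) / T, dim=1)   # the teacher's "dark knowledge"

log_probs = F.log_softmax(student(x) / T, dim=1)
loss = F.kl_div(log_probs, soft_targets, reduction='batchmean') * T * T
loss.backward()
opt.step()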
25
Content Special nets Regularization in deep models Generative model
Achievements
26
DNN is basically not very generative
G=(V,E) represents deterministic inference Probabilistic interpretation: Gaussian, Binomial, or MDN Some 'randomness' on inputs, labels, hidden units Basically not a generative model
27
Graphical models are generative
G=(V,E) represents the joint probability over V, with conditionals or potentials represented by E Probabilistic variables Probabilistic inference Probabilistic generation Graphical models and variational methods: Message-passing and relaxations, ICML 2008 Tutorial
28
Introducing latent variables for NN
Treat the NN as a feature transform In the transformed space, the variable is simple, e.g., Gaussian Generation becomes simple: sample the latent variable and run it through the network
29
Variational Bayes with auto-encoders
Kingma, D. P. and Welling, M. (2014). Auto-Encoding Variational Bayes. In ICLR. Danilo J. Rezende, Shakir Mohamed, Daan Wierstra, Stochastic Backpropagation and Approximate Inference in Deep Generative Models, ICML 2014.
30
Variational Auto-encoder
31
What has changed? Bayesian perspective Neural model perspective
An encoder (a parametric function) maps input x to a code z, where the distribution p(z) is simpler than p(x). Given x, p(z|x) stays simple Given z, the conditional probability p(x|z) is simpler than p(x). Everything seems simpler! Model training becomes parameter adjustment, using BP. Neural model perspective Randomness: can we still BP? Use MCMC sampling on the simple p(x|z). Seems like variational + MCMC
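A minimal VAE sketch in the spirit of Kingma & Welling (2014); layer sizes are illustrative. The encoder maps x to a simple Gaussian q(z|x), the decoder models p(x|z), and the reparameterization trick keeps the whole pipeline trainable by BP.

import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, x_dim=784, z_dim=20):
        super().__init__()
        self.enc = nn.Linear(x_dim, 400)
        self.mu = nn.Linear(400, z_dim)
        self.logvar = nn.Linear(400, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, 400), nn.ReLU(),
                                 nn.Linear(400, x_dim))

    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)   # reparameterization
        return self.dec(z), mu, logvar

x = torch.rand(32, 784)
recon, mu, logvar = VAE()(x)
recon_loss = F.binary_cross_entropy_with_logits(recon, x, reduction='sum')
kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())      # KL(q(z|x) || p(z))
loss = recon_loss + kl                                            # negative ELBO
loss.backward()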
32
Extend to other encoder-decoder models
33
Stochastic Recurrent Networks (STORNs)
Bayer, J. and Osendorfer, C. (2014). Learning stochastic recurrent networks. In NIPS Workshop on Advances in Variational Inference
34
Variational Recurrent AE (VRAE)
Fabius, O. and van Amersfoort, J. R. (2014). Variational Recurrent Auto-Encoders. arXiv. Music generation
35
Latent Variable Encoder-Decoder RNN
Iulian Vlad Serban, Alessandro Sordoni, Ryan Lowe, Laurent Charlin, Joelle Pineau, Aaron Courville, Yoshua Bengio, A Hierarchical Latent Variable Encoder-Decoder Model for Generating Dialogues, 2016/05/20
36
Variational RNN LM Bowman, S. R., Vilnis, L., Vinyals, O., Dai, A. M., Jozefowicz, R., and Bengio, S. (2015). Generating sentences from a continuous space. arXiv:
37
Variational image generation
DRAW: A Recurrent Neural Network For Image Generation
38
Generation using noise corruption
Inject noise, and let the NN recover the input This leads to learning the manifold on which the data lie This manifold can be used to generate x
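A minimal denoising auto-encoder training step (an illustrative sketch): corrupt the input with Gaussian noise and train the network to recover the clean input.

import torch
import torch.nn as nn
import torch.nn.functional as F

dae = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 784))
opt = torch.optim.Adam(dae.parameters(), lr=1e-3)

x = torch.rand(64, 784)                     # clean data
x_tilde = x + 0.3 * torch.randn_like(x)     # corrupted input
loss = F.mse_loss(dae(x_tilde), x)          # reconstruct the clean input
loss.backward()
opt.step()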
39
Denoising AE
40
DAE learns manifold and scores (gradients)
Guillaume Alain and Yoshua Bengio, What Regularized Auto-Encoders Learn from the Data Generating Distribution
41
DAE can be used to sample x
42
Any corruption + any cost
Yoshua Bengio, Li Yao, Guillaume Alain, and Pascal Vincent, Generalized Denoising Auto-Encoders as Generative Models.
43
Introducing latent variables
Yoshua Bengio et al., Deep Generative Stochastic Networks Trainable by Backprop.
44
Multi-step generation
Train a DAE with random corruption Reconstruct iteratively until convergence This amounts to settling into a minimum-energy state, i.e., a maximum of p(x) It can be proved that with symmetric corruption, the convergence point is a stationary point.
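A hedged sketch of the multi-step generation loop, assuming a DAE has already been trained (the `dae` below stands in for such a trained model): repeatedly corrupt and reconstruct until the sample settles.

import torch
import torch.nn as nn

# Assumed: a DAE with this architecture has already been trained as above.
dae = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 784))

x = torch.rand(1, 784)                      # start from a random point
with torch.no_grad():
    for _ in range(50):                     # corrupt / reconstruct until (near) convergence
        x_tilde = x + 0.1 * torch.randn_like(x)
        x = dae(x_tilde)
print(x.shape)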
45
Content Special nets Regularization in deep models Generative model
Achievements
46
Speech recognition
47
Facebook structure
53
Natural language processing
58
Joint semantic learning
Combine text knowledge and graph knowledge Embed words in the co-space Dongxu Zhang, Dong Wang, Rong Liu, "Joint Semantic Relevance Learning with Text Data and Graph Knowledge", ACL 2015, workshop CVSC
59
Relation classification
Using RNNs to classify relations
63
Poem generation
71
Another generation
74
Wrap up Deep learning research mostly focuses on imposing structure
Deep learning can be used as both a predictive and a descriptive tool. The former is more related to function approximation, and the latter is more related to abstraction and memorization. Deep learning has achieved brilliant improvements so far. It opens the door to true AI. Certainly, true AI is not only deep learning, as we will see.