
1 Deep Learning (II) Dong Wang, CSLT ML Summer Seminar (5)
Many slides are from Bengio and LeCun's NIPS'15 tutorial; a few slides are from Deng's ICASSP'16 keynote speech.

2 Content
- Special nets
- Regularization in deep models
- Generative models
- Achievements

3 CNN
- Convolution: adaptive filtering / feature learning
- Pooling: a way of imposing invariance
- A form of parameter sharing
- A form of structural knowledge
- A way of implementing human perception
(A convolution + pooling sketch follows.)
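To make the convolution/pooling pair concrete, here is a minimal NumPy sketch (not from the slides; all names and sizes are illustrative): the same 3x3 kernel is applied at every position (parameter sharing), and max pooling keeps only the strongest local response (invariance).

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (strictly, cross-correlation, as in most DL toolkits)."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling: keep the strongest response in each window."""
    H, W = feature_map.shape
    H, W = H - H % size, W - W % size            # drop the ragged border
    blocks = feature_map[:H, :W].reshape(H // size, size, W // size, size)
    return blocks.max(axis=(1, 3))

image  = np.random.randn(8, 8)                   # toy single-channel input
kernel = np.random.randn(3, 3)                   # one filter, shared over all positions
pooled = max_pool(conv2d(image, kernel))
print(pooled.shape)                              # (3, 3)
```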

4–5 (slides with no transcript text)

6 Recurrent net
- A way of parameter sharing (across time steps)
- A way of learning temporal regularities
- A way of learning logical programs (inference)
- A way of implementing human memory and inference (but mixed together)
(A minimal recurrence sketch follows.)
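A minimal sketch of the recurrence, assuming a vanilla (Elman-style) RNN; all weights and sizes are illustrative. The point is that the same W_xh and W_hh are reused at every time step.

```python
import numpy as np

def rnn_forward(x_seq, W_xh, W_hh, b_h):
    """Vanilla RNN: the SAME weights are reused at every time step (parameter sharing)."""
    h = np.zeros(W_hh.shape[0])
    states = []
    for x_t in x_seq:                            # walk through time
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        states.append(h)
    return np.stack(states)

T, d_in, d_h = 5, 3, 4
x_seq = np.random.randn(T, d_in)
W_xh  = np.random.randn(d_h, d_in) * 0.1
W_hh  = np.random.randn(d_h, d_h) * 0.1
b_h   = np.zeros(d_h)
print(rnn_forward(x_seq, W_xh, W_hh, b_h).shape)   # (5, 4)
```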

7–15 (slides with no transcript text)

16 Content
- Special nets
- Regularization in deep models
- Generative models
- Achievements

17 Regularization
- A DNN is a very flexible structure
- Prone to overfitting: the bias-variance trade-off
- Prone to underfitting: a complex Hessian leads to local minima, saddle points, slow training...
- Regularization is introduced to balance description and generalization, and to inject human knowledge (a Bayesian prior?)

18 Various regularizations
- Parameter norm penalty (L1, L2)
- Data augmentation
- Noise injection
- Hidden-unit noise injection (dropout)
- Target noise injection
(A sketch of an L2 penalty, input noise, and dropout follows.)
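A minimal NumPy sketch of three of these regularizers (L2 weight penalty, input noise injection, and dropout on hidden units); the two-layer network and the loss are dummy placeholders, not anything from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_penalty(weights, lam=1e-4):
    """Parameter norm penalty added to the training loss: lam * sum ||W||^2."""
    return lam * sum(np.sum(W ** 2) for W in weights)

def dropout(h, p_drop=0.5, training=True):
    """Hidden-unit noise injection: randomly zero units, rescale to keep the expectation."""
    if not training:
        return h
    mask = rng.random(h.shape) >= p_drop
    return h * mask / (1.0 - p_drop)

def noisy_input(x, sigma=0.1):
    """Input noise injection (a simple form of data augmentation)."""
    return x + sigma * rng.standard_normal(x.shape)

W = [rng.standard_normal((4, 3)), rng.standard_normal((2, 4))]
x = rng.standard_normal(3)
h = np.tanh(W[0] @ noisy_input(x))
y = W[1] @ dropout(h)
loss = np.sum(y ** 2) + l2_penalty(W)            # dummy task loss + weight penalty
print(round(float(loss), 3))
```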

19 Parameter sharing: CNN, RNN, NADE

20 Associated training
- Semi-supervised learning
- Multi-task learning
- Collaborative learning
- Supervised nets: Chen-Yu Lee et al., Deeply-Supervised Nets, DL Workshop 2014

21 Sparse representation
- L0 and L1 norm penalties on hidden units
- Pretraining leads to sparsity: CAE, sparse RBM, sparse DBN, denoising auto-encoder
- Winner-take-all convolution
- J. Li et al., Sparseness Analysis in the Pretraining of Deep Neural Networks, 2016
(A sketch of an L1 activity penalty follows.)
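As an illustration of "penalty on hidden units", a minimal sketch of an L1 activity penalty added to the training loss; the layer, sizes, and placeholder task loss are assumptions for the example only.

```python
import numpy as np

rng = np.random.default_rng(1)

def hidden_l1_penalty(h, lam=1e-3):
    """Sparsity regularizer: penalize the L1 norm of the hidden activations."""
    return lam * np.sum(np.abs(h))

W, b = rng.standard_normal((16, 8)) * 0.1, np.zeros(16)
x = rng.standard_normal(8)
h = np.maximum(0.0, W @ x + b)                   # ReLU hidden layer
task_loss = 0.0                                  # placeholder for the real task loss
loss = task_loss + hidden_l1_penalty(h)          # the penalty pushes activations toward 0
print(float(loss))
```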

22 Adversarial training
- Find examples that are easily misclassified
- Train with these data to avoid border effects
(A gradient-sign sketch follows.)
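One common way to construct such examples is a fast-gradient-sign style perturbation (Goodfellow et al.); the slides do not specify the method, so this is a sketch under that assumption, using a toy logistic-regression "network" with made-up weights.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adversarial_example(x, y, w, b, eps=0.1):
    """Perturb x in the direction that increases the loss,
    pushing it toward (or across) the decision border."""
    p = sigmoid(w @ x + b)
    grad_x = (p - y) * w                         # d(cross-entropy)/dx for logistic regression
    return x + eps * np.sign(grad_x)

rng = np.random.default_rng(2)
w, b = rng.standard_normal(5), 0.0
x, y = rng.standard_normal(5), 1.0
x_adv = adversarial_example(x, y, w, b)
# The model's confidence in the true label drops on the adversarial point.
print(sigmoid(w @ x + b), sigmoid(w @ x_adv + b))
```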

23 Manifold regularization
- Make the tangent directions of the data manifold orthogonal to the gradient of the cost
- The tangent directions can be obtained from unsupervised learning
(A tangent-penalty sketch follows.)
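This orthogonality constraint can be written as a penalty on the projection of the cost gradient onto the tangent directions (tangent-propagation style). A minimal sketch with a finite-difference gradient, a toy scalar "network output", and a made-up tangent vector; all of these are illustrative assumptions.

```python
import numpy as np

def numerical_grad(f, x, eps=1e-5):
    """Finite-difference gradient of a scalar function f at x."""
    g = np.zeros_like(x)
    for i in range(x.size):
        d = np.zeros_like(x); d[i] = eps
        g[i] = (f(x + d) - f(x - d)) / (2 * eps)
    return g

def tangent_penalty(f, x, tangents, lam=1.0):
    """Penalize the component of the cost gradient along the manifold tangents,
    i.e. encourage the output to stay flat when x moves along the manifold."""
    g = numerical_grad(f, x)
    return lam * sum(float(g @ t) ** 2 for t in tangents)

rng = np.random.default_rng(3)
w = rng.standard_normal(4)
f = lambda x: float(np.tanh(w @ x))              # toy scalar "network output"
x = rng.standard_normal(4)
tangents = [rng.standard_normal(4)]              # in practice: estimated by unsupervised learning
print(tangent_penalty(f, x, tangents))
```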

24 Dark knowledge transfer
- Train a teacher model, and let the teacher teach a student model
- Can be used to train a simple model from a complex one (e.g., an ensemble)
- Can be used to train a complex model from a simple one
- Lili Mou, Ge Li, Yan Xu, Lu Zhang, Zhi Jin, Distilling Word Embeddings: An Encoding Approach, 2015
- Mingsheng Long and Jianmin Wang, Learning Transferable Features with Deep Adaptation Networks, ICML 2015
(A distillation-loss sketch follows.)
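The usual recipe (Hinton et al.'s knowledge distillation) trains the student on the teacher's temperature-softened output distribution; a minimal NumPy sketch of that soft-target loss, with made-up logits.

```python
import numpy as np

def softmax(z, T=1.0):
    """Softmax with temperature T; larger T gives softer ('darker') probabilities."""
    z = z / T
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """Cross-entropy between the teacher's soft targets and the student's soft output."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    return -float(np.sum(p_teacher * np.log(p_student + 1e-12)))

teacher_logits = np.array([5.0, 2.0, 0.5])       # produced by a big (ensemble) model
student_logits = np.array([2.0, 1.0, 0.2])       # produced by the small model being trained
print(distillation_loss(student_logits, teacher_logits))
```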

25 Content
- Special nets
- Regularization in deep models
- Generative models
- Achievements

26 A DNN is basically not very generative
- G=(V,E) represents deterministic inference
- Probabilistic interpretation of the output: Gaussian, Binomial, or MDN
- Some 'randomness' on inputs, labels, or hidden units
- Basically not a generative model

27 Graphical models are generative
- G=(V,E) represents a joint probability over V, with conditionals or potentials attached to E
- Probabilistic variables
- Probabilistic inference
- Probabilistic generation
- Graphical Models and Variational Methods: Message-Passing and Relaxations, ICML 2008 Tutorial

28 Introducing latent variables for NNs
- Treat the NN as a feature transform
- In the transformed space the variable is simple, e.g., Gaussian
- Generation becomes simple: sample the latent variable and run it through the network

29 Variational Bayes with an auto-encoder
- Kingma, D. P. and Welling, M. (2014). Auto-Encoding Variational Bayes. In ICLR.
- Danilo J. Rezende, Shakir Mohamed, Daan Wierstra, Stochastic Backpropagation and Approximate Inference in Deep Generative Models, ICML 2014.

30 Variational Auto-encoder

31 What has changed?
- Bayesian perspective: an encoder (a parametric function) maps the input x to a code z, where the distribution p(z) is simpler than p(x). Given x, p(z|x) stays simple; given z, the conditional p(x|z) is simpler than p(x). Everything becomes simpler, and model training becomes parameter adjustment via backpropagation.
- Neural model perspective: randomness is introduced, so can we still backpropagate? Yes, by Monte Carlo sampling from the simple latent distribution (the reparameterization trick). The result looks like variational inference plus sampling.
(A reparameterization sketch follows.)
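A minimal NumPy sketch of the reparameterization trick and the two ELBO terms (reconstruction plus KL to a standard normal prior). The linear "encoder" and "decoder" here are placeholders of my own, not the cited papers' architectures, and no training loop is shown.

```python
import numpy as np

rng = np.random.default_rng(4)
d_x, d_z = 6, 2

# Linear "encoder" producing the Gaussian posterior q(z|x) = N(mu, diag(sigma^2))
W_mu, W_logvar = rng.standard_normal((d_z, d_x)) * 0.1, rng.standard_normal((d_z, d_x)) * 0.1
# Linear "decoder" producing the mean of p(x|z)
W_dec = rng.standard_normal((d_x, d_z)) * 0.1

x = rng.standard_normal(d_x)
mu, logvar = W_mu @ x, W_logvar @ x
sigma = np.exp(0.5 * logvar)

# Reparameterization trick: sample eps, then z is a deterministic, differentiable
# function of (mu, sigma), so gradients can flow through the sampling step.
eps = rng.standard_normal(d_z)
z = mu + sigma * eps

x_hat = W_dec @ z
recon = 0.5 * np.sum((x - x_hat) ** 2)                         # reconstruction term (Gaussian likelihood)
kl = 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar)     # KL(q(z|x) || N(0, I)) in closed form
neg_elbo = recon + kl                                          # minimize this = maximize the ELBO
print(float(neg_elbo))
```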

32 Extend to other encoder-decoder models

33 Stochastic Recurrent Networks (STORNs)
Bayer, J. and Osendorfer, C. (2014). Learning stochastic recurrent networks. In NIPS Workshop on Advances in Variational Inference

34 Variational Recurrent AE (VRAE)
Fabius, O. and van Amersfoort, J. R. (2014). Variational Recurrent Auto-Encoders. arXiv: (music generation)

35 Latent Variable Encoder-Decoder RNN
Iulian Vlad Serban, Alessandro Sordoni, Ryan Lowe, Laurent Charlin, Joelle Pineau, Aaron Courville, Yoshua Bengio, A Hierarchical Latent Variable Encoder-Decoder Model for Generating Dialogues, 2016/05/20

36 Variational RNN LM Bowman, S. R., Vilnis, L., Vinyals, O., Dai, A. M., Jozefowicz, R., and Bengio, S. (2015). Generating sentences from a continuous space. arXiv:

37 Variational image generation
DRAW: A Recurrent Neural Network For Image Generation

38 Generation using noise corruption
- Inject noise and let the NN recover the input
- This amounts to learning the manifold on which the data lie
- The manifold can then be used to generate x
(A denoising auto-encoder sketch follows.)
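A minimal sketch of the denoising objective: corrupt x, push it through a small encoder/decoder, and measure reconstruction against the clean input. The weights, sizes, and Gaussian corruption are illustrative assumptions, and no training loop is shown.

```python
import numpy as np

rng = np.random.default_rng(5)
d_x, d_h = 8, 4
W_enc, W_dec = rng.standard_normal((d_h, d_x)) * 0.1, rng.standard_normal((d_x, d_h)) * 0.1

def corrupt(x, sigma=0.3):
    """Gaussian corruption of the input."""
    return x + sigma * rng.standard_normal(x.shape)

def reconstruct(x_tilde):
    """Encoder/decoder pair mapping a corrupted input back toward the clean data."""
    h = np.tanh(W_enc @ x_tilde)
    return W_dec @ h

x = rng.standard_normal(d_x)
x_hat = reconstruct(corrupt(x))
denoising_loss = np.sum((x - x_hat) ** 2)        # compared against the CLEAN input
print(float(denoising_loss))
```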

39 Denoising AE

40 A DAE learns the manifold and the score (gradient of the log-density)
Guillaume Alain and Yoshua Bengio, What Regularized Auto-Encoders Learn from the Data Generating Distribution

41 A DAE can be used to sample x

42 Any corruption + any cost
Yoshua Bengio, Li Yao, Guillaume Alain, and Pascal Vincent, Generalized Denoising Auto-Encoders as Generative Models.

43 Introducing latent variables
Yoshua Bengio et al., Deep Generative Stochastic Networks Trainable by Backprop.

44 Multi-step generation
- Train a DAE with random corruption
- Reconstruct iteratively until convergence
- This amounts to settling into a minimum-energy state, i.e., a maximum of p(x)
- It can be proved that, with symmetric corruption, the chain converges to a stationary point
(A corrupt-and-reconstruct sampling loop follows.)
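A minimal sketch of the corrupt-and-reconstruct Markov chain. The toy DAE below is untrained (its weights are random placeholders), so it will not produce meaningful samples; the sketch only shows the control flow of multi-step generation.

```python
import numpy as np

rng = np.random.default_rng(6)
d_x, d_h = 8, 4
# Toy (untrained) DAE weights; in practice these come from denoising training.
W_enc, W_dec = rng.standard_normal((d_h, d_x)) * 0.1, rng.standard_normal((d_x, d_h)) * 0.1

def reconstruct(x):
    return W_dec @ np.tanh(W_enc @ x)

def dae_markov_chain(x0, steps=50, sigma=0.3):
    """Alternate corruption and reconstruction; the chain wanders over the learned manifold."""
    x = x0
    samples = []
    for _ in range(steps):
        x_tilde = x + sigma * rng.standard_normal(x.shape)   # corrupt
        x = reconstruct(x_tilde)                              # denoise back toward the manifold
        samples.append(x)
    return samples

samples = dae_markov_chain(rng.standard_normal(d_x))
print(len(samples), samples[-1].shape)            # 50 (8,)
```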

45 Content
- Special nets
- Regularization in deep models
- Generative models
- Achievements

46 Speech recognition

47 Facebook structure

48–52 (slides with no transcript text)

53 Natural language processing

54–57 (slides with no transcript text)

58 Joint semantic learning
- Combine text knowledge and graph knowledge
- Embed words in a common space
- Dongxu Zhang, Dong Wang, Rong Liu, "Joint Semantic Relevance Learning with Text Data and Graph Knowledge", ACL 2015 CVSC Workshop

59 Relation classification
Using an RNN to classify relations

60–62 (slides with no transcript text)

63 Poem generation

64–70 (slides with no transcript text)

71 Another generation

72–73 (slides with no transcript text)

74 Wrap up
- Deep learning research mostly focuses on imposing structure
- Deep learning can be used both predictively and descriptively: the former is more about function approximation, the latter more about abstraction and memorization
- Deep learning has achieved brilliant improvements so far and opens the door to true AI
- Certainly, true AI is not only deep learning, as we will see

