Welcome to Deep Loria!
Deep Loria
Mailing list: deeploria@inria.fr
Web site:
Git repository:
Deeploria: involvement
I'm no DL expert! (at most a trigger)
Deeploria will be what you make of it: volunteers needed!
Propose anything: organize, participate, animate...
Next meeting (please think about it):
Coffee & discussion session? → paper reading group: who's willing to take care of it?
Demo for Yann LeCun's visit?
...?
Outline
Motivation
Lightning-speed overview of DNN basics:
Neuron vs. random variable; activations
Layers: dense, RNN
Vanishing gradient
More layers: LSTM, RBM/DBN, CNN, autoencoder
Implementation with Keras/Theano
Why all this buzz about DNNs?
Because of their expressive power; cf. “On the Expressive Power of Deep Learning: A Tensor Analysis” by Nadav Cohen, Or Sharir, Amnon Shashua:
“[...] besides a negligible set, all functions that can be implemented by a deep network of polynomial size, require an exponential size if one wishes to implement (or approximate) them with a shallow network”
Basic neuron
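A minimal numpy sketch (illustrative names, not from the slide): a neuron computes a weighted sum of its inputs plus a bias, then applies an activation such as the sigmoid.

import numpy as np

def neuron(x, w, b):
    # weighted sum of the inputs plus a bias, squashed by a sigmoid activation
    z = np.dot(w, x) + b
    return 1.0 / (1.0 + np.exp(-z))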
Activations
sigmoid = logistic
relu = rectified linear
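For reference, a sketch of both activations in numpy (illustrative, not from the slide):

import numpy as np

def sigmoid(z):
    # logistic function: maps any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # rectified linear unit: 0 for negative inputs, identity otherwise
    return np.maximum(0.0, z)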
Dense layer
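A dense (fully connected) layer is just many such neurons sharing the same input vector; a numpy sketch with illustrative names (W is the weight matrix, b the bias vector):

import numpy as np

def dense_layer(x, W, b):
    # each row of W holds the weights of one output neuron: y = sigmoid(W x + b)
    z = W.dot(x) + b
    return 1.0 / (1.0 + np.exp(-z))

In Keras this corresponds to the Dense layer, e.g. Dense(512, activation='relu').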
Alternative “neuron”
Graphical model: node = random variable
Connection = “dependency” between variables
Restricted Boltzmann Machine (RBM)
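As a rough sketch of the standard formulation (notation assumed, not from the slide): an RBM assigns an energy to each joint configuration of visible units v and hidden units h, with a weight matrix W and biases a, b; low energy means a likely configuration.

import numpy as np

def rbm_energy(v, h, W, a, b):
    # E(v, h) = -a.v - b.h - v.W.h  (standard RBM energy)
    return -np.dot(a, v) - np.dot(b, h) - v.dot(W).dot(h)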
Training
Dense: minimize the error. Stochastic Gradient Descent (SGD) = gradient descent (back-propagation)
RBM: minimize the energy. Contrastive Divergence = gradient descent (Gibbs sampling)
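For the dense case, a minimal sketch of a single SGD update (illustrative names; the gradient itself comes from back-propagation):

def sgd_step(weights, gradient, lr=0.01):
    # move the weights a small step against the gradient of the error
    return weights - lr * gradient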
DNN vs. DBN
N x Dense → DNN (Deep Neural Network)
N x RBM → DBN (Deep Belief Network)
Dense layers are discriminative = they model the “boundary” between classes
RBMs are generative = they model every class
Performance: RBM better (?)
Efficiency: RBMs are much more difficult to train
Usage: 90% Dense
Recurrent neural network
Take the past into account to predict the next step (just like HMMs, CRFs...)
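A numpy sketch of the usual recurrence (assumed notation, not from the slide): the hidden state at step t mixes the current input with the previous hidden state.

import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    # new state = tanh(W_x x_t + W_h h_prev + b); h carries the past forward
    return np.tanh(W_x.dot(x_t) + W_h.dot(h_prev) + b)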
Issue 1: Vanishing gradient
Back-propagation of the error E = chain rule: N layers → N factors of activation gradients
The gradient decreases exponentially with N
Consequence: the deepest layers are never learnt
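A quick numerical illustration (mine, not from the slide): the sigmoid's derivative is at most 0.25, so a product of N such factors shrinks exponentially with the depth N.

# upper bound on a product of N sigmoid-derivative factors
for N in (1, 5, 10, 20):
    print(N, 0.25 ** N)   # 0.25, ~1e-3, ~1e-6, ~1e-12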
Vanishing gradient: solutions
More data!
Rectified linear units (gradient = 1)
Unsupervised pre-training: DBNs, autoencoders
LSTMs instead of RNNs
Autoencoders
Re-create the inputs = a model of the data, with dimensionality reduction = compression
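A minimal Keras sketch (illustrative sizes and imports; the exact API depends on the Keras version): the network is trained to reproduce its own input through a narrow middle layer.

from keras.models import Sequential
from keras.layers.core import Dense

autoencoder = Sequential()
autoencoder.add(Dense(32, input_dim=784, activation='relu'))   # encoder: compress to 32 dims
autoencoder.add(Dense(784, activation='sigmoid'))              # decoder: reconstruct the input
autoencoder.compile(loss='mse', optimizer='sgd')
# autoencoder.fit(X, X, ...)  -- the targets are the inputs themselves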
LSTM
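In Keras, replacing a plain recurrent layer by an LSTM is a one-line change; a sketch with illustrative sizes (10 time steps of 32 features; API details depend on the Keras version):

from keras.models import Sequential
from keras.layers.recurrent import SimpleRNN, LSTM

model = Sequential()
# model.add(SimpleRNN(128, input_shape=(10, 32)))  # plain RNN: prone to vanishing gradients
model.add(LSTM(128, input_shape=(10, 32)))         # gated memory cell: gradients survive much longer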
Vanishing gradient
Issue 2: Overfitting
Overfitting: solutions
Share weights: e.g., convolutional layers
Regularization: e.g., dropout, drop-connect...
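For instance, dropout in Keras is just one extra layer; a sketch with illustrative sizes (50% of the activations are randomly zeroed during training):

from keras.models import Sequential
from keras.layers.core import Dense, Dropout

model = Sequential()
model.add(Dense(512, input_dim=1000, activation='relu'))
model.add(Dropout(0.5))                    # randomly drop half of the activations each update
model.add(Dense(46, activation='softmax'))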
Time to code, isn't it?
Keras example: Reuters
Trains an MLP to classify texts into 46 topics.
In the root dir of Keras, run: python examples/reuters_mlp.py
Keras example: the network takes max_words bag-of-words inputs, has one hidden layer of 512 units, and outputs 46 topics.
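From memory (so treat it as a sketch, not the exact script; details vary across Keras versions), the model in examples/reuters_mlp.py is roughly:

from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation

max_words = 1000   # bag-of-words dimension used by the script (as I recall)

model = Sequential()
model.add(Dense(512, input_shape=(max_words,)))   # max_words inputs -> 512 hidden units
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(46))                              # one output per topic
model.add(Activation('softmax'))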
Tricks for the model
Score = categorical cross-entropy = a kind of smooth, continuous classification error
Softmax = normalizes the outputs into probabilities
Adam = adaptive gradient (?)
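These choices all show up in the compile call; a sketch with a minimal stand-in model (illustrative; depending on the Keras version, accuracy is requested either with metrics=['accuracy'] here or with show_accuracy=True in fit):

from keras.models import Sequential
from keras.layers.core import Dense, Activation

model = Sequential()
model.add(Dense(46, input_dim=1000))              # minimal stand-in model
model.add(Activation('softmax'))                  # normalize the outputs into probabilities
model.compile(loss='categorical_crossentropy',    # smooth, continuous classification error
              optimizer='adam')                   # adaptive per-parameter learning rates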
Tricks for the data
X_train = int[#sentences][#words] = word indices
Convert the lists of word indices into a matrix: #sentences x bag-of-words vectors (dim = #words)
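This is done with keras.preprocessing.text.Tokenizer; a sketch with toy data (the keyword argument is nb_words in old Keras versions, num_words in newer ones):

from keras.preprocessing.text import Tokenizer

max_words = 1000
X_train = [[1, 4, 9], [2, 4, 7, 7]]   # toy "sentences" as lists of word indices

tokenizer = Tokenizer(nb_words=max_words)
# one row per sentence: a binary bag-of-words vector of dimension max_words
X_train_bow = tokenizer.sequences_to_matrix(X_train, mode='binary')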
Plot accuracy as a function of epochs
sudo apt-get install python-matplotlib
import matplotlib.pyplot as plt
[…]
plt.plot(history.history['acc'])
plt.show()
Plot matrix of weights
plt.matshow(model.get_weights()[0], cmap=plt.cm.gray)
plt.show()
Or plt.savefig("fig.png")
Rules of thumb
Check overfitting: plot training accuracy vs. test accuracy
Check vanishing gradient: plot weights or gradients
Normalize your inputs & outputs
Try to automatically augment your training set: add noise, rotate/translate images...
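For example, a crude noise-augmentation sketch (illustrative noise level):

import numpy as np

def augment_with_noise(X, sigma=0.01, copies=2):
    # stack noisy copies of the data on top of the original to enlarge the training set
    noisy = [X + np.random.normal(0.0, sigma, X.shape) for _ in range(copies)]
    return np.concatenate([X] + noisy, axis=0)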