Welcome to Deep Loria!
Deep Loria
Mailing list: deeploria@inria.fr
Web site:
Git repository:
Deeploria: involvement
I'm no DL expert! (at most a trigger)
Deeploria will be what you make of it: volunteers needed!
Propose anything: organize, participate, animate...
Next meeting (please think about it):
Coffee & discussion session? → paper reading group: who's willing to take care of it?
Demo for Yann LeCun's visit?
...?
Outline
Motivation
Lightning-speed overview of DNN basics:
Neuron vs. random variable; activations
Layers: dense, RNN
Vanishing gradient
More layers: LSTM, RBM/DBN, CNN, autoencoder
Implementation with Keras/Theano
Why all this buzz about DNNs?
Because of their expressive power; cf. “On the Expressive Power of Deep Learning: A Tensor Analysis” by Nadav Cohen, Or Sharir, Amnon Shashua:
“[...] besides a negligible set, all functions that can be implemented by a deep network of polynomial size, require an exponential size if one wishes to implement (or approximate) them with a shallow network”
Basic neuron
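A minimal numpy sketch (illustrative names, not from the slide): a neuron computes a weighted sum of its inputs plus a bias, then applies an activation such as the sigmoid.

import numpy as np

def neuron(x, w, b):
    # weighted sum of the inputs plus a bias, squashed by a sigmoid activation
    z = np.dot(w, x) + b
    return 1.0 / (1.0 + np.exp(-z))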
Activations
sigmoid = logistic
relu = rectified linear
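For reference, a sketch of both activations in numpy (illustrative, not from the slide):

import numpy as np

def sigmoid(z):
    # logistic function: maps any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # rectified linear unit: 0 for negative inputs, identity otherwise
    return np.maximum(0.0, z)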
Dense layer
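A dense (fully connected) layer is just many such neurons sharing the same input vector; a numpy sketch with illustrative names (W is the weight matrix, b the bias vector):

import numpy as np

def dense_layer(x, W, b):
    # each row of W holds the weights of one output neuron: y = sigmoid(W x + b)
    z = W.dot(x) + b
    return 1.0 / (1.0 + np.exp(-z))

In Keras this corresponds to the Dense layer, e.g. Dense(512, activation='relu').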
Alternative “neuron”
Graphical model: node = random variable
Connection = “dependency” between variables
Restricted Boltzmann Machine (RBM)
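As a rough sketch of the standard formulation (notation assumed, not from the slide): an RBM assigns an energy to each joint configuration of visible units v and hidden units h, with a weight matrix W and biases a, b; low energy means a likely configuration.

import numpy as np

def rbm_energy(v, h, W, a, b):
    # E(v, h) = -a.v - b.h - v.W.h  (standard RBM energy)
    return -np.dot(a, v) - np.dot(b, h) - v.dot(W).dot(h)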
Training
Dense: minimize the error. Stochastic Gradient Descent (SGD) = gradient descent (back-propagation)
RBM: minimize the energy. Contrastive Divergence = gradient descent (Gibbs sampling)
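For the dense case, a minimal sketch of a single SGD update (illustrative names; the gradient itself comes from back-propagation):

def sgd_step(weights, gradient, lr=0.01):
    # move the weights a small step against the gradient of the error
    return weights - lr * gradient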
DNN vs. DBN
N x Dense → DNN (Deep Neural Network)
N x RBM → DBN (Deep Belief Network)
Dense layers are discriminative = they model the “boundary” between classes
RBMs are generative = they model every class
Performance: RBM better (?)
Efficiency: RBMs are much more difficult to train
Usage: 90% Dense
Recurrent neural network
Take the past into account to predict the next step (just like HMMs, CRFs...)
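A numpy sketch of the usual recurrence (assumed notation, not from the slide): the hidden state at step t mixes the current input with the previous hidden state.

import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    # new state = tanh(W_x x_t + W_h h_prev + b); h carries the past forward
    return np.tanh(W_x.dot(x_t) + W_h.dot(h_prev) + b)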
Issue 1: Vanishing gradient
Back-propagation of the error E = chain rule: N layers → N factors of activation gradients
The gradient decreases exponentially with N
Consequence: the deepest layers are never learnt
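A quick numerical illustration (mine, not from the slide): the sigmoid's derivative is at most 0.25, so a product of N such factors shrinks exponentially with the depth N.

# upper bound on a product of N sigmoid-derivative factors
for N in (1, 5, 10, 20):
    print(N, 0.25 ** N)   # 0.25, ~1e-3, ~1e-6, ~1e-12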
Vanishing gradient: solutions
More data!
Rectified linear units (gradient = 1)
Unsupervised pre-training: DBNs, autoencoders
LSTMs instead of RNNs
Autoencoders
Re-create the inputs = a model of the data, with dimensionality reduction = compression
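A minimal Keras sketch (illustrative sizes and imports; the exact API depends on the Keras version): the network is trained to reproduce its own input through a narrow middle layer.

from keras.models import Sequential
from keras.layers.core import Dense

autoencoder = Sequential()
autoencoder.add(Dense(32, input_dim=784, activation='relu'))   # encoder: compress to 32 dims
autoencoder.add(Dense(784, activation='sigmoid'))              # decoder: reconstruct the input
autoencoder.compile(loss='mse', optimizer='sgd')
# autoencoder.fit(X, X, ...)  -- the targets are the inputs themselves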
LSTM
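In Keras, replacing a plain recurrent layer by an LSTM is a one-line change; a sketch with illustrative sizes (10 time steps of 32 features; API details depend on the Keras version):

from keras.models import Sequential
from keras.layers.recurrent import SimpleRNN, LSTM

model = Sequential()
# model.add(SimpleRNN(128, input_shape=(10, 32)))  # plain RNN: prone to vanishing gradients
model.add(LSTM(128, input_shape=(10, 32)))         # gated memory cell: gradients survive much longer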
Vanishing gradient
Issue 2: Overfitting
Overfitting: solutions
Share weights: e.g., convolutional layers
Regularization: e.g., dropout, drop-connect...
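For instance, dropout in Keras is just one extra layer; a sketch with illustrative sizes (50% of the activations are randomly zeroed during training):

from keras.models import Sequential
from keras.layers.core import Dense, Dropout

model = Sequential()
model.add(Dense(512, input_dim=1000, activation='relu'))
model.add(Dropout(0.5))                    # randomly drop half of the activations each update
model.add(Dense(46, activation='softmax'))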
Time to code, isn't it?
Keras example: Reuters
Trains an MLP to classify texts into 46 topics.
In the root dir of Keras, run: python examples/reuters_mlp.py
Keras example: the network takes max_words bag-of-words inputs, has one hidden layer of 512 units, and outputs 46 topics.
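From memory (so treat it as a sketch, not the exact script; details vary across Keras versions), the model in examples/reuters_mlp.py is roughly:

from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation

max_words = 1000   # bag-of-words dimension used by the script (as I recall)

model = Sequential()
model.add(Dense(512, input_shape=(max_words,)))   # max_words inputs -> 512 hidden units
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(46))                              # one output per topic
model.add(Activation('softmax'))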
Tricks for the model
Score = categorical cross-entropy = a kind of smooth, continuous classification error
Softmax = normalizes the outputs into probabilities
Adam = adaptive gradient (?)
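These choices all show up in the compile call; a sketch with a minimal stand-in model (illustrative; depending on the Keras version, accuracy is requested either with metrics=['accuracy'] here or with show_accuracy=True in fit):

from keras.models import Sequential
from keras.layers.core import Dense, Activation

model = Sequential()
model.add(Dense(46, input_dim=1000))              # minimal stand-in model
model.add(Activation('softmax'))                  # normalize the outputs into probabilities
model.compile(loss='categorical_crossentropy',    # smooth, continuous classification error
              optimizer='adam')                   # adaptive per-parameter learning rates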
Tricks for the data
X_train = int[#sentences][#words] = word indices
Convert the lists of word indices into a matrix: #sentences x bag-of-words vectors (dim = #words)
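This is done with keras.preprocessing.text.Tokenizer; a sketch with toy data (the keyword argument is nb_words in old Keras versions, num_words in newer ones):

from keras.preprocessing.text import Tokenizer

max_words = 1000
X_train = [[1, 4, 9], [2, 4, 7, 7]]   # toy "sentences" as lists of word indices

tokenizer = Tokenizer(nb_words=max_words)
# one row per sentence: a binary bag-of-words vector of dimension max_words
X_train_bow = tokenizer.sequences_to_matrix(X_train, mode='binary')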
Plot accuracy as a function of epochs
sudo apt-get install python-matplotlib
import matplotlib.pyplot as plt
[…]
plt.plot(history.history['acc'])
plt.show()
Plot matrix of weights
plt.matshow(model.get_weights()[0], cmap=plt.cm.gray)
plt.show()
Or plt.savefig("fig.png")
Rules of thumb
Check overfitting: plot training accuracy vs. test accuracy
Check vanishing gradient: plot weights or gradients
Normalize your inputs & outputs
Try to automatically augment your training set: add noise, rotate/translate images...
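For example, a crude noise-augmentation sketch (illustrative noise level):

import numpy as np

def augment_with_noise(X, sigma=0.01, copies=2):
    # stack noisy copies of the data on top of the original to enlarge the training set
    noisy = [X + np.random.normal(0.0, sigma, X.shape) for _ in range(copies)]
    return np.concatenate([X] + noisy, axis=0)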