Welcome to Deep Loria!
Deep Loria
Mailing list: deeploria@inria.fr
Web site: http://deeploria.gforge.inria.fr/
Git repository: https://gforge.inria.fr/projects/deeploria
Deeploria: involvement
I'm no DL expert!!! (at most a trigger)
Deeploria will be what you make of it:
Need volunteers!!
Propose anything
Organize, participate, animate...
Next meeting (please think about it):
Coffee & discussion session?
→ paper reading group: who's willing to take care of it?
Demo for Yann LeCun's visit?
…?
Outline
Motivation
Lightning-speed overview of DNN basics
Neuron vs. random variable; activations
Layers: dense, RNN
Vanishing gradient
More layers: LSTM, RBM/DBN, CNN, autoencoder
Implementation with Keras/Theano
Why all this buzz about DNNs?
Because of expressive power
cf. “On the Expressive Power of Deep Learning: A Tensor Analysis” by Nadav Cohen, Or Sharir, Amnon Shashua:
“[...] besides a negligible set, all functions that can be implemented by a deep network of polynomial size, require an exponential size if one wishes to implement (or approximate) them with a shallow network”
Basic neuron
Activations
sigmoid = logistic
relu = rectified linear
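A minimal numpy sketch (all values made up for illustration) of one neuron computing an activation of a weighted sum, with the two activations above:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # logistic: squashes into (0, 1)

def relu(z):
    return np.maximum(0.0, z)         # rectified linear: 0 below zero, identity above

x = np.array([0.5, -1.2, 3.0])        # inputs
w = np.array([0.1, 0.4, -0.2])        # weights
b = 0.05                              # bias
print(sigmoid(np.dot(w, x) + b))      # neuron output with a sigmoid activation
print(relu(np.dot(w, x) + b))         # same neuron with a relu activation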
Dense layer
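A dense layer is just the vectorized version: every output unit is one such neuron connected to all inputs. A numpy sketch with arbitrary sizes:

import numpy as np

x = np.random.randn(100)              # input vector (100 features)
W = np.random.randn(64, 100)          # one row of weights per output unit
b = np.zeros(64)                      # one bias per output unit
h = np.maximum(0.0, W.dot(x) + b)     # relu(W x + b): the 64 outputs of the dense layer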
Alternative “neuron”
Graphical model: node = random variable
Connection = “dependency” between variables
Restricted Boltzmann Machine (RBM)
Training
Dense: minimize error
Stochastic Gradient Descent (SGD) = gradient descent (back-propagation)
RBM: minimize energy
Contrastive Divergence = gradient descent (Gibbs sampling)
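To illustrate the SGD side only (Contrastive Divergence is not shown), here is a toy hand-written gradient-descent loop on a single logistic neuron with cross-entropy loss; data, sizes and learning rate are made up:

import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(200, 2)                            # toy inputs
y = (X[:, 0] + X[:, 1] > 0).astype(float)        # toy binary labels
w, b, lr = np.zeros(2), 0.0, 0.1                 # weights, bias, learning rate

for epoch in range(10):
    for xi, yi in zip(X, y):                     # stochastic = one example at a time
        p = 1.0 / (1.0 + np.exp(-(np.dot(w, xi) + b)))   # forward pass (sigmoid)
        grad = p - yi                            # d(cross-entropy)/d(pre-activation)
        w -= lr * grad * xi                      # gradient step on the weights
        b -= lr * grad                           # gradient step on the bias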
DNN vs. DBN
N x Dense → DNN (Deep Neural Network)
N x RBM → DBN (Deep Belief Network)
Dense layers are discriminative = model the “boundary” between classes
RBMs are generative = model every class
Performance: RBM better (?)
Efficiency: RBMs much more difficult to train
Usage: 90% for Dense
Recurrent neural network (RNN)
Take the past into account to predict the next step
Just like HMMs, CRFs...
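A numpy sketch of the recurrence (sizes made up): the hidden state h is what carries the past from one step to the next.

import numpy as np

def rnn_step(x_t, h_prev, U, W, b):
    return np.tanh(U.dot(x_t) + W.dot(h_prev) + b)   # new state mixes input and previous state

U = np.random.randn(16, 8)    # input-to-hidden weights
W = np.random.randn(16, 16)   # hidden-to-hidden weights (the "memory")
b = np.zeros(16)
h = np.zeros(16)              # initial state
for x_t in np.random.randn(5, 8):   # a sequence of 5 input vectors
    h = rnn_step(x_t, h, U, W, b)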
Issue 1: Vanishing gradient
Back-propagation of the error E = chain rule:
N layers → a product of N activation-gradient factors
The gradient decreases exponentially with N
Consequence: the deepest layers are never learnt
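A back-of-the-envelope illustration (ignoring the weight factors, which also enter the product): the derivative of the sigmoid is at most 0.25, so a product of N such factors shrinks exponentially.

# largest possible sigmoid derivative is 0.25; multiply N of them
for N in (1, 5, 10, 20):
    print(N, 0.25 ** N)   # 0.25, ~0.001, ~1e-6, ~1e-12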
Vanishing gradient: solutions
More data!
Rectified linear (gradient = 1)
Unsupervised pre-training: DBNs, autoencoders
LSTMs instead of RNNs
Autoencoders
Re-create the inputs = model of the data
With dimensionality reduction = compression
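A minimal Keras sketch of the idea, assuming 784-dimensional inputs compressed to 32 dimensions (the sizes are arbitrary and the exact layer argument names depend on your Keras version):

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(32, activation='relu', input_dim=784))   # encoder = compression to 32 dims
model.add(Dense(784, activation='sigmoid'))              # decoder = reconstruction of the input
model.compile(loss='mse', optimizer='adam')
# model.fit(X, X, ...)   # the targets are the inputs themselves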
LSTM
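A hedged Keras sketch of an LSTM used as a sequence classifier; the vocabulary size, layer sizes and the binary task are placeholders, not taken from the slides:

from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

model = Sequential()
model.add(Embedding(10000, 128))            # word index -> 128-dim vector
model.add(LSTM(64))                         # gated memory cell instead of a plain RNN
model.add(Dense(1, activation='sigmoid'))   # e.g. binary classification of the sequence
model.compile(loss='binary_crossentropy', optimizer='adam')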
Vanishing gradient
Issue 2: Overfitting
Overfitting: solutions
Share weights: e.g. convolutional layers (sketch below)
Regularization: e.g. Drop-out, Drop-connect...
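A tiny numpy sketch of the weight-sharing idea behind convolutional layers: the same small filter is reused at every position instead of a full weight matrix (sizes are arbitrary).

import numpy as np

x = np.random.randn(10)                  # input signal of length 10
w = np.array([0.2, 0.5, 0.3])            # one shared filter of 3 weights
y = np.array([np.dot(w, x[i:i+3]) for i in range(len(x) - 2)])   # the same w applied everywhere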
Time to code, isn't it?
Keras example: Reuters
Trains an MLP to classify texts into 46 topics
In the root dir of Keras, run:
python examples/reuters_mlp.py
Keras example
[Architecture diagram: input of dimension max_words → dense layer of 512 units → output over 46 topics]
Tricks for the model
Score = categorical cross-entropy = a kind of smooth, continuous classification error
Softmax = normalizes the outputs as probabilities
Adam = gradient descent with adaptive learning rates
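For reference, a sketch in the spirit of reuters_mlp.py; the real script and the exact Keras argument names (e.g. nb_epoch vs. epochs) may differ from this:

from keras.models import Sequential
from keras.layers import Dense, Dropout

max_words, nb_classes = 1000, 46
model = Sequential()
model.add(Dense(512, activation='relu', input_dim=max_words))   # bag-of-words in, 512 hidden units
model.add(Dropout(0.5))                                         # regularization (see overfitting slide)
model.add(Dense(nb_classes, activation='softmax'))              # probabilities over the 46 topics
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# history = model.fit(X_train, Y_train, nb_epoch=5, batch_size=32)   # data built as on the next slide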
Tricks for the data
X_train = int[#sentences][#words] = word indexes
Convert each list of word indexes into a matrix = #sentences x bag-of-words vectors (dim = #words), as sketched below
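A sketch of that conversion with Keras utilities (argument names such as nb_words differ between Keras versions):

from keras.datasets import reuters
from keras.preprocessing.text import Tokenizer
from keras.utils import np_utils

max_words = 1000
(X_train, y_train), (X_test, y_test) = reuters.load_data(nb_words=max_words)   # lists of word indexes
tokenizer = Tokenizer(nb_words=max_words)
X_train = tokenizer.sequences_to_matrix(X_train, mode='binary')   # one bag-of-words row per text
Y_train = np_utils.to_categorical(y_train, 46)                    # topic id -> one-hot vector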
Plot accuracy as a function of epochs
sudo apt-get install python-matplotlib

import matplotlib.pyplot as plt
[…]
plt.plot(history.history['acc'])
plt.show()
Plot the matrix of weights
plt.matshow(model.get_weights()[0], cmap=plt.cm.gray)
plt.show()
or plt.savefig("fig.png")
Rules of thumb
Check overfitting: plot training acc vs. test acc
Check vanishing gradient: plot weights or gradients
Normalize your inputs & outputs
Try to automatically augment your training set: add noise (sketch below), rotate/translate images...
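For the "add noise" variant, a minimal sketch assuming the X_train/Y_train matrices from the Reuters example above (the noise level 0.01 is arbitrary):

import numpy as np

# X_train, Y_train as built in the Reuters example
X_noisy = X_train + np.random.normal(0.0, 0.01, X_train.shape)   # jitter every (normalized) input
X_aug = np.concatenate([X_train, X_noisy])                       # twice as much training data
Y_aug = np.concatenate([Y_train, Y_train])                       # the labels do not change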