Learn to Comment: Week 6
Mahdi Kalayeh, David Hill
Overview
- Quick introduction to LSTMs and BPTT
- Results for this week
- GPU implementation
Introduction to LSTMs: RNNs
[Figure: a recurrent network with input, hidden, and output layers, connected by weights W_ih (input-to-hidden), W_hh (hidden-to-hidden), and W_ho (hidden-to-output).]
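For reference, the recurrence the diagram depicts, written in the usual form (the tanh nonlinearity is an assumption; it is the common choice):

h_t = \tanh(W_{ih} x_t + W_{hh} h_{t-1})
y_t = W_{ho} h_t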
Unrolling RNNs
[Figure: the same network unrolled over time steps t = 0, 1, 2, with Input_t feeding Hidden_t and Hidden_t feeding Output_t; the weights are shared across steps.]
- Hidden state initialized to a neutral value at t = -1
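A minimal sketch of the unrolled forward pass (NumPy; names and shapes are illustrative assumptions, not the project's actual code):

import numpy as np

def rnn_forward(xs, W_ih, W_hh, W_ho):
    """Vanilla RNN forward pass over a sequence of input vectors xs."""
    h = np.zeros(W_hh.shape[0])           # neutral hidden state at t = -1
    hs, ys = [], []
    for x in xs:                          # one iteration per unrolled step
        h = np.tanh(W_ih @ x + W_hh @ h)  # same weights at every step
        hs.append(h)
        ys.append(W_ho @ h)
    return hs, ys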
LSTM Unit
- Designed to mitigate the exploding/vanishing gradient problem
- Learns dependencies at greater temporal depth
LSTM Unit
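The slide does not spell out the equations, so for reference, the standard LSTM update with input, forget, and output gates (the common formulation; the project's exact variant may differ):

i_t = \sigma(W_i [x_t, h_{t-1}] + b_i)    (input gate)
f_t = \sigma(W_f [x_t, h_{t-1}] + b_f)    (forget gate)
o_t = \sigma(W_o [x_t, h_{t-1}] + b_o)    (output gate)
g_t = \tanh(W_g [x_t, h_{t-1}] + b_g)     (candidate cell)
c_t = f_t \odot c_{t-1} + i_t \odot g_t   (memory cell)
h_t = o_t \odot \tanh(c_t)                (hidden state)

The additive cell update is what lets gradients flow back over many steps, which is why the unit mitigates the vanishing gradient problem noted above.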
LSTM Backprop
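As a reminder of how backpropagation through time accumulates gradients across the unrolled steps, here is a minimal BPTT sketch for the vanilla RNN above (illustrative only; the LSTM backward pass adds the corresponding gate and cell terms):

import numpy as np

def rnn_backward(xs, hs, dys, W_ih, W_hh, W_ho):
    """BPTT: sum each weight's gradient over all time steps."""
    dW_ih, dW_hh, dW_ho = (np.zeros_like(W) for W in (W_ih, W_hh, W_ho))
    dh_next = np.zeros(W_hh.shape[0])     # gradient arriving from step t+1
    for t in reversed(range(len(xs))):
        dW_ho += np.outer(dys[t], hs[t])
        dh = W_ho.T @ dys[t] + dh_next    # output path + recurrent path
        dz = dh * (1.0 - hs[t] ** 2)      # backprop through tanh
        h_prev = hs[t - 1] if t > 0 else np.zeros_like(hs[t])
        dW_ih += np.outer(dz, xs[t])
        dW_hh += np.outer(dz, h_prev)
        dh_next = W_hh.T @ dz             # pass gradient to step t-1
    return dW_ih, dW_hh, dW_ho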
Experiment Results
- Previously: tested several 256-unit models
- This week: tested the full-sized 512-unit model
  - also on flickr-8k
  - concatenated features
  - same learning parameters
Experiment Results

Model                     LSTM Size   Bleu-1   Bleu-2   Bleu-3   Bleu-4
Ours: GoogLeNet           256         57.6     37.3     23.6     15.1
Ours: Places              256         52.8     32.4     19.5     11.7
Ours: GoogLeNet + Places  256         59.4     39.9     26.3     17.3
Ours: GoogLeNet + Places  512         59.3     39.6     25.5     16.2
Google: NIC               -           63       -        -        -
Human                     -           70       -        -        -
Result Analysis
- Test scaling the learning rate over epochs (see the sketch below)
- Early stopping on Bleu
- Consider dropout
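A minimal sketch of how the first two ideas could fit into the training loop (the decay factor, patience, and the model.* methods are all hypothetical placeholders, not the project's API):

def train(model, epochs=50, base_lr=0.01, decay=0.95, patience=5):
    """Step-decayed learning rate plus early stopping on validation Bleu."""
    best_bleu, stale = 0.0, 0
    for epoch in range(epochs):
        lr = base_lr * decay ** epoch     # scale learning rate each epoch
        model.train_one_epoch(lr)         # hypothetical training step
        bleu = model.evaluate_bleu()      # e.g. Bleu-4 on a validation set
        if bleu > best_bleu:
            best_bleu, stale = bleu, 0
            model.save_checkpoint()       # keep the best-scoring model
        else:
            stale += 1
            if stale >= patience:         # Bleu stopped improving: stop early
                break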
GPU Implementation
- Working on GPU: forwarding, single-example backprop
- Needs work: backprop over a batch (see the note below)
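One reason batching matters on the GPU: the per-example outer products in the backward pass collapse into a single matrix multiply over the batch, which is the operation GPUs are best at. A sketch of the idea (shapes and names are assumptions):

import numpy as np

def batched_weight_grad(dZ, X):
    """Sum of per-example outer products, computed as one matmul.

    dZ: (B, H) pre-activation gradients for a batch of B examples
    X:  (B, D) corresponding inputs
    Returns the (H, D) weight gradient summed over the batch.
    """
    return dZ.T @ X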