Image Captions With Deep Learning Yulia Kogan & Ron Shiff


1 Image Captions With Deep Learning Yulia Kogan & Ron Shiff

2 Lecture outline
Part 1 – NLP and RNN Introduction
  “The Unreasonable Effectiveness of Recurrent Neural Networks”
  Basic Recurrent Neural Network – NLP example
  Long Short-Term Memory (LSTM) RNNs
Part 2 – Image Captioning Algorithms using RNNs

3 The Unreasonable Effectiveness of Recurrent Neural Networks
Taken from Andrej Karpathy’s blog.
So far – “old school” neural networks: fixed-length inputs and outputs.
RNNs operate over sequences of vectors (input and/or output), enabling tasks such as:
  Image captions
  Sentiment analysis
  Machine translation
  “Word prediction”

4 The Unreasonable Effectiveness of Recurrent Neural Networks
Example: a character-level RNN trained on the LaTeX source of an algebraic geometry textbook generates plausible-looking LaTeX.

5 The Unreasonable Effectiveness of Recurrent Neural Networks
Example: a character-level RNN trained on the works of Shakespeare generates Shakespeare-like text.

7 Word Vectors
The classical word representation is “one-hot”: each word is represented by a sparse vector with a single 1 at the word’s index and 0 elsewhere.
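
A minimal sketch of a one-hot encoding (the toy vocabulary and indices here are made up for illustration, not taken from the slides):

    import numpy as np

    # Toy vocabulary (hypothetical, for illustration only)
    vocab = ["the", "cat", "sat", "on", "mat"]
    word_to_index = {w: i for i, w in enumerate(vocab)}

    def one_hot(word):
        """Return a sparse |V|-dimensional vector with a single 1 at the word's index."""
        v = np.zeros(len(vocab))
        v[word_to_index[word]] = 1.0
        return v

    print(one_hot("cat"))  # [0. 1. 0. 0. 0.]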

8 Word Vectors
A more modern approach: represent each word as a dense, low-dimensional vector (e.g., a few hundred dimensions).
“Semantically” close words are close in the vector space, and semantic relations are preserved in the vector space: “king” + “woman” − “man” ≈ “queen”.
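
A hedged sketch of the “king” + “woman” − “man” example using the gensim library and a pretrained embedding (the specific model name below is an assumption, not something from the slides; any pretrained word embedding would do):

    import gensim.downloader as api

    # Load pretrained GloVe vectors (model name assumed)
    model = api.load("glove-wiki-gigaword-100")

    # Vector arithmetic: king + woman - man ~= queen
    print(model.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
    # Expected to return something like [('queen', 0.7...)]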

9 Word Vectors
A word vector can be written as x = L·w, where w is a “one-hot” vector and L is the word-embedding (lookup) matrix, so x is simply the column of L selected by w.
Learned word vectors are beneficial for most deep learning tasks.
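
A minimal numpy sketch of this lookup (the dimensions and matrix values are invented for illustration):

    import numpy as np

    vocab_size, embed_dim = 5, 3
    rng = np.random.default_rng(0)

    # Embedding matrix L: one column per vocabulary word
    L = rng.normal(size=(embed_dim, vocab_size))

    # One-hot vector w selecting word index 2
    w = np.zeros(vocab_size)
    w[2] = 1.0

    x = L @ w                       # matrix-vector product ...
    assert np.allclose(x, L[:, 2])  # ... is just a column lookup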

10 RNN – Language Model (based on Richard Socher’s lecture – Deep Learning in NLP, Stanford)
A language model computes a probability for a sequence of words:
  P(w_1, …, w_T) = ∏_{t=1..T} P(w_t | w_1, …, w_{t−1})
Examples:
  Word ordering: a good model assigns higher probability to a grammatical word order than to a scrambled one.
  Word choice: a good model assigns higher probability to a plausible word than to an implausible one in the same context.
Useful for machine translation and speech recognition.
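
A toy sketch of scoring a sentence with the chain rule, using a hypothetical bigram (first-order Markov) model; the probability table below is invented purely for illustration:

    # Hypothetical bigram probabilities P(w_t | w_{t-1}); values are made up
    bigram = {
        ("<s>", "the"): 0.4, ("the", "cat"): 0.3,
        ("cat", "sat"): 0.5, ("sat", "</s>"): 0.6,
    }

    def sentence_prob(words):
        """Chain rule with a first-order Markov assumption."""
        p = 1.0
        for prev, cur in zip(["<s>"] + words, words + ["</s>"]):
            p *= bigram.get((prev, cur), 1e-6)  # small backoff for unseen pairs
        return p

    print(sentence_prob(["the", "cat", "sat"]))  # 0.036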

11 Recurrent Neural Networks – Language Model
Each output depends on all previous inputs

12 RNN – Language Model
Input: word vectors x_1, …, x_T.
At each time step t, compute the hidden state and the output distribution:
  h_t = σ(W_hh · h_{t−1} + W_hx · x_t)
  ŷ_t = softmax(W_S · h_t)
Output: ŷ_t is a distribution over the vocabulary; its j-th entry estimates P(x_{t+1} = v_j | x_1, …, x_t).
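
A minimal numpy sketch of one forward pass of such a recurrent language model (the dimensions, initialization, choice of tanh for σ, and the name rnn_step are assumptions for illustration):

    import numpy as np

    embed_dim, hidden_dim, vocab_size = 50, 100, 1000
    rng = np.random.default_rng(0)
    W_hh = rng.normal(scale=0.01, size=(hidden_dim, hidden_dim))
    W_hx = rng.normal(scale=0.01, size=(hidden_dim, embed_dim))
    W_S  = rng.normal(scale=0.01, size=(vocab_size, hidden_dim))

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def rnn_step(h_prev, x_t):
        """One time step: new hidden state and next-word distribution."""
        h_t = np.tanh(W_hh @ h_prev + W_hx @ x_t)  # sigma = tanh here
        y_t = softmax(W_S @ h_t)
        return h_t, y_t

    h = np.zeros(hidden_dim)
    for x in rng.normal(size=(4, embed_dim)):      # 4 dummy word vectors
        h, y_hat = rnn_step(h, x)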

13 Recurrent Neural Networks – Language Model
The total objective is to maximize the log-likelihood of the data w.r.t. the parameters. With y_t the “one-hot” vector containing the true word at time t, the log-likelihood is:
  ℓ = Σ_{t=1..T} Σ_{j=1..|V|} y_{t,j} · log ŷ_{t,j}
(equivalently, minimize the cross-entropy loss J = −(1/T) · ℓ).
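
A short numpy sketch of this objective for one sentence (the arrays are placeholders; y_true holds one-hot rows and y_hat the predicted distributions from an RNN like the sketch above):

    import numpy as np

    def log_likelihood(y_true, y_hat, eps=1e-12):
        """Sum over time of the log probability assigned to the true words.

        y_true: (T, |V|) one-hot rows; y_hat: (T, |V|) predicted distributions.
        """
        return np.sum(y_true * np.log(y_hat + eps))

    # Training maximizes this, i.e. minimizes the cross-entropy -log_likelihood(...)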

14 RNNs – HARD TO TRAIN!

15 Vanishing/Exploding gradient problem
For stochastic gradient descent we calculate the derivative of the loss w.r.t. the parameters.
Reminder: h_t = σ(W_hh · h_{t−1} + W_hx · x_t), ŷ_t = softmax(W_S · h_t), where E_t is the loss at time t.
Applying the chain rule:
  ∂E_t/∂W = Σ_{k=1..t} (∂E_t/∂ŷ_t) · (∂ŷ_t/∂h_t) · (∂h_t/∂h_k) · (∂h_k/∂W)

16 Vanishing/Exploding gradient problem
Update equation: W ← W − α · ∂E/∂W
By the chain rule, the factor ∂h_t/∂h_k is a product over the intermediate time steps:
  ∂h_t/∂h_k = ∏_{j=k+1..t} ∂h_j/∂h_{j−1} = ∏_{j=k+1..t} W_hh^T · diag(σ′(·))
so the gradient contains repeated multiplication by W_hh.

17 Vanishing/Exploding gradient problem
Because of this repeated product, gradients can become very large or very small:
  “small W_hh” – vanishing gradients, so long time dependencies are hard to learn
  “large W_hh” – exploding gradients (bad for optimization)
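
A small numpy sketch of the effect (the matrices and scales are invented for illustration): repeatedly multiplying a vector by a matrix whose spectral radius is below 1 shrinks it toward zero, and above 1 blows it up, mirroring what happens to ∂h_t/∂h_k over many time steps.

    import numpy as np

    rng = np.random.default_rng(0)
    hidden_dim, steps = 50, 100

    def repeated_product_norm(scale):
        """Norm of a vector after 'steps' multiplications by a random W of the given scale."""
        W = scale * rng.normal(size=(hidden_dim, hidden_dim)) / np.sqrt(hidden_dim)
        v = np.ones(hidden_dim)
        for _ in range(steps):
            v = W @ v
        return np.linalg.norm(v)

    print(repeated_product_norm(0.5))  # ~0: vanishing
    print(repeated_product_norm(1.5))  # huge: exploding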

18 LSTMs – Long Short-Term Memory
Introduced by Hochreiter and Schmidhuber (1997), building on Hochreiter’s 1991 analysis of the vanishing gradient problem.
They mitigate vanishing and exploding gradients using gating.
Figures taken from Christopher Olah’s blog.

19 LSTMs – Equations
Note the different notation from the earlier RNN slides: here h_t plays the role of the output y_t, and the cell state C_t plays the role of the recurrent memory h_t.
  f_t = σ(W_f · [h_{t−1}, x_t] + b_f)       (forget gate)
  i_t = σ(W_i · [h_{t−1}, x_t] + b_i)       (input gate)
  C̃_t = tanh(W_C · [h_{t−1}, x_t] + b_C)    (candidate cell state)
  C_t = f_t ⊙ C_{t−1} + i_t ⊙ C̃_t           (cell state update)
  o_t = σ(W_o · [h_{t−1}, x_t] + b_o)       (output gate)
  h_t = o_t ⊙ tanh(C_t)                     (output / hidden state)

20 LSTMs – “Forget Gate”
f_t ≈ 0: forget; f_t ≈ 1: keep.
Examples: seeing a period “.” can trigger forgetting sentence-level information; seeing a new subject can trigger forgetting the previous subject’s gender.

21 LSTMs – “Input gate layer”
Two parts decide what new information goes into the candidate C̃_t: the “input gate layer” decides which values we’ll update, and a tanh layer produces the new candidate values.

22 LSTMs – Updating the memory cell
The cell state update C_t = f_t ⊙ C_{t−1} + i_t ⊙ C̃_t is additive, so the gradient along the cell state no longer involves an exponential product of weight matrices; information can flow across many time steps.
For example, with the input gate closed, C_t ≈ f_t ⊙ f_{t−1} ⊙ C_{t−2}: the gates, not repeated multiplication by W, control how much is kept.

23 LSTMs – Setting the output
Finally, we decide what to output: the output gate o_t selects which parts of the (tanh-squashed) cell state become the new hidden state, h_t = o_t ⊙ tanh(C_t).
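
Putting the gate slides together, a minimal numpy sketch of a single LSTM step following the equations above (the dimensions, initialization, and the name lstm_step are assumptions for illustration):

    import numpy as np

    input_dim, hidden_dim = 50, 100
    rng = np.random.default_rng(0)

    def init(rows, cols):
        return rng.normal(scale=0.01, size=(rows, cols))

    # One weight matrix and bias per gate, acting on the concatenation [h_{t-1}, x_t]
    W_f, W_i, W_C, W_o = (init(hidden_dim, hidden_dim + input_dim) for _ in range(4))
    b_f, b_i, b_C, b_o = (np.zeros(hidden_dim) for _ in range(4))

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(h_prev, C_prev, x_t):
        """One LSTM time step: returns the new hidden state and cell state."""
        z = np.concatenate([h_prev, x_t])
        f_t = sigmoid(W_f @ z + b_f)          # forget gate
        i_t = sigmoid(W_i @ z + b_i)          # input gate
        C_tilde = np.tanh(W_C @ z + b_C)      # candidate cell state
        C_t = f_t * C_prev + i_t * C_tilde    # additive cell update
        o_t = sigmoid(W_o @ z + b_o)          # output gate
        h_t = o_t * np.tanh(C_t)
        return h_t, C_t

    h, C = np.zeros(hidden_dim), np.zeros(hidden_dim)
    h, C = lstm_step(h, C, rng.normal(size=input_dim))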

24 Conclusions
RNNs are very powerful.
RNNs are hard to train.
Nowadays, gating (LSTMs) is the way to go!
Acknowledgments: Andrej Karpathy (blog post “The Unreasonable Effectiveness of Recurrent Neural Networks”), Richard Socher (Deep Learning in NLP lecture, Stanford), Christopher Olah (blog post on LSTMs).

