Presentation on theme: "Recurrent Neural Networks" — Presentation transcript:

1 Recurrent Neural Networks
deeplearning.ai Gated Recurrent Unit (GRU)

2 Motivation
Not all problems can be converted into one with fixed-length inputs and outputs.
Problems such as speech recognition or time-series prediction require a system to store and use context information.
Simple case: output YES if the number of 1s in the input sequence is even, else NO.
It is hard or impossible to choose a fixed context window: there can always be a new sample longer than anything seen before.

3 Recurrent Neural Networks (RNNs)
Recurrent Neural Networks take the previous output or hidden state as an additional input. The composite input at time t therefore carries historical information about what happened at times t′ < t. RNNs are useful because their intermediate values (state) can store information about past inputs for a duration that is not fixed a priori.

4 Sample Feed-forward Network
(diagram: a feed-forward network at t = 1; input x1, hidden layer h1, output y1)

5 Sample RNN
(diagram: the network unrolled over t = 1, 2, 3; inputs x1, x2, x3, hidden states h1, h2, h3, outputs y1, y2, y3, with each hidden state feeding the next)

6 Sample RNN
(diagram: the same unrolled network, now with an initial hidden state h0 feeding into h1)

7 Sentiment Classification
Classify a restaurant review from Yelp, a movie review from IMDB, etc. as positive or negative.
Inputs: multiple words, one or more sentences.
Outputs: positive / negative classification.
Examples: “The food was really good”; “The chicken crossed the road because it was uncooked”

8 Sentiment Classification
(diagram: a single RNN cell reads “The” and produces hidden state h1)

9 Sentiment Classification
(diagram: two chained RNN cells read “The” and “food”, producing h1 and h2)

10 Sentiment Classification
(diagram: a chain of RNN cells reads the whole sentence “The food … good”, producing h1, h2, …, hn)

11 Sentiment Classification
(diagram: a linear classifier is placed on top of the final hidden state hn)

12 Sentiment Classification
(diagram: the intermediate states h1 … hn−1 are ignored; only hn is fed to the linear classifier)

13 Sentiment Classification
(diagram: alternative, combine all hidden states into h = Sum(h1, …, hn))

14 Sentiment Classification
(diagram: the linear classifier is applied to the combined state h = Sum(h1, …, hn))
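The setup in slides 11–14 fits in a few lines of code. The sketch below is illustrative only: the word vectors and weights are random stand-ins, and the names (rnn_step, W_xh, w_clf, classify) are hypothetical, not from the slides. It shows both variants, classifying on the last state hn and on the sum of all states.

```python
# Minimal sketch of RNN-based sentiment classification (illustrative names/shapes).
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h = 50, 64                      # assumed word-vector and hidden-state sizes

W_xh = rng.normal(0, 0.1, (d_h, d_in))  # input-to-hidden weights
W_hh = rng.normal(0, 0.1, (d_h, d_h))   # hidden-to-hidden (recurrent) weights
b_h  = np.zeros(d_h)
w_clf, b_clf = rng.normal(0, 0.1, d_h), 0.0   # linear classifier on top

def rnn_step(h_prev, x):
    """One recurrent step: combine the previous state with the current word."""
    return np.tanh(W_xh @ x + W_hh @ h_prev + b_h)

def classify(word_vectors, use_sum=False):
    h, states = np.zeros(d_h), []
    for x in word_vectors:              # "The", "food", ..., "good"
        h = rnn_step(h, x)
        states.append(h)
    feature = np.sum(states, axis=0) if use_sum else states[-1]  # Sum(h1..hn) or hn
    score = w_clf @ feature + b_clf
    return 1 / (1 + np.exp(-score))     # P(positive)

sentence = [rng.normal(size=d_in) for _ in range(5)]   # stand-in for embedded words
print(classify(sentence), classify(sentence, use_sum=True))
```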


26 RNN unit
$a^{<t>} = g(W_a[a^{<t-1>}, x^{<t>}] + b_a)$
(diagram: an RNN cell takes $a^{<0>}$ and $x^{<1>}$ and produces the output activation value $a^{<1>}$ and the prediction $\hat{y}^{<1>}$)
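A minimal sketch of this unit, assuming $g = \tanh$ for the activation; the sizes (100, 50), the random initialization, and the name rnn_unit are illustrative assumptions, not part of the slide.

```python
# One RNN step: a^{<t>} = g(W_a[a^{<t-1>}, x^{<t>}] + b_a), with g = tanh (assumed).
import numpy as np

n_a, n_x = 100, 50                              # assumed sizes
rng = np.random.default_rng(1)
W_a = rng.normal(0, 0.01, (n_a, n_a + n_x))     # weights over the concatenation
b_a = np.zeros(n_a)

def rnn_unit(a_prev, x_t):
    concat = np.concatenate([a_prev, x_t])      # [a^{<t-1>}, x^{<t>}]
    return np.tanh(W_a @ concat + b_a)          # a^{<t>}

a1 = rnn_unit(np.zeros(n_a), rng.normal(size=n_x))   # a^{<1>} from a^{<0>}, x^{<1>}
print(a1.shape)                                      # (100,)
```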


33 (c) GRU


35 $c^{<t>}$, $\tilde{c}^{<t>}$, and $\Gamma_u$ all have the same dimension (for example, 100); $*$ denotes element-wise multiplication, not a dot product.

36 GRU (simplified) The cat, which already ate …, was full.
[Cho et al., On the properties of neural machine translation: Encoder-decoder approaches] [Chung et al., Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling]
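In the simplified GRU of this lecture, a memory cell $c$ is either overwritten with a candidate value or carried forward unchanged by the update gate $\Gamma_u$ (so “cat … was” keeps its singular/plural information across the clause). The sketch below is a rough NumPy illustration of that update under assumed names, shapes, and random weights.

```python
# Simplified GRU step (no relevance gate), illustrative NumPy sketch.
import numpy as np

n_c, n_x = 100, 50
rng = np.random.default_rng(2)
W_c = rng.normal(0, 0.01, (n_c, n_c + n_x)); b_c = np.zeros(n_c)
W_u = rng.normal(0, 0.01, (n_c, n_c + n_x)); b_u = np.zeros(n_c)
sigmoid = lambda z: 1 / (1 + np.exp(-z))

def gru_simplified_step(c_prev, x_t):
    concat = np.concatenate([c_prev, x_t])
    c_tilde = np.tanh(W_c @ concat + b_c)        # candidate memory value
    gamma_u = sigmoid(W_u @ concat + b_u)        # update gate in [0, 1]
    # Where gamma_u ~ 0 the old memory is kept (e.g. "cat ... was" vs "cats ... were").
    return gamma_u * c_tilde + (1 - gamma_u) * c_prev

c1 = gru_simplified_step(np.zeros(n_c), rng.normal(size=n_x))
```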

37 Relevance gate $\Gamma_r$

38 Recurrent Neural Networks
deeplearning.ai LSTM (long short term memory) unit

39 Introduction
An RNN (recurrent neural network) is a form of neural network that feeds its outputs back to its inputs during operation.
An LSTM (long short-term memory) network is a form of RNN; it addresses the vanishing-gradient problem of the original RNN.
Application: sequence-to-sequence models using LSTMs for machine translation.
Materials are mainly based on links found in RNN, LSTM v.7.c.

41 LSTM (Long short-term memory)
Standard RNN: the output (hidden state) is concatenated with the next input and fed back into the network; the repeating module is simple.
LSTM: the repeating module has a more complicated structure.


44 GRU and LSTM
The LSTM is a more powerful and more general version of the GRU.
GRU (full):
$\tilde{c}^{<t>} = \tanh(W_c[\Gamma_r * c^{<t-1>}, x^{<t>}] + b_c)$
$\Gamma_u = \sigma(W_u[c^{<t-1>}, x^{<t>}] + b_u)$
$\Gamma_r = \sigma(W_r[c^{<t-1>}, x^{<t>}] + b_r)$
$c^{<t>} = \Gamma_u * \tilde{c}^{<t>} + (1 - \Gamma_u) * c^{<t-1>}$
$a^{<t>} = c^{<t>}$
(the LSTM equations appear on the next slide)
[Hochreiter & Schmidhuber, Long short-term memory]
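A minimal sketch of one full-GRU step following the equations above; the weight shapes, random initialization, and function names are illustrative assumptions. Compared with the simplified version earlier, the relevance gate $\Gamma_r$ scales how much of $c^{<t-1>}$ enters the candidate.

```python
# Full GRU step with the relevance gate (illustrative NumPy sketch).
import numpy as np

n_c, n_x = 100, 50
rng = np.random.default_rng(3)
shape = (n_c, n_c + n_x)
W_c, W_u, W_r = (rng.normal(0, 0.01, shape) for _ in range(3))
b_c = b_u = b_r = np.zeros(n_c)
sigmoid = lambda z: 1 / (1 + np.exp(-z))

def gru_step(c_prev, x_t):
    concat = np.concatenate([c_prev, x_t])
    gamma_r = sigmoid(W_r @ concat + b_r)                            # relevance gate
    c_tilde = np.tanh(W_c @ np.concatenate([gamma_r * c_prev, x_t]) + b_c)
    gamma_u = sigmoid(W_u @ concat + b_u)                            # update gate
    c_t = gamma_u * c_tilde + (1 - gamma_u) * c_prev
    return c_t                                                       # a^{<t>} = c^{<t>}

c1 = gru_step(np.zeros(n_c), rng.normal(size=n_x))
```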

45 LSTM in pictures
$\tilde{c}^{<t>} = \tanh(W_c[a^{<t-1>}, x^{<t>}] + b_c)$
$\Gamma_u = \sigma(W_u[a^{<t-1>}, x^{<t>}] + b_u)$
$\Gamma_f = \sigma(W_f[a^{<t-1>}, x^{<t>}] + b_f)$
$\Gamma_o = \sigma(W_o[a^{<t-1>}, x^{<t>}] + b_o)$
$c^{<t>} = \Gamma_u * \tilde{c}^{<t>} + \Gamma_f * c^{<t-1>}$
$a^{<t>} = \Gamma_o * c^{<t>}$
(diagram: an LSTM cell with forget, update, and output gates maps $c^{<t-1>}$, $a^{<t-1>}$, $x^{<t>}$ to $c^{<t>}$, $a^{<t>}$ and a softmax output $\hat{y}^{<t>}$; three such cells are chained for t = 1, 2, 3)
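A matching sketch of one LSTM step, following the equations as written on this slide (including $a^{<t>} = \Gamma_o * c^{<t>}$); the names, sizes, and random weights are again illustrative assumptions.

```python
# One LSTM step following the slide's equations (illustrative NumPy sketch).
import numpy as np

n_a, n_x = 100, 50
rng = np.random.default_rng(4)
shape = (n_a, n_a + n_x)
W_c, W_u, W_f, W_o = (rng.normal(0, 0.01, shape) for _ in range(4))
b_c, b_u, b_f, b_o = (np.zeros(n_a) for _ in range(4))
sigmoid = lambda z: 1 / (1 + np.exp(-z))

def lstm_step(a_prev, c_prev, x_t):
    concat = np.concatenate([a_prev, x_t])
    c_tilde = np.tanh(W_c @ concat + b_c)       # candidate cell value
    gamma_u = sigmoid(W_u @ concat + b_u)       # update gate
    gamma_f = sigmoid(W_f @ concat + b_f)       # forget gate
    gamma_o = sigmoid(W_o @ concat + b_o)       # output gate
    c_t = gamma_u * c_tilde + gamma_f * c_prev  # new cell state
    a_t = gamma_o * c_t                         # hidden state, per the slide
    return a_t, c_t

a1, c1 = lstm_step(np.zeros(n_a), np.zeros(n_a), rng.normal(size=n_x))
```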

46 Core idea of LSTM
C = cell state; Ct−1 = state at time t−1; Ct = state at time t; σ = a sigmoid function.
Using gates, the LSTM can add or remove information from the state, avoiding the long-term dependency problem (Bengio et al., 1994).
A gate is controlled by σ: the sigmoid layer outputs numbers between zero and one, describing how much of each component should be let through. A value of zero means “let nothing through,” while a value of one means “let everything through.”
An LSTM has three of these gates to protect and control the cell state.

47 First step: forget gate layer
Decide what to throw away from the cell state (what to keep and what to forget).
“It looks at ht−1 and xt, and outputs a number between 0 and 1 for each number in the cell state Ct−1. A 1 represents ‘completely keep this’ while a 0 represents ‘completely get rid of this.’”
“For the language model example, the cell state might include the gender of the present subject, so that the correct pronouns can be used. When we see a new subject, we want to forget the gender of the old subject.”

48 Second step (a): input gate layer
Decide what new information to store in the cell state; this new information is added to form the state Ct.
“Next, a tanh layer creates a vector of new candidate values, ~Ct, that could be added to the state. In the next step, we’ll combine these two to create an update to the state.”
“In the example of our language model, we’d want to add the gender of the new subject to the cell state, to replace the old one we’re forgetting.”

49 Second step (b): update the old cell state
Ct−1 → Ct
“We multiply the old state by ft, forgetting the things we decided to forget earlier. Then we add it ∗ ~Ct. This is the new candidate values, scaled by how much we decided to update each state value.”
“For the language model example, this is where we’d actually drop the information about the old subject’s gender and add the new information, as we decided in the previous steps.”

50 Third step: output layer
Decide what to output (ht).
“Finally, we need to decide what we’re going to output. This output will be based on our cell state, but will be a filtered version. First, we run a sigmoid layer which decides what parts of the cell state we’re going to output. Then, we put the cell state through tanh (to push the values to be between −1 and 1) and multiply it by the output of the sigmoid gate, so that we only output the parts we decided to.”
“For the language model example, since it just saw a subject, it might want to output information relevant to a verb, in case that’s what is coming next. For example, it might output whether the subject is singular or plural, so that we know what form a verb should be conjugated into if that’s what follows next.”

51 LSTM dimensions
xt is of size n×1 and ht−1 is of size m×1, so Size(xt appended with ht−1) = (n+m)×1.
The gate and state vectors (ft, it, ot, Ct−1, Ct, ht) are all of size m×1.
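A quick numeric check of this shape bookkeeping; the values of n, m and the zero weights are arbitrary stand-ins.

```python
# Shape check: concatenating x_t (n×1) with h_{t-1} (m×1) gives (n+m)×1,
# and a gate weight matrix must then be m×(n+m) to produce an m×1 gate.
import numpy as np

n, m = 50, 100
x_t, h_prev = np.ones((n, 1)), np.ones((m, 1))
concat = np.vstack([x_t, h_prev])
W_f = np.zeros((m, n + m))                  # forget-gate weights (illustrative)
f_t = 1 / (1 + np.exp(-(W_f @ concat)))     # m×1 forget gate
print(concat.shape, f_t.shape)              # (150, 1) (100, 1)
```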

52 Recurrent Neural Networks
deeplearning.ai Bidirectional RNN

54 Getting information from the future
He said, “Teddy bears are on sale!”
He said, “Teddy Roosevelt was a great President!”
(diagram: a forward-only RNN over the seven words of “He said, ‘Teddy bears are on sale!’”, with states $a^{<1>} \dots a^{<7>}$ and outputs $\hat{y}^{<1>} \dots \hat{y}^{<7>}$; at the word “Teddy”, the forward states alone cannot tell which of the two sentences it is in)
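A rough sketch of the bidirectional idea: run one RNN left-to-right and another right-to-left, and form each prediction from both states, so $\hat{y}^{<3>}$ for “Teddy” also sees the words after it. The simple tanh cell, names, and shapes are illustrative assumptions.

```python
# Bidirectional RNN sketch: forward and backward passes; each output uses both states.
import numpy as np

n_a, n_x, n_y = 32, 50, 10
rng = np.random.default_rng(5)
W_fwd = rng.normal(0, 0.1, (n_a, n_a + n_x)); b_fwd = np.zeros(n_a)
W_bwd = rng.normal(0, 0.1, (n_a, n_a + n_x)); b_bwd = np.zeros(n_a)
W_y = rng.normal(0, 0.1, (n_y, 2 * n_a));     b_y = np.zeros(n_y)

def step(W, b, a_prev, x):
    return np.tanh(W @ np.concatenate([a_prev, x]) + b)

def brnn(xs):
    fwd, a = [], np.zeros(n_a)          # forward states a^{<1>} ... a^{<T>}
    for x in xs:
        a = step(W_fwd, b_fwd, a, x)
        fwd.append(a)
    bwd, a = [], np.zeros(n_a)          # backward states computed from t = T down to 1
    for x in reversed(xs):
        a = step(W_bwd, b_bwd, a, x)
        bwd.append(a)
    bwd.reverse()
    # prediction at each t combines the forward and backward states
    return [W_y @ np.concatenate([f, b]) + b_y for f, b in zip(fwd, bwd)]

ys = brnn([rng.normal(size=n_x) for _ in range(7)])   # e.g. a 7-word sentence
print(len(ys), ys[2].shape)                           # 7 (10,)
```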


57 Bidirectional RNN (BRNN)
