Published by Mae Booker. Modified over 9 years ago.
1
Introduction to Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM)
Wenjie Pei
In this coffee talk, I would like to present some basic knowledge about the RNN and LSTM models. I was going to present a paper about an RNN model, but I think it is better to give you an introduction to these models first, so that you have an overview of them.
2
Artificial Neural Networks
Feedforward neural networks: ANNs without cyclic connections between nodes
Recurrent (feedback) neural networks: ANNs with cyclic connections between nodes
3
Feedforward Neural Networks
Multilayer perceptron (MLP)
Universal function approximation theory: with sufficient nonlinear hidden units, an MLP can approximate any continuous mapping function.
Each node first calculates the weighted sum over all its inputs and then passes the result through the activation function. So we can see that this model is very powerful.
Drawback: the output depends only on the current input, so the model captures no temporal dependencies.
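The per-node computation described above (a weighted sum followed by an activation function) can be sketched in a few lines. This is a minimal illustration, not the presenter's code: the layer sizes, the weights, the tanh activation, and the linear output layer are all assumptions chosen for the example.

```python
import math

def dense(inputs, weights, biases, activation):
    """One layer: each node takes the weighted sum of its inputs, then applies the activation."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def mlp_forward(x, W_hidden, b_hidden, W_out, b_out):
    """Forward pass of a one-hidden-layer MLP (tanh hidden units, linear output)."""
    hidden = dense(x, W_hidden, b_hidden, math.tanh)
    return dense(hidden, W_out, b_out, lambda a: a)

# Hypothetical weights: 2 inputs, 3 hidden units, 1 output.
W_hidden = [[0.5, -0.3], [0.1, 0.8], [-0.6, 0.2]]
b_hidden = [0.0, 0.1, -0.1]
W_out = [[1.0, -1.0, 0.5]]
b_out = [0.0]

# The network keeps no state between calls, so the same input always
# produces the same output, whatever was presented before it.
y_first = mlp_forward([1.0, 2.0], W_hidden, b_hidden, W_out, b_out)
_ = mlp_forward([-3.0, 0.5], W_hidden, b_hidden, W_out, b_out)
y_again = mlp_forward([1.0, 2.0], W_hidden, b_hidden, W_out, b_out)
print(y_first == y_again)
```

The comparison at the end illustrates the drawback named on the slide: nothing from earlier inputs can influence the current output.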
4
Recurrent Neural Networks
Feedback from the hidden unit activations of the last time step to the current time step
Universal approximation theory: an RNN with sufficient hidden units can approximate any measurable sequence-to-sequence mapping or dynamic system.
Moving from MLPs to RNNs is a transition from a static process to a dynamic process that takes the time dimension into account.
Advantages: memory of previous inputs; incorporates contextual information
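The feedback from the last time step can be sketched as a simple recurrence. This is a minimal sketch assuming the common Elman-style update h_t = tanh(W_in x_t + W_rec h_{t-1} + b); the slide does not give the exact equations, and the weights and sequences below are made up for illustration.

```python
import math

def rnn_step(x, h_prev, W_in, W_rec, b):
    """Hidden state at the current step from the current input and the last hidden state."""
    h = []
    for j in range(len(b)):
        a = b[j]
        a += sum(w * xi for w, xi in zip(W_in[j], x))        # contribution of the current input
        a += sum(w * hk for w, hk in zip(W_rec[j], h_prev))  # feedback from the last time step
        h.append(math.tanh(a))
    return h

def rnn_forward(xs, W_in, W_rec, b):
    """Run the recurrence over a whole sequence, starting from a zero hidden state."""
    h = [0.0] * len(b)
    states = []
    for x in xs:
        h = rnn_step(x, h, W_in, W_rec, b)
        states.append(h)
    return states

# Hypothetical weights: 1 input, 2 hidden units.
W_in = [[0.5], [-0.4]]
W_rec = [[0.3, -0.2], [0.1, 0.4]]
b = [0.0, 0.0]

# Two sequences that differ only in their first input.
seq_a = [[1.0], [0.0], [0.5]]
seq_b = [[-1.0], [0.0], [0.5]]
final_a = rnn_forward(seq_a, W_in, W_rec, b)[-1]
final_b = rnn_forward(seq_b, W_in, W_rec, b)[-1]
print(final_a != final_b)
```

Both sequences end with the same input, yet the final hidden states differ, because the hidden state carries a memory of the earlier inputs.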
5
Recurrent Neural Networks
Bidirectional RNNs
6
Recurrent Neural Networks
Vanishing gradient problem
This model cannot keep a long memory, for two reasons:
1. The activation function
2. The information is diluted by other inputs
The sensitivity to an earlier input decays exponentially over time.
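The exponential decay of sensitivity can be made concrete with a one-unit RNN. The sketch below assumes a scalar tanh recurrence h_t = tanh(w_rec * h_{t-1} + w_in * x_t) and tracks the derivative of h_t with respect to the first input by the chain rule; the weights and the input sequence are hypothetical.

```python
import math

def sensitivity_to_first_input(xs, w_rec=0.9, w_in=1.0):
    """Return d h_t / d x_0 for every time step t of a scalar tanh RNN."""
    h = 0.0
    grad = 0.0
    grads = []
    for t, x in enumerate(xs):
        h = math.tanh(w_rec * h + w_in * x)
        local = 1.0 - h * h  # derivative of tanh at this step, always at most 1
        if t == 0:
            grad = local * w_in          # direct effect of x_0 on h_0
        else:
            grad = local * w_rec * grad  # chain rule through one more time step
        grads.append(grad)
    return grads

# Hypothetical sequence: one input of interest followed by 19 later inputs.
xs = [1.0] + [0.5] * 19
grads = sensitivity_to_first_input(xs)
for t in (0, 5, 10, 19):
    print(t, grads[t])
```

Each extra time step multiplies the sensitivity by the tanh derivative times the recurrent weight. In this example that factor is below 1 at every step, so the influence of the first input shrinks roughly geometrically.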
7
Long Short-Term Memory (LSTM)
In this model, the hidden layer is replaced by memory blocks, and each block can contain several memory cells. Here is the model for one cell. It has an input, an output, and three gates. What are the roles of these three gates? All gate values lie between 0 and 1.
Input gate [0, 1]: how much information from the input can go into the cell
Forget gate [0, 1]: how much information from the last time step is carried over into the cell
Output gate [0, 1]: how much information to output
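The roles of the three gates can be sketched for a single memory cell. This is a simplified sketch that assumes the widely used formulation without peephole connections; the slide's diagram may differ in detail, and the scalar parameters in `params` are hypothetical values chosen only for illustration.

```python
import math

def sigmoid(a):
    """Logistic function; squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a))

def lstm_cell_step(x, h_prev, c_prev, p):
    """One time step of a single scalar LSTM memory cell."""
    i = sigmoid(p["w_i"] * x + p["u_i"] * h_prev + p["b_i"])    # input gate
    f = sigmoid(p["w_f"] * x + p["u_f"] * h_prev + p["b_f"])    # forget gate
    o = sigmoid(p["w_o"] * x + p["u_o"] * h_prev + p["b_o"])    # output gate
    g = math.tanh(p["w_g"] * x + p["u_g"] * h_prev + p["b_g"])  # candidate cell input
    c = f * c_prev + i * g  # keep part of the old state, admit part of the new input
    h = o * math.tanh(c)    # expose part of the cell state as output
    return h, c, (i, f, o)

# Hypothetical parameters for one cell.
params = {
    "w_i": 0.5, "u_i": 0.1, "b_i": 0.0,
    "w_f": 0.3, "u_f": 0.2, "b_f": 1.0,
    "w_o": 0.4, "u_o": 0.1, "b_o": 0.0,
    "w_g": 0.8, "u_g": 0.3, "b_g": 0.0,
}

h, c, gates = lstm_cell_step(1.0, 0.0, 0.0, params)
print("gates (input, forget, output):", gates)
print("cell state:", c, "output:", h)
```

Because each gate is a sigmoid, its value always lies strictly between 0 and 1 and acts as a soft switch on the quantity it multiplies.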
8
Long Short-Term Memory
Advantage: memory over long time periods
Information stored in the cell can be conveyed across time steps without decay.
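Why the cell can carry information without decay can be illustrated with the cell-state update alone. The sketch assumes the common update c_t = f_t * c_{t-1} + i_t * g_t, which the slides do not spell out, and treats the gate values as given constants (ignoring that in a real LSTM the gates themselves depend on the inputs and previous output). All numbers are hypothetical.

```python
def carry_cell_state(c0, forget_gates, input_gates, candidates):
    """Apply c_t = f_t * c_{t-1} + i_t * g_t and track d c_t / d c_0."""
    c = c0
    sensitivity = 1.0
    for f, i, g in zip(forget_gates, input_gates, candidates):
        c = f * c + i * g
        sensitivity *= f  # with fixed gates, each step scales the sensitivity by f_t
    return c, sensitivity

steps = 100
noise = [0.3] * steps  # later inputs that could overwrite the stored value

# Forget gate fully open, input gate closed: the stored value is protected.
kept_c, kept_sens = carry_cell_state(0.7, [1.0] * steps, [0.0] * steps, noise)

# Forget gate at 0.9: the stored value leaks away step by step.
leaky_c, leaky_sens = carry_cell_state(0.7, [0.9] * steps, [0.0] * steps, noise)

print(kept_c, kept_sens)
print(leaky_c, leaky_sens)
```

With the forget gate at 1 and the input gate at 0, the cell state and its sensitivity to the initial value are unchanged after 100 steps. With a forget gate of 0.9, both shrink by a factor of 0.9 per step, which is the kind of decay a plain RNN suffers from.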
9
Applications
Applications to sequence labeling problems:
Handwritten character recognition
Speech recognition
Protein secondary structure prediction
…
Want to know more about the latest papers? Wait for my next coffee talk.