Attention
Recurrent neural networks (RNNs)
Idea: the input is processed sequentially, repeating the same operation at each step. LSTM and GRU cells use gating variables to modulate dependence on earlier inputs (forgetting) and to ease backpropagation (credit assignment).
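As a concrete illustration, here is a minimal NumPy sketch of GRU-style gating; the parameter names (Wz, Uz, etc.) are illustrative, not from the slides.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, p):
    """One GRU step: gates decide how much of the old state to keep.

    x: input vector, h: previous hidden state,
    p: dict of weight matrices / bias vectors (illustrative names).
    """
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h + p["bz"])  # update gate
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h + p["br"])  # reset gate
    h_tilde = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h) + p["bh"])  # candidate state
    return (1 - z) * h + z * h_tilde  # gated blend of old and candidate state
```

The gates z and r are learned functions of the input and state, so the network itself decides, per step, how much of the past to forget and how much gradient to pass back.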
seq2seq
Combine two RNNs to map from one sequence to another (e.g., for sentence translation). The encoder RNN compresses the input into a context vector (a generalization of a word embedding). The decoder RNN takes that vector and expands it to produce the output.
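A toy NumPy sketch of this compress-then-expand structure follows; a real seq2seq decoder would feed back the embedded previous output token and end with a softmax over the vocabulary, so treat the feedback loop here as a simplification.

```python
import numpy as np

def rnn_encode(xs, W, U, b):
    """Run a simple tanh RNN over the input; the final state is the context vector."""
    h = np.zeros(U.shape[0])
    for x in xs:
        h = np.tanh(W @ x + U @ h + b)
    return h  # fixed-length summary of the entire input sequence

def rnn_decode(context, n_steps, W, U, b, V):
    """Unroll a decoder RNN from the context vector, emitting one output per step."""
    h = context
    y = np.zeros(V.shape[0])  # start token stand-in
    outputs = []
    for _ in range(n_steps):
        h = np.tanh(W @ y + U @ h + b)  # condition on the previous output
        y = V @ h                        # project state to output space
        outputs.append(y)
    return outputs
```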
seq2seq: problems
Difficult to make long-range associations: sequential processing makes the strongest associations between nearest neighbors. The entire content is compressed into a single, usually fixed-length, vector.
Question: How can we learn to attend to relevant parts of the input when we need them for the output? This would help address both problems above.
Attention for translation
Learn to encode multiple pieces of information and use them selectively for the output. Encode the input sentence into a sequence of vectors. Choose a subset of these adaptively while decoding (translating): choose those vectors most relevant for the current output. I.e., learn to jointly align and translate (Bahdanau et al., 2015).
Question: How can we learn and use a vector to decide where to focus attention? How can we make that differentiable, so it works with gradient descent?
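One answer is sketched below, in the style of Bahdanau et al.'s additive attention: score each encoder state against the current decoder state with a small feed-forward network, then normalize the scores with a softmax. The parameter names (Wa, Ua, va) are assumptions for illustration.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def additive_attention(s_prev, encoder_states, Wa, Ua, va):
    """Bahdanau-style alignment: score each encoder state h against the
    previous decoder state s_prev, softmax the scores into weights, and
    return the weighted sum as the context for this output step."""
    scores = np.array([va @ np.tanh(Wa @ s_prev + Ua @ h) for h in encoder_states])
    alphas = softmax(scores)  # attention weights over input positions
    context = sum(a * h for a, h in zip(alphas, encoder_states))
    return context, alphas
```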
Soft attention
Use a probability distribution over all inputs. Classification assigns a probability to every possible output; attention uses a probability to weight every possible input – learning to weight the more relevant parts more heavily.
Soft attention: content-based attention
Each position has a vector encoding the content there. Dot-product each of these with a query vector, then apply a softmax.
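In code, content-based attention reduces to a few lines. This NumPy sketch assumes one key/value vector per input position; the function name and arguments are illustrative.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def content_attention(query, keys, values):
    """Content-based attention: dot each position's key with the query,
    softmax the scores into a distribution, and return the weighted sum."""
    scores = keys @ query      # one similarity score per input position
    weights = softmax(scores)  # probability distribution over positions
    return weights @ values, weights
```

Because every step (dot product, softmax, weighted sum) is differentiable, gradients flow through the attention weights during training.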
Soft attention: content-based attention
Content-based attention allows the network to associate relevant parts of the input with the current part of the output. Using dot products and a softmax makes this attention framework differentiable, so it fits into the usual SGD learning mechanism.