Attention

Recurrent neural networks (RNNs)
Idea: the input is processed sequentially, repeating the same operation at each step. LSTM and GRU cells use gating variables to modulate how much past information is carried forward (forgetting) and to ease backpropagation through time (credit assignment). https://distill.pub/2016/augmented-rnns/
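
For concreteness, a minimal NumPy sketch of a GRU-style step (the function names, weight list, and shapes are illustrative, not from the slides): the update and reset gates are the gating variables that modulate how much of the old hidden state is kept.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    # update gate z: how much of the new candidate replaces the old state
    z = sigmoid(Wz @ x + Uz @ h)
    # reset gate r: how much of the old state feeds the candidate
    r = sigmoid(Wr @ x + Ur @ h)
    h_cand = np.tanh(Wh @ x + Uh @ (r * h))
    return (1.0 - z) * h + z * h_cand    # gated blend: "forgetting" made explicit

def run_rnn(xs, h0, params):
    # the same operation is repeated over the whole sequence
    h = h0
    for x in xs:
        h = gru_step(x, h, *params)
    return h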

seq2seq
Combine two RNNs to map one sequence to another (e.g., for sentence translation). The encoder RNN compresses the input into a context vector (a generalization of a word embedding). The decoder RNN takes that context vector and expands it to produce the output sequence. https://lilianweng.github.io/lil-log/2018/06/24/attention-attention.html
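
A rough sketch of that encoder/decoder split, assuming a plain tanh RNN step, an embedding table, an output projection, and greedy decoding with BOS/EOS tokens; all names here (rnn_step, encode, decode) are illustrative, not an actual seq2seq implementation.

import numpy as np

def rnn_step(x, h, W, U):
    # one vanilla RNN step; an LSTM/GRU cell would be used in practice
    return np.tanh(W @ x + U @ h)

def encode(src_ids, embed, W, U):
    # compress the whole source sentence into one fixed-length context vector
    h = np.zeros(U.shape[0])
    for i in src_ids:
        h = rnn_step(embed[i], h, W, U)
    return h

def decode(context, embed, W, U, V_out, bos_id, eos_id, max_len=20):
    # expand the context vector back into an output token sequence
    h, tok, out = context, bos_id, []
    for _ in range(max_len):
        h = rnn_step(embed[tok], h, W, U)
        tok = int(np.argmax(V_out @ h))   # greedy choice of the next word
        if tok == eos_id:
            break
        out.append(tok)
    return out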

seq2seq: problems
It is difficult to capture long-range associations: sequential processing makes the strongest associations between nearest neighbors. The entire input is compressed into a single, usually fixed-length vector. Question: how can we learn to attend to the relevant parts of the input exactly when we need them for the output? That would help address both problems above.

Attention for translation
Learn to encode multiple pieces of information and use them selectively for the output. Encode the input sentence into a sequence of vectors, then choose a subset of these adaptively while decoding (translating) – the vectors most relevant to the current output. In other words, learn to jointly align and translate. Question: how can we learn and use a vector that decides where to focus attention, and how can we make that differentiable so it works with gradient descent? Bahdanau et al., 2015: https://arxiv.org/pdf/1409.0473.pdf
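
The alignment model in the cited paper scores each encoder state against the previous decoder state with a small feed-forward network. Below is a rough NumPy sketch of that additive scoring for a single decoder step; the parameter names Wa, Ua, v echo the paper, but the function itself is only an illustration.

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def bahdanau_attention(s_prev, enc_states, Wa, Ua, v):
    # score every encoder vector h_j against the previous decoder state s_prev
    scores = np.array([v @ np.tanh(Wa @ s_prev + Ua @ h) for h in enc_states])
    alpha = softmax(scores)                        # alignment weights (sum to 1)
    context = sum(a * h for a, h in zip(alpha, enc_states))
    return context, alpha                          # context conditions the next output word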

Soft attention
Use a probability distribution over all inputs. Classification assigns a probability to every possible output; attention uses probabilities to weight every possible input – the model learns to weight the more relevant parts more heavily. https://distill.pub/2016/augmented-rnns/

Soft attention: content-based attention
Each input position has a vector encoding the content there. Take the dot product of each of these vectors with a query vector, then apply a softmax to get the attention weights. https://distill.pub/2016/augmented-rnns/
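
A short sketch of that recipe (function and variable names are mine, not the slide's): dot each content vector with the query, softmax the scores, and read out a weighted average.

import numpy as np

def content_attention(query, contents):
    # contents: one vector per input position, shape (positions, dim)
    scores = contents @ query          # dot product of each content vector with the query
    w = np.exp(scores - scores.max())
    w = w / w.sum()                    # softmax: a probability distribution over positions
    return w @ contents, w             # attention-weighted read, plus the weights

# toy usage: 4 input positions with 3-dimensional content vectors
contents = np.random.randn(4, 3)
query = np.random.randn(3)
read, weights = content_attention(query, contents)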

Soft attention
Content-based attention allows the network to associate the relevant parts of the input with the current part of the output. Using dot products and a softmax makes this attention framework differentiable, so it fits into the usual SGD-based learning machinery.