Attention
Recurrent neural networks (RNNs)
Idea: the input is processed sequentially, repeating the same operation at each step. LSTM and GRU cells use gating variables that modulate dependence on past inputs (forgetting) and make backpropagation easier (credit assignment). https://distill.pub/2016/augmented-rnns/
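A minimal NumPy sketch of one GRU step (biases omitted; sizes, weights, and variable names are illustrative, not from the slides), showing how the update and reset gates modulate how much of the previous state is kept:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, params):
    """One GRU step: gates modulate how much of the previous state is kept."""
    Wz, Uz, Wr, Ur, Wh, Uh = params
    z = sigmoid(Wz @ x + Uz @ h_prev)             # update gate: how much to overwrite
    r = sigmoid(Wr @ x + Ur @ h_prev)             # reset gate: how much history to use
    h_cand = np.tanh(Wh @ x + Uh @ (r * h_prev))  # candidate new state
    return (1 - z) * h_prev + z * h_cand          # gated blend of old and new state

# Tiny usage example with random weights (illustrative sizes only).
d_in, d_h = 4, 3
rng = np.random.default_rng(0)
params = [rng.standard_normal((d_h, d_in)) if i % 2 == 0 else rng.standard_normal((d_h, d_h))
          for i in range(6)]
h = np.zeros(d_h)
for x in rng.standard_normal((5, d_in)):          # process a length-5 sequence, one step at a time
    h = gru_step(x, h, params)
```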
seq2seq Combine 2 RNNs to map from one sequence to another (e.g., for sentence translation). Encoder RNN compresses the input into a context vector (generalization of a word embedding). Decoder RNN takes the embedding and expands to produce the output. https://lilianweng.github.io/lil-log/2018/06/24/attention-attention.html
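A minimal PyTorch sketch of the encoder-decoder pipeline, assuming GRU encoder and decoder and teacher-forced decoder inputs; the sizes and the toy 100-word target vocabulary are illustrative only:

```python
import torch
import torch.nn as nn

d_src, d_tgt, d_h = 8, 8, 16           # source/target embedding sizes, hidden size (illustrative)

encoder = nn.GRU(d_src, d_h, batch_first=True)
decoder = nn.GRU(d_tgt, d_h, batch_first=True)
readout = nn.Linear(d_h, 100)          # project decoder states onto a toy target vocabulary

src = torch.randn(1, 10, d_src)        # a length-10 "source sentence" of embeddings
_, context = encoder(src)              # context: (1, 1, d_h), the single compressed vector

tgt_in = torch.randn(1, 7, d_tgt)      # decoder inputs (previous target embeddings)
dec_out, _ = decoder(tgt_in, context)  # decoder starts from the context vector
logits = readout(dec_out)              # (1, 7, 100): one output distribution per position
```

Note that everything the decoder knows about the source sentence has to pass through the single context vector, which is the bottleneck the next slide describes.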
seq2seq: problems
Long-range associations are difficult: sequential input makes the strongest associations between nearest neighbors. The entire input is compressed into a single, usually fixed-length vector. Question: how can we learn to attend to the relevant parts of the input when we need them for the output? This would help address both problems.
Attention for translation
Learn to encode multiple pieces of information and use them selectively for the output. Encode the input sentence into a sequence of vectors, then choose among these adaptively while decoding (translating), favoring the vectors most relevant to the current output; i.e., learn to jointly align and translate (see the sketch below). Questions: how can we learn and use a vector to decide where to focus attention, and how can we make that differentiable so it works with gradient descent? Bahdanau et al., 2015: https://arxiv.org/pdf/1409.0473.pdf
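A NumPy sketch of Bahdanau-style (additive) alignment, assuming the usual scoring form e_ij = v^T tanh(Wa s_{i-1} + Ua h_j); matrix names and sizes are illustrative:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def additive_attention(s_prev, H, Wa, Ua, v):
    """Score each encoder annotation h_j against the previous decoder state
    s_prev, then return the softmax-weighted sum (the context for this step)."""
    scores = np.array([v @ np.tanh(Wa @ s_prev + Ua @ h_j) for h_j in H])  # alignment scores e_ij
    alpha = softmax(scores)                 # alignment weights over input positions
    context = alpha @ H                     # c_i = sum_j alpha_ij * h_j
    return context, alpha

# Illustrative sizes: 6 encoder annotations of dim 4, decoder state of dim 5, attention dim 3.
rng = np.random.default_rng(1)
H = rng.standard_normal((6, 4))             # encoder annotations h_1..h_6
s_prev = rng.standard_normal(5)             # previous decoder hidden state
Wa, Ua, v = rng.standard_normal((3, 5)), rng.standard_normal((3, 4)), rng.standard_normal(3)
context, alpha = additive_attention(s_prev, H, Wa, Ua, v)
```

A fresh context vector is computed at every decoding step, so the model can look back at different parts of the input for different output words.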
Soft attention
Use a probability distribution over all inputs. Just as classification assigns a probability to every possible output, attention assigns a weight to every possible input, learning to weight the more relevant parts more heavily. https://distill.pub/2016/augmented-rnns/
Soft attention: content-based attention
Each position has a vector encoding its content. Take the dot product of each with a query vector, then apply a softmax to obtain the attention weights, as in the sketch below. https://distill.pub/2016/augmented-rnns/
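A minimal NumPy sketch of content-based soft attention as just described (sizes illustrative; here each position's content vector serves as both the thing scored and the thing returned):

```python
import numpy as np

def content_attention(query, keys, values):
    """Dot each content vector with the query, softmax the scores into a
    probability distribution, and return the weighted sum of the values."""
    scores = keys @ query                        # one similarity score per input position
    weights = np.exp(scores - scores.max())
    weights = weights / weights.sum()            # softmax: a distribution over inputs
    return weights @ values, weights

# Illustrative example: 5 input positions, each with a content vector of dim 4.
rng = np.random.default_rng(2)
keys = values = rng.standard_normal((5, 4))      # content vectors used for both scoring and output
query = rng.standard_normal(4)
attended, weights = content_attention(query, keys, values)
```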
Soft attention
Content-based attention allows the network to associate relevant parts of the input with the current part of the output. Using dot products and a softmax makes the whole attention mechanism differentiable, so it fits into the usual SGD learning procedure.
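A short autograd check of that last point (a sketch, not part of the original slides): because the attention output is built only from dot products, a softmax, and a weighted sum, gradients flow back to both the query and the inputs.

```python
import torch

keys = torch.randn(5, 4, requires_grad=True)    # content vectors at 5 positions
query = torch.randn(4, requires_grad=True)

weights = torch.softmax(keys @ query, dim=0)    # soft attention distribution
attended = weights @ keys                       # weighted sum of the inputs
loss = attended.sum()                           # stand-in for a downstream loss
loss.backward()

print(query.grad is not None, keys.grad is not None)  # True True: trainable by SGD
```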