Introduction to Text Generation

Introduction to Text Generation 杨润琦

Overview
- Text generation basics
  - Definition and basic architectures
  - Evaluation metrics
- Major problems and progress
  - Better network architecture
  - From greedy search to beam search
  - Generation control based on prior knowledge
  - Diversity enhancement
  - Exposure bias alleviation
- Tough problems and future directions

Text Generation A special case of sequence generation:

Loss: negative log-likelihood / perplexity
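As a minimal sketch of how the two quantities on this slide relate, the snippet below computes the negative log-likelihood of a reference sequence and the corresponding perplexity. The per-token probabilities are made-up values for illustration, not output from any model in the talk.

```python
import math

def negative_log_likelihood(token_probs):
    """Sum of -log p(y_t | y_<t, x) over the reference tokens."""
    return -sum(math.log(p) for p in token_probs)

def perplexity(token_probs):
    """Exponential of the average per-token negative log-likelihood."""
    return math.exp(negative_log_likelihood(token_probs) / len(token_probs))

# Hypothetical probabilities a model assigned to each of 4 reference tokens.
probs = [0.5, 0.25, 0.5, 0.25]
nll = negative_log_likelihood(probs)
ppl = perplexity(probs)
print(f"NLL = {nll:.4f}, perplexity = {ppl:.4f}")
```

Perplexity is a monotonic function of the average NLL, so minimizing one minimizes the other.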

Applications
- Translation
- Image captioning
- Summarization
- Chatbot
- …
End to end, more expressive

Basic Architecture: seq2seq https://www.tensorflow.org/tutorials/seq2seq

Basic Architecture: im2txt https://github.com/tensorflow/models/tree/master/research/im2txt

Evaluation of text generation
- Similarity: generation vs. (multiple) references
  - Translation, image captioning, summarization, …
- Related, diverse and interesting
  - Conversation, news commenting, image commenting, …

Similarity Metrics
- BLEU: Bilingual Evaluation Understudy
- ROUGE: Recall-Oriented Understudy for Gisting Evaluation
- METEOR: Metric for Evaluation of Translation with Explicit Ordering
- CIDEr: Consensus-based Image Description Evaluation
- SPICE: Semantic Propositional Image Caption Evaluation
- EmbSim, WMD, …: embedding-based similarity
- …
Note: the first three are mainly n-gram based, the fourth adds IDF weighting, and the fifth takes the parse tree into account.
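To make "n-gram based" concrete, here is a small sketch of the clipped (modified) n-gram precision that BLEU is built on. It is only that one ingredient: full BLEU also combines several n-gram orders and applies a brevity penalty, both omitted here. The example sentences are invented.

```python
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, references, n):
    """Clipped n-gram precision of a candidate against multiple references."""
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    # For each n-gram, the maximum count seen in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    # A candidate n-gram is credited at most as often as a reference has it.
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())

candidate = "the the the cat".split()
references = ["the cat sat".split(), "a cat sat on the mat".split()]
p1 = modified_precision(candidate, references, 1)
p2 = modified_precision(candidate, references, 2)
print(p1, p2)
```

Clipping is what stops a degenerate candidate that repeats "the" from scoring a perfect unigram precision.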

Diversity and Coherence
- Diversity
  - Self-BLEU
  - Fraction of distinct unigrams and bigrams
- Coherence
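The "fraction of distinct unigrams and bigrams" can be sketched as below: the number of unique n-grams divided by the total number of n-grams over a set of generated outputs. The function name `distinct_n` and the sample outputs are illustrative choices, not taken from the slides.

```python
def distinct_n(sentences, n):
    """Unique n-grams divided by total n-grams across all generated sentences."""
    total = 0
    unique = set()
    for tokens in sentences:
        grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(grams)
        unique.update(grams)
    return len(unique) / total if total else 0.0

# Hypothetical chatbot outputs; two of the three are the same generic reply.
outputs = [
    ["i", "do", "not", "know"],
    ["i", "do", "not", "know"],
    ["sounds", "good"],
]
d1 = distinct_n(outputs, 1)
d2 = distinct_n(outputs, 2)
print(d1, d2)
```

A low value indicates that the system keeps producing the same words or phrases across outputs.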

The Only Reliable Evaluation Metric: Human Evaluation

Major problems and progress: overall design ideas and directions

Limitations of RNN-based models
- Slow due to their sequential nature
- Very deep LSTMs can't be built because optimization is unstable, so learning capacity is limited

Convolutional seq2seq
- 20 layers
- Padding of decoder
- Positional encoding
Convolutional Sequence to Sequence Learning. https://arxiv.org/abs/1705.03122

Transformer
- 6 layers (×2 / ×3)
- Positional encoding
Attention Is All You Need. https://arxiv.org/abs/1706.03762
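Because neither convolutions nor self-attention see token order by themselves, both models add position information to the token embeddings. The sketch below shows the fixed sinusoidal variant described in "Attention Is All You Need"; it is written in plain Python for readability, and the table size used in the example is arbitrary.

```python
import math

def positional_encoding(num_positions, d_model):
    """Sinusoidal table: sin on even dimensions, cos on odd dimensions."""
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(d_model):
            # Dimensions 2j and 2j+1 share the wavelength 10000^(2j / d_model).
            angle = pos / (10000 ** ((2 * (i // 2)) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

pe = positional_encoding(4, 8)
for row in pe:
    print([round(v, 3) for v in row])
```

Each row would be added to the embedding of the token at that position before the first layer.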

Knowing when to look
- CNN encoder + RNN decoder
- Selective attention
- Image / text only (sentinel)
Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning. https://arxiv.org/abs/1612.01887

Greedy search https://www.tensorflow.org/tutorials/seq2seq
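A minimal sketch of greedy search: at every step, take the single most probable next token and stop at the end-of-sequence token. The lookup table `TOY_MODEL` is a hypothetical stand-in for a trained decoder, invented so the example runs without any framework.

```python
import math

EOS = "</eos>"

# Hypothetical toy "model": next-token probabilities given the prefix so far.
TOY_MODEL = {
    (): {"a": 0.6, "the": 0.4},
    ("a",): {"cat": 0.55, "dog": 0.45},
    ("the",): {"cat": 0.9, "dog": 0.1},
}

def next_token_probs(prefix):
    # Any prefix not in the table ends the sentence.
    return TOY_MODEL.get(tuple(prefix), {EOS: 1.0})

def greedy_search(max_steps=10):
    tokens, log_prob = [], 0.0
    for _ in range(max_steps):
        probs = next_token_probs(tokens)
        token = max(probs, key=probs.get)  # locally best token only
        log_prob += math.log(probs[token])
        if token == EOS:
            break
        tokens.append(token)
    return tokens, log_prob

greedy_tokens, greedy_log_prob = greedy_search()
print(greedy_tokens, math.exp(greedy_log_prob))
```

In this toy table the greedy result "a cat" has probability 0.33, while "the cat" has 0.36: a locally best first token does not guarantee the best full sequence.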

Beam search
- Store the best N hypotheses (N: beam size)
- Between greedy search and breadth-first search
- Implementation is REALLY difficult!!
  - Batched beam search
  - Tokens, scores & states reordering
  - Active & finished hypotheses
  - Length normalization
  - Early stopping or not
http://opennmt.net/OpenNMT/translation/beam_search/
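The core loop can be sketched in a few lines once batching and state reordering are left out, which is where most of the real difficulty lies. The version below is unbatched, keeps separate active and finished lists, and uses one simple form of length normalization (score divided by length to the power `length_penalty`); that formula and the lookup table `TOY_MODEL` are assumptions for illustration, not what OpenNMT does.

```python
import math

EOS = "</eos>"

# Hypothetical toy "model": next-token probabilities given the prefix so far.
TOY_MODEL = {
    (): {"a": 0.6, "the": 0.4},
    ("a",): {"cat": 0.55, "dog": 0.45},
    ("the",): {"cat": 0.9, "dog": 0.1},
}

def next_token_probs(prefix):
    return TOY_MODEL.get(tuple(prefix), {EOS: 1.0})

def beam_search(beam_size, max_steps=10, length_penalty=0.0):
    active = [([], 0.0)]  # (tokens, summed log-probability)
    finished = []
    for _ in range(max_steps):
        # Expand every active hypothesis by every possible next token.
        candidates = []
        for tokens, score in active:
            for token, p in next_token_probs(tokens).items():
                candidates.append((tokens + [token], score + math.log(p)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        # Keep the best N; those ending in EOS move to the finished list.
        active = []
        for tokens, score in candidates[:beam_size]:
            if tokens[-1] == EOS:
                finished.append((tokens[:-1], score))
            else:
                active.append((tokens, score))
        if not active:
            break

    def normalized(hyp):
        tokens, score = hyp
        return score / (max(len(tokens), 1) ** length_penalty)

    return max(finished or active, key=normalized)

beam1 = beam_search(beam_size=1)
beam2 = beam_search(beam_size=2)
print(beam1[0], beam2[0])
```

With `beam_size=1` this reduces to greedy search and returns "a cat"; with `beam_size=2` it recovers the more probable "the cat".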

Beam size selection
- Larger is not always better!
- A small beam size serves as a form of regularization

Constrained beam search
- Length control
  - Only accept </eos> in the given range of steps
- Forbidden words
  - Apply word penalty when scoring hypotheses
- Fewer duplicated words
  - Apply duplication penalty when scoring hypotheses
  - Do not penalize function words (a, the, of, …)
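These three constraints can all be expressed as adjustments to a candidate token's score before the beam is pruned. The sketch below is one possible way to do that; the function name, the penalty values, and the tiny function-word list are hypothetical choices for illustration.

```python
import math

EOS = "</eos>"
FUNCTION_WORDS = {"a", "the", "of"}  # illustrative subset only

def constrained_score(log_prob, token, prefix, min_len, max_len,
                      forbidden=frozenset(), word_penalty=5.0, dup_penalty=2.0):
    """Adjust the log-probability of appending `token` to `prefix`."""
    if token == EOS:
        # Length control: EOS is only acceptable inside the allowed range.
        return log_prob if min_len <= len(prefix) <= max_len else -math.inf
    score = log_prob
    if token in forbidden:
        score -= word_penalty  # forbidden words are penalized
    if token in prefix and token not in FUNCTION_WORDS:
        score -= dup_penalty   # repeated content words are penalized
    return score

too_early = constrained_score(-1.0, EOS, ["a"], min_len=2, max_len=5)
repeat = constrained_score(-1.0, "cat", ["a", "cat"], min_len=2, max_len=5)
function_word = constrained_score(-1.0, "the", ["the", "cat"], min_len=2, max_len=5)
banned = constrained_score(-1.0, "bad", [], 0, 5, forbidden={"bad"})
print(too_early, repeat, function_word, banned)
```

Inside a beam search, this function would replace the raw log-probability when candidates are ranked. A complete implementation would also need to force `</eos>` once the maximum length is reached, which is not shown.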

Constrained beam search
- Suggested words: a finite-state machine
- Words expanded with WordNet lemmas
Guided Open Vocabulary Image Captioning with Constrained Beam Search. https://arxiv.org/abs/1612.00576

Template generation
- Neural baby talk:
  - Generate template with slots
  - Switch words/slots by attention with sentinel
  - Tags can be further processed
Neural Baby Talk. https://arxiv.org/abs/1803.09845

Diversity-Promoting Beam Search
- Penalize siblings (hypotheses expanded from the same parent hypothesis)
- Penalty value can be optimized for each instance by reinforcement learning
A Simple, Fast Diverse Decoding Algorithm for Neural Generation. https://arxiv.org/pdf/1611.08562.pdf
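A rough sketch of the sibling penalty: the expansions of each parent hypothesis are ranked among themselves, and the expansion at rank k' has gamma * k' subtracted from its score before the global top-N selection. The data layout, function name, and numbers below are invented for illustration, and the reinforcement-learning step that tunes gamma per instance is not shown.

```python
def rerank_with_sibling_penalty(expansions_by_parent, gamma, beam_size):
    """Return the top `beam_size` (parent, token, score) after sibling penalties."""
    rescored = []
    for parent, expansions in expansions_by_parent.items():
        # Rank the expansions of this one parent from best to worst.
        ranked = sorted(expansions, key=lambda e: e[1], reverse=True)
        for rank, (token, score) in enumerate(ranked, start=1):
            rescored.append((parent, token, score - gamma * rank))
    rescored.sort(key=lambda e: e[2], reverse=True)
    return rescored[:beam_size]

# Hypothetical log-probabilities for the expansions of two parent hypotheses.
expansions = {
    "he": [("is", -1.0), ("was", -1.1)],
    "it": [("is", -1.5), ("was", -3.0)],
}
plain = rerank_with_sibling_penalty(expansions, gamma=0.0, beam_size=2)
diverse = rerank_with_sibling_penalty(expansions, gamma=1.0, beam_size=2)
print([p for p, _, _ in plain], [p for p, _, _ in diverse])
```

Without the penalty both surviving hypotheses descend from "he"; with gamma = 1.0 the second-ranked sibling is pushed down and a hypothesis from "it" survives instead.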

GAN for text
- GAN is not directly applicable to text generation
  - argmax in decoding is not differentiable
- Attempts
  - GAN + reinforcement learning = SeqGAN
  - GAN + auto-encoder = ARAE
  - GAN + approximate embedding = GAN-AEL
SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient. http://www.aaai.org/Conferences/AAAI/2017/PreliminaryPapers/12-Yu-L-14344.pdf
Adversarially Regularized Autoencoders. https://arxiv.org/abs/1706.04223
Neural Response Generation via GAN with an Approximate Embedding Layer. http://aclweb.org/anthology/D17-1065

Exposure Bias Alleviation
- Training: ground truth tokens
- Inference: tokens generated by the model itself
- Text GAN doesn't suffer from this problem
- Scheduled sampling
  - A curriculum learning approach
  - Replace "true" previous tokens with generated ones with an increasing probability
Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks. https://papers.nips.cc/paper/5956-scheduled-sampling-for-sequence-prediction-with-recurrent-neural-networks.pdf
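A small sketch of the scheduled sampling idea: at each decoder position, feed the ground-truth previous token with probability eps, otherwise feed the model's own prediction, and let eps decay as training proceeds. The inverse-sigmoid decay is one of the schedules proposed in the paper; the constant `k`, the function names, and the dummy model are assumptions made so the example runs on its own.

```python
import math
import random

def teacher_forcing_prob(step, k=1000.0):
    """Inverse-sigmoid decay of the probability of feeding the true token."""
    # The exponent is capped to avoid float overflow late in training.
    return k / (k + math.exp(min(step / k, 700.0)))

def scheduled_inputs(gold_tokens, predict_next, step, rng, k=1000.0, bos="<bos>"):
    """Build decoder inputs, mixing gold tokens and model predictions."""
    eps = teacher_forcing_prob(step, k)
    inputs = [bos]
    for gold in gold_tokens[:-1]:
        predicted = predict_next(inputs)  # model's guess at this position
        inputs.append(gold if rng.random() < eps else predicted)
    return inputs

gold = ["a", "cat", "sat", "</eos>"]
dummy_model = lambda inputs: "<model>"  # hypothetical stand-in for a decoder
rng = random.Random(0)
early = scheduled_inputs(gold, dummy_model, step=0, rng=rng)
late = scheduled_inputs(gold, dummy_model, step=100_000, rng=rng)
print(early, late)
```

Early in training the inputs are almost always the gold tokens, as in ordinary teacher forcing; late in training they are almost always the model's own outputs, which matches the inference condition.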

Future directions
- Reliable automatic evaluation
- Generation with memory for few-shot learning
- End2end (long) passage/story generation
- …

Thanks for listening! Q&A