Image Captions With Deep Learning Yulia Kogan & Ron Shiff


Lecture outline
Part 1 – NLP and RNN introduction: "The Unreasonable Effectiveness of Recurrent Neural Networks", basic recurrent neural network NLP example, Long Short-Term Memory RNNs
Part 2 – Image captioning algorithms using RNNs

The Unreasonable Effectiveness of Recurrent Neural Networks (taken from Andrej Karpathy's blog)
So far – "old school" neural networks: fixed-length inputs and outputs.
RNNs operate over sequences of vectors (input or output): image captions, sentiment analysis, machine translation, "word prediction".

The Unreasonable Effectiveness of Recurrent Neural Networks – example: a character-level RNN trained on an algebraic geometry text generates plausible-looking LaTeX.

The Unreasonable Effectiveness of Recurrent Neural Networks – example: a character-level RNN generates Shakespeare-like text.

Word Vectors
The classical word representation is "one-hot": each word is represented by a sparse vector with a single 1 at the word's index and 0 everywhere else.
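A minimal sketch of a one-hot representation, assuming a toy vocabulary (the vocabulary and indices below are illustrative, not from the slides):

```python
import numpy as np

# Toy vocabulary: every word gets an integer index (illustrative only).
vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}

def one_hot(word, vocab):
    """Return a sparse vector with a single 1 at the word's index."""
    x = np.zeros(len(vocab))
    x[vocab[word]] = 1.0
    return x

print(one_hot("cat", vocab))  # [0. 1. 0. 0. 0.]
```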

Word Vectors
A more modern approach: represent each word as a dense vector (typically a few hundred dimensions). "Semantically" close words are close in the vector space, and semantic relations are preserved in the vector space: "king" + "woman" - "man" ≈ "queen".

Word Vectors
A word vector can be written as v = E·x, where x is a "one-hot" vector and E is a learned embedding matrix. Dense word vectors are beneficial for most deep learning tasks.
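A sketch of the dense representation as an embedding lookup; the matrix E and the sizes are illustrative (in practice E is learned, e.g. with word2vec or jointly with the task):

```python
import numpy as np

vocab_size, embed_dim = 5, 3                            # illustrative sizes
rng = np.random.default_rng(0)
E = rng.standard_normal((embed_dim, vocab_size)) * 0.1  # embedding matrix (learned in practice)

x = np.zeros(vocab_size)
x[1] = 1.0                    # one-hot vector for word index 1

v = E @ x                     # dense word vector v = E x
v_fast = E[:, 1]              # in practice: just index the column
assert np.allclose(v, v_fast)
```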

RNN Language Model (based on Richard Socher's Stanford lecture – Deep Learning in NLP)
A language model computes a probability for a sequence of words, P(w_1, …, w_T). It can score alternatives that differ in word ordering or word choice, which is useful for machine translation and speech recognition.
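Written out, this is the standard chain-rule factorization (reconstructed here, since the slide's formula is not included in the transcript):

```latex
P(w_1, \dots, w_T) = \prod_{t=1}^{T} P(w_t \mid w_1, \dots, w_{t-1})
```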

Recurrent Neural Network Language Model – each output depends on all previous inputs.

RNN Language Model
Input: word vectors x_1, …, x_T. At each time step, compute the hidden state h_t = σ(W^(hh) h_{t-1} + W^(hx) x_t). Output: ŷ_t = softmax(W^(S) h_t), a probability distribution over the vocabulary for the next word (see the sketch below).
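A minimal numpy sketch of one step of this RNN language model; the weight names follow the W^(hh), W^(hx), W^(S) convention above, the sizes are illustrative, and tanh stands in for the generic nonlinearity σ:

```python
import numpy as np

hidden_dim, embed_dim, vocab_size = 8, 3, 5   # illustrative sizes
rng = np.random.default_rng(0)
W_hh = rng.standard_normal((hidden_dim, hidden_dim)) * 0.1
W_hx = rng.standard_normal((hidden_dim, embed_dim)) * 0.1
W_s  = rng.standard_normal((vocab_size, hidden_dim)) * 0.1

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_step(h_prev, x_t):
    """One time step: new hidden state and a distribution over the next word."""
    h_t = np.tanh(W_hh @ h_prev + W_hx @ x_t)   # sigma chosen as tanh here
    y_t = softmax(W_s @ h_t)                    # probability over the vocabulary
    return h_t, y_t

h = np.zeros(hidden_dim)
for x_t in rng.standard_normal((4, embed_dim)):  # a dummy sequence of word vectors
    h, y = rnn_step(h, x_t)
```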

Recurrent Neural Network Language Model
The total objective is to maximize the log-likelihood w.r.t. the parameters. With y_t the "one-hot" vector containing the true word at time t, the log-likelihood is Σ_t Σ_j y_{t,j} log ŷ_{t,j}.
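A sketch of the per-step log-likelihood with a one-hot target (the toy distribution and index are made up for illustration):

```python
import numpy as np

def log_likelihood(y_hat, true_index):
    """Log-likelihood of the true word under the predicted distribution y_hat."""
    return np.log(y_hat[true_index])

# Toy predicted distribution over a 5-word vocabulary and the true word's index.
y_hat = np.array([0.1, 0.6, 0.1, 0.1, 0.1])
true_index = 1                 # the "one-hot" target has its 1 at this index

# The total objective sums these terms over all time steps and maximizes.
print(log_likelihood(y_hat, true_index))   # log(0.6)
```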

RNNs – HARD TO TRAIN!

Vanishing/Exploding Gradient Problem
For stochastic gradient descent we calculate the derivative of the loss w.r.t. the parameters. Reminder: h_t = σ(W^(hh) h_{t-1} + W^(hx) x_t) and ŷ_t = softmax(W^(S) h_t). Applying the chain rule, the gradient at time t depends on all earlier hidden states.

Vanishing/Exploding Gradient Problem
Update equation: move the parameters in the direction of the negative gradient. By the chain rule, the gradient at time t is a sum over all earlier time steps k, and each term contains a product of Jacobians ∂h_t/∂h_k (see the reconstruction below).
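A standard reconstruction of the missing update and chain-rule expressions, in the notation of the earlier slides (the slide's own formulas are not in the transcript):

```latex
W \leftarrow W - \alpha \frac{\partial E}{\partial W},
\qquad
\frac{\partial E_t}{\partial W}
  = \sum_{k=1}^{t}
    \frac{\partial E_t}{\partial \hat{y}_t}\,
    \frac{\partial \hat{y}_t}{\partial h_t}\,
    \frac{\partial h_t}{\partial h_k}\,
    \frac{\partial h_k}{\partial W},
\qquad
\frac{\partial h_t}{\partial h_k}
  = \prod_{j=k+1}^{t} \frac{\partial h_j}{\partial h_{j-1}}
  = \prod_{j=k+1}^{t} W^{(hh)\top}\,\mathrm{diag}\!\left[\sigma'(\cdot)\right]
```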

Vanishing/Exploding Gradient Problem
Gradients can be very small or very large: with "small" W the Jacobian product shrinks exponentially – the vanishing gradient, which makes long-time dependencies hard to learn; with "large" W it blows up – the exploding gradient, which is bad for optimization.
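A small numerical illustration of why a long product of Jacobian-like matrices vanishes or explodes (the random matrices and scales are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_dim, steps = 8, 50

def product_norm(scale):
    """Norm of a product of `steps` random matrices with spectral size set by `scale`."""
    P = np.eye(hidden_dim)
    for _ in range(steps):
        W = scale * rng.standard_normal((hidden_dim, hidden_dim)) / np.sqrt(hidden_dim)
        P = W @ P
    return np.linalg.norm(P)

print(product_norm(0.5))   # shrinks toward 0  -> vanishing gradient
print(product_norm(2.0))   # grows huge        -> exploding gradient
```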

LSTMs – Long Short-Term Memory
Introduced in 1997 by Hochreiter and Schmidhuber. Addresses the vanishing and exploding gradient problems using gating. Figures taken from Christopher Olah's blog.

LSTM Equations (note the notation differs from the earlier slides: h_t here plays the role of y_t, and the cell state C_t plays the role of h_t) – see the reconstruction below.
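The standard LSTM equations in this notation, reconstructed from Christopher Olah's blog since the slide's equation image is not in the transcript:

```latex
f_t = \sigma(W_f [h_{t-1}, x_t] + b_f) \\
i_t = \sigma(W_i [h_{t-1}, x_t] + b_i) \\
\tilde{C}_t = \tanh(W_C [h_{t-1}, x_t] + b_C) \\
C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
o_t = \sigma(W_o [h_{t-1}, x_t] + b_o) \\
h_t = o_t \odot \tanh(C_t)
```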

LSTM "Forget Gate"
f_t = 0 means forget, f_t = 1 means keep. Examples: resetting at a period "."; forgetting the old subject's gender when a new subject appears.

LSTM "Input Gate Layer"
What information goes into the new candidate C̃_t; the input gate layer decides which values we will update.

LSTM – Updating the Memory Cell
C_t = f_t ⊙ C_{t-1} + i_t ⊙ C̃_t. The dependence on the past is no longer exponential in W: information can flow along the cell state, e.g. unrolling gives the direct path C_t ≈ f_t ⊙ f_{t-1} ⊙ C_{t-2} (ignoring the input terms), with no repeated multiplication by W.

LSTM – Setting the Output
Finally, we need to decide what we are going to output: the output gate o_t selects which parts of the cell state to expose, y_t = o_t ⊙ tanh(C_t) (h_t in Olah's notation). A full cell step is sketched below.
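A minimal numpy sketch of one LSTM step putting the gates from the previous slides together; the weights are random and the sizes illustrative (a real implementation learns them):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

hidden_dim, input_dim = 8, 3          # illustrative sizes
rng = np.random.default_rng(0)
Wf, Wi, Wc, Wo = [rng.standard_normal((hidden_dim, hidden_dim + input_dim)) * 0.1
                  for _ in range(4)]
bf, bi, bc, bo = [np.zeros(hidden_dim) for _ in range(4)]

def lstm_step(h_prev, C_prev, x_t):
    """One LSTM step: forget gate, input gate, cell update, output gate."""
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(Wf @ z + bf)           # forget gate: 0 = forget, 1 = keep
    i = sigmoid(Wi @ z + bi)           # input gate: which values to update
    C_tilde = np.tanh(Wc @ z + bc)     # candidate values for the cell state
    C = f * C_prev + i * C_tilde       # additive cell update (no repeated W product)
    o = sigmoid(Wo @ z + bo)           # output gate: what part of the cell to expose
    h = o * np.tanh(C)                 # output (y_t in the earlier slides' notation)
    return h, C

h = np.zeros(hidden_dim)
C = np.zeros(hidden_dim)
for x_t in rng.standard_normal((4, input_dim)):   # dummy input sequence
    h, C = lstm_step(h, C, x_t)
```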

Conclusions
RNNs are very powerful. RNNs are hard to train. Nowadays, gating (LSTMs) is the way to go!
Acknowledgments:
Andrej Karpathy – http://karpathy.github.io/2015/05/21/rnn-effectiveness/
Richard Socher – http://cs224d.stanford.edu/
Christopher Olah – http://colah.github.io/posts/2015-08-Understanding-LSTMs/