CS 4501: Introduction to Computer Vision Computer Vision + Natural Language Connelly Barnes Some slides from Fei-Fei Li / Andrej Karpathy / Justin Johnson
Faculty Who Work on Natural Language Kai-Wei Chang Vicente Ordonez Yanjun Qi Hongning Wang
Outline Image captioning Visual question answering
Previous Classes: Simple Recurrent Neural Network Outputs Hidden State (Memory) Inputs Diagrams from Christopher Olah
Previous Classes: Recurrent Neural Networks (LSTM) Gates: forget, input, output Diagrams from Christopher Olah
Previous Class: Recurrent Neural Networks Text synthesis
Image Captioning [Vinyals et al. 2014] “Show and Tell: A Neural Image Caption Generator”, Vinyals et al. 2014. Applications? (Class discussion)
Image Captioning [Vinyals et al. 2014] Inspired by neural machine translation: English => encoder RNN => hidden state => decoder RNN => French. Image captioning replaces the encoder with a CNN: image => encoder CNN => decoder RNN => text caption. Trained end-to-end (optimizing the parameters of both the CNN and the RNN).
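A minimal sketch of this encoder-decoder setup, assuming PyTorch (the paper used a GoogLeNet-class CNN; the ResNet-50 backbone and the class/variable names here are illustrative, not the authors' exact implementation):

```python
import torch
import torch.nn as nn
import torchvision.models as models

class CaptionModel(nn.Module):
    """CNN encoder + LSTM decoder captioner in the style of Vinyals et al."""
    def __init__(self, vocab_size, embed_dim=512, hidden_dim=512):
        super().__init__()
        # Encoder: ImageNet-pretrained CNN; its feature vector is fed to the LSTM
        # as the first input.
        cnn = models.resnet50(pretrained=True)
        self.encoder = nn.Sequential(*list(cnn.children())[:-1])  # drop the classifier
        self.img_proj = nn.Linear(cnn.fc.in_features, embed_dim)
        # Decoder: word embedding -> LSTM -> linear layer giving a softmax over words.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.output = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, captions):
        feats = self.encoder(images).flatten(1)          # (B, 2048) image features
        img_embed = self.img_proj(feats).unsqueeze(1)    # (B, 1, embed_dim)
        word_embeds = self.embed(captions)               # (B, T, embed_dim)
        # Feed the image as the first "word", then the caption tokens.
        inputs = torch.cat([img_embed, word_embeds], dim=1)
        hidden, _ = self.lstm(inputs)
        return self.output(hidden)                       # word logits at each time step
```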
Image Captioning [Vinyals et al. 2014] Predict the current word S_t given the previous words and the image I: p(S_t | I, S_0, …, S_{t-1}). Diagram notes: word embeddings (linear, 512 dimensions); a special “start” word begins each caption and a special “stop” word marks the end of the sentence.
One-hot Encoding
Uses one-hot encoding of words.
Construct a dictionary (e.g. the n most common words).
Associate a length-n indicator vector with each word:
Word        Encoding
“the”       1 0 0 0
“dog”       0 1 0 0
“fruit”     0 0 1 0
“penguin”   0 0 0 1
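A small sketch of one-hot encoding over the toy four-word dictionary above (the helper names are illustrative):

```python
import numpy as np

def build_vocab(words):
    """Assign an index to each dictionary word (e.g. the n most common words)."""
    return {w: i for i, w in enumerate(words)}

def one_hot(word, vocab):
    """Length-n indicator vector with a 1 at the word's index."""
    v = np.zeros(len(vocab), dtype=int)
    v[vocab[word]] = 1
    return v

vocab = build_vocab(["the", "dog", "fruit", "penguin"])
print(one_hot("dog", vocab))   # [0 1 0 0]
```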
Training [Vinyals et al. 2014] Unroll across time (backpropagation through time). Use an ImageNet-pretrained CNN. Fine-tune the last CNN layer, all LSTM weights, and the word embeddings.
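A hedged sketch of this training loop, reusing the CaptionModel sketch above (the layer-freezing scheme, SGD optimizer, and learning rate are illustrative assumptions, not the paper's exact recipe):

```python
import torch
import torch.nn as nn

model = CaptionModel(vocab_size=10000)

# Freeze the CNN except its last block; LSTM, embeddings, and projections stay trainable.
for p in model.encoder.parameters():
    p.requires_grad = False
for p in model.encoder[-2].parameters():   # last residual block (just before avg-pool)
    p.requires_grad = True

criterion = nn.CrossEntropyLoss()          # padding/masking omitted for brevity
optimizer = torch.optim.SGD([p for p in model.parameters() if p.requires_grad], lr=0.01)

def train_step(images, captions):
    # captions: (B, T) word indices beginning with the special "start" word
    # and ending with the special "stop" word.
    logits = model(images, captions[:, :-1])     # unrolled over T time steps
    # Position 0 corresponds to the image input, so align logits[1:] with captions[1:];
    # backpropagation through time happens in loss.backward().
    loss = criterion(logits[:, 1:].reshape(-1, logits.size(-1)),
                     captions[:, 1:].reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```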
Inference [Vinyals et al. 2014]
Sampling: the LSTM outputs probabilities over the next word (via softmax), so just sample the next word from that distribution.
Beam search: maintain a set P_t of the top k predicted sentences at time t, consider all possible one-word extensions of P_t, sort these by the probability of the sentence, and take the top k to obtain P_{t+1}. Greedy search is the special case k = 1.
Probability of the sentence: p(S | I) = ∏_t p(S_t | I, S_0, …, S_{t-1}).
The authors found the best results for beam search with k = 20.
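A sketch of this beam search; step_fn is a hypothetical callback that returns (word, probability) pairs for the next word given the current prefix (and, implicitly, the image):

```python
import math

def beam_search(step_fn, start_token, stop_token, k=20, max_len=20):
    """Beam search over sentence prefixes. Greedy search is the special case k = 1."""
    beams = [(0.0, [start_token])]            # (log-probability of sentence, words)
    for _ in range(max_len):
        candidates = []
        for logp, prefix in beams:
            if prefix[-1] == stop_token:      # finished sentence: keep it as-is
                candidates.append((logp, prefix))
                continue
            for word, p in step_fn(prefix):   # all one-word extensions of the prefix
                candidates.append((logp + math.log(p), prefix + [word]))
        # Sort by sentence (log-)probability and keep the top k as P_{t+1}.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:k]
    return beams[0][1]                        # most probable sentence found
```

Log-probabilities are summed instead of multiplying raw probabilities; the ranking is the same but avoids numerical underflow for long sentences.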
Results [Vinyals et al. 2014]
Evaluation [Vinyals et al. 2014] BLEU metric: Bilingual Evaluation Understudy, which compares n-grams of the generated caption against reference captions. BLEU has high correlation with human judgements of quality.
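For reference, a simplified single-sentence BLEU sketch (geometric mean of clipped n-gram precisions times a brevity penalty); the real metric is corpus-level, supports multiple references, and is usually smoothed:

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    """Single-sentence BLEU sketch over tokenized word lists."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
        ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref[g]) for g, c in cand.items())
        if overlap == 0:
            return 0.0
        log_precisions.append(math.log(overlap / sum(cand.values())))
    # Brevity penalty discourages captions shorter than the reference.
    bp = 1.0 if len(candidate) >= len(reference) else math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(sum(log_precisions) / max_n)

print(bleu("a dog is running on the grass".split(),
           "a dog is running in the grass".split()))   # roughly 0.5
```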
Outline Image captioning Visual question answering
Visual Question Answering Demo http://visualqa.csail.mit.edu/
Visual Question Answering: Applications Assist visually impaired Search surveillance video Move towards “AI-complete” understanding of images Any others? (Class discussion)
Visual Question Answering: iBOWIMG Baseline Architecture
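A sketch of the iBOWIMG architecture as commonly described: a bag-of-words question feature is concatenated with a CNN image feature and fed to a softmax classifier over candidate answers. The dimensions and class names here are illustrative, not the baseline's exact configuration:

```python
import torch
import torch.nn as nn

class IBOWIMG(nn.Module):
    """Bag-of-words question feature + CNN image feature -> softmax over answers."""
    def __init__(self, vocab_size, num_answers, word_dim=256, img_dim=1024):
        super().__init__()
        self.word_embed = nn.Embedding(vocab_size, word_dim)
        self.classifier = nn.Linear(word_dim + img_dim, num_answers)

    def forward(self, question_word_ids, img_feature):
        # Bag of words: word order is ignored; embeddings of the question words are summed.
        q = self.word_embed(question_word_ids).sum(dim=1)    # (B, word_dim)
        x = torch.cat([q, img_feature], dim=1)               # (B, word_dim + img_dim)
        return self.classifier(x)                            # logits over candidate answers
```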
Visual Question Answering: iBOWIMG Baseline Examples
Visual Question Answering: iBOWIMG Baseline Importance of words and image regions
Visual Question Answering: iBOWIMG Baseline Performance on COCO VQA
Visual Question Answering Challenges + Datasets: VQA 2016 Challenge Competition Results; VQA v2.0 Browse Dataset