CS 4501: Introduction to Computer Vision Computer Vision + Natural Language Connelly Barnes Some slides from Fei-Fei Li / Andrej Karpathy / Justin Johnson.

Slides:



Advertisements
Similar presentations
Deep Learning and Neural Nets Spring 2015
Advertisements

Deep Learning Neural Network with Memory (1)
Haitham Elmarakeby.  Speech recognition
NOTE: To change the image on this slide, select the picture and delete it. Then click the Pictures icon in the placeholder to insert your own image. SHOW.
Learning to Answer Questions from Image Using Convolutional Neural Network Lin Ma, Zhengdong Lu, and Hang Li Huawei Noah’s Ark Lab, Hong Kong
Xintao Wu University of Arkansas Introduction to Deep Learning 1.
Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation EMNLP’14 paper by Kyunghyun Cho, et al.
S.Bengio, O.Vinyals, N.Jaitly, N.Shazeer
Attention Model in NLP Jichuan ZENG.
Prof. Adriana Kovashka University of Pittsburgh March 28, 2017
R-NET: Machine Reading Comprehension With Self-Matching Networks
CS 1674: Intro to Computer Vision Recurrent Neural Networks
Deep Learning RUSSIR 2017 – Day 3
Convolutional Sequence to Sequence Learning
Unsupervised Learning of Video Representations using LSTMs
CS 4501: Introduction to Computer Vision Object Localization, Detection, Semantic Segmentation Connelly Barnes Some slides from Fei-Fei Li / Andrej Karpathy.
CNN-RNN: A Unified Framework for Multi-label Image Classification
End-To-End Memory Networks
CS 388: Natural Language Processing: LSTM Recurrent Neural Networks
Dhruv Batra Georgia Tech
Recurrent Neural Networks for Natural Language Processing
Neural Machine Translation by Jointly Learning to Align and Translate
CS 2750: Machine Learning Recurrent Neural Networks
Show and Tell: A Neural Image Caption Generator (CVPR 2015)
Lecture 24: Convolutional neural networks
ICS 491 Big Data Analytics Fall 2017 Deep Learning
CSCI 5922 Neural Networks and Deep Learning: Image Captioning
Intelligent Information System Lab
Synthesis of X-ray Projections via Deep Learning
mengye ren, ryan kiros, richard s. zemel
Attention Is All You Need
RNNs: Going Beyond the SRN in Language Prediction
Attention-based Caption Description Mun Jonghwan.
Introduction to Neural Networks
RNN and LSTM Using MXNet Cyrus M Vahid, Principal Solutions Architect
Advanced Artificial Intelligence
Image Captions With Deep Learning Yulia Kogan & Ron Shiff
Vessel Extraction in X-Ray Angiograms Using Deep Learning
CS 4501: Introduction to Computer Vision Training Neural Networks II
Final Presentation: Neural Network Doc Summarization
Introduction to Text Generation
Understanding LSTM Networks
[Figure taken from googleblog
Code Completion with Neural Attention and Pointer Networks
The Big Health Data–Intelligent Machine Paradox
Lecture 16: Recurrent Neural Networks (RNNs)
Machine Translation(MT)
Attention.
Ali Hakimi Parizi, Paul Cook
Please enjoy.
Deep Learning Authors: Yann LeCun, Yoshua Bengio, Geoffrey Hinton
Attention for translation
-- Ray Mooney, Association for Computational Linguistics (ACL) 2014
RNNs and Sequence to sequence models
Lecture 21: Machine Learning Overview AP Computer Science Principles
Jointly Generating Captions to Aid Visual Question Answering
Recurrent Neural Networks (RNNs)
Sequence to Sequence Video to Text
Presented by: Anurag Paul
Neural Machine Translation using CNN
Presented By: Harshul Gupta
Baseline Model CSV Files Pandas DataFrame Sentence Lists
Recurrent Neural Networks
Sequence-to-Sequence Models
Deep learning: Recurrent Neural Networks CV192
Bidirectional LSTM-CRF Models for Sequence Tagging
LHC beam mode classification
Neural Machine Translation by Jointly Learning to Align and Translate
A Neural Network for Car-Passenger matching in Ride Hailing Services.
Lecture 9: Machine Learning Overview AP Computer Science Principles
Presentation transcript:

CS 4501: Introduction to Computer Vision Computer Vision + Natural Language Connelly Barnes Some slides from Fei-Fei Li / Andrej Karpathy / Justin Johnson

Faculty Who Work on Natural Language Kai-Wei Chang Vicente Ordonez Yanjun Qi Hongning Wang

Outline Image captioning Visual question answering

Previous Classes: Simple Recurrent Neural Network Outputs Hidden State (Memory) Inputs Diagrams from Christopher Olah

Previous Classes: Recurrent Neural Networks (LSTM) Gates: forget, input, output Diagrams from Christopher Olah

Previous Class: Recurrent Neural Networks Text synthesis

Image Captioning [Vinyals et al. 2014] “Show and Tell: A Neural Image Caption Generator:”, Vinyals et al. 2014 Applications? (Class discussion)

Image Captioning [Vinyals et al. 2014] Inspired by neural machine translation. English => Encoder RNN => hidden state => decoder RNN => French Image captioning: replace encoder with CNN. Image => Encoder CNN => decoder RNN => text caption. Trained end-to-end (optimize parameters of CNN, RNN).

Image Captioning [Vinyals et al. 2014] Predict current word (St) given previous words and image: End of sentence: Special “stop” word 512 dimensions Word embeddings (linear) Special “start” word

One-hot Encoding Uses one-hot encoding of words Construct a dictionary (e.g. n most common words) Associate a length n indicator vector to each word: Word Encoding “the” 1000 “dog” 0100 “fruit” 0010 “penguin” 0001

Training [Vinyals et al. 2014] Unroll across time (backpropagation through time) Use ImageNet-pretrained neural network Fine-tune last CNN layer, all LSTM weights, word embeddings.

Inference [Vinyals et al. 2014] Sampling: LSTM outputs probabilities (via softmax), so just sample the next word according to those probabilities. Beam search: Maintain a set Pt of top k predicted sentences at time t, consider all possible one-word extensions of Pt, sort these by probability of the sentence, take the top k to obtain Pt+1. Greedy search: k = 1 Probability of the sentence: Authors found best results for beam search with k = 20.

Results [Vinyals et al. 2014]

Evaluation [Vinyals et al. 2014] BLEU metric: BiLinear Evaluation Understudy. BLEU has high correlation with human judgements of quality.

Outline Image captioning Visual question answering

Visual Question Answering Demo http://visualqa.csail.mit.edu/

Visual Question Answering: Applications Assist visually impaired Search surveillance video Move towards “AI-complete” understanding of images Any others? (Class discussion)

Visual Question Answering: iBOWIMG Baseline Architecture

Visual Question Answering: iBOWIMG Baseline Examples

Visual Question Answering: iBOWIMG Baseline Importance of words and image regions

Visual Question Answering: iBOWIMG Baseline Performance on COCO VQA

Visual Question Answering Challenges+Datasets VQA 2016 Challenge Competition Results VQA v2.0 Browse Dataset