Exploring Models and Data for Image Question Answering
Mengye Ren, Ryan Kiros, Richard S. Zemel

Overview
1. Address the problem of image-based question answering with new models and a new dataset.
2. Assume that the answers consist of only a single word, so the problem can be treated as a classification problem.
3. A generic end-to-end QA model.
4. An automatic question generation algorithm that converts description sentences into questions.
5. A new dataset (COCO-QA) generated by the above algorithm.
6. Comparisons to other models and baselines.

Datasets
Previous dataset (DAQUAR): Malinowski and Fritz released a dataset with images and question-answer pairs, the DAtaset for QUestion Answering on Real-world images (DAQUAR), containing about 1,500 images. All images are indoor scenes. Most questions in this dataset can be divided into three types: object type, object color, and number of objects. All these types of questions can be answered with one word.
COCO-QA: contains about 78,736 training QA pairs and 38,948 test QA pairs.
Difference: the generated dataset has more images and a more evenly distributed set of answers.

Datasets: how to get the dataset
- Use the Stanford parser to obtain the syntactic structure of the original image description.
- Convert compound sentences to simple sentences.
- Change indefinite determiners "a(n)" to definite determiners "the".
- Apply wh-movement constraints.
Example: A man is riding a horse. -> What is the man riding?
Question generation: object questions, number questions, color questions, location questions.
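For illustration, here is a toy sketch of the object-question rule from the example above. The paper relies on the Stanford parser for the syntactic structure; this sketch only handles the simple "subject is verb-ing a(n) object" pattern with a regex, so the function name and pattern are illustrative assumptions, not the authors' algorithm.

```python
import re

def object_question(caption):
    # Toy object-question rule: "A man is riding a horse." -> "What is the man riding?"
    # The paper uses full parse trees; a regex stands in for the parser here.
    m = re.match(r"(?:a|an|the)\s+(\w+)\s+is\s+(\w+ing)\s+(?:a|an|the)\s+(\w+)\s*\.",
                 caption, re.IGNORECASE)
    if m is None:
        return None
    subject, verb, obj = m.groups()
    # Wh-movement: the object is replaced by "what" and fronted; the indefinite
    # determiner on the subject becomes the definite "the". The object is the answer.
    question = f"What is the {subject.lower()} {verb.lower()}?"
    return question, obj.lower()

print(object_question("A man is riding a horse."))  # ('What is the man riding?', 'horse')
```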

Datasets: how to get the dataset
Post-processing:
- Answers that appear less often than a frequency threshold are discarded.
- QA pairs are enrolled one at a time. The probability of enrolling the next QA pair (q, a) decreases as the number of already-enrolled pairs with answer a grows, governed by two constants with K <= K2; K = 100 and K2 = 200 in this paper.
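A minimal sketch of this post-processing step. The acceptance formula is not reproduced on the slide, so the exponential-decay form exp(-(count(a) - K) / K2) once an answer's count exceeds K, as well as the minimum-frequency value below, are assumptions for illustration.

```python
import math
import random
from collections import Counter

def subsample(qa_pairs, K=100, K2=200, min_freq=2, seed=0):
    """Discard rare answers, then enroll QA pairs one at a time so that very
    common answers are accepted with decaying probability (assumed form)."""
    rng = random.Random(seed)
    freq = Counter(answer for _, answer in qa_pairs)
    enrolled, count = [], Counter()
    for question, answer in qa_pairs:
        if freq[answer] < min_freq:          # frequency-threshold filter
            continue
        p = 1.0 if count[answer] <= K else math.exp(-(count[answer] - K) / K2)
        if rng.random() < p:                 # enroll this pair with probability p
            enrolled.append((question, answer))
            count[answer] += 1
    return enrolled
```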

Models: VIS+LSTM model
- Use the last hidden layer of the 19-layer Oxford VGG ConvNet, trained on the ImageNet 2014 Challenge, as the visual embedding.
- Use skip-gram word embeddings for the question words.
- Treat the image as one word of the question.
- The LSTM outputs are fed into a softmax layer at the last step to generate answers.
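A minimal PyTorch sketch of the VIS+LSTM idea (not the authors' implementation): the CNN feature is projected into the word-embedding space, prepended to the question as its first "word", and the LSTM output at the last step is classified into single-word answers. Dimensions and names are illustrative.

```python
import torch
import torch.nn as nn

class VisLSTM(nn.Module):
    def __init__(self, vocab_size, num_answers, embed_dim=500, img_feat_dim=4096):
        super().__init__()
        self.word_embed = nn.Embedding(vocab_size, embed_dim)  # could be initialized from skip-gram
        self.img_proj = nn.Linear(img_feat_dim, embed_dim)      # image feature -> "word" embedding
        self.lstm = nn.LSTM(embed_dim, embed_dim, batch_first=True)
        self.classifier = nn.Linear(embed_dim, num_answers)     # softmax over one-word answers

    def forward(self, img_feat, question_ids):
        # img_feat: (B, img_feat_dim) from the VGG-19 last hidden layer
        # question_ids: (B, T) word indices of the question
        img_word = self.img_proj(img_feat).unsqueeze(1)          # (B, 1, embed_dim)
        q_words = self.word_embed(question_ids)                  # (B, T, embed_dim)
        seq = torch.cat([img_word, q_words], dim=1)              # image as the first word
        out, _ = self.lstm(seq)
        return self.classifier(out[:, -1])                       # logits at the last step
```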

Word embeddings: skip-gram word embedding
- Represent a word as a vector.
- Dimension: typically 300-500 dimensions; each dimension may capture a certain feature of the word.
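For illustration, a toy gensim example of training skip-gram vectors; the paper uses pre-trained skip-gram embeddings, and the corpus and hyperparameters here are made up (gensim >= 4.0 API).

```python
from gensim.models import Word2Vec

# Tiny toy corpus; real skip-gram embeddings are trained on billions of words.
sentences = [
    ["a", "man", "is", "riding", "a", "horse"],
    ["what", "is", "the", "man", "riding"],
]
model = Word2Vec(sentences, vector_size=300, sg=1, window=5, min_count=1)  # sg=1 -> skip-gram

vec = model.wv["horse"]                                   # 300-dimensional vector for "horse"
print(vec.shape, model.wv.most_similar("man", topn=3))
```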

Word embeddings: skip-gram word embedding (figure)

Models: RNNs (recurrent neural networks) and LSTMs (long short-term memory)
Problem: long-range dependencies, e.g. "I grew up in France... I speak fluent French." (Bengio et al., 1994)
http://colah.github.io/posts/2015-08-Understanding-LSTMs/

Models: RNNs (recurrent neural networks) and LSTMs (long short-term memory)
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
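As a concrete companion to the diagrams from the linked post, here is a minimal sketch of one LSTM step with the standard gate equations; parameter shapes and names are illustrative, not the paper's code.

```python
import torch

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    # W, U, b hold the parameters for the input, forget, output gates and the
    # candidate cell state, concatenated along the first dimension.
    gates = x_t @ W.T + h_prev @ U.T + b
    i, f, o, g = gates.chunk(4, dim=-1)
    i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)  # gates in (0, 1)
    g = torch.tanh(g)                                               # candidate cell state
    c_t = f * c_prev + i * g        # forget part of the old memory, write new memory
    h_t = o * torch.tanh(c_t)       # exposed hidden state
    return h_t, c_t
```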

Experimental results: comparisons
Performance metrics: plain answer accuracy as well as the Wu-Palmer similarity (WUPS) measure.
Example: "hatchback" vs. "compact" scores higher than "hatchback" vs. "truck".
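A hedged illustration of the Wu-Palmer similarity behind WUPS, using NLTK's WordNet interface (requires nltk.download('wordnet')); the paper's WUPS score additionally thresholds and averages these similarities over the whole test set, which this snippet does not do.

```python
from nltk.corpus import wordnet as wn

def max_wup(word_a, word_b):
    # WUPS takes the best Wu-Palmer similarity over all noun senses of the two words.
    sims = [s1.wup_similarity(s2)
            for s1 in wn.synsets(word_a, pos=wn.NOUN)
            for s2 in wn.synsets(word_b, pos=wn.NOUN)]
    sims = [s for s in sims if s is not None]
    return max(sims) if sims else 0.0

print(max_wup("hatchback", "compact"))  # relatively high: both are car types
print(max_wup("hatchback", "truck"))    # lower: more distant in the WordNet hierarchy
```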

Experimental results: comparisons

Limitations
- The answers only have one word.
- The types of questions are limited.
- Accuracy is still limited.

Questions? Weiqiang Jin