Visual Question Generation

Jhih-Ciang Wu
Institute of Information Science, Academia Sinica
May 8, 2018

Overview
Background: ILSVRC, VGG, RNN, LSTM
Baseline model: CNN+RNN
References

ILSVRC
ImageNet Large Scale Visual Recognition Challenge.
For the classification task, we list winners and other notable models by year: AlexNet (2012), ZFNet (2013), VGGNet (2014, second place), ResNet (2015), Mask R-CNN (2017).

VGG
VGG uses very small 3×3 filters in all of its convolutional layers.
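
One commonly cited reason for preferring small filters is that a stack of 3×3 convolutions covers the same receptive field as a single larger filter while using fewer weights. The sketch below works through that arithmetic. The helper names and the channel width of 64 are assumptions chosen for illustration, and biases are ignored.

```python
def stacked_receptive_field(num_layers, kernel=3):
    """Receptive field of `num_layers` stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel - 1)


def conv_weights(kernel, channels):
    """Weights in one kernel x kernel convolution with `channels` input
    and output channels (biases ignored)."""
    return kernel * kernel * channels * channels


channels = 64  # hypothetical channel width, for illustration only

# Three stacked 3x3 layers see a 7x7 window of the input...
field = stacked_receptive_field(3)

# ...but need fewer weights than one 7x7 layer with the same width.
three_small = 3 * conv_weights(3, channels)
one_large = conv_weights(7, channels)

print(field, three_small, one_large)
```

The stack also applies a non-linearity after each 3×3 layer, so it is not just a cheaper 7×7 filter but a more expressive one.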

RNN
Recurrent Neural Network (RNN): a network whose recurrent connections carry a hidden state from one time step to the next, which allows it to exhibit dynamic temporal behavior.
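
To make the temporal behavior concrete, here is a minimal sketch of a plain (Elman-style) recurrent step in pure Python, h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h). The weights are toy values chosen for illustration, not parameters from the talk.

```python
import math


def rnn_step(x, h_prev, W_xh, W_hh, b_h):
    """One recurrent step: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)."""
    h = []
    for i in range(len(b_h)):
        total = b_h[i]
        total += sum(w * xj for w, xj in zip(W_xh[i], x))
        total += sum(w * hk for w, hk in zip(W_hh[i], h_prev))
        h.append(math.tanh(total))
    return h


def run_rnn(inputs, h0, W_xh, W_hh, b_h):
    """Feed a sequence through the cell, returning every hidden state."""
    h = h0
    states = []
    for x in inputs:
        h = rnn_step(x, h, W_xh, W_hh, b_h)
        states.append(h)
    return states


# Toy parameters (assumed): 1 input feature, 2 hidden units.
W_xh = [[0.5], [-0.3]]
W_hh = [[0.1, 0.2], [0.0, 0.4]]
b_h = [0.0, 0.0]

# Only the first input is non-zero, yet later states are non-zero too:
# the hidden state carries information forward in time.
states = run_rnn([[1.0], [0.0], [0.0]], [0.0, 0.0], W_xh, W_hh, b_h)
for t, h in enumerate(states):
    print(t, h)
```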

LSTM
Long Short-Term Memory (LSTM): a special kind of RNN that is capable of learning long-term dependencies.
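
The following is a minimal sketch of the standard LSTM cell update in pure Python, intended only to show how the gated cell state can hold information over many steps. The gate parameters are hand-set assumptions (forget gate almost fully open, input gate almost closed), not learned values and not the configuration used in the talk.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, U, b, x, h):
    """Pre-activations W x + U h + b for one gate."""
    return [
        sum(w * xj for w, xj in zip(W[i], x))
        + sum(u * hk for u, hk in zip(U[i], h))
        + b[i]
        for i in range(len(b))
    ]


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM step; params maps a gate name to its (W, U, b) triple."""
    f = [sigmoid(z) for z in affine(*params["forget"], x, h_prev)]
    i = [sigmoid(z) for z in affine(*params["input"], x, h_prev)]
    o = [sigmoid(z) for z in affine(*params["output"], x, h_prev)]
    g = [math.tanh(z) for z in affine(*params["candidate"], x, h_prev)]
    # The cell state is updated additively, gated by f and i.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


# Hand-set toy parameters (assumed): 1 input feature, 1 hidden unit.
params = {
    "forget": ([[0.0]], [[0.0]], [10.0]),      # forget gate close to 1
    "input": ([[0.0]], [[0.0]], [-10.0]),      # input gate close to 0
    "output": ([[0.0]], [[0.0]], [0.0]),       # output gate at 0.5
    "candidate": ([[1.0]], [[0.0]], [0.0]),
}

h, c = [0.0], [1.0]
for _ in range(50):
    h, c = lstm_step([0.5], h, c, params)

# After 50 steps the cell state is still close to its initial value.
print(c, h)
```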

Baseline model

CNN+LSTM
Example questions, one per slide (images not reproduced):
what color is the surfboard?
is this a zebra?
what color are the flowers?
what is the green vegetable?
how many people are in the picture?
∗ learning rate = , batch = 64, epochs = 100.

Modified MLP
We use the K-means method to partition the training data into K clusters.
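
As a reminder of what the clustering step does, here is a minimal sketch of K-means (Lloyd's algorithm) in pure Python. The slide does not say which features are clustered, how K is chosen, or how the centroids are initialized, so the toy 2-D points and the explicit `initial_centroids` argument below are assumptions made for illustration.

```python
def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, initial_centroids, iterations=100):
    """Lloyd's algorithm: alternate assignment and centroid update
    until the assignments stop changing."""
    centroids = [list(c) for c in initial_centroids]
    k = len(centroids)
    assignments = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignments


# Toy data (assumed): two well-separated groups in 2-D.
points = [
    [0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
    [5.0, 5.0], [5.1, 4.9], [4.9, 5.2],
]
centroids, assignments = kmeans(points, [points[0], points[3]])
print(centroids, assignments)
```

K-means only finds a local optimum, so the result depends on the starting centroids; practical implementations usually draw them at random, or with a scheme such as k-means++, and often rerun several times.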

References
Deep Visual-Semantic Alignments for Generating Image Descriptions
Show and Tell: A Neural Image Caption Generator
