Visual Question Generation
Jhih-Ciang Wu, Institute of Information Science, Academia Sinica, May 8, 2018
Overview
Backgrounds: ILSVRC, VGG, RNN, LSTM
Baseline model: CNN+RNN
References
ILSVRC ImageNet Large Scale Visual Recognition Challenge.
Well-known entries in the classification task over the years: AlexNet (2012), ZFNet (2013), VGGNet (2014, second place), ResNet (2015), SENet (2017).
VGG VGG uses very small 3×3 filters in all convolutional layers.
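A common argument for small filters is that a stack of 3×3 convolutions covers the same input region as one larger filter while using fewer weights. The sketch below checks that arithmetic; the helper names and the channel count of 64 are illustrative assumptions, not values taken from the slides.

```python
def stacked_receptive_field(num_layers, kernel=3):
    """Receptive field (in input pixels) of stacked stride-1 convolutions."""
    field = 1
    for _ in range(num_layers):
        field += kernel - 1  # each stride-1 layer widens the field by kernel - 1
    return field


def conv_weights(kernel, channels):
    """Weights in one kernel x kernel conv with `channels` inputs and outputs (no biases)."""
    return kernel * kernel * channels * channels


channels = 64  # assumed channel count, for illustration only
three_3x3 = 3 * conv_weights(3, channels)  # three stacked 3x3 layers
one_7x7 = conv_weights(7, channels)        # one 7x7 layer with the same receptive field

print("receptive field of three 3x3 layers:", stacked_receptive_field(3))
print("weights, three 3x3 layers:", three_3x3)
print("weights, one 7x7 layer:", one_7x7)
```

Three stacked 3×3 layers see a 7×7 region with 27C² weights instead of 49C², and they add two extra non-linearities along the way.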
RNN Recurrent Neural Network (RNN): a network whose hidden state is fed back into itself at every time step, which allows it to exhibit dynamic temporal behavior.
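The recurrence can be sketched with the textbook update h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h). This is a minimal pure-Python illustration; the weights and function names are made-up assumptions, not the presenter's model.

```python
import math


def rnn_step(x, h_prev, W_xh, W_hh, b_h):
    """One vanilla RNN update: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)."""
    h = []
    for i in range(len(b_h)):
        s = b_h[i]
        s += sum(W_xh[i][j] * x[j] for j in range(len(x)))
        s += sum(W_hh[i][k] * h_prev[k] for k in range(len(h_prev)))
        h.append(math.tanh(s))
    return h


# Toy weights (2 inputs, 2 hidden units), chosen arbitrarily for illustration.
W_xh = [[0.5, -0.3], [0.1, 0.8]]
W_hh = [[0.2, 0.0], [-0.4, 0.6]]
b_h = [0.0, 0.0]


def run_rnn(sequence):
    """Feed a sequence through the RNN and return the final hidden state."""
    h = [0.0, 0.0]
    for x in sequence:
        h = rnn_step(x, h, W_xh, W_hh, b_h)
    return h


# The same two inputs in a different order give a different final state,
# because the hidden state carries information forward through time.
h_ab = run_rnn([[1.0, 0.0], [0.0, 1.0]])
h_ba = run_rnn([[0.0, 1.0], [1.0, 0.0]])
print(h_ab, h_ba)
```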
LSTM Long Short-Term Memory(LSTM): a special kind of RNN, capable of learning long-term dependencies.
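A single LSTM step in its standard formulation (forget, input, and output gates plus a candidate cell value) can be sketched as follows. The parameter layout, the random initialisation, and all names here are assumptions for illustration; the variant used in the talk may differ.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Compute W v + b for a list-of-rows matrix W."""
    return [sum(w * a for w, a in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM update; returns the new hidden state h and cell state c."""
    v = h_prev + x  # concatenation [h_{t-1}, x_t]
    f = [sigmoid(z) for z in affine(params["W_f"], params["b_f"], v)]    # forget gate
    i = [sigmoid(z) for z in affine(params["W_i"], params["b_i"], v)]    # input gate
    o = [sigmoid(z) for z in affine(params["W_o"], params["b_o"], v)]    # output gate
    g = [math.tanh(z) for z in affine(params["W_g"], params["b_g"], v)]  # candidate
    # The cell state keeps part of the old memory and adds gated new content.
    c = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c_prev, i, g)]
    h = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c)]
    return h, c


def init_params(input_size, hidden_size, seed=0):
    """Small random weights and zero biases (illustrative initialisation)."""
    rng = random.Random(seed)
    params = {}
    for gate in "fiog":
        params["W_" + gate] = [
            [rng.uniform(-0.5, 0.5) for _ in range(hidden_size + input_size)]
            for _ in range(hidden_size)
        ]
        params["b_" + gate] = [0.0] * hidden_size
    return params


params = init_params(input_size=3, hidden_size=4)
h, c = [0.0] * 4, [0.0] * 4
for x in [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]:
    h, c = lstm_step(x, h, c, params)
print(h)
```

The additive cell update is what lets information survive many steps: when the forget gate is close to 1 and the input gate close to 0, the cell state passes through almost unchanged.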
Baseline model
CNN+LSTM example questions:
what color is the surfboard?
is this a zebra?
what color are the flowers?
what is the green vegetable?
how many people are in the picture?
∗learning rate = , batch = 64, epochs = 100.
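One common way a CNN+LSTM model produces a question is greedy decoding: the image feature initialises the recurrent state, and the highest-scoring word is emitted at each step until an end token appears. The slides do not state the decoding strategy, so the loop below is only a sketch; `toy_init`, `toy_step`, and the bigram table are hypothetical stand-ins for a real CNN encoder and LSTM decoder.

```python
def greedy_decode(image_feature, init_fn, step_fn,
                  start_token="<start>", end_token="<end>", max_len=20):
    """Emit the highest-scoring word at each step until end_token or max_len."""
    state = init_fn(image_feature)  # in a real model: CNN feature -> initial LSTM state
    token = start_token
    words = []
    for _ in range(max_len):
        scores, state = step_fn(token, state)  # in a real model: one LSTM step + softmax
        token = max(scores, key=scores.get)
        if token == end_token:
            break
        words.append(token)
    return words


# Hypothetical stand-in for a trained decoder: a fixed next-word table.
TOY_BIGRAMS = {
    "<start>": {"what": 0.9, "is": 0.1},
    "what": {"color": 1.0},
    "color": {"is": 1.0},
    "is": {"the": 1.0},
    "?": {"<end>": 1.0},
}


def toy_init(image_feature):
    """Pretend the 'CNN' has already recognised the main object in the image."""
    return image_feature["object"]


def toy_step(token, state):
    """Return next-word scores; after 'the', name the object held in the state."""
    if token == "the":
        return {state: 1.0}, state
    return TOY_BIGRAMS.get(token, {"?": 1.0}), state


question = greedy_decode({"object": "surfboard"}, toy_init, toy_step)
print(" ".join(question))
```

Beam search is a frequent alternative when greedy choices produce repetitive or generic questions.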
Modified MLP We use the K-means method to partition the training data into K clusters.
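The clustering step can be sketched with Lloyd's algorithm, the usual way K-means is computed. This is a minimal pure-Python version on made-up 2-D points; the slide does not say which features are clustered or which value of K is used, so both are assumptions here.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(dim) / len(points) for dim in zip(*points))


def kmeans(points, k, iters=20, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct data points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Keep the old centroid if a cluster ends up empty.
        updated = [mean(m) if m else centroids[c] for c, m in enumerate(clusters)]
        if updated == centroids:
            break  # assignments are stable
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Toy data: two well-separated groups (illustrative only).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

Each training example then carries a cluster label, which a downstream model can use, for example by handling each cluster separately.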
Reference
Deep Visual-Semantic Alignments for Generating Image Descriptions
Show and Tell: A Neural Image Caption Generator