Visual Question Answering

Aaron Honculada, Aisha Urooj, Dr. Mubarak Shah, Dr. Niels Lobo

TVQA Dataset
- 460 hours of video
- 152,545 question-answer pairs
- 21,793 clips (60-90 sec each)
- Multimodal
- Compositionality
- Video-QA
- Associated natural language (subtitles)

Questions
- Main question part
- Grounding part
- Temporal localization
- Each clip has 7 questions
- Each question has 5 multiple-choice answers
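As a rough illustration of the main/grounding split named on the Questions slide, the sketch below cuts a question at the first temporal word ("when", "before", or "after"). The splitting rule, the function name `split_question`, and the sample question are assumptions made for illustration only; they are not taken from the slides.

```python
import re

# Assumption: the grounding part starts at the first "when", "before" or "after".
_GROUNDING = re.compile(r"\b(when|before|after)\b", re.IGNORECASE)


def split_question(question):
    """Return (main_part, grounding_part) for a question string.

    Simplification: a question that begins with "When ..." would yield an
    empty main part, and questions with no temporal word keep everything
    in the main part.
    """
    match = _GROUNDING.search(question)
    if match is None:
        return question.strip(), ""
    main = question[:match.start()].strip()
    grounding = question[match.start():].strip().rstrip("?").strip()
    return main, grounding


# Made-up example question, not drawn from the dataset.
main_part, grounding_part = split_question(
    "What is Sheldon holding when he talks to Howard?"
)
```

The grounding part is what ties the question to a moment in the clip, which is why temporal localization appears on the same slide.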

TVQA
- Subtitles
- Visual concepts: object detection, concatenate, remove duplicates
- Video features: ResNet
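The visual-concepts steps listed above (object detection, concatenate, remove duplicates) could be sketched as follows. This assumes the detector has already produced a list of label strings per frame; the function name `merge_visual_concepts`, the lowercasing, and the sample labels are hypothetical and not taken from the slides.

```python
def merge_visual_concepts(per_frame_labels):
    """Concatenate per-frame detection labels and drop duplicates.

    Labels are lowercased and stripped before comparison (an assumption),
    and the order of first appearance is preserved.
    """
    seen = set()
    merged = []
    for labels in per_frame_labels:
        for label in labels:
            key = label.strip().lower()
            if key and key not in seen:
                seen.add(key)
                merged.append(key)
    return merged


# Hypothetical detector output for three frames of one clip.
frames = [["man", "table"], ["table", "cup"], ["Man", "sofa"]]
concepts = merge_visual_concepts(frames)
```

Keeping first-appearance order means the merged list still roughly follows the clip's timeline, which may or may not matter for the model that consumes it.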

Model Used

Baseline Models: LSTM, BiLSTM

Baseline Models: CNN+LSTM baseline

Results

Accuracy (%) of the TVQA model
Model Used     TVQA + S   TVQA + V   TVQA + IMG   TVQA + V + IMG
Reported       65.15%     45.03%     43.78%       N/A
Replication    65.74%     45.25%     44.42%       45.52%

Accuracy (%) of the baseline models
Model Used     Q          S + Q      V + Q      (FC) V + Q   S + V + Q
LSTM           42.74%     42.71%     42.61%     42.85%       42.39%
BiLSTM         42.48%     42.67%     -          42.86%       42.84%
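To put the numbers above in context, the short sketch below computes the gap between the replicated and reported accuracies and the accuracy expected from random guessing with 5 answer choices. The accuracies are copied from the results above; the variable names are illustrative.

```python
# Reported vs. replicated accuracy (%), copied from the results above.
reported = {"TVQA + S": 65.15, "TVQA + V": 45.03, "TVQA + IMG": 43.78}
replication = {"TVQA + S": 65.74, "TVQA + V": 45.25, "TVQA + IMG": 44.42}

# Replication minus reported, in percentage points.
gaps = {
    name: round(replication[name] - reported[name], 2)
    for name in reported
}

# Each question has 5 candidate answers, so random guessing gives 20%.
num_choices = 5
chance_accuracy = 100.0 / num_choices

for name, gap in gaps.items():
    print(f"{name}: +{gap:.2f} points over reported")
print(f"chance: {chance_accuracy:.1f}%")
```

Every replicated setting lands within one percentage point of the reported figure, and the question-only baselines (about 42-43%) sit well above the 20% chance level.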

Results

Summary and Next Steps
- Reproduced results
- Baseline results
- Look into network mistakes and address them
- Main goal: boost performance by using visual cues effectively
