Overview of Challenge
Aishwarya Agrawal (Virginia Tech), Stanislaw Antol (Virginia Tech), Larry Zitnick (Facebook AI Research), Dhruv Batra (Virginia Tech), Devi Parikh (Virginia Tech)
Outline
Overview of Task and Dataset
Overview of Challenge
Winner Announcements
Analysis of Results
VQA Task
An image and a free-form question ("What is the mustache made of?") go into the AI System, which returns a natural-language answer: "bananas".
Real images (from COCO)
Tsung-Yi Lin et al. "Microsoft COCO: Common Objects in Context." ECCV 2014.
and abstract scenes.
Questions: Stump a smart robot!
Ask a question that a human can answer, but a smart robot probably can’t!
VQA Dataset
Dataset Stats
>250K images (COCO + 50K abstract scenes)
>750K questions (3 per image)
~10M answers (10 with image + 3 without image)
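The questions and answers ship as plain JSON. A minimal loading sketch (file names follow the v1 release from visualqa.org; treat the exact paths and field layout as assumptions):

```python
import json

# Hypothetical local paths; the actual files are the v1 downloads from visualqa.org.
QUESTIONS_FILE = "OpenEnded_mscoco_train2014_questions.json"
ANNOTATIONS_FILE = "mscoco_train2014_annotations.json"

with open(QUESTIONS_FILE) as f:
    questions = json.load(f)["questions"]      # each: image_id, question, question_id
with open(ANNOTATIONS_FILE) as f:
    annotations = json.load(f)["annotations"]  # each: question_id, answers (10 humans)

print(len(questions), "questions loaded")
print(questions[0]["question"])
print([a["answer"] for a in annotations[0]["answers"]])  # the 10 human answers
```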
Two Modalities of Answering
Open Ended: free-form answers.
Multiple Choice (18 choices): 1 correct answer, 3 plausible choices, the 10 most popular answers, and random answers for the rest.
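A sketch of how an 18-way choice list can be assembled from those ingredients (an illustration, not the authors' code; the function name and de-duplication policy are assumptions):

```python
import random

def build_choices(correct, plausible, popular, answer_pool, k=18):
    """Assemble one multiple-choice list: 1 correct answer, 3 plausible
    choices, the 10 most popular answers, padded to k with random answers."""
    choices = {correct}
    choices.update(plausible[:3])
    choices.update(popular[:10])
    extras = [a for a in answer_pool if a not in choices]  # avoid repeats
    random.shuffle(extras)
    while len(choices) < k and extras:
        choices.add(extras.pop())
    out = list(choices)
    random.shuffle(out)  # don't leak the answer's position
    return out

print(build_choices(
    correct="bananas",
    plausible=["fruit", "hair", "paint"],
    popular=["yes", "no", "2", "1", "white", "red", "blue", "3", "black", "green"],
    answer_pool=["dog", "cat", "pizza", "kite", "umbrella", "surfboard"],
))
```

Because the correct answer can also appear among the popular ones, the set de-duplicates and pads with extra random answers to reach 18.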
Accuracy Metric
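For reference, the accuracy metric defined in the VQA paper scores a candidate answer $a$ against the 10 human answers for the question as

$$\mathrm{Acc}(a) = \min\left(\frac{\#\{\text{humans that said } a\}}{3},\ 1\right)$$

so an answer is fully correct once at least 3 of the 10 annotators gave it; the official evaluation averages this over all $\binom{10}{9}$ subsets of annotators.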
Human Accuracy (Real)
                  Overall   Yes/No   Number   Other
Open Ended         83.30     95.77    83.39   72.67
Multiple Choice    91.54     97.40    86.97   87.91
Human Accuracy (Abstract)
                  Overall   Yes/No   Number   Other
Open Ended         87.49     95.96    95.04   75.33
Multiple Choice    93.57     97.78    96.71   88.73
Outline: Overview of Challenge
VQA Challenges on www.codalab.org
Real Open Ended
Real Multiple Choice
Abstract Open Ended
Abstract Multiple Choice
Real Image Challenges: Dataset
              Images   Questions   Answers
Training        80K      240K       2.4M
Validation      40K      120K       1.2M
Test            80K      240K       (withheld)
Dataset sizes are approximate.
Real Image Challenges: Test Dataset
80K test images, in four splits of 20K images each:
Test-dev (development): debugging and validation; unlimited submissions to the evaluation server.
Test-standard (publications): used to score entries for the public leaderboard.
Test-challenge (competitions): used to rank challenge participants.
Test-reserve (check overfitting): used to estimate overfitting; scores on this set are never released.
Dataset sizes are approximate.
Slide adapted from: MS COCO Detection/Segmentation Challenge, ICCV 2015.
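Entries are scored by uploading predicted answers to the evaluation server. A minimal writer for the expected results format, a JSON list of question_id/answer records (the file name here is a hypothetical example):

```python
import json

def write_results(predictions, path="vqa_OpenEnded_mscoco_test-dev2015_results.json"):
    """predictions: dict mapping question_id -> predicted answer string.
    The server expects a JSON list of {"question_id": int, "answer": str}."""
    results = [{"question_id": qid, "answer": ans}
               for qid, ans in predictions.items()]
    with open(path, "w") as f:
        json.dump(results, f)

# Toy example with made-up question ids:
write_results({262148000: "bananas", 262148001: "yes"})
```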
Abstract Scene Challenges: Dataset
              Images   Questions   Answers
Training        20K       60K       0.6M
Validation      10K       30K       0.3M
Test            20K       60K       (withheld)
Outline: Winner Announcements
Award GPUs!!!
Abstract Scene Challenges
Open-Ended Challenge: 5 teams, 5 institutions, 3 countries.
Multiple-Choice Challenge: 4 teams, 4 institutions.
The top 3 teams are the same for Open Ended and Multiple Choice.
Abstract Scene Challenges: Winner
Team MIL-UT: Andrew Shin*, Kuniaki Saito*, Yoshitaka Ushiku, Tatsuya Harada
Open Ended Challenge Accuracy: 67.39
Multiple Choice Challenge Accuracy: 71.18
Real Image Challenges
Open-Ended Challenge: 25 teams, 26 institutions, 8 countries.
Multiple-Choice Challenge: 15 teams, 17 institutions, 6 countries.
The top 5 teams are the same for Open Ended and Multiple Choice.
Real Image Challenges: Honorable Mention
Brandeis: Aaditya Prakash
Open Ended Challenge Accuracy: 62.80
Multiple Choice Challenge Accuracy: 65.17
Real Image Challenges: Runner-Up
Team Naver Labs: Hyeonseob Nam, Jeonghee Kim
Open Ended Challenge Accuracy: 64.89
Multiple Choice Challenge Accuracy: 69.37
Real Image Challenges: Winner
Team UC Berkeley & Sony: Akira Fukui, Dong Huk Park, Daylen Yang, Anna Rohrbach, Trevor Darrell, Marcus Rohrbach
Open Ended Challenge Accuracy: 66.90
Multiple Choice Challenge Accuracy: 70.52
Outline: Analysis of Results
Real Open-Ended Challenge
[Chart: challenge-entry accuracies compared against the ICCV15 and arXiv v6 baselines of the VQA paper]
Real Open-Ended Challenge
The best entry improves on the VQA-paper baseline by +12.76% absolute.
Statistical Significance
Bootstrap: resample the test questions 5000 times; differences reported at 99% confidence.
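A minimal sketch of such a test (an illustration under assumptions: per-question accuracies for two methods and percentile intervals; the slide does not spell out the exact procedure):

```python
import numpy as np

def bootstrap_compare(acc_a, acc_b, n_boot=5000, conf=0.99, seed=0):
    """acc_a, acc_b: per-question accuracies (same questions, same order).
    Returns True if the bootstrap confidence interval of the accuracy
    difference excludes zero, i.e. the gap is significant at `conf`."""
    rng = np.random.default_rng(seed)
    acc_a, acc_b = np.asarray(acc_a), np.asarray(acc_b)
    n = len(acc_a)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)          # resample questions with replacement
        diffs[i] = acc_a[idx].mean() - acc_b[idx].mean()
    lo, hi = np.quantile(diffs, [(1 - conf) / 2, 1 - (1 - conf) / 2])
    return not (lo <= 0.0 <= hi)
```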
Easy vs. Difficult Questions (Real Open-Ended Challenge)
80.6% of questions can be answered by at least 1 method!
[Chart: questions split into easy ones, answered by many methods, and difficult ones, answered by few or none]
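The 80.6% figure is an oracle union over all entries; a sketch of how that coverage can be computed (the data layout is hypothetical):

```python
def oracle_coverage(correct_by_method):
    """correct_by_method: one boolean list per method, marking which
    questions that method answered correctly. Returns the fraction of
    questions answered correctly by at least one method."""
    answered = [any(flags) for flags in zip(*correct_by_method)]
    return sum(answered) / len(answered)

# Three toy methods over five questions:
print(oracle_coverage([
    [True, False, True, False, False],
    [True, True, False, False, False],
    [False, False, True, False, True],
]))  # 0.8
```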
Difficult Questions with Rare Answers
What is the name of …
What is the number on …
What is written on the …
What does the sign say?
What time is it?
What kind of …
What type of …
Why …
Easy vs. Difficult Questions (Real Open-Ended Challenge)
[Chart: easy questions, which tend to have frequent answers]
Success Cases
Q: What is the woman holding? GT A: laptop. Machine A: laptop.
Q: What room is the cat located in? GT A: kitchen. Machine A: kitchen.
Q: Is it going to rain soon? GT A: yes. Machine A: yes.
Q: Is this a casino? GT A: no. Machine A: no.
Failure Cases
Q: What is the woman holding? GT A: book. Machine A: knife.
Q: Why is there snow on one side of the stream and clear grass on the other? GT A: shade. Machine A: yes.
Q: Where is the blue and white umbrella? GT A: on left. Machine A: right.
Q: Is the hydrant painted a new color? GT A: yes. Machine A: no.
Easy vs. Difficult Questions (Real Open-Ended Challenge)
[Chart: example easy questions vs. difficult questions]
Answer Type and Question Type Analyses
Per answer type: no team is statistically significantly better than the winner.
Per question type: [chart of accuracies by question type]
Results of the Poll (25 responses)
Poll topics (one chart of responses per topic):
Image Modelling
Question Modelling
Question Word Modelling
Attention on Images
Attention on Questions
Use of Ensemble
Use of External Data Sources
Question Type Specific Mechanisms
Classification vs. Generation of Answers
Future Plans: VQA Challenge 2017? What changes do you want?
Sub-tasks?
More difficult/easy dataset?
Dialogue/conversational QA?
New evaluation metric?
Other annotations?
Thanks! Questions?