Overview of Challenge
Aishwarya Agrawal (Virginia Tech), Stanislaw Antol (Virginia Tech), Larry Zitnick (Facebook AI Research), Dhruv Batra (Virginia Tech), Devi Parikh (Virginia Tech)
Outline
Overview of Task and Dataset
Overview of Challenge
Winner Announcements
Analysis of Results
VQA Task
An image and a free-form question ("What is the mustache made of?") go into the AI System, which returns a natural-language answer: "bananas".
Real images (from COCO)
Tsung-Yi Lin et al. "Microsoft COCO: Common Objects in Context." ECCV 2014.
and abstract scenes.
Questions: Stump a smart robot!
Ask a question that a human can answer, but a smart robot probably can’t!
VQA Dataset
Dataset Stats
>250K images (COCO + 50K abstract scenes)
>750K questions (3 per image)
~10M answers (10 with image + 3 without image)
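The questions and answers ship as plain JSON. A minimal loading sketch (file names follow the v1 release from visualqa.org; treat the exact paths and field layout as assumptions):

```python
import json

# Hypothetical local paths; the actual files are the v1 downloads from visualqa.org.
QUESTIONS_FILE = "OpenEnded_mscoco_train2014_questions.json"
ANNOTATIONS_FILE = "mscoco_train2014_annotations.json"

with open(QUESTIONS_FILE) as f:
    questions = json.load(f)["questions"]      # each: image_id, question, question_id
with open(ANNOTATIONS_FILE) as f:
    annotations = json.load(f)["annotations"]  # each: question_id, answers (10 humans)

print(len(questions), "questions loaded")
print(questions[0]["question"])
print([a["answer"] for a in annotations[0]["answers"]])  # the 10 human answers
```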
Two Modalities of Answering
Open Ended: free-form answers.
Multiple Choice (18 choices): 1 correct answer, 3 plausible choices, the 10 most popular answers, and random answers for the rest.
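A sketch of how an 18-way choice list can be assembled from those ingredients (an illustration, not the authors' code; the function name and de-duplication policy are assumptions):

```python
import random

def build_choices(correct, plausible, popular, answer_pool, k=18):
    """Assemble one multiple-choice list: 1 correct answer, 3 plausible
    choices, the 10 most popular answers, padded to k with random answers."""
    choices = {correct}
    choices.update(plausible[:3])
    choices.update(popular[:10])
    extras = [a for a in answer_pool if a not in choices]  # avoid repeats
    random.shuffle(extras)
    while len(choices) < k and extras:
        choices.add(extras.pop())
    out = list(choices)
    random.shuffle(out)  # don't leak the answer's position
    return out

print(build_choices(
    correct="bananas",
    plausible=["fruit", "hair", "paint"],
    popular=["yes", "no", "2", "1", "white", "red", "blue", "3", "black", "green"],
    answer_pool=["dog", "cat", "pizza", "kite", "umbrella", "surfboard"],
))
```

Because the correct answer can also appear among the popular ones, the set de-duplicates and pads with extra random answers to reach 18.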
Accuracy Metric
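For reference, the accuracy metric defined in the VQA paper scores a candidate answer $a$ against the 10 human answers for the question as

$$\mathrm{Acc}(a) = \min\left(\frac{\#\{\text{humans that said } a\}}{3},\ 1\right)$$

so an answer is fully correct once at least 3 of the 10 annotators gave it; the official evaluation averages this over all $\binom{10}{9}$ subsets of annotators.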
Human Accuracy (Real)
                  Overall   Yes/No   Number   Other
Open Ended         83.30     95.77    83.39   72.67
Multiple Choice    91.54     97.40    86.97   87.91
Human Accuracy (Abstract)
                  Overall   Yes/No   Number   Other
Open Ended         87.49     95.96    95.04   75.33
Multiple Choice    93.57     97.78    96.71   88.73
Outline: Overview of Challenge
VQA Challenges on www.codalab.org
Real Open Ended
Real Multiple Choice
Abstract Open Ended
Abstract Multiple Choice
Real Image Challenges: Dataset
              Images   Questions   Answers
Training        80K      240K       2.4M
Validation      40K      120K       1.2M
Test            80K      240K       (withheld)
Dataset sizes are approximate.
Real Image Challenges: Test Dataset
80K test images, in four splits of 20K images each:
Test-dev (development): debugging and validation; unlimited submissions to the evaluation server.
Test-standard (publications): used to score entries for the public leaderboard.
Test-challenge (competitions): used to rank challenge participants.
Test-reserve (check overfitting): used to estimate overfitting; scores on this set are never released.
Dataset sizes are approximate.
Slide adapted from: MS COCO Detection/Segmentation Challenge, ICCV 2015.
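Entries are scored by uploading predicted answers to the evaluation server. A minimal writer for the expected results format, a JSON list of question_id/answer records (the file name here is a hypothetical example):

```python
import json

def write_results(predictions, path="vqa_OpenEnded_mscoco_test-dev2015_results.json"):
    """predictions: dict mapping question_id -> predicted answer string.
    The server expects a JSON list of {"question_id": int, "answer": str}."""
    results = [{"question_id": qid, "answer": ans}
               for qid, ans in predictions.items()]
    with open(path, "w") as f:
        json.dump(results, f)

# Toy example with made-up question ids:
write_results({262148000: "bananas", 262148001: "yes"})
```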
Abstract Scene Challenges: Dataset
              Images   Questions   Answers
Training        20K       60K       0.6M
Validation      10K       30K       0.3M
Test            20K       60K       (withheld)
Outline: Winner Announcements
Award GPUs!!!
Abstract Scene Challenges
Open-Ended Challenge: 5 teams, 5 institutions, 3 countries.
Multiple-Choice Challenge: 4 teams, 4 institutions.
The top 3 teams are the same for Open Ended and Multiple Choice.
Abstract Scene Challenges: Winner
Team MIL-UT: Andrew Shin*, Kuniaki Saito*, Yoshitaka Ushiku, Tatsuya Harada
Open Ended Challenge Accuracy: 67.39
Multiple Choice Challenge Accuracy: 71.18
Real Image Challenges
Open-Ended Challenge: 25 teams, 26 institutions, 8 countries.
Multiple-Choice Challenge: 15 teams, 17 institutions, 6 countries.
The top 5 teams are the same for Open Ended and Multiple Choice.
Real Image Challenges: Honorable Mention
Brandeis: Aaditya Prakash
Open Ended Challenge Accuracy: 62.80
Multiple Choice Challenge Accuracy: 65.17
Real Image Challenges: Runner-Up
Team Naver Labs: Hyeonseob Nam, Jeonghee Kim
Open Ended Challenge Accuracy: 64.89
Multiple Choice Challenge Accuracy: 69.37
Real Image Challenges: Winner
Team UC Berkeley & Sony: Akira Fukui, Dong Huk Park, Daylen Yang, Anna Rohrbach, Trevor Darrell, Marcus Rohrbach
Open Ended Challenge Accuracy: 66.90
Multiple Choice Challenge Accuracy: 70.52
Outline: Analysis of Results
Real Open-Ended Challenge
[Chart: challenge-entry accuracies compared against the ICCV15 and arXiv v6 baselines of the VQA paper]
Real Open-Ended Challenge
The best entry improves on the VQA-paper baseline by +12.76% absolute.
Statistical Significance
Bootstrap: resample the test questions 5000 times; differences reported at 99% confidence.
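A minimal sketch of such a test (an illustration under assumptions: per-question accuracies for two methods and percentile intervals; the slide does not spell out the exact procedure):

```python
import numpy as np

def bootstrap_compare(acc_a, acc_b, n_boot=5000, conf=0.99, seed=0):
    """acc_a, acc_b: per-question accuracies (same questions, same order).
    Returns True if the bootstrap confidence interval of the accuracy
    difference excludes zero, i.e. the gap is significant at `conf`."""
    rng = np.random.default_rng(seed)
    acc_a, acc_b = np.asarray(acc_a), np.asarray(acc_b)
    n = len(acc_a)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)          # resample questions with replacement
        diffs[i] = acc_a[idx].mean() - acc_b[idx].mean()
    lo, hi = np.quantile(diffs, [(1 - conf) / 2, 1 - (1 - conf) / 2])
    return not (lo <= 0.0 <= hi)
```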
Easy vs. Difficult Questions (Real Open-Ended Challenge)
80.6% of questions can be answered by at least 1 method!
[Chart: questions split into easy ones, answered by many methods, and difficult ones, answered by few or none]
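The 80.6% figure is an oracle union over all entries; a sketch of how that coverage can be computed (the data layout is hypothetical):

```python
def oracle_coverage(correct_by_method):
    """correct_by_method: one boolean list per method, marking which
    questions that method answered correctly. Returns the fraction of
    questions answered correctly by at least one method."""
    answered = [any(flags) for flags in zip(*correct_by_method)]
    return sum(answered) / len(answered)

# Three toy methods over five questions:
print(oracle_coverage([
    [True, False, True, False, False],
    [True, True, False, False, False],
    [False, False, True, False, True],
]))  # 0.8
```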
Difficult Questions with Rare Answers
What is the name of …
What is the number on …
What is written on the …
What does the sign say?
What time is it?
What kind of …
What type of …
Why …
Easy vs. Difficult Questions (Real Open-Ended Challenge)
[Chart: easy questions, which tend to have frequent answers]
Success Cases
Q: What is the woman holding? GT A: laptop. Machine A: laptop.
Q: What room is the cat located in? GT A: kitchen. Machine A: kitchen.
Q: Is it going to rain soon? GT A: yes. Machine A: yes.
Q: Is this a casino? GT A: no. Machine A: no.
Failure Cases
Q: What is the woman holding? GT A: book. Machine A: knife.
Q: Why is there snow on one side of the stream and clear grass on the other? GT A: shade. Machine A: yes.
Q: Where is the blue and white umbrella? GT A: on left. Machine A: right.
Q: Is the hydrant painted a new color? GT A: yes. Machine A: no.
Easy vs. Difficult Questions (Real Open-Ended Challenge)
[Chart: example easy questions vs. difficult questions]
Answer Type and Question Type Analyses
Per answer type: no team is statistically significantly better than the winner.
Per question type: [chart of accuracies by question type]
Results of the Poll (25 responses)
Poll topics (one chart of responses per topic):
Image Modelling
Question Modelling
Question Word Modelling
Attention on Images
Attention on Questions
Use of Ensemble
Use of External Data Sources
Question Type Specific Mechanisms
Classification vs. Generation of Answers
Future Plans: VQA Challenge 2017? What changes do you want?
Sub-tasks?
More difficult/easy dataset?
Dialogue/conversational QA?
New evaluation metric?
Other annotations?
Thanks! Questions?