
1 Overview of Challenge
Aishwarya Agrawal (Virginia Tech)
Stanislaw Antol (Virginia Tech)
Larry Zitnick (Facebook AI Research)
Dhruv Batra (Virginia Tech)
Devi Parikh (Virginia Tech)

2 Outline
Overview of Task and Dataset
Overview of Challenge
Winner Announcements
Analysis of Results


7 VQA Task
Given an image and a free-form natural-language question ("What is the mustache made of?"), an AI System produces a natural-language answer ("bananas").

11 Real images (from COCO)
Tsung-Yi Lin et al. “Microsoft COCO: Common Objects in Context.” ECCV 2014.

12 and abstract scenes.

13 Questions
Stump a smart robot! Ask a question that a human can answer, but a smart robot probably can’t!

14 VQA Dataset

15 Dataset Stats
>250K images (COCO + 50K abstract scenes)
>750K questions (3 per image)
~10M answers (10 w/ image + 3 w/o image per question; 750K × 13 ≈ 10M)

16 Two modalities of answering
Open Ended
Multiple Choice (18 choices): 1 correct answer, 3 plausible choices, 10 most popular answers, rest random answers
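A minimal sketch of how an 18-option choice set could be assembled under these rules; the helper and its inputs (plausible, popular, answer_pool) are hypothetical illustrations, not the actual dataset-creation code:

```python
import random

def build_choices(correct, plausible, popular, answer_pool, n_choices=18, seed=0):
    """Assemble a multiple-choice set: 1 correct answer, 3 plausible
    choices, 10 most popular answers, and random answers for the rest."""
    rng = random.Random(seed)
    choices = {correct}                       # 1 correct answer
    choices.update(plausible[:3])             # 3 plausible choices
    choices.update(popular[:10])              # 10 most popular answers
    while len(choices) < n_choices:           # fill remaining slots randomly
        choices.add(rng.choice(answer_pool))  # assumes a large answer pool
    ordered = sorted(choices)                 # deterministic order, then shuffle
    rng.shuffle(ordered)
    return ordered
```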

17 Accuracy Metric
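The slide presents the VQA accuracy metric; as defined in the VQA paper, an answer is credited in proportion to how many of the 10 human annotators gave it, capped at 1:

$$ \mathrm{Acc}(ans) = \min\left( \frac{\#\{\text{humans that said } ans\}}{3},\; 1 \right) $$

(The released evaluation code averages this over all 10 choose-9 subsets of the human answers.)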

18 Human Accuracy (Real)
                 Overall  Yes/No  Number  Other
Open Ended        83.30   95.77   83.39   72.67
Multiple Choice   91.54   97.40   86.97   87.91

20 Human Accuracy (Abstract)
                 Overall  Yes/No  Number  Other
Open Ended        87.49   95.96   95.04   75.33
Multiple Choice   93.57   97.78   96.71   88.73

22 Outline
Overview of Task and Dataset
Overview of Challenge
Winner Announcements
Analysis of Results

23 VQA Challenges on www.codalab.org
Real Open Ended
Real Multiple Choice
Abstract Open Ended
Abstract Multiple Choice

25 Real Image Challenges: Dataset
             Images  Questions  Answers
Training       80K      240K      2.4M
Validation     40K      120K      1.2M
Test
Dataset sizes are approximate.

28 Real Image Challenges: Test Dataset
80K test images, four splits of 20K images each:
Test-dev (development): debugging and validation; unlimited submissions to the evaluation server.
Test-standard (publications): used to score entries for the public leaderboard.
Test-challenge (competitions): used to rank challenge participants.
Test-reserve (check overfitting): used to estimate overfitting; scores on this set are never released.
Dataset sizes are approximate.
Slide adapted from: MSCOCO Detection/Segmentation Challenge, ICCV 2015.
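For reference, entries are submitted to the evaluation server as a JSON list of {"question_id", "answer"} records, the format used by the official VQA evaluation tools; a minimal sketch, where the model call is a placeholder:

```python
import json

def write_vqa_results(model, questions, out_path="vqa_results.json"):
    """Write predictions in the VQA submission format:
    a JSON list of {"question_id": int, "answer": str} records."""
    results = [
        {"question_id": q["question_id"], "answer": model(q)}  # model() is hypothetical
        for q in questions
    ]
    with open(out_path, "w") as f:
        json.dump(results, f)
```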


30 Abstract Scene Challenges: Dataset
             Images  Questions  Answers
Training       20K       60K      0.6M
Validation     10K       30K      0.3M
Test

33 Outline
Overview of Task and Dataset
Overview of Challenge
Winner Announcements
Analysis of Results

34 Award GPUs!!!

35 Abstract Scene Challenges
Open-Ended Challenge: 5 teams, 5 institutions, 3 countries
Multiple-Choice Challenge: 4 teams, 4 institutions
Top 3 teams are the same for Open Ended and Multiple Choice.

36 Abstract Scene Challenges: Winner
Team MIL-UT: Andrew Shin*, Kuniaki Saito*, Yoshitaka Ushiku, Tatsuya Harada
Open-Ended Challenge accuracy: 67.39
Multiple-Choice Challenge accuracy: 71.18

37 Real Image Challenges
Open-Ended Challenge: 25 teams, 26 institutions, 8 countries
Multiple-Choice Challenge: 15 teams, 17 institutions, 6 countries
Top 5 teams are the same for Open Ended and Multiple Choice.

38 Real Image Challenges: Honorable Mention
Brandeis: Aaditya Prakash
Open-Ended Challenge accuracy: 62.80
Multiple-Choice Challenge accuracy: 65.17

39 Real Image Challenges: Runner-Up
Team Naver Labs: Hyeonseob Nam, Jeonghee Kim
Open-Ended Challenge accuracy: 64.89
Multiple-Choice Challenge accuracy: 69.37

40 Real Image Challenges: Winner
Team UC Berkeley & Sony: Akira Fukui, Dong Huk Park, Daylen Yang, Anna Rohrbach, Trevor Darrell, Marcus Rohrbach
Open-Ended Challenge accuracy: 66.90
Multiple-Choice Challenge accuracy: 70.52

41 Outline
Overview of Task and Dataset
Overview of Challenge
Winner Announcements
Analysis of Results

42 Real Open-Ended Challenge
[Chart: challenge entries compared against the VQA paper baselines (arXiv v6, ICCV15)]

43 Real Open-Ended Challenge
+12.76% absolute improvement.

44 Statistical Significance
Bootstrapped 5000 times at 99% confidence.
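A minimal sketch of such a bootstrap test, assuming paired per-question accuracies for two methods (the resampling details are illustrative, not necessarily the organizers' exact procedure):

```python
import random

def bootstrap_diff_ci(acc_a, acc_b, n_boot=5000, conf=0.99, seed=0):
    """Bootstrap a confidence interval for the mean accuracy difference
    between two methods, given paired per-question accuracies."""
    rng = random.Random(seed)
    n = len(acc_a)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample questions with replacement
        diffs.append(sum(acc_a[i] - acc_b[i] for i in idx) / n)
    diffs.sort()
    lo = diffs[int(n_boot * (1 - conf) / 2)]                   # approximate percentiles
    hi = diffs[min(int(n_boot * (1 + conf) / 2), n_boot - 1)]
    return lo, hi  # method A beats B significantly if lo > 0
```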

45 Real Open-Ended Challenge

46 Easy vs. Difficult Questions (Real Open-Ended Challenge)

48 Easy vs. Difficult Questions (Real Open-Ended Challenge)
80.6% of questions can be answered by at least one method; the rest are the difficult questions.
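One way such a coverage number can be computed, assuming per-question accuracies for every entry (a sketch; treating a question as "answered" when some method reaches full accuracy on it is my assumption, not stated on the slide):

```python
def coverage(per_method_acc, threshold=1.0):
    """Fraction of questions answered by at least one method, where
    'answered' means per-question accuracy >= threshold (assumed).
    per_method_acc: dict of method name -> list of per-question accuracies."""
    n_questions = len(next(iter(per_method_acc.values())))
    answered = sum(
        any(acc >= threshold for acc in question_col)  # any method answers it
        for question_col in zip(*per_method_acc.values())
    )
    return answered / n_questions
```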

49 Easy vs. Difficult Questions (Real Open-Ended Challenge)
[Chart: questions split into easy and difficult sets]

51 Difficult Questions with Rare Answers
What is the name of …
What is the number on …
What is written on the …
What does the sign say?
What time is it?
What kind of …
What type of …
Why …

53 Easy vs. Difficult Questions (Real Open-Ended Challenge)
Easy questions: those with frequent answers.

54 Success Cases
Q: What is the woman holding? GT A: laptop. Machine A: laptop
Q: What room is the cat located in? GT A: kitchen. Machine A: kitchen
Q: Is it going to rain soon? GT A: yes. Machine A: yes
Q: Is this a casino? GT A: no. Machine A: no

55 Failure Cases
Q: What is the woman holding? GT A: book. Machine A: knife
Q: Why is there snow on one side of the stream and clear grass on the other? GT A: shade. Machine A: yes
Q: Where is the blue and white umbrella? GT A: on left. Machine A: right
Q: Is the hydrant painted a new color? GT A: yes. Machine A: no

56 Easy vs. Difficult Questions (Real Open-Ended Challenge)
[Chart: easy vs. difficult question sets]

57 Easy vs. Difficult Questions (Real Open-Ended Challenge)

58 Answer Type and Question Type Analyses
Per Answer Type: no team statistically significantly better than winner
Per Question Type

59 Results of the Poll (25 responses)

60 Image Modelling

61 Question Modelling

62 Question Word Modelling

63 Attention on Images

64 Attention on Questions

65 Use of Ensemble

66 Use of External Data Sources

67 Question Type Specific Mechanisms

68 Classification vs. Generation of Answers

69 Future Plans
VQA Challenge 2017? What changes do you want?
Sub-tasks?
More difficult/easy dataset?
Dialogue/conversational QA?
New evaluation metric?
Other annotations?

70 Thanks! Questions?

