Published by Oscar Hood. Modified over 9 years ago.
1
Considerations for Evaluating Models of Language Understanding and Reasoning
Gabriel Recchia
University of Cambridge
2
Background: The bAbI dataset
Introducing the GABITS dataset (http://nowin2d.com/gabits/)
3
Some history
4
(slide from Bordes, Weston, Chopra, Mikolov, Joulin & Bottou, 2015)
5
[Diagram labels: Generating Process; Training Set; Test Set]
6
Facebook’s bAbI dataset (slide from Bordes, Weston, Chopra, Mikolov, Joulin & Bottou, 2015)
7
The bAbI dataset (slide from Bordes, Weston, Chopra, Mikolov, Joulin & Bottou, 2015)
8
Introducing GABITS The Grounded and bAbI-Inspired Task Set
9
Introducing GABITS The Grounded and bAbI-Inspired Task Set
Each training instance consists of
– A narrative
– A group of questions and associated answers
– An image illustrating the state of the world at every point when something changes state
– A symbolic representation of the state of the world at every point when something changes state (optional)
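The four components listed above can be pictured as a simple record. The following is a minimal sketch only: the class names, field names, and image file names are hypothetical and are not taken from the GABITS distribution, whose actual file layout is not described on the slide.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Question:
    task: str            # task label as shown on the slides, e.g. "T3.a"
    text: str            # the question itself
    answers: List[str]   # one or more accepted answer strings


@dataclass
class Instance:
    narrative: List[str]                         # one sentence per entry
    questions: List[Question]                    # questions with their answers
    image_paths: List[str]                       # one image per state change (hypothetical paths)
    symbolic_states: Optional[List[str]] = None  # optional, one snapshot per state change


# A tiny hand-built instance using sentences from the example narrative.
example = Instance(
    narrative=[
        "The lamp is in the kitchen.",
        "Carol is in the kitchen.",
        "Carol got the lamp.",
    ],
    questions=[Question("T3.a", "What is Carol holding?", ["lamp"])],
    image_paths=["state_000.png", "state_001.png"],  # hypothetical file names
)
```

Leaving `symbolic_states` as `None` by default mirrors the slide's note that the symbolic representation is optional.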
10
Narrative
1 The lamp is in the kitchen.
2 The ball is in the dining room.
3 Eve is in the hall.
4 Carol is in the kitchen.
5 Frank is in the hall.
6 Carol got the lamp.
7 Eve went to the kitchen.
8 Eve travelled to the billiard room.
9 Frank travelled to the kitchen.
10 Eve went to the kitchen.
11 Carol travelled to the billiard room.
12 Carol discarded the lamp.
13 Carol grabbed the lamp.
11
Questions
14 (T1.a) Who is in the kitchen? Eve,Frank
15 (T2.a) Where is Eve? kitchen
16 (T3.a) What is Carol holding? lamp
17 (T12.3) How many objects is Carol holding? one
18 (T3.b) Who is holding the lamp? Carol
19 (T3.b) Who is holding the ball? no one
20 (T4.a) What has Carol held? lamp
21 (T4.b) Who has held the lamp? Carol
22 (T6) Who moved the lamp to the billiard room? Carol
23 (T7.a) Where has Eve been? billiard room,hall,kitchen
24 (T7.a) Where has Frank been? hall,kitchen
25 (T7.c) Where has the lamp been? billiard room,kitchen
26 (T7.b) Where has Eve not been? dining room
29 (T8.a) Who has been in the hall? Eve,Frank
30 (T8.c) What has been in the kitchen? lamp
32 (T8.c) What has been in the billiard room? lamp
33 (T8.b) Who has not been in the billiard room? Frank
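To make the link between the narrative and the answers concrete, here is a minimal sketch of a hand-written world tracker that replays the thirteen narrative sentences and answers a few of the questions. It is not a model from the talk and not part of GABITS: the `World` class, its method names, and the sentence templates are assumptions that cover only the verbs appearing in this one narrative.

```python
import re
from collections import defaultdict

NARRATIVE = [
    "The lamp is in the kitchen.",
    "The ball is in the dining room.",
    "Eve is in the hall.",
    "Carol is in the kitchen.",
    "Frank is in the hall.",
    "Carol got the lamp.",
    "Eve went to the kitchen.",
    "Eve travelled to the billiard room.",
    "Frank travelled to the kitchen.",
    "Eve went to the kitchen.",
    "Carol travelled to the billiard room.",
    "Carol discarded the lamp.",
    "Carol grabbed the lamp.",
]

# Hypothetical templates: just enough to cover the sentences above.
TEMPLATES = [
    ("item_at", re.compile(r"The (\w+) is in the (.+)\.")),
    ("agent_at", re.compile(r"(\w+) is in the (.+)\.")),
    ("take", re.compile(r"(\w+) (?:got|grabbed) the (\w+)\.")),
    ("drop", re.compile(r"(\w+) discarded the (\w+)\.")),
    ("move", re.compile(r"(\w+) (?:went|travelled) to the (.+)\.")),
]


class World:
    def __init__(self):
        self.agents = set()
        self.room = {}                    # entity -> current room
        self.holder = {}                  # item -> agent holding it, or None
        self.visited = defaultdict(set)   # entity -> rooms it has been in

    def _enter(self, entity, room):
        self.room[entity] = room
        self.visited[entity].add(room)

    def step(self, sentence):
        for kind, pattern in TEMPLATES:
            match = pattern.fullmatch(sentence)
            if match:
                break
        else:
            raise ValueError(f"unrecognised sentence: {sentence!r}")
        a, b = match.groups()
        if kind == "item_at":
            self.holder[a] = None
            self._enter(a, b)
        elif kind == "agent_at":
            self.agents.add(a)
            self._enter(a, b)
        elif kind == "take":
            self.holder[b] = a            # no co-location check in this sketch
        elif kind == "drop":
            self.holder[b] = None         # item stays in the agent's current room
        elif kind == "move":
            self._enter(a, b)
            for item, who in self.holder.items():
                if who == a:              # carried items move with the agent
                    self._enter(item, b)

    def who_is_in(self, room):
        return sorted(x for x in self.agents if self.room[x] == room)

    def holding(self, agent):
        return sorted(i for i, who in self.holder.items() if who == agent)

    def where_has_been(self, entity):
        return sorted(self.visited[entity])


world = World()
for sentence in NARRATIVE:
    world.step(sentence)

print(world.who_is_in("kitchen"))      # question 14 (T1.a)
print(world.holding("Carol"))          # question 16 (T3.a)
print(world.where_has_been("lamp"))    # question 25 (T7.c)
```

Replaying the narrative this way reproduces the listed answers for questions 14, 16, and 25. The point of the sketch is only that the answers are fully determined by the narrative, not that a learned model should be built like this.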
12
Questions (cont.)
27 (T12.8) How many people have been in the kitchen? three
28 (T13.8) Have fewer than four people been in the kitchen? yes
31 (T13.8) Have fewer than three objects been in the kitchen? yes
37 (T11) Who has been in the hall or the dining room (but not both)? Eve,Frank
38 (T9.a) Who has been in the dining room or the kitchen (or both)? Carol,Eve,Frank
42 (T13.9) Have more than five people been in the billiard room or the hall or both? no
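The counting and disjunction questions above reduce to set operations over the rooms each person has visited. The sketch below is an illustration only: the `VISITED` table is worked out by hand from the example narrative, and the helper name `been_in` is hypothetical.

```python
# Rooms each person has been in, derived by hand from narrative lines 1-13.
VISITED = {
    "Eve": {"hall", "kitchen", "billiard room"},
    "Carol": {"kitchen", "billiard room"},
    "Frank": {"hall", "kitchen"},
}


def been_in(room):
    """Set of people who have been in the given room at any point."""
    return {person for person, rooms in VISITED.items() if room in rooms}


# T12.8: how many people have been in the kitchen?
kitchen_count = len(been_in("kitchen"))

# T13.8: have fewer than four people been in the kitchen?
fewer_than_four = kitchen_count < 4

# T11: hall or dining room, but not both (symmetric difference).
exactly_one = sorted(been_in("hall") ^ been_in("dining room"))

# T9.a: dining room or kitchen, or both (union).
either = sorted(been_in("dining room") | been_in("kitchen"))

# T13.9: have more than five people been in the billiard room or the hall?
more_than_five = len(been_in("billiard room") | been_in("hall")) > 5

print(kitchen_count, fewer_than_four, exactly_one, either, more_than_five)
```

Each result agrees with the answer listed on the slide: three, yes, Eve and Frank, all three people, and no.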
13
Visual representation of world
14
1 The lamp is in the kitchen.
2 The ball is in the dining room.
3 Eve is in the hall.
4 Carol is in the kitchen.
5 Frank is in the hall.
15
6 Carol got the lamp.
16
7 Eve moved to the kitchen.
17
Symbolic representation of world
time: 65 (Carol took the lamp)
agent0.name Eve
agent0.x 149
agent0.y 324
(agent0.room hall)
agent1.name Carol
agent1.x 284
agent1.y 414
(agent1.room kitchen)
agent2.name Frank
agent2.x 170
agent2.y 414
(agent2.room hall)
item0.name lamp
item0.x 278
item0.y 408
(item0.room kitchen)
(item0.owner Carol)
item1.name ball
item1.x 118
item1.y 52
(item1.room dining room)
(item1.owner null)
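A snapshot like the one above is straightforward to load into a dictionary. The sketch below assumes a one-field-per-line layout, which is a guess based on the slide rather than a documented GABITS format; the function name `parse_snapshot` is hypothetical, and the parentheses around some fields are simply stripped without being interpreted.

```python
import re

SNAPSHOT = """\
time: 65 (Carol took the lamp)
agent0.name Eve
agent0.x 149
agent0.y 324
(agent0.room hall)
agent1.name Carol
agent1.x 284
agent1.y 414
(agent1.room kitchen)
agent2.name Frank
agent2.x 170
agent2.y 414
(agent2.room hall)
item0.name lamp
item0.x 278
item0.y 408
(item0.room kitchen)
(item0.owner Carol)
item1.name ball
item1.x 118
item1.y 52
(item1.room dining room)
(item1.owner null)
"""


def parse_snapshot(text):
    """Parse one snapshot into {"time", "event", "entities"} (assumed layout)."""
    state = {"time": None, "event": None, "entities": {}}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        header = re.fullmatch(r"time:\s*(\d+)\s*\((.*)\)", line)
        if header:
            state["time"] = int(header.group(1))
            state["event"] = header.group(2)
            continue
        # Some fields are wrapped in parentheses on the slide; unwrap them.
        if line.startswith("(") and line.endswith(")"):
            line = line[1:-1]
        key, _, value = line.partition(" ")      # values may contain spaces
        entity, _, attribute = key.partition(".")
        if value == "null":
            parsed = None
        elif attribute in ("x", "y"):
            parsed = int(value)
        else:
            parsed = value
        state["entities"].setdefault(entity, {})[attribute] = parsed
    return state


state = parse_snapshot(SNAPSHOT)
print(state["time"], state["event"])
print(state["entities"]["item0"])
```

Splitting on the first space only keeps multi-word values such as `dining room` intact.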
18
Advantages
self-contained: all or nearly all of the information necessary to perform well at the task is present within the training data
– It should be obviously possible for a human to solve the task even if they do not speak the language in which the task is rendered
19
Advantages
incremental and compositional: questions build on each other
– Who is in the hall?
– Who has been in the hall?
– Who has not been in the hall?
– Who has been in the hall and the lounge?
– How many people have been in the hall?
– How many people have been in the hall and the lounge?
20
Advantages
wide-coverage: the tasks in the dataset correspond to diverse abilities
For even wider coverage - even more tasks!
In recent years, there has been an increasing number of papers with (mostly) self-contained tasks involving two- or three-dimensional spatial representations
Contact me for our list so far! Or to let me know about more tasks to add to the collection!