Robert's Drawers (and other variations on GRE shared tasks)
Gatt, Belz, Reiter, Viethen
Available resources
● TUNA Corpus (Gatt et al.; ca. 2500 refs)
  - one-shot references
  - balanced
  - 2500 refs to furniture or people
  - (a sketch of the item structure follows this slide)
● Robert's Drawers (Viethen and Dale; ca. 140 refs)
  - one-shot references
  - not yet balanced
● GREC ("GRE in Context") (Belz and Varges)
  - 2000 introductory passages from Wikipedia
  - 1000 annotated, rest in progress
  - annotated for reference to the main subject ("topic")
  - different NP types: subjects, objects, possessives
● COCONUT (Jordan)
  - goes beyond just identification
● (possibly another corpus of newspaper texts)
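To make the corpus structure above concrete, here is a minimal Python sketch of what a TUNA-style one-shot reference item contains: a domain of entities with attributes, a target referent, and the attribute set annotated on a human description. All class, field, and attribute names are invented for illustration; the actual corpora use their own (XML) annotation schemes.

    # Hypothetical record structure for a one-shot reference item
    # (TUNA-style). Names are illustrative, not the corpus schema.
    from dataclasses import dataclass, field

    @dataclass
    class Entity:
        entity_id: str
        attributes: dict[str, str]   # e.g. {"type": "chair", "colour": "red"}

    @dataclass
    class ReferenceItem:
        domain: list[Entity]         # the objects shown to the participant
        target_id: str               # the entity the speaker had to identify
        description: str             # the human-produced referring expression
        selected_attrs: set[str] = field(default_factory=set)  # annotated content

    item = ReferenceItem(
        domain=[
            Entity("e1", {"type": "chair", "colour": "red", "size": "large"}),
            Entity("e2", {"type": "chair", "colour": "blue", "size": "large"}),
        ],
        target_id="e1",
        description="the red chair",
        selected_attrs={"type:chair", "colour:red"},
    )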
Short-term additions to resources
● Add comprehension data: carry out experiments in which people identify the referents, and pair the results with the corpus descriptions. Data include:
  - reaction time
  - error rate
  - self-paced reading (for GREC-type corpora)
Long-term additions to resources
● Eye-tracking data
● Situated reference in virtual environments (Koller et al., this workshop)
● In progress: small multimodal corpus (Bangerter, van der Sluis, Gatt)
Task definition
● Task structure:
  - provide a data source
  - have a small set of clearly defined tasks
  - but ALSO: have an open category
● Evaluation:
  - default metric
  - call for proposals for evaluation metrics
  - correlate metrics with human judgments/performance (see the sketch after this slide)
● Scope for variation:
  - Task: content determination, realisation, lexical choice
  - Type of reference: full definite, anaphoric, singular/plural
  - Goal: model production or enhance comprehension
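The call for evaluation-metric proposals implies a validation step: checking how well each candidate metric tracks human judgments. A minimal sketch of that step, assuming paired per-item scores and using SciPy for the correlation (the slides name no toolkit, and all numbers below are invented):

    # Correlating an automatic metric with human judgments, per corpus item.
    # Invented scores; real data would come from the shared-task evaluation.
    from scipy.stats import pearsonr, spearmanr

    metric_scores = [0.80, 0.55, 0.90, 0.30, 0.65]  # automatic metric per item
    human_scores  = [4.5, 3.0, 4.8, 2.1, 3.6]       # e.g. mean adequacy ratings

    r, p = pearsonr(metric_scores, human_scores)          # linear correlation
    rho, p_rank = spearmanr(metric_scores, human_scores)  # rank correlation
    print(f"Pearson r = {r:.2f} (p = {p:.3f}), Spearman rho = {rho:.2f}")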
(Sub-)communities
● GRE people (the usual suspects)
● CoNLL/EMNLP community
● Psycholinguists:
  - advice/expertise
  - computational psycholinguistic modelling
Aims
● "Community" aims:
  - Have fun!
  - Get people working together; consolidate the community
  - Broaden the community
● Broader aims:
  - Have a test-bed to see whether NLG STECs actually work
  - GRE is probably the best initial candidate
● Scientific aims:
  - Hothouse effect
  - Evaluation: use different methods, and evaluate the methods themselves
Execution: Logistics
● Dry run to pilot the idea
  - possibly at UCNLG (September)
  - shared competitive task: content determination (singular definites, furniture)
  - production evaluation, using TUNA
  - include a call for evaluation metrics
  - also include an open track
● Main event (larger scale & wider scope)
  - co-located with INLG?
  - several shared tasks + open category
  - Evaluation:
    - Production: match between algorithm & human (see the sketch after this slide)
    - Comprehension: ease of identification, etc.
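The slides leave "match between algorithm & human" unspecified. For content determination, one plausible way to score it is overlap between the algorithm's selected attribute set and the attributes annotated on the human description, e.g. the Dice coefficient; this is a hedged example, not a decision recorded in the slides, and the attribute names are invented.

    # Dice coefficient between two attribute sets:
    # 2|A ∩ B| / (|A| + |B|); 1.0 means a perfect match.
    def dice(selected: set, reference: set) -> float:
        if not selected and not reference:
            return 1.0
        return 2 * len(selected & reference) / (len(selected) + len(reference))

    algorithm_attrs = {"type:chair", "colour:red"}
    human_attrs = {"type:chair", "colour:red", "size:large"}
    print(dice(algorithm_attrs, human_attrs))  # 0.8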
Evaluation: £££
● Sources of expense:
  - human evaluations
  - adding comprehension data to the corpora
  - organisational costs (web site, etc.)
● Who's paying?
  - community effort
  - Aberdeen platform grant
  - Brighton Prodigy project funds
  - no special funding (yet)