Robert's Drawers (and other variations on GRE shared tasks) Gatt, Belz, Reiter, Viethen.

Available resources
● TUNA Corpus (Gatt et al; ca refs)
  - one-shot references
  - balanced
  - 2500 refs to furniture or people
● Robert's drawers (Viethen and Dale; ca. 140 refs)
  - one-shot references
  - not yet balanced
● GREC (“GRE in Context”) (Belz and Varges)
  - 2000 introductory passages from Wikipedia
  - 1000 annotated, rest in progress
  - annotated for reference to the main subject (“topic”)
  - different NP types: subjects, objects, possessives
● COCONUT (Jordan)
  - goes beyond just identification
● (possibly another corpus of newspaper texts)

Short-term additions to resources
● Add comprehension data:
  - Carry out experiments to get people to identify referents and pair results with corpus descriptions. Data include:
    * reaction time
    * error rate
    * self-paced reading for GREC-type corpora

Long-term additions to resources
● Eye-tracking data
● Situated reference in virtual environments (Koller et al, this Workshop)
● In progress: small multimodal corpus (Bangerter, van der Sluis, Gatt)

Task definition
● Task structure:
  - provide a data source
  - have a small set of clearly defined tasks, but ALSO:
  - have an open category
● Evaluation:
  - default metric
  - call for proposals for evaluation metrics
  - correlate metrics with human judgments/performance
● Scope for variation:
  - Task: content determination, realisation, lexical choice
  - Type of reference: full definite, anaphoric, singular/plural
  - Goal: model production or enhance comprehension
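A minimal sketch of the "correlate metrics with human judgments/performance" step, assuming Pearson's r as the correlation measure (the slides do not name one). The function name `pearson` and all numbers are hypothetical, made up for illustration only:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(vx * vy)

# Made-up example: one automatic metric score per system, and the mean
# time (in seconds) human readers needed to identify the referent.
metric_scores = [0.2, 0.4, 0.6, 0.8]
identification_times = [4.0, 3.0, 2.0, 1.0]

r = pearson(metric_scores, identification_times)
print(f"r = {r:.2f}")  # strongly negative: higher score, faster identification
```

A metric that tracks comprehension would show a strong correlation of this kind; a weak one would be an argument for calling for alternative metrics.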

(Sub-)communities
● GRE people (the usual suspects)
● CoNLL/EMNLP community
● Psycholinguists:
  - advice/expertise
  - computational psycholinguistic modelling

Aims
● “Community” aims:
  - Have fun!
  - Get people working together, consolidate the community
  - Broaden the community
● Broader aims:
  - Have a test-bed to see if NLG STECs actually work
  - GRE is probably the best initial candidate
● Scientific aims:
  - Hothouse effect
  - Evaluation:
    * Use different methods
    * Evaluate the methods

Execution: Logistics
● Dry run to pilot the idea
  - Possibly at UCNLG (September)
  - Shared competitive task: Content Determination
    * singular definites, furniture
  - Production evaluation, using TUNA
  - Include a call for evaluation metrics
  - Also include open track
● Main event (larger scale & wider scope)
  - Co-located with INLG?
  - Several shared tasks + open category
  - Evaluation:
    * Production: match between algorithm & human
    * Comprehension: ease of identification, etc.
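One way the "match between algorithm & human" production evaluation could be scored for content determination is set overlap between the attributes chosen by a system and those used in a human description. The sketch below assumes the Dice coefficient as the overlap measure; the slides do not fix a metric, and the function name and attribute names are hypothetical:

```python
def dice(human_attrs, system_attrs):
    """Dice coefficient between two attribute sets (1.0 = identical)."""
    a, b = set(human_attrs), set(system_attrs)
    if not a and not b:
        return 1.0  # two empty descriptions count as a perfect match
    return 2 * len(a & b) / (len(a) + len(b))

# Made-up furniture example: the human mentioned three attributes,
# the algorithm selected two of them.
human = {"type", "colour", "size"}
system = {"type", "colour"}

score = dice(human, system)  # 2 * 2 / (3 + 2)
print(score)
```

Averaging such scores over all corpus items would give a single production score per system, which could serve as the default metric alongside any proposed alternatives.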

Evaluation: £££
● Sources of expense:
  - Human evaluations
  - Adding comprehension data to the corpora
  - Organisational costs (web site, etc)
● Who's paying?
  - Community effort
  - Aberdeen platform grant
  - Brighton Prodigy project funds
  - No special funding (yet)