Automatic Evaluation of Referring Expression Generation is Possible
Jette Viethen, NLGeval07, 20 April 2007

Slide 2: Preliminaries

Why do we want shared evaluation?
- It has benefited other fields.
- To learn about evaluation techniques.
- For fun.
- To provide resources.
- To measure and ensure progress.

What do we want to evaluate?
- Applications or NLG subtasks?

How do we want to evaluate?
- Competitive, comparative, or collaborative?
- Automatic or human evaluation?

Slide 3: Agenda

Yes, REG is still mainly focussed on distinguishing initial reference. Taking an optimistic look at the five main challenges for REG evaluation:
- Defining Gold Standards
- Output Expectations
- Parameters
- A wide field with few players
- Input Representation

Slide 4: Defining Gold Standards

For automatic evaluation, we need gold-standard corpora against which to compare system output. There is never just one correct answer in NLG: every object can be described in many acceptable ways. To be fair, a gold standard for REG needs to contain "all" acceptable descriptions for each object. The TUNA corpus looks like the right starting point.
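One concrete way to score against such a multi-reference gold standard is to compare attribute sets and keep the best match over all acceptable descriptions, so a system is never punished for choosing a valid alternative. The sketch below uses the Dice coefficient as one plausible set-overlap measure; the object, attributes, and score are hypothetical:

```python
def dice(a, b):
    """Dice coefficient between two attribute sets."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))

def score_against_gold(system_set, gold_sets):
    """Score one system description against ALL acceptable gold
    descriptions of the same object, keeping the best match."""
    return max(dice(system_set, g) for g in gold_sets)

# Hypothetical target object, described by two annotators:
gold = [
    {"type:chair", "colour:red", "size:small"},
    {"type:chair", "colour:red", "position:left"},
]
system = {"type:chair", "colour:red"}
print(score_against_gold(system, gold))  # 0.8
```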

Slide 5: Output Expectations

Quantity: Are we content with only one solution?
- Evaluate one description per object from each system, for now.
- Maybe later allow multiple entries.

Quality: What is a "good" referring expression?
- Get people to rank different descriptions for the same object.
- Assess usability by success rate and time.

Linguistic level: Many factors make it hard to assess one subtask, from content determination to surface realisation.
- Concentrate on content determination, for now.
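Success rate and time are straightforward to compute once identification experiments have been run. A minimal sketch over hypothetical trial records (did the subject pick the intended referent, and how long did it take?):

```python
# Hypothetical records from an identification experiment.
trials = [
    {"correct": True,  "time": 2.1},
    {"correct": True,  "time": 3.4},
    {"correct": False, "time": 5.0},
    {"correct": True,  "time": 2.5},
]

n_correct = sum(t["correct"] for t in trials)
success_rate = n_correct / len(trials)
# Mean identification time over successful trials only.
mean_time = sum(t["time"] for t in trials if t["correct"]) / n_correct

print(f"success rate: {success_rate:.2f}, mean time: {mean_time:.2f}s")
```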

Slide 6: Parameters

Most REG systems take one or more parameters, and very fine-grained parameters allow the engineering of virtually any desired output. So what do we want to evaluate?
- The theoretical capacity of a system: parameters are part of the system and may not be switched during an evaluation.
- The actual execution of the task: automatic determination of the best parameter settings for describing a given object.
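The classic example of such a parameter is the attribute preference order in Dale and Reiter's Incremental Algorithm. The simplified sketch below (hypothetical domain; it omits details such as always including the type attribute) shows how two settings of the same system yield different content, which is exactly why an evaluation must decide whether settings are fixed or tuned per object:

```python
def incremental(target, distractors, preference_order):
    """Dale & Reiter-style Incremental Algorithm: add attributes in
    the given preference order until all distractors are ruled out."""
    description = {}
    remaining = list(distractors)
    for attr in preference_order:
        value = target[attr]
        survivors = [d for d in remaining if d.get(attr) == value]
        if len(survivors) < len(remaining):  # attribute rules something out
            description[attr] = value
            remaining = survivors
        if not remaining:
            break
    return description

# Hypothetical domain: the target among two distractor chairs.
target = {"type": "chair", "colour": "red", "size": "small"}
distractors = [
    {"type": "chair", "colour": "blue", "size": "small"},
    {"type": "chair", "colour": "blue", "size": "large"},
]
print(incremental(target, distractors, ["colour", "size", "type"]))
# {'colour': 'red'}
print(incremental(target, distractors, ["size", "colour", "type"]))
# {'size': 'small', 'colour': 'red'}
```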

Slide 7: A wide field with few players

We need to use our human resources wisely! REG is well defined and has many people working on it, but they are spread across many sub-problems and domains. Concentrating on one competitive task would divert attention from other important areas. The evaluation corpus therefore needs to cover a number of domains and be subdividable into types of referring expressions.

Slide 8: Input Representation

Counting from infinity to infinity... Input representations are highly dependent on the application domain and tightly intertwined with algorithm design. Two options:
- Let everyone choose their own representation: the representation is part of the system, but each system then faces the challenge of finding the same properties that people used.
- Agree on a common underlying knowledge base, based on the properties and relations used in the corpus: input representation and algorithm design can then be disentangled.
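A shared knowledge base of the second kind could be as simple as one attribute-value record per domain entity, using exactly the vocabulary annotated in the corpus. A hypothetical example:

```python
# Hypothetical shared domain: every system receives the same entities,
# attributes, and values, so input representation is no longer a
# confound when comparing algorithms.
knowledge_base = {
    "e1": {"type": "desk", "colour": "red", "size": "large"},
    "e2": {"type": "desk", "colour": "blue", "size": "large"},
    "e3": {"type": "fan", "colour": "red", "size": "small"},
}

def distractors(kb, target_id):
    """Everything in the domain except the intended referent."""
    return [entity for eid, entity in kb.items() if eid != target_id]

print(len(distractors(knowledge_base, "e1")))  # 2
```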

Slide 9: Summary

To get started with automatic evaluation for REG:
- Build a corpus containing as many "good" REs per object as possible.
- Get human rankings for the REs in the corpus.
- Concentrate on a low linguistic level for now.
- Treat parameter settings as part of the algorithm.
- Include many different kinds of REs in the corpus.
- Don't compete on one task. Share resources.
- Standardise the underlying knowledge base.