Overspecified reference in hierarchical domains: measuring the benefits for readers. Ivandre Paraboni (University of Sao Paulo), Judith Masthoff and Kees van Deemter (University of Aberdeen)

What this is about: Generation of Referring Expressions (GRE). A referring expression is overspecified if a clear (distinguishing) referring expression can still be obtained by removing a property. Informally: overspecified = logically redundant.
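The definition above can be made concrete with a minimal sketch. The toy domain, the property encoding and the function names below are assumptions made for illustration; they are not taken from the talk or from any particular GRE algorithm.

```python
from typing import Dict, FrozenSet, Set, Tuple

Property = Tuple[str, object]
Description = FrozenSet[Property]

# Hypothetical toy domain: each object is described by a set of properties.
DOMAIN: Dict[str, Description] = {
    "my_house": frozenset({("number", 968), ("street", "Western Road")}),
    "house_b": frozenset({("number", 12), ("street", "Western Road")}),
    "house_c": frozenset({("number", 12), ("street", "Union Street")}),
}


def referents(description: Description, domain: Dict[str, Description]) -> Set[str]:
    """Names of all objects that have every property in the description."""
    return {name for name, props in domain.items() if description <= props}


def is_distinguishing(description: Description, target: str,
                      domain: Dict[str, Description]) -> bool:
    """True if the description picks out the target and nothing else."""
    return referents(description, domain) == {target}


def is_overspecified(description: Description, target: str,
                     domain: Dict[str, Description]) -> bool:
    """True if removing some single property still leaves a distinguishing description."""
    return any(
        is_distinguishing(description - {prop}, target, domain)
        for prop in description
    )


minimal = frozenset({("number", 968)})
redundant = frozenset({("number", 968), ("street", "Western Road")})

print(is_overspecified(minimal, "my_house", DOMAIN))    # False
print(is_overspecified(redundant, "my_house", DOMAIN))  # True
```

In this sketch "number 968" alone is distinguishing, so adding the street is logically redundant, which is exactly the kind of redundancy the talk argues can still help the reader.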

Introduction to the problem: Suppose I live on Western Road, the longest street in Aberdeen, and I live at number 968. No other house in Aberdeen has that number. "Number 968, Aberdeen" is a distinguishing description, but it's not very useful. It's better to add logically redundant information, e.g., "968 Western Road, Aberdeen", or even "968 Western Road, Bon Accord, Aberdeen".

Overspecification in referring expressions: Produced by any GRE algorithm that does not achieve Full Brevity (Dale 1989). Investigated in its own right by e.g. –Arts 2004 (role of location; purely empirical) –Jordan 2000 (overspecification in specific situations, e.g., when a sale is confirmed) –Horacek 2005 (overspecification when there is uncertainty about the applicability of properties)

Our focus: The need for overspecification when a large domain is not fully known in advance to a hearer. Typical examples involve space or time: –A house in a city, a photocopier in a building, a picture in a document –(An event or object in time, e.g., the minister of the colonies in the XYZ government). This talk: empirical validation of algorithms.

Caveat: Overspecification can make it easier to identify the referent, but it is bound to lengthen reading times. Our terminology: we expect overspecification –to make interpretation harder –to make resolution easier

Short history... Paraboni & van Deemter (INLG-2002): A simple theory of the way in which hearers perform search: Ancestral Search (AS). Two types of situations that AS predicts to be problematic for hearers: Lack of Orientation (LO) and Dead End (DE). An algorithm (in two flavours) that adds redundant information when AS predicts these problems. An experiment to test whether these algorithms improve the output of GRE.

(1) Lack of Orientation (LO) [Diagram: the University of Brighton domain as a hierarchy, with the Watts building and the Cockcroft building, their wings (North, South, West) and rooms (library, auditorium).] Example description: the West Wing

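(2) Dead End (DE) [Diagram: the University of Brighton domain as a hierarchy, with the Watts building and the Cockcroft building, their wings (North, South, West) and rooms (library, auditorium); a question mark is attached to a North Wing.] Example description: the library in the North Wing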

Explanation (informal!): Why are LO and DE bad? Ancestral Search (AS): Search locally, then one level up at a time. Essentially, this is just salience (cf. Krahmer & Theune 2000) applied to hierarchies.
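A rough sketch of "search locally, then one level up at a time" follows. The tree layout is a hypothetical one loosely modelled on the University of Brighton example (which room sits in which wing is an assumption), and `Node`, `find_below` and `ancestral_search` are illustrative names, not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False)  # identity-based equality avoids recursing through parent links
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def find_below(node: Node, label: str) -> Optional[Node]:
    """Depth-first search for `label` in the subtree rooted at `node`."""
    if node.label == label:
        return node
    for child in node.children:
        hit = find_below(child, label)
        if hit is not None:
            return hit
    return None


def ancestral_search(start: Node, label: str) -> Tuple[Optional[Node], int]:
    """Search the local subtree first, then widen the scope one ancestor at a time.

    Returns the first match and the number of levels climbed to find it.
    """
    scope: Optional[Node] = start
    climbed = 0
    while scope is not None:
        hit = find_below(scope, label)
        if hit is not None:
            return hit, climbed
        scope, climbed = scope.parent, climbed + 1
    return None, climbed


# Assumed layout, for illustration only.
campus = Node("University of Brighton")
watts = campus.add(Node("Watts building"))
cockcroft = campus.add(Node("Cockcroft building"))
watts_north = watts.add(Node("North Wing"))
watts_south = watts.add(Node("South Wing"))
library = watts_north.add(Node("library"))
cockcroft_north = cockcroft.add(Node("North Wing"))
cockcroft_west = cockcroft.add(Node("West Wing"))
auditorium = cockcroft_north.add(Node("auditorium"))

# Dead End: a reader in the Cockcroft building resolves "the North Wing" locally,
# reaches the Cockcroft North Wing, and finds no library there.
wing, climbed = ancestral_search(cockcroft_west, "North Wing")
print(wing is cockcroft_north, climbed)          # True 1
print(find_below(wing, "library") is None)       # True
```

Under these assumptions, "the West Wing" uttered to a reader in the Watts building illustrates Lack of Orientation: the local search fails and the description gives no hint about where to look next, so `ancestral_search(watts_south, "West Wing")` only succeeds after climbing to the campus level.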

Summary of Experiment 1: Descriptions compared by subjects. 15 subjects were shown documents from which most of the words were deleted. Binary forced choice between two expressions that refer to document parts: 1. the obvious minimal description 2. the redundant description generated by our algorithm

What the subjects chose between (example)

Hypotheses & Outcomes: Hyp 1: In problematic situations, redundant descriptions are preferred. Hyp 2: In non-problematic situations, non-redundant descriptions are preferred. Outcomes: –Hyp 1: overwhelmingly confirmed –Hyp 2: trend in the right direction (57%), but not statistically significant. (Too few subjects?)

Limitations of the first experiment: This experiment was hybrid: partly about reading, partly about writing. It did not teach us why redundant descriptions were preferred (in problematic cases). We think this was because non-redundant descriptions caused problems for resolution, but the experiment did not address resolution separately. (Subjects may have balanced interpretation and resolution when judging.)

What next? Therefore, a new experiment was called for, which addresses resolution only. Documents as our domain again. Add hyperlinks to support non-linear search through the document. Track readers' resolution (i.e., search) process. Intricate experiment, hence a new author: Judith Masthoff (University of Aberdeen)

Experiment 2: Tracking resolution. Effect of logical redundancy on the performance of readers, focussing on resolution.

Experimental Design: 40 subjects completed the experiment. Within-subjects design: each subject was shown 20 documents. The order of documents was randomized. Documents were made to look different. The reader had knowledge of the hierarchical structure. The reader was given a task: Please click on.. Navigation actions were recorded.

[Screenshot of an example document: "Let's talk about helicopters." Task: "Please click on picture 4 in part C". The reader's location is marked.]

Hypothesis 1: In a problematic (DE/LO) situation, the number of navigation actions required for a long (FI/SL) description is smaller than that required for a minimal description. Informally: redundancy helps resolution! (in problematic situations)

But... it seems likely that redundant information will always help resolution, so let's compare the Gain in problematic and unproblematic situations.

Hypothesis 2: The Gain achieved by a long description over a minimal description will be larger in a problematic situation than in a non-problematic situation. Informally: redundancy helps especially in problematic situations.

But... Even more redundancy might have helped even more. The obvious candidate: a complete description. Compare cases where our algorithm prescribes a complete description with ones where it does not. We want b to be greater than a, where a = Gain(complete-description, incomplete-description-generated-by-algorithm) and b = Gain(complete-description-generated-by-algorithm, incomplete-description).

Hypothesis 3: The Gain of a complete description over a less complete one will be larger for a situation in which our algorithms generated the complete description than for a situation in which our algorithms generated the less complete description.

Results: Hypothesis 1 Do redundant descriptions benefit problematic situations? Yes!

Results: Hypothesis 2 Do redundant descriptions benefit problematic situations MORE than non-problematic situations?

Comparing like with like: General Linear Model (GLM) with repeated measures. Comparison of similar situations, e.g. 2 and 7. In both sit2 and sit7: minimal = pic. 3 in part A; redundant = pic. 3 in part A of section 2. sit2: reader is in the same section as the target. sit7: reader is in a different section.

Results: Hypothesis 2 Do redundant descriptions benefit problematic situations MORE than non-problematic situations? Yes!

Results: Hypothesis 3 FI Are our algorithms economical with redundancy? Yes!

How much overspecification is optimal? [Diagram: the University of Brighton domain as a hierarchy, with the Watts building and the Cockcroft building, their wings (North, South, West) and rooms (library, auditorium).] Candidate descriptions: The auditorium; The...in the North Wing; The.... in the Watts building; The.... on this campus

Which of all these descriptions is best? Depends on issues other than the structure of the domain, e.g., –how much time/space does the speaker/writer have available? –how important is it that misunderstandings are avoided? [cf., Van Deemter et al., this conference] –is there room for negotiation through dialogue? [cf., Khan et al., this conference]

In the setting of this experiment: We did not find a point beyond which overspecification backfires. We did find a point of diminishing returns for resolution speed. Given that interpretation deteriorates with every added property, the figures are suggestive.

Getting a feeling for the numbers
Nonproblematic situations (situations 7 and 8):
–short description: 1.53 clicks (2 properties)
–redundant (other): 1.34 clicks (3 properties)
Problematic situations (situations 3 and 4):
–short description: 4.05 clicks (1 property)
–redundant (algorithm): 1.77 clicks (2 properties)
–redundant (other): 1.31 clicks (3 properties)
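The click counts above can be turned into gains with a few lines of arithmetic. The talk does not spell out a formula for Gain, so the sketch below assumes Gain(long, short) = mean clicks for the short description minus mean clicks for the long one; the dictionary keys and the `gain` helper are illustrative names only.

```python
# Mean number of clicks per description type, as reported on the slide.
CLICKS = {
    "nonproblematic": {"short": 1.53, "redundant_other": 1.34},
    "problematic": {
        "short": 4.05,
        "redundant_algorithm": 1.77,
        "redundant_other": 1.31,
    },
}


def gain(clicks_short: float, clicks_long: float) -> float:
    """Assumed definition: clicks saved by using the longer description."""
    return round(clicks_short - clicks_long, 2)


problematic = CLICKS["problematic"]
nonproblematic = CLICKS["nonproblematic"]

# First redundant property in a problematic situation (1 -> 2 properties).
print(gain(problematic["short"], problematic["redundant_algorithm"]))            # 2.28
# One further property in a problematic situation (2 -> 3 properties).
print(gain(problematic["redundant_algorithm"], problematic["redundant_other"]))  # 0.46
# One redundant property in a non-problematic situation (2 -> 3 properties).
print(gain(nonproblematic["short"], nonproblematic["redundant_other"]))          # 0.19
```

Under this assumed definition, the first redundant property saves about 2.3 clicks in problematic situations, while a further property saves under half a click, which is the pattern of diminishing returns the talk describes.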

Conclusion: Overspecification can have many reasons (Jordan 2000, Horacek 2005). Overspecification isn't always equally necessary. Our focus is on overspecification for guiding resolution. The optimum amount of overspecification is hard to determine, but we have found a point of diminishing returns, based on the need to avoid DE and LO.

Additional slides

[ A medical comparison: A hospital with two types of patients, all of whom have coughing (cf., clicking!) as their main symptom –chest infections (serious patients) –throat infections (light patients). You can administer 1, 2, or 3 pills (cf., properties). But pills can be harmful, so the doctor uses them sparingly.

The doctor's regime: light patients should get 1 pill; serious patients should get 2 pills on a normal night, and 3 pills on a bad night. Is this a wise regime? Tests were done...

Test of effectiveness of pills: 1. Serious patients who get their 2 or 3 pills start coughing less. 2. Serious patients benefit more from getting their prescribed high number of pills (as opposed to just 1) than light patients. 3. Focus on serious patients. Try giving the ones that are having a good night 3 pills (i.e. one more than prescribed). They benefit less (from getting 3 instead of 2 pills) than the ones that are having a bad night benefitted (from getting 3 instead of 2 pills). ]

Results on Search Behaviour: number of deviations from Ancestral Search in the first navigation action, for 12 documents with incomplete descriptions