Dr. Ehud Reiter, Computing Science, University of Aberdeen
NLG Shared Tasks: Let's try it and see what happens


1 NLG Shared Tasks: Let's try it and see what happens
Ehud Reiter (Univ of Aberdeen)
http://www.csd.abdn.ac.uk/~ereiter

2 Contents
• General Comments
• Geneval proposal

3 Good points of Shared Task
• Compare different approaches
• Encourage people to interact more
• Reduce NLG “barriers to entry”
• Better understanding of evaluation

4 Bad Points
• May narrow focus of community
  » IR ignored web search because of TREC?
• May encourage incremental research instead of new ideas

5 My opinion
• Let's give it a try
• But I suspect one-off exercises are better than a series
  » Many people think MUC, DUC, etc. were very useful initially but became less scientifically exciting over time

6 Practical Issues
• Domain/task?
  » Need something which several (6?) groups are interested in
• Evaluation technique
  » Avoid techniques that are biased
    – E.g., some automatic metrics may favour statistical systems

7 Geneval
• Proposal to evaluate NLG evaluation
  » Core idea is to evaluate a set of systems with similar input/output functionality in many different ways, and see how well the different evaluation techniques correlate
  » Anja Belz and Ehud Reiter
  » Hope to submit to EPSRC (roughly similar to NSF in US) soon

8 NLG Evaluation
• Many types
  » Task-based, human ratings, BLEU-like metrics, etc.
• Little consensus on best technique
  » I.e., most appropriate for a context
• Poorly understood

9 Some open questions
• How well do different types correlate?
  » E.g., does BLEU predict human ratings?
• Are there biases?
  » E.g., are statistical NLG systems over- or under-rated by some techniques?
• What is best design?
  » Number of subjects, subject expertise, number (quality) of reference texts, etc.
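The first open question (how well do different evaluation types correlate?) can be made concrete with a small sketch. This is only an illustration, not the Geneval procedure: it assumes one automatic-metric score and one mean human rating per system, and compares them with Pearson and Spearman correlation. The system names and all numbers are hypothetical placeholders, not results from any study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# HYPOTHETICAL per-system scores, invented purely for illustration.
systems = ["A", "B", "C", "D", "E"]
metric_scores = [0.31, 0.42, 0.38, 0.55, 0.47]  # e.g. a BLEU-like metric
human_ratings = [3.1, 3.4, 3.9, 4.2, 3.6]       # e.g. mean rating on a 1-5 scale

if __name__ == "__main__":
    print("Pearson: ", round(pearson(metric_scores, human_ratings), 3))
    print("Spearman:", round(spearman(metric_scores, human_ratings), 3))
```

With only a handful of systems, a correlation estimated this way is very noisy, which is one reason the number of systems matters for a study of this kind.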

10 Belz and Reiter (2006)
• Evaluated several systems for generating wind statements in weather forecasts, using both human judgements and BLEU-like metrics
• Found OK (not wonderful) correlation, but also some biases
• Geneval: do this on a much larger scale
  » More domains, more systems, more evaluation techniques (including new ones), etc.

11 Geneval: Possible Domains
• Weather forecasts (not wind statements)
  » Use SumTime corpus
• Referring expressions
  » Use Prodigy-Grec or Tuna corpus
• Medical summaries
  » Use Babytalk corpus
• Statistical summaries
  » Use Atlas corpus

12 Geneval: Evaluation techniques
• Human task-based
  » E.g., referential success
• Human ratings
  » Likert vs preference; expert vs non-expert
• Automatic metrics based on reference texts
  » BLEU, ROUGE, METEOR, etc.
• Automatic metrics without reference texts
  » MT T and X scores, length
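To illustrate what "automatic metrics based on reference texts" measure, here is a minimal sketch of clipped n-gram precision, the building block of BLEU-like metrics. It is deliberately not full BLEU: there is no brevity penalty and no combination over several n-gram orders. The function names and the example sentences are hypothetical, chosen only for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, references, n):
    """Fraction of candidate n-grams that also occur in a reference.

    Each candidate n-gram is credited at most as many times as it
    appears in the single reference where it is most frequent.
    """
    candidate_counts = Counter(ngrams(candidate, n))
    if not candidate_counts:
        return 0.0  # candidate is shorter than n tokens

    # Maximum count of each n-gram over all reference texts.
    max_reference_counts = Counter()
    for reference in references:
        for gram, count in Counter(ngrams(reference, n)).items():
            max_reference_counts[gram] = max(max_reference_counts[gram], count)

    clipped = sum(
        min(count, max_reference_counts[gram])
        for gram, count in candidate_counts.items()
    )
    return clipped / sum(candidate_counts.values())


# HYPOTHETICAL generated text and reference texts, for illustration only.
generated = "wind southwest 10 to 15 increasing later".split()
reference_texts = [
    "wind southwest 10 to 15 rising to 20 later".split(),
    "southwesterly wind 10 to 15 increasing by evening".split(),
]

if __name__ == "__main__":
    for n in (1, 2):
        print(n, round(clipped_precision(generated, reference_texts, n), 3))
```

A score like this depends heavily on how many reference texts there are and who wrote them, which connects to the open question about the number and quality of reference texts.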

13 Geneval: new techniques
• Would also like to explore and develop new evaluation techniques
  » Post-edit based human evaluations?
  » Automatic metrics which look at semantic features?
  » Open to suggestions for other ideas!

14 Would like systems contributed
• Study would be better if other people would contribute systems
  » We supply data sets and corpora, and carry out evaluations
  » So you can focus 100% on your great new algorithmic ideas!

15 Geneval from STEC perspective
• Sort of like STEC???
  » If people contribute systems based on our data sets and corpora
  » But results will be anonymised
    – Only developer of system X knows how well X did
  » One-off exercises, not repeated
  » Multiple evaluation techniques
• Hope data sets will reduce barriers to entry

16 Geneval
• Please let Anja or me know if
  » You have general comments, and/or
  » You have a suggestion for an additional evaluation technique
  » You might be interested in contributing a system

