Published by Marsha Murphy. Modified over 9 years ago.
1
Putting development and evaluation of core technology first
Anja Belz
Natural Language Technology Group (NLTG)
University of Brighton, UK
2
N L T G Belz: Putting development and evaluation of core technology first
Overview
NLG needs comparative evaluation
Core technology first, applications second
Towards common subtasks, corpora and evaluation techniques
What kind of STEC event for NLG?
3
NLG needs comparative evaluation
NLG has strong evaluation traditions
But there has been no comparative evaluation, except for a handful of results, e.g.:
–regenerating the Wall Street Journal Corpus
–SumTime wind forecast generation
At present, we don’t really know which NLG techniques generally work better
For consolidation of results and collective progress, we need the ability to evaluate comparatively
4
Core technology first, applications second
Biggest challenge: identifying sharable tasks
A shared application is potentially divisive:
–NLG is a varied field with many applications
–hard to select one with enough agreement
–evaluation results would be application-specific
Instead, choose tasks that can unify NLG:
–tasks that are relevant to all NLG
–core technology that is potentially useful to all NLG
–utilise commonalities and agreement that have already emerged: GRE (generation of referring expressions), lexicalisation, content ordering
5
Towards common subtasks, corpora and evaluation techniques
Standardising subtasks and input/output requirements
Building data resources for building and evaluating systems
Creating NLG-specific evaluation techniques
–ISO quality characteristics: functionality, reliability, usability, efficiency, maintainability, portability
–Need to focus on evaluation of quality of outputs:
(New) GENEVAL: test existing and new evaluation techniques
–that assess different evaluation criteria
–and have a range of associated cost/time requirements
6
What kind of STEC?
Don’t have an NLG STEC at application level (yet)
Don’t invest millions (yet)
Don’t have a large organisation run it (yet)
Because:
–NLG technology isn’t ready
–participation would involve large investment in terms of money and time
–not many groups would be able to do that
–would have to decide on an application, which is potentially divisive
7
What kind of STEC?
Do encourage many different shared tasks and subtasks (at least, initially)
Involve many NLG researchers in organising STECs
Involve SIGGEN, have steering committee
Because:
–diversity in tasks reflects diversity of field (NLG just isn’t one thing)
–it’s inclusive and representative
–control stays with international academic community
8
Stakeholder STECs
Similar to SemEval 2007 (Senseval 4)
As opposed to shareholder STECs like DUC and MT-Eval
Annual STEC event attached to INLG and ENLG
Call for task proposals
Proposers organise and run their own STEC tasks
Ready test bed for new tasks: popular tasks grow, less popular ones disappear