Semantic Evaluation of Machine Translation
Billy Wong, City University of Hong Kong
21 May 2010
Introduction
- Surface text similarity is not a reliable indicator in automatic MT evaluation: it is insensitive to legitimate translation variation
- Deeper linguistic analysis is preferred
- WordNet is widely used for matching synonyms, e.g. METEOR (Banerjee & Lavie 2005), TERp (Snover et al. 2009), ATEC (Wong & Kit 2010)
- Is the similarity of words between MT outputs and references fully captured?
Motivation
- WordNet's sense distinctions are highly fine-grained
- Word pairs that share no sense yet have similar meanings: [mom vs mother], [safeguard vs security], [expansion vs extension], [journey vs tour], [impact vs influence], etc.
- Ignoring such pairs in evaluation is problematic
- What is needed is a word similarity measure
- Proposal: utilize word similarity measures in automatic MT evaluation
Word Similarity Measures
Knowledge-based (WordNet):
- Wup (Wu & Palmer 1994)
- Res (Resnik 1995)
- Jcn (Jiang & Conrath 1997)
- Hso (Hirst & St-Onge 1998)
- Lch (Leacock & Chodorow 1998)
- Lin (Lin 1998)
- Lesk (Banerjee & Pedersen 2002)
Corpus-based:
- LSA (Landauer et al. 1998)
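To make the knowledge-based family concrete, the Wu & Palmer (1994) measure scores two concepts by the depth of their lowest common subsumer in the taxonomy: sim(a, b) = 2 * depth(LCS) / (depth(a) + depth(b)). The sketch below implements this formula over a small hypothetical taxonomy fragment (the `parents` table is illustrative, not real WordNet data):

```python
# Toy Wu-Palmer similarity: sim(a, b) = 2 * depth(LCS) / (depth(a) + depth(b)).
# The taxonomy fragment below is hypothetical, not actual WordNet content.
parents = {
    "mom": "mother", "mother": "parent",
    "parent": "relative", "relative": "person",
    "person": "entity", "entity": None,  # root
}

def path_to_root(node):
    """List of nodes from `node` up to (and including) the root."""
    chain = []
    while node is not None:
        chain.append(node)
        node = parents[node]
    return chain

def depth(node):
    """Depth counted from the root; the root itself has depth 1."""
    return len(path_to_root(node))

def wup(a, b):
    """Wu & Palmer (1994) similarity over the toy taxonomy above."""
    ancestors_a = set(path_to_root(a))
    # Lowest common subsumer: first ancestor of b that also subsumes a.
    for lcs in path_to_root(b):
        if lcs in ancestors_a:
            return 2 * depth(lcs) / (depth(a) + depth(b))
    return 0.0
```

With this fragment, [mom vs mother] scores close to 1 even though the two words share no synset, which is exactly the kind of pair the talk argues should not be treated as a mismatch.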
Experiment
Three questions:
- To what extent are two words considered similar?
- Which word similarity measure(s) are more appropriate to use?
- How much performance gain can an MT evaluation metric obtain by incorporating word similarity measures?
Setting
Data: MetricsMATR08 development data
- 1992 MT outputs, 8 MT systems, 4 references
Evaluation metric: unigram matching
- Exact match / synonym / semantically similar, all given the same weight
- Three variants: precision (p), recall (r) and F-measure (f), where c is the MT output and t the reference translation
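The unigram-matching metric above can be sketched as follows. This minimal version counts exact matches only (the talk's metric also credits synonyms and semantically similar words at the same weight); function and variable names are illustrative, not from the original:

```python
from collections import Counter

def unigram_scores(candidate, reference):
    """Clipped unigram precision, recall and F-measure between an MT
    output c and a reference translation t (exact matching only)."""
    c, t = candidate.split(), reference.split()
    # Multiset intersection clips each word's count to its reference count.
    overlap = sum((Counter(c) & Counter(t)).values())
    p = overlap / len(c) if c else 0.0   # precision: matches over output length
    r = overlap / len(t) if t else 0.0   # recall: matches over reference length
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```

Extending this to the metric's full matching scheme would replace the exact-match intersection with a lookup that also accepts synonym pairs and pairs scoring above a similarity threshold.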
Result (1): Correlation thresholds of each measure
Result (2): Correlation of the metric
Conclusion
- Semantically similar words are important in automatic MT evaluation
- Two word similarity measures, Wup and LSA, perform relatively better
- Remaining problems:
  - Semantic similarity vs. semantic relatedness, e.g. [committee vs chairman] (LSA)
  - Most WordNet similarity measures operate on nouns and verbs only
Thank you