Published by Millicent Warner. Modified over 9 years ago.
1
Linguistically Targeted Test Suites
November 2, 2012
Lori Levin, Jason Baldridge, Chris Dyer, Vijay John, Kyle Jerro
2
Linguistic Core evaluation for Linguistic Core MT
Corpus of naturally occurring sentences in Kinyarwanda and Malagasy
Sentences are annotated with tags showing constructions of interest (relative clauses, passives, etc.)
Example:
– Conditional, relative clause, headless relative clause, VOS, voice alternation, proximity, adjectival predicate
– To really increase farmers’ representation in national politics, it is not enough to increase the number of delegates elected by farmers.
– Tsy ampy ny mampitombo ny isan'ny solontena fidian'ny tantsaha raha tiana ny hampitomboana ny solontenan'ny tantsaha eo amin'ny sehatra nasionaly.
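One way such an annotated corpus entry could be represented is sketched below. This is only an illustration, not the project's actual format: the `AnnotatedSentence` class, its field names, and the `with_tag` helper are assumptions made for the example. The tags and the sentence pair are the ones from the slide.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AnnotatedSentence:
    """One corpus entry (hypothetical layout): source, reference, construction tags."""
    source: str
    reference: str
    tags: frozenset = field(default_factory=frozenset)


def with_tag(corpus, tag):
    """Return every entry annotated with the given construction tag."""
    return [entry for entry in corpus if tag in entry.tags]


corpus = [
    AnnotatedSentence(
        source=(
            "Tsy ampy ny mampitombo ny isan'ny solontena fidian'ny tantsaha "
            "raha tiana ny hampitomboana ny solontenan'ny tantsaha "
            "eo amin'ny sehatra nasionaly."
        ),
        reference=(
            "To really increase farmers’ representation in national politics, "
            "it is not enough to increase the number of delegates elected by farmers."
        ),
        tags=frozenset({
            "conditional", "relative clause", "headless relative clause",
            "VOS", "voice alternation", "proximity", "adjectival predicate",
        }),
    ),
]

# Pull out all sentences that exercise relative clauses.
for entry in with_tag(corpus, "relative clause"):
    print(entry.reference)
```

Storing the tags as a set keeps the lookup per construction simple, which is what a targeted test suite needs when it selects sentences by construction.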
3
Lexical similarity is not always a good measure of translation quality
Good translations have low scores
– when higher order n-grams don’t match
Bad translations persist when function words are undervalued
– Errors in tense, definiteness, and negation persist
Lack of error analysis
– Well understood constructions like relative clauses and passive voice are not modelled
4
Underrating good translations
From Giménez and Màrquez, 2010, page 212
HYP: On Tuesday several missiles and mortar shells fell in southern Israel, but there were no casualties.
R1: Several Qassam rockets and mortar shells were fired on southern Israel today Tuesday without victims.
R2: Several Qassam rockets and mortars hit southern Israel today without causing any casualties.
R3: A number of Quassam rockets and Howitzer missiles fell over southern Israel today, Tuesday, without causing any casualties.
R4: Several Qassam rockets and mortar shells fell today, Tuesday on southern Israel without causing any victim.
R5: Several Qassam rockets and mortar shells fell today, Tuesday, in southern Israel without causing any casualties.
Acceptable to human translators but low BLEU score because of no higher order n-gram matches.
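The effect can be inspected with a small sketch of clipped ("modified") n-gram precision, the quantity BLEU is built from. This is not the official BLEU implementation: the lowercasing regex tokenizer and the function names are assumptions made for illustration, and the brevity penalty and geometric mean are left out. It only shows how the share of matching n-grams changes as n grows for the hypothesis and references above.

```python
import re
from collections import Counter


def tokenize(text):
    """Simplified tokenizer (an assumption): lowercase, keep word characters only."""
    return re.findall(r"\w+", text.lower())


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(hyp, refs, n):
    """Clipped n-gram precision of one hypothesis against several references."""
    hyp_counts = ngrams(tokenize(hyp), n)
    if not hyp_counts:
        return 0.0
    # For each n-gram, the most times it occurs in any single reference.
    max_ref = Counter()
    for ref in refs:
        for gram, count in ngrams(tokenize(ref), n).items():
            max_ref[gram] = max(max_ref[gram], count)
    matched = sum(min(count, max_ref[gram]) for gram, count in hyp_counts.items())
    return matched / sum(hyp_counts.values())


HYP = ("On Tuesday several missiles and mortar shells fell in southern Israel, "
       "but there were no casualties.")
REFS = [
    "Several Qassam rockets and mortar shells were fired on southern Israel today Tuesday without victims.",
    "Several Qassam rockets and mortars hit southern Israel today without causing any casualties.",
    "A number of Quassam rockets and Howitzer missiles fell over southern Israel today, Tuesday, without causing any casualties.",
    "Several Qassam rockets and mortar shells fell today, Tuesday on southern Israel without causing any victim.",
    "Several Qassam rockets and mortar shells fell today, Tuesday, in southern Israel without causing any casualties.",
]

precisions = {n: modified_precision(HYP, REFS, n) for n in range(1, 5)}
for n, p in precisions.items():
    print(f"{n}-gram precision: {p:.3f}")
```

Because BLEU combines these precisions multiplicatively, a sharp drop at the higher orders pulls the overall score down even when most individual words match.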
5
Underrating good translations
From our Malagasy-English system:
Low BLEU score 0.0149826
– HYP: many held for many months but have no right to a lawyer.
– REF: many got arrested for months without any right to have access to any lawyer.
High BLEU score 0.510864
– HYP: in a long post, called for freedom for other members of the committee for zon'oombelona i koohyar goodarzi.
– REF: koohyar goodarzi in a long post asked for freedom for other members of the committee of human rights.
6
Overrating bad translations
Lack of focus on function words
Google Translate, October 31, 2012
– Chinese to English: Lost tense and missed preposition
  I saw the person you talked to
  我看到了你交談的人
  I see the person you are talking
– English to Japanese: Trouble with negative determiner “no”
  No students bought books.
  いかなる生徒は本を買った。
  Any student bought a book.
7
Not identifying the source of errors
In mature MT systems, many systematic errors occur in well-understood linguistic constructions:
– Relative clauses (non-subject gaps)
Google Translate, October 31, 2012
  I saw the person you gave a book to.
  私はあなたに本をくれた人を見た。
  I saw a man who gave me the book for you.
8
Linguistic Evaluation
Evaluation based on syntactic or semantic roles is not reliable in the early stages of development when the output cannot be parsed well.
– Och et al. 2003; Giménez and Màrquez 2010
9
Early Stage Linguistic Core Evaluation
How well are we translating specific constructions?
Preliminary list of constructions of interest in Kinyarwanda and Malagasy:
– Relative clauses
– Passives and other non-active voices
– Clefts and focus constructions
– Conditional sentences
– Comparatives
– VOS word order
– Causatives
– Applicatives
10
Early stage linguistically targeted evaluation
Automatic measures of lexical similarity
– Which constructions correlate with low scores?
Error analysis conducted by human system developers
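A first pass at the question of which constructions go with low scores could group sentence-level scores by construction tag and rank the tags by average score. The sketch below assumes each test sentence comes with its set of tags and some automatic score. The `mean_score_by_tag` helper is hypothetical, and the score values are invented placeholders, not results from the system. Averaging per tag is a simpler stand-in for a proper correlation analysis.

```python
from collections import defaultdict
from statistics import mean


def mean_score_by_tag(scored):
    """Average sentence-level score per construction tag, lowest first."""
    by_tag = defaultdict(list)
    for tags, score in scored:
        for tag in tags:
            by_tag[tag].append(score)
    return sorted(
        ((tag, mean(scores)) for tag, scores in by_tag.items()),
        key=lambda pair: pair[1],
    )


# Placeholder data: (construction tags, sentence-level score).
# These numbers are made up for illustration only.
scored_sentences = [
    ({"relative clause", "passive"}, 0.12),
    ({"conditional"}, 0.41),
    ({"relative clause"}, 0.18),
    ({"passive", "VOS"}, 0.30),
]

ranking = mean_score_by_tag(scored_sentences)
for tag, avg in ranking:
    print(f"{tag}: {avg:.2f}")
```

Tags at the top of such a ranking are candidates for the manual error analysis by system developers. Because a sentence usually carries several tags, a low average for one tag may be caused by another construction in the same sentences.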
11
Plans
More constructions
– Possessives, rates, questions, tense, mood, aspect, etc.
Evaluation metrics based on linguistic structure
– Such as lexical similarity of syntactic and semantic functions