Linguistically Targeted Test Suites
November 2, 2012
Lori Levin, Jason Baldridge, Chris Dyer, Vijay John, Kyle Jerro

2 Linguistic Core evaluation for Linguistic Core MT
Corpus of naturally occurring sentences in Kinyarwanda and Malagasy
Sentences are annotated with tags showing constructions of interest (relative clauses, passives, etc.)
Example:
– Tags: conditional, relative clause, headless relative clause, VOS, voice alternation, proximity, adjectival predicate
– English: To really increase farmers’ representation in national politics, it is not enough to increase the number of delegates elected by farmers.
– Malagasy: Tsy ampy ny mampitombo ny isan'ny solontena fidian'ny tantsaha raha tiana ny hampitomboana ny solontenan'ny tantsaha eo amin'ny sehatra nasionaly.
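Below is a minimal sketch of how one annotated corpus item might be stored. It assumes a simple record holding a source sentence, an English reference and a set of construction tags. The class `TaggedSentence`, its fields and the helper `select` are hypothetical illustrations, not the project's actual annotation format. The example reuses the sentence and tags above.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class TaggedSentence:
    """Hypothetical record for one corpus item with its construction tags."""
    language: str                  # e.g. "Malagasy" or "Kinyarwanda"
    source: str                    # naturally occurring source sentence
    reference: str                 # English translation
    tags: frozenset = field(default_factory=frozenset)  # constructions present

    def has(self, construction):
        return construction in self.tags

def select(items, construction):
    """Keep only items tagged with `construction`, e.g. to build a targeted test suite."""
    return [item for item in items if item.has(construction)]

example = TaggedSentence(
    language="Malagasy",
    source=("Tsy ampy ny mampitombo ny isan'ny solontena fidian'ny tantsaha raha tiana "
            "ny hampitomboana ny solontenan'ny tantsaha eo amin'ny sehatra nasionaly."),
    reference=("To really increase farmers' representation in national politics, "
               "it is not enough to increase the number of delegates elected by farmers."),
    tags=frozenset({"conditional", "relative clause", "headless relative clause", "VOS",
                    "voice alternation", "proximity", "adjectival predicate"}),
)
```

A call such as `select(corpus, "relative clause")` would then pull out a construction-specific test suite from a list of such records.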

3 Lexical similarity is not always a good measure of translation quality
Good translations get low scores
– when their higher-order n-grams don’t match the reference
Bad translations persist when function words are undervalued
– Errors in tense, definiteness, and negation persist
Lack of error analysis
– Well-understood constructions like relative clauses and passive voice are not modelled
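For reference, the standard BLEU score (Papineni et al., 2002) combines clipped n-gram precisions p_n through a geometric mean. The result is then scaled by a brevity penalty BP, where c is the hypothesis length and r the effective reference length:

```latex
\mathrm{BLEU} = \mathrm{BP}\cdot\exp\Big(\sum_{n=1}^{N} w_n \log p_n\Big)
             = \mathrm{BP}\cdot\Big(\prod_{n=1}^{4} p_n\Big)^{1/4}
             \quad \big(N = 4,\ w_n = \tfrac{1}{4}\big),
\qquad
\mathrm{BP} = \begin{cases} 1 & \text{if } c > r \\ e^{\,1 - r/c} & \text{if } c \le r \end{cases}
```

The geometric mean is a product, so a single p_n = 0 makes unsmoothed sentence-level BLEU zero, however well the lower orders match. No matching 4-grams is enough to cause this. Every token also counts equally toward each p_n. A wrong tense marker or a dropped negation therefore costs no more than any other mismatched word.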

4 Underrating good translations
From Giménez and Màrquez, 2010, page 212
HYP: On Tuesday several missiles and mortar shells fell in southern Israel, but there were no casualties.
R1: Several Qassam rockets and mortar shells were fired on southern Israel today Tuesday without victims.
R2: Several Qassam rockets and mortars hit southern Israel today without causing any casualties.
R3: A number of Quassam rockets and Howitzer missiles fell over southern Israel today, Tuesday, without causing any casualties.
R4: Several Qassam rockets and mortar shells fell today, Tuesday on southern Israel without causing any victim.
R5: Several Qassam rockets and mortar shells fell today, Tuesday, in southern Israel without causing any casualties.
Acceptable to human translators, but the BLEU score is low because the hypothesis has few higher-order n-gram matches with the references.
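Below is a minimal Python sketch of the clipped (modified) n-gram precision behind BLEU, applied to the hypothesis and five references above. The tokenizer is a simplifying assumption: it lowercases and splits off punctuation, so counts may differ from those of any particular BLEU tool. With this tokenization, only one of the hypothesis's fifteen 4-grams ("and mortar shells fell") appears in any reference.

```python
import re
from collections import Counter

def tokenize(text):
    # Assumption: lowercase and split off punctuation; real BLEU tools tokenize differently.
    return re.findall(r"\w+|[^\w\s]", text.lower())

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def clipped_matches(hyp, refs, n):
    """Return (matched, total) for modified n-gram precision: each hypothesis
    n-gram is credited at most as many times as it occurs in any single reference."""
    hyp_counts = Counter(ngrams(hyp, n))
    max_ref_counts = Counter()
    for ref in refs:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    matched = sum(min(count, max_ref_counts[gram]) for gram, count in hyp_counts.items())
    return matched, sum(hyp_counts.values())

hyp = tokenize("On Tuesday several missiles and mortar shells fell in southern Israel, "
               "but there were no casualties.")
refs = [tokenize(r) for r in (
    "Several Qassam rockets and mortar shells were fired on southern Israel today Tuesday without victims.",
    "Several Qassam rockets and mortars hit southern Israel today without causing any casualties.",
    "A number of Quassam rockets and Howitzer missiles fell over southern Israel today, Tuesday, without causing any casualties.",
    "Several Qassam rockets and mortar shells fell today, Tuesday on southern Israel without causing any victim.",
    "Several Qassam rockets and mortar shells fell today, Tuesday, in southern Israel without causing any casualties.",
)]

for n in range(1, 5):
    matched, total = clipped_matches(hyp, refs, n)
    print(f"{n}-gram precision: {matched}/{total}")
```

Clipping against the maximum count in any one reference is what keeps a hypothesis from scoring well by repeating a common word.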

5 Underrating good translations
From our Malagasy-English system:
Low BLEU score (0.0149826)
– HYP: many held for many months but have no right to a lawyer.
– REF: many got arrested for months without any right to have access to any lawyer.
High BLEU score (0.510864)
– HYP: in a long post, called for freedom for other members of the committee for zon'oombelona i koohyar goodarzi.
– REF: koohyar goodarzi in a long post asked for freedom for other members of the committee of human rights.
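Here is a sketch of single-reference sentence-level BLEU, run on the low-scoring pair above with whitespace tokenization (an assumption). The hypothesis shares no trigrams with the reference, so plain BLEU is exactly zero. The nonzero score reported on the slide presumably reflects a smoothed variant, which the slide does not specify. The `smooth=True` option here adds one to the matched and total counts for n > 1. That is one common choice among several, and it is not expected to reproduce the slide's number.

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(hyp, ref, max_n=4, smooth=False):
    """Single-reference BLEU with uniform weights 1/max_n.

    smooth=False: plain BLEU, zero if any n-gram order has no match.
    smooth=True: add 1 to matched and total counts for n > 1 (one common variant).
    """
    log_mean = 0.0
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngram_counts(hyp, n), ngram_counts(ref, n)
        matched = sum((hyp_counts & ref_counts).values())  # clipped matches
        total = sum(hyp_counts.values())
        if smooth and n > 1:
            matched, total = matched + 1, total + 1
        if matched == 0:
            return 0.0  # a zero precision zeroes the geometric mean
        log_mean += math.log(matched / total) / max_n
    c, r = len(hyp), len(ref)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_mean)

hyp = "many held for many months but have no right to a lawyer.".split()
ref = "many got arrested for months without any right to have access to any lawyer.".split()
print(sentence_bleu(hyp, ref))               # 0.0
print(sentence_bleu(hyp, ref, smooth=True))
```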

6 Overrating bad translations
Lack of focus on function words
Google Translate, October 31, 2012
– Chinese to English: lost tense and missed preposition
  Reference: I saw the person you talked to
  Source: 我看到了你交談的人
  Output: I see the person you are talking
– English to Japanese: trouble with the negative determiner “no”
  Source: No students bought books.
  Output: いかなる生徒は本を買った。
  Gloss of output: Any student bought a book.
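The following illustrates why function-word errors are undervalued, using a hypothetical reference and hypothesis built to mirror the “no”/“any” example above. `FUNCTION_WORDS` is an illustrative, hand-picked list, not a standard resource. Unigram precision rates the meaning-reversing hypothesis at 4/5. A simple diff restricted to function words points straight at the error.

```python
from collections import Counter

# Hypothetical, hand-picked function-word list for illustration only.
FUNCTION_WORDS = {"a", "an", "the", "no", "any", "not", "to", "of", "in", "on"}

def unigram_precision(hyp, ref):
    """Share of hypothesis tokens matched in the reference (counts clipped)."""
    if not hyp:
        return 0.0
    matched = sum((Counter(hyp) & Counter(ref)).values())
    return matched / len(hyp)

def function_word_mismatches(hyp, ref):
    """Function words that the hypothesis adds or drops relative to the reference."""
    hyp_counts, ref_counts = Counter(hyp), Counter(ref)
    diff = (hyp_counts - ref_counts) + (ref_counts - hyp_counts)
    return {word: count for word, count in diff.items() if word in FUNCTION_WORDS}

ref = "no student bought a book".split()   # hypothetical reference
hyp = "any student bought a book".split()  # meaning reversed by one function word
print(unigram_precision(hyp, ref))         # 0.8
print(function_word_mismatches(hyp, ref))  # {'any': 1, 'no': 1}
```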

7 Not identifying the source of errors
In mature MT systems, many systematic errors occur in well-understood linguistic constructions:
– Relative clauses (non-subject gaps)
Google Translate, October 31, 2012
  Source: I saw the person you gave a book to.
  Output: 私はあなたに本をくれた人を見た。
  Gloss of output: I saw a man who gave me the book for you.

8 Linguistic Evaluation
Evaluation based on syntactic or semantic roles is unreliable in the early stages of development, when the output cannot yet be parsed well.
– Och et al. 2003; Giménez and Màrquez 2010

9 Early-Stage Linguistic Core Evaluation
How well are we translating specific constructions?
Preliminary list of constructions of interest in Kinyarwanda and Malagasy:
– Relative clauses
– Passives and other non-active voices
– Clefts and focus constructions
– Conditional sentences
– Comparatives
– VOS word order
– Causatives
– Applicatives

10 Early-stage linguistically targeted evaluation
Automatic measures of lexical similarity
– Which constructions correlate with low scores?
Error analysis conducted by human system developers
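One way to start answering “which constructions correlate with low scores?” is to compare two means: the sentence-level score of sentences carrying a tag, and that of sentences without it. The function below is a sketch under that assumption. The tag names echo the construction list above, but the scores are made-up toy values, not results. Sentences usually carry several tags at once. A fuller analysis would therefore control for co-occurring tags (e.g., with a regression) and check significance before pointing developers at a construction.

```python
from statistics import mean

def scores_by_construction(items):
    """items: list of (tags, score) pairs; tags is a set of construction labels,
    score a sentence-level lexical-similarity score (e.g. sentence BLEU).
    Returns {tag: (mean score with tag, mean score without tag or None, count)}."""
    all_tags = sorted({tag for tags, _ in items for tag in tags})
    report = {}
    for tag in all_tags:
        with_tag = [score for tags, score in items if tag in tags]
        without_tag = [score for tags, score in items if tag not in tags]
        report[tag] = (mean(with_tag),
                       mean(without_tag) if without_tag else None,
                       len(with_tag))
    return report

# Hypothetical toy data; the scores are invented for illustration.
toy = [
    ({"relative clause", "passive"}, 0.125),
    ({"relative clause"}, 0.25),
    ({"passive"}, 0.5),
    (set(), 0.75),
]

# Print constructions from the lowest mean score upward.
report = scores_by_construction(toy)
for tag, (with_tag, without_tag, count) in sorted(report.items(), key=lambda kv: kv[1][0]):
    other = "n/a" if without_tag is None else f"{without_tag:.3f}"
    print(f"{tag}: {with_tag:.3f} with tag (n={count}), {other} without")
```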

11 Plans
More constructions
– Possessives, rates, questions, tense, mood, aspect, etc.
Evaluation metrics based on linguistic structure
– Such as lexical similarity of syntactic and semantic functions
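One simple reading of “lexical similarity of syntactic and semantic functions” is to compare, role by role, the words that fill each function in the hypothesis and the reference. This sketch assumes the (role, word) pairs come from some external parser or role labeler. The role names and data are hypothetical. The overlap measure used here (shared words over all words, per role) is an illustrative choice rather than the planned metric. As noted above, any such measure is only as reliable as the parse of the MT output.

```python
from collections import Counter, defaultdict

def role_overlap(hyp_items, ref_items):
    """Per-role multiset overlap between hypothesis and reference.

    hyp_items, ref_items: lists of (role, word) pairs, assumed to come from an
    external parser or semantic role labeler. For each role, returns
    (words shared by both) / (words in either), counting repeats.
    """
    hyp_by_role, ref_by_role = defaultdict(Counter), defaultdict(Counter)
    for role, word in hyp_items:
        hyp_by_role[role][word] += 1
    for role, word in ref_items:
        ref_by_role[role][word] += 1
    scores = {}
    for role in set(hyp_by_role) | set(ref_by_role):
        h, r = hyp_by_role[role], ref_by_role[role]
        scores[role] = sum((h & r).values()) / sum((h | r).values())
    return scores

# Hypothetical parser output for a reference and a hypothesis.
ref = [("subject", "farmers"), ("verb", "elected"), ("object", "delegates")]
hyp = [("subject", "farmers"), ("verb", "chose"), ("object", "delegates")]
print(role_overlap(hyp, ref))
```

Scoring each role separately shows which function was mistranslated, here the verb, rather than folding everything into one number.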

