Monotrans: Human-Computer Collaborative Translation
Crowdsourcing Translation with People Who Speak Only One Language
Chang Hu, Ben Bederson, Philip Resnik
Human-Computer Interaction Lab; Computational Linguistics and Information Processing Lab; University of Maryland
Outline
Why translation by monolingual people?
How Monotrans works
Research prototype
Preliminary evaluation
Languages on the Internet by population (chart). Source: Global Reach, Internet World Stats.
A real-world problem: the International Children's Digital Library
Machine translation (MT): large volume, cheap, fast. But unreliable quality ( = restaurant, dining hall).
Professional translators: high quality, but slow and expensive (even for common language pairs).
Translation with the crowd: the bottleneck is bilingual people.
Translation with the crowd: on Wikipedia, 800 translators vs. 75,000 contributors. The alternative: translation with the monolingual crowd.
(Chart: quality vs. affordability, comparing machine translation, professional bilingual human participation, amateur bilingual human participation, and monolingual human participation.)
Basic idea
Source-language speaker: original source sentence → MT → inaccurate translation
Target-language speaker: edits it into a fluent translation → MT → inaccurate back translation
Source-language speaker: edits it into a fluent, accurate source sentence → MT → et cetera…
A (Richer) Example
Pierre (French speaker) says: "En général, on s'entend bien, tous les deux." (lit. In general, we get along together, the two of us.)
MT → Mary (English speaker) sees: "In general, it means well, both."
Mary edits into: "In general, it is about both of us."
MT → Pierre sees: "En général, Il est à la fois de nous." (*)
Pierre edits into: "En général, nous nous entendons bien." (lit. In general, we get along well.)
MT with enrichment (aligned phrases: In general = En général; get along = nous entendons) → Mary sees: "In general, we get along fine."
Mary edits into: "In general, we get along well."
MT → Pierre sees: "En général, nous nous entendons bien." (lit. In general, we get along well.)
Pierre proposes to stop with the current translation; Mary agrees.
Monotrans Protocol
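The alternating protocol can be sketched as a loop. This is a minimal Python sketch under stated assumptions, not the authors' implementation: `mt`, `source_editor`, `target_editor`, and `target_agrees` are hypothetical callables standing in for the MT system and the two monolingual participants, and the enrichment channel is left out. The usage example replays the Pierre and Mary exchange, with the MT outputs and edits hard-coded from that example.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces (assumptions, not the Monotrans API):
#   mt(text, direction) -> machine translation of text
#   source_editor(back_translation) -> (edited source sentence, proposes_stop)
#   target_editor(translation) -> edited target sentence
#   target_agrees(translation) -> True if the target speaker accepts stopping
MT = Callable[[str, str], str]
SourceEditor = Callable[[str], Tuple[str, bool]]
TargetEditor = Callable[[str], str]
Agreement = Callable[[str], bool]


def monotrans_loop(source: str, mt: MT, source_editor: SourceEditor,
                   target_editor: TargetEditor, target_agrees: Agreement,
                   max_rounds: int = 10) -> Tuple[str, List[Tuple[str, str]]]:
    """Alternate MT and monolingual editing until both sides agree to stop."""
    transcript: List[Tuple[str, str]] = []
    current_source = source
    translation = ""
    for _ in range(max_rounds):
        # Target-language speaker improves the fluency of the forward translation.
        translation = target_editor(mt(current_source, "src->tgt"))
        transcript.append(("target edit", translation))
        # Source-language speaker checks the meaning via the back translation.
        current_source, proposes_stop = source_editor(mt(translation, "tgt->src"))
        transcript.append(("source edit", current_source))
        # Stop only when the source speaker proposes it and the target speaker agrees.
        if proposes_stop and target_agrees(translation):
            break
    return translation, transcript


# Replay of the Pierre (French) / Mary (English) example, hard-coded from the slides.
MT_TABLE = {
    "En général, on s'entend bien, tous les deux.": "In general, it means well, both.",
    "In general, it is about both of us.": "En général, Il est à la fois de nous.",
    "En général, nous nous entendons bien.": "In general, we get along fine.",
    "In general, we get along well.": "En général, nous nous entendons bien.",
}
MARY_EDITS = {
    "In general, it means well, both.": "In general, it is about both of us.",
    "In general, we get along fine.": "In general, we get along well.",
}
PIERRE_EDITS = {
    "En général, Il est à la fois de nous.": ("En général, nous nous entendons bien.", False),
    "En général, nous nous entendons bien.": ("En général, nous nous entendons bien.", True),
}

final, log = monotrans_loop(
    "En général, on s'entend bien, tous les deux.",
    mt=lambda text, direction: MT_TABLE[text],
    source_editor=lambda text: PIERRE_EDITS[text],
    target_editor=lambda text: MARY_EDITS[text],
    target_agrees=lambda text: True,
)
print(final)  # In general, we get along well.
```

Requiring a proposal from the source side plus agreement from the target side mirrors the example's ending, where Pierre proposes to stop and Mary agrees; `max_rounds` is an added safeguard so the sketch always terminates.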
Research prototype (screenshot). Annotation controls: Web link, Image, Mark OK, Mark unclear.
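The four prototype controls can be modelled as annotations attached to spans of a sentence. A minimal sketch, assuming a character-offset span representation; `AnnotationKind`, `Annotation`, and `unclear_spans` are hypothetical names, not taken from the prototype.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class AnnotationKind(Enum):
    # One member per control shown in the prototype UI.
    WEB_LINK = "web link"
    IMAGE = "image"
    MARK_OK = "mark OK"
    MARK_UNCLEAR = "mark unclear"


@dataclass
class Annotation:
    kind: AnnotationKind
    start: int         # character offset where the span begins (inclusive)
    end: int           # character offset where the span ends (exclusive)
    payload: str = ""  # e.g. a URL or image reference; empty for OK/unclear marks


def unclear_spans(sentence: str, annotations: List[Annotation]) -> List[str]:
    """Return the text of every span the reader marked as unclear."""
    return [sentence[a.start:a.end] for a in annotations
            if a.kind is AnnotationKind.MARK_UNCLEAR]


sentence = "Everybody has heard the business by Cinderella"
notes = [
    Annotation(AnnotationKind.MARK_OK, 0, 9),        # "Everybody"
    Annotation(AnnotationKind.MARK_UNCLEAR, 24, 32),  # "business"
]
print(unclear_spans(sentence, notes))  # ['business']
```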
Preliminary evaluation: older version of the UI (same protocol); children's book, Russian to Chinese; 2 Russian speakers and 4 Chinese speakers formed 4 pairs*; 1 hour per pair.
Results: 44 sentences (6 pages) worked on; 28 sentences (4 pages) finished. Overall translation speed: 50 words per hour, vs. 250 words per hour for a professional translator.
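The reported figures imply two simple ratios. This arithmetic uses only the numbers on the results slide.

```python
# Figures reported on the results slide.
sentences_worked_on = 44
sentences_finished = 28
monotrans_words_per_hour = 50
professional_words_per_hour = 250

completion_rate = sentences_finished / sentences_worked_on
speed_gap = professional_words_per_hour / monotrans_words_per_hour

print(f"Finished {completion_rate:.0%} of the sentences worked on")  # 64%
print(f"A professional translator is {speed_gap:.0f}x faster")      # 5x
```

The slide does not say whether 50 words per hour is a per-pair or an aggregate rate, so the 5x gap should be read with that in mind.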
Evaluation
Google Translate …
… Monotrans
Where to from here?
Larger and more formal validation of the protocol
Richer annotations: images, web links, marking correct spans, marking incorrect spans, paraphrase, word clouds, …?
Large-scale crowd support (CrowdFlow)
Take-away message
Monolingual translation can help large-scale translation.
Translation with monolingual people is actually feasible.
Sponsors
Thank You Q&A
Backup slides
Projected annotation: project information from one language to another, using word alignments as a bridge. (Illustration of how this has been done for natural language annotation [Kolak 2005].)
Projected annotation example
Source: "Tout le monde doit entendre l'histoire de Cendrillon"
MT: "Everybody has heard the business by Cinderella"
With projected annotations: "Everybody has heard the story about Cinderella"
=> Pilot experiment results: projected annotations helped improve translation.
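Projecting annotations through word alignments can be sketched as copying each source-token annotation onto every target token aligned with it. A minimal sketch, assuming whitespace tokenization and a hand-written, hypothetical alignment for the Cinderella sentence (a real system would take alignments from the MT system or a word aligner); the annotation text is also hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def project_annotations(src_annotations: Dict[int, List[str]],
                        alignment: Iterable[Tuple[int, int]]) -> Dict[int, List[str]]:
    """Copy annotations from source token indices to aligned target token indices."""
    projected: Dict[int, List[str]] = defaultdict(list)
    for src_idx, tgt_idx in alignment:
        for note in src_annotations.get(src_idx, []):
            if note not in projected[tgt_idx]:  # skip duplicates from many-to-one links
                projected[tgt_idx].append(note)
    return dict(projected)


source = "Tout le monde doit entendre l'histoire de Cendrillon".split()
target = "Everybody has heard the business by Cinderella".split()

# Hypothetical alignment as (source index, target index) pairs.
alignment = [(0, 0), (1, 0), (2, 0), (3, 1), (4, 2), (5, 3), (5, 4), (6, 5), (7, 6)]

# Hypothetical annotation the French speaker attaches to "l'histoire" (index 5).
src_notes = {5: ["image: an open storybook"]}

for idx, notes in sorted(project_annotations(src_notes, alignment).items()):
    print(target[idx], notes)
```

Here the image attached to "l'histoire" lands on "the business", which gives the English reader a cue that "business" should be "story".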
One of my examples involves rmvng ll th vwls frm th wrds nd shwng tht th rdr cn stll ndrstnd th sntnc.
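The vowel-removal demonstration is easy to reproduce. A minimal sketch, assuming only a, e, i, o, u count as vowels (y is left alone); spaces and punctuation are kept so word boundaries survive.

```python
VOWELS = frozenset("aeiouAEIOU")


def strip_vowels(text: str) -> str:
    """Drop every vowel character, keeping consonants, spaces, and punctuation."""
    return "".join(ch for ch in text if ch not in VOWELS)


print(strip_vowels("removing all the vowels from the words and showing "
                   "that the reader can still understand the sentence."))
# rmvng ll th vwls frm th wrds nd shwng tht th rdr cn stll ndrstnd th sntnc.
```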
Three Types of Errors: I. Detectable and correctable error
Source: "Tout le monde doit entendre l'histoire de Cendrillon."
MT: "Everybody has hear story about Cinderella" → edited by a monolingual target-language speaker into "Everybody has heard the story about Cinderella"
Pilot experiment results: post-editing machine translation output by monolingual people improves translation quality.
Three Types of Errors: II. Detectable but not correctable error
Source: "Tout le monde doit entendre l'histoire de Cendrillon."
MT: "Everybody has heard the business by Cinderella" (the target-language speaker can tell "business" is wrong but cannot recover the intended word alone)
Communication needed.
Pilot experiment results: communication through the enrichment channel can improve translation.
Three Types of Errors: III. Undetectable error
Source: "Tout le monde doit entendre l'histoire de Cendrillon."
MT: "Everybody loves the story about Cinderella" (fluent, so the target-language speaker has no cue that the meaning is wrong)
Need more redundancy: add more redundancy to reduce it to type I or type II.
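The three error types amount to two yes/no questions: can a monolingual reader detect the error, and can they correct it alone? A minimal sketch of that classification; `ErrorType`, `classify`, and the next-step strings are hypothetical paraphrases of the slides.

```python
from enum import Enum


class ErrorType(Enum):
    DETECTABLE_CORRECTABLE = "I"       # reader spots and fixes it alone
    DETECTABLE_NOT_CORRECTABLE = "II"  # reader spots it but needs the other side
    UNDETECTABLE = "III"               # fluent but wrong; no cue to the reader


NEXT_STEP = {
    ErrorType.DETECTABLE_CORRECTABLE: "post-edit the MT output",
    ErrorType.DETECTABLE_NOT_CORRECTABLE: "communicate through the enrichment channel",
    ErrorType.UNDETECTABLE: "add redundancy to turn it into type I or type II",
}


def classify(detectable: bool, correctable: bool) -> ErrorType:
    """Map the two yes/no judgements onto the three error types."""
    if not detectable:
        return ErrorType.UNDETECTABLE
    if correctable:
        return ErrorType.DETECTABLE_CORRECTABLE
    return ErrorType.DETECTABLE_NOT_CORRECTABLE


# "Everybody has heard the business by Cinderella": visibly odd, not fixable alone.
kind = classify(detectable=True, correctable=False)
print(kind.value, "->", NEXT_STEP[kind])
```

Checking detectability first reflects the slides' point that an undetectable error gives the reader nothing to act on, whether or not it would be easy to fix.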
Prototype evaluation: the system seems promising. (Chart: ratings of intelligibility, 1 = unintelligible to 4 = very intelligible, and of meaning, 1 = not translated to 5 = full meaning.)
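The two rating scales have different ranges (1 to 4 for intelligibility, 1 to 5 for meaning), so comparing them side by side is easier after rescaling each to 0 to 1. A minimal sketch of that min-max rescaling; the scale names and the `rescale` helper are hypothetical, and no ratings from the study are reproduced here.

```python
from typing import Dict, Tuple

# (lowest, highest) score of each scale on the evaluation slide.
SCALES: Dict[str, Tuple[int, int]] = {
    "intelligibility": (1, 4),  # 1 = unintelligible, 4 = very intelligible
    "meaning": (1, 5),          # 1 = not translated, 5 = full meaning
}


def rescale(score: float, scale: str) -> float:
    """Min-max rescale a rating to [0, 1] so the two scales are comparable."""
    low, high = SCALES[scale]
    if not low <= score <= high:
        raise ValueError(f"{score} is outside the {scale} range {low}-{high}")
    return (score - low) / (high - low)


print(rescale(4, "intelligibility"), rescale(3, "meaning"))  # 1.0 0.5
```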