Knowledge of provenance: How does it affect TM/MT integration?
Carlos S. C. Teixeira
Universitat Rovira i Virgili
New Research in Translation and Interpreting Studies (Tarragona – May 2011)
CAT: Computer-aided translation
◦ Text editors (e.g. Word)
◦ E-dictionaries
◦ Online references
◦ Translation-memory (TM) systems
◦ Machine translation (MT)
Carlos S. C. Teixeira © 2011 Universitat Rovira i Virgili
Christensen, Tina Paulsen & Anne Schjoldager. “Translation-Memory (TM) Research: What Do We Know and How Do We Know It?” Hermes – Journal of Language and Communication Studies.
O’Brien, Sharon. “Eye-tracking and translation memory matches.” Perspectives: Studies in Translatology 14, n. 3.
Guerberof, Ana. “Productivity and quality in the post-editing of outputs from translation memories and machine translation.” Localisation Focus - The International Journal of Localisation 7, n. 1.
◦ Source text
◦ Translation memory (TM)
◦ Machine translation (MT)
◦ Tools (TM+MT)
◦ Translator
Two environments:
1) ‘Regular’ translation
2) Pretranslation + postediting
Let’s have a look ⇨
Speed: Will you translate faster?
Effort: Will you feel more tired?
Quality: Will you translate better?
Reason: Does provenance play a role?
(V = Visual, B = Blind)
Speed: V is faster than B
H1: The translation speed is higher in V than in B
Effort: V requires less editing than B
H2: The amount of editing is smaller in V than in B
Quality: V and B produce similar quality
H4: There is no significant difference in quality between V and B
Provenance (inversely) affects effort
H3: The indication of provenance contributes to the smaller amount of editing in V
English text / Spanish text: Translation Memory (Alignment)
Source text 1, Source text 2
Match types: Exact matches, 90-99% fuzzy, 80-89% fuzzy, 70-79% fuzzy, No matches (MT)
TM 1, TM 2
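The match grid above can be read as a simple banding of TM similarity scores. The following is only an illustrative sketch: the function name `match_band` is hypothetical, and the assumption that anything below 70% counts as "no match" (and is therefore fed with MT) is inferred from the grid, not stated on the slide.

```python
def match_band(similarity: int) -> str:
    """Map a TM similarity score (0-100) to one of the bands in the grid.

    Assumption: scores below 70 are treated as 'no match' and receive
    an MT suggestion instead of a TM suggestion.
    """
    if not 0 <= similarity <= 100:
        raise ValueError("similarity must be between 0 and 100")
    if similarity == 100:
        return "Exact match"
    if similarity >= 90:
        return "90-99% fuzzy"
    if similarity >= 80:
        return "80-89% fuzzy"
    if similarity >= 70:
        return "70-79% fuzzy"
    return "No match (MT)"


if __name__ == "__main__":
    # Scores taken from the segment labels in the data-treatment sheet.
    for score in (100, 95, 86, 75):
        print(score, "->", match_band(score))
```

With this banding, a segment labelled "FUZZY 86%" falls in the 80-89% band and one labelled "FUZZY 75%" in the 70-79% band.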
◦ Same type of text ◦ Same types of matches ◦ Same machine-translation engine So what is different? Provenance information Carlos S. C. Teixeira © 2011 Universitat Rovira i Virgili
Data treatment
SEGMENT | MATCH | 1st RENDERING | TYPING | NOTES | 2nd RENDERING | TYPING | NOTES
1 FUZZY 75% 00:00,00 00:40,33 40, :38,44 18:43,89 05,450 00,00
2 FUZZY 86% 00:40,56 01:37,78 57, :44,56 18:46,22 01,660 00,00
3 NO MATCH 02:30,33 04:31,67 121,3417 Asks a question to researcher 18:46,89 19:19,33 32, ,00
4 NO MATCH 04:31,67 04:38,22 06,55 19:20,00 19:28,22 08,220 05:05,78 05:09,56 03,78 00,00 06:28,56 06:51,44 22,88 00,00 11:51,33 13:06,33 75, ,00 14:04,33 14:35,11 30,7834
5 NO MATCH 14:35,67 15:57,22 81, :28,78 19:43,67 14,898 00,00
6 NO MATCH 15:57,89 17:17,89 80, :44,33 19:59,89 15,560 00,00
7 FUZZY 87% 17:18,56 19:14,44 115, :00,44 20:16,56 16,125 00,00
8 EXACT 19:14,44 20:49,11 94, :17,22 20:34,33 17,119 22:47,22 23:03,78 16,561 00,00
9 NO MATCH 23:04,33 23:24,44 20,11- 20:35,00 20:59,67 42, :08,44 26:52,00 43, ,00 27:53,56 28:15,22 21, ,00
10 FUZZY 95% 28:16,00 30:35,11 139, :00,44 21:11,67 11,230 31:03,56 31:39,78 36,220 00,00 31:51,33 32:48,67 57, ,00 33:23,67 34:28,00 64, ,00
11 FUZZY 99% 34:28,56 34:51,67 23, :12,22 21:13,33 01,110 00,00
12 FUZZY 74% 34:52,33 35:14,56 22,230 21:14,11 21:25,11 11,000 35:41,11 37:17,89 96, ,00 37:46,44 38:04,33 17,892 00,00 43:11,56 43:19,56 08,000 00,00 55:10,67 55:51,89 41,
13 EXACT 55:52,44 57:12,22 79, :25,78 21:39,56 13,780 00,00
14 EXACT 57:12,78 57:23,22 10,443 Researcher interrupts subject to tell he has to leave the room for a while 21:40,00 21:44,33 04,330 57:35,89 58:25,22 49, ,00 59:10,22 59:31,89 21,67 00,00
15 NO MATCH 59:32,56 00:44,89 72, :44,89 21:58,78 13,890 00,00
16 EXACT 00:45,56 01:27,44 41,88 21:59,44 22:04,11 04,670 02:28,78 02:48,22 19,44 00,00 05:22,44 05:33,56 11,12 00,00 05:37,78 06:25,33 47,55 00,00 06:54,00 07:17,89 23,89 00,00 09:11,78 09:31,00 19,22 00,00 11:05,67 11:24,22 18, ,00 12:07,89 12:28,11 20,22 00,00 13:15,67 13:32,11 16,44 00,00 13:44,22 14:24,11 39, ,00 258,20 00,00
17 FUZZY 86% 14:24,67 15:01,44 36, :04,67 22:17,11 12,440 00,00
18 NO MATCH 15:02,00 15:14,00 12, :17,67 22:18,89 01,220 00,00
19 FUZZY 93% 15:14,67 15:35,22 20, :19,44 23:00,44 41,0023 Check sound here! 15:56,00 16:24,00 28, ,00
20 FUZZY 72% 16:24,56 17:11,56 47, :01,11 23:04,33 03,220 00,00
21 EXACT 17:12,22 17:47,11 34,899 23:04,89 23:14,44 09,550 00,00
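The timing sheet records each working interval as a start and end stamp in mm:ss,cc form, and a segment may be visited several times. A rough sketch of how such stamps could be turned into per-segment times follows. The helper names are hypothetical, and the rollover rule (an end stamp smaller than the start stamp means the recording passed the one-hour mark) is an assumption based on rows such as 59:32,56 to 00:44,89.

```python
def to_seconds(stamp: str) -> float:
    """Convert an 'mm:ss,cc' stamp (decimal comma) to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds.replace(",", "."))


def interval_duration(start: str, end: str) -> float:
    """Duration of one interval in seconds, rounded to centiseconds."""
    s, e = to_seconds(start), to_seconds(end)
    if e < s:
        # Assumption: the stamps carry no hour field, so a smaller end
        # value means the clock rolled over past 60 minutes.
        e += 3600
    return round(e - s, 2)


def segment_time(intervals: list[tuple[str, str]]) -> float:
    """Total time spent on a segment across all its intervals."""
    return round(sum(interval_duration(a, b) for a, b in intervals), 2)


if __name__ == "__main__":
    # First interval of segment 1 in the sheet.
    print(interval_duration("00:00,00", "00:40,33"))
    # Segment 15 crosses the hour mark.
    print(interval_duration("59:32,56", "00:44,89"))
    # Three of the intervals recorded for segment 4.
    print(segment_time([("04:31,67", "04:38,22"),
                        ("05:05,78", "05:09,56"),
                        ("06:28,56", "06:51,44")]))
```

Summing intervals per segment, rather than subtracting the first start from the last end, keeps pauses and time spent on other segments out of the total.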
Data treatment
SOURCE WORDS | TIME (sec) 1st rendition | SPEED (words/h) 1st rendition | TIME (sec) Proofreading | SPEED (words/h) Combined | TARGET CHARS | TYPED CHARS 1st rendition | AMOUNT OF EDITING 1st rendition | TYPED CHARS 2nd rend | AMOUNT OF EDITING Combined
TRANSLATION BLIND (Text12)
EXACT (100%) MATCHES
SEGMENT #1 30111, , ,95%984,87%
SEGMENT #2 3079, , ,09%0
SEGMENT #3 2581, , ,33%0
SEGMENT #4 18258,22514, ,49%0
SEGMENT #5 2534, , ,77%0
TOTAL 128565, , ,13%939,27%
90-99% MATCHES
SEGMENT # , ,52%0
SEGMENT #2 723, , ,56%0
SEGMENT #3 2048, ,03%2382,35%
TOTAL 65368, , ,66%2349,57%
80-89% MATCHES
SEGMENT #1 2757, , ,11%0
SEGMENT #2 24115, , ,13%566,25%
SEGMENT #3 2636, , ,19%0
TOTAL 77209, , ,70%539,90%
70-79% MATCHES
SEGMENT #1 1640, , ,19%0
SEGMENT #2 44186, ,19%0
SEGMENT # , ,49%0
TOTAL 77273, , ,06%0
NO MATCHES (MT FEEDS)
SEGMENT #1 31121, , ,76%1916,44%
SEGMENT #2 30138,997778, ,02%0
SEGMENT #3 2681, , ,38%819,61%
SEGMENT # , ,52%0
SEGMENT #5 1585, , ,11%26109,47%
SEGMENT #6 2972, , ,57%0
SEGMENT # , ,33%0
TOTAL 165591, , ,62%5333,58%
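The two derived columns in this sheet, SPEED (words/h) and AMOUNT OF EDITING, can be sketched as below. This is a hedged reading of the column headers, not a confirmed formula: speed is assumed to be source words scaled to one hour, and amount of editing is assumed to be typed characters as a percentage of target characters (which would explain values above 100%). The function names and the example figures are hypothetical.

```python
def words_per_hour(source_words: int, seconds: float) -> float:
    """Translation speed in source words per hour."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return source_words * 3600 / seconds


def amount_of_editing(typed_chars: int, target_chars: int) -> float:
    """Typed characters as a percentage of the final target characters.

    Assumption: this ratio is what the AMOUNT OF EDITING column reports.
    """
    if target_chars <= 0:
        raise ValueError("target_chars must be positive")
    return 100 * typed_chars / target_chars


if __name__ == "__main__":
    # Made-up figures for illustration only, not taken from the study.
    print(words_per_hour(30, 120.0))    # 30 words in 2 minutes
    print(amount_of_editing(50, 200))   # 50 typed chars, 200-char target
```

Under these assumptions, 30 words in 120 seconds gives 900 words/h, and 50 typed characters against a 200-character target gives 25% editing.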
Preliminary results
SOURCE WORDS | TIME (sec) 1st rendition | SPEED (words/h) 1st rendition | TIME (sec) 2nd rendition | SPEED (words/h) Combined | TARGET CHARS | TYPED CHARS 1st rendition | AMOUNT OF EDITING 1st rendition | TYPED CHARS 2nd rend | AMOUNT OF EDITING Combined
COPY ,892337, ,18%
TRANSL W/O CAT 79380,89746, ,78%
VISUAL
EXACT (100%) MATCHES
90-99% MATCHES
80-89% MATCHES
70-79% MATCHES
NO MATCHES (MT FEEDS)
BLIND
EXACT (100%) MATCHES ,13%939,27%
90-99% MATCHES ,66%2349,57%
80-89% MATCHES ,70%539,90%
70-79% MATCHES ,06%0
NO MATCHES (MT FEEDS) ,62%5333,58%
,71%
Preliminary results: Translation speed (words/hour)
Conclusions
?Generalisations?
◦ Specific type of text
◦ Particular subject
◦ Given fuzzy match grid
◦ A particular MT engine
◦ Editing
◦ Quality
◦ Interviews
◦ Statistical analysis
◦ Eye-tracking? Translog? Camtasia?
◦ More subjects
Thank you!