Carlos S. C. Teixeira Intercultural Studies Group Universitat Rovira i Virgili (Tarragona, Spain) Knowledge of provenance and its effects on translation performance (in an integrated TM/MT environment) NLPCS th International Workshop on Natural Language Processing and Cognitive Science Special Issue: Human-Machine Interaction in Translation August, Copenhagen, Denmark
Carlos S. C. Teixeira © 2011 Universitat Rovira i Virgili
Speed: Will you translate faster?
Effort: Will you feel more tired?
Quality: Will you translate better?
Reason: Does provenance play a role?
(V = visual environment, B = blind environment)
Speed: V is faster than B
H1: The translation speed is higher in V than in B
Effort: V requires less editing than B
H2: The amount of editing is smaller in V than in B
Quality: V and B produce similar quality
H4: There is no significant difference in quality between V and B
English text, Spanish text: Translation Memory (Alignment)
Source text 1, Source text 2
Match types: Exact matches; 90-99% fuzzy; 80-89% fuzzy; 70-79% fuzzy; No matches (MT)
TM 1, TM 2
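The match grid above can be sketched as a small lookup. This is only an illustration: the function name `match_band` is hypothetical, and treating every score below 70% as "no match" (and therefore fed by MT) is an assumption read off the bands listed on the slide.

```python
def match_band(score):
    """Return the match band for a TM similarity score (0-100).

    Assumption: scores below 70, or a missing score, count as
    "no match" and are served by machine translation instead.
    """
    if score is None or score < 70:
        return "No match (MT)"
    if score == 100:
        return "Exact match"
    if score >= 90:
        return "90-99% fuzzy"
    if score >= 80:
        return "80-89% fuzzy"
    return "70-79% fuzzy"


for s in (100, 95, 86, 75, None):
    print(s, "->", match_band(s))
```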
◦ Same type of text
◦ Same types of matches
◦ Same machine-translation engine
(ecological validity)
So what is different? Provenance information
BBFlashBack
◦ Screen activity
◦ Keystrokes
◦ Mouse movements and clicks
◦ Translator’s face
◦ Sound (voices, keyboard, etc.)
Retrospective interviews
Quality assessment
Data treatment: per-segment timing log (21 segments)
Columns: segment number and match type; 1st RENDERING (start, end, duration in seconds), TYPING, NOTES; 2nd RENDERING (start, end, duration in seconds), TYPING, NOTES. Times are written as mm:ss,cc with a decimal comma.
Match type per segment: 1 FUZZY 75%; 2 FUZZY 86%; 3 NO MATCH; 4 NO MATCH; 5 NO MATCH; 6 NO MATCH; 7 FUZZY 87%; 8 EXACT; 9 NO MATCH; 10 FUZZY 95%; 11 FUZZY 99%; 12 FUZZY 74%; 13 EXACT; 14 EXACT; 15 NO MATCH; 16 EXACT; 17 FUZZY 86%; 18 NO MATCH; 19 FUZZY 93%; 20 FUZZY 72%; 21 EXACT
Notes: segment 3, "Asks a question to researcher"; segment 14, "Researcher interrupts subject to tell he has to leave the room for a while"; segment 19, "Check sound here!"
[Individual time and typing values are not legible in this copy of the slide.]
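A minimal sketch of how durations could be derived from the start and end stamps in the log, assuming the stamps are in mm:ss,cc with a decimal comma and that the minute counter wraps at most once past 60 minutes between start and end. The helper names `parse_stamp` and `duration` are hypothetical; the two sample pairs are stamps that appear in the log.

```python
def parse_stamp(stamp):
    """Convert an 'mm:ss,cc' stamp (decimal comma) to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds.replace(",", "."))


def duration(start, end):
    """Seconds between two stamps.

    Assumption: if the end stamp is smaller than the start stamp,
    the recording clock rolled over the hour exactly once.
    """
    delta = parse_stamp(end) - parse_stamp(start)
    if delta < 0:
        delta += 3600
    return round(delta, 2)


print(duration("00:00,00", "00:40,33"))  # 40.33
print(duration("59:32,56", "00:44,89"))  # 72.33
```

Keeping the wrap handling in one place avoids negative durations for segments that straddle the hour mark.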
Data treatment: TRANSLATION BLIND (Text12)
Columns: SOURCE WORDS; TIME (sec), 1st rendition; SPEED (words/h), 1st rendition; TIME (sec), proofreading; SPEED (words/h), combined; TARGET CHARS; TYPED CHARS, 1st rendition; AMOUNT OF EDITING, 1st rendition; TYPED CHARS, 2nd rendition; AMOUNT OF EDITING, combined
[Only the first two columns are legible in this copy; times are shown in whole seconds and "…" marks a lost value.]

                          SOURCE WORDS   TIME (sec), 1st rendition
EXACT (100%) MATCHES
  SEGMENT #1              30             111
  SEGMENT #2              30             79
  SEGMENT #3              25             81
  SEGMENT #4              18             258
  SEGMENT #5              25             34
  TOTAL                   128            565
90-99% MATCHES
  SEGMENT #1              …              …
  SEGMENT #2              7              23
  SEGMENT #3              20             48
  TOTAL                   65             368
80-89% MATCHES
  SEGMENT #1              27             57
  SEGMENT #2              24             115
  SEGMENT #3              26             36
  TOTAL                   77             209
70-79% MATCHES
  SEGMENT #1              16             40
  SEGMENT #2              44             186
  SEGMENT #3              …              …
  TOTAL                   77             273
NO MATCHES (MT FEEDS)
  SEGMENT #1              31             121
  SEGMENT #2              30             138
  SEGMENT #3              26             81
  SEGMENT #4              …              …
  SEGMENT #5              15             85
  SEGMENT #6              29             72
  SEGMENT #7              …              …
  TOTAL                   165            591
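The derived columns can be sketched as follows. Speed as source words per hour matches the figures in the table (30 source words in 138,99 s works out at about 777 words/h). Two things here are assumptions rather than the author's stated method: that the amount of editing is typed characters divided by target characters, and that category totals pool words and seconds before dividing. All function names are hypothetical.

```python
def speed_words_per_hour(source_words, seconds):
    """Translation speed: source words per hour."""
    return source_words / seconds * 3600


def editing_ratio(typed_chars, target_chars):
    """Assumed definition: typed characters as a share of the target length."""
    return typed_chars / target_chars


def pooled_speed(segments):
    """Speed for a group of (source_words, seconds) pairs.

    Pools words and time first, so long segments weigh more than
    short ones (unlike a plain mean of per-segment speeds).
    """
    total_words = sum(words for words, _ in segments)
    total_seconds = sum(seconds for _, seconds in segments)
    return speed_words_per_hour(total_words, total_seconds)


print(round(speed_words_per_hour(30, 138.99)))  # 777
```

Pooling is one reasonable choice for totals; averaging per-segment speeds would give a different number when segment lengths vary.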
Preliminary results
Columns: SOURCE WORDS; TIME (sec), 1st rendition; SPEED (words/h), 1st rendition; TIME (sec), 2nd rendition; SPEED (words/h), combined; TARGET CHARS; TYPED CHARS, 1st rendition; AMOUNT OF EDITING, 1st rendition; TYPED CHARS, 2nd rendition; AMOUNT OF EDITING, combined
Rows: COPY; TRANSL W/O CAT; VISUAL (EXACT (100%) MATCHES, 90-99% MATCHES, 80-89% MATCHES, 70-79% MATCHES, NO MATCHES (MT FEEDS)); BLIND (EXACT (100%) MATCHES, 90-99% MATCHES, 80-89% MATCHES, 70-79% MATCHES, NO MATCHES (MT FEEDS))
Legible values: TRANSL W/O CAT, 79 source words in 380,89 sec (1st rendition)
[The remaining cell values are not legible in this copy of the slide.]
Preliminary results
Subject 1: Translation speed (words/hour)
Preliminary results
Subject 2: Translation speed (words/hour)
Preliminary results
Quality
Preliminary results
Conclusions:
Testing of the first hypothesis (speed) is inconclusive if we take the whole texts as a reference: Subject 1 was slightly faster (5.2 percent) in environment V, while Subject 2 was slightly faster (5.6 percent) in environment B.
Overall speed depends on the distribution of the different types of translation suggestions in the texts, as well as on differences between individual translators.
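For reference, a "percent faster" figure of this kind could be computed as below. The slides do not say which environment served as the baseline, so dividing by the slower environment's speed is an assumption, and the speeds in the example are hypothetical values chosen only to reproduce a 5.2 percent gap.

```python
def percent_faster(speed_a, speed_b):
    """How much faster speed_a is than speed_b, as a percentage of speed_b."""
    return (speed_a - speed_b) / speed_b * 100


# Hypothetical whole-text speeds in words/hour (not taken from the study).
speed_v = 1052
speed_b = 1000
print(round(percent_faster(speed_v, speed_b), 1))  # 5.2
```

The choice of baseline matters: the same pair of speeds gives a slightly different percentage when the faster environment is used as the denominator.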
Small number of subjects
Small number of segments
Irregular segments
Terminology
Segment identification
Experience increases over time
Subject variability
Quality assessment?
Conclusions? Generalisations?
Specific type of text
Particular subject
Given fuzzy match grid
A particular MT engine
Quality assessments
Retrospective interviews
Statistical analysis
MT trust scores?
Eye-tracking?
Translog?
Implications/Applications of findings
Thank you! Carlos S. C. Teixeira Intercultural Studies Group Universitat Rovira i Virgili (Tarragona, Spain)