
1 Carlos S. C. Teixeira Intercultural Studies Group Universitat Rovira i Virgili (Tarragona, Spain) carlostx@linguanativa.com.br Knowledge of provenance and its effects on translation performance (in an integrated TM/MT environment) NLPCS 2011 8th International Workshop on Natural Language Processing and Cognitive Science Special Issue: Human-Machine Interaction in Translation 20-21 August, 2011 - Copenhagen, Denmark

2 Carlos S. C. Teixeira © 2011 Universitat Rovira i Virgili


4
• Speed: Will you translate faster?
• Effort: Will you feel more tired?
• Quality: Will you translate better?
Reason: Does provenance play a role?

5
• Speed: V (Visual) is faster than B (Blind)
  H1: The translation speed is higher in V than in B
• Effort: V requires less editing than B
  H2: The amount of editing is smaller in V than in B
• Quality: V and B produce similar quality
  H4: There is no significant difference in quality between V and B

6 [Diagram labels: English text, Spanish text, Translation Memory (Alignment); Source text 1, Source text 2; Exact matches, 90-99% fuzzy, 80-89% fuzzy, 70-79% fuzzy, No matches (MT); TM 1, TM 2]

7
◦ Same type of text
◦ Same types of matches
◦ Same machine-translation engine
(ecological validity)
So what is different?
→ Provenance information

8

9
• BBFlashBack
  ◦ Screen activity
  ◦ Keystrokes
  ◦ Mouse movements and clicks
  ◦ Translator’s face
  ◦ Sound (voices, keyboard, etc.)
• Retrospective interviews
• Quality assessment

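10 Data treatment
SEG | MATCH | 1st RENDERING start | 1st RENDERING end | 1st RENDERING time (s) | TYPING | NOTES | 2nd RENDERING start | 2nd RENDERING end | 2nd RENDERING time (s) | TYPING | NOTES
1 | FUZZY 75% | 00:00,00 | 00:40,33 | 40,33 | 11 | | 18:38,44 | 18:43,89 | 05,45 | 0 |
2 | FUZZY 86% | 00:40,56 | 01:37,78 | 57,22 | 39 | | 18:44,56 | 18:46,22 | 01,66 | 0 |
3 | NO MATCH | 02:30,33 | 04:31,67 | 121,34 | 17 | Asks a question to researcher | 18:46,89 | 19:19,33 | 32,44 | 19 |
4 | NO MATCH | 04:31,67 | 04:38,22 | 06,55 | | | 19:20,00 | 19:28,22 | 08,22 | 0 |
 | | 05:05,78 | 05:09,56 | 03,78 | | | | | | |
 | | 06:28,56 | 06:51,44 | 22,88 | | | | | | |
 | | 11:51,33 | 13:06,33 | 75,00 | 60 | | | | | |
 | | 14:04,33 | 14:35,11 | 30,78 | 34 | | | | | |
5 | NO MATCH | 14:35,67 | 15:57,22 | 81,55 | 22 | | 19:28,78 | 19:43,67 | 14,89 | 8 |
6 | NO MATCH | 15:57,89 | 17:17,89 | 80,00 | 48 | | 19:44,33 | 19:59,89 | 15,56 | 0 |
7 | FUZZY 87% | 17:18,56 | 19:14,44 | 115,88 | 101 | | 20:00,44 | 20:16,56 | 16,12 | 5 |
8 | EXACT | 19:14,44 | 20:49,11 | 94,67 | 119 | | 20:17,22 | 20:34,33 | 17,11 | 9 |
 | | 22:47,22 | 23:03,78 | 16,56 | 1 | | | | | |
9 | NO MATCH | 23:04,33 | 23:24,44 | 20,11 | - | | 20:35,00 | 20:59,67 | 42,45 | 26 |
 | | 26:08,44 | 26:52,00 | 43,56 | 68 | | | | | |
 | | 27:53,56 | 28:15,22 | 21,66 | 10 | | | | | |
10 | FUZZY 95% | 28:16,00 | 30:35,11 | 139,11 | 26 | | 21:00,44 | 21:11,67 | 11,23 | 0 |
 | | 31:03,56 | 31:39,78 | 36,22 | 0 | | | | | |
 | | 31:51,33 | 32:48,67 | 57,34 | 20 | | | | | |
 | | 33:23,67 | 34:28,00 | 64,33 | 48 | | | | | |
11 | FUZZY 99% | 34:28,56 | 34:51,67 | 23,11 | 40 | | 21:12,22 | 21:13,33 | 01,11 | 0 |
12 | FUZZY 74% | 34:52,33 | 35:14,56 | 22,23 | 0 | | 21:14,11 | 21:25,11 | 11,00 | 0 |
 | | 35:41,11 | 37:17,89 | 96,78 | 90 | | | | | |
 | | 37:46,44 | 38:04,33 | 17,89 | 2 | | | | | |
 | | 43:11,56 | 43:19,56 | 08,00 | 0 | | | | | |
 | | 55:10,67 | 55:51,89 | 41,22 | 38 | | | | | |
13 | EXACT | 55:52,44 | 57:12,22 | 79,78 | 50 | | 21:25,78 | 21:39,56 | 13,78 | 0 |
14 | EXACT | 57:12,78 | 57:23,22 | 10,44 | 3 | Researcher interrupts subject to tell he has to leave the room for a while | 21:40,00 | 21:44,33 | 04,33 | 0 |
 | | 57:35,89 | 58:25,22 | 49,33 | 35 | | | | | |
 | | 59:10,22 | 59:31,89 | 21,67 | | | | | | |
15 | NO MATCH | 59:32,56 | 00:44,89 | 72,33 | 36 | | 21:44,89 | 21:58,78 | 13,89 | 0 |
16 | EXACT | 00:45,56 | 01:27,44 | 41,88 | | | 21:59,44 | 22:04,11 | 04,67 | 0 |
 | | 02:28,78 | 02:48,22 | 19,44 | | | | | | |
 | | 05:22,44 | 05:33,56 | 11,12 | | | | | | |
 | | 05:37,78 | 06:25,33 | 47,55 | | | | | | |
 | | 06:54,00 | 07:17,89 | 23,89 | | | | | | |
 | | 09:11,78 | 09:31,00 | 19,22 | | | | | | |
 | | 11:05,67 | 11:24,22 | 18,55 | 10 | | | | | |
 | | 12:07,89 | 12:28,11 | 20,22 | | | | | | |
 | | 13:15,67 | 13:32,11 | 16,44 | | | | | | |
 | | 13:44,22 | 14:24,11 | 39,89 | 75 | | | | | |
 | (total) | | | 258,20 | | | | | | |
17 | FUZZY 86% | 14:24,67 | 15:01,44 | 36,77 | 21 | | 22:04,67 | 22:17,11 | 12,44 | 0 |
18 | NO MATCH | 15:02,00 | 15:14,00 | 12,00 | 11 | | 22:17,67 | 22:18,89 | 01,22 | 0 |
19 | FUZZY 93% | 15:14,67 | 15:35,22 | 20,55 | 31 | | 22:19,44 | 23:00,44 | 41,00 | 23 | Check sound here!
 | | 15:56,00 | 16:24,00 | 28,00 | 44 | | | | | |
20 | FUZZY 72% | 16:24,56 | 17:11,56 | 47,00 | 39 | | 23:01,11 | 23:04,33 | 03,22 | 0 |
21 | EXACT | 17:12,22 | 17:47,11 | 34,89 | 9 | | 23:04,89 | 23:14,44 | 09,55 | 0 |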
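The per-segment times in this log can be derived from the start and end stamps. The following is a minimal sketch, assuming the stamps are minutes:seconds with a comma as decimal separator and that the minute counter wraps after 59; the function names are hypothetical and not part of the original study tooling.

```python
def to_seconds(stamp: str) -> float:
    """Convert a 'mm:ss,cc' stamp (comma as decimal separator) to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds.replace(",", "."))


def duration(start: str, end: str) -> float:
    """Elapsed seconds between two stamps, allowing for a wrap past minute 59."""
    elapsed = to_seconds(end) - to_seconds(start)
    if elapsed < 0:  # e.g. 59:32,56 -> 00:44,89 crosses the hour mark
        elapsed += 3600
    return round(elapsed, 2)


def segment_time(intervals) -> float:
    """Total working time for a segment visited in several separate intervals."""
    return round(sum(duration(start, end) for start, end in intervals), 2)


# First-rendering intervals logged for segment 4 (NO MATCH).
segment_4 = [
    ("04:31,67", "04:38,22"),
    ("05:05,78", "05:09,56"),
    ("06:28,56", "06:51,44"),
    ("11:51,33", "13:06,33"),
    ("14:04,33", "14:35,11"),
]

print(segment_time(segment_4))           # 138.99
print(duration("59:32,56", "00:44,89"))  # 72.33 (segment 15, across the hour mark)
```

Summing the intervals rather than subtracting the first start from the last end keeps time spent on other segments out of the total.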

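11 Data treatment
TRANSLATION BLIND (Text12)
SEGMENT | SOURCE WORDS | TIME (sec) 1st rendition | SPEED (words/h) 1st rendition | TIME (sec) Proofreading | SPEED (words/h) Combined | TARGET CHARS | TYPED CHARS 1st rendition | AMOUNT OF EDITING 1st rendition | TYPED CHARS 2nd rend | AMOUNT OF EDITING Combined
EXACT (100%) MATCHES
SEGMENT #1 | 30 | 111,23 | 971 | 17,11 | 842 | 152 | 120 | 78,95% | 9 | 84,87%
SEGMENT #2 | 30 | 79,78 | 1354 | 13,78 | 1154 | 178 | 50 | 28,09% | 0 |
SEGMENT #3 | 25 | 81,44 | 1105 | 4,33 | 1049 | 150 | 38 | 25,33% | 0 |
SEGMENT #4 | 18 | 258,2 | 251 | 4,67 | 247 | 156 | 85 | 54,49% | 0 |
SEGMENT #5 | 25 | 34,89 | 2580 | 9,55 | 2025 | 156 | 9 | 5,77% | 0 |
TOTAL | 128 | 565,54 | 815 | 49,44 | 749 | 792 | 302 | 38,13% | 9 | 39,27%
90-99% MATCHES
SEGMENT #1 | 38 | 297 | 461 | 11,23 | 444 | 308 | 94 | 30,52% | 0 |
SEGMENT #2 | 7 | 23,11 | 1090 | 1,11 | 1040 | 41 | 40 | 97,56% | 0 |
SEGMENT #3 | 20 | 48,55 | 1483 | 41 | 804 | 119 | 75 | 63,03% | 23 | 82,35%
TOTAL | 65 | 368,66 | 635 | 53,34 | 555 | 468 | 209 | 44,66% | 23 | 49,57%
80-89% MATCHES
SEGMENT #1 | 27 | 57,22 | 1699 | 1,66 | 1651 | 108 | 39 | 36,11% | 0 |
SEGMENT #2 | 24 | 115,88 | 746 | 16,12 | 655 | 160 | 101 | 63,13% | 5 | 66,25%
SEGMENT #3 | 26 | 36,77 | 2546 | 12,44 | 1902 | 148 | 21 | 14,19% | 0 |
TOTAL | 77 | 209,87 | 1321 | 30,22 | 1155 | 416 | 161 | 38,70% | 5 | 39,90%
70-79% MATCHES
SEGMENT #1 | 16 | 40,33 | 1428 | 5,45 | 1258 | 108 | 11 | 10,19% | 0 |
SEGMENT #2 | 44 | 186,12 | 851 | 11 | 804 | 216 | 130 | 60,19% | 0 |
SEGMENT #3 | 17 | 47 | 1302 | 3,22 | 1219 | 94 | 39 | 41,49% | 0 |
TOTAL | 77 | 273,45 | 1014 | 19,67 | 946 | 418 | 180 | 43,06% | 0 |
NO MATCHES (MT FEEDS)
SEGMENT #1 | 31 | 121,34 | 920 | 32,44 | 726 | 219 | 17 | 7,76% | 19 | 16,44%
SEGMENT #2 | 30 | 138,99 | 777 | 8,22 | 734 | 162 | 94 | 58,02% | 0 |
SEGMENT #3 | 26 | 81,55 | 1148 | 14,89 | 971 | 153 | 22 | 14,38% | 8 | 19,61%
SEGMENT #4 | 28 | 80 | 1260 | 15,56 | 1055 | 223 | 48 | 21,52% | 0 |
SEGMENT #5 | 15 | 85,33 | 633 | 42,45 | 423 | 95 | 78 | 82,11% | 26 | 109,47%
SEGMENT #6 | 29 | 72,33 | 1443 | 13,89 | 1211 | 184 | 36 | 19,57% | 0 |
SEGMENT #7 | 6 | 12 | 1800 | 1,22 | 1634 | 33 | 11 | 33,33% | 0 |
TOTAL | 165 | 591,54 | 1004 | 128,67 | 825 | 1069 | 306 | 28,62% | 53 | 33,58%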
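The derived columns in this table appear to follow two simple ratios: speed as source words per hour, and amount of editing as typed characters over target characters. The sketch below is inferred from the figures rather than stated on the slide, and the function names are hypothetical. It is checked against exact-match segment #1.

```python
def words_per_hour(source_words: int, seconds: float) -> int:
    """Translation speed: source words processed per hour of working time."""
    return round(source_words / seconds * 3600)


def amount_of_editing(typed_chars: int, target_chars: int) -> float:
    """Typed characters as a percentage of the final target-text characters."""
    return round(typed_chars / target_chars * 100, 2)


# Exact-match segment #1 from the Blind translation.
words = 30
first_rendition_s, proofreading_s = 111.23, 17.11
target_chars, typed_first, typed_second = 152, 120, 9

print(words_per_hour(words, first_rendition_s))                      # 971
print(words_per_hour(words, first_rendition_s + proofreading_s))     # 842
print(amount_of_editing(typed_first, target_chars))                  # 78.95
print(amount_of_editing(typed_first + typed_second, target_chars))   # 84.87
```

Under this definition the amount of editing can exceed 100% when a translator types more characters than end up in the target text, as in no-match segment #5 (109,47%).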

12 Preliminary results
CONDITION | SOURCE WORDS | TIME (sec) 1st rendition | SPEED (words/h) 1st rendition | TIME (sec) 2nd rendition | SPEED (words/h) Combined | TARGET CHARS | TYPED CHARS 1st rendition | AMOUNT OF EDITING 1st rendition | TYPED CHARS 2nd rend | AMOUNT OF EDITING Combined
COPY | 135 | 207,89 | 2337,77478 | | | 752 | 806 | 107,18% | |
TRANSL W/O CAT | 79 | 380,89 | 746,672268 | | | 602 | 703 | 116,78% | |
VISUAL
EXACT (100%) MATCHES | 131 | 155 | 3036 | 94 | 1895 | 792 | | | |
90-99% MATCHES | 91 | 234 | 1397 | 101 | 977 | 464 | | | |
80-89% MATCHES | 51 | 153 | 1197 | 27 | 1019 | 376 | | | |
70-79% MATCHES | 87 | 454 | 689 | 88 | 577 | 457 | | | |
NO MATCHES (MT FEEDS) | 150 | 783 | 690 | 132 | 591 | 1018 | | | |
TOTAL | 510 | 1780 | 1031 | 441 | 827 | 3107 | | | |
BLIND
EXACT (100%) MATCHES | 128 | 566 | 815 | 49 | 749 | 792 | 302 | 38,13% | 9 | 39,27%
90-99% MATCHES | 65 | 369 | 635 | 53 | 555 | 468 | 209 | 44,66% | 23 | 49,57%
80-89% MATCHES | 77 | 210 | 1321 | 30 | 1155 | 416 | 161 | 38,70% | 5 | 39,90%
70-79% MATCHES | 77 | 273 | 1014 | 20 | 946 | 418 | 180 | 43,06% | 0 |
NO MATCHES (MT FEEDS) | 165 | 592 | 1004 | 129 | 825 | 1069 | 306 | 28,62% | 53 | 33,58%
TOTAL | 512 | 2009 | 917 | 281 | 805 | 3163 | | | | 102,71%

13 Preliminary results Subject 1: Translation speed (words/hour)

14 Preliminary results Subject 1: Translation speed (words/hour)

15 Preliminary results Subject 1: Translation speed (words/hour)

16 Preliminary results Subject 2: Translation speed (words/hour)

17 Preliminary results Subject 2: Translation speed (words/hour)

18 Preliminary results Subject 2: Translation speed (words/hour)

19 Preliminary results Quality

20 Preliminary results
Conclusions:
• Testing of the first hypothesis (speed) is inconclusive if we take the whole texts as a reference.
• Subject 1 was slightly faster (5.2 percent) in environment V, while Subject 2 was slightly faster (5.6 percent) in environment B.
• Overall speed depends on how the different types of translation suggestions are distributed in the texts, as well as on differences between individual translators.
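The dependence of overall speed on the mix of suggestion types can be illustrated with the rounded word counts and first-rendition times from the BLIND rows of the slide 12 summary table. This is a sketch only: the function names and the alternative mix are hypothetical, and it assumes per-category speeds stay constant when the mix changes.

```python
# (source words, first-rendition seconds) per suggestion type, BLIND environment.
blind = {
    "exact": (128, 566),
    "90-99%": (65, 369),
    "80-89%": (77, 210),
    "70-79%": (77, 273),
    "no match (MT)": (165, 592),
}


def overall_speed(categories) -> float:
    """Whole-text speed in words/hour: total words over total time."""
    total_words = sum(words for words, _ in categories.values())
    total_seconds = sum(seconds for _, seconds in categories.values())
    return total_words / total_seconds * 3600


def speed_for_mix(categories, word_mix) -> float:
    """Whole-text speed if each type keeps its speed but the word counts change."""
    hours = sum(
        word_mix[name] / (words / seconds * 3600)
        for name, (words, seconds) in categories.items()
    )
    return sum(word_mix.values()) / hours


print(round(overall_speed(blind)))  # 917

# Hypothetical text with the same 512 words spread evenly over the five types.
even_mix = {name: 512 / len(blind) for name in blind}
print(round(speed_for_mix(blind, even_mix)))
```

Because the whole-text figure is total words over total time, slower suggestion types weigh more heavily the more words they cover, so two texts with different mixes can yield different overall speeds even for the same translator.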

21
• Small number of subjects
• Small number of segments
• Irregular segments
• Terminology
• Segment identification
• Experience increases over time
• Subject variability
• Quality assessment?

22 Conclusions
?Generalisations?
Specific type of text
Particular subject
Given fuzzy match grid
A particular MT engine

23
• Quality assessments
• Retrospective interviews
• Statistical analysis
• MT trust scores?
• Eye-tracking?
• Translog?
• Implications/Applications of findings


25 Thank you! carlostx@linguanativa.com.br Carlos S. C. Teixeira Intercultural Studies Group Universitat Rovira i Virgili (Tarragona, Spain)

