Published by Christian Crawford. Modified over 9 years ago.
1
Impact of Automated Translation on Mining Knowledge from Text Data
Luděk Svozil
19 November 2015, Brno
2
Introduction
Statistical and hybrid machine translation (MT) systems are gaining more attention.
Apart from commercial services such as Google Translate and Bing, a number of projects aim to bring the benefits of big-data knowledge to end users.
3
EU projects on the horizon
Modern MT – aims to bring a powerful, ready-to-use MT system to desktop users: http://www.modernmt.eu/
LTI cloud – gathers language technology components for easy use in information systems: http://www.ltinnovate.org/lticloud
4
If machine translation is part of preprocessing, would it benefit the text-mining process? And how?
Earlier experiments have shown that when scarce data in different languages are combined, MT greatly simplifies the problem.
5
Test data and experiment
20,000 reviews in 5 languages from booking.com were processed with Google machine translation and/or stemming; a C5.0 decision tree was then trained on each variant and evaluated using cross-validation.
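The evaluation step can be sketched as k-fold cross-validation that averages the misclassification rate over the folds. This is a minimal sketch under stated assumptions: each review is a binary bag-of-words, and a one-level decision tree (a "stump") stands in for C5.0, which it is not. The names `bag_of_words`, `train_stump` and `cross_validation_error` are hypothetical, and the example reviews are invented for illustration.

```python
from collections import Counter


def bag_of_words(text):
    # Binary bag-of-words: the set of lowercase whitespace-separated tokens.
    return set(text.lower().split())


def train_stump(docs, labels):
    """Train a depth-1 decision tree (stump) on binary word features.

    Hypothetical stand-in for C5.0: it picks the single word whose presence
    or absence misclassifies the fewest training documents."""
    majority = Counter(labels).most_common(1)[0][0]
    best_word, best_yes, best_no = None, majority, majority
    best_errors = sum(label != majority for label in labels)
    for word in sorted(set().union(*docs)):
        with_w = [l for d, l in zip(docs, labels) if word in d]
        without = [l for d, l in zip(docs, labels) if word not in d]
        if not with_w or not without:
            continue  # this word does not split the training data
        yes = Counter(with_w).most_common(1)[0][0]
        no = Counter(without).most_common(1)[0][0]
        errors = sum(l != yes for l in with_w) + sum(l != no for l in without)
        if errors < best_errors:
            best_word, best_yes, best_no, best_errors = word, yes, no, errors

    def predict(doc):
        if best_word is None:
            return majority  # no useful split found: predict the majority class
        return best_yes if best_word in doc else best_no

    return predict


def cross_validation_error(docs, labels, k=5):
    """Average misclassification rate over k folds (fold i holds items i, i+k, ...).

    Assumes len(docs) >= k so that every fold is non-empty."""
    fold_errors = []
    for i in range(k):
        test_idx = set(range(i, len(docs), k))
        train = [(d, l) for j, (d, l) in enumerate(zip(docs, labels)) if j not in test_idx]
        model = train_stump([d for d, _ in train], [l for _, l in train])
        wrong = sum(model(docs[j]) != labels[j] for j in test_idx)
        fold_errors.append(wrong / len(test_idx))
    return sum(fold_errors) / k


if __name__ == "__main__":
    reviews = ["great room", "awful staff", "great view",
               "awful noise", "great breakfast", "awful bed"]
    labels = ["pos", "neg", "pos", "neg", "pos", "neg"]
    docs = [bag_of_words(r) for r in reviews]
    print(cross_validation_error(docs, labels, k=3))  # prints 0.0
```

Swapping the stump for a real C5.0 implementation would change only `train_stump`; the fold loop that produces the averaged error stays the same.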
6
Results – % decrease in attribute count
                          ES    FR    PL    CS    DE
Translation               24%   17%   42%   40%   29%
Stemming                  37%   31%   20%   33%   16%
Translation and stemming  41%   35%   56%   53%   44%
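These percentages compare the number of attributes before and after preprocessing. Assuming each attribute is a distinct term in a bag-of-words representation (an assumption; the slides do not define it), the reduction can be computed as below. `toy_stem` is a hypothetical suffix stripper, not the stemmer used in the experiment; machine translation would be applied to the documents before counting in the same way.

```python
def vocabulary(docs, normalize=lambda word: word):
    """Distinct terms (attributes) in a corpus after an optional per-word normalization."""
    return {normalize(word) for doc in docs for word in doc.lower().split()}


def percent_decrease(before, after):
    """Relative reduction in attribute count, in percent."""
    return 100.0 * (before - after) / before


# Hypothetical toy stemmer (crude English suffix stripping); NOT the stemmer
# used in the experiment.
SUFFIXES = ("ing", "ed", "s")


def toy_stem(word):
    for suffix in SUFFIXES:
        # Strip the first matching suffix, keeping a stem of at least 3 characters.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    docs = ["rooms were cleaned", "room cleaned daily", "cleaning rooms"]
    before = len(vocabulary(docs))            # 6 distinct terms
    after = len(vocabulary(docs, toy_stem))   # 4: room, were, clean, daily
    print(round(percent_decrease(before, after), 1))  # prints 33.3
```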
7
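Results – average classification error
                          ES      FR      PL      CS      DE
Original                  14.10%, 12.40%, 14.60%, 12.70% (one value missing; column positions unclear)
Translated                14.10%  13.30%  11.30%  12.70%  12.00%
Stemmed                   15.30%  14.00%  11.90%  11.80%  13.50%
Translated and stemmed    15.50%, 12.80%, 13.70%, 14.10% (one value missing; column positions unclear)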
8
To observe how well translated data combine with native English data, a second experiment was carried out: 10,000 English documents were combined with 10,000 documents in another language, and the non-English documents were then translated with Google Translate.
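The construction of the mixed corpus can be sketched as follows. Assumptions: reviews are (text, label) pairs, `toy_translate` is a hypothetical word-by-word lookup standing in for Google Translate (it is not that service's API), and the origin tag on each item is added only so that errors could later be broken down by source. All names and example reviews are illustrative.

```python
import random

# Hypothetical stand-in for a machine translation service: a word-by-word
# German-to-English lookup that leaves unknown words unchanged.
TOY_DE_EN = {"zimmer": "room", "sauber": "clean", "laut": "noisy"}


def toy_translate(text):
    return " ".join(TOY_DE_EN.get(word, word) for word in text.lower().split())


def build_mixed_corpus(english, foreign, translate, seed=0):
    """Combine native English reviews with machine-translated foreign ones.

    english, foreign: lists of (text, label) pairs. Only the foreign part is
    passed through `translate`; each item keeps a tag recording its origin."""
    corpus = [(text, label, "native") for text, label in english]
    corpus += [(translate(text), label, "translated") for text, label in foreign]
    random.Random(seed).shuffle(corpus)  # mix sources before cross-validation
    return corpus


if __name__ == "__main__":
    english = [("clean room", "pos"), ("noisy room", "neg")]
    foreign = [("Zimmer sauber", "pos"), ("Zimmer laut", "neg")]
    for item in build_mixed_corpus(english, foreign, toy_translate):
        print(item)
```

Note that the word-by-word stand-in produces "room clean" where a native speaker writes "clean room"; a bag-of-words model sees the same terms, but a real MT system can diverge from native usage in less benign ways.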
9
Results – average classification error
                                  EN+FR   EN+PL   EN+DE   EN+ES
Original                          16.10%  14.80%  14.60%  17.30%
Non-English language translated   33.50%  33.90%  37.70%  36.10%
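The size of this effect can be checked with simple arithmetic on the table values: dividing each translated error by the corresponding original error shows the factor by which the error grew for each language pair. The dictionary and function names are illustrative.

```python
# Average classification error (%) per language pair, copied from the table.
original = {"EN+FR": 16.10, "EN+PL": 14.80, "EN+DE": 14.60, "EN+ES": 17.30}
translated = {"EN+FR": 33.50, "EN+PL": 33.90, "EN+DE": 37.70, "EN+ES": 36.10}


def error_ratio(before, after):
    """Factor by which the error grew for each language pair."""
    return {pair: after[pair] / before[pair] for pair in before}


ratios = error_ratio(original, translated)
for pair, ratio in ratios.items():
    print(f"{pair}: x{ratio:.2f}")
# EN+FR: x2.08
# EN+PL: x2.29
# EN+DE: x2.58
# EN+ES: x2.09
```

In every pair the error more than doubled once the non-English half was machine-translated.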
10
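Conclusions
MT simplifies the problem (it reduces the dictionary) without increasing classification error.
Care must be taken when combining native and translated documents: in the mixed-language experiment, the average classification error more than doubled for every language pair.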
11
Further details, tests, and a comparison of rule-based and MT translators can be found in my bachelor's thesis "Dolování znalostí z vícejazyčných textových dat" (Mining Knowledge from Multilingual Text Data), which will be available in January–February 2016.