Smart Computer-Aided Translation Environment
Project Progress and State of Affairs
Vincent Vandeghinste
Overview per work package
  WP0 – Project management
  WP1 – Translation Technology Improvements
  WP2 – Evaluation of Computer-Aided Translation
  WP3 – Terminology Extraction
  WP4 – Speech Recognition
  WP5 – Workflows and Personalized User-Interfaces
  WP6 – Integration, Evaluation, Valorisation, and Dissemination
SCATE T24 Meeting
WP0. Project management
  Project progress of the last six months: described in D Activity Report T24
  Technical description of the second project year: D0.3.2 Technical report T24 (available upon request for IAC members)
WP1. Translation Technology Improvements
  Task 1.1 Improve Fuzzy Matching
    human evaluation of previous work
  Task 1.2 Improve Machine Translation
    Task Improve Transduction
      D Report on improved transduction
      preliminary version was delivered; will be updated later
  Task 1.3 Integration of TM into MT
    work has started
  Presentations
    Collaborative Translation by Tom Vanallemeersch
    Syntactic Concordancing with Poly-GrETEL by Liesbeth Augustinus
WP2. Evaluation of Computer-Aided Translation
  Task 2.1 Taxonomy and annotated data set of typical translation errors
    Rule-based MT errors have been added to the corpus
  Task 2.2 Logging and analysis of human-machine interaction
    Presentation by Arda Tezcan
  Task 2.3 Confidence estimation of MT
    currently ongoing
WP3. Terminology Extraction from Comparable Corpora
  Task 3.1 Study of translator’s methods to acquire domain knowledge and terminology
    Presentation by Iulianna van der Lek
  Task 3.3 Terminology extraction from comparable text
    Use of a compound splitter as preprocessing for cross-lingual term extraction
    Is recent progress in neural word embeddings applicable to terminology extraction?
    Presentation by Sien Moens
WP4. Speech Recognition
  Task 4.2 Automatic Domain Adaptation
    Presentation by Joris Pelemans and Geert Heyman
WP5. Workflows and Personalised User Interfaces
  Task 5.2 Modelling Personalised Workflows for Translation Practices
  Task 5.3 Visualisation of Translation Features
  Task 5.4 Interfaces for Translation Work
  Presentation by Jan Van den Bergh
WP6. Integration, Evaluation, Valorisation, and Dissemination
  Task 6.1 Integration: first demonstrator
  Task 6.4 Dissemination
    scientific publications
    SCATE project presentation
      at EAMT 2015, Antalya, Turkey
      at EAMT 2016, Riga, Latvia
      at LT-Innovate 2016, Brussels
Publications of the last six months

Geert Heyman, Ivan Vulic & Marie-Francine Moens (2015). C-BiLDA: Extracting Cross-lingual Topics from Non-Parallel Texts by Distinguishing Shared from Unshared Content. Data Mining and Knowledge Discovery. pp

Douwe Kiela, Ivan Vulic & Stephen Clark (2015). Transferring Features from a Convolutional Neural Network to Perform Bilingual Lexicon Induction. In: Proceedings of EMNLP 2015: Conference on Empirical Methods in Natural Language Processing. Lisboa, Portugal, September

Lieve Macken & Arda Tezcan (in press). Dutch Compound Splitting for Bilingual Terminology Extraction. In: R. Mitkov, J. Monti, G.C. Pastor & V. Seretan (eds), Multi-word Units in Machine Translation and Translation Technology. John Benjamins.

Ayla Rigouts Terryn, Lieve Macken & Els Lefever (submitted). Dutch Hypernym Detection: Does Decompounding Help? In: Proceedings of the Joint Second Workshop on Language and Ontology & Terminology and Knowledge Structures (LangOnto2 + TermiKS). LREC Portoroz, Slovenia.

Joris Pelemans, Tom Vanallemeersch, Kris Demuynck, Lyan Verwimp, Hugo Van hamme & Patrick Wambacq (accepted). Language Model Adaptation for ASR of Spoken Translations using Phrase-based Translation Models and Named Entity Models. In: Proceedings of the 41st IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2016). Shanghai, China, March

Joris Pelemans, Tom Vanallemeersch, Kris Demuynck, Hugo Van hamme & Patrick Wambacq (2015). Efficient Language Model Adaptation for Automatic Speech Recognition of Spoken Translations. In: Proceedings of Interspeech 2015 (annual conference of the International Speech Communication Association). Dresden, Germany, September pp

Arda Tezcan, Véronique Hoste & Lieve Macken (accepted). SCATE Taxonomy and Corpus of Machine Translation Errors. Translation and Interpreting Technologies. Brill Academic Publishers. Leiden, the Netherlands.

Liesbeth Augustinus, Vincent Vandeghinste & Tom Vanallemeersch (accepted). Poly-GrETEL: Cross-Lingual Example-based Querying of Syntactic Constructions. In: Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC). Portoroz, Slovenia.

Jan Van den Bergh, Eva Geurts, Donald Degraen, Mieke Haesen, Iulianna van der Lek-Ciudin & Karin Coninx (2015). Recommendations for Translation Environments to Improve Translators' Workflows. Translating and the Computer 37. Asling. London, UK, November 2015 (pp ).

Iulianna van der Lek-Ciudin, Tom Vanallemeersch & Ken De Wachter (2015). Contextual Inquiries at Translators’ Workplace. In: Benjamin Phister, Carmelo Cancio, Laurence Cuzzolin (eds), Proceedings of TAO-CAT 2015, a conference on Computer Aided Translation tools. Angers, France, June pp

Ivan Vulic & Marie-Francine Moens (2016). Bilingual Distributed Word Representations from Document-Aligned Comparable Data. Journal of Artificial Intelligence Research.
Questions