Information Security and Privacy Aspects of Using Online Machine Translation in CAT Tools
About TeMpTations & Masks
Translating and the Computer 2018, Nov 15, 2018
Christine Bruckner, Freelance Translation Technology Consultant
Machine Translation and the Translator Today
… and MT plugins like DeepL Pro are fully secure and GDPR compliant (?!)
MT Integration in CAT Tools
CAT tools: SDL Trados Studio, Across, memoQ, STAR Transit … and many more
Integration scenarios:
- batch mode / via pre-translation => “post-editing”
- interactive lookup during translation at:
  - segment level
  - sub-segment level (AutoSuggest, MatchRepair, MT-based fuzzy match repair, TM validated MT match…)
These scenarios are not really new, see e.g. Heyn, Matthias (1996): Integrating machine translation into translation memory systems. http://www.mt-archive.info/EAMT-1996-WS.pdf, pp. 111-123
CAT+MT Integration – Example 1
Microsoft MT plug-in and MT pre-translation in memoQ
CAT+MT Integration – Example 2
Interactive MT proposals (via AutoSuggest) in SDL Trados Studio 2019 via DeepL Pro plug-in
Free / Cheap Cloud MT Plug-ins in Common CAT Tools
CAT tool integration (columns): SDL Trados Studio, Across, memoQ, Transit NXT
MT provider: customization / adaptive MT
- Google Cloud Translation: not in standard version; TMX import via Google AutoML Translation
- DeepL: no(t yet)
- MyMemory: TMX import; adaptive MT (in Pro version)
- Microsoft MT: via Category ID (= domain) in standard version for SMT; TMX import via Microsoft Custom Translator
- …
Information Security Aspects with Cloud MT Solutions
- Confidentiality, Availability, Integrity
- Data encryption
- Avoid use / storing of data input by third parties
Confidentiality: The “wrong” people can see information in transit. MT sites can use your data in ways you did not intend. (Don de Palma, 2014: http://www.tcworld.info/e-magazine/translation-and-localization/article/free-machine-translation-can-leak-data/)
Data Protection Aspects with Cloud MT Solutions
- Server location
- Data Processing Agreements
- Anonymization / Pseudonymization
Data Protection / Privacy: “data protection by design will become a legal obligation once the GDPR starts to apply in May 2018. Those who process personal data of individuals will have to take data protection into account “both at the time of the determination of the means for processing and at the time of the processing itself”, as Article 25 of the GDPR puts it.” https://edps.europa.eu/press-publications/press-news/blog/crucial-moment-communications-privacy_en
The MT user is the data controller and thus responsible and liable for what happens with third-party personal data.
Warn the User before Using Cloud MT in CAT Tools
(Too few) warnings and links to Terms of Service of cloud MT providers in CAT environments
CAT tools compared: SDL Trados Studio, Across, memoQ, STAR Transit
Warning behaviour observed:
- Warnings when setting up / configuring MT plug-ins
- Warning only when setting up Google MT plug-in
- Warnings configurable: always / never / per project
- No warnings in UI
Examples from MT Terms of Service – MyMemory (Oct 2018)
Aspect: Confidentiality
“We collect any segment submitted and store it on a long term basis, whether it’s public or private. […] The contributions to the archive, whether they are “Public Data” or “Private Data”, are collected, processed and used by Translated to create statistics, set up new services and improve existing ones.”
[…] MyMemory uses external partners to outsource some developments and provide some functionalities: this involves data sharing. External Machine Translation providers is the most obvious example. Translated can entitle its partners of a usage license over “Public Data” or “Private Data” in order to improve the quality of our and/or 3rd parties' services (eg. machine translation suggestions, glossaries, language models, spell checkers...).
(https://mymemory.translated.net/doc/en/tos.php, Oct 2018)
Examples from MT Terms of Service – DeepL Pro (Oct 2018)
Aspect: Privacy
Data Confidentiality: Your texts are deleted immediately after you've received the translation
Your Data is Secure: “DeepL Pro never stores the texts you are translating, and the connection to our servers is always encrypted. This means that your texts are not used for any purposes other than your translation, nor can they be accessed by third parties. As a company based in Germany, all our operations comply with European Union Data Protection laws.”
BUT: Servers are located in Iceland (https://www.datacenterdynamics.com/news/deepl-deploys-51-petaflop-supercomputer-on-verne-global-campus/, Sept 2017)
“7.12 Customer is obligated to observe all legal requirements for the collection, processing and use of data which is transmitted to and processed by DeepL for the Customer in connection with the provision of its services under this Agreement. In particular, Customer shall immediately inform DeepL if Customer intends to transmit personal data to DeepL using the API. Customer guarantees not to collect, process or use any personal data in connection with the API without the express consent of the data subject or sufficient other legal authorisation. […].”
(https://www.deepl.com/pro-license.html, Oct 2018)
Masking of Personal Data in the Translation Process
Masking = anonymization or pseudonymization
When? At which step of the process flow (inspired by SDL Studio training presentation):
- Content creation
- Pre-production & file preparation
- Project creation
- Project preparation
- Translation task assignment
- Review
- Project finalization
How?
- manually in source texts
- semi-automatically via search+replace strings + regex file
- automatically via tools
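
The semi-automatic route (search+replace strings plus a regex list) can be sketched roughly as follows. This is a minimal illustration, not the mechanism of any particular CAT tool: the rule list, the placeholder labels `[EMAIL]` and `[DATE]`, and the function name `mask_text` are all assumptions made for the example, and real projects would need far more patterns (names, addresses, IDs).

```python
import re

# Hypothetical rule list (pattern, replacement); not taken from any CAT tool.
RULES = [
    # Simple e-mail pattern: local part, "@", domain with at least one dot.
    (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "[EMAIL]"),
    # German-style numeric date such as 15.11.2018.
    (re.compile(r"\b\d{1,2}\.\d{1,2}\.\d{4}\b"), "[DATE]"),
]


def mask_text(text: str) -> str:
    """Apply every search+replace rule in order and return the masked text."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


print(mask_text("Contact jane.doe@example.com before 15.11.2018."))
```

Because the replacement is one-way, this variant corresponds to masking the source text itself, so the original values are gone for the whole translation process unless they are recorded elsewhere.
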
Masking during Translation Project Creation
The data is protected for the (whole) translation process – in source and target
Pseudonymization - Negative Side-Effects on Content
Examples shown: Google NMT, DeepL
Issues with:
- understandability
- MT quality
- potential adaptation needs (date format, transliteration, …)
- TM content re-use
- wrong 100% matches
- …
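
One way to picture the "wrong 100% matches" issue: two source segments that differ only in the masked personal data become identical after masking, so a translation memory would treat them as the same segment. The sketch below uses a deliberately naive, hypothetical name pattern and a hypothetical `[PERSON]` placeholder.

```python
import re

# Naive, hypothetical pattern: a title followed by one capitalized surname.
NAME = re.compile(r"\b(?:Mr|Ms)\. [A-Z][a-z]+")


def mask(segment: str) -> str:
    """Replace every matched name with the same generic placeholder."""
    return NAME.sub("[PERSON]", segment)


segments = [
    "Ms. Weber signed the contract.",
    "Mr. Huber signed the contract.",
]

masked = [mask(s) for s in segments]
# Both segments collapse to the same string, so the TM sees a 100% match.
print(masked[0] == masked[1], masked[0])
```

The stored target for the first segment would then be offered for the second one as well, although the translation may need adapting after un-masking (for example the form of address).
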
Masking of Personal Data before Using Cloud MT
When? Process flow (inspired by SDL Studio training presentation):
- Content creation
- Pre-production & file preparation
- Project creation
- Project preparation
- Translation task assignment
- Review
- Project finalization
Personal data is only protected during transmission to the cloud MT provider
Cloud MT Masking Options in CAT Tools
Only few available, for example:
- MT Enhanced plugin (Google MT, Microsoft MT) – Studio
- RyS plugin (Google MT) – Studio
Characteristics:
- regex knowledge required for configuration
- source text untouched
- meaningful tags
Shown: config file for MT Enhanced plugin
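
The general idea behind such plug-ins (mask only what is sent to the MT provider, leave the source text untouched, restore the values in the MT output) might look like the sketch below. It does not reproduce the configuration format or behaviour of MT Enhanced or RyS; the tag shape `<pd1/>`, the e-mail pattern, and the `translate` callable are hypothetical stand-ins.

```python
import re
from typing import Callable, Dict, Tuple

# Hypothetical pattern for one kind of personal data (e-mail addresses).
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def mask(segment: str) -> Tuple[str, Dict[str, str]]:
    """Replace each match with a numbered tag and remember the original."""
    mapping: Dict[str, str] = {}

    def _substitute(match: re.Match) -> str:
        token = f"<pd{len(mapping) + 1}/>"
        mapping[token] = match.group(0)
        return token

    return EMAIL.sub(_substitute, segment), mapping


def unmask(segment: str, mapping: Dict[str, str]) -> str:
    """Put the original values back in place of the tags."""
    for token, original in mapping.items():
        segment = segment.replace(token, original)
    return segment


def translate_with_masking(segment: str, translate: Callable[[str], str]) -> str:
    """Send only the masked segment to the MT callable, then restore the data."""
    masked, mapping = mask(segment)
    return unmask(translate(masked), mapping)


def fake_mt(text: str) -> str:
    """Stand-in for a cloud MT call; a real provider API is not shown here."""
    return text.replace("Write to", "Schreiben Sie an").replace(" or ", " oder ")


print(translate_with_masking("Write to a@b.org or c@d.net", fake_mt))
```

The round trip only works if the MT system returns the tags unchanged, which is one reason tagging can affect MT output quality.
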
Example Masking with MT Enhanced Plug-in
Google MT output in SDL Trados Studio with masking via MT Enhanced plug-in
Some Conclusions on Personal Data Masking in CAT Tools
- Additional efforts during preparation (tools, regex knowledge)
- Tagging of personal data decreases understandability of source text and MT output quality
- “Meta-information” (type, gender etc.) should be preserved for better understanding by translators
- 100% matches need more revision after un-masking
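
Preserving "meta-information" in the placeholder could, for instance, mean encoding type and gender in the tag itself. The sketch below assumes a hand-maintained lookup of known names; the dictionary `KNOWN_PERSONS`, the tag layout, and the function name are illustrative assumptions only.

```python
from typing import Dict, Tuple

# Hypothetical lookup of names with the metadata a translator would need.
KNOWN_PERSONS = {
    "Anna Schmidt": {"type": "person", "gender": "f"},
    "Peter Meier": {"type": "person", "gender": "m"},
}


def mask_with_metadata(text: str) -> Tuple[str, Dict[str, str]]:
    """Replace known names with tags that keep type and gender visible."""
    mapping: Dict[str, str] = {}
    for index, (name, meta) in enumerate(KNOWN_PERSONS.items(), start=1):
        if name in text:
            tag = f'<{meta["type"]} gender="{meta["gender"]}" id="{index}"/>'
            text = text.replace(name, tag)
            mapping[tag] = name
    return text, mapping


masked, mapping = mask_with_metadata("Anna Schmidt signed the contract.")
print(masked)
```

A translator who sees the tag still knows that a female person is meant, which helps with agreement and forms of address in the target language, while the name itself stays hidden.
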
Some Conclusions on Personal Data Masking in CAT Tools (continued)
- With a little additional background knowledge, it is often possible to identify the individuals you thought had been anonymized
- Use secure offline MT solutions
- Avoid masking efforts and disadvantages via
  - clear information about document classification and personal data content
  - guidelines + agreements with translators regarding cloud MT systems
Thank you! Christine.Bruckner@CATTMaTTers.de