About TeMpTations & Masks

Presentation transcript:

Information Security and Privacy Aspects of Using Online Machine Translation in CAT Tools: About TeMpTations & Masks
Christine Bruckner, Freelance Translation Technology Consultant
Translating and the Computer 2018, Nov 15, 2018

Machine Translation and the Translator Today
… and MT plugins like DeepL Pro are fully secure and GDPR compliant (?!)

MT Integration in CAT Tools
CAT tools: SDL Trados Studio, Across, memoQ, STAR Transit … and many more
- batch mode / via pre-translation => “post-editing”
- interactive lookup during translation at:
  - segment level
  - sub-segment level (AutoSuggest, MatchRepair, MT-based fuzzy match repair, TM validated MT match…)
These scenarios are not really new, see e.g. Heyn, Matthias (1996): Integrating machine translation into translation memory systems. http://www.mt-archive.info/EAMT-1996-WS.pdf, pp. 111-123

CAT+MT Integration – Example 1
Microsoft MT Plug-in and MT pre-translation in memoQ

CAT+MT Integration – Example 2
Interactive MT proposals (via AutoSuggest) in SDL Trados Studio 2019 via DeepL Pro plug-in

Free / Cheap Cloud MT Plug-ins in Common CAT Tools
CAT tool integration: SDL Trados Studio, Across, memoQ, Transit NXT
MT provider: customization / adaptive MT
- Google Cloud Translation: not in standard version; TMX import via Google AutoML Translation
- DeepL: no(t yet)
- MyMemory: TMX import; adaptive MT (in Pro version)
- Microsoft MT: via Category ID (=domain) in standard version for SMT; TMX import via Microsoft Custom Translator
- …

Information Security Aspects with Cloud MT Solutions
Confidentiality, Availability, Integrity
- Data encryption
- Avoid use / storing of data input by third parties
Confidentiality: The “wrong” people can see information in transit. MT sites can use your data in ways you did not intend. (Don de Palma, 2014: http://www.tcworld.info/e-magazine/translation-and-localization/article/free-machine-translation-can-leak-data/)

Data Protection Aspects with Cloud MT Solutions
- Server location
- Data Processing Agreements
- Anonymization / Pseudonymization
Data Protection / Privacy: “data protection by design will become a legal obligation once the GDPR starts to apply in May 2018. Those who process personal data of individuals will have to take data protection into account “both at the time of the determination of the means for processing and at the time of the processing itself”, as Article 25 of the GDPR puts it.” (https://edps.europa.eu/press-publications/press-news/blog/crucial-moment-communications-privacy_en)
The MT user is the data controller and thus responsible and liable for what happens with third-party personal data.

Warn the User before Using Cloud MT in CAT Tools
(Too few) warnings and links to Terms of Service of cloud MT providers in CAT environments.
CAT tools: SDL Trados Studio, Across, memoQ, STAR Transit
Warning behaviour across these tools:
- Warnings when setting up / configuring MT plug-ins
- Warning only when setting up Google MT plug-in
- Warnings configurable: always, never, per project
- No warnings in UI

Examples from MT Terms of Service – MyMemory (Oct 2018)
Aspect: Confidentiality
“We collect any segment submitted and store it on a long term basis, whether it’s public or private. [..] The contributions to the archive, whether they are “Public Data” or “Private Data”, are collected, processed and used by Translated to create statistics, set up new services and improve existing ones.” […] MyMemory uses external partners to outsource some developments and provide some functionalities: this involves data sharing. External Machine Translation providers is the most obvious example. Translated can entitle its partners of a usage license over “Public Data” or “Private Data” in order to improve the quality of our and/or 3rd parties' services (eg. machine translation suggestions, glossaries, language models, spell checkers...). (https://mymemory.translated.net/doc/en/tos.php, Oct 2018)

Examples from MT Terms of Service – DeepL Pro (Oct 2018)
Aspect: Privacy
Data Confidentiality: Your texts are deleted immediately after you've received the translation.
Your Data is Secure: “DeepL Pro never stores the texts you are translating, and the connection to our servers is always encrypted. This means that your texts are not used for any purposes other than your translation, nor can they be accessed by third parties. As a company based in Germany, all our operations comply with European Union Data Protection laws.”
BUT: Servers are located in Iceland (https://www.datacenterdynamics.com/news/deepl-deploys-51-petaflop-supercomputer-on-verne-global-campus/, Sept 2017)
“7.12 Customer is obligated to observe all legal requirements for the collection, processing and use of data which is transmitted to and processed by DeepL for the Customer in connection with the provision of its services under this Agreement. In particular, Customer shall immediately inform DeepL if Customer intends to transmit personal data to DeepL using the API. Customer guarantees not to collect, process or use any personal data in connection with the API without the express consent of the data subject or sufficient other legal authorisation. […].” (https://www.deepl.com/pro-license.html, Oct 2018)

Masking of Personal Data in the Translation Process
Masking = anonymization or pseudonymization
When? Process stages (flow inspired by SDL Studio training presentation):
- Content creation
- Pre-production & file preparation
- Project creation
- Project preparation
- Translation task assignment
- Review
- Project finalization
How?
- manually in source texts
- semi-automatically via search+replace strings + regex file
- automatically via tools
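The semi-automatic option (search+replace strings plus a regex file) could look roughly like the following Python sketch. It is only an illustration: the rule format (one tab-separated regex and replacement per line), the patterns, and the names `RULES_TEXT`, `load_rules` and `mask_source` are assumptions made for this example and do not come from the slides or from any CAT tool.

```python
import re

# Hypothetical rules "file": one rule per line, regex and replacement
# separated by a tab. Real projects would maintain their own patterns.
RULES_TEXT = "\n".join([
    r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+" + "\t" + "[EMAIL]",
    r"\b\d{2}\.\d{2}\.\d{4}\b" + "\t" + "[DATE]",
    r"\bErika Mustermann\b" + "\t" + "[PERSON]",  # fictional name
])


def load_rules(rules_text):
    """Parse tab-separated regex/replacement pairs, skipping blanks and comments."""
    rules = []
    for line in rules_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        pattern, replacement = line.split("\t", 1)
        rules.append((re.compile(pattern), replacement))
    return rules


def mask_source(text, rules):
    """Apply every rule in order and return the masked source text."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text


rules = load_rules(RULES_TEXT)
masked = mask_source(
    "Contact Erika Mustermann (erika@example.com) by 15.11.2018.", rules
)
```

Because the replacement happens in the source file itself, the masked values stay masked for every later stage of the process, which is also why such rules need careful review before a project is created.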

Masking during Translation Project Creation
The data is protected for the (whole) translation process – in source and target

Pseudonymization - Negative Side-Effects on Content
MT engines shown: Google NMT, DeepL
Issues with:
- understandability
- MT quality
- potential adaptation needs (date format, transliteration, …)
- TM content re-use, wrong 100% matches
- …

Masking of Personal Data before Using Cloud MT
When? Process stages (flow inspired by SDL Studio training presentation):
- Content creation
- Pre-production & file preparation
- Project creation
- Project preparation
- Translation task assignment
- Review
- Project finalization
Personal data is only protected during transmission to the cloud MT provider.

Cloud MT Masking Options in CAT Tools
Only a few available, for example:
- MT Enhanced plugin (Google MT, Microsoft MT) – Studio
- RyS plugin (Google MT) – Studio
Characteristics:
- regex knowledge required for configuration
- source text untouched
- meaningful tags
Figure: Config file for MT Enhanced plugin
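The general idea behind masking only for the MT request can be sketched in Python as follows. This is not how the MT Enhanced or RyS plugins are implemented or configured; the patterns, the token format and the names `mask_for_mt`, `unmask` and `fake_mt` are assumptions for illustration, and `fake_mt` merely stands in for a call to a cloud MT provider.

```python
import re

# Illustrative patterns only; a real configuration would need to be
# tuned to the personal data that actually occurs in the project.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+\d[\d ]{7,}\d"),
}


def mask_for_mt(segment):
    """Return a masked copy of the segment plus the token-to-original mapping."""
    mapping = {}
    masked = segment
    for label, pattern in PATTERNS.items():
        def repl(match, label=label):
            token = f"<{label}_{len(mapping) + 1}>"
            mapping[token] = match.group(0)
            return token
        masked = pattern.sub(repl, masked)
    return masked, mapping


def unmask(mt_output, mapping):
    """Put the original values back into the MT output."""
    for token, original in mapping.items():
        mt_output = mt_output.replace(token, original)
    return mt_output


def fake_mt(text):
    """Stand-in for a cloud MT call; only the masked text would leave the machine."""
    return text.replace("Please write to", "Bitte schreiben Sie an")


source = "Please write to erika@example.com or call +49 170 1234567."
masked, mapping = mask_for_mt(source)
target = unmask(fake_mt(masked), mapping)
```

The source segment itself is never modified; only the copy sent to the MT engine carries tokens, so the protection covers the transmission and nothing else.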

Example Masking with MT Enhanced Plug-in
Google MT output in SDL Trados Studio with masking via MT Enhanced plug-in

Some Conclusions on Personal Data Masking in CAT Tools
- Additional efforts during preparation (tools, regex knowledge)
- Tagging of personal data decreases understandability of source text and MT output quality
- “meta-information” (type, gender etc.) should be preserved for better understanding by translators
- 100 % matches need more revision after un-masking
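One way to preserve such meta-information is to let the placeholder tag carry it as attributes. The Python sketch below assumes a hypothetical `<pd .../>` tag format and a hand-maintained entity list; neither is taken from the slides or from an existing tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str         # the personal data as it appears in the source
    kind: str         # e.g. "person", "city"
    gender: str = ""  # optional: "f" or "m"


def tag_entities(segment, entities):
    """Replace each known entity with a tag that keeps its type and gender."""
    mapping = {}
    for i, entity in enumerate(entities, start=1):
        attrs = f'type="{entity.kind}"'
        if entity.gender:
            attrs += f' gender="{entity.gender}"'
        tag = f'<pd id="{i}" {attrs}/>'
        if entity.text in segment:
            segment = segment.replace(entity.text, tag)
            mapping[tag] = entity.text
    return segment, mapping


entities = [
    Entity("Erika Mustermann", "person", "f"),  # fictional name
    Entity("Hamburg", "city"),
]
tagged, mapping = tag_entities("Erika Mustermann moved to Hamburg.", entities)
```

A translator who sees `type="person" gender="f"` can still choose the right pronouns and agreement in the target language without seeing the name itself.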

Some Conclusions on Personal Data Masking in CAT Tools
- With a little additional background knowledge, it is often possible to identify the individuals you thought had been anonymized
- Use secure offline MT solutions
- Avoid masking efforts and disadvantages via
  - clear information about document classification and personal data content
  - guidelines + agreements with translators regarding cloud MT systems

Thank you! Christine.Bruckner@CATTMaTTers.de