Félix do Carmo, Luís Trigo and Belinda Maia, London, November 2016


From CATs to KATs

CAT – Computer-Assisted Translation
KAT – Knowledge-Assisted Translation

The evolution from CATs to KATs
- Translators’ reality
  - Translation, Revision, Post-editing, Project management
  - Constant need for better tools
- Specifications for new tools
  - Based on processes and procedures
- References
  - Translation Studies research
    - Translation processes
    - Translation tools
  - Computer Sciences research
    - Machine Translation (MT) -> Statistical Machine Translation (SMT)
    - Natural Language Processing, Machine Learning, Artificial Intelligence, Augmented Intelligence

Translator’s reality

Processes and procedures
- Three processes
  - Translation
  - Revision
  - Post-editing
- Translation process
  - Cognitive approaches
  - Decision – outside the scope of this paper
- Four translation procedures/stages
  - Management
  - Research
  - Writing/editing
  - Revising/checking

Translating with CATs
- Translating new segments
  - Support of terminology and concordancers (word-level)
  - Overwriting source text
- Editing fuzzy matches
  - Support by highlighting differences
  - Edit differences on translation suggestion
  - Re-edit when these are repeated in the text
- Checking repeated content
  - Support from the TM (segment-level)
  - Validate or edit
- Example of re-edits: copyright notices with different dates, lists of product names and brands

Revising with CATs
- Read the whole translation
  - Source and target text
  - Check references
- Repeat the translation process
  - Validate decisions
  - Decide differently
- Support for revision
  - The same as for translation
  - Terminology, concordances
  - QA checks
  - Maintain reference material

Post-editing with CATs
- Translating new segments
  - MT translation hypothesis replaces the source text
  - Support of terminology and concordancers (word-level)
  - Editing by overwriting the translation hypothesis
- Editing fuzzy matches
  - Support by highlighting differences
  - Edit differences on translation suggestion
  - Re-edit when these are repeated in the text
- Checking repeated content
  - Support from the TM (segment-level)
  - Validate or edit
- There is no specific support for Post-editing or Editing:
  - Post-edited segments have the same CAT word-level support as in the translation process
  - Fuzzy matches, which account for most of the editing effort, still only have the CAT tool “difference highlight” support

Advanced support in CATs
- QA, spelling and grammar checks
  - Non-linguistic details
  - Based on word processors
- Predictive writing
  - Auto text suggestions, based on terminology or bilingual databases
- Web search simplified
  - Embedded web favourite lists
- Fuzzy match composition
  - Create translation suggestions from fragments
- Project management
  - Package together texts, TMs and Termbases
  - Control productivity, quality, etc.
- Beyond CATs: Web and MT tools
  - Interactive Machine Translation

Recommended specifications for new tools

4 procedures/stages
1) Management
  - Connect texts to resources
2) Research
  - Learn from searches
3) Writing/editing
  - Translation - overwriting
  - Editing - 4 editing actions
4) Revising/checking
  - Focus on specific tasks

1) Management
- Leverage the linguistic knowledge existing in texts, TMs and other resources
- Use “Technical domain” as the central feature to allocate the right contents to the right projects and to the right people
Tasks and stages:
- For each new project, use classification techniques to identify the technical domain
- Use clustering techniques to associate the appropriate bilingual data (TMs, MT engines, references) with the project, based on technical domain
- Associate people with domains based on the analysis of texts they produced
- Classify people according to specialisations, based on networks of texts
- Assign people to projects based on specialisations
- Management system by visual representations of proximity scales, or connections based on similarity
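
The classification and assignment steps above can be sketched as follows. This is only an illustration under stated assumptions: the term-overlap score stands in for whatever classification technique is actually used, and every name and figure here (`classify_domain`, `assign_translator`, the sample texts and word counts) is hypothetical.

```python
import re
from collections import Counter

def terms(text):
    """Lower-cased word counts (a simple bag of words)."""
    return Counter(re.findall(r"\w+", text.lower()))

def classify_domain(text, domain_samples):
    """Return the domain whose sample texts share the most terms with `text`."""
    vec = terms(text)

    def overlap(domain):
        # Merge all sample texts of the domain into one term profile.
        profile = sum((terms(t) for t in domain_samples[domain]), Counter())
        return sum((vec & profile).values())

    return max(domain_samples, key=overlap)

def assign_translator(domain, words_by_translator):
    """Pick the translator with the most words translated in `domain`."""
    return max(words_by_translator,
               key=lambda name: words_by_translator[name].get(domain, 0))

# Hypothetical reference texts per technical domain.
domain_samples = {
    "engineering": ["replace the pump valve", "check the controller wire"],
    "medical": ["patient dosage form", "clinical trial consent"],
}
# Hypothetical word counts per translator and domain.
words_by_translator = {
    "translator_a": {"engineering": 120000, "medical": 5000},
    "translator_b": {"engineering": 8000, "medical": 90000},
}

domain = classify_domain("Connect the wire to the pump controller", domain_samples)
assignee = assign_translator(domain, words_by_translator)
print(domain, assignee)
```

A raw overlap count favours domains with larger reference sets, so a production system would probably normalise the score or use a trained classifier.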

MDS representation
The first approach is the Multidimensional Scaling (MDS) representation, which transforms the n term dimensions that characterize documents/resources into a 2D representation, giving a view of the spatial proximity of each one. In this picture, researchers were clustered based on the proximity of their abstracts (Trigo).
Figure 1: MDS representation of colour-coded clusters of documents/resources
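
To make the idea concrete, here is a minimal sketch of classical (Torgerson) MDS in plain Python. It is not the implementation behind Figure 1: it assumes a symmetric matrix of pairwise distances is already available, and the function name `classical_mds` and the four sample points are hypothetical.

```python
import math

def classical_mds(dist, dims=2, iters=500):
    """Embed items in `dims` dimensions from a symmetric distance matrix."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    # Double centring of the squared distances: B = -1/2 * J * D^2 * J
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        # Power iteration for the dominant eigenvector of B.
        v = [float(i + 1) for i in range(n)]  # fixed, arbitrary start vector
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        eigval = sum(v[i] * sum(b[i][j] * v[j] for j in range(n))
                     for i in range(n))
        scale = math.sqrt(max(eigval, 0.0))
        for i in range(n):
            coords[i][k] = v[i] * scale
        # Deflation: remove the component just found before the next pass.
        b = [[b[i][j] - eigval * v[i] * v[j] for j in range(n)]
             for i in range(n)]
    return coords

# Hypothetical example: four items whose distances come from a flat rectangle.
points = [(0, 0), (3, 0), (0, 4), (3, 4)]
dist = [[math.hypot(p[0] - q[0], p[1] - q[1]) for q in points] for p in points]
layout = classical_mds(dist)
```

For document collections the distance matrix would come from term vectors, and a library implementation would normally replace the hand-written power iteration used here.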

Graph-based approach
The second option is the graph-based approach, which represents similarities between “documents” as links. This concept is developed in the Affinity Miner project (Trigo et al., 2015), where it is applied to grasp affinity groups in and between publications of research centres. Document similarity is computed from a simple Bag of Words vector representation (Feldman and Sanger, 2007). The importance of each document is shown visually by the size of its node, which reflects the number of publications and the node's centrality within the graph. The centrality measure identifies the strongest element in each group, which could be, in our translation management environment, the translator with the highest rank in terms of number of words translated in a technical domain.
Figure 2: Network of people in an R&D unit/center
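
A minimal sketch of this graph construction is given below. It is not the Affinity Miner code: it assumes cosine similarity over raw word counts, a fixed similarity threshold of 0.3 for drawing a link, and degree centrality as the centrality measure. The names and texts are hypothetical.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Word-count vector for one document."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def similarity_graph(docs, threshold=0.3):
    """Link every pair of documents whose similarity reaches the threshold."""
    vecs = {name: bag_of_words(text) for name, text in docs.items()}
    names = sorted(vecs)
    edges = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            score = cosine(vecs[a], vecs[b])
            if score >= threshold:
                edges[(a, b)] = score
    return edges

def degree_centrality(names, edges):
    """Share of the other nodes that each node is linked to."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return {n: degree[n] / (len(names) - 1) for n in names}

# Hypothetical texts produced by four translators.
docs = {
    "ana": "hydraulic pump valve pressure maintenance manual",
    "bruno": "pump valve pressure sensor installation manual",
    "carla": "patient dosage clinical trial consent form",
    "duarte": "clinical trial patient consent dosage report",
}
edges = similarity_graph(docs)
centrality = degree_centrality(sorted(docs), edges)
```

With these sample texts the graph splits into two affinity groups, one technical and one medical, which is the kind of structure the visual representation is meant to expose.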

2) Translator’s research
- TMs and references (unaligned corpora) clustered by domain
- Specialised search engines
  - Vertical web search engines
  - Log and cluster web sites used as references
  - Crawl the web for similar references
- Terminology and keyword extraction
  - Extract word and multi-word units
  - Create ontologies
  - Visualise clusters of words
- Named Entity Recognition
  - Identify names of people, institutions and places, dates and other non-translatable elements
  - Mark them for exclusion from the MT system
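
The last step, marking non-translatable elements so that the MT system leaves them alone, might look like the sketch below. It is a stand-in, not real Named Entity Recognition: it assumes a hand-made list of known names plus one date pattern, and the `[[n]]` placeholder format is hypothetical.

```python
import re

DATE_PATTERN = r"\b\d{1,2}/\d{1,2}/\d{4}\b"  # assumed date format, e.g. 12/11/2016

def mask_entities(text, names):
    """Replace known names and dates with numbered placeholders."""
    # Longer names first, so "Belinda Maia" wins over a shorter overlapping name.
    alternatives = [re.escape(n) for n in sorted(names, key=len, reverse=True)]
    alternatives.append(DATE_PATTERN)
    pattern = re.compile("|".join(alternatives))
    found = []

    def replace(match):
        found.append(match.group(0))
        return f"[[{len(found) - 1}]]"

    return pattern.sub(replace, text), found

def unmask_entities(text, found):
    """Put the original elements back after translation."""
    return re.sub(r"\[\[(\d+)\]\]", lambda m: found[int(m.group(1))], text)

source = "Contact Belinda Maia in London before 12/11/2016."
masked, found = mask_entities(source, {"Belinda Maia", "London"})
restored = unmask_entities(masked, found)
```

The masked string is what would be sent to the MT engine; the placeholders are restored in the output, so names and dates are never altered by the engine.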

Research – knowledge base
- Build a local knowledge-base with these details
  - Web references
  - Named Entities
  - Terminology and ontologies
- Ownership issues
  - Owned by the translator – empower and support translator’s specialisation
  - Shared with the reviser or production team – useful to validate confidence in translator
  - The more specialised the references, the more confidence in the translator

3) Writing/editing
- Translation
  - Overwriting
  - Predictive writing (typeahead)
- Post-editing based on four editing micro-tasks or actions
  - Deleting
  - Inserting
  - Moving
  - Replacing
- Support specifically these four actions
  - Machine Learning used in Quality Estimation
  - Present suggestions for the editing actions

Interactive DELETION
Learn “deletion” and suggest the same action in similar contexts

| SOURCE | MT SUGGESTION | POST-EDITED |
|---|---|---|
| Acquire - to obtain possession of something | Adquirir - para obter a posse de algo | Adquirir - obter a posse de algo |
| Align - to place something in an orderly position in relation to something else | Alinhar - para colocar algo em uma posição ordenada em relação a outra coisa | Alinhar - colocar algo em uma posição ordenada em relação a outra coisa |
| Allocate - to divide something between different people or projects | Alocar - para dividir algo entre diferentes pessoas ou projetos | Alocar - dividir algo entre diferentes pessoas ou projetos |

Interactive INSERTION
Learn “insertion” and suggest the same action in similar contexts

| SOURCE | MT SUGGESTION | POST-EDITED |
|---|---|---|
| User Name/ID | Nome de utilizador | Nome/ID de utilizador |
| Patient Name/ID | Nome do paciente | Nome/ID do paciente |
| Item Name/ID | Nome do item | Nome/ID do item |

Interactive MOVEMENT
Learn “movement” and suggest the same action in similar contexts
- Multi-word units validated when translators “move” them
- Added to translation model with a higher (101%) confidence score

| SOURCE | MT SUGGESTION | POST-EDITED |
|---|---|---|
| VEC 1 controller pin 7 (BK) wire | Controlador VEC 1 fio do pino 7 (BK) | Fio do pino 7 (BK) do Controlador VEC 1 |
| VEC 1 controller + (RD) wire | 1 Controlador VEC + (RD) | Fio + (RD) do Controlador VEC 1 |
| VEC 1 controller – (BL) wire | VEC 1 controlador - (BL) | Fio - (BL) do Controlador VEC 1 |

Interactive REPLACEMENT
Learn “replacement” and suggest the same action in similar contexts
- Multi-word units validated when translators “replace” them
- Added to translation model as high-score alternatives to units replaced

| SOURCE | MT SUGGESTION | POST-EDITED |
|---|---|---|
| Users must be set up and maintained at the console. | Os utilizadores têm de estar configurado e mantido na consola. | Os utilizadores têm de estar configurados e mantidos na consola. |
| Assess - to examine something in order to judge or evaluate it | Avaliar - examinar algo para juiz ou avaliar | Avaliar - examinar algo para ajuizar ou avaliar |
| Act - to do something to change a situation | Ato - fazer algo para mudar uma situação | Atuar - fazer algo para mudar uma situação |
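
The four editing actions illustrated in the slides above can be extracted automatically by comparing each MT suggestion with its post-edited version. The sketch below uses Python's standard `difflib` at word level and counts how often each action recurs. It is an assumption about how such learning could start, not the authors' method. In particular, treating a deletion plus an insertion of the same words as a movement is a simplification, and an insertion glued to an existing word (as in "Nome/ID") shows up as a replacement at this granularity.

```python
import difflib
from collections import Counter

def edit_actions(mt, post_edited):
    """List (action, old, new) tuples that turn the MT suggestion into the post-edit."""
    a, b = mt.split(), post_edited.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    actions = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            actions.append((tag, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    # Words that are both deleted and inserted are counted as one movement.
    deleted = {old for tag, old, _ in actions if tag == "delete"}
    inserted = {new for tag, _, new in actions if tag == "insert"}
    moved = deleted & inserted
    result = []
    for tag, old, new in actions:
        text = old or new
        if tag in ("delete", "insert") and text in moved:
            if tag == "delete":  # report each movement once
                result.append(("move", old, old))
        else:
            result.append((tag, old, new))
    return result

# MT suggestions and post-edits taken from the deletion slide.
pairs = [
    ("Adquirir - para obter a posse de algo",
     "Adquirir - obter a posse de algo"),
    ("Alinhar - para colocar algo em uma posição ordenada em relação a outra coisa",
     "Alinhar - colocar algo em uma posição ordenada em relação a outra coisa"),
    ("Alocar - para dividir algo entre diferentes pessoas ou projetos",
     "Alocar - dividir algo entre diferentes pessoas ou projetos"),
]
# Tally of observed actions; frequent ones are candidates for suggestions.
learned = Counter(action for mt, pe in pairs for action in edit_actions(mt, pe))
```

An action that keeps recurring, such as deleting "para" in these definitions, is the kind of pattern a tool could offer as a suggestion in similar contexts.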

4) Revising and checking
- Study the specificities of this stage
- Eliminate the need for the reviser to repeat translator’s research, decisions and edit efforts
- Set up QA checks
- Manage reference material

Support to revision/checking
- Validate the references used by the translators -> quicker confidence assessment
- Highlight edits made to fuzzy matches -> quicker edit validation
- Present suggestions to the 4 editing actions -> quicker editing
- Build sub-segment glossaries -> quicker consistency checking
- Improve QAs
  - Build QAs according to client instructions and Style Guides
  - QAs that learn from false positives
    - E.g. Intentional diverging translations due to different contexts
- Clean and maintain TMs
  - Excluding misaligned translations (due to segmentation issues)
  - Excluding low-repeatable translations (very context-dependent)
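
One way to read "QAs that learn from false positives" is a consistency check that stops flagging divergences a reviser has already accepted. The sketch below assumes exactly that. The function name, the `accepted` set and the sample segments are hypothetical.

```python
from collections import defaultdict

def inconsistent_translations(segments, accepted=frozenset()):
    """Return source segments with more than one target, minus accepted cases."""
    targets = defaultdict(set)
    for source, target in segments:
        targets[source].add(target)
    return {source: sorted(found)
            for source, found in targets.items()
            if len(found) > 1 and source not in accepted}

# Hypothetical bilingual segments from one project.
segments = [
    ("Save", "Guardar"),
    ("Save", "Gravar"),
    ("Open", "Abrir"),
    ("Open", "Abrir"),
]
flagged = inconsistent_translations(segments)
# The reviser marks the "Save" divergence as intentional (different contexts),
# so the next run no longer reports it.
flagged_after_review = inconsistent_translations(segments, accepted={"Save"})
```

Storing accepted divergences per source segment is the simplest option; a finer-grained design would store them per context, so that a genuine inconsistency elsewhere is still reported.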

How did we get here?

New and intelligent tools
- Build better Interactive Translation tools
  - Instead of trying to emulate human actions, apply the best technologies to support translation, revision and editing
- Understand processes and procedures
  - Support specific actions performed by translators
- Manage knowledge
  - Take advantage of all the knowledge that may be extracted from data collected in operations and actions performed by translators
- KATs = Knowledge-Assisted Translation tools

Thank you.