Meghan Dowling Teresa Lynn Andy Way


Leveraging backtranslation to improve machine translation for Gaelic languages
Meghan Dowling, Teresa Lynn, Andy Way
The ADAPT Centre is funded under the SFI Research Centres Programme (Grant 13/RC/2106) and is co-funded under the European Regional Development Fund.

The question
Can we use existing datasets in one language to create artificial datasets for another closely related language?
Languages studied: Irish and Scottish Gaelic

Overview
Linguistic background
MT background
Data
Method
Results and Conclusions
Future Work

Linguistic background

Word order

Inflection
Craggy Island: Oileán an Chreagáin ‘Rocky Island’
Scottish Gaelic: creag, a’ chreag, creagan, na creige
English: rock/a rock, the rock, rocks, of the rock
Irish: carraig, an charraig, carraigeacha, na carraige

MT background

Backtranslation
Creation of artificial bilingual data through the machine translation of monolingual data.
Can combine 2 different types of MT, e.g. RBMT, SMT, NMT.
MT might benefit from more data, even if of low quality.
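The idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: `backtranslate` and `toy_translate` are hypothetical names, the three-word lexicon is a toy built from the inflection examples earlier in the talk, and placing the machine-translated text on the source side of each pair is an assumption.

```python
from typing import Callable, Iterable, List, Tuple


def backtranslate(
    monolingual: Iterable[str],
    translate: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Pair each authentic sentence with its machine translation.

    Returns (machine_translated, authentic) pairs. Which side counts as
    the source is an assumption made for this sketch.
    """
    pairs = []
    for sentence in monolingual:
        sentence = sentence.strip()
        if not sentence:
            continue  # skip empty lines in the monolingual corpus
        pairs.append((translate(sentence), sentence))
    return pairs


def toy_translate(sentence: str) -> str:
    """Hypothetical stand-in for a real RBMT/SMT/NMT system (Irish -> Scottish Gaelic)."""
    lexicon = {"carraig": "creag", "an": "a'", "charraig": "chreag"}
    # Word-by-word lookup; unknown words are copied through unchanged.
    return " ".join(lexicon.get(word, word) for word in sentence.split())


artificial = backtranslate(["carraig", "", "an charraig"], toy_translate)
for gd, ga in artificial:
    print(f"{gd}\t{ga}")
```

In practice `toy_translate` would be replaced by a call to an existing MT system; the resulting pairs can be noisy, which is the "low quality" data the slide refers to.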

MT background
RBMT: pipeline of rules etc. (Scannell, 2006)
SMT (Scannell, 2014)
NMT (Chen, 2018)

Data

Data

Method

GA<->GD Method

Experiment set-up
Experiment 1: GD->GA. Authentic data: Ubuntu + GNOME. Artificial data: Uicipeid. Test data: Tatoeba-ga.
Experiment 2: GA->GD. Authentic data: Ubuntu + GNOME. Artificial data: GA dataset. Test data: Tatoeba-ga.
Parameters: default Moses parameters

GD<->EN Method

Experiment set-up
Experiment 3: GD->EN. Authentic data: Ubuntu + GNOME. Artificial data: GA dataset. Test data: Tatoeba-en.
Experiment 4: EN->GD. Authentic data: Ubuntu + GNOME. Artificial data: GA dataset. Test data: Tatoeba-en.
Parameters: 6-gram language model, hierarchical reordering tables

Part A: Baseline (authentic only)
Part B: Artificial only
Part C: Authentic + artificial
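The three conditions differ only in which sentence pairs go into training. A small sketch, assuming each corpus is a list of (source, target) pairs; the function name, dictionary keys and example pairs are illustrative and not taken from the talk:

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def build_training_sets(
    authentic: List[Pair],
    artificial: List[Pair],
) -> Dict[str, List[Pair]]:
    """Assemble the training data for Parts A, B and C."""
    return {
        "A_baseline": list(authentic),                      # authentic only
        "B_artificial": list(artificial),                   # artificial only
        "C_combined": list(authentic) + list(artificial),   # authentic + artificial
    }


# Placeholder pairs standing in for real corpora.
authentic = [("creag", "carraig")]
artificial = [("a' chreag", "an charraig"), ("creagan", "carraigeacha")]

sets = build_training_sets(authentic, artificial)
for name, pairs in sets.items():
    print(name, len(pairs))
```

Each set would then be used to train a separate system, with all systems evaluated on the same held-out test data so that the scores are comparable.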

Results

Results

Results

Conclusions
BLEU of artificial data only > BLEU of authentic data only.
Highest BLEU = artificial + authentic combined.
Backtranslation is usable for low-resource MT, even when the MT used to create the data is of low quality.

Future work
Human evaluation
Other MT to create artificial data (e.g. Scannell, 2006)
Assess quality in cases where GA & GD differ linguistically
Different domains
Extend to other Celtic languages, e.g. Manx

Go raibh míle maith agaibh! (Thank you very much!)
@ismisemeg @adaptcentre @cigilt @andyway