Speech and Language Technologies in the Next Generation Localisation CSET Prof. Andy Way, School of Computing, DCU.

Overview of Presentation
- Speech & Language Technologies in the NGL CSET
- Facilitating Optimal Multilingual NGL Applications
- Key Research Challenges
- Novel Research Tracks
- Typical LSP’s Translation Process
- Key Integration Challenges
- Concluding Remarks

ILT - Integrated Language Technologies
Diagram labels (Next Generation Localisation):
- Systems Framework
- Enterprise Localisation
- Personalised Localisation
- Unified Model
- Digital Content Management
- Integrated Language Technologies
Prof. Andy Way, ILT Area Coordinator

ILT: Facilitating Optimal Multilingual NGL Applications
Diagram labels:
- Inputs: Text Input, Speech Input
- Components: Text Processing, Speech Technologies, Machine Translation
- Outputs: Text Output, Speech Output
- Examples: bulk localisation, personalisation

Machine Translation: Significance
- For our industrial partners, the volume of material needing translation is increasing, while budgets remain the same
- In the EU, there are now 23 official languages (506 language pairs), and expanding …
- In the US, huge investment in translation between Arabic, Chinese and Urdu, and English …
→ Automation is the only option (especially for PL) …
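
The figure of 506 language pairs follows from counting each translation direction separately: every one of the 23 languages can be translated into each of the other 22. A minimal sketch of that arithmetic (the function name is illustrative and not from the slides):

```python
def directed_language_pairs(num_languages: int) -> int:
    """Count ordered (source, target) pairs with source != target."""
    if num_languages < 0:
        raise ValueError("num_languages must be non-negative")
    return num_languages * (num_languages - 1)


if __name__ == "__main__":
    # 23 official EU languages at the time of the talk
    print(directed_language_pairs(23))
```

Each additional language adds twice the current language count in new directions, which is why the number of pairs grows so quickly as the EU expands.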

MT: Key Research Challenges
- Enhanced Translation Quality
- Faster Translation Times
- Scalability
- Other Modalities (Speech, SMS etc.)

The State-of-the-Art
Source:
Reference: The two sides highlighted the role of the World Trade Organization (WTO)
Baseline: The two sides on the role of the WTO

Improving the State-of-the-Art
Our MT systems have knowledge of syntax
- Parts of speech (nouns, verbs etc.)
- Roles in sentences (subject, object etc.)
→ better translation quality
Source:
Reference: The two sides highlighted the role of the World Trade Organization (WTO)
Baseline: The two sides on the role of the WTO
Our System: The two sides reaffirmed the role of the WTO

The State-of-the-Art
Source:
Reference: Mahmoud Abbas: The wall and settlements will not bring Israel security
Baseline: Mahmoud Abbas, the wall and settlements will provide security to Israel
Our System: Mahmoud Abbas, the wall and settlements will not provide security for Israel

Improving the State-of-the-Art
→ better translation quality (especially where end-users are concerned)
DCU Arabic → English system ranked first at international MT evaluation in Oct
Source:
Reference: Mahmoud Abbas: The wall and settlements will not bring Israel security
Baseline: Mahmoud Abbas, the wall and settlements will provide security to Israel
Our System: Mahmoud Abbas, the wall and settlements will not provide security for Israel

MT Novel Research: Handling Different Types of Text
- Translating patent applications, or doctors’ prescriptions, or visa applications: different tasks, as the content is different …
- So is the form …
→ Build different MT systems for each different task, using our industrial partners’ documentation

Text Processing: Significance and Challenges
If texts are automatically annotated with:
- syntactic information (e.g. subject, object), today’s MT systems can learn syntax required for improved output quality and improved processing of multilingual queries (DCM)
- text-type and genre information, this helps our MT systems disambiguate text and improve translation quality
- localisation information (e.g. Andy Way), then the workflows of our industrial partners (currently done manually) can be significantly improved (cf. LOC)

Speech Technology: Significance
- Speech interfaces for eyes-busy, hands-busy scenarios
- Speech recognition and synthesis systems which can deal with
  - potentially an unlimited vocabulary
  - multiple (and non-native) speakers
  - multiple languages
  and can be tightly integrated with MT
→ localisation & personalisation
→ volume & scalability
→ access

Speech Technology: Challenges
themoreitsnows themoreitgoes
- the more it snows the more it goes… (linguistic competence of native speaker)
- them ore its nows them ore it goes? (“rules” and vocabulary of system)
- demoreisnows demoregoes (performance of (native) speaker)
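
The ambiguity on this slide can be reproduced with a toy word segmenter. The sketch below is only an illustration, not the recognition engine described in the talk: it assumes a small hypothetical vocabulary and enumerates every way of splitting an unsegmented string into known words, which shows why a system with only "rules" and vocabulary cannot choose between "the more it snows" and "them ore its nows".

```python
def segmentations(text, vocab):
    """Return every split of `text` into a sequence of words from `vocab`."""
    if not text:
        return [[]]  # one way to segment the empty string: no words
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in vocab:
            # Keep this word and segment the remainder recursively.
            for tail in segmentations(text[end:], vocab):
                results.append([head] + tail)
    return results


# Hypothetical toy vocabulary, chosen only to reproduce the slide's example.
VOCAB = {"the", "them", "more", "ore", "it", "its", "snows", "nows"}

if __name__ == "__main__":
    for words in segmentations("themoreitsnows", VOCAB):
        print(" ".join(words))
```

Picking the intended reading among these candidates needs extra knowledge, for example a language model or the explicit linguistic knowledge the slides refer to.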

Speech Technology: Innovations
Robust & Novel Speech Recognition Engine which integrates explicit linguistic knowledge
themoreitsnows themoreitgoes
- the more it snows the more it goes… (linguistic competence of native speaker)
- them ore its nows them ore it goes? (“rules” and vocabulary of system)
- demoreisnows demoregoes (performance of (native) speaker)

Innovations: Speech Recognition & MT
Robust & Novel Speech Recognition Engine which integrates explicit linguistic knowledge
Tight coupling with MT Engines
themoreitsnows themoreitgoes
- the more it snows the more it goes… (linguistic competence of native speaker)
- them ore its nows them ore it goes? (“rules” and vocabulary of system)
- detverkarhavarite nstorstormhurmån
- Jemehreschneit destomehres geht

Innovations: MT & Speech Synthesis
Robust & Novel Speech Synthesis Engine which integrates explicit linguistic knowledge
Tight coupling with MT Engines
- themoreitsnows themoreitgoes
- detverkarhavarite nstorstormhurmån
- Jemehreschneit destomehres geht

Typical LSP’s Translation Process
- Incoming documents (segmented)
- Step 1: Translation Memory (Translation Memory DB)
- Partially Translated Documents, with confidence rating for segments
- Step 2: Post-editing & translation (In-house Translators, Freelance Translators)
- Step 3: Documents Validation & Finalization
- Cost classes: TM match score < 50 %: expensive; 50 % < TM match score < 70 %: medium; TM match score > 70 %: cheap
- Requirement: minimal disruption of this process & Machine Translation
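
The cost classes on this slide amount to a simple threshold rule on the TM match score. A minimal sketch, assuming scores are percentages from 0 to 100: the slide does not say which class a score of exactly 50 % or 70 % falls into, so the boundary handling below is an assumption, and the function name is illustrative.

```python
def tm_cost_class(match_score: float) -> str:
    """Map a TM match score (percent) to the slide's post-editing cost class."""
    if not 0.0 <= match_score <= 100.0:
        raise ValueError("match_score must be between 0 and 100")
    if match_score < 50.0:
        return "expensive"
    if match_score < 70.0:
        # Assumption: exactly 50 % counts as "medium".
        return "medium"
    # Assumption: exactly 70 % counts as "cheap".
    return "cheap"


if __name__ == "__main__":
    for score in (35, 62, 88):
        print(score, tm_cost_class(score))
```

Under this view, integrating MT pays off whenever it moves a segment from one class into a cheaper one.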

Key Integration Challenges
- Use MT to automatically upgrade some TM matches to a cheaper cost class, cf. Dynamic Translation Memory [Bicici and Dymetman, 2008]
- Link MT automatic evaluation metrics with post-editing cost
- Ensure that MT omissions are highlighted
- Enforce customer terminology
- Deal with markup, tags …
- Produce true-cased translations
- Integrate into pre-existing workflows!
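
As an illustration of the true-casing item, the sketch below restores capitalisation in lowercased MT output by choosing, for each word, the casing seen most often in cased training text. This is a common baseline and not necessarily the approach used in the project; the function names and the tiny corpus are hypothetical.

```python
from collections import Counter, defaultdict


def train_truecaser(cased_sentences):
    """Learn the most frequent surface form for each lowercased token."""
    counts = defaultdict(Counter)
    for sentence in cased_sentences:
        # Skip the first token: its capital letter is positional, not lexical.
        for token in sentence.split()[1:]:
            counts[token.lower()][token] += 1
    return {lower: forms.most_common(1)[0][0] for lower, forms in counts.items()}


def truecase(lowercased_sentence, model):
    """Apply learned casing, then capitalise the sentence-initial token."""
    tokens = [model.get(tok, tok) for tok in lowercased_sentence.split()]
    if tokens:
        tokens[0] = tokens[0][0].upper() + tokens[0][1:]
    return " ".join(tokens)


if __name__ == "__main__":
    corpus = [
        "The two sides highlighted the role of the WTO",
        "Officials said the WTO will meet in Geneva",
    ]
    model = train_truecaser(corpus)
    print(truecase("the two sides reaffirmed the role of the wto", model))
```

Words never seen in the training text are left unchanged, so a production system would need a fallback for unknown names and acronyms.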

Concluding Remarks
- For ILT, ramp-up is almost complete: over 30 new researchers in addition to pre-existing PIs, postdoctoral researchers and PhD students
- Strong interest from industrial partners, both large and small
- Input from LOC, DCM and SF
- Significant role in CNGL demonstrators
- Research tools → Industrial prototypes
- Well placed to succeed in going ‘beyond TMs’ …

Speech & Language Technologies in the NGL CSET
Thanks for listening! Questions?