Some Use Cases with the Current Okapi Framework Implementation of ITS 2.0
Prague – September 2012

Use Cases
Simple Machine Translation (using Rainbow)
Translation Package Creation (using Rainbow)
Moses Translation (M4Loc, sort of…) (using Tikal, a command-line tool)
Quality Check (using CheckMate)

Filters Design
[Diagram. Labels: File with ITS Markup; XML / HTML5 Filters; XML or HTML5 Parser; ITS Engine; Decorated Tree; Extractor / Mapper; Events-Driven API; Test Output Generator; ConformanceTest Output. Annotations: "Know about XML or ITS"; "Do not know about XML"; "Know about ITS data categories (not their notation)".]

Simple Machine Translation
XML and HTML5 documents are translated using a machine translation system, such as Microsoft Translator. The documents are extracted based on their ITS properties and the extracted content is sent to the translation server. The translated content is then merged back into its original XML or HTML5 format.
ITS data categories used: Translate, Locale Filter, Element Within Text, Preserve Space, (Domain)
The ITS markup provides the key information that drives the extraction in both XML and HTML5. Information such as white-space preservation can also be passed on to the extracted content to ensure a better output.

Simple Machine Translation
Translate - The non-translatable content is protected.
Locale Filter - Only the parts in the scope of the locale filter are extracted, the others are treated as 'do not translate' content.
Element Within Text - The information is used to decide what elements are extracted as in-line codes and sub-flows.
Preserve Space - The information is passed on to the extracted text unit.
(Domain) - The domain values are placed into a property that can be used to select an MT engine.
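To make the Translate behaviour concrete, here is a minimal sketch in Python of how local its:translate flags could decide which text gets extracted. It is not the Okapi filter code: the function name collect_translatable and the sample document are hypothetical, and the sketch handles only local its:translate attributes in XML (no global rules, no Locale Filter, no in-line code handling).

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
TRANSLATE_ATTR = f"{{{ITS_NS}}}translate"


def collect_translatable(elem, inherited=True, out=None):
    """Collect text that is in scope of its:translate="yes" (hypothetical helper).

    The flag is inherited by child elements unless they override it locally.
    """
    if out is None:
        out = []
    flag = elem.get(TRANSLATE_ATTR)
    translate = inherited if flag is None else (flag == "yes")
    if translate and elem.text and elem.text.strip():
        out.append(elem.text.strip())
    for child in elem:
        collect_translatable(child, translate, out)
        # Tail text sits after the child but belongs to this element,
        # so it follows this element's flag, not the child's.
        if translate and child.tail and child.tail.strip():
            out.append(child.tail.strip())
    return out


# Hypothetical sample document, not taken from the presentation.
SAMPLE = """<doc xmlns:its="http://www.w3.org/2005/11/its">
  <p>Press the <code its:translate="no">START</code> button.</p>
  <p its:translate="no">build-4711</p>
</doc>"""

segments = collect_translatable(ET.fromstring(SAMPLE))
print(segments)
```

A real filter would keep the protected code element inside the segment as an in-line code rather than splitting the sentence around it; the sketch only shows which text is in scope.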

Simple Machine Translation
[Diagram. Labels: File with ITS Markup; XML / HTML5 Filters; Raw Document to Filter Events; MS Batch Translation; Extracted Resources; Filter Events to Raw Document; Original Format. Annotations: "Know about XML or ITS"; "Do not know about XML or ITS notation".]
Demonstration…

Translation Package Creation
XML and HTML5 documents are extracted into a translation package based on XLIFF. The documents are extracted based on their ITS properties. The extracted content goes through various preparation steps and is saved into an XLIFF package. The ITS metadata passed on and carried by the extracted content are used by some steps.
ITS data categories used: Translate, Locale Filter, Element Within Text, Preserve Space, Id Value, Domain, Storage Size, External Resource, Terminology, Localization Note, Allowed Characters
The ITS markup provides the key information that drives the extraction in both XML and HTML5.
The documents to localize can be compared against an older version of the same documents, using the ID values to match the entries, so existing translations can be retrieved automatically.
Information such as the domain of the content, external references and localization notes is available in the XLIFF document, so any tool can use it to provide various kinds of translation assistance.
Terms in the source content are identified and can be matched against a terminology database.
Constraints on storage size and allowed characters can be verified directly by the translators as they work.

Translation Package Creation
Translate - The non-translatable content is protected.
Locale Filter - Only the parts in the scope of the locale filter are extracted, the others are treated as 'do not translate' content.
Element Within Text - The information is used to decide what elements are extracted as in-line codes and sub-flows.
Preserve Space - The information is mapped to xml:space.
Id Value - The value is mapped to the name of the extracted text unit.
Domain - The values are placed into an <okp:itsDomains> element.
Storage Size - The size is placed in maxbytes, and the native ITS markup is used for the other properties.
External Resource - The URI is placed in an okp:itsExternalResource attribute.
Terminology - The terminology information is placed into a specialized XLIFF note element.
Localization Note - The text is placed into an XLIFF note.
Allowed Characters - The pattern is placed in its:allowedCharacters.
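The mappings listed above can be sketched as a small Python function that builds one XLIFF-style trans-unit from extracted metadata. This is an illustration under assumptions, not the Okapi XLIFF writer: the function make_trans_unit and the metadata keys are hypothetical, placing its:allowedCharacters on the trans-unit element is an assumption, and using the XLIFF resname attribute to carry the text unit name is also an assumption (the slide only says the Id Value becomes the name of the text unit). Domain, External Resource and Terminology are left out because their exact okp: markup is not shown in the slides.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"
ITS_NS = "http://www.w3.org/2005/11/its"
ET.register_namespace("its", ITS_NS)


def make_trans_unit(unit_id, source_text, meta):
    """Build a trans-unit element from ITS-derived metadata (hypothetical helper)."""
    tu = ET.Element("trans-unit", {"id": unit_id})
    if meta.get("id_value"):
        # Assumption: the text unit name is written out as resname.
        tu.set("resname", meta["id_value"])
    if meta.get("preserve_space"):
        tu.set(f"{{{XML_NS}}}space", "preserve")
    if "storage_size" in meta:
        tu.set("maxbytes", str(meta["storage_size"]))
    if "allowed_characters" in meta:
        # Assumption: the attribute is carried on the trans-unit itself.
        tu.set(f"{{{ITS_NS}}}allowedCharacters", meta["allowed_characters"])
    ET.SubElement(tu, "source").text = source_text
    if meta.get("loc_note"):
        ET.SubElement(tu, "note").text = meta["loc_note"]
    return tu


# Hypothetical metadata for one extracted text unit.
unit = make_trans_unit(
    "1",
    "Save file",
    {
        "id_value": "btn.save",
        "preserve_space": True,
        "storage_size": 20,
        "allowed_characters": "[a-zA-Z ]",
        "loc_note": "Button label",
    },
)
print(ET.tostring(unit, encoding="unicode"))
```

Carrying the constraints on the unit itself is what lets a downstream tool check them without going back to the original XML or HTML5 file.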

Translation Package Creation
[Diagram. Labels: File with ITS Markup; XML / HTML5 Filters; Raw Document to Filter Events; ID-Based Leveraging; Package Creation; Extracted Resources; XLIFF Package. Annotations: "Know about XML or ITS"; "Do not know about XML or ITS notation".]
Demonstration…

Moses Translation (M4Loc)
XML and HTML5 documents are translated using Moses through the M4Loc scripts.
Note: In this demo we use sed instead of the M4Loc scripts.
The documents are extracted based on their ITS properties by Tikal and converted into an intermediate format. The temporary files are run through the translation process. Tikal is then used again to create a translated version of the XML and HTML5 documents based on the original source documents and the translated intermediate files.
ITS data categories used: Translate, Locale Filter, Element Within Text, Preserve Space, Domain
The ITS markup provides the key information that drives the extraction in both XML and HTML5. Information such as white-space preservation can also be passed on to the extracted content to ensure a better output.

Moses Translation (M4Loc)
Translate - The non-translatable content is protected.
Locale Filter - Only the parts in the scope of the locale filter are extracted, the others are treated as 'do not translate' content.
Element Within Text - The information is used to decide what elements are extracted as in-line codes and sub-flows.
Preserve Space - The information is passed on to the extracted text unit.
(Domain) - The domain values are placed into a property that can be used to select an MT engine.

Moses Translation (M4Loc)
[Diagram. Labels: File with ITS Markup; Raw Document to Filter Events; Moses File Creation (Tikal -xm); Moses Source; M4Loc (sed in this demo); Moses Translation; Raw Document to Filter Events; Moses Leveraging (Tikal -lm); Filter Events to Raw Document; Translated Document.]
Demonstration…
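The middle step of this pipeline, where the demo substitutes sed for the M4Loc scripts, can be imitated with a few lines of Python. The sketch below is only a stand-in for that step: the function stand_in_translate and the replacement rule are hypothetical, and it assumes the intermediate file holds one segment per line, so the output must keep the same number of lines for the later merge to line up. It does not call Tikal or Moses.

```python
import re


def stand_in_translate(lines, rules):
    """Apply sed-like substitutions line by line (hypothetical stand-in for MT).

    One output line is produced per input line, so segment alignment is kept.
    """
    translated = []
    for line in lines:
        for pattern, replacement in rules:
            line = re.sub(pattern, replacement, line)
        translated.append(line)
    return translated


# Hypothetical intermediate content and rule, for illustration only.
source_lines = ["Open the file.", "Close the file."]
rules = [(r"\bfile\b", "fichier")]

target_lines = stand_in_translate(source_lines, rules)
for src, tgt in zip(source_lines, target_lines):
    print(f"{src} -> {tgt}")
```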

Quality Check
XML, HTML5 and XLIFF documents are read along with their ITS metadata and loaded into CheckMate, a tool that performs various quality verifications. The XML and HTML5 documents are extracted based on their ITS properties, and their ITS metadata are mapped into the extracted content. The XLIFF document is also extracted and its ITS-equivalent metadata are mapped as well. The constraints defined with ITS are verified using CheckMate.
ITS data categories used: Translate, Locale Filter, Element Within Text, Preserve Space, Id Value, Storage Size, Allowed Characters
The ITS markup provides the key information that drives the extraction in both XML and HTML5. The set of ITS metadata carried in the files allows the three file formats to be handled the same way by the verification tool.

Quality Check
Translate - The non-translatable content is protected.
Locale Filter - Only the parts in the scope of the locale filter are extracted, the others are treated as 'do not translate' content.
Element Within Text - The information is used to decide what elements are extracted as in-line codes and sub-flows.
Preserve Space - The information is mapped to the preserveSpace field in the extracted text unit.
Id Value - The ids are used to identify the entries with an issue.
Storage Size - The content is verified against the storage size constraints.
Allowed Characters - The content is verified against the pattern matching allowed characters.
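The last three checks can be sketched in Python as follows. This is not CheckMate's implementation: the function check_unit and its parameters are hypothetical, the storage size is assumed to be a byte count in a given encoding (UTF-8 here), and the allowed-characters pattern is assumed to be a simple character class that Python's re module can evaluate against one character at a time.

```python
import re


def check_unit(unit_id, text, storage_size=None, encoding="utf-8",
               allowed_characters=None):
    """Return a list of issue messages for one text unit (hypothetical helper)."""
    issues = []
    if storage_size is not None:
        used = len(text.encode(encoding))
        if used > storage_size:
            issues.append(
                f"{unit_id}: {used} bytes exceeds storage size {storage_size}"
            )
    if allowed_characters is not None:
        pattern = re.compile(allowed_characters)
        # Every character of the content must match the pattern on its own.
        bad = sorted({ch for ch in text if not pattern.fullmatch(ch)})
        if bad:
            issues.append(f"{unit_id}: disallowed characters {bad}")
    return issues


# Hypothetical entry: the id is what lets a report point back to the unit.
issues = check_unit("title1", "Café 2", storage_size=6,
                    allowed_characters="[A-Za-z ]")
for issue in issues:
    print(issue)
```

Because the checks only look at the text unit and its metadata, the same function would apply regardless of whether the unit came from XML, HTML5 or XLIFF, which is the point made on the previous slide.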

Quality Check
[Diagram. Labels: File with ITS Markup; Raw Document to Filter Events; Quality Check; CheckMate; On-Screen List; Report Output.]
Demonstration…

More Information
Project wiki: http://www.opentag.com/okapi/wiki/
Project source code: http://code.google.com/p/okapi/
Continuous integration: https://okapi.ci.cloudbees.com/
Maven repositories: http://repository-okapi.forge.cloudbees.com/release/ and http://repository-okapi.forge.cloudbees.com/snapshot/
Developers mailing list: https://groups.google.com/group/okapi-devel/