L10N Standards Warszawa
Why Standards?
Why have Standards?
L10N Standards
What are we going to cover:
1. Why L10N standards are important
2. The role XML has to play
3. Key L10N data standards
4. How to leverage L10N standards
5. Creating a totally data-driven automated L10N process
6. Interoperability
Current State of Art
L10N Typical Workflow
What you need is a better crane!???
Localization without Standards
[Workflow diagram: customer source text → extract → extracted text → TM process → prepared text → translate → translated text → merge → target text → QA]
True Cost of Translation
Standards = Uniform Data
ISO Standard
Standards = Efficiency
Standards = Lower Costs
Standards = Safe to Implement
Standards = Greater Interoperability
Standards: Unforeseen Benefits
Standards: Misuse
Standards: Abuse
Standards: Sabotage
Sabotaged standards:
– Proprietary extensions
– Bad implementations
The importance of XML
Everything is now XML:
– HTML/XHTML
– Web Services
– Adobe FrameMaker
– Microsoft Office
– Open Office
– ASP
– XAML
– Java Properties
– DITA
Standards: TMX, XLIFF, SRX, GMX, TBX, xml:tm
OAXAL: Open Architecture for XML Authoring and Localization
The power of XML
Any electronic format not in XML can be converted to XML:
– FrameMaker
– RTF
– Microsoft Office pre-2007
– QuarkXPress
– Windows resource files
– Java resources
– PO/POT
– YAML
– Etc.
And then converted back into the original format
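The round trip described on this slide can be sketched for one of the listed formats. The example below is a minimal illustration only: it assumes a simplified Java .properties syntax (one key=value per line, no escapes or line continuations), the `properties` and `entry` element names are invented for the example, and comments are dropped rather than preserved.

```python
import xml.etree.ElementTree as ET


def properties_to_xml(text):
    """Convert simplified key=value lines into a small XML tree."""
    root = ET.Element("properties")  # hypothetical element names
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue  # blank lines and comments are dropped in this sketch
        key, _, value = line.partition("=")
        entry = ET.SubElement(root, "entry", key=key.strip())
        entry.text = value.strip()
    return root


def xml_to_properties(root):
    """Convert the XML tree back into key=value lines."""
    return "\n".join(f"{e.get('key')}={e.text or ''}" for e in root.iter("entry"))


source = "greeting=Hello\nfarewell=Goodbye"
tree = properties_to_xml(source)
print(ET.tostring(tree, encoding="unicode"))
print(xml_to_properties(tree))
```

A production filter would also have to carry comments, escapes and layout through the XML form so that the merge back is lossless.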
Benefits of XML for L10N
Separation of form and content
Should make documents easier to translate
There are some critical design decisions
Mistakes can hinder translatability
XML can bootstrap its own localization
The significance of XML
XML is not just another electronic format
XML is an eXtensible syntax
XML is a formal IT grammar
XML is programmable
XML can bootstrap its own localization
Benefits of XML for L10N
Why use XML for Localization?
Most localizable documents are now in XML
One input format
Elegant
Uses the latest IT technology
Separation of source and content
One single data bus
Open Standards based
You can use XML to assist its own localization
One extraction + TM + SMT engine
Core L10N Standards
W3C ITS Document Rules
ETSI LIS SRX
ETSI LIS xml:tm
ETSI LIS TMX
ETSI LIS TBX
ETSI LIS GMX
OASIS XLIFF
W3C/OASIS DITA (XHTML, DocBook, or any XML Vocabulary)
Linport Interoperability: TIPP, XLIFF:doc
ITS
Internationalization and Localization Tag Set
– Internationalization Tag Set
– Document Rules for a given XML vocabulary:
– Inline elements (within text)
– Sub flows
– Non-translatable
– Translatable attributes
Guidelines for localizing XML documents
Internationalization and Localization Markup Requirements
Version 1.0, 2008
Version 2.0, 2013
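To make the idea of document rules concrete, here is a minimal sketch of how a tool might read ITS 2.0 global rules and use them during extraction. The vocabulary (`doc`, `p`, `b`, `code`) and the helper names are invented for the example, and only selectors of the simple form `//name` are handled. A real ITS processor evaluates full XPath selectors and many more rule types.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"

# Hypothetical rules file for a made-up vocabulary (doc, p, b, code).
RULES = """<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="2.0">
  <its:translateRule selector="//code" translate="no"/>
  <its:withinTextRule selector="//b" withinText="yes"/>
</its:rules>"""

DOC = "<doc><p>Press <b>Enter</b> to run <code>make</code>.</p></doc>"


def selected_names(rules_text, rule, attr, value):
    """Element names picked by rules of one kind; only '//name' selectors are handled."""
    names = set()
    for el in ET.fromstring(rules_text).findall(f"{{{ITS_NS}}}{rule}"):
        selector = el.get("selector", "")
        if el.get(attr) == value and selector.startswith("//"):
            names.add(selector[2:])
    return names


def translatable_parts(doc_text, rules_text):
    """Text chunks of the document, skipping content of non-translatable elements."""
    skip = selected_names(rules_text, "translateRule", "translate", "no")
    parts = []

    def collect(el):
        if el.tag in skip:
            return
        if el.text:
            parts.append(el.text)
        for child in el:
            collect(child)
            if child.tail:  # text after a child belongs to the parent
                parts.append(child.tail)

    collect(ET.fromstring(doc_text))
    return parts


print(selected_names(RULES, "withinTextRule", "withinText", "yes"))
print(translatable_parts(DOC, RULES))
```

The point of the sketch is that the rules live outside the document, so the same extraction code can serve any XML vocabulary once its rules file exists.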
TMX
Translation Memory Exchange
Current version 1.4b, 2.0 undergoing review
Allows for the interchange of translation memories between different vendor systems
– No translation vendor lock-in
– Free exchange of translation assets
TMX History
First LISA OSCAR Standard
– Version
– Version
– Version
– Version 1.4b 2002
Moved to ETSI/LIS 2012
– Version ?
Two levels of implementation:
– Level 1 (Plain Text Only)
– Level 2 (Content Markup)
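As an illustration of what a Level 1 (plain text) exchange looks like in practice, the sketch below reads source/target pairs from a small TMX 1.4 document. The sample content, the header attribute values and the `load_pairs` helper are invented for the example. Inline markup (Level 2) is simply flattened to text here.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample memory with one translation unit.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="example" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Standards lower costs.</seg></tuv>
      <tuv xml:lang="pl"><seg>Standardy obniżają koszty.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def load_pairs(tmx_text, source_lang, target_lang):
    """Return (source, target) text pairs for the requested language pair."""
    pairs = []
    for tu in ET.fromstring(tmx_text).iter("tu"):
        segs = {
            tuv.get(XML_LANG): "".join(tuv.find("seg").itertext())
            for tuv in tu.findall("tuv")
        }
        if source_lang in segs and target_lang in segs:
            pairs.append((segs[source_lang], segs[target_lang]))
    return pairs


print(load_pairs(SAMPLE_TMX, "en", "pl"))
```

Because the format is vendor-neutral, any tool that can read these few elements can import the memory, which is what removes the lock-in.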
SRX
Segmentation Rules Exchange
Current version
Defines how sentences are segmented
Allows for the exchange of segmentation rules using regular expressions
Complements the TMX standard
Quoted by XLIFF, TMX and xml:tm
SRX Key Concepts
Unicode regular expression syntax defined
Meta characters
– Unicode regular expressions: "\X", "\s", "\S" etc.
Operators
– "*", "|", "?", "+" etc.
Defines:
– Language rules: segmentation rules
– Map rules: how to apply the segmentation rules
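The language-rule idea can be sketched as follows: each rule pairs a "before break" and an "after break" regular expression with a break/no-break flag, and the first rule that matches at a position decides. The rule set and the `segment` function are invented for the example, and Python's `re` module is used, which does not support every Unicode construct listed above (for instance "\X").

```python
import re

# Hypothetical rules in the spirit of SRX: (is_break, before_pattern, after_pattern).
# Order matters: exceptions come before the general break rule.
RULES = [
    (False, r"\b(?:Mr|Mrs|Dr|etc)\.", r"\s"),  # no break after common abbreviations
    (True, r"[.?!]+", r"\s+"),                 # break after sentence-ending punctuation
]


def segment(text, rules=RULES):
    """Split text into segments; the first matching rule at each position wins."""
    compiled = [
        (is_break, re.compile("(?:" + before + r")\Z"), re.compile(after))
        for is_break, before, after in rules
    ]
    segments, start = [], 0
    for pos in range(1, len(text)):
        for is_break, before, after in compiled:
            # 'before' must match text ending at pos, 'after' text starting at pos
            if before.search(text[:pos]) and after.match(text, pos):
                if is_break:
                    segments.append(text[start:pos].strip())
                    start = pos
                break
    tail = text[start:].strip()
    if tail:
        segments.append(tail)
    return segments


print(segment("Dr. Smith arrived. He was late! Was he?"))
```

Map rules would sit one level above this, choosing which rule list to apply for a given language code.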
GMX
Global Information Management Metrics eXchange
GMX/V approved as a LISA OSCAR Standard, February 2007
Tripartite
– GMX-V: Volume, published for public comment
– GMX-C: Complexity, initial specification
– GMX-Q: Quality
Standard for defining an L10N job
Allows for quantifying job complexity
GMX/V 2.0 approved by ETSI LIS
– added support for CJK word counts
– overall character count including white space characters
GMX-V
GIM Metrics eXchange – Volume
Objectives:
– Unambiguous and verifiable definition of word and character counts
– A method of exchanging counts within an XML framework
Two types of count:
– Verifiable, based on electronic documents
– Non-verifiable
Canonical form: XLIFF based
Word boundaries: Unicode TR29
Unicode character encoding
Minimum conformance
– Total Character Count
– Total Word Count
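The two minimum counts can be illustrated with a rough sketch. This is not a GMX-V conformant counter: word boundaries are approximated with a simple regular expression rather than the Unicode TR29 algorithm, the exact counting categories of the standard are not reproduced, and the `count_text` function and its result keys are invented for the example.

```python
import re
import unicodedata

# Approximation only: a "word" is a run of word characters, optionally joined
# by an apostrophe or hyphen. TR29 word boundaries are more precise.
WORD = re.compile(r"\w+(?:['’\-]\w+)*")


def count_text(text):
    """Return simple word and character totals for a piece of text."""
    normalized = unicodedata.normalize("NFC", text)
    return {
        "words": len(WORD.findall(normalized)),
        "characters": sum(1 for ch in normalized if not ch.isspace()),
        "characters_with_whitespace": len(normalized),
    }


print(count_text("Standards lower costs."))
```

The value of an agreed definition is that two tools counting the same file report the same totals, which a home-grown approximation like this one cannot guarantee.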
XLIFF
XLIFF – XML Localization Interchange File Format
Current status
– XLIFF 1.1 Committee Specification (31 Oct 2003)
– XLIFF 1.2 Approved as an OASIS Standard 2008
  Segmentation support
  (X)HTML XLIFF 1.1 Representation Guide
  PO / POT XLIFF 1.1 Representation Guide
  Java / Windows / .NET Representation Guide
– XLIFF 2.0 currently out for public comment (not backwards compatible)
XLIFF
XLIFF
Single format for exchanging L10N data from disparate sources
Lossless
Tool-neutral
Formalized as an XML vocabulary
Can embed skeleton file
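A small sketch of the tool-neutral aspect: any tool can open an XLIFF 1.2 file and find the units that still need translation. The sample file content and the `untranslated_units` helper are invented for the example. The skeleton mechanism and inline markup are not shown.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

# Hypothetical XLIFF 1.2 file with one translated and one untranslated unit.
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="hello.txt" source-language="en" target-language="pl" datatype="plaintext">
    <body>
      <trans-unit id="greeting">
        <source>Hello</source>
        <target>Cześć</target>
      </trans-unit>
      <trans-unit id="farewell">
        <source>Goodbye</source>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def untranslated_units(xliff_text):
    """Return (id, source text) for every trans-unit with no usable target."""
    pending = []
    for unit in ET.fromstring(xliff_text).iterfind(".//x:trans-unit", XLIFF_NS):
        target = unit.find("x:target", XLIFF_NS)
        if target is None or not (target.text or "").strip():
            source = unit.findtext("x:source", namespaces=XLIFF_NS)
            pending.append((unit.get("id"), source))
    return pending


print(untranslated_units(SAMPLE))
```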
xml:tm
XML based Text Memory
– Radical rethink of how to handle Translation Memory
– Donated by XML INTL to LISA OSCAR
– OSCAR Standard Feb 2007
– Adopted by ETSI LIS, version 2.0 ready for adoption
Takes the DITA reuse principle down to sentence level
– Author Memory
– Translation Memory
xml:tm - Namespace
Namespace is a major feature of XML
Allows the mapping of different ontological entities onto the same representation
Allows different ways to look at the same data
Namespaces can be made transparent
xml:tm
XML based text memory
Revolutionary approach to translating XML documents
First significant advance in translation memory technology
Uses XML namespace to transparently embed contextual information
The one ring that binds them all
xml:tm namespace
Example of the use of the tm namespace in an XML document:
Namespace is very flexible. It is very easy to use.
xml:tm namespace
[Diagram: the same document in two views. Source document view: doc, title, section, para, text. tm namespace view: tm, te, tu, sentence.]
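The two views can be sketched in code. The example below assumes a document in which text elements and text units are marked with `te` and `tu` elements in a separate namespace. The namespace URI, the sample markup and the helper names are assumptions made for illustration and should be checked against the xml:tm specification.

```python
import xml.etree.ElementTree as ET

TM_NS = "urn:xmlintl-tm-tags"  # assumed namespace URI, for illustration only

# Hypothetical document with text memory markup embedded in it.
SAMPLE = """<doc xmlns:tm="urn:xmlintl-tm-tags">
  <title><tm:te id="e1"><tm:tu id="u1">L10N Standards</tm:tu></tm:te></title>
  <para><tm:te id="e2"><tm:tu id="u2">Namespace is very flexible.</tm:tu> <tm:tu id="u3">It is very easy to use.</tm:tu></tm:te></para>
</doc>"""


def tm_view(root):
    """tm namespace view: every text unit with its identifier."""
    return [(tu.get("id"), "".join(tu.itertext())) for tu in root.iter(f"{{{TM_NS}}}tu")]


def source_view(root):
    """Source document view: each top-level element with its plain text."""
    return [(el.tag, "".join(el.itertext())) for el in root]


root = ET.fromstring(SAMPLE)
print(tm_view(root))
print(source_view(root))
```

The source view ignores the extra namespace entirely, which is what makes the embedded memory transparent to tools that only care about the document itself.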
xml:tm Text Memory
Author memory
– Maintain memory of source text
– Authoring statistics
– Authoring tool input
Translation memory
– Automatic alignment
– Maintain perfect link of source and target text
– Reduce translation costs
xml:tm DOM differencing
[Diagram: DOM differencing. Original source document: tu id="1" to tu id="6". Updated source document: tu id="1", "2", "3", "4", "7", "6", "8", with units marked deleted, modified and new.]
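The differencing step can be approximated with a short sketch: unchanged text units keep their identifiers, while modified and new units receive fresh ones. This version compares flat lists of unit texts with Python's `difflib` rather than walking two DOM trees, and the `diff_units` function and status labels are invented for the example.

```python
import difflib


def diff_units(old_units, new_texts, next_id):
    """Compare old (id, text) units with new texts.

    Returns (units, deleted_ids), where units is a list of (id, text, status).
    """
    old_texts = [text for _, text in old_units]
    matcher = difflib.SequenceMatcher(None, old_texts, new_texts, autojunk=False)
    units, deleted = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged units keep their original identifiers.
            for offset in range(i2 - i1):
                units.append((old_units[i1 + offset][0], new_texts[j1 + offset], "unchanged"))
            continue
        # Old units in a replaced or deleted range are retired.
        deleted.extend(unit_id for unit_id, _ in old_units[i1:i2])
        status = "modified" if tag == "replace" else "new"
        for text in new_texts[j1:j2]:
            units.append((next_id, text, status))
            next_id += 1
    return units, deleted


old = [(1, "A"), (2, "B"), (3, "C"), (4, "D"), (5, "E"), (6, "F")]
new = ["A", "B", "C", "D", "E changed", "F", "G"]
print(diff_units(old, new, next_id=7))
```

Only the modified and new units need to go to translation. Everything that kept its identifier can be matched back to its existing translation.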
xml:tm translated document in Polish
[Diagram: the translated document in two views, with the same structure as the source. Translated document view: doc, title, section, para, tekst. tm namespace view: tm, te, tu, zdanie.]
Putting It All Together
Open Architecture for XML Authoring and Localization (OAXAL)
OAXAL 2.0
OAXAL Benefits
SOA (Service Oriented Architecture)
Open Architecture
Open Standards - Open APIs
Easy Exchange
Modular design
Interoperability
Very high level of automation
Interoperability Now!/Linport
Interoperability Now!
– Born out of frustration and necessity
– Early 2012
Members
– Bioloom Group
– Kilgray
– Medtronic
– Ontram
– Spartan Software
– XTM-INTL
The goal: true 100% roundtrip interoperability between TMS/CAT tools
Now part of Linport
Interoperability Now!/Linport
Linport (LINPort: Language INteroperability Portfolio)
Created in 2012 by the merging of two initiatives:
– Multilingual Electronic Dossier
– The Container Project
Sponsored by: the European Union DG Translation
JIAMCATT (Joint Inter-Agency Meeting on Computer-Assisted Translation and Terminology)
OAXAL in Action
Translating English Soccer Articles into Arabic 24x7
Browser-Based Workbench
OAXAL In Action
Contact details: Andrzej Zydroń