L10N Standards Warszawa 2014

L10N Standards Warszawa

Why Standards?

Why have Standards?

L10N Standards
What are we going to cover:
1. Why L10N standards are important
2. The role XML has to play
3. Key L10N data standards
4. How to leverage L10N standards
5. Creating a totally data-driven, automated L10N process
6. Interoperability

Why have Standards?

Current State of the Art

L10N Typical Workflow

What you need is a better crane?

Localization without Standards
Customer source text → extract → extracted text → TM process → prepared text → translate → translated text → merge → target text → QA

True Cost of Translation

Standards = Uniform Data

ISO Standard

Standards = Efficiency

Standards = Lower Costs

Standards = Safe to Implement

Standards = Greater Interoperability

Standards: Unforeseen Benefits

Standards: Misuse

Standards: Abuse

Standards: Sabotage
Sabotaged standards:
– Proprietary extensions
– Bad implementations

The importance of XML
Everything is now XML:
– HTML/XHTML
– Web Services
– Adobe FrameMaker
– Microsoft Office
– OpenOffice
– ASP
– XAML
– Java Properties
– DITA
– Standards: TMX, XLIFF, SRX, GMX, TBX, xml:tm
– OAXAL: Open Architecture for XML Authoring and Localization

The power of XML
Any electronic format not in XML can be converted to XML:
– FrameMaker
– RTF
– Microsoft Office (pre-2007)
– QuarkXPress
– Windows resource files
– Java resources
– PO/POT
– YAML
– etc.
And then converted back into the original format.
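To make the "convert to XML and back" idea concrete, here is a minimal sketch for one of the listed formats, Java resources in `.properties` form. It is an illustration under simplifying assumptions, not a production filter: it handles only plain `key=value` lines (no escapes, no line continuations), drops comments, and uses a `<properties>/<entry key="...">` layout chosen for the example.

```python
import xml.etree.ElementTree as ET

def properties_to_xml(text):
    """Convert simple key=value lines into an XML tree.

    Simplification: no escape sequences, no line continuations;
    comment lines (# or !) and blank lines are skipped and not preserved.
    """
    root = ET.Element("properties")
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#!":
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        entry = ET.SubElement(root, "entry", key=key.strip())
        entry.text = value.strip()
    return root

def xml_to_properties(root):
    """Serialise the XML tree back to key=value lines."""
    lines = [f"{e.get('key')}={e.text or ''}" for e in root.iter("entry")]
    return "\n".join(lines) + "\n"
```

In a localization pipeline the translatable `entry` text would be replaced by translations while in XML form, and `xml_to_properties` would then rebuild the original file format.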

Benefits of XML for L10N
– Separation of form and content
– Should make documents easier to translate
– There are some critical design decisions
– Mistakes can hinder translatability
– XML can bootstrap its own localization

The significance of XML
– XML is not just another electronic format
– XML is an eXtensible syntax
– XML is a formal IT grammar
– XML is programmable
– XML can bootstrap its own localization

Benefits of XML for L10N
Why use XML for localization?
– Most localizable documents are now in XML
– One input format
– Elegant
– Uses the latest IT technology
– Separation of form and content
– One single data bus
– Based on open standards
– You can use XML to assist its own localization
– One extraction + TM + SMT engine

Core L10N Standards
– W3C ITS Document Rules
– ETSI LIS SRX
– ETSI LIS xml:tm
– ETSI LIS TMX
– ETSI LIS TBX
– ETSI LIS GMX
– OASIS XLIFF
– W3C/OASIS DITA (XHTML, DocBook, or any XML vocabulary)
– Linport
Interoperability:
– TIPP
– XLIFF:doc

ITS
Internationalization and Localization Tag Set
– Internationalization Tag Set
– Document Rules for a given XML vocabulary:
  – Inline elements (within text)
  – Sub-flows
  – Non-translatable
  – Translatable attributes
Guidelines for localizing XML documents
Internationalization and Localization Markup Requirements
Version 1.0, 2008
Version 2.0, 2013
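A sketch of how ITS "translate" rules can drive extraction: walk the document, inherit the translate flag from parent to child, let a global rule mark some elements as non-translatable, and let local `its:translate` attributes override both. The ITS namespace URI is the W3C one; everything else is a simplification assumed for illustration. In particular, real global rules (`its:translateRule`) use XPath selectors, which this sketch replaces with a plain set of element names (`no_translate_tags`), and inline elements are not kept as placeholders, so text around `<code>` is split into separate runs.

```python
import xml.etree.ElementTree as ET

ITS = "http://www.w3.org/2005/11/its"
TRANSLATE = f"{{{ITS}}}translate"  # the its:translate attribute in Clark notation

def translatable_text(root, no_translate_tags=frozenset()):
    """Collect text runs that should be translated.

    no_translate_tags is a simplified stand-in for global its:translateRule
    selectors: element names listed here are treated as translate="no".
    """
    out = []

    def walk(elem, inherited):
        flag = inherited  # translate flag is inherited from the parent element
        if elem.tag in no_translate_tags:
            flag = False  # "global rule"
        local = elem.get(TRANSLATE)
        if local is not None:
            flag = local == "yes"  # local markup overrides global rules
        if flag and elem.text and elem.text.strip():
            out.append(elem.text.strip())
        for child in elem:
            walk(child, flag)
            # a child's tail text belongs to the parent, so it follows the parent's flag
            if flag and child.tail and child.tail.strip():
                out.append(child.tail.strip())

    walk(root, True)
    return out
```

For `<p>Click <code>ls -l</code> to list files.</p>` with `code` declared non-translatable, only "Click" and "to list files." are extracted.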

TMX
Translation Memory eXchange
Current version 1.4b; 2.0 undergoing review
Allows for the interchange of translation memories between different vendor systems:
– No translation vendor lock-in
– Free exchange of translation assets
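As an illustration of what is being exchanged, the sketch below writes a minimal TMX 1.4 document (plain-text segments only, in the spirit of Level 1) from a list of source/target pairs. The element structure (`tmx`, `header`, `body`, `tu`, `tuv` with `xml:lang`, `seg`) and the header attribute names follow TMX 1.4; the tool name, version and `o-tmf` value are placeholders invented for the example.

```python
import xml.etree.ElementTree as ET

# ElementTree serialises this namespace with the reserved "xml" prefix.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def build_tmx(pairs, srclang, tgtlang):
    """Build a minimal TMX 1.4 tree from (source, target) sentence pairs."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "example-tool",  # placeholder tool name
        "creationtoolversion": "0.1",    # placeholder version
        "segtype": "sentence",
        "o-tmf": "example",              # placeholder original TM format
        "adminlang": "en-US",
        "srclang": srclang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")  # one translation unit per pair
        for lang, text in ((srclang, source), (tgtlang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.ElementTree(tmx)
```

Any TMX-aware tool can import such a file, which is the "no vendor lock-in" point of the slide.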

TMX History
First LISA OSCAR standard
– Version
– Version
– Version
– Version 1.4b 2002
Moved to ETSI LIS 2012
– Version ?
Two levels of implementation:
– Level 1 (plain text only)
– Level 2 (content markup)

SRX
Segmentation Rules eXchange
Current version
Defines how sentences are segmented
Allows for the exchange of segmentation rules using regular expressions
Complements the TMX standard
Referenced by XLIFF, TMX and xml:tm
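The core SRX idea can be sketched in a few lines: each rule has a "before break" and an "after break" regular expression plus a break/no-break decision, and at every position in the text the first rule whose two patterns match decides. This is a simplified model, not a full SRX engine; the `Rule` class and the two English rules are hypothetical, and Python's `re` module lacks some Unicode constructs (such as `\X` or `\p{Lu}`) that SRX rule sets may use.

```python
import re
from dataclasses import dataclass

@dataclass
class Rule:
    is_break: bool  # SRX break="yes" (True) or break="no" (False)
    before: str     # pattern corresponding to <beforebreak>
    after: str      # pattern corresponding to <afterbreak>

# Hypothetical English rules: an abbreviation exception first, then a general break rule.
RULES = [
    Rule(False, r"\b(?:Mr|Dr|Prof)\.", r"\s"),
    Rule(True, r"[.?!]+", r"\s"),
]

def segment(text, rules):
    """Split text at positions where the first matching rule says 'break'."""
    compiled = [(r.is_break, re.compile(f"(?:{r.before})\\Z"), re.compile(r.after))
                for r in rules]
    segments, start = [], 0
    for pos in range(1, len(text)):
        for is_break, before, after in compiled:
            # "before" must end exactly at pos; "after" must start exactly at pos
            if before.search(text, 0, pos) and after.match(text, pos):
                if is_break:
                    segments.append(text[start:pos])
                    start = pos
                break  # rules are ordered: the first match decides
    segments.append(text[start:])
    return [s.strip() for s in segments if s.strip()]
```

With these rules, "Dr. Smith arrived. He sat down! Was it late?" yields three segments; the exception rule stops a break after "Dr.".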

SRX Key Concepts
Unicode regular expression syntax defined:
– Metacharacters: Unicode regular expressions "\X", "\s", "\S", etc.
– Operators: "*", "|", "?", "+", etc.
Defines:
– Language rules: the segmentation rules
– Map rules: how to apply the segmentation rules
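The map-rule concept can also be sketched: a map rule is an ordered list of language patterns, each pointing to a named set of language rules. The function below returns the rule sets that apply to a given language code. It is modelled on SRX 2.0's cascade option (apply every matching set, or only the first). The pattern list, the rule-set names and the case-insensitive matching are assumptions of this example.

```python
import re

# Hypothetical map rule: ordered (language pattern, language rule name) pairs.
LANGUAGE_MAP = [
    (r"EN.*", "English"),
    (r"PL.*", "Polish"),
    (r".*", "Default"),
]

def rule_sets_for(lang_code, language_map, cascade):
    """Return the names of the language rule sets that apply to lang_code.

    cascade=True: every matching entry applies, in map order.
    cascade=False: only the first matching entry applies.
    Case-insensitive matching is an assumption of this sketch.
    """
    selected = []
    for pattern, rule_name in language_map:
        if re.fullmatch(pattern, lang_code, flags=re.IGNORECASE):
            selected.append(rule_name)
            if not cascade:
                break
    return selected
```

A catch-all `.*` entry at the end is a common way to give every language a default set of rules.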

GMX
Global Information Management Metrics eXchange
GMX-V approved as a LISA OSCAR standard, February 2007
Tripartite:
– GMX-V: Volume, published for public comment
– GMX-C: Complexity, initial specification
– GMX-Q: Quality
A standard for defining an L10N job
Allows for quantifying job complexity
GMX-V 2.0 approved by ETSI LIS:
– Added support for CJK word counts
– Overall character count including whitespace characters

GMX-V
GIM Metrics eXchange – Volume
Objectives:
– Unambiguous and verifiable definition of word and character counts
– A method of exchanging counts within an XML framework
Two types of count:
– Verifiable, based on electronic documents
– Non-verifiable
Canonical form: XLIFF-based
Word boundaries: Unicode TR29
Unicode character encoding
Minimum conformance:
– Total Character Count
– Total Word Count
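The two minimum-conformance counts can be illustrated with a short sketch. Two caveats matter here. First, it uses a naive regular expression for word boundaries instead of the full Unicode TR29 algorithm that GMX-V requires, and it does no CJK handling. Second, it assumes, consistent with the 2.0 addition of an "overall character count including whitespace", that the total character count excludes whitespace. The dictionary keys are invented names, not GMX-V element names.

```python
import re

# Naive word pattern: runs of word characters, allowing inner hyphens/apostrophes.
# This is an approximation, NOT the Unicode TR29 word-boundary algorithm.
WORD = re.compile(r"\w+(?:[-'’]\w+)*")

def counts(text):
    """Approximate GMX-V style counts for one segment of text."""
    return {
        "total_word_count": len(WORD.findall(text)),
        # assumed: the total character count excludes whitespace
        "total_character_count": sum(1 for ch in text if not ch.isspace()),
        # the GMX-V 2.0 style overall count including whitespace
        "character_count_with_whitespace": len(text),
    }
```

Because GMX-V fixes the counting rules, two tools given the same XLIFF-based canonical form should report the same numbers, which is what makes the counts verifiable.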

XLIFF
XLIFF – XML Localization Interchange File Format
Current status:
– XLIFF 1.1 Committee Specification (31 Oct 2003)
– XLIFF 1.2 approved as an OASIS Standard, 2008
  Segmentation support
  (X)HTML XLIFF 1.1 Representation Guide
  PO/POT XLIFF 1.1 Representation Guide
  Java/Windows/.NET Representation Guide
– XLIFF 2.0 currently out for public comment (not backwards compatible)

XLIFF

XLIFF
Single format for exchanging L10N data from disparate sources
– Lossless
– Tool-neutral
– Formalized as an XML vocabulary
– Can embed the skeleton file
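A minimal sketch of the XLIFF round trip: extracted strings go into `trans-unit` elements, translators fill in `target`, and the merge step reads the targets back by `id`. The namespace, the `version="1.2"` root and the `file` attributes (`original`, `source-language`, `target-language`, `datatype`) follow XLIFF 1.2. The file name, ids and strings are invented, and the skeleton is omitted.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
ET.register_namespace("", XLIFF_NS)  # serialise XLIFF elements without a prefix

def q(tag):
    """Qualify a tag name with the XLIFF 1.2 namespace."""
    return f"{{{XLIFF_NS}}}{tag}"

def build_xliff(units, original, srclang, tgtlang):
    """units: list of (id, source text, target text or None)."""
    root = ET.Element(q("xliff"), version="1.2")
    file_el = ET.SubElement(root, q("file"), {
        "original": original, "source-language": srclang,
        "target-language": tgtlang, "datatype": "plaintext"})
    body = ET.SubElement(file_el, q("body"))
    for uid, source, target in units:
        tu = ET.SubElement(body, q("trans-unit"), id=uid)
        ET.SubElement(tu, q("source")).text = source
        if target is not None:  # untranslated units carry only <source>
            ET.SubElement(tu, q("target")).text = target
    return root

def merge_targets(root):
    """Read translations back by id (None where no target exists yet)."""
    return {tu.get("id"): tu.findtext(q("target"))
            for tu in root.iter(q("trans-unit"))}
```

Because every tool reads and writes the same structure, the format stays tool-neutral: the producer of the file does not need to know which CAT tool fills in the targets.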

xml:tm
XML-based Text Memory
– Radical rethink of how to handle translation memory
– Donated by XML INTL to LISA OSCAR
– OSCAR standard, Feb 2007
– Adopted by ETSI LIS; version 2.0 ready for adoption
Takes the DITA reuse principle down to sentence level:
– Author memory
– Translation memory

xml:tm - Namespace
– Namespace is a major feature of XML
– Allows the mapping of different ontological entities onto the same representation
– Allows different ways to look at the same data
– Namespaces can be made transparent

xml:tm
XML-based text memory
– Revolutionary approach to translating XML documents
– First significant advance in translation memory technology
– Uses XML namespace to transparently embed contextual information
– The one ring that binds them all

xml:tm namespace
Example of the use of the tm namespace in an XML document:
Namespace is very flexible. It is very easy to use.

xml:tm namespace
[Diagram: the same document tree (doc, title, section, para) shown in two panels. "Source document tm namespace view": the text is held in te elements (text) containing tu elements (sentences). "Source document view": the same content seen as ordinary para text.]
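The two views in the diagram can be reproduced with a small sketch: sentences are wrapped in tm-namespaced `te`/`tu` elements inside an ordinary `para`. Collecting all text of the `para` gives the source document view, because the tm markup is transparent. Selecting the namespaced `tu` elements gives the tm namespace view. The namespace URI `urn:example:tm` and the id values are placeholders invented for this example, not the values defined by the xml:tm specification.

```python
import xml.etree.ElementTree as ET

TM = "urn:example:tm"  # placeholder namespace URI (hypothetical, not from xml:tm)

DOC = f"""<doc xmlns:tm="{TM}">
  <title>xml:tm namespace</title>
  <para><tm:te id="e1"><tm:tu id="u1.1">Namespace is very flexible.</tm:tu> <tm:tu id="u1.2">It is very easy to use.</tm:tu></tm:te></para>
</doc>"""

def source_view(root):
    """Paragraph text with the tm markup ignored (it is transparent)."""
    return ["".join(p.itertext()) for p in root.iter("para")]

def tm_view(root):
    """Sentence-level text units, keyed by their tu id."""
    return {tu.get("id"): tu.text for tu in root.iter(f"{{{TM}}}tu")}
```

Applications that do not know the tm namespace still see ordinary paragraphs, while translation tools can address each sentence by its id.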

xml:tm Text Memory
Author memory:
– Maintain memory of source text
– Authoring statistics
– Authoring tool input
Translation memory:
– Automatic alignment
– Maintain perfect link of source and target text
– Reduce translation costs

xml:tm DOM differencing
[Diagram: "Original Source Document" with text units tu id="1" to tu id="6"; "Updated Source Document" with tu id="1", "2", "3", "4", "7", "6" and "8", with units marked as deleted, modified and new.]
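The outcome shown in the diagram can be imitated with a small sketch. Unchanged sentences keep their tu ids, so their stored translations remain exact matches. A changed sentence gets a fresh id and the old unit is reported as modified. Added sentences get new ids. Real DOM differencing compares document trees; this example flattens each version into an ordered list of sentences and uses `difflib.SequenceMatcher` from the Python standard library as a stand-in. The numeric id scheme and `next_id` parameter are assumptions.

```python
from difflib import SequenceMatcher

def dom_diff(original, updated_texts, next_id):
    """Re-identify text units after a source update (flattened sketch).

    original: list of (tu_id, sentence) pairs from the previous version.
    updated_texts: sentences of the updated document, in order.
    next_id: first unused numeric tu id (hypothetical id scheme).
    Returns (units, changes): the new (tu_id, sentence) list and a status map.
    """
    old_texts = [text for _, text in original]
    matcher = SequenceMatcher(a=old_texts, b=updated_texts, autojunk=False)
    units, changes = [], {}
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # unchanged sentences keep their ids: their translations are reused
            units.extend(original[i1:i2])
            continue
        for tu_id, _ in original[i1:i2]:
            changes[tu_id] = "modified" if tag == "replace" else "deleted"
        for text in updated_texts[j1:j2]:
            new_id = str(next_id)
            next_id += 1
            units.append((new_id, text))
            changes[new_id] = "new"
    return units, changes
```

If sentence 5 is edited and a sentence is appended, the new id order comes out as 1, 2, 3, 4, 7, 6, 8, the same as the updated document in the diagram.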

xml:tm translated document in Polish
[Diagram: the translated document in two panels, "Translated document tm namespace view" (te/tu elements, labelled "tekst" = text and "zdanie" = sentence) and "Translated document view" (para text).]
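Because the translated document keeps the same te/tu ids as the source, the "automatic alignment" listed under xml:tm translation memory reduces to pairing units by id. The sketch below illustrates this. As before, the namespace URI `urn:example:tm` and the ids are placeholders, not values from the xml:tm specification.

```python
import xml.etree.ElementTree as ET

TM = "urn:example:tm"  # placeholder namespace URI (hypothetical, not from xml:tm)

def units(xml_text):
    """Map each tm:tu id to its text."""
    root = ET.fromstring(xml_text)
    return {tu.get("id"): "".join(tu.itertext()) for tu in root.iter(f"{{{TM}}}tu")}

def align(source_xml, target_xml):
    """Pair source and target sentences that share a tu id."""
    src, tgt = units(source_xml), units(target_xml)
    return [(uid, src[uid], tgt[uid]) for uid in src if uid in tgt]
```

No fuzzy sentence alignment is needed, which is why the slide can speak of a perfect link between source and target text.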

Putting It All Together

Open Architecture for XML Authoring and Localization (OAXAL)

OAXAL 2.0

OAXAL Benefits
– SOA (Service-Oriented Architecture)
– Open architecture
– Open standards, open APIs
– Easy exchange
– Modular design
– Interoperability
– Very high level of automation

Interoperability Now!/Linport
Interoperability Now!
– Born out of frustration and necessity
– Early 2012
Members:
– Bioloom Group
– Kilgray
– Medtronic
– Ontram
– Spartan Software
– XTM-INTL
The goal: true 100% round-trip interoperability between TMS/CAT tools
Now part of Linport

Interoperability Now!/Linport
Linport: Language INteroperability Portfolio
Created in 2012 by the merging of two initiatives:
– Multilingual Electronic Dossier
– The Container Project
Sponsored by:
– The European Union DG Translation
– JIAMCATT (Joint Inter-Agency Meeting on Computer-Assisted Translation and Terminology)

OAXAL in Action

Translating English Soccer Articles into Arabic 24x7

Browser-Based Workbench

OAXAL in Action

Contact details: Andrzej Zydroń