xml:tm: XML Text Memory. Using XML technology to reduce the cost of translating XML documents.

Presentation transcript:

Computational Linguistic Methodologies
Machine Translation
Translation Memory
Hybrid
Linguistic Inferencing Engines
Terminology

Automating Translation
Machine translation
40-year history
Rigorous control of grammar and terminology can produce very good results
Enormous amount of work left to achieve free format translation

Translation Memory
Align source and target text
Look up new text against memory
Relatively primitive technology
No advance over past 30 years
Need for proofing
Proprietary translation memory formats

Translating XML Documents
XML inherently easier to translate
Separation of form and content
Support for Unicode and other international encoding formats
Allows multiple output formats: PDF, XHTML, WAP

XML Translation Standards
LISA - Localization Industry Standards Association
OASIS - Organization for the Advancement of Structured Information Standards
W3C - World Wide Web Consortium
OLIF Consortium

LISA Standards
TMX - Translation Memory Exchange format
TBX - Termbase Exchange format
SRX - Segmentation Rules Exchange format
GMX - GILT Metrics Exchange format

OASIS L10N Standards
XLIFF - XML Localization Interchange File Format: open.org/committees/tc_home.php?wg_abbrev=xliff
TransWS - Translation Web Services: open.org/committees/tc_home.php?wg_abbrev=trans-ws
DITA - Darwin Information Typing Architecture: open.org/committees/tc_home.php?wg_abbrev=dita

W3C and OLIF
W3C ITS
OLIF - Open Lexicon Interchange Format

XML namespace
Major feature of XML
Allows the mapping of different ontological entities onto the same representation
Allows different ways to look at the same data
Namespaces can be made transparent

xml:tm XML based text memory
Revolutionary approach to translating XML documents
First significant advance in translation memory technology
Uses XML namespace to transparently embed contextual information

xml:tm namespace
Text Memory namespace
Can be mapped onto any XML document
Vertical view of document in terms of ‘text segments’
Can be totally transparent

xml:tm namespace
Example of the use of tm namespace in an XML document:
Namespace is very flexible. It is very easy to use.

xml:tm namespace
[Figure: the original document view (doc, title, section, para, text) shown next to the tm namespace view (tm, te, tu), in which each sentence is held in a tu inside a te.]

xml:tm namespace
[Figure: the text "Namespace is very simple. It is easy to use." shown in the original document view as text and in the tm namespace view as a te with one tu per sentence.]
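The embedding of the tm namespace into a document can be sketched in Python. This is a minimal illustration, not the xml:tm specification: only the element names te and tu come from the slides, while the namespace URI `urn:example:tm`, the identifier scheme (`e1`, `u1.1`), the naive sentence splitter and the function names are assumptions made for the example.

```python
import re
import xml.etree.ElementTree as ET

# Assumed namespace URI, for illustration only.
TM_NS = "urn:example:tm"
ET.register_namespace("tm", TM_NS)


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def add_text_memory(root, text_tags=("title", "para")):
    """Wrap the text of each listed element in a te, with one tu per sentence."""
    te_count = 0
    # Copy the element list first because the tree is modified in the loop.
    for elem in list(root.iter()):
        if elem.tag not in text_tags or not (elem.text and elem.text.strip()):
            continue
        te_count += 1
        te = ET.Element(f"{{{TM_NS}}}te", {"id": f"e{te_count}"})
        for n, sentence in enumerate(split_sentences(elem.text), start=1):
            tu = ET.SubElement(te, f"{{{TM_NS}}}tu", {"id": f"u{te_count}.{n}"})
            tu.text = sentence
        elem.text = None
        elem.insert(0, te)
    return root


doc = ET.fromstring(
    "<doc><title>xml:tm</title><section>"
    "<para>Namespace is very flexible. It is very easy to use.</para>"
    "</section></doc>"
)
add_text_memory(doc)
print(ET.tostring(doc, encoding="unicode"))
```

Because the added elements live in their own namespace, a consumer that only looks at doc, title, section and para can skip them, which is one reading of "can be totally transparent".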

xml:tm Text Memory
Author memory
–Maintain memory of source text
–Authoring statistics
–Authoring tool input
Translation memory
–Automatic alignment
–Maintain perfect link of source and target text
–Reduce translation costs

xml:tm DOM differencing
Source Document: tu id="1", tu id="2", tu id="3", tu id="4", tu id="5", tu id="6"
Updated Source Document: tu id="1", tu id="3", tu id="4", tu id="7" (modified, origid="5"), tu id="6", tu id="8" (new)
Deleted: tu id="2"

xml:tm Author Memory
Namespace aware DOM differencing
Identify changes from the previous version
Unique text unit identifiers are maintained
Modification history
Text units can be loaded into a database
Authoring environment integration
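The differencing step can be approximated as follows. This is a simplified sketch that compares flat lists of text units with Python's `difflib` rather than performing namespace aware DOM differencing; the function name, the `status` values and the dictionary layout are assumptions, while `origid` follows the attribute shown on the differencing slide.

```python
from difflib import SequenceMatcher


def diff_text_units(old_units, new_texts, next_id):
    """Compare the text units of two versions of a document.

    old_units: list of (id, text) pairs from the previous version
    new_texts: list of sentences from the updated version
    next_id:   first identifier that has never been used
    Returns (updated_units, deleted_ids).
    """
    old_texts = [text for _, text in old_units]
    matcher = SequenceMatcher(a=old_texts, b=new_texts, autojunk=False)
    updated, deleted = [], []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            # Unchanged units keep their identifiers.
            for uid, text in old_units[i1:i2]:
                updated.append({"id": uid, "text": text, "status": "unchanged"})
            continue
        old_block = old_units[i1:i2]
        new_block = new_texts[j1:j2]
        for k, text in enumerate(new_block):
            unit = {"id": next_id, "text": text, "status": "new"}
            if op == "replace" and k < len(old_block):
                # A rewritten unit gets a new id and remembers the old one.
                unit["status"] = "modified"
                unit["origid"] = old_block[k][0]
            updated.append(unit)
            next_id += 1
        paired = len(new_block) if op == "replace" else 0
        deleted.extend(uid for uid, _ in old_block[paired:])
    return updated, deleted


old = [(1, "A."), (2, "B."), (3, "C."), (4, "D."), (5, "E."), (6, "F.")]
new = ["A.", "C.", "D.", "E, revised.", "F.", "G."]
units, removed = diff_text_units(old, new, next_id=7)
for unit in units:
    print(unit)
print("deleted:", removed)
```

The sample data mirrors the differencing slide: unit 2 is deleted, unit 5 is rewritten and unit 8 is new.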

xml:tm Translation Memory
The tm namespace can be used to create XLIFF files
Automatic alignment of source and target languages
Allows for more focused translation matching
–Perfect matching
–Leveraged matching from document - identical text
–Leveraged matching from database
–Modified text unit matching
–Linguistically enhanced fuzzy matching
–Non translatable text unit identification

xml:tm translation
Source Document: tu id="1", tu id="2", tu id="3", tu id="4", tu id="5", tu id="6"
Translated Document: tu id="1", tu id="2", tu id="3", tu id="4", tu id="5", tu id="6"
XLIFF Document: trans-unit id="1", trans-unit id="2", trans-unit id="3", trans-unit id="4", trans-unit id="5", trans-unit id="6"
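The relationship between text units and XLIFF trans-units can be sketched as below. This is an illustration under assumptions: it builds an XLIFF 1.x style skeleton (xliff, file, body, trans-unit, source, target) without the XLIFF namespace declaration, and the function name `build_xliff`, the default file name and the sample sentences are invented for the example. The slides do not say which XLIFF version was used.

```python
import xml.etree.ElementTree as ET


def build_xliff(units, source_lang, target_lang, original="document.xml"):
    """Build an XLIFF-style tree from (tu_id, source, target_or_None) tuples.

    Each text unit becomes a trans-unit with the same id, so the translated
    text can later be matched back to its source unit.
    """
    xliff = ET.Element("xliff", {"version": "1.1"})
    file_el = ET.SubElement(xliff, "file", {
        "original": original,
        "source-language": source_lang,
        "target-language": target_lang,
        "datatype": "xml",
    })
    body = ET.SubElement(file_el, "body")
    for tu_id, source, target in units:
        trans_unit = ET.SubElement(body, "trans-unit", {"id": str(tu_id)})
        ET.SubElement(trans_unit, "source").text = source
        if target is not None:
            # A target is only present when a match was already found.
            ET.SubElement(trans_unit, "target").text = target
    return xliff


units = [
    (1, "Namespace is very flexible.", None),
    (2, "It is very easy to use.", None),
]
print(ET.tostring(build_xliff(units, "en", "pl"), encoding="unicode"))
```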

xml:tm translated document
[Figure: the translated document view and the translated tm namespace view. The structure (doc, title, section, para, tm, te, tu) is the same as in the source document; only the text is translated, shown here with the Polish words "tekst" (text) and "zdanie" (sentence).]

xml:tm perfect alignment
Source Document: tu id="1", tu id="2", tu id="3", tu id="4", tu id="5", tu id="6"
Translated Document: tu id="1", tu id="2", tu id="3", tu id="4", tu id="5", tu id="6"
Perfect alignment
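Because source and translated documents carry the same text unit identifiers, alignment reduces to a lookup by id. The sketch below assumes the namespace URI `urn:example:tm` and the helper names `text_units` and `align`; none of these come from the slides, and the sample sentences are made up.

```python
import xml.etree.ElementTree as ET

TM_NS = "urn:example:tm"  # assumed URI, for illustration only
TU = f"{{{TM_NS}}}tu"


def text_units(xml_text):
    """Return {tu id: text} for every tu element in the document."""
    root = ET.fromstring(xml_text)
    return {tu.get("id"): "".join(tu.itertext()).strip() for tu in root.iter(TU)}


def align(source_xml, target_xml):
    """Pair source and target text units that share an identifier."""
    source = text_units(source_xml)
    target = text_units(target_xml)
    return [(uid, source[uid], target[uid]) for uid in source if uid in target]


source_doc = (
    '<doc xmlns:tm="urn:example:tm"><para><tm:te id="e1">'
    '<tm:tu id="1">This is text.</tm:tu><tm:tu id="2">This is a sentence.</tm:tu>'
    '</tm:te></para></doc>'
)
target_doc = (
    '<doc xmlns:tm="urn:example:tm"><para><tm:te id="e1">'
    '<tm:tu id="1">To jest tekst.</tm:tu><tm:tu id="2">To jest zdanie.</tm:tu>'
    '</tm:te></para></doc>'
)
for uid, src, tgt in align(source_doc, target_doc):
    print(uid, "|", src, "|", tgt)
```

No statistical sentence alignment is needed in this sketch, since the pairing is carried entirely by the identifiers.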

xml:tm matching
[Figure: text units of the updated source document (tu id="1", "2", "3", "4", "7", "6", "8", "9") matched against the target document. Labels in the figure: perfect matching, fuzzy match, doc leveraged match, DB leveraged match, non translatable, new:same, requires translation, requires proofing, requires no translation.]
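The match types named on the matching slides could be assigned along the following lines. This is a hedged sketch, not the xml:tm matching algorithm: the order of the checks, the 0.75 threshold, the rule that text without letters is non translatable, and the use of `difflib` character similarity in place of linguistically enhanced fuzzy matching are all assumptions, as are the function and label names.

```python
from difflib import SequenceMatcher


def classify_unit(unit, previous, doc_memory, db_memory, fuzzy_threshold=0.75):
    """Return (match_type, suggested_translation) for one updated text unit.

    unit:       dict with 'id', 'text' and optionally 'origid'
    previous:   {tu id: (source, target)} from the last translated version
    doc_memory: {source: target} for text translated elsewhere in the document
    db_memory:  {source: target} for text held in a database
    """
    text = unit["text"]
    if not any(ch.isalpha() for ch in text):
        return "non-translatable", text
    if unit["id"] in previous and previous[unit["id"]][0] == text:
        return "perfect", previous[unit["id"]][1]
    if text in doc_memory:
        return "leveraged-document", doc_memory[text]
    if text in db_memory:
        return "leveraged-database", db_memory[text]
    # Fuzzy candidates: both memories plus the unit this one was modified from.
    candidates = dict(db_memory)
    candidates.update(doc_memory)
    if unit.get("origid") in previous:
        src, tgt = previous[unit["origid"]]
        candidates[src] = tgt

    def similarity(candidate):
        return SequenceMatcher(None, candidate, text).ratio()

    best = max(candidates, key=similarity, default=None)
    if best is not None and similarity(best) >= fuzzy_threshold:
        return "fuzzy", candidates[best]
    return "new", None


previous = {
    1: ("Hello world.", "Witaj świecie."),
    5: ("Press the red button.", "Naciśnij czerwony przycisk."),
}
doc_memory = {src: tgt for src, tgt in previous.values()}
db_memory = {"Save the file.": "Zapisz plik."}
for unit in [
    {"id": 1, "text": "Hello world."},
    {"id": 7, "text": "Press the green button.", "origid": 5},
    {"id": 8, "text": "Hello world."},
    {"id": 9, "text": "Save the file."},
    {"id": 2, "text": "12345"},
]:
    print(unit["id"], classify_unit(unit, previous, doc_memory, db_memory))
```

Only the perfect match reuses a translation on the strength of an unchanged identifier and unchanged text; every other suggestion in this sketch would still be passed to a translator or proofreader.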

Traditional Translation Scenario
[Figure: workflow between Publishing and Translation: source text, extract, extracted text, tm process, prepared text, translate, translated text, merge, target text, QA.]

xml:tm Translation Scenario
[Figure: workflow between Publishing and Translator: xml source text, extract, extracted text, tm process, prepared text, translate, merge, xml target text, QA. Other labels in the figure: perfect matching, leveraged matching, Automatic Process (twice), Web, Web service/interface.]

xml:tm benefits
Enterprise level scalability
Totally integrated within the XML framework
Source text is automatically extracted and matched
Word counts are controlled by the customer
Text can be presented for translation via the web
Online composition
The most up to date translation is held by the customer
Data is merged automatically at end of translation cycle
All memory operations are totally automated
Can be used transparently for relay translations
Much cheaper to run
More accurate – better matching

xml:tm
Fully specified XML based standard: xml-tm.html
Maintained by xml-intl.com
Detailed article on
Offered for consideration as a LISA standard

Any questions? xml:tm