XLIFF 2.0 vs XLIFF 1.2 FEISGILTT Dublin June 2014 Yves Savourel ENLASO Corporation This presentation was made possible by.

Slides:



Advertisements
Similar presentations
EIONET Training Zope Page Templates Miruna Bădescu Finsiel Romania Copenhagen, 28 October 2003.
Advertisements

Table, List, Blocks, Inline Style
What is XML? a meta language that allows you to create and format your own document markups a method for putting structured data into a text file; these.
HTML: HyperText Markup Language Hello World Welcome to the world!
XHTML Basics Web pages used to be written exclusively in html
ISO DSDL ISO – Document Schema Definition Languages (DSDL) Martin Bryan Convenor, JTC1/SC18 WG1.
SPECIAL TOPIC XML. Introducing XML XML (eXtensible Markup Language) ◦A language used to create structured documents XML vs HTML ◦XML is designed to transport.
 Fundamentals of Web Design.  Describe the history and theory of XHTML  Understand the rules for creating valid XHTML documents  Apply a DTD to an.
An Introduction to XML Based on the W3C XML Recommendations.
Tutorial 6 Creating a Web Form
History Leading to XHTML
Visual Web Information Extraction With Lixto Robert Baumgartner Sergio Flesca Georg Gottlob.
Creating a Well-Formed Valid Document. 2 Objectives Introducing XHTML Creating a Well-Formed Document Creating a Valid Document Creating an XHTML Document.
XML(EXtensible Markup Language). XML XML stands for EXtensible Markup Language. XML is a markup language much like HTML. XML was designed to describe.
Guide To UNIX Using Linux Third Edition
XML Primer. 2 History: SGML vs. HTML vs. XML SGML (1960) XML(1996) HTML(1990) XHTML(2000)
Introduction to XML This material is based heavily on the tutorial by the same name at
Introducing HTML & XHTML:. Goals  Understand hyperlinking  Understand how tags are formed and used.  Understand HTML as a markup language  Understand.
(C) 2013 Logrus International Practical Visualization of ITS 2.0 Categories for Real World Localization Process Part of the Multilingual Web-LT Program.
JXON An Architecture for Schema and Annotation Driven JSON/XML Bidirectional Transformations David A. Lee Senior Principal Software Engineer Slide 1.
XP New Perspectives on XML Tutorial 4 1 XML Schema Tutorial – Carey ISBN Working with Namespaces and Schemas.
Creating a Simple Page: HTML Overview
Chapter 3 Working with Text and Cascading Style Sheets.
Unit 1 – Developing a Web Page. Objectives:  Learn the history of the Web and HTML  Describe HTML standards and specifications  Understand HTML elements.
XP Tutorial 9New Perspectives on Creating Web Pages with HTML, XHTML, and XML 1 Working with XHTML Creating a Well-Formed Valid Document Tutorial 9.
ULI101 – XHTML Basics (Part II) What is Markup Language? XHTML vs. HTML General XHTML Rules Block Level XHTML Tags XHTML Validation.
Lesson 4: Using HTML5 Markup.  The distinguishing characteristics of HTML5 syntax  The new HTML5 sectioning elements  Adding support for HTML5 elements.
Why XML ? Problems with HTML HTML design - HTML is intended for presentation of information as Web pages. - HTML contains a fixed set of markup tags. This.
CREATED BY ChanoknanChinnanon PanissaraUsanachote
Tutorial 1 Developing a Basic Web Page. New Perspectives on HTML, XHTML, and XML, Comprehensive, 3rd Edition Objectives – Lesson 1 Introduction to the.
1 © Netskills Quality Internet Training, University of Newcastle Introducing XML © Netskills, Quality Internet Training University.
HTML history, Tags, Element. HTML: HyperText Markup Language Hello World Welcome to the world!
Learning Web Design: Chapter 4. HTML  Hypertext Markup Language (HTML)  Uses tags to tell the browser the start and end of a certain kind of formatting.
1 CIS336 Website design, implementation and management (also Semester 2 of CIS219, CIS221 and IT226) Lecture 6 XSLT (Based on Møller and Schwartzbach,
TEXT ENCODING INITIATIVE (TEI) Inf 384C Block II, Module C.
What is XML?  XML stands for EXtensible Markup Language  XML is a markup language much like HTML  XML was designed to carry data, not to display data.
Sekimo Solutions mentioned by the TEI  CONCUR: an optional feature of SGML (not XML) that allows multiple.
Processing of structured documents Spring 2002, Part 2 Helena Ahonen-Myka.
INTRODUCTION. What is HTML? HTML is a language for describing web pages. HTML stands for Hyper Text Markup Language HTML is not a programming language,
1 Chapter 10: XML What is XML What is XML Basic Components of XML Basic Components of XML XPath XPath XQuery XQuery.
XP Tutorial 9 1 Working with XHTML. XP SGML 2 Standard Generalized Markup Language (SGML) A standard for specifying markup languages. Large, complex standard.
Waqas Anwar Next SlidePrevious Slide. Waqas Anwar Next SlidePrevious Slide XML XML stands for EXtensible Markup Language.
1 ADVANCED MICROSOFT WORD Lesson 14 – Editing in Workgroups Microsoft Office 2003: Advanced.
An Introduction to XML Sandeep Bhattaram
XML Introduction. What is XML? XML stands for eXtensible Markup Language XML stands for eXtensible Markup Language XML is a markup language much like.
What it is and how it works
Sheet 1XML Technology in E-Commerce 2001Lecture 2 XML Technology in E-Commerce Lecture 2 Logical and Physical Structure, Validity, DTD, XML Schema.
XML for Text Markup An introduction to XML markup.
XML Introduction. Markup Language A markup language must specify What markup is allowed What markup is required How markup is to be distinguished from.
XML 2nd EDITION Tutorial 4 Working With Schemas. XP Schemas A schema is an XML document that defines the content and structure of one or more XML documents.
1 Tutorial 14 Validating Documents with Schemas Exploring the XML Schema Vocabulary.
Tutorial 13 Validating Documents with Schemas
INT222 - Internet Fundamentals Shi, Yue (Sunny) Office: T2095 SENECA COLLEGE.
Introduction to XML XML – Extensible Markup Language.
ITS 2.0 in XLIFF 2 FEISGILTT Dublin June 2014 Yves Savourel ENLASO Corporation This presentation was made possible by.
XP New Perspectives on XML, 2 nd Edition Tutorial 7 1 TUTORIAL 7 CREATING A COMPUTATIONAL STYLESHEET.
Games: XML Presented by: Idham bin Mat Desa Mohd Sharizal bin Hamzah Mohd Radzuan bin Mohd Shaari Shukor bin Nordin.
XP Tutorial 9New Perspectives on HTML and XHTML, Comprehensive 1 Working with XHTML Creating a Well-Formed Valid Document Tutorial 9.
XP Review 1 New Perspectives on JavaScript, Comprehensive1 Introducing HTML and XHTML Creating Web Pages with HTML.
Tutorial 9 Working with XHTML. New Perspectives on HTML, XHTML, and XML, Comprehensive, 3rd Edition 2 Objectives Describe the history and theory of XHTML.
XML Extensible Markup Language
XML Notes taken from w3schools. What is XML? XML stands for EXtensible Markup Language. XML was designed to store and transport data. XML was designed.
NOTEPAD++ Lab 1 1 Riham ALSmari. Why Notepad++ ?  Syntax highlighting  Tabbed document interface  Zooming  Indentation code  Find and replace over.
SNU OOPSLA Lab. A Tour of XML © copyright 2001 SNU OOPSLA Lab.
Extensible Markup Language (XML) Pat Morin COMP 2405.
Creating a Well-Formed Valid Document
Part of the Multilingual Web-LT Program
Fundamentals of Python: First Programs
Part of the Multilingual Web-LT Program
Use Cases Simple Machine Translation (using Rainbow)
Presentation transcript:

XLIFF 2.0 vs XLIFF 1.2 FEISGILTT Dublin June 2014 Yves Savourel ENLASO Corporation This presentation was made possible by

Why a version specification had ambiguities and lacked constraints and processing requirements Issues to fix (e.g. cannot overlap) New features needed Portions of the 1.2 specification rarely used Despite those issues XLIFF 1.2 has been used successfully by many, in many scenarios  So XLIFF 2.0 should be used even more.

1.2 Issue: Specification ambiguities 2.0 remedies: Definitions try to be clearer Constraints and processing requirements are everywhere in the specification Provide more examples to illustrate the intent of the elements/attributes A test suite is available More and earlier implementations

1.2 Issue: Features creep 2.0 remedies: Mitigated through modularity (Descartes’ Method: break down large problems into small ones)  Allows to addresses smaller more specific problems one at a time: XLIFF 2 is a mosaic of “small standards” Plan for incremental changes of the specification. This is possible because of the modularity (Modifications have no or less impact on tools)

1.2 Issue: Too complex 2.0 remedies: Modularity again: it allows for simple to complex content: – Simple XLIFF = just the Core – Additional features added through modules  You implement/use only what you need in your workflow One way to do one thing: e.g. inline codes

2.0 vs 1.2 main differences 2.0 is not backward compatible with 1.x  Allows deep re-design - to break things into modules - different segmentation representation 1.2 has many features and they have not all been ported to 2.0  idea is to add specialized modules over time

Language pairs In 1.2: Each in the document can be in a different language pair. In 2.0: All in the document are in the same language pair. The srcLang and trgLang attributes are set on the element.

Start of document in 1.2 xliff 1 file+ header? skl? phase-group* (glossary|reference|note|tool count-group|prop-group| extension)* body 1 (group|trans-unit|bin-unit)*

Start of document in 2.0 xliff 1 file+ skeleton? (module|extension)* notes? (group|unit)+

The trans-unit in 1.2 trans-unit source 1 seg-source? mrk* target? mrk* (context-group|count-group| prop-group|note|alt-trans)* extension*

The unit in 2.0 unit (module|extension)* notes? originalData? (segment+|ignorable)* source 1 target?

Segmentation in 1.2 Sentence 1. Sentence 2. Sentence 1. Sentence 2. Phrase 1. Phrase 2.

Segmentation in 2.0 Sentence 1. Phrase 1. Sentence 2. Phrase 2.

Segmentation in 2.0 New canResegment attribute to allow or not to re-segment. Available on,, and (not available in 1.2) New order attribute on to have the target in a different order than the source (not needed in 1.2)

Target state 1.2 has state (final, needs-adaptation, needs-l10n, needs-review-adaptation, needs- review-l10n, needs-review-translation, needs- translation, new, signed-off, translated) 2.0 has state (initial, translated, reviewed, final) + subState with a custom value

Target state qualifier In 1.2: Mix of values for target in and (e.g. where the translation comes from as well as why it was rejected) In 2.0: – Translation Candidates module’s has a type and subType – No specific qualifier for the current translation in but subState is available

Inline content – Original codes In 1.2: Many elements (,,,,,,, ) and storing the original code is done within the segment. Also: conversion between equivalent elements (like and / ) is not lossless. In 2.0: Fewer elements:,, and. Original codes optionally stored outside the segment. Lossless conversion.

Inline content – Original codes In 1.2: Line 1. <BR>Line 2. In 2.0: <BR> … Line 1. Line 2.

Inline content – Original codes 1.2 has only one editing hint: clone (and not on all inline elements) 2.0 has canCopy (equivalent to clone ), canOverlap, canDelete and canReorder as well as mechanism to create new inline codes. Has also constraints and processing requirements associated with these flags.

Inline content – Sub-flows In 1.2: in element within the code content (possibly recursively) Or in a separate but without interoperable link. In 2.0: Elements for inline codes have a subFlows attribute to point to another where the sub-flow text is located.

Codes – Text representation 1.2: equiv-text attribute 2.0: two attributes: – equiv provides a text equivalent (same as 1.2: provides empty string, spaces, line breaks, etc. in plain text) – disp provides a user-friendly display (e.g. to display some context for a variable)

Inline content – Annotations In 1.2: In 2.0: (structural, not annotation) In 1.2: In 2.0:

Inline content – Annotations In 1.2: No way to have overlapping  Important obstacle to implement any type of annotation, for example another standard such as ITS. In 2.0: Use and lossless conversion with...

The translate attribute In 1.2: translate="yes|no" on, and (but no way to un- protect nested content) In 2.0: translate="yes|no" on:,, and  does not mean there is nothing to translate in the unit. More difficult to implement, but more powerful.

proposal In 1.2: Candidates are in with alttranstype=proposal (default) In 2.0: Candidates are marked up using the Translation Candidates module ( )

proposal In 1.2: applies to the whole source if it doesn’t have the mid attribute, to a segment when it does have it. In 2.0: applies to any span in the content. This allows candidates to match across segments, on segments, on sub- segments parts.

proposal In 1.2: match-quality is the only measurement available and it is equivalent to a similarity score. In 2.0: Several distinct values: – similarity (how source candidate is similar to source) – matchQuality (how “good” is the translation) – matchSuitability (overall indicator, can be used to sort candidates from the same origin).

previous-version In 1.2: The element with alttranstype="previous-version", etc. allows to store some level of track changes. In 2.0: The Change Tracking module allows to record successive versions of changes for various items.

Glossary In 1.2: The element is just a place to store custom glossary (no specification about the format, etc.) In 2.0: The Glossary module offers a simple format with the basic information: source, definition (optional), translations (optional).

element Supported through to the 2.0 Resource Data module (used along with ). The Resource Data module can also provides context information for the translators, such as screen shots, etc.

Size and Length Restriction 1.2: Attributes maxwidth, minwidth, maxbytes, minbytes and size-unit 2.0: The Size and Length Restriction module is a full-fledged specialized module allowing for profiles, specification of the encoding to use, the normalization to perform, handling on the inline code size/length, etc.

Extensions Not allowed everywhere and they have constraints and processing requirements Simply treat them like modules you don’t implement. The main difference between a module you don’t implement and an extension is that you MUST preserve the module and (only) SHOULD preserve the extension.

Extension points for elements In 1.2:,,,,, and In 2.0 Core:,, and

Extension points for attributes In 1.2:,,,,,,,,,,,,,,,,,,, and In 2.0 Core:,,,,, and

Metadata module Not in 1.2, but similar to the and elements in 1.0 (was deprecated in 1.1) Allows to carry basic custom information without defining your own namespace. Tools can offer generic edit/display for such metadata.

Inline content – Invalid characters No equivalent in 1.2 Some special characters cannot be represented in XML (e.g. control characters) In 2.0: Use where HHHH is the hexadecimal Unicode code of the character. Same as in LDML (Unicode’s Locale Data Markup Language)

Fragment identifiers No equivalent in 1.2 Several sets of IDs in XLIFF  Cannot use the usual #id notation (because id may be duplicated) More and more needed for linked data In 2.0: Specific fragment identifier syntax defined for XLIFF MIME type. Syntax supports modules and extensions.

Format Style module No equivalent in 1.2 In 2.0: Aim at offering a place where to define metadata needed to output an HTML “preview” of the document. The fs and subFs attributes fill that role.

Validation module No equivalent in 1.2 A “must-support” for QA tools Allows to check for basic presence or absence of strings or sub-strings, number of occurrences, etc. Options for normalization, case-sensitivity One aspect missing: regular expression (very difficult to standardize)

What about features not in 2.0? XLIFF 2 is meant to evolve TC needs to get the requirements, the proposals and the implementations for new features Future modules can be implemented using extensions first, then moved to a module (e.g. ITS Allowed Characters to replace the 1.2 charclass attribute).

Overall 2.0 is a better foundation for tools It can evolve incrementally with less (and even no) disruptions to existing 2.0 tools Somewhat more difficult to implement (many constraints and processing requirements)  But one can use libraries May take up more disk space  But packages are often zipped nowadays

Links XLIFF 2.0 Specification: XLIFF 1.2 Specification: TC Comment Mailing List: