From Code to XLIFF: Bridging the Chasm
Dr. Stephen Flinter, Connect Global Solutions
LRC Conference, 19 November 2003

Agenda
The XLIFF Transformation Problem
Current approaches
Grammar-based approach: XPG
XPG & XML
Summary

The XLIFF Transformation Problem

The Problem
XLIFF has made the representation of resources translation- and localisation-friendly
Converting existing files to XLIFF is non-trivial
Adding new file formats can be painful

XLIFF Transformation
Definition: XLIFF transformation is the process by which native file formats are transformed into XLIFF, and from XLIFF back to their native format (after translation).
File formats include: Java, .properties, XML, HTML and custom formats.
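
To make the definition concrete, here is a minimal Java sketch of the output side of the forward transformation: it wraps already-extracted (id, text) pairs in an XLIFF 1.2 document. It assumes XLIFF 1.2 element and attribute names; the class `XliffWriter`, its `write` method and the sample file name are hypothetical, and the reverse (merge) direction is not shown.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Sketch: wraps extracted (id, text) pairs in an XLIFF 1.2 document.
 * XliffWriter is a hypothetical helper, not part of any existing tool.
 */
final class XliffWriter {

    /** Escapes the characters that are significant in XML text and attribute values. */
    static String escapeXml(String s) {
        return s.replace("&", "&amp;")   // must run first so later entities are not re-escaped
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** Builds one trans-unit per map entry, in the map's iteration order. */
    static String write(String original, String datatype, String sourceLanguage,
                        Map<String, String> units) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<xliff version=\"1.2\" xmlns=\"urn:oasis:names:tc:xliff:document:1.2\">\n");
        // original, datatype and source-language are required attributes of <file> in XLIFF 1.2
        sb.append("  <file original=\"").append(escapeXml(original))
          .append("\" datatype=\"").append(escapeXml(datatype))
          .append("\" source-language=\"").append(escapeXml(sourceLanguage))
          .append("\">\n");
        sb.append("    <body>\n");
        for (Map.Entry<String, String> unit : units.entrySet()) {
            sb.append("      <trans-unit id=\"").append(escapeXml(unit.getKey())).append("\">\n");
            sb.append("        <source>").append(escapeXml(unit.getValue())).append("</source>\n");
            sb.append("      </trans-unit>\n");
        }
        sb.append("    </body>\n");
        sb.append("  </file>\n");
        sb.append("</xliff>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the order in which the strings were extracted
        Map<String, String> units = new LinkedHashMap<>();
        units.put("id1", "Translatable text");
        units.put("id2", "Save & exit");
        System.out.print(write("messages.properties", "javapropertyresourcebundle", "en", units));
    }
}
```

The merge direction would read the translated `<target>` elements back and substitute them into the native file.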

Architecture

.com Business Model
A parody of the .com business model that has been floating around the web:
– Get lots of users
– ???
– Profit

XLIFF Transformation Model
The XLIFF transformation model could be described in similar terms:
– Native file format
– ???
– XLIFF

Architecture

Current Approaches

Current Approaches to XLIFF
Use XLIFF as the native format
Use commercial tools
Use regular expressions & scripts

XLIFF as Native Format
Use XLIFF from software development onwards
No transformation required
Preferred approach in the long term

Disadvantages
Requires significant changes to the software development process
How to handle legacy resources?
– Back to the original problem

Commercial Tools
Tool support for XLIFF is improving all the time
Advantage: the support and expertise of the tool developer

Disadvantages
Many tools still only read XLIFF, and won't generate XLIFF from native formats
They won't necessarily support all the formats required
It can be difficult to identify inline tags

Scripts and Regular Expressions
Use a scripting language (e.g. Perl, Python, WordBasic)
Encode rules to extract translatable resources using regular expressions

Examples
String                     Regular expression
"Translatable text"        /"([^"]*)"/
id1 = Translatable text    /.* = (.*)/
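
The two example rules above can be tried directly with `java.util.regex`. This is an illustrative sketch, not the scripts the slide refers to; the class `RegexExtract` and its `extract` method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Applies the two example rules, rewritten in java.util.regex syntax. */
final class RegexExtract {
    // /"([^"]*)"/ : the text between a pair of double quotes
    static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");
    // /.* = (.*)/ : everything after " = " on a key/value line
    static final Pattern KEY_VALUE = Pattern.compile(".* = (.*)");

    /** Returns capture group 1 of every match of the pattern in the line. */
    static List<String> extract(Pattern p, String line) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(line);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(extract(QUOTED, "\"Translatable text\""));
        System.out.println(extract(KEY_VALUE, "id1 = Translatable text"));
        System.out.println(extract(KEY_VALUE, "key = a = b"));
    }
}
```

Note that the greedy `.*` in the second rule binds to the last ` = ` on a line, so `key = a = b` yields only `b`: a small instance of the rule-by-rule fragility listed under Disadvantages.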

Advantages
Superficially simple to develop
Plenty of powerful regular-expression languages available (especially Perl)
Full control and ownership of how the formats are managed

Disadvantages
Error-prone: it is difficult to cover all situations
Removing all errors often means adding many parsing rules
Has to be redone for every new file type
REs have to change for inline tags

Other Examples
print("First string");
print("Second" + " string");
print("Third \"string\"");
print("Fourth {0} string");
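
Running the quoted-string pattern over these four lines shows where it breaks down. In this minimal Java sketch (the class `NaiveRegexPitfalls` is hypothetical), the second line yields two fragments instead of one message, the third is cut at the escaped quote and yields `Third \` plus an empty string, and the fourth is extracted whole with the `{0}` placeholder left inside the translatable text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Applies the naive quoted-string pattern to the four print statements. */
final class NaiveRegexPitfalls {
    // /"([^"]*)"/ in java.util.regex syntax
    static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    /** Returns the text captured by every match on the line. */
    static List<String> extract(String line) {
        List<String> found = new ArrayList<>();
        Matcher m = QUOTED.matcher(line);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String[] lines = {
            "print(\"First string\");",          // one string: extracted correctly
            "print(\"Second\" + \" string\");",  // one message split into "Second" and " string"
            "print(\"Third \\\"string\\\"\");",  // cut at the escaped quote: "Third \" and ""
            "print(\"Fourth {0} string\");"      // placeholder {0} left inside the text
        };
        for (String line : lines) {
            System.out.println(line + "  ->  " + extract(line));
        }
    }
}
```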

Summary
This approach is doomed to failure because of the disconnect between the grammar of the language and the regular expressions used to identify strings.

Grammar-Based Approach: XPG

A New Approach
With this approach, we:
– Look at the language grammar (EBNF)
– Identify the grammar productions that can hold translatable text
– Generate a parser that accepts instances of the grammar and emits XLIFF

Grammar-based Architecture

Architecture
New component: the XLIFF parser generator (XPG)
Accepts a JavaCC grammar
Allows one or more productions to be marked as translatable
Generates the "extract" and "merge" programs

JavaCC
JavaCC: Java Compiler Compiler
Modelled after lex & yacc
Works on EBNF-type grammars written as JavaCC .jj files
JavaCC grammars are available for most modern programming languages

Big Win
A direct, one-to-one correspondence between the grammar and the mechanism for identifying strings.

Advantages
Consistent high quality
– Guaranteed to work in every case: for all instances of the grammar
Painless
– No scripting/regular expressions required
– Extractor and merger generated automatically
Fast
– Just need to identify the strings in the grammar

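Example: Extract from the Java BNF
<string literal> ::= " <string characters>? "
<string characters> ::= <string character> | <string characters> <string character>
<string character> ::= <input character> except " and \ | <escape character>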

JavaCC Extract
void Literal() :
{}
{
    <INTEGER_LITERAL>
  | <FLOATING_POINT_LITERAL>
  | <CHARACTER_LITERAL>
  | <STRING_LITERAL>
  | BooleanLiteral()
  | NullLiteral()
}

< STRING_LITERAL:
    "\""
    (   (~["\"","\\","\n","\r"])
      | ("\\"
          ( ["n","t","b","r","f","\\","'","\""]
          | ["0"-"7"] ( ["0"-"7"] )?
          | ["0"-"3"] ["0"-"7"] ["0"-"7"]
          )
        )
    )*
    "\""
>
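
For contrast with the regular-expression approach, here is a hand-written Java sketch of a scanner that follows the STRING_LITERAL rule above: any character except `"`, `\`, newline or carriage return, or one of the listed escape sequences, including one- to three-digit octal escapes. It is not generated by JavaCC or XPG. As simplifying assumptions, it also skips comments and character literals (so quotes inside them are not mistaken for strings) and ignores Unicode escapes and all other Java tokens. The class `StringLiteralScanner` and its members are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hand-written sketch of a scanner that recognises Java string literals as
 * the STRING_LITERAL token rule does and decodes their escape sequences.
 * Comments and character literals are skipped; Unicode escapes are ignored.
 */
final class StringLiteralScanner {

    /** One recognised literal: offsets of its body (quotes excluded) and its decoded value. */
    static final class Literal {
        final int bodyStart;
        final int bodyEnd;
        final String value;
        Literal(int bodyStart, int bodyEnd, String value) {
            this.bodyStart = bodyStart;
            this.bodyEnd = bodyEnd;
            this.value = value;
        }
    }

    static List<Literal> scan(String src) {
        List<Literal> out = new ArrayList<>();
        int i = 0;
        int n = src.length();
        while (i < n) {
            char c = src.charAt(i);
            if (c == '/' && i + 1 < n && src.charAt(i + 1) == '/') {
                while (i < n && src.charAt(i) != '\n') i++;        // line comment
            } else if (c == '/' && i + 1 < n && src.charAt(i + 1) == '*') {
                int end = src.indexOf("*/", i + 2);                  // block comment
                i = (end < 0) ? n : end + 2;
            } else if (c == '\'') {
                i++;                                                 // character literal
                while (i < n && src.charAt(i) != '\'') {
                    i += (src.charAt(i) == '\\') ? 2 : 1;
                }
                i++;
            } else if (c == '"') {
                i = readString(src, i, out);
            } else {
                i++;
            }
        }
        return out;
    }

    /** Reads a literal whose opening quote is at start; returns the index to resume scanning. */
    private static int readString(String src, int start, List<Literal> out) {
        StringBuilder value = new StringBuilder();
        int i = start + 1;
        int n = src.length();
        while (i < n) {
            char c = src.charAt(i);
            if (c == '"') {                                   // closing quote
                out.add(new Literal(start + 1, i, value.toString()));
                return i + 1;
            }
            if (c == '\n' || c == '\r') break;                // line break: not a STRING_LITERAL
            if (c != '\\') {                                  // ordinary character
                value.append(c);
                i++;
                continue;
            }
            if (i + 1 >= n) break;
            char e = src.charAt(i + 1);
            int simple = "ntbrf\\'\"".indexOf(e);
            if (simple >= 0) {                                // one-character escapes
                value.append("\n\t\b\r\f\\'\"".charAt(simple));
                i += 2;
            } else if (e >= '0' && e <= '7') {                // octal: up to 3 digits if first is 0-3
                int maxDigits = (e <= '3') ? 3 : 2;
                int j = i + 1;
                int code = 0;
                while (j < n && j - (i + 1) < maxDigits
                        && src.charAt(j) >= '0' && src.charAt(j) <= '7') {
                    code = code * 8 + (src.charAt(j) - '0');
                    j++;
                }
                value.append((char) code);
                i = j;
            } else {
                break;                                        // illegal escape
            }
        }
        return i;                                             // unterminated: nothing recorded
    }

    public static void main(String[] args) {
        String src = "print(\"Third \\\"string\\\"\"); // \"not a string\"\nchar q = '\"';";
        for (Literal lit : scan(src)) {
            System.out.println(lit.value);                    // prints: Third "string"
        }
    }
}
```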

Identifying
We identify the <STRING_LITERAL> token as a language item that may contain translatable strings.
XPG then generates a new grammar, which compiles to the extractor.
The extractor then generates XLIFF.

Modified JavaCC Grammar
void Literal() :
{}
{
    <INTEGER_LITERAL>
  | <FLOATING_POINT_LITERAL>
  | <CHARACTER_LITERAL>
  | StringLiteral()
  | BooleanLiteral()
  | NullLiteral()
}

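StringLiteral()
void StringLiteral() :
{ Token t; }
{
  t = <STRING_LITERAL>
  {
    // Strip the surrounding quotes from the token image
    String s = t.image.substring(1, t.image.length() - 1);
    // Escape characters that are significant in XML before emitting them
    s = s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    // pw is the PrintWriter the generated extractor writes XLIFF to;
    // the line_column id scheme is illustrative only
    pw.println("<trans-unit id=\"" + t.beginLine + "_" + t.beginColumn + "\">");
    pw.println("  <source>" + s + "</source>");
    pw.println("</trans-unit>");
  }
}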

Other XPG Tasks
Create the surrounding XLIFF tags
Create the skeleton file
Embed code for handling inline tags
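
The skeleton file and the merge step can be sketched in Java as follows, assuming the extractor has already located each string literal body (the text between the quotes) as a `[start, end)` span. The `%%id%%` marker syntax, the class `SkeletonMerge` and its methods are hypothetical choices for illustration; a real skeleton format would be defined by the tool.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Sketch of skeleton creation and merging for string literals whose bodies
 * have already been located as [start, end) spans.
 * The %%id%% marker syntax is a hypothetical choice.
 */
final class SkeletonMerge {
    private static final Pattern MARKER = Pattern.compile("%%(\\d+)%%");

    /** Replaces each literal body with a %%id%% marker; ids are 1-based, in span order. */
    static String makeSkeleton(String src, List<int[]> spans) {
        StringBuilder sb = new StringBuilder();
        int last = 0;
        for (int k = 0; k < spans.size(); k++) {
            int[] span = spans.get(k);          // spans must be sorted and non-overlapping
            sb.append(src, last, span[0]).append("%%").append(k + 1).append("%%");
            last = span[1];
        }
        sb.append(src, last, src.length());
        return sb.toString();
    }

    /** Escapes translated text so that it is a valid Java string literal body again. */
    static String escapeJava(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Replaces every %%id%% marker with the escaped translation for that id, in one pass. */
    static String merge(String skeleton, Map<String, String> translations) {
        Matcher m = MARKER.matcher(skeleton);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String t = translations.get(m.group(1));
            // Unknown ids keep their marker so missing translations are easy to spot.
            String replacement = (t == null) ? m.group() : escapeJava(t);
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String src = "print(\"Hello\"); print(\"Bye\");";
        // Bodies of "Hello" (offsets 7-12) and "Bye" (offsets 23-26), as an extractor would report them
        String skeleton = makeSkeleton(src, List.of(new int[] {7, 12}, new int[] {23, 26}));
        System.out.println(skeleton);                       // print("%%1%%"); print("%%2%%");
        Map<String, String> translations = new LinkedHashMap<>();
        translations.put("1", "Say \"hi\"");
        translations.put("2", "Ciao");
        System.out.println(merge(skeleton, translations));  // print("Say \"hi\""); print("Ciao");
    }
}
```

Merging in a single regex pass, rather than with repeated `String.replace` calls, keeps a translation that happens to contain marker-like text from being substituted a second time.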

Inline Tags
Example:
– "Click on the {0} button to start the {1} job"
The {0} and {1} constitute inline tags
They are not part of the grammar itself
They can vary from application to application
We must be able to extract these based on regular expressions:
– \{[0-9]+\}

XPG and Inline Tags
Embeds code to read a set of regular expressions from a file
When the extractor identifies a string, it:
– Executes the REs on the string
– Moves the matches into XLIFF inline tags
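
A minimal Java sketch of this inline-tag step, assuming the regular expressions are passed in directly rather than read from a file, and assuming matches are protected with XLIFF 1.x `<ph>` elements (`<x/>` or `<bpt>`/`<ept>` would be alternatives). The class `InlineTags` is hypothetical. Note that the `{[0-9]+}` pattern needs its braces escaped for `java.util.regex`.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Sketch: protects every match of the configured patterns in an extracted
 * string as an XLIFF <ph> inline element. Patterns are passed in directly
 * (not read from a file); the choice of <ph> is an assumption.
 */
final class InlineTags {

    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Returns source content in which each match is wrapped as a numbered ph element. */
    static String protect(String text, List<Pattern> patterns) {
        if (patterns.isEmpty()) {
            return escapeXml(text);
        }
        // One alternation scans the text left to right for any of the patterns
        // (per-pattern flags are not carried over into the combined pattern).
        StringBuilder alternation = new StringBuilder();
        for (Pattern p : patterns) {
            if (alternation.length() > 0) alternation.append('|');
            alternation.append("(?:").append(p.pattern()).append(')');
        }
        Matcher m = Pattern.compile(alternation.toString()).matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        int id = 0;
        while (m.find()) {
            if (m.start() == m.end()) continue;      // ignore empty matches
            out.append(escapeXml(text.substring(last, m.start())));
            out.append("<ph id=\"").append(++id).append("\">")
               .append(escapeXml(m.group()))
               .append("</ph>");
            last = m.end();
        }
        out.append(escapeXml(text.substring(last)));
        return out.toString();
    }

    public static void main(String[] args) {
        // The slide's placeholder pattern, with the braces escaped as java.util.regex requires
        List<Pattern> patterns = List.of(Pattern.compile("\\{[0-9]+\\}"));
        System.out.println(protect("Click on the {0} button to start the {1} job", patterns));
    }
}
```

For the slide's example this produces `Click on the <ph id="1">{0}</ph> button to start the <ph id="2">{1}</ph> job`.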

Final Architecture

XPG & XML

XPG and XML Applications
A similar approach can be applied to XML Schemas
Uses XSLT & DOM rather than JavaCC
Can identify the XML tags and attributes that may contain text
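
For XML, the same idea can be sketched with the JDK's DOM API: walk every element and collect the text of nominated elements and the values of nominated attributes. Here the element and attribute names are passed in by hand, standing in for what would be derived from the schema; no XSLT is used, and the class `XmlTextExtractor` is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/**
 * Sketch: collects translatable text from an XML document with the JDK DOM API.
 * The element and attribute names are supplied by hand; they stand in for
 * the information that would be derived from the schema.
 */
final class XmlTextExtractor {

    static List<String> extract(String xml, Set<String> textElements, Set<String> textAttributes) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) { // ParserConfigurationException, SAXException or IOException
            throw new IllegalArgumentException("could not parse XML", ex);
        }
        List<String> found = new ArrayList<>();
        NodeList elements = doc.getElementsByTagName("*"); // all elements, in document order
        for (int i = 0; i < elements.getLength(); i++) {
            Element e = (Element) elements.item(i);
            for (String attr : textAttributes) {
                if (e.hasAttribute(attr)) found.add(e.getAttribute(attr));
            }
            if (textElements.contains(e.getTagName())) {
                // getTextContent() flattens any child markup into plain text
                found.add(e.getTextContent().trim());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String xml = "<dialog title=\"Settings\"><label>Name</label><button id=\"ok\">OK</button></dialog>";
        System.out.println(extract(xml, Set.of("label", "button"), Set.of("title")));
    }
}
```

Because `getTextContent()` flattens child elements, mixed content (markup inside translatable text) would need the same inline-tag treatment described for program strings.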

Summary
XPG is an approach to XLIFF transformation that corresponds directly to the grammar of the language being transformed.
This ensures consistent, error-free and rapid XLIFF transformation.
The XPG approach is suitable for computer languages and markup languages.