Digital Information and Heritage INFuture Zagreb, 7-9.11.2007. Sentence Alignment as the Basis For Translation Memory Database Sanja Seljan Faculty of.


Digital Information and Heritage, INFuture, Zagreb, 7-9.11.2007
Sentence Alignment as the Basis for Translation Memory Database
Sanja Seljan, Faculty of Humanities and Social Sciences, University of Zagreb, Department of Information Sciences
Angelina Gašpar, SOA Centre Split
Damir Pavuna, Integra d.o.o.

Overview
I Introduction
II When to use TMs? Text preparation
III Corpus used, text characteristics
IV Research: tools used, automatic and manual alignment, comparison of TMs, results
V Conclusion

Sentence alignment (SA) is the basis for:
- computer-assisted translation (CAT)
- terminology management
- term extraction
- word alignment
- cross-linguistic information retrieval
Sentence alignment (SA) -> translation memory (TM): the basis for further research in translation equivalencies

Problems in automatic SA:
- robustness
- discrepancies in layout and omissions -> influence on accuracy and on the TM

Research:
- SA on Cro-Eng parallel texts (laws, regulations, acts, decisions)
- alignment tool: WinAlign by SDL Trados 2006 Professional

Aim:
- to examine the impact of the SA process on the creation of a TM
- to compare 3 types of TMs
The TMs differ:
- in the level of expert intervention in the setup of the alignment program
- in the preparation of the source text for segmentation

II When to use TMs?
- Fast and consistent translation (e.g. EU, multinational agencies)
- Voluminous texts
- Highly repetitive types of texts
- Use of specialized and consistent terminology
- Several languages
- Sharing of common resources (cooperation)
- Time-saving (speeding up the translation process)
- Cost-saving
- Consistent translation

Creation of TM:
- directly through translation
- use of already translated material (alignment process)
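As a rough illustration of what a TM built from aligned material holds, the sketch below stores source/target pairs and looks up the closest stored source segment. The names `TranslationUnit`, `build_tm` and `lookup`, the 0.75 threshold, and the sample sentence pair are all assumptions made for this example. The similarity score comes from Python's `difflib` and only stands in for a CAT tool's fuzzy-match percentage; it is not how SDL Trados computes matches.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class TranslationUnit:
    source: str  # source-language segment (e.g. Croatian)
    target: str  # its aligned translation (e.g. English)


def build_tm(aligned_pairs):
    """Turn verified (source, target) pairs into a list of translation units."""
    return [TranslationUnit(s.strip(), t.strip()) for s, t in aligned_pairs]


def lookup(tm, segment, threshold=0.75):
    """Return (score, unit) for the best match at or above threshold, else None.

    The score is difflib's similarity ratio (0.0 to 1.0), an assumed
    stand-in for a fuzzy-match percentage.
    """
    best = None
    for unit in tm:
        score = SequenceMatcher(None, segment, unit.source).ratio()
        if score >= threshold and (best is None or score > best[0]):
            best = (score, unit)
    return best


# Made-up sample pair, for illustration only.
tm = build_tm([
    ("Ovaj Zakon stupa na snagu danom objave.",
     "This Act shall enter into force on the day of its publication."),
])
hit = lookup(tm, "Ovaj Zakon stupa na snagu danom objave.")
if hit is not None:
    score, unit = hit
    print(f"{score:.0%} match: {unit.target}")
```

An identical segment scores as a full match, while an unrelated one falls below the threshold and returns `None`, which corresponds to the "no match" case.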

III Corpus used
- 9 parallel legislative Croatian-English texts (bitexts): acts, laws, regulations, decisions and ordinances
- for the sake of uniformity: standard presentation and standard formulas
- 33.15%: percentage ratio for word count in the English translations

Reasons:
- English: an analytic type of language, use of passive voice
- Croatian: a highly inflected language, use of active voice
Repetitive legal terms, phrases, sentences
Main components of a regulation: the title, preamble, enacting terms, addressee, place, date and signature

Enacting terms follow strict rules of presentation:
- subject matter and scope
- definitions
- provisions conferring implementing power
- penalties or legal remedies
- transitional and final provisions
The standard form prescribes the layout on the page: spacing, paragraphing, punctuation and even typographic characteristics (capitalisation, typeface, boldface and italics)

Use of verbs in enacting terms
- Binding Croatian legislation:
  - declarative terms (definitions, amendments)
  - imperative terms (commands, prohibitions)
- English "shall" = Croatian present tense or modals (morati, trebati)
- English "may" for prohibition, permission and authorisation = Croatian present tense ("ne može se", "može se")

Bitexts
Similarities:
- punctuation, numbers, dates, foreign words
Differences:
- capital letters, hyphens, compound words, synonyms (avoided in the target language)
Common points:
- consistent terminology, a uniform manner, gender-neutral language

IV Alignment research
Texts:
- translations of Croatian legislative acts (Cr->En)
Tools:
- AnyCount 4.0 (version 405) for document structure analysis
- SDL Trados 2006 Professional (WinAlign) for the alignment process

Alignment research
PREPARATORY ACTIVITIES:
- comparison of the source and target texts (to check that all text is translated)
- defining the setup of end and skip rules (delimiters, creating a user abbreviation list)
- preparation of the source text for better segmentation (spelling, automatic bullets and numbering, deletion of soft returns, hyphens, certain punctuation, tables created with tabs, and revision marks)
- modification of the setup rules
- verification of the alignment (especially 1:2 and 2:1 pairs and commitment of pairs)
- creation of the translation memory and verification
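The interplay of end rules and an abbreviation list can be sketched as follows. This is a minimal illustration and not WinAlign's segmentation engine: the end rule (a break after `.`, `!`, `?` or `;` followed by whitespace), the abbreviation list and the function name `segment` are assumptions chosen for the example.

```python
import re

# Hypothetical user abbreviation list; a real one would be built from the corpus.
ABBREVIATIONS = {"Art.", "No.", "para.", "e.g."}

# Assumed end rule: a segment ends at '.', '!', '?' or ';' followed by whitespace.
END_RULE = re.compile(r"(?<=[.!?;])\s+")


def segment(text, abbreviations=ABBREVIATIONS):
    """Split text into segments, skipping breaks after known abbreviations."""
    segments, current = [], ""
    for piece in END_RULE.split(text.strip()):
        if not piece:
            continue
        current = f"{current} {piece}".strip()
        if current.split()[-1] in abbreviations:
            continue  # skip rule: the period belongs to an abbreviation
        segments.append(current)
        current = ""
    if current:
        segments.append(current)
    return segments


print(segment("See Art. 12 of this Act. It enters into force at once."))
# ['See Art. 12 of this Act.', 'It enters into force at once.']
```

Without the abbreviation list, the first sentence would be cut after "Art.", producing a source segment with no matching target segment. That is the kind of error the setup rules are meant to prevent.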

Alignment research
Automatic alignment
WinAlign uses language-independent algorithms that count:
- the quality of translation units, which can have three levels (low, medium, high)
- translation units aligned as 1:2 or 2:1 pairs
- unconnected target segments
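To show how a language-independent aligner can arrive at 1:2 and 2:1 pairs and at unconnected segments, here is a simplified length-based dynamic-programming sketch. It is a swapped-in textbook technique and not WinAlign's own algorithm, which the slides do not describe. The cost function (difference in character length plus fixed penalties), the penalty values, and the sample sentences are assumptions.

```python
def align(src, tgt, skip_penalty=50, merge_penalty=10):
    """Align two segment lists using 1:1, 1:2, 2:1, 1:0 and 0:1 pairings.

    Cost of a pairing = absolute difference in character length, plus an
    assumed penalty for anything other than 1:1. Returns a list of
    (source_segments, target_segments) tuples in document order.
    """
    inf = float("inf")
    n, m = len(src), len(tgt)
    # (source segments consumed, target segments consumed, penalty)
    shapes = [(1, 1, 0), (1, 2, merge_penalty), (2, 1, merge_penalty),
              (1, 0, skip_penalty), (0, 1, skip_penalty)]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj, penalty in shapes:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s_len = sum(len(s) for s in src[i:ni])
                t_len = sum(len(t) for t in tgt[j:nj])
                c = cost[i][j] + abs(s_len - t_len) + penalty
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    # Walk back from the end to recover the chosen pairings.
    pairs, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        pairs.append((src[pi:i], tgt[pj:j]))
        i, j = pi, pj
    return pairs[::-1]


# Made-up sample: two Croatian segments rendered as one English sentence.
src = ["Agencija donosi pravila.", "Agencija ih objavljuje.",
       "Ovaj Zakon stupa na snagu danas."]
tgt = ["The Agency shall adopt the rules and publish them.",
       "This Act enters into force today."]
pairs = align(src, tgt)
for s, t in pairs:
    print(f"{len(s)}:{len(t)}", s, "->", t)
```

On this sample the first two source segments are merged into a 2:1 pair and the last segment is aligned 1:1. Pairs produced by merging or skipping are the ones that most need manual verification.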

Alignment research
Manual alignment
- checking that each source segment corresponds to its translated target segment (Aligned TM)
- setup of the alignment program (Aligned TM + setup rules, e.g. segment and skip rules, user abbreviation list)
- segmentation of the source text (e.g. changing soft returns, checking colon segmentation)
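Part of the source-text preparation can be automated before segmentation. The sketch below assumes a plain-text export in which a soft return appears as a vertical tab or a single newline, a paragraph break appears as a blank line, and optional hyphens appear as the Unicode soft hyphen (U+00AD). These conventions and the function name `prepare_source` are assumptions for this example, not a description of the authors' procedure.

```python
import re

SOFT_HYPHEN = "\u00ad"


def prepare_source(text):
    """Normalise a plain-text export before segmentation."""
    text = text.replace(SOFT_HYPHEN, "")        # drop optional hyphens
    paragraphs = re.split(r"\n\s*\n", text)     # blank line = paragraph break
    cleaned = []
    for para in paragraphs:
        para = re.sub(r"[\x0b\n]+", " ", para)  # soft returns -> single space
        para = re.sub(r"[ \t]+", " ", para).strip()
        if para:
            cleaned.append(para)
    return "\n".join(cleaned)


raw = "Article 1\x0bThis Act regu\u00adlates\nfees.\n\nArticle 2"
print(prepare_source(raw))
# Article 1 This Act regulates fees.
# Article 2
```

Whether a soft return should become a space or a segment boundary depends on the document. Headings followed by a soft return, as in the sample, are a case worth checking by hand.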


Alignment research
Comparison of TMs
Columns: Raw TM | Aligned TM | + Setup rules | ++ Segmented source
Rows (match range): 100% | …-99% | …-94% | …-84% | …-74% (1120) | No match | Total
Percent: 91.67% | 80.30% | 88.89% | 100%

Alignment research
Conclusion
- The translation memories created in this study through different types of alignment process give different results regarding the quality of the translated material.
- The results show that expert intervention is necessary when defining the setup rules, in preparing the source text for segmentation, and in verifying the suggested translation units.