IATE: EU tool for translation-oriented terminology work
  Christine Herwig, Terminology Coordination, DGT-D.3.2, European Commission
  Vienna, April 2011
Unique?
  8.6 million terms
  23 EU languages + others
  CAT tool for over 4 400 EU translators
History
  IATE = “Inter-Active Terminology for Europe”
  Objective: creation of an interactive terminology database for the consultation, storage and joint management of terminological data
  IATE partners: European Parliament, Council, Commission, Court of Justice, Court of Auditors, European Economic and Social Committee, Committee of the Regions, European Investment Bank, European Central Bank, Translation Centre
  Development work started in January 2000.
  IATE became operational internally at the beginning of 2005.
Interactive tool serving translators European Commission 1750 Council of the EU 650 European Parliament 760 110 Translation Centre 100 Court of Auditors 350 Committee of the Regions and European Economic and Social Committee 30 Court of Justice 620 European Investment Bank 70 European Central Bank Interpretation services Lawyer-linguists
Interinstitutional management
  IATE Management Group (IMG): one official representative per IATE partner
  Common rules and procedures (“Best Practice”, “Input Manual”)
  Coordination of consolidation efforts
  Coordination of terminology work for the “new languages”
  Decisions on technical development
  Technical support, maintenance and development
How DGT organises terminology work
  [Organisation chart with the following boxes: IATE; Other EU institutions and bodies; “Terminology Coordination”; Terminology services and organisations; Terminology projects; Terminology Board; Terminologists; Translators]
How multilingual work is planned
  Terminologists ask translators about their needs
  [Flow chart, stages in the order shown: Needs -> Terminology Coordination Team + “WP Advisory Committee” -> Draft -> “Terminology Board” -> Work programme]
Translation-oriented work
  systematic + ad hoc
  subject-driven + text-driven
  prescriptive + descriptive
  multilingual + language-specific
European Commission - Right of initiative
  The Commission’s ‘right of initiative’ means it is the main source of draft legislation. It drafts and then implements acts adopted by the Council and the European Parliament.
  New subject fields
  New terminology
IATE - Present situation (7 March 2011)
                   Total           Commission
  Concepts         1.5 million     0.9 million (approx. 60 %)
  Terms            8.6 million     5.7 million
  Abbreviations    0.5 million     > 0.2 million
  Phrases          0.15 million    < 0.1 million
Language coverage
  1958: German, French, Italian and Dutch (4 official languages)
  1973: English and Danish (6 official languages)
  1981: Greek (7 official languages)
  1986: Spanish and Portuguese (9 official languages)
  1995: Finnish and Swedish (11 official languages)
  2004: Czech, Estonian, Hungarian, Lithuanian, Latvian, Maltese, Polish, Slovak, Slovenian (20 official languages)
  2007: Bulgarian, Irish, Romanian (23 EU official languages)
  + Latin
  + Languages of candidate and potential candidate countries: e.g. Croatian, Turkish
  + Languages relevant to the European External Action Service: e.g. Russian, Chinese, Arabic
Terms per language (7 March 2011)
  Language   Number of terms     Language   Number of terms
  BG                22 092       LA                64 623
  CS                31 316       LT                43 652
  DA               604 572       LV                25 491
  DE             1 039 010       MT                23 239
  EL               522 877       NL               695 772
  EN             1 416 823       PL                48 850
  ES               616 088       PT               530 230
  ET                28 293       RO                22 986
  FI               327 598       SK                28 626
  FR             1 354 170       SL                30 531
  GA                37 030       SV               314 086
  HU                37 341       Others            24 661
  IT               705 681
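As a quick cross-check, the per-language counts on this slide can be added up and compared with the rounded total of 8.6 million terms quoted elsewhere in the deck. The sketch below only re-uses the figures from the slide; the variable names are illustrative.

```python
# Term counts per language as given on the slide (7 March 2011).
terms_per_language = {
    "BG": 22_092, "CS": 31_316, "DA": 604_572, "DE": 1_039_010,
    "EL": 522_877, "EN": 1_416_823, "ES": 616_088, "ET": 28_293,
    "FI": 327_598, "FR": 1_354_170, "GA": 37_030, "HU": 37_341,
    "IT": 705_681, "LA": 64_623, "LT": 43_652, "LV": 25_491,
    "MT": 23_239, "NL": 695_772, "PL": 48_850, "PT": 530_230,
    "RO": 22_986, "SK": 28_626, "SL": 30_531, "SV": 314_086,
    "Others": 24_661,
}

total = sum(terms_per_language.values())
print(f"{total:,} terms, about {total / 1e6:.1f} million")
# prints "8,595,638 terms, about 8.6 million"

# The three best-covered languages, largest first.
top3 = sorted(terms_per_language.items(), key=lambda item: item[1], reverse=True)[:3]
print(top3)
```

The sum agrees with the rounded total, and the ranking makes the gap between the older languages and the “new languages” easy to see.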
Number of potential duplicates in IATE (Jan. 2011)
Main challenges for DGT terminology
  Identify the most efficient way of incorporating data for the “new languages”
  Respond to upcoming terminology needs (in relation to the Commission’s right of initiative)
  Reduce the number of duplicates in the legacy data (consolidating IATE content)
Structure of IATE records
  Language-independent information, e.g.: Domain, Domain note, Origin, Problem language, Cross reference
  Per language (Language 1: EN, Language 2: FR), e.g.: Definition, Reference, Note
  Per term (Term 1, Term 2 under each language), e.g.: Term, Reference, Term type, Context, Context reference, Grammatical information
“Language-independent level” and “Language level”
“Term level”
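The three levels of an IATE record can be pictured as nested data structures. The following is a minimal sketch, assuming one class per level and using the field names listed on the slide; the class names, the attribute spellings and the sample values are hypothetical and do not reflect the actual IATE database schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Term:
    """Term level: one term within one language."""
    term: str
    reference: str = ""
    term_type: str = ""
    context: str = ""
    context_reference: str = ""
    grammatical_information: str = ""


@dataclass
class LanguageBlock:
    """Language level: data shared by all terms of one language."""
    language: str
    definition: str = ""
    reference: str = ""
    note: str = ""
    terms: List[Term] = field(default_factory=list)


@dataclass
class Entry:
    """Language-independent level: data shared by the whole record."""
    domain: str
    domain_note: str = ""
    origin: str = ""
    problem_language: str = ""
    cross_reference: str = ""
    languages: Dict[str, LanguageBlock] = field(default_factory=dict)


# Hypothetical sample record with two languages and two terms per language.
entry = Entry(domain="(illustrative domain)")
entry.languages["en"] = LanguageBlock(
    language="en",
    terms=[Term("European Commission"), Term("Commission", term_type="short form")],
)
entry.languages["fr"] = LanguageBlock(
    language="fr",
    terms=[Term("Commission européenne"), Term("Commission", term_type="short form")],
)

for code, block in entry.languages.items():
    print(code, [t.term for t in block.terms])
```

Keeping the definition at language level and the context at term level mirrors the slide: a definition describes the concept in one language, while a context belongs to one specific term.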
Subject field classification - EuroVoc
  http://eurovoc.europa.eu
  At generic level, the thesaurus EuroVoc has a two-tier hierarchical classification:
  fields, identified by two-digit numbers and titles in words, e.g.: 10 EUROPEAN COMMUNITIES
  microthesauri, identified by four-digit numbers (the first two digits being those of the field containing the microthesaurus) followed by titles in words, e.g.: 1011 European Union law
  The numbering of the fields and microthesauri is the same in all language versions.
  Equivalence relationship
  “UF” (used for): between a descriptor and the non-descriptor(s) it stands for
  “USE”: between a non-descriptor and the descriptor that stands for it
  Quantitative data
  All language versions of the EuroVoc thesaurus contain:
  21 fields
  127 microthesauri
  6645 descriptors (519 of them top terms)
  6669 reciprocal hierarchical relationships (BT/NT)
  3636 associative relationships
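The numbering rule and the two equivalence relationships can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and the descriptor and non-descriptor labels in `used_for` are invented sample data, not entries quoted from EuroVoc.

```python
from typing import Dict, List


def field_of(microthesaurus_number: str) -> str:
    """Return the two-digit field number encoded in a four-digit microthesaurus number."""
    code = microthesaurus_number.strip()
    if len(code) != 4 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"expected a four-digit number, got {code!r}")
    return code[:2]


print(field_of("1011"))  # prints "10", the field of microthesaurus 1011

# "UF" (used for): descriptor -> the non-descriptors it stands for.
# Hypothetical sample data.
used_for: Dict[str, List[str]] = {"motor vehicle": ["automobile", "car"]}

# "USE" is the inverse relationship: non-descriptor -> descriptor.
use: Dict[str, str] = {
    non_descriptor: descriptor
    for descriptor, non_descriptors in used_for.items()
    for non_descriptor in non_descriptors
}


def preferred(label: str) -> str:
    """Map a non-descriptor to its descriptor; descriptors map to themselves."""
    return use.get(label, label)


print(preferred("car"))            # prints "motor vehicle"
print(preferred("motor vehicle"))  # prints "motor vehicle"
```

Because the numbering is identical in all language versions, the same `field_of` lookup would apply whichever language the titles are displayed in.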
In-house IATE
  Is password-protected
  Allows interactive use
  Has a “virtual antechamber”, Pre-IATE, for raw data and multilingual batch imports
  [Diagram labels: Glossaries, Collections, ..., Raw data, Pre-IATE]
Interactive use
  New entries and modifications: “write” rights for translators and terminologists
  Communication: a “mark” function allows users to add comments to specific entries
  Validation: is mother-tongue-based to ensure data quality
IATE Public: http://iate.europa.eu
  Officially opened on 28 June 2007
  Shares all validated EU terminology with the public
  Search words, wildcards and special characters
  If you specify several search terms, IATE retrieves entries that contain all of them, whether or not they occur in the given order. E.g. searching for European Commission retrieves:
    European Commission
    European Travel Commission
    Commission of the European Communities
  Putting your search term between double quotes finds it exactly as you typed it. E.g. searching for “European Commission” retrieves:
    European Commission on Agriculture
  You can use the wildcard characters “*” (asterisk) and “_” (underscore):
    The asterisk replaces any number of characters in your search term. E.g. searching for “termino*” finds all IATE entries that contain words starting with “termino”.
    The underscore replaces exactly one character.
  Search is not case-sensitive: commission finds both Commission and commission.
  It is not necessary to type vowels with diacritical marks ( ´ ` ^ ¨ ˜ ) or special characters (e.g. ñ, ç). You can use plain vowels and n, c instead:
    EL: μηλο finds μήλο
    ES: accion finds acción
    IT: universita finds università
    NL: ideeen finds ideeën
    PL: czesc finds część
    PT: ameijoa finds amêijoa
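The search rules above can be imitated in a short matcher. This is a sketch of the behaviour described on the slide, not of how IATE is actually implemented: the function names are hypothetical, wildcards are assumed to stay within a single word, and only straight double quotes are recognised as phrase delimiters.

```python
import re
import unicodedata


def fold(text: str) -> str:
    """Lower-case the text and strip diacritics, so that 'acción' equals 'accion'."""
    decomposed = unicodedata.normalize("NFD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def term_pattern(term: str) -> "re.Pattern[str]":
    """Translate one search term into a regex: '*' = any characters, '_' = exactly one."""
    parts = []
    for ch in fold(term):
        if ch == "*":
            parts.append(r"\w*")   # assumption: the wildcard does not cross word boundaries
        elif ch == "_":
            parts.append(r"\w")
        else:
            parts.append(re.escape(ch))
    return re.compile(r"\b" + "".join(parts) + r"\b")


def matches(query: str, entry: str) -> bool:
    """True if the entry contains every search term; quoted phrases must match as typed."""
    tokens = [phrase or word for phrase, word in re.findall(r'"([^"]+)"|(\S+)', query)]
    folded_entry = fold(entry)
    return all(term_pattern(token).search(folded_entry) for token in tokens)


print(matches("European Commission", "Commission of the European Communities"))  # True
print(matches('"European Commission"', "European Travel Commission"))            # False
print(matches("termino*", "Terminology Coordination"))                           # True
print(matches("czesc", "część"))                                                 # True
```

Folding both the query and the entry with the same function is what makes the search insensitive to case and diacritics in both directions.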
Thank you for your attention.