1 LREC 2000 Athens; Gerhard Budin and Alan Melby
Accessibility of Multilingual Terminological Resources: Current Problems and Prospects for the Future
Gerhard Budin, University of Vienna
Alan Melby, Brigham Young University

2 Diversity Problems of MTRs
Incompatible ontologies
Diverse categorizations of terminological information
Varieties of data models
Multitude of formats and "standards"
-> lack of interoperability and of portability across applications, domains, platforms, etc.

3 Terminology Interchange
Prerequisite for
  – knowledge sharing
  – co-operative workflows
  – marketing, distribution
  – maintenance
  – interoperability (data management across MT, TM, CL, TA, IM, KM, etc.)
R&D since the 1980s (EU, ISO, TEI)

4 Barriers to terminological knowledge sharing
Legal barriers (copyright, IPR)
Economic barriers (pricing, billing)
Information barriers (lack of information)
Technical barriers (lack of cross-platform, cross-system and cross-format importability/exportability, etc.)
Methodological barriers (data modelling, diversity in work principles and methods)

5 Multitude of Formats
Document formats
Database formats
Mark-up formats for lexical/terminological data (MATER, TEI-lex/term, NTRF, OLIF, MARTIF, TBX, IIF, TRANSTERM, GENETER, EURAMIS, etc.)

6 SALT-XLT
Standards-based Access to Multilingual Lexicons and Terminologies: a broad-based initiative aiming at CONVERGENCE and INTEROPERABILITY
International consortium of industry partners, universities, NGOs/IOs/IGOs, professional associations
  – European group: shared-cost RTD project called SALT in the 5th Framework Programme (IST-HLT), started in January 2000 (funding for 2 years)
  – US group (funding expected)

7 Features of the SALT Initiative
User-oriented (industry, administration, multiple user groups)
Oriented towards integrating applications
Ontology mapping component
Web-based freeware approach
XML, XSLT, Java
Standards-based (integrating HLT standards, concurrent development with ISO/TC 37)

8 XLT
XML-based Lexical/Terminological framework format
A FAMILY of (interoperable) formats
  – includes, is based on, or overlaps with TEI, MARTIF, MSC, OLIF, Geneter, TBX, etc.

9 XLT (diagram: Enhanced Access to Multilingual Resources for Language Technology)
Lex/term Resources, Diverse Formats; Industry Sectors; Language Server/Toolkit; Information Technology Developers; Consulting Services; Broader Social Impact
TRANSTERM, OLIF, MARTIF, INTERVAL, GENETER, PROPRIETARY FORMATS
EXPORT TOOLS, IMPORT TOOLS, VIEWERS, MERGE/QUERY FUNCTIONS
FACILITATION, ACCESS, TAGGING, CONVERSION, INFO BROKERAGE, MARKUP, ONTOLOGIES, AUTHORING
MT, TM, IM, TMS, TRANSLATION, L10N, I18N
INTEGRATION, ACCESS

10 Workflow in SALT
Analysis of existing formats (sample data sets, data elements/structures, ontologies)
PM
Mapping
Clustering
QM
Utilities, tools, website
External assessment, evaluation
Dissemination, implementation

11 Features of XLT
XML-based (since XML is the dominant data exchange transport mechanism today)
Standards-based
Corresponding relational data model for an integrated database, to facilitate loading
Flexible, in order to support maintenance of the format as needs evolve
Language industry support
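
The pairing of an XML format with a corresponding relational data model can be illustrated with a minimal sketch. It assumes a simplified, hypothetical entry layout: the element names (`termCollection`, `entry`, `langGroup`, `term`) and the two-table schema are invented for illustration and are not taken from the XLT specification or from Reltef.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, simplified concept entry. Element and attribute names are
# illustrative assumptions, NOT the actual XLT markup.
SAMPLE = """
<termCollection>
  <entry id="c1">
    <langGroup lang="en"><term>terminology interchange</term></langGroup>
    <langGroup lang="de"><term>Terminologieaustausch</term></langGroup>
  </entry>
</termCollection>
"""


def load_entries(xml_text, conn):
    """Load concept entries and their terms into two relational tables."""
    conn.execute("CREATE TABLE concept (id TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE term ("
        "concept_id TEXT REFERENCES concept(id), lang TEXT, text TEXT)"
    )
    root = ET.fromstring(xml_text)
    for entry in root.iter("entry"):
        concept_id = entry.get("id")
        conn.execute("INSERT INTO concept VALUES (?)", (concept_id,))
        # One row per term, keyed by the concept it belongs to.
        for group in entry.iter("langGroup"):
            for term in group.iter("term"):
                conn.execute(
                    "INSERT INTO term VALUES (?, ?, ?)",
                    (concept_id, group.get("lang"), term.text),
                )
    conn.commit()


conn = sqlite3.connect(":memory:")
load_entries(SAMPLE, conn)
rows = conn.execute("SELECT lang, text FROM term ORDER BY lang").fetchall()
```

Because the XML nesting (entry, language group, term) mirrors the foreign-key chain of the tables, loading reduces to a single tree walk; this is one plausible reading of "to facilitate loading", not a description of the project's actual tooling.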

12 Levels of Modelling in the SALT Initiative
Level 1: meta-model consisting of
  – a structural meta-model (ORM, UML) and
  – a content meta-model: metadata registry based on ISO 12620, following the methods of ISO 11179 (co-operation with the SCHEMAS project (registry of XML schemas), JTC 1/SC 32, etc.)

14 Levels of Modelling in the SALT Initiative
Level 2: conceptual data model (user-group needs analysis level)
  – an implementation modality (e.g. XML intermediate format or relational database) is selected for the user group
  – a core structure, compatible with the meta-model but going into more detail, is defined for each modality
  – a particular set of data categories and constraints on them is selected according to user needs
e.g. Reltef (E-R diagram), XLT (DTD, XML schema, data-category specifications)
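
Selecting "a particular set of data categories and constraints on them" can be sketched as a small specification plus a checker. The category names and permitted values below are illustrative assumptions in the style of ISO 12620, not an actual SALT data-category specification; `SPEC` and `validate` are hypothetical names.

```python
# Hypothetical data-category specification for one user group.
# A value of None means free text; a set lists the permitted values.
SPEC = {
    "term": None,
    "partOfSpeech": {"noun", "verb", "adjective"},
    "definition": None,
}


def validate(record, spec):
    """Return a list of problems found in one record under the given spec."""
    problems = []
    for category, value in record.items():
        if category not in spec:
            # The user group did not select this data category.
            problems.append(f"unknown data category: {category}")
        elif spec[category] is not None and value not in spec[category]:
            # The category is selected, but the value violates its constraint.
            problems.append(f"invalid value for {category}: {value}")
    return problems


ok = validate({"term": "interchange", "partOfSpeech": "noun"}, SPEC)
bad = validate(
    {"term": "interchange", "partOfSpeech": "adverb", "source": "x"}, SPEC
)
```

Here `ok` is an empty list, while `bad` reports one constraint violation and one category outside the selected set.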

15 Levels of Modelling in the SALT Initiative
Level 3: specific data model / format
  – a core structure, a data category specification, and a representation style are combined to define a member of the SALT family
  – each member is fully interoperable with other members that use the same data category specification
e.g. concrete relational database implementations, specific XLT implementations, subsets for industrial user groups such as TBX
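
The Level 3 rule, under which a family member is the combination of three components and interoperability depends on a shared data category specification, can be sketched as follows. The class, the member names, and the component values are hypothetical; the slide does not say how a data category specification is compared, so simple set equality is assumed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FamilyMember:
    """One member of a format family, defined by three components."""
    name: str
    core_structure: str            # e.g. "XML" or "relational" (assumed labels)
    data_categories: frozenset     # stands in for the data category specification
    representation_style: str


def interoperable(a, b):
    """Assumed rule: members interoperate when their specifications match."""
    return a.data_categories == b.data_categories


shared_spec = frozenset({"term", "partOfSpeech", "definition"})

# Hypothetical members, invented for illustration only.
xml_member = FamilyMember("member-a", "XML", shared_spec, "elements")
db_member = FamilyMember("member-b", "relational", shared_spec, "tables")
other = FamilyMember("member-c", "XML", frozenset({"term"}), "elements")

same_spec = interoperable(xml_member, db_member)
different_spec = interoperable(xml_member, other)
```

In this sketch the XML and relational members interoperate despite different core structures and representation styles, because only the data category specification is compared.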

16 Cooperation and Concertation
The SALT consortium (U Vienna, U AS Cologne, U Surrey, LORIA Nancy, Termisti Brussels, EA Bozen/Bolzano, BYU Provo) cooperates with
  – other HLT or IST projects (TQPro, Schemas, etc.)
  – other EU projects (MLIS) (TDCNet, GEMA, DINT, etc.)
  – ELRA, EAFT
  – EU Commission, UN-Jiamcatt group
  – TEI, ISO, JTC 1, W3C
  – LISA (OSCAR), including companies other than IT from other industries (telecom, automotive eng.)
  – FIT, etc.

17 Conclusions
The SALT project contributes to a convergence process that is badly needed in the area of multilingual lex/term resources
  – technical/methodological convergence, resulting in interoperability and accessibility of MTRs
  – support for language industry markets

