What is TBX? TermBase eXchange


TBX version 3 – Learning from users Alan Melby, developed with Hanne Smaadahl

What is TBX? TermBase eXchange

XML-based framework for representing structured terminological data.
Independent of programming language and operating systems.
Flexible enough to represent most of the information in a variety of terminology databases.

2018 Smaadahl / Melby

TBX, or TermBase eXchange, is the open, XML-based standard for exchanging structured terminological data. It is independent of programming language and operating system, and flexible enough to support a variety of terminology databases.

When we say "flexible enough to represent most of the information in a variety of terminology databases", we mean:

1 - Structural relationships: here we assume the termbase has three levels: concept, language, and term.
2 - Data categories: all the information in a termbase can be represented in TBX. Much of it can be represented in industry-standard data categories (also known as "data element types" or "fields"), depending on the target TBX dialect, and the rest can be represented in note elements, using the "steamroller" approach.
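The three-level structure mentioned above (concept, language, term) can be sketched as a small walk over a TBX document. This is only an illustration under stated assumptions: the element names conceptEntry, langSec and termSec are the TBX 3.0 names as we understand them and do not appear on the slides, the sample entry is invented, and the namespace is the one quoted on the features slide.

```python
import xml.etree.ElementTree as ET

# Namespace as quoted in the deck; treat it as an assumption for real files.
NS = {"tbx": "urn:iso:std:iso:30042:ed:3.0"}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Invented sample: one concept, two languages, three terms.
SAMPLE = """<tbx type="TBX-Basic" xmlns="urn:iso:std:iso:30042:ed:3.0">
  <text><body>
    <conceptEntry id="c1">
      <langSec xml:lang="en">
        <termSec><term>termbase</term></termSec>
        <termSec><term>terminology database</term></termSec>
      </langSec>
      <langSec xml:lang="de">
        <termSec><term>Terminologiedatenbank</term></termSec>
      </langSec>
    </conceptEntry>
  </body></text>
</tbx>"""


def terms_by_concept(tbx_text):
    """Return {concept id: {language code: [terms]}} for a TBX string."""
    root = ET.fromstring(tbx_text)
    result = {}
    for concept in root.iterfind(".//tbx:conceptEntry", NS):      # concept level
        languages = {}
        for lang_sec in concept.iterfind("tbx:langSec", NS):      # language level
            terms = [t.text for t in lang_sec.iterfind("tbx:termSec/tbx:term", NS)]  # term level
            languages[lang_sec.get(XML_LANG)] = terms
        result[concept.get("id")] = languages
    return result


if __name__ == "__main__":
    print(terms_by_concept(SAMPLE))
```

The nested dictionary mirrors the three levels directly, which is why a termbase with the same three levels maps onto TBX without restructuring.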

Why do we have TBX?

Saving information in a termbase to a separate file (separating content from tool to support future software change)
Exchanging information between systems (3 examples): authoring, translation, data mining
Guiding the design of a new termbase for interoperability

TBX is designed to satisfy a number of use cases. The main ones are:

1 - Saving or archiving the information in a termbase. This separates the valuable content of your termbase from any specific tool, which is especially important for supporting future software change.
2 - Exchanging information between systems. Here are three examples:
Authoring: sending monolingual information from a termbase to an authoring tool.
Translation: sending a subset of the information from a termbase to a translator, for either human or machine translation.
Data mining: exporting most or all of the information from a termbase for analysis using XML, when you need more advanced analytics than your terminology management tool can provide.
3 - Guiding the design of a new termbase for interoperability with other termbases, and also with other tools that are used to repurpose terminology data.

History of TBX

2002: TBX 1.0 (LISA-OSCAR)
2008: TBX 2.0 (ISO 30042:2008)
2019: TBX 3.0 (ISO 30042:2019)

TBX has a long history dating back to the 1980s, when the need for a terminology format was first recognized. It has gone through several major iterations, evolving from SGML to XML, and including close cooperation within the TEI (Text Encoding Initiative) during the 1990s. It was first published as an industry standard in 2002, when the OSCAR (Open Standards for Container/Content Allowing Re-use) working group of the now-disbanded Localisation Industry Standards Association (LISA) came out with the first version of TBX. The second generation of TBX was co-published in 2008 by LISA and ISO. LISA was disbanded in 2011. The third generation will be published in 2019 by ISO. TBX as an ISO standard is maintained in ISO Technical Committee 37, Language and terminology.

TBX predecessors: MATER (ISO 6156:1986); MicroMATER (Melby, 1991); and MARTIF, the MAchine-Readable Terminology Interchange Format (ISO 12200:1999), in which the cooperation within the TEI culminated.

What is the chaos we wish to tame?

TBX 2.0: too powerful (complex), with no easy way to know what to expect*
*People didn't know what to expect because they didn't include an XCS file with each TBX document instance.

The "chaos" we are referring to arises because, when you receive a file that claims to be compliant with TBX version 2.0, it is usually hard for import software to know what to expect. The import process often fails. This is because second-generation TBX used a complex mechanism, called an XCS (eXtensible Constraint Specification), to indicate dynamically what to expect in the file (by specifying constraints on metadata). The chaos arose because people did not follow the rule: they didn't include an XCS file with each TBX document instance.

Sample XCS

Sample snippet of an XCS file:

<?xml version="1.0"?>
<TBXXCS name='DXFd-supplier' version="1.0" lang='en' xmlns="x-schema:TBX-XCS-XDRschema-v-0-1.xml">
  <header><title>subset DCS file for the Supplier example</title></header>
  <datCatSet>
    <termNoteSpec name="termType" datcatId="ISO12620A-0201">
      <contents datatype="picklist" targetType="none">fullForm abbreviatedForm</contents>
    </termNoteSpec>
    <descripSpec name="subjectField" datcatId="ISO12620A-04">
      <contents datatype="picklist" targetType="none">manufacturing finance</contents>
      <levels>termEntry</levels>
    </descripSpec>
    <descripSpec name="definition" datcatId="ISO12620A-0501">
      <contents datatype="noteText" targetType="none"/>
      <levels>termEntry</levels>
    </descripSpec>
  </datCatSet>
</TBXXCS>
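To give a feel for what a TBX 2.0 importer had to do with such a file, here is a minimal sketch that reads the constraints out of an XCS document. It is not taken from any real tool: the function name read_constraints is made up, and the embedded sample is a cut-down, well-formed copy of the snippet above with the header and the namespace declaration left out for brevity.

```python
import xml.etree.ElementTree as ET

# Cut-down copy of the sample XCS (header and xmlns omitted for brevity).
XCS = """<TBXXCS name="DXFd-supplier" version="1.0" lang="en">
  <datCatSet>
    <termNoteSpec name="termType" datcatId="ISO12620A-0201">
      <contents datatype="picklist" targetType="none">fullForm abbreviatedForm</contents>
    </termNoteSpec>
    <descripSpec name="subjectField" datcatId="ISO12620A-04">
      <contents datatype="picklist" targetType="none">manufacturing finance</contents>
      <levels>termEntry</levels>
    </descripSpec>
    <descripSpec name="definition" datcatId="ISO12620A-0501">
      <contents datatype="noteText" targetType="none"/>
      <levels>termEntry</levels>
    </descripSpec>
  </datCatSet>
</TBXXCS>"""


def read_constraints(xcs_text):
    """Return {data category name: constraint info} from an XCS string."""
    root = ET.fromstring(xcs_text)
    constraints = {}
    for spec in root.find("datCatSet"):
        contents = spec.find("contents")
        datatype = contents.get("datatype")
        # A picklist lists its permitted values as whitespace-separated text.
        values = (contents.text or "").split() if datatype == "picklist" else None
        constraints[spec.get("name")] = {
            "spec": spec.tag,
            "datatype": datatype,
            "values": values,
        }
    return constraints


if __name__ == "__main__":
    for name, info in read_constraints(XCS).items():
        print(name, info)
```

Without the XCS file, none of this information is available to the importer, which is the gap the speaker notes describe.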

How is complexity reduced?

No "generic" TBX: specify the dialect. No more chaos!
A TBX file must belong to a dialect
TBX dialects are strictly constrained to certain data categories
3 current, public dialects: TBX-Core, TBX-Min, TBX-Basic
Public dialects follow a "telescoping" principle

TBX 3.0, on the other hand, uses a simple mechanism: a required dialect name on the root element of each TBX file. TBX 3.0 files cannot be "generic", but must now be instances of specific TBX dialects, for example "TBX-Basic" (it can no longer simply be "TBX"). This change addresses the single most common complaint about version 2.0: lack of predictability. Going forward, tools supporting the same dialect will know what to expect.

TBX dialects are strictly defined to include only certain data categories. The dialect name is linked to a formal description of what to expect. Public dialect descriptions are stored on freely available industry-standard websites, such as www.tbxinfo.net. Now all that an import routine needs to do is look at the dialect name to decide whether it is prepared to deal with that TBX file. Chaos has been tamed through a simple mechanism.
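The import-side check described above can be sketched in a few lines. This is a hedged illustration, not code from any existing TBX tool: the function names are invented, the dialect name is read from the type attribute that the deck's features slide names, and "TBX-Example-Private" is a made-up dialect used only to show a rejection.

```python
import xml.etree.ElementTree as ET

# Dialects this hypothetical importer is prepared to handle.
SUPPORTED_DIALECTS = {"TBX-Core", "TBX-Min", "TBX-Basic"}


def declared_dialect(tbx_text):
    """Return the dialect name declared on the <tbx> root element."""
    root = ET.fromstring(tbx_text)
    local_name = root.tag.rsplit("}", 1)[-1]  # drop any namespace prefix
    if local_name != "tbx":
        raise ValueError(f"root element is <{local_name}>, not <tbx>")
    dialect = root.get("type")
    if not dialect:
        raise ValueError("no dialect declared on the <tbx> root element")
    return dialect


def can_import(tbx_text, supported=SUPPORTED_DIALECTS):
    """Decide from the dialect name alone whether to accept the file."""
    return declared_dialect(tbx_text) in supported


# Invented one-line documents, just enough to exercise the check.
BASIC_FILE = (
    '<tbx type="TBX-Basic" xmlns="urn:iso:std:iso:30042:ed:3.0">'
    "<text><body/></text></tbx>"
)
OTHER_FILE = (
    '<tbx type="TBX-Example-Private" xmlns="urn:iso:std:iso:30042:ed:3.0">'
    "<text><body/></text></tbx>"
)

if __name__ == "__main__":
    print(can_import(BASIC_FILE), can_import(OTHER_FILE))
```

The decision needs nothing beyond the root element, in contrast to the XCS mechanism of version 2.0.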

Telescoping principle

TBX-Core: Date, Term, Note
TBX-Min: Core + Administrative Status, Customer Subset, Part of Speech, Subject Field
TBX-Basic: Core + Min + Context, Definition, External xref, Gender, Geographical Usage, Project Subset, Related Concept, Related Term, Responsibility, Source, Term Location, Term Type, Transaction Type, xGraphic
TBX-??: Core + Min + Basic + Needs of a given user community

There are three current public dialects of TBX: TBX-Core, TBX-Min, TBX-Basic. These public dialects are built on a "telescoping" principle. TBX-Basic contains the data categories of TBX-Min. TBX-Min contains the data categories of TBX-Core. If software supports TBX-Min, it can therefore partially support TBX-Basic. If it supports TBX-Basic, it can support TBX-Min. This "telescope" can be extended to future dialects.

The requirement for a dialect to be considered public is that it responds to the needs of a specific user community. For example, TerminOrgs, an organization that represents Terminology In Large Organizations, is working on a dialect that would further expand on TBX-Basic to meet the specific needs of terminology management in large organizations (compared to the more LSP- or translation-oriented scenario of TBX-Basic).

(Photo: Canva, free stock photo)
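The telescoping principle amounts to set inclusion, which the following sketch makes explicit. The data category names are the labels read off the slide's diagram (not the identifiers used in TBX files), and the coverage function is a hypothetical helper written for this illustration.

```python
# Data categories per dialect, using the labels from the slide's diagram.
CORE = frozenset({"Date", "Term", "Note"})
MIN_EXTRA = frozenset({
    "Administrative Status", "Customer Subset", "Part of Speech", "Subject Field",
})
BASIC_EXTRA = frozenset({
    "Context", "Definition", "External xref", "Gender", "Geographical Usage",
    "Project Subset", "Related Concept", "Related Term", "Responsibility",
    "Source", "Term Location", "Term Type", "Transaction Type", "xGraphic",
})

# Each dialect contains everything in the one before it.
DIALECTS = {
    "TBX-Core": CORE,
    "TBX-Min": CORE | MIN_EXTRA,
    "TBX-Basic": CORE | MIN_EXTRA | BASIC_EXTRA,
}


def coverage(supported_dialect, file_dialect):
    """Say how well software for one dialect covers a file of another.

    Returns ("full" or "partial", sorted list of unsupported categories).
    """
    unsupported = DIALECTS[file_dialect] - DIALECTS[supported_dialect]
    return ("full" if not unsupported else "partial", sorted(unsupported))


if __name__ == "__main__":
    print(coverage("TBX-Basic", "TBX-Min"))
    print(coverage("TBX-Min", "TBX-Basic"))
```

A future dialect that extends TBX-Basic would simply add one more entry whose set is the union of TBX-Basic and the community-specific categories.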

Features of 3.0

VALIDATION, INTEROPERABILITY, MODERNIZATION, EASE OF USE

VALIDATION: This version of the standard clearly defines the requirements for the Core, for a dialect to be compliant, and for document instance validation using off-the-shelf XML tools (e.g. Oxygen).

INTEROPERABILITY: Features added in version 3.0 are:
1 - The name of a TBX dialect must be declared as the value of the type attribute on the <tbx> root element, for example "TBX-Basic".
2 - TBX dialects are strictly defined to include certain data categories. There are currently 3 public dialects: Core, Min, Basic.
3 - TBX 3.0 coordinates with certain aspects of OASIS XLIFF: this version implements the XLIFF 2.0 inline markup model.

MODERNIZATION: This version of TBX has been modernized with features such as:
1 - A simplified, more "modern" XML style, DCT (Data Category as Tag), introduced alongside the traditional TBX style, DCA (Data Category as Attribute), which is preserved for legacy support. TBX dialects may convert from one style to the other without data loss (the styles are isomorphic). Full elaboration of DCT is left for future versions. A preview is available on TBXinfo.
2 - A permanent, default XML namespace for TBX 3.0. The Core namespace URN is urn:iso:std:iso:30042:ed:3.0
3 - A @dir attribute for text directionality (similar to dir in HTML or XLIFF) that is allowed to inherit structurally. The dir values are ltr (left-to-right), rtl (right-to-left), and auto (default).

EASE OF USE: Version 3.0 is easier to use because:
1 - The XCS file of version 2.0 has been replaced by the requirement to declare the dialect name on the <tbx> root element.
2 - It allows for validation using an RNG schema (or XSD), instead of DTD + XCS.
3 - There is free access to machine-readable artifacts and supporting documentation on public websites, e.g. tbxinfo.net.
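Structural inheritance of the dir attribute can be illustrated with a short tree walk. This is a sketch under assumptions: the slides do not say which elements may carry dir, so its placement on langSec here is a guess, the element names conceptEntry, langSec and termSec are the TBX 3.0 names as we understand them, and the sample entry is invented (namespace omitted for brevity).

```python
import xml.etree.ElementTree as ET

VALID_DIR = {"ltr", "rtl", "auto"}  # values listed on the slide; auto is the default

# Invented sample: the Arabic language section overrides the default direction.
SAMPLE = """<tbx type="TBX-Basic">
  <text><body>
    <conceptEntry id="c1">
      <langSec xml:lang="en">
        <termSec><term>term</term></termSec>
      </langSec>
      <langSec xml:lang="ar" dir="rtl">
        <termSec><term>مصطلح</term></termSec>
      </langSec>
    </conceptEntry>
  </body></text>
</tbx>"""


def term_directions(root):
    """Return {term text: effective dir}, inheriting dir from ancestors."""
    result = {}

    def walk(element, inherited):
        # An element's own dir wins; otherwise it takes the ancestor's value.
        current = element.get("dir", inherited)
        if current not in VALID_DIR:
            raise ValueError(f"unexpected dir value: {current!r}")
        if element.tag.rsplit("}", 1)[-1] == "term":
            result[element.text] = current
        for child in element:
            walk(child, current)

    walk(root, "auto")
    return result


if __name__ == "__main__":
    print(term_directions(ET.fromstring(SAMPLE)))
```

Because the value flows down the tree, a single dir on a language section is enough for every term beneath it.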

Tools to help you

Dialect toolkit for public dialects Core, Min and Basic
TBX "Spyglass" (analyzing TBX files without looking at XML)
MultiTerm to TBX conversion: Mapping Wizard, MultiTerm-to-TBX Converter (collaboration with Glossary Converter)
TBX "Steamroller"
TBX v2-to-v3 Conversion
TBX v3 Validation
You only need the ISO 30042 standard to define a new dialect

There are many tools available to help you, free of charge. A good starting point is TBXinfo.net. This is a community effort. Resources are freely available. We invite you to join and contribute to this community.

1 - Dialect toolkits for public dialects Core, Min and Basic.
2 - TBX "Spyglass" allows you to analyze TBX files without looking at XML.
3 - MultiTerm to TBX conversion. This is a plugin for MultiTerm, and is the result of collaboration with Gerhard Kordmann's Glossary Converter. It is perhaps the most downloaded MultiTerm plugin.
4 - TBX "Steamroller" helps you convert any TBX file into valid TBX-Basic output, including the ability to convert invalid TBX into valid TBX.
5 - TBX v2-to-v3 Conversion.
6 - TBX v3 Validation.
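The "steamroller" idea of pushing unsupported data categories into notes can be sketched as follows. This is emphatically not the actual TBX Steamroller tool, whose behaviour is not described in the slides: the function steamroll, the "category: value" note format, the list of DCA elements, and the data category houseStyleFlag are all invented for illustration, and whether a note is valid at that position depends on the target dialect's schema.

```python
import xml.etree.ElementTree as ET

# DCA-style elements that carry a data category in their type attribute
# (a partial list, assumed for this sketch).
DCA_ELEMENTS = {"admin", "descrip", "termNote"}

# Invented sample (namespace omitted); houseStyleFlag is a made-up category.
SAMPLE = """<termSec>
  <term>widget</term>
  <termNote type="partOfSpeech">noun</termNote>
  <termNote type="houseStyleFlag">informal</termNote>
</termSec>"""


def steamroll(root, allowed_types):
    """Flatten data categories outside allowed_types into <note> elements.

    Modifies the tree in place and returns the number of flattened elements.
    """
    flattened = 0
    for element in root.iter():
        if element.tag in DCA_ELEMENTS and element.get("type") not in allowed_types:
            category = element.get("type", "unknown")
            # Keep the information as plain text so nothing is lost.
            element.text = f"{category}: {element.text or ''}".strip()
            element.attrib.clear()
            element.tag = "note"
            flattened += 1
    return flattened


if __name__ == "__main__":
    tree = ET.fromstring(SAMPLE)
    print(steamroll(tree, {"partOfSpeech"}))
    print(ET.tostring(tree, encoding="unicode"))
```

The trade-off is that flattened information survives as readable text but is no longer machine-interpretable as a data category.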

What's next?

See www.tbxinfo.net (TBX website)
We help you! Import/export to/from various CAT tools
We need you!

Photo: SAP image library (royalty free)

Demos

Photo: SAP image library (SAP owned)

Thank You

Hanne Smaadahl
Senior Terminologist, SAP & Project Lead ISO 30042 v3.0
hanne.smaadahl@sap.com

Alan K. Melby
LTAC Global President & Professor Emeritus, Brigham Young University (BYU)
akm@ltacglobal.org

Appendix

TBX sample