ISO TMF - Terminological Markup Framework
Laurent Romary - Laboratoire Loria
– Background: ISO etc.
– The need for abstraction
– Structure and content of terminological data (picture: virtual-actual)
– The meta-model (structural skeleton)
– Describing data categories
– Styles and vocabularies
– XTMF as a mapping tool: examples
– Further work: extending the model to a wider scope (language engineering)
Overview
General principles
• Expressing constraints on the representation of computerized terminologies
– What is the underlying structure of computerized terminologies?
– Which data category is used, and under which conditions?
• Maintaining interoperability between representations
– Providing a conceptual tool to compare two given formats
Definitions
• TMF: Terminological Mark-up Framework
– Definition of the underlying structures and mechanisms needed for the computer representation of terminological data
– Independent of any specific format
• TML: Terminological Mark-up Language
– One specific representation format generated within TMF
– E.g.: DXLT is a possible TML
A family of formats
[Figure: TMF as the common framework, with TML 1, TML 2, TML 3, … as individual formats; DXLT and Geneter are labelled as examples of TMLs]
Meta-model Representing the underlying structure of terminological data
[Figure: meta-model diagram linking Terminological Data Collection, Global Information, Complementary Information, Terminological Entry, Terminology-related Information, Language Section, Term Section and Term Component Section, with cardinalities (1, *, 0:1) on the links]
The structural skeleton
Terminological Data Collection (TDC)
  Global Information (GI)
  Complementary Information (CI)
  Terminological Entry (TE) *
    Language Section (LS) *
      Term Level (TL) *
        Term Component Level (TCL) *
How does this work? Walking through an example…
DXLT example
[Example markup not preserved; text content in order:]
manufacturing
A value between 0 and 1 used in...
alpha smoothing factor
fullForm
Alfa...
Identifying the structural skeleton
TE: id=‘ID67’ [attribute], subjectField=‘manufacturing’ [typedElement], definition=‘A value…’ [typedElement]
  LS: lang=‘en’ [attribute]
    TS: term=‘alpha smoothing factor’ [element], termType=‘fullForm’ [typedElement]
  LS: lang=‘hu’ [attribute]
    TS: term=‘…’ [element]
TE: Terminological Entry; LS: Language Section; TS: Term Section
TMF information model
TE: id=‘ID67’, subjectField=‘manufacturing’, definition=‘A value…’
  LS: lang=‘en’
    TS: term=‘alpha smoothing factor’, termType=‘fullForm’
  LS: lang=‘hu’
    TS: term=‘…’
GMT representation
[Example markup not preserved; feature values in order:]
ID67
manufacturing
A value between 0 and 1 used in...
en
alpha smoothing factor
fullForm
hu
Alfa...
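The information model above can be turned into a GMT-like tree with a few lines of Python. This is a minimal sketch, not the reference serialization: the `struct` element and its `type` attribute come from the slides, while the `type` attribute on `feat`, the helper names `struct`, `feat` and `build_entry`, and the data category labels (taken from the walkthrough: `id`, `subjectField`, `definition`, `lang`, `term`, `termType`) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def struct(parent, level):
    """Append a <struct> element for one level of the structural skeleton."""
    return ET.SubElement(parent, "struct", {"type": level})


def feat(parent, datcat, value):
    """Append a <feat> element holding one data category and its value.

    Assumption: the data category name is carried by a `type` attribute.
    """
    node = ET.SubElement(parent, "feat", {"type": datcat})
    node.text = value
    return node


def build_entry():
    """Build the ID67 entry: one TE, two LS, one TS per LS."""
    entry = ET.Element("struct", {"type": "TE"})
    feat(entry, "id", "ID67")
    feat(entry, "subjectField", "manufacturing")
    feat(entry, "definition", "A value between 0 and 1 used in...")

    english = struct(entry, "LS")
    feat(english, "lang", "en")
    english_term = struct(english, "TS")
    feat(english_term, "term", "alpha smoothing factor")
    feat(english_term, "termType", "fullForm")

    hungarian = struct(entry, "LS")
    feat(hungarian, "lang", "hu")
    hungarian_term = struct(hungarian, "TS")
    feat(hungarian_term, "term", "Alfa...")
    return entry


if __name__ == "__main__":
    print(ET.tostring(build_entry(), encoding="unicode"))
```

Every value ends up as a `feat` inside the `struct` of the level it belongs to, so the tree no longer depends on whether DXLT expressed it as an attribute, an element or a typed element.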
TML à la mode ISO
– Ingredients
– A structural skeleton
» (take the TMF meta-model)
– A reference Data Category Registry
» ISO is a good place to find one
– Recipe
– Choose some data categories from the registry
» You can even constrain the values of your DatCats
– Associate a style and vocabulary with each DatCat
» You can take inspiration from others (DXLT)
– Serve it hot to your software guy with a piece of SALT software
GMT Generic Mapping Tool
Background
• Interoperability principle
– If any two TMLs have exactly the same DCS (data category selection), they are equivalent, even if they differ radically in style and vocabulary.
• Consequence
– It is always possible to define a filter from one TML to another when they are interoperable.
– GMT is the intermediate representation used to do so.
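The interoperability principle can be sketched as a simple set comparison. In the example below a TML is modelled, purely as an assumption, as a dictionary from data category name to a (style, vocabulary) pair; the function names and the sample TMLs are hypothetical.

```python
from typing import Dict, FrozenSet, Tuple

# Assumed model: data category name -> (style, vocabulary)
TML = Dict[str, Tuple[str, str]]


def dcs(tml: TML) -> FrozenSet[str]:
    """Return the data category selection of a TML: its set of data categories."""
    return frozenset(tml)


def interoperable(a: TML, b: TML) -> bool:
    """Two TMLs are interoperable when their data category selections are identical."""
    return dcs(a) == dcs(b)


def filter_plan(source: TML, target: TML) -> Dict[str, Tuple[Tuple[str, str], Tuple[str, str]]]:
    """Pair the source and target implementation of every data category.

    Raises ValueError when the two TMLs do not share the same selection.
    """
    if not interoperable(source, target):
        raise ValueError("TMLs do not share the same data category selection")
    return {name: (source[name], target[name]) for name in sorted(source)}


# Hypothetical TMLs: same data categories, different styles and vocabularies.
tml_a: TML = {"definition": ("Element", "def"), "termType": ("TypedElement", "termNote")}
tml_b: TML = {"definition": ("TypedElement", "descrip"), "termType": ("Attribute", "type")}

if __name__ == "__main__":
    for name, (src, dst) in filter_plan(tml_a, tml_b).items():
        print(name, src, "->", dst)
```

The plan only says which implementation maps to which; an actual filter would read the source form into the intermediate representation and write it back out in the target form.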
From one TML to another
• GMT - Generic mapping tool
– an abstract XML representation
– identification of levels: …
» a recursive element
– representation of data categories: …
GMT description cont. Bracketing features xxx Lenoc
GMT description (cont.)
Annotating information
pencil whose casing is fixed around a central graphite medium which is used for writing or making marks
The tmf element
Description:
– The tmf element is the root element of any valid XTMF document. It contains the global information that corresponds to a terminological data collection, the collection itself, and the complementary information (in particular external resources) needed to describe the various terminological entries.
Content model:
The struct element
Description:
– The struct element should be used to represent a locus in a given structural skeleton. One such locus will be represented by exactly one struct element. The struct element is recursive and may also contain feat and/or brack elements to express attributes belonging to the corresponding level of the meta-model.
Attributes:
– type: level in the meta-model (TDC, TE, LS, TS or TCS)
Content model:
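The description above translates into a small recursive check. The sketch below covers only what the slide states: the `type` attribute must name a level of the meta-model, and a `struct` may contain `struct`, `feat` and `brack` children. It does not enforce the order in which levels may nest, and the function name `check_struct` is hypothetical.

```python
import xml.etree.ElementTree as ET

LEVELS = {"TDC", "TE", "LS", "TS", "TCS"}
ALLOWED_CHILDREN = {"struct", "feat", "brack"}


def check_struct(node, path="struct"):
    """Return a list of problems found in a struct element and its descendants."""
    problems = []
    level = node.get("type")
    if level not in LEVELS:
        problems.append(f"{path}: unknown level {level!r}")
    for index, child in enumerate(node):
        if child.tag not in ALLOWED_CHILDREN:
            problems.append(f"{path}: unexpected child <{child.tag}>")
        elif child.tag == "struct":
            # struct is recursive, so nested levels are checked the same way
            problems.extend(check_struct(child, f"{path}/struct[{index}]"))
    return problems


if __name__ == "__main__":
    sample = ET.fromstring(
        '<struct type="TE"><feat type="id">ID67</feat><struct type="LS"/></struct>'
    )
    print(check_struct(sample))
```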
Styles and vocabularies
Implementing a DatCat
– Definitions:
» ‘style’: the way a given DatCat is implemented as an XML object
» ‘vocabulary’: the symbols needed to express the implementation of a given DatCat in its associated style
– E.g.:
» DatCat: /definition/
» Vocabulary = [def]
» Style = Element
» DatCat value: pencil whose casing …
Implementing a DatCat (cont.)
– Definition:
» ‘anchor’: the XML element(s) to which the implementation of a given DatCat can be attached
– E.g.: alpha smoothing factor
Styles - element
• Element
– Def.: The DatCat is implemented as an element, child of its anchor
– Vocabularies: the name of the corresponding element
– E.g.: pencil whose casing …; alpha smoothing factor
Styles - attribute
• Attribute
– Def.: The DatCat is implemented as an attribute of its anchor
– Vocabularies: the name of the corresponding attribute
– E.g.: … DatCat value
• TypedElement
• ValuedElement
• TypedValuedElement
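The two styles defined on the slides, Element and Attribute, can be sketched as one function that attaches a data category value to its anchor. The function name `implement` and the anchor name `entry` are hypothetical; the typed and valued styles are left out because their serialization is not spelled out here.

```python
import xml.etree.ElementTree as ET


def implement(anchor, style, vocabulary, value):
    """Attach a data category value to its anchor using the given style.

    Element:   the value becomes the text of a child element named `vocabulary`.
    Attribute: the value becomes an attribute named `vocabulary` on the anchor.
    """
    if style == "Element":
        child = ET.SubElement(anchor, vocabulary)
        child.text = value
        return child
    if style == "Attribute":
        anchor.set(vocabulary, value)
        return anchor
    raise ValueError(f"style not covered by this sketch: {style}")


if __name__ == "__main__":
    entry = ET.Element("entry")  # hypothetical anchor element
    implement(entry, "Attribute", "id", "ID67")
    implement(entry, "Element", "def", "pencil whose casing …")
    print(ET.tostring(entry, encoding="unicode"))
```

Swapping the style or the vocabulary changes the surface form only; the data category and its value stay the same, which is what makes two such formats comparable.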
Data Categories A Formal Description
Data Category Registry
[Figure labels: DCRegistry, Description, VersionNumber, Data Category, dcsd:DataCategory, rdf:about]
Data Category description
[Figure: properties attached to a Data Category]
– DCIdentifier (dcsd:DCIdentifier)
– DCName (dcsd:DCName)
– DCDefinition (dcsd:DCDefinition)
– DCType (S, C) (dcsd:DCType)
– DCParent (dcsd:DCParent)
– Content (dcsd:Content)
– Locus (dcsd:Level)
– DCExample (dcsd:DCExample)
– DCComment (dcsd:DCComment)
– DCAdmin (dcsd:DCAdmin)
[Other label in the figure: Salt / SEW]
Levels and content
[Figure: structure of the Content and Level properties]
– Content: DataType (dcsd:DataType), TargetType (dcsd:TargetType), list of references to other datcat(s) (rdf:Alt, rdf:li)
– Level/Loci: list of references to other datcat(s) (rdf:Alt, rdf:li)
Administrative properties
[Figure: Data Category with its DCAdmin (dcsd:DCAdmin) properties]
– Status (dcsd:Status)
– StatusDate (dcsd:StatusDate)
– StatusNote (dcsd:StatusNote)
– EditionDate (dcsd:EditionDate)
– Source (dcsd:Source)
– VariantNames (dcsd:VariantNames)
– ShortForm (dcsd:ShortForm)
– AdmittedName (dcsd:AdmittedName)
– ForbiddenName (dcsd:ForbiddenName)
Actualizing a DatCat TMF specific properties
Styling properties
[Figure: Data Category with its Style (dcsd:Style) properties]
– StyleName (dcsd:StyleName): Simple, Element, Attribute, TypedElement, ValuedElement, TVElement
– ElementName (dcsd:ElementName)
– AttributeName (dcsd:AttributeName)
– TypeValue (dcsd:TypeValue)
– Value (dcsd:Value), for the Simple style
– Anchor (dcsd:Anchor)
Attribute style description
dcsd:StyleName=“Attribute”
– Conditions of use: Not valid for annotations
– Required properties: dcsd:AttributeName
– Example: dcsd:AttributeName=“id” …
Element style description
dcsd:StyleName=“Element”
– Required properties: dcsd:ElementName
– Example: dcsd:ElementName=“definition” …
TypedElement style description
dcsd:StyleName=“TypedElement”
– Required properties: dcsd:ElementName, dcsd:TypeValue
– Example: dcsd:ElementName=“termNote” dcsd:TypeValue=“partOfSpeech” N
ValuedElement style description
dcsd:StyleName=“ValuedElement”
– Conditions of use: Not valid for annotations
– Required properties: dcsd:ElementName
– Example: dcsd:ElementName=“pos”
TVElement style description
dcsd:StyleName=“TVElement”
– Conditions of use: Not valid for annotations
– Required properties: dcsd:ElementName, dcsd:TypeValue
– Example: dcsd:ElementName=“free” dcsd:TypeValue=“pos”
Simple style description
dcsd:StyleName=“Simple”
– Conditions of use: Express the value of simple data categories
– Required properties: dcsd:Value
– Example: dcsd:Value=“Nom” Nom
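The required properties and conditions of use listed for the six styles can be collected into one lookup table and checked mechanically. The sketch below assumes a style description is available as a plain dictionary keyed by property name without the `dcsd:` prefix; that representation and the function name `check_style` are illustrative, not part of the framework.

```python
# Required properties per style, as listed in the style descriptions.
REQUIRED = {
    "Attribute": {"AttributeName"},
    "Element": {"ElementName"},
    "TypedElement": {"ElementName", "TypeValue"},
    "ValuedElement": {"ElementName"},
    "TVElement": {"ElementName", "TypeValue"},
    "Simple": {"Value"},
}

# Styles marked "Not valid for annotations".
NOT_FOR_ANNOTATIONS = {"Attribute", "ValuedElement", "TVElement"}


def check_style(description, for_annotation=False):
    """Return a list of problems in one style description (empty if none)."""
    style = description.get("StyleName")
    if style not in REQUIRED:
        return [f"unknown style {style!r}"]
    missing = REQUIRED[style] - set(description)
    problems = [f"missing dcsd:{name}" for name in sorted(missing)]
    if for_annotation and style in NOT_FOR_ANNOTATIONS:
        problems.append(f"{style} is not valid for annotations")
    return problems


if __name__ == "__main__":
    part_of_speech = {
        "StyleName": "TypedElement",
        "ElementName": "termNote",
        "TypeValue": "partOfSpeech",
    }
    print(check_style(part_of_speech))
```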
Bracketing information
Annotating content
Rationale
• Why should we annotate specific content?
– To identify components which are not explicitly expressed as a specific part of a terminological entry
» E.g.: characteristics of a concept
– To relate a component to another entry or an external resource
» E.g.: bibliographical reference
Dealing with languages
Two types of languages
• Working language
– The language used at a given place in a document, along the XML hierarchy
– Representation: xml:lang
• Object language
– The language about which you speak at a given place in your terminological entry (e.g. describes the Language Section level)
– Representation: as a data category “language”, with a narrow scope
Example: DXLT
[Example markup not preserved; text content in order:]
Une valeur entre 0 et 1 utilisée…
alpha smoothing factor
fullForm
Example: GMT
[Example markup not preserved; feature values in order:]
en
Une valeur entre 0 et 1 utilisée…
alpha smoothing factor
fullForm
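The difference between the two kinds of language can be sketched on a GMT-like tree. In this example the French definition carries `xml:lang="fr"` (working language) while the Language Section is declared English through a `lang` feature (object language). Where exactly `xml:lang` sits in the original example is not recoverable, so its placement here is an assumption, as are the `type` attribute on `feat` and the two helper function names.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE = (
    '<struct type="TE">'
    '<struct type="LS">'
    '<feat type="lang">en</feat>'
    '<feat type="definition" xml:lang="fr">Une valeur entre 0 et 1 utilisée…</feat>'
    '<struct type="TS">'
    '<feat type="term">alpha smoothing factor</feat>'
    '<feat type="termType">fullForm</feat>'
    '</struct>'
    '</struct>'
    '</struct>'
)


def working_language(root, target):
    """Return the xml:lang in effect at `target`: its own value, else the nearest ancestor's."""
    parent = {child: node for node in root.iter() for child in node}
    node = target
    while node is not None:
        lang = node.get(XML_LANG)
        if lang is not None:
            return lang
        node = parent.get(node)
    return None


def object_language(language_section):
    """Return the language a Language Section describes, read from its `lang` feature."""
    return language_section.findtext("feat[@type='lang']")


if __name__ == "__main__":
    entry = ET.fromstring(SAMPLE)
    definition = entry.find(".//feat[@type='definition']")
    print("working language of the definition:", working_language(entry, definition))
    print("object language of the section:", object_language(entry.find("struct")))
```

The working language is inherited down the XML hierarchy, whereas the object language is an ordinary data category whose scope is limited to the level that carries it.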
Conclusion
– A general model for analysing and representing terminological data collections
– An underlying formalism expressed in XML and RDF
– Associated tools (SALT project)
» DCSEditor, DCSBrowser
» Automatic generation of XSLT filters and XML schemas from a given TML specification
Useful pointers
• SALT project
• The TMF site