MLIF: A Metamodel to Represent and Exchange Multilingual Textual Information ISO TC37 SC4 WG3 24616 Samuel Cruz-Lara, Gil Francopoulo, Laurent Romary,

Presentation transcript:

MLIF: A Metamodel to Represent and Exchange Multilingual Textual Information
ISO TC37 SC4 WG3 24616
Samuel Cruz-Lara, Gil Francopoulo, Laurent Romary, Nasredine Semmar
LREC 2010, 19 May 2010

Outline
- Scope of MLIF
- Purpose and Justification of MLIF
- Description of MLIF
- Use Cases
- Current Status

Scope of MLIF

Scope
- MLIF aims to provide a specification platform for representing multilingual data in a wide variety of applications, such as translation memories, localization, computer-aided translation, multimedia, and electronic document management.
- MLIF introduces a metamodel, combined with selected data categories, so that any specific domain can be described.
- MLIF provides a way to validate any instance of this metamodel, as well as interoperability principles for numerous translation and localization standards.

Purpose and Justification
- The evolution of information and communication technologies, and of natural language processing in particular, makes the question of standardization acute.
- The issues related to standardization are industrial, economic, and cultural in nature.
- Controlling interoperability between the existing industrial standards for localization (XLIFF), translation memory (TMX), … is a major objective for coherent, global management of multilingual data.
- MLIF could be associated with multimedia standards such as MPEG-4 [ISO/IEC ], MPEG-7 [ISO/IEC ], and W3C SMIL, in order to handle multilingual data within multimedia applications such as interactive TV, video conferencing, and subtitling.
- All these formats work well in the specific field they were designed for, but they lack the synergy that would make them interoperable when one type of information is used in a slightly different context.
- MLIF should be considered a unified conceptual representation of multilingual and multimedia content.

Description of MLIF
- As the Terminological Markup Framework (TMF) [ISO 16642] does in terminology, MLIF introduces a metamodel in combination with selected data categories [ISO ].
- These data categories will be derived as a subset of a Data Category Registry (DCR) in order to ensure interoperability between multilingual applications and corpora.
- A Data Category Specification (DCS) will define, in combination with the metamodel, the constraints that apply to a given domain-specific information structure or interchange format.
- MLIF describes elementary linguistic segments (e.g. sentence, syntactic component, word, …).

MLIF Metamodel
- MLDC (Multi Lingual Data Collection)
- GI (Global Information)
- HistoC (History Component)
- GroupC (Grouping Component)
- MultiC (Multilingual Component)
- MonoC (MonoLingual Component)
- SegC (Segmentation Component)

MLIF Metamodel
- Multi Lingual Data Collection (MLDC): represents a collection of data containing global information and several multilingual units.
- Global Information (GI): represents technical and administrative information applying to the entire data collection. Example: title of the data collection, revision history, …

MLIF Metamodel
- History Component (HistoC): a generic component that traces modifications to the component it is anchored to (i.e. versioning).
- Grouping Component (GroupC): represents a sub-collection of multilingual data having a common origin or purpose within a given project.

MLIF Metamodel
- Multi Lingual Component (MultiC): represents a unique multilingual entry.
- Mono Lingual Component (MonoC): the part of a multilingual component that contains information related to one language.
- Segmentation Component (SegC): a recursive component allowing any level of segmentation of textual information.
To provide a richer description of the linguistic content, the MLIF metamodel allows other metamodels to be anchored to it, such as MAF (morphological description), SynAF (syntactic annotation), TMF (terminological description), or any other metamodel based on ISO
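As a rough illustration of how the components described above nest, the following Python sketch models them as dataclasses. The class names follow the component abbreviations on the slides; the field names (`title`, `lang`, `kind`, `children`, and so on) and the `depth` and `languages` helpers are assumptions made for this example and are not taken from the MLIF specification. HistoC is left out for brevity.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SegC:
    """Segmentation Component: recursive, so any level of segmentation is possible."""
    text: str = ""
    kind: str = ""  # hypothetical label, e.g. "sentence" or "word"
    children: List["SegC"] = field(default_factory=list)

    def depth(self) -> int:
        # A segment without sub-segments has depth 1.
        return 1 + max((child.depth() for child in self.children), default=0)


@dataclass
class MonoC:
    """Monolingual Component: information related to one language."""
    lang: str
    segments: List[SegC] = field(default_factory=list)


@dataclass
class MultiC:
    """Multilingual Component: one multilingual entry."""
    monolingual: List[MonoC] = field(default_factory=list)

    def languages(self) -> List[str]:
        return [mono.lang for mono in self.monolingual]


@dataclass
class GroupC:
    """Grouping Component: entries sharing a common origin or purpose."""
    entries: List[MultiC] = field(default_factory=list)


@dataclass
class GI:
    """Global Information about the whole collection."""
    title: str = ""


@dataclass
class MLDC:
    """Multi Lingual Data Collection: global information plus grouped entries."""
    info: GI
    groups: List[GroupC] = field(default_factory=list)


fr_sentence = SegC(
    text="l’histoire du courage d’une femme pour démasquer un mystère",
    kind="sentence",
    children=[SegC("histoire", "word"), SegC("courage", "word")],
)
es_sentence = SegC(
    text="la historia de la valentía de una mujer para desenmascarar un misterio",
    kind="sentence",
)
entry = MultiC(monolingual=[MonoC("fr", [fr_sentence]), MonoC("es", [es_sentence])])
collection = MLDC(info=GI(title="Sample collection"), groups=[GroupC(entries=[entry])])

print(collection.groups[0].entries[0].languages())
print(fr_sentence.depth())
```

The recursion in `SegC` is the only structural point this sketch tries to capture: a sentence segment can hold word segments, which could in turn hold smaller units.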

MLIF Metamodel: Data Categories
Domain, Project, Source, sourceType, sourceLanguage, class, duration, begin, next, xml:id, xml:lang, xlink, …
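One way to picture how a selection of data categories constrains an instance is a simple lookup table checked against the attributes each component carries. The sketch below is only an assumption about how such a check might look: the `DCS` table, the assignment of categories to components, and the `check_categories` function are all hypothetical and do not reproduce the validation mechanism defined by MLIF.

```python
from typing import Dict, List, Set

# Hypothetical selection: which data categories each component may carry.
# The category names come from the slide; their assignment to components is assumed.
DCS: Dict[str, Set[str]] = {
    "MultiC": {"class", "xml:id"},
    "MonoC": {"xml:lang", "xml:id"},
    "SegC": {"class", "begin", "duration", "next", "xml:id"},
}


def check_categories(component: str, attributes: Dict[str, str]) -> List[str]:
    """Return the data categories used on `component` that the selection does not allow."""
    allowed = DCS.get(component, set())
    return sorted(name for name in attributes if name not in allowed)


print(check_categories("MonoC", {"xml:lang": "fr"}))
print(check_categories("MonoC", {"xml:lang": "fr", "duration": "5s"}))
```

An empty list means the component only uses categories from the selection; any returned names are the ones a stricter validator would reject.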

MLIF: a simple example
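The example shown on the original slide is not part of this transcript. As a stand-in, the following Python sketch builds a small XML tree whose element names simply reuse the metamodel component abbreviations. That naming, the `title` element, and the sample collection title are assumptions for illustration; the actual MLIF serialization may differ.

```python
import xml.etree.ElementTree as ET

# Clark-notation name of the standard xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_example() -> ET.Element:
    """Build one multilingual entry with a French and a Spanish segment."""
    mldc = ET.Element("MLDC")
    gi = ET.SubElement(mldc, "GI")
    ET.SubElement(gi, "title").text = "Sample collection"  # hypothetical content
    group = ET.SubElement(mldc, "GroupC")
    multi = ET.SubElement(group, "MultiC")
    sentences = [
        ("fr", "l’histoire du courage d’une femme pour démasquer un mystère"),
        ("es", "la historia de la valentía de una mujer para desenmascarar un misterio"),
    ]
    for lang, text in sentences:
        mono = ET.SubElement(multi, "MonoC", {XML_LANG: lang})
        ET.SubElement(mono, "SegC").text = text
    return mldc


root = build_example()
serialized = ET.tostring(root, encoding="unicode")
print(serialized)
```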

Use Cases
- Interoperability
- Linguistic Properties
- Related Standards
- Multimedia
- Interactive TV

Interoperability: TMX (“the sentence contains different formatting information”)

Interoperability: TMX file, MLIF file
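The TMX and MLIF files shown side by side on the slide are not reproduced here. A minimal sketch of the kind of mapping involved, assuming that a TMX translation unit (`tu`) corresponds to a MultiC, a variant (`tuv`) to a MonoC, and a `seg` to a SegC, could look as follows. The output element names, the `tmx_to_mlif` function, and the sample header values are hypothetical, and inline formatting elements inside `seg` get no special treatment.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Small TMX sample; the header attribute values are placeholders.
SAMPLE_TMX = """
<tmx version="1.4">
  <header creationtool="demo" creationtoolversion="1" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="fr" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="fr"><seg>l’histoire du courage d’une femme pour démasquer un mystère</seg></tuv>
      <tuv xml:lang="es"><seg>la historia de la valentía de una mujer para desenmascarar un misterio</seg></tuv>
    </tu>
  </body>
</tmx>
"""


def tmx_to_mlif(tmx_xml: str) -> ET.Element:
    """Map tu -> MultiC, tuv -> MonoC, seg -> SegC (assumed correspondence)."""
    tmx = ET.fromstring(tmx_xml)
    mldc = ET.Element("MLDC")
    group = ET.SubElement(mldc, "GroupC")
    for tu in tmx.iterfind("./body/tu"):
        multi = ET.SubElement(group, "MultiC")
        for tuv in tu.iterfind("tuv"):
            mono = ET.SubElement(multi, "MonoC", {XML_LANG: tuv.get(XML_LANG, "")})
            seg = tuv.find("seg")
            seg_c = ET.SubElement(mono, "SegC")
            # Concatenates all text inside <seg>, including text of any inline elements.
            seg_c.text = "".join(seg.itertext()) if seg is not None else ""
    return mldc


mlif = tmx_to_mlif(SAMPLE_TMX)
print(ET.tostring(mlif, encoding="unicode"))
```

A real converter would also need to carry over header metadata and decide how TMX inline formatting is represented, which this sketch leaves open.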

Linguistic Properties
- Sentences, words, …
- Time-related issues

Linguistic Properties: TMX file produced by TRADOS; MLIF file produced by CEA LIST Sentence Aligner

Related Standards
- TEI (Text Encoding Initiative): all the XML elements are described in RelaxNG [ISO ] with the help of ODD.
- W3C ITS (International Tag Set): ITS is a set of rules, expressed in elements, that provide information on how parts of a given DTD or XML Schema relate to specific internationalization and localization properties.
- W3C SMIL: SMILtext.
- MLIF may be used to include pre-existing non-MLIF data, such as the data produced by NLP tools.

Multimedia

Interactive TV: Timed, Multilingual Textual Descriptions
W3C SMIL Standardization
- Development of Interactive TV Profile
- Integration of Annotation Support
- Definition of Temporal Text Processing
ISO MLIF Standardization
- Development of MLIF format
- Development of a multilingual processing pipeline
- Interaction with SMIL and MPEG standards
Multilingual DB: multilingual component
- Monolingual component, linguistic segment: l’histoire du courage d’une femme pour démasquer un mystère
- Monolingual component, linguistic segment: la historia de la valentía de una mujer para desenmascarar un misterio
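To make the idea of timed, multilingual text concrete, the sketch below attaches a start time and a duration to each monolingual segment and looks up the text to display for a given language at a given moment. The `TimedSegment` class, the `active_text` function, and the timing values are hypothetical; only the two sentences come from the slide.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TimedSegment:
    lang: str
    text: str
    begin: float     # seconds from the start of the programme (made-up value)
    duration: float  # seconds the text stays on screen (made-up value)


SEGMENTS: List[TimedSegment] = [
    TimedSegment("fr", "l’histoire du courage d’une femme pour démasquer un mystère", 0.0, 5.0),
    TimedSegment("es", "la historia de la valentía de una mujer para desenmascarar un misterio", 0.0, 5.0),
]


def active_text(segments: List[TimedSegment], lang: str, t: float) -> Optional[str]:
    """Return the text in `lang` that is active at time `t`, or None if there is none."""
    for seg in segments:
        if seg.lang == lang and seg.begin <= t < seg.begin + seg.duration:
            return seg.text
    return None


print(active_text(SEGMENTS, "es", 2.0))
print(active_text(SEGMENTS, "fr", 10.0))
```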

Current Status
- AWI (August 2006)
- CD (May 2009)
- DIS (February 2010)
- Ongoing ballot process

Thank you for your attention
Any questions?
Mailing list
Web site