Formats, interoperability and standards Marc Kemps-Snijders.

Format interoperability
- Interoperability is only relevant if:
  - resources are to be exchanged
  - resources are to be combined in collections
  - tools and services need to operate on resources
  - results are to be compared
- Standardization attempts to solve these cross-resource and cross-technology issues by:
  - looking at existing practices
  - providing abstractions
  - addressing sustainability aspects
  - seeking international consensus
  - providing solid grounding through well-accepted standards bodies
- Increasingly, the linguistic community presents itself not only from a research perspective but also from a service provider perspective.

Standardization
- Basic standards
  - Unicode – ISO
    - Widely supported, some glyphs are still missing
  - Country codes – ISO 3166
    - Widely supported
  - Language codes – ISO 639-1/2/3
    - Many languages not covered, politically sensitive
  - XML
    - Widely supported, but lacks generic linguistic resource models and semantic grounding
  - Feature Structures Part 1 – ISO :2006
    - Reference XML vocabulary for FS representation
  - TEI
    - CLARIN should identify the extent to which competing formats are being used (DocBook, NLM DTD, …)

Standardization
- Ongoing standardization projects
  - Morpho-syntactic Annotation Framework (MAF) – ISO/DIS
    - Token-word form, does not specify tag sets
  - Syntactic Annotation Framework (SynAF) – ISO/CD
    - Still in draft and not yet usable
  - Lexical Markup Framework (LMF) – ISO 24613:2008
    - Flexible lexicon framework, further concrete testing needed
  - Data Category Registry (DCR) – ISO 12620:2009 (forthcoming)
    - Restricted model, no relations, limited constraints specification
  - TEI/ODD
    - Combines documentation and schema
  - Persistent Identification – ISO/CD
  - Linguistic Annotation Framework (LAF) – ISO/DIS
    - Annotated resources as graphs, very abstract level

Pivot formats
- Without a pivot, a transformer is needed for each combination of processes.
- Use of accepted pivot model(s) reduces the number of transformers needed.
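The saving from a pivot model can be made concrete with a small count. This is only a sketch under stated assumptions that are not on the slide: transformers are taken to be directional (A to B and B to A are separate), and the pivot is a separate model rather than one of the formats being converted. The function names are hypothetical.

```python
def direct_transformers(n_formats: int) -> int:
    """Transformers needed without a pivot: one per ordered pair of formats."""
    return n_formats * (n_formats - 1)


def pivot_transformers(n_formats: int) -> int:
    """Transformers needed with one pivot: one importer and one exporter per format."""
    return 2 * n_formats


# Compare both strategies for a few collection sizes.
for n in (3, 8, 20):
    print(f"{n} formats: direct={direct_transformers(n)}, pivot={pivot_transformers(n)}")
```

Under these assumptions the direct approach grows quadratically while the pivot approach grows linearly, so the pivot pays off once more than three formats are involved.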

Community practices
- Formats
  - CHAT
  - Shoebox/Toolbox
  - EAF
  - EXMARaLDA
  - XCES
  - PAULA
  - TIGER
  - Pentree
  - …
- Tag sets
  - GOLD
  - TDS
  - STTS
  - EUROTYP
  - …
- CLARIN will need to state how it will deal with these formats (inclusion versus curation).

Thank you for your attention

ISO process
- CD = Committee Draft
- DIS = Draft International Standard
- DPAS = Draft Publicly Available Specification
- DTR = Draft Technical Report
- DTS = Draft Technical Specification
- FDIS = Final Draft International Standard
- IS = International Standard
- NP = New Work Item Proposal
- PAS = Publicly Available Specification
- TR = Technical Report
- TS = Technical Specification
- WD = Working Draft