ISOcat introduction. CLARIN-NL ISOcat workshop, 20 June 2013. www.isocat.org

Presentation transcript:

ISOcat introduction
20 June 2013, CLARIN-NL ISOcat workshop

ISOcat: a Data Category Registry
- An implementation of ISO 12620:2009, "Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources"
- Successor to ISO 12620:1999, which contained a hardcoded list of Data Categories
- A data category
  - is the result of the specification of a given data field
  - is an elementary descriptor in a linguistic structure or an annotation scheme

What is a Data Category?
- The result of the specification of a given data field
  - A data category is an elementary descriptor in a linguistic structure or an annotation scheme.
- The specification consists of 3 main parts:
  - Administrative part: administration and identification
  - Descriptive part: documentation in various working languages
  - Linguistic part: conceptual domain(s), for various object languages

Data Category example
Data category: /grammatical gender/
- Administrative part:
  - Identifier: grammaticalGender
  - PID:
- Descriptive part:
  - English definition: Category based on (depending on languages) the natural distinction between sex and formal criteria.
  - French definition: Catégorie fondée (selon la langue) sur la distinction naturelle entre les sexes ou d'autres critères formels.
- Linguistic part:
  - Morphosyntax conceptual domain: /masculine/, /feminine/, /neuter/
  - French conceptual domain: /masculine/, /feminine/
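
The three-part specification above can be pictured as a small record. The following is only a sketch: the class name `DataCategorySpec`, its field names, and the language and profile keys are assumptions made for illustration, not ISOcat's actual data model or export format. The PID is left empty because it did not survive in this transcript.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DataCategorySpec:
    """Hypothetical record mirroring the three parts of a specification."""

    # Administrative part: administration and identification.
    identifier: str
    pid: Optional[str] = None  # not filled in: the PID is missing from the transcript

    # Descriptive part: definitions keyed by working language (keys are assumed).
    definitions: Dict[str, str] = field(default_factory=dict)

    # Linguistic part: conceptual domains keyed by profile or object language.
    conceptual_domains: Dict[str, List[str]] = field(default_factory=dict)


grammatical_gender = DataCategorySpec(
    identifier="grammaticalGender",
    definitions={
        "en": "Category based on (depending on languages) the natural "
              "distinction between sex and formal criteria.",
        "fr": "Catégorie fondée (selon la langue) sur la distinction naturelle "
              "entre les sexes ou d'autres critères formels.",
    },
    conceptual_domains={
        "morphosyntax": ["masculine", "feminine", "neuter"],
        "French": ["masculine", "feminine"],
    },
)

# The French conceptual domain is narrower than the general morphosyntax one.
print(grammatical_gender.conceptual_domains["French"])
```

Keeping the conceptual domains in a mapping makes the slide's point visible: the same data category can carry a different value domain per object language.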

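Data Category types
[Figure: simple and complex Data Categories]
- complex, open: writtenForm (data type: string)
- complex, closed: grammaticalGender (data type: string; value domain: neuter, masculine, feminine)
- complex, constrained: (data type: string)
- simple: neuter, masculine, feminine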

Data Category types (continued)
[Figure: container Data Categories. Labels in the figure: language, alphabet, writtenForm, japanese, ipa, lexicon, entry, lemma]

Which type to use?
Which type is appropriate depends on the place of the data category in the structure of your resource:
1. Can it have a value? Complex Data Category with a data type
   - Any of the values of the data type? Open Data Category
   - Can you enumerate the values? Closed Data Category
     Fill its value domain with simple Data Categories
   - Is there a rule to constrain the values? Constrained Data Category
     Express the rule/constraint in one of the rule languages
2. Is it a value? Simple Data Category
3. Does it group other (container or complex) Data Categories? Container Data Category
If a Data Category both has a value and groups Data Categories: Complex Data Category
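
The questions above amount to a short decision procedure. A minimal sketch follows, assuming the answers are supplied as booleans; the function name `choose_dc_type` and its parameters are hypothetical and not part of ISOcat.

```python
def choose_dc_type(has_value, is_value=False, groups_others=False,
                   enumerable=False, rule_constrained=False):
    """Return the Data Category type suggested by the slide's questions."""
    if has_value:
        # A DC that has a value is complex, even if it also groups other DCs.
        if enumerable:
            return "complex (closed)"       # value domain of simple DCs
        if rule_constrained:
            return "complex (constrained)"  # rule in one of the rule languages
        return "complex (open)"             # any value of the data type
    if is_value:
        return "simple"
    if groups_others:
        return "container"
    raise ValueError("no type fits: the DC has no value, is no value, and groups nothing")


print(choose_dc_type(has_value=True, enumerable=True))     # grammaticalGender-like
print(choose_dc_type(has_value=False, is_value=True))      # masculine-like
print(choose_dc_type(has_value=False, groups_others=True)) # lexicon-like
```

Testing `has_value` first encodes the slide's last rule: a Data Category that both has a value and groups others is treated as complex.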

Some examples

[Figure: feature structure with category = noun phrase, and agreement containing person = third and number = singular]
- /category/: a closed DC
- /noun phrase/: a simple DC
- /agreement/: a container DC
- /number/: a closed DC
- /singular/: a simple DC
- /person/: a closed DC
- /third/: a simple DC
(Encoded as TEI P5 FSR; the XML elements and attributes are seen as syntactic sugar)

[Figure: parse tree with nodes S, NP, VP, V, NP, Det, N and leaves Text="John", Text="hit", Text="the", Text="ball"]
- /S/: a container DC
- /NP/: an open DC
- /VP/: a container DC
- /V/: an open DC
- /NP/: a container DC
- /Det/: an open DC
- /N/: an open DC
(Text= is seen as syntactic sugar)

[Figure: the CGN tag N(soort,mv,basis), with PoS = N, NTYPE = soort, GETAL = mv, GRAAD = basis]
- /CGN tag/: a constrained DC
(The constraint is specified as an EBNF, which refers to the following DCs)
- /PoS/: a closed DC
- /N/: a simple DC
- /NTYPE/: a closed DC
- /soort/: a simple DC
- /GETAL/: a closed DC
- /mv/: a simple DC
- /GRAAD/: a closed DC
- /basis/: a simple DC
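
To make the constrained /CGN tag/ example concrete, here is a rough sketch that splits the tag from the slide into the closed DCs it refers to. The actual EBNF is not shown in the slides, so the regular expression, the feature order in `N_FEATURES`, and the function `parse_cgn_tag` are all assumptions that cover only the shape of the slide's example.

```python
import re

# Assumed feature order for the N tag, read off the slide's example only.
N_FEATURES = ("NTYPE", "GETAL", "GRAAD")

# Assumed tag shape: an uppercase PoS followed by comma-separated values in parentheses.
TAG_PATTERN = re.compile(r"^(?P<pos>[A-Z]+)\((?P<values>[^()]*)\)$")


def parse_cgn_tag(tag):
    """Map a tag such as N(soort,mv,basis) onto the DCs named on the slide."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"not a tag of the assumed shape: {tag!r}")
    pos = match.group("pos")
    if pos != "N":
        raise ValueError("this sketch only covers the N tag shown on the slide")
    values = [value.strip() for value in match.group("values").split(",")]
    if len(values) != len(N_FEATURES):
        raise ValueError(f"expected {len(N_FEATURES)} values, got {len(values)}")
    features = {"PoS": pos}
    features.update(zip(N_FEATURES, values))
    return features


print(parse_cgn_tag("N(soort,mv,basis)"))
```

Each key in the result names a closed DC (/PoS/, /NTYPE/, /GETAL/, /GRAAD/) and each value a simple DC from its value domain, which is what the constraint on /CGN tag/ ties together.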

Data Category relationships
- Value domain membership
- Subsumption relationships between simple data categories (legacy)
- Relationships between complex/container data categories are not stored in the DCR
[Figure: partOfSpeech (string), pronoun, personal pronoun]

No ontological relationships?
Rationale:
- Relation types and modeling strategies for a given data category may differ from application to application;
- Motivation to agree on relation and modeling strategies will be stronger at individual application level;
- Integration of multiple relation structures in the DCR itself could lead to endless ontological clutter.
Solution under development: RELcat, a Relation Registry

How can you use Data Categories?
[Figure: a (schema for a) lexicon (Lexicon, Lexical Entry, Form, Sense, Word Form, Lemma; cardinalities 0..* and 1..*) and a (schema for a) typological database (Language, BWO, genders) refer to data categories: partOfSpeech, writtenForm, grammaticalGender, wordOrder, lexicalType, lemma, wordForm, lexicalEntry, lexicon]
Shared semantics! Explicit semantics!
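
The idea of shared, explicit semantics can be illustrated with two schemas whose elements point at data categories. This is a sketch only: which element refers to which data category is assumed here (the flattened figure does not preserve the arrows), and the lexicon field `gender` is a hypothetical addition.

```python
from collections import defaultdict

# Assumed element-to-data-category references for a lexicon schema.
lexicon_schema = {
    "Lexicon": "lexicon",
    "Lexical Entry": "lexicalEntry",
    "Lemma": "lemma",
    "Word Form": "wordForm",
    "gender": "grammaticalGender",  # hypothetical field, added for illustration
}

# Assumed references for a typological database schema.
typology_schema = {
    "genders": "grammaticalGender",
    "BWO": "wordOrder",
}


def shared_categories(schema_a, schema_b):
    """Return the data categories that both schemas refer to, with the referring elements."""
    by_category = defaultdict(lambda: ([], []))
    for element, category in schema_a.items():
        by_category[category][0].append(element)
    for element, category in schema_b.items():
        by_category[category][1].append(element)
    return {cat: pair for cat, pair in by_category.items() if pair[0] and pair[1]}


print(shared_categories(lexicon_schema, typology_schema))
```

Although the two resources name their fields differently (`gender` and `genders`), the common reference shows that they mean the same thing.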

What is a Data Category Registry?
- A (coherent) set of Data Categories, in our case for linguistic resources
- A system to manage this set:
  - Create and edit Data Categories
  - Share Data Categories, e.g., resolve PID references
  - Standardize Data Categories
- Grass roots approach

Standardization
[Figure: standardization workflow. Labels in the figure: Submission group, Data Category Registry Board, Validation, Thematic Domain Group, Evaluation, Stewardship group, Decision Group, rejected, Publication]

How can you use a Data Category Registry?
You can:
- Find Data Categories relevant for your resources and embed references to them, so the semantics of (parts of) your resources are made explicit
  - This can be supported by the tools you use, e.g., ELAN, LEXUS and the CMDI Component Editor interact directly with ISOcat
- Interact with Data Category owners to improve (the coverage of) their Data Categories
- Create (together with others) new Data Categories and/or selections needed for your resources and share those
- (Submit (your) Data Categories for standardization)
  - De facto standardization by a community, e.g., CLARIN-NL/VL
- Free of charge
- Grass roots approach
CLARIN-NL: interaction via Ineke

ISOcat and CLARIN(-NL/VL): general remarks

Importance of ISOcat
- Collaboration
  - Human, machine, language x, language y
- Essential in CLARIN, but …
- Impossible when we don’t know (exactly) what we are talking about!
  - Transitive verb – transitief werkwoord
  - Transitief werkwoord – overgankelijk werkwoord

Importance of ISOcat
ISOcat:
- Provides us with a framework to make such things clear (is X the same as Y, does A use it the same way)
- At least, that is the intention, ISOcat still being ‘under construction’
Today’s sessions:
- How to work with ISOcat
- Which other “cats” do we have at the moment
- The future …

CLARIN-NL (and VL) and ISOcat
There are some 60 projects dealing with ISOcat in some sense (sometimes ‘only’ metadata (CMDI)):
- 55 Netherlands
- 5 Flanders
- 1 NL/VL pilot
Of course, that is not the main focus of these projects, but still…
A lot of ISOcat work needs to be done!

CLARIN-NL (and VL) and ISOcat
- For TTNWW (the pilot) at least, one of the explicit goals is to signal problems and to try to remedy them (for our own good, and that of CLARIN as a whole)
- In that respect, we do have some ‘success’
  - Several larger and smaller issues are already being remedied

CLARIN-NL (and VL) and ISOcat
- Many (Dutch) projects are working on ISOcat issues, plus those of other national CLARINs
- Same concepts? Same problems? Very likely

Collaboration necessary
- National (Dutch) level
  - Coordinated effort
  - Shared workspace under ‘shared’ (VIEW): USE IT
  - Plus discussion platform
  - Report problems to me (Ineke)
- International level
  - We will try to collaborate with them as well

Collaboration (1)

Collaboration (2): VIEW, FORUM

View
Searches are done in ‘our own’ part of ISOcat
- Try to reuse what is already contained in it
- If necessary, go to the full ISOcat to reuse something available there (‘house’ icon)
- Last resort: make a new DC
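
The reuse-first search order above can be written as a simple fallback chain. This sketch stands in plain dictionaries for the shared view and the full registry; the function `find_data_category` and the sample entries are hypothetical and do not call any ISOcat interface.

```python
def find_data_category(name, shared_view, full_registry):
    """Follow the slide's order: own view first, then full ISOcat, then a new DC."""
    if name in shared_view:
        return ("reuse from the shared view", shared_view[name])
    if name in full_registry:
        return ("reuse from the full ISOcat", full_registry[name])
    return ("last resort: make a new DC", None)


# Illustrative stand-ins for registry content (not real ISOcat records).
shared_view = {"lemma": "record for /lemma/ in the shared view"}
full_registry = {
    "lemma": "record for /lemma/ in the full registry",
    "wordOrder": "record for /wordOrder/ in the full registry",
}

for name in ("lemma", "wordOrder", "myNewCategory"):
    print(name, "->", find_data_category(name, shared_view, full_registry)[0])
```

Checking the shared view before the full registry reflects the slide's advice to prefer what the community already uses, and to create a new DC only when both lookups fail.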

FORUM
- All kinds of information for CLARIN NL/VL
- Regular updates!

Thanks!