1 ISOCAT Proposed solutions for Problems encountered in DUELME-LMF Jan Odijk Nijmegen 21 Sep 2010.

Presentation transcript:

1 ISOCAT Proposed solutions for Problems encountered in DUELME-LMF Jan Odijk Nijmegen 21 Sep 2010

2 Overview
General
Standardized DCs?
Multiple relevant DCs in ISOCAT
Overlap with other projects
Container Data Categories
Almost Identical DCs
Language Sections
Existing Tagsets

3 General
Always try to map to an existing ISOCAT DC
– Where possible
– Irrespective of whether the ISOCAT DC is part of an official standard
If not possible, or if there is uncertainty
– Create a new DC, but
– Also specify the relation with existing closely related ISOCAT DCs. Provide:
  » Type of the relation (drop-down list to be provided by RELCAT developers), e.g. equals, almost-equals, is hyponym of, is hyperonym of, etc.
  » Textual clarification of the deviation

4 General
Relation to be entered into the Relation Registry (RR) as soon as it is available
Temporarily proposed notation:
– Recordset in CSV format with records consisting of 4 fields:
  » Relation type (from drop-down list; should be ISOCAT DCs themselves)
  » Data category 1 (ISOCAT PID)
  » Data category 2 (ISOCAT PID)
  » Clarification (rich text)
  » Plus some administrative info: user id, creation date, etc.
– To import into the RR as soon as available
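
The temporary CSV notation on this slide could be sketched as follows. This is only an illustration using Python's standard `csv` module: the column names, the helper functions, the user id, and the PIDs are hypothetical placeholders, and the relation-type list contains only the examples named on the previous slide (the real list is to come from the RELCAT developers).

```python
import csv
import io
from datetime import date

# Hypothetical column names; the slides only fix four content fields
# plus "some administrative info" such as user id and creation date.
FIELDS = ["relation_type", "dc1_pid", "dc2_pid", "clarification",
          "user_id", "created"]

# Example relation types named on the slides. In the real registry these
# would themselves be ISOCAT DCs, chosen from a drop-down list.
RELATION_TYPES = {"equals", "almost-equals", "is hyponym of", "is hyperonym of"}


def make_record(relation_type, dc1_pid, dc2_pid, clarification, user_id,
                created=None):
    """Build one relation record, rejecting unknown relation types."""
    if relation_type not in RELATION_TYPES:
        raise ValueError(f"unknown relation type: {relation_type!r}")
    return {
        "relation_type": relation_type,
        "dc1_pid": dc1_pid,
        "dc2_pid": dc2_pid,
        "clarification": clarification,
        "user_id": user_id,
        "created": (created or date.today()).isoformat(),
    }


def write_recordset(records):
    """Serialize records to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def read_recordset(text):
    """Parse CSV text back into a list of dicts (e.g. for a later RR import)."""
    return list(csv.DictReader(io.StringIO(text)))


# The PIDs below are made-up placeholders, not real ISOCAT identifiers.
example = make_record(
    "almost-equals",
    "http://example.org/dc/my-noun",
    "http://example.org/dc/existing-noun",
    "Differs slightly, e.g. depreciative lexemes are excluded.",
    user_id="jdoe",
    created=date(2010, 9, 21),
)
csv_text = write_recordset([example])
```

Because the clarification field is free text, it may contain commas or quotes; the `csv` module quotes such values on writing, so the recordset survives a round trip through `read_recordset`.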

5 Standardized DCs?
Ignore the +/- standard status of a DC in ISOCAT
If needed, use relations in the Relation Registry

6 Multiple ISOCAT DCs
Map to an existing DC that is identical (wherever possible)
Use relations to relate it to almost identical DCs in ISOCAT

7 Overlap with other projects
Consult with other projects
Registry of topics people/projects are working on
– Dieter took some initiative
– http://spreadsheets.google.com/ccc?key=0Al5Lw-npZ6ZTdDZlT2VjeGhwZm5iRW5IM3BTZFI5WEE&hl=en&authkey=CL_Wl4ID
This workshop (and others if needed)

8 Container data categories
ISOCAT might be extended for this
Probably not really a problem in the short term(?)

9 Almost identical DCs
For ill-defined DCs in ISOCAT
– Suggest better definitions and submit them to the Thematic Domain Group
– Use relations to relate your DC to existing slightly different DCs (see later)

10 Almost identical DCs
Example: Noun
Noun is a Part of Speech assigned to words which share specific morphosyntactic (inflectional), morphological, syntactic (and semantic) properties
– Morphosyntactic (inflectional) properties: person, number, gender/class, declension class, case, …
– Specific morphological combinatorial potential (derivation, compounding), in particular diminutives, augmentatives
– Specific syntactic combinatorial potential
Where each language selects a specific subset of these properties (as illustrated in the language sections).

11 Language Sections?
The highly (Polish) language-specific
– (noun) Noun [subst] contains lexemes inflecting for number and case, with a lexically determined grammatical gender, which do not have the category of person, e.g., woda `water', profesor `professor', pięciokrotność `fivefoldness'; this class also contains defective plurale tantum and singulare tantum lexemes, but not depreciative lexemes. Grammatical categories of noun [subst]: number ( case ( gender (
Can now be part of the Polish language section of the DC Noun with the definition given in the previous slide

12 Existing Tagsets
Make sure all DCs of an existing de facto standard tag set are in ISOCAT
– Either existing DCs
– Or newly added DCs
Assign all DCs from such a tag set to a new closed complex category
– E.g. DC d-coiTagset, ipipanTagset, etc.
– (and/or to a data category set?)
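
As a rough sketch of the tag-set proposal on this slide: first check that every tag has a DC (existing or newly added), then collect those DCs as the value domain of one closed complex category. The class, the helper names, the tag list, and the DC identifiers below are all hypothetical; only the names `subst` and `ipipanTagset` come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClosedDC:
    """A closed complex data category: a name plus a fixed value domain."""
    name: str
    values: frozenset  # identifiers of the DCs in the value domain


def missing_tags(tag_to_dc, tags):
    """Return the tags that are not yet mapped to any DC, sorted."""
    return sorted(t for t in tags if t not in tag_to_dc)


def build_tagset_dc(name, tag_to_dc, tags):
    """Build the closed DC once every tag has an existing or new DC."""
    missing = missing_tags(tag_to_dc, tags)
    if missing:
        raise ValueError(f"tags without a DC: {missing}")
    return ClosedDC(name, frozenset(tag_to_dc[t] for t in tags))


# Illustrative tags and placeholder DC identifiers (not real ISOCAT PIDs).
tags = ["subst", "adj", "fin"]
tag_to_dc = {"subst": "DC-0001", "adj": "DC-0002"}

print(missing_tags(tag_to_dc, tags))   # ['fin']
tag_to_dc["fin"] = "DC-NEW-0001"       # a newly added DC closes the gap
ipipan = build_tagset_dc("ipipanTagset", tag_to_dc, tags)
```

Making the value domain a `frozenset` mirrors the "closed" nature of the category: once the tag set is registered, its members are fixed.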

13 More…
Problems and proposed solutions
– Odijk, J. (2009), "Data Categories and ISOCAT: some remarks from a simple linguist", presentation held at the FLaReNet/CLARIN Standards Workshop, Helsinki, September 27, 2009
– Odijk, J. (2010), "Relations between Data Categories", presentation held at the CLARIN Relation Registry Workshop, MPI, Nijmegen, January 8, 2010
Both to be found (inter alia) on

14 CLARIN-NL Thanks for your attention!