Content of the Data Category Registry. CLARIN-NL ISOcat workshop, 10 May 2011. www.isocat.org

Presentation transcript:

Content of the Data Category Registry
10 May 2011, CLARIN-NL ISOcat workshop

Data Category Registry (DCR)
A set of Data Categories (DCs) describing, for example, linguistic annotation schemes and metadata, at several levels/thematic domains:
– Morphosyntax
– Syntax
– Terminology
– …
ISOcat as a whole is a DCR.

Thematic Domains (TDs) and Groups
TDs contain Data Category Selections (DCSs)
– Data Categories within a specific domain (like everything for terminology)
Apart from TDs, which are reflected by the ‘Thematic Views’ in the browser pane, there are ‘groups’
– A subset of DCs, like all DCs for a specific project
The latter strategy will be followed in CLARIN-NL, making the definitions used in existing corpora etc. available first as “private/SHARED”, later as “public”.

Groups
Group: morphosyntax
– Entries for:
  CGN (Corpus of Spoken Dutch)
  SoNaR (reference corpus for Dutch)
  Finnish project XYZ
  Austrian …
– ALL concepts used are to be included, including those needed in the definitions themselves!
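The rule that a group must also contain the concepts its own definitions rely on can be checked mechanically. The sketch below is only an illustration under stated assumptions: the dictionary of references between DCs is invented for the example, and `missing_from_group` is a hypothetical helper, not part of ISOcat.

```python
# Hypothetical in-memory model, not the ISOcat API.
# Maps each DC identifier to the DCs that its definition mentions.
# The references themselves are made up for illustration.
definition_refs = {
    "participleAdjective": {"adjective", "participle"},
    "adjective": {"partOfSpeech"},
    "participle": {"verb"},
    "verb": {"partOfSpeech"},
    "partOfSpeech": set(),
}


def missing_from_group(group, refs):
    """Return DCs that definitions in the group rely on but the group lacks."""
    needed = set()
    stack = list(group)
    while stack:
        dc = stack.pop()
        for ref in refs.get(dc, set()):
            if ref not in needed:
                needed.add(ref)
                stack.append(ref)  # follow references transitively
    return needed - set(group)


group = {"participleAdjective", "adjective"}
print(sorted(missing_from_group(group, definition_refs)))
```

An empty result would mean the group is closed: every concept used in a definition is itself present in the group.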

Groups

ISOcat panes

CLARIN-NL/VL project level
Make DCs for all concepts, for example for ‘participle adjective’
Note:
– in running text: /participle adjective/
– in identifier: participleAdjective (make use of camelCase!)
– in Data Element Name: participle adjective
For CLARIN-NL (and VL)
– Make use of the Shared CLARIN-NL group!

Language to be used in DC identifier, etc.
– The standard is English, ALSO when your tagset is defined in another language!
Your tagset “eigennaam” (Dutch) => /proper noun/, properNoun, proper noun
But: names in other languages can be added elsewhere.
Data Element Name is the literal as used in an application/domain
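The naming conventions above (slashes in running text, camelCase in the identifier, English throughout) are regular enough to sketch in a few lines. The function names and the small Dutch-to-English lookup table below are assumptions made for illustration; only the "eigennaam" and "participle adjective" examples come from the slides.

```python
# Hypothetical lookup from a project's own tag to the English concept name.
ENGLISH_NAMES = {"eigennaam": "proper noun"}


def to_identifier(name: str) -> str:
    """Turn an English name into a camelCase identifier."""
    words = name.split()
    if not words:
        return ""
    rest = "".join(w[:1].upper() + w[1:] for w in words[1:])
    return words[0].lower() + rest


def to_running_text(name: str) -> str:
    """Notation used when a DC is mentioned in running text."""
    return f"/{name}/"


def forms_for_tag(tag: str) -> dict:
    """Derive the running-text and identifier forms for a tag."""
    name = ENGLISH_NAMES.get(tag, tag)  # fall back to the tag itself
    return {
        "running_text": to_running_text(name),
        "identifier": to_identifier(name),
        "name": name,
    }


print(forms_for_tag("eigennaam"))
print(to_identifier("participle adjective"))
```

The translation step is deliberately a plain table: choosing the English name for a tag is an editorial decision, not something to compute.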

Definition to be used
The ideal case:
– There is already a good, standardized definition available which you can reuse
– Note: this definition should express what is expressed in your application as well!
Example: “Een galopperend paard” (a galloping horse)
– participle adjective (key 1595), i.e. an adjective
– WW(od,prenom,zonder), i.e. a verb

Definition to be used (2)
No perfect/good definition available?
– In case this should/could be fixed:
  we negotiate with the owner and try to have it adapted
  CLARIN-NL/VL: contact the ISOcat coordinator (Ineke)
– In case of a more fundamental mismatch (like verb vs adjective), come up with a definition yourself

New definition
New definition? Guidelines:
English
One (1) sentence fragment
– “Element which is used …”
– “STTS tag for attributive adjective”
– “A noun or adjective denoting…”
Do not mention the concept to be defined
– zuInclusion =/= inclusion of zu
– adjective =/= Adjective
[cont.]

New definition (2)
Guidelines (cont.):
Provide a generic definition whenever possible
– Especially when a definition was still lacking, not ‘wrong’
Mention relevant characteristics
Start with a reference to the superordinate concept
– “Pronoun referring to person” for personal pronoun
Be as “language neutral” as possible
– No exotic languages in definition, unless …
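Two of these guidelines, "one sentence fragment" and "do not mention the concept to be defined", lend themselves to a rough automated check. What follows is a heuristic sketch only: `lint_definition` and `split_identifier` are hypothetical names, the sentence test will misfire on abbreviations such as "e.g.", and the remaining guidelines (generic, language neutral, superordinate concept first) still need human judgement.

```python
import re


def split_identifier(identifier: str) -> list:
    """Split a camelCase identifier: 'zuInclusion' -> ['zu', 'inclusion']."""
    return [w.lower() for w in re.findall(r"[A-Za-z][a-z]*", identifier)]


def lint_definition(identifier: str, definition: str) -> list:
    """Return a list of guideline violations found by two simple heuristics."""
    problems = []
    text = definition.strip()
    # A sentence-ending mark followed by more text suggests several sentences.
    if re.search(r"[.!?]\s+\S", text):
        problems.append("use one sentence fragment")
    # If every word of the identifier occurs in the definition,
    # the definition probably just restates the concept.
    words = set(re.findall(r"[a-z]+", text.lower()))
    if all(w in words for w in split_identifier(identifier)):
        problems.append("definition mentions the concept being defined")
    return problems


print(lint_definition("zuInclusion", "inclusion of zu"))
print(lint_definition("adjective", "Adjective"))
print(lint_definition("personalPronoun", "Pronoun referring to person"))
```

The third call passes because "person" and "personal" are different words; a stricter check based on word stems would flag it, which is a design choice rather than something the guidelines settle.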

Other aspects of new entries
Come up with a (mnemonic) identifier
– Not necessarily unique (unlike the key)
Justify the new entry: why is it important?
– (not: why is the existing entry bad!)
Origin: BNC, Chomsky, ISO, …
Provide the set of valid values (if applicable) in the Conceptual Domain
– Gender: masculine, feminine, neutral, other
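The parts of a new entry listed above can be pictured as a small record. This is a simplified sketch, not the ISOcat data model: the class, its field names, the key values and the justification text are all placeholders, and treating an empty conceptual domain as "any value allowed" is an assumption of the example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class DataCategory:
    key: int                       # must be unique in the registry
    identifier: str                # mnemonic, need not be unique
    justification: str = ""        # why the entry is important
    origin: str = ""               # where the concept comes from
    conceptual_domain: tuple = ()  # valid values, if applicable

    def accepts(self, value: str) -> bool:
        """An empty domain is treated as open (assumption of this sketch)."""
        return not self.conceptual_domain or value in self.conceptual_domain


def duplicated_keys(dcs) -> list:
    """Keys used by more than one entry; identifiers are not checked."""
    counts = Counter(dc.key for dc in dcs)
    return sorted(k for k, n in counts.items() if n > 1)


# Placeholder key and texts; only the value list comes from the slide.
gender = DataCategory(
    key=9001,
    identifier="gender",
    justification="placeholder justification",
    origin="placeholder origin",
    conceptual_domain=("masculine", "feminine", "neutral", "other"),
)
other_gender = DataCategory(key=9002, identifier="gender")  # same identifier is fine

print(gender.accepts("feminine"), gender.accepts("common"))
print(duplicated_keys([gender, other_gender]))
```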

Other aspects of new entries (2)
Optional:
Note section
– 1424 verb, note: “A verb in a lot of languages expresses morphological features …”
Explanatory section
– 2794 ADJA, explanation: “General rule: The ADJA class contains …”

Profile
Profile:
– Indicates which Thematic Domain is involved; this can be more than one:
  /Part of Speech/ (key 1345): Terminology, morphosyntax
– In such a case: try to come up with one DC instead of several, by adding several profiles to one and the same concept
A missing profile makes it difficult to find a concept, e.g. /participant age range/
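Why a missing profile hurts findability becomes clear once lookup by thematic domain is written out. The list-of-dicts layout and both helper functions below are assumptions for illustration; the key of /participant age range/ is not given in the slides and is left as `None`.

```python
# Hypothetical records; only the names, key 1345 and its two profiles
# come from the slide.
dcs = [
    {"name": "part of speech", "key": 1345,
     "profiles": {"terminology", "morphosyntax"}},
    {"name": "participant age range", "key": None,  # key not given in the slides
     "profiles": set()},
]


def in_domain(entries, domain):
    """Names of the DCs whose profiles include the given thematic domain."""
    return [e["name"] for e in entries if domain in e["profiles"]]


def without_profile(entries):
    """DCs that no thematic-domain lookup will ever return."""
    return [e["name"] for e in entries if not e["profiles"]]


print(in_domain(dcs, "terminology"))
print(in_domain(dcs, "morphosyntax"))
print(without_profile(dcs))
```

One DC carrying two profiles shows up under both domains, which is the reason to prefer it over two near-duplicate DCs.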

Language/linguistic sections
Do not confuse these!
English/Dutch/Finnish Language Section
– The English section presents the definition (etc.)
– In the other languages a translation is provided, also for the ‘concept’ itself: proper noun => eigennaam
The definition itself remains the same!

Linguistic Section

Linguistic Section (2)
– Not always available
– Here relevant values for a specific language can be added
– Also possible: language-specific examples
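The difference between the two kinds of section can be made concrete with a small data sketch: a language section translates the name and definition, while a linguistic section narrows the values for one object language. The dictionary layout, the helper names and the French value set are assumptions chosen for illustration; only "proper noun => eigennaam" and the four gender values come from the slides.

```python
# Language sections: the same concept, described in different working languages.
proper_noun = {
    "language_sections": {
        "en": {"name": "proper noun"},
        "nl": {"name": "eigennaam"},  # translation; the definition stays the same
    },
}

# Linguistic sections: values that are relevant for one specific language.
gender = {
    "conceptual_domain": ["masculine", "feminine", "neutral", "other"],
    "linguistic_sections": {
        "fr": {"values": ["masculine", "feminine"]},  # illustrative restriction
    },
}


def name_in(dc, lang):
    """Name of the concept in a working language, falling back to English."""
    sections = dc["language_sections"]
    return sections.get(lang, sections["en"])["name"]


def values_for(dc, lang):
    """Language-specific values if a linguistic section exists, else all values."""
    section = dc.get("linguistic_sections", {}).get(lang)
    return section["values"] if section else dc["conceptual_domain"]


print(name_in(proper_noun, "nl"))
print(values_for(gender, "fr"))
print(values_for(gender, "nl"))
```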

General
Start in ‘Private’
– CLARIN-NL: share your info with the other projects in order to avoid all kinds of problems!
Public
– At a later moment the data can be made public.
– In the end, (part of) the DCs may be submitted for standardization (others may be too specific).