CLARIN-NL ISOcat workshop 2011, part 2
Ineke Schuurman, Menzo Windhouwer
Part A: Issues brought up by participants
– When (not) to adopt an existing DC (data category)
– What about (CLARIN) standards?
– What to do with ‘flagged’ DCs?
– Relation between DCS (data category selection) and profile
– What should be included in ISOcat (level of detail, abbreviations, …)?
– What about TEI, metadata, web services?
– How to deal with larger amounts of data?
Part B: ISOcat and CLARIN: do’s and don’ts (version 0.1)
– Introduction and discussion
Part A: Issues brought up by participants
When (not) to adopt an existing DC
– It should ‘match’ the way you use a specific notion in your annotation scheme, application, …
– It should come with the same profile
– It should handle the same phenomenon: SpeakerID ≠ SingerID
Speaker vs Singer
String → Name → Person → Singer → Opera singer → Tenor → Tenor in La Bohème
The first is too generic, the last too specific; the ones in between are candidates.
Note that SingerID and SpeakerID are siblings, whereas SingerID is a subclass of both Singer and ID (cf. RELcat!)
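The difference between "siblings" and "subclass of both" can be made concrete with a small graph of broader-concept links. The following is a minimal Python sketch under stated assumptions: the concept names, the BROADER table and the two helper functions are hypothetical illustrations of the slide's example, not actual ISOcat data categories, RELcat content, or part of any ISOcat/RELcat API.

```python
# Hypothetical broader-concept links for the Speaker/Singer example.
# Illustrative only: these are NOT actual ISOcat DCs or RELcat relations.
BROADER: dict[str, set[str]] = {
    "SpeakerID": {"Speaker", "ID"},
    "SingerID": {"Singer", "ID"},  # SingerID has two direct parents
    "Speaker": {"Person"},
    "Singer": {"Person"},
}


def is_subclass(concept: str, ancestor: str) -> bool:
    """Return True if `ancestor` is reachable from `concept` via broader links."""
    stack = [concept]
    seen: set[str] = set()
    while stack:
        current = stack.pop()
        for parent in BROADER.get(current, set()):
            if parent == ancestor:
                return True
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return False


def are_siblings(a: str, b: str) -> bool:
    """Two distinct concepts are siblings if they share a direct parent."""
    return a != b and bool(BROADER.get(a, set()) & BROADER.get(b, set()))


if __name__ == "__main__":
    print(is_subclass("SingerID", "Singer"))      # True
    print(is_subclass("SingerID", "ID"))          # True
    print(is_subclass("SingerID", "SpeakerID"))   # False
    print(are_siblings("SingerID", "SpeakerID"))  # True (shared parent: ID)
```

In this sketch SingerID also reaches Person transitively (via Singer), while it never reaches SpeakerID: sharing the parent ID makes the two siblings, not interchangeable.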
Standards
– Currently ISOcat contains few, if any, standardized DCs.
– Therefore CLARIN-NL and CLARIN-VL will set up their own set of ‘standardized DCs’; Ineke will be in charge (in consultation with others).
Flagged DCs
– Never link to ‘deprecated’ DCs! (In case of doubt, consult Ineke or Menzo.)
– In other cases, the flags show whether the DC specification is correct from a technical point of view.
– Note that only DCs with a green marking qualify for standardization.
DC/DCS and profile
– Profiles are not added automatically; a DCS may contain elements with various profiles.
– If the profile you need is not yet available, contact Menzo and Ineke.
What to include?
– Cf. the slide on SingerID/SpeakerID
– In general: all linguistically meaningful notions mentioned in your schema, manual, or definitions (cf. Part B)
– Abbreviations (e.g. PST for /past tense/) are to be recorded as Data Element Name
TEI, metadata, web services
– TEI: likely to be taken care of at a ‘higher level’; if not, YOU are to insert the TEI definitions you use.
– Metadata: is the element new in CMDI? In that case, its definition must be provided in ISOcat as well.
– Web services: to be taken care of in CMDI.
Larger amounts of data?
– In that case, contact Menzo Windhouwer.
Part B: Do’s and don’ts
Do’s:
– Create a DCS for your scheme (name: project, annotation scheme, …)
– Provide a clear definition (short, to the point) for your scheme, application, …
– Take care not to leave concepts used in your definition undefined or vague
– Use appropriate vocabulary (per profile)
– Check ‘adopted’ DCs regularly until they are standardized!
Do’s (continued)
When creating a DC, fill out:
– Justification: e.g. ‘used in XYZ, part of tagset N’
– Language section:
  – always an English language section
  – strongly recommended: sections for the object language(s) and for the working language of the manual
  – sections in the various languages should match (more or less be translations of each other)
Do’s (continued)
When creating a DC, fill out:
– Example section: note that *negative* examples may be very helpful! (jongens ‘boys’, mannen ‘men’; not: gelovigen ‘believers’, which is a form of an ADJ)
Example sections
Suppose you want to illustrate a German phenomenon:
– Example section in the EN language section: German example with translation in English
– Example section in the NL language section: German example with translation in Dutch
– Example section in the EN linguistic section: EN example
– Example section in the NL linguistic section: NL example with translation in English
Don’ts
– Confuse the Language and Linguistic sections (the latter contains language-specific values for closed domains)
– Be (too) language-specific in a definition
– Mention your scheme in a definition
– Use several definitions in one DC
– Give circular definitions
– Rely on authority
– Rely on standardized status: the definition should fit YOUR scheme, etc.
-- End --