CLARIN-NL Call 3 ISOcat follow-up (www.isocat.org), 10/10/2012

CLARIN-NL Call 3 ISOcat follow-up
10/10/2012

Topics
Data Category types
Bulk import
Beyond ISOcat
– RELcat
– SCHEMAcat

Data Category types
complex:
– open: writtenForm (data type: string)
– closed: grammaticalGender (data type: string; value domain: neuter, masculine, feminine)
– constrained: (data type: string)
simple: neuter, masculine, feminine

Data Category types
container:
[Figure: container Data Categories, with the labels lexicon, entry, lemma, writtenForm, language, alphabet, japanese, ipa]

Which type?
Which type is appropriate depends on the place of the data category in the structure of your resource:
1. Can it have a value? Complex Data Category with a data type
   – Any of the values of the data type?
     » Open Data Category
   – Can you enumerate the values?
     » Closed Data Category
     » Fill its value domain with simple Data Categories
   – Is there a rule to constrain the values?
     » Constrained Data Category
     » Express the rule/constraint in one of the rule languages
2. Is it a value? Simple Data Category
3. Does it group other (container or complex) Data Categories? Container Data Category
If a Data Category both has a value and groups Data Categories
– Complex Data Category
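The decision questions above can be read as a small procedure. The following is only a sketch of that reading: the names `Answers` and `dc_type` are hypothetical, and checking "closed" before "constrained" is an assumption, since the slide does not state a precedence between the two.

```python
from dataclasses import dataclass


@dataclass
class Answers:
    """Answers to the questions on the slide (hypothetical helper type)."""
    has_value: bool = False            # 1. Can it have a value?
    values_enumerable: bool = False    #    Can you enumerate the values?
    has_constraint_rule: bool = False  #    Is there a rule to constrain the values?
    is_value: bool = False             # 2. Is it a value?
    groups_others: bool = False        # 3. Does it group other DCs?


def dc_type(a: Answers) -> str:
    """Return the Data Category type suggested by the answers."""
    if a.has_value:
        # Also covers "has a value and groups Data Categories": complex wins.
        if a.values_enumerable:
            return "complex/closed"
        if a.has_constraint_rule:
            return "complex/constrained"
        return "complex/open"
    if a.is_value:
        return "simple"
    if a.groups_others:
        return "container"
    raise ValueError("none of the three questions was answered with yes")


if __name__ == "__main__":
    # grammaticalGender: has a value, and the values can be enumerated
    print(dc_type(Answers(has_value=True, values_enumerable=True)))
    # lexicon: only groups other Data Categories
    print(dc_type(Answers(groups_others=True)))
```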

CMDI example
CMD component relates to a container DC
CMD element relates to a complex DC
CMD value relates to a simple DC
The ISOcat search in the CMD Component Editor enforces this
– In addition, a DC should be public and a member of the Metadata profile
However, if you link to a DC, nothing from its specification is copied into your profile

Some examples

Feature structure:
category: noun phrase
agreement:
  person: third
  number: singular

/category/ a closed DC
/noun phrase/ a simple DC
/agreement/ a container DC
/number/ a closed DC
/singular/ a simple DC
/person/ a closed DC
/third/ a simple DC
(Encoded as TEI P5 FSR, the XML elements and attributes are seen as syntactic sugar)

Syntax tree:
S
  NP (Text=“John”)
  VP
    V (Text=“hit”)
    NP
      Det (Text=“the”)
      N (Text=“ball”)

/S/ a container DC
/NP/ an open DC
/VP/ a container DC
/V/ an open DC
/NP/ a container DC
/Det/ an open DC
/N/ an open DC
(Text is seen as syntactic sugar)

GrNe example
aor. ἔπᾰθον;. πέπονθα, ep.. 2 plur. πέπασθε en πέποσθε; ptc. πεπονθώς, ep.. πεπᾰθυῖα;. ἐπεπόνθειν en Att. ἐπεπόνθη, Ion. plqperf. 3. ἐπεπόνθεε;.. πείσομαι;
1. Better structure: mark up the symbols: … plur. …
2. Under development: /morfl/ a constrained DC linked to an EBNF grammar in SCHEMAcat (see CGN EBNF) that accepts free text interleaved with a controlled vocabulary
3. Temporary: /morfl/ a closed DC linked to with the controlled vocabulary as its value domain

Bulk import: DCIF
Create a valid DCIF XML document
– In general by converting an existing digital resource
  » XSLT, Perl, …
– DCIF Schema:
  » Human readable:
– DCIF Validation levels:
  » Structure: Relax NG validation
  » Referential integrity: Schematron validation
Example: example.dcif
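To illustrate what the second validation level (referential integrity) checks, here is a rough sketch in Python. It does not use the real DCIF schema or the actual Schematron rules: the `dcs`, `dc`, `value` and `isA` elements and the `pid`, `type` and `ref` attributes are a simplified, hypothetical layout invented for this example, and `dangling_refs` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for a DCIF document (NOT the real schema).
SAMPLE = """
<dcs>
  <dc pid="my:DC-1" type="closed">
    <value ref="my:DC-2"/>
    <value ref="my:DC-3"/>
  </dc>
  <dc pid="my:DC-2" type="simple"/>
  <dc pid="my:DC-3" type="simple"><isA ref="my:DC-2"/></dc>
</dcs>
"""


def dangling_refs(xml_text):
    """Return (source pid, missing target pid) pairs for unresolved references."""
    root = ET.fromstring(xml_text)
    declared = {dc.get("pid") for dc in root.iter("dc")}
    missing = []
    for dc in root.iter("dc"):
        for child in dc:
            ref = child.get("ref")
            if ref is not None and ref not in declared:
                missing.append((dc.get("pid"), ref))
    return missing


if __name__ == "__main__":
    print(dangling_refs(SAMPLE))  # no dangling references in the sample
```

A structural check (the Relax NG level) would run before this one; the sketch only covers references between Data Categories inside one document.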

DCIF Validation Scenario in oXygen

What will be overwritten?
PIDs
– Just invent your own URI, e.g., my:DC-1
– Use them to relate DCs:
  » Closed DC conceptual domain to simple DC
  » Simple DC is-a relation to another simple DC
– Will be overwritten by ISOcat PIDs
  » Unless you have ISOcat acceptable PIDs
Version -> 1:0
Registration status -> private
Creation date -> date of import
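The effect of an import on these fields can be sketched as follows. This is an illustration under stated assumptions, not the ISOcat import code: the dictionary layout, the function `apply_import`, and the callbacks `assign_pid` and `is_acceptable` are hypothetical, and what counts as an "ISOcat acceptable" PID is deliberately left to the caller.

```python
from datetime import date


def apply_import(dcs, assign_pid, is_acceptable, today):
    """Mimic the overwrites listed above on a list of DC dictionaries.

    dcs           -- list of dicts with "pid" and optional "refs" (hypothetical layout)
    assign_pid    -- callable giving the registry PID for a self-invented PID
    is_acceptable -- callable deciding whether a PID may be kept as-is
    today         -- date of import
    """
    # First pass: decide the final PID of every DC.
    pid_map = {}
    for dc in dcs:
        old = dc["pid"]
        pid_map[old] = old if is_acceptable(old) else assign_pid(old)

    # Second pass: rewrite PIDs and references, reset the administrative fields.
    imported = []
    for dc in dcs:
        imported.append({
            **dc,
            "pid": pid_map[dc["pid"]],
            "refs": [pid_map.get(r, r) for r in dc.get("refs", [])],
            "version": "1:0",
            "status": "private",
            "created": today.isoformat(),
        })
    return imported
```

The two passes keep relations intact: a closed DC that pointed to `my:DC-2` in its conceptual domain points to the newly assigned PID of that simple DC after the import.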

Contact ISOcat sysadmin
If you need:
– Additional languages
– Additional profiles
  » This will require ISO TC 37 involvement; start with an import in the private profile
– Additional constraint rule languages
If you’re done:
– Send DCIF file
– Will be validated (again)
– Test import cycles on the ISOcat test server
– Actual import on isocat.org
If you want to do bulk updates

Beyond ISOcat: RELcat
Collect typed relationships between your new DCs and existing DCs in an Excel spreadsheet or CSV file with at least three columns:
1. Your ISOcat DC PID
2. Typed relationship
   – sameAs: same semantics, just different types or an uncooperative DC owner
   – almostSameAs: minor, but for you important, differences
   – subClassOf: yours is more specific
   – superClassOf: yours is more general
   – hasPart/partOf: partitive relationships
3. Related ISOcat DC PID (or a URL to an entry in another persistent concept/data category registry)
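A three-column CSV file of this kind could be produced along the following lines. This is a minimal sketch: the function name `write_relations` is hypothetical, the slide does not say whether a header row is expected (none is written here), and the PIDs in the usage example are placeholders rather than real ISOcat PIDs.

```python
import csv
import io

# The relationship types listed on the slide.
RELATION_TYPES = {
    "sameAs", "almostSameAs", "subClassOf", "superClassOf", "hasPart", "partOf",
}


def write_relations(rows, out):
    """Write (your PID, relationship type, related PID) rows as CSV."""
    writer = csv.writer(out)
    for subject, relation, related in rows:
        if relation not in RELATION_TYPES:
            raise ValueError(f"unknown relationship type: {relation!r}")
        writer.writerow([subject, relation, related])


if __name__ == "__main__":
    buffer = io.StringIO()
    write_relations(
        [("PID-OF-YOUR-DC", "subClassOf", "PID-OF-EXISTING-DC")],  # placeholders
        buffer,
    )
    print(buffer.getvalue(), end="")
```

Rejecting unknown relationship types early keeps typos out of the file before it is sent on.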

Beyond ISOcat: SCHEMAcat
Annotate your resource schema with ISOcat DC PIDs
1. Use what your schema language provides to link to an external semantic specification
   – ODD:
2. … in an XML-based schema language
   – RNG:
3. Embed annotation in a comment in another (text-based) schema language
   – EBNF: (* MORFL …/DC-nnn *)
4. Embed annotation in a description or note or …
   – MDF: …/DC-nnn
5. Contact

ISOcat user interface
Problematic:
– Simple DC selector for a closed value domain
  » Too slow, especially when the closed DC is a member of the Private profile; with a more specific profile, e.g., Metadata, the number of simple DCs loaded will be much smaller
  » Upcoming: replace the full list by a search or a selection from the basket or viewed DCs
– Default Private profile
  » Users forget to select the proper profile, so the DC does not appear in profile-specific searches, e.g., the CMDI search for metadata DCs
  » Upcoming: no default profile
– Distinction between CLARIN-NL/VL candidate DCs and recommended DCs
  » Upcoming: CLARIN-NL/VL recommendations
– Links between DCs
  » Upcoming: become clickable
  » Later: integration with RELcat for typed relationships