Depositors’ usage of IMDI metadata. Daan Broeder & Alex Klassmann, Max Planck Institute for Psycholinguistics. DELAMAN meeting, London, 2006.


IMDI metadata
Forms with ~150 possible descriptors
  – Describes bundles of related resources
  – Extensive set compared with DC/OLAC
  – But only “name” descriptor is compulsory
Archive holds
  – ~40000 IMDI sessions or resource bundles non-local but available in our DB
  – Describing ~ resources

IMDI Metadata
The descriptors are hierarchically ordered entries, which concern
  – the event (recording location, date, etc.),
  – the project,
  – the languages involved,
  – the participants,
  – the type and nature of speech,
  – technical information about the resources,
  – access rights.
Values of descriptors can be closed or open vocabularies or free text.
Users can add prose descriptions at each of these levels, plus project/user-defined keys.
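The distinction between closed vocabularies, open vocabularies and free text can be sketched roughly as follows. This is only an illustration: the `Descriptor` class, the `kind` labels and the example vocabularies are hypothetical and are not taken from the IMDI schema or tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptor:
    """Hypothetical model of one metadata descriptor."""
    name: str
    kind: str  # "closed", "open", or "free" (illustrative labels)
    vocabulary: frozenset = frozenset()

    def accepts(self, value: str) -> bool:
        # Closed vocabulary: only listed values are valid.
        if self.kind == "closed":
            return value in self.vocabulary
        # Open vocabulary: listed values are suggestions, others are allowed.
        # Free text: anything is allowed.
        return True


# Example vocabularies are assumptions made for this sketch.
SEX = Descriptor("Actor.Sex", "closed",
                 frozenset({"Male", "Female", "Unknown", "Unspecified"}))
GENRE = Descriptor("Genre", "open", frozenset({"Discourse", "Narrative"}))
DESCRIPTION = Descriptor("Description", "free")
```

With this sketch, `SEX.accepts("F")` is rejected, while `GENRE.accepts("Ritual song")` is accepted even though the value is not in the suggested list.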

Metadata Use
Documentation of the resources
Retrieval and reuse: the archive offers tools for
  – Browsing the archives’ corpora
  – Structured metadata search: high precision, low recall
  – Unstructured Google-like metadata search: high recall, low precision
Large set -> not all elements are always relevant
  – Sparsely populated metadata space
  – Search tool shows frequency counts for metadata values, which avoids fruitless searches
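A minimal sketch of the two search styles and of frequency counts for metadata values, assuming each session is flattened into a plain dict from descriptor name to string value. The function names and this data layout are assumptions for illustration and do not describe the archive's actual search tools.

```python
from collections import Counter


def structured_search(sessions, descriptor, value):
    """Exact match on one descriptor: precise, but misses sessions
    where the information was entered in another field."""
    return [s for s in sessions
            if s.get(descriptor, "").lower() == value.lower()]


def unstructured_search(sessions, term):
    """Substring match over all values: finds more sessions,
    including some irrelevant ones."""
    term = term.lower()
    return [s for s in sessions
            if any(term in v.lower() for v in s.values())]


def value_counts(sessions, descriptor):
    """Frequency of each non-empty value of a descriptor, most common first.
    Showing these counts lets a user see which queries can return results."""
    counts = Counter()
    for s in sessions:
        value = s.get(descriptor, "").strip()
        if value:
            counts[value] += 1
    return counts.most_common()
```

In a sparsely populated metadata space, `value_counts` shows at a glance whether a descriptor is filled in often enough to be worth searching on.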

Depositor Guidance
In general, depositors are urged to be as complete as possible for documentation purposes
Some projects have an obligatory set of descriptors to fill in (CGN, DBD, …)
Provide training to get familiar with the set and tools
Provide documentation
Support by student assistants and corpus managers

Observations I
Often researchers do not fill in all the relevant data at their disposal.
Some tendency to avoid this time-consuming work, which is oriented to re-use by others.
The sheer size of the set may discourage people from starting to fill in data at all.
Training helps.
Best results in projects that decided beforehand which descriptors needed to be filled in.
Of course there are also very committed individuals!
Corpus managers/student assistants may clean things up
  – but of limited use, since only the researcher has the specific knowledge
  – can serve as intermediaries.

Observations II
Only that part of the archive where metadata was specified manually was considered (e.g. CGN was excluded, as were sessions outside the MPI)
Statistics on the basis of ~25000 remaining sessions
The data gives an impression of how often fields are actually filled in (i.e. not empty and not the default “unknown” or “unspecified”)
Cannot exclude “repairs” where obvious omissions were repaired by corpus management
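The notion of a field being "actually filled in" could be computed along the following lines. This is a sketch under assumptions: sessions are represented as plain dicts from descriptor name to string value, and the function names are hypothetical. It is not the script used to produce the statistics in this presentation.

```python
from collections import Counter

# Values that do not count as filled in, following the definition above.
PLACEHOLDERS = {"", "unknown", "unspecified"}


def is_filled(value):
    """True if the value is neither empty nor a default placeholder."""
    return value is not None and value.strip().lower() not in PLACEHOLDERS


def fill_rates(sessions):
    """Percentage of sessions in which each descriptor is filled in."""
    total = len(sessions)
    if total == 0:
        return {}
    filled = Counter()
    names = set()
    for session in sessions:
        for name, value in session.items():
            names.add(name)
            if is_filled(value):
                filled[name] += 1
    # Descriptors absent from a session count as not filled in.
    return {name: round(100 * filled[name] / total) for name in sorted(names)}
```

Note that a sketch like this cannot distinguish values entered by the depositor from later repairs by corpus management.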

Descriptor name    total    fl-12000    acqui
Country
Address
Region 71011
Description
Key
Project.Name
Content.Description
Genre
SubGenre
Task
Modalities
Subject 362
Interactivity
PlanningType
Involvement
SocialContext 6109
EventStructure 799
Channel 81011
Content.Language.Description
Content.Language.Id
Content.Language.Name 919094

Actor.Language.Description
Actor.Language.Id
Actor.Language.Name
Actor.Role
Actor.Name
Actor.FullName
Actor.Code
Actor.FamilySocialRole
Actor.EthnicGroup
Actor.BirthDate 588
Actor.Age
Actor.Sex
Actor.Education
Actor.Description
Actor.Key
MediaFile.Type
MediaFile.Format
MediaFile.Quality 18831
WrittenResource.Type
WrittenResource.SubType
WrittenResource.Format
WrittenResource.ContentEncoding 370
WrittenResource.CharacterEncoding 3120
WrittenResource.LanguageId 411

Conclusions
As can be seen, the sets are far from complete.
But every field of the schema has been used in some sessions, so it seems that no field in the schema is obsolete
People find use for the description fields that are available at different levels (~50%)
The user/project-defined keys are also used (~50%) -> IMDI set is not big enough
Some keys are not much used
  – Remove?
  – But where then to put this information if it is available?