

DynaSAND: technology

Transcripts are stored in a relational database. They are broken down into their smallest constituents (words) while the context is preserved, in a structure basically like this:

Tables: interview, speaker, sentence, word
Keys: interview_id, speaker_id, sentence_id, word_id
References to the level above: interview_id, speaker_id, sentence_id
Other fields: locality, start time, end time
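A minimal sketch of such a word-level layout, using SQLite from Python. The table and key names follow the slide; which table holds locality, start time and end time is an assumption, and the columns position and form are hypothetical additions so the example has something to return.

```python
import sqlite3

# Hypothetical schema. Table and key names follow the slide; the placement of
# locality, start_time and end_time, and the columns "position" and "form",
# are assumptions made for this sketch.
SCHEMA = """
CREATE TABLE interview (interview_id INTEGER PRIMARY KEY, locality TEXT);
CREATE TABLE speaker   (speaker_id INTEGER PRIMARY KEY,
                        interview_id INTEGER REFERENCES interview(interview_id));
CREATE TABLE sentence  (sentence_id INTEGER PRIMARY KEY,
                        speaker_id INTEGER REFERENCES speaker(speaker_id),
                        start_time REAL, end_time REAL);
CREATE TABLE word      (word_id INTEGER PRIMARY KEY,
                        sentence_id INTEGER REFERENCES sentence(sentence_id),
                        position INTEGER, form TEXT);
"""


def build_demo_db():
    """Create an in-memory database holding one short, made-up sentence."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO interview VALUES (1, 'Example locality')")
    conn.execute("INSERT INTO speaker VALUES (1, 1)")
    conn.execute("INSERT INTO sentence VALUES (1, 1, 0.0, 1.5)")
    conn.executemany(
        "INSERT INTO word VALUES (?, 1, ?, ?)",
        [(1, 1, "dit"), (2, 2, "is"), (3, 3, "een"), (4, 4, "zin")],
    )
    return conn


def word_context(conn, word_id):
    """Return (form, locality, start_time, end_time) for a single word."""
    return conn.execute(
        """
        SELECT w.form, i.locality, s.start_time, s.end_time
        FROM word w
        JOIN sentence s   ON s.sentence_id = w.sentence_id
        JOIN speaker sp   ON sp.speaker_id = s.speaker_id
        JOIN interview i  ON i.interview_id = sp.interview_id
        WHERE w.word_id = ?
        """,
        (word_id,),
    ).fetchone()
```

Because every word has its own row, a single word can be looked up by its id and its context (sentence, speaker, interview) recovered by following the references upward.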

This means that individual words can be addressed, e.g. for POS tagging. The POS tags are themselves stored as separate categories, attributes and values, not as opaque strings:

Tables: word, category, attributes
Fields: word_id, category_id, attribute_id, value_id
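To illustrate why decomposed tags are convenient, here is a small sketch in Python with SQLite. The table layout is a simplified assumption (names are stored as text rather than ids), and the categories, attributes, values and words are made up for the example; they are not the SAND tag set.

```python
import sqlite3


def build_pos_db():
    """Create an in-memory database with three made-up tagged words."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE word (word_id INTEGER PRIMARY KEY, form TEXT);
        CREATE TABLE word_category (word_id INTEGER, category TEXT);
        CREATE TABLE word_attribute (word_id INTEGER, attribute TEXT, value TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO word VALUES (?, ?)",
        [(1, "loop"), (2, "lopen"), (3, "huis")],
    )
    conn.executemany(
        "INSERT INTO word_category VALUES (?, ?)",
        [(1, "verb"), (2, "verb"), (3, "noun")],
    )
    conn.executemany(
        "INSERT INTO word_attribute VALUES (?, ?, ?)",
        [(1, "number", "singular"), (2, "number", "plural"), (3, "number", "singular")],
    )
    return conn


def find_words(conn, category, attribute, value):
    """Return the forms of all words with the given category and attribute value."""
    rows = conn.execute(
        """
        SELECT w.form
        FROM word w
        JOIN word_category c  ON c.word_id = w.word_id
        JOIN word_attribute a ON a.word_id = w.word_id
        WHERE c.category = ? AND a.attribute = ? AND a.value = ?
        ORDER BY w.word_id
        """,
        (category, attribute, value),
    )
    return [form for (form,) in rows]
```

With tags stored this way, a query such as "all singular verbs" is an ordinary join on category and attribute value; with opaque tag strings it would require pattern matching on the string.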

Generating other formats

Because the data is stored in its smallest constituent parts, it is relatively easy to generate other formats.

Example: we realize that a binary format like a relational database is not appropriate for long-term archival, so we made the SAND transcriptions available as TEI XML by creating a template and filling it with data from the database using a script.

Another example: the IMDI metadata for another corpus (the Goeman-Taeldeman-Van Reenen Project, or GTRP corpus) were created in the same way.
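The template-and-script approach can be sketched in a few lines of Python. This is an illustration under assumptions, not the script that was actually used: the function name, the input shape and the TEI-like fragment (a u element containing w elements) are chosen for the example, and the output is not claimed to be a complete or valid TEI document.

```python
from string import Template
from xml.sax.saxutils import escape, quoteattr

# Hypothetical templates for a TEI-like utterance and its words.
UTTERANCE = Template("<u who=$who start=$start end=$end>$words</u>")
WORD = Template("<w xml:id=$wid>$form</w>")


def render_utterance(speaker, start, end, words):
    """Fill the templates with one sentence.

    `words` is a list of (word_id, form) pairs as they might come from a
    database query; text and attribute values are XML-escaped.
    """
    body = "".join(
        WORD.substitute(wid=quoteattr(f"w{wid}"), form=escape(form))
        for wid, form in words
    )
    return UTTERANCE.substitute(
        who=quoteattr(speaker),
        start=quoteattr(str(start)),
        end=quoteattr(str(end)),
        words=body,
    )
```

Escaping the values before substitution matters here: transcribed text can contain characters such as & or <, which would otherwise produce XML that is not well-formed.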

Generating metadata for CLARIN

Previous experience with SAND and GTRP indicates that generating XML metadata for CLARIN from our databases should be doable.

The TEI and IMDI for SAND and GTRP were created once and are static. For CLARIN metadata we plan to make the process more dynamic by creating the XML on the fly (and implementing a caching mechanism for performance reasons), so that the metadata is always up to date.
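One possible shape for "on the fly with caching" is sketched below in Python. It assumes that each resource has some cheap-to-read version marker (for example a last-modified timestamp in the database); the class and the two callables, render and get_version, are hypothetical names introduced for this example.

```python
class MetadataCache:
    """Cache rendered XML per resource and re-render only when the version changes."""

    def __init__(self, render, get_version):
        self._render = render            # resource_id -> XML string (expensive)
        self._get_version = get_version  # resource_id -> version marker (cheap)
        self._entries = {}               # resource_id -> (version, xml)

    def get(self, resource_id):
        version = self._get_version(resource_id)
        cached = self._entries.get(resource_id)
        if cached is not None and cached[0] == version:
            return cached[1]  # cache hit: the data has not changed
        xml = self._render(resource_id)
        self._entries[resource_id] = (version, xml)
        return xml
```

The design choice is to check the version on every request: that keeps the served metadata up to date, while the expensive XML generation runs only after the underlying data has changed.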

Edisyn (European Dialect Syntax)

One of the goals of Edisyn is the development of a search engine that uses one tag set to search different corpora, including the SAND, concurrently.

The central tag set is being developed by Franca Wesseling; we plan to make it compatible with ISOcat.

The search engine translates these tags to the native tag sets of the corpora.

Ideal case: corpora are hosted by their own organizations and accessible via a web service.
In practice: the Meertens has local copies of the corpora.

Participating corpora: SAND, CORDIAL-SIN (Portuguese), ASIS (Italian), EMK (Estonian); more to come.
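The translation step can be pictured as a per-corpus lookup table. The sketch below is in Python; the corpus names corpus_a and corpus_b and all tags in it are invented for illustration and do not reflect the Edisyn tag set or the native tag sets of any participating corpus.

```python
# Hypothetical mapping: central tag -> native tag, per corpus.
TAG_MAP = {
    "corpus_a": {"verb": "V", "noun": "N", "complementizer": "C"},
    "corpus_b": {"verb": "VB", "noun": "NN"},
}


def translate_query(central_tags, tag_map=TAG_MAP):
    """Translate a list of central tags into each corpus's native tags.

    A corpus is skipped when it has no equivalent for one of the tags,
    since the query cannot be expressed in its tag set.
    """
    queries = {}
    for corpus, mapping in tag_map.items():
        if all(tag in mapping for tag in central_tags):
            queries[corpus] = [mapping[tag] for tag in central_tags]
    return queries
```

Each resulting native query could then be sent to a local copy of the corpus or to a web service at the hosting organization; the translation itself is the same in both cases.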

Other Meertens language resources

PLAND (Plant Names in Dutch Dialects)
NVD (Dutch Database of First Names)
NFD (Dutch Database of Family Names)
Corpus of free dialect speech (sound recordings)
Dutch Database of Toponyms (in development)
Dutch Song Database
Dutch Folktale Database

Apart from part of the sound recordings, all these resources are web-based and built on the same database technology.

We plan to make CLARIN metadata available for these resources in a stepwise manner: first metadata on the corpus level, later also metadata on the record level.

The technologies involved (OAI-PMH) are new to us, so we want to do this in close cooperation with a "harvesting" institution to make sure that what we deliver is correct.
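For orientation, the sketch below shows, in Python, the kind of request a harvester sends to an OAI-PMH data provider. The verb ListRecords and the arguments metadataPrefix, set and resumptionToken come from the OAI-PMH protocol; the base URL, the set name and the helper function are hypothetical, and the metadata prefix a CLARIN harvester would actually request is not stated here, so the example defaults to oai_dc.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    When a resumption token is given it is sent as the only argument besides
    the verb, because the protocol treats resumptionToken as exclusive.
    """
    if resumption_token is not None:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec is not None:
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"
```

A set per resource would be one way to separate corpus-level records from record-level ones, but that is a design option and not something the slides specify.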

Further in the future

The Meertens Institute wants to be part of CLARIN, and in the future we also hope to contribute to the development of tools for working with language resources.

Thank you for your attention!