Interoperability aspects in the Virtual Language Observatory, Dieter Van Uytvanck, Max Planck Institute for Psycholinguistics

Interoperability aspects in the Virtual Language Observatory. Dieter Van Uytvanck, Max Planck Institute for Psycholinguistics. Metadata in Context workshop, Nijmegen

Overview
Context sketch; VLO: ideas, sources, modalities; interoperability issues; future plans.

Context sketch
Lots of resources are somewhere out there: data collections, corpora, lexica, grammars, multimedia recordings, software, web applications and services, and old-school linguistic resources such as books, articles and CD-ROMs. It's like a jungle, sometimes...

VLO: the idea
Researcher: “where do I start?” The VLO provides a single entry point giving access to all information. Because of the large amount of data, it follows a drill-down paradigm (gradually decreasing the search space). There are multiple ways of exploring: full-text search, facet browsing and a geographic overlay, all in a unified interface with links to the original context.
Available via
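
The drill-down paradigm can be illustrated with a minimal Python sketch. Each facet selection filters the remaining records, and the counts per facet value show where the user can go next. The records, field names and functions (`drill_down`, `facet_counts`) are hypothetical and are not taken from the VLO's implementation.

```python
from collections import Counter

# Toy records for illustration only; field names and values are hypothetical,
# not actual VLO data.
RECORDS = [
    {"id": "r1", "language": "Dutch", "genre": "corpus", "country": "Netherlands"},
    {"id": "r2", "language": "Dutch", "genre": "lexicon", "country": "Belgium"},
    {"id": "r3", "language": "German", "genre": "corpus", "country": "Germany"},
]


def drill_down(records, selections):
    """Keep only records that match every selected facet value (logical AND)."""
    return [
        r for r in records
        if all(r.get(facet) == value for facet, value in selections.items())
    ]


def facet_counts(records, facet):
    """Count how many of the remaining records carry each value of one facet."""
    return Counter(r[facet] for r in records if facet in r)


# Each additional selection narrows the search space further.
step1 = drill_down(RECORDS, {"language": "Dutch"})
step2 = drill_down(RECORDS, {"language": "Dutch", "genre": "corpus"})
print([r["id"] for r in step1], dict(facet_counts(step1, "genre")))
# → ['r1', 'r2'] {'corpus': 1, 'lexicon': 1}
print([r["id"] for r in step2])  # → ['r1']
```

An empty selection returns every record, which corresponds to the starting point of the drill-down.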

VLO: the sources

VLO: the sources – LRT inventory
Initiated by CLARIN: an ad-hoc, low-barrier, user-driven inventory of Language Resources and Tools. Approximate number of records: 848 resources, 180 tools. You can add new entries yourself!

VLO: the sources – OLAC catalogue
OLAC data providers: http://catalog.clarin.eu
Metadata as harvested from 40 OLAC providers (among them several CLARIN centres). Quality and quantity differ hugely.

VLO: the sources – MPI catalogue
About metadata records
Broad spectrum: experimental data, the Spoken Dutch Corpus, sign language corpora, endangered languages documentation.
The archive is in principle open to externally created linguistic data collections (e.g. endangered languages; see Donated Corpora), provided these collections comply with the technical requirements (archivable formats, metadata, …).

VLO: the sources – DFKI tool registry
Contains information about 292 (linguistic) software packages. You can add entries yourself.

VLO: the modalities – GIS

VLO: the modalities – hierarchical catalogue

VLO: the modalities – facet browser

Interaction between modalities

… all leading to the data

Interoperability issues (1)
All metadata records are mapped to a common set of facets, currently: country, continent, origin, language, organization, genre and subject.
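
Here is a minimal sketch of mapping source-specific fields onto the common facets, assuming a simple lookup table per source. The source names and field names are assumptions made for illustration, as is the interpretation of `origin` as the source collection. The slides do not show the actual mapping rules.

```python
# Hypothetical per-source field mappings: the slides do not show the real VLO
# mapping rules, so these source field names are illustrative assumptions.
FACETS = ("country", "continent", "origin", "language",
          "organization", "genre", "subject")

FIELD_TO_FACET = {
    "olac": {"dc:language": "language", "dc:publisher": "organization",
             "dc:type": "genre", "dc:subject": "subject"},
    "lrt_inventory": {"Languages": "language", "Institution": "organization",
                      "Resource type": "genre", "Country": "country"},
}


def to_facets(source, raw_record):
    """Map one source record onto the common facet set."""
    mapping = FIELD_TO_FACET[source]
    # Assumption: 'origin' records which source collection the record came from.
    facets = {"origin": source}
    for field, value in raw_record.items():
        facet = mapping.get(field)
        if facet in FACETS and value:  # unmapped or empty fields are dropped
            facets[facet] = value
    return facets


print(to_facets("olac", {"dc:language": "nld", "dc:publisher": "MPI",
                         "dc:format": "video/mp4"}))
# → {'origin': 'olac', 'language': 'nld', 'organization': 'MPI'}
```

Fields without a mapping (here `dc:format`) are simply dropped. This is one way specificity gets lost when records are reduced to a common subset.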

Interoperability issues (2)
Observations: lots of inconsistencies and errors, e.g. for one organisation (facet counts in brackets):
- MPI (5)
- MPI for Psycholinguistics (Nijmegen, Netherlands), Académie Marquisienne (Tuhuna 'Eo 'Enata) (2)
- MPI for Psycholinguistics (Nijmegen, Netherlands), Académie Marquisienne (Tuhuna 'Eo 'Enata) (39)
- Max Planck Institute for Psycholinguistics (Nijmegen, Netherlands) (112)
- Max Planck Institute for Psycholinguistics (13849)
- Max Planck Institute for Psycholinguistics & Volkswagen Stiftung (12)
- Max Planck Institute for Psycholinguistics, Nijmegen, Netherlands (2)
- Max Planck Institute for Psycholinguistics, Postbus 310, 6500 AH Nijmegen, The Netherlands (15)
Facets help to detect them.
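
Facet counts make such variants visible, and once spotted they can be merged with an alias table. The sketch uses the counts listed for this organisation, but the alias table and function names are hypothetical. Mapping the bare "MPI" to the Max Planck Institute for Psycholinguistics is an assumption, since the abbreviation is ambiguous. Values naming several organisations are deliberately left unchanged.

```python
from collections import Counter

PREFERRED = "Max Planck Institute for Psycholinguistics"

# Hypothetical alias table: normalised spelling -> preferred label.
# Treating the bare "MPI" as this institute is an assumption.
ORG_ALIASES = {
    "mpi": PREFERRED,
    "max planck institute for psycholinguistics": PREFERRED,
    "max planck institute for psycholinguistics (nijmegen, netherlands)": PREFERRED,
    "max planck institute for psycholinguistics, nijmegen, netherlands": PREFERRED,
    "max planck institute for psycholinguistics, postbus 310, "
    "6500 ah nijmegen, the netherlands": PREFERRED,
}


def normalize_org(value):
    """Collapse whitespace, lower-case, and look the value up in the alias table."""
    key = " ".join(value.split()).lower()
    return ORG_ALIASES.get(key, value)  # unknown values are kept as they are


def merge_facet_counts(counts):
    """Re-aggregate facet counts after normalising each value."""
    merged = Counter()
    for value, n in counts.items():
        merged[normalize_org(value)] += n
    return merged


# Variants and counts as listed above (Académie Marquisienne entries omitted).
OBSERVED = Counter({
    "MPI": 5,
    "Max Planck Institute for Psycholinguistics (Nijmegen, Netherlands)": 112,
    "Max Planck Institute for Psycholinguistics": 13849,
    "Max Planck Institute for Psycholinguistics & Volkswagen Stiftung": 12,
    "Max Planck Institute for Psycholinguistics, Nijmegen, Netherlands": 2,
    "Max Planck Institute for Psycholinguistics, Postbus 310, 6500 AH Nijmegen, The Netherlands": 15,
})

merged = merge_facet_counts(OBSERVED)
print(merged[PREFERRED])  # → 13983
```

An alias table only repairs variants someone has already noticed. A controlled vocabulary for organisation names would prevent them at the source.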

Interoperability issues (3)
Because of the distributed approach:
- responsibilities are distributed;
- converting all metadata records to a common subset causes a loss of specificity, so it is important to provide a link to the original record (also for the context!);
- there is a need for high-quality, well-maintained controlled vocabularies and relevant persistent identifiers: MIME types, organisation names, ISO language codes (cf. ISOcat), domain-specific vocabularies.
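
Here is a minimal sketch of both points, assuming records are plain dictionaries. The common subset keeps a link to the original record, and values are checked against controlled vocabularies. The vocabularies shown are tiny excerpts, and the function names, field names and URL are hypothetical.

```python
# Tiny illustrative excerpts only; a real check would load the full
# ISO 639-3 code table and the IANA media type registry.
ISO_639_3 = {"nld": "Dutch", "deu": "German", "eng": "English"}
MEDIA_TYPES = {"text/plain", "application/pdf", "video/mp4"}
COMMON_FIELDS = ("language", "mime_type")  # hypothetical common subset


def to_common_subset(source_record, original_url):
    """Keep only the shared fields, plus a link back to the full original record."""
    common = {k: source_record[k] for k in COMMON_FIELDS if k in source_record}
    common["original_record"] = original_url
    return common


def vocabulary_problems(record):
    """List values that are not in the controlled vocabularies."""
    problems = []
    if record.get("language") not in ISO_639_3:
        problems.append(f"unknown ISO 639-3 code: {record.get('language')!r}")
    if record.get("mime_type") not in MEDIA_TYPES:
        problems.append(f"unknown MIME type: {record.get('mime_type')!r}")
    return problems


rec = to_common_subset(
    {"language": "Dutch", "mime_type": "video/mp4", "recording_device": "DAT"},
    "https://example.org/records/42",  # hypothetical URL of the source record
)
print(rec)
print(vocabulary_problems(rec))  # → ["unknown ISO 639-3 code: 'Dutch'"]
```

The `recording_device` field does not survive the conversion. The `original_record` link is what lets a user get back to it.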

Interoperability issues (4)
Metadata exchange protocols exist (e.g. OAI-PMH), but they are not always used. For the VLO one still has to rely on non-continuous information flows such as CSV files, which is clearly undesirable in the longer term.
Granularity: how can it be indicated in a standardized way?
User feedback
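
For sources that do offer OAI-PMH, harvesting is a loop over `ListRecords` requests that follows the `resumptionToken` until it is empty. The sketch uses only the Python standard library and asks for `oai_dc`, the Dublin Core format every OAI-PMH repository must support. The base URL and function names are hypothetical. Network access is abstracted into a `fetch` callable so the paging logic can be tested without a live endpoint.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if token:
        # resumptionToken is an exclusive argument: no metadataPrefix alongside it.
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return f"{base_url}?{urlencode(params)}"


def parse_page(xml_text):
    """Return the Dublin Core titles on one page and the next resumptionToken."""
    root = ET.fromstring(xml_text)
    titles = [el.text for el in root.iterfind(".//dc:title", NS)]
    tok = root.find(".//oai:resumptionToken", NS)
    text = (tok.text or "").strip() if tok is not None else ""
    # An absent or empty resumptionToken means the list is complete.
    return titles, (text or None)


def harvest(base_url, fetch):
    """Yield titles across all pages; `fetch` turns a URL into response text."""
    token = None
    while True:
        titles, token = parse_page(fetch(list_records_url(base_url, token)))
        yield from titles
        if token is None:
            return
```

Against a live endpoint, `fetch` could be `lambda url: urllib.request.urlopen(url).read()` (after `import urllib.request`). `ET.fromstring` accepts bytes as well as text.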

Future steps
Curate the metadata: correct typographical errors, add information, use consistent terminology, etc.
Process CMDI- and ISOcat-based metadata.
Use (emerging) standards to refer to persons, projects, resources... in a persistent and interoperable way.

Thank you for your attention.
CLARIN has received funding from the European Community's Seventh Framework Programme under grant agreement n°