Interoperability aspects in the Virtual Language Observatory
Dieter Van Uytvanck, Max Planck Institute for Psycholinguistics


1 Interoperability aspects in the Virtual Language Observatory
Dieter Van Uytvanck, Max Planck Institute for Psycholinguistics
Dieter.VanUytvanck@mpi.nl
Metadata in Context workshop, 2010-09-08, Nijmegen

2 Metadata in Context 2010-09-08 Nijmegen www.clarin.eu
Overview
- Context sketch
- VLO: ideas, sources, modalities
- Interoperability issues
- Future plans

3 Context sketch
Lots of resources somewhere out there:
- Data collections
- Corpora
- Lexica
- Grammars
- Multimedia recordings
- Software
- Web applications / services
Old-school linguistic resources:
- Books
- Articles
- CD-ROMs
It’s like a jungle, sometimes...

4 VLO: the idea
Researcher: “where do I start?”
- Provide a single entry point giving access to all information
- Because of the large amount of data: drill-down paradigm (decrease the search space gradually)
- Multiple ways of exploring: full-text search, facet browsing, geographic overlay
- Unified interface, with links to the original context
- Available via www.clarin.eu/vlo
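The drill-down paradigm can be illustrated with a minimal sketch. The records, field names, and helper functions below are hypothetical and not taken from the VLO implementation; the sketch only shows how selecting a facet value narrows the search space and how the remaining facet counts are recomputed.

```python
from collections import Counter

# Hypothetical records; titles and field values are illustrative only.
RECORDS = [
    {"title": "Corpus A", "language": "Dutch", "country": "Netherlands"},
    {"title": "Corpus B", "language": "Dutch", "country": "Belgium"},
    {"title": "Lexicon C", "language": "German", "country": "Germany"},
]


def facet_counts(records, facet):
    """Count how many records carry each value of the given facet."""
    return Counter(r[facet] for r in records if facet in r)


def drill_down(records, **selection):
    """Keep only the records that match every selected facet value."""
    return [
        r for r in records
        if all(r.get(facet) == value for facet, value in selection.items())
    ]


# First step: choose a language, then inspect the remaining countries.
dutch = drill_down(RECORDS, language="Dutch")
print(facet_counts(RECORDS, "language"))
print(facet_counts(dutch, "country"))
```

Each further selection is applied to the already reduced set, which is what makes the search space shrink gradually.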

5 VLO: the sources

6 VLO: the sources – LRT inventory
http://www.clarin.eu/inventory
- Initiated by CLARIN
- Ad-hoc, low-barrier, user-driven inventory of Language Resources and Tools
- Approximate number of records: 848 resources, 180 tools
- You can add new entries yourself!

7 VLO: the sources – OLAC catalogue
http://catalog.clarin.eu > OLAC data providers
- Metadata as harvested from 40 OLAC providers (among them several CLARIN centres)
- Quality and quantity differ hugely

8 VLO: the sources – MPI catalogue
http://corpus1.mpi.nl
- About 130,000 metadata records
- Broad spectrum: experimental data, Spoken Dutch Corpus, sign language corpora, endangered languages documentation
- Archive in principle open for externally created linguistic data collections (e.g. endangered languages, see Donated Corpora), provided these collections comply with the technical requirements (archivable formats, metadata, …)

9 VLO: the sources – DFKI tool registry
http://registry.dfki.de/
- Contains information about 292 (linguistic) software packages
- You can add entries yourself

10 VLO: the modalities – GIS

11 VLO: the modalities – Hierarchical catalogue

12 VLO: the modalities – Facet browser

13 Interaction between modalities

14 … all leading to the data

15 Interoperability issues (1)
The facets to which all metadata records are mapped are currently:
- country
- continent
- origin
- language
- organization
- genre
- subject
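Mapping heterogeneous records onto this common facet set could look roughly like the sketch below. Only the facet names come from the slide. The source field names, the `FIELD_MAPS` table, the `to_facets` helper, and the reading of "origin" as the source catalogue are assumptions made for illustration.

```python
# Facet names as listed on the slide.
FACETS = ("country", "continent", "origin", "language",
          "organization", "genre", "subject")

# Hypothetical per-source mappings: source field name -> facet name.
FIELD_MAPS = {
    "olac": {
        "dc:language": "language",
        "dc:subject": "subject",
        "dc:publisher": "organization",
    },
    "imdi": {
        "Country": "country",
        "Continent": "continent",
        "Genre": "genre",
        "Language": "language",
    },
}


def to_facets(record, source, original_url):
    """Project a source record onto the common facets.

    Fields without a mapping are dropped, so the link to the
    original record is kept to let users reach the full context.
    """
    mapping = FIELD_MAPS[source]
    facets = {name: None for name in FACETS}
    for field, value in record.items():
        facet = mapping.get(field)
        if facet is not None:
            facets[facet] = value
    # Assumption: "origin" is taken to mean the source catalogue.
    facets["origin"] = source
    facets["original_record"] = original_url
    return facets


example = to_facets(
    {"dc:language": "Dutch", "dc:format": "text/plain"},
    source="olac",
    original_url="http://example.org/record/1",  # placeholder URL
)
print(example)
```

In this example the `dc:format` value has no facet and is lost in the projection, which is why the original record link is carried along.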

16 Interoperability issues (2)
Observations: lots of inconsistencies and errors, e.g. for one organisation:
- MPI (5)
- MPI for Psycholinguistics (Nijmegen, Netherlands), Académie Marquisienne (Tuhuna 'Eo 'Enata) (2)
- MPI for Psycholinguistics (Nijmegen, Netherlands), Académie Marquisienne (Tuhuna 'Eo 'Enata) (39)
- Max Planck Institute for Psycholinguistics (Nijmegen, Netherlands) (112)
- Max Planck Institute for Psycholinguistics (13849)
- Max Planck Institute for Psycholinguistics & Volkswagen Stiftung (12)
- Max Planck Institute for Psycholinguistics, Nijmegen, Netherlands (2)
- Max Planck Institute for Psycholinguistics, Postbus 310, 6500 AH Nijmegen, The Netherlands (15)
Facets help to detect them.
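One possible way to curate such variants is a hand-maintained alias table, sketched below. The names and counts are a subset of those on the slide. The alias table, the helper functions, and the decision to treat all five variants as the same organisation are assumptions. Entries naming a second organisation are left out because they would need a separate decision.

```python
from collections import Counter

CANONICAL = "Max Planck Institute for Psycholinguistics"

# Facet values and record counts from the slide (subset).
OBSERVED = {
    "MPI": 5,
    "Max Planck Institute for Psycholinguistics (Nijmegen, Netherlands)": 112,
    "Max Planck Institute for Psycholinguistics": 13849,
    "Max Planck Institute for Psycholinguistics, Nijmegen, Netherlands": 2,
    "Max Planck Institute for Psycholinguistics, Postbus 310, "
    "6500 AH Nijmegen, The Netherlands": 15,
}

# Hypothetical hand-curated alias table: variant -> canonical name.
ALIASES = {name: CANONICAL for name in OBSERVED}


def normalize(name):
    """Collapse whitespace, then look the name up in the alias table."""
    key = " ".join(name.split())
    return ALIASES.get(key, key)


def merged_counts(observed):
    """Sum the record counts of all variants per canonical name."""
    totals = Counter()
    for name, count in observed.items():
        totals[normalize(name)] += count
    return totals


print(merged_counts(OBSERVED))
```

Unknown names pass through unchanged, so new variants stay visible in the facet list until someone adds them to the table.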

17 Interoperability issues (3)
Because of the distributed approach:
- Distributed responsibilities
- Loss of specificity by converting all metadata records to a common subset
- Important to provide a link to the original record (also for the context!)
- Need for high-quality, well-maintained controlled vocabularies and relevant Persistent Identifiers: MIME types, organisation names, ISO 639-3 language codes (cf. ISOcat), domain-specific vocabularies
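A controlled vocabulary makes it possible to check values automatically. The sketch below validates language values against a three-entry excerpt of ISO 639-3. The excerpt, the `check_language` helper, and the status labels are illustrative assumptions, and a real check would load the complete code table.

```python
import re

# Tiny excerpt of ISO 639-3; a real check would use the full table.
ISO_639_3 = {"nld": "Dutch", "deu": "German", "eng": "English"}

# ISO 639-3 identifiers consist of three lowercase letters.
CODE_SHAPE = re.compile(r"[a-z]{3}")


def check_language(value):
    """Return (value_or_code, status).

    status is 'ok', 'unknown-code' (right shape, not in the table)
    or 'not-a-code' (free text such as a language name).
    """
    code = value.strip().lower()
    if not CODE_SHAPE.fullmatch(code):
        return value, "not-a-code"
    if code not in ISO_639_3:
        return code, "unknown-code"
    return code, "ok"


for raw in (" NLD ", "Dutch", "xyz"):
    print(repr(raw), check_language(raw))
```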

18 Interoperability issues (4)
Metadata exchange protocols exist (e.g. OAI-PMH), but:
- They are not always used
- For the VLO one still has to rely on non-continuous information flows such as CSV files
- Clearly an undesirable situation in the longer term
Granularity: how to indicate it in a standardized way?
User feedback
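For comparison with CSV exports, the sketch below shows the shape of an OAI-PMH harvest step: building a `ListRecords` request and reading record identifiers plus the `resumptionToken` from a response page. The base URL, the sample XML, and the helper names are placeholders. No network request is made, and this is not the VLO harvester.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL.

    In OAI-PMH, resumptionToken is an exclusive argument: follow-up
    requests carry only the verb and the token.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_page(xml_text):
    """Return (record identifiers, next resumption token or None)."""
    root = ET.fromstring(xml_text)
    ids = [e.text for e in root.findall(".//oai:header/oai:identifier", OAI_NS)]
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=OAI_NS)
    return ids, (token.strip() or None)


# Made-up response page for illustration.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

ids, token = parse_page(SAMPLE)
print(ids, token)
print(list_records_url("http://example.org/oai", resumption_token=token))
```

A harvester would repeat the request with each returned token until none is left, which gives the continuous, incremental flow that one-off CSV files lack.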

19 Future steps
- Curate the metadata: correct typographical errors, add information, use consistent terminology, etc.
- Process CMDI- and ISOcat-based metadata
- Use (emerging) standards to refer to persons, projects, resources, ... in a persistent and interoperable way

20 Thank you for your attention
CLARIN has received funding from the European Community's Seventh Framework Programme under grant agreement n° 212230

