Published by James Gilbert. Modified over 6 years ago.
1
EUDAT & CKAN
Heinrich Widmann, widmann@dkrz.de
2
EUDAT: The project
European Data Infrastructure (EUDAT)
Motivation:
  Manage the rising tide of research data
  Improve interoperability across a wide, cross-disciplinary scope
Objective:
  Build up a Collaborative Data Infrastructure based on common data services (driven by the requirements of the research communities)
3
B2FIND
The metadata service of EUDAT (info + doc)
  is based on a comprehensive joint metadata catalogue of research data collections stored in EUDAT data centres and other (external) repositories
  provides a powerful and user-friendly discovery portal on metadata covering a wide range of research communities across disciplines
b2find.eudat.eu
4
Used Technologies
CentOS 6 (production instance)
Modular ingestion workflow:
  Harvesting: OAI-PMH (JSONAPI etc. also supported)
  Own mapping module (+ community-specific metadata schemas and ontologies, closed vocabularies, …)
  Upload to CKAN: common B2FIND metadata schema plus many additional facets (extra fields)
Apache + Varnish 3 cache + CKAN Version with extensions
CKAN itself could harvest OAI-PMH. Why is there a separate mapping and harvesting module?
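The mapping step of the ingestion workflow above can be illustrated with a minimal sketch. It assumes a harvested OAI-PMH record in the standard oai_dc (Dublin Core) format and maps it to a CKAN-style dataset dictionary with extra fields. The sample record, the function names and the choice of PublicationYear as the only extra field are assumptions for illustration; this is not the actual B2FIND mapping module.

```python
import re
import xml.etree.ElementTree as ET

# Standard namespaces of OAI-PMH and the oai_dc metadata format.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Made-up sample record (assumption, not taken from a real repository).
SAMPLE_RECORD = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:example.org:dataset/42</identifier>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>Sea Surface Temperature 1990-2000</dc:title>
      <dc:description>Monthly means.</dc:description>
      <dc:subject>Oceanography</dc:subject>
      <dc:subject>Climate</dc:subject>
      <dc:date>2014-05-01</dc:date>
    </oai_dc:dc>
  </metadata>
</record>"""


def slugify(identifier):
    """Turn an OAI identifier into a CKAN-compatible dataset name."""
    slug = re.sub(r"[^a-z0-9_-]+", "-", identifier.lower()).strip("-")
    return slug[:100]  # CKAN limits dataset names to 100 characters


def map_record(record_xml):
    """Map one oai_dc record to a CKAN-style dataset dictionary."""
    record = ET.fromstring(record_xml)
    identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
    dc = record.find("oai:metadata/oai_dc:dc", NS)
    if identifier is None or dc is None:
        raise ValueError("record has no identifier or no oai_dc metadata")

    dataset = {
        "name": slugify(identifier),
        "title": dc.findtext("dc:title", default="", namespaces=NS),
        "notes": dc.findtext("dc:description", default="", namespaces=NS),
        "tags": [{"name": s.text} for s in dc.findall("dc:subject", NS) if s.text],
        "extras": [],
    }
    # Hypothetical extra field: the leading four digits of dc:date.
    year = re.match(r"\d{4}", dc.findtext("dc:date", default="", namespaces=NS))
    if year:
        dataset["extras"].append({"key": "PublicationYear", "value": year.group(0)})
    return dataset


package = map_record(SAMPLE_RECORD)
```

The resulting dictionary has the shape that CKAN's package_create action accepts (name, title, notes, tags, extras); the upload itself is left out so the sketch runs offline.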
5
CKAN extensions
ckanext-b2find (+ B2FIND facets, legal pages etc.)
ckanext-spatial (supported by CKAN, but had compatibility issues, now fixed)
ckanext-timeline (own development for 'Temporal coverage' on different time scales, which makes usability quite complex)
  (How) can it be added to the supported CKAN extensions?
  Is there a 'commitment' by CKAN for support and maintenance?
  Are others interested in further development of this extension?
ckanext-datesearch (PublicationYear)
Planned:
  Support more extensions, e.g. to use the potential of the semantic web / LOD (+ DCAT, SPARQL, RDF); Recombinant ??, Kettle ??, …
  Improve web appearance (+ Elasticsearch, …)
6
Issues
Scalability / performance (mostly Postgres related)
  Status: > records harvested
  Upload / indexing (a re-index takes > 3 days!)
  Download / search (esp. when accessing the PG-DB)
  Delete (purge!) datasets (often not removed completely from DB + SOLR)
Upgrade to newer CKAN versions
  Compatibility of CKAN extensions (spatial, temporal)
  Compatibility with own schema
Decouple upload and search
  Two SOLR indices (one 'read only', one 'write and update')?
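For the delete/purge point above, a minimal sketch of how a purge request against the CKAN action API could be built. It assumes a CKAN version that provides the dataset_purge action and an API key with sysadmin rights; the host name, key and dataset id are placeholders. Whether such a call removes every trace from Postgres and SOLR on the instance described here is exactly the open issue and is not something this sketch can confirm.

```python
import json
import urllib.request


def build_purge_request(ckan_url, api_key, dataset_id):
    """Build (but do not send) a POST request for CKAN's dataset_purge action."""
    payload = json.dumps({"id": dataset_id}).encode("utf-8")
    return urllib.request.Request(
        url=ckan_url.rstrip("/") + "/api/3/action/dataset_purge",
        data=payload,
        headers={"Authorization": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values (assumptions); replace with a real instance and key.
request = build_purge_request(
    "https://ckan.example.org/", "my-api-key", "oai-example-org-dataset-42"
)
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
```

Building the request separately from sending it makes it easy to log or batch purge calls before anything irreversible happens.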
7
Issues (cont.)
History: how to get rid of it (in Postgres)?
  Something like 'paster clean history'?
Support of taxonomies/hierarchies for facets (hierarchical tree of (sub-)disciplines)
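One way to picture the hierarchical facet idea: if each discipline value were stored as a path such as "Natural Sciences/Earth Sciences/Oceanography", flat facet counts could be folded into a tree on the client side. The path encoding, the separator and the sample counts are assumptions for illustration; this is a sketch of the data structure, not a feature of CKAN or SOLR.

```python
def build_facet_tree(facet_counts, sep="/"):
    """Fold flat 'A/B/C' facet counts into a nested tree with summed counts.

    Assumes each dataset carries exactly one discipline path; otherwise the
    sums at parent nodes would count some datasets more than once.
    """
    tree = {}
    for path, count in facet_counts.items():
        children = tree
        for part in path.split(sep):
            node = children.setdefault(part, {"count": 0, "children": {}})
            node["count"] += count  # every ancestor accumulates the leaf count
            children = node["children"]
    return tree


# Made-up facet counts (assumption).
flat_counts = {
    "Natural Sciences/Earth Sciences/Oceanography": 120,
    "Natural Sciences/Earth Sciences/Meteorology": 80,
    "Natural Sciences/Physics": 40,
    "Humanities/Linguistics": 15,
}
facet_tree = build_facet_tree(flat_counts)
```

With these sample counts, the "Natural Sciences" node sums to 240 and its "Earth Sciences" child to 200, so a portal could show totals at every level of the discipline tree.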
8
Outlook
More records from more communities (will this scale to > 1 or 10 million records?)
Use tools such as Kibana and Elasticsearch to provide on-the-fly statistics in a dashboard
Community customisation (switch between different SOLR cores and adapted search facets)
Further search/dissemination functionality: annotations, SRU interface
9
Links
Docs: https://eudat.eu/services/userdoc/b2find
Portal:
Sourcecode:
Support:
Contact: