
EUDAT & CKAN. Heinrich Widmann, widmann@dkrz.de


1 EUDAT & CKAN
Heinrich Widmann, widmann@dkrz.de

2 EUDAT: The project
European Data Infrastructure (EUDAT)
Motivation: Manage the rising tide of research data; improve interoperability across a wide, cross-disciplinary scope.
Objective: Build up a Collaborative Data Infrastructure based on common data services, driven by the requirements of the research communities.

3 B2FIND
The metadata service of EUDAT (info+doc)
Based on a comprehensive joint metadata catalogue of research data collections stored in EUDAT data centres and other (external) repositories.
Provides a powerful and user-friendly discovery portal on metadata covering a wide range of research communities across disciplines: b2find.eudat.eu

4 Used Technologies
CentOS 6 (production instance)
Modular ingestion workflow:
  Harvesting: OAI-PMH (other interfaces such as JSON APIs are supported as well)
  Own mapping module (+ community-specific metadata schemas and ontologies, closed vocabularies, …)
  Upload to CKAN: common B2FIND metadata schema plus many additional facets (extra fields)
Apache + Varnish 3 cache + CKAN Version with extensions
CKAN itself could harvest OAI-PMH. Why is there a separate mapping and harvesting module?
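The mapping step of such an ingestion workflow can be sketched roughly as follows. This is only an illustration, not the B2FIND mapping module: the sample record, the `map_record` helper, and the `Community` extra key are assumptions made for the example. It parses one OAI-PMH record in Dublin Core (`oai_dc`) and builds a dataset dictionary in the shape the CKAN action API accepts for `package_create` (`name`, `title`, `tags`, `extras`).

```python
import re
import xml.etree.ElementTree as ET

# XML namespaces used by OAI-PMH responses carrying oai_dc metadata.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical harvested record, inlined so the sketch runs without a network.
SAMPLE_RECORD = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:example.org:dataset-42</identifier>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>Sea Surface Temperature 1990-2000</dc:title>
      <dc:creator>Doe, Jane</dc:creator>
      <dc:subject>Oceanography</dc:subject>
      <dc:date>2012-05-01</dc:date>
    </oai_dc:dc>
  </metadata>
</record>"""


def map_record(xml_text, community):
    """Map one oai_dc record to a CKAN-style dataset dict (illustrative only)."""
    record = ET.fromstring(xml_text)
    oai_id = record.findtext("oai:header/oai:identifier", namespaces=NS)
    dc = record.find("oai:metadata/oai_dc:dc", NS)

    def values(tag):
        # All non-empty values of a repeatable Dublin Core element.
        return [e.text.strip() for e in dc.findall("dc:" + tag, NS) if e.text]

    # CKAN dataset names allow only lowercase letters, digits, '-' and '_'.
    name = re.sub(r"[^a-z0-9_-]+", "-", oai_id.lower()).strip("-")
    dates = values("date")
    extras = [{"key": "Community", "value": community}]  # assumed extra field
    if dates:
        # Extra field used as a facet; the year is the first four characters.
        extras.append({"key": "PublicationYear", "value": dates[0][:4]})

    return {
        "name": name,
        "title": (values("title") or [name])[0],
        "author": "; ".join(values("creator")),
        "tags": [{"name": subject} for subject in values("subject")],
        "extras": extras,
    }
```

In a real deployment the resulting dictionary would be sent as JSON to the CKAN action API (`/api/3/action/package_create`) with an API key; that call is left out here so the sketch stays self-contained.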

5 CKAN extensions
ckanext-b2find (+ B2FIND facets, legal pages etc.)
ckanext-spatial (supported by CKAN, but had compatibility issues, now fixed)
ckanext-timeline (own development for 'Temporal coverage' on different time scales, which makes usability quite complex)
  Can it be added to the supported CKAN extensions, and how?
  Is there a commitment by CKAN to support and maintenance?
  Are others interested in further development of this extension?
ckanext-datesearch (PublicationYear)
Planned: support of more extensions, e.g.
  Use the potential of the semantic web/LOD (+ dcat, sparql, rdf)
  Recombinant ??, Kettle ??, …
  Improve web appearance (+ elastic search, …)
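A temporal-coverage search of the kind mentioned above usually comes down to an interval-overlap filter on the search index. The following is a minimal sketch under assumptions, not the code of ckanext-timeline or ckanext-datesearch: the field names `temporal_begin` and `temporal_end` and the helper names are hypothetical. It builds a Solr filter query that matches every dataset whose coverage interval overlaps the requested one.

```python
from datetime import datetime, timezone


def solr_date(dt):
    """Format a timezone-aware datetime in the UTC form Solr date fields expect."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def overlap_filter(start, end,
                   begin_field="temporal_begin",   # hypothetical field name
                   end_field="temporal_end"):      # hypothetical field name
    """Build a Solr filter query for 'coverage overlaps [start, end]'.

    Two intervals overlap when the dataset begins no later than the query
    ends and the dataset ends no earlier than the query begins.
    """
    if start > end:
        raise ValueError("start must not be after end")
    return (f"{begin_field}:[* TO {solr_date(end)}] AND "
            f"{end_field}:[{solr_date(start)} TO *]")
```

Very long time scales (for example geological periods) do not fit into calendar dates; for those, the same overlap condition could be applied to numeric fields instead.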

6 Issues
Scalability / performance (mostly Postgres related)
  Status: > records harvested
  Upload / indices (a re-index takes > 3 days!)
  Download / search (esp. when accessing the PG DB)
  Delete (purge!) datasets (often not removed completely from DB + SOLR)
Upgrade to newer CKAN versions
  Compatibility of CKAN extensions (spatial, temporal)
  Compatibility with own schema
Decouple upload and search
  Two SOLR indices (one 'read only', one 'write and update')?
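One way the idea of two SOLR indices could look is sketched below, purely as an assumption about a possible setup: searches go to one core, uploads and re-indexing go to the other, and after a re-index the two are exchanged with the Solr CoreAdmin `SWAP` action. The base URL, the core names, and the `SolrCorePair` class are hypothetical; the sketch only builds URLs and sends nothing.

```python
from urllib.parse import urlencode


class SolrCorePair:
    """Route reads and writes to two separate Solr cores (illustrative only)."""

    def __init__(self, base_url, read_core, write_core):
        self.base_url = base_url.rstrip("/")
        self.read_core = read_core
        self.write_core = write_core

    def select_url(self, query):
        # Searches only ever touch the read core.
        return f"{self.base_url}/{self.read_core}/select?{urlencode({'q': query})}"

    def update_url(self):
        # Uploads and re-indexing only ever touch the write core.
        return f"{self.base_url}/{self.write_core}/update"

    def swap_url(self):
        # CoreAdmin SWAP exchanges the two cores under their names, so the
        # read core name serves the freshly built index afterwards.
        params = {"action": "SWAP", "core": self.read_core, "other": self.write_core}
        return f"{self.base_url}/admin/cores?{urlencode(params)}"
```

Because the swap exchanges the cores behind fixed names, the search front end would not need to be reconfigured after a re-index; whether CKAN can be pointed at separate read and write cores is a question this sketch leaves open.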

7 Issues (cont.)
History: how to get rid of it (in Postgres)?
  Something like 'paster clean history'?
Support of taxonomies/hierarchies for facets (hierarchical tree of (sub-)disciplines)
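A hierarchical discipline facet can be approximated on top of flat facet values by storing each discipline as a path and rolling the counts up to every ancestor. The sketch below assumes a `/`-separated path convention and made-up discipline names; it is not an existing CKAN feature.

```python
from collections import Counter


def rollup_facet_counts(leaf_counts, sep="/"):
    """Add each facet count to the value itself and to all of its ancestors.

    leaf_counts maps a path such as 'Natural Sciences/Earth Sciences/Geology'
    to the number of datasets tagged with exactly that (sub-)discipline.
    """
    totals = Counter()
    for path, count in leaf_counts.items():
        parts = [part.strip() for part in path.split(sep)]
        # Walk from the top-level discipline down to the full path.
        for depth in range(1, len(parts) + 1):
            totals[sep.join(parts[:depth])] += count
    return dict(totals)
```

For example, three datasets under `Natural Sciences/Earth Sciences/Oceanography` and two under `Natural Sciences/Earth Sciences/Geology` would give a count of five for both `Natural Sciences/Earth Sciences` and `Natural Sciences`, which is what a collapsible facet tree needs to display at each level.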

8 Outlook
More records from more communities (will this scale to more than 1 or 10 million records?)
Use tools such as Kibana and Elasticsearch to provide statistics on the fly in a dashboard
Community customisation (switch between different SOLR cores and adapted search facets)
Further search/dissemination functionality: annotations, SRU interface

9 Links
Docs: https://eudat.eu/services/userdoc/b2find
Portal:
Sourcecode:
Support:
Contact:

