EUDAT & CKAN
Heinrich Widmann (widmann@dkrz.de)

Presentation transcript:

EUDAT
The project: European Data Infrastructure (EUDAT, http://eudat.eu)
Motivation:
- Manage the rising tide of research data
- Improve interoperability across a wide, cross-disciplinary scope
Objective:
- Build up a Collaborative Data Infrastructure based on common data services (https://eudat.eu/services), driven by the requirements of the research communities

B2FIND
The metadata service of EUDAT (info and docs: https://eudat.eu/services/b2find)
- Based on a comprehensive joint metadata catalogue of research data collections stored in EUDAT data centres and other (external) repositories
- Provides a powerful and user-friendly discovery portal (http://b2find.eudat.eu) on metadata covering a wide range of cross-discipline research communities

Used Technologies
- CentOS 6 (production instance)
- Modular ingestion workflow
  - Harvesting: OAI-PMH (JSON API etc. are supported as well)
  - Own mapping module (plus community-specific metadata schemas and ontologies, closed vocabularies, ...)
  - Upload to CKAN: common B2FIND metadata schema plus many additional facets (extra fields)
- Apache + Varnish 3 cache + CKAN version 2.2.3 with extensions
Question: CKAN itself could harvest OAI-PMH. Why is there a separate mapping and harvesting module?
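The harvest and mapping steps of such an ingestion workflow can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the B2FIND code: the function names, the sample record, and the extra-field keys ("Community", "PublicationYear") are made up for the example. Only the OAI-PMH request parameters, the Dublin Core namespaces, and the general shape of a CKAN dataset dict (name, title, notes, extras) are standard.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespaces defined by the OAI-PMH and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def listrecords_url(base_url, metadata_prefix="oai_dc", oai_set=None):
    """Harvesting step: build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if oai_set:
        params["set"] = oai_set
    return base_url + "?" + urlencode(params)


def map_record(record, community):
    """Mapping step: turn one oai_dc record into a CKAN-style dataset dict."""
    identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
    dc = record.find("oai:metadata/oai_dc:dc", NS)

    def first(tag):
        # First occurrence of a Dublin Core element, or "" if it is absent.
        return (dc.findtext("dc:" + tag, default="", namespaces=NS) or "").strip()

    return {
        # CKAN dataset names allow only lowercase alphanumerics, '-' and '_'.
        "name": re.sub(r"[^a-z0-9_-]+", "-", identifier.lower()).strip("-"),
        "title": first("title"),
        "notes": first("description"),
        # Extra fields carry the additional facets; these keys are illustrative.
        "extras": [
            {"key": "Community", "value": community},
            {"key": "PublicationYear", "value": first("date")[:4]},
        ],
    }


# A made-up record, shaped like one <record> of a ListRecords response.
SAMPLE = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header><identifier>oai:example.org:ABC/123</identifier></header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>Sample climate dataset</dc:title>
      <dc:description>Made-up record for illustration.</dc:description>
      <dc:date>2014-05-01</dc:date>
    </oai_dc:dc>
  </metadata>
</record>"""

dataset = map_record(ET.fromstring(SAMPLE), community="example")
print(listrecords_url("http://example.org/oai"))
print(dataset)
```

For the upload step, a dict like this would typically be sent as JSON to the CKAN action API (`/api/3/action/package_create`) with an API key. The network call is left out here so that the sketch runs offline.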

CKAN extensions
- ckanext-b2find (B2FIND facets, legal pages, etc.)
- ckanext-spatial (supported by CKAN, but with compatibility issues, now fixed)
- ckanext-timeline (own development for 'Temporal coverage' on different time scales, which makes usability quite complex)
  - (How) can it be added to the supported CKAN extensions?
  - Is there a 'commitment' by CKAN for support and maintenance?
  - Are others interested in further development of this extension?
- ckanext-datesearch (PublicationYear)
Planned:
- Support of more extensions, e.g. to use the potential of the semantic web / LOD (dcat, sparql, rdf); Recombinant ??, Kettle ??, ...
- Improve web appearance (elastic search, ...)

Issues
- Scalability / performance (mostly Postgres related)
  - Status: > 450000 records harvested
  - Upload / indices (a re-index takes > 3 days!)
  - Download / search (especially when accessing the PG DB)
  - Delete (purge!) datasets (often not removed completely from DB and SOLR)
- Upgrade to newer CKAN versions
  - Compatibility of CKAN extensions (spatial, temporal)
  - Compatibility with own schema
- Decouple upload and search
  - Two SOLR indices (one 'read only', one 'write and update')?
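One way the "two indices" idea is sometimes realised in Solr is to rebuild into a separate core and then exchange it with the live core through the CoreAdmin SWAP action. The sketch below only builds that request URL. The base URL and the core names are hypothetical, and whether this approach fits a CKAN deployment has not been verified here.

```python
from urllib.parse import urlencode


def swap_cores_url(solr_base, live_core, build_core):
    """Build a Solr CoreAdmin SWAP request that exchanges two cores."""
    query = urlencode({"action": "SWAP", "core": live_core, "other": build_core})
    return f"{solr_base.rstrip('/')}/admin/cores?{query}"


# Hypothetical names: searches hit 'b2find_read' while uploads fill 'b2find_write'.
url = swap_cores_url("http://localhost:8983/solr", "b2find_read", "b2find_write")
print(url)
```

After a long re-index into the write core has finished, issuing this request would make the fresh index the one that serves searches, so search availability would not depend on the duration of the upload.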

Issues (cont.)
- History: how to get rid of it (in Postgres)?
  - Something like 'paster clean history'?
- Support of taxonomies/hierarchies for facets (hierarchical tree of (sub-)disciplines)
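A hierarchical discipline facet could, for instance, be encoded as depth-prefixed path tokens, an approach often combined with Solr's `facet.prefix` parameter. The following is only a sketch of that encoding. The function names and the discipline names are illustrative assumptions and are not taken from B2FIND.

```python
def facet_tokens(path, sep="/"):
    """Expand a discipline path into one depth-prefixed token per tree level."""
    return [
        f"{depth}{sep}{sep.join(path[: depth + 1])}"
        for depth in range(len(path))
    ]


def child_prefix(path, sep="/"):
    """Prefix shared by the tokens of all direct children of the given node."""
    return f"{len(path)}{sep}{sep.join(path)}{sep}" if path else f"0{sep}"


# Illustrative (sub-)discipline path; each token would be indexed as a facet value.
tokens = facet_tokens(["Humanities", "Linguistics", "Phonetics"])
print(tokens)
print(child_prefix(["Humanities"]))
```

Indexing every token lets a portal show only the next level of the tree: the children of "Humanities" are exactly the facet values that start with the prefix returned by `child_prefix(["Humanities"])`.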

Outlook
- More records from more communities (will this scale to > 1 or 10 million records?)
- Use tools such as Kibana and Elasticsearch to provide statistics on the fly in a dashboard
- Community customisation (switch between different SOLR cores and adapted search facets)
- Further search/dissemination functionality: annotations, SRU interface

Links
- Docs: https://eudat.eu/services/userdoc/b2find
- Portal: http://b2find.eudat.eu
- Source code: https://github.com/EUDAT-B2FIND
- Support: https://eudat.eu/support-request
- Contact: widmann@dkrz.de