Heinrich Widmann (widmann@dkrz.de): EUDAT & CKAN

Similar presentations
Dr. Leo Obrst MITRE Information Semantics Information Discovery & Understanding Command & Control Center February 6, 2014.

1 Ontolog OOR Use Case Review Todd Schneider 1 April 2010 (v 1.2)
28 March 2003e-MapScholar: content management system The e-MapScholar Content Management System (CMS) David Medyckyj-Scott Project Director.
UKOLN is supported by: JISC Information Environment update Repositories and Preservation Programme meeting, October 24-25, 2006 Rachel Heery UKOLN
Collections and services in the information environment JISC Collection/Service Description Workshop, London, 11 July 2002 Pete Johnston UKOLN, University.
Supported by EU projects 12/12/2013 Athens, Greece Open Data in Agriculture Hands-on with data infrastructures that can power your agricultural data products.
The eXtensible Catalog’s Drupal Toolkit: a Discovery Interface to Address Users’ Needs Jennifer Bowen University of Rochester, Rochester, NY ALA LITA Drupal.
MEDIN Standards M. Charlesworth and the MEDIN Standards Working Group.
StatCat Building a Statistical Data Finder ssrs.yale.edu/statcat Steven Citron-Pousty Ann Green Julie Linden Yale University.
MINT – METADATA INTEROPERABILITY SERVICES Nikolaos Simou – National Technical University of Athens.
Repository Development Projects LeMill & Waramu Tallinn University Centre for Educational Technology Estonia.
DASISH Metadata Catalogue Binyam Gebrekidan Gebre, Stephanie Roth, Olof Olsson, Catharina Wasner, Matej Durco, Bartholemeus Worcslav, Przemyslaw Lenkiewicz,
University of North Texas Libraries Building Search Systems for Digital Library Collections Mark E. Phillips Texas Conference on Digital Libraries May.
Themes Architecture Content Metadata Interoperability Standards Knowledge Organisation Systems Use and Users Legal and Economic Issues The Future.
FI-CORE Data Context Media Management Chapter Release 4.1 & Sprint Review.
VO Sandpit, November 2009 CEDA Metadata Steve Donegan/Sam Pepler.
Workshop 1.4: ESPON Database ESPON Internal Seminar November 2011 Kraków,Poland ESPON M4D Project - LIG (Grenoble Computer Science Lab) Partner Jérôme.
A Systemic Approach for Effective Semantic Access to Cultural Content Ilianna Kollia, Vassilis Tzouvaras, Nasos Drosopoulos and George Stamou Presenter:
Core Integration Web Services Dean Krafft, Cornell University
A Fedora 3 to 4 Migration Case Study for UNSW Australia Library Fedora 4 Training Workshop, eResearch Australasia 2015, Brisbane UNSW Library Arif Shaon,
Oregon Spatial Data Library Enhancements BIENNIUM FIT PROPOSAL.
The Mint Mapping tool The MoRe aggregator Vassilis Tzouvaras, Dimitris Gavrilis National Technical University of Athens Digital Curation Unit - IMIS, Athena.
H. Widmann (M&D) Data Discovery and Processing within C3Grid GO-ESSP/LLNL / June, 19 th 2006 / 1 Data Discovery and Basic Processing within the German.
Find Research Data b2find.eudat.eu B2FIND User Training How to find data objects and collections using EUDAT’s B2FIND This work is licensed.
Store and exchange data with colleagues and team Synchronize multiple versions of data Ensure automatic desktop synchronization of large files B2DROP is.
B2find.eudat.eu EUDAT receives funding from the European Union's Horizon 2020 programme - DG CONNECT e-Infrastructures. Contract No
The Earth System Curator Metadata Infrastructure for Climate Modeling Rocky Dunlap Georgia Tech.
Data Sources & Using VIVO Data Visualizing Science VIVO provides network analysis and visualization tools to maximize the benefits afforded by the data.
SharePoint 101 – An Overview of SharePoint 2010, 2013 and Office 365
Sharing models as social objects through HydroShare
Customising Primo V3 for discovery of digital collections E-LUNA 2011 Annual Conference Milwaukee, WI – 13th May 2011 Stefania Riccardi Library Repository.
WIS and GCI/GEOSS interoperability project
VI-SEEM Data Discovery Service
Integrating Data for Archaeology
Flanders Marine Institute (VLIZ)
Data Services at CSC ©2016 OKM ATT initiative Licensed under Creative Commons BY 4.0.
INSPIRE Geoportal Thematic Views Application
Steering Group Member, Link Digital
Accessing a national digital library: an architecture for the UK DNER
Building Search Systems for Digital Library Collections
IP Publishing From IP Data Base to IP list to IP catalog
VI-SEEM Data Repository
Lifting Data Portals to the Web of Data
PNDS Architecture - an overview
Data Access and Re-use Carl Johan Håkansson EUDAT Service Area Manager
WISE and the future of WFD reporting
The Re3gistry software and the INSPIRE Registry
Experiences of the Digital Repository of Ireland
Introducing da|raSearchNet
B2FIND Integration and Usage
Search Relevancy in GEO Data Access Broker
EUDAT B2FIND A Cross-Discipline Metadata Service and Discovery Portal
NFFA Europe.
Enabling direct data access to social science research data
The Bodleian Libraries
Metadata, Ingest, and Data Feeds
Priority geospatial datasets for the European Commission
IS-ENES Cases Seven use cases are listed as data lifecycle steps A B C
Semantic Annotation service
Session 2: Metadata and Catalogues
Publishing data and metadata From iRODS to repositories
Disseminating Service Registry Records
BUILDING A DIGITAL REPOSITORY FOR LEARNING RESOURCES
Márton Németh – László Drótos How to catalogue a web archive?
Agro Hackathon Hack 5: Agro Portal and VEST Registry
Why IIIF? Shane Huddleston Jeff Mixter Dave Collins Product Manager
Metadata supported full-text search in a web archive
Australian and New Zealand Metadata Working Group
Presentation transcript:


EUDAT
- The project: European Data Infrastructure (EUDAT, http://eudat.eu)
- Motivation:
  - Manage the rising tide of research data
  - Improve interoperability across a wide cross-disciplinary scope
- Objective: Build up a Collaborative Data Infrastructure based on common data services (https://eudat.eu/services), driven by the requirements of the research communities

B2FIND
- The metadata service of EUDAT (info and documentation: https://eudat.eu/services/b2find)
- Based on a comprehensive joint metadata catalogue of research data collections stored in EUDAT data centres and other (external) repositories
- Provides a powerful and user-friendly discovery portal (http://b2find.eudat.eu) on metadata covering a wide range of cross-disciplinary research communities
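Because the portal runs on CKAN, a keyword search can in principle go through CKAN's standard Action API (`package_search` with its `q`, `rows` and `start` parameters). The following is a minimal sketch. It assumes the portal exposes that API under the usual `/api/3/action/` path. The helper names (`build_search_url`, `parse_search_response`, `search`) are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the portal exposes CKAN's standard Action API under /api/3/action/.
B2FIND_BASE = "http://b2find.eudat.eu"


def build_search_url(base: str, query: str, rows: int = 10, start: int = 0) -> str:
    """Build a CKAN package_search URL (q, rows, start are standard CKAN parameters)."""
    params = urlencode({"q": query, "rows": rows, "start": start})
    return f"{base.rstrip('/')}/api/3/action/package_search?{params}"


def parse_search_response(payload: dict) -> tuple[int, list[str]]:
    """Return the total hit count and the titles of the datasets on this page."""
    if not payload.get("success"):
        raise ValueError("CKAN reported an unsuccessful request")
    result = payload["result"]
    titles = [pkg.get("title", "") for pkg in result.get("results", [])]
    return result["count"], titles


def search(query: str, rows: int = 10) -> tuple[int, list[str]]:
    """Run a live query against the portal (requires network access)."""
    with urlopen(build_search_url(B2FIND_BASE, query, rows)) as resp:
        return parse_search_response(json.load(resp))
```

A call such as `search("ocean temperature", rows=5)` would return the hit count and the first titles. Parsing is kept separate from the network call so that it can be checked offline.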

Used Technologies
- CentOS 6 (production instance)
- Modular ingestion workflow:
  - Harvesting: OAI-PMH (but also support for JSON APIs etc.)
  - Own mapping module (+ community-specific metadata schemas and ontologies, closed vocabularies, …)
  - Upload to CKAN: common B2FIND metadata schema, many additional facets (extra fields)
- Apache + Varnish 3 cache + CKAN version 2.2.3 with extensions
=> CKAN itself could harvest OAI-PMH. Why is there a separate harvesting and mapping module?
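The harvesting and mapping steps of the ingestion workflow can be illustrated with a small Python sketch. This is not B2FIND's actual mapping module. It assumes plain Dublin Core (`oai_dc`) records in an OAI-PMH `ListRecords` response. The helper names and the extra-field keys (`Community`, `PublicationYear`) are hypothetical. The resulting dict uses the field names that CKAN's `package_create` action accepts (`name`, `title`, `notes`, `tags`, `extras`).

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

# Standard OAI-PMH / Dublin Core namespaces.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def parse_list_records(xml_text: str) -> tuple[list[dict], Optional[str]]:
    """Split one ListRecords response page into (records, resumptionToken)."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind("oai:ListRecords/oai:record", NS):
        header = rec.find("oai:header", NS)
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        # Deleted records have header status="deleted" and no metadata block.
        if header is None or header.get("status") == "deleted" or dc is None:
            continue
        fields: dict[str, list[str]] = {}
        for child in dc:
            name = child.tag.rsplit("}", 1)[-1]  # "{ns}title" -> "title"
            fields.setdefault(name, []).append((child.text or "").strip())
        records.append({"identifier": header.findtext("oai:identifier", namespaces=NS), "dc": fields})
    # An empty or missing token means this was the last page.
    token = root.findtext("oai:ListRecords/oai:resumptionToken", namespaces=NS)
    return records, (token or None)


def to_ckan_dataset(record: dict, community: str) -> dict:
    """Map one harvested record onto a CKAN package dict (hypothetical mapping)."""
    dc = record["dc"]
    # CKAN dataset names: lowercase alphanumerics, '-' and '_', at most 100 characters.
    name = re.sub(r"[^a-z0-9_-]", "-", record["identifier"].lower())[:100]
    extras = [{"key": "Community", "value": community}]
    year = dc.get("date", [""])[0][:4]
    if year.isdigit():
        extras.append({"key": "PublicationYear", "value": year})
    return {
        "name": name,
        "title": dc.get("title", [""])[0],
        "notes": "\n".join(dc.get("description", [])),
        # CKAN restricts tag characters too; real subjects may need cleaning.
        "tags": [{"name": s} for s in dc.get("subject", []) if s],
        "extras": extras,
    }
```

A harvester would repeat the `ListRecords` request with the returned `resumptionToken` until it comes back empty. Keeping the mapping as a separate function makes it possible to plug in community-specific schemas and vocabularies without touching the harvesting code.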

CKAN extensions
- ckanext-b2find (+ B2FIND facets, legal pages, etc.)
- ckanext-spatial (supported by CKAN!, but with compatibility issues (fixed))
- ckanext-timeline (own development for 'Temporal coverage' on different time scales => makes usability quite complex)
  => (How) can it be added to the supported CKAN extensions? A 'commitment' by CKAN for support and maintenance? Are others interested in developing this extension further?
- ckanext-datesearch (PublicationYear)
- Planned: support of more extensions, e.g.
  - Use the potential of the semantic web/LOD (+ DCAT, SPARQL, RDF)
  - Recombinant ??, Kettle ??, …
  - Improve web appearance (+ Elasticsearch, …)
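A temporal-coverage search like the one ckanext-timeline offers can be reduced to a Solr interval-overlap filter once all time scales are normalised to one numeric axis. The sketch below is not the extension's implementation. The Solr field names `TempCoverageBegin`/`TempCoverageEnd` and the "ka BP" normalisation helper are assumptions chosen for illustration. The returned string has the form of a Solr filter query, so it could be passed as the `fq` parameter of CKAN's `package_search`.

```python
# Hypothetical Solr field names; B2FIND's real index schema may differ.
BEGIN_FIELD = "TempCoverageBegin"
END_FIELD = "TempCoverageEnd"


def ka_bp_to_year(ka_bp: float) -> int:
    """Convert 'thousand years before present' (present = 1950) to a signed year number."""
    return round(1950 - ka_bp * 1000)


def range_clause(field: str, lo=None, hi=None) -> str:
    """Solr range clause; a missing bound becomes the open-ended '*'."""
    lo_s = "*" if lo is None else str(lo)
    hi_s = "*" if hi is None else str(hi)
    return f"{field}:[{lo_s} TO {hi_s}]"


def temporal_overlap_fq(q_begin: int, q_end: int) -> str:
    """Filter for datasets whose coverage [begin, end] overlaps [q_begin, q_end].

    Two closed intervals overlap iff begin <= q_end and end >= q_begin.
    """
    if q_begin > q_end:
        raise ValueError("q_begin must not be later than q_end")
    return f"{range_clause(BEGIN_FIELD, hi=q_end)} AND {range_clause(END_FIELD, lo=q_begin)}"
```

The same `range_clause` helper would cover a simple publication-year filter such as `range_clause("PublicationYear", 2000, 2010)`. The overlap form is needed only because a temporal coverage is an interval rather than a single date.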

Issues
- Scalability / performance (mostly PostgreSQL-related)
  - Status: > 450000 records harvested
  - Upload / indexing (a re-index takes > 3 days!)
  - Download / search (especially when accessing the PostgreSQL DB)
  - Deleting (purging!) datasets (often not removed completely from DB + Solr)
- Upgrade to newer CKAN versions
  - Compatibility of CKAN extensions (spatial, temporal)
  - Compatibility with own schema
- Decouple upload and search
  - Two Solr indices (one 'read-only', one 'write and update')?
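The open question about two Solr indices resembles a common pattern for standalone Solr. The index is rebuilt into a second core while the first core keeps serving searches, and the two cores are then exchanged with the CoreAdmin `SWAP` action. The sketch below only builds and sends that request. The core names, the base URL, and the idea of running the rebuild through a second CKAN config whose `solr_url` points at the build core are assumptions, not B2FIND's setup.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical setup: CKAN's solr_url points at core "b2find_live"; a second core
# "b2find_build" receives the slow multi-day re-index in the background.
SOLR_BASE = "http://localhost:8983/solr"


def core_admin_url(base: str, action: str, **params: str) -> str:
    """URL for a Solr CoreAdmin request (standalone, non-SolrCloud mode)."""
    query = urlencode({"action": action, **params})
    return f"{base.rstrip('/')}/admin/cores?{query}"


def swap_url(base: str, live: str, build: str) -> str:
    # SWAP atomically exchanges the names of the two cores: afterwards queries
    # against `live` hit the freshly built index and `build` holds the old one.
    return core_admin_url(base, "SWAP", core=live, other=build)


def swap_cores(base: str = SOLR_BASE, live: str = "b2find_live", build: str = "b2find_build") -> int:
    """Perform the swap (requires a running standalone Solr); returns the HTTP status."""
    with urlopen(swap_url(base, live, build)) as resp:
        return resp.status
```

Updates that reach the live core during a multi-day rebuild are missing from the new index, so they would have to be re-applied after the swap. In SolrCloud, collection aliases would play the role that core swapping plays here.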

Issues (cont.)
- History: how to get rid of it (in PostgreSQL)? Something like 'paster clean history'?
- Support of taxonomies/hierarchies for facets (hierarchical tree of (sub-)disciplines)
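One common way to get hierarchical facets out of a flat Solr facet field works in two parts. Every ancestor of a discipline is indexed as a depth-prefixed path token, and the user drills down with Solr's `facet.prefix` parameter. This is a generic technique, not necessarily what B2FIND adopted. The helper names and the example discipline labels are hypothetical.

```python
SEP = "/"


def hierarchy_tokens(path: list[str]) -> list[str]:
    """Index tokens for one discipline path, e.g. ["A", "B"] -> ["0/A", "1/A/B"]."""
    if any(SEP in label for label in path):
        raise ValueError(f"labels must not contain {SEP!r}")
    return [f"{depth}{SEP}{SEP.join(path[: depth + 1])}" for depth in range(len(path))]


def child_prefix(parent: list[str]) -> str:
    """facet.prefix value selecting only the direct children of `parent`."""
    return f"{len(parent)}{SEP}" + "".join(f"{label}{SEP}" for label in parent)


def display_label(token: str) -> str:
    """Last path element, shown to the user as the facet value."""
    return token.rsplit(SEP, 1)[-1]
```

The depth prefix is what keeps grandchildren out of a drill-down. For example, `child_prefix(["Natural Sciences"])` yields `1/Natural Sciences/`, which matches `1/Natural Sciences/Physics` but not `2/Natural Sciences/Physics/Optics`.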

Outlook
- More records from more communities (will this scale to > 1 or 10 million records?)
- Use tools such as Kibana and Elasticsearch to provide statistics on the fly in a dashboard
- Community customisation (switch between different Solr cores and adapted search facets)
- Further search/dissemination functionality: annotations, SRU interface
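A back-of-the-envelope estimate puts the scaling question into numbers. It takes the figures from the Issues slide at face value (about 450000 records and a re-index of about 3 days). It also assumes, purely for illustration, that re-index time grows linearly with the number of records.

```python
# Reference point taken from the "Issues" slide: roughly 450000 records and a
# re-index lasting about 3 days (both are stated there as lower bounds).
REF_RECORDS = 450_000
REF_DAYS = 3.0


def reindex_days(n_records: int, ref_records: int = REF_RECORDS, ref_days: float = REF_DAYS) -> float:
    """Linear extrapolation: assumes constant indexing throughput."""
    return n_records * ref_days / ref_records


for n in (1_000_000, 10_000_000):
    print(f"{n:>12,} records -> about {reindex_days(n):.1f} days")
```

Under this assumption, 1 million records would need roughly a week per full re-index, and 10 million records more than two months. Any super-linear cost would make these figures worse.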

Links
- Docs: https://eudat.eu/services/userdoc/b2find
- Portal: http://b2find.eudat.eu
- Source code: https://github.com/EUDAT-B2FIND
- Support: https://eudat.eu/support-request
- Contact: widmann@dkrz.de