Data Services at CSC
©2016 OKM ATT 2014-2017 initiative, www.openscience.fi
Licensed under Creative Commons BY 4.0
CSC and Customers
Services: Computing Services; Research Information Management Services; Funet Network Services; Education Management and Student Administration Services; Identity and Access Management Services; Datacenter and Capacity Services (IaaS); Training Services; Consultation and Tailored Solutions
Customers: Ministry of Education and Culture; other ministries and state administration; higher education institutions; research institutions; companies
Data Service Portfolio
Data services for open science (details later)
HPC archive: default quota of 20 GB to 5 TB per user
iRODS (see IDA)
Cloud (ePouta, cPouta): ePouta is secure and private for organizations; cPouta is general-purpose IaaS (includes FGCI resources)
Databases for HPC and others
HPC data analysis: off-the-shelf and tailored services
EUDAT: pan-European research data infrastructure, training and consultancy; B2DROP*, B2Share*, B2Safe, B2Stage, B2Find*; coordinated by CSC
Open Science Services Development: Policy Work
E.g. requirements for higher education institutions
Framework architecture
A target-level description of open science and research processes, services, information, data structures, actors, roles and information system services
Defines the framework for national common solutions and components, and for data management, information system and local service design and implementation
Finished in 2015, put into practice in 2016
Long-term preservation model for research data
Recognize the most important or unique research output
Ensure linkages between publications, data and methods
Make services easy to use, efficient and adaptable
Enable organizations to easily adopt the services in their own operations
Data Services for Open Science
Etsin research data finder
IDA research data storage service
AVAA open data publishing portal
PAS digital preservation solution
Tuuli data management planning tool
Research infrastructure databank
Data lifecycle and services (diagram)
Lifecycle stages: data planning, data search, data analysis, data storage, data sharing, data reuse
Also shown: Open science & research handbook, PAS
Current architecture (diagram)
Components: AVAA (Liferay, AVAA sites), IDA (SUI, My files, IDA Davis), iRODS, Etsin (Apache CKAN), Reetta, REMS; IDA-AVAA download, IDA-REMS download
Users and clients: anyone, Haka users and IDA users, via browser, API, folder or command line
Protocols: http, https, WebDAVS, irods, SQL, OAI-PMH, RESTful API
IDA research data storage service
IDA research data storage
Offered since 2012 for projects in Finnish universities, universities of applied sciences and the Academy of Finland
Organizational usage quotas vary with organization size, from 30 TB to 1260 TB
Open-source iRODS technology provides secure storage with data replication
openscience.fi/ida
IDA research data storage
Currently 130 projects, 500 registered users, 19 million data files, 470 TB used
The data owner decides on openness and use policy
Diagram labels: producer, user, data, metadata, metadata catalogue and open data portal, research organization's service
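The slide states only that the data owner decides on openness and use policy. As a minimal sketch of how an owner-chosen policy could be enforced at download time, assume four hypothetical openness levels; `Openness` and `may_download` are illustrative names, not IDA's actual access model.

```python
from enum import Enum
from typing import Optional, Set


class Openness(Enum):
    """Hypothetical openness levels a data owner could choose from."""
    OPEN = "open"              # anyone, including anonymous users
    REGISTERED = "registered"  # any authenticated user (e.g. via a federated login)
    RESTRICTED = "restricted"  # only users the owner has approved
    CLOSED = "closed"          # project members only


def may_download(openness: Openness, user: Optional[str],
                 project_members: Set[str], approved_users: Set[str]) -> bool:
    """Return True if `user` may download; `user` is None for anonymous access."""
    if user is not None and user in project_members:
        return True  # project members always keep access to their own data
    if openness is Openness.OPEN:
        return True
    if user is None:
        return False  # every other level requires authentication
    if openness is Openness.REGISTERED:
        return True
    if openness is Openness.RESTRICTED:
        return user in approved_users
    return False  # CLOSED: outsiders are refused
```

Checking project membership first keeps the owner's own access independent of the openness level they publish with.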
Plans for the new IDA
Needs:
Everyday storage and sharing
A medium-term (~10 years) preservation buffer for PAS (long-term)
Data lifecycle support: storage, "freezing" and hand-over
Centralized metadata management: registering data in an external metadata resource; linking files to datasets, and datasets to storage packages
Access management improvements: roles, organizations
Upgrades are planned at multiple layers: software (iRODS vs. OwnCloud?), storage solution (scale-out) and system architecture
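The lifecycle and linkage needs above (files grouped into datasets, datasets "frozen" and then packaged for hand-over) can be sketched as a small state machine. This is a minimal illustration under assumed names (`Dataset`, `StoragePackage`, `State`); it is not the planned IDA design.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    """Hypothetical lifecycle states mirroring storage, freezing and hand-over."""
    OPEN = "open"                # files can still be added
    FROZEN = "frozen"            # contents fixed; eligible for packaging
    HANDED_OVER = "handed_over"  # transferred onward, e.g. to long-term preservation


# Data only moves forward through the lifecycle.
TRANSITIONS = {
    State.OPEN: {State.FROZEN},
    State.FROZEN: {State.HANDED_OVER},
    State.HANDED_OVER: set(),
}


@dataclass
class Dataset:
    dataset_id: str                                 # ID registered in an external metadata resource
    files: list[str] = field(default_factory=list)  # storage paths of member files
    state: State = State.OPEN

    def add_file(self, path: str) -> None:
        if self.state is not State.OPEN:
            raise ValueError(f"dataset {self.dataset_id} is {self.state.value}; cannot add files")
        self.files.append(path)

    def transition(self, new_state: State) -> None:
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"cannot go from {self.state.value} to {new_state.value}")
        self.state = new_state


@dataclass
class StoragePackage:
    package_id: str
    datasets: list[Dataset]

    @classmethod
    def from_frozen(cls, package_id: str, datasets: list[Dataset]) -> "StoragePackage":
        # Only frozen datasets may be packaged, so package contents cannot change later.
        for ds in datasets:
            if ds.state is not State.FROZEN:
                raise ValueError(f"dataset {ds.dataset_id} is not frozen")
        return cls(package_id, datasets)
```

Making the transition table explicit keeps the "freezing" rule in one place instead of scattering state checks across the code.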
Etsin research data finder
Etsin research data finder
National metadata catalogue for research data
Adheres to the national metadata model
URN PIDs are assigned; other identifiers are also supported
Currently 9000+ dataset metadata entries published
etsin.avointiede.fi
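Because Etsin assigns URN persistent identifiers, any catalogue or harvester that consumes them has to recognize and compare URNs. The sketch below checks a simplified form of the generic URN syntax from RFC 8141 and normalizes its case-insensitive parts. It is an assumption-laden illustration: it ignores the optional r-, q- and f-components, and the identifier in the usage line is hypothetical.

```python
import re

# Simplified RFC 8141 syntax: "urn:" NID ":" NSS.
# Optional ?+, ?= and # components are not supported and are rejected.
_PCHAR = r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})"
_URN_RE = re.compile(
    r"^urn:(?P<nid>[A-Za-z0-9][A-Za-z0-9-]{0,30}[A-Za-z0-9]):"
    r"(?P<nss>" + _PCHAR + r"(?:" + _PCHAR + r"|/)*)$",
    re.IGNORECASE,
)


def is_urn(identifier: str) -> bool:
    """Return True if the string is syntactically a URN (simplified check)."""
    return _URN_RE.match(identifier) is not None


def normalize_urn(identifier: str) -> str:
    """Lower-case the 'urn' prefix and NID and upper-case %-escapes.

    The rest of the namespace-specific string stays case-sensitive.
    """
    m = _URN_RE.match(identifier)
    if m is None:
        raise ValueError(f"not a URN: {identifier!r}")
    nss = re.sub(r"%[0-9A-Fa-f]{2}", lambda e: e.group(0).upper(), m.group("nss"))
    return f"urn:{m.group('nid').lower()}:{nss}"
```

For example, `is_urn("urn:nbn:fi:example-0001")` returns `True`, and `normalize_urn("URN:NBN:fi:a%2fb")` returns `"urn:nbn:fi:a%2Fb"`, so two spellings of the same identifier compare equal after normalization.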
Etsin research data finder
Built as an extension of the CKAN data portal and the Solr search engine
Harvests DDI and OAI-PMH metadata from outside sources
Recent work: UX improvements, newly harvested datasets, plans for integration with research organizations' catalogues
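OAI-PMH harvesting works through repeated `ListRecords` requests, following the `resumptionToken` returned with each page until an empty token marks the last page. A minimal sketch using only the Python standard library, assuming Dublin Core (`oai_dc`) records and a caller-supplied `fetch` function so that the network layer stays out of the example. DDI handling and OAI-PMH error responses are not covered, and this is not Etsin's actual harvester.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional, Tuple
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url: str, token: Optional[str] = None, prefix: str = "oai_dc") -> str:
    """Build a ListRecords request URL.

    resumptionToken is an exclusive argument in OAI-PMH, so follow-up
    requests carry only the verb and the token.
    """
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text: str) -> Tuple[List[str], Optional[str]]:
    """Return (identifiers of non-deleted records, next resumption token or None)."""
    root = ET.fromstring(xml_text)
    ids = []
    for header in root.iterfind(".//oai:record/oai:header", OAI_NS):
        if header.get("status") == "deleted":
            continue  # deleted records are reported with a header only
        ids.append(header.findtext("oai:identifier", namespaces=OAI_NS))
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = (token_el.text or "").strip() if token_el is not None else ""
    return ids, token or None  # an empty or absent token marks the last page


def harvest(base_url: str, fetch: Callable[[str], str]) -> List[str]:
    """Follow resumption tokens until the repository reports no more pages."""
    all_ids: List[str] = []
    token: Optional[str] = None
    while True:
        ids, token = parse_list_records(fetch(list_records_url(base_url, token)))
        all_ids.extend(ids)
        if token is None:
            return all_ids
```

Injecting `fetch` keeps paging logic testable offline; in practice it could wrap `urllib.request.urlopen` with retries and rate limiting.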
AVAA open data publishing portal
AVAA open data publishing platform
For producers and users of open data since 2013
Pilot cases of research data and access tools developed together with researchers
Open data from IDA
Roughly 3000 users yearly, 10 million+ API requests
avaa.tdata.fi
AVAA open data publishing platform
Applications and interfaces for data download, analysis and visualization
Applications developed as open source: github.com/avaa-csc/
PAS digital preservation solution
Layers in data storage and discovery
Managing status (is data integrity intact? is data available?)
Managing location (where is the data?)
Managing roles (who owns rights to the data? who is responsible for sustainability?)
Managing risks (how to keep data discoverable and usable? what actions are needed?)
Source: McDonald 2008
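The status layer (is data integrity intact? is data available?) is commonly handled with fixity checks: store a checksum per file and periodically recompute it. A minimal sketch, assuming a manifest that maps relative paths to SHA-256 digests; the manifest format and function names are hypothetical, not the actual PAS procedure.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large research data files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare files under `root` with a manifest of relative path -> expected SHA-256.

    "missing" answers the availability question; "corrupted" answers the
    integrity question.
    """
    report: dict[str, list[str]] = {"missing": [], "corrupted": []}
    for rel_path, expected in sorted(manifest.items()):
        path = root / rel_path
        if not path.is_file():
            report["missing"].append(rel_path)
        elif sha256_of(path) != expected:
            report["corrupted"].append(rel_path)
    return report
```

With replicated storage, the same check can run against each replica, and a corrupted copy can be repaired from one that still matches the manifest.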
PAS digital preservation solution
PAS infrastructure operational
National Digital Library digital preservation (KDK-PAS) in production since 5 November 2015
Under the administration of the Ministry of Education and Culture
Preserving cultural heritage
ISO 27001 audited service
Research Data PAS
Same infrastructure as KDK-PAS
Preservation model published in December 2015
At the piloting phase
Moving to production in stages starting in 2017
dmpTuuli data management planning tool
dmpTuuli data management planning tool
What: a data management planning (DMP) tool for Finnish research organizations
How: a collaborative project with a user-driven approach
Why: a DMP is an integral part of good research practice and helps ensure research integrity and quality
Where: www.dmptuuli.fi
When: piloting with national funders in 2016
dmpTuuli
A data management plan (DMP) will help you manage your data, meet funder requirements and, if the data is shared, help others use it. DMPTuuli will help you write data management plans.
DMPTuuli is provided by the Finnish Tuuli project. The project has worked closely with researchers and research funders to produce guidance and templates that help researchers write an effective DMP covering the whole lifecycle of a project, from bid preparation through to completion.
DMPTuuli is based on the DMPonline code, developed by the UK's Digital Curation Centre.
Data management plan
A living document: updatable and reviewable
Create your data management plan early and review it regularly throughout the research project
Describes:
what data will be collected and how
how your data will be used and stored
how to enable reuse of your data after the project
Covers issues concerning:
Responsibilities
Data ownership and licensing
Costs
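The topics a DMP must cover, together with its nature as a living document, can be modelled as a checklist with a revision history. A minimal sketch with hypothetical section keys derived from this slide; they are not the DMPTuuli or DMPonline template fields.

```python
from dataclasses import dataclass, field
from datetime import date

# Topics named on the slide; the key names are illustrative only.
REQUIRED_SECTIONS = (
    "data_collected",
    "usage_and_storage",
    "reuse_after_project",
    "responsibilities",
    "ownership_and_licensing",
    "costs",
)


@dataclass
class DataManagementPlan:
    project: str
    sections: dict[str, str] = field(default_factory=dict)
    revisions: list[date] = field(default_factory=list)

    def update(self, section: str, text: str, on: date) -> None:
        """Record an edit; each update is a review point of the living document."""
        if section not in REQUIRED_SECTIONS:
            raise KeyError(f"unknown section: {section}")
        self.sections[section] = text
        self.revisions.append(on)

    def missing_sections(self) -> list[str]:
        """Sections still empty, in the order the plan should cover them."""
        return [s for s in REQUIRED_SECTIONS if not self.sections.get(s, "").strip()]
```

Calling `missing_sections()` at each review shows which parts of the plan still need attention as the project progresses.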
Research infrastructure databank
Research infrastructure databank
Unified descriptions of research infrastructures (RIs) and their services
Promotes openness and sharing
Centralized and easily updatable
For researchers, RI service providers and funders
infras.openscience.fi
RI databank: features in development
PIDs for RIs
Open API
Updates through harvesting
Linking data: publications, data, funding, projects, organizations, resources, etc.
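Linking RIs to publications, data, funding and other entities amounts to a graph keyed by persistent identifiers. A minimal sketch of such a link store with a bounded breadth-first lookup; the PID strings, entity kinds and the `LinkGraph` class are hypothetical and do not describe the databank's actual API.

```python
from collections import defaultdict, deque


class LinkGraph:
    """Hypothetical link store: entities are keyed by PID-like strings."""

    def __init__(self) -> None:
        self.kind: dict[str, str] = {}  # PID -> entity kind, e.g. "dataset"
        self.edges: dict[str, set[str]] = defaultdict(set)

    def add(self, pid: str, kind: str) -> None:
        self.kind[pid] = kind

    def link(self, a: str, b: str) -> None:
        # Store links in both directions so any entity can be a starting point.
        self.edges[a].add(b)
        self.edges[b].add(a)

    def related(self, pid: str, max_hops: int = 2) -> dict[str, list[str]]:
        """Breadth-first search up to max_hops; returns related PIDs grouped by kind."""
        seen = {pid}
        queue = deque([(pid, 0)])
        found: dict[str, list[str]] = defaultdict(list)
        while queue:
            current, hops = queue.popleft()
            if hops == max_hops:
                continue  # do not expand beyond the hop limit
            for nxt in sorted(self.edges[current]):
                if nxt not in seen:
                    seen.add(nxt)
                    found[self.kind.get(nxt, "unknown")].append(nxt)
                    queue.append((nxt, hops + 1))
        return dict(found)
```

Bounding the number of hops keeps a lookup such as "what has this RI produced?" from expanding into the entire linked graph.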
Common challenges
Common Challenges
Metadata management and creation, metadata reserve
Levels of abstraction in research data management: file vs. dataset
Researcher vs. organization, hand-over
Roles of funders
International data
Thank you!