Data Archiving and Networked Services

Presentation transcript:

Data Archiving and Networked Services. Research data management, data archiving and data dissemination. The roles of a Trustworthy Digital Repository (TDR): The case of DANS. Kees Waterman, Data Archiving and Networked Services. Séminaire DRTD-SHS, "Les données de la recherche dans les humanités numériques" (Research data in the digital humanities), Day 3: Mastering technologies to add value to data. Lille, 21 April 2015.

Topics for today: DANS; Trusted Digital Repository; Research Data Management (RDM); Depositing data (requirements, possibilities); Data reuse; Expanding Trust (projects, infrastructures, certification).

Research Data Lifecycle http://www.data-archive.ac.uk/create-manage/life-cycle

DANS? First predecessor dates back to 1964 (Steinmetz Foundation), followed by the Historical Data Archive (1989). Institute of the Dutch Academy of Sciences and the National Funding Organisation (KNAW & NWO) since 2005. Mission: promote and provide permanent access to digital research information.

DANS core services. EASY: Electronic Archiving System for self-deposit (long-term data archive). Dutch Dataverse Network (as of 5/2014): short- and mid-term storage (current research). NARCIS: gateway to scholarly information in the Netherlands (national information portal to data + e-pubs + researchers + projects).

Additional services: Persistent Identifier URN:NBN resolver; common metadata harvester; Training & Consultancy.

EASY: Electronic Archiving SYstem. 30,000 datasets archived. Metadata in 6 flavours/varieties. http://easy.dans.knaw.nl

Geodata in data archive

Persistent Identifier

Dublin Core Metadata Hyperlinks possible to other (geo)data and publications

Depositing data (1): Basic requirements. Sufficient metadata (Dublin Core); research documentation: question, methodology, variables (code book!), questionnaires, surveys (consent forms!); anonymize your data!; data in preferred / accepted formats.
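To make the "sufficient metadata (Dublin Core)" requirement concrete, here is a minimal sketch of how a depositor might assemble and check a simple Dublin Core record before submission. Only the namespace URI is the standard Dublin Core elements namespace. The `record` wrapper element, the list of required fields, the helper names and the example values are all assumptions made for illustration, and none of them describes what EASY actually requires.

```python
import xml.etree.ElementTree as ET

# Standard namespace of the 15 simple Dublin Core elements.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical minimum set of fields; an archive's real rules may differ.
REQUIRED = ("title", "creator", "date", "description", "rights")


def missing_fields(fields, required=REQUIRED):
    """Return the required element names that have no value."""
    return [name for name in required if not fields.get(name)]


def build_dc_record(fields):
    """Build an XML element with one dc:* child per value."""
    ET.register_namespace("dc", DC_NS)
    record = ET.Element("record")  # wrapper element name is an assumption
    for name, values in fields.items():
        for value in values:
            child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
            child.text = value
    return record


# Made-up example values, not a real dataset.
example = {
    "title": ["Example survey dataset"],
    "creator": ["Doe, J."],
    "date": ["2015-04-21"],
    "description": ["Hypothetical survey data with code book."],
    "rights": ["Open access"],
}

xml_text = ET.tostring(build_dc_record(example), encoding="unicode")
print(missing_fields(example))
print(xml_text)
```

Checking for empty fields before building the record keeps the "is the metadata sufficient?" question separate from the serialization step.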

Depositing data (2): Access and agreements. Persistent Identifier; 4 access categories; embargo is an option; license and user agreement. Data processing: privacy; using established protocol [OAIS / TDR]. Publishing.

So… are we successful ?

Datasets in EASY, by year (deposit) November 2010: 1 million data files March 2013: 2 million data files (25,000 data sets)

Datasets in EASY, by size: 1.8% of datasets > 2 GB; 2.8% of datasets > 1 GB. The long tail of research data.

Reuse of datasets EASY 2005-2014

2012

2014

Datasets, by type of access

OAIS – Open Archival Information System SIP = Submission Information Package AIP = Archival Information Package DIP = Dissemination Information Package
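The three OAIS package types can be read as stages in a flow: a depositor submits a SIP, the archive turns it into an AIP for preservation, and users receive a DIP derived from that AIP. The sketch below models only that flow. The class fields, the `ingest` and `disseminate` helpers, the event log and the example identifier are all hypothetical, and the OAIS reference model itself defines far more than is shown here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SIP:
    """Submission Information Package: what the depositor hands over."""
    files: tuple
    metadata: dict


@dataclass
class AIP:
    """Archival Information Package: what the archive preserves."""
    identifier: str
    files: tuple
    metadata: dict
    events: list = field(default_factory=list)  # illustrative audit trail


@dataclass
class DIP:
    """Dissemination Information Package: what a user receives."""
    identifier: str
    files: tuple
    metadata: dict


def ingest(sip: SIP, identifier: str) -> AIP:
    """Turn a submission into an archival package with an identifier."""
    return AIP(identifier, tuple(sip.files), dict(sip.metadata), ["ingested"])


def disseminate(aip: AIP, wanted: Optional[set] = None) -> DIP:
    """Derive a user-facing package, optionally limited to some files."""
    if wanted is None:
        files = aip.files
    else:
        files = tuple(f for f in aip.files if f in wanted)
    return DIP(aip.identifier, files, dict(aip.metadata))


# Made-up example: file names and identifier are placeholders.
sip = SIP(("survey.csv", "codebook.pdf"), {"title": "Example survey"})
aip = ingest(sip, "example-id-001")
dip = disseminate(aip, {"survey.csv"})
print(dip)
```

Keeping the three packages as separate types makes it explicit that what is submitted, what is stored and what is delivered need not be identical.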

1 DSA in France, CINES www.cines.fr Centre Informatique National de l’Enseignement Supérieur

1 CESSDA centre in France, Réseau Quetelet www.reseau-quetelet.cnrs.fr Consortium of European Social Science Data Archives

1 CESSDA centre in France, Réseau Quetelet Membership and types of data

Additional roles and responsibilities. As an organization: publish and maintain Preservation Policy; support RDM (dataverse, brochure for DMP*); data archive certified. As partner in the Dutch association RDNL: actively develop a federated data infrastructure; support RDM (training); build relationships with universities etc. ('front office-back office' model [data supporters]). *http://www.dans.knaw.nl/en/deposit/information-about-depositing-data/DANSdatamanagementplanUK.pdf

Additional roles and responsibilities: participate in Dutch and international projects on Research Infrastructures. surveydata.nl: longitudinal research; EHRI: Holocaust studies; CARARE: visualization & presentation; ARIADNE: metadata standardization; EUDAT: European data services.

Additional roles and responsibilities: develop national e-depots for specific disciplines (EDNA: archaeology). DANS promotes and sustains the certification of repositories ('e-depots') as a contribution to the further development of TRUST in sustainable archiving and consultation of 'data'.

[Timeline figure, 2004 to 2014+: retrospective archiving of datasets; ongoing archiving and publication of datasets; 'grey' literature scanning project; national regulation in KNA (Quality Norm Archaeology); embedding in DANS EASY; Odyssee project; CARARE project; ARIADNE project.] http://www.dans.knaw.nl http://www.edna.nl https://easy.dans.knaw.nl

2004-present. NWO-funded Odyssee project (2009-2010) => project database for undeveloped research. Improving accessibility through various projects. Ongoing archiving and publication of datasets.

1. CARARE plot (Europeana)

Challenges in sharing data Technical challenges: standardisation, accessibility, discoverability, preservation and curation, data security, etc. Legal and ethical challenges Cultural challenges

Why not share? Those data are mine! Others might discredit my findings. I am still analyzing the data. I cannot trust the data produced somewhere else.

Riding the Wave (10/2010) 1st recommendation: Develop an international framework for a Collaborative Data Infrastructure

Federated data-infrastructure in the Netherlands

Services in the FOBO (front office-back office) model: information and awareness raising on data management and curation; training (data experts at FOs and researchers); storage (during and after the research).

Back office: roles and responsibilities. Focus on expertise and long-term storage. Expertise and innovation in the area of data curation, data management and re-use of data. Providing expertise to the research community: contributions to FO training courses for researchers. Providing expertise to the front office: training courses for data experts, consultancy, contact persons. Long-term preservation of data in a trustworthy digital repository (certification; at least Data Seal of Approval).

Challenges: expanding the model over all universities (institutional agreements); creating one single back office "desk" (RDNL); creating a technical infrastructure for automatic data ingest; developing a business model to cover the costs.

RDNL. National and international funders are developing open data policies. Researchers must be able to comply with these policies; this requires awareness raising, RDM training, and high-quality, reliable storage facilities. A federated data infrastructure enables stakeholders to collaborate, coordinate activities and divide roles and tasks based on their specific expertise (economies of scale, selection of data). The Netherlands is a small country, and the total picture is far more complex: international and discipline-specific infrastructures.

Long-term preservation for a specific sub-discipline: surveydata.nl

EASY for long-term preservation

Layer 1 Search portal Dedicated Questasy servers Layer 2 Archive for long-term preservation Layer 3

The federated data infrastructure: a collaborative framework. [Diagram labels: Trust; Data Curation; User functions: data capture and transfer; Data Generators; Data Users; Front offices: Local Data Facilities (University Libraries), Domain-Specific Research Infrastructures; Community Support Services; Back Offices: DANS, 3TU.Datacentrum, …; Common Data Services: Archiving, Access, …; Basic Technical Infrastructure: SURFsara, Target, …; Common Data Services: Storage, Backups, …]

Trust comes on foot, but leaves on horseback. Trust in research data: trust is at the very heart of storing and sharing data. Trust involves: data creators; data users; data repositories; funders. This holds for a number of stakeholders, each for different reasons. The users of data from a digital repository have questions like: Have the data been preserved properly? Are they of high quality? Have they been changed in some way? Does the pointer get me to the right object? The depositors of data want to be sure that in the digital repositories their data are safe and remain accessible, usable and meaningful over time. Finally, the funders want the reassurance that their investment in the production of valuable research data is not wasted but will also remain valuable in the future; re-use of data will give them a higher return on their investment.
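One of the user questions above, "Have they been changed in some way?", is commonly answered with a fixity check: the repository records a checksum at deposit time and recomputes it later. The sketch below illustrates that general technique with SHA-256 from the Python standard library. The talk does not say which method DANS uses, so the algorithm choice, the function names and the sample bytes are assumptions.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 checksum of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Checksum a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(data: bytes, recorded_checksum: str) -> bool:
    """Compare a fresh checksum with the one recorded at deposit time."""
    return sha256_of(data) == recorded_checksum


# Made-up example content standing in for a deposited data file.
deposited = b"respondent_id,answer\n1,yes\n2,no\n"
recorded = sha256_of(deposited)           # stored alongside the dataset

print(is_unchanged(deposited, recorded))  # same bytes as at deposit
print(is_unchanged(deposited + b"3,yes\n", recorded))  # altered copy
```

A checksum only shows that the bytes are the same as at deposit; it says nothing about whether the data were of high quality to begin with, which is a separate question on the slide.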

What is trust built on? Dedicate yourself (mission statement) Do what you promise (stable, sincere and competent reputation) Be transparent (peer review, get certified)

The need for trusted digital repositories: a brief history. 1996: Open Archival Information System (OAIS). 2002: ISO 14721. Independent auditing was deemed necessary to certify OAIS compliance and thus engender trust. 2003: development of OAIS auditing metrics began. 2007: Trustworthy Repositories Audit & Certification (TRAC). 2012: ISO 16363 for Trusted Digital Repository (TDR).

Certification Standards: Data Seal of Approval (DSA). DANS initiative (2005/6); International Board; 16 guidelines; self-assessment; transparency; 35 seals awarded since 2010. The research data: can be found on the Internet; are accessible (clear rights and licenses); are in a usable format (interoperable); are reliable; can be referred to (persistent identifier). There has long been a demand for some way to evaluate and assess the quality of the services of a digital repository. Over the last few years a number of evaluation guidelines have become available. The DSA can be seen as an entry point which requires limited effort from the repositories. Data producers are responsible for the quality of research data, repositories for storage and long-term access, and users for correct use of data. http://datasealofapproval.org/

European Framework for Audit and Certification. 3. Formal certification: DSA + full external audit and certification based on ISO 16363* or DIN 31644**. 2. Extended certification: DSA + structured, externally reviewed and publicly available self-audit based on ISO 16363* or DIN 31644**. 1. Basic certification: Data Seal of Approval (DSA). http://www.trusteddigitalrepository.eu *ISO 16363 - Audit and Certification of Trustworthy Digital Repositories. **DIN 31644 - Information and Documentation - Criteria for Trustworthy Digital Archives. The DSA can be used on its own, but it is also the first step in a European framework being set up for audit and certification. Because the DSA has taken the DIN and ISO standards on board, it can be used as a stepping stone towards more rigorous levels of audit and certification. Which level of certification to pursue depends on the repository and its stakeholders. It is a simple and easy way to get into audit and certification of your repository. Now let's have a look at the online tool itself. This is a short presentation rather than a live demo, because the chance of failure is proportional to the number of people watching. I will be having a poster session later this afternoon, and you are more than welcome to have a look at the live system.

Survey of 30 Institutions for Highest Priority in Data Policies

Policy                   Importance
Integrity                217
Preservation             150
Access control           126
Provenance               108
Data Management plans    99
Publication              75
Replication              66
Data staging             52
Federation               37
Metadata sharing         23
Regulatory               16
Collection properties    7
Identifiers
Data sharing
Versioning
Licensing                6
Format
Data Life Cycle
Arrangement              5
Processing

Based on RDA Working Group: Practical Policy. Rainer Stotzka, Reagan Moore, Presentation for Plenary 3, March 2014 (https://www.rd-alliance.org/filedepot?cid=104&fid=466)

http://www.youtube.com/watch?v=HJbo-OAaJ1I#t=229

Thank you for your attention www.dans.knaw.nl https://easy.dans.knaw.nl www.narcis.nl https://dataverse.nl/dvn/ Contact: kees.waterman@dans.knaw.nl