1
Data Archiving and Networked Services
Research data management, data archiving and data dissemination. The roles of a Trustworthy Digital Repository (TDR): the case of DANS
Kees Waterman, Data Archiving and Networked Services
DRTD-SHS seminar: Research data in the digital humanities. Day 3: Mastering technologies to add value to data
Lille, 21 April 2015
2
Topics for today: DANS as a Trusted Digital Repository
Research Data Management (RDM); depositing data (requirements, possibilities); data reuse; expanding trust (projects, infrastructures, certification)
3
Research Data Lifecycle
4
What is DANS? Its first predecessor dates back to 1964 (Steinmetz Foundation), followed by the Historical Data Archive in 1989. Since 2005, DANS has been an institute of the Dutch Academy of Sciences (KNAW) and the national funding organisation (NWO). Mission: promote and provide permanent access to digital research information.
5
DANS core services
Long-term data archive: EASY (Electronic Archiving SYstem), for self-deposit. Short- and mid-term storage (current research): the Dutch Dataverse Network, as of May 2014. National information portal to data, e-publications, researchers and projects: NARCIS, the gateway to scholarly information in the Netherlands.
6
Additional services
Persistent Identifier (URN:NBN) resolver; common metadata harvester; training & consultancy
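The idea behind a persistent identifier resolver can be shown with a toy Python sketch: the identifier cited in a publication stays fixed, while the resolver's record of where the object currently lives can be updated. This is not the interface of the URN:NBN resolver DANS runs; the class, the identifier and the URLs below are hypothetical.

```python
class PidResolver:
    """Toy resolver (hypothetical): maps a persistent identifier to a current URL.

    The identifier never changes; only the location it points to does.
    """

    def __init__(self) -> None:
        self._locations: dict[str, str] = {}

    def register(self, pid: str, url: str) -> None:
        # An identifier is assigned once and must never be reused.
        if pid in self._locations:
            raise ValueError(f"{pid} is already registered")
        self._locations[pid] = url

    def relocate(self, pid: str, new_url: str) -> None:
        # Moving the object (new server, new archive) only updates the target.
        if pid not in self._locations:
            raise KeyError(pid)
        self._locations[pid] = new_url

    def resolve(self, pid: str) -> str:
        return self._locations[pid]


resolver = PidResolver()
pid = "urn:nbn:xx:example-0001"  # hypothetical identifier, not a real URN:NBN
resolver.register(pid, "https://old-server.example.org/dataset/1")
resolver.relocate(pid, "https://new-server.example.org/datasets/1")
print(resolver.resolve(pid))  # https://new-server.example.org/datasets/1
```

A citation that uses the identifier keeps working after the move, which is what a plain URL cannot guarantee.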
7
EASY: Electronic Archiving SYstem
Datasets archived; metadata in 6 flavours/variants
8
Geodata in data archive
9
Persistent Identifier
10
Dublin Core metadata; hyperlinks to other (geo)data and publications are possible
11
Depositing data (1) Basic requirements
Sufficient metadata (Dublin Core); research documentation: research question, methodology, variables (code book!), questionnaires and surveys (consent forms!); anonymize your data!; data in preferred or accepted formats
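The metadata requirement can be made concrete with a minimal Python sketch that builds a Dublin Core record using the 15 elements of the Dublin Core Metadata Element Set 1.1. The slide only asks for "sufficient metadata", so the REQUIRED tuple, the function name and the example values are hypothetical, not DANS's actual deposit rules.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The 15 elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description", "format",
    "identifier", "language", "publisher", "relation", "rights", "source",
    "subject", "title", "type",
}

# Hypothetical minimum for a deposit (an assumption, not a DANS rule).
REQUIRED = ("title", "creator", "date", "description", "format", "rights")


def build_dc_record(fields: dict[str, list[str]]) -> ET.Element:
    """Return a <metadata> element with one dc:* child per value."""
    unknown = sorted(set(fields) - DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {', '.join(unknown)}")
    missing = [name for name in REQUIRED if not fields.get(name)]
    if missing:
        raise ValueError(f"missing required elements: {', '.join(missing)}")
    root = ET.Element("metadata")
    for name, values in fields.items():
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return root


# Invented example values for illustration only.
record = build_dc_record({
    "title": ["Example household survey 2014"],
    "creator": ["Example, A."],
    "date": ["2015-04-21"],
    "description": ["Questionnaire data; code book and consent forms included."],
    "format": ["text/csv"],
    "rights": ["CC BY 4.0"],
})
print(ET.tostring(record, encoding="unicode"))
```

Rejecting a deposit with missing elements up front is cheaper than discovering years later that nobody can tell what a file contains.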
12
Depositing data (2)
Access and agreements: 4 access categories; an embargo is an option; license and user agreement. Persistent Identifier. Data processing: privacy, using an established protocol [OAIS / TDR]. Publishing.
13
So… are we successful?
14
Datasets in EASY, by year (deposit)
November 2010: 1 million data files; March 2013: 2 million data files (25,000 datasets)
15
Datasets in EASY, by size
1.8% of datasets > 2 GB; 2.8% of datasets > 1 GB. The long tail of research data
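Read together, the two percentages also give the share of datasets in between and below. A worked example, assuming both figures refer to the same population of EASY datasets, with s denoting the size of a dataset:

```latex
\begin{aligned}
P(1\,\text{GB} < s \le 2\,\text{GB}) &= 2.8\% - 1.8\% = 1.0\% \\
P(s \le 1\,\text{GB}) &= 100\% - 2.8\% = 97.2\%
\end{aligned}
```

Roughly 97% of the datasets are thus 1 GB or smaller: many small datasets, with a few very large ones.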
16
Reuse of datasets EASY 2005-2014
17
2012
18
2014
19
Datasets, by type of access
20
OAIS – Open Archival Information System
SIP = Submission Information Package; AIP = Archival Information Package; DIP = Dissemination Information Package
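The flow between the three package types can be illustrated with a minimal Python sketch. It is a simplification, not the OAIS reference model itself: the class and function names are hypothetical, the persistent identifier is a placeholder, and SHA-256 checksums stand in for the fixity information OAIS asks an archive to keep (the standard does not prescribe a particular algorithm).

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class SIP:
    """Submission Information Package: what the depositor hands over."""
    files: dict[str, bytes]
    metadata: dict[str, str]


@dataclass(frozen=True)
class AIP:
    """Archival Information Package: content plus preservation information."""
    pid: str
    files: dict[str, bytes]
    metadata: dict[str, str]
    fixity: dict[str, str]  # file name -> SHA-256 hex digest


@dataclass(frozen=True)
class DIP:
    """Dissemination Information Package: what a data user receives."""
    pid: str
    files: dict[str, bytes]
    metadata: dict[str, str]


def ingest(sip: SIP, pid: str) -> AIP:
    # Record a checksum per file so later changes can be detected.
    fixity = {name: hashlib.sha256(data).hexdigest() for name, data in sip.files.items()}
    return AIP(pid=pid, files=dict(sip.files), metadata=dict(sip.metadata), fixity=fixity)


def disseminate(aip: AIP) -> DIP:
    # Re-verify fixity before delivery; a mismatch means the stored copy changed.
    for name, data in aip.files.items():
        if hashlib.sha256(data).hexdigest() != aip.fixity[name]:
            raise ValueError(f"fixity check failed for {name}")
    return DIP(pid=aip.pid, files=dict(aip.files), metadata=dict(aip.metadata))


sip = SIP(files={"data.csv": b"id,value\n1,42\n"}, metadata={"title": "Example"})
aip = ingest(sip, pid="hypothetical-pid-0001")
dip = disseminate(aip)
print(dip.pid, sorted(dip.files))
```

The fixity check on the way out answers one of the users' questions about trust: have the data been changed in some way?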
21
One DSA-certified repository in France: CINES (www.cines.fr)
Centre Informatique National de l’Enseignement Supérieur
22
One CESSDA centre in France: Réseau Quetelet (www.reseau-quetelet.cnrs.fr)
Consortium of European Social Science Data Archives
23
One CESSDA centre in France, Réseau Quetelet: membership and types of data
24
Additional roles and responsibilities
As an organization: publish and maintain a Preservation Policy; support RDM (Dataverse, a brochure on data management plans, DMP*); run a certified data archive. As a partner in the Dutch association RDNL: actively develop a federated data infrastructure; support RDM (training); build relationships with universities etc. through a ’front office-back office’ model [data supporters].
25
Additional roles and responsibilities
Participate in Dutch and international projects on research infrastructures: surveydata.nl (longitudinal research); EHRI (Holocaust studies); CARARE (visualization & presentation); ARIADNE (metadata standardization); EUDAT (European data services)
26
Additional roles and responsibilities
Develop national e-depots for specific disciplines, e.g. EDNA (archaeology). DANS promotes and sustains the certification of repositories (‘e-depots’) as a contribution to the further development of TRUST in the sustainable archiving and consultation of ‘data’.
27
http://www.dans.knaw.nl http://www.edna.nl https://easy.dans.knaw.nl
Timeline, 2004-2014+: retrospective archiving of datasets; ongoing archiving and publication of datasets; ‘Grey’ literature scanning project; national regulation in KNA (Quality Norm Archaeology); embedding in DANS EASY; Odyssee project; CARARE project; Ariadne project
28
2004-present: NWO-funded Odyssee project => project database for undeveloped research; improving accessibility through various projects; ongoing archiving and publication of datasets
29
1. CARARE plot (Europeana)
30
Challenges in sharing data
Technical challenges: standardisation, accessibility, discoverability, preservation and curation, data security, etc.; legal and ethical challenges; cultural challenges
31
Why not share? “Those data are mine!” “Others may discredit my findings.” “I am still analyzing the data.” “I cannot trust data produced somewhere else.”
32
Riding the Wave (10/2010) 1st recommendation: Develop an international framework for a Collaborative Data Infrastructure
33
Federated data-infrastructure in the Netherlands
34
Services in the FOBO model
Information and awareness raising on data management and curation; training (data experts at FOs, and researchers); storage (during and after the research)
35
Back office: roles and responsibilities
Focus on expertise and long-term storage: expertise and innovation in data curation, data management and reuse of data; providing expertise to the research community (contributions to FO training courses for researchers); providing expertise to the front office (training courses for data experts, consultancy, contact persons); long-term preservation of data in a trustworthy digital repository (certification; at least the Data Seal of Approval)
36
Challenges: expanding the model to all universities (institutional agreements); creating one single back-office “desk” (RDNL); creating a technical infrastructure for automatic data ingest; developing a business model to cover the costs
37
RDNL: National and international funders are developing open data policies. Researchers must be able to comply with these policies through awareness raising, RDM training, and high-quality, reliable storage facilities. A federated data infrastructure enables stakeholders to collaborate, coordinate activities and divide roles and tasks based on their specific expertise (economies of scale, selection of data). The Netherlands is a small country, but the total picture is far more complex: there are also international and discipline-specific infrastructures.
38
Long-term preservation for a specific sub-discipline: surveydata.nl
39
EASY for long-term preservation
40
Diagram, three layers: search portal; dedicated Questasy servers; archive for long-term preservation
41
The federated data infrastructure: a collaborative framework
Diagram (cross-cutting themes: trust, data curation), layers from users down:
User functions: data capture and transfer (data generators, data users)
Front offices: local data facilities (university libraries); domain-specific research infrastructures; community support services
Back offices: DANS, 3TU.Datacentrum, …; common data services: archiving, access, …
Basic technical infrastructure: SURFsara, Target, …; common data services: storage, backups, …
42
Trust comes on foot, but leaves on horseback
Trust in research data: trust is at the very heart of storing and sharing data. Trust involves data creators, data users, data repositories and funders, each for different reasons. Users of data from a digital repository ask: Have the data been preserved properly? Are they of high quality? Have they been changed in some way? Does the pointer get me to the right object? Depositors want to be sure that their data are safe in the repository and remain accessible, usable and meaningful over time. Finally, funders want reassurance that their investment in producing valuable research data is not wasted but remains available in the future; reuse of data gives them a higher return on their investment.
43
What is trust built on? Dedicate yourself (mission statement); do what you promise (a stable, sincere and competent reputation); be transparent (peer review, get certified)
44
The need for trusted digital repositories
A brief history: 1996, Open Archival Information System (OAIS); 2002, ISO 14721. Independent auditing was deemed necessary to certify OAIS compliance and thus engender trust. 2003: development of OAIS auditing metrics began. 2007: Trustworthy Repositories Audit & Certification (TRAC). 2012: ISO standard for Trusted Digital Repositories (TDR).
45
Certification Standards: Data Seal of Approval (DSA)
DANS initiative (2005/6); international board; 16 guidelines; self-assessment; transparency; 35 seals awarded since 2010. The research data: can be found on the Internet; are accessible (clear rights and licenses); are in a usable format (interoperable); are reliable; can be referred to (persistent identifier). There has long been a demand for a way to evaluate the quality of the services of a digital repository, and over the last few years a number of evaluation guidelines have become available. The DSA can be seen as an entry point that requires limited effort from repositories. Data producers are responsible for the quality of the research data, repositories for storage and long-term access, and users for correct use of the data.
46
European Framework for Audit and Certification
1. Basic certification: Data Seal of Approval (DSA)
2. Extended certification: DSA + structured, externally reviewed and publicly available self-audit based on ISO 16363* or DIN 31644**
3. Formal certification: DSA + full external audit and certification based on ISO 16363* or DIN 31644**
*ISO: Audit and Certification of Trustworthy Digital Repositories
**DIN: Information and Documentation - Criteria for Trustworthy Digital Archives
The DSA can be used on its own, but it is also the first step in a European framework being set up for audit and certification. Because the DSA has taken the DIN and ISO standards on board, it can serve as a stepping stone towards more rigorous levels of audit and certification. Which level to pursue depends on the repository and its stakeholders. It is a simple and easy way to get into audit and certification of your repository. Now let’s have a look at the online tool itself: a short presentation rather than a live demo, because the chance of failure is proportional to the number of people watching. I will be holding a poster session later this afternoon, and you are more than welcome to have a look at the live system there.
47
Survey of 30 Institutions for Highest Priority in Data Policies
Policy: importance score
Integrity: 217
Preservation: 150
Access control: 126
Provenance: 108
Data management plans: 99
Publication: 75
Replication: 66
Data staging: 52
Federation: 37
Metadata sharing: 23
Regulatory: 16
Collection properties: 7
Identifiers / Data sharing / Versioning / Licensing: 6
Format / Data Life Cycle / Arrangement: 5
Processing
Based on RDA Working Group: Practical Policy. Rainer Stotzka, Reagan Moore, Presentation for Plenary 3, March 2014 (
49
Thank you for your attention
Contact: