Research Outputs Management in a Nutshell

Research Outputs Management in a Nutshell
Anna Maria Tammaro, University of Parma; Joy Davidson, University of Glasgow; Tomasz Miksa, Technical University of Vienna
ROMOR Basic Training Workshop, 06/09/2017

Learning objectives
This introductory session will provide an overview of reference models that will help to refine the range of services and infrastructure required to effectively manage and preserve access to research outputs. Key terms will be explained and contextualised to provide a solid foundation for the remainder of the workshop.
After this session participants will:
- understand what digital curation is and how it relates to research outputs management
- be familiar with the curation lifecycle model
- be familiar with the OAIS reference model and how it can be applied to the design of open access information repositories and related services
- be able to communicate more consistently by ensuring a shared understanding of key terms

Structure of session
- Welcome, practical information, objectives for the workshop (Parma)
- Overview of digital curation lifecycle model (Glasgow)
- Overview of Open Archival Information System reference model (Parma)
- Jargon busting session (Vienna)
- Open to questions from participants (all)

Introduction to the Digital Curation Lifecycle Model
Joy Davidson
ROMOR Basic Training Workshop, 06/09/2017

Digital curation is “maintaining and adding value to a trusted body of digital information for current and future use; specifically… the active management and appraisal of data over the lifecycle of scholarly and scientific materials” http://www.dcc.ac.uk

Why do we need to care about digital curation and preservation?
- Better access to higher quality outputs for all (public good)
- Increased confidence in published findings (validation and reproducibility)
- Improved visibility of the research and reputation of the researcher, leading to higher research standing
- Improved efficiency (not collecting the same data multiple times)
- Novel insights can be derived from existing outputs (data driven research and innovation)
- Protection of data against technical obsolescence

The DCC Curation Lifecycle Model provides a graphical high level overview of the stages required for successful curation and preservation of data from initial conceptualisation or receipt. The model enables:
- mapping of granular functionality
- definition of roles and responsibilities
- building frameworks of standards and technologies to implement
- identification of additional steps required
- identification of actions which are not required
- ensuring adequate documentation of processes and policies

All stages
Object oriented model: all lifecycle stages centre on the research output. Actions over the entire lifecycle include:
- Capturing context
- Preservation planning
- Community watch
Description Information (Metadata):
- persistently identifies data and maintains reliable links to them
- clearly describes what they are
- clearly identifies technical information needed to use data
- identifies who is responsible for their management and preservation
- describes what can be done to them
- describes what is needed to represent them at the required level of fidelity
- records their history and documents their authenticity
- allows users to understand their context and relationship to other objects
Representation Information:
- Structure Information: describes the format and data structure concepts to be applied to the bitstream, which result in more meaningful values like characters or number of pixels.
- Semantic Information: this is needed on top of the structure information. If the digital object is interpreted by the structure information as a sequence of text characters, the semantic information should include details of which language is being expressed.
- Other Representation Information: includes information about relevant software, hardware and storage media, encryption or compression algorithms, and printed documentation.
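The layering of Representation Information described above can be illustrated with a minimal sketch. The `RepresentationInfo` class, its field names and the sample bytes are hypothetical and chosen only for illustration; they are not part of the DCC model or of any standard API. The sketch assumes the structure information is simply a character encoding.

```python
from dataclasses import dataclass


@dataclass
class RepresentationInfo:
    """Hypothetical container for what is needed to interpret a bitstream."""
    structure: str  # how bits map to values, here a character encoding
    semantics: str  # what the values mean, here the language of the text
    other: str      # notes on software, hardware or documentation


def interpret(bitstream: bytes, rep_info: RepresentationInfo) -> str:
    """Apply the structure information to turn raw bytes into characters."""
    return bitstream.decode(rep_info.structure)


# The data object: a bitstream that carries no meaning on its own.
data_object = bytes([0x63, 0x69, 0x61, 0x6F])

rep_info = RepresentationInfo(
    structure="utf-8",
    semantics="Italian-language text",
    other="Readable with any plain-text viewer",
)

text = interpret(data_object, rep_info)
print(text)                # characters recovered via the structure information
print(rep_info.semantics)  # tells the reader how to understand those characters
```

Without the structure information the four bytes are just numbers; without the semantic information a reader gets characters but not the language they are written in.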
Preservation planning:
- is a set of managed activities
- aims at ensuring the bit-stream is maintained
- aims at ensuring that data are accessible
- is concerned with maintaining bit streams and ensuring accessibility for a definable period of time
Community watch and participation:
- access to a wider range of expertise
- access to tools and systems that might otherwise be unavailable
- shared influence on R&D of standards and practices
- attraction of resources and other support for well-coordinated programmes at a regional, national or sectoral level
- better planning to reduce wasted effort
- shared development costs
- shared learning opportunities
- encouragement for other stakeholders to take preservation seriously

Conceptualise stage
Activities during this stage include conceiving and planning for the creation of outputs, including capture methods and storage options. It is, put simply, planning research with digital curation processes and outcomes in mind. Most funders now require a data management plan at the grant application stage illustrating that researchers have considered these issues before any research is undertaken. Be aware that decisions made now can affect the entire lifecycle (ethics, legal, contracts).
Conceptualise: plan with digital curation in mind
- produce first iteration of data management plan
- develop robust workflow, processes and documentation
- choose appropriate, existing open standards (interoperability)
- capture and store data in curation-friendly file formats (open source)
- record sufficient information during data capture to assist with ongoing use
- scrupulously identify files
- store data on appropriate media
- identify a safe place for storage (e.g. a trusted archive) and make sure that archive will take your data
- identify access methods
- identify legal framework

Create or receive stage
During this stage try to capture context as you create outputs (administrative, descriptive, structural, technical and preservation metadata). Use the FAIR principles as a guideline (findable, accessible, interoperable, reusable). If reusing third party outputs, be aware of any licensing restrictions. Remember to document any outputs that may be necessary to validate published findings later! This could include underlying data, models, software and code, algorithms.
Create or Receive: ensure data are curation ready
- of high quality
- well structured
- adequately documented
- interoperable
- authentic (it is what it claims to be)
- accurate (it hasn’t been tampered with)
- renderable (it can be used in the ways for which it was intended, or viewed as originally intended)
- in a form that best ensures its longevity
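One way to make "capture context as you create outputs" checkable is a small completeness test over the five metadata categories named on this slide. This is only a sketch: the record layout, the field names inside each category and the `missing_metadata` helper are assumptions for illustration, not a schema prescribed by the DCC model or by FAIR.

```python
# The five metadata categories named on the slide.
REQUIRED_METADATA = (
    "administrative",
    "descriptive",
    "structural",
    "technical",
    "preservation",
)


def missing_metadata(record: dict) -> list:
    """Return the categories that are absent or empty in a metadata record."""
    return [kind for kind in REQUIRED_METADATA if not record.get(kind)]


# Hypothetical record captured while the output is being created.
record = {
    "descriptive": {"title": "Interview transcripts", "creator": "Project team"},
    "technical": {"format": "text/plain", "encoding": "UTF-8"},
    "administrative": {"licence": "CC BY 4.0"},
}

# Categories still to be documented before the output is curation ready.
print(missing_metadata(record))
```

Running such a check at creation time is cheaper than reconstructing the missing context at deposit.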

Appraise and select stage
At this point in the lifecycle, you’ll evaluate the outputs you will need to keep and for how long they need to be preserved. Decisions will be based on institutional and funder policies as well as legal and ethical requirements. Consider how selected outputs will be checked for quality and completeness. Who will be responsible for this (researcher, repository, both)?
Appraise and Select: develop robust policies
How long do we want to keep the data?
- in terms of changes of technology
- in terms of an organisation’s business requirements
- in terms of user requirements (e.g. as evidence to verify conclusions derived from research)
How long do we need to keep the data?
- assess benefits and risks of keeping/not keeping data
- what are the consequences of not keeping the data?
- how much would it cost to recreate it in the future?
- is it even possible to recreate it in the future?
Re-appraisal and disposal
Typically data may be transferred to another archive, repository, data centre or other custodian. In some instances data is destroyed. The data’s nature may, for legal reasons, necessitate secure destruction.

Ingest stage
At this point in the lifecycle, selected outputs will be deposited in a digital repository or transferred to a suitable OAIR, data centre or other custodian. Consider any normalisation actions that are performed (e.g., Excel to PDF) and potential usability issues. Consider how you will maintain links between outputs generated in a given project that are deposited in multiple systems (publications, data, software).

Preservation and Storage stages
Undertake actions to ensure long-term preservation and retention of the authoritative nature of data. Preservation actions should ensure that data remains authentic, reliable and usable while maintaining its integrity. Value added actions include data cleaning, validation, assigning preservation metadata, assigning representation information and ensuring acceptable data structures or file formats for reuse.
Store: store the data in a secure manner adhering to relevant standards.
Preservation Action: information managers use ongoing actions such as Preservation Planning and Community Watch to identify when actions need to be taken on data stored in the stable repository environment. They may also choose to make derivatives of data.
Migrate (for preservation storage): migrate data to a different format. This may be done to accord with the storage environment or to ensure the data’s immunity from hardware or software obsolescence.
Reappraise: return data which fails validation procedures for further appraisal and reselection.
Dispose
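The slide does not say how integrity is checked. A common technique, shown here purely as an assumption, is a fixity check: record a checksum when the data is stored and recompute it later. The function names and sample bytes below are hypothetical; only Python's standard `hashlib` module is used.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Compute the SHA-256 checksum of a bitstream as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_fixity(data: bytes, recorded_checksum: str) -> bool:
    """Return True if the data still matches the checksum recorded at storage time."""
    return sha256_of(data) == recorded_checksum


# Checksum recorded when the data was first stored (hypothetical content).
original = b"temperature,12.5\n"
recorded = sha256_of(original)

# Later audit: one copy is unchanged, another has a single altered character.
corrupted = b"temperature,12.6\n"

print(verify_fixity(original, recorded))   # unchanged copy passes
print(verify_fixity(corrupted, recorded))  # altered copy fails
```

A copy that fails such a check is a natural candidate for the Reappraise step, or for restoring from another stored copy.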

Access and reuse stage
At this point in the lifecycle, the outputs are made accessible for others to reuse. Consider what context someone would need to use the output. In some cases, restrictions on access are needed (e.g., to protect commercially sensitive or personal information) and appropriate access controls will need to be considered (dealing with access requests, processes). Provide citation guidelines for outputs and consider how usage metrics will be collected.
Access, Use and Reuse: ensure that data is accessible to both designated users and re-users, on a day-to-day basis. This may be in the form of publicly available published information. Robust access controls and authentication procedures may be applicable.
The original project team has been using their data throughout the curation process and can continue to do so after the project has finished. They perform analyses and publish an article based on their findings. They also open up access to other researchers who can make different uses of the data. This data is of most value when integrated with other research done in the field, so steps are taken to use this data to augment existing datasets.

Transform stage
Accessed outputs can be integrated with other data to facilitate new analyses. These produce derivative research outputs: conclusions that could only be drawn from an amalgamation of existing outputs which feed back into the Conceptualise stage of the lifecycle model. Consider how you might track such transformations over time (impact).
Create new data from the original, for example by migration into a different format or by creating a subset, by selection or query, to create newly derived results, perhaps for publication.
After being integrated with other data, new analyses and techniques are applied to the data by researchers in the same field and some interdisciplinary studies. These produce derivative data: conclusions that could only be drawn from an amalgamation of data from multiple projects, as well as transformations of data items (for example, new visualisations or ‘enhancements’) which feed back into the Create stage of the Lifecycle.

Open Archival Information System
ROMOR Workshop, 5 September 2017

OAIS International Standard
OAIS is an international standard first published in 2003 as ISO 14721:2003, “Space data and information transfer systems -- Open archival information system (OAIS) -- Reference model”. In 2012, ISO issued a revised version of the Reference Model as ISO 14721:2012.

The OAIS is a ‘reference model’
What is a reference model?
- A framework for understanding relationships among entities of its domain, and for development of consistent standards or specifications supporting the domain
- A conceptual model which is based on a small number of underlying concepts
- It describes key concepts and relationships, and how they interface to each other and the external environment
- It may be used as a basis for explaining domain specific concepts to non-specialists
OAIS is NOT an implementation model!

Purpose, Scope, and Applicability
- Primary goal of an OAIS: to preserve information for a designated community over an indefinite period of time
- The standard is a framework for understanding and applying concepts necessary for long-term preservation of digital objects (of any sort)
- Note that here long-term means long enough to be concerned about technological change
- Addresses the full range of archival functions
- Applicable to all sorts of organisations and individuals dealing with information that needs long-term preservation (not just ‘archives’)

What is OAIS about?
Defining an OAIS: “An archive, consisting of an organization of people and systems that has accepted the responsibility to preserve information and make it available for a Designated Community”
‘Designated Community’ is a singularly important concept within the OAIS model and is defined as the community of stakeholders and users that the OAIS serves.
The Reference Model is a high-level description of the environment, the functional model and the information model of an OAIS.

OAIS environment model
[Figure: the OAIS (Archive) and the external entities around it: Producer, Management and Designated Community]

OAIS environment
The OAIS environment consists of the key external entities with which the archive interacts in the course of carrying out its operations:
- Producer is the role played by those persons, or client systems, who provide the information to be preserved
- Management is the role played by those who set overall OAIS policy as one component in a broader policy domain (not day-to-day administration)
- Consumer is the role played by those persons, or client systems, who interact with OAIS services to find and acquire preserved information of interest
OAIS: this is just the archive itself; we’ll talk about this in detail in a moment.
MANAGEMENT: entity that sets the broad policies for the archive, e.g. the scope of the archive’s content; likely provides funding; might serve an oversight function, periodically reviewing the archive’s policies and performance. Does not manage day-to-day operations of the archive.
PRODUCER: entity (or entities) that submits content to the archive to be preserved. Could be an individual, an organization, or even a computer system or machine (or combination).
CONSUMER: entity that utilizes the content preserved in the archive. Again, could be a person, organization, or machine (even another archive).
DESIGNATED COMMUNITY: a special class of Consumers who are expected to independently understand information in the form it is preserved in the archive. Independently understand means without requiring additional expertise or assistance in interpretation. For example, if the content is scientific data, the Designated Community might be individuals with a certain level of scientific expertise; if the content is Java source code, it might be persons skilled in Java programming.
The scope of the Designated Community has important implications for the amount of metadata that must be packaged with the archived content to ensure that it remains meaningful over the long term. More on this later when we talk about the OAIS information model.
Note that the scope of the Designated Community is not necessarily static: it could change over time. Return to the OAIS itself …

OAIS Functional Model

OAIS Functions (1)
- Ingest: services and functions that accept SIPs from Producers, prepare AIPs for storage, and ensure that AIPs and their supporting Descriptive Information become established within the OAIS
- Archival Storage: services and functions used for the storage and retrieval of AIPs
- Data Management: services and functions for populating, maintaining, and accessing a wide variety of information, including the Descriptive Information that identifies and documents the archive’s holdings

OAIS Functions (2)
- Administration: services and functions needed to control the operation of the other OAIS functional entities on a day-to-day basis
- Preservation Planning: services and functions for monitoring the OAIS environment and ensuring that content remains accessible to the Designated Community
- Access: services and functions which make the archival information holdings and related services visible to Consumers

OAIS Information Model
OAIS provides an information model for managing the digital materials as they pass through the system. The basis of this model is the Information Package (IP). An IP consists of:
1. The digital object(s) to be preserved.
2. The metadata required at that point in the system.
3. The Packaging Information which relates 1 and 2.
OAIS specifies three types of Information Package:
- Submission Information Packages (SIPs)
- Archival Information Packages (AIPs)
- Dissemination Information Packages (DIPs)

Information Package model

OAIS information packages
- SIP: the package that is sent to an OAIS by a Producer. Its form and detailed content are typically negotiated between the Producer and the OAIS
- AIP: the package that is actually preserved in the OAIS. May be made up of more than one SIP, or one SIP may produce several AIPs
- DIP: what is delivered to the Consumers. It is not necessarily the same as the AIP but will be derived from it in some way
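The relationship between the three package types can be sketched in a few lines. OAIS is a reference model, not an implementation model, so everything here is a hypothetical illustration under stated assumptions: the class layout, the `ingest` and `disseminate` functions, and the file names are invented, and only one of the mappings the model allows (several SIPs into one AIP) is shown. The sketch needs Python 3.9 or later.

```python
from dataclasses import dataclass, field


@dataclass
class InformationPackage:
    content: dict[str, bytes]   # digital object(s), keyed by file name
    metadata: dict[str, str]    # metadata required at this point in the system
    packaging: dict[str, str] = field(default_factory=dict)  # relates content to metadata


class SIP(InformationPackage):
    """Submission Information Package, sent by a Producer."""


class AIP(InformationPackage):
    """Archival Information Package, the form that is preserved."""


class DIP(InformationPackage):
    """Dissemination Information Package, delivered to a Consumer."""


def ingest(sips: list[SIP], archive_id: str) -> AIP:
    """Combine one or more SIPs into a single AIP."""
    content: dict[str, bytes] = {}
    metadata: dict[str, str] = {}
    for sip in sips:
        content.update(sip.content)
        metadata.update(sip.metadata)
    metadata["archive_id"] = archive_id
    # Packaging information: tie every file to the package it belongs to.
    packaging = {name: archive_id for name in content}
    return AIP(content, metadata, packaging)


def disseminate(aip: AIP, wanted: list[str]) -> DIP:
    """Derive a DIP that holds only the requested files from an AIP."""
    content = {name: aip.content[name] for name in wanted if name in aip.content}
    packaging = {name: aip.packaging[name] for name in content}
    return DIP(content, dict(aip.metadata), packaging)


sip_data = SIP({"survey.csv": b"id,answer\n1,yes\n"}, {"title": "Survey 2017"})
sip_docs = SIP({"readme.txt": b"Column notes"}, {"creator": "Project team"})

aip = ingest([sip_data, sip_docs], archive_id="aip-0001")
dip = disseminate(aip, ["survey.csv"])

print(sorted(aip.content))  # files preserved in the AIP
print(sorted(dip.content))  # files delivered in the DIP
```

The DIP here is a subset of the AIP, which matches the point that a DIP is derived from the AIP without necessarily being identical to it.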

QUESTIONS