Preserving ETDs: Resources and Recommendations

Katherine Skinner, PhD, Executive Director, Educopia Institute
Hi. My name is Katherine Skinner, and I’m the Executive Director of Educopia. Today, I’ll be sharing with you a brief introduction to a new resource that the Educopia Institute and its partners produced in 2017, the ETD+ Toolkit.

First, I want to know a bit about you…
What content type(s) does your institution’s ETD program currently accept?
What content types does your institution’s digital preservation program currently support?
e.g.,
-Images: jpg, gif, tiff, png, ai, svg, ...
-Video: mpeg, m2tvs, flv, dv, ...
-GIS: kml, dxf, shp, tiff, ...
-CAD: dxf, dwg, pdf, …
-Data: csv, mdf, fp, spv, xlx, tsv, ...
-Text: txt, rtf, tvi, doc, pdf…

Threats to digital content include:
-Storage failure
-Hardware/software failure
-Application software failure
-Format obsolescence
-Legal encumbrance
-Human error
-Malicious attack (e.g., hacking)
-Natural disaster
-Loss of access to the software needed to render the file
-Loss of institutional commitment
-Lack of versioning control (which file is the “final” file?)
From: http://www.dataone.org/sites/all/documents/L01_DataManagement.pptx

What can you do to offset these?
-Identify what content you have that might be valuable to you later
-Manage that content deliberately to ensure its longevity
-Store that content in safe locations where you can reach it
-Use tools and services to help you protect it

Digital Preservation: Ingest, Format Validation, Audit, Storage, Fixity Checking, Geographic Replication, Access, Repair, Data Wrangling, Metadata, Testing, Trust, Rights (etc...)
“The series of managed activities necessary to ensure continued access to digital materials for as long as necessary.” - Digital Preservation Coalition
Digital preservation requires persistence. Ongoing interventions. Constant care. Bits are fragile.
Digital preservation is not about backing up files. It’s not about storing things in the cloud and walking away. It’s about a few key things, all of which make for very real challenges:
-acquiring content: possessing a digital object so that you can stabilize and curate it
-obtaining the rights to maintain that content, long-term, by whatever means is required (including making lots of copies and distributing them widely)
-establishing funding to maintain the content over time
-continually assessing and offsetting risks of loss, from malice, accident, acts of nature, acts of war, bit-rot, storage failures, obsolescence, etc.
At the end of the day, digital preservation really is about a series of managed activities, as DPC stated years ago. Because technologies will always evolve, and new challenges will emerge, digital preservation is about acquiring and maintaining the necessary knowledge and skills to make wise decisions about how best to direct resources toward providing long-term access for digital materials in their changing forms.
NAUSEATING DETAIL.
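To make one of the managed activities named on this slide concrete, here is a minimal sketch of fixity checking in Python. It assumes SHA-256 checksums and a simple in-memory manifest; the function names (sha256_of, build_manifest, check_fixity) are hypothetical and chosen for illustration, not taken from the Toolkit or from any particular preservation service.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under root (as a relative POSIX path) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def check_fixity(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files now under root against a previously stored manifest."""
    current = build_manifest(root)
    return {
        # Files recorded earlier that are no longer present.
        "missing": sorted(set(manifest) - set(current)),
        # Files present now that were never recorded.
        "unexpected": sorted(set(current) - set(manifest)),
        # Files whose bits differ from what was recorded.
        "changed": sorted(
            name
            for name in manifest.keys() & current.keys()
            if manifest[name] != current[name]
        ),
    }
```

In practice the manifest would be written to disk and kept apart from the content it describes, and the check would be rerun on a schedule. A one-time checksum is not preservation; the repeated comparison is what catches silent change.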

Distributed Digital Preservation
DDP emphasizes the importance of such factors as content replication, independence, and coordination for ensuring the longevity of digital objects.
Key features:
-geographic distribution
-infrastructure heterogeneity
-organizational diversity
Taking it one step further: most practitioners in the field now recommend Distributed Digital Preservation approaches rather than centralized approaches, especially for preservation storage. What that means is that we replicate content, that is, we make copies of it. We make sure each copy is independent of the other copies, that each copy is under different control. In other words, let’s not make lots of copies and then have the same systems administrator in charge of each. That makes it too easy to corrupt the content. And finally, we coordinate the copies in order to ensure that we know if any of the copies change over time, whether that change is deliberate or accidental, malicious or happenstance.
Big issues here, as per ScholarsPortal conversation. Issues extend to organizational ones: “difficult,” but it’s just swapping out drives…
Skinner 2013
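The coordination step described here, knowing when any copy has changed, can be sketched as a comparison of checksums reported by independent sites. This is an illustrative simplification that assumes each site reports one checksum per file and that a strict majority is trusted. The function name audit_replicas and the site labels are hypothetical, and real DDP networks use more elaborate polling and repair protocols.

```python
from __future__ import annotations

from collections import Counter


def audit_replicas(digests_by_site: dict[str, str]) -> dict[str, object]:
    """Compare one file's checksum as reported by several independent sites.

    digests_by_site maps a site name to the checksum that site computed.
    The checksum reported by a strict majority of sites is treated as the
    reference; sites that disagree with it are flagged for repair.
    """
    if not digests_by_site:
        raise ValueError("at least one site must report a checksum")
    counts = Counter(digests_by_site.values())
    digest, votes = counts.most_common(1)[0]
    if votes * 2 <= len(digests_by_site):
        # No strict majority: do not guess which copy is the good one.
        return {"reference": None, "suspect": sorted(digests_by_site)}
    suspect = sorted(
        site for site, value in digests_by_site.items() if value != digest
    )
    return {"reference": digest, "suspect": suspect}
```

With three sites where one disagrees, the odd one out is flagged; with only two sites that disagree, neither can be trusted over the other. That is one reason more than two independent copies are commonly recommended.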

Comparison Chart: Selected Preservation Storage Options Available Today
Columns: Name; Copies; Location; Distribution; 10TB/yr estimate; Structure; Storage Ownership
APTrust: 3; Amazon; ?; $20,000; Consortium; Outsourced
Arkivum: UK sites; UK; $40,000; Service; Unknown
Chronopolis: US sites; US; $10,000 +; Member owned and controlled
DPN: 3+; DuraCloud, HathiTrust, SDSC, TDL, SDR, APTrust; $47,500; Mixed
DuraCloud: 2; Amazon, SDSC, Rackspace; $17,550
MetaArchive: 7; International libraries; International; $14,500
Preservica: $31,950
Let’s look at this for a moment. Right now, there are at least 7 options available broadly to those in the market for “Preservation Storage”. Each bundles a set of services; at the minimum, each of these provides bit-level storage, including replication and monitoring. These services include a range of approaches and with them, a range of risks.
-Some make two copies; others make 7.
-Some store content only in one commercial cloud (Amazon); others embed storage infrastructure in libraries and archives.
-Some distribute copies in one nation; others distribute them internationally.
-They vary widely in pricing, and note that the costs don’t always align with what you get for the amount you’re spending. In other words, spending more doesn’t buy you more or better preservation in any discernible way. It does buy you different company.
-Some are structured as services, where in essence, you are a customer. Others have formed deliberately as communities and consortia where members have a vested stake in the work performed and have a governance voice over the infrastructure, pricing, and other factors.
-Some of these groups use storage options that are outsourced and ultimately out of their control; others are member owned and controlled.
I am mystified, sincerely mystified, every time I look at this chart or update it. The choices that are available in today’s market are uneven at best, and many of the most successful are also those that are the most outsourced and the least under library/archives control.
NOTE: Rosetta (Ex Libris) and Digital Archive (OCLC) are additional services that we didn’t include here because they lack transparency about all of these factors.
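One rough way to see that price and replication do not move together is to divide each 10TB/yr estimate by the number of copies and by the 10 TB stored. The sketch below uses only the three chart rows that state an exact copy count (APTrust, DuraCloud, MetaArchive). The metric itself is an assumption made for illustration; it ignores location, governance, and every other column, so it is not a ranking of these services.

```python
from __future__ import annotations

# (service, copies, annual price in USD for 10 TB), as given in the chart.
CHART_ROWS = [
    ("APTrust", 3, 20_000),
    ("DuraCloud", 2, 17_550),
    ("MetaArchive", 7, 14_500),
]
TERABYTES = 10


def cost_per_copy_tb(copies: int, annual_price: float, terabytes: int = TERABYTES) -> float:
    """Annual price divided by (number of copies x terabytes stored)."""
    return annual_price / (copies * terabytes)


def summarize(rows: list[tuple[str, int, float]]) -> list[tuple[str, float]]:
    """Return (service, cost per copy-TB rounded to cents), cheapest first."""
    return sorted(
        ((name, round(cost_per_copy_tb(copies, price), 2)) for name, copies, price in rows),
        key=lambda item: item[1],
    )


if __name__ == "__main__":
    for name, cost in summarize(CHART_ROWS):
        print(f"{name}: ${cost:,.2f} per copy-TB per year")
```

On these figures, the option that keeps the most copies also works out lowest per copy-TB, which is consistent with the remark that spending more does not obviously buy more preservation.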

ETD + Research 2014-17: research data, software code, audio-video files, digital text, digital art, visualizations, GIS datasets
The ETD+ Toolkit is the result of a project funded by the Institute of Museum and Library Services. Educopia Institute led the creation of the Toolkit in partnership with the NDLTD, ProQuest, and 12 U.S. research libraries. Its purpose is to train students to manage their research outputs, including data, software code, audio-video files, digital text, digital art, visualizations, and GIS datasets.

Students report non-PDF files like research data, video, digital art, and software code are either as important or more important than those submitted as PDFs to satisfy degree requirements. Our project team surveyed nearly 800 students and more than 30 faculty/staff on nine university campuses in 2014 to better understand the gaps between what students are producing and submitting in their research processes. We also sought to better understand what students know and need to know about long-term file management practices that can help them ensure that their research files remain usable later in their careers. We found that students believe their non-PDF files—including research data, video, digital art, and software code—are either as important or more important than those that they submit as PDFs to satisfy their degree requirements.

Fully 80% of 795 students report they will produce non-text files in their dissertation or thesis research, including:
-Tabular data (43%)
-Digital images (38%)
-Software code (29%)
-Digital text (28%)
We also found that 80% of these respondents plan to produce non-text files in their research, including such forms as tabular data and software code.

The ETD+ Toolkit helps the academic community to train students to ensure the longevity and accessibility of their research outputs.
Based on our findings, we designed this Toolkit to:
-Help students make sure that their research outputs are stored and maintained in durable formats and on durable devices;
-Help students make informed decisions about file formats, documentation, and rights.
The Toolkit also contains resources to help administrators better understand the digital research outputs students are creating and assess what they need to collect and care for as part of the institutional memory.

What is the Toolkit? An open set of six modules and evaluation instruments that prepare students to create, store, and maintain their research outputs.
So, what is the toolkit? It’s an open set of … Each is designed to stand alone; they may also be used as a series.

MODULE 1: COPYRIGHT How can students gain appropriate permissions and how can students signal copyright for their own works?
MODULE 2: DATA ORGANIZATION How can students structure, describe, store, and deposit data and research files for reuse and/or future access?
MODULE 3: FILE FORMATS How will the formats students choose make future access to their research easier or more difficult?
MODULE 4: METADATA How can students store information describing their files to make sure they can tell what they are in the future?
MODULE 5: STORAGE How can students make well informed choices about where to store their research materials?
MODULE 6: VERSION CONTROL What mechanisms can students use to make it easier to see the history of a file with multiple versions?
As you can see on this slide, the modules cover a wide range of practices. These modules are introductory in nature; they present a concise set of information that can be covered by a one-hour workshop, and then they also provide a lot of “jumping off points” to deeper materials and resources that students may consult following that workshop.

Each module includes: Learning Objectives One-page Handout Guidance Brief (customizable) Slideshow with presenter notes Evaluation survey

Who can use the Toolkit? Anyone may freely adopt and adapt this toolkit. http://educopia.org/etdplustoolkit
We especially recommend its use by administrators, faculty, and librarians teaching students and by students seeking practical advice about digital content management. You can get started by going to the URL http://educopia.org/publications/etdplustoolkit

ETDplus team:
-Educopia Institute
-Oregon State University
-MetaArchive Cooperative
-Penn State University
-NDLTD
-Purdue University
-ProQuest
-University of Louisville
-Carnegie Mellon University
-UNC School of Library and Information Science
-Colorado State University
-University of North Texas
-HBCU Library Alliance
-University of Tennessee - Knoxville
-Indiana State University
-Virginia Tech University
First, we should mention the authors. These Guidance Briefs have been created by a team of 12 universities in partnership with the Educopia Institute, the MetaArchive Cooperative, the Networked Digital Library of Theses and Dissertations, and ProQuest to help student researchers of all disciplines learn about the management of complex digital objects at the beginning of their careers.
We knew that a range of efforts have been launched in the last five years to train and prepare data scientists to manage, share, and ensure the sustainability of their data outputs, including DataONE’s webinars and education modules, the DMPTool Data Management General Guidance, and the Virginia-based Data Management Bootcamp for Graduate Students. These data science resources are connecting researchers with the different university-based units that can help support and sustain their data outputs (e.g., IT, library, Offices of Sponsored Research, etc.).
Beyond data science, however, we could locate few attempts to address the training needs of the broad array of researchers (humanities, social science, arts, and sciences alike) who increasingly need to manage complex digital objects. We determined that this was a critical area that needed improvement. We also determined that this need dovetailed with the need of ETD programs to provide a strong foundation for replication of research findings in ETDs.
Or to say that another way, ETDs are a core output of the university. They’re also a key training mechanism for students. Part of what the ETD process should produce is a record of research that enables it to be validated and replicated. In many cases, a text-based PDF will not do that. Many “theses and dissertations” are not text-based in reality; instead, they are often better represented in a range of other formats, e.g.,
-video/audio files of performances
-datasets from experiments
-computer programs
-GIS-based visualizations

Questions? Katherine Skinner katherine@educopia.org @educopia