1
Preserving ETDs: Resources and Recommendations
Katherine Skinner, PhD
Executive Director, Educopia Institute
Hi. My name is Katherine Skinner, and I’m the Executive Director of the Educopia Institute. Today, I’ll share a brief introduction to a new resource that the Educopia Institute and its partners produced in 2017: the ETD+ Toolkit.
2
First, I want to know a bit about you…
What content type(s) does your institution’s ETD program currently accept? e.g.,
-Images: jpg, gif, tiff, png, ai, svg, ...
-Video: mpeg, m2tvs, flv, dv, ...
-GIS: kml, dxf, shp, tiff, ...
-CAD: dxf, dwg, pdf, …
-Data: csv, mdf, fp, spv, xlx, tsv, ...
-Text: txt, rtf, tvi, doc, pdf…
3
First, I want to know a bit about you…
What content type(s) does your institution’s ETD program currently accept?
What content types does your institution’s digital preservation program currently support? e.g.,
-Images: jpg, gif, tiff, png, ai, svg, ...
-Video: mpeg, m2tvs, flv, dv, ...
-GIS: kml, dxf, shp, tiff, ...
-CAD: dxf, dwg, pdf, …
-Data: csv, mdf, fp, spv, xlx, tsv, ...
-Text: txt, rtf, tvi, doc, pdf…
4
Threats to digital content include:
-Storage failure
-Hardware/software failure
-Application software failure
-Format obsolescence
-Legal encumbrance
-Human error
-Malicious attack (e.g., hacking)
-Natural disaster
-Loss of access to the software needed to render the file
-Loss of institutional commitment
-Lack of versioning control (which file is the “final” file?)
From:
5
What can you do to offset these?
-Identify what content you have that might be valuable to you later
-Manage that content deliberately to ensure its longevity
-Store that content in safe locations where you can reach it
-Use tools and services to help you protect it
6
Digital Preservation
Ingest, Format Validation, Audit, Storage, Fixity Checking, Geographic Replication, Access, Repair, Data Wrangling, Metadata, Testing, Trust, Rights (etc...)
“The series of managed activities necessary to ensure continued access to digital materials for as long as necessary.” - Digital Preservation Coalition
Digital preservation requires persistence. Ongoing interventions. Constant care. Bits are fragile. Digital preservation is not about backing up files. It’s not about storing things in the cloud and walking away. It’s about a few key things, all of which make for very real challenges:
-acquiring content: possessing a digital object so that you can stabilize and curate it
-obtaining the rights to maintain that content, long-term, by whatever means is required (including making lots of copies and distributing them widely)
-establishing funding to maintain the content over time
-continually assessing and offsetting risks of loss: from malice, accident, acts of nature, acts of war, bit-rot, storage failures, obsolescence…etc
At the end of the day, digital preservation really is about a series of managed activities, as the DPC stated years ago. Because technologies will always evolve, and new challenges will emerge, digital preservation is about acquiring and maintaining the necessary knowledge and skills to make wise decisions about how best to direct resources toward providing long-term access for digital materials in their changing forms.
NAUSEATING DETAIL.
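To make "fixity checking" concrete, here is a minimal sketch of how a checksum audit might work. It is an illustration only, not how any particular preservation service does it: the function names (`sha256_of`, `build_manifest`, `audit`) and the choice of SHA-256 are assumptions made for this example.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit(root: Path, manifest: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare the files under root with a previously stored manifest."""
    current = build_manifest(root)
    return {
        # Files recorded earlier that are no longer present.
        "missing": sorted(set(manifest) - set(current)),
        # Files whose bytes no longer match the recorded checksum.
        "changed": sorted(
            name for name in manifest
            if name in current and current[name] != manifest[name]
        ),
        # Files that appeared after the manifest was made.
        "unexpected": sorted(set(current) - set(manifest)),
    }
```

The idea is to record a manifest when content is ingested, then re-run `audit` on a schedule. Any entry under "missing" or "changed" is a signal that a copy needs repair from another replica.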
7
Distributed Digital Preservation
DDP emphasizes the importance of such factors as content replication, independence, and coordination for ensuring the longevity of digital objects.
Key features:
-geographic distribution
-infrastructure heterogeneity
-organizational diversity
Taking it one step further: most practitioners in the field now recommend Distributed Digital Preservation approaches rather than centralized approaches, especially for preservation storage. What that means is that we replicate content; we make copies of it. We make sure each copy is independent of the other copies, and that each copy is under different control. In other words, let’s not make lots of copies and then have the same systems administrator in charge of each. That makes it too easy to corrupt the content. And finally, we coordinate the copies in order to ensure that we know if any of the copies change over time, whether that change is deliberate or accidental, malicious or happenstance.
Big issues here, as per ScholarsPortal conversation. Issues extend to organizational ones: “difficult” but it’s just swapping out drives…
Skinner 2013
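The "coordinate the copies" step can be sketched in a few lines. This is a hypothetical illustration: the function name `compare_copies`, the site labels, and the strict-majority rule are assumptions for the example, not a description of how any named network actually reconciles replicas.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def compare_copies(reports: Dict[str, str]) -> Tuple[Optional[str], List[str]]:
    """Return (agreed checksum or None, sites that disagree with it).

    `reports` maps a site label to the checksum that site computed
    for its own copy of the same file.
    """
    if not reports:
        raise ValueError("need at least one copy to compare")
    digest, votes = Counter(reports.values()).most_common(1)[0]
    if votes * 2 <= len(reports):
        # No strict majority: no copy can be trusted automatically,
        # so every site is flagged for human review.
        return None, sorted(reports)
    return digest, sorted(site for site, d in reports.items() if d != digest)
```

With three independent copies, one divergent checksum is outvoted and that site can be repaired from the others. With only two copies, a disagreement cannot be resolved by voting alone, which is one practical argument for keeping more than two.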
8
Selected Preservation Storage Options Available Today
Comparison Chart: Selected Preservation Storage Options Available Today
Columns: Name; Copies; Location; Distribution; 10TB/yr estimate; Structure; Storage Ownership
APTrust; 3; Amazon; ?; $20,000; Consortium; Outsourced
Arkivum; UK sites; UK; $40,000; Service; Unknown
Chronopolis; US sites; US; $10,000 +; Member owned and controlled
DPN; 3+; DuraCloud, HathiTrust, SDSC, TDL, SDR, APTrust; $47,500; Mixed
DuraCloud; 2; Amazon, SDSC, Rackspace; $17,550
MetaArchive; 7; International libraries; International; $14,500
Preservica; $31,950
Let’s look at this for a moment. Right now, there are at least 7 options available broadly to those in the market for “Preservation Storage”. Each bundles a set of services; at the minimum, each of these provides bit-level storage, including replication and monitoring. These services include a range of approaches and, with them, a range of risks.
-Some make two copies; others make 7.
-Some store content only in one commercial cloud (Amazon); others embed storage infrastructure in libraries and archives.
-Some distribute copies in one nation; others distribute them internationally.
-They vary widely in pricing, and note that the costs don’t always align with what you get for the amount you’re spending. In other words, spending more doesn’t buy you more or better preservation in any discernible way. It does buy you different company.
-Some are structured as services, where in essence you are a customer. Others have formed deliberately as communities and consortia where members have a vested stake in the work performed and have a governance voice over the infrastructure, pricing, and other factors.
-Some of these groups use storage options that are outsourced and ultimately out of their control; others are member owned and controlled.
I am mystified, sincerely mystified, every time I look at this chart or update it. The choices that are available in today’s market are uneven at best, and many of the most successful are also those that are the most outsourced and the least under library/archives control.
NOTE: Rosetta (Ex Libris) and Digital Archive (OCLC) are additional services that we didn’t include here because they lack transparency about all of these factors.
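To put rough numbers on the point that price and copy count don't line up, here is a small worked calculation. It uses only the rows of the chart that list both a definite copy count and a 10TB/yr estimate. Treating the estimate as an annual price for 10 TB, and dividing by copies, are simplifying assumptions for illustration; the services bundle different things, so this is a crude yardstick rather than a ranking.

```python
# Figures copied from the comparison chart; rows without a definite
# copy count (e.g. "3+" or blank) are left out rather than guessed.
OPTIONS = {
    "APTrust": {"copies": 3, "usd_per_10tb_year": 20_000},
    "DuraCloud": {"copies": 2, "usd_per_10tb_year": 17_550},
    "MetaArchive": {"copies": 7, "usd_per_10tb_year": 14_500},
}


def cost_breakdown(copies: int, usd_per_10tb_year: float) -> dict:
    """Derive per-TB and per-TB-per-copy yearly costs from a 10 TB quote."""
    per_tb = usd_per_10tb_year / 10
    return {
        "usd_per_tb_year": per_tb,
        "usd_per_tb_copy_year": per_tb / copies,
    }


SUMMARY = {name: cost_breakdown(**row) for name, row in OPTIONS.items()}

if __name__ == "__main__":
    for name, row in SUMMARY.items():
        print(
            f"{name}: ${row['usd_per_tb_year']:,.2f} per TB/yr, "
            f"${row['usd_per_tb_copy_year']:,.2f} per TB per copy/yr"
        )
```

On these three rows, the option with the most copies is also the least expensive per terabyte, which is the mismatch the chart is meant to highlight.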
10
ETD+ Research 2014-17
research data, software code, audio-video files, digital text, digital art, visualizations, GIS datasets
The ETD+ Toolkit is the result of a project funded by the Institute of Museum and Library Services. Educopia Institute led the creation of the Toolkit in partnership with the NDLTD, ProQuest, and 12 U.S. research libraries. Its purpose is to train students to manage their research outputs, including data, software code, audio-video files, digital text, digital art, visualizations, and GIS datasets.
11
Students report non-PDF files like research data, video, digital art, and software code are either as important or more important than those submitted as PDFs to satisfy degree requirements.
Our project team surveyed nearly 800 students and more than 30 faculty/staff on nine university campuses in 2014 to better understand the gaps between what students are producing and submitting in their research processes. We also sought to better understand what students know and need to know about long-term file management practices that can help them ensure that their research files remain usable later in their careers. We found that students believe their non-PDF files (including research data, video, digital art, and software code) are either as important or more important than those that they submit as PDFs to satisfy their degree requirements.
12
Fully 80% of 795 students report they will produce non-text files in their dissertation or thesis research, including:
-Tabular data (43%)
-Digital images (38%)
-Software code (29%)
-Digital text (28%)
We also found that 80% of these respondents plan to produce non-text files in their research, including such forms as tabular data and software code.
13
The ETD+ Toolkit helps the academic community to train students to ensure the longevity and accessibility of their research outputs.
Based on our findings, we designed this Toolkit to:
-Help students make sure that their research outputs are stored and maintained in durable formats and on durable devices;
-Help students make informed decisions about file formats, documentation, and rights.
The Toolkit also contains resources to help administrators better understand the digital research outputs students are creating and assess what they need to collect and care for as part of the institutional memory.
14
What is the Toolkit?
An open set of six modules and evaluation instruments that prepare students to create, store, and maintain their research outputs.
So, what is the toolkit? It’s an open set of … Each is designed to stand alone; they may also be used as a series.
15
MODULE 1: COPYRIGHT How can students gain appropriate permissions and how can students signal copyright for their own works?
MODULE 2: DATA ORGANIZATION How can students structure, describe, store, and deposit data and research files for reuse and/or future access?
MODULE 3: FILE FORMATS How will the formats students choose make future access to their research easier or more difficult?
MODULE 4: METADATA How can students store information describing their files to make sure they can tell what they are in the future?
MODULE 5: STORAGE How can students make well informed choices about where to store their research materials?
MODULE 6: VERSION CONTROL What mechanisms can students use to make it easier to see the history of a file with multiple versions?
As you can see on this slide, the modules cover a wide range of practices. These modules are introductory in nature; they present a concise set of information that can be covered by a one-hour workshop, and then they also provide a lot of “jumping off points” to deeper materials and resources that students may consult following that workshop.
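As one example of the kind of habit the Metadata and Version Control modules point toward, a student could keep a small descriptive "sidecar" file next to each research file. The sketch below is hypothetical and is not taken from the Toolkit: the function name `write_sidecar`, the `.metadata.json` suffix, and the field names are all assumptions chosen for illustration.

```python
import json
from pathlib import Path


def write_sidecar(data_file: Path, description: str, creator: str, version: int) -> Path:
    """Write <name>.metadata.json beside data_file and return its path."""
    record = {
        "file_name": data_file.name,
        # Format is taken from the extension only; it is not validated.
        "file_format": data_file.suffix.lstrip(".").lower(),
        "size_bytes": data_file.stat().st_size,
        "description": description,
        "creator": creator,
        # An explicit version number answers "which file is final?"
        "version": version,
    }
    sidecar = data_file.with_name(data_file.name + ".metadata.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar
```

Plain-text JSON is used here because it stays readable without special software, which is the same concern the File Formats module raises.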
16
Each module includes:
-Learning Objectives
-One-page Handout
-Guidance Brief (customizable)
-Slideshow with presenter notes
-Evaluation survey
17
Who can use the Toolkit?
Anyone may freely adopt and adapt this toolkit.
We especially recommend its use by administrators, faculty, and librarians teaching students and by students seeking practical advice about digital content management. You can get started by going to the URL
18
ETDplus team:
-Educopia Institute
-MetaArchive Cooperative
-NDLTD
-ProQuest
-Carnegie Mellon University
-Colorado State University
-HBCU Library Alliance
-Indiana State University
-Oregon State University
-Penn State University
-Purdue University
-University of Louisville
-UNC School of Library and Information Science
-University of North Texas
-University of Tennessee - Knoxville
-Virginia Tech University
First, we should mention the authors. These Guidance Briefs have been created by a team of 12 universities in partnership with the Educopia Institute, the MetaArchive Cooperative, the Networked Digital Library of Theses and Dissertations, and ProQuest in order to help student researchers of all disciplines learn about the management of complex digital objects at the beginning of their careers.
We knew that a range of efforts have been launched in the last five years to train and prepare data scientists to manage, share, and ensure the sustainability of their data outputs, including DataONE’s webinars and education modules, the DMPTool Data Management General Guidance, and the Virginia-based Data Management Bootcamp for Graduate Students. These data science resources are connecting researchers with the different university-based units that can help support and sustain their data outputs (e.g., IT, library, Offices of Sponsored Research, etc.).
Beyond data science, however, we could locate few attempts to address the training needs of the broad array of researchers (humanities, social science, arts, and sciences alike) who increasingly need to manage complex digital objects. We determined that this was a critical area that needed improvement. We also determined that this need dovetailed with the need of ETD programs to provide a strong foundation for replication of research findings in ETDs.
Or to say that another way, ETDs are a core output of the university. They’re also a key training mechanism for students. Part of what the ETD process should produce is a record of research that enables it to be validated and replicated. In many cases, a text-based PDF will not do that. Many “theses and dissertations” are not text-based in reality; instead, they are often better represented in a range of other formats, e.g.,
-video/audio files of performances
-datasets from experiments
-computer programs
-GIS-based visualizations
19
Questions? Katherine Skinner @educopia