SAYING WHAT WE DO – DOING WHAT WE SAY: Preservation Issues (Metadata And Otherwise) In Institutional Repositories Sarah L. Shreeves University of Illinois.

Presentation transcript:

SAYING WHAT WE DO – DOING WHAT WE SAY: Preservation Issues (Metadata And Otherwise) In Institutional Repositories Sarah L. Shreeves University of Illinois at Urbana-Champaign (with many thanks to Tim Donohue) Intellectual Access to Preservation Metadata IG ALA Annual Conference – July 12, 2009

WARNING! Metadata will not be a huge part of this talk mostly because, well, most IRs don’t do a good job at preservation metadata (or descriptive metadata for that matter). More on that later….

Why do we start IRs?
- Centralize access to material produced at institution
- Create environment for preservation and permanent access to material
- Provide open access to content
- Advance a new scholarly communication model
Rieh, SY et al “Perceptions and Experiences of Staff…” Library Trends 57 (2)

Why do we start IRs?
- “an exploration or an experiment”
- “don’t have a clear notion of what it will become… [we’re] asking [people on campus] to help us define what it can do for them…”
- “a trend we should explore”

Preservation Challenge for IRs
IRs can receive pdf, doc, xsl, html, xml, txt, jpg, tiff, jp2, csv, rtf, avi, mp3, ppt, wav, ogg, png, gif, ram, odt…. from faculty, staff, and students, with little to no knowledge of how the materials were produced, of their context, or of the answers to questions like: DRM? Embedded files? Lossy compression? Macros?

- Regular backups = digital preservation
- TRAC compliance is part of the digital preservation program
- “Not many interviewees were interested in digital preservation issues”
- “Those that were [interested] consistently emphasized that IR staff should know what they are promising.”
- Confident in the long term sustainability of IRs
- Interviewees were “far less coherent when discussing digital preservation.”
From MIRACLE study at Univ. of Michigan

Why this study in contrasts?
- Preservation is something we can do later….
- Our software and technical infrastructure just does preservation….
- It’s too hard to get our software and technical infrastructure to do that…
- No staff, resources, training, expertise….
- It’s too hard, period... We can’t deal with data sets! We can’t deal with audio and video! We can’t deal with complex objects! We can’t deal with petabytes!

In short
IR managers have been so distracted by access and ingest issues that very little attention has been given to date to the problem of how promises to preserve this material will be honored. Building an IR without making plans for technological, organizational, and resource allocation is like building a house on sand.
McGovern and McKay “Leveraging short term opportunities….” Library Trends 57 (2)

Deep breath!

Promises, Promises
“create a reliable and easy to use repository service to preserve, manage, and provide persistent and widespread access to the digital scholarship faculty and students now produce…”
- Can we really commit to preserving everything?
- What does it really mean to preserve this stuff?
- What kind of staff expertise do we need?
- What kind of resources do we need?
- What kind of technical infrastructure do we need? (DSpace was mostly already chosen…)

Getting our act together
1. Started talking to our Preservation Librarian!
2. Training and self education
3. Assessment of where we were and where we needed to go

Takeaways
- “Preservation” needs to be unpacked.
- Not about the technology.
- Explicitness is key.
- You don’t have to preserve everything to the fullest extent if you say you aren’t.

From Dorothea Salo Institutional repositories for the digital arts and Humanities. Humanities Digital Curation Institute. Champaign IL. May

Getting our act together pt 2
- Secured explicit administrative support and commitment for digital preservation management program in IDEALS.
- Developed high-level preservation policy:
- Developed actionable procedures and policies that can be reassessed and changed as needed
- Began next stage of identifying gaps, like….

Getting our act together pt 2
Backup tapes stored next to the server!
[Photo: Not Really Our Server Room! Photo by Sylvar. Used under a Creative Commons 2.0 Attribution license.]

Digital Preservation Support
Format-based Categories of Support:
- High Confidence: Full Support (including migration)
- Medium Confidence: No migration promised
- Low Confidence: “Bit-level” support only
[Figure labels: Low Confidence (gray area); size ≠ weight]

Format Support Matrix
- Compilation of “known” formats
- Concentration on textual formats
Criteria (less preferred vs. more preferred, with examples):
- Proprietary (Microsoft Office) vs. Open (OpenOffice.org, HTML)
- Limited Adoption (OpenOffice.org) vs. Widely Adopted (Microsoft Office, HTML)
- Limited Support (Microsoft Office) vs. Widely Supported (Adobe PDF, HTML)
- Embedded Content / DRM (MS PowerPoint w/ Audio or Video) vs. Nothing Embedded (MS PowerPoint)
- Lossy Compression (JPEG) vs. No/Lossless Compression (TIFF, JPEG 2000)

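Format Recommendations
- Textual. High Confidence / Preference: CSV, Text, PDF/A, XML*, Open Document Format. Medium Confidence / Preference: RTF, MS Office, PDF, HTML
- Audio. High Confidence / Preference: AIFF, WAVE, Ogg Vorbis, FLAC. Medium Confidence / Preference: AAC, MP3, Real, WMA
- Images. High Confidence / Preference: TIFF, JPEG 2000. Medium Confidence / Preference: GIF, JPEG, PNG
- Video. High Confidence / Preference: AVI, Motion JPEG 2000. Medium Confidence / Preference: MP2, MP4, Quicktime, WMV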
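As a rough illustration of how format-based categories like these might be applied at ingest, here is a minimal Python sketch. The extension sets and the function name `support_level` are assumptions made for this example; they stand in for a few of the formats named on the slide and are not the actual IDEALS format registry. Note that an extension alone cannot tell PDF/A from ordinary PDF, so `.pdf` is conservatively treated as medium here.

```python
from pathlib import PurePath

# Hypothetical extension sets, loosely based on the formats named on the slide.
# They are illustrative only and not the repository's real registry.
HIGH_CONFIDENCE = {".csv", ".txt", ".xml", ".aiff", ".wav", ".flac",
                   ".tif", ".tiff", ".jp2", ".avi"}
MEDIUM_CONFIDENCE = {".rtf", ".doc", ".ppt", ".pdf", ".html", ".mp3",
                     ".gif", ".jpg", ".jpeg", ".png", ".mp4", ".wmv"}


def support_level(filename: str) -> str:
    """Return 'high', 'medium', or 'low' based only on the file extension."""
    extension = PurePath(filename).suffix.lower()
    if extension in HIGH_CONFIDENCE:
        return "high"
    if extension in MEDIUM_CONFIDENCE:
        return "medium"
    # Anything unrecognized falls back to bit-level support only.
    return "low"


if __name__ == "__main__":
    for name in ("survey.csv", "thesis.PDF", "model.xyz"):
        print(name, "->", support_level(name))
```

Defaulting unknown formats to "low" mirrors the idea that bit-level support is the floor for everything deposited.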

What we are doing
Basic Activities (All Items: )
- Regular Virus Scans, Checksum verification
- Nightly off-campus backups
- Refresh storage media
- Preservation Metadata (minimal): format, checksum, file size, etc.
- Permanent Identifiers (Handles)
- Always keep the original document
- Monitoring and reassessment of formats. Very minimal/infrequent for
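The checksum verification and minimal preservation metadata mentioned above can be sketched roughly as follows. This is an illustration under assumptions, not the repository's implementation: the function names `fixity_record` and `verify_fixity`, the dictionary keys, and the choice of SHA-256 are all made up for the example.

```python
import hashlib
import os
import tempfile


def fixity_record(path, algorithm="sha256", chunk_size=65536):
    """Collect minimal metadata: declared format, file size, and checksum."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return {
        # Taken from the extension only; this is declared, not verified.
        "declared_format": os.path.splitext(path)[1].lstrip(".").lower(),
        "size_bytes": os.path.getsize(path),
        "checksum_algorithm": algorithm,
        "checksum": digest.hexdigest(),
    }


def verify_fixity(path, record):
    """Recompute the checksum and size and compare with the stored record."""
    current = fixity_record(path, record["checksum_algorithm"])
    return (current["checksum"] == record["checksum"]
            and current["size_bytes"] == record["size_bytes"])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "sample.txt")
        with open(sample, "wb") as handle:
            handle.write(b"deposited content")
        stored = fixity_record(sample)
        print("unchanged file verifies:", verify_fixity(sample, stored))
```

A scheduled job could call `verify_fixity` for each stored record and flag any file that no longer matches.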

What we are doing
Intermediate Activities ( )
- Additional monitoring, more frequent reassessment
- When possible, attempt to migrate formats to preserve content and style (hopefully)
- No promises that functionality will be preserved
- (e.g.) PowerPoint → PDF (possible functionality loss)
- (e.g.) PDF 1.4 → PDF/A (possible style loss)

What we are doing
Full Support Activities ( )
- Additional monitoring, more frequent reassessment
- When necessary, migrate document to successive format.
- Attempt to preserve content, style and functionality
- (e.g.) PDF/A → successor to PDF/A

About that metadata….
We automatically collect:
- type of format (but this is not verified)
- size of file
- provenance information (who deposited it and when; automatic conversion activities; and SOME changes that occur later in a file's life)
- checksum
If we make manual changes, our procedure is to manually add a note about them to the provenance information.
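One way to picture the provenance practice described above is a simple append-only log that distinguishes automatic events from manually recorded ones. The sketch below is hypothetical: the class names `ProvenanceEvent` and `ProvenanceLog`, their fields, and the example actors are assumptions for illustration, not the data model of any particular repository software.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class ProvenanceEvent:
    actor: str          # who or what performed the action
    action: str         # free-text description of what happened
    automatic: bool     # True if recorded by the system, False if added by staff
    timestamp: datetime


@dataclass
class ProvenanceLog:
    events: List[ProvenanceEvent] = field(default_factory=list)

    def record(self, actor: str, action: str, automatic: bool = True,
               when: Optional[datetime] = None) -> ProvenanceEvent:
        """Append an event; entries are never edited or removed."""
        event = ProvenanceEvent(actor, action, automatic,
                                when or datetime.now(timezone.utc))
        self.events.append(event)
        return event

    def manual_events(self) -> List[ProvenanceEvent]:
        """Return only the entries that staff added by hand."""
        return [event for event in self.events if not event.automatic]


if __name__ == "__main__":
    log = ProvenanceLog()
    log.record("depositor@example.edu", "deposited file")
    log.record("repository staff", "replaced file with a resubmitted copy",
               automatic=False)
    print(len(log.events), "events,", len(log.manual_events()), "manual")
```

Keeping manual notes in the same log as automatic events means a later reader sees one timeline for the file.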

Our First Problem…
- Character issues in Word (and PDF)
- Found by chance
- Consultation with submitter
- Originally WordPerfect
- Re-submitted as RTF

Big Gaps!
- We aren’t checking the validity of formats
- We collect pretty minimal metadata
- We’re not checking every file for problems
- We don’t check every automated conversion
BUT
- We do explicitly acknowledge these gaps.
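To make the first gap concrete, here is a small Python sketch of the weakest possible format check: comparing a file's leading bytes with the well-known signature for its declared format. The function name `signature_matches` and the short signature table are assumptions for this example. A signature match is far short of full format validation; it only catches files whose declared type is plainly wrong.

```python
# Well-known leading byte signatures for a handful of formats.
# The table is deliberately tiny; it is an illustration, not a format registry.
SIGNATURES = {
    "pdf": (b"%PDF-",),
    "png": (b"\x89PNG\r\n\x1a\n",),
    "gif": (b"GIF87a", b"GIF89a"),
    "jpg": (b"\xff\xd8\xff",),
    "tiff": (b"II*\x00", b"MM\x00*"),
    "rtf": (b"{\\rtf",),
}


def signature_matches(declared_format, first_bytes):
    """Return True or False for known formats, or None if no signature is listed."""
    signatures = SIGNATURES.get(declared_format.lower())
    if signatures is None:
        return None
    # bytes.startswith accepts a tuple of candidate prefixes.
    return first_bytes.startswith(signatures)


if __name__ == "__main__":
    print(signature_matches("pdf", b"%PDF-1.4\n"))   # declared and actual agree
    print(signature_matches("pdf", b"{\\rtf1\\ansi"))  # declared PDF, looks like RTF
    print(signature_matches("wpd", b"anything"))     # format not in the table
```

Returning `None` for unlisted formats keeps "we could not check" distinct from "the check failed", which matters if the result is recorded as metadata.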

Some questions….
- What’s the right balance in IRs?
- Is transparency an issue?
- Are some materials more deserving of ‘full’ preservation than others in our IRs?

Contact Information Sarah Shreeves Coordinator, IDEALS