Identifying Barriers To File Rendering In Bit-level Preservation Repositories A Preliminary Approach Kyle R. Rimkus, University Library Scott D. Witmer,


Identifying Barriers To File Rendering In Bit-level Preservation Repositories: A Preliminary Approach
Kyle R. Rimkus, University Library
Scott D. Witmer, School of Information Sciences

Medusa Preservation Repository
As of November 2016:
- 30,000,000 files
- 86+ TB of storage (replicated 2x + 1 backup)
Sources of Medusa content:
- Digitization, in-house and with external vendors
  - Books, newspapers, documents
  - Manuscripts, photographs, maps
  - Audio and video
- Born digital electronic records
- Self-deposit of scholarly materials in the IDEALS institutional repository

Digital Preservation Challenge: Identifying and evaluating trusted file formats

File Rendering Profile

Testing Random Samples against Profile
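The slides do not say how the random samples were drawn. As a minimal sketch only, assuming files are grouped by lowercase file extension and a fixed-size random sample is taken from each group, the selection step could look like this in Python. The function name `sample_by_extension`, the `per_profile` parameter, and the example paths are hypothetical and do not come from the presentation.

```python
import random
from collections import defaultdict
from pathlib import PurePosixPath


def sample_by_extension(paths, per_profile, seed=0):
    """Group paths by lowercase extension and draw a random sample per group.

    Hypothetical helper: the grouping rule and sample size are assumptions,
    not the procedure used by the presenters.
    """
    groups = defaultdict(list)
    for path in paths:
        ext = PurePosixPath(path).suffix.lower().lstrip(".")
        # Files without an extension are kept in their own group.
        groups[ext or "(none)"].append(path)

    rng = random.Random(seed)  # fixed seed so a test run can be repeated
    return {
        ext: rng.sample(sorted(files), min(per_profile, len(files)))
        for ext, files in groups.items()
    }


example_paths = ["scans/page1.TIF", "scans/page2.tif", "maps/sheet.jp2", "README"]
samples = sample_by_extension(example_paths, per_profile=1)
```

Each sampled file would then be opened in the software named by the rendering profile for its format, and the outcome recorded as a pass or a fail.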

Reasons

Reason type: out of scope
Total  Reason
   48  System file not within scope of current testing
   12  Auxiliary file created and used by a software program, not meant to be opened as individual file
    9  Not meant to be opened: Mac system file with underscore in name
    5  Not a file: artifact of disk formatting
    2  Software available on market, but testers have not yet acquired it
    1  Not meant to be opened: software system file with @ symbol in name
       Not meant to be opened: temporary file with ~$ in name
   78  TOTAL OUT OF SCOPE

Reason type: file management
Total  Reason
   16  No file extension
   14  Despite file extension, file is in a folder designating it for another system purpose
       Not a file extension
       Saved with incorrect extension
   34  TOTAL FILE MANAGEMENT

Reason type: problematic file
Total  Reason
   13  Software considers file invalid
    3  File does not render in software
       Software unavailable
       Software attempts to convert file to new version of format and fails
   18  TOTAL PROBLEMATIC FILE

  130  TOTAL ALL CATEGORIES

Testing Profile   Pass   Fail   Total Tested
TIFF              1276      1           1277
JPEG              1124     13           1137
JPEG2000           325    434            759
XML                540      2            542

Values for the remaining profiles survive only in part, in this order: PDF 402; GIF 192, 3; HTML 130; TXT 114; EMLX 81; DOC 37, 39.
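To make the pass and fail counts easier to compare, the failure rate for each profile can be computed as failed / (passed + failed). The sketch below uses only the four profiles whose pass, fail, and total values all survive in the transcript (TIFF, JPEG, JPEG2000, XML). The variable and function names are illustrative.

```python
# (passed, failed) counts for the profiles reported in full in the transcript
results = {
    "TIFF": (1276, 1),
    "JPEG": (1124, 13),
    "JPEG2000": (325, 434),
    "XML": (540, 2),
}


def failure_rate(passed, failed):
    """Share of tested files that failed to render."""
    return failed / (passed + failed)


rates = {name: failure_rate(p, f) for name, (p, f) in results.items()}

# Print profiles from highest to lowest failure rate.
for name, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<9} {rate:6.1%}")
```

On these figures JPEG 2000 fails in roughly 57% of tests (434 of 759), while TIFF, JPEG, and XML each fail in a little over 1% of tests or less.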

Conclusion and Next Steps
- Revisit JPEG 2000 Policy
- Remediate problem files to TIFF format
- Use TIFF for preservation master files
- Explore born digital electronic records

Based on these results, we recommend that the batches of problematic JPEG 2000 files in Medusa be isolated and remediated to TIFF format. The failure rate of this subpopulation indicates that there may be around 700,000 files whose structure is unreadable by the Library’s image rendering software. These files don’t represent an immediate preservation risk. They can be reformatted. However, they do represent an access barrier to users.

UIUC Library has also shifted its policy to using TIFF for preservation master files. JPEG 2000 will still be used, but in a more limited scope. Both online web applications and back-end image presentation systems render JPEG 2000 files quickly and efficiently, ensuring its ongoing value to the UIUC Library.

The next stage of research involves the development of an improved methodology for testing collections of born digital electronic records. Born digital records make up only 2 out of 60 TB of Medusa storage, but they are disproportionately represented in failed tests. 52 of the non-JPEG 2000 failures originated from born digital collections. This failure rate demands closer attention from Medusa’s preservation managers and perhaps a reconsideration of the appraisal and curation requirements of born digital records.

In conclusion, our analysis revealed some of the access challenges curators and patrons may face when attempting to open files stewarded in the Medusa repository. While we don’t know if these local challenges are shared by other institutions, we hope that the testing method and evidence-based approach demonstrated here may be useful to others in assessing their own file rendering issues.
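The estimate of around 700,000 affected files comes from applying a sample failure rate to a larger population of files. The transcript does not give the size of that population or the exact rate used, so the sketch below only illustrates the arithmetic. It takes the JPEG 2000 sample result (434 failures in 759 tests), uses a clearly hypothetical population size, and adds a normal-approximation 95% interval. `estimate_failures` and `HYPOTHETICAL_POPULATION` are illustrative names, not values from the presentation.

```python
import math


def estimate_failures(failed, tested, population, z=1.96):
    """Scale a sample failure rate to a population.

    Returns (low, point, high) file counts, using a normal-approximation
    interval for the sample proportion.
    """
    rate = failed / tested
    margin = z * math.sqrt(rate * (1 - rate) / tested)
    low = max(0.0, rate - margin)
    high = min(1.0, rate + margin)
    return tuple(round(x * population) for x in (low, rate, high))


# Assumption for illustration only: the real number of JPEG 2000 files
# in the affected batches is not stated in the transcript.
HYPOTHETICAL_POPULATION = 1_200_000

low, point, high = estimate_failures(434, 759, HYPOTHETICAL_POPULATION)
print(f"Estimated unreadable files: {point:,} (95% interval {low:,} to {high:,})")
```

The interval reflects sampling error only. It assumes the tested files were a simple random sample of the population being estimated.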