Tackling concrete digital preservation challenges with SPRUCE
Paul Wheatley, SPRUCE Project Manager, University of Leeds
Twitter:

Slides:




Summary
Some digital preservation challenges and solutions
–Not exhaustive
–Illustrate with some real examples
–Summarise with some practical steps for digital preservation
Taking a community approach to digital preservation
–SPRUCE Project
–How to get involved
–Where to get help

Keeping the bits

Digital data is fragile
(Image courtesy of State and University Library, Denmark)

Digital preservation storage: keeping the bits
–Media decay: media becomes partially or completely unreadable
–Media obsolescence: without the respective hardware to read the (hand-held) media, it becomes inaccessible
–Practical issues: inserting lots of discs into a drive is costly
(Images courtesy of The British Library)

Bit storage recommendations
Don’t fall for media longevity claims from vendors! They are missing the point!
Accept that media decays, media formats will change, and any media will become inaccessible in the medium term.
Rather than putting your data in a dark archive and trusting it will survive for a long period, manage it closely, refresh it to new media frequently, and choose media that is easy to manage:
–Choose media that is easy to access (server storage, cloud, external hard drives)
–Make at least 3 copies of all data, and keep copies in different geographical locations
–Frequently check the condition of your data
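The advice to keep at least three copies and check them frequently can be made concrete with a small script. The Python sketch below is an illustration, not a SPRUCE tool: it assumes each copy of a collection is reachable as a local directory, hashes every file with SHA-256, and uses a simple majority vote to point at the copy in which a file is missing or differs. The names `sha256_of` and `compare_copies` are hypothetical.

```python
import hashlib
from collections import Counter
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_copies(copy_roots: list[Path]) -> dict[str, list[str]]:
    """Compare every file across several copies of the same collection.

    Returns {relative path: [problems]} for files that are missing from a
    copy or whose checksum disagrees with the majority of copies.
    """
    # Every relative path that appears in at least one copy.
    all_files = set()
    for root in copy_roots:
        all_files.update(
            p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
        )

    problems: dict[str, list[str]] = {}
    for rel in sorted(all_files):
        # None marks a copy in which the file is absent.
        digests = {
            root: sha256_of(root / rel) if (root / rel).is_file() else None
            for root in copy_roots
        }
        counts = Counter(d for d in digests.values() if d is not None)
        majority, votes = counts.most_common(1)[0]
        if votes * 2 <= len(copy_roots):
            # No checksum is shared by more than half the copies: don't guess.
            problems[rel] = ["no majority checksum; inspect all copies"]
            continue
        issues = [
            f"missing in {root}" if digest is None else f"differs in {root}"
            for root, digest in digests.items()
            if digest != majority
        ]
        if issues:
            problems[rel] = issues
    return problems
```

With three copies, a single damaged or missing file can be traced to one copy and repaired from the other two; when no checksum has a majority, the sketch flags the file for manual inspection rather than guessing which copy is good.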

Are any of your digital files missing? Are any of your digital files damaged?
Verifiable manifests (checksums!)
–The single most useful digital preservation activity
–Generate manifests as early as possible
–Frequently re-check them over time
–Mend content when necessary
Library of Congress BagIt specification and Bagger tool
–Allow you to easily check the condition of your digital stuff
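A verifiable manifest can be as simple as one line per file giving a checksum and a relative path. This minimal Python sketch assumes SHA-256 and a `checksum  path` line layout loosely modelled on BagIt payload manifests; it is not a BagIt implementation (the Library of Congress tools are the better choice for that), and `write_manifest` and `verify_manifest` are hypothetical names.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(root: Path, manifest: Path) -> int:
    """Write one 'checksum  relative/path' line per file under root."""
    lines = [
        f"{sha256_of(p)}  {p.relative_to(root).as_posix()}"
        for p in sorted(root.rglob("*"))
        if p.is_file() and p != manifest  # never list the manifest itself
    ]
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


def verify_manifest(root: Path, manifest: Path) -> dict[str, list[str]]:
    """Re-check a manifest and report missing, damaged and unlisted files."""
    expected: dict[str, str] = {}
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if line.strip():
            checksum, rel = line.split("  ", 1)
            expected[rel] = checksum
    report: dict[str, list[str]] = {"missing": [], "damaged": [], "unlisted": []}
    for rel, checksum in sorted(expected.items()):
        path = root / rel
        if not path.is_file():
            report["missing"].append(rel)
        elif sha256_of(path) != checksum:
            report["damaged"].append(rel)
    on_disk = {
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p != manifest
    }
    report["unlisted"] = sorted(on_disk - set(expected))
    return report
```

Storing the manifest outside the collection directory keeps it from being mixed up with content. Re-running `verify_manifest` on a schedule answers both questions on the slide: which files are missing and which are damaged.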

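Dependence on software
[Figure: the marker structure of a JPEG file: SOI, APP0 (JFIF 1.2), APP13 (IPTC), APP2 (ICC), DQT, SOF0 (200x392), DRI, DHT, SOS, then entropy-coded segments ECS0, ECS1, ECS2… separated by restart markers RST0, RST1…]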
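The marker listing above is what a JPEG looks like to software that understands the format; without such software it is just a stream of bytes. As an illustration, this Python sketch walks the header segments of a JPEG file up to the start-of-scan (SOS) marker and labels them much as the slide does. It assumes a well-formed file, stops before the entropy-coded image data, and knows only a handful of marker names; `list_jpeg_segments` is a hypothetical helper, not a validation tool.

```python
import struct

# Names for a few common markers; anything else is shown as hex.
MARKER_NAMES = {
    0xD8: "SOI", 0xD9: "EOI", 0xDA: "SOS", 0xDB: "DQT", 0xC4: "DHT",
    0xDD: "DRI", 0xFE: "COM", 0xC0: "SOF0", 0xC1: "SOF1", 0xC2: "SOF2",
}


def marker_name(code: int) -> str:
    """Return a readable name for a JPEG marker code."""
    if 0xE0 <= code <= 0xEF:
        return f"APP{code - 0xE0}"
    return MARKER_NAMES.get(code, f"0x{code:02X}")


def list_jpeg_segments(data: bytes) -> list[str]:
    """List the header segments of a JPEG up to the start of scan."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    segments = ["SOI"]
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at byte {pos}")
        code = data[pos + 1]
        if code == 0xFF:  # fill byte before a marker
            pos += 1
            continue
        # The 2-byte big-endian length includes the length field itself.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        label = marker_name(code)
        if 0xE0 <= code <= 0xEF:
            # APPn segments start with a NUL-terminated identifier, e.g. "JFIF".
            label += " " + payload.split(b"\x00", 1)[0].decode("ascii", "replace")
        elif code in (0xC0, 0xC1, 0xC2):
            # SOF payload: precision (1 byte), then height and width (2 bytes each).
            height, width = struct.unpack(">HH", payload[1:5])
            label += f" {width}x{height}"
        segments.append(label)
        if code == 0xDA:  # entropy-coded image data follows the SOS header
            break
        pos += 2 + length
    return segments
```

For example, `print(" ".join(list_jpeg_segments(Path("scan.jpg").read_bytes())))`, with `Path` from `pathlib` and a hypothetical file name, prints a line in the same spirit as the slide.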

When it goes wrong…

Migration, Emulation and all that...
–Migrate: move content from an obsolete format to a more modern, usable format
–Emulate: recreate the original computing environment and run the obsolete software originally used
Words of caution:
–Is software obsolescence really a critical risk for our digital data?
–The debate continues... International Council on Archives Congress 2012: Michael Carden, National Archives of Australia: NAA migrates all content. Oliver Morley, UK National Archives: “digital formats have standardized”. Blogged by Inge Angevaare:
–The hard part is the quality assurance of the results. Was anything lost or damaged in the process?

Stuff happens!
Whenever a digital collection is moved, processed, curated or altered in any way... things can go wrong!
–Network dropouts at critical times
–Disks fill up, and any data subsequently copied there is lost
–Software bugs lead to unexpected results
–Human error leads to all sorts of issues
Stuff happens a lot more at scale!
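One defence against network dropouts, full disks and silent copy failures is never to trust a copy until it has been re-read and checked. This sketch assumes local file paths and uses `shutil.copy2` plus SHA-256; `verified_copy` is a hypothetical helper. A disk-full or I/O error surfaces as an `OSError` from the copy itself, while a copy that completes but differs from the source is caught by the checksum comparison and removed.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verified_copy(src: Path, dst: Path) -> str:
    """Copy src to dst and confirm the copy matches; return its SHA-256.

    Raises OSError if the copy fails or the copied bytes differ.
    """
    expected = sha256_of(src)  # checksum the source before copying
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)  # disk-full and I/O errors raise OSError here
    if sha256_of(dst) != expected:
        dst.unlink(missing_ok=True)  # don't leave a bad copy that looks complete
        raise OSError(f"copy of {src} to {dst} failed checksum verification")
    return expected
```

Re-reading straight after the copy may be served from the operating system's cache, so this checks the copy operation rather than the long-term health of the target disk; periodic manifest checks cover the latter.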

Digitisation post-processing corruption
(Images courtesy of The British Library)

TIFF to JPEG2000 migration corruption
(Images courtesy of The British Library)

JPEG 2000
–Format specification ambiguity and corresponding tool bugs
–JPEG 2000 files can be missing vital source resolution information
–Technology can be imperfect!
For more on JPEG 2000 format and tool risks see:
(Images courtesy of The British Library)
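Missing resolution information in JP2 files can be spotted by walking the file's box structure. In the JP2 format (JPEG 2000 Part 1), the header superbox `jp2h` may contain a resolution superbox `res `, which holds a capture resolution box (`resc`) and/or a default display resolution box (`resd`). This Python sketch only reports which of those boxes are present; it does not check their values or the codestream, for which a dedicated validator such as jpylyzer is more appropriate. `iter_boxes` and `resolution_boxes` are hypothetical names.

```python
import struct
from typing import Iterator, Optional


def iter_boxes(data: bytes, start: int = 0,
               end: Optional[int] = None) -> Iterator[tuple[str, int, int]]:
    """Yield (box type, content start, content end) for boxes in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        # Each box starts with a 4-byte length and a 4-byte type.
        length, raw_type = struct.unpack(">I4s", data[pos:pos + 8])
        header = 8
        if length == 1:  # a 64-bit extended length follows the type field
            (length,) = struct.unpack(">Q", data[pos + 8:pos + 16])
            header = 16
        elif length == 0:  # the box runs to the end of the enclosing range
            length = end - pos
        if length < header or pos + length > end:
            raise ValueError(f"corrupt {raw_type!r} box at byte {pos}")
        yield raw_type.decode("latin-1"), pos + header, pos + length
        pos += length


def resolution_boxes(data: bytes) -> list[str]:
    """List the boxes inside jp2h/res (e.g. 'resc', 'resd'); empty if none."""
    found: list[str] = []
    for box_type, start, end in iter_boxes(data):
        if box_type != "jp2h":
            continue
        for sub_type, sub_start, sub_end in iter_boxes(data, start, end):
            if sub_type == "res ":
                found.extend(name for name, _, _ in iter_boxes(data, sub_start, sub_end))
    return found
```

A file for which `resolution_boxes` returns a list without `'resc'` carries no capture resolution, which is the kind of gap described on the slide.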

Only process or alter digital content when it is absolutely necessary
Double-check everything
Make no assumptions
Assume nothing, validate everything

First steps in practical digital preservation
Prompt check-in: have you got what you thought you would receive?
–Check that the expected files are present, and open a random selection to verify the expected quality
–Request replacements from the supplier promptly
Create a verifiable manifest
–Create a top-down manifest file that lists each digital object in your collection as a relative filename and a checksum
–The Library of Congress BagIt specification and tools will also do a good job here
Make at least 3 copies. Protect the bits
–Keep a copy on easily accessible media
–Back up to tape or more disk. Keep copies in different geographical locations to guard against catastrophic disaster. Cloud storage is also an option.
Frequently inspect the condition of your data
–Revisit the collection, recalculate your manifests and verify that content has not been lost
–Do a test recovery of your backups to ensure they are working effectively!
Record the existence of each of your collections in a digital items register
–Record: what it is, who the responsible owner is, where it is, who owns it, and who can access it
Assume nothing, validate everything!
–Double-check any processes in the lifecycle that move or alter your digital content
–Built-in checks can be flawed; a second opinion is much more trustworthy
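The prompt check-in step can be partly automated. This Python sketch assumes the supplier provides a list of expected relative paths; it reports missing and unexpected files and picks a random sample for someone to open and inspect by eye. The function name `check_in` and the zero-byte check are illustrative additions, not part of the slide's recommendations.

```python
import random
from pathlib import Path
from typing import Optional


def check_in(received: Path, expected: list[str], sample_size: int = 10,
             seed: Optional[int] = None) -> dict[str, list[str]]:
    """Compare a delivery with the expected file list and choose files to inspect.

    'expected' holds relative paths, e.g. taken from a supplier's delivery note.
    """
    on_disk = sorted(
        p.relative_to(received).as_posix() for p in received.rglob("*") if p.is_file()
    )
    rng = random.Random(seed)  # a seed makes the sample reproducible
    return {
        "missing": sorted(set(expected) - set(on_disk)),
        "unexpected": sorted(set(on_disk) - set(expected)),
        # Zero-byte files are a cheap extra warning sign (an illustrative addition).
        "empty": [rel for rel in on_disk if (received / rel).stat().st_size == 0],
        # A random selection to open by hand and check for the expected quality.
        "inspect": sorted(rng.sample(on_disk, min(sample_size, len(on_disk)))),
    }
```

The automated part only tells you what arrived; the random sample is still there to be opened and looked at, since a checksum cannot say whether a scan is in focus.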

SPRUCE Project: Sustainable Preservation Using Community Engagement
–JISC funded
–2 years in length (until Nov 2013)
–£250k funding

Some observations
–Lack of focus on the real needs of digital preservation practitioners
–Insufficient collaboration and coordination
–Duplication of effort

The SPRUCE Mashup: Identify and Solve concrete problems
–3-day workshop for ~30 people
–Practitioners bring along digital collections
–We identify preservation challenges
–Pair up practitioners with technical experts
–Apply existing open source tools to solve the problems
–In doing so, we exchange knowledge about digital preservation
–Develop a supportive community
Glasgow Mashup, April 2012

What questions do practitioners want answered?
–What is this digital collection?
–What risks are associated with this digital collection?
–Separate collection content from temporary/other files.
–Identify and weed duplicate or similar files.
–Is the metadata consistent with the content?
–Are all the pages present in each issue?
–Are all digitised pages in focus?
–Are any files damaged?
–Are the files compliant with a particular profile?
See the results here:
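Two of the practitioner questions above, separating content from temporary files and weeding duplicates, have simple first-pass answers. This Python sketch treats files whose names match a hypothetical list of temporary/system patterns (`TEMP_PATTERNS`) as non-content and groups the remaining files by SHA-256 checksum, so it only finds byte-identical duplicates; spotting similar files (for example, the same image saved at different resolutions) needs other techniques. `triage` is a hypothetical name, and it only reports: nothing is deleted.

```python
import fnmatch
import hashlib
from collections import defaultdict
from pathlib import Path

# Hypothetical starting list of file names that are rarely collection content.
TEMP_PATTERNS = ["Thumbs.db", ".DS_Store", "~$*", "*.tmp", "*.bak"]


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def triage(root: Path) -> tuple[list[list[str]], list[str]]:
    """Return (groups of byte-identical files, likely temporary files)."""
    by_digest: dict[str, list[str]] = defaultdict(list)
    temporary: list[str] = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        if any(fnmatch.fnmatch(path.name, pattern) for pattern in TEMP_PATTERNS):
            temporary.append(rel)  # set aside for review, not deleted
            continue
        by_digest[sha256_of(path)].append(rel)
    duplicates = [paths for paths in by_digest.values() if len(paths) > 1]
    return duplicates, temporary
```

A person should still review both lists before weeding anything, since an apparently temporary or duplicate file may be the only record of something.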

Make it sustainable
–Work with practitioners to develop a business case for their work
–Make small funding awards to further develop and embed the work begun in the mashups
York Mashup, September 2011

Online collaboration
–Sharing requirements
–Sharing experiences: which tools worked well, which approaches should be avoided
–Building on existing tools, rather than reinventing the wheel
–Libraries + Information Science question and answer site:
–More recommended collaborative activities:

Thanks for listening! Any questions?
Paul Wheatley, SPRUCE Project Manager, University of Leeds