Preserving containers
Euan Cochrane
Digital Preservation Manager
Yale University Library
How long do we need to preserve data and software for reproducibility purposes?
Short answer: Forever
Long(er) answer: It depends on your philosophy of science and your faith in humanity
“non-reproducible single occurrences are of no significance to science”
Karl Popper, The Logic of Scientific Discovery, Routledge, London, 1992, p. 66.
“No amount of experimentation can ever prove me right; a single experiment [at any point in time] can prove me wrong.”
Albert Einstein (allegedly)
Will humanity ever not want to have the option to reproduce computational science from today?
How long will containers be usable?
http://stackoverflow.com/questions/17934004/how-does-docker-allow-portable-containers-if-the-kernel-libraries-change
NB: Interesting conversation about ABIs here: https://plus. google
http://unix.stackexchange.com/questions/47495/oldest-binary-working-on-linux
Linux-dependent containers can only be guaranteed to remain usable for as long as the operating system itself is usable
Windows/Mac containers will be worse off
Try running old Windows programs in Windows 10, even with the compatibility layer
Which version of Windows? Windows RT? Windows IoT? Windows 32-bit?
Apple completely dropped support for PowerPC software after OS X Snow Leopard (Rosetta was removed in OS X Lion, 10.7)
Q: How long will containers be usable without intervention?
A: As long as the operating systems they depend on are usable
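One way to make this dependency visible is to record the host details alongside each container at the time it is known to work. The following is a minimal sketch using only Python's standard `platform` and `json` modules; the function names and the idea of a JSON "sidecar" file are assumptions made for illustration, not part of Docker or of any preservation tool named in this talk.

```python
import json
import platform


def host_fingerprint():
    """Collect the host details a container implicitly depends on."""
    return {
        "system": platform.system(),           # operating system name, e.g. "Linux"
        "kernel_release": platform.release(),  # kernel/OS release string
        "machine": platform.machine(),         # CPU architecture, e.g. "x86_64"
    }


def to_sidecar(fingerprint):
    """Serialise the fingerprint as JSON to store next to the container image."""
    return json.dumps(fingerprint, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(to_sidecar(host_fingerprint()))
```

A record like this does not keep the container running, but it tells a future archivist which operating system and architecture would need to be preserved or emulated.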
So what about the operating systems?
Challenges to operating system compatibility over time
Loss of backwards compatibility of new hardware with old software has happened many times in the past
  E.g. Mac OS X Panther (version 10.3) requires a PowerPC processor
Old operating systems often cannot interface with modern hardware
Raspberry Pi (ARM) operating systems will not run on x86 hardware. Will Raspberry Pi follow Apple and move to x86 processors?
ARM builds of Microsoft Windows Internet of Things edition will not run on x86 hardware
Future advances such as quantum computing or 128-bit processors could remove backwards compatibility with older operating systems
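The architecture examples above amount to a simple lookup: software built for one processor family runs natively only on hosts that support that family, and needs an emulator everywhere else. The sketch below illustrates that reasoning; the table is a deliberately simplified assumption (real compatibility also depends on the operating system, firmware, and peripherals), and the names `NATIVE_SUPPORT` and `needs_emulation` are hypothetical.

```python
# Simplified, illustrative table: which software architectures each
# host architecture can execute without an emulator.
NATIVE_SUPPORT = {
    "x86_64": {"x86_64", "i386"},  # 64-bit x86 hosts can also run 32-bit x86 code
    "i386": {"i386"},
    "arm": {"arm"},
    "ppc": {"ppc"},
}


def needs_emulation(software_arch, host_arch):
    """Return True if the host cannot run the software's architecture natively."""
    return software_arch not in NATIVE_SUPPORT.get(host_arch, set())


if __name__ == "__main__":
    # A PowerPC-only operating system on a modern x86 host
    print(needs_emulation("ppc", "x86_64"))
    # A Raspberry Pi (ARM) operating system on an x86 host
    print(needs_emulation("arm", "x86_64"))
```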
Summary: We can’t just put things in containers; we also need to preserve the containers
How to preserve containers
Preserve access to the operating systems
Preserve the operating systems
Maintain and develop emulators
Preserving operating systems is achievable
One preserved instance of an operating system can support limitless numbers of compatible containers
We can use existing technologies and methods to preserve operating systems
(bwFLA) Emulation as a Service - EaaS
An emulation simplification tool
Enables remote access to emulated (or virtualized) machines via a web browser
Simplifies the use of emulation & virtualization in limitless workflows by providing a generic API to existing emulators
Enables citation of complex digital objects
Reduces preservation costs by sharing underlying (e.g. OS) bit streams amongst EMs
Can run remotely or on local hardware
Can pass hardware connections from host computer to emulated computers when run locally
http://eaas.uni-freiburg.de/
Docker package available for installation locally, see: http://bw-fla.uni-freiburg.de/wordpress/?p=817
How might using emulation for preserving containers be incorporated into scientific workflows?
During the research process, scientists test their containers to ensure they can run on Emulated Machines (EMs)
At the point of publication, scientists:
  Install (automatically where possible) published packages on a new EM derivative instance hosted by a digital archive
  Document and configure external data dependencies, either on the same EM or as an associated data source connectable to the preserved EM
  Receive a unique persistent URL for the EM and its networked/associated “external” dependencies
Scientists share the URL for their EM with reviewers and the community
The digital archive preserves the EM over time and provides appropriate access to it
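The publication steps above can be sketched as a small data model: a derivative EM is created from a preserved base, packages and external data sources are attached, and a stable URL is produced for sharing. This is only an illustration under stated assumptions. The class `EmulatedMachine`, the function `persistent_url`, the content-hash identifier scheme, and the `archive.example.org` host are all hypothetical and do not describe the EaaS API or any archive's actual service.

```python
from dataclasses import dataclass, field
import hashlib


@dataclass
class EmulatedMachine:
    base_os: str  # preserved operating system image the EM builds on
    packages: list = field(default_factory=list)      # published software packages
    data_sources: list = field(default_factory=list)  # external data dependencies

    def derive(self, packages, data_sources):
        """Return a new derivative EM; the base instance is left untouched."""
        return EmulatedMachine(
            base_os=self.base_os,
            packages=self.packages + list(packages),
            data_sources=self.data_sources + list(data_sources),
        )

    def identifier(self):
        """Stable ID computed from the EM's contents (an illustrative choice)."""
        description = "|".join([self.base_os, *self.packages, *self.data_sources])
        return hashlib.sha256(description.encode("utf-8")).hexdigest()[:16]


def persistent_url(em, archive_root="https://archive.example.org/em"):
    """Build the URL a scientist would share; the host name is a placeholder."""
    return f"{archive_root}/{em.identifier()}"


if __name__ == "__main__":
    base = EmulatedMachine(base_os="example-linux-image")
    published = base.derive(["analysis-container"], ["example-dataset"])
    print(persistent_url(published))
```

Because the derivative only adds to a shared base, many published EMs can reuse one preserved operating system image, which is the cost saving the EaaS slide refers to.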
Challenges to achieving sustainable container preservation
Archives of preserved operating systems need to be funded, established and maintained
Instances of emulation services need to be running and accessible by scientists
Emulators need to be preserved
Big data makes this more complicated
The scientific community needs to buy into this vision
External data sources that are dependencies of the containers need to be preserved, documented, and usefully associated with the preserved containers via practical workflows
Thank you
Euan Cochrane
Digital Preservation Manager
Yale University Library
Euan.Cochrane@yale.edu
http://twitter.com/euanc
http://eaas.uni-freiburg.de