Carrying Your Environment With You, or Virtual Machine Migration Abstraction for Research Computing
Many research computing tools have specific environment requirements: libraries, filesystem structure, kernel flags or settings, and root access or tools. These requirements can limit the systems on which an application can run. Thus, it is desirable to have a reasonably transparent way for users to migrate applications to remote systems, where they run on predictable virtual systems, either to increase performance or to provide a required capability.
The aim is a command-line interface that requires no more data than a normal Condor user would provide, but that allows virtual machines to be easily integrated into distributed jobs. This requires a compatibility assessment tool, a UML migration and preparation subsystem, and test and verification capabilities.
Perl command-line interface: flags allow control of virtual machine deployment requirements. User Mode Linux (UML) virtual machines: a pre-built VM allows the user to simply specify the application and data, which are attached to the VM and deployed; user-space execution makes the VM very portable; the VM provides a known hardware/software configuration for applications that are sensitive to these variables. Condor: the job distribution framework allows us to treat UML VMs as applications that run tasks. Master/Worker control or other batch systems are a viable alternative.
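As a rough illustration of treating a UML VM as a Condor application, the sketch below renders a submit description in which the UML kernel binary is the job executable. It is not the tool described here: the function name `render_submit`, the file names, and the choice to attach the application and data as the second and third UML block devices (`ubd1`, `ubd2`) are all assumptions made for this example.

```python
def render_submit(kernel, root_image, app_image, data_image, instances, mem_mb=256):
    """Render a Condor submit description that runs a UML kernel as the job.

    Hypothetical helper: the real interface may attach application and data
    to the VM in a different way.
    """
    inputs = ",".join([root_image, app_image, data_image])
    # ubd0..ubd2 are UML block devices. Mapping the application and data
    # images to ubd1 and ubd2 is an assumption made for this sketch.
    arguments = (
        f"ubd0={root_image} ubd1={app_image} ubd2={data_image} mem={mem_mb}M"
    )
    lines = [
        "universe = vanilla",
        f"executable = {kernel}",
        f"arguments = {arguments}",
        f"transfer_input_files = {inputs}",
        "should_transfer_files = YES",
        "when_to_transfer_output = ON_EXIT",
        f"queue {instances}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative file names only.
    print(render_submit("linux", "root_fs", "app.img", "data.img", instances=4))
```

Because the VM runs entirely in user space, the submit description needs nothing beyond what an ordinary Condor job would specify: an executable, its arguments, and the files to transfer.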
Do user-mode virtual machines have reasonable performance for this purpose? What costs are incurred in migration? What elements are critical to determine if an environment is similar enough? How do you migrate Virtual Machines without serious bandwidth issues? How effectively can VM migration be abstracted, and how much must a user know about their requirements?
We have presumed that it is reasonable to expect that a researcher will know their application’s general requirements: architecture, specific libraries, etc. There are many possible ways to measure compatibility, so a flexible syntax must be provided. It should cover simple checks, such as Linux version, and more complex checks, such as specific library versions, kernel settings, and memory and tmpfs. A method to specify “required” versus “report differences” is desirable in some circumstances.
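The distinction between "required" and "report differences" checks can be sketched as follows. This is a minimal illustration, not the syntax of the actual tool: the `Check` class, the `assess` function, and the sample facts are hypothetical, and version comparison assumes purely numeric dotted versions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Check:
    name: str                        # key into the facts dictionary
    accept: Callable[[str], bool]    # predicate on the observed value
    required: bool = True            # False means "report differences" only


def version_at_least(minimum: str) -> Callable[[str], bool]:
    """Build a predicate comparing dotted numeric versions, e.g. '2.6.18'."""
    def parse(version: str) -> Tuple[int, ...]:
        return tuple(int(part) for part in version.split("."))
    return lambda observed: parse(observed) >= parse(minimum)


def assess(facts: Dict[str, str], checks: List[Check]) -> Tuple[bool, List[str]]:
    """Return (compatible, report). Only failed required checks block a host."""
    compatible = True
    report = []
    for check in checks:
        observed = facts.get(check.name)
        if observed is not None and check.accept(observed):
            continue
        kind = "REQUIRED" if check.required else "DIFFERENCE"
        report.append(f"{kind}: {check.name} = {observed!r}")
        if check.required:
            compatible = False
    return compatible, report


# Hypothetical facts gathered from a remote host.
facts = {"kernel": "2.6.18", "arch": "x86_64", "libz": "1.2.3"}
checks = [
    Check("arch", lambda value: value == "x86_64"),
    Check("kernel", version_at_least("2.6.9")),
    Check("libz", version_at_least("1.2.5"), required=False),
]
compatible, report = assess(facts, checks)
print(compatible, report)
```

Here the older library version is reported as a difference but does not disqualify the host, while a mismatched architecture or kernel would.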
Benchmark: grep "GAATTCATTCCTACCTGGGT" M.fastq | wc -l, where M.fastq is a 3.2 GB text file. [Chart: VM vs. Native. Table columns: Machine, VM, Native, Slowdown. Rows: 8-core, 64-bit; core, 32-bit; core, 64-bit.]
Research workload: a 32-bit application comparing an 8kb file against ~5 GB. VM: 463 s average, 320 s average. Native: 128 s average, 194 s average. [Table columns: Machine, VM, Native, Slowdown. Rows: 8-core, 64-bit; core, 32-bit; core, 64-bit.]
Data and VM decompression costs outweigh transfer costs on fast networks; on slow networks, transfer dominates. For performance, a job should be run in a VM only if the job length exceeds the total cost of transfer and initialization after accounting for slowdown, and the number of machines is larger than the slowdown ratio for that job. In general, once you have accounted for this:
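One way to read this rule of thumb, offered as an illustration and not as the original analysis, is to compare a single native run against an evenly split run across several VMs. The sketch assumes perfect parallel scaling and a single fixed overhead for transfer, decompression, and initialization. The function names and the example numbers are hypothetical.

```python
def vm_run_time(native_seconds, slowdown, machines, overhead_seconds):
    """Estimated wall-clock time when the job is split evenly across VMs.

    Assumes perfect parallel scaling, which real jobs rarely achieve.
    """
    return overhead_seconds + slowdown * native_seconds / machines


def worth_running_in_vm(native_seconds, slowdown, machines, overhead_seconds):
    """True when the distributed VM run is expected to beat one native run."""
    if machines <= slowdown:
        # With too few machines, the slowdown alone cancels the parallel gain.
        return False
    estimate = vm_run_time(native_seconds, slowdown, machines, overhead_seconds)
    return estimate < native_seconds


if __name__ == "__main__":
    # Hypothetical numbers: a one-hour job, 2.5x slowdown, 10 machines,
    # 10 minutes of transfer and initialization.
    print(vm_run_time(3600, 2.5, 10, 600))
    print(worth_running_in_vm(3600, 2.5, 10, 600))
    # A five-minute job does not recover the same overhead.
    print(worth_running_in_vm(300, 2.5, 10, 600))
```

Under these assumptions both conditions from the text appear directly: the machine count must exceed the slowdown ratio, and the job must be long enough that the fixed overhead is recovered.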
Improve compatibility checks. Investigate optimized VM migration/provisioning methods (support for COW where distributed filesystems are available). Investigate use of the Master/Worker paradigm to improve distribution and workload capability. Build a method to handle “ignorant” users by providing automated testing of applications in a very limited manner: test the application in available VMs, verify results for the sample, then deploy in that mode.
Distributed filesystems allow massive improvements in initialization speed and data distribution times. UML supports COW (copy-on-write), a method that allows a single VM filesystem to be shared by many VMs. HostFS support means that the VM doesn’t have to be aware of the network or the filesystem itself. Advantages of Master/Worker vs. normal Condor: on job completion, Condor will remove the VM, requiring re-distribution if all tasks are not completed in a single use of the VM.
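The COW sharing described above can be sketched with UML's `ubd0=<cow file>,<backing file>` block-device syntax: each VM writes to its own small COW file while reading from one shared root image. The helper name `uml_command`, the path of the shared image, and the per-worker file naming are assumptions for illustration.

```python
def uml_command(kernel, shared_root, worker_id, mem_mb=256):
    """Build argv for one UML instance with a private COW layer.

    All instances read from shared_root (for example, an image on a
    distributed filesystem). Writes go to a per-worker COW file.
    """
    cow_file = f"worker{worker_id}.cow"  # hypothetical naming scheme
    return [kernel, f"ubd0={cow_file},{shared_root}", f"mem={mem_mb}M"]


if __name__ == "__main__":
    # Illustrative paths: three workers sharing one root image.
    for worker in range(3):
        print(" ".join(uml_command("./linux", "/shared/root_fs", worker)))
```

With this layout only the small COW files differ between workers, so the full root image does not need to be copied to every host.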