Virtual Machines in Condor
Jaeyoung Yoon
Computer Sciences Department
University of Wisconsin-Madison
Virtual Machines
› VMware
   Pros: full virtualization (the guest OS needs no modification) and checkpoint/restart capability
   Cons: commercial product; lower performance than Xen
› Xen
   Pros: open source, good performance, checkpoint/restart and live migration capability
   Cons: requires a modified guest OS; memory must be divided between the host and the VMs in advance
› UML (User Mode Linux), etc.
Benefits of Using Virtual Machines in Condor
› Sandbox
   Security and isolation
› Independent environment
   The environment can be customized for Condor
› Several OSes on a single physical machine
   Support for a wider variety of jobs
› Finer resource control
   The memory size of each VM can be assigned explicitly
› Checkpoint and migration
   The entire memory of a VM can be saved (suspended) and restored (resumed) later
Difficulties of Using Virtual Machines in Condor
› System memory is hard to manage efficiently
› Some information about the host machine must be available inside the VM
› Some environment setup is needed inside the VM
› If a VM cannot use the distributed file system, Condor's file transfer or remote I/O mechanism must be used
› Each VM needs its own IP address
How to Use VMs in Condor: Scenario 1
› An already launched VM serves as an execution machine for Condor jobs
› Condor daemons are installed and run on both the virtual machine and the host machine, and both are exposed to the pool
› The Condor startd on the host machine controls when a launched VM may be used for Condor
› Supported by Condor and all future releases
› Pros: easy to implement
› Cons: inefficient memory management
[Figure: Scenario 1. Central Manager (Collector, Negotiator); Submit machine (Schedd); Execution machine consisting of a Host Machine running a Startd and a Virtual machine running its own Startd; arrows show the communication pathways.]
Current Implementation: How Does a VM Get Information about Its Host Machine?
› Configuration
   Virtual machine: VMP_HOST_MACHINE = host.domain.com
   Host machine: VMP_VM_LIST = vmware1.domain.com
› 1. The VM queries the host machine for its ClassAd
› 2. The host machine returns its ClassAd, and the VM adds the host's attributes to its own ClassAd with a HOST_ prefix
   ClassAd for host: Name = "host.domain.com", TotalLoadAvg = …, KeyboardIdle = 50, …
   ClassAd for VM before query: Name = "vmware1.domain.com", TotalLoadAvg = …, KeyboardIdle = …, …
   ClassAd for VM after query: Name = "vmware1.domain.com", TotalLoadAvg = …, KeyboardIdle = …, …, HOST_Name = "host.domain.com", HOST_TotalLoadAvg = …, HOST_KeyboardIdle = 50, …
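The before/after ClassAds above amount to copying each host attribute into the VM's ClassAd under a HOST_ prefix. A minimal Python sketch, assuming a ClassAd can be modeled as a plain dictionary (real ClassAds hold expressions, not just values); the function name merge_host_ad and the load and idle values are hypothetical, and only the names and the host's KeyboardIdle = 50 come from the slide:

```python
from typing import Any, Dict

# Assumption: a ClassAd is modeled as a plain dict of attribute -> value.
ClassAd = Dict[str, Any]


def merge_host_ad(vm_ad: ClassAd, host_ad: ClassAd, prefix: str = "HOST_") -> ClassAd:
    """Return a copy of vm_ad with every host attribute added under `prefix`."""
    merged = dict(vm_ad)  # the VM's own attributes are kept unchanged
    for attr, value in host_ad.items():
        merged[prefix + attr] = value
    return merged


# Hypothetical values; only the names and the host's KeyboardIdle come from the slide.
host_ad = {"Name": "host.domain.com", "TotalLoadAvg": 0.1, "KeyboardIdle": 50}
vm_ad = {"Name": "vmware1.domain.com", "TotalLoadAvg": 0.0, "KeyboardIdle": 600}

after = merge_host_ad(vm_ad, host_ad)
print(after["HOST_Name"], after["HOST_KeyboardIdle"])  # host.domain.com 50
```

Prefixing keeps the VM's own Name, KeyboardIdle, etc. intact, so a START expression in the VM can refer to both KeyboardIdle and HOST_KeyboardIdle.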
Current Implementation: How Does a VM Get Permission from Its Host Machine?
› Initial configuration
   ClassAd for VM (VMP_HOST_MACHINE = host.domain.com): START = ((KeyboardIdle > 150) && (LoadAvg <= 0.3))
   ClassAd for host (VMP_VM_LIST = vmware1.domain.com): START = ((KeyboardIdle > 150) && (LoadAvg <= 0.3))
› 1. The VM sends a VM_REGISTER command to the host machine
   Host: if the host's state is Owner or Unclaimed, START = False; otherwise START = ((KeyboardIdle > 150) && (LoadAvg <= 0.3)) is kept
› 2. The host machine replies with its permission
   VM: if permission == yes, START = ((KeyboardIdle > 150) && (HOST_KeyboardIdle > 150) && (LoadAvg <= 0.3) && (HOST_TotalLoadAvg <= 0.3))
   otherwise START = False
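The two-step handshake above can be simulated with START expressions modeled as Python predicates. This is a sketch under stated assumptions, not Condor's code: the classes Host and VM and their methods are hypothetical, and it assumes the host replies "yes" exactly when its state is Owner or Unclaimed, which the slide implies but does not state outright.

```python
from typing import Callable, Dict

Ad = Dict[str, float]          # assumption: a ClassAd modeled as a dict
Start = Callable[[Ad], bool]   # a START expression modeled as a predicate


def default_start(ad: Ad) -> bool:
    # START = ((KeyboardIdle > 150) && (LoadAvg <= 0.3))
    return ad["KeyboardIdle"] > 150 and ad["LoadAvg"] <= 0.3


def combined_start(ad: Ad) -> bool:
    # START once permission is granted: the VM and its host must both be idle.
    return (ad["KeyboardIdle"] > 150 and ad["HOST_KeyboardIdle"] > 150
            and ad["LoadAvg"] <= 0.3 and ad["HOST_TotalLoadAvg"] <= 0.3)


def never_start(ad: Ad) -> bool:
    # START = False
    return False


class Host:
    def __init__(self, state: str) -> None:
        self.state = state               # e.g. "Owner", "Unclaimed", "Claimed"
        self.start: Start = default_start

    def handle_vm_register(self) -> bool:
        """Handle a VM_REGISTER command; return the permission reply."""
        if self.state in ("Owner", "Unclaimed"):
            self.start = never_start     # the host stops accepting jobs itself
            return True
        return False                     # a busy host keeps its START expression


class VM:
    def __init__(self) -> None:
        self.start: Start = default_start

    def register(self, host: Host) -> None:
        permission = host.handle_vm_register()
        self.start = combined_start if permission else never_start


host = Host(state="Unclaimed")
vm = VM()
vm.register(host)
ad = {"KeyboardIdle": 600, "LoadAvg": 0.0,
      "HOST_KeyboardIdle": 600, "HOST_TotalLoadAvg": 0.1}
print(vm.start(ad), host.start(ad))  # True False
```

The final line illustrates the limitation discussed next: after granting permission, the host's own START is permanently False.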
Issues in the Current Implementation of Scenario 1
› Problem: once the host machine grants permission to a virtual machine, the host machine itself can no longer be used for Condor.
› Possibility: on an SMP machine, a user may want to use both the virtual machine and the host machine.
› Possible solution: after granting permission, the host machine does not change its START expression. Instead, each virtual machine periodically sends its status to the host machine, and the host machine decides whether to grant permission to a virtual machine when a Condor job is assigned to it.
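The proposed solution moves the decision to job-assignment time. A minimal sketch, assuming the host grants permission only while a CPU of the SMP machine is still free; that policy, the HostArbiter class, and the 300-second staleness limit are all hypothetical illustrations, not part of Condor:

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class HostArbiter:
    total_cpus: int                  # CPUs in the SMP host
    host_running_job: bool = False   # is the host's own startd running a job?
    vm_states: Dict[str, str] = field(default_factory=dict)
    last_report: Dict[str, float] = field(default_factory=dict)
    max_report_age: float = 300.0    # hypothetical staleness limit, in seconds

    def report_status(self, vm_name: str, state: str, now: float) -> None:
        """Called periodically by each VM with its current state."""
        self.vm_states[vm_name] = state
        self.last_report[vm_name] = now

    def permit(self, vm_name: str, now: float) -> bool:
        """Decide, when a job is assigned to vm_name, whether it may run."""
        reported = self.last_report.get(vm_name)
        if reported is None or now - reported > self.max_report_age:
            return False  # no recent status from this VM: refuse
        busy = sum(1 for name, state in self.vm_states.items()
                   if name != vm_name and state == "Claimed")
        if self.host_running_job:
            busy += 1
        return busy < self.total_cpus  # grant only if a CPU is still free


arb = HostArbiter(total_cpus=2, host_running_job=True)
arb.report_status("vmware1.domain.com", "Unclaimed", now=0.0)
arb.report_status("vmware2.domain.com", "Claimed", now=0.0)
print(arb.permit("vmware1.domain.com", now=10.0))  # False: both CPUs are busy
```

Because the host never sets its own START to False here, the host and its VMs can share the SMP machine; the second VM name is made up for the example.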
How to Use VMs in Condor: Scenario 2
› A virtual machine is launched on demand to serve a Condor job
› Checkpointing and migration can be done on a per-VM basis
› The startd on the host machine may have to advertise more than one OS
› A dedicated daemon in the virtual machine communicates with the host machine: it receives commands from the host machine and executes them when a Condor job is assigned
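The in-VM daemon described above is essentially a command dispatcher. A minimal sketch, assuming a line-oriented text protocol; the slides do not define the protocol, so the class VMDaemon and the commands PING and RUN_JOB are hypothetical, and the network transport between host and VM is left out:

```python
import shlex
import subprocess
from typing import Callable, Dict, List


class VMDaemon:
    """Hypothetical in-VM daemon that executes commands sent by the host."""

    def __init__(self) -> None:
        # Hypothetical command set; the slides do not define the protocol.
        self.handlers: Dict[str, Callable[[List[str]], str]] = {
            "PING": lambda args: "OK",
            "RUN_JOB": self._run_job,
        }

    def _run_job(self, args: List[str]) -> str:
        if not args:
            return "ERROR RUN_JOB needs an executable"
        # Run the job inside the VM and report its exit code to the host.
        result = subprocess.run(args, capture_output=True, text=True)
        return f"EXIT {result.returncode}"

    def handle(self, line: str) -> str:
        """Parse and execute one command line, e.g. 'RUN_JOB /bin/true'."""
        parts = shlex.split(line)
        if not parts:
            return "ERROR empty command"
        handler = self.handlers.get(parts[0])
        if handler is None:
            return f"ERROR unknown command {parts[0]}"
        return handler(parts[1:])


daemon = VMDaemon()
print(daemon.handle("PING"))  # OK
```

Keeping the handlers in a table makes it easy to add the other commands a startd would need (for example, suspend or vacate) without changing the parsing code.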
How to Use VMs in Condor: Scenario 2 (continued)
› Not yet implemented in Condor, but we hope to implement it soon
› Pros: efficient memory management
› Cons: complex to implement
[Figure: Scenario 2. Central Manager (Collector, Negotiator); Submit machine 1 (Schedd, Shadow); Submit machine 2 (Host Machine with a Virtual machine running a Schedd); Execution machine (Host Machine with a Startd, and a Virtual machine with a Daemon, launched on demand; a Starter serves the job). Legend: launching; communication pathway; creating/forking process.]
Issues in Scenario 2
› When a user returns to the host machine, stop the VM and save its entire memory instead of suspending the running Condor job
› During migration without a shared file system, the files used by the Condor job, including its executable, must be transferred, because copying the entire disk image is impractical
› Xen live migration can be used to migrate a VM directly, without checkpointing
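The owner-return policy above maps onto standard VM management commands. A hedged sketch that only builds the argument lists, using the Xen 3.x `xm` toolstack (`xm save`, `xm migrate --live`) and VMware's `vmrun suspend`; the function owner_return_commands, its parameters, and the example names are hypothetical, and the exact command syntax should be checked against the installed Xen or VMware version:

```python
from typing import List, Optional


def owner_return_commands(vm_type: str, vm: str, state_file: str,
                          target_host: Optional[str] = None) -> List[str]:
    """Build (but do not run) the command that moves a VM off a reclaimed host.

    vm is the Xen domain name, or the path to the .vmx file for VMware.
    """
    if vm_type == "xen":
        if target_host is not None:
            # Xen live migration: move the running VM without checkpointing.
            return ["xm", "migrate", "--live", vm, target_host]
        # Save the VM's entire memory to state_file; `xm restore` resumes it.
        return ["xm", "save", vm, state_file]
    if vm_type == "vmware":
        # vmrun keeps the suspended state next to the .vmx file,
        # so state_file is not used here.
        return ["vmrun", "suspend", vm]
    raise ValueError(f"unsupported VM type: {vm_type}")


print(owner_return_commands("xen", "condor-vm1", "/var/tmp/condor-vm1.save"))
```

On the host, the resulting list could be handed to `subprocess.run`; transferring the job's files when there is no shared file system is a separate step not shown here.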
[Figure: Scenario 2, migration. Central Manager (Collector, Negotiator); Submit machine (Schedd, Shadow); Execution machine 1 and Execution machine 2, each a Host Machine with a Startd and a Virtual machine with a Daemon; the Virtual machine migrates from execution machine 1 to execution machine 2. Legend: launching; communication pathway before migration; communication pathway after migration; creating/forking process (Starter).]
Summary
› Virtual machines offer a flexible solution in Condor
   Sandbox for security
   More than one OS on a single physical machine
   Customized environment for Condor
› Scenario 1 has already been supported since Condor
› Scenario 2 is not yet implemented in Condor; it is future work