The future for LIP/Farm: Sun Grid Engine 6?
Gonçalo Borges
Jornadas LIP, 21/22 December 2005, Peniche
What is the Sun Grid Engine (I)

The grid engine system is an advanced workload and resource management tool for heterogeneous distributed computing environments.

Workload management means that the use of shared resources is controlled to best achieve an enterprise's goals, such as productivity, timeliness, level of service, ...

Workload management is accomplished by dynamically managing resources and administering policies.

Sun Grid Engine is a commercial Sun product. We are presently testing an open source version of SGE6 (http://gridengine.sunsource.net/).

Jornadas LIP 2005, Peniche, Portugal
Why are we studying SGE

[Figures: "Bad resource management!" and "Better resource management!"]

Main advantages:

- Allows dynamic resource sharing through the implementation of different policies:
    - Sometimes groups need more machines than the ones they own;
    - At other times, a group's machines sit completely empty.
    - When machine A from group A is empty, it can be used by group B, whose own machine B is overloaded.
    - Nevertheless, if group A needs to submit new jobs, the system automatically assigns the highest priority to group A jobs, decreases the priority of group B jobs and rejects further requests from group B users.
- Allows queues and hosts to be grouped into clusters:
    - Common configuration;
- Allows access to certain machines and queues to be restricted by user, group and project;
- Allows priorities to be assigned to queues;
- Good graphical interface for administrative and user tasks;
- Same user commands as PBS (easy migration for the user);
- Good documentation;
- Open source code.
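The sharing rule described on this slide can be illustrated with a small toy model. This is only a sketch under stated assumptions, not how SGE implements its policies: the `Machine` class, the `place` and `priority` functions, the slot counts and the numeric priority levels are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Machine:
    """Hypothetical model of a machine owned by one group."""
    name: str
    owner: str
    slots: int = 1
    jobs: List[str] = field(default_factory=list)  # group of each running job


def place(machines: List[Machine], group: str) -> Optional[str]:
    """Place one job for `group`; return the machine name, or None if rejected."""
    # A group always tries its own machines first.
    for m in machines:
        if m.owner == group and len(m.jobs) < m.slots:
            m.jobs.append(group)
            return m.name
    # Otherwise borrow a machine, but only while its owner is not using it.
    for m in machines:
        if m.owner != group and len(m.jobs) < m.slots and m.owner not in m.jobs:
            m.jobs.append(group)
            return m.name
    return None  # request rejected


def priority(machine: Machine, group: str) -> int:
    """Illustrative priority levels: 2 = owner, 1 = guest, 0 = guest while owner is active."""
    if group == machine.owner:
        return 2
    return 0 if machine.owner in machine.jobs else 1


if __name__ == "__main__":
    a = Machine("A", owner="groupA", slots=3)
    b = Machine("B", owner="groupB", slots=1)
    farm = [a, b]
    print(place(farm, "groupB"))   # groupB fills its own machine B
    print(place(farm, "groupB"))   # B is full, so groupB borrows the empty machine A
    print(place(farm, "groupA"))   # groupA comes back and uses its own machine
    print(priority(a, "groupA"), priority(a, "groupB"))  # owner outranks the guest
    print(place(farm, "groupB"))   # further groupB requests are rejected (None)
```

In this toy model the guest job already running on machine A is not killed; it only loses priority, which matches the behaviour described in the bullet list.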
Matching resources to requests (a good analogy)

Imagine a large bank. In the bank's lobby are dozens of customers:
- One customer wants to withdraw a small amount of money;
- Another customer has an appointment with the investment specialist;
- Ten customers in front of the first two want to apply for a large loan.

The effect is that customers must wait for service unnecessarily. Many of the customers could receive immediate service if only their needs were immediately recognized and then matched to available resources.

If the SGE were the bank manager, the service would be organized differently:
- On entering the bank lobby, customers would declare their name, affiliations and service needs;
- Each customer's time of arrival would be recorded.

Based on the information provided in the lobby, the bank would serve:
- Customers whose needs match suitable and immediately available resources;
- Customers whose requirements have the highest priority;
- Customers who have been waiting in the lobby the longest.

A bank run by the SGE system would try to assign new customers to the least-loaded and most suitable bank employee.

As bank manager, the SGE system would allow the bank to define service policies:
- To provide preferential service to customers who generate more profit (functional policy);
- To ensure that certain customers are served well because they have received bad service so far (share-based policy);
- To ensure that customers with an appointment get a timely response (urgency policy);
- To prefer a certain customer on direct demand of a bank executive (override policy).
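The matching rules of the bank analogy can be sketched in a few lines. This is a minimal illustration under assumptions, not the SGE scheduler: `Customer`, `Employee` and `dispatch` are hypothetical names, and the single integer `priority` stands in for whatever the functional, share-based, urgency and override policies would produce together.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Customer:
    name: str
    need: str          # service declared on entering the lobby
    arrival: int       # recorded time of arrival (smaller = earlier)
    priority: int = 0  # assumed combined result of the service policies


@dataclass
class Employee:
    name: str
    services: Set[str]  # services this employee can provide
    load: int = 0       # customers handled so far
    busy: bool = False


def dispatch(customers: List[Customer], employees: List[Employee]) -> List[Tuple[str, str]]:
    """Match waiting customers to free employees; return (customer, employee) pairs."""
    assignments = []
    # Highest priority first, then longest waiting time.
    for c in sorted(customers, key=lambda c: (-c.priority, c.arrival)):
        suitable = [e for e in employees if not e.busy and c.need in e.services]
        if not suitable:
            continue  # stays in the lobby without blocking the customers behind
        best = min(suitable, key=lambda e: e.load)  # least-loaded suitable employee
        best.busy = True
        best.load += 1
        assignments.append((c.name, best.name))
    return assignments


if __name__ == "__main__":
    lobby = [
        Customer("loan-1", "loan", arrival=0),
        Customer("loan-2", "loan", arrival=1),
        Customer("withdrawal", "cash", arrival=2),
        Customer("appointment", "investment", arrival=3, priority=1),
    ]
    staff = [
        Employee("teller", {"cash"}),
        Employee("loan officer", {"loan"}),
        Employee("investment specialist", {"investment"}),
    ]
    for customer, employee in dispatch(lobby, staff):
        print(customer, "->", employee)
```

In this example the second loan applicant has to wait for the loan officer, but the withdrawal and the appointment are served immediately instead of queuing behind the loan applicants.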
QMON: a powerful GUI
Conclusions

We have just started testing the Sun Grid Engine workload management system.

Do not forget that this is an open source version of a Sun enterprise product:
- Some services may deliberately not work;
- Or they may work differently from what the manuals describe.

It seems to be easy to configure and to have more features than MAUI (PBS).

It may be used in LIP/FARM (in the near future) if it meets our requirements for dynamic resource sharing.