Co-allocation Using HARC
IV. ResourceManagers
HARC Workshop, University of Manchester

Philosophy
New types of RMs can be written by others
Existing RMs can be customized
Interfaces can be enhanced or changed
None of this means changing the acceptor code
API is extensible too
Good community contribution model
CCT keeps control of the acceptor code
The acceptor code will become very stable (already less than one commit per month)
The community evolves the system

Are RMs Easy to Install?
Harder than client software
Much easier than Acceptors
Complexity is in the right place:
–Only a few people install and configure Acceptors (infrastructure), which is hard
–Some people modify/write RMs, which is not too hard
–More people install and configure RMs, which is easy
–Many people install and configure the Client software, which is trivial

Pre-installation - Perl
RMs are written in Perl, to make installation trivial
However, they need a large number of CPAN modules to be installed
Some of these, e.g. Net::SSLeay and Crypt::SSLeay, are not trivial to install
There is a document which contains things to watch out for
–Lists previously seen problems, with solutions
–Basically a list of exceptions
–Now 7 pages of text!
–There’s a lot of AIX content...

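A quick way to see which CPAN modules are still missing is to ask Perl to load each one. The sketch below is not part of HARC; the function name `missing_perl_modules` is made up for illustration, and the module list you pass in is up to you (the slides only name Net::SSLeay and Crypt::SSLeay).

```sh
# Report which of the given Perl modules cannot be loaded.
# Prints one missing module name per line; returns non-zero if any are missing.
missing_perl_modules() {
    rc=0
    for module in "$@"; do
        # "perl -MModule -e 1" loads the module and exits; it fails if the
        # module (or a shared library it depends on) cannot be found.
        if ! perl "-M$module" -e 1 >/dev/null 2>&1; then
            echo "$module"
            rc=1
        fi
    done
    return $rc
}

# Example: the two modules the slide singles out as non-trivial.
# missing_perl_modules Net::SSLeay Crypt::SSLeay
```

Running this before and after setting PERL5LIB and LD_LIBRARY_PATH helps separate "module not installed" from "module installed but not on the search path".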
Pre-installation - Certificate
HARC RM needs a certificate
We don’t recommend re-using the host certificate
Get a service certificate
UK e-Science CA now supports:
–harccrm for Compute RMs (CRMs)
  /C=UK/O=eScience/OU=Manchester/L=MC/CN=harccrm/man2.nw-grid.ac.uk/Address=...
–harcacceptor for Acceptors
  /C=UK/O=eScience/OU=Manchester/L=MC/CN=harcacceptor/man4.nw-grid.ac.uk/Address=....

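To confirm that the certificate you were issued is a service certificate of the right kind, it is enough to look at the CN prefix in the subject. A minimal sketch, assuming the slash-separated DN layout shown on the slide; `harc_service_type` is a hypothetical helper, not a HARC tool.

```sh
# Classify a certificate subject DN by its HARC service prefix.
# Prints "crm", "acceptor" or "other".
harc_service_type() {
    case "$1" in
        */CN=harccrm/*)      echo crm ;;
        */CN=harcacceptor/*) echo acceptor ;;
        *)                   echo other ;;
    esac
}

# Example with a DN typed in by hand:
# harc_service_type "/C=UK/O=eScience/OU=Manchester/L=MC/CN=harccrm/man2.nw-grid.ac.uk"
```

The subject of a PEM file can be printed with `openssl x509 -in server_cert.pem -noout -subject`; recent OpenSSL versions use a comma-separated layout by default, so the output may need converting (or the `-nameopt compat` option) before it matches the patterns above.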
Installation Procedure
There’s an installer which installs files from the CVS tree - this may change
The HARC environment variable points to the root of the repo (the “negotiation” directory)
You have a subdirectory in
–$HARC/rm-service/config
For example
–$HARC/rm-service/config/nw-grid/man2

Installation Procedure
1. Create Contents
–install.config - more shortly
–grid-mapfile - GT-style mapfile for cert to username mapping (usually a sym-link to /etc/grid-security/grid-mapfile)
–acceptor_mapfile - a list of the Acceptor DNs, and also their CA cert DNs
–cacerts directory, containing CA certs for your cert and the Acceptor certs, in PEM format, with suffix .crt
2. Then a trivial Install
–install-rm nw-grid/man2 /usr/local/man2-rm

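Step 1 amounts to laying out four entries in one directory. The sketch below creates that layout with empty placeholders; `prepare_rm_config` is a hypothetical helper, and the file contents (in particular the acceptor_mapfile format) still have to be filled in by hand.

```sh
# Create the skeleton of an RM config directory under $HARC/rm-service/config.
# $1 is the site/host subdirectory, e.g. nw-grid/man2.
prepare_rm_config() {
    dir="$HARC/rm-service/config/$1"
    mkdir -p "$dir/cacerts" || return 1
    # Empty placeholders: their contents must be written by hand.
    [ -e "$dir/install.config" ]   || : > "$dir/install.config"
    [ -e "$dir/acceptor_mapfile" ] || : > "$dir/acceptor_mapfile"
    # The usual choice from the slide: link to the system grid-mapfile.
    [ -e "$dir/grid-mapfile" ] || [ -L "$dir/grid-mapfile" ] ||
        ln -s /etc/grid-security/grid-mapfile "$dir/grid-mapfile"
}

# Example:
# prepare_rm_config nw-grid/man2
```

Existing files are left alone, so the function can be re-run safely on a directory that is already partly populated.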
install.config
RM_INNER_TYPE=SimpleCompute
RM_COMPUTE_NODENAME=man2.nw-grid.ac.uk
RM_COMPUTE_BATCH_TYPE=TorqueMaui
RM_COMPUTE_MEMORY_MB_PER_CPU=4096
RM_COMPUTE_CPUS=8
RM_MAUI_COMMAND_DIR=/usr/local/maui/bin
RM_RESOURCE_DESCRIPTION='The Manchester NW-Grid node, a Dual AMD Opteron Linux cluster'
RM_HOST=
RM_URL=man2-rm
RM_PORT=9393

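For sanity-checking a config before installing, individual settings can be pulled out with standard tools. How the installer itself reads install.config is not described here; this sketch simply treats the file as KEY=VALUE lines, and `config_value` is a made-up helper name.

```sh
# Print the value of KEY from a KEY=VALUE file, stripping one pair of
# surrounding single quotes if present. Prints nothing if the key is absent.
config_value() {
    file=$1
    key=$2
    sed -n "s/^$key=//p" "$file" | sed "s/^'\(.*\)'\$/\1/" | head -n 1
}

# Example:
# config_value install.config RM_COMPUTE_BATCH_TYPE
```

This is handy for spotting a missing closing quote in RM_RESOURCE_DESCRIPTION or a typo in RM_PORT before running the installer.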
Installation Step
Before Installing
–Need PERL5LIB and LD_LIBRARY_PATH to be defined in your environment when you install
–Or can add these to the config file
–Don’t have to set these if you don’t need to
Then a trivial Install
–install-rm nw-grid/man2 /usr/local/man2-rm
–Script is in $HARC/rm-service/scripts
What does this do?

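A small pre-flight check makes it obvious which of these variables are in effect when the installer runs. This is a sketch only: `preflight` is a hypothetical helper, and treating HARC as mandatory is an assumption based on the script living under $HARC/rm-service/scripts.

```sh
# Report environment variables that are unset or empty before installing.
# PERL5LIB and LD_LIBRARY_PATH are optional, so they only produce a note;
# HARC is treated as required (an assumption, see above).
preflight() {
    for name in PERL5LIB LD_LIBRARY_PATH; do
        eval "value=\${$name:-}"
        [ -n "$value" ] || echo "note: $name is not set"
    done
    if [ -z "${HARC:-}" ]; then
        echo "error: HARC is not set"
        return 1
    fi
    return 0
}

# Example, followed by the install command from the slide:
# preflight && "$HARC/rm-service/scripts/install-rm" nw-grid/man2 /usr/local/man2-rm
```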
What happens?
Installs source files
Creates a crontab & scripts for restarting the RM
Customizes some scripts for stopping/starting the RM
Installs and hashes CA certificates
Output:
  rm-service $ scripts/install-rm nw-grid/man2 /Users/jonmaclaren/man2-rm
  Makefile.crt... Skipped
  cct-ca.crt... 5fb2fc80.0
  old-uk-escience-ca.crt
  uk-escience-ca.crt... adcbc9ef.0
  uk-escience-root.crt c1cd.0
  Notice: Don't forget to place your certificate and key files at:
  /Users/jonmaclaren/man2-rm/x509/server_cert.pem
  /Users/jonmaclaren/man2-rm/x509/server_key.pem

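The "hashes CA certificates" step matches the usual OpenSSL convention of naming each CA certificate by its subject hash with a `.0` suffix. The sketch below reproduces that convention with symlinks; whether install-rm links or copies the files is not stated, so treat this as an illustration rather than the installer's actual code. `hash_ca_certs` is a hypothetical name.

```sh
# For every *.crt file in a directory, create a <subject-hash>.0 symlink
# next to it, which is the lookup name OpenSSL expects in a CA directory.
hash_ca_certs() {
    dir=$1
    for cert in "$dir"/*.crt; do
        [ -f "$cert" ] || continue
        name=$(basename "$cert")
        # Files that are not valid PEM certificates make openssl fail.
        if hash=$(openssl x509 -hash -noout -in "$cert" 2>/dev/null); then
            ln -sf "$name" "$dir/$hash.0"
            echo "$name... $hash.0"
        else
            echo "$name... Skipped"
        fi
    done
}

# Example:
# hash_ca_certs /usr/local/man2-rm/x509
```

A file such as Makefile.crt, which is not a certificate, is reported as skipped in the same way as in the installer output.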
What’s in /usr/local/man2-rm ?
Some Perl modules
And OuterRM.pl, which gets run
commands - configures and runs the RM (based on install.config, etc.)
rerun - runs “commands” in the background from crontab
crontab - crontab line which can be added directly to your crontab (don’t cut and paste!)
start-rm, stop-rm - control whether rerun will actually start the RM, using a control file (.do_not_restart)
–./stop-rm
–./start-rm [ -w ]
x509 - subdirectory containing all the CA certs, mapfiles, etc.

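The control-file mechanism is simple enough to sketch in a few lines. The function names below are invented for illustration and the real start-rm, stop-rm and rerun scripts may do more (for example, the `-w` option of start-rm is not modelled); only the `.do_not_restart` file name comes from the slide.

```sh
# stop-rm equivalent: leave a marker so the cron job stops restarting the RM.
stop_rm_marker()  { : > "$1/.do_not_restart"; }

# start-rm equivalent: remove the marker again.
start_rm_marker() { rm -f "$1/.do_not_restart"; }

# Decision the cron-driven "rerun" script has to make on each invocation:
# succeed (exit status 0) only when restarting is allowed.
restart_allowed() { [ ! -e "$1/.do_not_restart" ]; }

# Example:
# restart_allowed /usr/local/man2-rm && echo "rerun would start the RM"
```

Using a marker file rather than editing the crontab means the cron entry can stay in place permanently while the RM is taken down for maintenance.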
Perl Modules
Just an overview here...
There is a doc online which has some details on these

Key Modules
OuterRM - just does the HTTP listening and Acceptor Cert authN/authZ
MainLoop - handles each request
TransactionManager - remembers what transactions (by TID) are running, and what their states are
InnerRM - the main class for different types of RM
–SimpleComputeRM
–SimpleNetworkRM
–Both inherit from InnerRM

SimpleComputeRM
Handles batch queue systems
Deals only with processors/memory
To talk to the scheduler, a subclass of SCBatch is used
–SCBatchTorqueMaui.pm
–SCBatchTorqueMoab.pm
–SCBatchLoadLeveler.pm - not in CVS yet...
Chosen at runtime - RM_COMPUTE_BATCH_TYPE
Simple modules
–Less than 200 lines
–Override initialize, makeReservation, cancelReservation, getStatus

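The module names suggest that the batch type from install.config is appended to `SCBatch` to find the scheduler module. Assuming that naming rule holds (it is inferred from the names on the slide, not documented), a quick check that an installed RM directory has the module for its configured batch type could look like this; both function names are hypothetical.

```sh
# Map an RM_COMPUTE_BATCH_TYPE value to the module file that would serve it,
# assuming the SCBatch<Type>.pm naming seen on the slide.
batch_module_for() {
    echo "SCBatch$1.pm"
}

# Succeed if directory $1 contains the module for batch type $2.
# Assumes the modules sit directly in the install directory.
has_batch_module() {
    [ -f "$1/$(batch_module_for "$2")" ]
}

# Example:
# has_batch_module /usr/local/man2-rm TorqueMaui || echo "no module for TorqueMaui"
```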
Customizing InnerRM
Startup/shutdown
–initialize/remove
Parsing (validating) the XML
–parseResourceElement
–parseWorkElement
–maybe parseScheduleElement
Co-allocation
–tryMakeAction
–tryCancelAction
–addResourceBookings
–completeTransactionBookings
Others for getTimetable/getStatus

Steps for creating a new RM
1. Design your XML
–Resource element
–Work element
2. Create a new subclass of InnerRM.pm
–Use the utility classes where possible
3. To extend the API, create subclasses of
–Resource.java
–Work.java

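For step 2, a starting point can be generated mechanically from the hook names on the "Customizing InnerRM" slide. The sketch below prints a stub Perl package; the method names come from the slides, but their argument lists and return values are not given, so every stub takes whatever it is passed and dies until it is implemented. `scaffold_inner_rm` and the example package name are hypothetical.

```sh
# Write a stub subclass of InnerRM to stdout. $1 is the new package name.
# Argument lists are unknown, so each stub accepts anything and dies.
scaffold_inner_rm() {
    pkg=$1
    printf 'package %s;\nuse strict;\nuse warnings;\nuse base qw(InnerRM);\n\n' "$pkg"
    for method in initialize remove \
                  parseResourceElement parseWorkElement \
                  tryMakeAction tryCancelAction \
                  addResourceBookings completeTransactionBookings; do
        printf 'sub %s {\n    my ($self, @args) = @_;\n    die "%s: not implemented";\n}\n\n' \
            "$method" "$method"
    done
    printf '1;\n'
}

# Example:
# scaffold_inner_rm MyStorageRM > MyStorageRM.pm
```

The generated file is only a checklist of hooks to fill in; the existing SimpleComputeRM and SimpleNetworkRM modules remain the reference for what each hook is expected to do.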
Caveats for RMs
Need to restart to re-read grid-mapfile
When restarted, they forget the bookings
–Want to add persistence so that it’s trivial for RM developers to utilize
Thread handling needs work (soon!)

What’s next?
Discussion on MPIg...
Beer?

But first... Any Questions?