GXP in a nutshell
You can send jobs (Unix shell command lines) to many machines, very quickly
Very small prerequisites
  – Each node has Python (version … or later)
  – You have ssh access to each node without being asked to enter a passphrase (e.g., use ssh-agent)
  – Install GXP only on your home node. GXP replicates itself to the nodes you want to use
What is it useful for?
With GXP, you can comfortably
  – operate many nodes, interactively or non-interactively
  – use nodes across multiple clusters
  – reach nodes behind firewalls/NATs
  – deal with many nodes, some of which are dead or unavailable on any given day
  – coordinate multiple clusters as a parallel processing resource without installing any job scheduling software (PBS, Condor, etc.)
Things made simple by GXP (1)
Launch a parallel program on many nodes across multiple clusters
Kill them with a single stroke of Ctrl-C
Simple PBS/Condor-like job scheduling
Monitor specific programs (ps … on all nodes)
Kill specific programs (killall … on all nodes)
Clean up all processes as a last resort (bomb)
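The "simple PBS/Condor-like job scheduling" item amounts to a task farm: a queue of command lines, each handed to whichever node becomes idle next. The following is a minimal Python sketch of that idea only, not GXP's own scheduler: threads stand in for nodes, and `run(node, job)` is a hypothetical hook that would execute one job remotely (for example over ssh) and return its exit status.

```python
import queue
import threading
from typing import Callable, Dict, List


def farm(jobs: List[str], nodes: List[str],
         run: Callable[[str, str], int]) -> Dict[str, int]:
    """Hand each job to whichever node becomes idle first.

    `run(node, job)` is a hypothetical hook (e.g. an ssh call), not a GXP API.
    Returns a mapping job -> exit status, so job strings must be unique.
    """
    pending: "queue.Queue[str]" = queue.Queue()
    for job in jobs:
        pending.put(job)
    results: Dict[str, int] = {}
    lock = threading.Lock()

    def worker(node: str) -> None:
        # Each node keeps pulling jobs until the queue is empty, so faster
        # or less loaded nodes naturally end up running more jobs.
        while True:
            try:
                job = pending.get_nowait()
            except queue.Empty:
                return
            status = run(node, job)
            with lock:
                results[job] = status

    threads = [threading.Thread(target=worker, args=(n,)) for n in nodes]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because every node fetches its next job only when it has finished the previous one, slow or heavily loaded nodes simply take fewer jobs and no per-node quota is needed.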
Things made simple by GXP (2)
Copy a file to many nodes, some behind firewalls/NATs
Elect a single node from each file system
Get the load average of all nodes and drop highly loaded nodes
Check whether a command is installed and drop nodes that don't have it
List processes consuming a significant amount of CPU time
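Electing one node per file system and dropping loaded nodes are both simple reductions over values the nodes report. The sketch below assumes each node has already reported a file-system identifier (hypothetically, something like the NFS "server:/export" string of its home directory) and a load average (a node can read its own with Python's `os.getloadavg()` on Unix). The function names are illustrative, not GXP commands.

```python
import os
from typing import Dict, List


def elect_one_per_fs(fs_of: Dict[str, str]) -> List[str]:
    """Return one node per file system: the alphabetically first node
    that reported each (hypothetical) file-system id."""
    chosen: Dict[str, str] = {}
    for node in sorted(fs_of):
        chosen.setdefault(fs_of[node], node)  # keep the first node seen per fs id
    return sorted(chosen.values())


def drop_loaded(load_of: Dict[str, float], threshold: float) -> List[str]:
    """Keep only nodes whose reported load average is below `threshold`."""
    return sorted(node for node, load in load_of.items() if load < threshold)


def local_load() -> float:
    """1-minute load average of the machine this runs on (Unix only)."""
    return os.getloadavg()[0]
```

Electing one node per file system is useful, for example, when copying a file: writing it once per shared file system is enough, instead of once per node.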
Our Experience
A fairly large natural language processing task
  – parse more than 100M web documents collected and archived by our web crawler
  – resources: 350 CPUs across two clusters
  – GXP integrates them without any special effort
Environments
  – No Globus/PBS installed (Globus/rsh ports are blocked across clusters)
  – Documents must be staged on demand because of limited disk capacity
  – Nodes in one of the two clusters cannot connect outside their cluster. We used GXP to stage a file through multiple relaying hosts
Basic Usage
explore command: log in and authenticate yourself to many nodes
e command: send a command line to the nodes and execute it (very fast)
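For comparison, a naive version of what `e` does can be written as a parallel ssh fan-out. This Python sketch is a stand-in under stated assumptions, not GXP: it opens a fresh ssh connection per node and per command, whereas GXP logs in once with `explore` and sends later commands over those logins. `run_on_all` and `ssh_argv` are hypothetical names, and `argv` can be swapped for a local runner when testing.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


def ssh_argv(node: str, cmd: str) -> List[str]:
    # BatchMode=yes makes ssh fail instead of prompting for a passphrase,
    # matching the prerequisite of passphrase-less access (e.g. ssh-agent).
    return ["ssh", "-o", "BatchMode=yes", node, cmd]


def run_on_all(nodes: List[str], cmd: str,
               argv: Callable[[str, str], List[str]] = ssh_argv,
               max_workers: int = 32) -> Dict[str, Tuple[int, str]]:
    """Run `cmd` on every node concurrently; return node -> (exit status, stdout)."""
    def one(node: str) -> Tuple[str, Tuple[int, str]]:
        proc = subprocess.run(argv(node, cmd), capture_output=True, text=True)
        return node, (proc.returncode, proc.stdout)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(one, nodes))
```

For example, `run_on_all(["host1", "host2"], "uptime")` would return each host's exit status and output; `host1` and `host2` are placeholders.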
Feature (1): multihop logins
You can reach nodes through other nodes
Two typical scenarios where this is important
  – Home → cluster gateway → cluster nodes
  – Very large clusters, for which trees are mandatory
Subsequent command submissions transparently reach all nodes
[Figure: logins from the home node through cluster gateways]
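The routing idea behind multihop logins can be modelled as a tree in which each node records the node it was reached through. This Python sketch uses hypothetical names and is not GXP's wire protocol: it computes the login chain to a node and broadcasts a command by having each node forward it only to the nodes it logged into.

```python
from typing import Callable, Dict, List, Optional


def path_from_home(parent: Dict[str, Optional[str]], node: str) -> List[str]:
    """Chain of logins from the home node (whose parent is None) down to `node`."""
    path: List[str] = []
    cur: Optional[str] = node
    while cur is not None:
        path.append(cur)
        cur = parent[cur]
    return path[::-1]


def broadcast(parent: Dict[str, Optional[str]], root: str,
              deliver: Callable[[str], None]) -> None:
    """Deliver a command at `root`, then let each node forward it only to
    its own children (depth-first), so no node contacts every other node."""
    children: Dict[str, List[str]] = {n: [] for n in parent}
    for node, par in parent.items():
        if par is not None:
            children[par].append(node)

    def visit(node: str) -> None:
        deliver(node)
        for child in sorted(children[node]):
            visit(child)

    visit(root)
```

In the gateway scenario, the home node only needs to reach the gateways; each gateway then relays the command to the cluster nodes behind it.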
Feature (2): node selection
You don't always want to send commands to all nodes
After logging in to many nodes, you can interactively select some of them
The smask command selects the nodes on which the last command succeeded
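The selection rule of smask can be modelled as a filter over the exit statuses of the last command. In this minimal sketch, `smask` is a hypothetical Python function, not GXP's implementation; treating nodes that reported no status as failed is an assumption the slide does not settle.

```python
from typing import Dict, List


def smask(selected: List[str], last_status: Dict[str, int]) -> List[str]:
    """Narrow the selection to nodes whose last command exited with status 0.

    Nodes with no recorded status (e.g. they did not answer) are dropped too;
    this is an assumption of the sketch.
    """
    return [node for node in selected if last_status.get(node) == 0]
```

For instance, after running a check such as `which gcc` on all nodes, this filter keeps exactly the nodes where the command is installed, which is the "drop nodes that don't have it" pattern.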
Feature (3): coordination syntax
The e command accepts an extended shell syntax
e {{ S }} M
  – Run S on all selected nodes
  – Run M on the home node
  – Merge the standard output of all S's and feed it to M
e B {{ S }}
  – Feed B's standard output to all S's
e S is an abbreviation of e {{ S }}
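The data flow of `e {{ S }} M` and `e B {{ S }}` can be emulated on a single machine to make the semantics concrete. In this Python sketch, each "node" is just a local `sh` process tagged with a hypothetical `NODE` environment variable. Unlike GXP, the S commands run one after another and their outputs are concatenated in node order rather than merged as they arrive.

```python
import os
import subprocess
from typing import Dict, List


def _sh(cmd: str, stdin: str = "", node: str = "") -> str:
    """Run `cmd` with sh, tagged with a simulated node name in $NODE."""
    env = dict(os.environ, NODE=node)  # NODE is a hypothetical stand-in for "which node am I"
    proc = subprocess.run(["sh", "-c", cmd], input=stdin, env=env,
                          capture_output=True, text=True, check=True)
    return proc.stdout


def fan_in(nodes: List[str], s_cmd: str, m_cmd: str) -> str:
    """Emulate `e {{ S }} M`: run S once per node, merge the stdout, feed it to M."""
    merged = "".join(_sh(s_cmd, node=n) for n in nodes)
    return _sh(m_cmd, stdin=merged)


def fan_out(nodes: List[str], b_cmd: str, s_cmd: str) -> Dict[str, str]:
    """Emulate `e B {{ S }}`: run B once and feed its stdout to every S."""
    b_out = _sh(b_cmd)
    return {n: _sh(s_cmd, stdin=b_out, node=n) for n in nodes}
```

For example, `fan_in(["b", "a", "c"], 'echo "$NODE"', "sort")` has every simulated node print its name and the "home" side sort the merged stream.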