Master Control Program Subha Sivagnanam SDSC. Master Control Program Provides automatic resource selection for running a single parallel job on HPC resources.

1 Master Control Program
Subha Sivagnanam, SDSC

2 Master Control Program
Provides automatic resource selection for running a single parallel job on HPC resources.
MCP uses directives in batch submission scripts to submit the job to the queues of multiple resources. E.g.:
    #MCP submit_host
    #MCP username
    #MCP scratch_dir
As soon as the job starts to run on one of the resources, MCP removes the queued copies of the job from all other resources' queues.
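To make the directive idea concrete, here is a minimal sketch of how `#MCP key value` lines could be pulled out of a job script. This is not MCP's actual parser: the function name `parse_mcp_directives` and the choice to return a plain dict are assumptions for illustration. The directive names come from the slides.

```python
def parse_mcp_directives(script_text):
    """Collect '#MCP key value' lines from a job script into a dict.

    Hypothetical helper: MCP's real parsing rules are not shown in the slides.
    """
    directives = {}
    for line in script_text.splitlines():
        parts = line.strip().split(None, 2)  # ['#MCP', key, optional value]
        if len(parts) < 2 or parts[0] != "#MCP":
            continue  # not an MCP directive (e.g. a #PBS line or a command)
        key = parts[1]
        value = parts[2] if len(parts) > 2 else ""
        directives[key] = value
    return directives


sample = """#!/bin/ksh
#MCP qtype pbs
#MCP username your_username
#MCP scratch_dir /home/ncsa/your_username/info/mcp/test/mcp
#PBS -l walltime=00:05:00,nodes=4:ppn=2:compute
"""

print(parse_mcp_directives(sample))
```

Lines beginning with `#PBS` are left alone, since those are meant for the batch scheduler rather than for MCP.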

3 Assumptions
The user should compile the application on the desired machines.
Input should be staged on the remote clusters.
Submission is initiated from only one machine.
MCP can be initiated by:
– using MCP directly, with manually created job scripts
– using Fullauto, which automates job-script creation based on desired attributes

4 MCP flow
A grid credential needs to be established (grid-proxy-init or myproxy-get-delegation).
Write a job script for each resource.
Example – NCSA job script:
    #!/bin/ksh
    #MCP qtype pbs
    #MCP submit_host
    #MCP username your_username
    #MCP scratch_dir /home/ncsa/your_username/info/mcp/test/mcp
    #PBS -l walltime=00:05:00,nodes=4:ppn=2:compute
    #PBS -d /home/ncsa/your_username/info/mcp/test/run
    NPROCS=`wc -l < $PBS_NODEFILE`
    /usr/local/mpich/mpich-gm-1.2.5..10-intel-r2/bin/mpirun -v -machinefile $PBS_NODEFILE -np $NPROCS /home/ncsa/your_username/testprog/ring26 -t 10 -n 2 -l 10 -i 0.03125
    #/bin/sleep 900

5 The user submits the job files to MCP, with the job files as the input:
    ./ [--debug]
MCP submits the jobs to all clusters and monitors all clusters for job start.
Once one job starts, MCP cancels all other jobs.
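The submit-everywhere, keep-the-first-to-start behaviour can be sketched as a small simulation. Everything here is an assumption made for illustration: `FakeCluster`, its `submit`/`poll`/`cancel` methods, and `run_first_to_start` are hypothetical stand-ins, not MCP's real interface, and no actual scheduler is contacted.

```python
import time


class FakeCluster:
    """Hypothetical stand-in for a remote batch queue (not MCP's real API)."""

    def __init__(self, name, polls_until_start):
        self.name = name
        self.polls_until_start = polls_until_start
        self.state = "idle"

    def submit(self):
        self.state = "queued"

    def poll(self):
        # Simulate queue wait: the job starts after a fixed number of polls.
        if self.state == "queued":
            self.polls_until_start -= 1
            if self.polls_until_start <= 0:
                self.state = "running"
        return self.state

    def cancel(self):
        if self.state == "queued":
            self.state = "cancelled"


def run_first_to_start(clusters, poll_interval=0.0, max_rounds=1000):
    """Submit to every cluster, then cancel the rest once one job starts."""
    for cluster in clusters:
        cluster.submit()
    for _ in range(max_rounds):
        for cluster in clusters:
            if cluster.poll() == "running":
                for other in clusters:
                    if other is not cluster:
                        other.cancel()
                return cluster
        time.sleep(poll_interval)
    return None  # nothing started within max_rounds


clusters = [FakeCluster("a", 3), FakeCluster("b", 1), FakeCluster("c", 2)]
winner = run_first_to_start(clusters)
print(winner.name, [c.state for c in clusters])
```

A real implementation would also have to handle the case where two jobs start between polls. The sketch sidesteps that by cancelling as soon as the first running job is seen.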

6 Fullauto Flow
The user runs grid-proxy-init or myproxy-get-delegation to establish a grid credential.
An auto job file is created with personalized settings. E.g.:
    match_attributes = {
        'CPU_MODEL' : ['==', 'ia64'],
        'CPU_MEMORY_GB' : ['>=', 2],
        'CPU_MHZ' : ['>=', 1300],
        'CPU_SMP' : ['>=', 2],
        'NODECOUNT' : ['>=', 128],
    }
    machine_dict_list = [
        {
            'HOSTNAME' : '',
            'substitutes_dict' : {
                'arguments' : ['-t', '100', '-n', '10', '-l', '4000',
                               '-i', '0.03125', '-c', '0', '-s', '0'],
                'wallclock_seconds' : '300',
                '__MCP_SHELL__' : '/bin/ksh',
                '__MCP_PARALLEL_RUN__' : '/usr/local/mpich/mpich-gm-1.2.6..14b-intel-r2/bin/mpirun',
                '__MCP_SERIAL_RUN__' : '#',
                '__MCP_NODES__' : '4',
                '__MCP_CPUS_PER_NODE__' : '2',
                '__MCP_USERNAME__' : 'your_username',
                '__MCP_SCRATCH_DIR__' : '/home/ncsa/your_username/info/mcp/test/mcpdata',
                '__MCP_JOB_DIR__' : '/home/ncsa/your_username/info/mcp/test/run',
                '__MCP_EXECUTABLE__' : '/home/ncsa/your_username/testprog/ring26',
            },
        },
    ]
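One plausible reading of `match_attributes` is that each entry pairs a comparison operator with a required value, and a cluster qualifies only if every comparison holds. The sketch below works under that assumption. The `matches` function, the `OPS` table (only `==` and `>=` appear in the slides; the others are added for completeness), and the candidate cluster attribute values are all hypothetical.

```python
import operator

# Operator strings mapped to comparison functions. Only '==' and '>=' appear
# in the slides; the remaining entries are assumptions.
OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}


def matches(cluster_attrs, match_attributes):
    """Return True if the cluster satisfies every [operator, value] rule."""
    for name, (op, wanted) in match_attributes.items():
        if name not in cluster_attrs:
            return False  # an unknown attribute is treated as a mismatch
        if not OPS[op](cluster_attrs[name], wanted):
            return False
    return True


match_attributes = {
    "CPU_MODEL": ["==", "ia64"],
    "CPU_MEMORY_GB": [">=", 2],
    "CPU_MHZ": [">=", 1300],
    "CPU_SMP": [">=", 2],
    "NODECOUNT": [">=", 128],
}

# Made-up attribute values, for illustration only.
candidates = {
    "cluster_a": {"CPU_MODEL": "ia64", "CPU_MEMORY_GB": 4, "CPU_MHZ": 1500,
                  "CPU_SMP": 2, "NODECOUNT": 256},
    "cluster_b": {"CPU_MODEL": "ia64", "CPU_MEMORY_GB": 2, "CPU_MHZ": 1300,
                  "CPU_SMP": 2, "NODECOUNT": 64},
    "cluster_c": {"CPU_MODEL": "x86_64", "CPU_MEMORY_GB": 8, "CPU_MHZ": 2300,
                  "CPU_SMP": 8, "NODECOUNT": 512},
}

selected = [name for name, attrs in candidates.items()
            if matches(attrs, match_attributes)]
print(selected)
```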

7 The user runs Fullauto with the auto job file as the input:
    --autojobfile=
Fullauto finds clusters from the allowable list of resources and creates a job script for each selected cluster.
Fullauto uses MCP to run the scripts.
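The `__MCP_*__` keys in `substitutes_dict` look like placeholders that get replaced in a job-script template. Assuming plain string substitution, job-script generation might look roughly like the sketch below. The template text, the `render_job_script` function, and the example values are assumptions; Fullauto's real templates are not shown in the slides.

```python
def render_job_script(template, substitutes):
    """Replace every __MCP_*__ placeholder in the template with its value."""
    script = template
    for key, value in substitutes.items():
        # Only keys shaped like __MCP_NAME__ are treated as placeholders.
        if key.startswith("__MCP_") and key.endswith("__"):
            script = script.replace(key, str(value))
    return script


# Hypothetical template, loosely modelled on the NCSA job script example.
TEMPLATE = """#!__MCP_SHELL__
#MCP username __MCP_USERNAME__
#MCP scratch_dir __MCP_SCRATCH_DIR__
#PBS -l nodes=__MCP_NODES__:ppn=__MCP_CPUS_PER_NODE__
#PBS -d __MCP_JOB_DIR__
__MCP_PARALLEL_RUN__ __MCP_EXECUTABLE__
"""

substitutes = {
    "wallclock_seconds": "300",  # not a placeholder key, so it is skipped here
    "__MCP_SHELL__": "/bin/ksh",
    "__MCP_USERNAME__": "your_username",
    "__MCP_SCRATCH_DIR__": "/home/ncsa/your_username/info/mcp/test/mcpdata",
    "__MCP_NODES__": "4",
    "__MCP_CPUS_PER_NODE__": "2",
    "__MCP_JOB_DIR__": "/home/ncsa/your_username/info/mcp/test/run",
    "__MCP_PARALLEL_RUN__": "mpirun",
    "__MCP_EXECUTABLE__": "/home/ncsa/your_username/testprog/ring26",
}

rendered = render_job_script(TEMPLATE, substitutes)
print(rendered)
```

Keeping the per-cluster values in a dict and the script shape in a template means one job description can be rendered for each selected cluster by swapping the dict.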

8 Resources available – attributes or from

Resource Name                        Location
Queen Bee (Dell IA64 cluster)        LONI
Mercury (Intel IA64 cluster)         NCSA
Abe (Dell Intel IA64 cluster)        NCSA
Lonestar (Dell 1955 cluster)         TACC
Steele (Dell 1950 cluster)           Purdue
