Virtual mpirun Jason Hale Engineering 692 Project Presentation Fall 2007.


1 Virtual mpirun Jason Hale Engineering 692 Project Presentation Fall 2007

2 Rationale
- Compute cycles = money
- Mimosa (250 nodes): $.06 per CPU hour
- Wasted CPU cycles -> wasted money
  Wasted user time -> less research
- Not all parallel computations run efficiently
- Goal of a supercomputing center: have users run on the maximum number of CPUs/nodes they can utilize efficiently

3

4 MCSR Initiatives to Improve Utilization
- g03sub
    Enhanced (virtualized?) wrapper for users submitting Gaussian calculations
- Back-end processes to poll the PBS batch scheduler and compute utilization of parallel jobs; post to DB & Web; e-mail inefficient users
- Amber Alert System

5

6 These Systems Don’t Work for Mimosa Cluster
- PBSPro can’t accumulate CPU usage times from parallel processes distributed across compute nodes
- Idea: Create a monitor process that will follow parallel processes to nodes, monitor their CPU performance, and report back.
- Virtualization: Users will not know about the process. They will launch a virtual mpirun (or g03sub), not realizing that it is not the “real” one, and it will launch the real one along with the monitor.

7 Running an MPI Program on a Cluster
Head Node:
    myprogram.c
    cc myprogram.c -o myprogram.exe    (produces myprogram.exe)
    myscript.pbs:
        #PBS -l nodes=4
        mpirun -np 4 myprogram.exe
    qsub myscript.pbs
Virtual mpirun:
    mpirun -np 4 monitor.exe &
    mpirun -np 4 myprogram.exe
Compute Nodes:
    monitor.exe, myprogram.exe
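The substitution performed by the virtual mpirun can be sketched as a function that turns the user's arguments into the two commands shown on this slide. This is only a sketch under stated assumptions: the paths in `kRealMpirun` and `kMonitor` and the function name `buildCommands` are hypothetical, and the real wrapper may well be a shell script rather than C++.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical install path of the untouched, site-provided mpirun.
const std::string kRealMpirun = "/usr/local/bin/mpirun.real";
// Hypothetical location of the monitor binary.
const std::string kMonitor = "/usr/local/bin/monitor.exe";

// Given the arguments the user passed to the wrapper
// (e.g. {"-np", "4", "myprogram.exe"}), return the two shell commands
// the wrapper would run: the backgrounded monitor, then the user's job.
std::vector<std::string> buildCommands(const std::vector<std::string>& userArgs) {
    std::string np = "1";  // assumed default when -np is absent
    for (std::size_t i = 0; i + 1 < userArgs.size(); ++i) {
        if (userArgs[i] == "-np") np = userArgs[i + 1];
    }
    // The user's arguments are forwarded unchanged to the real mpirun.
    std::string userCmd = kRealMpirun;
    for (const auto& arg : userArgs) userCmd += " " + arg;
    // The monitor is launched with the same process count, in the background.
    std::string monitorCmd = kRealMpirun + " -np " + np + " " + kMonitor + " &";
    return {monitorCmd, userCmd};
}
```

Forwarding the user's arguments verbatim is what keeps the wrapper invisible: the user's script still says `mpirun -np 4 myprogram.exe`.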

8 Design Goals
- Collect CPU utilization stats on cluster calculations
- No changes to user end processes
- No significant performance degradation
- No side effects (Leave No Trash Behind)
- Monitor even non-MPI parallel codes (Gaussian 03)
- Generality and robustness for reuse potential

9 Components
- monitor (new C++ MPI program)
- mpirun
    New wrapper around existing mpirun
    Calls existing monitor and “real” mpirun
- g03sub
    Existing batch script to launch Gaussian jobs on cluster
    MCSR’s version previously “virtualized”
    Modify to now call monitor program also

10 [Diagram: four monitor.exe / myprogram.exe pairs, labeled Manager Process and Worker Processes]

11 [Diagram repeated from slide 10]

12 Worker Process Logic
[Diagram: four monitor.exe / myprogram.exe pairs, labeled Manager Process and Worker Processes]

While (NoTerminationMessageFromMaster)
    Sleep
    Wakeup
    Create Process Times File
    Read Process Times File
    If (ActiveProcesses)
        Update Process Times Data Structure
        SendCPUTimeMessageToMaster
    Else
        SendIdleMessageToMaster
    End If/Else
End While
Terminate

13 Worker Process Logic
[Diagram: four monitor.exe / myprogram.exe pairs, labeled Manager Process and Worker Processes; a worker writes /tmp/ps_file]

While (NoTerminationMessageFromMaster)
    Sleep
    Wakeup
    Create Process Times File
    Read Process Times File
    If (ActiveProcesses)
        Update Process Times Data Structure
        SendCPUTimeMessageToMaster
    Else
        SendIdleMessageToMaster
    End If/Else
End While
Terminate

14 Worker Process Logic
[Diagram: four monitor.exe / myprogram.exe pairs, labeled Manager Process and Worker Processes; a worker writes /tmp/ps_file]

While (NoTerminationMessageFromMaster)
    Sleep
    Wakeup
    Create Process Times File
    Read Process Times File
    Delete Process Times File
    If (ActiveProcesses)
        Update Process Times Data Structure
        SendCPUTimeMessageToMaster
    Else
        SendIdleMessageToMaster
    End If/Else
End While
Terminate

15 Worker Process Logic
[Diagram: four monitor.exe / myprogram.exe pairs, labeled Manager Process and Worker Processes; a worker writes /tmp/ps_file]

While (NoTerminationMessageFromMaster)
    Sleep
    Wakeup
    Create Process Times File
    Read Process Times File
    Delete Process Times File
    If (ActiveProcesses)
        Update Process Times Data Structure
        SendCPUTimeMessageToMaster
    Else
        SendIdleMessageToMaster
    End If/Else
End While
Terminate

pid  cputime
123  06s
124  12s
130  29s
     = 47s total

16 Worker Process Logic
[Diagram: four monitor.exe / myprogram.exe pairs, labeled Manager Process and Worker Processes; values shown in diagram: 47, 9]

While (NoTerminationMessageFromMaster)
    Sleep
    Wakeup
    Create Process Times File
    Read Process Times File
    Delete Process Times File
    If (ActiveProcesses)
        Update Process Times Data Structure
        SendCPUTimeMessageToMaster
    Else
        SendIdleMessageToMaster
    End If/Else
End While
Terminate

pid  cputime
123  06s
124  12s
130  29s
     = 47s total

17 Worker Process Logic
[Diagram: three monitor.exe / myprogram.exe pairs, labeled Manager Process and Worker Processes; message shown in diagram: Idle]

While (NoTerminationMessageFromMaster)
    Sleep
    Wakeup
    Create Process Times File
    Read Process Times File
    Delete Process Times File
    If (ActiveProcesses)
        Update Process Times Data Structure
        SendCPUTimeMessageToMaster
    Else
        SendIdleMessageToMaster
    End If/Else
End While
Terminate
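The If/Else at the heart of the worker loop can be sketched as a pure function that decides which message to send after the process times file has been read. This is a sketch with MPI deliberately left out so it compiles on its own: `ProcessSample`, `WorkerMessage`, and `makeWorkerMessage` are hypothetical names, and treating "ActiveProcesses" as "at least one process was listed in the file" is an assumption.

```cpp
#include <vector>

// One row of the process times file: a pid and its CPU time in seconds.
struct ProcessSample {
    int pid;
    int cpuSeconds;
};

enum class MessageKind { CpuTime, Idle };

// What a worker would report to the manager (hypothetical message layout).
struct WorkerMessage {
    MessageKind kind;
    int totalCpuSeconds;
};

// Decide what to report after reading the file.
// Assumption: an empty sample list means no user processes are active.
WorkerMessage makeWorkerMessage(const std::vector<ProcessSample>& samples) {
    if (samples.empty()) {
        return {MessageKind::Idle, 0};  // corresponds to SendIdleMessageToMaster
    }
    int total = 0;
    for (const auto& sample : samples) total += sample.cpuSeconds;
    return {MessageKind::CpuTime, total};  // corresponds to SendCPUTimeMessageToMaster
}
```

With the three processes from the slide (6 s, 12 s, 29 s) the sketch reports a CPU time message of 47 s, matching the total shown there. In the actual monitor the returned value would presumably be handed to an MPI send.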

18 Manager Process Logic
[Diagram: monitor.exe and myprogram.exe on the Manager Process node]

While (Active Processes)
    MONITOR_LOCAL_PROCESSES
    If (LocalActiveProcesses)
        UpdateGlobalCPUTimeStructure
        UpdateActiveProcessStructure
    Else
        UpdateActiveProcessStructure
    EndIf
    ForEachSlave
        WaitForMessage
        If (CPUMessage)
            UpdateGlobalCPUTimeStructure
            UpdateActiveProcessStructure
        Else If (IdleMessage)
            UpdateActiveProcessStructure
        End If
    End For
EndWhile

WKR  cputime
0    25s
1    35s
2    09s
3    47s
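The two structures the manager maintains can be sketched as a small state object updated once per incoming message. The names `ManagerState`, `onCpuMessage`, and `onIdleMessage` are hypothetical, and the sketch assumes each CPU message carries a worker's cumulative CPU time, so the newest value replaces the previous one; the slides do not say whether values are cumulative or incremental.

```cpp
#include <map>
#include <set>

// Hypothetical stand-in for the manager's "global CPU time" and
// "active process" structures.
struct ManagerState {
    std::map<int, int> cpuSecondsByWorker;  // latest reported CPU time per worker
    std::set<int> activeWorkers;            // workers that last reported activity

    // CPU message: record the time and mark the worker active.
    void onCpuMessage(int worker, int cpuSeconds) {
        cpuSecondsByWorker[worker] = cpuSeconds;  // assumed cumulative, so overwrite
        activeWorkers.insert(worker);
    }

    // Idle message: the worker has no active processes left.
    void onIdleMessage(int worker) { activeWorkers.erase(worker); }

    // Loop condition for "While (Active Processes)".
    bool anyActive() const { return !activeWorkers.empty(); }

    int totalCpuSeconds() const {
        int total = 0;
        for (const auto& entry : cpuSecondsByWorker) total += entry.second;
        return total;
    }
};
```

Fed the table from the slide (25 s, 35 s, 9 s, 47 s), the total is 116 s. Once every worker has sent an idle message, `anyActive()` turns false and the manager's loop would end with the recorded times intact.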

19 Test MPI Script
- Parallel Ultimate Virtual Collapse Program
    Reads a list of integers from a file
    Distributes the integers to all available worker nodes
    Each worker computes the ultimate collapse of its numbers
- Control the length of processing time by:
    Number of numbers in the list (1,000,000)
    The size of the numbers in the list (1 to 7 digits)
- Control the parallel efficiency by:
    The order of the numbers in the list
    Larger numbers grouped together: fewer nodes do most of the work
    Large numbers evenly distributed: nodes do about the same work
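The slides do not define "ultimate collapse". The sketch below assumes the common classroom definition: the collapse of an integer is the sum of its decimal digits, and the ultimate collapse repeats that until a single digit remains. The contiguous `blockPartition` is also an assumption, chosen because it would explain why the order of the list controls load balance; the function names are hypothetical and MPI is omitted.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Assumed definition: sum of the decimal digits of a non-negative integer.
int collapse(int n) {
    int sum = 0;
    while (n > 0) {
        sum += n % 10;
        n /= 10;
    }
    return sum;
}

// Assumed definition: collapse repeatedly until one digit remains.
int ultimateCollapse(int n) {
    while (n >= 10) n = collapse(n);
    return n;
}

// Split the list into contiguous blocks, one per worker (workers must be > 0).
// With this scheme, large numbers grouped together land on the same worker.
std::vector<std::vector<int>> blockPartition(const std::vector<int>& numbers,
                                             int workers) {
    const std::size_t w = static_cast<std::size_t>(workers);
    const std::size_t chunk = (numbers.size() + w - 1) / w;  // ceiling division
    std::vector<std::vector<int>> parts;
    for (std::size_t i = 0; i < w; ++i) {
        const std::size_t begin = std::min(i * chunk, numbers.size());
        const std::size_t end = std::min(begin + chunk, numbers.size());
        parts.emplace_back(numbers.begin() + begin, numbers.begin() + end);
    }
    return parts;
}
```

Under this definition, 9875 collapses to 29, then 11, then 2. Longer numbers need more digit additions, which is how the size of the numbers would lengthen the run.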

20 Project Status
- Test Program is Written (Ultimate Collapse)
- Monitor program: Partially Complete; Some Work Remains
    Sleep/Wakeup
    Create Process Times File
    Read Process Times File
    Delete Process Times File
    If (ActiveProcesses)
        Update Process Times Data Structure
        SendCPUTimeMessageToMaster
    Else
        SendIdleMessageToMaster
    End If/Else
    Terminate

21 ps syntax from monitor.cpp
string psCommand(" ps -u " + username +
    " --no-headers -o pid,cputime,etime,comm,user,c,pcpu"
    " | grep -v ps | grep -v sh | grep mpirun | grep -v mon.exe | grep -v grep >> " +
    myFileName);
system(psCommand.c_str());
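One possible alternative, in the spirit of the "Leave No Trash Behind" design goal, is to read the command's output through a pipe instead of appending it to a file that must later be deleted. This is not what monitor.cpp does; it is a sketch using the POSIX `popen`/`pclose` calls, and `runAndCapture` is a hypothetical helper name.

```cpp
#include <stdio.h>   // popen, pclose, fgets (POSIX)
#include <stdexcept>
#include <string>

// Run a shell command and return everything it writes to standard output.
// No temporary file is created, so there is nothing to clean up afterwards.
std::string runAndCapture(const std::string& command) {
    FILE* pipe = popen(command.c_str(), "r");
    if (pipe == nullptr) {
        throw std::runtime_error("popen failed for: " + command);
    }
    std::string output;
    char buffer[256];
    while (fgets(buffer, sizeof buffer, pipe) != nullptr) {
        output += buffer;  // fgets keeps the trailing newline of each line
    }
    pclose(pipe);
    return output;
}
```

The same ps pipeline, minus the trailing `>>` redirection, could be passed to this helper. The tradeoff is that the file-based version leaves an inspectable artifact for debugging, while the pipe does not.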

22 Example /tmp/ps_file from node
pid    cputime   etime  comm   user    c   pcpu
32765  00:00:00  02:32  a.out  jghale  0   0.0
32764  00:00:00  02:32  a.out  jghale  0   0.0
305    00:00:00  02:32  a.out  jghale  0   0.0
300    00:02:31  02:32  a.out  jghale  99  99.8
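Turning a file like this into a single CPU-time figure means converting the `cputime` column, which ps prints as `[DD-]HH:MM:SS`, into seconds and summing over rows. The following is a minimal sketch of that step; `cpuTimeToSeconds` and `totalCpuSeconds` are hypothetical names, and lines that do not begin with a numeric pid (such as a header) are simply skipped.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Convert a ps cputime field of the form [DD-]HH:MM:SS to seconds.
long cpuTimeToSeconds(const std::string& text) {
    long days = 0;
    std::string clock = text;
    const std::string::size_type dash = text.find('-');
    if (dash != std::string::npos) {  // optional leading day count
        days = std::stol(text.substr(0, dash));
        clock = text.substr(dash + 1);
    }
    long seconds = 0;
    std::istringstream fields(clock);
    std::string field;
    while (std::getline(fields, field, ':')) {
        seconds = seconds * 60 + std::stol(field);  // HH, then MM, then SS
    }
    return days * 86400 + seconds;
}

// Sum the cputime column over every row that starts with a numeric pid.
long totalCpuSeconds(std::istream& psOutput) {
    long total = 0;
    std::string line;
    while (std::getline(psOutput, line)) {
        std::istringstream row(line);
        int pid = 0;
        std::string cputime;
        if (row >> pid >> cputime) {  // fails on header or blank lines
            total += cpuTimeToSeconds(cputime);
        }
    }
    return total;
}
```

For the four rows in the example file, only pid 300 has accumulated CPU time (00:02:31), so the sum is 151 seconds.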
