Matlab on the Cray XE6 Beagle
Beagle Team
Computation Institute
University of Chicago & Argonne National Laboratory
Matlab on Beagle – Outline
Introduction to high performance computing
Some relevant facts about Beagle’s hardware
Basics about the work environment
Data transfer using Globus Online
Use of the compilers (C, C++, and Fortran)
Launching and monitoring applications
Using Matlab on Beagle
What the Heck is Supercomputing?
Credit: Henry Neeman, Director, OU Supercomputing Center for Education & Research
Why Beagle?
Not the kind of problem we can handle with Matlab at this point
What affects performance? Accessing data
Examples:
1. Data array too big to fit into cache (12 MB), we need to use main memory (32 GB)
2. An image too big to fit into memory (32 GB), use of disk space or distributed memory (23 TB)
3. Too many genomes to fit on local storage (~ max 50 TB per user), use of network disks
What affects performance? Repetition
Examples:
1. Unrelated experiments (e.g., CT image reconstruction and molecular dynamics modeling) can be run at the same time
2. Each genome in an experiment can be analyzed independently
3. Slices or sub-images can be processed at the same time
Matlab examples:
1) If analyzing a single image is time consuming (or images are large): slices or sub-images can be processed at the same time using different threads (e.g., with parallel tools, but not working yet)
2) If images are small: different threads can analyze different images (not really shared memory, just in the same memory)
Some relevant facts about Beagle’s hardware
Beagle: hardware overview
Beagle “under the hood”
Compute nodes
2 AMD Opteron 6100 “Magny-Cours” 12-core processors (24 cores per node), 2.1 GHz
32 GB RAM (8 GB per processor)
No disk on node (mounts DVS and Lustre network filesystems)
Details about the processors (sockets)
Superscalar:
– 3 integer ALUs
– 3 floating point ALUs (can do 4 FP per cycle)
Cache hierarchy:
– Victim cache
– 64 KB L1 instruction cache
– 64 KB L1 data cache (latency 3 cycles)
– 512 KB L2 cache per processor core (latency 9 cycles)
– 12 MB shared L3 cache (latency 45 cycles)
Basics about the work environment
Beagle’s operating system
Cray XE6 uses Cray Linux Environment v3 (CLE3), which is SuSE Linux-based
Compute nodes use Compute Node Linux (CNL)
Login and sandbox nodes use a more standard Linux
The two are different (relevant to Matlab).
Compute nodes can operate in
– ESM (extreme scalability mode), to optimize performance for large multi-node calculations
– CCM (cluster compatibility mode), for out-of-the-box compatibility with Linux/x86 versions of software, more or less without recompilation or relinking! (It doesn’t work yet.)
Beagle’s filesystems
/lustre/beagle: local Lustre filesystem (read-write). This is where all input and output files should be; however, NO BACKUP!
/gpfs/pads: PADS GPFS (read-write), for permanent storage
/home: CI home directories, largely useless: you can’t write there from the compute nodes!
How to move data to and from Beagle
Beagle is not HIPAA-compliant: no PHI data on Beagle
Examples of factors for choosing a data movement tool:
– how many files, how large the files are …
– how much fault tolerance is desired,
– performance,
– security requirements, and
– the overhead needed for software setup.
Recommended tools:
– scp/sftp can be OK for moving a few small files (< a couple of GB)
  o pros: quick to initiate
  o cons: slow and not scalable
– For optimal speed and reliability we recommend Globus Online:
  o high-performance (i.e., fast)
  o reliable
  o easy to use from either a command line or a web browser
  o provides fault-tolerant, fire-and-forget transfers
If you know you'll be moving a lot of data or find scp is too slow/unreliable, we recommend Globus Online.
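As a rough illustration of the first factor (how many files and how large), the following is a minimal sketch of a helper that summarizes a directory before a transfer. The function name `suggest_transfer_tool` and both thresholds (at most 10 files, under 2 GB) are our own illustrative assumptions, not Beagle policy; adjust them to your situation.

```shell
#!/bin/bash
# suggest_transfer_tool DIR: print the file count and total size of DIR,
# followed by a suggested tool.
# Hypothetical helper. The thresholds are illustrative assumptions:
# at most 10 files ("a few") and under 2 GB ("a couple of GB") suggest scp.
suggest_transfer_tool() {
    local dir=$1
    local files size_kb
    # Count regular files below DIR.
    files=$(find "$dir" -type f | wc -l | tr -d ' ')
    # Total size in KB (first tab-separated field of du's output).
    size_kb=$(du -sk "$dir" | cut -f1)
    echo "files=$files size_kb=$size_kb"
    if [ "$files" -le 10 ] && [ "$size_kb" -lt $((2 * 1024 * 1024)) ]; then
        echo "suggestion: scp/sftp"
    else
        echo "suggestion: Globus Online"
    fi
}
```

For example, `suggest_transfer_tool ./results` prints the two summary lines for a local `results` directory.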
Applications on Beagle
Applications on Beagle are (mostly) run from the command line, e.g.:
    aprun -n myapp & this.log
How do I know if an application is on Beagle?
– On Beagle, use module avail, e.g.:
    module avail 2>&1 | grep -i matlab
    Matlab/7.13(default)
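The `module avail` check can be wrapped in a small function for use in scripts. This is a minimal sketch, assuming the `module` command is already defined in the current shell; the name `has_module` is hypothetical.

```shell
#!/bin/bash
# has_module NAME: succeed if the `module avail` listing contains NAME
# (case-insensitive match).
# Hypothetical helper; assumes `module` is defined in this shell.
has_module() {
    # `module avail` writes its listing to stderr, hence the 2>&1.
    module avail 2>&1 | grep -qi -- "$1"
}
```

A script could then use `has_module matlab && module load matlab` and report an error otherwise.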
Applications on Beagle
GUIs are in general not supported (true for both Matlab and Simulink)
Licensing is similar to any other uchicago.edu machine
– Packages charged by number of cores can be expensive on Beagle and aren’t usually supported
– Packages which have a campus license can simply be installed and used on Beagle
Octave is available at no charge and can in principle be installed on Beagle (upon serious request), even if porting is not easy
Matlab on Beagle
Matlab on Beagle: GUI
The Matlab GUI is not supported and most likely will not be in the future:
– In our experience, standard Matlab is not very effective at exploiting massively parallel supercomputers such as Beagle
– Parallel tools have the potential to overcome at least some of these issues, but licensing and other practical issues make this approach infeasible at this time
– If you have suggestions about how to use the GUI and parallel tools, let us know.
Matlab on Beagle: Compile code
However, compiled executables from Matlab code can be easily run on Beagle:
– MATLAB programs should be compiled using mcc (Matlab compiler) and run as command line executables with MCR (Matlab Compiler Runtime). In our experience, Matlab has shown very limited ability to exploit multi-core processors effectively.
– Therefore, to exploit parallelism, executables are compiled single-threaded and run in parallel using a scripting language such as a bash shell or Swift.
– We are working on including parallel tools in the compiled programs, but we have no working solution at this point.
– Suggestions?
Compiling Matlab: Matlab code
The Matlab environment can compile any Matlab function of the form foofunc(x1,x2,...,xn)
Matlab functions can call other Matlab functions from other files; usually leaving them in the compilation directory will be sufficient
Calling parameters (x1, x2, …, xn above) become arguments for the executable. However, those arguments are passed as strings and, if they are numbers, need to be converted inside the function:
    if (isdeployed)
        x1 = str2num(x1);
        x2 = str2num(x2);
        ...
        xn = str2num(xn);
    end
Compiling Matlab: mcc and MCR
The Matlab compiler (mcc) produces executables that require the Matlab Compiler Runtime (MCR) in order to run. The MCR is a set of shared libraries that enables the execution of Matlab files without an installed version of Matlab or a license.
The mcc compiler is loaded with the command
    module load matlab
Compiling Matlab: mcc and MCR
Compilation can be done on other systems, as long as the MCR version corresponding to the mcc used to compile is installed on Beagle.
Specific versions of MCR can be installed by users in directories on Lustre. Please contact us if you encounter problems while trying to do it.
Currently MCR is available as
– /soft/matlab/7.13/
– /soft/mcr/v714/
– (if you require other versions let us know).
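Since the MCR location depends on the version you compiled against, a job script can check which candidate directory actually exists before running. This is a minimal sketch; the helper name `first_existing_dir` is hypothetical, and the order of the candidates is up to you.

```shell
#!/bin/bash
# first_existing_dir DIR...: print the first argument that is an existing
# directory; return non-zero if none of them exists.
# Hypothetical helper name.
first_existing_dir() {
    local d
    for d in "$@"; do
        if [ -d "$d" ]; then
            printf '%s\n' "$d"
            return 0
        fi
    done
    return 1
}

# Possible use in a job script (paths taken from the list above):
#   MCRROOT=$(first_existing_dir /soft/mcr/v714 /soft/matlab/7.13) || exit 1
#   export MCRROOT
```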
Compiling Matlab on Beagle: mcc options
We recommend that users compile with
    mcc -R -singleCompThread -R -nojvm -R -nodisplay -mv myapp.m -o my_app
-m generates a standalone application
-v (verbose) displays all the compilation steps; e.g., it helps identify which third-party compiler is used and what environment variables are referenced
-R specifies run-time options for MCR
– -R -nojvm disables the Java virtual machine
– -R -nodisplay eliminates functions that would produce a display
– -R -singleCompThread runs MCR single-threaded
At this stage, it does not appear that there is a way to control how MATLAB creates threads, or that it can run a multi-threaded program efficiently on a 24-core Cray XE6 node (MATLAB directly checks /proc/cpuinfo to determine how many cores are available for a calculation and uses all of them, independently of the instructions given by the aprun command)
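If you compile several functions, it can help to generate the recommended command line rather than retype it. This is a minimal sketch; `mcc_command` is a hypothetical helper that only prints the command with the options listed above and does not itself invoke mcc.

```shell
#!/bin/bash
# mcc_command SOURCE OUTPUT: print the recommended mcc command line for
# compiling SOURCE (a .m file) into the standalone executable OUTPUT.
# Hypothetical helper; it prints the command and does not run mcc.
mcc_command() {
    local src=$1 out=$2
    printf 'mcc -R -singleCompThread -R -nojvm -R -nodisplay -mv %s -o %s\n' \
        "$src" "$out"
}
```

For example, `mcc_command myapp.m my_app` prints the command shown on this slide, which can then be run in a shell where mcc has been loaded.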
Matlab on Beagle: mcc output
After the compilation, a number of files will be generated:
– mccExcludedFiles.log: don’t worry about this one
– my_app: the executable you will need to copy to Beagle
– readme.txt: contains information such as where to find the version of MCRInstaller.bin for your specific MATLAB, which you will need if it differs from the ones available on Beagle
– run_my_app.sh: a shell script that is used to run each copy of my_app. We recommend that you use it to avoid having to take care of too many variables in your PBS scripts. However, you will need to modify this script when using it on Beagle; see next page
Matlab on Beagle: changes to run_my_app.sh
To prevent the various scripts from blocking each other, you can add something like the following lines at the beginning of the script, right after the initial comments (the series of lines starting with "#"):
    #Added to run on Beagle after August 2011
    #TMP must be defined by the calling PBS script
    tmp=`mktemp -d $TMP/matlabcachedir.XXXXXXXXXXX`
    echo $tmp
    export MCR_CACHE_ROOT=$tmp;
    # end added
In order to remove the temporary cache directories, after the line
    eval "${exe_dir}"/my_app $args
add
    #Added to run on Beagle after August 2011
    rm -rf $tmp
    #end added
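The same idea (one private MCR cache directory per running copy, removed afterwards) can also be sketched as a standalone wrapper. This is a variation under our own assumptions, not the script that mcc generates: `with_private_cache` is a hypothetical name, TMP must be set by the caller, and cleanup uses an EXIT trap so the directory is removed even if the command fails.

```shell
#!/bin/bash
# with_private_cache CMD [ARGS...]: run CMD with MCR_CACHE_ROOT pointing at a
# fresh directory under $TMP, then remove that directory.
# Hypothetical helper; TMP must be set by the caller (e.g., the PBS script).
with_private_cache() (
    # The function body is a subshell, so the trap and the export stay local.
    cache=$(mktemp -d "${TMP:?TMP must be set}/matlabcachedir.XXXXXXXXXXX") || exit 1
    # Remove the cache directory when the subshell exits, even on failure.
    trap 'rm -rf "$cache"' EXIT
    export MCR_CACHE_ROOT=$cache
    "$@"
)
```

A call might look like `with_private_cache ./run_my_app.sh "$MCRROOT" 5`; the wrapper returns the exit status of the command it ran.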
Using Matlab on Beagle: scripting
Run multiple copies of the single-threaded run_my_app.sh using a scripting language:
– Bash shell + PBS (batch submission)
– Swift
Remember that Beagle provides only 32 GB per node. Any request above that value will produce an Out of Memory (OOM) error, which will result in the termination of the process: be mindful of how much you “pack” calculations
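The packing limit is simple arithmetic: the number of copies per node is bounded both by the 24 cores and by 32 GB divided by the memory each copy needs. This is a minimal sketch; `max_copies_per_node` is a hypothetical helper, and treating the full 32 GB as available to your processes is an optimistic assumption, so leave some margin.

```shell
#!/bin/bash
# max_copies_per_node MEM_PER_COPY_MB: print how many single-threaded copies
# fit on one node, limited by both memory and core count.
# The 32 GB and 24 cores come from the node description; assuming all 32 GB
# is usable is optimistic, since the operating system needs some of it.
max_copies_per_node() {
    local mem_per_copy_mb=$1
    local node_mem_mb=$((32 * 1024))
    local cores=24
    local by_mem=$((node_mem_mb / mem_per_copy_mb))
    if [ "$by_mem" -lt "$cores" ]; then
        echo "$by_mem"
    else
        echo "$cores"
    fi
}
```

For example, copies that each need 4096 MB are limited by memory to 8 per node, while copies that need 1024 MB are limited by the core count to 24.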
Using Matlab on Beagle: Bash + PBS
    #!/bin/bash
    #PBS -N myTestMatlab
    #PBS -l walltime=0:10:00
    #PBS -l mppwidth=24
    #PBS -j oe

    # Load modules and set for dynamic environment
    . /opt/modules/ /init/bash

    # Sets the shared library environment
    export CRAY_ROOTFS=DSL

    # Set the env variable where the root of MCR is
    # (you might need to change this if you need a specific version of MCR)
    #export MCRROOT=/soft/mcr/v714
    export MCRROOT=/soft/matlab/7.13/

    # Create, if necessary, a directory on /lustre to run the simulations
    LUSTREDIR=/lustre/beagle/`whoami`/testMatlab/magicsquare${PBS_JOBID}
    mkdir -p $LUSTREDIR

    # Set up TMP and a cache root dir for MCR, it won't work if it isn't set
    LUSTRETMP=${LUSTREDIR}/${PBS_JOBID}/tmp
    mkdir -p $LUSTRETMP
    export TMP=$LUSTRETMP
    export MCR_CACHE_ROOT=$LUSTRETMP

    # Copy the files to the run dir and run the code
    cd $PBS_O_WORKDIR
    cp run_my_app.sh my_app $LUSTREDIR
    cd $LUSTREDIR
    aprun -b -n 1 -d 1 ./run_my_app.sh $MCRROOT 5 &>test_my_app.log
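To pack several single-threaded copies onto one node, a common shell pattern is to start them in the background and wait for all of them. This is a minimal sketch of that pattern only: `run_packed` is a hypothetical helper, passing the copy index as the last argument is our own convention, and the aprun options needed to start such a script on a node with all of its cores are not shown here.

```shell
#!/bin/bash
# run_packed N CMD [ARGS...]: start N copies of CMD in the background, passing
# the copy index (1..N) as the last argument, then wait for all of them.
# Returns non-zero if any copy failed. Hypothetical helper.
run_packed() {
    local n=$1
    shift
    local i pid failed=0
    local pids=()
    for ((i = 1; i <= n; i++)); do
        "$@" "$i" &
        pids+=("$!")
    done
    # Wait for each copy individually so that failures are not lost.
    for pid in "${pids[@]}"; do
        wait "$pid" || failed=1
    done
    return "$failed"
}
```

For example, `run_packed 24 ./run_my_app.sh "$MCRROOT"` would start 24 copies, each receiving its index as the argument to my_app. Keep the 32 GB per-node memory limit in mind when choosing N.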
Matlab on Beagle: note
We are happy to help you use a scripting language effectively:
– Bash shell
– Swift (a presentation about it follows)
In general, Matlab compiled executables do not use Beagle very efficiently (in terms of both CPU and memory), and this should be considered carefully when planning large calculations.
Let us know if we can help with any of the steps involved in using Matlab on Beagle
Acknowledgments
BSD for funding most of the operational costs of Beagle
Many of the images and much of the content were taken or learned from Cray documentation or Cray staff (mostly Dave Strenski)
Globus for providing us with many slides and support; special thanks to Mary Bass, manager for communications and outreach at the CI
NERSC and its personnel provided us with both material and direct instruction; special thanks to Katie Antypas, group leader of the User Services Group at NERSC
All the people at the CI who supported our work, from administering the facilities to taking pictures of Beagle
Beagle users who helped with the content about using Matlab and Python
Thanks! We look forward to working with you.
Questions?