Using HPC for Ansys CFX and Fluent John Zaitseff, April 2015 High Performance Computing
The problem in computer labs Multiple computers locked for long periods of time, often by just a handful of students All computers running Ansys CFX or Fluent Often randomly rebooted by other students and/or staff Cannot get a computer when you need it Can lose results when you do Image credit: John Zaitseff, UNSW
The solution: High Performance Computing “High performance computing is used to solve real-world problems of significant scale or detail across a diverse range of disciplines including physics, biology, chemistry, geosciences, climate sciences, engineering and many others.” — Intersect Australia http://www.intersect.org.au/content/time-fast-computing Image credit: IBM Blue Gene P supercomputer, Argonne National Laboratory
High Performance Computing architecture Massively Parallel Distributed Computational Clusters Many individual servers (“nodes”): dozens to thousands Multiple processors per node: between 8 and 64 cores Interconnected by fast networks Almost always run Linux In our case: Rocks Linux Distribution on top of CentOS 6.x The Leonardi cluster Image credit: John Zaitseff, UNSW
High Performance Computing architecture [Diagram: the Internet connects to the head node; the head node, storage node and compute nodes are linked by an internal network switch; compute nodes are either standalone (Compute Node 1 to n) or grouped into chassis (Chassis 1 to m, each holding Compute Node 1 to n)]
Facilities for MECH students and staff The Newton cluster For undergraduate students, postgraduates and staff MECH9620, MECH4100, MMAN4010, MMAN4020, AERO4110 and AERO4120 students already have an account! The Trentino cluster For postgraduate students and staff By application The Leonardi cluster UNSW R1 Data Centre Image credit: John Zaitseff, UNSW
The Newton cluster: newton.mech.unsw.edu.au 10 × Dell R415 server nodes Head node: newton Compute nodes: newton01 to newton09 160 × AMD Opteron 4386 3.1GHz processor cores Two physical processors per node Eight CPU cores per processor Only four floating-point units per processor 320 GB of main memory (32 GB per node) 12 TB of storage: 6 × 3 TB drives in RAID 6 1Gb Ethernet network interconnect http://cfdlab.unsw.wikispaces.net/ The Newton cluster Image credit: John Zaitseff, UNSW
The Trentino cluster: trentino.mech.unsw.edu.au 16 × Dell R815 server nodes Head node: trentino Compute nodes: trentino01 to trentino15 1024 × AMD Opteron 6272 2.1GHz processor cores Four physical processors per node Sixteen CPU cores per processor Only eight floating-point units per processor 2048 GB of main memory (128 GB per node) 30 TB of storage: 12 × 3 TB drives in RAID 6 4×1Gb Ethernet network interconnect http://cfdlab.unsw.wikispaces.net/ The Trentino cluster Image credit: John Zaitseff, UNSW
The Leonardi cluster: leonardi.eng.unsw.edu.au 7 × HP BladeSystem c7000 blade enclosures 1 × HP ProLiant DL385 G7 server: leonardi 56 × HP BL685c G7 compute nodes Compute nodes: ec01b01-ec07b08 2944 × AMD Opteron 6174 2.2GHz processor cores and Opteron 6276 2.3GHz processor cores Four physical processors per node Twelve or sixteen CPU cores per processor 8448 GB of main memory (96–512 GB per node) 93.5 TB of storage: 70 × 2 TB drives in RAID 6+0 2×10Gb Ethernet network interconnect http://leonardi.unsw.wikispaces.net/ Nodes in the Leonardi cluster Image credit: John Zaitseff, UNSW
The Raijin cluster: raijin.nci.org.au 3592 × Fujitsu blade server nodes Multiple login nodes Multiple management nodes 57,472 Intel Xeon E5-2670 2.60GHz processor cores 160 TB of main memory 10 PB of storage using the Lustre distributed file system 56Gb InfiniBand FDR network interconnect http://nci.org.au/nci-systems/national-facility/peak-system/raijin/ Image credit: National Computational Infrastructure
Connecting to an HPC system Use the Secure Shell protocol (SSH) Under Linux or Mac OS X: ssh username@hostname (for example, ssh z9693022@newton.mech.unsw.edu.au) Under Windows: PuTTY (Start » All Programs » PuTTY » PuTTY) Can install Cygwin: “that Linux feeling under Windows” To connect to the Newton cluster: Hostname: newton.mech.unsw.edu.au Check RSA2 fingerprint: 69:7e:64:75:57:67:ad:4c:21:8e:90:7d:8e:97:70:ce User name: your zID Password: your zPass You will get a command-line prompt: something like z9693022@newton:~ $ To exit, type exit and press ENTER.
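As an illustration only, a first connection from a Linux or Mac OS X terminal might look like the following (z1234567 is a hypothetical zID used for the example):

ssh z1234567@newton.mech.unsw.edu.au
# Check the displayed fingerprint against the one above before answering "yes",
# then enter your zPass when prompted. When you have finished working:
exit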
Simple Linux commands List files in a directory: ls [options] [pathname ...] [ ] indicates optional parameters, ... indicates one or more parameters Italic fixed-width font indicates replaceable parameters Options include “-l” (letter L) for a long (detailed) listing To show the current directory: pwd To change directories: cd directory ~ is the home directory . is the current directory .. is the directory above the current one ~user is the home directory of user user Subdirectories are separated by “/”, e.g., /home/z9693022/src To create directories: mkdir directory To remove an empty directory: rmdir directory To get help for a command: man command
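As a short illustrative session only (z1234567 and the directory names are placeholders), these commands might be used on Newton like this:

pwd                   # show the current directory, e.g. /home/z1234567
mkdir projects        # create a directory called projects
cd projects           # change into it
ls -l                 # long (detailed) listing; empty for a newly-created directory
cd ..                 # go back up to the directory above
man ls                # read the manual page for ls (press q to quit)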
More simple Linux commands To output one or more file’s contents: cat filename ... To view one or more files page by page: less filename ... To copy one file: cp source destination To copy one or more files to a directory: cp filename ... dir To preserve the “last modified” time-stamp: cp -p To copy recursively: cp -pr source destination To move one or more files to a different directory: mv filename ... dir To rename a file or directory: mv oldname newname To remove files: rm filename ... Recommendation: use “ls filename ...” before rm or mv: what happens if you accidentally type “rm *”? or “rm * .c”? (note the space!)
Transferring files To copy files to a Linux or Mac OS X system: use scp, rsync or insync To copy files to and from a Windows machine: use WinSCP (Start » All Programs » WinSCP » WinSCP), or scp or rsync under Cygwin To copy files to and from the Newton cluster: Host name newton.mech.unsw.edu.au Check RSA2 fingerprint: 69:7e:64:75:57:67:ad:4c:21:8e:90:7d:8e:97:70:ce User name: your zID Password: your zPass Using WinSCP, simply drag and drop files from one pane to the other.
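Under Linux, Mac OS X or Cygwin, the equivalent command-line transfers might look like this sketch (z1234567 and the file and directory names are placeholders):

# Copy a definition file from your computer to a job directory on Newton:
scp job1.def z1234567@newton.mech.unsw.edu.au:~/airfoil/
# Copy the results file back to the current directory on your computer:
scp z1234567@newton.mech.unsw.edu.au:~/airfoil/job1.res .
# Or keep a whole directory synchronised with rsync:
rsync -av z1234567@newton.mech.unsw.edu.au:~/airfoil/ ./airfoil/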
Editing files Use an editor to edit text files Many choices, leading to “religious wars”! Some options: GNU Emacs, Vim, Nano Nano is very simple to use: nano filename CTRL-X to exit (you will be asked to save any changes) GNU Emacs and Vim are highly customisable and programmable For example, see the file ~z9693022/.emacs Debra Cameron et al., Learning GNU Emacs, 3rd Edition, O’Reilly Media, December 2004. ISBN 9780596006488, 9780596104184 Arnold Robbins et al., Learning the vi and Vim Editors, 7th Edition, O’Reilly Media, July 2008. ISBN 9780596529833, 9780596159351
Running Ansys CFX jobs Set up your job using Ansys CFX as per normal Connect to the Newton cluster using PuTTY Create a directory for this particular job Transfer the .cfx and .def files to that directory using WinSCP Create an appropriate script file Submit the job to the Newton queue Periodically check the status of the job Once finished, transfer the .out and .res files to your desktop computer Check the results using the standard Ansys CFX tools Image credit: The Ansys Blog at http://www.ansys-blog.com/
Steps 1 to 4: Setting up the job Set up your job using Ansys CFX as per normal May use the laboratory computers to do this Connect to the Newton cluster using PuTTY Connect to newton.mech.unsw.edu.au Create a directory for this particular job Use the mkdir directory command Come up with a consistent naming scheme Structure your directories; use subdirectories as required Transfer the .cfx and .def files to that directory using WinSCP Connect to newton.mech.unsw.edu.au as before
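Putting steps 3 and 4 together, and using purely illustrative names (z1234567, a mech9620/airfoil-case1 job directory and airfoil.cfx/airfoil.def files), this might look like:

# On Newton, after logging in with PuTTY or ssh:
mkdir -p mech9620/airfoil-case1
# On your own computer (or use WinSCP under Windows):
scp airfoil.cfx airfoil.def z1234567@newton.mech.unsw.edu.au:~/mech9620/airfoil-case1/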
Step 5: Create a script file Change to the newly-created directory: cd directory Invoke the text editor to create a script file: nano filename.sh Add the following text, replacing parameters as required:

#!/bin/bash
#SBATCH --time=0-12:00:00                   # for 0 days 12 hours
#SBATCH --mem=30720                         # 30GB memory
#SBATCH --ntasks=1                          # A single job
#SBATCH --cpus-per-task=16                  # 16 processor cores
#SBATCH --mail-user=emailaddr@unsw.edu.au   # or @student.unsw.edu.au
#SBATCH --mail-type=ALL

cd $SLURM_SUBMIT_DIR
module load cfx/15.0                        # or cfx/14.5 as appropriate
cfx5solve -batch -def filename.def -part 16 \
    -start-method "Platform MPI Local Parallel"

Save the file by pressing CTRL-X and following the prompts
Steps 6 to 7: Submit and check on the job Once you have created the filename.sh script file, submit it into the Newton queue: Make sure you are in the correct directory Submit the job: sbatch filename.sh Take note of the job number: “Submitted batch job jobid” Once submitted, you do not need to be connected to the cluster Periodically check on the job status The job will start as soon as resources are available for it to run Emails will be sent to you on job start and completion Show queue status: squeue or squeue -l (letter L) Show node status: sinfo Cancel a running or queued job: scancel jobid
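For example, submitting and monitoring a hypothetical airfoil.sh script might look like this (the job number 1234 is illustrative):

cd ~/mech9620/airfoil-case1   # make sure you are in the job directory
sbatch airfoil.sh             # prints: Submitted batch job 1234
squeue -l                     # check whether job 1234 is queued or running
sinfo                         # see which nodes are allocated or idle
scancel 1234                  # only if you need to stop or abandon the job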
Running Ansys Fluent jobs Similar to running CFX jobs on the cluster Different files need to be transferred to and from the cluster Script file is also slightly different:

#!/bin/bash
#SBATCH --time=0-12:00:00                   # for 0 days 12 hours
#SBATCH --mem=30720                         # 30GB memory
#SBATCH --ntasks=1                          # A single job
#SBATCH --cpus-per-task=16                  # 16 processor cores
#SBATCH --mail-user=emailaddr@unsw.edu.au   # or @student.unsw.edu.au
#SBATCH --mail-type=ALL

cd $SLURM_SUBMIT_DIR
module load fluent/15.0                     # or fluent/14.5 as appropriate
fluent 3d -g -t16 -ssh <inputfilename.txt >outputfilename.txt
# may replace "3d" with "2d" for two-dimensional meshes
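The inputfilename.txt file is a Fluent journal file containing text user interface (TUI) commands to run in batch mode. A rough sketch only, assuming a hypothetical case file named model.cas and 500 iterations; check the Fluent TUI documentation for the exact commands in your version:

/file/read-case model.cas               ; read the case file
/solve/initialize/initialize-flow       ; initialise the solution
/solve/iterate 500                      ; run 500 iterations
/file/write-case-data model-final.cas   ; write out the case and data
exit
yes                                     ; confirm the exit prompt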
Getting help with HPC John Zaitseff Whom to ask for help? Your colleagues Your supervisor/lecturer The HPC representative John Zaitseff J.Zaitseff@unsw.edu.au Available for consultations on Tuesdays 9:30am–4pm by appointment only. Image credit: John Zaitseff, UNSW