1
High Performance Computing in Bioinformatics
Medical Center Information Technology, High-Performance Computing Core. September 7, 2018. Michael Costantino
2
Introduction
High Performance Computing (HPC) generally refers to the practice of aggregating computing power in a way that delivers much higher performance than one could get out of a typical desktop computer. At NYU Langone, the High Performance Computing (HPC) Core is the resource for performing computational research at scale and for analyzing big data. HPC is a shared resource. BigPurple is an HPC cluster running the Red Hat 7.4 operating system. It comprises 90 compute nodes (40 CPU cores each), 32 of which hold a total of 156 GPU cards; 16 service nodes; and an InfiniBand EDR 100 Gb/s interconnect. The attached storage runs on the GPFS file system, with 0.5 PB of flash scratch space (non-replicated) and 4 PB of spinning-disk permanent storage (replicated). Bright Computing is the cluster management suite. Slurm is the resource manager and job scheduler, with the cgroup feature deployed on BigPurple.
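Since Slurm manages the cluster, the partitions and per-node resources described above can be inspected directly after logging in. A minimal sketch using standard Slurm commands (the actual partition names and node lists are specific to BigPurple):

# List partitions and node states
sinfo
# Show per-node CPU count, memory, and generic resources (e.g., GPUs)
sinfo -N -o "%N %c %m %G"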
3
BigPurple Architecture (diagram)
Network: InfiniBand EDR, 100 Gb/s
HN: 2 Head Nodes
LN: 4 Login Nodes
AN: 7 Auxiliary Nodes
CN: 54 Compute Nodes
GN: 25 4-GPU Nodes
GPU: 7 8-GPU Nodes
FN: 4 FAT Nodes
DMN: 4 Data Mover Nodes
WSN: 1 Web Services Node
Storage: GPFS file system, 7.5 PB, 530 TB SSD scratch; mirrored to the East Coast backup site; quota, snapshot, archive, etc.
4
Node Specifications
Head Nodes: 2× Intel Xeon (Skylake) 2.4 GHz (40 cores total), 384 GB RAM, EDR InfiniBand interconnect, 2 TB SATA and 2 TB SSD local disks.
Login Nodes: 2× Intel Xeon (Skylake) 2.4 GHz (40 cores total), 384 GB RAM, EDR InfiniBand interconnect plus 40 Gb/s outbound connection, 2 TB SATA and 2 TB SSD local disks.
Auxiliary Nodes: 2× Intel Xeon (Skylake) 2.4 GHz (40 cores total), 384 GB RAM, EDR InfiniBand interconnect plus 40 Gb/s outbound connection, 2 TB SATA and 2 TB SSD local disks.
Compute Nodes: 2× Intel Xeon (Skylake) 2.4 GHz (40 cores total), 384 GB RAM, EDR InfiniBand interconnect, 2 TB SATA and 2 TB SSD local disks.
4-way GPU Nodes: 2× Intel Xeon (Skylake) 2.4 GHz (40 cores total), 384 GB RAM, 4 NVIDIA Tesla V100 GPUs with GPU-to-GPU NVLink and GPUDirect RDMA support, EDR InfiniBand interconnect, 2 TB SATA and 2 TB SSD local disks.
8-way GPU Nodes: 2× Intel Xeon (Skylake) 2.4 GHz (40 cores total), 768 GB RAM, 8 NVIDIA Tesla V100 GPUs with GPU-to-GPU NVLink and GPUDirect RDMA support, EDR InfiniBand interconnect, 2 TB SATA and 4 TB SSD local disks.
FAT Nodes: 2× Intel Xeon (Skylake) 2.4 GHz (40 cores total), 1536 GB RAM, EDR InfiniBand interconnect, 2 TB SATA and 4 TB SSD local disks.
Data Mover Nodes: 2× Intel Xeon (Skylake) 2.4 GHz (40 cores total), 384 GB RAM, EDR InfiniBand interconnect plus 40 Gb/s outbound connection, 2 TB SATA and 2 TB SSD local disks.
Web Services Node: 2× Intel Xeon (Skylake) 2.4 GHz (40 cores total), 384 GB RAM, EDR InfiniBand interconnect plus 40 Gb/s outbound connection, 2 TB SATA and 2 TB SSD local disks.
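Which node type a job lands on is determined by the Slurm partition and the resources requested. A minimal sketch of requesting GPU and high-memory resources interactively (the partition names used here are assumptions; check sinfo for the real list):

# One V100 GPU for an hour (partition name is an assumption)
srun -p gpu4_short --gres=gpu:1 --mem=32G --time=01:00:00 --pty bash
# A high-memory session suited to a FAT node (partition name is an assumption)
srun -p fn_short --mem=500G --time=01:00:00 --pty bash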
5
Accessing BigPurple: Logging in to the System
Linux/Mac: open a terminal and run ssh -X <kerberos_id>@bigpurple.nyumc.org
Windows: download and install PuTTY, then set Hostname = bigpurple.nyumc.org
User name = NYU Langone Kerberos ID
Password = NYU Langone password (same as your email password)
More information at:
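On Linux or macOS, adding a host alias to ~/.ssh/config avoids retyping the full hostname; a minimal sketch (kerberos_id is a placeholder for your own ID):

# ~/.ssh/config
Host bigpurple
    HostName bigpurple.nyumc.org
    User kerberos_id
    ForwardX11 yes     # equivalent to passing -X on the command line

# after which logging in is simply:
ssh bigpurple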
6
Splash Screen
7
Course Environment
Home Directory: /gpfs/home/<username>
Scratch Space: /gpfs/scratch/<username>
Course Directory: /gpfs/data/courses/bmsc4449
Examples (read-only): /gpfs/data/courses/bmsc4449/examples
Sample Datasets (read-only): /gpfs/data/courses/bmsc4449/samples
Course Shared (read-write): /gpfs/data/courses/bmsc4449/shared
Course Group: course_bmsc4449
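A quick way to confirm these paths are accessible after logging in (a minimal sketch; $USER expands to your own Kerberos ID):

# Check that your home and scratch directories exist
ls -ld /gpfs/home/$USER /gpfs/scratch/$USER
# Browse the read-only course material and copy an item into scratch to work on it
ls /gpfs/data/courses/bmsc4449/examples
cp -r /gpfs/data/courses/bmsc4449/samples /gpfs/scratch/$USER/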
8
Utilizing BigPurple
Plan your job: test it first, be aware of how many cores and how much RAM it needs, and estimate how long it will take to finish. Then scale up gradually.
Log in to the system: ssh -X <kerberos_id>@bigpurple.nyumc.org (unless you want to move data, run a GUI, use an auxiliary node, etc.).
Navigate and prepare: use Linux commands such as cd, ls, mkdir, cp, rm, cat, which, head, tail, etc. Familiarize yourself with man pages and with a Linux editor (nano). Beware of hidden characters when you copy/paste from a Windows environment. Determine where (scratch disk, stage-in/stage-out, etc.) and how (parallel, serial, interactive, queued, GUI, etc.) you want to run your job.
Set up your environment: load the modules you need: module list, module avail, module load, module rm, etc.
Use the Slurm queuing system to submit, control, and monitor your job: submission scripts, partitions, fair-share, etc. A typical session is sketched below.
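A minimal end-to-end sketch of this workflow (samtools is just an example module name and run.sbatch a hypothetical submission script, both assumptions):

# Work in scratch rather than home for large intermediate files
cd /gpfs/scratch/$USER
mkdir -p myproject && cd myproject

# Find and load the software you need
module avail              # browse available modules
module load samtools      # example module (assumption); load whatever your job needs
module list               # confirm what is loaded

# Submit a batch script and keep an eye on it
sbatch run.sbatch         # run.sbatch is a hypothetical Slurm submission script
squeue -u $USER           # monitor your pending/running jobs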
9
Slurm
Slurm is an open-source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters.
Job submission, job monitoring, job control: sinfo, sbatch, srun, scontrol, squeue, sacct, etc.
Partitions, fair-share.
Interactive jobs: srun -p cpu_short --mem-per-cpu=4G --time=1:00:00 --x11 --pty bash
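Once a job is queued or finished, the same tools report its status and resource usage; a brief sketch (the job ID 123456 is a placeholder):

squeue -u $USER                                                         # your pending/running jobs
sacct -j 123456 --format=JobID,JobName,Partition,State,Elapsed,MaxRSS   # accounting record for a job
scontrol show job 123456                                                # full scheduler view of a job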
10
Hands-on Lab
ssh -X siavoa01@bigpurple.nyumc.org
pwd
ls
mkdir project1
cd project1
nano batch.job

#!/bin/bash
#SBATCH --job-name=serial_job_test    # Job name
#SBATCH --mail-type=END,FAIL          # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=<your_email>      # Where to send mail
#SBATCH --ntasks=1                    # Run on a single CPU
#SBATCH --mem=1gb                     # Job memory request
#SBATCH --time=00:05:00               # Time limit hrs:min:sec
#SBATCH --output=serial_test_%j.log   # Standard output and error log
pwd; hostname; date
sleep 300
date
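After saving the script, it can be submitted and its output checked once it completes (a brief sketch; the log filename includes the job ID Slurm assigns):

sbatch batch.job                # submit; Slurm prints the assigned job ID
squeue -u $USER                 # watch the job while it is pending/running
cat serial_test_<jobid>.log     # after completion, view stdout/stderr (replace <jobid>)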
11
module avail                                        # list available software modules
module load your_required_module                    # load the module(s) your job needs
module list                                         # show currently loaded modules
sinfo                                               # view partitions and node states
squeue -u your_kerberos_id                          # list your queued and running jobs
sbatch batch.job                                    # submit the batch script
scontrol show jobid=job_id                          # show detailed information for a job
scontrol update JobId=job_id Partition=cpu_short    # move a pending job to another partition
12
Examples
1. Log in to BigPurple
2. Execute the following commands:
cd /gpfs/data/courses/bmsc4449/examples/0_PrepareEnvironment/
ls
cat README
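Because the examples directory is read-only, copy an example into scratch before modifying or running it in place; a minimal sketch:

# Copy the example to your scratch space, where you have write access
cp -r /gpfs/data/courses/bmsc4449/examples/0_PrepareEnvironment /gpfs/scratch/$USER/
cd /gpfs/scratch/$USER/0_PrepareEnvironment
ls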
13
Points To Remember
A good source of information:
Contact the HPC administrators for problems or questions: when you send us an email, please include the error and the command you issued; this is vital for troubleshooting.
Remember that this is a shared resource.
Login nodes are not for running jobs; use compute nodes, data mover nodes, auxiliary nodes, etc. instead.