ARCHER Advanced Research Computing High End Resource


ARCHER: Advanced Research Computing High End Resource
Nick Brown, nick.brown@ed.ac.uk

Website: http://www.archer.ac.uk
Support: support@archer.ac.uk

Machine overview: About ARCHER
ARCHER (a Cray XC30) is a Massively Parallel Processing (MPP) supercomputer built from many thousands of individual nodes. There are two basic types of node in any Cray XC30:
Compute nodes (4920): these only do user computation and are always referred to as "compute nodes". 24 cores per node, therefore approximately 120,000 cores in total.
Service/login nodes (72 service, of which 8 are login nodes): the login nodes allow users to log in and perform interactive tasks; the others provide miscellaneous service functions.
There are also 2 serial/post-processing nodes.

Interacting with the system (User guide)
Users do not log directly into the XC30 itself. Instead they run commands on an esLogin server, which relays commands and information via a service node referred to as a "gateway node".
(Diagram: the external network connects over Ethernet to the esLogin servers; gateway and serial nodes sit inside the Cray XC30 cabinets on the Cray Aries interconnect with the compute nodes; LNET nodes connect over InfiniBand links to the Lustre OSSes of the Cray Sonexion filesystem.)
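
Logging in is an ordinary SSH session to one of the esLogin servers. A minimal sketch, assuming the standard ARCHER login address and an n02 project account (replace username, and check the user guide for the address and path your own project uses):

    # Log in to an ARCHER esLogin node
    ssh username@login.archer.ac.uk

    # Copy an input file from your workstation to your ARCHER home directory
    scp input.dat username@login.archer.ac.uk:/home/n02/n02/username/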

Job submission example (Quick start guide)
An example job script, my_job.pbs:

    #!/bin/bash --login
    #PBS -l select=2
    #PBS -N test-job
    #PBS -A budget
    #PBS -l walltime=0:20:0

    # Make sure any symbolic links are resolved to absolute path
    export PBS_O_WORKDIR=$(readlink -f $PBS_O_WORKDIR)

    aprun -n 48 -N 24 ./hello_world

The script is submitted to the PBS queue with qsub, its state is followed with qstat (Q = queued, R = running), and once it has run on the compute nodes its standard output and error come back in test-job.o50818 and test-job.e50818:

    nbrown23@eslogin008:~> qsub my_job.pbs
    50818.sdb
    nbrown23@eslogin008:~> qstat -u $USER
    50818.sdb nbrown23 standard test-job    -- 2 48 -- 00:20 Q --
    nbrown23@eslogin008:~> qstat -u $USER
    50818.sdb nbrown23 standard test-job 29053 2 48 -- 00:20 R 00:00
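
As a sketch of how the resource request and the launch line relate: #PBS -l select requests whole 24-core nodes, aprun -N sets the number of processes per node, and aprun -n sets the total process count, so -n should be at most 24 x select. Scaling the example above to four nodes (the budget code and executable name are placeholders):

    #!/bin/bash --login
    #PBS -l select=4            # four 24-core nodes = 96 cores
    #PBS -N bigger-job
    #PBS -A budget              # replace with your own budget code
    #PBS -l walltime=1:0:0

    # Make sure any symbolic links are resolved to absolute path
    export PBS_O_WORKDIR=$(readlink -f $PBS_O_WORKDIR)
    # Move to the directory the job was submitted from
    cd $PBS_O_WORKDIR

    # 96 MPI processes in total, 24 per node
    aprun -n 96 -N 24 ./hello_world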

ARCHER layout: compute node architecture and topology

Cray XC30 node
(Diagram: a Cray XC30 compute node has two NUMA nodes, each an Intel Xeon 12-core die with 32 GB of DDR3 memory, linked by QPI; an Aries NIC attaches over PCIe 3.0 to the shared Aries router and the Aries network.)
The XC30 compute node features:
2 x Intel Xeon sockets, each a 12-core Ivy Bridge die
64 GB of memory in normal nodes, 128 GB in the 376 "high memory" nodes
1 x Aries NIC, connecting to a shared Aries router and the wider network
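
Because each node is two 12-core NUMA regions, process and thread placement can matter for hybrid MPI/OpenMP codes. A hedged sketch using standard aprun placement flags (-S is processes per NUMA node, -d is cores per process; the executable name is a placeholder, and man aprun on ARCHER has the full details):

    # 2 nodes: 4 MPI processes per node, 2 per NUMA node, 6 OpenMP threads each
    export OMP_NUM_THREADS=6
    aprun -n 8 -N 4 -S 2 -d 6 ./hybrid_app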

XC30 compute blade (diagram)

Cray XC30 rank-1 network
A chassis holds 16 compute blades (128 sockets). The rank-1 topology is all-to-all: Aries-to-Aries communication within the chassis travels over the backplane. The shortest route between a node on one Aries and a node on another Aries is a single hop, but a packet can also be routed adaptively over a non-minimal, 2-hop route. Packets carry at most 64 bytes of data, and routing decisions are made on a packet-by-packet basis (per-packet adaptive routing).

Cray XC30 rank-2 copper network
A two-cabinet group contains 768 sockets. Within a group, the 6 backplanes are connected with copper cables: 4 nodes connect to a single Aries, 16 Aries are connected by each backplane, and each copper cable carries 3 router tiles' worth of traffic (i.e. 3 routes between a pair of Aries). This builds up the Dragonfly topology within a group; groups are then interconnected with active optical cables.

Copper & optical cabling (diagram: copper connections within a group, optical connections between groups)

ARCHER Filesystems Brief Overview

Nodes and filesystems
(Diagram: the compute nodes see only /work; the login and post-processing nodes see /home, /work and the RDF.)

ARCHER filesystems (User guide)
/home (/home/n02/n02/<username>): small (200 TB) filesystem for critical data (e.g. source code); standard-performance NFS; fully backed up.
/work (/work/n02/n02/<username>): large (>4 PB) filesystem for use during computations; high-performance, parallel (Lustre) filesystem; no backup.
RDF (/nerc/n02/n02/<username>): the Research Data Facility; very large (26 PB) filesystem for persistent data storage (e.g. results); high-performance, parallel (GPFS) filesystem; backed up via snapshots.
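
A short sketch of typical use of these filesystems from a login node: keep source and critical files in /home, but build and run from /work, since /work is the high-performance filesystem the compute nodes use (the lfs quota invocation is the usual Lustre command, but the exact group argument for your project may differ):

    # Copy a code directory from /home to /work and work there
    cp -r /home/n02/n02/$USER/my_code /work/n02/n02/$USER/
    cd /work/n02/n02/$USER/my_code

    # Check Lustre usage and quota on /work (group assumed to be the project code)
    lfs quota -g n02 /work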

Research Data Facility (RDF guide)
Mounted on machines such as: ARCHER (service and post-processing nodes), the DiRAC BlueGene/Q (front-end nodes), the Data Transfer Nodes (DTN) and JASMIN.
Data Analytic Cluster (DAC): run compute-, memory- or IO-intensive analyses on data hosted on the service. The nodes are specifically tailored for data-intensive work, with direct connections to the disks. Separate from ARCHER, but with a very similar architecture.
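
For example, once a run has finished, results can be copied from /work onto the RDF for long-term storage from anywhere the RDF is mounted (a sketch using the paths given above; the RDF is not visible from the compute nodes):

    # Archive results from /work to the RDF
    rsync -av /work/n02/n02/$USER/results/ /nerc/n02/n02/$USER/results/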

ARCHER Software Brief Overview

Cray's supported programming environment
Programming languages: Fortran, C, C++, Python
Programming models: distributed memory (Cray MPT): MPI, SHMEM; PGAS & global view: UPC (CCE), CAF (CCE), Chapel; shared memory: OpenMP 3.0, OpenACC
Compilers: Cray Compiling Environment (CCE), GNU, 3rd-party compilers (Intel Composer)
Tools: environment setup (Modules); debuggers (Allinea DDT, lgdb); debugging support tools (Abnormal Termination Processing, STAT); performance analysis (CrayPat, Cray Apprentice2); scoping analysis (Reveal)
Optimized scientific libraries: LAPACK, ScaLAPACK, BLAS (libgoto), Iterative Refinement Toolkit, Cray Adaptive FFTs (CRAFFT), FFTW, Cray PETSc (with CASK), Cray Trilinos (with CASK)
I/O libraries: NetCDF, HDF5
This is the programming environment software that is packaged and shipped with the system. Third-party compilers such as Intel and PGI also work on the Cray, but users have to obtain them directly from the provider. Other libraries and tools, such as Valgrind and Vampir, work on the Cray system but are not packaged and distributed with the Cray PE.
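
In practice the compilers are driven through the Cray wrapper commands ftn, cc and CC, which pick up whichever programming environment module is currently loaded and automatically link MPI and the Cray scientific libraries. A sketch (PrgEnv-cray and PrgEnv-gnu are the standard Cray PE module names, but check module avail on ARCHER; source file names are placeholders):

    # The Cray environment is loaded by default; swap to GNU if preferred
    module swap PrgEnv-cray PrgEnv-gnu

    # Compile with the wrappers rather than calling gfortran/gcc directly
    ftn -o hello_world hello_world.f90     # Fortran
    cc  -o hello_world hello_world.c       # C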

Module environment (Best practice guide)
Software is available via the module environment. This allows you to load different packages, and different versions of packages, and it deals with potential library conflicts. It is based around the module command:
List currently loaded modules: module list
List all available modules: module avail
Load a module: module load x
Unload a module: module unload x
A short example session is shown below.
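
The example session, assuming a package called fftw is installed (the package and version names are illustrative; use module avail to see what is actually there):

    module list                      # show currently loaded modules
    module avail fftw                # list available versions of a package
    module load fftw                 # load the default version
    module swap fftw fftw/3.3.4.11   # switch to a specific version (version illustrative)
    module unload fftw               # remove it again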

ARCHER SAFE: service administration, https://www.archer.ac.uk/safe

SAFE (SAFE user guide)
SAFE is an online ARCHER management system on which all users have an account. It is used to request machine accounts, reset passwords and view resource usage.
It is also the primary way in which PIs manage their ARCHER projects: managing project users, tracking users' usage of the project, and emailing the project's users.

Project resources (User guide)
Machine usage is charged in kAUs. This is the time spent running your jobs on the compute nodes, at 0.36 kAUs per node hour. There is no usage charge for time spent working on the login nodes, the post-processing nodes or the RDF DAC. You can track usage via SAFE or the budgets command (updated daily); see the worked example below.
Disk quotas: there is no specific charge for disk usage, but all projects have quotas. If you need more disk space, contact your PI, or contact us if you manage the project.
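
As a worked example of the charging rate: the two-node, 20-minute job shown earlier would cost about 2 nodes x (20/60) node-hours x 0.36 kAU = 0.24 kAU. A quick check on the command line (the budgets output format is not shown here):

    # 2 nodes for 20 minutes at 0.36 kAU per node-hour
    echo "2 * 20 * 0.36 / 60" | bc -l    # 0.24 kAU

    # Show the budgets you can charge jobs to
    budgets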

To conclude…
You will be using ARCHER during this course; if you have any questions then let us know. The documentation on the ARCHER website is a good reference tool, especially the quick start guide. In normal use, if you have any questions or cannot find something, contact the helpdesk: support@archer.ac.uk