National Energy Research Scientific Computing Center

Slide 1: Introduction to the T3E
Mark Durst, NERSC/USG
ERSUG Training, Argonne, IL, 28 April 1999

Slide 2: Outline
Hardware and Configuration
Programming Environment
Planning Runs
Monitoring Execution
Accounting
Additional Resources
Elvis Impression

Slide 3: NERSC T3E Configuration
Commodity DEC Alpha EV-5 superscalar processor
– 450 MHz clock
– 900 Mflops/PE peak (only 5-10% typically achieved)
– 256 MB memory per PE
Theoretical peak performance: 575 Gflops
692 PEs in 3 flavors
– 644 Application
– 33 Command (ideally)
– 15 OS
Access via telnet, ssh, FTP
Connections to NERSC mass storage, AFS

Slide 4: Interactive Environment
UNICOS/mk
Available shells: sh/ksh, csh, tcsh
– csh: no file completion
– tcsh not Cray-supported
Home directories
– 2 GB file quota (with possible data migration)
– 3500 inode quota
/usr/tmp
– used both for batch and temporary user space
– 75 GB quota, 6K inode quota
– fastest transfer rates

Slide 5: modules
modules manages the user environment
– paths
– environment variables
– aliases
Cray's PrgEnv is modules-driven
Provided startup files are critical!
– add to them, don't clobber them
– add to paths, don't set them
– if you mess up: no compilers, etc.
Largely automatic

Slide 6: More Fun with modules
module list (tells you what's loaded)
module avail (lists them all)
Other module subcommands
– load
– unload
– switch
– help
Roll back compilers
Test new versions
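A typical session with these subcommands might look like the following sketch. The module and version names are assumptions for illustration, not an actual listing from the T3E:

```shell
# Illustrative modules session; module/version names are hypothetical.
module list                           # show what's loaded now
module avail                          # list everything available on this system
module load KCC                       # add the KAI C++ compiler to the environment
module unload KCC                     # and remove it again
module switch PrgEnv PrgEnv.previous  # roll compilers back (version name assumed)
module help                           # usage for the module command itself
```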

Slide 7: Other modules
imsl (loaded by default)
nag (loaded by default)
scalapack (1.5)
GNU (prepends) and GNU.tools (appends)
tools (tcsh, bash)
netcdf
KCC (KAI C++ compiler)
USG
tedi

Slide 8: Programming Environment
f90
cc / CC
cam (assembler)
cld (loader; usually unneeded)
pghpf
KCC ("module load KCC")
totalview (debugger)
pat, apprentice (performance analysis)

Slide 9: f90
Conforms to the Fortran 90 standard
– much "standard" f77 wasn't
User-defined and abstract types
Array syntax
Allocatable objects and pointers
Additional intrinsics
cpp-like preprocessor

Slide 10: Important f90 options
-f: source form (fixed or free); defaults: .f fixed, .f90 free
-c: compile only
-o name: name the executable; overrides -c (use -b name instead)
-g, -G0, -G1: debugging
-O[0-3]: general optimization
-Ra, -Rb: argument/bounds checking
-dp: double precision becomes 64-bit single precision
-i 32 / -s default32: 32-bit integers / numbers
-ev: static memory allocation
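Putting a few of these options together, compile lines might look like the following sketch (the source file names are assumed for illustration):

```shell
# Hypothetical f90 invocations combining the options above.
f90 -c -O2 solver.f90            # compile only, with general optimization
f90 -o mycode main.f90 solver.o  # link and name the executable
f90 -g -Ra -Rb main.f90          # debugging plus argument/bounds checking
f90 -f free -c legacy.f          # force free source form despite the .f suffix
```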

Slide 11: Executables: Malleable or Fixed
-Xnpes (e.g., -X64) creates a "fixed" executable
– always runs on the same number of (application) processors
– type ./a.out to run
-Xm or no -X option creates a "malleable" executable
– ./a.out will run on a command PE
– mpprun -n npes ./a.out runs on npes APP PEs
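As a sketch (program name and PE counts assumed), the two flavors are built and launched like this:

```shell
# Fixed: the PE count is baked in at compile/link time.
f90 -X64 -o fixed_prog prog.f90
./fixed_prog                  # always runs on 64 application PEs

# Malleable: the PE count is chosen at launch time.
f90 -o mall_prog prog.f90     # no -X option (or use -Xm)
./mall_prog                   # runs on a command PE
mpprun -n 32 ./mall_prog      # runs on 32 application PEs
```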

Slide 12: Execution Model
In F90, C, and C++, all processors execute the same program
Each process can ask for:
– its process number (from zero up): MY_PE() (F90), _my_pe() (C/C++)
– the total number of PEs: NUM_PES() (F90), _num_pes() (C/C++)
The above are used to establish "master/slave" relationships
Libraries are still needed for communication
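A minimal master/slave sketch using the intrinsics named above (this is T3E-specific and will not build elsewhere; the work in each branch is left as a placeholder):

```fortran
program whoami
  implicit none
  integer :: me, npes
  me   = MY_PE()      ! this PE's number, counting from zero
  npes = NUM_PES()    ! total application PEs in this run
  if (me == 0) then
     print *, 'master PE; total PEs =', npes   ! master branch
  else
     continue  ! slave work here; actual data exchange needs MPI/PVM/SHMEM
  end if
end program whoami
```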

Slide 13: Libraries
MPI (Message-Passing Interface)
PVM (Parallel Virtual Machine)
SHMEM (SHared MEMory; non-portable)
BLACS (Basic Linear Algebra Communication Subprograms)
ScaLAPACK (SCAlable [parts of] LAPACK)
NetCDF (NETwork Common Data Format)
HDF (Hierarchical Data Format)
LIBSCI (including parallel FFTs), NAG, IMSL

Slide 14: Archival Storage in HPSS
High-Performance Storage System
Designed for scalability and hierarchies
User storage quotas exist
Access via ftp or the new hsi utility
Two systems:
– hpss.nersc.gov (hsi hpss)
– archive.nersc.gov (hsi, hsi archive); contains old CFS files
– a merger is planned
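An illustrative session might look like the sketch below; the file names are placeholders, and the system-selection shorthand follows the slide's notation, so the exact hsi syntax may differ:

```shell
# Illustrative HPSS transfers (file names hypothetical).
hsi hpss "put results.dat"     # store a file on hpss.nersc.gov
hsi archive "get old_run.dat"  # fetch a file from archive.nersc.gov (old CFS data)
ftp hpss.nersc.gov             # plain ftp access also works
```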

Slide 15: Networking Issues
AFS
– accounts must be requested
– tiny local quotas
– available on the Crays through an NFS/AFS gateway, with non-trivial latencies
Remote logins
– .rhosts access not permitted; no incoming "r-commands"
– ssh available
– xterm works only "backwards"

Slide 16: Execution Modes
Interactive serial
– < 60 minutes
– on command PEs
– slightly reduced memory
Interactive parallel
– < 30 minutes
– < 64 processors
Batch

Slide 17: Batch Queues on mcurie.nersc.gov
To see them: qstat -b
pe16 through pe512
– 4 hours "on the torus"
– routine parallel jobs
serial_short: 4 hours on a single command PE
debug_small: ½ hour, up to 32 PEs
long128, gc128, gc256: 12-hour queues
– ≥ 64 PEs
– gc queues restricted
Largest queues shuffled in at night; other jobs checkpointed out
Subject to change

Slide 18: Example daily job mix
[chart of the daily job mix; not reproduced in the transcript]

Slide 19: Batch Submission
Jobs are shell scripts
cqsub submits and returns a task ID; cqdel deletes
cqstatl / qstat get status (many options)
NQS parameters determine the queue
– #QSUB -l mpp_p=… (number of PEs)
– #QSUB -l mpp_t=… ("parallel" time)
– for serial jobs, use #QSUB -q serial, not #QSUB -l mpp_p=1
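These directives combine into a complete NQS job script, sketched below; the PE count, time-limit format, queue name, and paths are illustrative assumptions:

```shell
#!/bin/sh
#QSUB -l mpp_p=64     # request 64 application PEs
#QSUB -l mpp_t=3600   # "parallel" time limit (value/format assumed)
#QSUB -q production   # pipe queue to submit to
cd $HOME/myrun        # directory containing the executable (path assumed)
mpprun -n 64 ./a.out  # launch the malleable executable on 64 PEs
```

One would submit this with cqsub, note the returned task ID, and follow progress with cqstatl or qstat.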

Slide 20: Pipe Queues
You submit to pipe queues, not batch queues
– use only pipe names in directives like #QSUB -q serial
Pipe queues group the batch queues:
– serial = serial_short
– debug = debug_small
– production = pe128 through pe512
– long = long128, gc128, gc256
Limits: 3 jobs per user in production + long; 3 in serial; one in debug
To see them: qstat -p

Slide 21: Scheduling Information
Lots of NQS-related limits
– queue run limits
– queue "complex" run limits
Global Resource Manager
– fits jobs into contiguous sets of PEs
– once started, jobs run to completion (mostly)
– the first-fit algorithm lets small jobs trample big ones
– grmview shows PE status and waiting jobs

Slide 22: Scheduling Information (cont'd)
pslist gives a summary of GRM data
– no man page; use pslist -h instead
Checkpointing
– for system maintenance
– to run test and "grand challenge" jobs
– shows "Hop" in qstat/cqstatl (held by operator)
mppview is more nuts-and-bolts

Slide 23: Accounting and Allocations
T3E allocations are in node-minutes
– setcub view repo=reponame
– setcub view user=username
newacct reponame switches repos interactively
– one login name per user; multiple repos
#QSUB -A reponame charges batch jobs
Charging is updated daily; enforcement is manual
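For example (the repo and user names below are placeholders):

```shell
setcub view repo=mp999     # remaining node-minutes for a repo (name hypothetical)
setcub view user=jdoe      # usage charged to one login (name hypothetical)
newacct mp999              # switch this session's default repo interactively
```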

Slide 24: On-line Resources
T3E pages under "Computers" at home.nersc.gov
– read the overview once, check "Changes" monthly
Docs in the Cray on-line system
– "Topics" leads to the T3E collection
– many other docs (e.g., the F90 and C manual sets)
Cray Web site
– technical documents, additional on-line docs
NERSC T3E tutorials
– "Training" → "NERSC Tutorials"

Slide 25: More On-line Resources
Other NERSC tutorials
– Using the Cray f90 compiler at NERSC
– Introduction to make
– NQE: Using the batch system
Look over the NERSC Web generally

Slide 26: man pages
cqsub
cqstatl
f90
cc
CC