© 2005 IBM Essential Overview Louisiana Tech University Ruston, Louisiana Charles Grassl IBM January, 2006
2 © 2005 IBM Corporation Agenda Hardware Software Documentation
3 Hardware Overview
  Processors
  Nodes
  Clusters
4 Product Naming
  New Name   Old Name           Market       Processor
  iSeries    AS/400             Commercial   RS64
  pSeries    RS/6000, SP, SP2   Technical    POWER3, POWER4, POWER5
  xSeries    IA-32, IA-64       Server       Xeon, AMD
  zSeries    ES/9000            Mainframe    RS64
5 Processor Progression
  Columns: Processor, Years, Clock Rate, Feature
  POWER – 60 MHz, RISC
  P2SC – 150 MHz, Bandwidth
  POWER – – 450 MHz, Single Chip
  POWER – – 1.9 GHz, Dual Core
  POWER – 1.9 GHz, Multi-Thread
6 POWER5 Systems
  POWER5 processors
    Single and dual processor chips
  Modules
    Dual Chip Modules (DCM)
    Multi Chip Modules (MCM)
  Nodes
    Multiple modules
    p5-575
    p5-595
  Cluster
    Multiple nodes
    Connected with High Performance Switch (HPS)
7 Systems (“Nodes”)
  Table columns: Model, Processors, Clock Rate (GHz), Memory (x 2^30 byte)
8 POWER5 Processor Systems
  Diagram labels: Processor, Chip, DCM, MCM, p5-575, p5-595, Cluster
9 Cluster 1600
  Multi-processor nodes
  Physical view
  Logical view
  Network, disk system
10 Local System Name
  IBM p5-575 nodes
    1.9 GHz POWER5 processors
    Single processor chips
    8 processors per node
  HPS interconnect
  “575” distinction: Dual Chip Module (DCM)
    8 DCMs
    One or two processors per chip: Single Core (SC) or Dual Core (DC)
  “595” distinction: Multi Chip Module (MCM) construction
    8 MCMs
11 POWER5 Processors
  Multi-processor chip
  High clock rate: multiple GHz
  Three cache levels
  Bandwidth
  Latency hiding
  Shared memory
  Large memory size
12 POWER5 Features
  Private L1 cache
  Shared L2 cache
  Shared L3 cache
  Interleaved memory
  Hardware prefetch
  Multiple page size support
13 Processor Characteristics
  High frequency clocks
  Deep pipelines
  High asymptotic rates
  Superscalar
  Speculative out-of-order instructions
  Up to 8 outstanding cache line misses
  Large number of instructions in flight
  Branch prediction
  Hardware prefetching
14 Processor Features
  Feature              POWER4                POWER5
  Clock                1.0 – 1.9 GHz         1.5 – … GHz
  Caches               Three levels          Three levels
  L3 Speed             1/3 clock frequency   1/2 clock frequency
  Virtualization       Up to 32 partitions   Up to 254 partitions
  Partitions           Unit processor        Fractional
  Power Management     Static                Dynamic
  Thread Execution     Single thread         Multi threading
  Memory Store         Single buffer         Double buffer
  Renaming Registers   GP: 72, FP: 80        GP: 120, FP: 120
15 Caches and Memory
                     POWER4                   POWER5
  L1 Cache           Data: 32 kbyte           Data: 32 kbyte
                     Instruction: 64 kbyte    Instruction: 64 kbyte
                     2-way assoc., FIFO       4-way assoc., LRU
  L2 Cache           1.5 Mbyte                1.9 Mbyte
                     8-way assoc., FIFO       10-way assoc., LRU
  L3 Cache           32 Mbyte                 36 Mbyte
                     8-way assoc., LRU        12-way assoc., LRU
                     120 cycles               ~80 cycles
  Memory Bandwidth   4 Gbyte/s per chip       16 Gbyte/s per chip
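To put the L3 latencies on this slide in wall-clock terms, the short sketch below converts cycles to nanoseconds. It assumes a 1.9 GHz clock for both processors purely for comparison (the slides give 1.9 GHz for the local POWER5 system); the function name `cycles_to_ns` is hypothetical.

```python
def cycles_to_ns(cycles: float, clock_ghz: float) -> float:
    """Convert a latency in clock cycles to nanoseconds (1 GHz = 1 cycle per ns)."""
    return cycles / clock_ghz


# Assumption: both processors are evaluated at the same 1.9 GHz clock.
CLOCK_GHZ = 1.9

# L3 latencies in cycles, taken from the Caches and Memory slide.
L3_LATENCY_CYCLES = {"POWER4": 120, "POWER5": 80}

for name, cycles in L3_LATENCY_CYCLES.items():
    ns = cycles_to_ns(cycles, CLOCK_GHZ)
    print(f"{name}: {cycles} cycles is about {ns:.1f} ns at {CLOCK_GHZ} GHz")
```

Because the POWER5 figure is approximate (~80 cycles), the result is only an estimate.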
16 POWER4 – POWER5 Comparison
  Columns: POWER4+, POWER5
  Frequency (GHz)
  L2 Latency (Cycles): 12
  L3 Latency (Cycles)
  Memory Latency (Cycles)
  Copy Bandwidth, 4 proc. (Gbyte/s): 8, 18
  Linpack Rate, N=1000 (Gflop/s)
  SPECint_base
  SPECfp_base
17 POWER5 Design: Summary
  More gates
    170 million (POWER4) to 260 million (POWER5)
  Enhancements
    Increased cache associativity
    Increased number of rename registers
    Reduced L3 and cache latency
  New features
    Simultaneous Multi Threading
    Dynamic power management
18 Processor Systems (Nodes)
  Multiple processors
  Multiple modules
  Various construction formats
    Multi Chip Modules
    Dual Chip Modules
  Shared memory
19 Multi Chip and Dual Chip Modules
  Multi Chip Module (MCM): p5-590, p5-595
  Dual Chip Module (DCM): p5-570, p5-575
  Diagram labels: Chip, POWER5 Processor
20 Dual Chip Module
  Each module:
    1 processor chip
    1 L3 cache
    1 memory card
  Each processor chip:
    2 processors
      L1 caches
      Registers
      Functional units
    1 L2 cache
    1 path to memory
  Diagram labels: 36 Mbyte L3, Memory
21 Multi Chip Module
  Each module:
    4 processor chips
    4 L3 cache chips
    2 memory cards
  Each processor chip:
    2 processors
      L1 caches
      Registers
      Functional units
    1 L2 cache
    1 path to memory
  Diagram label: Memory
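The module descriptions translate directly into processor counts per node. The sketch below is illustrative only: the `Module` class and `node_processors` helper are hypothetical names, while the counts (chips per module, processors per chip, 8 modules per node) come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    """A POWER5 module: some number of chips, each with 1 or 2 processors."""
    name: str
    chips: int
    processors_per_chip: int

    @property
    def processors(self) -> int:
        return self.chips * self.processors_per_chip


def node_processors(module: Module, modules_per_node: int) -> int:
    """Total processors in a node built from identical modules."""
    return module.processors * modules_per_node


# DCM with a single-core chip, as in the local p5-575 nodes.
DCM_SINGLE_CORE = Module("DCM (single core)", chips=1, processors_per_chip=1)
# MCM with four dual-core chips, as in the p5-595.
MCM = Module("MCM", chips=4, processors_per_chip=2)

print("p5-575, 8 DCMs:", node_processors(DCM_SINGLE_CORE, 8), "processors")
print("p5-595, 8 MCMs:", node_processors(MCM, 8), "processors")
```

This reproduces the 8 processors per p5-575 node quoted for the local system; the p5-595 figure follows from 8 MCMs of 8 processors each.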
22 POWER5 Multi Chip Module
  Four POWER5 chips
  Four L3 cache chips
  95 mm x 95 mm
  4,491 signal I/Os
  89 layers of metal
23 POWER5 Dual Chip Module
  One POWER5 chip
    Single or dual core
  One L3 cache chip
24 Modifications to POWER4 System Structure
  Diagram labels: P, P, L2, L3, Fab Ctl, Mem Ctl, Memory
25 Switch Technology
  Internal network
    In lieu of Gigabit Ethernet, Myrinet, Quadrics, etc.
  Fourth generation
    HPS Switch (POWER2 generation)
    SP Switch (POWER2 -> POWER3)
    SP Switch2 (POWER3 -> POWER4)
    HPS (POWER4 -> POWER5)
  Multiple links per node
    Match number of links to number of processors
26 High Performance Switch (HPS)
  Also known as “Federation”
  Follow-on to SP Switch2
    SP Switch2 is also known as “Colony”
  Specifications:
    2 Gbyte/s per link (bidirectional)
    5 microsecond latency
  Configuration:
    Up to four adaptors per node
    2 links per adaptor
    16 Gbyte/s per node
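The 16 Gbyte/s per-node figure is the product of the other numbers on this slide. A minimal sketch of that arithmetic follows; it assumes the 2 Gbyte/s figure applies to each link, and the function name `node_bandwidth` is hypothetical.

```python
# Values from the HPS slide.
LINK_BANDWIDTH_GBYTE_S = 2   # bidirectional; assumed to be per link
MAX_ADAPTORS_PER_NODE = 4
LINKS_PER_ADAPTOR = 2


def node_bandwidth(adaptors: int,
                   links_per_adaptor: int = LINKS_PER_ADAPTOR,
                   link_bandwidth: float = LINK_BANDWIDTH_GBYTE_S) -> float:
    """Aggregate switch bandwidth of one node in Gbyte/s."""
    return adaptors * links_per_adaptor * link_bandwidth


for adaptors in range(1, MAX_ADAPTORS_PER_NODE + 1):
    print(f"{adaptors} adaptor(s): {node_bandwidth(adaptors)} Gbyte/s per node")
```

With the maximum of four adaptors this gives 4 x 2 x 2 = 16 Gbyte/s, matching the slide.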
27 HPS Specifications
  Table columns: Latency [microsec.], Bandwidth, single [Mbyte/s], Bandwidth, multiple [Mbyte/s]
  Table rows: SP Switch, HPS
28 Software Overview
  Operating System
    AIX
  Compilers
    C
    C++
    Fortran
  Batch Queue
    LoadLeveler (IBM)
    LSF (Platform)
    PBS
    Gridware
29 AIX
  Current version: AIX 5.3
  Processors: POWER3, POWER4, POWER5
  Linux Affinity
  Logical PARtitions (LPAR)
    Nodes
    Operating system
    Memory
    Network connections
  Kernel address size: 64-bit, 32-bit
30 Linux on POWER
  Native Linux: SuSE7, SuSE8
  RPMs and package managers
  Cluster Systems Manager
  64-bit kernel
  32/64-bit applications support (SuSE8)
  Compiler   User Name
  C          xlc
  C++        xlC
  Fortran    xlf
31 Compilers
  C and C++
    Visual Age C and C++ Professional for AIX
    Versions 6, 7, 8
    ANSI C, C++
    Compiler names: xlc, xlC
  Fortran
    XL Fortran for AIX
    Versions 8, 9, 10
    Fortran 77, Fortran 90
    Compiler names: xlf77, xlf90
32 Compiler Names
  Compiler      User Name
  Fortran 77    xlf77
  Fortran 90    xlf90
  C             xlc
  C++           xlC
  MPI compile   mpxlf, mpcc
  Reentrant     xlf_r, xlc_r
  AIX uses different compiler names to perform some tasks that are handled by compiler flags on most other systems.
33 Compiler Usage
  Language      Command          Feature             Extension
  ANSI C        xlc, xlc_r       ANSI, thread safe   .c
  Extended C    cc               Pre-ANSI            .c
  MPI, C        mpxlc            MPI                 .c
  C++           xlC, xlC_r       Thread safe         .C .cc .cpp
  Fortran 77    xlf, xlf_r       Thread safe         .f
  Fortran 90    xlf90, xlf90_r   Thread safe         .f
  MPI Fortran   mpxlf            MPI                 .f
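Since the compiler command name, rather than a flag, selects thread safety and MPI support, a build script has to pick the right name. The sketch below shows one way to encode the Compiler Usage table. The helper `compiler_command` and its language keys are hypothetical; the command names are copied from the table. Treating `mpxlf` as the MPI command for both Fortran dialects is a simplifying assumption, and combinations the table does not list are rejected rather than guessed.

```python
# Command names copied from the Compiler Usage table.
BASE_COMMANDS = {
    "c": "xlc",
    "c++": "xlC",
    "fortran77": "xlf",
    "fortran90": "xlf90",
}

# The table lists one MPI command for C and one for Fortran.
MPI_COMMANDS = {"c": "mpxlc", "fortran": "mpxlf"}


def compiler_command(language: str, thread_safe: bool = False, mpi: bool = False) -> str:
    """Return the compiler command name for the requested features."""
    key = language.lower()
    if key not in BASE_COMMANDS:
        raise ValueError(f"unknown language: {language!r}")
    if mpi:
        if thread_safe:
            raise ValueError("the table lists no thread-safe MPI command")
        # Assumption: both Fortran dialects share the single MPI Fortran command.
        family = "fortran" if key.startswith("fortran") else key
        if family not in MPI_COMMANDS:
            raise ValueError(f"the table lists no MPI command for {language!r}")
        return MPI_COMMANDS[family]
    base = BASE_COMMANDS[key]
    # Thread-safe variants append "_r" to the base command name.
    return base + "_r" if thread_safe else base


print(compiler_command("c"))
print(compiler_command("C++", thread_safe=True))
print(compiler_command("fortran90", mpi=True))
```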
34 User Limits
  Set by the system administrator
  ulimit: C or K shell built-in
    Sets or reports resource limits
  Limits are defined in /etc/security/limits
    Sizes are in 512-byte blocks
    Times are in seconds
  $ ulimit -a
35 Ulimit Defaults
  Limit       Definition              Default          Typical
  fsize       File Size                                Unlimited (-1)
  core        Core File Size                           Unlimited (-1)
  cpu         Per Process limit       -1 (unlimited)   Unlimited (-1)
  data        Data Segment Size       262144           Unlimited (-1)
  stack       Stack Segment Size      65536*           Unlimited (-1)
  No. files   File Descriptor Limit   2000
  * 64-bit address mode
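Because sizes in /etc/security/limits are expressed in 512-byte blocks, the default values are easier to judge once converted. The sketch below does that conversion for the data and stack defaults in the table; the helper names `blocks_to_bytes` and `describe` are hypothetical.

```python
BLOCK_SIZE = 512   # bytes per block, as stated on the User Limits slide
UNLIMITED = -1     # value used for "unlimited" in the table


def blocks_to_bytes(blocks: int):
    """Return the limit in bytes, or None when the limit is unlimited."""
    if blocks == UNLIMITED:
        return None
    return blocks * BLOCK_SIZE


def describe(blocks: int) -> str:
    """Format a limit given in 512-byte blocks as whole Mbyte (2^20 bytes)."""
    size = blocks_to_bytes(blocks)
    if size is None:
        return "unlimited"
    return f"{size // 2**20} Mbyte"


# Default values from the Ulimit Defaults table.
DEFAULTS = {"data": 262144, "stack": 65536, "cpu": -1}

for name, blocks in DEFAULTS.items():
    print(f"{name}: {describe(blocks)}")
```

The default data segment therefore works out to 128 Mbyte and the default stack to 32 Mbyte, which is why large jobs typically need these limits raised.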
36 Other Defaults
  Thread control, in /etc/environment:
    AIXTHREAD_SCOPE=S
    AIXTHREAD_MNRATIO=1:1
    AIXTHREAD_COND_DEBUG=OFF
    AIXTHREAD_GUARDPAGES=4
    AIXTHREAD_MUTEX_DEBUG=OFF
    AIXTHREAD_RWLOCK_DEBUG=OFF
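One way to check which thread-control settings are in effect is to scan the KEY=VALUE lines of the environment file. The sketch below parses such text and keeps only the AIXTHREAD_ entries. The function name `parse_thread_settings` is hypothetical, and the sample text simply repeats the values from the slide; on a real system the text would be read from /etc/environment.

```python
def parse_thread_settings(text: str) -> dict:
    """Collect AIXTHREAD_* settings from KEY=VALUE lines, skipping comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.startswith("AIXTHREAD_"):
            settings[key] = value
    return settings


# Sample content that mirrors the defaults listed on the slide.
SAMPLE = """\
# thread control
AIXTHREAD_SCOPE=S
AIXTHREAD_MNRATIO=1:1
AIXTHREAD_COND_DEBUG=OFF
AIXTHREAD_GUARDPAGES=4
AIXTHREAD_MUTEX_DEBUG=OFF
AIXTHREAD_RWLOCK_DEBUG=OFF
LANG=C
"""

for key, value in parse_thread_settings(SAMPLE).items():
    print(f"{key} = {value}")
```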
37 Batch Queuing
  Compile on any AIX node
    Use -qarch=pwr5
  Submit job with available batch utility
    Use appropriate queue name
  Available queuing systems: LoadLeveler, PBS, Gridware, LSF
38 Cluster Layout
  Diagram labels: Compile and Submit Node, Network, Node 0, Node 1, Node 2
39 Documentation
  Software: Products A-Z
    X -> xl C, xl C/C++, xl Fortran
  Compilers
    /usr/vac/doc
    /usr/vacpp/doc
    /usr/lpp/xlf/doc
  Redbooks: IBM eServer p5 590 and 595 System Handbook
40 Documentation
  AIX Commands Reference
  AIX command:
    /usr/sbin/infocenter
    /opt/ibm_help/help_start.sh
  xcmdsrefbooks.htm
  Google search: “AIX Commands Reference”
41 Documentation Library
  Google search: AIX 5L documentation library
42 Summary: Architecture
  System architecture
    Processors
    Nodes
    Cluster
  Processors
    POWER5
    Three levels of cache
  Nodes: eight-processor p5-575
  Cluster: 14 p5-575 nodes
    HPS interconnect
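The summary figures give the overall size of the cluster. The sketch below totals the processors and, as a rough guide, estimates a theoretical peak floating-point rate. The node count, processors per node, and 1.9 GHz clock come from the slides. The figure of 4 floating-point operations per cycle (two fused multiply-add units per processor) is not stated in the slides and is an assumption here; the names are hypothetical.

```python
# From the slides.
NODES = 14
PROCESSORS_PER_NODE = 8
CLOCK_GHZ = 1.9

# Assumption, not stated in the slides: each POWER5 processor can complete
# two fused multiply-adds per cycle, i.e. 4 floating-point operations.
FLOPS_PER_CYCLE = 4


def total_processors(nodes: int, processors_per_node: int) -> int:
    """Number of processors in the whole cluster."""
    return nodes * processors_per_node


def peak_gflops(processors: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak rate in Gflop/s; sustained rates are lower."""
    return processors * clock_ghz * flops_per_cycle


processors = total_processors(NODES, PROCESSORS_PER_NODE)
print(f"Processors: {processors}")
print(f"Theoretical peak: {peak_gflops(processors, CLOCK_GHZ, FLOPS_PER_CYCLE):.1f} Gflop/s")
```

The cluster has 14 x 8 = 112 processors. The peak value is an upper bound derived from the assumed per-cycle rate, not a measured result.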