© 2005 IBM Essential Overview Louisiana Tech University Ruston, Louisiana Charles Grassl IBM January, 2006.



Slide 2: Agenda
- Hardware
- Software
- Documentation

Slide 3: Hardware Overview
- Processors
- Nodes
- Clusters

Slide 4: Product Naming
New Name | Old Name | Market | Processor
iSeries | AS/400 | Commercial | RS64
pSeries | RS/6000, SP, SP2 | Technical | POWER3, POWER4, POWER5
xSeries | IA-32, IA-64 | Server | Xeon, AMD
zSeries | ES/9000 | Mainframe | RS64

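Slide 5: Processor Progression
Processor | Years | Clock Rate | Feature
POWER | | – 60 MHz | RISC
P2SC | | – 150 MHz | Bandwidth
POWER | | – 450 MHz | Single Chip
POWER | | – 1.9 GHz | Dual Core
POWER | | 1.9 GHz | Multi-Thread
[The years, the lower clock bounds, and the generation digits in the processor names were lost in extraction.]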

Slide 6: POWER5 Systems
- POWER5 processors
  - Single and dual processor chips
- Modules
  - Dual Chip Modules (DCM)
  - Multi Chip Modules (MCM)
- Nodes
  - Multiple modules
  - p5-575
  - p5-595
- Cluster
  - Multiple nodes
  - Connected with High Performance Switch (HPS)

Slide 7: Systems (“Nodes”)
[Table with columns Model, Processors, Clock Rate (GHz), Memory (x 2^30 byte); the row values were lost in extraction.]

Slide 8: POWER5 Processor Systems
[Diagram labels: Processor, Chip, DCM, MCM, p5-575, p5-595, Cluster]

Slide 9: Cluster 1600
- Multi Processor Nodes
- Physical View
- Logical View
- Network, Disk System

Slide 10: Local System
- Name
- IBM p5-575 nodes
  - 1.9 GHz POWER5 processors
  - Single processor chips
  - 8 processors per node
- HPS interconnect
- “575” distinction: Dual Chip Module (DCM)
  - 8 DCMs
  - One or two processors per chip
    - Single Core (SC)
    - Dual Core (DC)
- “595” distinction: Multi Chip Module (MCM) construction
  - 8 MCMs

Slide 11: POWER5 Processors
- Multi-processor chip
- High clock rate: multiple GHz
- Three cache levels
- Bandwidth
- Latency hiding
- Shared memory
- Large memory size

Slide 12: POWER5 Features
- Private L1 cache
- Shared L2 cache
- Shared L3 cache
- Interleaved memory
- Hardware prefetch
- Multiple page size support

Slide 13: Processor Characteristics
- High frequency clocks
- Deep pipelines
- High asymptotic rates
- Superscalar
- Speculative out-of-order instructions
- Up to 8 outstanding cache line misses
- Large number of instructions in flight
- Branch prediction
- Hardware prefetching

Slide 14: Processor Features
Feature | POWER4 | POWER5
Clock | 1.0 – 1.9 GHz | 1.5 – … GHz
Caches | Three levels | Three levels
L3 Speed | 1/3 clock frequency | 1/2 clock frequency
Virtualization | Up to 32 partitions | Up to 254 partitions
Partitions | Unit processor | Fractional
Power Management | Static | Dynamic
Thread Execution | Single Thread | Multi Threading
Memory Store | Single Buffer | Double Buffer
Renaming Registers | GP: 72, FP: 80 | GP: 120, FP: 120

Slide 15: Caches and Memory
Level | POWER4 | POWER5
L1 Cache | Data: 32 kbyte; Instruction: 64 kbyte; 2-way Assoc., FIFO | Data: 32 kbyte; Instruction: 64 kbyte; 4-way Assoc., LRU
L2 Cache | 1.5 Mbyte; 8-way Assoc., FIFO | 1.9 Mbyte; 10-way Assoc., LRU
L3 Cache | 32 Mbyte; 8-way Assoc., LRU; 120 Cycles | 36 Mbyte; 12-way Assoc., LRU; ~80 Cycles
Memory Bandwidth | 4 Gbyte/s / Chip | 16 Gbyte/s / Chip
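The L3 latencies in the table are given in clock cycles, so comparing them as wall-clock time needs a clock rate. The sketch below converts cycles to nanoseconds; the 1.9 GHz clock applied to both chips is an assumption made only for the comparison (it is the upper POWER4 figure and the local POWER5 rate quoted elsewhere in the slides), and `cycles_to_ns` is a hypothetical helper name.

```python
def cycles_to_ns(cycles: float, clock_ghz: float) -> float:
    """Convert a latency in clock cycles to nanoseconds."""
    if clock_ghz <= 0:
        raise ValueError("clock_ghz must be positive")
    # At f GHz one cycle lasts 1/f nanoseconds.
    return cycles / clock_ghz


# L3 latencies in cycles, taken from the Caches and Memory table.
L3_LATENCY_CYCLES = {"POWER4": 120, "POWER5": 80}

# Assumption: the same 1.9 GHz clock for both chips, for comparison only.
ASSUMED_CLOCK_GHZ = 1.9

for chip, cycles in L3_LATENCY_CYCLES.items():
    latency_ns = cycles_to_ns(cycles, ASSUMED_CLOCK_GHZ)
    print(f"{chip}: {cycles} cycles is about {latency_ns:.1f} ns")
```

At a different clock rate the absolute numbers change, but the 120 to 80 cycle ratio between the two chips stays the same.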

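Slide 16: POWER4 – POWER5 Comparison
Metric | POWER4+ and POWER5 values (as extracted)
Frequency (GHz) |
L2 Latency (Cycles) | 12
L3 Latency (Cycles) |
Memory Latency (Cycles) |
Copy Bandwidth, 4 proc. (Gbyte/s) | 818
Linpack Rate, N=1000 (Gflop/s) |
SPECint_base |
SPECfp_base |
[Most cell values, and the split of the surviving values between the POWER4+ and POWER5 columns, were lost in extraction.]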

Slide 17: POWER5 Design: Summary
- More gates
  - 170 million -> 260 million
- Enhancements
  - Increased cache associativity
  - Increased number of rename registers
  - Reduced L3 and cache latency
- New features
  - Simultaneous Multi Threading
  - Dynamic power management

Slide 18: Processor Systems (Nodes)
- Multiple processors
- Multiple modules
- Various construction formats
  - Multi Chip Modules
  - Dual Chip Modules
- Shared memory

Slide 19: Multi Chip and Dual Chip Modules
- Multi Chip Module (MCM)
  - p5-590
  - p5-595
- Dual Chip Module (DCM)
  - p5-570
  - p5-575
[Diagram labels: Chip, POWER5 Processor]

Slide 20: Dual Chip Module
- Each module:
  - 1 processor chip
  - 1 L3 cache
  - 1 memory card
- Each processor chip:
  - 2 processors
    - L1 caches
    - Registers
    - Functional units
  - 1 L2 cache
  - 1 path to memory
[Diagram labels: 36 Mbyte L3, Memory]

Slide 21: Multi Chip Module
- Each module:
  - 4 processor chips
  - 4 L3 cache chips
  - 2 memory cards
- Each processor chip:
  - 2 processors
    - L1 caches
    - Registers
    - Functional units
  - 1 L2 cache
  - 1 path to memory
[Diagram label: Memory]
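The processor count of a node follows from the module counts on these slides: modules per node, times chips per module, times processors per chip. The sketch below works that product through for the two layouts the slides describe. `NodeLayout` is a hypothetical name introduced for illustration; the numbers come from the slides (8 DCMs with single processor chips on the local p5-575 nodes; 8 MCMs, each with 4 dual-processor chips, for the "595" construction).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeLayout:
    """Hypothetical description of how a node is built from modules."""
    modules: int
    chips_per_module: int
    processors_per_chip: int

    def processors(self) -> int:
        # Total processors = modules x chips per module x processors per chip.
        return self.modules * self.chips_per_module * self.processors_per_chip


# DCM construction as described for the local p5-575 nodes:
# 8 DCMs, one processor chip per DCM, single-core chips.
dcm_node = NodeLayout(modules=8, chips_per_module=1, processors_per_chip=1)

# MCM construction as described for the "595":
# 8 MCMs, 4 processor chips per MCM, 2 processors per chip.
mcm_node = NodeLayout(modules=8, chips_per_module=4, processors_per_chip=2)

print("DCM node processors:", dcm_node.processors())
print("MCM node processors:", mcm_node.processors())
```

The DCM result agrees with the "8 processors per node" figure quoted for the local system.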

Slide 22: POWER5 Multi Chip Module
- Four POWER5 chips
- Four L3 cache chips
- 95 mm x 95 mm
- 4,491 signal I/Os
- 89 layers of metal

Slide 23: POWER5 Dual Chip Module
- One POWER5 chip
  - Single or Dual Core
- One L3 cache chip

Slide 24: Modifications to POWER4 System Structure
[Diagram labels: P, P, L2, L3, Memory, Fab Ctl, L3 Mem Ctl]

Slide 25: Switch Technology
- Internal network
  - In lieu of GigEthernet, Myrinet, Quadrics, etc.
- Fourth generation
  - HPS Switch (POWER2 generation)
  - SP Switch (POWER2 -> POWER3)
  - SP Switch 2 (POWER3 -> POWER4)
  - HPS (POWER4 -> POWER5)
- Multiple links per node
  - Match number of links to number of processors

Slide 26: High Performance Switch (HPS)
- Also known as “Federation”
- Follow-on to SP Switch2
  - Also known as “Colony”
- Specifications:
  - 2 Gbyte/s (bidirectional)
  - 5 microsecond latency
- Configuration:
  - Up to four adaptors per node
  - 2 links per adaptor
  - 16 Gbyte/s per node
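The 16 Gbyte/s per-node figure is the product of the three configuration numbers on this slide: adaptors per node, links per adaptor, and bandwidth per link. A minimal sketch of that arithmetic follows; `node_bandwidth_gbyte_s` is a hypothetical helper, and it assumes the 2 Gbyte/s figure applies to each link and that links aggregate without loss.

```python
def node_bandwidth_gbyte_s(adaptors: int,
                           links_per_adaptor: int = 2,
                           link_gbyte_s: float = 2.0) -> float:
    """Aggregate HPS bandwidth for one node, in Gbyte/s.

    Assumes every link delivers the quoted 2 Gbyte/s and that the
    links add up linearly (an idealization, not a measured figure).
    """
    if not 1 <= adaptors <= 4:
        raise ValueError("the slide lists up to four adaptors per node")
    return adaptors * links_per_adaptor * link_gbyte_s


for adaptors in range(1, 5):
    total = node_bandwidth_gbyte_s(adaptors)
    print(f"{adaptors} adaptor(s): {total:.0f} Gbyte/s per node")
```

With the maximum of four adaptors this reproduces the slide's 16 Gbyte/s per node.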

Slide 27: HPS Specifications
[Table comparing SP Switch and HPS with columns Latency [microsec.], Bandwidth, single [Mbyte/s], Bandwidth, multiple [Mbyte/s]; the cell values were lost in extraction.]

Slide 28: Software Overview
- Operating System
  - AIX
- Compilers
  - C
  - C++
  - Fortran
- Batch Queue
  - LoadLeveler (IBM)
  - LSF (Platform)
  - PBS
  - Gridware

Slide 29: AIX
- Current version: AIX 5.3
- Processors: POWER3, POWER4, POWER5
- Linux Affinity
- Logical PARtitions (LPAR)
  - Nodes
  - Operating system
  - Memory
  - Network connections
- Kernel address size: 64-bit, 32-bit

Slide 30: Linux on POWER
- Native Linux, SuSE7 -> SuSE8
- RPMs and package managers
- Cluster Systems Manager
- 64-bit kernel
- 32/64-bit applications support (SuSE8)

Compiler | User Name
C | xlc
C++ | xlC
Fortran | xlf

Slide 31: Compilers
- C and C++
  - Visual Age C and C++ Professional for AIX
  - Versions 6, 7, 8
  - ANSI C
  - C++
  - Compiler names: xlc, xlC
- Fortran
  - XL Fortran for AIX
  - Versions 8, 9, 10
  - Fortran 77
  - Fortran 90
  - Compiler names: xlf77, xlf90

Slide 32: Compiler Names
Compiler | User Name
Fortran 77 | xlf77
Fortran 90 | xlf90
C | xlc
C++ | xlC
MPI compile | mpxlf, mpcc
Reentrant | xlf_r, xlc_r

AIX uses different compiler names to perform some tasks which are handled by compiler flags on most other systems.

Slide 33: Compiler Usage
Language | Command | Feature | Extension
ANSI C | xlc | ANSI | .c
ANSI C | xlc_r | Thread safe | .c
Extended C | cc | Pre-ANSI | .c
MPI, C | mpxlc | MPI | .c
C++ | xlC | | .C, .cc, .cpp
C++ | xlC_r | Thread safe | .C, .cc, .cpp
Fortran 77 | xlf | | .f
Fortran 77 | xlf_r | Thread safe | .f
Fortran 90 | xlf90 | | .f
Fortran 90 | xlf90_r | Thread safe | .f
MPI Fortran | mpxlf | MPI | .f
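Because the compiler name itself encodes thread safety and MPI support, a build script has to pick the command rather than add a flag. The sketch below shows one way to make that choice from a source file's extension. `compile_command` and its two lookup tables are hypothetical helpers, not part of any IBM tool; the command names are copied from the slides (with `mpcc` taken from the Compiler Names slide), `.f` is mapped to `xlf` only, and `-qarch=pwr5` is the flag the batch-queuing slide recommends. Combining MPI with the thread-safe variants is not covered by the slides, so the sketch rejects it.

```python
import os

# Serial compiler commands by file extension, as listed in the slides.
BASE_COMPILER = {".c": "xlc", ".C": "xlC", ".cc": "xlC", ".cpp": "xlC", ".f": "xlf"}

# MPI compile commands by file extension, as listed in the slides.
MPI_COMPILER = {".c": "mpcc", ".f": "mpxlf"}


def compile_command(source: str, thread_safe: bool = False,
                    mpi: bool = False, arch: str = "pwr5") -> list:
    """Return an argv-style list for compiling one source file."""
    ext = os.path.splitext(source)[1]
    if mpi and thread_safe:
        raise ValueError("MPI plus thread-safe is not covered by the slides")
    table = MPI_COMPILER if mpi else BASE_COMPILER
    if ext not in table:
        raise ValueError(f"no compiler listed for extension {ext!r}")
    compiler = table[ext]
    if thread_safe:
        # The slides name the thread-safe variants by appending "_r".
        compiler += "_r"
    return [compiler, f"-qarch={arch}", source]


print(" ".join(compile_command("hello.c")))
print(" ".join(compile_command("solver.f", thread_safe=True)))
print(" ".join(compile_command("ring.c", mpi=True)))
```

Returning a list rather than a single string keeps the result usable with `subprocess.run` without shell quoting concerns.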

Slide 34: User Limits
- Set by the system administrator
- Ulimit: C or K shell built-in
  - Sets or reports resource limits
- Limits are defined in /etc/security/limits
  - Sizes are in 512 byte blocks
  - Times are in seconds
- $ ulimit -a
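The same limits that `ulimit -a` prints can also be read from inside a program. The sketch below does this with Python's standard `resource` module, assuming a Unix-like system with Python installed; `LIMITS`, `describe`, and `report_limits` are hypothetical names chosen for this example. Note that `resource.getrlimit` reports sizes in bytes, not in the 512-byte blocks used by /etc/security/limits.

```python
import resource

# Map the limit names used on the slides to the resource module's constants.
LIMITS = {
    "fsize": resource.RLIMIT_FSIZE,     # largest file a process may write
    "core": resource.RLIMIT_CORE,       # largest core file
    "cpu": resource.RLIMIT_CPU,         # CPU time, in seconds
    "data": resource.RLIMIT_DATA,       # data segment size
    "stack": resource.RLIMIT_STACK,     # stack segment size
    "nofiles": resource.RLIMIT_NOFILE,  # open file descriptors
}


def describe(value: int) -> str:
    """Render one limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def report_limits() -> dict:
    """Return {name: (soft, hard)} for each limit, as display strings."""
    report = {}
    for name, rlimit in LIMITS.items():
        soft, hard = resource.getrlimit(rlimit)
        report[name] = (describe(soft), describe(hard))
    return report


for name, (soft, hard) in report_limits().items():
    print(f"{name:8s} soft={soft} hard={hard}")
```

The soft value is the one currently in force; a process can raise it up to the hard value without administrator rights.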

Slide 35: Ulimit Defaults
Limit | Definition | Default Value | Typical Value
fsize | File Size | | Unlimited (-1)
core | Core File Size | | Unlimited (-1)
cpu | Per Process limit | -1 (unlimited) | Unlimited (-1)
data | Data Segment Size | 262144 | Unlimited (-1)
stack | Stack Segment Size | 65536* | Unlimited (-1)
No. files | File Descriptor Limit | 2000 |
* 64-bit address mode
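The default values in this table are easier to judge once converted out of block units. The sketch below does the conversion, assuming the table's sizes are in the 512-byte blocks that the User Limits slide describes; `blocks_to_bytes` and `blocks_to_mib` are hypothetical helper names, and -1 is treated as "unlimited" as in the table.

```python
from typing import Optional

BLOCK_SIZE_BYTES = 512  # block unit stated on the User Limits slide


def blocks_to_bytes(blocks: int) -> Optional[int]:
    """Convert a limit in 512-byte blocks to bytes; -1 means unlimited."""
    if blocks == -1:
        return None
    if blocks < 0:
        raise ValueError("limit must be -1 or non-negative")
    return blocks * BLOCK_SIZE_BYTES


def blocks_to_mib(blocks: int) -> Optional[float]:
    """Convert a limit in 512-byte blocks to MiB; -1 means unlimited."""
    size = blocks_to_bytes(blocks)
    return None if size is None else size / (1024 * 1024)


# Default values copied from the Ulimit Defaults table.
DEFAULTS_IN_BLOCKS = {"data": 262144, "stack": 65536, "cpu": -1}

for name, blocks in DEFAULTS_IN_BLOCKS.items():
    mib = blocks_to_mib(blocks)
    shown = "unlimited" if mib is None else f"{mib:.0f} MiB"
    print(f"{name}: {shown}")
```

Under that assumption the default data segment works out to 128 MiB and the default stack to 32 MiB. The cpu entry is a time limit rather than a size; it is included only to show the -1 case.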

Slide 36: Other Defaults
- Thread control
- /etc/environment
  - AIXTHREAD_SCOPE=S
  - AIXTHREAD_MNRATIO=1:1
  - AIXTHREAD_COND_DEBUG=OFF
  - AIXTHREAD_GUARDPAGES=4
  - AIXTHREAD_MUTEX_DEBUG=OFF
  - AIXTHREAD_RWLOCK_DEBUG=OFF

Slide 37: Batch Queuing
- Compile on any AIX node
  - Use -qarch=pwr5
- Submit job with available batch utility
  - Use appropriate queue name
- Available queuing systems:
  - LoadLeveler
  - PBS
  - Gridware
  - LSF

Slide 38: Cluster Layout
[Diagram labels: Compile and Submit Node, Node 0, Node 1, Node 2, Network]

Slide 39: Documentation
- Software: Products A-Z
  - X -> xl C, xl C/C++, xl Fortran
- Compilers:
  - /usr/vac/doc
  - /usr/vacpp/doc
  - /usr/lpp/xlf/doc
- Redbooks:
  - IBM eServer p5 590 and 595 System Handbook

Slide 40: Documentation
- AIX Commands Reference
- AIX command:
  - /usr/sbin/infocenter
  - /opt/ibm_help/help_start.sh
- Web page: xcmdsrefbooks.htm (URL truncated in the transcript)
- Google search: “AIX Commands Reference”

Slide 41: Documentation Library
- Google search: AIX 5L documentation Library

Slide 42: Summary: Architecture
- System architecture
  - Processors
  - Nodes
  - Cluster
- Processors
  - POWER5
  - Three levels of cache
- Nodes:
  - Eight processor p5-575
- Cluster:
  - 14 p5-575 nodes
  - HPS interconnect