\course\eleg652-03F\Topic1a- 03F.ppt
Vector and SIMD Computers
- Vector computers
- SIMD
A processor that adds two vectors by streaming them through a pipelined adder: stream A and stream B flow from a multiport memory system into the adder, and the result stream C = A + B flows back to memory.
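A minimal sketch of why streaming pays off, assuming a hypothetical adder with `stages` one-cycle stages that accepts one new operand pair per cycle: the first result appears after the pipeline fills, and after that one result completes every cycle, so n additions take about k + n - 1 cycles instead of k * n. The function name and the stage count are illustrative, and the sum is computed on entry because only the latency is being modelled.

```python
def pipelined_add(a, b, stages=4):
    """Stream vectors a and b through a `stages`-deep adder pipeline.

    Returns (c, cycles): the result stream C = A + B and the clock cycles used.
    Assumption: one operand pair enters per cycle; each stage takes one cycle.
    """
    assert len(a) == len(b)
    n = len(a)
    pipe = [None] * stages        # pipe[s] is the work currently in stage s
    c = []
    cycles = 0
    i = 0                         # index of the next operand pair to issue
    while len(c) < n:
        cycles += 1
        new = None
        if i < n:
            new = a[i] + b[i]     # value is computed up front; the pipe models latency
            i += 1
        pipe = [new] + pipe[:-1]  # every stage hands its work to the next one
        if pipe[-1] is not None:  # the last stage finishes an addition this cycle
            c.append(pipe[-1])
    return c, cycles
```

For long vectors the fill time is amortized, so the adder approaches one result per cycle, provided memory can keep both input streams and the output stream flowing.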
Key to Performance
Keeping up the memory bandwidth required by C := A + B.
Problem: a RAM can supply only 1 word per cycle, but the pipeline needs 3 memory references per cycle (two operands and one result).
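The bandwidth argument can be checked with a small steady-state model. This is a sketch under simple assumptions: each result of C := A + B costs exactly 3 memory words (load A[i], load B[i], store C[i]), pipeline fill latency is ignored, and the function name is hypothetical.

```python
import math

REFS_PER_RESULT = 3  # load A[i], load B[i], store C[i]

def cycles_for_vector_add(n, words_per_cycle, pipeline_rate=1.0):
    """Lower bound on cycles to produce n results of C := A + B.

    words_per_cycle: memory bandwidth in words per cycle (a plain RAM gives 1).
    pipeline_rate:   results per cycle the adder can sustain (1 for a full pipeline).
    Only steady-state throughput is modelled; fill latency is ignored.
    """
    memory_bound = math.ceil(n * REFS_PER_RESULT / words_per_cycle)
    compute_bound = math.ceil(n / pipeline_rate)
    return max(memory_bound, compute_bound)
```

With 1 word per cycle, 1000 additions need 3000 cycles, so the adder sits idle two thirds of the time; only at 3 words per cycle does the memory match the pipeline.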
Memory -> intermediate “buffer” memory -> arithmetic pipeline
- Using each datum several times once it is in the buffer reduces the demand on memory bandwidth.
- A bottleneck must be avoided between memory and the buffer.
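To see why multiple use per datum helps, the sketch below counts memory references for a hypothetical workload, D = A + B and E = A * B, with and without an intermediate buffer. The `CountingMemory` class and the workload are illustrative assumptions; the point is only the reference count (6n words without reuse, 4n with it).

```python
class CountingMemory:
    """Main memory that counts every word read or written."""

    def __init__(self, **vectors):
        self.data = {name: list(v) for name, v in vectors.items()}
        self.refs = 0

    def load(self, name):
        self.refs += len(self.data[name])
        return list(self.data[name])

    def store(self, name, values):
        self.refs += len(values)
        self.data[name] = list(values)


def without_buffer(mem):
    # Each operation fetches its operands from main memory again.
    a, b = mem.load("A"), mem.load("B")
    mem.store("D", [x + y for x, y in zip(a, b)])
    a, b = mem.load("A"), mem.load("B")
    mem.store("E", [x * y for x, y in zip(a, b)])


def with_buffer(mem):
    # A and B are brought into the buffer once and reused by both operations.
    a, b = mem.load("A"), mem.load("B")
    mem.store("D", [x + y for x, y in zip(a, b)])
    mem.store("E", [x * y for x, y in zip(a, b)])
```

The more operations reuse the buffered operands, the larger the saving, which is why vector registers sit between main memory and the pipelines.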
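The Architecture of a Vector Computer
- Host computer: mass storage and user I/O.
- Main memory: holds both program and data.
- Scalar processor: the scalar control unit fetches instructions from main memory, sends scalar instructions to the scalar functional pipelines (which exchange scalar data with main memory), and passes vector instructions to the vector control unit.
- Vector processor: the vector control unit controls the vector functional pipelines, which operate on the vector registers; vector data moves between main memory and the vector registers.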
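The scalar/vector split described above can be sketched as a decode-and-route loop. The instruction format here is entirely hypothetical (tuples of opcode, destination, two sources, with a leading "V" marking a vector instruction); it only illustrates that the scalar control unit executes scalar work itself and forwards whole-vector work to the vector side.

```python
def run(program, scalar_regs, vector_regs):
    """Scalar control unit: decode each instruction and route it.

    Hypothetical ISA: opcodes starting with "V" are vector instructions and
    operate element-wise on vector registers; all others are scalar.
    Returns a log of where each instruction was executed.
    """
    ops = {"ADD": lambda x, y: x + y, "MUL": lambda x, y: x * y}
    routed = []
    for op, dst, s1, s2 in program:
        if op.startswith("V"):
            # Vector instruction: handed to the vector control unit, which
            # streams the vector registers through a vector pipeline.
            f = ops[op[1:]]
            vector_regs[dst] = [f(x, y) for x, y in zip(vector_regs[s1], vector_regs[s2])]
            routed.append("vector")
        else:
            # Scalar instruction: executed on the scalar functional pipelines.
            scalar_regs[dst] = ops[op](scalar_regs[s1], scalar_regs[s2])
            routed.append("scalar")
    return routed
```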
SIMD Architectures
ILLIAC IV (Univ. of Illinois + BSP)
- Objective: 10^9 op/sec with 256 PEs + 4 CUs
- Achieved: 64 PEs + 1 CU, x 10^6 op/sec
- Applications: weather forecasting, nuclear engineering
Function of the CU:
- store the user program
- decode all instructions and determine where they are to be executed
- execute scalar instructions
- broadcast vector instructions
Function of the PEs: all perform the same function
- lock-step execution
- masking scheme
- data routing
Function of the interconnection network:
- communication between PEs (data exchanges)
Data bus: data are broadcast from the CU.
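Lock-step execution with a masking scheme can be sketched as follows. Each list position stands for one PE's local datum, and `broadcast` plays the CU sending one instruction to all PEs; PEs whose mask bit is off ignore it. The function names and the example conditionals are illustrative assumptions. Note that an if/else costs both branches: the CU broadcasts the "then" part under the mask and the "else" part under its complement.

```python
def broadcast(pe_values, mask, op):
    """CU broadcasts `op`; every enabled PE applies it in the same step."""
    return [op(v) if enabled else v for v, enabled in zip(pe_values, mask)]


def simd_abs(pe_values):
    # Step 1: all PEs evaluate the condition in lock-step, producing the mask.
    mask = [v < 0 for v in pe_values]
    # Step 2: "negate" is broadcast; only PEs with the mask bit set act on it.
    return broadcast(pe_values, mask, lambda v: -v)


def simd_if_else(pe_values):
    # if v is even: v // 2   else: 3 * v + 1
    mask = [v % 2 == 0 for v in pe_values]           # evaluated once, before either branch
    pe_values = broadcast(pe_values, mask, lambda v: v // 2)
    return broadcast(pe_values, [not m for m in mask], lambda v: 3 * v + 1)
```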
Configuration I (ILLIAC IV): the CU, with a CU memory holding data and instructions, drives PE 0, PE 1, ..., PE n-1 over a control bus and a data bus. Each PE i has its own local memory PEM i, and the PEs exchange data through an interconnection network.
Configuration II (BSP): the CU, with its CU memory and I/O, controls PE 0, PE 1, ..., PE n-1 and broadcasts over a data bus; the PEs reach the shared memory modules M0, M1, ..., M p-1 through an alignment network.
PE routing connections: layout for ILLIAC-IV
Figure: (a) electrical connectivity; (b) the physical layout. Labels: shifts of 2^0, shifts of 2^1.
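As ILLIAC IV's routing is commonly described, PE i in the 64-PE array is wired to PEs i+1, i-1, i+8 and i-8 (all mod 64), which is an 8 x 8 mesh whose rows are chained end-around. The sketch below encodes that rule and finds the minimum number of routing steps by breadth-first search; the function names are hypothetical. Any PE can reach any other in at most 7 steps.

```python
from collections import deque

N, ROW = 64, 8  # 64 PEs, logically 8 per row


def neighbors(i):
    """PEs directly wired to PE i: shifts of +-1 and +-8, mod 64."""
    return {(i + 1) % N, (i - 1) % N, (i + ROW) % N, (i - ROW) % N}


def route_steps(src, dst):
    """Minimum number of routing steps from PE src to PE dst (breadth-first search)."""
    dist = {src: 0}
    frontier = deque([src])
    while frontier:
        p = frontier.popleft()
        if p == dst:
            return dist[p]
        for q in neighbors(p):
            if q not in dist:
                dist[q] = dist[p] + 1
                frontier.append(q)
    raise ValueError("unreachable PE")
```

Because the wiring is the same for every PE, the worst case from PE 0 is the worst case overall: `max(route_steps(0, j) for j in range(64))` gives 7.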
The data flow and processor/memory structure of the Burroughs Scientific Processor (BSP): 16 processors and 17 memories. Operands travel from the 17 memories to the 16 processors through an input alignment network (17 inputs, 16 outputs); results return through an output alignment network (16 inputs, 17 outputs).
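One reason for pairing 16 processors with a prime number (17) of memories is conflict-free strided access. The sketch assumes word address a lives in memory module a mod M (a simplification; the function names are hypothetical). When 16 processors fetch elements start, start + s, ..., start + 15s, a prime M = 17 gives 16 distinct modules for every stride s that is not a multiple of 17, while M = 16 collides for every even stride.

```python
def modules_hit(start, stride, count, num_modules):
    """Memory modules touched when `count` processors fetch a[start + k*stride]."""
    return [(start + k * stride) % num_modules for k in range(count)]


def conflict_free(stride, num_modules, count=16, start=0):
    """True if every processor is served by a different module in one memory cycle."""
    hits = modules_hit(start, stride, count, num_modules)
    return len(set(hits)) == count
```

For example, stride 2 with 16 modules touches only 8 modules, so half the processors wait; with 17 modules the same access completes in one memory cycle, and the alignment network routes each module's word to the right processor.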
Mesh connected: multi-dimensional (cont’d)
ICL DAP
- host: ICL 2980
- 2D nearest-neighbor connection
- 64 x 64 array (4096 PEs), 16 PEs per board
AMT VLSI DAP 500
- DAP 510: 32 x 32 array
- 64 PEs per chip
- logic in memory
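A 2D nearest-neighbor mesh lets every PE fetch a value from its north, south, east or west neighbor in a single step. The sketch below models one such shift on a small grid; edge handling (zero fill versus wrap-around) is a parameter because it is an assumption here, and `shift` and `neighbour_sum` are hypothetical names. Four shifts give a simple stencil such as the sum of the four neighbors.

```python
def shift(grid, direction, wrap=False, fill=0):
    """Every PE simultaneously receives the value of its neighbour in `direction`.

    "north" means each PE takes the value of the PE one row above it.
    Without wrap-around, PEs on the edge receive `fill`.
    """
    rows, cols = len(grid), len(grid[0])
    dr, dc = {"north": (-1, 0), "south": (1, 0), "east": (0, 1), "west": (0, -1)}[direction]
    out = [[fill] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            sr, sc = r + dr, c + dc          # coordinates of the source neighbour
            if wrap:
                sr, sc = sr % rows, sc % cols
            if 0 <= sr < rows and 0 <= sc < cols:
                out[r][c] = grid[sr][sc]
    return out


def neighbour_sum(grid, wrap=False):
    """Each PE adds the values of its four nearest neighbours (one shift per direction)."""
    shifted = [shift(grid, d, wrap) for d in ("north", "south", "east", "west")]
    rows, cols = len(grid), len(grid[0])
    return [[sum(s[r][c] for s in shifted) for c in range(cols)] for r in range(rows)]
```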
The AMT DAP 500
Figure: array memory (32, 32K bits); fast data channel; processor elements with Q (accumulator), A (activity control), C (carry) and D (data) registers; host connection unit; master control unit; user interface; code memory.
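DAP processor elements are bit-serial: each PE handles one bit per cycle, and the whole array steps through the bits of its operands together. The sketch below shows bit-serial addition across an array of PEs, borrowing the roles of the carry (C) and activity-control (A) registers from the figure. It is an illustration under stated assumptions, not the DAP instruction set: unsigned operands that fit in `width` bits, least significant bit first, and inactive PEs leave their first operand unchanged.

```python
def bit_serial_add(xs, ys, active, width):
    """Compute xs[p] + ys[p] (mod 2**width) in every active PE, one bit per cycle.

    Assumes 0 <= xs[p], ys[p] < 2**width. Inactive PEs keep xs[p] unchanged.
    """
    num_pe = len(xs)
    carry = [0] * num_pe              # plays the role of each PE's C register
    result = list(xs)                 # the result overwrites the first operand in place
    for bit in range(width):          # one cycle per bit, all PEs in lock-step
        for p in range(num_pe):
            x = (xs[p] >> bit) & 1    # bit read from array memory
            y = (ys[p] >> bit) & 1
            s = x ^ y ^ carry[p]                              # full-adder sum bit
            carry[p] = (x & y) | (carry[p] & (x ^ y))         # full-adder carry out
            if active[p]:             # activity control (A): only active PEs write back
                result[p] = (result[p] & ~(1 << bit)) | (s << bit)
    return result
```

An 8-bit add therefore costs 8 array cycles regardless of how many PEs take part, which is why bit-serial arrays rely on very large PE counts for throughput.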
(a) Multivector track
- CDC 7600 (CDC, 1970)
- CDC Cyber 205 (Levine, 1982)
- Cray 1 (Russell, 1978)
- ETA 10 (ETA, Inc. 1989)
- Cray Y-MP (Cray Research, 1989)
- Cray/MPP (Cray Research, 1993)
- Fujitsu, NEC, Hitachi models
(b) SIMD track
- Illiac IV (Barnes et al., 1968)
- Goodyear MPP (Batcher, 1980)
- BSP (Kuck and Stokes, 1982)
- DAP 610 (AMT, Inc. 1987)
- CM2 (TMC, 1990)
- MasPar MP1 (Nickolls, 1990)
- IBM GF/11 (Beetem et al., 1985)