Presentation is loading. Please wait.

Presentation is loading. Please wait.

Laxmi Narayan Bhuyan http://www.cs.ucr.edu/~bhuyan SIMD Architectures Laxmi Narayan Bhuyan http://www.cs.ucr.edu/~bhuyan.

Similar presentations


Presentation on theme: "Laxmi Narayan Bhuyan http://www.cs.ucr.edu/~bhuyan SIMD Architectures Laxmi Narayan Bhuyan http://www.cs.ucr.edu/~bhuyan."— Presentation transcript:

1 Laxmi Narayan Bhuyan http://www.cs.ucr.edu/~bhuyan
SIMD Architectures Laxmi Narayan Bhuyan

2

3

4

5

6

7

8

9

10 Data Parallel Model Operations can be performed in parallel on each element of a large regular data structure, such as an array 1 Control Processsor broadcast to many PEs (see Ch. 1, Fig. 1-25, page 45 of [CSG99]) When computers were large, could amortize the control portion of many replicated PEs Condition flag per PE so that can skip Data distributed in each memory Early 1980s VLSI => SIMD rebirth: bit PEs + memory on a chip was the PE Data parallel programming languages lay out data to processor

11 Data Parallel Model Vector processors have similar ISAs, but no data placement restriction SIMD led to Data Parallel Programming languages Advancing VLSI led to single chip FPUs and whole fast µProcs (SIMD less attractive) SIMD programming model led to Single Program Multiple Data (SPMD) model All processors execute identical program Data parallel programming languages still useful, do communication all at once: “Bulk Synchronous” phases in which all communicate after a global barrier

12 SIMD Programming – High-Performance Fortran (HPF)
Single Program Multiple Data (SPMD) FORALL Construct similar to Fork: FORALL (I=1:N), A(I) = B(I) + C(I), END FORALL Data Mapping in HPF 1. To reduce interprocessor communication 2. Load balancing among processors

13 How does an SIMD computer work?
A Host computer is necessary to do the I/O operations The user program is loaded into the control memory The data is distributed to all the memory modules The control unit decodes the instn and executes it if it is a scalar instn. If it is a vector instn, it broadcasts the control signals to the PEs to do the executions Before broadcasting the control signals, the CU broadcasts an enable vector which will enable the PEs

14 Masking and Data Routing Mechanisms
A,B,C – working registers Si = status (1 active, 0 inactive) Ri – Data routing register Di – holds address Ii – Index register

15 Example

16 Matrix Multiplication

17

18

19 N * N Mesh

20 The Illiac IV Architecture
Distributed memory architecture 64 PEs connected as an 8X8 2-D mesh with end around connection LDB: Local Data Buffer 64, 64-bit each PEM: 2K X 64 bits memory

21 The Illiac IV Network

22 Maspar MP-1 Architecture
Configuration with 1K-16K PEs are available Each PE has a 4-bit ALU, 1-bit logic unit, a 64-bit mantissa unit, a 16-bit exponent unit, communication input and output ports Each PE has bit registers available to the programmer Each processor board has 1024 PEs arranges as 64 PE clusters (PECs) with 16 PEs per cluster Each PEC is a chip connected to 8 neighbors via an octagonal mesh Another network, called Multistage Crossbar Network, with three router stages gives a function of 1024X1024 crossbar for routing from any PEC to another PEC

23

24


Download ppt "Laxmi Narayan Bhuyan http://www.cs.ucr.edu/~bhuyan SIMD Architectures Laxmi Narayan Bhuyan http://www.cs.ucr.edu/~bhuyan."

Similar presentations


Ads by Google