CS718: Data Parallel Processors
Anshul Kumar, CSE IITD
27th April, 2006

Data Parallel Architectures
SIMD Processors
  – Multiple processing elements driven by a single instruction stream
Associative Processors
  – SIMD-like processors with associative memory
Vector Processors
  – Uniprocessors with vector instructions
Systolic Arrays
  – Application-specific VLSI structures

SIMD
[Figure: SIMD model. Labels: C, P, P, M, IS (instruction stream), DS (data stream)]
One of the earliest models of parallel computer

ILLIAC IV SIMD Model
[Figure: control unit (CU) and processing elements PE1, PE2, ..., PEn, each a processor (P) with its own memory (M), joined by an interconnection network; I/O bus]
Planned for 64 x 4 PEs; only 64 were built

Burroughs Scientific Processor (BSP) Model
[Figure: processors P1, P2, ..., Pn and memory modules M1, M2, ..., Mk connected through an interconnection network; CU; I/O bus]

SIMD algorithms: sum of vector elements

step 1: S[i] = a[i] + a[i+1]    i = 0, 2, 4, 6
step 2: S[i] = S[i] + S[i+2]    i = 0, 4
step 3: S[i] = S[i] + S[i+4]    i = 0

Data held after each step:
initial: a0  a1  a2  a3  a4  a5  a6  a7
step 1:  a0+a1  a2+a3  a4+a5  a6+a7
step 2:  a0+a1+a2+a3  a4+a5+a6+a7
step 3:  a0+a1+a2+a3+a4+a5+a6+a7

OR

step 1: S[i] = a[i] + a[i+4]    i = 0, 1, 2, 3
step 2: S[i] = S[i] + S[i+2]    i = 0, 1
step 3: S[i] = S[i] + S[i+1]    i = 0
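The first variant of the summation (neighbour distance doubling at every step) can be sketched in Python as follows. This is only an illustration: the function name `simd_sum` is made up here, the vector length is assumed to be a power of two, and the "simultaneous" work of the processing elements is emulated by computing all updates of a step before writing any of them back.

```python
def simd_sum(a):
    """Sum a vector by pairwise combination, one SIMD-style step per level.

    Assumes len(a) is a power of two. Returns (total, number_of_steps).
    """
    s = list(a)
    n = len(s)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"

    stride = 1   # distance between the two operands of each addition
    steps = 0
    while stride < n:
        # Every active element i = 0, 2*stride, 4*stride, ... executes the
        # same instruction S[i] = S[i] + S[i+stride] in this step.
        updates = {i: s[i] + s[i + stride] for i in range(0, n, 2 * stride)}
        for i, value in updates.items():
            s[i] = value
        stride *= 2
        steps += 1
    return s[0], steps


total, steps = simd_sum([1, 2, 3, 4, 5, 6, 7, 8])
print(total, steps)
```

For eight elements the loop runs with strides 1, 2 and 4, matching the three steps listed on the slide.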

No. of processors vs time
Adding vector elements:
  – n processors: log n steps
  – n/log n processors: log n steps
Matrix multiplication:
  – n processors: n^2 steps
  – n^2 processors: n steps
  – n^3 processors: log n steps
  – n^3/log n processors: log n steps
Important factors: data distribution, network
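One way to see why n/log n processors still need only on the order of log n steps for the vector sum is a two-phase schedule: each processor first adds its own block serially, then the partial sums are combined pairwise. The slide does not spell out a schedule, so the sketch below is an assumption, and `sum_steps` is a hypothetical helper that only counts steps.

```python
import math


def sum_steps(n, p):
    """Count parallel steps to add n numbers on p processors.

    Assumed schedule: each processor sums a block of ceil(n/p) elements
    serially, then the p partial sums are combined in a binary tree.
    """
    block = math.ceil(n / p)          # elements handled by one processor
    local = block - 1                 # serial additions inside a block
    tree = math.ceil(math.log2(p)) if p > 1 else 0   # pairwise combining levels
    return local + tree


n = 1024
log_n = int(math.log2(n))
print(sum_steps(n, n))            # n processors
print(sum_steps(n, n // log_n))   # about n / log n processors
print(sum_steps(n, 1))            # a single processor
```

With n = 1024 the count grows from 10 steps on 1024 processors to 17 steps on 102 processors, still within a constant factor of log n, while a single processor needs 1023 steps.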

Rise and fall of SIMDs
Introduced in the 1960s (e.g. Illiac, BSP)
Problems:
  – not cost effective
  – serial fraction and Amdahl's law
  – I/O bottleneck
Overshadowed by Vector Processors
Resurrected in the 1980s (MPP from Goodyear, Connection Machine from Thinking Machines Corporation, MP-1 from MasPar)
Did not survive because of high cost

Related ideas
Coarse-grain SIMD with off-the-shelf processors (synchronized MIMD), e.g. CM-5 of Thinking Machines
This gave rise to SPMD (single program, multiple data)
MMX and SIMD instructions in Pentium

Vector Processors
[Figure: block diagram of a vector processor. Labels: I-cache, D-cache, Mem control, I-unit and control, V-reg, GPRs, address unit, VFU, FU, Buses, Memory]

Four Generations of CRAY systems (vector processors)

System   CPUs   Clock (MHz)   Flops/clock/CPU   Words moved/clk/CPU   Mflops   Gates/chip
CRAY-1      1            80                 2                     1       80            2
X-MP        4           105                 2                     3      840           16
Y-MP        8           166                 2                     3     2667         2500
C90        16           240                 4                     6    15360        10000

Cray History
http://www.cray.com/company/history.html

CRAY C90
8 GB central memory shared by 16 CPUs
128 CPU-memory paths
word = 64 bits + 16 ECC
Dual vector pipes
128-element segments
Memory:
  – 8 sections
  – 8x8 subsections
  – 8x8x2 bank groups
  – 8x8x2x8 banks

Convex C4/XA system
CPU: 7.5 ns clock, 1620 MFLOPs
Mem: 32 MB x 32 banks, 64-bit word, 50 ns access time
3 FP pipes, 2 results each
Vector regs - FPU crossbar
1.1 GB/s per I/O port
[Figure: 5 x 5 crossbar connecting CPUs, memories, I/O, utilities]

Other examples
NEC SX-X: 4 CPUs, 4 x 2 pipes each
Fujitsu VP5000: 7 - 222 CPUs
2 LS pipes, 3 Func pipes, 2 mask pipes
Fujitsu VP2000: 1 - 2 CPUs

Systolic Arrays (H.T. Kung 1978)
Simplicity, Regularity, Concurrency, Communication
Example: Band matrix multiplication

[Figure sequence: snapshots T=0 to T=6 of band matrix multiplication on a systolic array. Elements of A (A11 ... A53) and B (B11 ... B43) stream into the array; product terms such as A11 B11 + A12 B21 accumulate in the cells and the results C11, C12 emerge from T=5 onward.]
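The data movement in a systolic array can be tried out with a small simulation. The sketch below does not reproduce the band matrix array of the figures; it swaps in a simpler, commonly used arrangement (a rectangular grid in which each cell keeps one element of C, with A moving left to right and B moving top to bottom) so that the skewed timing is easy to follow. The function name `systolic_matmul` and this layout are assumptions made for illustration.

```python
def systolic_matmul(A, B):
    """Multiply A (n x m) by B (m x p) on a simulated n x p systolic grid.

    Cell (i, j) accumulates C[i][j]. Row i of A enters from the left delayed
    by i cycles, column j of B enters from the top delayed by j cycles, so
    A[i][k] and B[k][j] meet in cell (i, j) at cycle t = i + j + k.
    Returns (C, number_of_cycles).
    """
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]
    a_reg = [[0] * p for _ in range(n)]   # A value currently held in each cell
    b_reg = [[0] * p for _ in range(n)]   # B value currently held in each cell

    cycles = n + m + p - 2
    for t in range(cycles):
        new_a = [[0] * p for _ in range(n)]
        new_b = [[0] * p for _ in range(n)]
        for i in range(n):
            for j in range(p):
                if j == 0:                      # left edge: feed skewed row of A
                    k = t - i
                    new_a[i][j] = A[i][k] if 0 <= k < m else 0
                else:                           # take value from left neighbour
                    new_a[i][j] = a_reg[i][j - 1]
                if i == 0:                      # top edge: feed skewed column of B
                    k = t - j
                    new_b[i][j] = B[k][j] if 0 <= k < m else 0
                else:                           # take value from upper neighbour
                    new_b[i][j] = b_reg[i - 1][j]
        a_reg, b_reg = new_a, new_b
        # Every cell performs one multiply-accumulate per cycle.
        for i in range(n):
            for j in range(p):
                C[i][j] += a_reg[i][j] * b_reg[i][j]
    return C, cycles


C, cycles = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(C, cycles)
```

Each cell talks only to its immediate neighbours and repeats the same operation every cycle, which is the regularity and local communication that the systolic idea relies on.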

WARP: Programmable Systolic Processor [Kung, CMU 1987]
Complete contrast to the original idea:
  – not application specific
  – not a single VLSI
  – complex cell (pipelined FP adder, mult, FIFOs, RAM, crossbar)
  – linear
  – asynchronous

References
D. Sima, T. Fountain, P. Kacsuk, "Advanced Computer Architectures: A Design Space Approach", Addison Wesley, 1997.
K. Hwang, "Advanced Computer Architecture: Parallelism, Scalability, Programmability", McGraw Hill, 1993.
