Single Instruction Multiple Data


1 Single Instruction Multiple Data
SIMD: Single Instruction Multiple Data
Roughly follows Tanenbaum
Copyright © Curt Hill

2 SIMD
SIMD is only successful when the data is highly parallel:
Where a very large amount of time is spent on array processing
The array element processing is somewhat independent
Such as adding corresponding elements of two arrays
There are plenty of applications, but they are specialized, usually scientific
Copyright © Curt Hill

3 Data level parallelism
Suppose we have two arrays of 32 floating point operands and we want to add them
A single processor will go down the line, summing one pair at a time (a sketch of this loop appears below)
If it is superscalar with two FPUs it can do this in slightly more than 16 time units; otherwise 32
Not bad, but generally outperformed by array and vector processors
Copyright © Curt Hill
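As a rough illustration, here is a minimal C sketch of the scalar approach (the function name and signature are just for illustration, not from the slides). A conventional processor executes one addition per loop iteration, so 32 elements cost on the order of 32 sequential floating point operations.

    #include <stddef.h>

    /* Scalar element-by-element addition: one add per iteration. */
    void add_arrays(const float *a, const float *b, float *c, size_t n) {
        for (size_t i = 0; i < n; i++) {
            c[i] = a[i] + b[i];   /* the FPU (or two FPUs) handles these one pair at a time */
        }
    }

    /* For the slide's example: add_arrays(a, b, c, 32); */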

4 Array Processor
A single control unit drives multiple ALUs
The ALUs usually have individual memories
In the previous case a single processor takes 32 time units
Here, if there are 32 floating point units and the vector register contains 32 slots, it takes one unit (a code sketch of this one-instruction view follows below)
When adding two scalar variables the two would be the same speed, but when adding two array variables (length <= 32) the vector processor would be up to 32 times faster
Copyright © Curt Hill
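To mirror the array processor idea in code, here is a small sketch using the GCC/Clang vector extension (an assumption: it requires a compiler that supports __attribute__((vector_size)); nothing on the slide mandates this mechanism). The whole 32-wide addition is written as a single expression, much as the array processor applies one instruction across all 32 ALUs; on ordinary hardware the compiler simply breaks it into however many native instructions the machine needs.

    /* 32 single-precision floats treated as one value (32 * 4 bytes = 128 bytes). */
    typedef float v32f __attribute__((vector_size(128)));

    v32f add32(v32f a, v32f b) {
        return a + b;   /* conceptually one instruction across all 32 lanes */
    }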

5 Why
In most applications such parallelism would be a waste, but in many scientific applications an array of size 32 is pretty small, so substantial use could be made of this parallelism
An array processor is a large number of identical processors that perform the same instruction on different pieces of data
A single control unit for the many processors
Parallel memories for the parallel processors
Copyright © Curt Hill

6 Examples
ILLIAC IV was the first, in the late 1960s
Largely used by NASA for fluid dynamics calculations
Very large amount of parallelism in this application
Thinking Machines Connection Machine 1 and 2
Goodyear Massively Parallel Processor
MasPar MP-1 and MP-2
Copyright © Curt Hill

7 Disadvantages
Hardware heavy, so expensive
Never mass produced since they fit a niche market
Register/memory configuration is unusual
Difficult to program
Most languages have no support; High Performance Fortran is the usual choice
Exceptional performance, but only on truly parallel computations
Copyright © Curt Hill

8 Vector processor
Essentially a normal processor, usually superscalar and heavily pipelined
What differs is the addition of vector registers
A normal register contains a single value, either an integer or a floating point number of some size
A vector register contains an array of these items, which can be combined with array arithmetic (a sketch follows below)
Copyright © Curt Hill
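For a concrete example of a vector register holding an array of values, here is a short C sketch using the x86 SSE intrinsics (assumptions: an SSE-capable x86 processor and a compiler providing xmmintrin.h; the slides do not name a specific instruction set for this point). A 128-bit XMM register holds four floats, and a single _mm_add_ps adds all four pairs at once.

    #include <stdio.h>
    #include <xmmintrin.h>   /* SSE intrinsics */

    int main(void) {
        float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
        float b[4] = {10.0f, 20.0f, 30.0f, 40.0f};
        float c[4];

        __m128 va = _mm_loadu_ps(a);      /* vector load: 4 floats into one register */
        __m128 vb = _mm_loadu_ps(b);
        __m128 vc = _mm_add_ps(va, vb);   /* array arithmetic: 4 additions in one instruction */
        _mm_storeu_ps(c, vc);             /* vector store */

        printf("%g %g %g %g\n", c[0], c[1], c[2], c[3]);
        return 0;
    }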

9 Crays
Most of the early Cray supercomputers were vector processors
Cray also made MIMD machines
Programmed more like a regular processor
There was usually a vector load/store instruction
The number of values in a vector register was usually modest: 4-8
This made the cost more reasonable
The performance was not so lopsided for vector operations
Copyright © Curt Hill

10 Commercially
The market for these sorts of array and vector processors is very limited
There are few organizations that will always be able to utilize them
Usually national laboratories and sophisticated engineering companies
In general it is a niche market
However, there are some common examples as well
Copyright © Curt Hill

11 Intel MMX instructions
The Pentium should not be considered a vector processor
Yet it has vector operations in the MMX subset
The SSE sets extend these
These allow one 64-bit MMX register to be treated as eight 8-bit values or four 16-bit values
This allows array processing of 8-bit pixels or 16-bit sound samples (a sketch follows below)
Copyright © Curt Hill
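As an illustration of packing small elements into one register, here is a C sketch using the SSE2 intrinsics rather than the original MMX ones (an assumption: emmintrin.h is available; MMX itself offers an analogous 64-bit form). Sixteen 8-bit pixel values are added pairwise, with saturation, by a single instruction.

    #include <stdio.h>
    #include <emmintrin.h>   /* SSE2 intrinsics */

    int main(void) {
        unsigned char p[16] = {  0,  10,  20,  30,  40,  50,  60,  70,
                                80,  90, 100, 150, 200, 230, 250, 255};
        unsigned char q[16] = {  5,   5,   5,   5,   5,   5,   5,   5,
                                50,  50,  50,  50,  50,  50,  50,  50};
        unsigned char r[16];

        __m128i vp = _mm_loadu_si128((const __m128i *)p);
        __m128i vq = _mm_loadu_si128((const __m128i *)q);
        __m128i vr = _mm_adds_epu8(vp, vq);   /* 16 saturating byte additions at once */
        _mm_storeu_si128((__m128i *)r, vr);

        for (int i = 0; i < 16; i++)
            printf("%u ", r[i]);              /* e.g. 255 + 50 saturates to 255 rather than wrapping */
        printf("\n");
        return 0;
    }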

12 GPU The graphics processing unit is the most common vector processor
The pixel manipulation performed by a GPU is an ideal SIMD environment
Shading, for example, can easily be done in parallel
Let's consider one GPU: the ATI Radeon HD 4870
This is now several years old; they are faster now
Copyright © Curt Hill

13 Radeon HD 4870
There are 10 cores
Each is a SIMD core
Each core has 256 registers
Each of these registers is actually a vector register with 64 slots
Each slot holds a four-component single precision float (16 bytes)
Multiply this out (10 × 256 × 64 × 16 bytes) and it is 2.5 MB of register storage
Copyright © Curt Hill
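A quick check of that arithmetic, under the assumption stated above that each slot is a four-float (16 byte) value:

    #include <stdio.h>

    int main(void) {
        long bytes = 10L * 256 * 64 * 16;   /* cores * registers * slots * bytes per slot */
        printf("%ld bytes = %.2f MB\n", bytes, bytes / (1024.0 * 1024.0));   /* 2621440 bytes = 2.50 MB */
        return 0;
    }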

14 Exploiting the GPU There is substantial power sitting in the GPU
Unless it is driving moving 3D displays (such as games) or playing video, most of this power is sitting idle
A number of options are now available to use it for scientific computing
GPGPU – General Purpose computing on Graphics Processing Units
Copyright © Curt Hill

15 Supercomputers
A number of groups have organized clusters of GPUs into supercomputers
Example: the Chinese Mole-8.5 (2011)
2200 NVIDIA Tesla GPUs
Used to simulate an H1N1 influenza virus
Copyright © Curt Hill

16 Finally
Big scientific computers are a niche market
Supercomputers have been fabricated using clusters of GPUs
This is likely the future of SIMD
Copyright © Curt Hill

