Published by Clifford Chapman, modified over 9 years ago
1
SIMD: Single Instruction Multiple Data
Copyright © 2011-2014 Curt Hill
2
SIMD
Only successful when the data is highly parallel
A large share of the running time is spent processing arrays
The processing of each array element is largely independent of the others
  – Such as adding the corresponding elements of two arrays
There are plenty of applications, but they are specialized and usually scientific
3
Data level parallelism
Suppose we have two arrays of 32 floating point operands and we want to add them element by element
A single processor will go down the line, summing one pair at a time
  – If it is superscalar with two FPUs, it can do this in slightly more than 16 units of time; otherwise it takes 32
Not bad, but generally outperformed by array and vector processors
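A minimal Python sketch of the arithmetic on this slide, assuming an idealized machine with no loop or instruction-issue overhead (which is why the real two-FPU case is "slightly more than" 16 units). The helper names `scalar_add` and `time_units` are hypothetical.

```python
import math


def scalar_add(a, b):
    """Add corresponding elements one pair at a time, as a lone scalar FPU would."""
    result = []
    for x, y in zip(a, b):  # one floating point addition per time unit
        result.append(x + y)
    return result


def time_units(n, adders):
    """Ideal time units for n independent additions with `adders` FPUs
    working at once; ignores loop and instruction-issue overhead."""
    return math.ceil(n / adders)


a = [float(i) for i in range(32)]
b = [2.0 * i for i in range(32)]
c = scalar_add(a, b)
print(c[5])                                                      # 15.0
print(time_units(32, 1), time_units(32, 2), time_units(32, 32))  # 32 16 1
```

Because every addition is independent, the only thing limiting the time is how many adders can work at once.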
4
Array Processor
A single control unit drives multiple ALUs
  – The ALUs usually have individual memories
In the previous case the addition takes 16-32 units of time
Here, if there are 32 floating point units, one per element, it takes one unit
When adding two scalar variables the two machines run at the same speed, but when adding two arrays of length ≤ 32 the array processor is up to 32 times faster
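The lockstep behavior of an array processor can be modeled in a few lines of Python. This is a toy sketch, not a description of any real machine: the classes `ProcessingElement` and `ControlUnit` and the `"fadd"` operation are hypothetical. Each processing element owns a private memory, and the single control unit broadcasts one instruction that every element applies to its own data.

```python
class ProcessingElement:
    """One ALU with its own local memory (a dict of named values)."""

    def __init__(self, a, b):
        self.mem = {"a": a, "b": b}

    def execute(self, op, dst, src1, src2):
        # Every PE runs the same instruction, but on its own local data.
        if op == "fadd":
            self.mem[dst] = self.mem[src1] + self.mem[src2]
        else:
            raise ValueError(f"unknown op {op!r}")


class ControlUnit:
    """Single control unit that broadcasts each instruction to all PEs."""

    def __init__(self, pes):
        self.pes = pes
        self.steps = 0

    def broadcast(self, op, dst, src1, src2):
        for pe in self.pes:  # simulated one by one; real hardware runs them in lockstep
            pe.execute(op, dst, src1, src2)
        self.steps += 1      # one instruction step for the whole array


pes = [ProcessingElement(float(i), 2.0 * i) for i in range(32)]
cu = ControlUnit(pes)
cu.broadcast("fadd", "c", "a", "b")
print(cu.steps, pes[5].mem["c"])  # 1 15.0
```

The 32 additions cost one instruction step, matching the "one unit" case on the slide.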
5
Why
In most applications such parallelism would be wasted, but in many scientific applications an array of 32 elements is quite small, so substantial use can be made of this parallelism
An array processor is a large number of identical processors that perform the same instruction on different pieces of data
  – A single control unit for the many processors
  – Parallel memories for the parallel processors
6
Examples
ILLIAC IV, designed in the late 1960s, was the first
  – Largely used by NASA for fluid dynamics calculations
  – Very large amount of parallelism in this application
Thinking Machines Connection Machine 1 and 2
Goodyear Massively Parallel Processor
MasPar MP-1 and MP-2
7
Disadvantages
Hardware heavy, and therefore expensive
  – Never mass produced, since they fit a niche market
  – Register/memory configuration is unusual
Difficult to program
  – Most languages have no support
  – High Performance Fortran is the usual choice
Only exceptional on truly parallel computations
8
Vector processor
Essentially a normal processor, usually superscalar and heavily pipelined
What sets it apart is its vector registers
  – A normal register contains a single value, either an integer or a floating point number of some size
  – A vector register contains an array of these values, which can be operated on with array arithmetic
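A small Python model of the difference between a normal register and a vector register. It is a sketch under stated assumptions: `VLEN`, `VectorRegister` and `vadd` are hypothetical names, and the register length of 8 is chosen only for illustration.

```python
VLEN = 8  # hypothetical number of slots per vector register


class VectorRegister:
    """A register holding VLEN values instead of one."""

    def __init__(self, values=None):
        values = [0.0] * VLEN if values is None else list(values)
        if len(values) != VLEN:
            raise ValueError(f"a vector register holds exactly {VLEN} values")
        self.slots = values


def vadd(dst, src1, src2):
    """One vector instruction: add every pair of corresponding slots."""
    dst.slots = [x + y for x, y in zip(src1.slots, src2.slots)]


v1 = VectorRegister([1.0] * VLEN)
v2 = VectorRegister([float(i) for i in range(VLEN)])
v3 = VectorRegister()
vadd(v3, v1, v2)
print(v3.slots)  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
```

A scalar machine would need eight add instructions here; the vector machine issues one `vadd`, and its pipelined adder streams through the slots.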
9
Crays
Most of the Cray supercomputers were vector processors
Programmed more like a regular processor
  – There were vector load/store instructions
The vector hardware was modest compared with an array processor: the Cray-1, for example, had eight vector registers of 64 elements each, fed through pipelined functional units rather than one ALU per element
  – This made the cost more reasonable
  – The speedup on vector operations was less lopsided
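Arrays are usually longer than one vector register, so vector code processes them in register-sized chunks using vector loads and stores. The standard name for this is strip mining; the slides do not describe it, so treat this Python sketch as an illustration only. `MVL` (maximum vector length) is a hypothetical value kept small so the chunking is visible, and `vload`, `vstore` and `vector_add_arrays` are hypothetical helpers.

```python
MVL = 8  # hypothetical maximum vector length, kept small for illustration


def vload(memory, start, length):
    """Vector load: copy `length` consecutive elements into a 'register'."""
    return memory[start:start + length]


def vstore(memory, start, reg):
    """Vector store: write a register's elements back to consecutive memory."""
    memory[start:start + len(reg)] = reg


def vector_add_arrays(a, b):
    """Add two equal-length arrays in chunks of at most MVL elements
    (strip mining). Returns the sum and the number of vector instructions."""
    n = len(a)
    c = [0.0] * n
    instructions = 0
    for start in range(0, n, MVL):
        length = min(MVL, n - start)          # the last strip may be shorter
        va = vload(a, start, length)
        vb = vload(b, start, length)
        vc = [x + y for x, y in zip(va, vb)]  # one vector add
        vstore(c, start, vc)
        instructions += 4                     # 2 loads + 1 add + 1 store
    return c, instructions


a = [float(i) for i in range(20)]
b = [1.0] * 20
c, count = vector_add_arrays(a, b)
print(c[:3], c[-1], count)  # [1.0, 2.0, 3.0] 20.0 12
```

Twenty elements take three strips (8, 8 and 4 elements) and 12 vector instructions, whereas scalar code would need 80 (two loads, an add and a store per element), ignoring loop overhead in both cases.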
10
Commercially
The market for these sorts of array and vector processors is very limited
Few organizations can keep them fully utilized
In general it is a niche market
However, there are some common ones as well
11
Intel MMX instructions
The Pentium should not be considered a vector processor
Yet it has vector operations in the MMX subset
  – The SSE sets extend these
These allow one 64-bit MMX register to be treated as eight 8-bit values, four 16-bit values, or two 32-bit values
This allows array processing of 8-bit pixels or 16-bit sound samples
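What a packed add does can be sketched in Python by splitting a word into lanes, adding each lane separately, and packing the result back. MMX has instructions for both behaviors shown here (for example PADDB wraps around and PADDUSB saturates for unsigned bytes), but it does each in a single instruction; this loop only models the semantics. The helpers `unpack`, `pack` and `packed_add` are hypothetical, signed saturation is not modeled, and `total_bits` defaults to 64, the width of an MMX register.

```python
def unpack(word, lane_bits, total_bits=64):
    """Split a packed word into lanes; lane 0 is the least significant."""
    mask = (1 << lane_bits) - 1
    return [(word >> (i * lane_bits)) & mask for i in range(total_bits // lane_bits)]


def pack(lanes, lane_bits):
    """Combine lane values back into one packed word."""
    mask = (1 << lane_bits) - 1
    word = 0
    for i, value in enumerate(lanes):
        word |= (value & mask) << (i * lane_bits)
    return word


def packed_add(x, y, lane_bits, saturate=False):
    """Add each lane independently; carries never cross into the next lane.
    Wraps around by default, or clamps to the lane maximum (unsigned saturation)."""
    limit = (1 << lane_bits) - 1
    lanes = []
    for a, b in zip(unpack(x, lane_bits), unpack(y, lane_bits)):
        s = a + b
        lanes.append(min(s, limit) if saturate else s & limit)
    return pack(lanes, lane_bits)


pixels = pack([250, 10, 128, 0, 1, 2, 3, 4], 8)  # eight 8-bit pixels
brighten = pack([10] * 8, 8)
print(unpack(packed_add(pixels, brighten, 8), 8))
# [4, 20, 138, 10, 11, 12, 13, 14]    (250 + 10 wrapped around)
print(unpack(packed_add(pixels, brighten, 8, saturate=True), 8))
# [255, 20, 138, 10, 11, 12, 13, 14]  (250 + 10 clamped to 255)
```

Saturation is what image and audio code usually wants: a brightened white pixel should stay white rather than wrap around to nearly black.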
12
GPU
The graphics processing unit is the most common vector processor
The pixel manipulation done by a GPU is an ideal SIMD environment
Shading, for example, is easily done in parallel, since each pixel can be computed independently
Let's consider one GPU: the ATI Radeon HD 4870
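Shading parallelizes well because the work for each pixel reads only that pixel. A minimal Python sketch of this "kernel per pixel" structure follows; `shade` and `run_kernel` are hypothetical names, and scaling RGB values by a light level stands in for a real shader.

```python
def shade(pixel, light):
    """Hypothetical per-pixel kernel: scale an (r, g, b) pixel by a light level.
    It reads only its own pixel, so every call is independent of the others."""
    r, g, b = pixel
    return tuple(min(255, int(c * light)) for c in (r, g, b))


def run_kernel(kernel, pixels, *args):
    # On a GPU each pixel would be handled by its own SIMD lane;
    # here the same kernel is simply applied to every pixel in turn.
    return [kernel(p, *args) for p in pixels]


image = [(100, 150, 200), (10, 20, 30), (250, 250, 250)]
result = run_kernel(shade, image, 1.5)
print(result)  # [(150, 225, 255), (15, 30, 45), (255, 255, 255)]
```

Since no call depends on another, the order of evaluation does not matter, which is exactly the property SIMD hardware needs.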
13
Radeon HD 4870
There are 10 cores
  – Each is SIMD
Each core has 256 registers
Each of these registers is actually a vector register with 64 slots
Each slot holds four 4-byte floats (16 bytes)
Multiplied out: 10 × 256 × 64 × 16 bytes = 2.5 MB of registers
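Multiplying out the figures for the register file. Note that the 2.5 MB total only works out if each slot holds four 4-byte floats (16 bytes); with a single 4-byte float per slot the product is 640 KiB. The variable names are illustrative.

```python
cores = 10
registers_per_core = 256
slots_per_register = 64
bytes_per_slot = 16  # four 4-byte floats; 4 bytes per slot would give only 640 KiB

total_bytes = cores * registers_per_core * slots_per_register * bytes_per_slot
print(total_bytes, total_bytes / 2**20)  # 2621440 2.5
```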
14
Exploiting the GPU
There is substantial computing power sitting in the GPU
Unless the machine is rendering moving 3D displays (such as games) or playing video, most of this power is idle
A number of options are now available to use it for scientific computing
GPGPU: General Purpose computing on Graphics Processing Units
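GPGPU frameworks such as CUDA and OpenCL have the programmer write a kernel: the code for one thread, which works out which element it owns from its block and thread indices. The following is a pure-Python simulation of that style using SAXPY (y = a·x + y) as the workload. The names `saxpy_kernel` and `launch` are hypothetical, and this is not real CUDA or OpenCL API code.

```python
def saxpy_kernel(block_idx, thread_idx, block_dim, n, a, x, y):
    """Work done by ONE GPU thread: it handles exactly one element."""
    i = block_idx * block_dim + thread_idx  # global index of this thread
    if i < n:                               # spare threads in the last block do nothing
        y[i] = a * x[i] + y[i]


def launch(kernel, grid_dim, block_dim, *args):
    """Sequential stand-in for a GPU launch: a real GPU would run these
    grid_dim * block_dim invocations in parallel."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, thread_idx, block_dim, *args)


n = 10
x = [float(i) for i in range(n)]
y = [1.0] * n
block_dim = 4
grid_dim = (n + block_dim - 1) // block_dim  # 3 blocks of 4 cover 10 elements
launch(saxpy_kernel, grid_dim, block_dim, n, 2.0, x, y)
print(y)  # [1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0, 17.0, 19.0]
```

The bounds check matters because the number of threads is rounded up to whole blocks (12 threads for 10 elements here).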
15
Supercomputers
A number of groups have built clusters of GPUs to create supercomputers
Example: the Chinese Mole-8.5 (2011)
  – 2200 NVIDIA Tesla GPUs
  – Used to simulate an H1N1 influenza virus
16
Finally
Big scientific computers are a niche market
Supercomputers have been built from clusters of GPUs
  – This is likely the future of SIMD