Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 Computer Science, University of Warwick Architecture Classifications A taxonomy of parallel architectures: in 1972, Flynn categorised HPC architectures.

Similar presentations


Presentation on theme: "1 Computer Science, University of Warwick Architecture Classifications A taxonomy of parallel architectures: in 1972, Flynn categorised HPC architectures."— Presentation transcript:

1 1 Computer Science, University of Warwick Architecture Classifications A taxonomy of parallel architectures: in 1972, Flynn categorised HPC architectures into four classes, based on how many instruction and data streams can be observed in the architecture. They are:  SISD - Single Instruction, Single Data Operate sequentially on a single stream of instructions in single memory. Classic “Von Neumann” architecture. Machines may still consist of multiple processors, operating on independent data - these can be considered as multiple SISD systems.  SIMD - Single Instruction, Multiple Data A single instruction stream (broadcast to all PE*s), acting on multiple data. The most common form of this architecture class are Vector processors. These can deliver results several times faster than scalar processors. * PE = Processing Element

2 2 Computer Science, University of Warwick Architecture Classifications  MISD - Multiple instruction, Single data  There is a debate about whether the architecture with uniformly shared memory and separated cache is MISD or MIMD (MIMD is favoured)  No other practical implementations of this architecture.  MIMD - Multiple instruction, Multiple data  Independent instruction streams, acting on different (but related) data  Note the difference between multiple SISD and MIMD

3 3 Computer Science, University of Warwick Architecture Classifications MIMD: SMP, NUMA, MPP, Cluster SISD: Machine with a single scalar processor SIMD: Machine with vector processors

4 4 Computer Science, University of Warwick Architecture Classifications Shared memory (uniform memory access)  Processors share access to a common memory space. Implemented over a shared memory bus or communication network.  Memory locks required  Local cache is critical: If not, bus contention (or network traffic) reduces the systems efficiency. For this reason, pure shared memory systems do not scale (Scalability is the measure of how well the system performance improves linearly to the number of processing elements) Naturally, cache introduces problems of coherency (ensuring that stale cache lines are invalidated when other processors alter shared memory).  Support for critical sections are required Shared Memory Interconnect PE 0 PE n

5 5 Computer Science, University of Warwick Architecture Classifications Shared memory (Non- uniform memory access)  PE may be fetching from local or remote memory - hence non-uniform access times. NUMA cc-NUMA (cache-coherent Non-Uniform Memory Access)  Groups of processors are connected together by a fast interconnect (SMP)  These are then connected together by a high-speed interconnect.  Global address space. PE (m-1)n+1 PE m.n Shared Memory m Interconnect PE 1 PE n Shared Memory 1

6 6 Computer Science, University of Warwick Architecture Classifications Distributed Memory  Each processor has it’s own local memory.  When processors need to exchange (or share data), they must do this through an explicit communication Message passing (MPI language)  Typically larger latencies between PEs (especially if they communicate via over- network interconnections).  Scalability, however, is good if the problems can be sufficiently contained within PEs.  Typically, coarse-grained work units are distributed. Interconnect M0M0 MnMn PE 0 PE n

7 7 Computer Science, University of Warwick In-processor Parallelism Pipelines  Instruction pipelines Reduces the idle time of hardware components. Good performance with independent instructions.  Performing more operations per clock cycle.  Discrepancy between peak and actual performance often caused by pipeline effects Difficult to keep pipelines full. Branch prediction helps.

8 8 Computer Science, University of Warwick In-processor Parallelism Vector architectures  Fast I/O - powerful busses and interconnections.  Large memory bandwidth and low latency access.  No cache because of above.  Perform operations involving large matrices, commonly encountered in engineering areas

9 9 Computer Science, University of Warwick In-processor Parallelism Commodity processors increasingly provide performance as good as dedicated Vector processors  Price/performance is also far better.  Commodity processors now offer good performance for vectorizable code.  Explicit support for vectorization with SIMD instructions on COTS processors Altivec on PowerPC SSE (Streaming SIMD Extension) on x86

10 10 Computer Science, University of Warwick Multiprocessor Parallelism Use multiple processors on the same program:  Divide workload up between processors.  Often achieved by dividing up a data structure.  Each processor works on it’s own data.  Typically processors need to communicate. Shared or distributed memory is one approach Explicit messaging is increasingly common.  Load balancing is critical for maintaining good performance.

11 11 Computer Science, University of Warwick Multiprocessor Parallelism CPU Mem CPU Mem CPU Single ProcessorSymmetric Multiprocessor with Shared Memory MPP System CPU Mem CPU Mem CPU Mem Net

12 12 Computer Science, University of Warwick Clusters Built using COTS components.  Brought about by improved processor speed as well as networking and switching technology.  Mass-produced commodity off-the-shelf (COTS) hardware, rather than expensive proprietary hardware built solely for supercomputers.

13 13 Computer Science, University of Warwick Clusters Clusters are simpler to manage:  Single image, single identity  Often run familiar operating systems. Linux is probably the most popular Commodity compilers and support  Node for node swap-out on failure.  Can run multi-processor parallel tasks.  Or run sequential tasks for multiple users (job-level parallelism).

14 14 Computer Science, University of Warwick Clusters Clustering of SMPs  Attractive method of achieving high performance.  SMPs reduce the network overhead

15 15 Computer Science, University of Warwick Parallel Efficiency Main issues that effect parallel efficiency are:  Ratio of computation to communication Higher computation usually yields better performance.  Communication bandwidth & latency Latency has the biggest impact.  Scalability How does the bandwidth & latency scale with the number of processors.

16 16 Computer Science, University of Warwick Dependency and Parallelism  Granularity of parallelism: the size of the computations that are being performed in parallel  Four types of parallelism (in order of granularity size)  Instruction-level parallelism (e.g. pipeline)  Thread-level parallelism (e.g. run a multi-thread java program)  Process-level parallelism (e.g. run an MPI job in a cluster)  Job-level parallelism (e.g. run a batch of independent single- processor jobs in a cluster)

17 17 Computer Science, University of Warwick Dependency and Parallelism  Dependency: If event A must occur before event B, then B is dependent on A  Two types of Dependency  Control dependency: waiting for the instruction which controls the execution flow to be completed IF (X!=0) Then Y=1.0/X: Y has the control dependency on X!=0  Data dependency: dependency because of calculations or memory access Flow dependency: A=X+Y; B=A+C; Anti-dependency: B=A+C; A=X+Y; Output dependency: A=2; X=A+1; A=5;

18 18 Computer Science, University of Warwick Identifying Dependency  Draw a Directed Acyclic Graph (DAG) to identify the dependency among a sequence of instructions  Anti-dependency: a variable appears as a parent in a calculation and then as a child in a later calculation  Output dependency: a variable appears as a child in a calculation and then as a child again in a later calculation


Download ppt "1 Computer Science, University of Warwick Architecture Classifications A taxonomy of parallel architectures: in 1972, Flynn categorised HPC architectures."

Similar presentations


Ads by Google