Department of Computer Science University of the West Indies
How did we learn to fly? By constructing a machine that flaps its wings like a bird? Answer: by applying the aerodynamic principles that nature demonstrates. Likewise, we model parallel processing on that of biological species.
Motivating Factors
1. The aggregate speed with which neurons carry out complex calculations is high.
2. The response of an individual neuron is slow (measured in ms).
This demonstrates the feasibility of parallel processing.
Computing Components
[Figure: a multi-processor computing system. Labels: hardware (processors, marked P), operating system (microkernel), threads interface, programming paradigms, applications. Legend: process, processor, thread.]
Processing Elements
Simple classification by Flynn (by number of instruction and data streams):
- SISD: conventional
- SIMD: data parallel, vector computing
- MISD
- MIMD: very general, multiple approaches
The current focus is on the MIMD model, using general-purpose processors.
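The classification above depends only on whether there is one stream or several of each kind, which can be sketched as a small lookup. The function name `flynn_class` is hypothetical and is used here purely for illustration.

```python
def flynn_class(instruction_streams: int, data_streams: int) -> str:
    """Return the Flynn category for the given stream counts.

    Illustrative helper (not from the slides): "S" means a single
    stream, "M" means more than one.
    """
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("stream counts must be at least 1")
    instr = "S" if instruction_streams == 1 else "M"
    data = "S" if data_streams == 1 else "M"
    return f"{instr}I{data}D"


if __name__ == "__main__":
    # One instruction stream applied to eight data streams.
    print(flynn_class(1, 8))
    # Four independent instruction streams, each with its own data.
    print(flynn_class(4, 4))
```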
SISD: A Conventional Computer
Speed is limited by the rate at which the computer can transfer information internally.
[Figure: a single processor driven by one instruction stream, with one data input and one data output.]
Examples: PC, Macintosh, workstations
The MISD Architecture
More of an intellectual exercise than a practical configuration. A few have been built, but none is commercially available.
[Figure: processors A, B and C, each driven by its own instruction stream (A, B, C), sharing one data input stream and one data output stream.]
SIMD Architecture
Examples: Cray vector processing machines, Thinking Machines CM, Intel MMX (multimedia support)
[Figure: a single instruction stream drives processors A, B and C; each processor has its own data input stream and data output stream.]
MIMD Architecture
Unlike SIMD, a MIMD computer works asynchronously. There are two variants:
- Shared memory (tightly coupled) MIMD
- Distributed memory (loosely coupled) MIMD
[Figure: processors A, B and C, each with its own instruction stream, data input stream and data output stream.]
Shared Memory MIMD Machine
- Communication: the source processing element (PE) writes data to global memory (GM) and the destination PE retrieves it.
- Easy to build; conventional SISD operating systems can easily be ported.
- Limitations: reliability and expandability. The failure of a memory component or of any processor affects the whole system, and adding processors leads to scalability problems.
- Examples: Silicon Graphics supercomputers....
[Figure: processors A, B and C, each connected through a memory bus to a single global memory system.]
SMM Examples
- Dual and quad Pentiums
- Power Mac G5s
  - Dual processor (2 GHz each)
Quad Pentium Shared Memory Multiprocessor
[Figure: four processors, each with an L1 cache, an L2 cache and a bus interface, attached to a common processor/memory bus. A memory controller connects the bus to the shared memory, and an I/O interface connects it to the I/O bus.]
- Any memory location is accessible by any of the processors.
- A single address space exists, meaning that each memory location is given a unique address within a single range of addresses.
- Shared memory programming is generally more convenient, although the programmer must control access to shared data.
- Inter-process communication is done in the memory interface through reads and writes.
- A virtual memory address maps to a real address.
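The point that the programmer must control access to shared data can be sketched with Python threads, which all see the same address space. This is a minimal illustration, not a prescribed method: the class `SharedCounter` and the function `run_workers` are hypothetical names, and a lock is only one of several possible control mechanisms.

```python
import threading


class SharedCounter:
    """A value in shared memory, guarded by a lock (illustrative only)."""

    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        # The read-modify-write below must not interleave with another
        # thread's update, so it runs while holding the lock.
        with self._lock:
            self.value += 1


def run_workers(num_threads: int, increments_each: int) -> int:
    """Start several threads that all update the same counter."""
    counter = SharedCounter()

    def work() -> None:
        for _ in range(increments_each):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


if __name__ == "__main__":
    print(run_workers(num_threads=4, increments_each=10_000))
```

Communication here happens purely through reads and writes of `counter.value`; no explicit send or receive appears anywhere in the program.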
Shared Memory Address Space
- Different processors may have memory locally attached to them.
- Different instances of memory access could take different amounts of time. Collisions are possible.
- UMA (uniform memory access, i.e., shared memory) vs. NUMA (non-uniform memory access, i.e., distributed shared memory)
Building Shared Memory Systems
Building SMM machines with more than 4 processors is very difficult and very expensive, e.g. the Sun Microsystems E10000 "Starfire" server:
- 64 processors
- Price: several million US dollars
Distributed Memory MIMD
- Communication: IPC over a high-speed network.
- The network can be configured as a tree, mesh, cube, etc.
- Unlike shared memory MIMD, it is readily expandable.
- Highly reliable (the failure of any one CPU does not affect the whole system).
[Figure: processors A, B and C, each connected through its own memory bus to its own memory system (A, B, C); the processors are linked to one another by IPC channels.]
Distributed Memory
- Decentralized memory (a memory module with each CPU)
- Lower memory latency
Drawbacks:
- Longer communication latency
- More complex software model
Decentralized Memory Versions
- Message passing "multi-computer" with a separate address space per processor
- Can invoke software with Remote Procedure Call (RPC)
- Often done via a library, such as MPI (Message Passing Interface)
- Also called "synchronous communication", since communication causes synchronization between 2 processes
Message Passing System
- Inter-process communication is done at the program level using sends and receives.
- Reads and writes refer only to a processor's local memory.
- Data can be packed into long messages before being sent, to compensate for latency.
- Global scheduling of messages can help avoid message collisions.
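The send/receive pattern described above can be sketched as follows. This is only a model of the pattern: threads and `queue.Queue` channels stand in for separate processors and a network, whereas a real multi-computer would typically use a library such as MPI. The names `worker` and `exchange` are hypothetical.

```python
import queue
import threading


def worker(inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Simulated remote processor: it touches only what it receives."""
    numbers = inbox.get()        # receive (blocks until a message arrives)
    local_total = sum(numbers)   # work on local data only
    outbox.put(local_total)      # send the result back


def exchange(numbers: list) -> int:
    """Send one packed message to a worker and wait for its reply."""
    to_worker = queue.Queue()
    from_worker = queue.Queue()
    t = threading.Thread(target=worker, args=(to_worker, from_worker))
    t.start()
    # All values travel in a single long message rather than one message
    # per value, so the per-message latency is paid only once.
    to_worker.put(list(numbers))  # send a copy; nothing is shared
    result = from_worker.get()    # receive
    t.join()
    return result


if __name__ == "__main__":
    print(exchange([1, 2, 3, 4]))
```

The blocking `get` calls also show why this style is described as synchronizing: the receiver cannot proceed until the sender has sent.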
MIMD Program Structure
- Multiple Program Multiple Data (MPMD): each processor has its own program to execute.
- Single Program Multiple Data (SPMD): a single source program is written, and each processor executes its own copy of the program.
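A minimal sketch of the SPMD idea, under stated assumptions: the function `program` plays the role of the single source program, and `rank` and `size` are hypothetical names for a processor's identifier and the processor count. The copies run one after another here purely for illustration; on a real MIMD machine each copy would execute on its own processor.

```python
def program(rank: int, size: int, data: list) -> int:
    """The single program; every processor runs its own copy of it.

    Each copy uses its rank to pick a different share of the data.
    """
    my_share = data[rank::size]  # elements rank, rank + size, ...
    return sum(my_share)


def run_spmd(size: int, data: list) -> int:
    """Run `size` copies of the same program and combine their results."""
    if size < 1:
        raise ValueError("size must be at least 1")
    partial_sums = [program(rank, size, data) for rank in range(size)]
    return sum(partial_sums)


if __name__ == "__main__":
    print(run_spmd(size=4, data=list(range(100))))
```

In an MPMD arrangement, by contrast, each rank would be given a different function to run rather than the same `program`.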
Speedup Factor

$$S(n) = \frac{\text{Execution time on a single processor}}{\text{Execution time on a multiprocessor with } n \text{ processors}}$$

S(n) gives the increase in speed obtained by using a multiprocessor. The speedup factor can also be cast in terms of computational steps:

$$S(n) = \frac{\text{Number of steps using one processor}}{\text{Number of parallel steps using } n \text{ processors}}$$

The maximum speedup with n processors is n (linear speedup); this theoretical limit is not always achieved.
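As a small worked illustration of the ratio, the sketch below computes S(n) from two measurements. The timings are made-up numbers chosen for the example, not results from any real machine, and the function name `speedup` is hypothetical.

```python
def speedup(single: float, multi: float) -> float:
    """S(n) = (time or steps on one processor) / (time or steps on n)."""
    if single <= 0 or multi <= 0:
        raise ValueError("both measurements must be positive")
    return single / multi


if __name__ == "__main__":
    # Hypothetical timings: 12 s on one processor, 3 s on n = 4 processors.
    n = 4
    s = speedup(12.0, 3.0)
    print(f"S({n}) = {s}")
    print("linear speedup" if s == n else "below the linear limit")
```

The same function applies to the step-count form, since both definitions are a single-processor quantity divided by the corresponding n-processor quantity.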
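Maximum Speedup - Amdahl's Law
[Figure: on one processor the run time t_s splits into a serial section f t_s and parallelizable sections (1-f) t_s. On n processors the parallelizable part shrinks to (1-f) t_s / n, giving a total time t_p.]

$$t_p = f\,t_s + \frac{(1-f)\,t_s}{n}, \qquad S(n) = \frac{t_s}{t_p} = \frac{n}{1 + (n-1)f}$$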
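To see how the serial part limits the speedup, the formula S(n) = n / (1 + (n-1) f) can be evaluated for growing n. In this sketch f is taken to be the fraction of the single-processor run time that is serial; the value f = 0.05 is an arbitrary assumption for the example, and `amdahl_speedup` is a hypothetical name.

```python
def amdahl_speedup(n: int, f: float) -> float:
    """Speedup on n processors when a fraction f of the work is serial."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie between 0 and 1")
    return n / (1 + (n - 1) * f)


if __name__ == "__main__":
    f = 0.05  # assumed serial fraction, for illustration only
    for n in (1, 4, 16, 256, 4096):
        print(f"n = {n:5d}  S(n) = {amdahl_speedup(n, f):.2f}")
    # As n grows, S(n) approaches 1 / f without ever reaching it.
    print(f"limit 1/f = {1 / f:.2f}")
```

With f = 0 the function returns n, the linear speedup; for any f > 0 the speedup stays below 1/f however many processors are added.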
Parallel Architectures
- Function-parallel architectures
  - Instruction level PAs
  - Thread level PAs
  - Process level PAs (MIMDs)
    - Distributed Memory MIMD
    - Shared Memory MIMD
- Data-parallel architectures