Embedded Systems MPSoC Architectures
Introduction to parallel programming
Alberto Bosio, bosio@lirmm.fr
Definitions
What is parallel programming?
Parallel computing is the simultaneous use of multiple compute resources to solve a computational problem.
Serial vs parallel
Serial computing: traditionally, software has been written for serial computation:
- To be run on a single CPU.
- A problem is broken into a discrete series of instructions.
- Instructions are executed one after another.
- Only one instruction may execute at any moment.
Serial vs parallel
Parallel computing: simultaneous use of multiple compute resources to solve a computational problem:
- To be run using multiple CPUs.
- A problem is broken into discrete parts that are solved concurrently.
- Each part is further broken down into a series of instructions.
- Instructions from each part execute simultaneously on different CPUs (see the sketch below).
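To make the contrast concrete, here is a minimal C sketch; it uses OpenMP, which is assumed here purely for illustration (these slides do not prescribe a programming model). The first loop is a single serial instruction stream; in the second loop the iterations are split into chunks that execute simultaneously on different CPUs, and the partial sums are combined at the end.

    #include <stdio.h>
    #include <omp.h>                     /* OpenMP: compile with -fopenmp */

    #define N 1000000

    int main(void)
    {
        static double a[N];
        double sum = 0.0;

        /* Serial part: one instruction after another on one CPU. */
        for (int i = 0; i < N; i++)
            a[i] = (double)i;

        /* Parallel part: the loop is broken into chunks, each chunk runs
           on a different thread/CPU, and the partial sums are combined. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++)
            sum += a[i];

        printf("sum = %f\n", sum);
        return 0;
    }

Without the pragma the second loop simply runs serially, which makes the serial and parallel versions of the same code easy to compare.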
Why parallel computing?
- Save time and/or money.
- Solve larger problems.
- Provide concurrency.
- Use non-local resources.
- Limits to serial computing, both physical and practical: transmission speeds, miniaturization, economic limitations.
Shared Memory
- Shared memory: all processors access all memory as a global address space.
- CPUs operate independently but share the memory resources.
- Changes in a memory location made by one processor are visible to all other processors.
- Two main classes, based upon memory access times: UMA and NUMA.
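As an illustration of the global address space, the following C sketch uses POSIX threads (an assumed API, chosen only for illustration): each thread writes to its own slice of one global array, and after the threads are joined the main thread sees all of the updates, because every thread addresses the same memory.

    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4
    #define N 16

    /* The array lives in the single global address space:
       every thread reads and writes the same memory. */
    static int data[N];

    static void *worker(void *arg)
    {
        int id = (int)(long)arg;
        /* Each thread fills its own slice of the shared array. */
        for (int i = id * (N / NTHREADS); i < (id + 1) * (N / NTHREADS); i++)
            data[i] = id;
        return NULL;
    }

    int main(void)
    {
        pthread_t t[NTHREADS];

        for (long id = 0; id < NTHREADS; id++)
            pthread_create(&t[id], NULL, worker, (void *)id);
        for (int id = 0; id < NTHREADS; id++)
            pthread_join(t[id], NULL);

        /* After the joins, the writes of every thread are visible here. */
        for (int i = 0; i < N; i++)
            printf("%d ", data[i]);
        printf("\n");
        return 0;
    }

Built with a command such as gcc -pthread, the program prints the slice written by each thread, showing that one address space is shared by all of them.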
Uniform Memory Access
Uniform Memory Access (UMA):
- Identical processors.
- Equal access and access times to memory.
- Cache coherent: if one processor updates a location in shared memory, all the other processors know about the update.
Non-Uniform Memory Access
Non-Uniform Memory Access (NUMA):
- Often built by physically linking two or more SMPs.
- One SMP can directly access the memory of another SMP.
- Not all processors have equal access time to all memories.
- Memory access across the link is slower.
Shared Memory
Advantages:
- The global address space is easy to program.
- Data sharing between tasks is fast and uniform due to the proximity of memory to the CPUs.
Disadvantages:
- Lack of scalability between memory and CPUs: adding more CPUs increases traffic on the shared memory path.
- Synchronization of accesses to shared data is the programmer's responsibility (see the sketch below).
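To show what that synchronization responsibility looks like in practice, here is a small C sketch (again using POSIX threads as an assumed API): several threads increment one shared counter, and a mutex serializes the updates. Without the lock the increment is a read-modify-write race and the final value would be unpredictable.

    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4
    #define ITERS    100000

    static long counter = 0;                              /* shared data */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *increment(void *arg)
    {
        (void)arg;
        for (int i = 0; i < ITERS; i++) {
            /* The lock makes the read-modify-write of counter atomic
               with respect to the other threads. */
            pthread_mutex_lock(&lock);
            counter++;
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t[NTHREADS];

        for (int i = 0; i < NTHREADS; i++)
            pthread_create(&t[i], NULL, increment, NULL);
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(t[i], NULL);

        printf("counter = %ld (expected %d)\n", counter, NTHREADS * ITERS);
        return 0;
    }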
Distributed Memory
- A communication network connects the inter-processor memories.
- Processors have their own local memory; there is no concept of a global address space.
- CPUs operate independently: changes to a processor's local memory have no effect on the memory of the other processors. Cache coherency does not apply.
- Data communication and synchronization between processors are the programmer's responsibility (see the message-passing sketch below).
- The connection used for data transfer varies (e.g. Ethernet).
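As a sketch of that explicit communication, the following C example uses MPI, a common message-passing library that is assumed here for illustration only: process 0 sends a value over the network into the local memory of process 1, since neither process can address the other's memory directly.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, value;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;
            /* Explicit message: the data is copied over the network
               into the local memory of process 1. */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("process 1 received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }

Launched with at least two processes (for example mpirun -np 2), process 1 prints the received value; without the explicit send/receive pair it would have no way to see data produced by process 0.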