Lecture 1 – Parallel Programming Primer

1 Lecture 1 – Parallel Programming Primer
CPE 458 – Parallel Programming, Spring 2009. Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5 License.

2 Outline
Introduction to Parallel Computing
Parallel Architectures
Parallel Algorithms
Common Parallel Programming Models: Message-Passing, Shared Address Space

3 Introduction to Parallel Computing
Moore’s Law: the number of transistors that can be placed inexpensively on an integrated circuit will double approximately every 18 months.
A self-fulfilling prophecy: a goal for computer architects and an assumption for software developers.

4 Introduction to Parallel Computing
Impediments to Moore’s Law
Theoretical limit
What to do with all that die space?
Design complexity: how do you meet the expected performance increase?

5 Introduction to Parallel Computing
Parallelism: continue to increase performance via parallelism.

6 Introduction to Parallel Computing
From a software point of view, we need to solve demanding problems:
Engineering simulations
Scientific applications
Commercial applications
These need the performance and resource gains afforded by parallelism.

7 Introduction to Parallel Computing
Engineering simulations: aerodynamics, engine efficiency

8 Introduction to Parallel Computing
Scientific applications: bioinformatics, thermonuclear processes, weather modeling

9 Introduction to Parallel Computing
Commercial applications: financial transaction processing, data mining, web indexing

10 Introduction to Parallel Computing
Unfortunately, parallelism greatly increases coding complexity:
Coordinating concurrent tasks
Parallelizing algorithms
Lack of standard environments and support

11 Introduction to Parallel Computing
The challenge: provide the abstractions, programming paradigms, and algorithms needed to effectively design, implement, and maintain applications that exploit the parallelism provided by the underlying hardware in order to solve modern problems.

12 Parallel Architectures
Standard sequential architecture: a single CPU connected to RAM over a bus. The bus is the bottleneck.

13 Parallel Architectures
Use multiple datapaths, memory units, and processing units.

14 Parallel Architectures
SIMD: single instruction stream, multiple data streams.
[Diagram: one control unit driving multiple processing units over an interconnect]

15 Parallel Architectures
SIMD
Advantages: performs vector/matrix operations well (e.g., Intel’s MMX extensions).
Disadvantages: too dependent on the type of computation (e.g., graphics); performance/resource utilization suffers if computations aren’t “embarrassingly parallel”.
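To make the SIMD idea concrete, here is a minimal sketch (not from the original slides) using x86 SSE2 intrinsics, where a single instruction adds four 32-bit integers at once; the array names and sizes are made up for the example.

```c
#include <emmintrin.h>  /* SSE2 intrinsics: one instruction operates on 4 ints */
#include <stdio.h>

int main(void)
{
    int a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    int b[8] = {10, 20, 30, 40, 50, 60, 70, 80};
    int c[8];

    /* Each iteration issues a single SIMD add that processes 4 elements. */
    for (int i = 0; i < 8; i += 4) {
        __m128i va = _mm_loadu_si128((__m128i *)&a[i]);
        __m128i vb = _mm_loadu_si128((__m128i *)&b[i]);
        __m128i vc = _mm_add_epi32(va, vb);   /* one instruction, four data */
        _mm_storeu_si128((__m128i *)&c[i], vc);
    }

    for (int i = 0; i < 8; i++)
        printf("%d ", c[i]);
    printf("\n");
    return 0;
}
```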

16 Parallel Architectures
MIMD: multiple instruction streams, multiple data streams.
[Diagram: multiple processing/control units connected by an interconnect]

17 Parallel Architectures
MIMD
Advantages: can be built with off-the-shelf components; better suited to irregular data access patterns.
Disadvantages: requires more hardware (control units are not shared); the program/OS must be stored at each processor.
Ex: typical commodity SMP machines we see today.

18 Parallel Architectures
Task communication
Shared address space: use common memory to exchange data; communication and replication are implicit.
Message passing: use send()/receive() primitives to exchange data; communication and replication are explicit.

19 Parallel Architectures
Shared address space
Uniform memory access (UMA): access to a memory location is independent of which processing unit makes the request.
Non-uniform memory access (NUMA): access to a memory location depends on the location of the processing unit relative to the memory accessed.

20 Parallel Architectures
Message passing
Each processing unit has its own private memory; messages are exchanged to pass data.
APIs: Message Passing Interface (MPI), Parallel Virtual Machine (PVM)

21 Parallel Algorithms
Algorithm: a finite sequence of instructions, often used for calculation and data processing.
Parallel algorithm: an algorithm that can be executed a piece at a time on many different processing devices, with the pieces put back together at the end to get the correct result.

22 Parallel Algorithms Challenges
Identifying work that can be done concurrently
Mapping work to processing units
Distributing the work
Managing access to shared data
Synchronizing various stages of execution

23 Parallel Algorithms Models
A way to structure a parallel algorithm by selecting decomposition and mapping techniques in a manner that minimizes interactions.

24 Parallel Algorithms Models
Data-parallel
Task graph
Work pool
Master-slave
Pipeline
Hybrid

25 Parallel Algorithms Data-parallel
Mapping of work: static; tasks -> processes.
Mapping of data: independent data items assigned to processes (data parallelism).

26 Parallel Algorithms Data-parallel
Computation: tasks process data, synchronize to get new data or exchange results, and continue until all data is processed.
Load balancing: uniform partitioning of data.
Synchronization: minimal, or a barrier at the end of a phase.
Ex: ray tracing
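A minimal data-parallel sketch in C with POSIX threads (illustrative only, not code from the lecture): the data is partitioned uniformly and statically across threads, each thread processes its own independent slice, and joining the threads serves as the barrier at the end of the phase. Names such as NTHREADS and scale, and the doubling operation, are invented for the example.

```c
#include <pthread.h>
#include <stdio.h>

#define N        1000
#define NTHREADS 4

static double data[N];

struct slice { int begin, end; };          /* contiguous chunk owned by one thread */

static void *scale(void *arg)              /* each task processes only its own data */
{
    struct slice *s = arg;
    for (int i = s->begin; i < s->end; i++)
        data[i] = data[i] * 2.0;
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];
    struct slice s[NTHREADS];

    for (int i = 0; i < N; i++)
        data[i] = i;

    /* Static, uniform partitioning of the data across threads. */
    for (int t = 0; t < NTHREADS; t++) {
        s[t].begin = t * N / NTHREADS;
        s[t].end   = (t + 1) * N / NTHREADS;
        pthread_create(&tid[t], NULL, scale, &s[t]);
    }

    /* Joining all threads acts as the barrier at the end of the phase. */
    for (int t = 0; t < NTHREADS; t++)
        pthread_join(tid[t], NULL);

    printf("data[N-1] = %f\n", data[N - 1]);
    return 0;
}
```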

27 Parallel Algorithms Data-parallel
[Diagram: data-parallel model with independent data items (D) mapped to processes (P), each producing output (O)]

28 Parallel Algorithms Task graph
Mapping of work: static; tasks are mapped to nodes in a task dependency graph (task parallelism).
Mapping of data: data moves through the graph (source to sink).

29 Parallel Algorithms Task graph
Computation: each node processes input from previous node(s) and sends output to next node(s) in the graph.
Load balancing: assign more processes to a given task; eliminate graph bottlenecks.
Synchronization: node data exchange.
Ex: parallel quicksort, divide-and-conquer approaches
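As an illustration of the divide-and-conquer style named above (again a sketch, not lecture code), the example below sums an array by splitting it in two: the left half is a child node executed on its own thread, the right half runs on the calling thread, and joining the child is where the node data is exchanged and combined. The function names and the spawn-per-split policy are assumptions for the example; a real implementation would cap how deep it spawns threads.

```c
#include <pthread.h>
#include <stdio.h>

struct range { const int *a; int n; long sum; };

static void *sum_task(void *arg);

/* Divide and conquer: split, run the left half in a new thread,
 * do the right half locally, then combine the partial results. */
static long sum_range(const int *a, int n)
{
    if (n < 4) {                       /* small problem: solve sequentially */
        long s = 0;
        for (int i = 0; i < n; i++)
            s += a[i];
        return s;
    }
    struct range left = { a, n / 2, 0 };
    pthread_t tid;
    pthread_create(&tid, NULL, sum_task, &left);   /* left child node */
    long right = sum_range(a + n / 2, n - n / 2);  /* right child, this thread */
    pthread_join(tid, NULL);                       /* sync: exchange node data */
    return left.sum + right;
}

static void *sum_task(void *arg)
{
    struct range *r = arg;
    r->sum = sum_range(r->a, r->n);
    return NULL;
}

int main(void)
{
    int a[16];
    for (int i = 0; i < 16; i++)
        a[i] = i + 1;
    printf("sum = %ld\n", sum_range(a, 16));   /* expect 136 */
    return 0;
}
```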

30 Parallel Algorithms Task graph
[Diagram: task dependency graph with data (D) flowing between processes (P) toward the output (O)]

31 Parallel Algorithms Work pool
Mapping of work/data: no desired pre-mapping; any task may be performed by any process.
Computation: processes work as data becomes available (or requests arrive).

32 Parallel Algorithms Work pool
Load balancing: dynamic mapping of tasks to processes.
Synchronization: adding/removing work from the input queue.
Ex: web server
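A minimal work pool sketch with POSIX threads (illustrative, not from the lecture): tasks sit in a shared input queue protected by a mutex, and each worker dynamically pulls the next task whenever it becomes free. The task itself (squaring an integer) and names like NWORKERS are invented.

```c
#include <pthread.h>
#include <stdio.h>

#define NTASKS   20
#define NWORKERS 3

static int  tasks[NTASKS];              /* input "queue": tasks[next_task..NTASKS-1] */
static int  results[NTASKS];            /* output queue, indexed by task id          */
static int  next_task = 0;              /* head of the input queue                   */
static pthread_mutex_t qlock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        /* Synchronization is only needed when taking work off the queue. */
        pthread_mutex_lock(&qlock);
        int id = next_task < NTASKS ? next_task++ : -1;
        pthread_mutex_unlock(&qlock);
        if (id < 0)
            return NULL;                /* pool is empty: worker retires */

        results[id] = tasks[id] * tasks[id];   /* do the actual work */
    }
}

int main(void)
{
    pthread_t tid[NWORKERS];

    for (int i = 0; i < NTASKS; i++)
        tasks[i] = i;

    for (int w = 0; w < NWORKERS; w++)
        pthread_create(&tid[w], NULL, worker, NULL);
    for (int w = 0; w < NWORKERS; w++)
        pthread_join(tid[w], NULL);

    printf("results[19] = %d\n", results[19]);   /* expect 361 */
    return 0;
}
```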

33 Parallel Algorithms Work pool
[Diagram: a work pool with an input queue feeding processes (P) and an output queue collecting their results]

34 Parallel Algorithms Master-slave
A modification of the work pool model: one or more master processes generate and assign work to worker processes.
Load balancing: a master process can better distribute load to the worker processes.

35 Parallel Algorithms Pipeline
Mapping of work: static; processes are assigned tasks that correspond to stages in the pipeline.
Mapping of data: data is processed in FIFO order (stream parallelism).

36 Parallel Algorithms Pipeline
Computation: data is passed through a succession of processes, each of which performs some task on it.
Load balancing: ensure all stages of the pipeline are balanced (contain the same amount of work).
Synchronization: producer/consumer buffers between stages.
Ex: processor pipeline, graphics pipeline
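A sketch of a two-stage pipeline with a producer/consumer buffer between the stages (illustrative; the stage work, buffer size, and names are invented): stage 1 fills a bounded buffer, stage 2 drains it in FIFO order, and a mutex with two condition variables provides the synchronization between stages.

```c
#include <pthread.h>
#include <stdio.h>

#define NITEMS 10
#define CAP     4                       /* capacity of the inter-stage buffer */

static int buf[CAP];
static int count = 0, head = 0, tail = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  not_full  = PTHREAD_COND_INITIALIZER;
static pthread_cond_t  not_empty = PTHREAD_COND_INITIALIZER;

static void *stage1(void *arg)          /* producer: generates the data stream */
{
    (void)arg;
    for (int i = 0; i < NITEMS; i++) {
        pthread_mutex_lock(&lock);
        while (count == CAP)            /* wait if the next stage is behind */
            pthread_cond_wait(&not_full, &lock);
        buf[tail] = i * i;              /* stage 1's "work" on the item */
        tail = (tail + 1) % CAP;
        count++;
        pthread_cond_signal(&not_empty);
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

static void *stage2(void *arg)          /* consumer: processes items in FIFO order */
{
    (void)arg;
    for (int i = 0; i < NITEMS; i++) {
        pthread_mutex_lock(&lock);
        while (count == 0)              /* wait until stage 1 produces something */
            pthread_cond_wait(&not_empty, &lock);
        int item = buf[head];
        head = (head + 1) % CAP;
        count--;
        pthread_cond_signal(&not_full);
        pthread_mutex_unlock(&lock);
        printf("stage2 got %d\n", item);
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, stage1, NULL);
    pthread_create(&t2, NULL, stage2, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```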

37 Parallel Algorithms Pipeline
[Diagram: a pipeline of processes (P) connected by buffers, from an input queue to an output queue]

38 Common Parallel Programming Models
Message-Passing
Shared Address Space

39 Common Parallel Programming Models
Message-Passing
Most widely used for programming parallel computers (clusters of workstations).
Key attributes: partitioned address space; explicit parallelization.
Process interactions: send and receive data.

40 Common Parallel Programming Models
Message-Passing
Communications: sending and receiving messages.
Primitives: send(buff, size, destination), receive(buff, size, source)
Blocking vs. non-blocking; buffered vs. non-buffered.
Message Passing Interface (MPI): a popular message-passing library, ~125 functions.
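A minimal MPI sketch of blocking send/receive between two ranks (assuming a working MPI installation; launch with something like mpirun -np 2): rank 0 sends a 1024-element buffer and rank 1 receives it.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, buff[1024] = {0};

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        buff[0] = 42;
        /* Blocking send of 1024 ints to rank 1, message tag 0. */
        MPI_Send(buff, 1024, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* Blocking receive of 1024 ints from rank 0. */
        MPI_Recv(buff, 1024, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received buff[0] = %d\n", buff[0]);
    }

    MPI_Finalize();
    return 0;
}
```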

41 Common Parallel Programming Models
Message-Passing
[Diagram: four workstations running processes P1-P4; P1 calls send(buff1, 1024, p3) and P3 calls receive(buff3, 1024, p1), moving the data from P1 to P3]

42 Common Parallel Programming Models
Shared Address Space
Mostly used for programming SMP machines (multicore chips).
Key attributes: shared address space (threads, or UNIX shmget/shmat operations); implicit parallelization.
Process/thread communication: memory reads/stores.

43 Common Parallel Programming Models
Shared Address Space
Communication: read/write memory (e.g., x++;).
POSIX Threads (Pthreads): a popular thread API.
Operations: creation/deletion of threads; synchronization (mutexes, semaphores); thread management.
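A minimal Pthreads sketch of shared-address-space communication (illustrative; the shared counter and thread count are invented): threads communicate simply by reading and writing the same variable, and a mutex makes the x++ update safe.

```c
#include <pthread.h>
#include <stdio.h>

static int x = 0;                            /* shared: all threads see the same x */
static pthread_mutex_t x_lock = PTHREAD_MUTEX_INITIALIZER;

static void *increment(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&x_lock);         /* without the mutex, x++ is a data race */
        x++;
        pthread_mutex_unlock(&x_lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t t[4];

    for (int i = 0; i < 4; i++)
        pthread_create(&t[i], NULL, increment, NULL);
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);

    printf("x = %d\n", x);                   /* expect 400000 */
    return 0;
}
```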

44 Common Parallel Programming Models
Shared Address Space
[Diagram: an SMP workstation in which threads T1-T4 share data in RAM]

45 Parallel Programming Pitfalls
Synchronization: deadlock, livelock, fairness.
Efficiency: maximize parallelism.
Reliability: correctness, debugging.

46 Prelude to MapReduce
MapReduce is a paradigm designed by Google for making a subset (albeit a large one) of distributed problems easier to code.
It automates data distribution and result aggregation.
It restricts the ways data can interact to eliminate locks (no shared state = no locks!).
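As a toy preview of that restriction (this is not Google's API or the real MapReduce framework, just a plain-C sketch with invented names): each map call works only on its own input split and writes only to its own private result, and the reduce step aggregates those independent results, so no locks are needed.

```c
#include <stdio.h>
#include <string.h>

#define NSPLITS 3

/* "Map": count vowels in one split, writing only to that split's private slot. */
static void map_split(const char *split, int *private_count)
{
    *private_count = 0;
    for (const char *p = split; *p; p++)
        if (strchr("aeiou", *p))
            (*private_count)++;
}

/* "Reduce": aggregate the independent per-split results. */
static int reduce_counts(const int counts[], int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += counts[i];
    return total;
}

int main(void)
{
    const char *splits[NSPLITS] = { "parallel", "programming", "primer" };
    int counts[NSPLITS];

    /* Each map call is independent: no shared state, so they could run anywhere. */
    for (int i = 0; i < NSPLITS; i++)
        map_split(splits[i], &counts[i]);

    printf("total vowels = %d\n", reduce_counts(counts, NSPLITS));
    return 0;
}
```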

47 Prelude to MapReduce Next time…
MapReduce parallel programming paradigm

48 References
Introduction to Parallel Computing, Grama et al., Pearson Education, 2003.
Distributed Computing: Principles and Applications, M. L. Liu, Pearson Education, 2004.

