Slide 1: Parallelization at a Glance
Christian Terboven, 20.11.2012 / Aachen, Germany
Rechen- und Kommunikationszentrum (RZ), Version 2.3 (as of 19.11.2012)
Slide 2: Agenda
- Basic Concepts
- Parallelization Strategies
- Possible Issues
- Efficient Parallelization
Slide 3: Basic Concepts
Slide 4: Processes and Threads
- Process: an instance of a computer program
- Thread: a sequence of instructions (with its own program counter)
- Parallel / concurrent execution: multiple processes, multiple threads
- Per process: address space, files, environment, …
- Per thread: program counter, stack, …
[Figure: a process on a computer containing several threads, each with its own program counter]
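The slides show no code at this point; as a minimal sketch (assuming C with OpenMP, which the presentation does not prescribe here), several threads inside one process share the address space while each keeps its own stack:

```c
#include <stdio.h>
#include <omp.h>

int shared_counter = 0;   /* global: lives in the process's shared address space */

int main(void)
{
    #pragma omp parallel num_threads(4)
    {
        int my_id = omp_get_thread_num();   /* local: on each thread's private stack */

        #pragma omp atomic
        shared_counter++;                   /* all threads update the same variable */

        printf("thread %d incremented the shared counter\n", my_id);
    }
    printf("final value: %d\n", shared_counter);   /* 4: one increment per thread */
    return 0;
}
```

Compiled with OpenMP support (e.g. gcc -fopenmp), all four threads see and update the same global variable, while my_id exists once per thread on its private stack.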
Slide 5: Process and Thread Scheduling by the OS
- Even on a multi-socket multi-core system you should not make any assumptions about which process / thread is executed when and where!
- Two threads on one core vs. two threads on two cores
[Figure: Thread 1 and Thread 2 scheduled over time, with system threads, thread migration, and "pinned" threads]
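To make this observable, a small sketch (not part of the slides; Linux/glibc-specific because of sched_getcpu()) prints where each thread happens to run. Repeated runs may report different cores unless the threads are pinned:

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sched.h>   /* sched_getcpu(), glibc/Linux only */
#include <omp.h>

int main(void)
{
    #pragma omp parallel
    {
        /* The OS decides which core a thread runs on and may migrate it
           at any time, unless the threads are pinned (e.g. by setting
           OMP_PROC_BIND=true before starting the program). */
        printf("thread %d currently runs on core %d\n",
               omp_get_thread_num(), sched_getcpu());
    }
    return 0;
}
```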
Slide 6: Shared Memory Parallelization
- Memory can be accessed by several threads running on different cores in a multi-socket multi-core system.
[Figure: CPU 1 executes a=4 while CPU 2 executes c=3+a, both accessing the shared variable a]
Slide 7: Parallelization Strategies
Slide 8: Finding Concurrency
Chances for concurrent execution (see the sketch below):
- Look for tasks that can be executed simultaneously (task parallelism)
- Decompose data into distinct chunks to be processed independently (data parallelism)
Pattern overview:
- Organize by task: task parallelism, divide and conquer
- Organize by data decomposition: geometric decomposition, recursive data
- Organize by flow of data: pipeline, event-based coordination
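As a hedged illustration of the two kinds of concurrency named above (C with OpenMP is assumed; the slide itself names no programming model here):

```c
#include <stdio.h>
#include <omp.h>

#define N 1000000

static double a[N], b[N];

int main(void)
{
    /* Data parallelism: the iteration space is decomposed into chunks
       and each thread processes its chunk independently. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        a[i] = 2.0 * i;

    /* Task parallelism: two independent pieces of work execute simultaneously. */
    #pragma omp parallel sections
    {
        #pragma omp section
        {
            for (int i = 0; i < N; i++)      /* task 1: derive b from a */
                b[i] = a[i] + 1.0;
        }
        #pragma omp section
        {
            printf("task 2 runs concurrently with task 1\n");
        }
    }

    printf("b[%d] = %f\n", N - 1, b[N - 1]);
    return 0;
}
```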
Slide 9: Divide and Conquer
[Figure: a problem is split into subproblems, the subproblems are solved, and the subsolutions are merged into the overall solution]
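A minimal sketch of the split / solve / merge structure, assuming OpenMP tasks for the parallel recursion (the concrete example and cutoff value are illustrative, not from the slide):

```c
#include <stdio.h>
#include <omp.h>

/* Recursively sum an array: split, solve the halves (possibly in parallel),
   merge the partial results. */
static long sum(const int *a, int lo, int hi)
{
    if (hi - lo < 10000) {               /* small enough: solve directly */
        long s = 0;
        for (int i = lo; i < hi; i++) s += a[i];
        return s;
    }
    int mid = lo + (hi - lo) / 2;        /* split */
    long left, right;
    #pragma omp task shared(left)
    left = sum(a, lo, mid);              /* solve subproblem 1 (as a task) */
    right = sum(a, mid, hi);             /* solve subproblem 2 */
    #pragma omp taskwait
    return left + right;                 /* merge */
}

int main(void)
{
    enum { N = 1000000 };
    static int a[N];
    for (int i = 0; i < N; i++) a[i] = 1;

    long total;
    #pragma omp parallel
    #pragma omp single
    total = sum(a, 0, N);

    printf("total = %ld\n", total);      /* N */
    return 0;
}
```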
Slide 10: Geometric Decomposition
- Example: ventricular assist device (VAD)
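The VAD mesh itself cannot be reproduced here; as a stand-in, a minimal sketch of geometric decomposition on a regular 2D grid (OpenMP assumed), where each thread updates a contiguous block of rows:

```c
#include <stdio.h>
#include <omp.h>

#define NX 512
#define NY 512

static double u[NX][NY], unew[NX][NY];

int main(void)
{
    /* Simple initialization: a "hot" boundary on one side. */
    for (int j = 0; j < NY; j++)
        u[0][j] = 1.0;

    /* Geometric decomposition: the grid is split into contiguous blocks of
       rows, one block per thread; each block is updated independently
       (Jacobi-style stencil sweep). */
    #pragma omp parallel for
    for (int i = 1; i < NX - 1; i++)
        for (int j = 1; j < NY - 1; j++)
            unew[i][j] = 0.25 * (u[i-1][j] + u[i+1][j] +
                                 u[i][j-1] + u[i][j+1]);

    printf("sample value: %f\n", unew[1][NY / 2]);   /* 0.25 */
    return 0;
}
```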
Slide 11: Pipeline
- Analogy: assembly line
- Assign different stages to different PEs
[Figure: work items C1 to C5 pass through stages 1 to 4 over time, so different stages process different items concurrently]
Slide 12: Further Parallel Design Patterns
- Recursive data pattern: parallelize operations on recursive data structures (see example later on: Fibonacci; a sketch is also given below)
- Event-based coordination pattern: decomposition into semi-independent tasks interacting in an irregular fashion (see example later on: while-loop with tasks)
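The Fibonacci example referenced above appears later in the original course and is not part of this excerpt; a commonly used sketch of that pattern with OpenMP tasks looks roughly like this:

```c
#include <stdio.h>
#include <omp.h>

/* Classic recursive-data / task example: each call spawns a task for one
   subtree and recurses into the other. (Illustrative only: the naive
   algorithm does exponential work.) */
static long fib(int n)
{
    if (n < 2) return n;
    long x, y;
    #pragma omp task shared(x)
    x = fib(n - 1);
    y = fib(n - 2);
    #pragma omp taskwait
    return x + y;
}

int main(void)
{
    long result;
    #pragma omp parallel
    #pragma omp single
    result = fib(30);
    printf("fib(30) = %ld\n", result);   /* 832040 */
    return 0;
}
```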
Slide 13: Possible Issues
Slide 14: Data Races / Race Conditions
- Data race: concurrent access to the same memory location by multiple threads without proper synchronization
- Example: let variable x have the initial value 1; one thread executes x=5; while another executes printf(x);
- Depending on which thread is faster, you will see either 1 or 5
- The result is nondeterministic (i.e. it depends on OS scheduling)
- Data races (and how to detect and avoid them) will be covered in more detail later!
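The slide's x=5; / printf(x); snippet is pseudocode; as a concrete, hedged sketch (C with OpenMP assumed), here is a data race on a shared counter and one way to avoid it:

```c
#include <stdio.h>
#include <omp.h>

int main(void)
{
    enum { ITERS = 100000 };
    int racy = 0, safe = 0, nthreads = 1;

    #pragma omp parallel
    {
        #pragma omp single
        nthreads = omp_get_num_threads();

        for (int i = 0; i < ITERS; i++) {
            racy++;                  /* data race: unsynchronized concurrent updates */
            #pragma omp atomic
            safe++;                  /* properly synchronized update */
        }
    }

    printf("racy = %d (nondeterministic, may be less than %d)\n",
           racy, nthreads * ITERS);
    printf("safe = %d (always %d)\n", safe, nthreads * ITERS);
    return 0;
}
```

With more than one thread, the racy counter typically loses updates and its final value changes from run to run, which is exactly the nondeterminism the slide describes.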
Slide 15: Serialization
- Serialization: when threads / processes wait "too much"
- Limited scalability, if any at all
- Simple (and stupid) example:
[Figure: repeated Send / Recv / Data Transfer phases between two processes, where one calculates while the other waits]
Slide 16: Efficient Parallelization
Slide 17: Parallelization Overhead
Overhead introduced by the parallelization:
- Time to start / end / manage threads
- Time to send / exchange data
- Time spent in synchronization of threads / processes
With parallelization:
- the total CPU time increases,
- the wall time decreases,
- the system time stays the same.
Efficient parallelization is about minimizing the overhead introduced by the parallelization itself!
Slide 18: Load Balancing
- Perfect load balancing: all threads / processes finish at the same time
- Load imbalance: some threads / processes take longer than others
- But: all threads / processes have to wait for the slowest thread / process, which thus limits the scalability
[Figure: per-thread execution time for the perfectly balanced and the imbalanced case]
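One common way to mitigate load imbalance, sketched here under the assumption of OpenMP worksharing (not taken from the slides), is dynamic scheduling: iterations with uneven cost are handed out in small chunks so that threads which finish early pick up more work:

```c
#include <stdio.h>
#include <omp.h>

/* Simulated work: iteration i costs roughly i units, so a plain static
   partition would give the last thread far more work than the first. */
static double work(int i)
{
    double s = 0.0;
    for (int k = 0; k < i; k++) s += 1.0 / (k + 1.0);
    return s;
}

int main(void)
{
    enum { N = 20000 };
    double sum = 0.0;

    double t0 = omp_get_wtime();
    /* schedule(dynamic) hands out chunks of 64 iterations on demand,
       so all threads finish at roughly the same time. */
    #pragma omp parallel for schedule(dynamic, 64) reduction(+ : sum)
    for (int i = 0; i < N; i++)
        sum += work(i);
    double t1 = omp_get_wtime();

    printf("sum = %f, time = %.3f s\n", sum, t1 - t0);
    return 0;
}
```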
Slide 19: Speedup and Efficiency
- Time using 1 CPU: T(1)
- Time using p CPUs: T(p)
- Speedup S: S(p) = T(1) / T(p), which measures how much faster the parallel computation is!
- Efficiency E: E(p) = S(p) / p
- Example: T(1) = 6 s, T(2) = 4 s, so S(2) = 1.5 and E(2) = 0.75
- Ideal case: T(p) = T(1) / p, S(p) = p, E(p) = 1.0
Slide 20: Amdahl's Law
Describes the influence of the serial part on scalability (without taking any overhead into account):
S(p) = T(1) / T(p) = T(1) / (f * T(1) + (1 - f) * T(1) / p) = 1 / (f + (1 - f) / p)
- f: serial part (0 ≤ f ≤ 1)
- T(1): time using 1 CPU
- T(p): time using p CPUs
- S(p): speedup; S(p) = T(1) / T(p)
- E(p): efficiency; E(p) = S(p) / p
It is rather easy to scale to a small number of cores, but any parallelization is limited by the serial part of the program!
Slide 21: Amdahl's Law Illustrated
If 80% (measured in program runtime) of your work can be parallelized and "just" 20% still runs sequentially, then your speedup will be:
- 1 processor: time 100%, speedup 1
- 2 processors: time 60%, speedup 1.7
- 4 processors: time 40%, speedup 2.5
- ∞ processors: time 20%, speedup 5
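A small sketch (not from the slides) that evaluates Amdahl's formula from slide 20 and reproduces, up to rounding, the numbers above for f = 0.2:

```c
#include <stdio.h>

/* Amdahl's law: speedup with serial fraction f on p processors. */
static double amdahl_speedup(double f, int p)
{
    return 1.0 / (f + (1.0 - f) / p);
}

int main(void)
{
    double f = 0.2;                      /* 20% serial, 80% parallelizable */
    int procs[] = { 1, 2, 4, 8, 16 };

    for (int i = 0; i < 5; i++) {
        int p = procs[i];
        double s = amdahl_speedup(f, p);
        printf("p = %2d: S(p) = %.2f, E(p) = %.2f\n", p, s, s / p);
    }
    /* For f = 0.2 the speedup can never exceed 1/f = 5, no matter how
       many processors are used. */
    return 0;
}
```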
Slide 22: Speedup in Practice
After the initial parallelization of a program, you will typically see speedup curves like this:
[Figure: speedup over the number of processors p (1 to 8), showing the ideal speedup S(p) = p, the speedup according to Amdahl's law, and a lower, realistic speedup curve]