Distributed and Parallel Processing George Wells
Performance Analysis Sources of performance loss – Overheads – not present in sequential program – Sequential code – Idle processors (load balancing) – Contention for resources
Parallel Computing Terminology The worst-case time complexity of a parallel algorithm is a function f(n) that is the maximum, over all inputs of size n, of the time taken to execute the algorithm. The cost of a parallel algorithm is defined as its complexity times p, the number of processors.
Parallel Computing Terminology The speedup achieved by a parallel algorithm running on p processors is the ratio of the time taken by the parallel computer to execute the fastest sequential algorithm to the time it takes to execute the parallel algorithm with p processors. The efficiency of a parallel algorithm running on p processors is speedup / p.
Speedup Linear speedup: s = p – Efficiency = 100% Superlinear speedup: s > p – Unusual (but wonderful!) Usually: – s < p – Efficiency < 100%
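The definitions of speedup, efficiency and cost can be tried out on concrete numbers. The following is a minimal sketch: the class name SpeedupMetrics and the timings are made up for illustration, and measured running time stands in for the complexity in the cost definition.

```java
// Hypothetical helper for illustration; the timings in main are made-up numbers.
class SpeedupMetrics {
    /** Speedup: fastest sequential time divided by parallel time on p processors. */
    static double speedup(double sequentialTime, double parallelTime) {
        return sequentialTime / parallelTime;
    }

    /** Efficiency: speedup divided by the number of processors. */
    static double efficiency(double speedup, int p) {
        return speedup / p;
    }

    /** Cost: parallel running time multiplied by the number of processors. */
    static double cost(double parallelTime, int p) {
        return parallelTime * p;
    }

    public static void main(String[] args) {
        double tSeq = 120.0; // assumed: seconds for the fastest sequential algorithm
        double tPar = 20.0;  // assumed: seconds for the parallel algorithm on p processors
        int p = 8;
        double s = speedup(tSeq, tPar);
        System.out.printf("speedup = %.2f, efficiency = %.0f%%, cost = %.0f processor-seconds%n",
                s, 100 * efficiency(s, p), cost(tPar, p));
    }
}
```

With these assumed timings the speedup is 6 on 8 processors, so s < p and the efficiency is 75%, which is the usual sub-linear case.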
Amdahl’s Law If f is the inherently sequential fraction of a computation to be solved by p processors, then the speedup s is limited according to the formula:

$$ s \le \frac{1}{f + \frac{1 - f}{p}} $$

Here f is the fraction of the overall computation that is inherently sequential, and (1 − f) is the part that can be done in parallel. The ≤ symbol accounts for both sequential overhead and non-ideal load balancing.
Assume f = 0: the bound becomes $s \le 1 / (0 + 1/p) = p$, i.e. you can’t get better than linear speedup (according to Amdahl)
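This time try substituting f = 0.1 and p = ∞: the (1 − f)/p term vanishes, so $s \le 1/f = 10$. The sequential component is a limiting factor: however many processors are used, the speedup cannot exceed 10.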
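The effect of the sequential fraction is easy to tabulate. The sketch below evaluates Amdahl's bound in its usual form, s ≤ 1 / (f + (1 − f)/p), for f = 0.1 and a few processor counts; the class name Amdahl and the chosen values of p are illustrative assumptions.

```java
// Minimal sketch of Amdahl's bound; class and method names are illustrative.
class Amdahl {
    /** Upper bound on speedup for sequential fraction f on p processors. */
    static double speedupBound(double f, int p) {
        return 1.0 / (f + (1.0 - f) / p);
    }

    /** Limit of the bound as p grows without bound (requires f > 0). */
    static double limit(double f) {
        return 1.0 / f;
    }

    public static void main(String[] args) {
        double f = 0.1; // inherently sequential fraction
        for (int p : new int[] {1, 2, 10, 100, 1000}) {
            System.out.printf("p = %4d  s <= %.2f%n", p, speedupBound(f, p));
        }
        System.out.printf("p -> infinity  s <= %.2f%n", limit(f));
    }
}
```

The bound rises quickly at first and then flattens: going from 100 to 1000 processors gains very little, because the sequential 10% dominates the running time.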
Suppose that a parallel computer E has been built by using the herd-of-elephants approach, where each processor in the computer is capable of performing sequential operations at the rate of X megaflops. Suppose that another parallel computer A has been built by using the army-of-ants approach; each processor in this computer is capable of performing sequential operations at the rate of βX megaflops, for some β where 0 < β << 1. If parallel computer A attempts a computation whose inherently sequential fraction f is greater than β, then A will execute the algorithm more slowly than a single processor of parallel computer E. Why? Because the time A takes to execute the sequential part alone is greater than the time a single processor of E takes to execute the entire algorithm.
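The argument can be written out as a short derivation. The symbol W (the total number of operations in the computation) is introduced here for illustration and is not part of the original statement.

```latex
\begin{align*}
T_E &= \frac{W}{X}
  && \text{(entire algorithm on one processor of E)} \\
T_A^{\text{seq}} &= \frac{fW}{\beta X}
  && \text{(sequential part alone on one processor of A)} \\
T_A^{\text{seq}} > T_E
  &\iff \frac{f}{\beta} > 1 \iff f > \beta
\end{align*}
```

For example, with β = 0.01 and f = 0.05, the sequential part on A alone takes five times as long as the whole computation on a single processor of E, no matter how many ants work on the parallel part.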
If a parallel computer built by using the army-of-ants approach is to be successful, one of the following conditions must be true: – at least one processor must be capable of extremely fast sequential computations, or – the problem being solved must admit a solution containing virtually no sequential components. Computers capable of high processing speeds must have high memory bandwidth and high I/O rates too.
Multithreading
Multithreading Example: Mandelbrot Set A fractal calculation – embarrassingly parallel The value of each point in a 2D space can be calculated independently of any other – Values are false-coloured to give graphical representation
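The independence of the points is visible in the per-point calculation itself. The slides do not show this code, so the following is a sketch of the standard escape-time iteration (z = z² + c, starting from z = 0); the class name, method name and iteration cap are assumptions.

```java
// Sketch of the standard escape-time calculation for a single point c.
class MandelbrotPoint {
    /**
     * Returns the number of iterations of z = z*z + c (starting from z = 0)
     * performed before |z| exceeds 2, or maxIterations if the point never escapes.
     */
    static int escapeTime(double cRe, double cIm, int maxIterations) {
        double zRe = 0.0, zIm = 0.0;
        int n = 0;
        // |z| <= 2 is tested as |z|^2 <= 4 to avoid a square root.
        while (n < maxIterations && zRe * zRe + zIm * zIm <= 4.0) {
            double nextRe = zRe * zRe - zIm * zIm + cRe;
            zIm = 2.0 * zRe * zIm + cIm;
            zRe = nextRe;
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println("c = 0 + 0i : " + escapeTime(0.0, 0.0, 100));
        System.out.println("c = 2 + 2i : " + escapeTime(2.0, 2.0, 100));
    }
}
```

The function reads only its own arguments and writes no shared state, so any number of points can be evaluated at the same time. The returned count is what would be mapped to a false colour.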
Mandelbrot Set
Sequential Code A little complicated by graphics event handling in Java
Parallel Code Can create threads to perform the calculations Threads in Java: – Implement Runnable interface Specifies one method: public void run (); – Create Thread object – Call start method In turn calls the run method
Creating Threads in Java Implement the Runnable interface Can also extend the Thread class – Override the run() method First approach is preferred – Single-inheritance limitation in Java – Better OO style is-a relationship implied by inheritance
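The preferred approach (implement Runnable, wrap it in a Thread, call start) might look like the sketch below. The class names CountingTask and RunnableDemo and the shared counter are illustrative assumptions, chosen only so that the result of the threads' work can be observed.

```java
import java.util.concurrent.atomic.AtomicInteger;

// The task: implements Runnable, so it remains free to extend another class.
class CountingTask implements Runnable {
    private final AtomicInteger counter;

    CountingTask(AtomicInteger counter) {
        this.counter = counter;
    }

    @Override
    public void run() {
        counter.incrementAndGet(); // the "work" done by each thread
    }
}

class RunnableDemo {
    /** Starts the given number of threads, waits for them all, and returns the count. */
    static int runAll(int threads) {
        AtomicInteger counter = new AtomicInteger();
        Thread[] pool = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            pool[i] = new Thread(new CountingTask(counter)); // create Thread object
            pool[i].start();                                 // start() in turn calls run()
        }
        for (Thread t : pool) {
            try {
                t.join(); // wait for the thread's run() to complete
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println("tasks completed: " + runAll(4));
    }
}
```

Calling run() directly would execute the task in the calling thread; only start() creates a new thread of execution.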
Java Threads We can think of a thread as having three parts: – A virtual CPU (provides execution ability) – Code – Data
Mandelbrot Application
    WorkerThread w = new WorkerThread(i, i+taskSize);
    Thread t = new Thread(w);
    t.start();
[Figure labels: class WorkerThread; w; Thread t]
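A worker built around this fragment could be sketched as follows. Only the three statements above come from the slides: the body of WorkerThread, the interpretation of its two arguments as a row range, the image size, the iteration cap, the shared result array and the MandelbrotApp class are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: sizes, iteration cap and the shared result array are assumptions.
class WorkerThread implements Runnable {
    static final int WIDTH = 60, HEIGHT = 40, MAX_ITER = 100;
    static final int[][] result = new int[HEIGHT][WIDTH]; // each worker writes only its own rows

    private final int startRow; // first row of this task (inclusive)
    private final int endRow;   // last row of this task (exclusive)

    WorkerThread(int startRow, int endRow) {
        this.startRow = startRow;
        this.endRow = endRow;
    }

    @Override
    public void run() {
        for (int y = startRow; y < endRow; y++) {
            for (int x = 0; x < WIDTH; x++) {
                // Map the pixel to a point c in [-2, 1) x [-1, 1) of the complex plane.
                double cRe = -2.0 + 3.0 * x / WIDTH;
                double cIm = -1.0 + 2.0 * y / HEIGHT;
                result[y][x] = escapeTime(cRe, cIm);
            }
        }
    }

    /** Iterations of z = z*z + c (from z = 0) before |z| > 2, capped at MAX_ITER. */
    static int escapeTime(double cRe, double cIm) {
        double zRe = 0.0, zIm = 0.0;
        int n = 0;
        while (n < MAX_ITER && zRe * zRe + zIm * zIm <= 4.0) {
            double nextRe = zRe * zRe - zIm * zIm + cRe;
            zIm = 2.0 * zRe * zIm + cIm;
            zRe = nextRe;
            n++;
        }
        return n;
    }
}

class MandelbrotApp {
    /** Splits the rows into tasks of taskSize rows, runs one thread per task, waits for all. */
    static int[][] render(int taskSize) {
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < WorkerThread.HEIGHT; i += taskSize) {
            WorkerThread w = new WorkerThread(i, Math.min(i + taskSize, WorkerThread.HEIGHT));
            Thread t = new Thread(w);
            t.start();
            threads.add(t);
        }
        for (Thread t : threads) {
            try {
                t.join(); // results are safely visible to this thread after join()
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return WorkerThread.result;
    }

    public static void main(String[] args) {
        int[][] image = render(10); // 40 rows in tasks of 10 rows: four threads
        System.out.println("centre pixel iterations: " + image[20][40]);
    }
}
```

Because every worker writes to a disjoint set of rows, no locking is needed. The choice of taskSize trades thread-creation overhead against load balance, since rows near the set take far more iterations than rows far from it.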
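Threads in Java Simplified lifecycle (state transitions): New → Runnable (start()); Runnable → Running (Scheduler); Running → Dead (run() completes); Running → Blocked (Blocking event); Blocked → Runnable (Unblocked). Thread scheduling is preemptive, but not necessarily time-sliced – Use Thread.sleep(x) or Thread.yield() to give other threads a chance to run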
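The two ends of the lifecycle can be observed with Thread.getState(). This is a sketch with an illustrative class name. Note that the names in Java's Thread.State enum differ from the simplified diagram: NEW corresponds to New, TERMINATED to Dead, and RUNNABLE covers both Runnable and Running.

```java
// Sketch: observes a thread's state before start() and after run() completes.
class LifecycleDemo {
    /** Returns the thread's state before start() and after join(). */
    static Thread.State[] observe() {
        Thread t = new Thread(new Runnable() {
            @Override
            public void run() {
                Thread.yield(); // a hint that other threads may be scheduled now
            }
        });
        Thread.State before = t.getState(); // not yet started
        t.start();
        try {
            t.join(); // returns once run() has completed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        Thread.State after = t.getState();
        return new Thread.State[] {before, after};
    }

    public static void main(String[] args) {
        Thread.State[] states = observe();
        System.out.println("before start(): " + states[0]);
        System.out.println("after join():   " + states[1]);
    }
}
```

Thread.yield() is only a hint to the scheduler and may have no effect, whereas Thread.sleep(x) pauses the calling thread for about x milliseconds.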