Parallel System Performance CS 524 – High-Performance Computing.


1 Parallel System Performance CS 524 – High-Performance Computing

2 Parallel System Performance

Parallel system = algorithm + hardware

Measure of problem:
- Problem size: e.g., the dimension N in vector and matrix computations
- Floating point operations
- Execution time

Measure of hardware:
- Number of processors, p
- Interconnection network performance (channel bandwidth, cost, diameter, etc.)
- Memory system characteristics (sizes, bandwidth, etc.)

3 Sources of Overhead in Parallel Programs

Interprocess interaction:
- Typically the most significant parallel overhead

Idling:
- A processor may become idle because of load imbalance, synchronization, and the presence of serial computation

Excess computation:
- The difference between the computation performed by the parallel program and that performed by the best sequential program is the excess computation overhead incurred by the parallel program

4 Performance Metrics

Execution time:
- Serial run time is the time elapsed between the beginning and the end of execution on a sequential computer (T_S)
- Parallel run time is the time that elapses from the moment parallel execution starts to the moment the last processor finishes execution (T_P)

Total parallel overhead

Speedup (S): the ratio of the serial execution time of the best sequential algorithm to the parallel execution time

Efficiency (E): the effective fractional utilization of parallel hardware

Cost (C): the sum of the times each processor spends on the problem

5 Total Parallel Overhead

Parallel overhead is encapsulated into a single expression referred to as the overhead function.

The overhead function (or total overhead), T_o, of a parallel system is defined as the total time collectively spent by all the processing elements over and above that required by the fastest known serial algorithm for solving the same problem on a single processing element:

T_o = p*T_P - T_S

6 Speedup

Speedup, S = T_S / T_P
- Measures the benefit of parallelizing a program
- Usually less than the number of processors, p (sublinear speedup)
- Can S be greater than p (superlinear speedup)?

[Figure: speedup S versus number of processors p, showing sublinear (typical), linear, and superlinear curves]

7 Efficiency and Cost

Efficiency, E = S/p = T_S / C
- Measures the utilization of processors for problem computation only
- Usually ranges from 0 to 1
- Can efficiency be greater than 1?

Cost, C = p*T_P (also known as work or processor-time product)
- Measures the sum of the times spent by each processor
- Cost-optimal: the cost of solving a problem on a parallel computer is proportional to the execution time of the fastest known sequential algorithm on a single processor
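To make these metrics concrete, here is a minimal C sketch that computes S, E, C, and the overhead function T_o from the definitions above; the timing values (T_S = 100 s, T_P = 15 s, p = 8) are assumptions chosen purely for illustration:

    #include <stdio.h>

    int main(void) {
        double Ts = 100.0; /* assumed serial run time, seconds */
        double Tp = 15.0;  /* assumed parallel run time, seconds */
        int p = 8;         /* number of processors */

        double S  = Ts / Tp;      /* speedup */
        double E  = S / p;        /* efficiency = T_S / (p*T_P) */
        double C  = p * Tp;       /* cost (processor-time product) */
        double To = p * Tp - Ts;  /* total overhead function */

        printf("S = %.2f, E = %.2f, C = %.1f, To = %.1f\n", S, E, C, To);
        return 0;
    }

With these numbers, S is about 6.67 (sublinear), E is about 0.83, C = 120, and T_o = 20: the 20 processor-seconds beyond T_S are exactly the overhead the parallel system pays.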

8 Amdahl's Law

Let W = the work needed to solve a problem and W_S = the work that is serial (i.e., not parallelizable).

The maximum possible speedup on p processors (assuming no superlinear speedup) is:

S = W / [W_S + (W - W_S)/p]

- If a problem has 10% serial computation, the maximum speedup is 10
- If a problem has 1% serial computation, the maximum speedup is 100

Speedup is bounded above by W/W_S as the number of processors p increases.
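A minimal C sketch of this bound (W = 100 is an arbitrary assumed total; the serial fractions are the 10% and 1% cases from the slide, and the processor counts are arbitrary) shows the speedup saturating at W/W_S:

    #include <stdio.h>

    /* Amdahl's law: maximum speedup on p processors */
    static double amdahl(double W, double Ws, int p) {
        return W / (Ws + (W - Ws) / p);
    }

    int main(void) {
        double W = 100.0;           /* total work, arbitrary units */
        double Ws[] = {10.0, 1.0};  /* serial work: 10% and 1% of W */
        int    ps[] = {10, 100, 1000, 100000};

        for (int i = 0; i < 2; i++)
            for (int j = 0; j < 4; j++)
                printf("Ws = %4.1f  p = %6d  S = %6.2f\n",
                       Ws[i], ps[j], amdahl(W, Ws[i], ps[j]));
        return 0; /* S approaches 10 and 100, the respective W/W_S limits */
    }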

9 Execution Time

In a distributed memory model, the execution time T_P = t_comp + t_comm:
- t_comp: computation time
- t_comm: communication time for explicit sends and receives of messages

In a shared memory model, the execution time T_P consists of computation time and communication time for memory accesses. Communications are not specified explicitly, so execution time is CPU time, determined in a manner similar to that for sequential algorithms.

10 Message Passing Communication Overhead

Parameters for determining communication time, t_comm:
- Startup time (t_s): the time required to handle a message at the sending processor, including the time to prepare the message, the time to execute the routing algorithm, and the time to establish an interface between the local processor and the router
- Per-hop time (t_h): the time for the message header to travel between two directly connected processors; also known as node latency
- Per-word transfer time (t_w): the time for a word to traverse a link; if the channel bandwidth is r words per second, then t_w = 1/r

t_comm = t_s + t_h + t_w

11 Store-and-Forward Routing (1)

Store-and-forward routing: when a message traverses a path with multiple links, each intermediate processor on the path forwards the message to the next processor only after it has received and stored the entire message.

12 Store-and-Forward Routing (2)

Communication overhead/cost:
- Message size = m words
- Path length = l links
- Communication overhead: t_comm = t_s + (m*t_w + t_h)*l
- Usually t_h is small compared to m*t_w, so the communication cost simplifies to t_comm = t_s + m*t_w*l
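Plugging assumed numbers into the store-and-forward model (t_s = 50 µs, t_h = 1 µs, t_w = 0.5 µs, m = 1000 words, l = 4 links; all hypothetical values for illustration) shows how little t_h matters next to m*t_w:

    #include <stdio.h>

    int main(void) {
        double ts = 50.0, th = 1.0, tw = 0.5; /* assumed times, microseconds */
        int    m  = 1000;                     /* message size in words */
        int    l  = 4;                        /* path length in links */

        double exact  = ts + (m * tw + th) * l; /* t_s + (m*t_w + t_h)*l */
        double approx = ts + m * tw * l;        /* simplified: t_s + m*t_w*l */
        printf("exact = %.1f us, simplified = %.1f us\n", exact, approx);
        return 0;
    }

Here the exact cost is 2054 µs against a simplified 2050 µs, so dropping t_h changes the estimate by about 0.2%.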

13 Cut-Through Routing (1)

Cut-through routing: a message is forwarded at an intermediate node without waiting for the entire message to arrive.

14 Cut-Through Routing (2)

Wormhole routing is cut-through routing with pipelining through the network:
- The message is partitioned into small pieces, called flits (flow control digits)
- There is no buffering in memory; a busy link causes the worm to stall, and deadlock may ensue

Communication cost/overhead:
- Message size = m words
- Path length = l links
- Communication cost: t_comm = t_s + m*t_w + l*t_h
- Again, considering t_h to be small compared to m*t_w, the communication cost simplifies to t_comm = t_s + m*t_w
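Using the same assumed parameters as in the store-and-forward sketch, a short C comparison highlights the key difference: under cut-through routing the m*t_w term is paid once rather than once per link:

    #include <stdio.h>

    int main(void) {
        double ts = 50.0, th = 1.0, tw = 0.5; /* same assumed values as above */
        int    m  = 1000, l = 4;

        double ct = ts + m * tw + l * th;   /* cut-through: t_s + m*t_w + l*t_h */
        double sf = ts + (m * tw + th) * l; /* store-and-forward, for contrast */
        printf("cut-through = %.1f us, store-and-forward = %.1f us\n", ct, sf);
        return 0;
    }

Cut-through comes to 554 µs versus 2054 µs for store-and-forward, roughly an l-fold reduction in the per-word term for this 4-link path.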

