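Performance Measurement n Assignment? n Timing
    #include <stddef.h>     /* NULL */
    #include <sys/time.h>   /* struct timeval, gettimeofday */

    /* Return the current wall-clock time in seconds (microsecond resolution). */
    double When(void)
    {
        struct timeval tp;
        gettimeofday(&tp, NULL);
        return (double)tp.tv_sec + (double)tp.tv_usec * 1e-6;
    }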
A Quantitative Basis for Design n Parallel programming is an optimization problem. n Must take into account several factors: –execution time –scalability –efficiency n Also must take into account the costs: –memory requirements –implementation costs –maintenance costs, etc. n Mathematical performance models are used to assess these costs and predict performance.
Defining Performance n How do you define parallel performance? n What do you define it in terms of? n Consider –Distributed databases –Image processing pipeline –Nuclear weapons testbed
Amdahl's Law n Every algorithm has a sequential component. n The sequential component limits speedup. n Sequential component = $s$ n Maximum speedup = $1/s$
Amdahl's Law [Figure: speedup as a function of the sequential component s]
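The bound above can be sketched in code. This is a minimal illustration, assuming the usual statement of Amdahl's law in which a fraction s of the run time is sequential and the remainder parallelizes perfectly over p processors; the function names are hypothetical.

```c
#include <assert.h>

/* Amdahl's law: speedup on p processors when a fraction s of the run
 * time (0 < s <= 1) is sequential and the rest parallelizes perfectly. */
double amdahl_speedup(double s, int p)
{
    assert(s > 0.0 && s <= 1.0 && p >= 1);
    return 1.0 / (s + (1.0 - s) / (double)p);
}

/* Upper bound on speedup as p grows without limit: 1/s. */
double amdahl_limit(double s)
{
    assert(s > 0.0 && s <= 1.0);
    return 1.0 / s;
}
```

With s = 0.1, ten processors give a speedup of about 5.26, and no processor count can exceed 10.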
What's wrong? n Works fine for a given algorithm. –But what if we change the algorithm? n We may change algorithms to increase parallelism and thus eventually increase performance. –May introduce inefficiency
Metrics for Performance n Efficiency n Speedup n Scalability n Others …………..
Efficiency $E = \frac{T_1}{p\,T_p}$ The fraction of time a processor spends doing useful work n What about when $p\,T_p < T_1$? –Does cache make a processor work at 110%?
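A small sketch of the efficiency calculation, assuming E is the one-processor time divided by p times the p-processor time. The helper names are hypothetical, and any timings passed in are the caller's own measurements.

```c
#include <assert.h>

/* Efficiency E = T1 / (p * Tp), where t1 is the one-processor time and
 * tp is the time on p processors (same units). */
double efficiency(double t1, double tp, int p)
{
    assert(t1 > 0.0 && tp > 0.0 && p >= 1);
    return t1 / ((double)p * tp);
}

/* Nonzero when p*Tp < T1, i.e. efficiency above 100% (superlinear). */
int is_superlinear(double t1, double tp, int p)
{
    return efficiency(t1, tp, p) > 1.0;
}
```

For example, made-up timings of T1 = 100 s and Tp = 10 s on 8 processors give E = 1.25. Results like this usually indicate that each processor's share of the data now fits in cache, not that a processor does more than 100% useful work.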
Speedup $S = \frac{\mathrm{Speed}_P}{\mathrm{Speed}_1}$ What is Speed? What algorithm for $\mathrm{Speed}_1$? What is the work performed? How much work?
Two kinds of Speedup n Relative –Uses parallel algorithm on 1 processor –Most common n Absolute –Uses best known serial algorithm –Eliminates overheads in calculation.
Speedup n Algorithm A –Serial execution time is 10 sec. –Parallel execution time is 2 sec. n Algorithm B –Serial execution time is 2 sec. –Parallel execution time is 1 sec. n What if I told you A = B?
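The arithmetic behind the A versus B comparison can be sketched as follows. The function name is hypothetical; the timings in the note are the ones quoted on the slide.

```c
#include <assert.h>

/* Speedup = serial time / parallel time (same units). */
double speedup(double t_serial, double t_parallel)
{
    assert(t_serial > 0.0 && t_parallel > 0.0);
    return t_serial / t_parallel;
}
```

Algorithm A reports speedup(10, 2) = 5 and Algorithm B reports speedup(2, 1) = 2, yet B finishes in half the time. One reading of "A = B" is that the two figures describe the same problem measured against different serial baselines (relative versus absolute speedup), so the larger speedup says nothing about which run is faster.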
Logic The art of thinking and reasoning in strict accordance with the limitations and incapacities of the human misunderstanding. The basis of logic is the syllogism, consisting of a major and minor premise and a conclusion.
Example n Major Premise: Sixty men can do a piece of work sixty times as quickly as one man. n Minor Premise: One man can dig a post-hole in sixty seconds. n Conclusion: Sixty men can dig a post-hole in one second.
Performance Analysis Statements n There is always a trade-off between time and solution quality. n We should compare the quality of the answer for a given execution time. n For any performance reporting, find and clearly state the quality measure.
Speedup n Conventional speedup is defined as the reduction in execution time. n Consider running a problem on a slow parallel computer and on a faster one. –Same serial component –Speedup will be lower on the faster computer.
Speedup and Amdahl's Law n Conventional speedup penalizes faster absolute speed. n Assuming that the task size stays constant as computing power increases exaggerates the task overhead. n Scaling the problem size reduces these distortion effects.
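The penalty on faster machines can be illustrated with a toy model. It is an assumption made for this sketch, not something stated on the slides: run time is a fixed serial component in seconds, identical on both machines, plus parallel work divided among p processors that each execute `rate` operations per second.

```c
#include <assert.h>

/* Hypothetical time model: a fixed serial component t_serial (seconds)
 * plus `work` operations shared by p processors, each running at
 * `rate` operations per second. */
double run_time(double t_serial, double work, double rate, int p)
{
    assert(t_serial >= 0.0 && work > 0.0 && rate > 0.0 && p >= 1);
    return t_serial + work / (rate * (double)p);
}

/* Conventional speedup: one-processor time over p-processor time. */
double conventional_speedup(double t_serial, double work, double rate, int p)
{
    return run_time(t_serial, work, rate, 1) / run_time(t_serial, work, rate, p);
}
```

With a 1 s serial component, 100 operations, and 10 processors, a machine with rate 1 shows a speedup of about 9.18 (101 s down to 11 s), while a machine with rate 10 shows only 5.5 (11 s down to 2 s), even though the faster machine finishes first in absolute terms.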
Solution n Gustafson introduces scaled speedup. n Scale the problem size as you increase the number of processors. n Calculated in two ways –Experimentally –Analytical models
Traditional Speedup $\mathrm{Speedup} = \frac{T_1(N)}{T_P(N)}$ n $T_1$ is the time taken on a single processor n $T_P$ is the time taken on $P$ processors
Scaled Speedup $\mathrm{Speedup} = \frac{T_1(PN)}{T_P(PN)}$ n $T_1$ is the time taken on a single processor n $T_P$ is the time taken on $P$ processors
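The two definitions differ only in the problem size passed to the time function. The sketch below takes the time model as a function pointer; `example_model` is a made-up model (10 s of serial work plus n seconds of perfectly parallel work) included only to show the contrast, and all names are hypothetical.

```c
#include <assert.h>

/* A time model: predicted run time for problem size n on p processors. */
typedef double (*time_model)(double n, int p);

/* Traditional speedup: T_1(N) / T_P(N), problem size held fixed. */
double traditional_speedup(time_model t, double n, int p)
{
    assert(t && n > 0.0 && p >= 1);
    return t(n, 1) / t(n, p);
}

/* Scaled speedup: T_1(P*N) / T_P(P*N), problem size grown with P. */
double scaled_speedup(time_model t, double n, int p)
{
    assert(t && n > 0.0 && p >= 1);
    double scaled_n = (double)p * n;
    return t(scaled_n, 1) / t(scaled_n, p);
}

/* Hypothetical model for illustration only: 10 s of serial work plus
 * n seconds of perfectly parallel work. */
double example_model(double n, int p)
{
    return 10.0 + n / (double)p;
}
```

For n = 100 and p = 10 this model gives a traditional speedup of 5.5 and a scaled speedup of about 9.18, because the fixed serial part shrinks relative to the enlarged problem.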
Scaled Speedup vs Traditional
Traditional Speedup [Figure: speedup versus number of processors; ideal and measured curves]
Scaled Speedup [Figure: speedup versus number of processors; ideal curve and curves for small, medium, and large problems]
Performance Measurement n There is not a perfect way to measure and report performance. n Wall clock time seems to be the best. n But how much work do you do? n Best Bet: –Develop a model that fits experimental results.
A Parallel Programming Model n Goal: Define an equation that predicts execution time as a function of –Problem size –Number of processors –Number of tasks –Etc. n $T = f(N, P, \ldots)$
A Parallel Programming Model n Execution time can be broken up into –Computing –Communicating –Idling n $T = \frac{1}{P}\left(\sum_{i=1}^{P} T^{i}_{comp} + \sum_{i=1}^{P} T^{i}_{comm} + \sum_{i=1}^{P} T^{i}_{idle}\right)$
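A minimal sketch of this decomposition, assuming the slide's equation averages the per-processor computing, communicating, and idling times over the P processors. The function name and array layout are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* T = (1/P) * sum over processors of (T_comp + T_comm + T_idle).
 * Each array holds one entry per processor, in seconds. */
double model_exec_time(const double *t_comp, const double *t_comm,
                       const double *t_idle, size_t p)
{
    assert(t_comp && t_comm && t_idle && p > 0);
    double total = 0.0;
    for (size_t i = 0; i < p; i++)
        total += t_comp[i] + t_comm[i] + t_idle[i];
    return total / (double)p;
}
```

If every processor is busy, communicating, or idle for the whole run, each processor's three terms sum to the same wall-clock time, and the average equals that time.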
Computation Time n Normally depends on problem size n Also depends on machine characteristics –Processor speed –Memory system –Etc. n Often, experimentally obtained
Communication Time n The amount of time spent sending & receiving messages n Most often is calculated as –Cost of sending a single message * #messages n Single message cost –T = startuptime + time_to_send_one_word * #words
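The message cost formula translates directly into code. This is a sketch with hypothetical names; `startup` and `per_word` are in seconds and would normally be measured on the target network.

```c
#include <assert.h>

/* Cost of one message: startup time plus per-word transfer time. */
double message_cost(double startup, double per_word, long words)
{
    assert(startup >= 0.0 && per_word >= 0.0 && words >= 0);
    return startup + per_word * (double)words;
}

/* Total communication time for `messages` equal-sized messages. */
double comm_time(double startup, double per_word, long words, long messages)
{
    assert(messages >= 0);
    return (double)messages * message_cost(startup, per_word, words);
}
```

With a made-up startup time of 1 ms and 1 microsecond per word, a 1000-word message costs 2 ms, so startup and transfer contribute equally at that size.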
Idle Time n Difficult to determine n This is often the time waiting for a message to be sent to you. n Can be avoided by overlapping communication and computation.
Finite Difference Example n Finite Difference Code n 512 x 512 x 5 Elements n Nine-point stencil n Row-wise decomposition –Each processor gets n/p*n*z elements n 16 IBM RS6000 workstations n Connected via Ethernet
Finite Difference Model n Execution Time (per iteration) –ExTime = (Tcomp + Tcomm)/P n Communication Time (per iteration) –Tcomm = 2 (lat + 2*n*z*bw) n Computation Time –Estimate using some sample code
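The model can be transcribed literally, as in the sketch below. Function names are hypothetical. Because `bw` multiplies a word count in the slide's formula, it is treated here as seconds per word, which is an assumption; `tcomp` is left as an input since the slide says to estimate it from sample code.

```c
#include <assert.h>

/* Tcomm = 2 * (lat + 2*n*z*bw), as on the slide. `bw` multiplies a
 * word count, so it is treated as seconds per word (an assumption). */
double fd_tcomm(double lat, double bw, int n, int z)
{
    assert(lat >= 0.0 && bw >= 0.0 && n > 0 && z > 0);
    return 2.0 * (lat + 2.0 * (double)n * (double)z * bw);
}

/* ExTime = (Tcomp + Tcomm) / P, as on the slide. */
double fd_extime(double tcomp, double tcomm, int p)
{
    assert(tcomp >= 0.0 && tcomm >= 0.0 && p >= 1);
    return (tcomp + tcomm) / (double)p;
}
```

For the 512 x 512 x 5 grid, made-up values of lat = 1 ms and bw = 1 microsecond per word give Tcomm of about 12.2 ms per iteration, independent of P.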
Estimated Performance
Finite Difference Example
What was wrong? n Ethernet –Shared bus n Change the computation of Tcomm –Reduce the bandwidth –Scale the message volume by the number of processors sending concurrently. –Tcomm = 2 (lat + 2*n*z*bw * P/2)
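The revised communication term can be sketched the same way. The function name is hypothetical, and `bw` is again treated as seconds per word because it multiplies a word count.

```c
#include <assert.h>

/* Revised Tcomm = 2 * (lat + 2*n*z*bw * P/2): the message volume is
 * scaled by the P/2 processors taken to be sending on the shared bus
 * at the same time. */
double fd_tcomm_shared(double lat, double bw, int n, int z, int p)
{
    assert(lat >= 0.0 && bw >= 0.0 && n > 0 && z > 0 && p >= 1);
    return 2.0 * (lat + 2.0 * (double)n * (double)z * bw * (double)p / 2.0);
}
```

At P = 2 the expression reduces to the original Tcomm. With made-up values of lat = 1 ms and bw = 1 microsecond per word on the 512 x 512 x 5 grid, it grows from about 12.2 ms at P = 2 to about 83.9 ms at P = 16, so communication cost now rises with the processor count.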
Finite Difference Example
Using analytical models n Examine the control flow of the algorithm. n Find a general algebraic form for the complexity (execution time). n Fit the curve with experimental data. n If the fit is poor, find the missing terms and repeat. n Calculate the scaled speedup using the formula.
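The curve-fitting step can be sketched with ordinary least squares. The two-term form T = a*(N/P) + b*P is an assumption chosen for illustration (it mirrors the N/P + 5P example in this deck); a real model may need different terms. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Least-squares fit of T = a*(N/P) + b*P to measured times.
 * n_size[i], procs[i], t_meas[i] describe run i; results go to *a, *b.
 * Returns 0 on success, -1 if the runs do not determine both terms. */
int fit_model(const double *n_size, const int *procs, const double *t_meas,
              size_t runs, double *a, double *b)
{
    assert(n_size && procs && t_meas && a && b);
    double s11 = 0.0, s12 = 0.0, s22 = 0.0, s1y = 0.0, s2y = 0.0;
    for (size_t i = 0; i < runs; i++) {
        assert(procs[i] >= 1);
        double x1 = n_size[i] / (double)procs[i];  /* N/P term */
        double x2 = (double)procs[i];              /* P term */
        s11 += x1 * x1;
        s12 += x1 * x2;
        s22 += x2 * x2;
        s1y += x1 * t_meas[i];
        s2y += x2 * t_meas[i];
    }
    /* Solve the 2x2 normal equations by Cramer's rule. */
    double det = s11 * s22 - s12 * s12;
    if (det == 0.0)
        return -1;
    *a = (s1y * s22 - s2y * s12) / det;
    *b = (s2y * s11 - s1y * s12) / det;
    return 0;
}
```

Large residuals between the fitted curve and the measurements suggest a missing term, for example a contention term like the P/2 factor added to the finite difference model.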
Example n Serial Time = N seconds n Parallel Time = N/P + 5P seconds n Let N/P = 128 n Scaled Speedup for 4 processors is: $\frac{C_1(PN)}{C_P(PN)} = \frac{4(128)}{4(128)/4 + 5(4)}$
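The example can be checked numerically. The sketch below assumes that "N/P = 128" means each processor keeps 128 units of work, so the total problem size on 4 processors is 4 * 128 = 512; function names are hypothetical.

```c
#include <assert.h>

/* Serial time from the example: N seconds. */
double serial_time(double n)
{
    return n;
}

/* Parallel time from the example: N/P + 5P seconds. */
double parallel_time(double n, int p)
{
    assert(p >= 1);
    return n / (double)p + 5.0 * (double)p;
}

/* Scaled speedup with n_per_proc = N/P held fixed, so the total
 * problem size is p * n_per_proc. */
double example_scaled_speedup(double n_per_proc, int p)
{
    double total = (double)p * n_per_proc;
    return serial_time(total) / parallel_time(total, p);
}
```

Under that reading, example_scaled_speedup(128, 4) evaluates 512 / (128 + 20) = 512 / 148, or about 3.46.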
Performance Evaluation n Identify the data –Execution time –Be sure to examine a range of data points n Design the experiments to obtain the data –Make sure the experiment measures what you intend to measure. –Remember: Execution time is max time taken. –Repeat your experiments many times –Validate data by designing a model n Report data –Report all information that affects execution –Results should be separate from Conclusions –Present the data in an easily understandable format.
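Two of the experiment-design points, taking the maximum over processes and repeating the experiment, can be sketched as below. The per-process times are assumed to have been collected with a wall-clock timer such as When(); the function names and the row-major layout are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Run time of one trial: the slowest process decides. */
double trial_time(const double *per_process, size_t procs)
{
    assert(per_process && procs > 0);
    double worst = per_process[0];
    for (size_t i = 1; i < procs; i++)
        if (per_process[i] > worst)
            worst = per_process[i];
    return worst;
}

/* Mean run time over repeated trials. `times` is a trials x procs
 * row-major array of per-process wall-clock times in seconds. */
double mean_trial_time(const double *times, size_t trials, size_t procs)
{
    assert(times && trials > 0 && procs > 0);
    double sum = 0.0;
    for (size_t t = 0; t < trials; t++)
        sum += trial_time(times + t * procs, procs);
    return sum / (double)trials;
}
```

Reporting the spread across trials alongside the mean (for instance the minimum and maximum trial time) makes it easier for a reader to judge how noisy the measurements were.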