Performance Measurement

Performance Measurement

Performance Analysis Paper and pencil.
Don’t need a working computer program or even a computer. These could be considered some of the advantages of performance analysis over performance measurement.

Some Uses Of Performance Analysis
determine practicality of algorithm predict run time on large instance compare 2 algorithms that have different asymptotic complexity e.g., O(n) and O(n2) If the complexity is O(n2), we may consider the algorithm practical even for large n. An O(2n) algorithm is practical only for small n, say n < 40. The worst-case run time of an O(n2) algorithm will quadruple with each doubling of n. So if the worst-case time is 10 sec when n = 100, it will be approximately 40 sec when n = 200.

Limitations of Analysis
Doesn’t account for constant factors. but constant factor may dominate 1000n vs n2 and we are interested only in n < 1000

Modern computers have a hierarchical memory organization with different access time for memory at different levels of the hierarchy.

Memory Hierarchy MAIN L2 L1 ALU R 8-32 32KB 512KB 512MB 1C 2C 10C 100C
1 cycle to add data that are in the registers 2 cycles to copy data from L1 cache to a register 10 cycles to copy from L2 to L1 and register L1 ALU R 8-32 32KB 512KB 512MB 1C 2C 10C 100C

Our analysis doesn’t account for this difference in memory access times. Programs that do more work may take less time than those that do less work. A program that does 100 operations on the same data would take less time than A program that performs a single operation on say 50 different pieces of data (assuming, in a simple model) that each time data is fetched from main memory, only one piece of data is fetched. The first program would make one fetch at a cost of 100 cycles and perform 100 operations at a cost of 1 cycle each for a total of 200 cycles. The second program would need 50 * 100 cycles just to fetch the data.

Performance Measurement
Measure actual time on an actual computer. What do we need?

Performance Measurement Needs
programming language working program computer compiler and options to use gcc –O2

Performance Measurement Needs
data to use for measurement worst-case data best-case data average-case data timing mechanism --- clock

Timing In C++ double clocksPerMillis = double(CLOCKS_PER_SEC) / 1000;
// clock ticks per millisecond clock_t startTime = clock(); // code to be timed comes here double elapsedMillis = (clock() – startTime) / clocksPerMillis; // elapsed time in milliseconds

Shortcoming Clock accuracy assume 100 ticks
Repeat work many times to bring total time to be >= 1000 ticks Preceding measurement code is acceptable only when the elapsed time is large relative to the accuracy of the clock.

Accurate Timing clock_t startTime = clock(); long numberOfRepetitions;
do { numberOfRepetitions++; doSomething(); } while (clock() - startTime < 1000) double elapsedMillis = (clock()- startTime) / clocksPerMillis; double timeForCode = elapsedMillis/numberOfRepetitions;

Accuracy Now accuracy is 10%.
first reading may be just about to change to startTime + 100 second reading (final value of clock())may have just changed to finishTime so finishTime - startTime is off by 100 ticks

Accuracy first reading may have just changed to startTime
second reading may be about to change to finishTime + 100 so finishTime - startTime is off by 100 ticks

Accuracy Examining remaining cases, we get trueElapsedTime =
finishTime - startTime ticks To ensure 10% accuracy, require elapsedTime = finishTime – startTime >= 1000 ticks

What Went Wrong? clock_t startTime = clock();
long numberOfRepetitions; do { numberOfRepetitions++; insertionSort(a, n); } while (clock() - startTime < 1000) double elapsedMillis = (clock()- startTime) / clocksPerMillis; double timeForCode = elapsedMillis/numberOfRepetitions; Suppose we are measuring worst-case run time. The array a Initially is in descending order. After the first iteration, the data is sorted. Subsequent iterations no longer start with worst-case data. So the measured time is for one Worst-case sort plus counter-1 best-case sorts!

The Fix clock_t startTime = clock(); long numberOfRepetitions; do {
// put code to initialize a[] here insertionSort(a, n) } while (clock() - startTime < 1000) double elapsedMillis = (clock()- startTime) / clocksPerMillis; Now the measured time is for counter number of worst-case sorts plus counter number of initializes. We can measure the time to initialize separately and subtract. Or, if We are to compare with other sort methods, we could ignore the initialize time as it is there in the measurements for all sort methods. Or we could argue that for large n the worst case time to sort using insertion sort (O(n^2)) overshadows the initialize time, which is O(n), and so we may ignore the initialize time.

Time Shared System UNIX time MyProgram
Using System.currentTimeMillis() is a problem, because your program may run for only part of the physical time that has elapsed. The UNIX command time gives the time for which your program ran.

Bad Way To Time do { counter++; startTime = clock(); doSomething();
elapsedTime += clock() - startTime; } while (elapsedTime < 1000) Suppose the clock is updated every 100ms and doSomething takes 99ms and it takes another 1ms do do everything else in the loop. If the clock is updated exactly at the start of the loop then it is not updated between the startTime = and elapsedTime += statements. So elapsedTime is always 0.

Performance Measurement

Similar presentations

Presentation on theme: "Performance Measurement"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Performance Measurement

Similar presentations

Presentation on theme: "Performance Measurement"— Presentation transcript:

Similar presentations

About project

Feedback