Published by Lucy Figgins. Modified over 9 years ago.
Evaluation of Hardware Performance Counters on the R12000 Microprocessor

The necessity for accurate performance counters became apparent when we began defining the resource usage of Sweep3D, an ASCI benchmark from the DOE used to evaluate high-performance computers. For years, many computer scientists have used performance counters to help find problem areas in code. This study shows that performance counters on modern microprocessors provide rudimentary performance measurements that may or may not be accurate. The methodology used to determine the accuracy of this hardware feature on the R12000 is outlined below, along with the results.

1. Calculate the number of events by searching for the event in the assembly file or by using an analytical model.
2. Validate the numbers from step 1 with a simulator.
3. Compare the numbers with those generated by the counters.

High Performance Experiments

Performance counters are used mainly to optimize code. For example:

    for i = 1 to n do
        for j = 1 to n do
            a[i, j] := a[i, j] + 1

This piece of code has a nested loop and accesses data in a matrix. The way the matrix is stored in memory determines the number of cache misses, and cache misses increase execution time. If this code were analyzed using performance counters and the results showed many cache misses during its execution, the programmer could try to tune the code to decrease the miss rate and, thus, the execution time.

To quantify the accuracy of performance counters, the number of events a program generates must be known. Thus, microbenchmarks were designed to generate events for which we could predict counts. For example, the code above generates cache misses whose number can be predicted and then measured. Certain types of code generate certain events.
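As a rough sketch of how a miss count for the nested-loop example above might be predicted and then checked against a simulator, the C fragment below pairs a one-line analytical model with a tiny direct-mapped cache simulation. Everything in it is an assumption made for illustration: the names `simulate_misses` and `predicted_row_order_misses`, the 4-byte element size, the 32-byte line size, and the 32 KB direct-mapped organization. It is not the R12000's actual cache configuration and not the simulator used in the study.

```c
#include <stdlib.h>

/* Hypothetical parameters, chosen only for illustration. */
#define ELEM_BYTES 4L     /* assumed size of one matrix element */
#define LINE_BYTES 32L    /* assumed cache line size */
#define NUM_LINES  1024L  /* 1024 lines x 32 bytes = 32 KB, direct-mapped */

/* Analytical model: a row-order sweep over an n x n row-major matrix
 * touches each cache line exactly once, so misses = number of lines.
 * Assumes the matrix starts on a line boundary and the cache is cold. */
long predicted_row_order_misses(long n)
{
    long bytes = n * n * ELEM_BYTES;
    return (bytes + LINE_BYTES - 1) / LINE_BYTES;
}

/* Simulation: count misses in a direct-mapped cache for one sweep over
 * an n x n row-major matrix placed at address 0.
 * row_order != 0: i is the outer loop (walks along rows).
 * row_order == 0: j is the outer loop (walks down columns).
 * Returns -1 if memory for the tag array cannot be allocated. */
long simulate_misses(long n, int row_order)
{
    long *tags = malloc(NUM_LINES * sizeof *tags);
    long misses = 0;
    long k, outer, inner;

    if (tags == NULL)
        return -1;
    for (k = 0; k < NUM_LINES; k++)
        tags[k] = -1;                 /* -1 marks an empty cache line */

    for (outer = 0; outer < n; outer++) {
        for (inner = 0; inner < n; inner++) {
            long i = row_order ? outer : inner;
            long j = row_order ? inner : outer;
            long line = ((i * n + j) * ELEM_BYTES) / LINE_BYTES;
            long set = line % NUM_LINES;
            if (tags[set] != line) {  /* miss: fetch the line */
                tags[set] = line;
                misses++;
            }
        }
    }
    free(tags);
    return misses;
}
```

Under these assumptions, a 1024 x 1024 matrix gives 131072 misses in row order, matching the analytical prediction, while the column-order sweep misses on every one of its 1048576 accesses. A gap of that size is the kind of storage-order effect the paragraph above describes.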
Below is a diagram of three types of microbenchmarks and the events they can generate.

Microbenchmarks

Two counters can count up to 30 events in total; we studied nine. To generate events, we use small programs, or microbenchmarks. Based on the results, conclusions are made about problem areas in code.

1. Decoded instructions
2. Decoded loads
3. Decoded stores
4. Conditional resolved branches
5. Primary instruction cache misses
6. Translation Lookaside Buffer misses
7. Primary data cache misses
8. Secondary data cache misses
9. Secondary instruction cache misses

[Diagram labels: Loop, Linear, Array, Data]

    a = 1; b = 1; c = 1;
    a = b + 1; b = a + 1; c = a + b;
    a = b + c; b = a + c; c = a + b;

    #include <stdlib.h>                  /* atoi */
    #define MAXSIZE 1000000
    int main(int argc, char *argv[]) {
        int a[MAXSIZE], ARRAYSIZE, i;
        if (argc < 2) return 1;          /* array size is required */
        ARRAYSIZE = atoi(argv[1]);
        if (ARRAYSIZE > MAXSIZE) ARRAYSIZE = MAXSIZE;
        for (i = 0; i < ARRAYSIZE; i++)  /* one load and one store per element */
            a[i] = a[i] + 1;
        return 0;
    }

Predictions: Use grep on the assembly file to find events such as loads, stores, and branches.
Simulations (validate predictions): Use sim-outorder from the SimpleScalar simulation tool suite with an R12000 configuration file.
Counter Data: Use the perfex and libperfex interfaces to access the counters.
Compare results: Compare the numbers from steps 1, 2, and 3.

Conclusions

Per figures A-F below, counters accessed by perfex exhibit poorer accuracy than those accessed by libperfex for microbenchmarks with small numbers of events. Per D and E, cache miss counts were not accurate using either interface. Per A, load counts were accurate when the number of generated events was large enough.
The linear microbenchmark did not generate enough data cache misses to provide accurate counts, nor did it provide accurate instruction counts.

Accuracy depends on the interface used, the event being measured, and the application run to generate the events.

[Poster panels: figures A-F; Methodology]

Wendy Korn, Senior SSEAL, Computer Science
Mentor: Dr. Patricia Teller
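The comparison step of the methodology (predicted counts against counter readings) can be sketched as a relative-error calculation. The helper below is a hypothetical illustration, not code from the study. The name `relative_error_pct` and the convention of returning -1.0 when the prediction is zero are both assumptions.

```c
/* Percent difference between a measured counter value and the predicted
 * event count. Returns 0.0 when both are zero and -1.0 (a sentinel for
 * "undefined") when the prediction is zero but the measurement is not. */
double relative_error_pct(long predicted, long measured)
{
    double diff;

    if (predicted == 0)
        return measured == 0 ? 0.0 : -1.0;

    diff = (double)measured - (double)predicted;
    if (diff < 0.0)
        diff = -diff;                 /* absolute difference */
    return 100.0 * diff / (double)predicted;
}
```

With purely illustrative numbers, a reading of 1500 against a prediction of 1000 loads is a 50% error, while the same 500-event discrepancy against a prediction of 1000000 is 0.05%. This is one way to see why a microbenchmark must generate enough events before a counter can be judged accurate.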