Using Information About Cache Evictions to Measure the Interactions of Application Data Structures
Bryan R. Buck, Jeffrey K. Hollingsworth
University of Maryland, Department of Computer Science
Introduction
  Cache behavior information is important
    –Processor speed increasing faster than memory
  Should relate cache info to data structures
    –More useful to programmer in tuning applications
  Collect using hardware
    –Software techniques, such as simulation, are slow
    –In the past, limited hardware support
    –Situation is changing, hardware support more common
Outline
  Measuring cache misses
    –Sampling
  Information about evictions
    –What is required
    –Sampling
  Simulation-based study
    –The simulator and applications used
    –Results
  Conclusions and future work
Finding Objects With Most Cache Misses
  Handling every cache miss is slow
    –Use sampling, requirements:
      Periodic interrupt on cache miss
      Ability to determine miss address
  Associate count with each object
    –Variable or dynamically allocated memory
  Interrupt after every n cache misses
    –Obtain address of miss
    –Find object containing it and increment count
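The sampling steps on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name `MissSampler`, the `(name, start, size)` object description, and the sorted-range lookup are all assumptions made for the example, and the `on_miss` call stands in for the hardware interrupt handler.

```python
import bisect
from collections import Counter


class MissSampler:
    """Attribute every n-th cache miss to the object containing the miss address.

    Hypothetical sketch: object ranges are assumed not to overlap.
    """

    def __init__(self, objects, interval):
        # objects: iterable of (name, start_address, size_in_bytes)
        self.ranges = sorted((start, start + size, name)
                             for name, start, size in objects)
        self.starts = [r[0] for r in self.ranges]
        self.interval = interval   # sample one miss in every `interval`
        self.pending = 0           # misses seen since the last sample
        self.counts = Counter()    # sampled misses per object name

    def find_object(self, addr):
        """Return the name of the object containing addr, or None."""
        i = bisect.bisect_right(self.starts, addr) - 1
        if i >= 0:
            start, end, name = self.ranges[i]
            if addr < end:
                return name
        return None

    def on_miss(self, addr):
        """Called for each miss; only every n-th miss is attributed."""
        self.pending += 1
        if self.pending < self.interval:
            return
        self.pending = 0
        name = self.find_object(addr)
        if name is not None:
            self.counts[name] += 1
```

With real hardware support only the sampled misses would reach software at all; here every miss passes through `on_miss` purely so that the example can be run without special hardware.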
Interactions Between Objects
  Why does data leave the cache?
    –What object caused it to be replaced?
  Hardware could provide eviction information
    –When miss occurs, save address of evicted data
  Not difficult to provide physical address
    –Can calculate from tag of evicted cache line
    –Information in OS can map physical to virtual
      May be imprecise due to paging
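As a rough illustration of "can calculate from tag of evicted cache line": in a cache that is indexed and tagged by physical address, the line address is the tag concatenated with the set index, followed by zeroed offset bits. The sketch below assumes such a cache with power-of-two line size and set count; the function names are hypothetical, and the physical-to-virtual mapping done by the OS is not shown.

```python
def split_address(addr, line_size, num_sets):
    """Split a physical address into (tag, set_index, offset).

    Assumes line_size and num_sets are powers of two.
    """
    offset_bits = line_size.bit_length() - 1
    index_bits = num_sets.bit_length() - 1
    offset = addr & (line_size - 1)
    set_index = (addr >> offset_bits) & (num_sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, set_index, offset


def evicted_line_address(tag, set_index, line_size, num_sets):
    """Rebuild the physical address of a cache line from its tag and set."""
    offset_bits = line_size.bit_length() - 1
    index_bits = num_sets.bit_length() - 1
    return (tag << (index_bits + offset_bits)) | (set_index << offset_bits)
```

The reconstructed address identifies the evicted line, not the exact byte that was last touched, which is sufficient for finding the object that owns the line.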
Measuring Eviction Information
  Use sampling, store more at each miss
    –Object that caused the miss
    –Object containing the data that was evicted
    –Part of code it happened in
  Questions
    –“Buckets” much smaller, will sampling be accurate?
    –Data structure more complicated, how efficient?
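One possible shape for the per-sample record is a counter keyed by the triple (object that missed, object that was evicted, code region). The sketch below is an assumption about how such "buckets" could be stored, not the data structure used in the study; `EvictionSampler`, `find_object`, and the region labels are hypothetical names.

```python
from collections import Counter


class EvictionSampler:
    """Count sampled (evictor, evicted, code region) triples.

    find_object is any callable mapping an address to an object name.
    """

    def __init__(self, interval, find_object):
        self.interval = interval
        self.find_object = find_object
        self.pending = 0
        self.counts = Counter()

    def on_miss(self, miss_addr, evicted_addr, code_region):
        """Record every n-th miss; evicted_addr is None if nothing was evicted."""
        self.pending += 1
        if self.pending < self.interval:
            return
        self.pending = 0
        if evicted_addr is None:
            return
        key = (self.find_object(miss_addr),
               self.find_object(evicted_addr),
               code_region)
        self.counts[key] += 1

    def evictions_of(self, victim):
        """Return {(evictor, region): percent of sampled evictions of victim}."""
        selected = {(evictor, region): n
                    for (evictor, evicted, region), n in self.counts.items()
                    if evicted == victim}
        total = sum(selected.values())
        return {key: 100.0 * n / total for key, n in selected.items()}
```

Because each triple is its own bucket, the same number of samples is spread over many more counters than in plain miss sampling, which is why the accuracy question on this slide arises.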
Experiments
  Implemented in simulation
    –Simulator uses ATOM binary rewriting tool
      Instrument loads/stores for cache simulation
      Instrument basic blocks for virtual cycle count
      Simulates necessary hardware support
    –Miss and eviction sampling runs under simulation
  Tested using SPEC95/2000 applications
    –su2cor, applu, equake, gzip, mgrid, swim, wupwise, …
    –Sampled 1 in 25,000 misses
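To make "simulates necessary hardware support" concrete, here is a small cache model that reports, on each miss, the address of the line it evicted. It is only a sketch: the set-associative organization, the LRU replacement policy, and the class name are assumptions, and it is unrelated to the ATOM-based simulator used in the study.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Tiny LRU set-associative cache that exposes the evicted line address."""

    def __init__(self, size_bytes, line_size, ways):
        self.line_size = line_size
        self.ways = ways
        self.num_sets = size_bytes // (line_size * ways)
        # One ordered dict per set; the oldest entry is the LRU line.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]

    def access(self, addr):
        """Simulate one load or store.

        Returns (hit, evicted_line_address); the second value is None
        on a hit or when the miss filled an empty way.
        """
        line = addr // self.line_size
        cache_set = self.sets[line % self.num_sets]
        if line in cache_set:
            cache_set.move_to_end(line)   # mark as most recently used
            return True, None
        evicted = None
        if len(cache_set) >= self.ways:
            victim, _ = cache_set.popitem(last=False)   # drop the LRU line
            evicted = victim * self.line_size
        cache_set[line] = True
        return False, evicted
```

Feeding every instrumented load and store through `access` yields the two addresses a sampling handler needs on a miss: the address that missed and the address of the line that was replaced.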
Accuracy of Sampling Cache Misses
  Column headers: Application, Variable; Actual, Sample; Rank, %, %
  su2cor: U, R-loops, S, W2-intact, W2-sweep
  swim: UNEW, PNEW, VNEW, CU, H (surviving numeric fragment: 56.99)
Eviction Results: mgrid
Evictions By Code Region: mgrid
  Column headers: Variable, Function, Line; Actual, Sample; Rank, %, %
  U: resid, interp, interp, interp, interp
  V: resid
  R: psinv, resid
  % of total evictions of U by U, V, and R in each line of code.
Cache Misses Due to Instrumentation
Instrumentation Overhead
Simulation Overhead
Using Dyninst
  Better knowledge about objects
    –Local variables
    –FORTRAN common blocks
  Can instrument memory allocation routines
    –Track objects created/destroyed
  Measure by code using hardware counters
    –Save counts at significant points, like Paradyn
      Function entries/exits/calls
    –Turn counting on & off around areas of interest
Instrumenting Loads and Stores
  New BPatch_point type
    –BPatch_loadStore
    –New method, isStore(), returns true or false
  New expression type
    –BPatch_effectiveAddr
      Only valid at BPatch_loadStore points
      Returns the effective address being accessed
Future Work
  Run miss sampling on real hardware
    –IBM POWER3, POWER4
    –Use Dyninst
  Visualization tool
    –Save all data in compact format tool understands
      For tested applications, largest file is 15MB
    –Filter by objects, parts of code
    –Compare data from different runs
  Use results to optimize applications
Future Work Continued
  More uses of eviction information
    –For estimating portion of object in cache
      Use difference of misses and evictions
    –For finding lost opportunities for reuse
      Track evicted data until next load
      Measure interval in time, cache misses, etc.
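The "difference of misses and evictions" idea can be illustrated with a short calculation. Each miss on an object brings one of its lines into the cache and each eviction removes one, so the net difference approximates the number of resident lines. The sketch below is one reading of that bullet, not a method described in the talk; the function names, the scaling by the sampling interval, and the clamping to the valid range are assumptions.

```python
def estimated_bytes_in_cache(sampled_misses, sampled_evictions,
                             interval, line_size):
    """Estimate how many bytes of an object are currently cached.

    sampled_misses / sampled_evictions are sample counts for one object,
    taken at one sample per `interval` misses.
    """
    resident_lines = (sampled_misses - sampled_evictions) * interval
    return max(resident_lines, 0) * line_size


def estimated_fraction_in_cache(sampled_misses, sampled_evictions,
                                interval, line_size, object_size):
    """Same estimate expressed as a fraction of the object's size (0.0 to 1.0)."""
    cached = estimated_bytes_in_cache(sampled_misses, sampled_evictions,
                                      interval, line_size)
    return min(1.0, cached / object_size)
```

Since both counts are sampled, the estimate is noisy for small objects; the clamping only keeps the result in range and does not correct that noise.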
Conclusions
  Features are appearing in new processors
    –Possible to implement cache miss sampling now
    –Much more efficient than software simulation
  Eviction information in hardware practical
    –Sampling is efficient and accurate
  Could use Dyninst
    –For simulation or for hardware