Data Access Profiling & Improved Structure Field Regrouping in Pegasus
Vas Chellappa & Matt Moore
May 2, 2005 / Optimizing Compilers / Project Poster Session
Introduction
- Structure definitions group fields by semantics, not by how close together in time the fields are accessed.
- Data access profiling can improve cache performance by revealing which fields are accessed contemporaneously, so that those fields can be placed together.
- In this context, contemporaneity is a measure of how close in time two accesses to fields of a structure occur.
Problem Statement
- Obtain contemporaneity information for structure fields.
- Exploit this information to improve the ordering of the fields.
- Do both within the CASH/Pegasus environment.
Approach
Pegasus implementation:
- Profile data accesses to track contemporaneous field accesses and build the Field Affinity Graphs.
- To achieve this, modify the simulator interface to SimpleScalar (a third-party architectural simulator, used here for cache simulation).
Regrouping algorithm:
- A new regrouping algorithm uses the Field Affinity Graphs built by the modified simulator to recommend field reorderings.
Project Design
Design Overview
1. Build stage: tag structure field accesses in the Pegasus IR.
2. Simulation stage: propagate the tag information through SimpleScalar to the new regrouping library.
3. Final stage: invoke the regrouping algorithm to calculate reordering recommendations.
Build Stage: Tagging Accesses
- Objective: identify and tag structure field accesses in the Pegasus IR.
- This is not trivial, because SUIF/c2dil do not preserve the required type information when translating the program to the IR.
- Instead, we must identify IR patterns that indicate structure field accesses.
Field Accesses in Pegasus
Actual Pegasus Illustration
Structure passed by value:
    int foo(struct my_t stestfoo)
    {
        int retval = stestfoo.f2;
        return(retval);
    }
Which wire here should have struct type?

Structure passed through a pointer:
    int foo(struct my_t* stestfoo)
    {
        return(stestfoo->f2);
    }
Which wire here has struct type?
Simulation Process
- Tag information on loads and stores is propagated through SimpleScalar to the regrouping library, which builds the field affinity graph online, during simulation.
Regrouping Stage
- After simulation, the collected profiling data is analyzed to produce a reordering recommendation.
- Previous work used a greedy algorithm; the regrouping can be done better than that.
- Finding an optimal regrouping is NP-hard, so a heuristic is needed.
- Field Affinity Graph (one per structure):
  - Vertices: the fields of the structure
  - Edge weights: the degree of contemporaneity of accesses to the two fields
Matching Heuristic
- Find a maximum weight matching in the field affinity graph.
- Pairs of fields that would not fit into a cache line together anyway are identified and ignored.
- The structure is reordered by placing matched fields next to each other.
Greedy vs. Matching
NP-Hardness
- NP-hardness is shown by reducing the graph coloring problem to the regrouping problem.
Results
- The implementation successfully handles structure field accesses made through pointers (ptr->fld).
- So far, only small programs have been tested.
- Reordering is applied manually, and the reordered program is fed into the simulator again to obtain cycle counts for comparison.
Results - Example
Original (750 cycles per call):
    struct my_t {
        int f1;
        int f2;
        char nu[4096];
        int f3;
        int f4;
    };

    int foo(struct my_t *elt)
    {
        int i;
        elt->f1 = 2;
        elt->f4 = 100;
        for (i = 0; i < 50; i++) {
            elt->f1++;
            elt->f4--;
        }
        return elt->f1 + elt->f4;
    }

Modified (745 cycles per call, one fewer cache miss; foo() is unchanged):
    struct my_t {
        int f1;
        int f4;
        int f2;
        char nu[4096];
        int f3;
    };

In the original layout, f1 and f4 are separated by the 4096-byte array nu, so they lie on different cache lines; after regrouping they are adjacent and can share a single line.
Conclusion
- Performance improvements are achievable even on simple programs by applying the reordering recommendations.
- Optimizing non-pointer accesses (e.g., s.fld) would require SUIF/c2dil to propagate full type information from the source.
- In languages that expose less of the memory layout to the programmer than C does, the reordering recommendation could be implemented easily and quickly.
References
- Trishul M. Chilimbi, Bob Davidson, and James R. Larus, "Cache-Conscious Structure Definition," in Proceedings of the ACM SIGPLAN '99 Conference on Programming Language Design and Implementation, pages 13-24, May 1999.
- Mathprog (Weighted Matching Algorithm)
- Pegasus:
- SUIF:
- SimpleScalar Tool set: