1
Structure Layout Optimizations in the Open64 Compiler: Design, Implementation and Measurements
Gautam Chakrabarti and Fred Chow
PathScale, LLC.
2
Open64 Workshop 2008
Outline
  Motivation
  Types of structure layout optimizations
  Criteria for structure layout optimizations
  Implementation details
  Performance results
  Future work
  Conclusion
3
Motivation
  Poor data locality in many applications
  High data cache miss rates
  Growing gap between processor and memory speeds
Our Approach
  Change layout of data structures
  Requires whole-program optimization
  Use Inter-Procedural Analysis and Optimizations (IPA)
Our Aim
  Make applications more cache-friendly
4
IPA
  Summarization
  Analysis
  Optimization
5
Types of Structure Layout Optimizations
  Structure splitting
    struct struct_A {
        double d1;
        double d2;
        int i;
        float f;
        long long l;
        char c;
        struct struct_A * next;
    };
  Structure peeling
    struct struct_A {
        double d1;
        double d2;
        int i;
        float f;
        long long l;
        char c;
    };
6
Structure Splitting Example
  Original:
    struct struct_A {
        double d1;
        double d2;
        int i;
        float f;
        long long l;
        char c;
        struct struct_A * next;
    };
  After splitting:
    struct new_struct_A {
        double d1;
        int i;
        long long l;
        struct new_struct_A * next;
        struct cold_sub_struct_A * p;
    };
    struct cold_sub_struct_A {
        double d2;
        float f;
        char c;
    };
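The effect of splitting on code that uses the type can be sketched as follows. This is a minimal illustration, not Open64 output: the two struct definitions come from the slide, while sum_hot and get_d2 are hypothetical helpers introduced here.

```c
#include <stddef.h>

/* Cold fields moved out of the hot structure (from the slide). */
struct cold_sub_struct_A {
    double d2;
    float f;
    char c;
};

/* Hot fields plus a link to the cold part (from the slide). */
struct new_struct_A {
    double d1;
    int i;
    long long l;
    struct new_struct_A *next;
    struct cold_sub_struct_A *p;
};

/* Hypothetical hot loop: walks the list touching only hot fields,
   so the cold fields are never pulled into the cache. */
double sum_hot(const struct new_struct_A *head) {
    double total = 0.0;
    for (const struct new_struct_A *n = head; n != NULL; n = n->next)
        total += n->d1 + (double)n->i + (double)n->l;
    return total;
}

/* Hypothetical cold access: what was n->d2 before splitting now
   costs one extra indirection through p. */
double get_d2(const struct new_struct_A *n) {
    return n->p->d2;
}
```

The trade-off shown is that hot traversals touch smaller nodes, while each access to a cold field pays for an additional pointer dereference.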
7
Structure Peeling Example
  Original:
    struct struct_A {
        double d1;
        double d2;
        int i;
        float f;
        long long l;
        char c;
    };
  After peeling:
    struct new_struct_A {
        double d1;
        int i;
        long long l;
    };
    struct cold_sub_struct_A {
        double d2;
        float f;
        char c;
    };
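Peeling can be pictured as replacing one array of struct_A with two parallel arrays, one per peeled type. The sketch below assumes that element k of the original array is stored at hot[k] and cold[k]; the container peeled_A and the helpers sum_d1 and get_f are hypothetical names used only for illustration.

```c
#include <stddef.h>

/* Peeled types (from the slide). */
struct new_struct_A {
    double d1;
    int i;
    long long l;
};

struct cold_sub_struct_A {
    double d2;
    float f;
    char c;
};

/* Hypothetical container: element k of the original array is
   represented by hot[k] together with cold[k]. */
struct peeled_A {
    struct new_struct_A *hot;
    struct cold_sub_struct_A *cold;
    size_t count;
};

/* Hot loop: streams through the hot array only. */
double sum_d1(const struct peeled_A *a) {
    double total = 0.0;
    for (size_t k = 0; k < a->count; k++)
        total += a->hot[k].d1;
    return total;
}

/* Cold access: same index, other array, no linking pointer needed. */
float get_f(const struct peeled_A *a, size_t k) {
    return a->cold[k].f;
}
```

Unlike splitting, no pointer from the hot part to the cold part is stored in each element; the shared index ties the two halves together.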
8
Criteria for structure layout optimizations
  Legality Analysis
    Type cast
    Address of a field is taken
    Escaped types
    Parameter types
    Full visibility to IPA
    Alignment restrictions
  Profitability Analysis
    Hotness
    Affinity
    Field accesses at loop level
    Size
9
Implementation Details
  Step 1: Type information summarization (IPL)
  Step 2: Symbol table merging (IPA)
  Step 3: Legality and profitability analysis (IPA analysis)
  Step 4: Transforming the program (IPA optimization)
10
Implementation Details: Type information summarization
  Information summarization in IPL
  Framework for computing static profiles using heuristics
  New TY flag TY_NO_SPLIT
  SUMMARY_TY_INFO
  SUMMARY_LOOP
    For each DO_LOOP, WHILE_DO, DO_WHILE
    Bit-vector to track field accesses of up to N structures for each loop
      Considers field accesses immediately inside the loop
      These fields are considered affine to each other
    Execution count of statements immediately inside the loop
      From statically estimated profiles or from runtime feedback
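The per-loop bit-vector can be sketched as below. This is an illustrative data structure only: loop_summary, record_field_access, fields_are_affine and MAX_TRACKED_FIELDS are hypothetical names, not the actual SUMMARY_LOOP definition, and the sketch tracks a single structure type per loop rather than up to N.

```c
#include <stdint.h>

/* Assumption for this sketch: at most 32 fields are tracked. */
#define MAX_TRACKED_FIELDS 32

/* Hypothetical per-loop summary, loosely modelled on the slide's
   description of SUMMARY_LOOP. */
struct loop_summary {
    uint32_t field_bv;    /* bit j set: field j is accessed immediately inside the loop */
    uint64_t exec_count;  /* estimated or measured execution count */
};

/* Mark a field as accessed in this loop. */
void record_field_access(struct loop_summary *ls, unsigned field_id) {
    if (field_id < MAX_TRACKED_FIELDS)
        ls->field_bv |= (uint32_t)1 << field_id;
}

/* Two fields are treated as affine when the same loop accesses both. */
int fields_are_affine(const struct loop_summary *ls, unsigned a, unsigned b) {
    if (a >= MAX_TRACKED_FIELDS || b >= MAX_TRACKED_FIELDS)
        return 0;
    uint32_t mask = ((uint32_t)1 << a) | ((uint32_t)1 << b);
    return (ls->field_bv & mask) == mask;
}
```

With bit 0 standing for the first field, a loop that touches the first and third fields ends up with the bit-vector 0101.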
11
Implementation Details: IPA Analysis
  Inter-procedurally update statically estimated execution count of PUs
  Update statically estimated loop frequencies in SUMMARY_LOOP
  Consider SUMMARY_LOOP from the hottest P PUs
  Determine candidates for structure-layout transformation
  Determine new layout of structures
12
Implementation Details: IPA Analysis Example

  Loop   Count   BV (F4 F3 F2 F1)
  L1     22      0101
  L2     14      0010
  L3     12      0101
  L4     88      1100
  L5     6       0101

  Affinity group   BV (F4 F3 F2 F1)   Count
  AG1              0101               40
  AG2              0010               14
  AG3              1100               88

  Li: Loops
  Fj: Fields in a struct
  AGk: Affinity groups
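One reading consistent with the numbers in this example is that loops with identical bit-vectors are merged into one affinity group whose weight is the sum of their counts (22 + 12 + 6 = 40 for the 0101 pattern, taking the count of L5 as 6). The sketch below implements that reading only; loop_info, affinity_group and build_affinity_groups are hypothetical names, and the real analysis may use a different grouping rule.

```c
#include <stddef.h>

/* One entry per loop: field bit-vector and execution count. */
struct loop_info {
    unsigned bv;
    unsigned long count;
};

/* One entry per affinity group: shared bit-vector and summed count. */
struct affinity_group {
    unsigned bv;
    unsigned long weight;
};

/* Merge loops with identical bit-vectors, in order of first appearance.
   Returns the number of groups written to `groups`. Loops that would
   need a group beyond `max_groups` are skipped. */
size_t build_affinity_groups(const struct loop_info *loops, size_t n_loops,
                             struct affinity_group *groups, size_t max_groups) {
    size_t n_groups = 0;
    for (size_t i = 0; i < n_loops; i++) {
        size_t g = 0;
        while (g < n_groups && groups[g].bv != loops[i].bv)
            g++;
        if (g == n_groups) {
            if (n_groups == max_groups)
                continue;
            groups[g].bv = loops[i].bv;
            groups[g].weight = 0;
            n_groups++;
        }
        groups[g].weight += loops[i].count;
    }
    return n_groups;
}
```

Fed the five loops of the example, with bit-vectors written as 0x5, 0x2, 0x5, 0xC and 0x5, the function yields three groups weighted 40, 14 and 88.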
13
Implementation Details: Transforming the program
  New type definitions
  Field table update
  Field access statements
  New symbols
  Assignment statements
  Example:
    struct S {
        // N fields
        struct T * p;
        // M fields
    };
    struct T {
        // AG1 fields
        // AG2 fields
    };

    // peel T
    struct S {
        // N fields
        struct T1 * p1;
        struct T2 * p2;
        // M fields
    };
    struct T1 {
        // AG1 fields
    };
    struct T2 {
        // AG2 fields
    };
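How field accesses and pointer assignments change after peeling T can be sketched as follows. The single int members a and b are placeholders standing in for the AG1 and AG2 fields, and read_a, read_b and copy_ptrs are hypothetical helpers; the compiler rewrites its intermediate representation, not source code.

```c
#include <stddef.h>

struct T1 { int a; };   /* placeholder for the AG1 fields */
struct T2 { int b; };   /* placeholder for the AG2 fields */

/* S after peeling T: one pointer per peeled type. */
struct S {
    struct T1 *p1;
    struct T2 *p2;
};

/* Field access: s->p[k].a before peeling becomes s->p1[k].a. */
int read_a(const struct S *s, size_t k) {
    return s->p1[k].a;
}

/* Field access: s->p[k].b before peeling becomes s->p2[k].b. */
int read_b(const struct S *s, size_t k) {
    return s->p2[k].b;
}

/* Assignment: dst->p = src->p before peeling is replicated,
   once for each new pointer. */
void copy_ptrs(struct S *dst, const struct S *src) {
    dst->p1 = src->p1;
    dst->p2 = src->p2;
}
```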
14
Implementation Details: Transforming the program (continued)
  Function calls to memory management routines
  Example:
    p = (T *) malloc (N * sizeof (T));
    if (p == NULL) exit (1);
  Detect memory management routine calls involving transformed type T
  Replicate call, assignment statements
  Update size of memory being allocated
  Handle comparisons involving pointer p
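For a peeled type, the allocation in the example would be replicated once per new type, with the size and the NULL comparison adjusted accordingly. The sketch below shows one plausible shape of the result under stated assumptions: T1 and T2 carry placeholder fields, alloc_peeled and free_peeled are hypothetical helpers, and the sketch returns an error code where the slide's example calls exit (1).

```c
#include <stdlib.h>

struct T1 { double hot; };  /* placeholder for the AG1 fields */
struct T2 { int cold; };    /* placeholder for the AG2 fields */

struct S {
    struct T1 *p1;
    struct T2 *p2;
};

/* Replicated form of: p = (T *) malloc (N * sizeof (T));
   Each call now uses the size of its own peeled type.
   Returns 0 on success, -1 if either allocation fails. */
int alloc_peeled(struct S *s, size_t n) {
    s->p1 = malloc(n * sizeof *s->p1);
    s->p2 = malloc(n * sizeof *s->p2);
    /* The single test (p == NULL) becomes a test on both pointers. */
    if (s->p1 == NULL || s->p2 == NULL) {
        free(s->p1);
        free(s->p2);
        s->p1 = NULL;
        s->p2 = NULL;
        return -1;
    }
    return 0;
}

/* Replicated form of: free (p); */
void free_peeled(struct S *s) {
    free(s->p1);
    free(s->p2);
    s->p1 = NULL;
    s->p2 = NULL;
}
```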
15
Performance Results
  Compilation options: -Ofast at 32-bit ABI
  Speedup due to structure layout optimizations

  Platforms:
    AMD Opteron™ (2.8 GHz, 4GB, 1MB)
    AMD Barcelona (2.0 GHz, 8GB, 512KB)
    Intel® EM64T (3.4 GHz, 4GB, 1MB)
    Intel® Core™ (3.0 GHz, 4GB, 4MB)
    SiCortex MIPS® (500 MHz, 4GB, 256KB)

  Benchmark        Opteron   Barcelona   EM64T   Core    MIPS    Geometric Mean
  179.art          134%      66%         56%     47%     41%     62.5%
  181.mcf          24%       23%                 31%     13%     22.0%
  462.libquantum   32%       17%         40%     72%     62%     39.6%
  Geometric Mean   46.9%     29.6%       37.2%   47.2%   32.1%   37.9%
16
Performance Results (continued)
  Compilation options: -Ofast at 64-bit ABI
  Speedup due to structure layout optimizations

  Platforms:
    AMD Opteron™ (2.8 GHz, 4GB, 1MB)
    AMD Barcelona (2.0 GHz, 8GB, 512KB)
    Intel® EM64T (3.4 GHz, 4GB, 1MB)
    Intel® Core™ (3.0 GHz, 4GB, 4MB)
    SiCortex MIPS® (500 MHz, 4GB, 256KB)

  Benchmark        Opteron   Barcelona   EM64T   Core    MIPS    Geometric Mean
  179.art          169%      66%         53%     60%     45%     69.3%
  181.mcf          25%       35%         12%     30%     7%      18.6%
  462.libquantum   82%       51%         75%     70%     69%     68.6%
  Geometric Mean   70.2%     49.0%       36.3%   50.1%   27.9%   44.6%
17
Performance Results (continued)
  Compilation options: -Ofast at 64-bit ABI
  Multiple copies of 462.libquantum running on multi-core chip
  Platform: Quad-core AMD Barcelona (2.0 GHz, 8GB, 512KB, 2MB)
    3rd-level cache shared among 4 cores
  Speedup from structure layout optimizations

  Benchmark        1 copy   2 copies   4 copies
  462.libquantum   51%      69%        123%
18
Future Work
  Tune static profile estimation
  Fewer restrictions
  Integrate with field-reordering
19
Conclusion
  A framework for performing structure layout transformations is now available in the Open64 compiler.
  The superior infrastructure in the Open64 compiler helped us implement the optimizations cleanly and with relatively little effort.
  Substantial speedups are possible on some of the SPEC CPU2000 and CPU2006 benchmarks.
  Structure layout optimization is a required feature for a compiler to remain competitive.