Ann Gordon-Ross and Frank Vahid*


1 A First Look at the Interplay of Code Reordering and Configurable Caches
Ann Gordon-Ross and Frank Vahid*, Department of Computer Science and Engineering, University of California, Riverside (*also with the Center for Embedded Computer Systems, UC Irvine)
Nikil Dutt, Center for Embedded Computer Systems, School for Information and Computer Science, University of California, Irvine
This work was supported by the U.S. National Science Foundation and by the Semiconductor Research Corporation.

2 Optimizations
Optimization is an important part of the design of an application or system. Optimization targets: area, performance, power and/or energy.

3 Instruction Cache Optimizations
The instruction cache is a good candidate for optimization (Gordon-Ross ‘04).
Instruction caches have predictable spatial and temporal locality: 90% of execution time is spent in 10% of the code.
Instruction caches are power hungry: 29% of power consumption in the ARM920T (Segars ‘01).

4 Instruction Cache Tuning - Code Reordering
Tune the instruction stream for increased cache utilization and thus increased performance.
Reorder the code so that infrequently executed regions of code do not pollute the instruction cache.
Code reordering is typically applied at link time; runtime methods do exist but incur undesirable runtime overhead.
[Figure: application code (int x; x = 5;) with the stages Compile, Link (obj file), Download, and Execute.]

5 Instruction Cache Tuning - Code Reordering
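[Figure: control flow of an input loop before and after code reordering. Blocks: while (input), Read input, Is the input valid?, Process input, Error handling routine, Done. The yes/no branches of the validity check carry execution counts of 100 and 1. After reordering, the Error handling routine is placed last, after Process input and Done.]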

6 Instruction Cache Tuning - Configurable Cache Tuning
Tune the cache to the instruction stream for decreased energy and/or increased performance.
Cache tuning can be performed during application/platform design, or in-system during runtime with no runtime overhead (Zhang - DATE’04).

7 Instruction Cache Tuning - Configurable Cache Tuning
Tunable parameters include cache associativity, total cache size, and cache line size.

8 Motivation - Code Reordering + Cache Configuration
Code reordering tunes the instruction stream for the cache.
Cache configuration tunes the cache to the instruction stream.
How do these optimizations affect each other? Do they complement, obviate, or degrade one another?

9 Pettis and Hansen Code Reordering
Many current code reordering techniques are based heavily on the Pettis and Hansen code reordering algorithm.
It reorders basic blocks using edge profiling to increase locality.
It orders basic blocks so that the most frequently executed path through them is placed as straight-line code.

10 Pettis and Hansen Bottom-up Positioning Algorithm
Process the arc weights of the control flow graph in decreasing order.
For each arc, merge the basic blocks at the source and destination of the arc to form a chain.
If one of the blocks is already in the middle of a chain, form a new chain.
[Figure: control flow graph of basic blocks annotated with execution frequencies, and the resulting reordered basic block chains.]
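The chain-building steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' or PLTO's implementation: the function name, the block names, and the arc weights are hypothetical, and the sketch assumes that two chains are joined only when the arc's source is the tail of one chain and its destination is the head of another. A block that sits in the middle of a chain is simply left where it is.

```python
def bottom_up_chains(blocks, arcs):
    """Greedy chain formation in the style of Pettis and Hansen.

    blocks: list of basic block names (hypothetical labels).
    arcs:   dict mapping (source, destination) to an execution count.
    Returns a list of chains, each a list of block names.
    """
    # Every block starts out as its own single-block chain.
    chain_of = {b: [b] for b in blocks}

    # Process arcs from the heaviest to the lightest weight.
    for (src, dst), _weight in sorted(arcs.items(), key=lambda kv: -kv[1]):
        head_chain, tail_chain = chain_of[src], chain_of[dst]
        if head_chain is tail_chain:
            continue  # both blocks are already in the same chain
        # Assumption: merge only if src ends its chain and dst starts its chain.
        if head_chain[-1] == src and tail_chain[0] == dst:
            head_chain.extend(tail_chain)
            for blk in tail_chain:
                chain_of[blk] = head_chain

    # Collect each distinct chain once, in order of first appearance.
    seen, chains = set(), []
    for blk in blocks:
        chain = chain_of[blk]
        if id(chain) not in seen:
            seen.add(id(chain))
            chains.append(chain)
    return chains


# Hypothetical edge profile loosely modeled on the input-loop example.
blocks = ["loop", "read", "check", "process", "error", "done"]
arcs = {
    ("loop", "read"): 101,
    ("read", "check"): 101,
    ("check", "process"): 100,
    ("process", "loop"): 100,
    ("check", "error"): 1,
    ("error", "done"): 1,
    ("loop", "done"): 1,
}
print(bottom_up_chains(blocks, arcs))
# [['loop', 'read', 'check', 'process'], ['error', 'done']]
```

In this toy profile the hot path ends up as one straight-line chain, and the rarely executed error handling is pushed into a separate chain.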

11 Configurable Cache Architecture
We used the configurable cache architecture proposed by Zhang - ISCA’03

12 Configurable Cache Architecture
The base cache consists of four 2 KByte banks that may individually be shut down for size configuration (way shutdown).
Way concatenation allows for configurable associativity.
[Figure: example configurations, including 8 KBytes, 4 KBytes, and 8 KBytes 2-way.]
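To make the configuration space concrete, the following sketch enumerates the combinations such a cache could offer. It rests on assumptions that the slide does not state: that only 1, 2, or 4 banks are active at a time, that associativity cannot exceed the number of active banks, and that the line sizes are the 16, 32, and 64 bytes used elsewhere in the talk. The names are hypothetical.

```python
BANK_KB = 2                 # each bank holds 2 KBytes (from the slide)
LINE_SIZES = (16, 32, 64)   # bytes; assumed to match the tuning heuristic


def valid_configs():
    """List (size_kb, ways, line_bytes) tuples under the stated assumptions."""
    configs = []
    for active_banks in (1, 2, 4):      # assumption: way shutdown leaves 1, 2, or 4 banks
        size_kb = active_banks * BANK_KB
        ways = 1
        while ways <= active_banks:     # assumption: at most one way per active bank
            for line in LINE_SIZES:
                configs.append((size_kb, ways, line))
            ways *= 2
    return configs


for cfg in valid_configs():
    print(cfg)
```

Under these assumptions the smallest cache is direct-mapped only, while the full 8 KByte cache can be direct-mapped, 2-way, or 4-way.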

13 Configurable Cache Heuristic
First tune cache size: 2, 4, and 8 KBytes.
Then tune cache line size: 16, 32, and 64 bytes.
Finally tune cache associativity: direct-mapped, 2-way, and 4-way.
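The three-stage ordering above can be written as a small greedy search. This is a sketch under assumptions, not the published heuristic: `energy` is a hypothetical callback supplied by the caller (for example, backed by simulation), the search starts from a 2 KByte, direct-mapped, 16-byte-line cache, each stage tries every value of its parameter rather than stopping early, and associativity is assumed to be limited to one way per 2 KByte of cache.

```python
def tune_cache(energy):
    """Tune size, then line size, then associativity, one parameter at a time.

    energy(size_kb, line_bytes, ways) -> float is a hypothetical cost function;
    lower values are better.
    """
    # Assumed starting point: smallest size, smallest line, direct-mapped.
    size, line, ways = 2, 16, 1

    # Stage 1: cache size, with the other parameters held fixed.
    size = min((2, 4, 8), key=lambda s: energy(s, line, ways))

    # Stage 2: line size, using the size chosen in stage 1.
    line = min((16, 32, 64), key=lambda l: energy(size, l, ways))

    # Stage 3: associativity. Assumption: each way needs one 2 KByte bank.
    allowed = [w for w in (1, 2, 4) if w * 2 <= size]
    ways = min(allowed, key=lambda w: energy(size, line, w))

    return size, line, ways


# Toy cost function for illustration only; it is not measured data.
def toy_energy(size_kb, line_bytes, ways):
    return abs(size_kb - 4) + abs(line_bytes - 32) / 16 + abs(ways - 2)


print(tune_cache(toy_energy))  # (4, 32, 2)
```

Because each parameter is tuned in turn, the search evaluates only a handful of configurations instead of every combination, which is why the evaluation compares it against exhaustive search.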

14 Evaluation Framework
Benchmarks: Powerstone, MediaBench, EEMBC.
Without code reordering: execute the application to obtain hit and miss ratios for each configuration.
With code reordering: instrument the executable to gather edge profiles, execute the application to gather the edge profiles, then provide the edge profiles to PLTO* (Pentium Link Time Optimizer) to perform code reordering and produce the code reordered executable.
In both cases the cache exploration heuristic yields the chosen cache configuration; an exhaustive search is run for comparison purposes.
Energy model: cache energy - Cacti; main memory energy - Samsung memory.
*Provided by the University of Arizona

15 Results - Energy Savings
Base cache = 2 KB, direct-mapped, 16-byte line size.
[Figure: energy comparison of four setups: base cache without code reordering, base cache with code reordering, configured cache without code reordering, configured cache with code reordering.]
Code reordering alone = 3.5% energy reduction.
Cache configuration alone = 15% energy reduction.
Cache configuration + code reordering = 17% energy reduction.

16 Results - Performance Benefits
[Figure: performance comparison of four setups: base cache without code reordering, base cache with code reordering, configured cache without code reordering, configured cache with code reordering.]
Code reordering alone = 3.5% performance benefit.
Cache configuration alone = 17% performance benefit.
Cache configuration + code reordering = 18.5% performance benefit.
On average, code reordering gives little additional benefit over cache configuration alone. However, a few benchmarks see added benefits.

17 Change in Cache Requirements Due to Code Reordering
[Figure: per-benchmark change in cache requirements due to code reordering. Markers: x - larger line size; * - smaller cache size; reduction in cache area. Benchmark suites: *Powerstone, **Mediabench, ***EEMBC.]

18 Conclusions
We explore the interplay of two instruction cache optimization techniques: code reordering and cache configuration.
Cache configuration largely obviates the need for code reordering with respect to energy and performance.
Cache configuration applied dynamically during runtime eliminates the need for designer-applied code reordering.
Code reordering improved cache utilization in 52% of the benchmarks.
It reduced instruction cache size by an average of 13% and by as much as 90%, which is beneficial for small custom synthesized embedded systems where area is critical.

19 Future Work
We plan to use a more advanced code reordering methodology that takes into account set associativity or multiple levels of cache.
We plan to study the iterative interplay of code reordering and cache configuration using a code reordering technique that takes the cache configuration into consideration.

