Published by Wilfrid Richardson. Modified over 8 years ago.

1 Compiler Managed Dynamic Instruction Placement In A Low-Power Code Cache
Rajiv Ravindran, Pracheeti Nagarkar, Ganesh Dasika, Robert Senger, Eric Marsman, Scott Mahlke, Richard Brown
Department of Electrical Engineering and Computer Science
University of Michigan, Ann Arbor

2 Introduction
Instruction fetch power is dominant in low-power embedded processors
–~27% for the StrongARM
–~50% for the Motorola MCORE
Two alternatives
Scratch-pad
+ No hardware overhead
+ Part of the physical address space
- Managed in software
Instruction cache
+ Hardware managed
+ Transparent to the user
- Power-hungry tag-checking and comparison logic

3 Focus Of This Work
Explore the use of a scratch-pad for reducing instruction fetch power
Two possible software-managed schemes
Static
–Map ‘hot’ regions prior to execution
–Contents do not change during execution
Dynamic
–Allow contents to change during execution
–Explicit copying of ‘hot’ regions

4 Scratch-pad Management: Static Approach
[Figure: control flow graph with basic blocks BB1 to BB14; traces T1, T2 and T3 marked]

5 Scratch-pad Management: Static Approach
[Figure: same control flow graph with traces T1, T2 and T3]
Trace   size (bytes)   freq   profit
T1      64             100    6400
T2      32             1000   32000
T3      32             50     1600
profit = size * freq

6 Scratch-pad Management: Static Approach
[Figure: same control flow graph; 96-byte scratch-pad holding T2 (32 bytes) and T1 (64 bytes)]
Equivalent to bin-packing
Trace   size (bytes)   freq   profit
T1      64             100    6400
T2      32             1000   32000
T3      32             50     1600
profit = size * freq
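
The static selection above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it treats the problem as a 0/1 knapsack solved by dynamic programming (a swapped-in technique; the slides only say the problem is equivalent to bin-packing). The function name select_static and the dictionary layout are assumptions made for this example; the sizes, frequencies and the 96-byte capacity come from the slide.

```python
def select_static(traces, capacity):
    """Pick the subset of traces with maximum total profit that fits in capacity.

    traces maps a trace name to (size_in_bytes, execution_frequency).
    Profit follows the slide: profit = size * freq.
    Hypothetical helper: 0/1 knapsack via dynamic programming over used bytes.
    """
    best = {0: (0, [])}  # used bytes -> (total profit, chosen trace names)
    for name, (size, freq) in traces.items():
        profit = size * freq
        # Iterate over a snapshot so each trace is added at most once.
        for used, (total, chosen) in list(best.items()):
            new_used = used + size
            if new_used > capacity:
                continue
            candidate = (total + profit, chosen + [name])
            if new_used not in best or candidate[0] > best[new_used][0]:
                best[new_used] = candidate
    return max(best.values(), key=lambda entry: entry[0])


# Values from the slide's table; the scratch-pad holds 96 bytes.
TRACES = {"T1": (64, 100), "T2": (32, 1000), "T3": (32, 50)}
total_profit, chosen_traces = select_static(TRACES, capacity=96)
print(total_profit, sorted(chosen_traces))
```

With the slide's numbers, the sketch selects T1 and T2, which matches the 96-byte layout shown on the slide.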

7 Scratch-pad Management: Dynamic Approach
[Figure: control flow graph with traces T1, T2 and T3; plot of scratch-pad space against time for a 96-byte scratch-pad split into a 64-byte and a 32-byte region]
Steps so far: copy T1 (64-byte region)

8 Scratch-pad Management: Dynamic Approach
[Figure: same control flow graph and space/time plot]
Steps so far: copy T1, copy T2

9 Scratch-pad Management: Dynamic Approach
[Figure: same control flow graph and space/time plot]
Steps so far: copy T1, copy T2, copy T3 over T2

10 Scratch-pad Management: Dynamic Approach
[Figure: same control flow graph and space/time plot]
Steps so far: copy T1, copy T2, copy T3 over T2, copy T2 over T3

11 Scratch-pad Management: Dynamic Approach
[Figure: same control flow graph and space/time plot, with copy points marked in the graph and on the time axis]
Copy points: Copy1 for T1, Copy2 for T2, Copy3 for T2, Copy4 for T3
Steps: copy T1, copy T2, copy T3 over T2, copy T2 over T3, copy T3 over T2

12 Objectives Of This Work
Develop a dynamic compiler-managed scheme to exploit the scratch-pad
Prior work [Verma et al., ’04]
–ILP-based solution
–Not scalable
–Limits scope of analysis to a single procedure and loop nests
Practical solution
–Scalable
–Handles arbitrary control flow graphs
–Inter-procedural analysis

13 Our Approach
Two phases
–Trace selection & scratch-pad (SP) allocation
   Identify frequently executed traces
   Select the most energy-beneficial traces
   Place them with possible overlap to reduce copy overhead
–Copy placement
   Insert copies to realize the placement
   Hoist within the control flow graph to minimize overhead
   Fix branch targets into selected traces

14 SP Allocation: Computing Energy Gain
[Figure: control flow graph with traces T1, T2 and T3]
Benefit = ProfileWeight * Size * FetchEnergy
CopyCost = Size * (FetchEnergy + WriteEnergy)
Energy Gain = Benefit - CopyCost
Benefit: energy savings when the trace is executed from the scratch-pad instead of memory
CopyCost: overhead associated with copying the trace once
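
As a worked example of the three formulas, the sketch below evaluates them for T1, T2 and T3. The per-byte energies (0.5 nJ to fetch, 1 nJ to write) are not stated anywhere in the slides; they are back-solved assumptions, chosen because together with the size and frequency table from the static-approach slides they reproduce the energy gains quoted later in the deck (3104 nJ, 15952 nJ, 752 nJ). Treating "freq" from that table as ProfileWeight is likewise an assumption.

```python
# Assumed per-byte energies in nJ (inferred, not given in the slides).
FETCH_NJ_PER_BYTE = 0.5
WRITE_NJ_PER_BYTE = 1.0


def copy_cost(size, fetch=FETCH_NJ_PER_BYTE, write=WRITE_NJ_PER_BYTE):
    """CopyCost = Size * (FetchEnergy + WriteEnergy): cost of copying a trace once."""
    return size * (fetch + write)


def energy_gain(size, profile_weight, fetch=FETCH_NJ_PER_BYTE, write=WRITE_NJ_PER_BYTE):
    """Energy Gain = Benefit - CopyCost, with Benefit = ProfileWeight * Size * FetchEnergy."""
    benefit = profile_weight * size * fetch
    return benefit - copy_cost(size, fetch, write)


# (size in bytes, profile weight) taken from the static-approach table.
TRACES = {"T1": (64, 100), "T2": (32, 1000), "T3": (32, 50)}
GAINS = {name: energy_gain(size, weight) for name, (size, weight) in TRACES.items()}
print(GAINS)
```

Under these assumed energies, a single copy of T1 costs 96 nJ and a single copy of T2 or T3 costs 48 nJ.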

15 SP Allocation: Placing Traces
[Figure: control flow graph with traces T1, T2 and T3]
Trace sequence shown: T1 T1 T2 T2 T2 T1 T1 T2 T2 T2 T3 T1 T1 T2 T2 T2 T3
Annotations: initial copy of T1, initial copy of T2, recopy of T1, recopy of T2
Dynamic Copy Cost: # copies of T1 * CopyCost(T1) + # copies of T2 * CopyCost(T2)

16 Temporal Relationship Graph [Gloy et al., ’97]
[Figure: graph with nodes T1, T2 and T3]
Edge weights between two nodes denote the Dynamic Copy Cost
Trace sequence shown: T1 T1 T2 T2 T2 T1 T1 T2 T2 T2 T3 T1 T1 T2 T2 T2 T3
Annotations: copy of T2, copy of T1, copy of T2
Example edge weight: 2 * CopyCost(T1) + 2 * CopyCost(T2)
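
The Dynamic Copy Cost between two traces can be illustrated with a short sketch. It assumes the two traces share one scratch-pad region, so a copy is needed whenever execution switches from one to the other, and it counts the initial copies as well (following the "initial copy" annotations on slide 15). The helper names and the per-copy costs of 96 nJ and 48 nJ are illustrative assumptions, not values given in the slides.

```python
def count_copies(sequence, a, b):
    """Count how often traces a and b must be copied if they overlap in the scratch-pad.

    sequence is the order in which traces execute; other traces are ignored
    because they are assumed to live in a different region.
    """
    copies = {a: 0, b: 0}
    resident = None  # trace currently held in the shared region
    for trace in sequence:
        if trace not in copies:
            continue
        if trace != resident:
            copies[trace] += 1
            resident = trace
    return copies


def dynamic_copy_cost(sequence, a, b, cost_per_copy):
    """# copies of a * CopyCost(a) + # copies of b * CopyCost(b)."""
    copies = count_copies(sequence, a, b)
    return copies[a] * cost_per_copy[a] + copies[b] * cost_per_copy[b]


# First ten entries of the sequence shown on the slide.
SEQUENCE = "T1 T1 T2 T2 T2 T1 T1 T2 T2 T2".split()
COPIES = count_copies(SEQUENCE, "T1", "T2")
# Assumed per-copy costs in nJ, for illustration only.
COST = dynamic_copy_cost(SEQUENCE, "T1", "T2", {"T1": 96, "T2": 48})
print(COPIES, COST)
```

For this sequence the sketch counts two copies of each trace, which corresponds to the form 2 * CopyCost(T1) + 2 * CopyCost(T2) shown on the slide.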
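
17 SP Allocation: Placing Traces
Scratch-pad: 96 bytes
Energy Gain: T1 3104 nJ
Energy Gain: T2 15952 nJ
Energy Gain: T3 752 nJ
Placed so far: T2

18 SP Allocation: Placing Traces
Scratch-pad: 96 bytes
Energy Gain: T1 3104 nJ
Energy Gain: T2 15952 nJ
Energy Gain: T3 752 nJ
Placed so far: T2, T1

19 SP Allocation: Placing Traces
Scratch-pad: 96 bytes, holding T2 and T1
[Figure: temporal relationship graph over T1, T2 and T3 with edge weights 432 nJ, 96 nJ and 144 nJ]

20 SP Allocation: Placing Traces
Scratch-pad: 96 bytes, with T2 and T3 sharing one region and T1 in the other
[Figure: temporal relationship graph over T1, T2 and T3 with edge weights 432 nJ, 96 nJ and 144 nJ]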
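
The placement walk-through on slides 17 to 20 can be approximated by a greedy sketch: take traces in decreasing order of energy gain, give each its own region while space remains, and otherwise overlap it with the existing region whose occupants have the lowest temporal-relationship edge weight. This is an illustration under stated assumptions, not the authors' algorithm. In particular, the assignment of the three edge weights to trace pairs is inferred from the final placement (T3 shares a region with T2), and the rule that an overlap must cost less than the trace's gain is an assumption.

```python
def place_traces(gains, sizes, conflict, capacity):
    """Greedy scratch-pad placement with overlap (hypothetical helper).

    gains: trace -> energy gain, sizes: trace -> bytes,
    conflict: frozenset({a, b}) -> dynamic copy cost if a and b overlap.
    Returns a list of regions, each a list of traces sharing that region.
    """
    regions = []  # each region: {"size": bytes, "traces": [names]}
    used = 0
    for trace in sorted(gains, key=gains.get, reverse=True):
        if used + sizes[trace] <= capacity:
            # Free space left: give the trace a region of its own.
            regions.append({"size": sizes[trace], "traces": [trace]})
            used += sizes[trace]
            continue
        # Otherwise find the cheapest region to share.
        best_cost, best_region = None, None
        for region in regions:
            if region["size"] < sizes[trace]:
                continue
            cost = sum(conflict[frozenset((trace, other))] for other in region["traces"])
            if cost < gains[trace] and (best_cost is None or cost < best_cost):
                best_cost, best_region = cost, region
        if best_region is not None:
            best_region["traces"].append(trace)
    return [region["traces"] for region in regions]


GAINS = {"T1": 3104, "T2": 15952, "T3": 752}  # nJ, from the slides
SIZES = {"T1": 64, "T2": 32, "T3": 32}        # bytes, from the slides
# Edge-to-pair assignment below is an assumption.
CONFLICT = {
    frozenset(("T1", "T2")): 432,
    frozenset(("T2", "T3")): 96,
    frozenset(("T1", "T3")): 144,
}
PLACEMENT = place_traces(GAINS, SIZES, CONFLICT, capacity=96)
print(PLACEMENT)
```

With these inputs the sketch places T2 and T1 in separate regions and overlaps T3 with T2, which matches the final layout on slide 20.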

21 Copy Placement
Initially, naively place copies at trace entry points
–Guarantees correct but inefficient execution
Reduce the copy overhead
–Identify frequently executed copies
–Iteratively hoist copies to less frequently executed blocks
–Remove redundant copies
–Ensure that the hoists and removals are legal, i.e., traces are present in the scratch-pad prior to execution

22 Copy Placement: Initial Placement
[Figure: control flow graph with traces T1, T2 and T3 and copies C1-T1, C2-T1, C3-T1, C1-T2, C3-T2, C1-T3, C2-T3 at trace entry points]

23 Copy Placement: Redundant Copies
[Figure: same control flow graph and copies; scratch-pad regions hold {T2, T3} and {T1}]

24 Copy Placement: Hoisting
[Figure: control flow graph with remaining copies C1-T1, C1-T2, C1-T3]
Live-Range
T1: BB4, BB6, BB7
T2: BB9, BB10
T3: BB12

25 Copy Placement: Hoisting
[Figure: control flow graph with copies C1-T2 and C1-T3; scratch-pad regions hold {T2, T3} and {T1}]
Live-range of T2 before hoist

26 Copy Placement: Hoisting
[Figure: control flow graph with copies C1-T2 and C1-T3; scratch-pad regions hold {T2, T3} and {T1}]
Live-range of T2 after hoist: legal
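
The hoisting step can be sketched as a walk up the control flow graph: a copy moves to a parent block as long as that block executes less often and the move stays legal. This is a simplified illustration, not the authors' algorithm. The block names, frequencies and the parent map (standing in for something like an immediate-dominator relation) are hypothetical, and legality is reduced to one check: the copy may not move into a block that lies in the live range of another trace sharing the same scratch-pad region.

```python
def hoist_copy(block, parent, freq, blocked):
    """Return the block where a copy ends up after iterative hoisting.

    parent: block -> block to hoist into (hypothetical, e.g. immediate dominator)
    freq: block -> execution frequency from profiling
    blocked: blocks in the live range of a trace that overlaps in the scratch-pad
    """
    current = block
    while True:
        candidate = parent.get(current)
        if candidate is None:
            return current  # reached the top of the chain
        if candidate in blocked:
            return current  # hoisting further would overwrite a live trace
        if freq[candidate] >= freq[current]:
            return current  # no reduction in copy overhead
        current = candidate


# Hypothetical chain of blocks and profile counts, for illustration only.
PARENT = {"body": "preheader", "preheader": "outer", "outer": "entry"}
FREQ = {"body": 1000, "preheader": 10, "outer": 10, "entry": 1}

HOISTED = hoist_copy("body", PARENT, FREQ, blocked=set())
STUCK = hoist_copy("body", PARENT, FREQ, blocked={"preheader"})
print(HOISTED, STUCK)
```

In this example the copy leaves the hot block and stops at the first parent that offers no further frequency reduction; when that parent is in a conflicting live range, the copy stays where it is.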

27 Experimental Setup
Trimaran compiler framework
Measured instruction fetch power
Varied scratch-pad size from 32 bytes to 4 Kbytes
Two configurations
WIMS microcontroller at the Univ. of Michigan
–On-chip memory and scratch-pad
–Static vs dynamic schemes
–PowerMill
Conventional processor
–Off-chip memory, on-chip scratch-pad vs on-chip I-cache
–CACTI model
–Scratch-pad vs I-cache
DMA copying
–2 bytes per cycle, stalling

28 Energy Savings: Static vs Dynamic
[Figure: WIMS energy savings, 64-byte scratch-pad. Bar chart of % energy improvement (0 to 60) for the dynamic and static schemes on fir, rawcaudio, rawdaudio, g721encode, g721decode, mpeg2enc, mpeg2dec, pegwitenc, pegwitdec, pgpencode, pgpdecode, gsmencode, gsmdecode, epic, unepic, cjpeg, djpeg, sha, blowfish and the average]
Average savings for Dynamic: 28%
Average savings for Static: 17%

29 Effect of Varying Scratch-pad Size
Benchmark: pegwitenc
[Figure: % hit rate (0 to 100) against SP size (32 to 4096 bytes), static hit rate vs dynamic hit rate]
[Figure: % energy savings (0 to 35) against SP size (32 to 4096 bytes), static energy vs dynamic energy]

30 Scratch-pad Size For 95% Hit Rate
[Figure: bar chart of scratch-pad size in bytes (0 to 9000) for the static and dynamic schemes on cjpeg, djpeg, epic, unepic, g721encode, g721decode, gsmencode, gsmdecode, mpeg2enc, mpeg2dec, pegwitenc, pegwitdec, pgpencode, pgpdecode, rawcaudio, rawdaudio, blowfish, fir, sha and the average]
Dynamic is 2.5x better than static

31 Energy Savings: SP vs I-Cache
[Figure: CACTI energy savings, 64-byte scratch-pad/I-cache. Bar chart of % energy improvement (-60 to 120) for dynamic, static and I-cache on fir, rawcaudio, rawdaudio, g721encode, g721decode, mpeg2enc, mpeg2dec, pegwitenc, pegwitdec, pgpencode, pgpdecode, gsmencode, gsmdecode, epic, unepic, cjpeg, djpeg, sha, blowfish and the average]
Average savings for Dynamic: 48%
Average savings for Static: 25%
Average savings for I-cache: 30%

32 Conclusions
Compiler-directed dynamic placement in scratch-pad
–Arbitrary control flow graph
–Inter-procedural
–Two phases: SP allocation & copy placement
28% savings for dynamic as compared to 16% for static for a 64-byte scratch-pad
41% savings for dynamic as compared to 31% for static for a 256-byte scratch-pad
2 to 10% stall cycles
Within 0 to 11% of optimal, but scalable

33 For more information
http://cccp.eecs.umich.edu
Thank You!