August Code Compaction for UniCore on Link-Time Optimization Platform Zhang Jiyu Compilation Toolchain Group MPRC.


1 Code Compaction for UniCore on Link-Time Optimization Platform
Zhang Jiyu, Compilation Toolchain Group, MPRC

2 ICDFN 2006 August Compilation Process

3 Our Optimization Process

4 CLOU is a link-time optimizer for UniCore
[Figure, modified from Diablo: the code, data, and metadata of several object files are linked, translated to IR, go through CFG construction and optimizations, and then pass through layout and assembling to produce the executable.]
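CFG construction is the step every later optimization depends on. The sketch below shows the standard leader-based way to cut a linear instruction list into basic blocks and connect them. It is a generic illustration, not CLOU's actual IR: the mnemonics `b`, `beq`, `bne`, `ret` and the `(mnemonic, target)` tuple format are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical mnemonics used only for this sketch.
UNCOND = {"b"}            # unconditional branch
COND = {"beq", "bne"}     # conditional branches
RETURN = {"ret"}          # return: no successor inside the function
BRANCH = UNCOND | COND

Code = List[Tuple[str, Optional[int]]]  # (mnemonic, branch target index or None)


def build_cfg(code: Code) -> Tuple[List[Tuple[int, int]], Dict[int, List[int]]]:
    """Split a linear instruction list into basic blocks and connect them.

    Blocks are half-open index ranges [start, end); edges map a block id
    to the ids of its successor blocks.
    """
    # A leader is the first instruction, a branch target, or an
    # instruction that directly follows a branch or return.
    leaders = {0}
    for i, (op, target) in enumerate(code):
        if op in BRANCH:
            leaders.add(target)
        if (op in BRANCH or op in RETURN) and i + 1 < len(code):
            leaders.add(i + 1)

    starts = sorted(leaders)
    blocks = list(zip(starts, starts[1:] + [len(code)]))
    block_of = {start: b for b, (start, _) in enumerate(blocks)}

    edges: Dict[int, List[int]] = {}
    for b, (start, end) in enumerate(blocks):
        op, target = code[end - 1]         # last instruction decides the successors
        succ = []
        if op in BRANCH:
            succ.append(block_of[target])  # taken edge
        if op not in UNCOND and op not in RETURN and end < len(code):
            succ.append(block_of[end])     # fall-through edge
        edges[b] = succ
    return blocks, edges


code = [
    ("cmp", None),  # 0
    ("beq", 4),     # 1: if equal, jump to 4
    ("mov", None),  # 2
    ("b", 5),       # 3: jump over the else part
    ("mov", None),  # 4
    ("ret", None),  # 5
]
blocks, edges = build_cfg(code)
```

For this if-then-else fragment the result is four blocks, with block 0 branching to blocks 2 and 1 and both arms joining at block 3.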

5 Code Compaction Based on CLOU
Motivation of code compaction
– Embedded systems have limited memory and energy resources
– Code density affects both memory and energy consumption
Goal: reduce code size without losing performance
Code compaction at three levels:
1. Typical optimizations for code size reduction at link time
2. Hot/cold code splitting
3. A new mixed code generation method

6 Typical Optimizations for Code Size Reduction
Redundant code elimination
– Removes computations whose results have already been computed and are guaranteed to be available at that point
Unreachable code elimination
– Removes code fragments that no control-flow path from the entry node reaches
– Many of them follow useless comparisons
Dead code elimination
– Removes computations whose results are never used
Peephole optimization
Procedural abstraction (might lead to performance loss)
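Two of the optimizations listed above can be sketched in a few lines. This is a minimal illustration under simplifying assumptions, not CLOU's implementation: the CFG is a plain dict of successor lists, and an instruction is a hypothetical `(destination, sources, has_side_effect)` tuple. Unreachable code is found by a breadth-first search from the entry block, and dead code by a backward liveness pass over one basic block.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple


def reachable_blocks(edges: Dict[int, List[int]], entry: int = 0) -> Set[int]:
    """Blocks reachable from the entry node; all others are unreachable code."""
    seen = {entry}
    work = deque([entry])
    while work:
        b = work.popleft()
        for s in edges.get(b, []):
            if s not in seen:
                seen.add(s)
                work.append(s)
    return seen


# Hypothetical instruction form: (destination register or None, source registers, has side effect)
Instr = Tuple[Optional[str], Tuple[str, ...], bool]


def eliminate_dead_code(block: List[Instr], live_out: Set[str]) -> List[Instr]:
    """Backward pass over one basic block: drop instructions whose result is
    never read afterwards and which have no side effects."""
    live = set(live_out)
    kept = []
    for dest, srcs, side_effect in reversed(block):
        if dest is not None and dest not in live and not side_effect:
            continue  # dead: the result is never used
        if dest is not None:
            live.discard(dest)  # this definition satisfies later uses
        live.update(srcs)       # the sources are live before this instruction
        kept.append((dest, srcs, side_effect))
    kept.reverse()
    return kept


cfg = {0: [1], 1: [3], 2: [3], 3: []}  # block 2 has no predecessor
reachable = reachable_blocks(cfg)

block = [
    ("r1", ("r2",), False),  # r1 = f(r2): dead, overwritten before any use
    ("r1", ("r3",), False),  # r1 = f(r3)
    (None, ("r1",), True),   # store r1 (side effect, always kept)
    ("r4", ("r1",), False),  # r4 = f(r1): dead, r4 is not live on exit
]
cleaned = eliminate_dead_code(block, live_out=set())
```

Block 2 is dropped as unreachable, and the first and last instructions are dropped as dead.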

7 Experiments for Typical Optimizations for Code Size Reduction
Benchmark: MediaBench
– Code size reduction: average 12.8%, maximum 22.3%
– Performance improvement: average 2.4%, maximum 4.2%

8 Hot/Cold Code Splitting
Less code is transferred from remote to local storage, from disk to memory, or from memory to cache
– Question: might this be too conservative, or lead to performance loss?
Hot and cold code are split through basic block reordering
[Figure: three alternative layouts (1–3) of a condition block, its hot code, its cold code, and the code that follows]
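The end effect of hot/cold splitting can be illustrated with a simple profile threshold. This is only a sketch of the outcome; the slides achieve the split through basic block reordering, and the `threshold` parameter and block names below are hypothetical.

```python
from typing import Dict, List


def split_hot_cold(order: List[str], counts: Dict[str, int],
                   threshold: int) -> List[str]:
    """Return a layout with hot blocks first and cold blocks moved to the end.

    A block is hot if its profiled execution count reaches `threshold`
    (a hypothetical tuning parameter). Relative order is preserved within
    each part, so hot fall-through paths stay contiguous.
    """
    hot = [b for b in order if counts.get(b, 0) >= threshold]
    cold = [b for b in order if counts.get(b, 0) < threshold]
    return hot + cold


order = ["entry", "cond", "hot", "cold", "more"]
counts = {"entry": 1000, "cond": 1000, "hot": 990, "cold": 10, "more": 1000}
layout = split_hot_cold(order, counts, threshold=100)
# layout == ["entry", "cond", "hot", "more", "cold"]
```

Moving `cold` away from its neighbors means it now needs an explicit jump back to `more`, which is the kind of overhead the slide's question about performance loss refers to.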

9 Hot/Cold Code Splitting
PH: a popular greedy approach
Structural-analysis-based basic block reordering
– Most of a program can be decomposed into several typical structures
– A cost model for each structure
– Minimal-cost layout → optimal layout for each local structure, based on profiling information
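For contrast with the structural approach, here is a sketch of the greedy baseline. We assume "PH" refers to the Pettis–Hansen chain-merging heuristic, which the slide does not spell out: edges are visited by decreasing profile weight, and two chains are glued whenever the edge runs from the tail of one chain to the head of another. The chain ordering at the end (entry chain first, then the rest in original block order) is a simplifying choice of ours.

```python
from typing import Dict, List, Tuple


def pettis_hansen_chains(blocks: List[str],
                         edges: Dict[Tuple[str, str], int]) -> List[str]:
    """Greedy bottom-up block chaining; blocks[0] is the entry block."""
    chain_of = {b: [b] for b in blocks}  # every block starts in its own chain
    for (u, v), _weight in sorted(edges.items(), key=lambda kv: -kv[1]):
        cu, cv = chain_of[u], chain_of[v]
        # Merge only if u ends its chain and v starts a different chain,
        # so the edge u -> v becomes a fall-through.
        if cu is not cv and cu[-1] == u and cv[0] == v:
            cu.extend(cv)
            for b in cv:
                chain_of[b] = cu
    # Emit each distinct chain once: the entry chain first, then the others.
    layout, emitted = [], set()
    for b in blocks:
        chain = chain_of[b]
        if id(chain) not in emitted:
            emitted.add(id(chain))
            layout.extend(chain)
    return layout


blocks = ["A", "B", "C", "D"]  # A: condition, B: cold arm, C: hot arm, D: join
edges = {("A", "B"): 10, ("A", "C"): 90, ("B", "D"): 10, ("C", "D"): 90}
layout = pettis_hansen_chains(blocks, edges)
```

The hot path A, C, D becomes one chain and the cold arm B is placed after it.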

10 Basic Block Reordering Cost Model
– Different kinds of control-flow edges have different costs
– For a specific order, the cost is determined by the kinds and frequencies of its control-flow edges
– A list can be built for each structure: f(structure, frequencies of all edges) → the best order of basic blocks for the local structure
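A cost model of this kind can be sketched as follows. The per-kind penalties in `PENALTY` are hypothetical placeholders, not the values used in the talk: an edge costs nothing when its target is laid out right after its source (fall-through) and `frequency × penalty[kind]` otherwise. Because each local structure has only a handful of blocks, f(structure, frequencies) can simply enumerate every order that keeps the structure's entry block first.

```python
from itertools import permutations
from typing import List, Tuple

Edge = Tuple[str, str, str, int]  # (source, destination, kind, frequency)

# Hypothetical per-kind penalty paid when the destination does NOT directly
# follow the source in the layout.
PENALTY = {"cond": 2, "jump": 1}


def layout_cost(order: List[str], edges: List[Edge]) -> int:
    pos = {b: i for i, b in enumerate(order)}
    cost = 0
    for src, dst, kind, freq in edges:
        if pos[dst] != pos[src] + 1:  # not a fall-through in this order
            cost += freq * PENALTY[kind]
    return cost


def best_order(blocks: List[str], edges: List[Edge]) -> Tuple[List[str], int]:
    """Exhaustively pick the cheapest order; blocks[0] stays first."""
    entry, rest = blocks[0], blocks[1:]
    candidates = ([entry, *p] for p in permutations(rest))
    best = min(candidates, key=lambda o: layout_cost(o, edges))
    return best, layout_cost(best, edges)


# An if-then-else structure whose "then" arm is hot.
blocks = ["cond", "then", "else", "join"]
edges = [
    ("cond", "then", "cond", 90),
    ("cond", "else", "cond", 10),
    ("then", "join", "jump", 90),
    ("else", "join", "jump", 10),
]
order, cost = best_order(blocks, edges)
```

The cheapest layout places the hot arm and the join right after the condition and pushes the cold arm to the end, at a cost of 30 under these placeholder penalties.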

11 Experiments
Complexity: O(N log N), where N is the number of basic blocks
Experimental results (without other link-time optimizations)
[Figures: normalized cycle counts; normalized cache miss rate]

12 Mixed Code Generation
Dual-width instruction set
– 32-bit ISA: more powerful
– 16-bit ISA: more compact, but with less coding space for operations, smaller register fields, and smaller immediate fields
Example: one 32-bit instruction and its 16-bit equivalent
32-bit: add r0, r0, 0xff800000
16-bit: str r2, [addr]; mov r2, 0xff; lsl r2, #1; add r2, #1; lsl r2, #23; add r0, r2; ld r2, [addr]
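The 16-bit sequence can be checked by simulating it on 32-bit registers. The final shift must be 23 for the result to equal 0xff800000, since 0x1ff << 23 = 0xff800000 (a shift of 24 would yield 0xff000000). The `str`/`ld` pair only saves and restores r2, so it is not modeled.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide


def add_32bit(r0: int) -> int:
    """add r0, r0, 0xff800000 (one 32-bit instruction)."""
    return (r0 + 0xFF800000) & MASK32


def add_16bit(r0: int) -> int:
    """The 16-bit sequence, excluding the str/ld that save and restore r2."""
    r2 = 0xFF                  # mov r2, 0xff  -> 0x000000ff
    r2 = (r2 << 1) & MASK32    # lsl r2, #1    -> 0x000001fe
    r2 = (r2 + 1) & MASK32     # add r2, #1    -> 0x000001ff
    r2 = (r2 << 23) & MASK32   # lsl r2, #23   -> 0xff800000
    return (r0 + r2) & MASK32  # add r0, r2


SEQ_16 = ["str", "mov", "lsl", "add", "lsl", "add", "ld"]
size_32 = 4                 # bytes for the single 32-bit instruction
size_16 = 2 * len(SEQ_16)   # bytes for the 16-bit sequence
```

The 16-bit version takes 14 bytes instead of 4, and seven instructions instead of one. This shows why encoding everything in the 16-bit ISA can hurt both size and speed.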

13 Mixed Code Generation
Related work in dual-width instruction set design and mixed code generation
– Coarse-grained, function-level mixed code generation: BX in ARM and JALX in MIPS
– Simple fine-grained, instruction-level mixed code generation: BX in ARM and JALX in MIPS, or a single specific mode-changing instruction
– Specialized coding: an instruction word with a leading one holds one 32-bit instruction; an instruction word with a leading zero holds two 16-bit instructions
– 16-bit ISA extensions
Problem: these approaches always lead to performance loss
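The "specialized coding" scheme can be sketched as a decoder loop. The exact bit layout is an assumption: the flag is taken to be bit 31 of each word, and a zero-flagged word is taken to hold two 16-bit instructions, high half first. Under this layout, the first 16-bit slot effectively gives up its top bit to the flag.

```python
from typing import List, Tuple


def decode_words(words: List[int]) -> List[Tuple[int, int]]:
    """Split an instruction stream into (width, encoding) pairs.

    Hypothetical layout: bit 31 set -> one 32-bit instruction;
    bit 31 clear -> two 16-bit instructions, high half first.
    """
    out = []
    for w in words:
        if w >> 31:
            out.append((32, w))
        else:
            out.append((16, w >> 16))
            out.append((16, w & 0xFFFF))
    return out


stream = [0x80001234, 0x12345678]
decoded = decode_words(stream)
```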

14 Potential Benefit
Analysis of programs in MediaBench: 27,851 distinct instructions occur across all programs, and ⌈log2(27851)⌉ = 15, so 15 bits would be enough to index all of them
Rank  Unicore32 instruction  Average percentage
1     mov                    23%
2     ldr                    16%
3     cmp                    8%
4     add                    8%
5     str                    6%
6     b                      5%
Total (top 6)                66%
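Both numbers on this slide can be reproduced directly. Reading "Log(27851)=15" as the ceiling of the base-2 logarithm is our interpretation. The bit count is computed with integer arithmetic to avoid floating-point rounding.

```python
distinct = 27851
# Smallest n with 2**n >= distinct, computed exactly with integers.
bits = (distinct - 1).bit_length()

# Average percentages of the six most frequent Unicore32 instructions.
top6 = {"mov": 23, "ldr": 16, "cmp": 8, "add": 8, "str": 6, "b": 5}
coverage = sum(top6.values())
```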

15 Two Main Kinds of Frequent Instructions
Two-operand instructions
– mov rd, rm (or a short immediate)
– cmp rn, rm (or a short immediate)
Branch/jump
– [Figure: distribution of the immediate offsets of branch instructions]
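Whether a branch can use a short encoding depends on whether its offset fits the immediate field. The sketch measures that for a signed field of a given width. It assumes offsets are signed and counted in instructions. The offsets are made-up inputs for illustration, not measurements from MediaBench.

```python
from typing import List


def fits_signed(value: int, bits: int) -> bool:
    """True if value fits in a two's-complement field of `bits` bits."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def coverage(offsets: List[int], bits: int) -> float:
    """Fraction of branch offsets encodable in a `bits`-bit immediate."""
    return sum(fits_signed(o, bits) for o in offsets) / len(offsets)


sample_offsets = [3, -8, 120, -200, 15, 600]  # illustrative inputs only
share_9bit = coverage(sample_offsets, 9)      # a 9-bit field covers -256..255
```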

16 The Idea of a Mode-Changing Instruction Set (MC)
Extend the 32-bit ISA with a small MC instruction set (using the reserved coding space). An MC instruction:
– Changes the CPU mode
– Performs its own normal operation
Scan for suitable 32-bit instructions that can be encoded as 16-bit instructions
[Figure: a mixed code fragment with MC instructions: 32-bit instructions; an MC instruction followed by UniCore16 instructions; …; an MC instruction followed by UniCore16 instructions; 32-bit instructions]
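The scan for encodable instructions can be sketched as a search for runs. Everything here beyond "find runs of 16-bit-encodable instructions" is a hypothetical simplification. The `min_run` threshold is an illustration parameter. We also assume the first instruction of each run becomes a 32-bit MC instruction, which switches mode while still doing its own work, and that the switch back costs no extra bytes because it is likewise folded into an MC-form instruction.

```python
from typing import List, Tuple


def find_runs(encodable: List[bool], min_run: int = 2) -> List[Tuple[int, int]]:
    """Half-open [start, end) runs of consecutive 16-bit-encodable
    instructions that are at least `min_run` long."""
    runs, start = [], None
    for i, ok in enumerate(encodable + [False]):  # sentinel closes a trailing run
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if i - start >= min_run:
                runs.append((start, i))
            start = None
    return runs


def mixed_size(encodable: List[bool], min_run: int = 2) -> int:
    """Code size in bytes under the hypothetical rule above: in each run the
    first instruction stays 4 bytes (as an MC instruction) and the rest
    become 2-byte UniCore16 instructions; everything else stays 4 bytes."""
    size = 4 * len(encodable)
    for start, end in find_runs(encodable, min_run):
        size -= 2 * (end - start - 1)
    return size


pattern = [True, True, False, True, True, True, False]
runs = find_runs(pattern)
size = mixed_size(pattern)  # the all-32-bit version would be 28 bytes
```

A larger `min_run` skips short runs where switching modes would save little.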

17 Modification to the Microarchitecture
[Figures: mixed code execution in the Unicore-I pipeline; improved mixed code execution in the Unicore-I pipeline]
– No extra cycles
– One additional 16-bit instruction-fetch buffer
– An MC decoder

18 Mixed Code Generation
[Figure: tool flow connecting the input program, the Instruction Analyzer, the mode-changing instructions, the Link-Time Optimizer, the resulting mixed-coded program, and a simulator]

19 Experimental Results
[Figure: normalized code size (without other link-time optimizations)]

20 Conclusion
Code compaction on a link-time optimization platform
– Compiler optimizations applied at link time: typical optimizations for code size reduction
– Program layout optimization: hot/cold code splitting through basic block reordering
– Machine code generation: mixed code generation
Experimental results
– Average code size reduction: 32.9%
– Average performance improvement: 9.1%

21 Thank you


23 Instruction Analysis
Instruction format type classifications:
– 3 registers, all in r0-r7, r8-r15, r16-r23, or r24-r31
– 2 registers, one in r0-r31 and one in r0-r15 or r16-r31
– 1 register and 1 immediate, immediate field: 4-6 bits
– 1 immediate, immediate field: 9 bits
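The format classes can be sketched as a small checker. Several details are assumptions of ours, not statements from the slide:
– The 3-register form encodes a 2-bit bank number plus three 3-bit offsets.
– The 2-register form uses a 5-bit field plus a 4-bit field whose half (r0-r15 or r16-r31) is selected by the opcode, so any register pair fits.
– The 4-6 bit immediate is unsigned (we use the 6-bit upper bound as a default).
– The 9-bit immediate is signed.

```python
from typing import List, Optional, Sequence, Tuple


def bank_fields(regs: Sequence[int]) -> Optional[Tuple[int, List[int]]]:
    """(bank, 3-bit offsets) if all registers share one bank of 8, else None."""
    banks = {r // 8 for r in regs}
    if len(banks) != 1:
        return None
    return banks.pop(), [r % 8 for r in regs]


def classify(regs: Sequence[int], imm: Optional[int] = None,
             imm_bits: int = 6) -> Optional[str]:
    """Name of the 16-bit format class the operands fit, or None."""
    if imm is None and len(regs) == 3:
        return "3-reg" if bank_fields(regs) is not None else None
    if imm is None and len(regs) == 2:
        # Assumed: 5-bit field + 4-bit field whose half is chosen by the opcode.
        return "2-reg"
    if imm is not None and len(regs) == 1:
        return "1-reg+imm" if 0 <= imm < (1 << imm_bits) else None  # assumed unsigned
    if imm is not None and not regs:
        return "1-imm" if -(1 << 8) <= imm < (1 << 8) else None     # assumed signed 9-bit
    return None
```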

24 Experimental Results
[Figures: normalized dynamic instruction counts; normalized cycle counts]

