
1 WCET-aware Register Allocation based on Integer-Linear Programming
Heiko Falk, Norman Schmitz, Florian Schmoll
TU Dortmund, Computer Science 12, Design Automation for Embedded Systems
ECRTS 2011

2 © H. Falk | 2011-07-06 ECRTS 2011 Slide 2 / 18
Outline
- Introduction
  - State of the Art in Compiler Design
  - Register Allocation
- Traditional ILP-based Register Allocation
  - ILP Model
  - Limitations
- WCET-aware Register Allocation using ILP
  - Model of the WCET
  - Model of Pipeline-Related Spill Costs
- Results
- Summary & Future Work

3 Current State of the Art in Compiler Design
Objective Function of Compiler Optimizations
- Usually reduction of Average-Case Execution Times (ACET): accelerate a "typical" execution of a program using "typical" input data
- No statements about WCETs possible
Optimization Strategy
- Naive: current compilers lack a precise ACET timing model
- Application of an optimization if "promising"
- Effect of optimizations on a program's ACET fully unknown to the compiler itself
- ACET optimizations not useful for WCET minimization

4 Register Allocation
Goals
- Considered the most important compiler optimization
- Registers are the fastest and most efficient memories
- Register allocation should make optimal use of registers
Tasks
- Assembly code before register allocation uses virtual registers (VREGs)
- Map all (potentially many) VREGs to the (usually few) physical registers (PHREGs) of a processor
- Insert memory loads and stores (spill code) whenever VREGs don't fit into the register file

5 Well-Known Register Allocators
Graph Coloring
- De-facto standard approach nowadays
- Heuristics decide about allocation and spill code generation
- Fast approach of moderate complexity
- Spill heuristic might lead to poor code quality
Register Allocation via Integer-Linear Programming (ILP)
- Formal mathematical model of allocation and spilling
- Achieves minimal spill code overhead, i.e. minimizes the total number of spill instructions
- Relatively high complexity, but optimal quality
[P. Briggs, Register Allocation via Graph Coloring, 1992]
[D. W. Goodwin, K. D. Wilken, Optimal and Near-optimal Global Register Allocation Using 0-1 Integer Programming, 1996]

6 Traditional ILP-based Register Allocation
Variables (symbols not preserved in this transcript)
- Allocation decisions: map VREGs to PHREGs
- Spilling decisions
Constraints
- Guarantee correctness of allocation and spilling decisions, e.g.
  - ensure that each VREG is assigned to at least one PHREG,
  - that at most one VREG can be assigned to a single PHREG, ...
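The decision variables and constraints above can be illustrated on a toy instance. The following is only a sketch under stated assumptions: it replaces the ILP solver with exhaustive search over all 0-1 decisions, and the names `allocate`, `interferences`, and `spill_instrs` are hypothetical, not taken from the talk or from WCC. Each VREG is either mapped to one PHREG or spilled, interfering VREGs may not share a PHREG, and every spill instruction contributes the same constant cost.

```python
from itertools import product

def allocate(vregs, interferences, num_phregs, spill_instrs):
    """Search all assignments exhaustively; a value of None means 'spilled'."""
    choices = [None] + list(range(num_phregs))
    best_cost, best_assignment = None, None
    for combo in product(choices, repeat=len(vregs)):
        assignment = dict(zip(vregs, combo))
        # Constraint: interfering VREGs must not share a PHREG.
        if any(assignment[v] is not None and assignment[v] == assignment[w]
               for v, w in interferences):
            continue
        # Objective: each spill instruction costs the same constant (here 1).
        cost = sum(spill_instrs[v] for v in vregs if assignment[v] is None)
        if best_cost is None or cost < best_cost:
            best_cost, best_assignment = cost, assignment
    return best_cost, best_assignment

# Toy instance (assumed): three mutually interfering VREGs, two PHREGs.
vregs = ["v1", "v2", "v3"]
interferences = [("v1", "v2"), ("v2", "v3"), ("v1", "v3")]
spill_instrs = {"v1": 4, "v2": 2, "v3": 3}  # spill instructions needed per VREG
cost, assignment = allocate(vregs, interferences, 2, spill_instrs)
```

With two PHREGs and three mutually interfering VREGs, one VREG has to be spilled, and the search picks the one that needs the fewest spill instructions.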

7 Traditional ILP-based Register Allocation
Objective Function
- Minimizes spill code-related overhead
- Under the assumption: each spill instruction contributes the same constant amount to the objective function
- Example: minimization of spill-related code size

8 WCET Minimization via ILP-based Allocation?
Limitation of the traditional approach
- Assumption: each spill instruction contributes the same constant amount to the objective function
- This assumption only holds for trivial objectives such as code size
Challenges
- How to model and minimize the Worst-Case Execution Time (WCET) as a non-trivial objective?
- How to deal with complex processor pipelines executing spill instructions in parallel with other code?

9 Challenge 1: ILP Model of the WCET
The Worst-Case Execution Path (WCEP)
- WCET of a program = length of the program's longest execution path (WCEP)
- WCET minimization: optimization of only those parts of a program lying on the WCEP
- Code optimization off the WCEP will not reduce the WCET
- Only those spill-related decision variables that actually lie on the WCEP must contribute to the ILP's objective function
- But: spilling decisions affect the WCET of basic blocks and thus the WCEP within a program
- How to model the WCEP via ILP depending on spill-related decision variables?

10 Spill Code-dependent Costs
- Costs of a basic block (formula not preserved in this transcript):
  - The cost variable models the WCET of the block depending on the WCET of potentially inserted spill code
  - Cost = WCET of the block without any spill code, plus WCET of all spill code inside the block
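The slide's formula did not survive extraction. One plausible way to write the stated relation is sketched below. All symbols are assumptions introduced here, not the authors' notation: $c_b$ is the cost variable of block $b$, $C_b^{\text{base}}$ its WCET without spill code, $S_b$ the set of spill instructions that may be inserted into $b$, $C_s$ the worst-case cost of spill instruction $s$, and $x_s$ a binary variable that is 1 if $s$ is actually generated.

```latex
c_b \;=\; C_b^{\text{base}} \;+\; \sum_{s \in S_b} C_s \cdot x_s,
\qquad x_s \in \{0, 1\}
```

Treating $C_s$ as a constant is a simplification. The talk's second challenge is precisely that the cost of a spill instruction depends on the pipeline and on neighboring instructions.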

11 Intraprocedural Control Flow
Modeling of a function's control flow:
Acyclic sub-graphs (figure not preserved: example graph with nodes A, B, C, D, E)
- Variable for node A = WCET of longest path starting at A
(Reducible) loops (figure not preserved: nodes A, B, C, D, E with loop L consisting of B, C, D)
- Treat body of innermost loop like an acyclic sub-graph
- Fold loop
- Costs of the folded loop L: (formula not preserved)
- Continue with next innermost loop
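The modeling steps above can be sketched outside of an ILP as a plain longest-path computation. This is an illustration under assumptions, not the authors' formulation: the edges inside the loop body, all block costs, and the iteration bound are invented, and the helper names `longest_path` and `fold_loop` are hypothetical. The loop body is treated as an acyclic sub-graph (back edge removed), folded into a single node whose cost is the body's longest path times the maximum iteration count, and the outer graph is then evaluated the same way.

```python
from functools import lru_cache

def longest_path(succ, cost, start):
    """WCET of the longest path starting at `start` in an acyclic graph."""
    @lru_cache(maxsize=None)
    def w(node):
        tails = [w(s) for s in succ.get(node, ())]
        return cost[node] + max(tails, default=0)
    return w(start)

def fold_loop(body_succ, body_cost, header, max_iterations):
    """Cost of a folded loop: longest body path times the iteration bound."""
    return max_iterations * longest_path(body_succ, body_cost, header)

# Assumed loop L with body B, C, D (back edge to B removed).
body_succ = {"B": ["C", "D"]}
body_cost = {"B": 5, "C": 20, "D": 8}
loop_cost = fold_loop(body_succ, body_cost, "B", 10)

# Outer graph after folding: A -> L -> E.
outer_succ = {"A": ["L"], "L": ["E"]}
outer_cost = {"A": 10, "L": loop_cost, "E": 7}
wcet = longest_path(outer_succ, outer_cost, "A")
```

In the ILP itself, the `max` would be expressed through inequality constraints (one per successor) rather than evaluated directly, because the block costs depend on the spilling decisions.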

12 Objective Function
- WCET of an entire function:
  - Each function has a dedicated entry block
  - A variable models the WCET of the longest path within the function starting at the entry block
  - This variable thus models the WCET of the entire function
(Variable names and the objective formula are not preserved in this transcript.)

13 Challenge 2: Pipeline-Related Spill Costs
Example: The Infineon TriCore Pipelines
- Integer I-Pipeline: executes usual integer ALU instructions
- Load/Store LS-Pipeline: executes memory loads/stores and address arithmetic
- Ideal case: one I- and one LS-instruction executed in parallel within the same clock cycle
- However...
    add d0,d1,d2;   # d0 = d1 + d2     (I-instruction)
    ld  d0,[a0];    # d0 = mem[a0]     (LS-instruction)
  WAW hazard (write after write): stalled by 1 cycle
(Some even more subtle cases of the TriCore pipelines omitted here…)

14 ILP Example for Costs of Spill Instruction s
Case 1
- If i is an LS-instruction: s costs 1 cycle if s is actually generated (constraint not preserved in this transcript)
    st  [a1],d1;    # i: mem[a1] = d1
    ld  d0,[a0];    # s: d0 = mem[a0]
Case 2
- If s is a spill-load and i is an I-instruction: s costs 1 cycle if s is actually generated and a WAW hazard between s and i exists via a PHREG (constraint not preserved in this transcript)
    add d0,d1,d2;   # i: d0 = d1 + d2
    ld  d0,[a0];    # s: d0 = mem[a0]
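The two cases can be written down as a small cost function. This is a hedged sketch, not the ILP constraints from the talk: the `Instr` record and the function `spill_cost` are hypothetical, the function assumes the spill instruction is actually generated, and every situation not named on the slide is treated as costing 0 cycles, which is an assumption (the talk notes that subtler TriCore cases were omitted).

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Instr:
    pipeline: str            # "I" (integer) or "LS" (load/store)
    dest: Optional[str]      # register written, or None (e.g. for a store)
    is_load: bool = False

def spill_cost(i: Instr, s: Instr) -> int:
    """Extra cycles for spill instruction s issued right after instruction i."""
    # Case 1: i occupies the LS pipeline, so s cannot run in parallel with it.
    if i.pipeline == "LS":
        return 1
    # Case 2: spill-load after an I-instruction; stalls only on a WAW hazard.
    if s.is_load and i.pipeline == "I":
        return 1 if i.dest is not None and i.dest == s.dest else 0
    # Not covered on the slide; assumed free for this illustration.
    return 0

store = Instr("LS", None)                   # st  [a1],d1
spill_load = Instr("LS", "d0", True)        # ld  d0,[a0]
add_same = Instr("I", "d0")                 # add d0,d1,d2
add_other = Instr("I", "d3")                # add d3,d1,d2 (hypothetical)

case1 = spill_cost(store, spill_load)
case2_hazard = spill_cost(add_same, spill_load)
case2_parallel = spill_cost(add_other, spill_load)
```

The third call shows the ideal case from the pipeline slide: without a shared destination register, the load can issue in parallel with the integer instruction.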

15 Results – Worst-Case Execution Times
- Target Processor: TriCore TC1796
- 100%: WCET_EST using Graph Coloring
- Compiler: WCC at optimization level -O3 (42 optimizations)
(Chart not preserved; annotated values: 98%, 19%, 80%, x2)
[H. Falk, WCET-aware Register Allocation based on Graph Coloring, DAC 2009]

16 Results – Average-Case Execution Times
- Target Processor: TriCore TC1796
- 100%: ACET using Graph Coloring
- Compiler: WCC at optimization level -O3 (42 optimizations)
(Chart not preserved)

17 Results – CPU Runtimes
ILP-based Allocator
- Runtimes range from 1 CPU second to 54:08 CPU minutes
- Including WCET analysis and ILP solver
- Average runtime for 55 benchmarks: 3:33 CPU minutes
WCET-aware Graph Coloring
- Average runtime for 55 benchmarks: 4:13 CPU minutes
- Reason: performs a costly WCET analysis after register allocation for each individual basic block

18 Summary & Future Work
Summary
- Current state of the art: compilers are unaware of timing, naive optimization strategies
- Standard register allocators are unaware of worst-case properties
- May thus lead to spill code generation along the WCEP
- WCET-aware ILP-based register allocation: sophisticated models of WCET and pipeline-related spill costs
- Average WCET reduction over 55 benchmarks: 20.2%
- Outperforms WCET-aware graph coloring by a factor of 2
Future Work
- Reduce runtimes of the ILP-based register allocator
- Improve code quality further by integrating rematerialization

19 Bookmarks
3. Compiler Design
4. Register Allocation
5. Well-Known Allocators
6. Traditional ILP-RA (vars & constrs)
7. Traditional ILP-RA (obj)
8. Motivation / Limitations
9. The WCEP
10. ILP: Block Costs
11. ILP: Control Flow
12. ILP: Objective
13. TriCore Pipeline
14. ILP: Spill Costs
15. Results WCET
16. Results ACET
17. Results CPU Runtimes
20. Traditional ILP-RA (vars)
21. Workflow / WCC
22. Unstable WCEPs
25. TriCore Register File
27. Worst-Case Execution Times
28. Graph Coloring
29. Problem of Graph Coloring
30. Chicken-Egg Problem
31. WCET-aware Graph Coloring

20 Traditional ILP-based Register Allocation
Instructions defining VREG v:
    add v, w, x;    # v = w + x
Instructions using VREG v:
    mul y, v, z;    # y = v * z
(The associated ILP variables are not preserved in this transcript.)

21 Support of the ILP by WCC Infrastructure
(Workflow figure not preserved; labels recovered from it:)
- WCET_EST of basic block b_k
- Max. iteration counts of loop L
[http://ls12-www.cs.tu-dortmund.de/research/activities/wcc]

22 Instability of the WCEP
(Figure not preserved: control-flow graph with blocks main, a, b, c, d and block costs of 10, 50, 80, 65 and 120 cycles)

23 Instability of the WCEP
(Figure not preserved: control-flow graph with blocks main, a, b, c, d and block costs of 10, 50, 80, 65 and 120 cycles)
- Initial WCEP: basic blocks main, a, b, c
- In the following: better spilling decisions inside b

24 Instability of the WCEP
(Figure not preserved: control-flow graph with blocks main, a, b, c, d and block costs of 10, 50, 80, 65 and 120 cycles; the figure now additionally shows the value 40)
- Novel WCEP: basic blocks main, d, c
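The WCEP switch on these three slides can be checked with a few lines of arithmetic. The sketch below rests on assumptions: the figure is lost, so the mapping of costs to blocks (main = 10, a = 50, b = 80, c = 65, d = 120 cycles, in the order the transcript lists them), the two paths through the graph, and the reading of "40" as the improved cost of b are inferred. They are at least consistent with the WCEPs named on the slides. The helper names are hypothetical.

```python
def path_cost(path, cost):
    """Sum of the block costs along one execution path."""
    return sum(cost[block] for block in path)

def wcep(paths, cost):
    """The path with the highest total cost."""
    return max(paths, key=lambda p: path_cost(p, cost))

# Assumed cost-to-block mapping and path structure (figure not preserved).
cost = {"main": 10, "a": 50, "b": 80, "c": 65, "d": 120}
paths = [("main", "a", "b", "c"), ("main", "d", "c")]

before = wcep(paths, cost)               # 205 cycles vs. 195 cycles
improved = dict(cost, b=40)              # better spilling decisions inside b
after = wcep(paths, improved)            # 165 cycles vs. 195 cycles
```

Under these assumptions, saving 40 cycles in b lowers the WCET by only 10 cycles (205 to 195), because the path through d takes over as the WCEP. This is why the optimization has to track the WCEP as spilling decisions change.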

25 Heterogeneous Register Files
Example Infineon TriCore 1.3:
- Separate address and data registers
Address Registers: A15, A14, A13, A12, A11, A10, A9, A8, A7, A6, A5, A4, A3, A2, A1, A0
Data Registers: D15, D14, D13, D12, D11, D10, D9, D8, D7, D6, D5, D4, D3, D2, D1, D0

26 Heterogeneous Register Files
Example Infineon TriCore 1.3:
- Separate address and data registers
- Special-purpose registers
- 64-bit data registers (extended registers)
- Upper & lower context (UC & LC): UC automatically saved upon function calls, LC not
Address Registers: A15 (Implicit AREG), A14, A13, A12, A11 (Return Addr), A10 (Stack Ptr), A9 (Global AREG), A8 (Global AREG), A7, A6, A5, A4, A3, A2, A1 (Global AREG), A0 (Global AREG)
Data Registers: D15 (Implicit DREG), D14, D13, D12, D11, D10, D9, D8, D7, D6, D5, D4, D3, D2, D1, D0
Extended Registers: E14, E12, E10, E8, E6, E4, E2, E0
(The figure additionally marks which registers belong to UC and LC.)

27 Worst-Case Execution Times (WCET)
- WCET in general not computable
- Upper timing bounds can be statically estimated (⇒ WCET_EST)
(Figure not preserved; it marks the real-time constraint) [Borrowed from Reinhard Wilhelm]

28 Workflow of Graph Coloring RA
1. Initialization: Build interference graph G = (V, E) with V = {virtual registers} ∪ {K physical processor registers}, e = (v, w) ∈ E ⇔ VREGs v and w may never share the same PHREG, i.e. v and w interfere
2. Simplification: Remove all nodes v ∈ V with degree < K
3. Spilling: After step 2, each node of G has degree ≥ K. Select one v ∈ V; mark v as potential spill; remove v from G
4. Repeat steps 2 and 3 until G = ∅
5. Coloring: Successively re-insert nodes v into G in reverse order; if there is a free color k_v, color v; else, mark v as actual spill
[A. W. Appel, Modern compiler implementation in C, 1998]
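The five steps can be sketched as follows. This is a simplified illustration, not WCC's allocator: the graph contains only VREGs (the K physical registers appear as colors 0..K-1 instead of precolored nodes), low-degree nodes are removed one at a time with degrees recomputed in between, and the potential spill is chosen by highest degree, which is just one of the common heuristics. The function name `color_graph` is hypothetical.

```python
def color_graph(vregs, edges, k):
    """Simplify / potential-spill / select on an interference graph."""
    # Step 1: build the interference graph as adjacency sets.
    adj = {v: set() for v in vregs}
    for v, w in edges:
        adj[v].add(w)
        adj[w].add(v)

    remaining = set(vregs)
    stack = []
    while remaining:                         # step 4: repeat until G is empty
        degree = {v: len(adj[v] & remaining) for v in remaining}
        low = sorted(v for v in remaining if degree[v] < k)
        if low:                              # step 2: simplification
            node = low[0]
        else:                                # step 3: potential spill
            node = max(sorted(remaining), key=lambda v: degree[v])
        stack.append(node)
        remaining.remove(node)

    colors, spilled = {}, []
    while stack:                             # step 5: re-insert in reverse order
        node = stack.pop()
        used = {colors[n] for n in adj[node] if n in colors}
        free = [c for c in range(k) if c not in used]
        if free:
            colors[node] = free[0]
        else:
            spilled.append(node)             # actual spill
    return colors, spilled

# Four mutually interfering VREGs cannot all fit into three registers.
clique = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d")]
colors, spilled = color_graph(["a", "b", "c", "d"], clique, 3)
```

Which node ends up in `spilled` depends entirely on the selection rule in step 3. That rule knows nothing about execution paths, so nothing prevents the spilled VREG from lying on the WCEP.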

29 Problem of Standard Graph Coloring
3. Spilling: After step 2, each node of G has degree ≥ K. Select one v ∈ V; mark v as potential spill; remove v from G
Which node v should be selected as potential spill? Common graph coloring implementations select …
- … the first node v according to the order in which VREGs were generated during code selection,
- … the node with highest degree in the interference graph,
- … a node with high degree, with many DEFs/USEs, in some inner loop, maybe depending on profiling data.
Uncontrolled spill code generation, potentially along the Worst-Case Execution Path (WCEP) defining the WCET!

30 A Chicken-Egg Problem
- A WCET-aware register allocator…
  - …relies on WCET data provided by WCET analysis using aiT
  - …but can't obtain WCET data since code containing virtual registers is not statically analyzable!
- The way out:
  - Start by marking all VREGs as actual spill (each VREG is spilled onto the stack ⇒ code is fully analyzable)
  - Perform WCET analysis, get WCEP
  - Allocate the VREGs of that basic block b with the most worst-case spill code executions to PHREGs using standard GC
  - Re-evaluate novel WCEP

31 WCET-Aware Graph Coloring (1)
    LLIR WCET_GC_RA( LLIR P ) {
      // Iterate until current WCEP is fully allocated.
      while ( true ) {
        // Clone P, spill all VREGs of P' onto stack.
        LLIR P' = P.copy();
        P'.spillAllVREGs();
        // Compute Worst-Case Execution Path for fully spilled LLIR.
        set WCEP = computeWCEP( P' );
        // If there are no more VREGs, the allocation loop is over.
        if ( getVREGs( WCEP ) == ∅ )
          break;

32 WCET-Aware Graph Coloring (2)
        // Determine that block on the WCEP with highest product of
        // Worst-Case Execution Count * spilling instructions.
        basic_block b' = getMaxSpillCodeBlock( WCEP );
        basic_block b = getBlockOfOriginalP( b' );
        // Collect all VREGs of this most critical block.
        list vregs = getVREGs( b );
        // Sort VREGs by #occurrences, apply standard graph coloring.
        vregs.sort( occurrences of VREG in b );
        traditionalGraphColoring( P, vregs );
      }
      // Allocate all remaining VREGs not lying on the WCEP.
      traditionalGraphColoring( P, getVREGs( P ) );
      return P;
    }

33 Results – Worst-Case Execution Times
- 100% = WCET_EST using Standard Graph Coloring (highest degree)
(Chart not preserved; annotated values: 93%, 24%, 69%)

34 Results – Average-Case Execution Times
- 100% = ACET using Standard Graph Coloring (highest degree)
(Chart not preserved; annotated values: -6% – -12%)

