1 Linear Scan Register Allocation (Massimiliano Poletto, Vivek Sarkar)
A Fast, Memory-Efficient Register Allocation Framework for Embedded Systems (Sathyanarayanan Thammanur, Santosh Pande)

2 Linear Scan Register Allocation
- Not based on graph coloring, and faster than allocators that are.
- Scans all the live ranges in a single pass, allocating registers to variables greedily.
- Useful in situations where both compile time and code quality are important.

3 Program model
- The intermediate representation consists of RTL-like quads, or pseudo-instructions.
- Register candidates (live ranges) are represented by an unbounded set of variable names, or virtual registers.
- Variables are not live on entry to the start node.

4 Assumptions
- The pseudo-instructions of the intermediate representation are numbered according to some order, for example the order in which they appear in the intermediate representation, or a depth-first order.
- The choice of instruction ordering does not affect the correctness of the algorithm, but it may affect the quality of the allocation.
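
A depth-first numbering can be obtained by visiting basic blocks in depth-first preorder and numbering their instructions in sequence. The Python below is a minimal sketch, assuming a hypothetical CFG encoding (a `succs` map from block name to successor names and a `code` map from block name to its instruction list); `depth_first_numbering` is an illustrative name, not taken from the slides.

```python
from typing import Dict, List


def depth_first_numbering(
    entry: str,
    succs: Dict[str, List[str]],
    code: Dict[str, List[str]],
) -> List[str]:
    """Return all instructions with blocks visited in depth-first preorder.
    Instruction k (1-based) of the result receives number k."""
    order: List[str] = []
    seen = set()
    stack = [entry]
    while stack:
        block = stack.pop()
        if block in seen:
            continue
        seen.add(block)
        order.extend(code.get(block, []))
        # Push successors in reverse so the first-listed one is visited first.
        for s in reversed(succs.get(block, [])):
            if s not in seen:
                stack.append(s)
    return order
```

For a diamond CFG B0 → {B1, B2} → B3, blocks are visited as B0, B1, B3, B2; the numbering itself is `{ins: k for k, ins in enumerate(order, start=1)}`. Any other order would still give a correct allocation, only possibly a different one.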

5 Live intervals
- [i, j] is the live interval for variable v if there is no instruction with number j′ > j such that v is live at j′, and no instruction with number i′ < i such that v is live at i′.
- A live interval is a conservative approximation of a live range: there may be subranges of [i, j] in which v is not live.
- The trivial live interval for any variable is [1, N], where N is the number of pseudo-instructions.
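
The definition can be made concrete by deriving intervals from per-instruction liveness sets. This is a minimal sketch, assuming liveness has already been computed by a separate dataflow analysis (not shown) and that `live_at[k-1]` holds the variables live at instruction k; `live_intervals` is a hypothetical helper.

```python
from typing import Dict, List, Set, Tuple


def live_intervals(live_at: List[Set[str]]) -> Dict[str, Tuple[int, int]]:
    """live_at[k-1] is the set of variables live at instruction k (1-based).
    Returns, for each variable, [i, j] = first and last instruction at
    which it is live."""
    intervals: Dict[str, Tuple[int, int]] = {}
    for k, live in enumerate(live_at, start=1):
        for v in live:
            if v in intervals:
                start, _ = intervals[v]
                intervals[v] = (start, k)  # k only grows, so k is the new end
            else:
                intervals[v] = (k, k)      # first instruction where v is live
    return intervals
```

For live sets {a}, {a, b}, {b}, {a}, variable a gets [1, 4] even though it is not live at instruction 3: exactly the conservative approximation described above.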

6 The linear scan algorithm
- Compute the live intervals.
- Store the live intervals in a list sorted in order of increasing start point.
- At each step, the algorithm maintains a list, active, of the live intervals that overlap the current point and have been placed in registers.
- The active list is kept sorted in order of increasing end point.

7 LinearScanRegisterAllocation
    active ← {}
    foreach live interval i, in order of increasing start point
        ExpireOldIntervals(i)
        if length(active) = R then
            SpillAtInterval(i)
        else
            register[i] ← a register removed from pool of free registers
            add i to active, sorted by increasing end point

ExpireOldIntervals(i)
    foreach interval j in active, in order of increasing end point
        if endpoint[j] ≥ startpoint[i] then
            return
        remove j from active
        add register[j] to pool of free registers

SpillAtInterval(i)
    spill ← last interval in active
    if endpoint[spill] > endpoint[i] then
        register[i] ← register[spill]
        location[spill] ← new stack location
        remove spill from active
        add i to active, sorted by increasing end point
    else
        location[i] ← new stack location

(R is the number of available registers.)
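
The pseudocode translates fairly directly into Python. This is a minimal sketch, not the authors' implementation: intervals are a hypothetical `Interval` record, registers are numbered 0..R−1, stack locations are consecutive slot indices, and active is a plain list whose insertion point is found by linear search (the O(R) variant discussed under complexity).

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Interval:
    name: str   # variable / virtual register
    start: int  # start point (instruction number)
    end: int    # end point (instruction number)


def linear_scan(
    intervals: List[Interval], num_regs: int
) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Return (register, location): register[name] is the register of each
    interval kept in a register; location[name] is the stack slot of each
    spilled interval."""
    assert num_regs >= 1, "sketch assumes at least one register"
    register: Dict[str, int] = {}
    location: Dict[str, int] = {}
    free = list(range(num_regs - 1, -1, -1))  # pool of free registers
    active: List[Interval] = []  # sorted by increasing end point
    next_slot = 0  # next unused stack slot ("new stack location")

    def add_active(iv: Interval) -> None:
        # Linear search for the insertion point: O(R) per insertion.
        pos = 0
        while pos < len(active) and active[pos].end <= iv.end:
            pos += 1
        active.insert(pos, iv)

    for i in sorted(intervals, key=lambda iv: iv.start):
        # ExpireOldIntervals(i): free registers of intervals ending before i.
        while active and active[0].end < i.start:
            j = active.pop(0)
            free.append(register[j.name])
        if len(active) == num_regs:
            # SpillAtInterval(i): the interval that ends last is spilled.
            spill = active[-1]
            if spill.end > i.end:
                register[i.name] = register.pop(spill.name)  # i takes spill's register
                location[spill.name] = next_slot
                next_slot += 1
                active.pop()
                add_active(i)
            else:
                location[i.name] = next_slot
                next_slot += 1
        else:
            register[i.name] = free.pop()
            add_active(i)
    return register, location
```

With R = 2 and intervals a [1, 10], b [2, 4], c [3, 6], d [5, 8], interval a is spilled when c arrives (a ends last), c takes a's register, and d later reuses b's register after b expires.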

8 An example

9 Complexity
- O(V) if R is constant, where V is the number of variables (live intervals).
- But R can be large. The worst-case running time is then dictated by the time taken to insert into active:
  - O(log R) per insertion if a balanced binary tree is used, giving O(V log R) overall;
  - O(R) per insertion if the insertion point is found by linear search, giving O(V × R) overall.
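
To give a sense of scale for the two insertion strategies, the bounds can be evaluated for hypothetical sizes (V = 10,000 intervals and R = 32 registers, chosen only for illustration). These are step counts implied by the bounds, not measurements.

```latex
\begin{aligned}
\text{linear-search insertion:}\quad & V \cdot R = 10{,}000 \cdot 32 = 320{,}000 \\
\text{balanced-tree insertion:}\quad & V \cdot \log_2 R = 10{,}000 \cdot 5 = 50{,}000
\end{aligned}
```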

10 Evaluation
- Two different infrastructures were used: one to measure compile-time performance, and one to measure the run-time performance of the generated code.
- ICODE infrastructure: compile-time performance.
- SUIF infrastructure: run-time performance.

14 A Fast, Memory-Efficient Register Allocation Framework for Embedded Systems
Usage density based register allocator
- Usage density represents both the frequency and the density of uses.
- Geared towards embedded systems, where speed, code size, and memory requirements are of equal concern.
- Does not make use of live range or live interval analysis.

15 Goal
Optimize the following parameters:
- speed of execution of the generated code
- speed of the allocator
- size of the generated code
- size of the allocator
- amount of memory required during allocation (memory footprint)

16 Graph-coloring based allocators
- Summarize liveness information in the form of an interference graph.
- Heuristically attempt to color the graph.
- The generated code is of high quality.
- Cost (in speed and space) increases as the size of the interference graph increases.
- Prioritize the quality of the generated code over compilation speed and memory requirements.

17 Linear scan register allocation
- Tries to detect and resolve conflicts locally.
- Operates faster than graph coloring.
- Code quality suffers: a spilled variable cannot be reassigned to a register.
- Memory requirements are lower than those of graph-coloring based allocators.
- Still has quadratic memory requirements, due to the need to maintain live intervals.

18 Tradeoffs in register allocation
- Is it necessary to expend effort finding live ranges and forming live intervals in order to make good spill decisions?
- Instead, combine the effects of the frequency of references and their density/sparsity to emulate the notion of interfering live intervals: this is usage density.
- Usage density information is linear in program size, reducing memory demands during allocation.

19 Overview
- Keep track of the usage density of variables at any given point of a program.
- Allocate registers to variables that have a high usage density.
- The usage density of a variable x at a point p is the ratio of the total number of uses of its value since its last definition to the average distance between those uses.
- At each point, keep the variables with the highest usage densities in registers.
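
A minimal Python sketch of this definition, assuming a single definition and the per-use update rules given for slide 22 (total distance measured from the definition to the latest use; average distance = total distance / number of uses). `usage_density` is a hypothetical helper name.

```python
from typing import Sequence


def usage_density(def_point: int, use_points: Sequence[int]) -> float:
    """Usage density of a value defined at def_point and used at use_points.
    Assumes every use point is strictly later than def_point."""
    n = len(use_points)
    if n == 0:
        return 0.0  # usage density is initialized to zero at the definition
    total_distance = max(use_points) - def_point  # definition to latest use
    average_distance = total_distance / n
    return n / average_distance
```

Three uses at distances 2, 4, 6 from the definition give an average distance of 2 and a usage density of 3/2 = 1.5; the same three uses spread out to 10, 20, 30 give 3/10 = 0.3, so denser use yields a higher value.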

20 Usage density based register allocation
- Traverse the CFG in topological order.
- Usage information about each variable at different program points is maintained in a table called the usage density table.
- For each variable, the table holds: the last use statement, the total number of uses since the last definition, the average distance between the uses, the basic block where it was last used, and the usage density.

21 Calculating the usage information
- Traverse the CFG in topological order.
- For each definition of a variable:
  - reset to zero the total number of uses, the average distance, and the last-use basic block;
  - set the last use to the current instruction label;
  - initialize the usage density to zero.

22 Calculating the usage information (continued)
For each use of a variable, update the following:
- Last use: the statement number of the statement where the use of the variable occurred.
- Total distance: the distance between the last use and the corresponding definition(s). If there is only one definition, this is the number of instructions that elapse between the definition and the use. With multiple definitions, the average distance is first calculated from the definition points to the join point where the corresponding SSA merged definition is located; the simple distance from that point to each use is then added to this average to give the total distance.
- Total number of uses: incremented by 1.
- Average distance: the ratio of the total distance to the total number of uses.
- Usage density: the ratio of the total number of uses to the average distance.
- Active window: the program point up to which the variable's usage density would remain equal to or higher than its current value if a use of that variable occurred within the window.
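
The reset and update rules of slides 21 and 22 can be sketched as one row of the usage density table with two operations. This sketch covers only the single-definition case (the SSA join-point averaging for multiple definitions is not modeled), and `UsageEntry`, `reset`, and `record_use` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class UsageEntry:
    """One row of the usage density table (single-definition case only)."""
    def_point: int = 0          # label of the last definition
    last_use: int = 0           # label of the last use
    num_uses: int = 0           # total number of uses since the definition
    avg_distance: float = 0.0   # total distance / number of uses
    last_block: str = ""        # basic block of the last use ("" = none)
    usage_density: float = 0.0  # number of uses / average distance

    def reset(self, label: int) -> None:
        """Apply the slide-21 rules at a definition of the variable."""
        self.def_point = label
        self.last_use = label
        self.num_uses = 0
        self.avg_distance = 0.0
        self.last_block = ""
        self.usage_density = 0.0

    def record_use(self, label: int, block: str) -> None:
        """Apply the slide-22 rules at a use of the variable."""
        self.last_use = label
        total_distance = label - self.def_point  # one definition: plain distance
        self.num_uses += 1
        self.avg_distance = total_distance / self.num_uses
        self.usage_density = (
            self.num_uses / self.avg_distance if self.avg_distance > 0 else 0.0
        )
        self.last_block = block
```

A table is then simply a mapping from variable name to `UsageEntry`, which keeps the stored information linear in the number of variables.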

23 The algorithm
At a program point p:
    if _use(p) = x
        if (r = _free_register()) != Φ
            _allot(r, x)
        else
            for all y ∈ V such that y is in a register
                if p ∉ _active_window(y)
                    _update_usg_dens(y)
            min = _min_usg_dens(V)
            if _usg_dens(x) > _usg_dens(min)
                _allot(_reg(min), x)
            endif
    if _def(p) = x
        _reset(x)
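
The algorithm can be simulated with a small Python sketch. Several details are assumptions not fixed by the slides and are labeled in the code: the program is a hypothetical list of (defined variable, used variables) pairs indexed by program point; a variable already in a register needs no allocation; the active window is 2 × average distance (Corollary 1); the minimum is taken over register-resident variables; and a density recomputed outside the window is taken as uses² / (p − definition point), i.e. as if the average distance had stretched to p.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Info:
    def_point: int = 0    # point of the last definition
    uses: int = 0         # uses since the last definition
    density: float = 0.0  # current usage density
    window_end: int = 0   # last point of the active window


def allocate(
    program: List[Tuple[Optional[str], List[str]]], num_regs: int
) -> List[Tuple[int, str, int]]:
    """program[p] = (variable defined at p or None, variables used at p).
    Returns the (point, variable, register) assignments that were made."""
    assert num_regs >= 1, "sketch assumes at least one register"
    info: Dict[str, Info] = {}
    reg_of: Dict[str, int] = {}
    free = list(range(num_regs - 1, -1, -1))
    log: List[Tuple[int, str, int]] = []
    for p, (d, uses) in enumerate(program):
        for x in uses:
            e = info.setdefault(x, Info())
            # Update x's usage information at this use (single definition).
            e.uses += 1
            avg = (p - e.def_point) / e.uses
            e.density = e.uses / avg if avg > 0 else 0.0
            e.window_end = p + int(2 * avg)  # assumption: Corollary 1 window
            if x in reg_of:
                continue  # assumption: already in a register, nothing to do
            if free:  # _free_register() != Φ
                reg_of[x] = free.pop()
                log.append((p, x, reg_of[x]))
                continue
            # Lazy update: recompute only for variables outside their window
            # (assumed decay: uses^2 / distance from the definition to p).
            for y in reg_of:
                ey = info[y]
                if p > ey.window_end:
                    ey.density = ey.uses ** 2 / (p - ey.def_point)
            victim = min(reg_of, key=lambda y: info[y].density)
            if e.density > info[victim].density:
                reg_of[x] = reg_of.pop(victim)  # victim loses its register
                log.append((p, x, reg_of[x]))
        if d is not None:
            info[d] = Info(def_point=p, window_end=p)  # _reset(d)
    return log
```

Densities are recomputed only for register-resident variables whose active window has expired, so a variable that was used densely earlier but has since gone unused loses density and can be displaced by a newly active one.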

24 An example

26 Active window
LEMMA 1. The active window d(p) of a variable at a program point p where it is used must obey the relation d(p) ≤ 2·ad(p) + 1/ud(p), where ad(p) is the average distance and ud(p) is the usage density at program point p.
COROLLARY 1. For simplicity of calculation, it is safe to use an active window size equal to twice the average distance at program point p.
Implications of Corollary 1:
- The usage density of a variable needs to be calculated only at the points of its uses; at other points, usage densities are calculated only on demand.
- Only the usage densities of variables for which the current program point lies outside their active windows need to be recalculated.
- The variables with the minimum usage densities are spilled.
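
The bound in Lemma 1 can be checked with a short derivation under the single-definition model of slide 22 (an assumption here: n uses so far, total distance D from the definition to the current use at p, and the next use occurring at p + d).

```latex
\begin{aligned}
ad(p) &= \frac{D}{n}, \qquad ud(p) = \frac{n}{ad(p)} = \frac{n^2}{D} \\
\text{next use at } p + d:\quad ud' &= \frac{(n+1)^2}{D + d} \\
ud' \ge ud(p) \iff d &\le D\,\frac{(n+1)^2 - n^2}{n^2} = \frac{2D}{n} + \frac{D}{n^2} = 2\,ad(p) + \frac{1}{ud(p)}
\end{aligned}
```

Because 1/ud(p) > 0, a window of 2·ad(p) always lies inside this bound, which is Corollary 1.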

27 Evaluation and comparison
Performance was evaluated for different benchmark suites with respect to the following parameters:
- the compile time needed by the allocator
- the execution time of the generated code
- the size of the generated code, including the number of loads/stores generated
- the size of the allocator itself
- the amount of dynamic memory required during allocation
All experiments were carried out on an unloaded Sun Ultra 5 workstation. Reported times are the sums of the system and user times returned by the UNIX getrusage system call.
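
The timing method can be reproduced for any Python callable with the standard `resource` module, which wraps getrusage (Unix only). This is a sketch of the measurement idea, not the authors' harness; `cpu_seconds` is a hypothetical helper.

```python
import resource
from typing import Callable


def cpu_seconds(fn: Callable[[], object]) -> float:
    """User + system CPU time consumed by calling fn(), via getrusage."""
    before = resource.getrusage(resource.RUSAGE_SELF)
    fn()
    after = resource.getrusage(resource.RUSAGE_SELF)
    return (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
```

Summing user and system CPU time, rather than reading a wall clock, excludes time the process spends waiting; running on an unloaded machine still helps reduce noise.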

28 Speed and code quality

35 Memory requirements
We evaluate the space efficiency of each allocator by comparing:
- the static instructions generated by each allocator for the various benchmarks,
- the dynamic memory required for the operation of each allocator, and
- the binary size of the allocators.

45 Conclusion
- Usage density based allocation is a simple, fast technique for embedded systems.
- It keeps compilation time close to that of linear scan.
- The usage density of a variable indicates both the frequency and the distribution of the variable's uses at a program point, which allows effective register allocation without traditional live range or live interval information.

46 Conclusion (continued)
- The memory requirements, in terms of generated code size, allocator size, and dynamic memory used during operation, are lower than those of the other allocators.
- The amount of information used by the usage density algorithm is linearly proportional to program size.
- The algorithm allows lazy computation of usage densities using the property of the active window.

47 Thank You

