Slide 1: Graph-based Code Selection Techniques for Embedded Processors, Part II
- exact code selection for DFGs using constraint logic programming
- problem: handling of common sub-expressions (CSEs) in traditional tree-based code selection

[figure: irregular data path with distributed registers D, P, R, X, Y]

Speaker notes: The second part of the talk is concerned with exact code selection for DFGs using constraint logic programming (CLP). Here I focus on embedded processors with highly irregular data paths, i.e., small distributed register files (RFs) and functional units (FUs). The problem of tree-based code selection is the handling of CSEs. I will give a short outline of constraint programming concepts, then introduce the code selection approach and the experimental results.
Slide 2: Problems of Traditional Tree-based Code Selection
- splitting of the DFG into trees is required
- unique locations for tree roots; generally memory is selected
- transfer routes between definitions and uses of roots are not taken into account
- overhead in data transfers

[figure: example DFG (operands a, b, c; CSE stored to memory location t) and its tree-based cover: X:=M[a], Y:=M[b], Y:=X*Y, M[t]:=Y, X:=M[c], Y:=M[t], R:=X+Y, R:=R+Y; the store/load pair through M[t] is pure transfer overhead]

Speaker notes: The tree-based code selection approach requires splitting the DFG representation of a basic block into trees. Generally the splitting is performed at nodes denoting CSEs. Furthermore, the tree-based approach requires unique locations for the tree roots; for irregular data paths, generally the memory is selected. The problem is that the transfer costs for the routes between definitions and uses of tree roots are not taken into account properly. Locating roots in memory often leads to unnecessary store and load operations, i.e., an overhead in transfer operations.
Slide 3: Constraint Satisfaction Problems (CSPs)
PROBLEM SPECIFICATION:
- domain variables: x ∈ {1..3}, y ∈ {1..3}
- constraints: x < y

SOLUTION of a CSP:
- a mapping of variables to domain members fulfilling all constraints

Search = labeling:
- construct a search tree over variables and their domain members, one path at a time
- backtracking on constraint violation
- optimal search: branch and bound with a cost function f({x, y})

[figure: search tree with variable nodes x, y and value edges 1..3]

Speaker notes: One way to find a solution is to traverse the variables in a certain order, thereby assigning a value from its domain to each variable. This can be illustrated by a search tree where nodes represent the variables and outgoing edges the possible assignments. A path denotes the order of traversing the variables together with its mapping. If we reach a point in the tree where a constraint is violated, we have to backtrack. Another aspect is finding an optimal solution, which is done by branch and bound over a cost function.
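To make the toy CSP concrete, here is a minimal sketch in Prolog with a finite-domain constraint library; the talk does not name its CLP system, so the use of SWI-Prolog's library(clpfd) is an assumption.

```prolog
:- use_module(library(clpfd)).

% The toy CSP from the slide: x, y in 1..3 with x < y.
small_csp([X, Y]) :-
    [X, Y] ins 1..3,   % domain variables
    X #< Y.            % constraint

% Labeling = depth-first search with backtracking on violation:
% ?- small_csp(Vs), labeling([], Vs).
% Vs = [1, 2] ;  Vs = [1, 3] ;  Vs = [2, 3].
```

Each answer corresponds to one successful path through the search tree on the slide; failing assignments (e.g., x = 3) are abandoned by backtracking.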
Slide 4: Constraint-guided Search
- constraints are active in the background; automatic reactivation by the system
- constraints affect search: x < y locally reduces the domains x ∈ {1..3}, y ∈ {1..3} to x ∈ {1..2}, y ∈ {2..3}
- constraint propagation (CP) prunes impossible paths
- search strategy (variable and value selection) has a high impact on efficiency

[figure: search tree pruned by CP; propagation is activated after each labeling step]

Speaker notes: We have seen that constraints guide search by indicating backtracking on constraint violation. In addition, constraints perform local reduction of variable domains: members of the domains that cannot occur in any solution are eliminated. This concept is known as constraint propagation and is interleaved with search. The effect can be observed in the search tree, where the set of possible paths is pruned in advance. In each labeling step (going down one edge in the tree), CP is performed, possibly leading to further reductions. This can lead to drastic reductions of the search space. CP together with a suitable search strategy has a high impact on the efficiency of search. Backtracking and CP are handled automatically by the CLP system.
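The domain reduction on this slide can be reproduced directly: in the clpfd sketch above, merely posting the constraint triggers propagation before any search takes place.

```prolog
:- use_module(library(clpfd)).

% Propagation without labeling: posting x #< y already prunes both domains.
% ?- [X, Y] ins 1..3, X #< Y.
% X in 1..2,    (3 removed: no y with y > 3 exists)
% Y in 2..3.    (1 removed: no x with x < 1 exists)
% Subsequent labeling only explores the pruned tree shown on the slide.
```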
Slide 5: Overall Approach

- CSP model: select variables Vs
- search strategy: labeling(Vs) (predefined & user-defined)
- objective function: f(Vs)
- optimization: minimize(labeling(Vs), f(Vs)) (predefined, generic & user-defined)

Speaker notes: Specifying and solving a problem consists of first specifying a CSP model and then defining a search strategy over the variables. The CLP system we use offers a set of predefined strategies. Furthermore, there are predefined and generic branch-and-bound procedures for finding optimal results; these expect a search strategy and an objective function as input.
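minimize(labeling(Vs), f(Vs)) is the classic CLP branch-and-bound entry point. As a hedged sketch of the same pattern, SWI-Prolog's clpfd expresses it as a labeling option (the objective below is invented purely for illustration):

```prolog
:- use_module(library(clpfd)).

% Branch and bound: find the assignment of Vs minimizing Cost.
optimize(Vs, Cost) :-
    Vs = [X, Y],
    Vs ins 1..3,
    X #< Y,                     % CSP model
    Cost #= 10*X + Y,           % toy objective function f(Vs)
    labeling([min(Cost)], Vs).  % search strategy + optimization

% ?- optimize(Vs, Cost).
% Vs = [1, 2], Cost = 12.      % proven optimal by branch and bound
```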
Slide 6: CSP Model of Code Selection (Representation for Alternative Covers)
- matching register transfers for d := o1 + o2: R := X + Y and X := R + Y, combined to R|X := R|X + Y
- alternative resources combined into domain variables: d ∈ {R,X}, o1 ∈ {R,X}, o2 = Y
- mutual dependencies given by constraints (e.g., d = R implies o1 = X)
- constraints specify the instruction set and data transfer restrictions
- extended information: costs, functional units, etc.

Speaker notes: With regard to the overall approach, we first give a CSP model for the code selection problem, based on a representation of the alternative covers of a DFG. We consider the matching register transfers at each node of the DFG and combine the RF locations for each definition and operand into a single set of alternative RFs. This is specified by domain variables, one for each definition and each operand. Since not all combinations of RFs are allowed, as seen in the example, the allowed combinations are specified as constraints over the RF variables. In order to perform optimal covering we also introduce variables for the costs, together with the corresponding constraints. The model can be further extended with information on functional units etc. with regard to register allocation and instruction scheduling.
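The slide's example can be written down as a single table constraint. In the sketch below, the integer encoding of register files (R = 0, X = 1, Y = 2) and the use of clpfd's tuples_in/2 are my choices for illustration, not the talk's actual formulation.

```prolog
:- use_module(library(clpfd)).

% Matching RTs for d := o1 + o2 with o2 fixed to Y (encoded as 2):
%   R := X + Y  ->  (d, o1) = (0, 1)
%   X := R + Y  ->  (d, o1) = (1, 0)
add_cover(D, O1, O2) :-
    O2 #= 2,                           % o2 = Y
    tuples_in([[D, O1]], [[0, 1],      % allowed (d, o1) combinations
                          [1, 0]]).

% Mutual dependency by propagation: choosing d = R forces o1 = X.
% ?- add_cover(D, O1, O2), D = 0.
% O1 = 1, O2 = 2.
```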
Slide 7: Labeling-based Code Selection
- CSP representation for all alternative covers of the DFG
- per node d := o1 + o2: RF variables and cost variables (operation costs, transfer costs), linked by constraints
- objective: minimize Σ ci via labeling(Vs)
- IDEA 1: labeling of the RF variables
- IDEA 2: labeling of the cost variables plus the d-variable of each CSE → reduced search space

[figure: DFG over a, b, c with RF and cost variables attached to each node]

Speaker notes: We now have the CSP model for code selection. Each node of a DFG is associated with variables denoting alternative RFs and costs. The constraints specify the mutual dependencies between the RFs with respect to the instruction set. By this we get a representation of all alternative covers of the DFG with respect to the specified instruction set. The next step is to define a search strategy for finding an optimal covering. As objective function we simply accumulate over the set of cost variables. The first idea was to label the RF variables, but labeling the cost variables and the d-RF variables of each CSE turned out to be the better approach. The first advantage is a reduced search space.
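IDEA 2 in clpfd terms: accumulate the per-node cost variables into a sum and run branch and bound over the cost variables only. The cover/cost tables below are invented toy numbers, not the ADSP210x figures.

```prolog
:- use_module(library(clpfd)).

% Two nodes, each with two alternative covers; each choice implies a cost.
select_covers(Covers, Costs, Total) :-
    Covers = [N1, N2], Costs = [C1, C2],
    tuples_in([[N1, C1]], [[0, 1], [1, 2]]),  % node 1: covers cost 1 or 2
    tuples_in([[N2, C2]], [[0, 3], [1, 1]]),  % node 2: covers cost 3 or 1
    sum(Costs, #=, Total).                    % objective: sum of ci

% Label only the cost variables; the cover choices follow by propagation.
% ?- select_covers(Covers, Costs, Total), labeling([min(Total)], Costs).
% Covers = [0, 1], Costs = [1, 1], Total = 2.
```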
Slide 8: Features
- common data routes of CSEs are taken into account
- CSEs as sub-patterns of chained operations: the shared multiplication is part of two MAC operations, R := R + X|Y * X|Y
- alternative covers: X|Y := M[a], X|Y := M[b], R := M[c]
- multiple covers given by the remaining alternatives
- delayed binding of resources: more flexibility for subsequent phases

[figure: DFG with CSE a*b feeding two additions, covered by two MAC instructions]

Speaker notes: The model comprises the following features. First, the common data routes of CSEs are taken into account. Second, CSEs can be sub-patterns of chained operations; in the example, the multiplication is part of two MAC operations. Third, because the cost variables are labeled, certain RF variables are not bound to a specific RF, which can be interpreted as a set of optimal covers, i.e., delayed binding of resources.
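Delayed binding falls out of labeling the cost variables: if all remaining alternatives for an RF variable are equally cheap, the variable simply keeps its residual domain after optimization. A toy clpfd illustration (encodings invented as before):

```prolog
:- use_module(library(clpfd)).

% An operand O may live in X (1) or Y (2); both choices cost 1.
% ?- O in 1..2, tuples_in([[O, C]], [[1, 1], [2, 1]]), labeling([], [C]).
% C = 1,
% O in 1..2.   % O stays unbound: either RF yields an optimal cover,
%              % leaving the choice to register allocation/scheduling.
```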
Slide 9: Results for ADSP210x (Exact DFG Covering)
[table: benchmarks iir, lattice, test1–test4, example, complex update, complex multiply, with columns nodes, edges, CSEs, sequential costs, and time in seconds]

- DSPStone benchmarks and an internal test set
- improvements between 20 and 50%
- very high run-times for large DFGs

Speaker notes: We applied the approach to benchmarks from the DSPStone suite and to internal benchmarks with large basic blocks. We obtained improvements between 20 and 50% compared to the tree-based approach with tree roots located in memory. For small basic blocks the run-times were acceptable, but for large basic blocks they became unacceptable; the largest example did not terminate after 4 weeks. However, for the larger examples a result within 1 of the optimal cost was computed within the first 1000 seconds; the rest of the time was spent verifying that no better result exists.
Slide 10: Results for ADSP210x (Heuristic DFG Covering)
[table: same benchmark set as on the previous slide]

- partitioning into smaller DFGs
- results equal or very close to the optimal results
- acceptable run-times also for large DFGs

Speaker notes: We tried a variant of the approach that partitions the DFG into smaller DFGs and derives an optimal solution for each. With this approach we obtained solutions for all DFGs within 300 seconds; all but the largest DFG were solved within 5 seconds. Partitioning can of course lead to suboptimal results, but the results were equal or very close to the optimal ones.
Slide 11: Phase Integration of Code Selection
- new concept for phase integration of code selection
- exact CS passes its CS constraints to register allocation, which passes RA and IS constraints to instruction scheduling; full integration covers exact CS, RA, and IS
- alternative covers are exploited; full integration has very high run-times
- ADSP210x: comparable with hand-written code
- TI TMS320C5x: up to 50% cost reduction compared to the TI compiler
Slide 12: CLP-based Advantages

- search & backtracking provided by the system
- high specification level
- internal code generation model suited to phase coupling problems
- user control over search: strategies can be exchanged while the model remains unchanged
- model "easy" to extend
Slide 13: Related Work

- code selection approaches: RTG criterion [Araujo, Malik 95]; delayed binding of RFs [Paulin 95]
  → this work: exact DFG covering & extended delayed binding of resources
- heuristic phase coupling approaches: mutation scheduling [Novak, Nicolau 95]; data routing [Hartmann 92, IMEC 94]
  → this work: extended delayed binding of resources
- exact phase coupling approaches: ILP/LP based [Wilson 95 / Gebotys 97]; binate covering [Liao 95, Hanono 97]
  → this work: more expressive power, smaller models, more impact on search strategies
Slide 14: Conclusions (DFG-based Code Selection)

- extended OLIVE approach combined with ILP: exploitation of SIMD instructions
- constraint programming: handling of CSEs, phase integration of code selection
- high-quality code; intuitive and manageable code generation models
- higher run-times, but acceptable; complexity can be reduced by partitioning the graph, etc.