Graph-based Code Selection Techniques for Embedded Processors, Part II

Presentation transcript:

Graph-based Code Selection Techniques for Embedded Processors, Part II

- exact code selection for DFGs using constraint logic programming
- problem: handling of common sub-expressions (CSEs) in traditional tree-based code selection

(Figure: target data path with register files D, P, R, X, Y)

The second part of the talk is concerned with exact code selection for DFGs using CLP. Here I focus on embedded processors with highly irregular data paths, i.e. small distributed register files (RFs) and functional units (FUs). The problem of tree-based code selection is the handling of CSEs. I will give a short outline of constraint programming concepts, then introduce the code selection approach and experimental results.

Problems of Traditional Tree-based Code Selection

- splitting of the DFG into trees required
- unique locations for the tree roots required; generally memory is selected
- transfer routes between definitions and uses of roots are not taken into account
- overhead in data transfers

(Figure: example DFG over a, b, c with a CSE t, covered by X:=M[a], Y:=M[b], Y:=X*Y, M[t]:=Y, X:=M[c], Y:=M[t], R:=X+Y, R:=R+Y; the CSE is routed through memory)

The tree-based code selection approach requires splitting the DFG representations of basic blocks into trees. Generally the splitting is performed at nodes denoting CSEs. Furthermore, the tree-based approach requires unique locations for the tree roots; for irregular data paths, the memory is generally selected. The problem is that the transfer costs for the routes between definitions and uses of tree roots are not taken into account properly. Locating roots in memory often leads to unnecessary store and load operations, and thus to an overhead in transfer operations.

Constraint Satisfaction Problems (CSPs)

Problem specification:
- domain variables: x ∈ {1..3}, y ∈ {1..3}
- constraints: x < y

Solution of a CSP: a mapping of the variables to domain members fulfilling all constraints.

Search = labeling: assign variables to domain members, constructing the search tree one path at a time, with backtracking on constraint violation. Optimal search: branch and bound with a cost function f({x, y}).

(Figure: search tree over x and y with backtracking on violated constraints)

One way to find a solution is to traverse the variables in a certain order, thereby assigning a value from its domain to each variable. This can be illustrated by a search tree where nodes represent the variables and outgoing edges the possible assignments. A path denotes the order of traversing the variables together with its mapping. If we reach a point in the tree where constraints are violated, we have to backtrack. A further aspect is finding an optimal solution via branch and bound with a cost function.
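The labeling-with-backtracking scheme described above can be sketched in a few lines of Python for the slide's example CSP (an illustrative sketch, not code from the talk; the function names are mine):

```python
# Minimal backtracking solver for the slide's example CSP:
# x in {1,2,3}, y in {1,2,3}, constraint x < y.
def solve(domains, constraint, assignment=None):
    """Traverse the variables in order, assigning values and backtracking
    on constraint violation; yields every complete solution."""
    if assignment is None:
        assignment = {}
    if len(assignment) == len(domains):
        yield dict(assignment)
        return
    var = next(v for v in domains if v not in assignment)
    for value in domains[var]:
        assignment[var] = value
        if constraint(assignment):   # cut off the subtree on violation
            yield from solve(domains, constraint, assignment)
        del assignment[var]          # backtrack

domains = {"x": [1, 2, 3], "y": [1, 2, 3]}

def x_less_y(a):
    """x < y; partial assignments are accepted until both are bound."""
    return "x" not in a or "y" not in a or a["x"] < a["y"]

solutions = list(solve(domains, x_less_y))
print(solutions)
# [{'x': 1, 'y': 2}, {'x': 1, 'y': 3}, {'x': 2, 'y': 3}]
```

Each yielded dictionary corresponds to one complete path in the search tree of the slide.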

Constraint-guided Search

- constraints are active in the background; automatic reactivation by the system
- constraints affect search: x < y locally reduces the domains x ∈ {1..3}, y ∈ {1..3} to x ∈ {1..2}, y ∈ {2..3}
- constraint propagation (CP): pruning impossible paths
- search strategy (variable and value selection) has a high impact on efficiency

(Figure: search tree in which CP is activated after each labeling step, pruning branches in advance)

We have seen that the constraints guide search by indicating backtracking on constraint violation. Constraints also perform local reduction of the variable domains: members of the domains that cannot occur in any solution are eliminated. This concept is known as constraint propagation and is interleaved with search. The effect can be observed in the search tree, where the set of possible paths is pruned in advance. In each labeling step (going down one level in the tree), CP is performed, possibly leading to further reductions. This can lead to drastic reductions of the search space. CP, together with a suitable search strategy, has a high impact on the efficiency of search. Backtracking and CP are handled automatically by the CLP system.
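The local domain reduction for x < y can be sketched as a single arc-consistency revision step (illustrative Python, not from the talk):

```python
def revise_less(dx, dy):
    """One propagation step for the constraint x < y: remove from each
    domain the values that have no support in the other domain."""
    dx2 = [v for v in dx if any(v < w for w in dy)]   # x needs some y above it
    dy2 = [w for w in dy if any(v < w for v in dx2)]  # y needs some x below it
    return dx2, dy2

dx, dy = revise_less([1, 2, 3], [1, 2, 3])
print(dx, dy)  # [1, 2] [2, 3]
```

This is exactly the reduction shown on the slide: x = 3 and y = 1 can appear in no solution and are dropped before any labeling takes place.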

Overall Approach

- CSP model: select variables Vs
- search strategy: labeling(Vs) — predefined & user defined
- objective function: f(Vs)
- optimize: minimize(labeling(Vs), f(Vs)) — predefined, generic & user defined

Specifying and solving a problem consists of first specifying a CSP model and then defining a search strategy over the variables. The CLP system we use offers a set of predefined strategies. Furthermore, there exist predefined, generic branch-and-bound procedures for finding optimal results; these expect a search strategy and an objective function as input.
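A minimal stand-in for minimize(labeling(Vs), f(Vs)) is depth-first labeling with a branch-and-bound cutoff. The sketch below (hypothetical; the name minimize_sum and the sum-of-values objective are my own choices, not the CLP system's API) prunes any partial assignment whose running cost already reaches the incumbent:

```python
def minimize_sum(domains, constraint):
    """Branch and bound over a sum-of-variables objective: a partial
    assignment is abandoned once its running sum reaches the best
    complete solution found so far."""
    order = list(domains)
    best = {"cost": float("inf"), "sol": None}

    def label(i, assignment, running):
        if running >= best["cost"]:          # bound: cannot improve
            return
        if i == len(order):                  # complete labeling
            best["cost"], best["sol"] = running, dict(assignment)
            return
        var = order[i]
        for value in domains[var]:
            assignment[var] = value
            if constraint(assignment):       # backtrack on violation
                label(i + 1, assignment, running + value)
            del assignment[var]

    label(0, {}, 0)
    return best["sol"], best["cost"]

sol, c = minimize_sum({"x": [1, 2, 3], "y": [1, 2, 3]},
                      lambda a: "y" not in a or a["x"] < a["y"])
print(sol, c)  # {'x': 1, 'y': 2} 3
```

In a real CLP system the bound is enforced by posting a new constraint cost < incumbent and letting propagation do the pruning.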

CSP Model of Code Selection

Representation of alternative covers for a node d := o1 + o2 matched by the register transfers R := X + Y and X := R + Y:
- combine alternative resources: d ∈ {R, X}, o1 ∈ {R, X}, o2 = Y (written R|X := R|X + Y)
- mutual dependencies given by constraints, e.g. d = R ⇒ o1 = X
- constraints specify the instruction set and the data transfer restrictions
- extended information: COSTS, FUNCTIONAL UNITS, etc.

With regard to the overall approach, we first give a CSP model for the code selection problem. It is based on a representation of alternative covers for a DFG. We consider the matching register transfers at each node of a DFG and combine the RF locations for each definition and operand into single sets of alternative RFs. This can be specified by domain variables, one for each definition and each operand. Since not all combinations of RFs are allowed, as seen in the example, we specify the allowed ones as constraints over the RF variables. In order to perform optimal covering, we also introduce variables for the costs together with the corresponding constraints. The model can be further extended to include information on FUs, etc., with regard to register allocation and instruction scheduling.
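The encoding of alternative covers as domain variables plus a compatibility constraint can be illustrated with the slide's example (a sketch; the names domains, legal and covers are hypothetical):

```python
import itertools

# The node d := o1 + o2 is matched by two register transfers,
# R := X + Y and X := R + Y. Merging the per-position choices gives
# the domains below; note that the merged domains alone would also
# permit the illegal mixes (d=R, o1=R) and (d=X, o1=X).
domains = {"d": ["R", "X"], "o1": ["R", "X"], "o2": ["Y"]}

def legal(a):
    """Only the RF combinations of an actual instruction are allowed:
    (d=R, o1=X) or (d=X, o1=R)."""
    return (a["d"], a["o1"]) in {("R", "X"), ("X", "R")}

covers = [c for c in
          (dict(zip(domains, vals))
           for vals in itertools.product(*domains.values()))
          if legal(c)]
print(covers)
# [{'d': 'R', 'o1': 'X', 'o2': 'Y'}, {'d': 'X', 'o1': 'R', 'o2': 'Y'}]
```

In the CLP formulation the constraint is posted once and propagated during labeling, rather than filtering an enumeration as done here for illustration.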

Labeling-based Code Selection

- CSP representation for all alternative covers of the DFG: each node d := o1 + o2 carries RF variables and a cost variable c (operation costs, transfer costs)
- objective: minimize Σ cᵢ over labeling(Vs)
- idea 1: labeling of the RF variables
- idea 2: labeling of the cost variables plus the d-variable of each CSE — reduced search space

We now have the CSP model for code selection. Each node of a DFG is associated with variables denoting alternative RFs and costs. The constraints specify the mutual dependencies between the RFs with respect to the instruction set. By this we get a representation of all alternative covers of the DFG with respect to the specified instruction set. The next step is to define a search strategy to find an optimal covering. As an objective function we simply accumulate over the set of cost variables. The first idea was to label the RF variables, but a better approach turned out to be labeling the cost variables and the d-RF variable of each CSE. The first advantage is a reduced search space.

Features

- common data routes of CSEs are taken into account
- CSEs as parts of chained operations
- alternative covers: multiple covers given by the remaining alternatives
- delayed binding of resources: more flexibility for subsequent phases

(Figure: DFG over a, b, c with covers X|Y := M[a], X|Y := M[b], R := M[c]; the shared multiplication is part of the two MAC operations R := R + X|Y * X|Y)

The model comprises the following features: (1) common data routes of CSEs are taken into account; (2) CSEs can be sub-patterns of chained operations — in the example, the multiplication is part of two MAC operations; (3) because the cost variables are labeled, certain RF variables remain unbound to a particular RF, which can be interpreted as a set of optimal covers. This delayed binding of resources gives more flexibility to subsequent phases.

Results for ADSP210x: Exact DFG Covering

(Table: benchmarks iir, lattice, test1-test4, example, complex update, complex multiply; columns: nodes, edges, CSEs, sequential costs, time in seconds)

- DSPStone benchmarks and internal test set
- improvements between 20-50%
- very high run-times for large DFGs

We applied the approach to the DSPStone benchmarks and to some internal benchmarks with large basic blocks. We obtained improvements between 20-50% with respect to the tree-based approach in which tree roots were located in memory. For small basic blocks the run-times were acceptable, but for large basic blocks they became unacceptable; the largest one did not terminate after 4 weeks. However, near-optimal results (within 1 of the optimum) were computed within the first 1000 seconds for the larger examples — the rest of the time was spent verifying that there is no better result.

Results for ADSP210x: Heuristic DFG Covering

(Table: benchmarks iir, lattice, test1-test4, example, complex update, complex multiply)

- partitioning into smaller DFGs
- results equal or very close to the optimal results
- acceptable run-times also for large DFGs

We tried a variant of the approach in which we partitioned the DFG into smaller DFGs and derived optimal solutions for each. With this approach we got solutions for all DFGs within 300 seconds (all but the largest DFG were solved within 5 seconds). This can lead to suboptimal results, but they were equal or very close to the optimal ones.
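The partitioning heuristic can be sketched as follows (hypothetical names; solve_exact stands in for the exact covering of one sub-DFG, and the fixed-size split is a simplification of whatever partitioning the talk actually used):

```python
def partitioned_cover(nodes, solve_exact, chunk_size=8):
    """Split the DFG's node list into fixed-size chunks, cover each
    chunk optimally, and concatenate the (possibly suboptimal) results."""
    cover, total_cost = [], 0
    for i in range(0, len(nodes), chunk_size):
        sub_cover, sub_cost = solve_exact(nodes[i:i + chunk_size])
        cover.extend(sub_cover)
        total_cost += sub_cost
    return cover, total_cost

# Demo with a stub solver that "covers" each node at unit cost.
stub_solver = lambda chunk: (list(chunk), len(chunk))
cover, cost = partitioned_cover(list(range(10)), stub_solver, chunk_size=4)
print(cost)  # 10
```

The trade-off is the one reported on the slide: each sub-problem is solved exactly and quickly, but cross-chunk alternatives are lost, so the concatenated cover may be slightly above the global optimum.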

Phase Integration of CS

- new concept for the phase integration of code selection
- ADSP210x: comparable with hand-written code
- TI TMS320C5x: up to 50% cost reduction compared to the TI compiler

(Figure: exact CS passes CS constraints to register allocation, which passes RA/IS constraints to instruction scheduling; full integration of exact CS, RA and IS exploits alternative covers but incurs very high run-times)

CLP-based Advantages

- search & backtracking handled by the system
- high specification level
- internal code generation model for phase coupling problems
- user control over search: exchange strategies while retaining the model
- "easy" to extend

Related Work

- code selection approaches: RTG criterion [Araujo, Malik 95], delayed binding of RFs [Paulin 95] — here: exact DFG covering & extended delayed binding of resources
- heuristic phase coupling approaches: mutation scheduling [Novak, Nicolau 95], data routing [Hartmann 92, IMEC 94] — here: extended delayed binding of resources
- exact phase coupling approaches: ILP/LP based [Wilson 95 / Gebotys 97], binate covering [Liao 95, Hanono 97] — here: more expressive power, hence smaller models and more impact on search strategies

Conclusions

- DFG-based code selection: extended OLIVE approach combined with ILP, exploitation of SIMD instructions
- constraint programming: handling of CSEs, phase integration of code selection
- high-quality code; intuitive and manageable code generation models
- higher run-times, but acceptable; complexity can be reduced by partitioning of the graph, etc.