HIGH LEVEL SYNTHESIS WITH AREA CONSTRAINTS FOR FPGA DESIGNS: AN EVOLUTIONARY APPROACH
Thesis by: Christian Pilato, student ID no.
Advisor: Prof. Fabrizio FERRANDI
Co-advisor: Ing. Antonino TUMEO
Politecnico di Milano
Summary
Outline
- High-Level Synthesis
- Proposed methodology
- Experimental results
- Some further extensions…
- Conclusion and future work
High-Level Synthesis – Problem description

“High-Level Synthesis means going from an algorithmic level specification of a behaviour of a digital system to a register level structure that implements that behavior”. McFarland, et al., Proc. IEEE, February 1990.

Inputs:
- behavioral description (in C language)
- library of different types of resources
- set of constraints
Output: register-transfer level (RTL) design in a hardware description language (e.g. SystemC, VHDL and Verilog)
Goal: minimize objectives (area, latency, etc.)

Three main sub-tasks:
1. operation scheduling: when operations start their execution
2. resource allocation and binding: where operations are executed, where values are stored and how elements are interconnected
3. controller synthesis: which operations are issued

[Figure: the High-Level Synthesis tool takes a behavioral specification, design constraints, a resource library and objectives; it performs scheduling, allocation, binding and controller synthesis, and produces a datapath and controller.]
High-Level Synthesis – Problem description
What are the problems?
- All the sub-tasks are NP-complete: no efficient exact algorithms are known
- Interconnections have to be considered: they can take up to 80% of the final area
- All the tasks are closely interdependent
- Most of the information is available only at the end of the synthesis
Genetic algorithms
- Try non-deterministic approaches with feedback information
- Multi-objective optimization: reducing the problem to a single objective (weighted average) is not efficient
- Non-dominated Sorting Genetic Algorithm (NSGA-II)
K. Deb, S. Agrawal, A. Pratap, and T. Meyarivan, “A Fast and Elitist Multi-Objective Genetic Algorithm: NSGA-II,” Proceedings of the Parallel Problem Solving from Nature VI Conference, pp. 849–858, 2000.
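One known limitation of weighted averaging can be illustrated with a small sketch. The (area, latency) pairs below are hypothetical, and the helper names are made up for this example; it only shows that a non-dominated design lying in a concave part of the trade-off curve is never selected by any weight, while a dominance-based view keeps it.

```python
def dominates(a, b):
    """True if a is no worse than b everywhere and strictly better somewhere (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

def weighted_sum_winners(points, steps=101):
    """Collect the minimizer of w*area + (1-w)*latency for a sweep of weights w in [0, 1]."""
    winners = set()
    for i in range(steps):
        w = i / (steps - 1)
        winners.add(min(points, key=lambda p: w * p[0] + (1 - w) * p[1]))
    return winners

# Hypothetical (area, latency) pairs, not taken from the thesis results.
designs = [(1, 10), (6, 6), (10, 1), (8, 8)]
front = pareto_front(designs)          # three non-dominated designs
found = weighted_sum_winners(designs)  # only the two extreme designs
print("Pareto front:", front)
print("Reachable by weighted sum:", sorted(found))
```

In this toy data set the balanced design (6, 6) is Pareto-optimal but is missed by every weight in the sweep, which is one reason to rank solutions by dominance instead.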
High-Level Synthesis and Design Space Exploration
The proposed methodology
Experimental results
Development framework
- Integrated in the PandA framework, an open-source C++ framework covering different aspects of the hardware-software design of embedded systems
- Evolutionary computation with the Open BEAGLE framework
Functional validation
- Comparison between Verilog and C simulations
Estimation model validation
- Comparison between estimations and logic synthesis values:
  - average error equal to 4.02%
  - standard deviation equal to 2.82%
  - maximum error less than 10%
- These values can be effectively used as fitness values
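The validation statistics above can be computed along these lines. This is a minimal sketch with made-up numbers: the estimated and measured areas are hypothetical, and the use of the population standard deviation is an assumption, since the slide does not say which variant was used.

```python
from statistics import mean, pstdev

def relative_errors(estimated, measured):
    """Percentage error of each estimate with respect to the logic synthesis value."""
    return [100.0 * abs(e - m) / m for e, m in zip(estimated, measured)]

# Hypothetical area values (e.g. in slices); not the thesis data.
estimated = [980, 1530, 2100, 450]
measured = [1000, 1500, 2000, 500]

errors = relative_errors(estimated, measured)
avg = mean(errors)      # average error
dev = pstdev(errors)    # population standard deviation (assumed variant)
worst = max(errors)     # maximum error
print(f"average {avg:.2f}%, std dev {dev:.2f}%, max {worst:.2f}%")
```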
Experimental results
Design Space Exploration validation
- Population size of individuals, evolving up to a maximum of 200 generations: the best trade-off between overall execution time and solution quality
Considerations:
- It takes into account all elements in the design solution
- It can cover a good number of trade-offs between the fastest solution and the minimal-area solution
- Better approach than existing tools to deal with area constraints
Paper accepted for publication at the International Symposium on Systems, Architectures, MOdeling and Simulation (SAMOS), Samos, Greece, July 2007
Title: “An Evolutionary Approach to Area-Time Optimization of FPGA designs”
Some extensions…
Some features just provided…
Weighted clique covering: in register allocation, to reduce interconnections
- A higher weight is assigned to a compatibility edge when the two values involve the same functional units
- Clique covering on a weighted graph; results show a further reduction of overall area of up to 10%
Fitness inheritance: to reduce overall execution time
- A fraction of the expensive real evaluations is replaced with an estimation based on similar, already evaluated individuals
- It is able to reduce overall execution time by over 25%
- No substantial difference in the final Pareto-optimal solution
Paper submitted to the IEEE Congress on Evolutionary Computation (CEC) 2007, Singapore, September
Title: “Fitness Inheritance In Evolutionary and Multi-Objective High Level Synthesis”
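The idea behind fitness inheritance can be sketched as follows. This is only an illustration under stated assumptions: similarity is measured with the Hamming distance between chromosomes, and the inherited value is an inverse-distance weighted average over the `k` nearest evaluated individuals. The slides do not specify the similarity measure or the weighting, and all names and numbers here are hypothetical.

```python
def hamming(a, b):
    """Number of gene positions in which two chromosomes differ."""
    return sum(x != y for x, y in zip(a, b))

def inherit_fitness(chromosome, evaluated, k=2):
    """Estimate objectives from the k most similar evaluated individuals.

    evaluated: list of (chromosome, (area, latency)) pairs that were really evaluated.
    """
    nearest = sorted(evaluated, key=lambda item: hamming(chromosome, item[0]))[:k]
    weights = [1.0 / (1 + hamming(chromosome, c)) for c, _ in nearest]
    total = sum(weights)
    n_objectives = len(nearest[0][1])
    return tuple(
        sum(w * fit[i] for w, (_, fit) in zip(weights, nearest)) / total
        for i in range(n_objectives)
    )

# Hypothetical archive of really evaluated individuals: (chromosome, (area, latency)).
archive = [
    ((0, 0, 0), (100.0, 10.0)),
    ((1, 1, 1), (200.0, 5.0)),
    ((0, 0, 1), (120.0, 9.0)),
]
estimate = inherit_fitness((0, 1, 1), archive, k=2)
print(estimate)
```

Only a fraction of the population would be estimated this way; the rest still goes through the real evaluation so that the archive stays representative.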
Conclusion and future work
The main contributions of this thesis are:
- A high-level synthesis flow from C specifications to HDL descriptions
- Integration of a model for fast estimation of synthesis results
- Design space exploration with a genetic algorithm:
  - It takes into account all elements composing the design solution
  - Close fit with real values
  - Multi-objective concurrent optimization
Future work:
- Optimize the results coming from the synthesis flow
- Further reduce the overall execution time of the proposed methodology
- Refine the estimation model and specialize it for different targets
Christian PILATO, student ID no.
Thank you!
The proposed High-Level Synthesis flow
The proposed flow is organized as follows:
- From C to intermediate representation: from GIMPLE data structures to an intermediate representation in the form of a graph
- High-Level Synthesis flow:
  1. Partial binding and scheduling
  2. Finite State Machine creation
  3. Register allocation
  4. Interconnection allocation
  5. Performance and area estimations
- From intermediate representation to Hardware Description Language (e.g. Verilog), ready for logic synthesis
1. Partial binding and Scheduling
Partial binding: force an operation to be executed on a selected functional unit instance β (+1) =
- A technique introduced to partially control the final area occupation
- It can affect scheduling, register allocation and interconnection allocation
Scheduling: assign a starting control step to each operation to be executed
- Many scheduling algorithms are able to support partial binding (Integer Linear Programming formulation, list-based algorithm, etc.)
- Different solutions based on the selected algorithm
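How partial binding interacts with scheduling can be sketched with a simplified list-based scheduler. This is not the implementation used in the flow: it assumes every operation takes one control step, that the dependency graph is acyclic, and that dictionary order is the priority order. The operation names, unit counts and the `binding` parameter are hypothetical.

```python
def list_schedule(ops, deps, units, binding=None):
    """Simplified list scheduling with optional partial binding.

    ops: {operation: unit type}, deps: {operation: [predecessors]},
    units: {unit type: number of instances}, binding: {operation: forced instance index}.
    Returns {operation: (control step, unit instance)}.
    """
    binding = binding or {}
    schedule = {}
    step = 0
    while len(schedule) < len(ops):
        busy = set()  # (unit type, instance) pairs already used in this control step
        for name, kind in ops.items():
            if name in schedule:
                continue
            # Ready only if every predecessor finished in an earlier control step.
            if any(p not in schedule or schedule[p][0] >= step for p in deps.get(name, [])):
                continue
            # A partially bound operation may only use its forced instance.
            candidates = [binding[name]] if name in binding else range(units[kind])
            for idx in candidates:
                if (kind, idx) not in busy:
                    busy.add((kind, idx))
                    schedule[name] = (step, idx)
                    break
        step += 1
    return schedule

ops = {"a": "add", "b": "add", "c": "mul", "d": "add"}
deps = {"c": ["a", "b"], "d": ["c"]}
units = {"add": 2, "mul": 1}

free = list_schedule(ops, deps, units)
bound = list_schedule(ops, deps, units, binding={"a": 0, "b": 0})
print("unconstrained:", free)
print("a and b forced on adder 0:", bound)
```

Forcing both additions onto the same adder serializes them: the second adder is no longer needed, at the cost of one extra control step, which is the area/latency lever that partial binding exposes.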
2-3. Finite State Machine creation and Register allocation
Scheduling gives information about concurrent operations. This information is useful for controller synthesis and register allocation.
State Transition Graph (STG), based on the Moore FSM model, is created on the scheduled specification
- It represents control flow and concurrent operations
- Conditional operations create branches based on the evaluated condition values
Register allocation: allocate elements to store values across control step boundaries. A compiler approach has been implemented on the STG:
- Liveness analysis based on dataflow equations
- Interference graph based on liveness information
- Different heuristics to minimize the number of registers
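The compiler-style approach listed above can be sketched on a tiny state graph. This is a simplified illustration, not the thesis implementation: liveness is computed with the standard backward dataflow equations, two values interfere when both are live across the same state boundary, and registers are assigned with a greedy coloring heuristic chosen for this example. State and value names are hypothetical.

```python
def liveness(states, succ, use, defs):
    """Iterate live_in = use | (live_out - defs), live_out = union of successors' live_in."""
    live_in = {s: set() for s in states}
    live_out = {s: set() for s in states}
    changed = True
    while changed:
        changed = False
        for s in reversed(states):
            out = set().union(*(live_in[t] for t in succ.get(s, [])))
            inn = use[s] | (out - defs[s])
            if out != live_out[s] or inn != live_in[s]:
                live_out[s], live_in[s] = out, inn
                changed = True
    return live_in, live_out

def interference(live_out):
    """Values live across the same state boundary cannot share a register."""
    graph = {}
    for values in live_out.values():
        for v in values:
            graph.setdefault(v, set()).update(values - {v})
    return graph

def color(graph):
    """Greedy coloring, highest-degree values first (one possible heuristic)."""
    reg = {}
    for v in sorted(graph, key=lambda n: (-len(graph[n]), n)):
        taken = {reg[n] for n in graph[v] if n in reg}
        reg[v] = next(r for r in range(len(graph) + 1) if r not in taken)
    return reg

# Hypothetical three-state STG: S0 -> S1 -> S2.
states = ["S0", "S1", "S2"]
succ = {"S0": ["S1"], "S1": ["S2"]}
defs = {"S0": {"a", "b"}, "S1": {"c"}, "S2": {"d"}}
use = {"S0": set(), "S1": {"a", "b"}, "S2": {"b", "c"}}

live_in, live_out = liveness(states, succ, use, defs)
registers = color(interference(live_out))
print(registers)
```

In this example `a` is dead once `c` is produced, so the two values can share a register and two registers suffice for three stored values.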
4-5. Interconnection allocation and result estimations
The final steps…
Interconnection allocation: allocate elements to interconnect the hardware components
- Mux-based architecture: port swapping for commutative operations
- Glue logic: a logic netlist that decodes commands and selects inputs
- Truth tables based on signals from the controller
The RTL structural description is now available and it considers all elements.
Objective values could be retrieved from logic synthesis, but this is too slow!
Estimation model: perform fast estimations of objective values.
- Area is difficult to estimate
- Updated and used an existing area model*
*: C. Brandolese, W. Fornaciari, and F. Salice, “An Area Estimation Methodology for FPGA Based Designs at SystemC-Level”, DAC '04: Proceedings of the 41st annual conference on Design automation, pp. 129–132, 2004.
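The effect of port swapping on a mux-based architecture can be sketched as follows. This is a greedy illustration written for this example, not the algorithm used in the flow: each functional unit input port collects the set of sources that feed it, and for commutative operations the operand order that adds fewer new sources is chosen. Unit and register names are hypothetical.

```python
def allocate_ports(operations):
    """Assign operand sources to the two input ports of each functional unit.

    operations: list of (unit, source_a, source_b, commutative).
    Returns {(unit, port): set of sources}; a port with n > 1 sources needs an n-input mux.
    """
    ports = {}
    for unit, a, b, commutative in operations:
        left = ports.setdefault((unit, 0), set())
        right = ports.setdefault((unit, 1), set())
        straight = (a not in left) + (b not in right)  # new mux inputs without swapping
        swapped = (b not in left) + (a not in right)   # new mux inputs with swapping
        if commutative and swapped < straight:
            a, b = b, a
        left.add(a)
        right.add(b)
    return ports

def mux_inputs(ports):
    """Total number of multiplexer inputs; single-source ports need no mux."""
    return sum(len(sources) for sources in ports.values() if len(sources) > 1)

# Three additions bound to the same adder, reading registers r1..r3.
operands = [("add0", "r1", "r2"), ("add0", "r2", "r1"), ("add0", "r1", "r3")]
with_swap = allocate_ports([(u, a, b, True) for u, a, b in operands])
without_swap = allocate_ports([(u, a, b, False) for u, a, b in operands])
print(mux_inputs(with_swap), mux_inputs(without_swap))
```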
Design Space Exploration by Genetic Algorithm
Problem dependent elements
Chromosome encoding
- Each operation in the specification has a gene to represent a feasible partial binding
- Genes are added to represent the algorithms used to perform the high-level synthesis steps: scheduling, register allocation and interconnection optimization
Fitness evaluation
- Information from the chromosome about partial binding and algorithms is used to perform a synthesis flow
- Objective values are estimated using the proposed model
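The encoding described above can be sketched as a flat list of integers. This is a minimal illustration under assumptions: each binding gene is an index of a compatible functional unit instance, each algorithm gene is an index into a list of alternatives, and the algorithm names are placeholders rather than the ones available in the real flow.

```python
import random

def random_chromosome(ops, units, algorithms, rng):
    """One binding gene per operation, followed by one gene per synthesis step."""
    binding_genes = [rng.randrange(units[kind]) for kind in ops.values()]
    algorithm_genes = [rng.randrange(len(choices)) for choices in algorithms.values()]
    return binding_genes + algorithm_genes

def decode(chromosome, ops, algorithms):
    """Split a chromosome into a partial binding and the selected algorithms."""
    n = len(ops)
    binding = dict(zip(ops, chromosome[:n]))
    chosen = {
        step: choices[gene]
        for (step, choices), gene in zip(algorithms.items(), chromosome[n:])
    }
    return binding, chosen

# Hypothetical specification and algorithm alternatives.
ops = {"a": "add", "b": "add", "c": "mul"}
units = {"add": 2, "mul": 1}
algorithms = {
    "scheduling": ["list_based", "ilp"],
    "register_allocation": ["heuristic_a", "heuristic_b"],
    "interconnection": ["plain", "port_swapping"],
}

individual = random_chromosome(ops, units, algorithms, random.Random(0))
binding, chosen = decode([1, 0, 0, 0, 1, 1], ops, algorithms)
print(binding, chosen)
```

The decoded binding and algorithm choices are what the fitness evaluation would feed into the synthesis flow before estimating area and latency.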
Design Space Exploration by Genetic Algorithm
Problem independent elements
Generic operators
- Common operators (crossover and mutation) are used without modifications: no infeasible chromosomes can be created
- If the gene changed by the operators is related to:
  - an operation: a new binding constraint is set for that operation
  - an algorithm: a different algorithm is used to solve the related synthesis step
- Initial population created randomly, or starting from some interesting points to explore around them
Solution ranking
- Solutions are ranked into different levels according to their fitness values
- Ranking is accelerated using the fast-non-dominated-sort algorithm available in NSGA-II
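Both points can be sketched in a few lines. The operators below are generic one-point crossover and single-gene mutation over integer genes with per-position ranges (an assumed encoding), which is why offspring stay feasible; the ranking follows the fast non-dominated sort procedure described by Deb et al., written here from scratch rather than taken from Open BEAGLE. All data are hypothetical.

```python
import random

def one_point_crossover(p1, p2, rng):
    """Genes keep their positions, so each one stays within its own range."""
    cut = rng.randrange(1, len(p1))
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def mutate(chromosome, gene_ranges, rng):
    """Redraw one gene inside its own feasible range."""
    child = list(chromosome)
    i = rng.randrange(len(child))
    child[i] = rng.randrange(gene_ranges[i])
    return child

def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def fast_non_dominated_sort(fitness):
    """Return a list of fronts (lists of indices); front 0 is the non-dominated set."""
    n = len(fitness)
    dominated = [[] for _ in range(n)]  # indices that solution i dominates
    count = [0] * n                     # how many solutions dominate solution i
    for i in range(n):
        for j in range(n):
            if dominates(fitness[i], fitness[j]):
                dominated[i].append(j)
            elif dominates(fitness[j], fitness[i]):
                count[i] += 1
    fronts = [[i for i in range(n) if count[i] == 0]]
    while fronts[-1]:
        following = []
        for i in fronts[-1]:
            for j in dominated[i]:
                count[j] -= 1
                if count[j] == 0:
                    following.append(j)
        fronts.append(following)
    return fronts[:-1]

rng = random.Random(1)
gene_ranges = [2, 2, 1, 2, 2, 2]
child_a, child_b = one_point_crossover([0, 1, 0, 1, 0, 1], [1, 0, 0, 0, 1, 0], rng)
mutant = mutate(child_a, gene_ranges, rng)

# Hypothetical (area, latency) values for five individuals.
fronts = fast_non_dominated_sort([(1, 10), (6, 6), (10, 1), (8, 8), (9, 9)])
print(fronts)
```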