Published by Dina Harmon. Modified over 9 years ago.
1
HIGH-LEVEL SYNTHESIS WITH AREA CONSTRAINTS FOR FPGA DESIGNS: AN EVOLUTIONARY APPROACH
Thesis by: Christian Pilato (student ID 674373)
Supervisor: Prof. Fabrizio Ferrandi
Co-supervisor: Ing. Antonino Tumeo
Politecnico di Milano
2
Summary
Outline:
- High-Level Synthesis: problem definition, open problems
- Genetic algorithm: overview
- Proposed methodology: High-Level Synthesis flow, an illustrative example, design space exploration with a genetic algorithm
- Experimental results
- Some further extensions…
- Conclusion and future work
3
High-Level Synthesis – Problem definition
Goal: minimize some figures of merit (area, latency, etc.), also called objectives.
Inputs:
- a behavioral description (in C language)
- a set of constraints
- a library of different types of resources
Output: a register-transfer level (RTL) design in a hardware description language (e.g. SystemC, VHDL or Verilog).
"High-Level Synthesis means going from an algorithmic level specification of a behaviour of a digital system to a register level structure that implements that behavior." McFarland et al., Proc. IEEE, February 1990.
4
High-Level Synthesis – Problem description
High-Level Synthesis tasks. There are three main tasks:
1. operation scheduling: when operations start their execution;
2. resource allocation and binding: where operations are executed (hardware components), where values are stored and how elements are interconnected;
3. controller synthesis: how operations are issued.
[Figure: High-Level Synthesis tool. Inputs: behavioral specification, design constraints, resource library. Steps: scheduling, allocation, binding, controller synthesis. Output: datapath & controller, evaluated against the objectives.]
5
High-Level Synthesis – Problem description
What are the problems?
- All the sub-problems belong to the class of NP-complete problems: no efficient (polynomial-time) algorithm is known to solve them optimally. This motivates exploring other approaches that solve the problem efficiently, although sub-optimally.
- Traditional approaches are oriented to optimizing latency and area occupation, paying particular attention only to the area of functional units and registers.
- Recent studies have demonstrated that interconnection costs also have to be taken into account: D. Chen and J. Cong, "Register binding and port assignment for multiplexer optimization", ASP-DAC '04: Proceedings of the 2004 Asia and South Pacific Design Automation Conference, pp. 68–73, 2004.
6
High-Level Synthesis – Problem description
What are the problems? (cont'd)
- All the sub-tasks are NP-complete problems, so no efficient exact algorithms are known for them.
- All the tasks are closely interdependent and should be considered together.
- Most of the information needed to evaluate a solution is available only at the end of the synthesis.
Genetic algorithms:
- Use non-deterministic approaches that exploit feedback information.
- Multi-objective optimization suggests not reducing the fitness function to a weighted average.
- Non-dominated Sorting Genetic Algorithm (NSGA-II): K. Deb, S. Agrawal, A. Pratap, and T. Meyarivan, "A Fast and Elitist Multi-Objective Genetic Algorithm: NSGA-II", Proceedings of the Parallel Problem Solving from Nature VI Conference, pp. 849–858, 2000.
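To see why a weighted average can be misleading, consider a hypothetical set of designs scored on (area, latency), both to be minimized. The sketch below uses illustrative names and numbers only: it keeps the Pareto-optimal designs, then checks which design a weighted sum w·area + (1 − w)·latency selects as w sweeps from 0 to 1. The balanced design (6, 6) is non-dominated, yet no weight ever selects it, because it lies above the segment joining (1, 10) and (10, 1).

```python
from typing import List, Set, Tuple

Point = Tuple[float, float]  # hypothetical (area, latency), both minimized

def dominates(a: Point, b: Point) -> bool:
    """a is no worse than b in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only the designs that no other design dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

def weighted_sum_winners(points: List[Point], steps: int = 100) -> Set[Point]:
    """Designs that minimize w*area + (1-w)*latency for some sampled weight w."""
    winners: Set[Point] = set()
    for i in range(steps + 1):
        w = i / steps
        winners.add(min(points, key=lambda p: w * p[0] + (1 - w) * p[1]))
    return winners

designs = [(1.0, 10.0), (6.0, 6.0), (10.0, 1.0), (8.0, 9.0)]
print(pareto_front(designs))                  # [(1.0, 10.0), (6.0, 6.0), (10.0, 1.0)]
print(sorted(weighted_sum_winners(designs)))  # [(1.0, 10.0), (10.0, 1.0)]
```

Ranking by Pareto dominance, as NSGA-II does, keeps such trade-off designs in the population instead of discarding them.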
7
Genetic algorithm overview
Genetic algorithm
Chromosome encoding and fitness evaluation are usually the only elements that depend on the problem.
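The separation above can be sketched as a generic loop in which only `random_chromosome` (the encoding) and `fitness` know about the problem; parent selection, uniform crossover, uniform mutation and elitist survival are problem-independent. This is a single-objective sketch for brevity (the methodology ranks solutions with NSGA-II instead), parents are picked uniformly at random, and every name and parameter value is a hypothetical choice.

```python
import random
from typing import Callable, List

Chromosome = List[int]

def evolve(
    random_chromosome: Callable[[random.Random], Chromosome],  # problem dependent: encoding
    fitness: Callable[[Chromosome], float],                     # problem dependent: lower is better
    gene_values: int,                                           # each gene takes values 0..gene_values-1
    pop_size: int = 20,                                         # must be at least 2
    generations: int = 30,
    mutation_rate: float = 0.05,
    seed: int = 0,
) -> List[Chromosome]:
    """Generic GA loop: everything below is independent of the problem."""
    rng = random.Random(seed)
    population = [random_chromosome(rng) for _ in range(pop_size)]
    for _ in range(generations):
        offspring = []
        for _ in range(pop_size):
            p1, p2 = rng.sample(population, 2)  # random parent selection
            # Uniform crossover: each gene is copied from either parent with probability 0.5
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            # Uniform mutation: each gene is redrawn with probability mutation_rate
            child = [rng.randrange(gene_values) if rng.random() < mutation_rate else g
                     for g in child]
            offspring.append(child)
        # Elitist survival: keep the best pop_size individuals among parents and children
        population = sorted(population + offspring, key=fitness)[:pop_size]
    return population
```

For example, `evolve(lambda r: [r.randrange(2) for _ in range(16)], fitness=sum, gene_values=2)` minimizes the number of ones in a 16-gene bit string; because survival is elitist, the best fitness never gets worse from one generation to the next.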
8
The proposed High-Level Synthesis flow
The proposed flow is organized as follows:
- From C to intermediate representation: GIMPLE (the GCC intermediate representation) is used to produce a graph representation.
- High-Level Synthesis flow:
  1. Partial binding and scheduling
  2. Finite State Machine creation
  3. Register allocation
  4. Interconnection allocation
  5. Performance and area estimations
- From data structures to a structural description in the form of a graph representation.
- From the intermediate representation to a Hardware Description Language (e.g. Verilog), ready for low-level synthesis.
9
High-Level Synthesis and Design Space Exploration
The proposed methodology
10
1. Partial binding and scheduling
Partial binding: forces an operation to be executed on a selected functional unit instance, e.g. β(+2) = [ADDER, 1].
- It is a technique introduced to partially control the final area occupation.
- It also constrains scheduling, register allocation and interconnection allocation.
Scheduling: each operation is assigned the control step in which it starts its execution.
- Different algorithms can support the partial binding feature in scheduling:
  - an Integer Linear Programming formulation;
  - a list-based algorithm.
- Different algorithms lead to different solutions.
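A minimal sketch of list scheduling that honours a partial binding, assuming unit-latency operations, a data-flow graph given as predecessor lists, and a naive priority (bound operations first, then by name); a practical list scheduler would use a better priority such as the longest path to the sink. An operation in `binding` may only run on its forced instance, as in β(+2) = [ADDER, 1]; the others take any free instance of their type. All names are hypothetical.

```python
from typing import Dict, List, Tuple

FU = Tuple[str, int]  # (functional unit type, instance index)

def list_schedule(
    ops: Dict[str, str],               # operation -> FU type, e.g. "+2" -> "ADDER"
    preds: Dict[str, List[str]],       # data dependencies: operation -> predecessors
    instances: Dict[str, int],         # FU type -> number of allocated instances (>= 1)
    binding: Dict[str, FU],            # partial binding, e.g. {"+2": ("ADDER", 1)}
) -> Dict[str, Tuple[int, FU]]:
    """Unit-latency list scheduling with partial binding.
    Returns operation -> (control step, FU instance)."""
    result: Dict[str, Tuple[int, FU]] = {}
    step = 0
    while len(result) < len(ops):
        busy = set()  # FU instances already issued an operation in this step
        ready = [o for o in ops if o not in result
                 and all(p in result and result[p][0] < step for p in preds.get(o, []))]
        # Bound operations first, so they can claim their forced instance
        ready.sort(key=lambda o: (o not in binding, o))
        for o in ready:
            if o in binding:
                fu = binding[o]
                if fu in busy:
                    continue  # forced instance already taken: postpone
            else:
                kind = ops[o]
                fu = next(((kind, i) for i in range(instances[kind])
                           if (kind, i) not in busy), None)
                if fu is None:
                    continue  # no free instance of this type in this step
            busy.add(fu)
            result[o] = (step, fu)
        step += 1
    return result

ops = {"+1": "ADDER", "+2": "ADDER", "*1": "MULT", "+3": "ADDER"}
preds = {"+3": ["+1", "*1"]}
instances = {"ADDER": 2, "MULT": 1}
print(list_schedule(ops, preds, instances, {"+2": ("ADDER", 1)}))
print(list_schedule(ops, preds, instances, {"+1": ("ADDER", 1), "+2": ("ADDER", 1)}))
```

With only +2 bound, both additions fit in step 0. Forcing both +1 and +2 onto ADDER 1 serializes them even though ADDER 0 is free, which shows how a partial binding steers the schedule as well as the area.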
11
2. Finite State Machine creation
The scheduled specification tells which operations have to be executed in the same control step. This information is useful for:
- register allocation;
- controller synthesis.
A Finite State Machine model is a good representation for this situation. The Moore model has been implemented because of its natural correspondence with the problem. A State Transition Graph is created from the scheduled specification.
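A minimal sketch of building a Moore-style State Transition Graph from a schedule, assuming straight-line code (a single basic block) so that each control step becomes exactly one state. The function name and the choice of returning to the initial state after the last step are illustrative assumptions. In a Moore machine the outputs depend only on the state, so each state simply carries the set of operations it starts.

```python
from typing import Dict, List, Set, Tuple

def build_moore_fsm(schedule: Dict[str, int]) -> Tuple[List[Set[str]], List[Tuple[int, int]]]:
    """One state per control step (straight-line code only).
    outputs[s] is the Moore output of state s: the operations started in that step.
    edges is the State Transition Graph: a chain that returns to the initial state."""
    n_states = max(schedule.values()) + 1
    outputs: List[Set[str]] = [set() for _ in range(n_states)]
    for op, step in schedule.items():
        outputs[step].add(op)
    edges = [(s, (s + 1) % n_states) for s in range(n_states)]
    return outputs, edges

outputs, edges = build_moore_fsm({"+1": 0, "*1": 0, "+2": 1, "+3": 1})
print(edges)  # [(0, 1), (1, 0)]
```

Operations that share a state run concurrently; this is exactly the information a Control Flow Graph lacks and that register allocation on the State Transition Graph can exploit.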
12
3. Register allocation
Storage elements have to be allocated to store values across cycle-step boundaries. A compiler approach has been implemented:
- dataflow equations to compute liveness information;
- conflict graph creation based on liveness information;
- a vertex coloring heuristic to minimize the number of registers.
Compilers use the Control Flow Graph, which does not expose parallelism (concurrent execution of operations). This approach instead uses the State Transition Graph as the basis for dataflow analysis: it represents the control flow and also contains information about concurrent operations.
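The three steps above can be sketched on a State Transition Graph under simplifying assumptions: each state lists the values it uses and defines, a value needs a register only if it is live at the exit of some state (that is, across a cycle-step boundary), and the coloring heuristic is a plain greedy one that visits the highest-degree values first. The function names and the example graph are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set

def liveness(succ: Dict[int, List[int]], use: Dict[int, Set[str]],
             defs: Dict[int, Set[str]]) -> Dict[int, Set[str]]:
    """Backward dataflow equations iterated to a fixpoint; returns live_out per state:
    live_out[s] = union of live_in[t] for t in succ[s]
    live_in[s]  = use[s] | (live_out[s] - defs[s])"""
    live_in: Dict[int, Set[str]] = {s: set() for s in succ}
    live_out: Dict[int, Set[str]] = {s: set() for s in succ}
    changed = True
    while changed:
        changed = False
        for s in succ:
            out = set().union(*(live_in[t] for t in succ[s]))
            inn = use[s] | (out - defs[s])
            if out != live_out[s] or inn != live_in[s]:
                live_out[s], live_in[s], changed = out, inn, True
    return live_out

def allocate_registers(live_out: Dict[int, Set[str]]) -> Dict[str, int]:
    """Conflict graph (values live across the same boundary) + greedy vertex coloring."""
    conflicts: Dict[str, Set[str]] = {v: set() for vals in live_out.values() for v in vals}
    for vals in live_out.values():
        for a, b in combinations(sorted(vals), 2):
            conflicts[a].add(b)
            conflicts[b].add(a)
    reg: Dict[str, int] = {}
    for v in sorted(conflicts, key=lambda v: (-len(conflicts[v]), v)):  # highest degree first
        taken = {reg[n] for n in conflicts[v] if n in reg}
        reg[v] = next(r for r in range(len(conflicts) + 1) if r not in taken)
    return reg

# Hypothetical 4-state chain: a, b defined in S0; c = f(a) in S1; d = g(b, c) in S2; d used in S3
succ = {0: [1], 1: [2], 2: [3], 3: []}
use = {0: set(), 1: {"a"}, 2: {"b", "c"}, 3: {"d"}}
defs = {0: {"a", "b"}, 1: {"c"}, 2: {"d"}, 3: set()}
live_out = liveness(succ, use, defs)
print(allocate_registers(live_out))  # a and c share one register, b and d the other
```

Four values fit into two registers because a and c, and likewise b and d, are never live across the same boundary.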
13
4-5. Interconnection allocation and result estimations
The final steps…
Interconnection optimization:
- port swapping for commutative operations;
- Boolean logic for decoding and selection: truth tables are created from the enable signals coming from the controller to select the right inputs of the multiplexers.
The final structural description is now available, and real values could be retrieved from low-level synthesis: too slow!
Estimation model:
- interconnections and glue logic are difficult to estimate, due to the optimizations performed by the synthesis tool;
- an existing model is used to estimate the area occupied by solutions: C. Brandolese, W. Fornaciari, and F. Salice, "An Area Estimation Methodology for FPGA Based Designs at SystemC-Level", ASP-DAC '04: Proceedings of the 2004 Asia and South Pacific Design Automation Conference, pp. 129–132, 2004;
- linear regression is used to update its coefficients, to account for the different representation, synthesis tool and target devices.
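Port swapping can be illustrated with one possible greedy heuristic, given here as an assumption rather than the exact procedure used in the flow: the commutative operations bound to one functional unit are visited in order, and an operation's operands are swapped whenever that adds fewer new sources to the two input multiplexers. Operands are named by the register that drives them; all names are hypothetical.

```python
from typing import List, Set, Tuple

Operands = Tuple[str, str]  # (source on left port, source on right port)

def mux_inputs(ops: List[Operands]) -> Tuple[Set[str], Set[str]]:
    """Distinct sources driving the left and right ports of one functional unit."""
    return {l for l, _ in ops}, {r for _, r in ops}

def port_swap(ops: List[Operands]) -> List[Operands]:
    """Greedy port assignment for commutative operations sharing a functional unit."""
    left: Set[str] = set()
    right: Set[str] = set()
    result: List[Operands] = []
    for a, b in ops:
        keep = (a not in left) + (b not in right)  # new mux inputs if kept as is
        swap = (b not in left) + (a not in right)  # new mux inputs if swapped
        if swap < keep:
            a, b = b, a
        left.add(a)
        right.add(b)
        result.append((a, b))
    return result

adds = [("r1", "r2"), ("r2", "r3"), ("r3", "r1")]  # three additions on one adder
print(port_swap(adds))  # [('r1', 'r2'), ('r3', 'r2'), ('r3', 'r1')]
```

Without swapping, both multiplexers need three inputs (six in total); after swapping they need two each. Four is the minimum here, since the three operand pairs form a triangle and so at least one register must feed both ports.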
14
Example
An illustrative example
15
Design Space Exploration by Genetic Algorithm
Problem-dependent elements:
- Each operation in the behavioral specification has an associated gene that represents a feasible partial binding.
- Additional genes represent the algorithms used to perform the high-level synthesis sub-tasks. The implementation provides genes for scheduling, register allocation and interconnection allocation.
- Fitness values are obtained by retrieving the information from the chromosome, using it as a partial binding and running a synthesis flow. The results are then estimated with the models, and the values are returned to the algorithm.
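The encoding above can be sketched as a flat list of integers: the first genes hold, for each operation, the index of the functional unit instance it is bound to, and the remaining genes select one algorithm per synthesis sub-task. The layout and all names are hypothetical; "ilp" and "list_based" stand for the two scheduling algorithms mentioned earlier, while the register-allocation and interconnection alternatives are illustrative placeholders.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical alternatives for each synthesis sub-task (illustrative names only)
ALGORITHMS: Dict[str, List[str]] = {
    "scheduling": ["ilp", "list_based"],
    "register_allocation": ["vertex_coloring", "clique_covering"],
    "interconnection": ["no_port_swapping", "port_swapping"],
}

@dataclass
class Decoded:
    binding: Dict[str, Tuple[str, int]]  # partial binding: operation -> (FU type, instance)
    algorithms: Dict[str, str]           # sub-task -> selected algorithm

def decode(chromosome: List[int], ops: List[Tuple[str, str]]) -> Decoded:
    """First len(ops) genes: FU instance per operation; remaining genes: algorithm choices."""
    binding = {name: (fu_type, gene) for (name, fu_type), gene in zip(ops, chromosome)}
    algo_genes = chromosome[len(ops):]
    algorithms = {task: choices[g] for (task, choices), g in zip(ALGORITHMS.items(), algo_genes)}
    return Decoded(binding, algorithms)

ops = [("+1", "ADDER"), ("+2", "ADDER"), ("*1", "MULT")]
print(decode([0, 1, 0, 1, 0, 1], ops))
```

The decoded partial binding and algorithm choices would then drive one synthesis run, whose estimated area and latency become the fitness values. Each gene must stay within its valid range (for instance, fewer than the number of allocated instances of that FU type), so a mutation operator would draw new values per gene position.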
16
Design Space Exploration by Genetic Algorithm
Problem-independent elements:
Common generic operators can be used without modifications:
- uniform crossover;
- uniform mutation.
If the gene changed by an operator is related to:
- an operation: the result is a new binding for that operation;
- an algorithm: the algorithm used to solve the related synthesis step is changed.
The initial population is created either by random generation or by starting from a first admissible binding. The latter allows the algorithm to start from some interesting points (e.g. minimum number of functional units or minimum latency) and then explore around them.
Solutions are sorted into different levels according to their fitness values. The ranking has been accelerated using the fast-non-dominated-sort algorithm from NSGA-II.
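The ranking step can be sketched with the fast non-dominated sort described by Deb et al. for NSGA-II, assuming every objective (here a hypothetical area and latency) is minimized. For each solution it records which solutions it dominates and how many dominate it, then peels off fronts level by level in O(M·N²) time for M objectives and N solutions. Function names and data are illustrative.

```python
from typing import List, Sequence

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b: no worse in every objective, strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def fast_non_dominated_sort(points: List[Sequence[float]]) -> List[List[int]]:
    """Split solutions into fronts (lists of indices); front 0 is the Pareto-optimal set."""
    n = len(points)
    dominated: List[List[int]] = [[] for _ in range(n)]  # dominated[p]: solutions p dominates
    count = [0] * n                                      # count[p]: solutions dominating p
    fronts: List[List[int]] = [[]]
    for p in range(n):
        for q in range(n):
            if dominates(points[p], points[q]):
                dominated[p].append(q)
            elif dominates(points[q], points[p]):
                count[p] += 1
        if count[p] == 0:
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        nxt: List[int] = []
        for p in fronts[i]:
            for q in dominated[p]:
                count[q] -= 1
                if count[q] == 0:  # every dominator of q is in an earlier front
                    nxt.append(q)
        i += 1
        fronts.append(nxt)
    return fronts[:-1]  # the loop always ends with an empty front

designs = [(100, 10), (80, 12), (120, 9), (110, 11), (90, 13)]  # (area, latency)
print(fast_non_dominated_sort(designs))  # [[0, 1, 2], [3, 4]]
```

Design 3 is dominated by design 0 and design 4 by design 1, so both fall into the second level.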
17
Experimental results
Development framework:
- The methodology has been implemented and integrated in a C++ framework named PandA (an open-source framework covering different aspects of the hardware-software design of embedded systems).
- The Open BEAGLE framework has been used for evolutionary computation.
Functional validation:
- A simple question: does it really work? Does the final RTL design really implement the behavioral specification?
- Verilog and C can be integrated, so a simple regression test can be used for functional validation.
Area model validation:
- Comparison between the estimations and the values coming from low-level synthesis: average error equal to 3.7% and maximum error equal to 14%.
- These values can be effectively used as fitness values.
18
Experimental results
Design Space Exploration validation:
- A population of 1,000 individuals, evolving up to a maximum of 200 generations, proved to be the best trade-off between overall execution time and solution quality.
- The approach deals with area constraints better than existing tools: it can explore both the fastest solution (with an unconstrained number of resources) and the minimal-area solution, while covering a good number of trade-off solutions in between.
Paper accepted for publication:
Title: "An Evolutionary Approach to Area-Time Optimization of FPGA designs"
Conference: International Symposium on Systems, Architectures, MOdeling and Simulation (SAMOS), Samos, Greece, July 2007
19
Conclusion
Some features just provided…
Paper submitted to conference:
Title: "Fitness Inheritance In Evolutionary and Multi-Objective High Level Synthesis"
Conference: IEEE Congress on Evolutionary Computation (CEC) 2007, Singapore, September 2007.
Extensions provided after the thesis was written:
- Weighted clique covering for register allocation:
  - compatibility edges are weighted, and a higher weight is assigned when the two values involve the same functional units (this could reduce interconnections);
  - a branch and bound approach solves clique covering on the weighted graph; results show a further reduction of the overall area of up to 10%.
- Fitness inheritance, replacing a fraction of the expensive real evaluations with an estimation based on neighbors in a hypothetical design space:
  - a model for inheritance has been created;
  - it reduces the overall execution time by over 25%, without any substantial difference in the final Pareto-optimal solution set.
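The inheritance model itself is not detailed here, so the following is only a sketch of one common way to inherit fitness, swapped in for illustration: the objectives of an unevaluated child are estimated as an inverse-distance weighted mean over its k nearest already-evaluated individuals, with Hamming distance between chromosomes as the notion of "neighbor". The distance measure, the value of k, the weighting and all names are assumptions.

```python
from typing import List, Sequence, Tuple

Objectives = Tuple[float, float]  # hypothetical (area, latency)

def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of gene positions in which two chromosomes differ."""
    return sum(x != y for x, y in zip(a, b))

def inherited_fitness(child: Sequence[int],
                      evaluated: List[Tuple[Sequence[int], Objectives]],
                      k: int = 3) -> Objectives:
    """Estimate the child's objectives from its k nearest evaluated neighbors,
    weighting each neighbor by 1 / distance."""
    nearest = sorted(evaluated, key=lambda e: hamming(child, e[0]))[:k]
    for chrom, objectives in nearest:
        if hamming(child, chrom) == 0:
            return objectives  # identical chromosome: reuse the real evaluation
    weights = [1.0 / hamming(child, chrom) for chrom, _ in nearest]
    total = sum(weights)
    area = sum(w * obj[0] for w, (_, obj) in zip(weights, nearest)) / total
    latency = sum(w * obj[1] for w, (_, obj) in zip(weights, nearest)) / total
    return area, latency

evaluated = [([0, 0, 0, 0], (100.0, 10.0)),
             ([1, 1, 0, 0], (80.0, 12.0)),
             ([1, 1, 1, 1], (60.0, 20.0))]
print(inherited_fitness([1, 0, 0, 0], evaluated))  # roughly (85.71, 12.29)
```

In such a scheme only a fraction of each generation would go through synthesis and area estimation; the remaining individuals would receive inherited values, which is where the execution-time saving comes from.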
20
Conclusion and future work
The main contributions of this thesis are:
- a high-level synthesis flow from a C specification to an HDL description that can be simulated and synthesized;
- an area model for fast estimation of synthesis results;
- Design Space Exploration using a genetic algorithm that integrates the synthesis flow and the area model to lead the evolution towards good design solutions, taking into account all the elements that contribute to area and time in the final designs.
Future work:
- optimize the synthesis flow and provide efficient support for particular constructs (e.g. loops and memory accesses);
- reduce the overall execution time of the proposed methodology (fitness inheritance, or initializing the population from a given Pareto-solution set);
- refine the area model and allow it to be specialized for different target devices on the fly (e.g. parameters stored in an external file and loaded at start-up).