Hardware Implementation of a Memetic Algorithm for VLSI Circuit Layout Stephen Coe MSc Engineering Candidate Advisors: Dr. Shawki Areibi Dr. Medhat Moussa
Topic Overview
Introduction
Background
–Circuit Partitioning (CP)
–Handel-C vs. VHDL
–Memetic Algorithm
Research Challenges
Hardware Approach
Current Status and Future Work
Introduction
Today's technology allows billions of transistors to be implemented in a single circuit
As these transistors become smaller, interconnect delay becomes the limiting factor in execution speed
These factors place increasing importance on CAD tools for minimizing interconnect length
As FPGAs become larger and faster, new methods for improving algorithm performance become available
[Figure: delay (ns) versus minimum feature size, from 2.0 µ down to 0.35 µ, comparing typical gate delay with interconnect delay]
Circuit Partitioning
Method of splitting complex designs into smaller subsystems
Attempts to minimize the connections between subsystems
The objective is to maximize the number of uncut nets
–The longer the interconnects between modules, the longer the delay within the circuit
[Figure: example circuit with modules M0 to M5 connected by Net 1 to Net 5]
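The objective can be illustrated with a minimal sketch that counts cut and uncut nets for a two-block partition. The net connectivity and block assignment below are hypothetical (the connections in the slide's figure are not recoverable), and the function name `count_cut_nets` is an assumption made for illustration.

```python
from typing import Dict, List


def count_cut_nets(nets: Dict[str, List[str]], block_of: Dict[str, int]) -> int:
    """Return how many nets have modules in more than one block."""
    cut = 0
    for modules in nets.values():
        blocks = {block_of[m] for m in modules}
        if len(blocks) > 1:  # the net spans both blocks, so it is cut
            cut += 1
    return cut


# Hypothetical netlist: which modules each net connects (not taken from the slide).
nets = {
    "Net 1": ["M0", "M1"],
    "Net 2": ["M1", "M2", "M3"],
    "Net 3": ["M3", "M4"],
    "Net 4": ["M4", "M5"],
    "Net 5": ["M0", "M5"],
}

# Hypothetical partition: M0..M2 in block 0, M3..M5 in block 1.
block_of = {"M0": 0, "M1": 0, "M2": 0, "M3": 1, "M4": 1, "M5": 1}

cut = count_cut_nets(nets, block_of)
uncut = len(nets) - cut
print(f"cut nets = {cut}, uncut nets = {uncut}")
```

Maximizing the uncut count is the same as minimizing the cut count, since the two always sum to the total number of nets.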
Development Tools
Celoxica DK Design Suite
High-level language (Handel-C) based on ISO/ANSI C for implementing algorithms in hardware
Allows software engineers to design hardware without retraining
Can generate VHDL code or an EDIF file
Supports many Actel, Altera and Xilinx devices
Uses third-party placement and routing programs to generate bit files
[Figure: design flow from Handel-C source files through compile, simulate, EDIF (netlist) or VHDL/Verilog generation, and place & route tools to bitstream generation]
Similarities and Differences of Handel-C & ISO C
Similarities
–#define, #ifdef, etc.
–Casting between different variable types
–Function declarations are the same
–Variables (e.g. int, unsigned) are implemented as registers
–for, while and do loops
Differences
–No float or double in Handel-C
–Variables in Handel-C are of undefined widths
–No recursive function calls
–Inline functions generate totally new hardware
–No malloc or free (hardware cannot allocate memory dynamically)
–Data can be read in for simulation only
–Parallelism exists
Memory in Handel-C
Memory is accessed as an array
–MemoryData[1024] = WriteData;
–Allows multi-dimensional memory access
Type of memory is easily distinguishable
–Block RAM
–External RAM
–Logic RAM
Memory Access Advantages
–Memory data is accessed within 1 clock
–No specific timing required
Memory Access Disadvantage
–Divides operating clock frequency by 4
[Figure: timing diagram showing External Clock, Handel-C Clock, Write Enable and Data]
Parallel Execution in Handel-C
Parallel Execution
–par{ } command
[Figure: parallel branches over Clock 1 to Clock 4, with a Wait state, waiting for the right execution to finish]
Channel Communication
–Allows parallel components to talk to each other
[Figure: channel between parallel components]
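The timing behaviour of a par block can be sketched with a small cycle-count model. This is a Python model, not Handel-C, and it assumes the usual Handel-C timing rule that each assignment takes one clock cycle; the tuple-based program representation and the name `cycles` are hypothetical.

```python
def cycles(node):
    """Clock cycles needed by a toy program tree.

    ("assign",)          -> 1 clock (assumed cost of one assignment)
    ("seq", [children])  -> children run one after another
    ("par", [children])  -> children start together; the block ends
                            when the slowest child ends
    """
    kind = node[0]
    if kind == "assign":
        return 1
    costs = [cycles(child) for child in node[1]]
    if kind == "seq":
        return sum(costs)
    if kind == "par":
        return max(costs, default=0)
    raise ValueError(f"unknown node kind: {kind}")


short_branch = ("seq", [("assign",)] * 2)  # 2 clocks
long_branch = ("seq", [("assign",)] * 4)   # 4 clocks

in_parallel = cycles(("par", [short_branch, long_branch]))
in_sequence = cycles(("seq", [short_branch, long_branch]))
wait = in_parallel - cycles(short_branch)  # clocks the short branch sits idle

print(in_parallel, in_sequence, wait)
```

In this model the shorter branch finishes early and waits until the longer branch completes, which is the waiting behaviour the slide's clock diagram points at.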
Memetic Algorithm
A genetic/evolutionary algorithm that includes a non-genetic local search to improve solutions
Genetic Algorithm
–Population-based heuristic technique based on the biological reproductive system
–Operates on the theory of “survival of the fittest”
–Good at exploring the solution space
Local Search
–Iterative improvement algorithm
–Often gets trapped in sub-optimal solutions
–Good at exploiting the solution space
–Success depends on good starting solutions
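The combination of the two techniques can be sketched in software as follows. This is a minimal illustration, not the presented hardware design: the operator choices (tournament selection, one-point crossover, single-bit mutation, balance repair, swap-based local search), the parameter values and the toy netlist are all assumptions.

```python
import random


def uncut_nets(nets, chrom):
    """Fitness: number of nets whose modules all sit in one block."""
    return sum(1 for net in nets if len({chrom[m] for m in net}) == 1)


def repair(chrom, rng):
    """Move random modules out of the larger block until sizes differ by at most one."""
    chrom = chrom[:]
    n = len(chrom)
    while abs(2 * sum(chrom) - n) > 1:
        big = 1 if 2 * sum(chrom) > n else 0
        idx = rng.choice([i for i, g in enumerate(chrom) if g == big])
        chrom[idx] = 1 - big
    return chrom


def local_search(nets, chrom):
    """Swap module pairs across blocks while a swap increases the fitness."""
    best = uncut_nets(nets, chrom)
    improved = True
    while improved:
        improved = False
        for i in range(len(chrom)):
            for j in range(i + 1, len(chrom)):
                if chrom[i] == chrom[j]:
                    continue
                chrom[i], chrom[j] = chrom[j], chrom[i]
                score = uncut_nets(nets, chrom)
                if score > best:
                    best, improved = score, True
                else:  # undo a swap that did not help
                    chrom[i], chrom[j] = chrom[j], chrom[i]
    return chrom


def memetic(nets, n_modules, pop_size=8, generations=20, seed=0):
    rng = random.Random(seed)
    fitness = lambda c: uncut_nets(nets, c)
    pop = [repair([rng.randint(0, 1) for _ in range(n_modules)], rng)
           for _ in range(pop_size)]
    pop = [local_search(nets, c) for c in pop]
    for _ in range(generations):
        # Selection: two binary tournaments.
        p1 = max(rng.sample(pop, 2), key=fitness)
        p2 = max(rng.sample(pop, 2), key=fitness)
        # Crossover: one-point.
        point = rng.randrange(1, n_modules)
        child = p1[:point] + p2[point:]
        # Mutation: flip one gene with probability 0.2 (assumed rate).
        if rng.random() < 0.2:
            k = rng.randrange(n_modules)
            child[k] = 1 - child[k]
        # Repair the balance, then refine with local search (the memetic step).
        child = local_search(nets, repair(child, rng))
        # Replacement: overwrite the worst individual if the child is no worse.
        worst = min(range(pop_size), key=lambda i: fitness(pop[i]))
        if fitness(child) >= fitness(pop[worst]):
            pop[worst] = child
    return max(pop, key=fitness)


# Toy netlist (hypothetical): two triangles of modules joined by one net.
nets = [[0, 1], [1, 2], [0, 2], [3, 4], [4, 5], [3, 5], [2, 3]]
best = memetic(nets, n_modules=6)
print(best, uncut_nets(nets, best))
```

The genetic operators supply varied starting points and the local search refines each of them, which reflects the exploration and exploitation roles listed above.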
[Figure: search landscape with the labels Genetic Algorithm, Local Search and "Not Global Minimum"]
Research Challenges
Memetic Algorithms
–Increase computational performance of the algorithm (CPU time)
–Exploit the inherently parallel nature of genetic algorithms
Hardware Development Languages
–Determine the impact of high-level versus low-level languages
Approach
Explore the most efficient design for implementing memetic algorithms on a single FPGA chip
Achieve increased performance through pipelining and parallelization
–Divide the tasks into separate but concurrent components
[Figure: FPGA chip divided among the different tasks of the algorithm]
Genetic Algorithm in Hardware
[Figure: genetic algorithm architectures built from Selection, Crossover, Mutation, Repair, Fitness and Replacement modules; one variant duplicates the Mutation, Repair and Fitness modules to process Offspring 1 and Offspring 2 in parallel, and the pipelined approach overlaps Offspring 1, 2 and 3 across the module stages]
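The benefit of the pipelined approach can be estimated with an idealized timing model. The sketch below assumes every module is one pipeline stage taking one equal time step, which is a simplification and not a measured property of the design; the function names are hypothetical.

```python
STAGES = ["Selection", "Crossover", "Mutation", "Repair", "Fitness", "Replacement"]


def sequential_steps(n_offspring, n_stages=len(STAGES)):
    """Each offspring passes through every stage before the next one starts."""
    return n_offspring * n_stages


def pipelined_steps(n_offspring, n_stages=len(STAGES)):
    """A new offspring enters the pipeline every step once it is filled."""
    if n_offspring == 0:
        return 0
    return n_stages + n_offspring - 1


for n in (1, 3, 100):
    seq, pipe = sequential_steps(n), pipelined_steps(n)
    print(f"{n:>3} offspring: sequential={seq}, pipelined={pipe}, ratio={seq / pipe:.2f}")
```

Under these assumptions the ratio approaches the number of stages as more offspring are produced; unequal stage times would reduce it, since the slowest stage sets the pace.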
Local Search Algorithm
Objective value = number of uncut nets
[Figure: modules M0 to M5 and Net 1 to Net 5 divided between Block 0 and Block 1, with per-module block data (forcing specific nets within one block)]
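One common way to evaluate a move in this kind of local search is to keep, for every net, a count of its modules in each block and to compute the gain of moving a single module from those counts. The sketch below uses that bookkeeping (in the style of Fiduccia-Mattheyses gain computation); it is an assumed technique shown for illustration, not the confirmed method of the presented design, and the netlist and starting partition are hypothetical.

```python
def net_counts(nets, block_of):
    """For each net, the number of its modules in block 0 and in block 1."""
    return [[sum(1 for m in net if block_of[m] == b) for b in (0, 1)]
            for net in nets]


def uncut_from_counts(counts):
    """A net is uncut when one of its two block counts is zero."""
    return sum(1 for c in counts if min(c) == 0)


def move_gain(nets, counts, block_of, module):
    """Change in the number of uncut nets if `module` moves to the other block."""
    src = block_of[module]
    dst = 1 - src
    gain = 0
    for net, cnt in zip(nets, counts):
        if module not in net:
            continue
        if cnt[dst] == 0 and cnt[src] > 1:
            gain -= 1  # net is uncut now and would become cut
        elif cnt[src] == 1 and cnt[dst] > 0:
            gain += 1  # module is alone on its side; moving it uncuts the net
    return gain


def apply_move(nets, counts, block_of, module):
    """Move `module` to the other block and update the per-net counts."""
    src = block_of[module]
    dst = 1 - src
    for net, cnt in zip(nets, counts):
        if module in net:
            cnt[src] -= 1
            cnt[dst] += 1
    block_of[module] = dst


# Hypothetical netlist and starting partition (module index -> block).
nets = [[0, 1], [1, 2], [0, 2], [3, 4], [4, 5], [3, 5], [2, 3]]
block_of = [0, 0, 1, 0, 1, 1]

counts = net_counts(nets, block_of)
before = uncut_from_counts(counts)
gain = move_gain(nets, counts, block_of, module=2)
apply_move(nets, counts, block_of, module=2)
after = uncut_from_counts(counts)
print(before, gain, after)
```

Only the nets touching the moved module are updated, so the objective value does not have to be recomputed from scratch after each move.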
Sequential Issues
[Figure: Select Next Move, Copy Solution and Update Net Info steps, with Loop1, Loop2 and Loop3 repeated in sequence and Block RAM]
Preliminary Results of GA
Software Results (Sun Blade 1000) and Hardware Results (59 MHz / 4)
Table columns: Benchmark, Modules, Nets, Best, Worst, Mean, Std Dev, Time
Benchmarks: prim1.dat, prim2.dat, struct.dat, ind1.dat, pcb1.dat, chip1.dat, chip4.dat, fract.dat
Speedup: 290%, 370%, 342%, 369%, 266%, 247%, 264%, 253%
Quality: -16.8%, -32.8%, -25.5%, -27.3%, 0.8%, -8.8%, 1.1%, 8.4%
Handel-C vs VHDL for Local Search Designs (xcv1000-4bg560)
Designs: Handel-C, VHDL Prototype
Usage Summary rows: Number of GCLKs, Number of 4 input LUTs, Number of Slice Registers, Number of Slices
–1/4 (25%), 3,349/24,576 (13%), 2,193/24,576 (8%), 2,204/12,288 (17%)
–2/4 (50%), 3,333/24,576 (13%), 1,709/24,576 (6%), 2,573/12,288 (20%)
Total equivalent gate count: 42,192 and 42,898
Speed rows (ns): Average Delay on the 10 Worst Nets, Maximum Delay, Average Connection Delay for this design
Current Status and Future Work
Current Status
–Completed VHDL Local Search prototype (verified through simulation)
–Completed Handel-C Local Search design (verified and implemented on RC1000)
–Completed Handel-C Genetic Algorithm design (currently in testing stages)
Future Work
–Complete VHDL Local Search design and implementation
–Analyze the performance difference between the hardware-based memetic algorithm and the software algorithm