Published by Gary Caldwell. Modified over 9 years ago.
1
Hardware Implementation of a Memetic Algorithm for VLSI Circuit Layout
Stephen Coe, MSc Engineering Candidate
Advisors: Dr. Shawki Areibi, Dr. Medhat Moussa
2
Topic Overview
Introduction
Background
–Circuit Partitioning (CP)
–Handel-C vs. VHDL
–Memetic Algorithm
Research Challenges
Hardware Approach
Current Status and Future Work
3
Introduction
Today's technology allows billions of transistors to be integrated into a single circuit
As these transistors become smaller, interconnect delay becomes the limiting factor in computer execution speed
These factors place increasing importance on CAD tools that minimize interconnect length
As FPGAs become larger and faster, new methods for improving algorithm performance become available
[Figure: Delay (ns), 0.1 to 10, versus Minimum Feature Size, 2.0 µ down to 0.35 µ; curves for Typical Gate Delay and Interconnect Delay]
4
Circuit Partitioning
Method of splitting complex designs into smaller subsystems
Attempts to minimize the connections between subsystems
The objective is to maximize the number of uncut nets
–The longer the interconnects between modules, the longer the delay within the circuit
[Figure: example circuit with modules M0 to M5 connected by Net 1 to Net 5]
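The objective on this slide can be made concrete with a minimal sketch. A net counts as uncut when every module it connects sits in the same block. The slide's figure names modules M0 to M5 and Net 1 to Net 5, but its wiring is not recoverable from the transcript, so the connectivity and the block assignment below are hypothetical, as is the function name `count_uncut_nets`.

```python
def count_uncut_nets(nets, block_of):
    """Return how many nets have all of their modules in a single block."""
    uncut = 0
    for modules in nets.values():
        blocks_touched = {block_of[m] for m in modules}
        if len(blocks_touched) == 1:
            uncut += 1
    return uncut


# Hypothetical connectivity: the real wiring of the slide's figure is unknown.
nets = {
    "Net 1": ["M0", "M1"],
    "Net 2": ["M1", "M2", "M3"],
    "Net 3": ["M3", "M4"],
    "Net 4": ["M4", "M5"],
    "Net 5": ["M0", "M2"],
}

# Hypothetical two-way partition: M0..M2 in Block 0, M3..M5 in Block 1.
block_of = {"M0": 0, "M1": 0, "M2": 0, "M3": 1, "M4": 1, "M5": 1}

uncut = count_uncut_nets(nets, block_of)
cut = len(nets) - uncut
print(f"uncut nets: {uncut}, cut nets: {cut}")
```

With this assumed wiring only Net 2 spans both blocks, so maximizing uncut nets and minimizing cut nets are the same goal stated two ways.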
5
Development Tools
Celoxica DK Design Suite
High-level language based on ISO/ANSI C for the implementation of algorithms in hardware
Allows software engineers to design hardware without retraining
Can generate VHDL code or an EDIF file
Support for many Actel, Altera and Xilinx devices
Uses third-party placement and routing programs to generate bit files
[Figure: Design Flow. Stages shown: Handel-C Source Files, Compile, Generate EDIF (netlist), Generate VHDL/Verilog, Simulate & netlist, Place & Route Tools, BitStream Generation]
6
Similarities of Handel-C & ISO C
Similarities
–#define, #ifdef, etc.
–Casting between different variable types
–Function declarations are the same
–Registers stored as variables (e.g. int, unsigned, etc.)
–for, while and do loops
Differences
–No float, double in Handel-C
–Variables in Handel-C are of undefined widths
–No recursive function calls
–Inline functions generate totally new hardware
–No malloc, free (hardware cannot allocate memory dynamically)
–Data can be read in for simulation only
–Parallelism exists
7
Memory of Handel-C
Memory is accessed as an array
–MemoryData[1024] = WriteData;
Type of memory is easily distinguishable
–Block RAM
–External RAM
–Logic RAM
Memory Access Advantages
–Memory data is accessed within 1 clock
–No specific timing required
–Allows multi-dimensional memory access
Memory Access Disadvantage
–Divides operating clock frequency by 4
[Figure: timing diagram with signals External Clock, Handel-C Clock, Write Enable, Data]
8
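Parallel Execution in Handel-C
Parallel Execution
–par{ } command
[Figure: two branches of a par{ } block running over Clock 1 to Clock 4; one branch is marked "Wait: waiting for right execution to finish"]
Channel Communication
–Allows parallel components to talk to each other
[Figure: two parallel components linked by a Channel]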
9
Memetic Algorithm
A genetic/evolutionary algorithm which includes a non-genetic local search to improve solutions
Genetic Algorithm
–Population-based heuristic technique modelled on the biological reproductive system
–Operates on the theory of "survival of the fittest"
–Good at exploring the solution space
Local Search
–Iterative improvement algorithms
–Often gets trapped in suboptimal solutions
–Good at exploiting the solution space
–Success is dependent on good starting solutions
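The combination described on this slide can be sketched as a genetic loop whose every new solution is refined by local search before it enters the population. This is a generic illustration for balanced two-way partitioning, not the presenter's design: binary tournament selection, one-point crossover, single-bit mutation, the balance-restoring `repair` step and the swap-based `local_search` are all assumptions, and the small netlist is made up.

```python
import random


def fitness(sol, nets):
    """Number of uncut nets: nets whose modules all share one block."""
    return sum(1 for net in nets if len({sol[m] for m in net}) == 1)


def repair(sol, rng):
    """Restore balance by moving random modules out of the larger block."""
    sol = sol[:]
    half = len(sol) // 2
    while sum(sol) != half:
        big = 1 if sum(sol) > half else 0
        i = rng.choice([k for k, b in enumerate(sol) if b == big])
        sol[i] = 1 - big
    return sol


def local_search(sol, nets):
    """Swap module pairs across blocks for as long as a swap improves fitness."""
    sol = sol[:]
    best = fitness(sol, nets)
    improved = True
    while improved:
        improved = False
        for i in range(len(sol)):
            for j in range(i + 1, len(sol)):
                if sol[i] == sol[j]:
                    continue  # same block: swapping changes nothing
                sol[i], sol[j] = sol[j], sol[i]
                score = fitness(sol, nets)
                if score > best:
                    best, improved = score, True
                else:
                    sol[i], sol[j] = sol[j], sol[i]  # undo a non-improving swap
    return sol


def memetic(nets, n_modules, pop_size=8, generations=20, seed=0):
    rng = random.Random(seed)
    pop = [repair([rng.randint(0, 1) for _ in range(n_modules)], rng)
           for _ in range(pop_size)]
    pop = [local_search(s, nets) for s in pop]
    for _ in range(generations):
        # Selection: two binary tournaments.
        p1 = max(rng.sample(pop, 2), key=lambda s: fitness(s, nets))
        p2 = max(rng.sample(pop, 2), key=lambda s: fitness(s, nets))
        # One-point crossover followed by a single-bit mutation.
        point = rng.randrange(1, n_modules)
        child = p1[:point] + p2[point:]
        k = rng.randrange(n_modules)
        child[k] = 1 - child[k]
        # Repair the balance, then refine the child with local search.
        child = local_search(repair(child, rng), nets)
        # Replacement: the child displaces the worst member if it is no worse.
        worst = min(range(pop_size), key=lambda i: fitness(pop[i], nets))
        if fitness(child, nets) >= fitness(pop[worst], nets):
            pop[worst] = child
    return max(pop, key=lambda s: fitness(s, nets))


# Made-up netlist: two tightly connected clusters joined by one net.
nets = [[0, 1], [1, 2], [0, 2], [3, 4], [4, 5], [3, 5], [2, 3]]
best = memetic(nets, n_modules=6)
print(best, fitness(best, nets))
```

The genetic operators supply varied starting points, which is what the local search depends on, while the local search does the fine-grained improvement that crossover and mutation alone reach slowly.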
10
[Figure: search landscape with regions labelled Genetic Algorithm and Local Search, and a point marked "Not Global Minimum"]
11
Research Challenges
Memetic Algorithms
–Increase the computational performance of the algorithm (CPU time)
–Exploit the inherent parallel nature of Genetic Algorithms
Hardware Development Languages
–Determine the impact of high-level languages vs low-level languages
12
Approach
Explore the most efficient design for implementing memetic algorithms on a single FPGA chip
Achieve increased performance through pipelining and parallelization
–Divide the tasks into separate but concurrent components
[Figure: FPGA chip divided among the different tasks of the algorithm]
13
Genetic Algorithm in Hardware
[Figure: block diagrams of the GA datapath built from Selection, Crossover, Mutation, Repair, Fitness and Replacement modules. One diagram duplicates the Mutation, Repair and Fitness modules so that Offspring 1 and Offspring 2 are processed side by side. A second diagram, labelled "Pipelined Approach", shows Offspring 1, Offspring 2 and Offspring 3 at successive stages of the module chain.]
14
Local Search Algorithm
[Figure: modules M0 to M5 connected by Net 1 to Net 5 and split into Block 0 and Block 1. Module data holds one block bit per module (011010 for modules 0 to 5). Objective Value = (Uncut Nets). A second panel is annotated "(forcing specific nets within one block)".]
15
Sequential issues
[Figure: flow with the steps Select Next Move, Copy Solution and Update Net Info, repeated groups of Loop1, Loop2 and Loop3, and a Block RAM]
16
Preliminary Results of GA

Software Results (Sun Blade 1000)
Benchmark    Modules  Nets   Best     Worst    Mean     Std Dev  Time
prim1.dat    833      902    795.4    767.2    786.4    5.642    30.6
prim2.dat    3014     3029   2580.6   2504.4   2546.6   14.539   122.1
struct.dat   1952     1920   1713.2   1671.2   1694.6   8.252    73.1
ind1.dat     2271     2192   1947.6   1887.8   1919.6   12.134   87.9
pcb1.dat     24       32     25       19.2     24.7     1.073    0.8
chip1.dat    300      294    253.2    241.2    251.1    2.703    8.4
chip4.dat    224      221    186.6    175.4    184.6    2.361    6.6
fract.dat    149      147    107.6    96.2     107.4    2.480    4.3

Hardware Results (@ 59MHz / 4)
Benchmark    Modules  Nets   Best     Worst    Mean     Std Dev  Time   Speedup  Quality
prim1.dat    833      902    661.4    645.2    657.2    3.775    10.3   290%     -16.8%
prim2.dat    3014     3029   1732.0   1703.0   1723.8   7.041    33.0   370%     -32.8%
struct.dat   1952     1920   1275.4   1246.8   1266.2   6.705    21.4   342%     -25.5%
ind1.dat     2271     2192   1415.0   1390.0   1407.8   6.138    23.8   369%     -27.3%
pcb1.dat     24       32     25.2     22.0     25.2     0.333    0.3    266%     0.8%
chip1.dat    300      294    230.8    221.2    229.8    1.883    3.4    247%     -8.8%
chip4.dat    224      221    188.8    182.0    188.2    1.316    2.5    264%     1.1%
fract.dat    149      147    116.6    112.0    116.3    0.661    1.7    253%     8.4%
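The slides do not say how the Speedup and Quality columns are derived. The sketch below assumes that Speedup is software time divided by hardware time, and that Quality is the relative change in the Best value from software to hardware, both as percentages. These formulas are inferred from the numbers, and the function names are hypothetical. The example uses the prim2.dat row.

```python
def speedup_percent(software_time, hardware_time):
    """Assumed definition: software run time over hardware run time, in percent."""
    return 100.0 * software_time / hardware_time


def quality_percent(software_best, hardware_best):
    """Assumed definition: relative change in the Best value (uncut nets), in percent."""
    return 100.0 * (hardware_best - software_best) / software_best


# prim2.dat: Time 122.1 (software) vs 33.0 (hardware),
# Best 2580.6 (software) vs 1732.0 (hardware).
s = speedup_percent(122.1, 33.0)
q = quality_percent(2580.6, 1732.0)
print(f"speedup: {s:.0f}%  quality: {q:.1f}%")
```

Under these assumptions a negative Quality value means the hardware run kept fewer nets uncut than the software run, so the time saved on the larger benchmarks comes with a loss in solution quality.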
17
Handel-C vs VHDL For Local Search Designs (xcv1000-4bg560)
Designs compared: Handel-C, VHDL Prototype

Total equivalent gates                      42,192               42,898

Usage Summary
Number of GCLKs                             1/4 (25%)            2/4 (50%)
Number of 4 input LUTs                      3,349/24,576 (13%)   3,333/24,576 (13%)
Number of Slice Registers                   2,193/24,576 (8%)    1,709/24,576 (6%)
Number of Slices                            2,204/12,288 (17%)   2,573/12,288 (20%)

Speed
Average Delay on the 10 Worst Nets          11.612 ns            11.309 ns
Maximum Delay                               15.768 ns            11.979 ns
Average Connection Delay for this design    2.921 ns             2.775 ns
18
Current Status and Future Work
Current Status
–Completed VHDL Local Search Prototype (verified through simulation)
–Completed Handel-C Local Search Design (verified and implemented on RC1000)
–Completed Handel-C Genetic Algorithm Design (currently in testing stages)
Future Work
–Complete VHDL Local Search Design and Implementation
–Analyze the performance difference between the hardware-based memetic algorithm and the software algorithm