ECE 448 Lecture 16 ASIC Front-End Design ECE 448 – FPGA and ASIC Design with VHDL
Two competing implementation approaches
ASIC (Application Specific Integrated Circuit)
  designed all the way from behavioral description to physical layout
  designs must be sent for expensive and time-consuming fabrication in a semiconductor foundry
FPGA (Field Programmable Gate Array)
  no physical layout design; design ends with a bitstream used to configure a device
  bought off the shelf and reconfigured by designers themselves
FPGAs vs. ASICs
FPGAs                       ASICs
Off-the-shelf               High performance
Low development costs       Low power
Short time to the market    Low cost (but only in high volumes)
Reconfigurability
ASIC Design Example – Factoring circuit/GMU
[Figure labels: Global Memory, Local Memory]
ASIC 130 nm vs. Virtex II 6000 (Factoring/GMU)
Area of Xilinx Virtex II 6000 FPGA: 19.68 mm x 19.80 mm (estimation by R.J. Lim Fong, MS Thesis, VPI, 2004)
Area of an ASIC with equivalent functionality: 2.7 mm x 2.82 mm
Area ratio: 51x
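The 51x figure can be checked with a few lines of arithmetic. The sketch below assumes that 19.68 mm and 19.80 mm are the two sides of the FPGA die and that 2.7 mm and 2.82 mm are the two sides of the ASIC die. That pairing is inferred from the slide and is not stated explicitly.

```python
# Die dimensions in mm, as read from the slide (the pairing of values is an assumption).
fpga_w, fpga_h = 19.68, 19.80   # Xilinx Virtex II 6000 FPGA (estimated)
asic_w, asic_h = 2.7, 2.82      # ASIC with equivalent functionality

fpga_area = fpga_w * fpga_h     # mm^2
asic_area = asic_w * asic_h     # mm^2
area_ratio = fpga_area / asic_area

print(f"FPGA area: {fpga_area:.2f} mm^2")
print(f"ASIC area: {asic_area:.2f} mm^2")
print(f"Ratio:     {area_ratio:.1f}x")
```

Under that assumption the ratio comes out at roughly 51, which matches the figure quoted on the slide.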
ASICs vs. FPGAs
Source: I. Kuon, J. Rose, University of Toronto, “Measuring the Gap Between FPGAs and ASICs,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 26, no. 2, Feb. 2007.
Simplified ASIC Design Flow
Front-End Design
  Synthesis
  Timing Analysis
Back-End Design
  Floorplanning
  Placement
  Clock Tree Synthesis
  Routing
  Design for Manufacturing
Notes:
Design Setup includes reading in the netlist and reference libraries, as well as creating a design library and design cell. Timing Setup includes reading in the timing constraints file, as well as setting up timing parameters within Astro.
Major ASIC Toolsets
  Cadence
  Magma
Simplified ASIC Design Flow (Synopsys Tools)
Front-End Design
  Synthesis: Design Analyzer
  Timing Analysis: PrimeTime
Back-End Design: Astro
  Floorplanning
  Placement
  Clock Tree Synthesis
  Routing
  Design for Manufacturing
A Complete Placed and Routed Chip
What is “Physical Layout”?
Physical Layout – Topography of devices and interconnects, made up of polygons that represent different layers of material (diffusion, polysilicon, metal, contact, etc.)
[Figure: inverter shown in a Physical or Layout View and in a Transistor or Device View; labels IN, OUT, VDD, GND, PMOS, NMOS]
Notes:
Semiconductor devices are built or fabricated by growing, implanting and depositing materials on a silicon wafer. Polygons of a specific color or layer represent an aerial view of the specific areas on the silicon wafer where a particular material, represented by that layer, will be implanted or deposited. The composite picture of all of these layers superimposed on each other is called the layout or physical view of the design. Before the devices can be fabricated, each polygon layer is converted into one or more masks (see next page).
How devices are formed: In the inverter example above, the dark green solid rectangle at the bottom represents an N-type Diffusion Area, while the pea-green stipple-patterned rectangle above represents P-type Diffusion, inside of a pink N-Well area. A transistor device is formed when a conductive material called polysilicon (poly for short), the stippled red line, crosses over and splits the diffusion area into two regions. The poly over the diffusion becomes the gate of both the N- and PMOS devices, and the two separated diffusion regions per device are called the source and drain regions. The source is usually connected to power or ground and the drain usually forms the device output or connects to another device’s source.
The blue striped lines represent Metal 1, either aluminum or copper, which acts as interconnect. The small solid black squares are contacts or cuts, which form electrical connections between metal and diffusion or poly.
Process of Device Fabrication
  Devices are fabricated vertically on a silicon substrate wafer by layering different materials in specific locations and shapes on top of each other
  Each of many process masks defines the shapes and locations of a specific layer of material (diffusion, polysilicon, metal, contact, etc.)
  Mask shapes, derived from the layout view, are transformed to silicon via photolithographic and chemical processes
[Figure: Layout or Mask (aerial) view and Wafer (cross-sectional) view of the Silicon Substrate]
Notes:
A mask is a glass plate with shapes represented by either opaque or clear areas (depending on whether the process step requires a positive or negative image). A photolithographic process allows light to pass through the clear areas of the mask onto the silicon wafer, which is covered by photo-sensitive material. Through chemical processes either the exposed or non-exposed areas will be etched away, depending again on the step, thereby exposing only key underlying areas. These areas are then either ion-implanted (forming “diffusion” areas) or covered with material (metal, polysilicon, oxide insulation) through a deposition step.
The fabrication process entails processing the silicon wafer through numerous chemical and photolithographic steps, using multiple masks, to build up all the required layers of materials which create the required devices. The next page shows a cross-sectional view of a basic N- and PMOS device.
Wafer Representation of Layout Polygons
[Figure: Aerial or Layout View (left) and Wafer Cross-sectional View (right) of the PMOS and NMOS devices; labels Input, Output, VDD, GND, 0.25 um]
Notes:
CMOS technology implies that all active devices, or transistors, come in pairs of N- and PMOS transistors. On the left side, you see the layout implementation of the N- and PMOS devices. Each material layer (poly, metal1, diffusion, etc.) is represented in layout tools by polygons of a unique color and layer number. When a design is “taped out”, each mask layer is written out in a format called GDSII. The GDSII file is then used to create the individual glass masks for each process layer.
On the right side, you see the same devices fabricated on silicon. The masks, which were derived from the 2-dimensional layout representation of the devices, were used to fabricate 3-dimensional devices in silicon.
The term “0.25 um technology” refers to the minimum width of the polysilicon gate, the red striped polygon above, which this particular process can build (see next page).
Front-End Design Flow
Simplified RTL Synthesis
VHDL vs. Verilog
VHDL                    Verilog
Government developed    Commercially developed
Ada based               C based
Strongly typed          Mildly typed
Difficult to learn      Easy to learn
More powerful           Less powerful
Logic Synthesis
VHDL description -> Circuit netlist
    architecture MLU_DATAFLOW of MLU is
      signal A1: STD_LOGIC;
      signal B1: STD_LOGIC;
      signal Y1: STD_LOGIC;
      signal MUX_0, MUX_1, MUX_2, MUX_3: STD_LOGIC;
    begin
      A1 <= A when (NEG_A = '0') else not A;
      B1 <= B when (NEG_B = '0') else not B;
      Y  <= Y1 when (NEG_Y = '0') else not Y1;
      MUX_0 <= A1 and B1;
      MUX_1 <= A1 or B1;
      MUX_2 <= A1 xor B1;
      MUX_3 <= A1 xnor B1;
      with (L1 & L0) select
        Y1 <= MUX_0 when "00",
              MUX_1 when "01",
              MUX_2 when "10",
              MUX_3 when others;
    end MLU_DATAFLOW;
Logic Synthesis
TCL – Tool Command Language
Created by John Ousterhout of UC Berkeley
Scripting Language
  Makes it very simple to automate routine tasks.
Extension Language
  Used to customize tools with user/company specific applications.
Nearly all modern EDA tools have a TCL interface.
Very simple to learn and use.
TCL Example
    # Create the directory if needed and check that it is usable.
    # Note: echo is a command of the tool shell; plain tclsh would use puts.
    proc rfmdIfNotDirMkdir { directory } {
        if {! [file exists $directory]} {
            file mkdir $directory;
        }
        if {! [file isdirectory $directory]} {
            echo "Could not make \"$directory\"";
            exit 1;
        } elseif {! [file writable $directory]} {
            echo " \"$directory\" is not writable";
        } else {
            return 1;
        }
    }
TCL References
  Practical Programming in Tcl and Tk, Brent B. Welch, Ken Jones
  Tcl/Tk in a Nutshell, Paul Raines, Jeff Tranter
Basic Synthesis Flow
Synthesis using Design Compiler
Synthesis script (1)
    designer = "Pawel Chodowiec"
    company = "George Mason University"
    search_path = "./opt3/synopsys/TSMCHOME/digital/Front_End/timing_power/tcb013ghp_200a "
    link_library = "* tcb013ghptc.db"
    /* Typical case library */
    target_library = "tcb013ghptc.db "
    symbol_library = "tcb013ghp.sdb "
    /* Directory configuration */
    src_directory = ~/exam1/vhdl/
    report_directory = ~/exam1/reports/
    db_directory = ~/exam1/db/
Synthesis script (2)
    /* Packages can be only read */
    read_file -format vhdl -rtl src_directory + "components.vhd"
    blocks = {regne, upcount, RAM_16Xn_DISTRIBUTED, exam1}
    foreach (block, blocks) {
      block_source = src_directory + block + ".vhd"
      read_file -format vhdl -rtl block_source
      analyze -format vhdl -lib WORK block_source
    }
    current_design block
    /* All commands now apply to the entity "exam1" */
Synthesis script (3)
    uniquify
    /* Creates unique instances of multiply referenced entities */
    link
    check_design
    /* Checks the current design for consistency */
    /*******************************************/
    /* apply block attributes and constraints */
    create_clock -period 10 clk
    /* Defines that the port "clk" of the current design is the clock for the design.
       Period = 10 ns, 50% duty cycle.
       Use the -waveform option to define a duty cycle other than 50% */
    set_operating_conditions NCCOM
    /* Normal Case Commercial Operating Conditions */
Synthesis script (4)
    /***************************************************/
    /* Apply these constraints to the top-level entity*/
    set_max_fanout 100 block
    set_clock_latency 0.1 find(clock, "clk")
    set_clock_transition 0.01 find(clock, "clk")
    set_clock_uncertainty -setup 0.1 find(clock, "clk")
    set_clock_uncertainty -hold 0.1 find(clock, "clk")
    set_load 0 all_outputs()
    set_input_delay 1.0 -clock clk -max all_inputs()
    set_output_delay -max 1.0 -clock clk all_outputs()
    set_wire_load_model -library tcb013ghptc -name "TSMC8K_Fsg_Conservative"
Wireload model basics (1)
Wireload model basics (2)
Synthesis script (5)
    set_dont_touch block
    compile -map_effort medium
    change_names -rules vhdl
    vhdlout_architecture_name = "sort_syn"
    vhdlout_use_packages = {"IEEE.std_logic_1164"}
    write -f db -hierarchy -output db_directory + "exam1.db"
    /*write -f vhdl -hierarchy -output db_directory + "exam1_syn.vhd"*/
    report -area > report_directory + "exam1.report_area"
    report -timing -all > report_directory + "exam1.report_timing"
Results of synthesis
Area report after synthesis (1)
    report_area
    Information: Updating design information... (UID-85)
    ****************************************
    Report : area
    Design : exam1
    Version: V-2003.12-SP1
    Date   : Tue Nov 15 20:39:06 2005
    Library(s) Used:
        tcb013ghptc (File: /opt3/synopsys/TSMCHOME/digital/Front_End/timing_power/tcb013ghp_200a/tcb013ghptc.db)
Area report after synthesis (2)
    Number of ports:            75
    Number of nets:            346
    Number of cells:           107
    Number of references:       28
    Combinational area:      10593.477539
    Noncombinational area:   14295.521484
    Net Interconnect area:   undefined (Wire load has zero net area)
    Total cell area:         24888.976562
    Total area:              undefined
Critical Path (1)
Critical Path – The Longest Path From Outputs of Registers to Inputs of Registers
[Figure labels: in, D, Q, clk, out, t_logic]
$t_{Critical} = t_{FF-P} + t_{logic} + t_{FF-setup}$
Critical Path (2)
Min. Clock Period = Length of The Critical Path
Max. Clock Frequency = 1 / Min. Clock Period
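A small worked example may help connect the two critical-path relations. The delay values below are hypothetical and chosen only for illustration; the function names are not from the lecture.

```python
def critical_path_ns(t_ff_p, t_logic, t_ff_setup):
    """t_Critical = t_FF-P + t_logic + t_FF-setup (all values in ns)."""
    return t_ff_p + t_logic + t_ff_setup


def max_clock_mhz(t_critical_ns):
    """Max. clock frequency = 1 / min. clock period; 1/ns is GHz, so scale to MHz."""
    return 1000.0 / t_critical_ns


# Hypothetical delays, chosen only for illustration
t_crit = critical_path_ns(t_ff_p=0.25, t_logic=4.50, t_ff_setup=0.25)
f_max = max_clock_mhz(t_crit)
print(f"min clock period = {t_crit:.2f} ns, max clock frequency = {f_max:.1f} MHz")
```

With these assumed numbers the critical path is 5 ns long, so the circuit could be clocked at up to 200 MHz.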
Clock Jitter
  Rising Edge of The Clock Does Not Occur Precisely Periodically
  May cause faults in the circuit
[Figure label: clk]
Clock Skew
  Rising Edge of the Clock Does Not Arrive at Clock Inputs of All Flip-flops at The Same Time
[Figure labels, shown for two flip-flops: in, D, Q, clk, out, delay]
Timing report after synthesis (1)
    ****************************************
    Report : timing
            -path full
            -delay max
            -max_paths 1
    Design : exam1
    Version: V-2003.12-SP1
    Date   : Tue Nov 15 20:39:06 2005
    Operating Conditions: NCCOM   Library: tcb013ghptc
    Wire Load Model Mode: segmented
Timing report after synthesis (2)
    Startpoint: in_addr(1) (input port clocked by clk)
    Endpoint: RegSUM/Q_reg[34] (rising edge-triggered flip-flop clocked by clk)
    Path Group: clk
    Path Type: max

    Des/Clust/Port          Wire Load Model            Library
    ------------------------------------------------------------------
    exam1                   TSMC8K_Fsg_Conservative    tcb013ghptc
    RAM_16Xn_DISTRIBUTED    ZeroWireload               tcb013ghptc
    exam1_DW01_cmp2_32_0    ZeroWireload               tcb013ghptc
    exam1_DW01_cmp2_32_1    ZeroWireload               tcb013ghptc
    exam1_DW01_add_35_0     ZeroWireload               tcb013ghptc
    regne_1                 ZeroWireload               tcb013ghptc
    regne_2                 ZeroWireload               tcb013ghptc
    regne_n35               ZeroWireload               tcb013ghptc
Timing report after synthesis (3)
    Point                                             Incr     Path
    ------------------------------------------------------------------
    clock clk (rise edge)                             0.00     0.00
    clock network delay (ideal)                       0.10     0.10
    input external delay                              1.00     1.10 f
    in_addr(1) (in)                                   0.00     1.10 f
    U98/Z (CKMUX2D1)                                  0.13     1.23 f
    Memory/ADDR[1] (RAM_16Xn_DISTRIBUTED)             0.00     1.23 f
    Memory/U41/ZN (INVD1)                             0.08     1.31 r
    Memory/U343/Z (OR3D1)                             0.10     1.41 r
    Memory/U338/ZN (INVD2)                            0.20     1.61 f
    Memory/U40/ZN (MOAI22D0)                          0.17     1.78 f
    Memory/U350/Z (OR4D1)                             0.26     2.03 f
    Memory/DATA_OUT[0] (RAM_16Xn_DISTRIBUTED)         0.00     2.03 f
Timing report after synthesis (4)
    add_96xplusxplus/B[0] (exam1_DW01_add_35_0)       0.00     2.03 f
    add_96xplusxplus/U9/Z (AN2D0)                     0.12     2.15 f
    add_96xplusxplus/U1_1/CO (CMPE32D1)               0.10     2.25 f
    add_96xplusxplus/U1_2/CO (CMPE32D1)               0.10     2.34 f
    add_96xplusxplus/U1_3/CO (CMPE32D1)               0.10     2.44 f
    add_96xplusxplus/U1_4/CO (CMPE32D1)               0.10     2.54 f
    add_96xplusxplus/U1_5/CO (CMPE32D1)               0.10     2.63 f
    add_96xplusxplus/U1_6/CO (CMPE32D1)               0.10     2.73 f
    add_96xplusxplus/U1_7/CO (CMPE32D1)               0.10     2.82 f
    add_96xplusxplus/U1_8/CO (CMPE32D1)               0.10     2.92 f
    add_96xplusxplus/U1_9/CO (CMPE32D1)               0.10     3.02 f
    add_96xplusxplus/U1_10/CO (CMPE32D1)              0.10     3.11 f
    add_96xplusxplus/U1_11/CO (CMPE32D1)              0.10     3.21 f
    add_96xplusxplus/U1_12/CO (CMPE32D1)              0.10     3.31 f
    add_96xplusxplus/U1_13/CO (CMPE32D1)              0.10     3.40 f
    add_96xplusxplus/U1_14/CO (CMPE32D1)              0.10     3.50 f
Timing report after synthesis (5)
    add_96xplusxplus/U1_15/CO (CMPE32D1)              0.10     3.60 f
    add_96xplusxplus/U1_16/CO (CMPE32D1)              0.10     3.69 f
    add_96xplusxplus/U1_17/CO (CMPE32D1)              0.10     3.79 f
    add_96xplusxplus/U1_18/CO (CMPE32D1)              0.10     3.88 f
    add_96xplusxplus/U1_19/CO (CMPE32D1)              0.10     3.98 f
    add_96xplusxplus/U1_20/CO (CMPE32D1)              0.10     4.08 f
    add_96xplusxplus/U1_21/CO (CMPE32D1)              0.10     4.17 f
    add_96xplusxplus/U1_22/CO (CMPE32D1)              0.10     4.27 f
    add_96xplusxplus/U1_23/CO (CMPE32D1)              0.10     4.37 f
    add_96xplusxplus/U1_24/CO (CMPE32D1)              0.10     4.46 f
    add_96xplusxplus/U1_25/CO (CMPE32D1)              0.10     4.56 f
    add_96xplusxplus/U1_26/CO (CMPE32D1)              0.10     4.66 f
    add_96xplusxplus/U1_27/CO (CMPE32D1)              0.10     4.75 f
    add_96xplusxplus/U1_28/CO (CMPE32D1)              0.10     4.85 f
    add_96xplusxplus/U1_29/CO (CMPE32D1)              0.10     4.94 f
    add_96xplusxplus/U1_30/CO (CMPE32D1)              0.10     5.04 f
    add_96xplusxplus/U1_31/CO (CMPE32D1)              0.10     5.14 f
Timing report after synthesis (6)
    add_96xplusxplus/U7/Z (AN2D0)                     0.10     5.24 f
    add_96xplusxplus/U5/Z (AN2D0)                     0.08     5.32 f
    add_96xplusxplus/U4/Z (CKXOR2D0)                  0.15     5.47 f
    add_96xplusxplus/SUM[34] (exam1_DW01_add_35_0)    0.00     5.47 f
    RegSUM/R[34] (regne_n35)                          0.00     5.47 f
    RegSUM/U32/Z (AO21D0)                             0.11     5.57 f
    RegSUM/Q_reg[34]/D (EDFQD1)                       0.00     5.57 f
    data arrival time                                          5.57
Timing report after synthesis (7)
    clock clk (rise edge)                            10.00    10.00
    clock network delay (ideal)                       0.10    10.10
    clock uncertainty                                -0.10    10.00
    RegSUM/Q_reg[34]/CP (EDFQD1)                      0.00    10.00 r
    library setup time                               -0.12     9.88
    data required time                                         9.88
    ------------------------------------------------------------------
    data arrival time                                         -5.57
    slack (MET)                                                4.31
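The last part of the report is plain arithmetic, and recomputing it shows how the constraints from the synthesis script enter the result. The sketch below copies its numbers from the report; the variable names are chosen here for illustration.

```python
# Values copied from the timing report (all in ns)
clock_edge = 10.00         # clock clk (rise edge), i.e. the constrained period
clock_network = 0.10       # clock network delay (ideal)
clock_uncertainty = 0.10   # clock uncertainty for setup
library_setup = 0.12       # library setup time of the capturing flip-flop
data_arrival = 5.57        # data arrival time at RegSUM/Q_reg[34]/D

# The data must be stable one setup time before the (uncertain) capturing edge.
data_required = clock_edge + clock_network - clock_uncertainty - library_setup
slack = data_required - data_arrival

print(f"data required time = {data_required:.2f} ns")
print(f"slack              = {slack:.2f} ns ({'MET' if slack >= 0 else 'VIOLATED'})")
```

This reproduces the reported data required time of 9.88 ns and the slack of 4.31 ns. A positive slack means the path still meets timing at the 10 ns clock period.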
Static Timing Analysis
Static Timing Analysis Review
  Tools will calculate all paths from sequential start point to sequential end point.
  The worst case path will be used for setup analysis, and the best case path will be used for hold analysis.
  All paths are considered for design rule checking.
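The idea of tracking both the worst-case and the best-case path can be sketched on a toy netlist. This is a simplified illustration and not how any particular tool is implemented. The graph, the pin names and the delays are hypothetical, and one delay per edge is used for both analyses, whereas real tools use separate maximum and minimum delays.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def arrival_times(edges, start):
    """Latest and earliest arrival time (ns) at each node reachable from start.

    edges maps (source, destination) -> delay in ns and must form a DAG.
    """
    preds = {}
    for (src, dst), delay in edges.items():
        preds.setdefault(src, [])
        preds.setdefault(dst, []).append((src, delay))

    # TopologicalSorter expects node -> predecessors; predecessors are visited first.
    order = TopologicalSorter(
        {node: [src for src, _ in incoming] for node, incoming in preds.items()}
    ).static_order()

    latest, earliest = {start: 0.0}, {start: 0.0}
    for node in order:
        reached = [(src, d) for src, d in preds[node] if src in latest]
        if reached:
            latest[node] = max(latest[src] + d for src, d in reached)
            earliest[node] = min(earliest[src] + d for src, d in reached)
    return latest, earliest


# Hypothetical netlist: two paths from a launching register to a capturing register
edges = {
    ("reg1/Q", "U1/Z"): 0.10,
    ("reg1/Q", "U2/Z"): 0.30,
    ("U1/Z", "reg2/D"): 0.20,
    ("U2/Z", "reg2/D"): 0.25,
}
latest, earliest = arrival_times(edges, "reg1/Q")
print(f"worst-case (setup) arrival at reg2/D: {latest['reg2/D']:.2f} ns")
print(f"best-case (hold) arrival at reg2/D:   {earliest['reg2/D']:.2f} ns")
```

In this toy case the path through U2 (0.55 ns) would be checked against the setup requirement, and the path through U1 (0.30 ns) against the hold requirement.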
Review of Setup and Hold Checks
False and Multicycle paths
False path
  Very slow signals, such as reset or test mode enable, that are not used under normal conditions are classified as false paths
Multicycle path
  Paths that take more than one clock cycle are known as multicycle paths
  Multicycle paths have to be defined in the analyzer, which then takes those constraints into account during synthesis
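The effect of declaring a path as multicycle can be illustrated with a simplified setup check. The function and the numbers below are hypothetical; clock latency and uncertainty are left out to keep the sketch short. In Synopsys tools such exceptions are typically expressed with the `set_multicycle_path` and `set_false_path` constraints.

```python
def setup_slack_ns(path_delay, clock_period, setup_time, cycles=1):
    """Setup slack when the capturing edge is `cycles` clock periods after launch."""
    data_required = cycles * clock_period - setup_time
    return data_required - path_delay


# Hypothetical path: 14 ns of delay against a 10 ns clock
single = setup_slack_ns(path_delay=14.0, clock_period=10.0, setup_time=0.12)
double = setup_slack_ns(path_delay=14.0, clock_period=10.0, setup_time=0.12, cycles=2)
print(f"single-cycle slack: {single:.2f} ns")
print(f"two-cycle slack:    {double:.2f} ns")
```

Analyzed as a single-cycle path, this example fails by about 4.12 ns. Once the analyzer is told that the result is only captured after two cycles, the same path passes with about 5.88 ns of slack, so the tool does not waste effort trying to speed it up.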
Multicycle path - Example
Optimization criteria
Degrees of freedom and possible trade-offs
[Figure labels: speed, area, power, testability]
Degrees of freedom and possible trade-offs
[Figure labels: speed, latency, area, throughput]
VHDL Coding for Synthesis
Recommended rules for Synthesis
  Do not split combinational paths across hierarchy
  Register all outputs
  Do not implement glue logic between blocks; partition them well
  Separate designs on functional boundaries
  Keep blocks to a reasonable size
Avoid hierarchical combinational blocks
  The path between reg1 and reg2 is divided among three different blocks
  Due to hierarchical boundaries, optimization of the combinational logic cannot be achieved
  Synthesis tools (Synopsys) maintain the integrity of the I/O ports, so combinational optimization cannot be achieved between blocks (unless “grouping” is used)
Recommended way to handle Combinational Paths
  All the combinational circuitry is grouped in the same block, which has its output connected to the destination flip-flop
  It allows the optimal minimization of the combinational logic during synthesis
  Allows a simplified description of the timing interface
Register all outputs
  Simplifies the synthesis design environment: inputs to the individual block arrive within the same relative delay (caused by wire delays)
  Don’t really need to specify output requirements, since paths start at flip-flop outputs
  Take care of fanouts; as a rule of thumb, keep the fanout to 16 (dependent on technology and on the components that are being driven by the output)
NO GLUE LOGIC between blocks
  Under time pressure, a bug is found that could simply be fixed by adding some simple glue logic between blocks. RESIST THE TEMPTATION!!!
  At this level in the hierarchy, this implementation will not allow the glue logic to be absorbed within any lower level block.
Separate design with different goals
  reg1 may be driven by a time-critical function, hence will have different optimization constraints
  reg3 may be driven by slow logic, hence no need to constrain it for speed
Optimization based on design requirements
  Use different entities to partition design blocks
  Allows different constraints during synthesis to optimize for area or speed or both
Separate FSM from random logic
  Separation of the FSM and the random logic allows you to use FSM-optimized synthesis
Maintain a reasonable block size
  Partition your design such that each block is between 1000 and 10000 gates (this is strictly tool and technology dependent)
  The larger the blocks, the longer the run time, so quick iterations cannot be done