ECE555 Lecture 3 Nam Sung Kim University of Wisconsin – Madison Dept. of Electrical & Computer Engineering
Implementation Strategy for digital ICs
Impact of Implementation Choices [Figure: energy efficiency (in MOPS/mW) versus flexibility (or application scope), ranging from none through somewhat flexible to fully flexible. Hardwired custom: 100-1000; configurable/parameterizable: 10-100; domain-specific processor (e.g. DSP): 1-10; embedded microprocessor: 0.1-1]
Implementation Choices. Digital Circuit Implementation Approaches: Custom; Semicustom, divided into Cell-based (Standard Cells, Compiled Cells, Macro Cells) and Array-based (Pre-diffused (Gate Arrays), Pre-wired (FPGAs))
The Custom Approach Intel 4004 Courtesy Intel
Transition to Automation and Regular Structures: Intel 4004 ('71), Intel 8080, Intel 8085, Intel 80286, Intel 80486. Courtesy Intel
Cell-based Design (or standard cells) Routing channel requirements are reduced by presence of more interconnect layers
Standard Cell — Example [Brodersen92]
Standard Cell – The New Generation Cell-structure hidden under interconnect layers
Standard Cell - Example 3-input NAND cell (from ST Microelectronics): C = Load capacitance T = input rise/fall time
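A library cell's delay is typically characterized as a function of the two parameters named on the slide, load capacitance C and input rise/fall time T. The sketch below assumes a simple linear model; the function name `cell_delay_ns` and every coefficient are hypothetical placeholders, not values from the ST Microelectronics data sheet.

```python
def cell_delay_ns(c_pf, t_ns, t0=0.05, k_c=3.0, k_t=0.2):
    """Assumed linear delay model: intrinsic delay + load term + input-slope term.

    c_pf: load capacitance C in pF
    t_ns: input rise/fall time T in ns
    t0, k_c, k_t: hypothetical coefficients, NOT taken from any real library
    """
    return t0 + k_c * c_pf + k_t * t_ns


# Same input slope, two different loads: the heavier load gives the longer delay.
light = cell_delay_ns(c_pf=0.01, t_ns=0.1)
heavy = cell_delay_ns(c_pf=0.10, t_ns=0.1)
```

Real libraries usually store such data per input pin and per transition direction, often as lookup tables rather than a single linear fit.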
Automatic Cell Generation: Initial transistor geometries → Placed transistors → Routed cell → Compacted cell → Finished cell
A Historical Perspective: the PLA [Figure: inputs x feed an AND plane that forms the product terms; an OR plane combines the product terms into the outputs f]
Two-Level Logic: Every logic function can be expressed in sum-of-products format (AND-OR) of minterms. The inverting format (NOR-NOR) is more effective.
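The equivalence between the AND-OR and NOR-NOR formats can be checked exhaustively. The sketch below uses a hypothetical example function, f = a·b + c', and assumes complemented inputs are available at the first plane and an inverter at the output, as is common in a PLA.

```python
from itertools import product


def nor(*bits):
    """NOR gate: 1 only when every input is 0."""
    return int(not any(bits))


def f_and_or(a, b, c):
    """Sum-of-products form of the example function f = a.b + c'."""
    return int((a and b) or (not c))


def f_nor_nor(a, b, c):
    """Same function built from two NOR planes plus an output inverter."""
    p1 = nor(1 - a, 1 - b)  # NOR of complemented inputs equals a AND b
    p2 = nor(c)             # single-input NOR equals NOT c
    f_bar = nor(p1, p2)     # second NOR plane yields the complement of f
    return 1 - f_bar        # output inverter restores f


# Compare both forms over all 8 input combinations.
mismatches = [bits for bits in product((0, 1), repeat=3)
              if f_and_or(*bits) != f_nor_nor(*bits)]
```

An empty `mismatches` list means the two realizations agree on every input combination.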
PLA Schematics (Logic-level)
PLA Schematic (Transistor-level)
PLA Layout – Exploiting Regularity
Breathing Some New Life in PLAs: River PLAs. A cascade of multiple-output PLAs. Adjacent PLAs are connected via river routing. No placement and routing is needed, and the output buffers of one stage are shared with the input buffers of the next stage. Courtesy B. Brayton
Macro Modules: 256×32 (or 8192 bit) SRAM, generated by hard-macro module generator
“Soft” MacroModules
“Intellectual Property” A Protocol Processor for Wireless
Semicustom Design Flow. Main flow: Design Capture (HDL) → Logic Synthesis → Floorplanning → Placement → Routing → Tape-out. Supporting steps: Pre-Layout Simulation, Circuit Extraction, Post-Layout Simulation. Abstraction levels: Behavioral, Structural, Physical. Design Iteration loops back through the flow.
The “Design Closure” Problem Iterative Removal of Timing Violations (white lines)
Integrating Synthesis with Physical Design [Figure: RTL and (timing) constraints, together with macromodules (fixed netlists), feed Physical Synthesis; the result is a netlist with place-and-route info, which goes through place-and-route optimization to produce the artwork]
Late-Binding Implementation Pre-diffused (Gate Arrays) Pre-wired (FPGA's) Array-based
Gate Array: Sea-of-gates. Uncommitted Cell; Committed Cell (4-input NOR)
Sea-of-gate Primitive Cells Using oxide-isolation Using gate-isolation
Sea-of-gates: Random Logic, Memory Subsystem. LSI Logic LEA300K (0.6 µm CMOS). Courtesy LSI Logic
The return of gate arrays? Via-programmable gate array (VPGA). Exploits regularity of interconnect. [Figure: via-programmable cross-point between metal-5 and metal-6 with a programmable via] [Pileggi02]
Prewired Arrays. Classification of prewired arrays (or field-programmable devices). Based on Programming Technique: fuse-based (program-once), non-volatile EPROM based, RAM based. Programmable Logic Style: array-based, look-up table. Programmable Interconnect Style: channel-routing, mesh networks.
Fuse-Based FPGA: the antifuse is open by default, closed by applying a current pulse. [Figure: antifuse cross-section with polysilicon, ONO dielectric, and n+ antifuse diffusion; dimension 2λ]
Array-Based Programmable Logic. PLA: programmable AND array, programmable OR array. PROM: fixed AND array, programmable OR array. PAL: programmable AND array, fixed OR array. [Figure: the three array styles with inputs I and outputs O; legend indicates programmable connections and fixed connections]
Programming a PROM [Figure: PROM array with inputs X and outputs f; one entry labeled NA; legend marks each programmed node]
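Because the AND array of a PROM is fixed (it decodes every input combination), programming a PROM amounts to storing the truth table of each output in the OR array. A minimal sketch, assuming a hypothetical two-input, two-output half adder as the function to program; the helper names `program_prom` and `read_prom` are illustrative only.

```python
def program_prom(n_inputs, functions):
    """Return the OR-array contents: one row per input combination.

    Each row holds one bit per output; a 1 stands for a programmed node.
    """
    rows = []
    for address in range(2 ** n_inputs):
        bits = [(address >> i) & 1 for i in range(n_inputs)]  # bits[0] is x0
        rows.append([int(bool(fn(*bits))) for fn in functions])
    return rows


def read_prom(rows, *bits):
    """The fixed AND array selects exactly one row for the given inputs."""
    address = sum(bit << i for i, bit in enumerate(bits))
    return rows[address]


# Hypothetical example: half adder with outputs [sum, carry].
prom = program_prom(2, [lambda x0, x1: x0 ^ x1,
                        lambda x0, x1: x0 and x1])
```

The number of rows grows as 2^n with the number of inputs, which is the cost of the fixed, fully decoded AND array.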
More Complex PAL: i inputs, j minterms/macrocell, k macrocells. From Smith97
Programmable Logic Block [Figure: 2-input mux with data inputs A and B, select S, and output F; configuration table listing F for each assignment of A, B, and S, for example F = XY]
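A 2-input multiplexer becomes a small programmable logic block when each of its pins A, B, and S is tied to a constant or to a signal. The sketch below assumes the convention that S = 0 selects A and S = 1 selects B; the helper `configure` and the three configurations shown are illustrative choices, not a reproduction of the slide's table.

```python
from itertools import product


def mux2(a, b, s):
    """2-input mux. Assumed convention: S = 0 selects A, S = 1 selects B."""
    return b if s else a


def configure(a_src, b_src, s_src):
    """Build F(x, y) by tying each mux pin to 0, 1, 'X', or 'Y'."""
    def pin(src, x, y):
        # Signal names map to the signal value; constants pass through.
        return {"X": x, "Y": y}.get(src, src)
    return lambda x, y: mux2(pin(a_src, x, y), pin(b_src, x, y), pin(s_src, x, y))


and_gate = configure(0, "Y", "X")  # A=0, B=Y, S=X  ->  F = XY
or_gate = configure("Y", 1, "X")   # A=Y, B=1, S=X  ->  F = X + Y
not_x = configure(1, 0, "X")       # A=1, B=0, S=X  ->  F = X'

# Truth table of each configuration, one row per (X, Y) combination.
table = [(x, y, and_gate(x, y), or_gate(x, y), not_x(x, y))
         for x, y in product((0, 1), repeat=2)]
```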
Logic Cell of Actel Fuse-Based FPGA
Look-up Table Based Logic Cell
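A look-up table (LUT) cell stores the truth table of the desired function in 2^n configuration bits and uses the n inputs as an address. A minimal sketch, assuming a hypothetical 3-input LUT programmed as a 3-input XOR; `make_lut` and `lut_eval` are illustrative names.

```python
def make_lut(n_inputs, fn):
    """Fill the 2**n configuration bits of an n-input LUT from fn."""
    return [int(bool(fn(*[(i >> k) & 1 for k in range(n_inputs)])))
            for i in range(2 ** n_inputs)]


def lut_eval(config_bits, *inputs):
    """Evaluate the LUT: the inputs form the address of one stored bit."""
    index = sum(bit << k for k, bit in enumerate(inputs))
    return config_bits[index]


# Hypothetical example: 3-input XOR (parity) stored in 8 configuration bits.
xor3 = make_lut(3, lambda a, b, c: a ^ b ^ c)
```

Any function of n inputs fits in the same cell; only the stored bits change.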
Array-Based Programmable Wiring [Figure labels: cell, input/output pin, horizontal tracks, vertical tracks, interconnect point, programmed interconnection]
Mesh-based Interconnect Network [Figure labels: switch box, connect box, interconnect point]
Transistor Implementation of Mesh
RAM-based FPGA Xilinx XC4000ex Courtesy Xilinx
Design at a Crossroad: System-on-a-Chip. Embedded applications where cost, performance, and energy are the real issues! DSP and control intensive. Mixed-mode. Combines programmable and application-specific modules. Software plays crucial role. [Figure: multi-spectral imager chip with RAM; 500 k gates FPGA + 1 Gbit DRAM (preprocessing); µC system + 2 Gbit DRAM (recognition); analog; 64 SIMD processor array + SRAM (image conditioning); 100 GOPS]
Backup
Logic Transistors per Chip vs. Productivity [Figure: logic transistors per chip (M) and productivity (K transistors per staff-month) versus year, 1981 to 2009. Complexity growth rate: 58%/yr compounded. Productivity growth rate: 21%/yr compounded. Source: Sematech] Complexity outpaces design productivity
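The two compound growth rates on the plot (58%/yr for complexity, 21%/yr for productivity) imply a gap that itself grows exponentially. A short sketch of that arithmetic follows; the function names are illustrative, and both quantities are assumed to start from the same normalized baseline.

```python
import math

COMPLEXITY_GROWTH = 0.58    # per year, compounded (from the slide)
PRODUCTIVITY_GROWTH = 0.21  # per year, compounded (from the slide)

# Yearly ratio by which complexity pulls ahead of productivity.
YEARLY_RATIO = (1 + COMPLEXITY_GROWTH) / (1 + PRODUCTIVITY_GROWTH)


def gap_factor(years):
    """Ratio of complexity to productivity after `years`, starting at 1."""
    return YEARLY_RATIO ** years


def years_to_gap(factor):
    """Years needed for the gap to reach `factor`."""
    return math.log(factor) / math.log(YEARLY_RATIO)


decade_gap = gap_factor(10)         # gap after ten years
years_for_10x = years_to_gap(10.0)  # time for a tenfold gap
```

Under these rates the gap widens by roughly 30% every year, which compounds to more than a tenfold gap within a decade.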
A Simple Processor MEMORY INPUT/OUTPUT CONTROL DATAPATH
A System-on-a-Chip: Example Courtesy: Philips
Design Methodology: The design process traverses iteratively between three abstractions: behavior, structure, and geometry. More and more automation for each of these steps.
PLA Layout – Exploiting Regularity [Figure labels: VDD, GND, f, AND-plane, OR-plane]