Digital Integrated Circuits A Design Perspective

Jan M. Rabaey, Anantha Chandrakasan, Borivoje Nikolic
Modified and integrated by Davide Bertozzi
Design Methodologies: Array-based design

Late-Binding Implementation
Until now, all methodologies have required a complete run through the fabrication process
- very high NRE (non-recurring expense)
Array-based implementations have lower manufacturing costs
- attractive for small series
- lower performance/density, higher power
Array-based approaches: pre-diffused (gate arrays) and pre-wired (FPGAs)

Gate Array — Sea-of-gates
Wafers of pre-diffused transistors are pre-manufactured
The desired interconnections are then added to determine the overall function of the chip
- only a few additional metallization steps, applied to the pre-diffused wafers in a week or less
- wafer manufacturing is independent of the final application (standard masks)
[Figure: uncommitted cell (PMOS and NMOS rows) and committed cell (4-input NOR)]
The channelless layout is called "sea-of-gates" (it also has no predefined contacts)

Gate Array — Primitive cells
[Figure: uncommitted cell with PMOS and NMOS rows]
How to determine
- the composition of primitive cells? Maximum transistor utilization must be ensured
- the size of primitive transistors? They need the flexibility to drive arbitrary loads
Static design decisions affect a wide range of designs!

Sea-of-gate Primitive Cells
Alternative cell structures
Using oxide isolation
- isolated cells consist of N transistors
Using gate isolation
- long rows of transistors share the same diffusion area
- some transistors must be tied to Vdd or GND to isolate neighboring gates
In principle, gate isolation leads to higher transistor density

Sea-of-gate Primitive Cells
Transistor sizing challenge
- the interconnect-oriented nature of GAs (propagation delay dominated by interconnect capacitance) favors larger device sizes
- larger devices mean a large area overhead when unused
- alternative: connect smaller devices in parallel (e.g., 2 rows of small NMOS transistors, connected in parallel when needed)
- small devices can be used for pass-transistor logic or memory cells
[Figure: oxide-isolated cell with rows of smaller devices]
Utilization factors depend largely on the application: from 100% (regular structures) to below 75%
Mapping a design onto a gate array is largely automated

Example: Base Cell of Gate-Isolated GA
1 pMOS, 1 nMOS
Cell height: 21 tracks
From Smith97

Example: Flip-Flop in Gate-Isolated GA
From Smith97

Sea-of-gates
Memories can be implemented on top of gate arrays
- inefficient (similar to standard cells)
GAs can be integrated with memory macros (embedded gate array)
[Figure: random logic and memory subsystem, LSI Logic LEA300K (0.6 µm CMOS). Courtesy LSI Logic]

Comparison
Gate array:
- lower manufacturing cost
- larger area
- interconnect-centric programming (!)
- regular and fixed layout: load factors, wiring parasitics, ... can be accurately estimated
Standard cell:
- higher manufacturing cost
- smaller area
- less emphasis on routing
- load factors and parasitics are known only after placement, routing and extraction
Loss of interest
Some gate array design approaches also leverage the regularity and predictability of interconnects

The return of gate arrays?
Array of prediffused cells with a superimposed wiring grid
Via-programmable gate array (VPGA)
[Figure: via-programmable cross-point between metal-5 and metal-6, with programmable via]
Exploits regularity of interconnects [Pileggi02]

Prewired Arrays
Solution: programming in the field, outside the silicon foundry!
Classification of prewired arrays (or field-programmable gate arrays, FPGAs):
Based on programming technique
- fuse-based (program-once)
- non-volatile
- RAM-based
Programmable logic style
- array-based
- look-up table based
Programmable interconnect style
- channel routing
- mesh networks

Prewired Arrays
Starting from a regular array of cells...
- How do we implement programmable logic? How can we commit logic to perform any possible Boolean function?
- How do we store the program/configuration that commits the programmable array to a certain logic function?

Configuration storage
Fuse-based FPGA
- uses fuses (to be blown) or antifuses (to be short-circuited)
- small area overhead, but one-time programmable
Nonvolatile FPGA
- program stored in EEPROM/Flash
- functionality retained until the next programming round
- requires additional process steps (e.g., ultrathin oxides) and high programming voltages
Volatile FPGA
- program stored in RAM cells
- at power-up, the configuration is reloaded from an external non-volatile memory
- RAM cells are programmed as a giant shift register: linear programming vs. multi-cell programming
- a regular CMOS process is sufficient
- the logic function can be modified on the fly during execution (partial reconfiguration capability)
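The "giant shift register" view of configuration RAM can be sketched in a few lines of Python. This is only an illustrative model: the class name `ConfigChain`, the helper `program` and the bit ordering are assumptions made here, not details of any particular device.

```python
from typing import List


class ConfigChain:
    """Hypothetical model: configuration RAM cells chained as one shift register."""

    def __init__(self, length: int) -> None:
        self.cells: List[int] = [0] * length  # all cells cleared at power-up

    def shift_in(self, bit: int) -> None:
        # On each clock every cell takes its predecessor's value;
        # the new bit enters at position 0 and the last bit falls off the end.
        self.cells = [bit & 1] + self.cells[:-1]

    def load(self, bitstream: List[int]) -> None:
        # Serial load: one bit per clock, so programming time grows
        # linearly with the number of configuration cells.
        for bit in bitstream:
            self.shift_in(bit)


def program(length: int, bitstream: List[int]) -> List[int]:
    """Load a bitstream into a fresh chain and return the cell contents."""
    chain = ConfigChain(length)
    chain.load(bitstream)
    return chain.cells
```

In this model the first bit shifted in ends up in the last cell, so a bitstream as long as the chain appears reversed in the cells once loading is complete.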

Antifuse-Based FPGA
[Figure: antifuse cross-section with antifuse polysilicon, 10 nm ONO dielectric and n+ antifuse diffusion]
Open by default, closed by applying a current pulse (melting of the dielectric)
The opposite holds for fuses
From Smith97

Prewired Arrays
...starting from a regular array of cells...
- How do we implement programmable logic? How can we commit logic to perform any possible Boolean function?
  - array-based approach
  - cell-based approach
- How do we store the program/configuration that dedicates the programmable array to a certain logic function?

Array-Based Programmable Logic (programmable logic devices, PLD)
[Figure: three array structures with inputs I and outputs O; legend: programmable connection, fixed connection]
- PLA: programmable AND array, programmable OR array
- PROM: fixed AND array, programmable OR array
- PAL: programmable AND array, fixed OR array
The AND array determines which inputs are included in each minterm; the OR array determines which minterms are included in each output
Fixed arrays trade off flexibility for density and power
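The AND-plane/OR-plane structure can be illustrated with a small behavioral model. The sketch below assumes a simple encoding chosen here for illustration (the names `eval_pla`, `AND_PLANE` and `OR_PLANE` are hypothetical): each product term lists the inputs connected to it and the polarity required, and each output lists the product terms connected to it.

```python
from typing import Dict, List, Tuple

# A product term maps an input index to the value it must have
# (1 = true literal, 0 = complemented literal). Inputs that are not
# listed are not connected to that term.
ProductTerm = Dict[int, int]


def eval_pla(and_plane: List[ProductTerm],
             or_plane: List[List[int]],
             inputs: Tuple[int, ...]) -> List[int]:
    # AND plane: a term is 1 when every connected literal matches.
    terms = [int(all(inputs[i] == v for i, v in term.items()))
             for term in and_plane]
    # OR plane: each output ORs the product terms connected to it.
    return [int(any(terms[t] for t in row)) for row in or_plane]


# Example programming (an assumption for illustration):
#   O0 = I0 AND I1  OR  NOT I2
#   O1 = I0 AND I1
AND_PLANE: List[ProductTerm] = [{0: 1, 1: 1}, {2: 0}]
OR_PLANE: List[List[int]] = [[0, 1], [0]]
```

In this encoding a PAL corresponds to keeping `OR_PLANE` fixed and editing only `AND_PLANE`, while a PROM corresponds to an `AND_PLANE` that decodes every input combination and an editable `OR_PLANE`.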

Programming a PROM
[Figure: programmed PROM; legend: programmed node]
A large fraction of the PROM is unused!
Complex logic functions lead to:
- low performance
- low programming density
And in general, there are no registers or flip-flops!
PLDs are less and less attractive

More Complex PAL
How can I implement sequential logic with PLDs?
- outputs can be fed back as a subset of the inputs
- programmable D, T, J-K or clocked S-R flip-flop
[Figure: i inputs, j minterms/macrocell, k macrocells. From Smith97]

Multi-level logic advantages
Reduced sum-of-products form:
x = ADF + AEF + BDF + BEF + CDF + CEF + G
- 6 x 3-input AND gates + 1 x 7-input OR gate (may not exist!)
- 25 wires (19 literals plus 6 internal wires)
Factored form:
x = (A + B + C)(D + E)F + G
- 1 x 3-input OR gate, 2 x 2-input OR gates, 1 x 3-input AND gate
- 10 wires (7 literals plus 3 internal wires)
[Figure: gate-level schematics of the two implementations]
Such optimizations are unsupported by PLAs
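That the two forms compute the same function can be checked exhaustively over all 2^7 input combinations. The following is a minimal sketch; the function names are chosen here for illustration.

```python
from itertools import product


def sop(a, b, c, d, e, f, g):
    # Reduced sum-of-products form: six 3-literal products plus G.
    return ((a and d and f) or (a and e and f) or (b and d and f) or
            (b and e and f) or (c and d and f) or (c and e and f) or g)


def factored(a, b, c, d, e, f, g):
    # Factored (multi-level) form.
    return ((a or b or c) and (d or e) and f) or g


def equivalent() -> bool:
    """Compare both forms on every assignment of the seven variables."""
    return all(bool(sop(*v)) == bool(factored(*v))
               for v in product((0, 1), repeat=7))


SOP_LITERALS = 6 * 3 + 1       # 19 literals
FACTORED_LITERALS = 3 + 2 + 1 + 1  # 7 literals: A, B, C, D, E, F, G
```

Calling `equivalent()` returns `True`, so the saving in literals and wires comes purely from restructuring the logic into more than two levels.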

Array-based Programmable Logic
+ REGULAR STRUCTURE
  accurate parasitic, area, power and speed estimates
+ SUITABLE FOR 2-LEVEL LOGIC
  e.g., functions with a large fan-in, or functions that map well onto 2-level logic (e.g., FSMs)
- HIGHER OVERHEAD
  capacitance of intermediate nodes negatively affects performance and power
  risk of underutilization, especially for PLAs (and waste of power)
The alternative is CELL-BASED PROGRAMMABLE LOGIC...

2-input mux as programmable logic block
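A mux used as a logic function generator
[Figure: 2-input multiplexer with data inputs A and B, select input S and output F, next to a configuration table with columns A, B, S and F]
By properly connecting inputs A, B and S to the variables X and Y (or to the constants 0 and 1), 10 different logic functions can be obtained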
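The set of functions a single multiplexer can generate is small enough to enumerate by brute force. The sketch below assumes that each of A, B and S may be tied to 0, 1, X or Y (complemented inputs are not available) and that S = 0 selects A; both are assumptions made here. Under them the enumeration finds 12 distinct truth tables, 10 of which are non-constant, which may or may not be the counting convention behind the figure of 10 quoted above.

```python
from itertools import product

SOURCES = ("0", "1", "X", "Y")  # what each mux pin may be tied to


def value(src: str, x: int, y: int) -> int:
    return {"0": 0, "1": 1, "X": x, "Y": y}[src]


def mux(a: int, b: int, s: int) -> int:
    # Assumed convention: S = 0 selects A, S = 1 selects B.
    return b if s else a


def reachable_functions() -> dict:
    """Map each reachable truth table (for XY = 00, 01, 10, 11)
    to one (A, B, S) wiring that produces it."""
    found = {}
    for a, b, s in product(SOURCES, repeat=3):
        table = tuple(
            mux(value(a, x, y), value(b, x, y), value(s, x, y))
            for x, y in product((0, 1), repeat=2)
        )
        found.setdefault(table, (a, b, s))
    return found
```

For example, the AND function (truth table `(0, 0, 0, 1)`) is reachable with A = 0, B = Y, S = X, whereas XOR (`(0, 1, 1, 0)`) is not reachable without an inverted input.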

Logic Cell of Actel Fuse-Based FPGA
More complex logic gates built from multiple muxes
Used in the Actel fuse-based FPGA
Implements any 2- or 3-input logic function, some 4-input logic functions, or a latch

Look-up Table Based Logic Cell
Example: EXOR inference
The look-up table stores the truth table of a logic function (a LUT with n inputs can implement any logic function of n inputs)
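A look-up table is essentially a 2^n-bit memory addressed by the logic inputs. The following minimal sketch programs a 2-input LUT as an EXOR; the helper names and the choice of input 0 as the least significant address bit are assumptions made here for illustration.

```python
from typing import Callable, List


def program_lut(n: int, func: Callable[..., int]) -> List[int]:
    """Fill the 2**n configuration bits with the truth table of func."""
    table = []
    for address in range(2 ** n):
        # Input i is bit i of the address (an ordering chosen for this sketch).
        bits = [(address >> i) & 1 for i in range(n)]
        table.append(func(*bits) & 1)
    return table


def read_lut(table: List[int], *inputs: int) -> int:
    # The logic inputs act as the address lines of a small memory.
    address = sum(bit << i for i, bit in enumerate(inputs))
    return table[address]


# A 2-input LUT programmed as EXOR.
XOR2 = program_lut(2, lambda a, b: a ^ b)
```

Changing the function only changes the stored bits, not the hardware: `program_lut(2, lambda a, b: a & b)` would turn the same structure into an AND gate.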

Extensions for sequential cells
[Figure: LUT-based logic cell with LUT, D flip-flop (D, Q, CLK) and select signal Sel]
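A behavioral sketch of such a cell follows. It assumes that the LUT output feeds the D input of the flip-flop and that Sel is a configuration bit choosing between the registered and the combinational output; the class `LogicCell` and the helper `toggle_sequence` are hypothetical names introduced here.

```python
from typing import List, Sequence


class LogicCell:
    """Hypothetical LUT-based cell whose output is optionally registered."""

    def __init__(self, table: Sequence[int], registered: bool) -> None:
        self.table = list(table)      # LUT contents, 2**n bits
        self.registered = registered  # the Sel configuration bit (assumed role)
        self.q = 0                    # flip-flop state

    def _lut(self, inputs: Sequence[int]) -> int:
        address = sum(bit << i for i, bit in enumerate(inputs))
        return self.table[address]

    def output(self, inputs: Sequence[int]) -> int:
        # Sel mux: flip-flop output Q or direct LUT output.
        return self.q if self.registered else self._lut(inputs)

    def clock(self, inputs: Sequence[int]) -> None:
        # Clock edge: the flip-flop captures the LUT output.
        self.q = self._lut(inputs)


def toggle_sequence(cycles: int) -> List[int]:
    """LUT programmed as an inverter, with Q fed back to the LUT input."""
    cell = LogicCell(table=[1, 0], registered=True)
    seen = []
    for _ in range(cycles):
        q = cell.output([0])  # registered mode: the input is ignored
        seen.append(q)
        cell.clock([q])       # feed Q back into the LUT
    return seen
```

With the feedback connection the cell behaves as a toggle flip-flop, which shows how adding a single register turns a purely combinational LUT into a building block for sequential logic.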

Sizing LUTs
Small LUTs increase the number of logic levels in an implementation and, hence, the circuit delay.
Large LUTs increase silicon area and cost, since some of their inputs go unused in the logic implementation.
Source: Altera white paper "FPGA Architecture"

LUT-Based Logic Cell
Complex cells are built by adding more LUTs, increasing the LUT size and inserting flip-flops and muxes
[Figure: CLB of the Xilinx 4000 series; multiplexers are controlled by bits of the configuration program. Courtesy Xilinx]

How to make interconnects programmable?

Array-Based Programmable Wiring
[Figure: cells with input/output pins, horizontal tracks and vertical tracks; legend: interconnect point, programmed interconnection]
Each interconnect point is a pass transistor controlled by a memory cell M (Flash or SRAM)
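The effect of programming interconnect points can be modeled as shorting a horizontal track to a vertical track wherever the memory cell holds a 1. The sketch below uses a small union-find to work out which tracks end up electrically connected; the function name and the track labelling are assumptions made here for illustration.

```python
from typing import Iterable, List, Set, Tuple

Track = Tuple[str, int]  # ("H", row index) or ("V", column index)


def connected_groups(n_rows: int, n_cols: int,
                     programmed: Iterable[Tuple[int, int]]) -> List[Set[Track]]:
    """Group tracks shorted together by programmed crosspoints.

    programmed: (row, col) pairs whose memory cell M holds a 1.
    """
    parent = {("H", r): ("H", r) for r in range(n_rows)}
    parent.update({("V", c): ("V", c) for c in range(n_cols)})

    def find(track: Track) -> Track:
        while parent[track] != track:
            parent[track] = parent[parent[track]]  # path halving
            track = parent[track]
        return track

    for r, c in programmed:
        # A conducting pass transistor merges the two tracks into one net.
        parent[find(("H", r))] = find(("V", c))

    groups = {}
    for track in parent:
        groups.setdefault(find(track), set()).add(track)
    # Tracks that touch no programmed crosspoint are left out.
    return [g for g in groups.values() if len(g) > 1]
```

A full crosspoint array with `n_rows` horizontal and `n_cols` vertical tracks needs `n_rows * n_cols` pass transistors and memory cells, which is one way to see why this style becomes costly as the array grows.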

Crossing points
Pass transistor
- large number of transistors and control signals
- high fan-out wires, which increase delay and power
Fuse/antifuse
- fuses: long programming times (usually only a few connections are needed)
- antifuses require less programming
- one-time programmable
Array-based wire programming has been successful only in the write-once class of FPGAs

Mesh-based Interconnect Network
Each logic cell output can be routed north, west, south or east
Connectivity is provided through RAM-programmable switch or connect matrices
[Figure: switch box, connect box, interconnect point. Courtesy Dehon and Wawrzynek]

Transistor Implementation of Mesh
The pass transistor induces a threshold-voltage drop, which limits performance
- remedies: level restorers, zero-Vth transistors, boosted control signals, ...
Inefficient for global interconnects
Courtesy Dehon and Wawrzynek

Hierarchical Mesh Network
Most mesh-based FPGA architectures offer alternative wiring resources that allow effective global wiring
- reduced fanout and reduced resistance
Courtesy Dehon and Wawrzynek

ALTERA EPLD Block Diagram
Nonvolatile FPGA
Logic cells are PLA elements (called Logic Array Blocks, LABs)
16 macrocells per LAB
[Figure: primary inputs, macrocell. Courtesy Altera]

Altera MAX From Smith97

Altera MAX Interconnect Architecture
[Figure labels: column channel, row channel, LAB1, LAB2, LAB6, PIA]
Array-based (MAX )
- simple, predictable
- does not scale well
Mesh-based (MAX 9000)
- wide channels (48 to 96 wires)
- beyond 560 macrocells
Courtesy Altera

Xilinx 4000 Interconnect Architecture
Combines the look-up table based approach with mesh-based interconnect
- low-delay inter-CLB connections
- can also be configured as an array of memory cells
- distributed over long distances
[Figure: CLB routing resources: single, double, quad and long lines, global clock, carry chain, direct connect. Courtesy Xilinx]

RAM-based FPGA
Xilinx XC4025
- horizontal and vertical routing channels are easily recognizable
- 1000 CLBs: 32x32 array
- 25000 equivalent gates
- 422 kbits of programming RAM
- CLB at 250 MHz
- multi-CLB adder: MHz
- 1 32-bit adder: 62 CLBs
Courtesy Xilinx

Heterogeneous Programmable Platforms
Centered around an FPGA
- FPGA fabric
- embedded memories
- embedded PowerPC
- hardwired multipliers
- high-speed I/O (3.125 Gbps transceivers)
Xilinx Virtex-II Pro
Courtesy Xilinx

Berkeley Pleiades Processor
Centered around an ARM8 core
- ARM8: system manager
- intensive computations offloaded to a reconfigurable datapath (adders, multipliers, ASIP, ...)
- FPGA for bit manipulation
[Figure: chip blocks: interface, reconfigurable datapath, FPGA, ARM8 core]
- 0.25 µm 6-level metal CMOS
- 5.2 mm x 6.7 mm
- 1.2 million transistors
- 40 MHz at 1 V
- 2 extra supplies: 0.4 V, 1.5 V
- 1.5~2 mW power dissipation
