Published by Clifton Gilmore. Modified over 9 years ago.
1
COE 405 Programmable Logic and Storage Devices
Dr. Aiman H. El-Maleh
Computer Engineering Department
King Fahd University of Petroleum & Minerals
2
1-2 Outline
- History of Computational Fabrics
- ASIC vs. FPGA
- Reconfigurable Logic
- Anti-Fuse-Based Approach (Actel)
- RAM Based Field Programmable Logic (Xilinx)
- CLBs
- Carry & Control Logic
- FPGA Memory Implementation
3
1-3 History of Computational Fabrics
- Discrete devices: relays, transistors (1940s-50s)
- Discrete logic gates (1950s-60s)
- Integrated circuits (1960s-70s)
  - e.g. TTL packages: Data Book for 100s of different parts
- Gate Arrays (IBM, 1970s)
  - Transistors are pre-placed on the chip, and place-and-route software puts the chip together automatically
  - Only the interconnect is programmed (mask programming)
- Software-Based Schemes (1970s-present)
  - Run instructions on a general-purpose core
4
1-4 History of Computational Fabrics
- ASIC Design (1980s to present)
  - Turn Verilog directly into layout using a library of standard cells
  - Effective for high-volume production and efficient use of silicon area
- Programmable Logic (1980s to present)
  - A chip that can be reprogrammed after it has been fabricated
  - Examples: PALs, PLAs, EPROM, EEPROM, PLDs, FPGAs
  - Excellent support for mapping from Verilog
5
1-5 What is an FPGA?
- A field-programmable gate array (FPGA) is a reprogrammable silicon chip.
- Using prebuilt logic blocks and programmable routing resources, you can configure these chips to implement custom hardware functionality without ever having to pick up a breadboard or soldering iron.
- You develop digital computing tasks in software and compile them down to a configuration file, or bitstream, that contains information on how the components should be wired together.
6
1-6 ASIC vs. FPGA
ASIC (Application Specific Integrated Circuit)
- Designed all the way from behavioral description to physical layout
- Designs must be sent for expensive and time-consuming fabrication in a semiconductor foundry
FPGA (Field Programmable Gate Array)
- No physical layout design; design ends with a bitstream used to configure a device
- Bought off the shelf and reconfigured by designers themselves
7
1-7 ASIC vs. FPGA
ASICs
- High performance
- Low power
- Low cost in high volumes
FPGAs
- Off-the-shelf
- Low development cost
- Short time to market
- Reconfigurability
8
1-8 Other FPGA Advantages
- The manufacturing cycle for an ASIC is very costly and lengthy, and engages lots of manpower
  - Mistakes not detected at design time have a large impact on development time and cost
  - FPGAs are perfect for rapid prototyping of digital circuits
- Easy upgrades, as with software
- FPGAs provide a flexible platform for implementing digital computing
- A rich set of macros and I/Os is supported (multipliers, block RAMs, ROMs, high-speed I/O)
- A wide range of applications, from prototyping (to validate a design before ASIC mapping) to high-performance spatial computing
9
1-9 How are FPGAs Used?
- Prototyping
  - Ensemble of gate arrays used to emulate a circuit to be manufactured
  - Get more/better/faster debugging done than with simulation
- Reconfigurable hardware
  - One hardware block used to implement more than one function
- Special-purpose computation engines
  - Hardware dedicated to solving one problem (or class of problems)
  - Accelerators attached to general-purpose computers (e.g., in a cell phone!)
10
1-10 Major FPGA Vendors
SRAM-based FPGAs
- Xilinx, Inc.
- Altera Corp.
- Atmel
- Lattice Semiconductor
Flash & antifuse FPGAs
- Actel Corp.
- QuickLogic Corp.
Share over 60% of the market
11
1-11 Reconfigurable Logic
12
1-12 Anti-Fuse-Based Approach (Actel)
13
1-13 Actel Logic Module Combinational Block Example Gate Mapping S-R Latch
14
1-14 Actel Routing & Programming
15
1-15 RAM Based Field Programmable Logic - Xilinx
16
1-16 Xilinx FPGA Families
- Old families
  - XC3000, XC4000, XC5200
  - Old 0.5µm, 0.35µm and 0.25µm technology. Not recommended for modern designs.
- High-performance families
  - Virtex (0.22µm)
  - Virtex-E, Virtex-EM (0.18µm)
  - Virtex-II, Virtex-II PRO (0.13µm)
  - Virtex-4 (0.09µm)
- Low-cost family
  - Spartan/XL: derived from XC4000
  - Spartan-II: derived from Virtex
  - Spartan-IIE: derived from Virtex-E
  - Spartan-3
17
1-17 FPGA Nomenclature
18
1-18 Device Part Marking
19
1-19 The Xilinx 4000 CLB
20
1-20 Two 4-input Functions, Registered Output and a Two Input Function
21
1-21 5-input Function, Combinational Output
22
1-22 5-Input Functions implemented using two LUTs
23
1-23 LUT Mapping
- An N-LUT is a direct implementation of a truth table: it can realize any function of N inputs
- An N-LUT requires 2^N storage elements (latches)
- The N inputs select one latch location (like a memory address)
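The truth-table view of a LUT can be sketched as a small software model. This is an illustrative Python sketch, not HDL; the helper names `make_lut` and `lut_read` are hypothetical and the input-to-address bit ordering is an assumption.

```python
def make_lut(func, n):
    """Build the 2**n-entry truth table of an n-input Boolean function."""
    table = []
    for address in range(2 ** n):
        # Bit i of the address is the value of input i (assumed ordering).
        inputs = [(address >> i) & 1 for i in range(n)]
        table.append(func(*inputs) & 1)
    return table


def lut_read(table, inputs):
    """The inputs form an address that selects one stored bit."""
    address = sum(bit << i for i, bit in enumerate(inputs))
    return table[address]


# A 4-input LUT programmed as (a AND b) XOR (c OR d).
table = make_lut(lambda a, b, c, d: (a & b) ^ (c | d), 4)
print(len(table))                     # 16 storage bits for N = 4
print(lut_read(table, [1, 1, 0, 0]))  # 1
```

Changing the function only changes the 16 stored bits; the lookup itself stays the same, which is why one LUT can implement any 4-input function.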
24
1-24 Configuring the CLB as a RAM
25
1-25 Xilinx 4000 Interconnect
26
1-26 Xilinx 4000 Interconnect Details
27
1-27 Xilinx 4000 Flexible IOB
28
1-28 Basic I/O Block Structure
29
1-29 IOB Functionality
- IOB provides the interface between the package pins and CLBs
- Each IOB can work as uni- or bi-directional I/O
- Outputs can be forced into high impedance
- Inputs and outputs can be registered (advised for high-performance I/O)
- Inputs can be delayed
30
1-30 Additional Features in Modern FPGAs
31
1-31 Spartan-3 Xilinx FPGA Block Diagram
32
1-32 CLB Structure
33
1-33 CLB Slice Structure
- Each slice contains two sets of the following:
  - Four-input LUT
    - Any 4-input logic function,
    - or 16-bit x 1 sync RAM,
    - or 16-bit shift register
  - Carry & Control
    - Fast arithmetic logic
    - Multiplier logic
    - Multiplexer logic
  - Storage element
    - Latch or flip-flop
    - Set and reset
    - True or inverted inputs
    - Sync. or async. control
34
1-34 Xilinx Multipurpose LUT (MLUT) 16 x 1 ROM (logic)
35
1-35 5-Input Functions implemented using two LUTs
- One CLB slice can implement any function of 5 inputs
- Logic function is partitioned between two LUTs
- F5 multiplexer selects LUT
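The partitioning can be pictured as fixing the fifth input at 0 for one LUT and at 1 for the other, then letting that input drive the multiplexer. The Python sketch below models this under that assumption; the function names are hypothetical and the majority function is only an example.

```python
def lut4(table, a, b, c, d):
    """Read one bit from a 16-entry table addressed by four inputs."""
    return table[a | (b << 1) | (c << 2) | (d << 3)]


def split_function(func):
    """Partition a 5-input function into two 4-input tables.

    The first table holds the function with the fifth input fixed at 0,
    the second holds it with the fifth input fixed at 1.
    """
    def table_for(e):
        return [func(i & 1, (i >> 1) & 1, (i >> 2) & 1, (i >> 3) & 1, e) & 1
                for i in range(16)]
    return table_for(0), table_for(1)


def eval5(lut_e0, lut_e1, a, b, c, d, e):
    """Model of the F5 multiplexer: input e selects which LUT drives the output."""
    return lut4(lut_e1, a, b, c, d) if e else lut4(lut_e0, a, b, c, d)


# Example: 5-input majority (output is 1 when at least three inputs are 1).
def majority(a, b, c, d, e):
    return int(a + b + c + d + e >= 3)


lut_e0, lut_e1 = split_function(majority)
print(eval5(lut_e0, lut_e1, 1, 1, 0, 0, 1))  # 1
```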
36
1-36 Distributed RAM
- CLB LUT configurable as distributed RAM
  - A LUT equals a 16x1 RAM
  - Implements single-port and dual-port RAMs
  - Cascade LUTs to increase RAM size
- Synchronous write
- Synchronous/asynchronous read
  - Accompanying flip-flops are used for synchronous read
- Two LUTs can make
  - 32 x 1 single-port RAM
  - 16 x 2 single-port RAM
  - 16 x 1 dual-port RAM
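A behavioral sketch of the synchronous-write, asynchronous-read behavior, and of cascading two LUTs into a 32 x 1 RAM, is given below. It is a Python model with hypothetical class names; using address bit 4 to choose the LUT is an assumption made for illustration.

```python
class LutRam16x1:
    """One LUT used as a 16 x 1 RAM: clocked write, combinational read."""

    def __init__(self):
        self.bits = [0] * 16

    def read(self, addr):
        # Asynchronous read: the output follows the address with no clock.
        return self.bits[addr & 0xF]

    def clock(self, addr, din, we):
        # Synchronous write: a cell changes only on a clock edge with WE high.
        if we:
            self.bits[addr & 0xF] = din & 1


class Ram32x1:
    """Two cascaded LUT RAMs; address bit 4 selects the LUT (assumed)."""

    def __init__(self):
        self.banks = [LutRam16x1(), LutRam16x1()]

    def read(self, addr):
        return self.banks[(addr >> 4) & 1].read(addr)

    def clock(self, addr, din, we):
        self.banks[(addr >> 4) & 1].clock(addr, din, we)


ram = Ram32x1()
ram.clock(addr=21, din=1, we=1)
print(ram.read(21), ram.read(5))  # 1 0
```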
37
1-37 Shift Register
- Each LUT can be configured as a shift register
  - Serial in, serial out
- Dynamically addressable delay of up to 16 cycles
- For programmable pipelines
- Cascade for greater cycle delays
- Use CLB flip-flops to add depth
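The dynamically addressable delay can be modeled as a 16-stage shift register whose output tap is chosen by an address. The following Python sketch assumes that address 0 selects the most recently shifted-in bit (a one-cycle delay); the class name is hypothetical.

```python
class ShiftRegisterLut:
    """16-stage serial-in, serial-out shift register with an addressable tap."""

    def __init__(self):
        self.stages = [0] * 16  # stages[0] holds the most recent input

    def clock(self, din):
        # Every stored bit moves one position deeper on each clock edge.
        self.stages = [din & 1] + self.stages[:-1]

    def tap(self, addr):
        # Assumed mapping: addr 0 is a 1-cycle delay, addr 15 a 16-cycle delay.
        return self.stages[addr & 0xF]


srl = ShiftRegisterLut()
for bit in [1, 0, 1, 1, 0]:
    srl.clock(bit)
print(srl.tap(0), srl.tap(4))  # 0 1
```

Changing the tap address changes the delay without reloading the register, which is what makes the pipeline depth programmable.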
38
1-38 Shift Register
- Register-rich FPGA
  - Allows for addition of pipeline stages to increase throughput
- Data paths must be balanced to keep desired functionality
39
1-39 Carry & Control Logic
40
1-40 Fast Carry Logic
- Each CLB contains separate logic and routing for the fast generation of sum & carry signals
  - Increases efficiency and performance of adders, subtractors, accumulators, comparators, and counters
- Carry logic is independent of normal logic and routing resources
- All major synthesis tools can infer carry logic for arithmetic functions
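One common way to organize such a chain, assumed here for illustration, is for the LUT to compute a per-bit propagate signal while a dedicated XOR produces the sum bit and a dedicated multiplexer passes or regenerates the carry. The Python sketch below models that arrangement bit by bit; it is not taken from the slides.

```python
def carry_chain_add(a, b, width, carry_in=0):
    """Add two unsigned numbers bit by bit using a propagate/mux carry chain."""
    carry = carry_in
    total = 0
    for i in range(width):
        a_i = (a >> i) & 1
        b_i = (b >> i) & 1
        propagate = a_i ^ b_i               # computed in the LUT
        total |= (propagate ^ carry) << i   # dedicated XOR forms the sum bit
        # Dedicated mux: pass the incoming carry when propagating,
        # otherwise a_i == b_i and the carry out equals a_i.
        carry = carry if propagate else a_i
    return total, carry


total, carry_out = carry_chain_add(0b1011, 0b0110, width=4)
print(total, carry_out)  # 1 1
```

Here 11 + 6 = 17, which is the 4-bit sum 1 with a carry out of 1.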
41
1-41 The Virtex II CLB (Half Slice Shown)
42
1-42 Adder Implementation
43
1-43 Carry Chain
44
1-44 New 18 x 18 Embedded Multiplier
- Embedded 18-bit x 18-bit multiplier
  - 2's complement signed operation
- Multipliers are organized in columns
- Fast arithmetic functions
  - Optimized to implement multiply/accumulate modules
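The signed 18 x 18 operation and a multiply/accumulate built on it can be sketched numerically. This Python model assumes a 36-bit product and an unbounded accumulator; the function names are hypothetical.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value


def multiply_18x18(a, b):
    """Signed 18 x 18 multiply; returns the 36-bit two's complement product."""
    product = to_signed(a, 18) * to_signed(b, 18)
    return product & ((1 << 36) - 1)


def multiply_accumulate(pairs):
    """Sum of signed products, as a multiply/accumulate module would compute."""
    acc = 0
    for a, b in pairs:
        acc += to_signed(multiply_18x18(a, b), 36)
    return acc


minus_one = 0x3FFFF  # all 18 bits set is -1 in two's complement
print(to_signed(multiply_18x18(minus_one, 5), 36))    # -5
print(multiply_accumulate([(3, 4), (minus_one, 2)]))  # 10
```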
45
1-45 Design Flow - Mapping
- Technology mapping: schematic/HDL to physical logic units
- Compile functions into basic LUT-based groups (function of target architecture)
46
1-46 Design Flow – Placement & Route
- Placement: assign each piece of logic a location on a particular device
- Routing: iterative process to connect CLB inputs/outputs and IOBs
  - Optimizes critical path delay; can take hours or days for large, dense designs
- Challenge! Cannot use the full chip at reasonable speeds (wires are not ideal). Typically no more than 50% utilization.
47
1-47 Example: Verilog to FPGA
48
1-48 Memory Types
49
1-49 FPGA Memory Implementation
- Regular registers in logic blocks
  - Piggy use of resources, but convenient and fast if small
- [Xilinx Virtex-II] Use the LUTs:
  - Single port: 16x(1,2,4,8), 32x(1,2,4,8), 64x(1,2), 128x1
  - Dual port (1 R/W, 1 R): 16x1, 32x1, 64x1
  - Can fake extra read ports by cloning memory: all clones are written with the same addr/data, but each clone can have a different read address
- [Xilinx Virtex-II] Use block RAM:
  - 18K bits: 16Kx1, 8Kx2, 4Kx4
  - With parity: 2Kx(8+1), 1Kx(16+2), 512x(32+4)
  - Single or dual port
  - Pipelined (clocked) operations
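The cloning trick for extra read ports can be sketched as follows. This is a minimal Python model with a hypothetical class name: every write goes to all copies, and each read port reads its own copy at its own address.

```python
class ClonedMemory:
    """One write port and several read ports built from identical copies."""

    def __init__(self, depth, read_ports):
        self.clones = [[0] * depth for _ in range(read_ports)]

    def write(self, addr, data):
        # All clones receive the same address and data on every write.
        for clone in self.clones:
            clone[addr] = data

    def read(self, port, addr):
        # Each read port has a private clone, so read addresses are independent.
        return self.clones[port][addr]


mem = ClonedMemory(depth=16, read_ports=3)
mem.write(2, 7)
mem.write(9, 5)
print(mem.read(0, 2), mem.read(1, 9), mem.read(2, 2))  # 7 5 7
```

The cost is storage: three read ports use three times the memory bits of a single copy.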
50
1-50 LUT-Based RAMS
51
1-51 LUT-Based RAMS
52
1-52 LUT-Based RAM Modules
53
1-53 LUT-Based RAM Modules
    // instantiate a LUT-based RAM module
    RAM16X1S mymem (.D(din), .O(dout), .WE(we), .WCLK(clock_27mhz),
                    .A0(a[0]), .A1(a[1]), .A2(a[2]), .A3(a[3]));
    // msb first; the slide's literal has 20 digits, so only the low 16 bits
    // (16'hF35C) are kept
    defparam mymem.INIT = 16'b01101111001101011100;
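How the INIT string turns into initial memory contents can be checked with a short calculation. The Python sketch below assumes that INIT bit i is the initial content of address i and applies Verilog's rule that digits beyond the declared 16-bit width are dropped from the high end; the helper names are hypothetical.

```python
INIT_DIGITS = "01101111001101011100"  # digits as written on the slide


def init_value(digits, width=16):
    """Apply Verilog sizing: extra high-order digits are discarded."""
    return int(digits, 2) & ((1 << width) - 1)


def initial_bit(init, addr):
    """Assumed mapping: INIT bit i is the initial content of address i."""
    return (init >> addr) & 1


init = init_value(INIT_DIGITS)
print(hex(init))                                 # 0xf35c
print([initial_bit(init, a) for a in range(4)])  # [0, 0, 1, 1]
```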
54
1-54 Example of Inferred Memory
55
1-55 Block RAM
- Most efficient memory implementation
  - Dedicated blocks of memory
- Ideal for most memory requirements
  - 4 to 104 memory blocks
  - 18 kbits = 18,432 bits per block (16 kbits without parity bits)
  - Use multiple blocks for larger memories
- Builds both single-port and true dual-port RAMs
- Synchronous write and read (different from distributed RAM)
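The 18,432-bit figure can be reconciled with the aspect ratios listed on the earlier FPGA Memory Implementation slide. The Python sketch below just redoes that arithmetic; the tuple layout and function name are choices made for this example.

```python
# (depth, data bits per word, parity bits per word), from the aspect ratios
# 16Kx1, 8Kx2, 4Kx4, 2Kx(8+1), 1Kx(16+2), 512x(32+4).
CONFIGS = [
    (16384, 1, 0),
    (8192, 2, 0),
    (4096, 4, 0),
    (2048, 8, 1),
    (1024, 16, 2),
    (512, 32, 4),
]


def capacity(depth, data_bits, parity_bits):
    """Return (data bits, parity bits) stored by one configuration."""
    return depth * data_bits, depth * parity_bits


for depth, data_bits, parity_bits in CONFIGS:
    data, parity = capacity(depth, data_bits, parity_bits)
    print(f"{depth} x {data_bits}+{parity_bits}: {data} data + {parity} parity bits")
```

Every configuration stores 16,384 data bits; the three parity-capable widths add 2,048 parity bits, giving 18,432 in total.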
56
1-56 Block RAM
- Support of two independent 9 Kb blocks, or a single 18 Kb block RAM.
- Each 9 Kb block RAM can be set to simple dual-port mode, doubling the data width of the block RAM to a maximum of 36 bits.
- Simple dual-port mode is defined as having one read-only port and one write-only port with independent clocks.
- 18-bit or 36-bit wide ports can have an individual write enable per byte. This feature is popular for interfacing to an on-chip microprocessor.
- All inputs are registered with the port clock and have a setup-to-clock timing specification.
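The per-byte write enable can be illustrated with a small model of one write to a 32-bit data word (parity bits ignored). This is a Python sketch; the function name and the convention that enable 0 controls the least significant byte are assumptions.

```python
def write_with_byte_enables(old_word, new_word, byte_enables):
    """Replace only the enabled byte lanes of old_word with bytes of new_word."""
    result = old_word
    for lane, enabled in enumerate(byte_enables):
        if enabled:
            mask = 0xFF << (8 * lane)
            result = (result & ~mask) | (new_word & mask)
    return result


# Enable lanes 0 and 3 only: the two middle bytes keep their old contents.
word = write_with_byte_enables(0x11223344, 0xAABBCCDD, [1, 0, 0, 1])
print(hex(word))  # 0xaa2233dd
```

This is what lets a processor store a single byte without a read-modify-write of the whole word.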
57
1-57 Block RAM
- A write operation requires one clock edge.
- A read operation requires one clock edge.
- All output ports are latched. The state of the output port does not change until the port executes another read or write operation. The default block RAM output is latch mode.
- The output data path has an optional internal pipeline register. Using the register mode is strongly recommended: it allows a higher clock rate, but adds one clock cycle of latency.
58
1-58 Block RAM
59
1-59 Block RAM Logic Diagram
60
1-60 Block RAM Data Combinations and ADDR Locations
61
1-61 Block RAM Port Aspect Ratios
62
1-62 Dual-Port Bus Flexibility
- Each port can be configured with a different data bus width
- Provides easy data width conversion without any additional logic
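Width conversion through two differently sized ports can be sketched by treating the memory as one flat array of bits. In this Python model a port of width w at address a covers bits a*w to a*w + w - 1, so the lower narrow address maps to the low-order part of a wide word; that mapping and the class name are assumptions for illustration.

```python
class SharedBits:
    """One bit array accessed through ports of different widths."""

    def __init__(self, total_bits):
        self.bits = [0] * total_bits

    def write(self, width, addr, value):
        for i in range(width):
            self.bits[addr * width + i] = (value >> i) & 1

    def read(self, width, addr):
        return sum(self.bits[addr * width + i] << i for i in range(width))


ram = SharedBits(64)
ram.write(16, 0, 0xBEEF)                        # 16-bit port writes one word
print(hex(ram.read(8, 0)), hex(ram.read(8, 1)))  # 0xef 0xbe
```

The 8-bit port reads the same storage as two consecutive bytes, so no extra logic is needed to split the word.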
63
1-63 Simple Dual-Port Mode Allowed Combinations for 9 Kb Block RAM
64
1-64 True Dual-Port Mode Allowed Combinations for 9 Kb Block RAM
65
1-65 18 Kb Block RAM—True Dual-Port Operation
66
1-66 Read & Write Operations
- Read operation
  - In latch mode, the read operation uses one clock edge.
  - The read address is registered on the read port, and the stored data is loaded into the output latches after the RAM access time.
  - When using the output register, the read operation takes one extra latency cycle to arrive at the output.
- Write operation
  - A write operation is a single clock-edge operation.
  - The write address is registered on the write port, and the data input is stored in memory.
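The difference in read latency between latch mode and register mode can be modeled cycle by cycle. The Python sketch below is a simplification: the class name is hypothetical, and the pipeline register is assumed to update on every clock edge with no enable of its own.

```python
class ReadPort:
    """Block RAM read port with output latches and an optional register."""

    def __init__(self, contents, use_output_register=False):
        self.mem = list(contents)
        self.use_output_register = use_output_register
        self.latch = 0     # output latches
        self.register = 0  # optional pipeline register

    def clock(self, addr, enable=True):
        # The pipeline register captures what the latches held before this edge.
        self.register = self.latch
        if enable:
            self.latch = self.mem[addr]

    @property
    def dout(self):
        return self.register if self.use_output_register else self.latch


latch_port = ReadPort([10, 20, 30])
latch_port.clock(1)
print(latch_port.dout)  # 20

reg_port = ReadPort([10, 20, 30], use_output_register=True)
reg_port.clock(1)
first = reg_port.dout
reg_port.clock(2)
print(first, reg_port.dout)  # 0 20
```

In register mode the data for address 1 appears one clock edge later than in latch mode.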
67
1-67 Write Modes
- Three settings of the write mode determine the behavior of the data available on the output latches after a write clock edge: WRITE_FIRST, READ_FIRST, and NO_CHANGE.
- The write mode attribute can be individually selected for each port. The default mode is WRITE_FIRST.
- WRITE_FIRST outputs the newly written data onto the output bus.
- READ_FIRST outputs the previously stored data while new data is being written.
- NO_CHANGE maintains the output previously generated by a read operation.
68
1-68 WRITE_FIRST or Transparent Mode (Default)
- In WRITE_FIRST mode, the input data is simultaneously written into memory and stored in the data output (transparent write).
69
1-69 READ_FIRST or Read-Before-Write Mode
- In READ_FIRST mode, data previously stored at the write address appears on the output latches, while the input data is being stored in memory (read before write).
70
1-70 NO_CHANGE Mode
- In NO_CHANGE mode, the output latches remain unchanged during a write operation.
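The three write modes differ only in what the output latches hold after the write edge. A compact way to compare them is the Python sketch below; the function name and argument layout are hypothetical.

```python
def write_edge(mem, output_latch, addr, data, mode):
    """Perform one write and return the new value of the output latches."""
    previous = mem[addr]
    mem[addr] = data  # the memory is updated in every mode
    if mode == "WRITE_FIRST":
        return data          # transparent write: new data appears on the output
    if mode == "READ_FIRST":
        return previous      # read before write: old contents appear
    if mode == "NO_CHANGE":
        return output_latch  # output keeps the result of the last read
    raise ValueError(f"unknown write mode: {mode}")


for mode in ("WRITE_FIRST", "READ_FIRST", "NO_CHANGE"):
    mem = [0xAA, 0xBB]
    print(mode, hex(write_edge(mem, 0x11, addr=1, data=0xCC, mode=mode)))
# WRITE_FIRST 0xcc
# READ_FIRST 0xbb
# NO_CHANGE 0x11
```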
71
1-71 Conflict Avoidance
- Block RAM is a true dual-port RAM where both ports can access any memory location at any time.
- When accessing the same memory location from both ports, the user must, however, observe certain restrictions.
- There are no timing restrictions when both ports perform a read operation.
- When one port performs a write operation, the other port must not perform a read or write access to the exact same memory location.
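The restriction can be written as a simple check that a testbench or simulation model might apply on every cycle. This Python sketch uses a hypothetical function name and represents each port's access as an (address, is_write) pair.

```python
def same_cycle_conflict(access_a, access_b):
    """Return True when two simultaneous accesses break the dual-port rule.

    Each access is a tuple (address, is_write).
    """
    addr_a, write_a = access_a
    addr_b, write_b = access_b
    # Two reads never conflict; a write conflicts with any access
    # to the same address on the other port.
    return addr_a == addr_b and bool(write_a or write_b)


print(same_cycle_conflict((5, False), (5, False)))  # False
print(same_cycle_conflict((5, True), (5, False)))   # True
print(same_cycle_conflict((5, True), (6, True)))    # False
```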
72
1-72 Spartan-3 Block RAM Amounts
73
1-73 Spartan-3 FPGA Family Members
74
1-74 Virtex-II 1.5V Architecture
75
1-75 Virtex-II 1.5V
Device    CLB Array  Slices  Maximum I/O  BlockRAM (18 kb)  Multiplier Blocks  Distributed RAM bits
XC2V40    8x8        256     88           4                 4                  8,192
XC2V80    16x8       512     120          8                 8                  16,384
XC2V250   24x16      1,536   200          24                                   49,152
XC2V500   32x24      3,072   264          32                                   98,304
XC2V1000  40x32      5,120   432          40                                   163,840
XC2V1500  48x40      7,680   528          48                                   245,760
XC2V2000  56x48      10,752  624          56                                   344,064
XC2V3000  64x56      14,336  720          96                                   458,752
XC2V4000  80x72      23,040  912          120                                  737,280
XC2V6000  96x88      33,792  1,104        144                                  1,081,344
XC2V8000  112x104    46,592  1,108        168                                  1,490,944
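The slice and distributed RAM columns follow from the CLB array size. The Python sketch below checks this for a few rows, assuming four slices per CLB (an assumption consistent with every row of the table) and using the earlier slides' figures of two LUTs per slice and 16 bits per LUT.

```python
SLICES_PER_CLB = 4       # assumption; consistent with every row of the table
BITS_PER_SLICE = 2 * 16  # two 4-input LUTs per slice, 16 bits per LUT

# (device, CLB rows, CLB columns, slices, distributed RAM bits) from the table
ROWS = [
    ("XC2V40", 8, 8, 256, 8192),
    ("XC2V1000", 40, 32, 5120, 163840),
    ("XC2V8000", 112, 104, 46592, 1490944),
]


def derive(clb_rows, clb_cols):
    """Compute (slices, distributed RAM bits) from the CLB array dimensions."""
    slices = clb_rows * clb_cols * SLICES_PER_CLB
    return slices, slices * BITS_PER_SLICE


for device, clb_rows, clb_cols, slices, ram_bits in ROWS:
    print(device, derive(clb_rows, clb_cols) == (slices, ram_bits))
# XC2V40 True
# XC2V1000 True
# XC2V8000 True
```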
76
1-76 Using Core Generator
77
1-77 Single Port BRAM
78
1-78 Single Port BRAM
79
1-79 Single Port BRAM
80
1-80 Single Port BRAM
81
1-81 Dual Port BRAM
82
1-82 Dual Port BRAM
83
1-83 Dual Port BRAM
84
1-84 Distributed RAM
85
1-85 Distributed RAM
86
1-86 Distributed RAM