Digital Systems Design Overview ENGIN 341 – Advanced Digital Design University of Massachusetts Boston Department of Engineering Dr. Filip Cuckov
Overview
1. Introduction to Programmable Logic Devices
2. Field Programmable Gate Arrays (FPGAs)
3. Implementing Functions on FPGAs
4. The Zynq System on Chip (SoC)
5. FPGA Design Flow
1. Introduction to Programmable Logic Devices
Programmable Devices Comparison: SPLD, PLA, PAL, GAL, PROM, EPROM, EEPROM, CPLD, FPGA
Implementing Functions Using ROMs In General
Programmable Logic Arrays
PLA example:
F0 = A’B’ + AC’
F1 = B + AC’
F2 = A’B’ + BC’
F3 = AC + B
Figures: PLA Example Implementation; PLA Architecture
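The PLA example can be sketched in Python to show how an AND plane of shared product terms feeds an OR plane. This is only an illustrative model: the names `PRODUCT_TERMS`, `OR_PLANE`, and `pla` are made up here and do not come from the slides.

```python
from itertools import product

# AND plane: the distinct product terms of the example, each mapping (A, B, C) -> 0/1.
PRODUCT_TERMS = {
    "A'B'": lambda a, b, c: (1 - a) & (1 - b),
    "AC'":  lambda a, b, c: a & (1 - c),
    "B":    lambda a, b, c: b,
    "BC'":  lambda a, b, c: b & (1 - c),
    "AC":   lambda a, b, c: a & c,
}

# OR plane: which product terms are connected to each output.
OR_PLANE = {
    "F0": ["A'B'", "AC'"],
    "F1": ["B", "AC'"],
    "F2": ["A'B'", "BC'"],
    "F3": ["AC", "B"],
}

def pla(a, b, c):
    """Evaluate every product term once, then OR the selected terms per output."""
    terms = {name: f(a, b, c) for name, f in PRODUCT_TERMS.items()}
    return {out: int(any(terms[t] for t in names)) for out, names in OR_PLANE.items()}

# Full truth table of the four outputs over the eight input combinations.
TRUTH_TABLE = {abc: pla(*abc) for abc in product((0, 1), repeat=3)}
```

Because A’B’, AC’ and B each feed two outputs, the four functions need only five product terms in this model rather than eight.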
GAL 22V10 Output Logic Macrocell Detail
CPLD Example – Xilinx CoolRunner XCR3064XL
2. Field Programmable Gate Arrays
Field Programmable Logic Block Arrays (FPLBA ???)
Name due to early PLD-type block usage in Altera devices. Modern FPGAs mostly use LUTs.
First FPGA: Xilinx XC2000 ca
Figures: FPGA Market Share (2013 data); FPGAs vs CPLDs
FPGA Architectures
FPGA Programmable Logic Block* Models
Usually composed of Look-Up Tables (LUTs)
Actel: MUX based
Figures: LUT-Based Logic Block; MUX-Based Logic Block
*As in: basic building blocks
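As a rough illustration of the two logic block models, a LUT can be treated as a small memory addressed by its inputs, and a MUX-based block as a selector whose data inputs are tied to constants or signals. The helper names below (`make_lut`, `lut_read`, `mux2`, `and2_mux`) are hypothetical and chosen only for this sketch.

```python
def make_lut(func, n_inputs):
    """Precompute the 2**n_inputs configuration bits that make a LUT behave like func."""
    table = []
    for index in range(2 ** n_inputs):
        bits = [(index >> i) & 1 for i in range(n_inputs)]  # bits[0] is input 0
        table.append(func(*bits) & 1)
    return table

def lut_read(table, *inputs):
    """Evaluate the LUT: the inputs form an address into the stored bits."""
    index = sum(bit << i for i, bit in enumerate(inputs))
    return table[index]

def mux2(sel, d0, d1):
    """2:1 multiplexer: returns d1 when sel is 1, otherwise d0."""
    return d1 if sel else d0

# LUT-based block configured as a 3-input XOR.
XOR3 = make_lut(lambda x, y, z: x ^ y ^ z, 3)

def and2_mux(a, b):
    """MUX-based block configured as a AND b: select b when a = 1, constant 0 otherwise."""
    return mux2(a, 0, b)
```

The same `make_lut` call works for any function of the given number of inputs, which is why LUT size rather than function complexity determines what fits in one block.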
Actual FPGA Logic Block Variants
Figure: simplified view of sample Xilinx (slice) and Altera (logic element) logic blocks
FPGA Programmable Interconnect
FPGA Programmable I/O Blocks
3. Implementing Functions on FPGAs (Book Chapter 6)
Boole’s Expansion Theorem / Shannon’s Expansion or Decomposition
F(a,b,c,d,e) = a’ ∙ F(0,b,c,d,e) + a ∙ F(1,b,c,d,e) = a’ ∙ F0 + a ∙ F1
Example: F(a,b,c,d,e) = abc’e + a’b’cd + cde’ + a’b
F0 = 0 ∙ bc’e + 1 ∙ b’cd + cde’ + 1 ∙ b = b’cd + cde’ + b
F1 = 1 ∙ bc’e + 0 ∙ b’cd + cde’ + 0 ∙ b = bc’e + cde’
Claude Elwood Shannon (1916–2001) vs George Boole ( )
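The expansion of the example function can be checked exhaustively. The following sketch, with function names chosen here for illustration, compares the original F against a’ ∙ F0 + a ∙ F1 for all 32 input combinations.

```python
from itertools import product

def F(a, b, c, d, e):
    """Original function: F = abc'e + a'b'cd + cde' + a'b."""
    na, nb, nc, ne = 1 - a, 1 - b, 1 - c, 1 - e
    return (a & b & nc & e) | (na & nb & c & d) | (c & d & ne) | (na & b)

def F0(b, c, d, e):
    """Cofactor for a = 0: F0 = b'cd + cde' + b."""
    return ((1 - b) & c & d) | (c & d & (1 - e)) | b

def F1(b, c, d, e):
    """Cofactor for a = 1: F1 = bc'e + cde'."""
    return (b & (1 - c) & e) | (c & d & (1 - e))

def expanded(a, b, c, d, e):
    """Shannon expansion about a: a'.F0 + a.F1."""
    return ((1 - a) & F0(b, c, d, e)) | (a & F1(b, c, d, e))

# Input vectors where the expansion disagrees with the original (expected to be empty).
MISMATCHES = [v for v in product((0, 1), repeat=5) if F(*v) != expanded(*v)]
```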
Expansion/Decomposition Example
Implement F using 4-input LUTs (LUT4)
F0 = b’cd + cde’ + b | F1 = bc’e + cde’ | F = a’ ∙ F0 + a ∙ F1
Truth tables: original function F(a,b,c,d,e); F0(b,c,d,e); F1(b,c,d,e); F(a,F1,F0,*)
* is a LUT input we don’t care about
Diagram: one LUT4 with inputs b, c, d, e produces F0; a second LUT4 with inputs b, c, d, e produces F1; a third LUT4 with inputs a, F1, F0 produces F
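One way to see the three-LUT4 implementation is to generate the LUT contents and wire them together in software. This is a sketch under assumed conventions: the first LUT input is taken as the most significant address bit, the unused input is tied to 0, and all names are illustrative.

```python
from itertools import product

def f(a, b, c, d, e):
    """Reference: F = abc'e + a'b'cd + cde' + a'b."""
    return (a & b & (1 - c) & e) | ((1 - a) & (1 - b) & c & d) | (c & d & (1 - e)) | ((1 - a) & b)

def f0(b, c, d, e):
    return ((1 - b) & c & d) | (c & d & (1 - e)) | b

def f1(b, c, d, e):
    return (b & (1 - c) & e) | (c & d & (1 - e))

def lut4_table(func):
    """16 configuration bits; rows ordered with the first input as the MSB."""
    return [func(*bits) & 1 for bits in product((0, 1), repeat=4)]

def lut4(table, i3, i2, i1, i0):
    return table[(i3 << 3) | (i2 << 2) | (i1 << 1) | i0]

LUT_F0 = lut4_table(f0)
LUT_F1 = lut4_table(f1)
# Third LUT: F(a, F1, F0, *) = a'.F0 + a.F1; the fourth input is a don't care.
LUT_F = lut4_table(lambda a, x1, x0, unused: x1 if a else x0)

def f_network(a, b, c, d, e):
    """Evaluate F through the three-LUT4 network."""
    x0 = lut4(LUT_F0, b, c, d, e)
    x1 = lut4(LUT_F1, b, c, d, e)
    return lut4(LUT_F, a, x1, x0, 0)
```

The third LUT4 acts as a 2:1 multiplexer controlled by a, which is exactly the structure of the expansion.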
Further Decomposition
F(a,b,c,d,e) = a’b’ ∙ F(0,0,c,d,e) + a’b ∙ F(0,1,c,d,e) + ab’ ∙ F(1,0,c,d,e) + ab ∙ F(1,1,c,d,e)
= a’b’ ∙ F00 + a’b ∙ F01 + ab’ ∙ F10 + ab ∙ F11
This would be done in order to implement F using 3-input LUTs (LUT3)
Same Example: F(a,b,c,d,e) = abc’e + a’b’cd + cde’ + a’b
F00 = 0 ∙ 0 ∙ c’e + 1 ∙ 1 ∙ cd + cde’ + 1 ∙ 0 = cd + cde’
F01 = 0 ∙ 1 ∙ c’e + 1 ∙ 0 ∙ cd + cde’ + 1 ∙ 1 = 1
F10 = 1 ∙ 0 ∙ c’e + 0 ∙ 1 ∙ cd + cde’ + 0 ∙ 0 = cde’
F11 = 1 ∙ 1 ∙ c’e + 0 ∙ 0 ∙ cd + cde’ + 0 ∙ 1 = c’e + cde’
Further Decomposition Example
F00 = cd + cde’ | F01 = 1 | F10 = cde’ | F11 = c’e + cde’
F = a’b’ ∙ F00 + a’b ∙ F01 + ab’ ∙ F10 + ab ∙ F11 = (F1 + F2 + F3) + F4 = F123 + F4
Here F1 = a’b’ ∙ F00, F2 = a’b ∙ F01, F3 = ab’ ∙ F10, F4 = ab ∙ F11, and F123 = F1 + F2 + F3 (F1 is reused as a name and is not the a = 1 cofactor of the previous example)
Truth tables: original function F(a,b,c,d,e); F00(c,d,e); F01(c,d,e); F10(c,d,e); F11(c,d,e); F1(a,b,F00); F2(a,b,F01); F3(a,b,F10); F4(a,b,F11); F123(F1,F2,F3); F(F123,F4,*)
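The LUT3-only structure can be modeled in the same way: four cofactor LUTs, four LUTs that AND a cofactor with its a, b selection, one LUT that ORs three of them, and a final LUT that ORs in the fourth. This is a sketch with illustrative names and an assumed MSB-first input ordering.

```python
from itertools import product

def f(a, b, c, d, e):
    """Reference: F = abc'e + a'b'cd + cde' + a'b."""
    return (a & b & (1 - c) & e) | ((1 - a) & (1 - b) & c & d) | (c & d & (1 - e)) | ((1 - a) & b)

# Cofactors with respect to (a, b), as derived on the slide.
def f00(c, d, e): return (c & d) | (c & d & (1 - e))        # cd + cde'
def f01(c, d, e): return 1                                   # 1
def f10(c, d, e): return c & d & (1 - e)                     # cde'
def f11(c, d, e): return ((1 - c) & e) | (c & d & (1 - e))   # c'e + cde'

def lut3_table(func):
    """8 configuration bits; rows ordered with the first input as the MSB."""
    return [func(*bits) & 1 for bits in product((0, 1), repeat=3)]

def lut3(table, i2, i1, i0):
    return table[(i2 << 2) | (i1 << 1) | i0]

COFACTOR_LUTS = [lut3_table(g) for g in (f00, f01, f10, f11)]
# Selection LUTs: a'b'.x, a'b.x, ab'.x, ab.x (the F1..F4 of this slide).
SELECT_LUTS = [
    lut3_table(lambda a, b, x, sa=sa, sb=sb: int(a == sa and b == sb) & x)
    for sa, sb in ((0, 0), (0, 1), (1, 0), (1, 1))
]
OR3_LUT = lut3_table(lambda x, y, z: x | y | z)        # F123
OR2_LUT = lut3_table(lambda x, y, unused: x | y)       # F = F123 + F4

def f_network(a, b, c, d, e):
    cof = [lut3(t, c, d, e) for t in COFACTOR_LUTS]
    sel = [lut3(t, a, b, x) for t, x in zip(SELECT_LUTS, cof)]
    f123 = lut3(OR3_LUT, sel[0], sel[1], sel[2])
    return lut3(OR2_LUT, f123, sel[3], 0)

LUT_COUNT = len(COFACTOR_LUTS) + len(SELECT_LUTS) + 2
```

In this model the LUT3-only version costs ten LUTs, compared with three LUT4s for the single-variable expansion.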
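Diagram and Alternative MUX Implementation
LUT diagram: four LUT3s with inputs c, d, e produce F00, F01, F10, F11; four LUT3s with inputs a, b and one of these outputs produce F1, F2, F3, F4; one LUT3 produces F123; a final LUT3 produces F
MUX alternative: the same four LUT3s produce F00, F01, F10, F11; two MUXes selected by b followed by one MUX selected by a produce F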
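The MUX alternative can be sketched as a two-level tree of 2:1 multiplexers. The assumption here, not stated on the slide, is that each MUX passes its second data input when its select is 1; names are illustrative.

```python
from itertools import product

def f(a, b, c, d, e):
    """Reference: F = abc'e + a'b'cd + cde' + a'b."""
    return (a & b & (1 - c) & e) | ((1 - a) & (1 - b) & c & d) | (c & d & (1 - e)) | ((1 - a) & b)

def mux2(sel, d0, d1):
    """2:1 multiplexer: d1 when sel = 1, d0 when sel = 0 (assumed polarity)."""
    return d1 if sel else d0

def f_mux(a, b, c, d, e):
    # Four LUT3 outputs (cofactors with respect to a and b).
    f00 = c & d                                  # cd + cde' reduces to cd
    f01 = 1
    f10 = c & d & (1 - e)
    f11 = ((1 - c) & e) | (c & d & (1 - e))
    low = mux2(b, f00, f01)    # selected when a = 0
    high = mux2(b, f10, f11)   # selected when a = 1
    return mux2(a, low, high)
```

Replacing the six combining LUTs with three MUXes is what makes dedicated wide multiplexers in a slice attractive.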
4. The Zynq System on Chip (SoC)
Xilinx Zynq System on Chip (SoC) Overview
Zynq Overview – 16m 25s (Optional – answers why Zynq was created)
Zynq Architecture – 10m 49s (Brief overview of the PS and PL components)
Zynq Processing System (PS) – 7m 25s (Optional – brief overview of Zynq PS)
Zynq Programmable Logic (PL) – 9m 23s (Brief overview of Zynq PL)
Zynq PL Architecture
Xilinx 7 Series FPGA Overview – 28m 21s (Optional – general overview, watch at 1.5x speed)
CLB Architecture – 21m 52s (A must see, IMDB rating of 9.1)
Additional Xilinx videos and training:
Digilent Zybo
Feature | Description
FPGA | Zynq-7000 AP SoC XC7Z010-1CLG400
I/O Interfaces | USB-UART for programming, serial comm., and power; one 10/100/1G Ethernet; USB OTG 2.0; USB-UART bridge; 16-bit VGA output; dual-role (input/output) HDMI; I2S CODEC Audio Line-In, Line-Out, microphone
Memory | 512 Mbyte DDR3; 128 Mbit Quad-SPI Flash; MicroSD card connector
Switches and LEDs | 4 slide switches accessible from PL; 4 LEDs accessible from PL; 1 LED accessible from PS; 4 push-buttons accessible from PL; 2 push-buttons accessible from PS; 1 reset button accessible from PL; 1 reset button accessible from PS
Clocks | one MHz Oscillator for PS
Expansion ports | one processor-dedicated Pmod connector; one dual (analog/digital) Pmod connector; 4 Pmod connectors
Zynq-7000 AP SoC XC7Z010-1CLG400
Processing System (PS)
  650 MHz dual-core Cortex-A9 processor
  DDR3 memory controller with 8 DMA channels
  High-bandwidth peripheral controllers: 1G Ethernet, USB 2.0, SDIO
  Low-bandwidth peripheral controllers: SPI, UART, CAN, I2C
Programmable Logic (PL), equivalent to Artix-7 (7 Series) FPGA
  4,400 logic slices, each with four 6-input LUTs and 8 flip-flops
  240 KB of fast block RAM
  Two clock management tiles, each with a PLL and an MMCM
  80 DSP slices
  Internal clock speeds exceeding 450 MHz
  On-chip analog-to-digital converter (XADC)
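The per-slice figures translate directly into device totals. A small calculation, using only the numbers listed above (the constant names are illustrative):

```python
SLICES = 4_400          # logic slices in the PL
LUTS_PER_SLICE = 4      # four 6-input LUTs per slice
FFS_PER_SLICE = 8       # eight flip-flops per slice

TOTAL_LUTS = SLICES * LUTS_PER_SLICE      # 6-input LUTs in the device
TOTAL_FFS = SLICES * FFS_PER_SLICE        # flip-flops in the device
# Each 6-input LUT stores 2**6 = 64 truth-table bits.
LUT_CONFIG_BITS = TOTAL_LUTS * 2 ** 6
```

This gives 17,600 LUTs and 35,200 flip-flops, with about 1.1 million bits of LUT truth-table storage.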
Xilinx 7 Series FPGA Configurable Logic Block
Each CLB is composed of 2 slices
Slices can be of type SLICEL or SLICEM
Slice features:
  Real 6-input look-up table (LUT) technology
  Dual LUT5 (5-input LUT) option
  Distributed Memory and Shift Register Logic capability
  Dedicated high-speed carry logic for arithmetic functions
  Wide multiplexers for efficient utilization
Figure: Xilinx 7 Series FPGA CLB
Source: Xilinx 7 Series FPGA CLB User Guide
Xilinx 7 Series FPGA Slice Detail
The four (A, B, C, and D) 6-input (A1-A6) LUTs each provide 2 independent outputs (O5 and O6)
The LUTs can implement:
  Any arbitrarily defined six-input Boolean function, OR
  Two arbitrarily defined five-input Boolean functions, as long as these two functions share common inputs
Slices also contain three multiplexers used to combine up to four LUTs to implement any function of 7 or 8 inputs:
  F7AMUX: Used to generate 7-input functions from LUTs A and B
  F7BMUX: Used to generate 7-input functions from LUTs C and D
  F8MUX: Used to combine all LUTs to generate 8-input functions
Figure: SLICEL element shown. See source for SLICEM details.
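A behavioral sketch may help show how one LUT6 yields two outputs and how the wide multiplexers extend it to 7 and 8 inputs. It assumes a 64-bit configuration word with A1 as the least significant address bit, O5 reading the lower 32 bits, and MUXes that pass their second input when the select is 1. These conventions and all function names are assumptions for illustration, not taken from the slides.

```python
def lut6(init, a6, a5, a4, a3, a2, a1):
    """Return (o6, o5) for a 64-bit configuration word `init`."""
    low5 = (a5 << 4) | (a4 << 3) | (a3 << 2) | (a2 << 1) | a1
    o6 = (init >> ((a6 << 5) | low5)) & 1   # full 6-input lookup
    o5 = (init >> low5) & 1                 # lookup in the lower 32 bits only
    return o6, o5

def init_from(func, n):
    """Pack the truth table of an n-input function into an integer (input 1 = LSB of address)."""
    init = 0
    for index in range(2 ** n):
        bits = [(index >> i) & 1 for i in range(n)]
        init |= (func(*bits) & 1) << index
    return init

def mux2(sel, d0, d1):
    return d1 if sel else d0

def func7(init_a, init_b, sel, inputs):
    """7-input function: two LUT6s share six inputs, a MUX picks one using the 7th input."""
    return mux2(sel, lut6(init_a, *inputs)[0], lut6(init_b, *inputs)[0])

def func8(inits, sel7, sel8, inputs):
    """8-input function: two 7-input results combined by one more MUX."""
    ab = func7(inits[0], inits[1], sel7, inputs)
    cd = func7(inits[2], inits[3], sel7, inputs)
    return mux2(sel8, ab, cd)

# Two 5-input functions sharing inputs A1-A5: XOR in the upper half, AND in the lower half.
AND5 = init_from(lambda *x: int(all(x)), 5)
XOR5 = init_from(lambda *x: sum(x) % 2, 5)
DUAL = (XOR5 << 32) | AND5      # with A6 = 1, O6 gives XOR5 and O5 gives AND5
AND6 = init_from(lambda *x: int(all(x)), 6)
```

The 7- and 8-input cases are the same Shannon expansion used earlier in the deck, with the MUX select playing the role of the expansion variable.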
Xilinx 7 Series FPGA Slice Detail
Carry Chain(s)
  Dedicated fast lookahead carry logic can perform fast arithmetic addition and subtraction.
  Cascadable to form wider add/subtract logic.
  Carry chains run upward and have a height of four bits per slice.
  For each bit, there is a carry MUX and an XOR gate for adding/subtracting the operands with selected carry bits.
  The dedicated carry path and carry MUX can also be used to cascade function generators for implementing wide logic functions.
Memory Elements
  Flip-Flops: 8 in total; 4 are DFF only and 4 can be configured as DFF or Latch
  Distributed RAM: LUTs can be configured as RAM (SLICEM only), from Dual-Port 32 x 1-bit RAM up to Single-Port 256 x 1-bit RAM
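The per-bit carry MUX and XOR gate described above can be modeled as follows. This is a simplified sketch that assumes the LUT supplies the propagate signal a XOR b and that one operand bit is used as the MUX data input; `carry4` and `add` are names chosen for this example.

```python
def carry4(a_bits, b_bits, cin):
    """Four bits of carry chain (one slice), least significant bit first."""
    sums = []
    carry = cin
    for a, b in zip(a_bits, b_bits):
        propagate = a ^ b                   # computed by the LUT in this model
        sums.append(propagate ^ carry)      # dedicated XOR gate produces the sum bit
        carry = carry if propagate else a   # carry MUX: pass carry-in or generate from a
    return sums, carry

def add(a, b, width=8):
    """Cascade carry4 blocks to add two unsigned width-bit integers; returns (sum, carry_out)."""
    assert width % 4 == 0
    result, carry = 0, 0
    for base in range(0, width, 4):
        a_bits = [(a >> (base + i)) & 1 for i in range(4)]
        b_bits = [(b >> (base + i)) & 1 for i in range(4)]
        sums, carry = carry4(a_bits, b_bits, carry)
        for i, s in enumerate(sums):
            result |= s << (base + i)
    return result, carry
```

When the propagate bit is 0 the two operand bits are equal, so either one gives the correct carry out; that is why a single MUX per bit is sufficient.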
Other PL Elements
Clock Management
  Mixed-Mode Clock Manager (MMCM)
  Phase Locked Loop (PLL)
DSP Slices
  25 x 18 bit signed multiplier
  48 bit adder/accumulator
  25 bit pre-adder
Block RAM
  Dual-port 36 Kb blocks
  Programmable FIFO logic
Figure: Zynq layout snapshot using Vivado, showing the PS and PL regions
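The DSP slice widths listed above can be illustrated with a simple multiply-accumulate model. This sketch only reproduces the bit widths (25-bit pre-adder, 25 x 18 signed multiply, 48-bit accumulator) and ignores pipeline registers and operating modes; the function names are hypothetical.

```python
def to_signed(value, bits):
    """Wrap an integer into a two's-complement value of the given width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def dsp_mac(acc, a, d, b):
    """One multiply-accumulate step: acc + (a + d) * b with DSP-slice bit widths."""
    pre = to_signed(a + d, 25)          # 25-bit pre-adder
    prod = pre * to_signed(b, 18)       # 25 x 18 signed multiplier
    return to_signed(acc + prod, 48)    # 48-bit adder/accumulator

def dot(xs, ys):
    """Dot product computed by repeated multiply-accumulate (pre-adder unused, d = 0)."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = dsp_mac(acc, x, 0, y)
    return acc
```

For example, `dot([1, 2, 3], [4, 5, 6])` accumulates 4 + 10 + 18 in three steps.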
5. FPGA Design Flow
Computer Aided Design (CAD) Tools: Xilinx Vivado; Aldec Active-HDL (Free Student Edition)
Flow: Functional Simulation; Synthesis; Post-Synthesis Simulation; Mapping, Placement, Routing; Implementation