Advance Digital Design Hassan Bhatti, Lecture 10.


Field-Programmable Gate Arrays (FPGAs)
- Ease of reprogramming enables rapid prototyping
- Replace ASICs at the low-volume end of the market
- Register-rich, tiled architecture of functional units with flexible, channel-based interconnect

Overview Continued
- The ASIC Research Center has XESS boards with Xilinx chips on them.
- A design for any Xilinx chip must be compiled with the Xilinx tools.

FPGA Big Idea
- Basic idea: a 2D array of combinational logic blocks (CL) and flip-flops (FF), with a means for the user to configure both:
  1. the interconnection between the logic blocks,
  2. the function of each block.

Idealized FPGA Logic Block
- 4-input look-up table (4-LUT)
  1. Implements combinational logic functions
- Register
  1. Optionally stores the output of the LUT
  2. A configuration latch determines whether the block output is read from the register or from the LUT
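The idealized logic block can be sketched in a few lines of Python. This is only a behavioral model under stated assumptions: the class name `LogicBlock`, the helper `program_lut`, and the bit ordering of the truth table are illustrative choices, not anything defined in the lecture.

```python
from itertools import product


class LogicBlock:
    """Behavioral model of an idealized block: 4-LUT, one register, output mux."""

    def __init__(self, truth_table, use_register):
        # truth_table holds the 16 configuration bits of the LUT.
        if len(truth_table) != 16:
            raise ValueError("a 4-LUT needs exactly 16 configuration bits")
        self.truth_table = list(truth_table)
        self.use_register = use_register  # configuration bit for the output mux
        self.reg = 0                      # register state

    def lut(self, a, b, c, d):
        # The four inputs form the address into the stored truth table.
        return self.truth_table[(a << 3) | (b << 2) | (c << 1) | d]

    def clock(self, a, b, c, d):
        # Clock edge: the register captures the current LUT output.
        self.reg = self.lut(a, b, c, d)

    def output(self, a, b, c, d):
        # The mux selects the registered or the combinational value.
        return self.reg if self.use_register else self.lut(a, b, c, d)


def program_lut(func):
    """Build the 16 configuration bits for any function of four inputs."""
    return [func(a, b, c, d) & 1 for a, b, c, d in product((0, 1), repeat=4)]


xor4 = program_lut(lambda a, b, c, d: a ^ b ^ c ^ d)
comb = LogicBlock(xor4, use_register=False)
regd = LogicBlock(xor4, use_register=True)
```

Here `comb` follows its inputs immediately, while `regd` only changes its output after `clock()` is called.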

Xilinx FPGA
- Xilinx pioneered the FPGA and launched the first XC4000 FPGA in [year missing]
- Other generations such as Spartan/XL are based on the XC4000
- Each FPGA consists of:
  - Configurable Logic Blocks (CLBs)
  - Routing resources
  - IOBs (Input Output Buffers)
  - SRAM-based controller

XC 4000

XC 4000 Continued….

Architecture of CLBs
- Each CLB has two 4-input look-up tables (LUTs) and two registers.
- The two LUTs implement two independent logic functions, F and G.
- The outputs F' and G' from the two LUTs inside each CLB can be combined to form a more complex function H.
- CLBs are linked together to form carry and cascade chain circuits (not shown in the diagram).

Architecture of CLBs

Interconnect Resources of XC 4000
There are three types of interconnect:
  1. Dedicated interconnects (direct): lines that provide routing between vertically and horizontally adjacent CLBs in the same row and column.
  2. Double-length lines: traverse the distance of two CLBs before entering a switch matrix, skipping every other CLB.
  3. Long lines (global): span the entire array vertically and horizontally. They have splitters that segment the lines.

XC 4000 Interconnect ….

Inside Interconnects

Architecture of PIP
- Break-point PIP: connects or isolates two wire segments
- Cross-point PIP: turns corners
- Multiplexer PIP: directional and buffered; selects one of n inputs to drive the output

XC 4000 IOB

Example
Implement the following functions on a single CLB of the XC4000 FPGA:
  X = A'B'(C + D)
  Y = AK + BK + C'D'K + AEJL
- Use look-up table F to implement X
- Use look-up table G for AEJL
- Use F, G and H for Y. By De Morgan's law, X' = A + B + C'D', so:
  Y = K(A + B + C'D') + AEJL
    = KX' + AEJL
    = KF' + G   (here F' denotes the complement of the F output)
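The mapping in this example can be checked exhaustively. The sketch below models the three look-up tables as plain Python functions (the names `lut_f`, `lut_g`, `lut_h` and `y_reference` are illustrative) and compares the H output against the original sum-of-products form of Y for all 256 input combinations. It assumes that K reaches the H table as its third input.

```python
from itertools import product


def lut_f(a, b, c, d):
    """F table: X = A'B'(C + D)."""
    return (1 - a) & (1 - b) & (c | d)


def lut_g(a, e, j, l):
    """G table: AEJL."""
    return a & e & j & l


def lut_h(f, g, k):
    """H table: K * (complement of F) + G."""
    return (k & (1 - f)) | g


def y_reference(a, b, c, d, e, j, k, l):
    """Y exactly as given: AK + BK + C'D'K + AEJL."""
    return (a & k) | (b & k) | ((1 - c) & (1 - d) & k) | (a & e & j & l)


def mapping_is_correct():
    """Compare the CLB mapping with the reference for every input pattern."""
    for a, b, c, d, e, j, k, l in product((0, 1), repeat=8):
        f = lut_f(a, b, c, d)
        g = lut_g(a, e, j, l)
        if lut_h(f, g, k) != y_reference(a, b, c, d, e, j, k, l):
            return False
    return True


print(mapping_is_correct())
```

Both X (the F output) and Y (the H output) are available from the same CLB, which is why a single CLB suffices.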

Illustrated

Spartan 2
- The ASIC Center has an Xess-100 board, which carries a Spartan-2 FPGA.
- The architecture is based on the XC4000.

Inside the Board

Spartan-3E Architecture
Fundamental elements:
- Configurable Logic Blocks (CLBs)
  - Consist of RAM-based look-up tables to implement logic, and storage elements that can be used as flip-flops or latches.
- Input Output Blocks (IOBs)
  - Control the flow of data between I/O pins and internal logic. Support many different signal standards (tri-state, bidirectional, LVTTL, etc.).
- Block RAM (BRAM)
- 18-bit multiplier blocks
- Digital Clock Manager (DCM)

Spartan 3 Configurable Logic Blocks (CLBs)
CLBs contain RAM-based look-up tables to implement logic, and storage elements that can be used as flip-flops or latches. CLBs can be programmed to perform a wide variety of logic functions as well as to store data.

Spartan 3E I/O Blocks (IOBs)
IOBs control the flow of data between I/O pins and the internal logic. Each IOB supports bidirectional data flow, 3-state operation, and numerous signal standards. (We will typically use LVTTL.) See the data sheet.

- Very low cost, high-performance logic solution for high-volume, consumer-oriented applications
- Multi-voltage, multi-standard SelectIO™ interface pins
  - Up to 376 I/O pins or 156 differential signal pairs
  - LVCMOS, LVTTL, HSTL, and SSTL single-ended signal standards
  - 3.3V, 2.5V, 1.8V, 1.5V, and 1.2V signaling

I/O block continued

CLB’s – four slices per CLB

Top slice of CLB

Virtex Basic Architecture
- I/O Blocks (IOBs)
- Configurable Logic Blocks (CLBs)
- Clock management (DCMs, BUFGMUXes)
- Block SelectRAM™ resource
- Dedicated multipliers
- Programmable interconnect

Slices and CLBs
- Each Virtex-II CLB contains four slices
  - Local routing provides feedback between slices in the same CLB, and it provides routing to neighboring CLBs
  - A switch matrix provides access to general routing resources
[Figure: CLB with switch matrix, BUFT, slices S0 to S3, local routing, and the CIN and SHIFT chains]

Slice Structure
The next few slides discuss the slice features:
  - LUTs
  - MUXF5, MUXF6, MUXF7, MUXF8 (only the F5 and F6 MUX are shown in this diagram)
  - Carry logic
  - MULT_ANDs
  - Sequential elements

Look-Up Tables
- Combinatorial logic is stored in Look-Up Tables (LUTs)
  - Also called Function Generators (FGs)
  - Capacity is limited by the number of inputs, not by the complexity
- Delay through the LUT is constant
[Figure: combinatorial logic with inputs A, B, C, D and output Z, and its truth table]

Connecting Look-Up Tables
- MUXF5 combines the LUTs in each slice
- MUXF6 combines slices S0 and S1
- MUXF6 combines slices S2 and S3
- MUXF7 combines the two MUXF6 outputs
- MUXF8 combines the two MUXF7 outputs (from the CLB above or below)
[Figure: CLB with slices S0 to S3 and the F5, F6, F7, and F8 multiplexers]
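How a multiplexer widens a LUT can be illustrated with a Shannon expansion: a 5-input function is split into two 4-input halves, one per LUT, and the fifth input drives the select of the F5 multiplexer. The Python below is a behavioral sketch only; `make_lut4`, `muxf5` and `build_5_input` are illustrative names, and majority-of-five is just a convenient test function.

```python
from itertools import product


def make_lut4(func):
    """Store a 4-input function as a 16-entry table and return a lookup."""
    table = [func(a, b, c, d) & 1 for a, b, c, d in product((0, 1), repeat=4)]

    def lut(a, b, c, d):
        return table[(a << 3) | (b << 2) | (c << 1) | d]

    return lut


def muxf5(in0, in1, sel):
    """2:1 multiplexer that joins the two LUT outputs of a slice."""
    return in1 if sel else in0


def build_5_input(func5):
    """Implement a 5-input function with two 4-LUTs and one MUXF5."""
    lut_lo = make_lut4(lambda a, b, c, d: func5(a, b, c, d, 0))  # half with e = 0
    lut_hi = make_lut4(lambda a, b, c, d: func5(a, b, c, d, 1))  # half with e = 1

    def combined(a, b, c, d, e):
        return muxf5(lut_lo(a, b, c, d), lut_hi(a, b, c, d), e)

    return combined


def majority5(a, b, c, d, e):
    return int(a + b + c + d + e >= 3)


maj_hw = build_5_input(majority5)
```

The same step repeats at each level: MUXF6 joins two such 5-input results into a 6-input function, and so on up to MUXF8.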

Fast Carry Logic
- Simple, fast, and complete arithmetic logic
  - Dedicated XOR gate for single-level sum completion
  - Uses dedicated routing resources
  - All synthesis tools can infer carry logic
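A common way to picture the carry logic is one adder bit per LUT: the LUT produces the propagate signal, the dedicated XOR gate completes the sum, and a carry multiplexer passes the carry along the dedicated chain. The sketch below models that arrangement in Python. The function names are illustrative, and the wiring shown is the usual textbook mapping rather than something spelled out on the slide.

```python
def carry_slice_bit(a, b, cin):
    """One adder bit: LUT, dedicated XOR gate, and carry multiplexer."""
    p = a ^ b               # LUT output: propagate
    s = p ^ cin             # dedicated XOR gate completes the sum
    cout = cin if p else a  # carry mux: pass cin when propagating, else a
    return s, cout


def ripple_add(x, y, width, cin=0):
    """Chain `width` bits through the carry path; returns (sum, carry_out)."""
    carry = cin
    total = 0
    for i in range(width):
        s, carry = carry_slice_bit((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total, carry


print(ripple_add(9, 8, 4))
```

When the propagate signal is 0 the two operand bits are equal, so selecting `a` gives the same result as the AND of both bits.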

MULT_AND Gate
- Highly efficient multiply and add implementation
  - Earlier FPGA architectures require two LUTs per bit to perform the multiplication and addition
  - The MULT_AND gate enables an area reduction by performing the multiply and the add in one LUT per bit
[Figure: LUT with inputs A and B, MULT_AND gate producing A x B, CY_MUX and CY_XOR; signals DI, CI, S, CO]
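The one-LUT-per-bit claim can be made concrete with a behavioral sketch. In the model below the LUT computes the partial-product bit XORed with the running sum bit, while the MULT_AND gate supplies the same partial-product bit to the carry multiplexer. The function names and the shift-and-add loop around them are assumptions for illustration, not a description of any vendor netlist.

```python
def mult_add_bit(a_i, b_j, p_i, cin):
    """One bit of 'partial sum + (a AND b)' using a LUT plus MULT_AND."""
    prod = a_i & b_j             # MULT_AND gate: partial-product bit
    lut = prod ^ p_i             # LUT: partial product XOR incoming sum bit
    s = lut ^ cin                # CY_XOR: sum output
    cout = cin if lut else prod  # CY_MUX: propagate cin or generate from prod
    return s, cout


def multiply(a, b, width):
    """Unsigned shift-and-add multiplier built from mult_add_bit rows."""
    acc = 0
    for j in range(width):
        b_j = (b >> j) & 1
        shifted_a = a << j
        carry = 0
        row = 0
        for i in range(2 * width):
            s, carry = mult_add_bit((shifted_a >> i) & 1, b_j, (acc >> i) & 1, carry)
            row |= s << i
        acc = row  # the row output becomes the next partial sum
    return acc


print(multiply(13, 11, 4))
```

Without the MULT_AND gate, the AND and the addition would each need their own LUT per bit, which is the two-LUT cost the slide refers to.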

Flexible Sequential Elements
- Either flip-flops or latches
- Two in each slice; eight in each CLB
- Inputs come from LUTs or from an independent CLB input
- Separate set and reset controls
  - Can be synchronous or asynchronous
- All controls are shared within a slice
  - Control signals can be inverted locally within a slice
[Figure: FDCPE (D, CE, PRE, CLR, Q), FDRSE (D, CE, S, R, Q), and LDCPE (D, CE, PRE, CLR, G, Q) primitives]

Shift Register LUT (SRL16CE)
- Dynamically addressable serial shift registers
  - Maximum delay of 16 clock cycles per LUT (128 per CLB)
  - Cascadable to other LUTs or CLBs for longer shift registers
    - Dedicated connection from Q15 to the D input of the next SRL16CE
  - Shift register length can be changed asynchronously by toggling address A
[Figure: LUT configured as a 16-stage shift register with inputs D, CE, CLK, A[3:0] and outputs Q and Q15 (cascade out)]
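The addressable-delay behavior can be modeled in a few lines. In this Python sketch (the class and method names are illustrative) the address selects which of the 16 stages appears on Q, so an address of A gives a delay of A + 1 clock cycles, and Q15 always exposes the last stage for cascading. All stages are assumed to start at 0.

```python
class SRL16CE:
    """Behavioral model of a 16-stage, dynamically addressable shift register."""

    def __init__(self):
        self.stages = [0] * 16  # stages[0] holds the most recent bit

    def clock(self, d, ce=1):
        # Clock edge: shift by one position when clock enable is high.
        if ce:
            self.stages = [d & 1] + self.stages[:15]

    def q(self, addr):
        # Asynchronous read of the addressed tap.
        return self.stages[addr & 0xF]

    @property
    def q15(self):
        # Last stage, used to cascade into the next shift register.
        return self.stages[15]


srl = SRL16CE()
pattern = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0]
taps = []
for bit in pattern:
    srl.clock(bit)
    taps.append(srl.q(3))  # address 3 gives a delay of four clock cycles
```

Changing the address between clocks changes the apparent length immediately, since the read is a plain lookup into the stored stages.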

IOB Element
- Input path
  - Two DDR registers
- Output path
  - Two DDR registers
  - Two 3-state enable DDR registers
- Separate clocks and clock enables for I and O
- Set and reset signals are shared
[Figure: IOB with input registers (ICK1, ICK2), output and 3-state register pairs with DDR muxes (OCK1, OCK2), and the PAD]

SelectIO Standard
- Allows direct connections to external signals of varied voltages and thresholds
  - Optimizes the speed/noise tradeoff
  - Saves having to place interface components onto your board
- Differential signaling standards
  - LVDS, BLVDS, ULVDS
  - LDT
  - LVPECL
- Single-ended I/O standards
  - LVTTL, LVCMOS (3.3V, 2.5V, 1.8V, and 1.5V)
  - PCI-X at 133 MHz, PCI (3.3V at 33 MHz and 66 MHz)
  - GTL, GTLP
  - and more!

Digitally Controlled Impedance (DCI)
- DCI provides
  - Output drivers that match the impedance of the traces
  - On-chip termination for receivers and transmitters
- DCI advantages
  - Improves signal integrity by eliminating stub reflections
  - Reduces board routing complexity and component count by eliminating external resistors
  - Eliminates the effects of temperature, voltage, and process variations by using an internal feedback circuit

Other Virtex-II Features
- Distributed RAM and block RAM
  - Distributed RAM uses the CLB resources (1 LUT = 16 RAM bits)
  - Block RAM is a dedicated resource on the device (18-kb blocks)
- Dedicated 18 x 18 multipliers next to block RAMs
- Clock management resources
  - Sixteen dedicated global clock multiplexers
  - Digital Clock Managers (DCMs)

Distributed SelectRAM Resources
- Uses a LUT in a slice as memory
- Synchronous write
- Asynchronous read
  - Accompanying flip-flops can be used to create synchronous read
- RAM and ROM are initialized during configuration
  - Data can be written to RAM after configuration
- Emulated dual-port RAM
  - One read/write port
  - One read-only port
[Figure: RAM16X1S (D, WE, WCLK, A0 to A3, O), RAM32X1S (D, WE, WCLK, A0 to A4, O), and RAM16X1D (D, WE, WCLK, A0 to A3, DPRA0 to DPRA3, SPO, DPO) built from slice LUTs]
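The write and read timing described above can be sketched as a small behavioral model of the dual-port primitive. The port names follow the figure labels (D, WE, A, DPRA, SPO, DPO), while the Python class itself and its method names are illustrative assumptions.

```python
class RAM16X1D:
    """Behavioral model of a 16 x 1 distributed RAM with an extra read port."""

    def __init__(self, init=None):
        # Contents can be preset, mirroring initialization at configuration.
        self.mem = list(init) if init is not None else [0] * 16
        if len(self.mem) != 16:
            raise ValueError("RAM16X1D holds exactly 16 bits")

    def clock(self, d, we, a):
        # Synchronous write: data is stored only on a clock edge with WE high.
        if we:
            self.mem[a & 0xF] = d & 1

    def spo(self, a):
        # Asynchronous read on the read/write port address.
        return self.mem[a & 0xF]

    def dpo(self, dpra):
        # Asynchronous read on the read-only port address.
        return self.mem[dpra & 0xF]


ram = RAM16X1D()
ram.clock(d=1, we=1, a=5)   # write a 1 to address 5
ram.clock(d=1, we=0, a=6)   # WE low: address 6 is left unchanged
print(ram.spo(5), ram.dpo(5), ram.dpo(6))
```

A synchronous read would be obtained by capturing `spo()` or `dpo()` in a register on the next clock, which is the role of the accompanying flip-flops.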

Block SelectRAM Resources
- Up to 3.5 Mb of RAM in 18-kb blocks
  - Synchronous read and write
- True dual-port memory
  - Each port has synchronous read and write capability
  - Different clocks for each port
- Supports initial values
- Synchronous reset on output latches
- Supports parity bits
  - One parity bit per eight data bits
[Figure: 18-kb block SelectRAM memory with ports DIA, DIPA, ADDRA, WEA, ENA, SSRA, CLKA, DOA, DOPA and DIB, DIPB, ADDRB, WEB, ENB, SSRB, CLKB, DOB, DOPB]

Dedicated Multiplier Blocks
- 18-bit two's complement signed operation
- Optimized to implement multiply and accumulate functions
- Multipliers are physically located next to block SelectRAM™ memory
- Supported operand sizes include 4 x 4, 8 x 8, 12 x 12, and 18 x 18 signed
[Figure: 18 x 18 multiplier with inputs Data_A (18 bits) and Data_B (18 bits) and a 36-bit output]
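The signed arithmetic of the multiplier block can be illustrated with plain integers. The sketch below treats both inputs as 18-bit two's complement bit patterns and returns the 36-bit product pattern. The helper names `to_signed` and `mult18x18` are illustrative, and narrower operands (for example 4 x 4 signed) are assumed to be sign-extended to 18 bits before use.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def mult18x18(data_a, data_b):
    """Signed 18 x 18 multiply; returns the 36-bit two's complement pattern."""
    a = to_signed(data_a, 18)
    b = to_signed(data_b, 18)
    return (a * b) & ((1 << 36) - 1)


# 0x3FFFF is -1 as an 18-bit pattern, so the product with 2 is -2.
product_bits = mult18x18(0x3FFFF, 2)
print(hex(product_bits), to_signed(product_bits, 36))

# A 4-bit signed operand (0b1101 is -3) sign-extended to 18 bits.
small = to_signed(0b1101, 4) & 0x3FFFF
print(to_signed(mult18x18(small, 5), 36))
```

The 36-bit output is wide enough for every input pair: the largest magnitude, (-2^17) x (-2^17) = 2^34, still fits in a 36-bit signed value.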

Global Clock Routing Resources
- Sixteen dedicated global clock multiplexers
  - Eight on the top-center of the die, eight on the bottom-center
  - Driven by a clock input pad, a DCM, or local routing
- Global clock multiplexers provide the following:
  - Traditional clock buffer (BUFG) function
  - Global clock enable capability (BUFGCE)
  - Glitch-free switching between clock signals (BUFGMUX)
- Up to eight clock nets can be used in each clock region of the device
  - Each device contains four or more clock regions

Digital Clock Manager (DCM)
- Up to twelve DCMs per device
  - Located on the top and bottom edges of the die
  - Driven by clock input pads
- DCMs provide the following:
  - Delay-Locked Loop (DLL)
  - Digital Frequency Synthesizer (DFS)
  - Digital Phase Shifter (DPS)
- Up to four outputs of each DCM can drive onto global clock buffers
  - All DCM outputs can drive general routing

Spartan-3 versus Virtex-II
- Lower cost
- Smaller process = lower core voltage
  - 0.09 micron versus 0.15 micron
  - Vccint = 1.2V versus 1.5V
- Different I/O standard support
  - New standards: 1.2V LVCMOS, 1.8V HSTL, and SSTL
  - Default is LVCMOS, versus LVTTL
- More I/O pins per package
- Only one-half of the slices support RAM or SRL16s (SLICEM)
- Fewer block RAMs and multiplier blocks
  - Same size and functionality
- Eight global clock multiplexers
- Two or four DCM blocks
- No internal 3-state buffers
  - 3-state buffers are in the I/O

SLICEM and SLICEL
- Each Spartan™-3 CLB contains four slices
  - Similar to the Virtex™-II
- Slices are grouped in pairs
  - Left-hand SLICEM (Memory): LUTs can be configured as memory or SRL16
  - Right-hand SLICEL (Logic): LUTs can be used as logic only
[Figure: CLB with switch matrix, left-hand SLICEM pair (X0Y0, X0Y1) and right-hand SLICEL pair (X1Y0, X1Y1), fast connects, CIN/COUT and SHIFTIN/SHIFTOUT chains]

Spartan-3E Features
- More gates per I/O than Spartan-3
- Removed some I/O standards
  - Higher-drive LVCMOS
  - GTL, GTLP
  - SSTL2_II
  - HSTL_II_18, HSTL_I, HSTL_III
  - LVDS_EXT, ULVDS
- DDR Cascade
  - Internal data is presented on a single clock edge
- 16 BUFGMUXes on left and right sides
  - Drive half the chip only
  - In addition to eight global clocks
- Pipelined multipliers
- Additional configuration modes
  - SPI, BPI
  - Multi-Boot mode

Virtex-II Pro Features
- 0.13 micron process
- Up to 24 RocketIO™ Multi-Gigabit Transceiver (MGT) blocks
  - Serializer and deserializer (SERDES)
  - Fibre Channel, Gigabit Ethernet, XAUI, Infiniband compliant transceivers, and others
  - 8-, 16-, and 32-bit selectable FPGA interface
  - 8B/10B encoder and decoder
- PowerPC™ RISC processor blocks
  - Thirty-two 32-bit General Purpose Registers (GPRs)
  - Low power consumption: 0.9mW/MHz
  - IBM CoreConnect bus architecture support