Instructor: Dr. Phillip Jones

Presentation transcript:

CPRE 583 Reconfigurable Computing. Lecture 3: Wed 8/31/2011 (Reconfigurable Computing Hardware). Instructor: Dr. Phillip Jones (phjones@iastate.edu), Reconfigurable Computing Laboratory, Iowa State University, Ames, Iowa, USA. http://class.ece.iastate.edu/cpre583/

Questions From Last Lecture?

Announcements/Reminders: HW1 is due Friday of next week. Try to have it completed by this Friday, since MP1 will be released on Friday. Start thinking about topics you may want to do your mini-literature survey on (HW 2). Guest lecturer this Friday (I will be out of town, but should have email access).

Overview: Logic. Interconnect/Routing. Optimized resources (adders, multipliers; memory; system-on-chip building blocks). Example commercial FPGA structure.

What you should learn Basic understanding of the major components that make up an FPGA device.

Basic FPGA Architectural Components. FPGA: Field Programmable Gate Array, a sea of general-purpose logic gates. CLB: Configurable Logic Block.

Computational Fabric - LUT. LUT = Look-Up Table. A 4-LUT has inputs A, B, C, D and one output Z; its truth table over ABCD (0000 through 1111) can be filled in to implement different functions. [Figure: 4-LUT truth tables configured as AND, OR, and a 2:1 mux on B, C, D.]

Computational Fabric - LUT. Exercise: how many 4-LUTs are needed to OR 32 bits? Draw the circuit. [Figure: a tree of eleven 4-LUTs reducing 32 inputs to 1 output.]
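
The count for the 32-bit OR exercise can be checked with a short sketch. It assumes a simple level-by-level tree in which each level packs its signals into 4-input LUTs; the function name `luts_for_reduction` is hypothetical and not from the lecture.

```python
def luts_for_reduction(num_inputs: int, lut_size: int = 4) -> int:
    """Count LUTs in a level-by-level tree that reduces num_inputs signals to one.

    Assumption: every level groups its signals into LUTs of lut_size inputs,
    and each LUT produces exactly one output for the next level.
    """
    total = 0
    signals = num_inputs
    while signals > 1:
        luts_this_level = -(-signals // lut_size)  # ceiling division
        total += luts_this_level
        signals = luts_this_level
    return total


# 32 inputs -> 8 LUTs -> 2 LUTs -> 1 LUT
print(luts_for_reduction(32))  # 11
```

The last LUT in this tree only uses two of its four inputs, which is why two extra bits can be combined with the result without adding a LUT.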

Computational Fabric - LUT. Exercise: how many 4-LUTs are needed to AND 2 bits with the 32-bit OR? Draw the circuit. [Figure: the same tree of eleven 4-LUTs, 32 inputs to 1 output.]

Computational Fabric - LUT. Exercise: write out the truth table (ABCD to Z, rows 0000 through 1111) for the final 4-LUT when 2 bits are ANDed with the 32-bit OR. [Figure: the tree of eleven 4-LUTs next to the truth table being filled in.]
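
One way to work the truth-table exercise is sketched below. The pin assignment is an assumption, not something the slide states: A and B are taken to be the two partial OR results from the previous level, and C and D are the two extra bits, so the final LUT computes Z = (A or B) and C and D.

```python
from itertools import product


def final_lut(a: int, b: int, c: int, d: int) -> int:
    """Final 4-LUT, assuming A, B are partial ORs and C, D are the 2 extra bits."""
    return (a | b) & c & d


# Build the 16-row truth table in the order 0000, 0001, ..., 1111.
rows = [(a, b, c, d, final_lut(a, b, c, d))
        for a, b, c, d in product((0, 1), repeat=4)]

for a, b, c, d, z in rows:
    print(f"{a}{b}{c}{d} {z}")
```

Under this assumption only the rows 0111, 1011, and 1111 produce Z = 1.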

Computational Fabric - LUT. How could one build a 4-LUT? Use a 1x16 memory whose 16 bits feed a 16:1 mux; the 4 inputs ABCD drive the mux select lines, and the 1-bit mux output is Z.
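
The memory-plus-mux structure can be modeled in a few lines. This is a behavioral sketch only; the class name `Lut` and the choice of A as the most significant address bit are assumptions made for illustration.

```python
class Lut:
    """Behavioral model of an N-LUT: a 1 x 2**N memory read through a mux."""

    def __init__(self, n_inputs: int, contents):
        contents = list(contents)
        if len(contents) != 2 ** n_inputs:
            raise ValueError("memory must hold exactly 2**n_inputs bits")
        self.n_inputs = n_inputs
        self.memory = contents  # the configuration bits

    def read(self, *inputs: int) -> int:
        if len(inputs) != self.n_inputs:
            raise ValueError("wrong number of inputs")
        # The inputs act as the mux select lines (first input = MSB, assumed).
        address = 0
        for bit in inputs:
            address = (address << 1) | (bit & 1)
        return self.memory[address]


# Program two 4-LUTs: only address 1111 is 1 for AND, only 0000 is 0 for OR.
and4 = Lut(4, [0] * 15 + [1])
or4 = Lut(4, [0] + [1] * 15)

print(and4.read(1, 1, 1, 1), or4.read(0, 0, 1, 0))  # 1 1
```

Reprogramming the LUT means rewriting `memory`; the mux structure never changes.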

Computational Fabric - LUT. How many different 4-input functions can a 4-LUT implement? A 4-LUT is a 1x16 memory behind a 16:1 mux, so there are $2^{16} = 65536$.

Computational Fabric - LUT. How many different N-input functions can an N-LUT implement? An N-LUT is a 1x$2^N$ memory feeding a $2^N$:1 mux, so it can implement $2^{2^N}$ functions. For N = 4: $2^{2^4} = 2^{16} = 65536$.
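
The counting argument can be verified by brute force for small N. The sketch below treats each possible memory content as one function; the helper names are hypothetical.

```python
from itertools import product


def count_functions(n: int) -> int:
    """Closed form: one output bit for each of the 2**n input combinations."""
    return 2 ** (2 ** n)


def enumerate_functions(n: int) -> set:
    """Every distinct truth table (memory content) of an n-input LUT."""
    return set(product((0, 1), repeat=2 ** n))


# Brute force agrees with the closed form for a 2-LUT.
print(len(enumerate_functions(2)), count_functions(2))  # 16 16
print(count_functions(4))  # 65536
```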

Granularity of Computation. Trade-offs associated with LUT size. Example: 2-LUT (4 = 2x2 bits) vs. 10-LUT (1024 = 32x32 bits). Microprocessor example. [Figure: a 1024-bit area of 2-LUTs compared with 1024-bit 10-LUTs; each 10-LUT block is drawn with a 4-bit op input and 3-bit signals A and B, and the block is repeated three times.]

Granularity of Computation. Trade-offs associated with LUT size. Example: 2-LUT (4 = 2x2 bits) vs. 10-LUT (1024 = 32x32 bits). Bit logic and constants: (A and "1100") or (B or "1000"), with 4-bit A and B. With 10-LUTs it's much worse than the area that was required using 2-LUTs, because each 10-LUT only has one output. [Figure: the AND and OR operations drawn on a 1024-bit area of 2-LUTs and on 1024-bit 10-LUTs.]
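
The bit-logic example suits small LUTs because, once the constants are folded in, every output bit depends on at most one bit of A and one bit of B. The sketch below assumes 4-bit A and B and derives one 2-LUT truth table per output bit; the function names are hypothetical.

```python
AND_CONST = 0b1100
OR_CONST = 0b1000


def reference(a: int, b: int) -> int:
    """Direct evaluation of (A and "1100") or (B or "1000") on 4-bit values."""
    return ((a & AND_CONST) | (b | OR_CONST)) & 0xF


def build_2lut_tables():
    """One 4-entry truth table per output bit, indexed by (a_i << 1) | b_i."""
    tables = []
    for i in range(4):  # i = 0 is the least significant bit
        c_and = (AND_CONST >> i) & 1
        c_or = (OR_CONST >> i) & 1
        tables.append([(a & c_and) | (b | c_or) for a in (0, 1) for b in (0, 1)])
    return tables


def via_2luts(a: int, b: int, tables) -> int:
    """Evaluate the expression using only the four 2-LUT tables."""
    out = 0
    for i, table in enumerate(tables):
        a_i = (a >> i) & 1
        b_i = (b >> i) & 1
        out |= table[(a_i << 1) | b_i] << i
    return out


tables = build_2lut_tables()
print(tables)  # [[0, 1, 0, 1], [0, 1, 0, 1], [0, 1, 1, 1], [1, 1, 1, 1]]
```

Four 2-LUTs (16 configuration bits) cover the whole expression, whereas a 10-LUT spends 1024 bits on a single output bit.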

Computational Fabric - DFF. LUTs are fine for implementing any arbitrary combinational logic function (output is ONLY a function of its inputs). But what about sequential logic (output is a function of input AND previous state information)? Need memory!! [Figure: 4-LUT with inputs A, B, C, D and output Z.]

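Computational Fabric - DFF. DFF = D Flip-Flop. The 4-LUT (inputs A, B, C, D) is paired with a DFF; the signals are labeled Z(t) and Z(t+1). Example: detect the pattern "1101". [Figure: state machine with states Start, 1, 11, 110, 1101 and transitions labeled input/output, e.g. 1/0, 0/0, 1/1.]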
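
A software sketch of the "1101" detector is given below. It is a Mealy-style machine with four states and overlapping matches allowed, which is one common encoding and may differ from the state diagram on the slide. In hardware, the transition table would live in LUTs and the current state in DFFs.

```python
# (state, input bit) -> (next state, output bit)
# State names record the longest matched prefix of "1101".
TRANSITIONS = {
    ("start", 0): ("start", 0),
    ("start", 1): ("1", 0),
    ("1", 0): ("start", 0),
    ("1", 1): ("11", 0),
    ("11", 0): ("110", 0),
    ("11", 1): ("11", 0),      # still ends in "11"
    ("110", 0): ("start", 0),
    ("110", 1): ("1", 1),      # pattern seen; the last 1 may start a new match
}


def detect_1101(bits: str) -> list:
    """Return one output bit per input bit; 1 marks the end of a "1101"."""
    state = "start"            # plays the role of the DFF contents
    outputs = []
    for ch in bits:
        state, out = TRANSITIONS[(state, int(ch))]
        outputs.append(out)
    return outputs


print(detect_1101("1101101"))  # [0, 0, 0, 1, 0, 0, 1]
```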

Computational Fabric - DFF. DFF = D Flip-Flop. DFFs also increase circuit performance (pipelining): a chain of four 4-LUTs with no registers in between costs 4 LUT delays per output, while placing a DFF after each 4-LUT gives 1 DFF delay per output. [Figure: four chained 4-LUTs, shown without and with a DFF after each stage.]
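
The pipelining trade-off can be put into rough numbers. The delays below (500 ps per LUT, 100 ps of flip-flop overhead) are hypothetical values chosen only for illustration, and the model ignores routing delay.

```python
def clock_period_ps(luts_between_registers: int,
                    t_lut_ps: int = 500,   # assumed LUT delay
                    t_dff_ps: int = 100) -> int:  # assumed DFF overhead
    """Minimum clock period when this many LUTs sit between two DFFs."""
    return luts_between_registers * t_lut_ps + t_dff_ps


STAGES = 4

# No pipelining: all four LUTs must settle within one clock period.
unpipelined_period = clock_period_ps(STAGES)

# Pipelined: a DFF after every LUT, so one LUT per clock period.
pipelined_period = clock_period_ps(1)
pipelined_latency = STAGES * pipelined_period

print(unpipelined_period, pipelined_period, pipelined_latency)  # 2100 600 2400
```

Under these assumptions a new output arrives every 600 ps instead of every 2100 ps, at the cost of a slightly longer end-to-end latency (2400 ps).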

Communication: Interconnect & Routing. Need a mechanism to move results of computation around. Three interconnect styles are shown: Nearest Neighbor, Segmented, and Hierarchical. [Figure: arrays of CLBs wired in each of the three styles.]

Optimized Resources: Dedicated Logic. LUTs + DFFs can implement any arbitrary digital logic, but not optimally (ASICs make much better use of silicon area for power, speed, and routing resources). Dedicated resources: arithmetic (add, multiply); on-chip memory; system-on-chip building blocks (processor, PCI-express, Gigabit Ethernet, ADC, etc.).

Optimized Resources: Dedicated Logic. Fast addition: generate/propagate logic with carry in and carry out, built from a two-output 6-LUT plus dedicated routing resources for the carry. [Figure: inputs A1..A3 and B1..B3 producing Sum 1..Sum 3; carry look-ahead inside a CLB with signals G1, P1, G2, P2, Carry1, Carry 2, and c4.]
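
The generate/propagate idea can be sketched in software. The code below is a generic 4-bit carry look-ahead model, not the specific Xilinx carry-chain circuit: each carry is written as a flat OR of AND terms over the generate (g) and propagate (p) signals, so no carry depends on another carry.

```python
def lookahead_carry(g, p, c0: int, i: int) -> int:
    """Carry into bit i, expressed only in terms of g, p, and the carry in."""
    # The carry in reaches bit i if every lower bit propagates.
    carry = c0 if all(p[:i]) else 0
    for j in range(i):
        # A carry generated at bit j reaches bit i if bits j+1..i-1 propagate.
        if g[j] and all(p[j + 1:i]):
            carry = 1
    return carry


def cla_add(a: int, b: int, carry_in: int = 0, width: int = 4):
    """Return (sum, carry_out) for a width-bit carry look-ahead addition."""
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]
    g = [x & y for x, y in zip(a_bits, b_bits)]  # generate
    p = [x ^ y for x, y in zip(a_bits, b_bits)]  # propagate
    total = 0
    for i in range(width):
        total |= (p[i] ^ lookahead_carry(g, p, carry_in, i)) << i
    return total, lookahead_carry(g, p, carry_in, width)


print(cla_add(0b1011, 0b0110))  # (1, 1): 11 + 6 = 17 = 0b1_0001
```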

Optimized Resources: Dedicated Logic. Embedded memory: 96 bits at 300 MHz, compared with a dedicated memory block of 18 Kbits at 550 MHz. [Figure: both memories drawn with ports labeled 8 and 12.]

Optimized Resources: Dedicated Logic. Multiplication: 18x18 multiply on Virtex-5 (6-LUTs).

Type                          # LUTs   Latency   Speed
LUT-based                     ~400     5 clks    380 MHz
Dedicated 18x18 multiplier    -        3 clks    450 MHz

Very rough estimate of silicon area comparison (assuming SX95 and LX110 have about the same die size): in other words, you can replace one LUT-based 18x18 multiplier with 100 dedicated 18x18 multipliers!!!

Optimized Resources: Dedicated Logic. Processor: PowerPC hard core (500 MHz, superscalar, high-speed 2x5 switch fabric) vs. MicroBlaze soft core (250 MHz, simple scalar).

Optimized Resources: Dedicated Logic. System on chip: dedicated logic combined with reconfigurable logic. [Figure: blocks including RAM, ADC, sensors, matrix multiplier coprocessor, motor, data buffer, PID controller, and Ethernet MAC.] Also see Actel Fusion: http://www.actel.com/products/fusion/default.aspx

Xilinx CLB Architecture Virtex 5 FPGA User Guide

Questions/Comments/Concerns

Computational Fabric - LUT. N-LUT: 3, 4, ..., 6, ..., 8-LUT. Example functions: AND, XOR, NOT. Exercises: (1) How many 4-LUTs to OR 32 bits (draw)? (2) How many 4-LUTs to AND 2 bits with the OR of these 32 bits (draw)? (3) Draw the truth table for the 4-LUT that gives the final output. (4) How could one implement a LUT (memory + mux)? (5) How many ways can a 4-LUT be programmed? (6) How many ways can an N-LUT be programmed? Granularity trade-off: functionality vs. propagation delay (2-LUT -> CPU), bit-level vs. datapath.

Computational Fabric - DFF. Enables building circuits that can store information (sequential circuits, state machines). Enables pipelining to increase operating frequency/throughput.

Communication: Interconnect & Routing. Need a mechanism to move the results of a LUT to other LUTs. Island style (array of CB). Nearest neighbor (paper on a reconfigurable architecture that uses this): not scalable (large delays, and uses logic elements for routing?). Segmented (different lengths for latency trade-off): multi-hop, scales < O(N)? Avoids using logic. Hierarchical (good for apps with lots of local communication and little remote communication). Typically, an FPGA's silicon area will be 10% logic and 90% interconnect!!

Optimized Resources: Hard Cores. LUTs + DFFs can implement any arbitrary digital logic, but not optimally (ASICs make much better use of silicon area for power, speed, and routing resources). Hard cores: arithmetic (add, mult); on-chip memory; system-on-chip building blocks (processor, PCI-express, Gigabit Ethernet, A/D).