CS294-6 Reconfigurable Computing Day 14 October 7/8, 1998 Computing with Lookup Tables.

Compute “Any” Function
What do we use for the “compute” function?
Lookup Table
–can be programmed to perform any function of its inputs

Lookup Table
Load bits into table
–$2^N$ bits to describe
–=> $2^{2^N}$ different functions
Table translation
–performs logic transform
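
A minimal sketch of the idea on this slide: an N-input LUT is just a table of 2^N bits addressed by its inputs. The names `make_lut` and `count_functions`, and the choice of the first input as the most significant address bit, are illustrative assumptions rather than anything from the lecture.

```python
def make_lut(truth_bits):
    """Build an N-input LUT from a table of 2**N bits (hypothetical helper)."""
    n = (len(truth_bits) - 1).bit_length()
    if len(truth_bits) != 2 ** n:
        raise ValueError("table length must be a power of two")

    def lut(*inputs):
        if len(inputs) != n:
            raise ValueError(f"expected {n} inputs")
        address = 0
        for bit in inputs:  # assumption: first input is the most significant bit
            address = (address << 1) | (bit & 1)
        return truth_bits[address]

    return lut


def count_functions(n):
    """Number of distinct functions an N-input LUT can be programmed to compute."""
    return 2 ** (2 ** n)


# The same structure becomes XOR or AND purely by loading different bits.
xor2 = make_lut([0, 1, 1, 0])
and2 = make_lut([0, 0, 0, 1])
```

Reprogramming is only a matter of loading a different bit vector; the lookup hardware itself never changes.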

Lookup Table

How much logic in a LUT?
Upper Upper Bound:
–M-LUT implemented w/ 4-LUTs
–M-LUT $\le 2^{M-4} + (2^{M-4} - 1) \le 2^{M-3}$ 4-LUTs
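
One way to read the two terms in this bound, offered as an assumption about the intended construction: 2^(M-4) 4-LUTs each hold a 16-bit slice of the big table, and a tree of 2^(M-4) - 1 two-input multiplexers (each fitting in one 4-LUT) selects among them. The sketch below evaluates an M-input table that way and counts the 4-LUTs used; the function names are hypothetical.

```python
def four_luts_upper_bound(m):
    """4-LUTs used by the slice-plus-mux-tree construction (m >= 4)."""
    leaves = 2 ** (m - 4)   # one 4-LUT per 16-bit slice of the table
    muxes = leaves - 1      # 2:1 mux tree, one 4-LUT per mux
    return leaves + muxes


def eval_m_lut_with_4luts(table, inputs):
    """Evaluate an M-input table using only 4-LUT-sized lookups and 2:1 muxes.

    inputs[0] is taken as the most significant address bit (an assumption).
    Returns (output_bit, number_of_4luts_used).
    """
    m = len(inputs)
    if m < 4 or len(table) != 2 ** m:
        raise ValueError("need m >= 4 and a table of 2**m bits")
    low = inputs[-4:]  # the last four inputs address every leaf 4-LUT
    low_addr = low[0] * 8 + low[1] * 4 + low[2] * 2 + low[3]
    level = [table[16 * i + low_addr] for i in range(2 ** (m - 4))]
    used = len(level)
    # Remaining inputs drive the mux tree, least significant select first.
    for sel in reversed(inputs[:-4]):
        level = [level[2 * i + sel] for i in range(len(level) // 2)]
        used += len(level)
    return level[0], used
```

The count comes out one below 2^(M-3), which is why the slide can round the bound up to 2^(M-3).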

How Much?
Lower Upper Bound:
–$2^{2^M}$ functions realizable by M-LUT
–Need n 4-LUTs to cover
$(2^{2^4})^n \ge 2^{2^M}$
$n \log(2^{2^4}) \ge \log(2^{2^M})$
$n \cdot 2^4 \log(2) \ge 2^M \log(2)$
$n \cdot 2^4 \ge 2^M$
$n \ge 2^{M-4}$

How Much?
Both Lower Upper Bound and Upper Upper Bound
–(number of 4-LUTs in M-LUT)
–$2^{M-4} \le n \le 2^{M-3}$
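
The counting argument behind the lower end of this range can be checked by brute force. The sketch below (function name hypothetical) searches for the smallest n such that n 4-LUTs have at least as many configurations as an M-LUT has functions, then pairs it with the 2^(M-3) figure from the construction-based bound.

```python
def min_four_luts_by_counting(m):
    """Smallest n with (2**16)**n >= 2**(2**m): the counting lower bound."""
    n = 1
    while (2 ** 16) ** n < 2 ** (2 ** m):
        n += 1
    return n


def four_lut_bounds(m):
    """(lower, upper) number of 4-LUTs for the hardest M-input functions."""
    return min_four_luts_by_counting(m), 2 ** (m - 3)


if __name__ == "__main__":
    for m in range(4, 11):
        lower, upper = four_lut_bounds(m)
        print(f"M={m:2d}: {lower} <= n <= {upper}")
```

The counting bound ignores interconnect configuration between the 4-LUTs, so it is a simplification in the same spirit as the slide.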

Memories and 4-LUTs
For the most complex functions an M-LUT has ~$2^{M-4}$ 4-LUTs
SRAM 32Kx8, $\lambda = 0.6\,\mu$m
–170M$\lambda^2$ (21ns latency)
–$8 \times 2^{11}$ = 16K 4-LUTs
XC3042, $\lambda = 0.6\,\mu$m
–180M$\lambda^2$ (13ns delay per CLB)
–288 4-LUTs
Memory is 50+x denser than FPGA
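
The arithmetic behind the "50+x" figure can be reproduced from the numbers on this slide, assuming the areas are in millions of lambda-squared and treating the 32Kx8 SRAM as eight 15-input LUTs. The variable and function names below are illustrative.

```python
# Figures quoted on the slide; areas assumed to be in millions of lambda^2.
SRAM_AREA = 170
SRAM_ADDRESS_BITS = 15   # 32K words = 2**15 addresses
SRAM_OUTPUTS = 8
FPGA_AREA = 180          # XC3042
FPGA_4LUTS = 288


def sram_equivalent_4luts(address_bits, outputs):
    """4-LUT equivalents, using ~2**(M-4) 4-LUTs per M-input output bit."""
    return outputs * 2 ** (address_bits - 4)


sram_luts = sram_equivalent_4luts(SRAM_ADDRESS_BITS, SRAM_OUTPUTS)
sram_density = sram_luts / SRAM_AREA    # 4-LUT equivalents per unit area
fpga_density = FPGA_4LUTS / FPGA_AREA
density_ratio = sram_density / fpga_density

if __name__ == "__main__":
    print(f"SRAM 4-LUT equivalents: {sram_luts}")
    print(f"density ratio (memory / FPGA): {density_ratio:.1f}")
```

This comparison only holds for functions that actually need the full table, the "most complex functions" case.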

Memory and 4-LUTs
For “regular” functions?
15-bit parity
–entire 32Kx8 SRAM
–5 4-LUTs (2% of XC3042 ~ 3.2M$\lambda^2$ ~ 1/50th Memory)
7b Add
–entire 32Kx8 SRAM
–14 4-LUTs (5% of XC3042, 8.8M$\lambda^2$ ~ 1/20th Memory)
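
To make the parity case concrete, here is a sketch that builds 15-bit parity from a tree of 4-input XOR LUTs and counts how many are needed. It also repeats the area arithmetic using the XC3042 figures quoted earlier (180M lambda-squared for 288 4-LUTs). The helper names and the choice to tie unused LUT inputs to 0 are assumptions made for illustration.

```python
XOR4_TABLE = [bin(i).count("1") & 1 for i in range(16)]  # 4-LUT programmed as XOR


def xor4_lut(a, b, c, d):
    """One 4-LUT lookup: inputs form the address into the 16-bit table."""
    return XOR4_TABLE[a * 8 + b * 4 + c * 2 + d]


def parity_with_4luts(bits):
    """Reduce bits to their parity with 4-LUTs; return (parity, luts_used)."""
    signals = list(bits)
    used = 0
    while len(signals) > 1:
        group = signals[:4]
        group += [0] * (4 - len(group))          # tie unused LUT inputs to 0
        signals = signals[4:] + [xor4_lut(*group)]
        used += 1
    return signals[0], used


def xc3042_area_for(luts, chip_area=180, chip_luts=288):
    """Pro-rated chip area (millions of lambda^2) for a given 4-LUT count."""
    return chip_area * luts / chip_luts


if __name__ == "__main__":
    value = 0b101100111000101
    bits = [(value >> i) & 1 for i in range(15)]
    print(parity_with_4luts(bits))
    print(xc3042_area_for(5), xc3042_area_for(14))
```

Each 4-LUT replaces four signals with one, so 15 inputs shrink to a single output in five lookups, while the same function consumes the whole 32K-entry memory.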

LUT + Interconnect
Interconnect allows us to exploit structure in computation
Already know
–LUT Area << Interconnect Area
–Area of an M-LUT on FPGA >> M-LUT Area
…but most M-input functions
–complexity << $2^M$

LUT Count vs. base LUT size

LUT vs. K DES MCNC Benchmark –moderately irregular

Toronto Experiments
Want to determine best K for LUTs
Bigger LUTs
–handle complicated functions efficiently
–less interconnect overhead
Smaller LUTs
–handle regular functions efficiently
–interconnect allows exploitation of compute structure
What’s the typical complexity/structure?

Toronto LUT Size
Map to K-LUT
–use Chortle
Route to determine wiring tracks
–global route
–different channel width W for each benchmark
Area Model for K and W

LUT Area vs. K Routing Area roughly linear in K

Mapped LUT Area Compose Mapped LUTs and Area Model

Mapped Area vs. LUT K N.B. unusual case minimum area at K=3

Toronto Result Minimum LUT Area –at K=4 –robust for different switch sizes (wire widths)

Delay?
Circuit Depth in LUTs?
“Simple Function” --> M-input AND
–1 table lookup in M-LUT
–$\log_K(M)$ in K-LUT
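
A short sketch of the depth count for the M-input AND: each level of K-LUTs reduces the number of live signals by a factor of K, so the depth is the logarithm base K of M rounded up to a whole number of levels. The function name is hypothetical, and integer arithmetic is used to avoid floating-point rounding.

```python
def and_tree_depth(m, k):
    """Levels of K-LUTs needed to AND together m inputs (ceil of log_k(m))."""
    if m < 1 or k < 2:
        raise ValueError("need m >= 1 and k >= 2")
    depth, reach = 0, 1
    while reach < m:      # 'reach' is how many inputs 'depth' levels can cover
        reach *= k
        depth += 1
    return depth


if __name__ == "__main__":
    for k in (2, 4, 6, 8):
        print(f"K={k}: 64-input AND needs depth {and_tree_depth(64, k)}")
```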

Delay?
M-input “Complex” function
–1 table lookup for M-LUT
–between: $\lceil (M-K)/\log_2(K) \rceil + 1$
–and $\lceil (M-K)/\log_2(K - \log_2(K)) \rceil + 1$
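
The two depth expressions on this slide can be tabulated directly. This sketch assumes the lost brackets around each fraction are ceilings, and it restricts K to 3 or more because the second expression divides by log2(K - log2(K)), which is zero at K=2. The function name is hypothetical.

```python
import math


def complex_depth_bounds(m, k):
    """(low, high) K-LUT depth for an M-input 'complex' function.

    Assumes the slide's expressions are ceil((M-K)/log2(K)) + 1 and
    ceil((M-K)/log2(K - log2(K))) + 1.
    """
    if k < 3 or m < k:
        raise ValueError("need k >= 3 and m >= k")
    low = math.ceil((m - k) / math.log2(k)) + 1
    high = math.ceil((m - k) / math.log2(k - math.log2(k))) + 1
    return low, high


if __name__ == "__main__":
    for k in range(3, 9):
        print(f"K={k}: depth between {complex_depth_bounds(16, k)}")
```

When M equals K both expressions collapse to a single lookup, matching the M-LUT case on the slide.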

Circuit Depth vs. K

LUT Delay vs. K
For small LUTs:
–$t_{LUT} \approx c_0 + c_1 \times K$
Large LUTs:
–add length term
–$c_2 \times \sqrt{2^K}$
Plus Wire Delay
–~$\sqrt{\text{area}}$

Delay vs. K
Delay = Depth $\times$ ($t_{LUT}$ + $t_{Interconnect}$)
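
The product form above can be explored with a small model. This sketch uses only the small-LUT delay term (a constant plus a term linear in K) and the depth of an M-input AND; the constants `C0`, `C1`, and `T_INTERCONNECT` are made-up values chosen for illustration, not figures from the lecture, and the area-dependent wire delay is deliberately left out.

```python
def lut_delay(k, c0, c1):
    """Small-LUT delay model: a fixed term plus a term linear in K."""
    return c0 + c1 * k


def and_depth(m, k):
    """Depth in K-LUTs of an m-input AND (ceil of log_k(m))."""
    depth, reach = 0, 1
    while reach < m:
        reach *= k
        depth += 1
    return depth


def path_delay(depth, t_lut, t_interconnect):
    """Delay = Depth * (t_LUT + t_Interconnect)."""
    return depth * (t_lut + t_interconnect)


# Hypothetical constants in ns; not taken from the slides.
C0, C1, T_INTERCONNECT = 1.0, 0.25, 3.0
M = 64

delays = {
    k: path_delay(and_depth(M, k), lut_delay(k, C0, C1), T_INTERCONNECT)
    for k in range(2, 9)
}

if __name__ == "__main__":
    for k, d in delays.items():
        print(f"K={k}: {d:.2f} ns")
```

Because interconnect delay is held constant here, the numbers only illustrate how depth and per-LUT delay trade off; a fuller model would let both LUT and wire delay grow with area.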

Observation
General interconnect is expensive
“Larger” logic blocks
–=> less interconnect crossing
–=> lower interconnect delay
–=> get larger
–=> get slower (faster than modeled here, due to area)
–=> less area efficient (don’t match structure in computation)

Different Structure How can we have “larger” compute nodes (less general interconnect) without paying huge area penalty of large LUTs?

Structure in subgraphs Small LUTs capture structure Structure of small LUT-mapped netlists?

Structure LUT sequences ubiquitous

Hardwired Logic Blocks Single Output

Hardwired Logic Blocks Two outputs

Summary
Memory most dense programmable structure for the most complex functions
Memory inefficient (scales poorly) for structured compute tasks
Most tasks have some structure
Programmable Interconnect allows us to exploit that structure

Summary
Area
–LUT count decreases w/ K, but slower than exponentially
–LUT size increases w/ K (exponential LUT function area, empirically linear routing area)
–Minimum area around K=4

Summary
Delay
–LUT depth decreases with K (in practice closer to log(K))
–Delay increases with K (small K: linear + large fixed term)
–minimum around K=5-6
Better structure match with hardwired LUT cascades