CS184a: Computer Architecture (Structure and Organization)

Slides:



Advertisements
Similar presentations
CS370 – Spring 2003 Programmable Logic Devices PALs/PLAs.
Advertisements

Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 14: March 19, 2014 Compute 2: Cascades, ALUs, PLAs.
Programmable Logic Devices
©2004 Brooks/Cole FIGURES FOR CHAPTER 9 MULTIPLEXERS, DECODERS, AND PROGRAMMABLE LOGIC DEVICES Click the mouse to move to the next page. Use the ESC key.
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day10: October 25, 2000 Computing Elements 2: Cascades, ALUs,
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day8: October 18, 2000 Computing Elements 1: LUTs.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 11: February 14, 2007 Compute 1: LUTs.
Chapter # 4: Programmable and Steering Logic Section 4.1
CS294-6 Reconfigurable Computing Day 14 October 7/8, 1998 Computing with Lookup Tables.
Multiplexers, Decoders, and Programmable Logic Devices
Chapter 6 Memory and Programmable Logic Devices
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 12: February 21, 2007 Compute 2: Cascades, ALUs, PLAs.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 13: February 4, 2005 Interconnect 1: Requirements.
Digital Computer Concept and Practice Copyright ©2012 by Jaejin Lee Logic Circuits I.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 15: February 12, 2003 Interconnect 5: Meshes.
CBSSS 2002: DeHon Costs André DeHon Wednesday, June 19, 2002.
CALTECH CS137 Fall DeHon 1 CS137: Electronic Design Automation Day 4: October 5, 2005 Two-Level Logic-Synthesis.
CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #4 – FPGA.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 13: February 6, 2003 Interconnect 3: Richness.
CEC 220 Digital Circuit Design Programmable Logic Devices
Logic and Computer Design Fundamentals, Fifth Edition Mano | Kime | Martin Copyright ©2016, 2008, 2004 by Pearson Education, Inc. All rights reserved.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 10: January 31, 2003 Compute 2:
Penn ESE534 Spring DeHon 1 ESE534 Computer Organization Day 9: February 13, 2012 Interconnect Introduction.
Caltech CS184 Winter DeHon CS184a: Computer Architecture (Structure and Organization) Day 4: January 15, 2003 Memories, ALUs, Virtualization.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 11: January 31, 2005 Compute 1: LUTs.
CBSSS 2002: DeHon Universal Programming Organization André DeHon Tuesday, June 18, 2002.
Programmable Logic Devices
ESE532: System-on-a-Chip Architecture
This chapter in the book includes: Objectives Study Guide
ETE Digital Electronics
ESE534: Computer Organization
Chapter # 4: Programmable Logic
Transistors and Logic Circuits
Topics SRAM-based FPGA fabrics: Xilinx. Altera..
This chapter in the book includes: Objectives Study Guide
Presentation on FPGA Technology of
XILINX FPGAs Xilinx lunched first commercial FPGA XC2000 in 1985
ESE534 Computer Organization
ESE532: System-on-a-Chip Architecture
CS184a: Computer Architecture (Structure and Organization)
Instructor: Dr. Phillip Jones
Lecture 9 Logistics Last lecture Today HW3 due Wednesday
CS137: Electronic Design Automation
This chapter in the book includes: Objectives Study Guide
FIGURE 7.1 Conventional and array logic diagrams for OR gate
ESE534: Computer Organization
CSCE 211: Digital Logic Design
CSCE 211: Digital Logic Design
CSE 370 – Winter 2002 – Comb. Logic building blocks - 1
Programmable Logic.
Lecture 8 Logistics Last lecture Last last lecture Today
Programmable Logic Devices
Programmable Configurations
Lecture 10 Logistics Last lecture Today
Lecture 11 Logistics Last lecture Today HW3 due now
CSCE 211: Digital Logic Design
Part I Background and Motivation
CSCE 211: Digital Logic Design
Recall: ROM example Here are three functions, V2V1V0, implemented with an 8 x 3 ROM. Blue crosses (X) indicate connections between decoder outputs and.
CSE 370 – Winter 2002 – Comb. Logic building blocks - 1
Programmable Logic Arrays, Test Review
ESE534: Computer Organization
"Computer Design" by Sunggu Lee
Overview Last lecture Timing; Hazards/glitches
ESE535: Electronic Design Automation
FIGURE 5-1 MOS Transistor, Symbols, and Switch Models
Arithmetic Building Blocks
CS184a: Computer Architecture (Structure and Organization)
CprE / ComS 583 Reconfigurable Computing
Reconfigurable Computing (EN2911X, Fall07)
Presentation transcript:

CS184a: Computer Architecture (Structure and Organization) Day 12: February 2, 2005 Compute 2: Cascades, ALUs, PLAs Caltech CS184 Winter2003 -- DeHon

Last Time LUTs area structure big LUTs vs. small LUTs with interconnect design space optimization Caltech CS184 Winter2003 -- DeHon

Today Cascades ALUs PLAs Caltech CS184 Winter2003 -- DeHon

Last Time Larger LUTs General: Larger compute blocks Large LUTs Less interconnect delay General: Larger compute blocks Minimize interconnect crossings Large LUTs Not efficient for typical logic structure Caltech CS184 Winter2003 -- DeHon

Different Structure How can we have “larger” compute nodes (less general interconnect) without paying huge area penalty of large LUTs? Caltech CS184 Winter2003 -- DeHon

Structure in subgraphs Small LUTs capture structure What structure does a small-LUT-mapped netlist have? Caltech CS184 Winter2003 -- DeHon

Structure LUT sequences ubiquitous Caltech CS184 Winter2003 -- DeHon

Hardwired Logic Blocks Single Output Caltech CS184 Winter2003 -- DeHon

Hardwired Logic Blocks Two outputs Caltech CS184 Winter2003 -- DeHon

Delay Model Tcascade =T(3LUT) + T(mux) Don’t pay General interconnect Full 4-LUT delay Caltech CS184 Winter2003 -- DeHon

Options Caltech CS184 Winter2003 -- DeHon

Chung & Rose Study [Chung & Rose, DAC ’92] Caltech CS184 Winter2003 -- DeHon

Cascade LUT Mappings [Chung & Rose, DAC ’92] Caltech CS184 Winter2003 -- DeHon

ALU vs. Cascaded LUT? Caltech CS184 Winter2003 -- DeHon

Datapath Cascade ALU/LUT (datapath) Cascade Long “serial” path w/out general interconnect Pay only Tmux and nearest-neighbor interconnect Caltech CS184 Winter2003 -- DeHon

4-LUT Cascade ALU Caltech CS184 Winter2003 -- DeHon

ALU vs. LUT ? Compare/contrast ALU Only subset of ops available Denser coding for those ops Smaller …but interconnect dominates [Datapath width orthogonal to function] Caltech CS184 Winter2003 -- DeHon

Parallel Prefix LUT Cascade? Can we do better than N×Tmux? Can we compute LUT cascade in O(log(N)) time? Can we compute mux cascade using parallel prefix? Can we make mux cascade associative? Caltech CS184 Winter2003 -- DeHon

Parallel Prefix Mux cascade How can mux transform Smux-out? A=0, B=0  mux-out=0 A=1, B=1  mux-out=1 A=0, B=1  mux-out=S A=1, B=0  mux-out=/S Caltech CS184 Winter2003 -- DeHon

Parallel Prefix Mux cascade How can mux transform Smux-out? A=0, B=0  mux-out=0 Stop= S A=1, B=1  mux-out=1 Generate= G A=0, B=1  mux-out=S Buffer = B A=1, B=0  mux-out=/S Invert = I Caltech CS184 Winter2003 -- DeHon

Parallel Prefix Mux cascade How can 2 muxes transform input? Can I compute 2-mux transforms from 1 mux transforms? Caltech CS184 Winter2003 -- DeHon

Two-mux transforms SSS SGG SBS SIG GSS GGG GBG GIS BSS BGG BBB BII ISS IGG IBI IIB Caltech CS184 Winter2003 -- DeHon

Generalizing mux-cascade How can N muxes transform the input? Is mux transform composition associative? Caltech CS184 Winter2003 -- DeHon

Parallel Prefix Mux-cascade Can be hardwired, no general interconnect Caltech CS184 Winter2003 -- DeHon

“ALU”s Unpacked Traditional/Datapath ALUs SIMD/Datapath Control Architecture variable w Long Cascade Typically also w, but can shorter/longer Amenable to parallel prefix implementation in O(log(w)) time w/ O(w) space Restricted function Reduces instruction bits Reduces expressiveness Caltech CS184 Winter2003 -- DeHon

Commercial Devices Caltech CS184 Winter2003 -- DeHon

Xilinx XC4000 CLB Caltech CS184 Winter2003 -- DeHon

Xilinx Virtex-II Caltech CS184 Winter2003 -- DeHon

Caltech CS184 Winter2003 -- DeHon

Caltech CS184 Winter2003 -- DeHon

Caltech CS184 Winter2003 -- DeHon

Altera Stratix Caltech CS184 Winter2003 -- DeHon

Caltech CS184 Winter2003 -- DeHon

Caltech CS184 Winter2003 -- DeHon

Programmable Array Logic (PLAs) Caltech CS184 Winter2003 -- DeHon

PLA Directly implement flat (two-level) logic O=a*b*c*d + !a*b*!d + b*!c*d Exploit substrate properties allow wired-OR Caltech CS184 Winter2003 -- DeHon

Wired-or Connect series of inputs to wire Any of the inputs can drive the wire high Caltech CS184 Winter2003 -- DeHon

Wired-or Implementation with Transistors Caltech CS184 Winter2003 -- DeHon

Programmable Wired-or Use some memory function to programmable connect (disconnect) wires to OR Fuse: Caltech CS184 Winter2003 -- DeHon

Programmable Wired-or Gate-memory model Caltech CS184 Winter2003 -- DeHon

Diagram Wired-or Caltech CS184 Winter2003 -- DeHon

Wired-or array Build into array Compute many different or functions from set of inputs Caltech CS184 Winter2003 -- DeHon

Combined or-arrays to PLA Combine two or (nor) arrays to produce PLA (and-or array) Programmable Logic Array Caltech CS184 Winter2003 -- DeHon

PLA Can implement each and on single line in first array Can implement each or on single line in second array Caltech CS184 Winter2003 -- DeHon

PLA Efficiency questions: Each and/or is linear in total number of potential inputs (not actual) How many product terms between arrays? Caltech CS184 Winter2003 -- DeHon

PLA Product Terms Can be exponential in number of inputs E.g. n-input xor (parity function) When flatten to two-level logic, requires exponential product terms a*!b+!a*b a*!b*!c+!a*b*!c+!a*!b*c+a*b*c …and shows up in important functions Like addition… Caltech CS184 Winter2003 -- DeHon

PLAs Fast Implementations for large ANDs or ORs Number of P-terms can be exponential in number of input bits most complicated functions not exponential for many functions Can use arrays of small PLAs to exploit structure like we saw arrays of small memories last time Caltech CS184 Winter2003 -- DeHon

PLAs vs. LUTs? Look at Inputs, Outputs, P-Terms minimum area (one study, see paper) K=10, N=12, M=3 A(PLA 10,12,3) comparable to 4-LUT? 80-130%? 300% on ECC (structure LUT can exploit) Delay? Claim 40% fewer logic levels (4-LUT) (general interconnect crossings) [Kouloheris & El Gamal/CICC’92] Caltech CS184 Winter2003 -- DeHon

PLA Caltech CS184 Winter2003 -- DeHon

PLA and Memory Caltech CS184 Winter2003 -- DeHon

PLA and PAL PAL = Programmable Array Logic Caltech CS184 Winter2003 -- DeHon

Conventional/Commercial FPGA Altera 9K (from databook) Caltech CS184 Winter2003 -- DeHon

Conventional/Commercial FPGA Altera 9K (from databook) Like PAL Caltech CS184 Winter2003 -- DeHon

Big Ideas [MSB Ideas] Programmable Interconnect allows us to exploit that structure want to match to application structure Prog. interconnect delay expensive Hardwired Cascades key technique to reducing delay in programmables PLAs canonical two level structure hardwire portions to get Memories, PALs Caltech CS184 Winter2003 -- DeHon

Big Ideas [MSB-1 Ideas] Better structure match with hardwired LUT cascades Caltech CS184 Winter2003 -- DeHon