Caltech CS184a Fall2000 -- DeHon1 CS184a: Computer Architecture (Structures and Organization) Day8: October 18, 2000 Computing Elements 1: LUTs.

Presentation transcript:

Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day8: October 18, 2000 Computing Elements 1: LUTs

Caltech CS184a Fall DeHon2 Last Time Instruction Space Modeling –huge range of densities –huge range of efficiencies –large architecture space –modeling to understand design space Started on Empirical Comparisons –[not sure when we’ll finish this up]

Caltech CS184a Fall DeHon3 Today Look at Programmable Compute Blocks Specifically LUTs Today Recurring theme: –define parameterized space –identify costs and benefits –look at typical application requirements –compose results, try to find best point

Caltech CS184a Fall DeHon4 Compute Function What do we use for “compute” function Any Universal –NANDx –ALU –LUT

Caltech CS184a Fall DeHon5 Lookup Table Load bits into table –$2^N$ bits to describe –=> $2^{2^N}$ different functions Table translation –performs logic transform
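The lookup-table idea on this slide can be sketched in a few lines of Python. This is only an illustration: the helper names (`make_lut`, `lut_eval`, `num_functions`) are hypothetical and not from the lecture. An N-input LUT is stored as a list of 2^N bits, and evaluation is a single indexed read.

```python
from itertools import product

def make_lut(n_inputs, func):
    """Load the table: one stored bit per input combination (2**n_inputs bits)."""
    return [func(*bits) & 1 for bits in product((0, 1), repeat=n_inputs)]

def lut_eval(table, *bits):
    """Evaluate by lookup: the input bits form the address (first bit is the MSB)."""
    addr = 0
    for b in bits:
        addr = (addr << 1) | (b & 1)
    return table[addr]

def num_functions(n_inputs):
    """Each of the 2**n_inputs table bits is free, giving 2**(2**n_inputs) functions."""
    return 2 ** (2 ** n_inputs)

# Example: program a 3-LUT as a 3-input XOR (parity).
xor3 = make_lut(3, lambda a, b, c: a ^ b ^ c)
```

For instance, `num_functions(4)` gives 65536, the number of distinct functions a single 4-LUT can be programmed to compute.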

Caltech CS184a Fall DeHon6 Lookup Table

Caltech CS184a Fall DeHon7 We could... Just build a large memory = large LUT Put our function in there What’s wrong with that?

Caltech CS184a Fall DeHon8 FPGA = Many small LUTs Alternative to one big LUT

Caltech CS184a Fall DeHon9 Toronto FPGA Model

Caltech CS184a Fall DeHon10 What’s best to use? Small LUTs Large Memories …small LUTs or large LUTs …or, how big should our memory blocks used to perform computation be?

Caltech CS184a Fall DeHon11 Start to Sort Out: Big vs. Small LUTs Establish equivalence –how many small LUTs equal one big LUT?

Caltech CS184a Fall DeHon12 “gates” in 2-LUT ?

Caltech CS184a Fall DeHon13 How Much Logic in a LUT? Lower Bound? –Concrete: 4-LUTs to implement M-LUT Not use all inputs? –0 … maybe 1 Use all inputs? –(M-1)/3, e.g. M-input AND: cover 4 inputs w/ first 4-LUT; each additional 4-LUT takes the cascade input plus 3 more –(M-1)/(K-1) for K-LUT
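A small sketch of the cascade behind the lower bound, assuming the M-input AND example from the slide. The first K-LUT absorbs K inputs; every later LUT takes the cascade signal plus K-1 fresh inputs, so the count comes to ceil((M-1)/(K-1)), which is the slide's (M-1)/3 for 4-LUTs. The function names are illustrative only.

```python
def and_cascade(bits, k=4):
    """Evaluate an AND as a chain of k-input LUTs; return (result, luts_used)."""
    acc = all(bits[:k])          # first LUT covers k primary inputs
    used = 1
    pos = k
    while pos < len(bits):
        # each further LUT: 1 cascade input + (k - 1) new primary inputs
        acc = acc and all(bits[pos:pos + k - 1])
        used += 1
        pos += k - 1
    return int(acc), used

def cascade_lut_count(m, k=4):
    """Closed form for the same count: ceil((m - 1) / (k - 1)), at least 1."""
    return max(1, -(-(m - 1) // (k - 1)))
```

With 16 inputs and 4-LUTs, both the simulated chain and the closed form give 5 LUTs.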

Caltech CS184a Fall DeHon14 How much logic in a LUT? Upper Upper Bound: –M-LUT implemented w/ 4-LUTs –M-LUT $\le 2^{M-4} + (2^{M-4} - 1) \le 2^{M-3}$ 4-LUTs
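One way to read this bound, sketched under the assumption that the construction is a Shannon-style decomposition: hold 4 of the M inputs as LUT addresses, use one 4-LUT for each of the 2^(M-4) settings of the remaining inputs, and select among them with a tree of 2:1 multiplexers. A 2:1 mux has three inputs, so each one fits in a 4-LUT. The function name is hypothetical.

```python
def shannon_4lut_count(m):
    """4-LUTs used to build an arbitrary m-input function from leaves plus a mux tree."""
    if m <= 4:
        return 1
    leaves = 2 ** (m - 4)   # one 4-LUT per assignment of the other m - 4 inputs
    muxes = leaves - 1      # binary tree of 2:1 muxes, one 4-LUT each
    return leaves + muxes

# The total is 2**(m-3) - 1, which stays within the slide's 2**(m-3) bound.
counts = {m: shannon_4lut_count(m) for m in (5, 8, 15)}
```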

Caltech CS184a Fall DeHon15 How Much? Lower Upper Bound: –$2^{2^M}$ functions realizable by M-LUT –Say need $n$ 4-LUTs to cover; compute $n$: strategy: count functions realizable by each –$(2^{2^4})^n \ge 2^{2^M}$ –$n \log(2^{2^4}) \ge \log(2^{2^M})$ –$n\,2^4 \log(2) \ge 2^M \log(2)$ –$n\,2^4 \ge 2^M$ –$n \ge 2^{M-4}$

Caltech CS184a Fall DeHon16 How Much? Combine –Lower Upper Bound –Upper Upper Bound –(number of 4-LUTs in M-LUT) $2^{M-4} \le n \le 2^{M-3}$
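The counting argument can be checked numerically with Python's arbitrary-precision integers. This sketch follows the slide's simplification of counting only the LUT contents (n LUTs hold 16n configuration bits) and finds the smallest n whose configurations are at least as numerous as the M-input functions. The function name is an assumption for illustration.

```python
def min_luts_by_counting(m, k=4):
    """Smallest n with (2**(2**k))**n >= 2**(2**m), found by direct search."""
    target = 2 ** (2 ** m)      # number of distinct m-input functions
    per_lut = 2 ** (2 ** k)     # number of distinct programmings of one k-LUT
    n = 1
    while per_lut ** n < target:
        n += 1
    return n

# Compare against the closed-form bounds 2**(m-4) <= n <= 2**(m-3).
bounds = {m: (min_luts_by_counting(m), 2 ** (m - 3)) for m in range(5, 11)}
```

For M from 5 to 10 the search returns exactly 2^(M-4), so the two bounds differ by a factor of two.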

Caltech CS184a Fall DeHon17 Memories and 4-LUTs For the most complex functions an M-LUT has ~$2^{M-4}$ 4-LUTs SRAM 32Kx8, λ=0.6µm –170Mλ² (21ns latency) –$8 \times 2^{11}$ = 16K 4-LUTs XC3042, λ=0.6µm –180Mλ² (13ns delay per CLB) –288 4-LUTs Memory is 50+× denser than FPGA –…and faster
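The density comparison is plain arithmetic on the figures quoted on the slide (170M and 180M in the slide's area unit, 288 4-LUTs in the XC3042). A quick check, with variable names chosen here for illustration:

```python
# 32Kx8 SRAM: 15 address bits, 8 outputs; each output is a 15-LUT worth ~2**(15-4) 4-LUTs.
sram_4lut_equiv = 8 * 2 ** (15 - 4)
sram_area = 170e6            # area figure from the slide
fpga_4luts = 288             # XC3042 4-LUT count from the slide
fpga_area = 180e6            # area figure from the slide

# Ratio of 4-LUT-equivalents per unit area, memory vs. FPGA.
density_ratio = (sram_4lut_equiv / sram_area) / (fpga_4luts / fpga_area)
```

The ratio comes out at about 60, in line with the slide's "50+" claim for the most complex functions.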

Caltech CS184a Fall DeHon18 Memory and 4-LUTs For “regular” functions? 15-bit parity –entire 32Kx8 SRAM –5 4-LUTs (2% of XC3042 ~ 3.2Mλ² ~1/50th Memory) 7b Add –entire 32Kx8 SRAM –14 4-LUTs (5% of XC3042, 8.8Mλ² ~1/20th Memory)

Caltech CS184a Fall DeHon19 LUT + Interconnect Interconnect allows us to exploit structure in computation Already know –LUT Area << Interconnect Area –Area of an M-LUT on FPGA >> M-LUT Area …but most M-input functions –complexity << $2^M$
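A sketch of where the small LUT counts for regular functions can come from. The parity count follows from reducing 15 inputs with 4-input XOR stages. The adder figure is reproduced here under the assumption of one sum LUT and one carry LUT per bit, which the slide does not spell out. Function and variable names are illustrative.

```python
def reduction_lut_count(m, k=4):
    """k-LUTs needed to reduce m inputs with an associative operator such as XOR."""
    count, signals = 0, m
    while signals > 1:
        take = min(k, signals)
        signals -= take - 1     # 'take' signals in, one signal out
        count += 1
    return count

parity_luts = reduction_lut_count(15)    # 15-bit parity
adder_bits = 7
adder_luts = 2 * adder_bits              # assumed: sum LUT + carry LUT per bit
adder_inputs = 2 * adder_bits + 1        # two operands plus carry-in
adder_outputs = adder_bits + 1           # sum bits plus carry-out

# Share of the 288 4-LUTs in an XC3042, as a rounded percentage.
parity_pct = round(100 * parity_luts / 288)
adder_pct = round(100 * adder_luts / 288)
```

The 7-bit adder has 15 inputs and 8 outputs, which is why it fills the whole 32Kx8 memory while needing only a few percent of the FPGA.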

Caltech CS184a Fall DeHon20 Different Instance, Same Concept Most general functions are huge Applications exhibit structure Exploit structure to optimize “common” case

Caltech CS184a Fall DeHon21 LUT Count vs. base LUT size

Caltech CS184a Fall DeHon22 LUT vs. K DES MCNC Benchmark –moderately irregular

Caltech CS184a Fall DeHon23 Toronto Experiments Want to determine best K for LUTs Bigger LUTs –handle complicated functions efficiently –less interconnect overhead Smaller LUTs –handle regular functions efficiently –interconnect allows exploitation of compute structure What’s the typical complexity/structure?

Caltech CS184a Fall DeHon24 Familiar Systematization Define a design/optimization space –pick key parameters –e.g. K = number of LUT inputs Build a cost model Map designs → look at resource costs at each point Compose: Logical Resources × Resource Cost Look for best design points

Caltech CS184a Fall DeHon25 Toronto LUT Size Map to K-LUT –use Chortle Route to determine wiring tracks –global route –different channel width W for each benchmark Area Model for K and W

Caltech CS184a Fall DeHon26 LUT Area vs. K Routing Area roughly linear in K

Caltech CS184a Fall DeHon27 Mapped LUT Area Compose Mapped LUTs and Area Model

Caltech CS184a Fall DeHon28 Mapped Area vs. LUT K N.B. unusual case minimum area at K=3

Caltech CS184a Fall DeHon29 Toronto Result Minimum LUT Area –at K=4 –Important to note minimum on previous slides based on particular cost model –robust for different switch sizes (wire widths) [see graphs in paper]

Caltech CS184a Fall DeHon30 Implications

Caltech CS184a Fall DeHon31 Implications Custom? / Gate Arrays? More restricted logic functions?

Caltech CS184a Fall DeHon32 Relate to Sequential? How does this result relate to sequential execution case? Number of LUTs = Number of Cycles Interconnect Cost? –Naïve –structure in practice? Instruction Cost?

Caltech CS184a Fall DeHon33 Delay Back to Spatial (save for day10)...

Caltech CS184a Fall DeHon34 Delay? Circuit Depth in LUTs? “Simple Function” --> M-input AND –1 table lookup in M-LUT –$\log_K(M)$ in K-LUT
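For the simple case, the depth of a balanced tree of K-LUTs computing an M-input AND is the smallest d with K^d >= M, i.e. the ceiling of log base K of M. A minimal sketch using integer arithmetic to avoid floating-point rounding (the function name is hypothetical):

```python
def and_tree_depth(m, k):
    """Levels of k-input LUTs needed to combine m inputs in a balanced tree."""
    depth, reach = 0, 1
    while reach < m:
        reach *= k      # each extra level multiplies the reachable inputs by k
        depth += 1
    return depth
```

For example, 16 inputs need 2 levels of 4-LUTs, while 17 inputs need 3.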

Caltech CS184a Fall DeHon35 Delay? M-input “Complex” function –1 table lookup for M-LUT –between: $\lceil (M-K)/\log_2(K) \rceil + 1$ –and $\lceil (M-K)/\log_2(K - \log_2(K)) \rceil + 1$
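A sketch that evaluates the two depth bounds for a complex function, reading the slide's expressions as ceil((M-K)/log2(K)) + 1 and ceil((M-K)/log2(K - log2(K))) + 1. That reading of the slide and the function name are assumptions. The second expression needs K >= 3 so that its logarithm is positive.

```python
import math

def complex_depth_bounds(m, k):
    """Return (low, high) LUT-depth bounds for a complex m-input function in k-LUTs."""
    if k < 3:
        raise ValueError("bounds as written need k >= 3")
    if m <= k:
        return 1, 1             # fits in a single k-LUT
    low = math.ceil((m - k) / math.log2(k)) + 1
    high = math.ceil((m - k) / math.log2(k - math.log2(k))) + 1
    return low, high
```

With 4-LUTs the denominators are 2 and 1, so a 15-input complex function lands between 7 and 12 levels. Both bounds grow linearly in M, in contrast to the logarithmic depth of the simple case.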

Caltech CS184a Fall DeHon36 Delay Simple: log M Complex: linear in M Both go as 1/log(k)

Caltech CS184a Fall DeHon37 Circuit Depth vs. K

Caltech CS184a Fall DeHon38 LUT Delay vs. K For small LUTs: –$t_{LUT} \approx c_0 + c_1 \times K$ Large LUTs: –add length term –$c_2 \times \sqrt{2^K}$ Plus Wire Delay –~$\sqrt{\text{area}}$

Caltech CS184a Fall DeHon39 Delay vs. K Delay = Depth $\times$ ($t_{LUT}$ + $t_{Interconnect}$) Why not satisfied with this model?
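The delay model on this slide can be tried out numerically. The sketch below uses the small-LUT delay term (a constant plus a term linear in K) and the simple-function depth. All constants (`C0`, `C1`, `T_INT`) are made-up placeholder values chosen only to show the shape of the tradeoff; they are not measurements from the lecture or from any device.

```python
def lut_delay(k, c0, c1):
    """Small-LUT delay model: fixed term plus a term linear in K."""
    return c0 + c1 * k

def simple_depth(m, k):
    """Depth in K-LUTs for a simple m-input function (balanced tree)."""
    depth, reach = 0, 1
    while reach < m:
        reach *= k
        depth += 1
    return depth

def path_delay(m, k, c0, c1, t_interconnect):
    """Delay = Depth x (t_LUT + t_Interconnect)."""
    return simple_depth(m, k) * (lut_delay(k, c0, c1) + t_interconnect)

# Hypothetical constants, in arbitrary time units.
C0, C1, T_INT = 1.0, 0.5, 3.0
delays = {k: path_delay(16, k, C0, C1, T_INT) for k in (2, 4, 8)}
```

With these placeholder numbers the 16-input case is fastest at K=4: smaller LUTs pay for more levels of interconnect, and larger LUTs pay for a slower lookup without reducing depth. The model treats interconnect delay as independent of K, which is one reason to be unsatisfied with it.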

Caltech CS184a Fall DeHon40 Observation General interconnect is expensive “Larger” logic blocks –=> less interconnect crossing –=> lower interconnect delay –=> get larger –=> get slower faster than modeled here due to area –=> less area efficient don’t match structure in computation

Caltech CS184a Fall DeHon41 Finishing Up...

Caltech CS184a Fall DeHon42 No Class Monday CS Dept. Retreat Sun/Mon. André not read on Sunday. Catch up on reading, assignment, sleep… see you Wednesday.

Caltech CS184a Fall DeHon43 Big Ideas [MSB Ideas] Memory most dense programmable structure for the most complex functions Memory inefficient (scales poorly) for structured compute tasks Most tasks have some structure Programmable Interconnect allows us to exploit that structure

Caltech CS184a Fall DeHon44 Big Ideas [MSB-1 Ideas] Area –LUT count decreases w/ K, but slower than exponentially –LUT size increases w/ K: exponential in LUT function (memory), empirically linear in routing area –Minimum area around K=4

Caltech CS184a Fall DeHon45 Big Ideas [MSB-1 Ideas] Delay –LUT depth decreases with K; in practice closer to 1/log(K) –Delay increases with K: for small K, linear + large fixed term –Minimum around K=5-6