Download presentation
Presentation is loading. Please wait.
1
Caltech CS184a Fall2000 -- DeHon1 CS184a: Computer Architecture (Structures and Organization) Day10: October 25, 2000 Computing Elements 2: Cascades, ALUs, PLAs
2
Caltech CS184a Fall2000 -- DeHon2 Last Time LUTs –area –structure –big LUTs vs. small LUTs with interconnect –design space –optimization
3
Caltech CS184a Fall2000 -- DeHon3 Today LUT Delay LUT Cascades ALUs PLAs
4
Caltech CS184a Fall2000 -- DeHon4 Delay
5
Caltech CS184a Fall2000 -- DeHon5 Delay? Circuit Depth in LUTs? “Simple Function” --> M-input AND –1 table lookup in M-LUT –log k (M) in K-LUT
6
Caltech CS184a Fall2000 -- DeHon6 Delay? M-input “Complex” function –1 table lookup for M-LUT –between: (M-K)/log 2 (k) +1 –and (M-K)/log 2 (k- log 2 (k)) +1
7
Caltech CS184a Fall2000 -- DeHon7 Delay Simple: log M Complex: linear in M Both go as 1/log(k)
8
Caltech CS184a Fall2000 -- DeHon8 Circuit Depth vs. K
9
Caltech CS184a Fall2000 -- DeHon9 LUT Delay vs. K For small LUTs: –t LUT c 0 +c 1 K Large LUTs: –add length term –c 2 2 K Plus Wire Delay –~ area
10
Caltech CS184a Fall2000 -- DeHon10 Delay vs. K Delay = Depth (t LUT + t Interconnect ) Why not satisfied with this model?
11
Caltech CS184a Fall2000 -- DeHon11 Observation General interconnect is expensive “Larger” logic blocks –=> less interconnect crossing –=> lower interconnect delay –=> get larger –=> get slower faster than modeled here due to area –=> less area efficient don’t match structure in computation
12
Caltech CS184a Fall2000 -- DeHon12 Different Structure How can we have “larger” compute nodes (less general interconnect) without paying huge area penalty of large LUTs?
13
Caltech CS184a Fall2000 -- DeHon13 Structure in subgraphs Small LUTs capture structure Structure of small LUT-mapped netlists?
14
Caltech CS184a Fall2000 -- DeHon14 Structure LUT sequences ubiquitous
15
Caltech CS184a Fall2000 -- DeHon15 Hardwired Logic Blocks Single Output
16
Caltech CS184a Fall2000 -- DeHon16 Hardwired Logic Blocks Two outputs
17
Caltech CS184a Fall2000 -- DeHon17 Relation to ALUs How do ALUs differ?
18
Caltech CS184a Fall2000 -- DeHon18 PLAs
19
Caltech CS184a Fall2000 -- DeHon19 PLA
20
Caltech CS184a Fall2000 -- DeHon20 PLA and Memory
21
Caltech CS184a Fall2000 -- DeHon21 PLA and PAL
22
Caltech CS184a Fall2000 -- DeHon22 PLAs Fast Implementations for large ANDs or Ors Number of P-terms can be exponential in number of input bits –most complicated functions Can use arrays of small PLAs –to exploit structure –like we saw arrays of small memories last time
23
Caltech CS184a Fall2000 -- DeHon23 PLAs vs. LUTs? Look at Inputs, Outputs, P-Terms –minimum area (one study, see paper) –K=10, N=12, M=3 A(PLA 10,12,3) comparable to 4-LUT? –80-130%? –300% on ECC (structure LUT can exploit) Delay? –Claim 40% fewer logic levels (general interconnect crossings)
24
Caltech CS184a Fall2000 -- DeHon24 PLA Optimization (Folding)
25
Caltech CS184a Fall2000 -- DeHon25 Conventional/Commercial FPGA Altera 9K (from databook)
26
Caltech CS184a Fall2000 -- DeHon26 Conventional/Commercial FPGA Altera 9K (from databook)
27
Caltech CS184a Fall2000 -- DeHon27 Finishing Up...
28
Caltech CS184a Fall2000 -- DeHon28 Admin Homework 2 return Questions about homework
29
Caltech CS184a Fall2000 -- DeHon29 Big Ideas [MSB Ideas] Programmable Interconnect allows us to exploit that structure –want to match to application structure Hardwired Cascades –key technique to reducing delay in programmables PLAs –canonical two level structure –hardwire portions to get Memories, PALs
30
Caltech CS184a Fall2000 -- DeHon30 Big Ideas [MSB-1 Ideas] Delay –LUT depth decreases with K in practice closer to log(K) –Delay increases with K small K linear + large fixed term minimum around 5-6 Better structure match with hardwired LUT cascades
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.