Download presentation
Presentation is loading. Please wait.
1
ESE534: Computer Organization
Day 13: March 15, 2010 Empirical Comparisons Penn ESE534 Spring DeHon
2
Last Time Instruction Space Modeling Penn ESE534 Spring DeHon
3
Today Empirical Data Custom FPGAs Processors Tasks Gate Array
Std. Cell Full FPGAs Processors Tasks Penn ESE534 Spring DeHon
4
Preclass 1 How big? 2-LUT? 2-LUT w/ Flip-flop?
2-LUT w/ 4 input sources? 2-LUT w/ 1024 input sources? Penn ESE534 Spring DeHon
5
Empirical Comparisons
Penn ESE534 Spring DeHon
6
Empirical Ground modeling in some concretes Start sorting out
custom vs. configurable spatial configurable vs. temporal Penn ESE534 Spring DeHon
7
Full Custom Get to define all layers Use any geometry you like
Only rules are process design rules ESE570 Penn ESE534 Spring DeHon
8
Standard Cell Area inv nand3 inv AOI4 nor3 Inv All cells uniform
height Width of channel determined by routing Cell area Penn ESE534 Spring DeHon
9
Standard Cell Area inv nand3 inv AOI4 nor3 Inv All cells uniform
height Width of channel determined by routing Cell area Identify the full custom and standard cell regions on 386DX die Penn ESE534 Spring DeHon
10
MPGA Metal Programmable Gate Array Gates pre-placed (poly, diffusion)
Resurrected as “Structured ASICs” …and already dead?... Structured ASICs: what happened at LSI Logic? EDN, Oct. 2006 Still headed to $1B business? Gates pre-placed (poly, diffusion) Only get to define metal connections Cheap – only have to pay for metal mask(s) [Wu&Tsai/ISPD2004p103] Penn ESE534 Spring DeHon
11
MPGA/SA vs. Custom? AMI CICC’83 AMI CICC’04 Toshiba DSP Mosaid RAM
Std-Cell 0.7 Custom 0.5 AMI CICC’04 Custom 0.6 (DSP) Custom 0.8 (DPath) Toshiba DSP Custom 0.3 Mosaid RAM Custom 0.2 GE CICC’86 MPGA 1.0 Std-Cell FF/counter 0.7 FullAdder 0.4 RAM 0.2 Penn ESE534 Spring DeHon
12
Metal Programmable Gate Arrays
Penn ESE534 Spring DeHon
13
MPGAs Modern -- “Sea of Gates” yield 35--70% maybe 5kl2/gate ?
(quite a bit of variance) Penn ESE534 Spring DeHon
14
FPGA Table Penn ESE534 Spring DeHon
15
(semi) Modern FPGAs 1.25Ml2/LE 1. 5Ml2/4-LUT APEX 20K1500E 52K LEs
0.18mm 24mm 22mm 1.25Ml2/LE XC2V1000 10.44mm x 9.90mm [source: Chipworks] 0.15mm 11,520 4-LUTs 1. 5Ml2/4-LUT [Both also have RAM in cited area] Penn ESE534 Spring DeHon
16
Conventional FPGA Tile
K-LUT (typical k=4) w/ optional output Flip-Flop Penn ESE534 Spring DeHon
17
Toronto FPGA Model Penn ESE534 Spring DeHon
18
How many gates? (Prelcass 2)
Penn ESE534 Spring DeHon
19
“gates” in 2-LUT Penn ESE534 Spring DeHon
20
Now how many? Penn ESE534 Spring DeHon
21
Which gives: More usable gates? More gates/unit area?
Penn ESE534 Spring DeHon
22
Gates Required? Depth=3, Depth=2048? Penn ESE534 Spring DeHon
23
Gate metric for FPGAs? Day9: several components for computations
compute element interconnect: space time instructions Not all applications need in same balance Assigning a single “capacity” number to device is an oversimplification Penn ESE534 Spring DeHon
24
MPGA vs. FPGA Adding ~2x Custom/MPGA, Custom/FPGA ~10x Xilinx XC4K
1.25Ml2 /CLB gates (26?) 26-73Kl2/gate net MPGA (SOG GA) 5Kl2/gate 35-70% usable (50%) 7-17Kl2/gate net Ratio: (5) Adding ~2x Custom/MPGA, Custom/FPGA ~10x Penn ESE534 Spring DeHon
25
90nm FPGA: Stratix II STMicro CMOS090 Standard Cell Full custom layout
…but by tool [Kuon/Rose TRCADv26n2p ] Penn ESE534 Spring DeHon
26
MPGA vs. FPGA (Delay) MPGA (SOG GA) Ratio: 1--7 (2.5) Xilinx XC4K
l=0.6m tgd~1ns Ratio: (2.5) Altera claiming 2× For their Structured ASIC [2007] LSI claiming 3× 2005 Xilinx XC4K l=0.6m 1-7 gates in 7ns 2-3 gates typical Penn ESE534 Spring DeHon
27
Delay 90nm FPGA: Stratix II STMicro CMOS090
[Kuon/Rose TRCADv26n2p ] Penn ESE534 Spring DeHon
28
Energy 90nm FPGA: Stratix II STMicro CMOS090
[Kuon/Rose TRCADv26n2p ] Penn ESE534 Spring DeHon
29
Processors vs. FPGAs Penn ESE534 Spring DeHon
30
Processors and FPGAs Penn ESE534 Spring DeHon
31
Component Example Single die in 0.35mm XC4085XL-09 3,136 CLBs 4.6ns
682 Bit Ops/ns Alpha 64b ALUs ns 55.7 Bit Ops/ns [1 “bit op” = 2 gate evaluations] Penn ESE534 Spring DeHon
32
Processors and FPGAs Penn ESE534 Spring DeHon
33
Raw Density Summary Area Area-Time MPGA 2-3x Custom FPGA 5x MPGA
FPGA:std-cell custom ~ 15-30x Area-Time Gate Array 6-10x Custom FPGA 15-20x Gate Array FPGA:std-cell custom ~ 100x Processor 10x FPGA Penn ESE534 Spring DeHon
34
Raw Density Caveats Processor/FPGA may solve more specialized problem
Problems have different resource balance requirements …can lead to low yield of raw density Penn ESE534 Spring DeHon
35
Task Comparisons Penn ESE534 Spring DeHon
36
Broadening Picture Compare larger computations For comparison
throughput density metric: results/area-time normalize out area-time point selection high throughput density most in fixed area least area to satisfy fixed throughput target Penn ESE534 Spring DeHon
37
Multiply Penn ESE534 Spring DeHon
38
Preclass 3 Efficiency of 8×8 multiply on 16×16 multiplier?
Penn ESE534 Spring DeHon
39
Multiply Penn ESE534 Spring DeHon
40
Example: FIR Filtering
Yi=w1xi+w2xi+1+... Application metric: TAPs = filter taps multiply accumulate Penn ESE534 Spring DeHon
41
Energy [Abnous et al, The Application of Programmable DSPs in
Mobile Communications, Wiley, 2002, pp ] DeHon-Workshop FPT 2009
42
IIR/Biquad Simplest IIR: Yi=AXi+B Yi-1
Penn ESE534 Spring DeHon
43
DES Keysearch <http://www.cs.berkeley.edu/~iang/isaac/hardware/>
Penn ESE534 Spring DeHon
44
DNA Sequence Match Problem: “cost” of transform S1S2
Given: cost of insertion, deletion, substitution Relevance: similarity of DNA sequences evolutionary similarity structure predict function Typically: new sequence compared to large databse Penn ESE534 Spring DeHon
45
DNA Sequence Match Penn ESE534 Spring DeHon
46
Degrade from Peak Penn ESE534 Spring DeHon
47
Degrade from Peak: FPGAs
Long path length not run at cycle Limited throughput requirement bottlenecks elsewhere limit throughput req. Insufficient interconnect Insufficient retiming resources (bandwidth) Penn ESE534 Spring DeHon
48
Degrade from Peak: Processors
Ops w/ no gate evaluations (interconnect) Ops use limited word width Stalls waiting for retimed data Penn ESE534 Spring DeHon
49
Degrade from Peak: Custom/MPGA
Solve more general problem than required (more gates than really need) Long path length Limited throughput requirement Not needed or applicable to a problem Penn ESE534 Spring DeHon
50
Degrade Notes We’ll cover these issues in more detail as we get into them later in the course Penn ESE534 Spring DeHon
51
Admin HW6 out Wednesday Reading on web Penn ESE534 Spring DeHon
52
Big Ideas [MSB Ideas] Raw densities: custom:ga:fpga:processor
1:5:100:1000 close gap with specialization Penn ESE534 Spring DeHon
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.