Presentation is loading. Please wait.

Presentation is loading. Please wait.

ESE534: Computer Organization

Similar presentations


Presentation on theme: "ESE534: Computer Organization"— Presentation transcript:

1 ESE534: Computer Organization
Day 13: March 15, 2010 Empirical Comparisons Penn ESE534 Spring DeHon

2 Last Time Instruction Space Modeling Penn ESE534 Spring DeHon

3 Today Empirical Data Custom FPGAs Processors Tasks Gate Array
Std. Cell Full FPGAs Processors Tasks Penn ESE534 Spring DeHon

4 Preclass 1 How big? 2-LUT? 2-LUT w/ Flip-flop?
2-LUT w/ 4 input sources? 2-LUT w/ 1024 input sources? Penn ESE534 Spring DeHon

5 Empirical Comparisons
Penn ESE534 Spring DeHon

6 Empirical Ground modeling in some concretes Start sorting out
custom vs. configurable spatial configurable vs. temporal Penn ESE534 Spring DeHon

7 Full Custom Get to define all layers Use any geometry you like
Only rules are process design rules ESE570 Penn ESE534 Spring DeHon

8 Standard Cell Area inv nand3 inv AOI4 nor3 Inv All cells uniform
height Width of channel determined by routing Cell area Penn ESE534 Spring DeHon

9 Standard Cell Area inv nand3 inv AOI4 nor3 Inv All cells uniform
height Width of channel determined by routing Cell area Identify the full custom and standard cell regions on 386DX die Penn ESE534 Spring DeHon

10 MPGA Metal Programmable Gate Array Gates pre-placed (poly, diffusion)
Resurrected as “Structured ASICs” …and already dead?... Structured ASICs: what happened at LSI Logic? EDN, Oct. 2006 Still headed to $1B business? Gates pre-placed (poly, diffusion) Only get to define metal connections Cheap – only have to pay for metal mask(s) [Wu&Tsai/ISPD2004p103] Penn ESE534 Spring DeHon

11 MPGA/SA vs. Custom? AMI CICC’83 AMI CICC’04 Toshiba DSP Mosaid RAM
Std-Cell 0.7 Custom 0.5 AMI CICC’04 Custom 0.6 (DSP) Custom 0.8 (DPath) Toshiba DSP Custom 0.3 Mosaid RAM Custom 0.2 GE CICC’86 MPGA 1.0 Std-Cell FF/counter 0.7 FullAdder 0.4 RAM 0.2 Penn ESE534 Spring DeHon

12 Metal Programmable Gate Arrays
Penn ESE534 Spring DeHon

13 MPGAs Modern -- “Sea of Gates” yield 35--70% maybe 5kl2/gate ?
(quite a bit of variance) Penn ESE534 Spring DeHon

14 FPGA Table Penn ESE534 Spring DeHon

15 (semi) Modern FPGAs 1.25Ml2/LE 1. 5Ml2/4-LUT APEX 20K1500E 52K LEs
0.18mm 24mm  22mm 1.25Ml2/LE XC2V1000 10.44mm x 9.90mm [source: Chipworks] 0.15mm 11,520 4-LUTs 1. 5Ml2/4-LUT [Both also have RAM in cited area] Penn ESE534 Spring DeHon

16 Conventional FPGA Tile
K-LUT (typical k=4) w/ optional output Flip-Flop Penn ESE534 Spring DeHon

17 Toronto FPGA Model Penn ESE534 Spring DeHon

18 How many gates? (Prelcass 2)
Penn ESE534 Spring DeHon

19 “gates” in 2-LUT Penn ESE534 Spring DeHon

20 Now how many? Penn ESE534 Spring DeHon

21 Which gives: More usable gates? More gates/unit area?
Penn ESE534 Spring DeHon

22 Gates Required? Depth=3, Depth=2048? Penn ESE534 Spring DeHon

23 Gate metric for FPGAs? Day9: several components for computations
compute element interconnect: space time instructions Not all applications need in same balance Assigning a single “capacity” number to device is an oversimplification Penn ESE534 Spring DeHon

24 MPGA vs. FPGA Adding ~2x Custom/MPGA, Custom/FPGA ~10x Xilinx XC4K
1.25Ml2 /CLB gates (26?) 26-73Kl2/gate net MPGA (SOG GA) 5Kl2/gate 35-70% usable (50%) 7-17Kl2/gate net Ratio: (5) Adding ~2x Custom/MPGA, Custom/FPGA ~10x Penn ESE534 Spring DeHon

25 90nm FPGA: Stratix II STMicro CMOS090 Standard Cell Full custom layout
…but by tool [Kuon/Rose TRCADv26n2p ] Penn ESE534 Spring DeHon

26 MPGA vs. FPGA (Delay) MPGA (SOG GA) Ratio: 1--7 (2.5) Xilinx XC4K
l=0.6m tgd~1ns Ratio: (2.5) Altera claiming 2× For their Structured ASIC [2007] LSI claiming 3× 2005 Xilinx XC4K l=0.6m 1-7 gates in 7ns 2-3 gates typical Penn ESE534 Spring DeHon

27 Delay 90nm FPGA: Stratix II STMicro CMOS090
[Kuon/Rose TRCADv26n2p ] Penn ESE534 Spring DeHon

28 Energy 90nm FPGA: Stratix II STMicro CMOS090
[Kuon/Rose TRCADv26n2p ] Penn ESE534 Spring DeHon

29 Processors vs. FPGAs Penn ESE534 Spring DeHon

30 Processors and FPGAs Penn ESE534 Spring DeHon

31 Component Example Single die in 0.35mm XC4085XL-09 3,136 CLBs 4.6ns
682 Bit Ops/ns Alpha 64b ALUs ns 55.7 Bit Ops/ns [1 “bit op” = 2 gate evaluations] Penn ESE534 Spring DeHon

32 Processors and FPGAs Penn ESE534 Spring DeHon

33 Raw Density Summary Area Area-Time MPGA 2-3x Custom FPGA 5x MPGA
FPGA:std-cell custom ~ 15-30x Area-Time Gate Array 6-10x Custom FPGA 15-20x Gate Array FPGA:std-cell custom ~ 100x Processor 10x FPGA Penn ESE534 Spring DeHon

34 Raw Density Caveats Processor/FPGA may solve more specialized problem
Problems have different resource balance requirements …can lead to low yield of raw density Penn ESE534 Spring DeHon

35 Task Comparisons Penn ESE534 Spring DeHon

36 Broadening Picture Compare larger computations For comparison
throughput density metric: results/area-time normalize out area-time point selection high throughput density  most in fixed area  least area to satisfy fixed throughput target Penn ESE534 Spring DeHon

37 Multiply Penn ESE534 Spring DeHon

38 Preclass 3 Efficiency of 8×8 multiply on 16×16 multiplier?
Penn ESE534 Spring DeHon

39 Multiply Penn ESE534 Spring DeHon

40 Example: FIR Filtering
Yi=w1xi+w2xi+1+... Application metric: TAPs = filter taps multiply accumulate Penn ESE534 Spring DeHon

41 Energy [Abnous et al, The Application of Programmable DSPs in
Mobile Communications, Wiley, 2002, pp ] DeHon-Workshop FPT 2009

42 IIR/Biquad Simplest IIR: Yi=AXi+B Yi-1
Penn ESE534 Spring DeHon

43 DES Keysearch <http://www.cs.berkeley.edu/~iang/isaac/hardware/>
Penn ESE534 Spring DeHon

44 DNA Sequence Match Problem: “cost” of transform S1S2
Given: cost of insertion, deletion, substitution Relevance: similarity of DNA sequences evolutionary similarity structure predict function Typically: new sequence compared to large databse Penn ESE534 Spring DeHon

45 DNA Sequence Match Penn ESE534 Spring DeHon

46 Degrade from Peak Penn ESE534 Spring DeHon

47 Degrade from Peak: FPGAs
Long path length  not run at cycle Limited throughput requirement bottlenecks elsewhere limit throughput req. Insufficient interconnect Insufficient retiming resources (bandwidth) Penn ESE534 Spring DeHon

48 Degrade from Peak: Processors
Ops w/ no gate evaluations (interconnect) Ops use limited word width Stalls waiting for retimed data Penn ESE534 Spring DeHon

49 Degrade from Peak: Custom/MPGA
Solve more general problem than required (more gates than really need) Long path length Limited throughput requirement Not needed or applicable to a problem Penn ESE534 Spring DeHon

50 Degrade Notes We’ll cover these issues in more detail as we get into them later in the course Penn ESE534 Spring DeHon

51 Admin HW6 out Wednesday Reading on web Penn ESE534 Spring DeHon

52 Big Ideas [MSB Ideas] Raw densities: custom:ga:fpga:processor
1:5:100:1000 close gap with specialization Penn ESE534 Spring DeHon


Download ppt "ESE534: Computer Organization"

Similar presentations


Ads by Google