Download presentation
Presentation is loading. Please wait.
1
CS294-6 Reconfigurable Computing Day 6 September 10, 1998 Comparing Computing Devices
2
Today Finish up empirical comparisons Instructions and their impact Modeling
3
Multiply
4
FIR
5
IIR/Biquad
6
DES Keysearch
7
DNA Sequence Match Problem: “cost” of transform S 1 S 2 Given: cost of insertion, deletion, substitution Relevance: similarity of DNA sequences –evolutionary similarity –structure predict function Typically: new sequence compared to large databse
8
DNA Sequence Match
9
Floating-Point Add (single prec.)
10
Floating-Point Mpy (single prec.)
11
Summary Raw densities (Area-Time) –FPGA/custom = 100x –Processor/custom = 1000x Special-purpose functional units in processors/DSPs, much lower net benefit since need to control and interconnect Gap narrows (closes) as programmable can be specialized
12
Modeling Why do we model?
13
Terminology Primitive Instruction (pinst) –Collection of bits which tell a bit-processing element what to do –Includes: select compute operation input sources in space (interconnect) input sources in time (retiming)
14
Computational Array Model Collection of computing elements –compute operator –local storage/retiming Interconnect Instruction
15
“Ideal” Instruction Control Issue a new instruction to every computational bit operator on every cycle
16
“Ideal” Instruction Distribution Why don’t we do this?
17
“Ideal” Instruction Distribution Problem: Instruction bandwidth (and storage area) quickly dominates everything else –Compute Block ~ 1M x –Instruction ~ 64 bits –Wire Pitch ~ –Memory bit ~
18
Instruction Distribution 64x8 =512 Two instructions in 1024
19
Instruction Distribution Distribute from both sides = 2x
20
Instruction Distribution Distribute X and Y = 2x
21
Instruction Distribution Room to distribute 2 instructions across PE per metal layer (1024 = 2x8x64) Feed top and bottom (left and right) = 2x Two complete metal layers = 2x => 8 instructions / PE Side
22
Instruction Distribution Maximum of 8 instructions per PE side Saturate wire channels at 8x N = N => at 64 PE –beyond this instruction distribution dominates area
23
Instruction Distribution Beyond 64 PE, instruction bandwidth dictates PE size – Pe area /(64x8 )x4 N = N –Pe area =16K x N
24
Instruction Memory Requirements Idea: put instruction memory in array Problem: Instruction memory can quickly dominate area, too –Memory Area = 64x1.2K /instruction –PE area = 1M + (Instructions)x80K
25
Instruction Pragmatics Instruction requirements could dominate array size. Standard architecture trick: –Look for structure to exploit in “typical computations”
26
Typical Structure? What structure do we usually expect?
27
Two Extremes SIMD Array (microprocessors) –Instruction/cycle –share instruction across array of PE s –uniform operation in space –operation variance in time FPGA –Instruction/PE –assume temporal locality of instructions (same) –operation variance in space –uniform operations in time
28
Hybrids VLIW (SuperScalar) –Few pinsts/cycle –Share instruction across w bits DPGA –Small instruction store / PE
29
Architecture Instr. Taxonomy
30
Modeling Instruction Effects Restrictions from “ideal” save area Restriction from “ideal” limits usability (yield) of PE Want to understand effects –area model –utilization/yield model
31
Efficiency/Yield Intuition What happens when –Datapath is too wide? –Datapath is too narrow? –Instruction memory is too deep? –Instruction memory is too shallow?
32
Computing Device Composition –Bit Processing elements –Interconnect space time –Instruction Memory
33
Computing Device Composition Tile basic building blocks into array
34
Model Area
35
Spot Check Area FPGA –w=1, d=c=1, k=4 880K –Xilinx 4K 630K –Altera 8K 930K SIMD –w=1000, c-0, d-64, k=3 170K –Abacus 190K Processor –w=32, d=32, c=1024, k=2 2.6M –MIPS-X 2.1M
36
Peak Densities from Model
37
Empirical Peak Densities
38
Efficiency What do we want to maximize? –Useful work per unit silicon Yield Fraction / Area (or minimize (Area/Yield) )
39
Efficiency For comparison, look at relative efficiency to ideal. Ideal = architecture exactly matched to application requirements Efficiency = A ideal /A arch A arch = Area Op/Yield
40
Efficiency Calculation
41
Efficiency with fixed Width w=1, 16K PEs
42
FPGA vs. DPGA Compare Robust point: when context memory area is 1/2 of total area
43
Robust Point depend on Width w=1 w=8 w=64
44
Processors and FPGAs FPGA c=d=1, w=1, k=4 “Processor” c=d=1024, w=64, k=2
45
Intermediate Architecture w=8 c=64 16K PEs
46
Caveats Model abstracts away many details which are important –interconnect –control –specialized functional units Applications are a heterogeneous mix of characteristics
47
Modeling Message Architecture space is huge Easy to be very inefficient Hard to pick one point robust across entire space
48
Summary Instruction resources can be significant Exploit application structure to reduce instruction requirements Classify architectures by instruction organization Arch. structure and application requirements mismatch => inefficiencies Model => visualize efficiency trends
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.