Application-Specific Hardware: Computing Without Processors. Mihai Budiu, October 6, 2001, SOCS-4
2 Complexity Pentium Pentium III
3 A New Approach CPU: Program -> Compiler -> Executable. Reconfigurable hw: Program -> Compiler -> CAD Tool -> Configuration
4 Outline Reconfigurable Hardware Application-Specific Hardware (ASH) ASH Properties Conclusions
5 Reconfigurable Hardware Universal gates and/or storage elements Interconnection network Programmable switches
6 Main RH Ingredient: RAM Cell Switch controlled by a 1-bit RAM cell (data in, control). Universal gate = RAM (inputs a0, a1 address the stored data, e.g. a1 & a2)
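The "universal gate = RAM" idea can be sketched in C as a small lookup table: the gate inputs form an address, and the bit stored at that address is the gate output. This is only an illustrative model. The names `lut2`, `lut2_configure`, `lut2_eval`, and `switch_eval` are hypothetical and do not come from the talk, and modeling an open switch as outputting 0 is a simplifying assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 2-input universal gate: a 4-entry, 1-bit-wide RAM.
   The inputs a0 and a1 form the address; the stored bit is the output. */
typedef struct {
    uint8_t bits; /* bit i holds the output for address i (low 4 bits used) */
} lut2;

/* "Configure" the gate by writing its truth table into the RAM cells.
   Parameter outXY is the output for a1 = X, a0 = Y. */
static lut2 lut2_configure(int out00, int out01, int out10, int out11)
{
    lut2 g;
    g.bits = (uint8_t)((out00 & 1) | ((out01 & 1) << 1) |
                       ((out10 & 1) << 2) | ((out11 & 1) << 3));
    return g;
}

/* Evaluate the gate: build the address from (a1, a0) and read that cell. */
static int lut2_eval(lut2 g, int a0, int a1)
{
    assert((a0 == 0 || a0 == 1) && (a1 == 0 || a1 == 1));
    int addr = (a1 << 1) | a0;
    return (g.bits >> addr) & 1;
}

/* A programmable switch: a 1-bit RAM cell decides whether data passes.
   Assumption: an open switch is modeled as outputting 0. */
static int switch_eval(int control_bit, int data_in)
{
    return control_bit ? data_in : 0;
}
```

Under this model, `lut2_configure(0, 0, 0, 1)` yields an AND gate and `lut2_configure(0, 1, 1, 0)` yields XOR; changing the function only means rewriting the stored bits.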
7 Place and Route int reverse(int x) { int k,r=0; for (k=0; k<64; k++) { r |= x&1; x = x >> 1; r = r << 1; } } int func(int* a,int *b) { int j,sum=0; for (j=0; *a>0; j++) sum+=reverse(*b
8 RH Today Application; C Program; Verilog; CAD; Compiler; OS support; communication; manual
9 Three Models of Computation CPU: Universal, Interpretation. ASIC: Custom, Direct execution. RH: Universal, Direct execution, Defect tolerance
10 Outline Reconfigurable Hardware Application-Specific Hardware (ASH) ASH Properties Conclusions
11 Application-Specific Hardware HLL program -> Compiler -> Circuit -> Reconfigurable hardware
12 CASH: Compiling for ASH Memory partitioning Interconnection net Circuits C Program RH
13 Stages of Compilation 1. Program: int reverse(int x) { int k,r=0; for (k=0; k<64; k++) { r |= x&1; x = x >> 1; r = r << 1; } } 2. Split-phase Abstract Machines (unknown latency ops; computations & local storage) 3. Configurations placed independently 4. Placement on chip
14 Split-phase Abstract Machines SAM 1 SAM 2 SAM 3 CFG
15 Hyperblock => SAM Single-entry, multiple exit May contain loops
16 Computation = Dataflow Programs: x = a & 7; ... y = x >> 2; Circuits: a and 7 feed &, producing x; x and 2 feed >>. Variables become wires
17 Speculation if (x > 0) y = -x; else y = b*x; [Diagram: computation and predicates; inputs x, b, 0; operators *, !; output y]
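A minimal C sketch of the speculation idea on this slide: both arms of the `if` are computed unconditionally, and the predicate only selects which result becomes `y`. The helper names `branchy`, `mux`, and `speculative` are hypothetical, and the sketch assumes inputs small enough that neither `-x` nor `b*x` overflows.

```c
/* Control-flow version, as written on the slide. */
static int branchy(int x, int b)
{
    int y;
    if (x > 0)
        y = -x;
    else
        y = b * x;
    return y;
}

/* A multiplexer: the predicate selects one of two already-computed values. */
static int mux(int pred, int if_true, int if_false)
{
    return pred ? if_true : if_false;
}

/* Speculative version: both arms are evaluated regardless of the predicate.
   Assumption: x and b are small enough that -x and b * x do not overflow. */
static int speculative(int x, int b)
{
    int neg  = -x;     /* arm taken when x > 0 */
    int prod = b * x;  /* arm taken otherwise  */
    int p    = x > 0;  /* predicate, computed alongside the arms */
    return mux(p, neg, prod);
}
```

In hardware the two arms and the predicate would be separate circuits running side by side; the C version can only show that the result does not depend on evaluating the branch first.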
18 Loops for (i=0; i < 10; i++) a[i] += i; [Diagram: i starts at 0 and is incremented by 1; address &a[0] + i feeds load, +, store; a[0], a[1], a[2], a[3] in flight] = Pipelining
19 Example int f(void) { int i=0, j = 0; for (; i < 10; i++) j += i; return j; }
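One way to read this example as a circuit is to treat `i` and `j` as registers that are updated once per step, with the loop predicate `i < 10` gating the update. The sketch below simulates that reading in C. It is an illustration under that assumption, not the compiler's actual output, and the names `loop_regs`, `loop_step`, `f_reference`, and `f_dataflow` are hypothetical.

```c
/* The function from the slide, kept as a reference. */
static int f_reference(void)
{
    int i = 0, j = 0;
    for (; i < 10; i++)
        j += i;
    return j;
}

/* Loop-carried state, modeled as two registers. */
typedef struct {
    int i;
    int j;
} loop_regs;

/* One step of the loop body: compute the predicate and both next values,
   then let the predicate decide whether the registers are updated.
   Returns the predicate so the caller knows when the loop has exited. */
static int loop_step(loop_regs *r)
{
    int p      = r->i < 10;    /* loop predicate */
    int next_j = r->j + r->i;  /* adder for j    */
    int next_i = r->i + 1;     /* incrementer    */
    r->j = p ? next_j : r->j;
    r->i = p ? next_i : r->i;
    return p;
}

/* Run the register model until the predicate becomes false. */
static int f_dataflow(void)
{
    loop_regs r = {0, 0};
    while (loop_step(&r)) {
        /* all work happens inside loop_step */
    }
    return r.j;
}
```

Both versions sum the integers 0 through 9, so each returns 45.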
20 Outline Reconfigurable Hardware Application-Specific Hardware (ASH) ASH Properties Conclusions
21 Defect Tolerance CPU: One defect: chip useless. ASH: Can reconfigure around defects
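A toy C sketch of "reconfigure around defects": given a map of defective cells, logic blocks are assigned only to working cells. Real place-and-route also has to handle routing and timing, so this only illustrates the principle. The one-dimensional cell array and the function name `place_around_defects` are assumptions made for this example.

```c
#include <stddef.h>

/* Assign n_blocks logic blocks to the first working cells of a fabric.
   defective[c] is nonzero when cell c is known to be broken.
   placement[k] receives the cell index chosen for block k.
   Returns the number of blocks actually placed, which is less than
   n_blocks when the fabric has too few working cells. */
static size_t place_around_defects(const int *defective, size_t n_cells,
                                   size_t n_blocks, size_t *placement)
{
    size_t placed = 0;
    for (size_t cell = 0; cell < n_cells && placed < n_blocks; cell++) {
        if (!defective[cell]) {
            placement[placed] = cell; /* skip broken cells, use working ones */
            placed++;
        }
    }
    return placed;
}
```

With a defect map of `{0, 1, 0, 0, 1, 0}`, three blocks land on cells 0, 2, and 3; the same configuration still fits on the chip despite the two broken cells.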
22 Power Consumption CPU: 100+ W, 30M transistors, 2 GHz. ASH: 1 SAM active, all others idle
23 Verification CPU: Huge effort, extremely complex. ASH: Program translation validation: feasible [Diagram: program -> compiler -> CPU; program -> compiler, check P in = P out]
24 CAD Tools CPU Lots of exceptions handled manually Very long time ASH Local structures Interactive compilation
25 Circuit Size [Plot: operations vs. circuit #] All circuits for all programs in SpecINT95 and Mediabench
26 Total Size (Largest 2 Programs)
Benchmark    jpeg_e       147.vortex
Lines        26,881       67,210
SAMs         1,331        1,433
FP
Load/store   8,693        24,913
Call/ret     1,964        9,602
Predicates   8,167        39,195
Arithmetic   1,022,023    1,448,933
Mux          200, ,839
Registers    76,722       32,850
Counts are given in units and in bit-ops.
27 Implications Enough resources in the near future A case for datapath oriented RH: –a better match for computation –high density –fast configuration –more amenable to compilation –few predicate operations
28 Instruction-Level Parallelism CPU Wide execution path (4-6) Low sustained ILP (~1.5) ASH ILP statically extracted Sustained >3
29 ILP [Plot: avg ILP vs. circuit #]
30 Cost CPU: Plant = $3B, Mask = $100K (need 20+), Design = $?M. ASH: Can use defective chips, same masks for all chips, Design = free
31 Summary Microprocessor complexity becomes overwhelming. Application-Specific Hardware (ASH) translates applications into hardware. ASH has novel properties and promises to scale well with increasing resources
32 Extras CPU + ASH Speculation and critical paths Computing predicates
33 CPU+ASH HLL Program. CPU: support computation + OS + VM. ASH: core computation. Memory
34 Speculation if (x > 0) y = -x; else y = b*x; [Diagram: computation and predicates; inputs x, b, 0; operators *, !; output y; the multiply is marked slow] Eager Muxes
35 Computing Predicates Correct for irreducible graphs. Correct even when speculatively computed. Can be eagerly computed