Download presentation
Presentation is loading. Please wait.
1
Combinational Circuits in Bluespec
Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 10, 2010
2
Bluespec: Two-Level Compilation
(Objects, Types, Higher-order functions) Lennart Augustsson @Sandburst Type checking Massive partial evaluation and static elaboration Level 1 compilation Rules and Actions (Term Rewriting System) Now we call this Guarded Atomic Actions Rule conflict analysis Rule scheduling Level 2 synthesis James Hoe & Arvind @MIT Object code (Verilog/C) February 10, 2010
3
Static Elaboration At compile time Software Toolflow: Hardware
Inline function calls and unroll loops Instantiate modules with specific parameters Resolve polymorphism/overloading, perform most data structure operations source Software Toolflow: source Hardware Toolflow: design2 design3 design1 elaborate w/params .exe compile run1 run2.1 … run3.1 run1.1 run w/ params run w/ params run1 … run February 10, 2010
4
Combinational IFFT … * + - *j t2 t0 t3 t1
Bfly4 x16 out0 out1 out2 out63 out3 out4 Permute * + - *j t2 t0 t3 t1 Constants t0 to t3 are different for each box and can have dramatci impact on optimizations. All numbers are complex and represented as two sixteen bit quantities. Fixed-point arithmetic is used to reduce area, power, ... February 10, 2010
5
4-way Butterfly Node BSV has a very strong notion of types * + - *i t0
k0 k1 k2 k3 function Vector#(4,Complex) bfly4 (Vector#(4,Complex) t, Vector#(4,Complex) k); BSV has a very strong notion of types Every expression has a type. Either it is declared by the user or automatically deduced by the compiler The compiler verifies that the type declarations are compatible February 10, 2010
6
BSV code: 4-way Butterfly
function Vector#(4,Complex) bfly4 (Vector#(4,Complex) t, Vector#(4,Complex) k); Vector#(4,Complex) m, y, z; m[0] = k[0] * t[0]; m[1] = k[1] * t[1]; m[2] = k[2] * t[2]; m[3] = k[3] * t[3]; y[0] = m[0] + m[2]; y[1] = m[0] – m[2]; y[2] = m[1] + m[3]; y[3] = i*(m[1] – m[3]); z[0] = y[0] + y[2]; z[1] = y[1] + y[3]; z[2] = y[0] – y[2]; z[3] = y[1] – y[3]; return(z); endfunction * + - *i m y z February 10, 2010
7
Complex Arithmetic Addition Multiplication
zR = xR + yR zI = xI + yI Multiplication zR = xR * yR - xI * yI zR = xR * yI + xI * yR The actual arithmetic for FFT is different because we use a non-standard fixed point representations February 10, 2010
8
BSV code for Addition typedef struct{ Int#(t) r; Int#(t) i;
} Complex#(numeric type t) deriving (Eq,Bits); function Complex#(t) \+ (Complex#(t) x, Complex#(t) y); Int#(t) real = x.r + y.r; Int#(t) imag = x.i + y.i; return(Complex{r:real, i:imag}); endfunction February 10, 2010
9
Combinational IFFT … stage_f function
Bfly4 x16 out0 out1 out2 out63 out3 out4 Permute stage_f function function Vector#(64, Complex) stage_f (Bit#(2) stage, Vector#(64, Complex) stage_in); function Vector#(64, Complex) ifft (Vector#(64, Complex) in_data); repeat stage_f three times February 10, 2010
10
BSV Code: Combinational IFFT
function Vector#(64, Complex) ifft (Vector#(64, Complex) in_data); //Declare vectors Vector#(4,Vector#(64, Complex)) stage_data; stage_data[0] = in_data; for (Integer stage = 0; stage < 3; stage = stage + 1) stage_data[stage+1] = stage_f(stage,stage_data[stage]); return(stage_data[3]); The for-loop is unfolded and stage_f is inlined during static elaboration Note: no notion of loops or procedures during execution February 10, 2010
11
BSV Code: Combinational IFFT- Unfolded
function Vector#(64, Complex) ifft (Vector#(64, Complex) in_data); //Declare vectors Vector#(4,Vector#(64, Complex)) stage_data; stage_data[0] = in_data; for (Integer stage = 0; stage < 3; stage = stage + 1) stage_data[stage+1] = stage_f(stage,stage_data[stage]); return(stage_data[3]); stage_data[1] = stage_f(0,stage_data[0]); stage_data[2] = stage_f(1,stage_data[1]); stage_data[3] = stage_f(2,stage_data[2]); Stage_f can be inlined now; it could have been inlined before loop unfolding also. Does the order matter? February 10, 2010
12
Bluespec Code for stage_f
function Vector#(64, Complex) stage_f (Bit#(2) stage, Vector#(64, Complex) stage_in); begin for (Integer i = 0; i < 16; i = i + 1) Integer idx = i * 4; let twid = getTwiddle(stage, fromInteger(i)); let y = bfly4(twid, stage_in[idx:idx+3]); stage_temp[idx] = y[0]; stage_temp[idx+1] = y[1]; stage_temp[idx+2] = y[2]; stage_temp[idx+3] = y[3]; end //Permutation for (Integer i = 0; i < 64; i = i + 1) stage_out[i] = stage_temp[permute[i]]; return(stage_out); twid’s are mathematically derivable constants February 10, 2010
13
Higher-order functions: Stage functions f1, f2 and f3
function f1(x); return (stage_f(1,x)); endfunction function f2(x); return (stage_f(2,x)); function f3(x); return (stage_f(3,x)); What is the type of f1(x) ? February 10, 2010
14
Suppose we want to reuse some part of the circuit ...
Reuse the same circuit three times to reduce area in0 … in1 in2 in63 in3 in4 Bfly4 x16 out0 out1 out2 out63 out3 out4 Permute But why? February 10, 2010
15
Architectural Exploration:
Area-Performance tradeoff in a Transmitter February 10, 2010
16
802.11a Transmitter Overview
headers Must produce one OFDM symbol every 4 msec Controller Scrambler Encoder 24 Uncoded bits data Depending upon the transmission rate, consumes 1, 2 or 4 tokens to produce one OFDM symbol Interleaver Mapper IFFT Cyclic Extend One OFDM symbol (64 Complex Numbers) IFFT Transforms 64 (frequency domain) complex numbers into 64 (time domain) complex numbers accounts for 85% area February 10, 2010
17
Preliminary results [MEMOCODE 2006] Dave, Gerding, Pellauer, Arvind
Design Lines of Block Code (BSV) Controller Scrambler Conv. Encoder Interleaver Mapper IFFT Cyc. Extender Relative Area 0% 1% 11% 85% 3% Complex arithmetic libraries constitute another 200 lines of code February 10, 2010
18
Combinational IFFT … Reuse the same circuit three times to reduce area
Bfly4 x16 out0 out1 out2 out63 out3 out4 Permute February 10, 2010
19
Design Alternatives Reuse a block over multiple cycles we expect:
f g f g we expect: Throughput to Area to February 10, 2010
20
Circular pipeline: Reusing the Pipeline Stage
… in1 in2 in63 in3 in4 out0 … out1 out2 out63 out3 out4 … Bfly4 Permute Stage Counter February 10, 2010
21
Superfolded circular pipeline: Just one Bfly-4 node!
… in1 in2 in63 in3 in4 out0 … out1 out2 out63 out3 out4 Permute Bfly4 64, 2-way Muxes Stage 0 to 2 4, 16-way Muxes Index: 0 to 15 4, 16-way DeMuxes Index == 15? February 10, 2010
22
Pipelining a block inQ outQ f2 f1 f3 Combinational C inQ outQ f2 f1 f3
Pipeline P inQ outQ f Folded Pipeline FP Clock? Area? Throughput? February 10, 2010
23
Synchronous pipeline x sReg1 inQ f1 f2 f3 sReg2 outQ
rule sync-pipeline (True); inQ.deq(); sReg1 <= f1(inQ.first()); sReg2 <= f2(sReg1); outQ.enq(f3(sReg2)); endrule This rule can fire only if February 10, 2010
24
Stage functions f1, f2 and f3
function f1(x); return (stage_f(1,x)); endfunction function f2(x); return (stage_f(2,x)); function f3(x); return (stage_f(3,x)); The stage_f function was given earlier February 10, 2010
25
Problem: What about pipeline bubbles?
x sReg1 inQ f1 f2 f3 sReg2 outQ rule sync-pipeline (True); inQ.deq(); sReg1 <= f1(inQ.first()); sReg2 <= f2(sReg1); outQ.enq(f3(sReg2)); endrule February 10, 2010
26
The Maybe type data in the pipeline
typedef union tagged { void Invalid; data_T Valid; } Maybe#(type data_T); data valid/invalid Registers contain Maybe type values rule sync-pipeline (True); if (inQ.notEmpty()) begin sReg1 <= Valid f1(inQ.first()); inQ.deq(); end else sReg1 <= Invalid; case (sReg1) matches tagged Valid .sx1: sReg2 <= Valid f2(sx1); tagged Invalid: sReg2 <= Invalid; case (sReg2) matches tagged Valid .sx2: outQ.enq(f3(sx2)); endrule sx1 will get bound to the appropriate part of sReg1 February 10, 2010
27
Code for folded pipelined FFT
Next lecture Code for folded pipelined FFT February 10, 2010
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.