1
Combinational Circuits in Bluespec
Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology January 17, 2011
2
Bluespec: Two-Level Compilation
Source language: objects, types, higher-order functions
Level 1 compilation (Lennart Augustsson @Sandburst): type checking, massive partial evaluation and static elaboration
Intermediate form: Rules and Actions (Term Rewriting System), now called Guarded Atomic Actions
Level 2 synthesis (James Hoe & Arvind @MIT): rule conflict analysis, rule scheduling
Output: object code (Verilog/C)
3
Static Elaboration
At compile time:
  - Inline function calls and unroll loops
  - Instantiate modules with specific parameters
  - Resolve polymorphism/overloading, perform most data structure operations
[Figure: software toolflow: source, compile, .exe, run w/ params, giving run1, run2, ...; hardware toolflow: source, elaborate w/ params, giving design1, design2, design3, each of which is then run (run1.1, run2.1, run3.1, ...)]
4
Combinational IFFT
[Figure: combinational IFFT datapath built from Bfly4 x16 and Permute blocks, producing out0 ... out63; each Bfly4 node uses *, +, - and *j operators with constants t0 to t3]
Constants t0 to t3 are different for each box and can have a dramatic impact on optimizations.
All numbers are complex and represented as two sixteen-bit quantities. Fixed-point arithmetic is used to reduce area, power, ...
5
4-way Butterfly Node
[Figure: 4-way butterfly node built from *, +, - and *i operators; labels t0 and k0 to k3]
function Vector#(4,Complex) bfly4 (Vector#(4,Complex) t, Vector#(4,Complex) k);
BSV has a very strong notion of types
Every expression has a type. Either it is declared by the user or automatically deduced by the compiler
The compiler verifies that the type declarations are compatible
6
BSV code: 4-way Butterfly
function Vector#(4,Complex) bfly4 (Vector#(4,Complex) t, Vector#(4,Complex) k);
  Vector#(4,Complex) m, y, z;
  m[0] = k[0] * t[0];
  m[1] = k[1] * t[1];
  m[2] = k[2] * t[2];
  m[3] = k[3] * t[3];
  y[0] = m[0] + m[2];
  y[1] = m[0] - m[2];
  y[2] = m[1] + m[3];
  y[3] = i*(m[1] - m[3]);
  z[0] = y[0] + y[2];
  z[1] = y[1] + y[3];
  z[2] = y[0] - y[2];
  z[3] = y[1] - y[3];
  return(z);
endfunction
[Figure: the *, + - and *i operator columns of the butterfly, labelled m, y and z]
Polymorphic code: works on any type of numbers for which *, + and - have been defined
Note: Vector does not mean storage
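The dataflow of bfly4 can be checked in software. The following is a minimal Python sketch, not the BSV source: it assumes Python's built-in complex numbers in place of the fixed-point Complex type, and 1j in place of the constant i.

```python
def bfly4(t, k):
    """Python model of the slide's bfly4; t and k are 4-element sequences."""
    # Column m: element-wise multiply
    m = [k[n] * t[n] for n in range(4)]
    # Column y: first add/subtract layer, with the *i on the last output
    y = [m[0] + m[2], m[0] - m[2], m[1] + m[3], 1j * (m[1] - m[3])]
    # Column z: second add/subtract layer
    z = [y[0] + y[2], y[1] + y[3], y[0] - y[2], y[1] - y[3]]
    return z


ones = [1, 1, 1, 1]
impulse = [1, 0, 0, 0]
flat_result = bfly4(ones, ones)       # constant input
impulse_result = bfly4(ones, impulse) # single non-zero input
```

With all-ones operands the energy collects in z[0]; with a single non-zero operand all four outputs are equal, which is the behaviour expected of a 4-point transform.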
7
Complex Arithmetic
Addition
  zR = xR + yR
  zI = xI + yI
Multiplication
  zR = xR * yR - xI * yI
  zI = xR * yI + xI * yR
The actual arithmetic for FFT is different because we use a non-standard fixed-point representation
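As a worked check of these formulas, here is a small Python sketch over pairs of integers. The names Complex, cadd and cmul are illustrative only, and the sketch uses unbounded Python integers rather than the fixed-point representation the slide mentions. The imaginary part of the product is xR * yI + xI * yR.

```python
from typing import NamedTuple


class Complex(NamedTuple):
    r: int  # real part
    i: int  # imaginary part


def cadd(x, y):
    # zR = xR + yR, zI = xI + yI
    return Complex(x.r + y.r, x.i + y.i)


def cmul(x, y):
    # zR = xR*yR - xI*yI, zI = xR*yI + xI*yR
    return Complex(x.r * y.r - x.i * y.i, x.r * y.i + x.i * y.r)


a = Complex(1, 2)
b = Complex(3, 4)
total = cadd(a, b)
product = cmul(a, b)
```

The results agree with Python's built-in complex arithmetic: (1+2j) + (3+4j) and (1+2j) * (3+4j).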
8
BSV code for Addition
typedef struct{
  Int#(t) r;
  Int#(t) i;
} Complex#(numeric type t) deriving (Eq,Bits);
function Complex#(t) \+ (Complex#(t) x, Complex#(t) y);
  Int#(t) real = x.r + y.r;
  Int#(t) imag = x.i + y.i;
  return(Complex{r:real, i:imag});
endfunction
What is the type of this + ?
9
Combinational IFFT
[Figure: one Bfly4 x16 block followed by a Permute block makes up the stage_f function; outputs out0 ... out63]
function Vector#(64, Complex) stage_f (Bit#(2) stage, Vector#(64, Complex) stage_in);
function Vector#(64, Complex) ifft (Vector#(64, Complex) in_data);
repeat stage_f three times
10
BSV Code: Combinational IFFT
function Vector#(64, Complex) ifft (Vector#(64, Complex) in_data);
  //Declare vectors
  Vector#(4,Vector#(64, Complex)) stage_data;
  stage_data[0] = in_data;
  for (Integer stage = 0; stage < 3; stage = stage + 1)
    stage_data[stage+1] = stage_f(stage,stage_data[stage]);
  return(stage_data[3]);
endfunction
The for-loop is unfolded and stage_f is inlined during static elaboration
Note: no notion of loops or procedures during execution
11
BSV Code: Combinational IFFT- Unfolded
function Vector#(64, Complex) ifft (Vector#(64, Complex) in_data);
  //Declare vectors
  Vector#(4,Vector#(64, Complex)) stage_data;
  stage_data[0] = in_data;
  for (Integer stage = 0; stage < 3; stage = stage + 1)
    stage_data[stage+1] = stage_f(stage,stage_data[stage]);
  return(stage_data[3]);
endfunction
After unfolding, the for-loop becomes:
  stage_data[1] = stage_f(0,stage_data[0]);
  stage_data[2] = stage_f(1,stage_data[1]);
  stage_data[3] = stage_f(2,stage_data[2]);
stage_f can be inlined now; it could also have been inlined before loop unfolding. Does the order matter?
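The effect of unfolding can be illustrated in software. This is a rough Python sketch under one loud assumption: stage_f here is a hypothetical stand-in that only tags each element with the stage number, not the real IFFT stage. It shows that the loop form and the hand-unfolded form compute the same value.

```python
def stage_f(stage, data):
    # Hypothetical stand-in for the real stage: appends the stage number as a digit
    return [v * 10 + stage for v in data]


def ifft_loop(in_data):
    # Mirrors the slide's loop over stage_data[0..3]
    stage_data = [None] * 4
    stage_data[0] = in_data
    for stage in range(3):
        stage_data[stage + 1] = stage_f(stage, stage_data[stage])
    return stage_data[3]


def ifft_unfolded(in_data):
    # What remains after the loop is unfolded: three chained calls, no loop
    s1 = stage_f(0, in_data)
    s2 = stage_f(1, s1)
    s3 = stage_f(2, s2)
    return s3


loop_result = ifft_loop([1, 2])
unfolded_result = ifft_unfolded([1, 2])
```

Because stage_f is a pure function, inlining it before or after unfolding would not change the value computed, only the order in which an elaborator does the work.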
12
Bluespec Code for stage_f
function Vector#(64, Complex) stage_f (Bit#(2) stage, Vector#(64, Complex) stage_in);
  Vector#(64, Complex) stage_temp, stage_out;
  for (Integer i = 0; i < 16; i = i + 1)
  begin
    Integer idx = i * 4;
    let twid = getTwiddle(stage, fromInteger(i));
    // stage_in[idx:idx+3] denotes the four elements idx .. idx+3
    let y = bfly4(twid, stage_in[idx:idx+3]);
    stage_temp[idx] = y[0];
    stage_temp[idx+1] = y[1];
    stage_temp[idx+2] = y[2];
    stage_temp[idx+3] = y[3];
  end
  //Permutation
  for (Integer i = 0; i < 64; i = i + 1)
    stage_out[i] = stage_temp[permute[i]];
  return(stage_out);
endfunction
The twid values are mathematically derivable constants
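The structure of stage_f (16 butterflies, then a permutation) can be modelled in Python. This is only a sketch: get_twiddle returns placeholder constants and PERMUTE is a stride permutation chosen for illustration; neither is taken from the slides, which do not give the real twiddle factors or permutation table.

```python
def bfly4(t, k):
    # Same dataflow as the 4-way butterfly: multiply, then two add/subtract layers
    m = [k[n] * t[n] for n in range(4)]
    y = [m[0] + m[2], m[0] - m[2], m[1] + m[3], 1j * (m[1] - m[3])]
    return [y[0] + y[2], y[1] + y[3], y[0] - y[2], y[1] - y[3]]


def get_twiddle(stage, i):
    # ASSUMPTION: placeholder twiddles; the real ones depend on stage and i
    return [1, 1, 1, 1]


# ASSUMPTION: an illustrative stride permutation, not the deck's permute table
PERMUTE = [(i % 16) * 4 + i // 16 for i in range(64)]


def stage_f(stage, stage_in):
    stage_temp = [0] * 64
    for i in range(16):
        idx = i * 4
        twid = get_twiddle(stage, i)
        # Python slices exclude the end index, so idx:idx+4 is elements idx..idx+3
        stage_temp[idx:idx + 4] = bfly4(twid, stage_in[idx:idx + 4])
    # Permutation: pure rewiring, no arithmetic
    return [stage_temp[PERMUTE[i]] for i in range(64)]


stage_out = stage_f(0, [1] * 64)
```

In hardware the permutation costs only wires, which is why the slide treats it as a separate block after the butterflies.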
13
Higher-order functions: Stage functions f0, f1 and f2
function f0(x);
  return (stage_f(0,x));
endfunction
function f1(x);
  return (stage_f(1,x));
endfunction
function f2(x);
  return (stage_f(2,x));
endfunction
What is the type of f0(x) ?
function Vector#(64, Complex) f0 (Vector#(64, Complex) x);
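The same idea, fixing the stage argument to obtain one function per stage, can be sketched in Python with functools.partial. The stage_f body and the compose helper below are hypothetical stand-ins for illustration, not part of the deck.

```python
from functools import partial


def stage_f(stage, x):
    # Hypothetical stand-in for the real stage function
    return [v + stage for v in x]


# Each stage function is stage_f with its first argument fixed
f0 = partial(stage_f, 0)
f1 = partial(stage_f, 1)
f2 = partial(stage_f, 2)


def compose(*fs):
    """Return a function that applies fs left to right."""
    def run(x):
        for f in fs:
            x = f(x)
        return x
    return run


ifft = compose(f0, f1, f2)
result = ifft([0, 10])
```

Each of f0, f1 and f2 takes a vector and returns a vector of the same shape, which is what allows them to be chained or passed to a generic pipeline.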
14
Suppose we want to reuse some part of the circuit ...
[Figure: inputs in0 ... in63 feeding Bfly4 x16 and Permute blocks, producing out0 ... out63]
Reuse the same circuit three times to reduce area
But why?
15
Architectural Exploration:
Area-Performance tradeoff in a Transmitter
16
802.11a Transmitter Overview
[Figure: transmitter pipeline: Controller, Scrambler, Encoder, Interleaver, Mapper, IFFT, Cyclic Extend; inputs are headers and data (24 uncoded bits)]
Must produce one OFDM symbol every 4 μsec
Depending upon the transmission rate, consumes 1, 2 or 4 tokens to produce one OFDM symbol
One OFDM symbol (64 Complex Numbers)
IFFT transforms 64 (frequency domain) complex numbers into 64 (time domain) complex numbers and accounts for 85% of the area
17
Preliminary results [MEMOCODE 2006] Dave, Gerding, Pellauer, Arvind
Table columns: Design Block, Lines of Code (BSV), Relative Area
Design blocks: Controller, Scrambler, Conv. Encoder, Interleaver, Mapper, IFFT, Cyc. Extender
Relative Area: 0% 1% 11% 85% 3%
Complex arithmetic libraries constitute another 200 lines of code
18
Combinational IFFT
Reuse the same circuit three times to reduce area
[Figure: Bfly4 x16 and Permute blocks producing out0 ... out63]
19
Design Alternatives
Reuse a block over multiple cycles
[Figure: f and g as separate blocks versus a single block reused for both]
We expect:
  - Throughput to decrease: less parallelism
  - Area to decrease: reusing a block
The clock needs to run faster for the same throughput, which gives a hyper-linear increase in energy
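The clock-rate consequence of reuse is simple arithmetic, sketched below in Python. The throughput target and the reuse factor of 3 are hypothetical numbers picked for illustration; required_clock_hz is not a name from the deck.

```python
def required_clock_hz(results_per_second, cycles_per_result):
    """Clock frequency needed to sustain a throughput when each result
    occupies the shared block for cycles_per_result cycles."""
    return results_per_second * cycles_per_result


TARGET = 1_000_000  # ASSUMPTION: hypothetical throughput target, results per second

unshared = required_clock_hz(TARGET, 1)  # every block instantiated separately
shared = required_clock_hz(TARGET, 3)    # one block reused three times per result
ratio = shared // unshared
```

Reusing a block n times per result multiplies the required clock rate by n for the same throughput; the area saved has to be weighed against that faster clock.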
20
Circular pipeline: Reusing the Pipeline Stage
[Figure: inputs in0 ... in63 and outputs out0 ... out63 around a single Bfly4 and Permute stage whose output loops back to its input; a Stage Counter selects the current stage]
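A cycle-level Python model may help in seeing how one stage plus a stage counter replaces three stages. This is a sketch under assumptions: the class name FoldedIFFT, its cycle method and the stand-in stage_f are all hypothetical, and queues and handshaking are omitted.

```python
def stage_f(stage, data):
    # Hypothetical stand-in for the real stage: appends the stage number as a digit
    return [v * 10 + stage for v in data]


class FoldedIFFT:
    """One stage_f unit reused for three consecutive cycles."""

    def __init__(self, stage_fn):
        self.stage_fn = stage_fn
        self.stage = 0    # stage counter: 0, 1, 2
        self.data = None  # register holding the intermediate vector

    def cycle(self, in_data=None):
        """Advance one clock cycle; return a result on the last stage, else None."""
        if self.stage == 0:
            if in_data is None:
                return None           # nothing to start on
            src = in_data             # stage 0 reads the new input
        else:
            src = self.data           # later stages read the loop-back register
        out = self.stage_fn(self.stage, src)
        if self.stage == 2:
            self.stage, self.data = 0, None
            return out                # result leaves the circular pipeline
        self.data = out
        self.stage += 1
        return None


unit = FoldedIFFT(stage_f)
trace = [unit.cycle([1, 2]), unit.cycle(), unit.cycle()]
```

The result appears on the third cycle and matches three chained stage_f calls, at roughly one third of the stage hardware.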
21
Superfolded circular pipeline: Just one Bfly-4 node!
[Figure: inputs in0 ... in63 and outputs out0 ... out63 around a single Bfly4 node and a Permute block; 64 2-way muxes; stage counter 0 to 2; 4 16-way muxes and 4 16-way demuxes driven by an index counter 0 to 15; a test "Index == 15?"]
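The control structure of the superfolded design can be sketched as two nested counters around a single butterfly. The Python below is a model under assumptions: placeholder twiddles and an illustrative stride permutation (neither from the deck), one butterfly evaluation counted as one cycle, and the permutation applied when the index counter reaches 15.

```python
def bfly4(t, k):
    m = [k[n] * t[n] for n in range(4)]
    y = [m[0] + m[2], m[0] - m[2], m[1] + m[3], 1j * (m[1] - m[3])]
    return [y[0] + y[2], y[1] + y[3], y[0] - y[2], y[1] - y[3]]


def get_twiddle(stage, index):
    return [1, 1, 1, 1]  # ASSUMPTION: placeholder twiddle factors


# ASSUMPTION: illustrative stride permutation, not the deck's table
PERMUTE = [(i % 16) * 4 + i // 16 for i in range(64)]


def superfolded_ifft(in_data):
    data = list(in_data)
    cycles = 0
    for stage in range(3):          # stage counter: 0 to 2
        temp = [0] * 64
        for index in range(16):     # index counter: 0 to 15
            idx = 4 * index
            # The 16-way muxes select four inputs; the demuxes route four outputs
            temp[idx:idx + 4] = bfly4(get_twiddle(stage, index), data[idx:idx + 4])
            cycles += 1             # the single Bfly4 node is used once per cycle
        # index == 15: the stage is complete, so permute and loop back
        data = [temp[PERMUTE[i]] for i in range(64)]
    return data, cycles


result, cycles = superfolded_ifft([1] * 64)
```

Under these assumptions one transform takes 48 butterfly cycles instead of one combinational pass, in exchange for a single Bfly4 node plus muxes.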
22
Pipelining a block
[Figure: three designs between inQ and outQ. Combinational C: f1, f2, f3. Pipeline P: f1, f2, f3. Folded Pipeline FP: a single f]
Clock? Area? Throughput?
Clock: C < P ≈ FP
Area: FP < C < P
Throughput: FP < C < P
23
Synchronous pipeline
[Figure: x flows through inQ, f0, sReg0, f1, sReg1, f2, outQ]
rule sync_pipeline (True);
  inQ.deq();
  sReg0 <= f0(inQ.first());
  sReg1 <= f1(sReg0);
  outQ.enq(f2(sReg1));
endrule
This rule can fire only if
  - inQ has an element
  - outQ has space
Atomicity: Either all or none of the state elements inQ, outQ, sReg0 and sReg1 will be updated
This is real IFFT code; just replace f0, f1 and f2 with stage_f code
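The guard and the all-or-nothing update can be modelled in Python. This is a sketch, not how the Bluespec compiler implements rules: the class SyncPipeline, the bounded outQ capacity, the reset value 0 for the registers and the example functions are all assumptions made for illustration.

```python
from collections import deque


class SyncPipeline:
    def __init__(self, f0, f1, f2, out_capacity=4):
        self.f0, self.f1, self.f2 = f0, f1, f2
        self.inQ, self.outQ = deque(), deque()
        self.out_capacity = out_capacity
        self.sReg0 = 0  # ASSUMPTION: registers reset to 0
        self.sReg1 = 0

    def fire(self):
        """Try to fire the rule once; return True if it fired."""
        # Implicit guard: inQ has an element and outQ has space
        if not self.inQ or len(self.outQ) >= self.out_capacity:
            return False  # nothing is updated
        # All right-hand sides read the state from before the rule fires
        new0 = self.f0(self.inQ[0])
        new1 = self.f1(self.sReg0)
        out = self.f2(self.sReg1)
        # Commit every update together
        self.inQ.popleft()
        self.sReg0, self.sReg1 = new0, new1
        self.outQ.append(out)
        return True


pipe = SyncPipeline(lambda x: x + 1, lambda x: x * 2, lambda x: x - 3)
pipe.inQ.extend([5, 6, 7])
fired = [pipe.fire() for _ in range(4)]
```

The fourth attempt does not fire because inQ is empty, and no state changes. The first two values enqueued come from the registers' reset contents rather than from real inputs.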
24
Stage functions f0, f1 and f2
function f0(x);
  return (stage_f(0,x));
endfunction
function f1(x);
  return (stage_f(1,x));
endfunction
function f2(x);
  return (stage_f(2,x));
endfunction
The stage_f function was given earlier
25
Problem: What about pipeline bubbles?
[Figure: x flows through inQ, f0, sReg0, f1, sReg1, f2, outQ]
rule sync_pipeline (True);
  inQ.deq();
  sReg0 <= f0(inQ.first());
  sReg1 <= f1(sReg0);
  outQ.enq(f2(sReg1));
endrule
Red and Green tokens must move even if there is nothing in the inQ!
Also, if there is no token in sReg1 then nothing should be enqueued in the outQ
Valid bits or the Maybe type
Modify the rule to deal with these conditions
26
The Maybe type data in the pipeline
typedef union tagged {
  void Invalid;
  data_T Valid;
} Maybe#(type data_T);
[Figure: each register holds a data field and a valid/invalid field]
Registers contain Maybe type values
rule sync_pipeline (True);
  if (inQ.notEmpty())
  begin
    sReg0 <= tagged Valid f0(inQ.first());
    inQ.deq();
  end
  else
    sReg0 <= tagged Invalid;
  case (sReg0) matches
    tagged Valid .sx0: sReg1 <= tagged Valid f1(sx0);
    tagged Invalid: sReg1 <= tagged Invalid;
  endcase
  case (sReg1) matches
    tagged Valid .sx1: outQ.enq(f2(sx1));
  endcase
endrule
sx1 will get bound to the appropriate part of sReg1
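The behaviour of the Maybe-typed pipeline can be modelled in Python, using None for tagged Invalid. This is a sketch under assumptions: the class MaybePipeline is hypothetical, outQ is treated as unbounded so its guard never blocks, and None therefore cannot be used as a data value.

```python
from collections import deque


class MaybePipeline:
    def __init__(self, f0, f1, f2):
        self.f0, self.f1, self.f2 = f0, f1, f2
        self.inQ, self.outQ = deque(), deque()  # ASSUMPTION: outQ is unbounded
        self.sReg0 = None  # None models tagged Invalid
        self.sReg1 = None

    def fire(self):
        # Read the old register values before any update
        old0, old1 = self.sReg0, self.sReg1
        # First stage: take an input if there is one, else insert a bubble
        if self.inQ:
            self.sReg0 = self.f0(self.inQ.popleft())
        else:
            self.sReg0 = None
        # Second stage: a Valid value moves on, a bubble stays a bubble
        self.sReg1 = None if old0 is None else self.f1(old0)
        # Last stage: only Valid values are enqueued
        if old1 is not None:
            self.outQ.append(self.f2(old1))


pipe = MaybePipeline(lambda x: x + 1, lambda x: x * 2, lambda x: x - 3)
pipe.inQ.append(5)
for _ in range(5):
    pipe.fire()
```

A single input drains through the pipeline even though inQ is empty afterwards, and the bubbles produce no output.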
27
Folded pipeline for FFT
Next lecture: Folded pipeline for FFT