Combinational Circuits in Bluespec

Slides:



Advertisements
Similar presentations
Elastic Pipelines and Basics of Multi-rule Systems Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February.
Advertisements

March, 2007http://csg.csail.mit.edu/arvind802.11a-1 Architectural Exploration: Area-Performance tradeoff in a Transmitter Arvind Computer Science.
Constructive Computer Architecture: Multirule systems and Concurrent Execution of Rules Arvind Computer Science & Artificial Intelligence Lab. Massachusetts.
March 2007http://csg.csail.mit.edu/arvindSemantics-1 Scheduling Primitives for Bluespec Arvind Computer Science & Artificial Intelligence Lab Massachusetts.
Folding and Pipelining complex combinational circuits Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February.
November 2, 2006http://csg.csail.mit.edu/6.827/L15-1 An hardware inspired model for parallel programming Arvind Computer Science & Artificial Intelligence.
June 5, Architectural Exploration: a Transmitter Arvind, Nirav Dave, Steve Gerding, Mike Pellauer Computer Science & Artificial Intelligence.
Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 22, 2011L07-1
February 21, 2007http://csg.csail.mit.edu/6.375/L07-1 Bluespec-4: Architectural exploration using IP lookup Arvind Computer Science & Artificial Intelligence.
IP Lookup: Some subtle concurrency issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 22, 2011L06-1.
IP Lookup: Some subtle concurrency issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology March 4, 2013
December 10, 2009 L29-1 The Semantics of Bluespec Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute.
December 12, 2006http://csg.csail.mit.edu/6.827/L24-1 Scheduling Primitives for Bluespec Arvind Computer Science & Artificial Intelligence Lab Massachusetts.
Pipelining combinational circuits Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology February 20, 2013http://csg.csail.mit.edu/6.375L05-1.
March 4, 2009L13-1http://csg.csail.mit.edu/6.375 Multiple Clock Domains Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of.
September 3, 2009L02-1http://csg.csail.mit.edu/korea Introduction to Bluespec: A new methodology for designing Hardware Arvind Computer Science & Artificial.
Constructive Computer Architecture Combinational circuits-2 Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.
Folded Combinational Circuits as an example of Sequential Circuits Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.
Multiple Clock Domains (MCD) Arvind with Nirav Dave Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology March 15, 2010.
Multiple Clock Domains (MCD) Continued … Arvind with Nirav Dave Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology November.
Constructive Computer Architecture: Guards Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology September 24, 2014.
March, 2007http://csg.csail.mit.edu/arvindIFFT-1 Combinational Circuits: IFFT, Types, Parameterization... Arvind Computer Science & Artificial Intelligence.
September 22, 2009http://csg.csail.mit.edu/koreaL07-1 Asynchronous Pipelines: Concurrency Issues Arvind Computer Science & Artificial Intelligence Lab.
September 8, 2009http://csg.csail.mit.edu/koreaL03-1 Combinational Circuits in Bluespec Arvind Computer Science & Artificial Intelligence Lab Massachusetts.
Folding complex combinational circuits to save area Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February.
Constructive Computer Architecture Sequential Circuits - 2 Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.
March 1, 2006http://csg.csail.mit.edu/6.375/L09-1 Bluespec-3: Architecture exploration using static elaboration Arvind Computer Science & Artificial Intelligence.
Simple Inelastic and Folded Pipelines Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 14, 2011L04-1.
Computer Architecture: A Constructive Approach Pipelining combinational circuits Teacher: Yoav Etsion Taken (with permission) from Arvind et al.*, Massachusetts.
Multiple Clock Domains (MCD) Arvind with Nirav Dave Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.
Combinational Circuits in Bluespec Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 9, 2011L03-1
Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology March 1, 2010
Overview Logistics Last lecture Today HW5 due today
Functions and Types in Bluespec
Bluespec-3: A non-pipelined processor Arvind
Folded Combinational Circuits as an example of Sequential Circuits
Folded “Combinational” circuits
Sequential Circuits - 2 Constructive Computer Architecture Arvind
Architectural Exploration:
Sequential Circuits: Constructive Computer Architecture
Combinational ALU Constructive Computer Architecture Arvind
Combinational Circuits in Bluespec
FFT: An example of complex combinational circuits
Combinational Circuits in Bluespec
Pipelining combinational circuits
Multirule Systems and Concurrent Execution of Rules
Constructive Computer Architecture: Guards
Pipelining combinational circuits
BSV objects and a tour of known BSV problems
Combinational Circuits and Simple Synchronous Pipelines
Combinational Circuits and Simple Synchronous Pipelines
Modules with Guarded Interfaces
Pipelining combinational circuits
Sequential Circuits - 2 Constructive Computer Architecture Arvind
Bluespec-3: A non-pipelined processor Arvind
Multirule systems and Concurrent Execution of Rules
Modular Refinement - 2 Arvind
Combinational Circuits in Bluespec
Multiple Clock Domains
Elastic Pipelines and Basics of Multi-rule Systems
FFT: An example of complex combinational circuits
Constructive Computer Architecture: Guards
Elastic Pipelines and Basics of Multi-rule Systems
Simple Synchronous Pipelines
Multirule systems and Concurrent Execution of Rules
Modeling Processors Arvind
Pipelining combinational circuits
Architectural Exploration:
Simple Synchronous Pipelines
Combinational ALU 6.S078 - Computer Architecture:
Presentation transcript:

Combinational Circuits in Bluespec Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology January 17, 2011 http://csg.csail.mit.edu/SNU

Bluespec: Two-Level Compilation (Objects, Types, Higher-order functions) Lennart Augustsson @Sandburst 2000-2002 Type checking Massive partial evaluation and static elaboration Level 1 compilation Rules and Actions (Term Rewriting System) Now we call this Guarded Atomic Actions Rule conflict analysis Rule scheduling Level 2 synthesis James Hoe & Arvind @MIT 1997-2000 Object code (Verilog/C) http://csg.csail.mit.edu/SNU January 17, 2011

Static Elaboration At compile time Software Toolflow: Hardware Inline function calls and unroll loops Instantiate modules with specific parameters Resolve polymorphism/overloading, perform most data structure operations source Software Toolflow: source Hardware Toolflow: design2 design3 design1 elaborate w/params .exe compile run1 run2.1 … run3.1 run1.1 run w/ params run w/ params run1 … run http://csg.csail.mit.edu/SNU January 17, 2011

Combinational IFFT … * + - *j t2 t0 t3 t1 Bfly4 x16 out0 out1 out2 out63 out3 out4 Permute * + - *j t2 t0 t3 t1 Constants t0 to t3 are different for each box and can have dramatci impact on optimizations. All numbers are complex and represented as two sixteen bit quantities. Fixed-point arithmetic is used to reduce area, power, ... http://csg.csail.mit.edu/SNU January 17, 2011

4-way Butterfly Node BSV has a very strong notion of types * + - *i t0 k0 k1 k2 k3 function Vector#(4,Complex) bfly4 (Vector#(4,Complex) t, Vector#(4,Complex) k); BSV has a very strong notion of types Every expression has a type. Either it is declared by the user or automatically deduced by the compiler The compiler verifies that the type declarations are compatible http://csg.csail.mit.edu/SNU January 17, 2011

BSV code: 4-way Butterfly function Vector#(4,Complex) bfly4 (Vector#(4,Complex) t, Vector#(4,Complex) k); Vector#(4,Complex) m, y, z; m[0] = k[0] * t[0]; m[1] = k[1] * t[1]; m[2] = k[2] * t[2]; m[3] = k[3] * t[3]; y[0] = m[0] + m[2]; y[1] = m[0] – m[2]; y[2] = m[1] + m[3]; y[3] = i*(m[1] – m[3]); z[0] = y[0] + y[2]; z[1] = y[1] + y[3]; z[2] = y[0] – y[2]; z[3] = y[1] – y[3]; return(z); endfunction * + - *i m y z Polymorphic code: works on any type of numbers for which *, + and - have been defined Note: Vector does not mean storage http://csg.csail.mit.edu/SNU January 17, 2011

Complex Arithmetic Addition Multiplication zR = xR + yR zI = xI + yI Multiplication zR = xR * yR - xI * yI zR = xR * yI + xI * yR The actual arithmetic for FFT is different because we use a non-standard fixed point representation http://csg.csail.mit.edu/SNU January 17, 2011

BSV code for Addition typedef struct{ Int#(t) r; Int#(t) i; } Complex#(numeric type t) deriving (Eq,Bits); function Complex#(t) \+ (Complex#(t) x, Complex#(t) y); Int#(t) real = x.r + y.r; Int#(t) imag = x.i + y.i; return(Complex{r:real, i:imag}); endfunction What is the type of this + ? http://csg.csail.mit.edu/SNU January 17, 2011

Combinational IFFT … stage_f function Bfly4 x16 out0 out1 out2 out63 out3 out4 Permute stage_f function function Vector#(64, Complex) stage_f (Bit#(2) stage, Vector#(64, Complex) stage_in); function Vector#(64, Complex) ifft (Vector#(64, Complex) in_data); repeat stage_f three times http://csg.csail.mit.edu/SNU January 17, 2011

BSV Code: Combinational IFFT function Vector#(64, Complex) ifft (Vector#(64, Complex) in_data); //Declare vectors Vector#(4,Vector#(64, Complex)) stage_data; stage_data[0] = in_data; for (Integer stage = 0; stage < 3; stage = stage + 1) stage_data[stage+1] = stage_f(stage,stage_data[stage]); return(stage_data[3]); The for-loop is unfolded and stage_f is inlined during static elaboration Note: no notion of loops or procedures during execution http://csg.csail.mit.edu/SNU January 17, 2011

BSV Code: Combinational IFFT- Unfolded function Vector#(64, Complex) ifft (Vector#(64, Complex) in_data); //Declare vectors Vector#(4,Vector#(64, Complex)) stage_data; stage_data[0] = in_data; for (Integer stage = 0; stage < 3; stage = stage + 1) stage_data[stage+1] = stage_f(stage,stage_data[stage]); return(stage_data[3]); stage_data[1] = stage_f(0,stage_data[0]); stage_data[2] = stage_f(1,stage_data[1]); stage_data[3] = stage_f(2,stage_data[2]); Stage_f can be inlined now; it could have been inlined before loop unfolding also. Does the order matter? http://csg.csail.mit.edu/SNU January 17, 2011

Bluespec Code for stage_f function Vector#(64, Complex) stage_f (Bit#(2) stage, Vector#(64, Complex) stage_in); begin for (Integer i = 0; i < 16; i = i + 1) Integer idx = i * 4; let twid = getTwiddle(stage, fromInteger(i)); let y = bfly4(twid, stage_in[idx:idx+3]); stage_temp[idx] = y[0]; stage_temp[idx+1] = y[1]; stage_temp[idx+2] = y[2]; stage_temp[idx+3] = y[3]; end //Permutation for (Integer i = 0; i < 64; i = i + 1) stage_out[i] = stage_temp[permute[i]]; return(stage_out); twid’s are mathematically derivable constants http://csg.csail.mit.edu/SNU January 17, 2011

Higher-order functions: Stage functions f1, f2 and f3 function f0(x); return (stage_f(0,x)); endfunction function f1(x); return (stage_f(1,x)); function f2(x); return (stage_f(2,x)); What is the type of f0(x) ? function Vector#(64, Complex) f0 (Vector#(64, Complex) x); http://csg.csail.mit.edu/SNU January 17, 2011

Suppose we want to reuse some part of the circuit ... Reuse the same circuit three times to reduce area in0 … in1 in2 in63 in3 in4 Bfly4 x16 out0 out1 out2 out63 out3 out4 Permute But why? http://csg.csail.mit.edu/SNU January 17, 2011

Architectural Exploration: Area-Performance tradeoff in 802.11a Transmitter January 17, 2011 http://csg.csail.mit.edu/SNU

802.11a Transmitter Overview headers Must produce one OFDM symbol every 4 msec Controller Scrambler Encoder 24 Uncoded bits data Depending upon the transmission rate, consumes 1, 2 or 4 tokens to produce one OFDM symbol Interleaver Mapper IFFT Cyclic Extend One OFDM symbol (64 Complex Numbers) IFFT Transforms 64 (frequency domain) complex numbers into 64 (time domain) complex numbers http://csg.csail.mit.edu/SNU accounts for 85% area January 17, 2011

Preliminary results [MEMOCODE 2006] Dave, Gerding, Pellauer, Arvind Design Lines of Block Code (BSV) Controller 49 Scrambler 40 Conv. Encoder 113 Interleaver 76 Mapper 112 IFFT 95 Cyc. Extender 23 Relative Area 0% 1% 11% 85% 3% Complex arithmetic libraries constitute another 200 lines of code http://csg.csail.mit.edu/SNU January 17, 2011

Combinational IFFT … Reuse the same circuit three times to reduce area Bfly4 x16 out0 out1 out2 out63 out3 out4 Permute http://csg.csail.mit.edu/SNU January 17, 2011

Design Alternatives Reuse a block over multiple cycles we expect: f g f g we expect: Throughput to Area to decrease – less parallelism decrease – reusing a block The clock needs to run faster for the same throughput  hyper-linear increase in energy http://csg.csail.mit.edu/SNU January 17, 2011

Circular pipeline: Reusing the Pipeline Stage … in1 in2 in63 in3 in4 out0 … out1 out2 out63 out3 out4 … Bfly4 Permute Stage Counter http://csg.csail.mit.edu/SNU January 17, 2011

Superfolded circular pipeline: Just one Bfly-4 node! … in1 in2 in63 in3 in4 out0 … out1 out2 out63 out3 out4 Permute Bfly4 64, 2-way Muxes Stage 0 to 2 4, 16-way Muxes Index: 0 to 15 4, 16-way DeMuxes Index == 15? http://csg.csail.mit.edu/SNU January 17, 2011

Pipelining a block inQ outQ f2 f1 f3 Combinational C inQ outQ f2 f1 f3 Pipeline P inQ outQ f Folded Pipeline FP Clock: C < P  FP Clock? Area? Throughput? Area: FP < C < P Throughput: FP < C < P http://csg.csail.mit.edu/SNU January 17, 2011

Synchronous pipeline x sReg0 inQ f0 f1 f2 sReg1 outQ rule sync-pipeline (True); inQ.deq(); sReg0 <= f0(inQ.first()); sReg1 <= f1(sReg0); outQ.enq(f2(sReg1)); endrule This rule can fire only if - inQ has an element - outQ has space Atomicity: Either all or none of the state elements inQ, outQ, sReg0 and sReg1 will be updated This is real IFFT code; just replace f0, f1 and f2 with stage_f code http://csg.csail.mit.edu/SNU January 17, 2011

Stage functions f1, f2 and f3 function f0(x); return (stage_f(0,x)); endfunction function f1(x); return (stage_f(1,x)); function f2(x); return (stage_f(2,x)); The stage_f function was given earlier http://csg.csail.mit.edu/SNU January 17, 2011

Problem: What about pipeline bubbles? x sReg0 inQ f0 f1 f2 sReg1 outQ rule sync-pipeline (True); inQ.deq(); sReg1 <= f0(inQ.first()); sReg2 <= f1(sReg0); outQ.enq(f2(sReg1)); endrule Red and Green tokens must move even if there is nothing in the inQ! Also if there is no token in sReg2 then nothing should be enqueued in the outQ Valid bits or the Maybe type Modify the rule to deal with these conditions http://csg.csail.mit.edu/SNU January 17, 2011

The Maybe type data in the pipeline typedef union tagged { void Invalid; data_T Valid; } Maybe#(type data_T); data valid/invalid Registers contain Maybe type values rule sync-pipeline (True); if (inQ.notEmpty()) begin sReg0 <= tagged Valid f0(inQ.first()); inQ.deq(); end else sReg0 <= tagged Invalid; case (sReg1) matches tagged Valid .sx1: sReg1 <= tagged Valid f1(sx1); tagged Invalid: sReg1 <= tagged Invalid; endcase case (sReg2) matches tagged Valid .sx2: outQ.enq(f2(sx2)); endcase endrule sx1 will get bound to the appropriate part of sReg1 http://csg.csail.mit.edu/SNU January 17, 2011

Folded pipeline for FFT Next lecture Folded pipeline for FFT January 17, 2011 http://csg.csail.mit.edu/SNU