Folding complex combinational circuits to save area Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February.

Slides:



Advertisements
Similar presentations
Elastic Pipelines and Basics of Multi-rule Systems Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February.
Advertisements

March, 2007http://csg.csail.mit.edu/arvind802.11a-1 Architectural Exploration: Area-Performance tradeoff in a Transmitter Arvind Computer Science.
Constructive Computer Architecture: Multirule systems and Concurrent Execution of Rules Arvind Computer Science & Artificial Intelligence Lab. Massachusetts.
Folding and Pipelining complex combinational circuits Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February.
Combinational Circuits in Bluespec
November 2, 2006http://csg.csail.mit.edu/6.827/L15-1 An hardware inspired model for parallel programming Arvind Computer Science & Artificial Intelligence.
June 5, Architectural Exploration: a Transmitter Arvind, Nirav Dave, Steve Gerding, Mike Pellauer Computer Science & Artificial Intelligence.
Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 22, 2011L07-1
Computer Architecture: A Constructive Approach Branch Direction Prediction – Six Stage Pipeline Joel Emer Computer Science & Artificial Intelligence Lab.
February 21, 2007http://csg.csail.mit.edu/6.375/L07-1 Bluespec-4: Architectural exploration using IP lookup Arvind Computer Science & Artificial Intelligence.
March, 2007http://csg.csail.mit.edu/arvindIPlookup-1 IP Lookup Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.
IP Lookup: Some subtle concurrency issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 22, 2011L06-1.
Computer Architecture: A Constructive Approach Sequential Circuits Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.
December 12, 2006http://csg.csail.mit.edu/6.827/L24-1 Scheduling Primitives for Bluespec Arvind Computer Science & Artificial Intelligence Lab Massachusetts.
Pipelining combinational circuits Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology February 20, 2013http://csg.csail.mit.edu/6.375L05-1.
Constructive Computer Architecture Combinational circuits-2 Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.
Folded Combinational Circuits as an example of Sequential Circuits Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.
Multiple Clock Domains (MCD) Arvind with Nirav Dave Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology March 15, 2010.
Multiple Clock Domains (MCD) Continued … Arvind with Nirav Dave Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology November.
March, 2007http://csg.csail.mit.edu/arvindIFFT-1 Combinational Circuits: IFFT, Types, Parameterization... Arvind Computer Science & Artificial Intelligence.
Constructive Computer Architecture Sequential Circuits Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology
Constructive Computer Architecture Combinational circuits Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.
September 8, 2009http://csg.csail.mit.edu/koreaL03-1 Combinational Circuits in Bluespec Arvind Computer Science & Artificial Intelligence Lab Massachusetts.
Constructive Computer Architecture Sequential Circuits - 2 Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.
March 1, 2006http://csg.csail.mit.edu/6.375/L09-1 Bluespec-3: Architecture exploration using static elaboration Arvind Computer Science & Artificial Intelligence.
Simple Inelastic and Folded Pipelines Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 14, 2011L04-1.
Computer Architecture: A Constructive Approach Pipelining combinational circuits Teacher: Yoav Etsion Taken (with permission) from Arvind et al.*, Massachusetts.
Multiple Clock Domains (MCD) Arvind with Nirav Dave Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.
Combinational Circuits in Bluespec Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 9, 2011L03-1
Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology March 1, 2010
Overview Logistics Last lecture Today HW5 due today
Canvas and Arrays in Apps
Functions and Types in Bluespec
Bluespec-3: A non-pipelined processor Arvind
An Iterative FFT We rewrite the loop to calculate nkyk[1] once
Folded Combinational Circuits as an example of Sequential Circuits
Folded “Combinational” circuits
Sequential Circuits - 2 Constructive Computer Architecture Arvind
Sequential Circuits Constructive Computer Architecture Arvind
Architectural Exploration:
Sequential Circuits: Constructive Computer Architecture
Combinational ALU Constructive Computer Architecture Arvind
Combinational Circuits in Bluespec
Combinational Circuits in Bluespec
FFT: An example of complex combinational circuits
Combinational Circuits in Bluespec
Pipelining combinational circuits
Multirule Systems and Concurrent Execution of Rules
Constructive Computer Architecture: Guards
Sequential Circuits Constructive Computer Architecture Arvind
Pipelining combinational circuits
Constructive Computer Architecture: Well formed BSV programs
Combinational Circuits and Simple Synchronous Pipelines
Combinational Circuits and Simple Synchronous Pipelines
Combinational circuits
Modules with Guarded Interfaces
Pipelining combinational circuits
Sequential Circuits - 2 Constructive Computer Architecture Arvind
Bluespec-3: A non-pipelined processor Arvind
Multirule systems and Concurrent Execution of Rules
Combinational Circuits in Bluespec
FFT: An example of complex combinational circuits
Constructive Computer Architecture: Guards
Simple Synchronous Pipelines
Multirule systems and Concurrent Execution of Rules
Pipelining combinational circuits
Architectural Exploration:
Simple Synchronous Pipelines
Combinational ALU 6.S078 - Computer Architecture:
Presentation transcript:

Folding complex combinational circuits to save area Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 21, 2012L04-1http://csg.csail.mit.edu/6.S078

Combinational IFFT in0 … in1 in2 in63 in3 in4 Bfly4 x16 Bfly4 … … out0 … out1 out2 out63 out3 out4 Permute All numbers are complex and represented as two sixteen bit quantities. Fixed-point arithmetic is used to reduce area, power,... * * * * *j t2t2 t0t0 t3t3 t1t1 February 21, 2012 L04-2http://csg.csail.mit.edu/6.S078

4-way Butterfly Node function Vector#(4,Complex) bfly4 (Vector#(4,Complex) t, Vector#(4,Complex) x); t’s are mathematically derivable constants for each bfly4 and depend upon the position of bfly4 the in the network The compiler verifies that the type declarations are compatible * * * * *i x0x1x2x3x0x1x2x3 t0t1t2t3t0t1t2t3 February 21, 2012 L04-3http://csg.csail.mit.edu/6.S078

BSV code: 4-way Butterfly function Vector#(4,Complex) bfly4 (Vector#(4,Complex) t, Vector#(4,Complex) x); Vector#(4,Complex) m, y, z; m[0] = x[0] * t[0]; m[1] = x[1] * t[1]; m[2] = x[2] * t[2]; m[3] = x[3] * t[3]; y[0] = m[0] + m[2]; y[1] = m[0] – m[2]; y[2] = m[1] + m[3]; y[3] = i*(m[1] – m[3]); z[0] = y[0] + y[2]; z[1] = y[1] + y[3]; z[2] = y[0] – y[2]; z[3] = y[1] – y[3]; return(z); endfunction * * * * *i my z Note: Vector does not mean storage February 21, 2012 L04-4http://csg.csail.mit.edu/6.S078

Complex Arithmetic Addition z R = x R + y R z I = x I + y I Multiplication z R = x R * y R - x I * y I z I = x R * y I + x I * y R The actual arithmetic for FFT is different because we use a non-standard fixed point representation February 21, 2012 L04-5http://csg.csail.mit.edu/6.S078

BSV code for Addition typedef struct{ Int#(t) r; Int#(t) i; } Complex#(numeric type t) deriving (Eq,Bits); function Complex#(t) \+ (Complex#(t) x, Complex#(t) y); Int#(t) real = x.r + y.r; Int#(t) imag = x.i + y.i; return(Complex{r:real, i:imag}); endfunction What is the type of this + ? February 21, 2012 L04-6http://csg.csail.mit.edu/6.S078

Combinational IFFT in0 … in1 in2 in63 in3 in4 Bfly4 x16 Bfly4 … … out0 … out1 out2 out63 out3 out4 Permute stage_f function repeat stage_f three times function Vector#(64, Complex) stage_f (Bit#(2) stage, Vector#(64, Complex) stage_in); function Vector#(64, Complex) ifft (Vector#(64, Complex) in_data); February 21, 2012 L04-7http://csg.csail.mit.edu/6.S078

BSV Code: Combinational IFFT function Vector#(64, Complex) ifft (Vector#(64, Complex) in_data); //Declare vectors Vector#(4,Vector#(64, Complex)) stage_data; stage_data[0] = in_data; for (Integer stage = 0; stage < 3; stage = stage + 1) stage_data[stage+1] = stage_f(stage,stage_data[stage]); return(stage_data[3]); The for-loop is unfolded and stage_f is inlined during static elaboration Note: no notion of loops or procedures during execution February 21, 2012 L04-8http://csg.csail.mit.edu/6.S078

BSV Code: Combinational IFFT- Unfolded function Vector#(64, Complex) ifft (Vector#(64, Complex) in_data); //Declare vectors Vector#(4,Vector#(64, Complex)) stage_data; stage_data[0] = in_data; for (Integer stage = 0; stage < 3; stage = stage + 1) stage_data[stage+1] = stage_f(stage,stage_data[stage]); return(stage_data[3]); February 21, 2012 L04-9http://csg.csail.mit.edu/6.S078

Bluespec Code for stage_f function Vector#(64, Complex) stage_f (Bit#(2) stage, Vector#(64, Complex) stage_in); begin for (Integer i = 0; i < 16; i = i + 1) begin Integer idx = i * 4; let twid = getTwiddle(stage, fromInteger(i)); let y = bfly4(twid, stage_in[idx:idx+3]); stage_temp[idx] = y[0]; stage_temp[idx+1] = y[1]; stage_temp[idx+2] = y[2]; stage_temp[idx+3] = y[3]; end //Permutation for (Integer i = 0; i < 64; i = i + 1) stage_out[i] = stage_temp[permute[i]]; end return(stage_out); February 21, 2012 L04-10http://csg.csail.mit.edu/6.S078

Suppose we want to area of the circuit in0 … in1 in2 in63 in3 in4 Bfly4 x16 Bfly4 … … out0 … out1 out2 out63 out3 out4 Permute Reuse the same circuit three times to reduce area February 21, 2012 L04-11http://csg.csail.mit.edu/6.S078

fg Reusing a combinational block we expect: Throughput to Area to ff g February 21, 2012 L04-12http://csg.csail.mit.edu/6.S078 Introduce state elements to hold intermediate values

Circular or folded pipeline: Reusing the Pipeline Stage in0 … in1 in2 in63 in3 in4 out0 … out1 out2 out63 out3 out4 … Bfly4 Permute Stage Counter February 21, 2012 L04-13http://csg.csail.mit.edu/6.S078

Folded pipeline rule folded-pipeline (True); if (stage==0) begin sxIn= inQ.first(); inQ.deq(); end else sxIn= sReg; sxOut = f(stage,sxIn); if (stage==n-1) outQ.enq(sxOut); else sReg <= sxOut; stage <= (stage==n-1)? 0 : stage+1; endrule x sReg inQ f outQ stage no for- loop Need type declarations for sxIn and sxOut February 14, 2011 L04-14http://csg.csail.mit.edu/6.375

Extras: may be useful for Lab 3 February 21, 2012http://csg.csail.mit.edu/6.S078L04-15

Superfolded circular pipeline: Just one Bfly-4 node! in0 … in1 in2 in63 in3 in4 out0 … out1 out2 out63 out3 out4 Bfly4 Permute Index == 15? Index: 0 to 15 64, 2-way Muxes 4, 16-way Muxes 4, 16-way DeMuxes Stage 0 to 2 February 21, 2012 L04-16http://csg.csail.mit.edu/6.S078

Superfolded pipeline One Bfly-4 case f will be invoked for 48 dynamic values of stage each invocation will modify 4 numbers in sReg after 16 invocations a permutation would be done on the whole sReg February 14, 2011 L04-17http://csg.csail.mit.edu/6.375

Superfolded pipeline: stage function f function Vector#(64, Complex) stage_f (Bit#(2) stage, Vector#(64, Complex) stage_in); begin for (Integer i = 0; i < 16; i = i + 1) begin Bit#(2) stage Integer idx = i * 4; let twid = getTwiddle(stage, fromInteger(i)); let y = bfly4(twid, stage_in[idx:idx+3]); stage_temp[idx] = y[0]; stage_temp[idx+1] = y[1]; stage_temp[idx+2] = y[2]; stage_temp[idx+3] = y[3]; end //Permutation for (Integer i = 0; i < 64; i = i + 1) stage_out[i] = stage_temp[permute[i]]; end return(stage_out); Bit#(2+4) (stage,i) should be done only when i=15 February 14, 2011 L04-18http://csg.csail.mit.edu/6.375

Code for the Superfolded pipeline stage function Function Vector#(64, Complex) f (Bit#(6) stagei, Vector#(64, Complex) stage_in); let i = stagei `mod` 16; let twid = getTwiddle(stagei `div` 16, i); let y = bfly4(twid, stage_in[i:i+3]); let stage_temp = stage_in; stage_temp[i] = y[0]; stage_temp[i+1] = y[1]; stage_temp[i+2] = y[2]; stage_temp[i+3] = y[3]; let stage_out = stage_temp; if (i == 15) for (Integer i = 0; i < 64; i = i + 1) stage_out[i] = stage_temp[permute[i]]; return(stage_out); endfunction One Bfly-4 case February 14, 2011 L04-19http://csg.csail.mit.edu/6.375

Folded pipeline: stage function f The Twiddle constants can be expressed in a table or in a case expression stage getTwiddle0 getTwiddle1 getTwiddle2 twid The rest of stage_f, i.e. Bfly-4s and permutations (shared) sx February 14, 2011 L04-20http://csg.csail.mit.edu/6.375