Folding and Pipelining complex combinational circuits Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February.

Slides:



Advertisements
Similar presentations
Constructive Computer Architecture: Data Hazards in Pipelined Processors Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute.
Advertisements

BSV execution model and concurrent rule scheduling Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology February.
Elastic Pipelines and Basics of Multi-rule Systems Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February.
Computer Architecture: A Constructive Approach Six Stage Pipeline/Bypassing Joel Emer Computer Science & Artificial Intelligence Lab. Massachusetts Institute.
An EHR based methodology for Concurrency management Arvind (with Asif Khan) Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.
March, 2007http://csg.csail.mit.edu/arvind802.11a-1 Architectural Exploration: Area-Performance tradeoff in a Transmitter Arvind Computer Science.
Constructive Computer Architecture: Multirule systems and Concurrent Execution of Rules Arvind Computer Science & Artificial Intelligence Lab. Massachusetts.
March 2007http://csg.csail.mit.edu/arvindSemantics-1 Scheduling Primitives for Bluespec Arvind Computer Science & Artificial Intelligence Lab Massachusetts.
Combinational Circuits in Bluespec
November 2, 2006http://csg.csail.mit.edu/6.827/L15-1 An hardware inspired model for parallel programming Arvind Computer Science & Artificial Intelligence.
June 5, Architectural Exploration: a Transmitter Arvind, Nirav Dave, Steve Gerding, Mike Pellauer Computer Science & Artificial Intelligence.
February 21, 2007http://csg.csail.mit.edu/6.375/L07-1 Bluespec-4: Architectural exploration using IP lookup Arvind Computer Science & Artificial Intelligence.
March, 2007http://csg.csail.mit.edu/arvindIPlookup-1 IP Lookup Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.
September 24, L08-1 IP Lookup: Some subtle concurrency issues Arvind Computer Science & Artificial Intelligence Lab.
December 10, 2009 L29-1 The Semantics of Bluespec Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute.
December 12, 2006http://csg.csail.mit.edu/6.827/L24-1 Scheduling Primitives for Bluespec Arvind Computer Science & Artificial Intelligence Lab Massachusetts.
Pipelining combinational circuits Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology February 20, 2013http://csg.csail.mit.edu/6.375L05-1.
Folded Combinational Circuits as an example of Sequential Circuits Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.
Multiple Clock Domains (MCD) Arvind with Nirav Dave Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology March 15, 2010.
March, 2007Intro-1http://csg.csail.mit.edu/arvind Design methods to facilitate rapid growth of SoCs Arvind Computer Science & Artificial Intelligence Lab.
Constructive Computer Architecture: Guards Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology September 24, 2014.
March, 2007http://csg.csail.mit.edu/arvindIFFT-1 Combinational Circuits: IFFT, Types, Parameterization... Arvind Computer Science & Artificial Intelligence.
September 22, 2009http://csg.csail.mit.edu/koreaL07-1 Asynchronous Pipelines: Concurrency Issues Arvind Computer Science & Artificial Intelligence Lab.
Constructive Computer Architecture Sequential Circuits Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology
September 8, 2009http://csg.csail.mit.edu/koreaL03-1 Combinational Circuits in Bluespec Arvind Computer Science & Artificial Intelligence Lab Massachusetts.
Folding complex combinational circuits to save area Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February.
Constructive Computer Architecture Sequential Circuits - 2 Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.
March 1, 2006http://csg.csail.mit.edu/6.375/L09-1 Bluespec-3: Architecture exploration using static elaboration Arvind Computer Science & Artificial Intelligence.
Simple Inelastic and Folded Pipelines Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 14, 2011L04-1.
Computer Architecture: A Constructive Approach Pipelining combinational circuits Teacher: Yoav Etsion Taken (with permission) from Arvind et al.*, Massachusetts.
Multiple Clock Domains (MCD) Arvind with Nirav Dave Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.
Combinational Circuits in Bluespec Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 9, 2011L03-1
Bluespec-3: A non-pipelined processor Arvind
Concurrency properties of BSV methods and rules
Folded Combinational Circuits as an example of Sequential Circuits
Folded “Combinational” circuits
Sequential Circuits - 2 Constructive Computer Architecture Arvind
Sequential Circuits Constructive Computer Architecture Arvind
Architectural Exploration:
Sequential Circuits: Constructive Computer Architecture
Combinational Circuits in Bluespec
FFT: An example of complex combinational circuits
Combinational Circuits in Bluespec
Pipelining combinational circuits
Multirule Systems and Concurrent Execution of Rules
Constructive Computer Architecture: Guards
Sequential Circuits Constructive Computer Architecture Arvind
Pipelining combinational circuits
Constructive Computer Architecture: Well formed BSV programs
Combinational Circuits and Simple Synchronous Pipelines
Combinational Circuits and Simple Synchronous Pipelines
Modules with Guarded Interfaces
Pipelining combinational circuits
Sequential Circuits - 2 Constructive Computer Architecture Arvind
Bluespec-3: A non-pipelined processor Arvind
Multirule systems and Concurrent Execution of Rules
Modular Refinement - 2 Arvind
Combinational Circuits in Bluespec
Elastic Pipelines and Basics of Multi-rule Systems
FFT: An example of complex combinational circuits
Constructive Computer Architecture: Guards
Elastic Pipelines and Basics of Multi-rule Systems
Pipelined Processors Constructive Computer Architecture: Arvind
Simple Synchronous Pipelines
Multirule systems and Concurrent Execution of Rules
Pipelining combinational circuits
Constructive Computer Architecture: Well formed BSV programs
Architectural Exploration:
Simple Synchronous Pipelines
Presentation transcript:

Folding and Pipelining complex combinational circuits Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 22, 2012L05-1http://csg.csail.mit.edu/6.S078

Combinational IFFT: Suppose we want to reduce the area of the circuit in0 … in1 in2 in63 in3 in4 Bfly4 x16 Bfly4 … … out0 … out1 out2 out63 out3 out4 Permute Reuse the same circuit three times to reduce area Folding February 22, 2012 L05-2http://csg.csail.mit.edu/6.S078

fg Reusing a combinational block we expect: Throughput to Area to ff g decrease – less parallelism The clock needs to run faster for the same throughput  hyper-linear increase in energy decrease – reusing a block February 21, 2012 L04-3http://csg.csail.mit.edu/6.S078 Introduce state elements to hold intermediate values

Circular or folded pipeline: Reusing the Pipeline Stage in0 … in1 in2 in63 in3 in4 out0 … out1 out2 out63 out3 out4 … Bfly4 Permute Stage Counter February 22, 2012 L05-4http://csg.csail.mit.edu/6.S078

Folded pipeline rule folded-pipeline (True); if (stage==0) begin sxIn= inQ.first(); inQ.deq(); end else sxIn= sReg; sxOut = f(stage,sxIn); if (stage==n-1) outQ.enq(sxOut); else sReg <= sxOut; stage <= (stage==n-1)? 0 : stage+1; endrule x sReg inQ f outQ stage no for- loop Need type declarations for sxIn and sxOut February 22, 2012 L05-5http://csg.csail.mit.edu/6.S078

BSV Code for stage_f function Vector#(64, Complex) stage_f (Bit#(2) stage, Vector#(64, Complex) stage_in); begin for (Integer i = 0; i < 16; i = i + 1) begin Integer idx = i * 4; let twid = getTwiddle(stage, fromInteger(i)); let y = bfly4(twid, stage_in[idx:idx+3]); stage_temp[idx] = y[0]; stage_temp[idx+1] = y[1]; stage_temp[idx+2] = y[2]; stage_temp[idx+3] = y[3]; end //Permutation for (Integer i = 0; i < 64; i = i + 1) stage_out[i] = stage_temp[permute[i]]; end return(stage_out); February 22, 2012 L05-6http://csg.csail.mit.edu/6.S078

Folded pipeline-multiple rules another way of expressing the same computation rule foldedEntry if (stage==0); sReg <= f(stage, inQ.first()); stage <= stage+1; inQ.deq(); endrule rule foldedCirculate if (stage!=0)&(stage<(n-1)); sReg <= f(stage, sReg); stage <= stage+1; endrule rule foldedExit if (stage==n-1); outQ.enq(f(stage, sReg)); stage <= 0; endrule x sReg inQ f outQ stage Disjoint firing conditions February 22, 2012 L05-7http://csg.csail.mit.edu/6.S078

Superfolded circular pipeline: use just one Bfly-4 node! in0 … in1 in2 in63 in3 in4 out0 … out1 out2 out63 out3 out4 Bfly4 Permute Index == 15? Index: 0 to 15 64, 2-way Muxes 4, 16-way Muxes 4, 16-way DeMuxes Stage 0 to 2 February 21, 2012 L04-8http://csg.csail.mit.edu/6.S078 Lab 3

Combinational IFFT Lot of area and long combinational delay Folded or multi-cycle version can save area and reduce the combinational delay but throughput per clock cycle gets worse Pipelining: a method to increase the circuit throughput to evaluate multiple IFFTs in0 … in1 in2 in63 in3 in4 Bfly4 x16 Bfly4 … … out0 … out1 out2 out63 out3 out4 Permute February 22, 2012 L05-9http://csg.csail.mit.edu/6.S078

Pipelining a block inQoutQ f2f1 f3 Combinational C inQoutQ f2f1 f3 Pipeline P inQoutQ f Folded Pipeline FP Clock? Area?Throughput? February 22, 2012 L05-10http://csg.csail.mit.edu/6.S078

Inelastic pipeline x sReg1inQ f0f1f2 sReg2outQ rule sync-pipeline (True); inQ.deq(); sReg1 <= f0(inQ.first()); sReg2 <= f1(sReg1); outQ.enq(f2(sReg2)); endrule This rule can fire only if February 22, 2012 L05-11http://csg.csail.mit.edu/6.S078

Stage functions f1, f2 and f3 function f0(x); return (stage_f(0,x)); endfunction function f1(x); return (stage_f(1,x)); endfunction function f2(x); return (stage_f(2,x)); endfunction The stage_f function was given earlier February 22, 2012 L05-12http://csg.csail.mit.edu/6.S078

Problem: What about pipeline bubbles? x sReg1inQ f0f1f2 sReg2outQ rule sync-pipeline (True); inQ.deq(); sReg1 <= f0(inQ.first()); sReg2 <= f1(sReg1); outQ.enq(f2(sReg2)); endrule February 22, 2012 L05-13http://csg.csail.mit.edu/6.S078

Explicit encoding of Valid/Invalid data rule sync-pipeline (True); if (inQ.notEmpty()) begin sReg1 <= f0(inQ.first()); inQ.deq(); sReg1f <= Valid end else sReg1f <= Invalid; sReg2 <= f1(sReg1); sReg2f <= sReg1f; if (sReg2f == Valid) outQ.enq(f2(sReg2)); endrule sReg1 sReg2 x inQ f0f1f2 outQ Valid/Invalid February 22, 2012 L05-14http://csg.csail.mit.edu/6.S078

When is this rule enabled? inQ sReg1f sReg2foutQ NEVVNF NEVVF NEVINF NEVIF NEIVNF NEIVF NEIINF NEIIF EVVNF EVVF EVINF EVIF EIVNF EIVF EIINF EIIF Yes Yes1 = yes but no change inQ sReg1f sReg2foutQ February 22, 2012 L05-15http://csg.csail.mit.edu/6.S078 rule sync-pipeline (True); if (inQ.notEmpty()) begin sReg1 <= f0(inQ.first()); inQ.deq(); sReg1f <= Valid end else sReg1f <= Invalid; sReg2 <= f1(sReg1); sReg2f <= sReg1f; if (sReg2f == Valid) outQ.enq(f2(sReg2)); endrule

Area estimates Tool: Synopsys Design Compiler Comb. FFT Combinational area:16536 Noncombinational area: 9279 Linear FFT Combinational area: Noncombinational area: Circular FFT Combinational area: Noncombinational area: February 22, 2012 L05-16http://csg.csail.mit.edu/6.S078 Surprising? Explanation?

The Maybe type data in the pipeline typedef union tagged { void Invalid; data_T Valid; } Maybe#(type data_T); data valid/invalid Registers contain Maybe type values rule sync-pipeline (True); if (inQ.notEmpty()) begin sReg1 <= tagged Valid f0(inQ.first()); inQ.deq(); end else sReg1 <= tagged Invalid; case (sReg1) matches tagged Valid.sx1: sReg2 <= tagged Valid f1(sx1); tagged Invalid: sReg2 <= tagged Invalid; endcase case (sReg2) matches tagged Valid.sx2: outQ.enq(f2(sx2)); endcase endrule February 22, 2012 L05-17http://csg.csail.mit.edu/6.S078