Simple Inelastic and Folded Pipelines
Arvind
Computer Science & Artificial Intelligence Lab, Massachusetts Institute of Technology
February 14, 2011 (L04-1)

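Pipelining a block

Three implementations of a block computing f1, f2, f3 between inQ and outQ:
- Combinational (C): inQ -> f1 -> f2 -> f3 -> outQ
- Pipeline (P): inQ -> f1 -> f2 -> f3 -> outQ
- Folded Pipeline (FP): inQ -> f -> outQ

Clock? Area? Throughput?
- Clock: C < P ≈ FP
- Area: FP < C < P
- Throughput: FP < C < P

February 14, 2011 L04-2 http://csg.csail.mit.edu/6.375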

Inelastic Pipeline

[Figure: x flows inQ -> f1 -> sReg1 -> f2 -> sReg2 -> f3 -> outQ]

    rule sync_pipeline (True);
        if (inQ.notEmpty()) begin
            sReg1 <= tagged Valid f1(inQ.first());
            inQ.deq();
        end
        else
            sReg1 <= tagged Invalid;

        case (sReg1) matches
            tagged Valid .sx1: sReg2 <= tagged Valid f2(sx1);
            tagged Invalid:    sReg2 <= tagged Invalid;
        endcase

        case (sReg2) matches
            tagged Valid .sx2: outQ.enq(f3(sx2));
        endcase
    endrule

Both registers hold values of Maybe type.
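The one-value-per-firing behavior of this rule can be mimicked in software. The following Python sketch is a behavioral model, not BSV: None stands in for tagged Invalid, the bodies of f1, f2 and f3 are arbitrary placeholders, and outQ is assumed never to be full.

```python
from collections import deque

# Placeholder stage functions (assumptions, not from the slides).
def f1(x): return x + 1
def f2(x): return x * 2
def f3(x): return x - 3

def sync_pipeline_step(in_q, s_reg1, s_reg2, out_q):
    """Model one firing of the rule and return the new (sReg1, sReg2).

    Every read sees the value from the start of the cycle, as register
    reads do inside a rule. None plays the role of tagged Invalid.
    """
    if s_reg2 is not None:                      # stage 3 reads old sReg2
        out_q.append(f3(s_reg2))
    new_s_reg2 = f2(s_reg1) if s_reg1 is not None else None
    new_s_reg1 = f1(in_q.popleft()) if in_q else None
    return new_s_reg1, new_s_reg2

def run(inputs, cycles):
    """Fire the rule `cycles` times and return what reached outQ."""
    in_q, out_q = deque(inputs), deque()
    s_reg1 = s_reg2 = None
    for _ in range(cycles):
        s_reg1, s_reg2 = sync_pipeline_step(in_q, s_reg1, s_reg2, out_q)
    return list(out_q)
```

With these placeholders, three inputs need five firings to drain: the first result appears on the third firing and one more follows on each later firing.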

When is this rule enabled?

    rule sync_pipeline (True);
        if (inQ.notEmpty()) begin
            sReg1 <= tagged Valid f1(inQ.first());
            inQ.deq();
        end
        else
            sReg1 <= tagged Invalid;

        case (sReg1) matches
            tagged Valid .sx1: sReg2 <= tagged Valid f2(sx1);
            tagged Invalid:    sReg2 <= tagged Invalid;
        endcase

        case (sReg2) matches
            tagged Valid .sx2: outQ.enq(f3(sx2));
        endcase
    endrule

    inQ   sReg1   sReg2   outQ   Rule enabled?
    NE    V       V       NF     Yes
    NE    V       V       F      No
    NE    V       I       NF     Yes
    NE    V       I       F      Yes
    NE    I       V       NF     Yes
    NE    I       V       F      No
    NE    I       I       NF     Yes
    NE    I       I       F      Yes
    E     V       V       NF     Yes
    E     V       V       F      No
    E     V       I       NF     Yes
    E     V       I       F      Yes
    E     I       V       NF     Yes
    E     I       V       F      No
    E     I       I       NF     Yes1
    E     I       I       F      Yes1

Yes1 = yes, but no change.
NE = not empty, E = empty, V = Valid, I = Invalid, NF = not full, F = full.
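The enabling condition can be enumerated mechanically. The sketch below is a Python model that assumes the guard-lifting reading used later in these slides: the implicit guard of outQ.enq matters only on the path where sReg2 is Valid. The function names are illustrative.

```python
from itertools import product

def rule_enabled(sreg2_valid, outq_full):
    """The rule is blocked only when it would enq into a full outQ."""
    return (not sreg2_valid) or (not outq_full)

def enabled_table():
    """All 16 combinations of (inQ, sReg1, sReg2, outQ) with the verdict."""
    rows = []
    for in_q, s1, s2, out_q in product(("NE", "E"), "VI", "VI", ("NF", "F")):
        rows.append((in_q, s1, s2, out_q,
                     rule_enabled(s2 == "V", out_q == "F")))
    return rows
```

Under this assumption neither inQ nor sReg1 affects whether the rule can fire; they only determine what the firing does.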

Folded pipeline

[Figure: x flows inQ -> f -> outQ, with sReg feeding the output of f back to its input and stage selecting the function]

    rule folded_pipeline (True);
        if (stage == 0) begin
            sxIn = inQ.first();
            inQ.deq();
        end
        else
            sxIn = sReg;

        sxOut = f(stage, sxIn);

        if (stage == n-1) outQ.enq(sxOut);
        else sReg <= sxOut;

        stage <= (stage == n-1) ? 0 : stage + 1;
    endrule

- Notice stage is a dynamic parameter now!
- No for-loop.
- Need type declarations for sxIn and sxOut.
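The folded rule can be followed step by step with a small behavioral model. This is a Python sketch under stated assumptions: outQ is never full, the rule simply does not fire when stage is 0 and inQ is empty, and the stage function f used in the note below is made up for illustration.

```python
from collections import deque

def folded_pipeline_step(state, in_q, out_q, f, n):
    """Model one firing of the folded rule.

    state is (stage, s_reg). The same state is returned unchanged when the
    rule cannot fire (stage 0 with an empty inQ).
    """
    stage, s_reg = state
    if stage == 0:
        if not in_q:
            return state                 # implicit guard of inQ.first()/deq()
        sx_in = in_q.popleft()
    else:
        sx_in = s_reg
    sx_out = f(stage, sx_in)
    if stage == n - 1:
        out_q.append(sx_out)             # last stage: result leaves the loop
    else:
        s_reg = sx_out                   # otherwise recirculate through sReg
    return (0 if stage == n - 1 else stage + 1, s_reg)

def run_folded(inputs, f, n, steps):
    """Attempt the rule `steps` times and return what reached outQ."""
    in_q, out_q = deque(inputs), deque()
    state = (0, None)
    for _ in range(steps):
        state = folded_pipeline_step(state, in_q, out_q, f, n)
    return list(out_q)
```

With n = 3 each input occupies the single f for three firings, so two inputs take six firings to produce two outputs.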

Superfolded pipeline: One Bfly-4 case

- f will be invoked for 48 dynamic values of stage
- each invocation will modify 4 numbers in sReg
- after 16 invocations a permutation would be done on the whole sReg

Superfolded pipeline: stage function f

    function Vector#(64, Complex) stage_f
            (Bit#(2) stage, Vector#(64, Complex) stage_in);
    begin
        for (Integer i = 0; i < 16; i = i + 1)
        begin
            Integer idx = i * 4;
            let twid = getTwiddle(stage, fromInteger(i));
            let y = bfly4(twid, stage_in[idx:idx+3]);
            stage_temp[idx]   = y[0];
            stage_temp[idx+1] = y[1];
            stage_temp[idx+2] = y[2];
            stage_temp[idx+3] = y[3];
        end
        //Permutation
        for (Integer i = 0; i < 64; i = i + 1)
            stage_out[i] = stage_temp[permute[i]];
    end
    return(stage_out);

Annotations on the code:
- The argument Bit#(2) stage becomes Bit#(2+4) (stage,i).
- The permutation should be done only when i=15.

Code for the Superfolded pipeline stage function (One Bfly-4 case)

    function Vector#(64, Complex) f
            (Bit#(6) stagei, Vector#(64, Complex) stage_in);
        let i = stagei `mod` 16;
        let idx = 4 * i;   // butterfly i covers elements 4i .. 4i+3
        let twid = getTwiddle(stagei `div` 16, i);
        let y = bfly4(twid, stage_in[idx:idx+3]);

        let stage_temp = stage_in;
        stage_temp[idx]   = y[0];
        stage_temp[idx+1] = y[1];
        stage_temp[idx+2] = y[2];
        stage_temp[idx+3] = y[3];

        let stage_out = stage_temp;
        if (i == 15)
            for (Integer j = 0; j < 64; j = j + 1)
                stage_out[j] = stage_temp[permute[j]];
        return(stage_out);
    endfunction
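The index arithmetic of the superfolded function is easy to check in software. The Python sketch below assumes, as in the loop version of stage_f, that butterfly i covers elements 4i to 4i+3. The arguments bfly4, get_twiddle and permute are hypothetical stand-ins supplied by the caller, not the real IFFT pieces.

```python
def superfolded_f(stagei, stage_in, bfly4, get_twiddle, permute):
    """One invocation: apply a single Bfly-4, permute only when i == 15."""
    i = stagei % 16            # which of the 16 butterflies in this stage
    stage = stagei // 16       # which of the 3 stages
    idx = 4 * i                # butterfly i covers elements idx .. idx+3
    y = bfly4(get_twiddle(stage, i), stage_in[idx:idx + 4])
    stage_temp = list(stage_in)
    stage_temp[idx:idx + 4] = y
    if i == 15:                # end of a stage: permute the whole vector
        return [stage_temp[permute[k]] for k in range(64)]
    return stage_temp

def run_all_invocations(data, bfly4, get_twiddle, permute):
    """Drive the function through all 48 dynamic values of stagei."""
    for stagei in range(48):
        data = superfolded_f(stagei, data, bfly4, get_twiddle, permute)
    return data
```

Each call touches four entries, and the permutation happens on calls 15, 31 and 47, which matches the counts stated for the one Bfly-4 case.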

Folded pipeline: stage function f

The Twiddle constants can be expressed in a table or in a case or nested case expression.

[Figure: stage selects among getTwiddle0, getTwiddle1 and getTwiddle2 to produce twid; twid and sx feed the rest of stage_f, i.e. the Bfly-4s and permutations (shared)]

802.11a Transmitter [MEMOCODE 2006] Dave, Gerding, Pellauer, Arvind

    Design Block     Lines of Code (BSV)   Relative Area
    Controller       49                    0%
    Scrambler        40                    0%
    Conv. Encoder    113                   0%
    Interleaver      76                    1%
    Mapper                                 %
    IFFT             95                    85%
    Cyc. Extender    23                    3%

Complex arithmetic libraries constitute another 200 lines of code.

802.11a Transmitter Synthesis results (Only the IFFT block is changing)

    IFFT Design                 Area (mm²)   Throughput Latency (CLKs/sym)   Min. Freq Required
    Pipelined                                                                MHz
    Combinational                                                            MHz
    Folded (16 Bfly-4s)                                                      MHz
    Super-Folded (8 Bfly-4s)                                                 MHz
    SF (4 Bfly-4s)                                                           MHz
    SF (2 Bfly-4s)                                                           MHz
    SF (1 Bfly-4)                                                            MHz

TSMC .18 micron; numbers reported are before place and route.
The same source code.
All these designs were done in less than 24 hours!

Why are the areas so similar?

- Folding should have given a 3x improvement in IFFT area
- BUT a constant twiddle allows low-level optimization on a Bfly-4 block: a 2.5x area reduction!

Elastic pipeline

Use FIFOs instead of pipeline registers.

[Figure: x flows inQ -> f1 -> fifo1 -> f2 -> fifo2 -> f3 -> outQ]

    rule stage1 (True);
        fifo1.enq(f1(inQ.first()));
        inQ.deq();
    endrule

    rule stage2 (True);
        fifo2.enq(f2(fifo1.first()));
        fifo1.deq();
    endrule

    rule stage3 (True);
        outQ.enq(f3(fifo2.first()));
        fifo2.deq();
    endrule

- Firing conditions?
- Can tokens be left inside the pipeline?
- Easier to write?
- No Maybe types?
- Can all three rules fire concurrently?

What behavior do we want?

[Figure: x flows inQ -> f1 -> fifo1 -> f2 -> fifo2 -> f3 -> outQ]

Maximize concurrency: fire the maximum number of rules.

    inQ   fifo1    fifo2    outQ   rule1   rule2   rule3
    NE    NE,NF    NE,NF    NF     Yes     Yes     Yes
    NE    NE,NF    NE,NF    F      Yes     Yes     No
    NE    NE,NF    NE,F     NF     Yes     No      Yes
    NE    NE,NF    NE,F     F      Yes     No      No
    ...

FIFOs must permit concurrent enq and deq for the three rules to fire concurrently.
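The rows of this table follow from one condition per rule: its source FIFO is not empty and its destination FIFO is not full. A minimal Python sketch of that condition, with function and parameter names chosen here for illustration:

```python
def can_fire(src_not_empty, dst_not_full):
    """A stage rule needs something to dequeue and room to enqueue."""
    return src_not_empty and dst_not_full

def desired_firing(in_q_not_empty, fifo1, fifo2, out_q_not_full):
    """fifo1 and fifo2 are (not_empty, not_full) pairs.

    Returns the desired (rule1, rule2, rule3) firing decisions.
    """
    return (
        can_fire(in_q_not_empty, fifo1[1]),   # stage1: inQ   -> fifo1
        can_fire(fifo1[0], fifo2[1]),         # stage2: fifo1 -> fifo2
        can_fire(fifo2[0], out_q_not_full),   # stage3: fifo2 -> outQ
    )
```

Note that a state such as NE,NF can only exist in a FIFO with more than one element, or in one that allows enq and deq in the same cycle.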

One-Element FIFO

    module mkFIFO1 (FIFO#(t));
        Reg#(t)    data <- mkRegU();
        Reg#(Bool) full <- mkReg(False);

        method Action enq(t x) if (!full);
            full <= True;
            data <= x;
        endmethod

        method Action deq() if (full);
            full <= False;
        endmethod

        method t first() if (full);
            return (data);
        endmethod

        method Action clear();
            full <= False;
        endmethod
    endmodule

[Figure: FIFO module with n-bit data in and out; enq has rdy (not full) and enab signals, deq has rdy (not empty) and enab signals]

enq and deq cannot even be enabled together, much less fire concurrently!

More on FIFOs in the next lecture.
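The mutual exclusion of the two guards is visible in a direct software transcription. The class below is a Python behavioral sketch of mkFIFO1, not generated from the BSV. The can_enq and can_deq methods are names introduced here to expose the method guards.

```python
class FIFO1:
    """One-element FIFO model: a data slot plus a full flag."""

    def __init__(self):
        self.data = None
        self.full = False

    def can_enq(self):
        return not self.full          # guard of enq

    def can_deq(self):
        return self.full              # guard of deq and first

    def enq(self, x):
        if not self.can_enq():
            raise RuntimeError("enq guard is false: FIFO is full")
        self.data, self.full = x, True

    def first(self):
        if not self.can_deq():
            raise RuntimeError("first guard is false: FIFO is empty")
        return self.data

    def deq(self):
        if not self.can_deq():
            raise RuntimeError("deq guard is false: FIFO is empty")
        self.full = False

    def clear(self):
        self.full = False
```

Because can_enq is the negation of can_deq, there is no state in which both are true.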

Concurrency when the FIFOs do not permit concurrent enq and deq

[Figure: x flows inQ -> f1 -> fifo1 -> f2 -> fifo2 -> f3 -> outQ; a stage can fire only when its input FIFO is not empty & its output FIFO is not full]

At best, alternate stages in the pipeline will be able to fire concurrently.
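The alternating pattern shows up in a short simulation. This Python sketch assumes one-element FIFOs for fifo1 and fifo2, an inQ that is never empty, an outQ that is never full, and that every enabled rule fires each cycle. Only the full flags are tracked.

```python
def simulate(cycles):
    """Return, per cycle, which of (stage1, stage2, stage3) fired."""
    full1 = full2 = False                  # occupancy of fifo1 and fifo2
    trace = []
    for _ in range(cycles):
        # Guards are evaluated on the state at the start of the cycle.
        s1 = not full1                     # room in fifo1
        s2 = full1 and not full2           # data in fifo1, room in fifo2
        s3 = full2                         # data in fifo2
        trace.append((s1, s2, s3))
        # Adjacent stages are never enabled together, so updates do not clash.
        if s1:
            full1 = True
        if s2:
            full1, full2 = False, True
        if s3:
            full2 = False
    return trace
```

After the pipeline fills, stage1 and stage3 fire together on one cycle and stage2 fires alone on the next.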

Inelastic vs Elastic Pipelines

In an Inelastic pipeline:
- typically only one rule; the designer controls precisely which activities go on in parallel
- downside: the rule can get too complicated; easy to make a mistake, difficult to make changes

In an Elastic pipeline:
- several smaller rules, each easy to write, easier to make changes
- downside: sometimes rules do not fire concurrently when they should

The tension

- If multiple rules never fire in the same cycle, then the machine can hardly be called a pipelined machine.
- If all rules fire in parallel every cycle when they are enabled, then, in general, wrong results can be produced.

More on this in the next lecture.

Language notes

- Pattern matching syntax
- Vector syntax
- Implicit conditions
- Static vs dynamic expression

Pattern-matching: A convenient way to extract data structure components

    typedef union tagged {
        void Invalid;
        t    Valid;
    } Maybe#(type t);

    case (m) matches
        tagged Invalid  : return 0;
        tagged Valid .x : return x;
    endcase

x will get bound to the appropriate part of m.

    if (m matches tagged Valid .x &&& (x > 10))

The &&& is a conjunction, and allows pattern-variables to come into scope from left to right.

Syntax: Vector of Registers

Register
- Suppose x and y are both of type Reg. Then x <= y means x._write(y._read())

Vector of Int
- x[i] means sel(x,i)
- x[i] = y[j] means x = update(x,i, sel(y,j))

Vector of Registers
- x[i] <= y[j] does not work. The parser thinks it means (sel(x,i)._read)._write(sel(y,j)._read), which will not type check
- (x[i]) <= y[j] parses as sel(x,i)._write(sel(y,j)._read), and works correctly

Don't ask me why.

Making guards explicit

    rule recirculate (True);
        if (p) fifo.enq(8);
        r <= 7;
    endrule

With the guard made explicit, where enq_G is the guard of enq and enq_B is its body:

    rule recirculate ((p && fifo.enq_G) || !p);
        if (p) fifo.enq_B(8);
        r <= 7;
    endrule

Effectively, all implicit conditions (guards) are lifted and conjoined to the rule guard.

Implicit guards (conditions)

Rule:

    rule <name> (<guard>); <action>; endrule

where

    <action> ::= r <= <exp>
               | m.g(<exp>)
               | if (<exp>) <action> endif
               | <action> ; <action>

Make implicit guards explicit:

    m.g_B(<exp>) when m.g_G

Guards vs If's

A guard on one action of a parallel group of actions affects every action within the group:

    (a1 when p1); (a2 when p2)  ==>  (a1; a2) when (p1 && p2)

A condition of a conditional action only affects the actions within the scope of the conditional action:

    (if (p1) a1); a2

p1 has no effect on a2 ...

Mixing ifs and whens:

    (if (p) (a1 when q)); a2  ==>  ((if (p) a1); a2) when ((p && q) || !p)
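The lifting rules above can be written as a small recursive function. The sketch below is in Python, and the tuple encoding of actions ("prim", "when", "if", "par") is an assumption made for this illustration, not BSV syntax or a compiler data structure.

```python
def lifted_guard(action, env):
    """Evaluate the guard that lifting would attach to the whole action.

    action is one of:
      ("prim", name)            a primitive action with no guard
      ("when", action, g)       action guarded by the boolean named g
      ("if", p, action)         action performed only if the boolean p holds
      ("par", a1, a2, ...)      actions composed in parallel
    env maps boolean names to True or False.
    """
    kind = action[0]
    if kind == "prim":
        return True
    if kind == "when":
        _, body, g = action
        return env[g] and lifted_guard(body, env)
    if kind == "if":
        _, p, body = action
        # The body's guard matters only when the condition holds.
        return (not env[p]) or lifted_guard(body, env)
    if kind == "par":
        return all(lifted_guard(a, env) for a in action[1:])
    raise ValueError("unknown action kind: " + str(kind))
```

For the mixed example, the function evaluates to (p && q) || !p for every choice of p and q, which is the guard shown on the right-hand side.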

Static vs dynamic expressions

- Expressions that can be evaluated at compile time will be evaluated at compile time: 3+4 → 7
- Some expressions do not have run-time representations and must be evaluated away at compile time; an error will occur if the compile-time evaluation does not succeed: Integers, reals, loops, lists, functions, ...

Generalization: n-stage pipeline

[Figure: x flows inQ -> f(0) -> sReg[0] -> f(1) -> sReg[1] -> f(2) -> ... -> sReg[n-2] -> f(n-1) -> outQ]

    rule sync_pipeline (True);
        if (inQ.notEmpty()) begin
            sReg[0] <= tagged Valid f(0, inQ.first());
            inQ.deq();
        end
        else
            sReg[0] <= tagged Invalid;

        for (Integer i = 1; i < n-1; i = i + 1) begin
            case (sReg[i-1]) matches
                tagged Valid .sx: sReg[i] <= tagged Valid f(i, sx);
                tagged Invalid:   sReg[i] <= tagged Invalid;
            endcase
        end

        case (sReg[n-2]) matches
            tagged Valid .sx: outQ.enq(f(n-1, sx));
        endcase
    endrule
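A behavioral model makes the stage indexing easy to check. The Python sketch below follows the figure's labeling, in which stage i applies f(i) and the value enters through f(0). None stands in for tagged Invalid, outQ is assumed never to be full, and n is assumed to be at least 2.

```python
from collections import deque

def n_stage_step(in_q, s_reg, out_q, f, n):
    """Model one firing; s_reg has n-1 entries. Returns the new registers."""
    # The last stage reads the old sReg[n-2].
    if s_reg[n - 2] is not None:
        out_q.append(f(n - 1, s_reg[n - 2]))
    new = [None] * (n - 1)
    # The first stage reads inQ.
    new[0] = f(0, in_q.popleft()) if in_q else None
    # Each middle stage i reads the old sReg[i-1].
    for i in range(1, n - 1):
        if s_reg[i - 1] is not None:
            new[i] = f(i, s_reg[i - 1])
    return new

def run_n_stage(inputs, f, n, cycles):
    """Fire the rule `cycles` times and return what reached outQ."""
    in_q, out_q = deque(inputs), deque()
    s_reg = [None] * (n - 1)
    for _ in range(cycles):
        s_reg = n_stage_step(in_q, s_reg, out_q, f, n)
    return list(out_q)
```

With a stage function that appends its stage number as a digit, each output records that it passed through stages 0 to n-1 in order.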

Next lecture: Concurrency analysis