Pipelining combinational circuits Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology February 20, 2013http://csg.csail.mit.edu/6.375L05-1.

Slides:



Advertisements
Similar presentations
BSV execution model and concurrent rule scheduling Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology February.
Advertisements

Elastic Pipelines and Basics of Multi-rule Systems Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February.
An EHR based methodology for Concurrency management Arvind (with Asif Khan) Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.
March, 2007http://csg.csail.mit.edu/arvind802.11a-1 Architectural Exploration: Area-Performance tradeoff in a Transmitter Arvind Computer Science.
Constructive Computer Architecture: Multirule systems and Concurrent Execution of Rules Arvind Computer Science & Artificial Intelligence Lab. Massachusetts.
March 2007http://csg.csail.mit.edu/arvindSemantics-1 Scheduling Primitives for Bluespec Arvind Computer Science & Artificial Intelligence Lab Massachusetts.
Folding and Pipelining complex combinational circuits Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February.
Constructive Computer Architecture: FIFO Lab Comments Revisiting CF FIFOs Andy Wright TA October 20, 2014http://csg.csail.mit.edu/6.175L14-1.
Asynchronous Pipelines: Concurrency Issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology October 13, 2009http://csg.csail.mit.edu/koreaL12-1.
Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 22, 2011L07-1
February 21, 2007http://csg.csail.mit.edu/6.375/L07-1 Bluespec-4: Architectural exploration using IP lookup Arvind Computer Science & Artificial Intelligence.
March, 2007http://csg.csail.mit.edu/arvindIPlookup-1 IP Lookup Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.
September 24, L08-1 IP Lookup: Some subtle concurrency issues Arvind Computer Science & Artificial Intelligence Lab.
IP Lookup: Some subtle concurrency issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 22, 2011L06-1.
December 10, 2009 L29-1 The Semantics of Bluespec Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute.
December 12, 2006http://csg.csail.mit.edu/6.827/L24-1 Scheduling Primitives for Bluespec Arvind Computer Science & Artificial Intelligence Lab Massachusetts.
September 3, 2009L02-1http://csg.csail.mit.edu/korea Introduction to Bluespec: A new methodology for designing Hardware Arvind Computer Science & Artificial.
Introduction to Bluespec: A new methodology for designing Hardware Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.
Folded Combinational Circuits as an example of Sequential Circuits Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.
Constructive Computer Architecture: Guards Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology September 24, 2014.
September 22, 2009http://csg.csail.mit.edu/koreaL07-1 Asynchronous Pipelines: Concurrency Issues Arvind Computer Science & Artificial Intelligence Lab.
Constructive Computer Architecture Sequential Circuits Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology
Elastic Pipelines: Concurrency Issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 28, 2011L08-1http://csg.csail.mit.edu/6.375.
Folding complex combinational circuits to save area Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February.
Introduction to Bluespec: A new methodology for designing Hardware Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.
Constructive Computer Architecture Sequential Circuits - 2 Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.
February 20, 2009http://csg.csail.mit.edu/6.375L08-1 Asynchronous Pipelines: Concurrency Issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts.
Introduction to Bluespec: A new methodology for designing Hardware Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.
Simple Inelastic and Folded Pipelines Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 14, 2011L04-1.
Computer Architecture: A Constructive Approach Pipelining combinational circuits Teacher: Yoav Etsion Taken (with permission) from Arvind et al.*, Massachusetts.
Multiple Clock Domains (MCD) Arvind with Nirav Dave Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.
Combinational Circuits in Bluespec Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 9, 2011L03-1
Computer Architecture: A Constructive Approach Bluespec execution model and concurrent rule scheduling Teacher: Yoav Etsion Taken (with permission) from.
October 20, 2009L14-1http://csg.csail.mit.edu/korea Concurrency and Modularity Issues in Processor pipelines Arvind Computer Science & Artificial Intelligence.
Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology March 1, 2010
Bluespec-3: A non-pipelined processor Arvind
Folded Combinational Circuits as an example of Sequential Circuits
Folded “Combinational” circuits
Sequential Circuits Constructive Computer Architecture Arvind
Architectural Exploration:
Sequential Circuits: Constructive Computer Architecture
Combinational Circuits in Bluespec
Combinational Circuits in Bluespec
Pipelining combinational circuits
Multirule Systems and Concurrent Execution of Rules
Constructive Computer Architecture: Guards
Sequential Circuits Constructive Computer Architecture Arvind
Pipelining combinational circuits
EHR: Ephemeral History Register
Constructive Computer Architecture: Well formed BSV programs
Combinational Circuits and Simple Synchronous Pipelines
Combinational Circuits and Simple Synchronous Pipelines
Modules with Guarded Interfaces
Pipelining combinational circuits
Sequential Circuits - 2 Constructive Computer Architecture Arvind
Bluespec-3: A non-pipelined processor Arvind
Multirule systems and Concurrent Execution of Rules
Modular Refinement - 2 Arvind
Combinational Circuits in Bluespec
Elastic Pipelines and Basics of Multi-rule Systems
Constructive Computer Architecture: Guards
Elastic Pipelines and Basics of Multi-rule Systems
Simple Synchronous Pipelines
Multirule systems and Concurrent Execution of Rules
Modeling Processors Arvind
Modular Refinement Arvind
Pipelining combinational circuits
Constructive Computer Architecture: Well formed BSV programs
Architectural Exploration:
Simple Synchronous Pipelines
Presentation transcript:

Pipelining combinational circuits Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology February 20, 2013http://csg.csail.mit.edu/6.375L05-1

Combinational IFFT Lot of area and long combinational delay Folded or multi-cycle version can save area and reduce the combinational delay but throughput per clock cycle gets worse Pipelining: a method to increase the circuit throughput by evaluating multiple IFFTs in0 … in1 in2 in63 in3 in4 Bfly4 x16 Bfly4 … … out0 … out1 out2 out63 out3 out4 Permute IFFT i IFFT i-1 IFFT i+1 3 different datasets in the pipeline February 20,

Inelastic vs Elastic pipeline xx  fifo1  inQ f1f2f3  fifo2  outQ x sReg1inQ f0f1f2 sReg2outQ Inelastic: all pipeline stages move synchronously Elastic: A pipeline stage can process data if its input FIFO is not empty and output FIFO is not Full Most complex processor pipelines are a combination of the two styles February 20,

Inelastic vs Elastic Pipelines Inelastic pipeline: typically only one rule; the designer controls precisely which activities go on in parallel downside: The rule can get too complicated -- easy to make mistakes; difficult to make changes Elastic pipeline: several smaller rules, each easy to write, easier to make changes downside: sometimes rules do not fire concurrently when they should February 20,

Inelastic pipeline x sReg1inQ f0f1f2 sReg2outQ rule sync-pipeline (True); inQ.deq(); sReg1 <= f0(inQ.first()); sReg2 <= f1(sReg1); outQ.enq(f2(sReg2)); endrule This is real IFFT code; just replace f0, f1 and f2 with stage_f code This rule can fire only if Atomicity: Either all or none of the state elements inQ, outQ, sReg1 and sReg2 will be updated - inQ has an element - outQ has space February 20,

Inelastic pipeline Making implicit guard conditions explicit x sReg1inQ f0f1f2 sReg2outQ rule sync-pipeline (!inQ.empty() && !outQ.full); inQ.deq(); sReg1 <= f0(inQ.first()); sReg2 <= f1(sReg1); outQ.enq(f2(sReg2)); endrule Suppose sReg1 and sReg2 have data, outQ is not full but inQ is empty. What behavior do you expect? Leave green and red data in the pipeline? February 20,

Pipeline bubbles x sReg1inQ f0f1f2 sReg2outQ rule sync-pipeline (True); inQ.deq(); sReg1 <= f0(inQ.first()); sReg2 <= f1(sReg1); outQ.enq(f2(sReg2)); endrule Red and Green tokens must move even if there is nothing in inQ! Modify the rule to deal with these conditions Also if there is no token in sReg2 then nothing should be enqueued in the outQ Valid bits or the Maybe type February 20,

Explicit encoding of Valid/Invalid data rule sync-pipeline (True); if (inQ.notEmpty()) begin sReg1 <= f0(inQ.first()); inQ.deq(); sReg1f <= Valid end else sReg1f <= Invalid; sReg2 <= f1(sReg1); sReg2f <= sReg1f; if (sReg2f == Valid) outQ.enq(f2(sReg2)); endrule sReg1 sReg2 x inQ f0f1f2 outQ Valid/Invalid February 20, typedef union tagged {void Valid; void Invalid; } Validbit deriving (Eq, Bits);

When is this rule enabled? inQ sReg1f sReg2foutQ NEVVNF NEVVF NEVINF NEVIF NEIVNF NEIVF NEIINF NEIIF EVVNF EVVF EVINF EVIF EIVNF EIVF EIINF EIIF yes No Yes No Yes yes No Yes No Yes1 yes Yes1 = yes but no change inQ sReg1f sReg2foutQ rule sync-pipeline (True); if (inQ.notEmpty()) begin sReg1 <= f0(inQ.first()); inQ.deq(); sReg1f <= Valid end else sReg1f <= Invalid; sReg2 <= f1(sReg1); sReg2f <= sReg1f; if (sReg2f == Valid) outQ.enq(f2(sReg2)); endrule sReg1 sReg2 inQ f0 f1f2 outQ February 20,

The Maybe type A useful type to capture valid/invalid data typedef union tagged { void Invalid; data_T Valid; } Maybe#(type data_T); data valid/invalid Registers contain Maybe type values Some useful functions on Maybe type: isValid(x) returns true if x is Valid fromMaybe(d,x) returns the data value in x if x is Valid the default value d if x is Invalid February 20,

Using the Maybe type typedef union tagged { void Invalid; data_T Valid; } Maybe#(type data_T); data valid/invalid Registers contain Maybe type values rule sync-pipeline if (True); if (inQ.notEmpty()) begin sReg1 <= Valid f0(inQ.first()); inQ.deq(); end else sReg1 <= Invalid; sReg2 <= isValid(sReg1)? Valid f1(fromMaybe(d, sReg1)) : Invalid; if isValid(sReg2) outQ.enq(f2(fromMaybe(d, sReg2))); endrule February 20,

Pattern-matching: An alternative syntax to extract datastructure components The &&& is a conjunction, and allows pattern-variables to come into scope from left to right case (m) matches tagged Invalid : return 0; tagged Valid.x : return x; endcase if (m matches (Valid.x) &&& (x > 10)) typedef union tagged { void Invalid; data_T Valid; } Maybe#(type data_T); x will get bound to the appropriate part of m February 20,

The Maybe type data in the pipeline typedef union tagged { void Invalid; data_T Valid; } Maybe#(type data_T); data valid/invalid Registers contain Maybe type values rule sync-pipeline if (True); if (inQ.notEmpty()) begin sReg1 <= Valid (f0(inQ.first())); inQ.deq(); end else sReg1 <= Invalid; case (sReg1) matches tagged Valid.sx1: sReg2 <= Valid f1(sx1); tagged Invalid: sReg2 <= Invalid; endcase case (sReg2) matches tagged Valid.sx2: outQ.enq(f2(sx2)); endcase endrule sx1 will get bound to the appropriate part of sReg1 February 20,

Generalization: n-stage pipeline rule sync-pipeline (True); if (inQ.notEmpty()) begin sReg[0]<= Valid f(1,inQ.first());inQ.deq();end else sReg[0]<= Invalid; for(Integer i = 1; i < n-1; i=i+1) begin case (sReg[i-1]) matches tagged Valid.sx: sReg[i] <= Valid f(i-1,sx); tagged Invalid: sReg[i] <= Invalid; endcase end case (sReg[n-2]) matches tagged Valid.sx: outQ.enq(f(n-1,sx)); endcase endrule sReg[0]inQsReg[1]outQ x f(0)f(1) f(2) f(n-1)... sReg[n-2] February 20,

Elastic pipeline Use FIFOs instead of pipeline registers xx  fifo1  inQ f1f2f3  fifo2  outQ rule stage1 if (True);  fifo1.enq(f1(inQ.first());  inQ.deq();endrule rule stage2 if (True); fifo2.enq(f2(fifo1.first());  fifo1.deq();endrule rule stage3 if (True);  outQ.enq(f3(fifo2.first());  fifo2.deq();endrule What is the firing condition for each rule? Can tokens be left inside the pipeline? No need for Maybe types February 20,

Firing conditions for reach rule xx  fifo1  inQ f1f2f3  fifo2  outQ inQ fifo1 fifo2outQ NENE,NFNE,NFNF NENE,NFNE,NFF NENE,NFNE,FNF NENE,NFNE,FF …. YesYesYes YesYesNo YesNoYes YesNoNo …. rule1 rule2 rule3 This is the first example we have seen where multiple rules may be ready to execute concurrently Can we execute multiple rules together? February 20,

Informal analysis xx  fifo1  inQ f1f2f3  fifo2  outQ inQ fifo1 fifo2outQ NENE,NFNE,NFNF NENE,NFNE,NFF NENE,NFNE,FNF NENE,NFNE,FF …. YesYesYes YesYesNo YesNoYes YesNoNo …. rule1 rule2 rule3 FIFOs must permit concurrent enq and deq for all three rules to fire concurrently February 20,

Concurrency when the FIFOs do not permit concurrent enq and deq xx fifo1inQ f1f2f3 fifo2outQ not empty  & not full not empty  & not full At best alternate stages in the pipeline will be able to fire concurrently February 20,

Pipelined designs expressed using Multiple rules If rules for different pipeline stages never fire in the same cycle then the design can hardly be called a pipelined design If all the enabled rules fire in parallel every cycle then, in general, wrong results can be produced We need a clean model for concurrent firing of rules February 20,

BSV Rule Execution A BSV program consists of state elements and rules, aka, Guarded Atomic Actions (GAA) that operate on the state elements Application of a rule modifies some state elements of the system in a deterministic manner current state next state values next state computation f x f x guard reg en’s nextState AND February 20,

BSV Execution Model Repeatedly: Select a rule to execute Compute the state updates Make the state updates Highly non- deterministic User annotations can help in rule selection February 20,

One-rule-at-time-semantics The legal behavior of a BSV program can always be explained by observing the state updates obtained by applying only one rule at a time Implementation concern: Schedule multiple rules concurrently without violating one-rule-at-a-time semantics February 20,

Guard lifting For concurrent scheduling of rules, we only need to consider those rules that can be concurrently enabled, i.e., whose guards are true In order to understand when a rule can be enabled, we need to understand precisely how implicit guards are lifted precisely to form the rule guard February 20, 2013 L05-23http://csg.csail.mit.edu/6.375

Making guards explicit rule foo if (True); if (p) fifo.enq(8); r <= 7; endrule rule foo if ((p && fifo.notFull) || !p); if (p) fifo.enq(8); r <= 7; endrule Effectively, all implicit conditions (guards) are lifted and conjoined to the rule guard February 20,

Implicit guards (conditions) rule if ( ); ; endrule make implicit guards explicit m.g B ( ) when m.g G ::= r | if ( ) | ; | m.g( ) | t = ::= r | if ( ) | when ( ) | ; | m.g B ( ) | t = February 20,

Guards vs If’s A guard on one action of a parallel group of actions affects every action within the group (a1 when p1); a2 ==> (a1; a2) when p1 A condition of a Conditional action only affects the actions within the scope of the conditional action (if (p1) a1); a2 p1 has no effect on a2... Mixing ifs and whens (if (p) (a1 when q)) ; a2  ((if (p) a1); a2) when ((p && q) | !p)  ((if (p) a1); a2) when (q | !p) February 20,

Guard Lifting rules All the guards can be “lifted” to the top of a rule (a1 when p) ; a2 a1 ; (a2 when p) if (p when q) a if (p) (a when q) (a when p1) when p2 x <= (e when p) similarly for expressions... Rule r (a when p)  Can you prove that using these rules all the guards will be lifted to the top of an action?  (a1 ; a2) when p  (if (p) a) when q  (if (p) a) when (q | !p)  a when (p1 & p2)  (x <= e) when p  Rule r (if (p) a) February 20,

BSV provides a primitive (impCondOf) to make guards explicit and lift them to the top From now on in concurrency discussions we will assume that all guards have been lifted to the top in every rule February 20, 2013http://csg.csail.mit.edu/6.375L05-28