Implementing for Correct Concurrency Nirav Dave Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology March 9, 2011 http://csg.csail.mit.edu/6.375 http://csg.csail.mit.edu/6.375
Dealing with Conflicts When do conflicts arise? How do we Analyze them? How do we fix them? How do we make sure we’re okay? http://csg.csail.mit.edu/6.375 March 9, 2011
SFIFO n = # of bits needed to represent the values of type “t“ interface SFIFO#(type t, type tr, type v); method Action enq(t); // enqueue an item method Action deq(); // remove oldest entry method t first(); // inspect oldest item method Action clear(); // make FIFO empty method Maybe#(v) find(tr); // search FIFO endinterface n enab n = # of bits needed to represent the values of type “t“ m = # of bits needed values of type “tr“ v = # of bits needed values of type “v“ enq rdy not full enab rdy deq SFIFO module not empty n first not empty rdy enab clear V m bool find http://csg.csail.mit.edu/6.375 March 9, 2011 3
Processor Example CPU pc rf fetch decode execute memory write- back iMem dMem CPU 5 – stage Processor. 1 element FIFOs in between stages Let’s add bypassing http://csg.csail.mit.edu/6.375 March 9, 2011 4
Search through each place in design Decode Rule Decode is also correct correct anytime it’s allowed to execute rule decode (!newStallFunc(instr, d2eQ, e2mQ, m2wQ)); let fetInst = f2dQ.first(); f2dQ.deq(); match {.ra, .rb} = getRARB(fetInst); let va0 = rf[ra]; let va1 = fromMaybe (m2wQ.find(ra), va0); let va2 = fromMaybe (e2mQ.find(ra), va1); let vb0 = rf[rb]; let vb1 = fromMaybe (m2wQ.find(rb), vb0); let vb2 = fromMaybe (e2mQ.find(rb), vb1); let newInst = case (fetInst) match Add: return (DAdd .va2 .vb2); … endcase; d2eQ.enq(newInst); endrule Search through each place in design When do we want it to execute? http://csg.csail.mit.edu/6.375 March 9, 2011
some insight into Concurrent rule firing Rules Ri Rj Rk rule steps Rj HW Rk clocks Ri There are more intermediate states in the rule semantics (a state after each rule step) In the HW, states change only at clock edges http://csg.csail.mit.edu/6.375 http://csg.csail.mit.edu/6.375 March 9, 2011 6
Parallel execution reorders reads and writes Rules rule steps reads writes reads writes reads writes reads writes reads writes reads writes reads writes clocks HW In the rule semantics, each rule sees (reads) the effects (writes) of previous rules In the HW, rules only see the effects from previous clocks, and only affect subsequent clocks http://csg.csail.mit.edu/6.375 http://csg.csail.mit.edu/6.375 March 9, 2011 7
Correctness Rules Ri Rj Rk rule steps Rj HW Rk clocks Ri Rules are allowed to fire in parallel only if the net state change is equivalent to sequential rule execution Consequence: the HW can never reach a state unexpected in the rule semantics http://csg.csail.mit.edu/6.375 http://csg.csail.mit.edu/6.375 March 9, 2011 8
Upshot Given the concurrency of method/rules in a system we can determine viable schedules Some variation do to applicability BUT we know what schedule we want (mostly) We should be able to back propagate results to submodules http://csg.csail.mit.edu/6.375 March 9, 2011
Determining Concurrency Properties http://csg.csail.mit.edu/6.375 March 9, 2011
Processor: Concurrencies pc rf fetch decode execute memory write- back iMem dMem CPU In-order: F < D < E < M < W Pipelined W < M < E < D < F http://csg.csail.mit.edu/6.375 http://csg.csail.mit.edu/6.375 March 9, 2011 11
Concurrency requirements for Full Pipelining – Reg File fetch execute imem rf CPU decode memory pc write- back dMem In-Order RF: (D calls sub) < (W calls upd) Pipelined RF: (W calls upd) < (D calls sub) http://csg.csail.mit.edu/6.375 March 9, 2011
Concurrency requirements for Full Pipelining – FIFOs fetch execute imem rf CPU decode memory pc write- back dMem In-Order FIFOs: 1. m2wQ, e2mQ: find < enq < first < deq 2. d2eQ: find < enq < first < deq, clear Pipeline FIFOs: 3. m2wQ, e2mQ : first < deq < enq < find 4. d2eQ : first < deq < find < enq http://csg.csail.mit.edu/6.375 March 9, 2011
Constructing Appropriately concurrent submodules http://csg.csail.mit.edu/6.375 March 9, 2011
From Analysis to Design We need to create modules which behave as needed Construct modules using “unsafe” primitives to have “safe” behaviors Three major concepts: Use primitives which remove “false” concurrency orderings (e.g. ConfigRegs vs. Regs) Add RWires for forwarding values intra-cycle Reason carefully to assure that execution appears “atomic” http://csg.csail.mit.edu/6.375 March 9, 2011
ConfigReg and RWire mkConfigReg is a Reg without this restriction mkReg requires that read < write Allows us to read stale values (dangerous) RWire is a “wire” wset :: a -> Action writes wget :: Maybe#(a) returns written value if read happened. wset happens before wget each cycle http://csg.csail.mit.edu/6.375 March 9, 2011
Let’s implement some modules http://csg.csail.mit.edu/6.375 March 9, 2011
Processor Redux In-order: F < D < E < M < W pc rf fetch decode execute memory write- back iMem dMem CPU In-order: F < D < E < M < W Pipelined W < M < E < D < F http://csg.csail.mit.edu/6.375 http://csg.csail.mit.edu/6.375 March 9, 2011 18
Concurrency: RegFile The standard library regfile is implemented using with concurrency (sub < upd) This handles the in-order case We need to build a RegisterFile for the pipelined case http://csg.csail.mit.edu/6.375 March 9, 2011
BypassRegFile module mkBypassRegFile(RegFile#(a,d)) #(d l, d h) provisos#(Bits(a,asz), Bits#(d,dsz)); RegFile#(a,d) rfInt <- mkRegFileWCF(l,h); RWire#(Tuple2#(a,d)) curWrite <- mkRWire(); method Action upd(a x, d v); rfInternal.upd(x,v); curWrite.wset(tuple2(x,v)); endmethod method d sub(a x); case (curWrite.wget()) matches tagged Valid {.wa, .wd} &&& wa == a: return wd; default: return rfInternal.sub(a); endcase endmethod endmodule http://csg.csail.mit.edu/6.375 March 9, 2011
Processor Redux In-order: F < D < E < M < W pc rf fetch decode execute memory write- back iMem dMem CPU In-order: F < D < E < M < W Pipelined W < M < E < D < F http://csg.csail.mit.edu/6.375 http://csg.csail.mit.edu/6.375 March 9, 2011 21
One Element SFIFO (Naïve) module mkSFIFO1#(function Maybe#(v) findf(tr r, t x)) (SFIFO#(t,tr,v)); Reg#(t) data <- mkRegU(); Reg#(Bool) full <- mkReg(False); method Action enq(t x) if (!full); full <= True; data <= x; endmethod method Action deq() if (full); full <= False; method t first() if (full); return (data); method Maybe#(v) find(tr r); return (full ? findf(r, data): Nothing); endmethod endmodule Concurrency: find < first < (enq C deq) http://csg.csail.mit.edu/6.375 http://csg.csail.mit.edu/6.375 March 9, 2011 22
One Element SFIFO (In-Order d2eQ #1) find < first < enq < deq module mkSFIFO1#(function Maybe#(v) findf(tr r, t x)) (SFIFO#(t,tr,v)); Reg#(t) data <- mkConfigRegU(); Reg#(Bool) full <- mkConfigReg(False); RWire#(t) enqv <- mkRWire(); method Action enq(t x) if (!full); full <= True; data <= x; enqv.wset(x); endmethod method Action deq() if (full || isValid(enqv.wget())); full <= False; method t first() if (full); return data; method Maybe#(v) find(tr r); return full ? findf(r,data): Nothing; endmodule http://csg.csail.mit.edu/6.375 http://csg.csail.mit.edu/6.375 March 9, 2011 23
One Element SFIFO (In-Order e2mQ, m2wQ #2) find < enq < first < deq One Element SFIFO (In-Order e2mQ, m2wQ #2) module mkSFIFO1#(function Bool findf(tr r, t x)) (SFIFO#(t,tr)); Reg#(t) data <- mkRegU(); Reg#(Bool) full <- mkConfigReg(False); RWire#(t) enqv <- mkRWire(); method Action enq(t x) if (!full); full <= True; data <= x; enqv.wset(x); endmethod method Action deq() if (full || isValid(enqv.wget())); full <= False; method t first() if (full || isValid(enqv.wget())); return (fromMaybe(enqv.wget(), data)); method Maybe#(v) find(tr r); return full ? findf(r,data): Nothing; endmodule http://csg.csail.mit.edu/6.375 http://csg.csail.mit.edu/6.375 March 9, 2011 24
One Element Searchable SFIFO (Pipelined #3) first < deq < enq < find module mkSFIFO1#(function Bool findf(tr r, t x)) (SFIFO#(t,tr)); Reg#(t) data <- mkConfigRegU(); Reg#(Bool) full <- mkConfigReg(False); RWire#(void) deqw <- mkRWire(); RWire#(void) enqw <- mkRWire(); method Action enq(t x) if (!full || isValid(deqw.wget()); full <= True; data <= x; enqw.wset(x); endmethod method Action deq() if (full); full <= False; deqw.wset(?); method t first() if (full); return (data); method Maybe#(v) find(tr r); return (full&&!isValid(deqw.wget()) ? findf(r,data) : isValid(enqw.wget()) ? findf(r, fromMaybe(enqw.wget(),?)): Nothing; endmethod endmodule http://csg.csail.mit.edu/6.375 http://csg.csail.mit.edu/6.375 March 9, 2011 25
One Element Searchable SFIFO (Pipelined #4) first < deq < find < enq module mkSFIFO1#(function Bool findf(tr r, t x)) (SFIFO#(t,tr)); Reg#(t) data <- mkConfigRegU(); Reg#(Bool) full <- mkConfigReg(False); RWire#(void) deqw <- mkRWire(); method Action enq(t x) if (!full || isValid(deqw.wget()); full <= True; data <= x; endmethod method Action deq() if (full); full <= False; deqw.wset(?); method t first() if (full); return (data); method Maybe#(v) find(tr r); return (full&&!isValid(deqw.wget()) ? findf(r, data): Nothing; endmethod endmodule http://csg.csail.mit.edu/6.375 http://csg.csail.mit.edu/6.375 March 9, 2011 26
One Element Searchable SFIFO (Pipelined #4) first < deq < find < enq module mkSFIFO1#(function Bool findf(tr r, t x)) (SFIFO#(t,tr)); Reg#(t) data <- mkRegU(); Reg#(Bool) full <- mkConfigReg(False); RWire#(void) deqEN <- mkRWire(); Bool deqp = isValid (deqEN.wget())); method Action enq(t x) if (!full|| deqp); full <= True; data <= x; 12endmethod method Action deq() if (full); full <= False; deqEN.wset(?); endmethod method t first() if (full); return (data); method Maybe#(v) find(tr r); return (full&&!deqp) ? findf(r, data): Nothing; endmethod endmodule http://csg.csail.mit.edu/6.375 http://csg.csail.mit.edu/6.375 March 9, 2011 27
Up-Down Counter http://csg.csail.mit.edu/6.375 March 9, 2011
Counter Module Interface interface Counter method Action up(); method Action down(); method Bit#(32) _read(); endinterface Concurrency: up and down should be independent http://csg.csail.mit.edu/6.375 March 9, 2011
Naïve Counter Example module mkCounter(Counter); Reg#(int) r <- mkReg(); method int _read(); return r; endmethod method Action up(); r <= r + 1; method Action down(); c <= r – 1; endmodule http://csg.csail.mit.edu/6.375 March 9, 2011
Counter Example module mkCounter(Counter); Reg#(int) r <- mkConfigReg(); RWire#(void) upW <- mkRWire(); RWire#(void) downW <- mkRWire(); method int _read(); return r; endmethod method Action up(); upW.wset(); endmethod method Action down(); downW.wset(); endmethod rule updateR(True); r <= r + (isValid( upW.wget()) ? 1 : 0) - (isValid(downW.wget()) ? 1 : 0); endrule endmodule What if want to call up then _read? http://csg.csail.mit.edu/6.375 March 9, 2011
Completion Buffer http://csg.csail.mit.edu/6.375 March 9, 2011
Completion buffer: Interface cbuf getResult getToken put (result & token) interface CBuffer#(type t); method ActionValue#(Token) getToken(); method Action put(Token tok, t d); method ActionValue#(t) getResult(); endinterface typedef Bit#(TLog#(n)) TokenN#(numeric type n); typedef TokenN#(16) Token; http://csg.csail.mit.edu/6.375 http://csg.csail.mit.edu/6.375 March 9, 2011 33
IP-Lookup module with the completion buffer done? RAM fifo enter getResult cbuf yes no getToken module mkIPLookup(IPLookup); rule recirculate… ; rule exit …; method Action enter (IP ip); Token tok <- cbuf.getToken(); ram.req(ip[31:16]); fifo.enq(tuple2(tok,ip[15:0])); endmethod method ActionValue#(Msg) getResult(); let result <- cbuf.getResult(); return result; endmodule for enter and getResult to execute simultaneously, cbuf.getToken and cbuf.getResult must execute simultaneously http://csg.csail.mit.edu/6.375 http://csg.csail.mit.edu/6.375 March 9, 2011 34
IP Lookup rules with completion buffer rule recirculate (!isLeaf(ram.peek())); match{.tok,.rip} = fifo.first(); fifo.enq(tuple2(tok,(rip << 8))); ram.req(ram.peek() + rip[15:8]); fifo.deq(); ram.deq(); endrule rule exit (isLeaf(ram.peek())); cbuf.put(ram.peek()); fifo.deq(); ram.deq(); endrule For rule exit and method enter to execute simultaneously, cbuf.put and cbuf.getToken must execute simultaneously For no dead cycles cbuf.getToken and cbuf.put and cbuf.getResult must be able to execute simultaneously http://csg.csail.mit.edu/6.375 http://csg.csail.mit.edu/6.375 March 9, 2011 35
Naïve Completion Buffer module mkCBuffer(CBuffer#(a)); Vector#(Reg#(Bool)) valids <- replicateM(mkReg(False)); RegFile#(Token, t) data <- mkRegFile(); Reg#(Token) rdP <- mkReg(0); Reg#(Token) wrP <- mkReg(0); Reg#(Token) cnt <- mkReg(0); method ActionValue#(Token) getToken() if (cnt < Max); cnt <= cnt + 1; rdP <= nextPointer(rdP); valids[rdP] <= False; return rdp; endmethod method Action put(Token tok, t d); valids[tok] <= True; data.upd(tok, d); method ActionValue#(t) getResult() if (valids[wrP]) cnt <= cnt -1; wrP <= nextPointer(wrP); return (data.sub(wrP)); endmodule http://csg.csail.mit.edu/6.375 March 9, 2011
Completion buffer: Interface Requirements cbuf getResult getToken put (result & token) Rules and methods concurrency requirement to avoid dead-cycles: exit < getResult < enter cbuf methods’ concurency: cbuf.getResult < cbuf.put < cbuf.getToken http://csg.csail.mit.edu/6.375 http://csg.csail.mit.edu/6.375 March 9, 2011 37
Completion Buffer getResult < put < getToken Is valids okay? module mkCBuffer(CBuffer#(a)); Vector#(Reg#(Bool)) valids <- replicateM(mkReg(False)); RegFile#(Token, t) data <- mkRegFile(); Reg#(Token) rdP <- mkConfigReg(0); Reg#(Token) wrP <- mkConfigReg(0); Counter cnt <- mkCounter(); method ActionValue#(Token) getToken() if (cnt < Max); cnt.up(); rdP <= rdP + 1; valids[rdP] <= False; return rdp; endmethod method Action put(Token tok, t d); valids[tok] <= True; data.upd(tok, d); method ActionValue#(t) getResult() if (valids[wrP]) cnt.down(); wrP <= wrP + 1; return (data.sub(wrP)); endmodule Is valids okay? Is the ordering correct? http://csg.csail.mit.edu/6.375 March 9, 2011