October 6, 2009http://csg.csail.mit.edu/koreaL10-1 IP Lookup-2: The Completion Buffer Arvind Computer Science & Artificial Intelligence Lab Massachusetts.

Slides:

Advertisements

Similar presentations

Constructive Computer Architecture: Multirule systems and Concurrent Execution of Rules Arvind Computer Science & Artificial Intelligence Lab. Massachusetts.

Advertisements

March 2007http://csg.csail.mit.edu/arvindSemantics-1 Scheduling Primitives for Bluespec Arvind Computer Science & Artificial Intelligence Lab Massachusetts.

Stmt FSM Richard S. Uhler Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology (based on a lecture prepared by Arvind)

Asynchronous Pipelines: Concurrency Issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology October 13, 2009http://csg.csail.mit.edu/koreaL12-1.

Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 22, 2011L07-1

February 21, 2007http://csg.csail.mit.edu/6.375/L07-1 Bluespec-4: Architectural exploration using IP lookup Arvind Computer Science & Artificial Intelligence.

March, 2007http://csg.csail.mit.edu/arvindIPlookup-1 IP Lookup Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.

September 24, L08-1 IP Lookup: Some subtle concurrency issues Arvind Computer Science & Artificial Intelligence Lab.

IP Lookup: Some subtle concurrency issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 22, 2011L06-1.

IP Lookup: Some subtle concurrency issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology March 4, 2013

February 22, 2005http://csg.csail.mit.edu/6.884/L07-1 Bluespec-1: Design Affects Everything Arvind Computer Science & Artificial Intelligence Lab Massachusetts.

December 12, 2006http://csg.csail.mit.edu/6.827/L24-1 Scheduling Primitives for Bluespec Arvind Computer Science & Artificial Intelligence Lab Massachusetts.

September 3, 2009L02-1http://csg.csail.mit.edu/korea Introduction to Bluespec: A new methodology for designing Hardware Arvind Computer Science & Artificial.

Introduction to Bluespec: A new methodology for designing Hardware Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.

March 6, 2006http://csg.csail.mit.edu/6.375/L10-1 Bluespec-4: Rule Scheduling and Synthesis Arvind Computer Science & Artificial Intelligence Lab Massachusetts.

Constructive Computer Architecture: Guards Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology September 24, 2014.

Constructive Computer Architecture Tutorial 2 Advanced BSV Sizhuo Zhang TA Oct 9, 2015T02-1http://csg.csail.mit.edu/6.175.

Constructive Computer Architecture Tutorial 2 Advanced BSV Andy Wright TA September 26, 2014http://csg.csail.mit.edu/6.175T02-1.

September 22, 2009http://csg.csail.mit.edu/koreaL07-1 Asynchronous Pipelines: Concurrency Issues Arvind Computer Science & Artificial Intelligence Lab.

Constructive Computer Architecture Sequential Circuits Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology

Elastic Pipelines: Concurrency Issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 28, 2011L08-1http://csg.csail.mit.edu/6.375.

Computer Architecture: A Constructive Approach Next Address Prediction – Six Stage Pipeline Joel Emer Computer Science & Artificial Intelligence Lab. Massachusetts.

Constructive Computer Architecture Sequential Circuits - 2 Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.

Non-blocking Caches Arvind (with Asif Khan) Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology May 14, 2012L25-1

Constructive Computer Architecture Store Buffers and Non-blocking Caches Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute.

Introduction to Bluespec: A new methodology for designing Hardware Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.

October 20, 2009L14-1http://csg.csail.mit.edu/korea Concurrency and Modularity Issues in Processor pipelines Arvind Computer Science & Artificial Intelligence.

Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology March 1, 2010

Functions and Types in Bluespec

Bluespec-3: A non-pipelined processor Arvind

Bluespec-6: Modeling Processors

Folded “Combinational” circuits

Scheduling Constraints on Interface methods

Blusepc-5: Dead cycles, bubbles and Forwarding in Pipelines Arvind

Sequential Circuits - 2 Constructive Computer Architecture Arvind

Sequential Circuits Constructive Computer Architecture Arvind

Sequential Circuits: Constructive Computer Architecture

Caches-2 Constructive Computer Architecture Arvind

IP Lookup: Some subtle concurrency issues

Stmt FSM Arvind (with the help of Nirav Dave)

Performance Specifications

Pipelining combinational circuits

Multirule Systems and Concurrent Execution of Rules

Bluespec-1: Design Affects Everything

Constructive Computer Architecture: Guards

Sequential Circuits Constructive Computer Architecture Arvind

Pipelining combinational circuits

Bluespec-4: Architectural exploration using IP lookup Arvind

Constructive Computer Architecture: Well formed BSV programs

Blusepc-5: Dead cycles, bubbles and Forwarding in Pipelines Arvind

Modules with Guarded Interfaces

Pipelining combinational circuits

Sequential Circuits - 2 Constructive Computer Architecture Arvind

Bluespec-3: A non-pipelined processor Arvind

Multirule systems and Concurrent Execution of Rules

Stmt FSM Arvind (with the help of Nirav Dave)

Caches-2 Constructive Computer Architecture Arvind

IP Lookup Arvind Computer Science & Artificial Intelligence Lab

IP Lookup: Some subtle concurrency issues

Elastic Pipelines and Basics of Multi-rule Systems

Constructive Computer Architecture: Guards

Elastic Pipelines and Basics of Multi-rule Systems

Multirule systems and Concurrent Execution of Rules

IP Lookup: Some subtle concurrency issues

Bluespec-5: Scheduling & Rule Composition

Modeling Processors Arvind

Modular Refinement Arvind

Implementing for Correct Concurrency

Constructive Computer Architecture: Well formed BSV programs

Caches-2 Constructive Computer Architecture Arvind

Presentation transcript:

October 6, 2009http://csg.csail.mit.edu/koreaL10-1 IP Lookup-2: The Completion Buffer Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology

IP-Lookup module without the completion buffer module mkIPLookup(IPLookup); rule recirculate… ; rule exit …; method Action enter (IP ip); ram.req(ip[31:16]); fifo.enq(ip[15:0]); endmethod method ActionValue#(Msg) getResult(); outQ.deq(); return outQ.first(); endmethod endmodule done? RAM fifo outQ enter getResult The packets may come out of order October 6, 2009 L10-2

IP Lookup rules rule recirculate (!isLeaf(ram.peek())); IP rip = fifo.first(); fifo.enq(rip << 8); ram.req(ram.peek() + rip[15:8]); fifo.deq(); ram.deq(); endrule rule exit (isLeaf(ram.peek())); outQ.enq(ram.peek()); fifo.deq(); ram.deq(); endrule Method enter and exit can be scheduled simultaneously, assuming fifo.enq and fifo.deq can be done simultaneously October 6, 2009 L10-3

IP-Lookup module with the completion buffer done? RAM fifo enter getResult outQ cbuf yes no getToken Completion buffer - ensures that departures take place in order even if lookups complete out-of-order - gives out tokens to control the entry into the circular pipeline The fifo now must also hold the “token” while the memory access is in progress: Tuple2#(Token,Bit#(16)) remainingIP October 6, 2009 L10-4

Completion buffer: Interface interface CBuffer#(type t); method ActionValue#(Token) getToken(); method Action put(Token tok, t d); method ActionValue#(t) getResult(); endinterface typedef Bit#(TLog#(n)) TokenN#(numeric type n); typedef TokenN#(16) Token; cbuf getResults getToken put (result & token) October 6, 2009 L10-5

Completion buffer: Implementation module mkCBuffer (CBuffer#(t)) provisos (Bits#(t,sz)); RegFile#(Token, Maybe#(t)) buf <- mkRegFileFull(); Reg#(Token) i <- mkReg(0); //input index Reg#(Token) o <- mkReg(0); //output index Reg#(Int#(32)) cnt <- mkReg(0); //number of filled slots … IIVIVIIIVIVI cnt i o buf Elements must be representable as bits A circular buffer with two pointers i and o, and a counter cnt Elements are of Maybe type October 6, 2009 L10-6

Completion buffer: Implementation // state elements // buf, i, o, n... method ActionValue#(t) getToken() if (cnt < maxToken); cnt <= cnt + 1; i <= (i== maxToken) ? 0: i + 1; buf.upd(i, Invalid); return i; endmethod method Action put(Token tok, t data); return buf.upd(tok, Valid data); endmethod method ActionValue#(t) getResult() if (cnt > 0) &&& (buf.sub(o) matches tagged (Valid.x)); o <= (o==maxToken) ? 0 : o + 1; cnt <= cnt - 1; return x; endmethod IIVIVIIIVIVI cnt i o buf Can these methods execute concurrently? Does it matter? October 6, 2009 L10-7

IP-Lookup module with the completion buffer module mkIPLookup(IPLookup); rule recirculate… ; rule exit …; method Action enter (IP ip); Token tok <- cbuf.getToken(); ram.req(ip[31:16]); fifo.enq(tuple2(tok,ip[15:0])); endmethod method ActionValue#(Msg) getResult(); let result <- cbuf.getResult(); return result; endmethod endmodule done? RAM fifo enter getResult cbuf yes no getToken for enter and getResult to execute simultaneously, cbuf.getToken and cbuf.getResult must execute simultaneously October 6, 2009 L10-8

IP Lookup rules with completion buffer rule recirculate (!isLeaf(ram.peek())); match{.tok,.rip} = fifo.first(); fifo.enq(tuple2(tok,(rip << 8))); ram.req(ram.peek() + rip[15:8]); fifo.deq(); ram.deq(); endrule rule exit (isLeaf(ram.peek())); cbuf.put(ram.peek()); fifo.deq(); ram.deq(); endrule For rule exit and method enter to execute simultaneously, cbuf.put and cbuf.getToken must execute simultaneously October 6, 2009 L10-9  For no dead cycles cbuf.getToken and cbuf.put and cbuf.getResult must be able to execute simultaneously

Completion buffer: Concurrency Issue // state elements // buf, i, o, n... method ActionValue#(t) getToken() if (cnt < maxToken); cnt <= cnt + 1; i <= (i==maxToken) ? 0 : i + 1; buf.upd(i, Invalid); return i; endmethod method Action put(Token tok, t data); buf.upd(tok, Valid data); endmethod method ActionValue#(t) getResult() if (cnt > 0) &&& (buf.sub(o) matches tagged (Valid.x)); o <= (o==maxToken) ? 0 : o + 1; cnt <= cnt - 1; return x; endmethod IIVIVIIIVIVI cnt i o buf Can these methods execute concurrently? NO! October 6, 2009 L10-10

Concurrency Analysis IIVIVIIIVIVI cnt i o buf A circular buffer with two pointers i and o, and a counter cnt Elements are of Maybe type buf must allow two simultaneous updates and one read It is possible to design such a buf because the updates are always to different addresses because of the way buf is used Use a vector of registers No compiler can detect that October 6, 2009 L10-11

Adding Concurrent methods to the completion buffer Assume: only put tokens given to the completion buffer Can we reach a state where two methods write on the same data? No.

Bypass Registers For concurrency, we want to write a register logically earlier in the cycle where it’s read For correctness we must bypass the written value to the reads Build a BypassRegister from RWire write < read September 29, 2009 L

Configuration Registers Sometimes the higher-level semantics say that the bypass is not necessary The bypass condition never arises In such cases, an ordinary register behaves like a bypass register But the concurrency analysis prohibits the bypass orderings Configuration Registers are ordinary registers with CF reads and writes Allows the bypass ordering October 6, 2009 L10-14

Concurrent Completion buffer: Implementation module mkCBuffer (CBuffer#(t)) provisos (Bits#(t,sz)); Vector#(TokenSz, Maybe#(t)) buf <- replicateM(mkConfigReg(Invalid)); Reg#(Token) i <- mkReg(0); //input index Reg#(Token) o <- mkReg(0); //output index Counter#(32) cnt <- mkCounter(0); //number of filled slots … IIVIVIIIVIVI cnt i o buf A circular buffer with two pointers i and o, and a counter cnt Elements are of Maybe type October 6, 2009 L10-15

A special counter module We often need to keep count of certain events Need to read count, decrement and increment Since decrementing and incrementing don’t change the count we can remove some bypassing links Implemented as Counter Library modules (implemented using Rwires) October 6, 2009 L10-16

Completion buffer: Concurrency Issue // state elements // buf, i, o, n... method ActionValue#(t) getToken() if (cnt.read() < maxToken); cnt.inc(); i <= (i==maxToken) ? 0 : i + 1; buf[i] <= Invalid; return i; endmethod method Action put(Token tok, t data); buf[tok] <= Valid data; endmethod method ActionValue#(t) getResult() if (cnt.read() > 0) &&& (buf[o] matches tagged Valid.x); o <= (o==maxToken) ? 0 : o + 1; cnt.dec(); return x; endmethod IIVIVIIIVIVI cnt i o buf Can these methods execute concurrently? Yes! October 6, 2009 L10-17

Longest Prefix Match for IP lookup: 3 possible implementation architectures Rigid pipeline Inefficient memory usage but simple design Linear pipeline Efficient memory usage through memory port replicator Circular pipeline Efficient memory with most complex control Which is “best”? October 6, 2009 L10-18 Arvind, Nikhil, Rosenband & Dave ICCAD 2004

October 6, 2009 L10-19http://csg.csail.mit.edu/korea Implementations of Static pipelines Two designers, two results LPM versions Best Area (gates) Best Speed (ns) Static V (Replicated FSMs) Static V (Single FSM) Replicated: RAM FSM MUX / De-MUX FSM Counter MUX / De-MUX result IP addr FSM RAM MUX result IP addr BEST: Each packet is processed by one FSM Shared FSM

Synthesis results LPM versions Code size (lines) Best Area (gates) Best Speed (ns) Mem. util. (random workload) Static V % Static BSV (5% larger)3.32 (7% faster)63.5% Linear V % Linear BSV (8% larger)4.7 (same)99.9% Circular V % Circular BSV (1% larger)3.67 (2% slower)99.9% Synthesis: TSMC 0.18 µm lib - Bluespec results can match carefully coded Verilog - Micro-architecture has a dramatic impact on performance - Architecture differences are much more important than language differences in determining QoR October 6, 2009 L10-20 V = Verilog;BSV = Bluespec System Verilog