IP Lookup: Some subtle concurrency issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 22, 2010 http://csg.csail.mit.edu/6.375
but first a correction from the last lecture … February 22, 2010 http://csg.csail.mit.edu/6.375
One-Element Pipeline FIFO !empty !full rdy enab enq deq module FIFO or module mkPipelineFIFO1 (FIFO#(t)); Reg#(t) data <- mkRegU(); Reg#(Bool) full <- mkReg(False); RWire#(void) deqEN <- mkRWire(); Bool deqp = isValid (deqEN.wget())); method Action enq(t x) if (!full || deqp); full <= True; data <= x; endmethod method Action deq() if (full); full <= False; deqEN.wset(?); method t first() if (full); return (data); method Action clear(); full <= False; endmethod endmodule This actually won’t work! This works correctly in both cases (fifo full and fifo empty). first < enq deq < enq enq < clear deq < clear http://csg.csail.mit.edu/6.375 February 22, 2010
One-Element Pipeline FIFO Analysis !empty !full rdy enab enq deq module FIFO or module mkPipelineFIFO1 (FIFO#(t)); Reg#(t) data <- mkRegU(); Reg#(Bool) full <- mkReg(False); RWire#(void) deqEN <- mkRWire(); Bool deqp = isValid (deqEN.wget())); method Action enq(t x) if (!full || deqp); full <= True; data <= x; endmethod method Action deq() if (full); full <= False; deqEN.wset(?); ... Rwire allows us to create a combinational path between enq and deq but does not affect the conflict analysis Conflict analysis: Rwire deqEN allows concurrent execution of enq & deq with the functionality deq<enq; However, the conflicts around Register full remain! http://csg.csail.mit.edu/6.375 February 22, 2010
Solution- Config registers Lie a little ConfigReg is a Register (Reg#(a)) Reg#(t) full <- mkConfigRegU; Same HW as Register, but the definition says read and write can happen in either order However, just like a HW register, a read after a write gets the old value Primarily used to fool the compiler analysis to do the right thing http://csg.csail.mit.edu/6.375 February 22, 2010
One-Element Pipeline FIFO A correct solution !empty !full rdy enab enq deq module FIFO or module mkLFIFO1 (FIFO#(t)); Reg#(t) data <- mkRegU(); Reg#(Bool) full <- mkConfigReg(False); RWire#(void) deqEN <- mkRWire(); Bool deqp = isValid (deqEN.wget())); method Action enq(t x) if (!full || deqp); full <= True; data <= x; endmethod method Action deq() if (full); full <= False; deqEN.wset(?); method t first() if (full); return (data); method Action clear(); full <= False; endmethod endmodule No conflicts around full: when both enq and deq happen; if we want deq < enq then full must be set to True in case enq occurs. Scheduling constraint on deqEn forces deq < enq first < enq deq < enq enq < clear deq < clear http://csg.csail.mit.edu/6.375 February 22, 2010
An aside Unsafe modules Bluespec allows you to import Verilog modules by identifying wires that correspond to methods Such modules can be made safe either by asserting the correct scheduling properties of the methods or by wrapping the unsafe modules in appropriate Bluespec code Config Reg is an example of an unsafe module http://csg.csail.mit.edu/6.375 February 22, 2010
back to today’s lecture … February 22, 2010 http://csg.csail.mit.edu/6.375
IP Lookup block in a router Queue Manager Packet Processor Exit functions Control Processor Line Card (LC) IP Lookup SRAM (lookup table) Arbitration Switch LC The design example you’ll be seeing is from an IP switch. We’ll be looking at the IP lookup function that you typically find on the ingress port of a line card. A packet is routed based on the “Longest Prefix Match” (LPM) of it’s IP address with entries in a routing table Line rate and the order of arrival must be maintained line rate 15Mpps for 10GE http://csg.csail.mit.edu/6.375 February 22, 2010
Sparse tree representation F … A … 7 A … B F … F * E 5.*.*.* D 10.18.200.5 C 10.18.200.* B 7.14.7.3 A 7.14.*.* 14 3 5 E F … F 7 F … 10 F … F … 18 255 F … F … IP address Result M Ref 7.13.7.3 F 10.18.201.5 7.14.7.2 5.13.7.2 E 10.18.200.7 C C … 5 D 200 2 F … 3 4 A In this lecture: Level 1: 16 bits Level 2: 8 bits Level 3: 8 bits 1 to 3 memory accesses 1 4 http://csg.csail.mit.edu/6.375 February 22, 2010
“C” version of LPM int lpm (IPA ipa) /* 3 memory lookups */ { int p; /* Level 1: 16 bits */ p = RAM [ipa[31:16]]; if (isLeaf(p)) return value(p); /* Level 2: 8 bits */ p = RAM [ptr(p) + ipa [15:8]]; /* Level 3: 8 bits */ p = RAM [ptr(p) + ipa [7:0]]; return value(p); /* must be a leaf */ } … 216 -1 28 -1 Not obvious from the C code how to deal with - memory latency - pipelining Here’s a longest prefix match algorithm in software. It’s pretty simple, up to three lookups of different parts of the IP address. (Click now) Can you visualize how to implement this in HW? It’s not obvious. Memory latency ~30ns to 40ns Must process a packet every 1/15 ms or 67 ns Must sustain 3 memory dependent lookups in 67 ns http://csg.csail.mit.edu/6.375 February 22, 2010
Longest Prefix Match for IP lookup: 3 possible implementation architectures Rigid pipeline Inefficient memory usage but simple design Linear pipeline Efficient memory usage through memory port replicator Circular pipeline Efficient memory with most complex control So, let’s look at three different approaches: Rigid pipeline Linear pipeline Circular pipeline They all appear at a high-level to have different advantages, but… 1 2 3 Designer’s Ranking: Which is “best”? http://csg.csail.mit.edu/6.375 February 22, 2010 Arvind, Nikhil, Rosenband & Dave ICCAD 2004
Circular pipeline enter? done? RAM yes inQ fifo no outQ The fifo holds the request while the memory access is in progress The architecture has been simplified for the sake of the lecture. Otherwise, a “completion buffer” has to be added at the exit to make sure that packets leave in order. Next lecture http://csg.csail.mit.edu/6.375 February 22, 2010
FIFO interface FIFO#(type t); method Action enq(t x); // enqueue an item method Action deq(); // remove oldest entry method t first(); // inspect oldest item endinterface n enab enq rdy not full n = # of bits needed to represent a value of type t enab rdy deq module FIFO not empty n first not empty rdy http://csg.csail.mit.edu/6.375 February 22, 2010
Request-Response Interface for Synchronous Memory Data Ack Ready req deq peek Addr Ready ctr (ctr > 0) ctr++ ctr-- deq Enable enq Synch Mem Latency N interface Mem#(type addrT, type dataT); method Action req(addrT x); method Action deq(); method dataT peek(); endinterface Making a synchronous component latency- insensitive http://csg.csail.mit.edu/6.375 February 22, 2010
Circular Pipeline Code enter? done? RAM inQ fifo rule enter (True); IP ip = inQ.first(); ram.req(ip[31:16]); fifo.enq(ip[15:0]); inQ.deq(); endrule done? Is the same as isLeaf rule recirculate (True); TableEntry p = ram.peek(); ram.deq(); IP rip = fifo.first(); if (isLeaf(p)) outQ.enq(p); else begin fifo.enq(rip << 8); ram.req(p + rip[15:8]); end fifo.deq(); endrule When can enter fire? inQ has an element and ram & fifo each has space http://csg.csail.mit.edu/6.375 February 22, 2010
Circular Pipeline Code: discussion enter? done? RAM inQ fifo rule enter (True); IP ip = inQ.first(); ram.req(ip[31:16]); fifo.enq(ip[15:0]); inQ.deq(); endrule rule recirculate (True); TableEntry p = ram.peek(); ram.deq(); IP rip = fifo.first(); if (isLeaf(p)) outQ.enq(p); else begin fifo.enq(rip << 8); ram.req(p + rip[15:8]); end fifo.deq(); endrule When can recirculate fire? ram & fifo each has an element and ram, fifo & outQ each has space Is this possible? http://csg.csail.mit.edu/6.375 February 22, 2010
Ordinary FIFO won’t work but a pipeline FIFO would February 22, 2010 http://csg.csail.mit.edu/6.375
Problem solved! PipelineFIFO fifo <- mkPipelineFIFO; // use a Pipeline fifo rule recirculate (True); TableEntry p = ram.peek(); ram.deq(); IP rip = fifo.first(); if (isLeaf(p)) outQ.enq(p); else begin fifo.enq(rip << 8); ram.req(p + rip[15:8]); end fifo.deq(); endrule RWire has been safely encapsulated inside the Pipeline FIFO – users of the fifo need not be aware of RWires http://csg.csail.mit.edu/6.375 February 22, 2010
Dead cycles rule enter (True); IP ip = inQ.first(); done? RAM inQ fifo rule enter (True); IP ip = inQ.first(); ram.req(ip[31:16]); fifo.enq(ip[15:0]); inQ.deq(); endrule assume simultaneous enq & deq is allowed rule recirculate (True); TableEntry p = ram.peek(); ram.deq(); IP rip = fifo.first(); if (isLeaf(p)) outQ.enq(p); else begin fifo.enq(rip << 8); ram.req(p + rip[15:8]); end fifo.deq(); endrule Can a new request enter the system when an old one is leaving? Is this worth worrying about? http://csg.csail.mit.edu/6.375 February 22, 2010
The Effect of Dead Cycles enter done? RAM yes in fifo no Circular Pipeline RAM takes several cycles to respond to a request Each IP request generates 1-3 RAM requests FIFO entries hold base pointer for next lookup and unprocessed part of the IP address What is the performance loss if “exit” and “enter” don’t ever happen in the same cycle? >33% slowdown! Unacceptable http://csg.csail.mit.edu/6.375 February 22, 2010
Scheduling conflicting rules When two rules conflict on a shared resource, they cannot both execute in the same clock The compiler produces logic that ensures that, when both rules are applicable, only one will fire Which one? source annotations (* descending_urgency = “recirculate, enter” *) http://csg.csail.mit.edu/6.375 February 22, 2010
So is there a dead cycle? rule enter (True); IP ip = inQ.first(); done? RAM inQ fifo rule enter (True); IP ip = inQ.first(); ram.req(ip[31:16]); fifo.enq(ip[15:0]); inQ.deq(); endrule rule recirculate (True); TableEntry p = ram.peek(); ram.deq(); IP rip = fifo.first(); if (isLeaf(p)) outQ.enq(p); else begin fifo.enq(rip << 8); ram.req(p + rip[15:8]); end fifo.deq(); endrule In general these two rules conflict but when isLeaf(p) is true there is no apparent conflict! http://csg.csail.mit.edu/6.375 February 22, 2010
Rule Spliting rule foo (True); if (p) r1 <= 5; else r2 <= 7; endrule rule fooT (p); r1 <= 5; endrule rule fooF (!p); r2 <= 7; rule fooT and fooF can be scheduled independently with some other rule http://csg.csail.mit.edu/6.375 February 22, 2010
Spliting the recirculate rule rule recirculate (!isLeaf(ram.peek())); IP rip = fifo.first(); fifo.enq(rip << 8); ram.req(ram.peek() + rip[15:8]); fifo.deq(); ram.deq(); endrule rule exit (isLeaf(ram.peek())); outQ.enq(ram.peek()); fifo.deq(); ram.deq(); endrule rule enter (True); IP ip = inQ.first(); ram.req(ip[31:16]); fifo.enq(ip[15:0]); inQ.deq(); endrule Now rules enter and exit can be scheduled simultaneously, assuming fifo.enq and fifo.deq can be done simultaneously http://csg.csail.mit.edu/6.375 February 22, 2010
Packaging a module: Turning a rule into a method inQ outQ enter? RAM done? fifo rule enter (True); IP ip = inQ.first(); ram.req(ip[31:16]); fifo.enq(p[15:0]); inQ.deq(); endrule method Action enter (IP ip); ram.req(ip[31:16]); fifo.enq(ip[15:0]); endmethod Similarly a method can be written to extract elements from the outQ next lecture- Completion Buffer http://csg.csail.mit.edu/6.375 February 22, 2010