IP Lookup: Some subtle concurrency issues

Slides:

Advertisements

Similar presentations

Elastic Pipelines and Basics of Multi-rule Systems Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February.

Advertisements

Constructive Computer Architecture: Multirule systems and Concurrent Execution of Rules Arvind Computer Science & Artificial Intelligence Lab. Massachusetts.

March 2007http://csg.csail.mit.edu/arvindSemantics-1 Scheduling Primitives for Bluespec Arvind Computer Science & Artificial Intelligence Lab Massachusetts.

Stmt FSM Richard S. Uhler Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology (based on a lecture prepared by Arvind)

Asynchronous Pipelines: Concurrency Issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology October 13, 2009http://csg.csail.mit.edu/koreaL12-1.

February 21, 2007http://csg.csail.mit.edu/6.375/L07-1 Bluespec-4: Architectural exploration using IP lookup Arvind Computer Science & Artificial Intelligence.

March, 2007http://csg.csail.mit.edu/arvindIPlookup-1 IP Lookup Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.

September 24, L08-1 IP Lookup: Some subtle concurrency issues Arvind Computer Science & Artificial Intelligence Lab.

IP Lookup: Some subtle concurrency issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 22, 2011L06-1.

IP Lookup: Some subtle concurrency issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology March 4, 2013

February 22, 2005http://csg.csail.mit.edu/6.884/L07-1 Bluespec-1: Design Affects Everything Arvind Computer Science & Artificial Intelligence Lab Massachusetts.

December 12, 2006http://csg.csail.mit.edu/6.827/L24-1 Scheduling Primitives for Bluespec Arvind Computer Science & Artificial Intelligence Lab Massachusetts.

Pipelining combinational circuits Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology February 20, 2013http://csg.csail.mit.edu/6.375L05-1.

September 3, 2009L02-1http://csg.csail.mit.edu/korea Introduction to Bluespec: A new methodology for designing Hardware Arvind Computer Science & Artificial.

March 6, 2006http://csg.csail.mit.edu/6.375/L10-1 Bluespec-4: Rule Scheduling and Synthesis Arvind Computer Science & Artificial Intelligence Lab Massachusetts.

Constructive Computer Architecture: Guards Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology September 24, 2014.

September 22, 2009http://csg.csail.mit.edu/koreaL07-1 Asynchronous Pipelines: Concurrency Issues Arvind Computer Science & Artificial Intelligence Lab.

Constructive Computer Architecture Sequential Circuits Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology

Elastic Pipelines: Concurrency Issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 28, 2011L08-1http://csg.csail.mit.edu/6.375.

Constructive Computer Architecture Sequential Circuits - 2 Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.

February 20, 2009http://csg.csail.mit.edu/6.375L08-1 Asynchronous Pipelines: Concurrency Issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts.

Modular Refinement Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology March 8,

October 20, 2009L14-1http://csg.csail.mit.edu/korea Concurrency and Modularity Issues in Processor pipelines Arvind Computer Science & Artificial Intelligence.

October 6, 2009http://csg.csail.mit.edu/koreaL10-1 IP Lookup-2: The Completion Buffer Arvind Computer Science & Artificial Intelligence Lab Massachusetts.

Elastic Pipelines: Concurrency Issues

EHRs: Designing modules with concurrent methods

EHRs: Designing modules with concurrent methods

Bluespec-3: A non-pipelined processor Arvind

Concurrency properties of BSV methods and rules

Bluespec-6: Modeling Processors

Folded “Combinational” circuits

Scheduling Constraints on Interface methods

Blusepc-5: Dead cycles, bubbles and Forwarding in Pipelines Arvind

Sequential Circuits Constructive Computer Architecture Arvind

Sequential Circuits: Constructive Computer Architecture

IP Lookup: Some subtle concurrency issues

Stmt FSM Arvind (with the help of Nirav Dave)

Performance Specifications

Pipelining combinational circuits

Multirule Systems and Concurrent Execution of Rules

Bluespec-1: Design Affects Everything

Constructive Computer Architecture: Guards

Sequential Circuits Constructive Computer Architecture Arvind

Pipelining combinational circuits

EHR: Ephemeral History Register

Bluespec-4: Architectural exploration using IP lookup Arvind

Constructive Computer Architecture: Well formed BSV programs

Blusepc-5: Dead cycles, bubbles and Forwarding in Pipelines Arvind

Modeling Processors: Concurrency Issues

Modules with Guarded Interfaces

Pipelining combinational circuits

Sequential Circuits - 2 Constructive Computer Architecture Arvind

Elastic Pipelines: Concurrency Issues

Bluespec-3: A non-pipelined processor Arvind

Multirule systems and Concurrent Execution of Rules

Stmt FSM Arvind (with the help of Nirav Dave)

Modular Refinement - 2 Arvind

IP Lookup Arvind Computer Science & Artificial Intelligence Lab

IP Lookup: Some subtle concurrency issues

Elastic Pipelines: Concurrency Issues

Elastic Pipelines and Basics of Multi-rule Systems

Constructive Computer Architecture: Guards

Elastic Pipelines and Basics of Multi-rule Systems

Multirule systems and Concurrent Execution of Rules

Introduction to Bluespec: A new methodology for designing Hardware

Bluespec-5: Scheduling & Rule Composition

Modular Refinement Arvind

Implementing for Correct Concurrency

Constructive Computer Architecture: Well formed BSV programs

Caches-2 Constructive Computer Architecture Arvind

Presentation transcript:

IP Lookup: Some subtle concurrency issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 22, 2010 http://csg.csail.mit.edu/6.375

but first a correction from the last lecture … February 22, 2010 http://csg.csail.mit.edu/6.375

One-Element Pipeline FIFO !empty !full rdy enab enq deq module FIFO or module mkPipelineFIFO1 (FIFO#(t)); Reg#(t) data <- mkRegU(); Reg#(Bool) full <- mkReg(False); RWire#(void) deqEN <- mkRWire(); Bool deqp = isValid (deqEN.wget())); method Action enq(t x) if (!full || deqp); full <= True; data <= x; endmethod method Action deq() if (full); full <= False; deqEN.wset(?); method t first() if (full); return (data); method Action clear(); full <= False; endmethod endmodule This actually won’t work! This works correctly in both cases (fifo full and fifo empty). first < enq deq < enq enq < clear deq < clear http://csg.csail.mit.edu/6.375 February 22, 2010

One-Element Pipeline FIFO Analysis !empty !full rdy enab enq deq module FIFO or module mkPipelineFIFO1 (FIFO#(t)); Reg#(t) data <- mkRegU(); Reg#(Bool) full <- mkReg(False); RWire#(void) deqEN <- mkRWire(); Bool deqp = isValid (deqEN.wget())); method Action enq(t x) if (!full || deqp); full <= True; data <= x; endmethod method Action deq() if (full); full <= False; deqEN.wset(?); ... Rwire allows us to create a combinational path between enq and deq but does not affect the conflict analysis Conflict analysis: Rwire deqEN allows concurrent execution of enq & deq with the functionality deq<enq; However, the conflicts around Register full remain! http://csg.csail.mit.edu/6.375 February 22, 2010

Solution- Config registers Lie a little ConfigReg is a Register (Reg#(a)) Reg#(t) full <- mkConfigRegU; Same HW as Register, but the definition says read and write can happen in either order However, just like a HW register, a read after a write gets the old value Primarily used to fool the compiler analysis to do the right thing http://csg.csail.mit.edu/6.375 February 22, 2010

One-Element Pipeline FIFO A correct solution !empty !full rdy enab enq deq module FIFO or module mkLFIFO1 (FIFO#(t)); Reg#(t) data <- mkRegU(); Reg#(Bool) full <- mkConfigReg(False); RWire#(void) deqEN <- mkRWire(); Bool deqp = isValid (deqEN.wget())); method Action enq(t x) if (!full || deqp); full <= True; data <= x; endmethod method Action deq() if (full); full <= False; deqEN.wset(?); method t first() if (full); return (data); method Action clear(); full <= False; endmethod endmodule No conflicts around full: when both enq and deq happen; if we want deq < enq then full must be set to True in case enq occurs. Scheduling constraint on deqEn forces deq < enq first < enq deq < enq enq < clear deq < clear http://csg.csail.mit.edu/6.375 February 22, 2010

An aside Unsafe modules Bluespec allows you to import Verilog modules by identifying wires that correspond to methods Such modules can be made safe either by asserting the correct scheduling properties of the methods or by wrapping the unsafe modules in appropriate Bluespec code Config Reg is an example of an unsafe module http://csg.csail.mit.edu/6.375 February 22, 2010

back to today’s lecture … February 22, 2010 http://csg.csail.mit.edu/6.375

IP Lookup block in a router Queue Manager Packet Processor Exit functions Control Processor Line Card (LC) IP Lookup SRAM (lookup table) Arbitration Switch LC The design example you’ll be seeing is from an IP switch. We’ll be looking at the IP lookup function that you typically find on the ingress port of a line card. A packet is routed based on the “Longest Prefix Match” (LPM) of it’s IP address with entries in a routing table Line rate and the order of arrival must be maintained line rate  15Mpps for 10GE http://csg.csail.mit.edu/6.375 February 22, 2010

Sparse tree representation F … A … 7 A … B F … F * E 5.*.*.* D 10.18.200.5 C 10.18.200.* B 7.14.7.3 A 7.14.*.* 14 3 5 E F … F 7 F … 10 F … F … 18 255 F … F … IP address Result M Ref 7.13.7.3 F 10.18.201.5 7.14.7.2 5.13.7.2 E 10.18.200.7 C C … 5 D 200 2 F … 3 4 A In this lecture: Level 1: 16 bits Level 2: 8 bits Level 3: 8 bits  1 to 3 memory accesses 1 4 http://csg.csail.mit.edu/6.375 February 22, 2010

“C” version of LPM int lpm (IPA ipa) /* 3 memory lookups */ { int p; /* Level 1: 16 bits */ p = RAM [ipa[31:16]]; if (isLeaf(p)) return value(p); /* Level 2: 8 bits */ p = RAM [ptr(p) + ipa [15:8]]; /* Level 3: 8 bits */ p = RAM [ptr(p) + ipa [7:0]]; return value(p); /* must be a leaf */ } … 216 -1 28 -1 Not obvious from the C code how to deal with - memory latency - pipelining Here’s a longest prefix match algorithm in software. It’s pretty simple, up to three lookups of different parts of the IP address. (Click now) Can you visualize how to implement this in HW? It’s not obvious. Memory latency ~30ns to 40ns Must process a packet every 1/15 ms or 67 ns Must sustain 3 memory dependent lookups in 67 ns http://csg.csail.mit.edu/6.375 February 22, 2010

Longest Prefix Match for IP lookup: 3 possible implementation architectures Rigid pipeline Inefficient memory usage but simple design Linear pipeline Efficient memory usage through memory port replicator Circular pipeline Efficient memory with most complex control So, let’s look at three different approaches: Rigid pipeline Linear pipeline Circular pipeline They all appear at a high-level to have different advantages, but… 1 2 3 Designer’s Ranking: Which is “best”? http://csg.csail.mit.edu/6.375 February 22, 2010 Arvind, Nikhil, Rosenband & Dave ICCAD 2004

Circular pipeline enter? done? RAM yes inQ fifo no outQ The fifo holds the request while the memory access is in progress The architecture has been simplified for the sake of the lecture. Otherwise, a “completion buffer” has to be added at the exit to make sure that packets leave in order. Next lecture http://csg.csail.mit.edu/6.375 February 22, 2010

FIFO interface FIFO#(type t); method Action enq(t x); // enqueue an item method Action deq(); // remove oldest entry method t first(); // inspect oldest item endinterface n enab enq rdy not full n = # of bits needed to represent a value of type t enab rdy deq module FIFO not empty n first not empty rdy http://csg.csail.mit.edu/6.375 February 22, 2010

Request-Response Interface for Synchronous Memory Data Ack Ready req deq peek Addr Ready ctr (ctr > 0) ctr++ ctr-- deq Enable enq Synch Mem Latency N interface Mem#(type addrT, type dataT); method Action req(addrT x); method Action deq(); method dataT peek(); endinterface Making a synchronous component latency- insensitive http://csg.csail.mit.edu/6.375 February 22, 2010

Circular Pipeline Code enter? done? RAM inQ fifo rule enter (True); IP ip = inQ.first(); ram.req(ip[31:16]); fifo.enq(ip[15:0]); inQ.deq(); endrule done? Is the same as isLeaf rule recirculate (True); TableEntry p = ram.peek(); ram.deq(); IP rip = fifo.first(); if (isLeaf(p)) outQ.enq(p); else begin fifo.enq(rip << 8); ram.req(p + rip[15:8]); end fifo.deq(); endrule When can enter fire? inQ has an element and ram & fifo each has space http://csg.csail.mit.edu/6.375 February 22, 2010

Circular Pipeline Code: discussion enter? done? RAM inQ fifo rule enter (True); IP ip = inQ.first(); ram.req(ip[31:16]); fifo.enq(ip[15:0]); inQ.deq(); endrule rule recirculate (True); TableEntry p = ram.peek(); ram.deq(); IP rip = fifo.first(); if (isLeaf(p)) outQ.enq(p); else begin fifo.enq(rip << 8); ram.req(p + rip[15:8]); end fifo.deq(); endrule When can recirculate fire? ram & fifo each has an element and ram, fifo & outQ each has space Is this possible? http://csg.csail.mit.edu/6.375 February 22, 2010

Ordinary FIFO won’t work but a pipeline FIFO would February 22, 2010 http://csg.csail.mit.edu/6.375

Problem solved! PipelineFIFO fifo <- mkPipelineFIFO; // use a Pipeline fifo rule recirculate (True); TableEntry p = ram.peek(); ram.deq(); IP rip = fifo.first(); if (isLeaf(p)) outQ.enq(p); else begin fifo.enq(rip << 8); ram.req(p + rip[15:8]); end fifo.deq(); endrule RWire has been safely encapsulated inside the Pipeline FIFO – users of the fifo need not be aware of RWires http://csg.csail.mit.edu/6.375 February 22, 2010

Dead cycles rule enter (True); IP ip = inQ.first(); done? RAM inQ fifo rule enter (True); IP ip = inQ.first(); ram.req(ip[31:16]); fifo.enq(ip[15:0]); inQ.deq(); endrule assume simultaneous enq & deq is allowed rule recirculate (True); TableEntry p = ram.peek(); ram.deq(); IP rip = fifo.first(); if (isLeaf(p)) outQ.enq(p); else begin fifo.enq(rip << 8); ram.req(p + rip[15:8]); end fifo.deq(); endrule Can a new request enter the system when an old one is leaving? Is this worth worrying about? http://csg.csail.mit.edu/6.375 February 22, 2010

The Effect of Dead Cycles enter done? RAM yes in fifo no Circular Pipeline RAM takes several cycles to respond to a request Each IP request generates 1-3 RAM requests FIFO entries hold base pointer for next lookup and unprocessed part of the IP address What is the performance loss if “exit” and “enter” don’t ever happen in the same cycle? >33% slowdown! Unacceptable http://csg.csail.mit.edu/6.375 February 22, 2010

Scheduling conflicting rules When two rules conflict on a shared resource, they cannot both execute in the same clock The compiler produces logic that ensures that, when both rules are applicable, only one will fire Which one? source annotations (* descending_urgency = “recirculate, enter” *) http://csg.csail.mit.edu/6.375 February 22, 2010

So is there a dead cycle? rule enter (True); IP ip = inQ.first(); done? RAM inQ fifo rule enter (True); IP ip = inQ.first(); ram.req(ip[31:16]); fifo.enq(ip[15:0]); inQ.deq(); endrule rule recirculate (True); TableEntry p = ram.peek(); ram.deq(); IP rip = fifo.first(); if (isLeaf(p)) outQ.enq(p); else begin fifo.enq(rip << 8); ram.req(p + rip[15:8]); end fifo.deq(); endrule In general these two rules conflict but when isLeaf(p) is true there is no apparent conflict! http://csg.csail.mit.edu/6.375 February 22, 2010

Rule Spliting rule foo (True); if (p) r1 <= 5; else r2 <= 7; endrule rule fooT (p); r1 <= 5; endrule rule fooF (!p); r2 <= 7;  rule fooT and fooF can be scheduled independently with some other rule http://csg.csail.mit.edu/6.375 February 22, 2010

Spliting the recirculate rule rule recirculate (!isLeaf(ram.peek())); IP rip = fifo.first(); fifo.enq(rip << 8); ram.req(ram.peek() + rip[15:8]); fifo.deq(); ram.deq(); endrule rule exit (isLeaf(ram.peek())); outQ.enq(ram.peek()); fifo.deq(); ram.deq(); endrule rule enter (True); IP ip = inQ.first(); ram.req(ip[31:16]); fifo.enq(ip[15:0]); inQ.deq(); endrule Now rules enter and exit can be scheduled simultaneously, assuming fifo.enq and fifo.deq can be done simultaneously http://csg.csail.mit.edu/6.375 February 22, 2010

Packaging a module: Turning a rule into a method inQ outQ enter? RAM done? fifo rule enter (True); IP ip = inQ.first(); ram.req(ip[31:16]); fifo.enq(p[15:0]); inQ.deq(); endrule method Action enter (IP ip); ram.req(ip[31:16]); fifo.enq(ip[15:0]); endmethod Similarly a method can be written to extract elements from the outQ next lecture- Completion Buffer http://csg.csail.mit.edu/6.375 February 22, 2010