September 24, 2009 L08-1 IP Lookup: Some subtle concurrency issues Arvind Computer Science & Artificial Intelligence Lab.


IP Lookup: Some subtle concurrency issues
Arvind
Computer Science & Artificial Intelligence Lab, Massachusetts Institute of Technology
September 24, 2009, L08-1

IP Lookup block in a router

[Figure: line cards (LC) connected through a Switch. Each line card contains a Packet Processor, Queue Manager, Exit functions, Control Processor, Arbitration, and an IP Lookup block backed by an SRAM lookup table.]

A packet is routed based on the "Longest Prefix Match" (LPM) of its IP address with entries in a routing table.
Line rate and the order of arrival must be maintained.
Line rate: 15 Mpps for 10GE.
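The 15 Mpps figure can be checked with a little arithmetic. The sketch below assumes the worst case is back-to-back minimum-size Ethernet frames (64 bytes, plus 20 bytes of preamble and inter-frame gap); the slide does not say which packet size it has in mind, and the function names are made up for this example.

```c
#include <assert.h>

/* Assumption (not stated on the slide): worst case is back-to-back
 * minimum-size Ethernet frames, 64 bytes each, plus 8 bytes of preamble
 * and 12 bytes of inter-frame gap on the wire. */
enum { MIN_FRAME_BYTES = 64, WIRE_OVERHEAD_BYTES = 20 };

/* Packets per second at a given link speed in bits per second. */
double packets_per_second(double link_bps) {
    assert(link_bps > 0.0);
    double bits_per_packet = 8.0 * (MIN_FRAME_BYTES + WIRE_OVERHEAD_BYTES);
    return link_bps / bits_per_packet;
}

/* Time budget per packet, in nanoseconds. */
double ns_per_packet(double link_bps) {
    return 1e9 / packets_per_second(link_bps);
}
```

Under these assumptions a 10 Gb/s link carries just under 15 million packets per second, which leaves roughly 67 ns per lookup.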

Sparse tree representation

[Figure: the routing table stored as a sparse tree. Example prefixes include 7.14.*.* → A, 5.*.*.* → E and * → F, alongside a table of example IP addresses with their results and the number of memory references each needs.]

In this lecture: Level 1: 16 bits, Level 2: 8 bits, Level 3: 8 bits, so a lookup takes 1 to 3 memory accesses.

"C" version of LPM

    int lpm (IPA ipa)   /* up to 3 memory lookups */
    {
        /* ipa[31:16] denotes bits 31 down to 16 of ipa (slide notation, not valid C) */
        int p;
        /* Level 1: 16 bits */
        p = RAM [ipa[31:16]];
        if (isLeaf(p)) return value(p);
        /* Level 2: 8 bits */
        p = RAM [ptr(p) + ipa [15:8]];
        if (isLeaf(p)) return value(p);
        /* Level 3: 8 bits */
        p = RAM [ptr(p) + ipa [7:0]];
        return value(p);   /* must be a leaf */
    }

Not obvious from the C code how to deal with
    memory latency
    pipelining

Must process a packet every 1/15 μs, or 67 ns.
Must sustain 3 dependent memory lookups in 67 ns.
Memory latency is ~30 ns to 40 ns.
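The slide's code uses bit-slice notation, so it does not compile as C. A compilable sketch of the same three-level lookup follows. The entry encoding (top bit marks a leaf), the RAM layout and the example routes are all assumptions made for illustration; the lecture does not define them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical entry encoding (the slide does not define one):
 * bit 31 set   -> leaf, low bits hold the result;
 * bit 31 clear -> entry is the base address of the next-level block. */
#define LEAF_FLAG 0x80000000u
#define L1_SIZE   65536u              /* level 1 is indexed by 16 bits */
#define BLK_SIZE  256u                /* levels 2 and 3 by 8 bits each */
#define RAM_SIZE  (L1_SIZE + 4u * BLK_SIZE)

static uint32_t RAM[RAM_SIZE];

static int      is_leaf(uint32_t e) { return (e & LEAF_FLAG) != 0; }
static uint32_t value(uint32_t e)   { return e & ~LEAF_FLAG; }
static uint32_t ptr(uint32_t e)     { return e; }
static uint32_t leaf(uint32_t v)    { return LEAF_FLAG | v; }

/* Longest prefix match: 1 to 3 dependent memory reads. */
uint32_t lpm(uint32_t ipa) {
    uint32_t p = RAM[ipa >> 16];                 /* level 1: bits 31..16 */
    if (is_leaf(p)) return value(p);
    p = RAM[ptr(p) + ((ipa >> 8) & 0xFFu)];      /* level 2: bits 15..8  */
    if (is_leaf(p)) return value(p);
    p = RAM[ptr(p) + (ipa & 0xFFu)];             /* level 3: bits 7..0   */
    assert(is_leaf(p));                          /* must be a leaf       */
    return value(p);
}

/* Tiny illustrative table: default route 'F', 5.*.*.* -> 'E',
 * 7.14.*.* -> 'A', and a more specific 7.14.7.3 -> 'B'. */
void build_example_table(void) {
    uint32_t i;
    uint32_t l2 = L1_SIZE;                       /* one level-2 block */
    uint32_t l3 = L1_SIZE + BLK_SIZE;            /* one level-3 block */
    for (i = 0; i < L1_SIZE; i++) RAM[i] = leaf('F');
    for (i = 0; i < 256u; i++) RAM[(5u << 8) + i] = leaf('E');
    RAM[(7u << 8) + 14u] = l2;                   /* 7.14 -> level 2    */
    for (i = 0; i < BLK_SIZE; i++) RAM[l2 + i] = leaf('A');
    RAM[l2 + 7u] = l3;                           /* 7.14.7 -> level 3  */
    for (i = 0; i < BLK_SIZE; i++) RAM[l3 + i] = leaf('A');
    RAM[l3 + 3u] = leaf('B');
}
```

After `build_example_table()`, `lpm(0x050D0702)` (5.13.7.2) needs one read, while `lpm(0x070E0703)` (7.14.7.3) needs all three. Each read depends on the previous one, which is why memory latency dominates the 67 ns budget.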

Longest Prefix Match for IP lookup: 3 possible implementation architectures

Rigid pipeline: inefficient memory usage but simple design
Linear pipeline: efficient memory usage through a memory port replicator
Circular pipeline: efficient memory usage with the most complex control

Arvind, Nikhil, Rosenband & Dave, ICCAD 2004

Circular pipeline

[Figure: inQ → enter? → RAM, with a fifo alongside the RAM → done? → yes: outQ; no: back to enter?]

The fifo holds the request while the memory access is in progress.
The architecture has been simplified for the sake of the lecture. Otherwise, a "completion buffer" has to be added at the exit to make sure that packets leave in order (next lecture).

FIFO

    interface FIFO#(type t);
        method Action enq(t x);   // enqueue an item
        method Action deq();      // remove oldest entry
        method t first();         // inspect oldest item
    endinterface

n = # of bits needed to represent a value of type t

[Figure: FIFO module ports. enq: n-bit data in, enab, rdy (not full). deq: enab, rdy (not empty). first: n-bit data out, rdy (not empty).]

Request-Response Interface for Synchronous Memory

    interface Mem#(type addrT, type dataT);
        method Action req(addrT x);
        method Action deq();
        method dataT peek();
    endinterface

[Figure: a synchronous memory with latency N (Addr and Ready on the request side, Data and Ack on the response side) wrapped with a counter ctr and an output FIFO. Ready is derived from the counter (ctr > 0); ctr++ and ctr-- track requests and dequeues; the wrapper exposes req, deq and peek.]

Making a synchronous component latency-insensitive
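The wrapper idea can be sketched as a small cycle-level model in C. Everything here is an assumption made for illustration: the latency N, the memory contents, the `mem_*` names, and the convention that the counter holds the number of free response-buffer slots (so a request is accepted only when a slot is guaranteed for its response).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define N 3                  /* assumed memory latency in cycles */
#define MEM_WORDS 16

typedef struct {
    uint32_t mem[MEM_WORDS];
    int      pipe_valid[N];  /* requests inside the fixed-latency memory */
    uint32_t pipe_data[N];
    uint32_t buf[N];         /* response FIFO */
    int      head, count;    /* oldest buffered response, number buffered */
    int      free_slots;     /* credit counter: req is ready while > 0 */
} Mem;

void mem_init(Mem *m) {
    memset(m, 0, sizeof *m);
    m->free_slots = N;
    for (int i = 0; i < MEM_WORDS; i++) m->mem[i] = 100u + (uint32_t)i;
}

int mem_can_req(const Mem *m) { return m->free_slots > 0; }
int mem_can_deq(const Mem *m) { return m->count > 0; }

/* req: at most one per cycle; reserves a response slot. */
void mem_req(Mem *m, uint32_t addr) {
    assert(mem_can_req(m) && !m->pipe_valid[0]);
    m->pipe_valid[0] = 1;
    m->pipe_data[0] = m->mem[addr % MEM_WORDS];
    m->free_slots--;
}

uint32_t mem_peek(const Mem *m) {
    assert(mem_can_deq(m));
    return m->buf[m->head];
}

/* deq: drops the oldest response and returns its credit. */
void mem_deq(Mem *m) {
    assert(mem_can_deq(m));
    m->head = (m->head + 1) % N;
    m->count--;
    m->free_slots++;
}

/* Clock edge: the oldest in-flight request completes into the buffer. */
void mem_tick(Mem *m) {
    if (m->pipe_valid[N - 1]) {
        m->buf[(m->head + m->count) % N] = m->pipe_data[N - 1];
        m->count++;
    }
    for (int i = N - 1; i > 0; i--) {
        m->pipe_valid[i] = m->pipe_valid[i - 1];
        m->pipe_data[i]  = m->pipe_data[i - 1];
    }
    m->pipe_valid[0] = 0;
}
```

Because a credit is taken at `mem_req` and returned only at `mem_deq`, the response buffer can never overflow, no matter how slowly the client dequeues. The client sees only the guards `mem_can_req` and `mem_can_deq`, not the latency.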

Circular Pipeline Code

    rule enter (True);
        IP ip = inQ.first();
        ram.req(ip[31:16]);
        fifo.enq(ip[15:0]);
        inQ.deq();
    endrule

    rule recirculate (True);
        TableEntry p = ram.peek(); ram.deq();
        IP rip = fifo.first();
        if (isLeaf(p)) outQ.enq(p);
        else begin
            fifo.enq(rip << 8);
            ram.req(p + rip[15:8]);
        end
        fifo.deq();
    endrule

[Figure: inQ → enter? → RAM / fifo → done?]

done? is the same as isLeaf.
When can enter fire?

Circular Pipeline Code: discussion

The enter and recirculate rules are the same as on the previous slide.
When can recirculate fire?

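Ordinary FIFO won't work, but a pipeline FIFO would.

recirculate calls both fifo.deq and fifo.enq, so it needs the fifo to be non-empty and non-full in the same cycle. An ordinary FIFO that is full blocks enq even though deq frees a slot in that very cycle; for a one-element FIFO this means the rule can never fire. A pipeline FIFO allows enq on a full FIFO provided deq happens in the same cycle.

(Course page: http://csg.csail.mit.edu/korea)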

One-Element Pipeline FIFO

    module mkLFIFO1 (FIFO#(t)) provisos (Bits#(t, tSz));
        Reg#(t)      data  <- mkRegU();
        Reg#(Bool)   full  <- mkReg(False);
        RWire#(void) deqEN <- mkRWire();
        Bool deqp = isValid(deqEN.wget());   // True if deq is called in this cycle

        method Action enq(t x) if (!full || deqp);
            full <= True; data <= x;
        endmethod

        method Action deq() if (full);
            full <= False; deqEN.wset(?);
        endmethod

        method t first() if (full);
            return (data);
        endmethod

        method Action clear();
            full <= False;
        endmethod
    endmodule

[Figure: FIFO module. deq rdy = !empty; enq rdy = !full or deq enabled.]

This works correctly in both cases (fifo full and fifo empty).
Scheduling: first < enq, deq < enq, enq < clear, deq < clear
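The difference between the two guards can be seen in a small behavioral model in C. This is only a sketch of the method guards, not of the generated hardware; the `Fifo1` type and function names are invented for the example, and the `deq_this_cycle` argument stands in for the RWire deqEN.

```c
#include <assert.h>

typedef struct {
    int full;       /* 1 when the single slot holds an item */
    int data;
    int pipeline;   /* 1: enq is also ready when deq happens in the same cycle */
} Fifo1;

/* Guard of deq and first: the slot must be occupied. */
int can_deq(const Fifo1 *f) { return f->full; }

/* Guard of enq. deq_this_cycle plays the role of the RWire deqEN. */
int can_enq(const Fifo1 *f, int deq_this_cycle) {
    return !f->full || (f->pipeline && deq_this_cycle);
}

/* One cycle of a rule that, like recirculate, reads first, calls deq and
 * calls enq on the same FIFO. The rule is atomic: it fires only if every
 * guard holds. Returns 1 if it fired and stores the old head in *out. */
int deq_then_enq(Fifo1 *f, int x, int *out) {
    assert(out != 0);
    if (!can_deq(f) || !can_enq(f, 1)) return 0;
    *out = f->data;   /* first */
    f->full = 0;      /* deq, scheduled before enq (deq < enq) */
    f->data = x;      /* enq reuses the slot that deq just freed */
    f->full = 1;
    return 1;
}
```

With `pipeline = 0` the combined rule needs `full` and `!full` at once and never fires; with `pipeline = 1` it fires whenever the slot is occupied.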

Problem solved!

    LFIFO fifo <- mkLFIFO; // use a Pipeline fifo

The recirculate rule is used exactly as before.
RWire has been safely encapsulated inside the Pipeline FIFO; users of the Loopy (pipeline) fifo need not be aware of RWires.

Dead cycles

[Figure: inQ → enter? → RAM / fifo → done?]

Consider the same enter and recirculate rules as in the Circular Pipeline Code slide.
Can a new request enter the system when an old one is leaving?
Assume simultaneous enq & deq is allowed.

The Effect of Dead Cycles

[Figure: in → enter → RAM / fifo → done? → yes: exit; no: back to enter]

Circular pipeline:
    RAM takes several cycles to respond to a request
    Each IP request generates 1-3 RAM requests
    FIFO entries hold the base pointer for the next lookup and the unprocessed part of the IP address

What is the performance loss if "exit" and "enter" don't ever happen in the same cycle?
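One way to get a feel for the answer is a toy cycle-level model. The assumptions are mine, not the lecture's: the RAM has a fixed latency of LAT cycles and accepts one request per cycle, LAT requests can be in flight, every packet needs the same number of lookups, and a new packet is always waiting in the input queue.

```c
#include <assert.h>

#define LAT 4   /* assumed RAM latency in cycles = requests in flight */

/* Cycles needed to push `packets` packets, each needing `lookups` (1..3)
 * RAM reads, through the circular pipeline.
 * enter_with_exit != 0: enter may fire in the cycle a packet exits.
 * enter_with_exit == 0: an exiting packet blocks enter (a dead cycle). */
long simulate(long packets, int lookups, int enter_with_exit) {
    int slot[LAT];   /* lookups left after the one in flight; -1 = empty */
    long entered = 0, exited = 0, cycle = 0;
    assert(lookups >= 1 && lookups <= 3);
    for (int i = 0; i < LAT; i++) slot[i] = -1;

    while (exited < packets) {
        int *s = &slot[cycle % LAT];    /* response arriving this cycle */
        int ram_port_free = 1, blocked = 0;
        if (*s == 0) {                  /* leaf: the packet exits */
            exited++;
            *s = -1;
            blocked = !enter_with_exit;
        } else if (*s > 0) {            /* not a leaf: recirculate */
            (*s)--;
            ram_port_free = 0;          /* recirculate uses the RAM port */
        }
        if (ram_port_free && !blocked && entered < packets) {
            *s = lookups - 1;           /* enter issues the level-1 read */
            entered++;
        }
        cycle++;
    }
    return cycle;
}
```

In this model a packet with k lookups occupies the RAM port for about k cycles when enter and exit can overlap, and about k+1 cycles when they cannot. For k = 3 that is roughly a one-third increase in time per packet, and the penalty is worse for packets that resolve in fewer lookups.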

Scheduling conflicting rules

When two rules conflict on a shared resource, they cannot both execute in the same clock cycle.
The compiler produces logic that ensures that, when both rules are applicable, only one will fire.
Which one? Source annotations:

    (* descending_urgency = "recirculate, enter" *)
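The arbitration logic described above can be sketched as a simple priority scheme in C. This is a simplified illustration of the idea, not the compiler's actual algorithm; the `schedule` function and the flattened conflict matrix are assumptions of this example.

```c
#include <assert.h>

/* Rules are indexed in descending urgency (index 0 is most urgent).
 * can_fire[i]        : rule i's guard is true this cycle.
 * conflict[i*n + j]  : nonzero if rules i and j may not fire together.
 * will_fire[i]       : output, 1 if rule i actually fires this cycle. */
void schedule(int n, const int *can_fire, const int *conflict, int *will_fire) {
    assert(n > 0);
    for (int i = 0; i < n; i++) {
        will_fire[i] = can_fire[i];
        /* A rule is suppressed by any more urgent conflicting rule that fires. */
        for (int j = 0; j < i; j++)
            if (will_fire[j] && conflict[i * n + j])
                will_fire[i] = 0;
    }
}
```

With the annotation on the slide, recirculate would be index 0 and enter index 1: when both guards are true only recirculate fires, and enter fires only in cycles where recirculate cannot.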

So is there a dead cycle?

[Figure: inQ → enter? → RAM / fifo → done?]

The rules are still the single enter and recirculate rules shown earlier. Both call ram.req and fifo.enq, so as written the two rules conflict, including in cycles where recirculate takes the isLeaf branch and uses neither method.

Rule Splitting

    rule foo (True);
        if (p) r1 <= 5;
        else r2 <= 7;
    endrule

is equivalent to

    rule fooT (p);
        r1 <= 5;
    endrule

    rule fooF (!p);
        r2 <= 7;
    endrule

Rules fooT and fooF can be scheduled independently with some other rule.

Splitting the recirculate rule

    rule recirculate (!isLeaf(ram.peek()));
        IP rip = fifo.first();
        fifo.enq(rip << 8);
        ram.req(ram.peek() + rip[15:8]);
        fifo.deq();
        ram.deq();
    endrule

    rule exit (isLeaf(ram.peek()));
        outQ.enq(ram.peek());
        fifo.deq();
        ram.deq();
    endrule

    rule enter (True);
        IP ip = inQ.first();
        ram.req(ip[31:16]);
        fifo.enq(ip[15:0]);
        inQ.deq();
    endrule

Now rules enter and exit can be scheduled simultaneously, assuming fifo.enq and fifo.deq can be done simultaneously.

Packaging a module: Turning a rule into a method

[Figure: inQ → enter? → RAM / fifo → done? → outQ]

    rule enter (True);
        IP ip = inQ.first();
        ram.req(ip[31:16]);
        fifo.enq(ip[15:0]);
        inQ.deq();
    endrule