Designing Packet Buffers for Router Linecards Sundar Iyer, Ramana Kompella, Nick McKeown Reviewed by: Sarang Dharmapurikar.



Background
● Routers need to buffer packets during congestion
  - Rule of thumb: buffer size should be RTT x R
  - With RTT = 0.25 s and R = 40 Gbps, buffer size = 10 Gbits
    o SRAM cannot be used: SRAMs are small and consume too much power
    o Only SDRAM provides the required density
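The rule-of-thumb arithmetic above can be checked with a short Python sketch. The function name is an illustrative assumption, not something taken from the slides.

```python
def buffer_size_bits(rtt_seconds: float, line_rate_bps: float) -> float:
    """Rule-of-thumb buffer size: round-trip time multiplied by line rate."""
    return rtt_seconds * line_rate_bps


# Values from the slide: RTT = 0.25 s, R = 40 Gbps.
size = buffer_size_bits(0.25, 40e9)
print(f"{size / 1e9:.0f} Gbits")  # prints "10 Gbits"
```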

Problems…
● SDRAM is slow, so it offers less memory bandwidth
● Why not use a wide data bus to get more memory bandwidth?

Answer…
[Figure: a 320-byte-wide data bus carrying Packet A, Packet B, and Packet C; the unused portion is labeled "underutilized bandwidth"]
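A rough sketch of why a wide bus wastes bandwidth, assuming each packet occupies a whole number of 320-byte words. The packet sizes and the function name are hypothetical and chosen only for illustration.

```python
import math


def bus_utilization(packet_sizes, word_bytes=320):
    """Fraction of transferred bytes that carry packet data.

    Assumption: every packet is padded up to a whole number of words.
    """
    words = sum(math.ceil(size / word_bytes) for size in packet_sizes)
    return sum(packet_sizes) / (words * word_bytes)


# Hypothetical packets of 40, 100, and 320 bytes on a 320-byte-wide bus.
print(bus_utilization([40, 100, 320]))
```

With these made-up sizes, less than half of the transferred bytes carry packet data.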

Parallel Memory Banks
● However, the packet departure order is not known in advance
● The scheduler might request packets that happen to be stored in the same bank
● That bank is then busy while the others sit idle, degrading the throughput
[Figure: parallel memory banks, 320 bytes wide in total]

Alternative
● Cache packets and write them to SDRAM one word at a time, each word holding data for a single queue
● Likewise, read packets from SDRAM one word at a time for a single queue, hand out the requested data, and cache the rest
[Figure: 320-byte words holding A1, B1, C1, A2, B2, C2]
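The write side of this idea can be sketched as follows. This is a minimal illustration under stated assumptions: the `TailCache` class, its fields, and the in-memory `dram` dictionary are hypothetical stand-ins, not the architecture from the slides.

```python
from collections import defaultdict


class TailCache:
    """Accumulates bytes per queue and emits only full words to (simulated) DRAM."""

    def __init__(self, word_bytes):
        self.word_bytes = word_bytes
        self.pending = defaultdict(bytearray)  # per-queue bytes still held in SRAM
        self.dram = defaultdict(list)          # per-queue list of full words written

    def write(self, queue, data):
        buf = self.pending[queue]
        buf.extend(data)
        # Flush one full word at a time; every word belongs to a single queue.
        while len(buf) >= self.word_bytes:
            self.dram[queue].append(bytes(buf[:self.word_bytes]))
            del buf[:self.word_bytes]


cache = TailCache(word_bytes=320)
cache.write("A", b"x" * 200)
cache.write("B", b"y" * 100)
cache.write("A", b"x" * 200)  # queue A now has 400 bytes: one word is flushed
```

Because only full words reach DRAM, every DRAM access uses the whole bus width, at the cost of holding partial words in SRAM.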

Architecture of the Hybrid SRAM-SDRAM buffer

Head SRAM buffer
[Figure: head FIFOs for queues 1, 2, …, i, …, Q, each of size w, annotated with X(i,t) and D(i,t) and read by the scheduler]
● 'b' bytes arrive in 'b' time slots, all for the same queue
● 'b' bytes leave in 'b' time slots; of these 'b' bytes, any byte can be from any queue
● Objective: put a bound on 'w'

Lower Bound on the Head-SRAM size
● Theorem 1:
  - w > (b-1)(2 + lnQ)
● Example:
  - Let b = 3
  - Q = 9
● Bytes required = (b-1) + (b-1) + under run
[Figure: a FIFO divided into "starting b-1 bytes", "additional b-1 bytes", and "under run"]
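For the example values on this slide, the bound of Theorem 1 can be evaluated numerically. The helper name below is an assumption made for illustration.

```python
import math


def head_sram_lower_bound(b, Q):
    """Per-FIFO lower bound from Theorem 1: (b - 1) * (2 + ln Q)."""
    return (b - 1) * (2 + math.log(Q))


# Slide example: b = 3, Q = 9.
print(round(head_sram_lower_bound(3, 9), 2))  # roughly 8.39 bytes
```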

Lower bound on the Head-SRAM size
● Proof of Theorem 1:
● First iteration: read one byte from each FIFO
  - Q/b FIFOs will be replenished with b bytes each
  - Q(1-1/b) FIFOs will have a deficit of D(i,Q) = 1
● Second iteration: read one byte from each of the Q(1-1/b) FIFOs having D(i,Q) = 1
  - Q(1-1/b)/b FIFOs will be replenished with b bytes each
  - Q(1-1/b)^2 FIFOs will have a deficit of D(i,Q) = 2
● xth iteration:
  - Q(1-1/b)^x FIFOs will have a deficit of D(i,Q) = x
● Solve for Q(1-1/b)^x = 1
  - x = lnQ / ln(b/(b-1)) > (b-1)lnQ, since ln(b/(b-1)) < 1/(b-1)
● Hence, the buffer requirement is: w > (b-1)(2 + lnQ)
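The counting argument in the proof can be replayed numerically. The sketch below shrinks the number of still-deficient FIFOs by a factor (1 - 1/b) per iteration and counts how many iterations it takes to drop to one FIFO or fewer; the function name and the use of fractional FIFO counts are simplifying assumptions.

```python
import math


def iterations_until_one_fifo(Q, b):
    """Count iterations x until Q * (1 - 1/b)**x is at most 1."""
    remaining = float(Q)
    x = 0
    while remaining > 1:
        remaining *= 1 - 1 / b  # a 1/b fraction of these FIFOs gets replenished
        x += 1
    return x


Q, b = 9, 3
x = iterations_until_one_fifo(Q, b)
print(x, (b - 1) * math.log(Q))  # x exceeds (b - 1) * ln Q
```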

A Memory Management Algorithm
● Objective: give an algorithm that comes close to this bound
● Most Deficit Queue First (MDQF)
  - Service (replenish) the SRAM FIFO with the largest deficit first

Some Terminology…
[Figure: Q queues labeled π1, π2, π3, π4, …, πQ]

MDQF analysis
● Lemma 1: F(1) < b[2 + lnQ]
[Figure: timeline from t-b to t relating F(2, t-b) to F(1), with queues i and j and a replenishment of +b]

MDQF Analysis
[Figure: timeline from t-b to t relating F(3, t-b) to F(2), with queues m, n, and p and a replenishment of +b]

MDQF Analysis
● Theorem 2: For MDQF to guarantee that a requested byte is in SRAM, it is sufficient to hold b(3 + lnQ) bytes in each Head-FIFO
[Figure: timeline from t-b to t relating F(i+1, t-b) to F(i), with a replenishment of +b]
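To get a feel for the size implied by Theorem 2, the sketch below evaluates b(3 + lnQ) per FIFO and multiplies by Q for the total. The helper name is hypothetical, and Q = 1000, b = 10 are used only as example values.

```python
import math


def mdqf_head_fifo_bytes(b, Q):
    """Sufficient per-FIFO size under MDQF (Theorem 2): b * (3 + ln Q)."""
    return b * (3 + math.log(Q))


Q, b = 1000, 10
per_fifo = mdqf_head_fifo_bytes(b, Q)
print(f"per FIFO: {per_fifo:.1f} bytes, total: {Q * per_fifo:.0f} bytes")
```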

MMA that tolerates bounded pipeline delay
● Pre-compute some of the memory requests to find out which queue will under-run
● Critical queue: a queue with more requests in the look-ahead buffer than bytes available to serve them
● Earliest critical queue: the queue that turns critical the earliest
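The definitions above can be sketched as a scan over the look-ahead buffer. This is an illustrative sketch: the function name, the list-of-queue-ids representation of the look-ahead buffer, and the dictionary of SRAM byte counts are all assumptions.

```python
def earliest_critical_queue(lookahead, bytes_in_sram):
    """Return the first queue whose pending requests exceed its SRAM bytes.

    lookahead: queue ids in the order their bytes will be requested.
    bytes_in_sram: mapping from queue id to bytes currently in its head FIFO.
    Returns None when no queue in the look-ahead window is critical.
    """
    remaining = dict(bytes_in_sram)
    for queue in lookahead:
        remaining[queue] = remaining.get(queue, 0) - 1
        if remaining[queue] < 0:
            return queue  # this request is the first one that would under-run
    return None


# Queue 0 holds one byte but is requested twice, so it turns critical first.
print(earliest_critical_queue([0, 1, 0, 1, 1, 2], {0: 1, 1: 2, 2: 0}))  # 0
```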

Most Deficit Queue First with Pipeline Delay (MDQFP)
● Algorithm:
  - Replenish the earliest critical queue first
  - If there are no critical queues, replenish the queue most likely to become critical in the future
● Lemma 3:
● Theorem 3: w = F_x(1) + b
● Corollary 1: as x → Qb, w → 2b

Tradeoff between SRAM size and pipeline delay
[Plot: total SRAM, Q·F_x(1), versus pipeline delay x, for Q = 1000, b = 10]

Dynamic SRAM allocation
● So far, all queues had the same, statically allocated length
● SRAM can be allocated dynamically to the queues depending on their requirements
  - This gives a further reduction in SRAM size
● The amount of SRAM can be reduced to Q(b-1) for a look-ahead buffer of Q(b-1) + 1
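The figures in the last bullet are easy to evaluate for concrete values. The sketch below uses a hypothetical helper name and Q = 1000, b = 10 as example values.

```python
def dynamic_allocation_sizes(Q, b):
    """Return (SRAM bytes, look-ahead buffer length) under dynamic allocation."""
    sram_bytes = Q * (b - 1)
    lookahead = Q * (b - 1) + 1
    return sram_bytes, lookahead


print(dynamic_allocation_sizes(1000, 10))  # (9000, 9001)
```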

Conclusions
● High-capacity, high-throughput packet buffers are needed in any router line card
● Packet buffers made only of SRAM are impractical, so SDRAMs are used
● SDRAM buffer memory combined with an SRAM cache can give the required throughput
● Without any pipeline delay, the SRAM requirement scales as QblnQ
● With a tolerable delay of Qb time slots, the requirement scales as Qb