EE384x: Packet Switch Architectures I (Winter 2006). Parallel Packet Buffers. Nick McKeown, Professor of Electrical Engineering and Computer Science


Slide 1: EE384x: Packet Switch Architectures I (Winter 2006)
Parallel Packet Buffers
Nick McKeown, Professor of Electrical Engineering and Computer Science, Stanford University

Slide 2: The Problem
• All packet switches (e.g. Internet routers, ATM switches) require packet buffers for periods of congestion.
• Size: For TCP to work well, the buffers need to hold one RTT (about 0.25s) of data.
• Speed: Clearly, the buffer needs to store (retrieve) packets as fast as they arrive (depart).
[Figure: buffer memories operating at line rate R; ports 1 to N]

Slide 3: An Example
Packet buffers for a 40Gb/s router linecard.
[Figure: a buffer manager in front of a 10Gbit buffer memory. Write rate R: one 40B packet every 8ns. Read rate R: one 40B packet every 8ns. Reads are driven by unpredictable scheduler requests.]
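The figures on this slide follow from the line rate and the one-RTT rule on the previous slide. A minimal sketch of that arithmetic, with illustrative constant and function names that are not part of the original deck:

```python
# Back-of-the-envelope check of the slide's numbers (names are illustrative).
LINE_RATE_BPS = 40e9      # 40 Gb/s linecard, from the slide
RTT_S = 0.25              # one round-trip time, from the previous slide
MIN_PACKET_BYTES = 40     # minimum-size packet, from the slide

def buffer_size_bits(line_rate_bps, rtt_s):
    """Buffer needed to hold one RTT of data arriving at the line rate."""
    return line_rate_bps * rtt_s

def packet_time_ns(packet_bytes, line_rate_bps):
    """Time to receive one packet of the given size at the line rate."""
    return packet_bytes * 8 / line_rate_bps * 1e9

buffer_bits = buffer_size_bits(LINE_RATE_BPS, RTT_S)
slot_ns = packet_time_ns(MIN_PACKET_BYTES, LINE_RATE_BPS)
print(f"buffer: {buffer_bits / 1e9:.0f} Gbits, one 40B packet every {slot_ns:.0f} ns")
```

So 0.25s of data at 40Gb/s gives the 10Gbit buffer, and a 40B packet occupies the line for 8ns.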

Slide 4: Memory Technology
• Use SRAM?
  + Fast enough random access time, but
  - Too low density to store 10Gbits of data.
• Use DRAM?
  + High density means we can store data, but
  - Can’t meet random access time.

Slide 5: Can’t we just use lots of DRAMs in parallel?
[Figure: a buffer manager stripes data across eight buffer memories, reading or writing 320B every 32ns; the memories hold bytes 0-39, 40-79, … of each block. Write rate R: one 40B packet every 8ns. Read rate R: one 40B packet every 8ns.]
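The 320B block size can be cross-checked against the line rate. The sketch below assumes the 32ns window must carry the aggregate write-plus-read traffic, and that each memory holds a 40B slice of the block (read off the byte ranges in the figure). The names are illustrative.

```python
LINE_RATE_BPS = 40e9     # line rate R, from the slide
ACCESS_WINDOW_NS = 32    # one wide memory access every 32 ns, from the slide
SLICE_BYTES = 40         # bytes held per memory per block, read off the figure

def block_bytes(line_rate_bps, window_ns):
    """Bytes that must move per access window when writes and reads both run at R."""
    aggregate_bps = 2 * line_rate_bps          # write rate R plus read rate R
    return round(aggregate_bps * window_ns * 1e-9 / 8)

block = block_bytes(LINE_RATE_BPS, ACCESS_WINDOW_NS)
memories = block // SLICE_BYTES
print(f"{block}B per {ACCESS_WINDOW_NS}ns window, striped over {memories} memories")
```

Under these assumptions the aggregate traffic is 320B per 32ns, which matches eight memories that are each 40B wide.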

Slide 6: Works fine if there is only one FIFO
[Figure: 40B packets arrive at write rate R (one 40B packet every 8ns), the buffer manager groups them into 320B blocks that are written to and read from the buffer memory (bytes 0-39, 40-79, …), and 40B packets depart at read rate R (one 40B packet every 8ns).]

Slide 7: Works fine if there is only one FIFO
Variable Length Packets
[Figure: the same single-FIFO arrangement, with packets of unknown length (?B) arriving and departing; the buffer manager still reads and writes the buffer memory in 320B blocks. Write rate R and read rate R: one 40B packet every 8ns.]

Slide 8: In practice, buffer holds many FIFOs
[Figure: queues 1, 2, …, Q held in the buffer memory (bytes 0-39, 40-79, …); variable-length (?B) packets pass through the buffer manager in 320B blocks. Write rate R and read rate R: one 40B packet every 8ns.]
e.g.
• In an IP Router, Q might be 200.
• In an ATM switch, Q might be [value missing].
How can we write multiple variable-length packets into different queues?

Slide 9: Problems
1. A 320B block will contain packets for different queues, which can’t be written to, or read from, the same location.
2. If instead a different address is used for each memory, and packets in the 320B block are written to different locations, how do we know the memory will be available for reading when we need to retrieve the packet?

Slide 10: Hybrid Memory Hierarchy
[Figure: arriving packets enter at rate R into a small tail SRAM (cache for FIFO tails, queues 1, 2, …, Q). The tail SRAM writes b bytes at a time into a large DRAM memory that holds the body of the FIFOs. The DRAM is read b bytes at a time into a small head SRAM (cache for FIFO heads, queues 1, 2, …, Q). Departing packets leave at rate R in response to unpredictable scheduler requests.]

Slide 11: Some Thoughts
1. What is the minimum SRAM needed to guarantee that a byte is always available in SRAM when requested?
2. What algorithm should we use to manage the replenishment of the SRAM “cache” memory?

Slide 12: An Example
Q = 5, w = 9+, b = 6
[Figure: bytes held in each queue at t = 0 through t = 7, with replenishment events marked]

Slide 13: An Example
Q = 5, w = 9+, b = 6
[Figure: bytes held in each queue at t = 8 through t = 13, then t = 19 and t = 23, with replenishment and read events marked]

Slide 14: The size of the SRAM cache
• Necessity:
  • How large does the SRAM cache need to be under any MMA (memory management algorithm)?
  • Theorem: wQ > Q(b - 1)(2 + ln Q)
• Sufficiency:
  • For a specific MMA, and for any pattern of arrivals, what is the smallest SRAM cache needed so that a byte is always available when requested?
  • For one particular algorithm: wQ = Qb(2 + ln Q)
[Figure: Q queues in SRAM, each w bytes deep]
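The two expressions on this slide are easy to evaluate side by side. A minimal sketch follows; the function names and the values Q = 128, b = 640 are chosen here for illustration, and the logarithm is taken as the natural log, as the slide's "ln" indicates.

```python
import math

def necessary_sram_bytes(num_queues, b):
    """Lower bound from the slide: Q(b - 1)(2 + ln Q)."""
    return num_queues * (b - 1) * (2 + math.log(num_queues))

def sufficient_sram_bytes(num_queues, b):
    """Size quoted for one particular algorithm: Qb(2 + ln Q)."""
    return num_queues * b * (2 + math.log(num_queues))

Q, b = 128, 640   # illustrative values, chosen here for the example
lower = necessary_sram_bytes(Q, b)
upper = sufficient_sram_bytes(Q, b)
print(f"necessary: more than {lower / 1e3:.0f} kB, sufficient: {upper / 1e3:.0f} kB")
```

The two bounds differ by only Q(2 + ln Q) bytes, so for large b the sufficient size is close to the minimum any algorithm could need.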

Slide 15: Some Definitions
• Occupancy: X(q,t)
  • The number of bytes in FIFO q (in SRAM) at time t.
• Deficit: D(q,t) = w - X(q,t)
[Figure: Q queues of depth w, with the occupancy and deficit of one queue marked]

Slide 16: Smallest SRAM cache (Necessity)

Slide 17: Smallest SRAM cache (Necessity)
• In addition, each queue needs to hold (b – 1) bytes in case it is replenished with b bytes when only 1 byte has been removed.
• Therefore, SRAM size must be at least: Qw > Q(b – 1)(2 + ln Q).

Slide 18: Most Deficit Queue First MMA (Sufficiency)
• Algorithm: Every b timeslots, MDQF-MMA replenishes the queue with the largest deficit.
• Theorem: With MDQF-MMA, an SRAM cache of size Qw > Qb(2 + ln Q) is sufficient.
Examples:
1. 40Gb/s linecard, b = 640, Q = 128: SRAM = 560kBytes
2. 160Gb/s linecard, b = 2560, Q = 512: SRAM = 10MBytes
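The replenishment rule can be sketched as a small replay loop. This is a simplified model, not the authors' implementation: it assumes one byte is requested per timeslot, ignores DRAM latency, and tracks deficit as bytes requested minus bytes replenished, so a deficit may dip to -(b - 1) right after a refill. The function name and the random request pattern are illustrative.

```python
import math
import random

def mdqf_max_deficit(requests, num_queues, b):
    """Replay byte requests under MDQF and return the largest deficit seen."""
    deficit = [0] * num_queues
    worst = 0
    for t, q in enumerate(requests, start=1):
        deficit[q] += 1                      # one byte leaves queue q's SRAM
        worst = max(worst, deficit[q])
        if t % b == 0:                       # every b timeslots...
            neediest = max(range(num_queues), key=lambda i: deficit[i])
            deficit[neediest] -= b           # ...fetch b bytes for the most-deficit queue
    return worst

rng = random.Random(0)                       # fixed seed so the run is repeatable
Q, b = 8, 4                                  # small illustrative values
reqs = [rng.randrange(Q) for _ in range(10_000)]
observed = mdqf_max_deficit(reqs, Q, b)
bound = b * (2 + math.log(Q))                # per-queue form of Qb(2 + ln Q)
print(f"largest deficit observed: {observed}, per-queue bound: {bound:.1f}")
```

A random request pattern is not the worst case, so the observed deficit only illustrates the rule; the theorem on the slide covers any request pattern.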

Slide 19: Reducing the size of the SRAM
Intuition:
• If we use a lookahead buffer to peek at the requests “in advance”, we can replenish the SRAM cache only when needed.
• This increases the latency from when a request is made until the byte is available.
• But because it is a pipeline, the issue rate is the same.

Slide 20: The ECQF-MMA Algorithm
1. Lookahead: Next Q(b – 1) + 1 arbiter requests are known.
2. Compute: Determine which queue will run into “trouble” soonest.
3. Replenish: Fetch b bytes for the “troubled” queue.
[Figure: Q queues, each holding up to b - 1 bytes, alongside a lookahead buffer of Q(b - 1) + 1 requests; in the example the green queue is the one in trouble.]
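The compute step can be sketched as a single scan of the lookahead buffer. This assumes "trouble" means the point at which a queue's pending requests first exceed the bytes it holds in the head SRAM; the function name and data layout are illustrative, not taken from the deck.

```python
def earliest_critical_queue(occupancy, lookahead):
    """Return the first queue whose pending requests exceed its SRAM occupancy.

    occupancy[q] is the number of bytes queue q holds in the head SRAM;
    lookahead is the list of queue indices the arbiter will request, in order.
    Returns None if no queue runs dry within the lookahead window.
    """
    remaining = list(occupancy)
    for q in lookahead:
        remaining[q] -= 1
        if remaining[q] < 0:      # this request would find queue q empty
            return q
    return None

# Q = 4 queues, b = 4, every queue holding b - 1 = 3 bytes,
# and a lookahead of Q(b - 1) + 1 = 13 round-robin requests.
window = [i % 4 for i in range(13)]
print(earliest_critical_queue([3, 3, 3, 3], window))
```

If every queue holds at most b - 1 bytes, a window of Q(b - 1) + 1 requests must overrun at least one queue by the pigeonhole principle, so a full lookahead buffer always exposes a critical queue.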

Slide 21: Example of ECQF-MMA: Q=4, b=4
[Figure: queue contents and the requests in the lookahead buffer at t = 0 through t = 8. The green queue is critical at t = 0, the blue queue at t = 4, and the red queue at t = 8.]

Slide 22: Theorem
Patient Arbiter: An SRAM cache of size Q(b – 1) bytes is sufficient to guarantee that a requested byte is available within Q(b – 1) + 1 request times. The algorithm is called ECQF-MMA (Earliest Critical Queue First).
Example: 160Gb/s linecard, b = 2560, Q = 512: SRAM = 1.3MBytes, delay bound is 65 µs (equivalent to 13 miles of fiber).
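The example figures can be reproduced from the theorem. The sketch below assumes that one request time equals the time to send one byte at the line rate, which is an interpretation rather than something the slide states. The function names are illustrative.

```python
def ecqf_sram_bytes(num_queues, b):
    """SRAM size from the theorem: Q(b - 1) bytes."""
    return num_queues * (b - 1)

def ecqf_delay_us(num_queues, b, line_rate_bps):
    """Delay bound of Q(b - 1) + 1 request times, in microseconds."""
    request_times = num_queues * (b - 1) + 1
    byte_time_s = 8 / line_rate_bps   # assumption: one request time = one byte time
    return request_times * byte_time_s * 1e6

sram = ecqf_sram_bytes(512, 2560)
delay = ecqf_delay_us(512, 2560, 160e9)
print(f"SRAM: {sram / 1e6:.1f} MBytes, delay bound: {delay:.1f} us")
```

Under that assumption the numbers agree with the slide: about 1.3MBytes of SRAM and a delay bound of roughly 65 µs.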

Slide 23: Maximum Deficit Queue First with Latency (MDQFL-MMA)
• What if the application can only tolerate a latency l_max < Q(b – 1) + 1 timeslots?
• Algorithm: Maximum Deficit Queue First with Latency (MDQFL-MMA) services a queue once every b timeslots, in the following order:
  1. If there is an earliest critical queue, replenish it.
  2. If not, then replenish the queue that will have the most deficit l_max timeslots in the future.
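The two-step rule can be sketched as follows. This is one reading of the slide, stated as an assumption: the lookahead holds only the next l_max requests, and a queue's deficit l_max timeslots in the future is its current deficit w - X(q) plus the requests for it in that window. The function name and arguments are hypothetical.

```python
def mdqfl_choose(occupancy, w, lookahead, l_max):
    """Pick the queue to replenish under the MDQFL rule sketched on the slide."""
    window = lookahead[:l_max]

    # Step 1: look for the earliest critical queue inside the window.
    remaining = list(occupancy)
    for q in window:
        remaining[q] -= 1
        if remaining[q] < 0:
            return q

    # Step 2: otherwise pick the queue with the largest projected deficit.
    projected = [w - x for x in occupancy]   # current deficit D(q) = w - X(q)
    for q in window:
        projected[q] += 1                    # each pending request adds one byte of deficit
    return max(range(len(occupancy)), key=lambda q: projected[q])

print(mdqfl_choose([0, 5], 6, [1, 0], 2))    # queue 0 runs dry first
print(mdqfl_choose([3, 1], 4, [0], 1))       # no critical queue, so the largest deficit wins
```

With l_max = 0 the window is empty and the rule reduces to picking the queue with the largest current deficit, as in MDQF; a longer window makes the critical-queue test in step 1 decide more often.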

Slide 24: Queue length vs. Pipeline depth
Q = 1000, b = 10
[Figure: SRAM size plotted against pipeline latency x, falling from the queue length for zero latency to the queue length for maximum latency]