Buffer Management and Arbiter in a Switch


1 Buffer Management and Arbiter in a Switch
Lecture 6: Buffer Management and Arbiter in a Switch

2 Example Assume a central-memory packet switch. All N ingress ports deposit R bits per second into this central memory; all N egress ports remove R bits per second from this central memory; thus, the central memory must support 2NR bits/second. Suppose we have a 500 MHz central memory and width W = 64-bit access:
2NR = 500 MHz * W
N = 500 MHz * 64 bits / 2R = (500*10^6 * 64) / (2 * 2.5*10^9) = 6.4 ≈ 6
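This back-of-the-envelope calculation is easy to script. A minimal sketch (my illustration, not from the lecture; the names are assumptions):

def max_ports(f_mem_hz, width_bits, port_rate_bps):
    # The central memory must absorb and supply 2*N*R bits/s in total,
    # while it can deliver f_mem * W bits/s, so N = f_mem * W / (2R).
    return int(f_mem_hz * width_bits / (2 * port_rate_bps))

# Numbers from the slide: 500 MHz memory, 64-bit width, 2.5 Gb/s ports.
print(max_ports(500e6, 64, 2.5e9))   # -> 6  (6.4 rounded down)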

3 Memory Width Having a maxigram (maximum packet size) memory width is wasteful: with varying packet lengths, the last memory word of a packet will not be totally full.
The W+1 problem: a packet of W+1 bytes costs 2 memory writes and 2 reads, so we pay 2W of bandwidth to move W+1 bytes. To overcome this inefficiency we need to run 2W/(W+1) times faster, a factor of about 2. This reduces N from 6 to 3, since R effectively becomes 2R.
The minigram problem: if we increase the width, the minigram (minimum packet size) remains constant, and we must support enough minigram read/write events per second to serve all ports. Thus, when 2NR/m_gm (the number of minigram packets possible in one second across all ports, each needing one memory operation) exceeds the number of memory operations possible in one second, we need additional speedup.

4 Example Assume that our minigram is 48 bytes and that 2.5 Gb/s is available to carry frames on each port. Our memory limit is 500 MHz.
1. We first attempt to build an N = 40 port switch. Minigrams would arrive at rate 2NR/m_gm = 2 * 40 * (2.5*10^9 / 384) ≈ 2 * 40 * 6.5*10^6 = 520*10^6 operations per second, which exceeds the 500 MHz memory limit. We must reduce to N <= 38; suppose N = 32.
2. Find the width requirement with the speedup factor of 2: 2*(2NR) = 500 MHz * W, so W = 4NR / 500 MHz = 640 bits = 80 bytes.
3. We must check this design for the minigram inefficiency problem. Our minigram is 48 bytes; our W is 80 bytes. Thus our minigram is greater than half of our memory transfer size, so the minigram inefficiency problem is less severe than the W + 1 problem, and our speedup factor of 2.0 covers both issues, given that we reduced the number of ports to N = 32 to meet the minigram frequency limit.
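The three design steps above can be checked with a short script; this is a sketch under the slide's assumptions (48-byte minigram, 2.5 Gb/s ports, 500 MHz memory):

MINIGRAM_BITS = 48 * 8   # minigram size in bits
F_MEM = 500e6            # memory operations per second
R = 2.5e9                # port rate in bits/s

# Step 1: minigram frequency limit, 2*N*R/m_gm <= F_MEM
n_max = int(F_MEM * MINIGRAM_BITS / (2 * R))
print(n_max)                  # -> 38, so we choose N = 32

# Step 2: width with the W+1 speedup factor of 2: 2*(2NR) = F_MEM * W
N = 32
W_bits = 4 * N * R / F_MEM
print(W_bits, W_bits / 8)     # -> 640.0 bits, 80.0 bytes

# Step 3: the minigram (48 B) exceeds half the transfer size (40 B),
# so the factor-2 speedup covers the minigram inefficiency as well.
print(48 > (W_bits / 8) / 2)  # -> True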

5 Central Memory Cell Switches
All cells have one fixed length, so there is no W + 1 problem or minigram problem. If W = the fixed cell length, then the number of ports that can be supported is N = (memory frequency * W) / 2R. However, the cell flow often originates in a segmented packet flow, and the segmentation process will produce a cell stream that needs the same W + 1 speedup: in the worst case the cell stream carries twice the bandwidth of the original packet stream. In practice the bandwidth overhead is lower, since it is unlikely that all packets hit the W + 1 problem at the same time.
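As a quick sanity check (the numbers here are my assumptions: 80-byte cells as on slide 19, with the same 500 MHz memory and 2.5 Gb/s ports):

f_mem, R = 500e6, 2.5e9
W = 80 * 8                             # fixed cell length in bits
print(int(f_mem * W / (2 * R)))        # -> 64 ports, ignoring segmentation
print(int(f_mem * W / (2 * 2 * R)))    # -> 32 ports if segmentation doubles
                                       #    the cell-stream bandwidth (worst case)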

6 No central buffer design
Ideally, we would like to have one central buffer pool for all ports. The problem is that central buffering demands that all the frame buffers and all the switching data paths and logic be co-located. For nontrivial switch sizes, this is not feasible, and we are forced to move something off the central switching chip. The most rewarding and feasible thing to move off the central switch chip(s) is the buffers. That leads us to ingress- or egress-buffered designs.

7 Ingress-Buffered Packet-Cell Switches
Divide the central memory into memory per ingress port:

8 Blocking (Figure: ingress queues holding packets for egress ports B and C.) A packet destined to port B will be dropped while port B is idle. To avoid internal blocking, a speedup of N is needed! That is not practical; therefore, up to 20% speedup is usually provided, and egress buffers are needed. This does not guarantee switching performance; it is only a statistical statement.

9 Non-blocking Ingress-Buffered Frame Switches
No speedup is needed. Each egress can see what frames are queued for it and can implement any policy it chooses to select the next frame.

10 Non-blocking Ingress-Buffered Frame Switches
Additions to the tail of the queue occur at the ingress port, where the queue's buffers are located. Removals of frames from the head of the queue are controlled by the egress port. Thus, the data structures that represent the queue must be accessible from two distinct locations in the switch, and they must permit (and control) concurrent access by the ingress insertion process and the egress removal process. The combination is generally known as "ingress buffered, virtual output queued."

11 Control System We don’t want all memory on one chip
Ingress buffers are located on the port cards
The switch is a simple frame crossbar
The control system determines when the various ingress-egress connections are to be made and broken
The control system needs:
To know the depths of all queues
To communicate with ingress and egress
Solution:
Use cells instead of packets
Use a request-grant protocol to communicate
Use an arbiter on the switch to determine which cells and which connections should proceed at a given time

12 Use of Cells Distributed algorithms are difficult to design
A synchronous distributed system is easier: timing is controlled by a central clock and there is a fixed, known schedule based on that clock. A synchronous control system has all transfers through the crossbar be of fixed size, and has all such transfers start and end on common temporal boundaries. Thus, the emission of a set of fixed-length cells through a cell crossbar becomes the common, coordinating schedule. This solution has the added advantage that practical scheduling algorithms can pack switching work more efficiently into a cell crossbar than into a frame crossbar, with the frame crossbar's less disciplined packing of transfers into available time. The most serious disadvantages of this approach are that:
Frames in the ingress ports must be segmented into cells.
Cell headers must be added to each segment, decreasing the efficiency of the switching operation.
Frames in the egress ports must be reassembled from cells.
Cell streams speed up, due to packet segmentation in the ports.
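The segmentation step named in the first disadvantage can be sketched as follows (a toy illustration of mine, not slide code; the payload and header sizes are assumptions). It also shows how a frame one byte longer than the payload doubles its cell count:

CELL_PAYLOAD = 64   # assumed payload bytes per cell
CELL_HEADER = 16    # assumed header bytes per cell

def segment(frame):
    # Split a frame into fixed-length cells, padding the last one.
    cells = []
    for off in range(0, len(frame), CELL_PAYLOAD):
        chunk = frame[off:off + CELL_PAYLOAD].ljust(CELL_PAYLOAD, b'\x00')
        cells.append(b'H' * CELL_HEADER + chunk)   # placeholder header
    return cells

print(len(segment(b'x' * 64)))   # -> 1 cell
print(len(segment(b'x' * 65)))   # -> 2 cells: the W+1 problem, now in cells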

13 Request-Grant Protocol
The frame queues reside on the port cards; the cell scheduling control system resides in the switch core. The control system maintains a count of cells in each queue; a request-grant protocol maintains these counts on the switch core. When the switch is initialized, all of the queue-depth counts are zeroed. Each time an ingress port creates another cell (out of some received packet) and places it in one of its ingress queues, it sends a request message to the switch core. The request message contains the number of the queue to which the cell was just added. On receiving the request message, the switch core increments a counter associated with that queue.

14 Request-Grant Protocol – cont.
At each cell transfer time, the switch core examines the set of all queue counters and picks a set of ingress-egress connections to schedule for the next cell transfer time. The winning queues have their depth counts decremented to reflect their true depth after the newly scheduled set of cell transfers. The winning ingress ports must be informed of their recent victories and must know which of their multiple queues was the winning candidate. This is accomplished by a grant message from the switch core to the ingress ports. Once an ingress port receives the grant, it immediately forwards the winning cell to the switch core. By this time, the switch core has set the cell crossbar to the correct settings to cause each incoming cell to be forwarded to the appropriate egress port. The egress ports also know from the global cell schedule when to expect the next cell to arrive (if any has been scheduled for them during the current cell transfer cycle).
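The bookkeeping described on these two slides can be sketched like this (a minimal illustration, not the lecture's code; the greedy schedule() is a stand-in for the wavefront arbiter introduced next):

class SwitchCore:
    def __init__(self, n_ports):
        # counts[i][e]: cells queued at ingress i destined for egress e
        self.counts = [[0] * n_ports for _ in range(n_ports)]

    def on_request(self, ingress, egress):
        # Ingress segmented a new cell for this egress and sent a request.
        self.counts[ingress][egress] += 1

    def schedule(self):
        # Pick one ingress-egress matching for the next cell time and
        # emit grants; winners' depth counts are decremented.
        grants, egress_busy = [], set()
        for i, row in enumerate(self.counts):
            for e, depth in enumerate(row):
                if depth > 0 and e not in egress_busy:
                    row[e] -= 1
                    egress_busy.add(e)
                    grants.append((i, e))   # grant message to ingress i
                    break
        return grants

core = SwitchCore(4)
core.on_request(0, 2); core.on_request(1, 2); core.on_request(1, 3)
print(core.schedule())   # -> [(0, 2), (1, 3)]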

15 Permutation Arbitration by Wavefront Computation
We use a permutation pattern, since during each cell transfer period each ingress port can provide one cell and each egress port can accept one cell. We require a control mechanism that can examine the counts representing the depths of the various ingress queues and find a subset of those queues that represents as nearly as possible a complete permutation load for the core cell crossbar. We would like each schedule to be a complete permutation, to maximize use of the cell crossbar, but under lightly loaded conditions we must expect not to be able to fill out the schedule completely.

16 Example N = 4, C = 1. Queue Q2,1,1 has 3 cells.
In a permutation, the sum of each row and of each column is 1.

17 Example – cont. Then we reduce the cell count

18 Process view

19 Permutation computation
Cell size is exactly twice the minigram size; for TCP/IP (40-byte minigram), the cell size including header is 80 bytes. If the switch ports run at 2.5 Gb/s, one cell time is 640 bits / 2.5 Gb/s = 256 ns. The control system must compute a new permutation every cell time. As N and C increase, this calculation becomes infeasible in software. We must use a parallel hardware implementation!

20 Wavefront arbiter Each element of the array is a processing element
Each element contains the queue count for one ingress-egress pair
Computation proceeds from top to bottom and from left to right
On the left edge, initial values are inserted indicating that each ingress port is still "hunting"
On the top edge, initial values are inserted indicating that each egress port is still "available"
Each element in the wavefront array tries to match a hunting ingress, an available egress, and a non-empty queue
When a match is made, the hunting value is not passed to the right and the available value is not passed down

21 Wavefront arbiter <1,1> has hunting and available, but its queue is empty: pass hunting to <1,2> and available to <2,1>.
<1,2> receives hunting from <1,1> and available from the row above. A match is made in <1,2>, and its count is decremented.
<1,3> and <1,4> do not receive hunting.
<2,1> has hunting and available, but an empty queue.
<2,2> does not receive available.
<2,3> has hunting and available, but an empty queue.
<2,4> has hunting and available and a non-empty queue. Match.
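In software the same computation looks like this (a sequential sketch of mine standing in for the parallel hardware array; it reproduces the trace above with 0-based indices):

def wavefront(counts):
    # counts[i][e] = queued cells from ingress i to egress e.
    n = len(counts)
    hunting = [True] * n      # left-edge inputs: ingress still hunting
    available = [True] * n    # top-edge inputs: egress still available
    matches = []
    for i in range(n):            # element (i, e) depends only on signals
        for e in range(n):        # arriving from its left and from above
            if hunting[i] and available[e] and counts[i][e] > 0:
                counts[i][e] -= 1         # decrement the winning queue count
                hunting[i] = False        # a match passes neither signal on
                available[e] = False
                matches.append((i, e))
    return matches

# The slide's trace: only queues <1,2> and <2,4> are non-empty.
counts = [[0, 1, 0, 0],
          [0, 0, 0, 1],
          [0, 0, 0, 0],
          [0, 0, 0, 0]]
print(wavefront(counts))   # -> [(0, 1), (1, 3)], i.e. matches in <1,2> and <2,4>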

22 Request-grant issues Latency is tripled: request, then grant, then cell transfer.
Arbitration bias: ingress 1 always has the first chance to grab any egress, and egress 1 is the first path that will be found by any ingress that is still hunting.

23 Multiple Classes One NxN wavefront array is used for each priority class. The highest-priority class receives the hunting and available signals from all ports. Its matches are made first, and only the ports that are still available and still hunting are passed on to the next lower priority class, and so on. Only a rigid priority scheme is supported: there is no way to balance between priorities. Processing time increases, and circuit area increases.
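A sketch of this cascade (my illustration, not slide code): each class runs the same wavefront pass, but sees only the hunting/available signals left over from the higher classes, which is exactly what makes the scheme a rigid priority:

def multiclass_wavefront(counts_per_class):
    # counts_per_class[c][i][e]: queue depths for class c (0 = highest).
    n = len(counts_per_class[0])
    hunting = [True] * n
    available = [True] * n
    matches = []
    for counts in counts_per_class:       # highest priority class first
        for i in range(n):
            for e in range(n):
                if hunting[i] and available[e] and counts[i][e] > 0:
                    counts[i][e] -= 1
                    hunting[i] = False    # lower classes never see this port
                    available[e] = False
                    matches.append((i, e))
    return matches

high = [[1, 0], [0, 0]]
low  = [[0, 1], [0, 1]]
print(multiclass_wavefront([high, low]))   # -> [(0, 0), (1, 1)]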

24 Multiple classes

25 Exercise 3 1. Consider an output-queued switch with N different physical memories, all links operating at 10 Gb/s. The memory limit is 650 MHz and the minigram is 64 bytes. How many ports can the switch support with reasonable performance?
2. What are the time and space complexity of the wavefront arbiter for finding one permutation in the request-grant protocol in a switch
- with one priority class?
- with C priority classes?

