Published by Amie Rogers. Modified over 8 years ago.
1
Techniques for Fast Packet Buffers
Sundar Iyer, Ramana Rao, Nick McKeown
(sundaes, ramana, nickm)@stanford.edu
Departments of Electrical Engineering & Computer Science, Stanford University
2
Problem Statement Redefined
Motivation: to design an extremely high-speed packet buffer architecture with fast access time and large size.
This talk: an analysis of one well-known approach to that problem.
3
Characteristics of Packet Buffer Architectures
The total memory throughput needed is at least 2R, twice the ingress rate R, since every cell is written once and read once.
The size of the buffer is at least R * RTT.
The buffer holds one or more FIFOs.
The sequence in which the FIFOs are accessed is determined by an arbiter and is not known a priori.
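As a rough worked example of the two lower bounds on this slide, the sketch below plugs in numbers. The line rate of 10 Gb/s and the round-trip time of 0.25 s are hypothetical values chosen only for illustration; neither appears in the talk, and the function name is likewise an assumption.

```python
def buffer_requirements(line_rate_bps, rtt_s):
    """Return (minimum memory throughput, minimum buffer size).

    Throughput is 2R because each cell is written once and read once;
    size is R * RTT, as stated on the slide.
    """
    min_throughput_bps = 2 * line_rate_bps
    min_buffer_bits = line_rate_bps * rtt_s
    return min_throughput_bps, min_buffer_bits


# Hypothetical inputs: R = 10 Gb/s, RTT = 0.25 s.
throughput, size = buffer_requirements(10e9, 0.25)
print(f"throughput >= {throughput / 1e9:.0f} Gb/s, size >= {size / 1e9:.1f} Gb")
```

Under these assumed values the buffer must sustain 20 Gb/s and hold 2.5 Gb, which is the scale that motivates combining a large DRAM with small SRAM caches.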
4
Memory Hierarchy of Packet Buffer
[Figure: arriving packets enter an ingress SRAM (cache of FIFO tails), move b cells at a time into a large DRAM memory with access time T’, and are read b cells at a time into an egress SRAM (cache of FIFO heads) before departing. Each SRAM holds queues 1 to Q. An arbiter issues grants, and a Memory Management Algorithm controls the DRAM accesses. Other diagram labels: R, R, Write Access, Read Access, Time = T = 2T’.]
5
System Design Parameters
Main parameters
– SRAM size
– Latency faced by a cell
System parameters
– I/O bandwidth
– Number of addresses
   – Use a single address on every DRAM
   – Use different addresses on every DRAM
– Use or non-use of DRAM burst mode
– Existence or absence of bank conflicts
6
Today’s Talk…
Optimize main parameters
– Minimize latency at the cost of SRAM size
– (Necessity and sufficiency)
– (Later) Minimize SRAM size at the cost of latency
Assumptions on system parameters
No speedup on I/O
– I/O = 2R
Simple address architecture
– Use a single address from every DRAM
7
More Assumptions..
Only cells of a fixed size C arrive in the system
No use of DRAM burst mode
No bank conflicts
8
Symmetry Argument
The analysis and operation of the ingress and egress buffer architectures are similar.
We therefore analyze only the egress buffer architecture.
9
A Bad Case for the Queues … 1
[Figure: snapshots of the queues at t = 0 through t = 7, with the quantity w marked on the diagram.]
10
A Bad Case for the Queues … 2
[Figure: snapshots of the queues at t = 8 through t = 14, continuing to t = 17.]
11
Observation
There exists some value of w for which the buffer does not overflow.
w = qb is one such sufficient value.
The threshold value T_i governs w.
[Figure labels: w, T_i, b - 1, Q.]
12
Definitions
Occupancy
– The number of cells held in the SRAM for a particular queue
Active queue
– A queue whose occupancy is less than the threshold and which has cells present for it in the DRAM
13
One More Definition
Deficit
– For an active queue, the difference between the threshold T_i and the occupancy
– For a queue that is not active, the deficit is zero
[Figure labels: occupancy, b - 1, deficit, T_i.]
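The three definitions (occupancy, active queue, deficit) can be written down directly. The sketch below is only an illustration: the `QueueState` class and its field names are hypothetical and are not taken from the talk.

```python
from dataclasses import dataclass


@dataclass
class QueueState:
    """Hypothetical per-queue bookkeeping (names are assumptions)."""
    sram_cells: int  # occupancy: cells held in the SRAM for this queue
    dram_cells: int  # cells still waiting in the DRAM for this queue
    threshold: int   # threshold T_i for this queue


def is_active(q: QueueState) -> bool:
    """Active: occupancy below the threshold and cells present in DRAM."""
    return q.sram_cells < q.threshold and q.dram_cells > 0


def deficit(q: QueueState) -> int:
    """Threshold minus occupancy for an active queue, zero otherwise."""
    return q.threshold - q.sram_cells if is_active(q) else 0


print(deficit(QueueState(sram_cells=3, dram_cells=10, threshold=8)))  # 5
print(deficit(QueueState(sram_cells=3, dram_cells=0, threshold=8)))   # 0
```

The second call shows why the DRAM condition matters: a queue with nothing left in DRAM cannot be replenished, so it contributes no deficit however empty its SRAM is.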
14
Can We Bound the Maximum Value of the Deficit?
Define f(i, q)
– The maximum deficit that a set of i queues can have in a system of q queues
We are interested in f(1, q)
f(q, q) < qb (trivially)
15
Largest Deficit Queue First
Recurrence equations
$f(2,q) \ge f(1,q) - b + [f(1,q) - b]$
$f(3,q) \ge f(2,q) - b + \frac{f(2,q) - b}{2}$
$f(4,q) \ge f(3,q) - b + \frac{f(3,q) - b}{3}$
$\vdots$
$f(q,q) \ge f(q-1,q) - b + \frac{f(q-1,q) - b}{q-1}$
16
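Dirty Math..
$qb > f(q,q)$ (trivially)
$\ge [f(q-1,q) - b] + \frac{f(q-1,q) - b}{q-1}$
$= f(q-1,q)\frac{q}{q-1} - b\frac{q}{q-1}$
$\ge \left\{ f(q-2,q)\frac{q-1}{q-2} - b\frac{q-1}{q-2} \right\}\frac{q}{q-1} - b\frac{q}{q-1}$
$= f(q-2,q)\frac{q}{q-2} - \frac{bq}{q-2} - \frac{bq}{q-1}$
$\ge f(q-3,q)\frac{q}{q-3} - \frac{bq}{q-3} - \frac{bq}{q-2} - \frac{bq}{q-1}$
$\vdots$
$\ge f(1,q)\frac{q}{1} - bq\sum_{i=1}^{q-1}\frac{1}{i}$
Dividing by q gives $f(1,q) < b\left[1 + \sum_{i=1}^{q-1}\frac{1}{i}\right] \approx b[1 + \ln q]$, where the harmonic sum is approximated by $\ln q$.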
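The telescoping on this slide can be checked numerically. The sketch below treats every step of the recurrence as an equality, unrolls it from f(1,q) up to f(q,q), and compares the result with the closed form in the last step. It assumes the sum of 1/i runs from i = 1 to q - 1, which is an inference from the unrolled terms; all function names are illustrative only.

```python
from fractions import Fraction
import math


def harmonic(n):
    """Exact harmonic number H_n = 1 + 1/2 + ... + 1/n, with H_0 = 0."""
    return sum((Fraction(1, i) for i in range(1, n + 1)), Fraction(0))


def unroll_recurrence(f1, q, b):
    """Apply f(i+1) = (f(i) - b) * (i+1)/i for i = 1 .. q-1."""
    f = Fraction(f1)
    for i in range(1, q):
        f = (f - b) * Fraction(i + 1, i)
    return f


def closed_form(f1, q, b):
    """Last line of the derivation: q*f(1,q) - b*q*H_{q-1}."""
    return q * Fraction(f1) - b * q * harmonic(q - 1)


def deficit_bound_exact(q, b):
    """Bound on f(1,q) before the logarithmic approximation."""
    return b * (1 + harmonic(q - 1))


def deficit_bound_log(q, b):
    """The slide's approximation b * (1 + ln q)."""
    return b * (1 + math.log(q))


q, b = 8, 2
assert unroll_recurrence(10, q, b) == closed_form(10, q, b)
print(float(deficit_bound_exact(q, b)), deficit_bound_log(q, b))
```

The harmonic form comes out slightly larger than b[1 + ln q], by less than about 0.58b, so the logarithmic expression is best read as an approximation rather than a strict bound.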
17
Results
If the memory management algorithm (MMA)
– services the queue with the largest deficit,
– has a simple address architecture, and
– has no I/O speedup,
then a latency of zero can be guaranteed when
– the width of the SRAM (per queue) is $b[1 + \ln q] + b = b[2 + \ln q]$, and
– the size of the SRAM is $qb[2 + \ln q]$.
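The selection rule in the first condition, serving the queue with the largest deficit, might be sketched as follows. The talk does not say how ties are broken or what happens when no queue has a deficit, so both choices below (lowest index wins, `None` when nothing needs service) are assumptions made for this illustration.

```python
def select_queue(deficits):
    """Return the index of the queue with the largest deficit.

    Assumptions (not from the talk): ties go to the lowest index,
    and None is returned when every deficit is zero.
    """
    if not deficits or max(deficits) == 0:
        return None
    # max() returns the first maximal element, so ties favor low indices.
    return max(range(len(deficits)), key=lambda i: deficits[i])


print(select_queue([0, 3, 5, 5]))  # 2
print(select_queue([0, 0, 0]))     # None
```

In a full implementation the chosen queue would then receive b cells from the DRAM; that transfer is omitted here.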
18
Necessity
Traffic pattern for b = 2, q = 8
[Figure: snapshots of the queues at t = 0, t = 8, t = 8 + 8/2, and t = 8 + 8/2 + 8/4.]
19
Necessity Analysis … 1
In the 1st iteration
– $q\frac{b-1}{b}$ queues have deficit 1
In the 2nd iteration
– $q\left(\frac{b-1}{b}\right)^2$ queues have deficit 2
In the xth iteration
– $q\left(\frac{b-1}{b}\right)^x = 1$ queue has deficit x
$x = \log_{b/(b-1)} q = \frac{\ln q}{\ln\left(1 + \frac{1}{b-1}\right)} \approx (b-1)\ln q$, using $\ln(1+u) \approx u$
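The shrinking set of queues can be traced with a short loop. The sketch below multiplies the number of queues by (b - 1)/b per iteration until one queue is left, then compares the iteration count with the closed form and with the approximation that uses ln(1 + u) ≈ u. Function names are illustrative, and b >= 2 is assumed.

```python
import math


def queues_per_iteration(q, b):
    """Queues still carrying the growing deficit after each iteration."""
    counts = []
    remaining = float(q)
    while remaining > 1:
        remaining *= (b - 1) / b
        counts.append(remaining)
    return counts


def iterations_exact(q, b):
    """x = ln q / ln(b / (b - 1)); requires b >= 2."""
    return math.log(q) / math.log(b / (b - 1))


def iterations_approx(q, b):
    """x ~ (b - 1) * ln q, from ln(1 + u) ~ u with u = 1 / (b - 1)."""
    return (b - 1) * math.log(q)


print(queues_per_iteration(8, 2))  # [4.0, 2.0, 1.0]
print(iterations_exact(512, 8), iterations_approx(512, 8))
```

For b = 2 and q = 8 the counts halve from 8 to 4 to 2 to 1, giving x = 3. Because ln(1 + u) is smaller than u, the approximation undercounts the iterations, most visibly for small b.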
20
Necessity Analysis … 2
In the xth iteration
– Another b cells can be deleted
– The deficit is $x + b = (b-1)\ln q + b = b\left[1 + \frac{b-1}{b}\ln q\right] \approx b[1 + \ln q]$
Width of SRAM = $b[2 + \ln q]$
Size of SRAM = $qb[2 + \ln q]$
21
A Dose of Reality
Typical values
– b is typically <= 10
– q = Np, where N = number of ports (for VOQ) and p = number of classes per port
Implementations
– VOQ: N = 32, p = 1, q = 2^5, b = 2^3, SRAM = 700 kb
– Diffserv: N = 32, p = 16, q = 2^9, b = 2^3, SRAM = 17 Mb
– Intserv: let's not think about it!
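The two SRAM figures can be checked against the size formula qb[2 + ln q]. The formula counts cells, so a cell size is needed to convert to bits. The slide does not give one; the sketch below assumes 64-byte cells, a guess that happens to reproduce the quoted figures after rounding.

```python
import math

CELL_BITS = 64 * 8  # assumption: 64-byte cells; not stated in the talk


def sram_cells(q, b):
    """SRAM size in cells: q * b * (2 + ln q)."""
    return q * b * (2 + math.log(q))


def sram_bits(q, b, cell_bits=CELL_BITS):
    """SRAM size in bits for the assumed cell size."""
    return sram_cells(q, b) * cell_bits


voq = sram_bits(q=2**5, b=2**3)       # N = 32, p = 1
diffserv = sram_bits(q=2**9, b=2**3)  # N = 32, p = 16
print(f"VOQ: {voq / 1e3:.0f} kb, Diffserv: {diffserv / 1e6:.1f} Mb")
```

With this cell size the VOQ case comes to roughly 700 kb and the Diffserv case to roughly 17 Mb, in line with the slide.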
22
Future Work
Discussion on trading off latency for SRAM size
Analysis of other parameters
– Relaxing I/O and address constraints
Implementation pain…
Still a long way to go