Published by Emery Young. Modified over 9 years ago.
1
An Introduction to Packet Switching
Nick McKeown
Assistant Professor of Electrical Engineering and Computer Science, Stanford University
nickm@stanford.edu
http://www.stanford.edu/~nickm
2
Sir William Preece, Chief of the British Postal System, 1876: “The Americans may have need of the telephone, but we do not. We have plenty of messenger boys.”
3
Outline
Introduction
–What is a packet-switch?
The Memory Bandwidth Problem
Input-Queued Switches
–Reducing memory bandwidth requirements
Combined Input-Output Queued Switches
–Making input-queued switches useful
Parallel Packet Switches
–Further reducing memory bandwidth requirements
4
Introduction
What is a Packet Switch?
–Basic Architectural Components
–Some Example Packet Switches
–The Evolution of IP Routers
5
Basic Architectural Components
[Figure: block diagram with the components Policing, Output Scheduling, Switching, Routing, Congestion Control, Reservation, and Admission Control.]
Datapath: per-packet processing
6
Basic Architectural Components
Datapath: per-packet processing
[Figure: three ports, each with a Forwarding Decision block and a Forwarding Table, joined by an Interconnect and followed by Output Scheduling.]
1. Forwarding Decision
2. Interconnect
3. Output Scheduling
7
Where high performance packet switches are used
[Figure: network map showing Enterprise WAN access & Enterprise Campus Switch, the Edge Router, and The Internet Core.]
The Internet Core:
- Carrier Class Core Router
- ATM Switch
- Frame Relay Switch
9
ATM Switch
Lookup cell VCI/VPI in VC table.
Replace old VCI/VPI with new.
Forward cell to outgoing interface.
Transmit cell onto link.
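The lookup and label-replacement steps can be sketched as a table-driven label swap. This is a minimal sketch, not the design of any particular switch: the names `VcEntry`, `vc_table`, and `switch_cell` are hypothetical, the table is assumed to be keyed by (input port, VPI, VCI), and dropping a cell with an unknown label is an assumption the slide does not state.

```python
from typing import NamedTuple, Optional, Tuple


class VcEntry(NamedTuple):
    out_port: int
    out_vpi: int
    out_vci: int


# Hypothetical VC table: (input port, VPI, VCI) -> outgoing port and new labels.
vc_table = {
    (0, 1, 100): VcEntry(out_port=2, out_vpi=5, out_vci=42),
}


def switch_cell(in_port: int, vpi: int, vci: int,
                payload: bytes) -> Optional[Tuple[int, int, int, bytes]]:
    """Look up the cell's VPI/VCI, swap the labels, and pick the outgoing port.

    Returns (out_port, new_vpi, new_vci, payload), or None when no virtual
    circuit exists for the incoming label (assumed here to mean "drop").
    """
    entry = vc_table.get((in_port, vpi, vci))
    if entry is None:
        return None
    return entry.out_port, entry.out_vpi, entry.out_vci, payload
```

The payload is untouched; only the header labels change from hop to hop, which is why the per-cell work reduces to one exact-match lookup.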
10
Ethernet Switch
Lookup frame DA in forwarding table.
–If known, forward to correct port.
–If unknown, broadcast to all ports.
Learn SA of incoming frame.
Forward frame to outgoing interface.
Transmit frame onto link.
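The lookup, flood, and learn steps can be sketched as follows. This is an illustrative sketch only: the class name `LearningSwitch` and its method are hypothetical, and excluding the ingress port from the flood (and filtering a frame whose destination sits on its own ingress port) are assumptions beyond what the slide says.

```python
class LearningSwitch:
    """Toy learning bridge: maps MAC addresses to the port they were seen on."""

    def __init__(self, num_ports: int):
        self.num_ports = num_ports
        self.table = {}  # MAC address -> port

    def handle_frame(self, in_port: int, src: str, dst: str) -> list:
        """Return the list of ports the frame should be sent out on."""
        # Lookup frame DA in forwarding table.
        out_port = self.table.get(dst)
        # Learn SA of incoming frame.
        self.table[src] = in_port
        if out_port is None:
            # Unknown destination: flood (assumed to skip the ingress port).
            return [p for p in range(self.num_ports) if p != in_port]
        if out_port == in_port:
            # Destination is on the ingress segment: nothing to forward (assumption).
            return []
        return [out_port]
```

A short usage example: the first frame toward an unseen address is flooded, and once the reply has been seen, traffic in both directions goes to a single port.

```python
sw = LearningSwitch(4)
sw.handle_frame(0, "aa", "bb")   # flooded to ports 1, 2, 3
sw.handle_frame(1, "bb", "aa")   # forwarded to port 0 only
```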
11
IP Router
Lookup packet DA in forwarding table.
–If known, forward to correct port.
–If unknown, drop packet.
Decrement TTL, update header checksum.
Forward packet to outgoing interface.
Transmit packet onto link.
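A sketch of the per-packet steps for an IPv4 header without options. Several details here are assumptions rather than content from the slide: the lookup is done as a longest-prefix match over a small list, a packet whose TTL would reach zero is dropped, and the checksum is recomputed over the whole header rather than updated incrementally. The function names are hypothetical.

```python
import ipaddress
import struct


def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of the header's 16-bit words, complemented."""
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def lookup(table, dst: bytes):
    """Longest-prefix match over (network, port) pairs; None if no match."""
    addr = ipaddress.ip_address(dst)
    best = None
    for net, port in table:
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, port)
    return None if best is None else best[1]


def forward(header: bytes, table):
    """Return (out_port, new_header), or None if the packet is dropped."""
    hdr = bytearray(header)
    port = lookup(table, bytes(hdr[16:20]))   # destination address field
    if port is None:
        return None                           # unknown destination: drop
    if hdr[8] <= 1:
        return None                           # TTL expired (assumption: drop silently)
    hdr[8] -= 1                               # decrement TTL
    hdr[10:12] = b"\x00\x00"                  # zero the checksum field
    hdr[10:12] = struct.pack("!H", ipv4_checksum(bytes(hdr)))
    return port, bytes(hdr)
```

A header with a valid checksum sums to zero under `ipv4_checksum`, which gives a quick way to verify the rewritten header.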
13
First Generation Packet Switches
[Figure: a CPU with Buffer Memory and three Line Interfaces (each with DMA and MAC) attached to a Shared Backplane.]
Fixed length "DMA" blocks or cells. Reassembled on egress linecard.
Fixed length cells or variable length packets.
14
Second Generation Packet Switches
[Figure: a CPU with Buffer Memory and three Line Cards, each with DMA, MAC, and Local Buffer Memory.]
15
Third Generation Packet Switches
[Figure: Line Cards (each with MAC and Local Buffer Memory) and a CPU Card connected by a Switched Backplane. Legend labels: Line Interface, CPU, Memory.]
16
Fourth Generation Packet Switches
18
Two Basic Techniques
Input-queued Crossbar: 1+1 = 2 operations per cell time
Shared Memory: N+N = 2N operations per cell time
19
Shared Memory
The Ideal
A large body of work has proven and made possible:
–Fairness
–Delay Guarantees
–Delay Variation Control
–Loss Guarantees
–Statistical Guarantees
20
A Comparison
Memory speeds for 32x32 switch
Line Rate | Shared-Memory: Memory BW | Shared-Memory: Access Time Per cell | Input-queued: Memory BW | Input-queued: Access Time
100 Mb/s | 6.4 Gb/s | 80 ns | 200 Mb/s | 2.12 µs
1 Gb/s | 64 Gb/s | 8 ns | 2 Gb/s | 212 ns
2.5 Gb/s | 160 Gb/s | 3.2 ns | 5 Gb/s | 84.8 ns
10 Gb/s | 640 Gb/s | 0.8 ns | 20 Gb/s | 21.2 ns
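The figures in this comparison follow from counting memory operations per cell time: 2N for a shared memory and 2 for each input queue. The sketch below is a worked calculation under that reading; the function name and parameters are hypothetical. One inference from the arithmetic, not stated on the slide: the shared-memory access times match a 512-bit (64-byte) cell, while the input-queued access times match a 424-bit (53-byte) cell.

```python
def memory_requirements(line_rate_bps: float, n_ports: int,
                        cell_bits: int, shared: bool):
    """Return (memory bandwidth in b/s, access time per cell in seconds).

    shared=True  -> one memory absorbs N writes + N reads per cell time.
    shared=False -> each input memory sees 1 write + 1 read per cell time.
    """
    operations = 2 * n_ports if shared else 2
    bandwidth = operations * line_rate_bps
    access_time = cell_bits / bandwidth
    return bandwidth, access_time


# 32x32 switch at 10 Gb/s per line.
shared_bw, shared_t = memory_requirements(10e9, 32, 512, shared=True)
iq_bw, iq_t = memory_requirements(10e9, 32, 424, shared=False)
print(f"shared memory: {shared_bw / 1e9:.0f} Gb/s, {shared_t * 1e9:.1f} ns per cell")
print(f"input queued:  {iq_bw / 1e9:.0f} Gb/s, {iq_t * 1e9:.1f} ns per cell")
```

The point of the comparison is the factor of N between the two columns: the input-queued memory requirement does not grow with the number of ports.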
21
Buffer Memory
How Fast Can I Make a Packet Buffer?
[Figure: Buffer Memory built from 5ns SRAM on a 64-byte wide bus.]
Rough Estimate:
–5ns per memory operation.
–Two memory operations per packet.
–Therefore, maximum 51.2Gb/s.
–In practice, closer to 40Gb/s.
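The rough estimate can be reproduced directly from the three numbers on the slide: a 64-byte bus moves 512 bits per 5 ns operation, and each packet costs one write plus one read. The function name below is hypothetical; the gap between the computed ceiling and the "closer to 40Gb/s" practical figure is not modelled.

```python
def max_buffer_rate_bps(bus_bytes: int, access_ns: float,
                        ops_per_packet: int = 2) -> float:
    """Upper bound on sustained packet-buffer throughput in bits per second."""
    bits_per_operation = bus_bytes * 8
    raw_rate = bits_per_operation / (access_ns * 1e-9)  # one operation per access time
    return raw_rate / ops_per_packet                    # write + read for every packet


rate = max_buffer_rate_bps(bus_bytes=64, access_ns=5)
print(f"{rate / 1e9:.1f} Gb/s")
```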
22
Buffer Memory
Is It Going to Get Better?
[Figure: two plots against time. One shows Specmarks, Memory size, and Gate density; the other shows Memory Bandwidth (to core).]
23
Progression
Shared Memory
Input Queued
Combined Input and Output Queued
Parallel Packet Switches
[Figure: Multi stage network, a Batcher Sorter followed by a Self-Routing Network, with outputs labeled 000 to 111.]
25
Input Queueing configuration
[Figure: input queues between Data In and Data Out, controlled by a Scheduler.]
Memory b/w = 2R
26
Input Queueing
Head of Line Blocking
[Figure: Delay versus Load. With head of line blocking the delay curve rises at a load of 58.6%, short of 100%.]
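The 58.6% figure is the classic saturation throughput of FIFO input queues under uniform random traffic, which approaches 2 - sqrt(2) ≈ 0.586 as the number of ports grows. A small simulation makes the effect visible. This is a sketch under stated assumptions: every input always has a cell waiting, destinations are uniform and independent, and each output picks one contending head-of-line cell at random. The function name is hypothetical.

```python
import random


def hol_saturation_throughput(n_ports: int, slots: int, seed: int = 0) -> float:
    """Fraction of output capacity used by a saturated FIFO input-queued switch."""
    rng = random.Random(seed)
    # Destination of the head-of-line cell at each input.
    hol = [rng.randrange(n_ports) for _ in range(n_ports)]
    served = 0
    for _ in range(slots):
        # Group inputs by the output their head-of-line cell wants.
        contenders = {}
        for inp, out in enumerate(hol):
            contenders.setdefault(out, []).append(inp)
        # Each requested output serves exactly one contender; the losers stay
        # blocked and keep blocking every cell queued behind them.
        for inputs in contenders.values():
            winner = rng.choice(inputs)
            hol[winner] = rng.randrange(n_ports)  # next cell moves to the head
            served += 1
    return served / (n_ports * slots)


print(hol_saturation_throughput(n_ports=32, slots=5000))
```

For small switches the value is higher (a 2x2 switch saturates at 75%); it falls toward the limit as the port count increases.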
27
Head of Line Blocking
30
Input Queueing Virtual output queues
31
Input Queues
Virtual Output Queues
[Figure: Delay versus Load. With virtual output queues the delay curve extends to a load of 100%.]
Proof by Lyapunov function
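Virtual output queues remove head of line blocking by keeping, at each input, one FIFO per output, so a cell waiting for a busy output no longer hides cells bound for idle outputs. The sketch below shows the data structure with a simple greedy matching pass. The greedy scheduler is an illustrative assumption only: it is not the algorithm behind the 100% throughput result, which this section does not name. The function names are hypothetical.

```python
from collections import deque


def make_voqs(n_ports: int):
    """voq[i][j] holds cells that arrived at input i destined for output j."""
    return [[deque() for _ in range(n_ports)] for _ in range(n_ports)]


def schedule_slot(voq):
    """Greedy maximal matching: at most one cell per input and per output.

    Returns the list of (input, output, cell) transfers for one cell time.
    """
    n = len(voq)
    busy_outputs = set()
    transfers = []
    for i in range(n):
        for j in range(n):
            if j not in busy_outputs and voq[i][j]:
                transfers.append((i, j, voq[i][j].popleft()))
                busy_outputs.add(j)
                break  # input i is matched for this cell time
    return transfers


voq = make_voqs(2)
voq[0][0].append("a")            # input 0 wants output 0
voq[1][0].append("c")            # input 1 also wants output 0 ...
voq[1][1].append("d")            # ... but has a cell for output 1 as well
print(schedule_slot(voq))        # input 1 sends "d" instead of waiting behind "c"
```

With a single FIFO per input, cell "d" would have been stuck behind "c" for this cell time.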
33
The Speedup Problem
Find a compromise: 1 < Speedup << N
- to get the performance of a shared memory switch
- close to the cost of an IQ switch
34
Some Early Approaches
Probabilistic Analyses
- assume traffic models (Bernoulli, Markov-modulated, non-uniform loading, "friendly correlated")
- obtain mean throughput and delays, bounds on tails
- analyze different fabrics (crossbar, multistage, etc)
Numerical Methods
- use actual and simulated traffic traces
- run different algorithms
- set the "speedup dial" at various values
35
The findings
Very tantalizing...
- under different settings (traffic, loading, algorithm, etc)
- and even for varying switch sizes
A speedup of between 2 and 5 was sufficient!
36
Using Speedup
[Figure: switch example with cells labeled 1 and 2.]
37
The Ideal Solution
[Figure: Output Queued Switch (ports 1 to N) = ? Combined Input-Output Queued Switch (ports 1 to N)]
38
Interesting Result
Theorem: For a switch with combined input and output queueing to exactly mimic an output queued switch, for all types of traffic, a speedup of 2-1/N is necessary and sufficient.
Joint work with Balaji Prabhakar, Ashish Goel and Shang-tse Chuang.
40
Optical Physical Layers…
…are Going to Make Things "Worse"
DWDM:
–More λ's per fiber ⇒ more ports per switch.
–# ports: 16, …, 1000's.
Data rate:
–More b/s per λ ⇒ higher capacity.
–Data rates: 2.5Gb/s, 10Gb/s, 40Gb/s, 160Gb/s, …
41
Approach #1: Ping-pong Buffering
[Figure: two Buffer Memories, each on its own 64-byte wide bus.]
42
Approach #1: Ping-pong Buffering
[Figure: two Buffer Memories, each on its own 64-byte wide bus.]
Memory bandwidth doubled to ~80 Gb/s
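The slide shows only the diagram and the doubled bandwidth figure. One common reading of ping-pong buffering, assumed here, is that a read and a write in the same memory cycle always go to different memories, so each memory sees a single operation per cycle while the pair delivers two. The class and method names below are hypothetical.

```python
class PingPongBuffer:
    """Two memories; in any cycle one may be read while the other is written."""

    def __init__(self):
        self.banks = [{}, {}]      # bank index -> {address: cell}
        self.next_addr = [0, 0]

    def cycle(self, write_cell=None, read_handle=None):
        """Perform up to one read and one write in the same memory cycle.

        Returns (cell read or None, handle of the cell written or None).
        A handle is a (bank, address) pair.
        """
        read_cell, read_bank = None, None
        if read_handle is not None:
            read_bank, addr = read_handle
            read_cell = self.banks[read_bank].pop(addr)
        write_handle = None
        if write_cell is not None:
            if read_bank is not None:
                bank = 1 - read_bank   # never write the bank being read
            else:
                bank = 0 if len(self.banks[0]) <= len(self.banks[1]) else 1
            addr = self.next_addr[bank]
            self.next_addr[bank] += 1
            self.banks[bank][addr] = write_cell
            write_handle = (bank, addr)
        return read_cell, write_handle
```

The cost of this arrangement is that the writer cannot choose where a cell lands, so the two memories can fill unevenly; the sketch does not address that.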
43
Approach #2: Multiple Parallel Buffers
aka Banking, Interleaving
[Figure: four Buffer Memories in parallel.]
44
The Fork Join Router
[Figure: lines 1 to N at rate R on each side, k parallel Routers (1, 2, …, k), and Bufferless fork and join stages.]
45
The Fork Join Router
Advantages
–k memory bandwidth
–k lookup/classification rate
–k routing/classification table size
Problems
–How to demultiplex prior to lookup/classification?
–How does the system perform/behave?
–Can we predict/guarantee performance?
46
A Parallel Packet Switch
[Figure: lines 1 to N at rate R on each side, connected through k parallel Output Queued Switches (1, 2, …, k).]
47
Parallel Packet Switch
Questions
1. Can it be work-conserving?
2. Can it emulate a single big shared memory switch?
3. Can it support delay guarantees, strict-priorities, WFQ, …?
48
Parallel Packet Switch
Work Conservation
[Figure: a line of rate R connected to layers 1, 2, …, k over links of rate R/k. Labels: Input Link Constraint, Output Link Constraint.]
49
Parallel Packet Switch
Work Conservation
[Figure: the same diagram (line of rate R, layers 1, 2, …, k, links of rate R/k) with numbered cells 1 to 5 illustrating the Output Link Constraint.]
50
Parallel Packet Switch
Work Conservation
[Figure: lines 1 to N at rate R on each side, k Output Queued Switches (1, 2, …, k), internal links running at rate S(R/k).]
51
Parallel Packet Switch
Theorems
1. If S > 2k/(k+2) ≈ 2 then a parallel packet switch can be work-conserving for all traffic.
2. If S > 2k/(k+2) ≈ 2 then a parallel packet switch can precisely emulate a FCFS output-queued switch for all traffic.
52
Parallel Packet Switch
Theorems
3. If S > 3k/(k+3) ≈ 3 then a parallel packet switch can precisely emulate a switch with WFQ, strict priorities, and other types of QoS, for all traffic.
With Sundar Iyer and Amr Awadallah
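The bounds in the three theorems can be evaluated for concrete numbers of layers. The sketch below reads them as 2k/(k+2) and 3k/(k+3), which stay below 2 and 3 respectively and approach those values as k grows, and it uses the internal link rate S(R/k) from the work-conservation diagram. The function names are hypothetical.

```python
def fcfs_speedup_bound(k: int) -> float:
    """Bound for work conservation and FCFS output-queued emulation."""
    return 2 * k / (k + 2)


def qos_speedup_bound(k: int) -> float:
    """Bound for emulating WFQ, strict priorities, and similar disciplines."""
    return 3 * k / (k + 3)


def layer_link_rate(line_rate_bps: float, k: int, speedup: float) -> float:
    """Rate of each internal link between a port and one layer: S * (R / k)."""
    return speedup * line_rate_bps / k


for k in (2, 10, 100):
    print(k, round(fcfs_speedup_bound(k), 3), round(qos_speedup_bound(k), 3))

# Example: a 10 Gb/s line spread over 10 layers with speedup 2.
print(layer_link_rate(10e9, 10, 2) / 1e9, "Gb/s per internal link")
```

The practical consequence is that each layer, and each memory inside it, runs well below the external line rate even after the speedup is applied.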
53
Precise Emulation of an FCFS Shared Memory Switch
[Figure: Shared Memory switch (ports 1 to N) = ? Parallel Packet Switch (ports 1 to N)]
54
An aside
Unbuffered Clos Circuit Switch
Expansion factor required = 2-1/N
55
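Clos Network
[Figure: three-stage Clos network with input switches I_1 … I_X, middle stage switches (a, b, c, …), and output switches O_1 … O_X; each input and output switch has m external links, and there are R middle stage switches. A connection matrix with rows I_1 … I_x and columns O_1 … O_x has <= min(R,m) entries in each row and <= min(R,m) entries in each column.]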
56
Clos Network
[Figure: three-stage Clos network with input switches I_1 … I_X, middle stage switches (a, b, c, …), and output switches O_1 … O_X; each input and output switch has m external links, and there are R middle stage switches. A connection matrix with rows I_1 … I_x and columns O_1 … O_x has <= min(R,m) entries in each row and <= min(R,m) entries in each column.]
Define:
UIL(I_i) = used links at switch I_i to connect to middle stages.
UOL(O_j) = used links at switch O_j to connect to middle stages.
If we wish to connect I_i to O_j:
When adding connection: |UIL(I_i)| <= m-1 and |UOL(O_j)| <= m-1
Worst-case: |UIL(I_i) ∪ UOL(O_j)| = 2m-2
Therefore, if R >= 2m-1 there is always a free middle stage: at most 2m-2 can already be in use, so one more is enough.
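The counting argument can be checked mechanically. In the worst case the input switch already uses m-1 middle stages and the output switch uses m-1 different ones, so 2m-2 are unavailable and a new connection needs one more, 2m-1 in total, which corresponds to an expansion factor of (2m-1)/m = 2-1/m. The sketch below, with hypothetical function names, builds that worst case and looks for a free middle stage.

```python
def free_middle_switch(uil: set, uol: set, r: int):
    """Index of a middle stage unused by both switches, or None if blocked."""
    for mid in range(r):
        if mid not in uil and mid not in uol:
            return mid
    return None


def worst_case_connection(m: int, r: int):
    """Try to add a connection when both switches already carry m-1 others.

    The input switch uses middle stages 0 .. m-2 and the output switch uses
    the disjoint set m-1 .. 2m-3, the arrangement that wastes the most.
    """
    uil = set(range(m - 1))
    uol = set(range(m - 1, 2 * m - 2))
    return free_middle_switch(uil, uol, r)


m = 4
print(worst_case_connection(m, r=2 * m - 2))  # blocked: every middle stage is taken
print(worst_case_connection(m, r=2 * m - 1))  # one middle stage is still free
```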
57
An aside
Unbuffered Clos Circuit Switch
Expansion factor required = 2-1/N