Download presentation
Presentation is loading. Please wait.
Published byBertram Jacobs Modified over 9 years ago
1
Lecture Note on Switch Architectures
2
Function of Switch
3
Naive Way
4
Bus-Based Switch No buffering at input port processor (IPP) Output port processor (OPP) buffers cells Controller exchanges control message with terminals and other controller. Disadvantage: –Bus bandwidth is equal to sum of external link for non-blocking –IPPs and OPPs must operate at full bus bandwidth –Bus width increases with number of links
5
Centralized Bus Arbitration IPPs send requests to central arbiter Request may includes: –Priority –Waiting time –OPP destination(s) –Length of IPP queue Arbitration complexity is O(N^2) Distributed version is preferred, but may degrade throughput.
6
Bus Arbitration Using Rotating Daisy Chain Rotating token eliminates positional favoritism
7
Ring Switch Same bandwidth and complexity as bus switch Avoids capacitive loading of bus, allowing higher clock frequencies Control mechanisms –Token passing –Slotted ring with busy bit
8
Shared Buffer Switch Individual queues are rarely full. Shared memory needs two times of external link bandwidth Require less memory Better ability to handle burst traffic
9
Crossbar Switch
10
Output Buffering Efficient, but needs N time speed up internally.
11
Input Buffering Multiple packets simultaneously transmitted distinct outputs. Require sophisticated arbitration No speed up required Head-of-line blocking
15
Bi-partite Matching Require global information Complexity is O(N^(5/2)) Not suitable for hardware implementation May leads to starvation
16
Desired Arbitration Algorithms High throughput –Low backlog in each input queue –Close to 100% for each input and output Starvation free –No queue will be hold indefinitely Simple to implement
17
Options to Build High Performance Switches Bufferless crossbar Buffered crossbar Shared buffer
18
Bufferless Crossbar Centralized arbitrator is required –Arbitration complexity is O(N*N) –O(log 2 N) iterations of arbitration needed for high throughput Synchronization in all elements Single point failure: central arbitrator Complex line interface
19
Buffered Crossbar Simple scheduling algorithms –Ingress: O(1) –Egress: O(N) Inefficient use of memory –Memory linearly increased with number of ports
20
Shared Memory No central arbitrator needed Reduced memory requirements Distributed flow control Less timing constrains Simpler line card interface
21
Comparisons (1)
22
Comparison (2) Assume 10G for each port and packet size is 64 bytes.
23
Scaling Number of Ports Single larger switch is less expensive, more reliable, easier to maintain and offer better performance, but –O(n 2 ) complexity –Board-level buses limited by capacitive loading Port multiplexing Buffered multistage routing –Dynamic routing: Benes network –Static routing: Clos network Bufferless multistage routing –Deflection routing
24
Port Multiplexing High speed core can handles high speed links as well as low speed Sharing of common circuitry Reduced complexity in interconnection network Better queueing performance for bursty traffic Less fragmentation of bandwidth and memory
25
Dynamic Routing – Benes Network Network expanded by adding stages on left and right –2k-1 stages with d port switch elements supports d k ports Traffic distribution on first k-1 stages Routing on last k stages Internal load external load Traffic maybe out of order: need re-sequencing
26
Static Routing - Clos network All traffic follows same path r 2d-1 to be strict non-blocking.
27
Deflection Routing
28
Basic Architectural Components: Forwarding Decision Forwarding Decision Forwarding Decision Forwarding Decision Forwarding Table Forwarding Table Interconnect Output Scheduling 1. 2. 3. Forwarding Table
29
ATM Switches Direct Lookup VCI Address Memory Data (Port, VCI)
30
Hashing Function CRC-16 16 Linked lists #1#2#3#4 #1#2 #1#2#3 Memory Search Data 48 log 2 N Associated Data Hit? Address { Ethernet Switches Hashing Advantages Simple Expected lookup time can be small Disadvantages Non-deterministic lookup time Inefficient use of memory
31
Per-Packet Processing in IP Routers 1. Accept packet arriving on an incoming link. 2. Lookup packet destination address in the forwarding table, to identify outgoing port(s). 3. Manipulate packet header: e.g., update header checksum. 4. Send packet to the outgoing port(s). 5. Classify and buffer packet in the queue. 6. Transmit packet onto outgoing link.
32
IP Router Lookup Destination Address Next Hop Link ---- DestinationNext Hop Forwarding Table Next Hop Computation Forwarding Engine Incoming Packet HEADERHEADER
33
payload Lookup and Forwarding Engine header Packet Router Destination Address Outgoing Port Forwarding Engine Lookup Data
34
Example Forwarding Table Destination IP PrefixOutgoing Port 65.0.0.0/ 83 128.9.0.0/161 142.12.0.0/197 IP prefix: 1-32 bits Prefix length
35
Multiple Matching 128.9.16.0/21128.9.172.0/21 128.9.176.0/24 Routing lookup: Find the longest matching prefix (or the most specific route) among all prefixes that match the destination address. 0 2 32 -1 128.9.0.0/16 142.12.0.0/19 65.0.0.0/8 128.9.16.14 Longest matching prefix
36
Longest Prefix Matching Problem 2-dimensional search Prefix Length Prefix Value Performance Metrics Lookup time Storage space Update time Preprocessing time
37
Required Lookup Rates 12540.0OC768c2002-05 31.2510.0OC192c2000-01 7.812.5OC48c1999-00 1.940.622OC12c1998-99 40B packets (Mpps) Line-rate (Gbps) LineYear DRAM: 50-80 ns, SRAM: 5-10 ns 31.25 Mpps 33 ns
38
Size of Forward Table 959697989900 Year Number of Prefixes 10,000/year Renewed growth due to multi-homing of enterprise networks
39
Trees and Tries Binary Search Tree <> <><> log 2 N N entries Binary Search Trie 01 0101 111010
40
Typical Profile of Forward Table Prefix Length Number Most prefixes are 24-bits or shorter
41
Basic Architectural Components: Interconnect Forwarding Decision Forwarding Decision Forwarding Decision Forwarding Table Forwarding Table Interconnect Output Scheduling 1. 2. 3. Forwarding Table
42
First-Generation Routers CPU Buffer Memory Line Interface DMA MAC Line Interface DMA MAC Line Interface DMA MAC
43
Second-Generation Routers CPU Buffer Memory Line Card DMA MAC LocalBufferMemory Line Card DMA MAC LocalBufferMemory Line Card DMA MAC LocalBufferMemory
44
Third-Generation Routers Line Card MAC Local Buffer Memory CPU Card Line Card MAC Local Buffer Memory
45
Switching Goals
46
Circuit Switches A switch that can handle N calls has N logical inputs and N logical outputs –N up to 200,000 Moves 8-bit samples from an input to an output port –Samples have no headers –Destination of sample depends on time at which it arrives at the switch In practice, input trunks are multiplexed –Multiplexed trunks carry frames, i.e., set of samples Extract samples from frame, and depending on position in frame, switch to output –each incoming sample has to get to the right output line and the right slot in the output frame
47
Blocking in Circuit Switches Can’t find a path from input to output Internal blocking –slot in output frame exists, but no path Output blocking –no slot in output frame is available
48
Time Division Switching Time division switching interchanges sample position within a frame: time slot interchange (TSI)
49
Scaling Issues with TSI
50
Space Division Switching Each sample takes a different path through the switch, depending on its destination
51
Time Space Switching
52
Time Space Time Switching
53
Packet Switches In a circuit switch, path of a sample is determined at time of connection establishment. No need for header. In a packet switch, packets carry a destination field or label. Need to look up destination port on-the-fly. –Datagram switches: lookup based on destination address –Label switches: lookup based on labels
54
Blocking in Packet Switches Can have both internal and output blocking Internal –no path to output Output –trunk unavailable Unlike a circuit switch, cannot predict if packets will block. If packet is blocked, must buffer or drop
55
Dealing with Blocking in Packet Switches Over-provisioning –internal links much faster than inputs Buffers –at input or output Backpressure –if switch fabric doesn’t have buffers, prevent packet from entering until path is available Parallel switch fabrics –increases effective switching capacity
56
Basic Architectural Components: Queuing, Classification Forwarding Decision Forwarding Decision Forwarding Decision Forwarding Table Forwarding Table Interconnect Output Scheduling 1. 2. 3. Forwarding Table
57
Techniques in Queuing Input Queueing Output Queueing
58
Individual Output QueuesCentralized Shared Memory Memory b/w = (N+1).R 1 2 N Memory b/w = 2N.R 1 2 N
59
Input Queuing
60
Input Queueing Performance Delay Load 58.6% 100%
61
Virtual Output Queues at Input Delay Load 100%
62
Classification Action ---- PredicateAction Classifier (Policy Database) Packet Classification Forwarding Engine Incoming Packet HEADERHEADER
63
Multi-Field Packet Classification Given a classifier with N rules, find the action associated with the highest priority rule matching an incoming packet.
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.