
Fast Switches Switch Fabric Architecture Fast Datagram Switches Higher-Layer and Active Processing (From Kwangwoon Univ.)


1 Fast Switches Switch Fabric Architecture Fast Datagram Switches Higher-Layer and Active Processing (From Kwangwoon Univ.)

2 Introduction
How the path is determined and set up:
–Centralized control: a single point of control.
–Distributed control: per-input-port processing.
–Self-routing: autonomous control within the fabric.
–Distributed control and self-routing: advantage, they do not limit scalability; disadvantage, global optimization is difficult.
Blocking characteristics:
–Strictly nonblocking.
–Wide-sense nonblocking: a switching algorithm chooses the path.
–Rearrangeably nonblocking: existing paths may be rearranged.
–Virtually nonblocking: low probability of blocking.
Nonblocking switch fabric principle: avoid blocking by space-division parallelism, internal speedup, and internal pipelined buffering with cut-through.

3 Buffering
Why buffering?
–If all traffic were uniform, buffering would not be needed.
–Because traffic is bursty, packets must be buffered; otherwise, packets that collide at an output would be dropped.
[Figure: packets from IN 1 and IN 2 collide at the same OUT; the losing packet is delayed in a buffer]

4 Buffering
Buffer Location
–Unbuffered
 Undesirable for fast packet switches, since contention cannot be absorbed.
 Natural for optical components, because there is no practical way to queue light.
 Dealing with contention in an optical burst switch:
 »Drop the burst and retransmit end-to-end.
 »Deflect the burst by scheduling.
 »Convert the burst to the electronic domain for queueing.
–Internally buffered: increases complexity.
–Input or output queued.
–Input AND output queued.
–Shared buffer switch: a logical partitioning of physical memory.

5 Buffering
Buffer location
–Unbuffered versus internally buffered

6 Buffering Buffer Location –Input or output buffered switches.

7 Buffering Buffer Location –Combined input/output buffered switch

8 Buffering Buffer Location –Shared buffer switch

9 Buffering
Head-of-line blocking
–Input queueing: holds packets until the switch can direct them to the appropriate output.
–Output queueing: the output behaves like a shared-medium network, with contention from the other input ports for access, as in a MAC protocol.
–Speedup (S): the ratio of internal to external data rates.
–Internal buffering.
–Internal expansion: Clos fabric.
Head-of-line blocking avoidance principle: output queueing requires internal speedup, expansion, or buffering; virtual output queueing requires additional queues and queueing complexity. The two techniques must be traded off against one another, and they can be used in combination.
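The head-of-line effect described above can be made concrete with a small Monte Carlo sketch (illustrative code, not from the deck; function and parameter names are mine). Under saturated uniform traffic, a FIFO input-queued crossbar is known to saturate near 2 − √2 ≈ 0.586 for large port counts, and the simulation below lands in that neighborhood:

```python
import random

def hol_throughput(n_ports, slots, seed=0):
    """Estimate throughput of a FIFO input-queued crossbar under
    saturated uniform traffic: every input always has a head-of-line
    packet, and when several heads want the same output only one wins."""
    rng = random.Random(seed)
    heads = [rng.randrange(n_ports) for _ in range(n_ports)]
    delivered = 0
    for _ in range(slots):
        winners = {}
        for i, dest in enumerate(heads):
            winners.setdefault(dest, i)  # first contender gets the output
        for dest, i in winners.items():
            delivered += 1
            heads[i] = rng.randrange(n_ports)  # next queued packet advances
    return delivered / (slots * n_ports)
```

Running `hol_throughput(16, 5000)` gives a per-port throughput of roughly 0.6, well below 1.0, which is exactly the loss that speedup, expansion, or virtual output queueing is meant to recover.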

10 Buffering
Head-of-Line Blocking
–Input versus output queueing

11 Buffering Head-of-line blocking –Clos fabric

12 Buffering
Virtual Output Queueing
–Requires that packets be multiplexed and timestamped to determine the arrival order among the queues at each input.
–A scheduling algorithm can be applied to determine which packets to accept, matching inputs to a set of nonconflicting outputs.
–Disadvantage: wasted buffer space.
–Trade-off: increasing memory density makes more queues practical; increasing logic density makes more complex scheduling hardware practical.
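As a sketch of the structure just described (illustrative code; the class and function names are mine, and the matcher is a deliberately simple greedy pass, not one of the production algorithms such as iSLIP): each input keeps one queue per output, and a scheduler builds a set of nonconflicting input–output pairs each cell time.

```python
from collections import deque

class VOQInput:
    """One input port with a virtual output queue per output, so a
    packet headed for a busy output never blocks the packets behind it."""
    def __init__(self, n_outputs):
        self.voq = [deque() for _ in range(n_outputs)]

    def enqueue(self, pkt, output):
        self.voq[output].append(pkt)

def greedy_match(inputs, n_outputs):
    """One pass of a greedy (maximal) matching: each output grants the
    lowest-numbered not-yet-matched input that has a cell for it."""
    taken, match = set(), {}
    for o in range(n_outputs):
        for i, inp in enumerate(inputs):
            if i not in taken and inp.voq[o]:
                match[o] = i
                taken.add(i)
                break
    return match
```

Because each input offers every non-empty queue rather than only its FIFO head, a cell blocked at one output no longer prevents cells behind it from being matched to other outputs.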

13 Buffering
Virtual Output Queueing:
–Head-of-line blocking can be eliminated

14 Single-Stage Shared Elements
Bus Interconnects
[Figure: bus interconnect of bandwidth w shared by inputs i0–i7 and outputs o0–o7]

15 Single-Stage Shared Elements
Bus Interconnects
–A packet must wait in its input queue until the bus is free.
–Aggregate throughput: each port's sustained rate is bounded by r_i < w/n (w: bus bandwidth, n: number of ports).
–Bus speedup is limited by the available electronic technology.
–Multicast is natural, since every output observes the bus.
Ring Switches
–Throughput can be higher than a bus due to better ring utilization under the MAC protocol and the isolation of electrical effects.
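The bound r_i < w/n is just time-sharing arithmetic, which a one-line helper makes explicit (illustrative code; the function name is mine):

```python
def max_sustained_port_rate(bus_bandwidth_bps, n_ports):
    """Upper bound on each port's sustained input rate on a shared bus:
    n ports time-share the bus bandwidth w, so r_i must stay below w/n."""
    return bus_bandwidth_bps / n_ports
```

For example, eight ports on a 10 Gb/s bus can each sustain at most 1.25 Gb/s, which is why bus fabrics stop scaling once per-port line rates approach the achievable bus speedup.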

16 Single-Stage Shared Elements
Shared Memory Fabrics
[Figure: inputs I0–I7 multiplexed into a shared memory, then demultiplexed to outputs o0–o7]

17 Single-Stage Shared Elements
Shared Memory Fabrics
–Difficulties: memory density is increasing exponentially, but memory access times are not; a packet must typically be completely read into memory before being output.
–Multicast is straightforward: one stored copy can be read out to several outputs.

18 Single-Stage Space Division Elements
Basic Switch Element
–Electronic switch elements: a 2 x 2 switch element with states straight, cross, and duplicate (multicast).
[Figure: 2 x 2 element with control input c, per-input packet buffers, cut-through paths, and output multiplexors]

19 Single-Stage Space Division Elements
Basic Switch Element
–Electronic switch elements: a 2 x 2 self-routing switch element.
[Figure: as above, but a header decoder on each input sets the element state, adding a small decode delay]

20 Single-Stage Space Division Elements
Basic Switch Element
–Optical switch elements.
[Figure: directional-coupler element with an electrode; cross state by default, straight state when voltage is applied]

21 Single-Stage Space Division Elements
Crossbar
–Crossbar switch point states.
[Figure: electronic crosspoint states (cross, turn, duplicate) and an optical MEMS mirror crosspoint]

22 Multistage Switches
Crossbar
–Advantages: simplicity and regularity.
–Disadvantage: scaling complexity, O(n^2).
Simple model of the cost in chip area:
–A = a_c + n(a_i + a_o) + n^2 a_x (a_c: control area, a_i/a_o: per-port input/output interface areas, a_x: per-crosspoint area)
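The area model above is easy to evaluate directly (illustrative code; the function name and the sample coefficients in the usage are mine, chosen only to show the n^2 term taking over):

```python
def crossbar_area(n, a_c, a_i, a_o, a_x):
    """Chip-area model for an n x n crossbar: a fixed control area a_c,
    per-port input/output interface areas a_i and a_o, and n^2
    crosspoints of area a_x -- the term that dominates as n grows."""
    return a_c + n * (a_i + a_o) + n * n * a_x
```

With a_c = 10, a_i = a_o = 1, a_x = 2, a 4-port crossbar costs 10 + 8 + 32 = 50 area units, while a 64-port one costs 10 + 128 + 8192 = 8330: the crosspoint term is already more than 98% of the total, which is the scaling problem multistage fabrics address.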

23 Single-Stage Space Division Elements
Crossbar
[Figure: 8 x 8 crossbar with inputs I0–I7 and outputs o0–o7]

24 Multistage Switches
Tiling Crossbar
–Tile switch elements in a square array.
–Not a cost-effective solution for large switches.
Multistage Interconnection Networks (MINs)
–Delta switch
 Advantages: elimination of central switch control (self-routing); packet sequence is preserved, since all cells of a flow follow the same path.
 Disadvantage: load is not distributed evenly.
–Benes switch
 Dynamically routes packets through additional stages.
 Resequencing buffers restore order, using a timestamp inserted into the internal switch header.
–Banyan switch
 Built using shared-memory and crossbar switch elements.
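Self-routing in a delta/banyan network means the destination address itself steers the packet: the element in stage k looks at one destination bit and picks its upper or lower output. A minimal sketch (illustrative code; the function name is mine, and the MSB-first, 0-means-upper convention is one common choice — real fabrics differ in port numbering):

```python
def self_route_path(dest, stages):
    """Per-stage output-port choices for self-routing: stage k examines
    bit k of the destination address (MSB first); 0 selects the upper
    element output, 1 the lower. No central controller is consulted."""
    return [(dest >> (stages - 1 - k)) & 1 for k in range(stages)]
```

For destination 5 (binary 101) in a 3-stage network this yields the choices lower, upper, lower, and every input reaches output 5 by the same rule, which is why no central switch control is needed.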

25 Multistage Switches
Multistage Interconnection Networks
–Delta switch
[Figure: 16 x 16 delta network of 2 x 2 elements, inputs I0–I15 to outputs o0–o15]

26 Multistage Switches
Multistage Interconnection Networks
–Benes switch
[Figure: 16 x 16 Benes network with stages s0–s5, inputs I0–I15 to outputs o0–o15]

27 Multistage Switches
Multistage Interconnection Networks
–Banyan switch
[Figure: 16 x 16 banyan network with stages S0 and S1, inputs I0–I15 to outputs o0–o15]

28 Multistage Switches
Multistage Interconnection Networks
–Optical multistage networks
 Incapable of buffering: require nonblocking, bufferless interconnection fabrics.
 Crosstalk problem: addressed with dilation techniques.
[Figure: dilated Benes switch; elements operate in pass or cross state]

29 Multistage Switches
Scaling Speed (parallel switch slices)
[Figure: m parallel fabric slices σ0–σ(m−1) carry data under a common fabric control; per-port control lines c0–c(n−1) connect inputs i0–i(n−1) to outputs o0–o(n−1), with a delay element aligning data and control]

30 Multicast Support
Crossbar Switch Multicast
–Service disciplines
 No fanout splitting: a cell is delivered only when all of its outputs are free, so it blocks on any busy output.
 Fanout splitting: a cell may be delivered to the currently free subset of its outputs, leaving a residue for later cell times.
–Goals of the service schedule
 High throughput.
 Some fairness measure is met; in particular, packets should not be starved.
 The scheduling discipline can be implemented at high speed (line rate).
–A variety of scheduling policies are possible
 Concentrate residue among as few inputs as possible.
 Weight-based.
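The two service disciplines can be contrasted in a few lines (illustrative code; the function name is mine, and inputs are served in fixed order here purely for simplicity — real schedulers rotate or weight that order for fairness):

```python
def serve_multicast(hol_fanouts, n_outputs, fanout_splitting=True):
    """One cell time of multicast crossbar service. Each input's
    head-of-line cell carries a fanout set of desired outputs; the
    function returns each cell's residue (outputs still unserved).
    With fanout splitting, a cell reaches whatever outputs are free
    now; without it, delivery is all-or-nothing."""
    free = set(range(n_outputs))
    residues = []
    for fanout in hol_fanouts:
        if fanout_splitting:
            served = fanout & free          # take the free subset now
            free -= served
            residues.append(fanout - served)
        elif fanout <= free:                # all outputs free: deliver
            free -= fanout
            residues.append(set())
        else:                               # any output busy: block
            residues.append(fanout)
    return residues
```

With two inputs whose cells want outputs {0,1} and {1,2}, fanout splitting lets the second cell reach output 2 immediately and carry only {1} as residue, whereas without splitting it delivers nothing that cell time.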

31 Multicast Support
Crossbar Switch Multicast scheduling
[Figure: worked example with inputs I1–I5 holding multicast cells with fanout vectors such as 1_3_5 and _2345, served across outputs o1–o5 over successive cell times]

32 Multicast Support
Multistage Fabric Multicast
[Figure: 16-port multistage fabric with copy stages, routing stages, and a translate stage; copies addressed 0000, 0100, 1010, 1110]

33 Review – Fast Packet Switching
–1980s: link-rate technology improvement; connection-oriented fast packet switching technologies for high-speed networks.
–1990s: widely deployed, e.g. ATM for high-speed backbone networks.
Benefits (5.3)
–Simplified packet processing and forwarding.
–Eliminates store-and-forward latency.
–Provides QoS guarantees and resource reservation.

34 Fast Datagram Switches
Trends that resisted the global deployment of connection-oriented networks:
–The IP-based Internet and the WWW.
–The limitations of shared-medium link protocols were overcome.
Fast datagram switches
–Motivation: maintain high performance while supporting connectionless networks.
–Derivation: the added complexity lies in switch input and output processing.

35 Fast Packet Switching Architecture

36 Connection-Oriented vs. Connectionless
Similarity
–At a high level, each switch has the same functional blocks, e.g. routing, signaling, management.
Differences
–Input processing: address lookup using a prefix table; packet classification.
–Output processing: packet scheduling to meet QoS requirements.

37 Architecture of Fast Datagram Switching

38 Packet Processing Rates
Designing a switch
–Datagram size: min 40 bytes to max 1500 bytes.
–Rule of thumb: provision for the average packet size.
Form of processing
–Sequential processing must keep up with the minimum packet size.
–Parallel processing can be provisioned for the average packet size.
Packet processing rate: the packet processing rate is a key throughput measure of a switch. Packet processing software and shared parallel hardware resources must be able to sustain the average packet processing rate.
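The worst-case rate implied by the minimum datagram size is simple to compute (illustrative code; the function name is mine, and the bound deliberately ignores framing and inter-packet overhead, so real link-layer rates differ somewhat):

```python
def worst_case_pps(line_rate_bps, min_packet_bytes=40):
    """Packets per second a port must sustain in the worst case:
    back-to-back minimum-size (40-byte) datagrams at line rate.
    Framing and inter-packet gaps are ignored in this simple bound."""
    return line_rate_bps / (min_packet_bytes * 8)
```

At 1 Gb/s this is about 3.1 million packets per second per port, which is why sequential per-packet software must be sized for the 40-byte case even though the average packet is far larger.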

39 Fast Forwarding Lookup
Review – fast packet switching
–Connection identifiers (CIDs) enable fast lookup.
–Problem: table entry size.
Fast datagram switching
–Problem: similar to fast packet switching.
–Solutions
 Flat addressing
 Hierarchical addressing
 Software prefix matching
 Hardware matching support
 Source routing

40 Flat Addressing

41 Software Search
Lookup time
–Worst case: minimum packet size with the worst-case lookup algorithm.
Memory required
–Trade-off: performance vs. cost.
–Amount of memory it is reasonable to include in the switch input processing.
Update time
–Determined by the lookup data structure.
Techniques
–Tree search: O(log_B N) for N entries, where B is the branch factor.
–Hash function: O(1) when there are no hash collisions.

42 Content Addressable Memory (CAM)
Features
–Parallel scan of all entries per memory access.
–Reference one entry by key; return the associated data.
Benefit
–Intuitive and fast.
Model
–Each word consists of a key and associated data.
–All words are checked in parallel in a single CAM cycle.
–The return-field portion of the matching word is the output of the CAM read.
–CAMs are specifically designed for network address lookup.
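The CAM model above can be mimicked in software (illustrative code; the class name is mine, and a Python dict only *emulates* the behavior — the hardware's defining property is that all words are compared in one parallel cycle, which no sequential data structure reproduces):

```python
class SimpleCAM:
    """Toy model of a CAM word array: look up by key, get back the
    associated data. A dict stands in for the parallel compare; a
    miss returns None, modeling a CAM with no matching word."""
    def __init__(self):
        self.words = {}

    def write(self, key, data):
        self.words[key] = data      # store a <key, data> word

    def read(self, key):
        return self.words.get(key)  # single "cycle": match or miss
```

The exact-match limitation visible here is also the hardware's: a plain binary CAM cannot express "match only these bits", which is what motivates the ternary/variable-prefix CAMs discussed later.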

43 Hierarchical Addresses
–Hierarchy is exploited to reduce the size of the forwarding tables.
–Forwarding entries can be represented as prefix addresses: the higher-order bit portion of an address that must be matched to lead toward the destination.
–Similar to the PSTN numbering plan.

44 Software Prefix Matching

45 Basic Prefix Matching Algorithm
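As a concrete reference point for the slides on prefix matching, here is a minimal longest-prefix-match sketch (illustrative code; the function name and table entries are mine). It uses a linear scan for clarity — production switches replace this scan with tries, multistage tables, or CAMs precisely because a scan cannot run at line rate:

```python
import ipaddress

def longest_prefix_match(table, addr):
    """Return the next hop of the longest matching prefix in table,
    where table is a list of (prefix, next_hop) pairs. Linear scan:
    O(N) per lookup, shown only to define the semantics."""
    ip = ipaddress.ip_address(addr)
    best, best_len = None, -1
    for prefix, next_hop in table:
        net = ipaddress.ip_network(prefix)
        if ip in net and net.prefixlen > best_len:
            best, best_len = next_hop, net.prefixlen
    return best
```

Given entries 10.0.0.0/8, 10.1.0.0/16, and a default 0.0.0.0/0, the address 10.1.2.3 matches all three but is forwarded by the /16, the most specific prefix, which is the rule every faster data structure must preserve.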

46 Hardware Matching Support
Motivation
–Complexity of software algorithms; hardware techniques enable line-rate lookup.
Approaches
–Assisting logic can be embedded in the memory: CAMs for variable-length prefixes.
–Translation logic can be provided that assists the location of addresses in conventional memory: multistage lookup.

47 CAMs for Variable Prefixes

48 Multistage Lookup

49 Source Routing
–Eliminates the per-hop address lookup by precomputing the route.
–The entire path is included in the packet header.
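The forwarding step this buys can be sketched in one function (illustrative code; the function name and packet representation as a (path, payload) pair are mine — real headers encode the port list in a compact binary form):

```python
def source_route_forward(packet):
    """One hop of source routing: the output port is simply popped
    from the precomputed path in the header -- no table lookup.
    Returns (output_port, packet_with_shortened_header)."""
    path, payload = packet
    return path[0], (path[1:], payload)
```

Each switch does constant-time work regardless of table size; the costs are header overhead proportional to path length and the sender's need to know the full route in advance.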

50 Packet Classification
Two other common forms
–Separation of control packets.
–Separation of packets belonging to different traffic classes.
General classification includes
–Classification into a QoS traffic class.
–Policy-based routing.
–Security.
–Higher-layer switching functions.
–Active networking.

51 Packet Filtering Problem
–Classification must occur before queueing in the node.
–A general instance of the classification problem.

52 Packet Classification Implementations
Hardware classification
–Ternary CAMs can be used to match the rules in parallel, similar to the address lookup.
Software classification
–Forwarding-table lookup (Section 5.1.1).
–"Grid of tries", "tuple space search".
Preprocessed classifiers
–Preprocess all possible packet fields.
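A minimal software classifier shows the semantics the hardware accelerates (illustrative code; the function name, rule format, and class labels are mine, and this first-match scan over exact-value predicates is far simpler than the grid-of-tries or tuple-space algorithms named above):

```python
def classify(rules, packet):
    """First-match classifier: rules is an ordered list of
    (predicates, action) pairs, where predicates maps header fields
    to required values. A ternary CAM would evaluate every rule in
    a single parallel cycle instead of this sequential scan."""
    for predicates, action in rules:
        if all(packet.get(field) == value for field, value in predicates.items()):
            return action
    return "best-effort"
```

Rule order matters in a first-match scheme: a packet matching several rules gets the earliest one, so more specific rules must precede more general ones.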

53 Output Processing and Packet Scheduling (1)
Reasons to perform output scheduling
–Datagram traffic is a mix of guaranteed-service classes and best-effort traffic.
–Scheduling must be sufficient to meet delay and bandwidth bounds.
–Fair service among the best-effort flows.
–Congestion control mechanisms alone do not protect guaranteed-service classes from best-effort traffic.

54 Output Processing and Packet Scheduling (2)
Fair queueing
–Packet Fair Queueing (PFQ).
–Weighted Fair Queueing (WFQ).
Per-flow queueing
–The highest degree of isolation; the finest-grained control occurs when per-flow queueing is used.
Congestion control
–Large standing queues increase delay, indicating congestion.
–Discard packets to keep queues from building, e.g. RED (random early detection).
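The RED discipline mentioned above has a simple core: a drop probability driven by the average queue length (illustrative code; the function and parameter names are mine, and real RED also maintains an EWMA of the queue length and counts packets since the last drop, which this sketch omits):

```python
def red_drop_probability(avg_qlen, min_th, max_th, max_p):
    """RED drop probability as a function of average queue length:
    never drop below min_th, always drop at or above max_th, and
    ramp linearly from 0 up to max_p in between."""
    if avg_qlen < min_th:
        return 0.0
    if avg_qlen >= max_th:
        return 1.0
    return max_p * (avg_qlen - min_th) / (max_th - min_th)
```

Dropping probabilistically *before* the queue is full signals congestion to responsive senders early, keeping standing queues (and hence delay) small instead of waiting for tail drop.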

55 Higher-Layer and Active Processing
Active networking uses general classification techniques:
–First, identify packets for active processing.
–Then execute active applications in the network nodes on the identified packets, connections, or flows to provide the desired service.
Motivation for active networking
–Open, flexible interfaces allow provisioning of new protocols and services.
Condition for active networking
–Must not impede the non-active fast path.

56 Active Network Node Reference Model

