Inside Internet Routers
CS 3035/GZ01: Networked Systems
Kyle Jamieson, Department of Computer Science, University College London



Today: Inside Internet routers
1. Longest-prefix lookup for IP forwarding – the Luleå algorithm
2. Router architecture
[Image: Cisco CRS-1 Carrier Routing System]

The IP forwarding problem
Core Internet links have extremely fast line speeds (SONET optical fiber links):
– 2.4 Gbit/s: backbones of secondary ISPs
– 10 Gbit/s: widespread in the core
– 40 Gbit/s: found in many core links
Internet routers must handle minimum-sized packets (40−64 bytes) at the line speed of the link.
– Minimum-sized packets are the hardest case for IP lookup: they require the most lookups per second
– At 10 Gbit/s, the router has 32−51 ns to make a decision for each packet
– Compare: DRAM latency ≈ 50 ns; SRAM latency ≈ 5 ns

The IP forwarding problem (2)
Given an incoming packet with IP destination D, choose an output port for the packet by the longest prefix match rule.
– Then we configure the switching fabric to connect the packet's input port to the chosen output port
What kind of data structure can we use for longest prefix match?

Longest prefix match (LPM)
Given an incoming IP datagram to destination D, the forwarding table maps a prefix to an outgoing port:

  Prefix   Port
  …/4      2
  …/17     1
  …/21     3
  …/23     2

Longest prefix match rule: choose the forwarding table entry with the longest prefix P/x that matches D in the first x bits, and forward the IP datagram out that entry's outgoing port.
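A linear-scan LPM can be sketched in a few lines of Python. The forwarding table here is hypothetical, chosen only to mirror the slide's prefix lengths (/4, /17, /21, /23) and ports:

```python
import ipaddress

def lpm(table, dest):
    """Linear-scan longest prefix match: test every entry, keep the longest hit."""
    best_len, best_port = -1, None
    addr = ipaddress.ip_address(dest)
    for prefix, port in table:
        net = ipaddress.ip_network(prefix)
        if addr in net and net.prefixlen > best_len:
            best_len, best_port = net.prefixlen, port
    return best_port

# Hypothetical forwarding table with the slide's prefix lengths and ports.
table = [("192.0.0.0/4", 2), ("198.51.0.0/17", 1),
         ("198.51.96.0/21", 3), ("198.51.102.0/23", 2)]

print(lpm(table, "198.51.102.5"))   # matches /4, /17, /21 and /23; longest is /23 → port 2
```

A real router cannot afford this scan per packet; the next slides replace it with a tree and then with the Luleå compression.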

Computing the longest prefix match (LPM)
The destination matches forwarding table entry …/4: the destination agrees with the prefix in its first 4 bits.
[Figure: destination address and /4 prefix from the IP header, compared bit by bit]

Computing the longest prefix match (LPM)
No match for forwarding table entry …/17: the destination differs from the prefix within the first 17 bits.
[Figure: destination address and /17 prefix compared bit by bit]

Computing the longest prefix match (LPM)
The destination matches forwarding table entry …/21 in its first 21 bits.
[Figure: destination address and /21 prefix compared bit by bit]

Computing the longest prefix match (LPM)
The destination matches forwarding table entry …/23 in its first 23 bits.
[Figure: destination address and /23 prefix compared bit by bit]

Computing the longest prefix match (LPM)
Applying the longest prefix match rule, we consider only the three matching prefixes (…/4, …/21, …/23) and choose the longest, …/23, so the datagram is forwarded out port 2.

LPM: Performance
How fast does the preceding algorithm run? The number of steps is linear in the size of the forwarding table.
– Today, that means 200,000−250,000 entries!
– And the router may have just tens of nanoseconds before the next packet arrives
– Recall: DRAM latency ≈ 50 ns; SRAM latency ≈ 5 ns
So we need much greater efficiency to keep up with line speed:
– Better algorithms
– Hardware implementations
Algorithmic problem: how do we do LPM faster than a linear scan?

First attempt: Binary tree
Store the routing table prefixes and outgoing ports in a binary tree. For each routing table prefix P/x:
– Begin at the root of the binary tree
– Start from the most significant bit of the prefix
– Traverse the i-th-level branch corresponding to the i-th most significant bit of the prefix; store the prefix and port at depth x
[Figure: binary tree with nodes such as 80/1 and C0/2 below the root, labeled by prefix length in bits. Note convention: hexadecimal notation.]

Example: C0/4 in the binary tree
[Figure: inserting prefix C0/4 (bits 1100) – the path from the root passes through C0/2 and C0/3 down to C0/4, where the port from the forwarding table is stored.]

Routing table lookups with the binary tree
When a packet arrives:
– Walk down the tree based on the destination address
– The deepest matching node corresponds to the longest-prefix match
How much time do we need to perform an IP lookup?
– We still need to keep the big routing table in slow DRAM
– In the worst case, lookup time scales directly with the number of bits in the longest prefix, each of which costs a slow memory access
– Back-of-the-envelope calculation: 20-bit prefix × 50 ns DRAM latency = 1 μs (goal: ≈ 32 ns)
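The binary-tree insert and lookup just described can be sketched with a minimal bit-string trie (an illustrative model, not the slide's implementation; the example prefixes 80/1 and C0/2 follow the earlier figure):

```python
class TrieNode:
    __slots__ = ("children", "port")
    def __init__(self):
        self.children = [None, None]  # 0-branch, 1-branch
        self.port = None              # set only at depths where a prefix ends

def insert(root, prefix_bits, port):
    """Store a prefix (given as a bit string, e.g. '11' for C0/2) at depth len(prefix_bits)."""
    node = root
    for b in prefix_bits:
        i = int(b)
        if node.children[i] is None:
            node.children[i] = TrieNode()
        node = node.children[i]
    node.port = port

def lookup(root, addr_bits):
    """Walk the destination's bits; remember the deepest node that holds a port."""
    node, best = root, None
    for b in addr_bits:
        node = node.children[int(b)]
        if node is None:
            break
        if node.port is not None:
            best = node.port
    return best

root = TrieNode()
insert(root, "1", 1)    # 80/1 → port 1
insert(root, "11", 3)   # C0/2 → port 3
```

The deepest port encountered on the walk is the longest-prefix match, exactly as on the slide.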

Luleå algorithm
Degermark et al., “Small Forwarding Tables for Fast Routing Lookups” (ACM SIGCOMM ’97)
Observation: the binary tree is too large.
– It won't fit in fast CPU cache memory in a software router
– Memory bandwidth becomes the limiting factor in a hardware router
The goal therefore becomes: how can we minimize memory accesses and the size of the data structure?
– A method for compressing the binary tree
– Compacts 40K routing entries into 150−160 Kbytes
– So we can use SRAM for the lookup, and thus perform IP lookups at gigabit line rates

Luleå algorithm: Binary tree
The full binary tree has a height of 32 (the number of bits in an IP address), so it has 2^32 leaves (one for each possible IP address).
Luleå stores prefixes of different lengths differently:
– Level 1 stores prefixes ≤ 16 bits in length
– Level 2 stores prefixes 17−24 bits in length
– Level 3 stores prefixes 25−32 bits in length
[Figure: IP address space of 2^32 possible addresses]

Luleå algorithm: Level 1
Imagine a cut across the binary tree at level 16. Construct a 2^16-long bit vector that contains information about the routing table entries at or above the cut.
– The bit vector stores one bit per /16 prefix
Let's zoom in on the binary tree here:
[Figure: IP address space of 2^32 possible addresses, with the level-16 cut marked]

Constructing the bit vector
Put a 1 in the bit vector wherever there is a routing table entry at depth ≤ 16.
– If the entry is above level 16, follow pointers left (i.e., 0) down to level 16
These routing table entries are called genuine heads.
[Figure: tree fragment with genuine heads marked above the bit vector]

Constructing the bit vector
Put a 0 in the bit vector wherever a genuine head at depth < 16 contains this /16 prefix.
– For example, 0000/15 contains 0001/16 (they have the same next hop)
These routing table entries are called members.
[Figure: tree fragment with members marked]

Constructing the bit vector
Put a 1 in the bit vector wherever there are routing entries at depth > 16, below the cut.
These bits correspond to root heads, and the corresponding entries are stored in Levels 2 and 3 of the data structure.
[Figure: tree fragment with root heads marked]

Bit masks
The bit vector is divided into 16-bit bit masks. From the figure below, notice:
– The first 16 bits of the lookup IP address index the bit in the bit vector
– The first 12 bits of the lookup IP address index the containing bit mask
[Figure: bit vector with the bit mask for 000/12 highlighted; bit-mask boundaries sit at depth 12, individual bits at depth 16]
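Both indexing rules above are plain bit shifts on the 32-bit destination address; a minimal sketch:

```python
def level1_indices(ip):
    """Split a 32-bit destination address into the Level-1 indices described above."""
    bit_index   = ip >> 16          # top 16 bits: position in the 2**16-bit vector
    bitmask_ix  = ip >> 20          # top 12 bits: which 16-bit bit mask
    bit_in_mask = (ip >> 16) & 0xF  # bits 19..16: position inside that bit mask
    return bit_index, bitmask_ix, bit_in_mask
```

For example, address 0xC0000000 falls at bit 0xC000 of the vector, inside bit mask 0xC00, at position 0 within that mask.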

The pointer array
To compress the routing table, Luleå stores neither the binary tree nor the bit vector. Instead, a pointer array has one entry per set bit (i.e., bit equal to 1) in the bit masks:
– For genuine heads, the pointer field indexes into a next-hop table
– For root heads, the pointer field indexes into a table of Level 2 chunks
Each pointer array entry holds a type field (2 bits: genuine or root) and a pointer field (14 bits: a next-hop index or an L2 chunk index).
Given a lookup IP, how do we compute the correct index into the pointer array, pix?
– There is no simple relationship between the lookup IP and pix

Finding the pointer group: Code word array
Group the pointers associated with each bit mask together into a pointer group.
The code word array code stores, in its six field, the index in the pointer array where each pointer group begins.
– code has 2^12 entries, one for each bit mask
– It is indexed by the top 12 bits of the lookup IP
[Figure: bit masks for 000/12 (nine bits set) and 001/12 (five bits set); the six field for 001/12 holds 9, the number of pointers preceding its group]

Finding the pointer group: Base index array
After four bit masks, 4 × 16 = 64 bits could be set, which is too many to represent in six bits, so we “reset” six to zero every fourth bit mask.
The base index array base stores the index in the pointer array where each group of four bit masks begins.
– base entries are 16 bits wide; base is indexed by the top 10 bits of the lookup IP
So the pointer group begins at base[IP[31:22]] + code[IP[31:20]].six.
[Figure: bit masks 000/12 (9 bits set), 001/12 (5 bits set), 002/12 (2 bits set), 003/12 (2 bits set), 004/12 (2 bits set); base[000/10] = 0 and base[004/10] = 18 = 9 + 5 + 2 + 2]

Finding the correct pointer in the pointer group
The high-order 16 bits (31−16) of the lookup IP identify a leaf at depth 16.
– Recall: the first 12 of those (bits 31−20) determine the bit mask
– The last four (bits 19−16, i.e., the path between depth 12 and depth 16) determine the bit within the bit mask
The maptable tells us how many pointers to skip in the pointer group to find a particular bit's pointer.
– There are relatively few possible bit masks, so store all the possibilities
[Figure: bits IP[19]−IP[16] selecting one bit of the bit mask between depth 12 and depth 16]

Completing the binary tree
But there is a problem: two different prefix trees can have the same bit vector.
So the algorithm requires that the prefix tree be complete: each node in the tree has either zero or two children.
As a preliminary step, any node with a single child gets a sibling node added with duplicate next-hop information.
[Figure: two different prefix trees that yield the same bit vector, and the completed tree]

How many bit masks are possible?
The bit masks are constrained: not all 2^16 combinations are possible, since they are formed from a complete prefix tree. How many are there? Let's count:
– Let a(n) be the number of valid non-zero bit masks of length 2^n
– a(0) = 1 (one possible non-zero bit mask of length one)
– a(n) = 1 + a(n − 1)^2: either the bit mask with a lone 1, or any combination of two non-zero half-size masks (e.g., for a(1): [1 1], [1 0])
– a(4) = 677 possible non-zero bit masks of length 16
– So we need 10 bits to index the 678 possibilities (including the zero mask)
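The recurrence is easy to check numerically; a quick sketch:

```python
def valid_bitmasks(n):
    """a(n): number of non-zero bit masks of length 2**n arising from a complete prefix tree."""
    a = 1                  # a(0) = 1: the single length-1 mask [1]
    for _ in range(n):
        a = 1 + a * a      # either a lone 1, or two non-zero half-size masks
    return a

print(valid_bitmasks(4))   # → 677; with the zero mask that is 678 ≤ 2**10, so 10 bits suffice
```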

Finding the correct pointer in the pointer group
The ten field of the code word array stores a row index of maptable, whose row holds the offsets for the bit mask pattern at that location in the tree.
Bits 19−16 of the IP address index the maptable columns, giving the (4-bit) offset within the bit mask of the bit we are interested in.
maptable entries are precomputed and don't depend on the routing table.
[Figure: code word (ten, six) selecting a maptable row; IP address bits 19−16 selecting the maptable column]

Luleå: Summary of finding the pointer index

  ix  = IP[31:20]   (top 12 bits)
  bix = IP[31:22]   (top 10 bits)
  bit = IP[19:16]
  ten = code[ix].ten
  six = code[ix].six
  pix = base[bix] + six + maptable[ten][bit]
  pointer = pointer_array[pix]
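The summary above translates directly into a runnable sketch. The tables here are trivial stand-ins (a single bit mask whose pointer group starts at 0); building real code, base, maptable, and pointer_array contents is what the preceding slides describe:

```python
def find_pointer(ip, code, base, maptable, pointer_array):
    """Luleå Level-1 pointer lookup, following the slide's steps."""
    ix  = ip >> 20            # top 12 bits select the code word
    bix = ip >> 22            # top 10 bits select the base index
    bit = (ip >> 16) & 0xF    # bits 19..16 select the bit within the bit mask
    ten, six = code[ix]
    pix = base[bix] + six + maptable[ten][bit]
    return pointer_array[pix]

# Toy stand-in tables for illustration only.
code = {0x000: (0, 0)}
base = {0x00: 0}
maptable = [[0] * 16]
pointer_array = [("genuine", 5)]   # (type, next-hop index)

print(find_pointer(0x00000000, code, base, maptable, pointer_array))   # → ('genuine', 5)
```

Note that the lookup touches only small tables (code, base, maptable) plus one pointer array entry, which is what makes an SRAM-resident implementation feasible.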

Optimization at Level 1
When the bit mask is zero or has a single bit set, the pointer array is holding just an index into the next-hop table (i.e., a genuine head). In these cases, store the next-hop index directly in the code word.
This reduces the number of rows needed for maptable from 678 to 676.
[Figure: bit masks for 000/11 and 000/12 illustrating the two cases]

Luleå algorithm: Levels 2 and 3
Root heads point to subtrees of height at most eight, called chunks.
– A chunk itself contains at most 256 heads
There are three types of chunk, depending on how many heads it contains:
– Sparse: 1−8 heads; an array of the next-hop indices of the heads within the chunk
– Dense: 9−64 heads; same format as Level 1, but with no base index array
– Very dense: 65−256 heads; same format as Level 1
[Figure: a root head's pointer field indexing a table of Level 2 chunks]

Sprint routing table: lookup time distribution (Alpha CPU)
– The fastest lookups (17 cycles = 85 ns) are code words directly storing a next-hop index
– 41 clock cycles: a pointer in Level 1 indexes the next-hop table (genuine head)
– Longer lookup times correspond to searching at Levels 2 or 3
[Figure: histogram of lookup-time counts in clock cycles (cycle time = 5 ns), with the worst-case lookup time on a commodity CPU marked]

Luleå algorithm: Summary
The current state of the art in IP router lookup. It trades mutability and table-construction time for speed:
– Adding a routing entry requires rebuilding the entire table
– But routing tables don't change often, and the authors argue they can sustain one table rebuild per second on their platform
Table size: 150 Kbytes for 40,000 entries, so it fits in fast SRAM on a commodity system.
Used in hardware as well as software routers to get lookup times down to tens of nanoseconds.

Today: Inside Internet routers
1. Longest-prefix lookup for IP forwarding – the Luleå algorithm
2. Router architecture
– Crossbar scheduling and the iSLIP algorithm
– Self-routing fabric: Banyan network
[Image: Cisco CRS-1 Carrier Routing System]

Router architecture
1. Data path: functions performed on each datagram
– Forwarding decision
– Switching fabric (backplane)
– Output link scheduling
2. Control plane: functions performed relatively infrequently
– Routing table information exchange with other routers
– Configuration and management
Key design factor: the rate at which components operate (packets/sec).
– If one component operates at n times the rate of another, we say it has a speedup of n relative to the other

Input port functionality
IP address lookup:
– CIDR longest-prefix match
– Uses a copy of the forwarding table from the control processor
Check the IP header; decrement the TTL; recalculate the checksum; prepend the next hop's link-layer address.
Possible queuing, depending on the design.
[Figure: input port pipeline at line rate R]

Switching fabric
The input port has now tagged the packet with the right output port (based on the IP lookup). The job of the switching fabric is to move the packet from its input port to that output port. How can this be done?
1. Copy it into some memory location and out again
2. Send it over a shared hardware bus
3. Crossbar interconnect
4. Self-routing fabric

Switching via shared memory
First-generation routers were traditional computers with switching under direct control of the CPU:
1. The packet is copied from the input port across the shared bus into RAM
2. The packet is copied from RAM across the shared bus to the output port
Simple design, and all ports share queue memory in RAM.
– Speed limited by the CPU, which must process every packet
[Image: N. McKeown]

Switching via shared bus
The datagram moves from input port memory to output port memory via a shared bus.
– e.g. Cisco 5600: a 32 Gbit/s bus yields sufficient speed for access routers
Eliminates the CPU bottleneck, but:
– Bus contention: switching speed is limited by the shared bus bandwidth
– CPU speed is still a factor
[Image: N. McKeown]

Switched interconnection fabrics
Shared buses divide bandwidth among the contenders.
– Electrical reason: the speed of a bus is limited by its number of connectors
Crossbar interconnect:
– Up to n^2 crosspoints join n inputs to n outputs
– Multiple input ports can then communicate simultaneously with multiple output ports
[Image: N. McKeown]

Switching via crossbar
The datagram moves from input port memory to output port memory via the crossbar.
– e.g. one Cisco family: 60 Gbit/s, sufficient speed for core routers
Eliminates the bus bottleneck; custom hardware forwarding engines replace general-purpose CPUs.
– Requires an algorithm to determine the crossbar configuration
– Requires n× output port speedup
[Image: N. McKeown – crossbar]

Where does queuing occur?
A central issue in switch design; three choices:
– At input ports (input queuing)
– At output ports (output queuing)
– Some combination of the above
Let n = max(# input ports, # output ports).

Output queuing
No buffering at input ports, therefore:
– Multiple packets may arrive for an output port in one cycle, which requires a switch fabric speedup of n
– The output port buffers all packets
Drawback: the output port speedup required is n.

Input queuing
Input ports buffer packets; at most one packet per cycle is sent to any output port.

Input queuing: Head-of-line blocking
With one packet per cycle sent to any output, a packet can be blocked behind the packet ahead of it in its input queue, despite available capacity at the output ports and in the switch fabric. This reduces the throughput of the switch.
[Figure: blue packet blocked at the head of its input queue]

Virtual output queuing
On each input port, place one input queue per output port, and use a crossbar switch fabric.
– The input port places each packet in the virtual output queue (VOQ) corresponding to the output port chosen by the forwarding decision
– No head-of-line blocking
– All ports (input and output) operate at the same rate
– We need to schedule the fabric: choose which VOQs get service when
[Figure: three output ports, each fed by its own VOQ at every input]
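A VOQ input port is just an array of per-output queues; a minimal sketch (class and method names are illustrative, not from the slides):

```python
from collections import deque

class InputPort:
    """One input port holding a virtual output queue per output port."""
    def __init__(self, n_outputs):
        self.voq = [deque() for _ in range(n_outputs)]

    def enqueue(self, packet, out_port):
        # Forwarding decision already made: queue the packet by its output port.
        self.voq[out_port].append(packet)

    def backlogged(self):
        # Outputs this port will request service for in the next scheduling round.
        return [j for j, q in enumerate(self.voq) if q]

port = InputPort(3)
port.enqueue("a", 2)
port.enqueue("b", 0)
```

The backlogged() set is exactly what this input port sends as its requests in the crossbar scheduling algorithm that follows.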

Virtual output queuing
[Image: N. McKeown]


Crossbar scheduling algorithm: goals
1. High throughput
– Low queue occupancy in the VOQs
– Sustain 100% of rate R on all n inputs and n outputs
2. Starvation-free
– Don't allow any virtual output queue to go unserved indefinitely
3. Speed of execution
– Should not be the performance bottleneck in the router
4. Simplicity of implementation
– Will likely be implemented on a special-purpose chip

iSLIP algorithm: Introduction
Model the problem as a bipartite graph:
– Input port = graph node on the left
– Output port = graph node on the right
– Edge (i, j) indicates packets in VOQ Q(i, j) at input port i
A schedule is then a bipartite matching: a set of edges, no two of which share a node.
[Figure: request graph and a corresponding bipartite matching]

iSLIP: High-level overview
For simplicity, we will look at “single-iteration” iSLIP: one iteration per packet time. Each iteration consists of three phases:
1. Request phase: all inputs send requests to outputs
2. Grant phase: each output grants one input's request
3. Accept phase: each input chooses one output's grant to accept

iSLIP: Accept and grant counters
Each input port i has a round-robin accept counter a_i; each output port j has a round-robin grant counter g_j.
a_i and g_j count round-robin: 1, 2, 3, …, n, 1, 2, …
[Figure: counters a_1 … a_4 and g_1 … g_4 on a 4 × 4 switch]

iSLIP: One iteration in detail
1. Request phase
– Each input sends a request to all outputs for which it has backlogged packets
2. Grant phase
– Output j grants the next request its grant counter g_j points to
3. Accept phase
– Input i accepts the next grant its accept counter a_i points to
– For every input k that accepted a grant, then increment a_k (and the matched output's grant counter) to one past the matched port
[Figure: counters a_1 … a_4 and g_1 … g_4 during the three phases]
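One request–grant–accept round can be sketched directly from the description above (0-based pointers here; as in iSLIP, a pointer advances only when its grant is accepted):

```python
def islip_iteration(requests, grant_ptr, accept_ptr, n):
    """One round of single-iteration iSLIP.
    requests[i]: set of outputs input i has backlogged packets for.
    grant_ptr[j], accept_ptr[i]: 0-based round-robin pointers (mutated on acceptance)."""
    # Grant phase: output j grants the first requesting input at or after grant_ptr[j].
    grants = {}
    for j in range(n):
        for k in range(n):
            i = (grant_ptr[j] + k) % n
            if j in requests[i]:
                grants[j] = i
                break
    # Accept phase: input i accepts the first granting output at or after accept_ptr[i].
    granted_by = {}
    for j, i in grants.items():
        granted_by.setdefault(i, []).append(j)
    match = []
    for i, outs in granted_by.items():
        for k in range(n):
            j = (accept_ptr[i] + k) % n
            if j in outs:
                match.append((i, j))
                # Pointers advance one past the matched port, only on acceptance.
                accept_ptr[i] = (j + 1) % n
                grant_ptr[j] = (i + 1) % n
                break
    return match
```

Running two rounds on the 2 × 2 example that follows reproduces the slides' behavior: round 1 matches only (input 1, output 1); round 2 is a full match.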

iSLIP example
Two inputs, two outputs:
– Input 1 always has traffic for outputs 1 and 2
– Input 2 always has traffic for outputs 1 and 2
All accept and grant counters are initialized to 1.

iSLIP example: Packet time 1
– Request phase: both inputs request both outputs
– Grant phase: each output grants input 1 (both grant counters point to 1)
– Accept phase: input 1 accepts output 1; a_1 and g_1 advance to 2, while output 2's unaccepted grant leaves g_2 at 1

iSLIP example: Packet time 2
– Request phase: both inputs again request both outputs
– Grant phase: output 1 grants input 2 (g_1 = 2); output 2 grants input 1 (g_2 = 1)
– Accept phase: input 1 accepts output 2 and input 2 accepts output 1 – a full match, and all four counters advance

iSLIP example: Packet time 3
The grant counters are now desynchronized, so every subsequent packet time produces a full match: the two inputs alternate between the two outputs, and the switch runs at 100% throughput.

Implementing iSLIP
– Request phase: the inputs assert a request vector (r_11, r_12, r_21, r_22, …), one bit per input–output pair
– Grant phase: one grant arbiter per output selects among the asserted requests
– Accept phase: one accept arbiter per input selects among the grants
– The result is a decision vector describing the matching
[Figure: 2 × 2 case with r_11 = r_12 = r_21 = r_22 = 1 flowing through grant arbiters and accept arbiters to the decision vector]

Implementing iSLIP: General circuit
[Figure: the general n × n circuit of grant arbiters feeding accept arbiters]

Implementing iSLIP: Inside an arbiter
[Figure: a programmable priority encoder selects the highest-priority asserted request; an incrementer advances the round-robin pointer]
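The arbiter's behavior (a round-robin programmable priority encoder) can be modeled in a few lines; a sketch, with requests packed as a bit vector:

```python
def rr_arbiter(request_bits, pointer, n):
    """Grant the first asserted request at or after `pointer`, wrapping round-robin."""
    for k in range(n):
        i = (pointer + k) % n
        if (request_bits >> i) & 1:
            return i          # granted port
    return None               # no requests asserted
```

In hardware this is a single combinational block: the pointer programs which request gets highest priority, and the same circuit serves as both a grant arbiter and an accept arbiter.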


Switching via self-routing fabric
Can we achieve high throughput without a crossbar scheduling algorithm? One way: self-routing fabrics.
– The input port appends a self-routing header to the packet
– The self-routing header contains the output port
– The output port removes the self-routing header
Example: the Batcher–Banyan architecture (a Batcher sorter feeding a Banyan network)

Self-routing fabric example: Banyan network
Composed of 2 × 2 comparator elements. A comparator element switches its output connections so that a 0-tagged packet exits the top and a 1-tagged packet exits the bottom.
A comparator blocks when its two packets carry the same header bit value.

Self-routing fabric example: Banyan network
Organized in stages, and designed to deliver a packet with self-routing header x to output x.
– Self-routing header: use the i-th most significant bit for the i-th stage
– The first stage moves packets to the correct upper or lower half based on the 1st bit (0 ↗, 1 ↘)
[Figure: Banyan network with four arriving packets; stages 1, 2, 3 and labeled outputs]

[Figure: after stage 1, packets sit in the correct half of the fabric (the 0 half or the 1 half)]

The 2nd stage moves packets to the correct quadrant based on the 2nd bit (0 ↗, 1 ↘).
[Figure: after stage 2, packets sit in the correct quadrant of their half]

The 3rd stage moves packets to the correct output based on the 3rd bit (0 ↗, 1 ↘).
Fact: the Banyan network is blocking-free if the packets are presented in sorted, ascending order.
[Figure: all four packets delivered to their labeled outputs]
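The per-stage rule above can be modeled with bit operations: stage i pins the i-th most significant bit of the packet's position to the i-th header bit. A sketch for an 8-output (3-stage) network that traces one packet's path and ignores contention:

```python
def banyan_route(n_bits, src, dst):
    """Trace one packet through a 2**n_bits-output Banyan-style network.
    Stage i sets the i-th most significant bit of the position to the
    i-th header bit (0 = upper output, 1 = lower output)."""
    pos, path = src, [src]
    for i in range(n_bits):
        mask = 1 << (n_bits - 1 - i)
        bit = dst & mask                # i-th most significant header bit
        pos = (pos & ~mask) | bit      # stage i fixes that bit of the position
        path.append(pos)
    return path

print(banyan_route(3, 5, 2))   # → [5, 1, 3, 2]: half, quadrant, then output 2
```

Whatever the input port, after n_bits stages the packet sits at output dst, which is the self-routing property; contention arises when two packets need the same internal link, hence the sorted-presentation requirement.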

NEXT TIME Content Delivery: HTTP, Web Caching, and Content Distribution Networks (KJ) Pre-reading: P & D, Section 9.4.3