Professor Yashar Ganjali Department of Computer Science University of Toronto
CSC 2203 – Packet Switch and Network Architectures, University of Toronto – Fall 2012
Today's Outline
- What this course is about
- Logistics: course structure, assignments, evaluation
- What is expected from you
- What you want to know
- Overview: packet switching systems; topics and problems studied in this course
Outline
- Graduate-level course
- Packet switching systems: Internet routers, Ethernet switches
- Architectures and related problems
- This year: software-defined networks
- Theory + practice: switching systems are simple enough for us to prove things about them, yet complex enough to be useful in practice.
Outline – Part I
Introduction
- What is a packet switching system?
- Evolution of the Internet and Internet routers
- Basic architectural components
- Some example architectures
Outline – Part II
Software-defined networks
- Innovation in computer networks
- Control plane vs. data plane
- Challenges and promises
Outline – Part III
Output Queued Switches
- Emphasis on deterministic analysis
- OQ as the simplest and ideal architecture
- Output queueing and shared-memory switches
- Packet arrival processes: (σ, ρ)-constrained arrivals, leaky buckets, Bernoulli arrivals, bursty arrivals, adversaries
- Providing bandwidth and delay guarantees: scheduling, fairness, Fair Queueing, Generalized Processor Sharing, and Deficit Round Robin
- Practical difficulties: when output queued switches are impractical; memory bandwidth and capacity scaling
- Some approaches: emulating output queued switches
- Parallel packet buffers as standalone shared memory, with design examples
- Routers with a single stage of buffering and constraint sets; Parallel Shared Memory Routers, Distributed Shared Memory Routers, and Parallel Packet Switches
- Output link scheduling in a Distributed Shared Memory router
- Combined input and output queued (CIOQ) switches; stable marriage matchings
Outline – Part IV
Input Queued Switches
- Emphasis on probabilistic analysis
- Definition of an IQ switch with a single FCFS queue
- Switching fabrics, crossbars
- Head-of-line blocking; the balls-and-bins model
- Virtual output queues and crossbar schedulers
- Bipartite matchings: maximum size matchings, maximum weight matchings, maximal matchings
- Definitions of 100% throughput
- Lyapunov functions
- When traffic is uniform: simple round-robin (RR) and random matchings
- When the traffic matrix is known: Birkhoff–von Neumann decomposition
- When traffic is not known: heuristics (PIM, iSLIP, WFA)
Outline – Part V
Other Switch Architectures
- Buffered crossbars
- Multistage switches: Clos networks
- 2-stage switches: randomized, deterministic
Miscellaneous Interesting Problems
- Software-defined networks
- Address lookup: exact matches, longest prefix matches, performance metrics, hardware and software solutions
- Cell switching versus packet switching
- Buffer sizing: small buffers, tiny buffers
- Packet classification
Outline – Fundamental Tools
Fundamentals
- Introduction to probability
- Poisson process
- Discrete- and continuous-time Markov chains
- Basic queueing theory: M/M/1, M/G/1, Little's result, PASTA
Logistics
- Office hours: Tue. 3-4 PM and Wed. 3-4 PM, BA5238, or by appointment
- Course web page: please check regularly for announcements.
- Class mailing list: send me an email to be added to the list. Any (course-related!) question posted to the list will be answered within 48 hours.
Logistics
- Prerequisites: any introductory course on networking; algorithms; basic probability theory
- Papers: URLs will be provided on the class web page. Please read the suggested papers BEFORE class.
Logistics
Grading
- Class participation, notes, and discussions: 10%
- Paper presentation: 30%
- Final project: 60%
  - Proposal: 5% - … pages
  - Intermediate report: 10% - 3 pages
  - Presentation: 15% - last week of classes
  - Final report: 30% - 6 pages
Deadlines
- 5% mark deduction for each day of delay, up to 4 days.
- Exception: the final report deadline is hard.
Logistics
Final Project
- Groups of two students
- Project topic: choose from the offered list, or talk to me and define your own
Logistics
Academic Integrity
- Projects: avoid plagiarism. I'm interested in what YOU think.
- Please read: Guideline for avoiding plagiarism; Advice about academic offenses
Logistics
Accessibility Needs
The University of Toronto is committed to accessibility. If you require accommodations or have any accessibility concerns, please visit as soon as possible.
Acknowledgements
Special thanks to Prof. Nick McKeown and Prof. Balaji Prabhakar.
Questions?
What else would you like to know about this course?
Introduction
Background
- What is a router?
- Why do we need faster routers?
- Why are they hard to build?
Architectures and techniques
- The evolution of router architecture
- IP address lookup
- Packet buffering
- Switching
What is Routing?
[Figure: hosts A, B, C connected through routers R1-R5 to hosts D, E, F; a router's table maps each destination (D, E, F) to a next hop.]
What is Routing?
[Figure: the same network and next-hop table, together with the IPv4 header that each router examines.]
IPv4 header (20 bytes without options), one 32-bit word per row:
| Ver | HLen | T.Service | Total Packet Length |
| Fragment ID | Flags | Fragment Offset |
| TTL | Protocol | Header Checksum |
| Source Address |
| Destination Address |
| Options (if any) |
followed by Data.
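To make the 20-byte header layout above concrete, here is a minimal Python sketch that unpacks the fixed fields with the standard `struct` and `socket` modules. The function name `parse_ipv4_header` and the sample header bytes are illustrative assumptions; options are not parsed.

```python
import socket
import struct

def parse_ipv4_header(raw: bytes) -> dict:
    """Unpack the fixed 20-byte IPv4 header (options, if any, are ignored)."""
    if len(raw) < 20:
        raise ValueError("need at least 20 bytes")
    (ver_hlen, tos, total_len, frag_id, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_hlen >> 4,
        "hlen_bytes": (ver_hlen & 0x0F) * 4,     # HLen counts 32-bit words
        "tos": tos,
        "total_length": total_len,
        "fragment_id": frag_id,
        "flags": flags_frag >> 13,               # top 3 bits
        "fragment_offset": flags_frag & 0x1FFF,  # low 13 bits, in 8-byte units
        "ttl": ttl,
        "protocol": proto,
        "checksum": checksum,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }

# Hypothetical header: 10.0.0.1 -> 10.0.0.2, TTL 64, protocol 6 (TCP), checksum left as 0.
sample = bytes.fromhex("450000280001400040060000" "0a000001" "0a000002")
hdr = parse_ipv4_header(sample)
print(hdr["version"], hdr["ttl"], hdr["src"], hdr["dst"])
```

The `!` in the format string selects network (big-endian) byte order, which is how the header is laid out on the wire.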
What is Routing?
[Figure: hosts A, B, C and D, E, F connected through routers R1-R5.]
Points of Presence (POPs)
[Figure: hosts A-F connected through points of presence POP1-POP8.]
Where High Performance Routers are Used
[Figure: a backbone network of routers R1-R16, with links labeled 2.5 Gb/s.]
What a Router Looks Like
| Router | Dimensions (H x W x D) | Capacity | Power |
| Cisco GSR 12416 | 6 ft x 19 in x 2 ft | 160 Gb/s | 4.2 kW |
| Juniper M160 | 3 ft x 19 in x 2.5 ft | 80 Gb/s | 2.6 kW |
Basic Architectural Components of an IP Router
- Control plane: routing protocols, routing table
- Datapath (per-packet processing): forwarding table, switching
Per-packet Processing in an IP Router
1. Accept a packet arriving on an incoming link.
2. Look up the packet's destination address in the forwarding table to identify the outgoing port(s).
3. Manipulate the packet header: e.g., decrement the TTL and update the header checksum.
4. Send the packet to the outgoing port(s).
5. Buffer the packet in the queue.
6. Transmit the packet onto the outgoing link.
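Step 3 can be sketched in Python as below: decrement the TTL and recompute the header checksum as the one's-complement sum of 16-bit words (RFC 1071). This is a simplified illustration under stated assumptions, not a router's actual datapath; the function names and sample header are hypothetical, and hardware typically updates the checksum incrementally (RFC 1624) rather than recomputing it.

```python
def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of 16-bit words, complemented (RFC 1071)."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:                      # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def forward_header(header: bytearray) -> bytearray:
    """Return a copy of the header with TTL decremented and checksum rewritten."""
    h = bytearray(header)
    if h[8] <= 1:                           # TTL is byte 8
        raise ValueError("TTL expired: packet would be dropped")
    h[8] -= 1
    h[10:12] = b"\x00\x00"                  # zero the checksum field (bytes 10-11)...
    h[10:12] = ipv4_checksum(bytes(h)).to_bytes(2, "big")  # ...then recompute it
    return h

# Hypothetical 20-byte header with TTL 64; fill in a valid checksum first.
hdr = bytearray.fromhex("450000280001400040060000" "0a000001" "0a000002")
hdr[10:12] = ipv4_checksum(bytes(hdr)).to_bytes(2, "big")
out = forward_header(hdr)
print(out[8], ipv4_checksum(bytes(out)))   # TTL 63; a valid header sums to 0
```

Running the checksum over a header that already contains a valid checksum yields 0, which is a convenient way to verify the rewrite.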
Generic Router Architecture
[Figure: header processing (look up the IP address in an address table of ~1M prefixes held in off-chip DRAM, mapping IP address to next hop; then update the header), followed by queueing the packet in buffer memory (~1M packets, off-chip DRAM).]
Generic Router Architecture
[Figure: N linecards, each with its own header processing (IP address lookup in an address table, header update) and a buffer manager with buffer memory.]
Why do we Need Faster Routers?
1. To prevent routers from becoming the bottleneck in the Internet.
2. To increase POP capacity, and to reduce cost, size, and power.
Why we Need Faster Routers 1: To prevent routers from being the bottleneck
[Figure: fiber capacity (Gb/s, TDM and DWDM) versus packet processing power over time. Link speed grows 2x every 7 months; packet processing power grows 2x every 18 months. Source: SPEC95Int & David Miller, Stanford.]
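The gap between the two growth curves compounds quickly. A small worked calculation in Python, assuming the doubling periods on the slide hold steadily over an illustrative 5-year horizon (the horizon is an assumption):

```python
def growth(months: float, doubling_months: float) -> float:
    """Growth factor after `months` if a quantity doubles every `doubling_months`."""
    return 2 ** (months / doubling_months)

years = 5                               # illustrative horizon (assumption)
link = growth(12 * years, 7)            # link speed: 2x / 7 months
processing = growth(12 * years, 18)     # packet processing power: 2x / 18 months

print(f"link speed x{link:.0f}, processing x{processing:.1f}, gap x{link / processing:.0f}")
```

Under these assumptions, link speed grows by roughly 380x while processing power grows by about 10x, so the per-packet processing budget shrinks by more than an order of magnitude.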
Why we Need Faster Routers 2: To reduce cost, power & complexity of POPs
[Figure: a POP built from large routers compared with a POP built from many smaller routers.]
- Ports: price >$100k, power >400W.
- It is common for 50-60% of ports to be used for interconnection.
Why are Fast Routers Difficult to Make?
- It's hard to keep up with Moore's Law: the bottleneck is memory speed.
- Memory speed is not keeping up with Moore's Law.
[Figure: speed of commercial DRAM grows about 1.1x every 18 months, versus 2x every 18 months for Moore's Law.]
- Moore's Law is too slow: routers need to improve faster than Moore's Law.
Router Performance Exceeds Moore's Law
Growth in capacity of commercial routers:
- 1992: ~2 Gb/s
- 1995: ~10 Gb/s
- 1998: ~40 Gb/s
- 2001: ~160 Gb/s
- 2003: ~640 Gb/s
- 2007: ~4 Tb/s
- 2010: ~16 Tb/s
Average growth rate: just over 2x / 18 months (about 2.1x from 1992 to 2010), i.e., faster than Moore's Law.
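The average growth rate can be checked directly from the capacities listed above. The Python sketch below takes the geometric mean growth per 18-month period between two years; treating the "~" values as exact is an assumption.

```python
# Capacities of commercial routers from the list above, in Gb/s.
capacity = {1992: 2, 1995: 10, 1998: 40, 2001: 160, 2003: 640, 2007: 4000, 2010: 16000}

def growth_per_18_months(y0: int, y1: int) -> float:
    """Average (geometric) growth factor per 18 months between years y0 and y1."""
    ratio = capacity[y1] / capacity[y0]
    periods = (y1 - y0) * 12 / 18       # number of 18-month periods
    return ratio ** (1 / periods)

print(f"{growth_per_18_months(1992, 2010):.2f}x per 18 months")
```

From 1992 to 2010 capacity grows 8000x over twelve 18-month periods, so the average is 8000^(1/12), about 2.1x per 18 months.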
Outline
Background
- What is a router?
- Why do we need faster routers?
- Why are they hard to build?
Architectures and techniques
- The evolution of router architecture
- IP address lookup
- Packet buffering
- Switching
First Generation Routers
[Figure: line interfaces (MAC) attached to a shared backplane, with a central CPU, route table, and buffer memory; typically <0.5 Gb/s aggregate capacity.]
Second Generation Routers
[Figure: a central CPU with the route table and buffer memory; line cards, each with a MAC, local buffer memory, and a forwarding cache; typically <5 Gb/s aggregate capacity.]
Third Generation Routers
[Figure: line cards, each with a MAC, local buffer memory, and a forwarding table, plus a CPU card holding the routing table, all connected by a switched backplane; typically <50 Gb/s aggregate capacity.]
Fourth Generation Routers/Switches
Optics inside a router for the first time.
[Figure: linecards connected to a switch core by optical links 100s of metres long; Tb/s routers in development.]
Software Defined Networks
[Figure: four network devices, each with its own control plane and data plane.]
Software Defined Networks – Cont'd
[Figure: a controller running on a PC speaks the OpenFlow protocol, over SSL, to an OpenFlow switch that implements the data plane.]
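The split between a remote controller and a switch's data plane can be illustrated with a toy match-action table in Python. This is only a conceptual sketch: it is not the OpenFlow wire protocol or any real controller API, and the class names, header field names (such as `ip_dst`), and action strings are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class FlowEntry:
    priority: int
    match: dict          # header field -> required value; absent fields are wildcards
    action: str          # hypothetical action strings, e.g. "output:2" or "drop"
    packets: int = 0     # per-entry packet counter

@dataclass
class FlowTable:
    entries: list = field(default_factory=list)

    def install(self, entry: FlowEntry) -> None:
        """Control plane: the controller adds a rule to the switch."""
        self.entries.append(entry)
        self.entries.sort(key=lambda e: e.priority, reverse=True)

    def process(self, pkt: dict) -> str:
        """Data plane: apply the highest-priority matching entry."""
        for e in self.entries:
            if all(pkt.get(k) == v for k, v in e.match.items()):
                e.packets += 1
                return e.action
        return "send_to_controller"   # table miss in this sketch: punt to the controller

table = FlowTable()
table.install(FlowEntry(priority=10, match={"ip_dst": "10.0.0.2"}, action="output:2"))
table.install(FlowEntry(priority=20, match={"ip_dst": "10.0.0.2", "tcp_dst": 22}, action="drop"))

print(table.process({"ip_dst": "10.0.0.2", "tcp_dst": 80}))   # output:2
print(table.process({"ip_dst": "10.0.0.2", "tcp_dst": 22}))   # drop
print(table.process({"ip_dst": "10.0.0.9"}))                  # send_to_controller
```

The switch only matches and acts. All decisions about which rules exist are made by the controller via `install`, which is the separation the figure depicts.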
Generic Router Architecture
[Figure: the generic router architecture, highlighting the IP address lookup (address table) stage on each linecard.]
IP Address Lookup
Why it's thought to be hard:
1. It's not an exact match: it's a longest prefix match.
2. The table is large: about 400,000 entries today, and growing.
3. The lookup must be fast: about 30 ns for a 10 Gb/s line.
IP Lookups find Longest Prefixes
[Figure: several example prefixes of different lengths matching a destination address.]
Routing lookup: find the longest matching prefix (aka the most specific route) among all prefixes that match the destination address.
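Longest prefix matching can be sketched with a one-bit-per-level binary trie: walk the destination address bit by bit and remember the last next hop seen. The Python below assumes IPv4 only, and the prefixes and next-hop names are hypothetical. Production routers use faster structures (multibit tries, TCAMs), but the matching rule is the same.

```python
import ipaddress

class BinaryTrie:
    """Longest-prefix-match table for IPv4, one address bit per trie level."""

    def __init__(self):
        self.root = {}                       # keys: 0/1 for children, "hop" for a route

    def insert(self, prefix: str, next_hop: str) -> None:
        net = ipaddress.ip_network(prefix)
        bits = int(net.network_address)
        node = self.root
        for i in range(net.prefixlen):
            node = node.setdefault((bits >> (31 - i)) & 1, {})
        node["hop"] = next_hop

    def lookup(self, addr: str):
        bits = int(ipaddress.ip_address(addr))
        node, best = self.root, self.root.get("hop")   # default route, if any
        for i in range(32):
            node = node.get((bits >> (31 - i)) & 1)
            if node is None:
                break
            best = node.get("hop", best)     # a longer match overrides a shorter one
        return best

fib = BinaryTrie()                           # hypothetical forwarding table
fib.insert("0.0.0.0/0", "default")
fib.insert("10.0.0.0/8", "R1")
fib.insert("10.1.0.0/16", "R2")
fib.insert("10.1.2.0/24", "R3")
print(fib.lookup("10.1.2.3"), fib.lookup("10.1.9.9"), fib.lookup("192.168.0.1"))
```

An address can match several prefixes (here 10.1.2.3 matches /0, /8, /16, and /24), and the lookup returns the most specific one. That is why this is harder than an exact-match hash lookup.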
Address Tables are Large
[Figure: growth in address table size over time. (Source: …)]
Lookups Must be Fast
[Table: line rate by year and the corresponding rate of 40 B packets (Mpkt/s); e.g., a 40 Gb/s line carries 125 Mpkt/s.]
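The lookup budget follows from the line rate and the minimum packet size: with back-to-back 40-byte packets, each lookup must complete within one packet time. The Python sketch below computes this. The list of line rates (622 Mb/s to 40 Gb/s) is an illustrative assumption, not the slide's original table.

```python
def lookup_budget(line_rate_bps: float, packet_bytes: int = 40):
    """Return (Mpkt/s, ns per lookup) for back-to-back packets of `packet_bytes`."""
    pkt_per_s = line_rate_bps / (packet_bytes * 8)
    return pkt_per_s / 1e6, 1e9 / pkt_per_s

# Illustrative line rates (assumption).
for name, rate in [("622 Mb/s", 622e6), ("2.5 Gb/s", 2.5e9), ("10 Gb/s", 10e9), ("40 Gb/s", 40e9)]:
    mpps, ns = lookup_budget(rate)
    print(f"{name:>9}: {mpps:7.2f} Mpkt/s, {ns:6.1f} ns per lookup")
```

At 10 Gb/s the budget is 32 ns per lookup, consistent with "about 30 ns" above. At 40 Gb/s it drops to 8 ns (125 Mpkt/s).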
Generic Router Architecture
[Figure: the generic router architecture, highlighting the packet queueing stage: a buffer manager and buffer memory on each linecard.]
Fast Packet Buffers
Example: 40 Gb/s packet buffer
- Size = RTT*BW = 10 Gb; 40-byte packets
- Write rate R: 1 packet every 8 ns; read rate R: 1 packet every 8 ns
[Figure: a buffer manager in front of the buffer memory.]
- Use SRAM? Pro: fast enough random access time. Con: density too low to store 10 Gb of data.
- Use DRAM? Pro: high density means we can store the data. Con: too slow (50 ns random access time).
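The numbers in this example can be reproduced with a few lines of arithmetic. The Python sketch below assumes an RTT of 0.25 s (implied by RTT*BW = 10 Gb at 40 Gb/s) and a single memory that must absorb both the write and the read stream.

```python
# Assumed parameters for the 40 Gb/s example.
line_rate_bps = 40e9      # R
rtt_s = 0.25              # RTT implied by RTT*BW = 10 Gb at 40 Gb/s (assumption)
pkt_bits = 40 * 8         # minimum-size 40-byte packets
dram_access_ns = 50       # DRAM random access time from the slide

buffer_bits = rtt_s * line_rate_bps                # 1e10 bits = 10 Gb
pkt_time_ns = pkt_bits * 1e9 / line_rate_bps       # one arrival (and one departure) every 8 ns
access_budget_ns = pkt_time_ns / 2                 # a write AND a read per 8 ns -> 4 ns each
dram_slowdown = dram_access_ns / access_budget_ns  # how many times too slow random-access DRAM is

print(f"buffer = {buffer_bits / 1e9:.0f} Gb, one packet every {pkt_time_ns:.0f} ns")
print(f"one access every {access_budget_ns:.0f} ns needed; DRAM is {dram_slowdown:.1f}x too slow")
```

Under these assumptions, the memory must complete a random access every 4 ns, more than an order of magnitude faster than DRAM's 50 ns.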
Generic Router Architecture
[Figure: N linecards whose packets are written directly into output queues 1..N; each output's buffer memory must accept data at up to N times the line rate.]
Generic Router Architecture
[Figure: N linecards with packet buffers at the inputs; a scheduler decides which buffered packets cross to outputs 1..N.]
A Router with Output Queues
The best that any queueing system can achieve.
A Router with Input Queues
[Figure: performance of input queues, limited by head-of-line blocking, compared with the best that any queueing system can achieve.]
Head of Line Blocking
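Head-of-line blocking can be observed with a short Monte Carlo simulation. The assumptions below are the classic ones behind the 58% figure attributed to [Karol, 1987]: every input is always backlogged with a single FIFO queue, each packet's destination is uniform over the outputs, and each output serves one contending head-of-line packet chosen at random per time slot. The function name and parameters are hypothetical.

```python
import random

def hol_saturation_throughput(n: int, slots: int, seed: int = 0) -> float:
    """Estimate saturation throughput of an n x n input-queued switch with FIFO inputs."""
    rng = random.Random(seed)
    hol = [rng.randrange(n) for _ in range(n)]   # destination of each input's HOL packet
    delivered = 0
    for _ in range(slots):
        contenders = {}                          # output -> inputs whose HOL packet wants it
        for inp, out in enumerate(hol):
            contenders.setdefault(out, []).append(inp)
        for inputs in contenders.values():
            winner = rng.choice(inputs)          # each output serves one HOL packet
            delivered += 1
            hol[winner] = rng.randrange(n)       # the next packet moves to the head
        # Losing inputs keep their HOL packet; packets behind it are blocked.
    return delivered / (n * slots)

print(round(hol_saturation_throughput(n=32, slots=5000), 3))
```

For a 32-port switch the estimate lands close to 0.59, approaching 2 - sqrt(2) (about 58.6%) as N grows. Packets behind a blocked head are stuck even when their own output is idle.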
Virtual Output Queues
A Router with Virtual Output Queues
The best that any queueing system can achieve.
The Evolution of Switching
Theory:
- Input Queueing (IQ): 58% [Karol, 1987]
- IQ + VOQ, maximum weight matching: 100% [McKeown et al., 1995]
- Different weight functions, incomplete information, pipelining: 100% [Various]
- Randomized algorithms: 100% [Tassiulas, 1998]
- IQ + VOQ, maximal size matching, speedup of two: 100% [Dai & Prabhakar, 2000]
Practice:
- Input Queueing (IQ)
- IQ + VOQ, sub-maximal size matching, e.g., PIM, iSLIP
- Various heuristics, distributed algorithms, and amounts of speedup
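As an example of the practical schedulers listed above, here is a sketch of Parallel Iterative Matching (PIM). Each iteration runs request, grant, and accept phases with random choices. The function name and data layout are assumptions. iSLIP follows the same request-grant-accept structure but replaces the random choices with round-robin pointers.

```python
import random

def pim(requests: dict, n_outputs: int, iterations: int = 3, seed: int = 0) -> dict:
    """Parallel Iterative Matching sketch.

    requests: input -> set of outputs for which that input has a non-empty VOQ.
    Returns a matching {input: output}.
    """
    rng = random.Random(seed)
    match_in, matched_out = {}, set()
    for _ in range(iterations):
        # Request + grant: each unmatched output picks one requesting unmatched input.
        grants = {}                                   # input -> outputs that granted it
        for out in range(n_outputs):
            if out in matched_out:
                continue
            reqs = [i for i, outs in requests.items() if i not in match_in and out in outs]
            if reqs:
                grants.setdefault(rng.choice(reqs), []).append(out)
        if not grants:
            break                                     # no unmatched pair left to add
        # Accept: each input picks one of its grants; the other grants go unused.
        for inp, outs in grants.items():
            out = rng.choice(outs)
            match_in[inp] = out
            matched_out.add(out)
    return match_in

# Hypothetical 3x3 switch: VOQ occupancy expressed as the set of requested outputs.
voq_requests = {0: {0, 1}, 1: {0}, 2: {1, 2}}
print(pim(voq_requests, n_outputs=3))
```

Each iteration adds at least one match while any unmatched input still requests an unmatched output. So, given enough iterations, PIM ends with a maximal (not necessarily maximum) matching.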
Characteristics of an OQ Switch
- Arriving packets are immediately written into the output queue, without intermediate buffering.
- The flow of packets to one output does not affect the flow to another output.
- An OQ switch is work conserving: an output line is always busy when there is a packet in the switch for it.
- OQ switches have the highest throughput and the lowest average delay.
- We will also see that the rate of individual flows and the delay of packets can be controlled.
Simple Model of Output Queued Switch
[Figure: router R1 with links 1-4, each split into an ingress and an egress side, all operating at link rate R.]
The Shared Memory Switch
[Figure: links 1 to N, each with an ingress and an egress side at rate R, all sharing a single, physical memory device.]
Characteristics of Shared Memory Switch
Memory Bandwidth
Basic OQ switch:
- Consider an OQ switch with N different physical memories, and all links operating at rate R bits/s.
- In the worst case, packets may arrive continuously from all inputs, destined to just one output.
- That output's memory must then absorb N writes and one read, so the maximum memory bandwidth requirement for each memory is (N+1)R bits/s.
Shared memory switch:
- The single memory must absorb N writes and N reads, so the maximum memory bandwidth requirement for the memory is 2NR bits/s.
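The two bandwidth requirements above can be compared numerically. The Python sketch below uses a hypothetical 16-port switch with 10 Gb/s links; the port count and link rate are assumptions chosen only for illustration.

```python
def oq_memory_bw(n: int, r: float) -> float:
    """Per-output memory in an OQ switch: up to N writes plus 1 read, each at rate R."""
    return (n + 1) * r

def shared_memory_bw(n: int, r: float) -> float:
    """Single shared memory: N writes and N reads, each at rate R."""
    return 2 * n * r

n, r = 16, 10e9   # hypothetical 16-port switch with 10 Gb/s links
print(oq_memory_bw(n, r) / 1e9, shared_memory_bw(n, r) / 1e9)  # 170.0 320.0 (Gb/s)
```

Both requirements grow linearly with N, which is why memory bandwidth, not capacity, limits how far these designs scale.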
How fast can we make a centralized shared memory switch?
[Figure: ports 1..N sharing a memory built from 5 ns SRAM with a 200-byte-wide bus.]
- 5 ns per memory operation: 200 bytes x 8 bits / 5 ns = 320 Gb/s of memory bandwidth
- Two memory operations per packet (one write, one read)
- Therefore, up to 160 Gb/s
- In practice, closer to 80 Gb/s
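The 160 Gb/s bound follows directly from the bus width and access time. The Python sketch below assumes every memory operation moves a full 200-byte bus word. In practice, packets that do not fill the word waste part of each access, which is one plausible reason real designs land closer to 80 Gb/s.

```python
bus_bits = 200 * 8        # 200-byte-wide memory bus
access_ns = 5             # 5 ns SRAM: one memory operation per 5 ns
ops_per_packet = 2        # each packet is written once and read once

memory_bw_gbps = bus_bits / access_ns                   # bits per ns == Gb/s
switch_capacity_gbps = memory_bw_gbps / ops_per_packet
print(memory_bw_gbps, switch_capacity_gbps)             # 320.0 160.0
```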