Computer Networks II: Router Architecture
Department of Computer and IT Engineering, University of Kurdistan
By: Dr. Alireza Abdollahpouri
What is Routing and Forwarding?
[Figure: example topology with end hosts A–F interconnected by routers R1–R5]
Introduction: History …
Introduction: History … and future trends!
What a Router Looks Like
Cisco GSR 12416: 19 in wide, 6 ft tall, 2 ft deep; capacity 160 Gb/s; power 4.2 kW.
Juniper M160: 19 in wide, 3 ft tall, 2.5 ft deep; capacity 80 Gb/s; power 2.6 kW.
Packet Processing Functions (basic network system functionality): address lookup, packet forwarding and routing, fragmentation and re-assembly, security, queuing, scheduling, packet classification, traffic measurement, …
Per-packet Processing in a Router:
1. Accept a packet arriving on an ingress line.
2. Look up the packet's destination address in the forwarding table to identify the outgoing interface(s).
3. Manipulate the packet header: e.g., decrement the TTL and update the header checksum.
4. Send the packet to the outgoing interface(s).
5. Queue it until the line is free.
6. Transmit the packet onto the outgoing line.
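The fast-path steps above can be sketched in C. This is a minimal illustration, not the lecture's code: lookup_route() and enqueue() are hypothetical placeholders (stubbed here so the sketch compiles) for the forwarding-table lookup and the egress queue, and the checksum is simply recomputed in full.

```c
#include <stdint.h>
#include <stddef.h>
#include <netinet/ip.h>                  /* struct iphdr (Linux/glibc) */

/* Stub placeholders so the sketch compiles; a real line card would do a
 * longest-prefix match here and hand the packet to hardware queues. */
int  lookup_route(uint32_t dst_addr) { (void)dst_addr; return 0; }     /* egress port, or -1 */
void enqueue(int port, uint8_t *pkt, size_t len) { (void)port; (void)pkt; (void)len; }

/* Recompute the IPv4 header checksum (ones'-complement sum of 16-bit words). */
static uint16_t ip_checksum(const struct iphdr *ip)
{
    const uint16_t *w = (const uint16_t *)ip;
    uint32_t sum = 0;
    for (unsigned i = 0; i < ip->ihl * 2u; i++)   /* ihl counts 32-bit words */
        sum += w[i];
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Steps 2-5 for one IPv4 packet already accepted from the ingress line
 * (step 1); actual transmission (step 6) is left to the egress line. */
int process_packet(uint8_t *pkt, size_t len)
{
    struct iphdr *ip = (struct iphdr *)pkt;

    if (len < sizeof(*ip) || ip->ttl <= 1)
        return -1;                        /* drop (a real router would send ICMP Time Exceeded) */

    int port = lookup_route(ip->daddr);   /* step 2: forwarding-table lookup */
    if (port < 0)
        return -1;                        /* no route: drop */

    ip->ttl--;                            /* step 3: decrement TTL ...           */
    ip->check = 0;
    ip->check = ip_checksum(ip);          /* ... and refresh the header checksum */

    enqueue(port, pkt, len);              /* steps 4-5: hand to the egress queue */
    return 0;
}
```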
Basic Architecture of a Router
Control plane (how routing protocols establish routes, etc.): routing table updates (OSPF, RIP, IS-IS), admission control, congestion control, reservation. May be slow; typically in software.
Data plane (per-packet processing, i.e. how packets get forwarded): lookup, packet classification, switching, arbitration, scheduling. Must be fast; typically in hardware.
Generic Router Architecture
[Diagram: on each line card, header processing (IP address lookup against an address table, header update) separates header and data and hands the packet to a buffer manager with its buffer memory]
Functions in a Packet Switch
Ingress line card: framing, route lookup, TTL processing, buffering. Interconnect: interconnect scheduling. Egress line card: buffering, QoS scheduling. The data path, control path, and scheduling path are separate, with the control plane on top. Memory is usually used in several places (DRAM for packet buffers, SRAM for queues and tables).
Line Card Picture
Major Components of Routers: Interconnect
The interconnect connects input ports to output ports; there are three basic designs:
Bus: all input ports transfer data over a shared bus. Problem: the shared bus often becomes a point of congestion.
Shared memory: input ports write packets into a shared memory; after the destination lookup, the output port reads them back out. Problem: requires very fast memory read/write and memory-management technology.
Crossbar: each of the N input ports has a dedicated data path to each of the N output ports, i.e. an N×N switching matrix. Problem: blocking (input, output, and head-of-line (HOL) blocking); the maximum switch load for random traffic is only about 59%.
Interconnects: Two Basic Techniques
Input queueing and output queueing, usually over a non-blocking switch fabric (e.g. a crossbar).
Output Queued (OQ) Switch: how an OQ switch works.
Input Queueing: Head-of-Line Blocking
[Plot: average delay vs. offered load; with FIFO input queues the delay blows up at about 58.6% load, compared with 100% for output queueing]
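For reference, 58.6% is the classical saturation throughput of a FIFO input-queued switch under uniform i.i.d. Bernoulli traffic with many ports (Karol, Hluchyj and Morgan, 1987):

```latex
\[
  \text{maximum throughput} \;=\; 2 - \sqrt{2} \;\approx\; 0.586
\]
```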
Head of Line Blocking
Virtual Output Queues (VOQ)
At each input port there are N queues, one associated with each output port. In each time slot, only one packet can leave an input port and only one packet can be received by an output port. VOQs retain the scalability of FIFO input-queued switches while eliminating the HOL blocking problem of FIFO input queues.
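A minimal sketch of the idea in C (not the lecture's code): per-input VOQ occupancy counters and one scheduling slot in which each output grants, greedy round-robin, at most one input whose VOQ for it is non-empty, so no more than one packet leaves each input or reaches each output per slot. This is deliberately simpler than real matching algorithms such as iSLIP.

```c
#include <stdio.h>

#define N 4   /* number of ports; value chosen only for illustration */

static int voq[N][N];   /* voq[in][out] = packets queued at input `in` for output `out` */
static int rr_ptr[N];   /* per-output round-robin pointer over inputs */

/* One scheduling slot: each output grants at most one waiting input,
 * and each input is allowed to send at most one packet. */
static void schedule_slot(void)
{
    int input_busy[N] = {0};

    for (int out = 0; out < N; out++) {
        for (int k = 0; k < N; k++) {
            int in = (rr_ptr[out] + k) % N;
            if (!input_busy[in] && voq[in][out] > 0) {
                voq[in][out]--;                 /* transfer one packet in -> out */
                input_busy[in] = 1;
                rr_ptr[out] = (in + 1) % N;     /* advance round-robin pointer */
                printf("slot: input %d -> output %d\n", in, out);
                break;
            }
        }
    }
}

int main(void)
{
    /* Inputs 0 and 1 both hold a packet for output 0; with VOQs, input 1's
     * packet for output 2 is not blocked behind its packet for output 0. */
    voq[0][0] = 1;
    voq[1][0] = 1;
    voq[1][2] = 1;

    schedule_slot();   /* input 0 -> output 0, input 1 -> output 2 */
    schedule_slot();   /* input 1 -> output 0 */
    return 0;
}
```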
Input Queueing: Virtual Output Queues
[Switch diagram: N per-output queues at each input feeding the fabric]
Input Queueing: Virtual Output Queues
[Plot: average delay vs. offered load; with VOQs the achievable load approaches 100%]
The Evolution of Router Architecture: from first-generation routers to modern routers.
First Generation Routers: bus-based router architecture with a single processor
[Diagram: CPU, route table, and buffer memory on a shared backplane, with line interfaces (MACs) attached to the shared bus]
First Generation Routers
Based on software implementations on a single CPU. Limitations: the central processor is a serious processing bottleneck, and memory-intensive operations (e.g. table lookups and data movement) limit how effectively the processor power can be used.
Second Generation Routers: bus-based router architecture with multiple processors
[Diagram: CPU, route table, and buffer memory on the shared bus; each line card now has its own MAC, buffer memory, and forwarding cache]
Second Generation Routers: architectures with route caching
Packet forwarding is distributed to the network interface cards, which carry their own processors and route caches, so each packet crosses the shared bus only once. Limitations: the central routing table is a bottleneck at high speeds, throughput is traffic-dependent (cache hit rate), and the shared bus itself is still a bottleneck.
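A minimal sketch of the route-cache idea in C (illustrative only; names such as slow_path_lookup() are hypothetical): the line card answers from a small cache keyed on the destination address and falls back to the shared central table on a miss, which is exactly why throughput becomes traffic-dependent.

```c
#include <stdint.h>
#include <stdbool.h>

#define CACHE_SLOTS 1024            /* must be a power of two */

struct cache_entry {
    uint32_t dst;                   /* destination IP address (network order) */
    int      egress_port;
    bool     valid;
};

static struct cache_entry fwd_cache[CACHE_SLOTS];

/* Stub so the sketch compiles; in a second-generation router this is the
 * slow transaction over the shared bus to the central routing table. */
static int slow_path_lookup(uint32_t dst) { (void)dst; return 0; }

static inline unsigned hash_dst(uint32_t dst)
{
    return (dst * 2654435761u) & (CACHE_SLOTS - 1);   /* trivial multiplicative hash */
}

int cached_lookup(uint32_t dst)
{
    struct cache_entry *e = &fwd_cache[hash_dst(dst)];

    if (e->valid && e->dst == dst)
        return e->egress_port;      /* hit: the packet never touches the central CPU */

    /* Miss: consult the central routing table (the bottleneck named above),
     * then install the result so later packets to this destination are
     * forwarded locally on the line card. */
    int port = slow_path_lookup(dst);
    if (port >= 0) {
        e->dst = dst;
        e->egress_port = port;
        e->valid = true;
    }
    return port;
}
```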
Third Generation Routers: switch-based architecture with fully distributed processors
[Diagram: a CPU card holding the routing table, and line cards each with MAC, local buffer memory, and their own forwarding table, all connected by a switched backplane]
Third Generation Routers
To avoid bottlenecks in processing power, memory bandwidth, and internal bus bandwidth, each network interface is equipped with appropriate processing power and buffer space. Data and control planes are separated: the data plane runs on the line cards, the control plane on the central processor.
Fourth Generation Routers/Switches: optics inside a router for the first time
[Diagram: line cards connected to a switch core over optical links hundreds of metres long; 0.3–10 Tb/s routers in development]
Demand for More Powerful Routers
Do we still need higher processing power in networking devices? Of course, yes. But why? And how?
Demands for Faster Routers (why?)
Beyond Moore's law.
Demands for Faster Routers (why?)
Future applications will demand TIPS (trillions of instructions per second). But what about power? And heat?
Demands for Faster Routers (summary)
Technology push: link bandwidth is scaling much faster than CPU and memory technology; transistor scaling and VLSI technology help, but not enough.
Application pull: more complex applications are required. Processing complexity is defined as the number of instructions plus the number of memory accesses needed to process one packet.
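To make the time budget concrete, here is an illustrative calculation with assumed numbers (not from the slides): a 10 Gb/s link carrying minimum-size 64-byte Ethernet frames (84 bytes on the wire including preamble and inter-frame gap) gives

```latex
\[
  \frac{10\times 10^{9}\ \text{bit/s}}{84 \times 8\ \text{bit}} \approx 14.9\ \text{Mpps}
  \quad\Longrightarrow\quad
  \approx 67\ \text{ns per packet} \approx 200\ \text{cycles on a 3\,GHz core},
\]
```

and that budget has to cover every instruction and every memory access of the per-packet processing.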
Demands for Faster Routers (how?)
"Future applications will demand TIPS." "Think platform beyond a single processor." "Exploit concurrency at multiple levels." "Power will be the limiter due to complexity and leakage." In short: distribute the workload over multiple cores.
Multi-Core Processors
Symmetric multi-processors allow multi-threaded applications to achieve higher performance at less die area and power consumption than single-core processors. Asymmetric multi-processors consume power, and provide increased computational power, only on demand.
Performance Bottlenecks
Memory: bandwidth is available, but access time is too slow, and the delay to off-chip memory keeps increasing.
I/O: high-speed interfaces are available, but optical interfaces have a cost problem.
Internal bus: can be solved with an effective switch that allows simultaneous transfers between network interfaces.
Processing power: individual cores are getting more complex, access to shared resources is problematic, and the control processor can become a bottleneck.
Different Solutions
ASIC, FPGA, NP (network processor), GPP (general-purpose processor): a spectrum trading flexibility against performance, from ASICs (highest performance, least flexible) to GPPs (most flexible, lowest performance), with FPGAs and NPs in between.
Different Solutions (chart by Niraj Shah)
"It is always something (corollary). Good, Fast, Cheap: Pick any two (you can't have all three)."
RFC 1925, "The Twelve Networking Truths"
Why Not ASIC?
High development cost, while network processing is only a moderate-quantity market. Long time to market, while network-processing services change quickly. Complex protocols are difficult to simulate. Expensive and time-consuming to change, with little reuse across products and limited reuse across versions. No consensus on a framework or supporting chips. Requires specialized expertise.
Network Processors
Introduced several years ago (1999+) as a way to bring flexibility and programmability into network processing. Many players entered the market (Intel, Motorola, IBM); only a few are still there.
Intel IXP 2800: initial release August 2003.
What Was Correct With NPs?
CPU-level flexibility: a giant step forward compared to ASICs. How? Hardware coprocessors, memory hierarchies, multiple hardware threads (zero context-switching overhead), narrow (and multiple) memory buses, and other ad-hoc solutions for network processing, e.g. a fast switching fabric and fast memory access.
What Was Wrong With NPs?
Programmability issues: a completely new programming paradigm; developers were not familiar with the unprecedented parallelism of the NPU and did not know how best to exploit it; new (proprietary) languages; poor portability across different network-processor families.
What Happened in the NP Market?
Intel left the market in 2007, and many other small players disappeared. Choosing an NP maker that may disappear is a high risk.
"Every old idea will be proposed again with a different name and a different presentation, regardless of whether it works."
RFC 1925, "The Twelve Networking Truths"
Software Routers: processing in general-purpose CPUs
CPUs are optimized for a few threads with high per-thread performance: high clock frequencies and maximal instruction-level parallelism (pipelining, superscalar execution, out-of-order execution, branch prediction, speculative loads).
Software Routers
Aim: low cost, flexibility, and extensibility. Linux on a PC with a bunch of NICs; changing a piece of functionality is as simple as a software upgrade.
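To illustrate the "Linux PC with NICs" idea, here is a minimal, assumption-laden sketch in C that blindly copies frames between two interfaces using AF_PACKET raw sockets. The interface names are placeholders, and a real software router would add IP lookup, TTL and checksum handling, ARP, and batching; run as root.

```c
#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/if_packet.h>
#include <linux/if_ether.h>   /* ETH_P_ALL */
#include <net/if.h>           /* if_nametoindex */
#include <arpa/inet.h>        /* htons */

/* Open a raw packet socket bound to one interface. */
static int open_raw(const char *ifname)
{
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    if (fd < 0) { perror("socket"); return -1; }

    struct sockaddr_ll addr = {0};
    addr.sll_family   = AF_PACKET;
    addr.sll_protocol = htons(ETH_P_ALL);
    addr.sll_ifindex  = if_nametoindex(ifname);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("bind");
        close(fd);
        return -1;
    }
    return fd;
}

int main(void)
{
    /* "eth0" and "eth1" are placeholder interface names. */
    int in_fd  = open_raw("eth0");
    int out_fd = open_raw("eth1");
    if (in_fd < 0 || out_fd < 0) return 1;

    unsigned char frame[2048];
    for (;;) {
        ssize_t n = recv(in_fd, frame, sizeof(frame), 0);
        if (n <= 0) continue;
        /* Real forwarding logic (lookup, TTL, checksum) would go here. */
        send(out_fd, frame, (size_t)n, 0);   /* egress on the bound interface */
    }
    return 0;
}
```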
Software Routers (examples)
RouteBricks [SOSP'09]: uses the Intel Nehalem architecture.
PacketShader [SIGCOMM'10]: GPU-accelerated, developed at KAIST, Korea.
Intel Nehalem Architecture
[Diagram: four cores C0–C3 sharing a common L3 cache]
Intel Nehalem Architecture
NUMA architecture: the latency to access local memory is approximately 65 ns, while the latency to access remote memory is approximately 105 ns. The bandwidth of the QPI link is 12.8 GB/s, and three DDR3 channels to local DRAM provide a bandwidth of 31.992 GB/s.
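The 31.992 GB/s figure is consistent with three channels of DDR3-1333 (the memory speed is an assumption; the slide does not state it):

```latex
\[
  3 \times 1333\ \text{MT/s} \times 8\ \text{B/transfer}
  \;=\; 3 \times 10.664\ \text{GB/s}
  \;=\; 31.992\ \text{GB/s}.
\]
```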
Intel Nehalem Architecture
[Diagram: Nehalem quad-core with cores 0–3, each with L1-I/L1-D and a private L2 cache, a shared L3 cache, an integrated memory controller (IMC) with 3 channels to DRAM, two QPI links towards the I/O controller hub (PCI slots, network card, disk), and power/clock circuitry; the software stack (application, communication system, file system) runs on top]
Other Possible Platforms: Intel Westmere-EP, Intel Jasper Forest.
Workload Partitioning (parallelization): pipeline, parallel, or hybrid.
Questions!