1 PC-based Software Routers: High Performance and Application Service Support
  Authors: Raffaele Bolla, Roberto Bruschi
  Publisher: PRESTO’08
  Presenter: Hsin-Mao Chen
  Date: 2010/02/24
2 Outline
  Introduction
  Architectural Bottlenecks
  Multi-CPU/Core Enhancements
  Performance Evaluation
3 Introduction
  In Linux, network boards signal packet reception or transmission to the kernel through a HW interrupt (IRQ).
  The kernel then performs packet processing in software IRQs (SoftIRQs).
  The TxRing and RxRing of each board reside in RAM.
4 Introduction
  A SoftIRQ executes two main tasks:
  1. De-allocation of the already-transmitted packets placed in the TxRing.
  2. All the actual packet forwarding operations. This task handles the received packets in the RxRing.
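The two SoftIRQ tasks above can be sketched as a small user-space model. This is only an illustration, not kernel code: the function `softirq`, the `route` callback, the `budget` parameter, and the dictionary layout of a ring entry are all hypothetical names introduced here.

```python
from collections import deque


def softirq(tx_ring, rx_ring, route, budget=64):
    """One illustrative SoftIRQ pass (hypothetical model, not the Linux code).

    tx_ring: deque of {"pkt": ..., "sent": bool} descriptors
    rx_ring: deque of received packets
    route:   callable mapping a packet to the TxRing of its output device
    budget:  assumed cap on packets forwarded per pass
    """
    # Task 1: de-allocate descriptors of packets that were already transmitted.
    freed = 0
    while tx_ring and tx_ring[0]["sent"]:
        tx_ring.popleft()
        freed += 1

    # Task 2: forward received packets from the RxRing to an output TxRing.
    forwarded = 0
    while rx_ring and forwarded < budget:
        pkt = rx_ring.popleft()
        route(pkt).append({"pkt": pkt, "sent": False})
        forwarded += 1

    return freed, forwarded


if __name__ == "__main__":
    tx = deque([{"pkt": "a", "sent": True}, {"pkt": "b", "sent": False}])
    rx = deque(["p1", "p2", "p3"])
    print(softirq(tx, rx, route=lambda pkt: tx, budget=2))
```

In this model the cleanup of the TxRing runs before forwarding, so slots freed in task 1 are available to the packets queued in task 2.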
5 Architectural Bottlenecks
  Software router (SR) architecture based on a single CPU/core:
  1. The SR computational capacity.
  2. The bandwidth/latency of the I/O buses.
  SR architecture based on multiple processors/cores: typical performance issues may reduce the gain from parallelization.
  1. Data access serialization.
  2. CPU/core cache coherence.
6 Architectural Bottlenecks
  Data access serialization
  SoftIRQ accesses to each TxRing are serialized by a code locking procedure (the LLTX lock).
  This lock guarantees that each TxRing can be read or modified by only one SoftIRQ at a time.
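The serialization described above can be illustrated with threads standing in for SoftIRQs on different CPUs/cores. This is a minimal sketch under stated assumptions: `TxRing`, `enqueue`, and `run_softirqs` are hypothetical names, and a `threading.Lock` plays the role of the LLTX lock.

```python
import threading


class TxRing:
    """Illustrative TxRing shared by several SoftIRQs (hypothetical model)."""

    def __init__(self):
        self.lock = threading.Lock()  # stands in for the LLTX lock
        self.slots = []
        self.tail = 0

    def enqueue(self, pkt):
        # Only one SoftIRQ at a time may read or modify the ring.
        with self.lock:
            self.slots.append(pkt)
            self.tail += 1


def run_softirqs(ring, n_cpus, pkts_per_cpu):
    """Let n_cpus concurrent 'SoftIRQs' transmit through the same TxRing."""

    def worker(cpu):
        for i in range(pkts_per_cpu):
            ring.enqueue((cpu, i))

    threads = [threading.Thread(target=worker, args=(c,)) for c in range(n_cpus)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ring.tail


if __name__ == "__main__":
    print(run_softirqs(TxRing(), n_cpus=4, pkts_per_cpu=1000))
```

Every worker must take the same lock, so the more CPUs/cores send to one TxRing, the more time they spend waiting on each other instead of forwarding.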
7 Architectural Bottlenecks
  CPU/core cache management
  Whenever a CPU/core loads a TxRing into its local cache, all the other processors that are also caching it must invalidate their cached copies.
8 Multi-CPU/Core Enhancements
  HW evolution
  Intel® Advanced Smart Cache: a mechanism that allows level-2 cache sharing among all the cores in the same processor.
  Intel PRO 1000 adapters: they support multiple TxRings and RxRings and multiple HW IRQs per network interface.
9 Multi-CPU/Core Enhancements
  SW architecture
  1. Bind all the operations carried out in forwarding a packet entirely to a single CPU.
  2. Reduce LLTX lock contention as much as possible.
  3. Distribute the computational load equally among all the processors/cores in the system.
10 Multi-CPU/Core Enhancements
  CPU/core binding to TxRing: bind each CPU/core to a different TxRing on each output device.
  CPU/core binding to RxRing: bind each RxRing to a different CPU/core.
  Xeon core: 1 Mpkt/s
  Gigabit Ethernet interface: Mpkt/s with 64B sized frames
  Fast Ethernet interface: pkt/s with 64B sized frames
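The binding rules and the capacity comparison above can be sketched as follows. All function names are hypothetical. The frame rates are theoretical values computed from standard Ethernet framing (8 bytes of preamble plus a 12-byte inter-frame gap per frame); they are not figures read from the slide. The round-robin fallback used when there are more RxRings than cores is also an assumption.

```python
import math

ETH_OVERHEAD_BYTES = 20  # preamble + SFD (8 bytes) and inter-frame gap (12 bytes)


def line_rate_pps(link_bps, frame_bytes=64):
    """Theoretical maximum frames per second on an Ethernet link."""
    return link_bps / ((frame_bytes + ETH_OVERHEAD_BYTES) * 8)


def cores_needed(link_bps, core_pps=1_000_000, frame_bytes=64):
    """Cores required to sustain line rate, given a per-core forwarding rate."""
    return math.ceil(line_rate_pps(link_bps, frame_bytes) / core_pps)


def bind_tx_rings(n_cpus, devices):
    """Give each CPU/core its own TxRing on every output device."""
    return {(dev, cpu): f"{dev}-tx{cpu}" for dev in devices for cpu in range(n_cpus)}


def bind_rx_rings(rx_rings, n_cpus):
    """Assign each RxRing to exactly one CPU/core (round-robin is an assumption)."""
    return {ring: i % n_cpus for i, ring in enumerate(rx_rings)}


if __name__ == "__main__":
    print(round(line_rate_pps(1e9)), cores_needed(1e9))      # Gigabit Ethernet
    print(round(line_rate_pps(100e6)), cores_needed(100e6))  # Fast Ethernet
    print(bind_tx_rings(2, ["eth0", "eth1"]))
    print(bind_rx_rings(["eth0-rx0", "eth0-rx1", "eth1-rx0"], 2))
```

With a private TxRing per CPU/core on every output device, no two cores write to the same ring, which is what removes the LLTX lock contention; the capacity helper shows why a single 1 Mpkt/s core is not enough for one Gigabit Ethernet interface at 64B frames, while it can serve several Fast Ethernet interfaces.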
11 Multi-CPU/Core Enhancements
12 Performance Evaluation
  Standard SR architecture
  Agilent N2X router
13 Performance Evaluation
  Standard SR architecture
14 Performance Evaluation
  Enhanced SR architecture
15 Performance Evaluation
  Enhanced SR architecture
16 Performance Evaluation
  Enhanced SR architecture
17 Performance Evaluation
  Multi-layer service support
18 Performance Evaluation
  Multi-layer service support