Network Stack Specialization for Performance
Presented by Donghwi Kim
(Some figures are taken from the paper)
Objective
  Show an upper bound on network application performance achievable through specialization
    (In practice, the application's implementation is specialized as well as the network stack)
  A specific class of application is chosen: servers that deliver the same content to many users
    Sandstorm: a web server that serves static web pages
    Namestorm: a DNS server
Keys to performance
  A complete zero-copy stack
  Aggressive amortization
    Pre-packetized data
    Batching to mitigate system-call overhead
  Synchronous, clocked from received packets
    Improves cache locality
    Minimizes the latency of sending the first packet of a response
  Intel's DDIO
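The batching and "clocked from received packets" points can be sketched as a run-to-completion loop. This is a minimal illustration, not Sandstorm's code: `FakeNic`, `poll`, `queue_tx`, and `flush` are hypothetical stand-ins for a netmap-style ring, and each `poll` or `flush` is assumed to cost one system call.

```python
from collections import deque

class FakeNic:
    """Hypothetical stand-in for a netmap-style ring; not a real API."""
    def __init__(self, incoming):
        self.rx = deque(incoming)
        self.tx_pending = []
        self.wire = []
        self.syscalls = 0

    def poll(self, max_batch):
        """One simulated system call returns up to max_batch received packets."""
        self.syscalls += 1
        batch = []
        while self.rx and len(batch) < max_batch:
            batch.append(self.rx.popleft())
        return batch

    def queue_tx(self, pkt):
        """Queue a packet for transmission; no system call is counted here."""
        self.tx_pending.append(pkt)

    def flush(self):
        """One simulated system call pushes every queued packet to the wire."""
        self.syscalls += 1
        self.wire.extend(self.tx_pending)
        self.tx_pending.clear()

def serve(nic, respond, max_batch=32):
    """Each received packet directly triggers its response (no timers, no queues)."""
    while True:
        batch = nic.poll(max_batch)
        if not batch:
            break
        for pkt in batch:
            nic.queue_tx(respond(pkt))
        nic.flush()  # one flush per batch amortizes the system-call cost

nic = FakeNic([f"req{i}" for i in range(100)])
serve(nic, lambda p: p.replace("req", "resp"))
```

With 100 requests and a batch size of 32, this toy loop counts 9 simulated system calls, compared with 200 if every receive and every send were a separate call.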
Network stack
  libnmio: data-movement and event-notification primitives
  libeth: a lightweight Ethernet layer
  libtcpip: an optimized TCP/IP layer
  libudpip: a UDP/IP layer
A complete zero-copy stack
  Receiving a packet
    Done by DMA
  Transmitting a packet
    Aggressive amortization
    Modify one of the prepared copies of the packet, then transmit it by DMA
    The modifications are performed in a single pass to use the CPU's L1 cache efficiently
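The transmit path above can be sketched as follows: content is split into packets once, and each send only rewrites header fields in place. This is a simplified illustration under stated assumptions: the 12-byte `HEADER` layout and the names `prepacketize` and `patch_header` are hypothetical, and a real stack would also write full Ethernet/IP/TCP headers and fix up checksums, which is omitted here.

```python
import struct

# Hypothetical simplified header: dst IPv4 (4 bytes), dst port (2), padding (2), sequence number (4).
HEADER = struct.Struct("!4sHHI")

def prepacketize(payload, mss):
    """Split content once, ahead of time, into packets with blank headers."""
    return [bytearray(HEADER.size) + bytearray(payload[i:i + mss])
            for i in range(0, len(payload), mss)]

def patch_header(pkt, dst_ip, dst_port, seq):
    """Rewrite the per-connection header fields in one pass; the payload is untouched."""
    HEADER.pack_into(pkt, 0, dst_ip, dst_port, 0, seq)
    return pkt

packets = prepacketize(b"x" * 3000, mss=1448)
seq = 1000
for pkt in packets:
    patch_header(pkt, bytes([10, 0, 0, 2]), 80, seq)
    seq += len(pkt) - HEADER.size  # advance by the payload length
```

Because the payload bytes are never copied or rewritten per request, the per-send work is limited to the few header bytes that differ between connections.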
A complete zero-copy stack
  Pre-copy method
    Maintains more than one copy of each packet
    Has the potential to thrash the CPU's L3 cache
  memcpy method
    Maintains one long-term copy and creates ephemeral copies for transmission
    Requires more work per transmitted packet
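The difference between the two methods can be sketched as two small packet stores. The class names `PreCopyStore` and `MemcpyStore` and their `acquire`/`release` methods are hypothetical, chosen only to show where the memory and the copying cost go.

```python
class PreCopyStore:
    """Keeps several long-lived copies of a packet; a free one is handed out for patching."""
    def __init__(self, packet, copies):
        self.free = [bytearray(packet) for _ in range(copies)]

    def acquire(self):
        # Returns None when every copy is still in flight.
        return self.free.pop() if self.free else None

    def release(self, pkt):
        self.free.append(pkt)

class MemcpyStore:
    """Keeps one long-term copy; every send pays for a fresh ephemeral copy."""
    def __init__(self, packet):
        self.master = bytes(packet)

    def acquire(self):
        return bytearray(self.master)  # the copy happens here, once per send

    def release(self, pkt):
        pass  # the ephemeral copy is simply dropped

pre = PreCopyStore(b"payload", copies=2)
first, second = pre.acquire(), pre.acquire()
exhausted = pre.acquire()          # both copies are in use
pre.release(first)

mem = MemcpyStore(b"payload")
scratch = mem.acquire()
scratch[0] = 0                     # patching the copy leaves the master intact
```

The pre-copy store trades memory (and cache footprint) for zero per-send copying, while the memcpy store keeps the footprint small at the price of a copy on every send.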
How do the optimizations work?
  Batching increases TCP RTT
  Amortizing reduces per-request processing
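The trade-off between the two effects can be illustrated with a toy cost model. The formulas and the numbers below are illustrative assumptions, not measurements from the paper: a fixed cost (for example a system call) is shared by the whole batch, while the first packet of a batch is assumed to wait until the rest of the batch has been processed.

```python
def per_packet_cost_us(fixed_us, per_packet_us, batch):
    """Fixed cost is amortized over the batch; per-packet work is not."""
    return fixed_us / batch + per_packet_us

def worst_case_added_delay_us(per_packet_us, batch):
    """Assumed extra wait for the first packet while the rest of the batch is processed."""
    return per_packet_us * (batch - 1)

# Illustrative values only: 2.0 us fixed cost, 0.5 us of work per packet.
unbatched = per_packet_cost_us(2.0, 0.5, batch=1)    # 2.5
batched = per_packet_cost_us(2.0, 0.5, batch=32)     # 0.5625
delay = worst_case_added_delay_us(0.5, batch=32)     # 15.5
```

Under these assumed numbers, a batch of 32 cuts the per-packet cost by more than a factor of four, while adding up to 15.5 us to the response time seen by the first packet in the batch.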
Intel's DDIO
  Data Direct I/O
  On transmission
    Data is pulled from the L3 cache without a detour through system memory
  On reception
    DMA can place data directly in the processor's L3 cache
Evaluation
DDIO
  Pre-copy case: DDIO pulls untouched incoming data into the cache, so the file data cannot be cached
  memcpy case: the CPU loads file data into the cache
Discussion
  mTCP vs. Sandstorm
Discussion
  mTCP
    Provides a UNIX-like socket programming interface
    Provides fairness
  TCP of Sandstorm
    Higher-level layers do not wrap lower-level layers
      Each layer is a stand-alone service
      For example, an application interacts directly with libnmio
    With amortization, no queueing, and an inaccurate timer, correctness cannot be guaranteed
    Applicable to a limited set of applications