Multiqueue Networking
David S. Miller
Red Hat Inc.
Main Points
● Horizontal scaling
● Implications of high CPU arity
● Multiqueue basics
● Sun Neptune as a specific example
● Classification for queue selection
● Load balancing on transmit
● IRQ retargeting
● Effect on scheduling
● Lock splitting
Horizontal Scaling
● High numbers of everything
  – CPUs
  – Devices
● Pointless if underutilized
Multiqueue
● DMA channels
● Each operates independently
● RSS (Receive Side Scaling) in Microsoft NDIS
● Every card will have this
● Virtualization
● Port-to-queue mapping (a hashing sketch follows this slide)
  – Simple: hashing
  – Complex: full classifiers
● Technology will be ubiquitous
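For the "simple, hashing" style of port-to-queue mapping, the following is a minimal sketch of steering a flow to a receive queue by hashing its 4-tuple. The hash function, queue count, and function names are illustrative assumptions; real RSS hardware uses a Toeplitz hash and a programmable indirection table rather than this code.

    /* Minimal sketch: pick an RX queue by hashing the flow 4-tuple.
     * NUM_RX_QUEUES, flow_hash() and select_rx_queue() are made-up
     * names; real RSS hardware uses a Toeplitz hash and an
     * indirection table instead.
     */
    #include <stdint.h>

    #define NUM_RX_QUEUES 8u    /* hypothetical number of receive queues */

    static uint32_t flow_hash(uint32_t saddr, uint32_t daddr,
                              uint16_t sport, uint16_t dport)
    {
        uint32_t h = saddr ^ daddr ^ (((uint32_t)sport << 16) | dport);

        /* Cheap integer mixing; stands in for the hardware hash. */
        h ^= h >> 16;
        h *= 0x45d9f3bu;
        h ^= h >> 16;
        return h;
    }

    static unsigned int select_rx_queue(uint32_t saddr, uint32_t daddr,
                                        uint16_t sport, uint16_t dport)
    {
        /* A given flow always lands on the same queue, preserving
         * in-flow ordering while distinct flows spread across
         * queues (and therefore CPUs). */
        return flow_hash(saddr, daddr, sport, dport) % NUM_RX_QUEUES;
    }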
Sun Neptune
● Multi-port and multi-queue
● PCI-E and on-chip Niagara-2 variants
● 24 transmit queues and 16 receive groups
● Logical device groups and interrupts
● Full HW classifier for queue selection
● Driver in progress
Neptune Classification
● DMA channel groups
● Classification → group + offset within group
● MAC address selects default group
● Hashing can modify offset
● 256-entry TCAM for more explicit rules (a rule sketch follows this slide)
  – Drop
  – Select group
  – Select offset
  – Select group and offset
● Channels can be used arbitrarily with ports
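To make the "group + offset" model concrete, here is a hypothetical C sketch of one TCAM rule and how a hit could override the defaults chosen by MAC address and hashing. The structure layout, field names, and helper are invented for illustration and do not reflect the actual Neptune registers or the niu driver.

    /* Hypothetical TCAM entry: a masked match on the flow key yields
     * an action that drops the packet or overrides the DMA channel
     * group and/or the offset within that group.
     */
    #include <stdint.h>

    struct tcam_key {
        uint32_t saddr, daddr;   /* IPv4 addresses */
        uint16_t sport, dport;   /* L4 ports */
        uint8_t  proto;          /* IP protocol */
    };

    enum tcam_action {
        TCAM_DROP,
        TCAM_SELECT_GROUP,
        TCAM_SELECT_OFFSET,
        TCAM_SELECT_GROUP_AND_OFFSET,
    };

    struct tcam_entry {
        struct tcam_key  key;
        struct tcam_key  mask;   /* which key bits must match */
        enum tcam_action action;
        uint8_t          group;  /* replacement channel group */
        uint8_t          offset; /* replacement offset within group */
    };

    /* Resolve the final RX channel: the MAC address supplies the
     * default group, hashing supplies the offset, and a TCAM hit may
     * override either or both (or drop the packet outright).
     */
    static int resolve_channel(const struct tcam_entry *hit,
                               uint8_t default_group, uint8_t hash_offset,
                               uint8_t channels_per_group)
    {
        uint8_t group = default_group;
        uint8_t offset = hash_offset;

        if (hit) {
            switch (hit->action) {
            case TCAM_DROP:
                return -1;                    /* packet dropped */
            case TCAM_SELECT_GROUP:
                group = hit->group;
                break;
            case TCAM_SELECT_OFFSET:
                offset = hit->offset;
                break;
            case TCAM_SELECT_GROUP_AND_OFFSET:
                group = hit->group;
                offset = hit->offset;
                break;
            }
        }
        return group * channels_per_group + offset;
    }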
NIU Queues
● Arbitrary mappings
● Grouping concept
Transmit Load Balancing
● Completely in software (a queue-selection sketch follows this slide)
● Mapping from TX queue to port
● Header hashing
● Hashing on CPU number
● Hashing on VLAN
● Round robin
● Neptune does DRR (deficit round robin) over TX channels
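Since transmit-side balancing is done entirely in software, the queue choice boils down to a small selection function. The sketch below contrasts three of the policies named above; the policy enum, queue count, and function names are assumptions for illustration, not the kernel's actual select-queue hook.

    /* Sketch of software TX queue selection for a device with
     * NUM_TX_QUEUES queues.  Names and policies are illustrative only.
     */
    #include <stdint.h>

    #define NUM_TX_QUEUES 8u     /* hypothetical */

    enum tx_policy { TX_HASH_HEADERS, TX_HASH_CPU, TX_ROUND_ROBIN };

    static unsigned int rr_counter;   /* round-robin state */

    static unsigned int select_tx_queue(enum tx_policy policy,
                                        uint32_t hdr_hash,  /* hash of packet headers */
                                        unsigned int cpu)   /* sending CPU number */
    {
        switch (policy) {
        case TX_HASH_HEADERS:
            /* Keep each flow on one queue to preserve ordering. */
            return hdr_hash % NUM_TX_QUEUES;
        case TX_HASH_CPU:
            /* Tie each CPU to a queue, avoiding cross-CPU contention. */
            return cpu % NUM_TX_QUEUES;
        case TX_ROUND_ROBIN:
        default:
            /* Spread packets evenly, ignoring flow affinity. */
            return rr_counter++ % NUM_TX_QUEUES;
        }
    }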
IRQ Retargeting
● Unique IRQ per logical device is possible (a one-time pinning sketch follows this slide)
● Moving IRQs is stupid
● Card features naturally load balance
● Example:
  – IPSEC on RX channels 0 and 1
  – Web traffic hashed to channels 3 to 7
● CFS scheduler more aggressive
● Tasks migrate to where flow traffic arrives
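The argument above is that per-channel IRQs should be pinned once and then left alone, with the card's classifier and the scheduler doing the balancing. As a small how-to, the sketch below pins one IRQ to one CPU through the standard /proc/irq/<n>/smp_affinity interface; the IRQ number and CPU in the example are made up.

    /* Pin one IRQ to one CPU by writing a hex CPU mask to
     * /proc/irq/<irq>/smp_affinity.  The interface is standard Linux;
     * the example IRQ number and CPU below are hypothetical.
     */
    #include <stdio.h>

    static int pin_irq_to_cpu(unsigned int irq, unsigned int cpu)
    {
        char path[64];
        FILE *f;

        snprintf(path, sizeof(path), "/proc/irq/%u/smp_affinity", irq);
        f = fopen(path, "w");
        if (!f)
            return -1;
        /* CPU mask with only bit 'cpu' set, written as hex. */
        fprintf(f, "%x\n", 1u << cpu);
        return fclose(f);
    }

    int main(void)
    {
        /* Example: bind the IRQ of a hypothetical RX channel (IRQ 48)
         * to CPU 2, once, at setup time. */
        return pin_irq_to_cpu(48, 2) ? 1 : 0;
    }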
Lock Splitting
● Receive side mostly solved with napi_struct
● Transmit needs more work (a per-queue lock sketch follows this slide)
  – SKB queue needs to be selected early
  – So the packet scheduler knows what to lock
  – Currently multiqueue paths still need a singular lock
● DMA channel resources are extremely flexible
● Independent queues mapped to virtualization guests
● Alternate MACs use a different default channel
● TCAM can be used for more elaborate setups
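A rough sketch of where transmit lock splitting is headed: each TX queue carries its own lock, and because the queue is chosen before the packet scheduler runs, only that queue's lock is ever taken. The structures below are simplified user-space stand-ins (pthread mutexes instead of spinlocks) and are not the kernel's real data structures.

    /* Sketch: per-queue transmit state, each with its own lock.
     * Simplified stand-in for illustration; not kernel code.
     */
    #include <pthread.h>
    #include <stdlib.h>

    struct tx_queue {
        pthread_mutex_t lock;        /* protects only this queue's ring */
        unsigned int    head, tail;  /* descriptor ring indices */
    };

    struct mq_device {
        unsigned int     num_tx_queues;
        struct tx_queue *tx;         /* one entry per TX queue */
    };

    static int mq_device_init(struct mq_device *dev, unsigned int nqueues)
    {
        unsigned int i;

        dev->tx = calloc(nqueues, sizeof(*dev->tx));
        if (!dev->tx)
            return -1;
        dev->num_tx_queues = nqueues;
        for (i = 0; i < nqueues; i++)
            pthread_mutex_init(&dev->tx[i].lock, NULL);
        return 0;
    }

    /* The queue index is decided early (e.g. by flow hash), so the
     * transmit path and the packet scheduler agree on which single
     * queue lock to take; no device-wide lock is required.
     */
    static void xmit_on_queue(struct mq_device *dev, unsigned int qidx,
                              const void *skb)
    {
        struct tx_queue *q = &dev->tx[qidx % dev->num_tx_queues];

        pthread_mutex_lock(&q->lock);
        /* ... place skb on this queue's descriptor ring ... */
        (void)skb;
        q->tail++;
        pthread_mutex_unlock(&q->lock);
    }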
Plans
● Publish the Neptune driver
● Flesh out and merge napi_struct into mainline
● Work with Intel developers on TX splitting
● Lots of performance analysis
● Considering Netfilter usage of Neptune classification features