Slide 1: Fixing WiFi Latency… Finally!
Slide 2: Low Latency / Good Bandwidth at All MCS Rates
Slide 3: Overview
WiFi latency
A quick rant about benchmarks
Fixes
API
Futures
Slide 4: Grokking Bufferbloat Fixes in 20 Minutes
Slide 5: Most of Your Benchmarks Are Wrong!
Iperf3 "millibursts", Tmix, PPS
Slide 6: Meet Ham, the Marketing Monkey
Ham likes big numbers! Ham doesn't care if they mean anything. Ham is good with winning phrases. Ham has been my nemesis for a very, very long time. He's good at a couple of things, like naming: we have "bufferbloat", "fq_codel", and "BQL"; Ham comes up with things like "Streamboost" and "LagBuster", or optimizing for what you thought was important.
Slide 7: Typical WiFi Testbench
Over a wire (DUT plus attenuators) and over the air. [Photos courtesy of Candelatech]
Slide 8: Problems with WiFi Lab Bench Testing
Distance
Multipath
(many) Legacy devices
(many) Competing devices
Reasoning from non-Internet-scale RTTs
Treating UDP and TCP results with equal weight
Rate over range without looking at latency or rate control
Tests that don't run long enough
Summary statistics that don't look at inter-station behaviors
Slide 9: Real-World WiFi Uses
Slide 10: What Works
Plots of long-duration tests
Varying RTTs
Packet captures
Tracing
Sampling
Other tests
NOT. JUST. BANDWIDTH. In particular, there isn't just one number for RTT.
Slide 11: 100 Stations, ath10k, Today
Is it the average? Yeah: the 5 flows (out of the hundred started) that actually ran each got 20 Mbits. The median? I don't even know where to start.
Slide 12: Quick 'n' Dirty Contended Web PLT, Linux 4.4 txqueue (10 ms base RTT + 1 sec delay)

    flent -l 300 -H server --streams=12 tcp_ndown &
    wget -E -H -k -K -p
    FINISHED :43:35--
    Total wall clock time: 232.8s
    Downloaded: 62 files, 1.8M in 0.9s (77KB/s)

Connect time accounted for 99.6% of the runtime, and the real throughput was 77 KB/sec. WTF?
Slide 13: Better Benchmarks
Dslreports test for bufferbloat.
Flent has extensive plotting facilities and wraps netperf/ping/d-itg (see the example invocation after this list):
  rtt_fair*var: lets you test multiple numbers of stations
  rrul, rrul_be: exercise up/down and hardware queues simultaneously
  tcp_nup, tcp_ndown: test variable numbers of flows
  http
  rrul_voip
  ...and about 90 other tests
TSDE
Tcptrace -G and xplot.org
Wireshark
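As a concrete sketch (the server name here is a placeholder; you need netperf running on the far end), a typical flent run that renders a plot directly:

    # 300-second RRUL test against a netperf server, plotted straight to PNG
    flent rrul -l 300 -H netperf.example.com -p all_scaled -o rrul.png
    # Re-plot a saved data file later, e.g. as a latency CDF
    flent -p ping_cdf -o latency.png rrul-*.flent.gz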
Slide 14: 100 Stations, ath10k, Airtime Fairness (WIP)
Just curious: how many people here are willing to take, I don't know, a 50% bandwidth hit to some other application, just so your Skype call, videoconference, audio stream, or SSH session stays fast and reliable over WiFi?
Slide 15: Heresy!
Optimize for "time to service" stations
Pack aggregates better
Serve stations better
Look at Quality of Experience (QoE):
  VoIP Mean Opinion Scores (MOS) under load
  Web page load time under load
  Latency under load (vs. various MCS WiFi rates)
  Long-term stability and "smoothness"
  Flow completion time (FCT)
  Loss, jitter, and queuing
Slide 16: Control Queue Length
Be fair to flows
Be fair to stations
Optimize for TXOPs: ~1300/sec, into which you pour bandwidth (rough arithmetic below)
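Back-of-the-envelope (my arithmetic, not from the slide): ~1300 TXOPs/sec means each TXOP averages about 770 microseconds of air, and if N active stations are each served one TXOP per round-robin pass, a station gets visited roughly every N/1300 seconds:

    1 / 1300 per sec ≈ 770 us per TXOP
    inter-service time ≈ N / 1300 s, so N = 100 stations → ~77 ms between visits

So what you queue per station, not raw bandwidth, is what sets the latency floor.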
Slide 17: Queuing Delay
Latency blamed on interference, slow rates, etc. is often misinterpreted: much of it is queuing delay.
Slide 18: Aggregation Improvements
Slide 19: Web PLT Improvements Under Competing Load
FIFO load time was 35 sec on the large page!
Slide 20: VoIP MOS Scores and Bandwidth, 3 Stations Contending (ath9k)
[Chart legend: LEDE pending, LEDE today, Linux 4.4 today, OpenWrt CC]
Best effort, scheduled sanely on the AP, is better than the hardware VO queue!
Slide 21: Latency Below 20 ms
(FIXME: use the c2 plot)
Slide 22: Two Bloat-Free Coffee Shops
Slide 23: Fix ISP Bufferbloat FIRST
We get to 22 seconds of delay and it explodes at the end of the test. (Graph courtesy of dslreports.)
Slide 24: Upload Bloat
(Graph courtesy of dslreports.)
25
file:///home/d/git/sites/cerowrt/content/flent/crypto_fq _bug/airtime_plot.png Airtime fair pie
file:///home/d/git/sites/cerowrt/content/flent/fq_codel_ osx/loss_sack.png Sack zoomed: file:///home/d/git/sites/cerowrt/content/flent/fq_codel_o sx/sacked_zoomed.png file:///home/d/git/sites/cerowrt/content/flent/qca fqmac35-codel-5/wonderwhattodoaboutbeacons.png
Slide 26: What We Did
No changes to the WiFi MAC: the backoff function stays the same.
We only modified the queuing and aggregation functions. We added:
  Per-station queueing
  Removed the qdisc layer
  fq_codel per station
  Round-robin FQ between stations (currently)
  DRR airtime fairness between stations (pending)
Slide 27: Overall Philosophy
One aggregate in the hardware
One aggregate queued on top, ready to go
One being prepared
The rest of the packets being FQ'd per station
Total delay: 2-12 ms in these layers, which is plenty of time (BQL not needed)
Currently max-sized TXOPs (we can cut this)
CoDel moderates the whole thing
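Rough accounting for that 2-12 ms figure (my reading, using the ~4 ms maximum aggregate length implied by "2 aggregates pending (~8 ms)" on slide 30):

    2 full aggregates in flight ≈ 2 x ~4 ms = ~8 ms worst case
    plus the one being prepared (0 to ~4 ms) → roughly 2-12 ms end to end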
Slide 28: Per Device
Old way: 4 sets of 1000-packet queues per SSID
New way: 8192 queues, 4 MB memory limit
Slide 30: WiFi Queue Rework
All buffering moves into the mac80211 layer
Max 2 aggregates pending (~8 ms)
Slide 31: API Requirements
Completion handler
get_expected_throughput
Slide 32: Benefits
Per-device max queuing, not per-SSID
Minimal buffering
Slide 33: fq_codel Per Station
Mixes flows up
Controls queue length
Enables lossless congestion control (ECN)
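For ECN to actually kick in, the TCP endpoints have to negotiate it; on Linux that is a stock sysctl (a standard kernel knob, not something from this deck):

    # 1 = request ECN on outgoing TCP connections and accept it on incoming ones
    sysctl -w net.ipv4.tcp_ecn=1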
Slide 34: API
Init the softirq layer
Add a callback for the completion handler
Slide 35: Knobs and Stats
Aggh! We eliminated the qdisc layer!

    # tc qdisc show dev wlp3s0
    qdisc noqueue 0: dev wlp3s0 root refcnt 2
     Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
     backlog 0b 0p requeues 0

    # ifconfig wlp3s0   # still works!
    Link encap:Ethernet  HWaddr 30:10:b3:77:bf:7b
    inet addr:  Bcast:  Mask:
    inet6 addr: fe80::3210:b3ff:fe77:bf7b/64 Scope:Link
    UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
    RX packets:  errors:0 dropped:0 overruns:0 frame:0
    TX packets:  errors:0 dropped:0 overruns:0 carrier:0
    collisions:0 txqueuelen:1000
    RX bytes: (2.3 GB)  TX bytes: (1.2 GB)

Hmm… maybe we could put sch_fq there?

    ev:wlp3s0# tc -s qdisc show dev wlp3s0
    qdisc fq 8001: root refcnt 5 limit 10000p flow_limit 100p buckets 1024
     orphan_mask 1023 quantum  initial_quantum  refill_delay 40.0ms
     Sent  bytes  pkt (dropped 0, overlimits 0 requeues 0)
     backlog 0b 0p requeues 0
     107 flows (107 inactive, 0 throttled) 0 gc, 0 highprio, 72 throttled
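The attach step itself isn't shown on the slide; the usual command (plain tc usage) would be:

    # Replace the interface's root qdisc with sch_fq
    tc qdisc replace dev wlp3s0 root fq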
Slide 36: Top-Level Statistics

    # cat aqm
    access  name             value
    R       fq_flows_cnt     4096
    R       fq_backlog      0
    R       fq_overlimit    0
    R       fq_overmemory   0
    R       fq_collisions   14
    R       fq_memory_usage 0
    RW      fq_memory_limit         # 4 MB, per device not per SSID
    RW      fq_limit         2048   # packet limit, IMHO too high by default
    RW      fq_quantum       1514   # 300 currently by default, no...
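The RW knobs can be poked from the shell. A sketch, assuming the mac80211 debugfs aqm file accepts "name value" writes (true on kernels carrying this rework, but verify on yours):

    cd /sys/kernel/debug/ieee80211/phy0
    echo "fq_limit 1024" > aqm      # lower the per-device packet limit
    echo "fq_quantum 300" > aqm     # try the smaller quantum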
Slide 37: Per-Station Stats

    ev:wlp3s0/stations/80:2a:a8:18:1b:1d# cat aqm
    tid ac backlog-bytes backlog-packets new-flows drops marks overlimit collisions tx-bytes tx-packets
Slide 38: Per-Station Airtime Stats

    /sys/kernel/debug/ieee80211/phy0/netdev:wlp3s0/stations/80:2a:a8:18:1b:1d# cat airtime
    RX: us
    TX: us
    Deficit: -168 us
    Queued as new: 0 times
    Queue cleared: 0 times
    Dequeue failed: 0 times
Slide 39: Per-SSID Stats

    /sys/kernel/debug/ieee80211/phy0/netdev:wlp3s0/aqm
    AC backlog-bytes backlog-packets new-flows drops marks overlimit collisions tx-bytes tx-packets
    2  617594 14 617602
Slide 40: Problems
Some devices don't tell you how much they are aggregating
Some don't have a tight callback loop
Some expose insufficient rate control information
Some have massive "hidden buffers" in the firmware
Some have all of these issues!
All of them have other bugs!
Slide 41: TSQ Issue?
Slide 42: Problem: NAS & TSQ
Slide 43: Futures
Add airtime fairness
Further reduce driver latency
Improve rate control
Minimize multicast
Add more drivers/devices
Figure out APIs
Slide 44: Airtime Fairness (ATF)
ath9k driver only (so far)
Switch to choosing stations based on sharing the air fairly: sum the RX and TX time to any given station and figure out which stations should be serviced next
Allocate the "air"
Huge benefits with a mixture of slow (or legacy) and fast stations: the fast ones pack WAY more data into their slot, while the slow ones slow down only slightly (worked example below)
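A worked example with made-up rates (mine, to illustrate the claim): two stations, A at 650 Mbit/s and B at 6.5 Mbit/s.

    Throughput-fair: X/650 + X/6.5 = 1 s of air → X ≈ 6.4 Mbit/s each, ~13 total
    Airtime-fair (half the air each): A ≈ 325 Mbit/s, B ≈ 3.25 Mbit/s, ~328 total

B slows down only slightly in absolute terms; A gets roughly 50 times more.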
Slide 45: Explicit Bandwidth/Latency Tradeoff
Slide 46: Credits
Slide 47: Questions?