Fixing WiFi Latency… Finally!
Low latency and good bandwidth at all MCS rates
Overview
- WiFi Latency
- A quick rant about benchmarks
- Fixes
- API
- Futures
Grokking Bufferbloat fixes in 20 Minutes
Most of your benchmarks are wrong!
- iperf3 "millibursts"
- Tmix
- PPS
Meet Ham, the marketing monkey
- Ham likes big numbers!
- Ham doesn't care if they mean anything
- Ham is good with winning phrases
Ham has been my nemesis for a very, very long time. He's good at a couple of things – naming. We have "bufferbloat", "fq_codel", and "BQL"; Ham comes up with things like "streamboost" and "LagBuster". Or optimizing for what you thought was important.
Typical WiFi Testbench
- Over a wire
- Over the air
DUT, attenuators
Photos courtesy: Candelatech
Problems with WiFi Lab Bench Testing
- Distance
- Multipath
- (many) Legacy devices
- (many) Competing devices
- Reasoning from non-Internet-scale RTTs
- Treating UDP and TCP results with equal weight
- Rate over range without looking at latency or rate control
- Tests that don't run long enough
- Summary statistics that don't look at inter-station behaviors
Real World WiFi uses
What works
- Plots
- Long-duration tests
- Varying RTTs
- Packet captures
- Tracing
- Sampling
- Other tests
NOT. JUST. BANDWIDTH.
In particular, there isn't just one number for RTT.
100 stations, ath10k, today
Is it the average? Yeah – of the 5 flows (out of the hundred flows that got started), each got 20 Mbits. The median? I don't even know where to start.
'Quick n Dirty' Contended Web PLT – Linux 4.4, txqueuelen 1000 (10ms base RTT + 1 sec delay)
flent -l 300 -H server --streams=12 tcp_ndown &
wget -E -H -k -K -p https://www.slashdot.org
FINISHED --2016-10-22 13:43:35--
Total wall clock time: 232.8s
Downloaded: 62 files, 1.8M in 0.9s (77KB/s)
Connect time accounted for 99.6% of the runtime, and the real throughput was 77KB/sec. WTF?
Better Benchmarks
- Dslreports test for bufferbloat
- Flent has extensive plotting facilities & wraps netperf/ping/d-itg:
  - rtt_fair*var: lets you test multiple numbers of stations
  - rrul, rrul_be: exercises up/down and hw queues simultaneously
  - tcp_nup, tcp_ndown: test variable numbers of flows
  - http, rrul_voip
  - About 90 other tests
- TSDE
- tcptrace -G & xplot.org
- Wireshark
100 stations, ath10k, airtime fair (WIP)
Just curious: how many people here are willing to take, I don't know, a 50% bandwidth hit to some other application, just so your Skype call, videoconference, audio stream, or ssh session stayed fast and reliable over WiFi?
Heresy!
- Optimize for "time to service" stations
- Pack aggregates better
- Serve stations better
- Look at Quality of Experience (QoE):
  - VoIP Mean Opinion Scores (MOS) (under load)
  - Web page load time (under load)
  - Latency under load (vs various WiFi MCS rates)
  - Long-term stability and "smoothness"
  - Flow completion time (FCT)
  - Loss, jitter and queuing
Control queue length
- Be fair to flows
- Be fair to stations
- Optimize for TXOPs: ~1300/sec, into which you pour bandwidth
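The "~1300 TXOPs/sec" budget is worth a back-of-the-envelope check. A minimal sketch (illustrative arithmetic only; the PHY rates and the no-overhead assumption are mine, not from the talk):

```python
# If an AP gets roughly 1300 transmit opportunities per second, each
# TXOP lasts about 1s / 1300 ~= 770 microseconds regardless of MCS rate.
TXOPS_PER_SEC = 1300
txop_us = 1_000_000 / TXOPS_PER_SEC  # ~769 us per TXOP

def bytes_per_txop(phy_rate_mbps: float) -> float:
    """Rough payload one ~770us TXOP can carry at a given PHY rate
    (ignores all framing/ACK overhead -- a deliberate simplification)."""
    return phy_rate_mbps * 1e6 / 8 * (txop_us / 1e6)

slow = bytes_per_txop(6.5)    # a low MCS rate: ~625 bytes per TXOP
fast = bytes_per_txop(600.0)  # a high MCS rate: ~57,700 bytes per TXOP
```

The point: the TXOP count is the scarce resource; bandwidth depends entirely on how well each one is packed.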
Queuing Delay
Interference, slow rates, etc. often get the blame – mis-interpreted, when the real culprit is queuing delay.
Aggregation improvements
Web PLT improvements under competing load
FIFO load time was 35 sec on the large page!
VoIP MOS scores & bandwidth, 3 stations contending (ath9k)
[Plot comparing: LEDE Pending, LEDE Today, Linux 4.4 Today, OpenWrt CC]
Best Effort – scheduled sanely on the AP – better than the HW VO queue!
Latency below 20ms FIXME – use the c2 plot
Two bloat-free coffee shops
Fix ISP bufferbloat FIRST We get to 22 seconds and it explodes at the end of the track. Courtesy dslreports: http://www.dslreports.com/speedtest/5408767 http://www.dslreports.com/speedtest/5448121
Upload Bloat Dslreports: http://www.dslreports.com/speedtest/results/bufferbloat?up=1
[Plots: airtime fair pie; fq_codel OS X loss/SACK; SACK zoomed; qca-10.2-fqmac35-codel – what to do about beacons?]
What we did
No changes to WiFi: the backoff function stays the same; we only modified the queuing and aggregation functions.
We added:
- Per-station queueing (removed the qdisc)
- fq_codel per station
- RR FQ between stations (currently)
- DRR airtime fairness between stations (pending)
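The per-station round-robin idea can be sketched in a few lines (a toy model, not the mac80211 code; the `Station` class and packet contents are illustrative):

```python
from collections import deque

class Station:
    """One queue per associated station, instead of one shared FIFO."""
    def __init__(self, name):
        self.name = name
        self.queue = deque()

def rr_schedule(stations, rounds):
    """Serve one packet per station per round, skipping empty queues,
    so a busy station cannot starve a lightly loaded one."""
    served = []
    for _ in range(rounds):
        for s in stations:
            if s.queue:
                served.append((s.name, s.queue.popleft()))
    return served

a, b = Station("a"), Station("b")
a.queue.extend(range(4))  # a busy station...
b.queue.extend(range(1))  # ...interleaved fairly with a light one
order = [name for name, _ in rr_schedule([a, b], rounds=3)]
# b's single packet goes out in the first round, not behind all of a's
```

The pending DRR airtime-fairness work replaces "one packet per turn" with "one airtime quantum per turn".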
Overall Philosophy
- One aggregate in the hardware
- One aggregate queued on top, ready to go
- One being prepared
- The rest of the packets being fq'd per station
- Total delay = 2-12ms in the layers
- This is plenty of time (BQL not needed)
- Currently max-sized TXOPs (we can cut this)
- Codel moderates the whole thing
Per Device
Old way: 4 sets of 1000-packet queues per SSID
New way: 8192 queues, 4MB memory limit
WiFi Queue Rework
All buffering moves into the mac80211 layer
Max 2 aggregates pending (~8ms)
API requirements
- Completion handler
- get_expected_throughput
Benefits
- Per-device max queuing, not per SSID
- Minimal buffering
fq_codel per station
- Mixes flows up
- Controls queue length
- Enables lossless congestion control (ECN)
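A condensed sketch of the CoDel control law that keeps those queues short (constants are the RFC 8289 defaults; this omits the sojourn-time tracking and re-entry logic, so it is an illustration, not the kernel code):

```python
from math import sqrt

TARGET = 0.005    # 5 ms: acceptable standing queue delay
INTERVAL = 0.100  # 100 ms: window for observing minimum sojourn time

def next_drop(now, count):
    """While delay stays above TARGET, schedule each successive drop
    INTERVAL / sqrt(count) after the previous one -- drops accelerate."""
    return now + INTERVAL / sqrt(count)

t = 0.0
times = []
for count in (1, 2, 3, 4):
    t = next_drop(t, count)
    times.append(round(t, 3))
# gaps between drops shrink: 100ms, ~71ms, ~58ms, 50ms
```

Because the queue stays short, ECN marking can replace dropping entirely, which is the "lossless" part above.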
API
- Init the softirq layer
- Add a callback for the completion handler
Knobs and Stats
Aggh! We eliminated the qdisc layer!

tc qdisc show dev wlp3s0
qdisc noqueue 0: dev wlp3s0 root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0

ifconfig wlp3s0   # still works!
Link encap:Ethernet  HWaddr 30:10:b3:77:bf:7b
inet addr:172.22.224.1  Bcast:172.22.224.255  Mask:255.255.255.0
inet6 addr: fe80::3210:b3ff:fe77:bf7b/64 Scope:Link
UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
RX packets:2868196 errors:0 dropped:0 overruns:0 frame:0
TX packets:2850840 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:2331543636 (2.3 GB)  TX bytes:1237631655 (1.2 GB)

Hmm… maybe we could put sch_fq there?

root@nemesis:/sys/kernel/debug/ieee80211/phy0/netdev:wlp3s0# tc -s qdisc show dev wlp3s0
qdisc fq 8001: root refcnt 5 limit 10000p flow_limit 100p buckets 1024 orphan_mask 1023 quantum 3028 initial_quantum 15140 refill_delay 40.0ms
 Sent 16061543 bytes 10915 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
 107 flows (107 inactive, 0 throttled)
 0 gc, 0 highprio, 72 throttled
Top Level Statistics
root@nemesis:/sys/kernel/debug/ieee80211/phy0# cat aqm
access name value
R  fq_flows_cnt 4096
R  fq_backlog 0
R  fq_overlimit 0
R  fq_overmemory 0
R  fq_collisions 14
R  fq_memory_usage 0
RW fq_memory_limit 4194304  # 4MB on 802.11n, per device not SSID
RW fq_limit 2048            # packet limit, IMHO too high by default
RW fq_quantum 1514          # 300 currently by default, no...
Per station stats
root@nemesis:/sys/kernel/debug/ieee80211/phy0/netdev:wlp3s0/stations/80:2a:a8:18:1b:1d# cat aqm
tid ac backlog-bytes backlog-packets new-flows drops marks overlimit collisions tx-bytes tx-packets
0   2  0 0 975208 1 14 0 0 992492788 1713258
1   3  0 0 0    0 0  0 0 0         0
2   3  0 0 0    0 0  0 0 0         0
3   2  0 0 0    0 0  0 0 0         0
4   1  0 0 0    0 0  0 0 0         0
5   1  0 0 3276 0 0  0 0 386568    3276
6   0  0 0 528  0 0  0 0 106306    528
7   0  0 0 1587 0 0  0 0 287247    1587
8   2  0 0 0    0 0  0 0 0         0
9   3  0 0 0    0 0  0 0 0         0
10  3  0 0 0    0 0  0 0 0         0
11  2  0 0 0    0 0  0 0 0         0
12  1  0 0 0    0 0  0 0 0         0
13  1  0 0 0    0 0  0 0 0         0
14  0  0 0 0    0 0  0 0 0         0
15  0  0 0 0    0 0  0 0 0         0
Per Station Stats
/sys/kernel/debug/ieee80211/phy0/netdev:wlp3s0/stations/80:2a:a8:18:1b:1d# cat airtime
RX: 471104269 us
TX: 92761799 us
Deficit: -168 us
Queued as new: 0 times
Queue cleared: 0 times
Dequeue failed: 0 times
Per SSID stats
/sys/kernel/debug/ieee80211/phy0/netdev:wlp3s0/aqm
AC backlog-bytes backlog-packets new-flows drops marks overlimit collisions tx-bytes tx-packets
2 617594 14 1940667 617602
Problems
- Some devices don't tell you how much they are aggregating
- Some don't have a tight callback loop
- Some expose insufficient rate control information
- Some have massively "hidden buffers" in the firmware
- Some have all of these issues!
- All of them have other bugs!
TSQ Issue?
Problem: NAS & TSQ
Futures
- Add airtime fairness
- Further reduce driver latency
- Improve rate control
- Minimize multicast
- Add more drivers/devices
- Figure out APIs
Airtime Fairness (ATF)
ath9k driver only.
Switch to choosing stations based on sharing the air fairly – sum the RX and TX time to any given station, figure out which stations should be serviced next. Allocate the "air".
Huge benefits with a mixture of slow (or legacy) and fast stations: the fast ones pack WAY more data into their slot, the slow ones slow down slightly.
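The deficit idea described above can be sketched as follows (a hedged toy model, not the ath9k patch; the quantum and per-aggregate airtime costs are made-up numbers for illustration):

```python
QUANTUM_US = 1000  # illustrative per-round airtime credit

def simulate(stations, rounds):
    """Each round grants every station QUANTUM_US of airtime credit;
    a station transmits while it has credit, paying the airtime cost
    of each aggregate. `stations` maps name -> airtime cost in us."""
    sent = {name: 0 for name in stations}
    deficit = {name: 0 for name in stations}
    for _ in range(rounds):
        for name, cost in stations.items():
            deficit[name] += QUANTUM_US
            while deficit[name] >= cost:
                deficit[name] -= cost
                sent[name] += 1
    return sent

# A fast station (100 us/aggregate) vs a slow legacy one (2000 us):
sent = simulate({"fast": 100, "slow": 2000}, rounds=10)
# both receive equal airtime (10 x 1000 us), so the fast station sends
# 100 aggregates while the slow one sends 5 -- the *air* is shared
# fairly, not the packet count
```

This is why a slow station only "slows down slightly" under ATF: it loses packet-count share, not airtime share.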
Explicit bandwidth/latency tradeoff
Credits
Questions?