
Proxmox Latency Fixes

C.M. Hobbs

For a few months I had been having trouble with the guests on one of my Proxmox hosts. Regardless of the drivers and settings I tried for the virtio network interfaces or the guest operating systems, I was getting a solid 900Mbps up but a sluggish 5Mbps down on every guest, while the host itself had 1Gbps symmetrical. This was one of the more bizarre networking challenges I had encountered to date. As a side effect of troubleshooting it, I found (and resolved) some congestion and routing issues in the equipment upstream. However, that’s a post for another day.

After exhaustive tests with iperf3 and speedtest-cli, I tried several fixes on the host and guests: different drivers, different adaptors, checking for QoS and firewall issues, messing with multiqueue, checking for rate limiting, flipping all the switches for checksum offloading on the guests, and modifying queue sizes. These attempts, and a raft of other random ideas, didn’t resolve anything.
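For anyone retracing this, the asymmetry shows up clearly in a plain bidirectional iperf3 run from a guest. A minimal sketch, with a placeholder server address (192.0.2.10) standing in for whatever machine you test against:

# on the far end, run an iperf3 server
iperf3 -s

# from the guest: upload test (guest sends)
iperf3 -c 192.0.2.10

# from the guest: download test (-R reverses direction, so the server sends)
iperf3 -c 192.0.2.10 -R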

Finally, I started digging around with ethtool on the host and noticed some offloading options there, specifically Generic Receive Offload (gro) and Hardware Generic Receive Offload (rx-gro-hw). Temporarily disabling these options was pretty straightforward:

# MYINT is the host NIC; disable the hardware-assisted GRO first, then software GRO
ethtool -K MYINT rx-gro-hw off
ethtool -K MYINT gro off
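
You can confirm the change with ethtool’s lowercase -k, which lists the current offload states:

ethtool -k MYINT | grep -E 'generic-receive-offload|rx-gro-hw'

Settings changed with ethtool -K don’t survive a reboot. A sketch of one way to persist this on a stock Proxmox install, assuming MYINT is the physical bridge port already declared in /etc/network/interfaces:

iface MYINT inet manual
    post-up /usr/sbin/ethtool -K MYINT rx-gro-hw off gro off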

This resolved the issue. I’m still not entirely sure why these settings were affecting traffic. As I understand it, gro combines multiple incoming packets into larger ones before handing them off to the CPU, while rx-gro-hw is a hardware-assisted version of gro, performed by the NIC itself. As best I can tell, the host was trying to optimize packets that were already going to be processed by the VM’s network stack, and that caused a bottleneck. What I don’t currently understand is why this was an issue for receive but not transmit. I also don’t understand why these defaults were applied to my specific hardware.
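An aside that isn’t from the original troubleshooting, but is a handy way to watch GRO doing its thing: with GRO enabled, a capture on the host interface shows coalesced TCP segments well above the 1500-byte MTU, and with it disabled those super-packets disappear. A minimal sketch:

# show TCP packets longer than 2000 bytes; with GRO on you should
# see coalesced segments larger than the MTU during a transfer
tcpdump -i MYINT -nn -c 10 'tcp and greater 2000'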

I have a lot of research to do, but I’m glad the traffic is flowing smoothly now. I’ll be spending some quality time with the TCP offload engine page on Wikipedia.

#100daystooffload #proxmox #networking #linux
