Slow Bandwidth/Performance Utilization while Using VXLAN



When using Linux hypervisors such as KVM or Xen with Intel 10GbE NICs, a server can see significantly lower throughput than line rate (2-3 Gbps) when VXLAN is in use and the hypervisor or dom0 acts as the VXLAN endpoint.


This problem is not caused by Cumulus Linux, but it comes up often when deploying Cumulus Linux in the data center. It is a problem with specific Intel NICs on the host/server side when carrying VXLAN traffic, and it affects Linux-based hosts using the Intel 82599, X520 and X540 adapters.

This is the result of a specific design decision made by the Intel ixgbe driver maintainers. The Intel 10GbE hardware can perform RSS (Receive Side Scaling), spreading the work of packet reception across multiple CPUs/queues.

When fragmented UDP frames arrive at the host, Intel made the decision that all UDP frames with the fragmentation bit set would arrive on CPU/queue 0 rather than on any of the other CPUs/queues. If the unfragmented frames of the same flow were spread across other queues, frames could be delivered out of order (which is especially bad when streaming video), so Intel decided not to perform RSS on UDP traffic by default: the UDP port numbers are left out of the hash, and all UDP traffic between the same pair of IP addresses, such as VXLAN traffic between two VTEPs, lands on a single CPU/queue. TCP traffic does not have this restriction by default. Because the header of STT (Stateless Transport Tunneling), another network overlay protocol, looks like TCP, the Intel hardware classifies STT-encapsulated frames as TCP and performs RSS on them, so STT does not show this loss of throughput. The fix is easy as long as the hypervisor or dom0 kernel and ethtool are recent enough to contain the patches that configure this setting.
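To illustrate why the hash fields matter, here is a minimal Python sketch of RSS queue selection using the Toeplitz hash. It is a model under stated assumptions, not the ixgbe implementation: it assumes Microsoft's published default RSS key (the driver may program a different, possibly random, key), a 128-entry redirection table filled round-robin, four receive queues, and hypothetical VTEP addresses. It compares hashing on IP addresses only with hashing on IP addresses plus UDP ports.

```python
import ipaddress
import struct

# Assumption: Microsoft's published default 40-byte RSS key. The key the ixgbe
# driver actually programs may differ (it may be randomly generated).
RSS_KEY = bytes.fromhex(
    "6d5a56da255b0ec24167253d43a38fb0"
    "d0ca2bcbae7b30b477cb2da38030f20c"
    "6a42b73bbeac01fa"
)


def toeplitz_hash(data: bytes, key: bytes = RSS_KEY) -> int:
    """32-bit Toeplitz hash: for every set input bit i (most significant bit
    first), XOR in the 32-bit window of the key that starts at bit i."""
    key_bits = len(key) * 8
    if len(data) * 8 + 32 > key_bits:
        raise ValueError("input is too long for this key")
    key_int = int.from_bytes(key, "big")
    result = 0
    for i in range(len(data) * 8):
        if data[i // 8] & (0x80 >> (i % 8)):
            result ^= (key_int >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return result


def rss_queue(src_ip: str, dst_ip: str, sport: int, dport: int,
              hash_ports: bool, num_queues: int = 4) -> int:
    """Pick a receive queue for an IPv4/UDP packet.

    hash_ports=False models hashing on IP addresses only (udp4 "sd");
    hash_ports=True models IP addresses plus UDP ports (udp4 "sdfn").
    Simplification: a 128-entry redirection table filled round-robin, so the
    queue is the low 7 bits of the hash modulo num_queues.
    """
    data = (ipaddress.IPv4Address(src_ip).packed
            + ipaddress.IPv4Address(dst_ip).packed)
    if hash_ports:
        data += struct.pack("!HH", sport, dport)
    return (toeplitz_hash(data) & 0x7F) % num_queues


# Hypothetical VTEP addresses. VXLAN uses IANA-assigned UDP port 4789 and,
# per RFC 7348, typically varies the outer source port per inner flow.
VTEP_A, VTEP_B, VXLAN_PORT = "192.0.2.1", "192.0.2.2", 4789
source_ports = range(49152, 49152 + 64)

queues_sd = {rss_queue(VTEP_A, VTEP_B, p, VXLAN_PORT, hash_ports=False)
             for p in source_ports}
queues_sdfn = {rss_queue(VTEP_A, VTEP_B, p, VXLAN_PORT, hash_ports=True)
               for p in source_ports}
print("queues used, IP addresses only:", sorted(queues_sd))
print("queues used, IPs + UDP ports:  ", sorted(queues_sdfn))
```

Because every VXLAN packet between two VTEPs carries the same outer IP pair, the IP-only hash sends all 64 flows to one queue; adding the ports lets the varying outer source port spread them across queues.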


At a minimum, perform Item 1. If throughput does not improve, also check Item 2:

Item 1

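Run the following command on the hypervisor or dom0 after each boot (the setting does not persist across reboots), replacing [device] with the name of the 10GbE interface: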

ethtool -N [device] rx-flow-hash udp4 sdfn

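Expect to see a message like this after running the command (it is logged by the driver, so it may appear in the kernel log, viewable with dmesg, rather than on the terminal):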

enabling UDP RSS: fragmented packets may arrive out of order to the stack above
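Because the setting has to be reapplied on each boot, it may be convenient to script it. The following Python sketch is an illustration, not a Cumulus-provided tool: it runs the Item 1 command for each device named on the command line (root privileges are needed) and reads the setting back with "ethtool -n [device] rx-flow-hash udp4". It assumes ethtool lists hashed port fields on lines beginning "L4 bytes 0 & 1" and "L4 bytes 2 & 3"; check this against your ethtool version. The function and script names are hypothetical.

```python
import subprocess
import sys


def rss_command(device: str) -> list:
    """The Item 1 command: include UDP ports in the IPv4 UDP RSS hash."""
    return ["ethtool", "-N", device, "rx-flow-hash", "udp4", "sdfn"]


def show_command(device: str) -> list:
    """Read back which header fields are hashed for IPv4 UDP flows."""
    return ["ethtool", "-n", device, "rx-flow-hash", "udp4"]


def udp4_hashes_ports(show_output: str) -> bool:
    """True if the 'ethtool -n ... rx-flow-hash udp4' output lists both L4
    port fields (assumed line prefixes; verify for your ethtool version)."""
    return ("L4 bytes 0 & 1" in show_output
            and "L4 bytes 2 & 3" in show_output)


def enable_udp_rss(device: str) -> bool:
    """Apply the setting, then verify that it took effect. Needs root."""
    subprocess.run(rss_command(device), check=True)
    shown = subprocess.run(show_command(device), capture_output=True,
                           text=True, check=True)
    return udp4_hashes_ports(shown.stdout)


if __name__ == "__main__":
    # Usage (hypothetical script and device names): python3 udp_rss.py eth2 eth3
    for dev in sys.argv[1:]:
        ok = enable_udp_rss(dev)
        print(f"{dev}: UDP ports {'included in' if ok else 'NOT in'} RSS hash")
```

Such a script could be run from whatever boot-time mechanism the hypervisor or dom0 distribution provides.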

This ethtool option removes the default restriction on UDP traffic by adding the UDP source and destination ports (f and n) to the hash alongside the source and destination IP addresses (s and d), which spreads this traffic across all CPUs in the hypervisor or dom0. To check network device statistics, use the command:

ethtool -S [device]

This command is a good way to confirm that all receive queues are in use during a multi-stream test, or to verify that they are not in use before running the ethtool command that changes the rx-flow-hash settings.
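The comparison can be automated with a minimal Python sketch that snapshots per-queue receive counters from "ethtool -S" before and after a multi-stream test and reports which queues were busy. It assumes the driver names its counters rx_queue_<N>_packets, as ixgbe does; other drivers use different names. The function names and the 1000-packet threshold are hypothetical choices for illustration.

```python
import re
import subprocess

# Assumption: per-queue receive counters are named rx_queue_<N>_packets,
# as the ixgbe driver reports them; other drivers use different names.
QUEUE_RE = re.compile(r"^\s*rx_queue_(\d+)_packets:\s*(\d+)\s*$")


def rx_queue_packets(stats_output: str) -> dict:
    """Map queue number -> received packets from 'ethtool -S' output."""
    counts = {}
    for line in stats_output.splitlines():
        match = QUEUE_RE.match(line)
        if match:
            counts[int(match.group(1))] = int(match.group(2))
    return counts


def read_stats(device: str) -> dict:
    """Run 'ethtool -S <device>' and return its per-queue packet counters."""
    out = subprocess.run(["ethtool", "-S", device], capture_output=True,
                         text=True, check=True).stdout
    return rx_queue_packets(out)


def busy_queues(before: dict, after: dict, min_packets: int = 1000) -> list:
    """Queues that received at least min_packets between two snapshots."""
    return sorted(q for q, n in after.items()
                  if n - before.get(q, 0) >= min_packets)


# Usage (hypothetical device name "eth2"):
#   before = read_stats("eth2")
#   ... run the multi-stream test ...
#   print(busy_queues(before, read_stats("eth2")))
# A single busy queue during a multi-stream VXLAN test suggests UDP RSS is off.
```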

Item 2

If a multi-stream test still shows no change in throughput, check that dom0 has more than one vCPU assigned, since RSS can only spread receive processing across the CPUs that dom0 actually has. A guide to changing these settings on Citrix XenServer 6.2 can be found at