Overview: General Hardware Routing Operation
Connected routes are installed into hardware as punt entries. A frame destined to an address within the route is punted to the CPU, which prompts the operating system to resolve ARP/ND (Address Resolution Protocol/neighbor discovery) for the destination IP address. Other network operating systems call this a glean entry.
As ARP/ND cache entries for IP addresses within the connected route are resolved, they are programmed into the ASIC with the proper layer 2 rewrite information. Subsequent routing lookups in hardware, which use longest prefix match, then hit the ARP/ND entries because they are more specific than the subnet route. Transit routed packets that hit an ARP/ND entry are hardware forwarded (that is, no longer punted) because the matched entry is programmed with the correct layer 2 rewrite information.
- Cumulus Linux 3.7.10 and later
- Cumulus Linux 4.0.0
- A route with an egress interface only (no next hop) is considered to be connected, as the layer 2 information for each IP address within the route needs to be directly resolved in order to forward a packet destined to an address within that route.
- While the Linux kernel can have discrete entries for a host route in two separate tables (the routing table versus the ARP/ND table), hardware considers both host routes and neighbor entries to be the same thing. As such, you can only program either a host route or a neighbor entry for the same host (/32 or /128) prefix into hardware.
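To see whether the kernel currently holds both kinds of entry for one host, you can compare the routing table with the neighbor table. The following is a minimal sketch: check_overlap is a hypothetical helper name, and 10.0.0.1 is only an example address.

```shell
# check_overlap is a hypothetical helper, not a Cumulus Linux tool.
# $1: output of `ip route show exact <host>/32`
# $2: output of `ip neigh show to <host>`
# Note: a FAILED or INCOMPLETE neighbor entry still counts as present here.
check_overlap() {
    routes="$1"
    neighs="$2"
    if [ -n "$routes" ] && [ -n "$neighs" ]; then
        echo "overlap: host route and neighbor entry both exist"
    elif [ -n "$routes" ]; then
        echo "host route only"
    elif [ -n "$neighs" ]; then
        echo "neighbor entry only"
    else
        echo "no entry"
    fi
}

# Example usage on a switch (IPv4 host 10.0.0.1 assumed):
#   check_overlap "$(ip route show exact 10.0.0.1/32)" \
#                 "$(ip neigh show to 10.0.0.1)"
```

An "overlap" result means that only one of the two software entries can be programmed into hardware for that host.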
switchd Operation and Limitations
switchd receives routes from the Linux kernel and determines what layer 2 rewrite information it should program into hardware.
- If the Linux kernel passes switchd a route that has a next hop, switchd uses the layer 2 information of the next hop (or the recursive next hop) when programming the layer 2 rewrite for the prefix.
- If the Linux kernel passes switchd a route that does not have a next hop, switchd installs the route as a punt entry, as the route meets the criteria to be considered connected.
There is no special handling in switchd to check the ARP/ND cache for layer 2 information upon receipt of a connected route.
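The rule above can be sketched as a small classifier over kernel route output. This is a hypothetical model for illustration, not switchd code; classify_route is an invented name, and the sketch assumes a single-line unicast route as printed by ip route show (no multipath or special route types).

```shell
# classify_route is a hypothetical model of the rule above, not switchd code.
# $1: one line of `ip route show` output for a unicast route.
classify_route() {
    case " $1 " in
        # A "via" keyword means the kernel route carries a next hop.
        *" via "*) echo "nexthop" ;;
        # Egress interface only: treated as connected, installed as punt.
        *)         echo "punt" ;;
    esac
}

# Example usage:
#   classify_route "10.0.0.1 dev vlan100 scope link"       # prints: punt
#   classify_route "10.0.0.1 via 10.0.0.1 dev vlan100"     # prints: nexthop
```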
Behavior When Both a Host Route and Neighbor Entry Exist for the Same Host
switchd has a configuration option in /etc/cumulus/switchd.conf called route.route_preferred_over_neigh. Its value determines which entry is programmed into hardware when the kernel has overlapping software entries for the same host prefix (both a host route and an ARP/ND neighbor entry).
- When set to TRUE, switchd installs the host route instead of the ARP/ND cache entry.
- When set to FALSE, switchd installs the ARP/ND cache entry instead of the host route.
In Cumulus Linux 4.0.0, the default value was changed from FALSE to TRUE.
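To check whether the option is set explicitly on a switch, you can read it from the configuration file. The sketch below assumes that switchd.conf uses "key = value" lines with "#" for comments; get_route_pref is a hypothetical helper name.

```shell
# get_route_pref is a hypothetical helper name.
# Assumes switchd.conf uses "key = value" lines and "#" for comments.
# $1: path to the config file (defaults to /etc/cumulus/switchd.conf).
get_route_pref() {
    conf="${1:-/etc/cumulus/switchd.conf}"
    # Print the value of the last uncommented occurrence of the option.
    value=$(sed -n 's/^[[:space:]]*route\.route_preferred_over_neigh[[:space:]]*=[[:space:]]*\([A-Za-z]*\).*/\1/p' "$conf" | tail -n 1)
    # An empty result means the option is commented out or absent,
    # so the release default applies.
    echo "${value:-unset}"
}

# Example usage:
#   get_route_pref
```

A result of "unset" means that the default for the installed release is in effect.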
In situations where you have a connected host route (generally from a static route with only an egress interface), the Linux kernel resolves ARP/ND for the host IP address on its own and the software forwarding tables are completed as expected. Both entries are then passed to switchd, which checks the value of route.route_preferred_over_neigh to determine whether the route or the ARP/ND cache entry is programmed into hardware.
If route.route_preferred_over_neigh is set to TRUE, switchd programs the route into hardware instead of the ARP/ND entry. Because the route is considered connected, it is programmed as a punt entry even though the kernel has a valid ARP/ND cache entry for the IP address. As a result, traffic to that host is punted to the CPU instead of being hardware forwarded.
You can avoid this situation in one of two ways:
- By having the routes use the next hop of the prefix itself.
- By allowing the ARP/ND entry to be installed instead of the route.
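The collision outcomes described in this section can be summarized in a short sketch. It is a hypothetical model of the documented behavior, not switchd code; hw_result is an invented name, and the sketch assumes that the next hop and the ARP/ND entry both hold valid layer 2 information.

```shell
# hw_result is a hypothetical model of the behavior described in this
# section; it is not switchd code.
# $1: value of route.route_preferred_over_neigh (TRUE or FALSE)
# $2: host route type: "connected" (egress interface only) or "nexthop"
hw_result() {
    pref="$1"
    route_type="$2"
    if [ "$pref" = "TRUE" ]; then
        # The host route wins the collision.
        if [ "$route_type" = "connected" ]; then
            echo "punt"
        else
            # Assumes the next hop is resolved.
            echo "hardware-forwarded"
        fi
    else
        # The ARP/ND entry wins; assumes it holds valid layer 2 information.
        echo "hardware-forwarded"
    fi
}

# Example usage:
#   hw_result TRUE connected    # the problem case
#   hw_result TRUE nexthop      # first workaround
#   hw_result FALSE connected   # second workaround
```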
Having the Routes Use the Next Hop of the Prefix Itself
Both the Linux kernel and iproute2 allow routes to use a next hop of the prefix itself.
For example, if you issue ip route add 10.0.0.1/32 via 10.0.0.1 dev vlan100, the resulting kernel route has a next hop that points to 10.0.0.1. When switchd receives this route it looks in the ARP cache for 10.0.0.1 and programs the 10.0.0.1/32 route in hardware with the layer 2 rewrite information from the ARP entry, resulting in a hardware-forwarded route entry.
Note that FRRouting accepts a static route with a next hop of the prefix itself, but the route is marked as inactive in the RIB and is not installed into the kernel. To use this method persistently, configure a post-up command for the interface in the /etc/network/interfaces file. For example:
    auto vlan100
    iface vlan100
        address 172.16.0.1/24
        post-up ip route add 10.0.0.1/32 via 10.0.0.1 dev vlan100
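If several hosts on the same interface need this treatment, the post-up lines can be generated rather than typed by hand. The sketch below only prints text for you to paste into the interface stanza; post_up_lines is a hypothetical helper name, and any address other than 10.0.0.1 is an illustrative assumption.

```shell
# post_up_lines is a hypothetical helper name.
# $1: interface; remaining arguments: IPv4 host addresses.
# Prints one post-up line per host, each using the host as its own next hop.
post_up_lines() {
    dev="$1"
    shift
    for host in "$@"; do
        printf '    post-up ip route add %s/32 via %s dev %s\n' "$host" "$host" "$dev"
    done
}

# Example usage (10.0.0.2 is an illustrative second host):
#   post_up_lines vlan100 10.0.0.1 10.0.0.2
```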
Allowing the ARP/ND Entry to Be Installed Instead of the Route
Setting route.route_preferred_over_neigh to FALSE causes the ARP/ND entry to be installed instead of the route. This results in a hardware-forwarded entry as long as the ARP/ND cache entry has valid layer 2 information for the IP address.
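One way to make this change is sketched below. It assumes "key = value" syntax in switchd.conf; set_route_pref is a hypothetical helper name, and the function takes the file path as an argument so that you can try it on a copy first.

```shell
# set_route_pref is a hypothetical helper name.
# Assumes "key = value" syntax in switchd.conf.
# $1: path to the config file; $2: TRUE or FALSE.
set_route_pref() {
    conf="$1"
    value="$2"
    key='route\.route_preferred_over_neigh'
    tmp=$(mktemp) || return 1
    if grep -q "^[[:space:]]*#*[[:space:]]*${key}[[:space:]]*=" "$conf"; then
        # Rewrite the existing line, uncommenting it if necessary.
        sed "s/^[[:space:]]*#*[[:space:]]*${key}[[:space:]]*=.*/route.route_preferred_over_neigh = ${value}/" "$conf" > "$tmp"
    else
        # Otherwise append the setting to the end of the file.
        cat "$conf" > "$tmp"
        printf 'route.route_preferred_over_neigh = %s\n' "$value" >> "$tmp"
    fi
    # Write back in place so the file keeps its ownership and permissions.
    cat "$tmp" > "$conf"
    rm -f "$tmp"
}

# Example usage on a copy of the file:
#   cp /etc/cumulus/switchd.conf /tmp/switchd.conf
#   set_route_pref /tmp/switchd.conf FALSE
```

Changes to switchd.conf generally take effect only after switchd restarts, and a restart can interrupt traffic. Treat that step as an assumption to confirm against the documentation for your release.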
However, there are situations where this is not preferable because of the kernel's default ARP settings. For example, a gratuitous ARP can create a new ARP entry that takes precedence over an existing host route for the same prefix. Some kernel ARP settings, such as arp_accept and arp_ignore, may be worth adjusting to avoid this behavior. You can find more information on these settings here:
- Address Resolution Protocol - ARP in the Cumulus Linux User Guide
- Changing ARP timers in Cumulus Linux
Additionally, you can reference man 7 arp and https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt.
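Before adjusting arp_accept or arp_ignore, it can help to record their current values for the interface in question. The following sketch reads them directly from /proc/sys; show_arp_settings is a hypothetical helper name, and vlan100 is only an example interface.

```shell
# show_arp_settings is a hypothetical helper name.
# $1: interface name (or "all"); $2: sysctl root (defaults to /proc/sys).
show_arp_settings() {
    dev="$1"
    root="${2:-/proc/sys}"
    for knob in arp_accept arp_ignore; do
        path="$root/net/ipv4/conf/$dev/$knob"
        if [ -r "$path" ]; then
            printf '%s=%s\n' "$knob" "$(cat "$path")"
        else
            printf '%s=unavailable\n' "$knob"
        fi
    done
}

# Example usage:
#   show_arp_settings vlan100
#   show_arp_settings all
```

Reading the "all" values as well is worthwhile, because the kernel combines the "all" and per-interface values for some of these settings; the ip-sysctl documentation describes the exact rule for each one.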