Redistribute Neighbor

Redistribute neighbor is an experimental feature in Cumulus Linux that provides a mechanism for IP subnets to span racks without forcing the end hosts to run a routing protocol. Cumulus Linux uses the existing concept of redistributing one protocol into another to help simplify the transition to L3 fabrics.

The fundamental premise behind redistribute neighbor is to announce individual host /32 routes, not subnet routes, in the routed fabric. To do this, the switch compiles a list of the IP addresses hosted in the southbound L2 domain and advertises reachability to those addresses into the routing fabric. Other hosts on the fabric can then use these routes to reach the hosts in the L2 domain. If multiple equal-cost paths (ECMP) are available, traffic load balances across them natively.

The challenge is to accurately compile and update this list of reachable hosts or neighbors. Luckily, existing commonly deployed protocols solve this problem. Hosts use ARP to resolve MAC addresses when sending to an IPv4 address. Each host then builds an ARP cache table of known MAC address-to-IPv4 address mappings as it receives or responds to ARP requests. In Linux, this is stored as a kernel-level IPv4 neighbor table. Similarly, IPv6 uses neighbor discovery (ndisc) to resolve IPv6 addresses to MAC addresses, storing this mapping in an IPv6 neighbor table.

In the case of a ToR switch, where the default gateway is deployed for hosts within the rack, the ARP cache table contains a list of all hosts that have ARP'd for their default gateway. In many scenarios, this table contains all the L3 information that is needed. This is where redistribute neighbor comes in: it is a mechanism for formatting this table and syncing it into the routing protocol.
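
The idea of turning neighbor entries into host routes can be sketched in a few lines of Python. This is only an illustration, not the actual python-rdnbrd implementation: the function name, the sample neighbor output, and the choice to skip FAILED and INCOMPLETE entries are all assumptions made for the example. Table 10 is the table used elsewhere in this article.

```python
import ipaddress

# Neighbor states that indicate the host was never actually resolved.
UNUSABLE_STATES = {"FAILED", "INCOMPLETE"}

def neighbors_to_host_routes(neigh_output):
    """Turn `ip -4 neigh show` text into (prefix, interface) host routes."""
    routes = []
    for line in neigh_output.splitlines():
        fields = line.split()
        # Expect at least: <ip> dev <ifname> ... <STATE>
        if len(fields) < 4 or fields[1] != "dev":
            continue
        address, ifname, state = fields[0], fields[2], fields[-1]
        if state in UNUSABLE_STATES or "lladdr" not in fields:
            continue
        # Validates the address and renders it as a /32 prefix.
        prefix = ipaddress.ip_network(address + "/32")
        routes.append((str(prefix), ifname))
    return routes

# Hypothetical neighbor table contents on a ToR.
SAMPLE = """\
10.1.0.1 dev swp1 lladdr 00:02:00:00:00:01 REACHABLE
10.1.0.2 dev swp2 lladdr 00:02:00:00:00:02 STALE
10.1.0.3 dev swp3  FAILED
"""

for prefix, ifname in neighbors_to_host_routes(SAMPLE):
    print(f"ip route add {prefix} dev {ifname} table 10")
```

The sketch only prints the route commands; it does not modify the routing table.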

Availability

Note: Redistribute neighbor is an experimental feature, so it is currently stored in the testing repository. To install it, you first need to add the testing repo.

Redistribute neighbor is distributed as python-rdnbrd. The package is available in the testing repository in Cumulus Linux 2.1 and later.

Target Use Cases and Best Practices

Redistribute neighbor was created with these use cases in mind:

  • Virtualized clusters
  • Hosts with service IP addresses that migrate between racks
  • Hosts that are dual connected to two ToRs without using proprietary protocols such as MLAG
  • Anycast services needing dynamic advertisement from multiple hosts

Cumulus Networks recommends following these guidelines with redistribute neighbor:

  • Use a single logical connection from each host to each top-of-rack (ToR) switch.
  • A host can connect to one or more ToRs. Each ToR advertises the /32 routes it sees in its neighbor table. The python-rdnbrd daemon watches the neighbor table and creates /32 routes in table 10.
  • A host-bound bridge/VLAN should be local to each switch only.
  • ToR switches with redistribute neighbor enabled should be directly connected to the hosts.
    Note: Intermediate L2 switches or bridges are not recommended, as they may pass host ARPs to other hosts, where ARP caching may lead to black-holing of traffic.
  • IP addressing must be non-overlapping, as the host IPs are directly advertised into the routed fabric.
  • Run redistribute neighbor on Linux-based hosts primarily; other host operating systems may work, but Cumulus Networks has not actively tested any at this stage.

Configuring the ToR(s)

  1. Configure the host-facing ports in /etc/network/interfaces:
    # The loopback network interface
    auto lo
    iface lo inet loopback
    
    auto lo:1
    iface lo:1 inet static
        address 10.1.0.253/32
    
    auto swp1
    iface swp1 inet static
        address 10.1.0.253/32
    
    auto swp2
    iface swp2 inet static
        address 10.1.0.253/32
    
    auto swp3
    iface swp3 inet static
        address 10.1.0.253/32
    
    auto swp4
    iface swp4 inet static
        address 10.1.0.253/32
    
    auto swp5
    iface swp5 inet static
        address 10.1.0.253/32
    
    auto swp6
    iface swp6 inet static
        address 10.1.0.253/32
    
    auto swp7
    iface swp7 inet static
        address 10.1.0.253/32
    
    auto swp8
    iface swp8 inet static
        address 10.1.0.253/32
  2. Add or uncomment the testing repo in your apt sources file:
    cumulus@switch:~$ sudo vi /etc/apt/sources.list
  3. Install the python-rdnbrd package:
    cumulus@switch:~$ sudo apt-get update
    cumulus@switch:~$ sudo apt-get install python-rdnbrd
    
  4. Start the daemon:
    cumulus@switch:~$ sudo service rdnbrd restart
  5. Enter the Quagga CLI and enter config mode:
    cumulus@switch:~$ sudo vtysh
    [sudo] password for cumulus:
    Hello, this is Quagga (version 0.99.22.4).
    Copyright 1996-2005 Kunihiro Ishiguro, et al.
    quagga# conf t
    quagga(config)#
  6. Use the Quagga CLI to import the ARP table:
    1. Add the table as routes into the local routing table:
      quagga(config)# ip import-table 10 distance 20
    2. Redistribute the imported routes into OSPF:
      quagga(config)# router ospf
      quagga(config-router)# redistribute table 10
    3. Exit OSPF configuration mode:
      quagga(config-router)# exit
  7. Define a route-map so that only the ARP entries from the relevant segments are imported.
    1. In the Quagga CLI, define the route-map:
      quagga(config)# route-map rdarp permit 1
      quagga(config-route-map)# match interface swp1
      quagga(config)# route-map rdarp permit 2
      quagga(config-route-map)# match interface swp2
      quagga(config)# route-map rdarp permit 3
      quagga(config-route-map)# match interface swp3
      quagga(config)# route-map rdarp permit 4
      quagga(config-route-map)# match interface swp4
      quagga(config)# route-map rdarp permit 5
      quagga(config-route-map)# match interface swp5
      quagga(config)# route-map rdarp permit 6
      quagga(config-route-map)# match interface swp6
      quagga(config)# route-map rdarp permit 7
      quagga(config-route-map)# match interface swp7
      quagga(config)# route-map rdarp permit 8
      quagga(config-route-map)# match interface swp8
      quagga(config-route-map)# exit
      
    2. Add the route-map to the table import:
      quagga(config)# ip protocol table route-map rdarp
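
Because the route-map in step 7 repeats the same two lines for every host-facing port, the commands can be generated rather than typed. The following is a minimal sketch, assuming the port names swp1 through swp8 and the route-map name rdarp from the example above; the helper name is hypothetical, and the output is meant to be reviewed and pasted into the Quagga CLI, not applied automatically.

```python
def route_map_commands(name, interfaces):
    """Build one `permit` clause per host-facing interface."""
    commands = []
    for sequence, ifname in enumerate(interfaces, start=1):
        commands.append(f"route-map {name} permit {sequence}")
        commands.append(f" match interface {ifname}")
    # Attach the route-map to the table import, as in step 7.2.
    commands.append(f"ip protocol table route-map {name}")
    return commands

# Host-facing ports from the example configuration: swp1 .. swp8.
host_ports = [f"swp{n}" for n in range(1, 9)]
print("\n".join(route_map_commands("rdarp", host_ports)))
```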

Configuring the Host(s)

There are a few possible host configurations that range in complexity. This article only covers the basic use case: dual-connected Linux hosts with static IP addresses assigned.

Additional host configurations will be covered in future separate knowledge base articles.

Configuring the Dual-connected Hosts

Configure each host with the same /32 IP address on its loopback (lo) and uplinks (in this example, eth0 and eth1). This way, both ToR switches advertise the same /32 regardless of the interface. Cumulus Linux relies on ECMP routing to load balance across the interfaces southbound, and the host uses an equal-cost static route (see the configuration below) to load balance northbound.

Additionally, install and use ifplugd, which modifies the behavior of the Linux routing table when an interface undergoes a link transition (carrier up/down). By default, the Linux kernel leaves routes up even when the physical interface is unavailable (NO-CARRIER).

The loopback hosts the primary service IP address(es), and you can bind services to it.

  1. Configure the loopback and physical interfaces:
    # The loopback network interface
    auto lo
    iface lo inet loopback
    
    auto lo:1
    iface lo:1 inet static
        address 10.1.0.1/32
        up ip route add 0.0.0.0/0 nexthop via 10.1.0.253 dev eth0 onlink nexthop via 10.1.0.254 dev eth1 onlink
    
    auto eth0
    iface eth0 inet static
        address 10.1.0.1/32
    
    auto eth1
    iface eth1 inet static
        address 10.1.0.1/32
  2. Install ifplugd on the host and modify the settings in /etc/default/ifplugd:
    root@server1:~# apt-get update  
    root@server1:~# apt-get install ifplugd
    root@server1:~# vi /etc/default/ifplugd
    

    For full instructions on installing ifplugd on Ubuntu, follow this guide.
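
The equal-cost static route in step 1 lists one next hop per uplink. As a rough sketch, the command can be assembled from a list of (gateway, interface) pairs; the helper name is hypothetical, and the gateway addresses are the ones used in this article's example.

```python
def multipath_default_route(uplinks):
    """Build an `ip route add` command with one next hop per uplink.

    `uplinks` is a list of (gateway_ip, interface) pairs.
    """
    if not uplinks:
        raise ValueError("at least one uplink is required")
    parts = ["ip", "route", "add", "0.0.0.0/0"]
    for gateway, ifname in uplinks:
        # `onlink` tells the kernel to treat the gateway as directly
        # attached, even though the host's /32 address does not cover it.
        parts += ["nexthop", "via", gateway, "dev", ifname, "onlink"]
    return " ".join(parts)

print(multipath_default_route([("10.1.0.253", "eth0"), ("10.1.0.254", "eth1")]))
```

The function only builds the command string; run the result as root on the host, or place it in an `up` line as shown in step 1.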

Known Limitations

Redistribute neighbor is an experimental feature, so it is not recommended for production use at this time. We actively encourage testing in lab environments and welcome all feedback to help improve and solidify the feature.

TCAM Route Scale

This feature adds each ARP entry as a /32 host route into the routing table of all switches within a summarization domain. Take care to keep the number of hosts plus fabric routes under the TCAM size of the switch. See the Cumulus Networks datasheets for up-to-date scalability limits of your chosen hardware platforms. If in doubt, contact Cumulus Networks support or your Cumulus Networks CSE; they will be happy to help.
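
A quick headroom check can make this budget concrete. The sketch below assumes that every host route and every fabric route consumes one route entry; the function name and the numbers are purely hypothetical, so substitute the limits from the datasheet for your platform.

```python
def route_headroom(host_count, fabric_routes, tcam_capacity):
    """Return how many more routes fit; a negative value means over capacity."""
    if min(host_count, fabric_routes, tcam_capacity) < 0:
        raise ValueError("counts must be non-negative")
    return tcam_capacity - (host_count + fabric_routes)

# Hypothetical figures, not taken from any datasheet.
remaining = route_headroom(host_count=12000, fabric_routes=500, tcam_capacity=16000)
print("route entries still available:", remaining)
```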

Possible Uneven Traffic Distribution

On most older distributions, Linux uses only the source L3 address to make load balancing decisions, so traffic may not be spread evenly across the available paths.
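
The effect of hashing on the source address alone can be shown with a small simulation. This is a sketch only: CRC32 stands in for whatever hash the kernel really uses, and the flows and function names are invented for the example.

```python
import zlib

def pick_path(key, path_count):
    """Map a hash key onto one of `path_count` equal-cost paths."""
    return zlib.crc32(key.encode()) % path_count

def path_usage(flows, path_count, key_fn):
    """Count how many flows land on each path for a given hash key."""
    usage = [0] * path_count
    for flow in flows:
        usage[pick_path(key_fn(flow), path_count)] += 1
    return usage

# Hypothetical flows from one host to 100 different destinations.
flows = [("10.1.0.1", f"10.2.0.{n}") for n in range(1, 101)]

source_only = path_usage(flows, 2, lambda f: f[0])
source_and_dest = path_usage(flows, 2, lambda f: f[0] + ">" + f[1])
print("source only:", source_only)
print("source + destination:", source_and_dest)
```

With a source-only key, every flow from the host produces the same hash, so all 100 flows land on a single path. Including the destination in the key gives the hash a chance to spread them.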

Default Host ARP Behavior Leads to Forwarding Delays during Endpoint Moves

Redistribute neighbor is built around taking the ARP/IP entry for the host and mapping that into the routing tables so that the whole network knows exactly how to reach the given host's IP address. Further, every host knows to send all traffic to the ToRs to properly route it to the ultimate destination.

This fundamentally breaks down if any two hosts learn about one another due to seeing each other's ARP.

For example, if two hosts are on the same bridge and one of them issues a gratuitous ARP, the other host sees it and, depending on its configuration, may cache the information. If the first host then moves, the second host does not know and continues trying to send traffic directly using the learned ARP entry. This continues until the ARP entry naturally times out.

This can occur either if bridges are configured on the ToRs, or if virtual bridges are configured on the physical hardware and the hosts are VMs on that physical machine.

Modifications to the default ARP configuration can reduce the impact of this.

Redistribute Neighbor Supported with BGP Unnumbered Interfaces and OSPF Only

If you are using BGP for routing and your interfaces are numbered, you must configure BGP unnumbered interfaces with the v6only option. For example:

neighbor swp49 interface v6only

For information on configuring BGP unnumbered interfaces, read the Cumulus Linux user guide.

Support for BGP numbered interfaces will be added in a later release.

Silent Hosts Never Receive Traffic

Freshly provisioned hosts that have never sent traffic may not ARP for their default gateways. Typically, host operating systems issue a gratuitous ARP, but because this architecture relies heavily on this mechanism, take steps to ensure that the ARP is actually sent.
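
One possible way to make a quiet host announce itself is to ping each default gateway once at boot, which forces the host to resolve the gateway with ARP. The sketch below only builds the commands; the helper name is hypothetical, the gateway addresses and interface names come from the host example in this article, and whether a single ping is sufficient in your environment is an assumption to verify in the lab.

```python
def gateway_probe_commands(uplinks, count=1):
    """Build one `ping` command per (gateway_ip, interface) uplink.

    Pinging the gateway makes the host resolve it with ARP, which gives
    the ToR a chance to learn the host's address.
    """
    commands = []
    for gateway, ifname in uplinks:
        commands.append(["ping", "-c", str(count), "-I", ifname, gateway])
    return commands

for argv in gateway_probe_commands([("10.1.0.253", "eth0"), ("10.1.0.254", "eth1")]):
    print(" ".join(argv))
```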

Support for IPv4 Only

This release of redistribute neighbor supports IPv4 only. IPv6 support will be added in a future release.

Feedback

We would greatly appreciate detailed feedback on use cases, bugs, and improvements, as well as general comments on redistribute neighbor. Please share it with your account team: Account Manager (AM) and Solutions Engineer (SE). If you purchased Cumulus Linux via a channel partner, they can direct you to the appropriate resources.

You may also submit feedback via support, but please note that this is an experimental feature, so your support representative may not be completely familiar with it at this early stage.

Detailed feedback will directly help improve this feature's robustness, so we strongly encourage you to be as candid and detailed in your feedback as possible.
