Cumulus Linux 3.3.0 Release Notes

Follow

Overview

These release notes support Cumulus Linux 3.3.0 and describe currently available features and known issues. 

Stay up to Date 

  • Please sign in and click Follow above so you can receive a notification when we update these release notes.
  • Subscribe to our product bulletin mailing list to receive important announcements and updates about issues that arise in our products.
  • Subscribe to our security announcement mailing list to receive alerts whenever we update our software for security issues.


What's New in Cumulus Linux 3.3.0

Cumulus Linux 3.3.0 includes the following features and improvements:

  • New 25G platform: Expanding the 25G portfolio with the Quanta IX2.
  • Network Command Line Utility: Adds coverage for DNS, NTP, syslog, VRF, EVPN and 802.1X, giving network operators a single tool to configure and operate their Cumulus Linux switches. You can see the list of changes made in this release here.
  • Buffer monitoring: Proactively detect congestion events that result in latency and jitter by monitoring traffic patterns, so you can identify bottlenecks early and effectively plan for capacity.
  • PIM-SSM: Source-Specific Multicast for more efficient multicast traffic segmentation and higher scalability.
  • EVPN: Includes ARP suppression, static/sticky MAC and interoperability with Cisco NXOS.
  • 802.1X interfaces: Authenticate clients over wired media.
  • DHCP relay service improvements: Cumulus Linux 3.3.0 adds support in the DHCP relay service for detecting the physical interface on which a packet was received and using this value to populate the circuit-id field. Enable this with the --use-pif-circuit-id option, which is disabled by default.
  • Faster installation times.

Early Access Features

The following early access features are included in Cumulus Linux 3.3.0:

  • Cumulus Express 10128: Based on the Facebook Backpack, the first open chassis that comes pre-loaded with Cumulus Linux.

Note: The early access (EA) version of NetQ is not supported with Cumulus Linux 3.3.

Licensing

Cumulus Linux is licensed on a per-instance basis. Each network system is fully operational, and any capability can be used on the switch except forwarding on the front panel ports. Only the eth0 and console ports are activated on an unlicensed instance of Cumulus Linux. Enabling front panel ports requires a license.

You should have received a license key from Cumulus Networks or an authorized reseller. To install the license, read the Cumulus Linux Quick Start Guide.

Installing Version 3.3.0

If you are upgrading from version 3.0.0 or later, use apt-get to update the software.

  1. Run apt-get update.
  2. Run apt-get upgrade.
  3. Reboot the switch.

New Install or Upgrading from Versions Older than 3.0.0

If you are upgrading from a version older than 3.0.0, or installing Cumulus Linux for the first time, download the Cumulus Linux 3.3.0 installer for Broadcom or Mellanox switches from the Cumulus Networks website, then use ONIE to perform a complete install, following the instructions in the quick start guide.

Note: This method is destructive; any configuration files on the switch will not be saved, so please copy them to a different server before upgrading via ONIE.

Important! After you install, run apt-get update, then apt-get upgrade on your switch to make sure you update Cumulus Linux to include any important or other package updates.

Updating a Deployment that Has MLAG Configured

If you are using MLAG to dual connect two switches in your environment, and those switches are still running Cumulus Linux 2.5 ESR or any other release earlier than 3.0.0, the switches will not be dual-connected after you upgrade the first switch. To ensure a smooth upgrade, follow these steps:

  1. Run cl-img-select -fr to boot the switch in the secondary role into ONIE, then reboot the switch.
  2. Install Cumulus Linux 3.3.0 onto the secondary switch using ONIE. At this time, all traffic is going to the switch in the primary role.
  3. After the install, copy the license file and all the configuration files you backed up, then restart the switchd, networking and Quagga services. All traffic is still going to the primary switch.
    cumulus@switch:~$ sudo systemctl restart switchd.service
    cumulus@switch:~$ sudo systemctl restart networking.service
    cumulus@switch:~$ sudo systemctl restart quagga.service
  4. Run cl-img-select -fr to boot the switch in the primary role into ONIE, then reboot the switch. Now, all traffic is going to the switch in the secondary role that you just upgraded to version 3.3.0.
  5. Install Cumulus Linux 3.3.0 onto the primary switch using ONIE. 
  6. After the install, copy the license file and all the configuration files you backed up.
  7. Disable clagd in the /etc/network/interfaces file (set clagd-enable to no), then restart the switchd, networking and Quagga services.
    cumulus@switch:~$ sudo systemctl restart switchd.service
    cumulus@switch:~$ sudo systemctl restart networking.service
    cumulus@switch:~$ sudo systemctl restart quagga.service
  8. Enable clagd again in the /etc/network/interfaces file (set clagd-enable to yes), then run ifreload -a.
    cumulus@switch:~$ sudo ifreload -a
  9. Now the two switches are dual-connected again and traffic flows to both switches.
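Steps 7 and 8 edit clagd-enable by hand. The following is a minimal sketch of that edit, not a Cumulus-provided tool: the function name set_clagd_enable is hypothetical, it assumes the setting appears on a line of the form clagd-enable yes in /etc/network/interfaces, and it relies on the -i option of GNU sed.

```shell
#!/bin/sh
# Hypothetical helper: set every "clagd-enable" line in an ifupdown2
# interfaces file to yes or no, keeping a .bak copy of the original file.
set_clagd_enable() {
    file="$1"
    value="$2"
    case "$value" in
        yes|no) ;;
        *) echo "usage: set_clagd_enable FILE yes|no" >&2; return 2 ;;
    esac
    # Refuse to edit a file that has no clagd-enable line at all.
    if ! grep -q '^[[:space:]]*clagd-enable[[:space:]]' "$file"; then
        echo "no clagd-enable line found in $file" >&2
        return 1
    fi
    # Keep the original indentation and keyword; replace only the value.
    sed -i.bak "s/^\([[:space:]]*clagd-enable[[:space:]]\{1,\}\).*/\1$value/" "$file"
}
```

For example, after saving the function as set-clagd.sh (a hypothetical file name), step 7 becomes sudo sh -c '. ./set-clagd.sh && set_clagd_enable /etc/network/interfaces no', and step 8 uses yes instead of no, followed by sudo ifreload -a.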

SNMP Not Supported in Quagga

There is no SNMP support for Quagga in Cumulus Linux. However, you can still obtain Quagga routing information via SNMP by:

  • Using Nagios
  • Writing a pass_persist script in Perl or Python that fills in the OSPF or BGP MIBs (as defined in the RFCs) manually.
  • Creating your own private MIB for the information you need.

Because of this, you must remove all references to smux from each of the following configuration files. You must also remove these references before upgrading Cumulus Linux using apt-get. If the smux entries are present in the configuration files, the daemons in the 2.5 packaged version of Quagga will not start.

  1. cd /etc/quagga
  2. grep smux *
  3. Delete all lines in the config files containing the smux keyword.

The references to smux that must be removed are:

  • In bgpd.conf, remove this line:
    smux peer 1.3.6.1.4.1.3317.1.2.2 quagga_bgpd
  • In ospf6d.conf, remove this line:
    smux peer 1.3.6.1.4.1.3317.1.2.6 quagga_ospf6d
  • In ospfd.conf, remove this line:
    smux peer 1.3.6.1.4.1.3317.1.2.5 quagga_ospfd
  • In zebra.conf, remove this line:
    smux peer 1.3.6.1.4.1.3317.1.2.1 quagga_zebra
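Steps 1 through 3 above can also be scripted. This is a minimal sketch, not a Cumulus-provided tool: the function name remove_smux is hypothetical, it assumes the Quagga configuration files end in .conf, and, like step 3, it deletes every line that contains smux after saving a .bak copy of each file it changes.

```shell
#!/bin/sh
# Hypothetical helper: delete every line containing "smux" from the *.conf
# files in a Quagga configuration directory (default /etc/quagga), saving
# the original of each changed file as FILE.bak.
remove_smux() {
    dir="${1:-/etc/quagga}"
    for conf in "$dir"/*.conf; do
        [ -f "$conf" ] || continue          # no match: the glob stays literal
        if grep -q 'smux' "$conf"; then
            cp -p "$conf" "$conf.bak"       # keep the original contents
            # Write back everything except the smux lines; "|| true" covers
            # grep's non-zero exit status when no lines remain.
            grep -v 'smux' "$conf.bak" > "$conf" || true
            echo "removed smux lines from $conf"
        fi
    done
}
```

For example, save it as remove-smux.sh (a hypothetical name), run sudo sh -c '. ./remove-smux.sh && remove_smux /etc/quagga', then confirm with grep smux /etc/quagga/*.conf that no references remain.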

Perl, Python and BDB Modules

Any Perl scripts that use the DB_File module or Python scripts that use the bsddb module won't run under Cumulus Linux 3.3.0.

Documentation

You can read the technical documentation here.

Issues Fixed in Cumulus Linux 3.3.0 Update 2017-05-09

Cumulus Networks has made important package updates available for Cumulus Linux 3.3.0 that resolve the issues listed below. These fixes were applied to the Cumulus Networks repository on May 9, 2017.

Cumulus Networks strongly recommends you upgrade your Cumulus Linux distribution to avoid these issues (do not do a binary install). Follow these steps:

  1. Run apt-get update.
  2. Run apt-get upgrade.

Release Note ID Summary Description
  
RN-607 (CM-16168)
NCLU: netd crashes if LLDP neighbor does not have "SysName"

The net show interface output displays the LLDP hostname for the neighbor at the other end of the link. If a link is up and has LLDP information but does not have a "SysName" field, netd crashes.

This issue has been fixed in the May 9 update to Cumulus Linux 3.3.0.

Issues Fixed in Cumulus Linux 3.3.0

The following is a list of issues fixed in Cumulus Linux 3.3.0 from earlier versions of Cumulus Linux.

Release Note ID Summary Description

RN-383 (CM-7196)
admin down of link deletes IPv6 nexthop static route entry, but not for IPv4

When a link is admin down and carrier is on, the IPv4 nexthop entry is marked dead, but the IPv6 nexthop entry is deleted and is not restored when the link is admin up. However, if carrier is off, the IPv6 nexthop is marked dead and not deleted. With this fix, the nexthop is no longer deleted if ignore_routes_with_linkdown is set.

This issue is fixed in Cumulus Linux 3.3.0.


RN-390 (CM-9055)
If you’re logged into the serial console and type reboot, the system may hang indefinitely

In Cumulus Linux, if the serial console terminal settings are changed and clocal (modem carrier) is turned off, systemd may do all of the following:

  • Block and stop handling systemctl changes
  • Fail to start or restart services
  • Prevent reboot and shutdown

For example, running the /usr/bin/reset command may cause this problem when run on the serial console. Once in this state, a new login session will not be started on the serial console (/dev/ttyS0 or ttyS1) after logout, and systemd will not respond to systemctl commands, including reboot.

To work around this issue, whenever you are logged in to the serial console and run reset, run stty clocal afterwards.

This issue is fixed in Cumulus Linux 3.3.0.


RN-538 (CM-13106)
Link issue between Edge-Core AS4610-54P and Supermicro server with embedded NIC x552 2x10G SFP+

When the link starts to come up, the Supermicro server reports via dmesg that the link is flapping; the link eventually comes up after 15 to 45 minutes. The link remains stable if no link down event occurs.

If the link is brought down (due to cable disconnect, or link set down/up), then the flap occurs again the next time the link tries to come up.

Writing to the SERDESDIGITAL_MISC2 register was found to cause a link glitch, and a fix was implemented to correct the issue.


RN-553 (CM-14700)
Some MSDP peer receivers do not receive packets after source stops and starts again

Within a PIM-SM network, if a source stops sending multicast packets for over 3 minutes and the first hop router (FHR) adds a new downstream prior to the source starting again, MSDP peers may stop receiving SA updates, and receivers not using the source's selected registration RP will lose traffic for the length of the multicast stream.

Due to a specific state transition during a quiet period for the source, the FHR stops sending registration packets. MSDP does not consider a source active if registration packets are not being sent to the RP.

This issue is fixed in Cumulus Linux 3.3.0.


RN-554 (CM-14692)
TestIpmcBondVlanCfgFhr failure when untagged traffic is sent on a bond subinterface

Due to a software defect, if a bond VLAN subinterface is created as a router interface before the first slave port is added to the bond, packets forwarded out of the bond VLAN subinterface are incorrectly sent without a VLAN tag. This problem only occurs on switches with the Mellanox Spectrum ASIC, and only depends on the interface creation sequence. It is independent of runtime link state change.

To work around this issue, flap the base bond interface by running ifdown then ifup. For example:

cumulus@switch:~$ sudo ifdown bond0; sudo ifup bond0

This issue is fixed in Cumulus Linux 3.3.0.


RN-570 (CM-14499)
apt-get upgrade overwrites edits to TCAM and buffering profiles in datapath.conf without prompting

If you changed the buffering or TCAM profiles in either of the following files, the changes will be lost when you upgrade the cumulus-tools package:

  • /usr/lib/python2.7/dist-packages/cumulus/__chip_config/bcm/datapath.conf
  • /usr/lib/python2.7/dist-packages/cumulus/__chip_config/mlx/datapath.conf

Since the files are not marked as configuration files, they get overwritten without warning.

If you have changed either or both of these files, make sure to back them up before running apt-get upgrade or otherwise upgrading the cumulus-tools package, then re-apply your changes to the newly installed files after the upgrade.
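On releases affected by this issue, the backup and comparison described above could be scripted along these lines. This is a sketch only: backup_datapath and compare_datapath are hypothetical names, the backup directory is your choice, and the second argument exists so the functions can be tried against a directory other than the one named in this release note.

```shell
#!/bin/sh
# Hypothetical helpers for the manual backup described above. CHIP_CONFIG
# is the directory named in this release note; pass another source
# directory as the second argument to try the functions elsewhere.
CHIP_CONFIG=/usr/lib/python2.7/dist-packages/cumulus/__chip_config

# Copy bcm/datapath.conf and mlx/datapath.conf (whichever exist) to BACKUP_DIR.
backup_datapath() {
    dest="$1"
    src="${2:-$CHIP_CONFIG}"
    for chip in bcm mlx; do
        if [ -f "$src/$chip/datapath.conf" ]; then
            mkdir -p "$dest/$chip"
            cp -p "$src/$chip/datapath.conf" "$dest/$chip/datapath.conf"
        fi
    done
}

# After the upgrade, show how the installed files differ from the backup.
# Returns non-zero if any backed-up file changed, i.e. edits need re-applying.
compare_datapath() {
    dest="$1"
    src="${2:-$CHIP_CONFIG}"
    rc=0
    for chip in bcm mlx; do
        [ -f "$dest/$chip/datapath.conf" ] || continue
        diff -u "$dest/$chip/datapath.conf" "$src/$chip/datapath.conf" || rc=1
    done
    return "$rc"
}
```

For example, as root, run backup_datapath /root/datapath-backup before apt-get upgrade, then compare_datapath /root/datapath-backup afterwards and re-apply any of your changes that the diff shows were lost.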

This issue is fixed in Cumulus Linux 3.3.0.


RN-572 (CM-14844)
Invalid locale settings can prevent apt-get upgrade from completing

In some cases, if your locale information (language and/or character set) is invalid for Linux, you may encounter errors like the following when apt-get upgrade takes the pre-upgrade snapshot:

perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
	LANGUAGE = (unset),
	LC_ALL = (unset),
	LC_CTYPE = "UTF-8",
	LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").
Creating pre-apt snapshot... Failed to set locale. Fix your system.
ERROR:/usr/lib/cumulus/apt-snapshot-hook: Unable to create pre snapshot
E: Problem executing scripts DPkg::Pre-Invoke '/usr/lib/cumulus/apt-snapshot-hook pre-invoke'
E: Sub-process returned an error code

This is an issue with the snapper application, which takes snapshots of the Cumulus Linux NOS. Cumulus Networks intends to update snapper in the future so this issue will not cause an error. 

To work around this error, set your locale information to valid settings, such as the following:

export LC_CTYPE=en_US.UTF-8

Then run apt-get upgrade again.

This issue is fixed in Cumulus Linux 3.3.0.


RN-576 (CM-14908)
TACACS sends authentication requests out of the default VRF, not the management VRF

If a management VRF is configured, TACACS does not send authentication requests out of the management VRF. Instead, it sends these requests out of the default VRF.

To work around this issue, run the following commands, which restrict inbound SSH to only the management VRF interface and disable inbound SSH via the switch ports. Note that using SSH via the front panel ports is not a workaround.

cumulus@switch:~$ sudo systemctl disable ssh.service
cumulus@switch:~$ sudo systemctl stop ssh.service
cumulus@switch:~$ sudo systemctl enable ssh@mgmt.service
cumulus@switch:~$ sudo systemctl start ssh@mgmt.service

This issue is fixed in Cumulus Linux 3.3.0.


RN-580 (CM-15577)
Fix for CVE-2017-6964/DSA-3823: eject command doesn't check errors from dropping privilege

The following Debian security advisory was fixed in Cumulus Linux 3.3.0:

* -------------------------------------------------------------------------

Debian Security Advisory DSA-3823-1 security@debian.org
https://www.debian.org/security/ Salvatore Bonaccorso
March 28, 2017 https://www.debian.org/security/faq
* -------------------------------------------------------------------------

Package : eject
CVE ID : CVE-2017-6964
Debian Bug : 858872

Ilja Van Sprundel discovered that the dmcrypt-get-device helper used to
check if a given device is an encrypted device handled by devmapper, and
used in eject, does not check return values from setuid() and setgid()
when dropping privileges.

For the stable distribution (jessie), this problem has been fixed in
version 2.1.5+deb1+cvs20081104-13.1+deb8u1.

For the unstable distribution (sid), this problem has been fixed in
version 2.1.5+deb1+cvs20081104-13.2.

We recommend that you upgrade your eject packages.


RN-582 (CM-15889)
Fix for CVE-2016-10229: remotely exploitable UDP MSG_PEEK vulnerability in the Linux kernel

CVE ID: CVE-2016-10229
https://nvd.nist.gov/vuln/detail/CVE-2016-10229
https://security-tracker.debian.org/tracker/CVE-2016-10229

Description: An error in UDP checksum handling for received packets allows remote attackers to execute arbitrary code via UDP traffic that triggers an unsafe second checksum calculation during execution of a recv system call with the MSG_PEEK flag.

This bug has been present since the 2.6.1 Linux kernel, and is present in all Cumulus Linux releases since version 2.5.0.

It is fixed in Cumulus Linux 3.3.0.


RN-585 (CM-15120)
switchd stops unexpectedly when VXLAN encapsulation is configured over a subinterface

switchd can stop unexpectedly in environments where VXLAN-encapsulated traffic is configured to egress a subinterface.

VXLAN encapsulation over a subinterface is not supported on either Broadcom or Mellanox switches.

This issue regarding switchd stopping unexpectedly has been fixed in Cumulus Linux 3.3.0.


RN-586 (CM-15201)
In NCLU, VLANs reported as disabled on interface

The output of net show interface INTERFACE shows VLANs in a disabled state. However, the STP state is correct and ping works.

This issue is fixed in Cumulus Linux 3.3.0.


RN-587 (CM-15906)
net show interface fails with certain MSTP treeprio values

Configuring any of the following MSTP treeprio values causes an error when running net show interface: 40960, 45056, 49152, 53248, 57344, 61440.

This issue is fixed in Cumulus Linux 3.3.0.


RN-588 (CM-15925)
Cannot configure interface cost under OSPF using NCLU

Users could not configure cost under an OSPF interface.

This issue is fixed in Cumulus Linux 3.3.0.


RN-589 (CM-15200)
On Mellanox switches, setting link-speed in interfaces file requires flapping link to bring link up

A customer reported that when link-speed was configured under an interface in /etc/network/interfaces, the link still didn't come up. After flapping the interface (by running ip link set down then ip link set up), the interface came up as expected.

This issue is fixed in Cumulus Linux 3.3.0.


RN-590 (CM-15545)
Cannot install IPv6 ECMP routes from BGP learned over a BGP VRF instance

IPv6 prefixes learned over BGP VRF are not installed as expected, causing the nexthop count to be incorrectly calculated for any instance other than the default VRF.

This issue is fixed in Cumulus Linux 3.3.0.


RN-591 (CM-15942)
Configuring PFC and ECN on a Mellanox switch results in switchd failing to start

When both priority flow control (PFC) and explicit congestion notification (ECN) were configured on a Mellanox switch running Cumulus Linux, switchd failed to start.

This issue is fixed in Cumulus Linux 3.3.0.


RN-592 (CM-15908)
ACL fp_range table fills when enabling non-atomic mode

After enabling non-atomic mode for an ACL, the fp_range table fills up, causing an error the next time a unique range is added to the rule:

error: hw sync failed (INGRESS filter V4MAC table: Entry match add failed: Out of L4 dest port range resource)

This issue is fixed in Cumulus Linux 3.3.0.


RN-593 (CM-15758)
After upgrading to Cumulus Linux 3.2.1, two services try to run poed

Upgrading Cumulus Linux 3.1.2 to version 3.2.1 results in two Power over Ethernet services running on the switch: poed.service and cumulus-poe.service. As a result, the poed.service fails and the system is reported in a degraded state.

This issue arose when PoE was moved into its own package, and the name of its service file changed from poed.service to cumulus-poe.service. The upgrade resulted in the old poed.service remaining active.

This issue is fixed in Cumulus Linux 3.3.0.


RN-594 (CM-16081)
ZTP reads the serial number from dmidecode instead of decode-syseeprom

When you use zero touch provisioning (ZTP) to configure a switch, ZTP reads the switch serial number from dmidecode instead of decode-syseeprom. As a result, "NO DIMM" is returned as the serial number.

This issue is fixed in Cumulus Linux 3.3.0.

Known Issues in Cumulus Linux 3.3.0

The following issues are open and affect the current release.

Release Note ID Summary Description

RN-52 (CM-997, CM-1013)
Parameters like the router ID and DR priority cannot be changed while OSPFv2/v3 is running

The router ID and DR priority can only be changed by shutting down OSPFv2/v3, changing the value, and restarting the OSPF process.

A change to the DR priority may not properly be reflected in the LSAs that are still aging out.

RN-56 (CM-343)
IPv4/IPv6 forwarding disabled mode not recognized

If either of the following is configured:

net.ipv4.ip_forward == 0 

or:

net.ipv6.conf.all.forwarding == 0 

The hardware still forwards packets if there is a neighbor table entry pointing to the destination.
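To check whether a switch is in the configuration this issue describes, the same settings can be read from /proc/sys. This is a sketch only: check_forwarding_disabled is a hypothetical name, and its optional argument exists so the check can be tried against a copy of the /proc tree.

```shell
#!/bin/sh
# Hypothetical check: print a reminder for each address family whose kernel
# forwarding setting is 0. As this release note explains, the hardware can
# still forward in that case, so the output is a warning, not a guarantee.
# The optional argument is an alternate /proc root (for testing).
check_forwarding_disabled() {
    proc="${1:-/proc}"
    v4=$(cat "$proc/sys/net/ipv4/ip_forward")
    v6=$(cat "$proc/sys/net/ipv6/conf/all/forwarding")
    if [ "$v4" = 0 ]; then
        echo "net.ipv4.ip_forward = 0, but the hardware may still forward IPv4"
    fi
    if [ "$v6" = 0 ]; then
        echo "net.ipv6.conf.all.forwarding = 0, but the hardware may still forward IPv6"
    fi
}
```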


RN-77 (CM-265)
New routes/ECMPs can evict existing/installed routes

Cumulus Linux syncs routes between the kernel and the switching silicon. If the required resource pools in hardware fill up, new kernel routes can cause existing routes to move from being fully allocated to being partially allocated.

To avoid this, monitor the routes in hardware and keep them below the ASIC limits.

For example, on systems with Trident+ chips, the limits are as follows:

  routes: 16384            <<<< if all routes are IPv4
  long mask routes: 256    <<<< i.e., routes with a mask longer than the route mask limit
  route mask limit: 64
  host_routes: 8192
  ecmp_nhs: 4044
  ecmp_nhs_per_route: 52

That translates to about 77 routes with ECMP next hops (4044 / 52), if every route has the maximum number of ECMP next hops.

You can monitor this in Cumulus Linux with the cl-resource-query command:
cumulus@switch:~$ sudo cl-resource-query
 hosts : 3 
 all routes : 29 
 IP4 routes : 17 
 IP6 routes : 12 
 nexthops : 3 
 ecmp_groups : 0
 ecmp_nexthops : 0
 mac entries : 0 / 131072 
 bpdu entries : 500 / 512
The resource to monitor is ecmp_nexthops. If this count is close to 4044, new ECMPs may evict existing routes.
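The Trident+ arithmetic and the monitoring advice above can be combined into a small check. This is a sketch under stated assumptions: it uses the Trident+ limits listed above, it assumes the ecmp_nexthops line keeps the "name : value" layout shown in the sample output, and the function names and the 90 percent warning threshold are hypothetical choices.

```shell
#!/bin/sh
# Sketch: ECMP headroom check using the Trident+ limits quoted above.
# Other ASICs have different limits; adjust these values accordingly.
ECMP_NHS_LIMIT=4044
ECMP_NHS_PER_ROUTE=52

# Worst case: every route uses the maximum number of ECMP next hops.
max_full_ecmp_routes() {
    echo $((ECMP_NHS_LIMIT / ECMP_NHS_PER_ROUTE))
}

# Read cl-resource-query output on stdin and print the ecmp_nexthops count
# (the part before any "/ limit" suffix, with spaces removed).
ecmp_nexthops_used() {
    awk -F':' '$1 ~ /^[ \t]*ecmp_nexthops[ \t]*$/ {
        split($2, parts, "/")
        gsub(/[ \t]/, "", parts[1])
        print parts[1]
    }'
}

# Warn when usage reaches THRESHOLD_PCT percent of the limit (default 90,
# an arbitrary choice for this sketch). Returns 1 when the warning fires.
check_ecmp_headroom() {
    used=$(ecmp_nexthops_used)
    used=${used:-0}
    if [ $((used * 100)) -ge $((ECMP_NHS_LIMIT * ${THRESHOLD_PCT:-90})) ]; then
        echo "WARNING: ecmp_nexthops at $used of $ECMP_NHS_LIMIT"
        return 1
    fi
    echo "OK: ecmp_nexthops at $used of $ECMP_NHS_LIMIT"
}
```

On the switch, pipe the command output into the check, for example sudo cl-resource-query | check_ecmp_headroom. max_full_ecmp_routes reproduces the figure of about 77 routes quoted above.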

RN-120 (CM-477)
ethtool LED blinking does not work with switch ports

Linux uses ethtool -p to identify the physical port backing an interface, or to identify the switch itself. This identification usually works by blinking the port LED until ethtool -p is stopped.

This feature does not apply to switch ports (swpX) in Cumulus Linux.

RN-121 (CM-2123)
PTMD: When a physical interface is in a PTM FAIL state, its subinterface still exchanges information

Issue: When PTMD is incorrectly in a failure state and the Zebra interface is enabled, BGP sessions on the physical interface (PIF) do not establish routes, but the subinterface on top of it does.

If the subinterface is configured on the physical interface and the physical interface is incorrectly marked as being in a PTM FAIL state, routes on the physical interface are not processed in Quagga, but the subinterface is working.

Steps to reproduce:
cumulus@switch:$ sudo vtysh -c 'show int swp8' 
Interface swp8 is up, line protocol is up 
PTM status: fail
index 10 metric 1 mtu 1500 
 flags: <UP,BROADCAST,RUNNING,MULTICAST>
 HWaddr: 44:38:39:00:03:88 
 inet 12.0.0.225/30 broadcast 12.0.0.227 
 inet6 2001:cafe:0:38::1/64 
 inet6 fe80::4638:39ff:fe00:388/64 
cumulus@switch:$ ip addr show | grep swp8 
 10: swp8: <BROADCAST,MULTICAST,UP,LOWER_UP> 
  mtu 1500 qdisc pfifo_fast state UP qlen 500 
  inet 12.0.0.225/30 brd 12.0.0.227 scope global swp8 
 104: swp8.2049@swp8: <BROADCAST,MULTICAST,UP,LOWER_UP> 
  mtu 1500 qdisc noqueue state UP 
  inet 12.0.0.229/30 brd 12.0.0.231 scope global swp8.2049 
 105: swp8.2050@swp8: <BROADCAST,MULTICAST,UP,LOWER_UP> 
  mtu 1500 qdisc noqueue state UP 
  inet 12.0.0.233/30 brd 12.0.0.235 scope global swp8.2050 
 106: swp8.2051@swp8: <BROADCAST,MULTICAST,UP,LOWER_UP> 
  mtu 1500 qdisc noqueue state UP 
  inet 12.0.0.237/30 brd 12.0.0.239 scope global swp8.2051 
 107: swp8.2052@swp8: <BROADCAST,MULTICAST,UP,LOWER_UP> 
  mtu 1500 qdisc noqueue state UP 
  inet 12.0.0.241/30 brd 12.0.0.243 scope global swp8.2052 
 108: swp8.2053@swp8: <BROADCAST,MULTICAST,UP,LOWER_UP>
  mtu 1500 qdisc noqueue state UP 
  inet 12.0.0.245/30 brd 12.0.0.247 scope global swp8.2053 
 109: swp8.2054@swp8: <BROADCAST,MULTICAST,UP,LOWER_UP> 
  mtu 1500 qdisc noqueue state UP 
  inet 12.0.0.249/30 brd 12.0.0.251 scope global swp8.2054
 110: swp8.2055@swp8: <BROADCAST,MULTICAST,UP,LOWER_UP>
  mtu 1500 qdisc noqueue state UP 
  inet 12.0.0.253/30 brd 12.0.0.255 scope global swp8.2055
cumulus@switch:$ bgp sessions: 
 12.0.0.226 ,4 ,64057 , 958 , 1036 , 0 , 0 , 0 ,15:55:42, 0, 10472 
 12.0.0.230 ,4 ,64058 , 958 , 1016 , 0 , 0 , 0 ,15:55:46, 187, 10285
 12.0.0.234 ,4 ,64059 , 958 , 1049 , 0 , 0 , 0 ,15:55:40, 187, 10285 
 12.0.0.238 ,4 ,64060 , 958 , 1039 , 0 , 0 , 0 ,15:55:45, 187, 10285 
 12.0.0.242 ,4 ,64061 , 958 , 1014 , 0 , 0 , 0 ,15:55:46, 187, 10285 
 12.0.0.246 ,4 ,64062 , 958 , 1016 , 0 , 0 , 0 ,15:55:46, 187, 10285 
 12.0.0.250 ,4 ,64063 , 958 , 1029 , 0 , 0 , 0 ,15:55:43, 187, 10285 
 12.0.0.254 ,4 ,64064 , 958 , 1036 , 0 , 0 , 0 ,15:55:44, 187, 10285 

RN-125 (CM-1576)
Network LSA with an old router ID isn't flushed out by the originator
When the router ID is changed, the router should remove the previous network LSA (link-state advertisement) that it generated based on the IP address on the interface in the Network LSA.

Cumulus Linux does not remove this LSA, so it ages out naturally.

RN-198 (CM-3290)
Port LEDs behave differently on different switch models

It's been observed that port LEDs behave differently depending upon the make and model of the switch. For example:

  • Agema AG-7448CU: the LED is off when the link is up. It blinks on briefly when there is traffic.
  • Edge-Core AS4600-54T: the LED is off when the link is up. It blinks on briefly when there is traffic.
  • QuantaMesh T3048-LY2R: the LED is on when the link is up. It blinks off briefly when there is traffic.

Cumulus Networks is currently working to fix this issue.


RN-199 (CM-2624)
When a Quagga route-map is modified, the switch could use the partial map before edits are completed

Cumulus Linux triggers a route-map update before the user finishes editing the route map, resulting in an incorrect route map being used. The route-map update trigger should only occur when user finishes editing the map.

Cumulus Networks is working to fix this issue.


RN-221 (CM-3926, CM-4501)
BGP graceful restart, including helper mode, not fully supported

If you encounter issues with this, please submit a support request and include the output from cl-support with your ticket.

RN-227 (CM-3388)
BGP dynamic capability is not supported

BGP peer sessions with dynamic capability are not supported under any version of Cumulus Linux at this time.

RN-322 (CM-7387)
Interfaces disabled using iproute2 become enabled after restarting Quagga

By default, all interfaces have a "no shutdown" associated with them in Quagga. Thus, when you restart Quagga, it enables the interfaces. This is expected behavior in Quagga. There is no workaround at this time.

RN-327 (CM-4290)
Changing the route-map parameter of the redistribute command in OSPF and BGP doesn't affect the state of the resulting redistribution in those protocols

To work around this issue, remove any old redistribute command configurations before adding a new one with or without route-map as a parameter.

For example, if OSPF has a redistribute configuration such as redistribute bgp route-map redist-map-name, you would enable redistribution without a route-map by following these steps in OSPF configuration mode:

  1. no redistribute bgp
  2. redistribute bgp

You would perform a similar sequence of commands for redistribution changes in BGP as well.
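The two-step OSPF sequence above could also be run non-interactively. This sketch assumes Quagga's vtysh runs several -c commands in order within one session; reset_ospf_redistribute and the VTYSH variable are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper for the workaround above: remove the old redistribute
# statement under "router ospf", then add it back without a route-map.
# Assumes vtysh runs several -c commands in order within one session.
# Set VTYSH=echo to preview, or VTYSH="sudo vtysh" on the switch.
reset_ospf_redistribute() {
    proto="${1:-bgp}"
    ${VTYSH:-vtysh} -c 'configure terminal' -c 'router ospf' \
        -c "no redistribute $proto" -c "redistribute $proto"
}
```

For example, VTYSH="sudo vtysh" reset_ospf_redistribute bgp. The change applies to the running configuration; run write memory in vtysh afterwards if you want it saved.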


RN-355 (CM-7994)
OSPFv2 Area ID being implicitly translated from Integer format to dotted decimal format

While OSPF area ID configuration in Quagga allows for the value to be specified in either dotted decimal format, or as an integer, values specified as an integer will be converted into dotted decimal format when displayed, causing potential confusion for the operator.

This issue does not impact OSPF functionality; only the display output. However, we recommend specifying the OSPF area ID in dotted decimal format for consistency.


RN-382 (CM-6692)
Quagga: Removing bridge via ifupdown2 does not remove it from Quagga

Removing a bridge using ifupdown2 does not remove it from the Quagga configuration files. This issue is being investigated; however, restarting Quagga will successfully remove the bridge.

RN-384 (CM-7684)
Keeping VXLAN single-connected devices up on MLAG secondary node

In the current MLAG secondary design, if the VXLAN device is not dual-connected, it is kept in a protodown state. You can keep such devices up with individual IP addresses rather than anycast IPs when the peerlink is down, so that all single-connected hosts will have connectivity. Further investigation regarding this issue is underway.

RN-387 (CM-8163)
Quagga appears to not honor passive interfaces if VRR is active

In a VRR configuration, any interface-specific routing configuration (e.g., OSPF mode of operation) specified on the subinterface having a virtual IP address does not take effect. This is because when an operator has specified a virtual IP on a bridge, the system creates another internal interface bridge with the virtual IP and MAC. These two interfaces are treated distinctly by Quagga, so any interface-specific routing configuration on the bridge does not get carried over to the second bridge.

In a VRR deployment needing any interface-specific routing configuration on the interface with a virtual IP address, the routing configuration has to be specified against the internally-created virtual interface also.


RN-389 (CM-8410)
switchd supports only port 4789 as the UDP port for VXLAN packets

switchd currently allows only the standard port 4789 as the UDP port for VXLAN packets. A hypervisor may use a non-standard UDP port, in which case VXLAN exchanges with the hardware VTEP do not work: packets are not terminated, and encapsulated packets are sent out on UDP port 4789.


RN-391 (CM-9631)
Dell S4048 unresponsive after TX Unit Hang detected

After booting a Dell S4048 switch, the switch becomes unresponsive and errors like the following appear in the console log:

[ 1206.440277] igb 0000:00:14.0: Detected Tx Unit Hang
[ 1206.440277]   Tx Queue             <0>
[ 1206.440277]   TDH                  <2d>
[ 1206.440277]   TDT                  <2e>
[ 1206.440277]   next_to_use          <2e>
[ 1206.440277]   next_to_clean        <2d>
[ 1206.440277] buffer_info[next_to_clean]
[ 1206.440277]   time_stamp           <1000dcd20>
[ 1206.440277]   next_to_watch        <ffff88007d81b2d0>
[ 1206.440277]   jiffies              <1000dd5d4>
[ 1206.440277]   desc.status          <300000>
[ 1208.439856] igb 0000:00:14.0: Detected Tx Unit Hang
[ 1208.439856]   Tx Queue             <0>
[ 1208.439856]   TDH                  <2d>
[ 1208.439856]   TDT                  <2e>
[ 1208.439856]   next_to_use          <2e>
[ 1208.439856]   next_to_clean        <2d>
[ 1208.439856] buffer_info[next_to_clean]
[ 1208.439856]   time_stamp           <1000dcd20>
[ 1208.439856]   next_to_watch        <ffff88007d81b2d0>
[ 1208.439856]   jiffies              <1000ddda4>
[ 1208.439856]   desc.status          <300000>
[ 1210.439414] igb 0000:00:14.0: Detected Tx Unit Hang
[ 1210.439414]   Tx Queue             <0>
[ 1210.439414]   TDH                  <2d>
[ 1210.439414]   TDT                  <2e>
[ 1210.439414]   next_to_use          <2e>
[ 1210.439414]   next_to_clean        <2d>
[ 1210.439414] buffer_info[next_to_clean]
[ 1210.439414]   time_stamp           <1000dcd20>
[ 1210.439414]   next_to_watch        <ffff88007d81b2d0>
[ 1210.439414]   jiffies              <1000de574>
[ 1210.439414]   desc.status          <300000>
[ 1212.438966] igb 0000:00:14.0: Detected Tx Unit Hang
[ 1212.438966]   Tx Queue             <0>
[ 1212.438966]   TDH                  <2d>
[ 1212.438966]   TDT                  <2e>
[ 1212.438966]   next_to_use          <2e>
[ 1212.438966]   next_to_clean        <2d>
[ 1212.438966] buffer_info[next_to_clean]
[ 1212.438966]   time_stamp           <1000dcd20>
[ 1212.438966]   next_to_watch        <ffff88007d81b2d0>
[ 1212.438966]   jiffies              <1000ded44>
[ 1212.438966]   desc.status          <300000>
[ 1212.490329] igb 0000:00:14.0 eth0: Reset adapter

Rebooting the switch again stops the behavior.


RN-404 (CM-4407)
Aggregating routes in BGP with as-set can result in high CPU usage

When BGP is configured with aggregate addresses with as-set configuration and there are many routes to be aggregated, the BGP process gets into high CPU usage.

To work around this issue, do not specify the as-set parameter for the aggregate-address configuration.


RN-406 (CM-9895)
Mellanox SN2700 power off issues

On the Mellanox SN2700 and SN2700B switches, if any of the following occur:

  • A shutdown or poweroff command is executed
  • A temperature sensor hits a critical value and shuts down the box

Then, when a PDU power cycle is issued, the switch appears to be dead for at least 3 minutes.


RN-409 (CM-10054)
BGP may show an inaccessible path as the best path

Existing BGP issues cause a peering session between a VRF device and a loopback BGP session to stay up even when the loopback session does not advertise its local address.

This issue will be fixed in a future release.


RN-446 (CM-10513)
Redistribute neighbor does not work with more than 1024 interfaces

The rdnbrd service crashes because it cannot handle more than 1024 interfaces.

This issue should be fixed in a future release of Cumulus Linux.


RN-448 (CM-11302)
Using the json option in the "show ip bgp" command causes peer session flaps

This issue causes peer session flaps on Penguin Arctica 4806XP and Supermicro SSE-X3648S switches. It occurs with 16K IPv4 prefixes and only when you run show ip bgp json.

However, on switches with Tomahawk ASICs, with 61K IPv4 prefixes and default timers, the same show ip bgp json command causes all peer sessions to go down.

This is a known issue that should be fixed in a future release of Cumulus Linux.


RN-450 (CM-12252)
802.1p remark in traffic.conf behaves differently on Mellanox vs. Broadcom switches

The 802.1p remark defined in traffic.conf acts differently on a Mellanox switch when compared to a Broadcom switch.

On the Mellanox platform, the remark defined in the traffic.conf file takes precedence even if there is an ACL rule that is matched.

On the Broadcom platform, the ACL rule takes precedence over the remark defined in the traffic.conf file.


RN-451 (CM-12344)
Mellanox switch rejects SPAN ACL rule for an output interface that is a subinterface

This is a known issue at this time.


RN-455 (CM-12578)
The cumulus-poe package is not installed on ARM switches after upgrading to version 3.1

After upgrading Cumulus Linux from a binary image installation of version 3.1 or earlier, the cumulus-poe package does not get installed on ARM-based switches. The following message appears after you reboot the switch:

[ OK ] Stopped Cumulus Linux POE Daemon.
Starting Cumulus Linux POE Daemon...
[FAILED] Failed to start Cumulus Linux POE Daemon.

To use Power over Ethernet (PoE) on an ARM switch, install the cumulus-poe package:

cumulus@switch:~$ sudo apt-get update
cumulus@switch:~$ sudo apt-get install cumulus-poe
cumulus@switch:~$ sudo apt-get upgrade

RN-525 (CM-12715)
On a Quanta IX1 switch, 100G OSI LR4 module (QSFP28-LR4-OSI) doesn't advertise 40G in transceiver codes, so cannot be used at 40G speed

Cumulus Linux supports only the speeds that a module advertises in its transceiver codes. You can check the advertised codes by running ethtool -m on the port:

cumulus@switch:~$ sudo ethtool -m swp1
...

    Transceiver codes                       : 0x80 0x00 0x00 0x00 0x00 0x00 0x00 0x00
    Transceiver type                        : 100G Ethernet: 100G Base-LR4

...

The example above shows that this module supports only 100G; it cannot support 40G speeds. Similarly, if a module advertises 40G support only, it cannot support 100G speeds.


RN-526 (CM-13037)
Upgrading the clag package fails when logrotate contains an invalid date

While upgrading from Cumulus Linux 3.0.1 to 3.1.z, the clag package does not update if the logrotate file contains an invalid date. This can occur due to bad batteries in the switch or RTC clock chip issues. The error may look like the following:

error: bad year 1929 for file /var/log/boot.log in state file /var/lib/logrotate/status
dpkg: error processing package clag (--configure):
 subprocess installed post-installation script returned error exit status 1
Errors were encountered while processing:
 clag
E: Sub-process /usr/bin/dpkg returned an error code (1)

You can work around this issue by removing the /var/lib/logrotate/status file and forcing a log rotation. After you do this, the upgrade should succeed.
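A minimal sketch of those two steps, assuming the standard Debian logrotate configuration at /etc/logrotate.conf (the -f option forces logrotate to rotate even if it thinks rotation is not needed). After this, retry the upgrade.

```
cumulus@switch:~$ sudo rm /var/lib/logrotate/status
cumulus@switch:~$ sudo logrotate -f /etc/logrotate.conf
```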


RN-536 (CM-12628)
Port LEDs on ARM 1G-T platforms do not show link status (green), but blink on activity

On the DNI-3048UP, Edge-Core 4610-54P and HP 4610-54T, there is no link LED for 1G-T ports. The SFP+ ports correctly show the link LEDs.

The Dell S3048-ON, which has a Helix4 ASIC, correctly shows the link LED.

This issue is currently being investigated.


RN-537 (CM-12967)
Pause frames sent by a Tomahawk switch are not honored by the upstream switch

When link pause or priority flow control (PFC) is enabled on a Broadcom Tomahawk-based switch and a link is oversubscribed, the ASIC sends pause frames aggressively, and the upstream switch does not throttle enough.

If you need link pause or PFC functionality, the only workaround is to use a switch that does not have a Tomahawk ASIC.


RN-540 (CM-13428)
Quagga reload fills FIB after renaming the table map

Renaming a table map (which essentially deletes the current table map and adds a new one) causes Quagga to try to install all the routes again, which fills the FIB. After some time, and without any intervention, the table map begins functioning again on its own and FIB usage returns to a normal level.

This issue is currently being investigated.


RN-542 (CM-13461)
Polling the BGP RIB with "show ip bgp" causes the peer to flap if the RIB has more than 600K entries

This is a known issue that is currently being investigated. The Quagga log shows these commands taking a very long time to execute.

To work around this issue, Cumulus Networks recommends that you use larger keepalive and hold timers: 60 and 180 seconds, respectively.
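For example, assuming the standard Quagga timer commands, the recommended values could be set globally or for a single neighbor as shown below. The AS number 65001 and the neighbor address 10.0.0.2 are hypothetical placeholders.

```
router bgp 65001
 ! keepalive 60 seconds, hold time 180 seconds, for all neighbors
 timers bgp 60 180
 ! or, for a single neighbor only
 neighbor 10.0.0.2 timers 60 180
```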


RN-544 (CM-13739)
Speed 1000 breaks porttab on Edge-Core AS5812-54X, changing the interface name from xe to ge

When the port speed is changed from 10G/40G to 1G, the SDK dport interface name changes from xe to ge. This causes switchd APIs that use the BCM dport name to fail when the logical port is not generated. There is currently no fix for a persistent configuration.

RN-545 (CM-13800)
OSPFv3 redistribute connected with route-map broken at reboot (or ospf6d start)

This issue only affects OSPFv3 (IPv6) and is being investigated at this time.

RN-548 (CM-14061)
ethtool --show-fec does not reflect correct state

The ethtool --show-fec command does not accurately report the current link state. The cause of this issue is being investigated.

RN-549 (CM-14106)
For a 100G-capable port and cable, enabling auto-negotiation fails when the port speed is set to 40G in ports.conf

On 100G-capable ports and cables, auto-negotiation fails when the port speed is set to 40G in the /etc/cumulus/ports.conf file. This issue is currently being investigated.

RN-552 (CM-14549)
After setting interface speed to 40G in ports.conf on a Mellanox switch, ethtool still shows interface as 100G

This is a known issue whereby ethtool does not update after restarting switchd, so it continues to display the outdated port speed.

To correctly set the port speed, use NCLU or ethtool to set the speed instead of hand editing the ports.conf file.

For example, to set the speed to 40G using NCLU:

cumulus@switch:~$ net add interface swp1 link speed 40000 

Or using ethtool:

cumulus@switch:~$ sudo ethtool -s swp1 speed 40000

RN-579 (CM-15058)
Operating switch ports in Cumulus Linux at 1G speeds on Trident II+ platforms is unreliable

For Cumulus Linux running on Broadcom Trident II+ platforms, operation at the 1000Mbps speed setting is currently unreliable. Impacted platforms include:

  • Dell S4048T-ON
  • Edge-Core AS5812-54T
  • Edge-Core AS5812-54X
  • HPE Altoline 6921T
  • HPE Altoline 6921
  • Cumulus Express CX 4048T
  • Cumulus Express CX 4048S

 


RN-595 (CM-15934)
Egress ACL statistics not accurate on Broadcom-based switches

The counters for egress ACLs may not accurately represent the number of packets matching the rule.

This issue is being investigated.


RN-597 (CM-15705)
sFlow doesn't generate flow samples to sflowd on Tomahawk-based switches

At this time, sFlow is not supported on switches with Tomahawk ASICs. This is a known issue.

RN-598 (CM-15575)
CLAGD process restarts when updating backup-ip

When the backup IP is changed (for example, changed by accident and then corrected), running ifreload -a restarts the clagd process so that the daemon starts with the new backup IP, rather than updating the backup IP in the running daemon.

This issue is being investigated.


RN-599 (CM-15949)
DHCRELAY automatically binds to eth0 when not specified in the configuration

dhcrelay listens on all interfaces that have an IP address, even interfaces it is not configured to listen on. As a result, dhcrelay binds to ports that are not specified in its configuration.

This behavior is expected, due to upstream configuration. The packet is dropped later in the process, as it is not coming from a configured port.


RN-601 (CM-15926)
VRR breaks redistribute neighbor

When a neighbor is learned on an interface running VRR, a duplicate /32 entry is created in table 10, and Quagga stops redistributing it. Restarting Quagga causes the routes to show up again. The cause is that Quagga expects only one RIB entry from a given route source for any prefix. This issue will be fixed in the next Cumulus Linux release.

RN-604 (CM-15959)
ARP suppression does not work well with VxLAN A-A

In some VxLAN active-active (A-A) scenarios, ARP requests that should be suppressed are instead flooded over VxLAN tunnels. This happens because no control plane synchronizes the snooped local neighbor entries between the CLAG pair: neither CLAG nor EVPN performs this sync.

This issue is being investigated.


RN-605 (CM-15515)
Unable to change the bond mode using ifup or ifreload

When the bond mode is changed from 802.3ad to balance-xor or vice versa using ifup bondx or ifreload -a, the bond mode does not change, and the following error is produced:

2017-03-23 21:39:37,495:  DEBUG:      autolib.netobjects: [cumulus@127.0.0.1:1042] sudo: ('ifup bond1',)
2017-03-23 21:39:37,926:  DEBUG:      autolib.netobjects: warning: error writing to file /sys/class/net/bond1/bonding/mode([Errno 39] Directory not empty)
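To check which mode the kernel actually applied after ifup or ifreload, you can read the bond's sysfs mode file; bond1 is the bond from the log above and should be replaced with your own bond name. The kernel bonding driver prints the mode name followed by its number, for example 802.3ad 4 or balance-xor 2.

```
cumulus@switch:~$ cat /sys/class/net/bond1/bonding/mode
```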

This issue is being addressed in a later release.


RN-606 (CM-6366)
BGP: MD5 password is not enforced for dynamic neighbors

The MD5 password configured on a BGP listen-range peer group (used to accept and create dynamic BGP neighbors) is not enforced. As a result, the switch accepts connections from peers that do not specify a password, and only from such peers.
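For reference, a dynamic-neighbor configuration of the affected kind might look like the following in Quagga. This shows the configuration that triggers the issue, not a workaround; the AS numbers, the peer group name DYNAMIC, the listen range, and the password are all hypothetical placeholders.

```
router bgp 65001
 neighbor DYNAMIC peer-group
 neighbor DYNAMIC remote-as 65002
 ! This password is not enforced for dynamic neighbors
 neighbor DYNAMIC password MyPassword
 bgp listen range 10.0.0.0/24 peer-group DYNAMIC
```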

This issue is being investigated.


RN-608 (CM-16145)
Buffer monitoring default port group discards_pg only accepts packet collection type

The default port group discards_pg does not accept packet_extended or packet_all collection types.

This issue is currently being investigated.


RN-617 (CM-16413)
Complete loss of traffic on bond subinterface when member goes down

On a switch where a bond and a subinterface of that bond are configured, when a member of the bond goes down, the switch stops terminating all unicast IP traffic destined to it.

This issue is currently being investigated.


RN-653 (CM-17856) 
Enabling PFC on Mellanox switches may cause switchd to crash

On Cumulus Linux versions 3.3.0 and later, enabling priority flow control (PFC) on Mellanox Spectrum-based switches may cause the switchd process to crash.

To work around this issue, populate the unlimited_egress_buffer_port_set parameter in the /etc/cumulus/datapath/traffic.conf file. The default range should be "swp<a>-swp<z>", where "swp<a>" is the first front panel port in /var/lib/cumulus/porttab and "swp<z>" is the last front panel port in the porttab file. For example:

# priority flow control
pfc.port_group_list = [pfc_port_group]
pfc.pfc_port_group.cos_list = [0]
pfc.pfc_port_group.port_set = swp1-swp5
pfc.pfc_port_group.port_buffer_bytes = 25000
pfc.pfc_port_group.xoff_size = 10000
pfc.pfc_port_group.xon_delta = 2000
pfc.pfc_port_group.tx_enable = true
pfc.pfc_port_group.rx_enable = true
pfc.pfc_port_group.unlimited_egress_buffer_port_set = swp1-swp16
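Working out the swp<a>-swp<z> range by hand is error prone, so below is a minimal POSIX shell sketch that derives it from a porttab-style file. The function name porttab_range is hypothetical and not part of Cumulus Linux. It assumes that lines starting with # are comments and that the first column of every other line is the front panel port name, listed in port order; verify this against the porttab on your switch before relying on it. It does not attempt to handle breakout port names.

```shell
#!/bin/sh
# Hypothetical helper (not part of Cumulus Linux): print the
# "swp<a>-swp<z>" range for unlimited_egress_buffer_port_set.
# Assumption: '#' lines are comments and the first whitespace-separated
# column is the front panel port name, in port order.
porttab_range() {
    porttab="${1:-/var/lib/cumulus/porttab}"
    awk '
        $1 ~ /^swp/ {
            if (first == "") first = $1   # remember the first front panel port
            last = $1                     # keep overwriting until the last one
        }
        END { if (first != "") print first "-" last }
    ' "$porttab"
}

# Usage on the switch (reads /var/lib/cumulus/porttab by default):
#   porttab_range
```

The printed range is the value to assign to pfc.pfc_port_group.unlimited_egress_buffer_port_set in /etc/cumulus/datapath/traffic.conf. Using awk keeps the sketch dependent only on tools present on a stock Debian-based system.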

RN-658 (CM-17338) 
Power cycling a connected host may result in control plane traffic failure on a 10GBASE-T Trident II+ switch

Switches with the Trident II+ chipset running Cumulus Linux 3.3.0 or later may experience a failure to transmit frames from the control plane following a power-cycle of a device connected via 10GBASE-T. This can result in complete loss of connectivity from the switch control plane to connected devices.

To work around this issue, restart switchd with sudo systemctl restart switchd. A fix is currently under investigation.
