Cumulus VX 3.0.1 Release Notes

Overview

Cumulus VX is a community-supported virtual environment for cloud and network administrators to test the latest technology from Cumulus Networks, removing all organizational and economic barriers to getting started with open networking in your own time, at your own pace, and within your own environment.

The environment can be used to learn about, and evaluate, Cumulus Linux, anytime and anywhere, producing sandbox environments for prototype assessment, pre-production rollouts, and script development.

These release notes support Cumulus VX 3.0.1 and describe its features and known issues.

What's New

Cumulus VX 3.0.1 includes bug fixes only.

Cumulus Linux 3.0.z is a significant departure from earlier releases. See the user guide for details on new behaviors and functionality.

Downloading Cumulus VX

You can download any of the four Cumulus VX images:

  • An OVA disk image for use with VirtualBox.
  • A VMware-specific OVA disk image.
  • A qcow2 disk image for use with KVM.
  • A Box image for use with Vagrant.

Configuration Notes

Keep in mind the following issues when you are running your Cumulus VX virtual machine.

SNMP Not Supported in Quagga

There is no SNMP support for Quagga in Cumulus VX (see RN-88 below). Because of this, you must remove all references to smux from each of the following configuration files. If smux entries are present in the configuration files, the daemons in the 2.5 packaged version of Quagga will not start.

  1. cd /etc/quagga
  2. grep smux *
  3. Delete all lines in the config files containing the smux keyword.

The references to smux that must be removed are:

  • In bgpd.conf, remove this line:
    smux peer 1.3.6.1.4.1.3317.1.2.2 quagga_bgpd
  • In ospf6d.conf, remove this line:
    smux peer 1.3.6.1.4.1.3317.1.2.6 quagga_ospf6d
  • In ospfd.conf, remove this line:
    smux peer 1.3.6.1.4.1.3317.1.2.5 quagga_ospfd
  • In zebra.conf, remove this line:
    smux peer 1.3.6.1.4.1.3317.1.2.1 quagga_zebra
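
The cleanup described above can also be scripted. The following is a minimal sketch in Python, assuming the Quagga configuration files are plain text; the function name strip_smux_lines and the sample text are hypothetical and are not part of Cumulus VX or Quagga. It drops every line containing the smux keyword and leaves all other lines untouched.

```python
def strip_smux_lines(config_text):
    """Return config_text without any line that contains the smux keyword."""
    kept = [
        line
        for line in config_text.splitlines(keepends=True)
        if "smux" not in line
    ]
    return "".join(kept)


if __name__ == "__main__":
    # Illustrative input only; a real bgpd.conf contains more than this.
    sample = (
        "hostname bgpd\n"
        "smux peer 1.3.6.1.4.1.3317.1.2.2 quagga_bgpd\n"
        "line vty\n"
    )
    print(strip_smux_lines(sample), end="")
```

To apply it to a real file, read the file contents, pass them through the function, and write the result back after keeping a backup copy.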

Perl, Python and BDB Modules

Any Perl scripts that use the DB_File module or Python scripts that use the bsddb module won't run under Cumulus VX.

Documentation

You can read the technical documentation here.

Community Support

If you have any questions or feedback about Cumulus VX, visit the Cumulus VX community for further support. 

Issues Fixed in Cumulus VX 3.0.1

The following is a list of issues fixed in Cumulus Linux 3.0.1 from earlier versions of Cumulus Linux.

Release Note ID Summary Description

RN-366 (CM-8894)
A policer on a traditional bridge is supported and functional; however, once a bridge has a VNI associated, the policer is no longer matched

Policers were not enabled on VXLAN-enabled bridges. This meant that while a policer on a traditional bridge was supported and functional, once the bridge had a VNI associated, the policer no longer matched. By adding vpn as a match field in the IFP qset, policers are now enabled on VXLAN bridges, correcting the issue.


RN-431 (CM-11386)
Default route is not resolved for management VRF on a Vagrant KVM setup for cldemo-vagrant

A default route in the main routing table may be installed via eth0 when DHCP configures the interface while using management VRF. To work around this issue, run the following commands:

cumulus@switch:~$ ifdown eth0
cumulus@switch:~$ ifup eth0

This issue is fixed in Cumulus VX 3.0.1.


RN-436 (CM-11441)
Aborting the ZTP process does not unmount the partition if the process didn't exit properly

If you abort the zero touch provisioning (ZTP) process, and ZTP does not exit properly, the partition remains mounted. If you run the process again, ZTP looks for an unmounted partition, and cannot find the USB partition.

This issue has been fixed in Cumulus VX 3.0.1.


RN-439 (CM-11286)
ssh does not start when ListenAddress flag is set in sshd_config and VRF configured

In Cumulus VX 3.0.0, sshd was unable to find a bind address, and failed to start, when eth0 was configured in a management VRF. This was because SSH was not in the list of services for which VRF generates a service file. SSH has been added to the list of services, and the error is now corrected.

This issue has been fixed in Cumulus VX 3.0.1.


RN-440 (CM-11597)
The ntp (VRF) service does not auto start after reboot although ntp@mgmt service is enabled

In Cumulus VX 3.0.0, the vrf command cleaned up old remnants of a VRF prior to configuring it, including stopping systemd processes in the VRF. However, the start command was not included, so the service did not start automatically on boot. A boot mode check has been added to the cleanup, and ntpd and snmpd now start as expected.

This issue has been fixed in Cumulus VX 3.0.1.

Known Issues in Cumulus Linux 3.0.1

Issues are categorized for easy review. Some issues are fixed but will be available in a later release.

Release Note ID Summary Description

RN-52 (CM-997, CM-1013)
Parameters like the router ID and DR priority cannot be changed while OSPFv2/v3 is running

Router ID and DR priority can only be changed by shutting down OSPFv2/v3, changing the ID, and restarting the OSPF process.

A change to the DR priority may not properly be reflected in the LSAs that are still aging out.

RN-56 (CM-343)
IPv4/IPv6 forwarding disabled mode not recognized

If either of the following is configured:

net.ipv4.ip_forward == 0 

or:

net.ipv6.conf.all.forwarding == 0 

The hardware still forwards packets if there is a neighbor table entry pointing to the destination.


RN-58 (CM-747)
IPv6 route is installed and active in the routing table when the associated interface is down

If an IPv6 address is assigned to a "down" interface, the associated route is still installed into the route table.

Also, the type of IPv6 address doesn't matter. Link local, site local, and global all exhibit the same problem.

If the interface is bounced up and down, then the routes are no longer in the route table.

RN-64 (CM-1153)
Configuring route-reflector-client requires specific order

When configuring a router to be a route reflector client, the Quagga configuration must be specified in a specific order; otherwise, the router will not be a route reflector client.

The "neighbor <IPv4/IPv6> route-reflector-client" command must come after the "neighbor <IPv4/IPv6> activate" command; otherwise, the route-reflector-client command is ignored.

Sample configuration:
router bgp 65000
 bgp router-id 0.0.0.4 
 bgp log-neighbor-changes
 no bgp default ipv4-unicast
 bgp cluster-id 0.0.0.4 
 bgp bestpath as-path multipath-relax 
 redistribute connected 
 neighbor 14.0.0.1 remote-as 65000 
 neighbor 14.0.0.1 route-reflector-client 
 neighbor 14.0.0.1 activate 
 neighbor 14.0.0.1 next-hop-self 
 neighbor 14.0.0.9 remote-as 65000 
 neighbor 14.0.0.9 activate 
 neighbor 14.0.0.9 next-hop-self 
 neighbor 2001:ded:beef::1 remote-as 65000 
 neighbor 2001:ded:beef:2::1 remote-as 65000 
 maximum-paths 4 
 maximum-paths ibgp 4 
 ! 
 address-family ipv6 
 redistribute connected 
 neighbor 2001:ded:beef::1 activate 
 neighbor 2001:ded:beef::1 next-hop-self 
 neighbor 2001:ded:beef:2::1 route-reflector-client 
 neighbor 2001:ded:beef:2::1 activate 
 neighbor 2001:ded:beef:2::1 next-hop-self 
 maximum-paths 4 
 maximum-paths ibgp 4 
 exit-address-family 

At runtime:
cumulus@switch:$ show ip bgp neighbor 14.0.0.1 
 BGP neighbor is 14.0.0.1, remote AS 65000, local AS 65000, internal link
 BGP version 4, remote router ID 0.0.0.6 
 BGP state = Established, up for 00:23:49
 Last read 23:31:36, hold time is 180, keepalive interval is 60 seconds 
 Neighbor capabilities: 
 4 Byte AS: advertised and received 
 Route refresh: advertised and received(old & new)
 Address family IPv4 Unicast: advertised and received 
 Message statistics: 
 Inq depth is 0 
 Outq depth is 0
 Sent Rcvd 
 Opens: 2 0
 Notifications: 0 0
 Updates: 1 1
 Keepalives: 25 24 
 Route Refresh: 0 0
 Capability: 0 0 
 Total: 28 25 
 Minimum time between advertisement runs is 5 seconds
 For address family: IPv4 Unicast 
 >>>>>>>>>>>>>>>>>>>>>> ROUTE REFLECTOR CLIENT NOT DISPLAYED 
 NEXT_HOP is always this router 
 Community attribute sent to this neighbor(both) 
 6 accepted prefixes 
 Connections established 1; dropped 0 
 Last reset never 
 Local host: 14.0.0.2, Local port: 179 
 Foreign host: 14.0.0.1, Foreign port: 40290 
 Nexthop: 14.0.0.2 
 Nexthop global: 2001:ded:beef::2 
 Nexthop local: fe80::202:ff:fe00:4
 BGP connection: non shared network
 Read thread: on Write thread: off 
 cumulus@switch:$ 

Workaround:
Define the commands in the following order:
 address-family ipv4 unicast
 neighbor 14.0.0.9 activate 
 neighbor 14.0.0.9 next-hop-self
 neighbor 14.0.0.9 route-reflector-client >>> Must be after Activate 
 exit-address-family 
 neighbor 2001:ded:beef:2::1 remote-as 65000
 address-family ipv6 unicast 
 redistribute connected
 maximum-paths 4 
 maximum-paths ibgp 4 
 neighbor 2001:ded:beef:2::1 activate 
 neighbor 2001:ded:beef:2::1 next-hop-self 
 neighbor 2001:ded:beef:2::1 route-reflector-client >>> Must be after activate 
 exit-address-family 
 Runtime status after change: 

cumulus@switch:$ show ip bgp neighbors 14.0.0.9 
 BGP neighbor is 14.0.0.9, remote AS 65000, local AS 65000, internal link 
 BGP version 4, remote router ID 0.0.0.7 
 BGP state = Established, up for 00:13:59
 Last read 22:35:13, hold time is 180, keepalive interval is 60 seconds 
 Neighbor capabilities: 
 4 Byte AS: advertised and received 
 Route refresh: advertised and received(old & new) 
 Address family IPv4 Unicast: advertised and received 
 Message statistics: 
 Inq depth is 0 
 Outq depth is 0
 Sent Rcvd 
 Opens: 1 1
 Notifications: 0 0 
 Updates: 2 1 
 Keepalives: 15 14
 Route Refresh: 0 0
 Capability: 0 0 
 Total: 18 16 
 Minimum time between advertisement runs is 5 seconds
 For address family: IPv4 Unicast 
 Route-Reflector Client >>>>>>>>>> PLEASE NOTE ME 
 NEXT_HOP is always this router 
 Community attribute sent to this neighbor(both) 
 6 accepted prefixes 
 Connections established 1; dropped 0 
 Last reset never 
 Local host: 14.0.0.10, Local port: 38813 
 Foreign host: 14.0.0.9, Foreign port: 179
 Nexthop: 14.0.0.10 
 Nexthop global: 2001:ded:beef:2::2 
 Nexthop local: fe80::202:ff:fe00:6 
 BGP connection: non shared network 
 Read thread: on Write thread: off 
 cumulus@switch:$
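
The ordering rule above can be checked before a configuration is loaded. The following is a minimal sketch in Python, assuming the configuration uses the "neighbor <address> <keyword>" layout shown in the samples; the function name misordered_rr_clients is hypothetical, and the per-block tracking is a simplification rather than a description of how Quagga parses its configuration. A neighbor is reported when its route-reflector-client line is not preceded by an activate line in the same address-family block.

```python
def misordered_rr_clients(config_text):
    """Return neighbors whose route-reflector-client line is not preceded
    by an activate line within the same address-family block.

    Lines before the first address-family statement count as one block.
    """
    problems = []
    activated = set()  # neighbors already activated in the current block
    for raw in config_text.splitlines():
        words = raw.split()
        if not words:
            continue
        if words[0] in ("address-family", "exit-address-family"):
            activated.clear()  # a new block starts with no activations
            continue
        if words[0] != "neighbor" or len(words) < 3:
            continue
        peer, keyword = words[1], words[2]
        if keyword == "activate":
            activated.add(peer)
        elif keyword == "route-reflector-client" and peer not in activated:
            problems.append(peer)
    return problems
```

Run against the sample configuration above, this would report 14.0.0.1 and 2001:ded:beef:2::1; run against the workaround ordering, it would report nothing.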

RN-70 (CM-1166)
ACL: Bridge traffic that matches a LOG ACTION rule is not logged in syslog

For example, a bridge with switch ports swp1, swp2, and swp3 as bridge members is configured. ACL rules to LOG and DROP ICMP traffic are configured.

Ping requests are sent from host1 on swp1 to host3 on swp3, and the following was observed:
* Counters for both LOG and DROP ACL rules are incrementing properly, but the packets are not showing up on /var/log/syslog.
* Packets that are copied to the CPU from hardware for the LOG rule are dropped due to the check in kernel to disable software bridging for hardware bridged packets.

RN-77 (CM-265)
New routes/ECMPs can evict existing/installed routes

Cumulus Linux syncs routes between the kernel and the switching silicon. If the required resource pools in hardware fill up, new kernel routes can cause existing routes to move from being fully allocated to being partially allocated.

In order to avoid this, routes in the hardware should be monitored and kept below the ASIC limits.

For example, on systems with Trident+ chips, the limits are as follows:
 routes: 16384 <<<< if all routes are IPv4
 long mask routes: 256 <<<< i.e., routes with a mask longer than the route mask limit
 route mask limit: 64
 host_routes: 8192
 ecmp_nhs: 4044
 ecmp_nhs_per_route: 52
That translates to about 77 routes with ECMP NHs, if every route has the maximum ECMP NHs.
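
The figure of about 77 comes from dividing the ECMP next hop pool by the per-route maximum. As a short worked check in Python, using only the Trident+ limits listed above (the constant and function names are illustrative):

```python
ECMP_NHS = 4044          # total ECMP next hops available on Trident+
ECMP_NHS_PER_ROUTE = 52  # maximum ECMP next hops a single route can use


def max_full_ecmp_routes(ecmp_nhs=ECMP_NHS, per_route=ECMP_NHS_PER_ROUTE):
    """Number of routes that fit if every route uses the maximum next hops."""
    return ecmp_nhs // per_route


print(max_full_ecmp_routes())  # 77
```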

Monitoring this in Cumulus Linux is performed via the cl-resource-query command:
cumulus@switch:~$ sudo cl-resource-query
 hosts : 3 
 all routes : 29 
 IP4 routes : 17 
 IP6 routes : 12 
 nexthops : 3 
 ecmp_groups : 0
 ecmp_nexthops : 0
 mac entries : 0 / 131072 
 bpdu entries : 500 / 512
The resource to monitor is the ecmp_nexthops. If this count is close to 4044, new ECMPs may evict existing routes.
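
Monitoring can be automated by parsing the command output. The following is a minimal sketch in Python, assuming the output keeps the "name : used" and "name : used / limit" layout shown above; the function names and the 90 percent warning threshold are hypothetical choices, not part of Cumulus Linux.

```python
def parse_resource_query(output):
    """Parse 'name : used' or 'name : used / limit' lines into a dict
    mapping name -> (used, limit), where limit is None when not shown."""
    resources = {}
    for line in output.splitlines():
        if ":" not in line:
            continue
        name, _, value = line.partition(":")
        used, _, limit = value.partition("/")
        try:
            parsed = (int(used), int(limit) if limit.strip() else None)
        except ValueError:
            continue  # not a counter line, for example the shell prompt
        resources[name.strip()] = parsed
    return resources


def ecmp_near_limit(resources, limit=4044, threshold=0.9):
    """True when ecmp_nexthops has reached threshold * limit.

    The default limit is the Trident+ ecmp_nhs value; the threshold is an
    assumed safety margin.
    """
    used, _ = resources.get("ecmp_nexthops", (0, None))
    return used >= threshold * limit
```

The text passed to parse_resource_query would be the captured output of sudo cl-resource-query; adjust the limit for the ASIC in use.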

RN-88 (CM-1200)
SNMP support for Quagga is NOT provided in Cumulus Linux

Cumulus Linux does not provide SNMP support for Quagga.

You can get this information via Nagios.

However, it's possible to get it via SNMP by:

  • Writing a pass_persist script in Perl or Python that fills in the OSPF or BGP (RFC) MIBs manually.
  • Creating your own private MIB for the information you need.

RN-120 (CM-477)
ethtool LED blinking does not work with switch ports

Linux uses ethtool -p to identify the physical port backing an interface, or to identify the switch itself. Usually this identification is by blinking the port LED until ethtool -p is stopped.

This feature does not apply to switch ports (swpX) in Cumulus Linux.

RN-121 (CM-2123)
PTMD: When a physical interface is in a PTM FAIL state, its subinterface still exchanges information

When PTMD is incorrectly in a failure state and the Zebra interface is enabled, BGP sessions on the physical interface (PIF) do not establish routes, but the subinterfaces on top of it do establish routes.

If the subinterface is configured on the physical interface and the physical interface is incorrectly marked as being in a PTM FAIL state, routes on the physical interface are not processed in Quagga, but the subinterface is working.

Steps to reproduce:
cumulus@switch:$ sudo vtysh -c 'show int swp8' 
Interface swp8 is up, line protocol is up 
PTM status: fail
index 10 metric 1 mtu 1500 
 flags: <UP,BROADCAST,RUNNING,MULTICAST>
 HWaddr: 44:38:39:00:03:88 
 inet 12.0.0.225/30 broadcast 12.0.0.227 
 inet6 2001:cafe:0:38::1/64 
 inet6 fe80::4638:39ff:fe00:388/64 
cumulus@switch:$ ip addr show | grep swp8 
 10: swp8: <BROADCAST,MULTICAST,UP,LOWER_UP> 
  mtu 1500 qdisc pfifo_fast state UP qlen 500 
  inet 12.0.0.225/30 brd 12.0.0.227 scope global swp8 
 104: swp8.2049@swp8: <BROADCAST,MULTICAST,UP,LOWER_UP> 
  mtu 1500 qdisc noqueue state UP 
  inet 12.0.0.229/30 brd 12.0.0.231 scope global swp8.2049 
 105: swp8.2050@swp8: <BROADCAST,MULTICAST,UP,LOWER_UP> 
  mtu 1500 qdisc noqueue state UP 
  inet 12.0.0.233/30 brd 12.0.0.235 scope global swp8.2050 
 106: swp8.2051@swp8: <BROADCAST,MULTICAST,UP,LOWER_UP> 
  mtu 1500 qdisc noqueue state UP 
  inet 12.0.0.237/30 brd 12.0.0.239 scope global swp8.2051 
 107: swp8.2052@swp8: <BROADCAST,MULTICAST,UP,LOWER_UP> 
  mtu 1500 qdisc noqueue state UP 
  inet 12.0.0.241/30 brd 12.0.0.243 scope global swp8.2052 
 108: swp8.2053@swp8: <BROADCAST,MULTICAST,UP,LOWER_UP>
  mtu 1500 qdisc noqueue state UP 
  inet 12.0.0.245/30 brd 12.0.0.247 scope global swp8.2053 
 109: swp8.2054@swp8: <BROADCAST,MULTICAST,UP,LOWER_UP> 
  mtu 1500 qdisc noqueue state UP 
  inet 12.0.0.249/30 brd 12.0.0.251 scope global swp8.2054
 110: swp8.2055@swp8: <BROADCAST,MULTICAST,UP,LOWER_UP>
  mtu 1500 qdisc noqueue state UP 
  inet 12.0.0.253/30 brd 12.0.0.255 scope global swp8.2055
cumulus@switch:$ bgp sessions: 
 12.0.0.226 ,4 ,64057 , 958 , 1036 , 0 , 0 , 0 ,15:55:42, 0, 10472 
 12.0.0.230 ,4 ,64058 , 958 , 1016 , 0 , 0 , 0 ,15:55:46, 187, 10285
 12.0.0.234 ,4 ,64059 , 958 , 1049 , 0 , 0 , 0 ,15:55:40, 187, 10285 
 12.0.0.238 ,4 ,64060 , 958 , 1039 , 0 , 0 , 0 ,15:55:45, 187, 10285 
 12.0.0.242 ,4 ,64061 , 958 , 1014 , 0 , 0 , 0 ,15:55:46, 187, 10285 
 12.0.0.246 ,4 ,64062 , 958 , 1016 , 0 , 0 , 0 ,15:55:46, 187, 10285 
 12.0.0.250 ,4 ,64063 , 958 , 1029 , 0 , 0 , 0 ,15:55:43, 187, 10285 
 12.0.0.254 ,4 ,64064 , 958 , 1036 , 0 , 0 , 0 ,15:55:44, 187, 10285 

RN-125 (CM-1576)
Network LSA with an old router ID isn't flushed out by the originator
When the router ID is changed, the router should remove the previous network LSA (link-state advertisement) that it generated based on the IP address on the interface in the Network LSA.

Cumulus Linux doesn't remove this LSA, so it remains until it ages out naturally.

RN-132 (CM-2272)
You must run "apt-get update" before running any apt-get commands or after changing sources.list

Before running any apt-get commands or after changing the sources.list file in /etc/apt, you need to run apt-get update.


RN-133 (CM-2273)
Interface names in Cumulus Linux cannot exceed 15 characters

Device names, including interface names, in Cumulus Linux cannot exceed 16 characters – including the terminator. Cumulus Linux truncates longer interface names.

To avoid this issue, do not assign long names to your interfaces.

The following example configuration reproduces this issue:

cumulus@switch:/sys/class/net$ grep 'iface br' /etc/network/interfaces 
iface br2-pubmgmt inet static
iface br3-prvmgmt inet manual
iface br400-quarantine inet manual
iface br401-peering-1k5 inet manual
iface br402-peering-9k inet manual
iface br500-pi-exa inet manual
iface br501-akamai-exa inet manual
iface br502-exa-internetfactory inet manual
cumulus@switch:/sys/class/net$ brctl show | grep br
bridge name	bridge id	 STP enabled	interfaces
br2-pubmgmt	 8000.089e01cebe37	no	 bond0.2
br3-prvmgmt	 8000.089e01cebe3a	no	 bond0.3
br400-quarantin	 8000.089e01cebe37	no	 bond0.400
br401-peering-1	 8000.089e01cebe3a	no	 bond0.401 <<<
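
Names that would be truncated can be detected before they are configured. The following is a minimal sketch in Python, assuming the 15-character limit stated above; the function name truncated_names is hypothetical.

```python
MAX_IFNAME_LEN = 15  # 16 characters including the terminator


def truncated_names(names, limit=MAX_IFNAME_LEN):
    """Map each over-long name to the name it would be truncated to."""
    return {name: name[:limit] for name in names if len(name) > limit}


if __name__ == "__main__":
    configured = ["br2-pubmgmt", "br400-quarantine", "br401-peering-1k5"]
    for original, shortened in truncated_names(configured).items():
        print(original, "->", shortened)
```

For the names in the example above, this reports br400-quarantine as br400-quarantin and br401-peering-1k5 as br401-peering-1, matching the brctl output.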

RN-134 (CM-2303)
Installing Chef under Cumulus Linux

The Cumulus Linux repository contains two versions of Chef, the automation tool: 11.6.2 (the current version) and 10.30.4.

To install the latest version, connect to the switch and use apt-get:

cumulus@switch:~$ sudo apt-get install chef

To install 10.30.4, connect to the switch and use apt-get:

cumulus@switch:~$ sudo apt-get install chef=10.30.4-0.debian.7.3 

RN-163 (CM-2499)
VXLAN: ovsdb-server cannot select loopback interface as source IP address, causing TOR registration to the controller to fail

In a VXLAN using VMware NSX, ovsdb-server cannot select the loopback interface as the source IP address. This causes TOR registration to the controller to fail.

To work around this issue, run:

cl-bgp redistribute add connected

RN-179 (CM-3410)
10GTek 10G SR cables exhibit high rate of errors on Penguin Arctica 4804X switch

Some PHY-less Penguin Arctica 4804X platforms using 10GTek 10G MM SR cables exhibit high rates of errors and low bandwidth in one direction.


RN-196 (CM-2499)
For a VXLAN in NSX, ovsdb-server cannot select a loopback interface as the SRC IP

As a result, the TOR registration to the controller fails.

To work around this issue, run:

cl-bgp redistribute add connected

RN-198 (CM-3290)
Port LEDs behave differently on different switch models

It's been observed that port LEDs behave differently depending upon the make and model of the switch. For example:

  • Agema AG-7448CU: the LED is off when the link is up. It blinks on briefly when there is traffic.
  • Edge-Core AS4600-54T: the LED is off when the link is up. It blinks on briefly when there is traffic.
  • QuantaMesh T3048-LY2R: the LED is on when the link is up. It blinks off briefly when there is traffic.

Cumulus Networks is currently working to fix this issue.


RN-199 (CM-2624)
When a Quagga route-map is modified, the switch could use the partial map before edits are completed

Cumulus Linux triggers a route-map update before the user finishes editing the route map, resulting in an incorrect route map being used. The route-map update trigger should only occur when the user finishes editing the map.

Cumulus Networks is working to fix this issue.


RN-217 (CM-4207)
LNV: Network restart removes vxsnd anycast IP address from loopback interface

Given the following conditions:

  • You have not configured a loopback anycast IP address in /etc/network/interfaces
  • You enabled the vxsnd (service node daemon) log to automatically add anycast IP addresses

When you restart networking (with service networking restart), the anycast IP address gets removed from the loopback interface.

To prevent this issue from occurring, you should specify an anycast IP address for the loopback interface in both /etc/network/interfaces and vxsnd.conf. This way, in case the vxsnd fails, you can withdraw the IP address.


RN-218 (CM-4432)
On Quanta T5048-LY8 and T3048-LY9 switches, "Operation timed out" error occurs while removing and reinserting QSFP module

The QSFPx2 module cannot be removed while the switch is powered on, as it is not hot-swappable.

RN-221 (CM-4501)
BGP graceful restart, including helper mode, not fully supported

If you encounter issues with this, please submit a support request and include the output from cl-support with your ticket.

RN-227 (CM-3388)
BGP dynamic capability is not supported

BGP peer sessions with dynamic capability are not supported under any version of Cumulus Linux at this time.

RN-229 (CM-4433)
When a bond subinterface that is part of a traditional bridge is brought down, it flaps that bridge

This issue has been encountered in environments where both VLAN-aware and traditional bridges are in use, where a traditional bridge has a subinterface of a bond that is present as a normal interface in a VLAN-aware bridge.

RN-275 (CM-5794)
BGP import-check fails for IPv6 route if static routes to null0 are used

The path that Cumulus Linux originates should not be invalid since there is a matching route in the RIB. The import check works fine for IPv4 routes.


RN-281 (CM-5118)
Default route not removed on ifup after removing gateway statement from eth0 configuration

If you try to remove the default route from eth0 (either by commenting out or removing the gateway statement in the eth0 configuration), the route remains after running ifup.

To work around this issue, first run ifdown, then ifup on the interface via the console. After this, the route disappears.


RN-282 (CM-4837)
Dell S3048-ON has a limit of 24576 MAC address entries, instead of 32K

Other 1G switches support 32K (32768) entries. This is a known issue that should be fixed in a future release of Cumulus Linux.


RN-315 (CM-7318)
Dell S4048-ON: Various power off commands render the switch unusable

Issuing a poweroff or shutdown command with certain options shuts down the switch, but it cannot be powered on again. The options that cause this issue are:

  • shutdown -h, shutdown -P
  • poweroff -n, poweroff -d, poweroff -f, poweroff -i, poweroff -h, poweroff -p
  • halt -n, halt -d, halt -f, halt -i, halt -h, halt -p
  • init 0

Issuing a reboot command, or using other options, does not trigger this issue.

This issue affects only Dell S4048-ON switches with BIOS version 3.21.0.2 or earlier. To determine the BIOS version of the switch, run:

cumulus@switch:~$ sudo dmidecode -s system-version
3.21.0.2
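
Whether a switch falls in the affected range can be checked by comparing the reported version numerically. The following is a minimal sketch in Python, assuming the BIOS version is a dotted sequence of integers as in the output above; the function names are hypothetical.

```python
AFFECTED_MAX = (3, 21, 0, 2)  # BIOS 3.21.0.2 or earlier is affected


def parse_version(text):
    """Turn a dotted version string such as '3.21.0.2' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def bios_is_affected(version_text, affected_max=AFFECTED_MAX):
    """True when the version is at or below the last affected BIOS version."""
    return parse_version(version_text) <= affected_max
```

The string passed in would be the output of sudo dmidecode -s system-version. Comparing tuples of integers avoids the pitfalls of comparing version strings character by character.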

This is a known issue, and Dell is issuing a fix soon.


RN-318 (CM-7574)
In VXLAN tunnel processing, for certain traffic patterns when cut-through is enabled and link pause is asserted, you experience line errors like overflow and underflow

With cut-through mode enabled and link pause asserted, Cumulus Linux generates TOVR and TUFL errors; certain error counters increment on the given physical port.

cumulus@switch:~$ sudo ethtool -S swp49 | grep Error
HwIfInDot3LengthErrors: 0
HwIfInErrors: 0
HwIfInDot3FrameErrors: 0
SoftInErrors: 0
SoftInFrameErrors: 0
HwIfOutErrors: 35495749
SoftOutErrors: 0

cumulus@switch:~$ sudo ethtool -S swp50 | grep Error
HwIfInDot3LengthErrors: 3038098
HwIfInErrors: 297595762
HwIfInDot3FrameErrors: 293710518

To work around this issue, disable link pause or disable cut-through in /etc/cumulus/datapath/traffic.conf.

To disable link pause, comment out the link_pause* section in /etc/cumulus/datapath/traffic.conf:

cumulus@switch:~$ sudo nano /etc/cumulus/datapath/traffic.conf 
#link_pause.port_group_list = [port_group_0]
#link_pause.port_group_0.port_set = swp45-swp54
#link_pause.port_group_0.rx_enable = true
#link_pause.port_group_0.tx_enable = true

To enable store and forward switching, set cut_through_enable to false in /etc/cumulus/datapath/traffic.conf:

cumulus@switch:~$ sudo nano /etc/cumulus/datapath/traffic.conf 
cut_through_enable = false
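
The link pause workaround above can be applied programmatically instead of in an editor. The following is a minimal sketch in Python, assuming traffic.conf uses the "link_pause.<setting> = <value>" lines shown above; the function name disable_link_pause is hypothetical. It comments out every active link_pause line and leaves already commented lines and other settings unchanged.

```python
def disable_link_pause(conf_text):
    """Comment out every active line that starts with 'link_pause'."""
    out = []
    for line in conf_text.splitlines(keepends=True):
        if line.lstrip().startswith("link_pause"):
            out.append("#" + line)
        else:
            out.append(line)
    return "".join(out)
```

Because lines that already begin with # do not match, running the function twice produces the same result as running it once.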

RN-322 (CM-7387)
Interfaces disabled using iproute2 become enabled after restarting Quagga

By default, all interfaces have a "no shutdown" associated with them in Quagga. Thus, when you restart Quagga, it enables the interfaces. This is expected behavior in Quagga. There is no workaround at this time.

RN-324 (CM-7228)
On Edge-Core AS4610-54P, the Power over Ethernet poectl service reports 5V difference in power consumption

The voltage reported by the poectl -i command and measured through a power meter connected to the device varies by 5V. The current and power readings are correct and no difference is seen for them.

RN-327 (CM-4290)
Changing the route-map parameter of the redistribute command in OSPF and BGP doesn't affect the state of the resulting redistribution in those protocols

To work around this issue, remove any old redistribute command configurations before adding a new one with or without route-map as a parameter.

For example, if OSPF has a redistribute configuration such as redistribute bgp route-map redist-map-name, you would enable redistribution without a route-map by following these steps in OSPF configuration mode:

  1. no redistribute bgp
  2. redistribute bgp

You would perform a similar sequence of commands for redistribution changes in BGP as well.


RN-337 (CM-7623)
Adding IPv6 default route with src address on eth0 fails without adding delay

Attempting to install an IPv6 default route on eth0 with a source address fails at reboot or when running ifup on eth0. 

The first execution of ifup -dv returns this warning and does not install the route:

cumulus@switch:~$ sudo ifup -dv eth0
warning: eth0: post-up cmd '/sbin/ip route add default via 2001:620:5ca1:160::1 /
src 2001:620:5ca1:160::45 dev eth0' failed (RTNETLINK answers: Invalid argument)<<<<<<<<<<

Running ifup a second time on eth0 successfully installs the route. 

There are two ways you can work around this issue. 

  1. Add a 2-second sleep to the eth0 stanza in /etc/network/interfaces:
    iface eth0 inet6 static
        address 2001:620:5ca1:160::45/64
        post-up /bin/sleep 2s
        post-up /sbin/ip route add default via 2001:620:5ca1:160::1 src 2001:620:5ca1:160::45 dev eth0
    
  2. Omit the src parameter from the ip route add command, since it is what causes the need for the delay. If the src parameter is removed, the route is added correctly.
    iface eth0 inet6 static
        address 2001:620:5ca1:160::45/64
        post-up /sbin/ip route add default via 2001:620:5ca1:160::1 dev eth0
    
    cumulus@switch:~$ ifdown eth0
    Stopping NTP server: ntpd.
    Starting NTP server: ntpd.
    cumulus@switch:~$ ip -6 r s
    cumulus@switch:~$ ifup eth0
    Stopping NTP server: ntpd.
    Starting NTP server: ntpd.
    cumulus@switch:~$ ip -6 r s
    2001:620:5ca1:160::/64 dev eth0  proto kernel  metric 256 
    fe80::/64 dev eth0  proto kernel  metric 256 
    default via 2001:620:5ca1:160::1 dev eth0  metric 1024 
    cumulus@switch:~$

RN-351 (CM-7829)
Installing LNV

The LNV packages are not installed when you upgrade Cumulus Linux. You can get the latest version of LNV for this release of Cumulus Linux in one of two ways:

  • Do a full binary image install of Cumulus Linux, using cl-img-install
  • Install the LNV packages for the registration and service node daemons using apt-get install vxfld-vxrd and/or apt-get install vxfld-vxsnd, depending upon how you intend to use LNV

RN-355 (CM-7994)
OSPFv2 Area ID being implicitly translated from Integer format to dotted decimal format

While OSPF area ID configuration in Quagga allows for the value to be specified in either dotted decimal format, or as an integer, values specified as an integer will be converted into dotted decimal format when displayed, causing potential confusion for the operator.

This issue does not impact OSPF functionality; only the display output. However, it is recommended that the OSPF area ID is specified in dotted decimal format for consistency.
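
The translation is the same one used for IPv4 addresses: the area ID is a 32-bit value, and dotted decimal shows its four bytes. As a short illustration in Python using the standard ipaddress module (the function name area_id_to_dotted is hypothetical):

```python
import ipaddress


def area_id_to_dotted(area_id):
    """Render a 32-bit OSPF area ID given as an integer in dotted decimal."""
    return str(ipaddress.IPv4Address(int(area_id)))


print(area_id_to_dotted(0))    # 0.0.0.0
print(area_id_to_dotted(51))   # 0.0.0.51
print(area_id_to_dotted(256))  # 0.0.1.0
```

An area configured as 51 is therefore displayed as 0.0.0.51, which refers to the same area.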

RN-380 (CM-6110)
ifupdown2: adjust VLAN subinterface MTU based on MTU settings specified under lowerdev by the user

The following kernel error occurs when the MTU is specified under a subinterface rather than under the VLAN interface:

root@dell-s3000-04:~# ifreload -a -X eth0
warning: failed to execute cmd 'ip -force -batch - [addr add 1.1.4.1/24 dev swp52.100
link set dev swp52.100 mtu 9000 
]'(RTNETLINK answers: Numerical result out of range
Command failed -:2)

This issue is being investigated.


RN-381 (CM-6307)
Implement IPv6 initial neighbor discovery process to speed peer startup

If the ipv6 nd ra-interval <interval> command is not run, the default maximum value of 600 seconds is used. This can delay peer discovery by up to 10 minutes for some peers. Set the ra-interval to avoid this issue.

RN-382 (CM-6692)
Quagga: Removing bridge via ifupdown2 does not remove it from Quagga

Removing a bridge using ifupdown2 does not remove it from the Quagga configuration files. This issue is being investigated; however, restarting Quagga will successfully remove the bridge.

RN-383 (CM-7196)
admin down of link deletes IPv6 nexthop static route entry, but not for IPv4

When a link is admin down and carrier is on, the IPv4 nexthop entry is marked dead, but the IPv6 nexthop entry is deleted, and will not be restored when the link is admin up. However, if carrier is off, the IPv6 nexthop is marked dead and not deleted. This inconsistency in admin down behavior is being investigated.


RN-384 (CM-7684)
Keeping VXLAN single-connected devices up on MLAG secondary node

In the current MLAG secondary design, if a VXLAN device is not dual-connected, it is kept in a protodown state. You can keep such devices up with individual IP addresses rather than anycast IPs when the peerlink is down, so that all single-connected hosts will have connectivity. Further investigation regarding this issue is underway.

RN-387 (CM-8163)
Quagga appears to not honor passive interfaces if VRR is active

In a VRR configuration, any interface-specific routing configuration (e.g., OSPF mode of operation) specified on the subinterface having a virtual IP address does not take effect. This is because when an operator has specified a virtual IP on a bridge, the system creates another internal interface bridge with the virtual IP and MAC. These two interfaces are treated distinctly by Quagga, so any interface-specific routing configuration on the bridge does not get carried over to the second bridge.

In a VRR deployment needing any interface-specific routing configuration on the interface with a virtual IP address, the routing configuration has to be specified against the internally-created virtual interface also.


RN-389 (CM-8410)
switchd supports only port 4789 as the UDP port for VXLAN packets

switchd currently allows only the standard port 4789 as the UDP port for VXLAN packets. There are cases where a hypervisor could be using a non-standard UDP port, which would cause VXLAN exchanges with the hardware VTEP to not work. In such a case, packets would not be terminated and encapsulated packets would be sent out on UDP port 4789.


RN-390 (CM-9055)
If you’re logged into the serial console and type reboot, the system may hang indefinitely

In Cumulus Linux 3.0.1, systemd may block and stop handling systemctl changes, and fail to start or restart services, if the serial console columns or rows are changed. For example, running stty rows 30 columns 96 may cause this. In this state, a new login session will not be started on the serial console (/dev/ttyS0) after logout.

To verify whether systemd is hung in this manner, run cat /proc/1/stack; you should see output similar to the following:

[<ffffffff81466f5d>] tty_port_block_til_ready+0x1bd/0x330
[<ffffffff81093e90>] ? wait_woken+0x90/0x90
[<ffffffff8147abbb>] ? uart_startup.part.16+0xbb/0x1f0
[<ffffffff8147adef>] uart_open+0xff/0x180
[<ffffffff8145f025>] tty_open+0xf5/0x660
[<ffffffff811a3ad8>] chrdev_open+0xa8/0x1a0
[<ffffffff811a3a30>] ? cdev_put+0x30/0x30
[<ffffffff8119cc79>] do_dentry_open.isra.15+0x159/0x310
[<ffffffff8119e183>] vfs_open+0x53/0x60
[<ffffffff811acfd9>] do_last+0x249/0x11f0
[<ffffffff811af7c0>] path_openat+0x80/0x5f0
[<ffffffff811ec90e>] ? locks_dispose_list+0x3e/0x50
[<ffffffff811edae0>] ? __posix_lock_file+0xe0/0x630
[<ffffffff811b129a>] do_filp_open+0x3a/0xb0
[<ffffffff813cb8aa>] ? find_next_zero_bit+0x1a/0x30
[<ffffffff811bd8fe>] ? __alloc_fd+0x7e/0x120
[<ffffffff8119e52c>] do_sys_open+0x12c/0x220
[<ffffffff8119e63e>] SyS_open+0x1e/0x20
[<ffffffff816fdc57>] system_call_fastpath+0x12/0x6a

At this point it is necessary to power cycle or otherwise reset the switch to recover. A reboot or shutdown command will block, because systemd is hung.
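
The check described above can be scripted. The following is a minimal sketch, not an official tool: the function name systemd_console_hang and its optional file argument are illustrative assumptions, and it only looks for the tty_port_block_til_ready frame that appears at the top of the trace shown earlier.

```shell
#!/bin/sh
# Sketch: report whether PID 1 (systemd) appears blocked opening the
# serial console. The optional argument is a hypothetical convenience for
# checking a saved copy of the stack; it defaults to /proc/1/stack, which
# normally requires root to read.
systemd_console_hang() {
    stack_file="${1:-/proc/1/stack}"
    # tty_port_block_til_ready is the first frame in the trace above.
    grep -q 'tty_port_block_til_ready' "$stack_file"
}
```

After sourcing the function in a root shell, running systemd_console_hang && echo "systemd appears hung" prints the message only when the frame is present.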


RN-391 (CM-9631)
Dell S4048 unresponsive after TX Unit Hang detected

After booting a Dell S4048 switch, the switch becomes unresponsive and errors like the following appear in the console log:

[ 1206.440277] igb 0000:00:14.0: Detected Tx Unit Hang
[ 1206.440277]   Tx Queue             <0>
[ 1206.440277]   TDH                  <2d>
[ 1206.440277]   TDT                  <2e>
[ 1206.440277]   next_to_use          <2e>
[ 1206.440277]   next_to_clean        <2d>
[ 1206.440277] buffer_info[next_to_clean]
[ 1206.440277]   time_stamp           <1000dcd20>
[ 1206.440277]   next_to_watch        <ffff88007d81b2d0>
[ 1206.440277]   jiffies              <1000dd5d4>
[ 1206.440277]   desc.status          <300000>
[ 1208.439856] igb 0000:00:14.0: Detected Tx Unit Hang
[ 1208.439856]   Tx Queue             <0>
[ 1208.439856]   TDH                  <2d>
[ 1208.439856]   TDT                  <2e>
[ 1208.439856]   next_to_use          <2e>
[ 1208.439856]   next_to_clean        <2d>
[ 1208.439856] buffer_info[next_to_clean]
[ 1208.439856]   time_stamp           <1000dcd20>
[ 1208.439856]   next_to_watch        <ffff88007d81b2d0>
[ 1208.439856]   jiffies              <1000ddda4>
[ 1208.439856]   desc.status          <300000>
[ 1210.439414] igb 0000:00:14.0: Detected Tx Unit Hang
[ 1210.439414]   Tx Queue             <0>
[ 1210.439414]   TDH                  <2d>
[ 1210.439414]   TDT                  <2e>
[ 1210.439414]   next_to_use          <2e>
[ 1210.439414]   next_to_clean        <2d>
[ 1210.439414] buffer_info[next_to_clean]
[ 1210.439414]   time_stamp           <1000dcd20>
[ 1210.439414]   next_to_watch        <ffff88007d81b2d0>
[ 1210.439414]   jiffies              <1000de574>
[ 1210.439414]   desc.status          <300000>
[ 1212.438966] igb 0000:00:14.0: Detected Tx Unit Hang
[ 1212.438966]   Tx Queue             <0>
[ 1212.438966]   TDH                  <2d>
[ 1212.438966]   TDT                  <2e>
[ 1212.438966]   next_to_use          <2e>
[ 1212.438966]   next_to_clean        <2d>
[ 1212.438966] buffer_info[next_to_clean]
[ 1212.438966]   time_stamp           <1000dcd20>
[ 1212.438966]   next_to_watch        <ffff88007d81b2d0>
[ 1212.438966]   jiffies              <1000ded44>
[ 1212.438966]   desc.status          <300000>
[ 1212.490329] igb 0000:00:14.0 eth0: Reset adapter

Rebooting the switch again stops the behavior.
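
To see how often the error has occurred, you can count the matching lines in the kernel log. This is a minimal sketch that assumes the messages have the same form as the console output above; the function name count_tx_hangs is illustrative.

```shell
#!/bin/sh
# Sketch: count "Detected Tx Unit Hang" events reported by the igb driver.
# Reads log text on standard input, for example the output of dmesg.
count_tx_hangs() {
    # grep -c prints the number of matching lines. It exits non-zero when
    # the count is 0, so "|| true" keeps callers that use "set -e" running.
    grep -c 'igb .*Detected Tx Unit Hang' || true
}
```

For example, dmesg | count_tx_hangs prints the number of hang events currently in the kernel ring buffer.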


RN-394 (CM-10155)
syslog error: systemd[1]: Failed to reset devices.list on /system.slice: Invalid argument

The following message gets logged to /var/log/syslog when you run systemctl daemon-reload and during system boot:

systemd[1]: Failed to reset devices.list on /system.slice: Invalid argument

This message is harmless, and can be ignored. It is logged when systemd attempts to change cgroup attributes that are read only. The upstream version of systemd has been modified to not log this message by default.

The systemctl daemon-reload command is often issued when Debian packages are installed, so the message may be seen multiple times when upgrading packages.
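
If the repeated message makes the log harder to read, you can filter it out when reviewing syslog. The following is a small sketch; the function name filter_devices_list_noise is an illustrative assumption, and the match string is the message text quoted above.

```shell
#!/bin/sh
# Sketch: drop the harmless devices.list message from log text read on
# standard input, leaving every other line untouched.
filter_devices_list_noise() {
    # -v inverts the match; -F treats the pattern as a fixed string.
    # grep exits non-zero when nothing is printed, hence "|| true".
    grep -vF 'Failed to reset devices.list on /system.slice: Invalid argument' || true
}
```

For example, sudo cat /var/log/syslog | filter_devices_list_noise shows the log without the message.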


RN-402 (CM-9627)
smond PSU warnings on Supermicro X3648S

The Supermicro X3648S uses a different PSU (DPS-550) than the equivalent Penguin Arctica 4806XP switch (DPS-460). Cumulus Linux was written to work with the DPS-460, but the DPS-550 behaves differently.

On the Penguin Arctica 4806XP, a PWM value is written to the DPS-460 fan, and the power supply fan remains at that setting. But on the Supermicro X3648S, when a PWM value is written to the DPS-550 fan, the power supply fan initially goes to the setting; however, after five seconds, it may revert back to its internal setting.

As a result, you will see smond warning messages that the fan input RPM is lower than expected, which can be safely ignored.


RN-403 (CM-10968)
default.target is set to graphical.target

The default.target for systemd is mistakenly set to graphical.target in this release, instead of multi-user.target. You may see this message in the journal or syslog at system boot:

systemd[1]: Cannot add dependency job for unit display-manager.service, ignoring: Unit display-manager.service failed to load: No such file or directory.

You should ignore this message, as the correct systemd state is reached, since the multi-user.target is a prerequisite of graphical.target.

This issue will be fixed in a future release of Cumulus Linux.


RN-404 (CM-4407)
 

When BGP is configured with aggregate addresses that use the as-set option and there are many routes to aggregate, the BGP process consumes a large amount of CPU.

To work around this issue, do not specify the as-set parameter for the aggregate-address configuration.


RN-405 (CM-8720)
ifupdown2: IP address scope is not working; all addresses considered global

ifupdown2 does not honor the configured IP address scope setting in /etc/network/interfaces, and it does not report an error. Consider this example configuration:

auto swp2
iface swp2
    address 35.21.30.5/30
    address 3101:21:20::31/80
    scope link

When you run ifreload -a on this configuration, ifupdown2 considers all IP addresses as global.

cumulus@switch:~$ ip addr show swp2
5: swp2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 74:e6:e2:f5:62:82 brd ff:ff:ff:ff:ff:ff
inet 35.21.30.5/30 scope global swp2
valid_lft forever preferred_lft forever
inet6 3101:21:20::31/80 scope global 
valid_lft forever preferred_lft forever
inet6 fe80::76e6:e2ff:fef5:6282/64 scope link 
valid_lft forever preferred_lft forever

To work around this issue, configure the IP address scope using post-up ip address add <address> dev <interface> scope <scope>. For example:

auto swp6
iface swp6
    post-up ip address add 71.21.21.20/32 dev swp6 scope site

Now, when you run ifreload -a on this configuration, it has the correct scope:

cumulus@switch:~$ ip addr show swp6
9: swp6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 74:e6:e2:f5:62:86 brd ff:ff:ff:ff:ff:ff
inet 71.21.21.20/32 scope site swp6
valid_lft forever preferred_lft forever
inet6 fe80::76e6:e2ff:fef5:6286/64 scope link 
valid_lft forever preferred_lft forever
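
To confirm the scope of each address without reading the full output, you can extract just the address and scope columns. This is a minimal sketch that assumes the ip addr show output format shown above; the function name addr_scopes is illustrative.

```shell
#!/bin/sh
# Sketch: read "ip addr show" output on standard input and print one
# "address scope" pair per inet or inet6 line.
addr_scopes() {
    awk '$1 == "inet" || $1 == "inet6" {
        # The word following "scope" is the scope name (global, site, link, ...).
        for (i = 2; i < NF; i++)
            if ($i == "scope") print $2, $(i + 1)
    }'
}
```

For example, ip addr show swp6 | addr_scopes prints each configured address followed by its scope.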

RN-406 (CM-9895)
Mellanox SN2700 power off issues

On the Mellanox SN2700 and SN2700B switches, the box appears to be dead for at least 3 minutes after a PDU power cycle is issued if either of the following occurred beforehand:

  • A shutdown or poweroff command was executed
  • A temperature sensor hit a critical value and shut down the box


RN-407 (CM-11103)
When the sx_sdk service is restarted manually or during a package upgrade, switchd receives "Invalid Handle" errors

When the sx_sdk service is restarted manually, or when the sx_sdk Debian package is upgraded, switchd receives "Invalid Handle" errors:

2016-05-20T20:05:15.144736+00:00 mlx-2700-01 switchd[14396]: hal_mlx.c:4620 [SX_API_INTERNAL     ]: Invalid handle: handle is not valid.
2016-05-20T20:05:15.145099+00:00 mlx-2700-01 switchd[14396]: hal_mlx_port.c:1789 ERR port_pfc_stats_get failed for lid 0x13c00 prio 2: Invalid Handle
2016-05-20T20:05:15.145460+00:00 mlx-2700-01 switchd[14396]: hal_mlx.c:4620 [SX_API_INTERNAL     ]: Invalid handle: handle is not valid.
2016-05-20T20:05:15.145847+00:00 mlx-2700-01 switchd[14396]: hal_mlx_port.c:1789 ERR port_pfc_stats_get failed for lid 0x13c00 prio 3: Invalid Handle

These error messages are being investigated. In the meantime, restart switchd whenever you restart sx_sdk. This ensures that each service recognizes that the other has been restarted.


RN-408 (CM-9815)
IPv6 router advertisement from IPv6 BGP unnumbered advertises default route

If the default router lifetime in the generated IPv6 router advertisements (RA) is set to 0, the receiving Quagga instance drops the RA if it is running on a Cumulus Linux 2.5.z switch.

To work around this issue, either:

  • Explicitly configure the switch to advertise a router lifetime of 0, unless a value is specifically set by the operator, on the assumption that the host is running the Cumulus Linux 3.0 version of Quagga. When hosts see an IPv6 RA with a router lifetime of 0, they do not make that router a default router.
  • Set the net.ipv6.conf.all.accept_ra_defrtr sysctl on the host. However, this setting must be applied on every host, which may mean many hosts, especially if Quagga runs on the hosts.

RN-409 (CM-10054)
BGP may show an inaccessible path as the best path

Because of existing BGP issues, peering between a VRF device and a loopback BGP session stays up even if the loopback session does not advertise its local address. As a result, BGP may show an inaccessible path as the best path.

This issue will be fixed in a future release.


RN-410 (CM-10215)
Mellanox SN2700 breakout cables always report errors/packets in pause

On a Mellanox SN2700 switch, any port set to breakout mode on boot generates errors or reports packets in pause on the counters. The cable does not need to be plugged in; setting the port to breakout mode and then restarting switchd is enough to trigger the issue.

This issue will be fixed in the next Mellanox SDK update.


RN-411 (CM-10859)
On Mellanox switches, some CoPP rules counters do not increment

On Mellanox Spectrum switches, the mechanism used to track CPU-bound traffic counters returns inaccurate data. Thus you should not rely on this data to draw any conclusions.

This issue will be fixed in a future release of Cumulus Linux.


RN-412 (CM-11162)
Spectrum switches have a CPU RX burst size of 128 packets per queue

On switches with Spectrum ASICs — currently, the Mellanox SN2700 and SN2700B — the burst size for packets received at the CPU is limited to 128 packets per queue. Larger bursts can result in packet drops until packets are drained from the queue.

 


RN-413 (CM-11048)
SPAN is not supported on Mellanox switches

SPAN is not supported on Mellanox Spectrum switches. You should use ERSPAN instead.


RN-414 (CM-11175)
Kernel source not added to Cumulus Networks repository

Kernel source (linux<vers>.orig.tar.xz and linux<vers>.debian.tar.xz) files are not being added properly to the Cumulus Networks repository.

Because these files are missing from the repository, retrieve the kernel source by installing the corresponding linux-source package instead. For example, to retrieve the source for the Cumulus Networks 4.1 kernel, run:

cumulus@switch:~$ sudo apt-get install linux-source-4.1

The archive file is stored at /usr/src/linux-source-4.1.tar.xz.


RN-425 (CM-11308)
While configuring LNV, an incomplete LNV configuration causes a hang during boot

If you configure the LNV registration node in /etc/network/interfaces on a switch running Cumulus Linux 3.0 and then reboot the switch, the switch does not boot again.

To work around this issue, configure the registration node in /etc/vxrd.conf instead.


RN-426 (CM-5970)
isc-dhcp-relay must be restarted after flapping an interface or if logical interface is down

There are two known issues regarding isc-dhcp-relay:

  • If isc-dhcp-relay is already running and a host-facing interface is flapped (brought down then up using ifupdown2), the DHCP relay will not work for that interface until isc-dhcp-relay is restarted.
  • If a logical interface defined in /etc/default/isc-dhcp-relay is down when isc-dhcp-relay is started or restarted, then isc-dhcp-relay will not start; thus, the DHCP relay will not work for any interfaces.

To work around the issue, for each interface listed in INTERFACES in /etc/default/isc-dhcp-relay, add the following line to the corresponding iface stanza in /etc/network/interfaces:

post-up test -e /var/run/boot.done && service isc-dhcp-relay restart
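
To find out which iface stanzas need the post-up line, you can list the interfaces the relay is configured for. This is a minimal sketch that assumes /etc/default/isc-dhcp-relay is a shell-style defaults file that sets INTERFACES to a space-separated list; the function name relay_interfaces and its optional file argument are illustrative.

```shell
#!/bin/sh
# Sketch: print one relay interface per line, read from the INTERFACES
# variable in the isc-dhcp-relay defaults file.
relay_interfaces() {
    defaults_file="${1:-/etc/default/isc-dhcp-relay}"
    # Source the file in a subshell so the caller's variables stay untouched.
    (
        INTERFACES=""
        . "$defaults_file"
        for ifname in $INTERFACES; do
            printf '%s\n' "$ifname"
        done
    )
}
```

Each name printed by relay_interfaces corresponds to an iface stanza in /etc/network/interfaces that should carry the post-up line.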

RN-431 (CM-11386)
Default route is not resolved for management VRF on a Vagrant KVM setup for cldemo-vagrant

A default route in the main routing table may be installed via eth0 when DHCP configures the interface while using management VRF. To work around this issue, run the following commands:

cumulus@switch:~$ sudo ifdown eth0
cumulus@switch:~$ sudo ifup eth0

This issue is fixed in Cumulus VX 3.0.1.
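
On an affected release, you can check whether the stray default route is present before and after applying the workaround. This is a minimal sketch that assumes the usual ip route show output format; the function name has_default_via_eth0 is illustrative.

```shell
#!/bin/sh
# Sketch: succeed if the route list read on standard input contains a
# default route that uses eth0 as its output device.
has_default_via_eth0() {
    # Matches lines such as "default via 192.168.0.1 dev eth0".
    grep -Eq '^default( .*)? dev eth0( |$)'
}
```

For example, ip route show | has_default_via_eth0 && echo "default route via eth0 in main table" prints the message only when such a route exists in the main routing table.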
