Cumulus Linux 2.2.1 Release Notes


Overview

These release notes support Cumulus Linux 2.2.1 and describe currently available features and known issues.

Licensing

Cumulus Linux is licensed on a per-instance basis. Each network system is fully operational, and every capability can be used on the switch except forwarding on front panel ports. Only eth0 and the console port are active on an unlicensed instance of Cumulus Linux. Enabling the front panel ports requires a license.

You should have received a license key from Cumulus Networks or an authorized reseller. To install the license, read the Cumulus Linux quick start guide.

Installing Version 2.2.1

If you are upgrading from version 2.2.0, use apt-get to update the software:

  1. Run apt-get update.
  2. Run apt-get upgrade.
  3. Reboot the switch.

Caution: While this method doesn't overwrite the target image slot, it does use a lot of the disk space shared by both Cumulus Linux image slots.

New Install or Upgrading from Versions Older than 2.2.0

If you are upgrading from a version older than 2.2.0, or installing Cumulus Linux for the first time, choose one of the following methods. They are listed from most to least recommended.

  • Download Cumulus Linux 2.2.1 from the Downloads page of the Cumulus Networks website, then use cl-img-install to install the software.

    Warning: This method overwrites the target image slot, so if you want to preserve your configuration, you should create a persistent configuration on /mnt/persist.

  • Download Cumulus Linux 2.2.1 from the Downloads page of the Cumulus Networks website, then use ONIE to perform a complete install, following the instructions in the quick start guide.

    Warning: This method is destructive; any configuration files on the switch will not be saved, so please copy them to a different server before upgrading via ONIE.

Enabling Quagga

There is no SNMP support for Quagga in this release (see RN-88 below). Because of this, you must remove all references to smux from each of the following configuration files. You must also remove these references before upgrading Cumulus Linux using apt-get. If smux entries are present in the configuration files, the daemons in the 2.2.1 packaged version of Quagga will not start.

  1. cd /etc/quagga
  2. grep smux *
  3. Delete all lines in the config files containing the smux keyword.

The references to smux that must be removed are:

  • In bgpd.conf, remove this line:
    smux peer 1.3.6.1.4.1.3317.1.2.2 quagga_bgpd
  • In ospf6d.conf, remove this line:
    smux peer 1.3.6.1.4.1.3317.1.2.6 quagga_ospf6d
  • In ospfd.conf, remove this line:
    smux peer 1.3.6.1.4.1.3317.1.2.5 quagga_ospfd
  • In zebra.conf, remove this line:
    smux peer 1.3.6.1.4.1.3317.1.2.1 quagga_zebra
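The three steps above can also be scripted. The following Python 3 sketch is an illustration, not a Cumulus-provided tool. The function names strip_smux and strip_smux_in_dir are hypothetical. The sketch removes every line containing the smux keyword from each *.conf file in /etc/quagga and keeps a .bak copy of any file it changes. Run it with sudo so it can write to /etc/quagga.

```python
import re
from pathlib import Path

# Matches any line that uses the smux keyword, for example
# "smux peer 1.3.6.1.4.1.3317.1.2.2 quagga_bgpd"
SMUX_RE = re.compile(r"\bsmux\b")


def strip_smux(text):
    """Return config text with every line containing 'smux' removed."""
    kept = [line for line in text.splitlines(keepends=True)
            if not SMUX_RE.search(line)]
    return "".join(kept)


def strip_smux_in_dir(conf_dir="/etc/quagga"):
    """Rewrite each *.conf file in conf_dir in place, saving a .bak copy.

    Returns the names of the files that were changed.
    """
    changed = []
    for path in sorted(Path(conf_dir).glob("*.conf")):
        original = path.read_text()
        cleaned = strip_smux(original)
        if cleaned != original:
            # Keep the untouched file next to the edited one (hypothetical policy)
            path.with_name(path.name + ".bak").write_text(original)
            path.write_text(cleaned)
            changed.append(path.name)
    return changed
```

Compare the list of changed files against the four files named above. Then run grep smux * again to confirm that nothing is left.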

What's New in Cumulus Linux 2.2.1

Cumulus Linux 2.2.1 supports these new hardware platforms:

  • Penguin Arctica 3200XLP: 32x40G-QSFP+ (Trident II-based switch, x86 CPU)
  • QuantaMesh BMS T3048-LY8: 48x10G-SFP+ and 6x40G-QSFP+ (Trident II-based switch, x86 CPU)

For a presentation highlighting the major changes from Cumulus Linux 2.1.x, see Cumulus Linux 2.2: What's New and Different.

Experimental Features

The following experimental features are included in Cumulus Linux 2.2.1:

Documentation

You can read the technical documentation here.

Issues Fixed in Cumulus Linux 2.2.1

The following issues are fixed in, or no longer apply to, Cumulus Linux 2.2.1. These issues were reported in earlier versions of Cumulus Linux.

Release Note ID Summary
RN-48 Agema 48x10GE switch eth0 driver reports eth0 as running even when PHY link is down
RN-62 Attributes of a BGP aggregate route may not be RFC-compliant
RN-63 BGP4 recursive route not supported
RN-141 ACL counters incorrect
RN-143 TERR/RFCS seen when switching direction from 40G ingress to 10G egress port, when same share pool uses different egress CoS queue.
RN-147 PSU status is not properly represented by LEDs on Quanta LY2 switches
RN-184 BGP attribute-unchanged next-hop flag does not work for IPv6 peers
RN-189 Fan on QuantaMesh LB9 marked as absent
RN-190 clagd unhandled traceback exception: "OSError: [Errno 2] No such file or directory: '/var/run/clagd.pid'" after running service networking restart then clagd restart
RN-193 On Edge-Core AS5610 switches, SFP+ (all optics) link stays down
RN-195 Security Update for apt and bash packages: Shellshock bug fix
RN-205 Running the cl-support script kills switchd
RN-313 High memory utilization by snmpd following MIB walks

Known Issues in Cumulus Linux 2.2.1

Issues are categorized for easy review. Some issues are fixed but will be available in a later release.

Release Note ID Summary Description
RN-4 ifup/ifdown must be used for interfaces with IPv6 addresses defined in /etc/network/interfaces; otherwise, the interface's configured IPv6 addresses are lost when it is brought down and back up Two scenarios are shown below: one using ifup/ifdown, the other using ifconfig down/up.

With ifup/ifdown:
 swp1 Link encap:Ethernet HWaddr 44:38:39:00:01:81
 inet addr:11.0.0.2 Bcast:11.0.0.255 Mask:255.255.255.0
 inet6 addr: fe80::4638:39ff:fe00:181/64 Scope:Link
 inet6 addr: fec0:1000:1000:1000::2/10 Scope:Site
 UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
 RX packets:4231 errors:0 dropped:0 overruns:0 frame:0
 TX packets:4342 errors:0 dropped:0 overruns:0 carrier:0
 collisions:0 txqueuelen:500
 RX bytes:412115 (402.4 KiB) TX bytes:425688 (415.7 KiB)

cumulus@switch$ sudo ifdown swp1
cumulus@switch$ sudo ifconfig swp1
 swp1 Link encap:Ethernet HWaddr 44:38:39:00:01:81
 BROADCAST MULTICAST MTU:1500 Metric:1
 RX packets:4248 errors:0 dropped:0 overruns:0 frame:0
 TX packets:4356 errors:0 dropped:0 overruns:0 carrier:0
 collisions:0 txqueuelen:500
 RX bytes:413990 (404.2 KiB) TX bytes:427074 (417.0 KiB)
cumulus@switch$ sudo ifup swp1
ADDRCONF(NETDEV_UP): swp1: link is not ready
cumulus@switch$ sudo ifconfig swp1
ADDRCONF(NETDEV_CHANGE): swp1: link becomes ready
 swp1 Link encap:Ethernet HWaddr 44:38:39:00:01:81
 inet addr:11.0.0.2 Bcast:11.0.0.255 Mask:255.255.255.0
 inet6 addr: fe80::4638:39ff:fe00:181/64 Scope:Link
 inet6 addr: fec0:1000:1000:1000::2/10 Scope:Site
 UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
 RX packets:4250 errors:0 dropped:0 overruns:0 frame:0
 TX packets:4362 errors:0 dropped:0 overruns:0 carrier:0
 collisions:0 txqueuelen:500
 RX bytes:414178 (404.4 KiB) TX bytes:427610 (417.5 KiB)
cumulus@switch$

With ifconfig down:
cumulus@switch$ sudo ifconfig swp1
 swp1 Link encap:Ethernet HWaddr 44:38:39:00:01:81
 inet addr:11.0.0.2 Bcast:11.0.0.255 Mask:255.255.255.0
 inet6 addr: fe80::4638:39ff:fe00:181/64 Scope:Link
 inet6 addr: fec0:1000:1000:1000::2/10 Scope:Site
 UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
 RX packets:98 errors:0 dropped:0 overruns:0 frame:0
 TX packets:111 errors:0 dropped:0 overruns:0 carrier:0
 collisions:0 txqueuelen:500
 RX bytes:13310 (12.9 KiB) TX bytes:12786 (12.4 KiB)

cumulus@switch$ sudo ifconfig swp1 down
cumulus@switch$ sudo ifconfig swp1
 swp1 Link encap:Ethernet HWaddr 44:38:39:00:01:81
 inet addr:11.0.0.2 Bcast:11.0.0.255 Mask:255.255.255.0
 BROADCAST MULTICAST MTU:1500 Metric:1
 RX packets:126 errors:0 dropped:0 overruns:0 frame:0
 TX packets:138 errors:0 dropped:0 overruns:0 carrier:0
 collisions:0 txqueuelen:500
 RX bytes:16998 (16.5 KiB) TX bytes:15998 (15.6 KiB)
cumulus@switch$ sudo ifconfig swp1 up
ADDRCONF(NETDEV_UP): swp1: link is not ready
cumulus@switch$ sudo ifconfig swp1
ADDRCONF(NETDEV_CHANGE): swp1: link becomes ready
 swp1 Link encap:Ethernet HWaddr 44:38:39:00:01:81
 inet addr:11.0.0.2 Bcast:11.0.0.255 Mask:255.255.255.0
 inet6 addr: fe80::4638:39ff:fe00:181/64 Scope:Link
 UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
 RX packets:130 errors:0 dropped:0 overruns:0 frame:0
 TX packets:149 errors:0 dropped:0 overruns:0 carrier:0
 collisions:0 txqueuelen:500
 RX bytes:17474 (17.0 KiB) TX bytes:17154 (16.7 KiB)
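To check whether bouncing an interface dropped any IPv6 addresses, you can compare the inet6 lines of ifconfig output captured before and after the bounce. This Python 3 sketch assumes the net-tools ifconfig output format shown above. The function names inet6_addrs and lost_addrs are hypothetical and are not part of Cumulus Linux.

```python
import re

# Matches net-tools style lines such as:
#   inet6 addr: fec0:1000:1000:1000::2/10 Scope:Site
INET6_RE = re.compile(r"inet6 addr:\s*(\S+)\s+Scope:(\w+)")


def inet6_addrs(ifconfig_output):
    """Map each IPv6 address/prefix in ifconfig output to its scope."""
    return {addr: scope for addr, scope in INET6_RE.findall(ifconfig_output)}


def lost_addrs(before, after):
    """Return the IPv6 addresses present in `before` but missing in `after`."""
    old, new = inet6_addrs(before), inet6_addrs(after)
    return {addr: scope for addr, scope in old.items() if addr not in new}
```

Applied to the ifconfig down/up scenario, the sketch reports the Scope:Site address fec0:1000:1000:1000::2/10 as lost. The link-local address is regenerated, so it is not reported.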
RN-10 cl-phy-update doesn't support aggregated ports Ports can be aggregated into a larger interface in Cumulus Linux, but cl-phy-update does not yet support aggregated ports.

If any ports are ganged, ungang them before performing a software upgrade.
RN-52 Parameters like the router ID and DR priority cannot be changed while OSPFv2/v3 is running The router ID and DR priority can only be changed by shutting down OSPFv2/v3, changing the value, and restarting the OSPF process.

A change to the DR priority may not be properly reflected in LSAs that are still aging out.
RN-56 ipv4/ipv6 forwarding disabled mode not recognized

If either of the following is configured:

 net.ipv4.ip_forward = 0

or:

 net.ipv6.conf.all.forwarding = 0

The hardware still forwards packets if there is a neighbor table entry pointing to the destination.
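Because the hardware ignores these settings, it can help to check which of them are in effect before relying on them. The following Python 3 sketch reads the two sysctls from /proc/sys and reports which ones are set to 0. The function name and the proc_sys parameter are hypothetical; the parameter exists only so the check can be pointed at a test directory.

```python
from pathlib import Path

# The two sysctls from RN-56, mapped to their paths under /proc/sys
FORWARDING_KEYS = {
    "net.ipv4.ip_forward": "net/ipv4/ip_forward",
    "net.ipv6.conf.all.forwarding": "net/ipv6/conf/all/forwarding",
}


def disabled_forwarding(proc_sys="/proc/sys"):
    """Return the forwarding sysctls that are set to 0.

    On affected releases, the hardware may still forward packets for
    these address families when a neighbor entry exists.
    """
    disabled = []
    for key, rel_path in FORWARDING_KEYS.items():
        path = Path(proc_sys) / rel_path
        if path.exists() and path.read_text().strip() == "0":
            disabled.append(key)
    return disabled
```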

RN-58 IPv6 route is installed and active in the routing table when the associated interface is down If an IPv6 address is assigned to a "down" interface, the associated route is still installed in the route table.

The type of IPv6 address does not matter: link-local, site-local, and global addresses all exhibit the same problem.

If the interface is bounced (brought up and then down again), the routes are removed from the route table.
RN-61 BGP4 notifications missing for several conditions In certain conditions, Quagga bgpd silently closes the peering without sending a notification; for example, when BGP receives a message with an invalid message type or an invalid message length.

In any of these cases, bgpd should send a notification message to the peer.

General functionality of BGP4 is not affected.
RN-64 Configuring route-reflector-client requires specific order When configuring a neighbor as a route reflector client, the Quagga configuration statements must be entered in a specific order; otherwise, the router does not treat the neighbor as a route reflector client.

The "neighbor <IPv4/IPv6> route-reflector-client" command must come after the "neighbor <IPv4/IPv6> activate" command; otherwise, the route-reflector-client command is ignored.

Sample configuration:
 router bgp 65000
 bgp router-id 0.0.0.4 
 bgp log-neighbor-changes
 no bgp default ipv4-unicast
 bgp cluster-id 0.0.0.4 
 bgp bestpath as-path multipath-relax 
 redistribute connected 
 neighbor 14.0.0.1 remote-as 65000 
 neighbor 14.0.0.1 route-reflector-client 
 neighbor 14.0.0.1 activate 
 neighbor 14.0.0.1 next-hop-self 
 neighbor 14.0.0.9 remote-as 65000 
 neighbor 14.0.0.9 activate 
 neighbor 14.0.0.9 next-hop-self 
 neighbor 2001:ded:beef::1 remote-as 65000 
 neighbor 2001:ded:beef:2::1 remote-as 65000 
 maximum-paths 4 
 maximum-paths ibgp 4 
 ! 
 address-family ipv6 
 redistribute connected 
 neighbor 2001:ded:beef::1 activate 
 neighbor 2001:ded:beef::1 next-hop-self 
 neighbor 2001:ded:beef:2::1 route-reflector-client 
 neighbor 2001:ded:beef:2::1 activate 
 neighbor 2001:ded:beef:2::1 next-hop-self 
 maximum-paths 4 
 maximum-paths ibgp 4 
 exit-address-family 

At runtime:
 cumulus@switch:$ show ip bgp neighbor 14.0.0.1 
 BGP neighbor is 14.0.0.1, remote AS 65000, local AS 65000, internal link
 BGP version 4, remote router ID 0.0.0.6 
 BGP state = Established, up for 00:23:49
 Last read 23:31:36, hold time is 180, keepalive interval is 60 seconds 
 Neighbor capabilities: 
 4 Byte AS: advertised and received 
 Route refresh: advertised and received(old & new)
 Address family IPv4 Unicast: advertised and received 
 Message statistics: 
 Inq depth is 0 
 Outq depth is 0
 Sent Rcvd 
 Opens: 2 0
 Notifications: 0 0
 Updates: 1 1
 Keepalives: 25 24 
 Route Refresh: 0 0
 Capability: 0 0 
 Total: 28 25 
 Minimum time between advertisement runs is 5 seconds
 For address family: IPv4 Unicast 
 >>>>>>>>>>>>>>>>>>>>>> ROUTE REFLECTOR CLIENT NOT DISPLAYED 
 NEXT_HOP is always this router 
 Community attribute sent to this neighbor(both) 
 6 accepted prefixes 
 Connections established 1; dropped 0 
 Last reset never 
 Local host: 14.0.0.2, Local port: 179 
 Foreign host: 14.0.0.1, Foreign port: 40290 
 Nexthop: 14.0.0.2 
 Nexthop global: 2001:ded:beef::2 
 Nexthop local: fe80::202:ff:fe00:4
 BGP connection: non shared network
 Read thread: on Write thread: off 
 cumulus@switch:$ 

Workaround:
 Define the statements in the following order:
 address-family ipv4 unicast
 neighbor 14.0.0.9 activate 
 neighbor 14.0.0.9 next-hop-self
 neighbor 14.0.0.9 route-reflector-client >>> Must be after Activate 
 exit-address-family 
 neighbor 2001:ded:beef:2::1 remote-as 65000
 address-family ipv6 unicast 
 redistribute connected
 maximum-paths 4 
 maximum-paths ibgp 4 
 neighbor 2001:ded:beef:2::1 activate 
 neighbor 2001:ded:beef:2::1 next-hop-self 
 neighbor 2001:ded:beef:2::1 route-reflector-client >>> Must be after activate 
 exit-address-family 
 Runtime status after change:

 cumulus@switch:$ show ip bgp neighbors 14.0.0.9
 BGP neighbor is 14.0.0.9, remote AS 65000, local AS 65000, internal link
 BGP version 4, remote router ID 0.0.0.7
 BGP state = Established, up for 00:13:59
 Last read 22:35:13, hold time is 180, keepalive interval is 60 seconds
 Neighbor capabilities:
 4 Byte AS: advertised and received
 Route refresh: advertised and received(old & new)
 Address family IPv4 Unicast: advertised and received
 Message statistics:
 Inq depth is 0
 Outq depth is 0
 Sent Rcvd
 Opens: 1 1
 Notifications: 0 0
 Updates: 2 1
 Keepalives: 15 14
 Route Refresh: 0 0
 Capability: 0 0
 Total: 18 16
 Minimum time between advertisement runs is 5 seconds
 For address family: IPv4 Unicast
 Route-Reflector Client >>>>>>>>>> PLEASE NOTE ME
 NEXT_HOP is always this router
 Community attribute sent to this neighbor(both)
 6 accepted prefixes
 Connections established 1; dropped 0
 Last reset never
 Local host: 14.0.0.10, Local port: 38813
 Foreign host: 14.0.0.9, Foreign port: 179
 Nexthop: 14.0.0.10
 Nexthop global: 2001:ded:beef:2::2
 Nexthop local: fe80::202:ff:fe00:6
 BGP connection: non shared network
 Read thread: on Write thread: off
 cumulus@switch:$
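Quagga ignores a route-reflector-client statement that precedes the matching activate statement, so an offline check of bgpd.conf can catch the problem before it reaches a running router. The following Python 3 sketch is a hypothetical helper, not a Quagga or Cumulus tool. It is a heuristic: it only tracks statement order within the router bgp block and within each address-family block, and does not otherwise parse the configuration.

```python
import re

# "neighbor <addr> activate" or "neighbor <addr> route-reflector-client"
NEIGHBOR_RE = re.compile(
    r"^\s*neighbor\s+(\S+)\s+(activate|route-reflector-client)\b")


def misordered_rr_clients(config):
    """Return (line_number, context, neighbor) for every route-reflector-client
    statement that is not preceded by an activate statement for the same
    neighbor in the same context (router bgp or an address-family block)."""
    problems = []
    activated = {}            # context -> set of neighbors activated so far
    context = "router bgp"
    for lineno, line in enumerate(config.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("exit-address-family"):
            context = "router bgp"
            continue
        if stripped.startswith("address-family"):
            context = stripped
            continue
        match = NEIGHBOR_RE.match(line)
        if not match:
            continue
        neighbor, keyword = match.groups()
        seen = activated.setdefault(context, set())
        if keyword == "activate":
            seen.add(neighbor)
        elif neighbor not in seen:
            problems.append((lineno, context, neighbor))
    return problems
```

Run against the sample configuration, this check flags 14.0.0.1 in the router bgp block and 2001:ded:beef:2::1 in the ipv6 address family. These are the two neighbors the workaround reorders.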
RN-65 Virtual links in Quagga's OSPFv2 are non-operational Cumulus Networks testing has identified too many issues with virtual link support in Quagga's OSPFv2. The feature is unsupported.
RN-68 Blackhole/Unreachable/Prohibit route addition in IPv6 returns corresponding error codes IPv6 route operations indicate the destination action via returned error codes. In the example shown below where an unreachable route is being added, the return code is:
 #define ENETUNREACH 101 /* Network is unreachable */ 
cumulus@switch:$ sudo ip addr add 9000:1000:1000:1000::1/80 dev lo
cumulus@switch:$ sudo ip -6 route
unreachable 9000:1000:1000:1000::/80 dev lo proto kernel metric 256 error -101
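When scanning ip -6 route output, the negative error value can be translated back into its errno symbol. The following Python 3 sketch is hypothetical: the function name is made up, and it relies on the standard errno.errorcode table. That table uses the platform's own numbering; on Linux, 101 is ENETUNREACH, as in the #define above.

```python
import errno
import re

# Matches the trailing "error -<code>" field of an ip -6 route line
ERROR_RE = re.compile(r"\berror\s+-(\d+)\b")


def route_error_name(route_line):
    """Return the errno symbol (for example 'ENETUNREACH') encoded in an
    'ip -6 route' line, or None if the line carries no error code."""
    match = ERROR_RE.search(route_line)
    if match is None:
        return None
    code = int(match.group(1))
    return errno.errorcode.get(code, f"errno {code}")
```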
RN-70 ACL: Bridge traffic that matches a LOG ACTION rule is not logged in syslog For example, consider a bridge with switch ports swp1, swp2, and swp3 as members, and ACL rules configured to LOG and DROP ICMP traffic.

Ping requests are sent from host1 on swp1 to host3 on swp3, and the following is observed:
* Counters for both the LOG and DROP ACL rules increment properly, but the packets do not show up in /var/log/syslog.
* Packets copied from hardware to the CPU for the LOG rule are dropped by a kernel check that disables software bridging for hardware-bridged packets.
RN-77 New routes/ECMPs can evict existing/installed Cumulus Linux syncs routes between the kernel and the switching silicon. If the required resource pools in hardware fill up, new kernel routes can cause existing routes to move from being fully allocated to being partially allocated.

To avoid this, monitor the routes in hardware and keep them below the ASIC limits.

For example, on systems with Trident+ chips, the limits are as follows:
 routes: 16384 <<<< if all routes are ipv4 
 long mask routes 256 <<<< i.e., routes with a mask longer than the route mask limit 
 route mask limit 64
 host_routes: 8192 
 ecmp_nhs: 4044 
 ecmp_nhs_per_route: 52 
That translates to about 77 routes with ECMP next hops (4044 / 52, rounded down), if every route uses the maximum number of ECMP next hops.

To monitor these resources in Cumulus Linux, use the cl-resource-query command:
 cumulus@switch:~# sudo cl-resource-query
 hosts : 3 
 all routes : 29 
 IP4 routes : 17 
 IP6 routes : 12 
 nexthops : 3 
 ecmp_groups : 0
 ecmp_nexthops : 0
 mac entries : 0 / 131072 
 bpdu entries : 500 / 512 
The resource to monitor is the ecmp_nexthops. If this count is close to 4044, new ECMPs may evict existing routes.
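The following Python 3 sketch covers two things. It works out the "about 77 routes" figure from the Trident+ limits listed above, and it parses cl-resource-query output to warn when ecmp_nexthops approaches 4044. The function names and the 90% warning threshold are hypothetical choices, not Cumulus defaults. The parser assumes the "name : value" layout shown above and, for "used / max" lines, keeps only the used count.

```python
import re

ECMP_NHS_LIMIT = 4044       # Trident+ ecmp_nhs limit from the table above
ECMP_NHS_PER_ROUTE = 52     # Trident+ ecmp_nhs_per_route limit

# Worst case: every route uses the maximum number of ECMP next hops.
max_full_ecmp_routes = ECMP_NHS_LIMIT // ECMP_NHS_PER_ROUTE   # 77

# "name : value" lines; for "used / max" lines only the used count is captured
LINE_RE = re.compile(r"^\s*([\w ]+?)\s*:\s*(\d+)")


def parse_resource_query(output):
    """Turn cl-resource-query output into a {name: count} dict."""
    counts = {}
    for line in output.splitlines():
        match = LINE_RE.match(line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


def ecmp_near_limit(output, threshold=0.9):
    """True if ecmp_nexthops is at or above threshold * ECMP_NHS_LIMIT."""
    used = parse_resource_query(output).get("ecmp_nexthops", 0)
    return used >= threshold * ECMP_NHS_LIMIT
```

In practice, feed ecmp_near_limit the output of sudo cl-resource-query, for example from a periodic job.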
RN-88 SNMP support for Quagga is NOT provided in Cumulus Linux Cumulus Linux 2.2 does not provide SNMP support for Quagga.
RN-99 cl-img-clear-overlay is disabled if kernel is upgraded using apt-get If you have upgraded the kernel using apt-get, cl-img-clear-overlay is disabled. To keep Cumulus Linux and all of its packages in sync, and to be able to use cl-img-clear-overlay, perform a full install of Cumulus Linux using cl-img-install.
RN-103 In a VRR environment, the server that is bonded to the VRR switches could lose packets destined to the VRR's IP addresses for up to 15 seconds.

In the following configuration:

        r1
       /  \
   vrr1------vrr2
       \  /
       host1

Each host has a bond interface with one member link connected to vrr1 and the other connected to vrr2.

If the link between the host and one of the VRR switches goes down, it can take up to 15 seconds for the VRR switches to send out an ARP that clears the host's ARP cache entry for the IP address on the bridge interface. This happens because the bond itself does not go down (only one of its member links does), so the host might not clear its ARP cache.

Steps to reproduce:
1. From one of the hosts connected to the VRR switches, ping the real IP addresses of the bridge.
2. On the same host, bring the active interface down with "ip link set down" and let the backup take over.
3. Ping the real IP addresses of the VRR switch that is connected to the active interface.
RN-112 Enabling LACP support for non-L3/L4 modes Issue:
The current LACP implementation only supports srcdestip (0x6) mode.

Resolution:
To use srcdestmac mode, run the following commands.

First, find the bond name to hardware ID mapping:
cumulus@switch:/var/log# sudo kill -SIGRTMIN+5 `pidof switchd`
cumulus@switch:/var/log# grep -A 1000 'Bond Info Dump Start' /var/log/switchd.log | grep -B 1000 'Bond Info Dump End'
1386720020.205690 2013-12-11 00:00:20 sync.c:740 Bond Info Dump Start
1386720020.205953 2013-12-11 00:00:20 sync.c:736 Kernel: bond0 HAL: 0>>>Mapping
1386720020.205981 2013-12-11 00:00:20 sync.c:743
1386720020.206005 2013-12-11 00:00:20 hal_bcm.c:4110 HAL unit: 0
1386720020.206042 2013-12-11 00:00:20 hal_bcm.c:4106 HAL: 0 ext_vlan 0 int_vlan 2000 egr_pg 1
1386720020.206225 2013-12-11 00:00:20 sync.c:745 Bond Info Dump End

Based on the mapping, run the following command, setting id to the bond's HAL ID (the x in "HAL: x"):
cumulus@switch:$ sudo /usr/lib/cumulus/bcmcmd trunk psc id=1 rtag=0x3 

Notes:
1. The HAL ID is a non-persistent ID.
2. If the bond interface goes down or up, you need to do this again.

Verify the commands:
srcdestmac mode 0x3== platform dni-7448-05 
XOR DST+SRC MAC = PASS
FLOOD = PASS
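Because the HAL ID is not persistent and the command must be rerun whenever the bond goes down or up, it can help to derive the command from the Bond Info Dump automatically. The following Python 3 sketch is hypothetical: the function names are made up, and the regular expression assumes the "Kernel: <bond> HAL: <id>" line format shown above. It only builds the bcmcmd command string from RN-112 and does not run it.

```python
import re

# Matches switchd lines such as "... sync.c:736 Kernel: bond0 HAL: 0"
MAPPING_RE = re.compile(r"Kernel:\s*(\S+)\s+HAL:\s*(\d+)")


def bond_hal_ids(log_text):
    """Map kernel bond names to HAL IDs from a Bond Info Dump."""
    return {name: int(hal) for name, hal in MAPPING_RE.findall(log_text)}


def psc_command(hal_id, rtag="0x3"):
    """Build the RN-112 bcmcmd command for one bond (rtag 0x3 = srcdestmac)."""
    return f"sudo /usr/lib/cumulus/bcmcmd trunk psc id={hal_id} rtag={rtag}"
```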
RN-116 Bridge driver issues affecting IGMP snooping behavior on STP topology change Issue:
The Cumulus Linux bridge driver does not adhere to the IETF standard for IGMP snooping during an STP topology change.

Resolution:
On an STP topology change, RFC 4541, section 2.1.1, point 4 (https://tools.ietf.org/html/rfc4541, copied below) describes what an IGMP snooping switch should do to reduce network convergence time; the bridge driver does not implement this behavior.

In addition, the bridge driver does not send a general query on receiving a global leave.

4) An IGMP snooping switch should be aware of link layer topology changes
caused by Spanning Tree operation. When a port is enabled or disabled by
Spanning Tree, a General Query may be sent on all active non-router ports
in order to reduce network convergence time. Non-Querier switches should be
aware of whether the Querier is in IGMPv3 mode. If so, the switch should not
spoof any General Queries unless it is able to send an IGMPv3 Query that
adheres to the most recent information sent by the true Querier. In no case
should a switch introduce a spoofed IGMPv2 Query into an IGMPv3 network, as
this may create excessive network disruption.

If the switch is not the Querier, it should use the 'all-zeros' IP Source Address
in these proxy queries (even though some hosts may elect to not process queries
with a 0.0.0.0 IP Source Address). When such proxy queries are received, they must
not be included in the Querier election process.
RN-119 LLDP frames being reported as software RX drops when received on bridge interfaces Issue:
cl-netstat reports RX drops on interfaces that are not reflected in the hardware counters; these drops are actually received LLDP frames.

Steps to reproduce:
#! /bin/bash 
mz eth3.1032 -c 0 -d 100m "01:80:c2:00:00:0e 02:00:00:00:00:01 88:cc 02:07
04:90:e2:ba:21:a9:1c:04:07:03:90:e2:ba:21:a9:1c:06:02:00:78:0a:27:78:6b:76:
6d:31:30:33:34:31:2e:70:33:72:32:2e:6d:61:73:73:65:66:66:65:63:74:2e:64:68:
63:6f:6d:70:75:74:65:2e:6e:65:74:0c:5d:55:62:75:6e:74:75:20:31:32:2e:30:34:
2e:31:20:4c:54:53:0a:20:4c:69:6e:75:78:20:33:2e:32:2e:30:2d:32:39:2d:67:65:
6e:65:72:69:63:20:23:34:36:2d:55:62:75:6e:74:75:20:53:4d:50:20:46:72:69:20:
4a:75:6c:20:32:37:20:31:37:3a:30:33:3a:32:33:20:55:54:43:20:32:30:31:32:20:
78:38:36:5f:36:34:0e:04:00:1c:00:00:10:0c:05:01:0a:41:10:30:02:00:00:00:04:
00:08:04:65:74:68:33:fe:09:00:12:0f:03:01:00:00:00:00:fe:09:00:12:0f:01:00:
80:00:00:21:fe:06:00:12:0f:04:00:00:00:00"
RN-120 ethtool LED blinking does not work with switch ports Linux uses ethtool -p to identify the physical port backing an interface, or to identify the switch itself. This identification is usually done by blinking the port LED until ethtool -p is stopped.

This feature does not apply to switch ports (swpX) in Cumulus Linux.
RN-121 PTMD: When a physical interface is in a PTM FAIL state, its subinterface still exchanges information Issue:
When PTMD incorrectly places a physical interface (PIF) in a failure state and the Zebra interface is enabled, BGP sessions on the physical interface do not establish routes, but the subinterfaces on top of it do.

If the subinterface is configured on the physical interface and the physical interface is incorrectly marked as being in a PTM FAIL state, routes on the physical interface are not processed in Quagga, but the subinterface is working.

Steps to reproduce:
cumulus@switch:$ sudo vtysh -c 'show int swp8' 
Interface swp8 is up, line protocol is up
PTM status: fail
index 10 metric 1 mtu 1500
flags: <UP,BROADCAST,RUNNING,MULTICAST>
HWaddr: 44:38:39:00:03:88
inet 12.0.0.225/30 broadcast 12.0.0.227
inet6 2001:cafe:0:38::1/64
inet6 fe80::4638:39ff:fe00:388/64
cumulus@switch:$ ip addr show | grep swp8
10: swp8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 500
inet 12.0.0.225/30 brd 12.0.0.227 scope global swp8
104: swp8.2049@swp8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
inet 12.0.0.229/30 brd 12.0.0.231 scope global swp8.2049
105: swp8.2050@swp8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
inet 12.0.0.233/30 brd 12.0.0.235 scope global swp8.2050
106: swp8.2051@swp8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
inet 12.0.0.237/30 brd 12.0.0.239 scope global swp8.2051
107: swp8.2052@swp8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
inet 12.0.0.241/30 brd 12.0.0.243 scope global swp8.2052
108: swp8.2053@swp8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
inet 12.0.0.245/30 brd 12.0.0.247 scope global swp8.2053
109: swp8.2054@swp8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
inet 12.0.0.249/30 brd 12.0.0.251 scope global swp8.2054
110: swp8.2055@swp8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
inet 12.0.0.253/30 brd 12.0.0.255 scope global swp8.2055
cumulus@switch:$
bgp sessions:
12.0.0.226 ,4 ,64057 , 958 , 1036 , 0 , 0 , 0 ,15:55:42, 0, 10472
12.0.0.230 ,4 ,64058 , 958 , 1016 , 0 , 0 , 0 ,15:55:46, 187, 10285
12.0.0.234 ,4 ,64059 , 958 , 1049 , 0 , 0 , 0 ,15:55:40, 187, 10285
12.0.0.238 ,4 ,64060 , 958 , 1039 , 0 , 0 , 0 ,15:55:45, 187, 10285
12.0.0.242 ,4 ,64061 , 958 , 1014 , 0 , 0 , 0 ,15:55:46, 187, 10285
12.0.0.246 ,4 ,64062 , 958 , 1016 , 0 , 0 , 0 ,15:55:46, 187, 10285
12.0.0.250 ,4 ,64063 , 958 , 1029 , 0 , 0 , 0 ,15:55:43, 187, 10285
12.0.0.254 ,4 ,64064 , 958 , 1036 , 0 , 0 , 0 ,15:55:44, 187, 10285
RN-125 Network LSA with an old router ID isn't flushed out by the originator Issue:
When the router ID is changed, the router should flush the network LSA (link-state advertisement) that it previously originated for the IP address on the interface.

Resolution:
Cumulus Linux does not flush this LSA; it is removed only when it ages out naturally.
RN-128 Quagga does not start by default in Cumulus Linux 2.2 To start Quagga, modify /etc/quagga/daemons to enable the corresponding daemons.
zebra=yes (* this one is mandatory to get the others up) 
bgpd=yes
ospfd=yes
ospf6d=yes
ripd=no
ripngd=no
isisd=no
babeld=no

Then, start Quagga:
 cumulus@switch1:~# sudo /etc/init.d/quagga start 
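If you manage /etc/quagga/daemons from a script, the edit above can be expressed as a small text transformation. The following Python 3 sketch is hypothetical. It assumes the name=yes/no line format shown above. It always enables zebra, because the other daemons depend on it, and it only rewrites lines that already exist; it does not add missing ones.

```python
import re

DAEMON_RE = re.compile(r"^(\w+)=(yes|no)\b")


def enable_daemons(daemons_text, wanted):
    """Set name=yes for each daemon in `wanted` plus zebra; leave other lines alone."""
    wanted = set(wanted) | {"zebra"}
    out = []
    for line in daemons_text.splitlines(keepends=True):
        match = DAEMON_RE.match(line)
        if match and match.group(1) in wanted:
            # Keep whatever followed the yes/no value (newline, comments)
            line = f"{match.group(1)}=yes" + line[match.end():]
        out.append(line)
    return "".join(out)
```

After writing the result back to /etc/quagga/daemons, start Quagga with the init script as shown.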
RN-132 You must run "apt-get update" before running any apt-get commands or after changing sources.list

Before running any apt-get commands, or after changing the sources.list file in /etc/apt, you need to run apt-get update.

RN-133 Interface names in Cumulus Linux cannot exceed 15 characters

Device names, including interface names, in Cumulus Linux cannot exceed 15 characters, or 16 characters including the terminating null character. Cumulus Linux truncates longer interface names.

To avoid this issue, do not assign long names to your interfaces.

The following example configuration reproduces this issue:

cumulus@switch:/sys/class/net$ grep 'iface br' /etc/network/interfaces 
iface br2-pubmgmt inet static
iface br3-prvmgmt inet manual
iface br400-quarantine inet manual
iface br401-peering-1k5 inet manual
iface br402-peering-9k inet manual
iface br500-pi-exa inet manual
iface br501-akamai-exa inet manual
iface br502-exa-internetfactory inet manual
cumulus@switch:/sys/class/net$ brctl show | grep br
bridge name	bridge id	 STP enabled	interfaces
br2-pubmgmt	 8000.089e01cebe37	no	 bond0.2
br3-prvmgmt	 8000.089e01cebe3a	no	 bond0.3
br400-quarantin	 8000.089e01cebe37	no	 bond0.400
br401-peering-1	 8000.089e01cebe3a	no	 bond0.401 <<<
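To find over-long names before they are truncated, you can scan /etc/network/interfaces for iface stanzas. The following Python 3 sketch is hypothetical. It assumes the 16-byte kernel limit (IFNAMSIZ, including the terminating null), and it predicts each truncated name by keeping the first 15 characters, which matches the brctl output above.

```python
import re

IFNAMSIZ = 16                      # kernel buffer size, including the trailing NUL
MAX_IFNAME_LEN = IFNAMSIZ - 1      # 15 usable characters

IFACE_RE = re.compile(r"^\s*iface\s+(\S+)", re.MULTILINE)


def long_interface_names(interfaces_text):
    """Map each over-long iface name to the 15-character name it would become."""
    names = IFACE_RE.findall(interfaces_text)
    return {name: name[:MAX_IFNAME_LEN]
            for name in names if len(name) > MAX_IFNAME_LEN}
```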
RN-134 Installing Chef under Cumulus Linux

The Cumulus Linux 2.2 repository contains two versions of Chef, the automation tool: 11.6.2 (the current version) and 10.30.4.

To install the latest version, connect to the switch and use apt-get:

cumulus@switch:~# sudo apt-get install chef

To install 10.30.4, connect to the switch and use apt-get:

cumulus@switch:~# sudo apt-get install chef=10.30.4-0.debian.7.3 
RN-150 Tagged packets have their 802.1p value set to 0

All the tagged packets get their 802.1p priority value set to 0.

This is a known issue that should be fixed in a future release.

RN-153 BGP ECMP x64 topology is missing routes

Sometimes in an ECMP x64 topology, the nodes learn fewer paths to a route than the expected 64.

The issue arises because Cumulus Linux brings up peers very quickly, and sometimes a peer comes up before Zebra has finished providing the OS with all connected routes (Zebra identifies all of the connected routes to BGP; BGP then sanity checks the next hops from the EBGP peers against that list).

This issue will be fixed in a future release of Cumulus Linux.

RN-161 Packets on local ports get dropped on admin state change of VXLAN instance attached to bridge

Packets between local ports of a bridge are dropped momentarily when a user changes the admin state of a VXLAN instance attached to the bridge (for example, by running "ip link set up/down"). Bridge attributes in the hardware are modified on the state change, which causes packets between member ports of the bridge to be dropped.

There is no workaround at this time; stop traffic before changing the admin state of an attached VXLAN instance.

RN-162 Priority Flow Control doesn't work on Trident II switches

Priority Flow Control (PFC) configuration is not correct for switches on the Trident II platform. As a result, PFC doesn't work.

There is no workaround at this time.

RN-163 VXLAN: ovsdb-server cannot select loopback interface as source IP address, causing TOR registration to the controller to fail

In a VXLAN using VMware NSX, ovsdb-server cannot select the loopback interface as the source IP address. This causes TOR registration to the controller to fail.

To work around this issue, run:

cl-bgp redistribute add connected
RN-164 IFLA_VXLAN_SERVICE_NODE incompatible with upstream kernel

IFLA_VXLAN_SERVICE_NODE is a Cumulus Linux-specific VXLAN attribute, and the Debian kernel has had more VXLAN attributes added to it since Cumulus Linux 2.0 was released.

This issue will be fixed in a future release of Cumulus Linux.

RN-165 Quanta LY6 switch has memory parity error _soc_mem_array_sbusdma_read: L2_ENTRY.ipipe0 failed(ERR)

On a Quanta LY6 switch, you may see some memory parity errors in the switchd log that look like this:

switchd.log.1.gz:1402208356.479833 2014-06-08 06:19:16 sync.c:2803 IPv4 Route Summary (90) : 0 Added, 1 Deleted, 0 Updated in 30943 usecs
switchd.log.1.gz:1402208356.521537 2014-06-08 06:19:16 hal_acl_bcm.c:2352 ACL: installation succeeded, switched over
switchd.log.1.gz:1402208357.679543 2014-06-08 06:19:17 sync.c:2803 IPv4 Route Summary (91) : 1 Added, 299 Deleted, 0 Updated in 220336 usecs
switchd.log.1.gz:1402208357.719811 2014-06-08 06:19:17 hal_acl_bcm.c:2352 ACL: installation succeeded, switched over
switchd.log.1.gz:1402208359.486625 2014-06-08 06:19:19 sync.c:2803 IPv4 Route Summary (92) : 299 Added, 0 Deleted, 0 Updated in 200425 usecs
switchd.log.1.gz:1402208361.042769 2014-06-08 06:19:21 hal_acl_bcm.c:2352 ACL: installation succeeded, switched over
switchd.log.1.gz:1402208419.059525 2014-06-08 06:20:19 hal_bcm.c:408 caught a parity error of type parity data error: 0x4000001, 0x1c0043cd
switchd.log.1.gz:1402208419.059594 2014-06-08 06:20:19 switchd.c:533 No switchd restart: restart trigger has been disabled
switchd.log.1.gz:1402208419.061002 2014-06-08 06:20:19 hal_bcm.c:408 caught a parity error of type corrected data error: 0x7d6, 0x43cd
switchd.log.1.gz:1402208419.061032 2014-06-08 06:20:19 switchd.c:533 No switchd restart: restart trigger has been disabled
switchd.log.1.gz:1402208419.061091 2014-06-08 06:20:19 hal_bcm_console.c:169 WARN STATUS: 0x00000083
switchd.log.1.gz:OPCODE: 0x1c110200
switchd.log.1.gz:START ADDR: 0x04790180
switchd.log.1.gz:CUR ADDR: 0x1c004394
switchd.log.1.gz:_soc_mem_array_sbusdma_read: L2_ENTRY.ipipe0 failed(ERR)
switchd.log.1.gz:H/W received sbus nack with error bit set.
switchd.log.1.gz:Unit: 0 
switchd.log.1.gz:
switchd.log.1.gz:Mem: Parity error..
switchd.log.1.gz:Error in: SBUS transaction.
switchd.log.1.gz:Blk: 1, Pipe: 0, Address: 0x1c0043cd, base: 0x0, stage: 7, index: 17357

While troubleshooting this issue, the error occurred only once. If you encounter this error more than once, please submit a support request.
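To check whether the error has occurred more than once, you can count its signature across the switchd logs, including the compressed rotations. The following Python 3 sketch is hypothetical. It treats the _soc_mem_array_sbusdma_read line from the sample above as the signature, which is an assumption: a single event logs two "caught a parity error" lines but only one such line.

```python
import gzip

# Signature taken from the sample log above (an assumption, not an official ID)
SIGNATURE = "_soc_mem_array_sbusdma_read: L2_ENTRY.ipipe0 failed(ERR)"


def count_parity_errors(lines):
    """Count occurrences of the RN-165 error signature in log lines."""
    return sum(1 for line in lines if SIGNATURE in line)


def count_in_log(path):
    """Open a switchd log (plain or .gz) and count the signature."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", errors="replace") as f:
        return count_parity_errors(f)
```

Summing count_in_log over /var/log/switchd.log and its rotated copies gives the total. If the total is greater than 1, that is the situation described above.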

RN-176 ipv6route only shows 2K routes; causes cl-route-check to fail incorrectly

 

RN-179

10GTek 10G SR cables exhibit high rate of errors on Penguin Arctica 4804X switch

Some PHY-less Penguin Arctica 4804X platforms using 10GTek 10G MM SR cables exhibit high error rates and low bandwidth in one direction.

RN-180

JDSU QSFP+ LR4 cable presence not detected on Edge-Core AS-6701 switch

 

RN-181

ECMP paths not inserted for directly connected unnumbered neighbors

Cumulus Linux does not insert multiple paths for directly connected unnumbered neighbors into hardware; it inserts only one, as shown by cl-route-check -V.

This may present a problem in VXLAN configurations where a VTEP's neighbors are directly adjacent (that is, the spine switch is the VTEP) and you want to use ECMP for the tunneled traffic. If only your leaf switches are VTEPs, this issue does not occur.
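To compare how many paths the kernel holds for a prefix with the single path that cl-route-check -V reports in hardware, you can count the nexthop lines that iproute2 prints for a multipath route. This is a sketch that reads ip route show PREFIX output on standard input; the function name is illustrative, and the prefixes and addresses in any test data are made up.

```shell
#!/bin/bash
# kernel_path_count (illustrative helper): read `ip route show <prefix>`
# output on stdin and print how many next hops the kernel has:
# the number of "nexthop" lines for a multipath route, 1 for a
# single-path route, and 0 if no route was printed at all.
kernel_path_count() {
    awk 'NR == 1 { routes = 1 }
         $1 == "nexthop" { n++ }
         END { print (n ? n : routes + 0) }'
}

# Example (hypothetical prefix):
# ip route show 10.0.0.0/24 | kernel_path_count
```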

RN-182 ICMP redirects occur on host while pinging bridge IP in VRR active-active topology (VRR and Host-MLAG)

In a VRR active-active topology (VRR and Host-MLAG), both links from the host are active, so when the host pings any of the bridge IP addresses, some ICMP requests may be sent to the peer switch. The peer switch then responds with ICMP redirect messages.

For example:

cumulus@host2$ ping 12.0.1.2
PING 12.0.1.2 (12.0.1.2) 56(84) bytes of data.
64 bytes from 12.0.1.2: icmp_req=1 ttl=64 time=1.23 ms
64 bytes from 12.0.1.2: icmp_req=2 ttl=64 time=0.882 ms
64 bytes from 12.0.1.2: icmp_req=3 ttl=64 time=0.832 ms
64 bytes from 12.0.1.2: icmp_req=4 ttl=64 time=0.960 ms
From 12.0.1.3: icmp_seq=2 Redirect Host(New nexthop: 12.0.1.2)       <<<<<<<<<<<<<<<<<
From 12.0.1.3: icmp_seq=3 Redirect Host(New nexthop: 12.0.1.2)
From 12.0.1.3: icmp_seq=4 Redirect Host(New nexthop: 12.0.1.2)
64 bytes from 12.0.1.2: icmp_req=5 ttl=64 time=0.985 ms
From 12.0.1.3: icmp_seq=6 Redirect Host(New nexthop: 12.0.1.2)
64 bytes from 12.0.1.2: icmp_req=6 ttl=64 time=1.01 ms
64 bytes from 12.0.1.2: icmp_req=7 ttl=64 time=1.09 ms
^C
--- 12.0.1.2 ping statistics ---
7 packets transmitted, 7 received, 0% packet loss, time 6022ms
rtt min/avg/max/mdev = 0.832/1.001/1.238/0.126 ms
cumulus@host2$ 

cumulus@host2$ arp
Address                  HWtype  HWaddress           Flags Mask            Iface
jump-01.cumulusnetworks  ether   52:55:c0:a8:00:03   C                     eth0
192.168.0.2              ether   52:55:c0:a8:00:02   C                     eth0
12.0.1.2                 ether   00:00:5e:00:01:02   C                     bond0
vrrp.cbbtier3.att.net    ether   00:00:5e:00:01:02   C                     bond0
12.0.1.3                 ether   00:00:5e:00:01:02   C                     bond0
cumulus@host2$ 
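In the arp output above, 12.0.1.2 and 12.0.1.3 both resolve to the same hardware address, 00:00:5e:00:01:02. The following sketch reads arp output on standard input and lists every MAC address that more than one entry resolves to. It assumes the column layout shown above (address first, HWtype second, HWaddress third); the function name is illustrative.

```shell
#!/bin/bash
# macs_with_multiple_ips (illustrative helper): read `arp` output on stdin
# and print "MAC -> address address ..." for every hardware address that
# more than one complete ("ether") entry resolves to.
macs_with_multiple_ips() {
    awk 'NR > 1 && $2 == "ether" {      # skip the header and incomplete entries
            ips[$3] = ips[$3] " " $1
            count[$3]++
         }
         END {
            for (mac in count)
                if (count[mac] > 1)
                    print mac, "->" ips[mac]
         }'
}
```

On the host, arp -n | macs_with_multiple_ips avoids the reverse DNS lookups that produced names such as vrrp.cbbtier3.att.net above.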
RN-183 Link not coming up (NO-CARRIER) on Penguin Arctica 4804xp with CAB-10GSFP-P9M 10Gtek 9 meter cable

The link does not come up on a Penguin Arctica 4804xp 720G PHY-less switch with 10Gtek 9-meter cables (CAB-10GSFP-P9M). You can confirm this by running ip link show on the affected port; the NO-CARRIER flag and DOWN state indicate the problem:

root@switch:~# ip link show swp15
17: swp15: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT qlen 500
    link/ether 44:38:39:00:70:a1 brd ff:ff:ff:ff:ff:ff

Cumulus Linux supports passive copper cables up to 7m long, so the 9m cable in this configuration is not supported.
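To check many ports at once, you can look for the NO-CARRIER flag in the one-line-per-interface output of ip -o link show. This is a sketch; it assumes the "index: name: <FLAGS> ..." layout shown above, and the function name is illustrative.

```shell
#!/bin/bash
# ports_without_carrier (illustrative helper): read `ip -o link show`
# output on stdin and print the name of every interface whose flag list
# (the <...> field) contains NO-CARRIER.
ports_without_carrier() {
    # With ": " as the separator, $2 is the interface name and $3 starts
    # with the flag list, for example "<NO-CARRIER,BROADCAST,...>".
    awk -F': ' '$3 ~ /[<,]NO-CARRIER[,>]/ { print $2 }'
}

# Example: ip -o link show | ports_without_carrier
```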

RN-185

ethtool reports wrong values for QSFP alarm and warning threshold flags on the Agema 7448 switch

On Agema 7448 switches with QSFPs that have alarm and warning threshold flags and values set, ethtool reports incorrect values.

This is because ethtool uses PHY-based EEPROM access for QSFPs on the 7448 switch.

RN-186

Cannot configure 10M or 100M speeds using ethtool on 1G copper ports

You cannot use ethtool to set 100Mbps or 10Mbps port speeds on 1G copper ports. As a workaround, use bcmcmd. For example, to set a port speed of 100Mbps:

root@switch:~# /usr/lib/cumulus/bcmcmd port ge1 speed=100

To set a 10Mbps port speed:

root@switch:~# /usr/lib/cumulus/bcmcmd port ge1 speed=10

RN-187

Rolling back an installation can leave Debian packages unusable

When you run apt-get install, Cumulus Linux automatically creates a new snapshot if at least one file is added to /etc. If you roll back the installation using cl-persistify, the installer reverts the /etc directory to that snapshot but does not change the installed packages.

As a result, these packages may be unusable because their configuration files are missing. The Cumulus Linux installer does not record which Debian packages changed after a rollback. As a workaround, try reinstalling the affected packages.
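Because the installer does not record which packages changed, one way to find candidates for reinstallation is the dpkg log. The sketch below assumes the standard Debian /var/log/dpkg.log line format (date, time, action, package, versions) and that you know roughly when the snapshot you rolled back to was taken; the function name and cutoff argument are illustrative.

```shell
#!/bin/bash
# packages_changed_since (illustrative helper): print the unique names of
# packages that dpkg installed or upgraded at or after a cutoff time.
# $1 = cutoff as "YYYY-MM-DD HH:MM:SS", $2 = log file (default /var/log/dpkg.log)
packages_changed_since() {
    local since="$1" log="${2:-/var/log/dpkg.log}"
    # Fixed-width timestamps compare correctly as plain strings.
    awk -v since="$since" '
        ($3 == "install" || $3 == "upgrade") && ($1 " " $2) >= since {
            sub(/:.*/, "", $4)    # drop an ":arch" suffix if present
            print $4
        }' "$log" | sort -u
}

# Example (hypothetical cutoff):
# sudo apt-get install --reinstall $(packages_changed_since "2014-06-08 06:00:00")
```

A plain reinstall may not recreate configuration files that dpkg believes were deleted on purpose; passing -o Dpkg::Options::=--force-confmiss to apt-get asks dpkg to restore missing configuration files.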

RN-188

If you use the cl-ns-mgmt management namespace tool and run apt-get upgrade to install new packages, the upgrade fails if any of the packages need to be restarted in the default namespace. This occurs because apt-get must be run in the management namespace.

Services that must be restarted in the default namespace, such as switchd, are not restarted.

RN-191 On the Agema 7448 switch, QSFP vendor information is corrupted

You can see this when you run ethtool:

cumulus@switch:~$ sudo ethtool -m swp51
   Vendor name                               : __9____________`
   Vendor OUI                                : 23:1b:36
   Vendor PN                                 : _=______________
   Vendor rev                                : _
   Vendor SN                                 : ________________
cumulus@switch:~$ sudo ethtool -m swp50
   Vendor name                               : __>___________$_
   Vendor OUI                                : 39:45:40
   Vendor PN                                 : /_O_CbQ6N_ _6S+U
   Vendor rev                                : 7
   Vendor SN                                 : ________________

All other output looks fine.

RN-192 PTM may crash when a large topology file has a syntax error

If a large topology.dot file (about 4000 entries) contains a syntax error, PTM may crash while parsing it, because the libcgraph API that PTM uses to parse the file cannot handle the error. Check topology.dot for syntax errors with a Graphviz tool such as dot or dotty, which can identify the error.
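One way to run this check is to let dot parse the file and discard its output; dot exits with a non-zero status and prints the offending line when it hits a syntax error. The default path /etc/ptm.d/topology.dot in the sketch below is an assumption, so pass the location of your own topology file; the function name is illustrative.

```shell
#!/bin/bash
# check_topology (illustrative helper): parse a PTM topology file with
# Graphviz dot and throw the rendered output away. Returns dot's exit
# status: 0 if the file parsed, non-zero (with dot's error on stderr)
# if it did not.
check_topology() {
    local file="${1:-/etc/ptm.d/topology.dot}"   # default path is an assumption
    dot -Tcanon "$file" > /dev/null
}

# Example:
# check_topology /etc/ptm.d/topology.dot && echo "topology.dot parsed cleanly"
```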

RN-196 For a VXLAN in NSX, ovsdb-server cannot select a loopback interface as the source IP address

As a result, registration of the top-of-rack (TOR) switch with the controller fails.

To work around this issue, run:

cl-bgp redistribute add connected

RN-197 Host-MLAG: when a MAC entry is learned and shared with the dual-connected peer, an old entry from the peer switch overwrites the new entry

This problem occurs in a host-MLAG configuration, where two switches (switch1 and switch2) form the MLAG pair. When a MAC address is learned on bondX on switch1, it is synchronized to bondX on switch2. If the MAC address is then seen on bondY on switch2 (a MAC address move) without also being seen on bondY on switch1, the MAC address does not move from bondX to bondY. If the MAC address is seen on bondY on switch1, it moves correctly.
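To see which bond a given MAC address is associated with on each MLAG peer, you can filter the bridge forwarding database and compare the results from switch1 and switch2. This sketch parses bridge fdb show output, assuming lines of the form "MAC dev PORT ..."; the function name and the MAC address in the usage example are hypothetical.

```shell
#!/bin/bash
# mac_ports (illustrative helper): read `bridge fdb show` output on stdin
# and print the port (the word after "dev") for every entry whose MAC
# address matches $1. The MAC is lowercased first to match fdb output.
mac_ports() {
    local mac
    mac=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    awk -v mac="$mac" '$1 == mac {
        for (i = 2; i < NF; i++)
            if ($i == "dev") { print $(i + 1); break }
    }'
}

# Example on each peer (hypothetical MAC):
# bridge fdb show | mac_ports 00:02:00:00:00:0a
```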
RN-198 Port LEDs behave differently on different switch models

Port LED behavior varies with the make and model of the switch. For example:

  • Agema AG-7448CU: the LED is off when the link is up. It blinks on briefly when there is traffic.
  • Edge-Core AS4600-54T: the LED is off when the link is up. It blinks on briefly when there is traffic.
  • QuantaMesh T3048-LY2R: the LED is on when the link is up. It blinks off briefly when there is traffic.

Cumulus Networks is currently working to fix this issue.

RN-199 When a Quagga route-map is modified, the switch may use a partially edited map before the edits are complete

Cumulus Linux triggers a route-map update before the user finishes editing the route map, so an incomplete route map may be applied. The update should be triggered only after the user finishes editing the map.

Cumulus Networks is working to fix this issue.

RN-200 On an Edge-Core AS5610-52X switch with long-reach QSFP cables, the link stays down

To work around this issue, create a file named qsfphp in /etc/network/ with the contents below and make it executable (chmod +x /etc/network/qsfphp). Then, as root, run the following for each switch port that uses one of these high-power cables:

/etc/network/qsfphp swp#

The contents of qsfphp:

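#!/bin/bash
# Override the power settings of a QSFP module so that it may draw more
# than 1.5W. Run as root: the script writes to the module EEPROM.

usage() {
    echo "Software override power settings for QSFP cables allowing high power" >&2
    echo "(>1.5W) operation." >&2
    echo "Usage: $0 swpN" >&2
    exit 1
}

if [[ "$#" -ne 1 ]]; then
    usage
fi

interface="$1"

if [[ "${interface:0:3}" != "swp" ]]; then
    usage
fi

port="${interface:3}"

# Find the EEPROM device whose label is "port<N>" for this switch port.
eeprom_dev=$(grep -l "^port${port}\$" /sys/class/eeprom_dev/*/label 2>/dev/null)

if [[ -z "$eeprom_dev" ]]; then
    echo "no such interface: $interface" >&2
    exit 1
fi

eeprom="$(dirname "$eeprom_dev")/device/eeprom"

# Byte 0 of the module EEPROM is the identifier; 0x0d means QSFP+.
identifier=$(dd if="$eeprom" bs=1 count=1 2>/dev/null | hexdump -e '/1 "0x%02x"')

if [[ "$identifier" != "0x0d" ]]; then
    echo "$interface is not a QSFP cable" >&2
    exit 1
fi

# Byte 93 is the power-control byte; 0x01 sets the power-override bit.
if printf '\x01' | dd of="$eeprom" seek=93 bs=1 count=1 2>/dev/null; then
    echo done
else
    echo failed >&2
    exit 1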
fi

RN-201 On a Dell S6000-ON, running snmpwalk on the LM-SENSORS MIB times out

To work around this issue, disable the LM-SENSORS MIB in /etc/default/snmpd.

RN-202 Running ip link add type bond mode 802.3ad does not set the bond mode attribute

For the ip link add command to set the bond mode correctly, you must specify the bond name:

cumulus@switch:~$ sudo ip link add bondfoo type bond mode 802.3ad
[  771.581270] bonding: bondfoo is being created...
[  771.592144] bonding: bondfoo: setting mode to 802.3ad (4).
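To confirm that the mode was applied, you can read the bonding driver's sysfs attribute /sys/class/net/BOND/bonding/mode, which holds the mode name followed by its number (for example, 802.3ad 4); after the command above it should name 802.3ad. The function name is illustrative, and the optional sysfs root argument exists only so the sketch can be tried against a copy of the tree.

```shell
#!/bin/bash
# bond_mode (illustrative helper): print the mode name of a bond from the
# bonding driver's sysfs attribute, which holds "<name> <number>".
# $1 = bond name, $2 = sysfs root (default /sys; override only for testing)
bond_mode() {
    local bond="$1" root="${2:-/sys}" mode_file name
    mode_file="$root/class/net/$bond/bonding/mode"
    if [ ! -r "$mode_file" ]; then
        echo "bond_mode: $bond is not a bond (no $mode_file)" >&2
        return 1
    fi
    read -r name _ < "$mode_file"   # keep the name, drop the number
    echo "$name"
}

# Example: bond_mode bondfoo
```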
RN-203 cl-acltool doesn't read policy.d directory files in alphabetical order

To work around this issue, force the order in which the files are read by manually editing policy.conf:

cumulus@switch:/etc/cumulus/acl/policy.d$ sudo grep -v "#" ../policy.conf

include /etc/cumulus/acl/policy.d/00control_plane.rules
include /etc/cumulus/acl/policy.d/50ssh_block.rules
include /etc/cumulus/acl/policy.d/99control_plane_catch_all.rules

This results in the files being read in the expected order, and the ACL rules working as intended:

root@cs03:/etc/cumulus/acl/policy.d# cl-acltool -i
Reading rule file /etc/cumulus/acl/policy.d/00control_plane.rules ...
Processing rules in file /etc/cumulus/acl/policy.d/00control_plane.rules ...
Reading rule file /etc/cumulus/acl/policy.d/50ssh_block.rules ...
Processing rules in file /etc/cumulus/acl/policy.d/50ssh_block.rules ...
Reading rule file /etc/cumulus/acl/policy.d/99control_plane_catch_all.rules ...
Processing rules in file /etc/cumulus/acl/policy.d/99control_plane_catch_all.rules ...
Installing acl policy
done.
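Rather than typing the include lines by hand, you can generate them in byte-wise (C locale) order, so that numeric prefixes such as 00, 50, and 99 decide the order. This sketch only prints the lines for you to review before editing /etc/cumulus/acl/policy.conf; it assumes every rule file in policy.d ends in .rules, as in the example above, and the function name is illustrative.

```shell
#!/bin/bash
# policy_includes (illustrative helper): print one "include" line per
# *.rules file in the given directory, sorted byte-wise.
# $1 = directory (default /etc/cumulus/acl/policy.d)
policy_includes() {
    local dir="${1:-/etc/cumulus/acl/policy.d}" f
    for f in "$dir"/*.rules; do
        [ -e "$f" ] || continue   # no match: the glob is left unexpanded
        printf 'include %s\n' "$f"
    done | LC_ALL=C sort
}

# Example: policy_includes > /tmp/policy-includes.txt   (review, then merge)
```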
RN-204 QuantaMesh T3048-LY8 switch can hang on reboot

In rare cases during repeated reboot cycles, the QuantaMesh T3048-LY8 has been observed to hang on boot.

Cumulus Networks implemented a workaround that significantly reduces the probability of such a boot failure.

If the LY8 does hang on boot, a subsequent reboot boots the switch correctly.

RN-270 inotify support

inotify is not supported by the overlayfs root filesystem on PowerPC platforms.
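To tell whether a switch has an overlay root filesystem, you can check the entry for / in /proc/mounts and combine that with uname -m (PowerPC systems report an architecture beginning with ppc). This sketch treats both overlayfs and overlay as overlay filesystems, since the type name depends on the kernel; the function name is illustrative.

```shell
#!/bin/bash
# root_is_overlay (illustrative helper): succeed if the filesystem mounted
# last on "/" in the given mounts file (default /proc/mounts) is an overlay.
root_is_overlay() {
    local mounts="${1:-/proc/mounts}" fstype
    # Field 2 is the mount point and field 3 the filesystem type; if "/" is
    # listed more than once (rootfs first), the last entry is the active one.
    fstype=$(awk '$2 == "/" { t = $3 } END { print t }' "$mounts")
    case "$fstype" in
        overlay|overlayfs) return 0 ;;
        *) return 1 ;;
    esac
}

# Example: root_is_overlay && echo "overlay root on $(uname -m)"
```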


RN-372 (CM-9360) Security update for CVE-2015-7547: glibc getaddrinfo stack-based buffer overflow vulnerability

For details on this issue and how to upgrade, read this article.