Cumulus Networks Product Bulletin 2016-01-26: Thermal Management Issues on Dell S4048-ON or Dell S6000-ON Switches Running Cumulus Linux


Issue

A Dell S6000-ON or Dell S4048-ON switch may reboot at a temperature below the limit the system is rated for, without logging any messages.

Environment

Hardware:

  • Dell S4048-ON or Dell S6000-ON

Software:

  • Cumulus Linux versions 2.5.5 and earlier

Root Cause

The thermal management software did not match the thermal management specifications of the hardware platforms. The onboard hardware temperature sensors have registers for programming temperature threshold values. Cumulus Linux, consistent with the Linux driver and the thermal sensor data sheet, interpreted these registers as warning thresholds: reaching one should trigger a user warning. The hardware, however, treated the same registers as shutdown thresholds. As a result, when a sensor reached the programmed temperature, the thermal chip shut down the switch. This was unintended, and it left the switch no chance to log the warning.
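The mismatch can be illustrated with a small simulation. This is only a sketch under stated assumptions: the function name, the temperatures, and the threshold value are hypothetical and are not taken from the sensor data sheet or from Cumulus Linux. It shows why the same programmed threshold produces a log entry under the software's interpretation and a silent power-off under the hardware's.

```python
def simulate(temps_c, threshold_c, hardware_action):
    """Walk a temperature ramp and return the events seen, in order.

    hardware_action is "warn" (what the software expected) or
    "shutdown" (what the thermal chip actually did). Both names are
    hypothetical labels used only for this illustration.
    """
    events = []
    for temp in temps_c:
        if temp >= threshold_c:
            if hardware_action == "shutdown":
                # The chip cuts power as soon as the threshold is reached,
                # so the software never gets the chance to log anything.
                events.append("power off")
                break
            events.append(f"warning logged at {temp} C")
    return events


ramp = [60, 70, 80]  # illustrative values only
expected = simulate(ramp, threshold_c=70, hardware_action="warn")
actual = simulate(ramp, threshold_c=70, hardware_action="shutdown")
```

In this model, `expected` contains warnings and no power-off, while `actual` contains a power-off and no warnings, which matches the symptom described in this bulletin.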

Resolution

The Dell and Cumulus Networks engineering teams collaborated to fully understand the software and hardware expectations of thermal management, and Cumulus Networks refactored the Cumulus Linux thermal management software to match the latest S4048-ON thermal management specifications. Sensor temperatures can be monitored in the output of smonctl; if any sensor reaches its maximum or critical threshold, a log message is sent to syslog.
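The threshold behavior described above can be sketched as follows. This is not how smonctl is implemented, and it does not parse smonctl output; the function name, the sensor name, the threshold parameters, and the message text are all assumptions made for illustration.

```python
import logging

log = logging.getLogger("thermal-sketch")  # hypothetical logger name


def check_sensor(name, temp_c, max_c, crit_c):
    """Classify one reading and log when a threshold is reached.

    Returns "OK", "MAX", or "CRITICAL". The thresholds are supplied by
    the caller; no real platform values are assumed here.
    """
    if temp_c >= crit_c:
        level, state = logging.CRITICAL, "CRITICAL"
    elif temp_c >= max_c:
        level, state = logging.WARNING, "MAX"
    else:
        return "OK"
    log.log(level, "%s at %.1f C reached %s threshold", name, temp_c, state)
    return state
```

To route these messages to syslog on a Linux host, one option is to attach `logging.handlers.SysLogHandler(address="/dev/log")` from the Python standard library to the logger.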

To fix this issue, upgrade the switch to Cumulus Linux 2.5.6 or later.
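When auditing several switches, the version rule above can be expressed as a small check. This is a minimal sketch that assumes the release is available as a plain dotted numeric string such as "2.5.5"; how that string is collected from each switch is outside the scope of this example, and the function names are hypothetical. The check applies only to the hardware listed under Environment.

```python
FIXED_IN = (2, 5, 6)  # first release containing the fix, per this bulletin


def parse_version(text):
    """Turn a dotted numeric version such as "2.5.5" into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def is_affected(version):
    """Return True if the given Cumulus Linux version predates the fix."""
    return parse_version(version) < FIXED_IN
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as treating "2.5.10" as older than "2.5.6".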

Stay Informed

Subscribe to our product bulletin mailing list to learn about these announcements as soon as they're made.

