A Dell S6000-ON or Dell S4048-ON switch may reboot at a temperature lower than the system is rated for, and does so without logging any messages.
- Dell S4048-ON or Dell S6000-ON
- Cumulus Linux versions 2.5.5 and earlier
The thermal management software did not match the thermal management specifications of the hardware platforms. The onboard hardware temperature sensors have registers for programming temperature threshold values. Cumulus Linux, consistent with the Linux driver and the thermal sensor data sheet, interpreted this register as the threshold at which to issue user warnings. The hardware, however, interpreted the same register as the threshold at which to shut down the switch. As a result, when the temperature reached that threshold, the thermal chip shut down the switch. This shutdown was unintended, and it happened before the switch had a chance to log the warning.
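The mismatch described above can be illustrated with a small model. This is only a sketch of the reasoning, not the actual driver or firmware logic: the `ThresholdRegister` class, the function names, and the 80 °C limit are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ThresholdRegister:
    # Hypothetical threshold value programmed into the sensor register.
    limit_c: float


def software_view(reg: ThresholdRegister, temp_c: float) -> str:
    """What the software expected: reaching the threshold triggers a warning."""
    return "warn" if temp_c >= reg.limit_c else "ok"


def hardware_view(reg: ThresholdRegister, temp_c: float) -> str:
    """What the hardware did: reaching the threshold triggers a shutdown."""
    return "shutdown" if temp_c >= reg.limit_c else "ok"


def simulate(reg: ThresholdRegister, readings: list[float]) -> list[tuple[float, str]]:
    """Replay temperature readings and record what happens at each one."""
    events = []
    for temp_c in readings:
        if hardware_view(reg, temp_c) == "shutdown":
            # The hardware acts first, so the software never gets to warn.
            events.append((temp_c, "shutdown"))
            break
        events.append((temp_c, software_view(reg, temp_c)))
    return events


if __name__ == "__main__":
    reg = ThresholdRegister(limit_c=80.0)  # illustrative value only
    for temp_c, outcome in simulate(reg, [70.0, 79.0, 80.0, 85.0]):
        print(temp_c, outcome)
```

In this model the `warn` outcome never appears in the event list, because both sides react to the same register value and the hardware shutdown takes effect first.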
The Dell and Cumulus Networks engineering teams collaborated to fully understand the software and hardware expectations of thermal management, and the thermal management software in Cumulus Linux was refactored to match the latest S4048-ON thermal management specifications. The sensor temperatures can be monitored in the output of smonctl, and if any sensor reaches the maximum or critical threshold, a log message is generated.
To fix this issue, upgrade the switch to Cumulus Linux 2.5.6 or later.
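When auditing several switches, the affected-version rule above (the two listed platforms running 2.5.5 or earlier) can be expressed as a small check. This is a minimal sketch: the function names and the platform strings are hypothetical, and how the platform and version are collected from each switch is left out.

```python
# Platforms named in this article; the exact strings are an assumption.
AFFECTED_PLATFORMS = {"Dell S4048-ON", "Dell S6000-ON"}

# First Cumulus Linux release that contains the fix, per this article.
FIXED_IN = (2, 5, 6)


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted version such as '2.5.5' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def is_affected(platform: str, version: str) -> bool:
    """Return True if this platform and release are affected by the issue."""
    return platform in AFFECTED_PLATFORMS and parse_version(version) < FIXED_IN


if __name__ == "__main__":
    print(is_affected("Dell S6000-ON", "2.5.5"))
    print(is_affected("Dell S6000-ON", "2.5.6"))
```

Comparing integer tuples rather than raw strings keeps a release such as 2.5.10 from being sorted before 2.5.6.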
Subscribe to our product bulletin mailing list to learn about these announcements as soon as they're made.