smond error: Temp2 Outlier of 16315 C on AS-4600-54T

Issue

smond logs the following error to /var/log/syslog on an Edge-Core AS-4600-54T switch:

2016-01-20T20:53:58.406544+09:00 switch : /usr/sbin/smond : : Temp2(P2020 CPU die sensor): state changed from OK to BAD
2016-01-20T20:53:58.406707+09:00 switch : /usr/sbin/smond : : Temp2(P2020 CPU die sensor): Following outliers were found: [16315.0] C
2016-01-20T20:55:39.117503+09:00 switch : /usr/sbin/smond : : Temp2(P2020 CPU die sensor): state changed from BAD to OK

Similarly high readings may occasionally appear for the same sensor when it is read through libsensors, where it is named adt7473-i2c-1-2e.
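To see how often the outlier has been logged, the smond messages shown above can be pulled out of /var/log/syslog with a short script. The following is a minimal sketch, not a Cumulus Linux tool: the regular expression is modelled only on the sample log lines in this article, and the function names parse_outlier_line and scan_syslog are hypothetical.

```python
import re

# Pattern modelled on the sample smond outlier line in this article.
# It is an assumption that other affected releases use the same layout.
OUTLIER_RE = re.compile(
    r"^(?P<timestamp>\S+) .*smond.*?(?P<sensor>Temp\d+)\((?P<desc>[^)]*)\): "
    r"Following outliers were found: \[(?P<values>[^\]]*)\] C"
)


def parse_outlier_line(line):
    """Return (timestamp, sensor, [readings]) for an outlier line, else None."""
    match = OUTLIER_RE.search(line)
    if match is None:
        return None
    # The bracketed list may hold one or more comma-separated readings.
    values = [float(v) for v in match.group("values").split(",") if v.strip()]
    return match.group("timestamp"), match.group("sensor"), values


def scan_syslog(path="/var/log/syslog"):
    """Collect every outlier record found in the given log file."""
    records = []
    with open(path, errors="replace") as log:
        for line in log:
            record = parse_outlier_line(line)
            if record is not None:
                records.append(record)
    return records


if __name__ == "__main__":
    for timestamp, sensor, values in scan_syslog():
        print(timestamp, sensor, values)
```

Lines that only report a state change (OK to BAD, or BAD to OK) do not match the pattern and are skipped.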

Environment

  • Cumulus Linux 2.5.z, versions 2.5.3a through 2.5.9
  • Edge-Core AS-4600-54T

Resolution

This error can safely be ignored. smond excludes the outlier reading from its thermal management decisions.

Note: This issue is fixed in Cumulus Linux 2.5.10.

Root Cause

Temp2 (P2020 CPU die sensor) on the AS-4600-54T occasionally returns a reading of approximately 16315 C. Cumulus Linux 2.5.3a introduced a filter that excludes these readings from thermal management decisions and logs the errant reading instead.
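The filtering idea described above can be illustrated with a short sketch. This is not smond's actual implementation: the plausible temperature range, the function names, and the simple range check are all assumptions made for illustration, and the real filter may use a different outlier test.

```python
# Assumed plausible range for a CPU die sensor, in degrees Celsius.
# These bounds are illustrative; smond's real thresholds are not documented here.
PLAUSIBLE_MIN_C = -40.0
PLAUSIBLE_MAX_C = 150.0


def split_outliers(readings, low=PLAUSIBLE_MIN_C, high=PLAUSIBLE_MAX_C):
    """Separate readings into (valid, outliers) using a simple range check."""
    valid = [r for r in readings if low <= r <= high]
    outliers = [r for r in readings if not (low <= r <= high)]
    return valid, outliers


def format_outlier_message(sensor, outliers):
    """Build a log message shaped like the one shown in this article."""
    return f"{sensor}: Following outliers were found: {outliers} C"


if __name__ == "__main__":
    valid, outliers = split_outliers([45.0, 16315.0, 46.5])
    if outliers:
        # The errant reading is logged rather than acted upon.
        print(format_outlier_message("Temp2(P2020 CPU die sensor)", outliers))
    # Only the valid readings would feed thermal management decisions.
    print(valid)
```

Under these assumptions, the 16315 C sample is reported in the log message while only 45.0 and 46.5 remain as inputs for thermal decisions.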
