Sensu is a monitoring system that takes information from custom system checks and passes it to a handler to perform an action. The simplest and most common action is to send a warning or error to its open source dashboard, Uchiwa. If you already use Nagios or Icinga, the principles of Sensu are similar. Sensu can also use Nagios or Icinga checks, so switching to Sensu is easy.
Cumulus Linux is monitored exactly like any other Debian-based server. Any checks used on Ubuntu or Debian boxes will work seamlessly with Cumulus Linux. If your company has been running Sensu, Icinga or Nagios for your servers, extending this to Cumulus Linux is simple and pain-free. No more digging through SNMP MIB tables!
You can use Sensu to monitor the health status of Cumulus Linux switches as well as Linux-based servers and even Microsoft Windows!
Why We Monitor
Image courtesy of a back-of-the-napkin sketch by Mikey Dickerson, head of the USDS
Monitoring is the basis of all knowledge of production systems. It helps you answer questions like, "Is our site up?" You may claim that it's possible to know that without monitoring. What you cannot answer without monitoring is "How often is it up?", "Will it break soon?" or "Does it work for your internal testers but nobody else?" Monitoring lets you understand the behavior of your system in detail. Without it, systems operate on faith. With monitoring, it becomes science.
How Sensu Works
A series of checks is installed on the clients, which run the checks locally and pass the results back to the main collector. Because of this distributed check system, a Sensu installation can scale to a large number of machines with very little difficulty. Previous monitoring systems required high-CPU computers to run the checks centrally, which often caused both choke points and suboptimal monitoring (such as only doing a check every 5 minutes).
One key difference between Sensu and a traditional monitoring system is that Sensu does not come with a pre-installed set of monitoring checks. However, this doesn't mean that you have to be a Ruby expert to take advantage of Sensu! You can draw on the large community of Sensu and Uchiwa users and the checks they have written. Sensu has two locations with community-submitted checks. The community plugin directory is currently replacing the original community plugin repository. The change is not yet complete, so some checks may be present only in one repository or the other. Because Sensu relies on exit codes for check information, you can write checks in Ruby, Python, Bash or your favorite language.
Some common things to monitor include:
Sensu Technical Architecture
Sensu uses RabbitMQ as a queueing system in order to scale more efficiently. In contrast to Nagios, where a central host does all of the checks, in Sensu each machine performs its own checks and sends the results back to a central RabbitMQ queue. The Sensu server reads each check result from the queue, inspects it and performs an action (known as a handler). This model allows Sensu-based monitoring systems to scale to much larger deployments than Icinga and Nagios.
The Sensu server uses Redis, a key-value database, to store persistent data.
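The flow described above (clients run their own checks, publish the results to a queue, and the server passes failures to a handler) can be sketched as a toy model. This is only an illustration, not how Sensu is implemented: Python's in-process queue.Queue stands in for RabbitMQ, and the function names run_client and run_server and the result fields are hypothetical.

```python
import queue


def run_client(name, checks, results):
    """Run each check locally and publish its result to the shared queue."""
    for check_name, check_fn in checks.items():
        status, output = check_fn()
        results.put({"client": name, "check": check_name,
                     "status": status, "output": output})


def run_server(results, handler):
    """Drain the queue and pass every non-OK result to the handler."""
    handled = []
    while True:
        try:
            result = results.get_nowait()
        except queue.Empty:
            break
        if result["status"] != 0:  # 0 means the check passed
            handled.append(handler(result))
    return handled
```

The point of the design is that the expensive part, running the checks, happens on every client in parallel, while the central side only consumes small result messages.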
Sensu has its own native check format, or you can use Nagios and Icinga checks. You can easily write your own check in Ruby, Python or Bash, since Sensu uses the exit code of the check to determine its state: 0 means the check succeeded, 1 is a warning, 2 is critical, and any other value is treated as unknown. You can find more information on checks in the Sensu documentation.
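As an illustration of an exit-code based check, here is a minimal sketch of a disk usage check in Python. The 80% and 90% thresholds, the function names and the message format are assumptions made for this example; they are not taken from any community plugin.

```python
import shutil

# Exit codes following the convention shared by Sensu and Nagios checks.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(used_percent, warn=80.0, crit=90.0):
    """Map a usage percentage to an exit code (thresholds are illustrative)."""
    if used_percent >= crit:
        return CRITICAL
    if used_percent >= warn:
        return WARNING
    return OK


def check_disk(path="/", warn=80.0, crit=90.0):
    """Return (exit_code, message) for the filesystem containing path."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        # The check itself could not run, so report UNKNOWN.
        return UNKNOWN, "DISK UNKNOWN: {}".format(exc)
    if usage.total == 0:
        return UNKNOWN, "DISK UNKNOWN: {} reports zero size".format(path)
    used_percent = 100.0 * usage.used / usage.total
    code = classify(used_percent, warn, crit)
    label = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}[code]
    return code, "DISK {}: {:.1f}% used on {}".format(label, used_percent, path)


def main():
    """Print the one-line result and return the exit code."""
    code, message = check_disk()
    print(message)
    return code
```

To use this as a real check script, it would end with sys.exit(main()) (after importing sys) so that the exit code reaches the Sensu client.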
Image courtesy of Sensu Docs
Integrating Sensu into Your Environment
Sensu works best as a distributed monitoring agent. Sensu can scale to a very large number of switches, servers and virtual machines.
With Sensu, all of the monitoring and alerting for a team can be on a single Uchiwa dashboard. Using handlers, critical alerts can be escalated to email, HipChat, Twitter and many other services to notify the right engineers.
In the Cumulus Workbench Sensu environment, Puppet is used to set up all of the Sensu infrastructure. You can see the Puppet manifests we use for Sensu clients here.
Handlers can also perform automated actions when a check fails. For example, if the status check for Apache fails, a handler can automatically attempt a "service apache2 start" while notifying the operations team at the same time.
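The Apache example can be sketched as a handler that reads the event as JSON and decides what to do. This is a dry-run sketch under several assumptions: the event layout (client.name, check.name, check.status, check.output) is assumed to match what the Sensu server passes to a pipe handler on standard input, the check name "check-apache" is hypothetical, and the actions are only printed, not executed.

```python
import json
import sys


def plan_actions(event):
    """Return a list of (kind, payload) actions for one event dict."""
    check = event.get("check", {})
    client = event.get("client", {})
    status = check.get("status", 3)  # treat a missing status as unknown
    actions = []
    if status == 0:
        return actions  # check passed, nothing to do
    # Hypothetical check name; restart only on a critical Apache failure.
    if check.get("name") == "check-apache" and status == 2:
        actions.append(("remediate", ["service", "apache2", "start"]))
    summary = "{} on {}: {}".format(
        check.get("name"), client.get("name"),
        check.get("output", "").strip())
    actions.append(("notify", summary))
    return actions


def main(stream=None):
    """Read one event as JSON and print the planned actions (dry run)."""
    event = json.load(stream if stream is not None else sys.stdin)
    for kind, payload in plan_actions(event):
        print(kind, payload)
```

A production handler would replace the print call with the real remediation command and a notification to the team's chat or email service.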
- Some PowerPC 1G switches have very small hard drives. Installing many packages on these machines may fill the hard drives.