Monit style alerts for Systemd

Posted on 18 Sep 2014. Tags: unix, linux, Systemd

TL;DR

I created a Ruby gem that provides email and Slack notifications for Systemd services, giving visibility over process stops, starts, restarts and reloads. This post documents why and how.

I also need some testers, so if this is relevant to you please give it a go and send the appropriate feedback.

In an upcoming post I'm planning on explaining the benefits of Systemd for managing processes on servers. If you don't know what Systemd is, this post probably isn't for you - read the upcoming post instead.

If you use Systemd, and are the kind of person that likes to monitor server activity, you might have noticed that there's no easy way to send alerts when a unit fails. This person on this mailing list shares my surprise at the fact that there's no built-in way of monitoring units.

I previously used monit, which is a great tool. However, monit not only sends notifications for failed processes but also takes on the mantle of restarting these processes. Since Systemd does that already, I didn't want them stepping on each other's toes. I simply wanted notifications on process activity.

I also use Zabbix for general server monitoring, such as processor load, disk usage, etc. Although it can monitor processes, it uses a polling mechanism. If a process fails and restarts within a few seconds, there's a very high chance that this failure wouldn't be picked up. Therefore, Zabbix and similar system monitors aren't suitable.

The requirements

The notifier should:

notify about stop, start, restart and reload states
be able to send email notifications, and have a pluggable interface for allowing other kinds of notifications (e.g. Slack, HipChat, etc.)
not use a polling mechanism. Processes can change state many times in under a second, and polling at that frequency would be far too resource-intensive
sit quietly in the background until something happens (i.e. no "busy-loops")
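
A minimal sketch of what a pluggable notifier interface like the one in the requirements could look like. This is not systemd_mon's actual code: every class name here (Notification, IoNotifier, CollectingNotifier, Dispatcher) is made up for illustration. The only assumed contract is that a notifier responds to notify.

```ruby
# Illustrative only: these names are hypothetical, not systemd_mon's API.
Notification = Struct.new(:unit, :state) do
  def to_s
    "#{unit} is now #{state}"
  end
end

# Writes each notification to an IO object (standard output by default).
class IoNotifier
  def initialize(io = $stdout)
    @io = io
  end

  def notify(notification)
    @io.puts(notification.to_s)
  end
end

# Keeps notifications in memory; handy for tests.
class CollectingNotifier
  attr_reader :messages

  def initialize
    @messages = []
  end

  def notify(notification)
    @messages << notification.to_s
  end
end

# Hands a notification to every registered notifier. A failure in one
# notifier is reported on stderr and does not stop the others.
class Dispatcher
  def initialize(notifiers)
    @notifiers = notifiers
  end

  def dispatch(notification)
    @notifiers.each do |notifier|
      begin
        notifier.notify(notification)
      rescue StandardError => e
        warn "#{notifier.class} failed: #{e.message}"
      end
    end
  end
end
```

An email or Slack notifier would then just be another class with a notify method, registered with the dispatcher alongside the others.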

I couldn't find any pre-existing tool that fit these requirements, so the plan was to create one specifically for Systemd.

The solution

Follow the thread through and you see people talking about the D-Bus message system, and how Systemd uses D-Bus to send notifications. Someone even posted a Python script (that I sadly couldn't get to work), which gave a quick example of how to plug in to the Systemd messages.

However, Python is my number two language choice, after Ruby. A quick Google search revealed the ruby-dbus library, which has some great examples for getting started quickly.

Systemd sends a signal when the state of a unit changes (the PropertiesChanged signal), and provides an interface for querying specific units and their state. In our Ruby code, we can register a listener which responds to that signal. When the signal is sent, we can then query the unit's state and see what's changed. A "simple" script would look something like this:

require 'dbus'

dbus            = DBus::SystemBus.instance
systemd_service = dbus.service("org.freedesktop.systemd1")

systemd_object  = systemd_service.object("/org/freedesktop/systemd1")
systemd_object.introspect  # Required, to load the API
systemd_object.Subscribe   # Required, to tell systemd to send signals

unit_def = systemd_object.GetUnit('cron.service')
unit = systemd_service.object(unit_def[0])
unit.introspect
unit.default_iface = "org.freedesktop.DBus.Properties"

# Where we register our callback
unit.on_signal("PropertiesChanged") do |iface|
  if iface == "org.freedesktop.systemd1.Unit"
    active_state = unit.Get("org.freedesktop.systemd1.Unit", "ActiveState").first
    puts active_state
  end
end

# Start the dbus loop
main = DBus::Main.new
main << dbus
main.run

If you aren't familiar with D-Bus, most of this will seem foreign to you. There's a lot of stuff in there; the concepts of interfaces and objects take a while to digest.

To test this script, install the ruby-dbus library and run it. It will attach to the D-Bus main loop and wait for a change in cron.service. Then, in another terminal, run sudo systemctl stop cron.service. You should see the script print out something like:

deactivating
inactive

Then start the service again, and you will see:

activating
active

So we can see that Systemd sends signals at a granular level. Not only does it tell us about starts and stops, but it also tells us about mid-state changes like "deactivating". This is great, but also a slight concern for building a notification system. As nice as it is to know when a unit is deactivating, it's probably only important to know that it is inactive - we don't want to be spammed with notifications.

Enter systemd_mon

From the concepts that I learnt regarding D-Bus and Systemd, I decided that a Ruby gem was the best way to go for building a notifier. A quick script like the above would quickly get out of hand. I created systemd_mon, which, at the time of writing, is at version 0.0.2.

I overcame the signal granularity issue by combining states. For instance, deactivating followed by inactive would be combined into a single state, which would then be passed on to whichever notifiers are loaded.
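
The idea of combining states can be sketched roughly as follows. This is an illustration under assumptions, not systemd_mon's implementation: the StateCombiner class is hypothetical, and it treats the ActiveState values activating, deactivating and reloading as transitional, buffering them until a settled state arrives.

```ruby
# Hypothetical helper, not part of systemd_mon.
class StateCombiner
  # ActiveState values that describe a change in progress.
  TRANSITIONAL = %w[activating deactivating reloading].freeze

  # The block is called once per settled state with the final state
  # and the full list of states that led up to it.
  def initialize(&on_settled)
    @pending = []
    @on_settled = on_settled
  end

  # Feed each ActiveState value in the order it is observed.
  def push(state)
    @pending << state
    return if TRANSITIONAL.include?(state)

    @on_settled.call(state, @pending.dup)
    @pending.clear
  end
end
```

Feeding it deactivating followed by inactive triggers the block once, with inactive as the final state, so a notifier hears about one change instead of two.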

It also keeps a short history of states. For instance, if a unit is inactive, goes to activating and then back to inactive, that can be summarised as "still failing". The email notifier sends some tabular information showing how the state of the unit has changed recently.
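
A rough sketch of how a short state history could be turned into a summary. The StateHistory class, the history limit and every label except "still failing" are assumptions made for illustration; the real gem may summarise differently.

```ruby
# Hypothetical helper, not part of systemd_mon.
class StateHistory
  TRANSITIONAL = %w[activating deactivating reloading].freeze
  DOWN = %w[inactive failed].freeze

  def initialize(limit = 10)
    @limit = limit
    @states = []
  end

  # Record an ActiveState value, dropping the oldest entries beyond the limit.
  def record(state)
    @states << state
    @states.shift while @states.size > @limit
    self
  end

  # Compare the two most recent settled states and describe the change.
  def summary
    settled = @states.reject { |s| TRANSITIONAL.include?(s) }
    return "no settled state yet" if settled.empty?

    down_now = DOWN.include?(settled[-1])
    return(down_now ? "down" : "up") if settled.size == 1

    down_before = DOWN.include?(settled[-2])
    if down_now && down_before
      "still failing"
    elsif down_now
      "stopped"
    elsif down_before
      "started"
    else
      "still running"
    end
  end
end
```

Recording inactive, activating and inactive in that order leaves two settled states that are both down, which this sketch reports as "still failing".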

Currently, two notifiers are supported: email and Slack. The plan is to extend this list, and also make it easy for people to create new notifier gems that plug in to systemd_mon.

Another feature I added was to send a notification when systemd_mon itself starts or stops. There is still the issue of SIGKILL signals bypassing Ruby's at_exit handler, but that's (hopefully) a fairly rare case. Also, it's recommended to add another more general level of server monitoring, e.g. Nagios or Zabbix, which can just keep a watch on systemd_mon (e.g. check once a minute that it's running).
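
A minimal sketch of start and stop notifications built on at_exit. The announce_lifecycle method is a hypothetical name used only for illustration, and the block stands in for whichever notifier is configured. SIGKILL cannot be trapped, so no handler can cover that case.

```ruby
# Hypothetical helper, not part of systemd_mon.
# Calls the block with a "started" message immediately and registers a
# "stopped" message to be sent when the Ruby process exits normally.
def announce_lifecycle(name, &notify)
  notify.call("#{name} started")

  # Runs on normal exit and on exits caused by exceptions or `exit`.
  at_exit { notify.call("#{name} stopped") }

  # Turn SIGTERM into a regular exit so the at_exit handler above runs.
  Signal.trap("TERM") { exit }
end
```

Calling announce_lifecycle("systemd_mon") { |message| puts message } near the top of a script would print the start message straight away and the stop message on shutdown.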

Usage

For full usage instructions, check the README on the GitHub repository. The quick summary is that you define a YAML file containing the units that you want to monitor, plus the configuration for the notifier(s) that you want to use. You then run systemd_mon path/to/config.yml. This can easily be added as a Systemd unit, allowing it to run in the background.

Next steps

As I've said, I plan to add more notifiers, or at least encourage others to do so.

There are a few quirks to iron out, mainly with the summarising of unit states, but I've been using it for a month now on about five different servers and have already discovered issues in our software that I wouldn't have otherwise known.

If you've made it this far in this post, I take it that this is relevant to you. The most useful thing to me at the moment is testing, particularly across different versions of Systemd (I've been using 204). So feel free to give it a try and open GitHub issues as needed.

Contributions are more than welcome, via pull requests.

Happy Systemd-ing...?

Jon Cairns
