Using journal-brief to watch for errors

It’s pretty easy to see what error messages have been logged on a machine with systemd installed: just run journalctl -p err. But this will show you all of the messages of priority “err” or higher, starting from the beginning of the log. What if you only want the most recent ones?

Again, that’s pretty easy: journalctl -p err -e will do it. The -e parameter tells it to start at the end, not the beginning.

But what if you specifically want to know about error messages that have been logged since you last looked? Enter journal-brief, which keeps a bookmark so that it will only show you messages logged since you last ran it.

This means you can run journal-brief -p err from a cron job to get emailed about error messages on a regular basis, with no missed or duplicated journal entries.

But some error messages are logged every time the machine boots or shuts down. For instance, on this system, I always get the following message logged on boot:

kernel: ERST: Failed to get Error Log Address Range.

I don’t want to be emailed about that every time the machine boots, and journal-brief allows me to fine-tune which messages to include and exclude. In fact it even has a mode to configure this automatically. When given the debrief command, journal-brief will output a configuration file instead of the journal entries.

Here’s how to use it. First run journal-brief -p err so that the bookmark is up to date with the latest journal message. After that, reboot the machine.

After logging in again, run journal-brief -p err debrief > ~/.config/journal-brief/journal-brief.conf to create a configuration file that will ignore all the error messages that occurred since just before you rebooted. Now future invocations of journal-brief -p err will not show those messages relating to booting, logging in, or shutting down that occur as a matter of course.

You can examine the file and adjust it for anything that doesn’t make sense. The format is documented on the GitHub page.

The systemd journal is structured, with each journal entry being a set of key=value records. When configuring journal-brief you can exclude all journal entries with a particular SYSLOG_IDENTIFIER value, or with CODE_FILE and CODE_FUNC both having certain values, and so on.

You can also specifically include journal entries in the same way. This means you can, for example, see all error messages, plus any messages at all from an important systemd unit you want to monitor.

As well as being able to show journal entries as short text lines (as though from a log file), journal-brief can also output JSON, with all the structured fields included. Multiple output formats can be specified at once: in fact, the default is “reboot,short” which means that a “— Reboot —” message is shown whenever the _BOOT_ID of two consecutive entries differs.

There is also a “login” output format. This behaves a little differently than the others. It performs some simple analysis on the journal entries, only showing output at the end. The code is pretty short, so let’s take a look at it.

The base class is EntryFormatter. By inheriting from this class, the output formatter is registered and will show up in the help output.

class LoginFormatter(EntryFormatter):
  """
  Show a summary of login sessions
  """

  FORMAT_NAME = "login"

This allows us to use journal-brief -o login to see the output.

  FILTER_INCLUSIONS = [
    {
      # New session
      'PRIORITY': 'info',
      'MESSAGE_ID': ['8d456...'],
      '_COMM': ['systemd-logind'],
    },
  ]

This is the inclusion filter for journal entries for this output formatter. It will only receive journal entries from systemd-login, and specifically those about new login sessions (that’s what the message ID specifies, shortened here for brevity).

  def __init__(self, *args, **kwargs):
    super(LoginFormatter, self).__init__(*args, **kwargs)
    self.login = defaultdict(int)

The constructor just sets up the data structure we need for counting user logins. This defaultdict (from the collections module) is like a normal dict, but creates new entries with value 0 as needed.

  def format(self, entry):
    if 'USER_ID' not in entry:
      return ''

    self.login[entry['USER_ID']] += 1
    return ''

The format() method is called for each journal entry. This particular journal entry has a USER_ID key indicating the user whose login session has been started. All this output formatter needs to do for each journal entry is increment number of login sessions for the USER_ID. Remember, it will only have to process login session journal entries, because that’s what it specified in its inclusion filter above.

  def flush(self):
    if not self.login:
      return ''

    ret = '\nUser logins:\n\n'
    logins = list(self.login.items())
    logins.sort()
    for user, count in logins:
      ret += '{count:>5} x {user}\n'.format(user=user,
                                            count=count)

    return ret

The flush() method is called when there are no more journal entries to process, and this is when the login output formatter shows the count of logins.

The result of all this is that journal-brief -o reboot,short,login -p err will show all the error messages since the last boot, plus or minus any messages specified in the configuration, with reboot markers when appropriate, and with a summary of logins at the end.

Thought of something this would be useful for, but it doesn’t quite do what you want? Please suggest it, or even write your own output formatter for it!


Posted

in

by

Comments

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.