SpamapS.org – Full Frontal Nerdity

Clint Byrum's Personal Stuff

Nagios is from Mars, and MySQL is from Venus (Monitoring Part 2)

In my previous post about Nagios, I showed how the rich Nagios charm simplifies adding basic monitoring to Juju environments. But, we need more than that. Services know more about how to verify they are working than a monitoring system could ever guess.

So, for that, we have the ‘monitors‘ interface. Currently sparsely documented, as it is extremely new, the idea is simple.

  • Providing charms define what monitoring systems should look at generically
  • Requiring charms monitor any of the things they can, ignoring those they cannot.

This is defined through a YAML file. Here is the example.monitors.yaml included with the nagios charm:

# Version of the spec, mostly ignored but 0.3 is the current one
version: '0.3'
# Dict with just 'local' and 'remote' as parts
monitors:
    # local monitors need an agent to be handled. See nrpe charm for
    # some example implementations
    local:
        # procrunning checks for a running process named X (no path)
        procrunning:
            # Multiple procrunning can be defined, this is the "name" of it
            nagios3:
                min: 1
                max: 1
                executable: nagios3
    # Remote monitors can be polled directly by a remote system
    remote:
        # do a request on the HTTP protocol
        http:
            nagios:
                port: 80
                path: /nagios3/
                # expected status response (otherwise just look for 200)
                status: 'HTTP/1.1 401'
                # Use as the Host: header (the server address will still be used to connect() to)
                host: www.fewbar.com
        mysql:
            # Named basic check
            basic:
                username: monitors
                password: abcdefg123456

There are two main classes of monitors: local, remote. This is in reference to the service unit’s location. Local monitors are intended to be run inside the same machine/container as the service. Remote monitors are then, quite obviously, meant to be run outside the machine/container. So, above, you see remote monitors for the mysql protocol and http, and a local monitor to see if processes are running.

The MySQL charm now includes some of these monitors:

version: '0.3'
monitors:
    local:
        procrunning:
            mysqld:
                name: MySQL Running
                min: 1
                max: 1
                executable: mysqld
    remote:
        mysql:
            basic:
                user: monitors

The remote part is fed directly to nagios, which knows how to monitor mysql remotely, and so translates it into a check_mysql command. The local bits are ignored by Nagios. But, when we also relate the subordinate charm NRPE to a MySQL service, then we’ll have an agent which understands local. It actually converts those into remote monitors of type ‘nrpe’ which Nagios does understand. So upon relating NRPE to Nagios, each subordinate unit feeds its unique NRPE monitors back to Nagios and they are added to the target units’ monitors.

Honestly, this all sounds very complicated. But luckily, you don’t have to really grasp it to take advantage of it in a charm. The whole point is this: All one needs to do is write a monitors.yaml, and add the monitors relation with this joined hook to your charm:

#!/bin/bash
# .. Anything you need to do to enable the monitoring host to access ports/users/etc goes here
relation-set monitors="$(cat monitors.yaml)" target-id=${JUJU_UNIT_NAME//\//-} target-address=$(unit-get private-address)

If you have local things you want to give to a monitoring agent, you can use the ‘local-monitors’ interface, which is basically the same as monitors, but only ever used in container scoped relations required by subordinate charms such as NRPE or collectd.

Now you can easily provide monitors to any monitoring system. If Nagios doesn’t support what you want to monitor, its fairly easy to add support. And as more monitoring systems are charmed and have the monitors interface added, your charm will be more useful out of the box.

In the next post, which will wrap up this series on monitoring, I’ll talk about how to add monitors support to some other monitoring systems such as collectd, and also how to write a subordinate charm to communicate your monitors to an external monitoring service.

September 5, 2012 at 5:50 am Comments (0)

Juju and Nagios, sittin’ in a tree.. (Part 1)

Monitoring. Could it get any more nerdy than monitoring? Well I think we can make monitoring cool again…

 

If you’re using Juju, Nagios is about to get a lot easier to leverage into your environment. Anyone who has ever tried to automate their Nagios configuration, knows that it can be daunting. Nagios is so flexible and has so many options, its hard to get right when doing it by hand. Automating it requires even more thought. Part of this is because monitoring itself is a bit hard to genercise. There are lots of types of monitors. Nagios really focuses on two of these:

  • Service monitoring – Make a script that pretends to be a user and see if your synthetic monitor sees what you expect.
  • Resource monitoring – Look at the counters and metrics afforded a user of a normal system.

The trick is, the service monitoring wants to interrogate the real services from outside of the machine, while the resource monitoring wants to see things only visible with privileged access. This is why we have NRPE, or “Nagios Remote Plugin Executor” (and NSCA, and munin, but ignore those for now). NRPE is a little daemon that runs on a server and will run a nagios plugin script, returning the result when asked by Nagios. With this you get those privileged things like how much RAM and disk space is used. Normally when you want to use Nagios, you need to sit down and figure out how to tell it to monitor all of your stuff. This involves creating generic objects, figuring out how to get your list of hosts into nagios’s config files, and how to get the classifications for said hosts into nagios. Does anybody trying to make sure their pager goes off when things are broken actually want to learn Nagios? So, here’s how to get Nagios in your Juju environment. First lets assume you have deployed a stack of applications.

juju deploy mysql wikidb                # single MySQL db server
juju deploy haproxy wikibalancer        # and single haproxy load balancer
juju deploy -n 5 mediawiki wiki-app     # 5 app-server nodes to handle mediawiki
juju deploy memcached wiki-cache        # memcached
juju add-relation wikidb:db wiki-app:db # use wikidb service as r/w db for app
juju add-relation wiki-app wikibalancer # load balance wiki-app behind haproxy
juju add-relation wiki-cache wiki-app   # use wiki-cache service for wiki-app

This gives one a nice stack of services that is pretty common in most applications today, with a DB and cache for persistent and ephemeral storage and then many app nodes to scale the heavy lifting.

Now you have your app running, but what about when it breaks? How will you find out? Well this is where Nagios comes in:

juju deploy nagios                          # custom nagios charm
juju add-relation nagios wikidb             # monitor wikidb via nagios
juju add-relation nagios wiki-app           # ""
juju add-relation nagios wikibalancer       # ""

You now should have nagios monitoring things. You can check it out by exposing it and then browsing to the hostname of the nagios instance at ‘http://x.x.x.x/nagios3′. You can find out the password for the ‘nagiosadmin’ user by catting a file that the charm leaves for this purpose:

juju ssh nagios/0 sudo cat /var/lib/juju/nagios.passwd

Now, the checks are very sparse at the moment. This is because we have used the generic monitoring interface which can just monitor the basic things (SSH, ping, etc). We can add some resource monitoring by deploying NRPE:

juju deploy nrpe                          # create a subordinate NRPE service
juju add-relation nrpe wikibalancer       # Put NRPE on wikibalancer
juju add-relation nrpe wiki-app           # Put NRPE on wiki-app
juju add-relation nrpe:monitors nagios:monitors # Tells Nagios to monitor all NRPEs

Now we will get memory stats, root filesystem, etc.

You may have noticed we left off wikidb, that is because it will show you an ambiguous relation warning when you try this:

juju add-relation nrpe wikidb # Put NRPE on wikidb

ERROR Ambiguous relation 'nrpe mysql'; could refer to:
  'nrpe:general-info mysql:juju-info' (juju-info client / juju-info server)
  'nrpe:local-monitors mysql:local-monitors' (local-monitors client / local-monitors server)

This is because mysql has special support to be able to specify its own local monitors in addition to those in the usual basic group (more on this in part 2). To get around this we use:

juju add-relation nrpe:local-monitors wikidb:local-monitors

 

This is a perfect example of how Juju’s encapsulation around services pays off for re-usability. By wrapping a service like Nagios in a charm, we can start to really develop a set of best practices for using that service and collaborate around making it better for everyone.

Of course, Chef and Puppet users can get this done with existing Nagios modules. Puppet, in particular, has really great Nagios support. However, I want to take a step back and explain why I think Juju has a place along side those methods and will accelerate systems engineering in new directions.

While there is some level of encapsulation in the methods that Chef and Puppet put forth, they’re not fully encapsulated in the way that they interact with other components in a Chef or Puppet system. In most cases, you still have to edit your own service configs to add specific Nagios integration. This works for the custom case, but it does not make it easy for users to collaborate on the way to deploy well known systems. It will also be hard to swap out components for new, better methods as they emerge. Every time you mention Nagios in your code, you are pushing Nagios deeper into your system engineering.

With the method I’ve outlined above, any charmed service can be monitored for basic stats (including the 80 or so that are in the official charm store). You might ask though, what about custom Nagios plugins, or specifying more elaborate but somewhat generic service checks. That is all coming. I will show some examples in my next post about this. I will also go on later to show how Nagios + NRPE can be replaced with collectd, or some other system, without changing the charms that have implemented rich monitoring support.

So, while this at least starts to bring the official Nagios charm up to par with configuration management’s rich Nagios ability, it also sets the stage for replacing Nagios with other things. The key difference here is that as you’ll see in the next few parts, none of the charms will have to mention “Nagios”. They’ll just describe what things to monitor, and Nagios, Collectd, or whatever other system you have in place will find a way to interpret that and monitor it.

August 7, 2012 at 11:47 pm Comments (0)