
Scale up your monitoring environment

Overview

The procedures below explain how to scale up your monitoring environment by adding new servers to create a load-balanced environment, a distributed environment, or a combination of both. The server types you can add are:

  • Peers, for load-balancing or redundancy.
  • Pollers, for distributed monitoring; you can add both ordinary pollers and Slim Pollers.

For an explanation of the types of monitoring environment you can create, see Scalable monitoring.

Check cluster state information

When you add new servers to scale up an existing environment, it is important to check that the environment is stable. You can check the state of the current cluster by using OP5 Monitor's integrated command-line back-end tool mon, as follows:

mon node status

OP5 Monitor displays all known nodes, including the local node, peers, and pollers, with their current state. A properly synchronised and online cluster displays all nodes as active. It displays any problems in red.

For more information on the mon tool, see Mon command reference.

Prerequisites

In any distributed or load-balanced monitoring environment, all servers must have:

  • The same operating system version, with the same architecture (32-bit or 64-bit).
  • The same version of OP5 Monitor installed.

Before you begin

Before you add a new poller or peer to your environment, you must ensure that you:

  • Have suitable servers installed with the same architecture and OP5 Monitor version as the other servers in the environment, with the following configuration:
    • The following TCP ports open on the poller nodes, to allow master nodes to successfully communicate with poller nodes:
      • 22 (SSH) — used for distributing configuration from master to poller nodes.
      • 15551 (Merlin) — used for state communication, such as check results. You can encrypt connections on the Merlin port. For more information, see Set up encrypted Merlin.
    • All server names resolvable by DNS, or manually using /etc/hosts.
    • All server system clocks synchronised, preferably by NTP. For more information, see Install NTP and synchronise servers in Additional server and software setup.
    • All other mandatory and recommended configuration completed on the hosts. For more information, see Additional server and software setup.
  • Create all host groups for which each poller and Slim Poller will be responsible on the master, containing at least one host, with at least one contact and one service. For more information, see Manage hosts and services.

Tip: A simple way to set up servers with the same server configuration as other servers in the same environment is to clone them and subsequently update the host details.
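
The network prerequisites listed above can be spot-checked from a master before you add a node. The following is a minimal sketch, not an OP5 Monitor tool: the function names are hypothetical, and it assumes that bash, getent, and the coreutils timeout command are available on the server where you run it. It checks that a node name resolves and that TCP ports 22 and 15551 accept connections.

```shell
#!/bin/sh
# Hypothetical helpers; not part of OP5 Monitor.

# Succeeds if the name resolves through DNS or /etc/hosts.
resolves() {
    getent hosts "$1" >/dev/null 2>&1
}

# Succeeds if a TCP connection to host:port opens within 3 seconds.
# Uses bash's /dev/tcp redirection, so bash must be installed.
port_open() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Reports each prerequisite for one node; returns non-zero if any check fails.
check_node() {
    node=$1
    status=0
    if resolves "$node"; then
        echo "$node: name resolves"
    else
        echo "$node: name does NOT resolve"
        status=1
    fi
    for port in 22 15551; do
        if port_open "$node" "$port"; then
            echo "$node: port $port open"
        else
            echo "$node: port $port NOT reachable"
            status=1
        fi
    done
    return "$status"
}

# Example: check_node poller01
```

A passing check only shows that the ports accept connections. It does not confirm that SSH keys or Merlin are configured.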

Restrict shell access to nodes

When logged in to the server, an unprivileged user can read sensitive files, or even read and write to databases used by OP5 Monitor. It is also possible for unprivileged users to affect the monitoring process directly, thereby circumventing any access control by the OP5 Monitor software.

Check that only users with full administrative privileges have shell access (for example, via SSH) to ITRS OP5 Monitor nodes.

Caution: Granting user permissions for Test this host and Test this service effectively grants shell access to the monitoring host. For more information, see Update user permissions in Manage users, contacts, and permissions.

Add a new peer

You can set up peering between two nodes for load-balancing or redundancy. Note that ITRS only supports peering between nodes in the same location, and preferably on the same rack.

Caution: Do not duplicate host names. If you are merging two previously independent OP5 Monitor nodes as peers, they will by default have themselves listed as an object called monitor. Before starting this configuration, give both of these monitor objects more descriptive names. If you fail to do this, SNMPv3 monitoring will break, as one key pair will disappear when merging these identical objects. If you encounter this issue, you must resolve it by creating a new SNMP user. For guidance, see Configure an SNMPv3 user in Additional server and software setup.

Add a new peer to another OP5 Monitor server

In this procedure we will set up a load-balanced monitoring environment with two peered nodes.

The command-line examples below assume that you are converting an already running standalone server to a load-balanced setup, where:

  • peer01 is your existing OP5 Monitor server.
  • peer02 is the new peer.

Caution: It is essential to get this right to avoid pushing the new peer's empty host and service object configuration to the existing server and overwriting your configuration. If in doubt, please contact ITRS Support.

Configure the new peer

  1. Log on to the new peer as root, using SSH.
  2. Add the existing peer to the new peer's configuration:
    mon node add peer01 type=peer
  3. Set up SSH connectivity towards all the new peer's configured peers:
    mon sshkey push --type=peer
    asmonitor mon sshkey push --type=peer

Configure the existing peer

  1. Log on to the existing peer as root, using SSH.
  2. Add the new peer to the existing peer's configuration:
    mon node add peer02 type=peer
  3. Set up SSH connectivity towards all the existing peer's configured peers:
    mon sshkey push --type=peer
    asmonitor mon sshkey push --type=peer

Push the existing peer's configuration to the new peer

  1. Push the existing peer's configuration to the new peer:
    asmonitor mon oconf push peer02
  2. Restart OP5 Monitor on all nodes:
    mon node ctrl --self -- mon restart
  3. After a few minutes, check that the peers are fully connected and synchronised. For guidance, see Check cluster state information.

Copy file status.sav from the existing peer to the new peer

You only need to perform this procedure if the master was already running. It ensures that the servers are coordinated on details such as host and service comments, acknowledgements, and scheduled downtimes issued on the original master before the new peer was added.

  1. Stop the monitor service on both peers:
    mon stop
  2. Log on to the existing peer and copy file status.sav to the new peer:
    scp /opt/monitor/var/status.sav peer02:/opt/monitor/var/status.sav
  3. Start OP5 Monitor on both peers:
    mon start

Add a new peer to an existing load-balanced setup

In the command line examples below:

  • peer01 and peer02 are your existing peers.
  • peer03 is the new peer.

  1. Log on to the new peer as root, using SSH.
  2. Add the previously existing peers to the new peer:
    mon node add peer01 type=peer
    mon node add peer02 type=peer
  3. Set up SSH connectivity towards all the new peer's configured peers:
    mon sshkey push --type=peer
    asmonitor mon sshkey push --type=peer
  4. Add the new peer to all other nodes:
    mon node ctrl --type=peer mon node add peer03 type=peer
  5. Log on to the first existing peer as root, using SSH, and set up the SSH connectivity towards all its configured peers, including the new peer:
    mon sshkey push --type=peer
    asmonitor mon sshkey push --type=peer
  6. Log on to the second existing peer as root, using SSH, and set up the SSH connectivity towards all its configured peers, including the new peer:
    mon sshkey push --type=peer
    asmonitor mon sshkey push --type=peer
  7. On either existing peer, push the server configuration to the new peer:
    asmonitor mon oconf push peer03
  8. On any of the three peers, trigger a full OP5 Monitor restart on all nodes:
    mon node ctrl --self -- mon restart
  9. After a few minutes, check that the peers are fully connected and synchronised. For guidance, see Check cluster state information.
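
Step 2 above repeats the same command once for each existing peer. With more than two existing peers, a loop keeps this manageable. The sketch below uses only the mon node add form shown above; the function name and the MON variable (which lets you substitute echo for a dry run) are illustrative assumptions, not part of OP5 Monitor.

```shell
#!/bin/sh
# Command used to invoke mon; set MON=echo to preview without changing anything.
MON=${MON:-mon}

# Run on the NEW peer: registers every existing peer passed as an argument.
# Stops at the first failure so that a partial result is noticed.
add_existing_peers() {
    for peer in "$@"; do
        $MON node add "$peer" type=peer || return 1
    done
}

# Example: add_existing_peers peer01 peer02
```

After the loop completes, continue with the SSH key push in step 3 as described.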

Remove a peer

In this procedure, we will remove a peer from an existing load-balanced setup.

Note that this procedure only removes the peer from the configuration of all other peers. It does not remove the peer's monitoring configuration: the removed peer continues to monitor the same hosts and services as its former peers, but runs in standalone mode.

In the command line examples below, peer01 is the peer you are removing.

  1. Log on to the peer you want to remove as root, using SSH.
  2. Remove the peer from all other peers:
    mon node ctrl --type=peer mon node remove peer01\; mon restart
  3. Remove all local configuration:
    mon node remove $(mon node list --type=peer) 
  4. Restart OP5 Monitor:
    mon restart

Add a new poller

The following procedures explain how to add a poller to an existing master in a distributed setup.

In the command line examples below:

  • master01 is your existing master.
  • poller01 is the new poller.
  • se-gbg is the host group poller01 will monitor.

The following steps explain how to add the poller to the master.

Before you begin

Before you run the master server configuration commands, ensure that any peers are fully connected and synchronised. For guidance, see Check cluster state information.

Configure the master

In a load-balanced environment with peered masters, you must perform these steps on the master and all of its peers.

  1. Log in as root, using SSH.
  2. Verify that the host group exists and print its current host members:

    mon query ls hostgroups -c members name=se-gbg

    Caution: Assigning a non-existent host group to a poller will cause OP5 Monitor to crash.

  3. Add the new poller to the configuration:

    mon node add poller01 type=poller hostgroup=se-gbg takeover=no

    Note: Before you continue, ensure that the master can resolve the poller name by DNS or manually using /etc/hosts.

  4. Set up SSH connectivity between the master and the poller:
    1. Confirm that OpenSSH's client and server parts are installed:
      rpm -qa | grep openssh
    2. If they are not installed, install them:
      yum install openssh
      
    3. Edit file /etc/ssh/sshd_config with a text editor, and check that PasswordAuthentication is set to yes:
      edit /etc/ssh/sshd_config
    4. After you have saved your changes, push the SSH configuration to the poller:
      mon sshkey push poller01 
      asmonitor mon sshkey push poller01
  5. (Optional) For increased security, ITRS recommends creating the public SSH key for the OP5 Monitor user and placing it within the authorized_keys file. This enables OP5 Monitor to securely exchange configuration data without first requiring root keys to be exchanged. To do this, log on to both the master and the poller as root, using SSH, and run these commands:
    mkdir -p /opt/monitor/.ssh
    chown monitor:root /opt/monitor/.ssh
    su monitor
    ssh-keygen -t rsa
    scp ~/.ssh/id_rsa.pub root@poller.company.com:
    mkdir -p /opt/monitor/.ssh
    chown monitor:root /opt/monitor/.ssh
    mv ~/id_rsa.pub /opt/monitor/.ssh
    su monitor
    touch ~/.ssh/authorized_keys
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    more ~/.ssh/authorized_keys
    rm ~/id_rsa.pub
  6. On the master, add the master to the poller's configuration:

    mon node ctrl poller01 mon node add master01 type=master connect=no
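
Because assigning a non-existent host group can crash OP5 Monitor, steps 2 and 3 above can be combined so that the poller is only added when the host group query returns something. This is a hedged sketch: the function name and the MON variable are hypothetical, and it assumes that the query in step 2 prints nothing for a missing or empty host group, which you should confirm on your own installation.

```shell
#!/bin/sh
# Command used to invoke mon; set MON=echo to preview the final command.
MON=${MON:-mon}

# Adds a poller only if the host group query returns at least one member.
# ASSUMPTION: the query prints nothing for a missing or empty host group.
add_poller_checked() {
    poller=$1
    hostgroup=$2
    members=$($MON query ls hostgroups -c members "name=$hostgroup")
    if [ -z "$members" ]; then
        echo "Host group '$hostgroup' not found or empty; not adding $poller" >&2
        return 1
    fi
    $MON node add "$poller" type=poller "hostgroup=$hostgroup" takeover=no
}

# Example: add_poller_checked poller01 se-gbg
```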

Push the configuration

In a load-balanced environment with peered masters, perform these steps on the master server only.

  1. Restart Naemon on the master:

    systemctl restart naemon

  2. Push the configuration from the master to the new poller:

    asmonitor mon oconf push poller01

  3. Restart OP5 Monitor on the new poller:

    mon node ctrl poller01 mon restart

  4. Restart OP5 Monitor on the master and all its peers:

    mon node ctrl --self --type=peer mon restart
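
The order of the four steps above matters, so one way to avoid skipping a step is to wrap them in a small script that stops at the first failure. The following is a sketch under stated assumptions: run and push_to_new_poller are hypothetical names, and the DRY_RUN variable is an addition for previewing the commands; the commands themselves are the ones listed above.

```shell
#!/bin/sh
# Prints each command when DRY_RUN is set; otherwise executes it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Run on the master only. Stops at the first failing step.
push_to_new_poller() {
    poller=$1
    run systemctl restart naemon &&
    run asmonitor mon oconf push "$poller" &&
    run mon node ctrl "$poller" mon restart &&
    run mon node ctrl --self --type=peer mon restart
}

# Example (preview only): DRY_RUN=1 push_to_new_poller poller01
```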

Add a new Slim Poller

The Slim Poller is a scaled-down version of the poller. For more information on its contents and limitations, see Slim Poller in Scalable monitoring.

For more information on the setup, prerequisites, and features of Slim Poller see Set up Slim Poller.

Add a new host group to a poller

A poller can monitor several host groups. This procedure is the simplest way of increasing a poller's scope.

Before you begin

Before you run the master server configuration commands, ensure that:

  • All peers (if applicable) are fully connected and synchronised. For more information, see Check cluster state information.
  • The host group you are adding to the poller already exists.

Configure the master server

In a load-balanced environment with peered masters, you must perform these steps on the master and all of its peers.

  1. Log on as root, using SSH.
  2. Edit file /opt/monitor/op5/merlin/merlin.conf using a text editor. For example:

    edit /opt/monitor/op5/merlin/merlin.conf

  3. Find the configuration block related to the poller and append the new host group to the hostgroup setting value, prefixed by a comma. For example, to add a host group called op5-hg2, change the line from:
    hostgroup = op5-hg1

    to:

    hostgroup = op5-hg1,op5-hg2

    Caution: Host groups must be comma-separated only, without any white space. If you fail to do this, error Incompatible object config (sync triggered) may occur during Naemon restart. Remember that adding a non-existent host group will also cause OP5 Monitor to crash.

  4. After you have saved your changes, restart OP5 Monitor:
    mon restart
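
If you need to make the same edit on several masters, the change in step 3 can be scripted instead of typed by hand. The sketch below is illustrative only: append_hostgroup is a hypothetical helper, and it assumes that merlin.conf is laid out like the examples in this guide (a block opening with poller <name> {, a closing } on its own line, and a hostgroup line that appears before any nested sub-block and has no trailing semicolon). It prints the result instead of editing the file, so that you can review it first.

```shell
#!/bin/sh
# Usage: append_hostgroup <poller-name> <host-group> <merlin.conf path>
# Prints the modified configuration to stdout; the input file is not changed.
append_hostgroup() {
    awk -v poller="$1" -v hg="$2" '
        # Entering the block of the poller we want to change.
        $1 == "poller" && $2 == poller { in_block = 1 }
        in_block && $1 == "hostgroup" {
            sub(/[ \t]+$/, "")   # drop trailing white space
            $0 = $0 "," hg       # append with a comma and no spaces
        }
        # The first closing brace ends the search.
        in_block && $1 == "}" { in_block = 0 }
        { print }
    ' "$3"
}

# Example:
#   append_hostgroup poller01 op5-hg2 /opt/monitor/op5/merlin/merlin.conf > /tmp/merlin.conf.new
#   diff /opt/monitor/op5/merlin/merlin.conf /tmp/merlin.conf.new
```

Writing to a separate file and comparing it with diff makes it easy to confirm that only the intended hostgroup line changed before you replace the live file.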

Remove a poller

In this procedure we will remove a poller called poller01.

The poller will be removed from the master's configuration, and then all distributed configuration on the poller will be removed.

In a load-balanced environment with peered masters, perform these steps on the master server only.

  1. Log on to the master as root, using SSH.
  2. Remove the poller from the configuration on all masters:

    mon node ctrl --self --type=peer mon node remove poller01

  3. Restart OP5 Monitor on all masters:

    mon node ctrl --self --type=peer mon restart

  4. Restart OP5 Monitor on the poller:

    mon node ctrl poller01 mon restart

Configure poller notifications through the master

Sending notifications directly from the poller is not always possible, for example, if the SMS or SMTP gateway does not exist or is inaccessible to the poller. In such scenarios it is possible to send notifications through the master instead.

Configure the master

In a load-balanced environment with peered masters, perform these steps on the master and all of its peers.

  1. Log on to the master as root, using SSH.
  2. Edit file /opt/monitor/op5/merlin/merlin.conf using a text editor. For example:

    edit /opt/monitor/op5/merlin/merlin.conf

  3. Find the configuration block related to the poller and insert the option notifies = no at the end of the block:

    poller poller01 {
    address = 192.0.2.50
    port = 15551
    takeover = no
    notifies = no
    }

  4. After you have saved your changes, restart OP5 Monitor:

    mon restart

Configure the poller

  1. Log on to the poller as root, using SSH.
  2. Edit the file /opt/monitor/op5/merlin/merlin.conf using a text editor. For example:

    edit /opt/monitor/op5/merlin/merlin.conf

  3. Find the module configuration block and insert the option notifies = no at the end of the block:

    module {
    log_file = /var/log/op5/merlin/neb.log;
    notifies = no
    }

  4. After you have saved your changes, restart OP5 Monitor:

    mon restart

Set up file and directory synchronisation

OP5 Monitor has limited support for synchronising files between peers and between masters and pollers.

For example, when you add a new user to OP5 Monitor on one of your masters, you can use this feature to automatically synchronise the user database files on all other peers and pollers.

Synchronisation types

There are two different types of synchronisation:

  • Peered masters synchronising files with each other (two-way).
  • Masters synchronising files to pollers (one-way).

You configure both types in the same way. The example and the procedure described below apply to both of these cases.

Permissions limitations

Files are synchronised using the monitor system user, not root. This means that:

  • Files and directories set up for synchronisation must be readable and owned by the monitor user. For instance, root-only readable files cannot be synchronised.
  • All file paths and their corresponding directories must be writable by the monitor user on the destination node.
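
Before you add paths to a sync block, you can check the ownership requirement described above from the command line. This is a minimal sketch with hypothetical function names. It assumes GNU stat (the -c option), and it checks ownership only; it does not test write permission on the destination node.

```shell
#!/bin/sh
# Succeeds if the path exists and is owned by the given user (default: monitor).
# Uses GNU stat (-c), as found on typical Linux servers.
owned_by() {
    owner=$(stat -c %U "$1" 2>/dev/null) || return 1
    [ "$owner" = "${2:-monitor}" ]
}

# Reports every listed path that is missing or not owned by monitor.
# Returns non-zero if at least one path fails the check.
check_sync_sources() {
    status=0
    for path in "$@"; do
        if ! owned_by "$path"; then
            echo "$path: missing or not owned by monitor"
            status=1
        fi
    done
    return "$status"
}

# Example: check_sync_sources /etc/op5/auth_users.yml /etc/op5/auth_groups.yml
```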

How synchronisation is triggered

File and directory synchronisation occurs during a configuration push, which is triggered when a new configuration is saved in the user interface. For example, when you add a new host in OP5 Monitor, saving this new configuration triggers synchronisation.

OP5 Monitor only triggers a configuration push to pollers if the new configuration affects objects on the poller. You can trigger a manual configuration push using the command:

asmonitor mon oconf push

Configure synchronisation

For file synchronisation between peers, you need to repeat the procedure on all peers.

In this example:

  • The master will synchronise files to its poller, poller01.
  • The following files will be synchronised:
    • /etc/op5/auth_users.yml
    • /etc/op5/auth_groups.yml
  • The contents of the following directory will also be synchronised:
    • /opt/plugins/custom/

To configure synchronisation:

  1. Log on to the source node (master) as root, using SSH.
  2. Edit file /opt/monitor/op5/merlin/merlin.conf using a text editor:
    edit /opt/monitor/op5/merlin/merlin.conf 
  3. Find the configuration block related to the destination node, in this case poller01. Within this block, insert a new sync sub-block, saving your changes when you are done.

    poller poller01 {
    hostgroup = se-gbg
    address = 192.0.2.50
    port = 15551
    takeover = no
    sync {
    /etc/op5/auth_users.yml
    /etc/op5/auth_groups.yml
    /opt/plugins/custom/
    }
    }

    Note: The trailing slash at the end of /opt/plugins/custom/ in the example above indicates that the contents of the directory must be synchronised, rather than the directory itself. This is the recommended way of synchronising directories.

Synchronisation commands

mon oconf fetch

It is possible to sync files between OP5 Monitor nodes.

By default, files are pushed from one node to another. However, it is also possible to fetch files from a remote node. For example, a poller could be set up to fetch custom plugins from a master server.

To set up file fetching, first you must set up the node to fetch from a master. Then, you need to add a sync section.

Files from the sync section are only synced when using the --sync argument with the mon oconf fetch command. An example configuration for poller.conf can be seen below:

master master {
  address = IP_ADDRESS
  port = 15551
  sync {
    /opt/plugins/custom/
  }
  object_config {
    fetch_name = poller
    fetch = mon oconf fetch --sync master
  }
}

Files are only synced when a Naemon configuration change is made. If you need to trigger a sync in other situations, see mon oconf remote-fetch.

mon oconf remote-fetch

Caution: This command will only work if the remote node is correctly configured to fetch from the node that calls mon oconf remote-fetch. For guidance in setting this up, see How to configure a "passive" poller to work behind NAT.

For Slim Poller, this configuration is set up automatically.

The command mon oconf remote-fetch will tell a remote node to do a fetch against the current node. This command can be useful if you want to manually trigger the poller to fetch a new file; for example, if you have added a new custom plugin.

It is possible to trigger a fetch on a specific node:

mon oconf remote-fetch poller-name

Or a type of node:

mon oconf remote-fetch type=poller

Command usage:

remote-fetch     [--type=<peer|poller>] [<node>]
       Tells a specific node to fetch split configuration from this node.

       NOTE: A configuration variable called "fetch_name" is
       required in the object_config section of merlin.cfg on the
       remote node. The variable should be set to the name of the node,
       as seen by the master.

UUID identification

Beginning with OP5 Monitor 8.3.x, you can enable node identification by UUID instead of by IP address. This can be useful if your TCP packets have a non-unique outgoing IP address, such as when nodes are behind a NAT, or if your nodes' incoming and outgoing IP addresses differ. It is also useful for setting up multiple Slim Pollers in Kubernetes.

There are two new Merlin settings used to configure the UUID:

  • ipc_uuid — set the UUID of a specific node. This is a top-level configuration in /opt/monitor/op5/merlin/merlin.conf.

  • uuid — identify a connecting node with a given UUID. This must be set in the node configuration.

A UUID must have a length of 36 characters. To generate a well-formed UUID, you can use mon id generate.
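
If you paste a UUID into merlin.conf by hand, a quick format check can catch truncated or mistyped values. The helper below is a hypothetical sketch, not an OP5 Monitor command. It tests for the canonical UUID layout of 36 characters (hexadecimal groups of 8, 4, 4, 4, and 12 digits separated by hyphens); this guide states only the length requirement, so treat the stricter pattern as an assumption.

```shell
#!/bin/sh
# Succeeds if the argument has the canonical 36-character UUID layout:
# 8-4-4-4-12 hexadecimal digits separated by hyphens.
is_uuid() {
    printf '%s\n' "$1" |
        grep -Eq '^[0-9A-Fa-f]{8}(-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}$'
}

# Example:
#   is_uuid de5c4eb9-dc9e-4b53-831c-246d254ad39e && echo "looks well-formed"
```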

The following are examples for a master and a poller UUID configuration.

master merlin.conf example:

....
poller op5-slim-poller-ssh-65799d958f-h9wjt {
        uuid = de5c4eb9-dc9e-4b53-831c-246d254ad39e
        hostgroup = k8s_group
        address = IP_ADDRESS
        port = 15551
}

poller merlin.conf example:

log_level = info;
use_syslog = 1;
ipc_uuid = de5c4eb9-dc9e-4b53-831c-246d254ad39e
...
master master {
  address = MASTER_IP
  port = 15551
  connect = no
}

In the above case, the master identifies the poller by its UUID. However, the poller still uses regular IP identification for connections from the master. This is common in passive poller mode, where the master does not initiate connections to the poller.

If you want to use UUID to identify both components, you would need to add ipc_uuid to the master merlin.conf file, and the corresponding uuid setting in the master node configuration on the poller.