Internal documentation only

This page has been marked as draft.

Message Queue issue in Azure Environment

In the Azure Test Environment, all Opsview collectors show as offline even though they are running. The logs show recurring problems with the Opsview message queue: warnings that the maximum queue size has been exceeded, repeated failures to send cluster-health messages, and connection errors because access to the vhost ‘/’ is refused for the opsview user. As a result, multiple Opsview components lose their connection to the message queue.

Problem

The Opsview message queue (RabbitMQ) failed: the vhost / became inaccessible to the opsview user. Collectors appeared offline and no fresh monitoring data was visible, even though the services were running. On the backend, the logs showed recurring message queue errors, including “max queue size exceeded”, failed cluster-health messages, and lost connections in components such as the orchestrator, the results processors, and the notification center. The queue directories held unprocessed messages and, although every service reported as “running”, communication between the backend components had effectively stopped.

Possible cause(s)

Root Cause 1: Corrupted RabbitMQ message store - The vhost / became inaccessible because its message store files were corrupted. The log shows an error such as:

opsviewfd daemon: [opsviewfd] Messagequeue failure: ConnectOnClose(): INTERNAL_ERROR - access to vhost '/' refused for user 'opsview': vhost '/' is down
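
To confirm how often this error occurs, the log can be searched for the "vhost '/' is down" text. The following is a minimal sketch, not an official procedure: the log path in LOG_FILE is an assumption and the function name count_vhost_down is made up for illustration.

```shell
#!/bin/sh
# Count log lines reporting that the RabbitMQ vhost '/' is down.
# LOG_FILE is an assumed location; point it at wherever your Opsview logs live.
LOG_FILE="${LOG_FILE:-/var/log/opsview/opsview.log}"

count_vhost_down() {
    # -c prints the number of matching lines, -F treats the pattern as a fixed string.
    grep -cF "vhost '/' is down" "$1"
}

# Only report when the assumed log file is present and readable.
if [ -r "$LOG_FILE" ]; then
    echo "vhost-down errors: $(count_vhost_down "$LOG_FILE")"
fi
```

A count that keeps rising after a restart suggests the vhost has not recovered on its own.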

Root Cause 2: Queue Overload or Stagnation - Queues exceeded the maximum allowed size, causing repeated restarts of the outbound pump and failed message propagation.

Root Cause 3: Authentication or Permission Failure - RabbitMQ denied access to the opsview user for the vhost /, possibly due to internal misconfiguration or a crash.
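
To see whether queue overload (Root Cause 2) is in play, the queue depths can be listed with rabbitmqctl list_queues and filtered for large values. This is a sketch only: the THRESHOLD value is a hypothetical number chosen for illustration, not an Opsview limit, and large_queues is a helper name invented here.

```shell
#!/bin/sh
# THRESHOLD is a hypothetical value for illustration, not an Opsview default.
THRESHOLD="${THRESHOLD:-1000}"
RABBITMQCTL=/opt/opsview/messagequeue/sbin/rabbitmqctl

large_queues() {
    # Reads "name<TAB>messages" lines on stdin and prints the queues whose
    # message count is above the limit given as the first argument.
    # Lines whose second column is not a number (such as a header) are skipped.
    awk -F '\t' -v limit="$1" \
        '$2 ~ /^[0-9]+$/ && $2 + 0 > limit + 0 { print $1 ": " $2 }'
}

# Only query RabbitMQ when the Opsview copy of rabbitmqctl is present.
if [ -x "$RABBITMQCTL" ]; then
    "$RABBITMQCTL" -q list_queues name messages | large_queues "$THRESHOLD"
fi
```

Queues that stay above the threshold across several runs are the likely candidates for purging.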

Possible solution(s)

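Solution Root Cause 1: Stop the RabbitMQ application, remove the corrupted vhost message store directory, and start the application again to reset the message queue. Any messages still held in that store are discarded. Replace <servername> with the name used in the RabbitMQ node directory on the affected server.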

/opt/opsview/messagequeue/sbin/rabbitmqctl stop_app
rm -rf /opt/opsview/messagequeue/var/mnesia/rabbit@<servername>/msg_stores/vhosts
/opt/opsview/messagequeue/sbin/rabbitmqctl start_app
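
A more cautious variant is to move the directory aside instead of deleting it, so the files stay available for later inspection. This swaps the rm -rf step for a rename and is an illustration only, not the documented procedure; archive_dir is a hypothetical helper, and the renamed copy still takes up disk space until it is removed.

```shell
#!/bin/sh
archive_dir() {
    # Renames the given directory with a timestamp suffix and prints the new path.
    src="$1"
    dest="${src}.bak.$(date +%Y%m%d%H%M%S)"
    mv "$src" "$dest" && echo "$dest"
}

# Example, to be run between stop_app and start_app (<servername> is left as a placeholder):
# archive_dir "/opt/opsview/messagequeue/var/mnesia/rabbit@<servername>/msg_stores/vhosts"
```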

Solution Root Cause 2: Purge Stuck Queues and Restart Components

/opt/opsview/messagequeue/sbin/rabbitmqctl purge_queue cluster-health
ps -ef | grep orchestrator
pkill -f orchestrator
/opt/opsview/watchdog/bin/opsview-monit restart opsview-orchestrator

Solution Root Cause 3: Verify RabbitMQ Configuration and Permissions

/opt/opsview/messagequeue/sbin/rabbitmqctl list_permissions --formatter json | python -m json.tool > /var/tmp/$(date +%Y%m%d)_mqpermissions.out
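
For a quick pass/fail check rather than a saved report, the plain tab-separated output of list_permissions can be scanned for the opsview user. The sketch below assumes the default column order (user, configure, write, read), and user_has_permissions is a helper invented for this example. It only confirms that the user is listed with three non-empty patterns; it does not judge whether those patterns are sufficient for Opsview.

```shell
#!/bin/sh
RABBITMQCTL=/opt/opsview/messagequeue/sbin/rabbitmqctl

user_has_permissions() {
    # Reads "user<TAB>configure<TAB>write<TAB>read" lines on stdin.
    # Exits 0 if the named user is listed with non-empty configure, write
    # and read patterns, and 1 otherwise.
    awk -F '\t' -v user="$1" \
        '$1 == user && $2 != "" && $3 != "" && $4 != "" { found = 1 }
         END { exit !found }'
}

# Only query RabbitMQ when the Opsview copy of rabbitmqctl is present.
if [ -x "$RABBITMQCTL" ]; then
    if "$RABBITMQCTL" -q list_permissions -p / | user_has_permissions opsview; then
        echo "opsview is listed with permissions on vhost /"
    else
        echo "opsview has no usable permissions on vhost /"
    fi
fi
```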