Internal documentation only
This page has been marked as draft.
Message Queue issue in Azure Environment
Related to
In the Azure Test Environment, all Opsview collectors are showing as offline despite being operational. The logs reveal recurring issues with the Opsview message queue, including warnings about exceeding maximum queue size and repeated failures to send cluster-health messages. There are also errors indicating message queue connection issues due to access being refused to the vhost ‘/’ for the opsview user. This is causing multiple Opsview components to lose connection.
Problem
The issue was caused by a failure in the Opsview message queue (RabbitMQ): the vhost / became inaccessible to the opsview user. Collectors appeared offline and no fresh monitoring data was visible, even though the services were running. On the backend, logs showed recurring message queue errors, including “max queue size exceeded,” failed cluster-health messages, and lost connections across components such as the orchestrator, results processors, and notification center. The queue directories held unprocessed messages, and although every service reported as “running,” backend communication had effectively broken down.
Possible cause(s)
Root Cause 1: Corrupted RabbitMQ message store - The vhost / became inaccessible due to corruption in the message store files. The opsviewfd daemon logs the following:
opsviewfd daemon: [opsviewfd] Messagequeue failure: ConnectOnClose(): INTERNAL_ERROR - access to vhost '/' refused for user 'opsview': vhost '/' is down
Root Cause 2: Queue Overload or Stagnation - Queues exceeded the maximum allowed size, causing repeated restarts of the outbound pump and failed message propagation.
Root Cause 3: Authentication or Permission Failure - RabbitMQ denied access to the opsview user for the vhost /, possibly due to internal misconfiguration or crash.
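To tell the three causes apart quickly, the log messages quoted in this article can be used as signatures. The following is a minimal sketch, not an Opsview tool: the function name `classify_mq_errors` is hypothetical, and the "max queue size exceeded" pattern assumes the log contains that wording literally, which may not hold on every version.

```shell
# Hypothetical triage helper: reads log lines on stdin and prints which root
# cause each recognised line points to. Unrecognised lines are ignored.
# Patterns come from the messages quoted in this article; the
# "max queue size exceeded" wording is an assumption about the literal log text.
classify_mq_errors() {
    while IFS= read -r line; do
        case "$line" in
            # Checked first: the "vhost is down" message also contains "refused".
            *"vhost '/' is down"*)
                echo "root-cause-1: vhost down (possible message store corruption)" ;;
            *[Mm]"ax queue size exceeded"*)
                echo "root-cause-2: queue overload" ;;
            *"access to vhost"*"refused for user"*)
                echo "root-cause-3: permission or authentication failure" ;;
        esac
    done
}

# Intended use (replace the placeholder with the log file used on your system):
#   classify_mq_errors < <your-opsview-log-file> | sort | uniq -c
```

A line that reports both a refusal and "vhost '/' is down" is counted under Root Cause 1, because a down vhost refuses every user regardless of permissions.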
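Possible solution(s)
Solution Root Cause 1: Stop the RabbitMQ application, remove the corrupted vhost message store directory, and start the application again. This resets the message queue and discards any messages still held in the store.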
/opt/opsview/messagequeue/sbin/rabbitmqctl stop_app
rm -rf /opt/opsview/messagequeue/var/mnesia/rabbit@<servername>/msg_stores/vhosts
/opt/opsview/messagequeue/sbin/rabbitmqctl start_app
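Because `rm -rf` is irreversible, a more cautious variant is to rename the directory so it can be inspected afterwards. The sketch below assumes there is enough disk space to keep the old copy and that RabbitMQ ignores the renamed directory on startup; `move_aside` is a hypothetical helper, not part of Opsview or RabbitMQ.

```shell
# Hypothetical helper: rename a directory to <dir>.bak.<timestamp> instead of
# deleting it, and print the new name. The optional second argument overrides
# the timestamp suffix.
move_aside() {
    src="${1%/}"                                  # drop one trailing slash, if any
    stamp="${2:-$(date +%Y%m%d%H%M%S)}"
    if [ ! -d "$src" ]; then
        echo "move_aside: no such directory: $src" >&2
        return 1
    fi
    dest="${src}.bak.${stamp}"
    mv -- "$src" "$dest" && printf '%s\n' "$dest"
}

# Intended use, in place of the rm -rf step between stop_app and start_app:
#   move_aside "/opt/opsview/messagequeue/var/mnesia/rabbit@<servername>/msg_stores/vhosts"
```

Once the collectors are back online and the old store is no longer needed, the renamed directory can be deleted.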
Solution Root Cause 2: Purge Stuck Queues and Restart Components
/opt/opsview/messagequeue/sbin/rabbitmqctl purge_queue cluster-health
ps -ef | grep orchestrator
pkill -f orchestrator
/opt/opsview/watchdog/bin/opsview-monit restart opsview-orchestrator
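Before purging, it can help to confirm which queues are actually backed up. The sketch below filters the tab-separated output of `rabbitmqctl list_queues name messages`; the helper name `queues_over` and the threshold of 1000 are illustrative assumptions, not Opsview defaults.

```shell
# Hypothetical filter: reads "name<TAB>messages" rows on stdin and prints the
# rows whose message count is above the given threshold. Header or banner
# lines are skipped because their second field is not a plain number.
queues_over() {
    awk -F '\t' -v limit="$1" \
        '$2 ~ /^[0-9]+$/ && ($2 + 0) > (limit + 0) { print $1 "\t" $2 }'
}

# Intended use (1000 is an arbitrary example threshold):
#   /opt/opsview/messagequeue/sbin/rabbitmqctl -q list_queues name messages | queues_over 1000
```

Any queue it reports can then be purged with `rabbitmqctl purge_queue <queue name>`, as shown for cluster-health above.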
Solution Root Cause 3: Verify RabbitMQ Configuration and Permissions
/opt/opsview/messagequeue/sbin/rabbitmqctl list_permissions --formatter json | python -m json.tool > /var/tmp/$(date +"%Y%m%d")_mqpermissions.out
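For a quick yes/no answer on whether the opsview user has any permissions on the vhost /, the plain tabular output of `rabbitmqctl list_permissions -p /` can be filtered directly. This is a sketch assuming the default tab-separated columns (user, configure, write, read); `user_permissions` is a hypothetical helper.

```shell
# Hypothetical check: reads "user<TAB>configure<TAB>write<TAB>read" rows on
# stdin, prints the row for the given user, and returns non-zero if the user
# has no row at all (no permissions on that vhost).
user_permissions() {
    awk -F '\t' -v user="$1" \
        '$1 == user { print; found = 1 } END { exit (found ? 0 : 1) }'
}

# Intended use:
#   /opt/opsview/messagequeue/sbin/rabbitmqctl -q list_permissions -p / | user_permissions opsview \
#       || echo "opsview has no permissions on vhost /"
```

If the row is present but the vhost is still refused, the cause is more likely a down vhost (Root Cause 1) than missing permissions.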
Related article(s)
- Official documentation: https://docs.itrsgroup.com/docs/opsview/current/troubleshooting/message-queue/index.html
- Release notes, compatibility matrix, upgrade notes, etc.: https://docs.itrsgroup.com/docs/opsview/current/support/opsview-support-policy/index.html