
How to analyze sysload issues: troubleshooting high runq-sz and IOPS/tps values

The following table of typical hard drive IOPS figures is taken from Wikipedia:

Device                   Type   IOPS        Interface
5,400 rpm SATA drives    HDD    ~50–80      SATA 3 Gbit/s
7,200 rpm SATA drives    HDD    ~75–100     SATA 3 Gbit/s / SAS 12 Gbit/s
10,000 rpm SAS drives    HDD    ~125–150    SAS
15,000 rpm SAS drives    HDD    ~175–210    SAS
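To make the comparison concrete, the ranges in the table can be turned into a small lookup. The sketch below uses hypothetical names (`IOPS_RANGES`, `classify_load`) made up for illustration; it places a measured tps value relative to one drive type's range. It assumes a single spinning disk with no caching in front of it. RAID arrays, controller caches, SSDs and virtual disks can sustain very different rates, and sequential transfers cost a spinning disk far less than random ones, so treat the result as a rough indication only.

```python
# Rough per-drive IOPS ranges copied from the table above.
# The dictionary keys are hypothetical labels chosen for this sketch.
IOPS_RANGES = {
    "5400rpm_sata": (50, 80),
    "7200rpm_sata": (75, 100),
    "10000rpm_sas": (125, 150),
    "15000rpm_sas": (175, 210),
}


def classify_load(tps: float, drive: str) -> str:
    """Place a measured tps value relative to one drive type's IOPS range."""
    low, high = IOPS_RANGES[drive]
    if tps < low:
        return "below typical limit"
    if tps <= high:
        return "near typical limit"
    return "above typical limit"


# 112.75 tps is what the iostat example in this article reports for xvde.
for drive in IOPS_RANGES:
    print(f"{drive}: {classify_load(112.75, drive)}")
```

With that figure, both SATA drive types come out above their typical limit, while the 10,000 and 15,000 rpm SAS drives stay below theirs.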

If you see a high system load together with low CPU usage, the bottleneck may be disk I/O rather than the CPU. In that case, check the tps (transfers per second) value reported by:

iostat -cd 60

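This reports CPU (-c) and device (-d) utilization every 60 seconds, with output similar to the following. Note that the first report shows averages since the system was booted; each later report covers the preceding 60-second interval: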

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          23.45   11.40   12.74    0.35    0.02   52.03

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
xvdj              0.03         0.30         7.00     232578    5465744
xvde            112.75        39.67      2510.80   30952562 1959249512

In this example, xvde is handling about 113 transfers per second. That rate is not unusual on a system that frequently runs thousands of checks and writes their results to disk, but if the device is backed by a single spinning disk, it is above the ~75–100 IOPS that the table above lists for a 7,200 rpm SATA drive.
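When you collect iostat output over a longer period, picking out the busy devices by hand gets tedious. The following is a minimal sketch, assuming the default (non-extended) `iostat -d` device report, where tps is the second column and numbers use a dot as the decimal separator. The names `parse_iostat_tps` and `busy_devices`, and the 100 tps default limit (the upper end of the 7,200 rpm SATA range in the table), are hypothetical choices; set the limit to match your own storage.

```python
def parse_iostat_tps(report: str) -> dict[str, float]:
    """Return {device: tps} from the device section(s) of `iostat -d` output."""
    tps: dict[str, float] = {}
    in_devices = False
    for line in report.splitlines():
        fields = line.split()
        if not fields:
            in_devices = False  # a blank line ends a device section
            continue
        if fields[0] in ("Device:", "Device"):
            in_devices = True  # header line; the rows that follow are devices
            continue
        if in_devices and len(fields) >= 2:
            # Later reports overwrite earlier ones, so the since-boot
            # first report is replaced by interval data when available.
            tps[fields[0]] = float(fields[1])
    return tps


def busy_devices(report: str, limit: float = 100.0) -> list[str]:
    """Names of devices whose tps exceeds `limit` (hypothetical default)."""
    return sorted(dev for dev, rate in parse_iostat_tps(report).items() if rate > limit)


# The example output from this article.
SAMPLE = """\
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          23.45   11.40   12.74    0.35    0.02   52.03

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
xvdj              0.03         0.30         7.00     232578    5465744
xvde            112.75        39.67      2510.80   30952562 1959249512
"""
```

Here `busy_devices(SAMPLE)` returns `['xvde']`, since xvde's 112.75 tps exceeds the 100 tps limit and xvdj's 0.03 does not.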

Another indication of this type of issue can be found in the output of:

sar -q

Example:

00:00:01 runq-sz %runocc swpq-sz %swpocc
00:05:02    26.4      72     0.0       0
00:10:02    25.9      71     0.0       0
00:15:02    27.4      73     0.0       0
00:20:01    27.3      62     0.0       0
00:25:01    25.5      66     0.0       0

Common guidance for the runq-sz value is:

The number of kernel threads in memory that are waiting for a CPU to run. Typically, this value should be less than 2. Consistently higher values mean that the system might be CPU-bound.

The example above shows runq-sz values between 25.5 and 27.4, well above that guideline.
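Whether runq-sz is *consistently* high can be checked with a short script. This sketch assumes text output from `sar -q`. It locates the runq-sz column from the header line (so it works whether or not timestamps carry an AM/PM field), skips the `Average:` summary line and non-numeric lines, and reports whether every sample exceeds the threshold of 2 from the guidance above. The function names `runq_sizes` and `consistently_above` are hypothetical, and requiring every sample to exceed the threshold is only one possible reading of "consistently".

```python
def runq_sizes(report: str) -> list[float]:
    """Extract the runq-sz values from `sar -q` text output."""
    values: list[float] = []
    col = None  # index of the runq-sz column, taken from the header line
    for line in report.splitlines():
        fields = line.split()
        if "runq-sz" in fields:
            col = fields.index("runq-sz")
            continue
        if col is None or len(fields) <= col or fields[0].startswith("Average"):
            continue
        try:
            values.append(float(fields[col]))
        except ValueError:
            continue  # ignore lines that are not numeric samples
    return values


def consistently_above(report: str, threshold: float = 2.0) -> bool:
    """True if there is at least one sample and every one exceeds the threshold."""
    values = runq_sizes(report)
    return bool(values) and min(values) > threshold


# The example output from this article.
SAMPLE = """\
00:00:01 runq-sz %runocc swpq-sz %swpocc
00:05:02    26.4      72     0.0       0
00:10:02    25.9      71     0.0       0
00:15:02    27.4      73     0.0       0
00:20:01    27.3      62     0.0       0
00:25:01    25.5      66     0.0       0
"""
```

For the sample, `consistently_above(SAMPLE)` is `True`: even the lowest value, 25.5, is far above 2.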

