How to monitor Microsoft SQL-Server
Microsoft SQL Server is one of the most used databases today. Because it often has an important role in your business, you need to make sure you get alerted if there is a problem with your databases. This article describes how OP5 Monitor can monitor both important environmental parts of the operating system and how to perform queries to the database. In this document, we assume that Microsoft SQL Server is the main application running on the server.
This will be done Copied
The suggested configuration components for monitoring Microsoft SQL Server are:
- Basic checks on the server like CPU, memory, disc usage, swap usage etc.
- Advanced checks of the operating system by checking several performance counters in the operating system
- Queries to databases
- Checking backup jobs
Concerning SQL 2012 Copied
To use it with Microsoft SQL server 2012 or later, you will have to change the TDS version in “/etc/freetds.conf” to “8.0”.
Check commands Copied
Add the required check commands, if they don’t already exist in your configuration, with the import functionality in OP5 Monitor (’Configure’ -> ‘Commands’ -> ‘Check Command Import’):
List of commands Copied
| Plugin name | check_command |
|---|---|
| check_nt | check_mssql_db_file_size |
| check_mssql_log_file_size | |
| check_nt_memory_page_per_sec | |
| check_nt_physical_disk_time | |
| check_mssql_num_deadlocks | |
| check_mssql_num_user_connections | |
| negate | check_mssql_cache_hit_ratio |
| check_sql | check_mssql_backup_job |
| check_mssql_query_string_regex | |
| check_mssql_query_count | |
| check_mssql_query_reponstime |
Description of the commands Copied
In the table below are descriptions of the so-called performance check_commands used to monitor a Microsoft SQL Server.
| command_name | description |
|---|---|
| check_mssql_db_file_size and check_mssql_log_file_size | - Disk usage is something we always check. With this check_command you will be able to monitor the size of the data and log files size. - If you use ‘_total’ as $ARG1$ you will get the sum of all data or log files. - Warning and Critical is measured in KB. |
| check_nt_memory_page_per_sec | - This counter will measure pages per second that is fetched from RAM to the hard drive or vice versa. - Normally this value should be 0 - Values between 1 to 20 should not be a problem but as usual everything is relative - Values over 20, during a longer time period, mostly means that you need to add more RAM to the server |
| check_nt_physical_disk_time | - This counter measures how busy a physical array is, not any logical partition or an individual disc in the array. - It will give you a pretty good knowledge about how busy your disk array is. - The value should not be over 55% on the server. If it is over 55% during several 10 minutes periods over 24 hours it will be an indication of I/O problems on the server |
| check_mssql_num_deadlocks | - Deadlocks appear when two processes locked one “pieces of data” each, both try to lock the same “new piece” of data at the same time. Every involved process will be waiting indefinitely for the other process to release the lock. Microsoft SQL Server will detect and kill one of the processes which means that one process will lose data - OP5 Monitor will send a Critical alert as soon as a deadlock appears |
| check_mssql_num_user_connections | - This check_command will count the number of connections. - If the number of connections is too high you might run into problems and so you may need to increase the number of threads on the running server. |
| check_mssql_cache_hit_ratio | - Buffer Cache Hit Ratio counter indicates of how often your Microsoft SQL Server goes to the Buffer instead of the hard drive when searching for data. - The value should be over 90% - We may only use either Warning or Critical here so in the default config we have chosen Critical. |
The table below describes the check_commands you may use when monitoring Microsoft SQL Server by queries.
| command_name | description |
|---|---|
| check_mssql_backup_job | - This check_command is one way to monitor the internal backup jobs of your Microsoft SQL Server. State OK will be true if the string “The job succeeded.” is found in the query made to the MSDB database - State Critical will be returned if the string “The job succeeded” is not found or if the last backup job is older than 24 hours |
| check_mssql_query_string_regex | - This check_command takes a given query and will search for a given search string in the result set and alert if not found. - You may use regular expressions when searching for a string in the first cell in the first row of the result. |
| check_mssql_query_count | - This check_command takes a given query and returns the number of hits. |
| check_mssql_query_reponstime | - This check_command will report the response time of a given query. |
Adding the services Copied
Add the required services, (‘Configure’ -> ‘Host:
When you added the host you had probably already added services like CPU usage, mem usage, disk usage and so on. The table below describes services you probably want to add. (Arguments are just examples, you need to adjust them to suit your environment).
Performance services Copied
| service_description | check_command | check_command_args |
|---|---|---|
| MSSQL Services | check_nt_service | SQLSERVERAGENT,MSSQLSERVER |
| MSSQL DB File Size OP5 | check_mssql_db_file_size | 1024!1256 |
| MSSQL Log File Size OP5 | check_mssql_log_file_size | 1024!1256 |
| NT Memory Page/SecP | check_nt_memory_page_per_sec | 20!30 |
| NT Physical Disk Time | check_nt_physical_disk_time | 45!55 |
| MSSQL Cache Hit Ratio | check_mssql_cache_hit_ratio | 90!80 |
| MSSQL Num Deadlocks | check_mssql_num_deadlocks | 1 |
| MSSQL Num User Connections | check_mssql_num_user_connections | 200!250 |
Query services Copied
| service_description | check_command | check_command_args |
|---|---|---|
| MSSQL Backup job ? master | check_mssql_backup_jobn | user!passwd!job name |
| MSSQL Select String | check_mssql_query_string_regex | user!password!db!select field from table where?!.*The job succeeded..* |
| MSSQL query count | check_mssql_query_count | user!password!db!select field from table where?!10!20 |
| MSSQL respons time | check_mssql_query_reponstime | user!password!db!select field from table where?!5!10 |
Use the “Test this service” button for the services to see if they work. Once they are correct and working as they should, you may add the services to all of your Microsoft SQL Servers with the clone function.
Monitoring a cluster Copied
It is not unusual that a Microsoft SQL Server is running in a clustered environment. Of course we can monitor your Microsoft SQL Server in that kind of environment too. Please read the how-to about Monitoring Microsoft clustered servers.