Elasticsearch
Overview
Elasticsearch monitoring is a Gateway configuration file that enables monitoring of an Elasticsearch cluster through the Toolkit plug-in.
Elasticsearch is a distributed search and analytics engine that scales horizontally, so you can add more nodes to a cluster as it grows. This allows it to search and analyze large volumes of data.
The elements that make Elasticsearch work are defined as follows:
- A node is a running instance of Elasticsearch that can determine where a document is located in the cluster.
- A cluster consists of one or more nodes with the same cluster name that share their data and load.
Track the following key areas when using Elasticsearch monitoring:
Key Area | Description |
---|---|
Search performance | Determines how search performs over time by monitoring query operations, load or latency, the fielddata cache, and evictions. |
Indexing performance | Monitors how each shard in the index is updated through the flush and refresh processes. A shard is a container for data that can be either a primary or a replica shard; shards are how Elasticsearch distributes data across the cluster. |
Cluster health and node availability | Monitors the current state of the cluster and its nodes. |
Resource utilisation | Provides information on thread pool queues and rejections for operations such as bulk, index, and merge. |
System and network metrics | Shows information about every node in the cluster, resource and memory usage, and active connections opened over time. |
In this Elasticsearch monitoring template, you will see these metrics in your dataviews:
- Cluster health
- Indexing performance
- Search performance
- Node and resource information
- Thread pool
Intended audience
This guide is intended for users who set up, configure, troubleshoot, and maintain this integration. It is also intended for users who use Active Console to monitor data from Elasticsearch. Once the integration is set up, the samplers providing the dataviews become available to that Gateway.
As a user, you should be familiar with Linux and with the administration of Elasticsearch services.
Prerequisites
The following requirements must be met before you install and set up the template:
- The machine running the Netprobe must be able to reach the host where Elasticsearch is installed, on the port that Elasticsearch listens on.
- Netprobe 4.6 or higher.
- Gateway 4.8 or higher.
- Python 2.7 or 3.6 installed on the machine where the Netprobe runs.
- Elasticsearch 6.1.2.
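To check the network and Python prerequisites from the Netprobe machine, you could run a short script like the one below. It is a minimal sketch and not part of the integration package: the function names are hypothetical, and it only tests that a TCP connection to the host and port can be opened, not that Elasticsearch itself is answering requests. The fallback host and port (localhost and 9200) match the template's documented defaults.

```python
import socket
import sys

# Python versions listed in the prerequisites (other versions are not listed there).
SUPPORTED_PYTHON = ((2, 7), (3, 6))


def python_version_supported(version_info=sys.version_info):
    """Return True if the major.minor version is one of the listed versions."""
    return tuple(version_info[:2]) in SUPPORTED_PYTHON


def port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except (socket.error, OSError):
        return False
    conn.close()
    return True


if __name__ == "__main__":
    # Hypothetical usage: python check_prereqs.py [host] [port]
    host = sys.argv[1] if len(sys.argv) > 1 else "localhost"
    port = int(sys.argv[2]) if len(sys.argv) > 2 else 9200
    print("Python %d.%d listed: %s" % (sys.version_info[0], sys.version_info[1],
                                       python_version_supported()))
    print("%s:%d reachable: %s" % (host, port, port_reachable(host, port)))
```

Run it as the same user as the Netprobe so that any firewall or routing rules that apply to the Netprobe also apply to the check.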
Installation procedure
Ensure that your environment meets the prerequisites before you install and set up this integration template.
- Download the integration package geneos-integration-elasticsearch-<version>.zip from the Downloads site.
- Open Gateway Setup Editor.
- In the Navigation panel, click Includes to create a new file.
- Enter the location of the file to include in the Location field. In this example, it is include/ElasticsearchMonitoring.xml.
- Update the Priority field. This can be any value except 1. If you input a priority of 1, the Gateway Setup Editor returns an error.
- Expand the file location in the Include section.
- Select Click to load.
- Click Yes to load the new Elasticsearch include file.
- Click Managed entities in the Navigation panel.
- Add the Elasticsearch type to the Managed Entity section that you will use to monitor Elasticsearch.
- Click Validate current document to check your configuration.
- Click Save current document to apply the changes.
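If the Gateway Setup Editor reports an error after you load the include, one quick check is whether the file itself is well-formed XML. The sketch below uses the standard library's `xml.etree.ElementTree` to parse it; it is an illustrative helper, not part of the integration, and it does not validate the file against the Gateway schema. The default path is the example location from the steps above, resolved relative to the current directory (an assumption; point it at wherever your Gateway reads its include files).

```python
import sys
import xml.etree.ElementTree as ET


def check_include(source):
    """Parse an include file (path or file object) and return its root element tag.

    Raises ET.ParseError if the XML is not well-formed,
    or IOError/OSError if the file cannot be read.
    """
    return ET.parse(source).getroot().tag


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "include/ElasticsearchMonitoring.xml"
    try:
        print("%s is well-formed; root element: <%s>" % (path, check_include(path)))
    except (ET.ParseError, IOError, OSError) as err:
        print("Could not parse %s: %s" % (path, err))
```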
Set up the samplers
These are the pre-configured samplers available to use in include/ElasticsearchMonitoring.xml.
Configure the required fields by referring to the table below:
Samplers |
---|
Elasticsearch-ClusterHealth |
Elasticsearch-ThreadPool |
Elasticsearch-Resource |
Elasticsearch-NodeInfo |
Elasticsearch-SearchPerf-ByIndex |
Elasticsearch-SearchPerf-ByNode |
Elasticsearch-IndexingPerf-ByIndex |
Elasticsearch-IndexingPerf-ByNode |
Set up the variables
The include/ElasticsearchMonitoring.xml template provides the following variables that are set in the Environments section:
Variable | Description |
---|---|
ELASTICSEARCHMON_GROUP | Sampler group name. Default: Elasticsearch-Monitoring |
ELASTICSEARCHMON_HOST | IP address or hostname of the Elasticsearch node. Default: localhost |
ELASTICSEARCHMON_PORT | Port assigned to the Elasticsearch HTTP service. Default: 9200 |
ELASTICSEARCHMON_PYTHON_EXE | Name of the executable script that calls the Python code. |
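The sketch below shows one way a script could turn the host and port variables into the base URL of the Elasticsearch HTTP API, falling back to the documented defaults. It assumes, hypothetically, that the values reach the script as environment variables with the same names and that the HTTP service uses plain `http`; neither is confirmed by this guide, and the function name is illustrative.

```python
import os

# Defaults documented for the template variables.
DEFAULT_HOST = "localhost"
DEFAULT_PORT = 9200


def elasticsearch_base_url(env=None):
    """Build the HTTP base URL from ELASTICSEARCHMON_HOST and ELASTICSEARCHMON_PORT.

    Assumption: the variables are visible as environment variables.
    Unset or empty values fall back to the documented defaults.
    """
    env = os.environ if env is None else env
    host = env.get("ELASTICSEARCHMON_HOST") or DEFAULT_HOST
    port = int(env.get("ELASTICSEARCHMON_PORT") or DEFAULT_PORT)
    return "http://%s:%d" % (host, port)


if __name__ == "__main__":
    print(elasticsearch_base_url())
```

With neither variable set, this returns `http://localhost:9200`, and a cluster health request would then go to `http://localhost:9200/_cluster/health`.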
Set up the rules
The ElasticsearchMonitoring-SampleRules.xml template provides a separate set of sample rules that you can use in the Gateway Setup Editor.
Your configuration rules must be set in the Includes section. In the Navigation panel, click Rules.
The table below shows the sample rules included in the configuration file:
Rules | Sample Rules |
---|---|
Resource | Elasticsearch-Diskspace, Elasticsearch-FileDesc, Elasticsearch-Cpu |
ClusterHealth | Elasticsearch-ClusterStatus |
Indexing | Elasticsearch-IndexingLatency, Elasticsearch-RefreshLatency, Elasticsearch-FlushLatency |
Search | Elasticsearch-QueryLatency, Elasticsearch-FetchLatency |
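Judging by their names, the Resource sample rules presumably compare columns of the Elasticsearch-Resource dataview against thresholds. The sketch below mimics that idea outside the Gateway so you can reason about threshold choices before editing the rules. The threshold values, the mapping of each rule to a column, and the ok/warning/critical labels are all hypothetical; the actual conditions are defined in ElasticsearchMonitoring-SampleRules.xml.

```python
# Hypothetical (warning, critical) thresholds in percent, keyed by
# Elasticsearch-Resource column. The real rule conditions live in
# ElasticsearchMonitoring-SampleRules.xml and may differ.
THRESHOLDS = {
    "diskUsedPercent": (80.0, 90.0),        # cf. Elasticsearch-Diskspace
    "fileDescriptorPercent": (80.0, 90.0),  # cf. Elasticsearch-FileDesc
    "cpu": (75.0, 90.0),                    # cf. Elasticsearch-Cpu
}


def severity(value, warning, critical):
    """Classify a percentage against warning and critical thresholds."""
    if value >= critical:
        return "critical"
    if value >= warning:
        return "warning"
    return "ok"


def evaluate_row(row):
    """Return a severity for each thresholded column present in the row."""
    return dict(
        (column, severity(float(row[column]), warning, critical))
        for column, (warning, critical) in THRESHOLDS.items()
        if column in row
    )


if __name__ == "__main__":
    sample = {"name": "node-1", "cpu": "95", "diskUsedPercent": "50",
              "fileDescriptorPercent": "85"}
    print(evaluate_row(sample))
```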
Metrics and dataviews
Elasticsearch cluster health
This dataview monitors the overall health of the cluster:
Column Name | Description |
---|---|
cluster | Name of the cluster. |
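status | Health status of the cluster: green (all primary and replica shards are allocated), yellow (all primary shards are allocated, but one or more replica shards are not), or red (one or more primary shards are not allocated). |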
nodeTotal | Total number of nodes in the cluster. |
nodeData | Total number of nodes in the cluster that can store data. |
shardsTotal | Total number of shards. |
shardsInitializing | Number of initialising shards. |
shardsUnassigned | Number of unassigned shards. |
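These columns correspond closely to fields returned by the Elasticsearch cluster health API (`GET /_cluster/health`). The sketch below maps such a JSON response onto the dataview columns. The mapping, in particular using `active_shards` for shardsTotal, is an assumption about how the plug-in derives the values; it may read a different endpoint, such as `_cat/health`.

```python
import json

# Assumed mapping: dataview column -> field in the GET /_cluster/health response.
COLUMN_MAP = (
    ("cluster", "cluster_name"),
    ("status", "status"),
    ("nodeTotal", "number_of_nodes"),
    ("nodeData", "number_of_data_nodes"),
    ("shardsTotal", "active_shards"),
    ("shardsInitializing", "initializing_shards"),
    ("shardsUnassigned", "unassigned_shards"),
)


def cluster_health_row(response_text):
    """Convert a _cluster/health JSON body into a dict keyed by dataview column."""
    health = json.loads(response_text)
    return dict((column, health.get(field)) for column, field in COLUMN_MAP)


if __name__ == "__main__":
    # Illustrative body for a single-node cluster whose replicas cannot be allocated.
    body = ('{"cluster_name": "demo", "status": "yellow", "number_of_nodes": 1,'
            ' "number_of_data_nodes": 1, "active_shards": 5,'
            ' "initializing_shards": 0, "unassigned_shards": 5}')
    print(cluster_health_row(body))
```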
Elasticsearch indexingPerf-ByIndex
This dataview monitors indexing performance by index. Data is grouped per index:
Column Name | Description |
---|---|
index | Name of the index. |
indexingIndexTotal | Total number of indexing operations. |
indexingIndexTime | Time spent in indexing. Unit: millisecond (ms) |
indexingIndexCurrent | Number of current indexing operations. |
refreshTotal | Total number of refreshes. |
refreshTime | Time spent in refresh operations. Unit: millisecond (ms) |
flushTotal | Total number of flushes. |
flushTotalTime | Time spent in flushes. Unit: millisecond (ms) |
averageIndexingLatency | Average time spent in indexing. This is computed from indexingIndexTime / indexingIndexTotal. Unit: millisecond (ms) per indexing operation |
averageRefreshLatency | Average time spent in refresh operations. This is computed from refreshTime / refreshTotal. Unit: millisecond (ms) per refresh |
averageFlushLatency | Average time spent in flush operations. This is computed from flushTotalTime / flushTotal. Unit: millisecond (ms) per flush |
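The average columns are ratios of cumulative counters. The sketch below derives one row per index from an index stats response (`GET /_stats`, which groups counters under `indices.<name>.primaries` and `indices.<name>.total`). It assumes the `total` section (primary and replica shards) is the one used, and it returns 0.0 instead of failing when no operations have been recorded yet; both are assumptions, not documented behaviour of the plug-in.

```python
def safe_average(total_time_ms, count):
    """Return milliseconds per operation, or 0.0 if no operations were recorded."""
    return float(total_time_ms) / count if count else 0.0


def indexing_rows(stats):
    """Build one IndexingPerf-ByIndex style row per index from a GET /_stats response."""
    rows = []
    for index in sorted(stats["indices"]):
        total = stats["indices"][index]["total"]  # assumption: primaries + replicas
        indexing, refresh, flush = total["indexing"], total["refresh"], total["flush"]
        rows.append({
            "index": index,
            "indexingIndexTotal": indexing["index_total"],
            "indexingIndexTime": indexing["index_time_in_millis"],
            "indexingIndexCurrent": indexing["index_current"],
            "refreshTotal": refresh["total"],
            "refreshTime": refresh["total_time_in_millis"],
            "flushTotal": flush["total"],
            "flushTotalTime": flush["total_time_in_millis"],
            "averageIndexingLatency": safe_average(
                indexing["index_time_in_millis"], indexing["index_total"]),
            "averageRefreshLatency": safe_average(
                refresh["total_time_in_millis"], refresh["total"]),
            "averageFlushLatency": safe_average(
                flush["total_time_in_millis"], flush["total"]),
        })
    return rows
```

Guarding against a zero count matters for new or idle indices, where the formula in the table would otherwise divide by zero.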
Elasticsearch indexingPerf-ByNode
This dataview monitors indexing performance by node. Data is grouped per node:
Column Name | Description |
---|---|
nodeID | Unique node ID. |
name | Name of the node. |
indexingIndexTotal | Total number of indexing operations. |
indexingIndexTime | Time spent in indexing. Unit: millisecond (ms) |
indexingIndexCurrent | Number of current indexing operations. |
refreshTotal | Total number of refreshes. |
refreshTime | Time spent in refresh operations. Unit: millisecond (ms) |
flushTotal | Total number of flushes. |
flushTotalTime | Time spent in flushes. Unit: millisecond (ms) |
averageIndexingLatency | Average time spent in indexing. This is computed from indexingIndexTime / indexingIndexTotal. Unit: millisecond (ms) per indexing operation |
averageRefreshLatency | Average time spent in refresh operations. This is computed from refreshTime / refreshTotal. Unit: millisecond (ms) per refresh |
averageFlushLatency | Average time spent in flush operations. This is computed from flushTotalTime / flushTotal. Unit: millisecond (ms) per flush |
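At node level, the same counters are reported by the nodes stats API (`GET /_nodes/stats/indices`), keyed by node ID with the node name alongside. The sketch below assumes the plug-in reads that API and shows how the nodeID and name columns line up with the response; to stay short, it computes only the three average columns.

```python
def per_node_latencies(nodes_stats):
    """Return (nodeID, name, avg indexing, avg refresh, avg flush) per node, in ms."""
    def avg(time_ms, count):
        # Guard against nodes that have not indexed, refreshed or flushed yet.
        return float(time_ms) / count if count else 0.0

    rows = []
    for node_id, node in sorted(nodes_stats["nodes"].items()):
        ind = node["indices"]
        rows.append((
            node_id,
            node["name"],
            avg(ind["indexing"]["index_time_in_millis"], ind["indexing"]["index_total"]),
            avg(ind["refresh"]["total_time_in_millis"], ind["refresh"]["total"]),
            avg(ind["flush"]["total_time_in_millis"], ind["flush"]["total"]),
        ))
    return rows
```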
Elasticsearch nodeInfo
This displays information about the nodes in the cluster:
Column Name | Description |
---|---|
nodeID | Unique node ID. |
name | Name of the node. |
IP | IP address. |
port | Bound transport port. |
http | Bound HTTP address and port. |
version | Elasticsearch version. |
build | Elasticsearch build hash. |
jdk | JDK version. |
nodeRole | Role of the node. A node can have more than one role, for example master-eligible, data, and ingest. |
master | Indicates whether this node is the currently elected master node of the cluster. |
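Assuming the nodeRole and master columns use the same notation as the Elasticsearch 6.x cat nodes API (`GET /_cat/nodes`), where `node.role` is a string of letters such as `mdi` and the `master` column shows `*` for the elected master, a small decoder could look like the sketch below. The notation the dataview actually displays is not confirmed here, and the function names are hypothetical.

```python
# Letters used in the node.role column of GET /_cat/nodes in Elasticsearch 6.x.
ROLE_LETTERS = {"m": "master-eligible", "d": "data", "i": "ingest"}


def decode_roles(node_role):
    """Expand a role string such as "mdi" into role names."""
    if node_role == "-":
        # A node with no master, data or ingest role acts only as a coordinator.
        return ["coordinating-only"]
    return [ROLE_LETTERS.get(letter, "unknown(%s)" % letter) for letter in node_role]


def is_elected_master(master_flag):
    """The cat nodes master column shows "*" for the elected master and "-" otherwise."""
    return master_flag == "*"
```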
Elasticsearch resource
This monitors the resources of each node in the cluster:
Column Name | Description |
---|---|
nodeID | Unique node ID. |
name | Name of the node. |
cpu | CPU usage in percentage (%). |
heapCurrent | Current heap usage. Unit: bytes |
heapPercent | Percentage of heap used. |
ramCurrent | Current RAM usage. Unit: bytes |
ramPercent | Percentage of RAM used. |
diskUsed | Used disk space. Unit: bytes |
diskAvail | Available disk space. |
diskUsedPercent | Percentage of disk space used. |
fileDescriptorCurrent | Number of used file descriptors. |
fileDescriptorPercent | Percentage of file descriptors used. |
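To cross-check diskUsedPercent, or to show disk figures in more readable units, you could derive them from diskUsed and diskAvail as in the sketch below. It assumes that diskAvail is also in bytes and that total disk space equals used plus available space; on file systems that reserve blocks, the percentage Elasticsearch reports may differ slightly.

```python
def disk_used_percent(disk_used, disk_avail):
    """Percentage of disk used, assuming total = used + available (both in bytes)."""
    total = disk_used + disk_avail
    return 100.0 * disk_used / total if total else 0.0


def bytes_to_gib(n_bytes):
    """Convert bytes to gibibytes (1 GiB = 1024 ** 3 bytes)."""
    return n_bytes / float(1024 ** 3)
```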
Elasticsearch SearchPerf-ByIndex
This monitors search performance by index. Data is grouped per index:
Column Name | Description |
---|---|
index | Name of the index. |
searchQueryTotal | Number of query phase operations. |
searchQueryTime | Time spent in the query phase. Unit: millisecond (ms) |
searchQueryCurrent | Number of current query phase operations. |
searchFetchTotal | Number of fetch phase operations. |
searchFetchTime | Time spent in the fetch phase. Unit: millisecond (ms) |
searchFetchCurrent | Number of current fetch phase operations. |
fielddataMemory | Memory used by the fielddata cache. |
fielddataEvictions | Number of fielddata cache evictions. |
averageQueryLatency | Average time spent in the query phase, computed as searchQueryTime / searchQueryTotal. Unit: millisecond (ms) per query |
averageFetchLatency | Average time spent in the fetch phase, computed as searchFetchTime / searchFetchTotal. Unit: millisecond (ms) per fetch |
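searchQueryTime and searchQueryTotal are cumulative counters, so averageQueryLatency as defined above is an average since the counters were last reset (for example, when a node restarts), and it reacts slowly to a recent slowdown. If you want the latency over a single sampling interval instead, one option is to take the difference between two consecutive samples, as in the sketch below. This is an alternative calculation, not what the template computes, and the function name is hypothetical.

```python
def interval_latency(prev, curr, time_key="searchQueryTime", count_key="searchQueryTotal"):
    """Average ms per operation between two samples of cumulative counters.

    Returns None when no operations completed in the interval, or when the
    counters went backwards (for example, after a node restart).
    """
    d_count = curr[count_key] - prev[count_key]
    d_time = curr[time_key] - prev[time_key]
    if d_count <= 0 or d_time < 0:
        return None
    return float(d_time) / d_count


if __name__ == "__main__":
    prev = {"searchQueryTotal": 100, "searchQueryTime": 400}
    curr = {"searchQueryTotal": 200, "searchQueryTime": 900}
    print(interval_latency(prev, curr))  # 5.0 ms per query in this interval
```

In this example, the cumulative formula for the second sample gives 900 / 200 = 4.5 ms, while the queries that ran during the interval averaged 5.0 ms.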
Elasticsearch searchPerf-ByNode
This monitors search performance by node. Data is grouped per node:
Column Name | Description |
---|---|
nodeID | Unique node ID. |
name | Name assigned to the node. |
searchQueryTotal | Number of query phase operations. |
searchQueryTime | Time spent in query phase. Unit: millisecond (ms) |
searchQueryCurrent | Number of current query phase operations. |
searchFetchTotal | Number of fetch phase operations. |
searchFetchTime | Time spent in fetch phase. Unit: millisecond (ms) |
searchFetchCurrent | Number of current fetch phase operations. |
fielddataMemory | Memory used by the fielddata cache. |
fielddataEvictions | Number of fielddata cache evictions. |
averageQueryLatency | Average time spent in the query phase, computed as searchQueryTime / searchQueryTotal. Unit: millisecond (ms) per query |
averageFetchLatency | Average time spent in the fetch phase, computed as searchFetchTime / searchFetchTotal. Unit: millisecond (ms) per fetch |
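Because rows are grouped per node, this dataview is a natural place to spot a node that answers queries more slowly than its peers. The sketch below computes averageQueryLatency for each row with the formula above and sorts the rows from slowest to fastest. The row format (a list of dicts keyed by column name) and the function name are assumptions for illustration.

```python
def slowest_nodes(rows, top=3):
    """Return up to `top` (name, averageQueryLatency) pairs, slowest first."""
    ranked = []
    for row in rows:
        total = row["searchQueryTotal"]
        # Nodes that have not served any queries are treated as 0.0 ms.
        latency = float(row["searchQueryTime"]) / total if total else 0.0
        ranked.append((row["name"], latency))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked[:top]
```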
Elasticsearch ThreadPool
This monitors the bulk, index, and search thread pools of each node in the cluster:
Column Name | Description |
---|---|
node_id/name | Node ID/Thread Pool Name. |
node_name | Name of the node. |
name | Thread Pool name. |
type | Thread Pool Type. |
active | Number of active threads. |
queue | Number of tasks currently in queue. |
rejected | Number of rejected tasks. |
size | Number of threads. |
queue_size | Maximum number of pending requests that can wait in the queue when no thread is available to execute them. |
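The queue and rejected columns are the usual early warning of an overloaded node: queue shows how much work is waiting, and rejected counts tasks that were turned away because the queue was full. The sketch below derives two helper values from these columns: how full the queue is relative to queue_size, and how many new rejections occurred between two samples. It assumes that rejected is cumulative since the node started and that a queue_size of -1 means an unbounded queue, as in Elasticsearch's thread pool settings. The function names are hypothetical.

```python
def queue_utilisation(queue, queue_size):
    """Percentage of the queue in use, or None for an unbounded (-1) or zero-size queue."""
    if queue_size <= 0:
        return None
    return 100.0 * queue / queue_size


def new_rejections(prev_rejected, curr_rejected):
    """Rejections since the previous sample; rejected is a cumulative counter.

    If the counter went backwards, the node probably restarted, so the current
    value is treated as the count since the restart.
    """
    if curr_rejected >= prev_rejected:
        return curr_rejected - prev_rejected
    return curr_rejected


def row_name(node_id, pool_name):
    """Row name in the node_id/name form used by the dataview."""
    return "%s/%s" % (node_id, pool_name)
```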