Elasticsearch Monitoring Technical Reference
Overview
Elasticsearch monitoring is a Gateway configuration file that enables monitoring of Elasticsearch clusters through the Toolkit plug-in.
Elasticsearch is a distributed search and analytics engine that scales horizontally, so you can add more nodes to a cluster as your data grows. This lets it search and analyze large volumes of data.
The elements that make Elasticsearch work are defined as follows:
- A node is a running instance of Elasticsearch. Any node can determine which node holds a given document.
- A cluster consists of one or more nodes that have the same cluster name and share their data and workload.
Track the following key areas when using Elasticsearch monitoring:
Key Area | Description |
---|---|
Search performance | Shows how search performs over time by monitoring query operations, query load and latency, and the fielddata cache and its evictions. |
Indexing performance | Each shard in an index is updated through the refresh and flush processes. A shard is a container for data and can be either a primary or a replica shard; shards are how Elasticsearch distributes data across a cluster. |
Cluster health and node availability | Monitors the current state of all clusters and nodes. |
Resource utilisation | Provides information on thread pool queues and rejections for operations such as bulk, index, and merge. |
System and network metrics | Shows information about every node in the cluster, resource and memory usage, and active connections opened over time. |
This Elasticsearch monitoring template provides the following metrics in its dataviews:
- Cluster health
- Indexing performance
- Search performance
- Node and resource information
- Thread pool
Intended audience
This technical reference is intended for users who will be using Active Console to monitor data from Elasticsearch. If you are setting up the integration for the first time, see Elasticsearch Monitoring User Guide.
Metrics and dataviews
Elasticsearch cluster health
This dataview monitors the overall health of the cluster:
Column Name | Description |
---|---|
cluster | Name of the cluster. |
status | Health status of the cluster: green, yellow, or red. |
nodeTotal | Total number of nodes in the cluster. |
nodeData | Total number of nodes in the cluster that can store data. |
shardsTotal | Total number of shards. |
shardsInitializing | Number of initialising shards. |
shardsUnassigned | Number of unassigned shards. |
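The columns in this dataview line up closely with the columns returned by the Elasticsearch `_cat/health` API. The following Python sketch shows how such a row could be built from a `_cat/health?format=json` response. It assumes, without confirmation from this reference, that the plug-in reads this API; the mapping dictionary, function name, and sample response are hypothetical.

```python
import json

# Hypothetical mapping from _cat/health JSON keys to the dataview columns above.
CAT_HEALTH_TO_COLUMN = {
    "cluster": "cluster",
    "status": "status",
    "node.total": "nodeTotal",
    "node.data": "nodeData",
    "shards": "shardsTotal",
    "init": "shardsInitializing",
    "unassign": "shardsUnassigned",
}
TEXT_COLUMNS = {"cluster", "status"}


def to_health_row(cat_health_json: str) -> dict:
    """Build one cluster health row from a _cat/health?format=json response."""
    record = json.loads(cat_health_json)[0]  # the response is a list with one record
    row = {}
    for es_key, column in CAT_HEALTH_TO_COLUMN.items():
        value = record[es_key]
        # _cat APIs return numbers as strings, so convert the count columns to int
        row[column] = value if column in TEXT_COLUMNS else int(value)
    return row


# Made-up sample response
sample = (
    '[{"epoch": "1700000000", "timestamp": "22:13:20", "cluster": "demo",'
    ' "status": "yellow", "node.total": "3", "node.data": "3", "shards": "10",'
    ' "pri": "5", "relo": "0", "init": "1", "unassign": "2"}]'
)
print(to_health_row(sample))
```

Keys that have no matching column, such as `pri` and `relo`, are simply ignored by the mapping.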
Elasticsearch indexingPerf-ByIndex
This dataview monitors indexing performance by index. Data is grouped per index:
Column Name | Description |
---|---|
index | Name of the index. |
indexingIndexTotal | Total number of indexing operations. |
indexingIndexTime | Time spent in indexing. Unit: millisecond (ms) |
indexingIndexCurrent | Number of current indexing operations. |
refreshTotal | Total number of refreshes. |
refreshTime | Time spent in refresh operations. Unit: millisecond (ms) |
flushTotal | Total number of flushes. |
flushTotalTime | Time spent in flushes. Unit: millisecond (ms) |
averageIndexingLatency | Average time spent in indexing. This is computed from indexingIndexTime / indexingIndexTotal. Unit: millisecond (ms) per indexing operation |
averageRefreshLatency | Average time spent in refresh operations. This is computed from refreshTime / refreshTotal. Unit: millisecond (ms) per refresh |
averageFlushLatency | Average time spent in flush operations. This is computed from flushTotalTime / flushTotal. Unit: millisecond (ms) per flush |
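The three average latency columns are ratios of the cumulative time and count columns. A minimal Python sketch of the calculation, using made-up sample values; returning 0.0 when no operations have been recorded is an assumption, since this reference does not say how the plug-in handles a zero count.

```python
def average_latency(time_ms: float, total: int) -> float:
    """Milliseconds per operation, e.g. indexingIndexTime / indexingIndexTotal.

    Returns 0.0 when no operations have been recorded (an assumed convention).
    """
    return time_ms / total if total else 0.0


# Made-up sample values for one index row
row = {
    "indexingIndexTotal": 2000, "indexingIndexTime": 5000,
    "refreshTotal": 40, "refreshTime": 800,
    "flushTotal": 0, "flushTotalTime": 0,
}
averages = {
    "averageIndexingLatency": average_latency(row["indexingIndexTime"], row["indexingIndexTotal"]),
    "averageRefreshLatency": average_latency(row["refreshTime"], row["refreshTotal"]),
    "averageFlushLatency": average_latency(row["flushTotalTime"], row["flushTotal"]),
}
print(averages)
# {'averageIndexingLatency': 2.5, 'averageRefreshLatency': 20.0, 'averageFlushLatency': 0.0}
```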
Elasticsearch indexingPerf-ByNode
This monitors indexing performance by node. Data is grouped per node:
Column Name | Description |
---|---|
nodeID | Unique node ID. |
name | Name of the node. |
indexingIndexTotal | Total number of indexing operations. |
indexingIndexTime | Time spent in indexing. Unit: millisecond (ms) |
indexingIndexCurrent | Number of current indexing operations. |
refreshTotal | Total number of refreshes. |
refreshTime | Time spent in refresh operations. Unit: millisecond (ms) |
flushTotal | Total number of flushes. |
flushTotalTime | Time spent in flushes. Unit: millisecond (ms) |
averageIndexingLatency | Average time spent in indexing. This is computed from indexingIndexTime / indexingIndexTotal. Unit: millisecond (ms) per indexing operation |
averageRefreshLatency | Average time spent in refresh operations. This is computed from refreshTime / refreshTotal. Unit: millisecond (ms) per refresh |
averageFlushLatency | Average time spent in flush operations. This is computed from flushTotalTime / flushTotal. Unit: millisecond (ms) per flush |
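Because each row holds per-node totals, a cluster-wide average indexing latency is better computed from the summed times and counts than by averaging the per-node averageIndexingLatency values. The Python sketch below illustrates the difference; the function names and sample rows are hypothetical, and this is not a calculation the dataview itself is documented to perform.

```python
def cluster_indexing_latency(node_rows):
    """Cluster-wide ms per indexing operation, weighted by each node's workload."""
    total_time = sum(r["indexingIndexTime"] for r in node_rows)
    total_ops = sum(r["indexingIndexTotal"] for r in node_rows)
    return total_time / total_ops if total_ops else 0.0


def naive_mean_of_averages(node_rows):
    """Unweighted mean of per-node averages; a quiet node counts as much as a busy one."""
    per_node = [r["indexingIndexTime"] / r["indexingIndexTotal"] for r in node_rows]
    return sum(per_node) / len(per_node)


# Made-up rows: node-1 is busy and fast, node-2 is lightly loaded and slow
nodes = [
    {"name": "node-1", "indexingIndexTime": 2700, "indexingIndexTotal": 900},  # 3.0 ms/op
    {"name": "node-2", "indexingIndexTime": 1000, "indexingIndexTotal": 100},  # 10.0 ms/op
]
print(cluster_indexing_latency(nodes))  # 3.7
print(naive_mean_of_averages(nodes))    # 6.5
```

The weighted result (3.7 ms per operation) reflects where the indexing work actually happened, whereas the plain mean (6.5 ms) is pulled up by the lightly loaded node.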
Elasticsearch nodeInfo
This displays information about the nodes in the cluster:
Column Name | Description |
---|---|
nodeID | Unique node ID. |
name | Name of the node. |
IP | IP address. |
port | Bound transport port. |
http | Bound HTTP address and port. |
version | Elasticsearch version. |
build | Elasticsearch build hash. |
jdk | JDK version. |
nodeRole | Role of the node. This can have more than one value. |
master | Current master node in the cluster. |
Elasticsearch resource
This monitors the resources of each node in the cluster:
Column Name | Description |
---|---|
nodeID | Unique node ID. |
name | Name of the node. |
cpu | CPU usage in percentage (%). |
heapCurrent | Current heap usage. Unit: bytes |
heapPercent | Percentage of heap used. |
ramCurrent | Current RAM usage. Unit: bytes |
ramPercent | Percentage of RAM used. |
diskUsed | Used disk space. Unit: bytes |
diskAvail | Available disk space. |
diskUsedPercent | Percentage of disk space used. |
fileDescriptorCurrent | Number of used file descriptors. |
fileDescriptorPercent | Percentage of file descriptors used. |
Elasticsearch searchPerf-ByIndex
This monitors search performance by index. Data is grouped per index:
Column Name | Description |
---|---|
index | Name of the index. |
searchQueryTotal | Number of query phase operations. |
searchQueryTime | Time spent in query phase. Unit: millisecond (ms) |
searchQueryCurrent | Number of current query phase operations. |
searchFetchTotal | Number of fetch phase operations. |
searchFetchTime | Time spent in fetch phase. Unit: millisecond (ms) |
searchFetchCurrent | Number of current fetch phase operations. |
fielddataMemory | Memory used by the fielddata cache. |
fielddataEvictions | Number of fielddata cache evictions. |
averageQueryLatency | Average time spent in query phase. This is computed from searchQueryTime / searchQueryTotal. Unit: millisecond (ms) per query |
averageFetchLatency | Average time spent in fetch phase. This is computed from searchFetchTime / searchFetchTotal. Unit: millisecond (ms) per fetch |
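The searchQueryTime and searchQueryTotal columns are cumulative, so averageQueryLatency covers the whole period the counters have been accumulating and reacts slowly to a recent slowdown. One way to see latency over a recent interval is to take the difference between two samples, as in this Python sketch. The sample type and function are hypothetical, and this is not a description of what the dataview itself computes.

```python
from dataclasses import dataclass


@dataclass
class QuerySample:
    """Hypothetical snapshot of one row's cumulative query counters."""
    search_query_total: int    # searchQueryTotal
    search_query_time_ms: int  # searchQueryTime, in ms


def interval_query_latency(prev: QuerySample, curr: QuerySample) -> float:
    """ms per query for the queries that ran between two samples."""
    ops = curr.search_query_total - prev.search_query_total
    time_ms = curr.search_query_time_ms - prev.search_query_time_ms
    # No new queries, or the counters went backwards (e.g. reset after a restart)
    if ops <= 0 or time_ms < 0:
        return 0.0
    return time_ms / ops


earlier = QuerySample(search_query_total=10_000, search_query_time_ms=20_000)
later = QuerySample(search_query_total=10_100, search_query_time_ms=21_000)

print(later.search_query_time_ms / later.search_query_total)  # cumulative: ~2.08 ms
print(interval_query_latency(earlier, later))                  # interval: 10.0 ms
```

In this example the cumulative average moves only from 2.0 ms to about 2.08 ms per query, while the interval figure shows that the last 100 queries took 10 ms each.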
Elasticsearch searchPerf-ByNode
This monitors search performance by node. Data is grouped per node:
Column Name | Description |
---|---|
nodeID | Unique node ID. |
name | Name assigned to the node. |
searchQueryTotal | Number of query phase operations. |
searchQueryTime | Time spent in query phase. Unit: millisecond (ms) |
searchQueryCurrent | Number of current query phase operations. |
searchFetchTotal | Number of fetch phase operations. |
searchFetchTime | Time spent in fetch phase. Unit: millisecond (ms) |
searchFetchCurrent | Number of current fetch phase operations. |
fielddataMemory | Memory used by the fielddata cache. |
fielddataEvictions | Number of fielddata cache evictions. |
averageQueryLatency | Average time spent in query phase. This is computed from searchQueryTime / searchQueryTotal. Unit: millisecond (ms) per query |
averageFetchLatency | Average time spent in fetch phase. This is computed from searchFetchTime / searchFetchTotal. Unit: millisecond (ms) per fetch |
Elasticsearch ThreadPool
This monitors the bulk, index, and search thread pools of each node in the cluster:
Column Name | Description |
---|---|
node_id/name | Node ID/thread pool name. |
node_name | Name of the node. |
name | Thread pool name. |
type | Thread pool type. |
active | Number of active threads. |
queue | Number of tasks currently in queue. |
rejected | Number of rejected tasks. |
size | Number of threads. |
queue_size | Maximum size of the queue of pending requests that have no threads to execute them. |
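The queue, queue_size, and rejected columns can be combined to spot saturated thread pools. This Python sketch flags rows that have rejected tasks or a nearly full queue; the function name, sample rows, and the 0.8 threshold are illustrative assumptions, not recommended settings.

```python
def thread_pool_alerts(rows, queue_fill_threshold=0.8):
    """Return (row key, reason) pairs for thread pools that look saturated.

    queue_fill_threshold is an illustrative value, not a recommended default.
    """
    alerts = []
    for row in rows:
        key = row["node_id/name"]
        # rejected is a cumulative count in Elasticsearch, so this flags any
        # rejection since the node started; compare with an earlier sample
        # to see only new rejections.
        if row["rejected"] > 0:
            alerts.append((key, "rejected tasks"))
        # In Elasticsearch a queue_size of -1 means the queue is unbounded,
        # so no fill ratio applies; only bounded queues are checked.
        if row["queue_size"] > 0 and row["queue"] / row["queue_size"] >= queue_fill_threshold:
            alerts.append((key, "queue nearly full"))
    return alerts


# Made-up rows for one node
rows = [
    {"node_id/name": "n1/bulk", "queue": 180, "queue_size": 200, "rejected": 0},
    {"node_id/name": "n1/search", "queue": 5, "queue_size": 1000, "rejected": 3},
    {"node_id/name": "n1/index", "queue": 10, "queue_size": 200, "rejected": 0},
]
print(thread_pool_alerts(rows))
# [('n1/bulk', 'queue nearly full'), ('n1/search', 'rejected tasks')]
```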