Hadoop
Overview
Hadoop monitoring is provided through a Gateway configuration file that enables monitoring of the Hadoop cluster, nodes, and daemons through the JMX and Toolkit plug-ins.
This Hadoop integration template consists of the following components:
- Hadoop Distributed File System (HDFS)
- Yet Another Resource Negotiator (YARN)
The Hadoop Distributed File System (HDFS) provides scalable data storage that can be deployed on commodity hardware and is optimised for large datasets.
The other component, Yet Another Resource Negotiator (YARN), assigns the computation resources for executing applications:
- YARN ResourceManager - takes inventory of available resources and allocates them to running applications.
- YARN NodeManagers - monitor resource usage and communicate with the ResourceManager.
Intended audience
This guide is intended for users who are setting up, configuring, troubleshooting, and maintaining this integration, as well as for users who monitor data from Hadoop through the Active Console. Once the integration is set up, the samplers providing the dataviews become available to that Gateway.
As a user, you should be familiar with Java and with the administration of the Hadoop services.
Prerequisites
The following requirements must be met before the installation and setup of the template:
- The machine running the Netprobe must have access to the host where the Hadoop instance is installed and to the ports Hadoop is listening on (a minimal reachability check is sketched after this list).
- Netprobe 4.6 or higher.
- Gateway 4.8 or higher.
- Hadoop 3.0.0 or higher.
- Python 2.7/3.6 or higher.
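You can confirm the network requirement from the Netprobe machine before loading the include. The sketch below is a minimal connectivity check, assuming the default web UI ports listed in the variables section later in this guide (9870 for the Namenode and 8088 for the ResourceManager); the host names are placeholders for your own environment.

```python
# Minimal reachability sketch. The host names are placeholders and the ports
# are the defaults from the "Set up the variables" section; adjust both to
# match your own Hadoop deployment.
import socket

ENDPOINTS = {
    "Namenode web UI": ("namenode.example.com", 9870),
    "ResourceManager web UI": ("resourcemanager.example.com", 8088),
}

def is_reachable(host, port, timeout=5):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        sock = socket.create_connection((host, port), timeout)
        sock.close()
        return True
    except socket.error:
        return False

for name, (host, port) in ENDPOINTS.items():
    status = "reachable" if is_reachable(host, port) else "NOT reachable"
    print("{0} ({1}:{2}): {3}".format(name, host, port, status))
```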
Installation procedure
Ensure that you have read and met the system requirements before installing and setting up this integration template.
- Download the integration package geneos-integration-hadoop-<version>.zip from the Downloads site.
- Open the Gateway Setup Editor.
- In the Navigation panel, click Includes to create a new file.
- Enter the location of the file to include in the Location field. In this example, it is include/HadoopMonitoring.xml.
- Update the Priority field. This can be any value except 1. If you input a priority of 1, the Gateway Setup Editor returns an error.
- Expand the file location in the Includes section.
- Select Click to load.
- Click Yes to load the new Hadoop include file.
- Click Managed entities in the Navigation panel.
- Add the Hadoop-Cluster and Hadoop-Node types to the Managed Entity section that you will use to monitor Hadoop.
- Click Validate current document to check your configuration.
- Click Save current document to apply the changes.
Set up the samplers
These are the pre-configured samplers available to use in HadoopMonitoring.xml.
Configure the required fields by referring to the table below:
Samplers |
---|
Hadoop-HDFS-NamenodeInfo |
Hadoop-HDFS-NamenodeCluster |
Hadoop-HDFS-SecondaryNamenodeInfo |
Hadoop-HDFS-DatanodesSummary |
Hadoop-HDFS-DatanodeVolumeInfo |
Hadoop-YARN-ResourceManager |
Hadoop-YARN-NodeManagersSummary |
Set up the variables
The HadoopMonitoring.xml template provides the variables that are set in the Environments section.
Variables | Description |
---|---|
HADOOP_HOST_NAMENODE | IP/Hostname where Namenode daemon is running. |
HADOOP_HOST_SECONDARYNAMENODE | IP/Hostname where Secondarynamenode daemon is running. |
HADOOP_HOST_DATANODE | IP/Hostname where the specific Datanode daemon is running. |
HADOOP_HOST_RESOURCEMANAGER | IP/Hostname where ResourceManager is running. |
HADOOP_PORT_JMX_NAMENODE | Namenode JMX port. |
HADOOP_PORT_JMX_SECONDARYNAMENODE | Secondarynamenode JMX port. |
HADOOP_PORT_WEBJMX_DATANODE | Datanode web UI port. Default: 9864 |
HADOOP_PORT_JMX_RESOURCEMANAGER | ResourceManager JMX port. |
HADOOP_PORT_WEBJMX_NAMENODE | Namenode web UI port. Default: 9870 |
HADOOP_PORT_WEBJMX_RESOURCEMANAGER | ResourceManager web UI port. Default: 8088 |
PYTHON_EXECUTABLE_PATH | Path to the Python executable used to run the scripts. |
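These variables point the samplers at the standard Hadoop JMX and web endpoints. As a rough illustration of what the HADOOP_HOST_NAMENODE and HADOOP_PORT_WEBJMX_NAMENODE pair describes, the hedged sketch below queries the Namenode's /jmx JSON servlet directly. This is not the packaged toolkit script; it only shows the kind of data the integration collects, and the host name is a placeholder.

```python
# Illustrative sketch only -- not the packaged toolkit script. Assumes the
# standard Hadoop /jmx JSON servlet is reachable on the Namenode web UI port.
import json

try:
    from urllib.request import urlopen   # Python 3
except ImportError:
    from urllib2 import urlopen          # Python 2.7

HADOOP_HOST_NAMENODE = "namenode.example.com"   # placeholder host name
HADOOP_PORT_WEBJMX_NAMENODE = 9870              # default Namenode web UI port

url = "http://{0}:{1}/jmx?qry=Hadoop:service=NameNode,name=FSNamesystemState".format(
    HADOOP_HOST_NAMENODE, HADOOP_PORT_WEBJMX_NAMENODE)

# The servlet returns a JSON document with a "beans" list; each bean carries
# attributes such as NumLiveDataNodes and CapacityRemaining, the kind of
# figures shown in the Hadoop-HDFS-NamenodeCluster dataview.
beans = json.loads(urlopen(url, timeout=10).read().decode("utf-8"))["beans"]
for bean in beans:
    print("NumLiveDataNodes: {0}".format(bean.get("NumLiveDataNodes")))
    print("CapacityRemaining (bytes): {0}".format(bean.get("CapacityRemaining")))
```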
Set up the rules
The HadoopMonitoring-SampleRules.xml template also provides separate sample rules that you can use to configure the Gateway Setup Editor.
Your configuration rules must be set in the Includes section.
The table below shows the rules included in the configuration file:
Sample Rules | Description |
---|---|
Hadoop-NameNodeCluster-Disk-Remaining | Checks the remaining disk ratio of the entire Hadoop cluster. |
Hadoop-DataNode-Disk-Remaining | Checks the remaining disk ratio of a single datanode. HADOOP_RULE_DISK_REMAINING_THRESHOLD: possible values 1.0 - 100. |
Hadoop-Datanodes-In-Errors | Checks the number of datanodes with errors. HADOOP_RULE_DATANODES_ERROR_THRESHOLD: integer value. |
Hadoop-Blocks-In-Error | Checks the number of blocks with errors. HADOOP_RULE_BLOCKS_ERROR_THRESHOLD: integer value. |
Hadoop-Nodemanager-In-Error | Checks the number of nodemanagers with errors. HADOOP_RULE_NODEMANAGER_ERROR_THRESHOLD: integer value. |
Hadoop-Applications-In-Error | Checks the number of applications with errors. HADOOP_RULE_APPLICATION_ERROR_THRESHOLD: integer value. |
Hadoop-SecondaryNamenode-Status | Checks the connection status of the JMX plug-in to the Secondarynamenode service. |
Hadoop-NodeManager-State | Checks the state of the nodemanager. HADOOP_RULE_NODEMANAGER_UNHEALTHY: default UNHEALTHY. |
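The thresholds above are simple numeric comparisons over dataview cells. As an illustration of the intent of the Hadoop-DataNode-Disk-Remaining rule (the real check is a Gateway rule, not Python), the sketch below derives a remaining-disk percentage from a datanode's capacity figures and compares it with the threshold variable; the figures are made up for the example.

```python
# Illustration of the disk-remaining comparison made by the sample rule; the
# actual check is expressed as a Gateway rule over dataview cells.
HADOOP_RULE_DISK_REMAINING_THRESHOLD = 20.0   # percent; possible values 1.0 - 100

def disk_remaining_percent(remaining_gb, capacity_gb):
    """Remaining capacity of a datanode as a percentage of its raw capacity."""
    if capacity_gb <= 0:
        return 0.0
    return 100.0 * remaining_gb / capacity_gb

# Example figures as they might appear in the Hadoop-HDFS-DatanodesSummary dataview.
remaining = disk_remaining_percent(remaining_gb=35.0, capacity_gb=250.0)   # 14.0%

if remaining < HADOOP_RULE_DISK_REMAINING_THRESHOLD:
    print("WARNING: only {0:.1f}% disk remaining on this datanode".format(remaining))
```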
Metrics and dataviews
Hadoop HDFS namenode info
Column Name | Description |
---|---|
Name | Name of the service. |
SoftwareVersion | Namenode service version. |
SecurityEnabled | Indicates whether security is enabled on the Namenode. |
State | Namenode service active state. |
Hadoop HDFS namenode cluster
Column Name | Description |
---|---|
Name | Name of the service. |
CapacityUsedGB | Current used capacity across all datanodes. |
CapacityRemainingGB | Current remaining capacity. |
CapacityTotalGB | Current raw capacity. |
FilesTotal | Number of files and directories. |
TotalLoad | Number of connections. |
NumLiveDataNodes | Number of live datanodes. |
NumStaleDataNodes | Number of datanodes marked stale due to delayed heartbeat. |
NumDeadDataNodes | Number of dead datanodes. |
BlocksTotal | Number of allocated blocks in the system. |
BlockCapacity | Block capacity of the system. |
CorruptBlocks | Number of blocks with corrupt replicas. |
UnderReplicatedBlocks | Number of under-replicated blocks. |
MissingBlocks | Number of missing blocks. |
Hadoop HDFS secondaryNamenode info
Column Name | Description |
---|---|
Name | Name of the service. |
CheckpointDirectories | Secondarynamenode checkpoint directories. |
CheckpointEditlogDirectories | Secondarynamenode checkpoint edit log directories. |
SoftwareVersion | Secondarynamenode service version. |
Hadoop HDFS datanodes summary
The number of rows displayed is equal to the number of datanodes set in the whole cluster.
Column Name | Description |
---|---|
name | Datanode name and (dfs) port address. |
infoAddr | Datanode Web UI address. |
usedSpaceGB | Datanode used capacity. |
nonDfsUsedSpaceGB | Datanode non-dfs used capacity. |
capacityGB | Datanode raw capacity. |
remainingGB | Datanode remaining capacity. |
numBlocks | Number of blocks in the datanode. |
version | Datanode service version. |
volFails | Number of failed volumes in the datanode. |
Hadoop YARN resource manager
Column Name | Description |
---|---|
Name | Name of the service. |
NumActiveNMs | Number of active nodemanagers. |
NumDecommissionedNMs | Number of decommissioned nodemanagers. |
NumLostNMs | Number of nodemanagers lost due to missed heartbeats. |
NumUnhealthyNMs | Number of unhealthy nodemanagers. |
AppsRunning | Number of running applications. |
AppsFailed | Total number of failed applications. |
AllocatedMB | Current allocated memory in MB. |
AvailableMB | Available memory in MB. |
Hadoop YARN nodeManagers summary
Column Name | Description |
---|---|
Hostname | Hostname where the nodemanager service is running. |
State | Current nodemanager state. |
NodeID | Nodemanager node ID. |
NodeHTTPAdress | Nodemanager web UI address. |
NodeManagerVersion | Nodemanager service version. |
HealthReport | Nodemanager health report. |
Note
The number of rows displayed is equal to the number of nodemanagers running.
Hadoop node metrics dataview
Hadoop HDFS datanode volume info
Column Name | Description |
---|---|
dir | Path of volume directory. |
numBlocks | Current number of blocks in the datanode volume. |
usedSpace | Used space in the datanode volume. |
freeSpace | Free space in the datanode volume. |
reservedSpace | Reserved space for datanode volume. |
storageType | Type of datanode volume storage. |
reservedSpaceForReplicas | Reserved space for replicas. |
Note
The number of rows displayed is equal to the number of volumes in a single datanode.