Title | Metric | Unit | Description |
Nodes | NumActiveNMs | - | Number of live NodeManagers |
| NumDecommissionedNMs | - | Number of decommissioned NodeManagers |
| NumLostNMs | - | Number of lost NodeManagers |
| NumUnhealthyNMs | - | Number of unhealthy NodeManagers |
CPU cores | AllocatedVCores | - | Number of allocated VCores in the current queue |
| ReservedVCores | - | Number of reserved VCores in the current queue |
| AvailableVCores | - | Number of available VCores in the current queue |
| PendingVCores | - | Number of pending VCores in resource requests in the current queue |
Total applications | AppsSubmitted | - | Number of submitted jobs in the current queue |
| AppsRunning | - | Number of running jobs in the current queue |
| AppsPending | - | Number of pending jobs in the current queue |
| AppsCompleted | - | Number of completed jobs in the current queue |
| AppsKilled | - | Number of killed jobs in the current queue |
| AppsFailed | - | Number of failed jobs in the current queue |
| ActiveApplications | - | Number of active jobs in the current queue |
| running_0 | - | Number of running jobs in the current queue that have run for less than 60 minutes |
| running_60 | - | Number of running jobs in the current queue that have run for 60–300 minutes |
| running_300 | - | Number of running jobs in the current queue that have run for 300–1,440 minutes |
| running_1440 | - | Number of running jobs in the current queue that have run for more than 1,440 minutes |
Memory size | AllocatedMB | MB | Amount of allocated memory in the current queue |
| AvailableMB | MB | Amount of available memory in the current queue |
| PendingMB | MB | Amount of pending memory in resource requests in the current queue |
| ReservedMB | MB | Amount of reserved memory in the current queue |
Containers | AllocatedContainers | - | Number of allocated containers in the current queue |
| PendingContainers | - | Number of pending containers in resource requests in the current queue |
| ReservedContainers | - | Number of reserved containers in the current queue |
Total allocated/released containers | AggregateContainersAllocated | - | Total number of allocated containers in the current queue |
| AggregateContainersReleased | - | Total number of released containers in the current queue |
Users | ActiveUsers | - | Number of active users in the current queue |
Memory | allocatedMB | MB | Amount of allocated memory in the cluster |
| availableMB | MB | Amount of available memory in the cluster |
| reservedMB | MB | Amount of reserved memory in the cluster |
| totalMB | MB | Total amount of memory in the cluster |
Applications | completed | - | Number of completed jobs in the cluster during the statistical period |
| failed | - | Number of failed jobs in the cluster during the statistical period |
| killed | - | Number of killed jobs in the cluster during the statistical period |
| pending | - | Number of pending jobs in the cluster during the statistical period |
| running | - | Number of running jobs in the cluster during the statistical period |
| submitted | - | Number of submitted jobs in the cluster during the statistical period |
Containers | containersAllocated | - | Number of allocated containers in the cluster |
| containersPending | - | Number of pending containers in the cluster |
| containersReserved | - | Number of reserved containers in the cluster |
Memory utilization | usageRatio | % | Current memory utilization of the cluster |
Cores | allocatedVirtualCores | - | Number of allocated CPU cores in the cluster |
| availableVirtualCores | - | Number of available CPU cores in the cluster |
| reservedVirtualCores | - | Number of reserved CPU cores in the cluster |
| totalVirtualCores | - | Total number of CPU cores in the cluster |
CPU utilization | usageRatio | % | Current CPU utilization of the cluster |
Launched AMs | AMLaunchDelayNumOps | - | Number of launched AMs |
Average time for RM to launch AM | AMLaunchDelayAvgTime | ms | Average time for RM to launch AM |
Total registered AMs | AMRegisterDelayNumOps | - | Total number of registered AMs |
Average time for AM to register with RM | AMRegisterDelayAvgTime | ms | Average time for AM to register with RM |
Queue CPU utilization | YARN.RM.QUEUE.VCORES.RATIO | - | Utilization of CPU allocated for the current queue |
Queue memory utilization | YARN.RM.QUEUE.MEM.RATIO | - | Utilization of memory allocated for the current queue |
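As a rough illustration of how some rows above relate to each other, the sketch below derives a utilization percentage from an allocated and a total value, and maps a job's elapsed run time to one of the `running_*` buckets. It rests on two assumptions the table does not state: that `usageRatio` is allocated divided by total, and that each bucket includes its lower bound. The function names `usage_ratio` and `running_bucket` are hypothetical.

```python
def usage_ratio(allocated: float, total: float) -> float:
    """Return utilization as a percentage (assumed: allocated / total)."""
    if total <= 0:
        # Avoid division by zero when no capacity is reported.
        return 0.0
    return 100.0 * allocated / total


def running_bucket(elapsed_minutes: float) -> str:
    """Map elapsed run time in minutes to the running_* metric that would count the job.

    Assumption: each bucket includes its lower bound (60 falls in running_60).
    """
    if elapsed_minutes < 60:
        return "running_0"
    if elapsed_minutes < 300:
        return "running_60"
    if elapsed_minutes < 1440:
        return "running_300"
    return "running_1440"
```

For example, `usage_ratio(allocatedMB, totalMB)` would give a memory figure comparable to `usageRatio` if the assumption above holds for your cluster.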
Title | Metric | Unit | Description |
RPC authentications/authorizations | RpcAuthenticationFailures | - | Number of failed RPC authentications |
| RpcAuthenticationSuccesses | - | Number of successful RPC authentications |
| RpcAuthorizationFailures | - | Number of failed RPC authorizations |
| RpcAuthorizationSuccesses | - | Number of successful RPC authorizations |
Data received/sent by RPC | ReceivedBytes | bytes/s | Amount of data received by RPC |
| SentBytes | bytes/s | Amount of data sent by RPC |
RPC connections | NumOpenConnections | - | Current number of open connections |
RPC requests | RpcProcessingTimeNumOps | - | Number of RPC requests counted for processing time |
| RpcQueueTimeNumOps | - | Number of RPC requests counted for queue time |
RPC queue length | CallQueueLength | - | Length of the current RPC queue |
Average RPC processing time | RpcProcessingTimeAvgTime | s | Average RPC request processing time |
| RpcQueueTimeAvgTime | s | Average time an RPC request spends in the queue |
GC count | YGC | - | Young GC count |
| FGC | - | Full GC count |
GC time | FGCT | s | Full GC time |
| GCT | s | Total GC time |
| YGCT | s | Young GC time |
Memory zone proportion | S0 | % | Percentage of used Survivor 0 memory |
| E | % | Percentage of used Eden memory |
| CCS | % | Percentage of used compressed class space memory |
| S1 | % | Percentage of used Survivor 1 memory |
| O | % | Percentage of used Old memory |
| M | % | Percentage of used Metaspace memory |
JVM threads | ThreadsNew | - | Number of threads in NEW status |
| ThreadsRunnable | - | Number of threads in RUNNABLE status |
| ThreadsBlocked | - | Number of threads in BLOCKED status |
| ThreadsWaiting | - | Number of threads in WAITING status |
| ThreadsTimedWaiting | - | Number of threads in TIMED_WAITING status |
| ThreadsTerminated | - | Number of threads in TERMINATED status |
JVM logs | LogFatal | - | Number of FATAL-level logs |
| LogError | - | Number of ERROR-level logs |
| LogWarn | - | Number of WARN-level logs |
| LogInfo | - | Number of INFO-level logs |
JVM memory | MemNonHeapUsedM | MB | Non-heap memory size used by process |
| MemNonHeapCommittedM | MB | Non-heap memory size committed to process |
| MemHeapUsedM | MB | Heap memory size used by process |
| MemHeapCommittedM | MB | Heap memory size committed to process |
| MemHeapMaxM | MB | Maximum heap memory size available to process |
| MemMaxM | MB | Maximum memory size available to process |
CPU utilization | ProcessCpuLoad | % | CPU utilization |
Cumulative CPU usage time | ProcessCpuTime | ms | Cumulative CPU usage time |
File descriptors | MaxFileDescriptorCount | - | Maximum number of file descriptors |
| OpenFileDescriptorCount | - | Number of opened file descriptors |
Process execution duration | Uptime | s | Process execution duration |
Worker threads | DaemonThreadCount | - | Number of daemon threads in the process |
| ThreadCount | - | Number of threads in the process |
Node status | haState | - | ResourceManager active/standby status (1: active, 0: standby) |
Active/Standby switch | switchOccurred | - | Whether a ResourceManager active/standby switch occurred |
Title | Metric | Unit | Description |
JVM threads | ThreadsNew | - | Number of threads in NEW status |
| ThreadsRunnable | - | Number of threads in RUNNABLE status |
| ThreadsBlocked | - | Number of threads in BLOCKED status |
| ThreadsWaiting | - | Number of threads in WAITING status |
| ThreadsTimedWaiting | - | Number of threads in TIMED_WAITING status |
| ThreadsTerminated | - | Number of threads in TERMINATED status |
JVM logs | LogFatal | - | Number of FATAL-level logs |
| LogError | - | Number of ERROR-level logs |
| LogWarn | - | Number of WARN-level logs |
| LogInfo | - | Number of INFO-level logs |
JVM memory | MemNonHeapUsedM | MB | Non-heap memory size used by process |
| MemNonHeapCommittedM | MB | Non-heap memory size committed to process |
| MemHeapUsedM | MB | Heap memory size used by process |
| MemHeapCommittedM | MB | Heap memory size committed to process |
| MemHeapMaxM | MB | Maximum heap memory size available to process |
| MemMaxM | MB | Maximum memory size available to process |
GC count | YGC | - | Young GC count |
| FGC | - | Full GC count |
GC time | FGCT | s | Full GC time |
| GCT | s | Total GC time |
| YGCT | s | Young GC time |
Memory zone proportion | S0 | % | Percentage of used Survivor 0 memory |
| E | % | Percentage of used Eden memory |
| CCS | % | Percentage of used compressed class space memory |
| S1 | % | Percentage of used Survivor 1 memory |
| O | % | Percentage of used Old memory |
| M | % | Percentage of used Metaspace memory |
CPU utilization | ProcessCpuLoad | % | CPU utilization |
Cumulative CPU usage time | ProcessCpuTime | ms | Cumulative CPU usage time |
File descriptors | MaxFileDescriptorCount | - | Maximum number of file descriptors |
| OpenFileDescriptorCount | - | Number of opened file descriptors |
Process execution duration | Uptime | s | Process execution duration |
Worker threads | DaemonThreadCount | - | Number of daemon threads in the process |
| ThreadCount | - | Number of threads in the process |
Title | Metric | Unit | Description |
GC count | YGC | - | Young GC count |
| FGC | - | Full GC count |
GC time | FGCT | s | Full GC time |
| GCT | s | Total GC time |
| YGCT | s | Young GC time |
Memory zone proportion | S0 | % | Percentage of used Survivor 0 memory |
| E | % | Percentage of used Eden memory |
| CCS | % | Percentage of used compressed class space memory |
| S1 | % | Percentage of used Survivor 1 memory |
| O | % | Percentage of used Old memory |
| M | % | Percentage of used Metaspace memory |
JVM threads | ThreadsNew | - | Number of threads in NEW status |
| ThreadsRunnable | - | Number of threads in RUNNABLE status |
| ThreadsBlocked | - | Number of threads in BLOCKED status |
| ThreadsWaiting | - | Number of threads in WAITING status |
| ThreadsTimedWaiting | - | Number of threads in TIMED_WAITING status |
| ThreadsTerminated | - | Number of threads in TERMINATED status |
JVM logs | LogFatal | - | Number of FATAL-level logs |
| LogError | - | Number of ERROR-level logs |
| LogWarn | - | Number of WARN-level logs |
| LogInfo | - | Number of INFO-level logs |
JVM memory | MemNonHeapUsedM | MB | Non-heap memory size used by process |
| MemNonHeapCommittedM | MB | Non-heap memory size committed to process |
| MemHeapUsedM | MB | Heap memory size used by process |
| MemHeapCommittedM | MB | Heap memory size committed to process |
| MemHeapMaxM | MB | Maximum heap memory size available to process |
| MemMaxM | MB | Maximum memory size available to process |
Total containers | ContainersLaunched | - | Number of launched containers |
| ContainersCompleted | - | Number of completed containers |
| ContainersFailed | - | Number of failed containers |
| ContainersKilled | - | Number of killed containers |
| ContainersIniting | - | Number of containers being initialized |
| ContainersRunning | - | Number of running containers |
| AllocatedContainers | - | Number of containers allocated by NodeManager |
Average container launch time | ContainerLaunchDurationAvgTime | ms | Average container launch time |
Container launches | ContainerLaunchDurationNumOps | - | Number of container launches |
CPU cores | AvailableVCores | - | Number of VCores available to NodeManager |
| AllocatedVCores | - | Number of VCores allocated by NodeManager |
Memory size | AllocatedGB | GB | Amount of memory allocated by NodeManager |
| AvailableGB | GB | Amount of memory available to NodeManager |
CPU utilization | ProcessCpuLoad | % | CPU utilization |
Cumulative CPU usage time | ProcessCpuTime | ms | Cumulative CPU usage time |
File descriptors | MaxFileDescriptorCount | - | Maximum number of file descriptors |
| OpenFileDescriptorCount | - | Number of opened file descriptors |
Process execution duration | Uptime | s | Process execution duration |
Worker threads | DaemonThreadCount | - | Number of daemon threads in the process |
| ThreadCount | - | Number of threads in the process |
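The GC count, GC time, and memory zone names in the tables above (S0, S1, E, O, M, CCS, YGC, YGCT, FGC, FGCT, GCT) match the column headers printed by the JDK's `jstat -gcutil` tool. Whether the monitoring system actually collects them that way is an assumption, but to cross-check a value by hand, a minimal sketch for parsing that output might look like this. The function name `parse_gcutil` and the sample values are hypothetical.

```python
def parse_gcutil(output: str) -> dict:
    """Parse the header row and first data row of `jstat -gcutil <pid>` output."""
    lines = [ln for ln in output.strip().splitlines() if ln.strip()]
    header, values = lines[0].split(), lines[1].split()
    # Some JDK versions print "-" when a column has no value; map that to None.
    return {k: (None if v == "-" else float(v)) for k, v in zip(header, values)}


# Hypothetical sample, laid out like jstat -gcutil output.
SAMPLE = """\
  S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT
  0.00  96.88  41.27  12.35  97.11  94.02     14    0.210     2    0.180    0.390
"""

metrics = parse_gcutil(SAMPLE)
```

Here `metrics["O"]` corresponds to the "Percentage of used Old memory" row and `metrics["GCT"]` to the GC time in seconds.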