Metric | Description | Example Value |
job_records_in_per_second | The total number of records the job receives from all sources per second. | 22478.14 Record/s |
job_records_out_per_second | The total number of records the job emits to all sinks per second. | 12017.09 Record/s |
job_bytes_in_per_second | The total number of bytes the job receives from all sources (Kafka sources only) per second. | 786576 Byte/s |
job_bytes_out_per_second | The total number of bytes the job emits to all sinks (Kafka sinks only) per second. | 156872 Byte/s |
job latency | The total time data takes to flow through all operators. Sampling errors may exist, so the value is for reference only. | 275 ms |
job_service_delay | The difference between the current timestamp and the watermark at the sink (if there are multiple sinks, the maximum difference is used). | 5432 ms |
job_cpu_load | The average CPU utilization of all TaskManagers of the job. | 23.85% |
taskmanager_status_jvm_memory_heap_used_percentage | The average heap memory utilization of all TaskManagers of the job. | 57.12% |
taskmanager_status_jvm_memory_heap_used | The total heap memory used of all TaskManagers of the job. | 830897056.00 Bytes |
taskmanager_memory_heap_committed | The total heap memory committed of all TaskManagers of the job. | 4937220096.00 Bytes |
taskmanager_memory_heap_max | The total max heap memory of all TaskManagers of the job. | 4937220096.00 Bytes |
taskmanager_status_jvm_memory_nonheap_used | The total non-heap memory (JVM metaspace and code cache) used of all TaskManagers of the job. | 296651064.00 Bytes |
taskmanager_memory_nonheap_committed | The total non-heap memory (JVM metaspace and code cache) committed of all TaskManagers of the job. | 103219200.00 Bytes |
taskmanager_status_jvm_memory_nonheap_max | The total max non-heap memory (JVM metaspace and code cache) of all TaskManagers of the job. | 780140544.00 Bytes |
taskmanager_status_jvm_memory_process_memoryused | The maximum JVM process memory (RSS) among all TaskManagers of the job, including heap, non-heap, native, and other areas. This metric gives early warning of OOM Killed events in a Pod. | 3597035110.00 Bytes |
taskmanager_memory_direct_count | The total number of buffers in the direct buffer pools of all TaskManagers of the job. | 10993.00 Items |
taskmanager_memory_direct_used | The total memory used by the direct buffer pools of all TaskManagers of the job. | 360328431.00 Bytes |
taskmanager_memory_direct_max | The total maximum capacity of the direct buffer pools of all TaskManagers of the job. | 360328431.00 Bytes |
taskmanager_memory_mapped_count | The total number of buffers in the mapped buffer pools of all TaskManagers of the job. | 4 Items |
taskmanager_memory_mapped_used | The total memory used by the mapped buffer pools of all TaskManagers of the job. | 33554432.00 Bytes |
taskmanager_memory_mapped_max | The total maximum capacity of the mapped buffer pools of all TaskManagers of the job. | 33554432.00 Bytes |
jobmanager_jvm_old_gc_count | The old GC count of the JobManager of the job. | 3.00 Times |
jobmanager_jvm_old_gc_time | The old GC time of the JobManager of the job. | 701.00 ms |
jobmanager_jvm_young_gc_count | The young GC count of the JobManager of the job. | 53.00 Times |
jobmanager_jvm_young_gc_time | The young GC time of the JobManager of the job. | 4094.00 ms |
job_lastcheckpointduration | The time taken to make the last checkpoint of the job. | 723.00 ms |
job_lastcheckpointsize | The size of the last checkpoint of the job. | 751321.00 Bytes |
taskmanager_jvm_old_gc_count | The sum of old GC counts of all TaskManagers of the job. | 9.00 Times |
taskmanager_jvm_old_gc_time | The sum of old GC time of all TaskManagers of the job. | 2014.00 ms |
taskmanager_jvm_young_gc_count | The sum of young GC counts of all TaskManagers of the job. | 889.00 Times |
taskmanager_jvm_young_gc_time | The sum of young GC time of all TaskManagers of the job. | 15051.00 ms |
job_numberofcompletedcheckpoints | The number of successful checkpoints of the job. | 11.00 Times |
job_numberoffailedcheckpoints | The number of failed checkpoints of the job. | 1.00 Time |
job_numberofinprogresscheckpoints | The number of checkpoints in progress (not completed) of the job. | 1.00 Time |
job_totalnumberofcheckpoints | The total number of checkpoints (in progress, completed, and failed) of the job. | 13.00 Times |
job_numrecordsinbutfailed | The number of records that failed in the operator (for example, by raising exceptions). If the value is greater than 1, Exactly-Once semantics are affected. This is a testing metric, for reference only. | 0.00 Times |
jobmanager_job_numrestarts | The number of job restarts caused by crashes, as recorded by the JobManager of the job. Restarts that follow a JobManager exit are not counted. | 10.00 Times |
jobmanager_status_jvm_memory_heap_used_percentage | The heap memory utilization of the JobManager of the job. | 31.34% |
jobmanager_memory_heap_used | The heap memory used of the JobManager of the job. | 1040001560.00 Bytes |
jobmanager_memory_heap_committed | The heap memory committed of the JobManager of the job. | 3318218752.00 Bytes |
jobmanager_memory_heap_max | The max heap memory of the JobManager of the job. | 3318218752.00 Bytes |
jobmanager_status_jvm_memory_nonheap_used | The non-heap memory (JVM metaspace and code cache) used of the JobManager of the job. | 117362656.00 Bytes |
jobmanager_memory_nonheap_committed | The non-heap memory (JVM metaspace and code cache) committed of the JobManager of the job. | 122183680.00 Bytes |
jobmanager_status_jvm_memory_nonheap_max | The max non-heap memory (JVM metaspace and code cache) of the JobManager of the job. | 780140544.00 Bytes |
jobmanager_status_jvm_memory_used | The JVM process memory (RSS) used by the JobManager of the job, including heap, non-heap, native, and other areas. This metric gives early warning of OOM Killed events in a Pod. | 3597035110.00 Bytes |
jobmanager_cpu_load | The CPU utilization of the JobManager of the job. | 7.12% |
jobmanager_cpu_time | The CPU service time (ms) of the JobManager of the job. | 834490.00 ms |
jobmanager_downtime | For a non-running (failed or recovering) job, the duration of this downtime; for a running job, the value of this metric is 0. | 1088466.00 ms |
job_uptime | For a running job, the duration of continuous running of this job without interruption. | 202305.00 ms |
job_restartingtime | The time taken for the last restart of the job. | 197181.00 ms |
jobmanager_lastcheckpointrestoretimestamp | The Unix timestamp (in ms) of the last job recovery from a checkpoint. The value is -1 if no recovery has been performed. | 1621934344137.00 ms |
jobmanager_memory_mapped_count | The number of buffers in the mapped buffer pool of the JobManager of the job. | 4.00 Items |
jobmanager_memory_mapped_memoryused | The mapped buffer pool used of the JobManager of the job. | 33554432.00 Bytes |
jobmanager_memory_mapped_totalcapacity | The max mapped buffer pool of the JobManager of the job. | 33554432.00 Bytes |
jobmanager_memory_direct_count | The number of buffers in the direct buffer pool of the JobManager of the job. | 22.00 Items |
jobmanager_memory_direct_memoryused | The direct buffer pool used of the JobManager of the job. | 575767.00 Bytes |
jobmanager_memory_direct_totalcapacity | The max direct buffer pool of the JobManager of the job. | 577814.00 Bytes |
jobmanager_numregisteredtaskmanagers | The number of registered TaskManagers of the job, which is generally equal to the max operator parallelism. A decline in this number indicates that some TaskManagers have disconnected, and the job may crash and try to recover. | 3.00 TaskManagers |
jobmanager_numrunningjobs | The number of running jobs: 1 means the job is running properly, and 0 means the job has crashed. | 1.00 Job |
jobmanager_taskslotsavailable | The number of task slots available: 0 means the job is running properly, and a nonzero value means the job may not be running for a short period of time. | 0.00 Slots |
jobmanager_taskslotstotal | In Stream Compute Service, a TaskManager has only one task slot, so the total number of task slots is equal to the number of registered TaskManagers. | 3.00 Slots |
jobmanager_threads_count | The number of active threads in the JobManager of the job, including daemon and non-daemon threads. | 77.00 Threads |
taskmanager_cpu_time | The CPU service time (ms) of all TaskManagers of the job. | 2029230.00 ms |
taskmanager_network_availablememorysegments | The sum of memory segments available in all TaskManagers of the job. | 32890.00 Items |
taskmanager_network_totalmemorysegments | The sum of total memory segments assigned to all TaskManagers of the job. | 32931.00 Items |
taskmanager_threads_count | The total number of active threads in all TaskManagers of the job, including daemon and non-daemon threads. | 207.00 Threads |
job_lastcheckpointsize | The size of the last checkpoint. | 1,024 Bytes |
job_lastcheckpointduration | The time taken to make the last checkpoint. | 100 ms |
job_numberoffailedcheckpoints | The number of failed checkpoints. | 50 Times |
JM CPU Load | The JVM CPU utilization of the JobManager. | 12% |
JM Heap Memory | The heap memory usage of the JobManager. | 50 Bytes |
JM GC Count | Status.JVM.GarbageCollector.<GarbageCollector>.Count of the JobManager, representing the GC count of the JobManager. | 5 times |
JM GC Time | Status.JVM.GarbageCollector.<GarbageCollector>.Time of the JobManager, representing the GC time of the JobManager. | 64 ms |
TaskManager CPU Load | The JVM CPU utilization of the selected TaskManager. | 70% |
TaskManager Heap Memory | The heap memory usage of the selected TaskManager. | 50 Bytes |
TaskManager GC Count | Status.JVM.GarbageCollector.<GarbageCollector>.Count of the selected TaskManager, representing the GC count of the TaskManager. | 5 times |
TaskManager GC Time | Status.JVM.GarbageCollector.<GarbageCollector>.Time of the selected TaskManager, representing the GC time of the TaskManager. | 5 ms |
Task OutPoolUsage | The usage percentage of the task's output queues (output buffer pool). When this metric reaches 100%, the task is backpressured. | 64% |
Task OutputQueueLength | The number of queued output buffers. | 6 |
Task InPoolUsage | The usage percentage of the task's input queues (input buffer pool). When this metric reaches 100%, the task is backpressured. | 64% |
Task InputQueueLength | The number of queued input buffers. | 6 |
Task CurrentInputWatermark | The current watermark of the task. | 1623814418 |
Data import time (ETL) | The delay of a source in fetching data for the job. | 10 ms |
job_records_in_per_second (ETL) | The total rate of all sources in the job. | 342 Records/s |
SourceIdleTime (ETL) | The interval between data batches processed by a source in the job, which indirectly reflects the idle time of the source. | 24532223 ms |
SynDelay (ETL) | The delay of a source in fetching and processing data for the job. | 1345 ms |
BinLogPos (ETL) | The MySQL binary log coordinates or PostgreSQL log sequence number (LSN) of the job. | 260690147 |
job latency (ETL) | The average delay between the sink and source operators of the job. | 49 ms |
DbFlushDelay (ETL) | The sum of the database flush delay and async callback time of the job. | 30 ms |
job_records_out_per_second (ETL) | The total rate of all sinks in the job. | 234 Records/s |
Source - full sync (ETL) | The full data sync progress of the job. | 30% |
Source - incremental sync (ETL) | For MySQL, sync delay refers to the gap between the binlog coordinates of the current source and the latest binlog coordinates of the MySQL instance source collected in the last sampling; for PostgreSQL, sync delay refers to the gap between the LSN of the current source and the latest LSN of the PostgreSQL instance source collected in the last sampling. | 205 |
Kafka - records_lag max | The maximum of kafka-lag-max (the difference between the Kafka producer and consumer offsets) reported by the TaskManager. | 100 |
Kafka - records_lag min | The minimum of kafka-lag-max (the difference between the Kafka producer and consumer offsets) reported by the TaskManager. | 50 |
Kafka - records_lag mean | The mean of kafka-lag-max (the difference between the Kafka producer and consumer offsets) reported by the TaskManager. | 80 |
Kafka - records_lag sum | The sum of kafka-lag-max (the difference between the Kafka producer and consumer offsets) reported by the TaskManager. | 500 |
CurrentFetchEventtimeLag (ms) | Formula: FetchTime (the time the source fetches the data) − EventTime (data event time). This metric reflects the retention of data in the external system. | 10 |
CurrentEmitEventtimeLag (ms) | Formula: EmitTime (the time the data leaves the source) − EventTime (data event time). This metric reflects the retention of data between the external system and the Source. | 20 |
taskmanager_job_task_backpressuredtimemspersecond (%) | The maximum of all subtask backpressure percentages in the job. | 30% |
taskmanager_job_task_dataskewcoefficient | The coefficient of variation (standard deviation / mean) of the subtask inputs of the job. A value below 10% indicates weak skew. | 10% |
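
Several metrics in the table are defined by a simple formula rather than read directly from the JVM. The following is a minimal Python sketch of three of them, written only from the descriptions above: the data skew coefficient (standard deviation / mean), job_service_delay (the largest gap between the current timestamp and a sink watermark), and the records_lag aggregations (max, min, mean, sum). The function names and input shapes are hypothetical, and the use of the population standard deviation is an assumption, since the table does not say which variant the service uses.

```python
from statistics import mean, pstdev


def data_skew_coefficient(subtask_inputs):
    """Coefficient of variation (standard deviation / mean) of subtask inputs.

    Assumption: population standard deviation; the table does not specify.
    """
    avg = mean(subtask_inputs)
    if avg == 0:
        # Assumption: subtasks that received nothing are treated as not skewed.
        return 0.0
    return pstdev(subtask_inputs) / avg


def job_service_delay(now_ms, sink_watermarks_ms):
    """Largest difference between the current timestamp and any sink watermark."""
    return max(now_ms - watermark for watermark in sink_watermarks_ms)


def records_lag_summary(lag_per_taskmanager):
    """max / min / mean / sum over the lag values reported per TaskManager."""
    return {
        "max": max(lag_per_taskmanager),
        "min": min(lag_per_taskmanager),
        "mean": mean(lag_per_taskmanager),
        "sum": sum(lag_per_taskmanager),
    }


if __name__ == "__main__":
    # Two subtasks receiving 90 and 110 records: mean 100, deviation 10.
    print(data_skew_coefficient([90, 110]))
    # Two sinks whose watermarks trail the current time by 1000 ms and 5432 ms.
    print(job_service_delay(10_000, [9_000, 4_568]))
    print(records_lag_summary([100, 50, 90]))
```

Under these assumptions, a skew coefficient of 0.1 corresponds to the 10% threshold mentioned for taskmanager_job_task_dataskewcoefficient, and the second call reproduces the 5432 ms example value shown for job_service_delay.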