Title | Metric | Unit | Description |
Tablets | TabletRunning | - | Total number of tablets currently running on all tablet servers |
Difference in the number of tablet replicas | ClusterReplicaSkew | - | Difference between the number of replicas on the tablet server hosting the most replicas and the number of replicas on the tablet server hosting the fewest replicas |
TServer threads | ThreadsRunning | - | Number of threads currently running on all tablet servers |
Master threads | ThreadsRunning | - | Number of threads currently running on all masters |
TServer logs | ErrorMessages | - | Number of ERROR-level log messages emitted in all processes |
Master logs | ErrorMessages | - | Number of ERROR-level log messages emitted in all processes |
| WarningMessages | - | Number of WARNING-level log messages emitted in all processes |
Oversized write requests | OversizedWriteRequests | - | Number of oversized write requests to the system catalog tablet rejected by the master since start |
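The ClusterReplicaSkew description above is a plain max-minus-min calculation. A minimal sketch of that arithmetic follows, assuming per-server replica counts are already available as a mapping; the function name and the sample server names and counts are hypothetical, not part of any monitoring API.

```python
from typing import Mapping


def cluster_replica_skew(replicas_per_tserver: Mapping[str, int]) -> int:
    """Return the replica count on the fullest tablet server minus the count on the emptiest."""
    if not replicas_per_tserver:
        # Assumption: a cluster with no tablet servers is reported as zero skew.
        return 0
    counts = list(replicas_per_tserver.values())
    return max(counts) - min(counts)


# Hypothetical replica counts keyed by tablet server name.
sample = {"tserver-1": 120, "tserver-2": 95, "tserver-3": 110}
print(cluster_replica_skew(sample))  # 25
```

A value near zero suggests replicas are evenly spread; a value that keeps growing may indicate that rebalancing is worth investigating.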
Title | Metric | Unit | Description |
Block cache hit | BlockCacheHit | - | Number of block cache hits. To evaluate cache efficiency, use this metric rather than cache_hits |
| BlockCacheMiss | - | Number of block cache misses. To evaluate cache efficiency, use this metric rather than cache_misses |
Block cache utilization | BlockCacheUsage | bytes | Memory used by block cache |
File cache hit | FileCacheHit | - | Number of file descriptor cache hits. To evaluate cache efficiency, use this metric rather than cache_hits |
| FileCacheMiss | - | Number of file descriptor cache misses. To evaluate cache efficiency, use this metric rather than cache_misses |
File cache utilization | FileCacheUsage | - | Number of entries in the file cache |
Scanner | ActiveScanners | - | Number of currently active scanners |
| ExpiredScanners | - | Number of scanners that have expired due to inactivity since service start |
Block manager blocks | BlockUnderManagement | - | Number of currently managed data blocks |
| BlockOpenReading | - | Number of data blocks currently opened for read |
| BlockOpenWriting | - | Number of data blocks currently opened for write |
Block manager bytes | BytesUnderManagement | bytes | Number of bytes of currently managed data blocks |
Block manager containers | ContainersUnderManagement | - | Number of log block containers |
| FullContainersUnderManagement | - | Number of full log block containers |
Tablet leaders | NumRaftLeaders | - | Number of tablet replicas that are Raft leaders |
Tablet sessions | OpenClientSessions | - | Number of currently opened tablet copy client sessions on this server |
| OpemSourceSessions | - | Number of currently opened tablet copy source sessions on this server |
Tablets | TabletBootstrapping | - | Number of currently bootstrapping tablets |
| TabletFailed | - | Number of failed tablets |
| TabletInitialized | - | Number of currently initialized tablets |
| TabletNotInitialized | - | Number of currently uninitialized tablets |
| TabletRunning | - | Number of currently running tablets |
| TabletShutdown | - | Number of currently shut down tablets |
| TabletStopped | - | Number of currently stopped tablets |
| TabletStopping | - | Number of currently stopping tablets |
CPU time | CpuStime | ms | Total system CPU time of process |
| CpuUtime | ms | Total user CPU time of process |
Data path | DataDirsFailed | - | Number of data directories whose disks are currently in failed status |
| DataDirsFull | - | Number of data directories whose disks are currently full |
Thread | ThreadsRunning | - | Number of currently running threads |
Context | InvoluntarySwitches | - | Total involuntary context switches |
| VoluntarySwitches | - | Total voluntary context switches |
Spinlock | SpinlockContentionTime | μs | Amount of time consumed by contention on internal spinlocks since server start |
Log information | ErrorMessages | - | Number of ERROR-level log messages emitted by the application |
| WarningMessages | - | Number of WARNING-level log messages emitted by the application |
Operations in queue | TotalCount | - | Total number of samples recorded for the queue length |
| Min | - | Minimum number of tasks waiting in the queue |
| Max | - | Maximum number of tasks waiting in the queue |
| Mean | - | Average number of tasks waiting in the queue |
| Percentile_99_9 | - | 99.9th percentile of the number of tasks waiting in the queue |
Operation execution duration | TotalCount | - | Total number of operations |
| Min | μs | Minimum run time |
| Max | μs | Maximum run time |
| Mean | μs | Average run time |
| Percentile_99_9 | μs | 99.9th percentile of the run time |
Queuing wait time | TotalCount | - | Total number of operations |
| Min | μs | Minimum wait time |
| Max | μs | Maximum wait time |
| Mean | μs | Average wait time |
| Percentile_99_9 | μs | 99.9th percentile of the wait time |
Allocated bytes | AllocatedBytes | bytes | Number of bytes used by the application. This usually does not match the memory usage reported by the operating system because it does not include TCMalloc overhead or memory fragmentation |
Hybrid clock error | HybridClockError | μs | Server clock maximum error; returns 2^64-1 when unable to read the base clock |
Hybrid clock timestamp | HybridClockTimestamp | μs | Hybrid clock timestamp; returns 2^64-1 when unable to read the base clock |
TCMalloc memory | HeapSize | bytes | Bytes of system memory reserved by TCMalloc |
| CurrentThreadCacheBytes | bytes | A measure of some of the memory TCMalloc is using (for small objects) |
| TotalThreadCacheBytes | bytes | A limit to how much memory TCMalloc dedicates for small objects |
TCMalloc page heap | FreeBytes | bytes | Number of bytes of free mapped pages in the page heap |
| UnMappedBytes | bytes | Number of bytes of free unmapped pages in the page heap |
RPC request | ConnectionsAccepted | - | Number of incoming TCP connections made to the RPC server |
| QueueOverflow | - | Number of RPCs dropped because the service queue was full |
| TimesOutInQueue | - | Number of RPCs that timed out while waiting in the service queue and thus were not processed |
RPC FetchData | TotalCount | - | Total number of operations |
| Min | μs | Minimum processing time |
| Max | μs | Maximum processing time |
| Mean | μs | Average processing time |
| Percentile_99_9 | μs | 99.9th percentile of the processing time |
RPC AlterSchema | TotalCount | - | Total number of operations |
| Min | μs | Minimum processing time |
| Max | μs | Maximum processing time |
| Mean | μs | Average processing time |
| Percentile_99_9 | μs | 99.9th percentile of the processing time |
RPC CreateTablet | TotalCount | - | Total number of operations |
| Min | μs | Minimum processing time |
| Max | μs | Maximum processing time |
| Mean | μs | Average processing time |
| Percentile_99_9 | μs | 99.9th percentile of the processing time |
RPC DeleteTablet | TotalCount | - | Total number of operations |
| Min | μs | Minimum processing time |
| Max | μs | Maximum processing time |
| Mean | μs | Average processing time |
| Percentile_99_9 | μs | 99.9th percentile of the processing time |
RPC Quiesce | TotalCount | - | Total number of operations |
| Min | μs | Minimum processing time |
| Max | μs | Maximum processing time |
| Mean | μs | Average processing time |
| Percentile_99_9 | μs | 99.9th percentile of the processing time |
RPC scan | TotalCount | - | Total number of operations |
| Min | μs | Minimum processing time |
| Max | μs | Maximum processing time |
| Mean | μs | Average processing time |
| Percentile_99_9 | μs | 99.9th percentile of the processing time |
RPC ScannerKeepAlive | TotalCount | - | Total number of operations |
| Min | μs | Minimum processing time |
| Max | μs | Maximum processing time |
| Mean | μs | Average processing time |
| Percentile_99_9 | μs | 99.9th percentile of the processing time |
RPC write | TotalCount | - | Total number of operations |
| Min | μs | Minimum processing time |
| Max | μs | Maximum processing time |
| Mean | μs | Average processing time |
| Percentile_99_9 | μs | 99.9th percentile of the processing time |
Write requests rejected due to queue overloading | QueueOverloadRejections | count | Number of write requests rejected due to queue overloading |
Scan rate | ScannedFromDiskRate | bytes/s | Amount of data scanned per second |
| ScannerReturnedRate | bytes/s | Amount of data returned per second |
Scanner bytes | ScannedFromDisk | bytes | Total amount of data scanned from disk |
| ScannerReturned | bytes | Total amount of returned data |
Total row operations | RowsInserted | count | Number of rows inserted into the node |
| RowsDeleted | count | Number of rows deleted from the node |
| RowsUpserted | count | Number of rows upserted into the node |
| RowsUpdated | count | Number of rows updated on the node |
Row operation rate | RowsInsertedRate | count/s | Number of rows inserted into the node per second |
| RowsDeletedRate | count/s | Number of rows deleted from the node per second |
| RowsUpsertedRate | count/s | Number of rows upserted into the node per second |
| RowsUpdatedRate | count/s | Number of rows updated on the node per second |
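The hit and miss counters in the table above can be combined into a hit ratio when evaluating cache efficiency. The following is a minimal sketch, assuming BlockCacheHit and BlockCacheMiss (or FileCacheHit and FileCacheMiss) are cumulative counters sampled at two points in time; the function names and sample values are hypothetical.

```python
from typing import Optional


def cache_hit_ratio(hits: int, misses: int) -> Optional[float]:
    """Return hits / (hits + misses), or None when there were no lookups."""
    total = hits + misses
    if hits < 0 or misses < 0 or total == 0:
        # Negative deltas suggest a counter reset (for example, a restart).
        return None
    return hits / total


def windowed_hit_ratio(prev_hits: int, prev_misses: int,
                       cur_hits: int, cur_misses: int) -> Optional[float]:
    """Hit ratio for the interval between two samples of cumulative counters."""
    return cache_hit_ratio(cur_hits - prev_hits, cur_misses - prev_misses)


# Hypothetical samples taken one collection interval apart.
print(windowed_hit_ratio(1000, 200, 1900, 300))  # 0.9
```

Working on deltas rather than raw totals keeps the ratio sensitive to recent behavior instead of averaging over the whole process lifetime.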
Title | Metric | Unit | Description |
Block cache hit | BlockCacheHit | - | Number of block cache hits. To evaluate cache efficiency, use this metric rather than cache_hits |
| BlockCacheMiss | - | Number of block cache misses. To evaluate cache efficiency, use this metric rather than cache_misses |
Block cache utilization | BlockCacheUsage | bytes | Memory used by block cache |
File cache hit | FileCacheHit | - | Number of file descriptor cache hits. To evaluate cache efficiency, use this metric rather than cache_hits |
| FileCacheMiss | - | Number of file descriptor cache misses. To evaluate cache efficiency, use this metric rather than cache_misses |
File cache utilization | FileCacheUsage | - | Number of entries in the file cache |
Block manager blocks | BlockUnderManagement | - | Number of currently managed data blocks |
| BlockOpenReading | - | Number of data blocks currently opened for read |
| BlockOpenWriting | - | Number of data blocks currently opened for write |
Block manager bytes | BytesUnderManagement | bytes | Number of bytes of currently managed data blocks |
Block manager containers | ContainersUnderManagement | - | Number of log block containers |
| FullContainersUnderManagement | - | Number of full log block containers |
CPU time | CpuStime | ms | Total system CPU time of process |
| CpuUtime | ms | Total user CPU time of process |
Thread | ThreadsRunning | - | Number of currently running threads |
Data path | DataDirsFailed | - | Number of data directories whose disks are currently in failed status |
| DataDirsFull | - | Number of data directories whose disks are currently full |
Allocated bytes | AllocatedBytes | bytes | Number of bytes used by the application. This usually does not match the memory usage reported by the operating system because it does not include TCMalloc overhead or memory fragmentation |
Log information | ErrorMessages | - | Number of ERROR-level log messages emitted by the application |
| WarningMessages | - | Number of WARNING-level log messages emitted by the application |
Context | InvoluntarySwitches | - | Total involuntary context switches |
| VoluntarySwitches | - | Total voluntary context switches |
Operations in queue | TotalCount | - | Total number of samples recorded for the queue length |
| Min | - | Minimum number of tasks waiting in the queue |
| Max | - | Maximum number of tasks waiting in the queue |
| Mean | - | Average number of tasks waiting in the queue |
| Percentile_99_9 | - | 99.9th percentile of the number of tasks waiting in the queue |
Queuing wait time | TotalCount | - | Total number of operations |
| Min | μs | Minimum wait time |
| Max | μs | Maximum wait time |
| Mean | μs | Average wait time |
| Percentile_99_9 | μs | 99.9th percentile of the wait time |
Operation execution duration | TotalCount | - | Total number of operations |
| Min | μs | Minimum run time |
| Max | μs | Maximum run time |
| Mean | μs | Average run time |
| Percentile_99_9 | μs | 99.9th percentile of the run time |
Spinlock | SpinlockContentionTime | μs | Amount of time consumed by contention on internal spinlocks since server start |
Oversized write requests | OversizedWriteRequests | - | Number of oversized write requests to the system catalog tablet rejected since start |
Hybrid clock error | HybridClockError | μs | Server clock maximum error; returns 2^64-1 when unable to read the base clock |
Hybrid clock timestamp | HybridClockTimestamp | μs | Hybrid clock timestamp; returns 2^64-1 when unable to read the base clock |
Difference in the number of tablet replicas | ClusterReplicaSkew | - | Difference between the number of replicas on the tablet server hosting the most replicas and the number of replicas on the tablet server hosting the fewest replicas |
Tablet leaders | NumRaftLeaders | - | Number of tablet replicas that are Raft leaders |
Tablet sessions | OpemSourceSessions | - | Number of currently opened tablet copy source sessions on this server |
TCMalloc memory | HeapSize | bytes | Bytes of system memory reserved by TCMalloc |
| CurrentThreadCacheBytes | bytes | A measure of some of the memory TCMalloc is using (for small objects) |
| TotalThreadCacheBytes | bytes | A limit to how much memory TCMalloc dedicates for small objects |
TCMalloc page heap | FreeBytes | bytes | Number of bytes of free mapped pages in the page heap |
| UnMappedBytes | bytes | Number of bytes of free unmapped pages in the page heap |
RPC request | ConnectionsAccepted | - | Number of incoming TCP connections made to the RPC server |
| QueueOverflow | - | Number of RPCs dropped because the service queue was full |
| TimesOutInQueue | - | Number of RPCs that timed out while waiting in the service queue and thus were not processed |
RPC RunLeaderElection | TotalCount | - | Total number of operations |
| Min | μs | Minimum processing time |
| Max | μs | Maximum processing time |
| Mean | μs | Average processing time |
| Percentile_99_9 | μs | 99.9th percentile of the processing time |
RPC ConnectToMaster | TotalCount | - | Total number of operations |
| Min | μs | Minimum processing time |
| Max | μs | Maximum processing time |
| Mean | μs | Average processing time |
| Percentile_99_9 | μs | 99.9th percentile of the processing time |
RPC Ping | TotalCount | - | Total number of operations |
| Min | μs | Minimum processing time |
| Max | μs | Maximum processing time |
| Mean | μs | Average processing time |
| Percentile_99_9 | μs | 99.9th percentile of the processing time |
RPC TSHeartbeat | TotalCount | - | Total number of operations |
| Min | μs | Minimum processing time |
| Max | μs | Maximum processing time |
| Mean | μs | Average processing time |
| Percentile_99_9 | μs | 99.9th percentile of the processing time |
RPC FetchData | TotalCount | - | Total number of operations |
| Min | μs | Minimum processing time |
| Max | μs | Maximum processing time |
| Mean | μs | Average processing time |
| Percentile_99_9 | μs | 99.9th percentile of the processing time |
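HybridClockError and HybridClockTimestamp above both return 2^64-1 when the base clock cannot be read, so that value is a marker rather than a measurement. The sketch below shows one way to filter it out before charting or alerting; the constant and function names are hypothetical, and the sample readings are made up for illustration.

```python
from typing import Optional

# Sentinel documented in the table: 2^64-1 means the base clock could not be read.
CLOCK_UNREADABLE = 2**64 - 1


def clock_metric_or_none(raw_value: int) -> Optional[int]:
    """Return the metric value in microseconds, or None for the unreadable-clock sentinel."""
    if raw_value == CLOCK_UNREADABLE:
        return None
    return raw_value


# Hypothetical readings: a normal error bound and the sentinel.
print(clock_metric_or_none(350))               # 350
print(clock_metric_or_none(CLOCK_UNREADABLE))  # None
```

Mapping the sentinel to a missing value avoids a spurious spike of roughly 1.8e19 μs in dashboards and lets an alert rule treat an unreadable clock as its own condition.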