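Aerospike Exporter Integration

Last updated: 2024-12-13 11:50:03

    Use Cases

    Aerospike Exporter is a Prometheus metrics exporter for the Aerospike database. It collects performance metrics and statistics from Aerospike so that you can monitor cluster health, performance, and load in real time, which helps with troubleshooting, performance tuning, and capacity planning. Once the metrics are in Prometheus, you can use them for visualization, alerting, and analysis. Tencent Cloud Observability Platform (TCOP) Prometheus provides an Aerospike Exporter integration and a ready-to-use Grafana dashboard.

    Integration Methods

    Method 1: One-Click Installation (Recommended)

    Procedure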

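    1. Log in to the TCOP Prometheus console.
    2. In the instance list, select the target Prometheus instance.
    3. On the instance details page, choose Data Collection > Integration Center.
    4. In Integration Center, find and click Aerospike. In the installation window that opens, enter the metric collection name, address, and other settings, then click Save.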
    
    
    

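    Configuration Parameters

    | Parameter | Description |
    |-----------|-------------|
    | Name | Integration name. It must be unique and must match the regular expression '^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$'. |
    | Domain name | Domain name of the Aerospike database. |
    | Address | Port of the Aerospike database. |
    | Username | Username for the Aerospike database. |
    | Password | Password for the Aerospike database. |
    | Labels | Custom labels added to the collected metrics. |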
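The naming rule can be checked before you submit the form. The following is a minimal Python sketch, not part of the console or the exporter. It assumes the doubled backslash in the documented pattern is string escaping for a literal dot, which makes the pattern the same as the Kubernetes DNS-1123 subdomain pattern. The function name `is_valid_integration_name` is illustrative.

```python
import re

# Pattern taken from the naming rule, with the doubled backslash collapsed
# to a single one (assumption: it is string escaping for a literal dot).
NAME_PATTERN = re.compile(
    r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$"
)


def is_valid_integration_name(name):
    """Return True if the whole string satisfies the naming pattern."""
    return NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("aerospike-prod.cluster1", "Aerospike", "-abc", "a..b"):
        print(candidate, is_valid_integration_name(candidate))
```

Lowercase letters, digits, hyphens, and dots are accepted, and every dot-separated segment must start and end with a letter or digit.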

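    Method 2: Custom Installation

    Note
    To simplify installing and managing the exporter, we recommend managing it with Tencent Kubernetes Engine (TKE).

    Prerequisites

    A TKE Kubernetes cluster has been created in the same region and Virtual Private Cloud (VPC) as the Prometheus instance, and a namespace has been created in the cluster.
    The cluster has been associated with the Prometheus instance: in the Prometheus console, select the Prometheus instance, choose Data Collection > Integrate with TKE, find the cluster, and associate it. For details, see Associating Clusters.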

    Procedure

    Step 1: Deploy the Exporter
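    1. Log in to the TKE console.
    2. In the left sidebar, click Cluster.
    3. Click the ID/name of the target cluster to open its management page.
    4. Complete Steps 2 to 4 below (deploy the exporter configuration, deploy Aerospike Exporter, and verify) to finish deploying the exporter.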
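    Step 2: Deploy the Exporter Configuration
    1. In the left sidebar, choose Workload > Deployment to open the Deployment management page.
    2. In the upper-right corner, click Create via YAML, select the target namespace, and create the resource from a YAML configuration. The resource can also be created through the console forms; this guide uses YAML. A sample configuration follows: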
    apiVersion: v1
    kind: Secret
    metadata:
      name: aerospike-secret-test # adjust to a name that suits your workload
      namespace: aerospike-demo # adjust to the target namespace
    type: Opaque
    stringData:
      ape.toml: |-
        [Agent]
        # metrics server timeout in seconds
        timeout = 30

        # support system statistics also
        refresh_system_stats = true

        # prometheus binding port
        bind = ":8080" # port on which the metrics are exposed

        [Aerospike]
        db_host = "127.0.0.1" # adjust to the IP address or domain name of your database
        db_port = 3000 # adjust to the port of your database
        user = "admin" # adjust to your database username
        password = "admin" # adjust to your database password

        # timeout for sending commands to the server node in seconds
        timeout = 30
      gauge_stats_list.toml: |-
        # This file represents a list of metrics which are treated as Gauges while exporting to Prometheus or some other Observability tool.
        # to know more about these stats, please visit https://docs.aerospike.com

        #
        # SETS: below section define all Sets stats which are treated as Gauges
        #
        sets_gauge_stats = [
          "device_data_bytes",
          "index_populating",
          "memory_data_bytes",
          "objects",
          "sindexes",
          "tombstones",
          "truncate_lut",

          # 7.0 changes
          "data_used_bytes",
          "truncating",
        ]

        #
        # XDR: below section define all XDR stats which are treated as Gauges
        #
        xdr_gauge_stats = [
          "compression_ratio",
          "in_progress",
          "in_queue",
          "lag",
          "lap_us",
          "latency_ms",
          "nodes",
          "recoveries_pending",
          "throughput",
          "uncompressed_pct",
        ]

        #
        # Sindex: below section define all Sindex stats which are treated as Gauges
        #
        sindex_gauge_stats = [
          "entries_per_bval",
          "entries_per_rec",
          "entries",
          "histogram", # removed in server6.0
          "ibtr_memory_used", # removed in server6.0
          "keys", # removed in server6.0
          "load_pct",
          "load_time",
          "loadtime", # removed in server6.0
          "memory_used", # deprecated in server6.3 version and replaced by used_bytes
          "nbtr_memory_used", # removed in server6.0
          "query_basic_avg_rec_count", # removed in server6.0
          "used_bytes", # added in server6.3 represents memory used by data (aka memory_used)
        ]

        #
        # Node: below section define all Node stats which are treated as Gauges
        #

        node_gauge_stats = [
          "batch_index_proto_compression_ratio",
          "batch_index_proto_uncompressed_pct",
          "batch_index_queue",
          "batch_index_unused_buffers",
          "client_connections",
          "cluster_clock_skew_ms",
          "cluster_clock_skew_stop_writes_sec",
          "cluster_integrity",
          "cluster_is_member",
          "cluster_max_compatibility_id",
          "cluster_min_compatibility_id",
          "cluster_size",
          "fabric_bulk_recv_rate",
          "fabric_bulk_send_rate",
          "fabric_connections",
          "fabric_ctrl_recv_rate",
          "fabric_ctrl_send_rate",
          "fabric_meta_recv_rate",
          "fabric_meta_send_rate",
          "fabric_rw_recv_rate",
          "fabric_rw_send_rate",
          "failed_best_practices",
          "heap_active_kbytes",
          "heap_allocated_kbytes",
          "heap_efficiency_pct",
          "heap_mapped_kbytes",
          "heap_site_count",
          "heartbeat_connections",
          "info_queue",
          "migrate_partitions_remaining",
          "objects",
          "process_cpu_pct",
          "proxy_in_progress",
          "queries_active",
          "rw_in_progress",
          "scans_active",
          "sindex_gc_list_creation_time",
          "sindex_gc_list_deletion_time",
          "system_free_mem_pct",
          "system_kernel_cpu_pct",
          "system_total_cpu_pct",
          "system_user_cpu_pct",
          "threads_detached",
          "threads_joinable",
          "threads_pool_active",
          "threads_pool_total",
          "time_since_rebalance",
          "tombstones",
          "tree_gc_queue",
          "tsvc_queue",
          #
          # 4.x XDR stats
          "dlog_free_pct",
          "dlog_used_objects",
          "xdr_active_failed_node_sessions",
          "xdr_active_link_down_sessions",
          "xdr_global_lastshiptime",
          "xdr_read_active_avg_pct",
          "xdr_read_idle_avg_pct",
          "xdr_read_latency_avg",
          "xdr_read_reqq_used_pct",
          "xdr_read_reqq_used",
          "xdr_read_respq_used",
          "xdr_read_txnq_used_pct",
          "xdr_read_txnq_used",
          "xdr_ship_compression_avg_pct",
          "xdr_ship_inflight_objects",
          "xdr_ship_latency_avg",
          "xdr_ship_outstanding_objects",
          "xdr_throughput",
          "xdr_timelag",
        ]

        #
        # Namespace: below section define all Namespace stats which are treated as Gauges
        #
        namespace_gauge_stats = [
          "appeals_rx_active",
          "appeals_tx_active",
          "appeals_tx_remaining",
          "available_bin_names",
          "cache_read_pct",
          "clock_skew_stop_writes",
          "dead_partitions",
          "defrag_q",
          "device_available_pct",
          "device_compression_ratio",
          "device_free_pct",
          "device_total_bytes",
          "device_used_bytes",
          "effective_is_quiesced",
          "effective_prefer_uniform_balance",
          "effective_replication_factor",
          "evict_ttl",
          "hwm_breached",
          "index_flash_alloc_bytes",
          "index_flash_alloc_pct",
          "index_flash_used_bytes",
          "index_flash_used_pct",
          "index_pmem_used_bytes",
          "index_pmem_used_pct",
          "master_objects",
          "master_tombstones",
          "memory_free_pct",
          "memory_used_bytes",
          "memory_used_data_bytes",
          "memory_used_index_bytes",
          "memory_used_set_index_bytes",
          "memory_used_sindex_bytes",
          "migrate_rx_instances",
          "migrate_rx_partitions_active",
          "migrate_rx_partitions_initial",
          "migrate_rx_partitions_remaining",
          "migrate_signals_active",
          "migrate_signals_remaining",
          "migrate_tx_instances",
          "migrate_tx_partitions_active",
          "migrate_tx_partitions_imbalance",
          "migrate_tx_partitions_initial",
          "migrate_tx_partitions_lead_remaining",
          "migrate_tx_partitions_remaining",
          "n_nodes_quiesced",
          "non_expirable_objects",
          "non_replica_objects",
          "non_replica_tombstones",
          "ns_cluster_size",
          "nsup_cycle_deleted_pct",
          "nsup_cycle_duration",
          "nsup_cycle_sleep_pct",
          "objects",
          "pending_quiesce",
          "pmem_available_pct",
          "pmem_compression_ratio",
          "pmem_free_pct",
          "pmem_total_bytes",
          "pmem_used_bytes",
          "prole_objects",
          "prole_tombstones",
          "query_aggr_avg_rec_count",
          "query_basic_avg_rec_count",
          "query_proto_compression_ratio",
          "query_proto_uncompressed_pct",
          "record_proto_compression_ratio",
          "record_proto_uncompressed_pct",
          "scan_proto_compression_ratio",
          "scan_proto_uncompressed_pct",
          "shadow_write_q",
          "stop_writes",
          "storage-engine.device.defrag_q",
          "storage-engine.device.free_wblocks",
          "storage-engine.device.shadow_write_q",
          "storage-engine.device.used_bytes",
          "storage-engine.device.write_q",
          "storage-engine.device.age",
          "storage-engine.file.defrag_q",
          "storage-engine.file.free_wblocks",
          "storage-engine.file.shadow_write_q",
          "storage-engine.file.used_bytes",
          "storage-engine.file.write_q",
          "storage-engine.file.age",
          "storage-engine.stripe.defrag_q",
          "storage-engine.stripe.free_wblocks",
          "storage-engine.stripe.shadow_write_q",
          "storage-engine.stripe.used_bytes",
          "storage-engine.stripe.write_q",
          "storage-engine.stripe.age",
          "storage-engine.stripe.backing_write_q",
          "migrate_fresh_partitions",
          "tombstones",
          "truncate_lut",
          "unavailable_partitions",
          "unreplicated_records",
          "write_q",
          "xdr_bin_cemeteries",
          "xdr_tombstones",
          # added in 7.0
          "data_avail_pct",
          "data_compression_ratio",
          "data_total_bytes",
          "data_used_bytes",
          "data_used_pct",
          "index_mounts_used_pct",
          "index_used_bytes",
          "indexes_memory_used_pct",
          "set_index_used_bytes",
          "sindex_mounts_used_pct",
          "sindex_used_bytes",
          "truncating",
        ]

        # System Info Gauge metrics list
        #
        system_info_gauge_stats = [
          "",
        ]
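The gauge list only says which stats are treated as gauges. The Python sketch below illustrates how such a list could drive the metric type. It assumes that any stat not listed is exported as a counter, which this document does not state and which you should confirm against the exporter's own documentation. `metric_type` and the excerpted set are illustrative names, not part of the exporter.

```python
# Excerpt of sets_gauge_stats from the configuration above.
SETS_GAUGE_STATS = frozenset({
    "device_data_bytes",
    "memory_data_bytes",
    "objects",
    "tombstones",
    "truncate_lut",
    "data_used_bytes",
})


def metric_type(stat_name, gauge_stats):
    """Classify a stat as 'gauge' if listed, otherwise 'counter' (assumed default)."""
    return "gauge" if stat_name in gauge_stats else "counter"


if __name__ == "__main__":
    # "some_unlisted_stat" is a made-up name used only to show the fallback.
    for stat in ("objects", "some_unlisted_stat"):
        print(stat, "->", metric_type(stat, SETS_GAUGE_STATS))
```

Under that assumption, a stat that can go down as well as up must be in the list. Otherwise, rate-style queries over it would give misleading results.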
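    Step 3: Deploy Aerospike Exporter
    1. In the left sidebar, choose Workload > Deployment to open the Deployment management page.
    2. In the upper-right corner, click Create via YAML, select the target namespace, and create the resource from a YAML configuration. The resource can also be created through the console forms; this guide uses YAML. A sample configuration follows: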
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        k8s-app: aerospike-exporter # adjust as needed; including the Aerospike instance information is recommended
      name: aerospike-exporter # adjust as needed; including the Aerospike instance information is recommended
      namespace: aerospike-demo # adjust to the target namespace
    spec:
      replicas: 1
      selector:
        matchLabels:
          k8s-app: aerospike-exporter # adjust as needed; including the Aerospike instance information is recommended
      template:
        metadata:
          labels:
            k8s-app: aerospike-exporter # adjust as needed; including the Aerospike instance information is recommended
        spec:
          volumes:
            - name: sec
              secret:
                defaultMode: 420
                secretName: aerospike-secret-test # name of the Secret created in Step 2
          containers:
            - name: aerospike-exporter
              image: ccr.ccs.tencentyun.com/rig-agent/common-image:aerospike-exporter-1.18.0
              imagePullPolicy: IfNotPresent
              ports:
                - containerPort: 8080 # metrics port (bind) configured in Step 2
                  name: metrics
              livenessProbe:
                tcpSocket:
                  port: metrics
              readinessProbe:
                tcpSocket:
                  port: metrics
              volumeMounts:
                - mountPath: /etc/aerospike-prometheus-exporter
                  name: sec
                  readOnly: true
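The Deployment above depends on two pieces of wiring. The selector's matchLabels must all be present on the Pod template, and the probes refer to the container port by its name (`metrics`) rather than by number. The following Python sketch illustrates both rules. The function names are illustrative and do not come from any Kubernetes client library.

```python
def selector_matches(match_labels, pod_labels):
    """A matchLabels selector matches when every key/value pair is on the Pod."""
    return all(pod_labels.get(key) == value for key, value in match_labels.items())


def resolve_named_port(port_name, container_ports):
    """Return the containerPort number that a named port refers to."""
    for port in container_ports:
        if port.get("name") == port_name:
            return port["containerPort"]
    raise KeyError(f"no container port named {port_name!r}")


if __name__ == "__main__":
    pod_labels = {"k8s-app": "aerospike-exporter"}
    ports = [{"containerPort": 8080, "name": "metrics"}]
    print(selector_matches({"k8s-app": "aerospike-exporter"}, pod_labels))
    print(resolve_named_port("metrics", ports))
```

If you rename the label or the port, change every place that refers to it. A selector or port name that no longer matches makes the workload or scrape target disappear without any error.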
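    Step 4: Verify
    1. On the Deployment page, click the Deployment created in the previous step to open its management page.
    2. Click the Log tab and confirm that no errors are reported, as shown in the figure below: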
    
    
    
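    3. Click the Pod Management tab to open the Pod page.
    4. In the Operation column on the right, click Remote Login to log in to the Pod. In the command-line window, run the following wget command against the address exposed by the exporter; it should return the Aerospike metrics. If no data is returned, check that the Aerospike connection settings in the Secret are correct.
    wget -qO- http://localhost:8080/metrics
    The result is shown in the figure below: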
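To check the output with a script instead of reading it by eye, the Python sketch below fetches the same endpoint and lists the metric names it returns. It assumes the exporter's metric names start with `aerospike_`, which you should confirm against your own output. The function names are illustrative.

```python
from urllib.request import urlopen


def parse_metric_names(exposition_text):
    """Collect metric names from Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # A sample line is either `name{labels} value` or `name value`.
        names.add(line.split("{", 1)[0].split(" ", 1)[0])
    return names


def fetch_metric_names(url="http://localhost:8080/metrics", timeout=10):
    """Fetch the exporter endpoint and return the metric names it exposes."""
    with urlopen(url, timeout=timeout) as response:
        return parse_metric_names(response.read().decode("utf-8"))


def has_aerospike_metrics(names, prefix="aerospike_"):
    """True if at least one metric carries the assumed Aerospike prefix."""
    return any(name.startswith(prefix) for name in names)
```

Call `has_aerospike_metrics(fetch_metric_names())` from inside the Pod or through a port-forward. If the result is `False` while the endpoint still responds, the exporter is running but probably cannot reach the database.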
    
    
    
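    Step 5: Add a Collection Task
    1. Log in to the Prometheus console and select the target Prometheus instance to open its management page.
    2. Choose Data Collection > Integrate with TKE, select the associated cluster, and add a collection configuration through Data Collection Configuration > Create Custom Monitoring > Edit YAML.
    3. Define the Prometheus scrape job by adding a PodMonitor, which uses service discovery to locate the exporter Pod. A sample YAML configuration follows: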
    apiVersion: monitoring.coreos.com/v1
    kind: PodMonitor
    metadata:
      name: aerospike-exporter # enter a unique name
      namespace: cm-prometheus # pay-as-you-go instances: the cluster's namespace; monthly-subscription instances (no longer sold): the namespace is fixed, do not change it
    spec:
      podMetricsEndpoints:
        - interval: 30s
          port: metrics # name of the exporter port in the Pod YAML (named "metrics" in the Deployment from Step 3)
          path: /metrics # metrics path of the exporter; defaults to /metrics if omitted
          relabelings:
            - action: replace
              sourceLabels:
                - instance
              regex: (.*)
              targetLabel: instance
              replacement: 'crs-xxxxxx' # replace with the ID of your Aerospike instance
      namespaceSelector: # namespace of the Aerospike exporter Pod to monitor
        matchNames:
          - aerospike-demo
      selector: # labels of the Pod to monitor, used to locate the target Pod
        matchLabels:
          k8s-app: aerospike-exporter
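The relabeling rule in this PodMonitor overwrites the `instance` label with a fixed instance ID. The Python sketch below is a simplified model of the Prometheus `replace` action. It joins the source label values, requires the regex to match the whole string, and writes the replacement (with `$1`-style group references expanded) to the target label. It is an illustration only: it leaves out other relabel actions and edge cases, and `apply_replace` is a made-up name.

```python
import re


def apply_replace(labels, source_labels, regex, target_label, replacement, separator=";"):
    """Simplified model of a Prometheus `replace` relabeling step."""
    joined = separator.join(labels.get(name, "") for name in source_labels)
    match = re.fullmatch(regex, joined)  # the regex must match the whole value
    if match is None:
        return dict(labels)  # no match: labels are left unchanged
    # Expand $1, $2, ... with the corresponding capture groups.
    value = re.sub(r"\$(\d+)", lambda m: match.group(int(m.group(1))) or "", replacement)
    result = dict(labels)
    result[target_label] = value
    return result


if __name__ == "__main__":
    before = {"instance": "10.0.0.5:8080", "job": "aerospike-exporter"}
    print(apply_replace(before, ["instance"], r"(.*)", "instance", "crs-xxxxxx"))
```

Because `(.*)` matches any value and the replacement has no group reference, every target scraped by this job gets the same `instance` value. That is why each Aerospike instance needs its own PodMonitor and replacement.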
    

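    Viewing Monitoring Data

    Prerequisites

    The Prometheus instance is bound to a Grafana instance.

    Procedure

    1. Log in to the TCOP Prometheus console and select the target Prometheus instance to open its management page.
    2. On the instance's Basic Info page, find the bound Grafana address, open it, and log in. In the aerospike folder, open the Aerospike dashboard to view the instance's monitoring data, as shown in the figure below: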
    
    
    

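    Configuring Alerts

    Tencent Cloud's managed Prometheus service supports alert configuration. Add alert policies according to your business needs. For details, see Creating Alert Policies.

    Appendix: Main Configuration Items of the Aerospike Exporter Configuration File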

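    Agent Configuration Items

    | Name | Description |
    |------|-------------|
    | bind | Metrics export port. Default: ":9145". |
    | cert_file | Server certificate file for serving metrics over TLS. |
    | key_file | Private key file for the server certificate. |
    | root_ca | Root CA certificate file. |
    | basic_auth_username | Username for HTTP basic authentication. |
    | basic_auth_password | Password for HTTP basic authentication. |
    | timeout | Metrics server timeout, in seconds. |
    | labels | Custom label values. |
    | refresh_system_stats | Whether to also collect system statistics. |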

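    Aerospike Configuration Items

    | Name | Description |
    |------|-------------|
    | db_host | Domain name or IP address of the Aerospike database. |
    | db_port | Service port of the Aerospike database. |
    | auth_mode | Aerospike authentication mode. Default: internal. Valid values: "external", "internal", "pki", "". |
    | user | Username for the Aerospike database. |
    | password | Password for the Aerospike database. |
    | timeout | Timeout for commands sent to the Aerospike server node, in seconds. |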
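To show how these items fit together, the Python sketch below renders a minimal `ape.toml` from a handful of parameters and rejects an `auth_mode` outside the documented values. It is an illustration, not a tool shipped with the exporter. `render_ape_toml` is a made-up name, and `json.dumps` is used as an approximation of TOML string quoting that is adequate for plain ASCII values.

```python
import json

ALLOWED_AUTH_MODES = {"external", "internal", "pki", ""}


def render_ape_toml(db_host, db_port, user, password,
                    bind=":9145", auth_mode="internal", timeout=30):
    """Render a minimal ape.toml using the keys listed in the appendix tables."""
    if auth_mode not in ALLOWED_AUTH_MODES:
        raise ValueError(f"unsupported auth_mode: {auth_mode!r}")
    quote = json.dumps  # approximates TOML basic-string quoting for ASCII text
    lines = [
        "[Agent]",
        f"bind = {quote(bind)}",
        f"timeout = {int(timeout)}",
        "",
        "[Aerospike]",
        f"db_host = {quote(db_host)}",
        f"db_port = {int(db_port)}",
        f"auth_mode = {quote(auth_mode)}",
        f"user = {quote(user)}",
        f"password = {quote(password)}",
        f"timeout = {int(timeout)}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_ape_toml("127.0.0.1", 3000, "admin", "admin", bind=":8080"))
```

The rendered text can be pasted under the `ape.toml` key of the Secret. If you change `bind`, also change the container port in the Deployment to match.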
    