Broker Configuration Parameters
The current CKafka broker-side configuration is listed below for reference:
message.max.bytes=1000012
auto.create.topics.enable=false
delete.topic.enable=true
socket.request.max.bytes=16777216
max.connections.per.ip=5000
offsets.retention.minutes=10080
allow.everyone.if.no.acl.found=true
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
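Several of these values are easier to read in familiar units. The Python sketch below parses the listing as plain key=value lines and converts a few numbers (for example, offsets.retention.minutes=10080 is 7 days). The helpers parse_properties and describe are hypothetical, written only for illustration, and the parser handles simple key=value lines rather than the full Java .properties syntax.

```python
# Broker settings copied from the listing above (bytes, ms, or minutes).
BROKER_CONFIG = """\
message.max.bytes=1000012
auto.create.topics.enable=false
delete.topic.enable=true
socket.request.max.bytes=16777216
max.connections.per.ip=5000
offsets.retention.minutes=10080
allow.everyone.if.no.acl.found=true
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
"""


def parse_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines; blank lines and '#' comments are skipped."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def describe(props: dict[str, str]) -> dict[str, str]:
    """Convert a few numeric broker settings into human-readable units."""
    mib = 1024 * 1024
    return {
        "message.max.bytes": f"{int(props['message.max.bytes']) / mib:.2f} MiB",
        "socket.request.max.bytes": f"{int(props['socket.request.max.bytes']) // mib} MiB",
        "log.segment.bytes": f"{int(props['log.segment.bytes']) // (1024 * mib)} GiB",
        "offsets.retention.minutes": f"{int(props['offsets.retention.minutes']) // (60 * 24)} days",
        "log.retention.check.interval.ms": f"{int(props['log.retention.check.interval.ms']) // 60000} min",
    }


for key, value in describe(parse_properties(BROKER_CONFIG)).items():
    print(f"{key}: {value}")
```

Run as is, it reports 1 GiB log segments, a 16 MiB socket request limit, 7 days of offset retention, and a 5-minute retention check interval.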
Topic Configuration Parameters
Select a Suitable Number of Partitions
From the producer's perspective, writes to different partitions are fully parallel; from the consumer's perspective, concurrency depends entirely on the number of partitions (if there are more consumers than partitions, some consumers will be idle). Choosing a suitable number of partitions is therefore essential to the performance of a CKafka instance.
The number of partitions should be determined by the production and consumption throughput. Ideally, it can be calculated with the following formula:
Num = max( T/PT , T/CT ) = T / min( PT , CT )
Here, Num is the number of partitions, T is the target throughput, PT is the maximum throughput at which producers can write to a single partition, and CT is the maximum throughput at which consumers can consume from a single partition. The number of partitions should therefore equal the larger of T/PT and T/CT.
In practice, PT depends on factors such as batch size, compression algorithm, acknowledgment mechanism, and number of replicas. CT depends on the business logic and must be measured through actual tests in each scenario.
As a rule, the number of partitions should be greater than or equal to the number of consumers to achieve maximum concurrency. For example, with 5 consumers there should be at least 5 partitions. However, too many partitions reduce production throughput and lengthen leader election, so an excessive partition count is not recommended. The following points are provided for reference:
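The formula translates directly into code. The Python sketch below assumes all three throughputs use the same unit (for example MB/s) and rounds the result up, since a partition count must be a whole number; the function name partition_count and the sample figures are hypothetical.

```python
import math


def partition_count(target: float, producer_per_partition: float,
                    consumer_per_partition: float) -> int:
    """Num = max(T/PT, T/CT) = T / min(PT, CT), rounded up to a whole partition."""
    if min(target, producer_per_partition, consumer_per_partition) <= 0:
        raise ValueError("throughput values must be positive")
    # The slower of the two per-partition rates is the bottleneck.
    return math.ceil(target / min(producer_per_partition, consumer_per_partition))


# Hypothetical measurements: 100 MB/s target, 20 MB/s per partition when
# producing (T/PT = 5), 8 MB/s per partition when consuming (T/CT = 12.5).
print(partition_count(100, 20, 8))  # 13
```

In this example consumption is the bottleneck, so CT alone determines the result.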
Messages within a single partition are written sequentially.
Within one consumer group, each partition can be consumed by only one consumer process.
One consumer process can consume multiple partitions at the same time; in other words, the number of partitions caps consumer-side concurrency.
The more partitions there are, the longer leader election takes after a failure.
Offsets are tracked at partition granularity, so the more partitions there are, the longer it takes to query offsets.
The number of partitions can be increased dynamically but never decreased, and an increase triggers a rebalance.
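The relationship between partitions and consumers in a group can be illustrated with a simplified model. The Python sketch below deals partitions to consumers in turn; assign_round_robin is a hypothetical helper, not the exact algorithm of any of Kafka's built-in partition assignors, but it shows that each partition goes to only one consumer in the group and that consumers beyond the partition count stay idle.

```python
def assign_round_robin(partitions: list[int], consumers: list[str]) -> dict[str, list[int]]:
    """Deal partitions to the consumers of one group in turn (simplified model)."""
    if not consumers:
        raise ValueError("a consumer group needs at least one consumer")
    assignment: dict[str, list[int]] = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        # Each partition is owned by exactly one consumer in the group.
        assignment[consumers[index % len(consumers)]].append(partition)
    return assignment


# 6 consumers, 4 partitions: two consumers receive nothing and stay idle.
crowded = assign_round_robin([0, 1, 2, 3], [f"c{i}" for i in range(6)])
print([c for c, parts in crowded.items() if not parts])  # ['c4', 'c5']

# 2 consumers, 6 partitions: each consumer handles several partitions.
print(assign_round_robin(list(range(6)), ["c0", "c1"]))  # {'c0': [0, 2, 4], 'c1': [1, 3, 5]}
```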
Select a Suitable Number of Replicas
Currently, a topic must have at least 2 replicas to ensure availability. For high reliability, 3 replicas are recommended.
Note
The number of replicas affects production and consumption traffic. For example, with 3 replicas, the actual traffic is the production traffic × 3.
The topic's log.retention.ms is set uniformly by the instance's message retention time, which is configured in the console.
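As a rough sizing aid, the example in the note can be written as a small Python function. The name replicated_traffic is hypothetical; the function applies only the multiplication stated above and the current minimum of 2 replicas.

```python
MIN_REPLICAS = 2  # CKafka currently requires at least 2 replicas


def replicated_traffic(production_mb_s: float, replicas: int) -> float:
    """Actual traffic = production traffic x number of replicas."""
    if replicas < MIN_REPLICAS:
        raise ValueError(f"at least {MIN_REPLICAS} replicas are required, got {replicas}")
    return production_mb_s * replicas


# 50 MB/s of production traffic with 3 replicas.
print(replicated_traffic(50, 3))  # 150
```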
Other Topic-Level Configurations
max.message.bytes=1000012
message.format.version=0.10.2-IV0
unclean.leader.election.enable=true
min.insync.replicas=1
Producer Configuration Guide
Common producer-side parameters are configured as follows. Adjust them to suit your actual business scenario:
batch.size=16384
acks=1
timeout.ms=30000
buffer.memory=33554432
max.block.ms=60000
linger.ms=100
max.request.size=1048576
compression.type=[none, snappy, lz4]
request.timeout.ms=30000
max.in.flight.requests.per.connection=5
retries=0
retry.backoff.ms=100
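Size limits appear at several levels in this document: message.max.bytes on the broker, max.message.bytes on the topic, and max.request.size on the producer. The Python sketch below (hypothetical helper names) takes the smallest applicable limit, using the fact that a topic-level max.message.bytes, when set, overrides the broker's message.max.bytes for that topic. It ignores per-record protocol overhead, so treat the result as an approximate upper bound. It also flags a known ordering caveat: with retries enabled and more than one in-flight request per connection, a retried batch can land after a later one. enable.idempotence is deliberately not modeled here.

```python
from typing import Optional

# Limits from the listings in this document, in bytes.
BROKER_MESSAGE_MAX_BYTES = 1000012   # broker: message.max.bytes
TOPIC_MAX_MESSAGE_BYTES = 1000012    # topic: max.message.bytes
PRODUCER_MAX_REQUEST_SIZE = 1048576  # producer: max.request.size


def effective_max_message(producer_max_request: int,
                          topic_max_message: Optional[int],
                          broker_max_message: int) -> int:
    """Approximate largest message that passes both producer and server checks."""
    # A topic-level max.message.bytes overrides the broker default for that topic.
    server_limit = topic_max_message if topic_max_message is not None else broker_max_message
    return min(producer_max_request, server_limit)


def ordering_risk(retries: int, max_in_flight: int) -> bool:
    """True if a retried batch could overtake a later one (idempotence not modeled)."""
    return retries > 0 and max_in_flight > 1


limit = effective_max_message(PRODUCER_MAX_REQUEST_SIZE, TOPIC_MAX_MESSAGE_BYTES,
                              BROKER_MESSAGE_MAX_BYTES)
print(limit)  # 1000012: the server-side limit is stricter than max.request.size
print(ordering_risk(retries=0, max_in_flight=5))  # False with retries=0 as listed
```

With the values listed, a message of roughly 1000012 to 1048576 bytes may pass the producer's own size check and still be rejected by the server.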
Consumer Configuration Guide
Common consumer-side parameters are configured as follows. Adjust them to suit your actual business scenario:
enable.auto.commit=true
auto.commit.interval.ms=5000
auto.offset.reset=latest
group.id=""
session.timeout.ms=10000
heartbeat.interval.ms=3000
max.poll.interval.ms=300000
fetch.min.bytes=1
fetch.max.bytes=52428800
fetch.max.wait.ms=500
max.partition.fetch.bytes=1048576
max.poll.records=500
request.timeout.ms=305000
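Several consumer settings depend on each other. The Python sketch below (hypothetical helper check_consumer) tests two rules of thumb against the values listed: Kafka's documentation recommends keeping heartbeat.interval.ms no higher than one third of session.timeout.ms, and, as an assumption of this sketch, max.partition.fetch.bytes should be at least the broker's message.max.bytes so that a maximum-size message fits in a single partition fetch.

```python
# Consumer values from the listing above; the broker limit comes from the broker section.
CONSUMER_CONFIG = {
    "session.timeout.ms": 10000,
    "heartbeat.interval.ms": 3000,
    "max.partition.fetch.bytes": 1048576,
}
BROKER_MESSAGE_MAX_BYTES = 1000012  # broker: message.max.bytes


def check_consumer(cfg: dict[str, int], broker_max_message: int) -> list[str]:
    """Return a warning for each rule of thumb the configuration breaks."""
    warnings = []
    # Recommended: send heartbeats at least every session.timeout.ms / 3.
    if cfg["heartbeat.interval.ms"] * 3 > cfg["session.timeout.ms"]:
        warnings.append("heartbeat.interval.ms is above 1/3 of session.timeout.ms")
    # Assumption: one largest-allowed message should fit in a single partition fetch.
    if cfg["max.partition.fetch.bytes"] < broker_max_message:
        warnings.append("max.partition.fetch.bytes is below message.max.bytes")
    return warnings


print(check_consumer(CONSUMER_CONFIG, BROKER_MESSAGE_MAX_BYTES))  # []
```

Note also that with enable.auto.commit=true, offsets are committed periodically (every auto.commit.interval.ms, 5000 ms here), so after a crash a consumer may reprocess the messages it received since the last commit.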