Cluster-Level Distributed Rate Limiting
Applicable to Pulsar Professional Edition clusters. Producers and consumers can produce and consume large message volumes at very high speed, consuming server resources and saturating CPU, memory, network, disk I/O, and so on. Pulsar therefore provides a throttling scheme that sets different throttling thresholds according to the instance specification and protects the cluster itself, preventing excessive resource consumption from degrading cluster quality and creating global stability risks.
Traffic Throttling Mechanism Description
The Pulsar production traffic throttling mechanism works by delaying packet responses. The statistics window for throttling is 1 s.
Take production TPS rate limiting as an example:
Assume the production TPS limit is set to 100. If a user sends 100 messages within the first 400 ms of a 1 s window, the request to send the 101st message must wait the remaining 600 ms before it is processed.
From the producer's perspective, when production throttling occurs, message sending takes longer and may even time out.
From the consumer's perspective, when consumption throttling occurs, the overall latency of the message link from production to consumption increases and a message backlog may build up.
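To make the window behavior above concrete, the following is a minimal, self-contained Java sketch of a fixed 1 s window that delays a request until the next window once the quota is used up. It is only an illustration of the behavior described in the example, not the broker's actual implementation, and the class name and quota value are hypothetical.

import java.util.concurrent.TimeUnit;

// Minimal sketch of the fixed 1 s window behavior described above (hypothetical, not broker code).
public class FixedWindowSketch {
    private final int quotaPerSecond;                 // e.g. a 100 TPS production quota
    private long windowStartMillis = System.currentTimeMillis();
    private int usedInWindow = 0;

    public FixedWindowSketch(int quotaPerSecond) {
        this.quotaPerSecond = quotaPerSecond;
    }

    // Blocks until the request fits into a window, mirroring the delayed response.
    public synchronized void acquire() throws InterruptedException {
        long now = System.currentTimeMillis();
        if (now - windowStartMillis >= 1000) {        // a new 1 s window has started
            windowStartMillis = now;
            usedInWindow = 0;
        }
        if (usedInWindow >= quotaPerSecond) {         // quota used up, e.g. 100 messages in 400 ms
            long waitMillis = 1000 - (now - windowStartMillis);  // wait the remaining ~600 ms
            TimeUnit.MILLISECONDS.sleep(waitMillis);
            windowStartMillis = System.currentTimeMillis();
            usedInWindow = 0;
        }
        usedInWindow++;                               // the 101st request is processed in the next window
    }
}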
Throttling Principle Description
Producer Side:
The traffic throttling statistics window is 1 s. When the quota within the statistics window is used up, the server closes all of the producer's channels and stops accepting message sending requests; at the next time window it reopens the producer's channels and resumes processing message sending requests.
Consumer Side:
The traffic throttling statistics window is 1 s. When the quota within the statistics window is used up, the server will stop pushing messages to consumers until the next time window.
Note:
How to understand closing the channel after traffic throttling on the production side?
When traffic throttling occurs on the production side, the server closes the producer's corresponding TCP connection channel. After the channel is closed, the server no longer accepts requests on that TCP connection until the channel is reopened.
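On the client side, one way to tolerate the increased send latency during cluster-level throttling is to raise the producer send timeout and handle asynchronous send failures explicitly. The following is a sketch using the Pulsar Java client; the service URL, topic name, and timeout value are placeholders, not recommended settings.

import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;

public class ThrottleTolerantProducer {
    public static void main(String[] args) throws Exception {
        // Placeholder service URL and topic; replace with your own cluster address and topic.
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        Producer<String> producer = client.newProducer(Schema.STRING)
                .topic("persistent://tenant/namespace/my-topic")
                .sendTimeout(30, TimeUnit.SECONDS)   // allow extra time for throttling-induced latency
                .blockIfQueueFull(true)              // apply back-pressure instead of failing fast
                .create();

        producer.sendAsync("hello")
                .whenComplete((msgId, ex) -> {
                    if (ex != null) {
                        // A timeout here may indicate sustained throttling; consider alarms or retries.
                        System.err.println("send failed: " + ex.getMessage());
                    }
                });

        producer.flush();
        producer.close();
        client.close();
    }
}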
Practical Tutorial on Pulsar Distributed Rate Limiting
1. Purchase a cluster specification based on the actual peak production/consumption volume of your business, and set the production/consumption rate limiting ratio according to the fan-out ratio between production and consumption. It is recommended to run a stress test before the official release to verify in advance that the cluster capacity is sufficient.
2. If a message is not a delayed message, do not set the delayed-message fields. Once the sender sets a delayed-message field, the server counts the message against the delayed-message rate regardless of the delay value. A typical case, taking Java as an example (other SDKs such as Go behave similarly): as long as deliverAfter or deliverAt is set when sending a message, it is treated as a delayed message, even if the value is 0 or earlier than the current time (see the code sketch after this list).
3. Configure alarms for the cluster's production/consumption rate and bandwidth. When they exceed 80% of the purchased specification, it is recommended to upgrade the Professional Edition instance specification in a timely manner to avoid the increased latency caused by traffic throttling.
4. Configure alarms for the production/consumption throttling count. A non-zero throttling count indicates that production/consumption exceeded the limit within a second-level window. It is recommended to upgrade the Professional Edition instance specification in a timely manner to avoid the increased latency caused by traffic throttling.
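As noted in item 2, setting deliverAfter or deliverAt makes a message count as a delayed message regardless of the value. The following sketch uses the Pulsar Java client's message builder to illustrate this; the producer variable and message values are placeholders.

import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.Producer;

public class DelayedFieldExample {
    static void send(Producer<String> producer) throws Exception {
        // Counted as a normal message: no delay field is set.
        producer.newMessage().value("normal").send();

        // Counted as a delayed message even though the delay is 0, because deliverAfter is set.
        producer.newMessage().value("still-delayed").deliverAfter(0, TimeUnit.SECONDS).send();

        // Counted as a delayed message even though the timestamp is in the past, because deliverAt is set.
        producer.newMessage().value("also-delayed").deliverAt(System.currentTimeMillis() - 1000).send();
    }
}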
Common Symptom Descriptions
Question 1: Why is traffic throttling triggered even though production/consumption is lower than the specification?
As described in the throttling principle above, throttling is measured per second (s), while the monitoring data on the console is collected and reported per minute (min). The production/consumption value shown on the monitoring platform is calculated as [number of messages in 1 min / 60]. If the client's production/consumption is unevenly distributed within a minute, it may be concentrated in one or a few seconds and exceed the per-second quota in those throttling windows while staying far below the quota for the rest of the minute. In that case the monitored production/consumption rate is lower than the instance specification, yet traffic throttling is still triggered.
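A worked example of this effect, using hypothetical numbers (a 100 msg/s quota and a burst of 3,000 messages within a single second of the minute):

public class MinuteAverageVsSecondPeak {
    public static void main(String[] args) {
        int quotaPerSecond = 100;          // hypothetical instance specification: 100 msg/s
        int burstMessages = 3000;          // all sent within one second of the minute

        double minuteAverage = burstMessages / 60.0;   // what the console reports: 50 msg/s
        System.out.println("console value: " + minuteAverage + " msg/s, below the " + quotaPerSecond + " msg/s spec");
        System.out.println("peak second:   " + burstMessages + " msg/s, far above the per-second quota, so throttling triggers");
    }
}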
Question 2: Why can the production/consumption peak be higher than the instance specification?
Situation 1: Pulsar is a distributed system; a Pulsar cluster consists of multiple broker nodes. Within one throttling window, each node throttles independently, and each node's throttling threshold is its own current usage plus the cluster's remaining quota. For example, if the cluster throttling threshold is 1000 and there are 5 broker nodes, then when the actual usage is 750 (assuming usage is evenly distributed, i.e. 150 per node), each node's threshold at that moment is 400 (150 + 1000 - 750). The instantaneous traffic that can actually be reached may therefore be as high as 2000 (400 * 5), so traffic exceeding the specification can occur within a single throttling window, as the worked calculation below shows.
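The worked calculation for the numbers above (cluster threshold 1000, 5 brokers, 150 usage per broker, all taken from the example):

public class PerNodeThresholdExample {
    public static void main(String[] args) {
        int clusterThreshold = 1000;   // cluster-level quota from the example above
        int brokerCount = 5;
        int usagePerBroker = 150;      // usage assumed evenly distributed
        int clusterUsage = usagePerBroker * brokerCount;                              // 750

        // Each broker throttles against its own usage plus the cluster's remaining quota.
        int perBrokerThreshold = usagePerBroker + (clusterThreshold - clusterUsage);  // 150 + 250 = 400

        // In the worst case every broker can momentarily use its full local threshold.
        int instantaneousMax = perBrokerThreshold * brokerCount;                      // 400 * 5 = 2000

        System.out.println("per-broker threshold: " + perBrokerThreshold);
        System.out.println("possible instantaneous traffic: " + instantaneousMax);
    }
}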
Situation 2: As described in the throttling principle above, the write channel is closed after throttling is triggered, but requests already in flight (including those that pushed usage over the threshold) are still processed. Therefore, when the number of concurrent requests is high, the throttling threshold may be exceeded within a statistics window.
Question 3: How to determine whether Pulsar is throttling?
Monitor clusters on the cluster monitoring page of the Pulsar Professional Edition console. A throttling count greater than 0 indicates that traffic throttling has occurred.
Topic Partition Traffic Throttling
Applicable to all types of Pulsar clusters.
Throttling Principle Explanation
Producer Side
Server-side Throttling Logic Description: Producer-side throttling is imprecise. It relies on an internal scheduled task (running a round every 50 ms by default) to check whether the amount produced by each partition within a 1 s window exceeds the quota.
Behavior after server-side throttling: The producer side uses soft throttling. When throttling occurs, the read channel of the producer connection for that topic is closed and production requests are no longer processed. After at most 1 s, the producer's read channel is restored and message sending requests are processed again until throttling is triggered once more (a simplified sketch follows the note below).
Client performance after throttling occurs: When throttling occurs, the sending duration will increase and sending timeout may occur.
Note:
How to understand closing the channel after traffic throttling on the production side?
When traffic throttling occurs on the production side, the server closes the producer's corresponding TCP connection channel. After the channel is closed, the server no longer accepts requests on that TCP connection until the channel is reopened.
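The following is a minimal, hypothetical Java sketch of the periodic per-partition check described above, with the closed/open read channel represented by a simple flag. It illustrates why the limit is imprecise (publishes between two 50 ms checks are not blocked) and is not the broker's actual code.

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class PartitionPublishLimiterSketch {
    private final long quotaPerSecond;
    private final AtomicLong publishedInWindow = new AtomicLong();
    private volatile boolean readEnabled = true;   // stands in for the producer connection's read channel

    public PartitionPublishLimiterSketch(long quotaPerSecond) {
        this.quotaPerSecond = quotaPerSecond;
        ScheduledExecutorService checker = Executors.newSingleThreadScheduledExecutor();
        // Every 50 ms, check whether the window quota is exceeded; this interval is why throttling is imprecise.
        checker.scheduleAtFixedRate(this::check, 50, 50, TimeUnit.MILLISECONDS);
        // At every 1 s window boundary, reset the counter and reopen the read channel.
        checker.scheduleAtFixedRate(this::resetWindow, 1, 1, TimeUnit.SECONDS);
    }

    public void recordPublish(int numMessages) {
        publishedInWindow.addAndGet(numMessages);
    }

    public boolean isReadEnabled() {
        return readEnabled;
    }

    private void check() {
        if (publishedInWindow.get() > quotaPerSecond) {
            readEnabled = false;   // the broker would stop reading publish requests here
        }
    }

    private void resetWindow() {
        publishedInWindow.set(0);
        readEnabled = true;        // the broker would resume reading publish requests here
    }
}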
Consumer Side
Server-side Throttling Logic Description: Consumer-side throttling is imprecise. It checks whether the TPS and bandwidth consumed within a 1 s time window exceed the quota.
Server-side Behavior after Throttling: The server stops pushing messages to consumers for up to 1 s, until the next window.
Client Performance after Throttling Occurs: When throttling occurs, the overall latency from the production end to the consumer end will increase, and message backlog may occur.
Practical Tutorial on Pulsar Topic Partition Rate Limiting
1. Each topic partition has TPS and bandwidth limits for both production and consumption. If the topic's TPS or bandwidth is high, increase the number of partitions accordingly (see the sizing sketch after this list).
2. Configure alarms for the percentage of the topic's production/consumption rate and traffic quota that is used. When usage exceeds 80%, it is recommended to expand the number of partitions to avoid triggering single-partition throttling.
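The following sketch shows one rough way to size the partition count so that the expected peak stays below the 80% alarm line mentioned in item 2. The peak TPS and per-partition quota values are hypothetical placeholders; use the per-partition limits published for your cluster type.

public class PartitionSizing {
    // Rough sizing helper: how many partitions keep the expected peak under ~80% of the per-partition quota.
    static int partitionsNeeded(double peakTps, double perPartitionTpsQuota) {
        double usableQuota = perPartitionTpsQuota * 0.8;   // keep headroom below the 80% alarm line
        return (int) Math.ceil(peakTps / usableQuota);
    }

    public static void main(String[] args) {
        // Hypothetical numbers: 12,000 msg/s peak and a 1,000 msg/s per-partition quota -> 15 partitions.
        System.out.println(partitionsNeeded(12_000, 1_000));
    }
}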
Common Symptom Descriptions
Question 1: Why can the production/consumption traffic of a partition exceed the throttling threshold?
As described in the throttling principle above, topic partition throttling uses an imprecise, soft-limit rate limiting algorithm. Combined with the throttling logic on the production and consumption sides, traffic exceeding the throttling threshold may therefore be observed for both production and consumption.