Background
By default, Pulsar cannot send a message larger than 5 MB. To send a larger message, you can compress it on the client side first.
Processing Large Messages in Pulsar
As the default size limit for a single message is 5 MB in Pulsar, the producer will fail to send a message exceeding this limit. You can handle this in the following two ways:
Message chunking: Message chunking enables Pulsar to process large payload messages by splitting the message into chunks at the producer side and aggregating chunked messages at the consumer side.
Message compression: Compression reduces the message size by replacing repeated character sequences in the message data with shorter references. Pulsar supports four compression algorithms: LZ4, ZLIB, ZSTD, and Snappy.
We recommend that you compress large messages before sending them.
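The chunking approach described above can be pictured with a minimal sketch. The class and method names (`ChunkingSketch`, `split`, `reassemble`) are hypothetical and exist only for illustration; this is not the Pulsar client's actual implementation, which would also need to track chunk order and message identity.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: these names are hypothetical, not part of the Pulsar client API.
final class ChunkingSketch {

    // Producer side: cut the payload into pieces of at most maxChunkSize bytes.
    static List<byte[]> split(byte[] payload, int maxChunkSize) {
        if (maxChunkSize <= 0) {
            throw new IllegalArgumentException("maxChunkSize must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        for (int offset = 0; offset < payload.length; offset += maxChunkSize) {
            int end = Math.min(payload.length, offset + maxChunkSize);
            chunks.add(Arrays.copyOfRange(payload, offset, end));
        }
        return chunks;
    }

    // Consumer side: concatenate the chunks in order to rebuild the payload.
    static byte[] reassemble(List<byte[]> chunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] chunk : chunks) {
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] payload = new byte[12 * 1024 * 1024];            // 12 MB payload
        List<byte[]> chunks = split(payload, 5 * 1024 * 1024);  // 5 MB per chunk
        System.out.println("chunks: " + chunks.size());
        System.out.println("restored: " + Arrays.equals(payload, reassemble(chunks)));
    }
}
```

With a 5 MB chunk size, a 12 MB payload is split into three chunks (5 MB, 5 MB, and 2 MB), each of which stays within the default size limit.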
Compression Algorithm Introduction and Comparison
Introduction
LZ4
LZ4 is a lossless data compression algorithm that consumes a small amount of CPU. It features extremely fast compression/decompression speed.
ZLIB
ZLIB is a widely used lossless data compression library. It implements the DEFLATE algorithm, which combines LZ77 (a Lempel-Ziv variant) with Huffman coding. By reducing the size of transferred data, it improves network transfer efficiency and saves bandwidth. Depending on the content, it can often compress data to half the original size or less.
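As a rough illustration of ZLIB compression and decompression, the sketch below uses the JDK's built-in `java.util.zip.Deflater` and `Inflater` classes. The class name `ZlibRoundTrip` and the sample text are assumptions made for this example, and the code is independent of how the Pulsar client applies ZLIB internally.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper for illustration; relies only on java.util.zip from the JDK.
final class ZlibRoundTrip {

    // Compress with the JDK's ZLIB (DEFLATE) implementation at the default level.
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native resources
        }
    }

    // Restore the original bytes from a ZLIB stream.
    static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported ZLIB stream");
                }
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not a valid ZLIB stream", e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            text.append("hello pulsar "); // highly repetitive sample text
        }
        byte[] original = text.toString().getBytes(StandardCharsets.UTF_8);
        byte[] compressed = compress(original);
        byte[] restored = decompress(compressed);
        System.out.println("original bytes:   " + original.length);
        System.out.println("compressed bytes: " + compressed.length);
        System.out.println("lossless:         " + Arrays.equals(original, restored));
    }
}
```

Because the sample text is highly repetitive, the compressed output is far smaller than half the input; less repetitive data will compress less.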
ZSTD
ZSTD (Zstandard) is a lossless compression algorithm that combines LZ77-style matching with entropy coding (Huffman coding and finite state entropy). It is designed for real-time compression and works well across different compression scenarios. Compared with ZLIB, it compresses data faster at a similar or better compression ratio, so it can deliver a high compression ratio and a high compression speed at the same time.
Snappy
Snappy is a lossless compression algorithm based on LZ77. Its core principle lies in the replacement of the repetitive character strings in a data stream with shorter codes to reduce the stream size.
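The LZ77 idea of replacing repeated strings with shorter codes can be shown with a toy encoder. This is a deliberately simplified sketch with hypothetical names (`Lz77Toy`, `Token`); it does not produce Snappy's actual wire format and ignores the performance tricks real implementations use.

```java
import java.util.ArrayList;
import java.util.List;

// Toy LZ77 for illustration only; not Snappy's real format or implementation.
final class Lz77Toy {

    // Either a literal character (length == 0) or a back-reference meaning
    // "copy `length` characters starting `distance` characters back".
    static final class Token {
        final int distance;
        final int length;
        final char literal;

        Token(int distance, int length, char literal) {
            this.distance = distance;
            this.length = length;
            this.literal = literal;
        }
    }

    static List<Token> encode(String input, int window, int minMatch) {
        List<Token> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int bestLen = 0;
            int bestDist = 0;
            // Search the sliding window for the longest earlier match.
            for (int cand = Math.max(0, pos - window); cand < pos; cand++) {
                int len = 0;
                while (pos + len < input.length()
                        && input.charAt(cand + len) == input.charAt(pos + len)) {
                    len++; // a match may overlap the current position
                }
                if (len > bestLen) {
                    bestLen = len;
                    bestDist = pos - cand;
                }
            }
            if (bestLen >= minMatch) {
                tokens.add(new Token(bestDist, bestLen, '\0'));
                pos += bestLen;
            } else {
                tokens.add(new Token(0, 0, input.charAt(pos)));
                pos++;
            }
        }
        return tokens;
    }

    static String decode(List<Token> tokens) {
        StringBuilder out = new StringBuilder();
        for (Token t : tokens) {
            if (t.length == 0) {
                out.append(t.literal);
            } else {
                int from = out.length() - t.distance;
                for (int i = 0; i < t.length; i++) {
                    out.append(out.charAt(from + i)); // char by char so overlaps work
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "abcabcabcabc";
        List<Token> tokens = encode(input, 32, 3);
        System.out.println("characters: " + input.length() + ", tokens: " + tokens.size());
        System.out.println("decoded:    " + decode(tokens));
    }
}
```

For the input `abcabcabcabc`, the encoder emits three literals followed by a single back-reference covering the remaining nine characters, which is why repetitive data compresses so well while non-repetitive data does not.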
Comparison
Compression algorithm | Compression ratio | Compression speed | Decompression speed |
ZLIB 1.2.11 -1 | 2.743 | 110 MB/sec | 400 MB/sec |
LZ4 1.8.1 | 2.101 | 750 MB/sec | 3,700 MB/sec |
ZSTD 1.3.4 -1 | 2.877 | 470 MB/sec | 1,380 MB/sec |
Snappy 1.1.4 | 2.091 | 530 MB/sec | 1,800 MB/sec |
Throughput: LZ4 > Snappy > ZSTD > ZLIB
Compression ratio: ZSTD > ZLIB > LZ4 > Snappy
Network bandwidth usage: Snappy uses the most network bandwidth because it has the lowest compression ratio, while ZSTD uses the least.
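To read the comparison figures concretely, the following sketch turns a compression ratio and a speed into a rough compressed size and processing time. It assumes that the speeds are measured against the uncompressed data size and that the benchmark figures carry over to your data, which may not hold in practice. The class name `CompressionEstimate` and the 100 MB input are hypothetical.

```java
// Back-of-the-envelope estimates from benchmark figures; names are hypothetical.
final class CompressionEstimate {

    // Compressed size = original size / compression ratio.
    static double compressedMb(double originalMb, double ratio) {
        return originalMb / ratio;
    }

    // Time in milliseconds = original size / speed (MB/sec) * 1000.
    static double millis(double originalMb, double speedMbPerSec) {
        return originalMb / speedMbPerSec * 1000.0;
    }

    public static void main(String[] args) {
        String[] names = {"ZLIB", "LZ4", "ZSTD", "Snappy"};
        double[] ratios = {2.743, 2.101, 2.877, 2.091};
        double[] compressSpeeds = {110, 750, 470, 530};      // MB/sec
        double[] decompressSpeeds = {400, 3700, 1380, 1800}; // MB/sec
        double originalMb = 100.0; // assumed input size for the estimate

        for (int i = 0; i < names.length; i++) {
            System.out.println(String.format(
                    "%-6s -> about %.1f MB, compress about %.0f ms, decompress about %.0f ms",
                    names[i],
                    compressedMb(originalMb, ratios[i]),
                    millis(originalMb, compressSpeeds[i]),
                    millis(originalMb, decompressSpeeds[i])));
        }
    }
}
```

These numbers are only estimates: the actual ratio and speed depend heavily on the message content.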
Compression Algorithm Test
Test result
Note:
The following test results are for reference only. The actual compression effect is subject to the specific message content.
5 MB | Random message body | LZ4 (threshold: 5 MB) | 9.95 MB | 31 ms | 0.049 ms |
5 MB | Random message body | ZLIB | 7.26 MB | 31 ms | 0.038 ms |
5 MB | Random message body | ZSTD | 8.20 MB | 31 ms | 0.039 ms |
5 MB | Random message body | Snappy (threshold: 5 MB) | 9.70 MB | 33 ms | 0.046 ms |
6 MB | Random message body | ZLIB (threshold: 6 MB) | 8.71 MB | 35 ms | 0.044 ms |
6 MB | Random message body | ZSTD (threshold: 6 MB) | 9.84 MB | 35 ms | 0.046 ms |
20 MB | Same message body | LZ4 | 0.16 MB | 41 ms | 0.006 ms |
20 MB | Same message body | ZLIB | 0.20 MB | 42 ms | 0.006 ms |
20 MB | Same message body | ZSTD | 0.01 MB | 42 ms | 0.003 ms |
20 MB | Same message body | Snappy | 2.47 MB | 41 ms | 0.021 ms |
40 MB | Same message body | LZ4 | 0.32 MB | 123 ms | 0.008 ms |
40 MB | Same message body | ZLIB | 0.39 MB | 122 ms | 0.008 ms |
40 MB | Same message body | ZSTD | 0.01 MB | 124 ms | 0.004 ms |
40 MB | Same message body | Snappy | 4.95 MB | 123 ms | 0.036 ms |
80 MB | Same message body | LZ4 | 0.63 MB | 241 ms | 0.009 ms |
80 MB | Same message body | ZLIB | 0.39 MB | 244 ms | 0.01 ms |
80 MB | Same message body | ZSTD | 0.01 MB | 243 ms | 0.004 ms |
80 MB | Same message body | Snappy (threshold: 80 MB) | 9.9 MB | 243 ms | 0.056 ms |
160 MB | Same message body | LZ4 | 1.26 MB | 484 ms | 0.013 ms |
160 MB | Same message body | ZLIB | 1.56 MB | 479 ms | 0.016 ms |
160 MB | Same message body | ZSTD | 0.03 MB | 481 ms | 0.004 ms |
320 MB | Same message body | LZ4 | 2.5 MB | 1,035 ms | 0.03 ms |
320 MB | Same message body | ZLIB | 3.1 MB | 1,008 ms | 0.027 ms |
320 MB | Same message body | ZSTD | 0.03 MB | 949 ms | 0.004 ms |
585 MB | Same message body | LZ4 | 4.59 MB | 1,705 ms | 0.027 ms |
585 MB | Same message body | ZLIB | 5.67 MB | 1,733 ms | 0.03 ms |
585 MB | Same message body | ZSTD | 0.11 MB | 1,722 ms | 0.006 ms |
Summary:
For data streams with a random message body (non-repetitive strings), all four compression algorithms show low compression ratios. When the message is larger than 5 MB, none of the four algorithms can compress it to less than 5 MB.
For data streams with the same message body (repetitive strings), all four compression algorithms show high compression ratios. In particular, LZ4, ZLIB, and ZSTD can compress a message of 5 MB to 600 MB to less than 5 MB.
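The difference between random and repetitive message bodies can be reproduced locally before sending. The sketch below is a minimal check, assuming ZLIB from the JDK (`java.util.zip.Deflater`) as a stand-in for whichever algorithm the producer uses. The class name, the 6 MB test size, and the repeated sample string are assumptions for illustration and are unrelated to the test setup above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.Deflater;

// Hypothetical local check; ZLIB from the JDK stands in for the producer's algorithm.
final class CompressibilityCheck {

    static final int LIMIT_BYTES = 5 * 1024 * 1024; // default 5 MB message size limit

    // Non-repetitive body: pseudo-random bytes from a fixed seed.
    static byte[] randomBody(int size, long seed) {
        byte[] body = new byte[size];
        new Random(seed).nextBytes(body);
        return body;
    }

    // Repetitive body: the same short string repeated to fill the buffer.
    static byte[] repeatedBody(int size) {
        byte[] pattern = "same message body ".getBytes(StandardCharsets.US_ASCII);
        byte[] body = new byte[size];
        for (int i = 0; i < size; i++) {
            body[i] = pattern[i % pattern.length];
        }
        return body;
    }

    // Count compressed bytes without keeping the compressed output.
    static int compressedSize(byte[] body) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(body);
            deflater.finish();
            byte[] buffer = new byte[64 * 1024];
            int total = 0;
            while (!deflater.finished()) {
                total += deflater.deflate(buffer);
            }
            return total;
        } finally {
            deflater.end();
        }
    }

    static boolean fitsLimit(byte[] body, int limitBytes) {
        return compressedSize(body) <= limitBytes;
    }

    public static void main(String[] args) {
        int size = 6 * 1024 * 1024; // 6 MB, just above the limit
        byte[] random = randomBody(size, 42L);
        byte[] repeated = repeatedBody(size);
        System.out.println("random:   " + compressedSize(random)
                + " bytes, fits limit: " + fitsLimit(random, LIMIT_BYTES));
        System.out.println("repeated: " + compressedSize(repeated)
                + " bytes, fits limit: " + fitsLimit(repeated, LIMIT_BYTES));
    }
}
```

A check like this can help decide whether compression alone is enough for a given payload or whether the message needs to be chunked or reduced in size first.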
Message compression demo and test
Test
Command and parameters used to start the producer:
java -jar tdmq-sdk-demo-1.0-SNAPSHOT-jar-with-dependencies.jar pulsar://xxxx:6650
eyJrZXlJZCI6ImRlZmF1bHRfa2V5SWQiLCJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJzdXBlcnVzZXIifQ.dYcCfp4XrdWRKdKaWylobY-_xEExfRCi1pMvNyZXbqU
pulsar-78ra8ownxb7d/BigMSGSpace/BigMSGTopic subname 1 500 0 1 20480 1 0
Command and parameters used to start the consumer:
java -jar tdmq-sdk-demo-1.0-SNAPSHOT-jar-with-dependencies.jar pulsar://xxxx:6650
eyJrZXlJZCI6ImRlZmF1bHRfa2V5SWQiLCJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJzdXBlcnVzZXIifQ.dYcCfp4XrdWRKdKaWylobY-_xEExfRCi1pMvNyZXbqU
pulsar-92d7w2mjwmv9/BigMessSpace/BigMessTopic subname 1 500 1