TDMQ for Apache Pulsar
Message Compression
Last updated: 2024-06-28 11:31:37

Background

In Pulsar, a message larger than 5 MB cannot be sent by default. One way to send such a message is to compress it on the client before sending it.

Processing Large Messages in Pulsar

Pulsar limits a single message to 5 MB by default, so a producer fails to send any message that exceeds this limit. You can handle large messages in either of two ways:
Message chunking: The producer splits a large message into chunks, and the consumer reassembles the chunks into the original message.
Message compression: Compression reduces the message size by replacing repeated byte sequences in the payload with shorter codes. Pulsar supports four compression algorithms: LZ4, ZLIB, ZSTD, and Snappy.
We recommend that you compress large messages before sending them.
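The chunking idea above can be sketched in plain Java. This is a hypothetical illustration, not Pulsar's implementation: the class name `ChunkingSketch`, its `split` and `reassemble` helpers, and the 5 MB limit expressed as 5 × 1024 × 1024 bytes are assumptions made for the example. The real client also attaches metadata to each chunk so the consumer can match and order the pieces; in the Pulsar Java client, chunking is turned on with `enableChunking(true)` on the producer builder and requires batching to be disabled.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of message chunking; NOT Pulsar's internal code.
class ChunkingSketch {
    // Assumed limit: 5 MB expressed as 5 * 1024 * 1024 bytes.
    static final int MAX_MESSAGE_SIZE = 5 * 1024 * 1024;

    // Producer side: split the payload into pieces no larger than chunkSize.
    static List<byte[]> split(byte[] payload, int chunkSize) {
        List<byte[]> chunks = new ArrayList<>();
        for (int offset = 0; offset < payload.length; offset += chunkSize) {
            int end = Math.min(payload.length, offset + chunkSize);
            chunks.add(Arrays.copyOfRange(payload, offset, end));
        }
        return chunks;
    }

    // Consumer side: concatenate the chunks in order to rebuild the payload.
    static byte[] reassemble(List<byte[]> chunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] chunk : chunks) {
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] payload = new byte[12 * 1024 * 1024]; // 12 MB, over the limit
        Arrays.fill(payload, (byte) 'x');
        List<byte[]> chunks = split(payload, MAX_MESSAGE_SIZE);
        System.out.println("chunks: " + chunks.size());
        System.out.println("intact: " + Arrays.equals(payload, reassemble(chunks)));
    }
}
```

Running `main` reports 3 chunks (5 MB + 5 MB + 2 MB) and confirms that the reassembled payload matches the original.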

Compression Algorithm Introduction and Comparison

Introduction

LZ4
LZ4 is a lossless compression algorithm with low CPU overhead and extremely fast compression and decompression.
ZLIB
ZLIB is a widely used lossless compression format based on the DEFLATE algorithm, which combines a Lempel-Ziv (LZ77) variant with Huffman coding. Because it effectively reduces the size of transferred data, it improves network transfer efficiency and makes better use of network capacity. It can often compress data to half its original size or less.
ZSTD
ZSTD (Zstandard) is an LZ77-family algorithm that pairs dictionary matching with fast entropy coding, including Huffman coding. It targets real-time compression scenarios and is designed to deliver a high compression ratio and high compression speed at the same time, which makes it effective across many workloads.
Snappy
Snappy is a lossless compression algorithm based on LZ77. It shrinks a data stream by replacing repeated byte strings with shorter codes that refer back to earlier occurrences.
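The JDK ships a zlib codec in `java.util.zip`, so ZLIB's lossless round trip can be shown without extra dependencies. This is a minimal sketch under that assumption: `ZlibRoundTrip` is a hypothetical helper, and the Pulsar client performs compression itself once a compression type is configured on the producer, so application code does not normally call `Deflater` directly. LZ4, ZSTD, and Snappy are not part of the JDK and would need third-party libraries.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper: round trip through the JDK's built-in zlib codec.
class ZlibRoundTrip {
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater(); // default level, zlib-wrapped output
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native resources
        }
    }

    static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or dictionary-dependent input");
                }
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid zlib data", e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        byte[] original = "pulsar message ".repeat(1000).getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(original);
        byte[] restored = decompress(packed);
        System.out.println(original.length + " -> " + packed.length + " bytes");
        System.out.println("lossless: " + Arrays.equals(original, restored));
    }
}
```

The repeated 15-byte string compresses to a small fraction of its 15,000 bytes, and `Arrays.equals` confirms that the restored bytes are identical to the input.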

Comparison

| Compression Algorithm | Compression Ratio | Compression Speed | Decompression Speed |
| --- | --- | --- | --- |
| ZLIB 1.2.11 -1 | 2.743 | 110 MB/sec | 400 MB/sec |
| LZ4 1.8.1 | 2.101 | 750 MB/sec | 3,700 MB/sec |
| ZSTD 1.3.4 -1 | 2.877 | 470 MB/sec | 1,380 MB/sec |
| Snappy 1.1.4 | 2.091 | 530 MB/sec | 1,800 MB/sec |
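Throughput (compression and decompression speed): LZ4 > Snappy > ZSTD > ZLIB
Compression ratio: ZSTD > ZLIB > LZ4 > Snappy
Network bandwidth: Snappy has the lowest compression ratio and therefore occupies the most network bandwidth, while ZSTD has the highest ratio and occupies the least.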
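As a back-of-the-envelope check, the ratios in the table can estimate whether a payload would fall under the 5 MB limit: estimated size = original size ÷ compression ratio. This sketch assumes the benchmark ratios carry over to your data, which they often do not (the tests below show that ratios vary widely with message content); `RatioEstimate` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical estimator based on the benchmark ratios in the comparison table.
class RatioEstimate {
    static final double LIMIT_MB = 5.0; // Pulsar's default single-message limit

    // Compression ratios copied from the comparison table (insertion order kept).
    static final Map<String, Double> RATIOS = new LinkedHashMap<>();
    static {
        RATIOS.put("ZLIB", 2.743);
        RATIOS.put("LZ4", 2.101);
        RATIOS.put("ZSTD", 2.877);
        RATIOS.put("Snappy", 2.091);
    }

    // Estimated compressed size: original size divided by the compression ratio.
    static double estimateMb(double originalMb, double ratio) {
        return originalMb / ratio;
    }

    // Algorithms whose estimated output is below the limit, in table order.
    static List<String> fitsUnderLimit(double originalMb) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Double> e : RATIOS.entrySet()) {
            if (estimateMb(originalMb, e.getValue()) < LIMIT_MB) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double originalMb = 12.0;
        for (Map.Entry<String, Double> e : RATIOS.entrySet()) {
            System.out.printf("%-6s ~%.2f MB%n", e.getKey(), estimateMb(originalMb, e.getValue()));
        }
        System.out.println("Estimated to fit under 5 MB: " + fitsUnderLimit(originalMb));
    }
}
```

For a 12 MB payload, the estimates are roughly 4.37 MB (ZLIB), 5.71 MB (LZ4), 4.17 MB (ZSTD), and 5.74 MB (Snappy), so only ZLIB and ZSTD would be expected to fit. Treat this as a rough guide only; measuring your actual messages is more reliable.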

Compression Algorithm Test

Test Results

Note:
The following test results are for reference only. The actual compression effect is subject to the specific message content.
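| Message Size | Message | Compression Algorithm | Monitored Message Size | Message Compression Duration | Message Sending Duration |
| --- | --- | --- | --- | --- | --- |
| 5 MB | Random message body | LZ4 (threshold: 5 MB) | 9.95 MB | 31 ms | 0.049 ms |
| 5 MB | Random message body | ZLIB | 7.26 MB | 31 ms | 0.038 ms |
| 5 MB | Random message body | ZSTD | 8.20 MB | 31 ms | 0.039 ms |
| 5 MB | Random message body | Snappy (threshold: 5 MB) | 9.70 MB | 33 ms | 0.046 ms |
| 6 MB | Random message body | ZLIB (threshold: 6 MB) | 8.71 MB | 35 ms | 0.044 ms |
| 6 MB | Random message body | ZSTD (threshold: 6 MB) | 9.84 MB | 35 ms | 0.046 ms |
| 20 MB | Same message body | LZ4 | 0.16 MB | 41 ms | 0.006 ms |
| 20 MB | Same message body | ZLIB | 0.20 MB | 42 ms | 0.006 ms |
| 20 MB | Same message body | ZSTD | 0.01 MB | 42 ms | 0.003 ms |
| 20 MB | Same message body | Snappy | 2.47 MB | 41 ms | 0.021 ms |
| 40 MB | Same message body | LZ4 | 0.32 MB | 123 ms | 0.008 ms |
| 40 MB | Same message body | ZLIB | 0.39 MB | 122 ms | 0.008 ms |
| 40 MB | Same message body | ZSTD | 0.01 MB | 124 ms | 0.004 ms |
| 40 MB | Same message body | Snappy | 4.95 MB | 123 ms | 0.036 ms |
| 80 MB | Same message body | LZ4 | 0.63 MB | 241 ms | 0.009 ms |
| 80 MB | Same message body | ZLIB | 0.39 MB | 244 ms | 0.01 ms |
| 80 MB | Same message body | ZSTD | 0.01 MB | 243 ms | 0.004 ms |
| 80 MB | Same message body | Snappy (threshold: 80 MB) | 9.9 MB | 243 ms | 0.056 ms |
| 160 MB | Same message body | LZ4 | 1.26 MB | 484 ms | 0.013 ms |
| 160 MB | Same message body | ZLIB | 1.56 MB | 479 ms | 0.016 ms |
| 160 MB | Same message body | ZSTD | 0.03 MB | 481 ms | 0.004 ms |
| 320 MB | Same message body | LZ4 | 2.5 MB | 1,035 ms | 0.03 ms |
| 320 MB | Same message body | ZLIB | 3.1 MB | 1,008 ms | 0.027 ms |
| 320 MB | Same message body | ZSTD | 0.03 MB | 949 ms | 0.004 ms |
| 585 MB | Same message body | LZ4 | 4.59 MB | 1,705 ms | 0.027 ms |
| 585 MB | Same message body | ZLIB | 5.67 MB | 1,733 ms | 0.03 ms |
| 585 MB | Same message body | ZSTD | 0.11 MB | 1,722 ms | 0.006 ms |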
Summary:
For random message bodies (no repeated strings), all four algorithms achieve low compression ratios. In these tests, none of them reduced a 5 MB or 6 MB random message to less than 5 MB.
For identical message bodies (repeated strings), all four algorithms achieve high compression ratios. In these tests, LZ4 and ZSTD compressed every message from 20 MB to 585 MB to less than 5 MB, ZLIB did so up to 320 MB (its 585 MB result was 5.67 MB), and Snappy did so only up to 40 MB.
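The summary's point that message content matters more than the choice of algorithm can be reproduced with the JDK's zlib codec. This sketch assumes `java.util.zip.Deflater` is a reasonable stand-in for all four algorithms and uses 1 MB payloads instead of the sizes tested above; `ContentMatters` and its helpers are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.Deflater;

// Hypothetical demo: random vs. repetitive payloads under zlib compression.
class ContentMatters {
    // Returns the zlib-compressed size of input in bytes (output is discarded).
    static int compressedSize(byte[] input) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input);
            deflater.finish();
            byte[] buffer = new byte[64 * 1024];
            int total = 0;
            while (!deflater.finished()) {
                total += deflater.deflate(buffer);
            }
            return total;
        } finally {
            deflater.end();
        }
    }

    // Random bytes: no repeated sequences for the compressor to exploit.
    static byte[] randomPayload(int size, long seed) {
        byte[] data = new byte[size];
        new Random(seed).nextBytes(data);
        return data;
    }

    // The same short string repeated: highly compressible.
    static byte[] repetitivePayload(int size) {
        byte[] pattern = "same message body ".getBytes(StandardCharsets.UTF_8);
        byte[] data = new byte[size];
        for (int i = 0; i < size; i++) {
            data[i] = pattern[i % pattern.length];
        }
        return data;
    }

    public static void main(String[] args) {
        int size = 1024 * 1024; // 1 MB keeps the demo fast
        int randomOut = compressedSize(randomPayload(size, 42L));
        int repeatOut = compressedSize(repetitivePayload(size));
        System.out.printf("random:     %d -> %d bytes%n", size, randomOut);
        System.out.printf("repetitive: %d -> %d bytes%n", size, repeatOut);
    }
}
```

The random payload comes out at about the same size as the input (typically a few bytes larger), while the repetitive payload shrinks to well under 1% of its original size, mirroring the pattern in the test results.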

Message Compression Demo and Test

For the demo, see tdmq-sdk-demo.

Test

Parameters passed to the producer:
java -jar tdmq-sdk-demo-1.0-SNAPSHOT-jar-with-dependencies.jar pulsar://xxxx:6650 \
  eyJrZXlJZCI6ImRlZmF1bHRfa2V5SWQiLCJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJzdXBlcnVzZXIifQ.dYcCfp4XrdWRKdKaWylobY-_xEExfRCi1pMvNyZXbqU \
  pulsar-78ra8ownxb7d/BigMSGSpace/BigMSGTopic subname 1 500 0 1 20480 1 0
Parameters passed to the consumer:
java -jar tdmq-sdk-demo-1.0-SNAPSHOT-jar-with-dependencies.jar pulsar://xxxx:6650 \
  eyJrZXlJZCI6ImRlZmF1bHRfa2V5SWQiLCJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJzdXBlcnVzZXIifQ.dYcCfp4XrdWRKdKaWylobY-_xEExfRCi1pMvNyZXbqU \
  pulsar-92d7w2mjwmv9/BigMessSpace/BigMessTopic subname 1 500 1
