tencent cloud

$0 14-Day TrialExperience EdgeOne for acceleration and security protection!

Feedback

TencentDB for MySQL

Column Compression

Last updated: 2024-12-30 18:05:42

Overview

Row compression and data page compression are already supported, but if small fields in a table are read and written frequently while big fields are not, both of the compression methods waste a lot of computing resources.
The column compression feature enables the compression of infrequently accessed large fields without compressing those small fields that are frequently accessed. This approach not only reduces the storage space required for the entire row of fields but also enhances the efficiency of read and write operations.
For instance, consider an employee tablecreate table employee(id int, age int, gender boolean, other varchar(1000) primary key (id)). When there is frequent access to the smaller fields such as id, age, gender, and less frequent access to the larger field other, it is advisable to designate the other column as a compressed column. Generally, only read and write operations on the other column will trigger its compression and decompression, while access to other columns will not. This strategy further reduces the storage size of row data, enabling swifter access to frequently accessed smaller fields and significantly reducing the storage space required for less frequently accessed larger fields.
Note:
The parameter `cdb_column_compression_enabled` serves as the toggle for the column compression feature.
The column compression feature in MySQL 5.7 is disabled by default. Should you wish to utilize it, please submit a ticket to enable this function.
The column compression feature is enabled by default in MySQL 8.0 kernel versions 20221215 and above.
Owing to the adoption of the open-source collaborative version's column compression capability in the MySQL 8.0 version, its implementation diverges from that of the MySQL 5.7 version. Below, we will delineate the usage instructions for the column compression capabilities of both versions. You may click on "MySQL 5.7 Column Compression" or "MySQL 8.0 Column Compression" to toggle and learn more.
MySQL 5.7 Column Compression
MySQL 8.0 Column Compression

Supported Versions

Kernel version: MySQL 5.7 20210330 and above.

Use Cases

If a table has many frequently accessed small fields and infrequently accessed large fields, you can compress the large field columns.

Instructions

Supported data types

1. BLOB(including TINYBLOB, MEDIUMBLOB, and LONGBLOB
2. TEXT(including TINYTEXT, MEDIUMTEXT, and LONGTEXT
3. VARCHAR
4. VARBINARY
Note
Here, the maximum length of LONGBLOB and LONGTEXT is 232-2 bytes, which is one byte less than 232-1 supported by native MySQL as described in String Type Storage Requirements.

Supported DDL syntax types

Different from the table creation syntax of native MySQL, the definition of COLUMN_FORMAT in column_definition is changed in TencentDB for MySQL. In addition, column compression is supported only for tables with the InnoDB storage engine.
column_definition:
data_type [NOT NULL | NULL] [DEFAULT default_value]
[AUTO_INCREMENT] [UNIQUE [KEY]] [[PRIMARY] KEY]
[COMMENT 'string']
[COLLATE collation_name]
[COLUMN_FORMAT {FIXED|DYNAMIC|DEFAULT}|COMPRESSED=[zlib]] # COMPRESSED is the compressed column keyword
[STORAGE {DISK|MEMORY}]
[reference_definition]
Below is a simple example:
CREATE TABLE t1(
id INT PRIMARY KEY,
b BLOB COMPRESSED
);
Here, as the compression algorithm is not specified, the zlib algorithm will be selected by default. You can also specify the compression algorithm keyword, but only zlib is supported currently.
CREATE TABLE t1(
id INT PRIMARY KEY,
b BLOB COMPRESSED=zlib
);
The following DDL syntaxes are supported:
CREATE TABLE:
DDL
Whether the Compression Attribute is Inherited
CREATE TABLE t2 LIKE t1;
Yes
CREATE TABLE t2 SELECT * FROM t1;
Yes
CREATE TABLE t2(a BLOB) SELECT * FROM t1;
No
ALTER TABLE:
DDL
Description
ALTER TABLE t1 MODIFY COLUMN a BLOB;
Alters a compressed column into a non-compressed one.
ALTER TABLE t1 MODIFY COLUMN a BLOB COMPRESSED;
Alters a non-compressed column into a compressed one.

Parameter Description

Parameter
Effective Immediately
Type
Default Value
Valid Values/Value Range
Description
cdb_column_compression_enabled
Yes
bool
FALSE
TRUE/FALSE
The column compression feature can be toggled on or off. When disabled, the creation of tables with compression attributes is prohibited, although tables that already possess compression attributes remain unaffected.
innodb_column_compression_zlib_wrap
Yes
bool
TRUE
TRUE/FALSE
If enabled, it will generate the zlib header and zlib footer for the data, and perform an adler32 checksum verification.
innodb_column_compression_zlib_strategy
Yes
Integer
0
[0,4]
Column compression policy. Valid values: 0: Z_DEFAULT_STRATEGY; 1: Z_FILTERED; 2: Z_HUFFMAN_ONLY; 3: Z_RLE; 4: Z_FIXED.
Generally, Z_DEFAULT_STRATEGY is the best choice for text data, while Z_RLE for image data.
innodb_column_compression_zlib_level
Yes
Integer
6
[0,9]
Column compression level. Value range: 0–9. 0 indicates not to compress. The higher the value, the smaller the data size after compression, and the longer the compression duration.
innodb_column_compression_threshold
Yes
Integer
256
[0, 0xffffffff]
Column compression threshold in bytes. Value range: 1–0xffffffff. Only data whose length is at or above this threshold will be compressed; otherwise, the original data will stay unchanged with only a compression header added.
innodb_column_compression_pct
Yes
Integer
100
[1, 100]
Column compression ratio in percentages. Value range: 1–100. Data will be compressed only if the data size after compression/data size before compression is below this value; otherwise, the original data will stay unchanged with only a compression header added.
Note:
Currently, you cannot directly modify the values of the above parameters. If needed, submit a ticket for assistance.

New status description

Name
Type
Description
Innodb_column_compressed
Integer
Number of column compressions, including compressions for non-compressed data and compressed data.
Innodb_column_decompressed
Integer
Number of column decompressions, including decompressions for non-compressed data and compressed data.

New error description

Name
Scope
Description
Compressed column '%-.192s' can't be used in key specification
Name of the column specified for compression
The compression attribute cannot be specified for a column with an index.
Unknown compression method: %s
Name of the compression algorithm specified in the DDL statement
An invalid compression algorithm other than zlib is specified in the CREATE TABLE or ALTER TABLE statement.
Compressed column '%-.192s' can't be used in column format specification
Name of the column specified for compression
If the COLUMN_FORMAT attribute has been specified for a column, other attributes cannot be specified, and COLUMN_FORMAT can be used only in NDB.
Alter table ... discard/import tablespace not support column compression
\\
The ALTER TABLE ... DISCARD/IMPORT TABLESPACE statement cannot be executed for tables with column compression enabled.

Performance

The performance varies by DDL and DML statements:
For DDL statements, sysbench is used for testing:
Column compression compromises much performance of DDL statements with the COPY algorithm, and the performance after compression is 7–8 times lower than before.
The impact of column compression on INPLACE DDL statements is subject to the data volume after compression. If the overall data size is reduced after compression, the DDL performance will be improved; otherwise, it will be compromised.
Column compression almost has no impact on INSTANT DDL statements.
For DML statements, in an 8-column table with the most common compression ratio of 1:1.8 (where the length of the inserted data varies randomly from 1 to 6,000, the inserted characters are random within 0–9 and a–b, a column contains a large volume of varchar data, and the data types of other columns are either char(60) or int), the performance of insertion, deletion, and query of non-compressed columns in this table is improved by below 10%, but the performance of update of non-compressed and compressed columns is reduced by below 10% and 15% respectively. This is because that TencentDB for MySQL first reads the value of a row and then writes the updated value, which triggers one decompression/compression process, while insertion and query only trigger one compression or decompression.

Notes

1. During logic export, CREATE TABLE statements will carry the COMPRESSED keyword. Therefore, TencentDB for MySQL supports such statements during import. Below are notes on official MySQL versions:
If the official MySQL version is below 5.7.18, data can be imported directly.
If the official MySQL version is 5.7.18 or above, the COMPRESSED keyword must be removed after logic export.
2. When DTS exports data from other cloud service providers or users, incompatibility may occur during binlog sync. In this case, you can skip DDL statements with the COMPRESSED keyword.
3. Column compression employs the zlib compression algorithm to condense data. However, applying column compression to data that has already been compressed typically does not significantly enhance the compression outcome. Taking JPEG images as an example, JPEG is a highly optimized lossy compression format that has already reduced the image data to a smaller file size. Therefore, in practical applications, the effect of column compression on JPEG data is not optimal.

Supported Versions

Kernel version: MySQL 8.0 20221215 and above.

Use Cases

If a table has many frequently accessed small fields and infrequently accessed large fields, you can compress the large field columns.

Instructions

Supported data types

1. BLOB(including TINYBLOB, MEDIUMBLOB, and LONGBLOB
2. TEXT(including TINYTEXT, MEDIUMTEXT, and LONGTEXT
3. VARCHAR
4. VARBINARY
5. JSON
The syntax is as follows:
CREATE TABLE t1(
id INT PRIMARY KEY,
b BLOB COMPRESSED
);
or
CREATE TABLE t1(
id INT PRIMARY KEY,
b BLOB COLUMN_FORMAT COMPRESSED
);
Compression Threshold
The compression threshold is governed by the parameter innodb_min_column_compress_length, which defaults to 256. Should the original size of a column exceed the value of this parameter, compression will be applied; otherwise, only a compression header is added without actual data compression being performed.
The supported DDL syntax is summarized as follows:
CREATE TABLE:
DDL
Whether the Compression Attribute is Inherited
CREATE TABLE t2 LIKE t1;
Yes
CREATE TABLE t2 SELECT * FROM t1;
No
CREATE TABLE t2(a BLOB) SELECT * FROM t1;
No
ALTER TABLE:
DDL
Description
ALTER TABLE t1 MODIFY COLUMN a BLOB;
Alters a compressed column into a non-compressed one.
ALTER TABLE t1 MODIFY COLUMN a BLOB COMPRESSED;
Alters a non-compressed column into a compressed one.

Parameter Description

Parameter
Effective Immediately
Type
Default Value
Valid Values/Value Range
Description
innodb_zlib_column_compression_level
Yes
UINT
6
[0-9]
Zlib compression, where 0 denotes no compression, 1 signifies the fastest compression, and 9 represents the highest level of compression achievable. As the scale progresses from 1 to 9, the compression speed decreases, yet the compression ratio increases.
innodb_zstd_column_compression_level
Yes
UINT
3
[1-22]
Zstd compression employs a scale where 1 signifies the fastest compression speed, and 22 denotes the highest level of compression. As the scale progresses from 1 to 22, the compression speed decreases, yet the compression ratio increases.
innodb_min_column_compress_length
Yes
UINT
256
[1,UINT_MAX32]
Adjust the compression threshold, measured in bytes. Should the original length of a column be greater than or equal to the value of this parameter, compression will be applied. Otherwise, only a compression header will be added, without actual data compression being performed.

Multiple Algorithms

TencentDB for MySQL 8.0 version supports three compression algorithms: ZLIB, LZ4, and ZSTD. It is also permissible to omit specifying the algorithm, in which case ZLIB will be selected as the default algorithm.
The syntax is as follows:
CREATE TABLE t1(
id INT PRIMARY KEY,
b BLOB COMPRESSED ALGORITHM = [ZLIB|LZ4|ZSTD]
);
or
CREATE TABLE t1(
id INT PRIMARY KEY,
b BLOB COLUMN_FORMAT COMPRESSED ALGORITHM = [ZLIB|LZ4|ZSTD]
);

Compression Algorithms and Compression Levels

1. ZLIB: Currently, multiple compression levels of ZLIB are offered, with the parameter designated as innodb_zlib_column_compression_level, ranging from 0 to 9. Here, 0 signifies no compression, 1 denotes the fastest compression, and 9 represents the highest degree of compression, with the default set to 6.
2. LZ4: It is consistent with MySQL's Page compression and does not support multi-level compression of LZ4. When using the LZ4 compression algorithm, you need to pay attention to the fact that the maximum original length of LZ4 compression is 231-1, while the maximum length of LONGBLOB is 232-1. When the compressed original data When the length is greater than or equal to 231, ZLIB will be used implicitly for compression.
3. ZSTD: ZSTD, short for ZStandard, offers three modes of compression. This section details the support for non-streaming, dictionary-less standard compression, providing multiple levels of compression. The parameter for this is innodb_zstd_column_compression_level, with values ranging from 1 to 22. A setting of 1 denotes the fastest compression, while 22 indicates the highest degree of compression, with the default setting being 3.
Display of Compression Attributes
This section presents the default compression algorithm: ALGORITHM = ZLIB.
CREATE TABLE t2 (a VARCHAR(100) COMPRESSED) ENGINE=InnoDB;

SHOW CREATE TABLE t2;

Notes

1. During logic export, CREATE TABLE statements will carry the COMPRESSED keyword. Therefore, TencentDB for MySQL supports such statements during import. Below are notes on official MySQL versions:
If the official MySQL version is below 8.0.22, data can be imported directly.
If the official MySQL version is 8.0.22 or above, the COMPRESSED keyword must be removed after logic export.
2. When DTS exports data from other cloud service providers or users, incompatibility may occur during binlog sync. In this case, you can skip DDL statements with the COMPRESSED keyword.
3. In terms of physical backups, since the fields are already compressed within InnoDB at the time of backup, the version used for the backup must also support column compression.
4. Physical upgrades from MySQL 5.7 to MySQL 8.0 are not supported.
5. Column compression utilizes ZLIB, LZ4, and ZSTD compression algorithms to condense data. However, applying column compression to data that has already been compressed typically does not significantly enhance the compression outcome. Taking JPEG images as an example, JPEG is a highly optimized lossy compression format that has already reduced the image data to a smaller file size. Therefore, in practical applications, the effect of applying column compression to JPEG data is not optimal.


Catalog

In The Article

Column Compression

Contact Us

Contact our sales team or business advisors to help your business.

Technical Support

Open a ticket if you're looking for further assistance. Our Ticket is 7x24 avaliable.

7x24 Phone Support
Hong Kong, China
+852 800 906 020 (Toll Free)
United States
+1 844 606 0804 (Toll Free)
United Kingdom
+44 808 196 4551 (Toll Free)
Canada
+1 888 605 7930 (Toll Free)
Australia
+61 1300 986 386 (Toll Free)
EdgeOne hotline
+852 300 80699
More local hotlines coming soon