Background
As more users migrate their core systems to the cloud, user data is growing rapidly and business logic is becoming increasingly complex. The cloud-native architecture of TDSQL-C for MySQL handles transactional requests well, but row store-based queries may not meet the response time requirements of users' diverse analytical queries. A common solution is to use a data synchronization tool to replicate data from the transaction system into a separate analysis system and route users' query and analysis requests to that system. However, this approach incurs extra costs for data synchronization and for an independent analysis cluster, and it raises concerns about the real-time performance and data consistency of the synchronization.
To address these challenges, TDSQL-C for MySQL provides the Column Store Index (CSI) feature, which stores, retrieves, and manages data in a column-wise format to achieve higher query performance and a higher data compression ratio.
Advantages
Compared with conventional row-based storage, CSI uses column-based data storage and query processing to substantially improve query performance, and it achieves a significantly higher compression ratio relative to the uncompressed data size.
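One reason column-wise storage compresses well is that values in a single column share a type and often repeat, so even simple encodings can collapse them. The Python sketch below is a generic illustration only: it applies run-length encoding to a hypothetical low-cardinality column. This page does not specify which encodings CSI actually uses, so the function and data here are assumptions for teaching purposes.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


# Hypothetical "region" column of an orders table, stored contiguously
# as it would be in a column-wise layout.
region_column = ["east"] * 4 + ["north"] * 3 + ["west"] * 5

encoded = run_length_encode(region_column)
print(encoded)                                        # [('east', 4), ('north', 3), ('west', 5)]
print(len(region_column), "values ->", len(encoded), "runs")  # 12 values -> 3 runs
```

In a row-wise layout, the same region values would be interleaved with other fields of each record, so runs like these would not form.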
Supported Versions
TDSQL-C for MySQL 8.0 with kernel version 3.1.14 or later.
Note:
On read-only instances that meet the version requirement, CSI can be enabled only if the instance has four or more CPU cores.
Application Scenarios
Scenarios that require real-time analysis of online data, such as online reports and data dashboards.
Scenarios involving analytical queries over large volumes of data.
Technical Principle
The CSI feature of TDSQL-C for MySQL is based on the following three key technologies:
1. Hybrid row-column data storage
TDSQL-C for MySQL stores data in a row-wise format by default, but column-based storage is better suited to data analysis and queries. Within a unified architecture, column store indexes can be created on row store tables, so that data can be stored in both row and column formats.
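To make the idea of a column store index on a row store table concrete, the Python sketch below keeps a small hypothetical orders table in its default row layout and builds a column-wise copy of selected columns from it. The `Order` type, the `build_column_index` helper, and the choice of indexed columns are all illustrative assumptions; this page does not describe how CSI selects, builds, or maintains its columns.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Order:
    order_id: int
    region: str
    amount: float


# Row store (the default layout): all fields of a record are kept together.
row_store: List[Order] = [
    Order(1, "east", 120.0),
    Order(2, "west", 80.5),
    Order(3, "east", 42.0),
]


def build_column_index(rows: List[Order], columns: List[str]) -> Dict[str, list]:
    """Build a column-wise copy of the chosen columns of a row store table."""
    return {col: [getattr(row, col) for row in rows] for col in columns}


# Hypothetical column store index over two of the table's three columns.
column_index = build_column_index(row_store, ["region", "amount"])
print(column_index["amount"])  # [120.0, 80.5, 42.0]
```

The row list still serves record-oriented access, such as fetching one order by ID, while the column lists let an analytical query read every `amount` value without touching the other fields.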
2. Generation and scheduling of hybrid row/column execution plans
TDSQL-C for MySQL uses optimizer statistics and a cost-based optimization (CBO) model to estimate the cost of candidate plans. It integrates column store indexes into the existing row store optimization model and adds them to the CBO search space, so that the optimizer can push suitable execution plan segments down to column store indexes and combine row store and column store execution within a single plan.
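The Python sketch below shows, in greatly simplified form, how a cost-based choice between row store and column store access might work. It is a toy model assumed for illustration, not the actual TDSQL-C cost model: cost is estimated purely as bytes read, the row store path is assumed to fetch only matching rows through an index, and `TableStats`, `row_store_cost`, `column_store_cost`, and `choose_access_path` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class TableStats:
    row_count: int
    total_columns: int
    avg_column_bytes: int


def row_store_cost(stats: TableStats, matching_rows: int) -> int:
    # Row store with an index: read only the matching rows, but every
    # fetched row brings all of its columns along.
    return matching_rows * stats.total_columns * stats.avg_column_bytes


def column_store_cost(stats: TableStats, needed_columns: int) -> int:
    # Column store: scan the needed columns for all rows, skip the rest.
    return needed_columns * stats.row_count * stats.avg_column_bytes


def choose_access_path(stats: TableStats, matching_rows: int, needed_columns: int) -> str:
    """Pick the cheaper access path under this toy bytes-read cost model."""
    row_cost = row_store_cost(stats, matching_rows)
    col_cost = column_store_cost(stats, needed_columns)
    return "row_store" if row_cost <= col_cost else "column_store"


stats = TableStats(row_count=1_000_000, total_columns=50, avg_column_bytes=8)
# Point lookup touching one row: the row store path is far cheaper.
print(choose_access_path(stats, matching_rows=1, needed_columns=3))          # row_store
# Analytical scan over all rows but only 3 of 50 columns: column store wins.
print(choose_access_path(stats, matching_rows=1_000_000, needed_columns=3))  # column_store
```

A real optimizer would apply a comparison like this to each segment of a plan, which is what allows some segments to run on column store indexes while others stay on the row store.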
3. Efficient computation
Data is stored in data blocks in a column-wise format. During computation, only the required columns are read, which significantly reduces I/O overhead, particularly for wide tables with many columns. In addition, columnar data is laid out more compactly in memory, so operators can process the values of a column for multiple rows in a single batch, making full use of CPU cache locality and improving computational efficiency.
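The two ideas in the paragraph above, reading only the referenced columns and processing a column in batches, can be sketched in Python as follows. The wide table, `scan_columns`, `batched_sum`, and the batch size are hypothetical. The sketch only shows the access pattern; plain Python does not reproduce the cache and CPU benefits that a native batch-processing engine gets from contiguous columnar data.

```python
from typing import Dict, List

# Hypothetical wide table stored column-wise: 20 columns, 10 rows each.
Table = Dict[str, List[int]]


def scan_columns(table: Table, needed: List[str]) -> Table:
    """Column pruning: fetch only the columns the query references."""
    return {name: table[name] for name in needed}


def batched_sum(values: List[int], batch_size: int = 4) -> int:
    """Sum one column in fixed-size batches rather than one row at a time."""
    total = 0
    for start in range(0, len(values), batch_size):
        batch = values[start:start + batch_size]  # contiguous slice of a single column
        total += sum(batch)                        # one operator call per batch
    return total


wide_table: Table = {f"col_{i}": list(range(10)) for i in range(20)}

# Roughly equivalent to: SELECT SUM(col_3) FROM wide_table
pruned = scan_columns(wide_table, ["col_3"])
print(len(pruned), "of", len(wide_table), "columns read")  # 1 of 20 columns read
print(batched_sum(pruned["col_3"]))                         # 45
```

With a row-wise layout, the same query would have to read all 20 fields of every row just to extract `col_3`.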