The TXSQL kernel of TencentDB for MySQL supports a compilation optimization high-performance version, which maintains full compatibility without changing the kernel's internal implementation logic. By leveraging dynamic compiler optimization techniques to identify likely user input behavior, the database kernel delivers stronger performance in common business scenarios while also reducing power consumption. This article introduces the compilation optimization high-performance version of TencentDB for MySQL.
Supported Versions
TDSQL-C for MySQL 5.7 (kernel version 2.1.11) or later.
TDSQL-C for MySQL 8.0 (kernel version 3.1.12) or later.
Note:
The compilation optimization high-performance version is currently in grayscale release. To try it in advance, submit a ticket to apply for access, provided that your instance meets the above kernel version requirements.
Overview
As the internal implementation of modern CPUs becomes increasingly complex, the default compilation, configuration, and execution methods of cloud databases can hardly exploit the full performance potential of CPUs, leaving a significant number of CPU cycles idle. Compounded at cloud scale, this not only wastes hardware resources but also consumes a great deal of power. It is therefore necessary to optimize cloud databases to maximize the CPU's performance potential, reduce idle CPU cycles, improve hardware utilization, and cut wasted power consumption.
Without changing the database kernel's business logic code, TDSQL-C for MySQL uses dynamic compiler optimization techniques to improve kernel performance and reduce power consumption at minimal cost. It collects behavior and performance data from cloud databases in typical and real business scenarios to improve the default compilation method, which is unaware of business behavior; it analyzes the database's runtime behavior characteristics together with CPU microarchitecture features and applies compilation optimization technologies so that the optimized build is friendlier to the CPU microarchitecture, fully unleashing the CPU's performance potential; and it verifies through extensive scenario testing that the optimization effect does not regress under various conditions.
The following optimizations were achieved through the above technical measures:
1. Based on database operation behavior data, feedback-directed optimization improves function inlining, function reordering, and basic block reordering, significantly reducing the database's CPU ICache/ITLB miss rates and enhancing performance.
2. Link-time optimization (LTO) expands the compiler's view from a single file or function to cross-file, whole-binary optimization, significantly enlarging the scope for inlining and reducing the executed instruction count (see the sketch after this list).
3. In practice, a set of efficient verification and analysis methods has been developed to ensure the achieved gains stay close to the theoretical upper bound and to guarantee no regression across scenarios.
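Below is a minimal sketch of what link-time optimization changes, assuming GCC's standard -flto flag (also available in Clang). The two files, their contents, and the build commands are illustrative only and are unrelated to the actual TXSQL/TDSQL-C build.

// lookup.cpp -- helper defined in a separate translation unit.
// Without LTO, the compiler cannot inline it into callers in other files.
int row_size(int columns, int bytes_per_column) {
    return columns * bytes_per_column;
}

// main.cpp -- hot loop that calls the helper from another file.
#include <cstdio>

int row_size(int columns, int bytes_per_column);  // defined in lookup.cpp

int main() {
    long long total = 0;
    for (int i = 0; i < 10000000; ++i) {
        total += row_size(16, 8);  // cross-file call; a candidate for LTO inlining
    }
    std::printf("%lld\n", total);
    return 0;
}

// Build without LTO: each .cpp is optimized in isolation, so the call above
// remains an out-of-line call:
//   g++ -O2 -c lookup.cpp main.cpp && g++ -O2 lookup.o main.o -o demo
//
// Build with LTO: the optimizer sees both translation units at link time and
// can inline row_size() into the loop, shrinking the executed instruction count:
//   g++ -O2 -flto -c lookup.cpp main.cpp && g++ -O2 -flto lookup.o main.o -o demo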
Definition of Compilation Optimization
Compilation optimization refers to the process of enhancing a program's execution efficiency and performance by optimizing the code and adjusting compilation parameters during code compilation.
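As a minimal, generic illustration (not specific to the database kernel), the toy program below is changed only through compiler parameters; the file name and numbers are illustrative assumptions, while -O0 and -O2 are standard GCC/Clang optimization levels.

// opt_demo.cpp -- the same source compiled with different parameters.
#include <cstdio>

// At -O0 this loop is executed literally, one iteration at a time.
// At -O2 the compiler can unroll it, or even replace it with a closed-form
// result, because the bound and body are fully visible at compile time.
long long sum_to(long long n) {
    long long total = 0;
    for (long long i = 1; i <= n; ++i) {
        total += i;
    }
    return total;
}

int main() {
    std::printf("%lld\n", sum_to(100000000));
    return 0;
}

// Unoptimized build:  g++ -O0 opt_demo.cpp -o opt_demo
// Optimized build:    g++ -O2 opt_demo.cpp -o opt_demo
// Same source, same behavior; only the compilation parameters differ,
// and the -O2 binary executes far fewer instructions for the same result.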
Optimization Principles
The high-performance version of TDSQL-C for MySQL uses Profile-Guided Optimization (PGO) technology for compilation optimization. PGO technology addresses the issue that traditional compilers, during optimization, rely solely on static code information without considering potential user inputs, thus failing to effectively optimize the code.
PGO technology is divided into the following three stages:
1. Instrument: In the instrument stage, an initial compilation of the application is performed. During this compilation, the compiler inserts instrumentation code so that data can be collected in the next stage. The instrumentation is of three types: it tracks how many times each function is executed, how many times each branch is taken (for example, in if-else statements), and the values of certain variables (primarily for switch-case statements).
2. Train: In the train stage, the instrumented application from the previous stage is run with the most common inputs. Because the previous stage prepared it for data collection, the profile data corresponding to the application's most common usage scenarios is collected by the end of this stage.
3. Optimize: In the optimize stage, the compiler recompiles the application using the profile data collected in the previous stage. Because that data reflects the most common user input scenarios, the final optimized build performs better in exactly those scenarios.
Through these three stages, the high-performance version of TDSQL-C for MySQL is tuned to real user workloads, improving the performance and efficiency of the application.
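The following is a minimal sketch of the three PGO stages using GCC, shown on a standalone toy program rather than the database kernel itself. The file name, function, and workload are illustrative assumptions; the -fprofile-generate and -fprofile-use flags are standard GCC/Clang PGO options.

// pgo_demo.cpp -- a branch whose hot direction depends on runtime input,
// which a purely static compiler cannot know in advance.
#include <cstdio>
#include <cstdlib>

// Hypothetical request handler: in the trained workload almost every
// request is a point lookup, so the first branch is the hot path.
long long handle(int request_type, long long key) {
    if (request_type == 0) {          // point lookup (hot in training data)
        return key ^ 0x9E3779B9LL;    // cheap hash-style work
    }
    return key % 97;                  // range-scan fallback (cold)
}

int main(int argc, char** argv) {
    int type = (argc > 1) ? std::atoi(argv[1]) : 0;
    long long sum = 0;
    for (long long i = 0; i < 50000000; ++i) {
        sum += handle(type, i);
    }
    std::printf("%lld\n", sum);
    return 0;
}

// Stage 1 (Instrument): compile with counters inserted for functions,
// branches, and switch values.
//   g++ -O2 -fprofile-generate pgo_demo.cpp -o pgo_demo
//
// Stage 2 (Train): run the instrumented binary on the most common input;
// this writes the collected profile data to disk.
//   ./pgo_demo 0
//
// Stage 3 (Optimize): recompile using the collected profile, so inlining,
// branch layout, and code placement follow the observed hot path.
//   g++ -O2 -fprofile-use pgo_demo.cpp -o pgo_demo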
Performance Testing
Test Scenario
The mixed read-write (POINT SELECT) test scenario mainly measures database performance under concurrent read and write operations. It helps evaluate the database's performance in real application scenarios, including its capacity to handle concurrent read-write operations, response time, throughput, and other metrics.
Test Result
| Specification | Concurrency | Single table data volume (table_size) | Total number of tables (tables) | QPS (compilation optimization version) | Boost percentage |
| --- | --- | --- | --- | --- | --- |
| 2-core 16 GB | 64 | 800,000 | 150 | 29,207 | 27% |
| 4-core 16 GB | 256 | 800,000 | 300 | 65,562 | 27% |
| 4-core 32 GB | 256 | 800,000 | 300 | 78,973 | 27% |
| 8-core 32 GB | 256 | 800,000 | 300 | 139,845 | 28% |
| 8-core 64 GB | 256 | 800,000 | 450 | 154,894 | 28% |
| 16-core 64 GB | 256 | 800,000 | 450 | 249,954 | 29% |
| 16-core 96 GB | 256 | 800,000 | 600 | 238,061 | 29% |
| 16-core 128 GB | 512 | 5,000,000 | 300 | 253,848 | 29% |
| 32-core 128 GB | 512 | 5,000,000 | 300 | 399,647 | 30% |
| 32-core 256 GB | 512 | 5,000,000 | 400 | 402,105 | 30% |
| 64-core 256 GB | 512 | 6,000,000 | 450 | 596,706 | 31% |