Overview
The LibraDB engine is an extended read-only analysis component designed primarily for efficient analytical queries. It gives customers real-time, high-performance processing of complex SQL. Its columnar storage, vectorized parallel execution engine, and optimizer extended for distributed parallel execution let customers run efficient analysis directly on their databases. In addition, LibraDB's columnar storage is optimized to provide ACID guarantees under high-QPS data changes and transactions, which keeps query data both real-time and consistent.
Supported Versions
LibraDB engine kernel version 1.2404.7 and later.
Principles
The LibraDB engine kernel consists of three main parts: the data synchronization component, the compute engine, and columnar storage. The data synchronization component converts row-based data into columnar format and loads it. It does this by synchronizing and consuming the binlog, much like primary-secondary replication in MySQL. The compute engine handles metadata storage, execution plan generation, and execution operator generation. Columnar storage handles data storage and operator execution.
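To make the row-to-column conversion concrete, the following is a minimal sketch of a binlog consumer that pivots inserted rows into per-column lists. The event class `RowInsertEvent`, the `ColumnarTable` class, and the `consume` function are hypothetical illustrations; the real binlog format and LibraDB's synchronization component are far more involved (updates, deletes, DDL, transactions), and this sketch handles inserts only.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class RowInsertEvent:
    """Hypothetical, simplified row-format binlog event (inserts only)."""
    table: str
    rows: list[tuple[Any, ...]]


@dataclass
class ColumnarTable:
    """Hypothetical columnar table that keeps one list per column."""
    column_names: list[str]
    columns: dict[str, list[Any]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        for name in self.column_names:
            self.columns.setdefault(name, [])

    def append_rows(self, rows: list[tuple[Any, ...]]) -> None:
        # Pivot row-oriented tuples into per-column lists.
        for row in rows:
            if len(row) != len(self.column_names):
                raise ValueError("row width does not match the table schema")
            for name, value in zip(self.column_names, row):
                self.columns[name].append(value)


def consume(events: list[RowInsertEvent], tables: dict[str, ColumnarTable]) -> None:
    """Apply events in log order, the way a replica replays the primary's binlog."""
    for event in events:
        tables[event.table].append_rows(event.rows)


orders = ColumnarTable(["id", "amount"])
consume([RowInsertEvent("orders", [(1, 9.5), (2, 3.0)])], {"orders": orders})
print(orders.columns)
```

Applying events strictly in log order is what lets the columnar copy converge to the same state as the row store, in the same way that MySQL replication relies on ordered binlog replay.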
Supported Features
The LibraDB engine kernel supports a range of features. They are briefly introduced below.
1. Massively Parallel Processing (MPP)
The MPP architecture is a distributed data processing technology that improves performance by spreading the workload across multiple nodes. With support for multiple replicas, the LibraDB engine can combine the nodes of multiple read-only analysis engine instances into a cluster. Each node has its own disks and memory, and the nodes are connected over private or commercial general-purpose networks to compute collaboratively and provide data processing services as a whole. This significantly improves performance on very large data volumes, removes the bottleneck of a single node, and accommodates different levels of user requirements.
MPP delivers high-performance data processing by making full use of the computing resources of multiple nodes: the operators of a single SQL statement can be distributed across several nodes and executed jointly. MPP also supports horizontal scaling, so performance can grow along with the user's business.
2. Vectorized Execution Engine
In the LibraDB engine, data is not only stored by column but also computed by column. Traditional OLTP engines such as TXSQL typically compute row by row, mainly because transactional workloads consist largely of point lookups, point reads, and point writes. In the analytical scenarios that LibraDB targets, however, a single SQL statement may involve a substantial amount of computation. The LibraDB engine therefore implements vectorized execution: SIMD instructions are applied to a whole batch of in-memory columnar data in a single call, which reduces the number of function calls and lowers cache misses. It also takes full advantage of the data parallelism of SIMD instructions to shorten computation time.
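The following minimal sketch contrasts row-at-a-time execution with batch-at-a-time (vectorized) execution for a simple "filter then sum" operator. Pure Python cannot issue SIMD instructions, so this only illustrates the reduction in per-row call overhead; the function names and the batch size of 1024 are hypothetical choices, not LibraDB parameters.

```python
def filter_sum_row(value: int, threshold: int) -> int:
    """Row-at-a-time primitive: invoked once per row."""
    return value if value > threshold else 0


def filter_sum_batch(batch: list[int], threshold: int) -> int:
    """Vectorized primitive: invoked once per batch of column values.
    A native engine would compile this tight loop to SIMD instructions."""
    return sum(v for v in batch if v > threshold)


def run_row_mode(column: list[int], threshold: int) -> tuple[int, int]:
    total, calls = 0, 0
    for value in column:
        total += filter_sum_row(value, threshold)
        calls += 1
    return total, calls


def run_vectorized(column: list[int], threshold: int, batch_size: int = 1024) -> tuple[int, int]:
    total, calls = 0, 0
    for start in range(0, len(column), batch_size):
        total += filter_sum_batch(column[start:start + batch_size], threshold)
        calls += 1
    return total, calls


column = list(range(10_000))
print(run_row_mode(column, 5000))
print(run_vectorized(column, 5000))
```

Both modes return the same total, but the vectorized mode makes one operator call per 1024 values instead of one per value, and each call walks a contiguous slice of a single column, which is the cache-friendly access pattern that columnar layouts enable.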
3. Support for Columnar Storage in High-Speed Data Change Scenarios
Read-write instances of TDSQL-C for MySQL can sustain over a million QPS of online data operations. As an engine for real-time data analysis, LibraDB must keep data consistent under such high-frequency changes. Traditional columnar storage handles bulk writes well but performs poorly on large-scale DELETE and UPDATE operations. In traditional real-time data warehouses, a common practice is to rewrite each UPDATE as a DELETE followed by an INSERT and to batch these operations at the data synchronization layer. Even so, DELETE remains comparatively expensive for columnar storage. As a result, traditional columnar storage inevitably incurs higher data latency and cannot support truly real-time analysis.
Through optimizations at the storage layer, the LibraDB engine meets data consistency requirements under high-concurrency data changes, avoids the data latency that frequent changes in read-write instances would otherwise cause, and keeps analysis timely.
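To illustrate why the DELETE + INSERT rewrite helps columnar storage, the following minimal sketch shows an append-only column segment with a delete mask. This is a common general-purpose columnar technique, not a description of LibraDB's storage layer, whose design this document does not detail; the `ColumnSegment` class and its methods are hypothetical.

```python
class ColumnSegment:
    """Hypothetical append-only columnar segment with a delete mask."""

    def __init__(self, column_names: list[str]) -> None:
        self.column_names = list(column_names)
        self.columns: dict[str, list] = {name: [] for name in self.column_names}
        self.deleted: list[bool] = []       # deleted[i] is True if row i is no longer visible
        self.pk_to_row: dict = {}           # primary key -> position of its live row

    def insert(self, pk, row: dict) -> None:
        if pk in self.pk_to_row:
            raise KeyError(f"duplicate key {pk!r}")
        position = len(self.deleted)
        for name in self.column_names:
            self.columns[name].append(row[name])  # append only, never rewrite in place
        self.deleted.append(False)
        self.pk_to_row[pk] = position

    def delete(self, pk) -> None:
        # Flag the row instead of physically removing it from every column.
        position = self.pk_to_row.pop(pk)
        self.deleted[position] = True

    def update(self, pk, row: dict) -> None:
        # UPDATE rewritten as DELETE + INSERT.
        self.delete(pk)
        self.insert(pk, row)

    def scan(self, name: str) -> list:
        # Readers skip rows flagged in the delete mask.
        return [v for v, dead in zip(self.columns[name], self.deleted) if not dead]


seg = ColumnSegment(["id", "qty"])
seg.insert(1, {"id": 1, "qty": 10})
seg.insert(2, {"id": 2, "qty": 20})
seg.update(1, {"id": 1, "qty": 15})
seg.delete(2)
print(seg.scan("id"), seg.scan("qty"))
```

Every change becomes either an append or a flip of one mask entry, so no column has to be rewritten on each modification. The trade-off is that dead rows accumulate and must eventually be compacted away, which is one reason naive columnar storage struggles to stay real-time under heavy change rates.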
4. Specified Data Loading Capability
Not all data in TDSQL-C for MySQL has analytical value, so not every object needs to be loaded into columnar format. The LibraDB engine therefore lets you specify which objects to load, either through settings in the data loading console or with SQL commands.
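Conceptually, specifying objects to load acts as a filter in front of the synchronization component. The following minimal sketch assumes a hypothetical pattern list (`LOAD_PATTERNS`) and hypothetical event dictionaries; the `database.table` pattern syntax is purely illustrative and is not the syntax used by the LibraDB console or its SQL commands.

```python
from fnmatch import fnmatchcase

# Hypothetical load list: objects selected for columnar loading.
# "sales.*" selects every table in the sales database.
LOAD_PATTERNS = ["sales.*", "crm.customers"]


def should_load(database: str, table: str, patterns: list[str] = LOAD_PATTERNS) -> bool:
    qualified = f"{database}.{table}"
    return any(fnmatchcase(qualified, pattern) for pattern in patterns)


def filter_events(events: list[dict], patterns: list[str] = LOAD_PATTERNS) -> list[dict]:
    # Drop change events for objects that were not selected for loading.
    return [e for e in events if should_load(e["database"], e["table"], patterns)]


events = [
    {"database": "sales", "table": "orders"},
    {"database": "crm", "table": "customers"},
    {"database": "crm", "table": "audit_log"},
]
print(filter_events(events))
```

Filtering this early means unselected tables consume no conversion work or columnar storage at all.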