The Tencent Cloud Elasticsearch Service (ES) team continuously optimizes the ES kernel based on its extensive experience with large-scale applications, while remaining fully compatible with the open-source Elasticsearch kernel, to improve cluster performance and stability and reduce costs. The team also keeps up with the latest community updates. This document describes the major kernel optimizations of ES.
Major optimizations in April 2022:
Optimization Category | Optimization Policy | Supported Versions |
---|---|---|
Performance | Query clipping for time series indexes is optimized, shifting from large-scale traversal to fixed-point boundary clipping and speeding up high-dimensional time series searches by more than ten times. | 7.14.2 |
Performance | DSL query results can be returned in columns, which greatly reduces redundant duplicate keys, lowers network bandwidth usage by 35%, and increases performance by 20%. | 7.14.2 |
Performance | Serialization of transparent data transfer between nodes is optimized, eliminating redundant serialization costs and increasing query performance by 30%. | 7.14.2 |
Performance | X-Pack authentication performance is optimized. CPU hotspots are eliminated through special-case permission processing, caching, and lazy loading, improving query performance by over 30%. | 7.10.1, 7.14.2 |
Performance | Query performance is optimized with fine-grained block-level sampling, increasing the performance of estimation queries that use operators such as topk, avg, min, max, and histogram by more than ten times. | 7.14.2 |
Feature | The query preference parameter is optimized. `_shards` and `custom_string` can be combined to pin queries to fixed primary and replica shards, which keeps query results stable in scoring scenarios. | 7.14.2 |
Feature | Truncation of super-long keyword field content is optimized, so that such content can be written without a truncation exception being reported. | 7.14.2 |
Feature | Fine-grained query timeout control is optimized at the underlying level, so that large numbers of canceled or timed-out queries no longer keep occupying cluster resources (the queries must carry the `timeout` parameter). | 7.10.1, 7.14.2 |
Stability | A memory leak in specific memory usage throttling scenarios during queries is fixed, and the memory usage throttling policy is further optimized to avoid OOM errors in aggregation scenarios and enhance cluster stability. | 7.14.2 |
Stability | An issue where nodes that have left the cluster repeatedly rejoin and are removed is fixed, increasing cluster stability. | 7.10.1, 7.14.2 |
Stability | Node-level and index-level shard balancing policies are optimized to improve shard balancing and eliminate load hotspots. | 7.10.1, 7.14.2 |
Stability | Shard relocation and balancing policies are optimized for multi-disk scenarios to improve shard relocation performance. | 6.8.2, 7.10.1, 7.14.2 |
Stability | The priorities of shard start and failed-shard tasks are optimized to avoid prolonged index unavailability. | 6.8.2, 7.10.1, 7.14.2 |
Stability | Cluster scalability is optimized: the supported number of shards and nodes is greatly increased, many metadata changes are implemented, and cluster restart performance is multiplied. | 7.14.2 |
Security | The Log4j vulnerability is fixed. | All versions |
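The Feature rows on query preference and query timeout refer to standard search request parameters. The minimal Python sketch below builds (but does not send) such a request, assuming the open-source syntax in which `_shards:<ids>` comes first and is joined to another preference value with `|`, and a custom preference string must not start with `_`. The helper `build_search_request`, the index `my-index`, the session key `user-42`, and the `match` query are hypothetical; whether a given combination pins both primary and replica shards depends on the kernel version listed in the table.

```python
import json
from urllib.parse import urlencode


def build_search_request(index, shard_ids, session_key, timeout="2s"):
    """Build (but do not send) a search request that combines a `_shards`
    preference with a custom preference string and sets a search timeout.

    Hypothetical helper for illustration only.
    """
    if not shard_ids:
        raise ValueError("at least one shard ID is required")
    if not session_key or session_key.startswith("_"):
        # A custom preference string must not start with "_"; such strings
        # are reserved for built-in values such as _local.
        raise ValueError("custom preference string must not start with '_'")

    # `_shards:<ids>` comes first and is joined to the next value with "|".
    shard_list = ",".join(str(s) for s in shard_ids)
    preference = f"_shards:{shard_list}|{session_key}"
    path = f"/{index}/_search?" + urlencode({"preference": preference})

    body = {
        # Per-request search timeout; the timeout optimization in the table
        # only applies to queries that carry this parameter.
        "timeout": timeout,
        "query": {"match": {"title": "elasticsearch"}},
    }
    return path, json.dumps(body)


path, body = build_search_request("my-index", [0, 1], "user-42")
print(path)  # /my-index/_search?preference=_shards%3A0%2C1%7Cuser-42
print(body)
```

Reusing the same custom string for related requests (for example, one per user session) sends them to the same shard copies, which is what keeps relevance scores consistent between repeated queries.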
Major optimizations in February 2021:
Optimization Dimension | Optimization Category | Optimization Policy | Supported Versions |
---|---|---|---|
Performance | Write performance | Shard-targeted routing is optimized, solving the long-tail shard issue during writes in single-index, multi-shard scenarios. This also increases write throughput by over 10% and reduces CPU usage by over 20%. | 6.8.2, 7.5.1, 7.10.1 |
Performance | Query performance | Query performance is improved by over 10% by cropping query results instead of using `filter_path`. | 6.8.2, 7.5.1, 7.10.1 |
Stability | Memory | Node crashes and cluster avalanches caused by highly concurrent writes and large queries are significantly reduced, and overall availability is increased to 99.99%. | 6.8.2, 7.5.1, 7.10.1 |
Stability | JDK, GC | Tencent's proprietary KONA JDK11 is adopted and known JDK bugs are fixed, improving serial full GC capability. You can switch to the G1 collector to improve GC efficiency and reduce stalls caused by old-generation GC. | 6.8.2, 7.5.1, 7.10.1 |
Stability | Metadata performance | The priority of mapping update tasks is optimized, fixing an issue where nodes stop working properly because large numbers of concurrent mapping update tasks flood the queue with requests. Asynchronous metadata storage and metadata synchronization performance are also improved to avoid frequent timeouts during index creation and mapping updates. | 6.8.2, 7.5.1, 7.10.1 |
Costs | Storage | The zstd compression algorithm is adopted, increasing the compression ratio by 30% to 50% and compression performance by 30%. | 6.8.2, 7.5.1, 7.10.1 |
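For context on the `filter_path` row: in open-source Elasticsearch, `filter_path` is a query-string parameter that trims a response down to the listed dot-separated paths, for example `?filter_path=took,hits.hits._id`, traversing arrays element by element. The minimal Python sketch below emulates that behavior on the client side for plain paths only (no wildcards) to show what cropping a result means. It is an illustration of the open-source parameter, not the kernel-side cropping described in the table; `crop`, `_crop`, and the `sample` response are hypothetical.

```python
def crop(response, paths):
    """Keep only the parts of a JSON-like response reachable through the
    given dot-separated paths (simplified filter_path: no wildcards)."""
    result = _crop(response, [p.split(".") for p in paths])
    return result if result is not None else {}


def _crop(node, split_paths):
    # Every path in split_paths still has at least one segment left here.
    if isinstance(node, list):
        # Arrays are traversed element by element; non-matching items are dropped.
        kept = [_crop(item, split_paths) for item in node]
        kept = [item for item in kept if item is not None]
        return kept or None
    if not isinstance(node, dict):
        return None  # a scalar cannot satisfy a path that continues past it
    out = {}
    for key, value in node.items():
        rest = [p[1:] for p in split_paths if p[0] == key]
        if not rest:
            continue  # no requested path goes through this key
        if any(len(r) == 0 for r in rest):
            out[key] = value  # some path ends here: keep the whole subtree
        else:
            sub = _crop(value, rest)
            if sub is not None:
                out[key] = sub
    return out or None


# Made-up search response used only for illustration.
sample = {
    "took": 3,
    "timed_out": False,
    "hits": {
        "total": {"value": 2, "relation": "eq"},
        "hits": [
            {"_index": "my-index", "_id": "1", "_score": 1.0, "_source": {"title": "a"}},
            {"_index": "my-index", "_id": "2", "_score": 0.5, "_source": {"title": "b"}},
        ],
    },
}
print(crop(sample, ["took", "hits.hits._id"]))
# {'took': 3, 'hits': {'hits': [{'_id': '1'}, {'_id': '2'}]}}
```

Cropping this way only happens after the full response has been built, which is one reason trimming results earlier can pay off.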
Major optimizations made since the ES team restarted its kernel research, as of July 2020:
Optimization Dimension | Optimization Category | Optimization Policy | Supported Versions |
---|---|---|---|
Performance | Write performance | The translog lock mechanism is optimized, increasing the overall write performance by 20%. Write deduplication and segment file cropping are optimized, increasing the performance of writes with primary keys by over 50%. | 7.5.1, 7.10.1 |
Performance | Query performance | | 6.4.3, 6.8.2, 7.5.1, 7.10.1 |
Stability | Availability | | 6.4.3, 6.8.2, 7.5.1, 7.10.1 |
Stability | Balancing policy | | 5.6.4, 6.4.3, 6.8.2, 7.5.1, 7.10.1 |
Stability | Rolling restart speed | | 6.4.3, 6.8.2, 7.5.1, 7.10.1 |
Stability | Online master switch | The proprietary online master switch feature allows you to switch the master online in seconds by specifying the preferred master through APIs. Typical use cases include: | 6.4.3, 6.8.2, 7.5.1, 7.10.1 |
Costs | Memory | | 6.8.2, 7.5.1, 7.10.1 |
Costs | Storage | | 5.6.4, 6.4.3, 6.8.2, 7.5.1, 7.10.1 |