Suggested Scenarios to Avoid
Avoid scheduling large-scale periodic offline or batch ETL jobs (INSERT INTO ... SELECT or CREATE TABLE AS SELECT) in production clusters, particularly when offline and online workloads share the same cluster. Offline jobs can consume significant resources and degrade the stability and performance of online workloads.
Note:
It is recommended to isolate offline and online workloads in separate clusters, or to complete offline processing in Spark first and then write the results to Doris.
Avoid executing INSERT INTO statements one row at a time. Each INSERT INTO in Doris is a separate transaction, so row-by-row inserts can push the number of concurrent transactions past the upper limit.
Note:
It is recommended to batch the data, for example by inserting dozens or hundreds of rows in a single INSERT INTO statement, to reduce write pressure.
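The batching recommendation can be sketched on the client side as follows. This is a minimal illustration, not an official client: the helper name `batched_inserts`, the default batch size of 100, and the `%s` placeholder style (as used by common MySQL-protocol drivers) are assumptions made for the example.

```python
from typing import Iterable, Iterator, List, Sequence, Tuple


def batched_inserts(
    table: str,
    columns: Sequence[str],
    rows: Iterable[Sequence[object]],
    batch_size: int = 100,  # assumed default; "dozens or hundreds" of rows per statement
) -> Iterator[Tuple[str, List[object]]]:
    """Yield (sql, params) pairs, one multi-row INSERT per batch of rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    col_list = ", ".join(columns)
    # One "(%s, %s, ...)" group per row; values are passed separately as parameters.
    row_placeholder = "(" + ", ".join(["%s"] * len(columns)) + ")"

    def build(chunk: List[Sequence[object]]) -> Tuple[str, List[object]]:
        sql = (
            f"INSERT INTO {table} ({col_list}) VALUES "
            + ", ".join([row_placeholder] * len(chunk))
        )
        params = [value for row in chunk for value in row]
        return sql, params

    batch: List[Sequence[object]] = []
    for row in rows:
        if len(row) != len(columns):
            raise ValueError("row length does not match column count")
        batch.append(row)
        if len(batch) == batch_size:
            yield build(batch)
            batch = []
    if batch:
        # Flush the final, possibly smaller, batch.
        yield build(batch)
```

With a driver that accepts this placeholder style, each yielded pair would be passed to `cursor.execute(sql, params)`, so that every statement, and therefore every transaction, carries many rows instead of one.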
1.2 Kernel Version: Try to avoid using complex data types (e.g., MAP, ARRAY, STRUCT).
In the 1.2 kernel version, support for complex data types is not fully developed, and some write and query operations might cause errors.
Suggested Queries to Avoid
Try to avoid SELECT * queries on tables with many columns and large amounts of data.
Avoid enabling the profile globally. Profiling incurs significant resource overhead, so enable it only for the specific SQL statements that need it.
Try to avoid joining multiple large tables.
Note:
When multiple large tables must be joined, it is recommended to join them in pairs using Colocation Join, or to use pre-aggregated tables, indexes, and similar techniques to speed up queries.
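For the profiling recommendation in this list, one way to keep the profile scoped to specific statements is to switch it on at the session level just before the query and off again afterwards. The sketch below assumes a DB-API style cursor on a MySQL-protocol connection and assumes the session variable is named `enable_profile`; check the variable name against your kernel version before relying on it. The helper name `profiled` is hypothetical.

```python
from contextlib import contextmanager


@contextmanager
def profiled(cursor):
    """Enable the profile on this session only for the enclosed statements."""
    # Assumption: the session variable is called enable_profile.
    cursor.execute("SET enable_profile = true")
    try:
        yield cursor
    finally:
        # Always switch the profile back off, even if the query fails.
        cursor.execute("SET enable_profile = false")
```

A query that needs a profile would then run inside `with profiled(cursor) as cur: cur.execute(...)`, while every other statement on the connection runs without the profiling overhead.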
Suggested Features to Avoid
1.2 Kernel Version: Avoid enabling merge_on_write (this feature is not yet fully developed).
1.2 Kernel Version: Avoid enabling Light Schema Change (this feature is not yet fully developed).