To keep Spark engine query analysis stable when network bandwidth is limited (for example, during storage system throttling), the DLC Spark engine provides a local cache capability. When you need to cache table data, you can enable caching quickly by adding an engine configuration item.
Directions
Spark SQL Engine Configuration:
Note:
After the configuration is added, the engine cluster will restart. It is recommended to enable the cache when no tasks are running to avoid affecting ongoing tasks.
3. To use the engine cache, go to Data Exploration, write your query SQL in the SQL editor, select the engine with caching enabled, and execute the SQL. On the first execution, the engine caches the DLC external table data locally; when the same SQL is executed again, the data is fetched from the local cache, improving query efficiency.
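As an illustration of this step, the queries below show the cache warming behavior described above. The database and table names (`dlc_demo_db.external_orders`) and the partition column are hypothetical placeholders; substitute your own DLC external table.

```sql
-- First execution: the engine reads the DLC external table data
-- from COS and caches it locally.
SELECT order_id, amount
FROM dlc_demo_db.external_orders   -- hypothetical DLC external table
WHERE dt = '2024-01-01';

-- Second execution of the same query: the data is served from the
-- engine's local cache instead of COS.
SELECT order_id, amount
FROM dlc_demo_db.external_orders
WHERE dt = '2024-01-01';
```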
Spark SQL Engine Query:
Spark Batch Engine Query:
Cache Description
Cache Configuration Items Description
| Configuration Item | Value | Description |
| --- | --- | --- |
| spark.hadoop.fs.cosn.impl | alluxio.hadoop.ShimFileSystem | Fixed value: the cache implementation class. Set this value to enable the cache feature. Once the cache feature is enabled, configuring any other value will prevent the engine from accessing COS data, so please follow the instructions carefully. To disable the cache after enabling it, delete this configuration item. |
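When added to the engine configuration, the item takes the usual key-value form. This is a minimal sketch, assuming the console accepts standard Spark `key = value` parameter entries:

```
# Enable the local cache (fixed value; delete this line to disable the cache)
spark.hadoop.fs.cosn.impl = alluxio.hadoop.ShimFileSystem
```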
Cache Usage Instructions
1. Engine Type Description
SparkSQL Engine: When the engine restarts, the cached data becomes invalid because it is a local cache.
SparkBatch Engine: The SparkBatch engine runs tasks at the session level. Once the task execution is complete, the cached data becomes invalid.
2. Table Type Description
Currently, only DLC external tables are cached.