Spark parameters configure and optimize the behavior of Apache Spark applications.
In a self-built Spark environment, these parameters can be set through command-line options, configuration files, or programmatically.
In the DLC Standard Engine, you can set Spark parameters directly on the engine; they take effect when users submit Spark jobs or run interactive SQL with a custom configuration.
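For reference, a minimal sketch of the three self-built approaches mentioned above, using PySpark; the parameter is a standard Spark setting, and the value is an assumption for illustration:

```python
from pyspark.sql import SparkSession

# Programmatic configuration; the parameter and value are illustrative.
spark = (
    SparkSession.builder
    .appName("config-example")
    .config("spark.sql.shuffle.partitions", "200")
    .getOrCreate()
)

# The same parameter could instead be set on the command line:
#   spark-submit --conf spark.sql.shuffle.partitions=200 app.py
# or in conf/spark-defaults.conf:
#   spark.sql.shuffle.partitions  200
```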
Setting Standard Spark Engine Parameters
1. Navigate to the Standard Engine module and click Parameter Configuration; the engine parameter side drawer appears.
2. The Standard Spark Engine parameter configuration is divided into two sections: Job Default Resource Specification and Parameter Configuration. Job Default Resource Specification defaults to the total resources available to the Standard Spark Engine and can be modified.
The Parameter Configuration section is initially empty; add parameters as needed.
See the official Apache Spark documentation for details on available parameters.
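For illustration only, a few parameters one might add in the Parameter Configuration section, expressed here as a Python mapping; in the console they are entered as key/value pairs. The keys are standard Spark parameters; the values are assumptions, not DLC recommendations:

```python
# Illustrative engine-level parameters; entered in the console as
# key/value pairs. Values are assumptions for illustration.
engine_parameters = {
    "spark.sql.adaptive.enabled": "true",   # adaptive query execution
    "spark.sql.shuffle.partitions": "200",  # shuffle parallelism
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
}

for key, value in engine_parameters.items():
    print(f"{key} = {value}")
```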
Using Standard Spark Engine Parameter Configuration
The scenarios in which the Standard Spark Engine parameter configuration takes effect are as follows:
| Scenario | Takes Effect |
| --- | --- |
| Data Jobs | Yes |
| Data Exploration - Custom Configuration | Yes |
| Data Exploration - Resource Group | No |
| JDBC Submission | No |
1. Using Parameter Configuration in Data Jobs:
In the Data Job configuration, the defaults inherit the parameters and resource configuration from the Standard Spark Engine.
You can override the engine-level parameters with job parameters (--conf) and choose whether to inherit the engine's resource configuration; if you keep the default, the engine's resource configuration is used (see the sketch below).
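A minimal sketch of how the override behaves, assuming an engine-level default of spark.sql.shuffle.partitions=200 and a job submitted with --conf spark.sql.shuffle.partitions=400 (both values are illustrative):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Job parameters override engine-level defaults, so under the assumptions
# above this prints 400.
print(spark.conf.get("spark.sql.shuffle.partitions"))
```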
2. Using Parameter Configuration in Data Exploration:
When you run interactive SQL on the Standard Spark Engine in Data Exploration with a custom configuration, the defaults inherit the parameters and resource configuration from the Standard Spark Engine. You can override engine-level parameters with a SET command in the SQL (as in the sketch below) and choose whether to modify the default resource configuration. After you modify the resource configuration, the change is cached, and the configuration on this page is automatically saved as your modified resource configuration.
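A minimal sketch of a session-level override; in the Data Exploration editor you would type the SET statement directly before your query (the parameter and value are illustrative):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Equivalent of typing "SET spark.sql.shuffle.partitions=400;" in the editor.
spark.sql("SET spark.sql.shuffle.partitions=400")
spark.sql("SELECT 1 AS probe").show()  # runs with the session-level override
```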
Note:
1. The parameter configuration of the Standard Spark Engine applies only to Spark data jobs and Data Exploration with a custom configuration; it does not take effect at the resource group level. Resource group-level parameters can be adjusted in the resource group settings.
2. CPU- and memory-related parameters do not take effect; the amount of resources a task uses is determined solely by the configured number of CUs.