| Parameter name | Parameter description |
| --- | --- |
| Data access policy | Required. The security policy used to access COS data during task execution. For details, refer to DLC Configuration Data Access Policy. |
| Entry parameters | Optional. Entry parameters passed to the program. Multiple parameters are supported and should be separated by spaces. |
| Dependent resources | Optional. Supports selecting --py-files, --files, and --archives. Multiple COS paths can be entered for each resource type, separated by commas (,). |
| Conf parameters | Optional. Parameters starting with spark., in the format k=v. Separate multiple parameters with line breaks. Example: spark.network.timeout=120s. |
| Task image | The image used for task execution. If the task requires a specific image, you can choose between the DLC built-in image and a custom image. |
| Resource configuration | Cluster resource configuration: use the default resource configuration parameters of the cluster. Custom: task-specific resource parameters, including executor size, driver size, and number of executors. |
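For illustration, the newline-separated k=v conf entries described above can be validated and collected into a dictionary before being applied to a Spark session. This is a minimal sketch, not part of the DLC API; the helper name is hypothetical:

```python
def parse_conf_lines(text):
    """Parse newline-separated 'k=v' Spark conf entries into a dict.

    Only keys starting with 'spark.' are accepted, matching the
    Conf parameters rule in the table above.
    """
    conf = {}
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, sep, value = line.partition("=")
        if not sep or not key.startswith("spark."):
            raise ValueError(f"invalid conf entry: {line!r}")
        conf[key] = value
    return conf

# Example: the conf block from the table above, plus a second entry
print(parse_conf_lines("spark.network.timeout=120s\nspark.executor.memory=4g"))
# → {'spark.network.timeout': '120s', 'spark.executor.memory': '4g'}
```

Each resulting key/value pair could then be passed to `SparkSession.builder.config(key, value)` when building the session.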
```python
from pyspark.sql import SparkSession

if __name__ == "__main__":
    spark = SparkSession \
        .builder \
        .appName("Operate DB Example") \
        .getOrCreate()

    # 1. Create database
    spark.sql("CREATE DATABASE IF NOT EXISTS `DataLakeCatalog`.`dlc_db_test_py` COMMENT 'demo test'")

    # 2. Create internal table
    spark.sql("CREATE TABLE IF NOT EXISTS `DataLakeCatalog`.`dlc_db_test_py`.`test` (`id` int, `name` string, `age` int)")

    # 3. Write internal data
    spark.sql("INSERT INTO `DataLakeCatalog`.`dlc_db_test_py`.`test` VALUES (1,'Andy',12),(2,'Justin',3)")

    # 4. Query internal data
    spark.sql("SELECT * FROM `DataLakeCatalog`.`dlc_db_test_py`.`test`").show()

    # 5. Create external table
    spark.sql("CREATE EXTERNAL TABLE IF NOT EXISTS `DataLakeCatalog`.`dlc_db_test_py`.`ext_test` (`id` int, `name` string, `age` int) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' STORED AS TEXTFILE LOCATION 'cosn://cos-bucket-name/ext_test'")

    # 6. Write external data
    spark.sql("INSERT INTO `DataLakeCatalog`.`dlc_db_test_py`.`ext_test` VALUES (1,'Andy',12),(2,'Justin',3)")

    # 7. Query external data
    spark.sql("SELECT * FROM `DataLakeCatalog`.`dlc_db_test_py`.`ext_test`").show()

    spark.stop()
```
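The SQL above repeats the same three-part identifier (`catalog`.`database`.`table`) in every statement. For illustration only, building those backtick-quoted names could be factored into a small helper; the function name is hypothetical and not part of any DLC or Spark API:

```python
def qualified_name(catalog, database, table=None):
    """Build a backtick-quoted identifier like the ones used above,
    e.g. `DataLakeCatalog`.`dlc_db_test_py`.`test`.

    Omitting the table returns a two-part database identifier.
    """
    parts = [catalog, database] + ([table] if table is not None else [])
    return ".".join(f"`{p}`" for p in parts)

# Example: the internal table used in the script above
print(qualified_name("DataLakeCatalog", "dlc_db_test_py", "test"))
# → `DataLakeCatalog`.`dlc_db_test_py`.`test`
```

A statement such as step 4 could then be written as `spark.sql(f"SELECT * FROM {qualified_name('DataLakeCatalog', 'dlc_db_test_py', 'test')}")`.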