Parameters | Description |
Data Source | Select an available Hive data source. |
Database | Select or manually enter the name of the database to read. By default, the database bound to the data source is used; other databases must be entered manually. If the data source network is not connected and the database information cannot be fetched directly, you can still enter the database name manually. Data synchronization works as long as the Data Integration network is connected. |
Table | Select or manually enter the name of the table to read. |
Read Method | Only the JDBC read method is supported. |
Filter Conditions (Optional) | When reading through Hive JDBC, you can filter data with a WHERE condition. In this scenario, however, the Hive engine may generate MapReduce tasks on the backend, which lowers read efficiency. |
Advanced Settings (Optional) | You can configure parameters according to business needs. |
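To make the role of the filter condition concrete, the following is a minimal Python sketch of how a WHERE clause might be appended to the query issued over Hive JDBC. The function name `build_read_query` and the sample database, table, and column names are hypothetical; the query that Data Integration actually generates is not described on this page.

```python
from typing import Optional, Sequence


def build_read_query(database: str, table: str, columns: Sequence[str],
                     where: Optional[str] = None) -> str:
    """Compose a SELECT statement with an optional WHERE filter (illustrative only)."""
    # Backticks quote identifiers in HiveQL.
    column_list = ", ".join(f"`{c}`" for c in columns)
    query = f"SELECT {column_list} FROM `{database}`.`{table}`"
    # An empty or whitespace-only filter means "read the whole table".
    if where and where.strip():
        query += f" WHERE {where.strip()}"
    return query


# Hypothetical names, used only to show the shape of the resulting statement.
print(build_read_query("sales_db", "orders", ["id", "amount"], "dt = '2024-01-01'"))
```

If the table is partitioned, filtering on a partition column typically lets Hive skip unrelated partitions, which may offset part of the cost mentioned above.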
Parameters | Description |
Data Destination | Specify the Hive data source to write to. |
Database | Select or manually enter the name of the database to write to. By default, the database bound to the data source is used; other databases must be entered manually. If the data source network is not connected and the database information cannot be fetched directly, you can still enter the database name manually. Data synchronization works as long as the Data Integration network is connected. |
Table | Select or manually enter the name of the table to write to. If the data source network is not connected and the table information cannot be fetched directly, you can still enter the table name manually. Data synchronization works as long as the Data Integration network is connected. |
Write Mode | Hive writing supports three modes. Append: retain the original data and append new rows. nonConflict: report an error when a data conflict occurs. Overwrite: delete the existing data and rewrite it. writeMode is a high-risk parameter: check the data output directory and the write mode carefully to avoid accidental data deletion. The data loading behavior depends on hiveConfig as well, so check that configuration too. |
Batch Submission Size | The number of records submitted in one batch. A larger value greatly reduces the number of network interactions between the data synchronization system and Hive and improves overall throughput. If the value is set too large, the data synchronization process may run out of memory (OOM). |
Empty Character String Processing | No action taken: Do not process empty strings when writing. Processed as null: Process empty strings as null when writing. |
Pre-Executed SQL (Optional) | The SQL statement executed before the synchronization task. Fill in the correct SQL syntax according to the data source type, such as clearing the old data in the table before execution (truncate table tablename). |
Post-Executed SQL (Optional) | The SQL statement executed after the synchronization task. Fill in the correct SQL syntax according to the data source type, such as adding a timestamp column (alter table tablename add columns (colname TIMESTAMP)). |
Advanced Settings (Optional) | You can configure parameters according to business needs. |
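The batch submission size and the empty-string option can be pictured with a short Python sketch. This is an illustration under assumptions, not the product's implementation: the helper names `empty_to_null` and `batches` and the sample rows are hypothetical.

```python
from typing import Iterable, Iterator, List, Optional, Sequence

Row = Sequence[Optional[str]]


def empty_to_null(row: Row) -> List[Optional[str]]:
    """Replace empty strings with None, mirroring the 'Processed as null' option."""
    return [None if value == "" else value for value in row]


def batches(rows: Iterable[Row], batch_size: int) -> Iterator[List[Row]]:
    """Yield lists of at most batch_size rows; each list stands for one submission."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch: List[Row] = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    # Submit whatever is left over as a final, smaller batch.
    if batch:
        yield batch


# Hypothetical sample rows; the second row contains an empty string.
rows = [("1", "a"), ("2", ""), ("3", "c")]
for batch in batches((empty_to_null(r) for r in rows), batch_size=2):
    print(batch)
```

Only one batch is held in memory at a time, which is why a very large batch size raises memory pressure while a very small one increases the number of round trips.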
Hive Data Types | Internal Types |
TINYINT, SMALLINT, INT, BIGINT | Long |
FLOAT, DOUBLE | Double |
STRING, CHAR, VARCHAR, STRUCT, MAP, ARRAY, UNION, BINARY | String |
BOOLEAN | Boolean |
DATE, TIMESTAMP | Date |
Internal Types | Hive Data Types |
Long | TINYINT, SMALLINT, INT, BIGINT |
Double | FLOAT, DOUBLE |
String | STRING, CHAR, VARCHAR, STRUCT, MAP, ARRAY, UNION, BINARY |
Boolean | BOOLEAN |
Date | DATE, TIMESTAMP |
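The type conversion tables can be expressed as a simple lookup. The Python sketch below is a hedged illustration: the names `HIVE_TO_INTERNAL` and `internal_type` are hypothetical, and the handling of parameterized types such as `varchar(20)` or `array<int>` is an assumption about how a column type string would be normalized before the lookup.

```python
import re

# Mapping taken from the conversion tables; unlisted types are treated as unsupported.
HIVE_TO_INTERNAL = {
    "TINYINT": "Long", "SMALLINT": "Long", "INT": "Long", "BIGINT": "Long",
    "FLOAT": "Double", "DOUBLE": "Double",
    "STRING": "String", "CHAR": "String", "VARCHAR": "String",
    "STRUCT": "String", "MAP": "String", "ARRAY": "String",
    "UNION": "String", "UNIONTYPE": "String",  # Hive DDL spells the union type UNIONTYPE
    "BINARY": "String",
    "BOOLEAN": "Boolean",
    "DATE": "Date", "TIMESTAMP": "Date",
}


def internal_type(hive_type: str) -> str:
    """Return the internal type for a Hive column type string (illustrative only)."""
    # Strip length or element parameters: 'varchar(20)' -> 'VARCHAR', 'array<int>' -> 'ARRAY'.
    base = re.split(r"[(<]", hive_type.strip(), maxsplit=1)[0].strip().upper()
    if base not in HIVE_TO_INTERNAL:
        raise ValueError(f"Hive type not listed in the conversion table: {hive_type}")
    return HIVE_TO_INTERNAL[base]


for hive_type in ("bigint", "varchar(20)", "array<int>", "timestamp"):
    print(hive_type, "->", internal_type(hive_type))
```

Because complex types such as STRUCT, MAP, and ARRAY all map to String, their nested structure is not preserved as separate fields during synchronization.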
Region | CHDFS permission ID for Data Integration resource group |
Beijing | ag-wgbku4no |
Guangzhou | ag-x1bhppnr |
Shanghai | ag-xnjfr9d3 |
Singapore | ag-phxtv0ah |
USA Silicon Valley | ag-tgwl8bca |
Virginia | ag-esxpwxjn |
alter table table_name add columns (time_interval TIMESTAMP) cascade;