Information | Description |
Data source type | Select Hive type. |
Data Source | Select Hive type data source. |
Database name | Enter a custom Hive database name. |
Description | Optional: Customize description content. |
Information | Description | |
Basic Information Configuration | Data source type | Select DLC type. |
| Data Source | Select DLC type data source. |
| Database name | Enter a custom DLC database name. |
| Description | Optional: Customize description content. |
Event Policy Configuration | AddDataFiles | Set the maximum number of files to be added. Exceeding this value will trigger small file merging. |
| AddPositionDeletes | Set the maximum number of Position deletes. Exceeding this value will trigger small file merging. |
| AddEqualityDeletes | Set the maximum number of Equality deletes. Exceeding this value will trigger small file merging. |
| AddDeleteFiles | Set the threshold for the number of delete files. When an expired snapshot's combined AddDataFiles + AddDeleteFiles count exceeds the combined AddDataFiles + AddDeleteFiles threshold, snapshots are deleted from that point. |
Governance Rule Configuration | Small File Combination | Once enabled, a large number of data files smaller than the threshold will be combined into larger files, reducing the number of files and improving query performance. |
| Delete Expired Snapshot | Once enabled, expired historical snapshot information will be automatically cleaned up, reducing the number of metadata/data files, saving storage space, and improving query speed. |
| Delete Orphan Files | Once enabled, invalid data files will be automatically cleaned up periodically, saving storage space. |
| Metadata Merge | Once enabled, metadata manifest files will be automatically merged, reducing the number of manifest files and improving data query efficiency. |
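The event policy rows above describe three thresholds that each trigger small file merging when exceeded. The following is a minimal Python sketch of that check, not the service's actual implementation. The class `EventPolicy`, the function `should_merge_small_files`, and the shape of the `observed` counters are hypothetical names introduced for illustration. The sketch assumes that exceeding any single threshold is enough to trigger a merge, and it leaves out the AddDeleteFiles snapshot rule.

```python
from dataclasses import dataclass


@dataclass
class EventPolicy:
    """Hypothetical container for the three merge-related thresholds."""
    add_data_files: int
    add_position_deletes: int
    add_equality_deletes: int


def should_merge_small_files(policy: EventPolicy, observed: dict) -> bool:
    """Return True when any observed counter exceeds its threshold.

    Assumption: one exceeded threshold is sufficient to trigger merging.
    Missing counters are treated as zero.
    """
    return (
        observed.get("AddDataFiles", 0) > policy.add_data_files
        or observed.get("AddPositionDeletes", 0) > policy.add_position_deletes
        or observed.get("AddEqualityDeletes", 0) > policy.add_equality_deletes
    )


policy = EventPolicy(add_data_files=10, add_position_deletes=5, add_equality_deletes=5)
print(should_merge_small_files(policy, {"AddDataFiles": 11}))  # threshold exceeded
print(should_merge_small_files(policy, {"AddDataFiles": 10}))  # exactly at threshold
```

Note that a counter equal to its threshold does not trigger a merge in this sketch, because the table says "exceeding this value".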
Information | Description |
Table Creation Method | Wizard Mode: Add fields manually in the traditional way; after inserting a field, define its field name, Chinese name, English name, column type, partition status, and description. DDL Mode: Create the data table with a SQL CREATE TABLE statement. Only the CREATE TABLE statement is supported for new tables, and only the ALTER TABLE ADD / REPLACE COLUMNS statement is supported for editing tables.
Note: During table creation, make sure the table name in the DDL statement matches the name entered when creating the new data table. |
Permissions on Table | Project sharing: Assigns data table permissions to the current project. All project members receive data table permissions, including editing, querying, and deleting. Individuals and administrators only: Assigns data table permissions to the creator and the current project's administrators. (Note: Data permissions take effect in approximately 30 seconds.) |
Lifecycle | EMR-Hive tables do not support lifecycle configuration, so this setting currently has no effect. This configuration item will be removed in a future release. |
Storage Class | Four storage formats are supported. TEXTFILE: Plain-text storage in which each line represents one record. PARQUET: A columnar format that stores data column by column on disk; it can be faster than row-based storage in some scenarios and supports column compression. ORC: An optimized columnar format for storing and processing large-scale data; it uses compression and indexing to improve processing speed and query efficiency. CSV: A common text format that uses commas as field delimiters and encloses each field value in quotation marks. |
Field Separator | Separates the fields of the data table so that programs or systems can read and process them. Six field delimiters are supported: \u0001 (Hive default), | (vertical bar), space, ; (semicolon), , (comma), \t (tab) |
Field configuration | A field contains configuration information such as field name, field description, column type, and partition status. Partition Field Description: Not every field can be a partition field; at least one field must be a non-partition field. Partition fields do not support the array, map, or decimal types. |
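The partition field rules and the DDL mode described in the table above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the product's validation logic. The names `Field`, `validate_fields`, and `build_create_table` are hypothetical. The generated statement uses standard Hive `CREATE TABLE` syntax, and the sketch covers only TEXTFILE, PARQUET, and ORC because standard Hive has no `STORED AS CSV` keyword.

```python
from dataclasses import dataclass

# Types the table above lists as unsupported for partition fields.
UNSUPPORTED_PARTITION_TYPES = ("array", "map", "decimal")
# Storage formats handled by this sketch (CSV is intentionally left out).
SUPPORTED_FORMATS = ("TEXTFILE", "PARQUET", "ORC")


@dataclass
class Field:
    """Hypothetical field definition: name, column type, partition flag."""
    name: str
    column_type: str
    is_partition: bool = False


def validate_fields(fields):
    """Return a list of rule violations; an empty list means the fields are valid."""
    errors = []
    if not any(not f.is_partition for f in fields):
        errors.append("at least one field must be a non-partition field")
    for f in fields:
        base_type = f.column_type.strip().lower()
        if f.is_partition and base_type.startswith(UNSUPPORTED_PARTITION_TYPES):
            errors.append(
                f"partition field {f.name!r} has unsupported type {f.column_type!r}"
            )
    return errors


def build_create_table(table, fields, stored_as="TEXTFILE", delimiter=","):
    """Build a Hive-style CREATE TABLE statement after validating the fields."""
    errors = validate_fields(fields)
    if errors:
        raise ValueError("; ".join(errors))
    stored_as = stored_as.upper()
    if stored_as not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported storage format for this sketch: {stored_as}")
    columns = ", ".join(
        f"`{f.name}` {f.column_type}" for f in fields if not f.is_partition
    )
    partitions = ", ".join(
        f"`{f.name}` {f.column_type}" for f in fields if f.is_partition
    )
    ddl = f"CREATE TABLE `{table}` ({columns})"
    if partitions:
        ddl += f" PARTITIONED BY ({partitions})"
    if stored_as == "TEXTFILE":
        # A field delimiter only applies to text storage.
        ddl += f" ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'"
    return ddl + f" STORED AS {stored_as}"


fields = [Field("id", "bigint"), Field("dt", "string", is_partition=True)]
print(build_create_table("orders", fields))
```

When using DDL mode, the table name passed here (`orders` in the example) would need to match the name entered in the creation dialog, as the note in the table requires.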
Information | Description | |
Data Table Format | Select Table Creation Type | You can choose to create an internal table or an external table. |
| Data Table Source | When creating an internal table, specify whether to create an empty table or a table based on COS data. |
| Storage Path | COS-based tables and external tables require the full location path. |
| Data Format | Data formats include: CSV, JSON, PARQUET, ORC, AVRO. |
| Data Table Version | Select the data table version, V1 or V2. |
| upsert | When selecting the data table version V2, you can choose whether to use upsert for writing. |
Basic Attributes | Chinese name | Enter a custom Chinese name for the table. |
| Description | Enter custom description information. |
Field Information | Field name | Define the table field names. |
| Field Type | Supports DLC data table field types. |
| Description | Enter a custom field description. |
| Whether to use partitioning | Define partitioning, including the partition field, conversion strategy, and policy parameters. |
| Event Policy Configuration | AddDataFiles: Set the maximum number of files to be added. Exceeding this value will trigger small file merging. |
| | AddPositionDeletes: Set the maximum number of Position deletes. Exceeding this value will trigger small file merging. |
| | AddEqualityDeletes: Set the maximum number of Equality deletes. Exceeding this value will trigger small file merging. |
| | AddDeleteFiles: Set the threshold for the number of delete files. When an expired snapshot's combined AddDataFiles + AddDeleteFiles count exceeds the combined AddDataFiles + AddDeleteFiles threshold, snapshots are deleted from that point. |
| Governance Rule Configuration | Supports enabling data table governance rules. The table can either inherit the governance rules of the database selected when the table was created or define its own governance rules. The following governance rules are included: Small File Merge: Once enabled, a large number of data files smaller than the threshold will be combined into larger files, reducing the number of files and improving query performance. Delete Expired Snapshots: Once enabled, expired historical snapshot information will be automatically cleaned up, reducing the number of metadata/data files, saving storage space, and improving query speed. Delete Orphaned Files: Once enabled, invalid data files will be automatically cleaned up periodically, saving storage space. Metadata Merge: Once enabled, metadata manifest files will be automatically merged, reducing the number of manifest files and improving data query efficiency. |
Attribute settings | Parameter configuration | Supports custom data table parameters, such as format-version and write.upsert.enabled. |
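The relationship between the data table version, the upsert option, and the two parameters named above can be sketched as follows. This is a hedged illustration only. The function `build_table_properties` is a hypothetical helper, and storing the values as strings is an assumption. The only rule it takes from the table is that upsert writing is offered when version V2 is selected.

```python
def build_table_properties(version, upsert=False, extra=None):
    """Assemble a parameter dictionary for a DLC table (illustrative only)."""
    if version not in (1, 2):
        raise ValueError("data table version must be 1 or 2")
    if upsert and version != 2:
        raise ValueError("upsert writing is only available for version V2")
    props = {"format-version": str(version)}
    if version == 2:
        # Assumption: the flag is written explicitly for V2 tables.
        props["write.upsert.enabled"] = "true" if upsert else "false"
    # User-supplied parameters are applied last and may override the defaults.
    props.update(extra or {})
    return props


print(build_table_properties(2, upsert=True))
```

Rejecting `upsert=True` for a V1 table early keeps an invalid combination from reaching the table definition.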
Information | Description |
Data source type | Hive type data sources are supported. |
Data Source | Select the WeData data source under the corresponding data source type. |
Database | Displays the Hive databases bound to the current project, linked to the selected data source type. Searching by database name is supported. |
Bucket | COS bucket for temporarily storing uploaded files. |
Table name | Defaults to the uploaded file name without its extension; you can customize the name. |
Upload resources | Click to upload, or drag and drop a file. A progress bar is displayed during the upload. Supported upload formats: CSV and TSV. |
Information | Description | |
Basic Attributes | Permissions on Table | Select the permission ownership after creating the current data table, either for in-project sharing or for use by the individual and administrator only. |
| Chinese name | Defaults to the file name without its extension; you can customize it. |
| Description | Enter custom description information for the data table. |
File Attributes | Data preview | After file parsing, only the first 500 rows of data are displayed. Click Re-upload to open the file upload dialog for re-uploading the table file. |
| File Format | Select CSV or TSV from the drop-down list. |
| Column delimiter | Enter a custom delimiter: a single character or a Unicode escape sequence such as \u0001. CSV default: , (comma). TSV default: \t (tab character). |
| Column Quotes | The default is double quotes. Users can switch to single quotes. |
| First line is column name | The default is no. It can be switched to yes. |
| File encoding method | Default is UTF-8. Users can choose UTF-8, GBK, ISO-8859-1. |
Field attributes | Field name | Field names are parsed according to the "First line is column name" setting. If the first line of the file is not the column names, the field names are filled in sequentially as column_1, column_2, column_3, ... column_x. Users can also modify the field names. |
| Field Chinese Name | Enter a custom Chinese name for the field. |
| Field English Name | Enter a custom English name for the field. |
| Column type | Choose the corresponding data type supported by the data source based on the data source type. |
| Description | Enter a custom field description. |
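The file attribute and field name settings above can be approximated with Python's standard `csv` module. The sketch below is an assumption about how such parsing might behave, not the product's parser. `decode_delimiter` and `preview` are hypothetical names. The 500-row preview limit, the comma and tab defaults, the quote character option, and the `column_1`, `column_2`, ... fallback names come from the table.

```python
import csv
import io
from itertools import islice

PREVIEW_ROWS = 500  # the table states that only the first 500 rows are shown


def decode_delimiter(text):
    """Turn user input such as ',' or a \\uXXXX escape into one character."""
    if text.startswith("\\u") and len(text) == 6:
        return chr(int(text[2:], 16))
    if text == "\\t":
        return "\t"
    if len(text) != 1:
        raise ValueError("delimiter must be one character or a \\uXXXX escape")
    return text


def preview(raw, file_format="CSV", delimiter=None, quote='"',
            first_line_is_header=False, encoding="utf-8"):
    """Parse uploaded bytes and return (field_names, preview_rows)."""
    if delimiter is None:
        # Defaults from the table: comma for CSV, tab for TSV.
        delimiter = "," if file_format.upper() == "CSV" else "\\t"
    text = raw.decode(encoding)
    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter=decode_delimiter(delimiter),
        quotechar=quote,
    )
    header = next(reader, None) if first_line_is_header else None
    rows = list(islice(reader, PREVIEW_ROWS))
    if header is None:
        # No header line: fall back to column_1, column_2, ...
        width = max((len(r) for r in rows), default=0)
        header = [f"column_{i}" for i in range(1, width + 1)]
    return header, rows


names, rows = preview(b"1,alice\n2,bob\n")
print(names, rows)
```

The `encoding` argument accepts the three encodings listed in the table as Python codec names: `utf-8`, `gbk`, and `iso-8859-1`.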
Information | Description | |
Basic information | Data Type | The storage and computing engine type to which the data table belongs. |
| Database name | The name of the database to which the data table belongs. |
| Table name | The identifier name of the data table. |
| Owner | The person in charge of the data table. |
| Chinese name | The Chinese name of the data table. |
| Description | User-defined description information. |
Storage Information | Table Size | The physical storage space occupied by the data in the current table. |
| Lifecycle | The lifecycle of the current table is used to control its effective usage time, enhancing overall security and saving storage and computing resources during data governance. |
| Creation Time | Creation date and time of the current table. |