Field Name | Column Type | Partition Field |
pt_date | string | Yes |
id | int | No |
name | string | No |
gender | int | No |
INSERT INTO TABLE emall.dq_test PARTITION (pt_date = '2024-05-01') VALUES ('1', 'Zhang San', '1');
Element | Note |
Rule Type | Select System Template. The available rule types are System Template, Custom Template, and Custom SQL. System Template: WeData provides 56 built-in rule templates that you can try for free; each template is described in the System Template Description. Custom Template: You can add rules specific to your own business in the rule template menu for easy reuse; see the Custom Template Description for instructions. Custom SQL: You can write SQL statements directly as detection rules; see Adding Quality Rules for instructions. |
Monitoring Object | Select Table. Monitoring objects are either table-level or field-level. Table-level rules can monitor the number of table rows and the table size (supported only for Hive tables). Field-level rules can monitor whether a field is empty or duplicated, as well as its average, maximum, and minimum values. |
Select Template | Select Number of table rows. WeData provides 56 built-in rule templates that you can try for free. Each template is described in the System Template Description. |
Detection Range | Select Conditional Scan and enter a WHERE condition that filters on the partition field, using the date variable ${yyyy-MM-dd-1d} as the partition value. Note: Partition fields are typically specified here so that a quality task does not scan the full table on every run and waste computing resources. ${yyyy-MM-dd-1d} is a date variable representing the day before the execution date; it is replaced with the actual date when the quality task runs. For example, when the quality task runs at 2024-05-02 00:00:00, ${yyyy-MM-dd-1d} is replaced with 2024-05-01. The replacement logic for time variables is described in the Time Parameter Description. |
Trigger Condition | Comparator: select Less than. Comparison Value: enter 1. Combined with the time variable entered for the detection range, a row count of less than 1 means that an alert is triggered when no new data was added yesterday. Note: The trigger condition entered here is the abnormal value, that is, the condition that triggers the alert. |
Trigger Level | Select Medium. The trigger levels are High, Medium, and Low. High: When an alert is triggered, downstream task execution is blocked immediately (effective only when the rule is linked with a production task). Medium: Only triggers an alert. Low: Does not trigger an alert; the result is only displayed as abnormal. |
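The detection range, trigger condition, and trigger level described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not WeData's implementation: the helper names `resolve_day_offset` and `evaluate_rule`, the regular expression, and the example WHERE condition `pt_date = '${yyyy-MM-dd-1d}'` are hypothetical, and only the `${yyyy-MM-dd-Nd}` day-offset form is handled. The full set of time variables is covered in the Time Parameter Description.

```python
import re
from datetime import datetime, timedelta

# Hypothetical pattern: matches only ${yyyy-MM-dd-Nd}, meaning "N days before the run date".
_DAY_OFFSET = re.compile(r"\$\{yyyy-MM-dd-(\d+)d\}")


def resolve_day_offset(text: str, run_time: datetime) -> str:
    """Replace each ${yyyy-MM-dd-Nd} with the date N days before run_time."""
    def _sub(match):
        days = int(match.group(1))
        return (run_time - timedelta(days=days)).strftime("%Y-%m-%d")
    return _DAY_OFFSET.sub(_sub, text)


def evaluate_rule(row_count: int, threshold: int = 1, level: str = "Medium") -> dict:
    """Apply the 'Less than' trigger condition and map the trigger level to its effect."""
    abnormal = row_count < threshold  # the trigger condition is the abnormal value
    return {
        "abnormal": abnormal,
        # High and Medium raise an alert; Low only marks the result as abnormal.
        "alert": abnormal and level in ("High", "Medium"),
        # Blocking applies only to High, and only when linked with a production task.
        "block_downstream": abnormal and level == "High",
    }


# Assumed WHERE condition on the partition field of emall.dq_test.
where = "pt_date = '${yyyy-MM-dd-1d}'"
print(resolve_day_offset(where, datetime(2024, 5, 2, 0, 0, 0)))  # pt_date = '2024-05-01'
print(evaluate_rule(row_count=0))  # no rows for yesterday's partition: abnormal, alert only
```

With the Medium level chosen in this walkthrough, an empty partition for yesterday produces an alert but does not block downstream tasks.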
Element | Note |
Execution Method | Select Associate Production Scheduling. The available methods are Associate Production Scheduling and Offline Periodic Detection. Associate Production Scheduling (associated scheduling): Quality tasks are linked with production tasks (data synchronization tasks or data development tasks), and the quality rule tasks run after the production tasks complete. If an anomaly is found, the handler is notified immediately, and depending on the task level, downstream tasks may be blocked to keep the problem data from spreading. Note: One quality inspection task can be associated with multiple production tasks, and one production task can be associated with multiple quality inspection tasks. Offline Periodic Detection (independent scheduling): For the selected database tables, periodic quality inspections of the core business fields run at a user-defined frequency, such as daily, hourly, or by the minute. Quality tasks run on the configured schedule, and if anomalies are detected, subscribers are notified immediately. |
Execution Engine | Select Hive. You can choose Hive or Spark, depending on the EMR resources you have purchased. For Hive tables, you can generally select the Hive engine directly. |
Computing Resources | Select default. You can select a resource group within the EMR cluster; in most cases, default is sufficient. |
Execute Resources | The execution resources are the scheduling resource group already bound to the project. |
Select task | Select the Hive SQL task created during preparation. |
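The ordering implied by associated scheduling (production task first, then the linked quality checks, then downstream tasks unless a High-level anomaly blocks them) can be sketched as below. This is a simplified illustration, not how WeData schedules tasks: the function `run_associated`, the result keys `abnormal` and `level`, and the stand-in tasks are all hypothetical.

```python
from typing import Callable, Dict, List


def run_associated(
    production_task: Callable[[], None],
    quality_checks: Dict[str, Callable[[], dict]],
    downstream_tasks: Dict[str, Callable[[], None]],
) -> List[str]:
    """Run a production task, then its quality checks, then downstream tasks."""
    events: List[str] = []
    production_task()
    events.append("production: done")

    blocked = False
    for name, check in quality_checks.items():
        result = check()  # assumed keys: "abnormal" (bool) and "level" (str)
        if not result["abnormal"]:
            events.append(f"{name}: normal")
            continue
        events.append(f"{name}: abnormal ({result['level']})")
        if result["level"] == "High":
            blocked = True  # only High-level anomalies block downstream tasks

    for name, task in downstream_tasks.items():
        if blocked:
            events.append(f"{name}: blocked")
        else:
            task()
            events.append(f"{name}: done")
    return events


# Usage with stand-in tasks: one High-level check fails, so the report task is blocked.
events = run_associated(
    production_task=lambda: None,
    quality_checks={"row_count": lambda: {"abnormal": True, "level": "High"}},
    downstream_tasks={"report": lambda: None},
)
print(events)
```

Because the checks and downstream tasks are passed in as dictionaries, the same check can be attached to several production runs, which mirrors the many-to-many association noted in the table.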