Type | Note |
Supported Data Source Type | Currently, WeData supports the following data source types: EMR, EMR-Hive, DLC, TCHouse-P, TCHouse-D, Doris |
Ways to Add Monitoring Rules | Currently, WeData supports the following three methods: Add Rules for a Single Table: Create monitoring rules for one table. Only one table can be selected at a time, but multiple rules can be added at once. Add Rules to Multiple Tables: Batch-create a monitoring rule for multiple fields of multiple tables in the same data source. Multiple tables and multiple fields can be selected at a time, but only one monitoring rule. Batch Upload Rules: Upload an Excel template for bulk import. Only one data source type can be selected at a time. Only custom SQL is supported (built-in templates and custom templates are not). Up to 100 records can be uploaded at a time. |
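The Batch Upload Rules limits above can be checked before a file is uploaded. The following is a minimal Python sketch and not part of WeData: the `RuleRecord` fields, the `validate_batch` function, and the `"Custom SQL"` category string are hypothetical names chosen for illustration, and the real Excel template columns may differ.

```python
from dataclasses import dataclass
from typing import List

# Limit stated in the table above: up to 100 records per upload.
MAX_RECORDS_PER_UPLOAD = 100


@dataclass
class RuleRecord:
    """One row of the upload file (hypothetical field names)."""
    data_source_type: str
    rule_category: str
    sql: str


def validate_batch(records: List[RuleRecord]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the batch passes."""
    errors = []
    if len(records) > MAX_RECORDS_PER_UPLOAD:
        errors.append(
            f"too many records: {len(records)} > {MAX_RECORDS_PER_UPLOAD}"
        )
    # Only one data source type may be used per upload.
    source_types = {r.data_source_type for r in records}
    if len(source_types) > 1:
        errors.append(f"mixed data source types: {sorted(source_types)}")
    # Batch upload accepts custom SQL rules only.
    for index, record in enumerate(records, start=1):
        if record.rule_category != "Custom SQL":
            errors.append(f"record {index}: only Custom SQL rules can be uploaded")
    return errors
```

Running `validate_batch` on the parsed rows before uploading surfaces all three kinds of problem at once instead of one per failed upload attempt.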
Element | Note |
Rule Category | Choose a System Template, a Custom Template, or Custom SQL: System Template: WeData has 56 built-in rule templates that can be tried for free. Detailed descriptions of each template can be found in System Template Description. Custom Template: You can add rules specific to your business in the rule template menu for easy reuse. Detailed operation instructions can be found in Custom Template Description. Select Template: Choose a custom template that has already been added. Database Table Parameters: The page renders these parameters from the SQL statement in the custom template so that you can select values for them. table_1 represents the currently selected table; table_2...table_n represent other tables that need to be specified (currently only one is supported). ${table_1.column_1}...${table_1.column_n} represent fields within the table; select the specific fields. where Parameter: The page renders these parameters from the SQL statement in the custom template so that you can fill them in. ${param_1}...${param_n} represent the parameters in the where condition; enter the specific values. Custom SQL: You can enter an SQL statement directly as a detection rule. Monitored object: Only tables are supported. Custom Dimension: Select from the six dimensions. Applicable engine: The available engines depend on the data source; for example, Hive tables support Hive and Spark. SQL Statements: Enter an SQL statement that meets the following requirements: The result must be one row and one column, i.e., a single value. Only partition variables are allowed, such as ${yyyy-MM-dd}. Table name and column name variables are not allowed. |
Monitoring Object | For example, in a system template, the monitored object is either Table-level or Field-level: Table-level: Monitors the number of table rows and the table size (table size is supported only for Hive tables). Field-level: Monitors whether a field is empty or duplicated, as well as its average value, maximum value, minimum value, etc. |
Select Template | The available templates are filtered by rule category and monitored object. For example, if you select System Template and choose Table-level as the monitored object, you can only select table-level templates such as Number of table rows and Table size. |
Detection Range | You can choose between Conditional Scan and Whole Table. Conditional Scan is recommended. Enter a partition where condition, for example: pt_date='${yyyy-MM-dd-1d}'
Note: Partition fields are typically specified here so that each quality task does not scan the full table and waste computing resources. ${yyyy-MM-dd-1d} is a date variable representing the day before the execution date. It is replaced with the actual date when the quality task runs. For example, when the quality task is executed at 2024-05-02 00:00:00, ${yyyy-MM-dd-1d} is replaced with 2024-05-01. |
Trigger Condition | Comparators support both range comparisons and size comparisons. Example: Number of table rows less than 1. Combined with the time variable entered in the detection range, this means: trigger an alert when no new data arrived yesterday. Comparator: select Less than. Comparison Value: enter 1. Different templates have different trigger conditions. For detailed configuration logic, refer to System Template Usage Instructions. Note: The trigger condition entered here describes the abnormal value, that is, the condition under which an alert is triggered. |
Trigger Level | In this example, select Medium. Trigger levels are High, Medium, and Low. High: When an alert is triggered, downstream task execution is blocked immediately (only effective when linked with a production task). Medium: Only triggers an alert. Low: Does not trigger an alert; the result is only displayed as abnormal. |
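The date variable replacement and the trigger logic described in the table above can be illustrated with a short Python sketch. It is an approximation under stated assumptions, not WeData's implementation: it handles only the `${yyyy-MM-dd}` and `${yyyy-MM-dd-Nd}` forms, and the function names, comparator strings, and result keys are hypothetical.

```python
import operator
import re
from datetime import datetime, timedelta

# Matches ${yyyy-MM-dd} and ${yyyy-MM-dd-Nd}; other variable forms are not handled here.
_DATE_VAR = re.compile(r"\$\{yyyy-MM-dd(?:-(\d+)d)?\}")

# Hypothetical comparator names mapped to Python operators.
_COMPARATORS = {
    "less than": operator.lt,
    "greater than": operator.gt,
    "equal to": operator.eq,
}


def resolve_date_variables(condition: str, execution_time: datetime) -> str:
    """Replace date variables with dates relative to the execution time."""
    def _substitute(match):
        days_back = int(match.group(1) or 0)
        return (execution_time - timedelta(days=days_back)).strftime("%Y-%m-%d")
    return _DATE_VAR.sub(_substitute, condition)


def evaluate_trigger(observed, comparator, comparison_value, trigger_level,
                     linked_to_production_task=False):
    """Apply the trigger condition, then map the trigger level to its effects."""
    abnormal = _COMPARATORS[comparator](observed, comparison_value)
    if not abnormal:
        return {"abnormal": False, "alert": False, "block_downstream": False}
    return {
        "abnormal": True,
        # High and Medium send an alert; Low only marks the result abnormal.
        "alert": trigger_level in ("High", "Medium"),
        # Blocking applies only to High, and only when linked with a production task.
        "block_downstream": trigger_level == "High" and linked_to_production_task,
    }


where = resolve_date_variables("pt_date='${yyyy-MM-dd-1d}'", datetime(2024, 5, 2))
result = evaluate_trigger(0, "less than", 1, "Medium")
```

With the values from the example, `where` is the condition for the 2024-05-01 partition, and a row count of 0 against "less than 1" at level Medium yields an alert without blocking downstream tasks.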
Element | Note |
Execution Method | You can select Associate Production Scheduling or Offline Periodic Detection. Associate Production Scheduling: That is, associated scheduling. Quality tasks are linked with production tasks (data synchronization tasks or data development tasks), and the quality rule task runs after the production task completes. If an anomaly is found, the handler is notified immediately to address the issue, and depending on the trigger level, downstream tasks may be blocked to keep the problem data from spreading. Select task: You can associate data synchronization tasks and data development tasks. Note: The same quality inspection task can be associated with multiple production tasks; likewise, the same production task can be associated with multiple quality inspection tasks. Offline Periodic Detection: That is, independent scheduling. For the selected database tables, periodic quality inspections of the core business fields run at a self-defined frequency such as daily, hourly, or by minute. Quality tasks are executed at the set period, and if anomalies are detected, subscribers are notified immediately. Scheduling Period: Monthly, weekly, daily, hourly, or by minute. Effective Date: Select the effective time range. Interval: When selecting daily or hourly, you can choose the task interval time. Specify Date: When selecting monthly or weekly, you need to set the specific date, such as the day of the week or the date of the month. Execution Time: When selecting monthly, weekly, or daily, you need to set the specific run time. |
Execution Engine | The available engines depend on the data source. EMR-Hive: You can choose Hive or Spark. For Hive tables, you can generally just select the Hive engine. DLC: You need to select the DLC data engine from the dropdown (includes Standard Engine and SuperSQL Engine). TCHouse-P: Only TCHouse-P can be selected. TCHouse-D: Only TCHouse-D can be selected. Doris: Only Doris can be selected. |
Computing Resources | The available computing resources depend on the data source. When the compute engine is EMR: You can select a resource group from the EMR cluster here; usually, you can select the default. When the compute engine is DLC: You can select a resource service from DLC here. TCHouse-P: No selection needed. TCHouse-D: No selection needed. Doris: No selection needed. |
Execute Resources | The execute resources here are the scheduling resource group that the project has already bound. |
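The dependencies in the table above (which schedule settings apply to each scheduling period, and which engines each data source offers) can be summarized as lookup tables. The Python sketch below is illustrative only: the dictionary names, keys such as `specify_date`, and the `schedule_fields` function are hypothetical, and DLC is omitted from the engine map because its engines are chosen from a dropdown.

```python
# Settings that apply to each scheduling period under Offline Periodic Detection.
# Key and field names are hypothetical; the mapping follows the table above.
SCHEDULE_FIELDS = {
    "monthly": ["specify_date", "execution_time"],
    "weekly": ["specify_date", "execution_time"],
    "daily": ["interval", "execution_time"],
    "hourly": ["interval"],
    "minute": [],
}

# Engines selectable per data source, as listed in the Execution Engine row.
ENGINE_OPTIONS = {
    "EMR-Hive": ["Hive", "Spark"],
    "TCHouse-P": ["TCHouse-P"],
    "TCHouse-D": ["TCHouse-D"],
    "Doris": ["Doris"],
}


def schedule_fields(period: str) -> list:
    """Return the settings to fill in for a period; Effective Date always applies."""
    return ["effective_date"] + SCHEDULE_FIELDS[period]


daily_fields = schedule_fields("daily")
```

Here `daily_fields` lists the effective date, interval, and execution time, matching the rule that a daily schedule takes both an interval and a run time.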