Data quality is one of the core components of data governance. It helps users discover dirty data generated in DataInLong and data development as early as possible, automatically intercept abnormal tasks, block dirty data from propagating downstream, and reduce the cost and resources spent on handling user issues.
Applicable roles: data development engineers, data warehouse table managers.
Note on Fees
The costs associated with running data quality tasks mainly consist of the following three parts:
1. WeData product feature version fees (prerequisite).
2. WeData execution resource costs: Charges are based on the scheduling resources consumed by the quality task instances.
3. Non-WeData direct expenses: Quality task verification relies on engines and data source services (such as EMR, DLC, TCHouse-D, and TCHouse-P) to execute, which incurs engine fees. These fees are charged by the engine side and are not included in the WeData billing items. For the specific charges of each engine, refer to the billing instructions in that engine's product documentation on the Tencent Cloud official website.
Core Capabilities
The Quality Module mainly includes the following core features:
1. Supports multiple Tencent Cloud big data storage engines (EMR, DLC, TCHouse-P, TCHouse-D) as well as open-source big data storage engines (Doris).
2. Supports configuring data quality inspection rules at the table level and the field level.
3. Supports configuring the execution strategy based on actual business scenarios.
4. Supports setting rule strength to decide whether downstream tasks are blocked.
5. Supports multiple user contact methods (WeChat Work Group, WeChat, Telephone, SMS, Email, FeiShu Group, DingTalk Group).
6. Supports computing quality scores across six dimensions (accuracy, timeliness, completeness, uniqueness, consistency, and validity) and compiling them into quality reports at the database and table levels.
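The effect of rule strength (item 4 above) can be illustrated with a minimal sketch. It assumes two strength levels, labeled here `STRONG` and `WEAK`, where only a failed strong rule blocks downstream tasks. The class names, the labels, and the function `should_block_downstream` are hypothetical and are not part of any WeData API.

```python
from dataclasses import dataclass
from enum import Enum


class Strength(Enum):
    STRONG = "strong"  # assumed label: a failure should block downstream tasks
    WEAK = "weak"      # assumed label: a failure only raises an alert


@dataclass
class RuleResult:
    rule_name: str
    strength: Strength
    passed: bool


def should_block_downstream(results: list[RuleResult]) -> bool:
    """Return True if at least one failed rule is marked as strong."""
    return any(r.strength is Strength.STRONG and not r.passed for r in results)


results = [
    RuleResult("row_count_not_zero", Strength.STRONG, passed=True),
    RuleResult("email_null_rate_below_5pct", Strength.WEAK, passed=False),
]
print(should_block_downstream(results))  # False: only a weak rule failed
```

Under this assumption, a weak rule still produces an alert through the configured contact methods, while the downstream task continues to run.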
Module Features
The features of the Data Quality module are as follows:
Feature | Description |
Quality Overview | Quality result overview: view inspection and rule run status; view alert status and the table alert ranking. |
Rule Template | Unified management of rule templates for easy reuse. System built-in templates (56+): view only. Custom rule templates: add, delete, modify, and search. |
Data Monitoring | Creating detection rules: supports multiple big data engines (EMR, DLC, TCHouse-P, TCHouse-D, Doris); supports several creation methods (single-table addition, multi-table addition, bulk upload). Viewing detection rules: supports multiple views (all, by table, by rule); supports viewing and managing the rule list of a particular table. |
Operations Management | Execution instances and results: view the execution results of quality tasks and the run history of each rule; export execution results and view the export history. Quality tasks: view created quality inspection tasks; configure alert information for quality tasks. Alert information: view historical alerts. |
Quality Report | Compile historical run results into quality scores along multiple dimensions (database table, rule dimension); view quality scores from multiple perspectives (comprehensive quality index, dimension quality score, quality score details). |
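How WeData actually computes quality scores is not stated on this page. As a rough illustration only, the sketch below assumes that each dimension's score is the pass rate of the rule runs tagged with that dimension, and that the overall score is the unweighted mean of the dimension scores. The function names and the scoring formula are assumptions, not the product's method.

```python
from collections import defaultdict

# The six dimensions named in this document.
DIMENSIONS = ("accuracy", "timeliness", "completeness",
              "uniqueness", "consistency", "validity")


def dimension_scores(runs):
    """runs: iterable of (dimension, passed) pairs.

    Returns the pass rate (0-100) for every dimension that has at least one run.
    """
    totals = defaultdict(int)
    passes = defaultdict(int)
    for dimension, passed in runs:
        if dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {dimension}")
        totals[dimension] += 1
        passes[dimension] += int(passed)
    return {d: 100.0 * passes[d] / totals[d] for d in totals}


def overall_score(scores):
    """Unweighted mean of the per-dimension scores (assumed formula); None if empty."""
    return sum(scores.values()) / len(scores) if scores else None


runs = [
    ("completeness", True),
    ("completeness", False),
    ("uniqueness", True),
    ("accuracy", True),
]
scores = dimension_scores(runs)
print(scores)
print(overall_score(scores))
```

A real report would likely weight rules or dimensions differently; the point here is only that historical rule results roll up first per dimension and then into a single index.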
Notes
After you configure table-level and field-level data quality rules for EMR, DLC, TCHouse-P, and TCHouse-D, the scheduling node that produces the data must use a scheduling resource group that is connected to the network. The execution machine must also be stable and updated to the latest version. Otherwise, data quality rule verification may not be triggered normally.
Each table can have multiple table-level and field-level data quality rules, which are validated simultaneously.
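To make the distinction between table-level and field-level rules concrete, the following sketch evaluates several rules against one small in-memory table. The sample rows, the rule names, and the helper functions are all hypothetical; in WeData the checks run on the configured engine rather than in local Python.

```python
# Hypothetical sample table: a list of rows keyed by column name.
rows = [
    {"order_id": 1, "amount": 10.0},
    {"order_id": 2, "amount": None},
    {"order_id": 2, "amount": 7.5},
]


def row_count_at_least(minimum):
    """Table-level rule: the table has at least `minimum` rows."""
    return lambda table: len(table) >= minimum


def no_nulls(field):
    """Field-level rule: the field is never None."""
    return lambda table: all(r[field] is not None for r in table)


def unique(field):
    """Field-level rule: the field has no duplicate values."""
    def check(table):
        values = [r[field] for r in table]
        return len(values) == len(set(values))
    return check


# Several table-level and field-level rules attached to the same table.
rules = {
    "table: row count >= 1": row_count_at_least(1),
    "field order_id: unique": unique("order_id"),
    "field amount: not null": no_nulls("amount"),
}

# Evaluate every rule against the same data.
report = {name: check(rows) for name, check in rules.items()}
for name, passed in report.items():
    print(f"{name}: {'PASS' if passed else 'FAIL'}")
```

With the sample rows, the row count rule passes while the uniqueness and not-null rules fail, because `order_id` 2 appears twice and one `amount` is missing.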