What Happens to Data That Fails Processing Due to Format Issues?
During data processing, data that fails to process because of format issues, or that does not meet the filter rules, is handled according to the Failure Handling policy.
Failure Handling policies include:
1. Discard: the raw data that failed to process is discarded.
2. Retain: the raw data that failed to process is retained, wrapped together with the failure information in a new message body, and delivered to the target Topic.
3. Dead Letter Queue: the raw data that failed to process is retained, wrapped together with the failure information in a new message body, and delivered to the Dead Letter Queue.
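The three policies above can be sketched as a small routing function. This is only an illustration of the described behavior, not the service's implementation: the names `FailurePolicy`, `wrap_failure`, and `route_failure`, and the envelope fields `raw_data` and `failure_reason`, are all hypothetical, since the actual message body layout is not specified here.

```python
import json
from enum import Enum
from typing import Optional, Tuple


class FailurePolicy(Enum):
    DISCARD = "discard"
    RETAIN = "retain"
    DEAD_LETTER_QUEUE = "dead_letter_queue"


def wrap_failure(raw: str, reason: str) -> str:
    """Wrap the raw data and failure information in a new message body.

    The field names are assumptions made for this sketch.
    """
    return json.dumps({"raw_data": raw, "failure_reason": reason})


def route_failure(
    raw: str,
    reason: str,
    policy: FailurePolicy,
    target_topic: str,
    dlq_topic: str,
) -> Optional[Tuple[str, str]]:
    """Return (destination, message body) for a failed record, or None if dropped."""
    if policy is FailurePolicy.DISCARD:
        # Discard: the failed raw data is not delivered anywhere.
        return None
    body = wrap_failure(raw, reason)
    if policy is FailurePolicy.RETAIN:
        # Retain: the wrapped failure goes to the normal target Topic.
        return (target_topic, body)
    # Dead Letter Queue: the wrapped failure goes to a separate queue.
    return (dlq_topic, body)
```

For example, `route_failure("not-json", "invalid JSON", FailurePolicy.RETAIN, "target", "dlq")` would return the destination `"target"` together with a body that carries both the original data and the failure reason, so a downstream consumer can tell failed records apart from successfully processed ones.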
How Do I Identify Performance Bottlenecks in Data Processing?
Data processing consists of three phases:
1. Consuming data from the source Topic;
2. Processing the raw data;
3. Delivering the processed data to the target Topic.
Phase 1 can be monitored by tracking the consumer's message backlog and consumption speed, and phase 3 by monitoring the production speed of the target Topic.
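As a rough illustration of these metrics, the sketch below computes backlog and speed from offset samples. It assumes per-partition offsets are available as plain dictionaries (for example, exported from a monitoring console); the function names `backlog` and `rate` and the sample values are hypothetical.

```python
from typing import Dict


def backlog(end_offsets: Dict[int, int], committed: Dict[int, int]) -> int:
    """Total messages not yet consumed, summed over all partitions.

    end_offsets: latest offset per partition of the source Topic.
    committed:   offset the consumer has committed per partition.
    """
    return sum(
        max(end_offsets[p] - committed.get(p, 0), 0) for p in end_offsets
    )


def rate(prev_total: int, curr_total: int, interval_s: float) -> float:
    """Messages per second between two samples taken interval_s apart."""
    return (curr_total - prev_total) / interval_s


# Two samples of the consumer's committed offsets, taken 60 seconds apart.
committed_t0 = {0: 90, 1: 200}
committed_t1 = {0: 150, 1: 260}
end_t1 = {0: 160, 1: 300}

consume_speed = rate(
    sum(committed_t0.values()), sum(committed_t1.values()), 60.0
)
current_backlog = backlog(end_t1, committed_t1)
```

The same `rate` helper can be applied to two samples of the target Topic's total end offset to estimate its production speed. A backlog that keeps growing across samples suggests the task is not keeping up with the source Topic.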
In general, increasing the number of partitions in the source Topic raises the maximum concurrency the task can scale to, which improves overall task performance.
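The link between partitions and concurrency can be sketched as follows. This assumes Kafka-style consumption, in which each partition is read by at most one worker of the task at a time; the function name `effective_concurrency` is hypothetical.

```python
def effective_concurrency(partitions: int, workers: int) -> int:
    """Number of workers that can consume in parallel.

    Assumes each partition is read by at most one worker, so workers
    beyond the partition count stay idle.
    """
    if partitions < 0 or workers < 0:
        raise ValueError("partitions and workers must be non-negative")
    return min(partitions, workers)
```

Under this assumption, a task with 8 workers on a 3-partition source Topic uses only 3 of them, while raising the partition count to 8 lets all 8 work in parallel.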