Table Design and Data Import

How to choose a data model?
For data that is already aggregated, either the Aggregate or Duplicate model can be used.
For detail data that has already been cleaned, if you need to perform aggregation for querying statistical values, choose the Aggregate model.
For detail data that has already been cleaned, if you need to perform detailed queries, choose the Duplicate model (which does not include pre-aggregation, resulting in lower aggregation query efficiency. You can enhance aggregation query performance through materialized views).
If there is a unique Key and you need to deduplicate by Key, choose the Unique Model.
How to set up partitions?
Choose partition fields with values that are as evenly distributed as possible. It is recommended to use fields such as dates or IDs for partitioning.
How to set up replicas?
The number of replicas can be set to 3 (it can also be set to 2, but setting it to 1 is not recommended as it may lead to data unavailability during subsequent rolling upgrades).
How to set up buckets?
Use Keys with a uniform Hash distribution as bucket Keys to avoid data skewing.
The number of buckets should not be too many or too few. It is recommended to keep each bucket between 1 and 10 GB in size. For small tables, a few buckets are usually sufficient.
The bucket Key can be one or multiple keys. Using multiple keys ensures more balanced data distribution, while a single key is easier to match (a single bucket key should generally be a Key with high cardinality).
How to choose field formats?
For each field in the table, prioritize using Integer types instead of string types. This greatly improves query and version merge efficiency.
Use decimal for floating-point numbers instead of double or float.
Must-Knows for Data Import
For real-time imports, it is recommended to use Stream Load, and for offline imports, Broker Load is preferred. The basic principle of import is batch loading: Reduce concurrency and try to import as much data as possible in a single instance to minimize merge costs and avoid impacting read efficiency. (For example, the total number of imports per minute should not exceed 20 times. Considering various types of concurrency, high-frequency imports are currently not suitable). Too many small files can severely affect query efficiency.
When importing data, pay special attention to filtering NULL values for fields used as Hash keys to avoid data skew. Too many small files can severely affect query efficiency.
When data is imported, make sure to specify the partitions to be imported. Otherwise, importing data into large tables may easily fail.
It is recommended to use CSV for data import and avoid JSON format.
Must-Knows for Schema Changes
To ensure cluster stability and meet business requirements, we recommend thoroughly evaluating field types before creating a table. When the table Schema needs to be modified, we only recommend the Add Column operation. We guarantee that new columns will be added as quickly as possible.
Must-Knows for Data Clearing
If you need to clear partition data, it is recommended to prioritize the truncate operation (which is equivalent to dropping and then creating the partition) instead of the delete operation. The delete operation significantly impacts query performance.
Other
For unfamiliar operations, it is best to type help keyword in the command line for guidance (e.g., help stream load to understand how to perform real-time data import). Alternatively, you can Submit Ticket to get more detailed assistance with your issue.
﻿

Was this page helpful?

You can also Contact Sales or Submit a Ticket for help.

Yes

tencent cloud

Sign Up

Log in

Compute

Microservice

Data Migration

Database SaaS Tool

Data Security

Application Security

Big Data

Voice Technology

Internet of Things

Stream Services

Cloud Real-time Rendering

Management and Audit Tools

Edge Computing

Serverless

Relational Database

Networking

Business Security

Domains & Websites

Face Recognition

AI Platform Service

Middleware

Media On-Demand

Game Services

Developer Tools

Container

Essential Storage Service

Enterprise Distributed DBMS

CDN and Acceleration

Security Services

Enterprise Applications

Tencent Big Model

Natural Language Processing

Communication

Media Process Services

Education Sevices

Monitor and Operation

Distributed cloud

Data Process and Analysis

NoSQL Database

Network Security

Cloud Security

Office Collaboration

Image Creation

Optical Character Recognition

Interactive Video Services

Media SDK

Cloud Resource Management

More

How to choose a data model?

How to set up partitions?

How to set up replicas?

How to set up buckets?

How to choose field formats?

Must-Knows for Data Import

Must-Knows for Schema Changes

Must-Knows for Data Clearing

Other