Basic Feature Usage

Tencent Cloud TCHouse-D

Product Introduction

Purchase Guide

Configuration Adjustment Billing Instructions

Getting Started

Using Tencent Cloud TCHouse-D Through the Console

Using Tencent Cloud TCHouse-D Through a Client

Operation Guide

Cluster Operation

Creating and Terminating

Scale-Out and Scale-In

Configuration Adjustment

Monitoring and Alarm Configuration

Cluster Monitoring

FE Monitoring Metric

BE Monitoring Metric

Business Monitoring Metrics

Alarm Configuration

Account Privilege Management

Privilege Management Through a Client

Data Management

Data Backup

Data Recovery

Cross-Cluster DataMigration

Database Auditing

Cold-Hot Data Tiering

Query Management

Query Analysis

Slow Query Analysis

Modify Configurations

Modify Configurations Through Console

Modify Configurations Through Client

Node Management

Log Analysis

SQL Studio

Enabling Resource Isolation

Development Guide

Design of Data Table

Data Table and Data Model

Data Partition and Bucket

Data Distribution and Replica

Index, Sorted Column and Prefix Index

Importing Data

Import Overview

Import Methods

Stream Load (local file)

Broker Load (HDFS Data)

S3 Load ( Cloud Object Storage, COS)

Spark Load

Routine Load (Kafka Data)

Flink Connector (Real-time or Batch Data with Flink)

INSERT INTO

Synchronizing Data with JDBC

Synchronizing Data Through External Tables

Real-time or Batch Writing of MySQL Data

Importing Data from Kafka

Importing with DataX

Importing Doris from Logstash

Data Transformation, Column Mapping and Filtering

Importing Strict Mode

Importing Data in JSON Format

Exporting Data

Exporting via the EXPORT Statement

Using Mysqldump Tool to Export Table Structure or Data

Basic Feature

Table Structure Change

Two Tables of Atomic Swap (Swap)

Query Optimization

Caching Table or Partition to Memory

Colocation Join

Bucket Shuffle Join

Runtime Filter

Query Profiling and Optimization

Query Cache Configuration

Ecological Expansion Feature

API Documentation

Making API Requests

Cluster Operation APIs

ModifyClusterConfigs

ModifyInstanceKeyValConfigs

ModifyNodeStatus

RestartClusterForConfigs

RestartClusterForNode

DescribeClusterConfigsHistory

ModifyInstance

DescribeCreateTablesDDL

DescribeInstanceOperationHistory

Database and Table APIs

ExecuteParametrizedQuery

Cluster Information Viewing APIs

DescribeClusterConfigs

DescribeAreaRegion

DescribeInstancesHealthState

DescribeSpec

DescribeInstance

DescribeInstanceState

DescribeInstances

DescribeInstanceNodesInfo

DescribeInstanceUsedSubnets

DescribeSqlApis

DescribeInstanceNodes

DescribeInstanceNodesRole

Hot-Cold Data Layering APIs

CheckCoolDownWorkingVariableConfigCorrect

DescribeCoolDownBackends

DescribeCoolDownPolicies

DescribeCoolDownTableData

Database and Operation Audit APIs

DescribeBackUpJobDetail

DescribeBackUpTables

DescribeBackUpTaskDetail

DescribeRestoreTaskDetail

DescribeBackUpSchedules

RecoverBackUpJob

DescribeInstanceOperations

DescribeDatabaseAuditDownload

DescribeDatabaseAuditRecords

DescribeSlowQueryRecords

DescribeSlowQueryRecordsDownload

DescribeQueryAnalyse

User and Permission APIs

ActionAlterUser

ModifyUserPrivilegesV3

DescribeUserPolicy

ModifyDatabaseTableAccess

Resource Group Management APIs

DescribeUserBindWorkloadGroup

CreateWorkloadGroup

DeleteWorkloadGroup

DescribeWorkloadGroup

ModifyWorkloadGroup

ModifyUserBindWorkloadGroup

ModifyWorkloadGroupStatus

Data Types

Error Codes

Cloud Ecosystem

Granting CAM Policies to Sub-accounts

Query Acceleration for Tencent Cloud DLC

Practical Tutorial

Basic Feature Usage

Advanced Features Usage

Resource Specification Selection and Optimization Suggestions

Naming Specifications and Limits to the Database and Data Table

Table Design and Data Import

Query Optimization

Suggested Usage to Avoid

Accessing TCHouse-D via JDBC over the Public Network

Performance Testing

TPC-H Performance Testing

Test Scheme Introduction

Test Results for Reference

SSB Performance Testing

Test Scheme Introduction

Test Results for Reference

TPC-DS 性能测试

Testing Scheme Introduction

Test Results for Reference

FAQs

Common Operational Issues

Common Errors

Glossary

Product Policy

Service Level Agreement

Data Processing And Security Agreement

DocumentationTencent Cloud TCHouse-DPractical TutorialBasic Feature Usage

Basic Feature Usage

Download PDF

Last updated: 2024-07-31 09:16:24

Basic Feature Usage

Last updated: 2024-07-31 09:16:24

Download PDF

Create a table
Choosing a Data Model
The Doris data model is currently divided into three categories: AGGREGATE KEY, UNIQUE KEY, and DUPLICATE KEY. In all three models, data is sorted by KEY.
1. AGGREGATE KEY
When AGGREGATE KEYs are the same, the new and old records are aggregated. Currently supported aggregate functions include SUM, MIN, MAX, and REPLACE.
The AGGREGATE KEY model can aggregate data in advance and is suitable for reporting and multidimensional analysis business.
CREATE TABLE site_visit
(
 siteid      INT,
 city        SMALLINT,
 username    VARCHAR(32),
 pv BIGINT   SUM DEFAULT '0'
)
AGGREGATE KEY (siteid, city, username)
DISTRIBUTED BY HASH (siteid) BUCKETS 10;
2. UNIQUE KEY
When UNIQUE KEYs are the same, the new record overwrites the old record. Currently, the implementation of UNIQUE KEYs is the same as the REPLACE aggregation method of AGGREGATE KEYs, and the two are essentially the same. It is suitable for analytical businesses requiring updates.
CREATE TABLE sales_order
(
     orderid     BIGINT,
     status      TINYINT,
     username    VARCHAR(32),
     amount      BIGINT DEFAULT '0'
)
UNIQUE KEY(orderid)
DISTRIBUTED BY HASH(orderid) BUCKETS 10;
3. DUPLICATE KEY
Only the sort column is specified, and the same rows will not be merged. It is suitable for analytical businesses where advanced data aggregation is not required.
CREATE TABLE session_data
(
     visitorid   SMALLINT,
     sessionid   BIGINT,
     visittime   DATETIME,
     city        CHAR(20),
     province    CHAR(20),
     ip          varchar(32),
     brower      CHAR(20),
     url         VARCHAR(1024)
)
DUPLICATE KEY(visitorid, sessionid)
DISTRIBUTED BY HASH(sessionid, visitorid) BUCKETS 10;
Large Wide Tables and Star Schema
When the business side creates a table, in order to adapt to the front-end business, they often do not distinguish between dimension information and metric information, but define the Schema as a large wide table. For Doris, these large, such large wide tables often do not perform as well as one expects.
There are many fields in the Schema, and there may be more key columns in the aggregation model, which will increase the columns to be ordered during the import process.
The update of dimension information will be reflected in the entire table, and its update frequency directly affects query efficiency.
During use, users should use Star Schema to distinguish between dimension tables and metric tables whenever possible. Frequently updated dimension tables can also be placed in MySQL external tables. If there are only a few updates, they can be placed directly in Doris. When storing dimension tables in Doris, you can set more replicas of the dimension tables to improve the performance of Join.
Partitioning and Bucketing
Doris supports two-level partition storage. The first level is partition, which currently supports two types: RANGE partition and LIST partition. The second level is HASH bucket.
1. Partition
Partitions are used to divide data into different intervals, which can be logically understood as dividing a base table into multiple sub-tables. Data can be easily managed by partition, for example, deleting data is faster with partition.
1.1 RANGE Partition
In business, most users choose to partition by time.
1.2 LIST Partition
In business, users can choose cities or other enumeration values for partitioning.
2. HASH Bucket
Data is divided into different buckets by hash value.
It is recommended to use columns with high discrimination for bucketing to avoid data skew.
To facilitate data recovery, it is recommended to restrict the size of a single bucket to 10 GB. You should consider the number of buckets when creating a table or adding partitions. Different numbers of buckets can be designated to different partitions.
Sparse Indexes and Bloom Filters
Doris stores data in order and creates a sparse index for the data based on the order of the data. The index granularity is block (1,024 rows).
For the sparse index, a prefix of fixed length in schema is chosen as the index content. Currently in Doris, a prefix of 36 bytes is chosen as the index.
When creating a table, it is recommended to place the commonly used filter fields in the query in front of Schema. The more distinctive and frequently used query fields are, the earlier they are placed.
There is one thing needing attention, that is, the varchar type field. A varchar type field can only be the last field of a sparse index. The index is truncated at varchar, so if varchar appears first, the index may be less than 36 bytes long. For details, see Data Model, ROLLUP and Prefix Index.
In addition to sparse indexes, Doris also provides bloomfilter indexes, which have a significant filtering effect on columns with high discrimination. Considering that varchar cannot be placed in a sparse index, you can create a bloomfilter index.
Materialized Views (Rollup)
Rollup can essentially be understood as a materialized index of the Base Table. When creating a Rollup, you can select certain columns in the Base Table as Schema. The order of fields in Schema can also be different from that in the Base Table.
A Rollup can be created in the following situations:
1. The data in the Base Table is not highly aggregated.
This is usually because that there are highly discriminate fields in the Base Table. At this time, you can consider selecting certain columns and creating a Rollup.
For example, for the site_visit table:
site_visit(siteid, city, username, pv)
SiteID may result in low data aggregation. If the business side often needs to count PVs by cities, a Rollup with only city and PV can be created:
ALTER TABLE site_visit ADD ROLLUP rollup_city(city, pv);
2. The prefix index in the Base Table cannot be hit.
This is generally because that the Base Table creation method cannot cover all query modes. At this time, you can consider adjusting the column order and creating a Rollup.
For example, for the session_data table:
session_data(visitorid, sessionid, visittime, city, province, ip, brower, url)
If you want to analyze visits by browser and province in addition to visitorid, you can create a separate Rollup.
ALTER TABLE session_data ADD ROLLUP rollup_brower(brower,province,ip,url) DUPLICATE KEY(brower,province);
Schema Change
In Doris, there are currently three ways to perform Schema Changes: Sorted Schema Change, Direct Schema Change, and Linked Schema Change.
1. Sorted Schema Change
The sorting method of the column has been changed, and the data needs to be re-sorted. For example, when a sorted column is deleted, the fields need to be re-sorted.
ALTER TABLE site_visit DROP COLUMN city;
2. Direct Schema Change: No re-sorting is required, but a data transformation is required. For example, modify the column type, add a column to a sparse index, and so on.
ALTER TABLE site_visit MODIFY COLUMN username varchar(64);
3. Linked Schema Change: No need to convert data, it can be completed directly. For example, add a column.   
ALTER TABLE site_visit ADD COLUMN click bigint SUM default '0';
It is recommended to consider the Schema when creating a table, so that the process of Schema Change can be accelerated.

Was this page helpful?

You can also Contact Sales or Submit a Ticket for help.

Yes

tencent cloud

New User Offers

Next-Generation CDN：EdgeOne

Elasticsearch Service Special Offers

Free Tier

Tencent Cloud Startup Program

Special Offers

Lighthouse Special Offers

Cloud Object Storage Special Offers

Featured Products

New Products

Education

Tencent Cloud Online Education Solutions

Gaming

Gaming Solution

Game Media Solutions

Financial Services

Financial Services Solution

Audio & Video

Audio/Video Solution

LVB Recording Solution

Interactive Classroom Solution

Interactive Live Streaming Solution

Audio Chat Social Networking Solution

Real Estate

Tencent Cloud LinkBase(Weiling)

E-commerce

E-commerce retail solutions

Compute

Cloud Virtual Machine

Auto Scaling

Batch Compute

CVM Dedicated Host

Database

TencentDB for MySQL

TencentDB for Redis®

TencentDB for CTSDB

TDSQL for MySQL

Data Transfer Service

TencentDB for MongoDB

TencentDB for PostgreSQL

TencentDB for SQL Server

TencentDB for TcaplusDB

Video Service

Cloud Streaming Services

Video on Demand

Media Processing Service

Cloud Application Rendering

Cloud Contact Center

Game Multimedia Engine

Chat

Real-time Communication

Tencent Effect SDK

AI and Machine Learning

Image Creation Large Model

Face Fusion

eKYC

Optical Character Recognition

Video Creation Large Model

Industry Applications

Tencent HealthCare Omics Platform

Container and Middleware

TDMQ for CKafka

Serverless Cloud Function

Tencent Kubernetes Engine

Tencent Kubernetes Engine for Serverless

Networking

Cloud Load Balancer

Virtual Private Cloud

Direct Connect

Cloud Connect Network

NAT Gateway

VPN Connection

Bandwidth Package

Anycast Internet Acceleration

Elastic Network Interface

Flow Logs

Global Application Acceleration Platform

Security

Captcha