Table Storage Format Selection

Tencent Cloud TCHouse-P

Product Introduction

Purchase Guide

Operation Guide

Managing Cluster

Creating Cluster

Cluster Information

Scaling Cluster

Managing IP Allowlist and Blocklist

Applying for Public IP

Managing Resource Queue

Terminating Cluster

Accessing Data Warehouse

Connecting to Database

Managing User Permission

Defining Database

Managing Data

Monitoring and Alarming

Access Management

Performance Metrics

Data Ingestion

Importing TencentDB Data Offline with DataX

Syncing Incremental Data from MySQL with DataX

Importing and Exporting COS Data at High Speed with External Table

Syncing EMR Data with External Table

Implementing CDWPG UPSERT with Rule

Data Warehouse Development

Creating Airflow in Cloud

API Documentation

Making API Requests

Information Query APIs

DescribeInstance

DescribeInstances

DescribeInstanceState

DescribeSimpleInstances

Instance APIs

Practical Tutorial

Data Warehouse Table Development

Table Distribution Key Selection

Table Storage Format Selection

Table Partition Usage

Extension Usage

Cold Data Backup

Statistics and Space Maintenance

FAQs

CDWPG Policy

Data Processing And Security Agreement

DocumentationTencent Cloud TCHouse-PPractical TutorialTable Storage Format Selection

Table Storage Format Selection

Download PDF

Last updated: 2024-11-27 15:36:05

Table Storage Format Selection

Last updated: 2024-11-27 15:36:05

Download PDF

This document describes how to select a storage format in Tencent Cloud TCHouse-P.
Storage Format Overview
Greenplum (GP) stores data in heap or AO (AORO or AOCO) tables:
Heap table: It is inherited from PostgreSQL and is currently the default storage format of GP. It only supports row-oriented storage.
AO table: AO table was originally designed to only support APPEND (i.e., INSERT), so it was called append-only. It has been optimized since v4.3 and now supports UPDATE and DELETE, so it has been renamed append-optimized. AO supports both row-oriented (AORO) and column-oriented (AOCO) storage.
Heap Table
Heap table is inherited from PostgreSQL and uses MVCC for consistency. If you don't specify any storage format when creating a table, GP will use the heap table format.
A heap table supports partitioned table and row storage but not column storage or compression. It should be noted that when processing UPDATE and DELETE operations, the heap table does not actually delete data; instead, it relies on version information to block old data. Therefore, if your table has a large number of UPDATE or DELETE operations, the physical space used by the table will keep increasing. In this case, you need to use VACUUM to clear old data.
A heap tables doesn't support logical incremental backup, so if you want to take a snapshot of the heap table, you need to export the full data each time.
Table creation statement:
CREATE TABLE heap(
    a int,
    b varchar(32)
) DISTRIBUTED BY (a);
Best practices
For small tables such as dimension tables in the data warehouse or those containing fewer than one million data records, heap tables are recommended.
In OLTP scenarios where many UPDATE and DELETE operations exist and queries are mostly point queries with indexes, heap tables are recommended.
AO Table
AO table is designed to be used as large fact table in the data warehouse. It supports row storage (not recommended), column storage, and data compression.
An AO table is very different from a heap table in both the logical and physical table structures. For example, the heap table mentioned above uses MVCC to control the visibility of data after UPDATE and DELETE operations, while the AO table uses an additional bitmap table to indicate what data is visible in the AO table.
For an AO table with a large number of UPDATE and DELETE operations, you also need to use VACUUM for maintenance. However, in the AO table, VACUUM needs to reset the bitmap and compress the physical file, so it is usually slower than in a heap table.
AOCO
An AOCO table organizes data in columns and supports column-level compression.
The table creation statement is as follows, with the partitioning feature added:
CREATE TABLE aoco(
    a int  ENCODING (compresstype=zlib, compresslevel=5),
    b int  ENCODING (compresstype=none),
    c varchar(32) ENCODING (compresstype=RLE_TYPE, blocksize=32768),
    d varchar(32),
    fdate date
)
WITH (appendonly=true, orientation=column, compresstype=zlib, compresslevel=6, blocksize=65536)
DISTRIBUTED BY (a)
PARTITION BY RANGE(fdate) 
(
    PARTITION pn START ('2018-11-01'::date) END ('2018-11-10'::date) EVERY ('1 day'::interval),
    DEFAULT PARTITION pdefault
);
Compression
Compression is mainly used for column-oriented tables or append-write (appendonly=true) row-oriented tables. The following two types of compression are available:
Table-level compression.
Column-level compression, where you can apply different compression algorithms to different columns.
Currently, Tencent Cloud TCHouse-P supports zstd, zlib, and rle_type compression algorithms.
Examples:
Create a column-oriented table using level-5 zlib compression:
CREATE TABLE foo (a int, b text) 
   WITH (appendonly=true, orientation=column, compresstype=zlib, compresslevel=5);
Create a column-oriented table using level-5 zstd compression:
CREATE TABLE foo (a int, b text) 
   WITH (appendonly=true, orientation=column, compresstype=zstd, compresslevel=5);
Best practices
AOCO is typically used for fact tables in the data warehouse. Such tables have many fields and large data volumes and are mainly used in OLAP scenarios, where only some fields in the tables are read and aggregated when queried, with no SELECT \* FROM involved.
As AOCO is generally used for large tables, compression and partitioning are often used together to reduce the actual storage capacity and improve the performance.
In general, you can select the zlib compression algorithm at level 4 or 5; however, be sure to use the rle_type algorithm for fields with many repeated values.
Do not make the blocksize too large, especially for partitioned tables. GP maintains a buffer for each field in each partition, so if the blocksize is too large, the memory usage will be very high. The default value of 32768 is suitable in most cases.

Was this page helpful?

You can also Contact Sales or Submit a Ticket for help.

Yes

tencent cloud

New User Offers

Next-Generation CDN：EdgeOne

Elasticsearch Service Special Offers

Free Tier

Tencent Cloud Startup Program

Special Offers

Lighthouse Special Offers

Cloud Object Storage Special Offers

Featured Products

New Products

Education

Tencent Cloud Online Education Solutions

Gaming

Gaming Solution

Game Media Solutions

Financial Services

Financial Services Solution

Audio & Video

Audio/Video Solution

LVB Recording Solution

Interactive Classroom Solution

Interactive Live Streaming Solution

Audio Chat Social Networking Solution

Real Estate

Tencent Cloud LinkBase(Weiling)

E-commerce

E-commerce retail solutions

Compute

Cloud Virtual Machine

Auto Scaling

Batch Compute

CVM Dedicated Host

Database

TencentDB for MySQL

TencentDB for Redis®

TencentDB for CTSDB

TDSQL for MySQL

Data Transfer Service

TencentDB for MongoDB

TencentDB for PostgreSQL

TencentDB for SQL Server

TencentDB for TcaplusDB

Video Service

Cloud Streaming Services

Video on Demand

Media Processing Service

Cloud Application Rendering

Cloud Contact Center

Game Multimedia Engine

Chat

Real-time Communication

Tencent Effect SDK

AI and Machine Learning

Image Creation Large Model

Face Fusion

eKYC

Optical Character Recognition

Video Creation Large Model

Industry Applications

Tencent HealthCare Omics Platform

Container and Middleware

TDMQ for CKafka

Serverless Cloud Function

Tencent Kubernetes Engine

Tencent Kubernetes Engine for Serverless

Networking

Cloud Load Balancer

Virtual Private Cloud

Direct Connect

Cloud Connect Network

NAT Gateway

VPN Connection

Bandwidth Package

Anycast Internet Acceleration

Elastic Network Interface

Flow Logs

Global Application Acceleration Platform

Security

Captcha