Kudu Data Source

Usage Limits
1. The Kudu reader must have both upperBound and lowerBound configured for the concurrency setting to take effect.
2. upperBound and lowerBound in the Kudu reader are of type long, so only time or integer fields can be used as bounds.
3. The reader uses kudu-client to connect directly to the Kudu server to read data, so WHERE conditions do not support Impala SQL syntax.
4. Incremental synchronization WHERE condition syntax: create_time>='${yyyy-MM-dd-1d HH:mm:ss}' and create_time<'${yyyy-MM-dd HH:mm:ss}'
Bounds can currently be configured with integers or date functions. Date functions are used as follows:
// Converts to a 13-digit timestamp (milliseconds)
TimestampMillis('yyyy-MM-ddTHH:mm:00+0800')
TimestampMillis('2023-07-10T00:00:00+0800')
TimestampMillis('2023-07-10 00:00:00')
TimestampMillis('2023-07-10')

// Converts to a 10-digit timestamp (seconds)
TimestampSeconds('yyyy-MM-ddTHH:mm:00+0800')
TimestampSeconds('2023-07-10T00:00:00+0800')
TimestampSeconds('2023-07-10 00:00:00')
TimestampSeconds('2023-07-10')
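
To check which long value a date literal should produce before using it as a bound, the conversion can be approximated locally. The following is a minimal Python sketch, not the product's implementation. The function names timestamp_millis and timestamp_seconds are illustrative, and it assumes that literals without an explicit offset are interpreted as UTC+8, which the text above does not state.

```python
from datetime import datetime, timedelta, timezone

# Assumption: literals without an explicit offset are read as UTC+8.
DEFAULT_TZ = timezone(timedelta(hours=8))

# Accepted layouts, mirroring the literal examples listed above.
_FORMATS = ("%Y-%m-%dT%H:%M:%S%z", "%Y-%m-%d %H:%M:%S", "%Y-%m-%d")


def _parse(text: str) -> datetime:
    """Parse a date literal, trying each supported layout in turn."""
    for fmt in _FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue
        # Attach the assumed default zone only when the literal carried none.
        if parsed.tzinfo is None:
            parsed = parsed.replace(tzinfo=DEFAULT_TZ)
        return parsed
    raise ValueError(f"unsupported date literal: {text!r}")


def timestamp_seconds(text: str) -> int:
    """Return the 10-digit epoch value in seconds."""
    return int(_parse(text).timestamp())


def timestamp_millis(text: str) -> int:
    """Return the 13-digit epoch value in milliseconds."""
    return timestamp_seconds(text) * 1000


print(timestamp_millis("2023-07-10T00:00:00+0800"))  # 1688918400000
print(timestamp_seconds("2023-07-10"))               # 1688918400 under the UTC+8 assumption
```

If the platform resolves offset-free literals in a different time zone, only DEFAULT_TZ needs to change.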
Kudu Offline Single Table Read Node Configuration

Parameters	Description
Data Source	An available Kudu data source.
Database	Select or manually enter the name of the database to read from. By default, the database bound to the data source is used. Other databases must be entered manually. If the data source network is not connected and the database information cannot be fetched directly, you can enter the database name manually. Data synchronization can still run as long as the Data Integration network is connected.
Table	Select or manually enter the name of the table to read from.
Split Key	Specifies the field used to shard the data. Once it is set, concurrent tasks are launched for data synchronization. You can use any column of the source table as the split key. The primary key or an indexed column is recommended.
Filter Condition (Optional)	In practice, it is common to synchronize only the current day's data by setting the WHERE condition to gmt_create>$bizdate. A WHERE condition is an effective way to perform incremental synchronization. If the WHERE condition is left empty, including when its key or value is missing, the task synchronizes the full data.
upperBound	Partition upper limit. If the partition in the table creation SQL statement is "5" <= values <= "10", then lowerBound is "5" and upperBound is "10". If the partition is value = "x", then lowerBound is "x" and upperBound is "x\000".
lowerBound	Partition lower limit. If the partition in the table creation SQL statement is "5" <= values <= "10", then lowerBound is "5" and upperBound is "10". If the partition is value = "x", then lowerBound is "x" and upperBound is "x\000".
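
The relationship between the bounds and concurrency can be illustrated with a small sketch. It assumes, purely for illustration, that the reader divides the long range between lowerBound and upperBound into contiguous half-open sub-ranges, one per concurrent task. The actual splitting strategy and the inclusiveness of the bounds are not described here, and split_range is a hypothetical helper.

```python
from typing import List, Tuple


def split_range(lower: int, upper: int, parts: int) -> List[Tuple[int, int]]:
    """Split [lower, upper) into at most `parts` contiguous half-open ranges."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    if upper <= lower:
        raise ValueError("upper must be greater than lower")
    total = upper - lower
    parts = min(parts, total)  # never create empty ranges
    base, extra = divmod(total, parts)
    ranges = []
    start = lower
    for i in range(parts):
        # The first `extra` ranges take one additional value each.
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


# Three concurrent tasks over the bound range [0, 10).
print(split_range(0, 10, 3))  # [(0, 4), (4, 7), (7, 10)]
```

Without both bounds there is no finite range to divide, which is consistent with the requirement that upperBound and lowerBound be set for concurrency to take effect.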
Kudu Offline Single Table Write Node Configuration

Parameters	Description
Data Destination	The Kudu data source to write to.
Database	Select or manually enter the name of the database to write to. By default, the database bound to the data source is used. Other databases must be entered manually. If the data source network is not connected and the database information cannot be fetched directly, you can enter the database name manually. Data synchronization can still run as long as the Data Integration network is connected.
Table	Select or manually enter the name of the table to write to. If the data source network is not connected and the table information cannot be fetched directly, you can enter the table name manually. Data synchronization can still run as long as the Data Integration network is connected.
Whether to Clear Table	Choose whether to clear the Kudu table before writing to it.
Write Mode	Kudu writing supports three modes. Append: when a row conflicts with the primary key/unique index, the conflicting row is not written. Overwrite: when a row conflicts with the primary key/unique index, the original row is deleted before the new row is inserted. On Duplicate Key: when a row conflicts with the primary key/unique index, the new row replaces the specified fields.
Batch Submission Size	The number of records submitted in one batch. Batching can greatly reduce the number of network round trips between the data synchronization system and Kudu and improve overall throughput. If this value is set too high, the data synchronization process may fail with an OOM exception.
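
The difference between the three write modes is easiest to see on a row that already exists. The sketch below simulates the behavior described above on an in-memory dictionary keyed by primary key. It does not use kudu-client, and the names write_rows, key, and update_fields are hypothetical.

```python
from typing import Dict, Iterable, Optional, Set

Row = Dict[str, object]


def write_rows(table: Dict[object, Row], rows: Iterable[Row], mode: str,
               key: str = "id", update_fields: Optional[Set[str]] = None) -> int:
    """Apply rows to an in-memory table; return the number of rows skipped."""
    skipped = 0
    for row in rows:
        pk = row[key]
        if pk not in table:
            table[pk] = dict(row)      # no conflict: plain insert in every mode
        elif mode == "append":
            skipped += 1               # conflicting row is not written
        elif mode == "overwrite":
            table[pk] = dict(row)      # original row dropped, new row inserted
        elif mode == "on_duplicate_key":
            # Default to every non-key field present in the incoming row.
            fields = update_fields if update_fields is not None else set(row) - {key}
            for name in fields:
                if name in row:
                    table[pk][name] = row[name]  # only the chosen fields change
        else:
            raise ValueError(f"unknown mode: {mode}")
    return skipped


table = {1: {"id": 1, "name": "old", "score": 10}}
write_rows(table, [{"id": 1, "name": "new"}], "on_duplicate_key")
print(table[1])  # {'id': 1, 'name': 'new', 'score': 10}
```

With the same input, "append" leaves the existing row untouched and reports one skipped row, while "overwrite" replaces the whole row, so the score field is lost.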
Data Type Conversion Support

Read
Kudu Data Type	Internal Types
int8,int16,int32,int64	Long
float,double,decimal	Double
string,varchar	String
unixtime_micros,date	Date
binary	Bytes
bool	Boolean

Write
Internal Types	Kudu Data Type
Long	int8,int16,int32,int64
Double	float,double,decimal
String	string,date
Date	unixtime_micros,varchar
Bytes	binary
Boolean	bool
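
When validating a task configuration ahead of time, the read-side mapping can be expressed as a lookup table. This is a minimal sketch that simply transcribes the Read table above. READ_TYPE_MAP and internal_type are illustrative names, not part of any product API.

```python
# Read-side mapping from Kudu column types to internal types,
# transcribed from the Read table above.
READ_TYPE_MAP = {
    "int8": "Long",
    "int16": "Long",
    "int32": "Long",
    "int64": "Long",
    "float": "Double",
    "double": "Double",
    "decimal": "Double",
    "string": "String",
    "varchar": "String",
    "unixtime_micros": "Date",
    "date": "Date",
    "binary": "Bytes",
    "bool": "Boolean",
}


def internal_type(kudu_type: str) -> str:
    """Return the internal type for a Kudu column type name."""
    try:
        return READ_TYPE_MAP[kudu_type.strip().lower()]
    except KeyError:
        raise ValueError(f"unsupported Kudu type: {kudu_type!r}") from None


print(internal_type("INT32"))            # Long
print(internal_type("unixtime_micros"))  # Date
```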
