
Doris Data Source

Last updated: 2024-11-01 17:00:28
    DataInLong provides real-time write capability for Doris. This article describes the current support for real-time data synchronization to Doris.

    Supported Versions

    Currently, DataInLong supports real-time writing for both single tables and whole databases in Doris. To use the real-time synchronization capability, the following version limitations must be observed:
    | Data Source Type | Edition   |
    |------------------|-----------|
    | Doris            | 0.15, 1.x |

    Use Limits

    Doris supports three data models: DUPLICATE KEY, UNIQUE KEY, and AGGREGATE KEY. To write to Doris in Upsert mode, make sure the target table uses the UNIQUE KEY model. For details, see Data Model - Apache Doris.
    When the source table has a primary key, the automatically created Doris target table uses the UNIQUE KEY model; when the source table has no primary key, the automatically created target table uses DUPLICATE KEY.
    When the source is Kafka, Doris only supports synchronization to existing databases and tables. Make sure the target database and table exist before running the task.
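    As a sketch of the first limit above, an Upsert-capable target table must use the UNIQUE KEY model. The database, table, and column names below are illustrative, not part of any DataInLong configuration:

    ```sql
    -- Illustrative UNIQUE KEY table: rows with the same key columns are
    -- merged on write, which is what makes Upsert-style writes possible.
    CREATE TABLE example_db.orders (
        order_id   BIGINT,
        user_id    BIGINT,
        amount     DECIMAL(10, 2),
        updated_at DATETIME
    )
    UNIQUE KEY(order_id)
    DISTRIBUTED BY HASH(order_id) BUCKETS 10
    PROPERTIES ("replication_num" = "1");
    ```

    Note that the UNIQUE KEY columns must be a prefix of the column list, as order_id is here.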

    Whole Database Writing Node Configuration

    Data Target Settings

    
    
    
    Data Destination
    Select the target data source to be synchronized.

    Database Matching Policy
    Sets the runtime name-matching rules between the source and the Doris databases/tables:
    - Same as source database/table name: during task runtime, the system by default matches objects in the target data source that have the same names as the source databases/tables.
    - Custom: custom rules let you define special mappings between source and target names, such as adding a fixed prefix or suffix to source database or table names when writing to the target. Under this policy, the system matches target objects according to the naming rules.
    Note: When the source is Kafka, a whole-database synchronization task to Doris does not create databases or tables automatically during the full synchronization stage. Create the target databases and tables in Doris in advance to ensure the task runs normally.

    Batch Table Creation
    When the source is MySQL, TDSQL-C MySQL, or TDSQL MySQL, Doris target tables can be created in batch during the task configuration stage.
    Click Batch Table Creation to open a pop-up. The system matches target databases and tables according to the configured matching rules: tables that do not exist on the target appear under Unmatched Databases/Tables, and tables that already exist appear under Matched Databases/Tables.
    For unmatched tables, or for matched tables that need to be dropped and recreated (you can change the target table creation method to delete the existing table and create a new one), click Start Table Creation to create target tables based on the source table structure. The creation result is then displayed; you can review the failed tables and the failure reasons, check and modify the table creation statements, and retry the failed items.

    Advanced Settings
    Set runtime parameters for the Doris write side. These parameters can be configured according to business needs.

    Single Table Writing Node Configuration

    1. On the DataInLong page, click Real-Time Synchronization in the left navigation bar.
    2. On the Real-Time Synchronization page, click Single Table Synchronization at the top to create a new task (in either form or canvas mode) and enter the configuration page.
    
    
    
    Data Destination
    The Doris data source to be written to.

    Database
    Supports selecting or manually entering the database name to write to.
    By default, the database bound to the data source is used. Other databases must be entered manually.
    If the data source network is not connected and the database list cannot be fetched directly, you can enter the database name manually. Data synchronization can still run as long as the Data Integration network is connected.

    Table
    Supports selecting or manually entering the table name to write to.
    If the data source network is not connected and the table list cannot be fetched directly, you can enter the table name manually. Data synchronization can still run as long as the Data Integration network is connected.

    Advanced Settings (optional)
    Parameters can be configured according to business needs. The requirements are as follows:
    1. One parameter per line; parameters that must be used together should be written on the same line.
    2. Each parameter has a default value.
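    For illustration only, an advanced-settings block following the one-parameter-per-line rule might look like the following. The option names are taken from the Apache Flink Doris connector and are assumptions here, not a confirmed DataInLong parameter list:

    ```
    sink.enable-2pc = false
    sink.max-retries = 3
    sink.properties.format = json
    ```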

    Log Collection Writing Node Configuration

    Data Destination
    Select an available Doris data source in the current project.

    Database/Table
    Select the corresponding database and table from this data source.

    Advanced Settings (optional)
    Parameters can be configured according to business needs.

    Supported Data Types for Writing

    The supported data types for Doris writing and their corresponding conversions are as follows:
    | Internal Type | Doris Type |
    |---------------|------------|
    | NULL          | NULL_TYPE  |
    | BOOLEAN       | BOOLEAN    |
    | TINYINT       | TINYINT    |
    | SMALLINT      | SMALLINT   |
    | INT           | INT        |
    | BIGINT        | BIGINT     |
    | FLOAT         | FLOAT      |
    | DOUBLE        | DOUBLE     |
    | DATE          | DATE       |
    | TIMESTAMP     | DATETIME   |
    | DECIMAL       | DECIMAL    |
    | STRING        | CHAR       |
    | STRING        | LARGEINT   |
    | STRING        | VARCHAR    |
    | STRING        | STRING     |
    | DECIMAL       | DECIMALV2  |
    | ARRAY         | ARRAY      |
    | MAP           | MAP        |
    | STRING        | JSON       |
    | STRING        | VARIANT    |
    | STRING        | IPV4      |
    | STRING        | IPV6      |

    FAQs

    How to choose and optimize Doris specifications?

    Too many import tasks: a new import task fails to submit with "current running txns on db xxx is xx, larger than limit xx"?

    Adjust the FE parameter max_running_txn_num_per_db (default: 100). You can increase it appropriately, but keeping it within 500 is recommended.
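    For example, this FE configuration can be changed at runtime with an ADMIN statement; the value 200 here is illustrative:

    ```sql
    -- Raise the per-database concurrent transaction limit (default 100).
    -- The change takes effect immediately but is not persisted across FE
    -- restarts; add it to fe.conf to make it permanent.
    ADMIN SET FRONTEND CONFIG ("max_running_txn_num_per_db" = "200");
    ```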

    Import frequency too high, causing error err=[E-235]?

    Parameter tuning suggestion: temporarily increase the BE parameter max_tablet_version_num. The default is 200; keeping it within 2000 is recommended.
    Business optimization suggestion: Reducing the import frequency is the fundamental solution to this problem.
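    As a sketch, the corresponding BE configuration entry would look like this; the value is illustrative:

    ```
    # be.conf -- illustrative value; restart the BE for the change to take effect
    max_tablet_version_num = 2000
    ```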

    The imported file is too large and is restricted by parameters. Error: "The size of this batch exceed the max size"?

    Adjust the BE parameter streaming_load_max_mb; set it larger than the size of the file to be imported.
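    A hedged example of the corresponding be.conf entry, assuming the largest file to import is under 20 GB:

    ```
    # be.conf -- value in MB; set larger than the biggest file you expect to load
    streaming_load_max_mb = 20480
    ```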

    Data import error "[-238]"?

    Reason: -238 error usually occurs when the amount of data imported in one batch is too large, resulting in excessive Segment files for a single tablet.
    Parameter tuning suggestion: increase the BE parameter max_segment_num_per_rowset. The default value is 200, and it can be increased in multiples (e.g., 400, 800). Keeping it within 2000 is recommended.
    Business optimization suggestion: Reduce the amount of data imported in a single batch.
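    An illustrative be.conf entry for this parameter, doubling the default twice:

    ```
    # be.conf -- default 200; increase in multiples and keep within 2000
    max_segment_num_per_rowset = 800
    ```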

    Import failed, error "too many filtered rows xxx, ErrorURL: ..." or "Insert has filtered data in strict mode, tracking url=xxxx"?

    Reason: the schema, partitions, etc. of the table do not match the imported data. You can use TCHouse-P Studio or a client to execute the Doris command show load warnings on `<tracking url>` to check the specific reason, where `<tracking url>` is the error URL returned in the error message.

    
