Advanced Parameters for Offline Node

Last updated: 2024-11-01 17:33:23

    Parameter Description

    Offline Type
    Read/Write
    Configuration contents
    Applicable Scenario
    Description
    MySQL
    Read
    splitFactor=5
    Single table
    -
    TDSQL MySQL
    Read
    splitFactor=5
    Single table
    -
    Doris
    Read
    query_timeout=604800
    Single table
    Timeout for the query, in seconds
    exec_mem_limit=4294967296
    Single table
    Set the execution memory limit to restrict memory usage during query execution
    parallel_fragment_exec_instance_num=8
    Single table
    Specify the number of instances for executing parallel fragments
    Hive
    Read
    mapreduce.job.queuename=root.default
    Single table
    Specify which queue the job should be submitted to
    hive.execution.engine=mr
    Single table
    Specifies the execution engine for Hive queries. The default is currently mr and does not need to be changed.
    Note:
    When setting advanced parameters, mapreduce.job.queuename=root.default and hive.execution.engine=mr must be set together. Setting only one of them has no effect.
    DLC
    Read
    fs.cosn.trsf.fs.ofs.data.transfer.thread.count=8
    Single table
    DLC concurrent writing. Supported values: none|hash|range. Value description:
    1. none: If a primary key exists, writes concurrently based on the primary key; otherwise, writes in a single thread
    2. hash: If a partition field exists, writes concurrently based on the partition field; otherwise, falls back to the 'none' strategy
    3. range: Not supported yet; the strategy is the same as 'none'
    fs.cosn.trsf.fs.ofs.prev.read.block.count=4
    Single table
    Enables small-file merging in DLC, which can also be enabled in the DLC console. Defaults to false. Whole-database sync provides an option in the interface to enable it; single-table sync requires configuring this parameter manually
    MongoDB
    Read
    batchSize=1000
    Single table
    Number of records for batch reading
    COS
    Write
    splitFileSize=134217728
    Single table
    Single file split size
    Not effective for Hive on COS
    Supports text, orc, and parquet file types
    hadoopConfig={}
    Single table
    Supports adding configurations to hadoopConfig
    HDFS
    Write
    splitFileSize=134217728
    Single table
    Single file split size
    Not effective for Hive on HDFS
    Supports text, orc, and parquet file types
    Hive
    Write
    compress=none/snappy/lz4/bzip2/gzip/deflate
    Single table
    Defaults to none. Applies only to the textfile format, not to orc/parquet (for orc/parquet, specify compression in the create table statement)
    format=orc/parquet
    Single table
    Format of the temporary HDFS files. Defaults to orc and is independent of the final Hive table format
    partition=static
    Single table
    Static partitioning mode. Suitable for writing to a single partition, and uses less memory
    Doris
    Write
    sameNameWildcardColumn=true
    Single table
    For MySQL-to-Doris sync configured with the * wildcard, supports mapping fields that have the same name
    Write
    loadProps={"format":"csv","column_separator":"\\x01","row_delimiter":"\\x03"}
    Single table
    Writes in CSV format, which offers higher performance than the default JSON format. Must be used together with the row delimiter \\x03.
    DLC
    Write
    fs.cosn.trsf.fs.ofs.data.transfer.thread.count=8
    Single table
    DLC concurrent writing. Supported values: none|hash|range. Value description:
    1. none: If a primary key exists, writes concurrently based on the primary key; otherwise, writes in a single thread
    2. hash: If a partition field exists, writes concurrently based on the partition field; otherwise, falls back to the 'none' strategy
    3. range: Not supported yet; the strategy is the same as 'none'
    fs.cosn.trsf.fs.ofs.prev.read.block.count=4
    Single table
    Enables small-file merging in DLC, which can also be enabled in the DLC console. Defaults to false. Whole-database sync provides an option in the interface to enable it; single-table sync requires configuring this parameter manually
    MongoDB
    Write
    replaceKey=id
    Single table
    When the write mode is Overwrite, this field is used as the business primary key for updates
    batchSize=2000
    Single table
    Number of records per batch write. Defaults to 1000 if not set
    Elasticsearch
    Write
    compression=true
    Single table
    Enables compression for HTTP requests
    multiThread=true
    Single table
    Whether HTTP requests use multi-threading
    ignoreWriterError
    Single table
    Ignores write errors and continues writing without retrying
    ignoreParseError=false
    Single table
    Ignores data format parsing errors and continues writing
    alias
    Single table
    An Elasticsearch alias is similar to a view in a database. For example, if you create the alias my_index_alias for the index my_index, operations on my_index_alias behave the same as operations on my_index. Configuring an alias means that an alias is created for the specified index after the data import is complete.
    aliasMode=append
    Single table
    Alias mode applied after the data import completes: append (add mode) or exclusive (keep only this one). append adds the current index to the alias mapping, so one alias can correspond to multiple indices. exclusive deletes the alias first and then adds the current index to the alias mapping, so one alias corresponds to one index. The alias is subsequently converted to the actual index name. Aliases can be used for index migration and for unified queries across multiple indices, and can be used to implement the view feature.
    nullToDate=null
    Single table
    Value used to fill null values when they are converted to the date type (null in this example)
    Kafka
    Write
    kafkaConfig={}
    Single table
    Supports Kafka Producer configuration options
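
    The configuration contents above all follow a key=value shape, and some values (loadProps, hadoopConfig, kafkaConfig) are JSON objects. As a rough sketch only, the Python below shows one way to sanity-check such settings before pasting them into a task. It assumes one key=value pair per line, which this page does not state. The helpers parse_advanced_params and missing_hive_pair are hypothetical names and are not part of any product API. The second helper reflects the note that the two Hive read parameters only take effect when set together.

```python
import json

# The two Hive read parameters that the note says must be set together.
HIVE_READ_PAIR = ("mapreduce.job.queuename", "hive.execution.engine")


def parse_advanced_params(text):
    """Parse 'key=value' lines into a dict (hypothetical helper)."""
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Split on the first '=' only, so values may contain '=' themselves.
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got: {line!r}")
        key, value = key.strip(), value.strip()
        # Values such as loadProps={...} or kafkaConfig={} are JSON objects.
        if value.startswith("{"):
            value = json.loads(value)
        params[key] = value
    return params


def missing_hive_pair(params):
    """Return the missing Hive read parameter if only one of the pair is set."""
    present = [k for k in HIVE_READ_PAIR if k in params]
    if len(present) == 1:
        return [k for k in HIVE_READ_PAIR if k not in params]
    return []


hive_ok = parse_advanced_params(
    "mapreduce.job.queuename=root.default\nhive.execution.engine=mr"
)
hive_partial = parse_advanced_params("hive.execution.engine=mr")
doris = parse_advanced_params(
    r'loadProps={"format":"csv","column_separator":"\\x01","row_delimiter":"\\x03"}'
)

print(missing_hive_pair(hive_ok))       # nothing missing
print(missing_hive_pair(hive_partial))  # the queue name is missing
print(doris["loadProps"]["format"])
```

    Splitting on the first = keeps values that themselves contain = intact, and parsing JSON values early surfaces quoting mistakes in loadProps or kafkaConfig before the task runs.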
    Metadata Field
    Read/Write
    Configuration contents
    Kafka
    Read
    __key__ Indicates the message key
    __value__ Indicates the complete content of the message
    __partition__ Indicates the partition where the current message is located
    __headers__ Indicates the headers information of the current message
    __offset__ Indicates the offset of the current message
    __timestamp__ Indicates the timestamp of the current message
    Elasticsearch
    Read
    _id Supports obtaining _id information
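
    To make the Kafka metadata fields above concrete, the sketch below maps the attributes of a single message onto the six field names. The KafkaMessage class and the to_metadata_fields function are hypothetical stand-ins written for illustration; they are not how the product reads Kafka, and the millisecond timestamp unit is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class KafkaMessage:
    """Hypothetical stand-in for one consumed Kafka message."""
    key: str
    value: str
    partition: int
    offset: int
    timestamp: int  # assumed to be epoch milliseconds
    headers: dict = field(default_factory=dict)


def to_metadata_fields(msg):
    """Map message attributes to the metadata field names listed above."""
    return {
        "__key__": msg.key,
        "__value__": msg.value,
        "__partition__": msg.partition,
        "__headers__": msg.headers,
        "__offset__": msg.offset,
        "__timestamp__": msg.timestamp,
    }


sample = KafkaMessage(
    key="order-1",
    value='{"amount": 10}',
    partition=0,
    offset=42,
    timestamp=1730000000000,
    headers={"source": "demo"},
)
row = to_metadata_fields(sample)
print(sorted(row))
```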

    Configuration Method

    
    
    
    