Creating Cluster

Last updated: 2024-10-30 10:32:07

    Overview

    This document describes the steps and configuration options for creating an EMR on CVM cluster in the EMR console.

    Directions

    Log in to the EMR console and click Create cluster on the EMR on CVM cluster list page. Creating a cluster involves four steps: software configuration, AZ and hardware configuration, basic configuration, and configuration confirmation.

    Software Configuration

    Configuration Item
    Configuration Description
    Region
    The physical data center where the cluster is deployed. Each region represents an independent physical data center, and private networks of CVMs between different regions are not interconnected. Note: Once the cluster is created, the region cannot be changed, so choose carefully.
    Cluster type
    EMR on CVM supports multiple cluster types. Select the appropriate cluster type based on your business needs. The default is the Hadoop cluster type. For an introduction to each cluster type, see Cluster Type.
    Use cases
    The Hadoop cluster type supports five use cases: Default Scene, Zookeeper, HBase, Presto, and Kudu. Select the appropriate use case for deployment based on your business needs.
    Product version
    Each product version bundles a different set of components and component versions.
    Components to deploy
    Optional components that can be customized and combined based on your needs.
    Kerberos mode
    Disabled by default. When it is enabled, the open-source components in the cluster will be started in Kerberos security mode. For more details, see Kerberos Overview.
    Component dependency
    Disabled by default. When it is enabled, components from an existing cluster are shared for use in the current cluster. For more information, see Component Configuration Sharing. Note: Since the selected components are deployed on an existing cluster, the cluster providing the dependency components cannot be directly terminated. You should first terminate all clusters that rely on its components.
    Software configuration
    Optional configuration. Before starting the cluster, you can specify a JSON file to modify component configuration parameters or access external clusters. For more details, see Software Configuration.

    AZ and Hardware Configuration

    Configuration Item
    Configuration Description
    Billing mode
    Monthly subscription and pay-as-you-go billing are supported.
    Monthly subscription: Prepay for N months of product fees, which is more cost-effective than pay-as-you-go pricing.
    Pay-as-you-go: Pay based on usage duration. Account identity verification is required, and a fee equal to 2 hours of usage is frozen at activation (vouchers cannot be used for the frozen amount). The frozen fee is refunded upon termination.
    (Cross) AZ
    1. You can choose whether to deploy across AZs as needed. Cross-AZ deployment distributes service and management roles across multiple AZs, providing varying levels of high availability. By default, a single AZ is used; cross-AZ deployment is available through allowlist access.
    2. Different AZs within the same region may support different model specifications; choosing the latest AZ is recommended. Cloud products in different regions cannot communicate over private networks, and the selection cannot be changed after purchase. Choose a region and AZ close to your business data to reduce access latency and improve download speed.
    Deployment policy
    When cross-AZ deployment is selected, both the equalization policy and the balancing policy are supported.
    Equalization policy: A deployment scheme using Primary AZ + Secondary AZ. If the secondary AZ encounters an issue, the primary AZ can continue to provide services normally.
    Balancing policy: A deployment scheme using Primary AZ + Secondary AZ + Balanced AZ. The primary and secondary AZs are mutually redundant. If any AZ has an issue, the other AZs are not affected and continue to provide services.
    Cluster network
    To ensure the security of the EMR cluster, all cluster nodes are placed in a VPC. A VPC must be set up for the cluster to be created correctly.
    Cluster public network
    Enables SSH login and component web UI access from the public network. The network can be adjusted in the console after the cluster is created. The public network is enabled by default for the master1 node.
    Security group
    The security group functions as a firewall to configure network access control for CVM. If no security group is available, EMR will automatically create one for you. If there is an existing security group in use, you can select it directly. If the number of security groups has reached the upper limit and new ones cannot be created, you can delete some unused security groups. View the security groups currently in use.
    Create a security group: EMR helps users create a security group, enabling ports 22 and 30001, as well as the necessary private network IP ranges.
    Existing EMR security group: Select an already created EMR security group as the security group for the current instance, enabling ports 22 and 30001, as well as the necessary private network IP ranges.
    Remote log-in
    Port 22 is commonly used for remote login. Newly created security groups have this port enabled by default; you can close it based on your business needs.
    High availability (HA)
    HA is enabled by default. The number of nodes deployed varies by cluster type, use case, and whether HA is enabled. For more details, see Cluster Type.
    Node type
    Select the appropriate model configuration for different node types based on business requirements. For more details, see Business Assessment.
    Note:
    1. Currently, Core nodes, Task nodes, and Router nodes support mounting multiple types of cloud disks (each type can only be selected once) and multiple disks (up to 20).
    2. Local disk models are not supported for deployment on Master and Common nodes. Select a non-local disk model.
    Placement group
    Optional configuration. A placement group is a policy for distributing CVM instances across underlying hardware. For more details, see Placement Group.
    Hive metadatabase
    If the Hive component is selected, Hive Metastore offers two storage options:
    The first is the default cluster option, where Hive metadata is stored in a separately purchased MetaDB for the cluster.
    The second option is to associate an external Hive Metastore, where you can choose to link to EMR-MetaDB or a self-built MySQL database, with metadata stored in the associated database, which will not be destroyed when the cluster is terminated. For more details, see Hive Metadata Management.
    Note: When you select one or more components such as Hue, Ranger, Oozie, Druid, and Superset, the system automatically purchases a MetaDB for storing the metadata of components other than Hive.
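
    The Security group and Remote log-in rows above state that ports 22 and 30001 are opened. The following is a minimal sketch, not an official tool, of checking from a client machine whether those ports accept TCP connections. The function names are hypothetical, the host address must be supplied by you, and a successful connection only shows that the port is open, not which service is behind it.

```python
import socket

# Ports the Security group row says are enabled. What listens on 30001
# is not stated in this document.
EMR_SECURITY_GROUP_PORTS = (22, 30001)


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_ports(host: str, ports=EMR_SECURITY_GROUP_PORTS) -> dict:
    """Map each port to whether it accepted a TCP connection."""
    return {port: port_is_reachable(host, port) for port in ports}
```

    For example, calling check_ports with the public IP of the master1 node returns a dictionary keyed by port. A False value can mean the security group rule is closed, the public network is disabled, or the node is not yet running.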

    Basic Configuration

    Configuration Item
    Configuration Description
    Project
    Assigns the current cluster to a project group. Note: The project cannot be modified once the cluster is created.
    Cluster name
    Set a cluster name to distinguish between different EMR clusters. The system generates a random name, which you can modify. The cluster name must be 6 to 36 characters long and can contain only Chinese characters, letters, digits, hyphens (-), and underscores (_).
    Login method
    Currently, EMR provides two methods for logging in to cluster services, nodes, and MetaDB: a custom password and an associated SSH key. SSH keys are used only for quick access through the EMR-UI. The default username is root, while the username for quick access to the Superset component's WebUI is admin.
    Bootstrap action
    Optional configuration. Bootstrap actions let you run custom scripts during cluster creation so that you can modify the cluster environment, install third-party software, and use your own data. For more settings, see Bootstrap Actions.
    Tag
    Optional configuration. You can add tags to cluster or node resources during creation to facilitate resource management. A maximum of 5 tags can be added, and tag keys should not be duplicated.
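
    The naming and tagging rules in the table above can be checked before you fill in the console form. The sketch below is illustrative only: the function names are hypothetical, "Chinese characters" is assumed to mean the CJK Unified Ideographs range, and "letters" is assumed to mean ASCII letters. The console may accept a slightly different character set.

```python
import re

# Assumption: Chinese characters = U+4E00..U+9FFF, letters = ASCII letters.
# Length limit (6 to 36) and allowed symbols come from the Cluster name row.
_CLUSTER_NAME_RE = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_-]{6,36}")

MAX_TAGS = 5  # limit stated in the Tag row


def is_valid_cluster_name(name: str) -> bool:
    """Return True if the whole name matches the documented rules."""
    return _CLUSTER_NAME_RE.fullmatch(name) is not None


def check_tags(tags: list) -> list:
    """Return a list of problems for a list of (key, value) tag pairs."""
    problems = []
    if len(tags) > MAX_TAGS:
        problems.append(f"too many tags: {len(tags)} > {MAX_TAGS}")
    seen = set()
    for key, _value in tags:
        if key in seen:
            problems.append(f"duplicate tag key: {key}")
        seen.add(key)
    return problems
```

    An empty list from check_tags means the tag set satisfies both documented constraints.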

    Configuration Confirmation

    Configuration Item
    Configuration Description
    Configuration list
    Check the deployment information for errors.
    Auto-renewal
    Optional. Starting seven days before the cluster expires, the system checks daily whether the account has sufficient available balance to renew the cluster resources set for automatic renewal. Monthly subscription clusters are set for automatic renewal by default, but you can deselect this option.
    Agreement
    After the above configurations are completed, click Purchase to proceed with payment. Once the payment is successful, the EMR cluster will start the creation process. In approximately 10 minutes, you can find the created cluster in the EMR console.
    Note
    Pay-as-you-go cluster: Creation starts immediately. Once the cluster is created, its status changes to Running.
    Monthly subscription cluster: An order is generated first, and the cluster creation starts after the payment is completed.
    You can view the instance information for each node in the CVM console. To ensure the proper functioning of the EMR cluster, do not modify the configuration of these instances in the CVM console.
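
    To make the timing in the Auto-renewal row concrete, the sketch below lists the days on which a balance check would fall for a given expiry date. It assumes one check per calendar day, starting seven days before expiry and ending the day before expiry. The exact boundaries are an assumption, and the function name is hypothetical.

```python
from datetime import date, timedelta

RENEWAL_CHECK_DAYS = 7  # "seven days before the cluster expires"


def renewal_check_dates(expiry: date) -> list:
    """Dates of the assumed daily balance checks before the expiry date."""
    # Counts down from 7 days before expiry to 1 day before expiry.
    return [expiry - timedelta(days=n) for n in range(RENEWAL_CHECK_DAYS, 0, -1)]
```

    Keeping enough balance in the account throughout this window is what allows automatic renewal to succeed.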

    Next Steps

    After the cluster is successfully created, you can log in and further configure the cluster as needed. See the following documentation for detailed operations:
    