Notebook Exploration

Last updated: 2024-11-01 16:26:14

    Feature Overview

    WeData now provides the Notebook Exploration feature, which reads data from Tencent Cloud's big data engines EMR and DLC through Jupyter Notebook. With interactive data analysis, users can perform data exploration and machine learning.
    Notebook Exploration is currently available in the Beijing and Shanghai regions as an invite-only beta. You can submit a ticket to apply.
    Note:
    During the invite-only beta, the Notebook Exploration feature is free to use. After the beta ends, it will be billed commercially.

    Features

    One-click Workspace Creation

    No need to manually install the Python environment and configure environment dependencies. It supports one-click creation of a Notebook workspace, including a complete Jupyter Notebook environment and commonly used dependency packages.

    User and Resource Isolation

    Each user has a dedicated workspace under different projects. The storage and computing resources of each workspace are isolated from one another. Users' tasks and file resources do not interfere with each other.

    Integrated with Big Data Engine Base

    Supports binding with EMR and DLC Big Data Engines. Data can be directly read from the big data storage and computing engines for interactive exploration, algorithm model training, and predictive data analysis.

    Built-in Best Practices Tutorial

    The Notebook workspace comes with an out-of-the-box Big Data Tutorial Series and the Andrew Ng AI Tutorial Series, allowing users to get started quickly.

    Operation Steps

    Enter the Data Exploration page

    1. Log in to the Data Development and Governance Platform WeData Console.
    2. Click Project List in the left menu and find the target project for the Notebook Exploration feature.
    3. After selecting a project, click to enter the Data Development module.
    4. Click Notebook Exploration in the left menu.

    Create Workspace

    Each sub-user can create an independent Notebook workspace; workspaces do not interfere with one another.
    1. Click Create Workspace to enter the workspace configuration page.
    2. Configure the following items:
    Attribute Item Name | Attribute Item Description | Required
    Basic Info | Configure the basic information of the Notebook workspace to create a Notebook workspace instance. | -
    Workspace Name | Notebook workspace name. Supports Chinese, English, numbers, underscores, and hyphens; no more than 32 characters. | Yes
    Description (Optional) | Notebook workspace description. Supports Chinese, English, numbers, special characters, etc.; no more than 255 characters. | No
    Image | Default Jupyter Notebook image. | Yes
    Engine | Select the EMR or DLC compute and storage engine bound to the current project. Once selected, the workspace pre-connects to the engine, allowing Notebook tasks to access it with PySpark. | No
    Network | When the EMR engine is selected, further network configuration is needed to establish connectivity; defaults to the VPC and subnet where the EMR engine is located. | Yes
    Computing Resources | When the DLC engine is selected, computing resources must be chosen for executing DLC PySpark tasks. Note: only DLC Spark job type computing resources are supported. | Yes
    RoleArn | When the DLC engine is selected, a RoleArn must be chosen to authorize access to COS data storage. Note: the RoleArn is the data access policy (CAM role ARN) that the DLC engine uses to access COS; users need to configure it in DLC (see the example format after this table). | Yes
    Resource Configuration | Configure the storage and computing resources for the workspace, used for executing Notebook tasks with CFS. | -
    Specification Selection | Supported specifications: 2 cores, 4 GB memory / 8 GB storage (Trial Version); 4 cores, 8 GB memory / 16 GB storage (Advanced Edition); 8 cores, 16 GB memory / 32 GB storage (Express Version). | Yes
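    For reference, a Tencent Cloud CAM role ARN generally takes the form qcs::cam::uin/<account-uin>:roleName/<role-name>, e.g. qcs::cam::uin/100000000001:roleName/dlc-cos-access; the UIN and role name here are hypothetical placeholders. See the DLC documentation for creating and authorizing the role.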

    Launch Workspace

    1. Click Create Now to enter the Notebook workspace launch page.
    2. During startup, the PySpark environment is configured for you, and common Python packages such as numpy, pandas, and scikit-learn are installed. Installation may take some time; please wait until it completes.
    3. When the following page appears, the Notebook workspace has launched successfully and you can begin creating Notebook tasks.
    Note:
    Only the Python 3.11.1 kernel is supported; do not select other kernel versions.
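    Once the workspace has launched, you can confirm the environment from the first Notebook cell. Below is a minimal sanity check, assuming the packages listed above are preinstalled (exact versions may differ by workspace image):

        # Verify the preinstalled environment from a Notebook cell.
        # Package names follow this guide; versions vary by workspace image.
        import sys
        import numpy
        import pandas
        import sklearn
        import pyspark

        print("Python:", sys.version.split()[0])   # expected: 3.11.1
        print("numpy:", numpy.__version__)
        print("pandas:", pandas.__version__)
        print("scikit-learn:", sklearn.__version__)
        print("pyspark:", pyspark.__version__)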

    Workspace Management

    Exit Workspace

    Click the Exit button at the top left to exit the current workspace and return to the list page. An exited workspace automatically stops after ten minutes; resuming a stopped workspace restores the development environment and data.
    

    Editing Workspace

    Click the Edit button on the list page to modify the configuration of the current workspace. Configurable items include workspace name, description, and resource configuration.

    Deleting Workspace

    Click the Delete button on the list page to delete the current workspace.

    Practical Tutorial

    The Notebook workspace comes with an out-of-the-box Big Data Tutorial Series and the Andrew Ng AI Tutorial Series, allowing users to get started quickly.

    Data Analysis with DLC

    This sample Notebook demonstrates how to analyze data in DLC (Data Lake Compute). The Notebook workspace has built-in DLC Jupyter plugins that can be loaded and used directly. The sample syntax covers running Spark code, Spark SQL code, and using Spark ML; a condensed sketch follows.
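    The PySpark snippet below runs a Spark SQL query and pulls a bounded result into pandas. It assumes the DLC plugin has already provided a pre-connected SparkSession named spark; the database and table names are hypothetical placeholders:

        # Assumes `spark` is a pre-connected SparkSession provided by the
        # DLC plugin; `demo_db.sales` is a hypothetical table.
        df = spark.sql(
            "SELECT product, SUM(amount) AS total_amount "
            "FROM demo_db.sales GROUP BY product"
        )
        df.show(10)

        # Pull a bounded sample into pandas for local exploration.
        pdf = df.limit(1000).toPandas()
        print(pdf.describe())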

    Read EMR data for Model Prediction

    1. This sample Notebook demonstrates how to create EMR Hive tables and import local data into them, then read the data back and convert it into a pandas DataFrame for data preparation.
    2. After data preparation, you can use the Prophet time series algorithm to train a predictive model, then evaluate its accuracy and generate predictions, as in the sketch below.
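    A condensed sketch of this workflow, assuming a pre-connected Spark session named spark and a hypothetical Hive table demo_db.daily_metrics with a date column dt and a value column metric (the prophet package may need to be installed with pip if it is not preinstalled):

        # Read from an EMR-Hive table through the pre-connected Spark
        # session; table and column names are hypothetical placeholders.
        from prophet import Prophet

        sdf = spark.sql("SELECT dt, metric FROM demo_db.daily_metrics")
        # Prophet expects columns `ds` (date) and `y` (numeric value).
        pdf = sdf.toPandas().rename(columns={"dt": "ds", "metric": "y"})

        # Train the time series model and forecast the next 30 days.
        model = Prophet()
        model.fit(pdf)
        future = model.make_future_dataframe(periods=30)
        forecast = model.predict(future)

        # Inspect the tail of the forecast, including uncertainty bounds.
        print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())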