Notebook Exploration

Last updated: 2025-01-25 19:57:20

    Feature Overview

    WeData now offers the Notebook Exploration feature, which supports reading data from the Tencent Cloud big data engines EMR and DLC through Jupyter Notebook. With interactive data analysis, users can perform data exploration and machine learning.
    Currently, Notebook Exploration is available on the Chinese site in the Beijing, Shanghai, Guangzhou, Singapore, and Silicon Valley regions, and on the international site in the Singapore and Frankfurt regions. You can submit a ticket to request allowlist activation for trial use.
    
    
    

    Features

    One-click workspace creation

    No need to manually install the Python environment and configure environment dependencies. It supports one-click creation of a Notebook workspace, including a complete Jupyter Notebook environment and commonly used dependency packages.

    User and resource isolation

    Each user has a dedicated workspace under different projects. The storage and computing resources of each workspace are isolated from one another. Users' tasks and file resources do not interfere with each other.

    Integration with big data engines

    Supports binding with EMR and DLC Big Data Engines. Data can be directly read from the big data storage and computing engines for interactive exploration, algorithm model training, and predictive data analysis.

    Built-in practice tutorials

    The Notebook workspace comes with built-in Big Data tutorials, allowing users to get started quickly and easily.

    Overall Usage Process

    The full process for using Notebook in WeData is shown below:
    

    Operation Steps

    Create a Notebook Workspace

    1. Log in to the WeData console.
    2. Click Project List in the left menu and find the target project for the Notebook Exploration feature.
    3. After selecting the project, click to enter the Data Analysis > Notebook Exploration module.
    
    
    
    4. Enter the Notebook Exploration list page and click Create workspace.
    
    
    
    5. Enter the workspace configuration page to set basic information and resource configuration.
    
    
    
    | Attribute | Description | Required |
    | --- | --- | --- |
    | **Basic information** | Configure the basic information of the Notebook workspace to create a workspace instance. | - |
    | Workspace Name | Notebook workspace name. Supports Chinese, English, numbers, underscores, and hyphens; no more than 32 characters. | Yes |
    | Permission Scope | If "Individual Use Only" is selected, only the current user can access the workspace; if "Project Share" is selected, all project members can access the workspace for collaborative development. | Yes |
    | Description | Notebook workspace description. Supports Chinese, English, numbers, special characters, etc.; no more than 255 characters. | No |
    | Engines | Select the EMR or DLC compute-and-storage engine bound to the current project. Once selected, the workspace pre-connects to the engine, allowing Notebook tasks to use PySpark. | No |
    | Network | When the EMR engine is selected, further network configuration is needed to establish connectivity; defaults to the VPC and subnet where the EMR engine is located. | Yes |
    | DLC Data Engine | When the DLC engine is selected, choose a DLC data engine bound to the project to execute DLC PySpark tasks. Note: only DLC Spark job type compute resources are supported. | Yes |
    | Machine Learning | If the selected DLC data engine contains a "machine learning" type resource group, this option appears and is selected by default. If it does not, the option does not appear; to use it, create such a resource group in DLC. | No |
    | RoleArn | When the DLC engine is selected, choose a RoleArn to authorize access to COS data storage. Note: the RoleArn is the data access policy (CAM role arn) the DLC engine uses to access Cloud Object Storage (COS); it must be configured by the user in DLC. | Yes |
    | **Advanced configuration** | Optionally use MLflow to manage experiments, data, and models in Notebook Exploration. This feature currently requires allowlist access. | - |
    | MLflow service | When checked, experiments created and machine learning performed with MLflow functions in Notebook tasks are reported to the MLflow service. You can later view them under Machine Learning > Experiment Management and Model Management. | No |
    | **Resource Configuration** | Configure the workspace's storage (CFS) and compute resources used to execute Notebook tasks. | - |
    | Specification Selection | Supported specifications: 2 cores, 4 GB memory / 8 GB storage (Trial Version); 4 cores, 8 GB memory / 16 GB storage (Advanced Edition); 8 cores, 16 GB memory / 32 GB storage (Express Version). | Yes |

    Start/Stop Workspace Management

    Launch workspace

    1. Click Create Now to enter the Notebook workspace launch page.
    2. During startup, the PySpark environment is configured for you, and common Python packages such as numpy, pandas, and scikit-learn are installed. Installation may take some time; please wait for it to complete.
    3. When the following page appears, the Notebook workspace has launched successfully, and you can begin creating Notebook tasks.
    
    
    
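Before starting, you can verify from a notebook cell that the pre-installed packages are actually importable. A minimal stdlib sketch (the package list here mirrors the docs; adjust it to your needs, and note that scikit-learn imports as `sklearn`):

```python
from importlib.util import find_spec

def missing_packages(names):
    """Return the subset of names that cannot be imported in this kernel."""
    return [n for n in names if find_spec(n) is None]

# Packages the workspace is documented to pre-install; extend as needed.
missing = missing_packages(["numpy", "pandas", "sklearn"])
print("missing:", missing or "none")
```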

    Log out of the workspace

    1. Click the Log Out button at the top left to exit the current workspace and return to the list page.
    The workspace stops automatically ten minutes after you exit. Restarting a stopped workspace restores the development environment and data.
    

    Editing workspace

    Click the Edit button on the list page to modify the current workspace's configuration. Configurable items include: space name, description, and resource configuration.
    

    Deleting workspace

    Click the Delete button on the list page to delete the current workspace.

    Create and Run a Notebook File

    1. Create a Notebook File
    You can create folders and Notebook files in the left resource manager.
    Note:
    Notebook file names must end with the .ipynb extension.
    2. Select a running kernel
    Open the Notebook file, click Select Kernel at the top left, and choose a kernel from the dropdown options.
    Note:
    In Jupyter Notebook, the kernel is the backend application that executes code, handles the execution of code cells, returns calculation results, and interacts with the user interface.
    WeData Notebook currently supports two types of kernels:
    Python Environment: The default IPython kernel in Jupyter Notebook, supports Python code execution.
    DLC resource group: A remote kernel provided by Tencent Cloud Big Data, allowing Python tasks to be submitted to the DLC resource group for execution.
    
    If you select the DLC resource group, choose a machine learning resource group instance from the DLC data engine in the next level of options.
    
    3. Run a Notebook File
    Click Run to generate a Notebook kernel instance and start running the code. The results will be displayed below each cell.
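Under the hood, a Notebook file is a JSON document (nbformat 4), which is why the .ipynb extension is required. For illustration, a minimal valid skeleton built with the stdlib:

```python
import json

# Minimal nbformat-4 notebook containing a single code cell.
notebook = {
    "cells": [
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print('hello from WeData Notebook')\n"],
        }
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}
serialized = json.dumps(notebook, indent=1)  # write this string to a .ipynb file
```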

    Periodic Scheduling Of Notebook Tasks

    Create a notebook task

    1. Enter the project and open the menu Data Research and Development > Offline Development.
    
    
    
    2. In the left directory, click Create Workflow and configure the workflow properties, including the workflow name, folder, etc.
    
    
    
    
    
    
    3. Create a task in the workflow, with the task type as General-Notebook Exploration. Configure the task's basic attributes on the Create New Task page, including the task name, task type, etc.
    
    
    

    Configure and run a notebook task

    On the Notebook task configuration page, reference a file from a Notebook workspace.
    
    1. Select a Notebook workspace
    The dropdown lists all Notebook workspaces in the current project.
    2. Select a Notebook file
    The dropdown lists all files in the current Notebook workspace. Note: users without permission for a Notebook workspace cannot access it for operations.
    3. Preview code
    After selecting a Notebook file, you can preview its content below.
    4. Run a notebook task
    In the upper right corner, select the scheduling resource group and click Run to run the current Notebook file online. You can view the running logs, running code, and execution results below.

    Configuring scheduling

    1. Click on the right side Scheduling Configuration to set the scheduling cycle for the current Notebook task. For example, the figure below sets it to run every 5 minutes.
    
    
    
    2. Click the Submit button to submit the current task to periodic scheduling.
    
    
    

    Task ops

    1. Go to Data Research and Development > Ops Center.
    
    2. Task Ops
    Click Task Ops to see the workflows submitted to the scheduler and the task nodes within the workflows.
    
    3. Instance Ops
    Click Instance Ops to view each periodic instance generated by the workflow.
    
    4. Enter the instance detail to view the running logs and results.

    Practical Tutorial

    The Notebook workspace includes the following built-in big data tutorials to help you get started quickly.

    Tutorial 1: Data Analysis Using the DLC Jupyter Plugin

    This sample Notebook demonstrates how to analyze data in Data Lake Compute (DLC). The Notebook workspace has the DLC Jupyter Plugin built-in, which can be loaded directly. The example syntax includes running Spark code, SparkSQL code, and using SparkML.
    Note:
    To use this tutorial, the Notebook workspace must be bound to the DLC engine with "Use Machine Learning Resource Group" unchecked. Set the kernel to Python Environment; WeData Notebook will interact with DLC through the Jupyter plugin.
    
    
    

    Tutorial 2: Reading EMR Data for Model Prediction

    1. This sample Notebook demonstrates how to create EMR-Hive Tables and import local data into EMR-Hive Tables. It then reads data from the EMR-Hive Tables and converts it into a pandas DataFrame for data preparation.
    2. After completing data preparation, you can use the Prophet Time Series Algorithm to train a Predictive Model, followed by Model Accuracy Evaluation and prediction.
    Note:
    To use this tutorial, the Notebook workspace must be bound to the EMR engine.
    
    
    
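Prophet expects its training frame to contain exactly two columns, "ds" (datestamp) and "y" (value). The data-preparation step above can be sketched with the stdlib as a reshaping of query results into that layout (the (date, amount) row shape and column meaning are hypothetical examples, not the tutorial's exact schema):

```python
from datetime import date

def to_prophet_rows(rows):
    """Reshape (date, value) tuples, e.g. Hive query results, into Prophet's ds/y layout."""
    return [
        {"ds": d.isoformat() if isinstance(d, date) else str(d), "y": float(v)}
        for d, v in rows
    ]

# Hypothetical daily sales figures standing in for data read from EMR-Hive.
sample = [(date(2024, 1, 1), 120), (date(2024, 1, 2), 135.5)]
prepared = to_prophet_rows(sample)
```

`pandas.DataFrame(prepared)` can then be passed to `prophet.Prophet().fit()` for training.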

    Tutorial 3: Creating and Managing Machine Learning Experiments

    This sample Notebook demonstrates how to use MLflow to create experiments, record data, and manage models. The experiment is based on the Iris dataset, using the KNeighborsClassifier algorithm for model training, and uses MLflow to record and trace experimental data, ultimately producing an optimal model for data classification and prediction.
    Note:
    To use this tutorial, the Notebook workspace must be bound to the DLC engine with "Use Machine Learning Resource Group" checked. Set the kernel to DLC Resource Group; WeData Notebook will remotely submit the Notebook file to DLC for execution.
    MLflow is an open-source machine learning platform that provides end-to-end support for the data science lifecycle, including experiment management, model versioning, model deployment, and model monitoring. If the MLflow service is enabled for the current workspace, you can record each experiment's parameters, metrics, and results by calling MLflow functions in the experiment, and view them under WeData Machine Learning Module > Experiment Management and Model Management, achieving experiment traceability and reproducibility.
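The tutorial's flow might look roughly like the sketch below: train a KNeighborsClassifier on the Iris dataset and, when MLflow is available, report the run's parameter and metric to the tracking service. The experiment name and parameter choice are hypothetical; this is an illustrative sketch, not the tutorial's exact code.

```python
def train_and_log(n_neighbors=3):
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    # Train/test split and KNN fit on the Iris dataset.
    X, y = load_iris(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
    model = KNeighborsClassifier(n_neighbors=n_neighbors).fit(X_tr, y_tr)
    accuracy = model.score(X_te, y_te)

    try:
        # Best-effort MLflow reporting; skipped when the library/service is absent.
        import mlflow
        mlflow.set_experiment("iris-knn-demo")  # hypothetical experiment name
        with mlflow.start_run():
            mlflow.log_param("n_neighbors", n_neighbors)
            mlflow.log_metric("accuracy", accuracy)
    except Exception:
        pass
    return accuracy
```

With the MLflow service enabled, the logged parameter and metric appear under Experiment Management for later comparison across runs.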
    
    
    Contact Us

    Contact our sales team or business advisors for help with your business.

    Technical Support

    Open a ticket if you're looking for further assistance. Our ticket support is available 24/7.

    7x24 Phone Support