Feature Overview
WeData has launched the Notebook Exploration feature, which supports reading data from Tencent Cloud's big data engines EMR and DLC through Jupyter Notebook. With interactive data analysis, users can perform data exploration and machine learning.
Currently, Notebook Exploration is available for beta testing in the Beijing and Shanghai regions. You can submit a ticket to apply for access.
Note:
During the invite-only beta period, you can try the Notebook Exploration feature for free. After the beta ends, commercial billing will apply.
Features
One-click Workspace Creation
No need to manually install Python or configure environment dependencies: a Notebook workspace can be created with one click, including a complete Jupyter Notebook environment and commonly used dependency packages.
User and Resource Isolation
Each user has a dedicated workspace in each project. The storage and computing resources of each workspace are isolated from one another, so users' tasks and file resources do not interfere with each other.
Integration with Big Data Engines
Supports binding to the EMR and DLC big data engines. Data can be read directly from the engines' storage and compute layers for interactive exploration, algorithm model training, and predictive data analysis.
Built-in Best Practices Tutorial
The Notebook workspace comes with an out-of-the-box Big Data Tutorial Series and the Andrew Ng AI Tutorial Series, so users can get started quickly.
Operation Steps
Enter the Notebook Exploration page
1. Log in to the Data Development and Governance Platform WeData Console.
2. Click Project List in the left menu and find the target project for the Notebook Exploration feature.
3. After selecting a project, click to enter the Data Development module.
4. Click Notebook Exploration in the left menu.
Create Workspace
Each sub-user can create an independent Notebook workspace; workspaces do not interfere with one another.
1. Click Create Workspace to enter the workspace configuration page.
2. Configuration Items
Basic Info: configure the basic information of the Notebook workspace to create a workspace instance.

| Configuration Item | Description | Required |
| --- | --- | --- |
| Workspace Name | Notebook workspace name. Supports Chinese, English, numbers, underscores, and hyphens; no more than 32 characters. | Yes |
| Description | Notebook workspace description. Supports Chinese, English, numbers, special characters, etc.; no more than 255 characters. | No |
| Image | Default Jupyter Notebook image. | Yes |
| Engine | Select the EMR or DLC compute and storage engine bound to the current project. Once selected, the workspace is pre-connected to the engine, so Notebook tasks can access it with PySpark. | No |
| Network | When the EMR engine is selected, additional network configuration is needed to establish connectivity; defaults to the VPC and subnet where the EMR engine is located. | Yes |
| Computing Resources | When the DLC engine is selected, computing resources must also be selected for executing DLC PySpark tasks. Note: only DLC Spark job type computing resources are supported. | Yes |
| RoleArn | When the DLC engine is selected, a RoleArn must also be selected to authorize access to the COS data storage. Note: the RoleArn is the data access policy (CAM role ARN) that the DLC engine uses to access COS; configure it in DLC beforehand. | Yes |

Resource Configuration: configure the storage and computing resources for the workspace, used for executing Notebook tasks (backed by CFS).

| Configuration Item | Description | Required |
| --- | --- | --- |
| Specification Selection | Supported specifications: 2 cores, 4 GB memory / 8 GB storage (Trial Version); 4 cores, 8 GB memory / 16 GB storage (Advanced Edition); 8 cores, 16 GB memory / 32 GB storage (Express Version). | Yes |
Launch Workspace
1. Click Create Now to enter the Notebook workspace launch page.
2. During startup, the PySpark environment is configured for you, and common Python packages such as numpy, pandas, and scikit-learn are installed. Installation may take some time; wait for it to complete. A quick environment check you can run afterward is sketched after the note below.
3. When the following page appears, the Notebook workspace has launched successfully and you can begin creating Notebook tasks.
Note:
Only the Python 3.11.1 kernel is supported; do not select other kernel versions.
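Once the workspace is running, you can confirm the environment from a notebook cell. The following is a minimal sanity check; it assumes only what the steps above state (a Python 3.11.1 kernel with numpy, pandas, and scikit-learn preinstalled).

```python
# Minimal sanity check for a freshly launched workspace.
# Assumes only the packages named in the launch steps above.
import sys

import numpy as np
import pandas as pd
import sklearn

print(sys.version)  # expect Python 3.11.1, the only supported kernel
print("numpy:", np.__version__)
print("pandas:", pd.__version__)
print("scikit-learn:", sklearn.__version__)
```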
Workspace Management
Exit Workspace
Click the Exit button in the top left to exit the current workspace and return to the list page. An exited workspace stops automatically after ten minutes. Resuming a stopped workspace restores the development environment and data.
Edit Workspace
Click the Edit button on the list page to modify the current workspace's configuration. Configurable items include: workspace name, description, and resource configuration.
Delete Workspace
Click the Delete button on the list page to delete the current workspace.
Practical Tutorial
The Notebook workspace ships with an out-of-the-box Big Data Tutorial Series and the Andrew Ng AI Tutorial Series to help you get started quickly.
Data Analysis with DLC
This sample Notebook demonstrates how to analyze data in DLC (Data Lake Compute). The Notebook workspace has a built-in DLC Jupyter plugin that can be loaded and used directly; the sample syntax covers running Spark code, SparkSQL code, and using SparkML.
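The exact load and connect syntax for the built-in DLC plugin is shown in the bundled sample Notebook; the sketch below uses plain PySpark instead, to illustrate the same kind of interactive query. The database and table names (demo_db.sales) are hypothetical placeholders.

```python
# Illustrative sketch only: plain PySpark, not the DLC plugin's own syntax.
# demo_db.sales and its columns are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dlc-exploration").getOrCreate()

# Run Spark SQL against a table exposed through the engine.
df = spark.sql(
    "SELECT product, SUM(amount) AS total "
    "FROM demo_db.sales GROUP BY product"
)
df.show()

# Pull results into pandas for local analysis or plotting.
pdf = df.toPandas()
```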
Read EMR Data for Model Prediction
1. This sample Notebook demonstrates how to create EMR Hive tables and import local data into them, then read the data back and convert it into a pandas DataFrame for data preparation.
2. After data preparation, you can train a predictive model with the Prophet time series algorithm, then evaluate model accuracy and generate predictions, as sketched below.
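A minimal sketch of this flow, assuming a Hive table demo_db.daily_sales with a date column dt and a numeric column sales (both hypothetical), and that the prophet package is available in the workspace:

```python
# Sketch of the tutorial's flow: read a Hive table into pandas, then
# train and apply a Prophet model. Table and column names are hypothetical;
# install the prophet package first if it is not already present.
from prophet import Prophet
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Prophet expects a DataFrame with columns ds (timestamp) and y (value).
pdf = spark.sql(
    "SELECT dt AS ds, sales AS y FROM demo_db.daily_sales"
).toPandas()

model = Prophet()
model.fit(pdf)

# Forecast 30 days ahead and inspect the tail of the prediction.
future = model.make_future_dataframe(periods=30)
forecast = model.predict(future)
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())
```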