Cross-Source Analysis of EMR Hive Data

Last updated: 2024-07-17 15:27:21

Data Lake Compute (DLC) lets you configure an EMR Hive data source for federated analysis across multiple data sources.

Preparations
Get the EMR Hive address.
Use an account that has permission to create data catalogs. For more information on permissions, see Permission Overview.

Creating an EMR Hive data source
1. Log in to the Data Lake Compute console and select the service region.
2. Select Data Explore on the left sidebar, click + in the Database & table column, and select Create data catalog.
3. Select EMR Hive (HDFS) for Connection type and select the target EMR instance. The VPC information is populated automatically after you select the instance. The supported EMR Hive versions are 2.3.5, 2.3.7, 3.1.1, and 3.1.2.
Note:
You need the relevant permissions to select the EMR Hive instance.
4. Select the Run cluster. Currently, only a private Presto data engine can be selected. If you have no such engine, create one on the Data engine page. For more information on the purchase process, see Purchasing Private Data Engine.
Note:
The IP range of the selected data engine cannot be the same as that of the EMR instance; otherwise, a network conflict will occur and you will be unable to query or analyze data.
5. Click Confirm.
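
The two constraints mentioned in the steps above (the supported version list in step 3 and the IP range note in step 4) can be checked before you create the data catalog. The following is a minimal sketch, not part of the DLC console or API: the function name `check_emr_hive_source` and the sample CIDR blocks are hypothetical, and it assumes that any overlap between the two ranges counts as a conflict, which is stricter than the "cannot be the same" wording in the note.

```python
import ipaddress
from typing import List

# Versions listed in step 3 of this guide.
SUPPORTED_VERSIONS = {"2.3.5", "2.3.7", "3.1.1", "3.1.2"}


def check_emr_hive_source(version: str, engine_cidr: str, emr_cidr: str) -> List[str]:
    """Return a list of problems found; an empty list means both checks passed."""
    problems = []

    if version not in SUPPORTED_VERSIONS:
        problems.append(f"unsupported version: {version}")

    engine_net = ipaddress.ip_network(engine_cidr)
    emr_net = ipaddress.ip_network(emr_cidr)
    # Assumption: overlapping ranges are treated as a conflict, not only
    # identical ranges.
    if engine_net.overlaps(emr_net):
        problems.append(
            f"data engine range {engine_net} conflicts with EMR range {emr_net}"
        )

    return problems


if __name__ == "__main__":
    # Hypothetical ranges: the first pair is disjoint, the second overlaps.
    print(check_emr_hive_source("3.1.2", "10.255.0.0/16", "172.16.0.0/16"))
    print(check_emr_hive_source("3.1.2", "10.0.0.0/16", "10.0.1.0/24"))
```

An empty list means neither constraint is violated; otherwise each entry describes one problem to fix before clicking Confirm.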

Querying the EMR Hive data
After the data catalog is created, you can switch to it from the Data catalog menu on the Data Explore page.

You can now query and analyze the data catalog with SQL statements. Select the data engine that was bound when the data catalog was created, then click Run to get the query result.
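
As an illustration of the query step above, the sketch below builds a simple preview statement. It assumes the Presto-style three-part naming `catalog.database.table` applies to the new data catalog; the helper `build_preview_query` and the names `emr_hive_catalog`, `sales_db`, and `orders` are hypothetical placeholders, so substitute the names shown in your own console.

```python
import re

# Plain identifiers only: letters, digits, and underscores.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_preview_query(catalog: str, database: str, table: str, limit: int = 10) -> str:
    """Build a SELECT statement that previews rows from catalog.database.table."""
    for name in (catalog, database, table):
        # Reject anything that is not a plain identifier so that user input
        # cannot inject extra SQL.
        if not _IDENTIFIER.match(name):
            raise ValueError(f"invalid identifier: {name!r}")
    if limit <= 0:
        raise ValueError("limit must be positive")
    return f"SELECT * FROM {catalog}.{database}.{table} LIMIT {int(limit)}"


if __name__ == "__main__":
    # All three names are hypothetical placeholders.
    print(build_preview_query("emr_hive_catalog", "sales_db", "orders"))
```

The generated statement can be pasted into the query editor on the Data Explore page. A small LIMIT keeps the first run cheap while you confirm that the connection works.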
Note:
You can only query the data catalog with its bound data engine. To change the bound engine, click the settings icon next to the data catalog.