Copy hadoop-cos-{hadoop.version}-{version}.jar and cos_api-bundle-{version}.jar to $HADOOP_HOME/share/hadoop/tools/lib.
Then go to the $HADOOP_HOME/etc/hadoop directory, edit the hadoop-env.sh file, and add the following lines to put the cosn-related JARs on the Hadoop classpath:

for f in $HADOOP_HOME/share/hadoop/tools/lib/*.jar; do
  if [ "$HADOOP_CLASSPATH" ]; then
    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$f
  else
    export HADOOP_CLASSPATH=$f
  fi
done
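To confirm the classpath was extended as intended, one quick check (a sketch, assuming a POSIX shell) is to source the file and grep the resulting classpath for the cosn JARs:

source $HADOOP_HOME/etc/hadoop/hadoop-env.sh
echo "$HADOOP_CLASSPATH" | tr ':' '\n' | grep -i cos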
Property key | Description | Default value | Required |
fs.cosn.userinfo.secretId/secretKey | The API credentials (SecretId and SecretKey) of your account. | None | Yes |
fs.cosn.credentials.provider | Specifies how the SecretId and SecretKey are obtained. The following providers are currently supported (see the environment-variable example after this table): 1. org.apache.hadoop.fs.auth.SessionCredentialProvider: obtains the secret id and secret key from the request URI, in the form cosn://{secretId}:{secretKey}@examplebucket-1250000000/. 2. org.apache.hadoop.fs.auth.SimpleCredentialProvider: reads fs.cosn.userinfo.secretId and fs.cosn.userinfo.secretKey from the core-site.xml configuration file. 3. org.apache.hadoop.fs.auth.EnvironmentVariableCredentialProvider: reads the system environment variables COS_SECRET_ID and COS_SECRET_KEY. 4. org.apache.hadoop.fs.auth.SessionTokenCredentialProvider: accesses COS with temporary credentials. 5. org.apache.hadoop.fs.auth.CVMInstanceCredentialsProvider: uses the role bound to a Tencent Cloud Virtual Machine (CVM) instance to obtain temporary credentials for accessing COS. 6. org.apache.hadoop.fs.auth.CPMInstanceCredentialsProvider: uses the role bound to a Tencent Cloud Bare Metal (CPM) instance to obtain temporary credentials for accessing COS. 7. org.apache.hadoop.fs.auth.EMRInstanceCredentialsProvider: uses the role bound to a Tencent Cloud EMR instance to obtain temporary credentials for accessing COS. 8. org.apache.hadoop.fs.auth.RangerCredentialsProvider: obtains credentials through Ranger. | If this option is not specified, providers are tried in the following order by default: 1. org.apache.hadoop.fs.auth.SessionCredentialProvider 2. org.apache.hadoop.fs.auth.SimpleCredentialProvider 3. org.apache.hadoop.fs.auth.EnvironmentVariableCredentialProvider 4. org.apache.hadoop.fs.auth.SessionTokenCredentialProvider 5. org.apache.hadoop.fs.auth.CVMInstanceCredentialsProvider 6. org.apache.hadoop.fs.auth.CPMInstanceCredentialsProvider 7. org.apache.hadoop.fs.auth.EMRInstanceCredentialsProvider | No |
fs.cosn.useHttps | Whether to use HTTPS as the transport protocol to the COS backend. | true | No |
fs.cosn.impl | The FileSystem implementation class for cosn, fixed to org.apache.hadoop.fs.CosFileSystem. | None | Yes |
fs.AbstractFileSystem.cosn.impl | The AbstractFileSystem implementation class for cosn, fixed to org.apache.hadoop.fs.CosN. | None | Yes |
fs.cosn.bucket.region | The region where the bucket is located, for example ap-beijing or ap-guangzhou. Compatible with the legacy option fs.cosn.userinfo.region. | None | Yes |
fs.cosn.bucket.endpoint_suffix | Specifies the COS endpoint to connect to. This option is not required; public-cloud COS users only need to fill in the region option above correctly. Compatible with the legacy option fs.cosn.userinfo.endpoint_suffix. If you set this option, delete the fs.cosn.bucket.region option so that the endpoint takes effect. | None | No |
fs.cosn.tmp.dir | An existing local directory; temporary files generated at runtime are placed here. | /tmp/hadoop_cos | No |
fs.cosn.upload.part.size | The block size of the CosN filesystem, which is also the part size used for multipart uploads. Because a COS multipart upload supports at most 10,000 parts, you need to estimate the largest single file you may use. For example, with an 8MB part size the largest single file that can be uploaded is 78GB. The part size can be at most 2GB, i.e. a maximum single file size of 19TB. | 8388608 (8MB) | No |
fs.cosn.upload.buffer | The buffer type the CosN filesystem uses for uploads. Three types are currently supported: non-direct memory buffer (non_direct_memory), direct memory buffer (direct_memory), and mapped disk buffer (mapped_disk). The non-direct memory buffer uses JVM heap memory, the direct memory buffer uses off-heap memory, and the mapped disk buffer is backed by a memory-mapped file. | mapped_disk | No |
fs.cosn.upload.buffer.size | The size of the upload buffer. A value of -1 means the buffer size is unlimited; in that case the buffer type must be mapped_disk. A value greater than 0 must be at least the size of one block. Compatible with the legacy option fs.cosn.buffer.size. | -1 | No |
fs.cosn.block.size | The block size of the CosN filesystem. | 134217728 (128MB) | No |
fs.cosn.upload_thread_pool | The number of threads used for concurrent uploads when streaming files to COS. | 10 | No |
fs.cosn.copy_thread_pool | The number of threads available for concurrently copying and deleting files during a directory copy operation. | 3 | No |
fs.cosn.read.ahead.block.size | The size of a read-ahead block. | 1048576 (1MB) | No |
fs.cosn.read.ahead.queue.size | The length of the read-ahead queue. | 8 | No |
fs.cosn.maxRetries | The maximum number of retries when an error occurs while accessing COS. | 200 | No |
fs.cosn.retry.interval.seconds | The interval, in seconds, between retries. | 3 | No |
fs.cosn.server-side-encryption.algorithm | The COS server-side encryption algorithm. SSE-C and SSE-COS are supported. Empty by default, meaning no encryption. | None | No |
fs.cosn.server-side-encryption.key | When SSE-C server-side encryption is enabled, the SSE-C key must be configured. The key format is a base64-encoded AES-256 key. Empty by default, meaning no encryption. | None | No |
fs.cosn.client-side-encryption.enabled | Whether to enable client-side encryption; disabled by default. When enabled, the client-side encryption public and private keys must be configured, and the append and truncate interfaces cannot be used. | false | No |
fs.cosn.client-side-encryption.public.key.path | The absolute path to the client-side encryption public key file. | None | No |
fs.cosn.client-side-encryption.private.key.path | The absolute path to the client-side encryption private key file. | None | No |
fs.cosn.crc64.checksum.enabled | Whether to enable CRC64 checksums. Disabled by default, in which case the hadoop fs -checksum command cannot be used to get a file's CRC64 checksum. | false | No |
fs.cosn.crc32c.checksum.enabled | Whether to enable CRC32C checksums. Disabled by default, in which case the hadoop fs -checksum command cannot be used to get a file's CRC32C checksum. Only one checksum type can be enabled at a time: crc32c or crc64. | false | No |
fs.cosn.traffic.limit | The upload bandwidth limit, in the range 819200 to 838860800 bits/s. The default value is -1, which means no limit. | None | No |
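As a quick illustration of the EnvironmentVariableCredentialProvider flow referenced in the table above, the following shell sketch exports the two environment variables that provider reads and then issues a listing. This assumes the cosn JARs are already on the Hadoop classpath and that no SecretId/SecretKey is set in core-site.xml, so the default provider order falls through to the environment variables; the bucket name is a placeholder.

export COS_SECRET_ID=AKIDxxxxxxxxxxxxxxxx     # replace with your SecretId
export COS_SECRET_KEY=xxxxxxxxxxxxxxxx        # replace with your SecretKey
hadoop fs -ls cosn://examplebucket-1250000000/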
Open $HADOOP_HOME/etc/hadoop/core-site.xml and add the COS-related user and implementation class information, for example:

<configuration>
    <property>
        <name>fs.cosn.credentials.provider</name>
        <value>org.apache.hadoop.fs.auth.SimpleCredentialProvider</value>
        <description>This option allows the user to specify how to get the credentials.
            Comma-separated class names of credential provider classes which implement
            com.qcloud.cos.auth.COSCredentialsProvider:
            1. org.apache.hadoop.fs.auth.SessionCredentialProvider: Obtain the secret id and secret key
            from the URI: cosn://secretId:secretKey@examplebucket-1250000000/;
            2. org.apache.hadoop.fs.auth.SimpleCredentialProvider: Obtain the secret id and secret key
            from fs.cosn.userinfo.secretId and fs.cosn.userinfo.secretKey in core-site.xml;
            3. org.apache.hadoop.fs.auth.EnvironmentVariableCredentialProvider: Obtain the secret id and secret key
            from system environment variables named COS_SECRET_ID and COS_SECRET_KEY.

            If unspecified, the default order of credential providers is:
            1. org.apache.hadoop.fs.auth.SessionCredentialProvider
            2. org.apache.hadoop.fs.auth.SimpleCredentialProvider
            3. org.apache.hadoop.fs.auth.EnvironmentVariableCredentialProvider
            4. org.apache.hadoop.fs.auth.SessionTokenCredentialProvider
            5. org.apache.hadoop.fs.auth.CVMInstanceCredentialsProvider
            6. org.apache.hadoop.fs.auth.CPMInstanceCredentialsProvider
            7. org.apache.hadoop.fs.auth.EMRInstanceCredentialsProvider
        </description>
    </property>

    <property>
        <name>fs.cosn.userinfo.secretId</name>
        <value>xxxxxxxxxxxxxxxxxxxxxxxxx</value>
        <description>Tencent Cloud Secret Id</description>
    </property>

    <property>
        <name>fs.cosn.userinfo.secretKey</name>
        <value>xxxxxxxxxxxxxxxxxxxxxxxx</value>
        <description>Tencent Cloud Secret Key</description>
    </property>

    <property>
        <name>fs.cosn.bucket.region</name>
        <value>ap-xxx</value>
        <description>The region where the bucket is located.</description>
    </property>

    <property>
        <name>fs.cosn.bucket.endpoint_suffix</name>
        <value>cos.ap-xxx.myqcloud.com</value>
        <description>COS endpoint to connect to.
            For public cloud users, it is recommended not to set this option, and only the correct region field is required.
        </description>
    </property>

    <property>
        <name>fs.cosn.impl</name>
        <value>org.apache.hadoop.fs.CosFileSystem</value>
        <description>The implementation class of the CosN Filesystem.</description>
    </property>

    <property>
        <name>fs.AbstractFileSystem.cosn.impl</name>
        <value>org.apache.hadoop.fs.CosN</value>
        <description>The implementation class of the CosN AbstractFileSystem.</description>
    </property>

    <property>
        <name>fs.cosn.tmp.dir</name>
        <value>/tmp/hadoop_cos</value>
        <description>Temporary files will be placed here.</description>
    </property>

    <property>
        <name>fs.cosn.upload.buffer</name>
        <value>mapped_disk</value>
        <description>The type of upload buffer. Available values: non_direct_memory, direct_memory, mapped_disk</description>
    </property>

    <property>
        <name>fs.cosn.upload.buffer.size</name>
        <value>134217728</value>
        <description>The total size of the upload buffer pool. -1 means unlimited.</description>
    </property>

    <property>
        <name>fs.cosn.upload.part.size</name>
        <value>8388608</value>
        <description>Block size to use in the cosn filesystem, which is the part size for MultipartUpload.
            Considering that COS supports up to 10000 blocks, the user should estimate the maximum size of a single file.
            For example, an 8MB part size can allow writing a 78GB single file.
        </description>
    </property>

    <property>
        <name>fs.cosn.maxRetries</name>
        <value>3</value>
        <description>The maximum number of retries for reading or writing files to COS,
            before we signal failure to the application.
        </description>
    </property>

    <property>
        <name>fs.cosn.retry.interval.seconds</name>
        <value>3</value>
        <description>The number of seconds to sleep between each COS retry.</description>
    </property>

    <property>
        <name>fs.cosn.server-side-encryption.algorithm</name>
        <value></value>
        <description>The server side encryption algorithm.</description>
    </property>

    <property>
        <name>fs.cosn.server-side-encryption.key</name>
        <value></value>
        <description>The SSE-C server side encryption key.</description>
    </property>

    <property>
        <name>fs.cosn.client-side-encryption.enabled</name>
        <value></value>
        <description>Enable or disable the client encryption function</description>
    </property>

    <property>
        <name>fs.cosn.client-side-encryption.public.key.path</name>
        <value>/xxx/xxx.key</value>
        <description>The absolute path to the public key</description>
    </property>

    <property>
        <name>fs.cosn.client-side-encryption.private.key.path</name>
        <value>/xxx/xxx.key</value>
        <description>The absolute path to the private key</description>
    </property>
</configuration>
fs.defaultFS can also be pointed at a cosn bucket, although, as the description notes, this is only intended for special test cases:

<property>
    <name>fs.defaultFS</name>
    <value>cosn://examplebucket-1250000000</value>
    <description>Configuring this option is not advised; it is only used for some special test cases.</description>
</property>
To enable SSE-COS encryption, add the following configuration to the $HADOOP_HOME/etc/hadoop/core-site.xml file:

<property>
    <name>fs.cosn.server-side-encryption.algorithm</name>
    <value>SSE-COS</value>
    <description>The server side encryption algorithm.</description>
</property>
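Once SSE-COS is enabled, encryption is transparent to the client: reads and writes use the same commands as before. A minimal smoke test, reusing the example bucket name from elsewhere in this document:

hadoop fs -put ./LICENSE cosn://examplebucket-1250000000/sse-cos-test/
hadoop fs -cat cosn://examplebucket-1250000000/sse-cos-test/LICENSE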
To enable SSE-C encryption, add the following configuration to the $HADOOP_HOME/etc/hadoop/core-site.xml file:

<property>
    <name>fs.cosn.server-side-encryption.algorithm</name>
    <value>SSE-C</value>
    <description>The server side encryption algorithm.</description>
</property>
<property>
    <name>fs.cosn.server-side-encryption.key</name>
    <!-- You must supply your own SSE-C key; the key format is a base64-encoded AES-256 key. -->
    <value>MDEyMzQ1Njc4OUFCQ0RFRjAxMjM0NTY3ODlBQkNERUY=</value>
    <description>The SSE-C server side encryption key.</description>
</property>
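One way to produce such a key, sketched here assuming standard coreutils are available, is to base64-encode 32 random bytes:

head -c 32 /dev/urandom | base64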
Note that with client-side encryption enabled, the hadoop fs -cp command will lose the encryption information.
To enable client-side encryption, add the following configuration to the $HADOOP_HOME/etc/hadoop/core-site.xml file:

<property>
    <name>fs.cosn.client-side-encryption.enabled</name>
    <value>true</value>
    <description>Enable or disable the client encryption function</description>
</property>
<property>
    <name>fs.cosn.client-side-encryption.public.key.path</name>
    <value>/xxx/xxx.key</value>
    <description>The absolute path to the public key</description>
</property>
<property>
    <name>fs.cosn.client-side-encryption.private.key.path</name>
    <value>/xxx/xxx.key</value>
    <description>The absolute path to the private key</description>
</property>
The public and private key files can be generated with the following example code:

import java.io.FileOutputStream;
import java.io.IOException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.NoSuchAlgorithmException;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.SecureRandom;
import java.security.spec.PKCS8EncodedKeySpec;
import java.security.spec.X509EncodedKeySpec;

// Use an asymmetric RSA key to encrypt the random symmetric key generated for each upload.
public class BuildKey {
    private static final SecureRandom srand = new SecureRandom();

    private static void buildAndSaveAsymKeyPair(String pubKeyPath, String priKeyPath)
            throws IOException, NoSuchAlgorithmException {
        KeyPairGenerator keyGenerator = KeyPairGenerator.getInstance("RSA");
        keyGenerator.initialize(1024, srand);
        KeyPair keyPair = keyGenerator.generateKeyPair();
        PrivateKey privateKey = keyPair.getPrivate();
        PublicKey publicKey = keyPair.getPublic();

        // Save the public key in X.509 encoding.
        X509EncodedKeySpec x509EncodedKeySpec = new X509EncodedKeySpec(publicKey.getEncoded());
        FileOutputStream fos = new FileOutputStream(pubKeyPath);
        fos.write(x509EncodedKeySpec.getEncoded());
        fos.close();

        // Save the private key in PKCS#8 encoding.
        PKCS8EncodedKeySpec pkcs8EncodedKeySpec = new PKCS8EncodedKeySpec(privateKey.getEncoded());
        fos = new FileOutputStream(priKeyPath);
        fos.write(pkcs8EncodedKeySpec.getEncoded());
        fos.close();
    }

    public static void main(String[] args) throws Exception {
        String pubKeyPath = "pub.key";
        String priKeyPath = "pri.key";
        buildAndSaveAsymKeyPair(pubKeyPath, priKeyPath);
    }
}
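A hypothetical compile-and-run sequence for the generator above (file and class names as in the listing):

javac BuildKey.java
java BuildKey
# pub.key and pri.key are written to the working directory; point
# fs.cosn.client-side-encryption.public.key.path / private.key.path at their absolute paths.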
Run hadoop fs -ls -R cosn://<BucketName-APPID>/<path>, or hadoop fs -ls -R /<path> (the latter requires setting the fs.defaultFS option to cosn://BucketName-APPID). The example below uses a bucket named examplebucket-1250000000; a specific path can be appended after it.

hadoop fs -ls -R cosn://examplebucket-1250000000/
-rw-rw-rw-   1 root root       1087 2018-06-11 07:49 cosn://examplebucket-1250000000/LICENSE
drwxrwxrwx   - root root          0 1970-01-01 00:00 cosn://examplebucket-1250000000/hdfs
drwxrwxrwx   - root root          0 1970-01-01 00:00 cosn://examplebucket-1250000000/hdfs/2018
-rw-rw-rw-   1 root root       1087 2018-06-12 03:26 cosn://examplebucket-1250000000/hdfs/2018/LICENSE
-rw-rw-rw-   1 root root       2386 2018-06-12 03:26 cosn://examplebucket-1250000000/hdfs/2018/ReadMe
drwxrwxrwx   - root root          0 1970-01-01 00:00 cosn://examplebucket-1250000000/hdfs/test
-rw-rw-rw-   1 root root       1087 2018-06-11 07:32 cosn://examplebucket-1250000000/hdfs/test/LICENSE
-rw-rw-rw-   1 root root       2386 2018-06-11 07:29 cosn://examplebucket-1250000000/hdfs/test/ReadMe
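Other standard hadoop fs subcommands work against cosn paths in the same way; a few illustrative examples (paths are placeholders):

hadoop fs -put ./LICENSE cosn://examplebucket-1250000000/hdfs/
hadoop fs -get cosn://examplebucket-1250000000/hdfs/LICENSE ./LICENSE.bak
hadoop fs -rm -r cosn://examplebucket-1250000000/hdfs/test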
Run the MapReduce wordcount example that ships with Hadoop; on success, counters like the following are printed:

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount cosn://example/mr/input cosn://example/mr/output3
File System Counters
        COSN: Number of bytes read=72
        COSN: Number of bytes written=40
        COSN: Number of read operations=0
        COSN: Number of large read operations=0
        COSN: Number of write operations=0
        FILE: Number of bytes read=547350
        FILE: Number of bytes written=1155616
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=0
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=0
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=0
Map-Reduce Framework
        Map input records=5
        Map output records=7
        Map output bytes=59
        Map output materialized bytes=70
        Input split bytes=99
        Combine input records=7
        Combine output records=6
        Reduce input groups=6
        Reduce shuffle bytes=70
        Reduce input records=6
        Reduce output records=6
        Spilled Records=12
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=0
        Total committed heap usage (bytes)=653262848
Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
File Input Format Counters
        Bytes Read=36
File Output Format Counters
        Bytes Written=40
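To inspect the job output, standard MapReduce part-file naming applies (a sketch; the exact part file name depends on the reducer count):

hadoop fs -ls cosn://example/mr/output3
hadoop fs -cat cosn://example/mr/output3/part-r-00000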
The following Java example demonstrates basic operations (create, read, copy, rename, delete) against COSN:

package com.qcloud.chdfs.demo;

import org.apache.commons.io.IOUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.IOException;
import java.net.URI;
import java.nio.ByteBuffer;

public class Demo {
    private static FileSystem initFS() throws IOException {
        Configuration conf = new Configuration();
        // For COSN configuration options, see https://www.tencentcloud.com/document/product/436/6884?from_cn_redirect=1#hadoop-.E9.85.8D.E7.BD.AE
        // The following options are required
        conf.set("fs.cosn.impl", "org.apache.hadoop.fs.CosFileSystem");
        conf.set("fs.AbstractFileSystem.cosn.impl", "org.apache.hadoop.fs.CosN");
        conf.set("fs.cosn.tmp.dir", "/tmp/hadoop_cos");
        conf.set("fs.cosn.bucket.region", "ap-guangzhou");
        conf.set("fs.cosn.userinfo.secretId", "AKXXXXXXXXXXXXXXXXX");
        conf.set("fs.cosn.userinfo.secretKey", "XXXXXXXXXXXXXXXXXX");
        conf.set("fs.ofs.user.appid", "XXXXXXXXXXX");
        // For other options, see https://www.tencentcloud.com/document/product/436/6884?from_cn_redirect=1#hadoop-.E9.85.8D.E7.BD.AE
        // Whether to enable CRC64 checksums. Disabled by default, in which case
        // hadoop fs -checksum cannot be used to get the CRC64 checksum of a file
        conf.set("fs.cosn.crc64.checksum.enabled", "true");
        String cosnUrl = "cosn://f4mxxxxxxxx-125xxxxxxx";
        return FileSystem.get(URI.create(cosnUrl), conf);
    }

    private static void mkdir(FileSystem fs, Path filePath) throws IOException {
        fs.mkdirs(filePath);
    }

    private static void createFile(FileSystem fs, Path filePath) throws IOException {
        // Create a file (overwrite it if it already exists)
        // if the parent dir does not exist, fs will create it!
        FSDataOutputStream out = fs.create(filePath, true);
        try {
            // Write to the file
            String content = "test write file";
            out.write(content.getBytes());
        } finally {
            IOUtils.closeQuietly(out);
        }
    }

    private static void readFile(FileSystem fs, Path filePath) throws IOException {
        FSDataInputStream in = fs.open(filePath);
        try {
            byte[] buf = new byte[4096];
            int readLen = -1;
            do {
                readLen = in.read(buf);
            } while (readLen >= 0);
        } finally {
            IOUtils.closeQuietly(in);
        }
    }

    private static void queryFileOrDirStatus(FileSystem fs, Path path) throws IOException {
        FileStatus fileStatus = fs.getFileStatus(path);
        if (fileStatus.isDirectory()) {
            System.out.printf("path %s is dir\n", path);
            return;
        }
        long fileLen = fileStatus.getLen();
        long accessTime = fileStatus.getAccessTime();
        long modifyTime = fileStatus.getModificationTime();
        String owner = fileStatus.getOwner();
        String group = fileStatus.getGroup();
        System.out.printf("path %s is file, fileLen: %d, accessTime: %d, modifyTime: %d, owner: %s, group: %s\n",
                path, fileLen, accessTime, modifyTime, owner, group);
    }

    private static void getFileCheckSum(FileSystem fs, Path path) throws IOException {
        FileChecksum checksum = fs.getFileChecksum(path);
        System.out.printf("path %s, checkSumType: %s, checkSumCrcVal: %d\n",
                path, checksum.getAlgorithmName(), ByteBuffer.wrap(checksum.getBytes()).getInt());
    }

    private static void copyFileFromLocal(FileSystem fs, Path cosnPath, Path localPath) throws IOException {
        fs.copyFromLocalFile(localPath, cosnPath);
    }

    private static void copyFileToLocal(FileSystem fs, Path cosnPath, Path localPath) throws IOException {
        fs.copyToLocalFile(cosnPath, localPath);
    }

    private static void renamePath(FileSystem fs, Path oldPath, Path newPath) throws IOException {
        fs.rename(oldPath, newPath);
    }

    private static void listDirPath(FileSystem fs, Path dirPath) throws IOException {
        FileStatus[] dirMemberArray = fs.listStatus(dirPath);
        for (FileStatus dirMember : dirMemberArray) {
            System.out.printf("dirMember path %s, fileLen: %d\n", dirMember.getPath(), dirMember.getLen());
        }
    }

    // The recursive flag is used when deleting a directory:
    // if recursive is false and the dir is not empty, the operation will fail
    private static void deleteFileOrDir(FileSystem fs, Path path, boolean recursive) throws IOException {
        fs.delete(path, recursive);
    }

    private static void closeFileSystem(FileSystem fs) throws IOException {
        fs.close();
    }

    public static void main(String[] args) throws IOException {
        // Initialize the filesystem
        FileSystem fs = initFS();
        // Create a file
        Path cosnFilePath = new Path("/folder/exampleobject.txt");
        createFile(fs, cosnFilePath);
        // Read the file
        readFile(fs, cosnFilePath);
        // Query the status of a file or directory
        queryFileOrDirStatus(fs, cosnFilePath);
        // Get the file checksum
        getFileCheckSum(fs, cosnFilePath);
        // Copy a file from local
        Path localFilePath = new Path("file:///home/hadoop/ofs_demo/data/exampleobject.txt");
        copyFileFromLocal(fs, cosnFilePath, localFilePath);
        // Download the file to local
        Path localDownFilePath = new Path("file:///home/hadoop/ofs_demo/data/exampleobject.txt");
        copyFileToLocal(fs, cosnFilePath, localDownFilePath);
        listDirPath(fs, cosnFilePath);
        // Rename
        mkdir(fs, new Path("/doc"));
        Path newPath = new Path("/doc/example.txt");
        renamePath(fs, cosnFilePath, newPath);
        // Delete a file
        deleteFileOrDir(fs, newPath, false);
        // Create a directory
        Path dirPath = new Path("/folder");
        mkdir(fs, dirPath);
        // Create a file in the directory
        Path subFilePath = new Path("/folder/exampleobject.txt");
        createFile(fs, subFilePath);
        // List the directory
        listDirPath(fs, dirPath);
        // Delete the directory
        deleteFileOrDir(fs, dirPath, true);
        deleteFileOrDir(fs, new Path("/doc"), true);
        // Close the filesystem
        closeFileSystem(fs);
    }
}
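A hedged sketch of how the demo might be compiled and run, assuming the hadoop CLI is installed and the cosn JARs are on the classpath configured earlier:

mkdir -p target/classes
javac -cp "$(hadoop classpath)" -d target/classes Demo.java
java -cp "target/classes:$(hadoop classpath)" com.qcloud.chdfs.demo.Demo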