You can use `create ns` to mount specified file directories of COS and CHDFS to GooseFS, and then use the `gfs://` unified schema to access data. Details are as follows:

- … `gfs://`, and the files are cached in the local file system of GooseFS.
- … `gfs://` (for example, `hadoop fs -ls gfs://BU_A`), and they can also be accessed through the namespace of each remote file system (for example, `hadoop fs -ls cosn://bucket-1/BU_A`).
- … `gfs://` because the files are not cached in the local file system of GooseFS, but they can still be accessed through the namespace of the underlying storage systems.

You can use the `create ns` instruction to create a namespace in GooseFS and map underlying storage systems to GooseFS. Currently supported underlying storage systems include COS, CHDFS, and local HDFS. The procedure for creating a namespace is similar to mounting a disk volume in a Linux file system. With the namespace created, GooseFS can provide clients with a file system with uniform access semantics. The current operation instruction set for GooseFS namespaces is as follows:

    $ goosefs ns
    Usage: goosefs ns [generic options]
        [create <namespace> <CosN/Chdfs path> <--wPolicy <1-6>> <--rPolicy <1-5>> [--readonly] [--shared] [--secret fs.cosn.userinfo.secretId=<AKIDxxxxxxx>] [--secret fs.cosn.userinfo.secretKey=<xxxxxxxxxx>] [--attribute fs.ofs.userinfo.appid=1200000000] [--attribute fs.cosn.bucket.region=<ap-xxx>/fs.cosn.bucket.endpoint_suffix=<cos.ap-xxx.myqcloud.com>]]
        [delete <namespace>]
        [help [<command>]]
        [ls [-r|--sort=option|--timestamp=option]]
        [setPolicy [--wPolicy <1-6>] [--rPolicy <1-5>] <namespace>]
        [setTtl [--action delete|free] <namespace> <time to live>]
        [stat <namespace>]
        [unsetPolicy <namespace>]
        [unsetTtl <namespace>]

Instruction | Description |
create | Creates a namespace and maps a remote storage system (UFS) to the namespace. When creating the namespace, you can set cache read and write policies. You need to pass in an authorized key (`secretId` and `secretKey`). |
delete | Deletes a specified namespace. |
ls | Lists the detailed information of a specified namespace, including the UFS path, creation time, cache policy, and TTL information. |
setPolicy | Sets the cache policy of a specified namespace. |
setTtl | Sets TTL for a specified namespace. |
stat | Provides the description of a specified namespace, including the mount point, UFS path, creation time, cache policy, TTL information, persistence status, user group, ACL, last access time, and modification time. |
unsetPolicy | Resets the cache policy of a specified namespace. |
unsetTtl | Resets the TTL of a specified namespace. |

The following example maps the COS bucket `example-bucket`, the `example-prefix` directory in the bucket, and the CHDFS to the `test_cos`, `test_cos_prefix`, and `test_chdfs` namespaces respectively.

    # Map the COS bucket `example-bucket` to the `test_cos` namespace
    $ goosefs ns create test_cos cosn://example-bucket-1250000000/ --wPolicy 1 --rPolicy 1 --secret fs.cosn.userinfo.secretId=AKIDxxxxxxx --secret fs.cosn.userinfo.secretKey=xxxxxxxxxx --attribute fs.cosn.bucket.region=ap-guangzhou --attribute fs.cosn.bucket.endpoint_suffix=cos.ap-guangzhou.myqcloud.com

    # Map the `example-prefix` directory in the COS bucket `example-bucket` to the `test_cos_prefix` namespace
    $ goosefs ns create test_cos_prefix cosn://example-bucket-1250000000/example-prefix/ --wPolicy 1 --rPolicy 1 --secret fs.cosn.userinfo.secretId=AKIDxxxxxxx --secret fs.cosn.userinfo.secretKey=xxxxxxxxxx --attribute fs.cosn.bucket.region=ap-guangzhou --attribute fs.cosn.bucket.endpoint_suffix=cos.ap-guangzhou.myqcloud.com

    # Map the CHDFS `f4ma0l3qabc-Xy3` to the `test_chdfs` namespace
    $ goosefs ns create test_chdfs ofs://f4ma0l3qabc-Xy3/ --wPolicy 1 --rPolicy 1 --attribute fs.ofs.userinfo.appid=1250000000

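The COS examples above repeat the same flags, so a small dry-run helper can make them easier to review before running. The sketch below only prints the command; the function name `build_ns_create` and the environment variables `COS_SECRET_ID` and `COS_SECRET_KEY` are hypothetical, and the endpoint suffix is assumed to follow the `cos.<region>.myqcloud.com` pattern shown in the usage text.

```shell
# Dry-run sketch: prints a `goosefs ns create` command for a COS path
# instead of executing it. Names here are illustrative, not part of GooseFS.
build_ns_create() {
    ns="$1"      # namespace name, e.g. test_cos
    ufs="$2"     # COS path, e.g. cosn://example-bucket-1250000000/
    region="$3"  # bucket region, e.g. ap-guangzhou
    printf 'goosefs ns create %s %s --wPolicy 1 --rPolicy 1' "$ns" "$ufs"
    # Fall back to the placeholder keys used in the examples above.
    printf ' --secret fs.cosn.userinfo.secretId=%s' "${COS_SECRET_ID:-AKIDxxxxxxx}"
    printf ' --secret fs.cosn.userinfo.secretKey=%s' "${COS_SECRET_KEY:-xxxxxxxxxx}"
    printf ' --attribute fs.cosn.bucket.region=%s' "$region"
    printf ' --attribute fs.cosn.bucket.endpoint_suffix=cos.%s.myqcloud.com\n' "$region"
}

build_ns_create test_cos cosn://example-bucket-1250000000/ ap-guangzhou
```

Printing rather than executing keeps the helper safe to run on a machine without GooseFS installed; pipe the output to `sh` only after checking it.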

Use the `goosefs fs ls` instruction to view the directory details:

    $ goosefs fs ls /test_cos

Use the `delete` instruction to delete unwanted namespaces:

    $ goosefs ns delete test_cos
    Delete the namespace: test_cos

Use `setPolicy` and `unsetPolicy` to set the cache policy of a namespace. The instruction set is as follows:

    $ goosefs ns setPolicy [--wPolicy <1-6>] [--rPolicy <1-5>] <namespace>

Policy Name | Behavior | Corresponding Write_Type | Data Security | Write Efficiency |
MUST_CACHE (1) | Data is stored only in GooseFS and is not written to the remote storage system. | MUST_CACHE | Unreliable | High |
TRY_CACHE (2) | If the cache has space, data is written to GooseFS. Otherwise, data is written directly to underlying storage systems. | TRY_CACHE | Unreliable | Medium |
CACHE_THROUGH (3) | Data is cached as much as possible and simultaneously written to remote storage systems. | CACHE_THROUGH | Reliable | Low |
THROUGH (4) | Data is not stored in GooseFS, but written directly to the remote storage system. | THROUGH | Reliable | Medium |
ASYNC_THROUGH (5) | Data is written to GooseFS and asynchronously persisted to remote storage systems. | ASYNC_THROUGH | Weak reliability | High |
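
Because `--wPolicy` takes a number, a lookup from policy name to number can make scripts easier to read. The following is a minimal sketch based only on the five rows of the table above; the usage text allows values 1-6, but the sixth policy is not described here, so it is left out. The function name `wpolicy_id` is hypothetical.

```shell
# Map a write policy name from the table above to its --wPolicy number.
wpolicy_id() {
    case "$1" in
        MUST_CACHE)    echo 1 ;;
        TRY_CACHE)     echo 2 ;;
        CACHE_THROUGH) echo 3 ;;
        THROUGH)       echo 4 ;;
        ASYNC_THROUGH) echo 5 ;;
        *) echo "unknown write policy: $1" >&2; return 1 ;;
    esac
}

# Print (not run) a setPolicy command that uses the looked-up number.
echo "goosefs ns setPolicy --wPolicy $(wpolicy_id CACHE_THROUGH) test_cos"
```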

`Write_Type` indicates the file cache policy specified when the user calls the SDK or API to write data to GooseFS. It takes effect only for a single file.

… `MUST_CACHE` to `CACHE_THROUGH`, if the `persist` command is not called to persist the data, the data that is about to be evicted from the cache cannot be written to the underlying storage and will be lost.

Policy Name | Behavior | Metadata Sync | Corresponding Read_Type | Data Consistency | Read Efficiency | Whether to Cache Data |
NO_CACHE (1) | Data is not cached and is directly read from remote storage systems instead. | NO | NO_CACHE | Strong | Low | No |
CACHE (2) | Metadata access behavior: on a cache hit, the metadata in the master is used. Metadata is not proactively synchronized from the underlying layer. Data stream access behavior: the data stream Read_Type is CACHE. | Once | CACHE | Weak | Hit: high; not hit: low | Yes |
CACHE_PROMOTE (3) | Metadata access behavior: same as CACHE. Data stream access behavior: the data stream Read_Type is CACHE_PROMOTE. | Once | CACHE_PROMOTE | Weak | Hit: high; not hit: low | Yes |
CACHE_CONSISTENT_PROMOTE (4) | Metadata access behavior: sync the metadata in the remote storage system (UFS) each time before a read operation. If the data does not exist in the UFS, report the Not Exists exception. Data stream access behavior: the data stream Read_Type is CACHE_PROMOTE. On a cache hit, data is moved to the hottest cache medium. | Always | CACHE_PROMOTE | Strong | Hit: medium; not hit: low | Yes |
CACHE_CONSISTENT (5) | Metadata access behavior: same as CACHE_CONSISTENT_PROMOTE. Data stream access behavior: the data stream Read_Type is CACHE. That is, on a cache hit, data is not moved between different media layers. | Always | CACHE | Strong | Hit: medium; not hit: low | Yes |
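
Each read policy pairs a metadata sync mode with a data stream `Read_Type`. The sketch below restates that pairing as a lookup, following the Behavior and Metadata Sync columns of the table above. The function name `rpolicy_info` and the output format are illustrative assumptions.

```shell
# Describe a --rPolicy number: policy name, metadata sync mode,
# and the Read_Type used for the data stream.
rpolicy_info() {
    case "$1" in
        1) echo "NO_CACHE sync=NO stream=NO_CACHE" ;;
        2) echo "CACHE sync=Once stream=CACHE" ;;
        3) echo "CACHE_PROMOTE sync=Once stream=CACHE_PROMOTE" ;;
        4) echo "CACHE_CONSISTENT_PROMOTE sync=Always stream=CACHE_PROMOTE" ;;
        5) echo "CACHE_CONSISTENT sync=Always stream=CACHE" ;;
        *) echo "unknown read policy: $1" >&2; return 1 ;;
    esac
}

rpolicy_info 5
```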

`Read_Type` indicates the file cache policy specified when the user calls the SDK or API to read data from GooseFS. It takes effect only for a single file.

Write Cache Policy | Read Cache Policy | Policy Combination Performance |
CACHE_THROUGH (3) | CACHE_CONSISTENT (5) | Strong data consistency between the cache and remote storage systems |
CACHE_THROUGH (3) | CACHE (2) | Write: strong consistency; read: eventual consistency |
ASYNC_THROUGH (5) | CACHE_CONSISTENT (5) | Write: eventual consistency; read: strong consistency |
ASYNC_THROUGH (5) | CACHE (2) | Read/Write: eventual consistency |
MUST_CACHE (1) | CACHE (2) | Data is read from the cache only. |
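
The combinations in the table above can be captured as named profiles that expand to a `setPolicy` command. This is a sketch only: the profile names (`strong`, `strong-write`, `strong-read`, `eventual`, `cache-only`) and the function name `policy_cmd` are hypothetical labels for the five rows, and the command is printed rather than executed.

```shell
# Print the setPolicy command for one of the policy combinations above.
policy_cmd() {
    ns="$1"
    profile="$2"
    case "$profile" in
        strong)       w=3; r=5 ;;  # CACHE_THROUGH + CACHE_CONSISTENT
        strong-write) w=3; r=2 ;;  # CACHE_THROUGH + CACHE
        strong-read)  w=5; r=5 ;;  # ASYNC_THROUGH + CACHE_CONSISTENT
        eventual)     w=5; r=2 ;;  # ASYNC_THROUGH + CACHE
        cache-only)   w=1; r=2 ;;  # MUST_CACHE + CACHE
        *) echo "unknown profile: $profile" >&2; return 1 ;;
    esac
    echo "goosefs ns setPolicy --wPolicy $w --rPolicy $r $ns"
}

policy_cmd test_cos strong
```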

The following example sets the write and read cache policies of the `test_cos` namespace to `CACHE_THROUGH` and `CACHE_CONSISTENT` respectively:

    $ goosefs ns setPolicy --wPolicy 3 --rPolicy 5 test_cos

You can also set cache policies by specifying `Read_Type` or `Write_Type` for specific files when reading or writing files, or by using the `Properties` configuration file. If multiple policies exist at the same time, their priority order is as follows: custom policy > namespace read and write policies > global cache policy configured in the configuration file. For the read policy, the combination of the custom `Read_Type` and the namespace's `DirReadPolicy` takes effect. That is, the custom `Read_Type` is used as the data stream read policy, and the namespace policy is used for metadata.

For example, suppose the read policy of a namespace is `CACHE_CONSISTENT` and the namespace contains a `test.txt` file. When the client reads the `test.txt` file, `Read_Type` is specified as `CACHE_PROMOTE`. Then the entire read behavior is to sync metadata and perform `CACHE_PROMOTE`.

To reset the cache policy, use the `unsetPolicy` instruction. The following shows how to reset the read and write cache policies for the `test_cos` namespace:

    $ goosefs ns unsetPolicy test_cos

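The priority order described above (custom setting, then namespace policy, then the global configuration) amounts to "the first one that is set wins". The sketch below illustrates that rule for a single policy value. The function name `effective_policy` is hypothetical, an empty string stands for "not set", and the global value used in the call is only an example.

```shell
# Resolve the policy that takes effect: custom > namespace > global.
# An empty argument means that level has no policy configured.
effective_policy() {
    custom="$1"
    namespace="$2"
    global="$3"
    if [ -n "$custom" ]; then
        echo "$custom"
    elif [ -n "$namespace" ]; then
        echo "$namespace"
    else
        echo "$global"
    fi
}

# No per-file Write_Type given, so the namespace policy applies.
effective_policy "" CACHE_THROUGH MUST_CACHE
```

For reads, keep in mind that this rule covers only the data stream; metadata handling still follows the namespace policy.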

TTL specifies an action, `delete` or `free`, to be performed on the cached data after a specified period of time. The instruction for setting TTL is as follows:

    $ goosefs ns setTtl [--action delete|free] <namespace> <time to live>

The `delete` and `free` actions are supported. The `delete` operation deletes data from the cache and UFS, while the `free` operation deletes data only from the cache.

The following example sets the TTL of the `test_cos` namespace to delete data only from the cache after 60 seconds:

    $ goosefs ns setTtl --action free test_cos 60000

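In the example above, 60 seconds is written as `60000`, which suggests the time to live is given in milliseconds. Under that assumption, a small helper can do the conversion and reject actions other than `delete` and `free`. The function name `ttl_cmd` is hypothetical, and the command is printed rather than run.

```shell
# Print a setTtl command, converting seconds to milliseconds.
# Assumes the <time to live> argument is in milliseconds.
ttl_cmd() {
    action="$1"
    ns="$2"
    seconds="$3"
    case "$action" in
        delete|free) ;;
        *) echo "action must be delete or free" >&2; return 1 ;;
    esac
    echo "goosefs ns setTtl --action $action $ns $((seconds * 1000))"
}

ttl_cmd free test_cos 60
```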

… `gfs://` path. You only need to specify the paths of the underlying storage systems. We recommend that you use GooseFS as a unified data access layer and read and write all data through GooseFS to ensure metadata consistency.

Set the metadata synchronization interval in the `conf/goosefs-site.properties` configuration file:

    goosefs.user.file.metadata.sync.interval=<INTERVAL>

To apply an interval to a specific path, pass the property on the command line:

    goosefs fs ls -R -Dgoosefs.user.file.metadata.sync.interval=0 <path to sync>

Use the `goosefs-site.properties` configuration file to batch configure the metadata synchronization interval for the master nodes in the cluster, and other nodes will adopt this interval by default.

    goosefs.user.file.metadata.sync.interval=1m

An interval of `-1` means that GooseFS will not automatically synchronize the metadata of the directories.

Access Mode | UFS | Metadata Synchronization Interval | Description |
All file requests go through GooseFS | | -1 | - |
Most file requests go through GooseFS | HDFS is used as UFS | Hot update or update by path is recommended | If the HDFS updates frequently, you are advised to set the update interval to `-1` to prohibit updates. |
| COS is used as UFS | Configuring update intervals by path is recommended | Configuring different update intervals for different directories can alleviate the pressure of metadata synchronization. |
File upload requests generally do not go through GooseFS | HDFS is used as UFS | Configuring update intervals by path is recommended | - |
 | COS is used as UFS | Configuring update intervals by path is recommended | - |
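
The two ways of setting the interval shown earlier (a line in `goosefs-site.properties`, or a `-D` option on a recursive listing of one path) can be generated with two small helpers. This is a sketch: the function names are hypothetical, the commands are printed rather than executed, and reading an interval of `0` as "sync on this listing" is an interpretation of the example command.

```shell
# Property line for conf/goosefs-site.properties.
sync_property() {
    echo "goosefs.user.file.metadata.sync.interval=$1"
}

# Per-path command: recursive listing with the interval set to 0.
sync_path_cmd() {
    echo "goosefs fs ls -R -Dgoosefs.user.file.metadata.sync.interval=0 $1"
}

sync_property 1m
sync_path_cmd /test_cos
```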