Currently, only parameters in the following files can be customized:

- HDFS: core-site.xml, hdfs-site.xml, hadoop-env.sh, log4j.properties
- YARN: yarn-site.xml, mapred-site.xml, fair-scheduler.xml, capacity-scheduler.xml, yarn-env.sh, mapred-env.sh
- Hive: hive-site.xml, hive-env.sh, hive-log4j2.properties
[{"serviceName": "HDFS","classification": "hdfs-site.xml","serviceVersion": "2.8.4","properties": {"dfs.blocksize": "67108864","dfs.client.slow.io.warning.threshold.ms": "900000","output.replace-datanode-on-failure": "false"}},{"serviceName": "YARN","classification": "yarn-site.xml","serviceVersion": "2.8.4","properties": {"yarn.app.mapreduce.am.staging-dir": "/emr/hadoop-yarn/staging","yarn.log-aggregation.retain-check-interval-seconds": "604800","yarn.scheduler.minimum-allocation-vcores": "1"}},{"serviceName": "YARN","classification": "capacity-scheduler.xml","serviceVersion": "2.8.4","properties": {"content": "<?xml version=\\"1.0\\" encoding=\\"UTF-8\\"?>\\n<?xml-stylesheet type=\\"text/xsl\\" href=\\"configuration.xsl\\"?>\\n<configuration><property>\\n <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>\\n <value>0.8</value>\\n</property>\\n<property>\\n <name>yarn.scheduler.capacity.maximum-applications</name>\\n <value>1000</value>\\n</property>\\n<property>\\n <name>yarn.scheduler.capacity.root.default.capacity</name>\\n <value>100</value>\\n</property>\\n<property>\\n <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>\\n <value>100</value>\\n</property>\\n<property>\\n <name>yarn.scheduler.capacity.root.default.user-limit-factor</name>\\n <value>1</value>\\n</property>\\n<property>\\n <name>yarn.scheduler.capacity.root.queues</name>\\n <value>default</value>\\n</property>\\n</configuration>"}}]
For capacity-scheduler.xml or fair-scheduler.xml, set the key in properties to content, and set the value to the content of the entire file, as the capacity-scheduler.xml entry above shows (a fair-scheduler.xml sketch follows below).
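A minimal sketch of the same pattern for fair-scheduler.xml, assuming the same request structure as the example above; the queue definition inside content is illustrative only, not a recommended scheduler configuration:

```json
[
  {
    "serviceName": "YARN",
    "classification": "fair-scheduler.xml",
    "serviceVersion": "2.8.4",
    "properties": {
      "content": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<allocations>\n <queue name=\"default\">\n  <weight>1.0</weight>\n </queue>\n</allocations>"
    }
  }
]
```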
The nameservice of the external cluster to be accessed is HDFS8088, and the access method is as follows:

```xml
<property>
  <name>dfs.ha.namenodes.HDFS8088</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.http-address.HDFS8088.nn1</name>
  <value>172.21.16.11:4008</value>
</property>
<property>
  <name>dfs.namenode.https-address.HDFS8088.nn1</name>
  <value>172.21.16.11:4009</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.HDFS8088.nn1</name>
  <value>172.21.16.11:4007</value>
</property>
<property>
  <name>dfs.namenode.http-address.HDFS8088.nn2</name>
  <value>172.21.16.40:4008</value>
</property>
<property>
  <name>dfs.namenode.https-address.HDFS8088.nn2</name>
  <value>172.21.16.40:4009</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.HDFS8088.nn2</name>
  <value>172.21.16.40:4007</value>
</property>
```
[{"serviceName": "HDFS","classification": "hdfs-site.xml","serviceVersion": "2.7.3","properties": {"newNameServiceName": "newEmrCluster","dfs.ha.namenodes.HDFS8088": "nn1,nn2","dfs.namenode.http-address.HDFS8088.nn1": "172.21.16.11:4008","dfs.namenode.https-address.HDFS8088.nn1": "172.21.16.11:4009","dfs.namenode.rpc-address.HDFS8088.nn1": "172.21.16.11:4007","dfs.namenode.http-address.HDFS8088.nn2": "172.21.16.40:4008","dfs.namenode.https-address.HDFS8088.nn2": "172.21.16.40:4009","dfs.namenode.rpc-address.HDFS8088.nn2": "172.21.16.40:4007"}}]
newNameServiceName is the nameservice of the newly created cluster and is optional. If this parameter is left empty, its value will be generated by the system; if it is not empty, its value must consist of letters, digits, and hyphens.

Note: Access to external clusters is supported only for high-availability clusters, and only for clusters with Kerberos disabled.
Suppose the nameservice of the cluster is HDFS80238 (if it is not a high-availability cluster, the nameservice will usually be masterIp:rpcport, such as 172.21.0.11:4007).
The nameservice of the external cluster to be accessed is HDFS8088, and the access method is as follows:

```xml
<property>
  <name>dfs.ha.namenodes.HDFS8088</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.http-address.HDFS8088.nn1</name>
  <value>172.21.16.11:4008</value>
</property>
<property>
  <name>dfs.namenode.https-address.HDFS8088.nn1</name>
  <value>172.21.16.11:4009</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.HDFS8088.nn1</name>
  <value>172.21.16.11:4007</value>
</property>
<property>
  <name>dfs.namenode.http-address.HDFS8088.nn2</name>
  <value>172.21.16.40:4008</value>
</property>
<property>
  <name>dfs.namenode.https-address.HDFS8088.nn2</name>
  <value>172.21.16.40:4009</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.HDFS8088.nn2</name>
  <value>172.21.16.40:4007</value>
</property>
```
Modify the /usr/local/service/hadoop/etc/hadoop/hdfs-site.xml file, that is, the hdfs-site.xml file of the HDFS component. Set dfs.nameservices to HDFS80238,HDFS8088, and add the following configuration items (a combined XML sketch follows the table):
| Configuration Item | Value |
| --- | --- |
| dfs.ha.namenodes.HDFS8088 | nn1,nn2 |
| dfs.namenode.http-address.HDFS8088.nn1 | 172.21.16.11:4008 |
| dfs.namenode.https-address.HDFS8088.nn1 | 172.21.16.11:4009 |
| dfs.namenode.rpc-address.HDFS8088.nn1 | 172.21.16.11:4007 |
| dfs.namenode.http-address.HDFS8088.nn2 | 172.21.16.40:4008 |
| dfs.namenode.https-address.HDFS8088.nn2 | 172.21.16.40:4009 |
| dfs.namenode.rpc-address.HDFS8088.nn2 | 172.21.16.40:4007 |
| dfs.client.failover.proxy.provider.HDFS8088 | org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider |
| dfs.internal.nameservices | HDFS80238 |
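A sketch of the same items rendered as hdfs-site.xml properties, including the dfs.nameservices setting mentioned above; the addresses come from this example scenario and must be replaced with the values of your own clusters:

```xml
<!-- Sketch of the table above as hdfs-site.xml properties;
     values are from the example scenario, not defaults. -->
<property>
  <name>dfs.nameservices</name>
  <value>HDFS80238,HDFS8088</value>
</property>
<property>
  <name>dfs.ha.namenodes.HDFS8088</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.http-address.HDFS8088.nn1</name>
  <value>172.21.16.11:4008</value>
</property>
<property>
  <name>dfs.namenode.https-address.HDFS8088.nn1</name>
  <value>172.21.16.11:4009</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.HDFS8088.nn1</name>
  <value>172.21.16.11:4007</value>
</property>
<property>
  <name>dfs.namenode.http-address.HDFS8088.nn2</name>
  <value>172.21.16.40:4008</value>
</property>
<property>
  <name>dfs.namenode.https-address.HDFS8088.nn2</name>
  <value>172.21.16.40:4009</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.HDFS8088.nn2</name>
  <value>172.21.16.40:4007</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.HDFS8088</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
  <name>dfs.internal.nameservices</name>
  <value>HDFS80238</value>
</property>
```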
Note: dfs.internal.nameservices needs to be added; otherwise, if the cluster is scaled out, the DataNode may report an error and be marked as dead by the NameNode.