Hadoop 설치

CentOS 7 기준

Hadoop 은 현 시점(2023.03.01) 최신 버전인 3.3.4를 설치한다.

datanode의 경우 data01, data02 로 disk를 2개씩 가지고 있다.

참고

노드 구성


  • namenode : CPU : 4core / memory 16GB / Disk 100GB
  • secondary : CPU : 4core / memory 16GB / Disk 100GB
  • datanode1 : CPU : 2core / memory 8GB / Disk1 100GB / Disk2 100GB
  • datanode2 : CPU : 2core / memory 8GB / Disk1 100GB / Disk2 100GB
  • datanode3 : CPU : 2core / memory 8GB / Disk1 100GB / Disk2 100GB



1. 자바 설치


hadoop 3.3.4 는 jdk 11 버전을 권장하고 있으므로 java-11-openjdk-devel를 설치한다.

java-11-openjdk 를 설치하게 되면 jps 명령어가 동작하지 않는다. 반드시 java-11-openjdk-devel을 설치하도록 한다.

jps 동작 하지 않는 현상 관련 Stack overflow 참조

CentOS7 Java 설치

openjdk 와 openjdk-devel의 차이

자바 설치 후 "readlink -f $(which java)" 명령어를 사용하여 java 가 설치된 위치를 알아낼 수 있다.



2. 각 서버 계정 설정


# hdfs 그룹 및 계정 생성
addgroup --gid 2001 hdfs
adduser --create-home --shell /bin/bash --gid 2001 --uid 2001 hdfs
passwd hdfs

# yarn 그룹 및 계정 생성
addgroup --gid 2002 yarn
adduser --create-home --shell /bin/bash --gid 2002 --uid 2002 yarn
passwd yarn



3. Data directory 생성


namenode , secondarynode 서버

mkdir -p /data/hdfs/namenode
mkdir -p /data/hdfs/jornalnode
chown -R hdfs:hdfs /data/hdfs
mkdir -p /data/yarn
chown -R yarn:yarn /data/yarn


datanode1,2,3 서버

mkdir -p /data01/hdfs/dn
chown -R hdfs:hdfs /data01/hdfs

mkdir -p /data02/hdfs/dn
chown -R hdfs:hdfs /data02/hdfs

mkdir -p /data01/yarn
chown -R yarn:yarn /data01/yarn

mkdir -p /data02/yarn
chown -R yarn:yarn /data02/yarn



4. 각 서버 Hostname 설정 및 /etc/hosts 설정


# master node setting
hostnamectl set-hostname name01.devdap.com
hostnamectl set-hostname secondary01.devdap.com

# worker node setting
hostnamectl set-hostname data01.devdap.com
hostnamectl set-hostname data02.devdap.com
hostnamectl set-hostname data03.devdap.com

# 모든 노드 hosts 파일 변경
vi /etc/hosts

192.168.56.100 name01.devdap.com name01
192.168.56.150 secondary01.devdap.com secondary01

# worker node setting
192.168.56.200 data01.devdap.com data01
192.168.56.201 data02.devdap.com data02
192.168.56.202 data03.devdap.com data03



5. SSH Key 교환

name, secondary 노드의 hdfs, yarn 계정 모두 키를 생성하여 각각 노드마다 키 교환을 해준다.

# hdfs
su hdfs
ssh-keygen # 공개키 생성
ssh-copy-id hdfs@data01.devdap.com
ssh-copy-id hdfs@data02.devdap.com
ssh-copy-id hdfs@data03.devdap.com

# yarn
su yarn
ssh-keygen #공개키 생성
ssh-copy-id yarn@data01.devdap.com
ssh-copy-id yarn@data02.devdap.com
ssh-copy-id yarn@data03.devdap.com



6. 하둡 다운


모든 노드에 하둡을 설치해 준다.

# /opt 경로에 다운을 받아준다.
cd /opt

wget https://dlcdn.apache.org/hadoop/common/hadoop-3.3.4/hadoop-3.3.4.tar.gz

tar -zxvf hadoop-3.3.4.tar.gz

sudo mkdir /opt/hadoop-3.3.4/pids
sudo mkdir /opt/hadoop-3.3.4/logs
sudo chown -R hdfs:hdfs /opt/hadoop-3.3.4
sudo chmod 757 /opt/hadoop-3.3.4/pids
sudo chmod 757 /opt/hadoop-3.3.4/logs



7. 하둡 설정(name, secondary node)


hadoop-env.sh

# master 노드 설정

cd /opt/hadoop-3.3.4/etc/hadoop

sudo vi hadoop-env.sh

# hadoop-env.sh(맨 아래에 저장)
export JAVA_HOME=$(dirname $(dirname $(readlink -f $(which java))))
export HADOOP_HOME=/opt/hadoop-3.3.4
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_LOG_DIR=${HADOOP_HOME}/logs
export HADOOP_PID_DIR=${HADOOP_HOME}/pids
export HDFS_NAMENODE_USER="hdfs"
export HDFS_DATANODE_USER="hdfs"
export YARN_RESOURCEMANAGER_USER="yarn"
export YARN_NODEMANAGER_USER="yarn"

core-site.xml

core-site.xml default documentation

sudo vi core-site.xml
<configuration>
	<property>
		<name>fs.defaultFS</name>
		<value>hdfs://name01.devdap.com:9000</value>
		<description>NameNode URI</description>
	</property>
	<property>
		<name>io.file.buffer.size</name>
		<value>131072</value>
		<description>Buffer size</description>
	</property> <!-- HA Configuration -->
	<property>
		<name>ha.zookeeper.quorum</name>
		<value>zookeeper-001:2181,zookeeper-002:2181,zookeeper-003:2181</value>
	</property>
	<property>
		<name>dfs.ha.fencing.methods</name>
		<value>sshfence</value>
	</property>
	<property>
		<name>dfs.ha.fencing.ssh.private-key-files</name>
		<value>/home/hdfs/.ssh/id_rsa</value>
	</property>
</configuration>

hdfs-site.xml

hdfs-site.xml default documentation

sudo vi hdfs-site.xml
<configuration>
	<property>
		<name>dfs.replication</name>
		<value>3</value>
	</property>
	<property>
		<name>dfs.permissions</name>
		<value>false</value>
	</property>
	<property>
		<name>dfs.namenode.http-address</name>
		<value>name01.devdap.com:9870</value>
	</property>
	<property>
		<name>dfs.namenode.secondary.http-address</name>
		<value>secondary01.devdap.com:9868</value>
	</property>
	<property>
		<name>dfs.namenode.name.dir</name>
		<value>file:/data/hdfs/namenode</value>
	</property>
</configuration>

mapred-site.xml

mapred-site.xml default documentation

sudo vi mapred-site.xml
<configuration>
	<property>
		<name>mapreduce.framework.name</name>
		<value>yarn</value>
	</property>
</configuration>

yarn-site.xml

yarn-site.xml default documentation

sudo vi yarn-site.xml
<configuration>
	<property>
		<name>yarn.nodemanager.aux-services</name>
		<value>mapreduce_shuffle</value>
	</property>
	<property>
		<name>yarn.resourcemanager.hostname</name>
		<value>master1.dev.com</value>
	</property>
	<property>
		<name>yarn.nodemanager.resource.memory-mb</name>
		<value>1024</value>
	</property>
	<property>
		<name>yarn.nodemanager.resource.cpu-vcores</name>
		<value>1</value>
	</property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master1.dev.com:8080</value>
  </property>
  <property>
    <name>yarn.webapp.ui2.enable</name>
    <value>true</value>
  </property>
	<property>
		<name>yarn.nodemanager.env-whitelist</name>
		<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
	</property>
</configuration>

workers

sudo vi workers
master1.dev.com
workder1.dev.com



8. 하둡 설정(worker node)


hadoop-env.sh

cd /opt/hadoop-3.3.4/etc/hadoop

sudo vi hadoop-env.sh

export JAVA_HOME=$(dirname $(dirname $(readlink -f $(which java))))
export HADOOP_HOME=/opt/hadoop-3.3.4
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_LOG_DIR=${HADOOP_HOME}/logs
export HADOOP_PID_DIR=${HADOOP_HOME}/pids
export HDFS_NAMENODE_USER="hdfs"
export HDFS_DATANODE_USER="hdfs"
export YARN_RESOURCEMANAGER_USER="yarn"
export YARN_NODEMANAGER_USER="yarn"

core-site.xml

core-site.xml default documentation

sudo vi core-site.xml
<configuration>
	<property>
		<name>fs.defaultFS</name>
		<value>hdfs://name01.devdap.com:9000</value>
	</property>
</configuration>

hdfs-site.xml

hdfs-site.xml default documentation

sudo vi hdfs-site.xml
<configuration>
	<property>
		<name>dfs.replication</name>
		<value>3</value>
	</property>
	<property>
		<name>dfs.permissions</name>
		<value>false</value>
	</property>
	<property>
		<name>dfs.datanode.data.dir</name>
		<value>file:/data01/hdfs/dn,file:/data02/hdfs/dn</value>
	</property>
</configuration>

mapred-site.xml

mapred-site.xml default documentation

sudo vi mapred-site.xml
<configuration>
	<property>
		<name>mapreduce.framework.name</name>
		<value>yarn</value>
	</property>
</configuration>

yarn-site.xml

yarn-site.xml default documentation

sudo vi yarn-site.xml
<configuration>
	<property>
		<name>yarn.nodemanager.aux-services</name>
		<value>mapreduce_shuffle</value>
	</property>
	<property>
		<name>yarn.resourcemanager.hostname</name>
		<value>name01.devdap.com</value>
	</property>
	<property>
		<name>yarn.nodemanager.resource.memory-mb</name>
		<value>1024</value>
	</property>
	<property>
		<name>yarn.nodemanager.resource.cpu-vcores</name>
		<value>1</value>
	</property>
	<property>
		<name>yarn.resourcemanager.webapp.address</name>
		<value>name01.devdap.com:8080</value>
	</property>
	<property>
		<name>yarn.webapp.ui2.enable</name>
		<value>true</value>
	</property>
	<property>
		<name>yarn.nodemanager.env-whitelist</name>
		<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
	</property>
</configuration>



9. 하둡 실행


name, secondary 노드에서 아래와 같이 하둡 포맷을 시킨다.

/opt/hadoop-3.3.4/bin/hdfs namenode -format
/opt/hadoop-3.3.4/sbin/start-all.sh

아래로 접속하여 ui 를 확인해본다.

  • name01.devdap.com:9870
  • name01.devdap.com:8080/ui2