Spark Installation
This guide covers installing Spark 3.4.0 on CentOS 7.
1. Account setup
groupadd --gid 4000 spark
adduser --shell /bin/bash --gid 4000 --uid 4001 spark
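A quick check that the account was created as intended (not in the original guide):
id spark
# uid=4001(spark) gid=4000(spark) groups=4000(spark)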
2. Install Java
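The original leaves this step empty. A minimal sketch, assuming OpenJDK 11 from the CentOS 7 repos and matching the JAVA_HOME=/usr/local/java export used below:
yum install -y java-11-openjdk-devel
# resolve the actual JDK home through the alternatives symlinks, then link it
# to the path JAVA_HOME will point at
ln -s "$(dirname "$(dirname "$(readlink -f "$(which javac)")")")" /usr/local/java
java -version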
3. Install Scala (as root)
cd /opt/
wget https://downloads.lightbend.com/scala/2.13.10/scala-2.13.10.tgz --no-check-certificate
tar -zxvf scala-2.13.10.tgz
ln -s /opt/scala-2.13.10 /usr/local/scala
chown -R spark:spark /usr/local/scala/*
vi /etc/profile
export SCALA_HOME=/usr/local/scala
export SPARK_HOME=/usr/local/spark
export JAVA_HOME=/usr/local/java
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$SCALA_HOME/bin:$SPARK_HOME/bin
alias python='/usr/local/bin/python3.11'
alias pip='/usr/local/bin/pip3.11'
source /etc/profile
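A quick sanity check that the PATH change took effect (not in the original guide):
scala -version
# Scala code runner version 2.13.10 ...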
4. Install Apache Spark
cd /opt/
wget https://downloads.apache.org/spark/spark-3.4.0/spark-3.4.0-bin-hadoop3.tgz --no-check-certificate
tar -zvxf spark-3.4.0-bin-hadoop3.tgz
ln -s /opt/spark-3.4.0-bin-hadoop3 /usr/local/spark
chown -R spark:spark /usr/local/spark/*
The SPARK_HOME, HADOOP_CONF_DIR, and PATH exports were already added to /etc/profile in step 3, so no further profile changes are needed here.
5. Verify that Spark runs
spark-shell
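Once the scala> prompt appears, a one-line job confirms the runtime works (the job itself is an illustration, not from the original guide):
scala> spark.range(1000).count()
res0: Long = 1000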
6. Install Python
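The original leaves this step empty. Python 3.11 is not in the CentOS 7 repos, so one common route is building it from source; note that 3.11 needs OpenSSL >= 1.1.1, which on CentOS 7 usually means openssl11-devel from EPEL. A sketch under those assumptions (the 3.11.3 release number is itself an assumption):
yum install -y epel-release
yum install -y gcc make pkgconfig zlib-devel bzip2-devel libffi-devel openssl11-devel
cd /opt/
wget https://www.python.org/ftp/python/3.11.3/Python-3.11.3.tgz --no-check-certificate
tar -zxvf Python-3.11.3.tgz
cd Python-3.11.3
# point the build at the EPEL OpenSSL 1.1 headers and libraries
CFLAGS="$(pkg-config --cflags openssl11)" LDFLAGS="$(pkg-config --libs openssl11)" ./configure --enable-optimizations
# altinstall puts python3.11/pip3.11 in /usr/local/bin without touching the system python
make altinstall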
7. Check the ResourceManager port (on the namenode)
When a job is submitted to YARN, Spark connects to the ResourceManager on port 8032:
INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at /10.150.1.123:8032
If this connection fails, edit yarn-site.xml on the namenode so the ResourceManager listens on all interfaces:
vi yarn-site.xml
<property>
    <name>yarn.resourcemanager.address</name>
    <value>0.0.0.0:8032</value>
</property>
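After editing yarn-site.xml, restart YARN and confirm the ResourceManager is listening on all interfaces (this check is an addition, not from the original guide):
$HADOOP_HOME/sbin/stop-yarn.sh && $HADOOP_HOME/sbin/start-yarn.sh
ss -tnlp | grep 8032
# should show a listener on 0.0.0.0:8032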
8. Check the datanode port
When writing to HDFS, the client connects directly to each datanode on port 9866 (the Hadoop 3 default for dfs.datanode.address). If that port is unreachable, the datanode is excluded from the write pipeline:
23/04/18 10:37:49 WARN DataStreamer: Excluding datanode DatanodeInfoWithStorage[192.168.56.200:9866,DS-c189003c-4c21-47ae-a8d0-5a539aff8356,DISK]
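A reachability test from the client and, assuming firewalld on CentOS 7, opening the port on the datanode usually resolve this (these commands are an addition, not from the original guide):
# run on the client host: test whether the datanode port answers
timeout 3 bash -c '</dev/tcp/192.168.56.200/9866' && echo reachable
# run on the datanode: open the port
firewall-cmd --permanent --add-port=9866/tcp
firewall-cmd --reload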
9. pyspark / extraClassPath configuration
cd $SPARK_HOME/conf
cp spark-defaults.conf.template spark-defaults.conf
vi spark-defaults.conf
# extra jars for the driver and executor classpath (here, the Oracle JDBC driver);
# spark-defaults.conf only supports whole-line comments, so keep them on their own lines
spark.driver.extraClassPath /usr/lib/ojdbc8.jar
spark.executor.extraClassPath /usr/lib/ojdbc8.jar
spark.pyspark.driver.python /usr/local/bin/python3.11
spark.pyspark.python /usr/local/bin/python3.11
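A quick way to confirm both settings took effect (an illustration, not from the original guide): the pyspark banner should report Python 3.11, and the jar should be visible on the driver classpath.
su - spark
pyspark
# the startup banner should show "Using Python version 3.11.x"
>>> spark.sparkContext.getConf().get("spark.driver.extraClassPath")
'/usr/lib/ojdbc8.jar'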