Spark 설치



CentOS 7 기준

Spark 3.4.0 설치 이다.



1. 계정 설정


groupadd --gid 4000 spark

adduser --shell /bin/bash --gid 4000 --uid 4001 spark



2. java 설치


CentOS 7 Java 설치



3. Scala 설치 (root 권한)


cd /opt/

wget https://downloads.lightbend.com/scala/2.13.10/scala-2.13.10.tgz --no-check-certificate

tar -zxvf scala-2.13.10.tgz

ln -s /opt/scala-2.13.10 /usr/local/scala

chown -R spark:spark /usr/local/scala/*

vi /etc/profile

export SCALA_HOME=/usr/local/scala
export SPARK_HOME=/usr/local/spark
export JAVA_HOME=/usr/local/java
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$SCALA_HOME/bin:$SPARK_HOME/bin

alias python='/usr/local/bin/python3.11'
alias pip='/usr/local/bin/pip3.11'

source /etc/profile



4. Apache spark 설치


cd /opt/

wget https://downloads.apache.org/spark/spark-3.4.0/spark-3.4.0-bin-hadoop3.tgz --no-check-certificate

tar -zvxf spark-3.4.0-bin-hadoop3.tgz

ln -s /opt/spark-3.4.0-bin-hadoop3 /usr/local/spark

chown -R spark:spark /usr/local/spark/*


vi /etc/profile

export SPARK_HOME=/usr/local/spark
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$SCALA_HOME/bin:$SPARK_HOME/bin

source /etc/profile



5. spark 실행 확인


spark-shell



6. python 설치



7. namenode 포트 확인


INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at /10.150.1.123:8032

namenode 
vi yarn-site.xml

<property>
   <name>yarn.resourcemanager.address</name>
               <value>0.0.0.0:8032</value>
 </property>

8. Datanode 포트 확인

23/04/18 10:37:49 WARN DataStreamer: Excluding datanode DatanodeInfoWithStorage[192.168.56.200:9866,DS-c189003c-4c21-47ae-a8d0-5a539aff8356,DISK]



9. pyspark / extraClassPath configuration


cp spark-defaults.conf.template spark-defaults.conf
vi spark-defaults.conf

spark.driver.extraClassPath       /usr/lib/ojdbc8.jar # 라이브러리 별도 추가
spark.executor.extraClassPath     /usr/lib/ojdbc8.jar # 라이브러리 별도 추가
spark.pyspark.driver.python       /usr/local/bin/python3.11
spark.pyspark.python              /usr/local/bin/python3.11