Setting Up a Spark Distributed Cluster
All operations are performed as a non-root user; this article uses the lz user throughout.
This article builds a Spark cluster on top of the Hadoop cluster from the previous post, Hadoop Distributed Cluster Setup.
Environment:
Software | Version | Notes |
---|---|---|
ubuntu | 18.04 | |
openjdk | 1.8.0_191 | $JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 |
hadoop | 2.7.7 | $HADOOP_HOME=/usr/local/hadoop |
spark | 2.3.3 | $SPARK_HOME=/usr/local/spark |
(docker) | 18.09.5 | |
Nodes:
Host | Hostname | Hadoop role | Spark role | IP |
---|---|---|---|---|
Physical machine | dblab-lz | namenode | master | 172.17.0.1 |
docker1 | worker1 | datanode | worker | 172.17.0.2 |
docker2 | worker2 | datanode | worker | 172.17.0.3 |
docker3 | worker3 | datanode | worker | 172.17.0.4 |
docker4 | worker4 | datanode | worker | 172.17.0.5 |
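Every node must be able to resolve the others by hostname. This should already be in place from the Hadoop setup; if not, an /etc/hosts along the following lines on each node (taken from the table above, adjust to your own network) would cover it:
172.17.0.1 dblab-lz
172.17.0.2 worker1
172.17.0.3 worker2
172.17.0.4 worker3
172.17.0.5 worker4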
Configure Spark on the physical machine:
$ cd $SPARK_HOME/conf
$ cp ./slaves.template ./slaves # note: newer Spark releases have renamed "slaves" to "workers"
Edit slaves so that it contains:
worker1
worker2
worker3
worker4
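start-slaves.sh logs into each host listed in slaves over SSH, so passwordless SSH from the master to every worker (already configured for the Hadoop cluster) must still work. A quick sanity check, as a minimal sketch:
$ for w in worker1 worker2 worker3 worker4; do ssh "$w" hostname; done # should print worker1..worker4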
Configure spark-env.sh:
$ cp ./spark-env.sh.template ./spark-env.sh
Add the following at the top of spark-env.sh:
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export SPARK_MASTER_HOST=dblab-lz
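The first line makes Spark pick up the Hadoop jars via the output of the hadoop classpath command, so it is worth confirming that the command resolves on this machine before continuing:
$ hadoop classpath # should print a long list of Hadoop jar and conf paths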
Distribute the configuration files to all worker nodes:
$ scp -r /usr/local/spark/conf/* worker1:/usr/local/spark/conf/
$ scp -r /usr/local/spark/conf/* worker2:/usr/local/spark/conf/
$ scp -r /usr/local/spark/conf/* worker3:/usr/local/spark/conf/
$ scp -r /usr/local/spark/conf/* worker4:/usr/local/spark/conf/
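Equivalently, the four copies can be done in a loop; a minimal sketch assuming the same Spark install path on every worker:
$ for w in worker1 worker2 worker3 worker4; do scp -r /usr/local/spark/conf/* "$w":/usr/local/spark/conf/; done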
Start the Spark master node (the Hadoop cluster is assumed to be running already):
$ start-master.sh # the web UI can be reached at localhost:8080 in a browser
Corresponding jps process:
Process | Node |
---|---|
Master | master |
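To verify, jps on the master should now list a Master process in addition to the Hadoop daemons:
$ jps | grep Master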
Start the Spark worker nodes:
$ start-slaves.sh
Corresponding jps process:
Process | Node |
---|---|
Worker | worker |
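Each worker should likewise show a Worker process; checked from the master (assuming jps is on the PATH of a non-interactive shell on the workers):
$ for w in worker1 worker2 worker3 worker4; do ssh "$w" jps; done | grep Worker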
Test run:
$ run-example SparkPi 2>&1 | grep "Pi is"
The output should look something like:
Pi is roughly 3.143315716578583
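run-example uses spark-submit's default master unless one is configured elsewhere, so it may well run locally. To exercise the standalone cluster explicitly, a job can be submitted against the master URL; a sketch assuming the examples jar bundled with the Spark 2.3.3 distribution (adjust the path if yours differs):
$ spark-submit --master spark://dblab-lz:7077 \
    --class org.apache.spark.examples.SparkPi \
    $SPARK_HOME/examples/jars/spark-examples_*.jar 100 2>&1 | grep "Pi is"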
Stop the Spark worker nodes:
$ stop-slaves.sh
Stop the Spark master node:
$ stop-master.sh
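Alternatively, Spark ships a stop-all.sh that stops the master and all workers in one step; the full path is used here because Hadoop provides a script of the same name:
$ $SPARK_HOME/sbin/stop-all.sh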