A Simple Docker + Hadoop + Spark Environment Setup
Docker
Installation
```
sudo dnf -y install dnf-plugins-core
```
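This line only installs the dnf plugin prerequisite. The remaining steps follow Docker's official Fedora repository instructions; a sketch, assuming a Fedora host:

```
sudo dnf config-manager --add-repo https://download.docker.com/linux/fedora/docker-ce.repo
sudo dnf install -y docker-ce docker-ce-cli containerd.io
sudo systemctl start docker    # start the Docker daemon
```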
Getting the image
This setup uses the Docker image from NUS (National University of Singapore), configured as follows:
- Ubuntu
- JDK 1.8.0_191 (/usr/java)
- Hadoop 2.8.5 (/usr/local/hadoop)
- Spark 2.2.0 (/usr/local/spark)
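The image can be pulled explicitly before creating containers; docker run will also fetch it automatically on first use. This assumes the default latest tag on Docker Hub:

```
sudo docker pull nusbigdatacs4225/ubuntu-with-hadoop-spark
```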
Creating containers
```
docker run -it -h master --name master nusbigdatacs4225/ubuntu-with-hadoop-spark
```
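The Hadoop steps below assume two additional worker containers. A sketch of creating them the same way from the same image (each -it session attaches interactively, so run them from separate terminals):

```
docker run -it -h slave01 --name slave01 nusbigdatacs4225/ubuntu-with-hadoop-spark
docker run -it -h slave02 --name slave02 nusbigdatacs4225/ubuntu-with-hadoop-spark
```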
Other operations
- Exit a container:
  exit
- List containers (add -a to include stopped ones):
  sudo docker ps [-a]
- Restart a stopped container:
  sudo docker container start [name]
- Attach to a running container's shell:
  sudo docker attach [name]
Hadoop
Checking container IPs
Run ifconfig inside each of the three containers to find its IP address. Suppose the addresses are:
- master: 172.17.0.2
- slave01: 172.17.0.3
- slave02: 172.17.0.4
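If ifconfig is not available inside a container, the same addresses can be read from the host using Docker's built-in inspect templating:

```
sudo docker inspect -f '{{ .NetworkSettings.IPAddress }}' master
```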
Configuring container IPs
Add the corresponding IPs to /etc/hosts in each container:
```
vi /etc/hosts
```
Then, on master, edit the Hadoop slaves file and add slave01 and slave02:

```
vi /usr/local/hadoop/etc/hadoop/slaves
```
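With the example addresses above, the two files would look like this; the hosts entries go into every container, the hostnames only into master's slaves file:

```
# /etc/hosts (append in each container)
172.17.0.2  master
172.17.0.3  slave01
172.17.0.4  slave02

# /usr/local/hadoop/etc/hadoop/slaves (one worker hostname per line)
slave01
slave02
```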
Running HDFS
Initialize HDFS and start it:
```
cd /usr/local/hadoop
```
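The snippet above only changes into the Hadoop directory; the usual first-run sequence, assuming the stock Hadoop 2.8.5 scripts shipped with the image, is:

```
bin/hdfs namenode -format    # one-time format; wipes any existing HDFS metadata
sbin/start-dfs.sh            # start the NameNode on master and DataNodes on the slaves
bin/hdfs dfsadmin -report    # verify that both DataNodes registered
```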
Spark
Configure the Hadoop and Java paths:
```
vi /usr/local/spark/conf/spark-env.sh
```
Then edit the Spark slaves file and list the workers localhost, slave01, and slave02:

```
vi /usr/local/spark/conf/slaves
```
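A minimal sketch of the two files; the JAVA_HOME value is an assumption based on the JDK version listed for the image (check the actual directory under /usr/java):

```
# /usr/local/spark/conf/spark-env.sh
export JAVA_HOME=/usr/java/jdk1.8.0_191              # assumed install dir; verify locally
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop  # lets Spark find the HDFS config

# /usr/local/spark/conf/slaves (one worker per line)
localhost
slave01
slave02
```

The standalone cluster can then be started with /usr/local/spark/sbin/start-all.sh.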
WordCount
Running with MapReduce
- Create the user directory:
  /usr/local/hadoop/bin/hdfs dfs -mkdir -p /user
- Upload the input directory (first add the files you want to run wordcount on):
  /usr/local/hadoop/bin/hdfs dfs -put input /user/
- Run the wordcount example:
/usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.5.jar wordcount /user/input /user/output
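When the job finishes, the counts can be read straight from HDFS; part-r-00000 is the standard output file name for a single-reducer job:

```
/usr/local/hadoop/bin/hdfs dfs -cat /user/output/part-r-00000
```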
Running with Spark
This assumes HDFS is already running and the input files have been uploaded.
Install pyspark on master:
```
apt-get update
```
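The snippet shows only the package-index refresh; a typical continuation, assuming PySpark is installed via pip and pinned to the bundled Spark version, is:

```
apt-get install -y python3 python3-pip
pip3 install pyspark==2.2.0    # pin to match the Spark 2.2.0 in the image
```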
- Create a symlink named python pointing to python3:

```
cd /usr/bin
```
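The link itself, assuming python3 is installed under /usr/bin:

```
ln -s python3 python    # lets scripts invoked as `python` run under python3
```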
- A Python implementation of wordcount:

```
from pyspark import SparkContext
```
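The listing above preserves only the import; a minimal self-contained wordcount.py, assuming HDFS answers at hdfs://master:9000 (the actual address comes from fs.defaultFS in core-site.xml) and that the output path does not exist yet:

```python
from pyspark import SparkContext

sc = SparkContext(appName="wordcount")

# Read every file under the directory uploaded earlier.
lines = sc.textFile("hdfs://master:9000/user/input")

# Classic wordcount: split lines into words, pair each word with 1, sum per word.
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))

# Write the results back to HDFS; saveAsTextFile fails if the path already exists.
counts.saveAsTextFile("hdfs://master:9000/user/output_spark")

sc.stop()
```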
- Run it:
  python wordcount.py