This document shares an experience of setting up a dockerized master-slave Hadoop cluster with Spark on top of it, then configuring the environment to listen to streaming data. It will also help you automate your environment setup for development and production stages.
This experiment covers: a) providing a master container, b) providing two slave Hadoop containers, and c) preparing Spark on top of the Hadoop infrastructure.
Requirements
Docker and Docker Compose installed on the host machine.
Quick outline
a) prepare a Docker image for the Hadoop main node, b) prepare a Docker image for the Hadoop slave nodes, c) prepare a Docker image for Spark and configure the nodes, d) prepare a docker-compose file to manage the package under the same network, and e) run docker-compose up --build -d.
prepare a Docker image for the Hadoop main node
Creating the Dockerfile for the Hadoop main node:
Create a directory named hadoop-master and then create a Dockerfile inside it containing the following:
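A minimal sketch of such a Dockerfile, assuming an Ubuntu base image, OpenJDK 8 and a Hadoop 2.7.7 tarball (the versions, ports and paths here are assumptions, not the original ones), could look like this:

```Dockerfile
# Minimal sketch of the Hadoop master image; versions, ports and paths are assumptions.
FROM ubuntu:16.04

# Java and SSH are required by the Hadoop daemons.
RUN apt-get update && \
    apt-get install -y openjdk-8-jdk openssh-server wget && \
    rm -rf /var/lib/apt/lists/*

ENV JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
ENV HADOOP_HOME=/usr/local/hadoop
ENV PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

# Download and unpack Hadoop (version is an assumption).
RUN wget -q https://archive.apache.org/dist/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz && \
    tar -xzf hadoop-2.7.7.tar.gz -C /usr/local/ && \
    mv /usr/local/hadoop-2.7.7 $HADOOP_HOME && \
    rm hadoop-2.7.7.tar.gz

# Passwordless SSH (handy for the Hadoop helper scripts).
RUN mkdir -p ~/.ssh && \
    ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa && \
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

# In a full setup the Hadoop site XMLs (core-site.xml, hdfs-site.xml, yarn-site.xml)
# would also be copied in here. The customized bootstrap sits beside the Dockerfile.
COPY bootstrap.sh /bootstrap.sh
RUN chmod +x /bootstrap.sh

# NameNode RPC, NameNode web UI, ResourceManager web UI and SSH.
EXPOSE 9000 50070 8088 22

CMD ["/bootstrap.sh"]
```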
As you can see in the Dockerfile, the customized bootstrap configuration file must be located beside the Dockerfile. Therefore, inside the same directory, create a bash file named bootstrap.sh and enter the following script in it:
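A minimal sketch of what this bootstrap could contain, assuming the layout above (the HDFS format check and the foreground tail are common patterns, not necessarily the original script):

```bash
#!/bin/bash
# bootstrap.sh for the master node -- minimal sketch; paths are assumptions.

# sshd needs its runtime directory on Ubuntu.
mkdir -p /var/run/sshd
/usr/sbin/sshd

# Format HDFS only on the first start so the NameNode metadata survives restarts.
if [ ! -d /tmp/hadoop-root/dfs/name ]; then
  $HADOOP_HOME/bin/hdfs namenode -format -force
fi

# Start the master-side daemons directly (Hadoop 2.x daemon scripts).
$HADOOP_HOME/sbin/hadoop-daemon.sh start namenode
$HADOOP_HOME/sbin/yarn-daemon.sh start resourcemanager

# Keep the container in the foreground.
tail -f /dev/null
```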
prepare a Docker image for the Hadoop slave nodes
Creating the Dockerfile for the Hadoop slave nodes:
Create a directory named hadoop-slave and then create a Dockerfile inside it containing the following:
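Again, a minimal sketch under the same assumptions (Ubuntu base, OpenJDK 8, Hadoop 2.7.7); the slave image only needs the DataNode/NodeManager side and its own bootstrap:

```Dockerfile
# Minimal sketch of the Hadoop slave image; versions, ports and paths are assumptions.
FROM ubuntu:16.04

RUN apt-get update && \
    apt-get install -y openjdk-8-jdk openssh-server wget && \
    rm -rf /var/lib/apt/lists/*

ENV JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
ENV HADOOP_HOME=/usr/local/hadoop
ENV PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

RUN wget -q https://archive.apache.org/dist/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz && \
    tar -xzf hadoop-2.7.7.tar.gz -C /usr/local/ && \
    mv /usr/local/hadoop-2.7.7 $HADOOP_HOME && \
    rm hadoop-2.7.7.tar.gz

# In a full setup the Hadoop site XMLs would also be copied in here.
# The slave-specific bootstrap sits beside the Dockerfile.
COPY bootstrap.sh /bootstrap.sh
RUN chmod +x /bootstrap.sh

# DataNode transfer port, DataNode web UI, NodeManager web UI and SSH.
EXPOSE 50010 50075 8042 22

CMD ["/bootstrap.sh"]
```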
As you can see in the Dockerfile, the slave containers also require a customized bootstrap configuration file located beside the Dockerfile. Therefore, inside the same directory, create a bash file named bootstrap.sh and enter the following script in it:
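A minimal sketch of the slave bootstrap, assuming the worker daemons are started directly with the Hadoop 2.x daemon scripts:

```bash
#!/bin/bash
# bootstrap.sh for a slave node -- minimal sketch; assumptions as above.

mkdir -p /var/run/sshd
/usr/sbin/sshd

# Start the worker daemons for HDFS and YARN.
$HADOOP_HOME/sbin/hadoop-daemon.sh start datanode
$HADOOP_HOME/sbin/yarn-daemon.sh start nodemanager

# Keep the container alive.
tail -f /dev/null
```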
prepare a Docker image for Spark and configure the nodes
Now it is time to configure and create a container for Spark, including its dependencies:
Create a directory named spark and then create a Dockerfile inside it containing the following:
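A minimal sketch of the Spark Dockerfile, assuming the slave image built above has already been tagged hadoop-slave:latest (for example with docker build -t hadoop-slave:latest ./hadoop-slave), a Spark 2.4.x build bundled with Hadoop 2.7 client libraries, and the helper scripts and configuration described next (base image tag, version, ports and file names are all assumptions):

```Dockerfile
# Minimal sketch of the Spark image; base image, version, ports and paths are assumptions.
FROM hadoop-slave:latest

ENV SPARK_VERSION=2.4.8
ENV SPARK_HOME=/usr/local/spark
ENV PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin

# Download a Spark build that bundles its Hadoop 2.7 client libraries.
RUN wget -q https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop2.7.tgz && \
    tar -xzf spark-${SPARK_VERSION}-bin-hadoop2.7.tgz -C /usr/local/ && \
    mv /usr/local/spark-${SPARK_VERSION}-bin-hadoop2.7 $SPARK_HOME && \
    rm spark-${SPARK_VERSION}-bin-hadoop2.7.tgz

# Helper scripts and configuration created in the next steps.
COPY bootstrap.sh start-master.sh start-worker.sh spark-shell.sh remove_alias.sh /
COPY spark-defaults.conf $SPARK_HOME/conf/spark-defaults.conf
RUN chmod +x /bootstrap.sh /start-master.sh /start-worker.sh /spark-shell.sh /remove_alias.sh

# Spark master port, master web UI and worker web UI.
EXPOSE 7077 8080 8081

CMD ["/bootstrap.sh"]
```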
As you can see in the Dockerfile, a few dependencies need to be satisfied, including the Spark configuration and the node manager, so we will create them one by one. First, we need to customize the bootstrap; for that reason we are going to replace the current bootstrap file with a new one.
Create a bash file bootstrap.sh with the following content and place it beside the Dockerfile:
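A minimal sketch of the Spark bootstrap, assuming it simply wires together the helper scripts below (script names and order are assumptions):

```bash
#!/bin/bash
# bootstrap.sh for the Spark container -- minimal sketch; script names are assumptions.

mkdir -p /var/run/sshd
/usr/sbin/sshd

# Tidy up the /etc/hosts aliases first (see remove_alias.sh below).
/remove_alias.sh

# Start the standalone Spark master and a worker in this container.
/start-master.sh
/start-worker.sh

# Keep the container in the foreground.
tail -f /dev/null
```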
Create another bash file start-master.sh to start the master node:
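A sketch of the master start-up script, assuming the standalone master binds to the container's host name on the default port 7077 with its web UI on 8080:

```bash
#!/bin/bash
# start-master.sh -- minimal sketch; host name and ports are assumptions.

# Launch the standalone Spark master on this container.
$SPARK_HOME/sbin/start-master.sh --host $(hostname) --port 7077 --webui-port 8080
```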
Create another bash file start-worker to start the workers:
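A sketch of the worker script (saved here as start-worker.sh; the master URL spark://spark-master:7077 is an assumption matching the compose service name used later). With Spark 2.x the worker is launched via start-slave.sh:

```bash
#!/bin/bash
# start-worker.sh -- minimal sketch; the master URL is an assumption.

# Register a worker with the standalone master started above.
$SPARK_HOME/sbin/start-slave.sh spark://spark-master:7077 --webui-port 8081
```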
Create another bash file spark-shell.sh to start the Spark shell:
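A sketch of the shell wrapper, assuming it just points spark-shell at the standalone master:

```bash
#!/bin/bash
# spark-shell.sh -- minimal sketch; the master URL is an assumption.

# Open an interactive Spark shell against the standalone cluster,
# forwarding any extra arguments.
$SPARK_HOME/bin/spark-shell --master spark://spark-master:7077 "$@"
```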
One of the important files we need is spark-defaults.conf, which customizes the configuration by injecting Spark defaults:
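A sketch of what the injected defaults could contain; the master URL, memory sizes and event-log location are assumptions, not the original values:

```
# spark-defaults.conf -- minimal sketch; all values below are assumptions.
spark.master                 spark://spark-master:7077
spark.serializer             org.apache.spark.serializer.KryoSerializer
spark.driver.memory          1g
spark.executor.memory        1g
spark.eventLog.enabled       true
spark.eventLog.dir           hdfs://hadoop-master:9000/spark-logs
```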
And finally, clean up the host aliases and remove the redundancies with remove_alias.sh:
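A sketch of the clean-up script, assuming the goal is to drop duplicate host-name aliases that Docker writes into /etc/hosts so that each name resolves to a single address (the exact clean-up rule is an assumption):

```bash
#!/bin/bash
# remove_alias.sh -- minimal sketch; the clean-up rule is an assumption.

# /etc/hosts is bind-mounted by Docker, so rewrite it in place
# instead of replacing the file.
cp /etc/hosts /tmp/hosts.orig
awk '!seen[$2]++' /tmp/hosts.orig > /tmp/hosts.clean
cat /tmp/hosts.clean > /etc/hosts
```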
prepare a docker-compose file to manage the package under the same network
Creating the docker-compose file for the whole package:
Within the root directory, create a docker-compose.yml file. The file must contain four sections: the master service, the two slaves, and finally the Spark container. Therefore, the file would be comprised of:
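A minimal sketch of such a compose file, assuming the build directories created above and a user-defined bridge network named hadoop-net (service names, exposed ports and the network name are assumptions):

```yaml
# docker-compose.yml -- minimal sketch; names, ports and network are assumptions.
version: "3"

services:
  hadoop-master:
    build: ./hadoop-master
    hostname: hadoop-master
    ports:
      - "50070:50070"   # HDFS NameNode web UI
      - "8088:8088"     # YARN ResourceManager web UI
    networks:
      - hadoop-net

  hadoop-slave1:
    build: ./hadoop-slave
    hostname: hadoop-slave1
    depends_on:
      - hadoop-master
    networks:
      - hadoop-net

  hadoop-slave2:
    build: ./hadoop-slave
    hostname: hadoop-slave2
    depends_on:
      - hadoop-master
    networks:
      - hadoop-net

  spark-master:
    build: ./spark
    hostname: spark-master
    depends_on:
      - hadoop-master
    ports:
      - "8080:8080"     # Spark master web UI
      - "4040:4040"     # Spark application UI
    networks:
      - hadoop-net

networks:
  hadoop-net:
    driver: bridge
```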
Run docker-compose up --build -d and enjoy your environment.