The last time I set up a Spark cluster, I installed Spark manually and configured each node directly. For the small-scale cluster I had, that was fine. This time the cluster is still relatively small. However, I do want to take advantage of prebuilt containers, if possible. The standard for that is Docker. So in this post I will set up a Docker Swarm on the Personal Compute Cluster.
Installing Docker
Before we begin, if you set up SETI@Home at the end of the last post, we need to stop it first (skip this if you didn’t set up SETI@Home):
parallel-ssh -i -h ~/cluster/all.txt -l root "service boinc-client stop"
We need to ensure that swap is off, as various applications we will be working with later do not like it:
parallel-ssh -i -h ~/cluster/all.txt -l root "swapoff -a"
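Note that swapoff -a only disables swap until the next reboot. To keep it off permanently, you can also comment out any swap entries in each node’s /etc/fstab. A minimal sketch (it assumes the swap lines have the word swap as a whitespace-delimited field, and it keeps a .bak backup of the original file on each node):

```shell
# Comment out swap entries in /etc/fstab so swap stays off after a reboot
parallel-ssh -i -h ~/cluster/all.txt -l root "sed -i.bak '/\sswap\s/s/^/#/' /etc/fstab"
```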
Now we install all the needed software:
parallel-ssh -i -h ~/cluster/all.txt -l root "apt-get update -y"
parallel-ssh -i -h ~/cluster/all.txt -l root "apt-get upgrade -y"
parallel-ssh -i -h ~/cluster/all.txt -l root "apt-get install apt-transport-https software-properties-common ca-certificates -y"
parallel-ssh -i -h ~/cluster/all.txt -l root "wget https://download.docker.com/linux/ubuntu/gpg && apt-key add gpg"
parallel-ssh -i -h ~/cluster/all.txt -l root "echo 'deb [arch=amd64] https://download.docker.com/linux/ubuntu bionic stable' >> /etc/apt/sources.list"
parallel-ssh -i -h ~/cluster/all.txt -l root "apt-get update -y"
parallel-ssh -i -h ~/cluster/all.txt -l root "apt-get install docker-ce -y"
The installation of the software should have created the user group docker. Now we need to add our user account to that group so we can run docker without sudo (change the user account name as needed):
parallel-ssh -i -h ~/cluster/all.txt -l root "usermod -aG docker michael"
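It is worth a quick check that the group change took on every node (again, adjust the user account name as needed):

```shell
# Each node should list "docker" among the user's groups
parallel-ssh -i -h ~/cluster/all.txt -l root "id -nG michael | grep -qw docker && echo docker group OK"
```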
Now we need to configure Docker to use our nodes’ /mnt/data partitions for all its storage. We also want to tell Docker that it is OK to use the local registry that we will be setting up later in this post, and force it to use the Google DNS. To do this, we need to create a configuration file for Docker. Start by creating or editing the /etc/docker/daemon.json file:
sudo vi /etc/docker/daemon.json
If this daemon.json file was new, make the contents look like this:
{
    "data-root" : "/mnt/data/docker/",
    "insecure-registries" : ["master:5000"],
    "dns" : ["8.8.8.8"],
    "metrics-addr" : "0.0.0.0:9323",
    "experimental" : true
}
If the file already existed, simply add the elements shown above to it. Now distribute this file to the slave nodes:
parallel-scp -h ~/cluster/slaves.txt -l root /etc/docker/daemon.json /etc/docker/daemon.json
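A malformed daemon.json will prevent the Docker daemon from starting at all, so before starting Docker it is worth validating the file’s syntax on every node. A quick sketch using Python’s built-in JSON parser (python3 ships with Ubuntu 18.04):

```shell
# json.tool exits non-zero and prints the parse error if the file is not valid JSON
parallel-ssh -i -h ~/cluster/all.txt -l root "python3 -m json.tool /etc/docker/daemon.json > /dev/null && echo daemon.json OK"
```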
Finally, start docker on all machines:
parallel-ssh -i -h ~/cluster/all.txt -l root "systemctl restart docker && systemctl enable docker"
You need to log out of the master node and log back in for the group membership to take effect. Once you have, try docker out:
docker run hello-world
You should get a pleasant message from Docker. Now let’s set up the Docker swarm. On the master node, if you did not allow all traffic into the cluster via the ufw firewall, we first need to open the firewall for docker:
sudo ufw allow 2376/tcp
sudo ufw allow 7946/udp
sudo ufw allow 7946/tcp
sudo ufw allow 80/tcp
sudo ufw allow 2377/tcp
sudo ufw allow 4789/udp
sudo ufw reload
sudo ufw enable
sudo systemctl restart docker
Now we init the swarm:
docker swarm init --advertise-addr 10.1.1.1
That should give you a message which has a command to run on the slave nodes for them to join the swarm. You can do that easily with pssh. Update the following with the join command and token you got from the prior step:
parallel-ssh -i -h ~/cluster/slaves.txt -l michael "docker swarm join --token SWMTKN-1-16bn159k6y9rtyitecha7q5cts8ru1qkojlykual4119fsppxm-bavnczg8eiutgmv0r34zzswgx 10.1.1.1:2377"
Check that it worked and the nodes are all in the swarm:
docker node ls
Now let’s create a simple service just to prove it all worked:
docker service create --name webserver -p 80:80 httpd
docker service ls
You can now go to a web browser on your development computer, and visit the URL of the public IP address for your cluster, which for me would be http://192.168.1.60. You should see a web page with the phrase “It Works!”. Once you are satisfied with your accomplishment, we can remove the web service:
docker service rm webserver
docker service ls
The web page should no longer work if you try to refresh it in your browser.
Monitoring Docker Swarm
If you want to set up a nice lightweight management UI for your docker swarm, install Portainer:
sudo mkdir -p /mnt/data/portainer/
docker service create \
    --name portainer \
    --publish 9999:9000 \
    --replicas=1 \
    --limit-memory=128M \
    --constraint 'node.role == manager' \
    --mount=type=bind,src=/mnt/data/portainer/,dst=/data \
    --mount=type=bind,src=/var/run/docker.sock,dst=/var/run/docker.sock \
    portainer/portainer
Now point a browser at http://master-node-ip:9999 (replacing master-node-ip with the public WAN address of your master node), create an account, and enjoy!
Creating a Local Docker Registry
We will be running a Docker registry within the swarm in order to manage the images we will be creating later. The registry is needed to easily distribute your custom docker images to all nodes when a service is created. Normally registries are set up with security. However, because our cluster is private and we do not have a signed certificate from a certificate authority, we will set up the registry as insecure with a self-signed certificate. On the master node do the following:
cd
mkdir certs
cd certs/
openssl req -newkey rsa:4096 -nodes -sha256 -keyout registry.key -x509 -days 365 -out registry.crt
The certificate generation will pose a few questions to you. Fill them in as you see fit, except for the question on “Common Name”. For that, you should enter the hostname of the master node, which is master. Once you do that, we need to distribute the certificate to the docker engine on all nodes:
parallel-ssh -i -h ~/cluster/all.txt -l root "mkdir -p /etc/docker/certs.d/10.1.1.1:5000"
parallel-scp -h ~/cluster/all.txt -l root registry.crt /etc/docker/certs.d/10.1.1.1:5000/ca.crt
parallel-ssh -i -h ~/cluster/all.txt -l root "mkdir -p /etc/docker/certs.d/master:5000"
parallel-scp -h ~/cluster/all.txt -l root registry.crt /etc/docker/certs.d/master:5000/ca.crt
parallel-ssh -i -h ~/cluster/all.txt -l root "service docker restart"
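If you want to sanity-check the certificate, say to confirm the Common Name and the expiration date, openssl can print them:

```shell
# The subject should show CN = master; the dates give the validity window
openssl x509 -in ~/certs/registry.crt -noout -subject -dates
```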
Now, start the registry in the swarm with the following, adjusting the path /home/michael/certs to wherever the certs folder sits in your home directory on the master node:
docker service create --name registry --publish=5000:5000 \
    --constraint=node.role==manager \
    --mount=type=bind,src=/home/michael/certs,dst=/certs \
    --limit-memory=500m \
    -e REGISTRY_HTTP_ADDR=0.0.0.0:5000 \
    -e REGISTRY_HTTP_TLS_CERTIFICATE=/certs/registry.crt \
    -e REGISTRY_HTTP_TLS_KEY=/certs/registry.key \
    registry:latest
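Once the registry service is up, you can query it through the Docker Registry HTTP API to confirm it is answering; the -k flag tells curl to accept our self-signed certificate:

```shell
# A fresh registry should return {"repositories":[]}
curl -sk https://master:5000/v2/_catalog
```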
In order to track what’s in the registry, we can load a simple registry visualizer:
docker service create --name registry-browser -p 5050:8080 \
    -e DOCKER_REGISTRY_URL=https://10.1.1.1:5000 \
    -e NO_SSL_VERIFICATION=true \
    --limit-memory=128m \
    klausmeyer/docker-registry-browser
Now point your browser at http://master-node-ip:5050 (change the IP address to your cluster’s WAN address), and you can see that your local docker registry is empty. But we will be changing that in the upcoming posts.
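If you would rather not wait, you can push a small test image now and watch it appear in the browser. The sketch below tags the stock hello-world image with the registry’s host and port, which is how docker push knows where to send it:

```shell
# Tag and push a small test image into the local registry
docker pull hello-world
docker tag hello-world master:5000/hello-world
docker push master:5000/hello-world
# The catalog should now list hello-world
curl -sk https://master:5000/v2/_catalog
```

You can remove the local tag afterwards with docker rmi master:5000/hello-world.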
Fun Things to do on your Docker Swarm
Run SETI@Home on Docker Swarm
We still have some work to do before we are ready to run Spark. As with the last post, that will be another day. For now, we can leverage our new Docker Swarm to run and manage SETI@Home processes. A Docker image for BOINC, the software that enables SETI@Home, has been prebuilt and is publicly available, and you can use that. There is even a version optimized for Intel CPUs, which the Personal Cluster uses. To launch the service into the Docker Swarm such that each node gets an instance to run, use this command:
docker network create -d overlay --attachable boinc
parallel-ssh -i -h ~/cluster/all.txt -l root "mkdir -p /mnt/data/boinc"
parallel-ssh -i -h ~/cluster/all.txt -l root "chgrp docker /mnt/data/boinc"
docker service create \
    --mode global \
    --name boinc \
    --network=boinc \
    --hostname="boinc-{{.Node.Hostname}}" \
    --mount type=bind,src=/mnt/data/boinc,dst=/var/lib/boinc \
    -p 31416:31416 \
    -e BOINC_GUI_RPC_PASSWORD="123" \
    -e BOINC_CMD_LINE_OPTIONS="--allow_remote_gui_rpc" \
    --limit-memory=1G \
    --limit-cpu=6 \
    boinc/client:intel
docker run \
    --rm \
    --network boinc \
    boinc/client \
    boinccmd_swarm \
    --passwd 123 \
    --project_attach http://setiathome.berkeley.edu \
    <*insert your account key here*>
Replace <*insert your account key here*> with your SETI@Home account key. One thing I will point out is the use of the --limit-cpu option in the service creation command. Without it, each instance of the service will use 100% of the CPUs. The option limits the effective number of CPUs each instance can utilize. Since each of our nodes has 12 virtual CPUs (6 cores, 2 threads per core), setting --limit-cpu to 6 has the effect of limiting the process to 50% of the CPU capacity. The reason I did this is that at 100%, the nodes’ cooling fans run at full blast. While these EGLOBAL S200 computers are pretty quiet, at 100% CPU utilization and 100% fan speed, things do get noticeably noisy. So, I limited CPU usage to 50% and the nodes run without the fans turning on.
You can read more about the BOINC client and operating it within Docker here.
Run a Nethack Server
If you can recollect the days of crawling a dungeon made up of ASCII characters, you might enjoy this one: you can easily deploy a Nethack server for anyone on your home network to play on. This really shows off the power of Docker in a fun way: it makes it easy for people to share whole software deployments. In this case, we can pull Matsuu Takuto’s image for a Nethack server and set it up to run on our swarm:
docker service create --replicas 1 --publish=23:23 --name nethack matsuu/nethack-server
Then to play the game, simply telnet into port 23 on the cluster’s master node using its external IP address:
telnet 192.168.1.60 23
Best wishes on your ascension!