
Setup Slurm for Plexus

This article describes how to configure a Slurm cluster on a Plexus instance running the CentOS 7 operating system.

Libraries

Install the development tools and supporting packages required by the rest of this setup:

sudo yum -y groupinstall "Development tools"
sudo yum -y install wget openssl-devel libuuid-devel cryptsetup-devel yum-utils device-mapper-persistent-data lvm2 nfs-utils bind-utils


SELinux

Disable SELinux (the change takes effect after the next reboot):

sudo sed -i "s/enforcing/disabled/g" /etc/selinux/config
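
If you want SELinux disabled for the current session as well, without waiting for a reboot, you can additionally run the following (optional):

sudo setenforce 0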


Go

Download and install Go (version 1.13.8 at the time of writing), which is required to build Singularity:

wget https://dl.google.com/go/go1.13.8.linux-amd64.tar.gz -O /tmp/go-latest.tar.gz
sudo tar -C /usr/local -xzf /tmp/go-latest.tar.gz
rm -f /tmp/go-latest.tar.gz
echo 'export PATH=$PATH:/usr/local/go/bin' >>/home/${USER}/.bashrc
echo 'export PATH=$PATH:/usr/local/go/bin' | sudo tee -a /root/.bashrc
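
As an optional sanity check (not part of the original procedure), open a new shell or source the updated .bashrc and confirm the Go toolchain is on the PATH:

source ~/.bashrc
go version    # should report go1.13.8 linux/amd64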


Singularity

Download and install Singularity version 3.5.2. See the user guide for details: https://sylabs.io/guides/3.5/user-guide/.

sudo wget https://github.com/sylabs/singularity/releases/download/v3.5.2/singularity-3.5.2.tar.gz -O /opt/singularity-3.5.2.tar.gz && sudo tar -C /opt/ -xzvf /opt/singularity-3.5.2.tar.gz && sudo rm -f /opt/singularity-3.5.2.tar.gz
cd /opt/singularity
sudo ln -s /usr/local/go/bin/* /bin/
sudo /opt/singularity/mconfig && sudo make -C builddir && sudo make -C builddir install
sudo ln -s /usr/local/bin/singularity /usr/bin/singularity
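
As an optional check, confirm that the Singularity binary is installed and reports the expected release:

singularity --version    # expected output: singularity version 3.5.2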


Slurm / Munge

Plexus clusters use Slurm version 20.02. See the Slurm Workload Manager documentation for more details: https://slurm.schedmd.com/quickstart.html.

sudo yum -y install munge munge-libs munge-devel rng-tools python3 perl-devel readline-devel pam-devel mariadb-server mariadb-devel perl-Switch
sudo systemctl enable mariadb
sudo systemctl start mariadb
MUNGEUSER=997
sudo groupadd -g ${MUNGEUSER} munge
sudo useradd -m -c "MUNGE Uid 'N' Gid Emporium" -d /var/lib/munge -u ${MUNGEUSER} -g munge -s /sbin/nologin munge
SLURMUSER=992
sudo groupadd -g ${SLURMUSER} slurm
sudo useradd -m -c "SLURM workload manager" -d /var/lib/slurm -u ${SLURMUSER} -g slurm -s /bin/bash slurm
sudo rngd -r /dev/urandom
sudo /usr/sbin/create-munge-key -r
dd if=/dev/urandom bs=1 count=1024 | sudo tee /etc/munge/munge.key >/dev/null
sudo chown munge: /etc/munge/munge.key
sudo chmod 400 /etc/munge/munge.key
sudo chown -R munge: /etc/munge/ /var/log/munge/
sudo chmod 0700 /etc/munge/ /var/log/munge/
sudo systemctl enable munge
sudo systemctl start munge
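
To verify that MUNGE can create and validate credentials on this node, you can run the standard round-trip test (optional):

munge -n | unmunge    # should print STATUS: Success (0)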


Slurm packages

Download the Slurm source tarball, build the RPM packages, and install them:

cd /opt
sudo wget https://download.schedmd.com/slurm/slurm-20.02.0.tar.bz2
sudo tar --bzip2 -x -f slurm-20.02.0.tar.bz2
sudo rpmbuild -ta slurm-20.02.0.tar.bz2
sudo mv /root/rpmbuild /opt
sudo yum -y --nogpgcheck localinstall /opt/rpmbuild/RPMS/x86_64/*
sudo mkdir /var/spool/slurmctld
sudo chown slurm: /var/spool/slurmctld
sudo chmod 755 /var/spool/slurmctld
sudo touch /var/log/slurmctld.log
sudo chown slurm: /var/log/slurmctld.log
sudo touch /var/log/slurm_jobacct.log /var/log/slurm_jobcomp.log
sudo chown slurm: /var/log/slurm_jobacct.log /var/log/slurm_jobcomp.log
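
The RPMs do not ship a working configuration; /etc/slurm/slurm.conf must still be created before slurmctld and slurmd can start. The sketch below is only an illustration: the cluster name, hostnames (plexus-master, plexus-worker[1-2]), CPU count, and memory are placeholders and must be replaced with values matching your nodes (running slurmd -C on a node prints its real hardware values).

# /etc/slurm/slurm.conf -- minimal illustrative example, adjust names and sizes
ClusterName=plexus
# Placeholder master hostname
SlurmctldHost=plexus-master
SlurmUser=slurm
AuthType=auth/munge
StateSaveLocation=/var/spool/slurmctld
SlurmctldLogFile=/var/log/slurmctld.log
JobCompType=jobcomp/filetxt
JobCompLoc=/var/log/slurm_jobcomp.log
# Placeholder node and partition definitions
NodeName=plexus-worker[1-2] CPUs=4 RealMemory=7800 State=UNKNOWN
PartitionName=main Nodes=ALL Default=YES MaxTime=INFINITE State=UP

Copy the same file to every node, then enable slurmctld on the master and slurmd on the workers using the systemd units installed by the RPMs.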


OpenMPI Libraries

Download the OpenMPI install script, openmpi-install.sh, from the GitHub repo at alfredo-api/api/adapters/rm/scripts/, place it in /opt, and run it with the Slurm flag:

sudo sh /opt/openmpi-install.sh --with-slurm
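
As an optional check, you can confirm that the resulting OpenMPI build includes Slurm support (the exact component list depends on the OpenMPI version the script installs):

ompi_info | grep -i slurm    # should list Slurm-aware MCA components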

NVIDIA libraries

Install the NVIDIA drivers and CUDA libraries to enable GPU support:

sudo yum -y install kernel-headers kernel-devel --disablerepo=updates
sudo yum -y install gcc
sudo wget -q http://developer.download.nvidia.com/compute/cuda/10.2/Prod/local_installers/cuda-repo-rhel7-10-2-local-10.2.89-440.33.01-1.0-1.x86_64.rpm
sudo rpm -i cuda-repo-rhel7-10-2-local-10.2.89-440.33.01-1.0-1.x86_64.rpm
sudo yum clean all
sudo yum -y install nvidia-driver-latest-dkms cuda
sudo yum -y install cuda-drivers
sudo rm -f cuda-repo-rhel7-10-2-local-10.2.89-440.33.01-1.0-1.x86_64.rpm
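
The driver normally only loads after a reboot; once the node is back up, nvidia-smi should list the installed driver and each GPU (optional check):

nvidia-smi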


Docker

Install Docker and add NVIDIA container support:

sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
sudo yum -y install docker-ce
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
sudo curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | sudo tee /etc/yum.repos.d/nvidia-docker.repo
sudo yum install -y nvidia-container-toolkit
sudo yum -y install nvidia-docker2
sudo systemctl enable docker
sudo systemctl restart docker
sudo groupadd docker
sudo usermod -aG docker ${USER}
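
As an optional check, log out and back in so the docker group membership takes effect, then run a CUDA container to confirm GPU access. The image tag below is only an example and may need to be adjusted to one available for your CUDA version:

docker run --rm --gpus all nvidia/cuda:10.2-base nvidia-smi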


Share the home directory

To share the home directory between the master and worker nodes, either create an NFS disk or export the home directory from the master to the compute nodes. The following tutorial covers the NFS server and client setup:

https://www.howtoforge.com/tutorial/setting-up-an-nfs-server-and-client-on-centos-7/
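
A minimal sketch of the export/mount steps, assuming the master is named plexus-master and the compute nodes sit on the 10.0.0.0/24 network (both placeholders; adjust them to your environment and see the tutorial above for the full procedure):

# On the master node: export /home to the compute network
echo '/home 10.0.0.0/24(rw,sync,no_root_squash)' | sudo tee -a /etc/exports
sudo systemctl enable nfs-server
sudo systemctl start nfs-server
sudo exportfs -ra

# On each compute node: mount the shared home directory
sudo yum -y install nfs-utils
echo 'plexus-master:/home /home nfs defaults 0 0' | sudo tee -a /etc/fstab
sudo mount -a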