Iomart

From Define Wiki

Latest revision as of 12:10, 29 September 2020

Bartech

Adding a compute node from scratch

First, add your node to Ironic and provision it with a regular CentOS 7 image.

Copy the root user's public SSH key from the controller to /root/.ssh/authorized_keys on the node so that Ansible can log into the node from the controller.
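
As a minimal sketch of the key-copy step, the helper below appends a public key to an authorized_keys file only when that exact line is not already there, so it is safe to re-run. The function name add_authorized_key and the location of the copied key file are illustrative assumptions, not part of the vScaler tooling.

```shell
#!/bin/bash
# Hypothetical helper: append the key stored in $1 to the authorized_keys
# file in $2 unless an identical line is already present.
add_authorized_key() {
    key_file="$1"
    auth_file="$2"
    mkdir -p "$(dirname "$auth_file")"
    touch "$auth_file"
    chmod 600 "$auth_file"
    # -x: whole-line match, -F: fixed strings, -f: read patterns from the key file
    grep -qxF -f "$key_file" "$auth_file" || cat "$key_file" >> "$auth_file"
}
```

On the node, with the controller's key copied to /tmp/controller_root.pub (an assumed path), running add_authorized_key /tmp/controller_root.pub /root/.ssh/authorized_keys would install it.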

Install Quobyte client:

yum -y install java-1.8.0-openjdk-headless wget
cd /etc/yum.repos.d/
wget https://packages.quobyte.com/repo/3/8acxjFCHCQ7YMvxKmNEzhYTQ1kr9xA2e/rpm/CentOS_7/quobyte.repo
yum -y install quobyte-client

Grab /etc/quobyte/client.cfg from one of the existing compute nodes (log in to it through the controller) and copy the file to the new node.

Make sure the ib0 interface is set up with a static IP by creating or editing /etc/sysconfig/network-scripts/ifcfg-ib0 as follows:

DEVICE=ib0
BOOTPROTO=static
IPADDR=192.168.100.<node-number>
NETMASK=255.255.255.0
ONBOOT=yes
NM_CONTROLLED=no

where <node-number> is the number in the name of the node, e.g. for node0004 this will be 4.
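
The addressing rule above can be sketched as a pair of small shell functions. This assumes node names always have the form "node" followed by digits; the function names are hypothetical and only illustrate the mapping.

```shell
#!/bin/bash
# Hypothetical helper: map a node name such as node0004 to its ib0 address.
ib0_ip_for_node() {
    # Strip the "node" prefix, then any leading zeros, so node0004 becomes 4.
    number=$(printf '%s\n' "${1#node}" | sed 's/^0*//')
    printf '192.168.100.%s\n' "${number:-0}"
}

# Print the body of an ifcfg-ib0 file for the given node name.
render_ifcfg_ib0() {
    cat <<EOF
DEVICE=ib0
BOOTPROTO=static
IPADDR=$(ib0_ip_for_node "$1")
NETMASK=255.255.255.0
ONBOOT=yes
NM_CONTROLLED=no
EOF
}
```

For example, render_ifcfg_ib0 node0004 > /etc/sysconfig/network-scripts/ifcfg-ib0 would write the file with IPADDR=192.168.100.4.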

Bring the interface up by running ifup ib0

Create a mount point and tell Quobyte to mount the volume by restarting the client:

mkdir -p /vscaler/home
systemctl restart quobyte-client.service
systemctl enable quobyte-client.service

You should see the volume mounted, with files and directories in it:

# mount | grep vscaler/home
quobyte @ 192.168.100.201:7861|192.168.100.202:7861|192.168.100.203:7861 on /vscaler/home type fuse.quobyte (rw,nosuid,nodev,noatime,user_id=0,group_id=0,allow_other)
# ls -lah /vscaler/home/
total 428G
drwxr-xr-x. 1 root      root         0 Nov 18 10:00 .
drwxr-xr-x. 4 root      root        32 Nov 15 16:41 ..
drwx------. 1 nfsnobody nfsnobody    0 Nov 15 17:49 acaldas
drwx------. 1 nfsnobody nfsnobody    0 Nov 15 17:49 ccairoli
drwx------. 1      7872      7872    0 Aug 28 13:57 cfd_biosit
drwx--x---. 1 nfsnobody nfsnobody    0 Nov 15 17:49 rems
drwx------. 1 nfsnobody nfsnobody    0 Nov 18 09:51 rsupport
drwxrwx---. 1 nfsnobody nfsnobody    0 Nov  7 15:59 shared-BARTech
-rw-r--r--. 1 centos    centos    416G Oct 15 01:47 shared-BARTech.tgz
-rw-r--r--. 1 root      root       10G Nov  6 09:16 testfile
drwx------. 1 nfsnobody nfsnobody    0 Nov 15 17:48 tgratton
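
For a scripted version of the mount check, the sketch below looks for the mount point in a mount table. It reads /proc/mounts by default, where the first three whitespace-separated fields are device, mount point, and filesystem type; the function name is hypothetical and the fuse.quobyte type is taken from the output above.

```shell
#!/bin/bash
# Hypothetical helper: succeed if $1 is mounted with type fuse.quobyte.
# $2 optionally names an alternative mount table (defaults to /proc/mounts).
is_quobyte_mounted() {
    awk -v mp="$1" '
        $2 == mp && $3 == "fuse.quobyte" { found = 1 }
        END { exit !found }
    ' "${2:-/proc/mounts}"
}
```

A call such as is_quobyte_mounted /vscaler/home || echo "volume not mounted" could then be used to flag a node where the client did not come up.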

Log into the controller and add your node's IP with its appropriate nodeXXXX name to /etc/hosts. Then run the deployment playbooks:

cd /opt/vScaler/site/
ANSIBLE_HOST_KEY_CHECKING=False ansible-playbook controller.yml --skip-tags=luna -t bind
ansible-playbook static-compute.yml -l nodeXXXX

where nodeXXXX is the node name added to /etc/hosts.

When this is done, log back in to the node and make sure OFED is installed. A specific version, matching the node's kernel, has to be installed. SCP MLNX_OFED_LINUX-4.7-1.0.0.1-rhel7.6-x86_64.tgz from /root on the head node to the compute node. SSH in, extract the archive, cd into the new directory, run ./mlnxofedinstall, and reboot.

Unmount the /vscaler/home share and restart the Quobyte client so that it starts using LDAP:

umount /vscaler/home/
systemctl restart quobyte-client.service

At this point you should see proper users (instead of nfsnobody) on files in this shared home directory, like so:

# ls -lah /vscaler/home/
total 428G
drwxr-xr-x  1 root     root       0 Nov 18 10:00 .
drwxr-xr-x. 4 root     root      32 Oct 11 14:15 ..
drwx------  1 acaldas  BARTech    0 Nov 15 17:49 acaldas
drwx------  1 ccairoli BARTech    0 Nov 15 17:49 ccairoli
drwx------  1     7872    7872    0 Aug 28 13:57 cfd_biosit
drwx--x---  1 rems     BARTech    0 Nov 15 17:49 rems
drwx------  1 rsupport admins     0 Nov 18 09:51 rsupport
drwxrwx---  1 rems     BARTech    0 Nov  7 15:59 shared-BARTech
-rw-r--r--  1 centos   centos  416G Oct 15 01:47 shared-BARTech.tgz
-rw-r--r--  1 root     root     10G Nov  6 09:16 testfile
drwx------  1 tgratton BARTech    0 Nov 15 17:48 tgratton
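
The same comparison can be scripted by counting how many entries in an ls -l listing are still owned by nfsnobody. This is a sketch only; the function name is hypothetical, and it relies on the owner being the third column of ls -l output, as in the listings above.

```shell
#!/bin/bash
# Hypothetical helper: read `ls -l` output on stdin and print the number of
# entries whose owner (third column) is nfsnobody.
count_nfsnobody_owners() {
    awk '$3 == "nfsnobody" { n++ } END { print n + 0 }'
}
```

Running ls -la /vscaler/home/ | count_nfsnobody_owners should print 0 once the client resolves users through LDAP.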

Set up StarCCM+:

ln -s /vscaler/home/shared-BARTech/SIEMENS /opt/SIEMENS
yum install redhat-lsb-core -y 

Finally, add this line with the IP of the license server to /etc/hosts on the node:

62.7.66.229   BARTech-lic
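
If the node setup is scripted, the entry can be added idempotently. The sketch below appends the line only when no non-comment line already maps the name; the function name add_hosts_entry is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: add "<ip>   <name>" to a hosts file unless the name
# is already mapped. $3 optionally names the file (defaults to /etc/hosts).
add_hosts_entry() {
    ip="$1"
    name="$2"
    hosts_file="${3:-/etc/hosts}"
    # Look for the name in any hostname column of a non-comment line.
    if awk -v n="$name" '
        $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }
    ' "$hosts_file"; then
        return 0
    fi
    printf '%s   %s\n' "$ip" "$name" >> "$hosts_file"
}
```

For the license server this would be called as add_hosts_entry 62.7.66.229 BARTech-lic.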

Testing

A job for testing OFED and RDMA:

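#!/bin/bash
# PingPong needs two MPI ranks; one rank on each of two nodes makes the exchange cross the interconnect
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1

module add gnu8/8.3.0 openmpi3/3.1.4 imb/2018.1
mpirun IMB-MPI1 pingpong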
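
Before submitting the job, it can be worth confirming that the node runs the expected OFED release. The sketch below assumes that ofed_info -s (shipped with MLNX_OFED) prints a string beginning with MLNX_OFED_LINUX-<version>; the function name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the version string in $2 (for example the
# output of `ofed_info -s`) starts with MLNX_OFED_LINUX-<expected version $1>.
ofed_version_matches() {
    case "$2" in
        "MLNX_OFED_LINUX-$1"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

On the compute node this could be used as ofed_version_matches 4.7-1.0.0.1 "$(ofed_info -s)" || echo "unexpected OFED version".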

TODO: Add a basic Slurm job checking if StarCCM+ is set up correctly.