Iomart
Latest revision as of 12:10, 29 September 2020
Bartech
Adding a compute node from scratch
First add your node to Ironic and provision it with a regular CentOS 7 image.
Copy the public SSH key of the root user from the controller to /root/.ssh/authorized_keys on the node, so that Ansible can log into the node from the controller.
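A minimal sketch of installing the key by hand. The target directory below is a local stand-in so the snippet can be run anywhere; on the node the real target is /root/.ssh, and the key string is a placeholder:

```shell
# Stand-in for /root/.ssh on the new node; the key below is a placeholder.
target=./node-root-ssh
mkdir -p "$target" && chmod 700 "$target"
echo "ssh-rsa AAAAB3Nza...placeholder root@controller" >> "$target/authorized_keys"
chmod 600 "$target/authorized_keys"
stat -c '%a %n' "$target/authorized_keys"
```

In practice, running ssh-copy-id root@nodeXXXX from the controller (while password authentication is still enabled on the freshly provisioned node) does the same thing in one step.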
Install Quobyte client:
yum -y install java-1.8.0-openjdk-headless wget
cd /etc/yum.repos.d/
wget https://packages.quobyte.com/repo/3/8acxjFCHCQ7YMvxKmNEzhYTQ1kr9xA2e/rpm/CentOS_7/quobyte.repo
yum -y install quobyte-client
Grab /etc/quobyte/client.cfg from one of the existing compute nodes (log in to it through the controller) and copy the file to the new node.
Make sure the ib0 interface is set up with a static IP by editing/creating /etc/sysconfig/network-scripts/ifcfg-ib0 like the following:
DEVICE=ib0
BOOTPROTO=static
IPADDR=192.168.100.<node-number>
NETMASK=255.255.255.0
ONBOOT=yes
NM_CONTROLLED=no
where <node-number> is the number in the name of the node, e.g. for node0004 this will be 4.
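The file can be generated from the node name; a sketch, writing to the current directory for illustration (on the node the file belongs in /etc/sysconfig/network-scripts/, and node0004 is just an example name):

```shell
node=node0004                 # example node name
num=$((10#${node#node}))      # strip the "node" prefix; 10# avoids octal on leading zeros
cat > ifcfg-ib0 <<EOF
DEVICE=ib0
BOOTPROTO=static
IPADDR=192.168.100.${num}
NETMASK=255.255.255.0
ONBOOT=yes
NM_CONTROLLED=no
EOF
grep IPADDR ifcfg-ib0         # prints IPADDR=192.168.100.4
```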
Bring the interface up:
ifup ib0
Create a mount point and tell Quobyte to mount the volume by restarting the client:
mkdir /vscaler/home
systemctl restart quobyte-client.service
systemctl enable quobyte-client.service
You should see the volume mounted, with files and directories in it:
# mount | grep vscaler/home
quobyte@192.168.100.201:7861|192.168.100.202:7861|192.168.100.203:7861 on /vscaler/home type fuse.quobyte (rw,nosuid,nodev,noatime,user_id=0,group_id=0,allow_other)
# ls -lah /vscaler/home/
total 428G
drwxr-xr-x. 1 root      root         0 Nov 18 10:00 .
drwxr-xr-x. 4 root      root        32 Nov 15 16:41 ..
drwx------. 1 nfsnobody nfsnobody    0 Nov 15 17:49 acaldas
drwx------. 1 nfsnobody nfsnobody    0 Nov 15 17:49 ccairoli
drwx------. 1 7872      7872         0 Aug 28 13:57 cfd_biosit
drwx--x---. 1 nfsnobody nfsnobody    0 Nov 15 17:49 rems
drwx------. 1 nfsnobody nfsnobody    0 Nov 18 09:51 rsupport
drwxrwx---. 1 nfsnobody nfsnobody    0 Nov  7 15:59 shared-BARTech
-rw-r--r--. 1 centos    centos    416G Oct 15 01:47 shared-BARTech.tgz
-rw-r--r--. 1 root      root       10G Nov  6 09:16 testfile
drwx------. 1 nfsnobody nfsnobody    0 Nov 15 17:48 tgratton
Log into the controller and add your node's IP with its appropriate nodeXXXX name to /etc/hosts. Then run deployment playbooks:
cd /opt/vScaler/site/
ANSIBLE_HOST_KEY_CHECKING=False ansible-playbook controller.yml --skip-tags=luna -t bind
ansible-playbook static-compute.yml -l nodeXXXX
where nodeXXXX is the node name you added to /etc/hosts.
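The /etc/hosts step above can be sketched as follows; a local file stands in for /etc/hosts on the controller, and the IP and node name are examples:

```shell
# Local stand-in for /etc/hosts on the controller; IP and name are examples.
hosts_file=./hosts
echo "192.168.100.4 node0004" >> "$hosts_file"
grep node0004 "$hosts_file"   # prints 192.168.100.4 node0004
```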
When this is done, log back into the node and make sure OFED is installed. A specific version, matching the node's kernel, has to be installed. SCP MLNX_OFED_LINUX-4.7-1.0.0.1-rhel7.6-x86_64.tgz from /root on the head node to the compute node, SSH in, extract it, cd into the new directory, run ./mlnxofedinstall and reboot.
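The sequence looks roughly like this; only the kernel check is live here, the rest (shown as comments) must be run on the real head node and compute node:

```shell
# The OFED build must match the node's running kernel; check it first:
uname -r
# Then, from the head node (paths as in the text):
#   scp /root/MLNX_OFED_LINUX-4.7-1.0.0.1-rhel7.6-x86_64.tgz nodeXXXX:/root/
#   ssh nodeXXXX
#   tar xzf MLNX_OFED_LINUX-4.7-1.0.0.1-rhel7.6-x86_64.tgz
#   cd MLNX_OFED_LINUX-4.7-1.0.0.1-rhel7.6-x86_64
#   ./mlnxofedinstall
#   reboot
```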
Unmount the /vscaler/home share and restart the Quobyte client so that it starts using LDAP:
umount /vscaler/home/
systemctl restart quobyte-client.service
At this point you should see proper users (instead of nfsnobody) on files in this shared home directory, like so:
# ls -lah /vscaler/home/
total 428G
drwxr-xr-x  1 root     root        0 Nov 18 10:00 .
drwxr-xr-x. 4 root     root       32 Oct 11 14:15 ..
drwx------  1 acaldas  BARTech     0 Nov 15 17:49 acaldas
drwx------  1 ccairoli BARTech     0 Nov 15 17:49 ccairoli
drwx------  1 7872     7872        0 Aug 28 13:57 cfd_biosit
drwx--x---  1 rems     BARTech     0 Nov 15 17:49 rems
drwx------  1 rsupport admins      0 Nov 18 09:51 rsupport
drwxrwx---  1 rems     BARTech     0 Nov  7 15:59 shared-BARTech
-rw-r--r--  1 centos   centos   416G Oct 15 01:47 shared-BARTech.tgz
-rw-r--r--  1 root     root      10G Nov  6 09:16 testfile
drwx------  1 tgratton BARTech     0 Nov 15 17:48 tgratton
Set up StarCCM+:
ln -s /vscaler/home/shared-BARTech/SIEMENS /opt/SIEMENS
yum install redhat-lsb-core -y
Finally, add this line, with the IP of the license server, to /etc/hosts on the node:
62.7.66.229 BARTech-lic
Testing
A job for testing OFED and RDMA:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
module add gnu8/8.3.0 openmpi3/3.1.4 imb/2018.1
mpirun IMB-MPI1 pingpong
TODO: Add a basic Slurm job checking if StarCCM+ is set up correctly.
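A possible starting point for this TODO, sketched only: the wildcarded version directory and the starccm+ binary path are assumptions and must be adjusted to the actual install under /opt/SIEMENS. The snippet writes the job script and checks its syntax without submitting it:

```shell
cat > starccm-check.sh <<'EOF'
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
# Print the version; this fails if the /opt/SIEMENS symlink or the license host is broken.
# NOTE: the path below is an assumption; adjust to the installed version.
/opt/SIEMENS/STAR-CCM+*/star/bin/starccm+ -version
EOF
bash -n starccm-check.sh && echo "syntax OK"
```

Submit with sbatch starccm-check.sh once the path is confirmed.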