Genomics England
Dragen nodes
General info
- also known as "Edico" nodes
- no RAID in Dragen nodes (only one disk)
- a local NVMe drive in each (used as cache for data)
- boot interface on enp134s0f0
Dragen migration to P2 (Helix)
First, wait for GEL to physically move Dragen boxes from P1 to P2.
To proceed you will need the following:
- iDRAC IP addresses of nodes accessible from P2
- provisioning network interfaces connected to the public304 provisioning network
- a confirmation of whether the node has to go to the dev or prod cluster
When the above requirements are fulfilled, you can add Dragen nodes to Ironic:
# openstack image list
...
| 36a1c5fc-ff1c-40dc-9b87-75197e73257a | ironic-deploy_kernel  | active |
| e73198d9-605b-45a8-84cb-c828599e59ca | ironic-deploy_ramdisk | active |
...
# openstack network list
...
| ab24b469-e07d-44ca-8bee-5c24d6c455e4 | public304 | a00b07c1-0a1b-4a36-91cb-5e0d51ec9258 |
...
# openstack baremetal node create --name edico01 --driver idrac \
    --driver-info drac_address=10.6.6.44 \
    --driver-info drac_username=<idrac-user> \
    --driver-info drac_password=<idrac-password> \
    --driver-info cleaning_network=ab24b469-e07d-44ca-8bee-5c24d6c455e4 \
    --driver-info provisioning_network=ab24b469-e07d-44ca-8bee-5c24d6c455e4 \
    --driver-info deploy_kernel=36a1c5fc-ff1c-40dc-9b87-75197e73257a \
    --driver-info deploy_ramdisk=e73198d9-605b-45a8-84cb-c828599e59ca \
    --resource-class baremetal --network-interface flat
# openstack baremetal node list | grep edico
# openstack baremetal port create 98:03:9b:8c:61:b2 --node ca6bd4f3-54f2-4877-8fba-db86f691a849
# openstack baremetal node manage edico01
# openstack baremetal node provide edico01
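Once the port has been created and the node has been managed and provided, it should settle in the "available" provision state. A quick way to check with the standard Ironic CLI:
# openstack baremetal node show edico01 -f value -c provision_state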
Then provision the newly added node with an operating system:
# openstack server create --image centos7-1907-dhcp-on-enp134s0f0-raid \
    --flavor baremetal.small --security-group ping-and-ssh --key-name mykey \
    --network public304 <t/p>hpgridzdragXXX
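Baremetal provisioning can take a while. You can watch the build status and, once the instance is ACTIVE, read off the address it got on the public304 network (instance name as used above):
# openstack server show <t/p>hpgridzdragXXX -f value -c status -c addresses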
After provisioning, log into the instance as the centos user and add this public key to root's ~/.ssh/authorized_keys for passwordless SSH from the HPC controller:
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIBDEKTyKSRBpHcjgG16LF5mav11lEwbot1lmTPjvZPr6 cluster key
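One way to do this from the instance itself (a hedged sketch; it assumes the centos user has sudo access, which the cloud image normally provides):
$ sudo mkdir -p /root/.ssh && sudo chmod 700 /root/.ssh
$ echo 'ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIBDEKTyKSRBpHcjgG16LF5mav11lEwbot1lmTPjvZPr6 cluster key' | sudo tee -a /root/.ssh/authorized_keys
$ sudo chmod 600 /root/.ssh/authorized_keys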
The next step is to run the Dragen deployment using the Dragen role (https://gitlab.vscaler.com/mkarpiarz/ansible-dragen-role) and the trinityX playbooks residing on the HPC controller (vcontroller) in the main environment.
For this, you will have to add the new baremetal instance to DNS first. From GEL's controller001, get into the HPC controller and run the following commands:
# ssh vc
$ cd /opt/vScaler/site/
$ vim /etc/hosts
Here add the IP and the name of the new Dragen node in the "Dragen boxes" section.
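For example, an entry in the "Dragen boxes" section might look like this (the IP address and hostname below are purely illustrative; use the node's real ones):
10.141.255.101   phpgridzdrag001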
$ vim hosts
Here add only the name of the node in the dragen_dev (dev cluster) or dragen (production cluster) group.
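For example, for a production node (placeholder hostname; the inventory is assumed to use standard INI-style Ansible groups):
[dragen]
phpgridzdrag001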
And then run this playbook to update the DNS server:
$ ansible-playbook controller.yml -t bind
Ping the new node from the controller using its name to confirm the new DNS entry.
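For example (placeholder hostname):
$ ping -c 3 phpgridzdrag001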
With the node added to DNS, run the Dragen deployment with this command:
$ ansible-playbook dragen.yml
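As a quick sanity check afterwards, you can confirm the DRAGEN software landed on the node. This is a hedged example: it assumes the role installs the usual Edico layout under /opt/edico and uses a placeholder hostname:
$ ssh phpgridzdrag001 '/opt/edico/bin/dragen --version'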
Then run the compute playbook to add mounts, Slurm workers, etc. on the Dragen nodes:
$ ansible-playbook static-compute.yml -l dragen,dragen_dev
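A quick sanity check after the playbook finishes (hedged; the hostname is a placeholder and the exact mount points depend on the site configuration):
$ ssh phpgridzdrag001 'df -h; systemctl status slurmd'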
Add node to LSF
Next up, set up LSF on the node -- this is currently a manual procedure and there is no playbook for it:
TODO: Add commands for this.
Set up SSSD for AD
Finally, set up SSSD (also a manual step). Copy /etc/krb5.keytab, /etc/sssd/sssd.conf and /etc/pki/ca-trust/source/anchors/cluster-ca.crt from the HPC controller (or any compute node) and put them in the same locations on the Dragen node.
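For example, pushing the files straight from the controller (a hedged sketch; it relies on the root SSH access set up earlier, uses a placeholder hostname, and pre-creates /etc/sssd since sssd is not installed yet):
# ssh root@phpgridzdrag001 'mkdir -p /etc/sssd'
# scp /etc/krb5.keytab root@phpgridzdrag001:/etc/krb5.keytab
# scp /etc/sssd/sssd.conf root@phpgridzdrag001:/etc/sssd/sssd.conf
# scp /etc/pki/ca-trust/source/anchors/cluster-ca.crt root@phpgridzdrag001:/etc/pki/ca-trust/source/anchors/cluster-ca.crt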
Then run (as root) the following commands on the Dragen node:
# yum install sssd realmd -y
# chmod 600 /etc/sssd/sssd.conf
# update-ca-trust
# chown root:root /etc/krb5.keytab
# restorecon /etc/krb5.keytab
# systemctl restart sssd
# systemctl enable sssd
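To confirm the node can resolve AD accounts through SSSD, look up a known domain user (placeholder username):
# id <ad-username>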