Genomics England

From Define Wiki
Revision as of 08:46, 2 July 2020 by Mariusz (talk | contribs) (→‎Dragen migration to P2 (Helix): Use a different name for Dragen Ironic nodes)

Dragen nodes

General info

  • also known as "Edico" nodes
  • no RAID in Dragen nodes (only one disk)
  • a local NVMe drive in each (used as cache for data)
  • boot interface on enp134s0f0


Dragen migration to P2 (Helix)

First, wait for GEL to physically move Dragen boxes from P1 to P2.

To proceed you will need the following:

  • iDRAC IP addresses of nodes accessible from P2
  • provisioning network interfaces connected to the public304 provisioning network
  • a confirmation of whether the node has to go to the dev or prod cluster

When you get the iDRAC address and credentials, log in to https://<idrac-address> and change the boot order so that PXE/network boot from a 10G card is first on the list. Also, write down the MAC address of the first PCI network interface (the one this node will be booting from).
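The iDRAC UI tends to display MAC addresses dash-separated and uppercase, while Ironic's `baremetal port create` takes the usual colon-separated lowercase form. A minimal normaliser sketch (the example MAC below is made up):

```shell
#!/bin/sh
# Normalize a MAC copied from the iDRAC UI (e.g. "A4-BF-01-2B-3C-4D")
# into the colon-separated lowercase form used with `baremetal port create`.
normalize_mac() {
    printf '%s\n' "$1" | tr 'A-F' 'a-f' | tr '-' ':'
}

normalize_mac "A4-BF-01-2B-3C-4D"   # -> a4:bf:01:2b:3c:4d
```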

When the rest of the requirements are fulfilled, you can add Dragen nodes to Ironic:

# openstack image list
...
| 36a1c5fc-ff1c-40dc-9b87-75197e73257a | ironic-deploy_kernel                                | active |
| e73198d9-605b-45a8-84cb-c828599e59ca | ironic-deploy_ramdisk                               | active |
...
# openstack network list
...
| ab24b469-e07d-44ca-8bee-5c24d6c455e4 | public304                                          | a00b07c1-0a1b-4a36-91cb-5e0d51ec9258 |
...
# openstack baremetal node create --name edico-dragen016 \
    --driver idrac \
    --driver-info drac_address=10.6.6.44 \
    --driver-info drac_username=<idrac-user> \
    --driver-info drac_password=<idrac-password> \
    --driver-info cleaning_network=ab24b469-e07d-44ca-8bee-5c24d6c455e4 \
    --driver-info provisioning_network=ab24b469-e07d-44ca-8bee-5c24d6c455e4 \
    --driver-info deploy_kernel=36a1c5fc-ff1c-40dc-9b87-75197e73257a \
    --driver-info deploy_ramdisk=e73198d9-605b-45a8-84cb-c828599e59ca \
    --resource-class baremetal \
    --network-interface flat
# openstack baremetal node list | grep edico
| ca6bd4f3-54f2-4877-8fba-db86f691a849 | edico-dragen016 | None                                 | None        | enroll             | False       |
# openstack baremetal port create <10g-interface-mac> --node ca6bd4f3-54f2-4877-8fba-db86f691a849
# openstack baremetal node manage edico-dragen016
# openstack baremetal node provide edico-dragen016
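When scripting the steps above, copying the UUID out of the table by hand is error-prone. A small sketch that pulls it out of the `node list` output by name (with a live cloud, `openstack baremetal node show <name> -f value -c uuid` should give the same thing directly):

```shell
#!/bin/sh
# Extract a node UUID from `openstack baremetal node list` table output
# by node name. Reads the table on stdin.
node_uuid() {   # $1 = node name
    awk -F'|' -v n="$1" '$3 ~ n { gsub(/ /, "", $2); print $2 }'
}

# Fed with the sample row from this page:
node_uuid edico-dragen016 <<'EOF'
| ca6bd4f3-54f2-4877-8fba-db86f691a849 | edico-dragen016 | None | None | enroll | False |
EOF
# -> ca6bd4f3-54f2-4877-8fba-db86f691a849
```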

Then provision the newly added node with an operating system:

# openstack server create \
    --image centos7-1907-dhcp-on-enp134s0f0-raid \
    --flavor baremetal.small \
    --security-group ping-and-ssh \
    --key-name mykey \
    --network public304 \
    <t/p>hpgridzdragXXX

After provisioning, log into the instance as the centos user and add this public key to root's ~/.ssh/authorized_keys for passwordless SSH from the HPC controller:

ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIBDEKTyKSRBpHcjgG16LF5mav11lEwbot1lmTPjvZPr6 cluster key
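The key-install step above can be done idempotently, so re-running it never duplicates the entry. A sketch (the function takes the file path as an argument so it can be rehearsed against a scratch file first):

```shell
#!/bin/sh
# Idempotently append the cluster public key (from this page) to an
# authorized_keys file, creating the directory/file with sane modes.
KEY='ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIBDEKTyKSRBpHcjgG16LF5mav11lEwbot1lmTPjvZPr6 cluster key'

add_cluster_key() {   # $1 = authorized_keys path
    dir=$(dirname "$1")
    mkdir -p "$dir" && chmod 700 "$dir"
    touch "$1" && chmod 600 "$1"
    # -qxF: whole-line fixed-string match, so re-runs are no-ops
    grep -qxF "$KEY" "$1" || printf '%s\n' "$KEY" >> "$1"
}

# On the Dragen node, as root:
# add_cluster_key /root/.ssh/authorized_keys
```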

The next step is to run the Dragen deployment using the Dragen role (https://gitlab.vscaler.com/mkarpiarz/ansible-dragen-role) and the trinityX playbooks residing on the HPC controller (vcontroller) in the main environment.

For this, you will have to add the new baremetal instance to DNS first. From GEL's controller001, get into the HPC controller and run the following commands:

# ssh vc
$ cd /opt/vScaler/site/
$ vim /etc/hosts

Here add the IP and the name of the new Dragen node in the "Dragen boxes" section.
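The /etc/hosts edit can also be scripted idempotently; a sketch, with the file path as an argument so it can be tried on a copy first (the IP and node name below are hypothetical):

```shell
#!/bin/sh
# Add an "<ip>	<name>" line to a hosts file unless the name is
# already present.
add_hosts_entry() {   # $1=ip  $2=hostname  $3=hosts file
    grep -qw "$2" "$3" 2>/dev/null || printf '%s\t%s\n' "$1" "$2" >> "$3"
}

# Example (hypothetical values), run on vcontroller:
# add_hosts_entry 10.6.7.16 hpgridzdrag016 /etc/hosts
```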

$ vim hosts

Here, add only the name of the node to either the dragen_dev group (dev cluster) or the dragen group (production cluster), then run this playbook to update the DNS server:

$ ansible-playbook controller.yml -t bind

Ping the new node from the controller using its name to confirm the new DNS entry.

With the node added to DNS, run the Dragen deployment with this command:

$ ansible-playbook dragen.yml

Then run the compute playbook to add mounts, Slurm workers, etc. on Dragen nodes:

$ ansible-playbook static-compute.yml -l dragen,dragen_dev

Add node to LSF

Next up, set up LSF on the node -- this is currently a manual procedure and there is no playbook for it:

TODO: Add commands for this.

Set up SSSD for AD

Finally, set up SSSD (also a manual step). Copy /etc/krb5.keytab, /etc/sssd/sssd.conf and /etc/pki/ca-trust/source/anchors/cluster-ca.crt from the HPC controller (or any compute node) and put them in the same location on the Dragen node. Then run (as root) the following commands:

# yum install sssd realmd -y
# chmod 600 /etc/sssd/sssd.conf
# update-ca-trust
# chown root:root /etc/krb5.keytab
# restorecon /etc/krb5.keytab
# systemctl restart sssd
# systemctl enable sssd
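A quick sanity-check sketch for the permissions and ownership set in the manual steps above (GNU stat assumed, as on CentOS 7; the functions take paths as arguments so they can be rehearsed on scratch files):

```shell
#!/bin/sh
# Verify file mode and ownership match what the SSSD setup steps expect.
check_mode() {    # $1=file  $2=expected octal mode, e.g. 600
    [ "$(stat -c '%a' "$1")" = "$2" ]
}
check_owner() {   # $1=file  $2=expected user:group
    [ "$(stat -c '%U:%G' "$1")" = "$2" ]
}

# On the Dragen node:
# check_mode /etc/sssd/sssd.conf 600 || echo "sssd.conf has wrong mode"
# check_owner /etc/krb5.keytab root:root || echo "keytab has wrong owner"
```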