OpenHPC:Flexboot Configuration
This page covers specific configuration needed when provisioning over FlexBoot with Warewulf.
General setup
The configuration in this section applies to both Ethernet and InfiniBand interfaces.
Provisioning Interface
Set the 'network device' entry in /etc/warewulf/provision.conf to the correct interface. For example, if using InfiniBand and the interface on the headnode is ib0, set it as follows:
network device = ib0
Bootstrap
The Bootstrap image needs to contain any drivers needed for the compute node to recognise the Mellanox adapter. To ensure the necessary drivers are included, edit /etc/warewulf/bootstrap.conf and uncomment the Mellanox and Infiniband drivers.
[root@test-node ~]# cat /etc/warewulf/bootstrap.conf
# List of drivers that should be included in the built initramfs. This could
# be a list of relative paths starting from /lib/modules/`uname -r`/, or
# specific driver names that you want to integrate.
drivers += kernel/drivers/net/, kernel/drivers/scsi/, kernel/drivers/ata/
drivers += kernel/drivers/virtio/, kernel/drivers/message/, kernel/drivers/md/
drivers += kernel/drivers/block/, kernel/drivers/usb/host/, kernel/drivers/firmware/efi/
drivers += nfs, nfsd, nfs_common, nfsv4
drivers += fuse, ext2, ext3, ext4
drivers += fat, vfat, nls_cp437, nls_ascii, efivarfs
drivers += ipmi_si, ipmi_devintf
# Infiniband drivers and Mellanox drivers
drivers += ib_ipath, ib_iser, ib_srpt, ib_sdp, ib_mthca, ib_qib, iw_cxgb3, cxgb3
drivers += iw_nes, mlx4_ib, ib_srp, ib_ipoib, ib_addr, rdma_cm, ib_ucm
drivers += ib_ucm, ib_uverbs, ib_umad, ib_cm, ib_mad, iw_cm, ib_core
drivers += rdma_ucm, ib_sa, mlx4_en, mlx4_core,mlx5_core
drivers += rds, rds_rdma, rds_tcp, mlx4_vnic, mlx4_vnic_helper
drivers += ib_isert, xprtrdma
# Required for getting CPU frequency scaling to work right (SpeedStep/Turbo)
drivers += kernel/drivers/cpufreq/, kernel/arch/x86/kernel/cpu/cpufreq/
# What Firmware images should be included in the bootstrap image.
firmware += 3com/*, acenic/*, bnx2*, cxgb3/*, e100/*, myricom/*, ti_*
firmware += tigon/*
# Modules that should be called directly by /sbin/modprobe (including
# options if any exist). Modules will be loaded in the order given, before
# automatic hardware detection.
modprobe += uhci-hcd, ohci-hcd, ehci-hcd, whci-hcd, isp116x-hcd, isp1362-hcd
modprobe += xhci-hcd, sl811-hcd, sd_mod
modprobe += mlx4_core log_num_mtts=20 log_mtts_per_seg=6, ib_srp,mlx5_core
#OpenHPC additions for SLES12
drivers += af_packet, dns_resolver, auth_rpcgss, lockd, sunrpc
drivers += dm_mod, dcdbas, dell_rbu, ipmi_msghandler, shpchp, ehci_pci
drivers += wmi, processor, button, sg, igb, i2c_algo_bit, dca, ptp
drivers += pps_core, ahci, libahci, libata, scsi_mod
drivers += nfs_acl, nfsv3, nfsv4, coretemp
drivers += updates/kernel/
KARGS
During the PXE process the bootstrap kernel will attempt to run, but it might not load the Mellanox kernel modules, in which case provisioning fails. To force the kernel to load the modules, set the KARGS variable. At this point I am not sure exactly which modules are needed, so I list them all.
wwsh provision set <node> --kargs="ib_addr ib_core ib_mad ib_sa ib_cm ib_uverbs ib_ucm ib_umad iw_cm rdma_cm rdma_ucm mlx4_core mlx4_ib ib_mthca ib_ipoib mlx5_core mlx5_ib"
If booting over Ethernet, the ib_* modules should not be needed, but there is no harm in including them anyway. From what I have been able to gather, only mlx4_core or mlx5_core is needed for Ethernet.
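The module list can be kept in one place and joined into the --kargs value. The following is a minimal sketch and not part of Warewulf: the build_kargs function, its ib/eth argument, and the compute01 node name are hypothetical names introduced here for illustration. The Ethernet list only reflects the observation that mlx4_core or mlx5_core appears to be sufficient. The script prints the command rather than running it, so it can be reviewed first.

```shell
#!/bin/bash
# Hypothetical helper: print the module list to pass to --kargs.
# Module names are copied from the wwsh example on this page.
build_kargs() {
    local fabric="$1"
    case "$fabric" in
        ib)
            # Full list, in the same order as the wwsh example.
            echo "ib_addr ib_core ib_mad ib_sa ib_cm ib_uverbs ib_ucm ib_umad iw_cm rdma_cm rdma_ucm mlx4_core mlx4_ib ib_mthca ib_ipoib mlx5_core mlx5_ib"
            ;;
        eth)
            # Assumption: only the core Mellanox drivers are needed.
            echo "mlx4_core mlx5_core"
            ;;
        *)
            echo "usage: build_kargs ib|eth" >&2
            return 1
            ;;
    esac
}

# Print (do not run) the command for a placeholder node name.
node="compute01"
echo "wwsh provision set ${node} --kargs=\"$(build_kargs ib)\""
```

Calling build_kargs eth instead produces the shorter Ethernet-only list.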
DHCP
The default DHCP configuration provided by Warewulf won't work with FlexBoot. Although FlexBoot is a modified iPXE, the DHCP server does not recognise it as such and so provides an incorrect configuration. To get around this, edit /etc/warewulf/dhcpd-template.conf to look like the following:
[root@test-node ~]# cat /etc/warewulf/dhcpd-template.conf
allow booting;
allow bootp;
ddns-update-style interim;
authoritative;
option space ipxe;
# Tell iPXE to not wait for ProxyDHCP requests to speed up boot.
option ipxe.no-pxedhcp code 176 = unsigned integer 8;
option ipxe.no-pxedhcp 1;
option architecture-type code 93 = unsigned integer 16;
filename "http://%{IPADDR}/WW/ipxe/cfg/${mac}";
subnet %{NETWORK} netmask %{NETMASK} {
not authoritative;
# option interface-mtu 9000;
option subnet-mask %{NETMASK};
}
# Node entries will follow below
Note that the only change made is commenting out the architecture detection and forcing DHCP to provide each node with 'filename "http://%{IPADDR}/WW/ipxe/cfg/${mac}";'.
These files are in /srv/warewulf/ipxe/cfg on the headnode and contain the PXE configuration for each node. Without this change FlexBoot is detected incorrectly and tries to use undionly.kpxe, which does not work because FlexBoot does not support chainloading PXE (see table 7 of http://www.mellanox.com/related-docs/prod_software/FlexBoot-3_5_305_for_ConnectX4_ConnectX-5_release_notes.pdf).
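A quick sanity check on the edited template can catch a leftover architecture block before the DHCP configuration is regenerated. The following is a rough sketch under stated assumptions: check_template is a hypothetical function name, and the check only looks for the filename line shown on this page and for any uncommented mention of undionly.kpxe. It does not validate the rest of the file.

```shell
#!/bin/bash
# Hypothetical check: confirm a dhcpd template hands out the per-node
# iPXE config URL and has no active reference to undionly.kpxe.
check_template() {
    local template="$1"
    # The filename line must be present and not commented out.
    if ! grep -Eq '^[[:space:]]*filename "http://%\{IPADDR\}/WW/ipxe/cfg/\$\{mac\}";' "$template"; then
        echo "missing per-node filename line" >&2
        return 1
    fi
    # Drop comment lines, then look for any remaining undionly.kpxe reference.
    if grep -Ev '^[[:space:]]*#' "$template" | grep -q 'undionly\.kpxe'; then
        echo "undionly.kpxe still referenced" >&2
        return 1
    fi
    echo "template looks OK"
}

# Example: check_template /etc/warewulf/dhcpd-template.conf
```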
Infiniband
Some further changes are needed when booting over InfiniBand.
HWPREFIX and HWADDR
InfiniBand devices have a 20-byte GUID, which is composed of a 12-byte prefix and an 8-byte hardware address. In Warewulf, set the hwprefix on each node to 'ff:00:00:00:00:00:02:00:00:02:c9:00'. This prefix is hard-coded in FlexBoot and should be constant across all installations. Set the hwaddr to the last 8 bytes of the GUID.
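The split into prefix and hardware address is simple string slicing. The sketch below illustrates it on the example address used on this page. The split_ib_addr function is a hypothetical helper, not a Warewulf command, and it assumes the address is written as 20 colon-separated hex bytes.

```shell
#!/bin/bash
# Hypothetical helper: split a colon-separated 20-byte address into the
# 12-byte prefix and the 8-byte hardware address.
split_ib_addr() {
    local addr="$1"
    # 20 hex bytes joined by colons (59 characters in total).
    local re='^([0-9a-fA-F]{2}:){19}[0-9a-fA-F]{2}$'
    if [[ ! "$addr" =~ $re ]]; then
        echo "expected 20 colon-separated hex bytes" >&2
        return 1
    fi
    echo "hwprefix=${addr:0:35}"   # first 12 bytes (35 characters)
    echo "hwaddr=${addr:36}"       # last 8 bytes, skipping the joining colon
}

split_ib_addr "ff:00:00:00:00:00:02:00:00:02:c9:00:e4:1d:2d:03:00:01:09:91"
```

For the address above this prints hwprefix=ff:00:00:00:00:00:02:00:00:02:c9:00 and hwaddr=e4:1d:2d:03:00:01:09:91, which match the values in the node listing.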
[root@head ~]# wwsh node print compute01
#### compute01 ################################################################
compute01: ID = 14
compute01: NAME = compute01
compute01: NODENAME = compute01
compute01: ARCH = x86_64
compute01: CLUSTER = UNDEF
compute01: DOMAIN = UNDEF
compute01: GROUPS = UNDEF
compute01: ENABLED = TRUE
compute01: ib0.HWADDR = e4:1d:2d:03:00:01:09:91
compute01: ib0.HWPREFIX = ff:00:00:00:00:00:02:00:00:02:c9:00
compute01: ib0.IPADDR = 192.168.20.11
compute01: ib0.NETMASK = 255.255.255.0
compute01: ib0.NETWORK = UNDEF
compute01: ib0.GATEWAY = 192.168.20.1
compute01: ib0.MTU = UNDEF
compute01: ib0.FQDN = UNDEF
With these set, update the PXE and DHCP configurations by running:
wwsh pxe update
wwsh dhcp update
systemctl restart dhcpd
To confirm that the changes were applied correctly, look at /etc/dhcp/dhcpd.conf. Each host entry should now contain an option called dhcp-client-identifier that is set to the full 20-byte GUID.
host compute01-ib0 {
option host-name compute01;
option routers 192.168.20.1;
option dhcp-client-identifier = ff:00:00:00:00:00:02:00:00:02:c9:00:e4:1d:2d:03:00:01:09:91;
fixed-address 192.168.20.11;
next-server 192.168.20.1;
}
Spoof 6 byte address configs
FlexBoot uses 8-byte addresses, but PXE only recognises 6-byte addresses. To ensure that PXE can fetch the correct configuration, create symbolic links in /srv/warewulf/ipxe/cfg, named with the 6-byte address, that point to the config file for the 8-byte address. To convert an 8-byte address to a 6-byte address, keep the first 3 and last 3 bytes and remove the middle 2 bytes (most likely 03:00).
lrwxrwxrwx 1 root root 23 Mar 12 09:30 e4:1d:2d:01:09:91 -> e4:1d:2d:03:00:01:09:91
A simple bash script to create these symbolic links:
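#!/bin/bash
# Create a 6-byte symlink for every 8-byte FlexBoot config file.
PXE_CONF_DIR=/srv/warewulf/ipxe/cfg
cd "${PXE_CONF_DIR}" || exit 1
for mac in *; do
    # Only process 8-byte names (23 characters); this skips existing 6-byte links.
    [ "${#mac}" -eq 23 ] || continue
    # Keep the first 3 and last 3 bytes; link relatively, as in the listing above.
    ln -sf "${mac}" "${mac:0:9}${mac:15}"
done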
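After the links are created, it can be useful to confirm that every 8-byte config file has a 6-byte counterpart that resolves. The following is a minimal sketch of such a check. The short_mac and check_links functions are hypothetical helpers introduced here, and the check assumes that 8-byte configs are exactly the 23-character names in the directory.

```shell
#!/bin/bash
# Hypothetical helpers for checking the 6-byte links.

# Convert an 8-byte name to its 6-byte form by dropping bytes 4 and 5.
short_mac() {
    local mac="$1"
    echo "${mac:0:9}${mac:15}"
}

# Report every 8-byte config in a directory that lacks a working 6-byte link.
# Returns 0 if all links resolve, 1 otherwise.
check_links() {
    local dir="$1" path name missing=0
    for path in "$dir"/*; do
        name=$(basename "$path")
        [ "${#name}" -eq 23 ] || continue      # only 8-byte names
        # -e follows symlinks, so a dangling link also counts as missing.
        if [ ! -e "$dir/$(short_mac "$name")" ]; then
            echo "missing link for $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example: check_links /srv/warewulf/ipxe/cfg
```

Running the check after every wwsh pxe update would show whether newly added nodes still need their links created.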