Difference between revisions of "OpenHPC:Headnode install"
(Prevent slow ssh into c0.) |
m (Use variable for gateway.) |
Revision as of 09:36, 7 October 2016
Basic Initial System Configuration:
Prep: Make a note of the MAC addresses of the compute nodes' interfaces. On this hardware the MAC address stickers faced the front of the drawer, which was misleading about the left-right order of the interfaces; they should have faced the rear of the drawer. Conclusion: if a compute node is unable to connect to the head node, the MAC addresses may be in the reverse of the order you expect.
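If a compute node can be booted into any Linux environment, the MAC addresses can also be read from the running system rather than from the stickers. The following is a minimal sketch, not part of the original procedure: `list_macs` is a hypothetical helper name, and it assumes the standard Linux sysfs layout under /sys/class/net.

```shell
#!/bin/bash
# Print "<interface> <MAC>" for every network interface except loopback.
# The sysfs root is a parameter only so the function can be pointed at
# a different directory (for example when trying it out).
list_macs() {
    local root="${1:-/sys/class/net}" dev
    for dev in "$root"/*; do
        # Skip entries without a readable address file.
        [ -r "$dev/address" ] || continue
        # Skip the loopback device.
        [ "$(basename "$dev")" = "lo" ] && continue
        printf '%s %s\n' "$(basename "$dev")" "$(cat "$dev/address")"
    done
}

list_macs
```

Comparing this output with the sticker order shows directly whether the interfaces are reversed.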
Note: RPMs and repos are available at http://build.openhpc.community/OpenHPC:/
OpenHPC is designed to deploy CentOS 7.x based clusters, so first install a fresh copy of CentOS 7.x onto the head node. Modify the partitioning so that the “/” partition is a reasonable size.
service NetworkManager stop
service iptables stop
chkconfig NetworkManager off
chkconfig iptables off
chkconfig firewalld off
setenforce 0
vi /etc/selinux/config
Modify the system hostname to use a fully qualified domain name. Also configure the network interfaces so that there is one private and one public interface: eno1 must be the private (provisioning) interface and eno2 can be the public interface. There are alternate ways to specify which interface to use during provisioning, but with CentOS 7.1 I have been unable to find them.
echo "head.ohpc.net" > /etc/hostname
hostnamectl set-hostname head.ohpc.net
# Private interface must take a "static" (not "dhcp") address IPADDR that will be used later. NETMASK=255.255.255.0 should also be set.
vi /etc/sysconfig/network-scripts/ifcfg-eno1
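As an illustration of the static settings described in the comment above, here is a minimal sketch of what the private interface file might contain. `print_ifcfg_static` is a hypothetical helper, `ONBOOT=yes` is an assumption not stated in these notes, and the address shown is the example head node IP used later on this page.

```shell
#!/bin/bash
# Print a minimal static ifcfg file for the private (provisioning) interface.
# Arguments: device name, IP address, netmask.
print_ifcfg_static() {
    cat <<EOF
DEVICE=$1
BOOTPROTO=static
IPADDR=$2
NETMASK=$3
ONBOOT=yes
EOF
}

# Prints to stdout for review; redirect into
# /etc/sysconfig/network-scripts/ifcfg-eno1 once the values look right.
print_ifcfg_static eno1 10.10.10.1 255.255.255.0
```

Printing rather than writing the file keeps any existing settings (such as HWADDR or UUID lines) from being overwritten by accident.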
vi /etc/sysconfig/network-scripts/ifcfg-eno2
Yum update the system to the latest package versions, install additional packages and reboot.
yum -y install kernel* tk* tcl* tigervnc* ipmitool* freeipmi* cairo* perl* gcc* glibc* screen epel-release vim ntp libnl lsof libxml2-python python mlocate numactl* yum-utils stop xinetd
yum -y groupinstall "Development Tools" "X Window System" "Base"
yum -y update
Add the following variables to your bashrc, as you will be typing them frequently.
CHROOT=/opt/ohpc/admin/images/centos7.1
ohpc_repo=http://build.openhpc.community/OpenHPC:/1.1/CentOS_7.2/OpenHPC:1.1.repo
sms_name=head.ohpc.net # Hostname of Headnode
sms_ip=10.10.10.1 # Private Interface IP of Headnode
sms_eth_internal=eno1 # Private Interface of Headnode
eth_provision=eno1 # Provisioning Interface of Headnode
internal_netmask=255.255.255.0 # Netmask of Private Interface
ntp_server=0.centos.pool.ntp.org # Some NTP Server
bmc_username=ADMIN
bmc_password=ADMIN
sms_ipoib=10.10.20.1 # IPoIB Address of Headnode
ipoib_netmask=255.255.255.0 # IPoIB Netmask of Headnode
source /root/.bashrc
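Because most later commands expand these variables, an empty one can silently produce a wrong path or address. The following is a minimal sketch of a sanity check to run after sourcing the file; `check_vars` is a hypothetical helper, not part of OpenHPC or Warewulf, and it relies on bash indirect expansion.

```shell
#!/bin/bash
# Report any of the named variables that are unset or empty.
# Returns 0 when all are set, 1 otherwise.
check_vars() {
    local name missing=0
    for name in "$@"; do
        # ${!name} is bash indirect expansion: the value of the variable called $name.
        if [ -z "${!name:-}" ]; then
            echo "missing: ${name}" >&2
            missing=1
        fi
    done
    return "$missing"
}

check_vars CHROOT ohpc_repo sms_name sms_ip sms_eth_internal eth_provision \
    internal_netmask ntp_server || echo "fix /root/.bashrc and source it again" >&2
```

The list of names can be extended with the BMC and IPoIB variables if those are used on your cluster.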
wget -P /etc/yum.repos.d ${ohpc_repo}
yum clean all
Setup Network Time Protocol
service ntpd stop
ntpdate 0.centos.pool.ntp.org
vi /etc/ntp.conf # Modify with your NTP server
service ntpd restart
chkconfig ntpd on
Installing and Patching the OpenHPC Base components
Install the basic OpenHPC components and patch them to work correctly with GRUB2. The patching part of this step is for stateful provisioning only; it is not necessary if systems are deployed as RAM disks only.
yum groupinstall ohpc-base ohpc-warewulf
yum -y groupinstall ohpc-slurm-server
useradd slurm
== Modify Warewulf core Configuration Files to provision Correctly ==
'''Modify warewulf provision.conf and bootstrap.conf to include the correct kernel modules and configuration'''
<syntaxhighlight>
vi /etc/warewulf/vnfs.conf #Ensure exclude looks like this.
exclude += /tmp/*
exclude += /var/log/*
exclude += /var/chroots/*
#exclude += /var/cache
exclude += /usr/src
#exclude += /usr/share
#exclude += /home/*
vi /etc/warewulf/bootstrap.conf #Hash out all Infiniband drivers
# Infiniband drivers and Mellanox drivers
#drivers += ib_ipath, ib_iser, ib_srpt, ib_sdp, ib_mthca, ib_qib, iw_cxgb3, cxgb3
#drivers += iw_nes, mlx4_ib, ib_srp, ib_ipoib, ib_addr, rdma_cm, ib_ucm
#drivers += ib_ucm, ib_uverbs, ib_umad, ib_cm, ib_mad, iw_cm, ib_core
#drivers += rdma_ucm, ib_sa, mlx4_en, mlx4_core
#drivers += rds, rds_rdma, rds_tcp, mlx4_vnic, mlx4_vnic_helper
#Unhash the modprobe for the Mellanox Modules
modprobe += mlx4_core log_num_mtts=20 log_mtts_per_seg=6, ib_srp
Modify some Warewulf provisioning files to use the correct interfaces and some general Warewulf files to allow provisioning to work.
perl -pi -e "s/device = eth1/device = ${sms_eth_internal}/" /etc/warewulf/provision.conf
perl -pi -e "s/^\s+disable\s+= yes/ disable = no/" /etc/xinetd.d/tftp
export MODFILE=/etc/httpd/conf.d/warewulf-httpd.conf
perl -pi -e "s/cgi-bin>\$/cgi-bin>\n Require all granted/" $MODFILE
perl -pi -e "s/Allow from all/Require all granted/" $MODFILE
perl -ni -e "print unless /^\s+Order allow,deny/" $MODFILE
perl -pi -e "s/ControlMachine=\S+/ControlMachine=head.ohpc.net/" /etc/slurm/slurm.conf
systemctl restart xinetd # This failed
systemctl enable mariadb.service
systemctl restart mariadb
systemctl enable httpd.service
systemctl restart httpd.service
systemctl restart rpcbind.service
systemctl enable rpcbind.service
systemctl restart nfs-server.service
systemctl enable nfs-server.service
Build and Configure the Chroot
Make Initial VNFS (Chroot, compute Node template) and install some Base components into the chroot operating system
wwmkchroot centos-7 $CHROOT
yum -y --installroot=$CHROOT groupinstall Base
yum -y --installroot=$CHROOT install kernel* grub* sudo ipmitool* epel-release htop nano tk* tcl* tigervnc* freeipmi* cairo* perl* gcc* glibc* screen yum-utils vim ntp libnl \
    lsof libxml2-python python mlocate numactl* lmod-ohpc ohpc-slurm-client ganglia-gmond-ohpc environment-modules hwloc-libs libfabric libpsm2 intel-clck-ohpc
## Our X7 1U Twin hardware seems to have issues with Mellanox OFED, OpenIB seems to be the way to go...
yum -y --installroot=$CHROOT install openib ibutils infiniband-diags
Set up SSH keys for the cluster. This is required for the root user only; /home will be exported, so user SSH keys will be available.
wwinit ssh_keys
cat ~/.ssh/cluster.pub >> $CHROOT/root/.ssh/authorized_keys
# Comment out GSSAPI lines on head and in CHROOT.
sed -i 's/^\(GSSAPI.\)/#\1/g' {,${CHROOT}}/etc/ssh/sshd_config
# Don't use DNS on head or CHROOT.
sed -i 's/#UseDNS yes/UseDNS no/' {,${CHROOT}}/etc/ssh/sshd_config
Setup NFS exports and FSTAB on the compute image
echo "${sms_ip}:/home /home nfs nfsvers=3,rsize=1024,wsize=1024,cto 0 0" >> $CHROOT/etc/fstab
echo "${sms_ip}:/opt/ohpc/pub /opt/ohpc/pub nfs nfsvers=3,rsize=1024,wsize=1024,cto 0 0" >> $CHROOT/etc/fstab
echo "/home *(rw,no_subtree_check,fsid=10,no_root_squash)" >> /etc/exports
echo "/opt/ohpc/pub *(ro,no_subtree_check,fsid=11)" >> /etc/exports
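The `echo ... >>` lines above add a new copy of each entry every time they are run. If these steps may be repeated, a guard like the following avoids duplicate lines in /etc/exports and the chroot's fstab. This is a minimal sketch; `append_once` is a hypothetical helper and not part of the original procedure.

```shell
#!/bin/bash
# Append a line to a file only if that exact line is not already present.
append_once() {
    local line="$1" file="$2"
    # -x: match the whole line, -F: treat the pattern as a fixed string, -q: quiet.
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Demonstration against a scratch file rather than the real /etc/exports.
demo="$(mktemp)"
append_once "/home *(rw,no_subtree_check,fsid=10,no_root_squash)" "$demo"
append_once "/home *(rw,no_subtree_check,fsid=10,no_root_squash)" "$demo"
cat "$demo"   # the entry appears only once despite two calls
rm -f "$demo"
```

The same helper could be pointed at $CHROOT/etc/fstab for the two NFS mount lines.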
exportfs -a
systemctl restart rpcbind
systemctl restart nfs-server.service
Copy resolv.conf to the chroot and modify its contents to point to the head node and Google DNS.
cp /etc/resolv.conf $CHROOT/etc/
vi $CHROOT/etc/resolv.conf
Set the memlock limits to unlimited on the head node and compute nodes.
echo "* soft memlock unlimited" >> /etc/security/limits.conf
echo "* hard memlock unlimited" >> /etc/security/limits.conf
echo "* soft memlock unlimited" >> $CHROOT/etc/security/limits.conf
echo "* hard memlock unlimited" >> $CHROOT/etc/security/limits.conf
Import files into the Warewulf database; these will be kept in sync on the compute nodes.
wwsh file import /etc/passwd
wwsh file import /etc/shadow
wwsh file import /etc/group
wwsh file import /etc/slurm/slurm.conf
wwsh file import /etc/munge/munge.key
wwsh file import /opt/ohpc/pub/examples/network/centos/ifcfg-ib0.ww
wwsh -y file set ifcfg-ib0.ww --path=/etc/sysconfig/network-scripts/ifcfg-ib0
Building the bootstrap and vnfs images
# -T may need to be removed from head of wwbootstrap and wwvnfs scripts.
wwbootstrap 3.10.0-229.20.1.el7.x86_64
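The kernel version given to wwbootstrap is hard-coded above and goes stale after a kernel update. The following is a minimal sketch for looking up the newest installed version instead, assuming the bootstrap is built from a kernel installed on the head node. `latest_kernel` is a hypothetical helper, and the version ordering relies on GNU sort's `-V` option.

```shell
#!/bin/bash
# Print the highest kernel version that has a directory under the given
# modules path (default /lib/modules). Relies on GNU sort's -V ordering.
latest_kernel() {
    local dir="${1:-/lib/modules}"
    ls -1 "$dir" 2>/dev/null | sort -V | tail -n 1
}

kver="$(latest_kernel)"
echo "newest installed kernel: ${kver:-none found}"
# Once the value looks right it could be passed on, e.g.: wwbootstrap "$kver"
```

If the head node is already running the kernel you want, `uname -r` gives the same kind of string without scanning the directory.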
wwvnfs -y --chroot $CHROOT
NB: The kernel version passed to wwbootstrap will need to be updated depending on the kernel being used. For example:
[root@head setup-filesystems]# wwbootstrap 3.10.0-327.10.1.el7.x86_64
Number of drivers included in bootstrap: 433
Number of firmware images included in bootstrap: 93
Building and compressing bootstrap
Integrating the Warewulf bootstrap: 3.10.0-327.10.1.el7.x86_64
Including capability: provision-adhoc
Including capability: provision-files
Including capability: provision-selinux
Including capability: provision-vnfs
Including capability: setup-filesystems
Including capability: transport-http
Compressing the initramfs
Locating the kernel object
Bootstrap image '3.10.0-327.10.1.el7.x86_64' is ready
Done.
[root@head setup-filesystems]# wwvnfs -y --chroot $CHROOT
Using 'centos7.1' as the VNFS name
Creating VNFS image from centos7.1
Building new chroot...
Building and compressing the final image
Cleaning temporary files
VNFS 'centos7.1' has been imported
Done.
Wrote a new configuration file at: /etc/warewulf/vnfs/centos7.1.conf
Compute Node Configuration for Stateful Provisioning
Add the compute nodes to the database with the correct parameters to allow for stateful provisioning.
# Gateway, -G, is the private IP of headnode.
wwsh node new c0 --ipaddr=some.ip.address.here -M 255.255.255.0 -G ${sms_ip} --domain=ohpc.net --hwaddr=some.mac.address.here -D eno1
echo "GATEWAYDEV=${eth_provision}" > /tmp/network.$$
wwsh -y file import /tmp/network.$$ --name network
wwsh -y file set network --path /etc/sysconfig/network --mode=0644 --uid=0
wwsh -y provision set c0 --files=dynamic_hosts,passwd,group,shadow,slurm.conf,munge.key,network
wwsh -y provision set c0 --vnfs=centos7.1 --bootstrap=kernel.version.here
Setup Bootloader and Partitions
wwsh -y object modify -s bootloader=sda c0
wwsh -y object modify -s diskpartition=sda c0
wwsh -y object modify -s diskformat=sda1,sda2,sda3 c0
wwsh -y object modify -s filesystems="mountpoint=/boot:dev=sda1:type=ext4:size=500,dev=sda2:type=swap:size=32768,mountpoint=/:dev=sda3:type=ext4:size=fill" c0
systemctl restart dhcpd
wwsh pxe update
wwsh dhcp update
wwsh node list # Should show all the nodes you just added
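The steps above add a single node, c0. For several nodes, the `wwsh node new` line can be generated from a list instead of typed by hand. The following is a minimal sketch: `gen_node_cmds` is a hypothetical helper, the node names and addresses are made up, and the netmask, domain and device are copied from the c0 example. It only prints commands and never calls wwsh itself.

```shell
#!/bin/bash
# Print (not run) one "wwsh node new" command per node.
# Argument: gateway IP (the private IP of the head node).
# Input on stdin: "<name> <ip> <mac>" per line.
gen_node_cmds() {
    local gateway="$1" name ip mac
    while read -r name ip mac; do
        # Skip blank lines and comment lines.
        [ -z "$name" ] && continue
        case "$name" in '#'*) continue ;; esac
        printf 'wwsh node new %s --ipaddr=%s -M 255.255.255.0 -G %s --domain=ohpc.net --hwaddr=%s -D eno1\n' \
            "$name" "$ip" "$gateway" "$mac"
    done
}

# Example with made-up addresses; review the output before running it.
gen_node_cmds "${sms_ip:-10.10.10.1}" <<'EOF'
c1 10.10.10.11 00:25:90:00:00:01
c2 10.10.10.12 00:25:90:00:00:02
EOF
```

The provision, bootloader and partition commands would still need to be repeated for each node name.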