Orch:Headnode install

In the following we assume the availability of a single head node (the master), at least one compute node, and the Intel HPC Orchestrator ISO file (if you don't have it, contact David). The master node is provisioned with CentOS 7.2 and serves as the overall system management server (SMS). In its role as SMS, the master node is configured to provision the remaining compute nodes in a stateless configuration using Warewulf.

We assume the ISO file has been copied to a location such as /tmp/Intel_HPC_Orchestrator-rhel7.2-16.01.004.ga.iso.

Enable local Intel® HPC Orchestrator repository

On the head node, mount the image and enable Orchestrator as a local repository by installing the "hpc-orch-release" RPM package.

mkdir -p /mnt/hpc_orch_iso
mount -o loop /tmp/Intel_HPC_Orchestrator-rhel7.2-16.01.004.ga.iso /mnt/hpc_orch_iso ; echo $?
rpm -Uvh /mnt/hpc_orch_iso/x86_64/Intel_HPC_Orchestrator_release-*.x86_64.rpm
rpm --import /etc/pki/pgp/HPC-Orchestrator*.asc
rpm --import /etc/pki/pgp/PSXE-keyfile.asc
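
To verify that the new repository is now visible to yum (the exact repository id may vary between Orchestrator releases), list the enabled repositories:

# List enabled repositories; the Orchestrator repository should now appear
yum repolist enabled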

Add provisioning services to master node

With the Intel® HPC Orchestrator repository enabled, we proceed by adding the orch-base and orch-warewulf provisioning package groups to the master node.

yum -y groupinstall orch-base
yum -y groupinstall orch-warewulf
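
To inspect which packages these groups pull in (useful when troubleshooting a failed group install; the group names are the same as above):

# Show the contents of the Orchestrator package groups
yum groupinfo orch-base
yum groupinfo orch-warewulf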

Provisioning services with Warewulf rely on the DHCP, TFTP, and HTTP network protocols. Default firewall rules may block these services, so we disable the firewall for the installation. Once the installation is complete, it is highly recommended to re-enable the firewall on the head node and configure it to allow access only on port 22 from the external interface, while still allowing traffic on the internal interfaces to the system; a sketch of such a configuration is given after the commands below.

rpm -q firewalld && systemctl disable firewalld
rpm -q firewalld && systemctl stop firewalld
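
A sketch of the recommended post-installation firewall configuration, using firewalld zones (the interface names ${external_if} and ${internal_if} are placeholders for your head node's external and cluster-facing interfaces):

# Re-enable the firewall once the installation is complete
systemctl enable firewalld
systemctl start firewalld
# Expose only SSH on the external interface
firewall-cmd --permanent --zone=public --change-interface=${external_if}
firewall-cmd --permanent --zone=public --add-service=ssh
# Allow all traffic on the internal cluster interface
firewall-cmd --permanent --zone=trusted --change-interface=${internal_if}
firewall-cmd --reload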

Intel® HPC Orchestrator relies on synchronized clocks throughout the system and uses the NTP protocol to facilitate this synchronization. To enable NTP services with a specific server ${ntp_server}, issue the following on the head node:

systemctl enable ntpd.service
# Disable default external servers
sed -i 's|^server|#server|' /etc/ntp.conf
echo "server ${ntp_server}" >> /etc/ntp.conf
echo "server 127.127.1.0 # local clock" >> /etc/ntp.conf
echo "fudge 127.127.1.0 stratum 10" >> /etc/ntp.conf
systemctl restart ntpd
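
After ntpd restarts, it can take a few minutes to synchronize; the peer list should eventually show the configured server (marked with a '*' once selected):

# Show NTP peers and synchronization status
ntpq -p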


Add resource management services to the master node

The following commands add the SLURM workload manager server components to the head node. Later on, client-side components will be added to the compute image.

yum -y groupinstall orch-slurm-server
# Add PDSH support to determine the nodelist of a Slurm job and run a command on those nodes
yum -y install pdsh-mod-slurm-orch
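
As an illustration of the pdsh Slurm integration (only meaningful once compute nodes are up and a job is running; <jobid> is a placeholder):

# Run a command on every node allocated to Slurm job <jobid>
pdsh -j <jobid> uptime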

SLURM requires the designation of a system user that runs the underlying resource management daemons. The default configuration file supplied with the Intel® HPC Orchestrator build of SLURM sets this SlurmUser to a dedicated user named "slurm", and this user must exist.

getent passwd slurm || useradd slurm
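
To double-check that the configuration and the system account agree (assuming the default configuration path /etc/slurm/slurm.conf used elsewhere in this guide):

# The SlurmUser entry in slurm.conf should match an existing account
grep -i "^SlurmUser" /etc/slurm/slurm.conf
id slurm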

SLURM can also be configured to control which local resource limits get propagated to a user's allocated resources by enabling SLURM's PAM support.

perl -pi -e "s|^#UsePAM=.*|UsePAM=1|" /etc/slurm/slurm.conf
cat << HERE > /etc/pam.d/slurm
account required pam_unix.so
account required pam_slurm.so
auth required pam_localuser.so
session required pam_limits.so
HERE

By default, all resource limits are propagated from the session the user submitted the job from. With PAM support enabled, this propagation can be tuned in SLURM's configuration; for example, adding PropagateResourceLimitsExcept=NOFILE prevents the user's resource limit on open files from being set on their allocated nodes.
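
As a minimal sketch, following the same append-to-config style used for ntp.conf above (adjust to how your site manages slurm.conf):

# Do not propagate the submitting shell's open-file limit to allocated nodes
echo "PropagateResourceLimitsExcept=NOFILE" >> /etc/slurm/slurm.conf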

Add genders

This is not required for a basic installation.