Setup HPC HA (Platform HPC 3.0)
System Setup
- Detailed instructions are in the platform_hpl_install.pdf file
- This procedure has only been tested by enabling HA after the system installation
In short:
- Install a cluster OS
- Install Platform HPC on the OS (don't enable HA)
- Install a cluster NFS server
- Install the failover nodes
- Setup HA
Install OS
- Note:
- System must have FQDN
- Configure both eth0 and eth1
- Stick with the installation defaults (no server packages were added)
- Stop iptables (service iptables stop)
- Disable SELinux (on first boot)
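Some of the notes above can be checked with a short script before starting the installer. This is only a minimal sketch: the function names are hypothetical and not part of Platform HPC, and it assumes a Linux host where `hostname -f` works and interfaces appear under `/sys/class/net`.

```shell
# Hypothetical pre-install checks; these helpers are not part of Platform HPC.

# is_fqdn NAME: succeed if NAME contains a dot and has no empty labels.
is_fqdn() {
    case "$1" in
        ""|.*|*.|*..*) return 1 ;;
        *.*) return 0 ;;
        *) return 1 ;;
    esac
}

# has_iface NAME: succeed if the kernel knows an interface called NAME.
# This checks presence only, not whether the interface is configured.
has_iface() {
    [ -d "/sys/class/net/$1" ]
}

# preflight: print one line per check from the notes above.
preflight() {
    local name iface
    name=$(hostname -f 2>/dev/null)
    if is_fqdn "$name"; then
        echo "FQDN ok: $name"
    else
        echo "FQDN missing: '$name'"
    fi
    for iface in eth0 eth1; do
        if has_iface "$iface"; then
            echo "$iface present"
        else
            echo "$iface not found"
        fi
    done
    # getenforce prints Enforcing, Permissive or Disabled.
    if command -v getenforce >/dev/null 2>&1; then
        echo "SELinux: $(getenforce)"
    fi
}
```

Running `preflight` on the future headnode prints the result of each check; nothing is changed on the system.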
Suggested Partition Table
Size  Mounted on
50G   /        # space for /opt
6G    /var
50G   /depot   # space for repos and updates
REST  /data    # local scratch on all nodes
Install Platform HPC
- Copy over the hpc30-xxx.iso file, the OS-DVD.iso and license file
mount -o loop hpc30-1234.rhel.iso /mnt
/mnt/pcm-installer
- The questions are fairly straightforward. NOTE: say no to HA at this stage
- Reboot the headnode after installation
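The install steps above could be wrapped in a small helper that refuses to mount anything until the required files have been copied over. This is a sketch only: `missing_files` and `run_installer` are hypothetical names, and only `mount -o loop` and `/mnt/pcm-installer` come from the steps above.

```shell
# missing_files FILE...: print each argument that is not a readable regular file.
missing_files() {
    local f
    for f in "$@"; do
        [ -f "$f" ] && [ -r "$f" ] || echo "$f"
    done
}

# run_installer ISO [OTHER_FILE...]: check that all files exist, then
# loop-mount ISO on /mnt and start the installer. Needs root.
run_installer() {
    local missing
    missing=$(missing_files "$@")
    if [ -n "$missing" ]; then
        echo "missing files:" >&2
        echo "$missing" >&2
        return 1
    fi
    mount -o loop "$1" /mnt && /mnt/pcm-installer
}
```

The first argument is the Platform HPC ISO; the OS DVD image and the license file can be passed as further arguments so that they are checked as well.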
Add NFS Server Group
- Use a combination of ngedit / netedit (if additional networking is required)
- Use CFM to create the exports file
# file: /etc/cfm/nfs-centos-5.6-x86_64/etc/exports
/data/home 172.20.0.0/255.255.0.0(rw,async,no_root_squash)
/data/app 172.20.0.0/255.255.0.0(rw,async,no_root_squash)
- Post script to enable nfs on boot
chkconfig --level 345 nfs on
Enable HA on headnode
- Install hpc-ha-1.0.2.rpm on headnode
hpc-ha-tool setup
# provide the necessary details:
# virtual external IP address
# virtual internal IP address
# NFS home directory location, e.g. nas000:/data/home
# NFS application directory location, e.g. nas000:/data/app
#
# See example below:
[root@hpcha1 ~]# hpc-ha-tool setup
Do you wish to enable HPC HA (y/n) [n] y
Please input virtual IP address for network 172.28.0.0/255.255.0.0
172.28.10.67
Please input virtual IP address for network 172.20.0.0/255.255.0.0
172.20.0.5
Please input a NFS path for setting up /home directory:
nas000:/data/home
Please input a NFS path for setting up APP directory:
nas000:/data/app
Generating configuration files...
succeed!
Syncing up configurations to HPC nodes...
done!
- Install the failover headnode using addhost (an option for the failover node should now be present)
- Apply the license to the failover headnode (default name: installer001)
- Then turn on HA and test it
# turn auto HA on
kusu-failmode -m auto
# LSF also had to be set up for failover
/etc/rc.kusu.d/S11lsf-genconfig
Verify HA is working
[root@pcmha ~]# kusu-failinfo
Installer node is currently set to: pcmha [Online]
Failover node is currently set to: installer001 [Online]
Failover mode is currently set to: Auto
KusuInstaller services currently running on: pcmha
[root@pcmha ~]# hpc-ha-tool status
Testing whether HPC HA enabled ... ok
Testing HPC HA configures ... ok
Testing failover backup node ... ok
Testing heartbeat status ... ok
Testing pacemaker status ... ok
Testing HPC database ... ok
Testing float IP addresses ... ok
Testing NFS mount points ... ok
Testing failover mode ... ok
Testing Kusu resource status ... ok
Testing isf-ac daemon status ... ok
Testing LSF daemon status ... ok
HPC HA is ready.
Test HA is working
kusu-failto
Problems!
- IPs have to be above the headnode IP
- Disk partitions seem a bit funky for the failover node
- kusu-failto reported failure:
[root@pcmha ~]# kusu-failto
Are you sure you wish to failover from node 'pcmha' to node 'installer001'? [<y/N>]: y
Installer Services running on 'pcmha'
Syncing and configuring database...
Starting kusu. This may take a while...
Starting initial network configuration [ OK ]
Generating hosts, hosts.equiv, and resolv.conf [ OK ]
Config mail mechanism for kusu [ OK ]
Setting up SSH host file [ OK ]
Setting up user skel files [ OK ]
Setting up network routes [ OK ]
Setting up syslog on PCM installer [ OK ]
Running S11lsf-genconfig [ OK ]
Increasing ulimit memlock [ OK ]
Setting npm service for HPC HA [ OK ]
Running S70SetupPCMGUI.sh [ OK ]
Post actions when failover [ OK ]
Setting up fstab for home directories [ OK ]
Synchronizing System configuration files [FAILED]
Checking compatibility of OFED Kernel module [ OK ]
Starting initial configuration procedure [ OK ]
Installer Services now running on 'installer001'
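With this many status lines, a failed step is easy to miss. The following sketch filters a saved copy of the `kusu-failto` output for steps marked `[FAILED]`. It assumes the output format shown above, and `failed_steps` is a hypothetical helper, not a Kusu command.

```shell
# failed_steps: read kusu-failto output on stdin and print the name of
# every step whose line ends in [FAILED].
failed_steps() {
    sed -n 's/[[:space:]]*\[FAILED\][[:space:]]*$//p'
}
```

Fed the transcript above, it prints only the "Synchronizing System configuration files" step.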
[root@installer001 ~]# bhosts
HOST_NAME STATUS JL/U MAX NJOBS RUN SSUSP USUSP RSV
compute000 closed - 1 0 0 0 0 0
installer001 ok - 12 0 0 0 0 0
pcmha closed - 1 0 0 0 0 0
[root@pcmha kusu]# bhosts
HOST_NAME STATUS JL/U MAX NJOBS RUN SSUSP USUSP RSV
compute000 ok - 12 0 0 0 0 0
installer001 closed - 1 0 0 0 0 0
pcmha ok - 12 0 0 0 0 0
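The two listings disagree: each headnode reports the other as closed. To compare saved `bhosts` listings more quickly, the hosts that are not in state `ok` can be extracted. This is a sketch assuming the column layout shown above; `closed_hosts` is a hypothetical helper, not an LSF command.

```shell
# closed_hosts: read bhosts output on stdin and print "HOST STATUS" for
# every host whose STATUS column is not "ok". The header line is skipped.
closed_hosts() {
    awk '$1 == "HOST_NAME" { next } NF >= 2 && $2 != "ok" { print $1, $2 }'
}
```

The input should contain only the `bhosts` table, without the shell prompt line.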
==> kusu-nodeheartbeatd.log <==
2011-07-13 16:19:19 ERROR Failed to report 'run' operation state to installer.
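To see how often the heartbeat daemon logs this error, the log can be filtered by level. This is a sketch that assumes every entry follows the `DATE TIME LEVEL message` layout of the line above. `log_errors` is a hypothetical helper, and the location of the log file is left to the caller because it is not given here.

```shell
# log_errors: read log lines on stdin and print those whose third
# whitespace-separated field (the level) is ERROR.
log_errors() {
    awk '$3 == "ERROR"'
}
```

Piping the output through `wc -l` gives a count of ERROR entries.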