<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>http://wiki.define-technology.com/mediawiki-1.35.0/index.php?action=history&amp;feed=atom&amp;title=Setup_HPC_HA_%28Platform_HPC_3.0%29</id>
	<title>Setup HPC HA (Platform HPC 3.0) - Revision history</title>
	<link rel="self" type="application/atom+xml" href="http://wiki.define-technology.com/mediawiki-1.35.0/index.php?action=history&amp;feed=atom&amp;title=Setup_HPC_HA_%28Platform_HPC_3.0%29"/>
	<link rel="alternate" type="text/html" href="http://wiki.define-technology.com/mediawiki-1.35.0/index.php?title=Setup_HPC_HA_(Platform_HPC_3.0)&amp;action=history"/>
	<updated>2026-05-04T18:52:58Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.35.0</generator>
	<entry>
		<id>http://wiki.define-technology.com/mediawiki-1.35.0/index.php?title=Setup_HPC_HA_(Platform_HPC_3.0)&amp;diff=1366&amp;oldid=prev</id>
		<title>Michael: Created page with &quot;===== System Setup ===== * Detailed instructions are on the platform_hpl_install.pdf file * Only tested by enabling HA after system installation  In short: # Install a cluster...&quot;</title>
		<link rel="alternate" type="text/html" href="http://wiki.define-technology.com/mediawiki-1.35.0/index.php?title=Setup_HPC_HA_(Platform_HPC_3.0)&amp;diff=1366&amp;oldid=prev"/>
		<updated>2012-12-05T16:47:37Z</updated>

		<summary type="html">&lt;p&gt;Created page with &amp;quot;===== System Setup ===== * Detailed instructions are on the platform_hpl_install.pdf file * Only tested by enabling HA after system installation  In short: # Install a cluster...&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;===== System Setup =====&lt;br /&gt;
* Detailed instructions are in the platform_hpl_install.pdf file&lt;br /&gt;
* Only tested by enabling HA after the system installation, not during it&lt;br /&gt;
&lt;br /&gt;
In short:&lt;br /&gt;
# Install a cluster OS&lt;br /&gt;
# Install Platform HPC on the OS (don&amp;#039;t enable HA yet)&lt;br /&gt;
# Install a cluster NFS server&lt;br /&gt;
# Install the failover nodes&lt;br /&gt;
# Set up HA&lt;br /&gt;
&lt;br /&gt;
===== Install OS =====&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Note&amp;#039;&amp;#039;&amp;#039;:&lt;br /&gt;
** The system must have a fully qualified domain name (FQDN)&lt;br /&gt;
** Configure both eth0 and eth1&lt;br /&gt;
** Stick with the installation defaults (no extra server packages were added)&lt;br /&gt;
** Stop iptables (service iptables stop)&lt;br /&gt;
** Disable SELinux (on first boot)&lt;br /&gt;
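&lt;br /&gt;
A minimal pre-flight sketch for the FQDN requirement, assuming a name counts as fully qualified when it contains at least one dot and does not start or end with one. The helper is_fqdn is hypothetical and not part of Platform HPC.&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical helper (not part of Platform HPC): succeed if NAME looks
# fully qualified, i.e. it contains a dot and does not start or end with one.
is_fqdn() {
    case "$1" in
        ""|.*|*.) return 1 ;;   # empty, leading dot or trailing dot
        *.*)      return 0 ;;   # has an inner dot, e.g. hpcha1.example.com
        *)        return 1 ;;   # short name only, e.g. hpcha1
    esac
}
```
Usage on the headnode: is_fqdn &amp;quot;$(hostname -f)&amp;quot; || echo &amp;quot;hostname is not fully qualified&amp;quot;. After the first-boot reboot, getenforce should also print Disabled.&lt;br /&gt;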
&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Suggested Partition Table&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
Size  Mounted on&lt;br /&gt;
50G   /           # space for /opt&lt;br /&gt;
6G    /var&lt;br /&gt;
50G   /depot      # space for repos and updates&lt;br /&gt;
REST  /data       # local scratch on all nodes&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
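&lt;br /&gt;
Once the OS is installed, the layout can be compared against the suggested table. A minimal sketch using df -Pk (POSIX output format, sizes in 1 KiB blocks); fs_size_gb is a hypothetical helper that fails when the path is not itself a mount point, for example when /depot was never created as a separate partition.&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical helper: print the size in whole GiB of the filesystem mounted
# exactly at MOUNTPOINT; fail if MOUNTPOINT is not itself a mount point.
fs_size_gb() {
    # df -P prints a header line, then one line per filesystem:
    # Filesystem 1024-blocks Used Available Capacity Mounted-on
    df -Pk "$1" 2>/dev/null | awk -v mp="$1" '
        NR == 2 { found = 1; if ($6 != mp) exit 1; printf "%d\n", $2 / 1048576 }
        END     { if (!found) exit 1 }'
}
```
Usage: for mp in / /var /depot /data; do echo &amp;quot;$mp: $(fs_size_gb &amp;quot;$mp&amp;quot; || echo missing)&amp;quot;; done&lt;br /&gt;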
&lt;br /&gt;
===== Install Platform HPC =====&lt;br /&gt;
* Copy the hpc30-xxx.iso file, the OS-DVD.iso and the license file to the headnode&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
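# loop-mount the Platform HPC ISO (the file name varies by build)&lt;br /&gt;
mount -o loop hpc30-1234.rhel.iso /mnt&lt;br /&gt;
# start the installer&lt;br /&gt;
/mnt/pcm-installer&lt;br /&gt;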
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
* The installer questions are fairly straightforward. &amp;#039;&amp;#039;&amp;#039;NOTE:&amp;#039;&amp;#039;&amp;#039; say no to HA at this stage&lt;br /&gt;
* Reboot the headnode after installation&lt;br /&gt;
&lt;br /&gt;
===== Add NFS Server Group =====&lt;br /&gt;
* Use a combination of ngedit and netedit (netedit only if additional networking is required) to create the NFS server node group&lt;br /&gt;
* Use CFM to create the exports file&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
# file: /etc/cfm/nfs-centos-5.6-x86_64/etc/exports&lt;br /&gt;
/data/home 172.20.0.0/255.255.0.0(rw,async,no_root_squash)&lt;br /&gt;
/data/app 172.20.0.0/255.255.0.0(rw,async,no_root_squash)&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
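&lt;br /&gt;
If more directories need exporting, the exports lines can be generated rather than typed. A sketch, assuming every export uses the same network and options as the two lines above; make_exports is a hypothetical helper.&lt;br /&gt;
```shell
#!/bin/bash
# Network and options copied from the exports file above.
NETWORK="172.20.0.0/255.255.0.0"
OPTIONS="rw,async,no_root_squash"

# Hypothetical helper: print one exports line per directory argument.
make_exports() {
    local dir
    for dir in "$@"; do
        printf '%s %s(%s)\n' "$dir" "$NETWORK" "$OPTIONS"
    done
}
```
Usage: make_exports /data/home /data/app, with the output redirected into the CFM exports file for the NFS node group.&lt;br /&gt;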
&lt;br /&gt;
* Add a post script to enable NFS on boot&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
chkconfig --level 345 nfs on&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===== Enable HA on headnode =====&lt;br /&gt;
* Install hpc-ha-1.0.2.rpm on the headnode, then run hpc-ha-tool setup&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
hpc-ha-tool setup &lt;br /&gt;
# provide the necessary details: &lt;br /&gt;
#   virtual external IP address&lt;br /&gt;
#   virtual internal IP address&lt;br /&gt;
#   NFS home directory location, e.g. nas000:/data/home&lt;br /&gt;
#   NFS application directory location, e.g. nas000:/data/app&lt;br /&gt;
#&lt;br /&gt;
# See example below:&lt;br /&gt;
[root@hpcha1 ~]# hpc-ha-tool setup&lt;br /&gt;
Do you wish to enable HPC HA (y/n) [n] y&lt;br /&gt;
Please input virtual IP address for network 172.28.0.0/255.255.0.0&lt;br /&gt;
    172.28.10.67&lt;br /&gt;
Please input virtual IP address for network 172.20.0.0/255.255.0.0&lt;br /&gt;
    172.20.0.5&lt;br /&gt;
Please input a NFS path for setting up /home directory:&lt;br /&gt;
    nas000:/data/home&lt;br /&gt;
Please input a NFS path for setting up APP directory:&lt;br /&gt;
    nas000:/data/app&lt;br /&gt;
&lt;br /&gt;
Generating configuration files...&lt;br /&gt;
succeed!&lt;br /&gt;
&lt;br /&gt;
Syncing up configurations to HPC nodes...&lt;br /&gt;
done!&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
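&lt;br /&gt;
The Problems section below notes that the IPs had to be above the headnode IP. Assuming that means each virtual IP must be numerically higher than the headnode address on the same network, a check like this can be run before answering the prompts. Both helper names are hypothetical.&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical helper: convert a dotted-quad IPv4 address to an integer.
# 10# forces base 10 so octets such as 08 are not read as octal.
ip_to_int() {
    local IFS=.
    set -- $1
    echo $(( ((10#$1 * 256 + 10#$2) * 256 + 10#$3) * 256 + 10#$4 ))
}

# Hypothetical helper: succeed if VIP is numerically above HEAD_IP.
vip_above_head() {
    [ "$(ip_to_int "$1")" -gt "$(ip_to_int "$2")" ]
}
```
Usage: vip_above_head 172.20.0.5 &amp;quot;$HEAD_IP&amp;quot; || echo &amp;quot;choose a higher virtual IP&amp;quot;, where HEAD_IP is a placeholder for the headnode address on 172.20.0.0/255.255.0.0.&lt;br /&gt;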
&lt;br /&gt;
* Install the failover headnode using &amp;#039;&amp;#039;&amp;#039;addhost&amp;#039;&amp;#039;&amp;#039; (an option for the failover node should now be present)&lt;br /&gt;
* Apply the license to the failover headnode (installer001 is the default name)&lt;br /&gt;
* Then turn on HA and test it&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
# turn auto HA on&lt;br /&gt;
kusu-failmode -m auto&lt;br /&gt;
&lt;br /&gt;
# LSF also had to be set up for failover&lt;br /&gt;
/etc/rc.kusu.d/S11lsf-genconfig&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===== Verify HA is working =====&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
[root@pcmha ~]# kusu-failinfo &lt;br /&gt;
Installer node is currently set to: pcmha [Online]&lt;br /&gt;
Failover node is currently set to: installer001 [Online]&lt;br /&gt;
Failover mode is currently set to: Auto&lt;br /&gt;
KusuInstaller services currently running on: pcmha&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
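&lt;br /&gt;
For scripted checks, values such as the failover mode or the node running the installer services can be pulled out of the kusu-failinfo output. A sketch assuming the output format shown above; failinfo_value is a hypothetical helper that prints the last word of the line starting with the given label.&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical helper: read kusu-failinfo output on stdin and print the last
# word of the line that starts with LABEL (format as in the example above).
failinfo_value() {
    awk -v label="$1" 'index($0, label) == 1 { print $NF }'
}
```
Usage: kusu-failinfo | failinfo_value &amp;quot;KusuInstaller services currently running on:&amp;quot; prints pcmha for the output above, and the label &amp;quot;Failover mode is currently set to:&amp;quot; gives Auto.&lt;br /&gt;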
&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
[root@pcmha ~]# hpc-ha-tool status&lt;br /&gt;
Testing whether HPC HA enabled ... ok&lt;br /&gt;
Testing HPC HA configures ... ok&lt;br /&gt;
Testing failover backup node ... ok&lt;br /&gt;
Testing heartbeat status ... ok&lt;br /&gt;
Testing pacemaker status ... ok&lt;br /&gt;
Testing HPC database ... ok&lt;br /&gt;
Testing float IP addresses ... ok&lt;br /&gt;
Testing NFS mount points ... ok&lt;br /&gt;
Testing failover mode ... ok&lt;br /&gt;
Testing Kusu resource status ... ok&lt;br /&gt;
Testing isf-ac daemon status ... ok&lt;br /&gt;
Testing LSF daemon status ... ok&lt;br /&gt;
&lt;br /&gt;
HPC HA is ready.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
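&lt;br /&gt;
The status report above can also be checked non-interactively, for example after each failover test. A sketch assuming the output format shown above; ha_status_ok is a hypothetical helper that succeeds only if every Testing line ends in ok and the HPC HA is ready summary line is present.&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical helper: read "hpc-ha-tool status" output on stdin and succeed
# only if every "Testing ..." line ends in "ok" and the summary line
# "HPC HA is ready." is present (format as in the example above).
ha_status_ok() {
    awk '
        /^Testing /        { if ($NF != "ok") bad = 1 }
        /^HPC HA is ready/ { ready = 1 }
        END                { exit (bad || !ready) }'
}
```
Usage: if hpc-ha-tool status | ha_status_ok; then echo &amp;quot;HA checks passed&amp;quot;; else echo &amp;quot;HA checks failed&amp;quot;; fi&lt;br /&gt;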
&lt;br /&gt;
===== Test HA is working =====&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
kusu-failto&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
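&lt;br /&gt;
After kusu-failto, the Problems section below shows LSF leaving hosts closed. A quick way to list them is to filter the bhosts output. closed_hosts is a hypothetical helper that assumes the default bhosts column layout (HOST_NAME first, STATUS second) and treats any status starting with closed as closed.&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical helper: read bhosts output on stdin (default column layout)
# and print the names of hosts whose STATUS starts with "closed".
closed_hosts() {
    awk 'NR > 1 { if ($2 ~ /^closed/) print $1 }'
}
```
Usage: run bhosts | closed_hosts on both the old and the new installer node after the failover and compare the lists.&lt;br /&gt;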
&lt;br /&gt;
===== Problems! =====&lt;br /&gt;
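* The IPs have to be higher than the headnode IP&lt;br /&gt;
* The disk partitions on the failover node seem a bit odd&lt;br /&gt;
* kusu-failto reported a failure at the Synchronizing System configuration files step, and afterwards bhosts showed different host states on each node:&lt;br /&gt;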
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
[root@pcmha ~]# kusu-failto   &lt;br /&gt;
Are you sure you wish to failover from node &amp;#039;pcmha&amp;#039; to node &amp;#039;installer001&amp;#039;? [&amp;lt;y/N&amp;gt;]: y&lt;br /&gt;
Installer Services running on &amp;#039;pcmha&amp;#039;&lt;br /&gt;
Syncing and configuring database...&lt;br /&gt;
Starting kusu. This may take a while...&lt;br /&gt;
   Starting initial network configuration                  [  OK  ] &lt;br /&gt;
   Generating hosts, hosts.equiv, and resolv.conf          [  OK  ] &lt;br /&gt;
   Config mail mechanism for kusu                          [  OK  ] &lt;br /&gt;
   Setting up SSH host file                                [  OK  ] &lt;br /&gt;
   Setting up user skel files                              [  OK  ] &lt;br /&gt;
   Setting up network routes                               [  OK  ] &lt;br /&gt;
   Setting up syslog on PCM installer                      [  OK  ] &lt;br /&gt;
   Running S11lsf-genconfig                                [  OK  ] &lt;br /&gt;
   Increasing ulimit memlock                               [  OK  ] &lt;br /&gt;
   Setting npm service for HPC HA                          [  OK  ] &lt;br /&gt;
   Running S70SetupPCMGUI.sh                               [  OK  ] &lt;br /&gt;
   Post actions when failover                              [  OK  ] &lt;br /&gt;
   Setting up fstab for home directories                   [  OK  ] &lt;br /&gt;
   Synchronizing System configuration files                [FAILED] &lt;br /&gt;
   Checking compatibility of OFED Kernel module            [  OK  ] &lt;br /&gt;
   Starting initial configuration procedure                [  OK  ] &lt;br /&gt;
&lt;br /&gt;
Installer Services now running on &amp;#039;installer001&amp;#039;&lt;br /&gt;
&lt;br /&gt;
[root@installer001 ~]# bhosts &lt;br /&gt;
HOST_NAME          STATUS       JL/U    MAX  NJOBS    RUN  SSUSP  USUSP    RSV &lt;br /&gt;
compute000         closed          -      1      0      0      0      0      0&lt;br /&gt;
installer001       ok              -     12      0      0      0      0      0&lt;br /&gt;
pcmha              closed          -      1      0      0      0      0      0&lt;br /&gt;
&lt;br /&gt;
[root@pcmha kusu]# bhosts &lt;br /&gt;
HOST_NAME          STATUS       JL/U    MAX  NJOBS    RUN  SSUSP  USUSP    RSV &lt;br /&gt;
compute000         ok              -     12      0      0      0      0      0&lt;br /&gt;
installer001       closed          -      1      0      0      0      0      0&lt;br /&gt;
pcmha              ok              -     12      0      0      0      0      0&lt;br /&gt;
&lt;br /&gt;
==&amp;gt; kusu-nodeheartbeatd.log &amp;lt;==&lt;br /&gt;
2011-07-13 16:19:19 ERROR Failed to report &amp;#039;run&amp;#039; operation state to installer.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;/div&gt;</summary>
		<author><name>Michael</name></author>
	</entry>
</feed>