LSF Multicluster

  • Note: see the lsf.shared bug at the bottom of this page. These changes are not maintained by PCM 3.0 (and previous)
Multicluster License
  • Ensure your license includes a line with lsf_multicluster; otherwise request one from Platform
FEATURE lsf_multicluster lsf_ld 7.000 31-JUL-2011 0 AD3E3C81D267A3C1C0B6 "Platform" DEMO
Configuration Files

The configuration below:

  • uses two PCM 3.0 clusters (pcm30, pcm-mctest)
  • pcm30 will forward jobs on to pcm-mctest
  • pcm-mctest will receive jobs from pcm30
  • All changes were made in /etc/cfm/templates/lsf/


  • lsf.cluster (or default.lsf.cluster), two additions: RemoteClusters and PRODUCTS
# Update the PRODUCTS line to include multicluster
PRODUCTS=LSF_Base LSF_Manager LSF_MultiCluster
# drop in at the end of the file
# multicluster remote clusters; note: names are the cluster names as defined by LSF (typically hostname_cluster1)
Begin RemoteClusters
CLUSTERNAME
pcm30_cluster1
pcm-mctest_cluster1
End RemoteClusters
  • lsf.shared (or default.lsf.shared)
# Note: replaced XXX_clustername_XXX with the cluster name. This should be OK provided the cluster name doesn't change
Begin Cluster
ClusterName             Servers
pcm30_cluster1          pcm30             
pcm-mctest_cluster1     pcm-mctest
End Cluster
###### NOTE: PROBLEM - INFO NOT SYNCED AFTER addhost -u ###########
###### SEE RESOLUTION at the bottom of this page        ###########
  • lsf.conf (or default.lsf.conf)
# Multicluster enable, append to end of file
MC_PLUGIN_REMOTE_RESOURCE=y
Multicluster Model
  • Job forwarding model

In this model, the cluster that is starving for resources sends jobs over to the cluster that has resources to spare. To work together, two clusters must set up compatible send-jobs and receive-jobs queues. With this model, scheduling of MultiCluster jobs is a process with two scheduling phases: the submission cluster selects a suitable remote receive-jobs queue, and forwards the job to it; then the execution cluster selects a suitable host and dispatches the job to it. This method automatically favors local hosts; a MultiCluster send-jobs queue always attempts to find a suitable local host before considering a receive-jobs queue in another cluster.
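
A minimal sketch of using this model once the sendq/receiveq pair configured further down this page is in place (queue names are taken from those examples):
# on the submission cluster (pcm30): submit to the send-jobs queue
bsub -q sendq sleep 60
# if no suitable local host is found, the job is forwarded to receiveq on pcm-mctest_cluster1
bjobs -u all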

  • Resource leasing model

In this model, the cluster that is starving for resources takes resources away from the cluster that has resources to spare. To work together, the provider cluster must export resources to the consumer, and the consumer cluster must configure a queue to use those resources. In this model, each cluster schedules work on a single system image, which includes both borrowed hosts and local hosts.
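
A minimal sketch of using this model once the HostExport section and borrow queue shown further down this page are in place (names are taken from those examples):
# on the consumer cluster (vhpchead): submit to the queue that uses the leased host
bsub -q resourceborrowq sleep 60
# borrowed hosts show up in bhosts as host@remote_cluster, e.g. pcmcomp000@pcm30_cluster1
bhosts -w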

  • Choosing a model
    • Consider your own goals and priorities when choosing the best resource-sharing model for your site.
    • The job forwarding model can make resources available to jobs from multiple clusters; this flexibility allows maximum throughput when each cluster's resource usage fluctuates.
    • The resource leasing model can allow one cluster exclusive control of a dedicated resource; this can be more efficient when there is a steady amount of work.
    • The lease model is the most transparent to users and supports the same scheduling features as a single cluster.
    • The job forwarding model has a single point of administration, while the lease model shares administration between provider and consumer clusters.
Job Forwarding Model
  • lsb.queues (or lsbatch/default/configdir/lsb.queues)
  • On the cluster sending jobs (pcm30), create a queue with SNDJOBS_TO
Begin Queue
QUEUE_NAME   = sendq
PRIORITY     = 40
HOSTS        = none
SNDJOBS_TO   = receiveq@pcm-mctest_cluster1
End Queue
  • On the cluster receiving jobs (pcm-mctest), create a queue with RCVJOBS_FROM
Begin Queue
QUEUE_NAME   = receiveq
RCVJOBS_FROM = sendq@pcm30_cluster1
HOSTS        = all
End Queue
Resource Leasing Model
  • In this example, the pcmtest cluster (pcm30_cluster1) is exporting a single node (pcmcomp000) to the vhpchead cluster (vhpchead_cluster1)
  • lsb.resources file on pcmtest
Begin HostExport
PER_HOST     = pcmcomp000               # export host list
SLOTS        = 12                       # for each host, export 12 job slots
DISTRIBUTION = [vhpchead_cluster1, 6]   # share distribution for remote clusters:
                                        # cluster <vhpchead_cluster1> has 6 shares
End HostExport
  • lsb.queues file on vhpchead
# resource borrow queue
Begin Queue
QUEUE_NAME   = resourceborrowq
PRIORITY     = 40
HOSTS        = compute005 pcmcomp000@pcm30_cluster1   # 2 hosts on this queue, one remote host pcmcomp000
DESCRIPTION  = Resource Borrow Queue
End Queue
  • Verify jobs are being run correctly
[root@pcmcomp000 ~]# bclusters 
[Job Forwarding Information ]
LOCAL_QUEUE     JOB_FLOW   REMOTE     CLUSTER    STATUS    
receiveq        recv       -          vhpchead_c ok        

[Resource Lease Information ]
REMOTE_CLUSTER  RESOURCE_FLOW   STATUS     
vhpchead_cluste EXPORT          ok        
# Check the hosts that are being exported: 
[root@pcmcomp000 ~]# bhosts -e
HOST_NAME             MAX  NJOBS    RUN  SSUSP  USUSP    RSV 
pcmcomp000             12      3      3      0      0      0
  • Also check output from vhpchead
[david@vhpchead multicluster]$ bjobs 
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
21472   david   RUN   resourcele vhpchead    compute005  sleep 60   Aug  3 17:20
21474   david   RUN   resourcele vhpchead    compute005  sleep 60   Aug  3 17:20
21476   david   RUN   resourcele vhpchead    compute005  sleep 60   Aug  3 17:20
21473   david   RUN   resourcele vhpchead    pcmcomp000@ sleep 60   Aug  3 17:20
21475   david   RUN   resourcele vhpchead    pcmcomp000@ sleep 60   Aug  3 17:20
21477   david   PEND  resourcele vhpchead                sleep 60   Aug  3 17:20


[david@vhpchead multicluster]$ bclusters 
[Job Forwarding Information ]
LOCAL_QUEUE     JOB_FLOW   REMOTE     CLUSTER    STATUS    
sendq           send       receiveq   pcm30_clus ok        

[Resource Lease Information ]
REMOTE_CLUSTER  RESOURCE_FLOW   STATUS     
pcm30_cluster1  IMPORT          ok


[david@vhpchead ~]$ bhosts -w
HOST_NAME          STATUS          JL/U    MAX  NJOBS    RUN  SSUSP  USUSP    RSV 
compute000         ok              -     12      0      0      0      0      0
compute001         ok              -     12      0      0      0      0      0
compute002         ok              -     24      0      0      0      0      0
compute003         ok              -      8      0      0      0      0      0
compute004         ok              -      8      0      0      0      0      0
compute005         ok              -      8      0      0      0      0      0
compute007         ok              -      8      0      0      0      0      0
compute008         ok              -      8      0      0      0      0      0
compute009         ok              -      8      0      0      0      0      0
pcmcomp000@pcm30_cluster1 ok              -     12      0      0      0      0      0
vhpchead           ok              -      8      0      0      0      0      0
Update System
addhost -u
# Or if you edited files in /opt/lsf/conf
lsadmin reconfig
badmin mbdrestart
  • Note: if the configuration doesn't apply correctly, run the lsadmin and badmin commands listed above to verify the configuration files (addhost -u does not report configuration errors correctly!). A check-only alternative is sketched below.
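  • To validate the configuration files without reconfiguring or restarting anything, the ckconfig subcommands can also be used; a minimal sketch (run on the headnode as root or the LSF administrator):
# check the LIM configuration (lsf.conf, lsf.cluster.*, lsf.shared)
lsadmin ckconfig -v
# check the batch configuration (lsb.queues, lsb.resources, ...)
badmin ckconfig -v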
Set Up iptables
  • The LSF processes on the two clusters will try to communicate with each other; by default only SSH traffic is allowed on eth1, so update iptables on both servers
# Generated by iptables-save v1.3.5 on Fri Jul  1 17:40:27 2011
*nat
:PREROUTING ACCEPT [1339:165189]
:POSTROUTING ACCEPT [205:14830]
:OUTPUT ACCEPT [516:36221]
-A POSTROUTING -o eth1 -j MASQUERADE 
COMMIT
# Completed on Fri Jul  1 17:40:27 2011
# Generated by iptables-save v1.3.5 on Fri Jul  1 17:40:27 2011
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [43133:352914090]
-A INPUT -i eth1 -p tcp -m state --state NEW -m tcp --dport 8080 -j ACCEPT 
-A INPUT -i eth0 -p tcp -m state --state NEW -m tcp --dport 8080 -j ACCEPT 
-A INPUT -i eth1 -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT 
-A INPUT -i eth1 -p tcp -m state --state NEW -m tcp --dport 53 -j ACCEPT 
-A INPUT -i eth1 -p udp -m state --state NEW -m udp --dport 53 -j ACCEPT 
-A INPUT -i eth1 -p tcp -m state --state NEW -m tcp --dport 80 -j ACCEPT 
-A INPUT -i eth1 -p tcp -m state --state NEW -m tcp --dport 443 -j ACCEPT 
-A INPUT -i eth1 -p tcp -m state --state NEW -m tcp --dport 873 -j ACCEPT 
-A INPUT -i eth1 -p tcp -m state --state NEW -m tcp --dport 5432 -j ACCEPT 
# multicluster
-A INPUT -i eth1 --source 172.28.10.0/24 -p tcp -m state --state NEW -m tcp --dport 7869 -j ACCEPT
-A INPUT -i eth1 --source 172.28.10.0/24 -p tcp -m state --state NEW -m tcp --dport 6878 -j ACCEPT
-A INPUT -i eth1 --source 172.28.10.0/24 -p tcp -m state --state NEW -m tcp --dport 6881 -j ACCEPT
-A INPUT -i eth1 --source 172.28.10.0/24 -p tcp -m state --state NEW -m tcp --dport 6882 -j ACCEPT
# end multicluster
-A INPUT -i eth0 -j ACCEPT 
-A INPUT -i lo -j ACCEPT 
-A INPUT -p icmp -m icmp --icmp-type any -j ACCEPT 
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT 
-A INPUT -i eth1 -j REJECT --reject-with icmp-port-unreachable 
-A FORWARD -i eth0 -o eth1 -m state --state RELATED,ESTABLISHED -j ACCEPT 
-A FORWARD -i eth0 -j ACCEPT 
COMMIT
# Completed on Fri Jul  1 17:40:27 2011
/etc/hosts - add clusters
  • On each node, add the remote cluster's headnode to the hosts file (unless you use external DNS that can resolve both hostnames)
  • As these are cluster-external hosts, add them to /etc/hosts.append
# /etc/hosts.append on pcm-mctest
172.28.10.69   	pcm30.viglen.co.uk	pcm30
  • Then update the hosts file and sync across cluster
kusu-genconfig hosts > /etc/hosts
cfmsync -f
Check MultiCluster Status
  • Use bclusters and lsclusters
  • Status should be ok; if you see disc there may be communication problems (a connectivity check is sketched after the output below)
[root@pcm-mctest lsf]# bclusters 
[Job Forwarding Information ]
LOCAL_QUEUE     JOB_FLOW   REMOTE     CLUSTER    STATUS    
receiveq        recv       -          pcm30_clus ok        

[Resource Lease Information ]
No resources have been exported or borrowed
[root@pcm-mctest lsf]# lsclusters 
CLUSTER_NAME   STATUS   MASTER_HOST               ADMIN    HOSTS  SERVERS
pcm-mctest_clu ok       pcm-mctest             hpcadmin        2        2
pcm30_cluster1 ok       pcmtest.viglen.co.     hpcadmin        2        2
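  • If bclusters reports disc, a quick check is whether the multicluster ports opened in the iptables rules above are reachable from the other cluster's headnode; a minimal sketch (assumes nc is installed; replace pcm30 with the remote headnode name):
# run from the remote cluster's headnode
for port in 7869 6878 6881 6882; do
    nc -z -w 3 pcm30 $port && echo "port $port open" || echo "port $port blocked"
done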
Move Files between Clusters
  • LSF will use lsrcp
  • Need to set up both clusters to use SSH for lsrcp/rsh/rcp; replace/create links for the binaries on the headnode. Create these files in /etc/cfm/[compute-group]
  • Also need to ensure SSH keys are set up between the clusters (passwordless access)
# Either change lsrcp:
/opt/lsf/7.0/linux2.6-glibc2.3-x86_64/bin/lsrcp -> [scp] #mkdir /etc/cfm/compute-centos-5.6-x86_64/$LSF_BINDIR

# Or change rcp (the default lsrcp will fall back on rcp)
/usr/kerberos/bin/rsh -> [ssh]
/usr/kerberos/bin/rcp -> [scp]

# ssh keys
cat ~/.ssh/id_rsa.pub | ssh user@remote.machine.com 'cat >> .ssh/authorized_keys'
  • NOTE: Output files created on the remote cluster are not automatically copied back
# Sample script that copies an input file across, and then copies the output files back.
#BSUB -q sendq
#BSUB -o sendq_output.%J.txt
#BSUB -e sendq_error.%J.txt
#BSUB -f "/home/david/test_input.inp > /home/david/copied_across.inp"
#BSUB -f "/home/david/result_copied.out < /home/david/result.out"
#BSUB -f "/home/david/sendq_output_copied.%J.txt < /home/david/sendq_output.%J.txt"
#BSUB -f "/home/david/sendq_error_copied.%J.txt < /home/david/sendq_error.%J.txt"

echo "hi"
hostname 
id
cat /home/david/copied_across.inp

hostname >> result.out
id >> result.out

sleep 30
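
To run it, submit the script to bsub on stdin so the #BSUB directives are read (the file name myjob.sh is just an example):
bsub < myjob.sh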
lsf.shared bug

The Cluster section of /etc/cfm/templates/default.lsf.shared gets overwritten on sync. The following change needs to be made:

vi /opt/kusu/lib/plugins/genconfig/lsfshared_7_0_6.py

# Change from: 
 84             if re.compile("^End.*Cluster").search(instr):
 85                 inClusterSection = False
 86             else:
 87                 if re.compile("^ClusterName").search(instr):
 88                     pass
 89                 else:
 90                     if inClusterSection:
 91                         print clusterName
 92                         continue

# Change to:
 84             if re.compile("^End.*Cluster").search(instr):
 85                 inClusterSection = False
 86             else:
 87                 if re.compile("^ClusterName").search(instr):
 88                     pass
 89                 else:
 90                     if inClusterSection and re.compile("^XXX_clustername_XXX").search(instr): # <---- This line!
 91                         print clusterName
 92                         continue
 93 
 94             print instr,
  • Verify the update has been applied correctly:
kusu-genconfig lsfshared_7_0_6 'insert-cluster-name'
e.g.: kusu-genconfig lsfshared_7_0_6 pcm30_cluster1

# once confirmed as updating correctly:
addhost -u