LSF Multicluster

  • Note: see the lsf.shared bug at the bottom of this page. These changes are not maintained by PCM 3.0 (and previous)
Multicluster License
  • Ensure your license includes a line with lsf_multicluster; otherwise request one from Platform
FEATURE lsf_multicluster lsf_ld 7.000 31-JUL-2011 0 AD3E3C81D267A3C1C0B6 "Platform" DEMO
Configuration Files

The configuration below:

  • uses two PCM 3.0 clusters (pcm30, pcm-mctest)
  • pcm30 will forward jobs on to pcm-mctest
  • pcm-mctest will receive jobs from pcm30
  • All changes were made in /etc/cfm/templates/lsf/


  • lsf.cluster (or default.lsf.cluster), two additions: RemoteClusters and PRODUCTS
# Update the PRODUCTS line to include multicluster
PRODUCTS=LSF_Base LSF_Manager LSF_MultiCluster
# drop in at the end of the file
# multicluster remote clusters; note: names are the cluster names as defined by LSF (typically hostname_cluster1)
Begin RemoteClusters
CLUSTERNAME
pcm30_cluster1
pcm-mctest_cluster1
End RemoteClusters
  • lsf.shared (or default.lsf.shared)
# Note: replaced XXX_clustername_XXX with the cluster name. This should be OK provided the cluster name doesn't change
Begin Cluster
ClusterName             Servers
pcm30_cluster1          pcm30             
pcm-mctest_cluster1     pcm-mctest
End Cluster
###### NOTE: PROBLEM - INFO NOT SYNCED AFTER addhost -u ###########
###### SEE RESOLUTION at the bottom of this page        ###########
  • lsf.conf (or default.lsf.conf)
# Multicluster enable, append to end of file
MC_PLUGIN_REMOTE_RESOURCE=y
Multicluster Model
  • Job forwarding model

In this model, the cluster that is starving for resources sends jobs over to the cluster that has resources to spare. To work together, two clusters must set up compatible send-jobs and receive-jobs queues. With this model, scheduling of MultiCluster jobs is a process with two scheduling phases: the submission cluster selects a suitable remote receive-jobs queue, and forwards the job to it; then the execution cluster selects a suitable host and dispatches the job to it. This method automatically favors local hosts; a MultiCluster send-jobs queue always attempts to find a suitable local host before considering a receive-jobs queue in another cluster.
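
A minimal sketch of using this model once the sendq/receiveq pair configured further down this page is in place (queue names are taken from those examples):
# on the submission cluster (pcm30): submit to the send-jobs queue
bsub -q sendq sleep 60
# if no suitable local host is found, the job is forwarded to receiveq on pcm-mctest_cluster1
bjobs -u all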

  • Resource leasing model

In this model, the cluster that is starving for resources takes resources away from the cluster that has resources to spare. To work together, the provider cluster must export resources to the consumer, and the consumer cluster must configure a queue to use those resources. In this model, each cluster schedules work on a single system image, which includes both borrowed hosts and local hosts.
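
A minimal sketch of using this model once the HostExport section and borrow queue shown further down this page are in place (names are taken from those examples):
# on the consumer cluster (vhpchead): submit to the queue that uses the leased host
bsub -q resourceborrowq sleep 60
# borrowed hosts show up in bhosts as host@remote_cluster, e.g. pcmcomp000@pcm30_cluster1
bhosts -w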

  • Choosing a model
    • Consider your own goals and priorities when choosing the best resource-sharing model for your site.
    • The job forwarding model can make resources available to jobs from multiple clusters; this flexibility allows maximum throughput when each cluster's resource usage fluctuates.
    • The resource leasing model can allow one cluster exclusive control of a dedicated resource; this can be more efficient when there is a steady amount of work.
    • The lease model is the most transparent to users and supports the same scheduling features as a single cluster.
    • The job forwarding model has a single point of administration, while the lease model shares administration between provider and consumer clusters.
Job Forwarding Model
  • lsb.queues (or lsbatch/default/configdir/lsb.queues)
  • On the cluster sending jobs (pcm30), create a queue with SNDJOBS_TO
Begin Queue
QUEUE_NAME   = sendq
PRIORITY     = 40
HOSTS        = none
SNDJOBS_TO   = receiveq@pcm-mctest_cluster1
End Queue
  • On the cluster receiving jobs (pcm-mctest), create a queue with RCVJOBS_FROM
Begin Queue
QUEUE_NAME   = receiveq
RCVJOBS_FROM = sendq@pcm30_cluster1
HOSTS        = all
End Queue
Resource Leasing Model
  • In this example, the pcmtest cluster (pcm30_cluster1) is exporting a single node (pcmcomp000) to the vhpchead cluster (vhpchead_cluster1)
  • lsb.resources file on pcmtest
Begin HostExport
PER_HOST     = pcmcomp000               # export host list
SLOTS        = 12                       # for each host, export 12 job slots
DISTRIBUTION = [vhpchead_cluster1, 6]   # share distribution for remote clusters:
                                        # cluster <vhpchead_cluster1> has 6 shares
End HostExport
  • lsb.queues file on vhpchead
# resource borrow queue
Begin Queue
QUEUE_NAME   = resourceborrowq
PRIORITY     = 40
HOSTS        = compute005 pcmcomp000@pcm30_cluster1   # 2 hosts on this queue, one remote host pcmcomp000
DESCRIPTION  = Resource Borrow Queue
End Queue
  • Verify jobs are being run correctly
[root@pcmcomp000 ~]# bclusters 
[Job Forwarding Information ]
LOCAL_QUEUE     JOB_FLOW   REMOTE     CLUSTER    STATUS    
receiveq        recv       -          vhpchead_c ok        

[Resource Lease Information ]
REMOTE_CLUSTER  RESOURCE_FLOW   STATUS     
vhpchead_cluste EXPORT          ok        
# Check the hosts that are being exported: 
[root@pcmcomp000 ~]# bhosts -e
HOST_NAME             MAX  NJOBS    RUN  SSUSP  USUSP    RSV 
pcmcomp000             12      3      3      0      0      0
  • Also check output from vhpchead
[david@vhpchead multicluster]$ bjobs 
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
21472   david   RUN   resourcele vhpchead    compute005  sleep 60   Aug  3 17:20
21474   david   RUN   resourcele vhpchead    compute005  sleep 60   Aug  3 17:20
21476   david   RUN   resourcele vhpchead    compute005  sleep 60   Aug  3 17:20
21473   david   RUN   resourcele vhpchead    pcmcomp000@ sleep 60   Aug  3 17:20
21475   david   RUN   resourcele vhpchead    pcmcomp000@ sleep 60   Aug  3 17:20
21477   david   PEND  resourcele vhpchead                sleep 60   Aug  3 17:20


[david@vhpchead multicluster]$ bclusters 
[Job Forwarding Information ]
LOCAL_QUEUE     JOB_FLOW   REMOTE     CLUSTER    STATUS    
sendq           send       receiveq   pcm30_clus ok        

[Resource Lease Information ]
REMOTE_CLUSTER  RESOURCE_FLOW   STATUS     
pcm30_cluster1  IMPORT          ok


[david@vhpchead ~]$ bhosts -w
HOST_NAME          STATUS          JL/U    MAX  NJOBS    RUN  SSUSP  USUSP    RSV 
compute000         ok              -     12      0      0      0      0      0
compute001         ok              -     12      0      0      0      0      0
compute002         ok              -     24      0      0      0      0      0
compute003         ok              -      8      0      0      0      0      0
compute004         ok              -      8      0      0      0      0      0
compute005         ok              -      8      0      0      0      0      0
compute007         ok              -      8      0      0      0      0      0
compute008         ok              -      8      0      0      0      0      0
compute009         ok              -      8      0      0      0      0      0
pcmcomp000@pcm30_cluster1 ok              -     12      0      0      0      0      0
vhpchead           ok              -      8      0      0      0      0      0
Update System
addhost -u
# Or if you edited files in /opt/lsf/conf
lsadmin reconfig
badmin mbdrestart
  • Note: if the configuration doesn't apply correctly, run the lsadmin and badmin commands listed above to verify the configuration files (addhost -u does not report configuration errors correctly!). A check-only alternative is sketched below.
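  • To validate the configuration files without reconfiguring or restarting anything, the ckconfig subcommands can also be used; a minimal sketch (run on the headnode as root or the LSF administrator):
# check the LIM configuration (lsf.conf, lsf.cluster.*, lsf.shared)
lsadmin ckconfig -v
# check the batch configuration (lsb.queues, lsb.resources, ...)
badmin ckconfig -v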
Set Up iptables
  • The LSF processes on the two clusters will try to communicate with each other; by default only SSH traffic is allowed on eth1, so update iptables on both servers
# Generated by iptables-save v1.3.5 on Fri Jul  1 17:40:27 2011
*nat
:PREROUTING ACCEPT [1339:165189]
:POSTROUTING ACCEPT [205:14830]
:OUTPUT ACCEPT [516:36221]
-A POSTROUTING -o eth1 -j MASQUERADE 
COMMIT
# Completed on Fri Jul  1 17:40:27 2011
# Generated by iptables-save v1.3.5 on Fri Jul  1 17:40:27 2011
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [43133:352914090]
-A INPUT -i eth1 -p tcp -m state --state NEW -m tcp --dport 8080 -j ACCEPT 
-A INPUT -i eth0 -p tcp -m state --state NEW -m tcp --dport 8080 -j ACCEPT 
-A INPUT -i eth1 -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT 
-A INPUT -i eth1 -p tcp -m state --state NEW -m tcp --dport 53 -j ACCEPT 
-A INPUT -i eth1 -p udp -m state --state NEW -m udp --dport 53 -j ACCEPT 
-A INPUT -i eth1 -p tcp -m state --state NEW -m tcp --dport 80 -j ACCEPT 
-A INPUT -i eth1 -p tcp -m state --state NEW -m tcp --dport 443 -j ACCEPT 
-A INPUT -i eth1 -p tcp -m state --state NEW -m tcp --dport 873 -j ACCEPT 
-A INPUT -i eth1 -p tcp -m state --state NEW -m tcp --dport 5432 -j ACCEPT 
# multicluster
-A INPUT -i eth1 --source 172.28.10.0/24 -p tcp -m state --state NEW -m tcp --dport 7869 -j ACCEPT
-A INPUT -i eth1 --source 172.28.10.0/24 -p tcp -m state --state NEW -m tcp --dport 6878 -j ACCEPT
-A INPUT -i eth1 --source 172.28.10.0/24 -p tcp -m state --state NEW -m tcp --dport 6881 -j ACCEPT
-A INPUT -i eth1 --source 172.28.10.0/24 -p tcp -m state --state NEW -m tcp --dport 6882 -j ACCEPT
# end multicluster
-A INPUT -i eth0 -j ACCEPT 
-A INPUT -i lo -j ACCEPT 
-A INPUT -p icmp -m icmp --icmp-type any -j ACCEPT 
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT 
-A INPUT -i eth1 -j REJECT --reject-with icmp-port-unreachable 
-A FORWARD -i eth0 -o eth1 -m state --state RELATED,ESTABLISHED -j ACCEPT 
-A FORWARD -i eth0 -j ACCEPT 
COMMIT
# Completed on Fri Jul  1 17:40:27 2011
/etc/hosts - add clusters
  • On each node, add the remote cluster's headnode to the hosts file (unless you use external DNS that can resolve both hostnames)
  • As these are cluster-external hosts, add them to /etc/hosts.append
# /etc/hosts.append on pcm-mctest
172.28.10.69   	pcm30.viglen.co.uk	pcm30
  • Then update the hosts file and sync across cluster
kusu-genconfig hosts > /etc/hosts
cfmsync -f
Check MultiCluster Status
  • Use bclusters and lsclusters
  • Status should be ok; if you see disc there may be communication problems (a connectivity check is sketched after the output below)
[root@pcm-mctest lsf]# bclusters 
[Job Forwarding Information ]
LOCAL_QUEUE     JOB_FLOW   REMOTE     CLUSTER    STATUS    
receiveq        recv       -          pcm30_clus ok        

[Resource Lease Information ]
No resources have been exported or borrowed
[root@pcm-mctest lsf]# lsclusters 
CLUSTER_NAME   STATUS   MASTER_HOST               ADMIN    HOSTS  SERVERS
pcm-mctest_clu ok       pcm-mctest             hpcadmin        2        2
pcm30_cluster1 ok       pcmtest.viglen.co.     hpcadmin        2        2
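  • If bclusters reports disc, a quick check is whether the multicluster ports opened in the iptables rules above are reachable from the other cluster's headnode; a minimal sketch (assumes nc is installed; replace pcm30 with the remote headnode name):
# run from the remote cluster's headnode
for port in 7869 6878 6881 6882; do
    nc -z -w 3 pcm30 $port && echo "port $port open" || echo "port $port blocked"
done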
Move Files between Clusters
  • LSF will use lsrcp
  • Need to set up both clusters to use SSH for lsrcp/rsh/rcp; replace/create links for the binaries on the headnode. Create these files in /etc/cfm/[compute-group]
  • Also need to ensure SSH keys are set up between the clusters (passwordless access)
# Either change lsrcp:
/opt/lsf/7.0/linux2.6-glibc2.3-x86_64/bin/lsrcp -> [scp] #mkdir /etc/cfm/compute-centos-5.6-x86_64/$LSF_BINDIR

# Or change rcp (the default lsrcp will fall back on rcp)
/usr/kerberos/bin/rsh -> [ssh]
/usr/kerberos/bin/rcp -> [scp]

# ssh keys
cat ~/.ssh/id_rsa.pub | ssh user@remote.machine.com 'cat >> .ssh/authorized_keys'
  • NOTE: Output files created on the remote cluster are not automatically copied back
# Sample script that copies an input file across, and then copies the output files back.
#BSUB -q sendq
#BSUB -o sendq_output.%J.txt
#BSUB -e sendq_error.%J.txt
#BSUB -f "/home/david/test_input.inp > /home/david/copied_across.inp"
#BSUB -f "/home/david/result_copied.out < /home/david/result.out"
#BSUB -f "/home/david/sendq_output_copied.%J.txt < /home/david/sendq_output.%J.txt"
#BSUB -f "/home/david/sendq_error_copied.%J.txt < /home/david/sendq_error.%J.txt"

echo "hi"
hostname 
id
cat /home/david/copied_across.inp

hostname >> result.out
id >> result.out

sleep 30
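
To run it, submit the script to bsub on stdin so the #BSUB directives are read (the file name myjob.sh is just an example):
bsub < myjob.sh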
lsf.shared bug

The Cluster section of /etc/cfm/templates/default.lsf.shared gets overwritten on sync. The following change needs to be made:

vi /opt/kusu/lib/plugins/genconfig/lsfshared_7_0_6.py

# Change from: 
 84             if re.compile("^End.*Cluster").search(instr):
 85                 inClusterSection = False
 86             else:
 87                 if re.compile("^ClusterName").search(instr):
 88                     pass
 89                 else:
 90                     if inClusterSection:
 91                         print clusterName
 92                         continue

# Change to:
 84             if re.compile("^End.*Cluster").search(instr):
 85                 inClusterSection = False
 86             else:
 87                 if re.compile("^ClusterName").search(instr):
 88                     pass
 89                 else:
 90                     if inClusterSection and re.compile("^XXX_clustername_XXX").search(instr): # <---- This line!
 91                         print clusterName
 92                         continue
 93 
 94             print instr,
  • Verify the update has been applied correctly:
kusu-genconfig lsfshared_7_0_6 'insert-cluster-name'
e.g.: kusu-genconfig lsfshared_7_0_6 pcm30_cluster1

# once confirmed as updating correctly:
addhost -u