Rocks: Jobs deferred in torque/maui

From Define Wiki
Job Deferred
  • Run checkjob to see why the job is deferred
checkjob job_id

[root@fotcluster2 ~]# checkjob 3177
...
'job is deferred.  Reason:  NoResources  (cannot create reservation for job 3177 (intital reservation attempt)'
...
  • Diagnose the job
diagnose -j job_id
diagnose -q
  • Check the state of the compute nodes
checknode -v compute-0-0
  • Check the torque server logs
vi /opt/torque/server_logs/YYYYMMDD
  • Check the maui server logs
vi /opt/maui/log/maui.log
  • If you have added properties/resources to the PBS nodes file, ensure they have not been overwritten by 'rocks sync config'
pbsnodes -a    # check that each host lists the correct properties
cat /opt/torque/server_priv/nodes
  • Once the problem is fixed, re-queue the deferred job:
runjob -c job_id
releasehold job_id
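The checks above can be strung together in a small script. This is a minimal sketch rather than a tested tool: the function names (torque_log_path, deferral_reason, collect_deferred_info) are made up for illustration, the log paths are the ones quoted above, and it assumes checkjob prints the reason in the "Reason:  NoResources" form shown in the example output.

```shell
# Sketch only: helper names are hypothetical, paths are the ones used on this page.

# Print the torque server log path for a given date (YYYYMMDD).
torque_log_path() {
    printf '/opt/torque/server_logs/%s\n' "$1"
}

# Read checkjob output on stdin and print the word that follows "Reason:"
# (for example NoResources). Assumes the output format shown above.
deferral_reason() {
    sed -n 's/.*Reason:[[:space:]]*\([A-Za-z0-9_]*\).*/\1/p' | head -n 1
}

# Run each check from the list above for one job id and one node name.
collect_deferred_info() {
    job_id=$1
    node=$2
    echo "== checkjob =="
    checkjob "$job_id"
    echo "== deferral reason =="
    checkjob "$job_id" | deferral_reason
    echo "== diagnose =="
    diagnose -j "$job_id"
    diagnose -q
    echo "== node state =="
    checknode -v "$node"
    echo "== last log lines =="
    tail -n 50 "$(torque_log_path "$(date +%Y%m%d)")"
    tail -n 50 /opt/maui/log/maui.log
}
```

For example, collect_deferred_info 3177 compute-0-0 would run every check for the job and node used in the examples above.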
Cannot set hostlist
  • Experienced the following error: job cannot be started - cannot set hostlist
  • From the Rocks mailing list: this indicates that maui does not have the privileges it needs to schedule jobs
  • Verify the PBS server configuration:
qmgr -c "print server"
# you are looking for the following lines:
set server managers = maui@clustername
set server managers += root@clustername
  • You can reset the queue configuration using the following:
/opt/torque/bin/qterm -t quick
/opt/torque/sbin/pbs_server -t create
(answer yes)
/opt/torque/bin/qmgr < /opt/torque/pbs.default

# /opt/torque/pbs.default contains the default setup for the pbs-roll.
# note: had to create server_priv/nodes files manually
  • In this case the root cause was UPPER-CASE letters in the FQDN, which qmgr rejects as shown below. Use lower-case host names to avoid this.
qmgr obj= svr=default: Bad ACL entry in host list MSG=First bad host:
LRC-PS8-RFHEAD
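Both checks in this section (the managers entries and the upper-case host name) can be scripted. The helpers below are a hedged sketch: has_uppercase, to_lowercase and has_manager are hypothetical names, and has_manager assumes that qmgr -c "print server" prints the managers entries in the "set server managers = user@host" form shown above.

```shell
# Sketch only: helper names are hypothetical.

# Succeed if the given host name contains an upper-case letter.
has_uppercase() {
    case $1 in
        *[[:upper:]]*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the given host name converted to lower case.
to_lowercase() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'
}

# Read `qmgr -c "print server"` output on stdin and succeed if a
# "set server managers" line (with = or +=) names the given user.
has_manager() {
    grep -Eq "^set server managers \+?= *$1@"
}
```

A possible use on the head node: run has_uppercase "$(hostname -f)" to test the FQDN, and qmgr -c "print server" | has_manager maui to confirm that maui is listed as a manager.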