Rocks: Jobs deferred in torque/maui
Latest revision as of 10:00, 13 May 2013
Job Deferred
- Run checkjob to see why the job is deferred:
checkjob job_id
[root@fotcluster2 ~]# checkjob 3177
...
'job is deferred. Reason: NoResources (cannot create reservation for job 3177 (intital reservation attempt)'
...
- Diagnose the job:
diagnose -j job_id
diagnose -q
- Check the state of the compute nodes:
checknode -v compute-0-0
- Check the torque server logs:
vi /opt/torque/server_logs/YYYYMMDD
- Check the maui server logs:
vi /opt/maui/log/maui.log
- If you have added properties/resources to the pbs nodes file, ensure they have not been overwritten by 'rocks sync config':
pbsnodes -a [ensure correct host properties]
/opt/torque/server_priv/nodes
- Problem solved? Re-queue the deferred job:
runjob -c job_id
releasehold job_id
Cannot set hostlist
- Symptom experienced: job cannot be started - cannot set hostlist
- From the rocks list: this indicates that maui does not have the privileges it needs to do the scheduling
- Verify the pbs config and check the server managers:
qmgr -c "print server"
# you're looking for the following lines:
set server managers = maui@clustername
set server managers += root@clustername
- You can reset the queue configuration using the following:
/opt/torque/bin/qterm -t quick
/opt/torque/sbin/pbs_server -t create
(answer yes)
/opt/torque/bin/qmgr < /opt/torque/pbs.default
# /opt/torque/pbs.default contains the default setup for the pbs-roll.
# note: had to create the server_priv/nodes file manually
- It ended up being a problem with UPPER-CASE letters in the FQDN, avoid!
qmgr obj= svr=default: Bad ACL entry in host list MSG=First bad host:
LRC-PS8-RFHEAD
qmgr obj= svr=default: Bad ACL entry in host list MSG=First bad host:
LRC-PS8-RFHEAD
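Since upper-case letters in the FQDN turned out to be the cause, it may be worth checking a host name before adding it to the server managers list. The following is only a minimal sketch, assuming a POSIX shell. The function names has_uppercase, to_lower and check_fqdn are hypothetical helpers, not part of torque, maui or Rocks, and lower-casing the name is an assumption about what a suitable replacement would look like.

```shell
#!/bin/sh
# Hypothetical helpers for spotting upper-case letters in a host name.

# has_uppercase NAME: exit status 0 if NAME contains an upper-case letter.
# [[:upper:]] is used instead of [A-Z] so the match does not depend on
# the locale's collation order.
has_uppercase() {
    case "$1" in
        *[[:upper:]]*) return 0 ;;
        *)             return 1 ;;
    esac
}

# to_lower NAME: print NAME with upper-case letters converted to lower case.
to_lower() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'
}

# check_fqdn NAME: print a warning if NAME contains upper-case letters,
# otherwise print an OK line.
check_fqdn() {
    if has_uppercase "$1"; then
        echo "WARNING: $1 contains upper-case letters; consider $(to_lower "$1")"
    else
        echo "OK: $1"
    fi
}

# Example call (left commented out so the file can be sourced safely):
# check_fqdn "$(hostname -f)"
```

With the host name from the error above, check_fqdn LRC-PS8-RFHEAD prints a warning suggesting lrc-ps8-rfhead. The sketch only reports the problem. Actually renaming the head node is a separate step that is not covered here.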