Rocks: Torque 5.4 - Rocks 5.4

Torque 5.4 and Rocks 5.4

  • Possible bug with Torque 5.4?
  The following was observed after installing Torque 5.4 on Rocks 5.4:

  When a compute node is installed, its name is added to the /opt/torque/server_priv/nodes file
  as soon as the node receives the Rocks installation image. The name is added without the
  np=XX attribute for that node.

  For example:
     compute-0-0

  The correct entry should be:
     compute-0-0 np=8

  Without np=XX on the compute-0-0 line in the nodes file, pbsnodes lists the node with only 1 core.

  To generate the /opt/torque/server_priv/nodes file correctly (after node installation), run:

      rocks sync config
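
  A quick way to see which entries are affected is to scan the nodes file for lines without np=.
  The sketch below is only an illustration: nodes_missing_np is a hypothetical helper name, not
  part of Torque or Rocks, and the demo runs on a scratch file rather than the real nodes file.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Torque or Rocks): print the names of
# node entries that have no np=<number> attribute.
# The default path is the nodes file named on this page.
nodes_missing_np() {
    nodes_file="${1:-/opt/torque/server_priv/nodes}"
    # Skip blank lines and comment lines; report the first field of every
    # remaining line that carries no np=<number> field.
    awk 'NF && $1 !~ /^#/ && $0 !~ /(^|[ \t])np=[0-9]+/ { print $1 }' "$nodes_file"
}

# Demo on a scratch file, so nothing on the head node is touched.
demo_nodes=$(mktemp)
printf 'compute-0-0\ncompute-0-1 np=8\n' > "$demo_nodes"
nodes_missing_np "$demo_nodes"    # prints: compute-0-0
rm -f "$demo_nodes"
```

  Called with no argument on the head node, the helper reads /opt/torque/server_priv/nodes.
  Running it again after rocks sync config should print nothing if every entry now has np=.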
  • Jobs are submitted but stay queued and never run
   - checkjob <jobid> shows no resources available.
   - pbsnodes shows the nodes as free.
   
   Cause:
   - In /opt/torque/pbs.default, the server name was written in upper case:
     set server managers = maui@LRC-PS8-RFHEAD.SEEC.LOCAL
     set server managers += root@LRC-PS8-RFHEAD.SEEC.LOCAL

   Change it to:
     set server managers = maui@lrc-ps8-rfhead.seec.local
     set server managers += root@lrc-ps8-rfhead.seec.local
  
   Jobs should then be submitted and run as expected.
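
   The edit above can also be previewed with a small filter instead of being typed by hand. This
   is a minimal sketch under stated assumptions: lowercase_manager_hosts is a hypothetical helper
   name, it only touches "set server managers" lines, and it prints the result to standard
   output rather than changing the file, so the output can be reviewed before anything is applied.

```shell
#!/bin/sh
# Hypothetical helper (not part of Torque): print a copy of the given file in
# which the host part (everything after "@") of each "set server managers"
# line is lower-cased. All other lines are printed unchanged.
lowercase_manager_hosts() {
    awk '/^[ \t]*set server managers/ {
             at = index($0, "@")
             if (at > 0) $0 = substr($0, 1, at) tolower(substr($0, at + 1))
         }
         { print }' "$1"
}

# Demo on a scratch file with the entry shown on this page.
demo_conf=$(mktemp)
printf 'set server managers = maui@LRC-PS8-RFHEAD.SEEC.LOCAL\n' > "$demo_conf"
lowercase_manager_hosts "$demo_conf"
# prints: set server managers = maui@lrc-ps8-rfhead.seec.local
rm -f "$demo_conf"
```

   On the head node the argument would be /opt/torque/pbs.default, the file named above.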