Slurm: Jobs fail with lmod not found error

From Define Wiki

Problem

Even a basic job script can fail with an error stating that lmod does not exist.

Example Script

#!/bin/bash

#SBATCH --time=30
#SBATCH -o out
#SBATCH -e err

module load bonnie++

Output in error file

[Jon@head-Boston scripts]$ cat err 
/cm/local/apps/slurm/var/spool/job01047/slurm_script: /usr/share/lmod/lmod/libexec/lmod: No such file or directory

Cause

  • When module load is executed on the head node, it calls /usr/share/lmod/lmod/libexec/lmod to load the module files.
  • On the compute nodes, module load calls /cm/local/apps/environment-modules/current/bin/modulecmd, and lmod is not present.
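To confirm which of the two commands exists on a given node, a small check along these lines may help. This is only a sketch: check_module_cmds is a hypothetical helper name, and the two paths are the ones quoted in the bullets above.

```shell
#!/bin/bash
# Report whether each candidate module command exists and is executable
# on the node where this script runs.
# check_module_cmds is a hypothetical helper, not part of lmod or Slurm.
check_module_cmds() {
    local path
    for path in "$@"; do
        if [ -x "$path" ]; then
            printf 'present  %s\n' "$path"
        else
            printf 'missing  %s\n' "$path"
        fi
    done
}

# Paths taken from the Cause section above.
check_module_cmds \
    /usr/share/lmod/lmod/libexec/lmod \
    /cm/local/apps/environment-modules/current/bin/modulecmd
```

Running it once directly on the head node and once on a compute node (for example by submitting it as a job) should show lmod as missing on the compute node if this page's diagnosis applies.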

Resolution

  • The module shell function runs whatever command the LMOD_CMD environment variable points to:
[Jon@head-Boston scripts]$ type module 
module is a function
module () 
{ 
    eval $($LMOD_CMD bash "$@");
    [ $? = 0 ] && eval $(${LMOD_SETTARG_CMD:-:} -s sh)
}
[Jon@head-Boston scripts]$ echo $LMOD_CMD
/usr/share/lmod/lmod/libexec/lmod
[Jon@head-Boston scripts]$ export LMOD_CMD=
[Jon@head-Boston scripts]$ type module 
module is a function
module () 
{ 
    eval $($LMOD_CMD bash "$@");
    [ $? = 0 ] && eval $(${LMOD_SETTARG_CMD:-:} -s sh)
}
  • By default, LMOD_CMD points to /usr/share/lmod/lmod/libexec/lmod:
[Jon@head-Boston scripts]$ echo $LMOD_CMD
/usr/share/lmod/lmod/libexec/lmod
  • Set LMOD_CMD to /cm/local/apps/environment-modules/current/bin/modulecmd, which is present on both the head and compute nodes:
export LMOD_CMD=/cm/local/apps/environment-modules/current/bin/modulecmd
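One way to apply this inside the job script itself, rather than in the submitting shell, is sketched below. It is an illustration under assumptions, not a tested site configuration: use_fallback_module_cmd is a hypothetical helper, and it only overrides LMOD_CMD when the current value is not executable on the node running the job.

```shell
#!/bin/bash
#SBATCH --time=30
#SBATCH -o out
#SBATCH -e err

# Hypothetical helper: keep LMOD_CMD if it is executable on this node,
# otherwise switch to the fallback path given as the first argument.
use_fallback_module_cmd() {
    local fallback=$1
    if [ ! -x "${LMOD_CMD:-}" ] && [ -x "$fallback" ]; then
        export LMOD_CMD=$fallback
    fi
}

# Fallback path taken from the Resolution section above.
use_fallback_module_cmd /cm/local/apps/environment-modules/current/bin/modulecmd

# Only call module if the shell function is defined in this environment.
if declare -F module >/dev/null; then
    module load bonnie++
fi
```

The guard means the script leaves a working lmod setup untouched on the head node and only switches to modulecmd where lmod is absent.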