Slurm: Kill jobs in SLURM when memory usage exceeds requested amount
I finally managed to get jobs terminated once they exceed their memory allocation. Here is the configuration I used:
slurm.conf
EnforcePartLimits=ALL
TaskPlugin=task/cgroup
JobAcctGatherType=jobacct_gather/cgroup
SelectTypeParameters=CR_CPU_Memory
MemLimitEnforce=yes
KillOnBadExit=1
cgroup.conf
CgroupAutomount=yes
ConstrainCores=yes
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
TaskAffinity=no
MaxSwapPercent=10
Running a job that simply allocates RAM in a loop:
#! /bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem=1024MB
./bigmem 100000
This produced the following error once the job exceeded 1 GB RSS:
slurmstepd: error: Detected 1 oom-kill event(s) in step 125.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.
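The source of bigmem is not shown on this page. For anyone wanting to reproduce the test, here is a minimal stand-in written as a shell script. It is only a sketch under assumptions: the function name alloc_mib, the variable buf, and the reading of the argument as a number of MiB are mine, not taken from the original program.

```shell
#!/bin/bash
# Hypothetical stand-in for ./bigmem (the original source is not shown).
# Assumption: the first argument is the number of MiB to allocate.

# Grow the global variable "buf" by 1 MiB per iteration, so the memory
# held by the shell process increases step by step.
alloc_mib() {
    total=$1
    i=0
    buf=""
    while [ "$i" -lt "$total" ]; do
        # head emits 1 MiB of NUL bytes; tr turns them into 'x' because
        # shell variables cannot hold NUL bytes.
        buf="$buf$(head -c 1048576 /dev/zero | tr '\0' 'x')"
        i=$((i + 1))
    done
}

# Allocate the requested amount (default: 16 MiB) and report the size.
alloc_mib "${1:-16}"
echo "allocated ${#buf} bytes"
```

Saved as bigmem and made executable, it can be called from the batch script above as ./bigmem 100000. With --mem=1024MB and the cgroup settings shown, I would expect the step to be killed long before the loop finishes, but the exact point depends on how the shell manages its memory.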