Linux: Memory Limits on Mellanox OFED
The Problem
By default, Mellanox OFED only allows a small part of physical memory to be registered. This can result in warnings like the following when running MPI programs:
>>>> WARNING: It appears that your OpenFabrics subsystem is configured to only allow registering part of your physical memory. This can cause MPI jobs to run with erratic performance, hang, and/or crash.
>>>>
>>>> This may be caused by your OpenFabrics vendor limiting the amount of physical memory that can be registered. You should investigate the relevant Linux kernel module parameters that control how much physical memory can be registered, and increase them to allow registering all physical memory on your machine.
>>>>
>>>> See this Open MPI FAQ item for more information on these Linux kernel module parameters:
>>>>
>>>> http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
>>>>
>>>> Local host: compute022
>>>> Registerable memory: 4096 MiB
>>>> Total memory: 65503 MiB
>>>>
>>>> Your MPI job will continue, but may be behave poorly and/or hang.
>>>> -------------------------------------------------------------------
The Solution
The solution is to raise the kernel module parameters that control how much physical memory can be registered, as described in this Open MPI FAQ item:
http://www.open-mpi.org/faq/?category=openfabrics#ib-low-reg-mem
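To change the mlx4_core parameters, edit /etc/modprobe.d/mlx4_en.conf so that it contains the line below. Module options are read when mlx4_core is loaded, so the change takes effect only after the module is reloaded or the node is rebooted: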
options mlx4_core pfctx=0 pfcrx=0 log_mtts_per_seg=5
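
To judge whether a given setting is large enough, the Open MPI FAQ gives the registerable memory for mlx4_core as (2^log_num_mtt) * (2^log_mtts_per_seg) * PAGE_SIZE, and suggests allowing roughly twice the physical memory. The sketch below is only an illustration of that arithmetic. The function names are hypothetical, a 4 KiB page size is assumed, and log_num_mtt=20 is an assumed value chosen because it reproduces the 4096 MiB from the warning; the actual value on a node should be read from /sys/module/mlx4_core/parameters/.

```python
MIB = 1024 * 1024


def max_registerable_bytes(log_num_mtt, log_mtts_per_seg, page_size=4096):
    """Registerable memory per the Open MPI FAQ formula for mlx4_core."""
    return (2 ** log_num_mtt) * (2 ** log_mtts_per_seg) * page_size


def min_exponent_sum(total_mib, page_size=4096, factor=2):
    """Smallest log_num_mtt + log_mtts_per_seg that covers factor * RAM."""
    target_bytes = factor * total_mib * MIB
    pages = -(-target_bytes // page_size)  # ceiling division
    return (pages - 1).bit_length()        # ceil(log2(pages)) for pages >= 1


# Assumed starting point: one parameter pair that yields the 4096 MiB
# reported in the warning (the real defaults on the host may differ).
before = max_registerable_bytes(log_num_mtt=20, log_mtts_per_seg=0)

# Same assumed log_num_mtt with log_mtts_per_seg=5 from the config line.
after = max_registerable_bytes(log_num_mtt=20, log_mtts_per_seg=5)

print(before // MIB)            # 4096
print(after // MIB)             # 131072
print(min_exponent_sum(65503))  # 25, for the 65503 MiB node in the warning
```

If the sum of the two values reported under /sys/module/mlx4_core/parameters/ is below the number returned by min_exponent_sum, log_mtts_per_seg=5 alone may not be enough and log_num_mtt may also need to be raised.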