Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] OpenMPI 1.6.3 problem
From: Ralph Castain (rhc_at_[hidden])
Date: 2013-09-24 15:36:46


Just to be clear - are you saying the job fails to run? Or just that it emits this warning (not error) and then runs to completion?

This is a warning we added at some point because jobs were hanging due to exhausting registered memory, and people didn't know why. If you check out the link, I believe we tell you how to turn off the warning if you are sure your system is correctly configured.
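
For readers trying to work out where the "Registerable memory" figure in the warning comes from, here is a minimal sketch of the arithmetic. It assumes a Mellanox HCA driven by the `mlx4_core` kernel module, and it uses the formula as I understand it from the FAQ item linked in the warning: registerable memory = 2^log_num_mtt * 2^log_mtts_per_seg * page size. The function names, the 4 KiB page size, and the example parameter values (22 and 3) are assumptions chosen for illustration; they are not taken from the reporter's system. The factor of two reflects the FAQ's suggestion, again as I understand it, to allow registering about twice the physical memory.

```python
# Sketch only: estimate registerable memory for an mlx4_core-based HCA.
# Assumed formula (from the Open MPI FAQ item linked in the warning):
#   max_reg_mem = 2**log_num_mtt * 2**log_mtts_per_seg * page_size

MIB = 1024 * 1024


def max_registerable_mib(log_num_mtt, log_mtts_per_seg, page_size=4096):
    """Return the registerable-memory limit in MiB for the given settings."""
    return (2 ** log_num_mtt) * (2 ** log_mtts_per_seg) * page_size // MIB


def min_log_num_mtt(total_mib, log_mtts_per_seg, page_size=4096, factor=2):
    """Return the smallest log_num_mtt whose limit covers factor * total memory."""
    target_bytes = factor * total_mib * MIB
    log_num_mtt = 0
    while (2 ** log_num_mtt) * (2 ** log_mtts_per_seg) * page_size < target_bytes:
        log_num_mtt += 1
    return log_num_mtt


if __name__ == "__main__":
    total_mib = 258542  # "Total memory" reported in the warning

    # Hypothetical settings that happen to reproduce the reported limit.
    print("current limit (MiB):", max_registerable_mib(22, 3))

    # Setting that would allow registering at least twice the physical memory.
    needed = min_log_num_mtt(total_mib, 3)
    print("suggested log_num_mtt:", needed)
    print("resulting limit (MiB):", max_registerable_mib(needed, 3))
```

On a real node the current values would normally be read from `/sys/module/mlx4_core/parameters/`, assuming that driver is the one in use; other vendors' drivers expose different parameters.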

On Sep 24, 2013, at 12:20 PM, "Bryan, Clifton W ERDC-RDE-MSRC-MS Contractor" <clifton.W.bryan_at_[hidden]> wrote:

> Hi,
>
> We are having problems with OpenMPI 1.6.3 – it gives the below error message when trying to run:
>
>
> $ mpirun -np 32 ./mpi_test.x
>
> --------------------------------------------------------------------------
>
> WARNING: It appears that your OpenFabrics subsystem is configured to only allow registering part of your physical memory. This can cause MPI jobs to run with erratic performance, hang, and/or crash.
>
>
> This may be caused by your OpenFabrics vendor limiting the amount of physical memory that can be registered. You should investigate the relevant Linux kernel module parameters that control how much physical memory can be registered, and increase them to allow registering all physical memory on your machine.
>
>
> See this Open MPI FAQ item for more information on these Linux kernel module parameters:
>
> http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
>
>
> Local host: akutilm-0006.ors.hpc.mil
>
> Registerable memory: 131072 MiB
>
> Total memory: 258542 MiB
>
>
> Your MPI job will continue, but may be behave poorly and/or hang.
>
> --------------------------------------------------------------------------
>
> akutilm-0006.ors.hpc.mil
>
> akutilm-0006.ors.hpc.mil
>
> [akutilm-0006.ors.hpc.mil:10970] 31 more processes have sent help message help-mpi-btl-openib.txt / reg mem limit low
> [akutilm-0006.ors.hpc.mil:10970] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
>
>
> OpenMPI 1.4.3 works fine.
>
>
> Any help would be greatly appreciated.
>
>
> Thanks,
>
> Clif
>
>