Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] users Digest, Vol 2689, Issue 1
From: Bryan, Clifton W ERDC-RDE-MSRC-MS Contractor (clifton.W.bryan_at_[hidden])
Date: 2013-09-25 14:02:36


-----Original Message-----
From: users [mailto:users-bounces_at_[hidden]] On Behalf Of users-request_at_[hidden]
Sent: Wednesday, September 25, 2013 11:00 AM
To: users_at_[hidden]
Subject: users Digest, Vol 2689, Issue 1

Today's Topics:

   1. OpenMPI 1.6.3 problem
      (Bryan, Clifton W ERDC-RDE-MSRC-MS Contractor)
   2. Re: OpenMPI 1.6.3 problem (Ralph Castain)
   3. Re: OpenMPI 1.6.3 problem (Jeff Squyres (jsquyres))

----------------------------------------------------------------------

Message: 1
Date: Tue, 24 Sep 2013 19:20:36 +0000
From: "Bryan, Clifton W ERDC-RDE-MSRC-MS Contractor"
        <clifton.W.bryan_at_[hidden]>
To: "'users_at_[hidden]'" <users_at_[hidden]>
Subject: [OMPI users] OpenMPI 1.6.3 problem
Message-ID:
        <8CCCC747FD74954AB8E26B1F2EFBA6E2078E72C4_at_[hidden]>
Content-Type: text/plain; charset="us-ascii"

Hi,

We are having a problem with OpenMPI 1.6.3. It prints the following message when we try to run:

$ mpirun -np 32 ./mpi_test.x

--------------------------------------------------------------------------
WARNING: It appears that your OpenFabrics subsystem is configured to only allow registering part of your physical memory. This can cause MPI jobs to run with erratic performance, hang, and/or crash.

This may be caused by your OpenFabrics vendor limiting the amount of physical memory that can be registered. You should investigate the relevant Linux kernel module parameters that control how much physical memory can be registered, and increase them to allow registering all physical memory on your machine.

See this Open MPI FAQ item for more information on these Linux kernel module parameters:

    http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages

  Local host: akutilm-0006.ors.hpc.mil
  Registerable memory: 131072 MiB
  Total memory: 258542 MiB

Your MPI job will continue, but may be behave poorly and/or hang.
--------------------------------------------------------------------------
akutilm-0006.ors.hpc.mil
akutilm-0006.ors.hpc.mil
[akutilm-0006.ors.hpc.mil:10970] 31 more processes have sent help message help-mpi-btl-openib.txt / reg mem limit low
[akutilm-0006.ors.hpc.mil:10970] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

OpenMPI 1.4.3 works fine.

Any help would be greatly appreciated.

Thanks,

Clif


------------------------------

Message: 2
Date: Tue, 24 Sep 2013 12:36:46 -0700
From: Ralph Castain <rhc_at_[hidden]>
To: Open MPI Users <users_at_[hidden]>
Subject: Re: [OMPI users] OpenMPI 1.6.3 problem
Message-ID: <B4DD6235-B7FD-42DE-9D9D-D15D82460524_at_[hidden]>
Content-Type: text/plain; charset="windows-1252"

Just to be clear - are you saying the job fails to run? Or just that it emits this warning (not error) and then runs to completion?

This is a warning we added at some point because jobs were hanging due to exhausting registered memory, and people didn't know why. If you check out the link, I believe we tell you how to turn off the warning if you are sure your system is correctly configured.
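
To put numbers on the warning, here is a minimal sketch that pulls the two memory figures out of the message text and reports how much of physical memory can be registered. The regular expressions and the function name registered_fraction are illustrative assumptions, not anything Open MPI ships.

```python
import re

# The three data lines from the warning quoted in this thread.
WARNING_TEXT = """\
  Local host: akutilm-0006.ors.hpc.mil
  Registerable memory: 131072 MiB
  Total memory: 258542 MiB
"""


def registered_fraction(text):
    """Return (registerable_mib, total_mib, fraction) parsed from the warning.

    Hypothetical helper: it only understands the "Registerable memory" and
    "Total memory" lines in the format shown above.
    """
    reg = re.search(r"Registerable memory:\s*(\d+)\s*MiB", text)
    tot = re.search(r"Total memory:\s*(\d+)\s*MiB", text)
    if reg is None or tot is None:
        raise ValueError("warning text does not contain both memory lines")
    reg_mib = int(reg.group(1))
    tot_mib = int(tot.group(1))
    return reg_mib, tot_mib, reg_mib / tot_mib


reg_mib, tot_mib, frac = registered_fraction(WARNING_TEXT)
print(f"{reg_mib} of {tot_mib} MiB registerable ({frac:.1%})")
```

On this node roughly half of physical memory can be registered, which is the condition the warning is meant to flag.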



------------------------------

Message: 3
Date: Tue, 24 Sep 2013 19:38:50 +0000
From: "Jeff Squyres (jsquyres)" <jsquyres_at_[hidden]>
To: Open MPI Users <users_at_[hidden]>
Subject: Re: [OMPI users] OpenMPI 1.6.3 problem
Message-ID:
        <EF66BBEB19BADC41AC8CCF5F684F07FC4F8C5C60_at_[hidden]>
Content-Type: text/plain; charset="Windows-1252"

Have you visited the URL that is cited? :-)

It talks all about the issue, and describes how to fix it. Let us know if there's something unclear in that FAQ text.
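
For reference, the FAQ entry gives, for Mellanox hardware driven by the mlx4_core kernel module, max_reg_mem = (2^log_num_mtt) * (2^log_mtts_per_seg) * PAGE_SIZE, and suggests making that at least twice physical memory. A rough sketch of that arithmetic follows. The thread does not say which adapter or driver this cluster uses, so mlx4_core, the 4 KiB page size, and log_mtts_per_seg = 3 are all assumptions, and the helper names are made up for illustration.

```python
MIB = 1024 * 1024


def max_reg_mem_bytes(log_num_mtt, log_mtts_per_seg, page_size=4096):
    """Registerable memory in bytes, per the FAQ formula for mlx4_core."""
    return (2 ** log_num_mtt) * (2 ** log_mtts_per_seg) * page_size


def log_num_mtt_for(total_mem_mib, log_mtts_per_seg=3, page_size=4096, factor=2):
    """Smallest log_num_mtt that covers factor * physical memory.

    Assumed defaults: 4 KiB pages, log_mtts_per_seg = 3, and the
    "twice physical memory" target suggested by the FAQ.
    """
    target = factor * total_mem_mib * MIB
    per_entry = page_size * 2 ** log_mtts_per_seg
    entries = -(-target // per_entry)      # ceiling division
    return (entries - 1).bit_length()      # ceil(log2(entries)) for entries >= 1


# Under the assumptions above, log_num_mtt = 22 reproduces the reported limit.
print(max_reg_mem_bytes(22, 3) // MIB)     # 131072

# Value needed to cover twice the 258542 MiB reported as total memory.
needed = log_num_mtt_for(258542)
print(needed, max_reg_mem_bytes(needed, 3) // MIB)
```

With those assumptions, the reported 131072 MiB is consistent with log_num_mtt = 22, and 24 would cover twice the 258542 MiB of physical memory. If mlx4_core really is the driver in use, the current values are normally visible under /sys/module/mlx4_core/parameters/, so check them there before changing anything.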


--
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
------------------------------
End of users Digest, Vol 2689, Issue 1
**************************************