Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Segmentation fault in MPI_Finalize with IB hardware and memory manager.
From: guillaume ranquet (guillaume.ranquet_at_[hidden])
Date: 2010-06-02 11:14:28


I snipped some parts of the exchange and am responding to two mails in
this one. (This may not be proper netiquette on this ML?)

On 06/02/2010 03:54 PM, Jeff Squyres wrote:
> What happens if you run:
>
> ~/openmpi-1.4.2-bin/bin/mpirun --mca btl openib,sm,self ~/bwlat/mpi_helloworld
>
> (i.e., MX support is still compiled in, but remove MX from the run-time)

Sadly, exactly the same thing :(
It doesn't seem to disable MX (the error message is still there; I'm
just guessing, as I said I don't really know anything about MPI :-/).

$ ~/openmpi-1.4.2-bin/bin/mpirun --mca btl openib,sm,self ~/bwlat/mpi_helloworld
[bordeplage-9.bordeaux.grid5000.fr:32664] Error in mx_init (error No MX device entry in /dev.)
Hello world from process 0 of 1
[bordeplage-9:32664] *** Process received signal ***
[bordeplage-9:32664] Signal: Segmentation fault (11)
[bordeplage-9:32664] Signal code: Address not mapped (1)
[bordeplage-9:32664] Failing at address: 0x7f8410a1b360
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 32664 on node
bordeplage-9.bordeaux.grid5000.fr exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

> I'm still guessing that there's some weird interaction between the memory management of those two plugins (MX and verbs). I don't know of anyone else who has this kind of configuration where it could be tested / debugged. :-(
>
> Per the above suggestion, let's see what happens if you run without MX and/or without openib via mpirun command line option. If that fixes the problem, that would mean you only have to change command line params when you run -- not have 2 OMPI installs. Additionally, you might be able to leave both plugins enabled but setenv the OMPI_MCA_memory_ptmalloc2_disable environment variable to 1; this will disable the OMPI memory management stuff. Note that this is not a normal MCA parameter -- you cannot set it on the command line or in a file; it *must* be set as an environment variable (for boring, technical reasons -- I can explain if you care).

I can also confirm that setting the OMPI_MCA_memory_ptmalloc2_disable
variable to 1 effectively solves the segfault problem.
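
For anyone finding this thread later, here is a minimal sketch of that
workaround as I understand it. The variable name comes from Jeff's mail
and the paths are the ones used on my node; the existence check around
mpirun is only my own addition so the snippet fails gracefully elsewhere,
so adjust both to your install.

```shell
# Has to be a real environment variable: per Jeff's note it cannot be
# passed with --mca on the command line or set in a parameter file.
export OMPI_MCA_memory_ptmalloc2_disable=1

# Paths below are the ones from this thread (assumption: adjust locally).
MPIRUN="$HOME/openmpi-1.4.2-bin/bin/mpirun"
PROG="$HOME/bwlat/mpi_helloworld"

# Only launch when this particular install is actually present.
if [ -x "$MPIRUN" ]; then
    "$MPIRUN" "$PROG"
else
    echo "mpirun not found at $MPIRUN, nothing launched" >&2
fi
```

With the variable exported, both the MX and openib plugins stay enabled
and only the OMPI memory management is switched off.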

On 06/02/2010 04:24 PM, Scott Atchley wrote:
> Does the same error happen if he tries on a MX host that does not have IB?
This node only has a Myrinet card:

$ mpirun ~/bwlat/mpi_helloworld
warning:regcache incompatible with malloc
Hello world from process 0 of 1

Note that this is with openmpi-1.4.1.