Open MPI User's Mailing List Archives

This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.

Subject: Re: [OMPI users] Segmentation fault in MPI_Finalize with IB hardware and memory manager.
From: guillaume ranquet (guillaume.ranquet_at_[hidden])
Date: 2010-06-02 11:14:28


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I snipped some parts of the exchange and am responding to 2 mails in this
one (this may not be proper netiquette on this ML?).

On 06/02/2010 03:54 PM, Jeff Squyres wrote:
> What happens if you run:
>
> ~/openmpi-1.4.2-bin/bin/mpirun --mca btl openib,sm,self ~/bwlat/mpi_helloworld
>
> (i.e., MX support is still compiled in, but remove MX from the run-time)

Sadly, exactly the same thing :(
It doesn't seem to disable MX (the mx_init error message is still there;
I'm just guessing, since as I said I don't really know anything about
MPI :-/).

$ ~/openmpi-1.4.2-bin/bin/mpirun --mca btl openib,sm,self ~/bwlat/mpi_helloworld
[bordeplage-9.bordeaux.grid5000.fr:32664] Error in mx_init (error No MX device entry in /dev.)
Hello world from process 0 of 1
[bordeplage-9:32664] *** Process received signal ***
[bordeplage-9:32664] Signal: Segmentation fault (11)
[bordeplage-9:32664] Signal code: Address not mapped (1)
[bordeplage-9:32664] Failing at address: 0x7f8410a1b360
- --------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 32664 on node
bordeplage-9.bordeaux.grid5000.fr exited on signal 11 (Segmentation fault).
- --------------------------------------------------------------------------

> I'm still guessing that there's some weird interaction between the memory management of those two plugins (MX and verbs). I don't know of anyone else who has this kind of configuration where it could be tested / debugged. :-(
>
> Per the above suggestion, let's see what happens if you run without MX and/or without openib via mpirun command line option. If that fixes the problem, that would mean you only have to change command line params when you run -- not have 2 OMPI installs. Additionally, you might be able to leave both plugins enabled but setenv the OMPI_MCA_memory_ptmalloc2_disable environment variable to 1; this will disable the OMPI memory management stuff. Note that this is not a normal MCA parameter -- you cannot set it on the command line or in a file; it *must* be set as an environment variable (for boring, technical reasons -- I can explain if you care).

I can also confirm that setting the OMPI_MCA_memory_ptmalloc2_disable
variable to 1 effectively solves the segfault problem.
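
For anyone finding this thread later, here is a minimal sketch of that
workaround, assuming a POSIX shell and the same install and test-program
paths as in the runs above. The MPIRUN and APP variable names, the
existence check and the fallback message are illustrative additions, not
part of Open MPI.

```shell
# Per the quoted advice, this must be set as an environment variable;
# it is not honoured as a --mca command-line option or in a parameter file.
export OMPI_MCA_memory_ptmalloc2_disable=1

# Paths taken from the transcripts in this mail; adjust to your install.
MPIRUN="$HOME/openmpi-1.4.2-bin/bin/mpirun"
APP="$HOME/bwlat/mpi_helloworld"

# Only launch if both binaries are present (illustrative guard).
if [ -x "$MPIRUN" ] && [ -x "$APP" ]; then
    "$MPIRUN" "$APP"
else
    echo "mpirun or test program not found; OMPI_MCA_memory_ptmalloc2_disable=$OMPI_MCA_memory_ptmalloc2_disable"
fi
```

Using export (rather than a plain assignment) matters here, since the
variable has to be visible in the environment of the launched processes.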

On 06/02/2010 04:24 PM, Scott Atchley wrote:
> Does the same error happen if he tries on a MX host that does not have IB?
This node only has a Myrinet card:

$ mpirun ~/bwlat/mpi_helloworld
warning:regcache incompatible with malloc
Hello world from process 0 of 1

Note that this is with openmpi-1.4.1.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.15 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJMBnVUAAoJEEzIl7PMEAliLqsIAOfUMffGmBVO2SOadd+roQ3x
HuqV6N0lhaevO4D1LPsyE6Q+mUtCWrvDgnIkJoBj0q7zAZvzGKxJM42cVNGFkAUp
3Xaz8oKwW3kZh8JyKLF9+sueuhEeBUhDxjr/25p0P7t2dOP0JeUnscky3hRFipM8
I9zg5LbOi3DusJ6H81nnttNcQYGtrnZSsJxoRfPKZK+51uyNOt9tfgKzzlh2DJBw
ddh0OP4cvWoqF3LcLGWBMfebZ16lo9iC8OIZ5xfyvQzVYKXjfX9E25eHH4DARD0j
Dc6UOvC3G7oqT4k02AYFmVNNou4423sfJ/27dkX+1+d06A2rb6Npg72ImNPD9Us=
=LxwM
-----END PGP SIGNATURE-----