Subject: Re: [OMPI users] Segmentation fault in MPI_Finalize with IB hardware and memory manager.
From: guillaume ranquet (guillaume.ranquet_at_[hidden])
Date: 2010-06-02 11:14:28

I snipped some parts of the exchange and am responding to two mails in
this one. (This may not be proper netiquette on this ML?)

On 06/02/2010 03:54 PM, Jeff Squyres wrote:
> What happens if you run:
> ~/openmpi-1.4.2-bin/bin/mpirun --mca btl openib,sm,self ~/bwlat/mpi_helloworld
> (i.e., MX support is still compiled in, but remove MX from the run-time)

Sadly, exactly the same thing :(
It doesn't seem to disable MX (the error message is still there; I'm
just guessing, since as I said I don't really know anything about MPI :-/).

$ ~/openmpi-1.4.2-bin/bin/mpirun --mca btl openib,sm,self
[] Error in mx_init (error No MX device entry in /dev.)
Hello world from process 0 of 1
[bordeplage-9:32664] *** Process received signal ***
[bordeplage-9:32664] Signal: Segmentation fault (11)
[bordeplage-9:32664] Signal code: Address not mapped (1)
[bordeplage-9:32664] Failing at address: 0x7f8410a1b360
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 32664 on node exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

> I'm still guessing that there's some weird interaction between the memory management of those two plugins (MX and verbs). I don't know of anyone else who has this kind of configuration where it could be tested / debugged. :-(
> Per the above suggestion, let's see what happens if you run without MX and/or without openib via mpirun command line option. If that fixes the problem, that would mean you only have to change command line params when you run -- not have 2 OMPI installs. Additionally, you might be able to leave both plugins enabled but setenv the OMPI_MCA_memory_ptmalloc2_disable environment variable to 1; this will disable the OMPI memory management stuff. Note that this is not a normal MCA parameter -- you cannot set it on the command line or in a file; it *must* be set as an environment variable (for boring, technical reasons -- I can explain if you care).

I can also confirm that setting the OMPI_MCA_memory_ptmalloc2_disable
variable to 1 effectively solves the segfault problem.
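
For the archive, a minimal sketch of that workaround, assuming a
Bourne-style shell (sh/bash). The install path and test binary are the
ones used earlier in this thread; the MPIRUN variable and the existence
check are only there for illustration.

```shell
# Per Jeff's note, this must be set in the environment; it is not a
# normal MCA parameter and cannot go on the mpirun command line or in a file.
export OMPI_MCA_memory_ptmalloc2_disable=1

# Paths taken from this thread; adjust for your own install.
MPIRUN="$HOME/openmpi-1.4.2-bin/bin/mpirun"

if [ -x "$MPIRUN" ]; then
    # Both plugins stay compiled in; only OMPI's memory manager is disabled.
    "$MPIRUN" "$HOME/bwlat/mpi_helloworld"
else
    echo "mpirun not found at $MPIRUN" >&2
fi
```

With csh/tcsh the equivalent of the export line would be
"setenv OMPI_MCA_memory_ptmalloc2_disable 1".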

On 06/02/2010 04:24 PM, Scott Atchley wrote:
> Does the same error happen if he tries on a MX host that does not have IB?
On a node that only has a Myrinet card:

$ mpirun ~/bwlat/mpi_helloworld
warning:regcache incompatible with malloc
Hello world from process 0 of 1

Note that this is with openmpi-1.4.1.
